You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
In the Rogers Park desk review spreadsheet, occasionally comp1 and comp2 will share the same PIN and sale price but different comp score. Example is for subject PIN 11304200220000. My hypothesis: these are PINs that sold twice, hence the different comp scores, but the sale price printed in the sheet is the PIN's most recent sale price.
Is that expected/desired behavior, and at what stage of the pipeline does it arise (later for the desk review spreadsheet, or earlier in the comps algorithm)?
Let's identify where this occurs and determine whether it's something to correct.
The text was updated successfully, but these errors were encountered:
WITH unpivoted_comps AS (
SELECT pin, comp_pin
FROM (
SELECT
pin,
ARRAY[
comp_pin_1, comp_pin_2, comp_pin_3, comp_pin_4, comp_pin_5,
comp_pin_6, comp_pin_7, comp_pin_8, comp_pin_9, comp_pin_10,
comp_pin_11, comp_pin_12, comp_pin_13, comp_pin_14, comp_pin_15,
comp_pin_16, comp_pin_17, comp_pin_18, comp_pin_19, comp_pin_20
] AS comp_array
FROMmodel.compWHERE run_id ='2024-03-17-stupefied-maya'
) AS comp_data
CROSS JOIN UNNEST(comp_array) AS comp_pin(comp_pin)
)
SELECT pin, COUNT(comp_pin) AS num_comps, COUNT(DISTINCT comp_pin) as num_distinct_comps
FROM unpivoted_comps
GROUP BY pin
HAVINGCOUNT(DISTINCT comp_pin) <COUNT(comp_pin);
What we learned:
Results suggest that 137,094 / 1,098,988 or about 13% of PINs have a PIN show up more than once in their set of 20 comps
But if we restrict the query to just look for dupes in the top two comps (comp_pin_1 and comp_pin_2) we only get 6,204 results, or 0.6% of PINs
Therefore, when analyzing comps, we should take care to not assume distinct rows for PINs in the training data and in the comps. Such that:
if a PIN only appears once in a comp set, it's reasonably a safe assumption to select its most recent sale.
if a PIN appears more than once in a comp set, assign the most recent sale to the comp with the higher comp_score, then the next lowest comp_score to the next most recent sale
In the Rogers Park desk review spreadsheet, occasionally comp1 and comp2 will share the same PIN and sale price but different comp score. Example is for subject PIN 11304200220000. My hypothesis: these are PINs that sold twice, hence the different comp scores, but the sale price printed in the sheet is the PIN's most recent sale price.
Is that expected/desired behavior, and at what stage of the pipeline does it arise (later for the desk review spreadsheet, or earlier in the comps algorithm)?
Let's identify where this occurs and determine whether it's something to correct.
The text was updated successfully, but these errors were encountered: