Comps check #224

ccao-jardine · 2024-03-25T14:45:09Z

In the Rogers Park desk review spreadsheet, occasionally comp1 and comp2 will share the same PIN and sale price but different comp score. Example is for subject PIN 11304200220000. My hypothesis: these are PINs that sold twice, hence the different comp scores, but the sale price printed in the sheet is the PIN's most recent sale price.

Is that expected/desired behavior, and at what stage of the pipeline does it arise (later for the desk review spreadsheet, or earlier in the comps algorithm)?

Let's identify where this occurs and determine whether it's something to correct.

ccao-jardine · 2024-03-29T19:35:42Z

Thanks @jeancochrane for investigating. Pasting our discussion here for @julia-klauss and @dfsnow and closing for now.

Query to check dupe comps:

WITH unpivoted_comps AS (
    SELECT pin, comp_pin
    FROM (
        SELECT
            pin,
            ARRAY[
                comp_pin_1, comp_pin_2, comp_pin_3, comp_pin_4, comp_pin_5,
                comp_pin_6, comp_pin_7, comp_pin_8, comp_pin_9, comp_pin_10,
                comp_pin_11, comp_pin_12, comp_pin_13, comp_pin_14, comp_pin_15,
                comp_pin_16, comp_pin_17, comp_pin_18, comp_pin_19, comp_pin_20
            ] AS comp_array
        FROM model.comp
        WHERE run_id = '2024-03-17-stupefied-maya'
    ) AS comp_data
    CROSS JOIN UNNEST(comp_array) AS comp_pin(comp_pin)
)
SELECT pin, COUNT(comp_pin) AS num_comps, COUNT(DISTINCT comp_pin) as num_distinct_comps 
FROM unpivoted_comps
GROUP BY pin
HAVING COUNT(DISTINCT comp_pin) < COUNT(comp_pin);

What we learned:

Results suggest that 137,094 / 1,098,988 or about 13% of PINs have a PIN show up more than once in their set of 20 comps
But if we restrict the query to just look for dupes in the top two comps (comp_pin_1 and comp_pin_2) we only get 6,204 results, or 0.6% of PINs

Therefore, when analyzing comps, we should take care to not assume distinct rows for PINs in the training data and in the comps. Such that:

if a PIN only appears once in a comp set, it's reasonably a safe assumption to select its most recent sale.
if a PIN appears more than once in a comp set, assign the most recent sale to the comp with the higher comp_score, then the next lowest comp_score to the next most recent sale

ccao-jardine assigned jeancochrane Mar 25, 2024

ccao-jardine closed this as completed Mar 29, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Comps check #224

Comps check #224

ccao-jardine commented Mar 25, 2024

ccao-jardine commented Mar 29, 2024

Comps check #224

Comps check #224

Comments

ccao-jardine commented Mar 25, 2024

ccao-jardine commented Mar 29, 2024