Skip to content

Conversation

@dfsnow
Copy link
Contributor

@dfsnow dfsnow commented Aug 1, 2023

This PR adds a vignette outlining the process of extracting comparable properties from a LightGBM model using a novel leaf node matching technique. It does not include code updates to the package since the code required for comp finding is short enough to simply be inlined anywhere we would actually use it.

Most of the credit for this goes to our former intern Claire Boyd.

Since we don't have PR previews setup yet for our package docs, you'll need to download the following zip file, decompress it, then open the HTML doc to view the knit content: finding-comps.html.zip

The vignette code itself is a mess, but is extremely non-critical, so I'm fine with it.

Closes #1.

@codecov
Copy link

codecov bot commented Aug 1, 2023

Codecov Report

Merging #3 (93ceb98) into master (767c332) will not change coverage.
The diff coverage is n/a.

❗ Current head 93ceb98 differs from pull request most recent head 4dfe751. Consider uploading reports for the commit 4dfe751 to get more accurate results

@@           Coverage Diff           @@
##           master       #3   +/-   ##
=======================================
  Coverage   80.32%   80.32%           
=======================================
  Files           3        3           
  Lines         122      122           
=======================================
  Hits           98       98           
  Misses         24       24           

@wrridgeway
Copy link
Member

praise (negging): I love this.

suggestion (condescending):

  1. Be more explicit about how boolean multiplication works.
  2. Perhaps include "i.e." before the variable you use as an example when describing splits (gr_liv_area).

jeancochrane
jeancochrane previously approved these changes Aug 1, 2023
Copy link
Member

@jeancochrane jeancochrane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Super clear and easy to follow, bravo!

@dfsnow dfsnow force-pushed the 1-create-function-to-extract-the-leaf-node-assignments-of-an-observation-and-output-comparable-observations branch from 11f0a95 to de41664 Compare August 1, 2023 17:55
@dfsnow dfsnow merged commit 060c8b2 into master Aug 1, 2023
@dfsnow dfsnow deleted the 1-create-function-to-extract-the-leaf-node-assignments-of-an-observation-and-output-comparable-observations branch August 1, 2023 18:07
@ccao-jardine
Copy link
Member

This is looking great! I'd like to propose a few slight refinements before publication. Happy to discuss.

The goal is to answer, "What sales did you use to value my property?" The novel contribution is the ability to answer this precise question, and it's important to distinguish "sales used for valuation by the model" from "all comparable properties." CookViewer and SmartFile are way to properties that are merely comparable based on characteristics, regardless of sale status, and it's important to make salient that this is a different method to answer a foundationally different question. To that end...

General suggestions about conventions used throughout the vignette:

  • Replace the term "comparables" with something more precise. Source sales? Input comps? Training comps? (Sale comps is obvious but, because it is already well defined in the appraisal space, might be confusing. Maybe.)
  • Rename ID --> Property ID

Specific suggestions

  • at #ames-housing-data-example-decision-tree, let's unpack this kind of diagram even more broadly and remind people that this is about the training set by adding something like: "The below decision tree has learned about how each property variable, like neighborhood and gr_liv_area, contributed to sale prices."
  • For the example, let's unpack Split 0, then Split 2.
  • in Final Results tables, for the second table with IDs not of the target property, can we include sale price? (Ideally for column alignment purposes between the two final results tables, the column would also exist in the table for ID = 2 but be empty.) That table would help clarify that this answers "What sales did you use to value my property?"

@dfsnow dfsnow mentioned this pull request Aug 1, 2023
5 tasks
@dfsnow
Copy link
Contributor Author

dfsnow commented Aug 1, 2023

@ccao-jardine You missed the boat on this PR but these are good points. Moved to issue #4.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Create function to extract the leaf node assignments of an observation and output comparable observations

4 participants