Description
Overview / Context
Our office often gets asked "What properties did you look at to value my property?" or "What are my comparables?" There's not a clear answer to this question since our models effectively look at all available sales when training. However, there may be a way to extract pseudo-comparable observations by leveraging the structure of LightGBM's trees.
When LightGBM trains, it builds decision trees that partition observations into terminal leaf nodes by splitting on continuous and categorical features. For a single decision tree, observations that share the same terminal leaf node are likely to be very similar to each other, since they followed the same sequence of splits. We may be able to extract leaf node assignments across all trees and then turn those assignments into a set of comparables or a similarity score.
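One way to make this concrete is to score each sale by the share of trees in which it lands in the same leaf as the target sale. The sketch below uses base R only and a small made-up leaf-assignment matrix; the matrix values, the sale IDs, and the `leaf_similarity` helper are all hypothetical. In practice the matrix would come from the fitted model; the R lightgbm package's `predict()` can return leaf indices rather than predictions, though the exact argument differs between package versions, so check the installed documentation.

```r
# Hypothetical leaf-assignment matrix: one row per sale, one column per tree.
# Each entry is the index of the terminal leaf that sale lands in for that tree.
leaves <- matrix(
  c(1, 3, 2,
    1, 3, 2,
    1, 4, 2,
    2, 4, 5),
  ncol = 3, byrow = TRUE,
  dimnames = list(c("A", "B", "C", "D"), paste0("tree", 1:3))
)

# Hypothetical helper: share of trees in which each sale shares a leaf
# with the target sale.
leaf_similarity <- function(leaves, target) {
  target_leaves <- leaves[target, ]
  # sweep() compares every row to the target's leaf vector, column by column
  same_leaf <- sweep(leaves, 2, target_leaves, FUN = "==")
  rowMeans(same_leaf)
}

sim <- leaf_similarity(leaves, "A")

# Rank the other sales as pseudo-comparables for sale "A"
comps_for_a <- sort(sim[names(sim) != "A"], decreasing = TRUE)
```

Here sale B matches A in every tree and D in none, so B ranks first. Weighting each tree equally is an assumption; other weightings may produce better comparables.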
The goal of this issue is to create a function that extracts either a set of comparables, a similarity index, or something similar. This will be part 1 of a longer project dedicated to making better tooling around comparables.
Task
Create a function to extract comparables or a score given a:
- Serialized LightGBM model object
- Parsnip workflow object
- Data frame of sales (training data)
- A vector of observation IDs that require comparables
The function should have a signature something like:
extract_comps <- function(object, workflow, new_data, obs_id) {
  ...
}
It should return a nested list, where each nested item is a list of comparables for the input obs_ids.
Additional Requirements
- The function should include documentation and should be integrated into the larger package.
- Please submit a merge request when ready to close this issue.
- The function should take a vector input of IDs and quickly return an output. Imagine scaling this up to 1M+ IDs.
- Add as few dependencies as possible to the package DESCRIPTION. Ideally, add none.
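As a rough illustration of the vector-in, nested-list-out shape with no added dependencies, the sketch below takes a leaf-assignment matrix and a character vector of IDs and returns the top-scoring sales for each ID. The `comps_from_leaves` name, the `n_comps` argument, and the toy matrix are hypothetical, and the step that produces the matrix from the model and workflow is left out.

```r
# Hypothetical leaf-assignment matrix: rows are sales, columns are trees.
leaves <- matrix(
  c(1, 3, 2,
    1, 3, 2,
    1, 4, 2,
    2, 4, 5),
  ncol = 3, byrow = TRUE,
  dimnames = list(c("A", "B", "C", "D"), paste0("tree", 1:3))
)

# Hypothetical helper: top `n_comps` sales per requested ID, ranked by the
# share of trees in which they land in the same leaf as that ID.
# `obs_id` is assumed to be a character vector of row names of `leaves`.
comps_from_leaves <- function(leaves, obs_id, n_comps = 2L) {
  leaves_t <- t(leaves)  # trees x sales, transposed once up front
  out <- lapply(obs_id, function(id) {
    # The target's leaf vector recycles down each column (one column per sale)
    score <- colMeans(leaves_t == leaves[id, ])
    score <- score[names(score) != id]  # a sale is not its own comp
    head(sort(score, decreasing = TRUE), n_comps)
  })
  names(out) <- obs_id
  out
}

comps <- comps_from_leaves(leaves, c("A", "D"))
```

Each lookup here scans every sale, so the total cost grows with the number of IDs times the number of sales. Reaching 1M+ IDs would likely need a narrower candidate set first, for example only sales that share at least one leaf with the target.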