Create function to extract the leaf node assignments of an observation and output comparable observations #1

# Overview / Context

Our office often gets asked "What properties did you look at to value my property?" or "What are my comparables?" There's not a clear answer to this question since our models effectively look at _all_ available sales when training. However, there may be a way to extract pseudo-comparable observations by leveraging the structure of LightGBM's trees.

When LightGBM trains, it builds decision trees that recursively split observations on the continuous and categorical features until each observation lands in a terminal (leaf) node. Within a single tree, observations that share the same leaf node are likely to be similar to each other, since they satisfied the same sequence of splits. We may be able to extract these leaf node assignments and then convert them into a set of comparables or a similarity score.
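As a minimal sketch of the similarity idea, assume we already have a leaf-assignment matrix with one row per observation and one column per tree, where each cell is the index of the leaf that observation lands in. Recent versions of the lightgbm R package can produce such a matrix via `predict(..., type = "leaf")`, but that should be verified against the installed version. Because leaf indices are only meaningful within a tree (leaf 1 in tree 1 is unrelated to leaf 1 in tree 2), the comparison is done column by column. The score below is the share of trees in which two observations share a leaf. The helper `leaf_similarity()` and the toy matrix are hypothetical.

```r
# Hypothetical helper: share of trees in which each row of `leaves`
# falls in the same leaf as `target`.
# `leaves`: integer matrix, one row per observation, one column per tree
# `target`: integer vector of leaf indices for one observation
leaf_similarity <- function(leaves, target) {
  stopifnot(length(target) == ncol(leaves))
  # Compare each tree (column) against the target's leaf in that tree
  same_leaf <- sweep(leaves, 2, target, FUN = "==")
  # Fraction of trees where the leaves match, between 0 and 1
  rowMeans(same_leaf)
}

# Toy example: 4 observations, 3 trees
leaves <- matrix(
  c(1L, 2L, 0L,
    1L, 2L, 1L,
    0L, 2L, 0L,
    3L, 1L, 1L),
  nrow = 4, byrow = TRUE
)
leaf_similarity(leaves, leaves[1, ])
```

Here the first observation matches itself in all three trees, the second and third share a leaf with it in two of three trees, and the fourth in none.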

The goal of this issue is to create a function that extracts either a set of comparables, a similarity index, or something similar. This will be part 1 of a longer project dedicated to making better tooling around comparables.

## Task

Create a function to extract comparables or a similarity score given:

- A serialized LightGBM model object
- A parsnip/tidymodels `workflow` object
- A data frame of sales (the training data)
- A vector of observation IDs that require comparables

The function should have a signature something like:

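```r
extract_comps <- function(object, workflow, new_data, obs_id) {
  # Implementation to be written: return one element of comparables
  # for each ID in `obs_id`
  stop("extract_comps() is not implemented yet")
}
```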

It should return a nested list with one element per ID in `obs_id`, where each element contains the comparables for that ID.
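One possible shape for that output, sketched under the assumption that a leaf-assignment matrix (one row per observation, one column per tree) and a matching vector of IDs are already available: a named list keyed by `obs_id`, where each element is a small data frame of the `n` most similar other observations and their leaf co-occurrence scores. The function `extract_comps_from_leaves()` and its arguments are hypothetical and are not the proposed `extract_comps()` signature.

```r
# Hypothetical sketch: one list element per requested ID, each a data frame
# of the `n` most similar other observations by leaf co-occurrence.
extract_comps_from_leaves <- function(leaves, ids, obs_id, n = 5L) {
  stopifnot(nrow(leaves) == length(ids), all(obs_id %in% ids))
  comps <- lapply(obs_id, function(id) {
    row <- match(id, ids)
    # Share of trees in which each observation shares a leaf with `id`
    sim <- rowMeans(sweep(leaves, 2, leaves[row, ], FUN = "=="))
    # Exclude the observation itself from its own comparables
    candidates <- setdiff(seq_along(ids), row)
    top <- candidates[head(order(sim[candidates], decreasing = TRUE), n)]
    data.frame(comp_id = ids[top], similarity = sim[top],
               stringsAsFactors = FALSE)
  })
  names(comps) <- obs_id
  comps
}

leaves <- matrix(
  c(1L, 2L, 0L,
    1L, 2L, 1L,
    0L, 2L, 0L,
    3L, 1L, 1L),
  nrow = 4, byrow = TRUE
)
ids <- c("a", "b", "c", "d")
comps <- extract_comps_from_leaves(leaves, ids, obs_id = c("a", "d"), n = 2)
```

Returning a data frame per ID (rather than a bare vector of IDs) keeps the similarity score alongside each comparable, which leaves room for later thresholding or weighting.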

## Additional Requirements

* The function should include documentation and should be integrated into the larger package.
* Please submit a merge request when ready to close this issue.
* The function should take a _vector_ input of IDs and quickly return an output. Imagine scaling this up to 1M+ IDs.
* Add as few dependencies as possible to the package `DESCRIPTION`. Ideally, add none.
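Comparing every requested ID against every training row costs roughly one pass over the full leaf matrix per ID, which will not scale well to 1M+ IDs. A minimal base-R sketch of one alternative, assuming the same leaf-assignment matrix as input: build a per-tree index from leaf to training rows once, then score each target by counting only the rows that share at least one of its leaves. `build_leaf_index()` and `score_from_index()` are hypothetical names, and this is one possible technique (an inverted index), not a settled design.

```r
# Hypothetical sketch: per-tree inverted index from leaf id -> row numbers.
build_leaf_index <- function(leaves) {
  # One list per tree; each maps a leaf id (as a name) to the rows in it
  lapply(seq_len(ncol(leaves)), function(t) {
    split(seq_len(nrow(leaves)), leaves[, t])
  })
}

# Score one target (vector of leaf ids, one per tree) against all rows.
score_from_index <- function(index, target, n_rows) {
  # Collect every row that shares the target's leaf in each tree;
  # a leaf with no training rows returns NULL and is dropped by unlist()
  hits <- unlist(
    Map(function(tree_index, leaf) tree_index[[as.character(leaf)]],
        index, target),
    use.names = FALSE
  )
  # Count shared trees per row and convert to a share of all trees
  tabulate(hits, nbins = n_rows) / length(index)
}

leaves <- matrix(
  c(1L, 2L, 0L,
    1L, 2L, 1L,
    0L, 2L, 0L,
    3L, 1L, 1L),
  nrow = 4, byrow = TRUE
)
index <- build_leaf_index(leaves)
score_from_index(index, leaves[1, ], nrow(leaves))
```

The index is built once and reused for every requested ID, so no new package dependencies are needed. Very large leaves can still produce long hit lists, so capping leaf size or sampling within large leaves may be worth evaluating.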
