-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Record doc numbers along with PINs when calculating comps #246
Record doc numbers along with PINs when calculating comps #246
Conversation
pipeline/04-interpret.R
Outdated
\(idx_row) { | ||
training_data[idx_row, ]$meta_sale_document_num | ||
}, | ||
.names = "comp_doc_no_{str_remove(col, 'comp_idx_')}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a variety of different ways we refer to this type of data -- alternately, doc_no
or instruno
or document_num
, depending on the context. What's the best choice for this particular case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's stick with what's used in the model pipeline - document_num
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, done in d510fe7.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, this should be super helpful. Before we merge, let's also use this as an opportunity to test the Batch action by doing a re-run of the final res model with the updated comps. This will give us the new output and will ensure we haven't busted anything in the interim since we wrapped modeling.
pipeline/04-interpret.R
Outdated
\(idx_row) { | ||
training_data[idx_row, ]$meta_sale_document_num | ||
}, | ||
.names = "comp_doc_no_{str_remove(col, 'comp_idx_')}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's stick with what's used in the model pipeline - document_num
.
@dfsnow I finally got a completed run finished with run ID |
@jeancochrane I took a look and everything looks good. Let's merge it! |
This PR updates the comps calculation step in the
interpret
stage to save comp document numbers along with PINs. With document numbers, we can update downstream code that consumes these comps to uniquely tie each comp to a specific sale, instead of having to infer the sale based only on the comp's PIN.I tested this locally on a subset of training/assessment data. The comps aren't great due to the limited size of the data, but it ran reasonably fast and confirmed that this processing code works. I'm happy to share the output of my test, or to kick off a remote comps run if that would make things easier to review. In the meantime, here's a quick peek at the output schema:
Closes https://github.com/ccao-data/pinval/issues/7.