Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new fast_predict() API method for prediction #581

Merged
merged 12 commits into from
Jan 9, 2023

Conversation

desilinguist
Copy link
Member

Closes #546.

The current method that rsmpredict uses is called compute_and_save_prediction() and is meant for batch prediction – it reads files from disks, does a lot of validation, and writes the predictions to disk. Therefore, it is not suited for real-time prediction on single instances which needs to happen in memory.

This PR adds a new function called fast_predict() to rsmpredict.py meant to run on a single data point rather than a batch. This function expects the user to have already read all of the necessary files from disk (e.g., in a RFS service's setup() function) and simply pass the in-memory data structures to this function. This function then pre-processes the features, generates predictions for the single data point using the model, and then post-processes the predictions.

In addition to adding this new function, this PR also adds tests for it and updates the API documentation to include this new function.

- Add a new function for real-time inference vs. the existing
  `compute_and_save_prediction()` function that is better suited
  to batch prediction.
- Also add logger creation that was missing before.
@codecov
Copy link

codecov bot commented Jan 5, 2023

Codecov Report

Base: 93.08% // Head: 93.12% // Increases project coverage by +0.03% 🎉

Coverage data is based on head (48a8055) compared to base (5fe384b).
Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #581      +/-   ##
==========================================
+ Coverage   93.08%   93.12%   +0.03%     
==========================================
  Files          31       31              
  Lines        4529     4551      +22     
==========================================
+ Hits         4216     4238      +22     
  Misses        313      313              
Impacted Files Coverage Δ
rsmtool/preprocessor.py 96.60% <ø> (ø)
rsmtool/__init__.py 84.21% <100.00%> (ø)
rsmtool/modeler.py 97.59% <100.00%> (+0.01%) ⬆️
rsmtool/rsmpredict.py 99.14% <100.00%> (+0.17%) ⬆️
rsmtool/test_utils.py 72.54% <100.00%> (ø)

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@desilinguist desilinguist requested review from tazin-afrin and cadygansen and removed request for aloukina January 5, 2023 17:53
Copy link
Contributor

@mulhod mulhod left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great. Definitely been looking for something like this for awhile! (I still like the idea of being able to save something like these steps directly in a model file, but I think this is at least going in a good direction.)

I have a couple minor comments. One is just about the return value of the new function.

rsmtool/rsmpredict.py Outdated Show resolved Hide resolved
rsmtool/rsmpredict.py Outdated Show resolved Hide resolved
- Use a dictionary instead of a data frame since that better matches the
  input semantics and is easier to reason about.
tests/test_experiment_rsmpredict.py Outdated Show resolved Hide resolved
rsmtool/rsmpredict.py Show resolved Hide resolved
- Make `min_score` and `max_score` optional parameters
- Check that the optional parameters are specified if `predict_expected`
  is `True`, since we do not need them otherwise.
- Add a new unit test for this new check.
- Make all of the trimming and scaling parameters optional.
- Update return values to only return what can be computed given what
  parameters have been provided.
- Update the tests to test all possible combinations.
- Remove the computation of expected scores since that's very rarely
  used and just adds unnecessary complexity to this function.
@desilinguist
Copy link
Member Author

@mulhod, @amandameganchan, @tazin-afrin I have made the requested changes. Please re-review.

rsmtool/rsmpredict.py Outdated Show resolved Hide resolved
rsmtool/rsmpredict.py Outdated Show resolved Hide resolved
rsmtool/rsmpredict.py Outdated Show resolved Hide resolved
@desilinguist
Copy link
Member Author

@blongwill thanks for the great review! I have incorporated your suggestions. Please re-review and approve if everything looks good.

Copy link
Contributor

@blongwill blongwill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this! It looks great!

Copy link

@tazin-afrin tazin-afrin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new changes looks good to me, thanks!

@desilinguist desilinguist merged commit 722f3b8 into main Jan 9, 2023
@delete-merged-branch delete-merged-branch bot deleted the 546-refactor-rsmpredict branch January 9, 2023 20:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Refactor compute_and_save predictions
5 participants