Add new `fast_predict()` API method for prediction (#581)
- Add a new function for real-time inference, in contrast to the existing `compute_and_save_prediction()` function, which is better suited to batch prediction.
- Also add logger creation that was missing before.
Codecov Report — Base: 93.08% // Head: 93.12% // Increases project coverage by +0.03%.
Additional details and impacted files:
@@ Coverage Diff @@
## main #581 +/- ##
==========================================
+ Coverage 93.08% 93.12% +0.03%
==========================================
Files 31 31
Lines 4529 4551 +22
==========================================
+ Hits 4216 4238 +22
Misses 313 313
This is great. Definitely been looking for something like this for a while! (I still like the idea of being able to save something like these steps directly in a model file, but I think this is at least going in a good direction.)
I have a couple minor comments. One is just about the return value of the new function.
- Use a dictionary instead of a data frame since that better matches the input semantics and is easier to reason about.
- Make `min_score` and `max_score` optional parameters.
- Check that the optional parameters are specified if `predict_expected` is `True`, since we do not need them otherwise.
- Add a new unit test for this new check.
- Make all of the trimming and scaling parameters optional.
- Update return values to only return what can be computed given what parameters have been provided.
- Update the tests to test all possible combinations.
- Remove the computation of expected scores since that's very rarely used and just adds unnecessary complexity to this function.
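The validation pattern described above — optional trimming parameters, with the returned dictionary containing only what could actually be computed — can be sketched as follows. This is a hypothetical illustration of the pattern, not the actual rsmtool API; the helper name `validate_trim_params` and its signature are assumptions.

```python
def validate_trim_params(min_score=None, max_score=None, trim_tolerance=None):
    """Return a dict containing only the trimming parameters that were provided.

    Hypothetical sketch: raises if the parameters are inconsistently
    specified, since trimming needs both bounds to be meaningful.
    """
    provided = {}
    # min_score and max_score only make sense together
    if (min_score is None) != (max_score is None):
        raise ValueError("min_score and max_score must be provided together")
    if min_score is not None:
        provided["min_score"] = min_score
        provided["max_score"] = max_score
        # the tolerance is only relevant when trimming is requested
        if trim_tolerance is not None:
            provided["trim_tolerance"] = trim_tolerance
    return provided
```

Callers that skip trimming simply omit the parameters, and downstream code checks for the corresponding keys before computing trimmed scores.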
@mulhod, @amandameganchan, @tazin-afrin I have made the requested changes. Please re-review.
Force-pushed 1831326 to b375a6a: Better refactoring of trim computation.
@blongwill thanks for the great review! I have incorporated your suggestions. Please re-review and approve if everything looks good.
Thanks for this! It looks great!
The new changes look good to me, thanks!
Closes #546.
The current method that `rsmpredict` uses is called `compute_and_save_prediction()` and is meant for batch prediction: it reads files from disk, does a lot of validation, and writes the predictions to disk. Therefore, it is not suited for real-time prediction on single instances, which needs to happen in memory.

This PR adds a new function called `fast_predict()` to `rsmpredict.py` that is meant to run on a single data point rather than a batch. This function expects the user to have already read all of the necessary files from disk (e.g., in an RFS service's `setup()` function) and to simply pass the in-memory data structures to this function. The function then pre-processes the features, generates predictions for the single data point using the model, and then post-processes the predictions.

In addition to adding this new function, this PR also adds tests for it and updates the API documentation to include it.
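The three-step flow described above (pre-process, predict, post-process) can be sketched for a single data point as follows. This is a minimal illustrative mock, not the actual `fast_predict()` implementation: the dictionary-based model and feature-statistics structures, and the z-score/linear-model arithmetic, are all assumptions chosen to keep the example self-contained.

```python
def fast_predict_sketch(input_features, model, feature_stats,
                        min_score=None, max_score=None):
    """Hypothetical sketch of in-memory single-instance prediction.

    input_features: dict of raw feature values for one data point
    model: dict of coefficients plus an "intercept" key (assumed structure)
    feature_stats: dict of per-feature {"mean": ..., "sd": ...} (assumed)
    """
    # 1. Pre-process: standardize each raw feature value
    processed = {name: (value - feature_stats[name]["mean"]) / feature_stats[name]["sd"]
                 for name, value in input_features.items()}

    # 2. Predict: apply the already-loaded (linear, in this sketch) model
    raw = sum(model[name] * value for name, value in processed.items())
    raw += model["intercept"]

    # 3. Post-process: trim only if the optional bounds were provided,
    #    returning a dict with only the values that could be computed
    result = {"raw": raw}
    if min_score is not None and max_score is not None:
        result["raw_trim"] = min(max(raw, min_score), max_score)
    return result
```

Because everything lives in memory, a service would build `model` and `feature_stats` once at startup and then call this function per request, which is the usage pattern the PR describes for the real `fast_predict()`.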