feat: [VRD-711] Add batch prediction method to client #3645

Merged
merged 33 commits into from Mar 13, 2023

Contributor

@hmacdonald-verta hmacdonald-verta commented Mar 6, 2023

Impact and Context

Create a batch_predict method that accepts and returns pandas.DataFrames. This method takes a dataframe, splits it into smaller dataframes of the provided batch_size to make predictions against the model, then reassembles the output to return to the user as one dataframe.
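The split/predict/reassemble flow described above can be sketched roughly as follows. This is a minimal illustration, not the client's implementation: `batch_predict_sketch`, `predict_fn`, and `fake_model` are hypothetical names, and the real method sends each batch to a deployed model endpoint instead of calling a local function.

```python
import pandas as pd


def batch_predict_sketch(df, batch_size, predict_fn):
    """Split df into row batches, predict on each batch, and reassemble.

    predict_fn is a hypothetical stand-in for one request to the model.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    outputs = []
    for start in range(0, len(df), batch_size):
        # iloc slices by row position, so the last batch may be shorter
        batch = df.iloc[start:start + batch_size]
        outputs.append(predict_fn(batch))
    # pd.concat raises on an empty list, so handle an empty input explicitly
    return pd.concat(outputs) if outputs else pd.DataFrame()


batch_sizes = []


def fake_model(batch):
    # Hypothetical model: records the batch size and doubles every value
    batch_sizes.append(len(batch))
    return batch * 2


input_df = pd.DataFrame({"a": [1, 2, 3, 4, 5], "b": [11, 12, 13, 14, 15]})
result = batch_predict_sketch(input_df, 2, fake_model)
print(batch_sizes)  # [2, 2, 1]
```

Five rows with a batch size of 2 produce three calls, and the concatenated result has one row per input row.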

Risks and Area of Effect

Doesn't affect existing features.

Testing

  • Unit test
  • Deployed to dev env
  • Other (explain)

Added unit tests to cover the basic functionality. Planning to add more complex property tests shortly, in a separate PR. Also deployed to a dev env to ensure the pipeline works from end to end.

Also, the Python model tests look good with the updated handling of NaNs in the JSON request: https://jenkins.dev.verta.ai/job/test/job/python-models/job/python-models/34/

Reverting

  • Contains Migration - Do Not Revert

@hmacdonald-verta hmacdonald-verta marked this pull request as ready for review March 8, 2023 18:59
client/verta/verta/deployment/_deployedmodel.py: 7 outdated review threads (resolved)
hmacdonald-verta and others added 7 commits March 8, 2023 13:27
Co-authored-by: Liu <96442646+liuverta@users.noreply.github.com>
Contributor

@liuverta liuverta left a comment

🙇

client/verta/verta/deployment/_deployedmodel.py: 1 outdated review thread (resolved)
Contributor

@liuverta liuverta left a comment

nitpick

client/verta/verta/deployment/_deployedmodel.py: 2 outdated review threads (resolved)
creds=creds,
token=TOKEN,
)
# the input below is entirely irrelevant since it"s smaller than the batch size
Contributor

did you cmd-F replace ' with "

Suggested change
# the input below is entirely irrelevant since it"s smaller than the batch size
# the input below is entirely irrelevant since it's smaller than the batch size

Contributor Author

ya

# the input below is entirely irrelevant since it"s smaller than the batch size
prediction_df = dm.batch_predict(pd.DataFrame({"hi": "bye"}, index=[1]), 10)
# Since no index was provided, we can"t guarantee the index type for assertions
pd.testing.assert_frame_equal(expected_df.reset_index(drop=True), prediction_df.reset_index(drop=True))
Contributor

Hmmm I feel like resetting the index conceptually weakens the test (now we can't check index values if we ever wanted to).

assert_frame_equal() appears to have a check_index_type parameter. Would that have worked?

Contributor Author

I have different tests with indexes; does that change your opinion?

Contributor Author

I tried the check_index_type and it didn't do enough 😭
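One plausible reading of "didn't do enough", sketched under the assumption that the two frames differ in index labels (integers on one side, strings on the other): `check_index_type=False` relaxes the index class and dtype check, but the labels themselves are still compared, so `0` and `"0"` still count as a mismatch. The frames below are made up for illustration and are not taken from the PR's tests.

```python
import pandas as pd

# Hypothetical frames: same column data, integer vs. string index labels
expected = pd.DataFrame({"a": [1, 2]}, index=[0, 1])
actual = pd.DataFrame({"a": [1, 2]}, index=["0", "1"])

try:
    # Relaxes the index type/dtype check only; label values are still compared
    pd.testing.assert_frame_equal(expected, actual, check_index_type=False)
    labels_compared_equal = True
except AssertionError:
    labels_compared_equal = False

print(labels_compared_equal)  # False

# Dropping the index on both sides compares only the column data
pd.testing.assert_frame_equal(
    expected.reset_index(drop=True), actual.reset_index(drop=True)
)
```

This is the trade-off raised in the review: resetting the index makes the assertion pass, at the cost of no longer checking index values in that test.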

Contributor Author

well, somewhere along the line, number indexes are converting to string indexes and that's sketchy. I should look into that.
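A likely suspect for the conversion, offered as an assumption since the thread does not confirm the cause: JSON object keys are always strings, so integer index labels that travel as dictionary keys come back as strings after a round trip. The payload below is hypothetical and only mimics the column-oriented shape that `DataFrame.to_dict()` produces.

```python
import json

# Hypothetical payload shaped like DataFrame.to_dict():
# {column name -> {index label -> value}}, with integer index labels
payload = {"a": {0: 1, 1: 2}, "b": {0: 11, 1: 12}}

# json.dumps converts non-string keys to strings; json.loads cannot undo that
round_tripped = json.loads(json.dumps(payload))

original_labels = list(payload["a"])
returned_labels = list(round_tripped["a"])

print(original_labels)  # [0, 1]
print(returned_labels)  # ['0', '1']
```

If this is what happens in the request/response path, the index type of the returned DataFrame depends on how the response is parsed, not on the input.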

client/verta/tests/unit_tests/test_deployed_model.py: 2 outdated review threads (resolved)
input_df = pd.DataFrame({"a": [1, 2, 3, 4, 5], "b": [11, 12, 13, 14, 15]})
prediction_df = dm.batch_predict(input_df, 1)
expected_final_df = pd.concat(expected_d_list)
# Since no index was provided, we can"t guarantee the index type for assertions
Contributor

Suggested change
# Since no index was provided, we can"t guarantee the index type for assertions
# Since no index was provided, we can't guarantee the index type for assertions

Contributor Author

can you tell I find-replaced all the single quotes? 😅

client/verta/tests/unit_tests/test_deployed_model.py: 3 outdated review threads (resolved)
Contributor

@liuverta liuverta left a comment

🚀

Comment on lines 148 to 149
if not isinstance(body, bytes):
body = body.encode("utf-8")
Contributor

We can omit this since we're firmly in Python 3, now. json.dumps() will give us a utf-8 str
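For context, a small sketch of what `json.dumps()` returns in Python 3. It always returns a `str`, never `bytes`, and with the default `ensure_ascii=True` that string contains only ASCII characters, so encoding it as UTF-8 cannot fail or change its content. Whether the explicit `encode` can be dropped still depends on how the HTTP layer handles a `str` body, which this thread does not show. The example payload is made up.

```python
import json

# Hypothetical request payload containing a non-ASCII character
body = json.dumps({"name": "café"})

print(type(body).__name__)  # str
print(body)                 # {"name": "caf\u00e9"}

# With the default ensure_ascii=True the result is pure ASCII,
# so UTF-8 and ASCII encodings produce identical bytes
encoded = body.encode("utf-8")
same_as_ascii = encoded == body.encode("ascii")

# ensure_ascii=False keeps the character as-is; UTF-8 then needs two bytes for it
raw_body = json.dumps({"name": "café"}, ensure_ascii=False)
extra_bytes = len(raw_body.encode("utf-8")) - len(raw_body)
```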

token=TOKEN,
)
input_df = pd.DataFrame({"a": [1, 2, 3, 4, 5], "b": [11, 12, 13, 14, 15]})
prediction_df = dm.batch_predict(input_df, 1)
Contributor

This test is a bit hard to read (and therefore probably hard to maintain/debug); it's unclear that the batch_size=1 here forces us to have five batches because input_df has five rows, and that it's related to expected_df_list having five DataFrames.

But I don't know how I'd write it any better 🤷

@hmacdonald-verta hmacdonald-verta merged commit 2200a64 into main Mar 13, 2023
@hmacdonald-verta hmacdonald-verta deleted the hm/VRD-711_addBatchPredictToClient branch March 13, 2023 23:38
3 participants