feat: [VRD-711] Add batch prediction method to client #3645

hmacdonald-verta · 2023-03-06T23:36:50Z

Impact and Context

Add a batch_predict method that accepts and returns a pandas.DataFrame. The method splits the input dataframe into smaller dataframes of at most batch_size rows, makes predictions against the model with each one, then reassembles the outputs and returns them to the user as one dataframe.
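To make the split/predict/reassemble flow concrete, here is a minimal sketch. It is not the code from this PR: `batch_predict_sketch`, `predict_fn`, and `double_everything` are hypothetical names, and `predict_fn` stands in for the HTTP call to the deployed model.

```python
import pandas as pd


def batch_predict_sketch(df, batch_size, predict_fn):
    """Split df into chunks of at most batch_size rows, predict, recombine.

    Hypothetical helper for illustration; predict_fn stands in for the
    request that the real client would send to the model endpoint.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    outputs = []
    for start in range(0, len(df), batch_size):
        # iloc slicing keeps the original index labels of each chunk
        batch = df.iloc[start:start + batch_size]
        outputs.append(predict_fn(batch))
    if not outputs:
        # pd.concat raises on an empty list, so handle empty input explicitly
        return pd.DataFrame()
    return pd.concat(outputs)


def double_everything(batch):
    """Stand-in 'model' that doubles every value."""
    return batch * 2


input_df = pd.DataFrame({"a": [1, 2, 3, 4, 5], "b": [11, 12, 13, 14, 15]})
result = batch_predict_sketch(input_df, batch_size=2, predict_fn=double_everything)
```

With five rows and batch_size=2 the stand-in model is called three times (2 + 2 + 1 rows), and the concatenated result keeps the input's row order and index labels.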

Risks and Area of Effect

Doesn't affect existing features.

Testing

Unit test
Deployed to dev env
Other (explain)

Added unit tests to cover the basic functionality. Planning to add more complex property tests shortly, in a separate PR. Also deployed to a dev env to ensure the pipeline works from end to end.

Also, the Python model tests pass with the updated JSON request handling for NaNs: https://jenkins.dev.verta.ai/job/test/job/python-models/job/python-models/34/
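For context on why NaNs need special handling in a JSON request: Python's json.dumps() emits the bare token NaN by default, which is not valid strict JSON. The sketch below shows one common workaround, replacing NaN with null before serializing. This is an assumption for illustration; the PR description does not show how the client actually does it, and `nan_to_none` is a hypothetical helper.

```python
import json
import math


def nan_to_none(value):
    """Recursively replace float NaN with None so the payload is strict JSON.

    Hypothetical helper for illustration, not taken from the client code.
    """
    if isinstance(value, float) and math.isnan(value):
        return None
    if isinstance(value, dict):
        return {key: nan_to_none(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [nan_to_none(item) for item in value]
    return value


rows = {"a": [1.0, float("nan")], "b": ["x", "y"]}

# Default behaviour: NaN is written as a bare token that strict parsers reject.
lenient_body = json.dumps(rows)

# allow_nan=False makes json.dumps raise if any NaN slipped through.
strict_body = json.dumps(nan_to_none(rows), allow_nan=False)
```

Here `strict_body` contains `null` where the NaN was, so any standards-compliant JSON parser on the server side can read it.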

Reverting

Contains Migration - Do Not Revert

client/verta/verta/deployment/_deployedmodel.py

Co-authored-by: Liu <96442646+liuverta@users.noreply.github.com>

liuverta

🙇

client/verta/verta/deployment/_deployedmodel.py

liuverta

nitpick

client/verta/verta/deployment/_deployedmodel.py

liuverta · 2023-03-10T22:58:45Z

client/verta/tests/unit_tests/test_deployed_model.py

+        creds=creds,
+        token=TOKEN,
+        )
+    # the input below is entirely irrelevant since it"s smaller than the batch size


did you cmd-F replace ' with "?

Suggested change

# the input below is entirely irrelevant since it"s smaller than the batch size

# the input below is entirely irrelevant since it's smaller than the batch size

liuverta · 2023-03-10T23:05:52Z

client/verta/tests/unit_tests/test_deployed_model.py

+    # the input below is entirely irrelevant since it"s smaller than the batch size
+    prediction_df = dm.batch_predict(pd.DataFrame({"hi": "bye"}, index=[1]), 10)
+    # Since no index was provided, we can"t guarantee the index type for assertions
+    pd.testing.assert_frame_equal(expected_df.reset_index(drop=True), prediction_df.reset_index(drop=True))


Hmmm I feel like resetting the index conceptually weakens the test (now we can't check index values if we ever wanted to).

assert_frame_equal() appears to have a check_index_type parameter. Would that have worked?

I have different tests with indexes, does that change your opinion?

I tried the check_index_type and it didn't do enough 😭

well, somewhere along the line, number indexes are converting to string indexes and that's sketchy. I should look into that.
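One plausible cause of the number-to-string index conversion, not confirmed anywhere in this thread: JSON object keys are always strings, so if the index labels travel as dict keys in the request or response body, integer labels come back as strings. A stdlib-only sketch with made-up data:

```python
import json

# Rows keyed by an integer index label, similar in shape to an
# index-oriented dict representation of a dataframe (illustrative data).
rows_by_index = {1: {"hi": "bye"}, 2: {"hi": "ciao"}}

# json.dumps converts the int keys to strings; json.loads cannot tell
# they were ever integers.
round_tripped = json.loads(json.dumps(rows_by_index))

original_keys = list(rows_by_index)
restored_keys = list(round_tripped)
```

If that is what happens here, it would also explain why check_index_type alone "didn't do enough": the index values themselves differ (1 versus "1"), not just the index dtype.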

client/verta/tests/unit_tests/test_deployed_model.py

liuverta · 2023-03-10T23:10:58Z

client/verta/tests/unit_tests/test_deployed_model.py

+    input_df = pd.DataFrame({"a": [1, 2, 3, 4, 5], "b": [11, 12, 13, 14, 15]})
+    prediction_df = dm.batch_predict(input_df, 1)
+    expected_final_df = pd.concat(expected_d_list)
+    # Since no index was provided, we can"t guarantee the index type for assertions


Suggested change

# Since no index was provided, we can"t guarantee the index type for assertions

# Since no index was provided, we can't guarantee the index type for assertions

can you tell I find-replaced all the single quotes? 😅

client/verta/tests/unit_tests/test_deployed_model.py

Co-authored-by: Liu <96442646+liuverta@users.noreply.github.com>

liuverta

🚀

liuverta · 2023-03-13T21:39:47Z

client/verta/verta/deployment/_deployedmodel.py

+            if not isinstance(body, bytes):
+                body = body.encode("utf-8")


We can omit this since we're firmly in Python 3 now: json.dumps() returns a str.
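A quick sketch of what json.dumps() returns in Python 3, for anyone checking this claim. The payload is made up; whether the HTTP layer then needs bytes or accepts a str is outside what this snippet shows.

```python
import json

# Illustrative payload with a non-ASCII character.
body = json.dumps({"a": [1, 2, 3], "name": "café"})

# In Python 3 the result is always text (str), never bytes.
is_text = isinstance(body, str)

# With the default ensure_ascii=True, non-ASCII characters are escaped
# (the é becomes \u00e9), so the text is pure ASCII.
is_ascii = body.isascii()  # str.isascii() requires Python 3.7+

# Encoding pure ASCII text as UTF-8 or ASCII yields identical bytes.
encoded = body.encode("utf-8")
```

So the isinstance(body, bytes) guard from the Python 2 era never sees bytes coming out of json.dumps() anymore.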

liuverta · 2023-03-13T21:47:21Z

client/verta/tests/unit_tests/test_deployed_model.py

+        token=TOKEN,
+        )
+    input_df = pd.DataFrame({"a": [1, 2, 3, 4, 5], "b": [11, 12, 13, 14, 15]})
+    prediction_df = dm.batch_predict(input_df, 1)


This test is a bit hard to read (and therefore probably hard to maintain/debug); it's unclear that the batch_size=1 here forces us to have five batches because input_df has five rows, and that it's related to expected_df_list having five DataFrames.

But I don't know how I'd write it any better 🤷
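One possible way to make that relationship visible, offered as a sketch rather than a rewrite of the actual test: name the row count and batch size, then derive the expected batches from them instead of hand-writing five entries. The constants and variable names below are hypothetical, and plain lists of dicts stand in for the DataFrames.

```python
import math

# Hypothetical named constants; the real test uses literal values.
NUM_ROWS = 5
BATCH_SIZE = 1

# Same values as the test's input_df, as plain row dicts.
rows = [{"a": i + 1, "b": i + 11} for i in range(NUM_ROWS)]

# The number of requests follows directly from the two constants.
expected_num_batches = math.ceil(NUM_ROWS / BATCH_SIZE)

# Derive the per-batch expectations with the same slicing rule the
# method under test is described as using.
expected_batches = [
    rows[start:start + BATCH_SIZE] for start in range(0, NUM_ROWS, BATCH_SIZE)
]
```

The trade-off is that the expectation now mirrors the implementation's slicing logic, so a hand-written list can still be the better check; the named constants alone already document why there are five batches.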

update url

8f0b66b

hmacdonald-verta commented Mar 6, 2023

client/verta/verta/deployment/_deployedmodel.py (outdated)

ewagner-verta reviewed Mar 7, 2023

client/verta/verta/deployment/_deployedmodel.py (outdated)

ewagner-verta reviewed Mar 7, 2023

client/verta/verta/deployment/_deployedmodel.py

ewagner-verta reviewed Mar 7, 2023

client/verta/verta/deployment/_deployedmodel.py (outdated)

hmacdonald-verta added 4 commits March 7, 2023 17:35

Getting better, just need to fill out the TODOs now

96f4c7a

Finished adding the todos

9a25cac

remove unused import

b27f0b4

Clean up!

c3ff09b

hmacdonald-verta marked this pull request as ready for review March 8, 2023 18:59

Add pandas as an optional requirement

bc102bb

liuverta requested changes Mar 8, 2023

hmacdonald-verta and others added 7 commits March 8, 2023 13:27

Update client/verta/verta/deployment/_deployedmodel.py

0be35d7

Co-authored-by: Liu <96442646+liuverta@users.noreply.github.com>

Make batch prediction url a property

12fd2b4

Update client/verta/verta/deployment/_deployedmodel.py

6e5b26f

Co-authored-by: Liu <96442646+liuverta@users.noreply.github.com>

Update client/verta/verta/deployment/_deployedmodel.py

596a5fa

Co-authored-by: Liu <96442646+liuverta@users.noreply.github.com>

Update client/verta/verta/deployment/_deployedmodel.py

93468a5

Co-authored-by: Liu <96442646+liuverta@users.noreply.github.com>

Update client/verta/verta/deployment/_deployedmodel.py

37a4f5d

Co-authored-by: Liu <96442646+liuverta@users.noreply.github.com>

Update client/verta/verta/deployment/_deployedmodel.py

83d114e

Co-authored-by: Liu <96442646+liuverta@users.noreply.github.com>

liuverta requested changes Mar 8, 2023

client/verta/verta/deployment/_deployedmodel.py (outdated)

hmacdonald-verta requested review from ewagner-verta and liuverta March 8, 2023 21:48

hmacdonald-verta added 6 commits March 8, 2023 14:56

Add one unit test!

a58e53a

handle indexes

a856e44

handle indexes correctly this time and fix tests so far

4d9f243

lots of fixes

93f4b8a

remove axis thing

6d45ee2

clean up more

e666ffa

liuverta requested changes Mar 10, 2023

client/verta/verta/deployment/_deployedmodel.py (outdated)

Fix doc string

6bed084

hmacdonald-verta added 2 commits March 10, 2023 14:30

add nan test

b59b61f

Finish tidying up

fa5abd8

hmacdonald-verta requested a review from liuverta March 10, 2023 22:43

liuverta requested changes Mar 10, 2023

client/verta/tests/unit_tests/test_deployed_model.py (outdated)

hmacdonald-verta and others added 8 commits March 10, 2023 16:46

handle nans by converting to json ourselves

6378145

Update client/verta/tests/unit_tests/test_deployed_model.py

eda87f4

Co-authored-by: Liu <96442646+liuverta@users.noreply.github.com>

Update client/verta/tests/unit_tests/test_deployed_model.py

8868050

Co-authored-by: Liu <96442646+liuverta@users.noreply.github.com>

cleanup

7bc14ba

more cleanup

cdd5d12

EVEN more fixes

c88594e

EVEN more fixes

bf823f6

Okay wow it works flawlessly

a2e9dcb

hmacdonald-verta requested a review from liuverta March 13, 2023 18:58

hmacdonald-verta added 2 commits March 13, 2023 12:00

fix quote

f271d79

Remove unnecessary comment

16ac5b5

liuverta approved these changes Mar 13, 2023

Remove unnecessary encoding

71e1715

hmacdonald-verta merged commit 2200a64 into main Mar 13, 2023

hmacdonald-verta deleted the hm/VRD-711_addBatchPredictToClient branch March 13, 2023 23:38
