Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: [VRD-711] Add batch prediction method to client #3645

Merged
merged 33 commits into from Mar 13, 2023
Merged
Show file tree
Hide file tree
Changes from 22 commits
Commits
Show all changes
33 commits
Select commit Hold shift + click to select a range
8f0b66b
update url
hmacdonald-verta Mar 6, 2023
96f4c7a
Getting better, just need to fill out the TODOs now
hmacdonald-verta Mar 8, 2023
9a25cac
Finished adding the todos
hmacdonald-verta Mar 8, 2023
b27f0b4
remove unused import
hmacdonald-verta Mar 8, 2023
c3ff09b
Clean up!
hmacdonald-verta Mar 8, 2023
bc102bb
Add pandas as an optional requirement
hmacdonald-verta Mar 8, 2023
0be35d7
Update client/verta/verta/deployment/_deployedmodel.py
hmacdonald-verta Mar 8, 2023
12fd2b4
Make batch prediction url a property
hmacdonald-verta Mar 8, 2023
6e5b26f
Update client/verta/verta/deployment/_deployedmodel.py
hmacdonald-verta Mar 8, 2023
596a5fa
Update client/verta/verta/deployment/_deployedmodel.py
hmacdonald-verta Mar 8, 2023
93468a5
Update client/verta/verta/deployment/_deployedmodel.py
hmacdonald-verta Mar 8, 2023
37a4f5d
Update client/verta/verta/deployment/_deployedmodel.py
hmacdonald-verta Mar 8, 2023
83d114e
Update client/verta/verta/deployment/_deployedmodel.py
hmacdonald-verta Mar 8, 2023
a58e53a
Add one unit test!
hmacdonald-verta Mar 8, 2023
a856e44
handle indexes
hmacdonald-verta Mar 9, 2023
4d9f243
handle indexes correctly this time and fix tests so far
hmacdonald-verta Mar 9, 2023
93f4b8a
lots of fixes
hmacdonald-verta Mar 10, 2023
6d45ee2
remove axis thing
hmacdonald-verta Mar 10, 2023
e666ffa
clean up more
hmacdonald-verta Mar 10, 2023
6bed084
Fix doc string
hmacdonald-verta Mar 10, 2023
b59b61f
add nan test
hmacdonald-verta Mar 10, 2023
fa5abd8
Finish tidying up
hmacdonald-verta Mar 10, 2023
6378145
handle nans by converting to json ourselves
hmacdonald-verta Mar 11, 2023
eda87f4
Update client/verta/tests/unit_tests/test_deployed_model.py
hmacdonald-verta Mar 11, 2023
8868050
Update client/verta/tests/unit_tests/test_deployed_model.py
hmacdonald-verta Mar 11, 2023
7bc14ba
cleanup
hmacdonald-verta Mar 11, 2023
cdd5d12
more cleanup
hmacdonald-verta Mar 11, 2023
c88594e
EVEN more fixes
hmacdonald-verta Mar 11, 2023
bf823f6
EVEN more fixes
hmacdonald-verta Mar 11, 2023
a2e9dcb
Okay wow it works flawlessly
hmacdonald-verta Mar 13, 2023
f271d79
fix quote
hmacdonald-verta Mar 13, 2023
16ac5b5
Remove unnecessary comment
hmacdonald-verta Mar 13, 2023
71e1715
Remove unnecessary encoding
hmacdonald-verta Mar 13, 2023
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
134 changes: 134 additions & 0 deletions client/verta/tests/unit_tests/test_deployed_model.py
Expand Up @@ -3,6 +3,8 @@
import os
from typing import Any, Dict

import numpy as np
import pandas as pd
hmacdonald-verta marked this conversation as resolved.
Show resolved Hide resolved
import pytest
from requests import Session, HTTPError
from requests.exceptions import RetryError
Expand All @@ -15,6 +17,7 @@
from verta._internal_utils import http_session

PREDICTION_URL: str = 'https://test.dev.verta.ai/api/v1/predict/test_path'
BATCH_PREDICTION_URL: str = 'https://test.dev.verta.ai/api/v1/batch-predict/test_path'
TOKEN: str = '12345678-xxxx-1a2b-3c4d-e5f6g7h8'
MOCK_RETRY: Retry = http_session.retry_config(
max_retries=http_session.DEFAULT_MAX_RETRIES,
Expand Down Expand Up @@ -379,3 +382,134 @@ def test_predict_400_error_message_missing(mocked_responses) -> None:
'400 Client Error: Bad Request for url: '
'https://test.dev.verta.ai/api/v1/predict/test_path at '
)


def test_batch_predict_with_one_batch_with_no_output_index(mocked_responses) -> None:
""" Call batch_predict with a single batch. """
expected_df = pd.DataFrame({"A": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], "B": [11, 12, 13, 14, 15, 16, 17, 18, 19, 20]})
expected_df_json = expected_df.to_dict(orient="index")
mocked_responses.post(
BATCH_PREDICTION_URL,
json=expected_df_json,
status=200,
headers={"verta-request-id": "hereISaTESTidFROMtheUSER"},
)
creds = EmailCredentials.load_from_os_env()
dm = DeployedModel(
prediction_url=PREDICTION_URL,
creds=creds,
token=TOKEN,
)
# the input below is entirely irrelevant since it"s smaller than the batch size
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did you cmd-F replace ' with "

Suggested change
# the input below is entirely irrelevant since it"s smaller than the batch size
# the input below is entirely irrelevant since it's smaller than the batch size

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ya

prediction_df = dm.batch_predict(pd.DataFrame({"hi": "bye"}, index=[1]), 10)
# Since no index was provided, we can"t guarantee the index type for assertions
pd.testing.assert_frame_equal(expected_df.reset_index(drop=True), prediction_df.reset_index(drop=True))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm I feel like resetting the index conceptually weakens the test (now we can't check index values if we ever wanted to).

assert_frame_equal() appears to have a check_index_type parameter. Would that have worked?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a different tests with indexes, does that change your opinion?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried the check_index_type and it didn't do enough 😭

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well, somewhere along the line, number indexes are converting to string indexes and that's sketchy. I should look into that.



def test_batch_predict_with_one_batch_with_output_index(mocked_responses) -> None:
""" Call batch_predict with a single batch, where the output has an index. """
expected_df = pd.DataFrame({"A": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], "B": [11, 12, 13, 14, 15, 16, 17, 18, 19, 20]}, index=["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"])
expected_df_json = expected_df.to_dict(orient="index")
mocked_responses.post(
BATCH_PREDICTION_URL,
json=expected_df_json,
status=200,
headers={"verta-request-id": "hereISaTESTidFROMtheUSER"},
)
creds = EmailCredentials.load_from_os_env()
dm = DeployedModel(
prediction_url=PREDICTION_URL,
creds=creds,
token=TOKEN,
)
# the input below is entirely irrelevant since it"s smaller than the batch size
prediction_df = dm.batch_predict(pd.DataFrame({"hi": "bye"}, index=[1]), 10)
# Since an index WAS provided, we should be able to assert with indexes included
pd.testing.assert_frame_equal(expected_df, prediction_df)


def test_batch_predict_with_five_batches_of_one_with_no_indexes(mocked_responses) -> None:
""" Call batch_predict with five batches. """
hmacdonald-verta marked this conversation as resolved.
Show resolved Hide resolved
expected_d_list = [pd.DataFrame({"A": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]}),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume it should be df because the tests above use that in variable names

Suggested change
expected_d_list = [pd.DataFrame({"A": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]}),
expected_df_list = [pd.DataFrame({"A": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]}),

pd.DataFrame({"B": [11, 12, 13, 14, 15, 16, 17, 18, 19, 20]}),
pd.DataFrame({"C": [21, 22, 23, 24, 25, 26, 27, 28, 29, 30]}),
pd.DataFrame({"D": [31, 32, 33, 34, 35, 36, 37, 38, 39, 40]}),
pd.DataFrame({"E": [41, 42, 43, 44, 45, 46, 47, 48, 49, 50]}),
]
for expected_d in expected_d_list:
hmacdonald-verta marked this conversation as resolved.
Show resolved Hide resolved
mocked_responses.add(
responses.POST,
BATCH_PREDICTION_URL,
json=expected_d.to_dict(orient="index"),
status=200,
headers={"verta-request-id": "hereISaTESTidFROMtheUSER"},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should remove the header from these tests because it's unrelated haha (unless we do need it??)

)
creds = EmailCredentials.load_from_os_env()
dm = DeployedModel(
prediction_url=PREDICTION_URL,
creds=creds,
token=TOKEN,
)
input_df = pd.DataFrame({"a": [1, 2, 3, 4, 5], "b": [11, 12, 13, 14, 15]})
prediction_df = dm.batch_predict(input_df, 1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is a bit hard to read (and therefore probably hard to main/debug); it's unclear that the batch_size=1 here forces us to have five batches because input_df has five rows, and that it's related to expected_df_list having five DataFrames.

But I don't know how I'd write it any better 🤷

expected_final_df = pd.concat(expected_d_list)
# Since no index was provided, we can"t guarantee the index type for assertions
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# Since no index was provided, we can"t guarantee the index type for assertions
# Since no index was provided, we can't guarantee the index type for assertions

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you tell I find-replaced all the single quotes? 😅

pd.testing.assert_frame_equal(expected_final_df.reset_index(drop=True), prediction_df.reset_index(drop=True))
hmacdonald-verta marked this conversation as resolved.
Show resolved Hide resolved


def test_batch_predict_with_five_batches_of_one_with_indexes(mocked_responses) -> None:
""" CCall batch_predict with five batches, where each dataframe has an explicitly defined index. """
hmacdonald-verta marked this conversation as resolved.
Show resolved Hide resolved
expected_d_list = [pd.DataFrame({"A": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]}, index=["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"]),
hmacdonald-verta marked this conversation as resolved.
Show resolved Hide resolved
pd.DataFrame({"B": [11, 12, 13, 14, 15, 16, 17, 18, 19, 20]}, index=["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"]),
pd.DataFrame({"C": [21, 22, 23, 24, 25, 26, 27, 28, 29, 30]}, index=["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"]),
pd.DataFrame({"D": [31, 32, 33, 34, 35, 36, 37, 38, 39, 40]}, index=["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"]),
pd.DataFrame({"E": [41, 42, 43, 44, 45, 46, 47, 48, 49, 50]}, index=["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"]),
]
for expected_d in expected_d_list:
mocked_responses.add(
responses.POST,
BATCH_PREDICTION_URL,
json=expected_d.to_dict(orient="index"),
status=200,
headers={"verta-request-id": "hereISaTESTidFROMtheUSER"},
)
creds = EmailCredentials.load_from_os_env()
dm = DeployedModel(
prediction_url=PREDICTION_URL,
creds=creds,
token=TOKEN,
)
input_df = pd.DataFrame({"a": [1, 2, 3, 4, 5], "b": [11, 12, 13, 14, 15]}, index=["A", "B", "C", "D", "E"])
prediction_df = dm.batch_predict(input_df, 1)
expected_final_df = pd.concat(expected_d_list)
pd.testing.assert_frame_equal(expected_final_df, prediction_df)




def test_batch_predict_with_five_batches_with_nans(mocked_responses) -> None:
""" CCall batch_predict with five batches, where each dataframe has an explicitly defined index. """
expected_d_list = [pd.DataFrame({"A": [1, 2, 3, 4, 5, np.nan, 7, 8, 9, 10]}, index=["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"]),
pd.DataFrame({"B": [11, 12, 13, 14, 15, 16, 17, 18, 19, 20]}, index=["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"]),
pd.DataFrame({"C": [21, 22, np.nan, 24, 25, 26, 27, 28, 29, 30]}, index=["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"]),
pd.DataFrame({"D": [31, 32, 33, 34, 35, 36, 37, np.nan, 39, 40]}, index=["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"]),
pd.DataFrame({"E": [41, 42, 43, 44, np.nan, 46, 47, 48, 49, 50]}, index=["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"]),
]
for expected_d in expected_d_list:
mocked_responses.add(
responses.POST,
BATCH_PREDICTION_URL,
json=expected_d.to_dict(orient="index"),
status=200,
headers={"verta-request-id": "hereISaTESTidFROMtheUSER"},
)
creds = EmailCredentials.load_from_os_env()
dm = DeployedModel(
prediction_url=PREDICTION_URL,
creds=creds,
token=TOKEN,
)
input_df = pd.DataFrame({"a": [1, 2, np.nan, 4, 5], "b": [11, np.nan, 13, 14, 15]}, index=["A", "B", "C", "D", "E"])
prediction_df = dm.batch_predict(input_df, 1)
expected_final_df = pd.concat(expected_d_list)
pd.testing.assert_frame_equal(expected_final_df, prediction_df)