Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Standard Ensembler with Route Name Path to SDK #248

Conversation

deadlycoconuts
Copy link
Contributor

@deadlycoconuts deadlycoconuts commented Sep 22, 2022

Context

With the changes introduced in #237 and #233 to support the usage of Standard Ensemblers with Custom Experiment Engines, the Turing API has been updated to handle different request/response payloads, most notably the addition of the route_name_path field in the standard ensembler configuration, thus necessitating a change to the Turing SDK to also handle these changes.

This PR introduces minor changes to the EnsemblerStandardConfig and StandardRouterEnsemblerConfig to support these changes as well as a new e2e test to mirror that of the newest API test introduced in #237.

Main Modifications

  • .github/workflows/turing.yaml - Removal of the paths-ignore value representing all sdk files in the repo
  • sdk/e2e/09_deploy_router_with_route_name_path_std_ensembler_test.py - Addition of an e2e test that corresponds to the latest API e2e test added in Apply new Standard Ensembler routing policy with Custom Experiment Engines  #237
  • sdk/turing/router/config/router_ensembler_config.py - Addition of the route_name_path field to EnsemblerStandardConfig and StandardRouterEnsemblerConfig

@deadlycoconuts deadlycoconuts self-assigned this Sep 22, 2022
@deadlycoconuts deadlycoconuts force-pushed the add_standard_ensembler_with_custom_exp_engine_to_sdk branch 2 times, most recently from 7647b93 to b943038 Compare September 22, 2022 09:40
@@ -27,7 +27,6 @@ on:
- "docs/**"
- "engines/pyfunc-ensembler-job/**"
- "engines/pyfunc-ensembler-service/**"
- "sdk/**"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the addition of SDK e2e tests, I believe it's probably useful to make the main CI pipeline run (e2e tests) whenever changes to the SDK are made.

Comment on lines 35 to 37
experiment_mappings: List[Dict[str, str]]
experiment_mappings: List[
turing.generated.models.EnsemblerStandardConfigExperimentMappings
]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replaced the type hint because it was always of the wrong type 👀

"""
self.experiment_mappings = experiment_mappings
self.route_name_path = route_name_path
self.experiment_mappings = experiment_mappings if experiment_mappings else []
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since route_name_path and experiment_mappings are optional fields, we'll need to handle such edge cases. when both are empty, we'll be letting the API return an error.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just configure the default [] value in the parameters?

experiment_mappings: List[Dict[str, str]] = [],

If None is configured, typically the typing used should be Optional, rather than a List in this case.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If None is configured, typically the typing used should be Optional, rather than a List in this case

Agree with this - Optional is more appropriate.

Copy link
Contributor Author

@deadlycoconuts deadlycoconuts Sep 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just configure the default [] value in the parameters?

I believe there's an issue with using mutable default arguments in Python, which can potentially lead to unexpected behaviour - this causes warnings to pop up on certain IDEs:
https://medium.com/geekculture/do-not-use-objects-as-default-arguments-in-python-1c940212db2e

Thanks for the reminder on Optional, I'll add that in to the type hint.

@deadlycoconuts deadlycoconuts requested a review from a team September 22, 2022 11:01
@deadlycoconuts deadlycoconuts marked this pull request as ready for review September 22, 2022 11:01
@deadlycoconuts deadlycoconuts force-pushed the add_standard_ensembler_with_custom_exp_engine_to_sdk branch 2 times, most recently from a425c09 to 1683035 Compare September 23, 2022 04:23
Copy link
Collaborator

@terryyylim terryyylim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @deadlycoconuts ! 🚀 Left some comments to be addressed.

):
"""
Method to create a new standard ensembler

:param experiment_mappings: configured mappings between routes and treatments
:param route_name_path: configured routh name path that points to the route name within a given treatment config
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: typo routh -> route

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow thanks for catching that!

):
"""
Method to create a new standard ensembler

:param experiment_mappings: configured mappings between routes and treatments
:param route_name_path: configured routh name path that points to the route name within a given treatment config
:param experiment_mappings: configured mappings between routes and treatments
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: formatting is misaligned

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for spotting that! Strange black didn't fix that 🤔

"""
self.experiment_mappings = experiment_mappings
self.route_name_path = route_name_path
self.experiment_mappings = experiment_mappings if experiment_mappings else []
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just configure the default [] value in the parameters?

experiment_mappings: List[Dict[str, str]] = [],

If None is configured, typically the typing used should be Optional, rather than a List in this case.

@@ -545,6 +582,9 @@ def to_open_api(self) -> OpenApiModel:
)
for experiment_mapping in self.experiment_mappings
],
route_name_path=self.route_name_path
if self.route_name_path is not None
else "",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe an empty string "" is required because the API validator requires this field to be set right? Also, in what cases will this value be None, asking this because the default value should already be "" based on how you initialise this class, so I think this is not required?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have the same question about why we need to manipulate the value here - can we just use None for experiment_mappings and route_name_path if they are not set? The API doesn't require them to be sent in, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe an empty string "" is required because the API validator requires this field to be set right?

can we just use None for experiment_mappings and route_name_path if they are not set? The API doesn't require them to be sent in, right?

Yeah the OpenAPI classes needs some value to be set and we can't just pass a None value directly. One way to go around this is to selectively pass non-None values into the kwargs dict:

    def to_open_api(self) -> OpenApiModel:
        kwargs = {}
        if self.experiment_mappings is not None:
            kwargs["experiment_mappings"] = self.experiment_mappings
        if self.route_name_path is not None:
            kwargs["route_name_path"] = self.route_name_path

        return turing.generated.models.EnsemblerStandardConfig(
            **kwargs,
        )

I've refactored the StandardRouterEnsemblerConfig class a little so we aren't manipulating the values of route_name_path in the to_open_api method but I've also changed the default values of experiment_mappings and route_name_path members to be None.

Also, in what cases will this (route_name_path) value be None[?]

In general, one of route_name_path or experiment_mappings will be None but not the other. This corresponds to the scenario when the user only wants the standard ensembler to be explicitly by the experiment mappings OR the route name path.

Note also that the SDK doesn't perform any validations to ensure one and only of these two fields are set, but the API will perform this and return an error if this property is violated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah the OpenAPI classes needs some value to be set and we can't just pass a None value directly.

I've managed to change this by setting the nullable field as True within the OpenAPI specs for both experiment_mappings and route_name_path.

@@ -0,0 +1,123 @@
import os
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this test necessary? I'm just thinking we don't include any E2E tests for all other ensembler types too. And, by this logic, shouldn't we also add an experiments_mapping ensembler test?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a good question. Probably doesn't make sense to have a test for the "route name path" specifically. But we don't have any other e2e test for the standard ensembler at the moment. So I think it's useful to add this. Likewise, we have a test case internally using experiment_mappings, with the proprietary experiment engine.

Copy link
Contributor Author

@deadlycoconuts deadlycoconuts Sep 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And, by this logic, shouldn't we also add an experiments_mapping ensembler test?

But we don't have any other e2e test for the standard ensembler at the moment.

We do!! There's an e2e test that uses the standard ensembler with the proprietary experiment engine (it's found in the blandly named 01_create_router_test file). We can perhaps rename that 01 test with a more explicit name to make it appear more explicitly as a test using the standard ensembler with experiment mappings if that helps 🤔

Is this test necessary?

Well, it is an SDK mirror of the 09_deploy_router_with_route_name_path_std_ensembler_test that we introduced for the API and so far we've decided to maintain both sets of e2e tests.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do!! There's an e2e test that uses the standard ensembler with the proprietary experiment engine

It's using the docker ensembler, no?

https://github.com/caraml-dev/turing/blob/main/api/e2e/test/testdata/create_router_nop_logger_proprietary_exp.json.tmpl#L77

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh wait, oops I got ensemblers confused with the proprietary experiment engine 😅

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But it wouldn't take too much effort to set up an e2e test with the proprietary experiment engine and a standard ensembler 😅

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the e2e tests are not meant to test every combination of values (that's expected to be done by the unit/integration tests). So, IMO renaming this test to 09_deploy_router_with_std_ensembler_test, to have it represent std ensembler in general, may suffice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay I've just renamed the test :)

.github/workflows/turing.yaml Show resolved Hide resolved
"""
self.experiment_mappings = experiment_mappings
self.route_name_path = route_name_path
self.experiment_mappings = experiment_mappings if experiment_mappings else []
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If None is configured, typically the typing used should be Optional, rather than a List in this case

Agree with this - Optional is more appropriate.

@@ -545,6 +582,9 @@ def to_open_api(self) -> OpenApiModel:
)
for experiment_mapping in self.experiment_mappings
],
route_name_path=self.route_name_path
if self.route_name_path is not None
else "",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have the same question about why we need to manipulate the value here - can we just use None for experiment_mappings and route_name_path if they are not set? The API doesn't require them to be sent in, right?

self.experiment_mappings = experiment_mappings
self.route_name_path = route_name_path
self.experiment_mappings = (
experiment_mappings # sif experiment_mappings else None
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just self.experiment_mappings = experiment_mappings? And, I think the comment is self-explanatory.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahah thanks for catching the comment that's not meant to be a comment 😅

Comment on lines 2350 to +2356
experiment_mappings:
items:
$ref: '#/components/schemas/EnsemblerStandardConfig_experiment_mappings'
nullable: true
type: array
route_name_path:
nullable: true
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two nullable conditions have to be added to force the autogenerated openapi classes to accept None values for both experiment_mappings and route_name_path.

@deadlycoconuts deadlycoconuts force-pushed the add_standard_ensembler_with_custom_exp_engine_to_sdk branch from 85a8ced to 2e6e523 Compare September 26, 2022 08:08
@deadlycoconuts deadlycoconuts force-pushed the add_standard_ensembler_with_custom_exp_engine_to_sdk branch from 2e6e523 to b0df608 Compare September 27, 2022 01:51
@deadlycoconuts
Copy link
Contributor Author

Thanks everyone for the comments! I'll be merging this now! :)

@deadlycoconuts deadlycoconuts merged commit ba721c3 into caraml-dev:main Sep 27, 2022
@deadlycoconuts deadlycoconuts deleted the add_standard_ensembler_with_custom_exp_engine_to_sdk branch September 27, 2022 09:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants