Skip to content

Conversation

@krithika369
Copy link
Collaborator

@krithika369 krithika369 commented May 4, 2022

This MR adds similar changes to the UI and SDK as #192, for standard ensemblers, to allow the user to select one of the configured routes as a fallback route. The fallback route will be activated in the following scenarios:

  • Call to the experiment engine failed
  • Experiment engine returned an experiment and/or treatment that is not configured in the treatment <> route mapping on the standard ensembler

In a scenario where the call to the chosen route fails, the fallback route will not be activated and the error response will be returned.

Under the hood, the fallback route still relied on the "default route" configuration in the Turing API and router.

Screenshot 2022-05-03 at 9 19 00 AM

Modifications

Router

  • engines/router/missionctl/fiberapi/routing_strategy.go - Experiment engine failure should use fallback route (used to return an error response). Selected route failure should be returned as is (used to choose default route).

UI

  • ui/src/services/router/TuringRouter.js, ui/src/services/version/RouterVersion.js - Handle standard ensembler's fallback route property translation with the API
  • ui/src/router/components/form/components/ensembler_config/standard_ensembler/StandardEnsemblerFormGroup.js - Form changes for configuring fallback route
  • ui/src/router/components/configuration/components/EnsemblerConfigSection.js - View changes for fallback route
  • ui/src/router/components/form/components/ensembler_config/RouteSelectionPanel.js - Common component for configuring Nop final response route / standard ensembler fallback route
  • ui/src/router/components/configuration/components/route_config_section/RouteConfigSection.js - Common component for displaying Nop final response route / standard ensembler fallback route

SDK

  • sdk/turing/router/config/router_ensembler_config.py - Add wrapper data class EnsemblerStandardConfig to encapsulate the fallback response route info, along with other standard config expected by the API
  • sdk/turing/router/config/router_config.py - Configure Nop/Standard ensembler's final response / fallback properties via the ensembler property setter rather than on init (to handle update correctly).

Updated the docs and unit tests where appropriate.

@krithika369 krithika369 requested a review from a team May 4, 2022 01:54

## Standard Ensembler

For experiment engines configured to work with standard ensemblers (i.e., Standard Experiment Engines with experiment selection enabled), the router will return a response from from one of the routes based on the configured mapping between routes and experiment treatments. At run time, the treatment returned by the engine will be used to select the corresponding route’s response.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: from one of the routes (extra from)


For experiment engines configured to work with standard ensemblers (i.e., Standard Experiment Engines with experiment selection enabled), the router will return a response from from one of the routes based on the configured mapping between routes and experiment treatments. At run time, the treatment returned by the engine will be used to select the corresponding route’s response.

In addition, a fallback route may be configured whose results will be used at runtime when the call to the experiment engine fails or if there does not exist a route mapping for the treatment generated by the experiment engine.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: or if a route mapping for the treatment generated by the experiment engine does not exist

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My way sounds fancier. :P But I'll take your suggestion.

if expErr != nil {
log.WithContext(ctx).Errorf(expErr.Error())
return nil, fallbacks, createFiberError(expErr)
return nil, fallbacks, nil
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Considering this change, I don't think we're returning any errors anymore? So perhaps the 3rd response object could be removed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. This method implements an interface exposed by Fiber (here). We could certainly consider removing it but we have to examine the usage more closely before making that change (i.e., even if it has 0 benefits within the Turing router, should Fiber, as a standalone library, stop supporting it?). I think we will cover this when we get to the error handling improvements in Turing.

Copy link
Collaborator

@deadlycoconuts deadlycoconuts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me overall!

ensembler = NopRouterEnsemblerConfig(final_response_route_id=final_response_route_id)
assert ensembler.nop_config == nop_config
assert ensembler.to_open_api() == expected
assert ensembler.nop_config == nop_config
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for reordering this for consistency!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haha it's actually less for consistency. Now, the nop_config is only populated when the to_open_api() is run, so we have to check it after that. But I guess this behavior is consistent with the previous implementations (eg: PyfuncRouterEnsemblerConfig).

Earlier, I was also populating the nop_config property on the base class RouterEnsemblerConfig on init of the child class NopRouterEnsemblerConfig but figured it doesn't help much (i.e., you can keep changing the properties exposed by NopRouterEnsemblerConfig and it wont be reflected into the nop_config unless to_open_api is run anyway).

return None

@dataclass
class EnsemblerStandardConfig:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I'm not catching the motivation here, but what's the reason behind creating intermediary classes like EnsemblerStandardConfig and EnsemblerNopConfig?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy you asked this. It's a bit of a question I was having myself - whether these classes were really needed. I have come to the conclusion we can do away with them, with some refactoring.

Why they exist today is, when the router config is parsed out of the API response (such as Get Router), we initialise a RouterEnsemblerConfig object as opposed to the child class objects. With the base class object, you need to rely on the .nop_config or .standard_config to get all the properties, which is what these intermediate classes achieve (the autogenerated Turing classes do not have the final_response_route_id or fallback_response_route_id properties and these are initialised on the intermediate data classes).

Ideally, we can init the proper child classes using some creator factory in which case the child classes alone will suffice to hold these new properties as well as encompass the autogenerated Turing classes for the standard config, pyfunc config, etc. I haven't dug into this yet but will either include it in my next PR which closes the loop on this feature or add it to the backlog.

Does it make sense?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that makes perfect sense; the thing about the router config needing to be parsed out of the API response and having to handle the final_response_route_id or fallback_response_route_id properties completely slipped my mind.

Ideally, we can init the proper child classes using some creator factory in which case the child classes alone will suffice to hold these new properties as well as encompass the autogenerated Turing classes for the standard config, pyfunc config, etc.

Yeah this would be perfect since it'll remove the potentially confusing naming of all the ensembler config classes (that are as of present only distinguishable from each other by the order in which words are being stuck together 😅); I kinda left the SDK in an awkward spot whereby the users get to create ensembler configs using the child classes when building a router config from scratch, but are forced to only manage the parent RouterEnsemblerConfig object when the router config conversely parsed out of the API response. But yeah this approach will definitely take a longer time to design/implement so it's definitely out of the scope of this PR.

Thanks a lot for the current implementation and for the clarification!

);
};

export const RouteSelectionPanel = ({
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for refactoring this by moving this to a separate file!

@krithika369
Copy link
Collaborator Author

Thanks, all, for the review and the comments. I'll be merging this shortly.

@krithika369 krithika369 merged commit 0e9bf5a into caraml-dev:main May 5, 2022
krithika369 added a commit that referenced this pull request Jun 6, 2022
* Update workflows to use Python 3.7 for ensembler engines and sdk (#190)

* Update workflows to use Python 3.7 for ensembler engines and sdk

* Add none return option for config in Router SDK class

* Update text display settings to display entire image name

* Set container image name to overflow instead of being truncated

* Include response headers in logs (#191)

* Update proto and classes

* Add response headers and refactor naming of response bodies

* Refactor logging methods

* Fix tests

* Fix kafka tests

* Fix kafka tests

* Fix line breaks

* Fix line breaks

* Fix line breaks

* Fix typo in error message

* Refactor response fields

* Refactor response headers as map of strings

* Refactor tests to use map of strings to represent headers

* Fix non deterministic serialisation of hashmap in tests

* Refactor log handler

* Remove debug statement

* Refactor HTTP header formatting into a helper function

* Fix kafka protobuf test to use JSON as means of comparison

* Rename body in proto to response to avoid breaking changes

* Refactor all tests to use response instead of body to refer to request body

* Fix lint import suggestion

* Remove debug statements

* Rename variables in http header formatting helper function

* Fix BQ marshalling issues

* Fix lint import suggestion

* Fix lint comments

* Minor fixes for experiment engine configs in the helm chart (#193)

Co-authored-by: Krithika Sundararajan <krithika.sundararajan@go-jek.com>

* Nop Ensembler Config (#192)

* UI changes for nop ensembler config

* Make handling of router / version status consistent

* SDK changes for default route

* Correct the default route id in unit tests

* Add tests for the nop ensembler config

* Update sample code and doc

* Add PR comments

Co-authored-by: Krithika Sundararajan <krithika.sundararajan@go-jek.com>

* [BUG] Fix undeploy router error (#194)

* Fix bugs in sample SDK script

* Refactor UndeployRouterVersion to take in cleanup flag

* Update DeploymentService mock class

* Fix bug in deployment controller test

* Refactor if else blocks

* Add tests for IsKnativeServiceInNamespace and IsSecretinNamespace

* Rename test method to make it consistent with the method tested

* Refactor k8s deletion methods into separate methods

* Fix bug in deleting deployments and services

* Fix typo in router name

* Rename default route from nothing to control

* Make undeploying a pending router status a cleanup job

* Refactor code to use ignoreNotFound flag

* Fix go mod file

* Bugfix: Turing API should process experiment engine passkey only if client selection enabled (#196)

* Bugfix: Passkey should not be processed if client selection disabled

* Update hardcoded sample plugin to use experiment variables, consistent with the runner

* Update RPC plugin example and docs

* Correct numbering in doc and plugin name change

* Add debug message

* Update Deployment controller to consider if client selection enabled

* Add another unit test case for TestIsClientSelectionEnabled

Co-authored-by: Krithika Sundararajan <krithika.sundararajan@go-jek.com>

* Add Fallback Response Route Config for Standard Ensemblers (#197)

* UI changes for standard ensembler Fallback response

* Add validation for fallback resonse route id

* Update docs for the Standard Ensembler config

* Routing stragey changes for default route handling

* SDK changes for fallback response route id

* Amend user docs

Co-authored-by: Krithika Sundararajan <krithika.sundararajan@go-jek.com>

* Remove the default route configuration (#198)

* Remove unused default route property

* Router: Make the default_route_id only required for DefaultTuringRoutingStrategy

* Make Default Route ID optional for the Turing Router creater / update API

* Update e2e tests

* UI: Update router view / edit to stop handling default route explicitly

* UI: Exclude routes with traffic rules in the final/fallback response options

* SDK: deprecate the default_route_id config

* SDK: Remove default route id from samples

* Update user docs

* Update OpenAPI bundle

* Address PR comments

Co-authored-by: Krithika Sundararajan <krithika.sundararajan@go-jek.com>

* Adding SDK support for Python 3.8 and 3.9 (#199)

* Update SDK / engines to support multiple Python versions

* Pin cloudpickle at 2.0.0

* Introduce Python Version on the Pyfunc ensembler config

* Update SDK unit tests

* Update Github workflows

* Update chart values

* Update docs, unit tests

* Update SDK CI workflows to test on all Python versions

* Pin protobuf version at 3.20.1

* Address PR comments

Co-authored-by: Krithika Sundararajan <krithika.sundararajan@go-jek.com>

* Bugfix: Clear default route Id from custom ensembler configs (#200)

* Miscellaneous bug fixes

* Add unit tests for the SDK changes, address PR comments

* Bugfix: Regression in display of container configs for Pyfunc

* Add type annotation to class methods

Co-authored-by: Krithika Sundararajan <krithika.sundararajan@go-jek.com>

* Update chart version for app release (#201)

Co-authored-by: Krithika Sundararajan <krithika.sundararajan@go-jek.com>

* Add dynamic loading of Experiment Engine config (#202)

* Add dynamic loading of exp engine config

* Address PR comments

* Add useEffect rerender

* Address PR comments

* Simplify conditional logic

* Attempt to fix yarn install error

Co-authored-by: Ewe Zi Yi <36802364+deadlycoconuts@users.noreply.github.com>
Co-authored-by: Krithika Sundararajan <krithika.sundararajan@go-jek.com>
Co-authored-by: Terence Lim <terencelimxp@gmail.com>
krithika369 added a commit that referenced this pull request Jun 9, 2022
* Upgrade Go version/Knative/Spark Operator/Kubernetes Client/Docker Compose for dev environment. (#183)

* Bump versions for all k8s based libs

* Use proper context for each scenario.

* Upgrade virtual service version.

* Update k3d version to kubernetes 1.22.

* Upgrade Go to 1.16.

* Fix linting errors.

* Update k3d flags.

* Upgrade knative/istio/spark-operator versions in cluster init.

* Update default versions in cluster-init, change e2e test to new api

* Fail fast if default environment is wrong. Extra logging.

* Fix unit tests.

* Addressed PR comments.

* Pull requests to be run on any target branch.

* Upgrade to go 1.18, upgrade linter.

* Upgrade experiment and router to go 1.18.

* Update PR comments.

* Parameterise Go and Go Linter Versions.

* Update documentation with new versions and ports.

* Merge from main -> knative-upgrade branch (#203)

* Update workflows to use Python 3.7 for ensembler engines and sdk (#190)

* Update workflows to use Python 3.7 for ensembler engines and sdk

* Add none return option for config in Router SDK class

* Update text display settings to display entire image name

* Set container image name to overflow instead of being truncated

* Include response headers in logs (#191)

* Update proto and classes

* Add response headers and refactor naming of response bodies

* Refactor logging methods

* Fix tests

* Fix kafka tests

* Fix kafka tests

* Fix line breaks

* Fix line breaks

* Fix line breaks

* Fix typo in error message

* Refactor response fields

* Refactor response headers as map of strings

* Refactor tests to use map of strings to represent headers

* Fix non deterministic serialisation of hashmap in tests

* Refactor log handler

* Remove debug statement

* Refactor HTTP header formatting into a helper function

* Fix kafka protobuf test to use JSON as means of comparison

* Rename body in proto to response to avoid breaking changes

* Refactor all tests to use response instead of body to refer to request body

* Fix lint import suggestion

* Remove debug statements

* Rename variables in http header formatting helper function

* Fix BQ marshalling issues

* Fix lint import suggestion

* Fix lint comments

* Minor fixes for experiment engine configs in the helm chart (#193)

Co-authored-by: Krithika Sundararajan <krithika.sundararajan@go-jek.com>

* Nop Ensembler Config (#192)

* UI changes for nop ensembler config

* Make handling of router / version status consistent

* SDK changes for default route

* Correct the default route id in unit tests

* Add tests for the nop ensembler config

* Update sample code and doc

* Add PR comments

Co-authored-by: Krithika Sundararajan <krithika.sundararajan@go-jek.com>

* [BUG] Fix undeploy router error (#194)

* Fix bugs in sample SDK script

* Refactor UndeployRouterVersion to take in cleanup flag

* Update DeploymentService mock class

* Fix bug in deployment controller test

* Refactor if else blocks

* Add tests for IsKnativeServiceInNamespace and IsSecretinNamespace

* Rename test method to make it consistent with the method tested

* Refactor k8s deletion methods into separate methods

* Fix bug in deleting deployments and services

* Fix typo in router name

* Rename default route from nothing to control

* Make undeploying a pending router status a cleanup job

* Refactor code to use ignoreNotFound flag

* Fix go mod file

* Bugfix: Turing API should process experiment engine passkey only if client selection enabled (#196)

* Bugfix: Passkey should not be processed if client selection disabled

* Update hardcoded sample plugin to use experiment variables, consistent with the runner

* Update RPC plugin example and docs

* Correct numbering in doc and plugin name change

* Add debug message

* Update Deployment controller to consider if client selection enabled

* Add another unit test case for TestIsClientSelectionEnabled

Co-authored-by: Krithika Sundararajan <krithika.sundararajan@go-jek.com>

* Add Fallback Response Route Config for Standard Ensemblers (#197)

* UI changes for standard ensembler Fallback response

* Add validation for fallback resonse route id

* Update docs for the Standard Ensembler config

* Routing stragey changes for default route handling

* SDK changes for fallback response route id

* Amend user docs

Co-authored-by: Krithika Sundararajan <krithika.sundararajan@go-jek.com>

* Remove the default route configuration (#198)

* Remove unused default route property

* Router: Make the default_route_id only required for DefaultTuringRoutingStrategy

* Make Default Route ID optional for the Turing Router creater / update API

* Update e2e tests

* UI: Update router view / edit to stop handling default route explicitly

* UI: Exclude routes with traffic rules in the final/fallback response options

* SDK: deprecate the default_route_id config

* SDK: Remove default route id from samples

* Update user docs

* Update OpenAPI bundle

* Address PR comments

Co-authored-by: Krithika Sundararajan <krithika.sundararajan@go-jek.com>

* Adding SDK support for Python 3.8 and 3.9 (#199)

* Update SDK / engines to support multiple Python versions

* Pin cloudpickle at 2.0.0

* Introduce Python Version on the Pyfunc ensembler config

* Update SDK unit tests

* Update Github workflows

* Update chart values

* Update docs, unit tests

* Update SDK CI workflows to test on all Python versions

* Pin protobuf version at 3.20.1

* Address PR comments

Co-authored-by: Krithika Sundararajan <krithika.sundararajan@go-jek.com>

* Bugfix: Clear default route Id from custom ensembler configs (#200)

* Miscellaneous bug fixes

* Add unit tests for the SDK changes, address PR comments

* Bugfix: Regression in display of container configs for Pyfunc

* Add type annotation to class methods

Co-authored-by: Krithika Sundararajan <krithika.sundararajan@go-jek.com>

* Update chart version for app release (#201)

Co-authored-by: Krithika Sundararajan <krithika.sundararajan@go-jek.com>

* Add dynamic loading of Experiment Engine config (#202)

* Add dynamic loading of exp engine config

* Address PR comments

* Add useEffect rerender

* Address PR comments

* Simplify conditional logic

* Attempt to fix yarn install error

Co-authored-by: Ewe Zi Yi <36802364+deadlycoconuts@users.noreply.github.com>
Co-authored-by: Krithika Sundararajan <krithika.sundararajan@go-jek.com>
Co-authored-by: Terence Lim <terencelimxp@gmail.com>

* Update CI specs

* Revert UI changes during merge

* Update CI specs

* Update e2e deployment timeout

* Remove WIP inline comment

Co-authored-by: Ashwin <ashwinath@hotmail.com>
Co-authored-by: Ewe Zi Yi <36802364+deadlycoconuts@users.noreply.github.com>
Co-authored-by: Krithika Sundararajan <krithika.sundararajan@go-jek.com>
Co-authored-by: Terence Lim <terencelimxp@gmail.com>
@krithika369 krithika369 deleted the std_ensembler_fallback branch June 14, 2022 08:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants