Skip to content

Conversation

@krithika369
Copy link
Collaborator

@krithika369 krithika369 commented May 25, 2022

The Turing Python SDK currently only supports Python version 3.7. This has been enforced because of cross-version compatibility issues with pickling.

Background

The Pyfunc ensemblers created using the SDK are saved as pickled artifacts and loaded into the Python server, when deployed as a batch job or a real-time webservice. In some cases (such as, when the Python objects are defined in the main scope; see cloudpipe/cloudpickle#404), the artifacts are unable to be loaded into a different Python version than the version they were created in. For example, an ensembler created using Python 3.8 / 3.9 cannot be served using a webserver built with Python 3.7. To workaround this limitation, both the SDK and the Pyfunc server base images were pinned at Python 3.7. This PR adds support for newer Python versions.

Solution

When the ensembler is submitted using the SDK, the user's minor python version is captured and saved to the pyfunc_ensemblers table. Base images for the batch ensembling jobs and real-time ensemblers will now be created for each supported Python version. At the time of ensembler deployment, the matching base image is chosen.

Alternatively, the base image could have been loaded with different Python versions and the right one activated at the time of ensembler image building and serving. But this would make the base image very bulky (adds another GB to the current size of 1.6 GB).

Changes

API

  • api/api/specs/ensemblers.yaml - Add python_version as an attribute of the PyFuncEnsembler
  • api/db-migrations/000011_add_pyfunc_ensemblers_python_version.up.sql - Add python_version to the DB table, defaulting it to 3.7.* for existing records.
  • api/turing/config/config.go - BaseImageRef is changed from a string to a map, to hold the image name per minor Python version
  • api/turing/batch/ensembling/runner.go, api/turing/service/router_deployment_service.go - Set the Python version as the base image tag when calling the image builder
  • api/turing/imagebuilder/imagebuilder.go - Pick the base image using the supplied tag

Engines

  • engines/**/Dockerfile - Use Python-version specific conda environment file
  • engines/**/Makefile - Add PYTHON_VERSION variable
  • engines/**/env-3.*.yaml - Split the existing environment.yaml into one per supported version
  • engines/**/requirements.txt - Upgraded some of the requirements to versions supported by all 3 supported Python versions, upgraded cloudpickle to 2.0.0. Pinning protobuf to 3.20.1 so the mlflow library can work on Python >=3.8.

SDK

  • sdk/setup.py - Update Python versions required
  • sdk/turing/ensembler.py - During create / update of Pyfunc ensembler, capture the user's Python version in the API call. The version is also added to the conda_env before logging with mlflow.

Other Changes

  • Update Github CI workflows to test the SDK / Pyfunc servers and publish the Pyfunc servers for each supported Python minor version
  • Update default Helm values to match the new Imagebuilder configs

CC: Merlin Dev (@ariefrahmansyah @tiopramayudi @pradithya)

@krithika369 krithika369 marked this pull request as draft May 25, 2022 00:47
batchEnsemblingJobRunner := batchensembling.NewBatchEnsemblingJobRunner(
batchEnsemblingController,
ensemblingJobService,
ensemblersService,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The batch job runner now needs the EnsemblersService, to query the python version information at the time of image building. See: api/turing/batch/ensembling/runner.go

jinja2==3.0.3 # temporary fix to downgrade to Python 3.7
jinja2==3.0.3
mlflow==1.14.1
numpy==1.17.3
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

numpy 1.17.3 is not available for Python 3.9 (See: https://anaconda.org/anaconda/numpy/files?version=1.17.3)

cloudpickle==1.2.2
orjson==2.6.8
cloudpickle==2.0.0
orjson==3.6.8
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

orjson 2.6.8 is not available for Python 3.9

DestinationRegistry: ghcr.io
BaseImageRef: ghcr.io/gojek/turing/pyfunc-ensembler-service:latest
BaseImageRef:
3.7.*: ghcr.io/gojek/turing/pyfunc-ensembler-service:latest
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updating the base values to match the new format. Will capture the mapping for the new python versions and the corresponding public artifacts in chart version 0.2.9.

@krithika369 krithika369 changed the title Draft: Adding SDK support for Python 3.8 and 3.9 Adding SDK support for Python 3.8 and 3.9 May 26, 2022
@krithika369 krithika369 marked this pull request as ready for review May 26, 2022 00:49
@krithika369 krithika369 requested a review from a team May 26, 2022 01:44
@krithika369 krithika369 self-assigned this May 26, 2022
Copy link
Collaborator

@terryyylim terryyylim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thank you @krithika369, just a few nitpicks! 🚀

@krithika369 krithika369 merged commit 5f146f2 into caraml-dev:main May 28, 2022
@krithika369 krithika369 deleted the python_sdk_versions branch May 28, 2022 08:17
krithika369 added a commit that referenced this pull request Jun 6, 2022
* Update workflows to use Python 3.7 for ensembler engines and sdk (#190)

* Update workflows to use Python 3.7 for ensembler engines and sdk

* Add none return option for config in Router SDK class

* Update text display settings to display entire image name

* Set container image name to overflow instead of being truncated

* Include response headers in logs (#191)

* Update proto and classes

* Add response headers and refactor naming of response bodies

* Refactor logging methods

* Fix tests

* Fix kafka tests

* Fix kafka tests

* Fix line breaks

* Fix line breaks

* Fix line breaks

* Fix typo in error message

* Refactor response fields

* Refactor response headers as map of strings

* Refactor tests to use map of strings to represent headers

* Fix non deterministic serialisation of hashmap in tests

* Refactor log handler

* Remove debug statement

* Refactor HTTP header formatting into a helper function

* Fix kafka protobuf test to use JSON as means of comparison

* Rename body in proto to response to avoid breaking changes

* Refactor all tests to use response instead of body to refer to request body

* Fix lint import suggestion

* Remove debug statements

* Rename variables in http header formatting helper function

* Fix BQ marshalling issues

* Fix lint import suggestion

* Fix lint comments

* Minor fixes for experiment engine configs in the helm chart (#193)

Co-authored-by: Krithika Sundararajan <krithika.sundararajan@go-jek.com>

* Nop Ensembler Config (#192)

* UI changes for nop ensembler config

* Make handling of router / version status consistent

* SDK changes for default route

* Correct the default route id in unit tests

* Add tests for the nop ensembler config

* Update sample code and doc

* Add PR comments

Co-authored-by: Krithika Sundararajan <krithika.sundararajan@go-jek.com>

* [BUG] Fix undeploy router error (#194)

* Fix bugs in sample SDK script

* Refactor UndeployRouterVersion to take in cleanup flag

* Update DeploymentService mock class

* Fix bug in deployment controller test

* Refactor if else blocks

* Add tests for IsKnativeServiceInNamespace and IsSecretinNamespace

* Rename test method to make it consistent with the method tested

* Refactor k8s deletion methods into separate methods

* Fix bug in deleting deployments and services

* Fix typo in router name

* Rename default route from nothing to control

* Make undeploying a pending router status a cleanup job

* Refactor code to use ignoreNotFound flag

* Fix go mod file

* Bugfix: Turing API should process experiment engine passkey only if client selection enabled (#196)

* Bugfix: Passkey should not be processed if client selection disabled

* Update hardcoded sample plugin to use experiment variables, consistent with the runner

* Update RPC plugin example and docs

* Correct numbering in doc and plugin name change

* Add debug message

* Update Deployment controller to consider if client selection enabled

* Add another unit test case for TestIsClientSelectionEnabled

Co-authored-by: Krithika Sundararajan <krithika.sundararajan@go-jek.com>

* Add Fallback Response Route Config for Standard Ensemblers (#197)

* UI changes for standard ensembler Fallback response

* Add validation for fallback resonse route id

* Update docs for the Standard Ensembler config

* Routing stragey changes for default route handling

* SDK changes for fallback response route id

* Amend user docs

Co-authored-by: Krithika Sundararajan <krithika.sundararajan@go-jek.com>

* Remove the default route configuration (#198)

* Remove unused default route property

* Router: Make the default_route_id only required for DefaultTuringRoutingStrategy

* Make Default Route ID optional for the Turing Router creater / update API

* Update e2e tests

* UI: Update router view / edit to stop handling default route explicitly

* UI: Exclude routes with traffic rules in the final/fallback response options

* SDK: deprecate the default_route_id config

* SDK: Remove default route id from samples

* Update user docs

* Update OpenAPI bundle

* Address PR comments

Co-authored-by: Krithika Sundararajan <krithika.sundararajan@go-jek.com>

* Adding SDK support for Python 3.8 and 3.9 (#199)

* Update SDK / engines to support multiple Python versions

* Pin cloudpickle at 2.0.0

* Introduce Python Version on the Pyfunc ensembler config

* Update SDK unit tests

* Update Github workflows

* Update chart values

* Update docs, unit tests

* Update SDK CI workflows to test on all Python versions

* Pin protobuf version at 3.20.1

* Address PR comments

Co-authored-by: Krithika Sundararajan <krithika.sundararajan@go-jek.com>

* Bugfix: Clear default route Id from custom ensembler configs (#200)

* Miscellaneous bug fixes

* Add unit tests for the SDK changes, address PR comments

* Bugfix: Regression in display of container configs for Pyfunc

* Add type annotation to class methods

Co-authored-by: Krithika Sundararajan <krithika.sundararajan@go-jek.com>

* Update chart version for app release (#201)

Co-authored-by: Krithika Sundararajan <krithika.sundararajan@go-jek.com>

* Add dynamic loading of Experiment Engine config (#202)

* Add dynamic loading of exp engine config

* Address PR comments

* Add useEffect rerender

* Address PR comments

* Simplify conditional logic

* Attempt to fix yarn install error

Co-authored-by: Ewe Zi Yi <36802364+deadlycoconuts@users.noreply.github.com>
Co-authored-by: Krithika Sundararajan <krithika.sundararajan@go-jek.com>
Co-authored-by: Terence Lim <terencelimxp@gmail.com>
krithika369 added a commit that referenced this pull request Jun 9, 2022
* Upgrade Go version/Knative/Spark Operator/Kubernetes Client/Docker Compose for dev environment. (#183)

* Bump versions for all k8s based libs

* Use proper context for each scenario.

* Upgrade virtual service version.

* Update k3d version to kubernetes 1.22.

* Upgrade Go to 1.16.

* Fix linting errors.

* Update k3d flags.

* Upgrade knative/istio/spark-operator versions in cluster init.

* Update default versions in cluster-init, change e2e test to new api

* Fail fast if default environment is wrong. Extra logging.

* Fix unit tests.

* Addressed PR comments.

* Pull requests to be run on any target branch.

* Upgrade to go 1.18, upgrade linter.

* Upgrade experiment and router to go 1.18.

* Update PR comments.

* Parameterise Go and Go Linter Versions.

* Update documentation with new versions and ports.

* Merge from main -> knative-upgrade branch (#203)

* Update workflows to use Python 3.7 for ensembler engines and sdk (#190)

* Update workflows to use Python 3.7 for ensembler engines and sdk

* Add none return option for config in Router SDK class

* Update text display settings to display entire image name

* Set container image name to overflow instead of being truncated

* Include response headers in logs (#191)

* Update proto and classes

* Add response headers and refactor naming of response bodies

* Refactor logging methods

* Fix tests

* Fix kafka tests

* Fix kafka tests

* Fix line breaks

* Fix line breaks

* Fix line breaks

* Fix typo in error message

* Refactor response fields

* Refactor response headers as map of strings

* Refactor tests to use map of strings to represent headers

* Fix non deterministic serialisation of hashmap in tests

* Refactor log handler

* Remove debug statement

* Refactor HTTP header formatting into a helper function

* Fix kafka protobuf test to use JSON as means of comparison

* Rename body in proto to response to avoid breaking changes

* Refactor all tests to use response instead of body to refer to request body

* Fix lint import suggestion

* Remove debug statements

* Rename variables in http header formatting helper function

* Fix BQ marshalling issues

* Fix lint import suggestion

* Fix lint comments

* Minor fixes for experiment engine configs in the helm chart (#193)

Co-authored-by: Krithika Sundararajan <krithika.sundararajan@go-jek.com>

* Nop Ensembler Config (#192)

* UI changes for nop ensembler config

* Make handling of router / version status consistent

* SDK changes for default route

* Correct the default route id in unit tests

* Add tests for the nop ensembler config

* Update sample code and doc

* Add PR comments

Co-authored-by: Krithika Sundararajan <krithika.sundararajan@go-jek.com>

* [BUG] Fix undeploy router error (#194)

* Fix bugs in sample SDK script

* Refactor UndeployRouterVersion to take in cleanup flag

* Update DeploymentService mock class

* Fix bug in deployment controller test

* Refactor if else blocks

* Add tests for IsKnativeServiceInNamespace and IsSecretinNamespace

* Rename test method to make it consistent with the method tested

* Refactor k8s deletion methods into separate methods

* Fix bug in deleting deployments and services

* Fix typo in router name

* Rename default route from nothing to control

* Make undeploying a pending router status a cleanup job

* Refactor code to use ignoreNotFound flag

* Fix go mod file

* Bugfix: Turing API should process experiment engine passkey only if client selection enabled (#196)

* Bugfix: Passkey should not be processed if client selection disabled

* Update hardcoded sample plugin to use experiment variables, consistent with the runner

* Update RPC plugin example and docs

* Correct numbering in doc and plugin name change

* Add debug message

* Update Deployment controller to consider if client selection enabled

* Add another unit test case for TestIsClientSelectionEnabled

Co-authored-by: Krithika Sundararajan <krithika.sundararajan@go-jek.com>

* Add Fallback Response Route Config for Standard Ensemblers (#197)

* UI changes for standard ensembler Fallback response

* Add validation for fallback resonse route id

* Update docs for the Standard Ensembler config

* Routing stragey changes for default route handling

* SDK changes for fallback response route id

* Amend user docs

Co-authored-by: Krithika Sundararajan <krithika.sundararajan@go-jek.com>

* Remove the default route configuration (#198)

* Remove unused default route property

* Router: Make the default_route_id only required for DefaultTuringRoutingStrategy

* Make Default Route ID optional for the Turing Router creater / update API

* Update e2e tests

* UI: Update router view / edit to stop handling default route explicitly

* UI: Exclude routes with traffic rules in the final/fallback response options

* SDK: deprecate the default_route_id config

* SDK: Remove default route id from samples

* Update user docs

* Update OpenAPI bundle

* Address PR comments

Co-authored-by: Krithika Sundararajan <krithika.sundararajan@go-jek.com>

* Adding SDK support for Python 3.8 and 3.9 (#199)

* Update SDK / engines to support multiple Python versions

* Pin cloudpickle at 2.0.0

* Introduce Python Version on the Pyfunc ensembler config

* Update SDK unit tests

* Update Github workflows

* Update chart values

* Update docs, unit tests

* Update SDK CI workflows to test on all Python versions

* Pin protobuf version at 3.20.1

* Address PR comments

Co-authored-by: Krithika Sundararajan <krithika.sundararajan@go-jek.com>

* Bugfix: Clear default route Id from custom ensembler configs (#200)

* Miscellaneous bug fixes

* Add unit tests for the SDK changes, address PR comments

* Bugfix: Regression in display of container configs for Pyfunc

* Add type annotation to class methods

Co-authored-by: Krithika Sundararajan <krithika.sundararajan@go-jek.com>

* Update chart version for app release (#201)

Co-authored-by: Krithika Sundararajan <krithika.sundararajan@go-jek.com>

* Add dynamic loading of Experiment Engine config (#202)

* Add dynamic loading of exp engine config

* Address PR comments

* Add useEffect rerender

* Address PR comments

* Simplify conditional logic

* Attempt to fix yarn install error

Co-authored-by: Ewe Zi Yi <36802364+deadlycoconuts@users.noreply.github.com>
Co-authored-by: Krithika Sundararajan <krithika.sundararajan@go-jek.com>
Co-authored-by: Terence Lim <terencelimxp@gmail.com>

* Update CI specs

* Revert UI changes during merge

* Update CI specs

* Update e2e deployment timeout

* Remove WIP inline comment

Co-authored-by: Ashwin <ashwinath@hotmail.com>
Co-authored-by: Ewe Zi Yi <36802364+deadlycoconuts@users.noreply.github.com>
Co-authored-by: Krithika Sundararajan <krithika.sundararajan@go-jek.com>
Co-authored-by: Terence Lim <terencelimxp@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants