Kubeflow Seldon e2e NLP ML pipeline using re-usable components #589

Merged
merged 24 commits into from
May 30, 2019

@axsaucedo (Contributor) commented May 26, 2019

This pull request contains an example showing, step by step, how to build an end-to-end machine learning pipeline with reusable components that are used in both a Kubeflow Pipeline and a Seldon graph.

The example shows how these reusable components can be trained through the Kubeflow UI after generating the workflows with the Kubeflow Pipelines DSL, and how the trained pipeline can then be deployed as a Seldon graph using the same reusable components.
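To illustrate what "one component, two uses" means, here is a minimal sketch. It assumes the Seldon Core Python wrapper convention of a class exposing `predict(self, X, features_names)`. The class name, the regex tokenizer (a stand-in for spaCy) and the `run_pipeline_step` helper for the Kubeflow side are hypothetical and are not the code in this PR.

```python
import json
import re
from typing import Iterable, List, Optional


class Tokenizer:
    """Hypothetical reusable component: the same class can back a Seldon graph node
    (through predict) and a Kubeflow pipeline step (through run_pipeline_step)."""

    _token_re = re.compile(r"\w+")  # simple stand-in for a spaCy tokenizer

    def predict(self, X: Iterable[str], features_names: Optional[List[str]] = None) -> List[List[str]]:
        # The Seldon Python wrapper calls predict(X, features_names) for each request.
        return [self._token_re.findall(str(text).lower()) for text in X]


def run_pipeline_step(input_path: str, output_path: str) -> None:
    """Hypothetical Kubeflow step: read a JSON list of texts, write the token lists."""
    with open(input_path) as f:
        texts = json.load(f)
    with open(output_path, "w") as f:
        json.dump(Tokenizer().predict(texts), f)


print(Tokenizer().predict(["Example one!", "Example 2"]))
# [['example', 'one'], ['example', '2']]
```

Keeping the logic in one class means that training in Kubeflow and serving in Seldon cannot drift apart in how they preprocess text.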

The full example is provided as a step-by-step notebook, `kubeflow_seldon_e2e_pipeline.ipynb`.

The README.md file was generated from the notebook by running `jupyter nbconvert --to markdown`, which provides a useful preview when browsing the folder on GitHub.

There is currently a patch that needs to be added because the current version of Kubeflow does not ship the latest Argo images, which causes an issue when attaching a volume. The fix comes from Issue #1327, which @ryandawsonuk raised with the Kubeflow team and found a solution for (thanks Ryan!).

Some enhancements I think may be useful to add before this lands:

  • Add a screenshot of the Seldon Deploy UI for this Seldon graph, showing metrics
  • Add a Makefile so that all the steps can be run easily without the notebook
  • Add a GIF showing the workflow running through both the Kubeflow and Seldon pipelines

The full architectural diagram of this example was designed using Draw.io and can be found in the repo.

@axsaucedo (Contributor Author)

Building this pipeline uncovered an issue that may require either constraining the implementation or extending the functionality. The issue lies in how Seldon Core currently handles NumPy arrays of tokens (i.e. lists of strings).

Currently the example sends only one sentence as a test, which works correctly. However, if more than one sentence is sent to the Seldon engine, it is unable to process the request.

This happens because the tokenizer returns a value of the form `np.array([list(str), list(str)])`, which is translated into `np.array(list(str) + list(str))`, i.e. a single concatenated list of tokens.

To give a concrete example, if we send `np.array(["example 1", "example 2"])`, the spacytokenizer image returns `np.array([["example", "1"], ["example", "2"]])`, which the Seldon engine then converts into `np.array(["example", "1", "example", "2"])`, losing the boundary between the two sentences.
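The behaviour described above can be reproduced without Seldon using plain Python lists. This is a minimal sketch under stated assumptions: `tokenize` is a whitespace stand-in for the spaCy tokenizer, `naive_flatten` mimics the concatenation described above rather than reproducing Seldon's actual code, and `encode_per_sentence`/`decode_per_sentence` are hypothetical helpers illustrating one possible workaround (one JSON string per sentence, so the outer array stays one-dimensional). None of these are Seldon Core APIs.

```python
import json
from typing import List


def tokenize(sentence: str) -> List[str]:
    """Hypothetical stand-in for the spaCy tokenizer component: split on whitespace."""
    return sentence.split()


def naive_flatten(batch: List[List[str]]) -> List[str]:
    """Mimic the problem: per-sentence token lists are concatenated into one list."""
    flat: List[str] = []
    for tokens in batch:
        flat.extend(tokens)
    return flat


def encode_per_sentence(batch: List[List[str]]) -> List[str]:
    """Hypothetical workaround: one JSON string per sentence keeps the batch 1-D."""
    return [json.dumps(tokens) for tokens in batch]


def decode_per_sentence(encoded: List[str]) -> List[List[str]]:
    """Reverse encode_per_sentence in the next component of the graph."""
    return [json.loads(item) for item in encoded]


sentences = ["example 1", "example 2"]
tokenized = [tokenize(s) for s in sentences]

print(naive_flatten(tokenized))
# ['example', '1', 'example', '2']  <- sentence boundaries are lost

encoded = encode_per_sentence(tokenized)
print(encoded)
# ['["example", "1"]', '["example", "2"]']
print(decode_per_sentence(encoded))
# [['example', '1'], ['example', '2']]
```

The trade-off of this workaround is that every downstream component has to decode the JSON strings itself.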

@axsaucedo (Contributor Author)

I updated the example so that the Seldon deployment is triggered automatically by the last step of the Kubeflow pipeline. Triggering the pipeline from the UI now generates a Seldon deployment with a unique name, using the workflow ID as the identifier. Thanks to @ryandawsonuk's idea in the MNIST example, it was possible to add this final "Deploy seldon" step.
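As a rough sketch of what such a final step might submit, the hypothetical helpers below build a SeldonDeployment manifest whose name is derived from the workflow ID. They chain each component as a `MODEL` node whose only child is the next step. The component names, image registry, tag and the `machinelearning.seldon.io/v1alpha2` API version are assumptions to check against your Seldon Core install. In Argo the real ID could come from a placeholder such as `{{workflow.uid}}`, and the manifest would be applied to the cluster rather than printed.

```python
import json
import re
from typing import Any, Dict, List, Optional


def seldon_name(workflow_id: str, prefix: str = "seldon-nlp") -> str:
    """Turn a workflow ID into a lowercase, DNS-1123-style resource name (max 63 chars)."""
    slug = re.sub(r"[^a-z0-9-]", "-", workflow_id.lower()).strip("-")
    return f"{prefix}-{slug}"[:63].rstrip("-")


def chain_graph(steps: List[str]) -> Dict[str, Any]:
    """Build a linear inference graph: each step's only child is the next step."""
    if not steps:
        raise ValueError("steps must not be empty")
    child: Optional[Dict[str, Any]] = None
    for name in reversed(steps):
        child = {
            "name": name,
            "type": "MODEL",
            "endpoint": {"type": "REST"},
            "children": [child] if child is not None else [],
        }
    return child


def seldon_deployment(workflow_id: str, steps: List[str], image_prefix: str) -> Dict[str, Any]:
    """Assemble the manifest; container names must match the graph node names."""
    name = seldon_name(workflow_id)
    containers = [{"name": s, "image": f"{image_prefix}/{s}:0.1"} for s in steps]
    return {
        "apiVersion": "machinelearning.seldon.io/v1alpha2",  # assumed API version
        "kind": "SeldonDeployment",
        "metadata": {"name": name},
        "spec": {
            "name": name,
            "predictors": [{
                "name": "default",
                "replicas": 1,
                "graph": chain_graph(steps),
                "componentSpecs": [{"spec": {"containers": containers}}],
            }],
        },
    }


# Hypothetical workflow ID, component names and registry.
manifest = seldon_deployment(
    "kubeflow-run-abc12",
    ["clean-text", "spacy-tokenizer", "tfidf-vectorizer", "lr-classifier"],
    "registry.example.com/nlp",
)
print(json.dumps(manifest, indent=2))
```

Deriving the name from the workflow ID means that each pipeline run produces its own deployment instead of overwriting the previous one.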

Alternatively, the deployment can be visualised in Seldon Deploy instead of the Seldon Analytics Grafana UI.
