Kubeflow Seldon e2e NLP ML pipeline using re-usable components #589

axsaucedo · 2019-05-26T07:29:41Z

This pull request contains an example showing step by step how to build an end to end machine learning pipeline with reusable components that are used in a Kubeflow Pipeline and a Seldon Graph.

The approach in this example shows how these reusable components can be trained through the Kubeflow UI after generating the workflows using its DSL, and how the trained pipeline can then be deployed as a Seldon graph using the same reusable components.

The full example is provided as a step by step notebook, which can be found in kubeflow_seldon_e2e_pipeline.ipynb.

The README.md file was generated from the notebook by running jupyter nbconvert --to markdown, which provides a useful preview when browsing the folder on GitHub.

There is currently a patch that needs to be added because the current version of Kubeflow does not have the latest Argo images, which leads to an issue when attaching a volume. The fix comes from a solution that @ryandawsonuk found after raising it with the Kubeflow team through Issue #1327 (thanks Ryan!).

Some enhancements that I was thinking may be useful to add before it's landed include:

Adding a screenshot of Seldon Deploy UI on this Seldon Graph to show metrics
Adding a Makefile to allow for all the steps to be run easily without notebook
Adding a GIF showing the workflow running both the Kubeflow and Seldon pipelines

The full architectural diagram of this example was designed using Draw.io, and can be found in the repo as follows:

axsaucedo · 2019-05-26T08:05:47Z

The full architectural diagram of this example was designed using Draw.io, and can be found in the repo as follows:

axsaucedo · 2019-05-27T12:04:24Z

An issue was uncovered when building this pipeline, which may require either constraining the implementation or extending the functionality. It lies in the way that SeldonCore currently handles numpy arrays of tokens (i.e. lists of strings).

Currently the example only sends one sentence as a test, which works correctly. However, if more than one sentence is sent to the SeldonEngine, it is unable to process the request.

The reason is that the return value has the format np.array([list(str), list(str)]), which gets translated into np.array(list(str).extend(list(str))), i.e. the per-sentence token lists are concatenated into a single flat list.

To be more specific, if we send np.array(["example 1", "example 2"]), the spacytokenizer image returns np.array([["example", "1"], ["example", "2"]]), which SeldonEngine then converts into np.array(["example", "1", "example", "2"]).
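The difference between the two shapes can be reproduced with plain numpy. This is only a minimal sketch of the symptom described above, assuming the flattening behaves like `ravel()`; it does not call SeldonEngine or the spacytokenizer image, and the variable names are made up for illustration.

```python
import numpy as np

# What the tokenizer step returns for two sentences:
# one row of tokens per sentence.
tokens = np.array([["example", "1"], ["example", "2"]])

# What the next step receives according to this report:
# every token in a single flat array (assumed equivalent to ravel()).
flattened = tokens.ravel()

print(tokens.shape)     # two sentences, two tokens each
print(flattened.shape)  # four tokens, sentence boundaries lost
```

Once the array is flat, the downstream component can no longer tell where one sentence ends and the next begins, which is why a single-sentence request still works while a multi-sentence request does not.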

axsaucedo · 2019-05-28T18:36:00Z

Updated the current example so that the Seldon deployment is triggered automatically by the last step in the Kubeflow pipeline. Triggering the pipeline from the UI now generates a Seldon deployment with a unique name, using the workflow ID as identifier. Thanks to @ryandawsonuk's idea in the MNIST example it was possible to add the final "Deploy seldon" step.

Additionally, it is also possible to visualise the deployment in Seldon Deploy instead of the Seldon Analytics Grafana UI.
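As a rough illustration of the naming idea only (this is not the code from the PR), a deployment name could be derived from the workflow ID along these lines. The helper `seldon_deployment_name`, the `seldon-nlp` prefix, and the sample workflow ID are all hypothetical; the only outside fact relied on is that Kubernetes object names are restricted to lowercase alphanumerics and hyphens, with 63 characters being a common length limit.

```python
import re


def seldon_deployment_name(workflow_id: str, prefix: str = "seldon-nlp") -> str:
    """Hypothetical helper: build a Kubernetes-safe name from a workflow ID."""
    # Lowercase the ID and collapse anything that is not [a-z0-9-] into '-'.
    suffix = re.sub(r"[^a-z0-9-]+", "-", workflow_id.lower()).strip("-")
    # Truncate to 63 characters and avoid ending on a hyphen.
    return f"{prefix}-{suffix}"[:63].rstrip("-")


# Hypothetical workflow ID, for illustration only.
print(seldon_deployment_name("NLP_Pipeline.abc"))
```

Because each pipeline run has its own workflow ID, each run would produce a distinct deployment name, so repeated runs would not overwrite one another.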

axsaucedo added 17 commits May 24, 2019 08:24

Initial pipeline steps

0170754

added nlp pipeline

e4c6993

added test pipeline

515ada0

added data downloader

7dc326a

added pipeline

53958ac

added data downloader

a764991

added tests for end to end

a135c2f

changed to model

1c6dc8d

added spacy

a580c03

Moved to s2i for seldon pipeline

79c06e7

Seldon graph working

3b8de40

Finished end to end pipeline, starting documentation

7eaba08

Merge remote-tracking branch 'origin/master' into kubeflow_pipeline_example

345aefc

Added initial notebook

88c99a2

Added step by step

ca4fc5e

nbconvert readme

d9520f3

Updated readme

bac4784

seldondev added the size/XXL label May 26, 2019

axsaucedo requested review from ukclivecox, gsunner and ryandawsonuk May 26, 2019 07:44

ryandawsonuk approved these changes May 26, 2019

Shortened readme

063f386

axsaucedo added 5 commits May 27, 2019 16:39

Changed readme so it has one-level headers

66dcbb8

Merge remote-tracking branch 'upstream/master' into kubeflow_pipeline_example

c400055

Changed headers to be one-level and added nblink

c9f987a

Added updated nlpipline

53dfb6d

Updated pipeline to include deploy step

7daf647

Updated example to showcase seldon analytics

4cc9c83

axsaucedo mentioned this pull request May 28, 2019

NDArray with values being lists not supported - RESOLVED: Proto lists were not being deep-copied #600

Closed

axsaucedo merged commit b1af919 into SeldonIO:master May 30, 2019
