# Seldon-Core Component Demo

If you are reading this then you are about to take Seldon-Core, a model serving framework, for a test drive.

Seldon-Core has been packaged as a [combinator component](https://combinator.ml/components/introduction/), which makes it easy to spin up a combination of MLOps components to make a stack. This notebook is running within the cluster, next to the Seldon-Core installation.

The following demo is a very short introduction that shows you how to connect to Seldon-Core. For a comprehensive guide, follow the [official documentation](https://docs.seldon.io/projects/seldon-core/en/latest/workflow/github-readme.html).

## Prerequisites

You will primarily interact with Seldon-Core via the Kubernetes API. This means we need to download `kubectl`.

Using `kubectl`, however, requires permission: this notebook must be allowed to perform actions on the Kubernetes API. The test drive codebase achieves this by binding the seldon-core operator cluster role to the default service account.

:warning: Connecting pre-existing cluster roles to default service accounts is not a good idea! :warning:

In [40]:
!wget -q -O /tmp/kubectl https://dl.k8s.io/release/v1.21.2/bin/linux/amd64/kubectl
!cp /tmp/kubectl /opt/conda/bin # Copy the binary to somewhere on the PATH
!chmod +x /opt/conda/bin/kubectl
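
As a quick sanity check, the sketch below (in Python, the notebook's kernel language) confirms that the binary is discoverable on the `PATH` and marked executable. The helper name `find_executable` is hypothetical and not part of the test drive codebase.

```python
import os
import shutil
from typing import Optional


def find_executable(name: str) -> Optional[str]:
    """Return the absolute path of `name` if it is an executable on PATH, else None."""
    # shutil.which searches PATH and only returns files that are executable.
    path = shutil.which(name)
    if path is None:
        return None
    return os.path.abspath(path)


kubectl_path = find_executable("kubectl")
print(kubectl_path or "kubectl not found on PATH; re-run the download cell")
```

If this prints the "not found" message, the most likely causes are a failed download or a missing `chmod +x`.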

## Deploy a Pre-Trained Model

The manifest below defines a `SeldonDeployment` using a pre-trained sklearn model. This leverages Seldon-Core's sklearn server implementation.

In [35]:
%%writefile deployment.yaml
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: iris-model
  namespace: seldon
spec:
  name: iris
  predictors:
  - graph:
      implementation: SKLEARN_SERVER
      modelUri: gs://seldon-models/sklearn/iris
      name: classifier
    name: default
    replicas: 1

Overwriting deployment.yaml
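
If you would rather generate the manifest programmatically, the following sketch builds the same structure as a Python dictionary and prints it as JSON, which `kubectl apply -f` accepts as well as YAML. The function name `build_seldon_deployment` is hypothetical; the field values are copied from the manifest above.

```python
import json


def build_seldon_deployment(name: str, namespace: str, model_uri: str) -> dict:
    """Build a single-predictor SeldonDeployment that uses the sklearn server."""
    return {
        "apiVersion": "machinelearning.seldon.io/v1",
        "kind": "SeldonDeployment",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "name": "iris",
            "predictors": [
                {
                    "graph": {
                        "implementation": "SKLEARN_SERVER",
                        "modelUri": model_uri,
                        "name": "classifier",
                    },
                    "name": "default",
                    "replicas": 1,
                }
            ],
        },
    }


manifest = build_seldon_deployment(
    "iris-model", "seldon", "gs://seldon-models/sklearn/iris"
)
print(json.dumps(manifest, indent=2))
```

Parameterising the name and model URI makes it easier to deploy several models from one notebook without copying the manifest.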


Then apply the manifest to the `seldon` namespace.

In [41]:
!kubectl -n seldon apply -f deployment.yaml

seldondeployment.machinelearning.seldon.io/iris-model unchanged
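
Once the deployment is ready, you could send it a prediction request from inside the cluster. The sketch below only builds the request and leaves the network call commented out. It rests on several assumptions about Seldon-Core 1.x defaults that this notebook does not confirm: that the service is named `<deployment>-<predictor>`, that it listens for HTTP on port `8000`, that the endpoint is `/api/v1.0/predictions`, that the cluster domain is `cluster.local`, and that the model accepts the Seldon protocol's `ndarray` payload. The helper name `build_prediction_request` is hypothetical.

```python
import json
import urllib.request


def build_prediction_request(deployment, predictor, namespace, rows, port=8000):
    """Build a POST request for a Seldon-protocol prediction (assumed defaults)."""
    # Assumed service naming and path; check your installation before relying on them.
    url = (
        f"http://{deployment}-{predictor}.{namespace}.svc.cluster.local:{port}"
        "/api/v1.0/predictions"
    )
    body = json.dumps({"data": {"ndarray": rows}}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# One iris sample with four features (sepal length/width, petal length/width).
request = build_prediction_request(
    "iris-model", "default", "seldon", [[5.1, 3.5, 1.4, 0.2]]
)
print(request.full_url)

# Uncomment once the deployment reports that it is available:
# with urllib.request.urlopen(request, timeout=10) as response:
#     print(json.loads(response.read()))
```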


## Connect to the Pachyderm Cluster

Once `pachctl` is available, you need to connect it to the cluster. Pachyderm is running in the same namespace as this notebook, so you can use Kubernetes' internal service DNS to resolve the address of the running `pachd` container. By default, the gRPC port is `650`.

_Note: `pachctl` is a binary, so you need to prefix every demo command with `!` to run it in the shell._

In [3]:
!pachctl config update context `pachctl config get active-context` --pachd-address=pachd:650
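
The short name `pachd` resolves only from within the same namespace. As a rough sketch, the fully qualified form can be derived as follows, assuming the default cluster domain `cluster.local` and the standard service account mount path. The helper names `current_namespace` and `service_address` are hypothetical.

```python
from pathlib import Path
from typing import Optional

# Standard location where Kubernetes mounts the pod's namespace.
NAMESPACE_FILE = Path("/var/run/secrets/kubernetes.io/serviceaccount/namespace")


def current_namespace(default: str = "default") -> str:
    """Read the pod's namespace, falling back to `default` outside a cluster."""
    try:
        return NAMESPACE_FILE.read_text().strip() or default
    except OSError:
        return default


def service_address(service: str, port: int, namespace: Optional[str] = None) -> str:
    """Return the fully qualified in-cluster address of a service."""
    ns = namespace or current_namespace()
    # Assumes the cluster domain is the default `cluster.local`.
    return f"{service}.{ns}.svc.cluster.local:{port}"


print(service_address("pachd", 650))
```

The fully qualified form is useful if you later move the notebook to a different namespace from Pachyderm.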

In [4]:
!pachctl list repo

NAME CREATED SIZE (MASTER) DESCRIPTION 


## Create a Repository

You haven't created any repositories yet (as seen above), so you need to create one to store the test data. Below I create a repository called `images` to store the incoming image data you will see shortly.

In [5]:
!pachctl create repo images
!pachctl list repo

NAME   CREATED      SIZE (MASTER) DESCRIPTION 
images 1 second ago 0B                        


## Add Data

Now it is time to add some random image data. Note the new size of the repository. Why not try [inspecting the commit history](https://docs.pachyderm.com/latest/concepts/data-concepts/commit/) at this point? Or try to [download the file](https://docs.pachyderm.com/latest/concepts/data-concepts/file/)?

In [6]:
!pachctl put file images@master:liberty.png -f http://imgur.com/46Q8nDz.png
!pachctl list repo

NAME   CREATED       SIZE (MASTER) DESCRIPTION 
images 5 seconds ago 57.27KiB                  


In [7]:
!pachctl list commit images

REPO   BRANCH COMMIT                           FINISHED      SIZE     PROGRESS DESCRIPTION
images master ae640f0bc8494f0fbfd8a8435f91390d 2 seconds ago 57.27KiB -         


## Create a Pipeline

A pipeline is like a pipe that connects one repository to another, with a processing step in between. The example below reads from the `images` repository, performs edge detection, and writes the result to an `edges` repository. If you're interested, take a look at the pipeline definition and [the documentation](https://docs.pachyderm.com/latest/concepts/pipeline-concepts/).

In [8]:
!pachctl create pipeline -f https://raw.githubusercontent.com/pachyderm/pachyderm/master/examples/opencv/edges.json

For example, change 'python' to 'python:3' or 'bash' to 'bash:5'. This improves
reproducibility of your pipelines.
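
To give a feel for what a pipeline definition contains, here is an approximate sketch of the `edges` specification as a Python dictionary. The name, description, input repository, and glob pattern match what `pachctl list pipeline` reports for this pipeline; the `transform` image and command are assumptions, so consult the linked `edges.json` for the authoritative values.

```python
import json

pipeline_spec = {
    "pipeline": {"name": "edges"},
    "description": (
        "A pipeline that performs image edge detection "
        "by using the OpenCV library."
    ),
    # Each top-level file in `images` is treated as a separate unit of work.
    "input": {"pfs": {"repo": "images", "glob": "/*"}},
    "transform": {
        "image": "pachyderm/opencv",      # assumed container image
        "cmd": ["python3", "/edges.py"],  # assumed entry point
    },
}

print(json.dumps(pipeline_spec, indent=2))
```

The `glob` pattern is what lets Pachyderm process new files individually rather than reprocessing the whole repository.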



Now that the pipeline has been created, you will see it automatically start to process all files in the `master` branch of the `images` repository.

In [9]:
!pachctl list pipeline edges 

NAME  VERSION INPUT     CREATED       STATE / LAST JOB   DESCRIPTION                                                                
edges 1       images:/* 3 seconds ago running / starting A pipeline that performs image edge detection by using the OpenCV library.


The flush command below is a useful way to make the notebook wait until every job triggered by the given commit has completed. It might take a while the first time because Kubernetes has to download the pipeline container.

In [10]:
!pachctl flush job images@master

ID                               PIPELINE STARTED       DURATION  RESTART PROGRESS  DL       UL       STATE   
81974c3c6b5f4fc49ba9b492047b7979 edges    3 seconds ago 3 seconds 0       1 + 0 / 1 57.27KiB 22.22KiB success


In [11]:
!pachctl list job

ID                               PIPELINE STARTED        DURATION  RESTART PROGRESS  DL       UL       STATE   
81974c3c6b5f4fc49ba9b492047b7979 edges    40 seconds ago 3 seconds 0       1 + 0 / 1 57.27KiB 22.22KiB success


## View the Result of the Pipeline

The pipeline created an output repository, which we can query to get a file. I'm using Jupyter's built-in Markdown renderer to show the image.

In [12]:
!pachctl get file edges@master:liberty.png > liberty.png

![](liberty.png)

## Create Another Pipeline

You can create sophisticated graphs in Pachyderm. The example below creates another pipeline called `montage` that uses the output of the `edges` pipeline.

As before, I run a few commands to wait for it to complete.

In [13]:
!pachctl create pipeline -f https://raw.githubusercontent.com/pachyderm/pachyderm/master/examples/opencv/montage.json

In [14]:
!pachctl flush job edges@master
!pachctl list job

ID                               PIPELINE STARTED       DURATION RESTART PROGRESS  DL       UL       STATE   
b4a6fcbb50774664b86519dc4ba713c5 montage  6 seconds ago 1 second 0       1 + 0 / 1 79.49KiB 381.1KiB success
ID                               PIPELINE STARTED       DURATION  RESTART PROGRESS  DL       UL       STATE   
b4a6fcbb50774664b86519dc4ba713c5 montage  8 seconds ago 1 second  0       1 + 0 / 1 79.49KiB 381.1KiB success
81974c3c6b5f4fc49ba9b492047b7979 edges    3 minutes ago 3 seconds 0       1 + 0 / 1 57.27KiB 22.22KiB success


In [15]:
!pachctl get file montage@master:montage.png > montage.png

![](montage.png)

## Data Driven Pipelines

Automatically trigger both pipelines by adding some more data to the `images` repository. Remember that all commits are retained, and you can move back to them at any time.


In [16]:
!pachctl put file images@master:AT-AT.png -f http://imgur.com/8MN9Kg0.png
!pachctl put file images@master:kitten.png -f http://imgur.com/g2QnNqa.png

In [17]:
!pachctl flush job images@master
!pachctl list job

ID                               PIPELINE STARTED       DURATION  RESTART PROGRESS  DL       UL       STATE   
8eddf1f8918e440bae974547d526ce5c montage  5 seconds ago 5 seconds 0       1 + 0 / 1 371.9KiB 1.292MiB success
0229702d3ea14c48a693e0461ef32e39 edges    9 seconds ago 2 seconds 0       1 + 2 / 3 102.4KiB 74.21KiB success
ID                               PIPELINE STARTED            DURATION  RESTART PROGRESS  DL       UL       STATE   
8eddf1f8918e440bae974547d526ce5c montage  6 seconds ago      5 seconds 0       1 + 0 / 1 371.9KiB 1.292MiB success
a5aff06ed6194402b63cf1ede766eb78 montage  10 seconds ago     3 seconds 0       1 + 0 / 1 195.3KiB 815.1KiB success
0229702d3ea14c48a693e0461ef32e39 edges    10 seconds ago     2 seconds 0       1 + 2 / 3 102.4KiB 74.21KiB success
0c4df8ef12e749f39ac0afceb19f9c8f edges    12 seconds ago     1 second  0       1 + 1 / 2 78.7KiB  37.15KiB success
b4a6fcbb50774664b86519dc4ba713c5 
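
The PROGRESS column in the job listings is a compact way to see incremental processing at work. Assuming it follows the `processed + skipped / total` convention, a value such as `1 + 2 / 3` means one new file was processed and two previously processed files were skipped. The sketch below parses that cell; the names `Progress` and `parse_progress` are hypothetical.

```python
import re
from typing import NamedTuple


class Progress(NamedTuple):
    processed: int  # units of work run by this job
    skipped: int    # units reused from earlier jobs
    total: int      # units the job was responsible for


_PROGRESS_PATTERN = re.compile(r"^\s*(\d+)\s*\+\s*(\d+)\s*/\s*(\d+)\s*$")


def parse_progress(cell: str) -> Progress:
    """Parse a PROGRESS cell such as '1 + 2 / 3' into its three counts."""
    match = _PROGRESS_PATTERN.match(cell)
    if match is None:
        raise ValueError(f"unrecognised progress value: {cell!r}")
    return Progress(*(int(group) for group in match.groups()))


progress = parse_progress("1 + 2 / 3")
print(progress)
print("job complete:", progress.processed + progress.skipped == progress.total)
```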

In [18]:
!pachctl get file montage@master:montage.png > montage.png

![](montage.png)

## Next Steps

That's it for this demo. I recommend looking through [the official documentation](https://docs.pachyderm.com/latest/).

You can also try more demos by creating a new notebook and walking through the [Pachyderm examples](https://docs.pachyderm.com/latest/how-tos/).

Finally, try other [Combinator stacks](https://combinator.ml/stacks/introduction/), which integrate other products with Pachyderm for 10x multipliers!

## Clean Up

The following command deletes all data and pipelines, in case you want to run through the demo notebook again.

In [2]:
!pachctl delete pipeline --all && pachctl delete repo --all