# Basic Examples for SKlearn Prepackaged Server trained with Pachyderm and deployed to MinIO


## Prerequisites

 * A Kubernetes cluster with `kubectl` configured
 * curl
 * jq (used below to pretty-print JSON responses)
 * pygmentize
 

## Setup Seldon Core

Use the [Setup Cluster](seldon_core_setup.ipynb) notebook to set up Seldon Core with an ingress.


## Setup MinIO

Use the provided [notebook](../../../notebooks/minio_setup.ipynb) to install MinIO in your cluster and configure the `mc` CLI tool.
The instructions are [also available online](./minio_setup.html).

## Python dependencies

This tutorial requires pandas, scikit-learn, and a few other packages in the following versions:

In [1]:
!cat iris-trainer/requirements.txt

scikit-learn == 0.20.3
numpy >= 1.8.2
joblib >= 0.13.0
pandas >= 1.0.1
pyaml >= 5.3


Install them with the following command:

In [2]:
!pip install -r iris-trainer/requirements.txt
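
To confirm that the environment matches the pins above, a small check along these lines may help. It is a minimal sketch, not part of the original notebook: the helper names (`parse_requirements`, `satisfies`, `report`) are hypothetical, and the parser only understands the simple `name == version` and `name >= version` lines used in this requirements file.

```python
import re
from importlib import metadata

# Matches simple pins such as "scikit-learn == 0.20.3" or "numpy >= 1.8.2".
REQ_PATTERN = re.compile(
    r"^\s*([A-Za-z0-9_.\-]+)\s*(==|>=)\s*([0-9][0-9A-Za-z.]*)\s*$"
)


def parse_requirements(text):
    """Return (name, operator, version) tuples for each simple requirement line."""
    parsed = []
    for line in text.splitlines():
        match = REQ_PATTERN.match(line)
        if match:
            parsed.append(match.groups())
    return parsed


def version_tuple(version):
    """Turn "1.10.0" into (1, 10, 0); non-numeric parts are ignored."""
    return tuple(int(part) for part in version.split(".") if part.isdigit())


def satisfies(installed, operator, required):
    """Compare versions numerically, so that 1.18.1 counts as newer than 1.8.2."""
    have, want = version_tuple(installed), version_tuple(required)
    return have == want if operator == "==" else have >= want


def report(text):
    """Print one status line per requirement for the current environment."""
    for name, operator, required in parse_requirements(text):
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            print(f"{name}: not installed")
            continue
        status = "ok" if satisfies(installed, operator, required) else "MISMATCH"
        print(f"{name}: {installed} ({operator} {required}) {status}")
```

Calling `report(open("iris-trainer/requirements.txt").read())` prints one line per package. The version comparison is deliberately simplified and does not cover pre-releases or other specifier forms.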



## Get Pachyderm CLI (pachctl) client tool

Follow the steps for your platform in the official [documentation](https://docs.pachyderm.com/latest/getting_started/local_installation/#install-pachctl) to get the `pachctl` command-line tool.

Verify correct client installation:

In [3]:
!pachctl version --client-only

1.10.0


## Install Pachyderm in cluster

Use `pachctl` to deploy Pachyderm:

In [4]:
%%bash
kubectl create ns pachyderm
pachctl deploy local --no-expose-docker-socket --namespace pachyderm

namespace/pachyderm created
serviceaccount/pachyderm created
serviceaccount/pachyderm-worker created
clusterrole.rbac.authorization.k8s.io/pachyderm created
clusterrolebinding.rbac.authorization.k8s.io/pachyderm created
role.rbac.authorization.k8s.io/pachyderm-worker created
rolebinding.rbac.authorization.k8s.io/pachyderm-worker created
deployment.apps/etcd created
service/etcd created
service/pachd created
service/pachd-peer created
deployment.apps/pachd created
service/dash created
deployment.apps/dash created
secret/pachyderm-storage-secret created

Pachyderm is launching. Check its status with "kubectl get all"
Once launched, access the dashboard by running "pachctl port-forward"



In [5]:
!kubectl rollout status deployment -n pachyderm pachd

Waiting for deployment "pachd" rollout to finish: 0 of 1 updated replicas are available...
deployment "pachd" successfully rolled out


### Port-forward Pachyderm to localhost

In a separate terminal:

```bash
pachctl port-forward
```

## Train model using Pachyderm

### Add training data to Pachyderm "iris-data" repository

We will now use a helper Python script to pull the Iris training data from sklearn:

In [6]:
!pygmentize get-data.py

from sklearn import datasets
import pandas as pd
import numpy as np


def main():
    print("Getting Iris Dataset")
    iris = datasets.load_iris()
    X, y = iris.data, iris.target

    data = pd.DataFrame(
        data=np.c_[iris["data"], iris["target"]],
        columns=iris["feature_names"] + ["target"],
    )

    data.to_csv("data.csv", index=False)
    print("Iris dataset saved to 'data.csv' file")


if __name__ == "__main__":
    main()

In [7]:
!python get-data.py

Getting Iris Dataset
Iris dataset saved to 'data.csv' file
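
Before uploading the file, it can be worth checking that `data.csv` has the expected shape. The following is a minimal sketch using only the standard library; `summarize_csv` is a hypothetical helper, not part of the original notebook. It assumes the layout produced by `get-data.py`: a header row followed by purely numeric rows.

```python
import csv


def summarize_csv(path):
    """Return (header, row_count); raise ValueError on ragged or non-numeric rows."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader)
        row_count = 0
        for row in reader:
            if len(row) != len(header):
                raise ValueError(
                    f"data row {row_count + 1} has {len(row)} fields, "
                    f"expected {len(header)}"
                )
            for value in row:
                float(value)  # raises ValueError if a cell is not numeric
            row_count += 1
    return header, row_count
```

For the Iris dataset, `summarize_csv("data.csv")` should report five columns (four features plus `target`) and 150 data rows.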


Next, create the `iris-data` repository in Pachyderm:

In [8]:
%%bash
pachctl create repo iris-data
pachctl list repo

NAME      CREATED                SIZE (MASTER) DESCRIPTION 
iris-data Less than a second ago 0B                        


Then put the produced `data.csv` file into the `iris-data` repository and list the resulting commit:

In [9]:
%%bash
pachctl put file iris-data@master -f data.csv
pachctl list commit iris-data

REPO      BRANCH COMMIT                           FINISHED               SIZE     PROGRESS DESCRIPTION
iris-data master 08f825f2fcdf4e6e8a7850472d1e7b47 Less than a second ago 3.005KiB -         


In [10]:
!pachctl list file iris-data@master

NAME      TYPE SIZE     
/data.csv file 3.005KiB 


### Create Pachyderm pipeline

The Pachyderm pipeline is defined by the following file:

In [11]:
%%writefile train.json

{
  "pipeline": {
    "name": "iris"
  },
  "description": "A pipeline that trains simple Iris classifier.",
  "transform": {
    "cmd": [ "python3", "/train_iris.py" ],
    "image": "seldonio/pachyderm-iris-trainer:0.1"
  },
  "input": {
    "pfs": {
      "repo": "iris-data",
      "glob": "/*"
    }
  }
}


Overwriting train.json
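
The same specification can also be generated programmatically, which may be convenient when the image tag or input repository changes between runs. This is a minimal sketch: `build_pipeline_spec` is a hypothetical helper, and it only covers the fields used in `train.json` above. The pipeline name also becomes the name of the output repository, which is why the later steps read from `iris@master`.

```python
import json


def build_pipeline_spec(name, image, cmd, input_repo, glob="/*", description=""):
    """Build a dict mirroring the structure of train.json."""
    return {
        "pipeline": {"name": name},
        "description": description,
        "transform": {"cmd": list(cmd), "image": image},
        # The glob pattern controls how files in the input repo are
        # split into units of work for the pipeline.
        "input": {"pfs": {"repo": input_repo, "glob": glob}},
    }


spec = build_pipeline_spec(
    name="iris",
    image="seldonio/pachyderm-iris-trainer:0.1",
    cmd=["python3", "/train_iris.py"],
    input_repo="iris-data",
    description="A pipeline that trains simple Iris classifier.",
)
spec_json = json.dumps(spec, indent=2)
print(spec_json)
```

Writing `spec_json` to `train.json` gives a file equivalent to the one created by the cell above.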


In [12]:
!pachctl create pipeline -f train.json

### Verify pipeline success

Give Pachyderm a moment to process the pipeline first!

In [14]:
!pachctl list job

ID                               PIPELINE STARTED        DURATION RESTART PROGRESS  DL       UL      STATE   
0cfd9559d38e4ac58cc54712bee5f67d iris     19 seconds ago 1 second 0       1 + 0 / 1 3.005KiB 1.01KiB success


In [16]:
!pachctl list commit iris

REPO BRANCH COMMIT                           FINISHED       SIZE    PROGRESS DESCRIPTION
iris master f8849a38b3f64c4b8998abf1f732f486 27 seconds ago 1.01KiB -         


In [17]:
!pachctl list file iris@master

NAME          TYPE SIZE    
/model.joblib file 1.01KiB 


In [18]:
!pachctl get file iris@master:model.joblib > model.joblib

## Add trained model to remote S3 storage

### Create metadata.yaml 

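In the metadata we can use Pachyderm's commit hash to version deployed models. The hash used below is the `iris@master` output commit reported by `pachctl list commit iris` above.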

In [19]:
%%writefile metadata.yaml

name: iris
versions: [iris/pachyderm:f8849a38b3f64c4b8998abf1f732f486]
platform: sklearn
inputs:
- datatype: BYTES
  name: input
  shape: [ 1, 4 ]
outputs:
- datatype: BYTES
  name: output
  shape: [ 3 ]

Overwriting metadata.yaml
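
Because the commit hash changes with every training run, the metadata file can be rendered from a template instead of being edited by hand. The sketch below is an illustration only: `render_metadata` is a hypothetical helper, and the check that a commit ID is 32 lowercase hexadecimal characters is an assumption based on the IDs shown in this notebook.

```python
METADATA_TEMPLATE = """\
name: iris
versions: [iris/pachyderm:{commit}]
platform: sklearn
inputs:
- datatype: BYTES
  name: input
  shape: [ 1, 4 ]
outputs:
- datatype: BYTES
  name: output
  shape: [ 3 ]
"""


def render_metadata(commit):
    """Return the metadata.yaml text for the given Pachyderm commit ID."""
    # Assumption: commit IDs look like the 32-character hex strings above.
    if len(commit) != 32 or any(ch not in "0123456789abcdef" for ch in commit):
        raise ValueError(f"unexpected commit ID format: {commit!r}")
    return METADATA_TEMPLATE.format(commit=commit)


metadata_text = render_metadata("f8849a38b3f64c4b8998abf1f732f486")
print(metadata_text)
```

Writing `metadata_text` to `metadata.yaml` reproduces the file created by the cell above.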


### Create bucket for our trained model and push it

In [20]:
%%bash
mc mb minio-seldon/pachyderm-iris -p

mc cp model.joblib minio-seldon/pachyderm-iris/
mc cp metadata.yaml minio-seldon/pachyderm-iris/

Bucket created successfully `minio-seldon/pachyderm-iris`.
`model.joblib` -> `minio-seldon/pachyderm-iris/model.joblib`
Total: 0 B, Transferred: 1.01 KiB, Speed: 146.70 KiB/s
`metadata.yaml` -> `minio-seldon/pachyderm-iris/metadata.yaml`
Total: 0 B, Transferred: 205 B, Speed: 24.81 KiB/s


In [21]:
!mc ls minio-seldon/pachyderm-iris

[2020-05-24 18:53:00 BST]    205B metadata.yaml
[2020-05-24 18:53:00 BST]  1.0KiB model.joblib

## Deploy sklearn server

In [22]:
%%writefile secret.yaml

apiVersion: v1
kind: Secret
metadata:
  name: seldon-init-container-secret
type: Opaque
stringData:
  AWS_ACCESS_KEY_ID: minioadmin
  AWS_SECRET_ACCESS_KEY: minioadmin
  AWS_ENDPOINT_URL: http://minio.minio-system.svc.cluster.local:9000
  USE_SSL: "false"

Overwriting secret.yaml


In [23]:
!kubectl apply -f secret.yaml

secret/seldon-init-container-secret configured


In [24]:
%%writefile deploy.yaml

apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: pachyderm-sklearn
spec:
  annotations:
    seldon.io/executor: "true"
  name: iris
  predictors:
  - componentSpecs:
    graph:
      children: []
      implementation: SKLEARN_SERVER
      modelUri: s3://pachyderm-iris
      envSecretRefName: seldon-init-container-secret
      name: classifier
    name: default
    replicas: 1

Overwriting deploy.yaml


In [25]:
!kubectl apply -f deploy.yaml

seldondeployment.machinelearning.seldon.io/pachyderm-sklearn created


In [26]:
!kubectl rollout status deploy/$(kubectl get deploy -l seldon-deployment-id=pachyderm-sklearn -o jsonpath='{.items[0].metadata.name}')

Waiting for deployment "pachyderm-sklearn-default-0-classifier" rollout to finish: 0 of 1 updated replicas are available...
deployment "pachyderm-sklearn-default-0-classifier" successfully rolled out


## Test deployment

### Test prediction

In [29]:
%%bash
curl -s -X POST -H 'Content-Type: application/json' \
    -d '{"data":{"ndarray":[[5.964, 4.006, 2.081, 1.031]]}}' \
    http://localhost:8003/seldon/seldon/pachyderm-sklearn/api/v1.0/predictions  | jq .

{
  "data": {
    "names": [
      "t:0",
      "t:1",
      "t:2"
    ],
    "ndarray": [
      [
        0.9548873249364185,
        0.04505474761561256,
        5.792744796895459e-05
      ]
    ]
  },
  "meta": {}
}
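
The same request can be sent from Python instead of `curl`. This is a minimal sketch using only the standard library; the helper names (`build_payload`, `predict`, `top_classes`) are hypothetical, and the URL assumes the same ingress address on `localhost:8003` as the `curl` call above.

```python
import json
import urllib.request

# Assumption: same endpoint as the curl example above.
URL = "http://localhost:8003/seldon/seldon/pachyderm-sklearn/api/v1.0/predictions"


def build_payload(rows):
    """Wrap feature rows in the {"data": {"ndarray": ...}} request format."""
    return {"data": {"ndarray": [[float(value) for value in row] for row in rows]}}


def predict(rows, url=URL, timeout=10):
    """POST the rows to the deployment and return the decoded JSON response."""
    body = json.dumps(build_payload(rows)).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def top_classes(response):
    """Return, for each input row, the name of the highest-scoring output."""
    names = response["data"]["names"]
    result = []
    for scores in response["data"]["ndarray"]:
        best = max(range(len(scores)), key=scores.__getitem__)
        result.append(names[best])
    return result
```

For example, `top_classes(predict([[5.964, 4.006, 2.081, 1.031]]))` picks the output with the largest score, which for the response shown above is `t:0`.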


### Test model metadata (optional)

In [30]:
%%bash
curl -s http://localhost:8003/seldon/seldon/pachyderm-sklearn/api/v1.0/metadata/classifier | jq .

{
  "inputs": [
    {
      "datatype": "BYTES",
      "name": "input",
      "shape": [
        1,
        4
      ]
    }
  ],
  "name": "iris",
  "outputs": [
    {
      "datatype": "BYTES",
      "name": "output",
      "shape": [
        3
      ]
    }
  ],
  "platform": "sklearn",
  "versions": [
    "iris/pachyderm:f8849a38b3f64c4b8998abf1f732f486"
  ]
}
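
The `versions` field makes it possible to trace a running deployment back to the Pachyderm commit that produced its model. A minimal sketch, assuming version strings follow the `iris/pachyderm:<commit>` form used in `metadata.yaml` (`deployed_commits` is a hypothetical helper):

```python
def deployed_commits(metadata):
    """Extract commit IDs from version strings of the form "<label>:<commit>"."""
    commits = []
    for version in metadata.get("versions", []):
        label, _, commit = version.rpartition(":")
        if label:  # skip entries that carry no "<label>:" prefix
            commits.append(commit)
    return commits


served = {
    "name": "iris",
    "platform": "sklearn",
    "versions": ["iris/pachyderm:f8849a38b3f64c4b8998abf1f732f486"],
}
print(deployed_commits(served))
```

The returned ID can then be compared with the output of `pachctl list commit iris` to confirm which training run is being served.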


## Cleanup

In [56]:
!kubectl delete -f deploy.yaml

seldondeployment.machinelearning.seldon.io "pachyderm-sklearn" deleted
