# Basic Example of an SKLearn Prepackaged Server Trained with DVC and Deployed to MinIO


## Prerequisites

 * A kubernetes cluster with kubectl configured
 * curl
 * pygmentize
 * jq (used to pretty-print the JSON responses in the test section)
 

## Setup Seldon Core

Use the [Setup Cluster](seldon_core_setup.ipynb) notebook to set up Seldon Core with an ingress.


## Setup MinIO

Use the provided [notebook](../../../notebooks/minio_setup.ipynb) to install MinIO in your cluster and configure the `mc` CLI tool.
The instructions are [also available online](./minio_setup.html).

## Get DVC CLI tool

Using pip:
```bash
pip install --user dvc
```

Or follow the steps for your platform in the official [documentation](https://dvc.org/doc/install).

## Train model

The key training steps are defined in the following Makefile:

In [1]:
!pygmentize Makefile

[32menv[39;49;00m:
	python3 -m venv .env
	./.env/bin/pip install --upgrade pip setuptools
	./.env/bin/pip install -r requirements.txt

[32mtrain[39;49;00m:
	.env/bin/python train_iris.py

[32mmodel[39;49;00m: env train


The `model` target creates a Python virtual environment in `.env` and then calls the following training script:

In [2]:
!pygmentize train_iris.py

[34mimport[39;49;00m [04m[36mjoblib[39;49;00m
[34mfrom[39;49;00m [04m[36msklearn[39;49;00m[04m[36m.[39;49;00m[04m[36mlinear_model[39;49;00m [34mimport[39;49;00m LogisticRegression
[34mfrom[39;49;00m [04m[36msklearn[39;49;00m[04m[36m.[39;49;00m[04m[36mpipeline[39;49;00m [34mimport[39;49;00m Pipeline
[34mfrom[39;49;00m [04m[36msklearn[39;49;00m [34mimport[39;49;00m datasets


[34mdef[39;49;00m [32mmain[39;49;00m():
    clf = LogisticRegression(solver=[33m"[39;49;00m[33mliblinear[39;49;00m[33m"[39;49;00m, multi_class=[33m'[39;49;00m[33movr[39;49;00m[33m'[39;49;00m)
    p = Pipeline([([33m"[39;49;00m[33mclf[39;49;00m[33m"[39;49;00m, clf)])
    [36mprint[39;49;00m([33m"[39;49;00m[33mTraining model...[39;49;00m[33m"[39;49;00m)
    p.fit(X, y)
    [36mprint[39;49;00m([33m"[39;49;00m[33mModel trained![39;49;00m[33m"[39;49;00m)

    filename_p = [33m"[39;49;00m[33mmodel.joblib[39;49;00m[33m"[39;49;00m
    [36mpri

### Initial model training (first run)

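The first training run with DVC creates the `model.dvc` file, which records the hashes of the dependencies and of the output.
We will use the output hash to version our model.
The `dvc run` syntax below matches the DVC version this notebook was written with; newer releases may use different commands and flags.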

In [3]:
%%bash
dvc run -f model.dvc \
          -d Makefile -d requirements.txt -d train_iris.py \
          -o model.joblib \
          --overwrite-dvcfile \
          make model

Stage is cached, skipping.


In [4]:
!cat model.dvc

md5: 99124ac10a601ab1d9f07f9c392b5d89
cmd: make model
deps:
- md5: 65fd61883993b68d1937bdc36c59b20c
  path: Makefile
- md5: 7e8ce9f96492fee21db6a59c2b52f34d
  path: requirements.txt
- md5: 49c19c3ea9deb642066c0a457181cfbf
  path: train_iris.py
outs:
- md5: 8104914e6936da9864603b9bc4be2114
  path: model.joblib
  cache: true
  metric: false
  persist: false


The hash of the output is `8104914e6936da9864603b9bc4be2114`.
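
To reuse this hash in later steps without copying it by hand, it can be read from `model.dvc`. The following is a minimal sketch using only the Python standard library. It assumes the flat file layout printed above (a top-level `outs:` key followed by `- md5:` and `path:` entries), and the helper name `output_hashes` is hypothetical. A proper YAML parser would be more robust for other layouts.

```python
def output_hashes(dvc_text):
    """Return {path: md5} for the entries listed under `outs:`.

    Assumes the flat layout shown above: a top-level `outs:` key followed
    by list items of the form `- md5: <hash>` and `  path: <file>`.
    """
    hashes = {}
    in_outs = False
    current_md5 = None
    for raw in dvc_text.splitlines():
        line = raw.strip()
        if raw and not raw[0].isspace() and not raw.startswith("-"):
            # A new top-level key starts; remember whether it is `outs:`.
            in_outs = line == "outs:"
            continue
        if not in_outs:
            continue
        if line.startswith("- md5:"):
            current_md5 = line.split(":", 1)[1].strip()
        elif line.startswith("path:") and current_md5 is not None:
            hashes[line.split(":", 1)[1].strip()] = current_md5
            current_md5 = None
    return hashes
```

Calling `output_hashes(open("model.dvc").read())` on the file above would give a dictionary with a single `model.joblib` entry.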

### Reproducing results (subsequent runs)

With DVC, training can be repeated reproducibly because the versions (hashes) of all dependencies are stored in the `model.dvc` file.

In [5]:
%%bash
rm -f model.joblib
dvc repro model.dvc

Running command:
	make model
Output 'model.joblib' didn't change. Skipping saving.

To track the changes with git, run:

	git add model.dvc
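
The idea of comparing recorded hashes against the current files can be sketched in a few lines of Python. This is an illustration only, not DVC's implementation: it assumes the recorded value is the plain MD5 of the file's bytes (some DVC versions may normalise line endings in text files first), and the helper names `file_md5` and `changed_files` are hypothetical.

```python
import hashlib


def file_md5(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def changed_files(recorded):
    """Return the paths whose current MD5 differs from the recorded one.

    `recorded` maps path -> expected hex digest, for example the `deps`
    entries of model.dvc. Missing files are reported as changed.
    """
    changed = []
    for path, expected in recorded.items():
        try:
            actual = file_md5(path)
        except FileNotFoundError:
            actual = None
        if actual != expected:
            changed.append(path)
    return changed
```

An empty list from `changed_files` would mean that, under the assumption above, none of the listed files differ from what was recorded.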




## Add trained model to remote S3 storage

### Create metadata.yaml 

In the metadata file we can use DVC's output hash to version the deployed model.

In [6]:
%%writefile metadata.yaml

name: iris
versions: [iris/dvc:8104914e6936da9864603b9bc4be2114]
platform: sklearn
inputs:
- datatype: BYTES
  name: input
  shape: [ 1, 4 ]
outputs:
- datatype: BYTES
  name: output
  shape: [ 3 ]

Overwriting metadata.yaml
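
Instead of pasting the hash by hand, the file could be generated from a template. The sketch below is one possible approach using only the standard library; the template mirrors the `metadata.yaml` above, while the names `METADATA_TEMPLATE` and `render_metadata` are hypothetical.

```python
METADATA_TEMPLATE = """\
name: iris
versions: [iris/dvc:{dvc_hash}]
platform: sklearn
inputs:
- datatype: BYTES
  name: input
  shape: [ 1, 4 ]
outputs:
- datatype: BYTES
  name: output
  shape: [ 3 ]
"""


def render_metadata(dvc_hash):
    """Fill the DVC output hash into the metadata template."""
    valid = len(dvc_hash) == 32 and all(c in "0123456789abcdef" for c in dvc_hash)
    if not valid:
        raise ValueError("expected a 32-character lowercase hex MD5 digest")
    return METADATA_TEMPLATE.format(dvc_hash=dvc_hash)
```

Writing the result of `render_metadata("8104914e6936da9864603b9bc4be2114")` to `metadata.yaml` would reproduce the file shown above.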


### Create bucket for our trained model and push it

In [7]:
%%bash
mc mb minio-seldon/dvc-iris -p

mc cp model.joblib minio-seldon/dvc-iris/
mc cp metadata.yaml minio-seldon/dvc-iris/

Bucket created successfully `minio-seldon/dvc-iris`.
`model.joblib` -> `minio-seldon/dvc-iris/model.joblib`
Total: 0 B, Transferred: 1.05 KiB, Speed: 101.06 KiB/s
`metadata.yaml` -> `minio-seldon/dvc-iris/metadata.yaml`
Total: 0 B, Transferred: 199 B, Speed: 40.66 KiB/s


In [9]:
!mc ls minio-seldon/dvc-iris

[m[32m[2020-05-24 18:56:43 BST] [0m[33m   199B [0m[1mmetadata.yaml[0m
[0m[m[32m[2020-05-24 18:56:43 BST] [0m[33m 1.1KiB [0m[1mmodel.joblib[0m
[0m

## Deploy sklearn server

In [10]:
%%writefile secret.yaml

apiVersion: v1
kind: Secret
metadata:
  name: seldon-init-container-secret
type: Opaque
stringData:
  AWS_ACCESS_KEY_ID: minioadmin
  AWS_SECRET_ACCESS_KEY: minioadmin
  AWS_ENDPOINT_URL: http://minio.minio-system.svc.cluster.local:9000
  USE_SSL: "false"

Overwriting secret.yaml


In [11]:
!kubectl apply -f secret.yaml

secret/seldon-init-container-secret configured


In [12]:
%%writefile deploy.yaml

apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: minio-dvc-sklearn
spec:
  annotations:
    seldon.io/executor: "true"
  name: iris
  predictors:
  - componentSpecs:
    graph:
      children: []
      implementation: SKLEARN_SERVER
      modelUri: s3://dvc-iris
      envSecretRefName: seldon-init-container-secret
      name: classifier
    name: default
    replicas: 1

Overwriting deploy.yaml


In [13]:
!kubectl apply -f deploy.yaml

seldondeployment.machinelearning.seldon.io/minio-dvc-sklearn created


In [14]:
!kubectl rollout status deploy/$(kubectl get deploy -l seldon-deployment-id=minio-dvc-sklearn -o jsonpath='{.items[0].metadata.name}')

Waiting for deployment "minio-dvc-sklearn-default-0-classifier" rollout to finish: 0 of 1 updated replicas are available...
deployment "minio-dvc-sklearn-default-0-classifier" successfully rolled out


## Test deployment

### Test prediction

In [15]:
%%bash
curl -s -X POST -H 'Content-Type: application/json' \
    -d '{"data":{"ndarray":[[5.964, 4.006, 2.081, 1.031]]}}' \
    http://localhost:8003/seldon/seldon/minio-dvc-sklearn/api/v1.0/predictions  | jq .

{
  "data": {
    "names": [
      "t:0",
      "t:1",
      "t:2"
    ],
    "ndarray": [
      [
        0.9548873249364185,
        0.04505474761561256,
        5.792744796895459e-05
      ]
    ]
  },
  "meta": {}
}
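
The same request can be sent from Python. The following is a minimal sketch using only the standard library; the URL is copied from the `curl` command above and depends on how your ingress is exposed, and the helper names (`build_payload`, `top_classes`, `predict`) are hypothetical.

```python
import json
import urllib.request

# URL taken from the curl command above; adjust host and port to your ingress.
PREDICT_URL = "http://localhost:8003/seldon/seldon/minio-dvc-sklearn/api/v1.0/predictions"


def build_payload(rows):
    """Wrap feature rows in the `data.ndarray` request body used above."""
    return {"data": {"ndarray": [[float(value) for value in row] for row in rows]}}


def top_classes(response):
    """Return (name, probability) of the most likely class for each row."""
    names = response["data"]["names"]
    results = []
    for row in response["data"]["ndarray"]:
        best = max(range(len(row)), key=row.__getitem__)
        results.append((names[best], row[best]))
    return results


def predict(rows, url=PREDICT_URL, timeout=10):
    """POST the rows to the prediction endpoint and decode the JSON reply."""
    request = urllib.request.Request(
        url,
        data=json.dumps(build_payload(rows)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        return json.load(reply)
```

With the deployment running, `top_classes(predict([[5.964, 4.006, 2.081, 1.031]]))` would report the class with the highest probability, which is `t:0` in the response shown above.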


### Test model metadata (optional)

In [16]:
%%bash
curl -s http://localhost:8003/seldon/seldon/minio-dvc-sklearn/api/v1.0/metadata/classifier | jq .

{
  "inputs": [
    {
      "datatype": "BYTES",
      "name": "input",
      "shape": [
        1,
        4
      ]
    }
  ],
  "name": "iris",
  "outputs": [
    {
      "datatype": "BYTES",
      "name": "output",
      "shape": [
        3
      ]
    }
  ],
  "platform": "sklearn",
  "versions": [
    "iris/dvc:8104914e6936da9864603b9bc4be2114"
  ]
}
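
Because the DVC hash is embedded in the `versions` field, the metadata response can be used to check which model version is actually being served. A minimal sketch, assuming the `iris/dvc:<hash>` naming used in `metadata.yaml` (the helper name `deployed_dvc_hashes` is hypothetical):

```python
def deployed_dvc_hashes(metadata, prefix="iris/dvc:"):
    """Extract the DVC hashes from the `versions` list of a metadata reply."""
    return [
        version[len(prefix):]
        for version in metadata.get("versions", [])
        if version.startswith(prefix)
    ]
```

Comparing the returned list with the output hash recorded in `model.dvc` would confirm that the served model is the one produced by the tracked training run.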


## Cleanup

In [17]:
!kubectl delete -f deploy.yaml

seldondeployment.machinelearning.seldon.io "minio-dvc-sklearn" deleted


In [20]:
!rm .env -r

rm: cannot remove '.env': No such file or directory
