# Production-level PyTorch Hub models in k8s with 10 lines of Python code

In this example we showcase how to use our re-usable PyTorch Hub Seldon Deployment to productionise your PyTorch Hub models with Seldon.

PyTorch Hub is a model repository designed to facilitate research reproducibility, and includes trained and untrained versions of the most popular models out there (such as BERT, GPT-2, VGG, etc.).

Extending PyTorch with Seldon Deployments allows us to go from research models to robust reproducible production machine learning systems.

The architecture of the PyTorch Hub SeldonDeployment project is shown in the following diagram:

![](images/pytorchhub-seldondep-overview.jpg)

### This tutorial is broken down into the following sections:

1) Set up all the dependencies

2) Build your own PyTorch Hub Seldon Deployment

3) Let's deploy MobileNet from PyTorch Hub

4) Deploy a few more models and visualise metrics

### Before you start
Make sure you install the following dependencies, as they are critical for this example to work:

* Helm v2.13.1+
* A Kubernetes cluster running v1.13 or above (minikube / docker-for-windows work well if they have enough RAM)
* kubectl v1.14+
* Python 3.6+
* Python DEV requirements (we'll install them below)

Let's get started! 🚀🔥 


## 1) Set up all the dependencies

Before we start, you will need Seldon Deployment and Seldon Analytics running in your Kubernetes cluster.

Please refer to these two tutorials:

* Installing and running Seldon in your Kubernetes Cluster
* Installing and running Seldon Analytics in your Kubernetes Cluster

## 2) Build your own PyTorch Hub Seldon Deployment 

In this section we'll learn how we built the generic Seldon wrapper for PyTorch Hub, which will allow you to extend it if you need any specific custom functionality.

If you would like to skip this section and just deploy a model, we've created a pre-built image you can use in the next steps.

Building the wrapper is quite simple, and only requires 3 steps:

2.1) Write a simple Python wrapper that exposes a `predict` function

2.2) Specify the dependencies for your model through a requirements.txt file

2.3) Run the source2image configuration command to build your container image

### 2.1) Write a simple Python wrapper that exposes the functionality through a predict function

In [4]:
!mkdir deployment_image

mkdir: cannot create directory ‘deployment_image’: File exists


In [5]:
%%writefile deployment_image/PyTorchSeldonDeployment.py
import os, torch, pickle

class PyTorchSeldonDeployment:

    def __init__(self, repo, name):
        self._repo = repo
        self._name = name
        self._model = torch.hub.load(
                self._repo,
                self._name,
                pretrained=True)

        # Move the loaded model to the GPU when one is available
        if torch.cuda.is_available():
            self._model.to('cuda')

    def predict(self, X_bytes, feature_names=[]):
        X = pickle.loads(X_bytes)

        if torch.cuda.is_available():
            X = X.to('cuda')

        response = self._model(X)

        response_bytes = pickle.dumps(response)
        return response_bytes

Overwriting deployment_image/PyTorchSeldonDeployment.py


### 2.2) Specify the dependencies for your model through a requirements.txt file

In [6]:
%%writefile deployment_image/requirements.txt
numpy==1.15.4
torch==1.1.0
image==1.5.27

Overwriting deployment_image/requirements.txt


### 2.3) Use source2image to build your container image

In [170]:
!mkdir deployment_image/.s2i

In [171]:
%%writefile deployment_image/.s2i/environment
MODEL_NAME=PyTorchSeldonDeployment
API_TYPE=REST
SERVICE_TYPE=MODEL
PERSISTENCE=0

Writing deployment_image/.s2i/environment


In [None]:
!s2i build deployment_image/. seldonio/seldon-core-s2i-python36:0.8 pytorchseldon:0.1

## 3) Let's deploy MobileNet from PyTorch Hub

With the wrapper in place, we'll now deploy a couple of example models, starting with MobileNet.

### Create a Seldon Graph Definition

The Seldon graph definition is a computational DAG (Directed Acyclic Graph) definition for the components in your machine learning pipeline.

In this case we define our graph below. The structure contains three key pieces:

1) An outline of the containers used in the graph (each with a name for reference)

2) The hierarchical order in which the graph will execute (referencing the containers through the names)

3) The parameters that we pass to the model
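The relationship between the first two pieces can be sketched in plain Python: every node name in the graph should match a container name. This is only an illustration; `graph_node_names` and `unresolved_nodes` are hypothetical helpers, and the dictionary is a hand-written stand-in for the YAML shown later in this section.

```python
def graph_node_names(node):
    """Yield the name of a graph node and of all its descendants."""
    yield node["name"]
    for child in node.get("children", []):
        yield from graph_node_names(child)


def unresolved_nodes(predictor):
    """Return graph node names that do not match any container name."""
    containers = {
        container["name"]
        for component in predictor["componentSpecs"]
        for container in component["spec"]["containers"]
    }
    return [name for name in graph_node_names(predictor["graph"])
            if name not in containers]


# Hand-written stand-in for one predictor of the deployment.
predictor = {
    "componentSpecs": [
        {"spec": {"containers": [
            {"name": "pytorchhub-mnet", "image": "pytorchseldon:0.1"},
        ]}},
    ],
    "graph": {
        "name": "pytorchhub-mnet",
        "type": "MODEL",
        "children": [],
        "parameters": [
            {"name": "repo", "type": "STRING", "value": "pytorch/vision"},
            {"name": "name", "type": "STRING", "value": "mobilenet_v2"},
        ],
    },
}

print(unresolved_nodes(predictor))  # prints []
```

An empty list means every graph node can be resolved to a container.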

We will start by creating an easily re-usable template, where we can specify the model by replacing the string `MODEL_NAME` with a command such as:

```
sed 's|MODEL_NAME|CHOSEN_MODEL|g' pytorch_seldon_template.yaml
```
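If you prefer to stay in Python, the same substitution can be sketched with `str.replace`. This is a minimal sketch: `render_template` is a hypothetical helper, and the inline string is a shortened stand-in for `pytorch_seldon_template.yaml`, which also uses a `DEPLOYMENT_NAME` placeholder.

```python
def render_template(template, model_name, deployment_name):
    """Replace both placeholders, mirroring the sed expressions."""
    return (template
            .replace("MODEL_NAME", model_name)
            .replace("DEPLOYMENT_NAME", deployment_name))


# Shortened stand-in for the real template file.
template = (
    "metadata:\n"
    "  name: pytorchhub-DEPLOYMENT_NAME-deployment\n"
    "parameters:\n"
    "- name: name\n"
    "  value: MODEL_NAME\n"
)

rendered = render_template(template, "mobilenet_v2", "mnet")
print(rendered)
```

The rendered text could then be written to a file and passed to `kubectl apply -f`.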

You can also use the `seldonio/pytorchseldon:0.1` image if you prefer our hosted image (it's the same as the one built above).

In [7]:
%%writefile pytorch_seldon_template.yaml
---
apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: pytorchhub-DEPLOYMENT_NAME-deployment
spec:
  annotations:
    project_name: PyTorch Hub Engine
    deployment_version: v1
  name: pytorchhub-DEPLOYMENT_NAME-deployment
  predictors:
  - componentSpecs:
    - spec:
        containers:
        - image: pytorchseldon:0.1
          imagePullPolicy: IfNotPresent
          name: pytorchhub-DEPLOYMENT_NAME
        terminationGracePeriodSeconds: 20
    graph:
      name: pytorchhub-DEPLOYMENT_NAME
      endpoint:
        type: REST
      type: MODEL
      children: []
      parameters:
      - name: repo
        type: STRING
        value: pytorch/vision
      - name: name
        type: STRING
        value: MODEL_NAME
    name: pytorchhub-DEPLOYMENT_NAME-engine
    replicas: 1


Writing pytorch_seldon_template.yaml


### Details: Defining the PyTorch Model through the parameters

As you can see in the `pytorchhub-DEPLOYMENT_NAME` graph node of the template, we pass the name of the model we want as the parameter `name`, which specifies which PyTorch Hub model to use:

```
    graph:
      name: pytorchhub-DEPLOYMENT_NAME
      endpoint:
        type: REST
      type: MODEL
      children: []
      parameters:
      - name: repo
        type: STRING
        value: pytorch/vision
      - name: name
        type: STRING
        value: MODEL_NAME
```

This is the section of the configuration to change when we want a different PyTorch Hub model to be downloaded.
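The mapping from graph parameters to the wrapper's constructor can be sketched as follows. To our understanding, the Seldon Python wrapper receives the parameters as a JSON list and passes them to the class as keyword arguments, but treat that as an assumption and check the Seldon documentation for your version. `parameters_to_kwargs`, `TYPE_CASTS` and `FakeDeployment` are hypothetical names used only for illustration.

```python
import json

# Assumed conversion table for the "type" field of each parameter.
TYPE_CASTS = {
    "STRING": str,
    "INT": int,
    "FLOAT": float,
    "BOOL": lambda value: str(value).lower() == "true",
}


def parameters_to_kwargs(parameters_json):
    """Turn a JSON list of {name, type, value} into constructor kwargs."""
    parameters = json.loads(parameters_json)
    return {p["name"]: TYPE_CASTS[p["type"]](p["value"]) for p in parameters}


class FakeDeployment:
    """Stand-in for PyTorchSeldonDeployment that skips the model download."""

    def __init__(self, repo, name):
        self.repo = repo
        self.name = name


parameters_json = json.dumps([
    {"name": "repo", "type": "STRING", "value": "pytorch/vision"},
    {"name": "name", "type": "STRING", "value": "mobilenet_v2"},
])

kwargs = parameters_to_kwargs(parameters_json)
model = FakeDeployment(**kwargs)
print(model.repo, model.name)  # prints pytorch/vision mobilenet_v2
```

This is why the parameter names in the YAML (`repo` and `name`) match the argument names of `__init__` in the wrapper.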

### Run the configuration for MobileNet

Now that we've defined the configuration file, we will deploy it with the `mobilenet_v2` model.

For this we just have to pipe the substituted file into `kubectl apply -f -`.

In this case we are applying the file with the model `mobilenet_v2` and the deployment name `"mnet"`.

In [1]:
%%bash 
sed 's|MODEL_NAME|mobilenet_v2|g; s|DEPLOYMENT_NAME|mnet|g' pytorch_seldon_template.yaml | \
    kubectl apply -f -

seldondeployment.machinelearning.seldon.io/pytorchhub-mnet-deployment created


### Wait until the MobileNet network is ready

The container will first download the pretrained model and then spin up when ready. You can check if the pod has successfully started by running the following command:

In [122]:
!kubectl get pods | grep mnet

pytorchhub-mnet-deployment-pytorchhub-mnet-engine-e33c4ea-vr454   2/2     Running     0          48s
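If you want to script this check, the READY and STATUS columns of that output are enough to decide whether the pod is up. The sketch below parses a single line of `kubectl get pods` output; `pod_is_ready` is a hypothetical helper, and it assumes the default column layout (NAME, READY, STATUS, ...).

```python
def pod_is_ready(line):
    """Return True when READY is n/n with n > 0 and STATUS is Running."""
    fields = line.split()
    if len(fields) < 3:
        return False
    ready, status = fields[1], fields[2]
    current, _, desired = ready.partition("/")
    return (status == "Running"
            and current == desired
            and current.isdigit()
            and int(current) > 0)


line = ("pytorchhub-mnet-deployment-pytorchhub-mnet-engine-e33c4ea-vr454"
        "   2/2     Running     0          48s")
print(pod_is_ready(line))  # prints True
```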


If you want to check whether it's still downloading, you can follow the logs of that container with the following command:

**NOTE: The command below follows the log stream, so once you see that the microservice is running, stop the command to continue; otherwise it will run forever.**

In [123]:
!kubectl logs -f pytorchhub-mnet-deployment-pytorchhub-mnet-engine-e33c4ea-vr454 pytorchhub-mnet

starting microservice
2019-06-23 09:16:55,856 - seldon_core.microservice:main:154 - INFO:  Starting microservice.py:main
2019-06-23 09:16:55,857 - seldon_core.microservice:load_annotations:104 - INFO:  Found annotation deployment_version:v1 
2019-06-23 09:16:55,857 - seldon_core.microservice:load_annotations:104 - INFO:  Found annotation kubernetes.io/config.seen:2019-06-23T09:16:52.9451781Z 
2019-06-23 09:16:55,857 - seldon_core.microservice:load_annotations:104 - INFO:  Found annotation kubernetes.io/config.source:api 
2019-06-23 09:16:55,857 - seldon_core.microservice:load_annotations:104 - INFO:  Found annotation project_name:PyTorch Hub Engine 
2019-06-23 09:16:55,857 - seldon_core.microservice:load_annotations:104 - INFO:  Found annotation prometheus.io/path:prometheus 
2019-06-23 09:16:55,857 - seldon_core.microservice:load_annotations:104 - INFO:  Found annotation prometheus.io/port:8000 
2019-06-23 09:16:55,857 - seldon_core.microservice:load_annotations:104 - INFO:  F

^C


## Interact with your deployed model via the REST API

We now have a Seldon Deployment of MobileNet V2 listening for requests. We can send a request using the Seldon Client:

In [2]:
from seldon_core.seldon_client import SeldonClient

sc = SeldonClient(
    gateway="ambassador", 
    gateway_endpoint="localhost:80",
    namespace="default",
    payload_type="bytes",
    transport="rest")

### Get some data to send to the model

In [3]:
import urllib.request

url, filename = ("https://github.com/pytorch/hub/raw/master/dog.jpg", "dog.jpg")
# Download the sample image (Python 3)
urllib.request.urlretrieve(url, filename)

We can use the image file we just downloaded, called `dog.jpg`:
![](dog.jpg)

### Convert the image into a tensor

In [4]:
from PIL import Image
from torchvision import transforms
input_image = Image.open(filename)
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
input_tensor = preprocess(input_image)
input_batch = input_tensor.unsqueeze(0) # create a mini-batch as expected by the model
input_batch.shape

torch.Size([1, 3, 224, 224])
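To make the `Normalize` step concrete: after `ToTensor` scales each channel to the range 0 to 1, `Normalize` computes `(value - mean) / std` per channel. The sketch below reproduces that arithmetic for a single pixel in plain Python; `normalize_pixel` is a hypothetical helper and the mid-grey pixel is an arbitrary example value.

```python
MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]


def normalize_pixel(rgb):
    """Apply (value - mean) / std to each of the three channels."""
    return [(value - mean) / std for value, mean, std in zip(rgb, MEAN, STD)]


# A mid-grey pixel, already scaled to the 0..1 range.
pixel = [0.5, 0.5, 0.5]
print([round(value, 4) for value in normalize_pixel(pixel)])
# prints [0.0655, 0.1964, 0.4178]
```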

In [5]:
import pickle
from seldon_core.utils import seldon_message_to_json

seldon_message_proto = sc.predict(
    bin_data=pickle.dumps(input_batch), 
    deployment_name="pytorchhub-mnet-deployment",
    names=["image_features"])


In [164]:
import base64

def unpickled_proto_response(seldon_message_proto):
    # Use the response passed in as an argument rather than the notebook global
    seldon_message_dict = seldon_message_to_json(seldon_message_proto.response)
    decoded_data = base64.b64decode(seldon_message_dict['binData'])
    return pickle.loads(decoded_data)
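The helper above relies on the JSON form of the response carrying the pickled bytes base64-encoded under the `binData` key. The round trip can be sketched without a cluster; here a plain list stands in for the tensor, and `fake_response` is a hand-built stand-in for the dictionary returned by `seldon_message_to_json`.

```python
import base64
import pickle

payload = [[0.1, 0.7, 0.2]]  # stand-in for a tensor of model outputs

# Pickle the object, then base64-encode it so it can travel inside JSON.
fake_response = {
    "binData": base64.b64encode(pickle.dumps(payload)).decode("ascii"),
}

# Client side: reverse the two steps to recover the original object.
decoded = pickle.loads(base64.b64decode(fake_response["binData"]))
print(decoded == payload)  # prints True
```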

In [144]:
result = unpickled_proto_response(seldon_message_proto)
print(result)

tensor([[-1.4330e+00, -4.9740e-01, -1.0928e+00, -2.1204e+00, -8.3041e-01,
          1.3389e-01, -8.7533e-01,  1.5980e+00,  1.3235e+00, -1.0711e+00,
         -5.9168e-01, -7.8693e-01, -2.0388e-01, -7.9074e-01, -1.7512e+00,
         -2.5161e-01, -1.4408e-01,  2.6692e-01,  2.1198e-01, -8.0811e-01,
         -2.2025e+00, -1.3507e+00, -2.1353e+00,  5.0871e-01, -7.9707e-01,
         -1.5032e+00, -1.0638e+00, -1.6637e+00, -1.2285e+00, -1.5053e-01,
         -1.1163e+00, -1.2226e+00, -8.9330e-01, -1.5911e-01,  1.1483e-01,
         -7.1935e-01,  1.0405e+00, -1.1046e+00, -6.9327e-01,  1.6837e+00,
          1.9495e-01, -1.0450e+00, -5.6622e-01,  4.1831e-01,  1.4902e-01,
         -1.4781e+00,  3.2513e-02,  3.3965e-01, -1.1519e+00, -1.6894e+00,
         -7.9146e-01,  3.7503e-01,  4.6002e-01, -4.9682e-01, -4.0816e-01,
         -1.2634e+00, -8.6771e-01, -1.5838e+00, -4.5533e-01, -5.6035e-01,
          4.5671e-01, -4.1993e-01, -2.0381e-01, -1.3952e-01, -4.6230e-01,
         -1.5501e-02, -4.7154e-01, -3.
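The tensor holds raw scores (logits), one per ImageNet class, so a softmax is needed to read them as probabilities. The sketch below does this in plain Python on a handful of values copied from the output above; because it only looks at five scores rather than all of them, the resulting probabilities are illustrative only. `softmax` and `top_k` are hypothetical helpers.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(score - peak) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def top_k(scores, k=3):
    """Return (index, probability) pairs for the k highest scores."""
    probabilities = softmax(scores)
    ranked = sorted(enumerate(probabilities),
                    key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Five of the logits printed above, used as an illustrative subset.
scores = [-1.4330, -0.4974, 1.5980, 1.3235, 0.1339]
for index, probability in top_k(scores):
    print(index, round(probability, 3))
```

With the real tensor, `torch.nn.functional.softmax(result[0], dim=0)` performs the same computation over all classes.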

## 4) Deploy a few more models and visualise metrics

Now that we know what our workflow looks like, we can deploy more models. In this case we'll deploy a SqueezeNet model and a DenseNet model.

#### Deploying the squeezenet model

In [128]:
%%bash 
sed 's|MODEL_NAME|squeezenet1_0|g; s|DEPLOYMENT_NAME|snet|g' pytorch_seldon_template.yaml | \
    kubectl apply -f -

seldondeployment.machinelearning.seldon.io/pytorchhub-snet-deployment configured


#### Deploying the densenet model

In [131]:
%%bash 
sed 's|MODEL_NAME|densenet121|g; s|DEPLOYMENT_NAME|dnet|g' pytorch_seldon_template.yaml | \
    kubectl apply -f -

seldondeployment.machinelearning.seldon.io/pytorchhub-dnet-deployment created


### Interact with both models using the same SeldonClient

##### Let's start with the squeezenet (which we called `snet`)

In [163]:
seldon_message_proto = sc.predict(
    bin_data=pickle.dumps(input_batch), 
    deployment_name="pytorchhub-snet-deployment",
    names=["image_features"])

unpickled_proto_response(seldon_message_proto)

tensor([[ 7.3344,  7.9111, 13.0409, 11.7355, 11.1145,  9.1319,  9.6167, 18.9222,
         23.1842, 12.5593,  7.8293,  9.0537,  8.3661, 10.6228,  5.0446,  7.1391,
         13.5780, 13.3673, 10.9086,  8.3717, 10.1240, 14.2630, 14.6676, 17.8737,
         13.0535,  4.7774,  5.0082,  7.2305,  4.5269, 15.5195,  5.9449,  3.8930,
          5.0701,  6.1585,  6.2826,  4.8049,  7.6353,  7.0481,  5.4418,  6.2629,
          3.8449,  4.1280,  4.3896,  5.7580,  4.6446,  5.4393,  5.0399,  5.0743,
          7.8497,  5.5055,  5.5675, 11.2643,  5.2156,  5.7681,  7.4943,  5.1767,
          5.8598,  4.3628,  3.9076,  4.4946,  4.3012,  6.5492,  6.7286,  4.0350,
          4.5268,  4.4444,  5.9506,  5.5000,  4.5276,  1.8928,  3.2864,  1.9421,
          3.7542,  4.3784,  4.2017,  5.4236,  6.0290,  4.1888,  5.7185,  4.0853,
          8.0802, 12.2144, 11.7385, 11.3793, 15.5667,  9.3782, 12.8645,  9.7796,
          9.6429, 14.0086,  6.2304,  7.7380,  5.7446, 11.9736, 11.2676,  7.2487,
         12.2320, 12.5119,  

##### And now the densenet (which we called `dnet`)

In [162]:
seldon_message_proto = sc.predict(
    bin_data=pickle.dumps(input_batch), 
    deployment_name="pytorchhub-dnet-deployment",
    names=["image_features"])

unpickled_proto_response(seldon_message_proto)

tensor([[-8.0496e-01, -3.9624e-01, -5.1752e-01, -1.5001e+00, -6.3708e-01,
         -1.3756e-01, -7.0189e-01,  7.5910e-01,  2.8360e-01, -4.7575e-01,
         -6.6391e-01, -7.8201e-01, -5.0137e-01, -9.0335e-01, -9.3767e-01,
         -5.0170e-01, -7.8611e-01, -4.6869e-01, -6.5123e-01, -7.0452e-01,
         -1.5152e+00, -7.4845e-01, -1.1941e+00,  1.2465e-02, -9.9113e-01,
         -6.6783e-01, -6.5150e-01, -4.8814e-01, -6.9637e-01, -6.5749e-01,
         -1.0803e+00, -8.4734e-01, -4.8819e-01, -4.5481e-01, -1.6184e-01,
         -5.2296e-01,  7.5906e-01, -2.1984e-01, -3.2606e-01,  4.7541e-01,
         -5.2860e-01, -5.5460e-01, -5.5549e-01, -2.5999e-01, -3.4499e-01,
         -3.3859e-01, -6.0929e-01, -3.0058e-01, -1.1790e+00, -8.9682e-01,
         -3.3619e-01,  5.4677e-01, -2.7706e-01, -3.4993e-01, -2.0042e-01,
         -8.8975e-01, -3.1411e-01, -1.0394e+00, -4.0769e-01, -9.2472e-02,
          8.9438e-01, -1.7260e-01, -8.9643e-02,  2.3894e-01, -4.3398e-01,
         -1.3165e-02, -2.9586e-01, -3.

### Visualise metrics and logs from components

Now that we have a few models deployed, we can visualise their metrics.

For this we'll have to access the "Prediction Analytics" dashboard in Grafana. After you run the commands below you should see something like this:

![](images/grafana-dashboard.jpg)

### What this chart shows

This chart shows all the models that you have deployed, together with the number of successful responses, errors, other response codes, requests per second, latency and more. If there are metrics that you'd like to expose from your model, you can do so through the SDK we provide.
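As a rough sketch of what exposing custom metrics could look like: to our knowledge, the Seldon Python wrapper picks up an optional `metrics` method that returns a list of dictionaries with `type`, `key` and `value` fields, but that format is an assumption here and should be checked against the Seldon documentation for your version. `MeteredModel` and the metric keys are hypothetical names.

```python
import time


class MeteredModel:
    """Stand-in model; a real wrapper would run the PyTorch model in predict."""

    def __init__(self):
        self._last_latency_ms = 0.0

    def predict(self, X, feature_names=None):
        start = time.perf_counter()
        result = X  # placeholder for the real forward pass
        self._last_latency_ms = (time.perf_counter() - start) * 1000
        return result

    def metrics(self):
        # Assumed format: a list of dicts with "type", "key" and "value".
        return [
            {"type": "COUNTER", "key": "predict_calls_total", "value": 1},
            {"type": "TIMER", "key": "predict_latency_ms",
             "value": self._last_latency_ms},
        ]


model = MeteredModel()
model.predict([1, 2, 3])
for metric in model.metrics():
    print(metric["type"], metric["key"])
```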

### Trying it yourself

To start, you need to access the Grafana dashboard. You can do this by forwarding the Grafana port with the following command in a terminal, which makes the dashboard accessible on `localhost:8080`:

```
kubectl port-forward svc/grafana-prom 8080:80
```

We can also generate some traffic so that there is some activity to visualise in the metrics:

In [None]:
import random

deployments = [
    "pytorchhub-snet-deployment",
    "pytorchhub-dnet-deployment",
    "pytorchhub-mnet-deployment"
]

while True:
    for d in deployments:
        if random.random() > 0.8:
            seldon_message_proto = sc.predict(
                bin_data=pickle.dumps(input_batch), 
                deployment_name=d,
                names=["image_features"])
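The loop above runs until you interrupt it. If you would rather send a fixed amount of traffic, a bounded variant can be sketched as follows; `generate_traffic` is a hypothetical helper, and `send` is any callable that takes a deployment name, so the example runs without a cluster.

```python
import random
import time


def generate_traffic(send, deployments, rounds=10, probability=0.2,
                     pause=0.0, rng=random):
    """Call send(name) at random for a fixed number of rounds.

    Returns the number of requests that were sent.
    """
    sent = 0
    for _ in range(rounds):
        for name in deployments:
            if rng.random() < probability:
                send(name)
                sent += 1
        time.sleep(pause)  # optional pause between rounds
    return sent


# Record the calls in a list instead of hitting a real deployment.
calls = []
count = generate_traffic(calls.append, ["snet", "dnet", "mnet"],
                         rounds=5, probability=1.0)
print(count)  # prints 15
```

In the notebook, `send` could be a small function that wraps the `sc.predict(...)` call shown above.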

### You now have a one line solution to deploy models from PyTorch Hub 😎 

## Where do you go from here?

* You can try to build more complex graphs like our [re-usable NLP pipelines with Kubeflow Example]()
* Deploy an A/B test of the models with our [Canary deployments tutorial]()
* Build a multi-armed bandit routing optimizer across various models, like the one we built with [R, SKlearn & Tensorflow models]()
* Check out the [rest of our tutorials!]()