# Example First Deployment of a Model on Seldon

This demo will show you how to describe runtime inference graphs to deploy on seldon-core. We will take advantage of pre-wrapped dummy models that are ready to be managed by seldon-core. The [next notebook](https://github.com/SeldonIO/seldon-core-launcher/blob/master/seldon-core/getting_started/wrap-model/wrap_model.ipynb) will show you how to wrap your own runtime models for seldon.

## Prerequisites

 * You have a running cluster installed via the Google Marketplace with all the defaults including:
    * NodePort for the Seldon API OAuth Gateway. This gateway is used to connect your business apps to your running models via REST and gRPC.
    * The cluster is running in the default namespace
 * You have kubectl installed and authenticated with the cluster
 
 You will need to install some software for this demo:
 
 
 * Install the [requests library](http://docs.python-requests.org/en/master/) to allow you to make REST calls to the Seldon API gateway.
 * Install [python grpc tools](https://grpc.io/docs/quickstart/python.html) to allow you to make gRPC calls to the Seldon API gateway.
 * Install [graphviz](https://pypi.org/project/graphviz/), a package to display graphs.
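
Before continuing, it can help to confirm that the three packages are importable from the notebook kernel. The following is a minimal sketch, not part of the original demo: the helper name `check_packages` is hypothetical, and it assumes the packages are imported under the names `requests`, `grpc`, and `graphviz`.

```python
import importlib.util


def check_packages(names):
    """Return a dict mapping each top-level module name to True if it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Import names assumed for the three prerequisites listed above.
    for name, found in check_packages(["requests", "grpc", "graphviz"]).items():
        print(name, "OK" if found else "MISSING")
```

Any package reported as missing needs to be installed into the same environment the notebook kernel runs in.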


## Set up REST and gRPC methods

**Ensure you port-forward the Seldon api-server REST and gRPC ports.** Do this in separate terminals outside the notebook:

REST:
```
kubectl port-forward $(kubectl get pods -l app=seldon-apiserver-container-app -o jsonpath='{.items[0].metadata.name}') 8002:8080
```

gRPC:
```
kubectl port-forward $(kubectl get pods -l app=seldon-apiserver-container-app -o jsonpath='{.items[0].metadata.name}') 8003:5000
```
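
Once both port-forwards are running, a quick reachability check can save some debugging later. This is a minimal sketch using only the standard library. The helper name `port_is_open` is hypothetical, and a successful TCP connection only shows that something is listening on the port, not that the API server is healthy.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Local ports taken from the two port-forward commands above.
    for label, port in [("REST", 8002), ("gRPC", 8003)]:
        state = "reachable" if port_is_open("localhost", port) else "not reachable"
        print(label, "port", port, state)
```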

In [None]:
import requests
from requests.auth import HTTPBasicAuth
from proto import prediction_pb2
from proto import prediction_pb2_grpc
import grpc
try:
    from commands import getoutput # python 2
except ImportError:
    from subprocess import getoutput # python 3

# Local ports opened by the two port-forward commands above
API_HTTP="localhost:8002"
API_GRPC="localhost:8003"

def get_token():
    """Fetch an OAuth token from the API gateway using the deployment's key and secret."""
    payload = {'grant_type': 'client_credentials'}
    response = requests.post(
                "http://"+API_HTTP+"/oauth/token",
                auth=HTTPBasicAuth('oauth-key', 'oauth-secret'),
                data=payload)
    print(response.text)
    token =  response.json()["access_token"]
    return token

def rest_request():
    """Send a 2x2 tensor to the REST predictions endpoint and print the reply."""
    token = get_token()
    headers = {'Authorization': 'Bearer '+token}
    payload = {"data":{"names":["a","b"],"tensor":{"shape":[2,2],"values":[0,0,1,1]}}}
    response = requests.post(
                "http://"+API_HTTP+"/api/v0.1/predictions",
                headers=headers,
                json=payload)
    print(response.text)

def grpc_request():
    """Send a 3x2 tensor to the gRPC Predict method and print the reply."""
    token = get_token()
    datadef = prediction_pb2.DefaultData(
            names = ["a","b"],
            tensor = prediction_pb2.Tensor(
                shape = [3,2],
                values = [1.0,1.0,2.0,3.0,4.0,5.0]
                )
            )
    request = prediction_pb2.SeldonMessage(data = datadef)
    channel = grpc.insecure_channel(API_GRPC)
    stub = prediction_pb2_grpc.SeldonStub(channel)
    # The token is passed as call metadata rather than as an HTTP header
    metadata = [('oauth_token', token)]
    response = stub.Predict(request=request,metadata=metadata)
    print(response)


## An Example Model
A simple example is shown below, using a single prepackaged model for illustration. The spec contains a set of predictors, each of which contains ***componentSpecs***, a list of Kubernetes [PodTemplateSpec](https://kubernetes.io/docs/api-reference/v1.9/#podtemplatespec-v1-core) objects, alongside a ***graph***, which describes how the components fit together.

In [None]:
from visualizer import get_graph
get_graph("model.json")

The simple graph we will deploy has a single model "classifier" which is exposed via REST. The actual JSON for this representation is shown below:

In [None]:
!pygmentize ./model.json

Note that the PodTemplateSpec inside componentSpecs in this case includes a Kubernetes resource request for 1 MB of memory. You can also add volumes and other requirements your model needs to run successfully.
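
To make the relationship between ***componentSpecs*** and ***graph*** concrete, here is a minimal sketch that walks a predictor's graph and checks that every node has a matching container. The dictionary is a hypothetical, trimmed-down predictor written for this illustration, not the contents of `model.json`. Field names such as `children` and `containers`, the image name, and both helper names are assumptions about the spec layout.

```python
def graph_node_names(node):
    """Recursively collect the names of a graph node and all its children."""
    names = [node["name"]]
    for child in node.get("children", []):
        names.extend(graph_node_names(child))
    return names


def container_names(predictor):
    """Collect container names from every PodTemplateSpec in componentSpecs."""
    return [
        container["name"]
        for spec in predictor.get("componentSpecs", [])
        for container in spec["spec"]["containers"]
    ]


# Hypothetical predictor for illustration only (not copied from model.json).
example_predictor = {
    "componentSpecs": [
        {"spec": {"containers": [{"name": "classifier", "image": "example/classifier:0.1"}]}}
    ],
    "graph": {"name": "classifier", "type": "MODEL", "children": []},
}

nodes = graph_node_names(example_predictor["graph"])
missing = [n for n in nodes if n not in container_names(example_predictor)]
print("graph nodes:", nodes)
print("nodes without a container:", missing)
```

A graph node implemented by a built-in Seldon component would have no container of its own, so a non-empty `missing` list is not necessarily an error.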

## Create Seldon Deployment

Deploy the runtime graph to Kubernetes.

In [None]:
!kubectl apply -f ./model.json

In [None]:
!kubectl get seldondeployments

Get the status of the SeldonDeployment. **When ready, `replicasAvailable` should be 1 for all components**.

In [None]:
!kubectl get seldondeployments seldon-deployment-example -o jsonpath='{.status}'
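
Rather than reading the status by eye, the readiness check can be scripted. The sketch below is one possible approach under stated assumptions: it works on a status that has already been parsed into a Python dictionary (for example from `kubectl get seldondeployments seldon-deployment-example -o json`), and it searches for `replicasAvailable` fields recursively instead of assuming an exact layout. The helper names and the sample status are hypothetical.

```python
import json


def find_replicas_available(obj):
    """Recursively collect every value stored under a 'replicasAvailable' key."""
    found = []
    if isinstance(obj, dict):
        for key, value in obj.items():
            if key == "replicasAvailable":
                found.append(value)
            else:
                found.extend(find_replicas_available(value))
    elif isinstance(obj, list):
        for item in obj:
            found.extend(find_replicas_available(item))
    return found


def is_ready(status, expected=1):
    """True if at least one replicasAvailable field exists and all equal expected."""
    values = find_replicas_available(status)
    return bool(values) and all(v == expected for v in values)


# Hypothetical status document, for illustration only.
sample_status = json.loads(
    '{"predictorStatus": [{"name": "example", "replicas": 1, "replicasAvailable": 1}]}'
)
print(is_ready(sample_status))
```

Requiring at least one `replicasAvailable` field means an empty or not-yet-populated status is treated as not ready.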

## Get predictions

#### REST Request
We will get an OAuth token using the key and secret we specified in the graph above and then call the REST endpoint of the API gateway with some sample data.

In [None]:
rest_request()
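
The payload sent by `rest_request()` is a flattened tensor plus its shape. As a minimal sketch, a hypothetical helper `make_tensor_payload` (not part of Seldon or this notebook) could build that structure from a list of rows and check that each row matches the feature names. It assumes the values are laid out row by row.

```python
def make_tensor_payload(names, rows):
    """Build a {"data": {"names", "tensor"}} dict from equally sized rows."""
    n_cols = len(names)
    for row in rows:
        if len(row) != n_cols:
            raise ValueError("each row must have %d values" % n_cols)
    # Flatten row by row so that values has len(rows) * n_cols entries.
    values = [value for row in rows for value in row]
    return {
        "data": {
            "names": list(names),
            "tensor": {"shape": [len(rows), n_cols], "values": values},
        }
    }


payload = make_tensor_payload(["a", "b"], [[0, 0], [1, 1]])
print(payload)
```

The resulting dictionary has the same structure as the hard-coded payload in `rest_request()` and could be passed to `requests.post` through the `json` argument.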

#### gRPC Request
We will get an OAuth token using the key and secret we specified in the graph above and then call the gRPC endpoint of the API gateway with some sample data.

In [None]:
grpc_request()

## Update deployment to AB test

We will change the deployment to add an AB test deployment. This illustrates:

 * Updating a deployment with no downtime.
 * Creating a more complex runtime graph.
 

In [None]:
get_graph("random_ab_test.json")

As can be seen above, we now have two models, both serving REST requests, attached behind a built-in Seldon component: a random AB test router that splits traffic between the two models.

In [None]:
!pygmentize ./random_ab_test.json

In [None]:
!kubectl apply -f random_ab_test.json

Check the status of the deployment. Note: **You might need to run this several times until `replicasAvailable` is 1 for both predictors**.

In [None]:
!kubectl get seldondeployments seldon-deployment-example -o jsonpath='{.status}'

In [None]:
!kubectl get pods

#### REST Request
We should now have updated our deployment with the new model while the previous one was still running. We can test again using the REST API. Note that the "routing" metadata returned shows which route our request travelled, either "0" or "1", for the two runtime graphs we have created with separate models.

In [None]:
rest_request()
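
To see the traffic split, one could send several requests and tally the routes. The sketch below works on already-parsed response dictionaries and assumes the route appears under `meta`, then `routing`, as a mapping from router name to route index. That layout, the router name `example-router`, and the helper name `tally_routes` are assumptions for illustration.

```python
from collections import Counter


def tally_routes(responses):
    """Count how often each (router, route) pair appears in parsed responses."""
    counts = Counter()
    for response in responses:
        routing = response.get("meta", {}).get("routing", {})
        for router, route in routing.items():
            counts[(router, route)] += 1
    return counts


# Hypothetical parsed responses, for illustration only.
sample_responses = [
    {"meta": {"routing": {"example-router": 0}}},
    {"meta": {"routing": {"example-router": 1}}},
    {"meta": {"routing": {"example-router": 0}}},
]
for (router, route), count in sorted(tally_routes(sample_responses).items()):
    print(router, "route", route, "->", count, "requests")
```

Using this with the notebook's `rest_request()` would require that function to return `response.json()` rather than only printing the response text.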

#### gRPC Request

In [None]:
grpc_request()

## Tear Down

In [None]:
!kubectl delete -f random_ab_test.json

## Next Steps

Congratulations, you have deployed a model and interacted with it over REST and gRPC. You have then deployed an AB test between two models to update your runtime graph with no downtime.

For the next steps you will want to [wrap your own machine learning models to deploy onto seldon-core](https://github.com/SeldonIO/seldon-core-launcher/blob/master/seldon-core/getting_started/wrap-model/wrap_model.ipynb).