# Summary

This demo shows how to:
* Deploy a model with Seldon and access it through an ingress gateway using the Seldon client, curl, and other methods
* Use a Canary Rollout with Seldon and Istio to split predictions across multiple models
* Protect your model endpoints behind Dex, adding a requirement for authentication to access the model through the external gateway

This demo is modified from [this tutorial](https://docs.seldon.io/projects/seldon-core/en/latest/examples/istio_canary.html).

# Setup

Bootstrap a Juju controller on a Kubernetes cluster, for example on MicroK8s as shown [here](https://juju.is/docs/olm/microk8s).

Deploy the Seldon and Istio charms, defining a default-gateway for Istio and providing the name of that gateway to Seldon:

In [1]:
gateway_name = "seldon-gateway"
model_name = "seldon-model"

In [None]:
!juju add-model $model_name

!juju deploy istio-gateway istio-ingressgateway --trust --kind=ingress
!juju deploy istio-pilot --trust --config default-gateway=$model_name/$gateway_name
!juju relate istio-pilot:istio-pilot istio-ingressgateway:istio-pilot

!juju deploy seldon-core --config istio-gateway=$model_name/$gateway_name

Wait for everything to deploy and settle, then get the gateway IP:

In [3]:
# `juju wait` requires the juju-wait plugin: sudo snap install juju-wait
!juju wait -vw

DEBUG:root:istio-ingressgateway/2 workload status is waiting since 2022-02-28 19:18:05+00:00
DEBUG:root:istio-pilot/2 workload status is maintenance since 2022-02-28 19:18:07+00:00
DEBUG:root:istio-pilot/2 juju agent status is executing since 2022-02-28 19:18:06+00:00
DEBUG:root:istio-ingressgateway/2 workload status is waiting since 2022-02-28 19:18:05+00:00
DEBUG:root:istio-ingressgateway/2 workload status is waiting since 2022-02-28 19:18:12+00:00
DEBUG:root:istio-pilot/2 workload status is maintenance since 2022-02-28 19:18:28+00:00
INFO:root:All units idle since 2022-02-28 19:18:33.363252Z (dex-auth/5, istio-ingressgateway/2, istio-pilot/2, oidc-gatekeeper/1, seldon-controller-manager/1)
DEBUG:root:dex-auth is lead by dex-auth/5
DEBUG:root:istio-ingressgateway is lead by istio-ingressgateway/2
DEBUG:root:istio-pilot is lead by istio-pilot/2
DEBUG:root:oidc-gatekeeper is lead by oidc-gatekeeper/1
DEBUG:root:seldon-controller-manager is lead by seldon-controller-manager/1


In [4]:
gateway_ip = !kubectl get svc istio-ingressgateway -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
gateway_ip = gateway_ip[0]
print(f"Our Istio ingressgateway ip: {gateway_ip}")

Our Istio ingressgateway ip: 10.64.140.43
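
If the load balancer has not assigned an address yet, the lookup above may return nothing, and later cells would then build URLs around an empty host. The sketch below is one way to fail early. It is not part of the original demo: `first_ip` is a hypothetical helper, and the list passed to it stands in for the list-like value that IPython's `!command` capture returns.

```python
import ipaddress


def first_ip(lines):
    """Return the first captured output line as a validated IP address string.

    `lines` stands in for the captured output of a shell command.
    Raises ValueError if nothing usable was captured.
    """
    candidate = lines[0].strip() if lines else ""
    if not candidate:
        raise ValueError("No gateway IP returned - has the load balancer assigned one yet?")
    # ip_address raises ValueError for anything that is not a valid IPv4/IPv6 address
    return str(ipaddress.ip_address(candidate))


print(first_ip(["10.64.140.43"]))
```

In the notebook this could wrap the captured value, for example `gateway_ip = first_ip(gateway_ip)`, instead of indexing it directly.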


## Helpers

In [5]:
from IPython.core.magic import register_line_cell_magic

@register_line_cell_magic
def writetemplate(line, cell):
    with open(line, "w") as f:
        f.write(cell.format(**globals()))
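
Because `writetemplate` renders the cell with `str.format`, any literal braces in a template must be doubled, otherwise they are read as placeholders. The sketch below shows the same substitution outside the magic. The `variables` dict stands in for the notebook globals, and the `annotations: {}` field is included only to illustrate brace escaping; it is not used by this demo.

```python
# Stand-in for the notebook globals that the magic reads
variables = {"seldon_deployment_name": "sd-example"}

# "{{}}" renders as a literal "{}"; "{seldon_deployment_name}" is substituted
template = "metadata:\n  name: {seldon_deployment_name}\n  annotations: {{}}\n"

# Same substitution the magic performs with cell.format(**globals())
rendered = template.format(**variables)
print(rendered)
```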

# Deploy a Seldon Model and Access it through an Ingress Gateway

### Define and deploy the model

Let's deploy a mock classifier provided by Seldon, which takes ndarrays of data and returns results.

In [6]:
seldon_deployment_name = "sd-example"

In [7]:
%%writetemplate model.yaml
apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  labels:
    app: seldon
  name: {seldon_deployment_name}
spec:
  name: example
  predictors:
  - componentSpecs:
    - spec:
        containers:
        - image: seldonio/mock_classifier:1.7.0
          imagePullPolicy: IfNotPresent
          name: classifier
        terminationGracePeriodSeconds: 1
    graph:
      children: []
      endpoint:
        type: REST
      name: classifier
      type: MODEL
    name: main
    replicas: 1

In [8]:
!kubectl create -f model.yaml

seldondeployment.machinelearning.seldon.io/sd-example created


In [9]:
jsonpath="'{.items[0].metadata.name}'"
deployment_name = !kubectl get deploy -l seldon-deployment-id=$seldon_deployment_name -o jsonpath=$jsonpath
deployment_name = deployment_name[0]

In [10]:
!kubectl rollout status deploy/$deployment_name

Waiting for deployment "sd-example-main-0-classifier" rollout to finish: 0 of 1 updated replicas are available...
deployment "sd-example-main-0-classifier" successfully rolled out


### Predict using the deployed model

Seldon exposes a REST (and gRPC) endpoint for predictions.  Below are a few examples of how to request predictions from the model.

#### Using the Seldon Client

In [11]:
from seldon_core.seldon_client import SeldonClient

sc = SeldonClient(
    gateway="istio",
    deployment_name=seldon_deployment_name,
    namespace=model_name,
    gateway_endpoint=gateway_ip
)


"Predict" with our model.  If we don't specify any data in the `.predict()` call, Seldon sends a single dummy value to our "model".  

If successful, the returned `r.response` should look something like:
```
{'data': {'names': ['proba'], 'tensor': {'shape': [1, 1], 'values': [SOME_VALUE]}}, 'meta': {'requestPath': {'classifier': 'seldonio/mock_classifier:1.7.0'}}}
```

where `SOME_VALUE` is the returned prediction.

In [12]:
r = sc.predict(gateway="istio", transport="rest")
if r.success:
    print("Congratulations, prediction returned response:")
    print(r.response)
else:
    raise ValueError("Something went wrong - is the gateway set up correctly?")

Congratulations, prediction returned response:
{'data': {'names': ['proba'], 'tensor': {'shape': [1, 1], 'values': [0.0593137795410493]}}, 'meta': {'requestPath': {'classifier': 'seldonio/mock_classifier:1.7.0'}}}
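
The responses in this demo carry predictions either as a `tensor` (flat `values` plus a `shape`, as in the client output above) or as a nested `ndarray` (as in the curl examples). The sketch below shows one way to normalise both into a list of rows. `extract_predictions` is a hypothetical helper, not part of the Seldon client, and it assumes a two-dimensional tensor shape.

```python
def extract_predictions(response):
    """Return predictions as a list of rows from a Seldon-style response dict."""
    data = response["data"]
    if "ndarray" in data:
        # Already nested: one inner list per input row
        return data["ndarray"]
    if "tensor" in data:
        tensor = data["tensor"]
        rows, cols = tensor["shape"]  # assumes a 2-D shape such as [1, 1]
        values = tensor["values"]
        # Slice the flat value list into `rows` chunks of `cols` values each
        return [values[i * cols:(i + 1) * cols] for i in range(rows)]
    raise KeyError("response has neither 'ndarray' nor 'tensor' data")


example = {"data": {"names": ["proba"], "tensor": {"shape": [1, 1], "values": [0.0593137795410493]}}}
print(extract_predictions(example))
```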


#### Using curl

Through `curl`, you can access the classifier deployed by Seldon in a number of ways:
* directly to the service:

In [13]:
jsonpath = "'{.items[0].spec.clusterIP}'"
classifier_svc_ip=!kubectl get svc -l seldon-deployment-id=$seldon_deployment_name,seldon.io/model=true -o jsonpath=$jsonpath
classifier_svc_ip = classifier_svc_ip[0]
content_type = "'Content-Type: application/json'"
data = '\'{"data": { "ndarray": [[1]]}}\''

!curl $classifier_svc_ip:9000/predict -X POST -H $content_type -d $data

{"data":{"names":["proba"],"ndarray":[[0.12823373759251927]]},"meta":{"requestPath":{"classifier":"seldonio/mock_classifier:1.7.0"}}}


* via the ingress:

In [14]:
!curl $gateway_ip/seldon/$model_name/$seldon_deployment_name/api/v1.0/predictions -X POST -H $content_type -d $data

{"data":{"names":["proba"],"ndarray":[[0.12823373759251927]]},"meta":{"requestPath":{"classifier":"seldonio/mock_classifier:1.7.0"}}}


Similarly, you could use any method that lets you hit a REST endpoint to make predictions (for example, using the `requests` package).
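
As a minimal sketch of the `requests` route, the helpers below build the same ingress URL and payload used in the curl call above and POST them. The function names and the `timeout` default are assumptions made for illustration. `requests` is imported inside `predict` so that the URL builder can be used without it.

```python
def build_prediction_request(gateway_ip, namespace, deployment_name, rows):
    """Return the (url, payload) pair for a REST prediction through the ingress."""
    url = f"http://{gateway_ip}/seldon/{namespace}/{deployment_name}/api/v1.0/predictions"
    payload = {"data": {"ndarray": rows}}
    return url, payload


def predict(gateway_ip, namespace, deployment_name, rows, timeout=10):
    """POST the payload to the ingress and return the decoded JSON response."""
    import requests  # third-party package; only needed when a request is actually sent

    url, payload = build_prediction_request(gateway_ip, namespace, deployment_name, rows)
    response = requests.post(url, json=payload, timeout=timeout)
    response.raise_for_status()  # surface HTTP errors instead of parsing an error body
    return response.json()


url, payload = build_prediction_request("10.64.140.43", "seldon-model", "sd-example", [[1]])
print(url)
```

With the variables defined earlier in this notebook, a call would look like `predict(gateway_ip, model_name, seldon_deployment_name, [[1]])`.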

# Deploy a Canary Model

Sometimes it is useful to serve multiple models from the same `SeldonDeployment` simultaneously.  For example, when releasing a new version of a model, we may first want to track its performance in the wild using a small portion of our traffic.  This can be done by adding multiple predictors to our `SeldonDeployment`, as in the example below.

### Define and deploy the model

In this `SeldonDeployment`, we define two predictors, `main` and `canary`, with a traffic split of 75:25, respectively.  Note that in this example both the `main` and `canary` predictors use the same model (same image, configuration, etc.), but in practice these would be different models.

In [15]:
seldon_deployment_canary_name = "sd-example-canary"

In [16]:
%%writetemplate canary.yaml
apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  labels:
    app: seldon
  name: {seldon_deployment_canary_name}
spec:
  name: canary-example
  predictors:
  - componentSpecs:
    - spec:
        containers:
        - image: seldonio/mock_classifier:1.7.0
          imagePullPolicy: IfNotPresent
          name: classifier
        terminationGracePeriodSeconds: 1
    graph:
      children: []
      endpoint:
        type: REST
      name: classifier
      type: MODEL
    name: main
    replicas: 1
    traffic: 75
  - componentSpecs:
    - spec:
        containers:
        - image: seldonio/mock_classifier:1.7.0
          imagePullPolicy: IfNotPresent
          name: classifier
        terminationGracePeriodSeconds: 1
    graph:
      children: []
      endpoint:
        type: REST
      name: classifier
      type: MODEL
    name: canary
    replicas: 1
    traffic: 25

In [17]:
!kubectl create -f canary.yaml

seldondeployment.machinelearning.seldon.io/sd-example-canary created


In [18]:
jsonpath="'{.items[0].metadata.name}'"
deployment_name = !kubectl get deploy -l seldon-deployment-id=$seldon_deployment_canary_name -o jsonpath=$jsonpath
deployment_name = deployment_name[0]

In [19]:
!kubectl rollout status deploy/$deployment_name

Waiting for deployment "sd-example-canary-main-0-classifier" rollout to finish: 0 of 1 updated replicas are available...
deployment "sd-example-canary-main-0-classifier" successfully rolled out


### Predict using the deployed models, observing the load split between them

As users of the `SeldonDeployment`, we request/receive predictions exactly the same as above.  To the user there is no difference, but on the backend we can see that Seldon (using Istio) has split the load between our `main` and `canary` models.
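
To get a feel for what a 75:25 split looks like over a small number of requests, the sketch below simulates routing in which each request is sent to a predictor independently at random. This is only a model for building intuition; it makes no claim about how Istio actually distributes requests, and `simulate_split` is a hypothetical helper.

```python
import random
from collections import Counter


def simulate_split(n_requests, weights, seed=0):
    """Route n_requests independently at random according to percentage weights."""
    rng = random.Random(seed)  # seeded so repeated runs give the same counts
    names = list(weights)
    picks = rng.choices(names, weights=[weights[name] for name in names], k=n_requests)
    return Counter(picks)


counts = simulate_split(100, {"main": 75, "canary": 25})
print(counts["main"], counts["canary"])
```

Under this model the counts for 100 requests will rarely be exactly 75 and 25, so small deviations in the real deployment are expected.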

#### Using the Seldon Client

We can predict using the `SeldonDeployment` multiple times to see how the load is split between the `main` and `canary` predictors.

In [20]:
from seldon_core.seldon_client import SeldonClient

sc = SeldonClient(
    gateway="istio",
    deployment_name=seldon_deployment_canary_name,
    namespace=model_name,
    gateway_endpoint=gateway_ip
)

In [21]:
n = 100
for _ in range(n):
    r = sc.predict(gateway="istio", transport="rest")
    assert r.success, "Something went wrong - is the gateway set up correctly?"
print(f"Successfully completed {n} predictions")

Successfully completed 100 predictions


We can parse the logs of the `main` and `canary` pods to count how many times each was hit, demonstrating the traffic split:

In [22]:
jsonpath = "'{.items[0].metadata.name}'"
main_count = !kubectl logs $(kubectl get pod -lseldon-app=$seldon_deployment_canary_name-main -o jsonpath=$jsonpath) classifier | grep "root:predict" | wc -l
main_count = main_count[0]
print(f"Number of times main was hit: {main_count}")

Number of times main was hit: 76


In [23]:
jsonpath = "'{.items[0].metadata.name}'"
canary_count = !kubectl logs $(kubectl get pod -lseldon-app=$seldon_deployment_canary_name-canary -o jsonpath=$jsonpath) classifier | grep "root:predict" | wc -l
canary_count = canary_count[0]
print(f"Number of times canary was hit: {canary_count}")

Number of times canary was hit: 24


We should see a ratio roughly equal to the 75:25 traffic split we defined above.
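
One way to judge "roughly equal" is to compare the observed fraction with the configured one in units of standard error. The sketch below does this under the assumption that each request is routed independently, which is a simplification, and `split_deviation` is a hypothetical helper. The counts are passed as strings because that is how the captured `wc -l` output arrives in the cells above.

```python
import math


def split_deviation(main_count, canary_count, expected_main=0.75):
    """Return (observed main fraction, z-score against the expected fraction)."""
    main_count, canary_count = int(main_count), int(canary_count)  # counts arrive as strings
    total = main_count + canary_count
    observed = main_count / total
    # Standard error of a binomial proportion with `total` independent trials
    std_error = math.sqrt(expected_main * (1 - expected_main) / total)
    return observed, (observed - expected_main) / std_error


observed, z = split_deviation("76", "24")
print(f"main fraction = {observed:.2f}, z = {z:.2f}")
# prints: main fraction = 0.76, z = 0.23
```

A z-score well within about two standard errors, as here, is consistent with the configured split.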

# Accessing Models Protected behind Dex

If we protect our ingress with Dex and OIDC Gatekeeper, unauthenticated access will be rejected.  Below is a method for programmatically authenticating and hitting the prediction endpoints.

## Authentication Setup

By adding the `dex-auth` and `oidc-gatekeeper` charms and setting their credentials and `public-url` configurations, we can add authentication to our existing ingress gateway.

(note that this section uses information from [here](https://docs.seldon.io/projects/seldon-core/en/latest/examples/seldon_client.html))

In [24]:
username="admin"
password="admin"

In [None]:
!juju deploy dex-auth --trust --config static-username=$username --config static-password=$password --config public-url=http://$gateway_ip
!juju deploy oidc-gatekeeper --config public-url=http://$gateway_ip

!juju relate dex-auth istio-pilot
!juju relate dex-auth oidc-gatekeeper

!juju relate oidc-gatekeeper:ingress istio-pilot:ingress
!juju relate oidc-gatekeeper:ingress-auth istio-pilot:ingress-auth

In [26]:
!juju wait -vw

INFO:root:All units idle since 2022-02-28 19:24:51.829943Z (dex-auth/6, istio-ingressgateway/2, istio-pilot/2, oidc-gatekeeper/1, seldon-controller-manager/1)
DEBUG:root:dex-auth is lead by dex-auth/6
DEBUG:root:istio-ingressgateway is lead by istio-ingressgateway/2
DEBUG:root:istio-pilot is lead by istio-pilot/2
DEBUG:root:oidc-gatekeeper is lead by oidc-gatekeeper/1
DEBUG:root:seldon-controller-manager is lead by seldon-controller-manager/1


To authenticate and obtain an authorization cookie, we can then either:
* use the `authservice_session` cookie from an authenticated browser session
* use the `kubeflow_login` helper below.  For the URL passed to the helper, use any valid URL that has a VirtualService passing traffic through (such as the models we are serving)

In [27]:
# Helpers for authentication

import logging
import os
import requests
from urllib.parse import parse_qs, urlparse

def kubeflow_login(url, username=None, password=None):
    """Completes the dex/oidc login flow, returning the authservice_session cookie."""
    parsed_url = urlparse(url)
    url_base = f"{parsed_url.scheme}://{parsed_url.netloc}"
    
    data = {
        'login': username or os.getenv('KUBEFLOW_USERNAME', None),
        'password': password or os.getenv('KUBEFLOW_PASSWORD', None),
    }

    if not data['login'] or not data['password']:
        raise ValueError(
            "Missing login credentials - credentials must be passed or defined"
            " in KUBEFLOW_USERNAME/KUBEFLOW_PASSWORD environment variables."
        )

    # GET on url redirects us to the dex_login_url including state for this session
    response = requests.get(
        url,
        verify=False,
        allow_redirects=True
    )
    validate_response_status_code(response, [200], f"Failed to connect to url site '{url}'.")
    dex_login_url = response.url
    logging.debug(f"Redirected to dex_login_url of '{dex_login_url}'")
    
    # Log in, retrieving the redirection to the approval page
    response = requests.post(
        dex_login_url,
        data=data,
        verify=False,
        allow_redirects=False
    )
    validate_response_status_code(
        response, [303], "Failed to log into dex - are your credentials correct?"
    )
    approval_endpoint = response.headers['location']
    dex_approval_url = url_base + approval_endpoint
    logging.debug(f"Logged in with dex_approval_url of '{dex_approval_url}'")
    
    # Get the OIDC approval code and state
    response = requests.get(
        dex_approval_url,
        verify=False,
        allow_redirects=False
    )
    validate_response_status_code(
        response, [303], f"Failed to connect to dex_approval_url '{dex_approval_url}'."
    )
    authservice_endpoint = response.headers['location']
    authservice_url = url_base + authservice_endpoint
    logging.debug(f"Got authservice_url of '{authservice_url}'")

    
    # Access DEX OIDC path to generate session cookie
    response = requests.get(
        authservice_url,
        verify=False,
        allow_redirects=False,
    )
    validate_response_status_code(
        response, [302], f"Failed to connect to authservice_url '{authservice_url}'."
    )
    
    return response.cookies['authservice_session']
    
    
def validate_response_status_code(response, expected_codes: list, error_message: str = ""):
    """Validates the status code of a response, raising a ValueError with message"""
    if error_message:
        error_message += "  "
    if response.status_code not in expected_codes:
        raise ValueError(
            f"{error_message}"
            f"Got response {response.status_code}, expected one of {expected_codes}"
        )
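
The helper above builds each follow-up URL as `url_base + location`, which assumes the `Location` header always holds a path relative to the gateway. If a deployment returns absolute URLs instead, that concatenation would produce a malformed address. The sketch below shows an alternative that resolves the header against the request URL with `urllib.parse.urljoin`, which handles both cases. `resolve_redirect` is a hypothetical helper, and the paths in the example are made up for illustration.

```python
from urllib.parse import urljoin


def resolve_redirect(request_url, location):
    """Resolve a Location header against the URL of the request that produced it."""
    return urljoin(request_url, location)


# Relative Location: host is taken from the request URL
print(resolve_redirect("http://10.64.140.43/dex/auth/local?req=abc", "/dex/approval?req=abc"))
# Absolute Location: returned unchanged
print(resolve_redirect("http://10.64.140.43/dex/auth", "http://10.64.140.43/callback?code=x"))
```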


In [28]:
url = f"http://{gateway_ip}/seldon/{model_name}/{seldon_deployment_canary_name}/"
authservice_cookie = kubeflow_login(url, username=username, password=password)

### Predictions using the Seldon Client

Based on [this example](https://docs.seldon.io/projects/seldon-core/en/latest/examples/seldon_client.html), the client appears to have built-in support only for an X-Auth token, not for credentials passed by cookie.  We can instead pass the cookie via a header.

In [29]:
sc = SeldonClient(
    gateway="istio",
    deployment_name=seldon_deployment_canary_name,
    namespace=model_name,
    gateway_endpoint=gateway_ip,
)

In [30]:
r = sc.predict(
    gateway="istio",
    transport="rest",
    # Use the cookie returned by kubeflow_login rather than a hard-coded session value
    headers={"Cookie": f"authservice_session={authservice_cookie}"},
)

if r.success:
    print("Congratulations, prediction returned response:")
    print(r.response)
else:
    raise ValueError("Something went wrong - is the gateway set up correctly?")

Congratulations, prediction returned response:
{'data': {'names': ['proba'], 'tensor': {'shape': [1, 1], 'values': [0.08470067019190682]}}, 'meta': {'requestPath': {'classifier': 'seldonio/mock_classifier:1.7.0'}}}


### Predictions using requests

Using the cookie obtained above, predictions can also be made using packages such as `requests`:

In [31]:
import requests
cookies = {"authservice_session": authservice_cookie}
prediction_endpoint = f"http://{gateway_ip}/seldon/{model_name}/{seldon_deployment_canary_name}/api/v1.0/predictions"

In [32]:
r = requests.post(
    url=prediction_endpoint,
    cookies=cookies,
    json={"data": {"ndarray": [[1]]}},
)
print(f"response = {r.content}")

response = b'{"data":{"names":["proba"],"ndarray":[[0.12823373759251927]]},"meta":{"requestPath":{"classifier":"seldonio/mock_classifier:1.7.0"}}}\n'


### Predictions using curl

We can also make predictions using curl:

In [33]:
content_type = "'Content-Type: application/json'"
authservice_cookie_curl = f"'Cookie: authservice_session={authservice_cookie}'"
data = '\'{"data": { "ndarray": [[1]]}}\''

!curl $gateway_ip/seldon/$model_name/$seldon_deployment_name/api/v1.0/predictions -X POST -H $content_type -H $authservice_cookie_curl -d $data

{"data":{"names":["proba"],"ndarray":[[0.12823373759251927]]},"meta":{"requestPath":{"classifier":"seldonio/mock_classifier:1.7.0"}}}
