# Deploying the Flight Delay Model

In this notebook, we deploy the model we trained to predict flight delays, using [KFServing](https://www.kubeflow.org/docs/components/serving/kfserving/).

**Note:** This notebook requires access to a KFServing installation. See the [KFServing instructions](../kfserving.md) for details. If you run the pipeline on the Kubeflow Pipelines (KFP) runtime, also see the [readme instructions](../README.md) for the link to install KFP.

### Import required modules

Import and configure the required modules.

In [None]:
! pip install -q kfserving

In [None]:
import os
import numpy as np
import requests
# minio is installed as a dependency of kfserving
from minio import Minio

### Upload the model to object storage

Our notebook has access to the trained model file, which was exported by the previous pipeline phase. _However_, when using a Kubeflow Pipelines runtime, it is not possible to programmatically access the pipeline's object storage bucket. Relying on that bucket would also make the execution mechanics differ between local and KFP execution modes.

So, here we use a dedicated bucket for models in object storage, and upload the model file to it from the notebook execution environment. We then deploy the KFServing inference service using that object storage location.

In [None]:
# set up the minio client to access object storage buckets
os_url = os.environ.get('OS_URL', 'minio-service:9000')
access_key = os.environ.get('ACCESS_KEY_ID', 'minio')
secret_key = os.environ.get('SECRET_ACCESS_KEY', 'minio123')

mc = Minio(os_url,
           access_key=access_key,
           secret_key=secret_key,
           secure=False)

print('Current buckets:')
for b in mc.list_buckets():
    print('  ' + b.name)

In [None]:
# create a bucket to upload the model file to
# NOTE we delete any existing model file and re-create the bucket
model_bucket = os.environ.get('MODEL_BUCKET', 'models')
model_dir = os.environ.get('MODEL_DIR', 'models')
model_file = 'model.joblib'
model_path = '{}/{}'.format(model_dir, model_file)

# removing from a bucket that does not exist yet raises an error
# (for example on the first run), so check for the bucket first
if mc.bucket_exists(model_bucket):
    mc.remove_object(model_bucket, model_file)
    mc.remove_bucket(model_bucket)
mc.make_bucket(model_bucket)
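
The delete-and-recreate step above can also be wrapped in a helper that tolerates a missing bucket and empties a non-empty one before removing it. The following is a minimal sketch, not part of the original notebook: `reset_bucket` is a hypothetical helper name, and `InMemoryStore` is a hypothetical stand-in that mimics only the Minio client methods used here (`bucket_exists`, `list_objects`, `remove_object`, `remove_bucket`, `make_bucket`), so the example can be tried without a running object store.

```python
from types import SimpleNamespace


def reset_bucket(client, bucket_name):
    """Empty and re-create a bucket; also works if the bucket does not exist yet.

    `client` is any object exposing the Minio client methods used below.
    """
    if client.bucket_exists(bucket_name):
        # a bucket has to be empty before it can be removed
        for obj in client.list_objects(bucket_name, recursive=True):
            client.remove_object(bucket_name, obj.object_name)
        client.remove_bucket(bucket_name)
    client.make_bucket(bucket_name)


class InMemoryStore:
    """Hypothetical stand-in for a Minio client, for illustration only."""

    def __init__(self):
        self.buckets = {}

    def bucket_exists(self, bucket_name):
        return bucket_name in self.buckets

    def make_bucket(self, bucket_name):
        self.buckets[bucket_name] = set()

    def list_objects(self, bucket_name, recursive=False):
        names = sorted(self.buckets[bucket_name])
        return [SimpleNamespace(object_name=n) for n in names]

    def remove_object(self, bucket_name, object_name):
        self.buckets[bucket_name].discard(object_name)

    def remove_bucket(self, bucket_name):
        if self.buckets[bucket_name]:
            raise RuntimeError('bucket is not empty')
        del self.buckets[bucket_name]


store = InMemoryStore()
reset_bucket(store, 'models')                # first run: the bucket is simply created
store.buckets['models'].add('model.joblib')  # simulate an earlier upload
reset_bucket(store, 'models')                # later run: the old model file is cleared
```

With the real client, the call would presumably be `reset_bucket(mc, model_bucket)`.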

In [None]:
# upload the model file
file_stat = os.stat(model_path)
with open(model_path, 'rb') as data:
    mc.put_object(model_bucket, model_file, data, file_stat.st_size)

In [None]:
# check whether the model file is there
for o in mc.list_objects(model_bucket):
    print(o)

### Create the inference service

Next, we use the KFServing Python client to create the inference service.

**Note** the prerequisites (see the [KFServing instructions](../kfserving.md)):
1. A service account and related secret for the object storage service
1. Specify the custom `sklearnserver` Docker image
1. Patch the KFP `pipeline-runner` service account role to allow creating a KFServing `inferenceservice`

In [None]:
from kubernetes import client

from kfserving import KFServingClient
from kfserving import constants
from kfserving import utils
from kfserving import V1alpha2EndpointSpec
from kfserving import V1alpha2PredictorSpec
from kfserving import V1alpha2SKLearnSpec
from kfserving import V1alpha2InferenceServiceSpec
from kfserving import V1alpha2InferenceService
from kubernetes.client import V1ResourceRequirements

In [None]:
KFServing = KFServingClient()

In [None]:
# we need to use the 'kubeflow' namespace so that the KFP runner can create the inference service
namespace = 'kubeflow'
# this is the service account created for S3 access credentials
service_acc = 'kfserving-sa'
model_storage_uri = 's3://{}'.format(model_bucket)
model_name = 'flight-model'

In [None]:
api_version = constants.KFSERVING_GROUP + '/' + constants.KFSERVING_VERSION
default_endpoint_spec = V1alpha2EndpointSpec(
    predictor=V1alpha2PredictorSpec(
        sklearn=V1alpha2SKLearnSpec(
            storage_uri=model_storage_uri,
            resources=V1ResourceRequirements(
                requests={'cpu': '100m', 'memory': '1Gi'},
                limits={'cpu': '100m', 'memory': '1Gi'}
            )
        ),
        service_account_name=service_acc
    )
)

isvc = V1alpha2InferenceService(
    api_version=api_version,
    kind=constants.KFSERVING_KIND,
    metadata=client.V1ObjectMeta(name=model_name, namespace=namespace),
    spec=V1alpha2InferenceServiceSpec(default=default_endpoint_spec),
)
KFServing.create(isvc)

In [None]:
# Wait for the inference service to be ready
KFServing.get(model_name, namespace=namespace, watch=True, timeout_seconds=120)
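
When the notebook runs unattended, it can help to turn the wait above into an explicit yes/no result. The sketch below polls a resource dict until it reports readiness or a timeout expires. It assumes the inference service status carries a Knative-style `conditions` list containing an entry with `type` `Ready` and `status` `'True'`; that layout is an assumption and should be checked against the installed KFServing version. `is_ready` and `wait_until_ready` are hypothetical helper names.

```python
import time


def is_ready(isvc):
    """Return True if the resource dict reports a Ready condition with status 'True'."""
    status = isvc.get('status') or {}
    return any(
        c.get('type') == 'Ready' and c.get('status') == 'True'
        for c in status.get('conditions', [])
    )


def wait_until_ready(fetch, timeout_seconds=120, poll_seconds=5):
    """Call `fetch()` repeatedly until it returns a ready resource or time runs out.

    `fetch` is any zero-argument callable returning the resource as a dict.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        if is_ready(fetch()):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_seconds)


# Possible usage with the client from this notebook (not executed here):
# ok = wait_until_ready(lambda: KFServing.get(model_name, namespace=namespace))
```

Passing the fetch step in as a callable keeps the polling logic independent of the KFServing client, so it can be exercised with plain dicts.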

### Test the inference service

Once the inference service is running and available, we can send some test data to the service.

**Note** that when deployed into KFP, we need to use the cluster-local URL for the model. When executing locally, we assume that port-forwarding to the ingress gateway is enabled, and we set the `Host` header so that the gateway routes the request to the model service.
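
The two access paths can be summarized in a small helper that returns the URL and headers for a given mode. This is a sketch under assumptions: `prediction_target` is a hypothetical name, and the `localhost:8080` gateway address and `example.com` domain are the values this notebook uses, which may differ in other installations.

```python
def prediction_target(model_name, namespace, mode='local', cluster_url=None,
                      gateway='localhost:8080', domain='example.com'):
    """Return (url, headers) for a prediction request in the given execution mode."""
    if mode == 'local':
        # go through the port-forwarded ingress gateway; the Host header
        # tells the gateway which inference service the request is for
        url = 'http://{}/v1/models/{}:predict'.format(gateway, model_name)
        headers = {'Host': '{}.{}.{}'.format(model_name, namespace, domain)}
        return url, headers
    # in-cluster: use the address reported by the inference service as-is
    if not cluster_url:
        raise ValueError('cluster_url is required when mode is not "local"')
    return cluster_url, {}


url, headers = prediction_target('flight-model', 'kubeflow')
print(url)
print(headers)
```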

In [None]:
service = KFServing.get(model_name, namespace=namespace)

In [None]:
# load the 10 example rows from our test data, and display a few rows
examples = np.load('data/test_rows.npy')
examples[:3]

In [None]:
model_mode = os.environ.get('MODEL_MODE', 'local')
model_data = {"instances": examples.tolist()}
if model_mode == 'local':
    # executing locally, use the ingress gateway (we assume port-forwarding)
    url = f'http://localhost:8080/v1/models/{model_name}:predict'
    service_hostname = f'{model_name}.{namespace}.example.com'
    headers = {'Host': service_hostname}
    resp = requests.post(url=url, json=model_data, headers=headers)
else:
    # we are executing in KFP, use the cluster-local address
    url = service['status']['address']['url']
    resp = requests.post(url=url, json=model_data)

resp.json()
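
Before using the result, it can be worth checking that the response body has the expected shape. The sketch below assumes the service answers in the V1 prediction protocol, where the body is a JSON object with a `predictions` list holding one entry per submitted instance. `extract_predictions` is a hypothetical helper name.

```python
def extract_predictions(body, expected_count=None):
    """Return the predictions list from a response body, validating its shape."""
    if not isinstance(body, dict) or 'predictions' not in body:
        raise ValueError('unexpected response body: {!r}'.format(body))
    predictions = body['predictions']
    if expected_count is not None and len(predictions) != expected_count:
        raise ValueError('expected {} predictions, got {}'.format(
            expected_count, len(predictions)))
    return predictions


# example with a hand-written body in place of a live response
sample_body = {'predictions': [0, 1, 1]}
print(extract_predictions(sample_body, expected_count=3))
```

With a live response, one option is to call `resp.raise_for_status()` first and then `extract_predictions(resp.json(), expected_count=len(examples))`.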

### Delete the model service

Once we are done, we clean up the service.

In [None]:
KFServing.delete(model_name, namespace=namespace)

### Authors
This notebook was created by the [Center for Open-Source Data & AI Technologies](http://codait.org).

Copyright © 2019 IBM. This notebook and its source code are released under the terms of the MIT License.