# Using Google AI Platform for model deployment

AI Platform Prediction lets you host your models in Google Cloud and expose them as an API for online prediction. For TensorFlow models it also supports native batch prediction.
In this tutorial we will see how this works for our scikit-learn model.

Using AI Platform for model deployment has several advantages:
- you do not need to wrap your model in a web framework such as Flask
- you do not need to manage the infrastructure
- your API autoscales based on traffic

In this notebook you will see:
- how to structure a training application that uses Pipelines so that it can be deployed on AI Platform
- how to deploy your model
- how to create a model version
- how to get online predictions

# Data preparation for deployment

Using the same commands as in the training notebook, we create a bucket and upload our trained model to it.

In [None]:
PROJECT_NAME = 'norbert-liki-sandbox'
BUCKET_NAME = 'simonyi-nl-online-serving-training'  # MODIFY THIS. IT NEEDS TO BE GLOBALLY UNIQUE

In [None]:
!gsutil mb -l EU -p $PROJECT_NAME gs://$BUCKET_NAME/

During the upload we rename our model file: with the standard deployment process it must be named *model.pkl* or *model.joblib*. With a custom prediction routine you can choose your own file name.

In [None]:
!gsutil cp batch_prediction_src/trained_pipe.pkl gs://$BUCKET_NAME/model.pkl

In [None]:
!gsutil ls gs://$BUCKET_NAME/
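The file name tells the standard runtime how the model was exported; as far as we know, `model.pkl` is expected to be a `pickle` export and `model.joblib` a `joblib` export. As a sketch of the `joblib` route, the cell below saves a pipeline as `model.joblib` and reloads it as a sanity check before uploading. The toy `Pipeline` and its training data are hypothetical stand-ins for the trained pipeline from the training notebook. Keep in mind that the serving runtime ships its own scikit-learn version, so the export should be made with a compatible version.

```python
import os
import tempfile

import joblib
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical toy pipeline standing in for the trained pipeline.
pipe = Pipeline(steps=[("scale", StandardScaler()), ("clf", LogisticRegression())])
pipe.fit([[0.0], [1.0], [2.0], [3.0]], [0, 0, 1, 1])

export_dir = tempfile.mkdtemp()
# A joblib export is uploaded as model.joblib (a pickle export as model.pkl).
model_path = os.path.join(export_dir, "model.joblib")
joblib.dump(pipe, model_path)

# Reload the artifact to check it before copying it to the bucket with gsutil.
restored = joblib.load(model_path)
print(restored.predict([[0.0], [3.0]]))
```

The directory that holds the file is what later gets passed to the version as its model location.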

# Create trainer application package

In order to make our scikit-learn Pipeline deployable and servable on AI Platform, we need to modify our original training application. The reason is that a deployed model receives prediction data as JSON requests in which each instance's features are a plain list, without column names.

What does this mean for us?
- If you don't transform your data and just fit a model, there are no complications.
- If you would like to transform your data at prediction time using Pipelines, then your model and training Pipeline must work on lists (i.e. without column names). We will see how this is done.
- If you want to use custom Transformers in your Pipeline, you should use custom prediction routines.

The caveats above do not apply to TensorFlow models, because you can define separate training and serving input functions for them.

We created our model with these restrictions in mind. **In our transformation pipeline we have been using column indices, not column names.**

In [None]:
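from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Positional indices of the 9 numeric features (columns 0-8 of the feature list)
num_feats = list(range(0, 9))
# Positional indices of the 2 categorical features (NAME_TYPE_SUITE, NAME_INCOME_TYPE)
num_cats = [9, 10]
num_transform = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='mean')),
])

# Columns can also be selected by name, but only when the input is a DataFrame.
# The list-based instances sent to AI Platform have no column names, so this is not used below.
cat_feats = ['NAME_TYPE_SUITE', 'NAME_INCOME_TYPE']
cat_transform = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='constant', fill_value='missing')),
    ('onehot', OneHotEncoder(handle_unknown='ignore'))
])

preprocessor = ColumnTransformer(transformers=[
    ('num', num_transform, num_feats),
    ('cat', cat_transform, num_cats)  # index-based, so it also works on plain lists
])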
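Because the prediction service passes instances as plain lists, it is worth checking locally that an index-based `ColumnTransformer` accepts list input. The sketch below uses a made-up toy dataset (not the Home Credit data) with the same kinds of steps as above. Index-based selection transforms the lists, while the same transformer configured with column names raises a `ValueError`, because scikit-learn only supports string column selectors for DataFrames. The helper `make_preprocessor` is hypothetical.

```python
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Toy rows shaped like a prediction request: two numeric columns, one categorical.
rows = [
    [1.0, 10.0, "Working"],
    [float("nan"), 20.0, "Pensioner"],
    [3.0, 30.0, "Working"],
]


def make_preprocessor(num_cols, cat_cols):
    """Build a preprocessor that selects columns with the given selectors."""
    return ColumnTransformer(transformers=[
        ("num", SimpleImputer(strategy="mean"), num_cols),
        ("cat", Pipeline(steps=[
            ("imputer", SimpleImputer(strategy="constant", fill_value="missing")),
            ("onehot", OneHotEncoder(handle_unknown="ignore")),
        ]), cat_cols),
    ])


# Index-based selection works on plain lists.
by_index = make_preprocessor([0, 1], [2])
transformed = by_index.fit_transform(rows)
print(transformed.shape)  # (3, 4): 2 numeric columns + 2 one-hot columns

# Name-based selection needs a DataFrame, so it fails on lists.
by_name = make_preprocessor(["a", "b"], ["c"])
try:
    by_name.fit_transform(rows)
except ValueError as err:
    print("name-based selection failed:", err)
```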

# Serve the model
Once the model is successfully created and trained, you can serve it. A model can have different versions. To serve it, create a model and a version in AI Platform. This terminology can be confusing, because an AI Platform model resource is not a machine learning model on its own: it is a container for the versions of your machine learning model.
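The container relationship is visible in the resource names that the API uses: a version's name is nested under its model's name. A minimal sketch with hypothetical helper names and example values:

```python
def model_resource_name(project, model):
    """Full resource name of an AI Platform model, the container for versions."""
    return "projects/{}/models/{}".format(project, model)


def version_resource_name(project, model, version):
    """Full resource name of one version inside that model."""
    return "{}/versions/{}".format(model_resource_name(project, model), version)


print(version_resource_name("my-project", "serving_training", "v1"))
# projects/my-project/models/serving_training/versions/v1
```

The online prediction request in this notebook uses the `projects/.../models/.../versions/...` form. If only the model name is given, the request goes to the model's default version.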

Define the model and version names:

In [None]:
import time 

MODEL_NAME = "serving_training"
VERSION_NAME = "serving_training{}".format(int(time.time())); VERSION_NAME
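Appending a timestamp gives every run a fresh version name, which matters because a version name must be unique within its model. The sketch below also checks the name against the naming rule as we understand it (start with a letter, then only letters, digits and underscores, so no hyphens); treat that rule as an assumption to verify against the AI Platform documentation. The function name is hypothetical.

```python
import re
import time

# Assumed naming rule: a letter first, then letters, digits or underscores.
VALID_NAME = re.compile(r"^[A-Za-z][A-Za-z0-9_]*$")


def timestamped_version_name(prefix, timestamp=None):
    """Append a Unix timestamp so that each run creates a new version name."""
    if timestamp is None:
        timestamp = time.time()
    name = "{}{}".format(prefix, int(timestamp))
    if not VALID_NAME.match(name):
        raise ValueError("invalid version name: {}".format(name))
    return name


print(timestamped_version_name("serving_training", 1600000000))  # serving_training1600000000
```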

Create the model in AI Platform:

In [None]:
!gcloud ai-platform models create $MODEL_NAME --regions europe-west1 --project $PROJECT_NAME

Create a version that points to your model file in Cloud Storage:

In [None]:
!gcloud ai-platform versions create $VERSION_NAME \
  --model=$MODEL_NAME \
  --framework=scikit-learn \
  --origin=gs://$BUCKET_NAME/ \
  --python-version=3.7 \
  --runtime-version=1.15 \
  --project=$PROJECT_NAME
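The two gcloud commands above can also be issued from Python with the same `googleapiclient` discovery client that is used for prediction later on. The sketch below only mirrors the flags used above; the helper names are hypothetical, and the body fields (`regions`, `deploymentUri`, `framework`, `runtimeVersion`, `pythonVersion`) follow the `projects.models` and `projects.models.versions` resources of the AI Platform REST API (`ml`, `v1`).

```python
def model_body(model_name, region):
    """Request body for projects.models.create (same as --regions)."""
    return {"name": model_name, "regions": [region]}


def version_body(version_name, bucket_name, runtime_version="1.15", python_version="3.7"):
    """Request body for projects.models.versions.create, mirroring the gcloud flags."""
    return {
        "name": version_name,
        "deploymentUri": "gs://{}/".format(bucket_name),  # --origin
        "framework": "SCIKIT_LEARN",                        # --framework=scikit-learn
        "runtimeVersion": runtime_version,                  # --runtime-version
        "pythonVersion": python_version,                    # --python-version
    }


def create_model_and_version(service, project, model_name, version_name, bucket_name, region):
    """Create the model container, then a version inside it.

    `service` is a client from googleapiclient.discovery.build('ml', 'v1').
    Version creation returns a long-running operation, not the finished version.
    """
    service.projects().models().create(
        parent="projects/{}".format(project),
        body=model_body(model_name, region),
    ).execute()
    return service.projects().models().versions().create(
        parent="projects/{}/models/{}".format(project, model_name),
        body=version_body(version_name, bucket_name),
    ).execute()
```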

# Making online predictions

### Format data for prediction

Before you send an online prediction request, you must format your test data to prepare it for use by the AI Platform Prediction service. Make sure that the format of your input instances matches what your model expects.

We will use a few samples from our training data for demonstration purposes.
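The request body is JSON of the form `{"instances": [...]}`, with one list of feature values per instance. Standard JSON has no literal for NaN, while pandas represents missing values as NaN, so a quick local check before sending may save a failed request. In the sketch below the helper names are hypothetical and the rows are made-up toy values. How missing values should be encoded instead depends on how your pipeline handles them, so the sketch only detects them.

```python
import json
import math


def build_request_body(rows):
    """Wrap feature rows in the {"instances": [...]} body for online prediction."""
    return {"instances": [list(row) for row in rows]}


def non_finite_positions(rows):
    """Return (row, column) positions holding NaN or infinity."""
    return [
        (i, j)
        for i, row in enumerate(rows)
        for j, value in enumerate(row)
        if isinstance(value, float) and not math.isfinite(value)
    ]


# Made-up rows: two numeric features and one categorical feature.
rows = [[-1200.0, 0.5, "Working"], [float("nan"), 0.7, "Pensioner"]]
print(non_finite_positions(rows))  # [(1, 0)]

try:
    # allow_nan=False makes json.dumps reject NaN instead of emitting a non-standard token.
    json.dumps(build_request_body(rows), allow_nan=False)
except ValueError as err:
    print("body is not strict JSON:", err)
```

With the notebook's data, the same check can be run on `X.values[:10].tolist()`.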

In [None]:
import pandas as pd
import warnings
warnings.simplefilter(action='ignore')

train = pd.read_csv('gs://home-credit-simonyi-workshop/input/application_train.subsample.csv', nrows=100)

target = 'TARGET'

features = [
    'DAYS_EMPLOYED',
    'DAYS_BIRTH',
    'AMT_INCOME_TOTAL',
    'AMT_CREDIT',
    'CNT_FAM_MEMBERS',
    'AMT_ANNUITY',
    'EXT_SOURCE_1',
    'EXT_SOURCE_2',
    'EXT_SOURCE_3',
    'NAME_TYPE_SUITE', # categorical
    'NAME_INCOME_TYPE', # categorical
]

X = train.loc[:, features]
y = train.loc[:, target]

### Send the online prediction request

Using Google's Python API client, we can easily call our model endpoint to receive predictions. We just need to convert our data into the appropriate format.

In [None]:
import googleapiclient.discovery


instances = X.values[:10].tolist()  # first 10 rows as a list of feature lists, the format the service expects

service = googleapiclient.discovery.build('ml', 'v1')
name = 'projects/{}/models/{}/versions/{}'.format(PROJECT_NAME, MODEL_NAME,
                                                  VERSION_NAME)

response = service.projects().predict(
    name=name,
    body={'instances': instances}
).execute()


response['predictions']
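The call above reads `response['predictions']` directly. Google's Python sample for online prediction first checks the response for an `error` key, since prediction errors can be reported in the response body. A sketch of that pattern as a reusable function (the name `predict_instances` is hypothetical):

```python
def predict_instances(service, name, instances):
    """Send instances to an AI Platform model or version and return its predictions.

    `service` is a client from googleapiclient.discovery.build('ml', 'v1') and
    `name` is a resource name such as projects/<project>/models/<model>/versions/<version>.
    """
    response = service.projects().predict(
        name=name,
        body={"instances": instances},
    ).execute()
    # Prediction errors can come back as an 'error' field in the response body.
    if "error" in response:
        raise RuntimeError(response["error"])
    return response["predictions"]
```

Taking `service` as a parameter also makes the function easy to test with a stub client.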


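The prediction output in our case contains class labels, not probabilities, because the service calls the pipeline's `predict` method, which in scikit-learn returns labels by default.
If we want probabilities instead, the served model's `predict` has to return the positive-class column of `predict_proba`:

```probabilities = pipe.predict_proba(X)[:, 1]```

Patching this onto the pipeline instance (e.g. by assigning a lambda to `pipe.predict`) makes the pipeline impossible to save with the standard `pickle` module, so with the standard scikit-learn runtime the practical option is a custom prediction routine whose `predict` method makes the call above.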
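A custom prediction routine is one way to serve probabilities. AI Platform builds the predictor through its `from_path` class method and calls its `predict(instances, **kwargs)` method with the list of instances from the request. Below is a minimal sketch of such a class. The class name is hypothetical, it assumes the artifact is the pickled pipeline `model.pkl` uploaded earlier, and packaging the class and creating a version that uses it (covered in the custom prediction routine documentation) are not shown.

```python
import os
import pickle


class ProbabilityPredictor(object):
    """Custom prediction routine sketch that returns positive-class probabilities."""

    def __init__(self, model):
        self._model = model

    def predict(self, instances, **kwargs):
        # `instances` is the list of feature lists from the request body.
        probabilities = self._model.predict_proba(instances)
        # Keep only the probability of the positive class (column 1).
        return [float(row[1]) for row in probabilities]

    @classmethod
    def from_path(cls, model_dir):
        # Assumes the pickled pipeline was uploaded as model.pkl.
        with open(os.path.join(model_dir, "model.pkl"), "rb") as f:
            return cls(pickle.load(f))
```

Because the probabilities are converted to plain floats, the returned list can be serialized to JSON as-is.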

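## Removing artifacts

Finally, we delete the bucket, the model version and the model. A model can only be deleted once it has no versions, so the version has to be deleted first.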

In [None]:
!gsutil rm -r gs://$BUCKET_NAME

In [None]:
!gcloud ai-platform versions delete $VERSION_NAME \
--model $MODEL_NAME \
--quiet \
--project $PROJECT_NAME

In [None]:
!gcloud ai-platform models delete $MODEL_NAME \
--quiet \
--project $PROJECT_NAME