In [None]:
# Copyright 2020 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

<table align="left">
  <td>
    <a href="https://colab.research.google.com/github/GoogleCloudPlatform/ai-platform-samples/blob/main/notebooks/samples/tensorflow/sentiment_analysis/ai_platform_sentiment_analysis.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/colab-logo-32px.png" alt="Colab logo"> Run in Colab
    </a>
  </td>
  <td>
    <a href="https://github.com/GoogleCloudPlatform/ai-platform-samples/blob/main/notebooks/samples/tensorflow/sentiment_analysis/ai_platform_sentiment_analysis.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/github-logo-32px.png" alt="GitHub logo">
      View on GitHub
    </a>
  </td>
</table>

# Overview

AI Platform Online Prediction supports custom Python code through custom prediction routines. In this notebook, we perform sentiment analysis on [Twitter data](https://www.kaggle.com/kazanova/sentiment140) using transfer learning with [pretrained GloVe embeddings](https://nlp.stanford.edu/projects/glove/). This tutorial also uses the [AI Platform Pipelines](https://cloud.google.com/ai-platform/pipelines/docs) product.


### Dataset

We use the [Twitter data](https://www.kaggle.com/kazanova/sentiment140) known as the `sentiment140` dataset. It contains 1,600,000 tweets extracted using the Twitter API. The tweets have been annotated (0 = negative, 4 = positive) and can be used to detect sentiment.

It contains the following 6 fields:

```
target: the polarity of the tweet (0 = negative, 2 = neutral, 4 = positive)
ids: the id of the tweet (2087)
date: the date of the tweet (Sat May 16 23:58:44 UTC 2009)
flag: the query (lyx). If there is no query, this value is NO_QUERY.
user: the user that tweeted (robotickilldozr)
text: the text of the tweet (Lyx is cool)
```
Background on the dataset, including resources about how it was generated, is available [here](http://help.sentiment140.com/for-students/).

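As a rough illustration of the record layout above, the following sketch parses one CSV row with the standard library and maps its polarity to a label. The sample row is assembled from the example values in the field list; its polarity value (4) is an assumption made for illustration, and `parse_row` is a hypothetical helper, not part of any dataset tooling.

```python
import csv
import io

# Column order as described in the field list above.
FIELDS = ['target', 'ids', 'date', 'flag', 'user', 'text']
POLARITY = {0: 'negative', 2: 'neutral', 4: 'positive'}

# Illustrative row; the polarity value "4" is assumed, not taken from the dataset.
SAMPLE_ROW = '"4","2087","Sat May 16 23:58:44 UTC 2009","lyx","robotickilldozr","Lyx is cool"\n'


def parse_row(line):
    """Parse one CSV line into a dict and attach a readable label."""
    values = next(csv.reader(io.StringIO(line)))
    record = dict(zip(FIELDS, values))
    record['label'] = POLARITY[int(record['target'])]
    return record


record = parse_row(SAMPLE_ROW)
print(record['label'], '-', record['text'])
```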
### Objective

In this notebook, we show how to deploy a TensorFlow model for sentiment analysis, trained on `sentiment140`, using AI Platform custom prediction code.

### Costs 

This tutorial uses billable components of Google Cloud Platform (GCP):

* Cloud AI Platform
* Cloud Storage

Learn about [Cloud AI Platform
pricing](https://cloud.google.com/ml-engine/docs/pricing) and [Cloud Storage
pricing](https://cloud.google.com/storage/pricing), and use the [Pricing
Calculator](https://cloud.google.com/products/calculator/)
to generate a cost estimate based on your projected usage.

### Set up your local development environment

**If you are using Colab or Vertex AI Workbench notebooks**, your environment already meets
all the requirements to run this notebook. You can skip this step.

**Otherwise**, make sure your environment meets this notebook's requirements.
You need the following:

* The Google Cloud SDK
* Git
* Python 3
* virtualenv
* Jupyter notebook running in a virtual environment with Python 3

The Google Cloud guide to [Setting up a Python development
environment](https://cloud.google.com/python/setup) and the [Jupyter
installation guide](https://jupyter.org/install) provide detailed instructions
for meeting these requirements. The following steps provide a condensed set of
instructions:

1. [Install and initialize the Cloud SDK.](https://cloud.google.com/sdk/docs/)

2. [Install Python 3.](https://cloud.google.com/python/setup#installing_python)

3. [Install
   virtualenv](https://cloud.google.com/python/setup#installing_and_using_virtualenv)
   and create a virtual environment that uses Python 3.

4. Activate that environment and run `pip install jupyter` in a shell to install
   Jupyter.

5. Run `jupyter notebook` in a shell to launch Jupyter.

6. Open this notebook in the Jupyter Notebook Dashboard.

### Set up your GCP project

**The following steps are required, regardless of your notebook environment.**

1. [Select or create a GCP project.](https://console.cloud.google.com/cloud-resource-manager) When you first create an account, you get a $300 free credit towards your compute/storage costs.

2. [Make sure that billing is enabled for your project.](https://cloud.google.com/billing/docs/how-to/modify-project)

3. [Enable the AI Platform APIs and Compute Engine APIs.](https://console.cloud.google.com/flows/enableapi?apiid=ml.googleapis.com,compute_component)

4. Enter your project ID in the cell below. Then run the cell to make sure the
Cloud SDK uses the right project for all the commands in this notebook.

**Note**: Jupyter runs lines prefixed with `!` as shell commands, and it interpolates Python variables prefixed with `$` into these commands.

### Authenticate your GCP account

**If you are using Vertex AI Workbench notebooks**, your environment is already
authenticated. Skip this step.

**If you are using Colab**, run the cell below and follow the instructions
when prompted to authenticate your account via OAuth.

**Otherwise**, follow these steps:

1. In the GCP Console, go to the [**Create service account key**
   page](https://console.cloud.google.com/apis/credentials/serviceaccountkey).

2. From the **Service account** drop-down list, select **New service account**.

3. In the **Service account name** field, enter a name.

4. From the **Role** drop-down list, select
   **Machine Learning Engine > AI Platform Admin** and
   **Storage > Storage Object Admin**.

5. Click *Create*. A JSON file that contains your key downloads to your
local environment.

6. Enter the path to your service account key as the
`GOOGLE_APPLICATION_CREDENTIALS` variable in the cell below and run the cell.

If you are running this notebook in Colab, run the following cell to authenticate your Google Cloud Platform user account:

In [None]:
import sys

# If you are running this notebook in Colab, run this cell and follow the
# instructions to authenticate your GCP account. This provides access to your
# Cloud Storage bucket and lets you submit training jobs and prediction
# requests.

if 'google.colab' in sys.modules:
  from google.colab import auth as google_auth
  google_auth.authenticate_user()

# If you are running this notebook locally, replace the string below with the
# path to your service account key and run this cell to authenticate your GCP
# account.
else:
  %env GOOGLE_APPLICATION_CREDENTIALS ''

## Install packages and dependencies

In [None]:
!pip install -U tensorflow==1.15.* --user

In [None]:
import tensorflow as tf

print(tf.__version__)

In [None]:
import pandas as pd
import numpy as np
import os

## 1. Project Configuration

In [None]:
PROJECT_ID = '[your-project-id]' # TODO (Set to your GCP project ID)
!gcloud config set project {PROJECT_ID}

In [None]:
BUCKET_NAME = '[your-bucket-name]' # TODO (Set to your GCS Bucket name)
REGION = 'us-central1' #@param {type:"string"}

In [None]:
# Model information.
ROOT = 'ml_pipeline'
MODEL_DIR = os.path.join(ROOT,'models').replace("\\","/")
PACKAGES_DIR = os.path.join(ROOT,'packages').replace("\\","/")

In [None]:
!gsutil rm -r gs://{BUCKET_NAME}/{ROOT}

## 2. Get training data

In this step, we are going to:
1. Download Twitter data
2. Load the data to Pandas Dataframe.
3. Convert the class feature (sentiment) from string to a numeric indicator.


Data can be downloaded directly from [here](https://www.kaggle.com/kazanova/sentiment140) (https://www.kaggle.com/kazanova/sentiment140)

It is also located here: `gs://cloud-samples-data/ai-platform/sentiment_analysis/training.csv`

You can copy it by using the following command:

```
gsutil cp gs://cloud-samples-data/ai-platform/sentiment_analysis/training.csv .
```

In [None]:
!gsutil cp gs://cloud-samples-data/ai-platform/sentiment_analysis/training.csv .

### 2.1. Input data

Create a dictionary with a mapping for each label.

In [None]:
sentiment_mapping = {
    0: 'negative',
    2: 'neutral',
    4: 'positive'
}

In [None]:
df_twitter = pd.read_csv('training.csv', encoding='latin1', header=None)\
             .rename(columns={
                 0: 'sentiment',
                 1: 'id',
                 2: 'posted_at',
                 3: 'query',
                 4: 'username',
                 5: 'text'
             })[['sentiment', 'text']]

In [None]:
df_twitter['sentiment_label'] = df_twitter['sentiment'].map(sentiment_mapping)

Verify number of records

In [None]:
df_twitter['sentiment_label'].count()

### 2.2. Data processing function

In [None]:
%%writefile preprocess.py

from tensorflow.keras.preprocessing import sequence
from tensorflow.keras.preprocessing import text
import re


class TextPreprocessor(object):
    def __init__(self, vocab_size, max_sequence_length):
        self._vocab_size = vocab_size
        self._max_sequence_length = max_sequence_length
        self._tokenizer = None

    def _clean_line(self, text):
        text = re.sub(r"http\S+", "", text)
        text = re.sub(r"@[A-Za-z0-9]+", "", text)
        text = re.sub(r"#[A-Za-z0-9]+", "", text)
        text = text.replace("RT","")
        text = text.lower()
        text = text.strip()
        return text
    
    def fit(self, text_list):        
        # Create vocabulary from input corpus.
        text_list_cleaned = [self._clean_line(txt) for txt in text_list]
        tokenizer = text.Tokenizer(num_words=self._vocab_size)
        tokenizer.fit_on_texts(text_list_cleaned)
        self._tokenizer = tokenizer

    def transform(self, text_list):        
        # Transform text to sequence of integers
        text_list = [self._clean_line(txt) for txt in text_list]
        text_sequence = self._tokenizer.texts_to_sequences(text_list)

        # Fix sequence length to max value. Sequences shorter than the length are
        # padded in the beginning and sequences longer are truncated
        # at the beginning.
        padded_text_sequence = sequence.pad_sequences(
          text_sequence, maxlen=self._max_sequence_length)
        return padded_text_sequence

A quick test:

In [None]:
from preprocess import TextPreprocessor

processor = TextPreprocessor(5, 5)
processor.fit(['hello Google Cloud AI Platform','test'])
processor.transform(['hello Google Cloud AI Platform',"lol"])

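To see what the cleaning step does to a single tweet, here is a minimal standalone sketch that repeats the substitutions from `_clean_line` in the same order using only the standard library. The function name `clean_tweet` and the sample tweet are made up for this example.

```python
import re


def clean_tweet(text):
    """Apply the same cleaning steps, in the same order, as _clean_line."""
    text = re.sub(r"http\S+", "", text)        # drop URLs
    text = re.sub(r"@[A-Za-z0-9]+", "", text)  # drop @mentions
    text = re.sub(r"#[A-Za-z0-9]+", "", text)  # drop #hashtags
    text = text.replace("RT", "")              # drop the retweet marker
    return text.lower().strip()


# Hypothetical tweet, made up for this example.
raw = "RT @someone Check this out http://example.com #wow Great day"
cleaned = clean_tweet(raw)
print(cleaned.split())
```

Note that `replace("RT", "")` matches the two letters anywhere in the text, so an uppercase word such as `SMART` is shortened to `sma` after lowercasing.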
### 2.3. Data preparation

Text preprocessor

In [None]:
CLASSES = {'negative': 0, 'positive': 1}  # label-to-int mapping
VOCAB_SIZE = 25000  # Limit on the vocabulary size used for tokenization
MAX_SEQUENCE_LENGTH = 50  # Sentences will be truncated/padded to this length

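To make the effect of `MAX_SEQUENCE_LENGTH` concrete, the sketch below mimics in plain Python what `pad_sequences` does with its default settings (`padding='pre'`, `truncating='pre'`): short sequences get zeros added at the front, and long sequences keep only their last items. `pad_or_truncate` is a hypothetical helper written for this illustration, not the Keras implementation.

```python
def pad_or_truncate(sequence, maxlen, pad_value=0):
    """Left-pad short sequences and keep the last `maxlen` items of long ones."""
    if len(sequence) >= maxlen:
        return list(sequence[-maxlen:])
    return [pad_value] * (maxlen - len(sequence)) + list(sequence)


short = pad_or_truncate([7, 3, 9], maxlen=5)
long = pad_or_truncate([1, 2, 3, 4, 5, 6, 7], maxlen=5)
print(short)  # [0, 0, 7, 3, 9]
print(long)   # [3, 4, 5, 6, 7]
```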
In [None]:
from preprocess import TextPreprocessor
from sklearn.model_selection import train_test_split

sents = df_twitter.text
labels = np.array(df_twitter.sentiment_label.map(CLASSES))

# Train and test split
X, _, y, _ = train_test_split(sents, labels, test_size=0.1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)

# Create vocabulary from training corpus.
processor = TextPreprocessor(VOCAB_SIZE, MAX_SEQUENCE_LENGTH)
processor.fit(X_train)

# Preprocess the data
train_texts_vectorized = processor.transform(X_train)
eval_texts_vectorized = processor.transform(X_test)

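The two calls to `train_test_split` above first set aside 10% of the rows (which are not used further) and then split the remaining 90% into 75% training and 25% evaluation data. The following back-of-the-envelope sketch works out the approximate row counts for the full 1,600,000-tweet dataset; `split_sizes` is a hypothetical helper, and scikit-learn's own rounding may differ by a row.

```python
def split_sizes(n_rows, holdout_fraction=0.1, eval_fraction=0.25):
    """Return (discarded, train, eval) row counts for the two-step split."""
    discarded = round(n_rows * holdout_fraction)
    kept = n_rows - discarded
    n_eval = round(kept * eval_fraction)
    n_train = kept - n_eval
    return discarded, n_train, n_eval


discarded, n_train, n_eval = split_sizes(1_600_000)
print(discarded, n_train, n_eval)
```

In other words, roughly 67.5% of the original rows end up in the training set and 22.5% in the evaluation set.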
In [None]:
import pickle

with open('./processor_state.pkl', 'wb') as f:
    pickle.dump(processor, f)

## 3. Model

Create a TensorFlow model

In [None]:
# Hyperparameters

LEARNING_RATE = .001
EMBEDDING_DIM = 50
FILTERS = 64
DROPOUT_RATE = 0.5
POOL_SIZE = 3
NUM_EPOCH = 25
BATCH_SIZE = 128
KERNEL_SIZES = [2, 5, 8]

### 3.1. Basic model

In [None]:
def create_model(vocab_size, embedding_dim, filters, kernel_sizes, dropout_rate, pool_size, embedding_matrix):
    
    # Input layer
    model_input = tf.keras.layers.Input(shape=(MAX_SEQUENCE_LENGTH,), dtype='int32')

    # Embedding layer
    z = tf.keras.layers.Embedding(
        input_dim=vocab_size + 1,
        output_dim=embedding_dim,
        input_length=MAX_SEQUENCE_LENGTH,
        weights=[embedding_matrix]
    )(model_input)

    z = tf.keras.layers.Dropout(dropout_rate)(z)

    # Convolutional block
    conv_blocks = []
    for kernel_size in kernel_sizes:
        conv = tf.keras.layers.Convolution1D(
            filters=filters,
            kernel_size=kernel_size,
            padding='valid',
            activation='relu',
            bias_initializer='random_uniform',
            strides=1)(z)
        conv = tf.keras.layers.MaxPooling1D(pool_size=pool_size)(conv)
        conv = tf.keras.layers.Flatten()(conv)
        conv_blocks.append(conv)
        
    z = tf.keras.layers.Concatenate()(conv_blocks) if len(conv_blocks) > 1 else conv_blocks[0]

    z = tf.keras.layers.Dropout(dropout_rate)(z)
    z = tf.keras.layers.Dense(100, activation='relu')(z)
    model_output = tf.keras.layers.Dense(1, activation='sigmoid')(z)
    model = tf.keras.models.Model(model_input, model_output)
    
    return model

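The width of the concatenated feature vector that feeds the dense layers follows from simple arithmetic: a convolution with `'valid'` padding and stride 1 shortens a sequence of length `L` to `L - k + 1`, max pooling (whose stride defaults to the pool size) divides that length by the pool size, rounding down, and flattening multiplies by the number of filters. The sketch below is a hypothetical helper that evaluates this for the hyperparameters in this notebook, for pool sizes of 2 and 3.

```python
def flattened_features(seq_len, kernel_sizes, filters, pool_size):
    """Width of the concatenated vector after conv -> max-pool -> flatten."""
    total = 0
    for k in kernel_sizes:
        conv_len = seq_len - k + 1                            # 'valid' padding, stride 1
        pooled_len = (conv_len - pool_size) // pool_size + 1  # stride equals pool size
        total += pooled_len * filters
    return total


width_pool_2 = flattened_features(50, [2, 5, 8], 64, pool_size=2)
width_pool_3 = flattened_features(50, [2, 5, 8], 64, pool_size=3)
print(width_pool_2, width_pool_3)
```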
### 3.2. Pretrained Glove embeddings

Embeddings can be downloaded from Stanford Glove project: https://nlp.stanford.edu/projects/glove/
- Download file [here](http://nlp.stanford.edu/data/glove.twitter.27B.zip) (http://nlp.stanford.edu/data/glove.twitter.27B.zip)
- Twitter (2B tweets, 27B tokens, 1.2M vocab, uncased, 25d, 50d, 100d, & 200d vectors, 1.42 GB download)

It is also located here: `gs://cloud-samples-data/ai-platform/sentiment_analysis/glove.twitter.27B.50d.txt`

You can copy it by using the following command:

```
gsutil cp gs://cloud-samples-data/ai-platform/sentiment_analysis/glove.twitter.27B.50d.txt .
```

In [None]:
!gsutil cp gs://cloud-samples-data/ai-platform/sentiment_analysis/glove.twitter.27B.50d.txt .

Create an embedding index

In [None]:
def get_coefs(word, *arr):
    return word, np.asarray(arr, dtype='float32')

In [None]:
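# Build a word -> vector lookup from the GloVe file and close the file when done.
with open('glove.twitter.27B.50d.txt', 'r', encoding='utf8') as f:
    embeddings_index = dict(get_coefs(*line.strip().split()) for line in f)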

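Each line of the GloVe text file holds a word followed by its vector components, separated by spaces. The sketch below parses two made-up lines in that format using plain Python lists instead of NumPy arrays; the words, the 3-dimensional vectors, and the helper name `parse_glove_line` are all invented for illustration (the real file used here has 50 values per word).

```python
def parse_glove_line(line):
    """Split one line into (word, vector); values are space separated."""
    word, *values = line.strip().split()
    return word, [float(v) for v in values]


# Made-up 3-dimensional vectors, for illustration only.
sample_lines = [
    "hello 0.1 -0.2 0.3\n",
    "world 0.4 0.5 -0.6\n",
]
toy_index = dict(parse_glove_line(line) for line in sample_lines)
print(toy_index["hello"])
```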
In [None]:
word_index = processor._tokenizer.word_index
nb_words = min(VOCAB_SIZE, len(word_index))
embedding_matrix = np.zeros((nb_words + 1, EMBEDDING_DIM))

for word, i in word_index.items():
    if i >= VOCAB_SIZE: continue
    embedding_vector = embeddings_index.get(word)
    if embedding_vector is not None: embedding_matrix[i] = embedding_vector

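The loop above places each known word's pretrained vector in the row that matches its tokenizer index and leaves rows of zeros for words without a pretrained vector. A toy version with plain lists, under the assumption of a 4-word tokenizer index and 2-dimensional vectors (all values made up), might look like this:

```python
def build_embedding_matrix(word_index, vectors, vocab_size, dim):
    """Row i holds the vector of the word with index i; unknown words stay zero."""
    n_rows = min(vocab_size, len(word_index)) + 1  # +1 because index 0 is reserved
    matrix = [[0.0] * dim for _ in range(n_rows)]
    for word, i in word_index.items():
        if i >= vocab_size:
            continue  # outside the vocabulary limit
        vector = vectors.get(word)
        if vector is not None:
            matrix[i] = list(vector)
    return matrix


# Hypothetical tokenizer index and pretrained vectors.
toy_word_index = {'good': 1, 'bad': 2, 'zzz': 3, 'rare': 4}
toy_vectors = {'good': [1.0, 0.0], 'bad': [0.0, 1.0], 'rare': [0.5, 0.5]}
toy_matrix = build_embedding_matrix(toy_word_index, toy_vectors, vocab_size=3, dim=2)
print(toy_matrix)
```

Here `zzz` and `rare` are skipped because their indices fall outside the vocabulary limit, so the last row stays at zero.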
### 3.3. Create, compile and train TensorFlow model

In [None]:
model = create_model(VOCAB_SIZE, EMBEDDING_DIM, FILTERS, KERNEL_SIZES, DROPOUT_RATE,POOL_SIZE, embedding_matrix)

In [None]:
# Compile model with learning parameters.

optimizer = tf.keras.optimizers.Nadam(lr=LEARNING_RATE)
model.compile(optimizer=optimizer, loss='binary_crossentropy', metrics=['acc'])

In [None]:
#Keras train

history = model.fit(
    train_texts_vectorized, 
    y_train, 
    epochs=NUM_EPOCH, 
    batch_size=BATCH_SIZE,
    validation_data=(eval_texts_vectorized, y_test),
    verbose=2,
    callbacks=[
        tf.keras.callbacks.ReduceLROnPlateau(
            monitor='val_acc',
            min_delta=0.005,
            patience=3,
            factor=0.5),
        tf.keras.callbacks.EarlyStopping(
            monitor='val_loss',
            min_delta=0.005, 
            patience=5, 
            verbose=0, 
            mode='auto'
        ),
        tf.keras.callbacks.History()
    ]
)

In [None]:
with open('history.pkl','wb') as file:
    pickle.dump(history.history,file)

In [None]:
model.save('keras_saved_model.h5')

## 4. Deployment

### 4.1. Prepare custom model prediction

In [None]:
%%writefile custom_prediction.py

import os
import pickle
import numpy as np

import tensorflow.keras as keras


class CustomModelPrediction(object):   
    def __init__(self, model, processor): 
        self._model = model
        self._processor = processor       
    
    def _postprocess(self, predictions):
        labels = ['negative', 'positive']
        return [
            {
                "label":labels[int(np.round(prediction))],
                "score":float(np.round(prediction, 4))
            } for prediction in predictions]
    
    def predict(self, instances, **kwargs):       
        preprocessed_data = self._processor.transform(instances)
        predictions =  self._model.predict(preprocessed_data)
        labels = self._postprocess(predictions)
        return labels

    @classmethod
    def from_path(cls, model_dir):       
        model = keras.models.load_model(
          os.path.join(model_dir,'keras_saved_model.h5'))
        with open(os.path.join(model_dir, 'processor_state.pkl'), 'rb') as f:
            processor = pickle.load(f)           
        return cls(model, processor)

Testing custom prediction locally

In [None]:
from custom_prediction import CustomModelPrediction
classifier = CustomModelPrediction.from_path('.')

In [None]:
requests = (['God I hate the north', 'god I love this'])
response = classifier.predict(requests)
response

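The post-processing step turns each sigmoid output into a label by rounding it to 0 or 1 and using that as an index into the label list. The sketch below shows the same idea on plain Python floats; the scores are hypothetical, not actual model output, and `to_labels` is a simplified stand-in for `_postprocess`, which receives NumPy arrays from Keras.

```python
LABELS = ['negative', 'positive']


def to_labels(scores):
    """Map sigmoid scores in [0, 1] to a label and a rounded score."""
    return [
        {'label': LABELS[int(round(score))], 'score': round(score, 4)}
        for score in scores
    ]


# Hypothetical scores, not actual model output.
results = to_labels([0.1234567, 0.98])
print(results)
```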
### 4.2. Package it

In [None]:
%%writefile setup.py

from setuptools import setup

setup(
  name='tweet_sentiment_classifier',
  version='0.1',
  include_package_data=True,
  scripts=['preprocess.py', 'custom_prediction.py']
)

Build the source distribution and copy it to Cloud Storage

In [None]:
!python setup.py sdist
!gsutil cp ./dist/tweet_sentiment_classifier-0.1.tar.gz gs://{BUCKET_NAME}/{PACKAGES_DIR}/tweet_sentiment_classifier-0.1.tar.gz

In [None]:
!gsutil cp keras_saved_model.h5 gs://{BUCKET_NAME}/{MODEL_DIR}/
!gsutil cp processor_state.pkl gs://{BUCKET_NAME}/{MODEL_DIR}/

## 5. Create model and version

In [None]:
MODEL_NAME='twitter_model_custom_prediction'
MODEL_VERSION='v1'

RUNTIME_VERSION='1.15'
PYTHON_VERSION='3.7'

In [None]:
!gcloud beta ai-platform models create {MODEL_NAME} --regions {REGION}  --enable-logging  --enable-console-logging

In [None]:
!gcloud ai-platform versions delete {MODEL_VERSION} --model {MODEL_NAME} --quiet

In [None]:
!gcloud beta ai-platform versions create {MODEL_VERSION} \
--model {MODEL_NAME} \
--origin gs://{BUCKET_NAME}/{MODEL_DIR} \
--python-version {PYTHON_VERSION} \
--runtime-version {RUNTIME_VERSION} \
--package-uris gs://{BUCKET_NAME}/{PACKAGES_DIR}/tweet_sentiment_classifier-0.1.tar.gz \
--prediction-class=custom_prediction.CustomModelPrediction

## 6. Testing

In [None]:
from googleapiclient import discovery
from oauth2client.client import GoogleCredentials
import json

In [None]:
requests = [
    'god this episode is bad',
    'meh, I kinda like it',
    'what were the writer thinking, omg!',
    'omg! what a twist, who would\'ve though :o!',
    'woohoow, sansa for the win!'
]

In [None]:
# JSON format the requests
request_data = {'instances': requests}

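For reference, the request body is a JSON object with a single `instances` list, and the predictions come back under a `predictions` key. The sketch below serializes a small body and walks through a response of that shape; the response shown is hypothetical (labels and scores are made up) and only mirrors the structure returned by `CustomModelPrediction.predict`.

```python
import json

instances = ['god this episode is bad', 'woohoow, sansa for the win!']
body = {'instances': instances}
payload = json.dumps(body)
print(payload)

# Hypothetical response with made-up scores, for illustration only.
fake_response = json.loads(
    '{"predictions": [{"label": "negative", "score": 0.12},'
    ' {"label": "positive", "score": 0.97}]}'
)
for text, prediction in zip(instances, fake_response['predictions']):
    print(prediction['label'], '<-', text)
```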
## Authenticate and call AI Platform prediction API

In [None]:
%%time

api = discovery.build('ml', 'v1')
# Calling the model resource uses its default version. To target a specific
# version, append '/versions/{}'.format(MODEL_VERSION) to the name.
parent = 'projects/{}/models/{}'.format(PROJECT_ID, MODEL_NAME)
response = api.projects().predict(body=request_data, name=parent).execute()

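The `name` argument of the predict call is a resource path. A model-level path routes the request to the model's default version, while a path ending in `/versions/...` targets one specific version. The helper below is a hypothetical convenience function that builds either form; `my-project` is a placeholder project ID.

```python
def prediction_name(project_id, model_name, version=None):
    """Build the resource name passed as `name` to projects().predict()."""
    name = 'projects/{}/models/{}'.format(project_id, model_name)
    if version is not None:
        name += '/versions/{}'.format(version)
    return name


# 'my-project' is a placeholder project ID.
default_name = prediction_name('my-project', 'twitter_model_custom_prediction')
versioned_name = prediction_name('my-project', 'twitter_model_custom_prediction', 'v1')
print(default_name)
print(versioned_name)
```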
In [None]:
response['predictions']

In [None]:
# Delete model version resource
! gcloud ai-platform versions delete {MODEL_VERSION} --model {MODEL_NAME} --quiet

# Delete model resource
! gcloud ai-platform models delete {MODEL_NAME} --quiet

## 7. Deploy using AI Platform Pipelines

With AI Platform Pipelines, you can orchestrate your machine learning (ML) workflows as reusable and reproducible pipelines. AI Platform Pipelines saves you the difficulty of setting up Kubeflow Pipelines with TensorFlow Extended on Google Kubernetes Engine.


Install the Kubeflow Pipelines SDK

In [None]:
!pip install 'kfp>=0.1.31' --user

Import dependencies

In [None]:
import json

import kfp
import kfp.components as comp
import kfp.dsl as dsl

import pandas as pd

import time

## Create a Hosted AI Platform Pipeline

Create a new hosted Kubeflow pipeline under AI Platform -> Pipelines.

Set up your AI Platform Pipeline as indicated [here](https://cloud.google.com/ai-platform/pipelines/docs/setting-up).

**Note:** Verify you are using version 0.2.5 or above.
More information [here](https://www.kubeflow.org/docs/pipelines/overview/pipelines-overview/)

## Setting up credentials

If you run pipelines that call GCP services, such as Cloud Storage, Cloud ML Engine, Dataflow, or Dataproc, you need to make the application default credentials available to each pipeline step by mounting the proper GCP service account token as a Kubernetes secret.

[Documentation here](https://github.com/kubeflow/pipelines/blob/master/manifests/gcp_marketplace/guide.md#gcp-service-account-credentials)

## Train and deploy the model

In [None]:
# Project parameters.
CLUSTER='' # TODO Change to your GKE cluster
ZONE='us-central1-a'

# Pipeline Parameters

MODEL_NAME = 'sentiment_classifier' + str(int(time.time()))
MODEL_VERSION = 'v1' + str(int(time.time()))
RUNTIME_VERSION = '1.15'
PYTHON_VERSION='3.7'

PACKAGE_TRAINER_URI = 'gs://cloud-samples-data/ai-platform/sentiment_analysis/trainer-0.1.tar.gz'
PACKAGE_CUSTOM_PREDICTION_URI = 'gs://cloud-samples-data/ai-platform/sentiment_analysis/custom_prediction-0.1.tar.gz'
PACKAGE_URIS = json.dumps([PACKAGE_TRAINER_URI])
PACKAGE_PATH='./trainer'
PYTHON_MODULE = 'trainer.task'

TRAINING_FILE='gs://cloud-samples-data/ai-platform/sentiment_analysis/training.csv'
GLOVE_FILE='gs://cloud-samples-data/ai-platform/sentiment_analysis/glove.twitter.27B.50d.txt'

MODEL_DIR='gs://{}/models'.format(BUCKET_NAME)

SAVED_MODEL_NAME='keras_saved_model.h5'
PROCESSOR_STATE_FILE='processor_state.pkl'


PIPELINE_NAME = 'Text Prediction'
PIPELINE_FILENAME_PREFIX = 'twitter'
PIPELINE_DESCRIPTION = 'Text Prediction'


# Note: numeric parameters should be passed as strings.
TRAINER_ARGS = json.dumps(['--train-file', TRAINING_FILE,
                            '--glove-file', GLOVE_FILE,
                            '--learning-rate', '0.001',
                            '--embedding-dim', '50',
                            '--num-epochs', '25',
                            '--filter-size', '64',
                            '--batch-size', '128',
                            '--vocab-size', '25000',
                            '--pool-size', '3',
                            '--max-sequence-length', '50',
                            '--saved-model', SAVED_MODEL_NAME,
                            '--preprocessor-state-file', PROCESSOR_STATE_FILE,
                            '--gcs-bucket', BUCKET_NAME,
                            '--deploy-gcp']
                       )

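Trainer arguments travel as a JSON list of strings, which is why numeric values are quoted; the receiving script is expected to convert them back to numbers when it parses its command line. The sketch below shows that round trip with `argparse`. The parser here is hypothetical and only reuses a few flag names from the list above; the argument definitions in the packaged trainer may differ.

```python
import argparse
import json

# Hypothetical parser: the packaged trainer's real argument definitions may differ.
parser = argparse.ArgumentParser()
parser.add_argument('--learning-rate', type=float)
parser.add_argument('--num-epochs', type=int)
parser.add_argument('--deploy-gcp', action='store_true')

trainer_args_json = json.dumps(['--learning-rate', '0.001',
                                '--num-epochs', '25',
                                '--deploy-gcp'])
parsed = parser.parse_args(json.loads(trainer_args_json))
print(parsed.learning_rate, parsed.num_epochs, parsed.deploy_gcp)
```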
## Train the model

In [None]:
aiplatform_train_op = comp.load_component_from_url(
    'https://raw.githubusercontent.com/kubeflow/pipelines/master/components/gcp/ml_engine/train/component.yaml')

def train(project_id,  trainer_args, package_uris, job_dir, region, python_module, python_version, runtime_version):
    return aiplatform_train_op(
        project_id=project_id,
        python_module=python_module,
        python_version=python_version,
        package_uris=package_uris,
        region=region,
        args=trainer_args,
        job_dir=job_dir,
        runtime_version=runtime_version
    )

## Deploy the model

In [None]:
aiplatform_deploy_op = comp.load_component_from_url(
    'https://raw.githubusercontent.com/kubeflow/pipelines/master/components/gcp/ml_engine/deploy/component.yaml')

def deploy(project_id, model_uri, model_id, model_version, runtime_version, python_version, version):
    return aiplatform_deploy_op(
        model_uri=model_uri,
        project_id=project_id, 
        model_id=model_id, 
        version_id=model_version, 
        runtime_version=runtime_version,
        python_version=python_version,
        version=version,
        replace_existing_version=True,       
        set_default=True)

In [None]:
@dsl.pipeline(
    name=PIPELINE_NAME,
    description=PIPELINE_DESCRIPTION
)
def pipeline(project_id=PROJECT_ID, 
             python_module=PYTHON_MODULE,
             region=REGION,
             runtime_version=RUNTIME_VERSION, 
             package_uris=PACKAGE_URIS,
             python_version=PYTHON_VERSION,
             job_dir=MODEL_DIR,):
    train_task = train(project_id,
                       TRAINER_ARGS,
                       package_uris,
                       job_dir,                      
                       region,
                       python_module,
                       python_version,
                       runtime_version)
    
    deploy_task = deploy(project_id,
                         train_task.outputs['job_dir'],                         
                         MODEL_NAME,
                         MODEL_VERSION,
                         runtime_version,
                         python_version,
                         {
                          "deploymentUri": MODEL_DIR,
                          "packageUris": [PACKAGE_CUSTOM_PREDICTION_URI, PACKAGE_TRAINER_URI],
                          "predictionClass": 'model_prediction.CustomModelPrediction'
                         }
                        )    
    return True

# Reference for invocation later
pipeline_func = pipeline

## Run the KFP pipeline

In [None]:
!gcloud container clusters get-credentials "$CLUSTER" --zone "$ZONE" --project "$PROJECT_ID"

In [None]:
!kubectl get secrets

Obtain the `KFP_HOST` variable from the AI Platform Managed pipelines screen in Google Cloud Console.

In [None]:
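KFP_HOST = ''  # TODO: set to your AI Platform Pipelines host
# Keep the run result in its own variable so the `pipeline` function is not overwritten.
run = kfp.Client(host=KFP_HOST).create_run_from_pipeline_func(pipeline_func, arguments={})

In [None]:
run.wait_for_run_completion(timeout=1800)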

**References:** 

- [AI in Depth: Serving a PyTorch text classifier on AI Platform Serving using custom online prediction](https://cloud.google.com/blog/products/ai-machine-learning/ai-in-depth-serving-a-pytorch-text-classifier-on-ai-platform-serving-using-custom-online-prediction)
- [Game of Thrones Twitter Sentiment with Google Cloud Platform and Keras blog post](https://towardsdatascience.com/game-of-thrones-twitter-sentiment-with-keras-apache-beam-bigquery-and-pubsub-382a770f6583)
- [AI Platform Pipelines](https://github.com/kubeflow/pipelines/blob/master/samples/core/ai_platform/ai_platform.ipynb)