# TF 2.0 Dependency

This code relies on TF 2.0, which AI Platform online prediction doesn't support yet. Prediction works locally and the model deploys, but online prediction fails because the AI Platform nodes run TF 1.13.

**We need to re-test this notebook once TF 2.0 is supported for online prediction.** With any luck it will work without any further changes.

In [1]:
PROJECT_ID = 'vijays-sandbox'
BUCKET = 'vijays-sandbox-ml'
MODEL_PATH = 'translate_models/baseline'
MODEL_NAME = 'translate'
VERSION_NAME = 'v1'

# Deploy for Online Prediction

To get our translations we're not just calling .predict() on a single Keras model. We're calling .predict() on two different models with some Python code in between. On top of that, the decoder model is called repeatedly in a for loop, once per output token.

Because of this we can't just export to SavedModel and deploy to AI Platform. 

Instead we'll take advantage of [AI Platform's Custom Prediction Routines](https://cloud.google.com/ml-engine/docs/tensorflow/custom-prediction-routines), which allow us to execute custom Python code in response to every online prediction request. There are 5 steps to creating a custom prediction routine:

1. Upload Model Artifacts to GCS
2. Implement Predictor interface 
3. Package the prediction code and dependencies
4. Deploy
5. Invoke API
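
To make the "two models plus Python glue" pattern concrete before we start, here is a toy sketch of the encode-once, decode-in-a-loop flow. The functions `toy_encoder` and `toy_decoder` are hypothetical stand-ins for our Keras models (they are not part of this notebook's code), and the vocabulary size is made up; only the control flow is meant to mirror what our predictor does.

```python
START_ID = 1      # assumed id of the start token (the predictor seeds with ones)
VOCAB_SIZE = 5    # hypothetical toy vocabulary size


def toy_encoder(input_seqs):
    """Stand-in for encoder_model.predict: one integer state per sequence."""
    return [sum(seq) for seq in input_seqs]


def toy_decoder(target_ids, states):
    """Stand-in for decoder_model.predict: returns (scores, new_states)."""
    new_states = [s + 1 for s in states]
    # One score vector per sequence; the winning id is derived from the state.
    scores = [[1.0 if i == s % VOCAB_SIZE else 0.0 for i in range(VOCAB_SIZE)]
              for s in new_states]
    return scores, new_states


def greedy_decode(input_seqs, max_len):
    states = toy_encoder(input_seqs)            # model 1: called once
    target_ids = [START_ID] * len(input_seqs)
    decoded = [[] for _ in input_seqs]
    for _ in range(max_len):                    # model 2: called max_len times
        scores, states = toy_decoder(target_ids, states)
        # Python glue between model calls: argmax, then feed the result back in.
        target_ids = [row.index(max(row)) for row in scores]
        for sentence, token_id in zip(decoded, target_ids):
            sentence.append(token_id)
    return decoded


tokens = greedy_decode([[1, 2], [3, 4]], max_len=3)
print(tokens)
```

The argmax and feedback steps live in plain Python between the model calls, which is the part a single exported SavedModel would not capture for us.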


## 1. Upload Model Artifacts to GCS

Here we upload our model weights (encoder_model.h5 and decoder_model.h5) and our tokenizer objects so that AI Platform can access them.

In [2]:
!gsutil cp $MODEL_PATH/encoder_model.h5 $MODEL_PATH/decoder_model.h5 gs://$BUCKET/translate/model/
!gsutil cp $MODEL_PATH/encoder_tokenizer.pkl $MODEL_PATH/decoder_tokenizer.pkl gs://$BUCKET/translate/model/

Copying file://translate_models/baseline/encoder_model.h5 [Content-Type=application/octet-stream]...
Copying file://translate_models/baseline/decoder_model.h5 [Content-Type=application/octet-stream]...
Operation completed over 2 objects/63.4 MiB.                                     
Copying file://translate_models/baseline/encoder_tokenizer.pkl [Content-Type=application/octet-stream]...
Copying file://translate_models/baseline/decoder_tokenizer.pkl [Content-Type=application/octet-stream]...
/ [2 files][652.2 KiB/652.2 KiB]                                                
Operation completed over 2 objects/652.2 KiB.                                    


## 2. Implement Predictor Interface

Interface Spec: https://cloud.google.com/ml-engine/docs/tensorflow/custom-prediction-routines#predictor-class

This tells AI Platform how to load the model artifacts, and is where we specify our custom prediction code.
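
As a rough sketch of the contract, the interface boils down to a `from_path` class method that builds the predictor from a directory of artifacts and a `predict` method that maps a list of instances to a JSON-serializable result. The `EchoPredictor` below is a hypothetical toy that loads nothing and is not part of this notebook; it only illustrates the shape of the two methods.

```python
import json


class EchoPredictor(object):
    """Hypothetical predictor that upper-cases its inputs."""

    def __init__(self, suffix):
        self._suffix = suffix

    def predict(self, instances, **kwargs):
        # The return value must be JSON-serializable, e.g. a list of strings.
        return [str(instance).upper() + self._suffix for instance in instances]

    @classmethod
    def from_path(cls, model_dir):
        # A real predictor would load its model artifacts from model_dir here.
        return cls(suffix='!')


echo = EchoPredictor.from_path('unused_dir')
result = echo.predict(['hola', 'adios'])
print(json.dumps(result))
```

Our real predictor follows the same two-method shape, with the model and tokenizer loading done in `from_path`.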

In [2]:
%%writefile predictor.py
import os
import pickle
import unicodedata
import re

import numpy as np
import tensorflow as tf

import utils_preproc

MAX_TRANSLATE_LENGTH = 11 

class TranslatePredictor(object):
    def __init__(self, encoder_model, encoder_tokenizer,
                 decoder_model, decoder_tokenizer):
        self.encoder_model = encoder_model
        self.encoder_tokenizer = encoder_tokenizer
        self.decoder_model = decoder_model
        self.decoder_tokenizer = decoder_tokenizer

    
    def _decode_sequences(self, input_seqs, output_tokenizer, max_decode_length=50):
        """
        Arguments:
        input_seqs: int tensor of shape (BATCH_SIZE,SEQ_LEN)
        output_tokenizer: keras_preprocessing.text.Tokenizer used to convert from int to words
        max_decode_length: number of decoding steps to run for every sequence

        Returns translated sentences as one list of tokens per input sequence
        """
        # Encode the input as state vectors.
        batch_size = input_seqs.shape[0]
        states_value = self.encoder_model.predict(input_seqs)

        # Populate the first token of the target sequence with the start token (id 1).
        target_seq = tf.ones([batch_size,1])

        # Sampling loop for a batch of sequences: each step decodes one
        # token for every sequence in the batch.
        decoded_sentences = [[] for _ in range(batch_size)]
        for i in range(max_decode_length):
            output_tokens, decoder_state = self.decoder_model.predict(
                [target_seq,states_value])

            # Greedily pick the most likely token for each sequence
            sampled_token_index = np.argmax(output_tokens[:, -1, :],axis=-1)
            tokens = utils_preproc.int2word(output_tokenizer,sampled_token_index)
            for j in range(batch_size):
                decoded_sentences[j].append(tokens[j])

            # Update the target sequence (of length 1).
            target_seq = tf.expand_dims(tf.constant(sampled_token_index),axis=-1)

            # Update states
            states_value = decoder_state

        return decoded_sentences
    
    def predict(self, instances, **kwargs):
        machine_translations = self._decode_sequences(
            utils_preproc.preprocess(instances,self.encoder_tokenizer),
            self.decoder_tokenizer,
            MAX_TRANSLATE_LENGTH
        )
        return machine_translations
    

    @classmethod
    def from_path(cls, model_dir):
        encoder_model = tf.keras.models.load_model(os.path.join(model_dir,'encoder_model.h5'))
        decoder_model = tf.keras.models.load_model(os.path.join(model_dir,'decoder_model.h5'))
    
        # Use context managers so the tokenizer files are closed after loading
        with open(os.path.join(model_dir,'encoder_tokenizer.pkl'),'rb') as f:
            encoder_tokenizer = pickle.load(f)
        with open(os.path.join(model_dir,'decoder_tokenizer.pkl'),'rb') as f:
            decoder_tokenizer = pickle.load(f)

        return cls(encoder_model, encoder_tokenizer, decoder_model, decoder_tokenizer)

Overwriting predictor.py


### Test Predictor Class Works Locally

In [3]:
import predictor

sentences = [
    "No estamos comiendo.",
    "Está llegando el invierno.",
    "El invierno se acerca.",
    "Tom no comio nada.", 
    "Su pierna mala le impidió ganar la carrera.",
    "Su respuesta es erronea.",
    "¿Qué tal si damos un paseo después del almuerzo?"
]

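# Use a distinct name so the instance does not shadow the imported predictor module
translate_predictor = predictor.TranslatePredictor.from_path(MODEL_PATH)
translate_predictor.predict(sentences)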

W0620 18:32:32.843926 140375193675520 hdf5_format.py:171] No training configuration found in save file: the model was *not* compiled. Compile it manually.
W0620 18:32:33.056538 140375193675520 hdf5_format.py:171] No training configuration found in save file: the model was *not* compiled. Compile it manually.


[['we', 're', 'not', 'eating', '.', '<end>', '', '', '', '', ''],
 ['winter', 'is', 'on', 'the', 'grass', '.', '<end>', '', '', '', ''],
 ['winter', 'is', 'coming', 'on', '.', '<end>', '', '', '', '', ''],
 ['tom', 'didn', 't', 'eat', 'lunch', '.', '<end>', '', '', '', ''],
 ['her', 'car', 'turned', 'red', '.', '<end>', '', '', '', '', ''],
 ['her', 'answer', 'is', 'weak', '.', '<end>', '', '', '', '', ''],
 ['how', 'far', 'is', 'a', 'secret', '?', '<end>', '', '', '', '']]
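
The raw output is always padded out to the maximum decode length, so each row carries an `<end>` marker followed by empty strings. If we want readable sentences, one option is a small post-processing helper like the sketch below. `tokens_to_sentence` is a hypothetical helper, not something defined in predictor.py or utils_preproc.py, and it assumes the end marker is the literal string `<end>` as seen in the output above.

```python
END_TOKEN = '<end>'  # end marker as it appears in the predictor output


def tokens_to_sentence(tokens, end_token=END_TOKEN):
    """Drop everything from the end token onward and join the remaining tokens."""
    if end_token in tokens:
        tokens = tokens[:tokens.index(end_token)]
    return ' '.join(token for token in tokens if token)


raw = [
    ['we', 're', 'not', 'eating', '.', '<end>', '', '', '', '', ''],
    ['winter', 'is', 'coming', 'on', '.', '<end>', '', '', '', '', ''],
]
sentences_out = [tokens_to_sentence(tokens) for tokens in raw]
print(sentences_out)
```

This keeps the tokenizer's spacing (punctuation stays a separate token), so further detokenization would be a separate step.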

## 3. Package Predictor Class and Dependencies

We must package the predictor as a tar.gz source distribution package.

In [5]:
%%writefile setup.py
from setuptools import setup

setup(
    name='translate_custom_predict_code',
    version='0.1',
    scripts=['predictor.py','utils_preproc.py'])

Writing setup.py


In [None]:
!python setup.py sdist --formats=gztar

In [None]:
!gsutil cp dist/translate_custom_predict_code-0.1.tar.gz gs://$BUCKET/translate/predict_code/
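
Before deploying, it can be worth confirming that both modules actually made it into the archive. The sketch below uses the standard library `tarfile` module; `list_python_modules` is a hypothetical helper, and the demo runs against a throwaway archive so it does not depend on the `dist/` directory existing.

```python
import io
import os
import tarfile
import tempfile


def list_python_modules(sdist_path):
    """Return the base names of the .py files inside a .tar.gz archive."""
    with tarfile.open(sdist_path, 'r:gz') as archive:
        return sorted(
            name.rsplit('/', 1)[-1]
            for name in archive.getnames()
            if name.endswith('.py')
        )


# Demo on a temporary archive with made-up member names.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, 'demo-0.1.tar.gz')
    with tarfile.open(demo_path, 'w:gz') as archive:
        for member in ('demo-0.1/predictor.py',
                       'demo-0.1/utils_preproc.py',
                       'demo-0.1/PKG-INFO'):
            archive.addfile(tarfile.TarInfo(member), io.BytesIO(b''))
    demo_modules = list_python_modules(demo_path)

print(demo_modules)
```

To check the real package, we would call `list_python_modules('dist/translate_custom_predict_code-0.1.tar.gz')` and expect to see predictor.py and utils_preproc.py in the result, typically alongside setup.py.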

## 4. Deploy

This is similar to how we deploy standard models to AI Platform, with a few extra command line arguments.

*Warning: If you get a GCS access error, grant the 'Storage Object Viewer' role on the bucket that contains your artifacts to the service account being used.*

In [None]:
!gcloud ai-platform models create $MODEL_NAME --regions us-central1

# Change --runtime-version to 2.0 when supported
!gcloud beta ai-platform versions create $VERSION_NAME \
  --model $MODEL_NAME \
  --runtime-version 1.13 \
  --python-version 3.5 \
  --origin gs://$BUCKET/translate/model/ \
  --package-uris gs://$BUCKET/translate/predict_code/translate_custom_predict_code-0.1.tar.gz \
  --prediction-class predictor.TranslatePredictor

## 5. Invoke API

In [None]:
import googleapiclient.discovery

instances = [
    ["El soldado actuó valientemente."], 
    ["Su pierna mala le impidió ganar la carrera."]
]

service = googleapiclient.discovery.build('ml', 'v1')
name = 'projects/{}/models/{}/versions/{}'.format(PROJECT_ID, MODEL_NAME, VERSION_NAME)

response = service.projects().predict(
    name=name,
    body={'instances': instances}
).execute()

if 'error' in response:
    raise RuntimeError(response['error'])
else:
    print(response['predictions'])
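
The request itself is just a resource name plus a JSON body with an `instances` key. As a sketch, the hypothetical helper below builds both and checks up front that the body is JSON-serializable; the project and model names in the demo are placeholders. It also makes the version optional, since a name without a version segment targets the model's default version.

```python
import json


def build_predict_request(project_id, model_name, sentences, version_name=None):
    """Build the (name, body) pair passed to projects().predict()."""
    name = 'projects/{}/models/{}'.format(project_id, model_name)
    if version_name:
        name += '/versions/{}'.format(version_name)
    body = {'instances': list(sentences)}
    json.dumps(body)  # raises TypeError early if the body is not JSON-serializable
    return name, body


request_name, request_body = build_predict_request(
    'my-project', 'translate', ['El invierno se acerca.'], version_name='v1')
print(request_name)
print(request_body)
```

Failing early on a non-serializable body gives a clearer error than a failed HTTP call.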