# Text classification: classifying IMDB reviews

**BentoML makes moving trained ML models to production easy:**

* Package models trained with **any ML framework** and reproduce them for model serving in production
* **Deploy anywhere** for online API serving or offline batch serving
* High-Performance API model server with *adaptive micro-batching* support
* Central hub for managing models and deployment process via Web UI and APIs
* Modular and flexible design making it *adaptable to your infrastructure*

BentoML is a framework for serving, managing, and deploying machine learning models. It aims to bridge the gap between Data Science and DevOps, and to enable teams to deliver prediction services in a fast, repeatable, and scalable way.



In [1]:
!pip install -q bentoml tensorflow



In [1]:
import tensorflow as tf
from tensorflow import keras

In [2]:
# constant variables
MAX_WORDS = 10000
REVIEW_CLASSES = ['negative', 'positive']

In [4]:
## download dataset from keras.

# keep only the MAX_WORDS (10,000) most frequent words in the vocabulary
(_X_train, _y_train), (_X_test, _y_test) = keras.datasets.imdb.load_data(num_words=MAX_WORDS)

In [5]:
## check the data
print("X_train shape: {}\ny_train shape:{}".format(_X_train.shape, _y_train.shape))
print(type(_X_train.shape))
_X_train[:2]

X_train shape: (25000,)
y_train shape:(25000,)
<class 'tuple'>


array([list([1, 14, 22, 16, 43, 530, 973, 1622, 1385, 65, 458, 4468, 66, 3941, 4, 173, 36, 256, 5, 25, 100, 43, 838, 112, 50, 670, 2, 9, 35, 480, 284, 5, 150, 4, 172, 112, 167, 2, 336, 385, 39, 4, 172, 4536, 1111, 17, 546, 38, 13, 447, 4, 192, 50, 16, 6, 147, 2025, 19, 14, 22, 4, 1920, 4613, 469, 4, 22, 71, 87, 12, 16, 43, 530, 38, 76, 15, 13, 1247, 4, 22, 17, 515, 17, 12, 16, 626, 18, 2, 5, 62, 386, 12, 8, 316, 8, 106, 5, 4, 2223, 5244, 16, 480, 66, 3785, 33, 4, 130, 12, 16, 38, 619, 5, 25, 124, 51, 36, 135, 48, 25, 1415, 33, 6, 22, 12, 215, 28, 77, 52, 5, 14, 407, 16, 82, 2, 8, 4, 107, 117, 5952, 15, 256, 4, 2, 7, 3766, 5, 723, 36, 71, 43, 530, 476, 26, 400, 317, 46, 7, 4, 2, 1029, 13, 104, 88, 4, 381, 15, 297, 98, 32, 2071, 56, 26, 141, 6, 194, 7486, 18, 4, 226, 22, 21, 134, 476, 26, 480, 5, 144, 30, 5535, 18, 51, 36, 28, 224, 92, 25, 104, 4, 226, 65, 16, 38, 1334, 88, 12, 16, 283, 5, 16, 4472, 113, 103, 32, 15, 16, 5345, 19, 178, 32]),
       list([1, 194, 1153, 194, 8255, 78, 228,

## Reverse Word Index

In [6]:
# word_index[<str>] = <int>
word_index = tf.keras.datasets.imdb.get_word_index()

word_index = {k:(v+3) for k,v in word_index.items()}
word_index["<PAD>"] = 0
word_index["<START>"] = 1
word_index["<UNK>"] = 2  
word_index["<UNUSED>"] = 3

# word_index.items  <str> to <int>
# reverse_word_index <int> to <str>
reverse_word_index = dict([(value, key) for (key, value) in word_index.items()])


def decode_review(text):
    return ' '.join([reverse_word_index.get(i, '#') for i in text])

# <str> to <int>
def encode_review(text):
    words = text.split(' ')
    ids = [word_index["<START>"]]
    for w in words:
        v = word_index.get(w, word_index["<UNK>"])
        # ids at or above MAX_WORDS fall outside the embedding table; map them to <UNUSED>
        if v >= MAX_WORDS:
            v = word_index["<UNUSED>"]
        ids.append(v)
    return ids    
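
The round trip between text and ids can be checked without downloading the real vocabulary. The following is a minimal sketch that assumes a tiny, hypothetical word index (`raw_index`) and applies the same `+3` offset and reserved tokens as the cell above; `toy_encode` and `toy_decode` are illustrative names, not part of the notebook.

```python
# Toy vocabulary standing in for the real IMDB word index (hypothetical values).
raw_index = {"the": 1, "movie": 2, "good": 3}

# Shift every id by 3 to make room for the reserved tokens, as in the cell above.
toy_index = {word: idx + 3 for word, idx in raw_index.items()}
toy_index.update({"<PAD>": 0, "<START>": 1, "<UNK>": 2, "<UNUSED>": 3})
toy_reverse = {idx: word for word, idx in toy_index.items()}


def toy_encode(text):
    """Map a space-separated string to ids, prefixed with <START>."""
    ids = [toy_index["<START>"]]
    ids.extend(toy_index.get(w, toy_index["<UNK>"]) for w in text.split(" "))
    return ids


def toy_decode(ids):
    """Map ids back to words; ids with no entry become '#'."""
    return " ".join(toy_reverse.get(i, "#") for i in ids)


encoded = toy_encode("the movie was good")
decoded = toy_decode(encoded)
print(encoded)
print(decoded)
```

Like `encode_review`, this sketch neither lowercases nor strips punctuation, so a token such as `good.` would fall back to `<UNK>`.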

## Word Embeddings

In [7]:
X_train = keras.preprocessing.sequence.pad_sequences(_X_train,
                                                     dtype='int32',
                                                     value=word_index["<PAD>"],
                                                     padding='post',
                                                     maxlen=256)

X_test = keras.preprocessing.sequence.pad_sequences(_X_test,
                                                    dtype='int32',
                                                    value=word_index["<PAD>"],
                                                    padding='post',
                                                    maxlen=256)


# one-hot encode the labels into two columns: [negative, positive]
y_train = tf.one_hot(_y_train, depth=2)
y_test = tf.one_hot(_y_test, depth=2)


print("X: ", X_train.shape, X_train.dtype, X_test.dtype)
#print("y: ", y_train.shape, y_train[:2])

X:  (25000, 256) int32 int32
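
To make the effect of `padding='post'` and `maxlen` concrete, here is a small pure-Python sketch of the same behaviour. It assumes the default `truncating='pre'` of `pad_sequences`, which the cell above does not override; `pad_post` is a hypothetical helper written for illustration only.

```python
def pad_post(seq, maxlen, pad_value=0):
    """Mimic pad_sequences(padding='post') with the default truncating='pre'."""
    # Sequences longer than maxlen keep only their last maxlen ids.
    seq = list(seq)[-maxlen:]
    # Shorter sequences get pad_value appended on the right.
    return seq + [pad_value] * (maxlen - len(seq))


short_review = pad_post([1, 14, 22], maxlen=5)
long_review = pad_post([1, 14, 22, 16, 43, 530, 973], maxlen=5)
print(short_review)
print(long_review)
```

Note that with these settings a review longer than 256 tokens loses its beginning, including the `<START>` id, rather than its end.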


In [8]:
# model setting
model = tf.keras.Sequential([
            tf.keras.layers.Embedding(10000, 8),
            tf.keras.layers.GlobalAvgPool1D(),
            tf.keras.layers.Dense(6, activation="relu"),
            tf.keras.layers.Dense(2, activation="sigmoid"),
        ])


model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])
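
As a sanity check on the model size, the number of trainable parameters can be worked out by hand from the layer sizes above. This is a sketch of the arithmetic only; the variable names are illustrative.

```python
# Layer sizes taken from the model definition above.
vocab_size, embed_dim, hidden_units, n_classes = 10000, 8, 6, 2

# Embedding: one embed_dim-sized vector per vocabulary id.
embedding_params = vocab_size * embed_dim
# Global average pooling has no trainable parameters.
pooling_params = 0
# Dense layers: weight matrix plus one bias per output unit.
hidden_params = embed_dim * hidden_units + hidden_units
output_params = hidden_units * n_classes + n_classes

total_params = embedding_params + pooling_params + hidden_params + output_params
print(total_params)
```

Almost all of the parameters sit in the embedding table, so the vocabulary size dominates the size of the saved model.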

## Train the Model

In [9]:
model.fit(X_train, y_train, epochs=30, batch_size=512)

Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 26/30
Epoch 27/30
Epoch 28/30
Epoch 29/30
Epoch 30/30


<tensorflow.python.keras.callbacks.History at 0x7f98b37567f0>

In [10]:
# evaluate loss and accuracy on the test set
model.evaluate(X_test, y_test)



[0.2874763607978821, 0.8834400177001953]

In [13]:
# define a predict function for production
def predict(texts):
    # your input validation code here
    X = [encode_review(t) for t in texts]
    X = keras.preprocessing.sequence.pad_sequences(X,
                                                   dtype="int32",
                                                   value=word_index["<PAD>"],
                                                   padding='post',
                                                   maxlen=256)
    y = model(X)
    return [REVIEW_CLASSES[c] for c in tf.argmax(y, axis=1).numpy().tolist()]

predict(['it is funfunnyny.', 'just so good', 'oh, bad'])

['positive', 'positive', 'negative']
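
The last line of `predict` turns each row of scores into a label by taking the index of the largest score. A pure-Python sketch of that step, using made-up scores in place of real model output:

```python
REVIEW_CLASSES = ['negative', 'positive']

# Hypothetical model outputs: one [negative_score, positive_score] row per review.
scores = [[0.2, 0.8], [0.9, 0.1], [0.4, 0.6]]


def argmax(row):
    """Return the index of the largest value in row."""
    return max(range(len(row)), key=row.__getitem__)


labels = [REVIEW_CLASSES[argmax(row)] for row in scores]
print(labels)
```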

## Create BentoService class

In [14]:
%%writefile tensorflow_text_classification.py

import bentoml
import tensorflow as tf
from tensorflow import keras

from bentoml.artifact import TensorflowSavedModelArtifact
from bentoml.adapters import JsonInput



REVIEW_CLASSES = ['negative', 'positive']

MAX_WORDS = 10000
word_index = tf.keras.datasets.imdb.get_word_index()
word_index = {k:(v+3) for k,v in word_index.items()}
word_index["<PAD>"] = 0
word_index["<START>"] = 1
word_index["<UNK>"] = 2  # unknown
word_index["<UNUSED>"] = 3

# tf.keras.models.load_model("imdb_model/imdb")
reverse_word_index = dict([(value, key) for (key, value) in word_index.items()])

def encode_review(text):
    words = text.split(' ')
    ids = [word_index["<START>"]]
    for w in words:
        v = word_index.get(w, word_index["<UNK>"])
        # ids at or above MAX_WORDS fall outside the embedding table; map them to <UNUSED>
        if v >= MAX_WORDS:
            v = word_index["<UNUSED>"]
        ids.append(v)
    return ids


@bentoml.env(pip_dependencies=['tensorflow'])
@bentoml.artifacts([TensorflowSavedModelArtifact('model')])
class ImdbTensorflow(bentoml.BentoService):

    @bentoml.api(input=JsonInput(), batch=True)
    def predict(self, texts):
        X = [encode_review(t) for t in texts]
        X = keras.preprocessing.sequence.pad_sequences(X,
                                                       dtype="float32",
                                                       value=word_index["<PAD>"],
                                                       padding='post',
                                                       maxlen=256)
        y = self.artifacts.model(X)
        return [REVIEW_CLASSES[c] for c in tf.argmax(y, axis=1).numpy().tolist()]

Overwriting tensorflow_text_classification.py


In [15]:
import tensorflow_text_classification

service = tensorflow_text_classification.ImdbTensorflow()

service.pack("model", model)
service.save()

Instructions for updating:
This property should not be used in TensorFlow 2.0, as updates are applied automatically.
INFO:tensorflow:Assets written to: /var/folders/c0/p81lrfs94tq4hn8065r74b300000gn/T/tmpii8ip69a/assets
[2020-11-16 10:03:31,575] INFO - Detected non-PyPI-released BentoML installed, copying local BentoML modulefiles to target saved bundle path..


no previously-included directories found matching 'e2e_tests'
no previously-included directories found matching 'tests'
no previously-included directories found matching 'benchmark'


UPDATING BentoML-0.9.2+25.g7796754/bentoml/_version.py
set BentoML-0.9.2+25.g7796754/bentoml/_version.py to '0.9.2+25.g7796754'
[2020-11-16 10:03:32,692] INFO - BentoService bundle 'ImdbTensorflow:20201116100327_8F8C4D' saved to: /Users/agent/bentoml/repository/ImdbTensorflow/20201116100327_8F8C4D


'/Users/agent/bentoml/repository/ImdbTensorflow/20201116100327_8F8C4D'

## Use BentoService with BentoML CLI

**`bentoml get <BentoService name>` lists all versions of a BentoService**

In [None]:
!bentoml get ImdbTensorflow

**`bentoml get <BentoService name>:<BentoService version>` displays detailed information about a specific BentoService version**

In [16]:
!bentoml get ImdbTensorflow:latest

[2020-11-16 10:03:56,542] INFO - Getting latest version ImdbTensorflow:20201116100327_8F8C4D
{
  "name": "ImdbTensorflow",
  "version": "20201116100327_8F8C4D",
  "uri": {
    "type": "LOCAL",
    "uri": "/Users/agent/bentoml/repository/ImdbTensorflow/20201116100327_8F8C4D"
  },
  "bentoServiceMetadata": {
    "name": "ImdbTensorflow",
    "version": "20201116100327_8F8C4D",
    "createdAt": "2020-11-16T02:03:31.513604Z",
    "env": {
      "condaEnv": "name: bentoml-default-conda-env\nchannels:\n- conda-forge\n- defaults\ndependencies:\n- pip\n",
      "pythonVersion": "3.6.9",
      "dockerBaseImage": "bentoml/model-server:0.9.2-py36",
      "pipPackages": [
        "bentoml==0.9.2",
        "tensorflow==2.3.1"
      ]
    },
    "artifacts": [
      {
        "name": "model",
        "artifactType": "TensorflowSavedModelArtifact",
        "metadata": {}
      }
    ],
    "apis": [
      {
        "name": "predict",
        "inputType": "JsonInput",
        "docs": "BentoServic

In [19]:
!bentoml run ImdbTensorflow:latest predict --input '"just okay"'

[2020-11-16 10:15:27,439] INFO - Getting latest version ImdbTensorflow:20201116100327_8F8C4D
2020-11-16 10:15:31.750823: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2020-11-16 10:15:31.762636: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7fd4fe936da0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-11-16 10:15:31.762659: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
[2020-11-16 10:15:31,956] INFO - {'service_name': 'ImdbTensorflow', 'service_version': '20201116100327_8F8C4D', 'api': 'predict', 'task': {'data': '"just okay"', 'task_id': '166e2fc7-9c2a-4d20-94c1-65e62369f394', 'cli_args': ('--input', '"just 

### **Serve the BentoML REST server**

For testing, run this command in a shell:

> bentoml serve ImdbTensorflow:latest

For production:

> bentoml serve-gunicorn ImdbTensorflow:latest --workers 1

With micro-batching enabled:

> bentoml serve-gunicorn ImdbTensorflow:latest --workers 1 --enable-microbatch

## Query REST API with python

In [17]:
import requests

headers = {"content-type": "application/json"}
# the request body is a single review, encoded as a JSON string
review = '"good"'
json_response = requests.post('http://localhost:5000/predict', data=review, headers=headers)
print(json_response)
print(json_response.text)

<Response [200]>
"positive"

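
Hand-quoting the request body works for a single word but gets fragile once a review contains quotes or other special characters. A minimal sketch of building the body with the standard `json` module instead; the review text is made up, and the `requests.post` call is left commented out because it needs the server from the previous section to be running.

```python
import json

# Hypothetical review containing characters that would break hand-written quoting.
review = 'they said "just so good", and it was'

# json.dumps adds the surrounding quotes and escapes the inner ones.
body = json.dumps(review)
headers = {"content-type": "application/json"}
print(body)

# With the server running, the request would look like this:
# import requests
# response = requests.post("http://localhost:5000/predict", data=body, headers=headers)

# The response text is itself a JSON string and can be decoded the same way.
label = json.loads('"positive"')
print(label)
```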

## Query REST API with cURL

In [18]:
!curl -X POST "http://localhost:5000/predict" -H "accept: */*" -H "Content-Type: application/json" -d "\"good\""

"positive"

# Reference

- https://www.tensorflow.org/tutorials/keras/text_classification