# Text classification: Classify IMDb movie reviews

**BentoML makes moving trained ML models to production easy:**

* Package models trained with **any ML framework** and reproduce them for model serving in production
* **Deploy anywhere** for online API serving or offline batch serving
* High-Performance API model server with *adaptive micro-batching* support
* Central hub for managing models and deployment process via Web UI and APIs
* Modular and flexible design making it *adaptable to your infrastructure*

BentoML is a framework for serving, managing, and deploying machine learning models. It aims to bridge the gap between data science and DevOps, enabling teams to deliver prediction services in a fast, repeatable, and scalable way.



In [1]:
!pip install -q bentoml tensorflow



In [2]:
import tensorflow as tf
from tensorflow import keras

In [3]:
# constant variables
MAX_WORDS = 10000
REVIEW_CLASSES = ['negative', 'positive']

In [4]:
# Download the dataset from Keras
(_X_train, _y_train), (_X_test, _y_test) = keras.datasets.imdb.load_data(num_words=MAX_WORDS)  # keep only the 10,000 most frequent words

In [5]:
# Check the data shapes and look at the first two encoded reviews
print("X_train shape: {}\ny_train shape:{}".format(_X_train.shape, _y_train.shape))
print(type(_X_train.shape))
_X_train[:2]

X_train shape: (25000,)
y_train shape:(25000,)
<class 'tuple'>


array([list([1, 14, 22, 16, 43, 530, 973, 1622, 1385, 65, 458, 4468, 66, 3941, 4, 173, 36, 256, 5, 25, 100, 43, 838, 112, 50, 670, 2, 9, 35, 480, 284, 5, 150, 4, 172, 112, 167, 2, 336, 385, 39, 4, 172, 4536, 1111, 17, 546, 38, 13, 447, 4, 192, 50, 16, 6, 147, 2025, 19, 14, 22, 4, 1920, 4613, 469, 4, 22, 71, 87, 12, 16, 43, 530, 38, 76, 15, 13, 1247, 4, 22, 17, 515, 17, 12, 16, 626, 18, 2, 5, 62, 386, 12, 8, 316, 8, 106, 5, 4, 2223, 5244, 16, 480, 66, 3785, 33, 4, 130, 12, 16, 38, 619, 5, 25, 124, 51, 36, 135, 48, 25, 1415, 33, 6, 22, 12, 215, 28, 77, 52, 5, 14, 407, 16, 82, 2, 8, 4, 107, 117, 5952, 15, 256, 4, 2, 7, 3766, 5, 723, 36, 71, 43, 530, 476, 26, 400, 317, 46, 7, 4, 2, 1029, 13, 104, 88, 4, 381, 15, 297, 98, 32, 2071, 56, 26, 141, 6, 194, 7486, 18, 4, 226, 22, 21, 134, 476, 26, 480, 5, 144, 30, 5535, 18, 51, 36, 28, 224, 92, 25, 104, 4, 226, 65, 16, 38, 1334, 88, 12, 16, 283, 5, 16, 4472, 113, 103, 32, 15, 16, 5345, 19, 178, 32]),
       list([1, 194, 1153, 194, 8255, 78, 228,

## Reverse Word Index

In [6]:
# word_index[<str>] = <int>
word_index = tf.keras.datasets.imdb.get_word_index()

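# Shift every index by 3 to reserve ids 0-3 for the special tokens below (load_data uses index_from=3)
word_index = {k: (v + 3) for k, v in word_index.items()}
word_index["<PAD>"] = 0
word_index["<START>"] = 1
word_index["<UNK>"] = 2
word_index["<UNUSED>"] = 3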

# word_index.items  <str> to <int>
# reverse_word_index <int> to <str>
reverse_word_index = dict([(value, key) for (key, value) in word_index.items()])


def decode_review(text):
    return ' '.join([reverse_word_index.get(i, '#') for i in text])

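# Encode a raw string as a list of word ids: <str> -> [<int>]
def encode_review(text):
    words = text.split(' ')
    ids = [word_index["<START>"]]
    for w in words:
        v = word_index.get(w, word_index["<UNK>"])
        # Ids >= MAX_WORDS are outside the 10,000-word vocabulary the model is trained on
        # (Embedding(10000, ...) only accepts ids 0..9999); load_data maps such words to <UNK>
        if v >= MAX_WORDS:
            v = word_index["<UNK>"]
        ids.append(v)
    return ids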
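The index shift and the encode/decode pair above can be checked on a toy vocabulary. The sketch below is a minimal, self-contained stand-in: it assumes a hypothetical five-word `TOY_RAW_INDEX` in place of `imdb.get_word_index()` and a toy cutoff `TOY_MAX_WORDS = 8`, mirrors the +3 shift and the four special tokens, and shows that an id equal to the cutoff is already out of range. The `toy_` prefixes keep it from overwriting the notebook's own `word_index` if you run it in the same kernel.

```python
from typing import List

# Hypothetical toy vocabulary standing in for imdb.get_word_index() (ranks start at 1).
TOY_RAW_INDEX = {"the": 1, "movie": 2, "was": 3, "good": 4, "bad": 5}
TOY_MAX_WORDS = 8  # toy cutoff: only ids 0..7 are valid, like an Embedding(8, ...) layer

# Same +3 shift as above, then the reserved special tokens.
toy_word_index = {w: i + 3 for w, i in TOY_RAW_INDEX.items()}
toy_word_index.update({"<PAD>": 0, "<START>": 1, "<UNK>": 2, "<UNUSED>": 3})
toy_reverse_index = {i: w for w, i in toy_word_index.items()}


def toy_encode(text: str) -> List[int]:
    """Prepend <START>, look up each word, and map unknown or rare words to <UNK>."""
    ids = [toy_word_index["<START>"]]
    for w in text.split():
        v = toy_word_index.get(w, toy_word_index["<UNK>"])
        if v >= TOY_MAX_WORDS:  # an id equal to the cutoff is already out of range
            v = toy_word_index["<UNK>"]
        ids.append(v)
    return ids


def toy_decode(ids: List[int]) -> str:
    """Turn ids back into words, using '#' for ids with no entry."""
    return " ".join(toy_reverse_index.get(i, "#") for i in ids)
```

For example, `toy_encode("the movie was bad")` returns `[1, 4, 5, 6, 2]`: "bad" shifts to id 8, which equals the cutoff, so it becomes `<UNK>`. A `>` comparison would let id 8 through and the embedding lookup would fail.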

## Word Embeddings

In [7]:
X_train = keras.preprocessing.sequence.pad_sequences(_X_train,
                                                     dtype='int32',
                                                     value=word_index["<PAD>"],
                                                     padding='post',
                                                     maxlen=256)

X_test = keras.preprocessing.sequence.pad_sequences(_X_test,
                                                    dtype='int32',
                                                    value=word_index["<PAD>"],
                                                    padding='post',
                                                    maxlen=256)

# One-hot encode the 0/1 labels into two columns: [negative, positive]
y_train = tf.one_hot(_y_train, depth=2)
y_test = tf.one_hot(_y_test, depth=2)


print("X: ", X_train.shape, X_train.dtype, X_test.dtype)

#print("y: ", y_train.shape, y_train[:2])

X:  (25000, 256) int32 int32
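`pad_sequences` is called with `padding='post'` but leaves `truncating` at its default, `'pre'`, so reviews longer than 256 tokens keep their *last* 256 ids and lose the leading `<START>` marker. A minimal pure-Python approximation of that behaviour, plus the one-hot step, is sketched below; `pad_post` and `one_hot_labels` are hypothetical helpers for illustration, not Keras functions.

```python
from typing import List, Sequence


def pad_post(seqs: Sequence[Sequence[int]], maxlen: int, value: int = 0) -> List[List[int]]:
    """Rough equivalent of pad_sequences(seqs, maxlen=maxlen, value=value, padding='post')."""
    if maxlen <= 0:
        raise ValueError("maxlen must be positive")
    padded = []
    for seq in seqs:
        kept = list(seq)[-maxlen:]  # truncating='pre' (the default): drop tokens from the front
        padded.append(kept + [value] * (maxlen - len(kept)))  # padding='post': fill at the end
    return padded


def one_hot_labels(labels: Sequence[int], depth: int = 2) -> List[List[float]]:
    """Turn 0/1 labels into rows like [1.0, 0.0] (negative) and [0.0, 1.0] (positive)."""
    return [[1.0 if j == y else 0.0 for j in range(depth)] for y in labels]
```

With `maxlen=4`, `pad_post([[1, 5, 6]], maxlen=4)` gives `[[1, 5, 6, 0]]`, while `pad_post([[1, 2, 3, 4, 5, 6]], maxlen=4)` gives `[[3, 4, 5, 6]]`, dropping the `<START>` id 1. Passing `truncating='post'` to `pad_sequences` would keep the beginning of long reviews instead.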


In [8]:
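# Define the model: embedding -> average pooling -> hidden ReLU layer -> 2 sigmoid outputs
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(MAX_WORDS, 8),
    tf.keras.layers.GlobalAvgPool1D(),
    tf.keras.layers.Dense(6, activation="relu"),
    tf.keras.layers.Dense(2, activation="sigmoid"),
])

# Compile: binary cross-entropy on the one-hot labels, tracking accuracy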
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])
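To make the architecture concrete, here is a NumPy sketch of the same forward pass: embedding lookup, averaging over positions, a ReLU layer, and a two-unit sigmoid output. The weights are random placeholders with the model's shapes, not the trained values. One detail it makes visible: the `Embedding` layer is created without `mask_zero=True`, so `GlobalAvgPool1D` averages over all 256 positions and the `<PAD>` positions contribute to the average. The two sigmoid outputs are also independent scores in (0, 1) that need not sum to 1.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, EMBED_DIM, HIDDEN, CLASSES, MAXLEN = 10000, 8, 6, 2, 256

# Random stand-ins for the trained weights (assumption: only the shapes match the model).
E = rng.normal(scale=0.05, size=(VOCAB, EMBED_DIM))
W1, b1 = rng.normal(scale=0.1, size=(EMBED_DIM, HIDDEN)), np.zeros(HIDDEN)
W2, b2 = rng.normal(scale=0.1, size=(HIDDEN, CLASSES)), np.zeros(CLASSES)


def forward(token_ids: np.ndarray) -> np.ndarray:
    """token_ids: int array of shape (batch, MAXLEN) -> scores of shape (batch, CLASSES)."""
    x = E[token_ids]                    # Embedding: (batch, MAXLEN, EMBED_DIM)
    x = x.mean(axis=1)                  # GlobalAvgPool1D: mean over every position, padding included
    h = np.maximum(x @ W1 + b1, 0.0)    # Dense(6, relu)
    logits = h @ W2 + b2                # Dense(2) pre-activation
    return 1.0 / (1.0 + np.exp(-logits))  # sigmoid: each score independently in (0, 1)
```

Feeding it a padded batch such as `np.zeros((3, MAXLEN), dtype=np.int32)` returns a `(3, 2)` array, the same shape `model(X)` produces.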

## Train the Model

In [9]:
model.fit(X_train, y_train, epochs=40, batch_size=512)

Epoch 1/40
...
Epoch 40/40


<tensorflow.python.keras.callbacks.History at 0x7fabb9d66850>

In [10]:
# Evaluate on the test set; returns [loss, accuracy]
model.evaluate(X_test, y_test)



[0.2944161593914032, 0.881600022315979]

## Save the Model

In [11]:
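!mkdir -p imdb_model
# Save the model in TensorFlow SavedModel format
model.save('imdb_model/imdb')
# Reload it with Keras to check that the saved model is usable
saved_model = tf.keras.models.load_model('imdb_model/imdb')
# Print the architecture of the reloaded model
saved_model.summary()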

Instructions for updating:
This property should not be used in TensorFlow 2.0, as updates are applied automatically.
INFO:tensorflow:Assets written to: imdb_model/imdb/assets
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
embedding (Embedding)        (None, None, 8)           80000     
_________________________________________________________________
global_average_pooling1d (Gl (None, 8)                 0         
_________________________________________________________________
dense (Dense)                (None, 6)                 54        
_________________________________________________________________
dense_1 (Dense)              (None, 2)                 14        
=================================================================
Total params: 80,068
Trainable params: 80,068
Non-trainable params: 0
_________________________________________________________________
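The `Param #` column follows directly from the layer shapes: an embedding stores one 8-dimensional vector per token id, and a dense layer stores a weight matrix plus one bias per output unit. A quick arithmetic check, using small helper functions written for this illustration:

```python
def embedding_params(vocab_size: int, embed_dim: int) -> int:
    return vocab_size * embed_dim      # one embed_dim-long vector per token id


def dense_params(n_in: int, n_out: int) -> int:
    return n_in * n_out + n_out        # weight matrix plus one bias per output unit


param_counts = {
    "embedding": embedding_params(10000, 8),
    "global_average_pooling1d": 0,     # pooling has no weights
    "dense": dense_params(8, 6),
    "dense_1": dense_params(6, 2),
}
total_params = sum(param_counts.values())
print(param_counts, total_params)
```

Almost all of the 80,068 parameters live in the embedding table, which is why `MAX_WORDS` dominates the model size.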

In [12]:
# define a predict function for production
def predict(texts):
    X = [encode_review(t) for t in texts]
    X = keras.preprocessing.sequence.pad_sequences(X,
                                                   dtype="int32",
                                                   value=word_index["<PAD>"],
                                                   padding='post',
                                                   maxlen=256)
    y = saved_model(X)
    return [REVIEW_CLASSES[c] for c in tf.argmax(y, axis=1).numpy().tolist()]

predict(['it is funny.', 'just so good', 'oh, bad'])

['positive', 'negative', 'negative']
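`predict` takes a batch of texts and returns one label per text by taking the argmax over the two sigmoid outputs. That last step can be isolated and tested without TensorFlow. The sketch assumes scores shaped `(batch, 2)` in the order of `REVIEW_CLASSES`; `scores_to_labels` is a hypothetical helper, not part of the notebook.

```python
import numpy as np

CLASS_NAMES = ['negative', 'positive']  # same order as REVIEW_CLASSES


def scores_to_labels(scores):
    """Map a (batch, 2) array of scores to one label per row via argmax."""
    idx = np.argmax(np.asarray(scores), axis=1)
    return [CLASS_NAMES[i] for i in idx.tolist()]
```

Because `np.argmax` returns the first maximum, a tie such as `[[0.5, 0.5]]` maps to `'negative'`.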

## Evaluate the Reloaded Model

In [13]:
# Evaluate the reloaded model; the results should match the original model's
results = saved_model.evaluate(X_test, y_test, verbose=2)
print(results)

782/782 - 0s - loss: 0.2944 - accuracy: 0.8816
[0.2944161593914032, 0.881600022315979]


## Create BentoService class

In [14]:
%%writefile tensorflow_text_classification.py

import bentoml
import tensorflow as tf
from tensorflow import keras

from bentoml.artifact import TensorflowSavedModelArtifact
from bentoml.adapters import JsonInput



REVIEW_CLASSES = ['negative', 'positive']

MAX_WORDS = 10000
word_index = tf.keras.datasets.imdb.get_word_index()
word_index = {k:(v+3) for k,v in word_index.items()}
word_index["<PAD>"] = 0
word_index["<START>"] = 1
word_index["<UNK>"] = 2  # unknown
word_index["<UNUSED>"] = 3

# tf.keras.models.load_model("imdb_model/imdb")
reverse_word_index = dict([(value, key) for (key, value) in word_index.items()])

def encode_review(text):
    words = text.split(' ')
    ids = [word_index["<START>"]]
    for w in words:
        v = word_index.get(w, word_index["<UNK>"])
        # Ids >= MAX_WORDS are outside the trained vocabulary; map them to <UNK> as load_data does
        if v >= MAX_WORDS:
            v = word_index["<UNK>"]
        ids.append(v)
    return ids


@bentoml.env(pip_dependencies=['tensorflow'])
@bentoml.artifacts([TensorflowSavedModelArtifact('model')])
class ImdbTensorflow(bentoml.BentoService):

    @bentoml.api(input=JsonInput(), batch=True)
    def predict(self, texts):
        X = [encode_review(t) for t in texts]
        X = keras.preprocessing.sequence.pad_sequences(X,
                                                       dtype="float32",
                                                       value=word_index["<PAD>"],
                                                       padding='post',
                                                       maxlen=256)
        y = self.artifacts.model(X)
        return [REVIEW_CLASSES[c] for c in tf.argmax(y, axis=1).numpy().tolist()]

Overwriting tensorflow_text_classification.py


In [15]:
import tensorflow_text_classification

# To pick up edits to the file without restarting the kernel:
# import importlib
# importlib.reload(tensorflow_text_classification)

service = tensorflow_text_classification.ImdbTensorflow()

service.pack("model", model)
service.save()

INFO:tensorflow:Assets written to: /tmp/bentoml-temp-qwrlvy7c/ImdbTensorflow/artifacts/model_saved_model/assets
[2020-11-04 21:35:40,627] INFO - BentoService bundle 'ImdbTensorflow:20201104213539_7F3867' saved to: /home/ruhan/bentoml/repository/ImdbTensorflow/20201104213539_7F3867


'/home/ruhan/bentoml/repository/ImdbTensorflow/20201104213539_7F3867'

## Use BentoService with BentoML CLI

**`bentoml get <BentoService name>` lists all versions of a BentoService**

In [16]:
!bentoml get ImdbTensorflow

BENTO_SERVICE                         AGE                         APIS                                      ARTIFACTS                            LABELS
ImdbTensorflow:20201104213539_7F3867  0.83 seconds                predict<JsonInput:DefaultOutput>          model<TensorflowSavedModelArtifact>
ImdbTensorflow:20201104213353_E9D9A3  1 minute and 45.84 seconds  predict<JsonInput:DefaultOutput>          model<TensorflowSavedModelArtifact>
ImdbTensorflow:20201104165124_8722E4  4 hours and 44 minutes      predict<JsonInput:DefaultOutput>          model<TensorflowSavedModelArtifact>
ImdbTensorflow:20201104164837_D92DB2  4 hours and 47 minutes      predict<JsonInput:DefaultOutput>          model<TensorflowSavedModelArtifact>
ImdbTensorflow:20201104145151_E24226  6 hours and 43 minutes      predict<JsonInput:DefaultOutput>          model<TensorflowSavedModelArtifact>
ImdbTensorflow:20201104144303_962D3F  6 hours and 52 minutes      predict<JsonInput:DefaultOutput>          model<Tensorflo

**`bentoml get <BentoService name>:<BentoService version>` displays detailed information about a specific BentoService version**

In [17]:
!bentoml get ImdbTensorflow:latest

[2020-11-04 21:35:43,520] INFO - Getting latest version ImdbTensorflow:20201104213539_7F3867
{
  "name": "ImdbTensorflow",
  "version": "20201104213539_7F3867",
  "uri": {
    "type": "LOCAL",
    "uri": "/home/ruhan/bentoml/repository/ImdbTensorflow/20201104213539_7F3867"
  },
  "bentoServiceMetadata": {
    "name": "ImdbTensorflow",
    "version": "20201104213539_7F3867",
    "createdAt": "2020-11-04T13:35:40.593944Z",
    "env": {
      "condaEnv": "name: bentoml-default-conda-env\nchannels:\n- conda-forge\n- defaults\ndependencies:\n- pip\n",
      "pythonVersion": "3.7.9",
      "dockerBaseImage": "bentoml/model-server:0.9.2-py37",
      "pipPackages": [
        "bentoml==0.9.2",
        "tensorflow==2.3.1"
      ]
    },
    "artifacts": [
      {
        "name": "model",
        "artifactType": "TensorflowSavedModelArtifact"
      }
    ],
    "apis": [
      {
        "name": "predict",
        "inputType": "JsonInput",
        "docs": "BentoService inference API 'predict'

**Serve the BentoML REST API server locally**

In [18]:
# Run this in a terminal instead: the server stays in the foreground and would block the notebook
# !bentoml serve ImdbTensorflow:latest

## Query REST API with Python

In [21]:
import requests

headers = {"content-type": "application/json"}
# The request body must be a JSON string, hence the extra double quotes around the review
review = '"good"'
json_response = requests.post('http://localhost:5000/predict', data=review, headers=headers)
print(json_response)
print(json_response.text)

<Response [200]>
"positive"
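Quoting the body by hand (`'"good"'`) breaks as soon as a review contains a double quote. The sketch below lets `json.dumps` build the body instead and uses only the standard library's `urllib.request` rather than `requests`. The URL is taken from the cells above and assumes `bentoml serve` is running on that port; `build_request` and `query` are hypothetical helpers.

```python
import json
import urllib.request

# Assumption: the server started with `bentoml serve` listens here, as in the cells above.
PREDICT_URL = "http://localhost:5000/predict"


def build_request(review: str, url: str = PREDICT_URL) -> urllib.request.Request:
    """JSON-encode the review so the body is a valid JSON string literal, e.g. b'"good"'."""
    body = json.dumps(review).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


def query(review: str, url: str = PREDICT_URL) -> str:
    """Send one review and return the decoded JSON response (requires a running server)."""
    with urllib.request.urlopen(build_request(review, url), timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, `query("just so good")` sends a single review and returns the label string from the response. A review like `'a "must see" film'` is escaped correctly by `json.dumps`, where the hand-quoted version would produce invalid JSON.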


## Query REST API with cURL

In [22]:
!curl -X POST "http://localhost:5000/predict" -H "accept: */*" -H "Content-Type: application/json" -d "\"good\""

"positive"

# Reference

- https://www.tensorflow.org/tutorials/keras/text_classification