# Basic classification: Classify images of clothing

BentoML is an open-source framework for machine learning **model serving**, aiming to **bridge the gap between Data Science and DevOps.**

Data Scientists can easily package models trained with any ML framework using BentoML and reproduce them for serving in production. BentoML helps manage packaged models in the BentoML format, and allows DevOps to deploy them as online API serving endpoints or offline batch inference jobs on any cloud platform.

Before reading this example project, be sure to check out the [Getting started guide](https://github.com/bentoml/BentoML/blob/master/guides/quick-start/bentoml-quick-start-guide.ipynb) to learn about the basic concepts in BentoML.



In [1]:
%reload_ext autoreload
%autoreload 2
%matplotlib inline

In [4]:
!pip install -q bentoml tensorflow==1.14.0 matplotlib "numpy<1.17"
# why numpy<1.17: https://github.com/tensorflow/tensorflow/issues/30427

In [5]:
from __future__ import absolute_import, division, print_function, unicode_literals

import io

# TensorFlow
import tensorflow as tf

# Helper libraries
import numpy as np
import matplotlib.pyplot as plt
print(tf.__version__)

1.14.0


In [6]:
fashion_mnist = tf.keras.datasets.fashion_mnist
(_train_images, train_labels), (_test_images, test_labels) = fashion_mnist.load_data()
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']
train_images = _train_images / 255.0
test_images = _test_images / 255.0

train_x = np.reshape(train_images, [-1, 28, 28, 1])
train_y = np.eye(10)[train_labels]  # one hot

test_x = np.reshape(test_images, [-1, 28, 28, 1])
test_y = np.eye(10)[test_labels]  # one hot
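
The `np.eye(10)[labels]` idiom used above for one-hot encoding can be seen in isolation in the small sketch below. The label values are made up for illustration; the notebook itself indexes with `train_labels` and `test_labels`.

```python
import numpy as np

# Hypothetical labels for illustration only.
labels = np.array([0, 3, 9])
num_classes = 10

# Row i of the identity matrix is the one-hot vector for class i,
# so indexing with the label array selects one row per example.
one_hot = np.eye(num_classes)[labels]

print(one_hot.shape)           # one row per label, one column per class
print(one_hot.argmax(axis=1))  # recovers the original labels
```

Taking `argmax` along axis 1 reverses the encoding, which is what the accuracy computation in the model graph relies on.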



In [7]:
input_shape = [None, 28, 28, 1]
number_of_classes = 10

In [10]:
#Function below builds model graph 
def cnn_model_fn(input_shape, number_of_classes, learning_rate):
    raw = tf.placeholder(tf.string, shape=[None])
    with tf.device("/cpu:0"): # map_fn has issues on GPU https://github.com/tensorflow/tensorflow/issues/28007
        img_array = tf.map_fn(lambda i: tf.io.decode_png(i, channels=1), raw, dtype=tf.uint8)
    img_array = tf.cast(img_array, tf.float32)
    img_array = (255.0 - img_array) / 255.0
    
    input_layer = tf.reshape(img_array, [-1, 28, 28, 1])

    #input_layer = tf.placeholder(tf.float32, shape=input_shape)
    labels = tf.placeholder(tf.float32, shape=[None, number_of_classes])
    
    #train_mode controls the dropout layer: dropout is active only during training
    #and is effectively disabled during evaluation and prediction
    train_mode = tf.placeholder(tf.bool)
    
    #Architecture: image ->conv2d->maxpooling->conv2d->maxpooling->flatten->dense->dropout->logits->softmax
    
    #convolution layer 1
    conv1 = tf.layers.conv2d(
        inputs=input_layer, 
        filters=32, 
        kernel_size=[5, 5], 
        padding="same", 
        activation=tf.nn.relu)
    
    #pooling layer 1
    pool1 = tf.layers.max_pooling2d(inputs=conv1, pool_size=[2, 2], strides=2)
    
    #convolution layer 2
    conv2 = tf.layers.conv2d(
        inputs=pool1, 
        filters=64, 
        kernel_size=[5, 5], 
        padding="same", 
        activation=tf.nn.relu)
    
    #pooling layer 2
    pool2 = tf.layers.max_pooling2d(inputs=conv2, pool_size=[2, 2], strides=2)
    
    #flatten the output volume of pool2 into a vector
    pool2_flat = tf.reshape(pool2, shape=[-1, 7*7*64])
    
    #dense layer
    dense = tf.layers.dense(
        inputs=pool2_flat, 
        units=1024,
        activation=tf.nn.relu)
    
    #dropout regularization
    dropout = tf.layers.dropout(
        inputs=dense, 
        rate=0.3, 
        training= train_mode)
    
    #logits layer
    logits = tf.layers.dense(inputs=dropout, units=10)
    
    predictions = {
        "classes" : tf.argmax(input=logits, axis=1),
        "probabilities" : tf.nn.softmax(logits=logits)
    }
    
    #loss
    loss = tf.losses.softmax_cross_entropy(labels, logits)
    
    #training operation
    train_op = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(loss)
    
    #accuracy
    accuracy = tf.reduce_mean(tf.cast(tf.equal(tf.argmax(logits, 1), tf.argmax(labels, 1)), tf.float32))
    
    return { "logits": logits,
             "predictions": predictions,
             "loss": loss,
             "train_op": train_op,
             "accuracy": accuracy,
             "raw_x": raw,
             "x": input_layer,
             "y": labels,
             "train_mode": train_mode }

In [11]:
learning_rate = 0.01
batch_size = 1000
epoch = 5

tf.reset_default_graph()
cnn_model = cnn_model_fn(input_shape, number_of_classes, learning_rate)
x = cnn_model["x"]
y= cnn_model["y"]
train_mode = cnn_model["train_mode"]

Instructions for updating:
Use `tf.keras.layers.Conv2D` instead.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
Instructions for updating:
Use keras.layers.MaxPooling2D instead.
Instructions for updating:
Use keras.layers.dense instead.
Instructions for updating:
Use keras.layers.dropout instead.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
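
The hard-coded `7*7*64` in the flatten step of `cnn_model_fn` follows from the layer settings. The sketch below redoes that arithmetic, assuming stride-1 convolutions with `padding="same"` and 2x2 max pooling with stride 2 and no padding (the defaults used in the graph above). `same_conv_then_pool` is a hypothetical helper, not part of the notebook.

```python
def same_conv_then_pool(size, pool=2, stride=2):
    """Spatial size after a stride-1 "same" convolution followed by unpadded max pooling."""
    conv_out = size  # "same" padding with stride 1 keeps the spatial size
    return (conv_out - pool) // stride + 1

size = 28  # Fashion-MNIST images are 28x28
for _ in range(2):  # two conv + pool stages
    size = same_conv_then_pool(size)

flat_units = size * size * 64  # 64 filters in the second convolution
print(size, flat_units)
```

If the input resolution or the number of pooling stages changes, the reshape in the graph has to change with it.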


## Train the model


In [12]:
with tf.Session() as sess:
    with tf.device("/gpu:0"):
        sess.run(tf.global_variables_initializer())

        #Divide the training set into mini-batches of size batch_size.
        #If the number of training examples is not exactly divisible by batch_size,
        #the last batch will have fewer examples than batch_size.

        total_size = train_x.shape[0]
        number_of_batches = int(total_size/batch_size)

        print("Training:Start")
        for e in range(epoch):
            epoch_cost = 0
            epoch_accuracy = 0
            print("Epoch {}:".format(e+1))
            for i in range(number_of_batches):
                print("#", end='')
                mini_x = train_x[i*batch_size:(i+1)*batch_size, :, :, :]
                mini_y = train_y[i*batch_size:(i+1)*batch_size, :]
                _, cost = sess.run([cnn_model["train_op"], cnn_model["loss"]], 
                    feed_dict={x:mini_x, 
                               y:mini_y,
                               train_mode:True})
                train_accuracy = sess.run(cnn_model["accuracy"], 
                    feed_dict={x:mini_x, 
                               y:mini_y,
                               train_mode:False})
                epoch_cost += cost
                epoch_accuracy += train_accuracy

            #If the number of training examples is not exactly divisible by batch_size,
            #there is one more batch of size (total_size - number_of_batches*batch_size)
            if total_size % batch_size != 0:
                print("#", end='')
                mini_x = train_x[number_of_batches*batch_size:total_size, :, :, :]
                mini_y = train_y[number_of_batches*batch_size:total_size, :]
                _, cost = sess.run([cnn_model["train_op"], cnn_model["loss"]], 
                    feed_dict={x:mini_x, 
                               y:mini_y,
                               train_mode:True})
                train_accuracy = sess.run(cnn_model["accuracy"], 
                    feed_dict={x:mini_x, 
                               y:mini_y,
                               train_mode: False})
                epoch_cost += cost
                epoch_accuracy += train_accuracy

            #Average over every batch actually run, including a trailing partial batch
            batches_run = number_of_batches + (1 if total_size % batch_size != 0 else 0)
            epoch_cost /= batches_run
            epoch_accuracy /= batches_run
            print()    
            print("loss: {:02.2f} accuracy: {:02.2f} ".format(np.squeeze(epoch_cost), epoch_accuracy))
            #Validation loss and accuracy, computed on the test set
            cv_loss, cv_accuracy = sess.run([cnn_model["loss"], cnn_model["accuracy"]], 
                                        {x:test_x, 
                                         y:test_y,
                                         train_mode: False})
            print("val_loss: {:02.2f} val_accuracy: {:02.2f}".format(np.squeeze(cv_loss), cv_accuracy))

        print("Training:End")


        #prediction for test set
        test_accuracy, prediction = sess.run([cnn_model["accuracy"], 
                                              cnn_model["predictions"]["classes"]], 
                                             {x:test_x, y:test_y, train_mode:False})
        print("Test accuracy {:02.2f}".format(test_accuracy))

    with tf.device("/cpu:0"):
        inputs = {"x":cnn_model['raw_x'], "train_mode":cnn_model['train_mode']}
        outputs = {"prediction": cnn_model['predictions']['classes']}
        tf.saved_model.simple_save(sess, 'test_model', inputs=inputs, outputs=outputs)

Training:Start
Epoch 1:
############################################################
loss: 1.16 accuracy: 0.66 
val_loss: 0.49 val_accuracy: 0.82
Epoch 2:
############################################################
loss: 0.41 accuracy: 0.86 
val_loss: 0.36 val_accuracy: 0.86
Epoch 3:
############################################################
loss: 0.33 accuracy: 0.89 
val_loss: 0.33 val_accuracy: 0.88
Epoch 4:
############################################################
loss: 0.29 accuracy: 0.90 
val_loss: 0.32 val_accuracy: 0.88
Epoch 5:
############################################################
loss: 0.27 accuracy: 0.91 
val_loss: 0.30 val_accuracy: 0.89
Training:End
Test accuracy 0.89
Instructions for updating:
This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.simple_save.
Instructions for updating:
This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.utils.build_tensor_info or 
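
The training loop above handles a trailing partial batch in a separate branch. One alternative, shown here only as a sketch, is to generate the slice bounds up front so that full and partial batches go through the same code path. `batch_bounds` is a hypothetical helper, not something the notebook defines.

```python
import math

def batch_bounds(total_size, batch_size):
    """Yield (start, end) index pairs that cover range(total_size) in order."""
    for start in range(0, total_size, batch_size):
        yield start, min(start + batch_size, total_size)

# 60000 training images with batch_size=1000 gives 60 full batches.
bounds = list(batch_bounds(60000, 1000))
print(len(bounds))

# A size that does not divide evenly ends with one shorter batch.
uneven = list(batch_bounds(2500, 1000))
print(uneven)
assert len(uneven) == math.ceil(2500 / 1000)
```

With this shape, per-epoch averages can simply divide by the number of bounds, whether or not the last batch is full.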

# Model inference test run (IPython kernel restart required)


In [1]:
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

with open("test.png", "rb") as f:
    img_bytes = f.read()

In [2]:
import tensorflow as tf

tf.compat.v1.enable_eager_execution()


loaded = tf.compat.v2.saved_model.load('test_model')
loaded_func = loaded.signatures[tf.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY]
pred = loaded_func(x=tf.constant([img_bytes], dtype=tf.string), train_mode=tf.constant(False))
output = pred['prediction']
[class_names[c] for c in output]

['Ankle boot']

And the model predicts a label as expected.
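
The saved signature accepts raw PNG bytes rather than float arrays because the graph built in `cnn_model_fn` decodes each image, casts it to float, and maps every pixel value p to (255 - p) / 255. The plain-Python sketch below reproduces only that last step; `invert_and_scale` is a hypothetical helper for illustration.

```python
def invert_and_scale(pixels):
    """Map 8-bit grayscale values p to (255 - p) / 255, as the model graph does."""
    return [(255.0 - p) / 255.0 for p in pixels]

# White (255) maps to 0.0 and black (0) maps to 1.0.
print(invert_and_scale([0, 51, 255]))
```

Note that the training arrays were scaled with `/ 255.0` and no inversion, so the PNG fed to the serving signature is presumably expected to have the opposite polarity from the raw dataset arrays (dark item on a light background).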

# Create BentoService class

In [3]:
%%writefile tensorflow_1_fashion_mnist.py

import bentoml
import tensorflow as tf

from bentoml.frameworks.tensorflow import TensorflowSavedModelArtifact
from bentoml.adapters import TfTensorInput

tf.compat.v1.enable_eager_execution() # required

FASHION_MNIST_CLASSES = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']


@bentoml.env(pip_packages=['tensorflow', 'numpy', 'pillow'])
@bentoml.artifacts([TensorflowSavedModelArtifact('trackable')])
class FashionMnistTensorflow(bentoml.BentoService):

    @bentoml.api(input=TfTensorInput(), batch=True)
    def predict(self, inputs):
        loaded_func = self.artifacts.trackable.signatures[tf.saved_model.DEFAULT_SERVING_SIGNATURE_DEF_KEY]
        pred = loaded_func(x=inputs, train_mode=tf.constant(False))
        output = pred['prediction']
        return [FASHION_MNIST_CLASSES[c] for c in output]

Overwriting tensorflow_1_fashion_mnist.py


In [4]:
from tensorflow_1_fashion_mnist import FashionMnistTensorflow

bento_svc = FashionMnistTensorflow()
bento_svc.pack("trackable", "test_model/")
saved_path = bento_svc.save()

[2020-07-30 04:31:57,643] INFO - Detect BentoML installed in development model, copying local BentoML module file to target saved bundle path
running sdist
running egg_info
writing BentoML.egg-info/PKG-INFO
writing dependency_links to BentoML.egg-info/dependency_links.txt
writing entry points to BentoML.egg-info/entry_points.txt
writing requirements to BentoML.egg-info/requires.txt
writing top-level names to BentoML.egg-info/top_level.txt
reading manifest file 'BentoML.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'


no previously-included directories found matching 'e2e_tests'
no previously-included directories found matching 'tests'
no previously-included directories found matching 'benchmark'


writing manifest file 'BentoML.egg-info/SOURCES.txt'
running check
creating BentoML-0.8.3+42.gb8d36b6
creating BentoML-0.8.3+42.gb8d36b6/BentoML.egg-info
creating BentoML-0.8.3+42.gb8d36b6/bentoml
creating BentoML-0.8.3+42.gb8d36b6/bentoml/adapters
creating BentoML-0.8.3+42.gb8d36b6/bentoml/artifact
creating BentoML-0.8.3+42.gb8d36b6/bentoml/cli
creating BentoML-0.8.3+42.gb8d36b6/bentoml/clipper
creating BentoML-0.8.3+42.gb8d36b6/bentoml/configuration
creating BentoML-0.8.3+42.gb8d36b6/bentoml/configuration/__pycache__
creating BentoML-0.8.3+42.gb8d36b6/bentoml/handlers
creating BentoML-0.8.3+42.gb8d36b6/bentoml/marshal
creating BentoML-0.8.3+42.gb8d36b6/bentoml/saved_bundle
creating BentoML-0.8.3+42.gb8d36b6/bentoml/server
creating BentoML-0.8.3+42.gb8d36b6/bentoml/utils
creating BentoML-0.8.3+42.gb8d36b6/bentoml/yatai
creating BentoML-0.8.3+42.gb8d36b6/bentoml/yatai/client
creating BentoML-0.8.3+42.gb8d36b6/bentoml/yatai/deployment
creating BentoML-0.8.3+42.gb8d36b6/bentoml/yatai/dep

## REST API Model Serving


To start a REST API model server with the BentoService saved above, use the `bentoml serve` command:

In [1]:
!bentoml serve FashionMnistTensorflow:latest

[2020-07-30 04:32:54,971] INFO - Getting latest version FashionMnistTensorflow:20200730043145_D7CFA7
[2020-07-30 04:32:54,972] INFO - Starting BentoML API server in development mode..
2020-07-30 04:32:57.431393: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1
2020-07-30 04:32:57.445711: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-07-30 04:32:57.446136: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties: 
name: GeForce GTX 1060 major: 6 minor: 1 memoryClockRate(GHz): 1.6705
pciBusID: 0000:01:00.0
2020-07-30 04:32:57.446314: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.1
2020-07-30 04:32:57.447907: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Succe

If you are running this notebook from Google Colab, you can start the dev server with the `--run-with-ngrok` option to access the API through a public endpoint managed by [ngrok](https://ngrok.com/):

In [None]:
!bentoml serve FashionMnistTensorflow:latest --run-with-ngrok

## Query REST API with python

In [8]:
import base64
import json
import requests

with open("test.png", "rb") as f:
    img_bytes = f.read()
img_b64 = base64.b64encode(img_bytes).decode()


headers = {"content-type": "application/json"}
data = json.dumps(
       {"instances": [{"b64": img_b64}]}
)
print('Data: {} ... {}'.format(data[:50], data[len(data)-52:]))

json_response = requests.post('http://localhost:5000/predict', data=data, headers=headers)
print(json_response)
print(json_response.text)

Data: {"instances": [{"b64": "iVBORw0KGgoAAAANSUhEUgAAAB ... ufkz8DPG//sD/AX8I8DvdgnOxdB4B1wAAAAASUVORK5CYII="}]}
<Response [200]>
["Ankle boot"]
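
The request body above wraps the base64-encoded image in an `{"instances": [{"b64": ...}]}` structure. A small helper can build that body for any number of images, as sketched below. `build_payload` is a hypothetical name, and sending several instances in one request is an assumption that this notebook does not demonstrate.

```python
import base64
import json

def build_payload(images):
    """Wrap raw image bytes in the {"instances": [{"b64": ...}]} JSON body used above."""
    instances = [{"b64": base64.b64encode(img).decode("ascii")} for img in images]
    return json.dumps({"instances": instances})

# Placeholder bytes stand in for real PNG file contents.
payload = build_payload([b"first-image", b"second-image"])

# Round-trip check: the original bytes can be recovered from the payload.
decoded = json.loads(payload)
assert base64.b64decode(decoded["instances"][0]["b64"]) == b"first-image"
```

The resulting string can be posted with the same `content-type: application/json` header as in the cell above.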


## Containerize model server with Docker


One common way to distribute this model API server for production deployment is via Docker containers, and BentoML provides a convenient way to do that.

Note that Docker is **not available in Google Colab**. You will need to download and run this notebook locally to try out the containerization feature.

If you already have Docker configured, run the following command to produce a Docker container image serving the FashionMnistTensorflow prediction service created above:

In [5]:
!bentoml containerize FashionMnistTensorflow:latest

sha256:ab1acf390e1827b442c4d11a44a9b7ee49e9cb85e097d5febb42cfaaf2357f45


In [6]:
!docker run -p 5000:5000 fashionmnisttensorflow

[2020-07-29 20:40:37,074] INFO - Starting BentoML API server in production mode..
[2020-07-29 20:40:37,486] INFO - Running micro batch service on :5000
[2020-07-29 20:40:37 +0000] [13] [INFO] Starting gunicorn 20.0.4
[2020-07-29 20:40:37 +0000] [1] [INFO] Starting gunicorn 20.0.4
[2020-07-29 20:40:37 +0000] [13] [INFO] Listening at: http://0.0.0.0:5000 (13)
[2020-07-29 20:40:37 +0000] [13] [INFO] Using worker: aiohttp.worker.GunicornWebWorker
[2020-07-29 20:40:37 +0000] [1] [INFO] Listening at: http://0.0.0.0:41595 (1)
[2020-07-29 20:40:37 +0000] [1] [INFO] Using worker: sync
[2020-07-29 20:40:37 +0000] [15] [INFO] Booting worker with pid: 15
[2020-07-29 20:40:37 +0000] [14] [INFO] Booting worker with pid: 14
[2020-07-29 20:40:37,547] INFO - Micro batch enabled for API `predict`
[2020-07-29 20:40:37,548] INFO - Your system nofile limit is 1048576, which means each instance of microbatch service is able to hold this number of connections at same time. You can increase the number of file

## Load saved BentoService

`bentoml.load` is the API for loading a BentoML packaged model in Python:

In [None]:
from bentoml import load


load_svc = load(saved_path)

print(load_svc.predict([data]))

## Launch inference job from CLI

The BentoML CLI can also load and run a packaged model directly from the command line. This service uses the `TfTensorInput` adapter, so the command below passes the JSON request body built earlier (`data`) through the `--input` argument:

In [None]:
!bentoml run FashionMnistTensorflow:latest --input {data} 

# Deployment Options

If you are on a small team with limited engineering or DevOps resources, try out automated deployment with the BentoML CLI, which currently supports AWS Lambda, AWS SageMaker, and Azure Functions:
- [AWS Lambda Deployment Guide](https://docs.bentoml.org/en/latest/deployment/aws_lambda.html)
- [AWS SageMaker Deployment Guide](https://docs.bentoml.org/en/latest/deployment/aws_sagemaker.html)
- [Azure Functions Deployment Guide](https://docs.bentoml.org/en/latest/deployment/azure_functions.html)

If the cloud platform you are working with is not on the list above, try these step-by-step guides on manually deploying a BentoML packaged model to cloud platforms:
- [AWS ECS Deployment](https://docs.bentoml.org/en/latest/deployment/aws_ecs.html)
- [Google Cloud Run Deployment](https://docs.bentoml.org/en/latest/deployment/google_cloud_run.html)
- [Azure container instance Deployment](https://docs.bentoml.org/en/latest/deployment/azure_container_instance.html)
- [Heroku Deployment](https://docs.bentoml.org/en/latest/deployment/heroku.html)

Lastly, if you have a DevOps or ML Engineering team that operates a Kubernetes or OpenShift cluster, use the following guides as references for implementing your deployment strategy:
- [Kubernetes Deployment](https://docs.bentoml.org/en/latest/deployment/kubernetes.html)
- [Knative Deployment](https://docs.bentoml.org/en/latest/deployment/knative.html)
- [Kubeflow Deployment](https://docs.bentoml.org/en/latest/deployment/kubeflow.html)
- [KFServing Deployment](https://docs.bentoml.org/en/latest/deployment/kfserving.html)
- [Clipper.ai Deployment Guide](https://docs.bentoml.org/en/latest/deployment/clipper.html)

