# **BentoML Example: Linear Regression with PaddlePaddle**
**BentoML makes moving trained ML models to production easy:**

* Package models trained with any ML framework and reproduce them for model serving in production
* **Deploy anywhere** for online API serving or offline batch serving
* High-Performance API model server with adaptive micro-batching support
* Central hub for managing models and deployment process via Web UI and APIs
* Modular and flexible design making it adaptable to your infrastructure

BentoML is a framework for serving, managing, and deploying machine learning models. It aims to bridge the gap between Data Science and DevOps, and to enable teams to deliver prediction services in a fast, repeatable, and scalable way.

Before reading this example project, be sure to check out the [Getting started guide](https://github.com/bentoml/BentoML/blob/master/guides/quick-start/bentoml-quick-start-guide.ipynb) to learn about the basic concepts in BentoML.

This notebook demonstrates how to use BentoML to turn a PaddlePaddle model into a Docker image containing a REST API server that serves the model, how to use your ML service built with BentoML as a CLI tool, and how to distribute it as a PyPI package.

The example is based on [this tutorial](https://www.paddlepaddle.org.cn/documentation/docs/en/1.5/beginners_guide/basics/fit_a_line/README.html), using a dataset from the [UCI Machine Learning Repository](https://www.kaggle.com/schirmerchad/bostonhoustingmlnd).

In [None]:
%reload_ext autoreload
%autoreload 2
%matplotlib inline

In [None]:
!python -m pip install git+https://github.com/bentoml/BentoML.git

#!pip install -q bentoml 'paddlepaddle>=2.0.0' 'pandas>=1.1.1' 'numpy>=1.8.2'
!pip install paddlepaddle

Collecting git+https://github.com/bentoml/BentoML.git
  Cloning https://github.com/bentoml/BentoML.git to /tmp/pip-req-build-jhg8dqoj
  Running command git clone -q https://github.com/bentoml/BentoML.git /tmp/pip-req-build-jhg8dqoj
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
    Preparing wheel metadata ... done
Building wheels for collected packages: BentoML
  Building wheel for BentoML (PEP 517) ... done
  Created wheel for BentoML: filename=BentoML-0.12.0+14.g0fac537-cp37-none-any.whl size=1132685 sha256=a77ef8785b050713eef53fafdee64380df6b93fdbfa379e60eb4ccd1d2100a93
  Stored in directory: /tmp/pip-ephem-wheel-cache-dzy5155a/wheels/07/f4/e4/75cd038b063ebca70861fabd2631c5475542e880d5bafcad65
Successfully built BentoML
Installing collected packages: BentoML
  Found existing installation: BentoML 0.12.0
    Uninstalling BentoML-0.12.0:
      Successfully uninstalled BentoML-0.12.0
Successfully 

In [None]:
import numpy as np
import paddle
import paddle.nn as nn
import paddle.optimizer as opt
import pandas as pd

import bentoml

2021-04-01 21:26:54,239 - INFO - No local BentoML config file found, using default configurations


# **Prepare Dataset**

In [None]:
import paddle
import numpy as np
import bentoml
from paddle.static import InputSpec

BATCH_SIZE = 8
BATCH_NUM = 4
EPOCH_NUM = 5

IN_FEATURES = 13
OUT_FEATURES = 1

class LinearNet(paddle.nn.Layer):
    def __init__(self):
        super(LinearNet, self).__init__()
        self._linear = paddle.nn.Linear(IN_FEATURES, OUT_FEATURES)

    @paddle.jit.to_static(input_spec=[InputSpec(shape=[IN_FEATURES], dtype='float32')])
    def forward(self, x):
        return self._linear(x)

    def train(self, loader, loss_fn, opt):
        for epoch_id in range(EPOCH_NUM):
            for batch_id, (image, label) in enumerate(loader()):
                out = self._linear(image)
                loss = loss_fn(out, label)
                loss.backward()
                opt.step()
                opt.clear_grad()
                print("Epoch {} batch {}: loss = {}".format(
                    epoch_id, batch_id, np.mean(loss.numpy())))



# **Model Training**

In [None]:
model = LinearNet()
loss_fn = paddle.nn.MSELoss()
adam = paddle.optimizer.Adam(parameters=model.parameters())

dataset = paddle.text.datasets.UCIHousing(mode="train")

loader = paddle.io.DataLoader(dataset,
  batch_size=BATCH_SIZE,
  shuffle=True,
  drop_last=True,
  num_workers=2)

model.train(loader, loss_fn, adam)

Epoch 0 batch 0: loss = 623.5848388671875
Epoch 0 batch 1: loss = 905.6049194335938
Epoch 0 batch 2: loss = 792.160888671875
Epoch 0 batch 3: loss = 780.5050048828125
Epoch 0 batch 4: loss = 627.6819458007812
Epoch 0 batch 5: loss = 525.576904296875
Epoch 0 batch 6: loss = 434.99334716796875
Epoch 0 batch 7: loss = 430.06915283203125
Epoch 0 batch 8: loss = 500.126953125
Epoch 0 batch 9: loss = 764.2137451171875
Epoch 0 batch 10: loss = 478.6463623046875
Epoch 0 batch 11: loss = 678.7437744140625
Epoch 0 batch 12: loss = 1200.9666748046875
Epoch 0 batch 13: loss = 405.596923828125
Epoch 0 batch 14: loss = 544.1121826171875
Epoch 0 batch 15: loss = 702.2733154296875
Epoch 0 batch 16: loss = 924.8780517578125
Epoch 0 batch 17: loss = 889.0203857421875
Epoch 0 batch 18: loss = 417.31298828125
Epoch 0 batch 19: loss = 797.6754150390625
Epoch 0 batch 20: loss = 599.2960815429688
Epoch 0 batch 21: loss = 397.35791015625
Epoch 0 batch 22: loss = 657.08642578125
Epoch 0 batch 23: loss = 505.52
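
The loss values printed above are mean squared errors between the model's predictions and the house-price labels. As a rough illustration only, the following pure-Python sketch computes the same kind of quantity for a linear model `y = w·x + b`. The weights, bias, and samples are made-up numbers chosen for easy arithmetic, not values taken from the trained `LinearNet`, and the helper names `predict` and `mse` are hypothetical.

```python
def predict(weights, bias, features):
    """Linear model: dot product of weights and features, plus a bias."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def mse(predictions, labels):
    """Mean squared error, averaged over all samples in the batch."""
    return sum((p - y) ** 2 for p, y in zip(predictions, labels)) / len(labels)


# Hypothetical toy data: two samples with three features each.
weights = [0.5, -1.0, 2.0]
bias = 1.0
batch = [[1.0, 2.0, 3.0], [0.0, 1.0, 0.5]]
labels = [6.0, 0.0]

preds = [predict(weights, bias, row) for row in batch]
loss = mse(preds, labels)
print(preds, loss)  # [5.5, 1.0] 0.625
```

In the notebook itself, `paddle.nn.MSELoss` computes this average and the Adam optimizer adjusts the weights to reduce it, which is why the printed loss tends to fall as training proceeds.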

In [None]:
test_x = np.array([[-0.0405441 ,  0.06636364, -0.32356227, -0.06916996, -0.03435197,
        0.05563625, -0.03475696,  0.02682186, -0.37171335, -0.21419304,
       -0.33569506,  0.10143217, -0.21172912]]).astype('float32')

df_test_x = pd.DataFrame(test_x)

In [None]:
import csv

# Write the test sample to a CSV file for the REST and CLI examples below
with open('test.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile, delimiter=',', quoting=csv.QUOTE_MINIMAL)
    writer.writerow(df_test_x.columns)  # header row: column labels 0..12
    writer.writerow([-0.0405441 ,  0.06636364, -0.32356227, -0.06916996, -0.03435197,
        0.05563625, -0.03475696,  0.02682186, -0.37171335, -0.21419304,
       -0.33569506,  0.10143217, -0.21172912])
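
Before sending a CSV file to the service, it can help to confirm that every row has the 13 features the model expects. The sketch below is one possible check using only the standard library. The helper name `check_csv` is hypothetical, and the demonstration writes to a throwaway temporary file rather than touching `test.csv`.

```python
import csv
import os
import tempfile

EXPECTED_FEATURES = 13  # assumed to match IN_FEATURES in the model definition


def check_csv(path, expected_features=EXPECTED_FEATURES):
    """Return the data rows as floats, raising if any row has the wrong width."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)  # first row holds the column labels
        rows = [[float(value) for value in row] for row in reader]
    if len(header) != expected_features:
        raise ValueError(f"expected {expected_features} columns, got {len(header)}")
    for number, row in enumerate(rows, start=1):
        if len(row) != expected_features:
            raise ValueError(f"data row {number} has {len(row)} values")
    return rows


# Demonstration on a temporary file with the same layout as test.csv.
sample = [-0.0405441, 0.06636364, -0.32356227, -0.06916996, -0.03435197,
          0.05563625, -0.03475696, 0.02682186, -0.37171335, -0.21419304,
          -0.33569506, 0.10143217, -0.21172912]

with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.csv")
    with open(demo_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(range(len(sample)))  # header: 0..12
        writer.writerow(sample)
    rows = check_csv(demo_path)

print(len(rows), len(rows[0]))  # 1 13
```

Calling `check_csv('test.csv')` after the cell above has run should return a single row of 13 floats.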

# **Create BentoService for model serving**

In [None]:
%%writefile paddle_linear_regression.py
import pandas as pd
import numpy as np

import bentoml
from bentoml import env, artifacts, api, BentoService
from bentoml.adapters import DataframeInput
from bentoml.frameworks.paddle import PaddlePaddleModelArtifact

@env(infer_pip_packages=True)
@artifacts([PaddlePaddleModelArtifact('model')])
class PaddleLinearRegression(bentoml.BentoService):

    @api(input=DataframeInput(), batch=True)
    def predict(self, df: pd.DataFrame):
        # The Paddle predictor expects float32 input
        input_data = df.to_numpy().astype('float32')

        # Copy the batch into the predictor's first input tensor
        predictor = self.artifacts.model
        input_names = predictor.get_input_names()
        input_handle = predictor.get_input_handle(input_names[0])

        input_handle.reshape(input_data.shape)
        input_handle.copy_from_cpu(input_data)

        predictor.run()

        # Read the result back from the first output tensor
        output_names = predictor.get_output_names()
        output_handle = predictor.get_output_handle(output_names[0])
        output_data = output_handle.copy_to_cpu()

        return output_data

Overwriting paddle_linear_regression.py


In [None]:
# 1) import the custom BentoService defined above
from paddle_linear_regression import PaddleLinearRegression

# 2) `pack` it with required artifacts
bento_svc = PaddleLinearRegression()
bento_svc.pack('model', model)

# 3) save your BentoService
saved_path = bento_svc.save()




2021-04-01 21:28:46,195 - INFO - Context impl SQLiteImpl.
2021-04-01 21:28:46,196 - INFO - Will assume non-transactional DDL.


[2021-04-01 21:28:47,014] INFO - BentoService bundle 'PaddleLinearRegression:20210401212846_F88F15' saved to: /root/bentoml/repository/PaddleLinearRegression/20210401212846_F88F15



# **REST API Model Serving**

In [None]:
!bentoml serve PaddleLinearRegression:latest

[2021-04-01 21:29:16,663] INFO - Getting latest version PaddleLinearRegression:20210401212846_F88F15
[2021-04-01 21:29:16,683] INFO - Starting BentoML API proxy in development mode..
[2021-04-01 21:29:16,685] INFO - Starting BentoML API server in development mode..
[2021-04-01 21:29:16,914] INFO - Your system nofile limit is 1048576, which means each instance of microbatch service is able to hold this number of connections at same time. You can increase the number of file descriptors for the server process, or launch more microbatch instances to accept more concurrent connection.
(Press CTRL+C to quit)
 * Serving Flask app "PaddleLinearRegression" (lazy loading)
 * Environment: production
   Use a production WSGI server instead.
 * Debug mode: off
2021-04-01 21:29:19,971 - INFO -  * Running on http://127.0.0.1:50135/ (Press CTRL+C to quit)



If you are running this notebook from Google Colab, you can start the dev server with the `--run-with-ngrok` option to access the API through a public endpoint managed by ngrok:

In [None]:
!bentoml serve PaddleLinearRegression:latest --run-with-ngrok

[2021-04-01 21:30:36,588] INFO - Getting latest version PaddleLinearRegression:20210401212846_F88F15
[2021-04-01 21:30:36,609] INFO - Starting BentoML API proxy in development mode..
[2021-04-01 21:30:36,611] INFO - Starting BentoML API server in development mode..
[2021-04-01 21:30:36,751] INFO - Your system nofile limit is 1048576, which means each instance of microbatch service is able to hold this number of connections at same time. You can increase the number of file descriptors for the server process, or launch more microbatch instances to accept more concurrent connection.
(Press CTRL+C to quit)
[2021-04-01 21:30:38,624] INFO -  * Running on http://6a59231a38e1.ngrok.io
[2021-04-01 21:30:38,624] INFO -  * Traffic stats available on http://127.0.0.1:4040
 * Serving Flask app "PaddleLinearRegression" (lazy loading)
 * Environment: production
   Use a production WSGI server instead.
 * Debug mode: off

# **Make request to the REST server**

*After navigating to the location of this notebook, copy and paste the following command into your terminal and run it to make a request:*

In [None]:
curl -i \
--request POST \
--header "Content-Type: text/csv" \
-d @test.csv \
localhost:5000/predict
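
The same request can also be issued from Python. The following is a minimal sketch using only the standard library, assuming the server is reachable at the same `localhost:5000/predict` endpoint as the curl command. The helper names `build_predict_request` and `send` are hypothetical, and the two-column payload is a placeholder that only shows the request shape.

```python
import urllib.request

PREDICT_URL = "http://localhost:5000/predict"  # same endpoint as the curl command


def build_predict_request(csv_bytes, url=PREDICT_URL):
    """Build (but do not send) a POST request carrying a CSV payload."""
    return urllib.request.Request(
        url,
        data=csv_bytes,
        headers={"Content-Type": "text/csv"},
        method="POST",
    )


def send(request, timeout=10):
    """Send the request and return the response body as text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Placeholder payload, only to show how the request is assembled.
payload = b"0,1\n0.5,-0.25\n"
request = build_predict_request(payload)
print(request.get_method(), request.full_url)

# With the server running, send the real file instead:
# with open("test.csv", "rb") as f:
#     print(send(build_predict_request(f.read())))
```

Building the request separately from sending it makes it possible to inspect the method, URL, and headers without a running server.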

# **Containerize model server with Docker**

One common way of distributing this model API server for production deployment is via Docker containers, and BentoML provides a convenient way to do that.

Note that Docker is **not available in Google Colab**. You will need to download and run this notebook locally to try out the containerization feature.

If you already have Docker configured, simply run the following command to produce a Docker image serving the PaddleLinearRegression prediction service created above:

In [None]:
!bentoml containerize PaddleLinearRegression:latest

In [None]:
!docker run --rm -p 5000:5000 paddlelinearregression:20210401212846_F88F15

/bin/bash: docker: command not found


# **Load Saved Bento Service**

In [None]:
from bentoml import load

# Load the saved bundle back into the notebook process
svc = load(saved_path)

# One sample with 13 features, matching the model's input size
input_df = pd.DataFrame([[-0.0405441 ,  0.06636364, -0.32356227, -0.06916996, -0.03435197,
        0.05563625, -0.03475696,  0.02682186, -0.37171335, -0.21419304,
       -0.33569506,  0.10143217, -0.21172912]]).astype('float32')

print(svc.predict(input_df))

[[0.85793805]]


# **Launch inference job from CLI**

In [None]:
!bentoml run PaddleLinearRegression:latest predict --format csv --input-file test.csv

[2021-04-01 21:29:58,275] INFO - Getting latest version PaddleLinearRegression:20210401212846_F88F15
I0401 21:30:01.203775   849 analysis_predictor.cc:155] Profiler is deactivated, and no profiling report will be generated.
--- Running analysis [ir_graph_build_pass]
--- Running analysis [ir_graph_clean_pass]
--- Running analysis [ir_analysis_pass]
--- Running IR pass [simplify_with_basic_ops_pass]
--- Running IR pass [attention_lstm_fuse_pass]
--- Running IR pass [seqconv_eltadd_relu_fuse_pass]
--- Running IR pass [seqpool_cvm_concat_fuse_pass]
--- Running IR pass [mul_lstm_fuse_pass]
--- Running IR pass [fc_gru_fuse_pass]
--- Running IR pass [mul_gru_fuse_pass]
--- Running IR pass [seq_concat_fc_fuse_pass]
--- Running IR pass [squeeze2_matmul_fuse_pass]
--- Running IR pass [reshape2_matmul_f

# **Deployment Options**

If you are on a small team with limited engineering or DevOps resources, try automated deployment with the BentoML CLI, which currently supports AWS Lambda, AWS SageMaker, and Azure Functions:

* [AWS Lambda Deployment Guide](https://docs.bentoml.org/en/latest/deployment/aws_lambda.html)
* [AWS SageMaker Deployment Guide](https://docs.bentoml.org/en/latest/deployment/aws_sagemaker.html)
* [Azure Functions Deployment Guide](https://docs.bentoml.org/en/latest/deployment/azure_functions.html)

If the cloud platform you are working with is not on the list above, try these step-by-step guides on manually deploying a BentoML-packaged model to cloud platforms:

* [AWS ECS Deployment](https://docs.bentoml.org/en/latest/deployment/aws_ecs.html)
* [Google Cloud Run Deployment](https://docs.bentoml.org/en/latest/deployment/google_cloud_run.html)
* [Azure Container Instances Deployment](https://docs.bentoml.org/en/latest/deployment/azure_container_instance.html)
* [Heroku Deployment](https://docs.bentoml.org/en/latest/deployment/heroku.html)

Lastly, if you have a DevOps or ML Engineering team that operates a Kubernetes or OpenShift cluster, use the following guides as references for implementing your deployment strategy:

* [Kubernetes Deployment](https://docs.bentoml.org/en/latest/deployment/kubernetes.html)
* [Knative Deployment](https://docs.bentoml.org/en/latest/deployment/knative.html)
* [Kubeflow Deployment](https://docs.bentoml.org/en/latest/deployment/kubeflow.html)
* [KFServing Deployment](https://docs.bentoml.org/en/latest/deployment/kfserving.html)
* [Clipper.ai Deployment Guide](https://docs.bentoml.org/en/latest/deployment/clipper.html)