# R API Serving Examples

In this example, we compare the runtimes of three methods for serving a model from an R-hosted API:

* **Plumber**
 * Website: [https://www.rplumber.io/](https://www.rplumber.io/)
 * SageMaker Example: [r_byo_algo_with_plumber](../r_byo_algo_with_plumber)
* **RestRserve**
 * Website: [https://restrserve.org](https://restrserve.org/)
 * SageMaker Example: [r_byo_algo_with_restrserve](../r_byo_algo_with_restrserve)
* **FastAPI** (reticulated from Python)
 * Website: [https://fastapi.tiangolo.com](https://fastapi.tiangolo.com/)
 * SageMaker Example: [r_byo_algo_with_fastapi](../r_byo_algo_with_fastapi)

## Building Docker Images for Serving

First, let's build each Docker image from the provided SageMaker Examples.

### Plumber Serving Image

In [1]:
!cd .. && docker build -t r-plumber -f r_byo_algo_with_plumber/Dockerfile r_byo_algo_with_plumber

Sending build context to Docker daemon  121.9kB
Step 1/9 : FROM r-base:3.6.3
 ---> cec2502269fb
Step 2/9 : MAINTAINER Amazon SageMaker Examples <amazon-sagemaker-examples@amazon.com>
 ---> Using cache
 ---> d5c7ee17124e
Step 3/9 : RUN apt-get -y update && apt-get install -y --no-install-recommends     wget     apt-transport-https     ca-certificates     libcurl4-openssl-dev     libsodium-dev
 ---> Using cache
 ---> 9045f663fee5
Step 4/9 : RUN R -e "install.packages(c('xgboost','plumber'), repos='https://cloud.r-project.org')"
 ---> Using cache
 ---> f3e0508834a6
Step 5/9 : COPY xgb.model /opt/ml/xgb.model
 ---> Using cache
 ---> 48db5ef0c627
Step 6/9 : COPY endpoints.R /opt/ml/endpoints.R
 ---> Using cache
 ---> 00b19c2b8b4f
Step 7/9 : COPY deploy.R /opt/ml/deploy.R
 ---> Using cache
 ---> 3732e470829d
Step 8/9 : WORKDIR /opt/ml
 ---> Using cache
 ---> aa8b88ff67c2
Step 9/9 : ENTRYPOINT ["/usr/bin/Rscript", "/opt/ml/deploy.R", "--no-save"]
 ---> Using cache
 ---> afc7d7621ec2
Successfu

### RestRServe Serving Image

In [2]:
!cd .. && docker build -t r-restrserve -f r_byo_algo_with_restrserve/Dockerfile r_byo_algo_with_restrserve

Sending build context to Docker daemon  107.5kB
Step 1/7 : FROM r-base:3.6.3
 ---> cec2502269fb
Step 2/7 : MAINTAINER Amazon SageMaker Examples <amazon-sagemaker-examples@amazon.com>
 ---> Using cache
 ---> d5c7ee17124e
Step 3/7 : RUN R -e "install.packages(c('RestRserve','xgboost','dplyr'), repos='https://cloud.r-project.org')"
 ---> Using cache
 ---> ebcf2f81ff2d
Step 4/7 : COPY xgb.model /opt/ml/xgb.model
 ---> Using cache
 ---> d3c15c4582c7
Step 5/7 : COPY restrserve.R /opt/ml/restrserve.R
 ---> Using cache
 ---> 558d27d04a7e
Step 6/7 : WORKDIR /opt/ml
 ---> Using cache
 ---> fd6a29e89e0b
Step 7/7 : ENTRYPOINT ["/usr/bin/Rscript", "/opt/ml/restrserve.R", "--no-save"]
 ---> Using cache
 ---> 9cfd01394754
Successfully built 9cfd01394754
Successfully tagged r-restrserve:latest


### FastAPI Serving Image

In [3]:
!cd .. && docker build -t r-fastapi -f r_byo_algo_with_fastapi/Dockerfile r_byo_algo_with_fastapi

Sending build context to Docker daemon    129kB
Step 1/11 : FROM r-base:3.6.3
 ---> cec2502269fb
Step 2/11 : MAINTAINER Amazon SageMaker Examples <amazon-sagemaker-examples@amazon.com>
 ---> Using cache
 ---> d5c7ee17124e
Step 3/11 : RUN apt-get -y update && apt-get install -y --no-install-recommends     wget     r-base     r-base-dev     apt-transport-https     ca-certificates     python3 python3-dev pip
 ---> Using cache
 ---> 627800afb90b
Step 4/11 : RUN pip install fastapi uvicorn numpy
 ---> Using cache
 ---> 34ca8248acac
Step 5/11 : RUN R -e "install.packages('reticulate', repos='https://cloud.r-project.org')"
 ---> Using cache
 ---> 9ec2537f2bc4
Step 6/11 : RUN R -e "install.packages(c('xgboost'), repos='https://cloud.r-project.org')"
 ---> Using cache
 ---> e9dc3ef54fec
Step 7/11 : COPY endpoints.py /opt/ml/endpoints.py
 ---> Using cache
 ---> c93e0e4cfa76
Step 8/11 : COPY deploy.R /opt/ml/deploy.R
 ---> Using cache
 ---> fcc54e3ec14b
Step 9/11 : COPY xgb.model /opt/ml/xgb.mode

## Launch Serving Containers

Next, we will launch each serving container. The containers will be launched on the following ports:

In [4]:
ports = {
    "plumber": 5000,
    "restrserve": 5001,
    "fastapi": 5002,    
}

In [5]:
!bash launch.sh

Launching Plumber
34d7ca67f79b0dfdb4447292abf8e2721fddd7950fa0de06b92741a48ed00dde
Launching RestRServer
b4eb855317938b81cbdaad271f620b7a37bd5e1bd4a173b50a476702b7bcd54e
Launching FastAPI
5a3d76225727ce7eae1d34239543278322d8e0781520e91df68e26f2fd572d7e


In [6]:
!docker container list

CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS                  PORTS                    NAMES
5a3d76225727        r-fastapi           "/usr/bin/Rscript /o…"   1 second ago        Up Less than a second   0.0.0.0:5002->8080/tcp   agitated_ganguly
b4eb85531793        r-restrserve        "/usr/bin/Rscript /o…"   2 seconds ago       Up 1 second             0.0.0.0:5001->8080/tcp   musing_grothendieck
34d7ca67f79b        r-plumber           "/usr/bin/Rscript /o…"   2 seconds ago       Up 1 second             0.0.0.0:5000->8080/tcp   flamboyant_maxwell
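
The contents of `launch.sh` are not shown here. As a rough sketch only, the port mappings in the listing above (host ports 5000 to 5002, each forwarded to container port 8080) could be produced by commands like the ones built below. The helper name `docker_run_command` and the exact flags are assumptions for illustration, not the script's actual contents.

```python
import shlex

# Host ports from the notebook; every container listens on 8080 internally,
# as shown in the `docker container list` output.
PORTS = {"plumber": 5000, "restrserve": 5001, "fastapi": 5002}
CONTAINER_PORT = 8080


def docker_run_command(name, host_port, container_port=CONTAINER_PORT):
    """Build a detached `docker run` command for the image tagged r-<name>.

    Hypothetical helper: launch.sh may pass additional flags or arguments.
    """
    return [
        "docker", "run", "-d",
        "-p", f"{host_port}:{container_port}",
        f"r-{name}",
    ]


for name, port in PORTS.items():
    # Print instead of executing so the sketch has no side effects;
    # pass the list to subprocess.run(..., check=True) to actually launch.
    print(shlex.join(docker_run_command(name, port)))
```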


## Define Simple Client

In [7]:
import requests
from tqdm import tqdm
import pandas as pd 

In [8]:
def get_predictions(examples, instance=requests, port=5000):
    payload = { "features": examples }
    return instance.post(f"http://127.0.0.1:{port}/invocations", json=payload)

In [9]:
def get_health(instance=requests, port=5000):
    # Return the response so callers can inspect the status code.
    return instance.get(f"http://127.0.0.1:{port}/ping")
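
The containers had been up for only about a second when they were listed, so the first request may arrive before a server is ready. One way to guard against that is to poll the health endpoint before benchmarking. The helper below is a minimal sketch; `wait_until_ready` and its parameters are hypothetical names, not part of the notebook.

```python
import time


def wait_until_ready(check, timeout=30.0, interval=0.5):
    """Call `check` until it returns True or `timeout` seconds elapse.

    Exceptions raised by `check` (for example a refused connection) are
    treated as "not ready yet".
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            if check():
                return True
        except Exception:
            pass  # server not accepting connections yet
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

With the client defined above, a readiness check for one server could look like `wait_until_ready(lambda: requests.get(f"http://127.0.0.1:{port}/ping").ok)`.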

## Define Example Inputs

Let's define an example from the [iris.csv](iris.csv) dataset.

In [10]:
iris = pd.read_csv("iris.csv")

In [11]:
iris_features = iris[['Sepal.Length', 'Sepal.Width', 'Petal.Length', 'Petal.Width']]

In [12]:
example = iris_features.values[:1].tolist()

In [13]:
many_examples = iris_features.values[:100].tolist()

## New Requests

Now it's time to test how each API server performs under stress.

First, let's test how each one performs when each request generates a new connection. 

We will test performance in the following situations:
* 1000 requests of a single example
* 1000 requests of 100 examples
* 1000 pings for health status
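
The cells below read throughput off the `tqdm` progress bar. If the numbers are needed programmatically, a small timing helper is one alternative. This is a sketch; `measure_throughput` is a hypothetical name and is not used elsewhere in this notebook.

```python
import time


def measure_throughput(send_request, n_requests=1000):
    """Return (elapsed_seconds, requests_per_second) for n sequential calls."""
    start = time.perf_counter()
    for _ in range(n_requests):
        send_request()
    elapsed = time.perf_counter() - start
    # Guard against a zero reading on very coarse clocks.
    rate = n_requests / elapsed if elapsed > 0 else float("inf")
    return elapsed, rate
```

For example, `measure_throughput(lambda: get_predictions(example, port=ports["plumber"]))` would time the single-example workload against Plumber.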

### Plumber

In [14]:
get_predictions(example, port=ports["plumber"]).json()

{'output': [0]}

In [15]:
for i in tqdm(range(1000)):
    _ = get_predictions(example, port=ports["plumber"])

100%|██████████| 1000/1000 [00:07<00:00, 127.07it/s]


In [16]:
for i in tqdm(range(1000)):
    _ = get_predictions(many_examples, port=ports["plumber"])

100%|██████████| 1000/1000 [00:10<00:00, 99.16it/s]


In [17]:
for i in tqdm(range(1000)):
    get_health(port=ports["plumber"])

100%|██████████| 1000/1000 [00:02<00:00, 337.57it/s]


### RestRserve

In [18]:
get_predictions(example, port=ports["restrserve"]).json()

{'outputs': 0}

In [19]:
for i in tqdm(range(1000)):
    _ = get_predictions(example, port=ports["restrserve"])

100%|██████████| 1000/1000 [00:21<00:00, 46.16it/s]


In [20]:
for i in tqdm(range(1000)):
    _ = get_predictions(many_examples, port=ports["restrserve"])

100%|██████████| 1000/1000 [00:24<00:00, 41.39it/s]


In [21]:
for i in tqdm(range(1000)):
    get_health(port=ports["restrserve"])

100%|██████████| 1000/1000 [00:07<00:00, 130.99it/s]


### FastAPI

In [22]:
get_predictions(example, port=ports["fastapi"]).json()

{'output': 0.0}

In [23]:
for i in tqdm(range(1000)):
    _ = get_predictions(example, port=ports["fastapi"])

100%|██████████| 1000/1000 [00:04<00:00, 212.88it/s]


In [24]:
for i in tqdm(range(1000)):
    _ = get_predictions(many_examples, port=ports["fastapi"])

100%|██████████| 1000/1000 [00:06<00:00, 165.96it/s]


In [25]:
for i in tqdm(range(1000)):
    get_health(port=ports["fastapi"])

100%|██████████| 1000/1000 [00:02<00:00, 451.32it/s]


## Keep Alive

Now, let's test how each one performs when each request reuses a session connection. 

In [26]:
# reuse the session for each post and get request
instance = requests.Session()
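
To make the difference between the two test modes concrete, the sketch below counts how many TCP connections a server sees with and without connection reuse. It uses only the Python standard library and a throwaway local server rather than `requests` and the serving containers, so it illustrates the mechanism and is not the benchmark itself. `PingHandler` and `count_connections` are hypothetical names.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class PingHandler(BaseHTTPRequestHandler):
    """Throwaway /ping handler that counts accepted TCP connections."""

    protocol_version = "HTTP/1.1"  # lets the client keep the connection open
    connections = 0
    lock = threading.Lock()

    def setup(self):
        # setup() runs once per accepted connection, not once per request.
        super().setup()
        with PingHandler.lock:
            PingHandler.connections += 1

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def count_connections(reuse, n_requests=5):
    """Send n GET requests and return how many connections the server saw."""
    PingHandler.connections = 0
    server = ThreadingHTTPServer(("127.0.0.1", 0), PingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    try:
        conn = None
        for _ in range(n_requests):
            if conn is None:
                conn = http.client.HTTPConnection(host, port, timeout=5)
            conn.request("GET", "/ping")
            conn.getresponse().read()  # drain the body before the next request
            if not reuse:
                conn.close()
                conn = None
        if conn is not None:
            conn.close()
    finally:
        server.shutdown()
        server.server_close()
    return PingHandler.connections
```

Calling `count_connections(reuse=False)` opens one connection per request, while `count_connections(reuse=True)` sends every request over a single connection, which is the behavior `requests.Session()` aims for when the server supports keep-alive.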

### Plumber

In [27]:
for i in tqdm(range(1000)):
    _ = get_predictions(example, instance=instance, port=ports["plumber"])

100%|██████████| 1000/1000 [00:52<00:00, 19.01it/s]


In [28]:
for i in tqdm(range(1000)):
    _ = get_predictions(many_examples, instance=instance, port=ports["plumber"])

100%|██████████| 1000/1000 [00:53<00:00, 18.69it/s]


In [29]:
for i in tqdm(range(1000)):
    get_health(instance=instance, port=ports["plumber"])

100%|██████████| 1000/1000 [00:44<00:00, 22.44it/s]


### RestRserve

In [30]:
for i in tqdm(range(1000)):
    _ = get_predictions(example, instance=instance, port=ports["restrserve"])

100%|██████████| 1000/1000 [00:05<00:00, 191.60it/s]


In [31]:
for i in tqdm(range(1000)):
    _ = get_predictions(many_examples, instance=instance, port=ports["restrserve"])

100%|██████████| 1000/1000 [00:06<00:00, 163.42it/s]


In [32]:
for i in tqdm(range(1000)):
    get_health(instance=instance, port=ports["restrserve"])

100%|██████████| 1000/1000 [00:02<00:00, 460.18it/s]


### FastAPI

In [33]:
for i in tqdm(range(1000)):
    _ = get_predictions(example, instance=instance, port=ports["fastapi"])

100%|██████████| 1000/1000 [00:03<00:00, 262.13it/s]


In [34]:
for i in tqdm(range(1000)):
    _ = get_predictions(many_examples, instance=instance, port=ports["fastapi"])

100%|██████████| 1000/1000 [00:05<00:00, 193.26it/s]


In [35]:
for i in tqdm(range(1000)):
    get_health(instance=instance, port=ports["fastapi"])

100%|██████████| 1000/1000 [00:01<00:00, 585.39it/s]
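
To compare the two modes side by side, the throughput figures reported by `tqdm` in the cells above can be collected and turned into keep-alive speedup ratios. The numbers below are copied from this notebook's output; a different machine or run will give different values.

```python
# Throughput in requests/second, copied from the tqdm output above.
# Keys are (server, workload); values are (new connection, keep-alive).
results = {
    ("plumber", "1 example"): (127.07, 19.01),
    ("plumber", "100 examples"): (99.16, 18.69),
    ("plumber", "ping"): (337.57, 22.44),
    ("restrserve", "1 example"): (46.16, 191.60),
    ("restrserve", "100 examples"): (41.39, 163.42),
    ("restrserve", "ping"): (130.99, 460.18),
    ("fastapi", "1 example"): (212.88, 262.13),
    ("fastapi", "100 examples"): (165.96, 193.26),
    ("fastapi", "ping"): (451.32, 585.39),
}

# A ratio above 1 means keep-alive was faster than opening new connections.
speedup = {key: keep_alive / new for key, (new, keep_alive) in results.items()}

for (server, workload), ratio in speedup.items():
    print(f"{server:<11}{workload:<14}{ratio:5.2f}x")
```

In this run, RestRserve and FastAPI were faster with a reused session on every workload, while Plumber was slower on every workload.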


## Stop All Serving Containers

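Finally, let's shut down the serving containers we launched for the tests. Note that `docker kill $(docker ps -q)` kills every running container on the host, not only the three started above.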

In [36]:
!docker kill $(docker ps -q)

5a3d76225727
b4eb85531793
34d7ca67f79b
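
If other containers are running on the same host, a narrower cleanup may be preferable. The sketch below builds commands that target only containers started from the three images built earlier, using the `ancestor` filter of `docker ps`. The function names are hypothetical, and nothing is executed unless `stop_benchmark_containers()` is called.

```python
import subprocess

# Image tags built at the top of this notebook.
IMAGES = ["r-plumber", "r-restrserve", "r-fastapi"]


def list_command(image):
    """Command that prints the IDs of running containers started from `image`."""
    return ["docker", "ps", "-q", "--filter", f"ancestor={image}"]


def kill_command(container_ids):
    """Command that kills only the given containers."""
    return ["docker", "kill", *container_ids]


def stop_benchmark_containers(images=IMAGES):
    """Kill running containers for each benchmark image (requires Docker)."""
    for image in images:
        listed = subprocess.run(
            list_command(image), capture_output=True, text=True, check=True
        )
        ids = listed.stdout.split()
        if ids:  # `docker kill` with no IDs is an error, so skip empty results
            subprocess.run(kill_command(ids), check=True)
```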
