# R API Serving Examples

In this example, we compare the request throughput of three methods for serving a model from an R-hosted API:

* **Plumber**
 * Website: [https://www.rplumber.io/](https://www.rplumber.io/)
 * SageMaker Example: [r_byo_algo_with_plumber](../r_byo_algo_with_plumber)
* **RestRserve**
 * Website: [https://restrserve.org](https://restrserve.org/)
 * SageMaker Example: [r_byo_algo_with_restrserve](../r_byo_algo_with_restrserve)
* **FastAPI** (reticulated from Python)
 * Website: [https://fastapi.tiangolo.com](https://fastapi.tiangolo.com/)
 * SageMaker Example: [r_byo_algo_with_fastapi](../r_byo_algo_with_fastapi)

## Building Docker Images for Serving

First, let's build each Docker image from the provided SageMaker Examples.

### Plumber Serving Image

In [1]:
!cd .. && docker build -t r-plumber -f r_byo_algo_with_plumber/Dockerfile r_byo_algo_with_plumber

Sending build context to Docker daemon  269.8kB
Step 1/9 : FROM r-base:3.6.3
 ---> cec2502269fb
Step 2/9 : MAINTAINER Amazon SageMaker Examples <amazon-sagemaker-examples@amazon.com>
 ---> Using cache
 ---> d5c7ee17124e
Step 3/9 : RUN apt-get -y update && apt-get install -y --no-install-recommends     wget     apt-transport-https     ca-certificates     libcurl4-openssl-dev     libsodium-dev
 ---> Using cache
 ---> 9045f663fee5
Step 4/9 : RUN R -e "install.packages(c('xgboost','plumber'), repos='https://cloud.r-project.org')"
 ---> Using cache
 ---> f3e0508834a6
Step 5/9 : COPY xgb.model /opt/ml/xgb.model
 ---> Using cache
 ---> 48db5ef0c627
Step 6/9 : COPY endpoints.R /opt/ml/endpoints.R
 ---> 70251154d26a
Step 7/9 : COPY deploy.R /opt/ml/deploy.R
 ---> 612f00e2f270
Step 8/9 : WORKDIR /opt/ml
 ---> Running in 71498ac36221
Removing intermediate container 71498ac36221
 ---> adad0fdfd9d6
Step 9/9 : ENTRYPOINT ["/usr/bin/Rscript", "/opt/ml/deploy.R", "--no-save"]
 ---> Running in 2c6b2b4d

### RestRserve Serving Image

In [2]:
!cd .. && docker build -t r-restrserve -f r_byo_algo_with_restrserve/Dockerfile r_byo_algo_with_restrserve

Sending build context to Docker daemon  247.8kB
Step 1/7 : FROM r-base:3.6.3
 ---> cec2502269fb
Step 2/7 : MAINTAINER Amazon SageMaker Examples <amazon-sagemaker-examples@amazon.com>
 ---> Using cache
 ---> d5c7ee17124e
Step 3/7 : RUN R -e "install.packages(c('RestRserve','xgboost','dplyr'), repos='https://cloud.r-project.org')"
 ---> Using cache
 ---> ebcf2f81ff2d
Step 4/7 : COPY xgb.model /opt/ml/xgb.model
 ---> Using cache
 ---> d3c15c4582c7
Step 5/7 : COPY restrserve.R /opt/ml/restrserve.R
 ---> Using cache
 ---> 1f157953f1d5
Step 6/7 : WORKDIR /opt/ml
 ---> Using cache
 ---> 7f4145abcde0
Step 7/7 : ENTRYPOINT ["/usr/bin/Rscript", "/opt/ml/restrserve.R", "--no-save"]
 ---> Using cache
 ---> 5de8902faece
Successfully built 5de8902faece
Successfully tagged r-restrserve:latest


### FastAPI Serving Image

In [3]:
!cd .. && docker build -t r-fastapi -f r_byo_algo_with_fastapi/Dockerfile r_byo_algo_with_fastapi

Sending build context to Docker daemon  266.2kB
Step 1/10 : FROM r-base:3.6.3
 ---> cec2502269fb
Step 2/10 : MAINTAINER Amazon SageMaker Examples <amazon-sagemaker-examples@amazon.com>
 ---> Using cache
 ---> d5c7ee17124e
Step 3/10 : RUN apt-get -y update && apt-get install -y --no-install-recommends     wget     r-base     r-base-dev     apt-transport-https     ca-certificates     python3 python3-dev pip
 ---> Using cache
 ---> 627800afb90b
Step 4/10 : RUN pip install fastapi uvicorn numpy
 ---> Using cache
 ---> 34ca8248acac
Step 5/10 : RUN R -e "install.packages(c('reticulate','xgboost'), repos='https://cloud.r-project.org')"
 ---> Using cache
 ---> bca2bed23e72
Step 6/10 : COPY endpoints.py /opt/ml/endpoints.py
 ---> Using cache
 ---> 5fa6298e7993
Step 7/10 : COPY deploy.R /opt/ml/deploy.R
 ---> Using cache
 ---> 81170561a1ab
Step 8/10 : COPY xgb.model /opt/ml/xgb.model
 ---> Using cache
 ---> 253c1f2ad5fc
Step 9/10 : WORKDIR /opt/ml
 ---> Using cache
 ---> c8469fcd9c1d
Step 10/10 

## Launch Serving Containers

Next, we will launch each serving container. The containers will be launched on the following ports:

In [4]:
ports = {
    "plumber": 5000,
    "restrserve": 5001,
    "fastapi": 5002,
}

In [5]:
!bash launch.sh

Launching Plumber
d0c1d45551f3e3e21d10998ed90ffabb1a1283379e237da0db7fab49fbfbc29c
Launching RestRServer
d0912ec85abafc83c008da21df85bd64f263bf1baeb537a65ddc58845843dc9c
Launching FastAPI
fff726864968404a35db35f558db1ae8a76498a1bcc8263fc319eab3bd017fc8


In [6]:
!docker container list

CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS                  PORTS                    NAMES
fff726864968        r-fastapi           "/usr/bin/Rscript /o…"   1 second ago        Up Less than a second   0.0.0.0:5002->8080/tcp   exciting_thompson
d0912ec85aba        r-restrserve        "/usr/bin/Rscript /o…"   2 seconds ago       Up 1 second             0.0.0.0:5001->8080/tcp   vigilant_ellis
d0c1d45551f3        r-plumber           "/usr/bin/Rscript /o…"   3 seconds ago       Up 1 second             0.0.0.0:5000->8080/tcp   nostalgic_shamir
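The listing above shows containers that have been up for about a second, so a server may not be accepting requests yet when the first benchmark cell runs. A minimal sketch of a readiness check follows. The helper name `wait_until_ready` and its parameters are hypothetical and not part of the original notebook; it simply retries a caller-supplied probe until the probe succeeds or a timeout expires.

```python
import time


def wait_until_ready(probe, timeout=30.0, interval=0.5,
                     sleep=time.sleep, clock=time.monotonic):
    """Call probe() repeatedly until it returns True or `timeout` seconds pass.

    `probe` is any zero-argument callable. Exceptions raised by the probe
    are treated as "not ready yet" (deliberately broad for this sketch).
    Returns True if the probe succeeded, False on timeout.
    """
    deadline = clock() + timeout
    while True:
        try:
            if probe():
                return True
        except Exception:
            pass  # e.g. connection refused while the server is still starting
        if clock() >= deadline:
            return False
        sleep(interval)
```

With the `requests` client used in this notebook, a probe could be `lambda: requests.get("http://127.0.0.1:5000/ping").status_code == 200`, assuming the server answers a successful health check with status 200.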


## Define Simple Client

In [7]:
import requests
from tqdm import tqdm
import pandas as pd

In [8]:
def get_predictions(examples, instance=requests, port=5000):
    payload = {"features": examples}
    return instance.post(f"http://127.0.0.1:{port}/invocations", json=payload)

In [9]:
def get_health(instance=requests, port=5000):
    # Return the response so callers can inspect the status code.
    return instance.get(f"http://127.0.0.1:{port}/ping")

## Define Example Inputs

Let's define an example from the [iris.csv](iris.csv) dataset.

In [10]:
iris = pd.read_csv("iris.csv")

In [11]:
iris_features = iris[["Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width"]]

In [12]:
example = iris_features.values[:1].tolist()

In [13]:
many_examples = iris_features.values[:100].tolist()

## New Requests

Now it's time to test how each API server performs under stress.

First, let's test how each one performs when each request opens a new connection.

We will test the performance in the following scenarios:
* 1000 requests of a single example
* 1000 requests of 100 examples
* 1000 pings for health status
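The cells below read throughput from the `tqdm` progress bar. As an alternative, a small timing helper can return the numbers directly. This is a sketch only: `measure_throughput` is a hypothetical name, not part of the original notebook, and it times any zero-argument callable with `time.perf_counter`.

```python
import time


def measure_throughput(send, n_requests=1000):
    """Call send() `n_requests` times and time the whole loop.

    Returns a tuple (elapsed_seconds, requests_per_second).
    """
    start = time.perf_counter()
    for _ in range(n_requests):
        send()
    elapsed = time.perf_counter() - start
    # Guard against a zero elapsed time on very coarse clocks.
    rate = n_requests / elapsed if elapsed > 0 else float("inf")
    return elapsed, rate
```

For example, `measure_throughput(lambda: get_predictions(example, port=ports["plumber"]))` would time the first scenario for Plumber once the client and inputs defined earlier are available.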

### Plumber

In [14]:
get_predictions(example, port=ports["plumber"]).json()

{'output': [0]}

In [15]:
for i in tqdm(range(1000)):
    _ = get_predictions(example, port=ports["plumber"])

100%|██████████| 1000/1000 [00:07<00:00, 128.16it/s]


In [16]:
for i in tqdm(range(1000)):
    _ = get_predictions(many_examples, port=ports["plumber"])

100%|██████████| 1000/1000 [00:09<00:00, 104.55it/s]


In [17]:
for i in tqdm(range(1000)):
    get_health(port=ports["plumber"])

100%|██████████| 1000/1000 [00:02<00:00, 342.18it/s]


### RestRserve

In [18]:
get_predictions(example, port=ports["restrserve"]).json()

{'output': 0}

In [19]:
for i in tqdm(range(1000)):
    _ = get_predictions(example, port=ports["restrserve"])

100%|██████████| 1000/1000 [00:20<00:00, 49.05it/s]


In [20]:
for i in tqdm(range(1000)):
    _ = get_predictions(many_examples, port=ports["restrserve"])

100%|██████████| 1000/1000 [00:23<00:00, 43.07it/s]


In [21]:
for i in tqdm(range(1000)):
    get_health(port=ports["restrserve"])

100%|██████████| 1000/1000 [00:07<00:00, 133.37it/s]


### FastAPI

In [22]:
get_predictions(example, port=ports["fastapi"]).json()

{'output': 0.0}

In [23]:
for i in tqdm(range(1000)):
    _ = get_predictions(example, port=ports["fastapi"])

100%|██████████| 1000/1000 [00:04<00:00, 240.96it/s]


In [24]:
for i in tqdm(range(1000)):
    _ = get_predictions(many_examples, port=ports["fastapi"])

100%|██████████| 1000/1000 [00:06<00:00, 164.12it/s]


In [25]:
for i in tqdm(range(1000)):
    get_health(port=ports["fastapi"])

100%|██████████| 1000/1000 [00:02<00:00, 450.72it/s]
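For the same single example, the three servers return slightly different JSON shapes: `{'output': [0]}` from Plumber, `{'output': 0}` from RestRserve, and `{'output': 0.0}` from FastAPI. If the predictions need to be compared across servers, a small normalizer can help. The helper below is a hypothetical sketch that covers only the three shapes shown above; responses for batched inputs may be shaped differently.

```python
def first_prediction(body):
    """Return the first predicted value from a parsed response body as a float.

    Handles both a scalar under "output" and a list whose first element
    is the prediction.
    """
    output = body["output"]
    if isinstance(output, (list, tuple)):
        output = output[0]
    return float(output)


# The three response bodies shown in this section normalize to the same value.
plumber_value = first_prediction({"output": [0]})
restrserve_value = first_prediction({"output": 0})
fastapi_value = first_prediction({"output": 0.0})
```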


## Keep Alive

Now, let's test how each one performs when every request reuses the same connection through a shared `requests.Session`.

In [26]:
# reuse the session for each post and get request
instance = requests.Session()

### Plumber

In [27]:
for i in tqdm(range(1000)):
    _ = get_predictions(example, instance=instance, port=ports["plumber"])

100%|██████████| 1000/1000 [00:51<00:00, 19.32it/s]


In [28]:
for i in tqdm(range(1000)):
    _ = get_predictions(many_examples, instance=instance, port=ports["plumber"])

100%|██████████| 1000/1000 [00:52<00:00, 19.16it/s]


In [29]:
for i in tqdm(range(1000)):
    get_health(instance=instance, port=ports["plumber"])

100%|██████████| 1000/1000 [00:44<00:00, 22.40it/s]


### RestRserve

In [30]:
for i in tqdm(range(1000)):
    _ = get_predictions(example, instance=instance, port=ports["restrserve"])

100%|██████████| 1000/1000 [00:04<00:00, 225.13it/s]


In [31]:
for i in tqdm(range(1000)):
    _ = get_predictions(many_examples, instance=instance, port=ports["restrserve"])

100%|██████████| 1000/1000 [00:06<00:00, 160.45it/s]


In [32]:
for i in tqdm(range(1000)):
    get_health(instance=instance, port=ports["restrserve"])

100%|██████████| 1000/1000 [00:02<00:00, 470.66it/s]


### FastAPI

In [33]:
for i in tqdm(range(1000)):
    _ = get_predictions(example, instance=instance, port=ports["fastapi"])

100%|██████████| 1000/1000 [00:03<00:00, 280.06it/s]


In [34]:
for i in tqdm(range(1000)):
    _ = get_predictions(many_examples, instance=instance, port=ports["fastapi"])

100%|██████████| 1000/1000 [00:05<00:00, 186.77it/s]


In [35]:
for i in tqdm(range(1000)):
    get_health(instance=instance, port=ports["fastapi"])

100%|██████████| 1000/1000 [00:01<00:00, 600.64it/s]
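To see the effect of connection reuse at a glance, the single-example throughputs reported by `tqdm` in the two sections can be compared directly. The sketch below copies those figures from the outputs above; they come from one run on one machine, so the ratios are indicative only.

```python
# Single-example throughput (iterations per second) reported by tqdm above.
new_connection = {"plumber": 128.16, "restrserve": 49.05, "fastapi": 240.96}
keep_alive = {"plumber": 19.32, "restrserve": 225.13, "fastapi": 280.06}

# Ratio above 1 means the reused session was faster; below 1 means slower.
ratios = {
    name: keep_alive[name] / new_connection[name]
    for name in new_connection
}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x")
```

In this run RestRserve benefits most from the reused session and FastAPI improves slightly, while Plumber is markedly slower with the shared session than with a new connection per request.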


## Stop All Serving Containers

Finally, let's shut down the serving containers we launched for the tests. Note that the command below kills every running container on the host, not only these three.

In [36]:
!docker kill $(docker ps -q)

fff726864968
d0912ec85aba
d0c1d45551f3
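On a machine that runs other containers, a narrower teardown may be preferable. The sketch below is one possible approach, not part of the original notebook: it looks up containers by the image tags built earlier (`r-plumber`, `r-restrserve`, `r-fastapi`) using `docker ps --filter ancestor=<image>` and kills only those. The function names are hypothetical, and the `run` parameter exists only so the logic can be exercised without Docker.

```python
import subprocess

# Image tags built earlier in this notebook.
IMAGES = ["r-plumber", "r-restrserve", "r-fastapi"]


def container_ids(image, run=subprocess.run):
    """Return the IDs of running containers created from `image`."""
    result = run(
        ["docker", "ps", "-q", "--filter", f"ancestor={image}"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.split()


def kill_serving_containers(images=IMAGES, run=subprocess.run):
    """Kill only the containers started from the given images."""
    killed = []
    for image in images:
        ids = container_ids(image, run=run)
        if ids:
            run(["docker", "kill", *ids], check=True)
            killed.extend(ids)
    return killed
```

Calling `kill_serving_containers()` returns the list of container IDs it stopped, which can be checked against the `docker container list` output shown earlier.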
