## Dockerfile

* The Dockerfile defines the environment in which our server will be executed.
* Below, you can see that the entrypoint for our container will be [deploy.R](deploy.R).

In [1]:
%pycat Dockerfile

FROM r-base:3.6.3

MAINTAINER Amazon SageMaker Examples <amazon-sagemaker-examples@amazon.com>

RUN apt-get -y update && apt-get install -y --no-install-recommends \
    wget \
    r-base \
    r-base-dev \
    apt-transport-https \
    ca-certificates \
    python3 python3-dev pip

## deploy.R

**deploy.R** handles the following steps:
* Loads the R libraries used by the server.
* Loads a pretrained XGBoost model that has been trained on the classic Iris dataset, [iris.csv](iris.csv).
* Defines an inference function that takes a matrix of iris features and returns predictions for those iris examples.
* Wraps the inference function to make it thread-safe for passing to Python through reticulate.
* Finally, it sources the endpoint definitions in [endpoints.py](endpoints.py) through reticulate and launches the FastAPI server app using those endpoint definitions.

In [2]:
%pycat deploy.R

library(reticulate)
library(xgboost)

# explicit tell reticulate to use the system python
use_python("/usr/bin/python3")

# load our FastAPI endpoints with reticulate
source_python('endpoints.py')

# load a pretrained xgboost model
bst <- xgb.load("xgb.model")

# create a closure around our xgboost model and input data processing
inference <- function(x)

## endpoints.py

**endpoints.py** defines two routes:
* `/ping` returns a status of 'Alive' to indicate that the application is healthy
* `/invocations` applies the previously defined inference function to the input features from the request body

Note that FastAPI is typed. The `Example` class defines the type of the input that we expect to receive in the request body: a list of feature rows, each of which is a list of floats.

For more information about the requirements for building your own inference container, see:
[Use Your Own Inference Code with Hosting Services](https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-inference-code.html)
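
To make the request shape concrete, here is a minimal sketch of the JSON body that `/invocations` expects, written with only the Python standard library. The helper `parse_features` is hypothetical and is not part of endpoints.py. It only approximates the `List[List[float]]` constraint, and pydantic's actual validation and coercion rules differ in detail.

```python
import json


def parse_features(raw_body):
    """Parse a JSON request body and return its feature rows as lists of floats.

    Hypothetical helper: a hand-written approximation of the
    List[List[float]] constraint, not pydantic's own validation.
    """
    body = json.loads(raw_body)
    rows = body.get("features") if isinstance(body, dict) else None
    if not isinstance(rows, list):
        raise ValueError("'features' must be a list of rows")
    parsed = []
    for row in rows:
        if not isinstance(row, list):
            raise ValueError("each row must be a list of numbers")
        for value in row:
            # bool is a subclass of int, so reject it explicitly
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                raise ValueError(f"non-numeric feature value: {value!r}")
        parsed.append([float(value) for value in row])
    return parsed


# Two iris examples with four features each
raw = json.dumps({"features": [[5.1, 3.5, 1.4, 0.2], [4.9, 3.0, 1.4, 0.2]]})
rows = parse_features(raw)
```

A body such as `{"features": [["a"]]}` raises `ValueError` in this sketch; the real service would reject it through FastAPI's request validation instead.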

In [3]:
%pycat endpoints.py

from typing import Optional, List
from fastapi import FastAPI
from pydantic import BaseModel
import uvicorn
import numpy as np


# Define our expected input types
class Example(BaseModel):
    features: List[List[float]]


# Create a function that we can use to pass our inference function
# to the endpoints during initialization.


## Build the Serving Image

In [4]:
!docker build -t r-fastapi .

Sending build context to Docker daemon  366.6kB
Step 1/10 : FROM r-base:3.6.3
 ---> cec2502269fb
Step 2/10 : MAINTAINER Amazon SageMaker Examples <amazon-sagemaker-examples@amazon.com>
 ---> Using cache
 ---> d5c7ee17124e
Step 3/10 : RUN apt-get -y update && apt-get install -y --no-install-recommends     wget     r-base     r-base-dev     apt-transport-https     ca-certificates     python3 python3-dev pip
 ---> Using cache
 ---> 627800afb90b
Step 4/10 : RUN pip install fastapi uvicorn numpy
 ---> Using cache
 ---> 34ca8248acac
Step 5/10 : RUN R -e "install.packages(c('reticulate','xgboost'), repos='https://cloud.r-project.org')"
 ---> Using cache
 ---> bca2bed23e72
Step 6/10 : COPY endpoints.py /opt/ml/endpoints.py
 ---> Using cache
 ---> 5fa6298e7993
Step 7/10 : COPY deploy.R /opt/ml/deploy.R
 ---> Using cache
 ---> 81170561a1ab
Step 8/10 : COPY xgb.model /opt/ml/xgb.model
 ---> Using cache
 ---> 253c1f2ad5fc
Step 9/10 : WORKDIR /opt/ml
 ---> Using cache
 ---> c8469fcd9c1d
Step 10/10 

## Launch the Serving Container

In [5]:
!echo "Launching FastAPI"
!docker run -d --rm -p 5000:8080 r-fastapi
!echo "Waiting for the server to start.." && sleep 10

Launching FastAPI
9b5798f89687e630d00b885809636a042a0e3bc055cf8d4b4ae631c543d58d56
Waiting for the server to start..
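
The fixed ten-second sleep is enough for this demo, but the time the server needs to start can vary. One possible alternative, sketched below, is to poll `/ping` until it answers. The function name `wait_until_healthy` and its parameters are illustrative assumptions, and the sketch uses only the standard library rather than `requests`.

```python
import http.client
import time
import urllib.request


def wait_until_healthy(url, timeout=30.0, interval=0.5, request_timeout=2.0):
    """Poll `url` until it returns HTTP 200 or `timeout` seconds have passed.

    Returns True as soon as the server answers with 200, False otherwise.
    """
    # Bypass any proxy configured in the environment: the container is local.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with opener.open(url, timeout=request_timeout) as response:
                if response.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            # Connection refused, reset, timed out, or an HTTP error status:
            # the server is not ready yet, so retry after a short pause.
            pass
        time.sleep(interval)
    return False
```

With the port mapping used above, a call such as `wait_until_healthy("http://127.0.0.1:5000/ping")` could stand in for the `sleep 10` step.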


In [6]:
!docker container list

CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS                    NAMES
9b5798f89687        r-fastapi           "/usr/bin/Rscript /o…"   11 seconds ago      Up 10 seconds       0.0.0.0:5000->8080/tcp   elated_mahavira


## Define Simple Python Client

In [7]:
import requests
from tqdm import tqdm
import pandas as pd

pd.set_option("display.max_rows", 500)

In [8]:
def get_predictions(examples, instance=requests, port=5000):
    payload = {"features": examples}
    return instance.post(f"http://127.0.0.1:{port}/invocations", json=payload)

In [9]:
def get_health(instance=requests, port=5000):
    # Return the response so that the caller can inspect the health status
    return instance.get(f"http://127.0.0.1:{port}/ping")

## Define Example Inputs

Let's define example inputs from the [iris.csv](iris.csv) dataset.

In [10]:
iris = pd.read_csv("iris.csv")

In [11]:
iris_features = iris[["Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width"]]

In [12]:
example_inputs = iris_features.values.tolist()

### FastAPI

In [13]:
predicted = get_predictions(example_inputs).json()["output"]

In [14]:
iris["predicted"] = predicted

In [15]:
iris

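| Unnamed: 0 | Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species | predicted |
|---|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | setosa | 0.0 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | setosa | 0.0 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | setosa | 0.0 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | setosa | 0.0 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | setosa | 0.0 |
| 5 | 5.4 | 3.9 | 1.7 | 0.4 | setosa | 0.0 |
| 6 | 4.6 | 3.4 | 1.4 | 0.3 | setosa | 0.0 |
| 7 | 5.0 | 3.4 | 1.5 | 0.2 | setosa | 0.0 |
| 8 | 4.4 | 2.9 | 1.4 | 0.2 | setosa | 0.0 |
| 9 | 4.9 | 3.1 | 1.5 | 0.1 | setosa | 0.0 |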
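
The `predicted` column holds numeric class indices, while `Species` holds names. A rough way to compare the two is sketched below. The mapping from index to species is an assumption (alphabetical order); only the pairing of 0 with setosa is visible in the output above. The names `CLASS_NAMES` and `accuracy` are illustrative.

```python
# ASSUMPTION: class indices follow alphabetical species order. Only the
# 0 -> setosa pairing can be confirmed from the rows displayed above.
CLASS_NAMES = {0: "setosa", 1: "versicolor", 2: "virginica"}


def accuracy(species, predicted):
    """Return the fraction of rows whose predicted class matches the label."""
    if len(species) != len(predicted):
        raise ValueError("species and predicted must have the same length")
    if not species:
        raise ValueError("need at least one row")
    hits = sum(
        CLASS_NAMES.get(int(p)) == s for s, p in zip(species, predicted)
    )
    return hits / len(species)


# The ten rows displayed above: all setosa, all predicted as class 0.0
species = ["setosa"] * 10
predicted = [0.0] * 10
score = accuracy(species, predicted)
```

In the notebook, the same function could be applied to the full table with `accuracy(iris["Species"].tolist(), iris["predicted"].tolist())`, provided the assumed mapping holds for the trained model.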


### Stop All Serving Containers

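Finally, let's shut down the serving container we launched for the test. Note that the command below kills every running container on the host, not only the one started from `r-fastapi`; to target just this image, `docker ps -q --filter ancestor=r-fastapi` can be used inside the substitution instead.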

In [16]:
!docker kill $(docker ps -q)

9b5798f89687
