# Model Deployment on Cloud with Docker
- Once we create a Flask API that can run on our local computer, we have everything we need to publish our trained ML model for the whole world to use!
- To deploy a Flask API on the internet, we need two things
    1. A way to package `inference.py`, our trained ML model, the required Python libraries, etc. into a format that we can easily deploy to any machine.
    1. A machine somewhere on the internet that can continuously keep our `inference.py` running.
- ***IMPORTANT!*** To proceed, we need the trained model from the `mlops-1-experiment-tracking.ipynb` notebook and `inference.py` from the `mlops-2-flask-api.ipynb` notebook.

# 1. Docker
- A way to package `inference.py`, our trained ML model, the required Python libraries, etc. into a format that we can easily deploy to any machine.
- [Docker](https://docs.docker.com/get-docker/) is a container platform widely used in the software industry to package code together with its dependencies so that an application runs the same way on any machine. Many of the websites you use every day likely run inside containers on some machine on the internet.
- Docker has 3 simple concepts
    - Dockerfile: This is a file written in Docker domain specific language that provides instructions on how to create the Docker image
    - Docker image: This is the "package" containing all the files and dependencies necessary for the API to run. You can think of a Docker image as a template
    - Docker container: We can use a docker image to run one or more containers on each machine. Each container uses the Docker image template to create an instance that is running on the machine.
    
    <img src="../images/03_01_Docker_concepts.png" width="1000">
    
- There are many ways to deploy ML models, Flask + Docker is just one of them. Some other popular ways are
    - AWS Sagemaker
    - GCP Vertex AI
    - [BentoML](https://www.bentoml.com/)
- The biggest advantage of Docker over Sagemaker or Vertex AI is that once you "package" your code as a Docker image, it can run practically anywhere that Docker is available!

## Create a Dockerfile
- Create a new file called ***EXACTLY*** `Dockerfile`. No extensions, capital 'D'!

<img src="../images/03_02_Dockerfile.png" width="450">

- For a complete list of all possible Dockerfile commands, see [here](https://docs.docker.com/engine/reference/builder/)
- We can also overwrite the `requirements.txt` file in `best_estimator` to only keep the libraries that `inference.py` requires such as `flask` and `pandas` and `mlflow`. We can also replace `mlflow` with `mlflow-skinny` to reduce the size of the image further

In [1]:
%%writefile Dockerfile
# Use the official lightweight Python image from
# https://hub.docker.com/_/python
FROM python:3.8-slim

# Copy all the files needed for the app to work
COPY inference.py .
COPY best_estimator/ ./best_estimator

# Install all the necessary libraries
RUN pip install -r ./best_estimator/requirements.txt

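# Run the API! The exec (JSON array) form starts python directly,
# so stop signals reach the process without going through a shell
CMD ["python", "inference.py"]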

Overwriting Dockerfile


In [2]:
%%writefile best_estimator/requirements.txt
pandas
flask
mlflow-skinny
scikit-learn==1.1.1

Overwriting best_estimator/requirements.txt
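
The trimming of `requirements.txt` can also be scripted instead of written by hand. The sketch below is one possible way to do it, assuming the original file lists one requirement per line. The helper name `slim_requirements` and the sample `original` list are hypothetical and only for illustration; they are not part of MLflow or of this project.

```python
import re

def slim_requirements(lines, keep, replace=None):
    """Keep only the requirement lines whose package name is in `keep`.

    `replace` optionally maps a package name to a lighter substitute,
    e.g. {"mlflow": "mlflow-skinny"}. Version pins are preserved.
    """
    replace = replace or {}
    slimmed = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Separate the package name from any version specifier such as ==1.1.1
        match = re.match(r"^([A-Za-z0-9_.\-]+)(.*)$", line)
        if match is None:
            continue
        name, spec = match.group(1).lower(), match.group(2)
        if name in keep:
            slimmed.append(replace.get(name, name) + spec)
    return slimmed

# Hypothetical contents of an auto-generated requirements file
original = ["mlflow", "cloudpickle", "pandas", "flask", "scikit-learn==1.1.1"]

slimmed = slim_requirements(
    original,
    keep={"mlflow", "pandas", "flask", "scikit-learn"},
    replace={"mlflow": "mlflow-skinny"},
)
print("\n".join(slimmed))
```

Writing the returned lines back to `best_estimator/requirements.txt` would give the same kind of file as the cell above, while keeping whatever version pins the original file had.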


## Optional: Test the Dockerfile locally
- Start up Docker Desktop on your computer
- Open a new terminal window and navigate to this `solution-code` directory. You should find the `Dockerfile` that we just created here.
- Build the Docker image from this Dockerfile by running: `docker build . --tag grad-school-admission:latest`
    - `--tag`: a name for the new Docker image
    - The Docker naming convention is `<name_of_service>:<version>`; `latest` means the latest version
- Once the Docker image has built successfully, you can test it by running a container on your local computer, either from the Docker Desktop UI or by running: `docker run -p 8080:8080 --rm grad-school-admission:latest`
    - `-p 8080:8080`: publish port 8080 of the container on port 8080 of your local machine (the format is `host_port:container_port`)
    - `--rm`: remove the container when it's stopped (optional)
    - `grad-school-admission:latest`: the name of the image to run
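
To make the structure of the `docker build` and `docker run` commands above explicit, here is a small sketch that assembles them from their parts. The helper names (`image_tag`, `build_command`, `run_command`) are made up for this example. The code only prints the commands; it does not call Docker.

```python
import shlex

def image_tag(name, version="latest"):
    """Return an image reference in <name_of_service>:<version> form."""
    return f"{name}:{version}"

def build_command(tag, context="."):
    """Argument list for `docker build`, using `context` as the build directory."""
    return ["docker", "build", context, "--tag", tag]

def run_command(tag, host_port=8080, container_port=8080, remove=True):
    """Argument list for `docker run`. The -p flag is host_port:container_port."""
    cmd = ["docker", "run", "-p", f"{host_port}:{container_port}"]
    if remove:
        cmd.append("--rm")  # delete the container once it stops
    cmd.append(tag)
    return cmd

tag = image_tag("grad-school-admission")
print(shlex.join(build_command(tag)))
print(shlex.join(run_command(tag)))
```

If port 8080 is already taken on your machine, `run_command(tag, host_port=5000)` produces `-p 5000:8080`, which keeps the container side unchanged and exposes the API on local port 5000 instead. Passing one of these lists to `subprocess.run(cmd, check=True)` would actually execute it.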


# 2. Deploy to Google Cloud Run for free!
- A machine somewhere on the internet that can continuously keep our `inference.py` running.
- Once you have written a Dockerfile for your API, you can deploy it with any of the many container services offered by the various cloud vendors!
- GCP Cloud Run is one such service, and it is free for moderate usage ([pricing estimate](https://cloud.google.com/products/calculator/#id=32ea150c-67b7-4ebc-9143-789f703ee574))

<img src="../images/03_03_cloud_run_pricing.png" width="350">

When you use GCP for the first time on your local computer, you need to complete some prerequisite steps:
1. Have a GCP account and a project already created
1. Open a new terminal window
1. Install the gcloud SDK by following the [installation guide](https://cloud.google.com/sdk/docs/install)
1. Initialize the gcloud SDK: `gcloud init`
1. Authenticate your account: `gcloud auth application-default login`

To deploy a Dockerfile to GCP Cloud Run:
1. Open a new terminal window and navigate to this `solution-code` directory. You should find the `Dockerfile` that we just created here.
1. Run: `gcloud run deploy grad-school-admission --source . --region asia-southeast1`
1. Answer `y` to any prompt you get and press Enter

Done! Your API has been deployed on the cloud and is now accessible by everyone on the internet!

P.S. If you have a new version of the model or code, you can run the same `gcloud run deploy` command above and the API will be updated to the latest version!

# Test the API
- To test out if our API is working, we just need the URL from the Cloud Run page 

<img src="../images/03_04_cloud_run_url.png" width="1000">

- We can interact with any route in the API simply by sending an HTTP request to that route. For example, type the URL in your browser (which sends a GET request to the root route) and see what you get!
- To get predictions, we need to post our input data to the `/predict` route which gets appended at the end of the URL.
- Let's load the same data we used to train the model and send the first 5 rows to the API for predictions. The only code difference compared to `mlops-2-flask-api.ipynb` is the `url` parameter. Everything else is exactly the same!
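
Since the route is simply appended to the base URL, a stray or missing slash is an easy mistake to make. The sketch below shows one way to join the two safely. The `endpoint` helper and the `example-service` URL are placeholders for illustration; use the URL shown on your own Cloud Run page.

```python
def endpoint(api_url, route):
    """Join a base URL and a route with exactly one slash between them."""
    return f"{api_url.rstrip('/')}/{route.lstrip('/')}"

# Placeholder base URL; substitute the one from your Cloud Run page
base = "https://example-service.a.run.app/"

print(endpoint(base, "/predict"))
print(endpoint(base.rstrip("/"), "predict"))
```

Both calls print the same address, so it no longer matters whether the URL was copied with a trailing slash.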

In [3]:
# Load some data
import pandas as pd
admissions = pd.read_csv('../data/grad_admissions.csv')
admissions.dropna(inplace=True)

# Split X and y
X = admissions.drop(columns=['admit']) 
y = admissions['admit']

In [4]:
# Extract 5 rows from X to send to the API for predictions
# A DataFrame is an in-memory Python object, so it must be serialized before it can be sent over HTTP
# Here we convert it to a JSON Lines (JSONL) string: one JSON object per row, separated by newlines

user_input_df = X.head()
user_input = user_input_df.to_json(orient="records", lines=True) # convert df to JSONL
user_input

'{"gre":380.0,"gpa":2.9150181139}\n{"gre":660.0,"gpa":4.0445401188}\n{"gre":800.0,"gpa":4.9507143064}\n{"gre":640.0,"gpa":3.9219939418}\n{"gre":520.0,"gpa":2.0698776028}\n'
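
To see what a JSONL string like the one above contains, it can be parsed back into records using only the standard library. This is just an illustration of the format; how `inference.py` actually parses the request is defined in the previous notebook, and the `parse_jsonl` helper here is hypothetical.

```python
import json

# First two records of the JSONL string shown above
user_input = '{"gre":380.0,"gpa":2.9150181139}\n{"gre":660.0,"gpa":4.0445401188}\n'

def parse_jsonl(text):
    """Turn a JSON Lines string into a list of dicts, one per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]

records = parse_jsonl(user_input)
for record in records:
    print(record)
```

Each line is a complete JSON object, which is why the string can be split on newlines and decoded one row at a time.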

In [5]:
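# Send the JSONL data as a request to the API and print the response
import requests

# Replace this with the URL of your own Cloud Run service
api_url = 'https://grad-school-admission-a4rmk57awq-as.a.run.app'
api_route = '/predict'

response = requests.post(f'{api_url}{api_route}', json=user_input)
response.raise_for_status()  # fail early if the API returned a 4xx/5xx status
predictions = response.json()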

print(predictions)

{'predictions': [0, 1, 1, 1, 0]}
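
Before using the predictions, it can help to check that the response has the expected shape and to line each prediction up with the row that produced it. The sketch below assumes the `{'predictions': [...]}` response format shown in the output above. The `extract_predictions` helper is hypothetical and not part of the API.

```python
def extract_predictions(payload, n_inputs):
    """Return the prediction list from a response dict after checking its shape."""
    if "predictions" not in payload:
        raise ValueError(f"Unexpected response: {payload!r}")
    predictions = payload["predictions"]
    if len(predictions) != n_inputs:
        raise ValueError(
            f"Sent {n_inputs} rows but received {len(predictions)} predictions"
        )
    return predictions

# Response body copied from the output above
payload = {"predictions": [0, 1, 1, 1, 0]}
# GRE scores of the five rows that were sent in the request
gre_scores = [380.0, 660.0, 800.0, 640.0, 520.0]

for gre, admit in zip(gre_scores, extract_predictions(payload, len(gre_scores))):
    print(f"gre={gre:.0f} -> admit={admit}")
```

Checking the length guards against silently pairing predictions with the wrong rows if the API ever drops or rejects part of the input.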


# Cleanup
- GCP Cloud Run is free for moderate usage, so you can leave your API running
- If you want to delete everything
    1. Delete the API from GCP Cloud Run 
    1. Delete the docker image from GCP Artifact Registry
    1. Delete any Cloud Storage buckets 

# Bonus! Easy Steps to become a Millionaire
1. Invent a fancy API
1. Wrap it into a Docker container
1. Host it in GCP Cloud Run
1. Publish it on [RapidAPI](https://rapidapi.com/) to easily sell access to it!!! Check out [Leo's Name-Gender prediction API](https://rapidapi.com/stephenleo87-DGFI1at-XQ/api/name-gender1/)
1. Become a Millionaire!

![](../images/03_05_make-it-rain-meme.jpeg)

Read more about it in detail on Leo's Medium blog post: [Make extra money on the side with data science](https://pub.towardsai.net/make-extra-money-on-the-side-with-data-science-984a623c53f5?sk=31e1a7794b073841e9ed66eeb1cc8867)