# Model Deployment with Flask
- [Flask](https://flask.palletsprojects.com/en/2.2.x/) and [FastAPI](https://fastapi.tiangolo.com/) are two Python libraries that let us create our own APIs
- Flask is older and probably more widely used. FastAPI is newer and very fast, but comes with a small learning curve. Many concepts are shared between the two
- APIs are built around the concept of a `route`.
    - A `route` is an address in an API. It accepts some input, called a `request`, and returns some output, called a `response`
    - A single API can contain multiple `routes`
    - All the routes inside the same API share the same base URL. Each route is reached by appending the route name to the end of that URL, as shown below
    ![](../images/02_01_anatomy_of_api.png)
- We can open up a way for others to interact with our trained ML model using Flask. Users only need to provide inputs in JSON format (which maps naturally to a Python dictionary) and they get the outputs back in JSON format too
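
To make the idea of routes concrete, here is a minimal sketch of an API with two routes that share one base URL. The app name `RouteDemo`, the `/double` route and both function names are made up for illustration and are not part of the model API built later. The sketch uses Flask's built-in test client, so no server needs to be running.

```python
from flask import Flask, request

# Hypothetical demo app: all names below are for illustration only
demo_api = Flask("RouteDemo")


@demo_api.route("/")
def health():
    # GET route: takes no input and returns a JSON response
    return {"status": "ok"}


@demo_api.route("/double", methods=["POST"])
def double():
    # POST route: reads the JSON request body and returns a JSON response
    payload = request.get_json()
    return {"result": payload["value"] * 2}


# Flask's test client sends requests to the routes without starting a server
client = demo_api.test_client()
health_response = client.get("/")
double_response = client.post("/double", json={"value": 21})

print(health_response.get_json())  # {'status': 'ok'}
print(double_response.get_json())  # {'result': 42}
```

Both routes live in the same API. Only the part after the base URL (`/` versus `/double`) decides which function handles the request.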

***IMPORTANT!*** We need the trained model from the `mlops-1-experiment-tracking.ipynb` notebook to proceed with the code below

In [1]:
# Ensure the Flask library is installed
# !pip install -U Flask

# Create an inference.py file
There are a few simple steps to create the Flask API
1. Create a new Python script. You can call it anything; we'll name it `inference.py` since we're going to use it to run our model inference
1. Instantiate the Flask API using the `Flask()` class and give it a name
1. Load the model
1. Create all the routes you want. Each route is just a Python function with a decorator that exposes that function through the Flask API. At a minimum you'd have one route that accepts the user inputs, runs model predictions on these inputs and returns the predictions as output.
1. Run the API in the main block of the Python script

In [2]:
%%writefile inference.py
from io import StringIO
import os

from flask import Flask, request
import mlflow.pyfunc
import pandas as pd

# Step 2: Instantiate the Flask API
api = Flask('ModelEndpoint')

# Step 3: Load the model
model = mlflow.pyfunc.load_model(model_uri="./best_estimator")

# Step 4: Create the routes
# Route 1: Health check. Just return success if the API is running
@api.route('/')
def home():
    # Return a small JSON payload with HTTP status 200 (OK)
    return {"message": "Hi there!", "success": True}, 200

# Route 2: Accept input data
# The POST method is used when we want to receive some data from the user
@api.route('/predict', methods=['POST'])
def make_predictions():
    # Get the data sent over the API. The client posts a JSON Lines string,
    # so get_json() returns a Python str here
    user_input = request.get_json(force=True)

    # Convert user inputs to pandas dataframe
    df_schema = {"gre": float, "gpa": float}  # To ensure the columns get the correct datatype
    # Wrap the string in StringIO: recent pandas versions deprecate passing a literal JSON string
    user_input_df = pd.read_json(StringIO(user_input), lines=True, dtype=df_schema)  # Convert JSONL to dataframe

    # Run predictions and convert to list
    predictions = model.predict(user_input_df).tolist()

    return {'predictions': predictions}


# Step 5: Main block that actually runs the API!
if __name__ == '__main__':
    api.run(host='0.0.0.0',
            debug=True,  # debug=True reloads the API automatically when inference.py changes; use it only during development
            port=int(os.environ.get("PORT", 8080))
           )

Writing inference.py


# Test the API
- To test whether our API is working, we first need to run the API code in `inference.py`
- Open a new terminal window and navigate to this `solution-code` directory. You should find the `inference.py` file that we just created here.
- Run the file as a normal Python file: `python inference.py`
- Now your API is running on your local computer and is ready to accept input data at the URL `http://localhost:8080`
- We can interact with any route in the API simply by sending a request to that route. For example, type `http://localhost:8080/` in your browser (this sends a GET request to the health check route) and see what you get!
- To get predictions, we need to post our input data to the `/predict` route, which gets appended to the end of the URL. So the URL becomes `http://localhost:8080/predict`
- Let's load the same data we used to train the model and send the first 5 rows to the API for predictions
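
The browser check above can also be done from Python. The sketch below is one possible way to do it, assuming the API is running on `http://localhost:8080` as described. The helper name `check_health` is made up for illustration, and it uses the standard library's `urllib` so that it runs without extra installs (`requests.get` would work just as well).

```python
import json
import urllib.error
import urllib.request


def check_health(base_url, timeout=2.0):
    """Return the parsed JSON from the root route, or None if the API cannot be reached."""
    try:
        with urllib.request.urlopen(f"{base_url}/", timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error status, or a body that is not JSON
        return None


result = check_health("http://localhost:8080")
if result is None:
    print("API is not reachable. Is inference.py running?")
else:
    print(result)
```

If the API is running, this should print the same message that the `home` route returns in the browser.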

In [3]:
# Load some data
import pandas as pd
admissions = pd.read_csv('../data/grad_admissions.csv')
admissions.dropna(inplace=True)

# Split X and y
X = admissions.drop(columns=['admit']) 
y = admissions['admit']

In [4]:
# Extract 5 lines from X to send to the API for predictions
# We'll convert the pandas dataframe to a JSON Lines (JSONL) string so it can be sent to the API
# A dataframe is an in-memory Python object, so it must be serialized to text (here JSON) before it can be sent over HTTP

user_input_df = X.head()
user_input = user_input_df.to_json(orient="records", lines=True) # convert df to JSONL
user_input

'{"gre":380.0,"gpa":2.9150181139}\n{"gre":660.0,"gpa":4.0445401188}\n{"gre":800.0,"gpa":4.9507143064}\n{"gre":640.0,"gpa":3.9219939418}\n{"gre":520.0,"gpa":2.0698776028}\n'
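
The output above is in JSON Lines format: every line is a complete JSON object describing one row. As a small illustration of that structure, the sketch below parses the first two lines of the output using only the standard `json` module. The variable names are made up for this example.

```python
import json

# The first two records from the output above
jsonl_payload = (
    '{"gre":380.0,"gpa":2.9150181139}\n'
    '{"gre":660.0,"gpa":4.0445401188}\n'
)

# Each non-empty line is parsed on its own into a Python dictionary
records = [json.loads(line) for line in jsonl_payload.splitlines() if line.strip()]

for record in records:
    print(record["gre"], record["gpa"])
```

This is conceptually what `pd.read_json(..., lines=True)` does inside the API before it builds the dataframe, one row per line.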

In [5]:
# Send the JSONL data as request to the API and print the response
import requests

api_url = 'http://localhost:8080'
api_route = '/predict'

response = requests.post(f'{api_url}{api_route}', json=user_input)
response.raise_for_status() # Stop here with a clear error if the API returned an error status
predictions = response.json()

print(predictions)

{'predictions': [0, 1, 1, 1, 0]}
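
One detail worth spelling out: `user_input` is already a string, so passing it through the `json=` argument encodes that string once more as a JSON string literal. The sketch below mimics this with the standard `json` module, on the assumption that `requests` serializes the `json=` argument with `json.dumps` and that Flask's `get_json()` decodes the body with `json.loads`. It uses a single made-up record for brevity.

```python
import json

user_input = '{"gre":380.0,"gpa":2.9150181139}\n'

# Client side: the JSONL string itself is serialized as one JSON string literal
request_body = json.dumps(user_input)
print(request_body)

# Server side: decoding the body gives back the original JSONL string, not a dict
decoded = json.loads(request_body)
print(type(decoded).__name__)
print(decoded == user_input)
```

This is why the `/predict` route in `inference.py` receives a string and hands it to `pd.read_json` with `lines=True` instead of treating it as a dictionary.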


# Cleanup

## To stop and restart the Flask API
- To stop the Flask API, press `Ctrl + C` in the terminal window that's running the API.
- You can restart the Flask API ***IN THE SAME FOLDER*** by running `python inference.py` again!