# 👷 Serving a Model into Production

When we train a machine learning model and log it with MLflow, we get an artifact that can make predictions based on the patterns it learned from the training data. To make those predictions usable in real-world applications, we need a way to serve the model.

**Model serving is the process of making the trained model available to other software applications or systems**. It's like setting up a special window through which others can ask the model questions and get predictions in return.

FastAPI is a framework that helps us build web applications and APIs quickly and easily. By combining MLflow with FastAPI, we can create an API endpoint that allows other applications or systems to send data to the model and receive predictions in response.

Here's how it works:

1. We train and save the machine learning model using MLflow. This model contains all the knowledge it has acquired from the data.

2. We use FastAPI to build a web application or API. We create a specific route or endpoint in our application that can receive data from other systems.

3. When an external system wants to make a prediction, it sends the data to the API endpoint created with FastAPI.

4. FastAPI receives the data and passes it to the MLflow model, which then makes the prediction based on its learned knowledge.

5. FastAPI sends the prediction back to the requesting system, which can use the prediction for further processing or display it to the user.

In summary, serving an MLflow model through FastAPI exposes the trained model as a service. Other systems or applications can send data to it over HTTP and receive predictions in return, which makes the model usable in real-time applications.
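The request/response cycle in steps 3 to 5 can be sketched without any web framework. This is only an illustration of the data flow, not the FastAPI implementation built later in this notebook: `StubModel` and `handle_predict_request` are hypothetical names, and the stub's prediction rule is made up so that the example runs without a trained model.

```python
import json


class StubModel:
    """Hypothetical stand-in for a trained model; exposes a predict() method."""

    def predict(self, rows):
        # Made-up rule for illustration only: predict 1 when the first feature is 1
        return [1 if row[0] == 1 else 0 for row in rows]


def handle_predict_request(body: bytes, model) -> bytes:
    """Decode a JSON request, run the model, and encode a JSON response."""
    payload = json.loads(body)
    # One row of features, in a fixed order agreed with the model
    row = [payload["sex"], payload["age"], payload["fare"]]
    prediction = model.predict([row])
    return json.dumps({"prediction": list(prediction)}).encode("utf-8")


# Step 3: the external system sends data
request_body = json.dumps({"sex": 1, "age": 29.0, "fare": 7.25}).encode("utf-8")
# Steps 4 and 5: the service runs the model and returns the prediction
response_body = handle_predict_request(request_body, StubModel())
print(response_body.decode("utf-8"))
```

FastAPI takes over the decoding, validation, and encoding parts of this cycle, so the endpoint function only has to deal with the model call.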

## Import Python libraries

In [1]:
import mlflow
import numpy as np
from fastapi import FastAPI
from pydantic import BaseModel
from mlops_course import config

## ⬇️ Download model from MLflow

In [2]:
# Model Registry URI: resolves to the version of MODEL_NAME currently in the "Production" stage
MODEL_NAME = config.MODEL_NAME
MODEL_URI = f"models:/{MODEL_NAME}/Production"

# Load the MLflow model into memory
mlflow.set_tracking_uri(uri=config.MLFLOW_TRACKING_URI)
model = mlflow.sklearn.load_model(model_uri=MODEL_URI)

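**Note:** MLflow persists scikit-learn models as pickled objects. Unpickling can execute arbitrary code, and a pickled model may not load correctly under a different scikit-learn version, so only load models from a registry you trust and keep library versions consistent between training and serving. See https://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations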
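The `models:/` URI above selects a model by registry stage. The same scheme also accepts a version number (`models:/<name>/<version>`) or an alias (`models:/<name>@<alias>`), and recent MLflow releases mark stages as deprecated in favor of aliases. The helper below is a hypothetical convenience for building such URIs, not part of MLflow; the model name `titanic_classifier` and the alias `champion` are placeholders.

```python
from typing import Optional


def build_model_uri(
    name: str,
    stage: Optional[str] = None,
    version: Optional[int] = None,
    alias: Optional[str] = None,
) -> str:
    """Build a Model Registry URI from exactly one selector."""
    selectors = [s for s in (stage, version, alias) if s is not None]
    if len(selectors) != 1:
        raise ValueError("Pass exactly one of stage, version, or alias")
    if alias is not None:
        return f"models:/{name}@{alias}"
    # Stage and version share the same path form
    selector = stage if stage is not None else version
    return f"models:/{name}/{selector}"


print(build_model_uri("titanic_classifier", stage="Production"))
print(build_model_uri("titanic_classifier", version=3))
print(build_model_uri("titanic_classifier", alias="champion"))
```

Pinning a version number makes a deployment reproducible, while a stage or alias lets the registry decide which version the service picks up on its next start.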


## ✨ Create the API

### Define the API

In [3]:
# Define the FastAPI app
app = FastAPI()

# Define the input data schema using Pydantic BaseModel
class InputData(BaseModel):
    sex: int
    age: float
    fare: float


# Define the API endpoint
@app.post('/predict')
def predict(input_data: InputData):
    # Build a single-row 2D array; the feature order must match the order used in training
    features = np.array([
        input_data.sex,
        input_data.age,
        input_data.fare,
    ]).reshape(1, -1)

    # Use the MLflow model to perform inference
    prediction = model.predict(features)

    # Return the inference results
    return {'prediction': prediction.tolist()}
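
After the API has been launched, any HTTP client can call the `/predict` endpoint. Below is a minimal client sketch using only the standard library. It assumes the server listens at `http://127.0.0.1:8000`, which is uvicorn's default host and port; `build_predict_request` and `request_prediction` are hypothetical helper names, and the sample passenger values are made up.

```python
import json
import urllib.request


def build_predict_request(sex, age, fare, base_url="http://127.0.0.1:8000"):
    """Build a POST request whose JSON body matches the InputData schema."""
    body = json.dumps({"sex": sex, "age": age, "fare": fare}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def request_prediction(sex, age, fare):
    """Send the request and return the decoded JSON response."""
    request = build_predict_request(sex, age, fare)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read())


request = build_predict_request(sex=1, age=29.0, fare=7.25)
print(request.get_method(), request.full_url)
# Call request_prediction(1, 29.0, 7.25) only while the server is running
```

Given the endpoint defined above, a successful call should return a dictionary with a single `prediction` key holding a list. If a field is missing or has the wrong type, FastAPI rejects the request with a 422 validation error before the model is called.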

### Launch the API

In [None]:
import uvicorn

if __name__ == "__main__":
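    # Named uvicorn_config so it does not shadow the `config` module imported above
    uvicorn_config = uvicorn.Config(app)
    server = uvicorn.Server(uvicorn_config)
    # Top-level `await` works in a notebook cell; in a plain script, use uvicorn.run(app) instead
    await server.serve()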