# 📦 Model Serving & Deployment (MLOps)

In this notebook, we will see how to make a previously trained machine learning model available so that other people or applications can use it.
To do this, we will:

✅ Serve a model as a **REST API** using **FastAPI**

✅ Load a trained model and use it to make predictions through the API

✅ Easily test the endpoint from Python

✅ Understand key concepts about **model deployment in production**

## 🧠 What does “serving a model” mean?

When you train a model in an environment like Jupyter or Google Colab, only you can use it, and only from there. If someone else wants to use it (for example, from a mobile app or a website), that model needs to be **accessible online**.

To achieve this, we **serve the model as a web API**.
This means that:

- We expose a function (for example, `predict_diabetes`) through a URL
- Any system can send data to that URL
- The model will respond with a prediction

This is the foundation of many products that use machine learning.
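
Conceptually, a web API is little more than a table that maps URL paths to functions. The sketch below illustrates the idea in plain Python, with no web server involved. `ROUTES`, `handle_request`, and the body of `predict_diabetes` (an arbitrary placeholder rule, not a trained model) are assumptions made for illustration.

```python
import json


def predict_diabetes(payload: dict) -> dict:
    """Placeholder for a real model: an arbitrary fixed rule, for illustration only."""
    label = "Diabetic" if payload["glucose"] > 125 else "No diabetic"
    return {"result": label}


# Hypothetical routing table: URL path -> function that handles it.
ROUTES = {"/predict/": predict_diabetes}


def handle_request(path: str, body: str) -> str:
    """Find the function registered for `path`, call it with the decoded
    JSON body, and return the JSON-encoded response."""
    handler = ROUTES.get(path)
    if handler is None:
        return json.dumps({"error": f"no route for {path}"})
    return json.dumps(handler(json.loads(body)))


print(handle_request("/predict/", '{"glucose": 85}'))  # {"result": "No diabetic"}
```

A framework such as FastAPI takes care of this lookup, plus the HTTP handling and input validation around it.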

## 🔧 Tools we will use

| Tool         | What it is used for                                 |
| ------------ | --------------------------------------------------- |
| **FastAPI**  | To create modern and easy-to-use web APIs in Python |
| **Uvicorn**  | Web server that runs FastAPI                        |
| **Pickle**   | To save and load trained models                     |
| **requests** | To make HTTP calls from Python and test the API     |



## 📈 Practical case: Diabetes prediction

We will work with a model that predicts whether a person has diabetes, using the classic "Pima Indians Diabetes" dataset.

The model will be a logistic regression trained with `scikit-learn`, and we will serve it as a RESTful API using `FastAPI`.

### 1. Training and saving the model

In [1]:
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
import pickle

# 1. Load the dataset
df = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/PimaIndiansDiabetes.csv")

# 2. Split the data into features (X) and target (y)
X = df.drop("Class", axis=1)
y = df["Class"]

# 3. Divide the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 4. Train a logistic regression model
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# 5. Save the trained model as a .pkl file
with open("diabetes_model.pkl", "wb") as f:
    pickle.dump(model, f)

print("Model saved correctly as 'diabetes_model.pkl'")

Model saved correctly as 'diabetes_model.pkl'


### 2. Creating an API to serve the model

Now we will use **FastAPI** to create a server that loads the trained model and listens for incoming requests. When someone sends data, the model will respond with a prediction.

The API will include:

- A basic endpoint (`/`) to verify that everything is working
- A prediction endpoint (`/predict/`) that receives input data and returns the model's result


In [2]:
!pip install fastapi uvicorn pyngrok

Collecting pyngrok
  Downloading pyngrok-7.2.11-py3-none-any.whl.metadata (9.4 kB)
Downloading pyngrok-7.2.11-py3-none-any.whl (25 kB)
Installing collected packages: pyngrok
Successfully installed pyngrok-7.2.11


In [3]:
# main.py
from fastapi import FastAPI
from pydantic import BaseModel
import numpy as np
import pickle

# 1. Load the trained model
with open("diabetes_model.pkl", "rb") as f:
    model = pickle.load(f)

# 2. Create the app using FastAPI
app = FastAPI(title="API to predict diabetes")

# 3. Define a class to validate the input data
class PatientData(BaseModel):
    pregnancies: int
    glucose: float
    bloodpressure: float
    skinthickness: float
    insulin: float
    bmi: float
    diabetespedigreefunction: float
    age: int

# 4. Create a base endpoint to check the connection
@app.get("/")
def root():
    return {"message": "API working correctly"}

# 5. Create a prediction endpoint
@app.post("/predict/")
def predict(data: PatientData):
    input_data = np.array([[
        data.pregnancies, data.glucose, data.bloodpressure,
        data.skinthickness, data.insulin, data.bmi,
        data.diabetespedigreefunction, data.age
    ]])

    prediction = model.predict(input_data)[0]
    result = "Diabetic" if prediction == 1 else "No diabetic"

    return {"result": result}
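
The endpoint builds the input row by hand, so the values must appear in the same order as the columns the model was trained on. One way to make that order explicit is a single list of field names. In this sketch, `FEATURE_ORDER` and `to_feature_row` are hypothetical names, and the order is assumed to match the training columns.

```python
# Assumed to match the column order of the training data.
FEATURE_ORDER = [
    "pregnancies", "glucose", "bloodpressure", "skinthickness",
    "insulin", "bmi", "diabetespedigreefunction", "age",
]


def to_feature_row(payload: dict) -> list:
    """Return the payload values as one row, in training-column order."""
    missing = [name for name in FEATURE_ORDER if name not in payload]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return [payload[name] for name in FEATURE_ORDER]


# The key order of the incoming dict does not matter.
sample = {
    "age": 33, "bmi": 26.6, "glucose": 85, "pregnancies": 3,
    "bloodpressure": 66, "skinthickness": 29, "insulin": 0,
    "diabetespedigreefunction": 0.351,
}
print(to_feature_row(sample))  # [3, 85, 66, 29, 0, 26.6, 0.351, 33]
```

Inside the endpoint, the hand-written list could then be replaced by `np.array([to_feature_row(data.dict())])` (`data.model_dump()` in Pydantic v2).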

In [4]:
from pyngrok import ngrok
import nest_asyncio
import uvicorn

# Run the server with ngrok
nest_asyncio.apply()  # Allow uvicorn to start inside the notebook's already-running event loop

# Get your ngrok token (https://dashboard.ngrok.com/get-started/your-authtoken)
# Never leave a real token in a notebook that you share
ngrok.set_auth_token("YOUR_NGROK_AUTHTOKEN")

# Create a tunnel to expose the FastAPI server
public_url = ngrok.connect(8000)
print(f"Your public API is available at: {public_url}")

# Start the server using uvicorn
uvicorn.run(app, host="0.0.0.0", port=8000, log_level="info")

Your public API is available at: NgrokTunnel: "https://421b-35-202-105-124.ngrok-free.app" -> "http://localhost:8000"


INFO:     Started server process [1240]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)


INFO:     2a0c:5a82:8309:4800:954:aa48:7f84:eac0:0 - "GET / HTTP/1.1" 200 OK


INFO:     Shutting down
INFO:     Waiting for application shutdown.
INFO:     Application shutdown complete.
INFO:     Finished server process [1240]
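
Auth tokens are credentials and are best kept out of notebooks that get shared. A minimal sketch of reading the token from an environment variable instead follows. The variable name `NGROK_AUTHTOKEN` and the helper `get_ngrok_token` are assumptions for illustration.

```python
import os


def get_ngrok_token(env=None, var_name: str = "NGROK_AUTHTOKEN") -> str:
    """Return the ngrok token stored in an environment variable.

    `env` defaults to os.environ; passing a plain dict makes the
    function easy to test.
    """
    env = os.environ if env is None else env
    token = env.get(var_name, "").strip()
    if not token:
        raise RuntimeError(f"Set the {var_name} environment variable first")
    return token


# Usage in the server cell (not executed here):
# ngrok.set_auth_token(get_ngrok_token())
```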


### 3. Testing the API

To verify that the API is working correctly, we can write a script or use an external tool to send data as if we were an external application.

A great tool for this is [Postman](https://www.postman.com/).

We’ll use the public link provided by ngrok to interact with the API:

1. Start with a GET request to the base URL to check if the connection works (we should receive the default message).

2. Then, perform a POST request by adding `/predict/` at the end of the URL.

3. In the Body tab, choose Raw, select JSON and paste the following sample input:
```
{
    "pregnancies": 3,
    "glucose": 85,
    "bloodpressure": 66,
    "skinthickness": 29,
    "insulin": 0,
    "bmi": 26.6,
    "diabetespedigreefunction": 0.351,
    "age": 33
}
```
4. Click SEND. The model should return whether this input corresponds to a diabetic patient or not.

> ⚠️ The field names shown here are not the names the model was trained with: the dataset columns are named differently (e.g., V1, V2, ...), and this mismatch causes an error.
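
The same test can be scripted. The sketch below uses only the standard library's `urllib`; the `requests` package from the tools table would do the same with `requests.post(url, json=payload)`. `BASE_URL` is a placeholder for the address printed by ngrok or uvicorn, and the helper names are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8000"  # placeholder: use your ngrok or local URL

SAMPLE_PATIENT = {
    "pregnancies": 3, "glucose": 85, "bloodpressure": 66,
    "skinthickness": 29, "insulin": 0, "bmi": 26.6,
    "diabetespedigreefunction": 0.351, "age": 33,
}


def build_predict_request(base_url: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for the /predict/ endpoint."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/predict/",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_predict_request(base_url: str, payload: dict) -> dict:
    """Send the request and return the decoded JSON response."""
    request = build_predict_request(base_url, payload)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


# With the server running:
# print(send_predict_request(BASE_URL, SAMPLE_PATIENT))
```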

### 4. Run the API Locally

To launch the FastAPI server locally, save the API code as `main.py` and run the following command from the same directory:
```bash
uvicorn main:app --reload
```
This starts the API in development mode, and you can access the endpoints at: http://127.0.0.1:8000

### 5. Dockerizing the API (optional)

To deploy your app anywhere (e.g., cloud platforms, servers), it's good practice to create a Docker image. This ensures the application behaves the same regardless of the environment.

**Example:**

```dockerfile
FROM python:3.9

WORKDIR /app

COPY requirements.txt .
RUN pip install -r requirements.txt

COPY . .

CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "80"]
```


**How do you deploy this?**

Once your API is running and (optionally) dockerized, you can deploy it to:

- **Render.com** → Very simple for FastAPI apps
- **Hugging Face Spaces** → Ideal if you use `Gradio` or `Streamlit`
- **AWS/GCP/Azure** → Scalable enterprise solutions
- **Railway.app** or **Fly.io** → Great options for quick prototypes


## Conclusions

- We have learned how to train a model, save it, and serve it as an API using FastAPI.
- This is a crucial step in turning an ML experiment into a real product.


## (APPENDIX A) Next steps

### Error Handling

When building an API, you cannot trust that users will always send perfect data. For example, someone might send `age="hello"`, and the request cannot be processed. That is why it is important to anticipate errors.

With FastAPI and Pydantic, we already get basic validation, but we can improve it.


**Example: Handle missing fields or invalid data types (e.g., non-numeric values)**

In [5]:
from fastapi import HTTPException

@app.post("/predict/")
def predict(data: PatientData):
    try:
        input_data = np.array([[ ... ]])  # same eight fields as in main.py
        prediction = model.predict(input_data)[0]
        result = "Diabetic" if prediction == 1 else "No diabetic"
        return {"result": result}
    except Exception as e:
        # Report the failure to the client instead of an unhandled server error
        raise HTTPException(status_code=400, detail=f"Prediction failed: {e}")
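
Type checks do not catch values that are numeric but implausible, such as a negative age. A plain-Python sketch of range checks follows. `LIMITS` and `find_invalid_fields` are hypothetical, and the bounds are illustrative assumptions rather than clinical limits. In FastAPI itself, similar bounds can be declared on the model with Pydantic's `Field(ge=..., le=...)`.

```python
# Illustrative bounds only (assumptions, not clinical reference ranges).
LIMITS = {
    "pregnancies": (0, 30),
    "glucose": (0, 1000),
    "bmi": (0, 100),
    "age": (0, 130),
}


def find_invalid_fields(payload: dict) -> dict:
    """Return {field: reason} for every missing, non-numeric,
    or out-of-range value among the fields listed in LIMITS."""
    problems = {}
    for name, (low, high) in LIMITS.items():
        value = payload.get(name)
        # bool is a subclass of int, so it is rejected explicitly
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            problems[name] = "must be a number"
        elif not low <= value <= high:
            problems[name] = f"must be between {low} and {high}"
    return problems


print(find_invalid_fields(
    {"pregnancies": 3, "glucose": 85, "bmi": 26.6, "age": "hello"}
))  # {'age': 'must be a number'}
```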

### CI/CD: Continuous Integration and Deployment

CI/CD stands for:

- CI (Continuous Integration): Automatically testing your code when changes are made.
- CD (Continuous Deployment): Automatically deploying your app to production whenever the repository is updated (e.g., on GitHub).



**Example: CI/CD with GitHub Actions**

1. We create a file at `.github/workflows/deploy.yml`:

```
name: Deploy API

on:
  push:
    branches:
      - main

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.9'
      - name: Install dependencies
        run: pip install -r requirements.txt
      - name: Run tests
        run: python -m unittest discover tests
```

2. This workflow will test our code every time we push changes to the repository.

Additionally, with platforms like Render.com or Railway.app, you can connect your GitHub repo and have automatic deployment every time new code is pushed.
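
The `Run tests` step expects test files inside a `tests` directory (by default, `unittest` discovery picks up files named `test*.py`). A minimal sketch of such a file follows, assuming the label logic of the endpoint has been factored into a helper. `label_from_prediction` and the file name `tests/test_labels.py` are hypothetical, and the helper is defined inline only to keep the example self-contained.

```python
# tests/test_labels.py (hypothetical file name)
import unittest


def label_from_prediction(prediction: int) -> str:
    """Same mapping the /predict/ endpoint uses for its response."""
    return "Diabetic" if prediction == 1 else "No diabetic"


class LabelFromPredictionTest(unittest.TestCase):
    def test_positive_class(self):
        self.assertEqual(label_from_prediction(1), "Diabetic")

    def test_negative_class(self):
        self.assertEqual(label_from_prediction(0), "No diabetic")
```

Running `python -m unittest discover tests` locally executes the same tests as the workflow.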

### Connecting the API to a Database



Suppose you want to save each prediction made: who made it, when, and what data was sent. To do that, you need a database.

**Simple example using SQLite (a lightweight local file-based database):**

In [6]:
import sqlite3

# Create the database (only the first time)
conn = sqlite3.connect("predictions.db")
cursor = conn.cursor()

# Create the table
cursor.execute("""
CREATE TABLE IF NOT EXISTS predictions (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    age INTEGER,
    result TEXT
)
""")
conn.commit()
conn.close()

Then, inside the prediction endpoint, we can store the input and prediction result:

In [7]:
# Example data
class InputData:
    def __init__(self, age):
        self.age = age

data = InputData(age=35)
result = "Diabetic"

In [8]:
# Save predictions
conn = sqlite3.connect("predictions.db")
cursor = conn.cursor()
cursor.execute("INSERT INTO predictions (age, result) VALUES (?, ?)", (data.age, result))
conn.commit()
conn.close()

> FastAPI works well with SQLite: with an ORM such as SQLModel or SQLAlchemy, you can define your database models, create the database, and interact with it directly from your API.
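
Wrapping the create and insert steps in functions makes them easy to call from the endpoint. This is a sketch under a few assumptions: the helper names are hypothetical, and a `created_at` column (not in the table above) is added to record when each prediction was made.

```python
import sqlite3


def init_db(conn: sqlite3.Connection) -> None:
    """Create the predictions table if it does not exist yet."""
    # created_at is an assumed extra column, filled in by SQLite on insert
    conn.execute("""
        CREATE TABLE IF NOT EXISTS predictions (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            age INTEGER,
            result TEXT,
            created_at TEXT DEFAULT CURRENT_TIMESTAMP
        )
    """)
    conn.commit()


def log_prediction(conn: sqlite3.Connection, age: int, result: str) -> int:
    """Store one prediction and return the id of the new row."""
    cursor = conn.execute(
        "INSERT INTO predictions (age, result) VALUES (?, ?)", (age, result)
    )
    conn.commit()
    return cursor.lastrowid


def fetch_predictions(conn: sqlite3.Connection) -> list:
    """Return all stored predictions as (id, age, result) tuples."""
    query = "SELECT id, age, result FROM predictions ORDER BY id"
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")  # in-memory database, for the demo only
init_db(conn)
log_prediction(conn, 35, "Diabetic")
print(fetch_predictions(conn))  # [(1, 35, 'Diabetic')]
conn.close()
```

In the API, `sqlite3.connect("predictions.db")` would replace the in-memory connection so that rows persist between requests.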

### Visual Interface with Streamlit or Gradio

A visual interface lets anyone, even without coding knowledge, test the model from a web browser.

In [9]:
import gradio as gr
import pickle
import numpy as np

with open("diabetes_model.pkl", "rb") as f:
    model = pickle.load(f)

def prediction(preg, gluc, pressure, skin, insulin, bmi, pedigree, age):
    data = np.array([[preg, gluc, pressure, skin, insulin, bmi, pedigree, age]])
    pred = model.predict(data)[0]
    return "Diabetic" if pred == 1 else "No diabetic"

demo = gr.Interface(
    fn=prediction,
    inputs=[
        gr.Number(label="Pregnancies"),
        gr.Number(label="Glucose"),
        gr.Number(label="Bloodpressure"),
        gr.Number(label="Skinthickness"),
        gr.Number(label="Insulin"),
        gr.Number(label="BMI"),
        gr.Number(label="Diabetespedigreefunction"),
        gr.Number(label="Age"),
    ],
    outputs="text",
    title="Diabetes prediction"
)

demo.launch()

It looks like you are running Gradio on a hosted a Jupyter notebook. For the Gradio app to work, sharing must be enabled. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://2509bf3587bd96108a.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)




> You can also deploy this interface directly on Hugging Face Spaces without needing your own servers.

### Visual summary
```
[ Trained Model ]
        ↓
[ FastAPI API (localhost) ]
        ↓
[ Docker / Render / Hugging Face Spaces ]
        ↓
[ Production with CI/CD ]
        ↓
[ Visual Interface / External Apps ]
```