## MODEL DEPLOYMENT WITH FAST API

# 🎯 Aim of the Notebook: Deploying a Machine Learning Model as a Web Service with FastAPI and Docker

This notebook walks through the full process of deploying a trained machine learning model as a RESTful web service using **FastAPI**, **Pipenv**, and **Docker**. The service allows users to send JSON input and receive predictions from the model via HTTP. It is ideal for production or local testing environments and exposes the model on port `9696`.

---

## ✅ Steps Overview


1. **Write a Python script to handle model predictions**
2. **Embed the script into a FastAPI application**
3. **Expose the service via Uvicorn**
4. **test the endpoint**
5. **Package the application using Docker which can be used for other services such as AWS**

---

## 📦 Create the Virtual Environment inside docker

```bash
- docker build -t ride-duration-prediction-service:v1 .
- docker run -it --rm -p 9696:9696  ride-duration-prediction-service:v1

In [1]:
import joblib
import pandas as pd
from xgboost import XGBRegressor



In [2]:
#Load the preprocesor
preprocessor= joblib.load("preprocessing.pkl")

In [3]:
#Load the model from UBJSON
model = XGBRegressor()
model.load_model("my_model.ubj") 



In [4]:
# Now transform a new sample
df = pd.DataFrame([{
    "passenger_count":1.0,
    "trip_distance": 4.12,
    "fare_amount":21.20,
    "total_amount":36.77,
    "PULocationID": 171,
    "DOLocationID": 73,
    
}])

# Transform the new data
X_processed = preprocessor.transform(df)


In [5]:
X_processed

array([[ 0.85351469, -0.277915  ,  0.5059576 ,  1.18344241, 12.75550494,
        12.75550494]])

In [6]:
feature = model.predict(X_processed)

In [7]:
feature[0]

17.735874

In [8]:
%%writefile predict.py

import joblib
import pandas as pd
from xgboost import XGBRegressor

def load_preprocessor(path: str):
    try:
        preprocessor = joblib.load(path)
        print("[INFO] Preprocessor loaded successfully.")
        return preprocessor
    except FileNotFoundError:
        print(f"[ERROR] Preprocessor file not found: {path}")
        raise
    except Exception as e:
        print(f"[ERROR] Failed to load preprocessor: {e}")
        raise

def load_model(path: str):
    try:
        model = XGBRegressor()
        model.load_model(path) 
        print("[INFO] Model loaded successfully.")
        return model
    except FileNotFoundError:
        print(f"[ERROR] Model file not found: {path}")
        raise
    except Exception as e:
        print(f"[ERROR] Failed to load model: {e}")
        raise

def predict_duration(preprocessor, model, ride_df):
    try:
        X_processed = preprocessor.transform(ride_df)
        prediction = model.predict(X_processed)
        return prediction[0]
    except Exception as e:
        print(f"[ERROR] Prediction failed: {e}")
        raise

def predict_from_dict(ride: dict):
    """function to predict from a simple ride dictionary."""
    try:
        preprocessor = load_preprocessor("preprocessing.pkl")
        model = load_model("my_model.ubj")

        df = pd.DataFrame([ride])
        return predict_duration(preprocessor, model, df)
    except Exception as e:
        print(f"[ERROR] Failed to predict from dict: {e}")
        return None

if __name__ == "__main__":
    try:
        preprocessor = load_preprocessor("preprocessing.pkl")
        model = load_model("my_model.ubj")
        predicted_duration = predict_duration(preprocessor, model, ride)
        print(f"[RESULT] Predicted trip duration: {predicted_duration:.2f} minutes")
    except Exception:
        print("[FAILED] Prediction pipeline could not complete.")


Overwriting predict.py


In [9]:
%%writefile test.py


import predict

ride = {
    "passenger_count":1.0,
    "trip_distance": 4.12,
    "fare_amount":21.20,
    "total_amount":36.77,
    "PULocationID": 171,
    "DOLocationID": 73,
    
}

time = predict.predict_from_dict(ride)

if time is not None:
    print(f"Predicted duration: {time:.2f} minutes")
else:
    print("Prediction failed.")



Overwriting test.py


In [10]:
!python test.py

[INFO] Preprocessor loaded successfully.
[INFO] Model loaded successfully.
Predicted duration: 17.74 minutes


## CREATE THE FAST API APP

In [11]:
import os
os.makedirs("templates",exist_ok=True )

In [12]:
%%writefile templates/predict_form.html
<!DOCTYPE html>
<html>
<head>
    <title>Trip Duration Predictor</title>
    <meta name="viewport" content="width=device-width, initial-scale=1" />
    <style>
        :root { --primary:#007BFF; }
        body {
            font-family: Arial, sans-serif;
            background: #f4f6f9;
            padding: 20px;
        }
        .container {
            background: #fff;
            padding: 30px;
            border-radius: 12px;
            max-width: 560px;
            margin: auto;
            box-shadow: 0 10px 24px rgba(0,0,0,0.06);
        }
        .grid { display: grid; grid-template-columns: 1fr 1fr; gap: 16px; }
        .grid .full { grid-column: 1 / -1; }
        label { display:block; font-weight:600; margin-bottom:6px; }
        input[type="number"] {
            width: 100%;
            padding: 10px 12px;
            border: 1px solid #d6dae1;
            border-radius: 8px;
            background:#fff;
        }
        button {
            background: var(--primary);
            color: #fff;
            border: none;
            padding: 12px 18px;
            border-radius: 8px;
            cursor: pointer;
            font-weight: 600;
        }
        button:hover { background: #0056b3; }
        .result {
            margin-top: 20px;
            background: #e6f7ff;
            padding: 12px;
            border-left: 4px solid var(--primary);
            border-radius: 6px;
        }
        .subtitle { color:#566; margin-top:-6px; margin-bottom:18px; }
    </style>
</head>
<body>
    <div class="container">
        <h2>Trip Duration Predictor</h2>
        <p class="subtitle">Enter ride details and submit to get the predicted duration (minutes).</p>

        <form method="post">
            <div class="grid">
                <div>
                    <label for="passenger_count">Passenger Count</label>
                    <input type="number" id="passenger_count" name="passenger_count" step="1" min="0" required placeholder="e.g., 1">
                </div>

                <div>
                    <label for="trip_distance">Trip Distance (miles)</label>
                    <input type="number" id="trip_distance" name="trip_distance" step="0.01" min="0" required placeholder="e.g., 4.12">
                </div>

                <div>
                    <label for="fare_amount">Fare Amount ($)</label>
                    <input type="number" id="fare_amount" name="fare_amount" step="0.01" required placeholder="e.g., 21.20">
                </div>

                <div>
                    <label for="total_amount">Total Amount ($)</label>
                    <input type="number" id="total_amount" name="total_amount" step="0.01" required placeholder="e.g., 36.77">
                </div>

                <div>
                    <label for="PULocationID">PU Location ID</label>
                    <input type="number" id="PULocationID" name="PULocationID" step="1" min="0" required placeholder="e.g., 171">
                </div>

                <div>
                    <label for="DOLocationID">DO Location ID</label>
                    <input type="number" id="DOLocationID" name="DOLocationID" step="1" min="0" required placeholder="e.g., 73">
                </div>

                <div class="full" style="display:flex; justify-content:flex-end;">
                    <button type="submit">Predict Duration</button>
                </div>
            </div>
        </form>

        {% if result %}
        <div class="result">
            <strong>Prediction:</strong> {{ result }} minutes
        </div>
        {% endif %}
    </div>
</body>
</html>


Overwriting templates/predict_form.html


In [13]:
%%writefile main.py


from fastapi import FastAPI, HTTPException, Request, Form
from fastapi.responses import HTMLResponse
from fastapi.templating import Jinja2Templates
from pydantic import BaseModel
import joblib
import pandas as pd
import uvicorn
import logging
import warnings
from typing import List
from xgboost import XGBRegressor

# Suppress category_encoders warning
warnings.filterwarnings('ignore', category=FutureWarning, module='category_encoders')

# Logging config
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# Initialize FastAPI app
app = FastAPI(
    title="Trip Duration Prediction API",
    description="API and Web UI for predicting NYC taxi trip duration using XGBoost",
    version="1.0.0"
)

# Templates setup
templates = Jinja2Templates(directory="templates")

# Globals
preprocessor = None
model = None

# Input schema
class RideData(BaseModel):
    passenger_count: float
    trip_distance: float
    fare_amount: float
    total_amount: float
    PULocationID: int
    DOLocationID: int
    

    class Config:
        schema_extra = {
            "example": {
                "passenger_count":1.0,
                "trip_distance": 4.12,
                "fare_amount":21.20,
                "total_amount":36.77,
                "PULocationID": 171,
                "DOLocationID": 73,
            }
        }


# Output schema
class PredictionResponse(BaseModel):
    predicted_duration: float
    status: str
    message: str

# Load preprocessor
def load_preprocessor(path: str):
    try:
        preprocessor = joblib.load(path)
        logger.info("Preprocessor loaded.")
        return preprocessor
    except Exception as e:
        logger.error(f"Error loading preprocessor: {e}")
        raise

# Load model
def load_model(path: str):
    try:
        model = XGBRegressor()
        model.load_model(path) 
        logger.info("Model loaded.")
        return model
    except Exception as e:
        logger.error(f"Error loading model: {e}")
        raise

# Prediction logic
def predict_duration(preprocessor, model, ride_df):
    try:
        X_processed = preprocessor.transform(ride_df)
        prediction = model.predict(X_processed)
        return float(prediction[0])
    except Exception as e:
        logger.error(f"Prediction error: {e}")
        raise

# Load on startup
@app.on_event("startup")
async def startup_event():
    global preprocessor, model
    preprocessor = load_preprocessor("preprocessing.pkl")
    model = load_model("my_model.ubj")

# Health check
@app.get("/")
async def root():
    return {"message": "Trip Duration Prediction API is running"}

@app.get("/health")
async def health_check():
    return {
        "status": "healthy",
        "preprocessor_loaded": preprocessor is not None,
        "model_loaded": model is not None
    }

# Single prediction
@app.post("/predict", response_model=PredictionResponse)
async def predict(ride_data: RideData):
    if preprocessor is None or model is None:
        raise HTTPException(status_code=500, detail="Models not loaded.")
    try:
        df = pd.DataFrame([ride_data.dict()])
        duration = predict_duration(preprocessor, model, df)
        return PredictionResponse(
            predicted_duration=duration,
            status="success",
            message="Prediction completed successfully"
        )
    except Exception as e:
        raise HTTPException(status_code=500, detail=f"Prediction failed: {str(e)}")

# Batch prediction
@app.post("/predict_batch")
async def predict_batch(rides: List[RideData]):
    if preprocessor is None or model is None:
        raise HTTPException(status_code=500, detail="Models not loaded.")
    try:
        results = []
        for ride in rides:
            df = pd.DataFrame([ride.dict()])
            duration = predict_duration(preprocessor, model, df)
            results.append(duration)
        return {
            "predictions": results,
            "status": "success",
            "message": f"Predicted durations for {len(rides)} rides"
        }
    except Exception as e:
        raise HTTPException(status_code=500, detail=f"Batch prediction failed: {str(e)}")

# Web form GET
@app.get("/form", response_class=HTMLResponse)
async def form_get(request: Request):
    return templates.TemplateResponse("predict_form.html", {"request": request, "result": None})

# Web form POST
# Web form POST
@app.post("/form", response_class=HTMLResponse)
async def form_post(
    request: Request,
    passenger_count: float = Form(...),
    trip_distance: float = Form(...),
    fare_amount: float = Form(...),
    total_amount: float = Form(...),
    PULocationID: int = Form(...),
    DOLocationID: int = Form(...),
):
    try:
        if preprocessor is None or model is None:
            raise HTTPException(status_code=500, detail="Model not loaded.")
        # Build the full feature set expected by the preprocessor/model
        df = pd.DataFrame([{
            "passenger_count": float(passenger_count),
            "trip_distance": float(trip_distance),
            "fare_amount": float(fare_amount),
            "total_amount": float(total_amount),
            "PULocationID": int(PULocationID),
            "DOLocationID": int(DOLocationID),
        }])
        result = predict_duration(preprocessor, model, df)
        return templates.TemplateResponse(
            "predict_form.html",
            {"request": request, "result": f"{result:.2f}"}
        )
    except Exception as e:
        logger.error(f"Form prediction error: {e}")
        return templates.TemplateResponse(
            "predict_form.html",
            {"request": request, "result": "Error during prediction"}
        )


# Run server
if __name__ == "__main__":
    uvicorn.run("main:app", host="0.0.0.0", port=9696, reload=True)


Overwriting main.py


In [None]:
!uvicorn main:app --host 0.0.0.0 --port 9696 --reload
#http://localhost:9696/docs
#http://localhost:9696/form

[32mINFO[0m:     Will watch for changes in these directories: ['/Users/gabriel/Documents/Mlops_zoomcamp/02-Deployment/web-server-flask-docker']
[32mINFO[0m:     Uvicorn running on [1mhttp://0.0.0.0:9696[0m (Press CTRL+C to quit)
[32mINFO[0m:     Started reloader process [[36m[1m27731[0m] using [36m[1mWatchFiles[0m
* 'schema_extra' has been renamed to 'json_schema_extra'
[32mINFO[0m:     Started server process [[36m27733[0m]
[32mINFO[0m:     Waiting for application startup.
INFO:main:Preprocessor loaded.
INFO:main:Model loaded.
[32mINFO[0m:     Application startup complete.
[32mINFO[0m:     127.0.0.1:63515 - "[1mGET /form HTTP/1.1[0m" [32m200 OK[0m
[32mINFO[0m:     127.0.0.1:63516 - "[1mGET /form HTTP/1.1[0m" [32m200 OK[0m
[32mINFO[0m:     127.0.0.1:63519 - "[1mPOST /form HTTP/1.1[0m" [32m200 OK[0m
[32mINFO[0m:     127.0.0.1:63523 - "[1mPOST /form HTTP/1.1[0m" [32m200 OK[0m
