<img src="images/LandingPage-Header-RED-CENTRE.jpg" alt="Notebook Banner" style="width:100%; height:auto; display:block; margin-left:auto; margin-right:auto;">

# Introducing FastAPI

We now have a good grasp of writing high-quality code, and we have started our machine learning pipeline with MLflow. Let us now walk through a basic workflow that demonstrates how FastAPI can be used to deploy and monitor ML models.


### What is FastAPI?

FastAPI is a modern, high-performance web framework for building APIs with Python. It is designed for speed, both in development and in execution, and comes with built-in features such as:

- **Automatic data validation**: FastAPI uses Pydantic models to automatically validate incoming request data based on defined types and constraints.
- **Interactive API documentation**: FastAPI generates interactive Swagger UI and ReDoc documentation from the code and data models.
- **Dependency injection**: FastAPI allows clean and reusable logic injection into endpoints using its `Depends` helper, ideal for things like auth or database sessions.
- **OAuth2 and JWT support**: FastAPI ships security utilities for OAuth2 flows, which are commonly combined with JWT tokens (encoding and decoding the tokens is handled by a separate JWT library).
- **Type-based routing and serialization**: FastAPI leverages Python type hints to validate input, serialize output, and automatically generate API documentation.


### Installation & Setup

To work through this notebook, install all the dependencies from `requirements.txt` in your working environment.

## Understanding a Basic FastAPI App

Let’s walk through a simple FastAPI example step by step. This will show you how to initialise FastAPI, how to run it for the first time, and how it can be used to serve a basic web application.


**1. First import the FastAPI library**

```python
from fastapi import FastAPI
```
**2. Initialise FastAPI using the command below**

```python
app = FastAPI()
```


**3. Define a test route**
The code below defines one of the many request types you can handle in FastAPI. This example is a simple `GET` request:
```python
@app.get("/")
def read_root():
    return {"message": "Hello, FastAPI is working!"}
```

**What are the post and get requests?**
A `GET` request is used to retrieve data from a server without changing any state or data.
A `POST` request is used to send data to a server to create or update a resource.

**4. Running FastAPI with Uvicorn**

You should have a file named `test_app.py`. To start the server, open a terminal, navigate to the folder `03 Deploying & Productionising ML Models`, and run a command of the form `uvicorn [app_name]:app --reload`:
```bash
uvicorn test_app:app --reload
```


### Find this code inside `fastapi_01_test_app.py` to see how to create a basic FastAPI app

You should see in the terminal:
```bash
INFO:     Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)
INFO:     Started reloader process [11200] using StatReload
INFO:     Started server process [24892]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
```

Copy and paste `http://127.0.0.1:8000` into your browser's address bar (the address may differ on your machine), or hold `Ctrl` and click the address shown in the terminal.

## What is Automatic Documentation

FastAPI can automatically create API documentation following the OpenAPI standard. This includes:

- **Swagger UI**: an interactive, web-based interface for exploring and testing your API, served at `http://127.0.0.1:8000/docs`.
- **ReDoc**: a clean, alternative documentation interface, served at `http://127.0.0.1:8000/redoc`.

The documentation is generated directly from your API’s code, ensuring it stays accurate and up to date with your endpoints, parameters, and data models.

## How to Set Roles of Entry

In many applications, different users have different levels of access. Examples of this are: **admins**, regular **users**, or **guests**. FastAPI allows you to manage this kind of access control using **dependency injection**.

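With dependency injection you write shared logic once as a “dependency” function, and FastAPI calls it and injects the result into your endpoints automatically when they run. For example, instead of creating a database connection or looking up the user's role inside every route, you define a single function that returns it: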
```python
def get_current_user_role():
    # In a real app, you'd check a token or database
    return "user"  # Try changing this to "admin"
```


### Example: Role-Based Access Control

Let’s say we have a simple way to check the current user's role. We’ll use a dependency to simulate user authentication.

```python
from fastapi import FastAPI, Depends, HTTPException, status

app = FastAPI()

# Simulated function to get a user's role (e.g., from a token or session)
def get_current_user_role():
    # In a real app, you'd check a token or database
    return "user"  # Try changing this to "admin"

# Dependency that checks if the user is an admin
def require_admin(user_role: str = Depends(get_current_user_role)):
    if user_role != "admin":
        raise HTTPException(
            status_code=status.HTTP_403_FORBIDDEN,
            detail="Access denied: Admins only."
        )

# Open to everyone
@app.get("/public")
def public_endpoint():
    return {"message": "This endpoint is open to all users."}

# Protected route, only for admins
@app.get("/admin", dependencies=[Depends(require_admin)])
def admin_endpoint():
    return {"message": "Welcome, Admin. You have access to this route."}
```


### Find this code inside `fastapi_02_roles_app.py` to see implementation of roles

## How to Set Up a Health Configuration Endpoint

Health check endpoints are useful for monitoring whether your application is alive, responsive, and ready to serve requests. They're especially important in production environments, where tools like load balancers, orchestration platforms (such as Kubernetes), or CI/CD pipelines may ping this endpoint to verify the application's health.

In FastAPI, setting up a simple health check is straightforward using a `GET` route.

---

### Example: Health Configuration Endpoint

The following example defines a `/health` endpoint that returns basic status information about your API. This includes:

- `status`: Whether the app is running properly
- `version`: The current API version
- `service`: The name or identifier for the API
- `dependencies`: A list or description of key services or libraries the app depends on

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

# Define the response schema using Pydantic
class HealthResponse(BaseModel):
    status: str
    version: str
    service: str
    dependencies: str

# GET endpoint for health check
@app.get("/health", response_model=HealthResponse)
def health_check():
    return {
        "status": "ok",
        "version": "1.0.0",
        "service": "Titanic Predictor API",
        "dependencies": "MLflow, Scikit-learn"
    }
```


### Find this code inside `fastapi_03_health_app.py` to see implementation of a health check endpoint

## Titanic Survival Prediction - Single Entry & Batch Entry

This section defines the structure for an individual passenger, loads a machine learning model on FastAPI startup, and provides an endpoint to predict the survival of a single passenger.

**1) Define the Passenger schema using Pydantic**

Each incoming request for prediction must conform to this structure. FastAPI will validate and parse this automatically.

```python
class Passenger(BaseModel):
    Pclass: int
    Sex: str
    Age: float
    SibSp: int
    Parch: int
    Fare: float
    Embarked: str

    class Config:
        json_schema_extra = {
            "example": {
                "Pclass": 1,
                "Sex": "female",
                "Age": 24.0,
                "SibSp": 0,
                "Parch": 0,
                "Fare": 75.0,
                "Embarked": "C"
            }
        }
```
<br>


**2) Load the ML model on FastAPI startup**

The model is loaded using a specific MLflow run ID and artifact path, and stored in a global variable. If loading fails, it is best practice to log the error and keep the model as `None`.

```python
import pathlib

import mlflow.sklearn

MLFLOW_RUN_ID = "404582437544310156"
MODEL_ARTIFACT_PATH = "models/m-91653bd6e0d14e0598c366aea5981528/artifacts"
model = None

@app.on_event("startup")
def load_model():
    global model
    try:
        current_dir = pathlib.Path(__file__).resolve().parent
        model_path = current_dir.parent.parent / "mlruns" / MLFLOW_RUN_ID / MODEL_ARTIFACT_PATH
        model_uri = model_path.as_uri()

        print(f"Loading model from: {model_uri}")
        model = mlflow.sklearn.load_model(model_uri)
        print("Model loaded successfully.")
    except Exception as e:
        print(f"Failed to load model: {e}")
        model = None
```

<br>

**3) Create single-entry endpoint**

Next we need to make sure that FastAPI knows we are trying to create an endpoint (a URL path) that listens for POST requests at `/predict_single`. A POST request is typically used to send data to a server (like form input or JSON).

`@app.post("/predict_single")`

<br>

**4) Define the function that runs when someone sends a POST request to `/predict_single`. It expects to receive a `Passenger` object, which defines the structure of the input data (passenger class, sex, age, etc.).**

`def predict_survival(passenger: Passenger):`

<br>

**5) Convert the incoming passenger data to a Pandas DataFrame, which the ML model expects.**

`input_df = pd.DataFrame([passenger.model_dump()])`

<br>

**6) Make the prediction: 1 = survived, 0 = did not survive.**

`prediction = model.predict(input_df)[0]`

<br>

**7) Return a JSON response with both the number and a human-readable result.**

```python
return {
    "prediction": int(prediction),
    "survival_status": "Survived" if prediction == 1 else "Not Survived"
}
```

If anything goes wrong, raise a proper HTTP error code:

```python
except Exception as e:
    raise HTTPException(status_code=500, detail=f"Prediction failed: {e}")
```

<br>

**8) Root endpoint to confirm API is running**

You can quickly check if the API is live by visiting the root `/` endpoint in your browser or with a GET request.

```python
@app.get("/")
def root():
    return {"message": "Titanic MLflow API is running"}
```

<br>

**FastAPI will automatically:**
- Validate the incoming JSON against the `Passenger` model.
- Parse it into a structured Python object.
- If validation fails, it returns a `422 Unprocessable Entity` error.



### Find this code inside `fastapi_04_single_app.py` to see implementation of a single entry test

### How to implement Validation

**When defining input values for FastAPI, it is best practice to add validation to ensure the values are reasonable.**

When creating a class for our input data, we can use Pydantic to run three steps:

**Step 1 — Type Checking**

`Literal[X, Y, Z]` - the value must be one of the listed options.

`float` - must be convertible to a float.

`int` - must be convertible to an integer.

**Step 2 — Constraint Checking**

`Field()` is a helper function you use inside a BaseModel to give extra information about a model attribute beyond just the type hint

`gt` - greater than.

`lt` - less than.

`ge` - greater than or equal to.

`...` - required field.

**Step 3 — Documentation Info**

`description` adds text that explains how to fill in the field; it appears in the generated documentation (e.g., the OpenAPI docs in FastAPI).

```python
from typing import Literal

from pydantic import BaseModel, Field


class Passenger(BaseModel):
    Pclass: Literal[1, 2, 3] = Field(..., description="Passenger class (1 = 1st, 2 = 2nd, 3 = 3rd)")
    Sex: Literal["male", "female"] = Field(..., description="Sex of the passenger")
    Age: float = Field(..., gt=0, lt=100, description="Age must be between 0 and 100")
    SibSp: int = Field(..., ge=0, description="Number of siblings/spouses aboard")
    Parch: int = Field(..., ge=0, description="Number of parents/children aboard")
    Fare: float = Field(..., ge=0, description="Fare paid must be non-negative")
    Embarked: Literal["C", "Q", "S"] = Field(..., description="Port of Embarkation (C = Cherbourg, Q = Queenstown, S = Southampton)")

    class Config:
        json_schema_extra = {
            "example": {
                "Pclass": 1,
                "Sex": "female",
                "Age": 24.0,
                "SibSp": 0,
                "Parch": 0,
                "Fare": 75.0,
                "Embarked": "C"
            }
        }
```
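
To see these rules in action without starting a server, you can validate payloads directly with Pydantic. The sketch below uses a shortened version of the model, and `validation_errors` is a hypothetical helper written for this example; FastAPI turns the same kind of failure into a `422` response.

```python
from typing import Literal

from pydantic import BaseModel, Field, ValidationError


class Passenger(BaseModel):
    # Shortened model for illustration: only four of the fields
    Pclass: Literal[1, 2, 3]
    Sex: Literal["male", "female"]
    Age: float = Field(..., gt=0, lt=100)
    Fare: float = Field(..., ge=0)


def validation_errors(payload: dict) -> list[str]:
    """Return the names of the fields that failed validation (empty if valid)."""
    try:
        Passenger(**payload)
        return []
    except ValidationError as exc:
        return [str(err["loc"][0]) for err in exc.errors()]


# A valid payload, then one that breaks three constraints
print(validation_errors({"Pclass": 1, "Sex": "female", "Age": 24.0, "Fare": 75.0}))
print(validation_errors({"Pclass": 4, "Sex": "female", "Age": 150, "Fare": -1}))
```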

### Find this code inside `fastapi_04_02_single_app.py` to see implementation of a single entry test (with validation)

## Using FastAPI to Make Multi Entry Tests Using MLflow

**1) To create a multi-entry test, define a data model in the FastAPI script (test.py) as before, but structure the class so that it receives multiple entries**

```python
from typing import List

from pydantic import RootModel

# --- Passenger batch schema ---
class PassengerBatch(RootModel[List[Passenger]]):
    class Config:
        json_schema_extra = {
            "example": [
                {
                    "Pclass": 1,
                    "Sex": "female",
                    "Age": 24.0,
                    "SibSp": 0,
                    "Parch": 0,
                    "Fare": 75.0,
                    "Embarked": "C"
                },
                {
                    "Pclass": 3,
                    "Sex": "male",
                    "Age": 22.0,
                    "SibSp": 1,
                    "Parch": 0,
                    "Fare": 7.25,
                    "Embarked": "S"
                },
                {
                    "Pclass": 2,
                    "Sex": "female",
                    "Age": 30.0,
                    "SibSp": 1,
                    "Parch": 1,
                    "Fare": 26.0,
                    "Embarked": "Q"
                }
            ]
        }
```

**2) Create multi-entry endpoint**

Create a FastAPI endpoint that listens for POST requests at `/predict_batch`. This will be used to handle and predict multiple passengers in one request.

`@app.post("/predict_batch")`

<br>

**3) Define the function that will handle the batch prediction**

This function is triggered when a POST request is sent to `/predict_batch`. It accepts a `PassengerBatch` object containing a list of passengers.

`def predict_survival_batch(passengers: PassengerBatch):`

<br>

**4) Convert the incoming list of passengers into a Pandas DataFrame**

The ML model expects input as a DataFrame, so the list of passengers is first converted.

`input_df = pd.DataFrame([p.model_dump() for p in passengers.root])`

<br>

**5) Make predictions for each passenger in the batch**

Run the ML model on the input DataFrame to generate predictions.

`predictions = model.predict(input_df)`

<br>

**6) Format the results into a structured list of dictionaries**

Each prediction is formatted with the passenger's index, the raw prediction (0 or 1), and a human-readable survival status.

```python
results = [
    {
        "passenger_index": i,
        "prediction": int(pred),
        "survival_status": "Survived" if pred == 1 else "Not Survived"
    }
    for i, pred in enumerate(predictions)
]
```

<br>

**7) Return the predictions as a JSON response**

The response includes all predictions under the key `batch_predictions`.

```python
return {"batch_predictions": results}
```

<br>

**Error handling**

If something goes wrong during prediction, a `500 Internal Server Error` is returned.

```python
except Exception as e:
    raise HTTPException(status_code=500, detail=f"Batch prediction failed: {e}")
```

<br>

**Once again FastAPI will automatically:**
- Validate the incoming JSON against the `PassengerBatch` schema.
- Parse it into a structured Python object with typed fields.
- Return a `422 Unprocessable Entity` error if the input data is invalid.

### Find this code inside `fastapi_05_batch_test_app.py` to see implementation of a batch entry test

## Calling FastAPI Endpoints as API Request (Single Entry)

**With the FastAPI server running at http://127.0.0.1:8000, you can call its endpoints from any Python script or notebook in your MLflow workflow using the `requests` library.**

```python
import requests

# 1. Call /predict_single

single_url = "http://127.0.0.1:8000/predict_single"

single_passenger = {
    "Pclass": 1,
    "Sex": "female",
    "Age": 24.0,
    "SibSp": 0,
    "Parch": 0,
    "Fare": 75.0,
    "Embarked": "C"
}

single_response = requests.post(single_url, json=single_passenger)

print("Single Prediction Status:", single_response.status_code)
print("Single Prediction Result:", single_response.json())
```

Output:

```text
Single Prediction Status: 200
Single Prediction Result: {'prediction': 1, 'survival_status': 'Survived'}
```


## Calling FastAPI Endpoints as API Request (Multi Entry)

```python
# 2. Call /predict_batch

batch_url = "http://127.0.0.1:8000/predict_batch"

batch_passengers = [
    {
        "Pclass": 1,
        "Sex": "female",
        "Age": 24.0,
        "SibSp": 0,
        "Parch": 0,
        "Fare": 75.0,
        "Embarked": "C"
    },
    {
        "Pclass": 3,
        "Sex": "male",
        "Age": 22.0,
        "SibSp": 1,
        "Parch": 0,
        "Fare": 7.25,
        "Embarked": "S"
    },
    {
        "Pclass": 2,
        "Sex": "female",
        "Age": 30.0,
        "SibSp": 1,
        "Parch": 1,
        "Fare": 26.0,
        "Embarked": "Q"
    }
]

batch_response = requests.post(batch_url, json=batch_passengers)

print("\nBatch Prediction Status:", batch_response.status_code)
print("Batch Prediction Result:", batch_response.json())
```

Output:

```text
Batch Prediction Status: 200
Batch Prediction Result: {'batch_predictions': [{'passenger_index': 0, 'prediction': 1, 'survival_status': 'Survived'}, {'passenger_index': 1, 'prediction': 0, 'survival_status': 'Not Survived'}, {'passenger_index': 2, 'prediction': 1, 'survival_status': 'Survived'}]}
```


## Monitoring ML Projects
Now we can make requests using FastAPI, but without proper monitoring, we have no visibility into how our application is performing. Monitoring is essential because it allows us to track the health, performance, and reliability of our API in real time. 
<br>

**Why?**
<br>
It helps detect issues like high response times, failed predictions, or resource bottlenecks before they affect users or business operations. By collecting metrics with tools like Prometheus, we can make data-driven decisions, optimize system performance, and ensure our machine learning models are delivering accurate and timely results.


## Installation
Before you start, download the version of Prometheus suitable for your system from https://prometheus.io/download/

Windows Example: `prometheus-3.5.0.windows-amd64.zip` | `windows` | `amd64` | `119.22 MiB`

#### Configure Prometheus
After downloading and extracting Prometheus, copy/move the folder to the project path and open the file named `prometheus.yml`. This is the main configuration file where you define how Prometheus scrapes metrics.

**Edit the configuration file**

Replace the contents of `prometheus.yml` with the following:

```yaml
global:
  scrape_interval: 15s  # Set the scrape interval to every 15 seconds
  evaluation_interval: 15s  # Evaluate rules every 15 seconds

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# Scrape configuration
scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]
        labels:
          app: "prometheus"

  - job_name: "fastapi_app"
    static_configs:
      - targets: ["localhost:8000"]
        labels:
          app: "fastapi"
```

**What does this configuration do:**
- Sets the global scrape interval and evaluation interval to 15 seconds.
- Configures a job for Prometheus itself (localhost:9090), so it can monitor its own metrics.
- Adds a job named fastapi_app that scrapes metrics from your FastAPI application on localhost:8000, where the /metrics endpoint is exposed.

Save the file and return to the terminal.

#### **Start Prometheus**
From the same directory as the config file, run Prometheus in a terminal. We suggest opening a new terminal for this purpose.

`./prometheus.exe`

Keep it running in the background.
To open Prometheus visit http://localhost:9090 in your browser.

Note: On some operating systems, such as Windows, security settings may block the application from opening. If a warning appears, click `More info` and then `Run anyway`.

## What should we be monitoring with Prometheus?
In this walkthrough we will use Prometheus to monitor two types of metrics: metrics related to the performance of our app, based on 'The Four Golden Signals', and metrics related to the requests and outputs of the model itself.

#### The Four Golden Signals
The four golden signals of monitoring are latency, traffic, errors, and saturation. If you can only measure four metrics of your user-facing system, focus on these four.

`Latency` is the time it takes to process a request from start to finish. For example, you might see high latency on an API endpoint during peak hours when response times jump from 200ms to over 2 seconds.

`Traffic` refers to the volume of incoming requests to your application over time. For instance, an e-commerce site might experience a spike in traffic during a flash sale, with thousands of requests per minute.

`Errors` measure the rate or number of failed requests, such as HTTP 500 or 404 responses. You might see a surge in errors if a model file is missing or the database becomes unreachable.

`Saturation` indicates how close a system is to its capacity, such as CPU, memory, or thread limits. A FastAPI app on a small server might show saturation when it handles more concurrent requests than it can efficiently support.

<img src="images/four_golden.png" alt="Datasource" style="width:40%; height:auto;">

## Implementing Prometheus for Monitoring

**1) Import Library** 

To start using Prometheus with FastAPI, you first need to import the relevant metric types from the prometheus_client library

- Counter: A metric that only increases—used for counting events like HTTP requests.
- Gauge: A metric that can increase or decrease—used for values like CPU usage or memory consumption.
- make_asgi_app: A function that creates an ASGI-compatible app which exposes all registered metrics.

```python
from prometheus_client import Counter, Gauge, make_asgi_app
```

**2) Mount Prometheus Metrics Endpoint**

Once your metrics are defined, you need to expose them through an HTTP endpoint so Prometheus can collect them. FastAPI can mount other ASGI applications under a path, so we use `make_asgi_app` to expose a `/metrics` route.

```python
metric_app = make_asgi_app()
app.mount("/metrics", metric_app)
```

**3) Create Metric**

To create a metric with Prometheus, start with two parts:
- Declare the variable, here `http_requests_total`.
- Choose a Prometheus metric type. Prometheus has four main metric types, each designed for a specific kind of measurement:
    - Counter: counts events; its value only increases
    - Gauge: tracks a current state or resource usage
    - Histogram: counts how many observations fall into pre-defined buckets (ranges)
    - Summary: similar to a histogram, but designed to report quantiles (the Python client currently exposes only the count and sum)

```python
http_requests_total = Counter(...)  # placeholder: the arguments are filled in during the next step
```

**4) Inside the Prometheus Metric you need to provide:**
- Metric Name: the unique name Prometheus uses to identify this metric.
- Help Text: a human-readable description of what this metric tracks.
- Labels: dimensions that allow you to break down the metric by HTTP method, route, and response status.

```python
http_requests_total = Counter(
    "http_requests_total",
    "Total number of HTTP requests received, labeled by method, endpoint, and HTTP status code.",
    ["method", "endpoint", "status_code"]
)

http_request_duration_seconds = Counter(
    "http_request_duration_seconds_total",
    "Total accumulated HTTP request duration in seconds, labeled by endpoint.",
    ["endpoint"]
)

http_errors_total = Counter(
    "http_errors_total",
    "Total number of HTTP error responses (status code >= 500), labeled by method and endpoint.",
    ["method", "endpoint"]
)

...
```
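
The four metric types behave differently when updated. Here is a short sketch with hypothetical job-related metrics (none of these names come from the course files), again using an isolated registry so the values can be read back.

```python
from prometheus_client import CollectorRegistry, Counter, Gauge, Histogram, Summary

registry = CollectorRegistry()  # isolated registry for this illustration

jobs_total = Counter("jobs_total", "Jobs processed.", registry=registry)
jobs_in_progress = Gauge("jobs_in_progress", "Jobs currently running.", registry=registry)
job_duration = Histogram(
    "job_duration_seconds",
    "Job duration in seconds.",
    buckets=[0.1, 0.5, 1.0],
    registry=registry,
)
job_size = Summary("job_size_bytes", "Size of processed jobs.", registry=registry)

jobs_total.inc()            # counters only go up
jobs_in_progress.inc()      # gauges can go up...
jobs_in_progress.dec()      # ...and back down
job_duration.observe(0.3)   # counted in the 0.5 bucket and every larger one
job_size.observe(2048)      # summaries track the count and sum of observations

print(registry.get_sample_value("jobs_total"))
print(registry.get_sample_value("jobs_in_progress"))
print(registry.get_sample_value("job_duration_seconds_count"))
print(registry.get_sample_value("job_size_bytes_sum"))
```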

**5) Create middleware for collecting metrics**

After setting up our metrics, we need middleware that records these server metrics so that Prometheus can retrieve them. Middleware is a function or component that sits between the request coming in and the response going out.

Think of it as a pipeline stage that runs before and after your route handlers.

**6) Create a FastAPI wrapper for your middleware**

We start by defining a middleware function using the @app.middleware("http") decorator. This middleware will collect request-related data and update the Prometheus metrics accordingly.

```python
import time

from fastapi import Request


@app.middleware("http")
async def prometheus_metrics_middleware(request: Request, call_next):
    start_time = time.time()
    method = request.method
    endpoint = request.url.path
```
- `@app.middleware("http")`: Registers the function as HTTP middleware — runs on every request.
- `request: Request`: Represents the incoming HTTP request.
- `call_next`: A function that processes the request and returns a `Response`.
- `start_time`: Captures when the request started, to calculate latency later.
- `method`: The HTTP method (e.g., `GET`, `POST`).
- `endpoint`: The path of the incoming request (e.g., `/predict_single`).


**7) Processing the Request**
This section forwards the incoming request to the appropriate route handler (e.g., /predict_single). It executes the logic defined in your endpoint and waits for the response.

```python
    try:
        response = await call_next(request)
```
- call_next(request) processes the request through the FastAPI application and returns a response.
- await is used because this is an asynchronous call.
- The response object contains the status code, which we'll use for metric labeling.


**8) Handling Exceptions and Errors**

If something goes wrong during request processing (e.g., an unhandled exception occurs in a route), the application should catch it and record it as an error.

```python
    except Exception:
        http_errors_total.labels(method=method, endpoint=endpoint).inc()
        raise
```
- This block catches exceptions, increments the http_errors_total counter to log a failure, and re-raises the exception so FastAPI can handle it properly.
- The metric is labeled with the HTTP method and endpoint to provide context about where the error happened.

**9) Updating Prometheus Metrics**

Once the request has been processed successfully (or failed), we update our metrics.

```python
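    duration = time.time() - start_time
    status_code = response.status_code

    http_requests_total.labels(
        method=method,
        endpoint=endpoint,
        status_code=str(status_code)
    ).inc()

    # Counters are updated with inc(); adding the duration keeps a running total of seconds
    http_request_duration_seconds.labels(endpoint=endpoint).inc(duration)

    if status_code >= 500:
        http_errors_total.labels(method=method, endpoint=endpoint).inc()
```
- `duration` and `status_code`: The elapsed time since `start_time` and the status code taken from the response.

- http_requests_total: Increments the counter for each completed request, labeled by method, path, and response status.

- http_request_duration_seconds: Adds how long the request took to the running total for that endpoint. Because the metric is defined as a `Counter`, it is updated with `inc(duration)`; `observe()` is only available on `Histogram` and `Summary` metrics, which you would use if you wanted latency buckets.

- If the response status code is 500 or greater, we increment the http_errors_total counter. This covers cases where no exception reached the middleware but a server error was still returned.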

*Note: we check for status codes of 500 or greater because the 5xx range indicates server-side errors.*

**10) Return Response**

The final line of the middleware, `return response`, hands the response back so that it continues on to the client.

Middleware acts as a wrapper around your request-response cycle and allows us to:
- Record when the request starts
- Execute the request
- Measure how long it took
- Log status codes and errors
- Update Prometheus metrics for observability and monitoring


### Find this code inside: `prometheus_00_simple_monitoring_app` to see how prometheus can track a simple metric

### Find this code inside: `prometheus_01_four_signal_monitor_app` to see how we can track the four golden signals using Prometheus

## **More Custom Metrics**

These custom metrics allow you to track detailed information specific to your machine learning model. For example, you can monitor how many predictions are being made, the output classes being predicted, how often batch vs single predictions are used, and the age distribution of passengers being processed.

We do not need to create middleware for these metrics because they are directly tied to specific actions (like running a prediction), not to every HTTP request. Instead, these metrics should be updated inside your route functions where the relevant logic occurs.


```python   
titanic_predictions_total = Counter(
    "titanic_predictions_total", 
    "Total number of predictions made by the model"
)
titanic_predictions_output = Counter(
    "titanic_predictions_output", 
    "Number of predictions made by predicted class", ["predicted_class"]
)
prediction_type_total = Counter(
    "prediction_type_total", 
    "Counts of prediction requests by type", ["type"]
)
```
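
The age distribution mentioned above needs a `Histogram` rather than a `Counter`. The sketch below shows one way it might be declared; the bucket edges are assumed values chosen for illustration, and an isolated registry is used so the example can read the values back.

```python
from prometheus_client import CollectorRegistry, Histogram

registry = CollectorRegistry()  # isolated registry for this illustration

passenger_age_distribution = Histogram(
    "passenger_age_distribution",
    "Distribution of passenger ages sent for prediction",
    buckets=[10, 20, 30, 40, 50, 60, 70, 80],  # assumed bucket edges, in years
    registry=registry,
)

# Each call to observe() records one passenger's age
for age in (24.0, 22.0, 30.0):
    passenger_age_distribution.observe(age)

print(registry.get_sample_value("passenger_age_distribution_count"))
print(registry.get_sample_value("passenger_age_distribution_sum"))
```

A histogram is exposed as several series (`_bucket`, `_sum` and `_count`), which is what makes it possible to chart the distribution later.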

### Find this code inside: `prometheus_02_model_monitor_app` to see how we can use Prometheus to monitor metrics specific to the model

## How to Query Prometheus

### Using Prometheus

Now that we have integrated metrics into our FastAPI application, the next step is to understand how to view and use these metrics for monitoring and analysis.

To use Prometheus effectively, follow these steps:

**1) Check the `/metrics` endpoint**

Start your application and confirm that the /metrics endpoint is accessible. You can test this by visiting http://localhost:8000/metrics in your browser. You should see a plain text list of metrics, each with names, labels, and current values.

<img src="images/prometheus_metrics.png" alt="Datasource" style="width:60%; height:auto;">

```bash
# HELP <metric_name> <description of what this metric tracks>
# TYPE <metric_name> counter
<metric_name>{<label_key1>="<label_value1>", <label_key2>="<label_value2>"} <value>
<metric_name>{<label_key1>="<another_value1>", <label_key2>="<another_value2>"} <value>

# HELP titanic_predictions_output_total Number of predictions made by predicted class
# TYPE titanic_predictions_output_total counter
titanic_predictions_output_total{predicted_class="1"} 3.0
titanic_predictions_output_total{predicted_class="0"} 5.0
```

**2) Ensure Prometheus is open with the correct configuration**

Prometheus must be running and properly configured to scrape your FastAPI app. This means you should have a prometheus.yml file that includes your application's address (localhost:8000) under scrape_configs.

<img src="images/prometheus_application_nav.png" alt="Datasource" style="width:60%; height:auto;">
<img src="images/prometheus_application.png" alt="Datasource" style="width:60%; height:auto;">

**3) Access the Prometheus dashboard**
Once Prometheus is running, navigate to http://localhost:9090. This is the Prometheus web UI where you can query and visualize metrics.

**4) Write PromQL queries to inspect metrics**

In the Prometheus UI, use the query bar to write PromQL expressions like:

- `http_requests_total` to see all HTTP requests received
- `titanic_predictions_total` to track total predictions
- `prediction_type_total` to compare single vs batch prediction frequency
- `passenger_age_distribution_bucket` to view the histogram buckets of input ages

**5) Click on `Charts` to view metrics over time**

Prometheus stores historical values, so you can view trends, spikes, or anomalies over different time windows. Use the graph tab to visualize how a metric changes over time.


<img src="images/promquery.png" alt="Datasource" style="width:100%; height:auto;">


## Logging predictions with Prometheus

We now have a FastAPI application that exposes Prometheus metrics for monitoring model predictions, prediction types, and the passenger age distribution.
We can also use Prometheus to track usage statistics, which is helpful for visual dashboards and performance monitoring (e.g., in Grafana).

Prometheus automatically scrapes the metrics exposed by the FastAPI app. Every time a prediction is made, we update counters and histograms that represent model behavior.

**Log to Prometheus in the prediction endpoint**

```python
    titanic_predictions_total.inc()
    titanic_predictions_output.labels(predicted_class=str(prediction)).inc()
    prediction_type_total.labels(type="single").inc()
    passenger_age_distribution.observe(passenger.Age)
```

`titanic_predictions_total.inc()`
- Increments a counter every time a prediction is made.
- Example: `titanic_predictions_total = 21`

`titanic_predictions_output.labels(predicted_class=str(prediction)).inc()`
- Increments a labelled counter that tracks how many predictions were class 0 or class 1.
- Example:
  - `predicted_class="0"`: 7
  - `predicted_class="1"`: 14

`prediction_type_total.labels(type="single").inc()`
- Increments a labelled counter for the type of prediction (e.g., "single" vs "batch").

`passenger_age_distribution.observe(passenger.Age)`
- Records the passenger's age in a histogram, so the distribution of input ages can be inspected.
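For reference, the metric objects used in the calls above could be declared with `prometheus_client` roughly as follows. This is a sketch rather than the app's actual definitions: the metric names, help strings, and histogram buckets are assumptions, and a separate `CollectorRegistry` is used so the snippet runs on its own without clashing with metrics the app has already registered.

```python
from prometheus_client import CollectorRegistry, Counter, Histogram, generate_latest

# A private registry keeps this sketch isolated from the app's default registry
registry = CollectorRegistry()

titanic_predictions_total = Counter(
    "titanic_predictions_total", "Total number of predictions made", registry=registry
)
titanic_predictions_output = Counter(
    "titanic_prediction_output",  # assumed name; exposed with a _total suffix
    "Number of predictions made by predicted class",
    ["predicted_class"],
    registry=registry,
)
prediction_type_total = Counter(
    "prediction_type_total", "Number of predictions by request type", ["type"], registry=registry
)
passenger_age_distribution = Histogram(
    "passenger_age_distribution",
    "Distribution of passenger ages sent to the model",
    buckets=(10, 20, 30, 40, 50, 60, 70, 80),  # assumed bucket boundaries in years
    registry=registry,
)


def record_prediction(prediction: int, age: float, kind: str = "single") -> None:
    """Update every metric for one prediction, mirroring the endpoint code."""
    titanic_predictions_total.inc()
    titanic_predictions_output.labels(predicted_class=str(prediction)).inc()
    prediction_type_total.labels(type=kind).inc()
    passenger_age_distribution.observe(age)


record_prediction(1, 29.0)
record_prediction(0, 41.5, kind="batch")

# Show the text that a /metrics endpoint would return for this registry
print(generate_latest(registry).decode())
```

Note that `prometheus_client` exposes counter samples with a `_total` suffix, so a counter declared as `titanic_prediction_output` is queried as `titanic_prediction_output_total`.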

### Logging as JSON

Once Prometheus is tracking our metrics, it is also convenient to save each prediction to a JSON Lines (JSONL) file, where every line is one JSON object.

1) Add a helper to increment an execution counter

```python
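import json
import pathlib


def get_next_execution_number(log_path: pathlib.Path) -> int:
    """Return the next execution number based on the last line of the JSONL log."""
    if not log_path.exists():
        return 1
    try:
        with open(log_path, "r") as f:
            lines = f.readlines()
        if not lines:
            return 1
        # The last line holds the most recent log entry
        last_log = json.loads(lines[-1])
        return last_log.get("execute", 0) + 1
    except Exception:
        # Fall back to 1 if the file cannot be read or parsed
        return 1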
```

What it does:
- If the log file is missing or empty, it returns `1`.
- Otherwise, it parses the last JSON line and returns that entry's `execute` value plus `1`.
- If the file cannot be read or parsed, it falls back to `1`.

<br>

2) Decide where the JSONL file will live (set on startup)

In your `@app.on_event("startup")` handler, compute the model path, build a timestamp with `datetime.now().strftime("%Y%m%d_%H%M%S")`, and place the log file next to the model `artifacts`. Because the file name includes the timestamp, the logs for each process run are kept separate.


```python
current_dir = pathlib.Path(__file__).resolve().parent
model_path = current_dir.parent.parent / "mlruns" / EXPERIMENT_ID / MODEL_ARTIFACT_PATH

# Optional custom override (uncomment & set CUSTOM_LOG_PATH globally if desired)
# CUSTOM_LOG_PATH = pathlib.Path("/your/custom/path/inference_logs.jsonl")

timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
PREDICTION_LOG_PATH = (
    CUSTOM_LOG_PATH if "CUSTOM_LOG_PATH" in globals() and CUSTOM_LOG_PATH is not None
    else model_path / f"simulation_logs_{timestamp}.jsonl"
)

PREDICTION_LOG_PATH.parent.mkdir(parents=True, exist_ok=True)
# Create (or truncate) the log file so each startup begins with an empty log
with open(PREDICTION_LOG_PATH, "w"):
    pass
```

3) Build the log entry, which defines the structure of each JSON line

```python
execution_number = get_next_execution_number(PREDICTION_LOG_PATH)
log_entry = {
    "execute": execution_number,
    "execution_time": datetime.now().isoformat(),  # ISO timestamp
    "experiment_id": EXPERIMENT_ID,
    "run_id": RUN_ID,
    "prediction": int(prediction),
    "probability": {
        "Not Survived": float(probabilities[0]),
        "Survived": float(probabilities[1]),
    }
}
```

4) Append one JSON object per line (JSONL)
```python
with open(PREDICTION_LOG_PATH, "a") as f:
    json.dump(log_entry, f)
    f.write("\n")
```
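Because each line is a complete JSON object, the log is easy to read back for analysis. The sketch below, using only the standard library, appends a few entries and then counts the predictions per class. The helper names and the demo file name are hypothetical, and the entries are cut down to two fields for brevity.

```python
import json
import pathlib
import tempfile
from collections import Counter


def append_log_entry(log_path: pathlib.Path, entry: dict) -> None:
    """Append one entry to the log as a single JSON line."""
    with open(log_path, "a") as f:
        f.write(json.dumps(entry) + "\n")


def load_log_entries(log_path: pathlib.Path) -> list:
    """Read every non-empty line of a JSONL file back into a dictionary."""
    with open(log_path, "r") as f:
        return [json.loads(line) for line in f if line.strip()]


def count_predictions(entries: list) -> dict:
    """Count how many logged predictions fall into each class."""
    return dict(Counter(entry["prediction"] for entry in entries))


# Demo: write three made-up entries to a temporary file and read them back
with tempfile.TemporaryDirectory() as tmp:
    demo_path = pathlib.Path(tmp) / "demo_log.jsonl"  # hypothetical file name
    for number, prediction in enumerate([1, 0, 1], start=1):
        append_log_entry(demo_path, {"execute": number, "prediction": prediction})
    demo_entries = load_log_entries(demo_path)

print(count_predictions(demo_entries))
```

Reading line by line also means a partially written or very large log can be processed without loading one giant JSON document.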


### Find this code inside `prometheus_03_logging_app.py` to see the implementation of logging

## Multi Execute Code

To demonstrate logging at a larger scale, below is a version of the code we have been working with that performs multiple executions with random values, giving us more logs.


We do this by creating a new endpoint, `@app.post("/simulate_predictions")`, containing:

```python
        for _ in range(10):
            passenger = {
                "Pclass": random.choice([1, 2, 3]),
                "Sex": random.choice(["male", "female"]),
                "Age": round(random.uniform(1, 80), 1),
                "SibSp": random.randint(0, 3),
                "Parch": random.randint(0, 3),
                "Fare": round(random.uniform(10, 250), 2),
                "Embarked": random.choice(["C", "Q", "S"])
            }
```

This generates 10 passengers with randomly selected values for the model to score.
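As a standalone illustration of the loop above, the sketch below wraps the random passenger generation in two hypothetical helper functions. The optional `seed` argument is an addition for reproducibility and is not part of the original endpoint.

```python
import random
from typing import Optional


def make_random_passenger(rng: random.Random) -> dict:
    """Build one passenger with randomly chosen feature values."""
    return {
        "Pclass": rng.choice([1, 2, 3]),
        "Sex": rng.choice(["male", "female"]),
        "Age": round(rng.uniform(1, 80), 1),
        "SibSp": rng.randint(0, 3),
        "Parch": rng.randint(0, 3),
        "Fare": round(rng.uniform(10, 250), 2),
        "Embarked": rng.choice(["C", "Q", "S"]),
    }


def make_passengers(count: int = 10, seed: Optional[int] = None) -> list:
    """Generate `count` random passengers; pass a seed for repeatable output."""
    rng = random.Random(seed)
    return [make_random_passenger(rng) for _ in range(count)]


batch = make_passengers(10, seed=42)
print(batch[0])
```

Using a dedicated `random.Random` instance keeps the simulation from disturbing the global random state used elsewhere in the app.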

#### Find this code in `prometheus_04_multi_execute_app.py`, which demonstrates simulating multiple executions

## FastAPI Query: Input Simulations

For finer control over the simulations, it is useful to let the caller specify how many simulations to run.

`from fastapi import Query`

`Query` in FastAPI lets you declare a parameter that will be read from the request’s URL query string. FastAPI automatically extracts that value, validates it against any rules you set, and converts it to the correct Python type. This means you can use the parameter directly in your function without manually parsing or checking the request.


`num_executions: int = Query(10, ge=1, le=100, description="How many simulations to run")`

`Query` accepts several arguments, such as a default value (`10`), validation rules (`ge`, greater than or equal to, and `le`, less than or equal to), and a `description` that appears in the API documentation.

**Note: Validation is important because these parameters come straight from the request and are passed directly into our code.**

# Grafana - Monitoring Visualisation Tool

Now that Prometheus is running and tracking metrics, it is useful to create visualisations that make these metrics easier to read. For this we will be using Grafana.

#### What is Grafana

Grafana is an open-source analytics and visualization platform used to monitor and display time-series data from various data sources like Prometheus, InfluxDB, and Elasticsearch. It allows users to create interactive and customizable dashboards for real-time observability and performance monitoring.

#### How to run Grafana

1. Download the standalone Grafana binaries. Unzip the file and place it in the project folder.

Download from [grafana.com/grafana/download](https://grafana.com/grafana/download) 

2. Open Grafana

Open a new terminal, navigate to Grafana's `bin` folder, and run:

`.\grafana-server.exe`

Then open Grafana in your browser using the app URL with port `3000`, e.g. `http://127.0.0.1`:<span style="color:green;">`3000`</span>

*Note: The default username is `admin` and the default password is `admin`. You will be prompted to change the password; you can skip this step.*

3. Add a Data Source

- Under `Connections` click `Data Sources`
- Click on `Add new data sources` in the top right
- Select `Prometheus`

<img src="images/data_source.png" alt="Datasource" style="width:20%; height:auto;">

4. Configure the data source

In the Prometheus data source settings, change `Prometheus server URL:` to the app URL followed by port `9090`, i.e. `http://127.0.0.1:9090`

<img src="images/prometheus_settings.png" alt="Datasource" style="width:40%; height:auto;">

Then save and exit.

5. Import Dashboard Template

Dashboards can be built from scratch, but we will use a simple premade dashboard that is defined in a JSON file.

- In the left-hand navigation, click on `Dashboards`, open the `New` dropdown, and select `Import`
- Within this training material folder, open the folder called `grafana_dashboard`, select `Titanic API - Simple App Monitoring`, and click `Load`

<img src="images/new_dashboard.png" alt="Datasource" style="width:40%; height:auto;">


6. Open Grafana Dashboard

<img src="images/grafana_dash.png" alt="Datasource" style="width:80%; height:auto;">

#### What is Grafana doing?

Grafana uses PromQL (Prometheus Query Language) to query and visualize the time-series metrics stored in a Prometheus database.

When connected to Prometheus as a data source, Grafana allows users to write PromQL queries to extract specific metrics (like API uptime, request counts, error rates, etc.) and then visualizes them through interactive dashboards using panels like time series charts, gauges, heatmaps, and bar graphs.

To see this:

In the top right of any visualisation, click the menu button `⋮` to look inside the visualisation.

<img src="images/edit_vis.png" alt="Datasource" style="width:15%; height:auto;">

Here we can see PromQL in action at the bottom, with `titanic_predictions_total` being the metric that tracks how many predictions our Titanic prediction app has made:

<img src="images/promql.png" alt="Datasource" style="width:50%; height:auto;">