## 5.1 - Intro to ML monitoring

Monitoring is a crucial part of MLOps. It involves tracking the performance of machine learning models in production, system metrics, data quality, and more. Over time, the quality of models can degrade, and monitoring can help detect this.


### 5.1.1. Metrics to monitor (compulsory)
What metrics should we add to our monitoring to fully capture the ML service we are focusing on?

1. **Service health** -> answers the question *does it work?*
2. **Model performance** -> *how good are the models?* E.g.:
    - Regression problems: rmse, ...
    - Classification problems: precision, recall, ...
    
3. **Data quality and integrity** -> *how good is the input data?* *where and how can it be improved?* E.g.:
    - Share of missing values, value ranges, ...
4. **Data and concept drift** -> The service works in a constantly changing environment, thus *is the model still relevant?* 
    - Compares the distributions of the current input data and model outputs against a reference
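
As a rough illustration of the model-performance and data-quality items above, the following minimal sketch computes an RMSE, the share of missing values, and an observed value range using only the standard library. The sample values and function names are hypothetical and are not taken from the course code.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error for a regression model."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def share_of_missing(values):
    """Fraction of entries that are None (a basic data-quality signal)."""
    return sum(v is None for v in values) / len(values)


def value_range(values):
    """Observed (min, max) over the non-missing entries."""
    present = [v for v in values if v is not None]
    return min(present), max(present)


# Hypothetical trip durations (minutes) and model predictions
actual = [10.0, 20.0, 30.0]
predicted = [12.0, 18.0, 33.0]
# Hypothetical input feature with one missing value
distances = [1.2, None, 3.4, 0.8]

print(round(rmse(actual, predicted), 3))
print(share_of_missing(distances))
print(value_range(distances))
```

In practice a monitoring library such as Evidently would compute these for you; the sketch only shows what kind of numbers end up in the monitoring store.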

### 5.1.2. Metrics to monitor (additional/optional)
There are also other metrics we can pay attention to, depending on the sensitivity, risk and resources involved:
- Performance by segment: if you have a large diversity in your dataset
- Model bias / fairness: in case of sensitive domain area (e.g., human studies in medicine)
- Monitor for outliers: in case the cost of each individual error is very high
- Explainability: e.g., in recommender systems, how a given recommendation was generated
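
To make "performance by segment" concrete, here is a small sketch that computes the mean absolute error separately for each segment. The segment names and numbers are invented for illustration only.

```python
from collections import defaultdict


def mae_by_segment(records):
    """Mean absolute error per segment.

    Each record is a (segment, actual, predicted) tuple.
    """
    errors = defaultdict(list)
    for segment, actual, predicted in records:
        errors[segment].append(abs(actual - predicted))
    return {segment: sum(errs) / len(errs) for segment, errs in errors.items()}


# Hypothetical records: (segment, actual duration, predicted duration)
records = [
    ("airport", 30.0, 40.0),
    ("airport", 50.0, 40.0),
    ("city", 10.0, 11.0),
    ("city", 12.0, 11.0),
]
print(mae_by_segment(records))  # {'airport': 10.0, 'city': 1.0}
```

An overall MAE of 5.5 would hide the fact that one segment is served ten times worse than the other, which is why a per-segment view can be worth the extra effort.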

### 5.1.3. Monitoring depending on deployment type

*Note:* You can reuse an existing monitoring architecture, such as one built for traditional software applications, for ML models. This saves time and resources, as you don't need to build a new monitoring system from scratch.

The approach we use to deploy the model will influence how monitoring is done.

1. **Batch deployment**: most of the metrics previously described can be analyzed in batch mode. 
    - E.g., you can take the data from the most recent batch as current data and compare it with a reference dataset

2. **Non-batch deployment**: e.g., web services. This is more complicated, because not every metric can be calculated in real time.
    - Instead, it's more effective to analyze the data as a collection (i.e., batch-like).
    - We can use window functions: wait until a small batch of data has been collected, then calculate all the metrics and store them.
    - This means that even if the service is purely online, monitoring can still be done in batch mode.
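
The windowing idea for online services can be sketched as follows. This is a simplified illustration, not the course implementation: the class name `WindowedMonitor`, the window size, and the in-memory `results` list (standing in for a metrics store) are all assumptions.

```python
class WindowedMonitor:
    """Buffers online predictions and computes a metric once per full window."""

    def __init__(self, window_size, metric_fn):
        self.window_size = window_size
        self.metric_fn = metric_fn
        self.buffer = []
        self.results = []  # stand-in for a metrics store such as a database table

    def log(self, record):
        """Add one record; flush the buffer through metric_fn when the window is full."""
        self.buffer.append(record)
        if len(self.buffer) >= self.window_size:
            self.results.append(self.metric_fn(self.buffer))
            self.buffer = []


def mean_prediction(batch):
    return sum(batch) / len(batch)


monitor = WindowedMonitor(window_size=3, metric_fn=mean_prediction)
for prediction in [10.0, 20.0, 30.0, 40.0, 50.0]:
    monitor.log(prediction)

print(monitor.results)  # [20.0]
print(monitor.buffer)   # [40.0, 50.0]
```

The last two records stay in the buffer until the window fills, so metrics lag behind the live traffic by at most one window.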

### 5.1.4. Monitoring architecture

The monitoring scheme we are going to use for this week can be used for both batch and non-batch deployment:

![title](images/monitoring-scheme.png)

- Taxi ride prediction service (either web or batch)
- The service is productionized and outputs some logs
- We are building the monitoring on top of these logs (as local files):
    1. Implement monitoring jobs with **Prefect**
    2. We use **Evidently AI** as the evaluation layer: read prediction logs in batches, analyze them, calculate metrics and store them in PostgreSQL
       - *Evidently* is an open-source Python library to evaluate, test, and monitor the performance of machine learning models from the validation phase to production. It allows you to detect potential issues early.
    3. Later, we use this database as the source for a dashboard (**Grafana**) showing the evolution of the different metrics
       - *Grafana* is an open-source platform for monitoring and observability. It allows you to query, visualize, alert on, and understand your metrics no matter where they are stored. 

## 5.2 - Environment setup

### 5.2.1 Python packages

First, we need to create a working directory for our project in the ```/05-monitoring``` directory:

```
$ mkdir taxi_monitoring
```

Next, we'll work in the ```mlops-zoomcamp``` conda environment:

```
$ conda activate mlops-zoomcamp
```

We'll create a ```requirements.txt``` file in ```taxi_monitoring/``` to list all the necessary Python libraries for our project.

```
prefect
tqdm
requests
joblib
pyarrow
psycopg
psycopg_binary
evidently 
pandas
numpy
scikit-learn
jupyter
matplotlib
```

Now, we'll install all the Python packages listed in the ```requirements.txt``` file.

```
$ pip install -r requirements.txt
```

### 5.2.2 Docker Compose

#### Definition of ```compose.yaml```

A Docker Compose file is used to define and manage a multi-container Docker application. It's a YAML file that contains all the necessary configurations to run the application. The services defined in this file are typically used in a development environment for testing or in a production environment for deployment.

We create the YAML file in ```taxi_monitoring/```:

```
(mlops-zoomcamp) $ touch compose.yaml
```

We'll configure ```compose.yaml``` for a typical scenario in which you want to set up a monitoring system for your application. It uses these 3 services:

- **Postgres database**: stores application data
- **Adminer**: provides a web interface for managing the Postgres database
- **Grafana**: creates visualizations and dashboards based on the data in the database.

Here is what ```compose.yaml``` looks like:

```yaml

version: '3.7'

# Declares volumes that can be used by the services (in this specific file, the volumes declared are not used). Volumes are used to store your artifacts independently of the status of your containers (up or down). It's especially relevant for Grafana
volumes:
  grafana_data: {}

# Defines networks that can be used by the services
networks:
  front-tier:
  back-tier:

# Defines the services that make up your app
services:
  db:
    # Specifies the Docker image to use for this service
    image: postgres
    # Ensures that the service is always restarted if it stops
    restart: always
    # Sets environment variables for the service
    environment:
      # Sets the password for the Postgres database
      POSTGRES_PASSWORD: example
    # Maps ports between the host and the container
    ports:
      - "5432:5432"
    # Specifies the networks that this service is part of
    networks:
      - back-tier

  adminer:
    image: adminer
    restart: always
    ports:
      - "8080:8080"
    networks:
      # Communicate with postgres
      - back-tier
      # We need access (browser)
      - front-tier  

  grafana:
    image: grafana/grafana
    # Sets the user ID under which the service will run. The service uses a specific user ID to run and mounts several volumes for configuration and dashboards.
    user: "472"
    ports:
      - "3000:3000"
    # Maps local directories to directories inside the container
    volumes:
      # :ro -> read-only
      - ./config/grafana_datasources.yaml:/etc/grafana/provisioning/datasources/datasource.yaml:ro
      - ./config/grafana_dashboards.yaml:/etc/grafana/provisioning/dashboards/dashboards.yaml:ro
      - ./dashboards:/opt/grafana/dashboards
    networks:
      # Communicate with postgres
      - back-tier
      # Access our dashboard from browser
      - front-tier
    restart: always
```

#### Grafana data source configuration

We'll create a Grafana data source configuration file. It's used to define the data sources that Grafana should connect to. A data source in Grafana represents a back-end database, such as PostgreSQL. 

In your project directory, create a new directory named ```config``` and a file within it named ```grafana_datasources.yaml```:

```
$ mkdir config
$ touch config/grafana_datasources.yaml
```

The content of ```grafana_datasources.yaml``` is as follows:

```yaml
# Specifies the version of the configuration file format being used
apiVersion: 1

# Defines the data sources that Grafana should connect to
datasources:
    # name of the data source
  - name: PostgreSQL
    # type of the data source as PostgreSQL
    type: postgres
    # Grafana will act as a proxy between the data source and the clients
    access: proxy
    # PostgreSQL service defined in the Docker Compose file (including port)
    url: db.:5432
    # Name of the database within the PostgreSQL server that Grafana should connect to.
    database: test
    # Sets the username to be used for authentication when connecting to the PostgreSQL server.
    user: postgres
    # Contains sensitive data (in this case, the password) that is securely stored. Here, the password is set to "example".
    secureJsonData:
      password: 'example'
    # Contains additional JSON data. Here, SSL/TLS encryption for the connection is disabled.
    jsonData:
      sslmode: 'disable'
```

This configuration file is used in the context of the ```compose.yaml``` setup we provided earlier. The Grafana service in the Docker Compose file uses this configuration file to set up its connection to the PostgreSQL database.

This setup allows Grafana to pull data from the PostgreSQL database, which it can then use to create visualizations and dashboards. This is particularly useful in a monitoring setup, where you might want to visualize metrics from your application that are stored in the PostgreSQL database.

### 5.2.3 Build and Run Docker Compose

Finally, we'll build and run our Docker Compose configuration. There are different options:

```$ docker compose build``` : only builds the images, does not start the containers

```$ docker compose up``` : builds the images if the images do not exist and starts the containers

```$ docker compose up --build``` : forces a rebuild of the images even when it is not needed

In our case we use the 3rd option. You should see that all your containers are successfully created. You can verify this by accessing 

- Grafana : ```localhost:3000```. On first access, log in with ```user:admin``` and ```password:admin```
- Adminer : ```localhost:8080```

and use 

```$ docker compose down```

to stop the containers.
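
While the containers are up, a quick way to confirm that the published ports are reachable is a plain TCP connection test. The sketch below is an optional helper, not part of the course material; it uses the three host ports from ```compose.yaml``` and only checks that something is listening, not that the service behind it is healthy.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Host ports published in compose.yaml
SERVICES = {"postgres": 5432, "adminer": 8080, "grafana": 3000}

if __name__ == "__main__":
    for name, port in SERVICES.items():
        status = "up" if port_is_open("localhost", port) else "down"
        print(f"{name:<10} localhost:{port} -> {status}")
```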

## 5.3 Prepare reference and model

Before we can start monitoring our machine learning model, we need to establish a baseline. This involves gathering initial data and training a model that we can use as a reference point for future comparisons.

Within the directory ```taxi_monitoring/```, we create two extra folders: ```models``` and ```data```. From now on, the following content will refer to the functions implemented in the notebook ```baseline_model_nyc_taxi_data.ipynb```.

1. First, we need to import all the necessary libraries for our project. This includes libraries for handling requests, datetime operations, data manipulation (Pandas), machine learning (Scikit-learn), model persistence (Joblib), progress bars (tqdm), and data drift and model performance monitoring (Evidently).

2. ```python
   def download_files()
   ```
   This function downloads files from a specified URL and saves them to a local directory. It uses the ```requests``` library to send a GET request to the URL and download the file, and the ```tqdm``` library to display a progress bar.

3. ```python
   def preprocess_data()
   ```
   This function preprocesses the data by calculating the duration of each trip in minutes and filtering out trips with unrealistic durations and passenger counts.

4. ```python
   def train_and_evaluate()
   ```
   This function trains a Linear Regression model on the training data and evaluates it on the validation data. It calculates the Mean Absolute Error (MAE) of the predictions on both the training and validation data.

5. ```python
   def save_model_and_data()
   ```
   This function saves the trained model and the validation data for future use. It uses the ```joblib.dump``` function to save the model and the ```pandas.DataFrame.to_parquet``` method to save the validation data. This step is important for model deployment and monitoring, as we'll need to load the model and reference data in the future to calculate dataset drift when compared with production behavior.
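
The preprocessing step described above can be sketched without any third-party library. The field names (`pickup`, `dropoff`, `passenger_count`) and the plausibility bounds are assumptions made for this illustration; the notebook works on DataFrames and may use different column names and thresholds.

```python
from datetime import datetime

# Assumed plausibility bounds; the notebook may use different ones
MIN_DURATION_MIN, MAX_DURATION_MIN = 0, 60
MIN_PASSENGERS, MAX_PASSENGERS = 1, 8


def trip_duration_minutes(pickup, dropoff):
    """Trip duration in minutes from two datetime objects."""
    return (dropoff - pickup).total_seconds() / 60


def preprocess(trips):
    """Add duration_min to each trip and keep only plausible trips."""
    cleaned = []
    for trip in trips:
        duration = trip_duration_minutes(trip["pickup"], trip["dropoff"])
        if not (MIN_DURATION_MIN <= duration <= MAX_DURATION_MIN):
            continue
        if not (MIN_PASSENGERS <= trip["passenger_count"] <= MAX_PASSENGERS):
            continue
        cleaned.append({**trip, "duration_min": duration})
    return cleaned


trips = [
    # 15-minute trip with one passenger: kept
    {"pickup": datetime(2022, 1, 1, 8, 0), "dropoff": datetime(2022, 1, 1, 8, 15), "passenger_count": 1},
    # 180-minute trip: dropped as too long
    {"pickup": datetime(2022, 1, 1, 9, 0), "dropoff": datetime(2022, 1, 1, 12, 0), "passenger_count": 2},
    # zero passengers: dropped
    {"pickup": datetime(2022, 1, 1, 10, 0), "dropoff": datetime(2022, 1, 1, 10, 20), "passenger_count": 0},
]
print([t["duration_min"] for t in preprocess(trips)])  # [15.0]
```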

## 5.4 Generate Evidently Report

```python
def generate_report()
```
This function generates an Evidently report that provides insights into the performance of the model. It specifies the target variable, prediction variable, and the numerical and categorical features. The report checks for column drift, dataset drift, and missing values in the dataset.

In the report generated by the Evidently library, **data drift** measures how the distribution of the data changes over time. It helps identify situations where the model is operating on data that differs significantly from the data it was trained on, which can indicate the need for model retraining.
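
To give an intuition for what a drift score measures, the sketch below computes a two-sample Kolmogorov-Smirnov statistic (the largest gap between two empirical distribution functions) in plain Python. This is only one possible drift measure and is not a reproduction of how Evidently selects or computes its tests; the threshold is a hypothetical value chosen for the example.

```python
from bisect import bisect_right


def ks_statistic(reference, current):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the empirical CDFs."""
    ref_sorted, cur_sorted = sorted(reference), sorted(current)

    def ecdf(sorted_sample, x):
        # Fraction of the sample that is <= x
        return bisect_right(sorted_sample, x) / len(sorted_sample)

    points = set(ref_sorted) | set(cur_sorted)
    return max(abs(ecdf(ref_sorted, x) - ecdf(cur_sorted, x)) for x in points)


DRIFT_THRESHOLD = 0.3  # hypothetical cut-off, chosen only for this illustration

reference = [1, 2, 3, 4, 5]
candidates = [
    ("same", [1, 2, 3, 4, 5]),
    ("shifted", [3, 4, 5, 6, 7]),
    ("disjoint", [6, 7, 8, 9, 10]),
]
for name, current in candidates:
    score = ks_statistic(reference, current)
    print(name, round(score, 2), "drift" if score > DRIFT_THRESHOLD else "no drift")
```

A score of 0 means the two samples have identical empirical distributions, while a score of 1 means they do not overlap at all.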

## 5.5  Dummy monitoring

Now we will put everything together with an example using dummy data. The implementation can be found in ```taxi_monitoring/dummy_metrics_calculation.py```

### 5.5.1  Set up dummy data

- We will start by preparing the database and creating tables with dummy data 
- Next, we will insert timestamp values into the database table. 
- Finally, we will set up and access a PostgreSQL database for Grafana dashboard.

In the following, we will describe the content of ```dummy_metrics_calculation.py```

1. ```python
   def prep_db()
   ```
   This function ensures that a PostgreSQL database exists and creates it if necessary. It then establishes a connection to the database and creates a table with the specified columns.

2. ```python
   def calculate_dummy_metrics_postgresql()
   ```
   This function calculates dummy metrics and loads them into the table. It generates random values for three variables and inserts them into the table along with the current timestamp.

3. ```python
   def main()
   ```
   In the main function, we prepare the database and then run a loop to calculate and insert dummy metrics into the table. We also add a time delay between inserts to simulate real production usage (this also makes it easier to see how the data changes in Grafana).
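
The overall shape of such a script can be sketched as follows. To keep the example self-contained, it uses an in-memory SQLite database from the standard library as a stand-in for PostgreSQL; the table name, column types, helper name `insert_dummy_metrics`, and the shortened delay are assumptions for illustration and may differ from ```dummy_metrics_calculation.py```.

```python
import datetime
import random
import sqlite3
import time

SEND_TIMEOUT = 0.01  # seconds between inserts; shortened for this sketch

# Assumed schema: a timestamp plus three dummy values
CREATE_TABLE = """
CREATE TABLE IF NOT EXISTS dummy_metrics (
    timestamp TEXT,
    value1 INTEGER,
    value2 TEXT,
    value3 REAL
)
"""


def prep_db(conn):
    """Create the metrics table if it does not exist yet."""
    conn.execute(CREATE_TABLE)


def insert_dummy_metrics(conn, rng):
    """Insert one row of random values together with the current timestamp."""
    row = (
        datetime.datetime.now().isoformat(),
        rng.randint(0, 1000),
        str(rng.random()),
        rng.random(),
    )
    conn.execute("INSERT INTO dummy_metrics VALUES (?, ?, ?, ?)", row)


def main(iterations=5):
    conn = sqlite3.connect(":memory:")  # stand-in for the PostgreSQL connection
    prep_db(conn)
    rng = random.Random(42)
    for _ in range(iterations):
        insert_dummy_metrics(conn, rng)
        conn.commit()
        time.sleep(SEND_TIMEOUT)  # simulate data arriving over time
    return conn


conn = main()
print(conn.execute("SELECT COUNT(*) FROM dummy_metrics").fetchone()[0])  # 5
```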

### 5.5.2  Launch containers & dummy script

We can now run the script. But first we need to activate the containers:

```
$ docker compose up
```
and then run the script:

```
$ python dummy_metrics_calculation.py
```
After a few rows of data have been sent, we can open Adminer in the browser at ```localhost:8080```. We log in with the following parameters:
- System: PostgreSQL
- Server: db
- Username: postgres
- Password: example
- Database: test

The dummy_table looks like this:

![title](images/dummy.png)

and we can see that quite a bit of data has already been written:

![title](images/dummy_data.png)

### 5.5.3  Access dummy data from Grafana

Now, let's go to Grafana and see whether we are able to access those data from Grafana. If we configured our Grafana and PostgreSQL correctly, we should be able to create a new dashboard. 

To access Grafana through the browser (containers are running) -> ```localhost:3000```. 

- We need to create a new dashboard associated with PostgreSQL. 
- By default a visualization will be autogenerated (not related with our data)
- To visualize our data, we need to select our table name and the columns we are interested in. For example, for value1:

![title](images/grafana1.png)

We can create several visualizations as well (e.g., value1 and value3):

![title](images/grafana2.png)

## 5.6  Data Quality Monitoring - NYC taxi trip

### 5.6.1 Prepare script

Now we delve into a crucial aspect of MLOps: monitoring the performance of deployed machine learning models over time. The primary objective is to identify and log any changes in the model's performance, a process known as **drift detection**. Drift can occur when the statistical properties of the target variable, which the model aims to predict, alter unexpectedly over time, potentially leading to a decline in the model's performance.

To orchestrate the tasks involved in calculating and storing drift metrics, we employ the **Prefect** library. The **Evidently** library is utilized to compute three types of metrics:

- Column Drift
- Dataset Drift
- Missing Values

These metrics, computed for each day's data, are stored in a PostgreSQL database for subsequent analysis. Designed to operate as a batch job, the script processes a large volume of data at once, instead of handling each data point individually in real-time.

Included in the script ```taxi_monitoring/evidently_metrics_calculation.py``` we define several tasks that will be used in our Prefect pipeline:

1. ```python
   @task
   def prep_db():
   ```
   This task prepares the database, creating the necessary table if it doesn't already exist.

2. ```python
   @task
   def calculate_metrics_postgresql(curr, i):
   ```
   This task calculates the metrics for each day ```i``` and inserts them into the database.

3. ```python
   @flow
   def batch_monitoring_backfill():
   ```
   Finally, this flow orchestrates the entire process. It calls the ```prep_db()``` task to prepare the database, then loops over each day in the time period and calls the ```calculate_metrics_postgresql()``` task to calculate the metrics for that day and insert them into the database. It also ensures that metrics are sent to the database at intervals no shorter than ```SEND_TIMEOUT```.

In summary, with periodic execution (e.g., daily), the script enables continuous monitoring of the model's performance, providing early warning signs if performance is deteriorating due to drift. This information can trigger model retraining or other interventions to maintain performance.
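
The pacing logic of the backfill loop can be sketched in plain Python, without Prefect or a database. This is a simplified illustration under stated assumptions: ```SEND_TIMEOUT``` is taken to be a number of seconds, `compute_and_store` is a hypothetical callback standing in for the metrics task, and a simulated clock is injected so the demo does not actually sleep.

```python
import datetime
import time

SEND_TIMEOUT = 10  # assumed: minimum number of seconds between two sends


def backfill(start_day, n_days, compute_and_store, now=time.monotonic, sleep=time.sleep):
    """Process n_days consecutive days, keeping sends at least SEND_TIMEOUT apart."""
    last_send = None
    for i in range(n_days):
        if last_send is not None:
            elapsed = now() - last_send
            if elapsed < SEND_TIMEOUT:
                sleep(SEND_TIMEOUT - elapsed)
        compute_and_store(start_day + datetime.timedelta(days=i))
        last_send = now()


class FakeClock:
    """Simulated clock so the demo finishes instantly instead of really sleeping."""

    def __init__(self):
        self.t = 0.0

    def now(self):
        return self.t

    def sleep(self, seconds):
        self.t += seconds


clock = FakeClock()
sent = []
backfill(
    datetime.date(2022, 2, 1),
    3,
    lambda day: sent.append((day.isoformat(), clock.now())),
    now=clock.now,
    sleep=clock.sleep,
)
print(sent)  # [('2022-02-01', 0.0), ('2022-02-02', 10.0), ('2022-02-03', 20.0)]
```

Passing the clock and sleep functions as parameters is a design choice for testability; a real run would simply use the defaults.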

### 5.6.2 Run containers, Prefect and script

We can now 

1. run the containers:

```
$ docker compose up
```

2. start Prefect locally/cloud:
```
$ prefect server start
```
Once the Prefect server is up and running, we point our Prefect configuration at its API URL so that the workflow metadata is sent to the server UI. In a new CLI window, we run the following command:
```
$ prefect config set PREFECT_API_URL=http://127.0.0.1:4200/api
```

3. and run the script:

```
$ python evidently_metrics_calculation.py
```

Then we can access Prefect UI at ```localhost:4200```. In the figure below, we can observe the iteration has reached 10 days:

![title](images/monitoring-prefect.png)

and navigate to Adminer at ```localhost:8080``` to observe that the metrics table has been created:

![title](images/monitoring-adminer.png)

Finally, we demonstrate how to construct a dashboard with panels and metrics in Grafana, offering a visual representation of the model's performance and any detected drift.

To access Grafana through the browser (containers are running) -> ```localhost:3000```. In this case we want to monitor *prediction_drift*. We need to make sure that we zoom the data to the historical timestamps, as we are not recording the present time but February 2022:

![title](images/grafana-monitoring.png)

We can also add the other two metrics:

![title](images/grafana-monitoring2.png)

Next, we will discuss how to save the dashboard and ensure it can be loaded every time the Docker container is rerun.

## 5.7 Save Grafana Dashboard


### 5.7.1 Saving Grafana dashboard configurations
With Grafana, we don't need to recreate all the panels every time; we can load them from saved dashboards. Recall that in ```compose.yaml``` we used:

```yaml
grafana:
    image: grafana/grafana
    user: "472"
    ports:
      - "3000:3000"
    volumes:
      - ./config/grafana_datasources.yaml:/etc/grafana/provisioning/datasources/datasource.yaml:ro
      #- ./config/grafana_dashboards.yaml:/etc/grafana/provisioning/dashboards/dashboards.yaml:ro
      #- ./dashboards:/opt/grafana/dashboards
    networks:
      - back-tier
      - front-tier
    restart: always
```
where ```config/grafana_dashboards.yaml``` is a configuration file used to set up Grafana dashboards for monitoring machine learning models. The dashboards are stored in ```dashboards/``` and Grafana scans for changes at the specified interval. We can uncomment these two volumes so that the dashboard can be saved. The content of ```config/grafana_dashboards.yaml``` is as follows:

```yaml

apiVersion: 1

providers:
  # <string> an unique provider name. Required
  - name: 'Evidently Dashboards'
    # <int> Org id. Default to 1
    orgId: 1
    # <string> name of the dashboard folder.
    folder: ''
    # <string> folder UID. will be automatically generated if not specified
    folderUid: ''
    # <string> provider type. Default to 'file'
    type: file
    # <bool> disable dashboard deletion
    disableDeletion: false
    # <int> how often Grafana will scan for changed dashboards
    updateIntervalSeconds: 10
    # <bool> allow updating provisioned dashboards from the UI
    allowUiUpdates: false
    options:
      # <string, required> path to dashboard files on disk. Required when using the 'file' type
      path: /opt/grafana/dashboards
      # <bool> use folder names from filesystem to create folders in Grafana
      foldersFromFilesStructure: true

```
Make sure you have the Grafana dashboard configuration file ```dashboards/data_drift.json```. The content of this JSON file can be obtained from Grafana as follows:

![title](images/grafana-json.png)

### 5.7.2 Reloading Docker and accessing the saved dashboard in Grafana

We stop docker:

```
$ docker compose down
```

and then rerun the containers:

```
$ docker compose up
```

and the script:

```
$ python evidently_metrics_calculation.py
```

By accessing Grafana through the browser (containers are running) -> ```localhost:3000```, the visualizations are already loaded. If you access Grafana before the script is completed, the visualizations will show only the days calculated so far.

## 5.8 Debugging with test suites and reports

Now we will explore how to debug and monitor machine learning models using the Evidently library. 

- Debugging involves identifying, isolating, and fixing issues that may affect the performance of a machine learning model. 

- Monitoring, on the other hand, is the continuous observation of the model's performance over time to detect any significant changes or anomalies.

Evidently provides powerful tools for both debugging and monitoring. It offers Test Suites and Reports that help in understanding the model's behavior and performance.

**Test Suites** in Evidently allow us to run a series of tests on the data to check for various conditions, such as data drift. If any of the tests fail, it indicates a potential issue that needs to be debugged.

**Reports** in Evidently provide a comprehensive analysis of the data and the model. They calculate various metrics and provide visualizations that help in understanding the state of the data and the model. For instance, a Data Drift report can show how the features' distributions have changed over time, indicating drift.

By leveraging these tools, we can effectively debug and monitor our machine learning models, ensuring their robustness and reliability in production environments.
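
The idea behind a test suite (a set of named checks, each with a pass/fail outcome, plus an overall verdict) can be illustrated with a few lines of plain Python. This sketch does not use or mirror the Evidently API; the check names, thresholds, and data are hypothetical.

```python
def check_share_of_missing(values, max_share):
    """Pass if the fraction of missing (None) values is at most max_share."""
    share = sum(v is None for v in values) / len(values)
    return {"check": "share_of_missing", "value": share, "passed": share <= max_share}


def check_mean_in_range(values, low, high):
    """Pass if the mean of the non-missing values lies within [low, high]."""
    present = [v for v in values if v is not None]
    mean = sum(present) / len(present)
    return {"check": "mean_in_range", "value": mean, "passed": low <= mean <= high}


def run_suite(values):
    """Run all checks and report an overall verdict alongside the details."""
    results = [
        check_share_of_missing(values, max_share=0.1),
        check_mean_in_range(values, low=5.0, high=30.0),
    ]
    return {"passed": all(r["passed"] for r in results), "results": results}


# Hypothetical batch of trip durations with two missing entries
current_durations = [12.0, None, 18.0, None, 15.0]
summary = run_suite(current_durations)
for result in summary["results"]:
    print(result)
print("suite passed:", summary["passed"])  # suite passed: False
```

Here the mean check passes but the missing-value check fails, so the suite as a whole fails and points directly at the condition that needs debugging.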

The code is provided at ```taxi_monitoring/debugging_nyc_taxi_data.ipynb```