# Monitoring with Evidently

To install Evidently using the pip package manager, run:

```$ pip install evidently```


If you want to see reports inside a Jupyter notebook, you also need to install the Jupyter nbextension. After installing Evidently, run the following two commands in the terminal from the Evidently directory.

To install the Jupyter nbextension, run:

```$ jupyter nbextension install --sys-prefix --symlink --overwrite --py evidently```

To enable it, run:

```$ jupyter nbextension enable evidently --py --sys-prefix```

That's it!

In [None]:
try:
    import evidently
except ImportError:
    !pip install git+https://github.com/evidentlyai/evidently.git

In [1]:
import pandas as pd
import numpy as np

from sklearn.datasets import fetch_california_housing

from evidently import ColumnMapping

from evidently.report import Report
from evidently.metrics.base_metric import generate_column_metrics
from evidently.metric_preset import DataDriftPreset, TargetDriftPreset, DataQualityPreset, RegressionPreset
from evidently.metrics import *

from evidently.test_suite import TestSuite
from evidently.tests.base_test import generate_column_tests
from evidently.test_preset import DataStabilityTestPreset, NoTargetPerformanceTestPreset, RegressionTestPreset
from evidently.tests import *

In [2]:
import warnings
# Silence library warnings to keep the notebook output readable
warnings.filterwarnings('ignore')

## Load Data

In [3]:
data = fetch_california_housing(as_frame=True)
housing_data = data.frame

In [4]:
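# Use the median house value as the regression target
housing_data.rename(columns={'MedHouseVal': 'target'}, inplace=True)
# Simulate model predictions: target plus Gaussian noise (mean 0, standard deviation 5)
housing_data['prediction'] = housing_data['target'].values + np.random.normal(0, 5, housing_data.shape[0])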

In [5]:
reference = housing_data.sample(n=5000, replace=False, random_state=123)
current = housing_data.sample(n=5000, replace=False, random_state=321)

##### TODO: Print basic statistics of reference and current data
- mean
- min
- max
- number of elements
- and similar
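
One possible way to approach this exercise uses plain pandas. The helper below is a minimal sketch; `summarize` is a hypothetical name (not part of Evidently), and the small `demo` frame only stands in for the real data so the cell runs on its own.

```python
import pandas as pd


def summarize(df: pd.DataFrame) -> pd.DataFrame:
    """Return count, mean, std, min and max for every numeric column."""
    stats = df.describe().T  # one row per numeric column
    return stats[["count", "mean", "std", "min", "max"]]


# Tiny stand-in frame; in the notebook, pass `reference` or `current` instead.
demo = pd.DataFrame({"HouseAge": [10.0, 20.0, 30.0], "AveRooms": [4.0, 5.0, 6.0]})
demo_stats = summarize(demo)
print(demo_stats)
```

Calling `summarize(reference)` and `summarize(current)` side by side gives a first impression of how similar the two samples are before any drift check is run.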

## Report

Evidently Reports help explore and debug data and model quality. They calculate various metrics and generate a dashboard with rich visuals.

To start, you can use Metric Presets. These are pre-built Reports that group relevant metrics to evaluate a specific aspect of the model performance.

Let’s start with the Data Drift. This Preset compares the distributions of the model features and shows which have drifted. When you do not have ground truth labels or actuals, evaluating input data drift can help understand if an ML model still operates in a familiar environment.

The data drift report compares the distributions of each feature in the two datasets (reference vs current). It automatically picks an appropriate statistical test or metric based on the feature type and volume. It then returns p-values or distances and visually plots the distributions. You can also adjust the drift detection method or thresholds, or pass your own.
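
To make the idea of a "distance between two distributions" concrete, here is a minimal pure-Python sketch of the two-sample Kolmogorov-Smirnov statistic, one of several measures a drift check could rely on. It illustrates the concept only and is not Evidently's implementation; `ks_statistic` and the sample values are made up for this example.

```python
from bisect import bisect_right


def ks_statistic(reference, current):
    """Largest gap between the empirical CDFs of two numeric samples."""
    ref = sorted(reference)
    cur = sorted(current)
    largest_gap = 0.0
    # The gap between two step functions can only change at observed values,
    # so it is enough to evaluate both CDFs at every data point.
    for x in ref + cur:
        cdf_ref = bisect_right(ref, x) / len(ref)
        cdf_cur = bisect_right(cur, x) / len(cur)
        largest_gap = max(largest_gap, abs(cdf_ref - cdf_cur))
    return largest_gap


same = ks_statistic([1, 2, 3], [1, 2, 3])            # identical samples
shifted = ks_statistic([1, 2, 3, 4], [3, 4, 5, 6])   # partially overlapping
disjoint = ks_statistic([1, 2, 3], [4, 5, 6])        # no overlap at all
print(same, shifted, disjoint)
```

A value near 0 means the two samples look alike, while a value near 1 means they barely overlap. A drift check then compares such a value (or a p-value derived from it) against a threshold.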

##### TODO: Create a preset report
- Create a Report object with a DataDrift preset included
- Use the reference and current datasets created in previous steps
- Experiment with changing the statistical test in the report to something else.
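
A possible starting point for this exercise is sketched below. It assumes the legacy `evidently.report` API that the import cell of this notebook uses; `build_drift_report` is a hypothetical helper name, and the `stattest` value is passed straight through to `DataDriftPreset`.

```python
def build_drift_report(reference, current, stattest=None):
    """Run a Data Drift preset report on two pandas DataFrames.

    stattest=None keeps the library's automatic choice; a name such as
    "psi" asks for that method instead (assumed to be a supported name).
    """
    # Imported here so the helper is self-contained; the notebook's import
    # cell already provides both names.
    from evidently.metric_preset import DataDriftPreset
    from evidently.report import Report

    report = Report(metrics=[DataDriftPreset(stattest=stattest)])
    report.run(reference_data=reference, current_data=current)
    return report
```

In the notebook, `build_drift_report(reference, current).show()` should render the dashboard, and `build_drift_report(reference, current, stattest="psi")` is one way to experiment with a different method.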

Evidently Reports are very configurable. You can define which Metrics to include and how to calculate them.

To create a custom Report, you need to list individual Metrics. Evidently has dozens of Metrics that evaluate anything from descriptive feature statistics to model quality. You can calculate Metrics on the column level (e.g., mean value of a specific column) or dataset-level (e.g., share of drifted features in the dataset).

##### TODO: Create a custom report
- Display a summary metric for the column `AveRooms`
- Display a quantile metric for the 0.25 quantile for the column `Latitude` and `Longitude`
- Display the drift metric for the column `HouseAge`
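
The three bullets above could be combined into a single custom Report roughly as follows. This is a sketch under the assumption that the legacy metric classes imported at the top of the notebook are available; `build_custom_report` is a hypothetical helper name.

```python
def build_custom_report(reference, current):
    """Run a Report with hand-picked column-level metrics."""
    # Imported here so the helper is self-contained; the notebook's import
    # cell already provides these names.
    from evidently.metrics import (
        ColumnDriftMetric,
        ColumnQuantileMetric,
        ColumnSummaryMetric,
    )
    from evidently.metrics.base_metric import generate_column_metrics
    from evidently.report import Report

    report = Report(metrics=[
        # Descriptive summary of a single column
        ColumnSummaryMetric(column_name="AveRooms"),
        # Same metric applied to several columns at once
        generate_column_metrics(
            ColumnQuantileMetric,
            parameters={"quantile": 0.25},
            columns=["Latitude", "Longitude"],
        ),
        # Drift check for a single column
        ColumnDriftMetric(column_name="HouseAge"),
    ])
    report.run(reference_data=reference, current_data=current)
    return report
```

Using `generate_column_metrics` avoids repeating the quantile metric once per column; listing two `ColumnQuantileMetric` instances explicitly would be an equivalent alternative.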

##### TODO: Saving the report
- Save the output of the report as html
- Get the output of the report as python dict
- Get the output of the report as JSON
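
The three output formats can be collected in one small helper. The sketch below assumes it receives a Report that has already been run and that the object offers `save_html`, `as_dict`, and `json`; `export_report` and the default file name are hypothetical.

```python
def export_report(report, html_path="report.html"):
    """Save a report as HTML and return its dict and JSON representations."""
    report.save_html(html_path)   # standalone HTML file for sharing
    as_dict = report.as_dict()    # Python dict, handy for further processing
    as_json = report.json()       # JSON string, handy for logging or storage
    return as_dict, as_json
```

The HTML file is meant for people, while the dict and JSON outputs are the ones a pipeline would typically consume.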

## Test Suite

Reports help visually explore the data or model quality or share results with the team. However, they are less convenient if you want to run your checks automatically and only react to meaningful issues.

To integrate Evidently checks in the prediction pipeline, you can use the Test Suites functionality. They are also better suited to handle large datasets.

Test Suites help compare the two datasets in a structured way. A Test Suite contains several individual tests. Each Test compares a specific value against a defined condition and returns an explicit pass/fail result. You can apply Tests to the whole dataset or individual columns.

Just like with Reports, you can create a **custom Test Suite** or use one of the **Presets**.

How does it work? 
Evidently automatically generates the test conditions based on the provided reference dataset. They are based on heuristics. For example, the test for column types fails if the column types do not match the reference. The test for the number of columns with missing values fails if the number is higher than in reference. The test for the share of drifting features fails if over 50% are drifting. You can easily pass custom conditions to set your own expectations.
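
The two heuristics mentioned above can be mimicked in a few lines of plain Python. This is only an illustration of how a condition can be derived from reference data, not Evidently's code; all function names and the toy rows are invented for the example.

```python
def count_columns_with_missing(rows):
    """Count columns holding at least one None in a list of dict rows."""
    columns = {name for row in rows for name in row}
    return sum(any(row.get(name) is None for row in rows) for name in columns)


def check_missing_columns(reference_rows, current_rows):
    """Fail when current has more columns with missing values than reference."""
    limit = count_columns_with_missing(reference_rows)  # condition from reference
    value = count_columns_with_missing(current_rows)
    status = "SUCCESS" if value <= limit else "FAIL"
    return {"value": value, "condition": f"<= {limit}", "status": status}


def check_drift_share(drifted_flags, max_share=0.5):
    """Fail when more than max_share of the columns are flagged as drifted."""
    share = sum(drifted_flags) / len(drifted_flags)
    status = "FAIL" if share > max_share else "SUCCESS"
    return {"value": share, "condition": f"<= {max_share}", "status": status}


reference_rows = [{"a": 1, "b": None}, {"a": 2, "b": 3}]
current_rows = [{"a": None, "b": None}, {"a": 2, "b": 3}]
missing_result = check_missing_columns(reference_rows, current_rows)
drift_ok = check_drift_share([True, False, False, False])
drift_bad = check_drift_share([True, True, True, False])
print(missing_result, drift_ok, drift_bad)
```

Each check returns an explicit status together with the value and the condition it was compared against, which is the same shape of result a Test produces.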

##### TODO: Create a custom TestSuite
- Test for missing values in columns
- Test for rows with missing values
- Test for constant columns
- Test for duplicate rows
- Test for duplicate columns
- Test column types
- Test for drifted columns
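
One way the checklist above might translate into a custom Test Suite is sketched below. It assumes the legacy `evidently.test_suite` and `evidently.tests` modules imported at the top of the notebook; `build_data_quality_suite` is a hypothetical helper name, and every test uses its default, reference-derived condition.

```python
def build_data_quality_suite(reference, current):
    """Run a custom Test Suite covering the checks listed above."""
    # Imported here so the helper is self-contained; the notebook's import
    # cell already provides these names.
    from evidently.test_suite import TestSuite
    from evidently.tests import (
        TestColumnsType,
        TestNumberOfColumnsWithMissingValues,
        TestNumberOfConstantColumns,
        TestNumberOfDriftedColumns,
        TestNumberOfDuplicatedColumns,
        TestNumberOfDuplicatedRows,
        TestNumberOfRowsWithMissingValues,
    )

    suite = TestSuite(tests=[
        TestNumberOfColumnsWithMissingValues(),
        TestNumberOfRowsWithMissingValues(),
        TestNumberOfConstantColumns(),
        TestNumberOfDuplicatedRows(),
        TestNumberOfDuplicatedColumns(),
        TestColumnsType(),
        TestNumberOfDriftedColumns(),
    ])
    suite.run(reference_data=reference, current_data=current)
    return suite
```

Calling `build_data_quality_suite(reference, current)` returns the suite after it has run, so it can be displayed in the notebook or exported.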

##### TODO: Display the performed tests
- as dict
- as JSON
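
Once the results are available as a dict or a JSON string, a pipeline can react to failures only. The helper below is a sketch that assumes the output contains a `tests` list whose entries carry `name` and `status` keys, with `"FAIL"` marking a failed test; both the helper name and that structure are assumptions worth verifying against your installed version.

```python
import json


def failed_test_names(result):
    """Return the names of failed tests from as_dict() or json() output."""
    if isinstance(result, str):
        result = json.loads(result)  # accept the JSON string as well
    return [
        test["name"]
        for test in result.get("tests", [])
        if test.get("status") == "FAIL"
    ]


# Hand-written stand-in for a real result, so the sketch runs on its own.
example = {
    "tests": [
        {"name": "Column Types", "status": "SUCCESS"},
        {"name": "Number of Drifted Features", "status": "FAIL"},
    ]
}
print(failed_test_names(example))
print(failed_test_names(json.dumps(example)))
```

With a suite that has already been run (called `suite` here for illustration), `failed_test_names(suite.as_dict())` and `failed_test_names(suite.json())` should give the same list.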