# Getting started tutorial

To install Evidently using the pip package manager, run:

```bash
pip install evidently
```

To display reports inside a Jupyter notebook, you also need to install the Jupyter nbextension. After installing Evidently, run the following two commands in the terminal from the Evidently directory.

To install the Jupyter nbextension, run:
```bash
jupyter nbextension install --sys-prefix --symlink --overwrite --py evidently
```

To enable it, run:

```bash
jupyter nbextension enable evidently --py --sys-prefix
```
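
Before opening a notebook, it can be useful to confirm that the package is visible from the current environment. The check below is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and not part of Evidently.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("evidently")
if version is None:
    print("evidently is not installed; run `pip install evidently` first")
else:
    print(f"evidently {version} is installed")
```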

In [1]:
#!pip install evidently
!jupyter nbextension install --sys-prefix --symlink --overwrite --py evidently
!jupyter nbextension enable evidently --py --sys-prefix

Installing /Users/ngohongthai/miniconda3/envs/mlops/lib/python3.9/site-packages/evidently/nbextension/static -> evidently
Removing: /Users/ngohongthai/miniconda3/envs/mlops/share/jupyter/nbextensions/evidently
Symlinking: /Users/ngohongthai/miniconda3/envs/mlops/share/jupyter/nbextensions/evidently -> /Users/ngohongthai/miniconda3/envs/mlops/lib/python3.9/site-packages/evidently/nbextension/static
- Validating: OK

    To initialize this nbextension in the browser every time the notebook (or other app) loads:
    
          jupyter nbextension enable evidently --py --sys-prefix
    
Enabling notebook extension evidently/extension...
      - Validating: OK


In [2]:
import pandas as pd
import numpy as np

from sklearn.datasets import fetch_california_housing

from evidently import ColumnMapping

from evidently.report import Report
from evidently.metrics.base_metric import generate_column_metrics
from evidently.metric_preset import DataDriftPreset, TargetDriftPreset, DataQualityPreset, RegressionPreset
from evidently.metrics import *

from evidently.test_suite import TestSuite
from evidently.tests.base_test import generate_column_tests
from evidently.test_preset import DataStabilityTestPreset, NoTargetPerformanceTestPreset
from evidently.tests import *



In [3]:
import warnings

# Silence library warnings to keep the notebook output readable
warnings.filterwarnings('ignore')

## Load data

In [4]:
data = fetch_california_housing(as_frame=True)
housing_data = data.frame

In [5]:
housing_data.head()

| | MedInc | HouseAge | AveRooms | AveBedrms | Population | AveOccup | Latitude | Longitude | MedHouseVal |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 8.3252 | 41.0 | 6.984127 | 1.02381 | 322.0 | 2.555556 | 37.88 | -122.23 | 4.526 |
| 1 | 8.3014 | 21.0 | 6.238137 | 0.97188 | 2401.0 | 2.109842 | 37.86 | -122.22 | 3.585 |
| 2 | 7.2574 | 52.0 | 8.288136 | 1.073446 | 496.0 | 2.80226 | 37.85 | -122.24 | 3.521 |
| 3 | 5.6431 | 52.0 | 5.817352 | 1.073059 | 558.0 | 2.547945 | 37.85 | -122.25 | 3.413 |
| 4 | 3.8462 | 52.0 | 6.281853 | 1.081081 | 565.0 | 2.181467 | 37.85 | -122.25 | 3.422 |


In [6]:
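# Rename the label column to 'target'
housing_data.rename(columns={'MedHouseVal': 'target'}, inplace=True)
# Simulate model predictions: target plus Gaussian noise (mean 0, std 5)
housing_data['prediction'] = housing_data['target'].values + np.random.normal(0, 5, housing_data.shape[0])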

In [7]:
housing_data.head()

| | MedInc | HouseAge | AveRooms | AveBedrms | Population | AveOccup | Latitude | Longitude | target | prediction |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 8.3252 | 41.0 | 6.984127 | 1.02381 | 322.0 | 2.555556 | 37.88 | -122.23 | 4.526 | -1.819465 |
| 1 | 8.3014 | 21.0 | 6.238137 | 0.97188 | 2401.0 | 2.109842 | 37.86 | -122.22 | 3.585 | 1.793194 |
| 2 | 7.2574 | 52.0 | 8.288136 | 1.073446 | 496.0 | 2.80226 | 37.85 | -122.24 | 3.521 | 11.684745 |
| 3 | 5.6431 | 52.0 | 5.817352 | 1.073059 | 558.0 | 2.547945 | 37.85 | -122.25 | 3.413 | 9.835884 |
| 4 | 3.8462 | 52.0 | 6.281853 | 1.081081 | 565.0 | 2.181467 | 37.85 | -122.25 | 3.422 | 4.404241 |


In [8]:
# Draw two random 5,000-row samples to act as the reference and current datasets
reference = housing_data.sample(n=5000, replace=False)
current = housing_data.sample(n=5000, replace=False)
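
Because the two `sample` calls above are independent and unseeded, the two sets can share rows and will change on every run. If a reproducible, non-overlapping split is preferred, one option is sketched below. It uses a small synthetic frame as a stand-in for `housing_data`; the seed, the sizes, and the `_demo` variable names are arbitrary assumptions.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for housing_data, so the sketch runs on its own.
rng = np.random.default_rng(42)
frame = pd.DataFrame({
    "feature": rng.normal(size=1_000),
    "target": rng.normal(size=1_000),
})

# Shuffle once with a fixed seed, then slice two disjoint parts.
shuffled = frame.sample(frac=1.0, random_state=42)
reference_demo = shuffled.iloc[:400]
current_demo = shuffled.iloc[400:800]

# Count the rows that appear in both parts.
overlap = len(reference_demo.index.intersection(current_demo.index))
print(f"rows shared between the two parts: {overlap}")
```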

## Report

In [9]:
report = Report(metrics=[
    DataDriftPreset(), 
])

report.run(reference_data=reference, current_data=current)
report

## Customize the report

In [10]:
report = Report(metrics=[
    ColumnSummaryMetric(column_name='AveRooms'),
    ColumnQuantileMetric(column_name='AveRooms', quantile=0.25),
    ColumnDriftMetric(column_name='AveRooms')
])

report.run(reference_data=reference, current_data=current)
report

To generate the same Metric for several columns, use the `generate_column_metrics` helper. For example, to calculate the same quantile for every column in a list:

In [11]:
report = Report(metrics=[
    generate_column_metrics(ColumnQuantileMetric, parameters={'quantile':0.25}, columns=['AveRooms', 'AveBedrms']),
])

report.run(reference_data=reference, current_data=current)
report
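
Conceptually, the generator stands for one metric per listed column, all sharing the same parameters. The sketch below illustrates that expansion with a hypothetical stand-in class (`QuantileSpec`) and helper (`expand_columns`) rather than Evidently's own objects, so it only models the idea and is not the library's implementation.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class QuantileSpec:
    """Hypothetical stand-in for a column-level metric such as ColumnQuantileMetric."""
    column_name: str
    quantile: float


def expand_columns(columns: List[str], parameters: Dict[str, Any]) -> List[QuantileSpec]:
    """Build one spec per column, passing the shared parameters to each."""
    return [QuantileSpec(column_name=name, **parameters) for name in columns]


specs = expand_columns(["AveRooms", "AveBedrms"], {"quantile": 0.25})
for spec in specs:
    print(spec)
```

With the real library, the hand-written equivalent would be to list `ColumnQuantileMetric(column_name=..., quantile=0.25)` once per column.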

You can combine individual Metrics, Presets, and metric generators in a single list. Passing `columns='num'` applies the generator to all numerical columns:

In [12]:
report = Report(metrics=[
    ColumnSummaryMetric(column_name='AveRooms'),
    generate_column_metrics(ColumnQuantileMetric, parameters={'quantile':0.25}, columns='num'),
    DataDriftPreset()
])

report.run(reference_data=reference, current_data=current)
report

## Define the report output format

In [13]:
report.as_dict()

{'metrics': [{'metric': 'ColumnSummaryMetric',
   'result': {'column_name': 'AveRooms',
    'column_type': 'num',
    'reference_characteristics': {'number_of_rows': 5000,
     'count': 5000,
     'missing': 0,
     'missing_percentage': 0.0,
     'mean': 5.47,
     'std': 2.94,
     'min': 1.63,
     'p25': 4.44,
     'p50': 5.25,
     'p75': 6.06,
     'max': 132.53,
     'unique': 4883,
     'unique_percentage': 97.66,
     'infinite_count': 0,
     'infinite_percentage': 0.0,
     'most_common': 4.0,
     'most_common_percentage': 0.18},
    'current_characteristics': {'number_of_rows': 5000,
     'count': 5000,
     'missing': 0,
     'missing_percentage': 0.0,
     'mean': 5.38,
     'std': 2.06,
     'min': 1.41,
     'p25': 4.43,
     'p50': 5.19,
     'p75': 6.03,
     'max': 61.81,
     'unique': 4888,
     'unique_percentage': 97.76,
     'infinite_count': 0,
     'infinite_percentage': 0.0,
     'most_common': 5.0,
     'most_common_percentage': 0.12}}},
  {'metric': 'Colum

In [14]:
report.json()

'{"version": "0.3.3", "timestamp": "2023-06-27 00:50:07.872385", "metrics": [{"metric": "ColumnSummaryMetric", "result": {"column_name": "AveRooms", "column_type": "num", "reference_characteristics": {"number_of_rows": 5000, "count": 5000, "missing": 0, "missing_percentage": 0.0, "mean": 5.47, "std": 2.94, "min": 1.63, "p25": 4.44, "p50": 5.25, "p75": 6.06, "max": 132.53, "unique": 4883, "unique_percentage": 97.66, "infinite_count": 0, "infinite_percentage": 0.0, "most_common": 4.0, "most_common_percentage": 0.18}, "current_characteristics": {"number_of_rows": 5000, "count": 5000, "missing": 0, "missing_percentage": 0.0, "mean": 5.38, "std": 2.06, "min": 1.41, "p25": 4.43, "p50": 5.19, "p75": 6.03, "max": 61.81, "unique": 4888, "unique_percentage": 97.76, "infinite_count": 0, "infinite_percentage": 0.0, "most_common": 5.0, "most_common_percentage": 0.12}}}, {"metric": "ColumnQuantileMetric", "result": {"column_name": "AveBedrms", "column_type": "num", "quantile": 0.25, "current": {"val
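
Since `report.json()` returns a string, it can be parsed with the standard `json` module to pull out individual values. The sketch below works on a hand-trimmed payload copied from the output above (only a few fields are kept), so the key names come from that output and may differ in other Evidently versions.

```python
import json

# Trimmed copy of the JSON shown above; in the notebook this would be report.json().
payload = (
    '{"version": "0.3.3", "metrics": [{"metric": "ColumnSummaryMetric", '
    '"result": {"column_name": "AveRooms", "column_type": "num", '
    '"reference_characteristics": {"mean": 5.47, "std": 2.94}, '
    '"current_characteristics": {"mean": 5.38, "std": 2.06}}}]}'
)

parsed = json.loads(payload)

# Index metric results by metric name for easier lookup.
# If a metric appears several times, the last one wins in this simple sketch.
by_name = {entry["metric"]: entry["result"] for entry in parsed["metrics"]}

summary = by_name["ColumnSummaryMetric"]
mean_shift = (
    summary["current_characteristics"]["mean"]
    - summary["reference_characteristics"]["mean"]
)
print(f"{summary['column_name']} mean shift: {mean_shift:+.2f}")
```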

In [15]:
# report.save_html("file.html")

## Run data stability tests

In [16]:
tests = TestSuite(tests=[
    TestNumberOfColumnsWithMissingValues(),
    TestNumberOfRowsWithMissingValues(),
    TestNumberOfConstantColumns(),
    TestNumberOfDuplicatedRows(),
    TestNumberOfDuplicatedColumns(),
    TestColumnsType(),
    TestNumberOfDriftedColumns(),
])

tests.run(reference_data=reference, current_data=current)
tests

In [17]:
suite = TestSuite(tests=[
    NoTargetPerformanceTestPreset(),
])

suite.run(reference_data=reference, current_data=current)
suite

Just like with Reports, you can combine individual Tests and Presets in a single Test Suite, and use the column test generator to create multiple column-level Tests:

In [18]:
suite = TestSuite(tests=[
    TestColumnDrift('Population'),
    TestShareOfOutRangeValues('Population'),
    generate_column_tests(TestMeanInNSigmas, columns='num'),
])

suite.run(reference_data=reference, current_data=current)
suite
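
When a Test Suite runs outside a notebook, for example in a scheduled job, it can be handy to reduce the result to a list of failing tests. The sketch below operates on a hand-written dictionary rather than real output: the `tests`, `name`, and `status` keys are an assumption about what `suite.as_dict()` returns and should be checked against the installed version. The helper `failed_tests` and the test names are hypothetical.

```python
from typing import Any, Dict, List

# Illustrative, hand-written stand-in for suite.as_dict(); NOT real output.
# The "tests" / "name" / "status" keys are an assumption about the result layout.
example_result: Dict[str, Any] = {
    "tests": [
        {"name": "example test A", "status": "SUCCESS"},
        {"name": "example test B", "status": "FAIL"},
    ]
}


def failed_tests(result: Dict[str, Any]) -> List[str]:
    """Return the names of all tests whose status is not SUCCESS."""
    return [
        test["name"]
        for test in result.get("tests", [])
        if test.get("status") != "SUCCESS"
    ]


failures = failed_tests(example_result)
if failures:
    print("Failed tests:", ", ".join(failures))
else:
    print("All tests passed")
```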