![alt text](assets/evidently_ai_logo_fi.png "Title")

For docs and more: https://evidentlyai.com

### Prepare the environment

In [None]:
%conda install -c conda-forge evidently

In [None]:
# Enable interactive reports inside Jupyter Notebook
!jupyter nbextension install --sys-prefix --symlink --overwrite --py evidently
!jupyter nbextension enable evidently --py --sys-prefix

### Import necessary libraries

In [None]:
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

In [None]:
# For interactive and HTML reports
from evidently.dashboard import Dashboard
from evidently.tabs import DataDriftTab, NumTargetDriftTab, CatTargetDriftTab, ClassificationPerformanceTab

In [None]:
# To generate JSON profiles
from evidently.model_profile import Profile
from evidently.profile_sections import DataDriftProfileSection

## Customer churn 

- Churn rate: the percentage of customers (or employees) who leave a company or service over a given period
- Predicting churn helps optimize customer retention
- Churn can often be prevented by identifying users at risk early
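As a quick worked example of the churn-rate definition above, here is a minimal sketch. The helper `churn_rate` and the customer counts are hypothetical, not taken from the dataset:

```python
def churn_rate(customers_at_start: int, customers_lost: int) -> float:
    """Fraction of customers who left during a period (hypothetical helper)."""
    if customers_at_start <= 0:
        raise ValueError("customers_at_start must be positive")
    if not 0 <= customers_lost <= customers_at_start:
        raise ValueError("customers_lost must be between 0 and customers_at_start")
    return customers_lost / customers_at_start


# Hypothetical numbers: 250 of 2000 customers left during the period
rate = churn_rate(customers_at_start=2000, customers_lost=250)
print(f"Churn rate: {rate:.1%}")  # Churn rate: 12.5%
```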

### Data: 
- Customer churn from a US mobile provider
- Size: 5000 records, 20 customer attributes (features)
- Task: binary classification (churn vs. no churn)

### Load and preprocess the data

In [None]:
raw_data = pd.read_csv('data/churn.txt')

In [None]:
raw_data.columns

In [None]:
raw_data.head()

In [None]:
drop_c = ['State', 'Account Length', 'Area Code', 'Phone', "Int'l Plan", 'VMail Plan', 'VMail Message', 'CustServ Calls']
raw_data = raw_data.drop(drop_c, axis=1)

In [None]:
raw_data['churn'] = [0 if churn == 'False.' else 1 for churn in raw_data['Churn?'].values]
raw_data = raw_data.drop(['Churn?'], axis=1)
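The list comprehension above maps every value other than `'False.'` to 1, so a stray or misspelled label would silently become churn. A minimal vectorized alternative using pandas `Series.map`, shown on a toy DataFrame (`toy` and `label_map` are illustrative names), turns unknown labels into NaN so they can be caught:

```python
import pandas as pd

# Toy stand-in for raw_data; the churn file stores labels as 'True.' / 'False.'
toy = pd.DataFrame({"Churn?": ["False.", "True.", "False.", "True."]})

# Map both known labels explicitly; anything else becomes NaN
label_map = {"False.": 0, "True.": 1}
toy["churn"] = toy["Churn?"].map(label_map)

# Fail loudly on unexpected label values instead of counting them as churn
assert toy["churn"].notna().all(), "unexpected label value in 'Churn?'"
toy = toy.drop(columns=["Churn?"])
print(toy["churn"].tolist())  # [0, 1, 0, 1]
```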

In [None]:
feature_c = ['Day Mins', 'Day Calls', 'Day Charge', 'Eve Mins', 'Eve Calls', 'Eve Charge', 'Night Mins', 'Night Calls', 'Night Charge', 'Intl Mins', 'Intl Calls', 'Intl Charge']
target_c = 'churn'
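Before training, it can help to confirm that every selected feature exists and is numeric. A minimal sketch with a hypothetical helper `check_numeric_columns`, demonstrated on a toy DataFrame with made-up values rather than the real churn data:

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def check_numeric_columns(df: pd.DataFrame, columns: list) -> list:
    """Return a list of problems for the requested columns (empty if all is well)."""
    problems = []
    for col in columns:
        if col not in df.columns:
            problems.append(f"missing column: {col}")
        elif not is_numeric_dtype(df[col]):
            problems.append(f"non-numeric column: {col}")
    return problems


# Made-up values; 'Eve Mins' is deliberately missing and 'State' is text
toy = pd.DataFrame({"Day Mins": [100.0, 200.5], "Day Calls": [10, 20], "State": ["A", "B"]})
print(check_numeric_columns(toy, ["Day Mins", "Day Calls", "Eve Mins", "State"]))
# ['missing column: Eve Mins', 'non-numeric column: State']
```

In the notebook, `check_numeric_columns(raw_data, feature_c)` should return an empty list.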

In [None]:
raw_data.head()

## Split the dataset into train and test

<b>Evidently</b> compares <b>reference</b> and <b>production</b> data!

In this example, the <b>train</b> data serves as our <b>reference</b> dataset and the <b>test</b> data as our <b>production</b> dataset.

In [None]:
train, test = train_test_split(raw_data, test_size=0.20, random_state=1)
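`train_test_split` shuffles rows before splitting, so here reference and production come from the same distribution. To make the mechanics explicit, a minimal sketch of an equivalent 80/20 random split with NumPy; the helper `random_split` is hypothetical and will not reproduce the exact rows scikit-learn picks for `random_state=1`:

```python
import numpy as np
import pandas as pd


def random_split(df: pd.DataFrame, test_size: float = 0.2, seed: int = 1):
    """Shuffle row positions, then cut them into reference and current parts."""
    rng = np.random.default_rng(seed)
    positions = rng.permutation(len(df))
    n_test = int(np.ceil(test_size * len(df)))  # round the test share up
    current = df.iloc[positions[:n_test]].copy()
    reference = df.iloc[positions[n_test:]].copy()
    return reference, current


df = pd.DataFrame({"x": range(5000)})
reference, current = random_split(df)
print(len(reference), len(current))  # 4000 1000
```

Returning `.copy()` slices means columns added later (as the notebook does with `prediction`) go into independent DataFrames.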

In [None]:
clf = RandomForestClassifier(
    max_depth=5,
    random_state=0
)

In [None]:
clf.fit(train[feature_c], train[target_c])

In [None]:
clf.predict(test[feature_c].iloc[:10])
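In scikit-learn, `RandomForestClassifier.predict` returns the class with the highest probability averaged over all trees (soft voting), rather than a simple majority of hard votes. A small NumPy sketch of that rule with made-up per-tree probabilities:

```python
import numpy as np

# Made-up per-tree probabilities of class 1 (churn):
# 3 samples (rows) scored by 4 trees (columns)
tree_p1 = np.array([
    [0.9, 0.6, 0.4, 0.7],
    [0.2, 0.1, 0.6, 0.3],
    [0.5, 0.5, 0.4, 0.7],
])

p1 = tree_p1.mean(axis=1)              # forest probability of churn per sample
proba = np.column_stack([1 - p1, p1])  # columns: class 0, class 1 (like predict_proba)
pred = proba.argmax(axis=1)            # most probable class is the hard prediction

print(pred)  # [1 0 1]
```

With the trained model, `clf.predict_proba(test[feature_c])` returns the averaged probabilities directly.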

## Evidently reports

The following report tabs are available:

- <b>DataDriftTab</b> to estimate data drift
- <b>NumTargetDriftTab</b> to estimate target drift for a numerical target
- <b>CatTargetDriftTab</b> to estimate target drift for a categorical target
- <b>RegressionPerformanceTab</b> to explore the performance of a regression model
- <b>ClassificationPerformanceTab</b> to explore the performance of a classification model
- <b>ProbClassificationPerformanceTab</b> to explore the performance of a probabilistic classification model
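Choosing tabs boils down to two questions: what kind of model you have and whether the target is numerical or categorical. A minimal sketch with a hypothetical helper `suggest_tabs` that returns tab names as plain strings; using them still requires importing the classes from `evidently.tabs` as in the imports cell:

```python
def suggest_tabs(task: str, target_is_numerical: bool) -> list:
    """Map a modelling task to the tab names listed above (hypothetical helper)."""
    performance = {
        "regression": "RegressionPerformanceTab",
        "classification": "ClassificationPerformanceTab",
        "probabilistic_classification": "ProbClassificationPerformanceTab",
    }
    if task not in performance:
        raise ValueError(f"unknown task: {task!r}")
    drift = "NumTargetDriftTab" if target_is_numerical else "CatTargetDriftTab"
    return ["DataDriftTab", drift, performance[task]]


# Churn is stored as 0/1 integers but is a class label, so treat it as categorical
print(suggest_tabs("classification", target_is_numerical=False))
# ['DataDriftTab', 'CatTargetDriftTab', 'ClassificationPerformanceTab']
```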

In [None]:
column_mapping = {}
column_mapping['target'] = target_c  # 'churn' column
column_mapping['numerical_features'] = feature_c  # feature columns
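A column name that is misspelled in the mapping or missing from one dataset is an easy mistake to make. A minimal sketch with a hypothetical helper `missing_mapped_columns` that checks mapped names against both DataFrames, shown on toy frames with made-up values:

```python
import pandas as pd


def missing_mapped_columns(mapping: dict, *frames: pd.DataFrame) -> set:
    """Return mapped column names absent from at least one DataFrame."""
    wanted = set()
    for value in mapping.values():
        # A mapping value is either one column name or a list of column names
        wanted.update([value] if isinstance(value, str) else value)
    missing = set()
    for frame in frames:
        missing |= wanted - set(frame.columns)
    return missing


# Toy frames; 'Eve Mins' is deliberately absent from the current data
mapping = {"target": "churn", "numerical_features": ["Day Mins", "Eve Mins"]}
reference = pd.DataFrame({"churn": [0, 1], "Day Mins": [1.0, 2.0], "Eve Mins": [3.0, 4.0]})
current = pd.DataFrame({"churn": [1, 0], "Day Mins": [5.0, 6.0]})
print(missing_mapped_columns(mapping, reference, current))  # {'Eve Mins'}
```

In the notebook, `missing_mapped_columns(column_mapping, train, test)` should return an empty set.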

### Data drift

Detects changes in feature distributions.
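One common way to quantify a change in a numerical feature's distribution is the two-sample Kolmogorov-Smirnov (KS) statistic: the largest vertical gap between the reference and current empirical CDFs. The from-scratch sketch below is illustrative only, not Evidently's implementation, and uses synthetic data with one feature deliberately shifted:

```python
import numpy as np
import pandas as pd


def ks_statistic(reference, current) -> float:
    """Two-sample KS statistic: maximum gap between the two empirical CDFs."""
    ref = np.sort(np.asarray(reference, dtype=float))
    cur = np.sort(np.asarray(current, dtype=float))
    grid = np.concatenate([ref, cur])  # the ECDF gap only changes at observed values
    cdf_ref = np.searchsorted(ref, grid, side="right") / len(ref)
    cdf_cur = np.searchsorted(cur, grid, side="right") / len(cur)
    return float(np.max(np.abs(cdf_ref - cdf_cur)))


rng = np.random.default_rng(0)
reference = pd.DataFrame({"stable": rng.normal(0, 1, 1000), "shifted": rng.normal(0, 1, 1000)})
current = pd.DataFrame({"stable": rng.normal(0, 1, 1000), "shifted": rng.normal(1, 1, 1000)})

# One statistic per feature; larger values mean the distributions differ more
for column in reference.columns:
    print(column, round(ks_statistic(reference[column], current[column]), 3))
```

In practice you would convert the statistic into a p-value, for example with `scipy.stats.ks_2samp`, and flag drift below a chosen significance level.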

In [None]:
data_drift = Dashboard(tabs=[DataDriftTab])

In [None]:
data_drift.calculate(train, test, column_mapping=column_mapping)

In [None]:
# Save the report as an HTML file
data_drift.save('report.html')

In [None]:
data_drift.show()

### Target drift (numerical and categorical)

Detects changes in a numerical or categorical target and in feature behavior. The churn target is binary (0/1), so we use <b>CatTargetDriftTab</b>.
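For a categorical target such as `churn`, a simple drift check compares class frequencies between reference and current data. The sketch below computes the Pearson chi-squared statistic on a 2 x K table of class counts. It is an illustration, not Evidently's exact test, it assumes every class appears at least once, and the label lists are made up:

```python
import numpy as np


def chi2_statistic(reference_labels, current_labels, n_classes: int = 2) -> float:
    """Pearson chi-squared statistic for a 2 x n_classes table of class counts."""
    table = np.array([
        np.bincount(np.asarray(reference_labels, dtype=int), minlength=n_classes),
        np.bincount(np.asarray(current_labels, dtype=int), minlength=n_classes),
    ], dtype=float)
    # Expected counts if both datasets shared the same class distribution
    expected = table.sum(axis=1, keepdims=True) * table.sum(axis=0, keepdims=True) / table.sum()
    return float(((table - expected) ** 2 / expected).sum())


reference_churn = [0] * 80 + [1] * 20  # made up: 20% churn in reference
current_churn = [0] * 60 + [1] * 40    # made up: 40% churn in current
print(round(chi2_statistic(reference_churn, current_churn), 3))  # 9.524
```

A larger statistic means the class shares differ more; identical shares give 0.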

In [None]:
target_drift = Dashboard(tabs=[CatTargetDriftTab])

In [None]:
target_drift.calculate(train, test, column_mapping=column_mapping)

In [None]:
target_drift.show()

### Classification model performance (Probabilistic and binary/multi-class)

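Analyzes the performance of a classification model and its errors. Works for both binary and multi-class models. This example passes hard class predictions to <b>ClassificationPerformanceTab</b>; to also analyze predicted probabilities and the quality of model calibration, use <b>ProbClassificationPerformanceTab</b>.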
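To see what sits behind a performance report for binary labels, here is a minimal sketch computing confusion-matrix counts, accuracy, precision and recall by hand. The helper `binary_metrics` is hypothetical, and the exact set of metrics Evidently displays may differ by version:

```python
import numpy as np


def binary_metrics(y_true, y_pred) -> dict:
    """Confusion-matrix counts plus accuracy, precision and recall for 0/1 labels."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    return {
        "tp": tp, "tn": tn, "fp": fp, "fn": fn,
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


print(binary_metrics([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0]))
```

After the `prediction` column is added, `binary_metrics(test['churn'], test['prediction'])` gives the same kind of summary for the production data.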

In [None]:
classification_mapping = {}

classification_mapping['target'] = target_c
classification_mapping['prediction'] = 'prediction'
classification_mapping['numerical_features'] = feature_c

In [None]:
# Store the model's predictions next to the true labels (the target stays in the 'churn' column)
train['prediction'] = clf.predict(train[feature_c])
test['prediction'] = clf.predict(test[feature_c])

In [None]:
classification_performance = Dashboard(tabs=[ClassificationPerformanceTab])
classification_performance.calculate(train, test, column_mapping=classification_mapping)
classification_performance.show()

### Creating profiles and JSON exports

In [None]:
data_drift_profile = Profile(sections=[DataDriftProfileSection])

In [None]:
data_drift_profile.calculate(train, test, column_mapping=column_mapping)

In [None]:
data_drift_profile.json()
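If `json()` returns a JSON string, as the cell above suggests, the standard `json` module can parse it and save a pretty-printed copy. A minimal sketch; `profile_json` is a placeholder, not the real profile structure, and the temporary directory only keeps the demo self-contained:

```python
import json
import tempfile
from pathlib import Path

# Placeholder standing in for the string returned by data_drift_profile.json();
# the real profile has its own, version-dependent structure
profile_json = '{"example_section": {"value": 1}}'

profile = json.loads(profile_json)  # parse into plain Python dicts and lists

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "data_drift_profile.json"
    path.write_text(json.dumps(profile, indent=2))  # pretty-printed copy on disk
    reloaded = json.loads(path.read_text())

print(reloaded == profile)  # True
```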

### Generate reports from the terminal

```
python -m evidently calculate dashboard --config config.json \
  --reference reference.csv --current current.csv --output output_folder --report_name output_file_name
```
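The same command can be launched from Python. Passing an argument list to `subprocess.run` avoids shell quoting problems, and `sys.executable` runs the interpreter of the current environment. A minimal sketch with a hypothetical helper `evidently_dashboard_cmd`; the flags are copied from the command above:

```python
import sys


def evidently_dashboard_cmd(config, reference, current, output_folder, report_name) -> list:
    """Build the argument list for the CLI command above (hypothetical helper)."""
    return [
        sys.executable, "-m", "evidently", "calculate", "dashboard",
        "--config", config,
        "--reference", reference,
        "--current", current,
        "--output", output_folder,
        "--report_name", report_name,
    ]


cmd = evidently_dashboard_cmd("config.json", "reference.csv", "current.csv",
                              "output_folder", "output_file_name")
print(" ".join(cmd[1:]))
# To actually run it (requires evidently and the input files):
#   import subprocess
#   subprocess.run(cmd, check=True)
```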

Here `config.json` is the configuration file for your report, for example:
```
{
  "data_format":{
    "separator":",",
    "header":true,
    "date_column":null
  },
  "column_mapping":{},
  "profile_sections":["data_drift"],
  "pretty_print":true
}
```
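Writing the config from Python avoids hand-editing JSON (Python `True`/`None` become JSON `true`/`null`). A minimal sketch that reproduces the example settings above; the temporary directory only keeps the demo self-contained, so in practice you would write `config.json` next to your CSV files:

```python
import json
import tempfile
from pathlib import Path

# Same settings as the example config above
config = {
    "data_format": {"separator": ",", "header": True, "date_column": None},
    "column_mapping": {},
    "profile_sections": ["data_drift"],
    "pretty_print": True,
}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "config.json"
    path.write_text(json.dumps(config, indent=2))
    text = path.read_text()

print(text)
```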