# Data Tracker usage example on tabular data

In [1]:
from sklearn.datasets import load_wine
from alibi_detect.metrics import DataTracker

## Load dataset
We load the [wine dataset](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_wine.html#sklearn.datasets.load_wine) from scikit-learn and print its shape: 178 samples with 13 numeric features.

In [2]:
data = load_wine()
X = data.data
print(X.shape)

(178, 13)


## Initialize `DataTracker` object

In [3]:
dt = DataTracker(n_features=X.shape[1])

## Simulate updating data
`DataTracker` supports batch updates, so we can simulate a stream of incoming data by passing successive slices of the dataset to its `update` method. We start with the first 50 rows.

In [4]:
dt.update(X[:50,:])
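
To give some intuition for how statistics can be updated batch by batch without storing earlier data, here is a minimal sketch that merges per-feature means and variances using the standard pairwise-combination formulas. `RunningMoments` is a hypothetical helper written for this illustration and run on synthetic data; it is not `DataTracker`'s implementation, which may work differently.

```python
import numpy as np


class RunningMoments:
    """Hypothetical helper: per-feature running mean and population variance."""

    def __init__(self, n_features):
        self.n = 0                        # number of rows seen so far
        self.mean = np.zeros(n_features)  # running mean per feature
        self.m2 = np.zeros(n_features)    # running sum of squared deviations

    def update(self, batch):
        batch = np.asarray(batch, dtype=float)
        n_b = batch.shape[0]
        if n_b == 0:
            return
        # Statistics of the incoming batch alone
        mean_b = batch.mean(axis=0)
        m2_b = ((batch - mean_b) ** 2).sum(axis=0)
        # Merge the batch statistics into the running statistics
        delta = mean_b - self.mean
        n_total = self.n + n_b
        self.mean = self.mean + delta * n_b / n_total
        self.m2 = self.m2 + m2_b + delta ** 2 * self.n * n_b / n_total
        self.n = n_total

    @property
    def variance(self):
        if self.n == 0:
            return np.full_like(self.m2, np.nan)
        return self.m2 / self.n


# Synthetic stand-in data with the same shape as the wine dataset
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(178, 13))

rm = RunningMoments(n_features=X_demo.shape[1])
rm.update(X_demo[:50])
rm.update(X_demo[50:])

# The merged statistics should agree with a single pass over all rows
print(np.allclose(rm.mean, X_demo.mean(axis=0)))
print(np.allclose(rm.variance, X_demo.var(axis=0)))
```

Only the row count, the mean and the sum of squared deviations are kept between calls, so memory use does not grow with the number of batches.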

## Inspect metrics
The `get` method returns the metrics currently held by the `DataTracker`. Setting `serialize=True` (the default is `False`) converts all `numpy` types to native Python types.

In [5]:
stats = dt.get(serialize=False)
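
As a rough illustration of what converting `numpy` types to native Python types involves, and why it matters for things like JSON encoding, the sketch below uses a hypothetical `to_native` helper on a small dictionary of made-up values. The helper, the `'histogram'` key name and the values are assumptions for this example; the sketch does not show how `get(serialize=True)` is implemented.

```python
import json

import numpy as np


def to_native(obj):
    """Hypothetical helper: recursively convert numpy types to built-in types."""
    if isinstance(obj, dict):
        return {to_native(k): to_native(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [to_native(v) for v in obj]
    if isinstance(obj, np.ndarray):
        return obj.tolist()   # array -> nested Python lists
    if isinstance(obj, np.generic):
        return obj.item()     # numpy scalar -> Python scalar
    return obj


# Made-up values shaped like one entry of a per-feature stats dictionary
example_stats = {
    0: {'mean': np.float64(13.76), 'histogram': np.array([3, 5, 2])},
}

native = to_native(example_stats)

# json.dumps cannot encode numpy arrays directly, but handles the converted values
print(json.dumps(native))
```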

`stats` is keyed by zero-based feature index. Each entry holds that feature's `mean`, `variance` and `median`, along with a `histogram`.

In [6]:
for k, v in stats.items():
    print('\nFeature {}'.format(k+1))
    print('mean: {:.2f} -- variance: {:.2f} -- median: {:.2f}'.format(v['mean'], v['variance'], v['median']))


Feature 1
mean: 13.76 -- variance: 0.23 -- median: 13.68

Feature 2
mean: 2.06 -- variance: 0.54 -- median: 1.79

Feature 3
mean: 2.46 -- variance: 0.05 -- median: 2.41

Feature 4
mean: 17.16 -- variance: 6.76 -- median: 17.33

Feature 5
mean: 106.00 -- variance: 113.18 -- median: 102.97

Feature 6
mean: 2.81 -- variance: 0.10 -- median: 2.81

Feature 7
mean: 2.95 -- variance: 0.16 -- median: 2.91

Feature 8
mean: 0.30 -- variance: 0.00 -- median: 0.29

Feature 9
mean: 1.87 -- variance: 0.17 -- median: 1.86

Feature 10
mean: 5.37 -- variance: 1.61 -- median: 5.03

Feature 11
mean: 1.07 -- variance: 0.01 -- median: 1.07

Feature 12
mean: 3.17 -- variance: 0.14 -- median: 3.18

Feature 13
mean: 1102.84 -- variance: 54384.30 -- median: 1055.54


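Calling `update` again folds new data into the existing metrics. Here we pass the remaining 128 rows, so the metrics now cover the full dataset: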

In [7]:
dt.update(X[50:,:])
stats = dt.get(serialize=False)

for k, v in stats.items():
    print('\nFeature {}'.format(k+1))
    print('mean: {:.2f} -- variance: {:.2f} -- median: {:.2f}'.format(v['mean'], v['variance'], v['median']))


Feature 1
mean: 13.00 -- variance: 0.66 -- median: 12.97

Feature 2
mean: 2.34 -- variance: 1.25 -- median: 2.03

Feature 3
mean: 2.37 -- variance: 0.08 -- median: 2.34

Feature 4
mean: 19.49 -- variance: 11.15 -- median: 19.56

Feature 5
mean: 99.74 -- variance: 203.99 -- median: 96.60

Feature 6
mean: 2.30 -- variance: 0.39 -- median: 2.28

Feature 7
mean: 2.03 -- variance: 1.00 -- median: 2.04

Feature 8
mean: 0.36 -- variance: 0.02 -- median: 0.35

Feature 9
mean: 1.59 -- variance: 0.33 -- median: 1.56

Feature 10
mean: 5.06 -- variance: 5.37 -- median: 4.63

Feature 11
mean: 0.96 -- variance: 0.05 -- median: 0.96

Feature 12
mean: 2.61 -- variance: 0.50 -- median: 2.73

Feature 13
mean: 746.89 -- variance: 99166.72 -- median: 678.04
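
The per-feature histograms returned alongside the mean, variance and median are not printed in this example. As a rough picture of what a per-feature histogram holds (bin counts plus bin edges), here is a small `numpy` sketch on synthetic data. The bin count, the data and the `'counts'`/`'edges'` keys are assumptions for illustration; they do not reflect how `DataTracker` builds or stores its histograms.

```python
import numpy as np

# Synthetic stand-in data: 178 rows, 3 features
rng = np.random.default_rng(0)
X_demo = rng.normal(loc=13.0, scale=0.8, size=(178, 3))

n_bins = 10  # assumed bin count for this sketch

histograms = {}
for j in range(X_demo.shape[1]):
    # counts has n_bins entries, edges has n_bins + 1 entries
    counts, edges = np.histogram(X_demo[:, j], bins=n_bins)
    histograms[j] = {'counts': counts, 'edges': edges}

for j, h in histograms.items():
    print('Feature {}: counts={}'.format(j + 1, h['counts'].tolist()))
```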
