# Image Property Drift

This notebook provides an overview of how to use and understand the image property drift check.

**Structure:**

- [What is the purpose of the check?](#purpose)
- [Prepare data](#prepare)
- [Run the check](#run-check)
- [Define a condition](#define-condition)
- [Check Parameters](#parameters)


## What is the purpose of the check? <a name='purpose'></a>

Data drift is a change in the distribution of data over time. It is also one of the main reasons a machine learning model's performance degrades over time.

In the context of machine learning, drift between the training set and the test set makes the model prone to errors. In other words, the model was trained on data that differs from the current test data, so it will probably make more mistakes when predicting the target variable.

The Image Property Drift check calculates a drift score for each image property in the test dataset by comparing its distribution to that of the train dataset. For this, we use the Earth Mover's Distance (Wasserstein distance).
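To make the distance concrete, here is a minimal pure-Python sketch of the one-dimensional Earth Mover's Distance for two samples of equal size, where it reduces to the mean absolute difference between the sorted values. The function name `earth_movers_distance` and the brightness values are hypothetical, and this is not the check's internal implementation; the scores the check reports may be computed or scaled differently.

```python
def earth_movers_distance(train_values, test_values):
    """1-D Earth Mover's Distance between two equally sized samples.

    For equal sample sizes, the distance is the mean absolute difference
    between the sorted values of the two samples.
    """
    if len(train_values) != len(test_values):
        raise ValueError("this sketch assumes samples of equal size")
    if not train_values:
        raise ValueError("samples must not be empty")
    paired = zip(sorted(train_values), sorted(test_values))
    return sum(abs(a - b) for a, b in paired) / len(train_values)


# Hypothetical brightness values for a handful of images (illustration only).
train_brightness = [0.2, 0.4, 0.6, 0.8]
test_brightness = [0.3, 0.5, 0.7, 0.9]

distance = earth_movers_distance(train_brightness, test_brightness)
print(f"brightness drift: {distance:.3f}")
```

Every test value here is shifted by 0.1 relative to the train values, so the distance is the size of that shift. Two samples with the same values, in any order, give a distance of zero.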

### Imports

In [1]:
from deepchecks.vision.datasets.detection import coco
from deepchecks.vision.checks.distribution import ImagePropertyDrift

### Prepare data <a name='prepare'></a>

In [2]:
train_dataset = coco.load_dataset(train=True, object_type='VisionData')
test_dataset = coco.load_dataset(train=False, object_type='VisionData')

### Run the check <a name='run-check'></a>

In [3]:
check_result = ImagePropertyDrift().run(train_dataset, test_dataset)
check_result

### Observe the check’s output <a name='check-output'></a>

The result value is a pandas DataFrame that contains a drift score for each image property.

In [4]:
check_result.value

| Property | Drift score |
|---|---|
| area | 0.042969 |
| aspect_ratio | 0.054844 |
| brightness | 0.048437 |
| normalized_blue_mean | 0.038437 |
| normalized_green_mean | 0.037812 |
| normalized_red_mean | 0.026563 |
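To see which properties drifted the most, the scores can be ranked. The sketch below copies the values from the output above into a plain dict, an assumption made only to keep the example self-contained; the name `drift_scores` is hypothetical.

```python
# Drift scores copied from the output above (illustration only).
drift_scores = {
    'area': 0.042969,
    'aspect_ratio': 0.054844,
    'brightness': 0.048437,
    'normalized_blue_mean': 0.038437,
    'normalized_green_mean': 0.037812,
    'normalized_red_mean': 0.026563,
}

# Rank properties from the highest drift score to the lowest.
ranked = sorted(drift_scores.items(), key=lambda item: item[1], reverse=True)

for name, score in ranked:
    print(f"{name:<24}{score:.6f}")
```

With the real result, sorting the DataFrame directly, for example `check_result.value.sort_values('Drift score', ascending=False)`, should give the same ordering, assuming the column is named as in the output above.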


## Define a condition <a name='define-condition'></a>

We can define a condition that makes sure that the image property drift scores do not exceed an allowed threshold.

In [5]:
check_result = (
    ImagePropertyDrift()
    .add_condition_drift_score_not_greater_than(0.001)
    .run(train_dataset, test_dataset)
)
check_result.show(show_additional_outputs=False)

| Status | Condition | More Info |
|---|---|---|
| ✖ | Earth Mover's Distance <= 0.001 for image propertiesdrift | Earth Mover's Distance is above the threshold for the next properties: area=0.04; aspect_ratio=0.05; brightness=0.05; normalized_blue_mean=0.04; normalized_green_mean=0.04; normalized_red_mean=0.02 |
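The condition fails because every score is above 0.001. As a rough sketch of the comparison involved (not the library's own code), the hypothetical helper `properties_above_threshold` below lists the properties whose score exceeds a given threshold, using the drift scores from the earlier output.

```python
def properties_above_threshold(scores, max_allowed_drift_score):
    """Return the properties whose drift score exceeds the threshold."""
    return {
        name: score
        for name, score in scores.items()
        if score > max_allowed_drift_score
    }


# Drift scores copied from the earlier check output (illustration only).
drift_scores = {
    'area': 0.042969,
    'aspect_ratio': 0.054844,
    'brightness': 0.048437,
    'normalized_blue_mean': 0.038437,
    'normalized_green_mean': 0.037812,
    'normalized_red_mean': 0.026563,
}

strict_failures = properties_above_threshold(drift_scores, 0.001)
looser_failures = properties_above_threshold(drift_scores, 0.05)

print(sorted(strict_failures))  # all six properties exceed 0.001
print(sorted(looser_failures))  # only aspect_ratio exceeds 0.05
```

A threshold as strict as 0.001 is useful for demonstrating a failing condition; in practice the threshold would be chosen according to how much drift is acceptable for the data at hand.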


## Check Parameters <a name='parameters'></a>

The Image Property Drift check accepts two parameters that let us control the output:
- `image_properties` - list of image properties that we are interested in
- `default_number_of_bins` - number of bins to use for the histograms

Only the following string values are allowed for the `image_properties` parameter:
- `aspect_ratio`
- `area`
- `brightness`
- `normalized_red_mean`
- `normalized_green_mean`
- `normalized_blue_mean`

In [6]:
check_result = ImagePropertyDrift(
    image_properties=['area', 'aspect_ratio'],
    default_number_of_bins=20
).run(train_dataset, test_dataset)

check_result
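Because only a fixed set of property names is accepted, it can help to validate a selection before constructing the check. The sketch below is a hypothetical helper, not part of the library: `ALLOWED_IMAGE_PROPERTIES` and `validate_image_properties` are names introduced here for illustration, and the allowed values are the ones listed above.

```python
# Property names listed in this section as valid for `image_properties`.
ALLOWED_IMAGE_PROPERTIES = {
    'aspect_ratio',
    'area',
    'brightness',
    'normalized_red_mean',
    'normalized_green_mean',
    'normalized_blue_mean',
}


def validate_image_properties(image_properties):
    """Return the selection as a list, or raise if a name is not allowed."""
    unknown = [
        name for name in image_properties
        if name not in ALLOWED_IMAGE_PROPERTIES
    ]
    if unknown:
        raise ValueError(f"unsupported image properties: {unknown}")
    return list(image_properties)


selected = validate_image_properties(['area', 'aspect_ratio'])
print(selected)
```

The validated list could then be passed as `image_properties=selected` when creating the check, which surfaces a misspelled property name before any data is processed.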