# Image Property Drift

This notebook provides an overview of how to use and understand the image property drift check.

**Structure:**

- [What is the purpose of the check?](#purpose)
- [Prepare data](#prepare)
- [Run the check](#run-check)
- [Define a condition](#define-condition)
- [Check Parameters](#parameters)


## What is the purpose of the check? <a name='purpose'></a>

Data drift is a change in the distribution of data over time. It is also one of the main reasons a machine learning model's performance degrades over time.

In the context of machine learning, drift between the training set and the test set makes the model more prone to errors. The model was trained on data that differs from the current test data, so it will probably make more mistakes when predicting the target variable.

The Image Property Drift check calculates a drift score for each image property in the test dataset by comparing its distribution to the distribution of the same property in the train dataset. For this, we use the Earth Mover's Distance (Wasserstein distance).
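To make the drift score more concrete, here is a minimal pure-Python sketch of the one-dimensional Earth Mover's Distance between two samples, computed as the area between their empirical CDFs. The helper name `earth_movers_distance` and the sample values are illustrative assumptions, not part of deepchecks; the check's own implementation may bin or normalize values differently.

```python
from bisect import bisect_right
from typing import Sequence


def earth_movers_distance(train_values: Sequence[float],
                          test_values: Sequence[float]) -> float:
    """Wasserstein-1 distance between two non-empty 1-D samples (illustrative helper)."""
    train_sorted = sorted(train_values)
    test_sorted = sorted(test_values)
    # Every point at which either empirical CDF can change.
    support = sorted(set(train_sorted) | set(test_sorted))

    distance = 0.0
    for left, right in zip(support, support[1:]):
        # Fraction of each sample that is <= left; constant on [left, right).
        train_cdf = bisect_right(train_sorted, left) / len(train_sorted)
        test_cdf = bisect_right(test_sorted, left) / len(test_sorted)
        distance += abs(train_cdf - test_cdf) * (right - left)
    return distance


# Made-up property values, for illustration only.
train_brightness = [0.0, 1.0, 3.0]
test_brightness = [5.0, 6.0, 8.0]
print(earth_movers_distance(train_brightness, test_brightness))
```

Identical samples give a distance of zero, and the distance grows as the test values move further away from the train values.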

### Imports

In [1]:
from deepchecks.vision.datasets.detection import coco
from deepchecks.vision.checks.distribution import ImagePropertyDrift






## Prepare data <a name='prepare'></a>

In [2]:
train_dataset = coco.load_dataset(train=True, object_type='VisionData')
test_dataset = coco.load_dataset(train=False, object_type='VisionData')

## Run the check <a name='run-check'></a>

In [3]:
check_result = ImagePropertyDrift().run(train_dataset, test_dataset)
check_result

### Observe the check’s output <a name='check-output'></a>

The result value is a dictionary that maps each image property name to its drift score.

In [4]:
check_result.value

{'Aspect Ratio': 0.05528212860648983,
 'Area': 0.042819242931547624,
 'Brightness': 0.05395373606950852,
 'RMS Contrast': 0.020397627575683232,
 'Normalized Red Mean': 0.02548689439641292,
 'Normalized Blue Mean': 0.03820480493421847,
 'Normalized Green Mean': 0.038332879702880115}
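
Since the values are plain numbers keyed by property name, they are easy to post-process. As a small sketch, the snippet below ranks the properties from most to least drifted. The scores are copied from the output above; `drift_scores` and `ranked` are illustrative names, and in a live notebook you could use `check_result.value` directly instead of the hard-coded copy.

```python
# Drift scores copied from the check_result.value output above.
drift_scores = {
    'Aspect Ratio': 0.05528212860648983,
    'Area': 0.042819242931547624,
    'Brightness': 0.05395373606950852,
    'RMS Contrast': 0.020397627575683232,
    'Normalized Red Mean': 0.02548689439641292,
    'Normalized Blue Mean': 0.03820480493421847,
    'Normalized Green Mean': 0.038332879702880115,
}

# Sort (name, score) pairs by score, highest drift first.
ranked = sorted(drift_scores.items(), key=lambda item: item[1], reverse=True)

for name, score in ranked:
    print(f"{name:<22} {score:.4f}")
```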

## Define a condition <a name='define-condition'></a>

We can define a condition that makes sure the image property drift scores do not exceed an allowed threshold.

In [5]:
check_result = (
    ImagePropertyDrift()
    .add_condition_drift_score_not_greater_than(0.001)
    .run(train_dataset, test_dataset)
)
check_result.show(show_additional_outputs=False)

| Status | Condition | More Info |
|---|---|---|
| ✖ | Earth Mover's Distance <= 0.001 for image properties drift | Earth Mover's Distance is above the threshold for the next properties: Aspect Ratio=0.06; Area=0.04; Brightness=0.05; RMS Contrast=0.02; Normalized Red Mean=0.03; Normalized Blue Mean=0.04; Normalized Green Mean=0.04 |
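
Conceptually, the condition compares every drift score against the threshold and reports the properties that exceed it. The sketch below mimics that comparison in plain Python. `properties_above_threshold` is a hypothetical helper written for this illustration, not a deepchecks function, and the scores are rounded from the earlier output.

```python
from typing import Dict


def properties_above_threshold(drift_scores: Dict[str, float],
                               max_allowed: float) -> Dict[str, float]:
    """Return only the properties whose drift score exceeds max_allowed."""
    return {name: score for name, score in drift_scores.items()
            if score > max_allowed}


# A few scores rounded from the check output, for illustration only.
drift_scores = {'Aspect Ratio': 0.0553, 'Brightness': 0.0540, 'RMS Contrast': 0.0204}

# With the strict threshold used above, every property is flagged.
failed_strict = properties_above_threshold(drift_scores, 0.001)
# A looser threshold flags only the most drifted properties.
failed_loose = properties_above_threshold(drift_scores, 0.05)

print(sorted(failed_strict))
print(sorted(failed_loose))
```

A threshold of 0.001 is very strict for these scores, which is why the condition fails for all properties; in practice the threshold should reflect how much drift is acceptable for the task.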


## Check Parameters <a name='parameters'></a>

The Image Property Drift check accepts two parameters that let us control which properties are evaluated and how drift is calculated:
- `alternative_image_properties` - list of image properties that we are interested in
- `max_num_categories` - Maximal number of categories to use for the calculation of drift using PSI (Population Stability Index)
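
For reference, the textbook form of the Population Stability Index over categorical values can be sketched as follows. This is a generic illustration under stated assumptions, not deepchecks' implementation: the function name and the `min_fraction` clipping value are made up for the example, and the sketch does not cap the number of categories the way `max_num_categories` does.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def population_stability_index(train_labels: Sequence[Hashable],
                               test_labels: Sequence[Hashable],
                               min_fraction: float = 1e-4) -> float:
    """Textbook PSI between two non-empty categorical samples (illustrative helper)."""
    train_counts = Counter(train_labels)
    test_counts = Counter(test_labels)

    psi = 0.0
    for category in set(train_counts) | set(test_counts):
        # Clip fractions away from zero so the logarithm stays finite.
        train_frac = max(train_counts[category] / len(train_labels), min_fraction)
        test_frac = max(test_counts[category] / len(test_labels), min_fraction)
        psi += (test_frac - train_frac) * math.log(test_frac / train_frac)
    return psi


# Made-up categorical property values, for illustration only.
train_orientation = ['landscape', 'landscape', 'portrait', 'portrait']
test_orientation = ['landscape', 'landscape', 'landscape', 'portrait']
print(population_stability_index(train_orientation, test_orientation))
```

Every term in the sum is non-negative, so the index is zero when both samples have the same category frequencies and grows as the frequencies diverge.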

`alternative_image_properties` is a list of dictionaries, where each item describes one property. Each property contains the following entries:
* name - name of the property
* method - function that accepts a list of numpy arrays (the images) and returns a list of primitive values (numbers, strings, booleans)
* output_type - a string, either 'continuous' or 'discrete'
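
Because each property is a plain dictionary, a malformed entry is easy to create by accident. The following sketch checks an entry against the three fields listed above before it is handed to the check. `find_property_problems` is a hypothetical helper for this illustration; deepchecks may perform its own validation and report errors differently.

```python
from typing import Any, Dict, List

REQUIRED_KEYS = ('name', 'method', 'output_type')
ALLOWED_OUTPUT_TYPES = ('continuous', 'discrete')


def find_property_problems(prop: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means the entry looks fine."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in prop]
    if 'method' in prop and not callable(prop['method']):
        problems.append("'method' must be callable")
    if 'output_type' in prop and prop['output_type'] not in ALLOWED_OUTPUT_TYPES:
        problems.append("'output_type' must be 'continuous' or 'discrete'")
    return problems


# A well-formed entry and a deliberately broken one.
good = {
    'name': 'Area',
    'method': lambda images: [img.shape[0] * img.shape[1] for img in images],
    'output_type': 'continuous',
}
bad = {'name': 'Area', 'output_type': 'numeric'}

print(find_property_problems(good))
print(find_property_problems(bad))
```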

In [6]:
from typing import List
import numpy as np

def area(images: List[np.ndarray]) -> List[int]:
    # Return list of integers of image areas (height multiplied by width)
    return [img.shape[0] * img.shape[1] for img in images]
    
def aspect_ratio(images: List[np.ndarray]) -> List[float]:
    # Return list of floats of image height to width ratio
    return [img.shape[0] / img.shape[1] for img in images]

properties = [
    {'name': 'Area', 'method': area, 'output_type': 'continuous'},
    {'name': 'Aspect Ratio', 'method': aspect_ratio, 'output_type': 'continuous'}
]

check_result = ImagePropertyDrift(
    alternative_image_properties=properties, 
    max_num_categories=20
).run(train_dataset, test_dataset)

check_result