In [None]:
%matplotlib inline

Image Property Outliers {#plot_vision_image_property_outliers}
=======================

This notebook provides an overview of using and understanding the image
property outliers check, which detects outliers in simple image
properties of a dataset.

**Structure:**

-   [Why Check for Outliers?](#why-check-for-outliers)
-   [How Does the Check Work?](#how-does-the-check-work)
-   [Which Image Properties Are Used?](#which-image-properties-are-used)
-   [Run the Check](#run-the-check)

Why Check for Outliers?
-----------------------

Examining outliers may help you gain insights that you couldn't have
reached by taking an aggregate look or by inspecting random samples.
For example, it may reveal that you have corrupt samples (e.g. an image
that is completely black) or samples you didn't expect to have (e.g. an
extreme aspect ratio). In some cases, these outliers may help debug
performance discrepancies (the model can be excused for failing on a
totally dark image). In more extreme cases, outlier samples may
interfere with the model's training by teaching it to fit "irrelevant"
samples.

How Does the Check Work?
------------------------

Ideally we would like to find outlier images directly, but this is
computationally expensive and does not yield clear, explainable
results. Therefore, we look for outliers in image properties (such as
brightness, aspect ratio, etc.), which are much more efficient to
compute and make each outlier easy to explain.

We use the [Interquartile
Range](https://en.wikipedia.org/wiki/Interquartile_range#Outliers) to
define upper and lower limits for each property's values.
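For reference, the classic IQR rule flags values below Q1 - 1.5 * IQR or
above Q3 + 1.5 * IQR, where Q1 and Q3 are the 25th and 75th percentiles.
The sketch below applies that rule to a single property with NumPy. The
percentiles and the 1.5 multiplier are the textbook defaults and are
assumed here; they are not taken from deepchecks' own implementation,
and the brightness values are hypothetical.

```python
import numpy as np

def iqr_limits(values, scale=1.5):
    """Return (lower, upper) IQR fences for a 1-D array of property values.

    ASSUMPTION: 25th/75th percentiles and scale=1.5 (Tukey's fences).
    """
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return q1 - scale * iqr, q3 + scale * iqr

def find_outliers(values, scale=1.5):
    """Return the indices of values outside the IQR fences, plus the fences."""
    values = np.asarray(values, dtype=float)
    lower, upper = iqr_limits(values, scale)
    indices = np.flatnonzero((values < lower) | (values > upper))
    return indices.tolist(), lower, upper

# Hypothetical brightness values for eight images; the last one is near-black
brightness = [0.45, 0.50, 0.48, 0.52, 0.47, 0.49, 0.51, 0.02]
indices, lower, upper = find_outliers(brightness)
print(indices)  # [7]
print(lower, upper)
```

Because the fences come from quartiles rather than the mean and standard
deviation, a single extreme image (like the dark one above) barely moves
them.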

### Which Image Properties Are Used?

By default the check uses the built-in image properties, and it's also
possible to replace the default properties with custom ones. For the
list of built-in image properties and an explanation of custom
properties, refer to
`vision properties </user-guide/vision/vision_properties>`{.interpreted-text
role="doc"}.
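As a sketch of what a custom property might look like: a function that
receives a batch of images and returns one number per image. The
`mean_saturation` function below is hypothetical, and so is the exact
registration format. Recent deepchecks versions describe custom
properties as dictionaries with `name`, `method` and `output_type` keys,
passed through an `image_properties` argument, but these names and the
accepted `output_type` values have changed between releases, so verify
them against the vision properties guide for your installed version.

```python
import numpy as np

def mean_saturation(images):
    """Hypothetical custom property: mean saturation of each image.

    Assumes every image is an H x W x 3 RGB array with values in [0, 255].
    Returns one float per image, in the same order as `images`.
    """
    values = []
    for image in images:
        rgb = np.asarray(image, dtype=float) / 255.0
        max_c = rgb.max(axis=2)
        min_c = rgb.min(axis=2)
        # HSV saturation is (max - min) / max; black pixels (max == 0) get 0
        safe_max = np.where(max_c > 0, max_c, 1.0)
        saturation = np.where(max_c > 0, (max_c - min_c) / safe_max, 0.0)
        values.append(float(saturation.mean()))
    return values

# ASSUMPTION: registration format; key names and the `output_type` value
# differ between deepchecks releases, so verify against your version.
custom_properties = [
    {'name': 'Mean Saturation', 'method': mean_saturation, 'output_type': 'numerical'},
]
# check = ImagePropertyOutliers(image_properties=custom_properties)
```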


Run the Check
-------------

In this example we load the COCO object detection data and run the
check with the default properties.


In [4]:
import sys
!{sys.executable} -m pip install deepchecks --quiet --upgrade # --user

In [1]:
from deepchecks.vision.checks import ImagePropertyOutliers
from deepchecks.vision.datasets.detection.coco import load_dataset

train_data = load_dataset(train=True, object_type='VisionData')
check = ImagePropertyOutliers()
result = check.run(train_data)
result


To display the results in an IDE like PyCharm, you can use the following
code:


In [None]:
#  result.show_in_window()

The result will be displayed in a new window.


Observe Graphic Result
----------------------

The check shows a section for each property. Each section shows the
number of outliers and the non-outlier property range, as well as the
images with the lowest and highest values for that property.
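The check's own display code is not shown in this notebook. As a rough
illustration of the idea only, picking the lowest- and highest-valued
images from an array of per-image property values can be sketched with
NumPy; `extreme_samples` and the brightness values are hypothetical.

```python
import numpy as np

def extreme_samples(values, n=5):
    """Return the indices of the n lowest and n highest property values.

    Illustration only; this is not deepchecks' display code.
    """
    # Stable argsort keeps the original order among equal values
    order = np.argsort(np.asarray(values, dtype=float), kind="stable")
    lowest = order[:n].tolist()
    # Reverse the ascending order to list the highest values first
    highest = order[::-1][:n].tolist()
    return lowest, highest

# Hypothetical brightness values for five images
lowest, highest = extreme_samples([0.3, 0.9, 0.1, 0.5, 0.7], n=2)
print(lowest, highest)  # [2, 0] [1, 4]
```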

For example, for the "RMS Contrast" property we can see that only 3
outliers were found, 1 below the normal property range and 2 above it.
We can now inspect these images and decide whether to ignore such
samples or to have the model support them, in which case we may take a
closer look at the model's predictions on these samples.
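Counting how many outliers fall below versus above the normal range, as
in the "RMS Contrast" example, only needs the per-image values and the
two limits. In the sketch below the limits are the RMS Contrast limits
reported in this run's result value, while the per-image contrast values
are hypothetical; treating a value exactly on a limit as normal is also
an assumption.

```python
def split_outliers(values, lower, upper):
    """Split outlier indices into those below and those above the normal range."""
    below = [i for i, v in enumerate(values) if v < lower]
    above = [i for i, v in enumerate(values) if v > upper]
    return below, above

# RMS Contrast limits from this run; the per-image values are hypothetical
lower, upper = 0.09993963741568856, 0.36929402509717535
contrast = [0.05, 0.20, 0.25, 0.40, 0.50]
below, above = split_outliers(contrast, lower, upper)
print(f"{len(below)} below, {len(above)} above")  # 1 below, 2 above
```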

Observe Result Value
--------------------

The check returns a `CheckResult` object whose `value` attribute
contains the information calculated during the check's run.


In [2]:
result.value

{'Aspect Ratio': {'indices': [6, 4, 27, 40, 63, 31, 8, 46, 10, 32, 22],
  'lower_limit': 0.340625,
  'upper_limit': 1.3029296874999998},
 'Area': {'indices': [60, 61, 44, 25, 62, 13, 6, 58, 50, 11, 26, 14, 45],
  'lower_limit': 220800.0,
  'upper_limit': 359040.0},
 'Brightness': {'indices': [54, 55, 47, 38, 62, 28],
  'lower_limit': 0.23778584214186751,
  'upper_limit': 0.6858694940161068},
 'RMS Contrast': {'indices': [54, 56, 61],
  'lower_limit': 0.09993963741568856,
  'upper_limit': 0.36929402509717535},
 'Mean Red Relative Intensity': {'indices': [50, 37, 36, 60, 61, 55],
  'lower_limit': 0.24169391794555903,
  'upper_limit': 0.4769510114694686},
 'Mean Green Relative Intensity': {'indices': [61, 3, 60, 63, 48, 54, 50],
  'lower_limit': 0.28084770328411535,
  'upper_limit': 0.4030514973864122},
 'Mean Blue Relative Intensity': {'indices': [60, 55, 50],
  'lower_limit': 0.15795800862207085,
  'upper_limit': 0.41322135317957304}}
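Since `result.value` is a plain dictionary keyed by property name, it is
easy to post-process. The sketch below summarizes the number of outliers
per property and lists samples flagged by several properties at once,
which are often the most interesting to inspect. Both helper functions
are hypothetical; the data is a subset copied from the output above, and
in the notebook you would pass `result.value` instead.

```python
from collections import Counter

def summarize_outliers(value):
    """Map each property name to (number of outliers, lower_limit, upper_limit)."""
    return {
        name: (len(info['indices']), info['lower_limit'], info['upper_limit'])
        for name, info in value.items()
    }

def multi_property_outliers(value, min_properties=2):
    """Return (sample index, property count) pairs for samples flagged by
    at least `min_properties` properties, most-flagged first."""
    counts = Counter(i for info in value.values() for i in info['indices'])
    flagged = [(i, c) for i, c in counts.items() if c >= min_properties]
    return sorted(flagged, key=lambda pair: (-pair[1], pair[0]))

# A subset of the `result.value` dictionary shown above
value = {
    'Area': {'indices': [60, 61, 44, 25, 62, 13, 6, 58, 50, 11, 26, 14, 45],
             'lower_limit': 220800.0, 'upper_limit': 359040.0},
    'RMS Contrast': {'indices': [54, 56, 61],
                     'lower_limit': 0.09993963741568856,
                     'upper_limit': 0.36929402509717535},
    'Mean Blue Relative Intensity': {'indices': [60, 55, 50],
                                     'lower_limit': 0.15795800862207085,
                                     'upper_limit': 0.41322135317957304},
}

for name, (count, low, high) in summarize_outliers(value).items():
    print(f"{name}: {count} outliers outside [{low:.3f}, {high:.3f}]")
print(multi_property_outliers(value))  # [(50, 2), (60, 2), (61, 2)]
```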

In [13]:
train_data.to_dataset_index(0,1,2)

[38, 24, 48]
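The cell above maps indices 0, 1, 2 to 38, 24, 48, which suggests that
the sample order used during the run differs from the positions in the
underlying dataset. Assuming the outlier indices in `result.value` live
in the same index space as the arguments of `to_dataset_index` (our
reading, not something this notebook states), a hypothetical helper can
translate every property's outliers at once. It takes the mapping
function as a parameter, so it can be tried with a stand-in mapping
built from the output above.

```python
def outliers_in_dataset_order(value, to_dataset_index):
    """Translate every property's outlier indices with a mapping function.

    `to_dataset_index` should accept indices as positional arguments and
    return a list of the same length, as `train_data.to_dataset_index`
    does in the cell above.
    """
    return {
        name: list(to_dataset_index(*info['indices']))
        for name, info in value.items()
    }

# Stand-in mapping built from the cell output above (illustration only)
known = {0: 38, 1: 24, 2: 48}
print(outliers_in_dataset_order({'Brightness': {'indices': [2, 0]}},
                                lambda *idx: [known[i] for i in idx]))
# {'Brightness': [48, 38]}

# In the notebook (assumption, not run here):
# dataset_outliers = outliers_in_dataset_order(result.value, train_data.to_dataset_index)
```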