## Intro to anomaly detection with OpenCV, Computer Vision, and scikit-learn - PyImage Search by Adrian Rosebrock

Source: https://www.pyimagesearch.com/2020/01/20/intro-to-anomaly-detection-with-opencv-computer-vision-and-scikit-learn/

In this tutorial, we learn how to perform anomaly/novelty detection in image datasets (spot outliers and anomalies in your own image datasets) using OpenCV, Computer Vision, and the scikit-learn machine learning library.

How are machine learning algorithms, which tend to work optimally with balanced datasets, supposed to work when the anomalies we want to detect may only happen 1%, 0.1%, or 0.0001% of the time? Luckily, machine learning researchers have investigated this type of problem and have devised algorithms to handle the task.

Key Take-aways:
- Two types of events: Standard events and Anomaly events
- Anomaly detection algorithms are broken into two types:
    - Outlier detection: Includes Standard and Anomaly events in training data. Unsupervised learners used.
    - Novelty detection: Includes only labelled Standard events for training. Supervised learners used.
- Novelty detection is done in this tutorial.
- Isolation forests are the algorithm used. They are a type of ensemble algorithm consisting of multiple decision trees that partition the input dataset into distinct groups of inliers.
- The task is to determine the anomaly among three images when compared to 15 other Standard event images. 
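
To make the novelty detection setup above more concrete, here is a minimal sketch using scikit-learn's `IsolationForest`. The 2D synthetic points stand in for image feature vectors, and the `n_estimators` and `contamination` values are illustrative assumptions, not settings taken from this notebook.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic stand-in for image feature vectors (an assumption for illustration):
# 200 "standard" points clustered around the origin.
rng = np.random.default_rng(42)
standard = rng.normal(loc=0.0, scale=1.0, size=(200, 2))

# Fit on standard events only, as in novelty detection.
# n_estimators = number of trees; contamination = expected share of anomalies.
model = IsolationForest(n_estimators=100, contamination=0.01, random_state=42)
model.fit(standard)

# Score one point near the cluster centre and one far away from it.
queries = np.array([[0.0, 0.0], [10.0, 10.0]])
scores = model.decision_function(queries)  # lower score = more anomalous
labels = model.predict(queries)            # 1 = inlier, -1 = anomaly

for point, score, label in zip(queries, scores, labels):
    print(point, round(float(score), 3), "anomaly" if label == -1 else "standard")
```

The far-away point should receive a lower decision score than the central one, because random splits isolate it in fewer steps.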
    

        

In [1]:
# Import relevant libraries
from imutils import paths
import numpy as np
import cv2


### Load and Process Data

Define a function to quantify and characterize the contents of input images using color histograms.

In [None]:
def quantify_image(image, bins=(4, 6, 3)):
    """Create color histograms to quantify and characterize the contents of input images"""
    
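    # Compute a 3D color histogram over the image and normalize it.
    # The ranges [0, 180] and [0, 256] match OpenCV's 8-bit HSV layout,
    # so the input image is assumed to already be in HSV.
    hist = cv2.calcHist([image], [0, 1, 2], None, bins,
                        [0, 180, 0, 256, 0, 256])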
    hist = cv2.normalize(hist, hist).flatten()
    
    # Return the histogram
    return hist
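
As a rough check on what this function produces, the sketch below rebuilds the same kind of feature vector with NumPy only, so it runs without OpenCV. `histogram_features` is a hypothetical stand-in, not the notebook's function; it uses `np.histogramdd` in place of `cv2.calcHist` and assumes the L2 normalization that `cv2.normalize` applies by default. With bins of `(4, 6, 3)`, the flattened vector has 4 × 6 × 3 = 72 entries.

```python
import numpy as np

def histogram_features(hsv_image, bins=(4, 6, 3)):
    """Hypothetical NumPy-only stand-in for quantify_image (illustration only)."""
    # Treat every pixel as one (H, S, V) sample
    pixels = hsv_image.reshape(-1, 3).astype(np.float64)
    # 3D histogram over the same value ranges used with cv2.calcHist
    hist, _ = np.histogramdd(pixels, bins=bins,
                             range=[(0, 180), (0, 256), (0, 256)])
    hist = hist.flatten()
    # Scale to unit L2 norm (assumed to mirror cv2.normalize's default)
    return hist / np.linalg.norm(hist)

# Random 32x32 image in OpenCV-style HSV ranges (synthetic test data)
rng = np.random.default_rng(0)
hsv = np.dstack([
    rng.integers(0, 180, size=(32, 32)),
    rng.integers(0, 256, size=(32, 32)),
    rng.integers(0, 256, size=(32, 32)),
]).astype(np.uint8)

features = histogram_features(hsv)
print(features.shape)
```

Each image, whatever its size, is reduced to a fixed-length vector, which is what lets a model such as an isolation forest compare images directly.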

Then we load the dataset and loop over the image paths, quantifying each image with the quantify_image function.

In [None]:
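# Grab the paths to all images in our dataset directory, then
# initialize our lists of images.
# A raw string (r"...") keeps the backslashes in the Windows path from
# being read as escape sequences such as \U or \f.
image_paths = list(paths.list_images(r"C:\Users\user\Documents\Machine Learning\Education\intro-anomaly-detection\intro-anomaly-detection\forest"))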
