# Outlier Detection - Isolation Forest

The Isolation Forest algorithm returns an anomaly score for each sample.

The IsolationForest ‘isolates’ observations by randomly selecting a feature and then randomly selecting a split value between the maximum and minimum values of the selected feature.

Since recursive partitioning can be represented by a tree structure, the number of splittings required to isolate a sample is equivalent to the path length from the root node to the terminating node.

This path length, averaged over a forest of such random trees, is a measure of normality and our decision function.
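For reference, the original Isolation Forest paper (Liu, Ting, and Zhou, 2008) converts the averaged path length into a score roughly as follows. Here $h(x)$ is the path length of sample $x$ in one tree, $E[h(x)]$ is its average over the forest, and $n$ is the number of samples used to build each tree:

$$
s(x, n) = 2^{-\frac{E[h(x)]}{c(n)}}, \qquad c(n) = 2H(n-1) - \frac{2(n-1)}{n}, \qquad H(i) \approx \ln(i) + 0.5772
$$

Scores close to 1 indicate anomalies, while scores well below 0.5 indicate normal points. scikit-learn's `score_samples` returns the opposite of this quantity, so there lower values mean more anomalous.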

Random partitioning produces noticeably shorter paths for anomalies. Hence, when a forest of random trees collectively produces shorter path lengths for particular samples, those samples are highly likely to be anomalies.
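The idea can be illustrated with a minimal from-scratch sketch. It is not scikit-learn's implementation: it skips subsampling and the path-length normalisation, and the data, the helper names `isolation_path_length` and `mean_path_length`, and the depth cap are all assumptions made for illustration. It only shows that a far-away point tends to be isolated in fewer random splits than a point inside the cluster.

```python
import numpy as np

def isolation_path_length(X, x, rng, depth=0, max_depth=50):
    """Count the random splits needed to isolate point x within the sample X."""
    if len(X) <= 1 or depth >= max_depth:
        return depth
    # Pick a random feature, then a random split between its min and max.
    feature = rng.integers(X.shape[1])
    lo, hi = X[:, feature].min(), X[:, feature].max()
    if lo == hi:
        return depth  # no split is possible on this feature
    split = rng.uniform(lo, hi)
    # Keep only the side of the split that contains x.
    if x[feature] < split:
        X = X[X[:, feature] < split]
    else:
        X = X[X[:, feature] >= split]
    return isolation_path_length(X, x, rng, depth + 1, max_depth)

rng = np.random.default_rng(0)
# Synthetic data (made up for illustration): a tight cluster plus one distant point.
inliers = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
outlier = np.array([8.0, 8.0])
X = np.vstack([inliers, outlier])

def mean_path_length(x, n_trees=200):
    """Average the path length of x over n_trees independent random partitionings."""
    return np.mean([isolation_path_length(X, x, rng) for _ in range(n_trees)])

inlier_depth = mean_path_length(X[0])
outlier_depth = mean_path_length(outlier)
print(f"mean path length, inlier:  {inlier_depth:.1f}")
print(f"mean path length, outlier: {outlier_depth:.1f}")
```

The distant point should report a clearly smaller mean path length than the point inside the cluster.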

User Guide: https://scikit-learn.org/stable/modules/outlier_detection.html#isolation-forest

## Dependencies

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

## Read data and remove missing values

In [2]:
df = pd.read_csv('/work/Nov2Temp.csv')
df

Unnamed: 0,high,low
0,58,25
1,26,11
2,53,24
3,60,37
4,67,42
...,...,...
115,99,33
116,99,27
117,18,38
118,15,51


In [3]:
# Remove rows 72 and 79, the rows with missing values
df.drop([72, 79], inplace=True)
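Hard-coded row labels only work for this particular file. As a sketch of an alternative, the toy frame below (values made up for illustration, not taken from `Nov2Temp.csv`) shows how pandas can locate and drop rows with missing values without naming them explicitly:

```python
import numpy as np
import pandas as pd

# Toy stand-in for the temperature data (values are made up for illustration).
toy = pd.DataFrame({"high": [58, 26, np.nan, 60], "low": [25, 11, 24, np.nan]})

# Index labels of the rows that contain at least one missing value.
missing_rows = toy.index[toy.isna().any(axis=1)].tolist()
print(missing_rows)

# Drop those rows without hard-coding their labels.
clean = toy.dropna()
print(clean)
```

Printing the labels first makes it easy to check which rows are about to be removed.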

## Detect outliers using IsolationForest

In [6]:
from sklearn.ensemble import IsolationForest

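# contamination is the expected fraction of outliers in the data.
# No random_state is set, so the flagged rows can vary between runs.
clf = IsolationForest(contamination=0.05)
# fit_predict fits the model and labels each row: -1 = outlier, 1 = inlier
preds = clf.fit_predict(df)
# Show the rows flagged as outliers
df[preds == -1]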



Unnamed: 0,high,low
81,18,-1
111,48,99
112,43,99
113,64,99
116,99,27
118,15,51
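The -1/1 labels hide how strongly each row was flagged. As a sketch on synthetic data (the values and the `demo` frame are assumptions, not the notebook's dataset), `score_samples` exposes the underlying score, and `random_state` makes the result repeatable:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# Synthetic highs and lows (made up for illustration), plus two implausible readings
# appended at index labels 100 and 101.
high = np.append(rng.normal(60, 5, size=100).round(), [99, 15])
low = np.append(rng.normal(40, 5, size=100).round(), [-1, 99])
demo = pd.DataFrame({"high": high, "low": low})

# random_state fixes the random splits so repeated runs flag the same rows.
clf = IsolationForest(contamination=0.05, random_state=0)
preds = clf.fit_predict(demo)  # -1 = outlier, 1 = inlier

# score_samples: the lower the score, the more anomalous the row.
demo_scored = demo.assign(score=clf.score_samples(demo), outlier=(preds == -1))
print(demo_scored.sort_values("score").head())
```

Sorting by score gives a ranking, which helps when deciding whether the chosen `contamination` value flags too many or too few rows.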
