# Anomaly Detection with Isolation Forest

## 1. Introduction
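This notebook demonstrates anomaly detection with the Isolation Forest algorithm on the Iris dataset. To simulate an anomaly detection scenario, we treat one species (setosa) as normal and the other two (versicolor and virginica) as anomalies. Note that this labels 100 of the 150 samples as anomalies, whereas Isolation Forest assumes that anomalies are few and easy to isolate, so the setup is a stress test rather than a typical use case.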

## 2. Data Loading and Preparation

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_iris

# Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Create a DataFrame
df = pd.DataFrame(X, columns=iris.feature_names)
df['target'] = y

# Define anomalies: 0 = normal (setosa), 1 = anomaly (versicolor/virginica)
df['is_anomaly'] = (df['target'] > 0).astype(int)

df.head()
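
Before modeling, it can help to check how the labels are balanced. The sketch below reloads the data so that it runs on its own; `is_anomaly` and `counts` are local names used only for illustration, and the labelling rule is assumed to be the same as in the cell above.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()

# Same labelling rule as above: setosa (target 0) is normal, the rest are anomalies.
is_anomaly = (iris.target > 0).astype(int)

# Count how many samples fall into each label.
counts = np.bincount(is_anomaly)
print(f"normal: {counts[0]}, anomaly: {counts[1]}")
print(f"anomaly fraction: {is_anomaly.mean():.2f}")
```

With this labelling, the anomalies outnumber the normal samples two to one, which is worth keeping in mind when reading the evaluation later on.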

## 3. Model Building and Training

In [None]:
from sklearn.ensemble import IsolationForest

# Initialize and fit the Isolation Forest on all samples.
# The fit is unsupervised: the species labels are not used.
model = IsolationForest(n_estimators=100, contamination='auto', random_state=42)
model.fit(X)

# Get the predictions: -1 for anomalies, 1 for inliers
df['anomaly_pred'] = model.predict(X)
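
The -1/1 labels from `predict` come from thresholding a continuous score. As a minimal sketch, assuming the same model settings as above, the snippet below refits the model on its own copy of the data and checks that `predict` flags exactly the samples whose `decision_function` value is negative. The names `scores`, `labels`, and `agree` are introduced here for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import IsolationForest

X = load_iris().data
model = IsolationForest(n_estimators=100, contamination='auto', random_state=42)
model.fit(X)

# decision_function: lower values are more anomalous; negative values are outliers.
scores = model.decision_function(X)
labels = model.predict(X)

# predict() returns -1 exactly where decision_function is below zero.
agree = np.array_equal(labels == -1, scores < 0)
print("predict agrees with sign of decision_function:", agree)
print("flagged as anomalies:", int((labels == -1).sum()), "of", len(X))
```

Keeping the continuous scores around makes it possible to rank samples or pick a different threshold instead of relying only on the default cut-off.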

## 4. Model Evaluation

In [None]:
from sklearn.metrics import classification_report, confusion_matrix

# Convert predictions to the same format as 'is_anomaly' (0 = normal, 1 = anomaly).
# Recomputing from the model keeps this cell safe to re-run.
df['anomaly_pred'] = (model.predict(X) == -1).astype(int)

# Evaluate the model
print('Classification Report:')
print(classification_report(df['is_anomaly'], df['anomaly_pred']))
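# Plot the confusion matrix (rows: true labels, columns: predicted labels)
cm = confusion_matrix(df['is_anomaly'], df['anomaly_pred'])
ax = sns.heatmap(cm, annot=True, fmt='g')
ax.set_xlabel('Predicted')
ax.set_ylabel('True')
ax.set_title('Confusion Matrix')
plt.show()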
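
The classification report depends on the threshold that `predict` applies. A threshold-independent check is sometimes useful as well. The sketch below is one possible way to do this, not part of the original evaluation: it computes ROC AUC from the continuous `score_samples` output. It refits the model with the same settings so that it runs on its own, and `anomaly_score` and `auc` are illustrative names.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import IsolationForest
from sklearn.metrics import roc_auc_score

iris = load_iris()
X = iris.data
# Same labelling rule as in the data preparation cell: setosa is normal.
is_anomaly = (iris.target > 0).astype(int)

model = IsolationForest(n_estimators=100, contamination='auto', random_state=42)
model.fit(X)

# score_samples: lower means more abnormal, so negate it
# to obtain a score where higher means more anomalous.
anomaly_score = -model.score_samples(X)

auc = roc_auc_score(is_anomaly, anomaly_score)
print(f"ROC AUC: {auc:.3f}")
```

An AUC near 1 would mean the scores rank the labelled anomalies above the normal class. A value near or below 0.5 would mean they do not, which is a real possibility when the labelled anomalies form the majority of the data.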

## 5. Visualization of Results

In [None]:
plt.figure(figsize=(10, 6))
plt.scatter(df.iloc[:, 0], df.iloc[:, 1], c=df['anomaly_pred'], cmap='coolwarm')
plt.xlabel(iris.feature_names[0])
plt.ylabel(iris.feature_names[1])
plt.title('Anomaly Detection with Isolation Forest')
plt.show()

## 6. Conclusion
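This notebook fit an Isolation Forest to the Iris dataset and compared its predictions with labels that treat setosa as normal and the other two species as anomalies. The classification report and confusion matrix show how closely the flagged points match those labels. Because the model is unsupervised and assumes anomalies are rare, while two thirds of the samples are labeled anomalous here, the level of agreement should be read from those outputs rather than assumed. Isolation Forest is best suited to unsupervised anomaly detection tasks in which anomalies are few and clearly separated from the bulk of the data.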