## Isolation Forest for Anomaly Detection
**Objective**: Understand and apply the Isolation Forest algorithm to identify anomalies in datasets.

### Task: Anomaly Detection in Network Traffic
**Steps**:
1. Extract features from the dataset:
    - Load `network_traffic.csv`.
    - Select the numeric columns as model features.
2. Fit an Isolation Forest model on those features.
3. Display the rows flagged as anomalies.

In [1]:
# Isolation Forest anomaly detection on network traffic data
import pandas as pd
from sklearn.ensemble import IsolationForest

# Step 1: Load dataset and extract features
df = pd.read_csv('/workspaces/AI_DATA_ANALYSIS_/src/Module 11/Hands-on - AI-Based Data Quality & Real-Time Monitoring/network_traffic.csv')

# Use every numeric column as a feature. Narrow the selection if some numeric
# columns (for example IDs) should not influence the model.
X = df.select_dtypes(include='number')

# Step 2: Initialize and fit Isolation Forest model
# contamination=0.05 tells the model to expect roughly 5% of rows to be anomalous
iso_forest = IsolationForest(n_estimators=100, contamination=0.05, random_state=42)
iso_forest.fit(X)

# Step 3: Predict anomalies
df['anomaly'] = iso_forest.predict(X)  # 1 = normal, -1 = anomaly

# Display anomalies
anomalies = df[df['anomaly'] == -1]

print(f"Number of anomalies detected: {len(anomalies)}")
print(anomalies)

# Optional: Save anomalies to a CSV
anomalies.to_csv('network_traffic_anomalies.csv', index=False)


Number of anomalies detected: 1
    duration  src_bytes  dst_bytes  wrong_fragment  urgent  count  srv_count  \
14      1000      50000      60000               0       0     50         50   

    anomaly  
14       -1  
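
The `-1` / `1` labels returned by `predict` come from thresholding a continuous anomaly score, and the `contamination` setting determines where that threshold falls. The sketch below illustrates this on synthetic data, not on `network_traffic.csv`: the array `X_demo`, the two planted outliers, and all variable names are assumptions made for illustration only.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Hypothetical stand-in for numeric traffic features: 200 "normal" points
# plus two planted outliers far from the main cluster.
rng = np.random.default_rng(42)
normal = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
outliers = np.array([[8.0, 8.0], [-9.0, 7.5]])
X_demo = np.vstack([normal, outliers])

model = IsolationForest(n_estimators=100, contamination=0.05, random_state=42)
model.fit(X_demo)

labels = model.predict(X_demo)            # 1 = normal, -1 = anomaly
scores = model.decision_function(X_demo)  # lower means more anomalous

# predict() flags exactly the rows whose decision_function value is negative
labels_match = np.array_equal(labels == -1, scores < 0)

# The share of flagged training rows tracks the contamination setting
flagged_fraction = float(np.mean(labels == -1))

# Rank rows from most to least anomalous
ranking = np.argsort(scores)

print(f"Labels agree with score threshold: {labels_match}")
print(f"Fraction flagged: {flagged_fraction:.3f}")
print(f"Five most anomalous row indices: {ranking[:5].tolist()}")
```

Because the threshold is derived from `contamination`, roughly that share of the training rows is flagged whether or not they are genuinely unusual. Sorting by `decision_function` gives a ranking that can be reviewed independently of the chosen cutoff.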
