# 🔐 IoT Intrusion Detection – Anomaly Detection Notebook

This notebook presents the preprocessing, feature scaling, and anomaly detection
process applied to an IoT network traffic dataset using Isolation Forest.

**Goals:**
- Prepare the dataset
- Handle IP addresses
- Balance classes (SMOTE)
- Perform anomaly detection
- Evaluate results


In [None]:
import pandas as pd
import ipaddress
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import IsolationForest
from sklearn.metrics import classification_report, confusion_matrix
from imblearn.over_sampling import SMOTE


## 1. Load Dataset


In [None]:
df = pd.read_csv("balanced_dataset.csv")
df.head()


## 2. Convert IP Addresses to Numerical Format
Most machine learning models, including scikit-learn estimators, require numeric input and cannot consume IP addresses stored as strings.
Each address is therefore converted to its integer representation.


In [None]:
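def ip_to_int(ip):
    """Convert an IPv4/IPv6 address string to its integer value.

    Values that cannot be parsed (including missing ones) map to 0.
    """
    try:
        return int(ipaddress.ip_address(ip))
    except ValueError:
        return 0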

df['ip.src'] = df['ip.src'].apply(ip_to_int)
df['ip.dst'] = df['ip.dst'].apply(ip_to_int)
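
As a quick illustration of what this conversion produces, the sketch below applies the same function to a few literal values. The sample addresses are made up for illustration and are not taken from the dataset. Note that the fallback value 0 is also the integer form of `0.0.0.0`, so unparseable entries become indistinguishable from that address.

```python
import ipaddress

def ip_to_int(ip):
    """Same conversion as in the cell above: IP string -> integer, 0 on failure."""
    try:
        return int(ipaddress.ip_address(ip))
    except ValueError:
        return 0

# Sample inputs chosen for illustration only.
samples = ["192.168.1.1", "10.0.0.5", "0.0.0.0", "not-an-ip"]
converted = {ip: ip_to_int(ip) for ip in samples}

for ip, value in converted.items():
    print(f"{ip:>12} -> {value}")
```

If separating invalid entries from `0.0.0.0` matters for the analysis, one option would be to count the failures before converting, or to keep a separate indicator column.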


## 3. Feature and Label Preparation


In [None]:
X = df.drop(columns=['label'])
y = df['label']


## 4. Binary Label Encoding
Normal traffic = 0  
Attack traffic = 1


In [None]:
df['label_binary'] = df['label'].apply(
    lambda x: 0 if 'normal' in str(x).lower() else 1
)

df['label_binary'].value_counts()
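
The substring rule used in the lambda can be checked on a few label strings. The labels below are hypothetical, not values from the dataset. The last one shows that any label merely containing "normal" (such as "abnormal") would also be encoded as 0, so it may be worth inspecting `df['label'].unique()` before relying on this rule.

```python
def to_binary(label):
    """Same rule as the lambda above: 0 if the label contains 'normal', else 1."""
    return 0 if 'normal' in str(label).lower() else 1

# Hypothetical label values for illustration.
labels = ["Normal", "normal_traffic", "DDoS", "mirai-scan", "abnormal"]
encoded = [to_binary(label) for label in labels]

print(dict(zip(labels, encoded)))
```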


## 5. Feature Scaling


In [None]:
X = df.drop(['label', 'label_binary'], axis=1)

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
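
`StandardScaler` standardizes each column independently to zero mean and unit variance. A minimal sketch on a made-up two-column array (the values are assumptions for illustration, with one column on the scale of an integer IP address):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy feature matrix: one large-valued column (like an integer IP) and one small one.
X_demo = np.array([
    [3232235777.0, 0.1],
    [3232235778.0, 0.4],
    [167772165.0, 0.2],
    [167772166.0, 0.3],
])

X_demo_scaled = StandardScaler().fit_transform(X_demo)

print(X_demo_scaled.mean(axis=0))  # approximately 0 for each column
print(X_demo_scaled.std(axis=0))   # approximately 1 for each column
```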


## 6. Anomaly Detection using Isolation Forest


In [None]:
iso = IsolationForest(contamination=0.1, random_state=42)
df['anomaly_pred'] = iso.fit_predict(X_scaled)

# fit_predict returns -1 for anomalies and 1 for inliers; recode to 0 = anomaly, 1 = normal.
df['anomaly_pred'] = df['anomaly_pred'].apply(
    lambda x: 0 if x == -1 else 1
)
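
`fit_predict` returns -1 for points the forest scores as anomalous and 1 for inliers, and `contamination` sets the approximate share of training points that end up flagged. The sketch below shows this on synthetic data, which is an assumption for illustration: a cluster of random points plus one point placed far away on purpose.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# 50 points clustered near the origin plus one point placed far away on purpose.
inliers = rng.normal(loc=0.0, scale=1.0, size=(50, 2))
outlier = np.array([[20.0, 20.0]])
X_demo = np.vstack([inliers, outlier])

iso_demo = IsolationForest(contamination=0.1, random_state=42)
raw_pred = iso_demo.fit_predict(X_demo)

print(sorted(set(raw_pred.tolist())))  # the distinct labels returned
print(raw_pred[-1])                    # label assigned to the far-away point
print((raw_pred == -1).mean())         # share of points flagged as anomalous
```

Because `contamination=0.1` flags roughly 10% of the rows regardless of how many attacks the data actually contains, recall on the attack class is limited if attacks make up a much larger share of the dataset.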


## 7. Model Evaluation


In [None]:
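# anomaly_pred uses 0 = anomaly while label_binary uses 1 = attack, so flip before scoring.
attack_pred = 1 - df['anomaly_pred']
print(confusion_matrix(df['label_binary'], attack_pred))
print(classification_report(df['label_binary'], attack_pred))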
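
The two columns use opposite conventions (`label_binary`: 1 = attack; `anomaly_pred`: 0 = anomaly), so the metrics are only meaningful once both are expressed the same way. A toy sketch with made-up arrays shows how a perfect detector looks under each comparison:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Toy ground truth in the label_binary convention: 0 = normal, 1 = attack.
y_true = np.array([0, 0, 1, 1])
# Toy predictions in the anomaly_pred convention: 0 = anomaly, 1 = normal.
# These flag exactly the two attacks, i.e. a perfect detector.
anomaly_pred = np.array([1, 1, 0, 0])

cm_mismatched = confusion_matrix(y_true, anomaly_pred)
cm_aligned = confusion_matrix(y_true, 1 - anomaly_pred)

print(cm_mismatched)  # all counts land off the diagonal
print(cm_aligned)     # all counts land on the diagonal
```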


## 8. Anomaly Distribution


In [None]:
plt.figure(figsize=(6, 4))
sns.countplot(x='anomaly_pred', data=df)
plt.title("Detected Anomalies")
plt.xlabel("Prediction (0=Anomaly, 1=Normal)")
plt.ylabel("Count")
plt.show()
