# 📌 03 – Model Training (Random Forest + Isolation Forest)

“This notebook trains a supervised model (RandomForest) and an unsupervised model (IsolationForest) on the engineered features to detect web anomalies.”


*– Imports*

In [7]:
import sys
import os

# Add src/ to path
sys.path.append(os.path.abspath("../src"))

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier, IsolationForest
from sklearn.metrics import classification_report
import joblib

from features import build_features


*– Load dataset + compute features:*

In [8]:
df = pd.read_csv("../data/csic_database.csv")

# Ensure label type
df["label"] = df["label"].astype(int)

# Build features
X, feature_cols = build_features(df)
y = df["label"]

print(f"✔ Dataset loaded: {len(df)} samples")
print(f"✔ Features used: {feature_cols}")

✔ Dataset loaded: 123042 samples
‚úî Features used: ['url_length', 'param_count', 'has_special_chars', 'has_sql', 'method_encoded', 'payload_length', 'url_entropy']
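
The implementation of `build_features` lives in `src/features.py` and is not shown in this notebook. As a rough illustration only, a few of the listed features could be computed along the lines below. The helper names `shannon_entropy` and `sketch_features`, and the idea of deriving everything from a single URL string, are assumptions made for this sketch and may differ from the project's actual code.

```python
import math
from collections import Counter
from urllib.parse import parse_qsl, urlsplit

import pandas as pd


def shannon_entropy(text: str) -> float:
    """Shannon entropy in bits per character; 0.0 for an empty string."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def sketch_features(urls: pd.Series) -> pd.DataFrame:
    """Hypothetical stand-in for the URL-based part of build_features."""
    return pd.DataFrame({
        "url_length": urls.str.len(),
        # Count key/value pairs in the query string, keeping empty values.
        "param_count": urls.map(
            lambda u: len(parse_qsl(urlsplit(u).query, keep_blank_values=True))
        ),
        "url_entropy": urls.map(shannon_entropy),
    })


# Made-up example URLs, not rows from the dataset.
demo = pd.Series([
    "http://localhost:8080/index.jsp",
    "http://localhost:8080/search.jsp?q=shoes&page=2",
])
print(sketch_features(demo))
```

Entropy is a common proxy for "random-looking" strings such as encoded payloads, which is presumably why a `url_entropy` column appears in the feature list.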


*– Train/Test Split:*

In [9]:
X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.2,
    stratify=y,
    random_state=42
)
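
Because `stratify=y` is passed, both partitions should end up with nearly the same share of each class. A quick way to check this is sketched below on synthetic labels. The data and the `*_demo` names are made up for illustration; the same two `mean()` calls could be applied to `y_train` and `y_test`.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
# Synthetic stand-ins for X and y (roughly 28% positives); not the CSIC data.
X_demo = pd.DataFrame(rng.normal(size=(1000, 3)), columns=["f1", "f2", "f3"])
y_demo = pd.Series((rng.random(1000) < 0.28).astype(int), name="label")

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=42
)

# With stratification the positive rate should match closely in both splits.
train_rate = y_tr.mean()
test_rate = y_te.mean()
print(f"train positive rate: {train_rate:.3f}")
print(f"test positive rate:  {test_rate:.3f}")
```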

*– Train RandomForest (supervised):*

In [10]:
print("=== Training RandomForest ===")

rf = RandomForestClassifier(
    n_estimators=200,
    random_state=42
)

rf.fit(X_train, y_train)

print("\n=== RandomForest Evaluation ===")
print(classification_report(y_test, rf.predict(X_test)))

joblib.dump(rf, "../models/rf_model.joblib")
print("✔ RandomForest saved → ../models/rf_model.joblib")


=== Training RandomForest ===

=== RandomForest Evaluation ===
              precision    recall  f1-score   support

           0       0.80      0.98      0.88     17600
           1       0.89      0.39      0.54      7009

    accuracy                           0.81     24609
   macro avg       0.85      0.68      0.71     24609
weighted avg       0.83      0.81      0.78     24609

✔ RandomForest saved → ../models/rf_model.joblib
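
The report shows a recall of only 0.39 for class 1, so most samples of that class are missed at the default decision rule. Two common levers are `class_weight="balanced"` and a lower probability cutoff on `predict_proba`. The sketch below shows both on synthetic data; whether either helps on the real features is untested here, and the 0.3 cutoff is an arbitrary illustrative value.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in data (about 28% positives); not the CSIC data.
X_demo, y_demo = make_classification(
    n_samples=2000, n_features=7, n_informative=4,
    weights=[0.72, 0.28], flip_y=0.05, random_state=42,
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=42
)

# class_weight="balanced" reweights samples inversely to class frequency.
rf_demo = RandomForestClassifier(
    n_estimators=100, class_weight="balanced", random_state=42
)
rf_demo.fit(X_tr, y_tr)

# Probability of the positive class (column order follows rf_demo.classes_).
proba = rf_demo.predict_proba(X_te)[:, 1]

# Compare recall at a 0.5 cutoff and at a lower, illustrative one.
for threshold in (0.5, 0.3):
    preds = (proba >= threshold).astype(int)
    print(f"threshold={threshold}: recall={recall_score(y_te, preds):.2f}")
```

Lowering the cutoff can only keep or raise recall, at the cost of precision, so the choice depends on how expensive a missed attack is compared with a false alarm.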


*– Train IsolationForest (unsupervised):*

In [11]:
print("\n=== Training IsolationForest ===")

iso = IsolationForest(
    contamination=0.05,
    random_state=42
)

iso.fit(X_train)

joblib.dump(iso, "../models/iso_model.joblib")
print("✔ IsolationForest saved → ../models/iso_model.joblib")



=== Training IsolationForest ===
✔ IsolationForest saved → ../models/iso_model.joblib
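
The IsolationForest is saved without being scored. Since labels are available, its predictions can be compared with them in the same way as for the RandomForest. `IsolationForest.predict` returns `+1` for inliers and `-1` for outliers, so the output has to be mapped to 0/1 first. The sketch below uses synthetic data and assumes that label 1 marks the anomalous class.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import classification_report

rng = np.random.default_rng(42)
# Synthetic stand-in data: 950 "normal" points near the origin, 50 far away.
normal = rng.normal(loc=0.0, scale=1.0, size=(950, 2))
anomalous = rng.normal(loc=6.0, scale=1.0, size=(50, 2))
X_demo = np.vstack([normal, anomalous])
y_demo = np.array([0] * 950 + [1] * 50)

iso_demo = IsolationForest(contamination=0.05, random_state=42)
iso_demo.fit(X_demo)

# predict() returns +1 for inliers and -1 for outliers; map to 0/1 labels.
iso_pred = (iso_demo.predict(X_demo) == -1).astype(int)

print(classification_report(y_demo, iso_pred))
```

Note that `contamination=0.05` makes the model flag roughly 5% of the training samples, whereas about 28% of the test samples above carry label 1 (7009 of 24609). If label 1 does mark anomalies, that gap alone would cap the recall of the unsupervised model, so the parameter may be worth revisiting.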


In [12]:

print("\n Training completed successfully.")



 Training completed successfully.
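
The two `.joblib` files can later be reloaded with `joblib.load` and applied to new feature rows. The sketch below is self-contained: it trains small throwaway models on synthetic data and writes them to a temporary directory. In the project, the paths would presumably be `../models/rf_model.joblib` and `../models/iso_model.joblib`, and the input would come from `build_features`.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import IsolationForest, RandomForestClassifier

rng = np.random.default_rng(0)
# Tiny synthetic stand-in data with 7 columns, matching the feature count.
X_demo = rng.normal(size=(200, 7))
y_demo = (X_demo[:, 0] > 0).astype(int)

with tempfile.TemporaryDirectory() as tmp:
    rf_path = os.path.join(tmp, "rf_model.joblib")
    iso_path = os.path.join(tmp, "iso_model.joblib")

    rf_small = RandomForestClassifier(n_estimators=10, random_state=42)
    joblib.dump(rf_small.fit(X_demo, y_demo), rf_path)
    joblib.dump(IsolationForest(random_state=42).fit(X_demo), iso_path)

    # Reload, as a later notebook or service might do.
    rf_loaded = joblib.load(rf_path)
    iso_loaded = joblib.load(iso_path)

sample = X_demo[:5]
rf_labels = rf_loaded.predict(sample)         # 0/1 class labels
iso_flags = iso_loaded.predict(sample) == -1  # True where flagged as outlier
print(rf_labels, iso_flags)
```

Persisted scikit-learn models are generally only safe to load with the same library version that produced them, so it helps to record the version alongside the files.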
