# 🛡️ NetGuardian-AI: Complete IDS Pipeline (Part 2)

**Phases 4-7: Model Training, Evaluation, Simulation & Comparison**

This notebook continues from Part 1 with:
- Phase 4: Hybrid Model Training
- Phase 5: Model Evaluation
- Phase 6: Real-Time Simulation
- Phase 7: Model Comparison

---

<a id='phase4'></a>
## 🎯 Phase 4: Hybrid Model Training

**Purpose**: Train a two-stage hybrid IDS.

**Architecture**:
1. **Model 1 (Binary)**: Detects if traffic is normal or malicious
2. **Model 2 (Multi-Class)**: Identifies the specific attack type

**Why Hybrid?**
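- **Faster**: The binary check runs on all traffic; the multi-class model runs only on samples flagged as attacks
- **More Accurate**: Each model can specialize in a single task (detection vs. attack-type classification)
- **Scalable**: When most traffic is normal, it exits after the first stage, which keeps per-sample cost low at high volume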

### Step 4.1: Train Binary Detection Model

**Explanation**: XGBoost is chosen for its:
- **Speed**: Fast training and prediction
- **Accuracy**: Handles complex patterns well
- **Imbalance handling**: `scale_pos_weight` parameter balances classes
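
As a worked example of the imbalance ratio: `scale_pos_weight` scales the weight of positive (attack) samples, and a common choice is the ratio of negative to positive counts. The sketch below is a minimal illustration; the helper name `neg_pos_ratio` and the 900/100 split are assumptions made up for the example, not the dataset's actual distribution.

```python
from collections import Counter

def neg_pos_ratio(labels):
    """Return (#negatives / #positives) for an iterable of 0/1 labels."""
    counts = Counter(labels)
    if counts[1] == 0:
        raise ValueError("no positive samples; the ratio is undefined")
    return counts[0] / counts[1]

# Hypothetical labels: 900 normal (0) and 100 attack (1) samples
y_toy = [0] * 900 + [1] * 100
ratio = neg_pos_ratio(y_toy)
print(f"scale_pos_weight = {ratio:.2f}")  # 900 / 100 = 9.00
```

A ratio above 1 means attacks are the minority class, so each attack sample counts more heavily in the loss.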

In [None]:
# Calculate class weight for imbalanced data
scale_pos_weight = len(y_binary_train[y_binary_train==0]) / len(y_binary_train[y_binary_train==1])
print(f"Scale pos weight: {scale_pos_weight:.2f}")

# Train binary classifier
print("\n🚀 Training Model 1 (Binary Detection)...")
model1 = XGBClassifier(
    n_estimators=200,        # Number of trees
    max_depth=10,            # Maximum tree depth
    learning_rate=0.1,       # Step size shrinkage
    scale_pos_weight=scale_pos_weight,  # Handle class imbalance
    random_state=42,
    n_jobs=-1                # Use all CPU cores
)

model1.fit(X_train_scaled, y_binary_train)
print("✅ Model 1 trained successfully")

### Step 4.2: Evaluate Binary Model

**Explanation**: We evaluate using multiple metrics:
- **Precision**: Of predicted attacks, how many are real?
- **Recall**: Of real attacks, how many did we catch?
- **F1-Score**: Harmonic mean of precision and recall
- **AUC-ROC**: Overall discrimination ability
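
To make these definitions concrete, here is a small sketch that computes them by hand. The counts and scores are hypothetical, and the helper names `precision_recall_f1` and `pairwise_auc` are introduced only for this illustration; the notebook itself relies on `classification_report` and `roc_auc_score`.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

def pairwise_auc(attack_scores, normal_scores):
    """AUC as the fraction of (attack, normal) pairs ranked correctly; ties count half."""
    wins = sum((a > n) + 0.5 * (a == n)
               for a in attack_scores for n in normal_scores)
    return wins / (len(attack_scores) * len(normal_scores))

# Hypothetical counts: 90 attacks caught, 10 false alarms, 30 attacks missed
p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
# precision=0.900 recall=0.750 f1=0.818

# Hypothetical predicted attack probabilities for 3 attacks and 4 normal flows
auc_toy = pairwise_auc([0.9, 0.8, 0.4], [0.5, 0.3, 0.2, 0.1])
print(f"AUC={auc_toy:.3f}")  # 11 of 12 pairs ranked correctly -> 0.917
```

The pairwise view shows why AUC-ROC does not depend on a decision threshold: it only asks whether attacks score higher than normal traffic.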

In [None]:
# Make predictions
y_binary_pred = model1.predict(X_test_scaled)
y_binary_proba = model1.predict_proba(X_test_scaled)[:, 1]

# Evaluation
print("="*70)
print("MODEL 1: BINARY DETECTION")
print("="*70)
print(classification_report(y_binary_test, y_binary_pred, target_names=['Normal', 'Attack']))

# AUC-ROC
auc = roc_auc_score(y_binary_test, y_binary_proba)
print(f"\nAUC-ROC: {auc:.4f}")

In [None]:
# Confusion matrix visualization
cm = confusion_matrix(y_binary_test, y_binary_pred)

plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', 
            xticklabels=['Normal', 'Attack'],
            yticklabels=['Normal', 'Attack'])
plt.title('Confusion Matrix - Binary Detection', fontsize=14, fontweight='bold')
plt.ylabel('True Label')
plt.xlabel('Predicted Label')
plt.tight_layout()
plt.show()

### Step 4.3: Train Multi-Class Model with SMOTE

**Explanation**: 
- We train only on samples labeled as attacks in the training set (ground truth `y_binary_train == 1`), not on normal traffic
- **SMOTE** (Synthetic Minority Over-sampling Technique) creates synthetic samples for minority classes
- This balances the dataset and improves detection of rare attacks
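
The core idea behind SMOTE can be sketched in a few lines: a synthetic point is placed at a random position on the segment between a minority sample and one of its nearest minority-class neighbours. The function below is a simplified illustration under that assumption, not imbalanced-learn's implementation; `smote_like_samples` and the toy data are hypothetical.

```python
import numpy as np

def smote_like_samples(X_min, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    # Pairwise Euclidean distances within the minority class
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)            # a point is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]

    base = rng.integers(0, len(X_min), size=n_new)          # random base samples
    nb = neighbours[base, rng.integers(0, k, size=n_new)]   # one neighbour per base
    lam = rng.random((n_new, 1))                            # position along the segment
    return X_min[base] + lam * (X_min[nb] - X_min[base])

# Toy minority class: the four corners of the unit square
X_rare = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
X_new = smote_like_samples(X_rare, n_new=6, k=2)
print(X_new.shape)  # (6, 2)
```

Because every synthetic point is a convex combination of two real samples, it stays inside the region the minority class already occupies. It also follows that a class needs more samples than the number of neighbours requested, so very rare attack types may require a smaller `k_neighbors`.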

In [None]:
# Filter only attacks for training
attack_mask_train = y_binary_train == 1
X_train_attacks = X_train_scaled[attack_mask_train]
y_multi_train_attacks = y_multi_train[attack_mask_train]

print(f"Attack samples for training: {X_train_attacks.shape}")
print(f"\nClass distribution before SMOTE:")
print(y_multi_train_attacks.value_counts())

In [None]:
# Apply SMOTE to balance classes
print("\n🔄 Applying SMOTE...")
smote = SMOTE(random_state=42, k_neighbors=5)
X_train_resampled, y_multi_resampled = smote.fit_resample(X_train_attacks, y_multi_train_attacks)

print(f"✅ After SMOTE: {X_train_resampled.shape}")
print(f"\nClass distribution after SMOTE:")
print(pd.Series(y_multi_resampled).value_counts())

In [None]:
# Train multi-class classifier
print("\n🚀 Training Model 2 (Multi-Class Classification)...")
model2 = XGBClassifier(
    n_estimators=300,
    max_depth=12,
    learning_rate=0.1,
    random_state=42,
    n_jobs=-1
)

model2.fit(X_train_resampled, y_multi_resampled)
print("✅ Model 2 trained successfully")

### Step 4.4: Evaluate Multi-Class Model

**Explanation**: We evaluate only on true attack samples from the test set, so these scores measure attack-type classification in isolation from any Stage 1 detection errors.

In [None]:
# Evaluate on test attacks
attack_mask_test = y_binary_test == 1
X_test_attacks = X_test_scaled[attack_mask_test]
y_multi_test_attacks = y_multi_test[attack_mask_test]

y_multi_pred = model2.predict(X_test_attacks)

print("="*70)
print("MODEL 2: MULTI-CLASS CLASSIFICATION")
print("="*70)
print(classification_report(y_multi_test_attacks, y_multi_pred, target_names=le.classes_))

In [None]:
# Multi-class confusion matrix
cm_multi = confusion_matrix(y_multi_test_attacks, y_multi_pred)

plt.figure(figsize=(10, 8))
sns.heatmap(cm_multi, annot=True, fmt='d', cmap='YlOrRd',
            xticklabels=le.classes_,
            yticklabels=le.classes_)
plt.title('Confusion Matrix - Multi-Class', fontsize=14, fontweight='bold')
plt.ylabel('True Label')
plt.xlabel('Predicted Label')
plt.xticks(rotation=45, ha='right')
plt.yticks(rotation=0)
plt.tight_layout()
plt.show()

### Step 4.5: Create Hybrid IDS Class

**Explanation**: This class combines both models into a single prediction pipeline:
1. **Step 1**: Binary model checks if traffic is malicious
2. **Step 2**: If malicious, multi-class model identifies the attack type
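
The two-stage routing can also be applied to a whole batch at once, which avoids one Stage 2 call per flagged row. The sketch below illustrates this with stand-in models: `two_stage_predict`, `toy_binary`, and `toy_multiclass` are hypothetical names, and the threshold rules are placeholders for the trained classifiers.

```python
import numpy as np

def two_stage_predict(X, is_attack_fn, attack_type_fn, normal_label="Normal_Traffic"):
    """Route only rows flagged by stage 1 through stage 2, in a single batch."""
    X = np.asarray(X)
    flags = np.asarray(is_attack_fn(X)).astype(bool)    # stage 1 runs on every row
    labels = np.full(len(X), normal_label, dtype=object)
    if flags.any():                                      # stage 2 runs on flagged rows only
        labels[flags] = attack_type_fn(X[flags])
    return labels

# Hypothetical stand-ins for the two trained models
def toy_binary(X):
    return X[:, 0] > 0.5            # "attack" if the first feature is large

def toy_multiclass(X):
    return np.where(X[:, 1] > 0.5, "DoS_DDoS", "PortScan")

X_toy = np.array([[0.1, 0.9], [0.9, 0.9], [0.8, 0.1], [0.2, 0.2]])
labels_toy = two_stage_predict(X_toy, toy_binary, toy_multiclass)
print(labels_toy.tolist())
# ['Normal_Traffic', 'DoS_DDoS', 'PortScan', 'Normal_Traffic']
```

The boolean mask keeps results aligned with the input order, and the `flags.any()` guard avoids calling the second model on an empty batch.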

In [None]:
class HybridIDS:
    """
    Two-stage hybrid intrusion detection system
    """
    
    def __init__(self, binary_model, multiclass_model, scaler, label_encoder):
        self.binary_model = binary_model
        self.multiclass_model = multiclass_model
        self.scaler = scaler
        self.le = label_encoder
    
    def predict(self, X):
        """
        Two-stage prediction
        
        Args:
            X: Features (DataFrame or array)
        
        Returns:
            List of prediction dictionaries
        """
        # Normalize
        X_scaled = self.scaler.transform(X)
        
        # Stage 1: Binary detection
        is_attack = self.binary_model.predict(X_scaled)
        binary_proba = self.binary_model.predict_proba(X_scaled)
        
        results = []
        
        for i, (attack_flag, proba) in enumerate(zip(is_attack, binary_proba)):
            if attack_flag == 0:
                # Normal traffic
                results.append({
                    'type': 'Normal_Traffic',
                    'confidence': float(proba[0]),
                    'is_attack': False
                })
            else:
                # Attack detected → Stage 2: Classify
                attack_type_encoded = self.multiclass_model.predict(X_scaled[i:i+1])[0]
                attack_proba = self.multiclass_model.predict_proba(X_scaled[i:i+1])[0]
                attack_type = self.le.inverse_transform([attack_type_encoded])[0]
                
                results.append({
                    'type': attack_type,
                    'confidence': float(attack_proba.max()),
                    'is_attack': True
                })
        
        return results
    
    def predict_df(self, X):
        """Return predictions as DataFrame"""
        results = self.predict(X)
        return pd.DataFrame(results)

print("✅ HybridIDS class created")

### Step 4.6: Test Hybrid System

**Explanation**: We test the complete hybrid system on a sample to verify it works correctly.

In [None]:
# Create hybrid system
hybrid_ids = HybridIDS(
    binary_model=model1,
    multiclass_model=model2,
    scaler=scaler,
    label_encoder=le
)

# Test on sample
sample_size = 100
X_sample = X_test.iloc[:sample_size]
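# Ground-truth labels must use the same names the model predicts
# (Step 5.1 reads 'Attack_Merged'; check that 'Attack_Type' matches le.classes_)
y_true = df_clean.loc[X_sample.index, 'Attack_Type']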

# Predictions
predictions = hybrid_ids.predict_df(X_sample)

# Compare with ground truth
comparison = pd.DataFrame({
    'True_Label': y_true.values,
    'Predicted_Label': predictions['type'].values,
    'Confidence': predictions['confidence'].values,
    'Is_Attack': predictions['is_attack'].values
})

print("Sample predictions:")
display(comparison.head(20))

# Calculate accuracy
correct = (comparison['True_Label'] == comparison['Predicted_Label']).sum()
accuracy = correct / len(comparison) * 100

print(f"\n{'='*70}")
print(f"HYBRID SYSTEM PERFORMANCE")
print(f"{'='*70}")
print(f"Accuracy: {accuracy:.2f}%")
print(f"Correct predictions: {correct}/{len(comparison)}")
print(f"{'='*70}")

### Step 4.7: Save Models

**Explanation**: We save all components for deployment and future use.
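
One reason to store `feature_names` alongside the models is that inference inputs must arrive in the same column order used during training. The sketch below shows a pickle round trip and a column-alignment check. It assumes a placeholder bundle (strings instead of fitted models) and made-up feature names `feat_a` and `feat_b`; `align_features` is a hypothetical helper, not part of the notebook.

```python
import os
import pickle
import tempfile

def align_features(record, feature_names):
    """Order one record's values to match the training-time feature order."""
    missing = [name for name in feature_names if name not in record]
    if missing:
        raise KeyError(f"missing features: {missing}")
    return [record[name] for name in feature_names]

# Placeholder bundle: a string stands in for a fitted model object
bundle = {"binary_model": "model1-placeholder",
          "feature_names": ["feat_a", "feat_b"]}

# Save and reload the bundle through a temporary file
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "bundle_demo.pkl")
    with open(path, "wb") as f:
        pickle.dump(bundle, f)
    with open(path, "rb") as f:
        loaded = pickle.load(f)

# Incoming record with keys in a different order than training
row = {"feat_b": 2.0, "feat_a": 1.0}
aligned = align_features(row, loaded["feature_names"])
print(aligned)  # [1.0, 2.0]
```

Unpickling can execute arbitrary code, so saved bundles should only be loaded from files you created or otherwise trust.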

In [None]:
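# Save individual models and preprocessing objects
joblib.dump(model1, 'model1_binary.pkl')
joblib.dump(model2, 'model2_multiclass.pkl')
joblib.dump(scaler, 'scaler.pkl')            # saved here so the file list below is accurate
joblib.dump(le, 'label_encoder.pkl')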

# Save complete system
import pickle
hybrid_system = {
    'binary_model': model1,
    'multiclass_model': model2,
    'scaler': scaler,
    'label_encoder': le,
    'feature_names': X.columns.tolist()
}

with open('hybrid_ids_system.pkl', 'wb') as f:
    pickle.dump(hybrid_system, f)

print("✅ All models saved successfully")
print("\nFiles created:")
print("  - model1_binary.pkl")
print("  - model2_multiclass.pkl")
print("  - hybrid_ids_system.pkl")
print("  - scaler.pkl")
print("  - label_encoder.pkl")

<a id='phase5'></a>
## 📊 Phase 5: Model Evaluation & Robustness

**Purpose**: Thoroughly evaluate the hybrid system.

**What this phase does**:
1. Evaluates performance on clean test data
2. Tests robustness against noisy data (simulating real-world conditions)
3. Analyzes error patterns
4. Identifies weaknesses for improvement

### Step 5.1: Robustness Testing

**Explanation**: Real-world network traffic is rarely as clean as a curated dataset. We add Gaussian noise as a rough proxy for:
- Network jitter
- Measurement errors
- Hardware variations

A robust model should maintain performance even with noise.

In [None]:
def add_noise(X, noise_level=0.1, seed=42):
    """
    Add zero-mean Gaussian noise to features
    
    Args:
        X: Feature DataFrame (unscaled)
        noise_level: Noise standard deviation, as a fraction of each feature's std
        seed: Seed for the random generator, so runs are reproducible
    
    Returns:
        Noisy feature DataFrame
    """
    rng = np.random.default_rng(seed)
    noise = rng.normal(0, noise_level, X.shape)
    X_noisy = X + noise * X.std().values
    return X_noisy

# Test with different noise levels
noise_levels = [0.0, 0.1, 0.2, 0.5]  # 0%, 10%, 20%, 50%
accuracies = []

print("🛡️ Robustness Testing...")
print("="*50)

# Get ground truth
y_true_merged = df_clean.loc[X_test.index, 'Attack_Merged']

for nl in noise_levels:
    X_noisy = add_noise(X_test, noise_level=nl)
    res_noisy = hybrid_ids.predict_df(X_noisy)
    acc = accuracy_score(y_true_merged, res_noisy['type'])
    accuracies.append(acc)
    print(f"Noise {nl*100:3.0f}% → Accuracy: {acc:.2%}")

print("="*50)

In [None]:
# Visualize robustness
plt.figure(figsize=(10, 6))
plt.plot(noise_levels, accuracies, marker='o', linestyle='-', color='red', linewidth=2, markersize=8)
plt.title('Model Robustness vs Noise Level', fontsize=14, fontweight='bold')
plt.xlabel('Noise Level (fraction of std)', fontsize=12)
plt.ylabel('Accuracy', fontsize=12)
plt.grid(True, alpha=0.3)
plt.ylim([0, 1.1])
plt.tight_layout()
plt.show()

print("\n📊 Interpretation:")
print("- If accuracy stays >90% with 10% noise: Excellent robustness")
print("- If accuracy drops <80% with 20% noise: Model may struggle in real-world")

### Step 5.2: Error Analysis

**Explanation**: Understanding where the model fails helps us improve it. We analyze:
- Which attack types are confused with each other
- False positives (normal traffic flagged as attack)
- False negatives (attacks missed)
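
These three error categories can be counted directly from paired label lists. The sketch below is a minimal illustration; it assumes normal traffic is labeled `'Normal_Traffic'` in both the ground truth and the predictions (the string the `HybridIDS` class emits), and `error_breakdown` plus the toy labels are hypothetical.

```python
def error_breakdown(true_labels, pred_labels, normal_label="Normal_Traffic"):
    """Split prediction errors into false positives, false negatives,
    and attacks assigned the wrong attack type."""
    fp = fn = wrong_type = 0
    for t, p in zip(true_labels, pred_labels):
        if t == normal_label and p != normal_label:
            fp += 1                 # normal traffic flagged as an attack
        elif t != normal_label and p == normal_label:
            fn += 1                 # attack missed entirely
        elif t != p:
            wrong_type += 1         # attack detected but mislabeled
    return {"false_positives": fp,
            "false_negatives": fn,
            "wrong_attack_type": wrong_type}

# Toy labels for illustration only
y_true_toy = ["Normal_Traffic", "DoS_DDoS", "PortScan", "Normal_Traffic", "DoS_DDoS"]
y_pred_toy = ["DoS_DDoS", "DoS_DDoS", "Normal_Traffic", "Normal_Traffic", "PortScan"]
breakdown = error_breakdown(y_true_toy, y_pred_toy)
print(breakdown)
# {'false_positives': 1, 'false_negatives': 1, 'wrong_attack_type': 1}
```

Separating the counts this way makes it easier to see whether errors come from Stage 1 (false positives and false negatives) or from Stage 2 (wrong attack type).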

In [None]:
# Analyze errors
predictions_full = hybrid_ids.predict_df(X_test)
results_df = pd.DataFrame({
    'True_Label': y_true_merged.values,
    'Predicted_Label': predictions_full['type'].values
})

# Find errors
errors = results_df[results_df['True_Label'] != results_df['Predicted_Label']]

print(f"Total errors: {len(errors):,} out of {len(results_df):,} ({len(errors)/len(results_df)*100:.2f}%)")
print("\nTop 5 confusion pairs (True → Predicted):")
confusion_counts = errors.groupby(['True_Label', 'Predicted_Label']).size().sort_values(ascending=False).head(5)
print(confusion_counts)

print("\n💡 Insights:")
print("- If DoS_DDoS ↔ PortScan confusion: These attacks have similar network patterns")
print("- If Normal → Attack: False positives (may annoy users)")
print("- If Attack → Normal: False negatives (CRITICAL security risk!)")