<a href="https://colab.research.google.com/github/Raziasultan-786/machine-learning-01/blob/main/CICIDS_2017_ML_Analysis_Part4.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# CMP7239 Applied Machine Learning Assignment - Part 4
## Model Evaluation and Performance Analysis

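**Note:** This is the final part of the analysis. Run Parts 1, 2, and 3 before running this notebook: the cells below expect variables such as `y_test`, `target_encoder`, `rf_pred`, and `rf_pred_proba` to be defined already, and skip any model whose predictions are missing.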


## 7. Model Performance Evaluation

In [None]:
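# Imports used throughout Part 4 (harmless if Parts 1-3 already imported them)
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import (
    accuracy_score, precision_score, recall_score, f1_score,
    roc_auc_score, roc_curve, auc,
    classification_report, confusion_matrix,
)

def evaluate_model_performance(y_true, y_pred, y_pred_proba, model_name, target_encoder):
    """
    Print accuracy, weighted precision/recall/F1, ROC-AUC and a classification
    report, plot the confusion matrix, and return the metrics as a dict.
    """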
    print(f"\n=== {model_name.upper()} PERFORMANCE EVALUATION ===")
    print("=" * 60)

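    # Basic metrics (precision, recall and F1 are support-weighted averages over classes)
    accuracy = accuracy_score(y_true, y_pred)
    # zero_division=0 scores a class that is never predicted as 0 without emitting a warning
    precision = precision_score(y_true, y_pred, average='weighted', zero_division=0)
    recall = recall_score(y_true, y_pred, average='weighted', zero_division=0)
    f1 = f1_score(y_true, y_pred, average='weighted', zero_division=0)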

    print(f"Accuracy:  {accuracy:.4f}")
    print(f"Precision: {precision:.4f}")
    print(f"Recall:    {recall:.4f}")
    print(f"F1-Score:  {f1:.4f}")

    # ROC-AUC (for multiclass)
    try:
        roc_auc = roc_auc_score(y_true, y_pred_proba, multi_class='ovr', average='weighted')
        print(f"ROC-AUC:   {roc_auc:.4f}")
    except Exception as e:
        print(f"ROC-AUC:   Could not calculate ({e})")
        roc_auc = None

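    # Detailed classification report
    print("\nDetailed Classification Report:")
    print("-" * 60)
    class_names = [str(c) for c in target_encoder.classes_]
    # Pass the full label set so the report and matrix keep one row per class
    # even when a class is absent from y_true (assumes labels are encoded 0..n-1)
    labels = np.arange(len(class_names))
    report = classification_report(y_true, y_pred, labels=labels,
                                   target_names=class_names, zero_division=0)
    print(report)

    # Confusion Matrix
    cm = confusion_matrix(y_true, y_pred, labels=labels)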

    # Plot Confusion Matrix
    plt.figure(figsize=(10, 8))
    sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
                xticklabels=class_names, yticklabels=class_names)
    plt.title(f'{model_name} - Confusion Matrix')
    plt.xlabel('Predicted Label')
    plt.ylabel('True Label')
    plt.tight_layout()
    plt.show()

    return {
        'accuracy': accuracy,
        'precision': precision,
        'recall': recall,
        'f1_score': f1,
        'roc_auc': roc_auc,
        'confusion_matrix': cm
    }

# Evaluate all models
model_results = {}

if 'rf_pred' in locals():
    model_results['Random Forest'] = evaluate_model_performance(
        y_test, rf_pred, rf_pred_proba, 'Random Forest', target_encoder
    )

if 'svm_pred' in locals():
    model_results['SVM'] = evaluate_model_performance(
        y_test, svm_pred, svm_pred_proba, 'SVM', target_encoder
    )

if 'third_pred' in locals():
    model_name = 'XGBoost' if XGBOOST_AVAILABLE else 'Logistic Regression'
    model_results[model_name] = evaluate_model_performance(
        y_test, third_pred, third_pred_proba, model_name, target_encoder
    )
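
The metrics above use `average='weighted'`, which weights each class by its support. On an imbalanced dataset this can look healthy even when a rare class is never detected. The sketch below is a dependency-free illustration of that effect, not part of the original pipeline: the helper names and the toy labels (95 samples of class 0, 5 of class 1) are made up for the example.

```python
from collections import Counter

def per_class_f1(y_true, y_pred):
    """Return {label: F1}, computed one class at a time."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores

def averaged_f1(y_true, y_pred):
    """Return (macro_f1, weighted_f1)."""
    scores = per_class_f1(y_true, y_pred)
    support = Counter(y_true)
    macro = sum(scores.values()) / len(scores)                       # every class counts equally
    weighted = sum(scores[c] * support[c] for c in scores) / len(y_true)  # weighted by support
    return macro, weighted

# Hypothetical toy data: a classifier that always predicts the majority class
y_true_demo = [0] * 95 + [1] * 5
y_pred_demo = [0] * 100

macro_f1, weighted_f1 = averaged_f1(y_true_demo, y_pred_demo)
print(f"macro F1 = {macro_f1:.3f}, weighted F1 = {weighted_f1:.3f}")
# macro F1 = 0.487, weighted F1 = 0.926
```

In scikit-learn the macro variant is available by passing `average='macro'` to `f1_score`; reporting it next to the weighted score makes missed minority classes visible.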

## 8. ROC Curves Comparison

In [None]:
def plot_roc_curves(models_data, y_true, target_encoder):
    """
    Plot one-vs-rest ROC curves for all models, one panel per class
    (only the first four classes are shown)
    """
    print("\n=== ROC CURVES COMPARISON ===")

    n_classes = len(target_encoder.classes_)
    class_names = target_encoder.classes_

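    # One-hot encode the integer labels so each column is a one-vs-rest target
    # (np.asarray accepts a list or pandas Series; labels are assumed to be 0..n_classes-1)
    y_true_binary = np.eye(n_classes)[np.asarray(y_true, dtype=int)]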

    # Plot ROC curve for each class
    fig, axes = plt.subplots(2, 2, figsize=(15, 12))
    axes = axes.ravel()

    colors = ['blue', 'red', 'green', 'orange', 'purple']

    for class_idx in range(min(n_classes, 4)):  # Plot up to 4 classes
        ax = axes[class_idx]

        for i, (model_name, data) in enumerate(models_data.items()):
            if 'pred_proba' in data:
                y_score = data['pred_proba'][:, class_idx]
                fpr, tpr, _ = roc_curve(y_true_binary[:, class_idx], y_score)
                roc_auc = auc(fpr, tpr)

                ax.plot(fpr, tpr, color=colors[i % len(colors)], lw=2,
                       label=f'{model_name} (AUC = {roc_auc:.3f})')

        ax.plot([0, 1], [0, 1], 'k--', lw=2, label='Random Classifier')
        ax.set_xlim([0.0, 1.0])
        ax.set_ylim([0.0, 1.05])
        ax.set_xlabel('False Positive Rate')
        ax.set_ylabel('True Positive Rate')
        ax.set_title(f'ROC Curve - {class_names[class_idx]}')
        ax.legend(loc="lower right")
        ax.grid(True)

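    # Hide unused panels when there are fewer than four classes
    for ax in axes[min(n_classes, 4):]:
        ax.set_visible(False)

    plt.tight_layout()
    plt.show()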

# Prepare data for ROC curves
roc_data = {}
if 'rf_pred_proba' in locals():
    roc_data['Random Forest'] = {'pred_proba': rf_pred_proba}
if 'svm_pred_proba' in locals():
    roc_data['SVM'] = {'pred_proba': svm_pred_proba}
if 'third_pred_proba' in locals():
    model_name = 'XGBoost' if XGBOOST_AVAILABLE else 'Logistic Regression'
    roc_data[model_name] = {'pred_proba': third_pred_proba}

if roc_data and 'y_test' in locals():
    plot_roc_curves(roc_data, y_test, target_encoder)
else:
    print("Cannot plot ROC curves - model predictions not available")
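
Each panel above treats one class as "positive" and all other classes as "negative". The area under such a curve equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as half. The following sketch computes the per-class AUC directly from that definition. It is a plain-Python illustration with hypothetical helper names and made-up probabilities, not a replacement for `roc_curve` and `auc`.

```python
def binary_auc(labels, scores):
    """AUC as P(score of a positive > score of a negative); ties count half."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        return None  # undefined when either group is empty
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def one_vs_rest_auc(y_true, proba):
    """Per-class AUC: class k is positive, every other class is negative."""
    n_classes = len(proba[0])
    return {
        k: binary_auc([int(t == k) for t in y_true], [row[k] for row in proba])
        for k in range(n_classes)
    }

# Hypothetical labels and predicted probabilities for three classes
y_true_demo = [0, 0, 1, 1, 2, 2]
proba_demo = [
    [0.70, 0.20, 0.10],
    [0.50, 0.30, 0.20],
    [0.20, 0.60, 0.20],
    [0.40, 0.35, 0.25],
    [0.10, 0.20, 0.70],
    [0.30, 0.40, 0.30],
]

per_class_auc = one_vs_rest_auc(y_true_demo, proba_demo)
print(per_class_auc)
# {0: 1.0, 1: 0.875, 2: 1.0}
```

The `None` branch mirrors a practical caveat: if a class has no samples in `y_test`, its one-vs-rest curve is undefined and that panel should be read with care.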

## 9. Model Comparison Summary

In [None]:
def create_model_comparison_table(model_results):
    """
    Create a comprehensive comparison table of all models
    """
    print("\n=== MODEL COMPARISON SUMMARY ===")
    print("=" * 80)

    if not model_results:
        print("No model results available for comparison")
        return

    # Create comparison DataFrame
    comparison_data = []
    for model_name, metrics in model_results.items():
        comparison_data.append({
            'Model': model_name,
            'Accuracy': f"{metrics['accuracy']:.4f}",
            'Precision': f"{metrics['precision']:.4f}",
            'Recall': f"{metrics['recall']:.4f}",
            'F1-Score': f"{metrics['f1_score']:.4f}",
            'ROC-AUC': f"{metrics['roc_auc']:.4f}" if metrics['roc_auc'] is not None else 'N/A'
        })

    comparison_df = pd.DataFrame(comparison_data)
    print(comparison_df.to_string(index=False))

    # Find best model for each metric
    print("\n=== BEST PERFORMING MODELS ===")
    print("-" * 40)

    metrics_to_compare = ['accuracy', 'precision', 'recall', 'f1_score', 'roc_auc']

    for metric in metrics_to_compare:
        if metric == 'roc_auc':
            # Handle cases where ROC-AUC might be None
            valid_results = {k: v for k, v in model_results.items() if v[metric] is not None}
            if valid_results:
                best_model = max(valid_results.keys(), key=lambda k: valid_results[k][metric])
                best_score = valid_results[best_model][metric]
                print(f"Best {metric.upper()}: {best_model} ({best_score:.4f})")
        else:
            best_model = max(model_results.keys(), key=lambda k: model_results[k][metric])
            best_score = model_results[best_model][metric]
            print(f"Best {metric.upper()}: {best_model} ({best_score:.4f})")

    # Overall recommendation
    print("\n=== OVERALL RECOMMENDATION ===")
    print("-" * 40)

    # Calculate average rank for each model
    model_ranks = {}
    for model_name in model_results.keys():
        ranks = []
        for metric in ['accuracy', 'precision', 'recall', 'f1_score']:
            sorted_models = sorted(model_results.keys(),
                                 key=lambda k: model_results[k][metric], reverse=True)
            ranks.append(sorted_models.index(model_name) + 1)
        model_ranks[model_name] = np.mean(ranks)

    best_overall = min(model_ranks.keys(), key=lambda k: model_ranks[k])
    print(f"Best Overall Model: {best_overall}")
    print(f"Average Rank: {model_ranks[best_overall]:.2f}")

    return comparison_df

# Create model comparison
if model_results:
    comparison_table = create_model_comparison_table(model_results)
else:
    print("No model results available for comparison")
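
The average-rank recommendation above uses `sorted(...).index(...)`, so two models with identical scores still receive different ranks, decided by their insertion order in `model_results`. One possible alternative, sketched here under the assumption that tied models should share the mean of the positions they occupy, is shown below. The function name and the scores are illustrative only.

```python
def average_ranks(model_results, metrics):
    """Mean rank per model across metrics (1 = best); ties share the mean rank."""
    totals = {name: 0.0 for name in model_results}
    for metric in metrics:
        scores = {name: res[metric] for name, res in model_results.items()}
        for name, score in scores.items():
            higher = sum(other > score for other in scores.values())
            equal = sum(other == score for other in scores.values())  # includes the model itself
            # Tied models occupy ranks higher+1 .. higher+equal; assign their mean
            totals[name] += higher + (equal + 1) / 2
    return {name: total / len(metrics) for name, total in totals.items()}

# Hypothetical scores: Model A and Model B tie on accuracy
demo_results = {
    'Model A': {'accuracy': 0.90, 'f1_score': 0.80},
    'Model B': {'accuracy': 0.90, 'f1_score': 0.85},
    'Model C': {'accuracy': 0.70, 'f1_score': 0.60},
}

ranks = average_ranks(demo_results, ['accuracy', 'f1_score'])
print(ranks)
# {'Model A': 1.75, 'Model B': 1.25, 'Model C': 3.0}
```

With the real `model_results` dictionary, the same call would take the four metric keys used above (`'accuracy'`, `'precision'`, `'recall'`, `'f1_score'`).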

## 10. Feature Importance Analysis

In [None]:
def plot_feature_importance(model, feature_names, model_name, top_n=20):
    """
    Plot feature importance for tree-based models
    """
    if hasattr(model, 'feature_importances_'):
        print(f"\n=== {model_name.upper()} FEATURE IMPORTANCE ===")

        # Get feature importances
        importances = model.feature_importances_

        # Create DataFrame for easier handling
        feature_importance_df = pd.DataFrame({
            'feature': feature_names,
            'importance': importances
        }).sort_values('importance', ascending=False)

        # Display top features
        print(f"\nTop {top_n} Most Important Features:")
        print("-" * 50)
        for i, (_, row) in enumerate(feature_importance_df.head(top_n).iterrows()):
            print(f"{i+1:2d}. {row['feature']}: {row['importance']:.4f}")

        # Plot feature importance
        plt.figure(figsize=(12, 8))
        top_features = feature_importance_df.head(top_n)
        plt.barh(range(len(top_features)), top_features['importance'])
        plt.yticks(range(len(top_features)), top_features['feature'])
        plt.xlabel('Feature Importance')
        plt.title(f'{model_name} - Top {top_n} Feature Importances')
        plt.gca().invert_yaxis()
        plt.tight_layout()
        plt.show()

        return feature_importance_df
    else:
        print(f"{model_name} does not support feature importance analysis")
        return None

# Plot feature importance for applicable models
if 'X' in locals():
    feature_names = X.columns.tolist()

    if 'rf_model' in locals():
        rf_importance = plot_feature_importance(rf_model, feature_names, 'Random Forest')

    if 'third_model' in locals() and XGBOOST_AVAILABLE:
        xgb_importance = plot_feature_importance(third_model, feature_names, 'XGBoost')
else:
    print("Cannot analyze feature importance - feature names not available")
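
A fixed `top_n` is one way to read the importance ranking. Another is to ask how many top-ranked features are needed to account for a chosen share of the total importance. The sketch below illustrates that idea with a hypothetical helper and made-up importance values. The 90% threshold is an arbitrary choice for the example.

```python
def features_for_coverage(importances, coverage=0.95):
    """Return the fewest top-ranked features whose importances reach `coverage` of the total."""
    total = sum(importances.values())
    if total <= 0:
        return []
    ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
    selected, running = [], 0.0
    for name, value in ranked:
        selected.append(name)
        running += value
        if running / total >= coverage:
            break
    return selected

# Hypothetical importances (not results from this notebook)
demo_importances = {'feat_a': 0.50, 'feat_b': 0.30, 'feat_c': 0.15, 'feat_d': 0.05}

top = features_for_coverage(demo_importances, coverage=0.90)
print(top)
# ['feat_a', 'feat_b', 'feat_c']
```

To apply it to the DataFrame returned by `plot_feature_importance`, the input could be built with `dict(zip(rf_importance['feature'], rf_importance['importance']))`.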

## 11. Conclusions and Recommendations

### Summary of Results

This notebook series trains and compares three classifiers on the CICIDS 2017 dataset for network intrusion detection. The completed steps and main takeaways are summarized below:

#### Dataset Characteristics:
- **Domain**: Cybersecurity - Network Intrusion Detection
- **Dataset**: CICIDS 2017 with multiple attack types
- **Features**: Network traffic characteristics and flow statistics
- **Classes**: Various attack types and normal traffic

#### Data Preprocessing:
- ✅ **Missing Values**: Handled using median/mode imputation
- ✅ **Duplicates**: Removed duplicate records
- ✅ **Infinite Values**: Replaced with appropriate bounds
- ✅ **Categorical Encoding**: Applied Label Encoding
- ✅ **Feature Scaling**: Applied StandardScaler standardization (zero mean, unit variance)

#### Exploratory Data Analysis:
- ✅ **Summary Statistics**: Comprehensive feature analysis
- ✅ **Class Distribution**: Analyzed class imbalances
- ✅ **Correlation Analysis**: Identified feature relationships
- ✅ **Visualizations**: Class distributions and correlation heatmaps

#### Machine Learning Models:
- ✅ **Random Forest**: Implemented with hyperparameter tuning
- ✅ **Support Vector Machine**: Optimized with Grid Search
- ✅ **XGBoost/Logistic Regression**: Third algorithm implementation

#### Performance Evaluation:
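- ✅ **Confusion Matrix**: Per-class counts of correct and incorrect predictions
- ✅ **Accuracy, Precision, Recall, F1-Score**: Precision, recall, and F1 reported as support-weighted averages across classes
- ✅ **ROC-AUC**: Weighted one-vs-rest score for the multi-class setting
- ✅ **ROC Curves**: One-vs-rest curves comparing all models, plotted for up to four classes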

#### Key Findings:
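1. **Best Performing Model**: [Results will show after execution]
2. **Feature Importance**: Network flow characteristics are expected to dominate; confirm against the rankings printed in Section 10
3. **Class Imbalance**: Some attack types are underrepresented, so weighted averages can hide poor recall on rare classes; check the per-class rows of the classification report
4. **Model Comparison**: Ensemble methods (Random Forest, XGBoost) often perform better on tabular data of this kind; verify against the Section 9 comparison table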

#### Recommendations for Production:
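1. **Model Selection**: Use the model with the highest F1-score; with imbalanced classes, also compare macro-averaged F1 so that rare attack types count equally
2. **Feature Engineering**: Focus on the highest-ranked features from the importance analysis
3. **Class Balancing**: Consider SMOTE or class weighting for imbalanced classes, applied to the training split only so the test set stays representative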
4. **Real-time Implementation**: Optimize for low-latency detection
5. **Continuous Learning**: Implement model updates with new attack patterns
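
As a lightweight alternative to resampling for the class-balancing recommendation, classes can be weighted inversely to their frequency. The sketch below reproduces the `n_samples / (n_classes * class_count)` heuristic that scikit-learn applies when `class_weight='balanced'` is set. The helper name and the label counts are hypothetical, and whether weighting improves results on this dataset would need to be checked empirically.

```python
from collections import Counter

def balanced_class_weights(y):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(y)
    n_samples, n_classes = len(y), len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}

# Hypothetical label distribution: one dominant class and two rare ones
y_demo = [0] * 90 + [1] * 8 + [2] * 2

weights = balanced_class_weights(y_demo)
for label, weight in sorted(weights.items()):
    print(f"class {label}: weight {weight:.3f}")
# class 0: weight 0.370
# class 1: weight 4.167
# class 2: weight 16.667
```

Rare classes receive proportionally larger weights, so misclassifying them costs more during training. Estimators such as `RandomForestClassifier`, `SVC`, and `LogisticRegression` accept `class_weight='balanced'` directly.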

### Technical Implementation Notes:
- All code is well-documented with inline comments
- Functions are modular and reusable
- Visualizations are publication-ready
- Results are clearly formatted for report inclusion

**Note**: Run all notebook cells in sequence to see the complete analysis results and performance metrics.
