Project Overview

This project implements classifiers that distinguish malicious from benign network traffic flows in the CICIDS2017 dataset. It includes from-scratch implementations of Logistic Regression and SVM, along with comprehensive evaluation metrics.

Features

From-Scratch Implementations: Logistic Regression and SVM with gradient descent
Comprehensive EDA: Data exploration, preprocessing, and visualization
Hyperparameter Tuning: Systematic optimization of model parameters
Adversarial Testing: Robustness evaluation against noisy data
Model Interpretability: Feature importance analysis
Performance Metrics: Accuracy, ROC curves, confusion matrices
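
To make the "from-scratch with gradient descent" idea concrete, here is a minimal NumPy sketch of both models. It is an illustration only: the function names, learning rates, epoch counts, and regularisation strength are assumptions and are not taken from the project's code, and the data is a synthetic toy set rather than CICIDS2017.

```python
import numpy as np


def sigmoid(z):
    # Clip the logits so np.exp does not overflow for large |z|.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -500, 500)))


def train_logistic_regression(X, y, lr=0.1, epochs=500):
    """Batch gradient descent on the mean log-loss; y must be in {0, 1}."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(epochs):
        p = sigmoid(X @ w + b)
        error = p - y  # gradient of the log-loss with respect to the logits
        w -= lr * (X.T @ error) / n_samples
        b -= lr * error.mean()
    return w, b


def train_linear_svm(X, y, lr=0.01, epochs=500, reg=0.01):
    """Batch subgradient descent on L2-regularised hinge loss; y in {-1, +1}."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        violating = margins < 1  # inside the margin or misclassified
        grad_w = reg * w - (X[violating].T @ y[violating]) / n_samples
        grad_b = -y[violating].sum() / n_samples
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Toy usage on two well-separated synthetic clusters (not CICIDS2017 data).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 1.0, (100, 2)), rng.normal(2.0, 1.0, (100, 2))])
y01 = np.array([0] * 100 + [1] * 100)

w_lr, b_lr = train_logistic_regression(X, y01)
lr_acc = ((sigmoid(X @ w_lr + b_lr) >= 0.5).astype(int) == y01).mean()

w_svm, b_svm = train_linear_svm(X, 2 * y01 - 1)  # map labels {0, 1} to {-1, +1}
svm_acc = ((X @ w_svm + b_svm >= 0).astype(int) == y01).mean()

print(f"logistic regression accuracy: {lr_acc:.3f}, SVM accuracy: {svm_acc:.3f}")
```

Both trainers share the same loop shape; only the loss gradient differs, which is why the two models are easy to compare under identical preprocessing.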

Dataset

CICIDS2017 - Friday Working Hours Morning
Samples: 191,033 network flows
Features: 78 network traffic characteristics

Classes:
BENIGN (189,067 samples)
Bot (1,966 samples)
Imbalance: 99% benign vs 1% malicious traffic

Key Features Used:
Destination Port
Flow Duration
Packet statistics (Fwd/Bwd)
Inter-Arrival Times
TCP Flags
Window sizes
Flow timing characteristics
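
The class counts above can be turned into a quick arithmetic check of how severe the imbalance is. The sketch below also computes inverse-frequency class weights; that weighting scheme is an assumption shown for illustration, not something the project is stated to use.

```python
# Class counts as listed in the Dataset section.
counts = {"BENIGN": 189_067, "Bot": 1_966}

total = sum(counts.values())
n_classes = len(counts)

# Fraction of all flows belonging to each class.
shares = {label: n / total for label, n in counts.items()}

# Inverse-frequency ("balanced") weights: total / (n_classes * class_count).
# Assumed scheme for illustration; rarer classes get larger weights.
class_weights = {label: total / (n_classes * n) for label, n in counts.items()}

# Accuracy of a trivial classifier that always predicts BENIGN.
always_benign_accuracy = counts["BENIGN"] / total

for label in counts:
    print(f"{label}: share={shares[label]:.4f}, weight={class_weights[label]:.2f}")
print(f"always-BENIGN baseline accuracy: {always_benign_accuracy:.4f}")
```

The baseline figure shows why accuracy alone is a weak metric here: a model that never flags a Bot flow still scores about 99%.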

Installation & Setup

Prerequisites: Python 3.8+

    pip install -r requirements.txt

Required Packages:

    pip install pandas numpy matplotlib seaborn scikit-learn jupyter

Dataset Setup

Download the CICIDS2017 dataset from the University of New Brunswick.
Extract it and place Friday-WorkingHours-Morning.pcap_ISCX.csv in the project root.
The system automatically handles preprocessing and cleaning.
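
For readers who want a rough idea of what the automatic cleaning might involve, here is a hedged pandas sketch. The specific steps (stripping whitespace from column names, dropping rows with infinite or missing values, deriving a binary target) are assumptions about typical handling of this kind of CSV, and `clean_flows` and `is_malicious` are hypothetical names, not the project's API.

```python
import numpy as np
import pandas as pd


def clean_flows(df):
    """Return a cleaned copy with a binary `is_malicious` target column."""
    df = df.copy()
    # Column headers may carry stray leading or trailing spaces.
    df.columns = df.columns.str.strip()
    # Treat infinite rate values as missing, then drop incomplete rows.
    df = df.replace([np.inf, -np.inf], np.nan).dropna()
    # Anything that is not BENIGN counts as malicious (here: Bot).
    df["is_malicious"] = (df["Label"] != "BENIGN").astype(int)
    return df.reset_index(drop=True)


# Tiny synthetic frame standing in for the real CSV.
raw = pd.DataFrame({
    " Destination Port": [80, 443, 8080],
    " Flow Duration": [1200, 35, 900],
    "Flow Bytes/s": [1500.0, np.inf, 320.5],
    " Label": ["BENIGN", "BENIGN", "Bot"],
})
cleaned = clean_flows(raw)
print(cleaned)

# With the real file in the project root, the call would look like:
# flows = clean_flows(pd.read_csv("Friday-WorkingHours-Morning.pcap_ISCX.csv"))
```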

Results

Key Findings
Class Imbalance Challenge: The severe imbalance (99:1) makes the minority (Bot) class harder to detect
Feature Importance: Backward traffic timing features are the most discriminative
Robustness: Both models maintain their performance under adversarial noise
SVM Superiority: SVM shows better overall performance and generalization than Logistic Regression
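
One simple way to run the kind of robustness check described in the findings is to perturb the test features and re-measure accuracy. The sketch below assumes additive Gaussian noise scaled by each feature's standard deviation; the noise model, the `accuracy_under_noise` helper, and the stand-in `predict` rule are all illustrative assumptions, not the project's actual procedure.

```python
import numpy as np


def accuracy_under_noise(predict, X, y, noise_levels, seed=0):
    """Accuracy of `predict` on X perturbed by Gaussian noise at each level.

    A level of 0.5 means the noise standard deviation is half of each
    feature's own standard deviation.
    """
    rng = np.random.default_rng(seed)
    feature_std = X.std(axis=0)
    results = {}
    for level in noise_levels:
        noise = rng.normal(0.0, 1.0, X.shape) * feature_std * level
        results[level] = float((predict(X + noise) == y).mean())
    return results


# Synthetic two-class data (not CICIDS2017).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2.0, 1.0, (200, 3)), rng.normal(2.0, 1.0, (200, 3))])
y = np.array([0] * 200 + [1] * 200)


def predict(features):
    # Stand-in for a trained model: a fixed linear decision rule.
    return (features.sum(axis=1) > 0).astype(int)


scores = accuracy_under_noise(predict, X, y, noise_levels=[0.0, 0.5, 2.0])
for level, acc in scores.items():
    print(f"noise level {level}: accuracy {acc:.3f}")
```

Plotting accuracy against the noise level for each model gives a direct robustness comparison.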

Visualizations Generated
Label distribution charts
Training loss curves
Confusion matrices
ROC curves
Feature importance plots
Adversarial robustness charts
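
The numbers behind the confusion matrices and ROC curves can be computed in a few lines of NumPy. This is a minimal sketch with hypothetical helper names; it assumes binary labels in {0, 1} and distinct scores (tied scores are not merged into a single threshold).

```python
import numpy as np


def confusion_matrix(y_true, y_pred):
    """2x2 matrix: rows are the true class, columns the predicted class."""
    cm = np.zeros((2, 2), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm


def roc_curve(y_true, scores):
    """Return (fpr, tpr) obtained by sweeping the threshold from high to low."""
    order = np.argsort(-scores, kind="stable")
    y_sorted = y_true[order]
    tps = np.cumsum(y_sorted == 1)  # true positives at each cut-off
    fps = np.cumsum(y_sorted == 0)  # false positives at each cut-off
    tpr = np.concatenate([[0.0], tps / tps[-1]])
    fpr = np.concatenate([[0.0], fps / fps[-1]])
    return fpr, tpr


def auc(fpr, tpr):
    # Area under the curve by the trapezoidal rule.
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))


# Small hand-checkable example.
y_true = np.array([0, 0, 1, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8])

cm = confusion_matrix(y_true, (scores >= 0.5).astype(int))
fpr, tpr = roc_curve(y_true, scores)
area = auc(fpr, tpr)

print(cm)
print(f"AUC = {area:.2f}")
```

The `fpr` and `tpr` arrays can be passed straight to a plotting call such as matplotlib's `plt.plot(fpr, tpr)` to draw the curve.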