<a href="https://colab.research.google.com/github/ajit-rajput/misc-notebooks/blob/main/network_traffic_classification_colab.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 🚀 Network Traffic Classification (CICIDS2017 Example)

This Colab notebook demonstrates how to build a **network traffic
classifier** using machine learning.  
We will classify flows by their label (e.g., BENIGN or an attack type such as DDoS).

------------------------------------------------------------------------

## 📌 Step 1: Setup Environment

``` python
!pip install scikit-learn pandas matplotlib seaborn
```

------------------------------------------------------------------------

## 📌 Step 2: Import Libraries

``` python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix
```

------------------------------------------------------------------------

## 📌 Step 3: Load Dataset

> ⚠️ Note: CICIDS2017 is large and split across several CSV files. This
> example loads a single file; replace the path below with the location
> of your own copy.

``` python
# Example: Mount Google Drive if dataset is stored there
from google.colab import drive
drive.mount('/content/drive')

# Replace with actual path to CICIDS2017 dataset
data_path = "/content/drive/MyDrive/CICIDS2017/Friday-WorkingHours-Afternoon-DDos.pcap_ISCX.csv"
df = pd.read_csv(data_path)

df.head()
```
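
Before preprocessing, it can help to check the column names and how flows are distributed across classes. The sketch below is one way to do that. `summarize_labels` is a hypothetical helper, and the toy DataFrame stands in for the real CSV; its ` Label` header with a leading space mimics a quirk often seen in CICIDS2017 exports, which is an assumption about your copy of the data.

```python
import pandas as pd

def summarize_labels(frame, label_col="Label"):
    """Return per-class flow counts after stripping whitespace from headers."""
    cleaned = frame.rename(columns=lambda c: str(c).strip())
    if label_col not in cleaned.columns:
        raise KeyError(f"Column {label_col!r} not found; available: {list(cleaned.columns)}")
    return cleaned[label_col].value_counts()

# Toy stand-in for the real dataset (values are made up for illustration)
toy = pd.DataFrame({
    " Flow Duration": [10, 250, 30, 4000],
    " Label": ["BENIGN", "DDoS", "BENIGN", "BENIGN"],
})
label_counts = summarize_labels(toy)
print(label_counts)
```

On the real data, `summarize_labels(df)` would show whether the classes are heavily imbalanced, which affects how the metrics in Step 6 should be read.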

------------------------------------------------------------------------

## 📌 Step 4: Preprocessing

``` python
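# CICIDS2017 CSV headers often carry stray whitespace (e.g., " Label")
df.columns = df.columns.str.strip()

# Rate features can contain +/-inf; treat them as missing, then drop incomplete rows
df = df.replace([np.inf, -np.inf], np.nan).dropna()

# Encode categorical labels
le = LabelEncoder()
y = le.fit_transform(df["Label"])

# Features: keep numeric columns only
X = df.drop(columns="Label").select_dtypes(include="number")

# Train/test split (stratified so class proportions match in both sets)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Scale features: fit on the training set only to avoid test-set leakage
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)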
```
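
An alternative way to guarantee that the scaler only ever sees training data is to wrap it and the classifier in a scikit-learn `Pipeline`. The sketch below runs on synthetic data from `make_classification`; the data and the `*_demo` / `Xd_*` names are illustrative assumptions, not part of the CICIDS2017 workflow.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for flow features and labels
X_demo, y_demo = make_classification(
    n_samples=300, n_features=8, n_informative=4, random_state=42
)
Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42, stratify=y_demo
)

# The pipeline fits the scaler on the training data only,
# then reuses those statistics when scoring held-out data
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", RandomForestClassifier(n_estimators=50, random_state=42)),
])
pipe.fit(Xd_train, yd_train)
demo_accuracy = pipe.score(Xd_test, yd_test)
print(f"Held-out accuracy on synthetic data: {demo_accuracy:.3f}")
```

Tree ensembles such as random forests are largely insensitive to feature scaling, so the scaler mainly matters if you later swap in a scale-sensitive model.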

------------------------------------------------------------------------

## 📌 Step 5: Train Model

``` python
model = RandomForestClassifier(n_estimators=100, random_state=42, n_jobs=-1)
model.fit(X_train, y_train)
```

------------------------------------------------------------------------

## 📌 Step 6: Evaluate Model

``` python
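y_pred = model.predict(X_test)

# Map encoded class ids back to their original names for readable output
class_ids = np.arange(len(le.classes_))
class_names = list(le.classes_)

print("Classification Report:")
print(classification_report(y_test, y_pred, labels=class_ids,
                            target_names=class_names, zero_division=0))

# Compute the confusion matrix once and reuse it for printing and plotting
cm = confusion_matrix(y_test, y_pred, labels=class_ids)
print("Confusion Matrix:")
print(cm)

# Plot confusion matrix heatmap (rows = true class, columns = predicted class)
plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt="d", cmap="Blues",
            xticklabels=class_names, yticklabels=class_names)
plt.title("Confusion Matrix")
plt.show()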
```
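
The per-class precision, recall, and F1 values in the classification report can be derived directly from the confusion matrix. The sketch below works through that arithmetic on a hypothetical 2-class matrix; the numbers in `cm_demo` are made up for illustration and are not results from CICIDS2017.

```python
import numpy as np

# Hypothetical confusion matrix: rows = true class, columns = predicted class
cm_demo = np.array([[50, 5],
                    [10, 35]])

true_positives = np.diag(cm_demo)
precision = true_positives / cm_demo.sum(axis=0)  # column sums = flows predicted as each class
recall = true_positives / cm_demo.sum(axis=1)     # row sums = flows that truly belong to each class
f1 = 2 * precision * recall / (precision + recall)

for idx, (p, r, f) in enumerate(zip(precision, recall, f1)):
    print(f"class {idx}: precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

Reading the matrix this way makes it easier to see which class is driving a low score: a weak column hurts precision, a weak row hurts recall.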

------------------------------------------------------------------------

## ✅ Next Steps

-   Try **XGBoost** or **Neural Networks**
-   Compare accuracy and F1-scores across models
-   Analyze feature importance
-   Deploy the trained model for real-time classification
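
As a starting point for the feature importance item above, a random forest exposes impurity-based importances through its `feature_importances_` attribute. The sketch below uses synthetic data; the `feature_{i}` names and the `forest` model are illustrative assumptions rather than the notebook's trained model.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for flow features
X_imp, y_imp = make_classification(
    n_samples=300, n_features=6, n_informative=3, n_redundant=0, random_state=0
)
feature_names = [f"feature_{i}" for i in range(X_imp.shape[1])]

forest = RandomForestClassifier(n_estimators=50, random_state=0)
forest.fit(X_imp, y_imp)

# Importances are normalized to sum to 1; sort to surface the strongest features
importances = pd.Series(forest.feature_importances_, index=feature_names)
importances = importances.sort_values(ascending=False)
print(importances)
```

With the notebook's own model, something like `pd.Series(model.feature_importances_, index=X.columns)` should give the equivalent ranking, assuming `X` is still the feature DataFrame from Step 4. Impurity-based importances can favor high-cardinality features, so treat the ranking as a first look rather than a final answer.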