<a href="https://colab.research.google.com/github/araj07/YBI-PROJECT/blob/main/AI_for_Cybersecurity_Threat_Detection.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Building an AI model for cybersecurity threat detection in Python involves several steps, and the details depend on the type of threat (malware, phishing, network intrusion, and so on). This notebook gives a basic framework that uses a machine learning model (Random Forest) to detect network intrusions in NSL-KDD, a commonly used benchmark derived from the KDD Cup 99 dataset.

✅ Step-by-Step Breakdown:

1. Dataset: We’ll use NSL-KDD, a refined version of KDD’99 that is widely used for intrusion and anomaly detection in network traffic.

2. Model: We'll use a Random Forest Classifier for simplicity and interpretability.

3. Preprocessing: Categorical encoding, feature scaling, and a train/test split.

**Python Code: AI for Cybersecurity Threat Detection**

In [None]:
import pandas as pd
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix

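# Step 1: Load Dataset
# Download 'KDDTrain+.txt' from NSL-KDD and put path below
data_path = "KDDTrain+.txt"
column_names_path = "KDDFeatureNames.txt"

# Load column names (assumes one "name: type" entry per line; other lines are skipped)
with open(column_names_path, 'r') as f:
    column_names = [line.split(":")[0].strip() for line in f if ":" in line]
column_names.append('target')

# Load dataset. KDDTrain+.txt normally carries a trailing difficulty score after
# the label; name it explicitly so the columns stay aligned, then drop it.
df = pd.read_csv(data_path, header=None)
if df.shape[1] == len(column_names) + 1:
    column_names.append('difficulty')
df.columns = column_names  # raises ValueError if the counts still disagree
df = df.drop(columns='difficulty', errors='ignore')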

# Step 2: Encode labels
df['target'] = df['target'].apply(lambda x: 'normal' if x == 'normal' else 'threat')

# Step 3: Encode categorical features
cat_cols = ['protocol_type', 'service', 'flag']
df[cat_cols] = df[cat_cols].apply(LabelEncoder().fit_transform)

# Step 4: Feature/Target split
X = df.drop('target', axis=1)
y = df['target']
y = LabelEncoder().fit_transform(y)

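# Step 5: Train/Test Split (stratified so both sets keep the same class balance)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Step 6: Scaling (fit on the training set only, so no test statistics leak in)
# Random Forests do not need scaling; it is kept for models that do.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)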

# Step 7: Model Training
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Step 8: Evaluation
y_pred = model.predict(X_test)
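# LabelEncoder sorts labels alphabetically, so 0 = normal and 1 = threat
print("Classification Report:\n", classification_report(y_test, y_pred, target_names=['normal', 'threat']))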
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))
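
One way to act on the interpretability mentioned earlier is to inspect the forest's impurity-based feature importances. The sketch below is a minimal illustration and not part of the original notebook: `top_features` is a hypothetical helper, and the data is synthetic (`make_classification`) with made-up feature names so the block runs without the NSL-KDD files.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def top_features(model, feature_names, k=10):
    """Return the k largest impurity-based importances as a sorted Series."""
    importances = pd.Series(model.feature_importances_, index=feature_names)
    return importances.sort_values(ascending=False).head(k)


# Synthetic stand-in data (assumption) so the sketch runs without NSL-KDD.
X_demo, y_demo = make_classification(
    n_samples=500, n_features=8, n_informative=3, random_state=42
)
demo_names = [f"feature_{i}" for i in range(X_demo.shape[1])]  # hypothetical names

demo_model = RandomForestClassifier(n_estimators=50, random_state=42)
demo_model.fit(X_demo, y_demo)

# Print the five features the forest relied on most.
print(top_features(demo_model, demo_names, k=5))
```

With the notebook's own objects, the equivalent call would be `top_features(model, X.columns)`. Impurity-based importances can favor high-cardinality features, so the ranking is best treated as a rough guide.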

**Dataset Links (you’ll need to download manually):**

NSL-KDD Dataset: https://www.unb.ca/cic/datasets/nsl.html

Download these files:

KDDTrain+.txt

KDDTest+.txt

KDDFeatureNames.txt
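
The code above only splits KDDTrain+.txt, so KDDTest+.txt goes unused. If you evaluate on the separate test file, the encoders must be fitted on the training data and reused on the test data, and the test file may contain categorical values the encoder never saw. The sketch below shows one possible way to handle that with a scikit-learn `Pipeline`. The helper `build_pipeline` and the tiny in-memory frames are assumptions for illustration; they stand in for the two real files.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OrdinalEncoder

CAT_COLS = ["protocol_type", "service", "flag"]


def build_pipeline(cat_cols=CAT_COLS):
    """Encode categoricals (unseen values become -1), pass the rest through, fit a forest."""
    encoder = OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)
    preprocess = ColumnTransformer(
        [("cat", encoder, cat_cols)], remainder="passthrough"
    )
    return Pipeline([
        ("preprocess", preprocess),
        ("forest", RandomForestClassifier(n_estimators=100, random_state=42)),
    ])


# Tiny made-up frames standing in for KDDTrain+.txt and KDDTest+.txt (assumption).
train_df = pd.DataFrame({
    "protocol_type": ["tcp", "udp", "tcp", "icmp"] * 5,
    "service": ["http", "domain_u", "ftp", "ecr_i"] * 5,
    "flag": ["SF", "SF", "S0", "SF"] * 5,
    "src_bytes": [200, 40, 0, 1000] * 5,
    "target": [0, 0, 1, 1] * 5,  # 0 = normal, 1 = threat
})
test_df = pd.DataFrame({
    "protocol_type": ["tcp", "udp"],
    "service": ["http", "snmp"],  # "snmp" never appears in train_df
    "flag": ["SF", "SF"],
    "src_bytes": [210, 55],
    "target": [0, 0],
})

# Fit on the training frame only, then predict on the held-out frame.
pipeline = build_pipeline()
pipeline.fit(train_df.drop(columns="target"), train_df["target"])
test_pred = pipeline.predict(test_df.drop(columns="target"))
print(test_pred)
```

Keeping the preprocessing inside the pipeline means the same fitted encoder is applied to both files, which a fresh `LabelEncoder().fit_transform` call on the test file would not guarantee.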

**Extensions You Can Add:**

Use deep learning (e.g., LSTM for time-based analysis).

Use unsupervised learning for anomaly detection (Isolation Forest, AutoEncoders).

Integrate with real-time systems using tools like Kafka or PyShark.

Build a dashboard with Streamlit or Dash.
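
As a rough illustration of the unsupervised extension listed above, the sketch below fits scikit-learn's `IsolationForest` on "normal" rows only and then flags rows that look different. The data is synthetic and the `contamination` value is a guess at the anomaly rate, not a tuned setting; on NSL-KDD you would substitute the preprocessed feature matrix.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)

# Synthetic stand-in (assumption): a dense "normal" cluster plus a few distant outliers.
normal_rows = rng.normal(loc=0.0, scale=1.0, size=(500, 4))
outlier_rows = rng.normal(loc=8.0, scale=1.0, size=(10, 4))

# Fit on normal traffic only; contamination is an assumed anomaly rate.
detector = IsolationForest(n_estimators=100, contamination=0.01, random_state=42)
detector.fit(normal_rows)

# predict() returns 1 for inliers and -1 for anomalies.
outlier_flags = detector.predict(outlier_rows)
normal_flags = detector.predict(normal_rows)

print("outliers flagged:", int((outlier_flags == -1).sum()), "of", len(outlier_rows))
print("normal rows flagged:", int((normal_flags == -1).sum()), "of", len(normal_rows))
```

Unlike the Random Forest above, this approach needs no attack labels for training, which can help with attack types that are absent from the labeled data.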