# 🔍 AI Detective - Catching the Cyber Criminal! 🕵️‍♂️

Welcome, AI investigators! Your mission is to **catch the cybercriminal** behind a series of hacking attempts.

You will use **Decision Trees, Random Forests, and XGBoost** to analyze past cyber activities and predict future attacks.

**Let's begin! 🚀**


In [1]:
# 📌 Step 1: Import necessary libraries
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from sklearn.ensemble import IsolationForest, RandomForestClassifier, VotingClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier
from xgboost import XGBClassifier

print('Libraries imported successfully! ✅')

Libraries imported successfully! ✅


## 📂 Step 2: Load the Cyber Crime Dataset
Let's load the dataset and inspect the first few rows.

In [None]:
# df = pd.read_csv()
# df.head()
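
A minimal sketch of the loading step. The real file path is not given in this notebook, so the example reads a tiny inline CSV through `io.StringIO` instead. The column names (`timestamp`, `failed_logins`, `bytes_sent`, `is_attack`) are hypothetical stand-ins, not the actual schema.

```python
import io

import pandas as pd

# Hypothetical sample rows; the real dataset's columns may differ.
csv_text = """timestamp,failed_logins,bytes_sent,is_attack
2024-01-01 00:15:00,0,1200,0
2024-01-01 03:42:00,7,98000,1
2024-01-02 14:05:00,1,,0
"""

# pd.read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

print(df.head())   # first rows of the table
print(df.shape)    # (number of rows, number of columns)
```

With the real dataset, the same two calls apply once the file path is passed to `pd.read_csv`.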

## 📊 Step 3: Exploratory Data Analysis (EDA)
Let's analyze the dataset to understand the patterns of cyber crimes.

In [3]:
# Check for missing values


In [None]:
# Visualizing correlation heatmap
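
A sketch of a correlation heatmap, assuming the dataset has numeric columns. The data here is synthetic and the column names are hypothetical; the `Agg` backend line is only there so the example also runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for scripts; not needed inside a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in data; replace with the real `df`.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "failed_logins": rng.poisson(2, 200),
    "bytes_sent": rng.normal(5000, 1500, 200),
})
df["is_attack"] = (df["failed_logins"] > 3).astype(int)

# Pairwise Pearson correlations between numeric columns
corr = df.corr(numeric_only=True)

plt.figure(figsize=(6, 5))
sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", vmin=-1, vmax=1)
plt.title("Feature Correlation Heatmap")
plt.tight_layout()
plt.show()
```

Cells close to 1 or -1 mark strongly related columns; a feature that correlates with the target is a candidate predictor.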


## 🏗 Step 4: Prepare Data for Training
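Before training, we need to turn the timestamp into numeric time-based features, define the features and target, and split the data into training and testing sets.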

In [5]:
# Convert timestamp to datetime

# Extract useful time-based features

# Drop the original timestamp column

# Define features and target

# train - test split 
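
The steps listed above can be sketched as follows. The frame is synthetic, and the column names `timestamp`, `failed_logins`, and `is_attack` are assumptions; the 80/20 split ratio and `random_state=42` are also illustrative choices.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical data: 100 hourly events with a string timestamp column.
stamps = pd.Timestamp("2024-01-01") + pd.to_timedelta(np.arange(100), unit="h")
df = pd.DataFrame({
    "timestamp": stamps.strftime("%Y-%m-%d %H:%M:%S"),
    "failed_logins": [i % 5 for i in range(100)],
    "is_attack": [1 if i % 5 >= 3 else 0 for i in range(100)],
})

# Convert timestamp strings to datetime
df["timestamp"] = pd.to_datetime(df["timestamp"])

# Extract useful time-based features
df["hour"] = df["timestamp"].dt.hour
df["day_of_week"] = df["timestamp"].dt.dayofweek
df["month"] = df["timestamp"].dt.month

# Drop the original timestamp column (models need numeric inputs)
df = df.drop(columns=["timestamp"])

# Define features and target
X = df.drop(columns=["is_attack"])
y = df["is_attack"]

# Stratified 80/20 split keeps the class ratio the same in both sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
print(X_train.shape, X_test.shape)
```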

## 🌳 Step 5: Train a Decision Tree Model
Let's train a **Decision Tree Classifier** to predict cyber crimes.

In [7]:
# train the Decision Tree Classifier  

print('Decision Tree Accuracy:')

Decision Tree Accuracy:
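
A minimal sketch of training and scoring a decision tree. Synthetic data from `make_classification` stands in for the cybercrime dataset, and `max_depth=5` is an illustrative setting, not a recommended value.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the real features and labels
X, y = make_classification(n_samples=500, n_features=8, n_informative=4, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Limiting depth is one simple way to reduce overfitting
dt_model = DecisionTreeClassifier(max_depth=5, random_state=42)
dt_model.fit(X_train, y_train)

dt_pred = dt_model.predict(X_test)
dt_accuracy = accuracy_score(y_test, dt_pred)
print(f"Decision Tree Accuracy: {dt_accuracy:.3f}")
```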


## 🌲 Step 6: Train a Random Forest Model
Let's try to improve on the single tree with a **Random Forest**, which combines many decision trees trained on bootstrapped samples of the data.

In [8]:
# train the Random Forest Classifier 

print('Random Forest Accuracy:')

Random Forest Accuracy:
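
A sketch of the same workflow with a random forest, again on synthetic data. The choice of 100 trees is an assumption for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real features and labels
X, y = make_classification(n_samples=500, n_features=8, n_informative=4, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Each tree sees a bootstrap sample; predictions are combined across trees
rf_model = RandomForestClassifier(n_estimators=100, random_state=42)
rf_model.fit(X_train, y_train)

rf_pred = rf_model.predict(X_test)
rf_accuracy = accuracy_score(y_test, rf_pred)
print(f"Random Forest Accuracy: {rf_accuracy:.3f}")
```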


## ⚡ Step 7: Train an XGBoost Model
Let's use **XGBoost**, a gradient-boosted tree ensemble, to see whether boosting improves on the previous models.

In [9]:
# train the XGBoost Classifier 

print('XGBoost Accuracy:')

XGBoost Accuracy:


## 📊 Step 8: Compare Baseline Models
Let's compare the three untuned models to set a baseline before hyperparameter tuning and model stacking.

In [13]:
# model_results = {
#     'Decision Tree': accuracy_score(),
#     'Random Forest': accuracy_score(),
#     'XGBoost': accuracy_score(),
# }

# results_df = pd.DataFrame(list(model_results.items()), columns=['Model', 'Accuracy'])
# print(results_df)

# plt.figure(figsize=(10,5))
# plt.bar(results_df['Model'], results_df['Accuracy'], color=['blue', 'green', 'red'])
# plt.xlabel('Models')
# plt.ylabel('Accuracy Score')
# plt.title('Model Performance Comparison')
# plt.xticks(rotation=45)
# plt.show()

## ⚙️ Step 9: Hyperparameter Tuning for Random Forest & XGBoost
Now let's **optimize** our models using GridSearchCV to find the best hyperparameters.

In [10]:
# Hyperparameter tuning for Random Forest 

# Apply Grid Search 
print('Best Random Forest Params:')

Best Random Forest Params:
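
One way the Random Forest search could look. The grid below is deliberately small so it runs quickly; the parameter values and 3-fold cross-validation are assumptions, and the data is synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the real features and labels
X, y = make_classification(n_samples=500, n_features=8, n_informative=4, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Candidate values to try; every combination is evaluated
rf_param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [None, 5],
    "min_samples_split": [2, 5],
}

# 3-fold cross-validation on the training set only
rf_grid = GridSearchCV(
    RandomForestClassifier(random_state=42),
    rf_param_grid,
    cv=3,
    scoring="accuracy",
)
rf_grid.fit(X_train, y_train)

print("Best Random Forest Params:", rf_grid.best_params_)
best_rf = rf_grid.best_estimator_  # refit on the full training set
```

The test set is left untouched during the search, so it can still give an unbiased score for `best_rf`.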


In [11]:
# Hyperparameter tuning for XGBoost

# Apply Grid Search 
print('Best XGBoost Params:')

Best XGBoost Params:


## 🔥 Step 10: Build a Stacked Model
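Now that we have optimized our models, let's combine them into an **ensemble model**. The starter code uses `VotingClassifier`, which combines the models' predictions by majority vote or averaged probabilities rather than training a meta-learner on top of them, as strict stacking does.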

In [12]:
# Build a stacked model using the best-tuned classifiers
# stacked_model = VotingClassifier()

# Train stacked model
print('Stacked Model Accuracy:')

Stacked Model Accuracy:


## 📊 Step 11: Final Model Comparison
Let's compare all models, including the stacked model.

In [14]:
# model_results = {
#     'Decision Tree': accuracy_score(),
#     'Random Forest': accuracy_score(),
#     'Tuned RF': accuracy_score(),
#     'XGBoost': accuracy_score(),
#     'Tuned XGBoost': accuracy_score(),
#     'Stacked Model': accuracy_score()
# }

# results_df = pd.DataFrame(list(model_results.items()), columns=['Model', 'Accuracy'])
# print(results_df)

# plt.figure(figsize=(10,5))
# plt.bar(results_df['Model'], results_df['Accuracy'], color=['blue', 'green', 'red', 'orange', 'purple', 'black'])
# plt.xlabel('Models')
# plt.ylabel('Accuracy Score')
# plt.title('Final Model Performance Comparison')
# plt.xticks(rotation=45)
# plt.show()

## 📝 Step 12: Final Questions
Please answer the following questions in the markdown cell below: <br>
1. **Model Comparison:** Which model performed best? Why do you think it performed better? <br>
2. **Hyperparameter Tuning:** How much did tuning improve the performance of Random Forest and XGBoost? <br>
3. **Stacked Model:** Did the stacked model outperform individual models? Why or why not? <br>
4. **Feature Importance:** Which features were most important in predicting cybercrime? Use the feature importance of XGBoost to analyze this. <br>
5. **Real-World Application:** How can this approach be used in real-world cybersecurity? <br> 

📌 Write your answers in the markdown cell below.