# Model Training for Intrusion Detection System

This notebook loads the preprocessed data, trains two classifiers (Random Forest and Gradient Boosting), evaluates them on the validation set, and saves the trained models to disk.

In [1]:
# Import necessary libraries
import pandas as pd
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.metrics import accuracy_score, classification_report
import joblib


## Load Preprocessed Data

We'll load the preprocessed training and validation datasets from CSV files.

In [2]:
# Load the preprocessed datasets
X_train = pd.read_csv('X_train.csv')
X_val = pd.read_csv('X_val.csv')
y_train = pd.read_csv('y_train.csv').values.ravel()
y_val = pd.read_csv('y_val.csv').values.ravel()

# Display the shape of the datasets
X_train.shape, X_val.shape, y_train.shape, y_val.shape
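
Before training, it can help to confirm that the features and labels line up. The sketch below is an illustration rather than part of the original pipeline: `check_split` is a hypothetical helper, and the tiny `X_demo` / `y_demo` objects stand in for the real CSV contents. It also flags `Unnamed:` columns, which pandas creates when a CSV was written with its index and read back without `index_col`; whether that applies to these files is an assumption.

```python
import numpy as np
import pandas as pd


def check_split(X, y, name):
    """Hypothetical helper: validate one feature/label pair and return class counts."""
    if len(X) != len(y):
        raise ValueError(f"{name}: {len(X)} feature rows but {len(y)} labels")
    if X.isna().any().any():
        raise ValueError(f"{name}: features contain missing values")
    # A leftover index column shows up as 'Unnamed: 0' after pd.read_csv
    stray = [col for col in X.columns if str(col).startswith("Unnamed:")]
    if stray:
        raise ValueError(f"{name}: unexpected index-like columns {stray}")
    labels, counts = np.unique(y, return_counts=True)
    return dict(zip(labels.tolist(), counts.tolist()))


# Toy stand-ins for X_train / y_train (assumption: binary 0/1 labels)
X_demo = pd.DataFrame({"f1": [0.1, 0.4, 0.3, 0.9], "f2": [1, 0, 1, 0]})
y_demo = np.array([0, 1, 0, 1])

class_counts = check_split(X_demo, y_demo, "demo")
print(class_counts)
```

In the notebook itself, the same helper could be called once for the training pair and once for the validation pair.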

## Train Machine Learning Models

We'll train two machine learning models: Random Forest and Gradient Boosting.

In [3]:
# Train a Random Forest Classifier
rf_model = RandomForestClassifier(n_estimators=100, random_state=42)
rf_model.fit(X_train, y_train)

# Train a Gradient Boosting Classifier
gb_model = GradientBoostingClassifier(n_estimators=100, random_state=42)
gb_model.fit(X_train, y_train)
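
If more models are added later, the two fits above could be expressed as a loop over a dictionary. This is only a sketch under stated assumptions: the data comes from `make_classification` rather than the notebook's CSV files, the `models` dictionary and its keys are illustrative names, and `n_estimators` is reduced to 20 so the example runs quickly.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier

# Synthetic stand-in for X_train / y_train (assumption: binary labels)
X_demo, y_demo = make_classification(n_samples=200, n_features=10, random_state=42)

# Illustrative registry of models; the notebook uses n_estimators=100
models = {
    "random_forest": RandomForestClassifier(n_estimators=20, random_state=42),
    "gradient_boosting": GradientBoostingClassifier(n_estimators=20, random_state=42),
}

# Fit each model on the same training data
for name, model in models.items():
    model.fit(X_demo, y_demo)
    print(f"{name}: fitted on {X_demo.shape[0]} rows, classes {model.classes_.tolist()}")
```

Keeping the models in one dictionary means the later evaluation and saving steps can iterate over the same names.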

## Evaluate the Models

We'll score both models on the validation set and compare their accuracy.

In [4]:
# Make predictions on the validation set
rf_val_preds = rf_model.predict(X_val)
gb_val_preds = gb_model.predict(X_val)

# Calculate accuracy
rf_accuracy = accuracy_score(y_val, rf_val_preds)
gb_accuracy = accuracy_score(y_val, gb_val_preds)

rf_accuracy, gb_accuracy

(0.9448529411764706, 0.9501470588235294)
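
Accuracy is the fraction of validation rows predicted correctly, so it does not show which kind of error a model makes. The sketch below uses hand-made toy labels (not the notebook's data) to show that `accuracy_score` matches a manual count, and uses `confusion_matrix` to separate false positives from false negatives. It assumes label `1` marks attack traffic, which the notebook does not state.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Toy labels and predictions standing in for y_val and a model's output
y_true = np.array([0, 0, 1, 1, 1, 0])
y_pred = np.array([0, 1, 1, 1, 0, 0])

# accuracy_score is the share of positions where prediction equals truth
accuracy = accuracy_score(y_true, y_pred)
manual_accuracy = (y_true == y_pred).mean()

# For binary labels [0, 1], ravel() yields tn, fp, fn, tp in that order
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()

print(f"accuracy={accuracy:.3f} (manual={manual_accuracy:.3f})")
print(f"tn={tn}, fp={fp}, fn={fn}, tp={tp}")
```

Under the stated assumption, `fn` counts attacks that were classified as normal traffic, which is often the costlier error for an intrusion detection system.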

## Classification Reports

We'll generate classification reports for both models to inspect per-class precision, recall, and F1-score.

In [5]:
# Generate classification reports
rf_class_report = classification_report(y_val, rf_val_preds)
gb_class_report = classification_report(y_val, gb_val_preds)

# Print the report so it renders as a table instead of an escaped string
print(rf_class_report)

              precision    recall  f1-score   support

           0       0.97      0.99      0.98      7460
           1       0.99      0.98      0.98      7840

    accuracy                           0.98     15300
   macro avg       0.98      0.98      0.98     15300
weighted avg       0.98      0.98      0.98     15300
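
To compare the two models side by side rather than reading two text reports, `classification_report` can return a dictionary via `output_dict=True`. The sketch below is illustrative only: the labels and the `model_a` / `model_b` predictions are made-up toy values, and the choice to report class `1` metrics is an assumption about which class matters most.

```python
import pandas as pd
from sklearn.metrics import classification_report

# Toy labels and predictions standing in for y_val and the two models' outputs
y_true = [0, 0, 1, 1, 1, 0]
preds = {
    "model_a": [0, 1, 1, 1, 0, 0],
    "model_b": [0, 0, 1, 1, 0, 0],
}

rows = {}
for name, y_pred in preds.items():
    # With integer labels, the per-class keys of the dict are the strings "0" and "1"
    report = classification_report(y_true, y_pred, output_dict=True)
    rows[name] = {
        "accuracy": report["accuracy"],
        "precision_1": report["1"]["precision"],
        "recall_1": report["1"]["recall"],
        "f1_1": report["1"]["f1-score"],
    }

# One row per model, one column per metric
comparison = pd.DataFrame(rows).T
print(comparison.round(3))
```

In the notebook, the same loop could run over `rf_val_preds` and `gb_val_preds` with `y_val` as the ground truth.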

## Save the Trained Models

We'll save the trained models to disk using the `joblib` library.

In [6]:
# Save the trained models
joblib.dump(rf_model, 'random_forest_model.joblib')
joblib.dump(gb_model, 'gradient_boosting_model.joblib')
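
A saved model is only useful if it can be restored, so a quick round-trip check is worth running. The sketch below is an assumption-laden illustration: it trains a small Random Forest on synthetic `make_classification` data, writes it to a temporary directory instead of the notebook's working directory, and confirms that the reloaded model gives identical predictions.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the training data; smaller forest for speed
X_demo, y_demo = make_classification(n_samples=200, n_features=10, random_state=42)
model = RandomForestClassifier(n_estimators=20, random_state=42).fit(X_demo, y_demo)

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = Path(tmp_dir) / "random_forest_model.joblib"
    joblib.dump(model, str(model_path))      # serialize the fitted estimator
    restored = joblib.load(str(model_path))  # read it back from disk

# The restored model should reproduce the original predictions exactly
round_trip_ok = np.array_equal(model.predict(X_demo), restored.predict(X_demo))
print("Predictions match after reload:", round_trip_ok)
```

Because joblib files are pickle-based, they should be loaded only from trusted sources and, ideally, with the same scikit-learn version that produced them.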

## Summary

In this notebook, we:
1. Loaded the preprocessed training and validation datasets.
2. Trained two machine learning models: Random Forest and Gradient Boosting.
3. Evaluated the models on the validation set and compared their performance.
4. Saved the trained models for future use.