PCopath/Autoencoder-NIDS-Project

Autoencoder-based Network Intrusion Detection System (NIDS)

This project implements a Network Intrusion Detection System (NIDS) using a Deep Learning Autoencoder architecture. It is designed to detect anomalies in network traffic using the UNSW-NB15 dataset. The system learns the pattern of "normal" network traffic and flags any traffic with high reconstruction error as an attack/anomaly.

📖 Overview

A traditional signature-based IDS requires a database of known threats, so it cannot flag attacks it has no signature for. This project uses an anomaly detection approach instead:

  1. Training: The Autoencoder is trained only on normal network traffic data. It learns to compress (encode) and reconstruct (decode) these normal patterns efficiently.
  2. Detection: When new data comes in, the model attempts to reconstruct it.
    • Normal Traffic: Low reconstruction error (the model knows this pattern).
    • Attack Traffic: High reconstruction error (the model has never seen this pattern before).
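The detection step above can be sketched as follows. This is a minimal illustration, not the project's actual code: the function names `reconstruction_error` and `flag_anomalies`, the toy arrays, and the threshold value are all assumptions, and `x_hat` stands in for whatever the trained autoencoder would output.

```python
import numpy as np

def reconstruction_error(x, x_hat):
    """Per-sample mean squared error between inputs and their reconstructions."""
    x = np.asarray(x, dtype=float)
    x_hat = np.asarray(x_hat, dtype=float)
    return np.mean((x - x_hat) ** 2, axis=1)

def flag_anomalies(errors, threshold):
    """Boolean array: True where the reconstruction error exceeds the threshold."""
    return np.asarray(errors) > threshold

# Toy data (hypothetical): two flows reconstructed well, one reconstructed poorly.
x = np.array([[0.1, 0.2], [0.5, 0.5], [0.9, 0.1]])
x_hat = np.array([[0.1, 0.2], [0.5, 0.4], [0.1, 0.9]])

errors = reconstruction_error(x, x_hat)
is_attack = flag_anomalies(errors, threshold=0.05)  # threshold chosen for illustration
```

Only the third row is flagged here, because its reconstruction differs sharply from its input.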

🚀 Key Features

  • Robust Data Preprocessing: Handles categorical data (One-Hot Encoding), scaling (MinMax/Standardization), and cleaning of the complex UNSW-NB15 dataset.
  • Deep Autoencoder Architecture: Designed with TensorFlow/Keras, using symmetrical encoder-decoder layers to capture latent features of network flow.
  • Dynamic Thresholding: Computes candidate anomaly thresholds from the distribution of reconstruction errors (e.g., Mean + 2*Std, 95th/99th percentile, or an F1-optimized balance) and selects one automatically.
  • Comprehensive Visualization:
    • Loss Curves: To monitor training progress and check for overfitting.
    • Reconstruction Error Histograms: To visually inspect the separation between normal and attack traffic distributions.
    • Confusion Matrix: To evaluate the classification performance (TP, FP, TN, FN).
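As a rough sketch of the statistical thresholds named under Dynamic Thresholding, the candidates could be computed from the reconstruction errors of normal traffic as shown below. The function name `candidate_thresholds` and the synthetic error values are assumptions for illustration; the repository's own implementation may differ.

```python
import numpy as np

def candidate_thresholds(normal_errors):
    """Compute simple statistical thresholds from errors on normal traffic."""
    errors = np.asarray(normal_errors, dtype=float)
    return {
        "mean_plus_2std": float(errors.mean() + 2 * errors.std()),
        "p95": float(np.percentile(errors, 95)),
        "p99": float(np.percentile(errors, 99)),
    }

# Synthetic errors (hypothetical): 101 evenly spaced values in [0, 1].
normal_errors = np.linspace(0.0, 1.0, 101)
thresholds = candidate_thresholds(normal_errors)
```

A percentile threshold fixes the share of normal traffic that will be flagged (about 5% for the 95th percentile), while Mean + 2*Std depends on the shape of the error distribution.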

📂 Project Structure

├── main.py                 # Main entry point: manages the full pipeline (load -> train -> evaluate).
├── preprocess_and_save.py  # Standalone script for data preprocessing and saving to CSV.
├── requirements.txt        # List of python dependencies.
├── src/
│   ├── data_loader.py      # Functions for loading raw CSVs and saving processed data.
│   ├── model.py            # Definition of the Autoencoder class/model architecture.
│   └── utils.py            # Helper functions for plotting and metrics calculation.
├── outputs/                # Directory where generated graphs and models are saved.
├── data/                   # Directory for processed intermediate data (excluded from git).
└── UNSW_NB15/              # Directory for raw dataset files (excluded from git).

🛠️ Installation

  1. Clone the repository:

    git clone https://github.com/PCopath/Autoencoder-NIDS-Project.git
    cd Autoencoder-NIDS-Project
  2. Install dependencies: It is recommended to use a virtual environment (conda or venv).

    pip install -r requirements.txt
  3. Download the Dataset:

    • Download the UNSW-NB15 dataset (CSV files).
    • Place the following files inside the UNSW_NB15/ folder in the project root:
      • UNSW_NB15_training-set.csv
      • UNSW_NB15_testing-set.csv
    • Note: The dataset is too large to be included in this repository.

▶️ Usage

1. Full Pipeline (Recommended)

Run the main script to load data, train the model, and evaluate results:

python main.py

This script checks if processed data exists; if not, it performs preprocessing automatically.
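The existence check described above might look roughly like this. The file names in `PROCESSED_FILES` and the helper `needs_preprocessing` are hypothetical; the actual names written to data/ are not stated in this README.

```python
import tempfile
from pathlib import Path

# Hypothetical names for the processed files; the real ones may differ.
PROCESSED_FILES = ("train_processed.csv", "test_processed.csv")

def needs_preprocessing(data_dir, filenames=PROCESSED_FILES):
    """Return True if any expected processed file is missing from data_dir."""
    data_dir = Path(data_dir)
    return any(not (data_dir / name).is_file() for name in filenames)

# Demonstration in a temporary directory instead of the real data/ folder.
with tempfile.TemporaryDirectory() as tmp:
    before = needs_preprocessing(tmp)   # nothing written yet
    for name in PROCESSED_FILES:
        (Path(tmp) / name).write_text("placeholder\n")
    after = needs_preprocessing(tmp)    # all files now present
```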

2. Preprocessing Only

If you want to prepare the data without training (useful for debugging or preparing large datasets):

python preprocess_and_save.py
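For orientation, the two core preprocessing steps (one-hot encoding and min-max scaling) can be sketched in plain NumPy as below. This is an assumption-laden illustration, not the contents of preprocess_and_save.py: the helper names, the toy protocol values, and the numeric columns are invented for the example. The scaler is fitted on training data only, so test data is scaled with the same parameters.

```python
import numpy as np

def one_hot(values, categories):
    """One-hot encode against a fixed category list; unknown values become all zeros."""
    values = np.asarray(values)
    return (values[:, None] == np.asarray(categories)[None, :]).astype(float)

def fit_min_max(x_train):
    """Learn per-column minimum and range from the training data."""
    x_train = np.asarray(x_train, dtype=float)
    lo = x_train.min(axis=0)
    span = x_train.max(axis=0) - lo
    span[span == 0] = 1.0  # constant columns: avoid division by zero
    return lo, span

def apply_min_max(x, lo, span):
    """Scale data with parameters learned by fit_min_max."""
    return (np.asarray(x, dtype=float) - lo) / span

# Toy inputs (hypothetical): one categorical column and two numeric columns.
proto_train = ["tcp", "udp", "tcp"]
categories = sorted(set(proto_train))
numeric_train = np.array([[10.0, 1.0], [20.0, 1.0], [30.0, 1.0]])

lo, span = fit_min_max(numeric_train)
features = np.hstack([
    apply_min_max(numeric_train, lo, span),
    one_hot(proto_train, categories),
])
```

Fixing the category list from the training set keeps the feature width identical between training and test data, even when the test set contains a category the training set never saw.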

📊 Results & Interpretation

After running the model, check the outputs/ directory for:

  • loss_curve.png: Shows the training and validation loss (Mean Squared Error) over epochs. A decreasing curve indicates the model is learning to reconstruct normal traffic.
  • reconstruction_error_hist.png: A histogram showing the distribution of errors. You should ideally see two distinct peaks: one for Normal traffic (left, low error) and one for Attacks (right, high error). The vertical line represents the calculated threshold.
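  • confusion_matrix.png: Displays the counts of true positives, false positives, true negatives, and false negatives on the test set, from which accuracy, precision, and recall can be derived.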
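To read the confusion matrix quantitatively, the four counts can be turned into standard metrics. The sketch below assumes labels are encoded as 1 for attack and 0 for normal; the function names and the toy label arrays are illustrative, not taken from src/utils.py.

```python
import numpy as np

def confusion_counts(y_true, y_pred):
    """Return (TP, FP, TN, FN), treating 1/True as the attack class."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_pred).astype(bool)
    tp = int(np.sum(y_true & y_pred))
    fp = int(np.sum(~y_true & y_pred))
    tn = int(np.sum(~y_true & ~y_pred))
    fn = int(np.sum(y_true & ~y_pred))
    return tp, fp, tn, fn

def summarize(tp, fp, tn, fn):
    """Derive precision, recall, false positive rate, and F1 from the counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"precision": precision, "recall": recall, "fpr": fpr, "f1": f1}

# Toy labels (hypothetical): 4 attacks and 4 normal flows.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]

counts = confusion_counts(y_true, y_pred)
metrics = summarize(*counts)
```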

📝 Technical Notes

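  • Unsupervised/Semi-supervised Learning: The labels (Attack/Normal) are stripped during training, so the model never sees them. It is trained purely on X_train (normal data); because labels are needed to pick out those normal records, the setup is semi-supervised rather than fully unsupervised. Labels are otherwise used only at the end to evaluate how well the anomaly detection worked.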
  • Threshold Selection: The system calculates multiple potential thresholds. The "Balanced" threshold attempts to maximize the F1-Score while keeping the False Positive Rate (FPR) low.
  • Git Ignore: Large files (raw CSVs in UNSW_NB15/ and processed CSVs in data/) are ignored to keep the repository lightweight.
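One plausible reading of the "Balanced" threshold is: among candidate thresholds whose false positive rate stays under a cap, pick the one with the highest F1-score. The sketch below implements that reading. It is an assumption about the selection rule, not the project's confirmed logic, and the name `balanced_threshold`, the `max_fpr` parameter, and the toy data are hypothetical. It also assumes a labeled validation split is available for the search.

```python
import numpy as np

def balanced_threshold(errors, labels, max_fpr=0.05):
    """Pick the threshold with the best F1 among those with FPR <= max_fpr."""
    errors = np.asarray(errors, dtype=float)
    labels = np.asarray(labels).astype(bool)  # True = attack
    best_threshold, best_f1 = None, -1.0
    for t in np.unique(errors):  # each observed error is a candidate
        pred = errors > t
        tp = np.sum(pred & labels)
        fp = np.sum(pred & ~labels)
        fn = np.sum(~pred & labels)
        tn = np.sum(~pred & ~labels)
        fpr = fp / (fp + tn) if (fp + tn) else 0.0
        if fpr > max_fpr:
            continue  # too many false alarms at this threshold
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        if f1 > best_f1:
            best_threshold, best_f1 = float(t), float(f1)
    return best_threshold, best_f1

# Toy validation data (hypothetical): normal errors are small, attack errors larger.
errors = [0.01, 0.02, 0.03, 0.04, 0.30, 0.50, 0.70, 0.05]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
threshold, f1 = balanced_threshold(errors, labels, max_fpr=0.0)
```

Searching for the threshold on the same data used for the final evaluation would inflate the reported scores, so a separate validation split is the safer choice.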

📜 License

This project is open-source. Feel free to modify and use it for educational or research purposes.
