Autoencoder-based Network Intrusion Detection System (NIDS)

This project implements a Network Intrusion Detection System (NIDS) using a Deep Learning Autoencoder architecture. It is designed to detect anomalies in network traffic using the UNSW-NB15 dataset. The system learns the pattern of "normal" network traffic and flags any traffic with high reconstruction error as an attack/anomaly.

📖 Overview

A traditional signature-based IDS requires a database of known threats, so it cannot flag attacks for which no signature exists. This project uses an anomaly detection approach instead:

Training: The Autoencoder is trained only on normal network traffic data. It learns to compress (encode) and reconstruct (decode) these normal patterns efficiently.
Detection: When new data comes in, the model attempts to reconstruct it.
- Normal Traffic: Low reconstruction error (the model knows this pattern).
- Attack Traffic: High reconstruction error (the model has never seen this pattern before).
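
The detection step above can be sketched in a few lines. This is a minimal, dependency-free illustration, not the repository's actual code: the function names `reconstruction_error` and `flag_anomalies`, the toy flows, the made-up reconstructions, and the threshold value are all assumptions chosen for the example.

```python
from typing import List, Sequence


def reconstruction_error(x: Sequence[float], x_hat: Sequence[float]) -> float:
    """Mean squared error between one input vector and its reconstruction."""
    if len(x) != len(x_hat):
        raise ValueError("input and reconstruction must have the same length")
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def flag_anomalies(errors: Sequence[float], threshold: float) -> List[bool]:
    """True marks a flow whose error exceeds the threshold (suspected attack)."""
    return [e > threshold for e in errors]


# Toy flows (already scaled to [0, 1]) and made-up reconstructions.
# In the real pipeline the reconstructions would come from the trained model.
flows = [[0.10, 0.20, 0.30], [0.90, 0.10, 0.80]]
reconstructions = [[0.10, 0.20, 0.30], [0.40, 0.10, 0.30]]

errors = [reconstruction_error(x, r) for x, r in zip(flows, reconstructions)]
flags = flag_anomalies(errors, threshold=0.05)  # threshold is illustrative
```

The first flow is reconstructed exactly and stays below the threshold, while the second is reconstructed poorly and is flagged.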

🚀 Key Features

Robust Data Preprocessing: Handles categorical data (One-Hot Encoding), scaling (MinMax/Standardization), and cleaning of the complex UNSW-NB15 dataset.
Deep Autoencoder Architecture: Designed with TensorFlow/Keras, using symmetrical encoder-decoder layers to capture latent features of network flow.
Dynamic Thresholding: Derives the anomaly threshold from the distribution of reconstruction errors (e.g., Mean + 2*Std or the 95th/99th percentile) or, when labels are available, from a search that balances the F1-score.
Comprehensive Visualization:
- Loss Curves: To monitor training progress and check for overfitting.
- Reconstruction Error Histograms: To visually inspect the separation between normal and attack traffic distributions.
- Confusion Matrix: To evaluate the classification performance (TP, FP, TN, FN).
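
As a rough illustration of the symmetrical encoder-decoder design listed above, the following Keras sketch builds a dense autoencoder whose decoder mirrors its encoder. The function name `build_autoencoder`, the layer sizes (32, 16, bottleneck 8), the activations, and the optimizer are assumptions made for this example; the architecture in `src/model.py` may differ.

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_autoencoder(n_features: int, hidden=(32, 16), bottleneck: int = 8) -> keras.Model:
    """Symmetric dense autoencoder: hidden layers mirror around the bottleneck."""
    model = keras.Sequential(name="autoencoder")
    model.add(keras.Input(shape=(n_features,)))
    for units in hidden:  # encoder: progressively narrower layers
        model.add(layers.Dense(units, activation="relu"))
    model.add(layers.Dense(bottleneck, activation="relu", name="bottleneck"))
    for units in reversed(hidden):  # decoder: same widths in reverse order
        model.add(layers.Dense(units, activation="relu"))
    # Sigmoid output assumes features were scaled to [0, 1] (e.g. MinMax).
    model.add(layers.Dense(n_features, activation="sigmoid"))
    # MSE matches the reconstruction-error loss described in this README.
    model.compile(optimizer="adam", loss="mse")
    return model
```

Training such a model on normal traffic only would pass the same array as input and target, for example `model.fit(X_normal, X_normal, validation_split=0.1)`, where `X_normal` is a hypothetical array of preprocessed normal flows.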

📂 Project Structure

├── main.py                 # Main entry point: manages the full pipeline (load -> train -> evaluate).
├── preprocess_and_save.py  # Standalone script for data preprocessing and saving to CSV.
├── requirements.txt        # List of python dependencies.
├── src/
│   ├── data_loader.py      # Functions for loading raw CSVs and saving processed data.
│   ├── model.py            # Definition of the Autoencoder class/model architecture.
│   └── utils.py            # Helper functions for plotting and metrics calculation.
├── outputs/                # Directory where generated graphs and models are saved.
├── data/                   # Directory for processed intermediate data (excluded from git).
└── UNSW_NB15/              # Directory for raw dataset files (excluded from git).

🛠️ Installation

Clone the repository:

git clone https://github.com/PCopath/Autoencoder-NIDS-Project.git
cd Autoencoder-NIDS-Project

Install dependencies: It is recommended to use a virtual environment (conda or venv).
```
pip install -r requirements.txt
```
Download the Dataset:
- Download the UNSW-NB15 dataset (CSV files).
- Place the following files inside the UNSW_NB15/ folder in the project root:
  - UNSW_NB15_training-set.csv
  - UNSW_NB15_testing-set.csv
- Note: The dataset is too large to be included in this repository.

▶️ Usage

1. Full Pipeline (Recommended)

Run the main script to load data, train the model, and evaluate results:

python main.py

The script checks whether processed data already exists; if not, it runs preprocessing automatically.
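
The check described above could look roughly like the sketch below. The processed file names and the helper `processed_data_exists` are hypothetical; only the idea of looking for existing processed files comes from this README.

```python
from pathlib import Path

# Hypothetical file names; the repository may use different ones.
PROCESSED_FILES = ("train_processed.csv", "test_processed.csv")


def processed_data_exists(data_dir: Path, names=PROCESSED_FILES) -> bool:
    """True only if every expected processed file is present and non-empty."""
    return all(
        (data_dir / name).is_file() and (data_dir / name).stat().st_size > 0
        for name in names
    )


if __name__ == "__main__":
    if processed_data_exists(Path("data")):
        print("Processed data found, skipping preprocessing.")
    else:
        print("Processed data missing, preprocessing is required.")
```

Checking for non-empty files as well as existence guards against a previous run that was interrupted before it finished writing.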

2. Preprocessing Only

If you want to prepare the data without training (useful for debugging or preparing large datasets):

python preprocess_and_save.py
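
To make the preprocessing step more concrete, here is a small dependency-free sketch of one-hot encoding and MinMax scaling. It is an illustration under stated assumptions, not the script's implementation: the helper names are invented, the two-row sample is made up, and the actual project may rely on pandas or scikit-learn. `proto` and `dur` are used as example column names.

```python
from typing import List, Sequence, Tuple


def fit_categories(values: Sequence[str]) -> List[str]:
    """Sorted vocabulary learned from the training split."""
    return sorted(set(values))


def one_hot(value: str, categories: Sequence[str]) -> List[float]:
    """One-hot vector; categories unseen during fitting encode as all zeros."""
    return [1.0 if value == c else 0.0 for c in categories]


def fit_min_max(values: Sequence[float]) -> Tuple[float, float]:
    """Minimum and maximum observed in the training split."""
    return min(values), max(values)


def min_max_scale(value: float, lo: float, hi: float) -> float:
    """Scale to [0, 1] using training min/max; clip values outside that range."""
    if hi == lo:
        return 0.0
    return min(1.0, max(0.0, (value - lo) / (hi - lo)))


# Made-up training rows with one categorical and one numeric column.
train = [{"proto": "tcp", "dur": 0.5}, {"proto": "udp", "dur": 2.5}]
protos = fit_categories([r["proto"] for r in train])
lo, hi = fit_min_max([r["dur"] for r in train])


def encode(row: dict) -> List[float]:
    """Concatenate the one-hot protocol vector with the scaled duration."""
    return one_hot(row["proto"], protos) + [min_max_scale(row["dur"], lo, hi)]


encoded = [encode(r) for r in train]
unseen = encode({"proto": "icmp", "dur": 10.0})  # unseen category, out-of-range value
```

Fitting the vocabulary and the min/max on the training split only, then reusing them for the test split, keeps test-set statistics from leaking into the features.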

📊 Results & Interpretation

After running the model, check the outputs/ directory for:

loss_curve.png: Shows the training and validation loss (Mean Squared Error) over epochs. Curves that decrease together indicate the model is learning to reconstruct normal traffic; a validation loss that rises while the training loss keeps falling suggests overfitting.
reconstruction_error_hist.png: A histogram showing the distribution of errors. You should ideally see two distinct peaks: one for Normal traffic (left, low error) and one for Attacks (right, high error). The vertical line represents the calculated threshold.
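confusion_matrix.png: Shows the counts of true positives, false positives, true negatives, and false negatives on the test set, from which accuracy, precision, recall, and false positive rate can be derived.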
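
For readers who want to turn the confusion matrix into summary numbers, the sketch below counts TP/FP/TN/FN and derives the usual metrics. The function names and the sample labels are invented for illustration, and the convention 1 = attack, 0 = normal is an assumption about how the labels are encoded in this project.

```python
from typing import Dict, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count TP/FP/TN/FN with 1 = attack (positive) and 0 = normal."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for t, p in zip(y_true, y_pred):
        if p == 1:
            counts["TP" if t == 1 else "FP"] += 1
        else:
            counts["FN" if t == 1 else "TN"] += 1
    return counts


def summarize(c: Dict[str, int]) -> Dict[str, float]:
    """Derive standard metrics; empty denominators yield 0.0."""
    tp, fp, tn, fn = c["TP"], c["FP"], c["TN"], c["FN"]
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "fpr": fp / (fp + tn) if fp + tn else 0.0,
    }


# Made-up labels and predictions for eight flows.
y_true = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred = [0, 1, 0, 1, 1, 0, 1, 0]
counts = confusion_counts(y_true, y_pred)
metrics = summarize(counts)
```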

📝 Technical Notes

Unsupervised/Semi-supervised Learning: The labels (Attack/Normal) are stripped before training, so the model sees only the features of X_train (Normal data). Apart from selecting those Normal rows, labels are used only at the end to evaluate how well the anomaly detection worked.
Threshold Selection: The system calculates multiple potential thresholds. The "Balanced" threshold attempts to maximize the F1-Score while keeping the False Positive Rate (FPR) low.
Git Ignore: Large files (raw CSVs in UNSW_NB15/ and processed CSVs in data/) are ignored to keep the repository lightweight.
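
The threshold selection described in the notes above can be sketched as follows. This is one plausible reading, not the project's confirmed method: the function names, the population standard deviation, the interpolation rule for percentiles, and the `max_fpr` cap used to keep the False Positive Rate low are all assumptions, and the error values are made up.

```python
import statistics
from typing import Optional, Sequence, Tuple


def mean_plus_k_std(errors: Sequence[float], k: float = 2.0) -> float:
    """Mean + k * population standard deviation of the errors."""
    return statistics.fmean(errors) + k * statistics.pstdev(errors)


def percentile(errors: Sequence[float], q: float) -> float:
    """q-th percentile (0-100) with linear interpolation between ranks."""
    s = sorted(errors)
    pos = (len(s) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(s) - 1)
    return s[lower] + (s[upper] - s[lower]) * (pos - lower)


def best_f1_threshold(
    errors: Sequence[float], labels: Sequence[int], max_fpr: float = 0.05
) -> Tuple[Optional[float], float]:
    """Try each observed error as a threshold; keep the best F1 under the FPR cap."""
    best_t, best_f1 = None, -1.0
    n_normal = sum(1 for y in labels if y == 0)
    for t in sorted(set(errors)):
        tp = sum(e > t and y == 1 for e, y in zip(errors, labels))
        fp = sum(e > t and y == 0 for e, y in zip(errors, labels))
        fn = sum(e <= t and y == 1 for e, y in zip(errors, labels))
        fpr = fp / n_normal if n_normal else 0.0
        if fpr > max_fpr:
            continue  # too many false alarms at this threshold
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1


# Made-up reconstruction errors on normal traffic.
normal_errors = [0.01, 0.02, 0.02, 0.03, 0.02]
stat_threshold = mean_plus_k_std(normal_errors)
p95_threshold = percentile(normal_errors, 95)

# Made-up labeled errors (1 = attack, 0 = normal) for the F1-based search.
val_errors = [0.01, 0.02, 0.03, 0.20, 0.30, 0.04]
val_labels = [0, 0, 0, 1, 1, 0]
balanced_threshold, balanced_f1 = best_f1_threshold(val_errors, val_labels)
```

The two statistical thresholds need only errors from normal traffic, whereas the F1-based search needs labeled data, so it would typically run on a labeled validation or test split.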

📜 License

This project is open-source. Feel free to modify and use it for educational or research purposes.
