# Network Anomaly Detection

This project loads network traffic data, preprocesses it, reduces its dimensionality with an autoencoder, and trains multiple classifiers (KNN, Random Forest, Logistic Regression, SVM) for anomaly detection.
This project implements a machine learning pipeline to detect network anomalies using various classification algorithms. The dataset used for this project contains network traffic data with labeled anomalies.
## Table of Contents

- Installation
- Usage
- Project Structure
- Data
- Exploratory Data Analysis
- Data Preprocessing
- Dimensionality Reduction
- Model Training and Evaluation
- Results
- Contributing
- License
## Installation

1. Clone the repository:

   ```bash
   git clone https://github.com/HayatiYrtgl/Data_analysis_ml_methods_autoencoder.git
   ```
2. Create and activate a virtual environment (optional but recommended):

   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
   ```
3. Install the required packages:

   ```bash
   pip install -r requirements.txt
   ```
## Usage

1. Ensure your dataset is placed in the appropriate directory, as specified in the code (`../dataset/network_anomaly_detection/all_data (3).csv`).
2. Run the Jupyter notebook:

   ```bash
   jupyter notebook
   ```

3. Open the notebook and execute the cells to run the entire pipeline.
## Project Structure

```
network-anomaly-detection/
├── dataset/
│   └── network_anomaly_detection/
│       └── all_data (3).csv
├── README.md
├── requirements.txt
└── anomaly_detection.ipynb
```
- `dataset/`: Directory containing the dataset.
- `README.md`: This file.
- `requirements.txt`: List of Python packages required for the project.
- `anomaly_detection.ipynb`: Jupyter notebook containing the code.
## Data

The dataset used for this project contains network traffic data with labeled anomalies. It is loaded from a CSV file located in the `dataset/network_anomaly_detection/` directory.
## Exploratory Data Analysis

Initial data exploration includes:
- Viewing the first and last few rows of the dataset.
- Checking data types and missing values.
- Plotting the distribution of the target variable.
- Visualizing feature correlations.
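These exploration steps can be sketched with pandas. Since the real CSV's columns aren't listed here, a small synthetic frame stands in for it (the column names `duration`, `bytes`, and `label` are illustrative assumptions):

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the project's CSV; in the notebook the frame
# would come from pd.read_csv on the dataset file instead.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "duration": rng.random(100),          # numeric feature
    "bytes": rng.integers(0, 1000, 100),  # numeric feature
    "label": rng.integers(0, 2, 100),     # target: 0 = normal, 1 = anomaly
})

print(df.head())                    # first few rows
print(df.tail())                    # last few rows
print(df.dtypes)                    # data types
print(df.isna().sum())              # missing values per column
print(df["label"].value_counts())   # distribution of the target variable
print(df.corr(numeric_only=True))   # feature correlations
```

The correlation matrix is usually visualized as a heatmap (e.g. with `seaborn.heatmap`) rather than printed.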
## Data Preprocessing

Steps include:
- Handling missing values and duplicates.
- Encoding categorical variables.
- Scaling numerical features using MinMaxScaler.
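A minimal sketch of those three steps (the tiny frame and its column names are made up for illustration; `LabelEncoder` and `MinMaxScaler` mirror the description above):

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, MinMaxScaler

# Tiny illustrative frame; the real data has many more columns.
df = pd.DataFrame({
    "protocol": ["tcp", "udp", "tcp", "tcp"],
    "bytes": [100.0, 250.0, 100.0, 900.0],
    "label": ["normal", "anomaly", "normal", "normal"],
})

df = df.drop_duplicates().dropna()  # drop duplicate rows and missing values

# Encode categorical columns as integers
df["protocol"] = LabelEncoder().fit_transform(df["protocol"])
df["label"] = LabelEncoder().fit_transform(df["label"])

# Scale numeric features into [0, 1]
df[["bytes"]] = MinMaxScaler().fit_transform(df[["bytes"]])
print(df)
```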
## Dimensionality Reduction

An autoencoder is used for dimensionality reduction: the network is trained to reconstruct its input, and the compressed bottleneck representation retains the most informative features.
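The notebook presumably builds its autoencoder with a deep-learning library; to keep this sketch dependency-free, here is a minimal *linear* autoencoder trained by plain gradient descent in NumPy. The data, bottleneck size, learning rate, and iteration count are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((200, 8))            # 200 samples, 8 features (synthetic)

k = 3                               # bottleneck: target dimensionality
W_enc = rng.normal(0, 0.1, (8, k))  # encoder weights
W_dec = rng.normal(0, 0.1, (k, 8))  # decoder weights
lr = 0.05

for _ in range(500):                # minimize reconstruction error
    Z = X @ W_enc                   # encode to k dimensions
    X_hat = Z @ W_dec               # decode back to 8 dimensions
    err = X_hat - X
    W_dec -= lr * (Z.T @ err) / len(X)
    W_enc -= lr * (X.T @ (err @ W_dec.T)) / len(X)

X_reduced = X @ W_enc               # compressed features for the classifiers
print(X_reduced.shape)              # (200, 3)
```

A real autoencoder adds nonlinear activations and biases and trains with an optimizer such as Adam, but the principle is the same: train to reconstruct, then keep the bottleneck output.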
## Model Training and Evaluation

Four different classifiers are trained and evaluated:
- K-Nearest Neighbors (KNN)
- Random Forest
- Logistic Regression
- Support Vector Machine (SVM)
The performance of each model is assessed using a classification report.
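The training loop might look like the following sketch, shown on synthetic data (in the project it would receive the autoencoder-reduced features; the hyperparameters here are scikit-learn defaults, not the notebook's actual settings):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic stand-in for the reduced feature matrix and labels
X, y = make_classification(n_samples=300, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

models = {
    "KNN": KNeighborsClassifier(),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SVM": SVC(),
}

# Fit each model and print its classification report
reports = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    reports[name] = classification_report(y_test, model.predict(X_test))
    print(f"=== {name} ===")
    print(reports[name])
```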
## Results

The results of the models, including precision, recall, and F1-score, are printed for each classifier.
## Contributing

Contributions are welcome! Please fork the repository and submit a pull request.
## License

This project is licensed under the MIT License. See the LICENSE file for details.