Enhancing Cybersecurity Intelligence through Machine Learning: Cluster and Predictive Analysis of Honeypot Data
This repository contains the code and documentation for my master's thesis, which analyzes cyberattack data collected by a distributed network of honeypots around the world. The project is divided into two main parts: Cluster Analysis and Forecasting.
In the cluster analysis phase, the goal is to group cyberattack data into clusters in order to identify patterns and detect potential cyber threats. The main objectives of this part are:
- Data Classification: Utilizing clustering techniques to group cyberattack data.
- Pattern Detection: Identifying common patterns and behaviors within the classified data.
- Anomaly Detection: Detecting outliers and potential emerging threats.
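The clustering and outlier-detection objectives above can be illustrated with a minimal, self-contained sketch. It is not the pipeline implemented in /cluster-analysis: it assumes each attacking source has already been summarized as a numeric feature vector (the two features used here, sessions per day and distinct ports probed, are hypothetical), groups the vectors with plain k-means, and flags points that lie unusually far from their cluster centroid as candidate anomalies. The function names `kmeans` and `flag_outliers` are illustrative only.

```python
import math


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means on a list of equal-length numeric tuples.

    Initial centroids are taken at evenly spaced positions of the sorted
    points, a simple deterministic choice made for this sketch
    (k-means++ is the more common initialization in practice).
    """
    ordered = sorted(points)
    n = len(ordered)
    centroids = [ordered[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    labels = [0] * n
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                updated.append(centroids[c])  # empty cluster: keep previous centroid
        if updated == centroids:
            break  # centroids no longer move, so assignments are stable
        centroids = updated
    return labels, centroids


def flag_outliers(points, labels, centroids, z=2.0):
    """Indices of points whose distance to their own centroid exceeds mean + z * std."""
    dists = [math.dist(p, centroids[lab]) for p, lab in zip(points, labels)]
    mean = sum(dists) / len(dists)
    std = math.sqrt(sum((d - mean) ** 2 for d in dists) / len(dists))
    return [i for i, d in enumerate(dists) if d > mean + z * std]


# Hypothetical per-source features: (sessions per day, distinct ports probed)
features = [
    (1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.0),        # low-volume sources
    (10.0, 10.0), (10.5, 9.8), (9.7, 10.2), (10.1, 10.0),  # high-volume sources
    (30.0, 2.0),                                           # unusual source
]
labels, centroids = kmeans(features, k=2)
print(labels)                                     # [0, 0, 0, 0, 1, 1, 1, 1, 1]
print(flag_outliers(features, labels, centroids))  # [8]
```

In practice the features would usually be standardized first, since k-means relies on Euclidean distance, and both the number of clusters and the `z` threshold would need to be tuned to the data.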
The forecasting phase aims to predict the number of cyberattacks expected in the next month for a given honeypot (or country). Key objectives of this part include:
- Country-specific Forecasting: Developing models to forecast cyberattacks tailored to individual countries.
- Temporal Trends: Analyzing and incorporating temporal trends in the forecasting models.
- Data Visualization: Presenting forecast results through visualizations for better understanding.
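As a point of reference for the forecasting objectives, the sketch below shows a deliberately naive baseline rather than the models developed in /forecasting. It assumes the raw data can be reduced to `(date, country)` records, one per attack; it aggregates them into a gap-free monthly series per country and extrapolates a least-squares linear trend one month ahead. The helpers `monthly_counts`, `to_series`, and `linear_trend_forecast` are hypothetical names used only for illustration.

```python
from collections import Counter, defaultdict
from datetime import date


def monthly_counts(events):
    """Count attacks per country and calendar month from (date, country) records."""
    counts = defaultdict(Counter)
    for day, country in events:
        counts[country][(day.year, day.month)] += 1
    return counts


def to_series(month_counts):
    """Return counts in chronological order, inserting 0 for months with no attacks."""
    year, month = min(month_counts)
    last = max(month_counts)
    series = []
    while (year, month) <= last:
        series.append(month_counts.get((year, month), 0))
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return series


def linear_trend_forecast(series):
    """Fit y = a + b*t by ordinary least squares (t = 0, 1, ...) and predict t = len(series)."""
    n = len(series)
    if n == 1:
        return float(series[0])  # no trend can be estimated from a single month
    t_mean = (n - 1) / 2
    y_mean = sum(series) / n
    cov = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    var = sum((t - t_mean) ** 2 for t in range(n))
    slope = cov / var
    intercept = y_mean - slope * t_mean
    return intercept + slope * n


# Hypothetical example: attacks recorded against honeypots in two countries.
events = (
    [(date(2023, 1, 5), "ES")] * 3
    + [(date(2023, 3, 9), "ES")] * 5
    + [(date(2023, 2, 1), "US")] * 2
)
counts = monthly_counts(events)
es_series = to_series(counts["ES"])
print(es_series)  # [3, 0, 5]  (February 2023 had no attacks)
print(linear_trend_forecast(es_series))
print(linear_trend_forecast([100, 120, 140, 160]))  # 180.0
```

A linear trend ignores seasonality and can predict negative counts for a declining series, so it is mainly useful as a baseline against which richer forecasting models can be compared.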
- /cluster-analysis: Contains the code and documentation for the cluster analysis phase. The utils.py Python file in this folder contains all the functions needed to run the Jupyter notebooks.
- /forecasting: Contains the code and documentation for the forecasting phase. The utils.py Python file in this folder contains all the functions needed to run the Jupyter notebooks.
Getting Started
To replicate the analysis and forecasting results, follow these steps:
- Clone the Repository:
git clone https://github.com/davidrosado4/cyber-meets-ml.git && cd cyber-meets-ml
- Cluster Analysis:
Navigate to /cluster-analysis and follow the instructions in its README.
- Forecasting:
Navigate to /forecasting and follow the instructions in its README.