# CyberThreat Insight – Cybersecurity Data Generator  
Anomalous Behavior Detection in Cybersecurity Analytics using Generative AI

**Toronto, September 1, 2025**  
**Author: Atsu Vovor**  
> Master of Management in Artificial Intelligence  
> Consultant Data Analytics Specialist | Machine Learning | Data Science | Quantitative Analysis | French & English Bilingual

**Model Name:** Cybersecurity Data Generator  
**Model Type:** Data Generator

## Project Overview
This notebook provides a Python script to generate **synthetic cybersecurity issue data**. It simulates both **normal and anomalous cybersecurity events** for:

* Training anomaly detection models (labeled normal/anomalous samples)
* Testing SIEM systems (realistic security logs and incidents)
* Developing and validating security analytics (exploring KPIs/KRIs and metrics)
* Demonstrating cybersecurity concepts for educational purposes

The script generates attributes such as issue details, user activity, system metrics, and threat indicators. It includes an **adaptive defense mechanism** based on threat level and severity.

## Script Structure and Functionality
**Classes:**

**1. `DataConfig`** – Holds configuration parameters for users, departments, date ranges, categories, severities, statuses, reporters, assignees, locations, column names, and pre-defined DataFrames for KTIs and Scenarios.

**2. `DataGenerator`** – Generates synthetic datasets:
* Maps issue categories to normal/anomalous names
* Filters categories into KPIs/KRIs
* Generates random dates within range
* Calculates threat level via `calculate_threat_level`
* Suggests defense actions via `adaptive_defense_mechanism`
* Generates DataFrames for normal/anomalous issues
* Combines datasets and adds "Is Anomaly" label
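
The generation flow listed above can be sketched with the standard library alone. This is a minimal illustration, not the script's actual logic: the method names `calculate_threat_level` and `adaptive_defense_mechanism` and the "Is Anomaly" label come from this README, while the scoring rule, the defense actions, the `make_issue` helper, and every other column name are assumptions made for the example.

```python
import random
from datetime import datetime, timedelta

# Hypothetical severity weights; the real script may score differently.
SEVERITY_SCORE = {"Low": 1, "Medium": 2, "High": 3, "Critical": 4}

# Hypothetical response for each threat level.
DEFENSE_ACTIONS = {
    "Low": "Monitor",
    "Medium": "Increase logging",
    "High": "Restrict user access",
    "Critical": "Isolate affected system",
}


def random_date(start, end, rng):
    """Return a random datetime between start and end (inclusive)."""
    span_seconds = int((end - start).total_seconds())
    return start + timedelta(seconds=rng.randint(0, span_seconds))


def calculate_threat_level(severity, failed_logins):
    """Assumed rule: severity weight plus a penalty for many failed logins."""
    score = SEVERITY_SCORE[severity] + (2 if failed_logins > 10 else 0)
    if score >= 5:
        return "Critical"
    if score >= 4:
        return "High"
    if score >= 2:
        return "Medium"
    return "Low"


def adaptive_defense_mechanism(threat_level):
    """Look up the assumed defense action for a threat level."""
    return DEFENSE_ACTIONS[threat_level]


def make_issue(rng, start, end, is_anomaly):
    """Build one synthetic issue; anomalies get higher severities and more failed logins."""
    severity = rng.choice(["High", "Critical"] if is_anomaly else ["Low", "Medium"])
    failed_logins = rng.randint(11, 50) if is_anomaly else rng.randint(0, 3)
    threat_level = calculate_threat_level(severity, failed_logins)
    return {
        "Date": random_date(start, end, rng),
        "Severity": severity,
        "Failed Logins": failed_logins,
        "Threat Level": threat_level,
        "Defense Action": adaptive_defense_mechanism(threat_level),
        "Is Anomaly": int(is_anomaly),
    }


rng = random.Random(42)  # fixed seed so the sample is reproducible
start, end = datetime(2023, 1, 1), datetime(2023, 12, 31)
normal = [make_issue(rng, start, end, False) for _ in range(5)]
anomalous = [make_issue(rng, start, end, True) for _ in range(2)]

# Combine both sets and shuffle so the label is not implied by row order.
dataset = normal + anomalous
rng.shuffle(dataset)
```

Seeding the random generator keeps a generated sample reproducible, which helps when comparing anomaly detection models on the same data.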

**3. `DataProcessor`** – Adds visual indicators (threat level/severity colors).
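
As a rough sketch of what adding visual indicators could look like, the snippet below maps threat level and severity to color names. The color scheme, the column names, and the `add_color_indicators` helper are hypothetical and only illustrate the idea.

```python
# Hypothetical color scheme; the script's actual palette may differ.
LEVEL_COLORS = {"Low": "green", "Medium": "yellow", "High": "orange", "Critical": "red"}


def add_color_indicators(rows):
    """Return copies of the rows with color columns added; unknown levels map to gray."""
    enriched_rows = []
    for row in rows:
        enriched = dict(row)  # copy so the input rows stay unchanged
        enriched["Threat Level Color"] = LEVEL_COLORS.get(row["Threat Level"], "gray")
        enriched["Severity Color"] = LEVEL_COLORS.get(row["Severity"], "gray")
        enriched_rows.append(enriched)
    return enriched_rows


issues = [
    {"Severity": "High", "Threat Level": "Critical"},
    {"Severity": "Low", "Threat Level": "Unknown"},
]
colored = add_color_indicators(issues)
```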

**4. `DataSaver`** – Saves DataFrames to Google Drive or local paths.
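
The choice between Drive and a local path might be handled along these lines. This is a sketch under assumptions: `choose_output_dir` and `save_rows` are hypothetical helpers, the Drive path is the one given in this README, and the demo writes to a temporary folder rather than a real Drive.

```python
import csv
import tempfile
from pathlib import Path

DRIVE_DIR = Path("/content/drive/My Drive/Cybersecurity Data")  # path given in this README
LOCAL_DIR = Path("cybersecurity_data")  # local fallback folder


def choose_output_dir(drive_dir=DRIVE_DIR, local_dir=LOCAL_DIR):
    """Use the Drive folder when its parent exists (Drive is mounted), else the local one."""
    target = drive_dir if drive_dir.parent.is_dir() else local_dir
    target.mkdir(parents=True, exist_ok=True)
    return target


def save_rows(rows, filename, out_dir):
    """Write a list of dicts to a CSV file and return the file path."""
    path = Path(out_dir) / filename
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    return path


# Demo in a temporary directory: no Drive is mounted there, so the local folder is used.
tmp = Path(tempfile.mkdtemp())
out_dir = choose_output_dir(
    drive_dir=tmp / "no_drive" / "Cybersecurity Data",
    local_dir=tmp / "cybersecurity_data",
)
rows = [{"Issue ID": 1, "Is Anomaly": 0}, {"Issue ID": 2, "Is Anomaly": 1}]
saved = save_rows(rows, "example_issues.csv", out_dir)
```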

**5. `DataDisplay`** – Displays `.head()`, `.info()`, `.describe()` for quick inspection.

## How to Use the Script

### 1. Connect to Google Drive
Mount your Google Drive in Colab so the script can save the generated datasets there.
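
If Drive is not mounted yet, the standard Colab call is `drive.mount`. The guarded version below is a sketch that also runs outside Colab; the `IN_COLAB` flag is a hypothetical name, and whether the script mounts Drive on its own is not stated here.

```python
try:
    # google.colab is only importable inside a Google Colab runtime.
    from google.colab import drive

    drive.mount("/content/drive")  # opens an authorization prompt in Colab
    IN_COLAB = True
except ImportError:
    IN_COLAB = False
    print("Not running in Colab; use a local output folder instead.")
```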


### 2. Run the Code
**Option 1: Run normally**

```python
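# %run expects a local file, so download the script first
!wget -q -O cyberdatagen.py https://raw.githubusercontent.com/atsuvovor/CyberThreat_Insight/main/datagen/cyberdatagen.py
%run cyberdatagen.py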
```

At the end, you’ll be prompted:

```
Would you like to download the data files locally as well? (yes/no):
```

* **yes** → creates and downloads `cybersecurity_data.zip`
* **no** → keeps datasets in the Google Drive folder only

**Option 2: Run in One Cell (Recommended for Colab)**

Generate all datasets and get a download link from a **single cell**:

```python
# Shell commands, IPython magics, and Python statements cannot be chained with &&,
# so each step goes on its own line.
!wget -q -O cyberdatagen.py https://raw.githubusercontent.com/atsuvovor/CyberThreat_Insight/main/datagen/cyberdatagen.py
%run cyberdatagen.py
from IPython.display import FileLink, display
display(FileLink("cybersecurity_data.zip", result_html_prefix="✅ Download: "))
```

This will:
1. Download the latest script from GitHub
2. Run it to generate datasets in `cybersecurity_data/`
3. Provide a clickable ZIP download link
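
For reference, packaging a folder into `cybersecurity_data.zip` can be done with `shutil.make_archive`. The sketch below assumes a flat folder of CSV files and runs in a temporary directory; `zip_folder` and `example.csv` are hypothetical names, and the script itself may build the archive differently.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def zip_folder(folder, archive_base):
    """Zip the contents of folder into <archive_base>.zip and return the archive path."""
    return shutil.make_archive(str(archive_base), "zip", root_dir=str(folder))


# Demo with a placeholder CSV in a temporary working directory.
work = Path(tempfile.mkdtemp())
data_dir = work / "cybersecurity_data"
data_dir.mkdir()
(data_dir / "example.csv").write_text("Issue ID,Is Anomaly\n1,0\n", encoding="utf-8")

archive = zip_folder(data_dir, work / "cybersecurity_data")

# List the archive members to confirm what was packaged.
with zipfile.ZipFile(archive) as zf:
    names = zf.namelist()
```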

### 3. Access the Data

Generated CSV files will be saved in:

```
/content/drive/My Drive/Cybersecurity Data
```

If you chose the ZIP download, the archive is also available in the notebook environment at:

```
/content/cybersecurity_data.zip
```
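
Once the archive exists, its CSV files can be read without extracting them. The following is a sketch using only the standard library; `read_csvs_from_zip`, the member name `example.csv`, and the `Issue ID` column are placeholders, since the actual file and column names depend on the script.

```python
import csv
import io
import tempfile
import zipfile
from pathlib import Path


def read_csvs_from_zip(zip_path):
    """Return {member name: list of row dicts} for every CSV file in the archive."""
    tables = {}
    with zipfile.ZipFile(zip_path) as zf:
        for name in zf.namelist():
            if name.lower().endswith(".csv"):
                with zf.open(name) as raw:
                    text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                    tables[name] = list(csv.DictReader(text))
    return tables


# Demo archive with a placeholder file; replace demo_zip with the real archive path.
demo_zip = Path(tempfile.mkdtemp()) / "cybersecurity_data.zip"
with zipfile.ZipFile(demo_zip, "w") as zf:
    zf.writestr("example.csv", "Issue ID,Is Anomaly\n1,0\n2,1\n")

tables = read_csvs_from_zip(demo_zip)
# csv.DictReader yields strings, so the label is compared as text.
anomaly_count = sum(row["Is Anomaly"] == "1" for row in tables["example.csv"])
```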

## Customization

Adjust parameters in `DataConfig`:
* Number of normal/anomalous issues
* Number of users, reporters, assignees
* Date ranges
* Categories, severities, statuses
* Distribution logic (edited in the `DataGenerator` methods rather than in `DataConfig`)
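
To make the kinds of parameters concrete, here is a hypothetical configuration object. `DataConfigSketch`, its field names, and its default values are all assumptions for illustration; check the real `DataConfig` class for the actual attribute names.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class DataConfigSketch:
    """Illustrative stand-in for DataConfig; every field here is hypothetical."""

    num_normal_issues: int = 800
    num_anomalous_issues: int = 200
    num_users: int = 100
    start_date: datetime = datetime(2023, 1, 1)
    end_date: datetime = datetime(2024, 1, 1)
    severities: list = field(default_factory=lambda: ["Low", "Medium", "High", "Critical"])
    statuses: list = field(default_factory=lambda: ["Open", "In Progress", "Resolved", "Closed"])

    def __post_init__(self):
        # Reject settings that would make generation impossible.
        if self.start_date >= self.end_date:
            raise ValueError("start_date must be earlier than end_date")
        if min(self.num_normal_issues, self.num_anomalous_issues) < 0:
            raise ValueError("issue counts must be non-negative")


# Override only the parameters that should change.
config = DataConfigSketch(num_anomalous_issues=50, severities=["Low", "High"])
total_issues = config.num_normal_issues + config.num_anomalous_issues
anomaly_ratio = config.num_anomalous_issues / total_issues
```

Validating the values when the object is created surfaces mistakes such as a reversed date range before any data is generated.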

## Future Enhancements

* More sophisticated anomaly generation techniques
* Complex feature interactions and dependencies
* Incorporate network traffic/log data
* Command-line interface or GUI for configuration
* Alternative data formats (Parquet, JSON)

## Author

**Atsu Vovor**  
Data & Analytics Consultant | Cybersecurity | AI Reporting  
[LinkedIn](https://www.linkedin.com/in/atsu-vovor-mmai-9188326/) | [GitHub](https://github.com/atsuvovor)