# Exploratory Data Analysis (EDA) for Intrusion Detection System

In this notebook, we perform exploratory data analysis on the dataset used for the Intrusion Detection System (IDS). We summarize the features, check for missing values, visualize the distribution of the target variable, and examine correlations between features to identify patterns or anomalies.

In [None]:
# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Set visualization style
sns.set_theme(style='whitegrid')  # sns.set is an older alias of set_theme

In [None]:
# Load the dataset
data_path = '../data/processed/your_processed_data.csv'  # Update with your processed data path
df = pd.read_csv(data_path)

# Display the first few rows of the dataset
df.head()

In [None]:
# Summary statistics
df.describe()

In [None]:
# Check for missing values
missing_values = df.isnull().sum()
missing_values[missing_values > 0]
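
Raw counts are easier to judge next to the share of rows they represent. The following is a minimal sketch of such a summary; the helper name `missing_summary` and the toy frame (with made-up columns `duration`, `bytes`, `flag`) are illustrative assumptions, not part of the IDS dataset.

```python
import numpy as np
import pandas as pd


def missing_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Return missing-value counts and percentages for columns with gaps.

    Hypothetical helper for illustration only.
    """
    counts = frame.isnull().sum()
    summary = pd.DataFrame({
        'missing_count': counts,
        'missing_pct': 100 * counts / len(frame),
    })
    # Keep only columns that actually have missing values, worst first
    summary = summary[summary['missing_count'] > 0]
    return summary.sort_values('missing_count', ascending=False)


# Toy data standing in for the real dataset (assumed column names)
toy = pd.DataFrame({
    'duration': [1.0, np.nan, 3.0, 4.0],
    'bytes': [10, 20, 30, 40],
    'flag': ['S0', None, None, 'SF'],
})
report = missing_summary(toy)
print(report)
```

On the real data, `missing_summary(df)` would be called in place of the toy frame.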

In [None]:
# Visualize the distribution of the target variable
plt.figure(figsize=(10, 6))
sns.countplot(x='target_variable', data=df)  # Update 'target_variable' with your actual target column
plt.title('Distribution of Target Variable')
plt.xlabel('Target Variable')
plt.ylabel('Count')
plt.show()
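
A count plot shows class imbalance visually, and a small table of class shares makes it explicit. The sketch below assumes the target is a single label column; the helper name `class_balance` and the toy labels (`normal`, `attack`) are hypothetical and should be replaced with the actual target column.

```python
import pandas as pd


def class_balance(labels: pd.Series) -> pd.DataFrame:
    """Return per-class counts and shares of the total (illustrative helper)."""
    counts = labels.value_counts()
    return pd.DataFrame({
        'count': counts,
        'share': counts / counts.sum(),
    })


# Toy labels standing in for df['target_variable'] (assumed class names)
toy_labels = pd.Series(['normal'] * 8 + ['attack'] * 2, name='target_variable')
balance = class_balance(toy_labels)

# Ratio of the largest class to the smallest; values far above 1 indicate imbalance
imbalance_ratio = balance['count'].max() / balance['count'].min()
print(balance)
print(f'Imbalance ratio: {imbalance_ratio:.1f}')
```

A strongly skewed ratio is worth noting before modeling, since it affects which evaluation metrics are meaningful.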

In [None]:
# Visualize correlations between features
plt.figure(figsize=(12, 10))
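# Restrict to numeric columns: df.corr() raises on non-numeric data in pandas 2.0+
correlation_matrix = df.select_dtypes(include='number').corr()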
sns.heatmap(correlation_matrix, annot=True, fmt='.2f', cmap='coolwarm')
plt.title('Feature Correlation Matrix')
plt.show()
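
With many features, an annotated heatmap becomes hard to read, so it can help to list the strongly correlated pairs directly. This is a rough sketch under assumptions: the helper name `high_correlation_pairs`, the 0.9 threshold, and the toy columns are all illustrative choices, not values taken from the IDS dataset.

```python
import numpy as np
import pandas as pd


def high_correlation_pairs(frame: pd.DataFrame, threshold: float = 0.9) -> pd.Series:
    """Return feature pairs whose absolute Pearson correlation exceeds threshold.

    Hypothetical helper; the threshold is an arbitrary starting point.
    """
    corr = frame.select_dtypes(include='number').corr().abs()
    # Keep the strict upper triangle so each pair appears once and the diagonal is dropped
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(mask).stack()
    return pairs[pairs > threshold].sort_values(ascending=False)


# Toy data: 'b' is an exact multiple of 'a'; 'c' is only weakly related
toy = pd.DataFrame({
    'a': [1, 2, 3, 4, 5],
    'b': [2, 4, 6, 8, 10],
    'c': [5, 1, 4, 2, 3],
    'label': ['x', 'y', 'x', 'y', 'x'],
})
pairs = high_correlation_pairs(toy)
print(pairs)
```

Pairs reported this way are candidates for removal or merging during feature engineering, though the right threshold depends on the dataset.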

## Conclusion

In this notebook, we performed exploratory data analysis on the IDS dataset. We computed summary statistics, checked for missing values, visualized the distribution of the target variable, and examined correlations between features. These findings can guide further analysis and feature engineering.