# Analysis of Fuzzing Attack Influence on Payload_Entropy and Payload_Decimal

This notebook compares `Payload_Entropy` and `Payload_Decimal` between two datasets:
- **normal_PE_PD.csv**: Normal CAN bus traffic.
- **fuzzing_PE_PD.csv**: Normal traffic + fuzzing attack (messages with payload `FFFFFFFFFFFFFFFF` across any `CAN_ID`).

**Goal**: Analyze the fuzzing attack's impact on `Payload_Entropy` and `Payload_Decimal` to enhance detectability for Intrusion Detection Systems (IDS).

**Datasets**:
- **Columns**:
  - `Timestamp` (float), `Interface` (triple-quoted string), `CAN_ID` (triple-quoted string), `Payload` (triple-quoted string), `Payload_Entropy` (float, 5 decimal places), `Payload_Decimal` (integer).
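- **Normal Traffic**: ~386,567 messages over ~275 s expected (the run below loads 325,185 rows). `Payload_Entropy` ranges from 0 to 2.07944 in that run, `Payload_Decimal` varies, few `FFFFFFFFFFFFFFFF` payloads (if any).
- **Fuzzing Attack**: ~10 messages with `Payload` = `FFFFFFFFFFFFFFFF`, `Payload_Entropy` = 0, `Payload_Decimal` = 18446744073709551615, across any `CAN_ID`.
- **Payload_Entropy**: Shannon entropy of the payload bytes. The sample values are consistent with the natural logarithm (e.g. `00000000FFFFFFFF` gives 0.69315 = ln 2), so an 8-byte payload scores between 0 (all bytes equal, as in `FFFFFFFFFFFFFFFF`) and ln 8 ≈ 2.07944 (all bytes distinct).
- **Payload_Decimal**: The payload hex string read as one unsigned integer (`FFFFFFFFFFFFFFFF` gives 2^64 - 1 = 18446744073709551615, the largest unsigned 64-bit value).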
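
The two features can be reproduced from the raw hex payload. The following is a minimal sketch, not the code that generated the CSV files: the function names `payload_entropy` and `payload_decimal` are illustrative, and the choice of natural logarithm is an assumption inferred from the sample rows (for example `1A000000` with entropy 0.56234).

```python
import math
from collections import Counter


def payload_entropy(payload_hex: str) -> float:
    """Shannon entropy (natural log, assumed) of the payload's byte values."""
    data = bytes.fromhex(payload_hex)
    if not data:
        return 0.0
    n = len(data)
    # Sum p * ln(1/p) over the distinct byte values, with p = count / n.
    return sum((c / n) * math.log(n / c) for c in Counter(data).values())


def payload_decimal(payload_hex: str) -> int:
    """Read the whole hex payload as one unsigned integer."""
    return int(payload_hex, 16)


for p in ["1A000000", "00000000FFFFFFFF", "FFFFFFFFFFFFFFFF"]:
    print(p, round(payload_entropy(p), 5), payload_decimal(p))
```

The first two payloads reproduce the entropy values shown in the dataset sample, and the all-`FF` payload yields entropy 0 with the maximum decimal value.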

**Steps**:
1. Load and preprocess datasets (clean triple-quoted strings for `Interface`, `CAN_ID`, `Payload`).
2. Compute summary statistics for `Payload_Entropy` and `Payload_Decimal` per `CAN_ID`.
3. Visualize `Payload_Entropy` and `Payload_Decimal` (histograms, box plots).
4. Detect fuzzing attack by identifying messages with `Payload` = `FFFFFFFFFFFFFFFF` and `Payload_Entropy` < 0.1.
5. Summarize findings and verify detection.

In [2]:
# Import libraries
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import os
import csv

# Set plot style
sns.set_style('whitegrid')
%matplotlib inline

# Define paths
project_dir = r'C:\Users\pc\OneDrive\Images\Bureau\VS_code_Projects\MLproject_Predictive_Maintenance_for_Vehicles_Using_CAN_Bus_Data'
data_dir = os.path.join(project_dir, 'dataSet', 'raw', 'fuzzing_payload')
plots_dir = os.path.join(project_dir, 'dataSet', 'raw', 'fuzzing_payload', 'plots')
os.makedirs(plots_dir, exist_ok=True)

normal_file = os.path.join(data_dir, 'normal_PE_PD.csv')
fuzzing_file = os.path.join(data_dir, 'fuzzing_PE_PD.csv')

## 1. Load and Preprocess Datasets

Load both CSV files and clean triple-quoted strings (`"""..."""`) for `Interface`, `CAN_ID`, and `Payload`. `Timestamp` is a float, `Payload_Entropy` is a float (string in fuzzing dataset due to 5-decimal formatting), `Payload_Decimal` is an integer.
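
To see where triple quotes come from and what a CSV parser does with them, here is a small round trip using Python's standard `csv` module as a stand-in. It is a sketch under the assumption that the files were written with `QUOTE_NONNUMERIC` from values that already carried literal quotes; pandas follows the same quote-doubling convention by default.

```python
import csv
import io

# Writing: a value that already contains literal quotes is quoted again,
# and each embedded quote is doubled, giving """12E""" in the file.
buf = io.StringIO()
csv.writer(buf, quoting=csv.QUOTE_NONNUMERIC).writerow(['"12E"', 1.90615])
line = buf.getvalue().strip()
print(line)

# Reading: the parser removes one layer of quoting, so the field comes back
# with a single pair of quotes rather than three.
parsed = next(csv.reader(io.StringIO(line)))
print(parsed[0])

# Replacing the literal triple quote finds nothing to replace here,
# while stripping quote characters cleans the value in either case.
unchanged = parsed[0].replace('"""', '')
cleaned = parsed[0].strip('"')
print(unchanged, cleaned)
```

Stripping quote characters is therefore the more forgiving cleaning step: it works whether the loaded value still holds three quotes or only one.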

In [3]:
# Load datasets
normal_df = pd.read_csv(normal_file)
fuzzing_df = pd.read_csv(fuzzing_file)

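# Clean quoted strings for Interface, CAN_ID, Payload.
# str.strip('"') removes any run of surrounding double quotes, so it works whether
# the loaded value still reads """...""" or the parser already reduced it to "...".
for col in ['Interface', 'CAN_ID', 'Payload']:
    normal_df[col] = normal_df[col].str.strip('"')
    fuzzing_df[col] = fuzzing_df[col].str.strip('"')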

# Convert Payload_Entropy to float in fuzzing dataset
fuzzing_df['Payload_Entropy'] = fuzzing_df['Payload_Entropy'].astype(float)

# Verify columns
expected_columns = ['Timestamp', 'Interface', 'CAN_ID', 'Payload', 'Payload_Entropy', 'Payload_Decimal']
if not all(col in normal_df.columns for col in expected_columns):
    print(f"Error: Normal dataset missing columns. Expected: {expected_columns}, Found: {normal_df.columns.tolist()}")
if not all(col in fuzzing_df.columns for col in expected_columns):
    print(f"Error: Fuzzing dataset missing columns. Expected: {expected_columns}, Found: {fuzzing_df.columns.tolist()}")

# Check data types
print("Normal Dataset Data Types:")
print(normal_df.dtypes)
print("\nFuzzing Dataset Data Types:")
print(fuzzing_df.dtypes)

# Check first few rows
print("\nNormal Dataset Sample:")
print(normal_df.head())
print("\nFuzzing Dataset Sample:")
print(fuzzing_df.head())

Normal Dataset Data Types:
Timestamp          float64
Interface           object
CAN_ID              object
Payload             object
Payload_Entropy    float64
Payload_Decimal     uint64
dtype: object

Fuzzing Dataset Data Types:
Timestamp          float64
Interface           object
CAN_ID              object
Payload             object
Payload_Entropy    float64
Payload_Decimal     uint64
dtype: object

Normal Dataset Sample:
      Timestamp Interface CAN_ID           Payload  Payload_Entropy  \
0  1.508687e+09    slcan0    12E  C680027FD0FFFF00          1.90615   
1  1.508687e+09    slcan0    090          1A000000          0.56234   
2  1.508687e+09    slcan0    0C6  7512800A8008BAAC          1.90615   
3  1.508687e+09    slcan0    242    0000FFEFFE000D          1.47508   
4  1.508687e+09    slcan0    29C  00000000FFFFFFFF          0.69315   

        Payload_Decimal  
0  14303435164519235328  
1             436207616  
2   8435945834604444332  
3         1099243061261  
4          

## 2. Summary Statistics

Compute statistics for `Payload_Entropy` and `Payload_Decimal` across all messages and per `CAN_ID` to identify low entropy (<0.1) and high decimal values indicative of the fuzzing attack.
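
One caveat when reading the `Payload_Decimal` statistics: `describe()` reports them as float64, which carries a 53-bit significand, so integers near 2^64 are only resolved in steps of 2048. The sketch below (plain Python, with a hypothetical near-miss payload) illustrates why an all-`FF` payload should be confirmed with an exact integer or string comparison rather than from the float summary.

```python
ALL_FF_DECIMAL = int("FFFFFFFFFFFFFFFF", 16)   # 2**64 - 1
NEAR_MISS = int("FFFFFFFFFFFFFF00", 16)        # hypothetical payload, last byte differs

# As floats, both values round to 2**64 and become indistinguishable.
same_as_float = float(ALL_FF_DECIMAL) == float(NEAR_MISS)

# As Python integers (or uint64 values), the comparison stays exact.
same_as_int = ALL_FF_DECIMAL == NEAR_MISS

print(same_as_float, same_as_int)
print(f"{float(ALL_FF_DECIMAL):.6e}")
```

In the notebook, this means filtering on `Payload` or on the integer `Payload_Decimal` column directly, and treating the `max` row of `describe()` only as a rough indicator.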

In [4]:
# Summary statistics
stats_data = {}

# Normal Dataset statistics
stats_data['Normal_Payload_Entropy'] = normal_df['Payload_Entropy'].describe()
stats_data['Normal_Payload_Decimal'] = normal_df['Payload_Decimal'].describe()

# Fuzzing Dataset statistics
stats_data['Fuzzing_Payload_Entropy'] = fuzzing_df['Payload_Entropy'].describe()
stats_data['Fuzzing_Payload_Decimal'] = fuzzing_df['Payload_Decimal'].describe()

# Convert to DataFrame for CSV saving
stats_df = pd.DataFrame(stats_data)

# Save to CSV
stats_df.to_csv(os.path.join(data_dir, 'summary_statistics.csv'))

# Print the statistics
print("Normal Dataset - Payload_Entropy Statistics:")
print(normal_df['Payload_Entropy'].describe())
print("\nNormal Dataset - Payload_Decimal Statistics:")
print(normal_df['Payload_Decimal'].describe())
print("\nFuzzing Dataset - Payload_Entropy Statistics:")
print(fuzzing_df['Payload_Entropy'].describe())
print("\nFuzzing Dataset - Payload_Decimal Statistics:")
print(fuzzing_df['Payload_Decimal'].describe())

# Per-CAN_ID statistics
normal_stats = normal_df.groupby('CAN_ID')[['Payload_Entropy', 'Payload_Decimal']].agg(['min', 'max', 'mean']).reset_index()
fuzzing_stats = fuzzing_df.groupby('CAN_ID')[['Payload_Entropy', 'Payload_Decimal']].agg(['min', 'max', 'mean']).reset_index()

# Flatten column names
normal_stats.columns = ['CAN_ID', 'Entropy_min', 'Entropy_max', 'Entropy_mean', 'Decimal_min', 'Decimal_max', 'Decimal_mean']
fuzzing_stats.columns = ['CAN_ID', 'Entropy_min', 'Entropy_max', 'Entropy_mean', 'Decimal_min', 'Decimal_max', 'Decimal_mean']

# Quote CAN_ID to prevent scientific notation
normal_stats['CAN_ID'] = normal_stats['CAN_ID'].apply(lambda x: f'"{x}"')
fuzzing_stats['CAN_ID'] = fuzzing_stats['CAN_ID'].apply(lambda x: f'"{x}"')

# Save to CSV
normal_stats.to_csv(os.path.join(data_dir, 'normal_entropy_decimal_stats.csv'), index=False, quoting=csv.QUOTE_NONNUMERIC)
fuzzing_stats.to_csv(os.path.join(data_dir, 'fuzzing_entropy_decimal_stats.csv'), index=False, quoting=csv.QUOTE_NONNUMERIC)

# Detect fuzzing attack (Payload = FFFFFFFFFFFFFFFF and Payload_Entropy < 0.1)
entropy_threshold = 0.1
fuzzing_messages = fuzzing_df[(fuzzing_df['Payload'] == 'FFFFFFFFFFFFFFFF') & (fuzzing_df['Payload_Entropy'] < entropy_threshold)]
suspicious_can_ids = fuzzing_messages['CAN_ID'].unique()
print("\nSuspicious Messages (Payload = FFFFFFFFFFFFFFFF and Payload_Entropy < 0.1, possible fuzzing attack):")
if not fuzzing_messages.empty:
    print(f"Found {len(fuzzing_messages)} fuzzing messages across {len(suspicious_can_ids)} CAN_IDs:")
    for can_id in suspicious_can_ids:
        count = len(fuzzing_messages[fuzzing_messages['CAN_ID'] == can_id])
        print(f"CAN_ID = {can_id}: {count} messages")
else:
    print("No fuzzing messages with Payload = FFFFFFFFFFFFFFFF and Payload_Entropy < 0.1 found.")

# Print per-CAN_ID statistics
# The quotes are stripped outside the f-string: a backslash inside an
# f-string expression is a SyntaxError before Python 3.12.
print("\nNormal Dataset - Per-CAN_ID Statistics:")
for _, row in normal_stats.iterrows():
    can_id = row['CAN_ID'].strip('"')
    print(f"CAN_ID = {can_id}:")
    print(f"  Entropy: min: {row['Entropy_min']:.5f}, max: {row['Entropy_max']:.5f}, mean: {row['Entropy_mean']:.5f}")
    print(f"  Decimal: min: {row['Decimal_min']}, max: {row['Decimal_max']}, mean: {row['Decimal_mean']:.0f}")

print("\nFuzzing Dataset - Per-CAN_ID Statistics:")
for _, row in fuzzing_stats.iterrows():
    can_id = row['CAN_ID'].strip('"')
    print(f"CAN_ID = {can_id}:")
    print(f"  Entropy: min: {row['Entropy_min']:.5f}, max: {row['Entropy_max']:.5f}, mean: {row['Entropy_mean']:.5f}")
    print(f"  Decimal: min: {row['Decimal_min']}, max: {row['Decimal_max']}, mean: {row['Decimal_mean']:.0f}")

Normal Dataset - Payload_Entropy Statistics:
count    325185.000000
mean          1.319950
std           0.689194
min           0.000000
25%           0.693150
50%           1.609440
75%           1.906150
max           2.079440
Name: Payload_Entropy, dtype: float64

Normal Dataset - Payload_Decimal Statistics:
count    3.251850e+05
mean     2.608045e+18
std      4.526866e+18
min      0.000000e+00
25%      2.181038e+08
50%      1.093375e+16
75%      3.562347e+18
max      1.844673e+19
Name: Payload_Decimal, dtype: float64

Fuzzing Dataset - Payload_Entropy Statistics:
count    115971.000000
mean          1.309039
std           0.694995
min           0.000000
25%           0.693150
50%           1.609440
75%           1.906150
max           2.079440
Name: Payload_Entropy, dtype: float64

Fuzzing Dataset - Payload_Decimal Statistics:
count    1.159710e+05
mean     2.599732e+18
std      4.534243e+18
min      0.000000e+00
25%      8.388608e+06
50%      1.239394e+16
75%      2.737626e+18
max

## 3. Visualize Payload_Entropy and Payload_Decimal

Create histograms and box plots for `Payload_Entropy` and `Payload_Decimal` to compare normal and fuzzing datasets. Use log scale for `Payload_Decimal` due to large values.

In [None]:
# Histogram of Payload_Entropy
plt.figure(figsize=(12, 6))
sns.histplot(data=normal_df, x='Payload_Entropy', label='Normal', color='blue', bins=50, alpha=0.5)
sns.histplot(data=fuzzing_df, x='Payload_Entropy', label='Fuzzing', color='red', bins=50, alpha=0.5)
plt.xlabel('Payload_Entropy')
plt.ylabel('Count')
plt.title('Payload_Entropy Distribution: Normal vs Fuzzing')
plt.legend()
plt.savefig(os.path.join(plots_dir, 'entropy_histogram.png'))
plt.show()

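# Histogram of Payload_Decimal (log scale)
# Zero payloads are dropped first: log10(0) is undefined, so they cannot be
# placed on a log axis (the statistics above show a minimum of 0).
normal_pos = normal_df[normal_df['Payload_Decimal'] > 0]
fuzzing_pos = fuzzing_df[fuzzing_df['Payload_Decimal'] > 0]
plt.figure(figsize=(12, 6))
sns.histplot(data=normal_pos, x='Payload_Decimal', label='Normal', color='blue', bins=50, log_scale=True, alpha=0.5)
sns.histplot(data=fuzzing_pos, x='Payload_Decimal', label='Fuzzing', color='red', bins=50, log_scale=True, alpha=0.5)
plt.xlabel('Payload_Decimal (log scale, zeros excluded)')
plt.ylabel('Count')
plt.title('Payload_Decimal Distribution: Normal vs Fuzzing')
plt.legend()
plt.savefig(os.path.join(plots_dir, 'decimal_histogram.png'))
plt.show()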

# Box plot of Payload_Entropy by CAN_ID (top 10 CAN_IDs by count in fuzzing dataset)
top_can_ids = fuzzing_df['CAN_ID'].value_counts().index[:10]
plt.figure(figsize=(14, 6))
sns.boxplot(x='CAN_ID', y='Payload_Entropy', data=fuzzing_df[fuzzing_df['CAN_ID'].isin(top_can_ids)])
plt.xlabel('CAN_ID')
plt.ylabel('Payload_Entropy')
plt.title('Payload_Entropy by CAN_ID (Top 10, Fuzzing Dataset)')
plt.xticks(rotation=45)
plt.savefig(os.path.join(plots_dir, 'entropy_boxplot_fuzzing.png'))
plt.show()

## 4. Fuzzing Attack Detection

Identify messages with `Payload` = `FFFFFFFFFFFFFFFF` and `Payload_Entropy` < 0.1 as fuzzing attacks. Verify by examining the affected `CAN_IDs` and message details.
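
Before trusting the rule, it helps to apply it to the normal dataset as well and count false positives. The sketch below does this on two toy frames that stand in for `normal_df` and `fuzzing_df`; the helper name `flag_fuzzing` and the toy rows are illustrative assumptions, not part of the notebook's data.

```python
import pandas as pd

ALL_FF = "FFFFFFFFFFFFFFFF"


def flag_fuzzing(df: pd.DataFrame, entropy_threshold: float = 0.1) -> pd.DataFrame:
    """Return rows matching the rule: all-FF payload and entropy below the threshold."""
    mask = (df["Payload"] == ALL_FF) & (df["Payload_Entropy"] < entropy_threshold)
    return df[mask]


# Toy stand-ins for normal_df and fuzzing_df (illustrative values only).
toy_normal = pd.DataFrame({
    "CAN_ID": ["12E", "29C", "090"],
    "Payload": ["C680027FD0FFFF00", "00000000FFFFFFFF", "0000000000000000"],
    "Payload_Entropy": [1.90615, 0.69315, 0.0],
})
toy_fuzzing = pd.DataFrame({
    "CAN_ID": ["12E", "0C6", "242"],
    "Payload": ["C680027FD0FFFF00", ALL_FF, ALL_FF],
    "Payload_Entropy": [1.90615, 0.0, 0.0],
})

false_positives = flag_fuzzing(toy_normal)   # rule applied to attack-free traffic
detections = flag_fuzzing(toy_fuzzing)       # rule applied to traffic with injections
per_id = detections.groupby("CAN_ID").size() # flagged messages per CAN_ID

print(len(false_positives), len(detections))
print(per_id)
```

Note that the all-zero payload in the toy normal frame has entropy 0 but is not flagged, because the rule also requires the exact `FFFFFFFFFFFFFFFF` payload.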

In [None]:
# Detailed analysis of fuzzing messages
if not fuzzing_messages.empty:
    print("\nDetailed Analysis of Fuzzing Messages:")
    for can_id in suspicious_can_ids:
        suspicious_data = fuzzing_messages[fuzzing_messages['CAN_ID'] == can_id]
        print(f"\nCAN_ID = {can_id}:")
        print(f"Number of fuzzing messages: {len(suspicious_data)}")
        print("Fuzzing message details:")
        print(suspicious_data[['Timestamp', 'CAN_ID', 'Payload', 'Payload_Entropy', 'Payload_Decimal']].to_string(index=False))
else:
    print("\nNo fuzzing messages detected (no Payload = FFFFFFFFFFFFFFFF with Payload_Entropy < 0.1).")

## 5. Summary and Findings

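- **Normal Dataset**: 325,185 messages; `Payload_Entropy` spans 0 to 2.07944 (mean 1.31995), `Payload_Decimal` varies. Its maximum prints as 1.844673e+19, slightly below 2^64 - 1 ≈ 1.844674e+19, which suggests that no `FFFFFFFFFFFFFFFF` payload occurs in normal traffic (to be confirmed with an exact match on `Payload`).
- **Fuzzing Dataset**: 115,971 messages; expect ~10 messages with `Payload` = `FFFFFFFFFFFFFFFF`, `Payload_Entropy` = 0, `Payload_Decimal` = 18446744073709551615 across any `CAN_ID`.
- **Detection**: `Payload` = `FFFFFFFFFFFFFFFF` and `Payload_Entropy` < 0.1 is intended to isolate the injected frames. The payload match does the discriminating: normal traffic also reaches an entropy of 0 (see the minimum in Section 2), so the entropy threshold alone would flag legitimate constant payloads.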
- **IDS Application**: Monitoring `Payload_Entropy` and `Payload` is efficient for real-time detection of fuzzing attacks, complementing suspension attack detection (large `CAN_ID_Inter_Arrival`).