# Analysis of DoS Attack Influence on CAN_ID_Inter_Arrival and CAN_ID_Window_Count

This notebook compares `CAN_ID_Inter_Arrival` and `CAN_ID_Window_Count` between two datasets:
- **IA_WC_normal.csv**: Normal CAN bus traffic.
- **IA_WC_dos.csv**: Normal traffic + DoS attack (`CAN_ID = 000`, ~40,000 messages at 4000 messages/s for 10s).

**Goal**: Analyze the DoS attack's impact on `CAN_ID_Inter_Arrival` and `CAN_ID_Window_Count` to enhance detectability.

**Datasets**:
- **Columns**: `Timestamp` (float), `Interface` (quoted string), `CAN_ID` (quoted string), `Payload` (quoted string), `CAN_ID_Inter_Arrival` (float), `CAN_ID_Window_Count` (integer).
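- **Normal Traffic**: ~386,567 messages over ~275s, `CAN_ID_Inter_Arrival` ~0.001–0.1s (mean ~0.0164s per ID), `CAN_ID_Window_Count` up to ~1000 per 10s window (the summary statistics in Section 2 report 2–1001).
- **DoS Attack**: Adds ~40,000 `CAN_ID = 000` messages over 10s, `CAN_ID_Inter_Arrival` ~0.00025s, `CAN_ID_Window_Count` ~16,000–24,000 (the per-ID statistics in Section 2 report 16,012–23,989, which sum to ~40,000 and suggest the burst spans two 10s windows).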
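- **CAN_ID_Inter_Arrival**: Time in seconds between consecutive messages with the same CAN ID. The first message of each ID has no predecessor, so it is assigned that ID's mean inter-arrival time; as a result the column contains no 0.0 values.
- **CAN_ID_Window_Count**: Number of messages with the same CAN ID that fall in the 10-second window containing the message.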
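
The two feature definitions above can be sketched in pandas as follows. This is an illustrative reconstruction, not the code that produced the CSV files: the helper name `add_timing_features`, the use of fixed non-overlapping windows counted from the first timestamp, and filling the first message with the mean of that ID's observed gaps are all assumptions.

```python
import pandas as pd

def add_timing_features(df, window_s=10.0):
    """Hypothetical helper: derive both features from Timestamp and CAN_ID."""
    out = df.sort_values('Timestamp', kind='stable').reset_index(drop=True)

    # Gap to the previous message with the same CAN ID (NaN for the first one)
    gaps = out.groupby('CAN_ID')['Timestamp'].diff()

    # Assumption: the first message of each ID gets that ID's mean gap
    # (it stays NaN if the ID occurs only once)
    id_mean_gap = gaps.groupby(out['CAN_ID']).transform('mean')
    out['CAN_ID_Inter_Arrival'] = gaps.fillna(id_mean_gap)

    # Assumption: fixed, non-overlapping windows counted from the first timestamp
    elapsed = out['Timestamp'] - out['Timestamp'].min()
    window = (elapsed // window_s).astype(int).rename('window')
    out['CAN_ID_Window_Count'] = (
        out.groupby(['CAN_ID', window])['Timestamp'].transform('count')
    )
    return out

# Made-up timestamps, chosen only to exercise both features
demo = pd.DataFrame({
    'Timestamp': [0.0, 1.0, 2.0, 3.0, 11.0],
    'CAN_ID': ['090', '090', '12E', '12E', '090'],
})
features = add_timing_features(demo)
print(features)
```

With fixed windows, a burst that starts partway through a window is counted in two neighbouring windows, so its per-window count can be lower than the total number of injected messages.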

**Steps**:
1. Load and preprocess datasets (clean quoted strings for `Interface`, `CAN_ID`, `Payload`).
2. Compute summary statistics for `CAN_ID_Inter_Arrival` and `CAN_ID_Window_Count`.
3. Visualize both parameters (histograms, windowed analysis, box plots).
4. Focus on `CAN_ID = 000` for DoS-specific analysis.
5. Summarize findings and verify DoS detection using both parameters.
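
Step 5 could, for example, be checked with a simple two-threshold rule. The sketch below is an assumption rather than the notebook's method: the function name `flag_flooding_ids` and both threshold values are hypothetical placeholders, and the demo rows are made up to resemble the magnitudes described above. It uses the per-ID median inter-arrival time so that a single very short gap does not trigger the rule.

```python
import pandas as pd

def flag_flooding_ids(df, ia_threshold_s=0.0005, count_threshold=5000):
    """Hypothetical rule: an ID is suspicious if it is both fast and frequent."""
    per_id = df.groupby('CAN_ID').agg(
        median_ia=('CAN_ID_Inter_Arrival', 'median'),
        max_window_count=('CAN_ID_Window_Count', 'max'),
    )
    suspicious = per_id[
        (per_id['median_ia'] < ia_threshold_s)
        & (per_id['max_window_count'] > count_threshold)
    ]
    return suspicious.index.tolist()

# Made-up rows: one flooding ID and one regular ID with a single short gap
demo = pd.DataFrame({
    'CAN_ID': ['000', '000', '000', '090', '090', '090'],
    'CAN_ID_Inter_Arrival': [0.00025, 0.00025, 0.00025, 0.01, 0.000025, 0.01],
    'CAN_ID_Window_Count': [16012, 16012, 16012, 1000, 1000, 1000],
})
print(flag_flooding_ids(demo))
```

Requiring both conditions is a deliberate choice in this sketch: either feature alone can be triggered by ordinary traffic, whereas a flooding ID should stand out on both.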

In [24]:
# Import libraries
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import os
import csv

# Set plot style
sns.set_style('whitegrid')
%matplotlib inline

# Define paths
project_dir = r'C:\Users\pc\OneDrive\Images\Bureau\VS_code_Projects\MLproject_Predictive_Maintenance_for_Vehicles_Using_CAN_Bus_Data'
data_dir = os.path.join(project_dir, 'dataSet', 'raw', 'dos')
plots_dir = os.path.join(project_dir, 'dataSet', 'raw', 'dos', 'plots')
os.makedirs(plots_dir, exist_ok=True)

normal_file = os.path.join(data_dir, 'IA_WC_normal.csv')
dos_file = os.path.join(data_dir, 'IA_WC_dos.csv')

## 1. Load and Preprocess Datasets

Load both CSV files and clean quoted strings (`="..."`) for `Interface`, `CAN_ID`, and `Payload`. `Timestamp` is already a float, and `CAN_ID_Window_Count` is an integer.
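
As a small illustration of the cleaning step, the sketch below strips the `="..."` wrapper from a hand-built frame. The helper name `strip_excel_wrapper` is hypothetical, and the frame is constructed in memory rather than read from the CSV files; the IDs and payloads are copied from the sample rows of the normal dataset.

```python
import pandas as pd

def strip_excel_wrapper(series):
    """Hypothetical helper: turn ="0C6" into 0C6, keeping the value a string."""
    # str.strip removes any leading/trailing '=' and '"' characters
    return series.astype(str).str.strip('="')

raw = pd.DataFrame({
    'Interface': ['="slcan0"', '="slcan0"'],
    'CAN_ID': ['="090"', '="12E"'],
    'Payload': ['="1A000000"', '="C680027FD0FFFF00"'],
})
cleaned = raw.apply(strip_excel_wrapper)
print(cleaned)
```

Keeping `CAN_ID` as a string matters here: converting it to a number would drop leading zeros, so `090` and `90` would no longer be distinguishable.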

In [4]:
# Load datasets
normal_df = pd.read_csv(normal_file)
dos_df = pd.read_csv(dos_file)

# Verify columns first, so a missing column is reported clearly before it is used
expected_columns = ['Timestamp', 'Interface', 'CAN_ID', 'Payload', 'CAN_ID_Inter_Arrival', 'CAN_ID_Window_Count']
for name, df in [('Normal', normal_df), ('DoS', dos_df)]:
    missing = [col for col in expected_columns if col not in df.columns]
    if missing:
        raise ValueError(f"{name} dataset missing columns: {missing}. Found: {df.columns.tolist()}")

# Clean quoted strings (="...") for Interface, CAN_ID, Payload
for df in (normal_df, dos_df):
    for col in ['Interface', 'CAN_ID', 'Payload']:
        # regex=False: treat the patterns as literal text on every pandas version
        df[col] = df[col].str.replace('="', '', regex=False).str.replace('"', '', regex=False)

# Check data types
print("Normal Dataset Data Types:")
print(normal_df.dtypes)
print("\nDoS Dataset Data Types:")
print(dos_df.dtypes)

# Check first few rows
print("\nNormal Dataset Sample:")
print(normal_df.head())
print("\nDoS Dataset Sample:")
print(dos_df.head())

Normal Dataset Data Types:
Timestamp               float64
Interface                object
CAN_ID                   object
Payload                  object
CAN_ID_Inter_Arrival    float64
CAN_ID_Window_Count       int64
dtype: object

DoS Dataset Data Types:
Timestamp               float64
Interface                object
CAN_ID                   object
Payload                  object
CAN_ID_Inter_Arrival    float64
CAN_ID_Window_Count       int64
dtype: object

Normal Dataset Sample:
      Timestamp Interface CAN_ID           Payload  CAN_ID_Inter_Arrival  \
0  1.508687e+09    slcan0    12E  C680027FD0FFFF00              0.010000   
1  1.508687e+09    slcan0    090          1A000000              0.010000   
2  1.508687e+09    slcan0    0C6  7512800A8008BAAC              0.010000   
3  1.508687e+09    slcan0    242    0000FFEFFE000D              0.020002   
4  1.508687e+09    slcan0    29C  00000000FFFFFFFF              0.020001   

   CAN_ID_Window_Count  
0                  611  
1    

## 2. Summary Statistics

Compute statistics for `CAN_ID_Inter_Arrival` and `CAN_ID_Window_Count`, focusing on `CAN_ID = 000` for DoS detection.
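
One compact way to view such statistics side by side is to stack both datasets with a label column and aggregate per dataset. The sketch below is only an illustration on made-up rows: the helper name `compare_datasets` and the `Dataset` label column are assumptions, not part of the notebook.

```python
import pandas as pd

features = ['CAN_ID_Inter_Arrival', 'CAN_ID_Window_Count']

def compare_datasets(normal, dos):
    """Hypothetical helper: one row per dataset, one column per (feature, statistic)."""
    combined = pd.concat(
        [normal.assign(Dataset='Normal'), dos.assign(Dataset='DoS')],
        ignore_index=True,
    )
    return combined.groupby('Dataset')[features].agg(['min', 'mean', 'max'])

# Made-up rows, for illustration only
normal_demo = pd.DataFrame({
    'CAN_ID_Inter_Arrival': [0.01, 0.02],
    'CAN_ID_Window_Count': [1000, 500],
})
dos_demo = pd.DataFrame({
    'CAN_ID_Inter_Arrival': [0.00025, 0.01],
    'CAN_ID_Window_Count': [16012, 1000],
})
table = compare_datasets(normal_demo, dos_demo)
print(table)
```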

In [5]:
# Summary statistics for CAN_ID_Inter_Arrival and CAN_ID_Window_Count
stats_data = {}

# Normal Dataset statistics
stats_data['Normal_CAN_ID_Inter_Arrival'] = normal_df['CAN_ID_Inter_Arrival'].describe()
stats_data['Normal_CAN_ID_Window_Count'] = normal_df['CAN_ID_Window_Count'].describe()

# DoS Dataset statistics
stats_data['DoS_CAN_ID_Inter_Arrival'] = dos_df['CAN_ID_Inter_Arrival'].describe()
stats_data['DoS_CAN_ID_Window_Count'] = dos_df['CAN_ID_Window_Count'].describe()

# CAN_ID = 000 statistics
normal_000 = normal_df[normal_df['CAN_ID'] == '000']
dos_000 = dos_df[dos_df['CAN_ID'] == '000']

if not normal_000.empty:
    stats_data['Normal_000_CAN_ID_Inter_Arrival'] = normal_000['CAN_ID_Inter_Arrival'].describe()
    stats_data['Normal_000_CAN_ID_Window_Count'] = normal_000['CAN_ID_Window_Count'].describe()
else:
    stats_data['Normal_000_CAN_ID_Inter_Arrival'] = pd.Series(['No CAN_ID = 000 in normal dataset.'], index=['Note'])
    stats_data['Normal_000_CAN_ID_Window_Count'] = pd.Series(['No CAN_ID = 000 in normal dataset.'], index=['Note'])

if not dos_000.empty:
    stats_data['DoS_000_CAN_ID_Inter_Arrival'] = dos_000['CAN_ID_Inter_Arrival'].describe()
    stats_data['DoS_000_CAN_ID_Window_Count'] = dos_000['CAN_ID_Window_Count'].describe()
else:
    stats_data['DoS_000_CAN_ID_Inter_Arrival'] = pd.Series(['No CAN_ID = 000 in DoS dataset.'], index=['Note'])
    stats_data['DoS_000_CAN_ID_Window_Count'] = pd.Series(['No CAN_ID = 000 in DoS dataset.'], index=['Note'])

# Convert to DataFrame for CSV saving
stats_df = pd.DataFrame(stats_data)

# Save to CSV
stats_df.to_csv(os.path.join(data_dir, 'summary_statistics.csv'))

# Print the statistics
print("Normal Dataset - CAN_ID_Inter_Arrival Statistics:")
print(normal_df['CAN_ID_Inter_Arrival'].describe())
print("\nNormal Dataset - CAN_ID_Window_Count Statistics:")
print(normal_df['CAN_ID_Window_Count'].describe())
print("\nDoS Dataset - CAN_ID_Inter_Arrival Statistics:")
print(dos_df['CAN_ID_Inter_Arrival'].describe())
print("\nDoS Dataset - CAN_ID_Window_Count Statistics:")
print(dos_df['CAN_ID_Window_Count'].describe())

print("\nCAN_ID = 000 - Normal Dataset:")
if not normal_000.empty:
    print("CAN_ID_Inter_Arrival:")
    print(normal_000['CAN_ID_Inter_Arrival'].describe())
    print("CAN_ID_Window_Count:")
    print(normal_000['CAN_ID_Window_Count'].describe())
else:
    print("No CAN_ID = 000 in normal dataset.")
print("\nCAN_ID = 000 - DoS Dataset:")
if not dos_000.empty:
    print("CAN_ID_Inter_Arrival:")
    print(dos_000['CAN_ID_Inter_Arrival'].describe())
    print("CAN_ID_Window_Count:")
    print(dos_000['CAN_ID_Window_Count'].describe())
else:
    print("No CAN_ID = 000 in DoS dataset.")

Normal Dataset - CAN_ID_Inter_Arrival Statistics:
count    386567.000000
mean          0.039141
std           0.084468
min           0.000025
25%           0.010022
50%           0.019925
75%           0.049586
max           3.050461
Name: CAN_ID_Inter_Arrival, dtype: float64

Normal Dataset - CAN_ID_Window_Count Statistics:
count    386567.000000
mean        597.383773
std         369.028652
min           2.000000
25%         200.000000
50%         500.000000
75%        1000.000000
max        1001.000000
Name: CAN_ID_Window_Count, dtype: float64

DoS Dataset - CAN_ID_Inter_Arrival Statistics:
count    141927.000000
mean          0.032080
std           0.212978
min           0.000037
25%           0.000289
50%           0.010069
75%           0.020096
max          12.013576
Name: CAN_ID_Inter_Arrival, dtype: float64

DoS Dataset - CAN_ID_Window_Count Statistics:
count    141927.000000
mean       6247.435393
std        9352.258820
min           1.000000
25%         356.000000
50%       

In [18]:
# Ensure CAN_ID is treated as a string
normal_df['CAN_ID'] = normal_df['CAN_ID'].astype(str)

# Calculate statistics for each CAN_ID in normal dataset
stats_normal = normal_df.groupby('CAN_ID').agg({
    'CAN_ID_Inter_Arrival': ['min', 'max', 'mean'],
    'CAN_ID_Window_Count': ['min', 'max', 'mean']
})

# Rename columns for clarity
stats_normal.columns = ['_'.join(col).strip() for col in stats_normal.columns.values]

# Reset index to make CAN_ID a column
stats_normal = stats_normal.reset_index()

# Wrap CAN_ID in literal quotes in an export copy only, so spreadsheet tools
# do not reinterpret numeric-looking IDs as scientific notation
export_normal = stats_normal.assign(CAN_ID='"' + stats_normal['CAN_ID'] + '"')

# Save to CSV with quoting
export_normal.to_csv(os.path.join(data_dir, 'normal_can_id_statistics.csv'), index=False, quoting=csv.QUOTE_NONNUMERIC)

# Print statistics for each CAN_ID (the in-memory frame keeps the unquoted IDs,
# which avoids a backslash inside the f-string expression: a SyntaxError before Python 3.12)
for row in stats_normal.index:
    print(f"\nCAN_ID = {stats_normal.loc[row, 'CAN_ID']} - Normal Dataset:")
    print("CAN_ID_Inter_Arrival:")
    print(stats_normal.loc[row, ['CAN_ID_Inter_Arrival_min', 'CAN_ID_Inter_Arrival_max', 'CAN_ID_Inter_Arrival_mean']])
    print("CAN_ID_Window_Count:")
    print(stats_normal.loc[row, ['CAN_ID_Window_Count_min', 'CAN_ID_Window_Count_max', 'CAN_ID_Window_Count_mean']])


CAN_ID = 090 - Normal Dataset:
CAN_ID_Inter_Arrival:
CAN_ID_Inter_Arrival_min     0.000025
CAN_ID_Inter_Arrival_max     0.024192
CAN_ID_Inter_Arrival_mean        0.01
Name: 0, dtype: object
CAN_ID_Window_Count:
CAN_ID_Window_Count_min            611
CAN_ID_Window_Count_max           1000
CAN_ID_Window_Count_mean    988.030281
Name: 0, dtype: object

CAN_ID = 0C6 - Normal Dataset:
CAN_ID_Inter_Arrival:
CAN_ID_Inter_Arrival_min     0.000028
CAN_ID_Inter_Arrival_max     0.023957
CAN_ID_Inter_Arrival_mean        0.01
Name: 1, dtype: object
CAN_ID_Window_Count:
CAN_ID_Window_Count_min            611
CAN_ID_Window_Count_max           1000
CAN_ID_Window_Count_mean    988.030281
Name: 1, dtype: object

CAN_ID = 12E - Normal Dataset:
CAN_ID_Inter_Arrival:
CAN_ID_Inter_Arrival_min     0.000033
CAN_ID_Inter_Arrival_max     0.024359
CAN_ID_Inter_Arrival_mean        0.01
Name: 2, dtype: object
CAN_ID_Window_Count:
CAN_ID_Window_Count_min            611
CAN_ID_Window_Count_max           1000
CAN_ID

In [20]:
# Ensure CAN_ID is treated as a string
dos_df['CAN_ID'] = dos_df['CAN_ID'].astype(str)

# Calculate statistics for each CAN_ID in DoS dataset
stats_dos = dos_df.groupby('CAN_ID').agg({
    'CAN_ID_Inter_Arrival': ['min', 'max', 'mean'],
    'CAN_ID_Window_Count': ['min', 'max', 'mean']
})

# Rename columns for clarity
stats_dos.columns = ['_'.join(col).strip() for col in stats_dos.columns.values]

# Reset index to make CAN_ID a column
stats_dos = stats_dos.reset_index()

# Wrap CAN_ID in literal quotes in an export copy only, so spreadsheet tools
# do not reinterpret numeric-looking IDs as scientific notation
export_dos = stats_dos.assign(CAN_ID='"' + stats_dos['CAN_ID'] + '"')

# Save to CSV with quoting
export_dos.to_csv(os.path.join(data_dir, 'dos_can_id_statistics.csv'), index=False, quoting=csv.QUOTE_NONNUMERIC)

# Print statistics for each CAN_ID (the in-memory frame keeps the unquoted IDs,
# which avoids a backslash inside the f-string expression: a SyntaxError before Python 3.12)
for row in stats_dos.index:
    print(f"\nCAN_ID = {stats_dos.loc[row, 'CAN_ID']} - DoS Dataset:")
    print("CAN_ID_Inter_Arrival:")
    print(stats_dos.loc[row, ['CAN_ID_Inter_Arrival_min', 'CAN_ID_Inter_Arrival_max', 'CAN_ID_Inter_Arrival_mean']])
    print("CAN_ID_Window_Count:")
    print(stats_dos.loc[row, ['CAN_ID_Window_Count_min', 'CAN_ID_Window_Count_max', 'CAN_ID_Window_Count_mean']])


CAN_ID = 000 - DoS Dataset:
CAN_ID_Inter_Arrival:
CAN_ID_Inter_Arrival_min      0.0002
CAN_ID_Inter_Arrival_max      0.0003
CAN_ID_Inter_Arrival_mean    0.00025
Name: 0, dtype: object
CAN_ID_Window_Count:
CAN_ID_Window_Count_min            16012
CAN_ID_Window_Count_max            23989
CAN_ID_Window_Count_mean    20795.886728
Name: 0, dtype: object

CAN_ID = 090 - DoS Dataset:
CAN_ID_Inter_Arrival:
CAN_ID_Inter_Arrival_min      0.001511
CAN_ID_Inter_Arrival_max     10.010306
CAN_ID_Inter_Arrival_mean     0.011379
Name: 1, dtype: object
CAN_ID_Window_Count:
CAN_ID_Window_Count_min            356
CAN_ID_Window_Count_max           1000
CAN_ID_Window_Count_mean    889.597463
Name: 1, dtype: object

CAN_ID = 0C6 - DoS Dataset:
CAN_ID_Inter_Arrival:
CAN_ID_Inter_Arrival_min      0.000037
CAN_ID_Inter_Arrival_max     10.010294
CAN_ID_Inter_Arrival_mean     0.011379
Name: 2, dtype: object
CAN_ID_Window_Count:
CAN_ID_Window_Count_min            356
CAN_ID_Window_Count_max           1000
CAN_ID
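
The per-ID tables for the two datasets can also be lined up directly. The following sketch is an illustration, not part of the notebook: it builds two small frames by hand from the mean values printed above (the names `normal_stats` and `dos_stats` are placeholders) and uses an outer merge with `indicator=True` to list IDs that occur only in the DoS capture.

```python
import pandas as pd

# Mean values copied from the printed per-ID statistics above
normal_stats = pd.DataFrame({
    'CAN_ID': ['090'],
    'CAN_ID_Inter_Arrival_mean': [0.01],
    'CAN_ID_Window_Count_mean': [988.030281],
})
dos_stats = pd.DataFrame({
    'CAN_ID': ['000', '090'],
    'CAN_ID_Inter_Arrival_mean': [0.00025, 0.011379],
    'CAN_ID_Window_Count_mean': [20795.886728, 889.597463],
})

# Outer merge keeps IDs from both tables; _merge records where each ID was found
merged = normal_stats.merge(
    dos_stats, on='CAN_ID', how='outer',
    suffixes=('_normal', '_dos'), indicator=True,
)

# IDs present only in the DoS table are candidates for injected traffic
new_ids = merged.loc[merged['_merge'] == 'right_only', 'CAN_ID'].tolist()
print(new_ids)
```

For IDs present in both tables, the `_normal` and `_dos` columns sit next to each other, which makes shifts such as the lower mean window count of `090` easy to spot.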