# Suspension Attack Influence on CAN_ID_Inter_Arrival Analysis

This notebook compares `CAN_ID_Inter_Arrival` between two datasets:
- **IA_WC_normal.csv**: Normal CAN bus traffic.
- **suspension_can_id_inter_arrival.csv**: Normal traffic + suspension attack (random `CAN_ID` messages deleted for ~10s).

**Goal**: Analyze the suspension attack's impact on `CAN_ID_Inter_Arrival` to enhance detectability for Intrusion Detection Systems (IDS).

**Datasets**:
- **Columns**:
  - Normal: `Timestamp` (float), `Interface` (triple-quoted string), `CAN_ID` (triple-quoted string), `Payload` (triple-quoted string), `CAN_ID_Inter_Arrival` (float), `CAN_ID_Window_Count` (integer).
  - Suspension: `Timestamp` (float), `Interface` (triple-quoted string), `CAN_ID` (triple-quoted string), `Payload` (triple-quoted string), `CAN_ID_Inter_Arrival` (float, 5 decimal places).
- **Normal Traffic**: ~386,567 messages over ~275s, `CAN_ID_Inter_Arrival` ~0.001–0.1s (mean ~0.039s, max ~3.05s, 55 `CAN_IDs`).
- **Suspension Attack**: Deletes messages for a random `CAN_ID` for ~10s, causing a `CAN_ID_Inter_Arrival` >10s (e.g., ~10.12039s) for that `CAN_ID`.
- **CAN_ID_Inter_Arrival**: Time between consecutive messages of the same `CAN_ID`. The first message of each `CAN_ID` has no predecessor, so it is filled with 0.00001s (10 µs) in the suspension dataset and with the mean in the normal dataset; neither dataset contains 0.0 values.
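
The definition of `CAN_ID_Inter_Arrival` above can be sketched with a per-`CAN_ID` timestamp difference. This is a minimal illustration, not the script that produced the CSV files: the helper name `add_inter_arrival`, the constant `FIRST_MESSAGE_FILL`, and the toy timestamps are assumptions, and only the 10 µs fill convention of the suspension dataset is shown.

```python
import pandas as pd

# Assumed placeholder for the first message of each CAN_ID (10 µs),
# following the convention described for the suspension dataset.
FIRST_MESSAGE_FILL = 0.00001


def add_inter_arrival(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with CAN_ID_Inter_Arrival recomputed from Timestamp."""
    out = df.sort_values('Timestamp', kind='stable').copy()
    # diff() within each CAN_ID group gives the time since the previous
    # message with the same identifier; the first one has no predecessor (NaN).
    gaps = out.groupby('CAN_ID')['Timestamp'].diff()
    out['CAN_ID_Inter_Arrival'] = gaps.fillna(FIRST_MESSAGE_FILL).round(5)
    return out


# Toy trace (made-up values): CAN_ID 2C6 goes silent for about 10 s.
toy = pd.DataFrame({
    'Timestamp': [0.00, 0.01, 0.02, 0.03, 10.03],
    'CAN_ID': ['2C6', '090', '2C6', '090', '2C6'],
})
result = add_inter_arrival(toy)
print(result)
```

In this toy trace the last `2C6` row carries the long gap, which is the signature the rest of the notebook looks for.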

**Steps**:
1. Load and preprocess datasets (clean triple-quoted strings for `Interface`, `CAN_ID`, `Payload`).
2. Compute summary statistics for `CAN_ID_Inter_Arrival` per `CAN_ID`.
3. Visualize `CAN_ID_Inter_Arrival` (histograms, box plots).
4. Detect suspension attack by identifying `CAN_IDs` with `max CAN_ID_Inter_Arrival` >5s.
5. Summarize findings and verify detection.

In [4]:
# Import libraries
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import os
import csv

# Set plot style
sns.set_style('whitegrid')
%matplotlib inline

# Define paths
project_dir = r'C:\Users\pc\OneDrive\Images\Bureau\VS_code_Projects\MLproject_Predictive_Maintenance_for_Vehicles_Using_CAN_Bus_Data'
data_dir = os.path.join(project_dir, 'dataSet', 'raw', 'suspension')
plots_dir = os.path.join(project_dir, 'dataSet', 'raw', 'suspension', 'plots')
os.makedirs(plots_dir, exist_ok=True)

normal_file = os.path.join(data_dir, 'normal_can_id_inter_arrival.csv')
suspension_file = os.path.join(data_dir, 'suspension_can_id_inter_arrival.csv')

## 1. Load and Preprocess Datasets

Load both CSV files and strip the quoting residue (`"""..."""`) from `Interface`, `CAN_ID`, and `Payload`. `Timestamp` loads as a float. `CAN_ID_Inter_Arrival` is also a float, but the suspension dataset stores it with fixed 5-decimal formatting and it may load as a string, so it is cast to float explicitly.
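
As a small illustration of the cleaning step, the sketch below strips quoting residue from a string column. The helper name `strip_csv_quoting` and the sample values are assumptions made for this example; the real files may contain other variants.

```python
import pandas as pd


def strip_csv_quoting(series: pd.Series) -> pd.Series:
    """Remove a literal '="' prefix and any double quotes, then trim whitespace."""
    return (
        series.astype(str)
        .str.replace('="', '', regex=False)  # spreadsheet-style ="..." wrapper
        .str.replace('"', '', regex=False)   # any remaining quote characters
        .str.strip()
    )


# Hypothetical raw values showing the quoting styles mentioned in this notebook.
raw = pd.Series(['"""090"""', '="0C6"', '12E'])
cleaned = strip_csv_quoting(raw)
print(cleaned.tolist())
```

Keeping the identifiers as strings preserves leading zeros such as in `090`.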

In [5]:
# Load datasets
normal_df = pd.read_csv(normal_file)
suspension_df = pd.read_csv(suspension_file)

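# Clean quoting residue in Interface, CAN_ID, Payload.
# regex=False keeps the replacements literal on every pandas version; removing
# each '"' handles single- and triple-quoted values alike.
for col in ['Interface', 'CAN_ID', 'Payload']:
    normal_df[col] = normal_df[col].str.replace('="', '', regex=False).str.replace('"', '', regex=False)
    suspension_df[col] = suspension_df[col].str.replace('"', '', regex=False)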

# Convert CAN_ID_Inter_Arrival to float in suspension dataset
suspension_df['CAN_ID_Inter_Arrival'] = suspension_df['CAN_ID_Inter_Arrival'].astype(float)

# Verify columns
normal_columns = ['Timestamp', 'Interface', 'CAN_ID', 'Payload', 'CAN_ID_Inter_Arrival', 'CAN_ID_Window_Count']
suspension_columns = ['Timestamp', 'Interface', 'CAN_ID', 'Payload', 'CAN_ID_Inter_Arrival']
if not all(col in normal_df.columns for col in normal_columns):
    print(f"Error: Normal dataset missing columns. Expected: {normal_columns}, Found: {normal_df.columns.tolist()}")
if not all(col in suspension_df.columns for col in suspension_columns):
    print(f"Error: Suspension dataset missing columns. Expected: {suspension_columns}, Found: {suspension_df.columns.tolist()}")

# Check data types
print("Normal Dataset Data Types:")
print(normal_df.dtypes)
print("\nSuspension Dataset Data Types:")
print(suspension_df.dtypes)

# Check first few rows
print("\nNormal Dataset Sample:")
print(normal_df.head())
print("\nSuspension Dataset Sample:")
print(suspension_df.head())

Error: Normal dataset missing columns. Expected: ['Timestamp', 'Interface', 'CAN_ID', 'Payload', 'CAN_ID_Inter_Arrival', 'CAN_ID_Window_Count'], Found: ['Timestamp', 'Interface', 'CAN_ID', 'Payload', 'CAN_ID_Inter_Arrival']
Normal Dataset Data Types:
Timestamp               float64
Interface                object
CAN_ID                   object
Payload                  object
CAN_ID_Inter_Arrival    float64
dtype: object

Suspension Dataset Data Types:
Timestamp               float64
Interface                object
CAN_ID                   object
Payload                  object
CAN_ID_Inter_Arrival    float64
dtype: object

Normal Dataset Sample:
      Timestamp Interface CAN_ID           Payload  CAN_ID_Inter_Arrival
0  1.508687e+09    slcan0    12E  C680027FD0FFFF00               0.00001
1  1.508687e+09    slcan0    090          1A000000               0.00001
2  1.508687e+09    slcan0    0C6  7512800A8008BAAC               0.00001
3  1.508687e+09    slcan0    242    0000FFEFFE000D   

## 2. Summary Statistics

Compute statistics for `CAN_ID_Inter_Arrival` across all `CAN_IDs` and per `CAN_ID` to identify large inter-arrival times (>5s) indicative of the suspension attack.
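
One way to read the per-`CAN_ID` statistics is to place the maximum gap of each identifier in both datasets side by side. The sketch below does that on toy data; the function name `compare_max_gap`, the output column names, and the sample values are illustrative assumptions, not part of the notebook's pipeline.

```python
import pandas as pd


def compare_max_gap(normal: pd.DataFrame, attack: pd.DataFrame,
                    threshold: float = 5.0) -> pd.DataFrame:
    """Tabulate the largest inter-arrival per CAN_ID in both datasets."""
    col = 'CAN_ID_Inter_Arrival'
    normal_max = normal.groupby('CAN_ID')[col].max().rename('normal_max')
    attack_max = attack.groupby('CAN_ID')[col].max().rename('attack_max')
    # concat aligns on the CAN_ID index; IDs missing from one side become NaN.
    table = pd.concat([normal_max, attack_max], axis=1)
    table['exceeds_threshold'] = table['attack_max'] > threshold
    return table.sort_values('attack_max', ascending=False)


# Toy values chosen only for illustration.
normal_toy = pd.DataFrame({
    'CAN_ID': ['090', '090', '2C6', '2C6'],
    'CAN_ID_Inter_Arrival': [0.0100, 0.0242, 0.0100, 0.0101],
})
attack_toy = pd.DataFrame({
    'CAN_ID': ['090', '090', '2C6', '2C6'],
    'CAN_ID_Inter_Arrival': [0.0100, 0.0240, 0.0100, 10.0004],
})
table = compare_max_gap(normal_toy, attack_toy)
print(table)
```

Sorting by `attack_max` puts the most suspicious identifier on the first row, which makes a quick visual check easy before applying the fixed threshold.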

In [6]:
# Summary statistics for CAN_ID_Inter_Arrival
stats_data = {}

# Normal Dataset statistics
stats_data['Normal_CAN_ID_Inter_Arrival'] = normal_df['CAN_ID_Inter_Arrival'].describe()

# Suspension Dataset statistics
stats_data['Suspension_CAN_ID_Inter_Arrival'] = suspension_df['CAN_ID_Inter_Arrival'].describe()

# Convert to DataFrame for CSV saving
stats_df = pd.DataFrame(stats_data)

# Save to CSV
stats_df.to_csv(os.path.join(data_dir, 'suspension_summary_statistics.csv'))

# Print the statistics
print("Normal Dataset - CAN_ID_Inter_Arrival Statistics:")
print(normal_df['CAN_ID_Inter_Arrival'].describe())
print("\nSuspension Dataset - CAN_ID_Inter_Arrival Statistics:")
print(suspension_df['CAN_ID_Inter_Arrival'].describe())

# Per-CAN_ID statistics
normal_stats = normal_df.groupby('CAN_ID')['CAN_ID_Inter_Arrival'].agg(['min', 'max', 'mean']).reset_index()
suspension_stats = suspension_df.groupby('CAN_ID')['CAN_ID_Inter_Arrival'].agg(['min', 'max', 'mean']).reset_index()

# Rename columns
normal_stats.columns = ['CAN_ID', 'CAN_ID_Inter_Arrival_min', 'CAN_ID_Inter_Arrival_max', 'CAN_ID_Inter_Arrival_mean']
suspension_stats.columns = ['CAN_ID', 'CAN_ID_Inter_Arrival_min', 'CAN_ID_Inter_Arrival_max', 'CAN_ID_Inter_Arrival_mean']

# Quote CAN_ID to prevent scientific notation
normal_stats['CAN_ID'] = normal_stats['CAN_ID'].apply(lambda x: f'"{x}"')
suspension_stats['CAN_ID'] = suspension_stats['CAN_ID'].apply(lambda x: f'"{x}"')

# Save to CSV
normal_stats.to_csv(os.path.join(data_dir, 'normal_can_id_inter_arrival_stats.csv'), index=False, quoting=csv.QUOTE_NONNUMERIC)
suspension_stats.to_csv(os.path.join(data_dir, 'suspension_can_id_inter_arrival_stats.csv'), index=False, quoting=csv.QUOTE_NONNUMERIC)

# Detect suspension attack (CAN_ID_Inter_Arrival > 5s)
suspension_threshold = 5.0
suspicious_can_ids = suspension_stats[suspension_stats['CAN_ID_Inter_Arrival_max'] > suspension_threshold]['CAN_ID']
print("\nSuspicious CAN_IDs (CAN_ID_Inter_Arrival_max > 5s, possible suspension attack):")
if not suspicious_can_ids.empty:
    for can_id in suspicious_can_ids:
        max_inter_arrival = suspension_stats[suspension_stats['CAN_ID'] == can_id]['CAN_ID_Inter_Arrival_max'].iloc[0]
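        # Strip the quotes outside the f-string: backslashes and nested quotes
        # inside f-string expressions are a SyntaxError before Python 3.12.
        can_id_clean = can_id.strip('"')
        print(f"CAN_ID = {can_id_clean}: max_inter_arrival = {max_inter_arrival:.5f}s")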
else:
    print("No CAN_IDs with CAN_ID_Inter_Arrival_max > 5s found.")

# Print per-CAN_ID statistics
print("\nNormal Dataset - Per-CAN_ID Statistics:")
for _, row in normal_stats.iterrows():
    can_id_clean = row['CAN_ID'].strip('"')  # computed outside the f-string for Python < 3.12
    print(f"CAN_ID = {can_id_clean}:")
    print(f"  min: {row['CAN_ID_Inter_Arrival_min']:.5f}s, max: {row['CAN_ID_Inter_Arrival_max']:.5f}s, mean: {row['CAN_ID_Inter_Arrival_mean']:.5f}s")

print("\nSuspension Dataset - Per-CAN_ID Statistics:")
for _, row in suspension_stats.iterrows():
    can_id_clean = row['CAN_ID'].strip('"')  # computed outside the f-string for Python < 3.12
    print(f"CAN_ID = {can_id_clean}:")
    print(f"  min: {row['CAN_ID_Inter_Arrival_min']:.5f}s, max: {row['CAN_ID_Inter_Arrival_max']:.5f}s, mean: {row['CAN_ID_Inter_Arrival_mean']:.5f}s")

Normal Dataset - CAN_ID_Inter_Arrival Statistics:
count    386567.000000
mean          0.039109
std           0.084165
min           0.000010
25%           0.010020
50%           0.019920
75%           0.049570
max           3.050460
Name: CAN_ID_Inter_Arrival, dtype: float64

Suspension Dataset - CAN_ID_Inter_Arrival Statistics:
count    115472.000000
mean          0.039224
std           0.088689
min           0.000010
25%           0.010020
50%           0.019930
75%           0.049720
max          10.000400
Name: CAN_ID_Inter_Arrival, dtype: float64

Suspicious CAN_IDs (CAN_ID_Inter_Arrival_max > 5s, possible suspension attack):
CAN_ID = 2C6: max_inter_arrival = 10.00040s

Normal Dataset - Per-CAN_ID Statistics:
CAN_ID = 090:
  min: 0.00001s, max: 0.02419s, mean: 0.01000s
CAN_ID = 0C6:
  min: 0.00001s, max: 0.02396s, mean: 0.01000s
CAN_ID = 12E:
  min: 0.00001s, max: 0.02436s, mean: 0.01000s
CAN_ID = 186:
  min: 0.00001s, max: 0.02486s, mean: 0.01000s
CAN_ID = 18A:
  min: 0.00001s, 

## 3. Visualize CAN_ID_Inter_Arrival

Create histograms and box plots for `CAN_ID_Inter_Arrival` to compare normal and suspension datasets. Use log scale for histograms due to large range (0.00001s to >10s).
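
The two datasets differ in size (386,567 vs 115,472 rows in the statistics output), so raw counts on independently chosen bins are hard to compare. A possible refinement, sketched here with NumPy only, is to build one set of log-spaced bin edges from both samples and compare densities. The helper name `shared_log_bins` and the sample arrays are assumptions for illustration.

```python
import numpy as np


def shared_log_bins(*samples, n_bins: int = 50) -> np.ndarray:
    """Return n_bins + 1 log-spaced edges covering all positive values in samples."""
    values = np.concatenate([np.asarray(s, dtype=float) for s in samples])
    values = values[values > 0]  # a log axis cannot represent zero or negatives
    lo, hi = values.min(), values.max()
    edges = np.logspace(np.log10(lo), np.log10(hi), n_bins + 1)
    # Pin the endpoints so rounding in the log/exp round trip cannot
    # push the smallest or largest observation outside the bins.
    edges[0], edges[-1] = lo, hi
    return edges


# Made-up gap samples spanning the same orders of magnitude as the notebook.
normal_gaps = np.array([0.00001, 0.01, 0.02, 0.05, 3.05])
attack_gaps = np.array([0.00001, 0.01, 0.02, 0.05, 10.0004])

edges = shared_log_bins(normal_gaps, attack_gaps, n_bins=12)
normal_density, _ = np.histogram(normal_gaps, bins=edges, density=True)
attack_density, _ = np.histogram(attack_gaps, bins=edges, density=True)
print(edges)
```

The same `edges` array can be passed as `bins=edges` to `plt.hist` for both datasets, with `plt.xscale('log')` applied to the axis, so the bars line up bin for bin.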

In [None]:
# Histogram of CAN_ID_Inter_Arrival
plt.figure(figsize=(12, 6))
sns.histplot(data=normal_df, x='CAN_ID_Inter_Arrival', label='Normal', color='blue', bins=50, log_scale=True, alpha=0.5)
sns.histplot(data=suspension_df, x='CAN_ID_Inter_Arrival', label='Suspension', color='red', bins=50, log_scale=True, alpha=0.5)
plt.xlabel('CAN_ID_Inter_Arrival (seconds, log scale)')
plt.ylabel('Count')
plt.title('CAN_ID_Inter_Arrival Distribution: Normal vs Suspension')
plt.legend()
plt.savefig(os.path.join(plots_dir, 'inter_arrival_histogram.png'))
plt.show()

# Box plot of CAN_ID_Inter_Arrival by CAN_ID (top 10 CAN_IDs by count in suspension dataset)
top_can_ids = suspension_df['CAN_ID'].value_counts().index[:10]
plt.figure(figsize=(14, 6))
sns.boxplot(x='CAN_ID', y='CAN_ID_Inter_Arrival', data=suspension_df[suspension_df['CAN_ID'].isin(top_can_ids)])
plt.yscale('log')
plt.xlabel('CAN_ID')
plt.ylabel('CAN_ID_Inter_Arrival (seconds, log scale)')
plt.title('CAN_ID_Inter_Arrival by CAN_ID (Top 10, Suspension Dataset)')
plt.xticks(rotation=45)
plt.savefig(os.path.join(plots_dir, 'inter_arrival_boxplot_suspension.png'))
plt.show()

## 4. Suspension Attack Detection

Identify `CAN_IDs` with `max CAN_ID_Inter_Arrival` >5s as suspicious. Verify by examining the targeted `CAN_ID`'s inter-arrival times.
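
Because each `CAN_ID_Inter_Arrival` value is attached to the message that ends the gap, the silent window itself can be reconstructed as `Timestamp - CAN_ID_Inter_Arrival` up to `Timestamp`. The sketch below shows this on a toy frame; the function name `suspension_windows`, the `gap_start`/`gap_end` columns, and the sample rows are hypothetical.

```python
import pandas as pd


def suspension_windows(df: pd.DataFrame, threshold: float = 5.0) -> pd.DataFrame:
    """List the start and end time of every gap longer than threshold seconds."""
    col = 'CAN_ID_Inter_Arrival'
    hits = df.loc[df[col] > threshold, ['CAN_ID', 'Timestamp', col]].copy()
    hits['gap_end'] = hits['Timestamp']               # message that ends the silence
    hits['gap_start'] = hits['Timestamp'] - hits[col]  # last message before it
    return hits[['CAN_ID', 'gap_start', 'gap_end', col]].reset_index(drop=True)


# Toy rows (made-up timestamps) for a single suspended identifier.
toy = pd.DataFrame({
    'Timestamp': [100.00, 100.01, 110.0104],
    'CAN_ID': ['2C6', '2C6', '2C6'],
    'CAN_ID_Inter_Arrival': [0.00001, 0.01, 10.0004],
})
windows = suspension_windows(toy)
print(windows)
```

Comparing the reconstructed window against the known attack interval, when one is available, is a direct way to verify that the flagged gap is the injected suspension and not an unrelated pause.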

In [None]:
# Detailed analysis of suspicious CAN_IDs
if not suspicious_can_ids.empty:
    print("\nDetailed Analysis of Suspicious CAN_IDs:")
    for can_id in suspicious_can_ids:
        can_id_clean = can_id.strip('\"')
        suspicious_data = suspension_df[suspension_df['CAN_ID'] == can_id_clean]
        large_inter_arrivals = suspicious_data[suspicious_data['CAN_ID_Inter_Arrival'] > suspension_threshold]
        print(f"\nCAN_ID = {can_id_clean}:")
        print(f"Number of large inter-arrivals (>5s): {len(large_inter_arrivals)}")
        if not large_inter_arrivals.empty:
            print("Large inter-arrival instances:")
            print(large_inter_arrivals[['Timestamp', 'CAN_ID_Inter_Arrival']].to_string(index=False))
else:
    print("\nNo suspicious CAN_IDs detected (no CAN_ID_Inter_Arrival > 5s).")

## 5. Summary and Findings

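- **Normal Dataset**: `CAN_ID_Inter_Arrival` is mostly 0.01–0.05s (25th to 75th percentile), with mean ~0.039s and max ~3.05s, indicating regular message intervals.
- **Suspension Dataset**: The aggregate statistics barely move (mean ~0.039s, median ~0.020s), but one `CAN_ID`, 2C6, reaches a maximum `CAN_ID_Inter_Arrival` of 10.00040s, consistent with a ~10s message deletion. No other `CAN_ID` exceeds the 5s threshold.
- **Detection**: A threshold of `CAN_ID_Inter_Arrival` >5s sits above the largest gap in normal traffic (~3.05s) and below the ~10s suspension gap, so it isolates the targeted `CAN_ID` without flagging normal traffic in these datasets.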
- **IDS Application**: Monitoring `CAN_ID_Inter_Arrival` is efficient for real-time detection of suspension attacks, complementing DoS detection (small inter-arrivals ~0.00025s).
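
For the real-time use mentioned in the last point, a detector only needs the last timestamp seen per `CAN_ID`. The following is a minimal sketch under that assumption; the class name `GapMonitor`, its methods, and the toy message stream are hypothetical and not taken from an existing IDS.

```python
from typing import Dict, List, Optional


class GapMonitor:
    """Track the last arrival per CAN_ID and report gaps above a threshold."""

    def __init__(self, threshold: float = 5.0) -> None:
        self.threshold = threshold
        self.last_seen: Dict[str, float] = {}

    def observe(self, timestamp: float, can_id: str) -> Optional[float]:
        """Record a message; return the gap in seconds if it exceeds the threshold."""
        previous = self.last_seen.get(can_id)
        self.last_seen[can_id] = timestamp
        if previous is None:
            return None  # first message of this CAN_ID, no gap to measure
        gap = timestamp - previous
        return gap if gap > self.threshold else None

    def silent(self, now: float) -> List[str]:
        """CAN_IDs that have not been seen for longer than the threshold."""
        return sorted(cid for cid, ts in self.last_seen.items()
                      if now - ts > self.threshold)


# Toy stream (made-up values): 2C6 stops after t=0.02 while 090 keeps sending.
monitor = GapMonitor(threshold=5.0)
stream = [(0.00, '2C6'), (0.01, '090'), (0.02, '2C6'),
          (2.00, '090'), (4.00, '090'), (6.00, '090')]
alerts = [(cid, gap) for ts, cid in stream
          if (gap := monitor.observe(ts, cid)) is not None]
silent_now = monitor.silent(now=6.00)
resumed_gap = monitor.observe(10.0204, '2C6')
print(alerts, silent_now, resumed_gap)
```

`observe` alone can only raise an alert once the suspended identifier resumes, which is what the offline analysis in this notebook measures. The `silent` check is a design choice that lets a monitor flag the missing identifier while the suspension is still in progress.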