## Data Processing

In [None]:
import pandas as pd

In [None]:
data = pd.read_csv('data.csv')

In [None]:
data

In [None]:
print(data)

## AI Explanation of the Code

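```python
import os

import pandas as pd

# `dir_path` (the directory holding the statistics CSV files) must be defined before this runs.
statistics_threshold_tuning_data = pd.DataFrame(pd.concat([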
        pd.read_csv(os.path.join(dir_path, file)).groupby(['MOBIL_Left_Time_Window', 'MOBIL_Right_Time_Window', 'Gipps_Left_Time_Window', 'Gipps_Right_Time_Window']).agg(
            MOBIL_left_timewindow_zero=('MOBIL_Left_Time_Window', lambda x: (x == 0).sum()),
            MOBIL_left_timewindow_non_zero=('MOBIL_Left_Time_Window', lambda x: (x != 0).sum()),
            MOBIL_right_timewindow_zero=('MOBIL_Right_Time_Window', lambda x: (x == 0).sum()),
            MOBIL_right_timewindow_non_zero=('MOBIL_Right_Time_Window', lambda x: (x != 0).sum()),
            Gipps_left_timewindow_zero=('Gipps_Left_Time_Window', lambda x: (x == 0).sum()),
            Gipps_left_timewindow_non_zero=('Gipps_Left_Time_Window', lambda x: (x != 0).sum()),
            Gipps_right_timewindow_zero=('Gipps_Right_Time_Window', lambda x: (x == 0).sum()),        
            Gipps_right_timewindow_non_zero=('Gipps_Right_Time_Window', lambda x: (x != 0).sum())
        ).assign(
        Threshold=os.path.basename(file).split('_')[-1].split('.csv')[0]
        )
        for file in os.listdir(dir_path) if file.startswith('20251103_statistics_threshold_') and file.endswith('.csv')
    ]).groupby('Threshold').sum().reset_index()
).assign(
    Total_lane_changed_vehicles=lambda df: df['MOBIL_left_timewindow_zero'] + df['MOBIL_left_timewindow_non_zero'],
    MOBIL_left_timewindow_zero_rate=lambda df: df['MOBIL_left_timewindow_zero'] / (df['MOBIL_left_timewindow_zero'] + df['MOBIL_left_timewindow_non_zero']),
    MOBIL_right_timewindow_zero_rate=lambda df: df['MOBIL_right_timewindow_zero'] / (df['MOBIL_right_timewindow_zero'] + df['MOBIL_right_timewindow_non_zero']),
    Gipps_left_timewindow_zero_rate=lambda df: df['Gipps_left_timewindow_zero'] / (df['Gipps_left_timewindow_zero'] + df['Gipps_left_timewindow_non_zero']),
    Gipps_right_timewindow_zero_rate=lambda df: df['Gipps_right_timewindow_zero'] / (df['Gipps_right_timewindow_zero'] + df['Gipps_right_timewindow_non_zero']),
)[
    [
        'Threshold', 'Total_lane_changed_vehicles',
        'MOBIL_left_timewindow_zero_rate', 'MOBIL_left_timewindow_zero', 'MOBIL_left_timewindow_non_zero',
        'MOBIL_right_timewindow_zero_rate', 'MOBIL_right_timewindow_zero', 'MOBIL_right_timewindow_non_zero', 
        'Gipps_left_timewindow_zero_rate', 'Gipps_left_timewindow_zero', 'Gipps_left_timewindow_non_zero',
        'Gipps_right_timewindow_zero_rate', 'Gipps_right_timewindow_zero', 'Gipps_right_timewindow_non_zero'
    ]
]
```
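The single chained expression above is hard to step through in a debugger. As a rough sketch, the same counts can be produced with named steps. The helper names `count_windows` and `summarize_thresholds` and the `WINDOW_COLUMNS` mapping are hypothetical, not part of the original code. The sketch assumes that dropping rows with a missing time window value reproduces what the original `groupby` does by default, so the totals should match, though row and column order may differ.

```python
import os

import pandas as pd

# Hypothetical mapping from output-name prefix to input column name.
WINDOW_COLUMNS = {
    "MOBIL_left": "MOBIL_Left_Time_Window",
    "MOBIL_right": "MOBIL_Right_Time_Window",
    "Gipps_left": "Gipps_Left_Time_Window",
    "Gipps_right": "Gipps_Right_Time_Window",
}
PREFIX = "20251103_statistics_threshold_"


def count_windows(path):
    """Count zero and non-zero time windows in one statistics file."""
    df = pd.read_csv(path)
    # The original groupby() skips rows whose key contains NaN; mirror that here.
    df = df.dropna(subset=list(WINDOW_COLUMNS.values()))
    row = {"Threshold": os.path.basename(path).split("_")[-1].split(".csv")[0]}
    for name, col in WINDOW_COLUMNS.items():
        row[f"{name}_timewindow_zero"] = int((df[col] == 0).sum())
        row[f"{name}_timewindow_non_zero"] = int((df[col] != 0).sum())
    return row


def summarize_thresholds(dir_path):
    """Build one row per threshold file, with counts and zero rates."""
    files = sorted(
        f for f in os.listdir(dir_path)
        if f.startswith(PREFIX) and f.endswith(".csv")
    )
    summary = pd.DataFrame([count_windows(os.path.join(dir_path, f)) for f in files])
    summary["Total_lane_changed_vehicles"] = (
        summary["MOBIL_left_timewindow_zero"] + summary["MOBIL_left_timewindow_non_zero"]
    )
    for name in WINDOW_COLUMNS:
        zero = summary[f"{name}_timewindow_zero"]
        non_zero = summary[f"{name}_timewindow_non_zero"]
        summary[f"{name}_timewindow_zero_rate"] = zero / (zero + non_zero)
    return summary
```

Calling `summarize_thresholds(dir_path)` returns one row per matching file. Like the original, it raises an error if no file matches the pattern.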

### Explanation of the Code Logic

This code processes multiple CSV files containing traffic simulation data (likely from MOBIL and Gipps lane-changing models) to aggregate statistics on time windows for different thresholds. It calculates counts and rates of zero vs. non-zero time windows, summarizing performance across thresholds. The logic is broken down step-by-step below.

#### 1. **File Selection and Iteration**
   - `for file in os.listdir(dir_path) if file.startswith('20251103_statistics_threshold_') and file.endswith('.csv')`
   - Selects CSV files in `dir_path` that match the pattern (e.g., `20251103_statistics_threshold_0.5.csv`), each representing a different threshold value for the simulation.
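The filter can be tried on a plain list of names without touching the file system. The file names below are invented for illustration; only the prefix and suffix come from the original code.

```python
# Hypothetical directory listing; only names matching both prefix and suffix pass.
candidate_names = [
    "20251103_statistics_threshold_0.5.csv",
    "20251103_statistics_threshold_1.0.csv",
    "20251103_statistics_threshold_1.0.csv.bak",  # wrong suffix
    "20251102_statistics_threshold_0.5.csv",      # wrong date prefix
    "notes.txt",
]

PREFIX = "20251103_statistics_threshold_"

selected = [
    name for name in candidate_names
    if name.startswith(PREFIX) and name.endswith(".csv")
]
print(selected)
```

Note that `os.listdir` returns entries in arbitrary order, so wrap it in `sorted(...)` if the processing order matters.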

#### 2. **Per-File Processing**
   - `pd.read_csv(os.path.join(dir_path, file))`: Loads each CSV file into a DataFrame.
   - `.groupby(['MOBIL_Left_Time_Window', 'MOBIL_Right_Time_Window', 'Gipps_Left_Time_Window', 'Gipps_Right_Time_Window'])`: Groups rows by the four time window columns (likely representing time gaps for lane changes in left/right directions for each model).
   - `.agg(...)`: Aggregates within each group:
     - Counts zero values: `lambda x: (x == 0).sum()` (e.g., `MOBIL_left_timewindow_zero`).
     - Counts non-zero values: `lambda x: (x != 0).sum()` (e.g., `MOBIL_left_timewindow_non_zero`).
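     - This creates two count columns per time window type. Because the grouping keys are the same four columns being counted, all values within a group are identical, so in each group one of the two counts equals the group size and the other is 0.
   - `.assign(Threshold=...)`: Adds a `Threshold` column by extracting the value from the filename (e.g., "0.5" from `20251103_statistics_threshold_0.5.csv`). The value is kept as a string.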
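A small sketch of the per-file step on a toy DataFrame, using two of the four columns to keep the output short. The data values and the file name are invented for illustration.

```python
import os

import pandas as pd

# Toy stand-in for one statistics file; values are made up.
df = pd.DataFrame({
    "MOBIL_Left_Time_Window": [0, 0, 3, 3, 5],
    "MOBIL_Right_Time_Window": [0, 0, 0, 0, 2],
})

# Group by the window columns, then count zero / non-zero values per group.
per_group = df.groupby(["MOBIL_Left_Time_Window", "MOBIL_Right_Time_Window"]).agg(
    MOBIL_left_timewindow_zero=("MOBIL_Left_Time_Window", lambda x: (x == 0).sum()),
    MOBIL_left_timewindow_non_zero=("MOBIL_Left_Time_Window", lambda x: (x != 0).sum()),
)

# Hypothetical file name following the pattern used in the original code.
file = "20251103_statistics_threshold_0.5.csv"
threshold = os.path.basename(file).split("_")[-1].split(".csv")[0]

per_group = per_group.assign(Threshold=threshold)
print(per_group)
```

Here the groups are (0, 0), (3, 0), and (5, 2), with sizes 2, 2, and 1. In every group one count is the group size and the other is 0, because all rows in a group share the same left time window value.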

#### 3. **Concatenation Across Files**
   - `pd.concat([...])`: Combines the processed DataFrames from all files into one large DataFrame. Each row represents a unique combination of time windows for a specific threshold.

#### 4. **Aggregation by Threshold**
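   - `.groupby('Threshold').sum().reset_index()`: Groups by `Threshold` and sums the counts (e.g., total zero/non-zero across all groups for each threshold). This consolidates data per threshold. Since `Threshold` is a string taken from the filename, the resulting rows are sorted lexicographically (e.g., "10" before "2"), not numerically.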
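Concatenation and the per-threshold sum can be illustrated with two small frames shaped like the per-file results. All numbers here are invented.

```python
import pandas as pd

# Invented per-file results: one row per time window group, tagged with a threshold.
file_a = pd.DataFrame({
    "MOBIL_left_timewindow_zero": [2, 0],
    "MOBIL_left_timewindow_non_zero": [0, 3],
    "Threshold": ["0.5", "0.5"],
})
file_b = pd.DataFrame({
    "MOBIL_left_timewindow_zero": [4, 0],
    "MOBIL_left_timewindow_non_zero": [0, 1],
    "Threshold": ["1.0", "1.0"],
})

# Stack the per-file rows, then collapse them to one row per threshold.
combined = pd.concat([file_a, file_b])
per_threshold = combined.groupby("Threshold").sum().reset_index()
print(per_threshold)
```

The four input rows collapse into two, one for threshold "0.5" and one for "1.0".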

#### 5. **Additional Calculations**
   - `.assign(...)`: Adds new columns:
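     - `Total_lane_changed_vehicles`: Total vehicles that changed lanes, computed as MOBIL left zero + non-zero. Since every aggregated row is counted as either zero or non-zero, this equals the number of rows aggregated for that threshold.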
     - Rate columns (e.g., `MOBIL_left_timewindow_zero_rate`): Proportion of zero time windows (zero / (zero + non-zero)). This indicates the frequency of "immediate" lane changes (no time gap).
     - Similar for right, Gipps left, and Gipps right.
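The rate calculation can be checked by hand on a couple of rows. The counts below are invented. The sketch reuses the total inside the same `assign` call, which works because later keyword arguments can refer to columns created by earlier ones.

```python
import pandas as pd

# Invented per-threshold counts for one model and direction.
counts = pd.DataFrame({
    "Threshold": ["0.5", "1.0"],
    "MOBIL_left_timewindow_zero": [2, 4],
    "MOBIL_left_timewindow_non_zero": [3, 1],
})

with_rates = counts.assign(
    # Total rows counted for the threshold.
    Total_lane_changed_vehicles=lambda df: (
        df["MOBIL_left_timewindow_zero"] + df["MOBIL_left_timewindow_non_zero"]
    ),
    # Share of rows whose left time window is zero.
    MOBIL_left_timewindow_zero_rate=lambda df: (
        df["MOBIL_left_timewindow_zero"] / df["Total_lane_changed_vehicles"]
    ),
)
print(with_rates)
```

For threshold "0.5" the rate is 2 / (2 + 3) = 0.4, and for "1.0" it is 4 / (4 + 1) = 0.8.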

#### 6. **Column Selection**
   - `[['Threshold', 'Total_lane_changed_vehicles', ...]]`: Selects only the relevant columns for output, focusing on thresholds, totals, rates, and raw counts.

#### Overall Purpose
- **Goal**: Evaluate how different thresholds affect time windows in lane-changing simulations. Zero time windows suggest aggressive/immediate changes; rates help tune thresholds for realistic behavior.
- **Output**: A DataFrame with one row per threshold, showing aggregated counts and rates for analysis (e.g., plotting rates vs. thresholds to find optimal values).
- **Assumptions**: CSVs have consistent columns; time windows are numeric; filenames follow the exact pattern.
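The column assumptions listed above can be checked before running the pipeline. The following is only a sketch; `check_statistics_frame` and `REQUIRED_COLUMNS` are hypothetical names introduced here, not part of the original code.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# The four time window columns the pipeline groups by and counts.
REQUIRED_COLUMNS = [
    "MOBIL_Left_Time_Window",
    "MOBIL_Right_Time_Window",
    "Gipps_Left_Time_Window",
    "Gipps_Right_Time_Window",
]


def check_statistics_frame(df):
    """Return a list of problems; an empty list means the frame looks usable."""
    problems = []
    for col in REQUIRED_COLUMNS:
        if col not in df.columns:
            problems.append(f"missing column: {col}")
        elif not is_numeric_dtype(df[col]):
            problems.append(f"non-numeric column: {col}")
    return problems


# Toy frame with one column missing and one stored as text.
bad = pd.DataFrame({
    "MOBIL_Left_Time_Window": [0, 1],
    "MOBIL_Right_Time_Window": ["0", "n/a"],
    "Gipps_Left_Time_Window": [0.0, 2.5],
})
print(check_statistics_frame(bad))
```

Running such a check on each file right after `pd.read_csv` would surface a malformed file by name instead of failing somewhere inside the chained expression.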


LeetCode problem: dynamic programming