# 02 – Data Parsing & Exploration

This notebook loads every experiment log (`client_log.csv` and `server_log.csv`) from the `data/` folder and consolidates them into a single DataFrame.

This will make it easier to analyze performance differences across modes (`classic`, `hybrid`, `pqc`), setups (`docker`, `native`), and experiment IDs.
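
As a rough illustration of the kind of comparison the combined DataFrame enables, the sketch below groups a small made-up table by `setup` and `mode` and summarizes `duration_sec`. The values in `df_toy` are invented for illustration and are not measurements from these experiments; only the column names are taken from the logs.

```python
import pandas as pd

# Toy stand-in for the combined DataFrame; all values are made up.
df_toy = pd.DataFrame({
    "setup": ["docker"] * 4 + ["native"] * 4,
    "mode": ["classic", "classic", "pqc", "pqc"] * 2,
    "duration_sec": [0.10, 0.12, 0.20, 0.22, 0.05, 0.07, 0.15, 0.17],
})

# One summary row per (setup, mode) combination.
summary = (
    df_toy.groupby(["setup", "mode"])["duration_sec"]
    .agg(["mean", "median", "count"])
)
print(summary)
```

The same `groupby` call should work on the real data once it is loaded, with `experiment` or `side` added to the grouping keys as needed.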

In [1]:
import pandas as pd
from pathlib import Path

## Load all experiment logs

We loop through the `data/docker/` and `data/native/` directories and, for each log file, record:
- the `setup` (docker, native)
- the `experiment` number
- the `mode` (classic, hybrid, pqc)
- the `side` (client or server)
- all measurement columns (`duration_sec`, `cpu_percent`, ...)

All of this will be merged into one big DataFrame.
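
The loading loop relies on the directory layout `<setup>/<experiment>/<mode>/<side>_log.csv`. As a minimal sketch of that convention, the same metadata can be read from a single path. `parse_log_path` is a hypothetical helper introduced here for illustration; it is not part of the notebook.

```python
from pathlib import Path


def parse_log_path(log_file: Path, base_path: Path) -> dict:
    """Extract setup, experiment, mode, and side from a log file path.

    Assumes the relative layout <setup>/<experiment>/<mode>/<side>_log.csv.
    """
    setup, experiment, mode, filename = log_file.relative_to(base_path).parts
    # Same rule as the notebook: anything that is not a client log is a server log.
    side = "client" if "client" in filename else "server"
    return {"setup": setup, "experiment": experiment, "mode": mode, "side": side}


example = parse_log_path(
    Path("../data/docker/experiment_01/classic/client_log.csv"),
    Path("../data"),
)
print(example)
```

A path with more or fewer than four components below `base_path` raises a `ValueError` during unpacking, which makes unexpected files easy to spot.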


In [2]:
base_path = Path("../data")
records = []

for setup_dir in base_path.iterdir():
    if not setup_dir.is_dir():
        continue
    setup = setup_dir.name

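    for experiment_dir in setup_dir.iterdir():
        # Skip stray files; calling iterdir() on a file would raise an error
        if not experiment_dir.is_dir():
            continue
        experiment = experiment_dir.name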

        for mode_dir in experiment_dir.iterdir():
            if not mode_dir.is_dir():
                continue
            mode = mode_dir.name

            for log_file in mode_dir.glob("*_log.csv"):
                side = "client" if "client" in log_file.name else "server"

                df = pd.read_csv(log_file)
                df["experiment"] = experiment
                df["setup"] = setup
                df["mode"] = mode
                df["side"] = side

                records.append(df)

df_all = pd.concat(records, ignore_index=True)

In [3]:
df_all.to_csv(base_path / "all_measurements.csv")
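
By default, `to_csv` also writes the DataFrame index as an unnamed first column. The sketch below shows what that means when the file is read back; it uses a temporary directory and a made-up `df_toy` as stand-ins for `../data` and `df_all`.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Toy stand-in for df_all; the values are illustrative only.
df_toy = pd.DataFrame({
    "run": [1, 2],
    "mode": ["classic", "pqc"],
    "duration_sec": [0.10, 0.25],
})

with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "all_measurements.csv"
    df_toy.to_csv(csv_path)  # the index is written as the first column

    naive = pd.read_csv(csv_path)                   # index comes back as "Unnamed: 0"
    restored = pd.read_csv(csv_path, index_col=0)   # index is restored as the index

print(list(naive.columns))
print(list(restored.columns))
```

Passing `index=False` to `to_csv` would avoid the extra column altogether, provided no later notebook depends on it.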

## Preview the full dataset

Here we check the total row count and the first few rows of our combined DataFrame.


In [4]:
print(f"Total rows: {len(df_all)}")
df_all.head()

Total rows: 42000


,run,mode,duration_sec,shared_secret_length,cpu_percent,ram_percent,success,error,experiment,setup,side,netem
0,1,classic,0.156922,32,6.4,10.1,1,,experiment_04,docker,client,
1,2,classic,0.108111,32,0.8,10.1,1,,experiment_04,docker,client,
2,3,classic,0.10475,32,0.0,10.1,1,,experiment_04,docker,client,
3,4,classic,0.10294,32,0.8,10.1,1,,experiment_04,docker,client,
4,5,classic,0.104275,32,0.0,10.1,1,,experiment_04,docker,client,


## Summary of available modes, setups, and experiments

Counting the rows per setup, experiment, mode, and side helps confirm that all data was loaded correctly and completely.
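
A minimal sketch of turning that visual check into an automated one: build the same count table and list every group whose client or server count differs from an expected value. `df_toy` and `EXPECTED_RUNS` are assumptions made for this example; for the real logs the expected value appears to be 1000 rows per group.

```python
import pandas as pd

# Toy stand-in for df_all; the pqc group is deliberately incomplete.
df_toy = pd.DataFrame({
    "setup": ["docker"] * 5,
    "experiment": ["experiment_01"] * 5,
    "mode": ["classic", "classic", "classic", "classic", "pqc"],
    "side": ["client", "client", "server", "server", "client"],
})

EXPECTED_RUNS = 2  # hypothetical value for the toy data

# Rows per (setup, experiment, mode), with one column per side.
counts = (
    df_toy.groupby(["setup", "experiment", "mode", "side"])
    .size()
    .unstack(fill_value=0)
)

# Keep only groups where at least one side deviates from the expected count.
incomplete = counts[(counts != EXPECTED_RUNS).any(axis=1)]
print(incomplete)
```

An empty `incomplete` table means every group has the expected number of rows on both sides.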


In [8]:
df_conclusion = df_all.groupby(["setup", "experiment", "mode", "side"]).size().unstack(fill_value=0)
df_conclusion.to_csv(base_path / "conclusion.csv")
df_conclusion

setup,experiment,mode,client,server
docker,experiment_01,classic,1000,1000
docker,experiment_01,hybrid,1000,1000
docker,experiment_01,pqc,1000,1000
docker,experiment_02,classic,1000,1000
docker,experiment_02,hybrid,1000,1000
docker,experiment_02,pqc,1000,1000
docker,experiment_03,classic,1000,1000
docker,experiment_03,hybrid,1000,1000
docker,experiment_03,pqc,1000,1000
docker,experiment_04,classic,1000,1000
