# Lambda CSV analysis example

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
```

## Working with multiple files as one DataFrame

```bash
# EDM4eic files (ROOT trees)
k_lambda_5x41_5000evt_001.edm4eic.root
k_lambda_5x41_5000evt_002.edm4eic.root
...

# Corresponding CSV files:
k_lambda_5x41_5000evt_001.mcdis.csv
k_lambda_5x41_5000evt_001.mcpart_lambda.csv
k_lambda_5x41_5000evt_002.mcdis.csv
k_lambda_5x41_5000evt_002.mcpart_lambda.csv
...
```
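The chunk files follow a fixed naming pattern, so the list of files for one table type can be collected with a glob instead of being typed by hand. This is a minimal sketch that assumes the `k_lambda_5x41_5000evt_NNN.<table>.csv` pattern shown above, with zero-padded chunk numbers. `find_chunk_files` is a hypothetical helper, not part of any library:

```python
from pathlib import Path


def find_chunk_files(directory, table="mcdis", suffix=".csv"):
    """List the chunk files of one table type, ordered by chunk number.

    Hypothetical helper: assumes the k_lambda_5x41_5000evt_NNN.<table><suffix>
    naming shown above. Chunk numbers are zero-padded, so a plain
    lexicographic sort is also a numeric sort.
    """
    pattern = f"k_lambda_5x41_5000evt_*.{table}{suffix}"
    return sorted(Path(directory).glob(pattern))


# Usage (the directory is a placeholder):
# mcdis_files = find_chunk_files("/path/to/csv", table="mcdis")
# reco_files = find_chunk_files("/path/to/csv", table="reco_dis", suffix=".csv.zip")
```

Because the pattern ends in `.{table}{suffix}`, asking for `mcdis` does not pick up the `mcpart_lambda` files from the same chunk.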

Each file, such as `k_lambda_5x41_5000evt_001.edm4eic.root`, holds 5000 processed events.
Splitting the data into small chunks like this is convenient, but
5k events are not enough for a statistically significant analysis.
To get results, we want to combine several files into one DataFrame.
With pandas this is generally simple, but there is one problem.
When we combine multiple CSV files from different chunks, runs or datasets, each file starts its event numbering from 0:

```
File 1: evt = [0, 1, 2, 3, 4, ...]
File 2: evt = [0, 1, 2, 3, 4, ...]  ← ID Collision!
File 3: evt = [0, 1, 2, 3, 4, ...]  ← ID Collision!
```
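To see why this matters, here is a toy example with two hypothetical in-memory tables standing in for two chunk files. A plain `pd.concat` keeps the per-file IDs, so every `evt` value now labels two unrelated events, and any per-event grouping quietly mixes them:

```python
import pandas as pd

# Two toy tables standing in for two chunk files; each numbers events from 0.
file1 = pd.DataFrame({"evt": [0, 1, 2], "q2": [10.0, 20.0, 30.0]})
file2 = pd.DataFrame({"evt": [0, 1, 2], "q2": [40.0, 50.0, 60.0]})

# Naive concatenation keeps the per-file event numbers unchanged.
naive = pd.concat([file1, file2], ignore_index=True)

# Each evt value now labels two unrelated events.
print(naive["evt"].value_counts().sort_index().tolist())  # [2, 2, 2]

# Grouping by evt silently averages events from different files.
print(naive.groupby("evt")["q2"].mean().tolist())  # [25.0, 35.0, 45.0]
```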

**Problem**: Event 0 from File 1 is completely different from Event 0 from File 2, but they have the same ID!

**Solution**: globally unique event IDs

We shift each file's `evt` values by an offset so that event IDs are unique across all files we open:

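```python
def concat_csvs_with_unique_events(files):
    """Load and concatenate CSV files with globally unique event IDs."""
    dfs = []
    offset = 0  # first global ID available to the next file

    for file in files:
        df = pd.read_csv(file)
        if df.empty:
            continue  # nothing to shift, and max() of an empty column is NaN
        # Shift this file's local IDs past every ID used so far
        df['evt'] = df['evt'] + offset
        offset = df['evt'].max() + 1
        dfs.append(df)

    return pd.concat(dfs, ignore_index=True)
```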
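`concat_csvs_with_unique_events` takes each new offset from the largest `evt` actually present in the table. That guarantees uniqueness within one table, but it can break joins between tables. If a table that can have several rows, or none, per event (as `mcpart_lambda.csv` may, judging by its name) has no rows for the last events of a chunk, it gets a smaller offset than `mcdis.csv` for the next chunk, and the same physical event ends up with different global IDs in the two tables. A minimal sketch of an alternative uses a fixed stride per file. It assumes that every chunk numbers its events from 0 and holds at most `events_per_file` events (5000, going by the file names). `concat_csvs_fixed_stride` and the `file_idx` column are hypothetical names:

```python
import pandas as pd


def concat_csvs_fixed_stride(files, events_per_file=5000):
    """Concatenate chunk CSVs, shifting evt by a fixed stride per file.

    Sketch, assuming every file numbers its events from 0 and holds at most
    events_per_file events. The offset depends only on the file's position
    in the list, so different table types built from the same ordered chunk
    list get matching global IDs.
    """
    dfs = []
    for file_idx, file in enumerate(files):
        df = pd.read_csv(file)
        # Stop if a local ID would spill into the next file's ID range
        if len(df) and df["evt"].max() >= events_per_file:
            raise ValueError(f"{file}: evt exceeds events_per_file={events_per_file}")
        df["evt"] = df["evt"] + file_idx * events_per_file
        df["file_idx"] = file_idx  # hypothetical bookkeeping column
        dfs.append(df)
    return pd.concat(dfs, ignore_index=True)
```

With `events_per_file=5000`, chunk `_002` maps to IDs 5000 to 9999 in every table, whether or not that table has rows for all of the chunk's events. The trade-off is that IDs are not contiguous when a chunk holds fewer events. When tables have to be merged on `evt`, that is usually the better compromise.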


```python
# Local paths to three 5x41 chunks; adjust them to your setup.
# pandas infers zip compression from the .zip extension.
files_5x41 = [
    r"C:\data\meson-structure\csv\k_lambda_5x41_5000evt_001.reco_dis.csv.zip",
    r"C:\data\meson-structure\csv\k_lambda_5x41_5000evt_002.reco_dis.csv.zip",
    r"C:\data\meson-structure\csv\k_lambda_5x41_5000evt_003.reco_dis.csv.zip",
]

reco_dis_5x41 = concat_csvs_with_unique_events(files_5x41)
reco_dis_5x41
```

| Unnamed: 0 | evt | da_x | da_q2 | da_y | da_nu | da_w | esigma_x | esigma_q2 | esigma_y | esigma_nu | ... | sigma_x | sigma_q2 | sigma_y | sigma_nu | sigma_w | mc_x | mc_q2 | mc_y | mc_nu | mc_w |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0.331693 | 105.5530 | 0.387979 | 169.581 | 14.61350 | 0.485111 | 127.5890 | 0.320661 | 140.157 | ... | 0.485111 | 153.9690 | 0.386960 | 169.136 | 12.8180 | 0.646321 | 126.959642 | 0.239083 | 104.678778 | 70.355051 |
| 1 | 1 | 0.246381 | 74.8769 | 0.370522 | 161.951 | 15.16280 | 0.395020 | 94.8294 | 0.292682 | 127.928 | ... | 0.395020 | 120.1690 | 0.370891 | 162.112 | 13.5986 | 0.644567 | 96.148082 | 0.181773 | 79.490282 | 53.899290 |
| 2 | 2 | 2.802830 | 1040.9500 | 0.452801 | 197.914 |  | 0.603230 | 482.4470 | 0.975078 | 426.196 | ... | 0.603230 | 221.6230 | 0.447925 | 195.783 | 12.1100 | 0.767213 | 483.945779 | 0.768830 | 336.141239 | 147.718422 |
| 3 | 3 | 0.609999 | 209.3620 | 0.418447 | 182.898 | 11.60750 | 0.630800 | 211.3010 | 0.408398 | 178.506 | ... | 0.630800 | 206.2000 | 0.398537 | 174.196 | 11.0257 | 0.621466 | 213.257824 | 0.418433 | 182.864142 | 130.775150 |
| 4 | 4 | 1.829470 | 673.2550 | 0.448669 | 196.108 |  | 0.660962 | 405.0320 | 0.747110 | 326.553 | ... | 0.660962 | 245.6540 | 0.453126 | 198.056 | 11.2644 | 0.725492 | 411.711279 | 0.690854 | 302.413703 | 156.661713 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 14995 | 14995 | 0.640042 | 218.1870 | 0.415617 | 181.661 | 11.11700 | 0.165831 | 111.4890 | 0.819667 | 358.267 | ... | 0.165831 | 77.8563 | 0.572401 | 250.190 | 19.8120 | 0.562334 | 208.208064 | 0.451584 | 197.307968 | 162.929386 |
| 14996 | 14996 | 2.107920 | 769.3180 | 0.444964 | 194.489 |  | 0.560336 | 396.6230 | 0.862983 | 377.200 | ... | 0.560336 | 204.3840 | 0.444705 | 194.375 | 12.6984 | 0.683973 | 393.588067 | 0.701501 | 306.651070 | 182.736531 |
| 14997 | 14997 | 0.816297 | 287.2580 | 0.429039 | 187.528 | 8.09482 | 0.638620 | 254.8250 | 0.486488 | 212.639 | ... | 0.638620 | 230.4950 | 0.440039 | 192.336 | 11.4592 | 0.655625 | 265.535963 | 0.493848 | 215.828555 | 140.356185 |
| 14998 | 14998 | 0.633263 | 296.5170 | 0.570871 | 249.521 | 13.13770 | 1.071600 | 388.5910 | 0.442112 | 193.242 | ... | 1.071600 | 471.9340 | 0.536933 | 234.688 |  | 0.846016 | 255.877311 | 0.368538 | 161.173884 | 47.452937 |
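After loading, it helps to check quickly that the global IDs behave as intended. This sketch assumes the table should hold exactly one row per event, as the `reco_dis` output above appears to. `Unnamed: 0` is the name pandas gives to a column with no header, most likely an index that `DataFrame.to_csv` saved with the data. The hypothetical `check_one_row_per_event` drops that column if it is present. If the column really is the old index, reading with `pd.read_csv(file, index_col=0)` is an alternative:

```python
import pandas as pd


def check_one_row_per_event(df):
    """Hypothetical sanity check for a table expected to have one row per event."""
    # Drop a leftover saved-index column if present (no error if it is absent)
    df = df.drop(columns=["Unnamed: 0"], errors="ignore")
    # Any repeated evt means two rows claim the same global event ID
    n_dup = int(df["evt"].duplicated().sum())
    if n_dup:
        raise ValueError(f"{n_dup} rows share an evt with an earlier row")
    return df


# Usage on the combined table from above:
# reco_dis_5x41 = check_one_row_per_event(reco_dis_5x41)
```

For per-particle tables, where several rows per event are expected, the duplicate check does not apply. There, comparing the set of `evt` values against the event-level table is the more useful test.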
