# Global values for debiasing and standardization
**Purpose**: This notebook calculates:
- The global mean ring centre location $(x, y)$, used to debias the data
- The global mean and standard deviation of track momentum, used to standardize this feature before it is fed into the neural networks
- The global mean and standard deviation of ring radius, used to standardize this feature before it is fed into the neural networks

**Author**: Nico Van den Hooff

In [1]:
import pandas as pd
import os
import h5py
import numpy as np
import matplotlib.pyplot as plt

# Unfiltered data

In [2]:
df = pd.read_hdf("/fast_scratch_1/capstone_2022/datasetC_combined.h5")

In [3]:
df.describe().round(3).T

| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| run_id | 1731226.0 | 8999.629 | 21.041 | 8968.0 | 8982.0 | 8998.0 | 9014.0 | 9040.0 |
| burst_id | 1731226.0 | 743.398 | 433.352 | 1.0 | 366.0 | 738.0 | 1109.0 | 1646.0 |
| event_id | 1731226.0 | 1144229.514 | 626307.273 | 9989.0 | 610397.5 | 1151238.5 | 1672073.0 | 2866610.0 |
| track_id | 1731226.0 | 0.08 | 0.289 | 0.0 | 0.0 | 0.0 | 0.0 | 9.0 |
| track_momentum | 1731226.0 | 31.298 | 7.543 | 15.0 | 25.641 | 31.541 | 37.292 | 45.0 |
| chod_time | 1731226.0 | 14.503 | 8.467 | -24.897 | 8.146 | 14.624 | 21.175 | 49.806 |
| ring_radius | 1731220.0 | 4710.062 | 1507327.5 | 0.0 | 171.259 | 178.452 | 182.505 | 1366392000.0 |
| ring_centre_pos_x | 1731220.0 | 8848.921 | 1307900.5 | -938928100.0 | -173.313 | -100.625 | -38.024 | 1183359000.0 |
| ring_centre_pos_y | 1731220.0 | 7550.644 | 758649.875 | -683145300.0 | -73.84 | 3.221 | 77.719 | 307175400.0 |
| ring_likelihood_pion | 1731226.0 | 0.525 | 0.482 | 0.0 | 0.0 | 0.773 | 1.0 | 1.0 |


# Filtered data

In [4]:
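# Remove outliers: keep rings with 0 < radius < 500 and a centre within
# (-2500, 2500) on both axes. Rows with missing ring values are also dropped,
# because comparisons against NaN evaluate to False.
filtered_df = df.query("ring_radius < 500 and ring_radius > 0")
filtered_df = filtered_df.query("ring_centre_pos_x < 2500 and ring_centre_pos_x > -2500")
filtered_df = filtered_df.query("ring_centre_pos_y < 2500 and ring_centre_pos_y > -2500")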
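
The three range cuts above can also be written as a single boolean mask. The sketch below is an illustration on a small toy frame with made-up values, not the notebook's data, and the helper `in_open_interval` is a hypothetical name introduced here. It shows that rows with a zero radius, an extreme value, or a missing value all fail the cuts.

```python
import numpy as np
import pandas as pd

# Toy data with made-up values; only the column names match the notebook.
toy = pd.DataFrame(
    {
        "ring_radius": [178.0, 0.0, 1.3e9, np.nan, 182.0],
        "ring_centre_pos_x": [-100.0, -90.0, -80.0, -70.0, 9.0e8],
        "ring_centre_pos_y": [3.0, 2.0, 1.0, 0.0, -1.0],
    }
)


def in_open_interval(series, low, high):
    """Return True where low < value < high (NaN compares as False)."""
    return (series > low) & (series < high)


# Same thresholds as the query() calls, combined into one mask.
mask = (
    in_open_interval(toy["ring_radius"], 0, 500)
    & in_open_interval(toy["ring_centre_pos_x"], -2500, 2500)
    & in_open_interval(toy["ring_centre_pos_y"], -2500, 2500)
)
toy_filtered = toy[mask]

print(toy_filtered)
print("Rows removed:", len(toy) - len(toy_filtered))
```

Only the first toy row survives: the others have a radius of exactly 0, a radius far above 500, a missing radius, or an out-of-range x centre.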

In [5]:
filtered_df.describe().round(3).T

| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| run_id | 1716936.0 | 8999.627 | 21.041 | 8968.0 | 8982.0 | 8998.0 | 9014.0 | 9040.0 |
| burst_id | 1716936.0 | 743.385 | 433.38 | 1.0 | 366.0 | 738.0 | 1109.0 | 1646.0 |
| event_id | 1716936.0 | 1144319.562 | 626325.349 | 9989.0 | 610483.75 | 1151292.0 | 1672140.5 | 2866610.0 |
| track_id | 1716936.0 | 0.08 | 0.288 | 0.0 | 0.0 | 0.0 | 0.0 | 9.0 |
| track_momentum | 1716936.0 | 31.339 | 7.523 | 15.0 | 25.704 | 31.58 | 37.316 | 45.0 |
| chod_time | 1716936.0 | 14.495 | 8.457 | -24.679 | 8.142 | 14.615 | 21.165 | 49.806 |
| ring_radius | 1716936.0 | 174.972 | 12.013 | 12.728 | 171.327 | 178.459 | 182.489 | 445.424 |
| ring_centre_pos_x | 1716936.0 | -110.251 | 78.601 | -410.247 | -173.849 | -101.88 | -39.185 | 222.039 |
| ring_centre_pos_y | 1716936.0 | 1.139 | 82.92 | -393.071 | -74.338 | 1.971 | 76.368 | 316.534 |
| ring_likelihood_pion | 1716936.0 | 0.527 | 0.482 | 0.0 | 0.0 | 0.8 | 1.0 | 1.0 |
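
To gauge how much data the cuts discard, the row counts from the two summary tables can be compared directly. This is a small arithmetic sketch using the `count` values reported above (1,731,226 rows before filtering and 1,716,936 after). The variable names are introduced here for illustration.

```python
# Row counts copied from the describe() outputs above.
rows_before = 1_731_226  # unfiltered rows (count of track_momentum)
rows_after = 1_716_936   # rows remaining after the outlier cuts

removed = rows_before - rows_after
fraction_removed = removed / rows_before

print(f"{removed} rows removed ({fraction_removed:.2%} of the dataset)")
```

The cuts remove 14,290 rows, which is under 1% of the dataset.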


In [6]:
print("Before removing outliers:")
print("Ring centre mean x:", df["ring_centre_pos_x"].mean())
print("Ring centre mean y:", df["ring_centre_pos_y"].mean())
print("Momentum mean:", df["track_momentum"].mean())
print("Momentum std:", df["track_momentum"].std())
print("Ring radii mean:", df["ring_radius"].mean())
print("Ring radii std:", df["ring_radius"].std())

Before removing outliers:
Ring centre mean x: 8848.921
Ring centre mean y: 7550.6436
Momentum mean: 31.298187
Momentum std: 7.543085
Ring radii mean: 4710.0625
Ring radii std: 1507327.5


In [7]:
print("After removing outliers:")
print("Ring centre mean x:", filtered_df["ring_centre_pos_x"].mean())
print("Ring centre mean y:", filtered_df["ring_centre_pos_y"].mean())
print("Momentum mean:", filtered_df["track_momentum"].mean())
print("Momentum std:", filtered_df["track_momentum"].std())
print("Ring radii mean:", filtered_df["ring_radius"].mean())
print("Ring radii std:", filtered_df["ring_radius"].std())

After removing outliers:
Ring centre mean x: -110.25132
Ring centre mean y: 1.1389542
Momentum mean: 31.338661
Momentum std: 7.523443
Ring radii mean: 174.97235
Ring radii std: 12.013085
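
As a rough illustration of how these global values might be applied downstream, the sketch below hard-codes the filtered statistics printed above. It assumes that debiasing means subtracting the global mean ring centre from $(x, y)$ coordinates, and that standardizing means computing $(v - \mu) / \sigma$. The function names `debias_xy` and `standardize` are hypothetical, and the project's actual preprocessing code may differ.

```python
import numpy as np

# Constants copied from the "After removing outliers" output above.
MEAN_CENTRE_X = -110.25132
MEAN_CENTRE_Y = 1.1389542
MOMENTUM_MEAN, MOMENTUM_STD = 31.338661, 7.523443
RADIUS_MEAN, RADIUS_STD = 174.97235, 12.013085


def debias_xy(x, y):
    """Shift coordinates so the global mean ring centre sits at the origin.

    Assumption: debiasing is a plain subtraction of the global mean centre.
    """
    x = np.asarray(x, dtype=float) - MEAN_CENTRE_X
    y = np.asarray(y, dtype=float) - MEAN_CENTRE_Y
    return x, y


def standardize(values, mean, std):
    """Return (values - mean) / std as a float array."""
    return (np.asarray(values, dtype=float) - mean) / std


# Example inputs (made-up values chosen to show the effect of each step).
x, y = debias_xy([-110.25132, 0.0], [1.1389542, 0.0])
z_momentum = standardize([31.338661, 45.0], MOMENTUM_MEAN, MOMENTUM_STD)
z_radius = standardize([174.97235], RADIUS_MEAN, RADIUS_STD)

print("Debiased x:", x)
print("Debiased y:", y)
print("Standardized momentum:", z_momentum)
print("Standardized radius:", z_radius)
```

A point at the global mean centre maps to the origin, and a momentum or radius equal to its global mean maps to zero, which is the property the networks' inputs rely on under these assumptions.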
