## Generate a data sample for SVDTime Neural Network training

This script generates a toy data sample for neural network training.
The results are stored as pickle (*.pkl) files.

TO DO / FIX: 
* The generation could be made faster by implementing it in C++.
* root_pandas doesn't work. HDF5 works well but requires a complicated installation on Linux, so we use pickle (*.pkl) for the time being.

Packages required:
- numpy
- pandas
- SVDSimBase

In [1]:
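import numpy as np
import pandas as pd
# The star import supplies SampleGenerator, dt, tau_hao2real, raw_tau_min and raw_tau_max
from svd.SVDSimBase import *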

Welcome to JupyROOT 6.08/06


### Sample generation

Generate a pandas DataFrame containing a large number of waveform samples together with their truth values, and pickle it.
The data will be used for both training and testing.
Waveform widths (tau), amplitudes and time shifts (t0) are sampled uniformly from wide ranges of feasible values.
In addition, the waveform widths are jittered with a normal distribution of 5 ns width, so the "true" values are slightly off. This makes the network more robust against imprecise knowledge of the waveform width or shape.
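To make the sampling scheme concrete, here is a minimal NumPy sketch of uniform sampling followed by Gaussian jitter. It is not the `SampleGenerator` implementation: the function name `sample_taus` and the tau range of 200 to 350 ns are hypothetical, chosen only for illustration (in the notebook the range comes from `tau_hao2real(raw_tau_min)` and `tau_hao2real(raw_tau_max)`). Which of the two arrays the generator actually stores in the `tau` column is not stated here.

```python
import numpy as np

def sample_taus(n, tau_low, tau_high, jitter_ns=5.0, seed=None):
    """Draw n widths uniformly from [tau_low, tau_high] and add Gaussian jitter.

    Returns the nominal widths and the jittered widths (nominal + N(0, jitter_ns)).
    Hypothetical helper, not part of SVDSimBase.
    """
    rng = np.random.default_rng(seed)
    tau_nominal = rng.uniform(tau_low, tau_high, size=n)
    tau_jittered = tau_nominal + rng.normal(0.0, jitter_ns, size=n)
    return tau_nominal, tau_jittered

# Hypothetical tau range in ns, for illustration only
tau_nom, tau_jit = sample_taus(100000, 200.0, 350.0, jitter_ns=5.0, seed=42)
print(tau_nom.min(), tau_nom.max())
print(np.std(tau_jit - tau_nom))  # should be close to 5.0
```

The network is then trained on waveforms generated from one width while being told a slightly different one, which is what makes it tolerant to a mis-specified tau.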

In [2]:
n_samples = 1000000
generator = SampleGenerator(
    (-2.5*dt, 1.5*dt),                                       # t0 range
    (tau_hao2real(raw_tau_min), tau_hao2real(raw_tau_max)),  # tau range
    (3, 100),                                                # amplitude range
    (1, 4),                                                  # sigma range
    5.0,                                                     # tau jitter (ns)
    3)
print('Generating {0} samples...'.format(n_samples))
sample = generator.generate(n_samples)

# Create a bin table
timearray = generator.get_t0_array()
timebins = generator.get_t0_bins()
bins = pd.DataFrame({
    'midpoint' : timearray,
    'lower' : timebins.values[:-1],
    'upper' : timebins.values[1:]
})

# Create a table of simulation bounds
bounds = pd.DataFrame({
    'value': np.array(['t0', 'amplitude', 'tau', 'sigma']),
    'sampling': np.array(['uniform', 'uniform', 'uniform', 'uniform']),
    'low'  : [
        generator.get_t0_bounds()[0], 
        generator.get_amp_bounds()[0], 
        tau_hao2real(raw_tau_min),
        generator.get_sigma_bounds()[0]
    ],
    'high' : [
        generator.get_t0_bounds()[1], 
        generator.get_amp_bounds()[1], 
        tau_hao2real(raw_tau_max),
        generator.get_sigma_bounds()[1]
    ]
})
orderedcols = ['value', 'sampling', 'low', 'high']
bounds = bounds[orderedcols]

print('Samples created.')

Generating 100000 samples...
Samples created.
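The bin table stores, for each time bin, its lower and upper edge and a midpoint. The sketch below builds an analogous table from a set of edges and assigns time shifts to bins with `np.digitize`, in the spirit of the `t0_bin` column of the sample. The edges (16 bins of 8 ns between -80 and 48 ns) and the `t0` values are hypothetical; in the notebook the edges come from `generator.get_t0_bins()`, and the sketch assumes the midpoint is the centre of each bin, which may not be exactly how `get_t0_array()` defines it.

```python
import numpy as np
import pandas as pd

# Hypothetical bin edges (ns): 16 bins of 8 ns from -80 to 48.
# In the notebook the edges come from generator.get_t0_bins().
timebins = pd.Series(np.linspace(-80.0, 48.0, 17))

bins = pd.DataFrame({
    'midpoint': 0.5 * (timebins.values[:-1] + timebins.values[1:]),  # assumed bin centres
    'lower': timebins.values[:-1],
    'upper': timebins.values[1:],
})

# Bin index i satisfies lower[i] <= t0 < upper[i]; values outside the edges
# would get -1 or len(bins), so real code should clip or reject them.
t0 = np.array([-75.0, -16.5, 0.0, 47.9])
t0_bin = np.digitize(t0, timebins.values) - 1
print(t0_bin)  # [ 0  7 10 15]
```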


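Since the bounds table records the sampling range of each quantity, it can double as a quick sanity check on a generated sample. A possible sketch follows; `out_of_bounds_fraction` is a hypothetical helper, and the tables are toy values rather than generator output.

```python
import pandas as pd

def out_of_bounds_fraction(sample, bounds):
    """Fraction of sample values outside [low, high] for each row of a bounds table.

    `bounds` is expected to have the columns value, sampling, low, high,
    as in the notebook's bounds table; `value` names a column of `sample`.
    """
    fractions = {}
    for _, row in bounds.iterrows():
        inside = sample[row['value']].between(row['low'], row['high'])  # inclusive
        fractions[row['value']] = 1.0 - inside.mean()
    return pd.Series(fractions)

# Toy data, not generator output
toy_bounds = pd.DataFrame({
    'value': ['amplitude', 'sigma'],
    'sampling': ['uniform', 'uniform'],
    'low': [3.0, 1.0],
    'high': [100.0, 4.0],
})
toy_sample = pd.DataFrame({
    'amplitude': [66.2, 94.4, 120.0, 71.1],
    'sigma': [1.34, 1.58, 4.0, 0.5],
})
fractions = out_of_bounds_fraction(toy_sample, toy_bounds)
print(fractions)  # amplitude 0.25, sigma 0.25
```

If the generator stores jittered widths in the `tau` column, a small out-of-bounds fraction for tau would be expected rather than a bug.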
In [3]:
output_name = 'SVDTime_Training{0}_{1}.pkl'

# Three pickle files are written: sample, bins, bounds.

sample.to_pickle(output_name.format('Sample', n_samples))
bins.to_pickle(output_name.format('Bins', n_samples))
bounds.to_pickle(output_name.format('Bounds', n_samples))

print('Done.')

Done.
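The three tables can be read back with `pd.read_pickle`. A minimal sketch, assuming the files sit in one directory and follow the `output_name` pattern used above; `load_training_set` is a hypothetical helper, and the round-trip demo uses toy frames in a temporary directory.

```python
import os
import tempfile

import pandas as pd

def load_training_set(n_samples, directory='.',
                      output_name='SVDTime_Training{0}_{1}.pkl'):
    """Return a dict with the Sample, Bins and Bounds tables for n_samples."""
    parts = ('Sample', 'Bins', 'Bounds')
    return {part: pd.read_pickle(os.path.join(directory, output_name.format(part, n_samples)))
            for part in parts}

# Round-trip demo: write a toy frame under all three names, then read them back
with tempfile.TemporaryDirectory() as tmp:
    toy = pd.DataFrame({'t0': [-16.5, 33.7], 'abin': [64, 91]})
    for part in ('Sample', 'Bins', 'Bounds'):
        toy.to_pickle(os.path.join(tmp, 'SVDTime_Training{0}_{1}.pkl'.format(part, 2)))
    tables = load_training_set(2, directory=tmp)
    print(tables['Sample'].equals(toy))  # True
```

Only unpickle files from trusted sources, since loading a pickle can execute arbitrary code.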


In [4]:
sample.head()

|   | test | amplitude | t0 | tau | sigma | s1 | s2 | s3 | s4 | s5 | s6 | normed_tau | t0_bin | abin |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.381214 | 66.205163 | -16.50769 | 350.80962 | 1.339915 | 0.0 | 15.672636 | 51.495806 | 64.929494 | 65.67581 | 56.720018 | 99.48549 | 13 | 64 |
| 1 | 0.479518 | 94.376003 | -32.060522 | 301.452437 | 1.583958 | 0.0 | 58.713675 | 89.017508 | 90.280167 | 73.865591 | 56.188356 | 65.439326 | 10 | 92 |
| 2 | 0.243131 | 62.374614 | -5.334261 | 347.403432 | 3.99614 | 0.0 | 3.753623 | 37.285984 | 60.558444 | 60.057961 | 53.30144 | 97.135931 | 15 | 60 |
| 3 | 0.286232 | 71.052203 | -44.876416 | 217.046846 | 2.080566 | 19.706177 | 67.289386 | 66.328109 | 49.505763 | 32.202778 | 21.148093 | 7.217067 | 7 | 69 |
| 4 | 0.430956 | 93.181003 | 33.732367 | 281.486897 | 1.713032 | 0.0 | 0.0 | 0.0 | 54.28969 | 91.650337 | 86.396496 | 51.667266 | 23 | 91 |
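The columns `s1` to `s6` appear to hold the six waveform samples, while the remaining columns carry truth values. A sketch of turning such a DataFrame into input and target arrays with a random train/test split follows. Using `t0` as the target, the 20 % test fraction and the helper name `split_features_targets` are assumptions for illustration; the meaning of the `test` column, and whether the network regresses `t0` or classifies into `t0_bin`, is not specified in this notebook.

```python
import numpy as np
import pandas as pd

def split_features_targets(sample, target='t0', test_fraction=0.2, seed=0):
    """Split a sample DataFrame into train/test input and target arrays.

    Inputs are the waveform columns s1..s6; `target` names a truth column.
    Each row goes to the test set at random with probability test_fraction.
    """
    feature_cols = ['s{0}'.format(i) for i in range(1, 7)]
    rng = np.random.default_rng(seed)
    is_test = rng.random(len(sample)) < test_fraction
    X = sample[feature_cols].to_numpy()
    y = sample[target].to_numpy()
    return X[~is_test], y[~is_test], X[is_test], y[is_test]

# Toy frame with the same column names, filled with random numbers
rng = np.random.default_rng(1)
toy = pd.DataFrame(rng.random((1000, 6)), columns=['s{0}'.format(i) for i in range(1, 7)])
toy['t0'] = rng.uniform(-80.0, 48.0, size=1000)

X_train, y_train, X_test, y_test = split_features_targets(toy)
print(X_train.shape, X_test.shape)
```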
