## Generate a data sample for SVDTime Neural Network training

This script generates a toy data sample for neural network training.
The result is stored in three pickle (`*.pkl`) files: the sample itself, the time-bin table and the simulation bounds.

TO DO / FIX: 
* Generation could be made faster by moving it to C++.
* root_pandas doesn't work. HDF5 works perfectly, but is complicated to install on Linux, so we stay with pickle (`*.pkl`) for the time being.

Packages required:
- pandas
- SVDSimBase

In [1]:
import numpy as np  # used as np below; imported explicitly rather than relying on the star import
import pandas as pd
from svd.SVDSimBase import *

Welcome to JupyROOT 6.08/06


### Sample generation

Generate a pandas DataFrame containing a large number of waveform samples together with their truth data. The result is written to pickle files in a later cell.
The data will be used as training and test data.
Time shifts (t0), amplitudes, waveform widths (tau) and sigma values are each sampled uniformly from a wide range of feasible values.
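
As a rough illustration of what uniform sampling of the truth parameters means, here is a minimal sketch using NumPy only. It is not the `SampleGenerator` implementation, which is not shown in this notebook: the helper `draw_uniform_truth` is hypothetical, and the t0 and tau ranges are placeholders (the amplitude and sigma ranges mirror the values passed to the generator in the next cell).

```python
import numpy as np
import pandas as pd

def draw_uniform_truth(n, t0_range, tau_range, amp_range, sigma_range, seed=0):
    """Hypothetical helper: draw n truth tuples, each parameter
    uniform within its own (low, high) range."""
    rng = np.random.default_rng(seed)
    return pd.DataFrame({
        't0': rng.uniform(*t0_range, size=n),
        'tau': rng.uniform(*tau_range, size=n),
        'amplitude': rng.uniform(*amp_range, size=n),
        'sigma': rng.uniform(*sigma_range, size=n),
    })

# The t0 and tau ranges below are placeholders for illustration only.
truth = draw_uniform_truth(
    1000,
    t0_range=(-45.0, 30.0),
    tau_range=(200.0, 350.0),
    amp_range=(3.0, 100.0),
    sigma_range=(1.0, 5.0),
)
```

The real generator additionally produces the six waveform samples per row, which this sketch does not attempt to reproduce.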

In [2]:
n_samples = 1000000
generator = SampleGenerator(
    (-1.5*dt, 1.0*dt),  # t0 bounds
    (tau_hao2real(raw_tau_min), tau_hao2real(raw_tau_max)),  # tau bounds
    (3, 100),  # amplitude bounds
    (1, 5),  # sigma bounds
    tau_sigma=0,
    bin_size=3)
print('Generating {0} samples...'.format(n_samples))
sample = generator.generate(n_samples)

# Create a bin table
timearray = generator.get_t0_array()
timebins = generator.get_t0_bins()
bins = pd.DataFrame({
    'midpoint' : timearray,
    'lower' : timebins.values[:-1],
    'upper' : timebins.values[1:]
})

# Create a table of simulation bounds
bounds = pd.DataFrame({
    'value': np.array(['t0', 'amplitude', 'tau', 'sigma']),
    'sampling': np.array(['uniform', 'uniform', 'uniform', 'uniform']),
    'low'  : [
        generator.get_t0_bounds()[0], 
        generator.get_amp_bounds()[0], 
        tau_hao2real(raw_tau_min),
        generator.get_sigma_bounds()[0]
    ],
    'high' : [
        generator.get_t0_bounds()[1], 
        generator.get_amp_bounds()[1], 
        tau_hao2real(raw_tau_max),
        generator.get_sigma_bounds()[1]
    ]
})
orderedcols = ['value', 'sampling', 'low', 'high']
bounds = bounds[orderedcols]

print('Samples created.')

Generating 1000000 samples...
Samples created.
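
The bin table above pairs each time bin's midpoint with its lower and upper edge. A minimal sketch of how such a table can be built from equally wide bins, and how a t0 value maps to a bin index, might look as follows. This is an assumption about the layout, not the actual logic behind `get_t0_bins()`; the function `make_bin_table` and the range used are hypothetical.

```python
import numpy as np
import pandas as pd

def make_bin_table(low, high, bin_size):
    """Hypothetical helper: equally wide bins covering [low, high]."""
    n_bins = int(round((high - low) / bin_size))
    edges = low + bin_size * np.arange(n_bins + 1)
    return pd.DataFrame({
        'midpoint': 0.5 * (edges[:-1] + edges[1:]),
        'lower': edges[:-1],
        'upper': edges[1:],
    })

# Placeholder range; the bin width of 3 matches bin_size in the cell above.
bins_demo = make_bin_table(-45.0, 30.0, 3.0)

# Rebuild the edge array and look up the zero-based bin index of a few t0 values.
edges = np.append(bins_demo['lower'].values, bins_demo['upper'].values[-1])
t0_demo = np.array([-44.0, -7.4, 29.9])
bin_index = np.digitize(t0_demo, edges) - 1
```

With these placeholder settings the table has 25 rows, and `bin_index` gives the row of `bins_demo` that contains each t0 value.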


In [3]:
output_name = 'SVDTime_Training3_{0}_{1}.pkl'

# Three pickle files are written: sample, bins, bounds.

sample.to_pickle(output_name.format('Sample', n_samples))
bins.to_pickle(output_name.format('Bins', n_samples))
bounds.to_pickle(output_name.format('Bounds', n_samples))

print('Done.')

Done.
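
To use the stored tables later, for example in a training notebook, they can be loaded back with `pd.read_pickle`. The sketch below demonstrates the round trip on a tiny stand-in DataFrame written to a temporary directory; the column values are made up, and only the file name pattern is taken from the cell above.

```python
import os
import tempfile

import pandas as pd

# Tiny stand-in for the real sample table (values are illustrative only).
demo = pd.DataFrame({'t0': [-7.4, -0.2], 'amplitude': [14.5, 90.1]})
name_pattern = 'SVDTime_Training3_{0}_{1}.pkl'

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, name_pattern.format('Sample', len(demo)))
    demo.to_pickle(path)              # same call as used for the real tables
    restored = pd.read_pickle(path)   # inverse operation

round_trip_ok = restored.equals(demo)
```

Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.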


In [4]:
sample.head()

| | test | amplitude | t0 | tau | sigma | s1 | s2 | s3 | s4 | s5 | s6 | normed_tau | t0_bin | abin |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.267836 | 14.456133 | -7.404476 | 225.897068 | 4.139629 | 0.0 | 0.0 | 14.010917 | 12.561512 | 10.387404 | 8.454864 | 13.321875 | 13 | 12 |
| 1 | 0.76131 | 90.094752 | -0.203702 | 244.035236 | 2.352088 | 0.0 | 0.0 | 65.898884 | 90.557821 | 75.252274 | 55.695186 | 25.833429 | 15 | 88 |
| 2 | 0.389157 | 98.08342 | -39.552291 | 328.020822 | 4.495098 | 8.231188 | 68.519079 | 95.882218 | 93.212643 | 79.419842 | 63.624859 | 83.765971 | 3 | 96 |
| 3 | 0.646701 | 94.73681 | 29.742417 | 284.786097 | 4.981928 | 0.0 | 0.0 | 0.0 | 62.62635 | 93.7388 | 88.519938 | 53.943026 | 25 | 92 |
| 4 | 0.361906 | 3.346757 | -38.118085 | 239.676184 | 1.654655 | 0.0 | 0.0 | 4.230489 | 4.834844 | 4.230489 | 0.0 | 22.826592 | 3 | 1 |
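
The `test` column holds a value between 0 and 1 for each row. One plausible use, which is an assumption here since the notebook does not show it, is to split the sample into training and test parts by thresholding that column. A minimal sketch on the five rows above, with a hypothetical `test_fraction`:

```python
import pandas as pd

# The 'test' and 't0_bin' values of the five rows shown above.
head = pd.DataFrame({
    'test': [0.267836, 0.76131, 0.389157, 0.646701, 0.361906],
    't0_bin': [13, 15, 3, 25, 3],
})

test_fraction = 0.5  # hypothetical threshold, not taken from the notebook

# Rows whose 'test' value falls below the threshold go to the test part.
is_test = head['test'] < test_fraction
test_part = head[is_test]
train_part = head[~is_test]
```

If `test` is uniformly distributed, the threshold directly sets the expected share of rows that end up in the test part.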
