# Getting started with WFSim

Hello friend.
Welcome to the basic tutorial on how to simulate waveforms with the latest wfsim version in strax.
Here we'll just demonstrate the basic functionality. For more in-depth analysis, check out the straxen tutorials.

In [1]:
import numpy as np
import strax
import straxen
import wfsim

import matplotlib.pyplot as plt
from matplotlib.colors import LogNorm
from multihist import Histdd, Hist1d
from scipy import stats

import json

## Setting everything up

First we need to define the right context. The only thing that differs from a normal context is the plugin that provides raw records. By default this is the DAQ Reader. Depending on which detector you want (1T or nT), there are two different plugins you can register: RawRecordsFromFax1T and RawRecordsFromFaxNT.

For 1T do:

In [2]:
st = strax.Context(
        storage=strax.DataDirectory('./strax_data'),
        register=wfsim.RawRecordsFromFax1T,
        config=dict(detector='XENON1T',
                    fax_config='https://raw.githubusercontent.com/XENONnT/'
                               'strax_auxiliary_files/master/fax_files/fax_config_1t.json',
                    **straxen.contexts.x1t_common_config,),
        **straxen.contexts.common_opts)

For nT do:

In [None]:
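# Base URL of the auxiliary-files repository. This value is an assumption, inferred
# from the 1T fax_config URL above; the commit hash appended below pins each file.
aux_repo = 'https://raw.githubusercontent.com/XENONnT/strax_auxiliary_files/'
straxen.contexts.xnt_common_config['gain_model'] = ('to_pe_per_run',
                                        aux_repo+'58e615f99a4a6b15e97b12951c510de91ce06045/fax_files/to_pe_nt.npy')
st = strax.Context(
        storage=strax.DataDirectory('./strax_data'),
        register=wfsim.RawRecordsFromFaxNT,
        config=dict(detector='XENONnT',
                    fax_config=aux_repo+'4e71b8a2446af772c83a8600adc77c0c3b7e54d1/fax_files/fax_config_nt.json',
                    **straxen.contexts.xnt_common_config,
                   ),
        **straxen.contexts.common_opts)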

To make your life even simpler there is currently a pull request open to add these to the default straxen contexts.
As soon as they get merged you can also do:

In [None]:
st = straxen.contexts.xenon1t_simulation()

or

In [None]:
st = straxen.contexts.xenonnt_simulation()

Now we need to define a run id. What you give it doesn't really matter: strax will look for existing data under that id and simulate new data if it doesn't find any, which is exactly what you want.
Strax will use the run id to get the electron lifetime and PMT gains from a database, and returns placeholders if the run doesn't exist.

In [4]:
run_id = '1'

## Defining instructions


There are several ways to provide instructions. The simulator has an option called "fax_file". If it has a value (it is None by default) the simulator reads it as either a csv or a root file. If not, it uses a predefined function to make your events. The number of events you'll simulate is the product of the config values "nchunk", "event_rate" and "chunk_size", which you can set as follows:

In [5]:
st.set_config(dict(nchunk=1, event_rate=5, chunk_size=10,))

Strax groups data together in chunks based on time (for low-level data). nchunk is the number of chunks you want to simulate.
event_rate is the number of events per second, so it affects the spacing between events. Finally, chunk_size is the length of a chunk in seconds.
For the DAQ this is 5 seconds. For simulations you can do whatever you want. It is important to note that strax writes out data per chunk.
So when your chunks are small you will, among other things, call strax's IO functions a lot, giving a substantial overhead. On the other hand, chunks that are too large will hog all your memory and your kernel might crash.
Based on my experimentation, setting chunk_size to 500 gives the best performance.
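
As a quick sanity check on these three options, the arithmetic can be sketched in plain Python. This only illustrates the product rule described above, using the values from the set_config call; the helper name `expected_events` is made up for this example and is not part of wfsim.

```python
def expected_events(nchunk, event_rate, chunk_size):
    """Return (number of events, total simulated time in s, mean spacing in ns)."""
    n_events = nchunk * event_rate * chunk_size
    total_time = nchunk * chunk_size            # seconds
    spacing_ns = 1e9 * total_time / n_events    # same as 1e9 / event_rate
    return n_events, total_time, spacing_ns


n_events, total_time, spacing_ns = expected_events(nchunk=1, event_rate=5, chunk_size=10)
print(n_events, total_time, spacing_ns)
```

Raising chunk_size while keeping nchunk fixed therefore increases both the simulated time and the number of events, which is worth keeping in mind when tuning it for performance.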

These are the ways you can give instructions:
  * #### Random
    This is the default, where the simulator generates some random instructions for you.
  * #### Custom
    For this you will need to overwrite the instruction generator function.
  * #### CSV
    You can provide a csv file with the instructions (like the output of nSort).
  * #### Geant4
    If you provide a .root file the simulator will make instructions based on it by converting the energy to photons and electrons. This is like an incredibly elementary nSort.

### Random
This is pretty self-explanatory. The simulator has a function called "rand_instructions" which will make something up for you.



In [6]:
wfsim.rand_instructions??

### Custom
This is some more fun. To do this we'll write a new function which returns a structured numpy array with the correct dtype.

In [7]:
wfsim.instruction_dtype

[('event_number', numpy.int32),
 ('type', numpy.int8),
 ('time', numpy.int64),
 ('x', numpy.float32),
 ('y', numpy.float32),
 ('z', numpy.float32),
 ('amp', numpy.int32),
 ('recoil', '<U2')]

Event number is just a label for which peaks belong together. type is either 1 (S1) or 2 (S2). In the truth, higher numbers are also used to refer to different types of afterpulses. time, x, y and z are the time (in ns) and position of the signal. amp is the number of photons or electrons generated, and recoil can be used for different types of recoil (but only electronic recoil is supported).

Now, let's say we want some krypton peaks. For this we'll need to change the default instruction function to include the two-step decay and use nestpy to convert energy deposits into a number of photons and electrons.

In this case I'll use one "event" per full decay, that is, an S1 and an S2 for each of the two transitions. That is where all the 4's are coming from.

In [8]:
def super_awesome_custom_instruction(c):
    import nestpy
    half_life = 156.94e-9 # Kr intermediate state half-life in seconds (156.94 ns)
    decay_energies = [32.2,9.4] # Decay energies in keV
    
    n = c['nevents'] = c['event_rate'] * c['chunk_size'] * c['nchunk']
    c['total_time'] = c['chunk_size'] * c['nchunk']

    instructions = np.zeros(4 * n, dtype=wfsim.instruction_dtype)

    instructions['type'] = np.tile([1, 2], 2 * n)
    instructions['recoil'] = ['er' for i in range(4 * n)]
    
    r = np.sqrt(np.random.uniform(0, 2500, n))
    t = np.random.uniform(-np.pi, np.pi, n)
    instructions['x'] = np.repeat(r * np.cos(t), 4)
    instructions['y'] = np.repeat(r * np.sin(t), 4)
    instructions['z'] = np.repeat(np.random.uniform(-100, 0, n), 4)
    
    # To get the correct times we need to include the 156.94 ns half-life of the intermediate state.
    # Times are computed in seconds and converted to ns at the end.
    uniform_times = c['total_time'] * (np.arange(n) + 0.5) / n
    delayed_times = uniform_times + np.random.exponential(half_life/np.log(2),len(uniform_times))
    # Per decay: (S1, S2) at the first transition, then (S1, S2) at the delayed one
    instructions['time'] = np.repeat(list(zip(uniform_times,delayed_times)),2) * 1e9

    # Label each instruction by the chunk it falls in; this needs the times to be filled in first
    instructions['event_number'] = np.digitize(instructions['time'],
         1e9 * np.arange(c['nchunk']) * c['chunk_size']) - 1


    # Here we'll define our XENON-like detector
    nc = nestpy.NESTcalc(nestpy.VDetector())
    A = 131.293
    Z = 54.
    density = 2.862  # g/cm^3   #SR1 Value
    drift_field = 82  # V/cm    #SR1 Value
    interaction = nestpy.INTERACTION_TYPE(7)
    
    energy = np.tile(decay_energies,n)
    quanta = []
    for en in energy:
        y = nc.GetYields(interaction,
                         en,
                         density,
                         drift_field,
                         A,
                         Z,
                         (1, 1))
        # Draw the quanta once so photons and electrons come from the same realization
        q = nc.GetQuanta(y, density)
        quanta.append(q.photons)
        quanta.append(q.electrons)
        
    instructions['amp'] = quanta

    return instructions

Now here comes the magic line:

In [9]:
wfsim.strax_interface.rand_instructions = super_awesome_custom_instruction

This replaces the default rand_instructions function with our own super awesome function. So when the simulator calls rand_instructions, the code from super_awesome_custom_instruction is executed instead.
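
The mechanism behind this kind of replacement can be sketched without wfsim at all. The example below is a toy, assuming the caller looks the generator up on the module at call time; `fake_interface`, `simulate` and `my_instructions` are made-up names used only for illustration and say nothing about how wfsim is organised internally.

```python
import types

# Stand-in for a module that owns an instruction generator (hypothetical).
fake_interface = types.ModuleType("fake_interface")


def rand_instructions(c):
    return "default instructions"


def simulate(c):
    # The generator is looked up on the module when simulate() runs,
    # so a replacement made beforehand is picked up.
    return fake_interface.rand_instructions(c)


fake_interface.rand_instructions = rand_instructions
fake_interface.simulate = simulate


def my_instructions(c):
    return "custom instructions"


before = fake_interface.simulate({})
fake_interface.rand_instructions = my_instructions   # the replacement
after = fake_interface.simulate({})

# Assigning to a name the caller never looks up just creates an unused attribute.
fake_interface.some_other_name = rand_instructions
still_custom = fake_interface.simulate({})
print(before, after, still_custom)
```

The attribute name has to match exactly what the calling code looks up, otherwise the assignment silently has no effect.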

### CSV
The format for csv files is the same as the instructions dtype. So on every line specify event_number, type, time, x, y, z, amp and recoil, in that order.
Then tell the simulator it exists:


In [10]:
st.set_config(dict(fax_file='instructions.csv'))

Of course, if you do not have this file it will not work.
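
If you need a file to experiment with, a minimal sketch using only the standard library could look like this. The column order follows wfsim.instruction_dtype as listed earlier; the row values are invented, and writing a header row is an assumption, so check what your wfsim version expects before relying on it.

```python
import csv
import os
import tempfile

# Column order taken from wfsim.instruction_dtype
columns = ["event_number", "type", "time", "x", "y", "z", "amp", "recoil"]

# Two made-up rows: an S1 (type 1) and an S2 (type 2) belonging to the same event
rows = [
    [0, 1, 1_000_000, 10.0, -5.0, -50.0, 300, "er"],
    [0, 2, 1_000_000, 10.0, -5.0, -50.0, 40, "er"],
]

path = os.path.join(tempfile.mkdtemp(), "instructions.csv")
with open(path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(columns)   # header row (assumption, see above)
    writer.writerows(rows)

# Read the file back to check what was written
with open(path, newline="") as f:
    read_back = list(csv.reader(f))
print(read_back[0])
```

The resulting path is what you would pass as fax_file.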

### Geant4
Setting a root file as instructions works the same as for the csv. Just change the value of fax_file in the config to wherever your file is and the simulator will take care of the rest. The e_dep field will be fed to nestpy to get the number of photons and electrons. I do think it's cleaner to first feed your GEANT4 output to nSort, since that does the same thing, but better, and use the output csv here. Calling nSort from here will not work since it depends on root, and we do not plan on having root installed in our main envs.

In [11]:
st.set_config(dict(fax_file= '/dali/lgrandi/pgaemers/fax_files/Xenon1T_WholeLXe_Pb212_00008_g4mc_G4.root'))

## Configuration customization
The simulator uses a large number of configuration settings to do its magic. Some of them are best left alone, like pmt_circuit_load_resistor. Others are things you might want to change a bit to see how the data changes. Unfortunately, the full list is currently spread out over two different places. One is the fax config json, which is on github. The other is the option list in strax. Besides those, things like pattern maps are hardcoded in load_resource.py.

The strax config is viewable like this and can be changed by st.set_config(dict(option you want=value you want))

In [12]:
st.show_config('raw_records')



| | option | default | current | applies_to | help |
|---|---|---|---|---|---|
| 0 | fax_file | | /dali/lgrandi/pgaemers/fax_files/Xenon1T_Whole... | (raw_records, truth) | Directory with fax instructions |
| 1 | nSort_path | | <OMITTED> | (raw_records, truth) | If nSort is needed to convert GEANT4 to instru... |
| 2 | fax_config_override | | <OMITTED> | (raw_records, truth) | Dictionary with configuration option overrides |
| 3 | event_rate | 5 | 5 | (raw_records, truth) | Average number of events per second |
| 4 | chunk_size | 5 | 10 | (raw_records, truth) | Duration of each chunk in seconds |
| 5 | nchunk | 4 | 1 | (raw_records, truth) | Number of chunks to simulate |
| 6 | fax_config | https://raw.githubusercontent.com/XENONnT/stra... | https://raw.githubusercontent.com/XENONnT/stra... | (raw_records, truth) | |
| 7 | to_pe_file | https://raw.githubusercontent.com/XENONnT/stra... | https://raw.githubusercontent.com/XENONnT/stra... | (raw_records, truth) | |
| 8 | gain_model | (to_pe_per_run, https://raw.githubusercontent.... | (to_pe_constant, 0.005) | (raw_records, truth) | PMT gain model. Specify as (model_type, model_... |
| 9 | right_raw_extension | 50000 | <OMITTED> | (raw_records, truth) | |


The config from github can be loaded as:

In [13]:
straxen.get_resource('https://raw.githubusercontent.com/XENONnT/'
                               'strax_auxiliary_files/master/fax_files/fax_config_nt.json',fmt='json').keys()

dict_keys(['pmt_pulse_time_rounding', 'gauss_noise_sigmas', 'real_noise_file', 'real_noise_sample_mode', 'maximum_recombination_time', 's1_detection_efficiency', 'singlet_lifetime_liquid', 'triplet_lifetime_liquid', 's1_ER_recombination_fraction', 's1_ER_primary_singlet_fraction', 's1_ER_secondary_singlet_fraction', 's1_NR_singlet_fraction', 's1_ER_alpha_singlet_fraction', 'electron_trapping_time', 'gas_drift_velocity_slope', 'lxe_dielectric_constant', 'singlet_lifetime_gas', 'triplet_lifetime_gas', 'singlet_fraction_gas', 'each_pmt_afterpulse_types', 'pmt_afterpulse_types', 'p_double_pe_emision', 's1_model_type', 's1_decay_time', 'pmt_transit_time_spread', 'real_noise_sample_size', 'pressure', 'temperature', 'anode_voltage', 'pmt_rise_time', 'pmt_fall_time', 'photon_area_distribution', 'pe_pulse_ts', 'pe_pulse_ys', 'samples_before_pulse_center', 'samples_after_pulse_center', 'pmt_transit_time_mean', 'led_pulse_length', 'drift_field', 'liquid_density', 'diffusion_constant_liquid', 'ele

Changing things in this config works slightly differently. The strax option list has an option called "fax_config_override". It takes a dict which is used to override any values in the json config.
So changing 's2_secondary_sc_gain' is done as:

In [14]:
st.set_config(dict(fax_config_override = dict(s2_secondary_sc_gain=23)))
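
Conceptually, the override can be thought of as a dictionary update on top of the loaded json. The sketch below assumes exactly that semantics and uses made-up values for the base config; it does not reproduce how wfsim applies the override internally.

```python
# Made-up stand-in for the loaded JSON config; these are not the real defaults.
fax_config = {"s2_secondary_sc_gain": 20.0, "drift_field": 82.0}
fax_config_override = {"s2_secondary_sc_gain": 23}

# Assumed semantics: keys in the override replace those from the JSON config,
# everything else is left untouched.
effective = dict(fax_config)
effective.update(fax_config_override)
print(effective)
```

Only the keys you list in the override change, so you do not need to copy the whole json to tweak a single value.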

## What actually happens?



What happens behind the scenes is that the instructions are first grouped together in chunks. Then we loop over the instructions and the full chunk is returned before starting with the next one.

We use an S1 and an S2 class to calculate the arrival times of the photons and the channels which have been hit. Then we hand them over to the Pulse class to calculate the currents in the channels. Finally the currents go to a RawData class where we fake the digitizer response.

### S1

For S1s we start with calculating the light yield based on the position of the interaction, and draw the number of photons seen from a Poisson distribution.

Second we calculate the arrival times of the photons. This is based on the scintillation of the xenon atoms. It depends on the recombination time and on the singlet and triplet fractions.
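
To make the idea of a singlet/triplet mixture concrete, here is a toy draw of photon delays from two exponentials. It ignores the recombination time entirely, and the fraction and lifetimes are illustrative numbers chosen for this example; the values the simulator uses live in the fax config (for example singlet_lifetime_liquid and triplet_lifetime_liquid).

```python
import random

rng = random.Random(1)

# Illustrative numbers only, not taken from the fax config.
singlet_fraction = 0.2
singlet_lifetime_ns = 3.0
triplet_lifetime_ns = 24.0


def photon_delay_ns():
    """Draw one scintillation delay from a two-component exponential mixture."""
    if rng.random() < singlet_fraction:
        return rng.expovariate(1.0 / singlet_lifetime_ns)
    return rng.expovariate(1.0 / triplet_lifetime_ns)


delays = [photon_delay_ns() for _ in range(10_000)]
mean_delay = sum(delays) / len(delays)
print(round(mean_delay, 1))
```

With these numbers the mean delay should come out close to 0.2 * 3 + 0.8 * 24 = 19.8 ns, dominated by the slow component.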

Finally the channels are calculated. Based on the pattern map we use an interpolation map to get a probability distribution for channels to be hit by an S1 signal at the position of the interaction, and then we draw from this distribution for every photon.
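
The last step, drawing a channel for every photon from a per-channel probability, can be sketched with the standard library. The four-channel probability table below is hypothetical; in the simulator it would come from interpolating the pattern map at the interaction position.

```python
import random

rng = random.Random(0)

# Hypothetical hit probabilities for four channels at one interaction position.
channel_probability = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.4}

n_photons = 1000
# One weighted draw per photon
channels = rng.choices(
    population=list(channel_probability),
    weights=list(channel_probability.values()),
    k=n_photons,
)

hits_per_channel = {ch: channels.count(ch) for ch in channel_probability}
print(hits_per_channel)
```

Each photon is assigned independently, so the hit counts fluctuate around n_photons times the channel probability.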

### S2

S2s are slightly more complicated. First we need to drift the electrons up, and while doing so we'll lose some of them.
To get the photon timings, we first need to get the arrival times of the electrons at the gas interface based on a diffusion model. Then we can calculate the photon timings based on a luminescence model for every individual electron. And for the channels we do the same trick with the interpolating map.
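
A toy version of the drift step is sketched below: electrons are lost with an exponential survival probability set by an electron lifetime, and the survivors arrive with a Gaussian time spread from diffusion. Both model choices and all parameter values are assumptions made for illustration; the models wfsim uses are not reproduced here.

```python
import math
import random

rng = random.Random(2)

# Illustrative values, not taken from any real configuration.
drift_velocity_cm_per_us = 0.13
electron_lifetime_us = 500.0
diffusion_cm2_per_us = 3e-5


def drift_electrons(n_electrons, depth_cm):
    """Return arrival times (in us) of the electrons that survive the drift."""
    drift_time = depth_cm / drift_velocity_cm_per_us
    survival_probability = math.exp(-drift_time / electron_lifetime_us)
    # Longitudinal spread sqrt(2 D t), converted from a length to a time
    sigma_t = math.sqrt(2 * diffusion_cm2_per_us * drift_time) / drift_velocity_cm_per_us
    arrival_times = []
    for _ in range(n_electrons):
        if rng.random() < survival_probability:
            arrival_times.append(rng.gauss(drift_time, sigma_t))
    return arrival_times


arrivals = drift_electrons(n_electrons=1000, depth_cm=50.0)
mean_arrival = sum(arrivals) / len(arrivals)
print(len(arrivals), round(mean_arrival, 1))
```

Deeper interactions lose more electrons and arrive with a wider spread, which is why S2 width carries depth information.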


### Pulse

When we have our lists of channels and timings we can generate actual pulses. First we add a PMT transit time. Then we loop over all channels, calculate the double pe emission probabilities, and add a current in the PMT channel based on the arrival time. This is all stored in a big dictionary. Afterwards this is passed to our fake digitizers, which return your very own pretty data.
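
The core of the pulse step, adding a single-photoelectron shape at each photon arrival time, can be illustrated for one channel as follows. The template shape and the 10 ns sample length are assumptions for this sketch, and transit time, double pe emission and noise are left out.

```python
# Hypothetical single-photoelectron template, in arbitrary current units per sample.
pe_template = [0.1, 0.6, 0.25, 0.05]
sample_ns = 10  # assumed digitizer sample length


def build_waveform(photon_times_ns, n_samples):
    """Sum one template per photon into a single-channel waveform."""
    waveform = [0.0] * n_samples
    for t in photon_times_ns:
        start = int(t // sample_ns)
        for offset, value in enumerate(pe_template):
            index = start + offset
            if 0 <= index < n_samples:
                waveform[index] += value
    return waveform


waveform = build_waveform(photon_times_ns=[25, 31, 78], n_samples=12)
print([round(v, 2) for v in waveform])
```

Photons that arrive close together, like the first two here, pile up into a single larger pulse, while the third stays separate.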


### Getting down to business


Now we have access to all the normal strax data types, and another one called 'truth' which holds the simulation instructions. Calling it follows the normal strax convention.

In [15]:
st.set_config(dict(fax_file=None))

In [16]:
# Remove any previously simulated data, if such exists
# !rm -r strax_data

records = st.get_array(run_id,'records')
# peaks = st.get_array(run_id, ['peaks','peak_basics'])
# data = st.get_df(run_id, 'event_info')

truth = st.get_df(run_id, 'truth')

Simulating Raw Records: 100%|██████████| 96/96 [00:24<00:00,  3.87it/s]


Source finished!




Now it is time to make pretty plots and see if what we made actually makes any sense.

In [17]:
peak_basics = st.get_df(run_id,'peak_basics')

In [18]:
peak_basics[:10]

| | time | endtime | center_time | area | n_channels | max_pmt | max_pmt_area | range_50p_area | range_90p_area | area_fraction_top | length | dt | rise_time | tight_coincidence | type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 99999510 | 100000190 | 100000098 | 1511.265015 | 438 | 425 | 11 | 45.49892 | 108.933258 | 0.090103 | 68 | 10 | 32.073212 | 278 | 1 |
| 1 | 100000190 | 100000460 | 100000224 | 160.055038 | 436 | 425 | 2 | 41.434231 | 93.05368 | 0.088126 | 27 | 10 | 27.741791 | 19 | 1 |
| 2 | 100000470 | 100002090 | 100001584 | 1.275 | 1 | 477 | 1 | 19.543358 | 126.469147 | 0.0 | 162 | 10 | 12.299422 | 1 | 0 |
| 3 | 100002730 | 100003030 | 100002842 | 2.07 | 3 | 36 | 0 | 211.971954 | 235.813843 | 1.0 | 30 | 10 | 17.707243 | 2 | 0 |
| 4 | 100164950 | 100166240 | 100165555 | 18.5 | 17 | 128 | 3 | 202.121216 | 416.111206 | 0.173243 | 129 | 10 | 187.104172 | 3 | 2 |
| 5 | 100514580 | 100523780 | 100518825 | 11563.5625 | 491 | 206 | 1228 | 1771.331787 | 4564.572266 | 0.087043 | 184 | 50 | 1627.153198 | 59 | 2 |
| 6 | 100870810 | 100871920 | 100871312 | 18.85 | 19 | 34 | 1 | 144.104645 | 346.100159 | 0.701591 | 111 | 10 | 124.979836 | 8 | 2 |
| 7 | 100938950 | 100940020 | 100939763 | 20.995001 | 19 | 97 | 5 | 183.766678 | 469.409851 | 0.568231 | 107 | 10 | 316.340454 | 4 | 2 |
| 8 | 299999510 | 300000190 | 300000098 | 1476.585693 | 458 | 332 | 9 | 44.220722 | 108.634956 | 0.15612 | 68 | 10 | 30.482187 | 313 | 1 |
| 9 | 300000190 | 300000770 | 300000223 | 158.209991 | 455 | 179 | 1 | 40.539444 | 93.23481 | 0.161873 | 58 | 10 | 26.819931 | 32 | 1 |


## Matching
The easiest way to match peaks with the truth is to write a new strax plugin which loops over the peaks and selects the truth entries whose mean photon arrival time falls within the time window of the peak.
That will look something like this:

In [20]:
class MatchedPeaks(strax.LoopPlugin):
    depends_on = ('peak_basics','truth')
    provides = 'matched_peaks'
    __version__ = '0.0.2'
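    # np.int was removed in NumPy 1.24, so use explicit widths.
    # area is stored as a float so that fractional PE values are not truncated.
    dtype = [('time',np.int64),
             ('endtime',np.int64),
             ('area',np.float32),
             ('n_photon',np.int64)]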
    
    def compute(self, peaks, truth):
        result = np.zeros(len(peaks), self.dtype)
        
        for ix, p in enumerate(peaks):
            t = truth[(p['time']<truth['t_mean_photon'])&
                      (p['endtime']>truth['t_mean_photon'])]
            r = result[ix]
            r['time'] = p['time']
            r['endtime'] = p['endtime']
            r['area'] = p['area']
            # Sum the photons of all matching truth entries (gives 0 if there are none)
            r['n_photon'] = np.sum(t['n_photon'])
        
        return result

Of course this doesn't work in every case. An electron afterpulse can be very spread out, so it may be reconstructed as multiple peaks while coming from a single instruction whose mean photon time falls outside the window of those peaks. It would be very much appreciated if someone wants to make a more robust selection criterion :)

For example, check out our peak matching algorithm for WFSim: https://github.com/XENONnT/pema
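
One possible direction for a sturdier criterion is to match on interval overlap rather than on a single mean time. The sketch below is a plain-Python toy and not how pema works; the field names t_first_photon and t_last_photon are hypothetical, and the rows are invented.

```python
def overlap_ns(a_start, a_end, b_start, b_end):
    """Length of the overlap between two time intervals, 0 if they are disjoint."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def match_peak(peak, truth_rows):
    """Return the truth row that overlaps the peak the most, or None."""
    best, best_overlap = None, 0
    for row in truth_rows:
        o = overlap_ns(peak["time"], peak["endtime"],
                       row["t_first_photon"], row["t_last_photon"])
        if o > best_overlap:
            best, best_overlap = row, o
    return best


# Invented truth entries with a hypothetical photon time window each
truth_rows = [
    {"id": 0, "t_first_photon": 100, "t_last_photon": 180},
    {"id": 1, "t_first_photon": 500, "t_last_photon": 900},
]
peaks = [
    {"time": 90, "endtime": 170},      # overlaps truth 0
    {"time": 850, "endtime": 1000},    # tail of truth 1, far from its mean time
    {"time": 2000, "endtime": 2100},   # nothing nearby
]
matches = [match_peak(p, truth_rows) for p in peaks]
print([m["id"] if m else None for m in matches])
```

The second peak shows the benefit: it only covers the tail of its instruction, so a mean-time test would miss it while the overlap test still finds the right match.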

In [21]:
st.register(MatchedPeaks)

__main__.MatchedPeaks

In [22]:
st.get_array(run_id,'matched_peaks')