# Simulation Truth

This notebook will introduce you to the concept of simulation truth in fuse.



## Imports and Simulation Context

Similar to the previous notebooks, we will start by importing the necessary modules and creating a simulation context.

In [1]:
import fuse
import numpy as np

In [None]:
st = fuse.context.full_chain_context(output_folder = "./fuse_data")

st.set_config({"path": "/project2/lgrandi/xenonnt/simulations/testing",
               "file_name": "pmt_neutrons_100.root",
               "entry_stop": 10,
               })

run_number = "00000"

## Raw Records and records_truth

First we will run the simulation up to `raw_records`.

In [None]:
st.make(run_number, "microphysics_summary")
st.make(run_number, "photon_summary")
st.make(run_number, "raw_records")

Now that the main simulation output is produced, we can build some truth information. First we will build `records_truth`. 

In [None]:
st.make(run_number, "records_truth")

Both `raw_records` and `records_truth` are of the same `data_kind` and can be loaded together.

In [None]:
raw_records = st.get_array(run_number, ["raw_records", "records_truth"])

`records_truth` gives you four additional columns for each record. These are:
- `s1_photons_in_record` - The number of S1 photons in the `raw_record`
- `s2_photons_in_record` - The number of S2 photons in the `raw_record`
- `ap_photons_in_record` - The number of (virtual) PMT afterpulse 'photons' in the record
- `raw_area` - The sum of the contributing photon gains divided by the gain of the PMT
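To make the `raw_area` definition concrete, here is a minimal sketch in plain Python. It is not fuse's implementation; the function name and all gain values are hypothetical and only illustrate "sum of photon gains divided by the PMT gain".

```python
# Hypothetical numbers for illustration only; not taken from fuse.
pmt_gain = 2.0e6                      # nominal gain of the PMT
photon_gains = [2.2e6, 1.9e6, 2.1e6]  # gains sampled for the individual photons


def raw_area(photon_gains, pmt_gain):
    """Sum of the photon gains, expressed in units of the nominal PMT gain."""
    return sum(photon_gains) / pmt_gain


area = raw_area(photon_gains, pmt_gain)
single_photon_area = raw_area([2.2e6], pmt_gain)

print(f"raw_area of three photons: {area:.2f}")
print(f"raw_area of one photon:    {single_photon_area:.2f}")
```

In this picture a single photon whose sampled gain is 10% above the nominal PMT gain yields a `raw_area` of about 1.1, which is the kind of value you would expect for a record with one contributing photon.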

Let's have a look at the number of photons that make it into the first record:

In [6]:
index = 0
print("S1 photons:", raw_records[index]["s1_photons_in_record"])
print("S2 photons:", raw_records[index]["s2_photons_in_record"])
print("AP photons:", raw_records[index]["ap_photons_in_record"])
print("Raw area:", raw_records[index]["raw_area"])

S1 photons: 1
S2 photons: 0
AP photons: 0
Raw area: 1.0999995


## Peaks and peak_truth

Next we can process the simulation result to `peak_basics`. Strax(en) will merge multiple records into a peak. The PeakTruth plugin will evaluate which photons contribute to a peak and calculate a truth output for each peak. The provided columns for each peak are:
- `s1_photons_in_peak` - The number of S1 photons that contributed to the peak
- `s2_photons_in_peak` - The number of S2 photons that contributed to the peak
- `ap_photons_in_peak` - The number of (virtual) PMT afterpulse 'photons' that contributed to the peak
- `raw_area_truth` - The sum of all contributing photon gains divided by the gains of the PMTs
- `observable_energy_truth` - Estimate of the energy that is associated with the peak.
- `number_of_contributing_clusters_s1` - Number of clusters that contributed to the peak with S1 photons
- `number_of_contributing_clusters_s2` - Number of clusters that contributed to the peak with S2 photons
- `average_x_of_contributing_clusters` - Weighted average of the x position of the clusters that contributed to the peak
- `average_y_of_contributing_clusters` - Weighted average of the y position of the clusters that contributed to the peak
- `average_z_of_contributing_clusters` - Weighted average of the z position of the clusters that contributed to the peak
- `average_x_obs_of_contributing_clusters` - Weighted average of the observed x position of the clusters that contributed to the peak
- `average_y_obs_of_contributing_clusters` - Weighted average of the observed y position of the clusters that contributed to the peak
- `average_z_obs_of_contributing_clusters` - Weighted average of the observed z position of the clusters that contributed to the peak
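The list above does not state which weights enter the averages. The following sketch assumes that each cluster is weighted by the number of photons it contributes to the peak; this is an assumption made for illustration and should be checked against the plugin. The helper name and the numbers are hypothetical.

```python
def weighted_average(values, weights):
    """Weighted mean of `values`; returns NaN if the weights sum to zero."""
    total_weight = sum(weights)
    if total_weight == 0:
        return float("nan")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Two hypothetical clusters contributing to the same peak.
cluster_x = [-20.0, -30.0]         # x positions of the clusters
photons_from_cluster = [300, 100]  # assumed weights: photons in the peak

average_x = weighted_average(cluster_x, photons_from_cluster)
print(f"average x of contributing clusters: {average_x:.1f}")
```

With these weights the average is pulled towards the cluster that contributes more photons.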

Let's take a closer look at `observable_energy_truth` using an example.

Assume we have two clusters: the first deposits 100 keV and produces 100 S1 photons, the second deposits 10 keV and produces 10 S1 photons. After simulation and processing we find two S1 peaks in our data. The first S1 consists of 90 photons from the first cluster and 5 photons from the second cluster, so its `observable_energy_truth` is 90/100 * 100 keV + 5/10 * 10 keV = 90 keV + 5 keV = 95 keV. The second S1 consists of 3 photons from the first cluster and 4 photons from the second cluster, so its `observable_energy_truth` is 3/100 * 100 keV + 4/10 * 10 keV = 3 keV + 4 keV = 7 keV. The same calculation is done for S2 peaks, with S2 photons in place of S1 photons.
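The arithmetic of this example can be reproduced with a few lines of plain Python. This is a sketch of the calculation described in the text, not the code of the PeakTruth plugin; the function and variable names are made up for illustration.

```python
def observable_energy(cluster_energy, photons_produced, photons_in_peak):
    """Sum over clusters of (fraction of the cluster's photons in the peak) * energy."""
    total = 0.0
    for energy, n_produced, n_in_peak in zip(
        cluster_energy, photons_produced, photons_in_peak
    ):
        if n_produced > 0:  # clusters without photons cannot contribute
            total += n_in_peak * energy / n_produced
    return total


cluster_energy = [100.0, 10.0]  # keV
photons_produced = [100, 10]    # S1 photons produced by each cluster

first_s1 = observable_energy(cluster_energy, photons_produced, [90, 5])
second_s1 = observable_energy(cluster_energy, photons_produced, [3, 4])

print(f"first S1:  {first_s1:.1f} keV")
print(f"second S1: {second_s1:.1f} keV")
```

Note that the two peaks together account for 102 keV of the 110 keV deposited, because 8 of the 110 photons did not end up in either peak.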


In [None]:
st.make(run_number, "peak_positions")
st.make(run_number, "peak_truth")

As strax(en) will take care of the matching of our truth information to the individual peaks, we can simply load the `peak_basics` and `peak_truth` data together.

In [None]:
peak_basics = st.get_df(run_number, ["peak_basics", "peak_truth", "peak_positions"])

For a peak area bias study we could now compare `raw_area_truth` to the reconstructed peak `area`:

In [9]:
peak_basics[["area", "raw_area_truth"]].head()

| | area | raw_area_truth |
|---|---|---|
| 0 | 2.80885 | 3.049999 |
| 1 | 1059.032593 | 1078.859741 |
| 2 | 61364.210938 | 61378.636719 |
| 3 | 718.991394 | 744.489868 |
| 4 | 341264.375 | 322082.71875 |
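One simple way to quantify the area bias is the relative difference between the reconstructed and the true area. The sketch below applies this to the five values shown above, copied by hand into plain Python lists; in the notebook you would compute the same quantity directly on the DataFrame columns. The definition of the bias used here is an assumption for illustration.

```python
# Values copied from the output above.
area = [2.80885, 1059.032593, 61364.210938, 718.991394, 341264.375]
raw_area_truth = [3.049999, 1078.859741, 61378.636719, 744.489868, 322082.71875]

# Relative bias: positive means the reconstructed area overestimates the truth.
relative_bias = [(a - t) / t for a, t in zip(area, raw_area_truth)]

for i, bias in enumerate(relative_bias):
    print(f"peak {i}: {bias:+.1%}")
```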


We might also be interested in the peak classification: 

In [10]:
peak_basics[["type", "s1_photons_in_peak", "s2_photons_in_peak", "ap_photons_in_peak"]].head()

| | type | s1_photons_in_peak | s2_photons_in_peak | ap_photons_in_peak |
|---|---|---|---|---|
| 0 | 1 | 3 | 0 | 0 |
| 1 | 2 | 0 | 847 | 0 |
| 2 | 1 | 48464 | 0 | 49 |
| 3 | 2 | 0 | 577 | 0 |
| 4 | 2 | 0 | 252627 | 250 |
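A classification check could compare the reconstructed `type` (1 for S1, 2 for S2) with the signal type that dominates the truth photon counts. The following sketch does this for the five peaks above; the "dominant contribution" rule and the helper name are assumptions for illustration, not something fuse provides.

```python
# Values copied from the output above: (type, s1, s2, ap photons in peak).
peaks = [
    (1, 3, 0, 0),
    (2, 0, 847, 0),
    (1, 48464, 0, 49),
    (2, 0, 577, 0),
    (2, 0, 252627, 250),
]


def dominant_truth_type(s1_photons, s2_photons):
    """Return 1 if S1 photons dominate, 2 if S2 photons dominate, 0 if there are none."""
    if s1_photons == 0 and s2_photons == 0:
        return 0
    return 1 if s1_photons >= s2_photons else 2


matches = [ptype == dominant_truth_type(s1, s2) for ptype, s1, s2, _ in peaks]
ap_fraction = [ap / (s1 + s2 + ap) for _, s1, s2, ap in peaks]

print("correctly classified:", sum(matches), "of", len(peaks))
print("largest afterpulse fraction:", f"{max(ap_fraction):.2%}")
```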


Or you might want to check how our position reconstruction is doing: 

In [11]:
peak_basics[["type","x","y", "average_x_obs_of_contributing_clusters", "average_y_obs_of_contributing_clusters", "average_z_obs_of_contributing_clusters"]].head()

| | type | x | y | average_x_obs_of_contributing_clusters | average_y_obs_of_contributing_clusters | average_z_obs_of_contributing_clusters |
|---|---|---|---|---|---|---|
| 0 | 1 | | | -20.890713 | -51.367107 | -1.4648 |
| 1 | 2 | -20.788963 | -52.198101 | -20.890713 | -51.367107 | -1.4648 |
| 2 | 1 | -57.817341 | 0.454473 | -20.348944 | -0.438408 | -22.55941 |
| 3 | 2 | -38.655247 | -18.31365 | -38.867641 | -18.246555 | -6.679745 |
| 4 | 2 | -37.277332 | -18.950422 | -34.188717 | -14.120629 | -11.869003 |
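To turn this comparison into a number, one option is the distance in the x-y plane between the reconstructed position and the truth average. The sketch below does this for the S2 peaks (type 2) in the output above, using values copied by hand; restricting the comparison to S2 peaks is a choice made here for illustration.

```python
import math

# Values copied from the output above: (type, x, y, truth x, truth y).
peaks = [
    (1, float("nan"), float("nan"), -20.890713, -51.367107),
    (2, -20.788963, -52.198101, -20.890713, -51.367107),
    (1, -57.817341, 0.454473, -20.348944, -0.438408),
    (2, -38.655247, -18.31365, -38.867641, -18.246555),
    (2, -37.277332, -18.950422, -34.188717, -14.120629),
]

# Distance between reconstructed and true position, for S2 peaks only.
s2_distances = [
    math.hypot(x - x_true, y - y_true)
    for ptype, x, y, x_true, y_true in peaks
    if ptype == 2
]

for distance in s2_distances:
    print(f"xy distance: {distance:.2f}")
```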


## Surviving Clusters
Next, let's evaluate whether an energy deposit makes it into a peak. This is done by the `SurvivingClusters` plugin. It will provide the following columns:
- `creating_a_photon` - Boolean if the cluster created a propagated photon
- `in_a_peak` - Boolean if the cluster is in a peak

In [None]:
st.make(run_number, "surviving_clusters")
microphysics_summary = st.get_df(run_number, ["microphysics_summary", "surviving_clusters"])

Now that we have the data loaded we could have a look at clusters that did not make it into a peak: 

In [13]:
microphysics_summary.query("in_a_peak == False").head()

| | e_field | time | endtime | x | y | z | ed | nestid | A | Z | ... | z_pri | cluster_id | xe_density | vol_id | create_S2 | photons | electrons | excitons | creating_a_photon | in_a_peak |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 170 | 27 | 2827386497 | 2827386497 | 32.959923 | 14.185023 | -9.616013 | 0.096808 | 0 | 0 | 0 | ... | 7.933488 | 171 | 2.862 | 1 | True | 0 | 0 | 0 | False | False |
| 203 | 26 | 2827386535 | 2827386535 | 27.667393 | 10.273334 | -12.147082 | 0.044494 | 0 | 0 | 0 | ... | 7.933488 | 204 | 2.862 | 1 | True | 0 | 0 | 0 | False | False |
| 205 | 25 | 2827386710 | 2827386710 | 9.488111 | 6.251637 | -11.284761 | 0.141653 | 0 | 0 | 0 | ... | 7.933488 | 206 | 2.862 | 1 | True | 0 | 0 | 0 | False | False |
| 208 | 24 | 2827386851 | 2827386851 | 16.457802 | 8.368632 | -30.0026 | 0.085379 | 0 | 0 | 0 | ... | 7.933488 | 209 | 2.862 | 1 | True | 0 | 0 | 0 | False | False |
| 209 | 24 | 2827386893 | 2827386893 | 16.093048 | 14.68545 | -34.862846 | 0.057414 | 0 | 0 | 0 | ... | 7.933488 | 210 | 2.862 | 1 | True | 0 | 0 | 0 | False | False |
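The five clusters shown above have something in common: they are very small energy deposits that produced no photons and no electrons, so they could not contribute to any peak. The short sketch below checks this on a few values copied by hand from the output; on the full DataFrame the same check can be done on the corresponding columns.

```python
# Values copied from the output above for the clusters with in_a_peak == False.
lost_clusters = [
    {"cluster_id": 171, "ed": 0.096808, "photons": 0, "electrons": 0},
    {"cluster_id": 204, "ed": 0.044494, "photons": 0, "electrons": 0},
    {"cluster_id": 206, "ed": 0.141653, "photons": 0, "electrons": 0},
    {"cluster_id": 209, "ed": 0.085379, "photons": 0, "electrons": 0},
    {"cluster_id": 210, "ed": 0.057414, "photons": 0, "electrons": 0},
]

# A cluster without any quanta cannot create a propagated photon.
without_quanta = [
    c["photons"] == 0 and c["electrons"] == 0 for c in lost_clusters
]
largest_deposit = max(c["ed"] for c in lost_clusters)

print("all lost clusters without quanta:", all(without_quanta))
print("largest lost energy deposit:", largest_deposit)
```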


## Event Truth

Let's move from the peak level to event-level data. This is done by the `EventTruth` plugin. It will provide the following columns:
- `x_obs_truth` - The x position of the event. This corresponds to the x position of the main S2.
- `y_obs_truth` - The y position of the event. This corresponds to the y position of the main S2.
- `z_obs_truth` - The z position of the event. This is calculated as the mean of the `average_z_obs_of_contributing_clusters` of the main S1 and the main S2. Whether this is the most meaningful definition is an open question.
- `energy_of_main_peaks_truth` - This is intended to be the energy that can be found in the main S1 and S2. It is calculated as the mean of the `observable_energy_truth` of the main S1 and the main S2. Whether this is the most meaningful definition is an open question as well.
- `total_energy_in_event_truth` - The sum of all energy deposits that are in the event
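The two averaged quantities in this list can be written down in a few lines. The sketch below only illustrates the "mean of main S1 and main S2" rule described above; the helper name and all input values are hypothetical.

```python
def mean_of_main_peaks(main_s1_value, main_s2_value):
    """Plain arithmetic mean of a quantity taken from the main S1 and the main S2."""
    return 0.5 * (main_s1_value + main_s2_value)


# Hypothetical truth values of the main S1 and the main S2 of one event.
z_obs_truth = mean_of_main_peaks(-10.0, -12.0)
energy_of_main_peaks_truth = mean_of_main_peaks(90.0, 80.0)

print("z_obs_truth:", z_obs_truth)
print("energy_of_main_peaks_truth:", energy_of_main_peaks_truth)
```

The mean treats both peaks equally, so if the main S1 and the main S2 originate from different sets of clusters, the result lies between the two and corresponds to neither of them exactly.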

In [None]:
st.make(run_number, "event_truth")

In [None]:
event_data = st.get_df(run_number, ["event_info", "event_truth"])

First let's take a look at the energy information:

In [16]:
event_data[["e_ces", "energy_of_main_peaks_truth", "total_energy_in_event_truth"]]

| | e_ces | energy_of_main_peaks_truth | total_energy_in_event_truth |
|---|---|---|---|
| 0 | 0.950759 | 5.071669 | 5.101432 |
| 1 | 10316.319336 | 7268.864258 | 9708.582031 |
| 2 | 1329.449341 | 779.614685 | 1574.821167 |
| 3 | 2615.176758 | 2296.671143 | 4019.808838 |
| 4 | 7898.591797 | 6686.427734 | 15595.068359 |
| 5 | 17511.164062 | 756.653198 | 1264.030273 |
| 6 | 2871.658936 | 2877.854736 | 4376.391602 |
| 7 | 1303.23645 | 1671.288818 | 1684.907104 |
| 8 | 6.340443 | 31.338665 | 31.483326 |
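A quick way to read these numbers is the fraction of the total event energy that is attributed to the main peaks. The sketch below computes it from the values above, copied by hand into plain Python lists; the ratio itself is a derived quantity introduced here for illustration, not a fuse column.

```python
# Values copied from the output above.
energy_of_main_peaks_truth = [
    5.071669, 7268.864258, 779.614685, 2296.671143, 6686.427734,
    756.653198, 2877.854736, 1671.288818, 31.338665,
]
total_energy_in_event_truth = [
    5.101432, 9708.582031, 1574.821167, 4019.808838, 15595.068359,
    1264.030273, 4376.391602, 1684.907104, 31.483326,
]

fraction_in_main_peaks = [
    main / total
    for main, total in zip(energy_of_main_peaks_truth, total_energy_in_event_truth)
]

for event, fraction in enumerate(fraction_in_main_peaks):
    print(f"event {event}: {fraction:.1%} of the energy is in the main peaks")
```

Events with a fraction close to one are dominated by the main S1 and S2, while lower fractions indicate that a sizable part of the deposited energy ended up in other peaks or in no peak at all.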


And the positions: 

In [17]:
event_data[["x", "x_obs_truth", "z", "z_obs_truth"]]

| | x | x_obs_truth | z | z_obs_truth |
|---|---|---|---|---|
| 0 | -20.420223 | -20.890713 | -2.169019 | -1.4648 |
| 1 | -13.204618 | -29.273678 | -34.250954 | -18.659674 |
| 2 | -18.117203 | -19.646385 | -51.266327 | -13.678743 |
| 3 | -50.167782 | -47.090977 | -19.874458 | -13.973438 |
| 4 | 14.325102 | 15.204413 | -17.188948 | -18.430935 |
| 5 | -8.170411 | -19.107325 | -115.72876 | -122.673523 |
| 6 | -23.360212 | -27.319216 | -3.040385 | -4.436149 |
| 7 | 20.358648 | 20.087492 | -11.474121 | -12.390039 |
| 8 | -49.958988 | -50.566116 | -0.835833 | -0.404144 |
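To spot events where the reconstructed depth disagrees with the truth, one could flag events whose z residual exceeds some tolerance. The sketch below uses the z values from the output above; the tolerance of 5 (in the units of the `z` column) is an arbitrary, hypothetical choice for illustration.

```python
# Values copied from the output above.
z = [-2.169019, -34.250954, -51.266327, -19.874458, -17.188948,
     -115.72876, -3.040385, -11.474121, -0.835833]
z_obs_truth = [-1.4648, -18.659674, -13.678743, -13.973438, -18.430935,
               -122.673523, -4.436149, -12.390039, -0.404144]

tolerance = 5.0  # hypothetical threshold, not a fuse or straxen setting

residuals = [reco - truth for reco, truth in zip(z, z_obs_truth)]
flagged_events = [i for i, r in enumerate(residuals) if abs(r) > tolerance]

print("events with large z residual:", flagged_events)
```

Keep in mind that a large residual does not necessarily point to a reconstruction problem, since `z_obs_truth` itself is an average over the main S1 and the main S2.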


## Cluster Tagging

Finally we can investigate if a cluster contributed to the main or alternative S1 or S2. This is done by the `ClusterTagging` plugin. It will provide the following columns:
- `in_main_s1` - Boolean if the cluster contributed to the main S1
- `in_main_s2` - Boolean if the cluster contributed to the main S2
- `in_alt_s1` - Boolean if the cluster contributed to an alternative S1
- `in_alt_s2` - Boolean if the cluster contributed to an alternative S2
- `photons_in_main_s1` - Number of photons the cluster contributed to the main S1
- `photons_in_main_s2` - Number of photons the cluster contributed to the main S2
- `photons_in_alt_s1` - Number of photons the cluster contributed to the alternative S1
- `photons_in_alt_s2` - Number of photons the cluster contributed to the alternative S2

In [None]:
st.make(run_number, "tagged_clusters")

We can load it together with e.g. `microphysics_summary`:

In [None]:
ms_with_tagged_clusters = st.get_df(run_number, ["microphysics_summary", "tagged_clusters", "s2_photons_sum"])

Let's take a look at the clusters that contributed to the main S2 of the second event:

In [20]:
ms_with_tagged_clusters_cut = ms_with_tagged_clusters.query("evtid == 1 & in_main_s2 == True")
ms_with_tagged_clusters_cut[["ed","sum_s2_photons", "photons_in_main_s2"]].head(10)

| | ed | sum_s2_photons | photons_in_main_s2 |
|---|---|---|---|
| 60 | 119.66584 | 64670 | 61672 |
| 61 | 70.190887 | 26666 | 26331 |
| 62 | 64.250038 | 23452 | 23213 |
| 63 | 75.890457 | 28687 | 28354 |
| 64 | 60.865395 | 18687 | 18448 |
| 65 | 59.640846 | 18365 | 18153 |
| 66 | 42.037697 | 17448 | 17304 |
| 67 | 183.596298 | 97796 | 96954 |
| 68 | 21.843777 | 10543 | 10458 |
| 69 | 32.598518 | 14696 | 14558 |
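From these two columns we can estimate which fraction of a cluster's S2 photons ends up in the main S2. The sketch below computes it for the ten clusters shown above, with the values copied by hand into plain Python lists; the ratio is a derived quantity introduced here for illustration.

```python
# Values copied from the output above.
sum_s2_photons = [64670, 26666, 23452, 28687, 18687,
                  18365, 17448, 97796, 10543, 14696]
photons_in_main_s2 = [61672, 26331, 23213, 28354, 18448,
                      18153, 17304, 96954, 10458, 14558]

# Fraction of each cluster's S2 photons that is found in the main S2.
fraction_per_cluster = [
    in_main / total for in_main, total in zip(photons_in_main_s2, sum_s2_photons)
]
overall_fraction = sum(photons_in_main_s2) / sum(sum_s2_photons)

for fraction in fraction_per_cluster:
    print(f"{fraction:.1%}")
print(f"overall: {overall_fraction:.1%}")
```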
