# Step 3: Instrument Response Functions from Data Level 2


In this notebook, DL2 Monte Carlo data (the test sample) are used to check the quality of the assigned gammaness (with gammaness distributions), the energy reconstruction (by means of the migration matrix), the reconstructed direction, and the instrument effective area. Dynamic gammaness and theta cuts are calculated and applied to the input events, and the angular resolution, energy resolution and energy bias are evaluated.
The effects of the cuts on signal and background events are also discussed.


### As usual, let's start by loading some modules

In [None]:
import itertools
import operator
import glob
import numpy as np
from astropy import units as u
from astropy.table import Table, QTable, vstack
from magicctapipe.io import load_mc_dl2_data_file
from matplotlib import gridspec
import matplotlib as mpl
from matplotlib import pyplot as plt
from pyirf.benchmarks import angular_resolution, energy_bias_resolution
from pyirf.cuts import calculate_percentile_cut, evaluate_binned_cut
from pyirf.irf import effective_area_per_energy

In [None]:
# Configure the pyplot figure
plt.rcParams.update(
    {"figure.figsize": (12, 9), "font.size": 15, "grid.linestyle": "dotted"}
)

# Get the pyplot default color cycle
colors = plt.rcParams["axes.prop_cycle"].by_key()["color"]

### Load MC DL2 data files and set some options

Here you can provide gamma and proton DL2 files: we selected those in a zenith range consistent with that used for the RFs training (Zd<53 deg) and azimuth in a narrow range (<30 deg).
In this example we use data from the directory: /fefs/aswg/workspace/2023_joint_analysis_school/RFs_and_DL2/input/input_step_3

Then we set some parameters useful for the next data handling:

`quality_cuts`: disp_diff_mean < sqrt(0.05) ≈ 0.22 (a standard cut, the same used to produce IRFs in the pipeline)  
`event_type`: "software", same as default in config.yaml (it excludes the magic-only events)  
`dl2_weight_type`: "intensity", same as default in config.yaml; this is the weight used to average the DL2 single telescope values (energy/gammaness/direction) to get a single value for each quantity for the event.  
`energy_bins`: we want to use log energy scale
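
To see what the log-scale binning looks like in practice, the short sketch below reproduces the `energy_bins` definition used in this notebook and prints the number of bins and the edge-to-edge ratio. Only NumPy is assumed; the names `n_bins` and `ratios` are illustrative and are not used elsewhere in the notebook.

```python
import numpy as np

# Same definition as in this notebook: 15 log-spaced edges between
# 10^-2 and 10^3 TeV, dropping the two lowest edges.
energy_bins = np.logspace(-2, 3, 15)[2:]

n_bins = len(energy_bins) - 1
print(f"{n_bins} bins from {energy_bins[0]:.4f} to {energy_bins[-1]:.0f} TeV")

# On a log scale every bin spans the same factor in energy
ratios = energy_bins[1:] / energy_bins[:-1]
print(f"Edge-to-edge ratio: {ratios[0]:.3f}")
```

Because the bins are equally wide in log(E), the arithmetic bin centers computed later in the notebook do not sit at the visual center of each bin on a logarithmic axis.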

In [None]:
#Defining the file lists
input_file_gamma = glob.glob('/fefs/aswg/workspace/2023_joint_analysis_school/RFs_and_DL2/input/input_step_3/DL2-gamma/*.h5')
input_file_gamma.sort()
print(f"{len(input_file_gamma)} gamma files are found")

input_file_proton = glob.glob('/fefs/aswg/workspace/2023_joint_analysis_school/RFs_and_DL2/input/input_step_3/DL2-proton/*.h5')
input_file_proton.sort()
print(f"{len(input_file_proton)} proton files are found")

# Cuts and Parameters
quality_cuts= f"(disp_diff_mean < {np.sqrt(0.05)})"
event_type="software"
dl2_weight_type="intensity"
energy_bins=np.logspace(-2,3,15)[2:]

Please check if there is at least one gamma and one proton file!

Now let's load these input files and check their contents (this can be quite heavy for your personal computer).
We will call the gamma-ray table "signal_table", and the proton table "background_table".

The function *load_mc_dl2_data_file* loads the DL2 data and calculates the average of the values reconstructed by the telescopes participating in the event.
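
As an illustration of what an intensity-weighted average means, here is a minimal NumPy sketch. It is not the `magicctapipe` implementation, which may differ in detail (for example in how energies or directions are combined); the helper `weighted_event_average` and the per-telescope numbers are hypothetical.

```python
import numpy as np

def weighted_event_average(values, weights):
    """Average the per-telescope values of one event, weighted e.g. by image intensity."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return np.sum(values * weights) / np.sum(weights)

# Hypothetical event seen by three telescopes
gammaness_per_tel = [0.90, 0.70, 0.80]
intensity_per_tel = [300.0, 100.0, 100.0]  # photoelectrons

event_gammaness = weighted_event_average(gammaness_per_tel, intensity_per_tel)
print(event_gammaness)
```

With this weighting, the telescope with the brightest image dominates the event-wise value.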

For the signal we load the events both into an "all zenith" table (`signal_table`) and into "low/middle/high zenith" tables (the latter are used to evaluate the effective area in this example).

In [None]:
print("Loading the input files...")

signal_table=[]
background_table=[]
signal_table_6_26=[]
signal_table_26_46=[]
signal_table_46_67=[]

#First we do for the gammas:
for i_file, input_file in enumerate(input_file_gamma):
    # Load the input file
    sig_table, sig_pointing, sig_sim_config =load_mc_dl2_data_file(
        input_file, quality_cuts, event_type, dl2_weight_type
    )
    
    if sig_pointing[0]<=26:
        signal_table_6_26=vstack([signal_table_6_26,sig_table])
    elif sig_pointing[0]<=46:
        signal_table_26_46=vstack([signal_table_26_46,sig_table])
    elif sig_pointing[0]<=67:
        signal_table_46_67=vstack([signal_table_46_67,sig_table])
    signal_table=vstack([signal_table,sig_table])

#And then for the protons:
for i_file, input_file in enumerate(input_file_proton):
    # Load the input file
    back_table, back_pointing, back_sim_config = load_mc_dl2_data_file(
        input_file, quality_cuts, event_type, dl2_weight_type
    )
    
    background_table=vstack([background_table,back_table])



And this is what these tables look like:

In [None]:
signal_table

In [None]:
background_table

If you take a look at the column "gammaness" in both tables, you will see that the gammaness values for the protons are smaller than those for the gamma rays. This suggests that the classification of events was good, but let's make a couple of plots for a better comparison.

### Gammaness 

Let's compare these two MC samples, i.e. gammas and protons, in terms of their distributions of gammaness. Let's start with the simulated gamma rays:

In [None]:
x=np.array(signal_table['true_energy'].value)
y=np.array(signal_table['gammaness'].value)
plt.figure()
y_space = np.linspace(0, 1, 30)
x_space = np.logspace(-2, 3, 60)
plt.xscale("log")
plt.xlabel("True energy of the simulated gamma rays [TeV]")
plt.ylabel("Gammaness")
plt.hist2d(x,y, bins=(x_space, y_space),  norm=mpl.colors.LogNorm())
plt.colorbar(label="Number of events")
plt.title("Simulated gamma rays")

The plot above looks quite good, because most of the simulated gamma rays indeed have high gammaness values. Now look at how different the gammaness distribution is for protons:

In [None]:
x=np.array(background_table['true_energy'].value)
y=np.array(background_table['gammaness'].value)
plt.figure()
y_space = np.linspace(0, 1, 30)
x_space = np.logspace(-2, 3, 60)
plt.xscale("log")
plt.xlabel("True energy of the simulated protons [TeV]")
plt.ylabel("Gammaness")
plt.hist2d(x,y, bins=(x_space, y_space),  norm=mpl.colors.LogNorm())
plt.colorbar(label="Number of events")
plt.title("Simulated protons")

This means that protons and $\gamma$ rays are quite well distinguished by our classification algorithm.
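
One simple way to put a number on this separation is to compare the fraction of each sample that survives a fixed gammaness threshold. The sketch below uses synthetic beta-distributed gammaness values, not the MC tables of this notebook; the helper `fraction_above` and the threshold of 0.5 are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic gammaness values, NOT real MC: gammas pile up near 1, protons near 0
gammaness_gamma = rng.beta(5, 1, size=10_000)
gammaness_proton = rng.beta(1, 5, size=10_000)

def fraction_above(gammaness, threshold):
    """Fraction of events whose gammaness is at or above the threshold."""
    return np.mean(np.asarray(gammaness) >= threshold)

threshold = 0.5  # illustrative fixed cut
gamma_efficiency = fraction_above(gammaness_gamma, threshold)
proton_survival = fraction_above(gammaness_proton, threshold)
print(f"gamma efficiency: {gamma_efficiency:.3f}, proton survival: {proton_survival:.3f}")
```

The same helper could be applied to `signal_table["gammaness"]` and `background_table["gammaness"]` to obtain the corresponding numbers for the real samples.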


### Migration matrix

Now let's check the quality of our energy reconstruction by plotting a migration matrix. If the reconstruction is well trained, the reconstructed energy of the $\gamma$ rays must be very similar to the true (simulated) energy of the simulated photons, i.e., the events must fall in a narrow region around reconstructed_energy = true_energy.
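
The idea of a migration matrix can be sketched numerically with `np.histogram2d`. The example below uses synthetic events with a 20% log-normal energy smearing, not the MC tables of this notebook; the quantity `diagonal_fraction` is an illustrative summary number and not something the pipeline computes.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic events, NOT real MC: log-uniform true energies, 20% log-normal smearing
true_energy = 10 ** rng.uniform(-1, 2, size=50_000)               # TeV
reco_energy = true_energy * rng.lognormal(0.0, 0.2, size=50_000)  # TeV

edges = np.logspace(-2, 3, 21)  # 20 log-spaced bins, i.e. 4 per decade
counts, _, _ = np.histogram2d(reco_energy, true_energy, bins=(edges, edges))

# Fraction of events reconstructed in the same bin as their true energy
diagonal_fraction = np.trace(counts) / counts.sum()
print(f"Fraction on the diagonal: {diagonal_fraction:.2f}")
```

The better the energy reconstruction, the more the counts concentrate on the diagonal of the matrix.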

In [None]:
x=np.array(signal_table['reco_energy'].value)
y=np.array(signal_table['true_energy'].value)
plt.figure()
y_space = np.logspace(-2, 3, 60)
x_space = np.logspace(-2, 3, 60)
plt.xscale("log")
plt.yscale("log")
plt.xlabel("Reconstructed energy (TeV)")
plt.ylabel("True energy of the simulated photon (TeV)")
plt.hist2d(x,y, bins=(x_space, y_space),  norm=mpl.colors.LogNorm())
plt.annotate("",
              xy=(0.01,0.01), xycoords='data',
              xytext=(110,110), textcoords='data',
              arrowprops=dict(arrowstyle="-",
                              connectionstyle="arc3,rad=0."), 
              )
plt.colorbar(label="Number of events")

### Check the reconstructed energy distributions

Let's check the distribution of reconstructed energies for the $\gamma$ rays and the protons.

In [None]:
energy_bins_center = (energy_bins[:-1] + energy_bins[1:]) / 2

energy_bins_width = [
    energy_bins[1:] - energy_bins_center,
    energy_bins_center - energy_bins[:-1],
]
plt.figure()
plt.xlabel("Reconstructed energy [TeV]")
plt.ylabel("Number of events")
plt.semilogx()
plt.yscale("log")
plt.grid()
E_back=np.array(background_table["reco_energy"].value)
E_sig=np.array(signal_table["reco_energy"].value)

# Plot the background energy distribution
plt.hist(
    E_back,
    bins=energy_bins,
    label="protons",
    histtype="step",
    linewidth=3,
)

# Plot the signal energy distribution
plt.hist(
    E_sig,
    bins=energy_bins,
    label="gammas",
    histtype="step",
    linewidth=3,
)

plt.legend()

### Check the gammaness distributions:

Let's do the same for the gammaness distributions: the MC gamma-ray distribution should peak at 1, while the proton distribution should peak at 0.

In [None]:
gh_bins = np.linspace(0, 1, 51)

plt.figure(dpi=70)
plt.xlabel("Gammaness")
plt.ylabel("Number of events")
plt.yscale("log")
plt.grid()
g_back=np.array(background_table["gammaness"].value)
g_sig=np.array(signal_table["gammaness"].value)

#Plot the background gammaness distribution
plt.hist(
    g_back,
    bins=gh_bins,
    label="protons (background)",
    histtype="step",
    linewidth=2,
)

# Plot the signal gammaness distribution
plt.hist(
    g_sig,
    bins=gh_bins,
    label="gammas (signal)",
    histtype="step",
    linewidth=2,
)

plt.legend(loc="upper left")


Here it is worth noting that the gammaness increases with energy for the $\gamma$ rays, while it decreases for the protons, so that discriminating between the two types of events is easier at higher energies.

Let's do the same for different energy ranges:

In [None]:
n_columns = 3
n_rows = int(np.ceil(len(energy_bins[:-1]) / n_columns))

grid = (n_rows, n_columns)
locs = list(itertools.product(range(n_rows), range(n_columns)))

plt.figure(figsize=(20, n_rows * 8))

# Loop over every energy bin
for i_bin, (eng_lolim, eng_uplim) in enumerate(zip(energy_bins[:-1], energy_bins[1:])):

    plt.subplot2grid(grid, locs[i_bin])
    plt.title(f"{eng_lolim:.3f} < energy < {eng_uplim:.3f} [TeV]", fontsize=20)
    plt.xlabel("Gammaness", fontsize=22)
    plt.yscale("log")
    plt.grid()

    # Apply the energy cuts
    cond_back_lolim = background_table["reco_energy"].value > eng_lolim
    cond_back_uplim = background_table["reco_energy"].value < eng_uplim

    cond_signal_lolim = signal_table["reco_energy"].value > eng_lolim
    cond_signal_uplim = signal_table["reco_energy"].value < eng_uplim

    condition_back = np.logical_and(cond_back_lolim, cond_back_uplim)
    condition_signal = np.logical_and(cond_signal_lolim, cond_signal_uplim)

    dt_back = background_table[condition_back]
    dt_signal = signal_table[condition_signal]

    # Plot the background gammaness distribution
    if len(dt_back) > 0:
        plt.hist(
             dt_back["gammaness"].value,
             bins=gh_bins,
             label="background",
             histtype="step",
             linewidth=2,
       )

    # Plot the signal gammaness distribution
    if len(dt_signal) > 0:
        plt.hist(
            dt_signal["gammaness"].value,
            bins=gh_bins,
            label="signal",
            histtype="step",
            linewidth=2,
        )

    plt.legend(loc="lower left")


### Apply dynamic gammaness cuts:

Now we use the function `calculate_percentile_cut()` from the module `pyirf` to evaluate dynamic (i.e. energy-dependent) gammaness cuts such that we have a gamma efficiency of 90% (i.e. 90% of the $\gamma$ rays are kept) in each energy bin. This is the simplest and most robust way to define the gammaness cuts.
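
To make the logic of a percentile cut explicit, here is a minimal NumPy sketch of the idea: in each energy bin, the cut is the value below which a given percentage of the events lie. This is an illustration under simplifying assumptions and not the `pyirf` implementation, which offers further options; the helper `percentile_cut_per_bin` and the synthetic data are hypothetical.

```python
import numpy as np

def percentile_cut_per_bin(values, bin_values, bin_edges, percentile, fill_value):
    """Percentile of `values` in each bin of `bin_values`; `fill_value` for empty bins."""
    values = np.asarray(values)
    bin_index = np.digitize(bin_values, bin_edges) - 1
    cuts = np.full(len(bin_edges) - 1, fill_value, dtype=float)
    for i in range(len(cuts)):
        in_bin = bin_index == i
        if in_bin.any():
            cuts[i] = np.percentile(values[in_bin], percentile)
    return cuts

rng = np.random.default_rng(2)
# Synthetic events, NOT real MC
reco_energy = 10 ** rng.uniform(-1, 1, size=20_000)  # TeV
gammaness = rng.beta(5, 1, size=20_000)

edges = np.logspace(-1, 1, 5)
gh_efficiency = 0.9
cuts = percentile_cut_per_bin(
    gammaness, reco_energy, edges, 100 * (1 - gh_efficiency), fill_value=0.0
)
print(cuts)
```

Keeping the events with gammaness at or above the 10th percentile retains about 90% of the gamma rays in every bin, which is why the percentile passed to the function is `100 * (1 - gh_efficiency)`.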

In [None]:
gh_efficiency = 0.9
gh_percentile = 100 * (1 - gh_efficiency)


# Calculate the dynamic gammaness cuts
gh_table_eff = calculate_percentile_cut(
    values=signal_table["gammaness"],
    bin_values=signal_table["reco_energy"],
    bins=u.Quantity(energy_bins, u.TeV),
    fill_value=0.0,
    percentile=gh_percentile,
)

gh_cuts_eff = gh_table_eff["cut"].value
print(f"Energy bins: {energy_bins}")
print(f"Efficiency gammaness cuts:\n{gh_cuts_eff}")

The plot below shows the gammaness cuts: as expected from the gammaness distributions shown above, the cut value increases with energy. The drop above 10 TeV, however, is due to low statistics.

In [None]:
plt.figure(dpi=70)
plt.xlabel("Reconstructed energy [TeV]")
plt.ylabel("Gammaness cut that saves 90% of the gamma rays")
plt.semilogx()
plt.grid()

# Plot the dynamic gammaness cuts
plt.errorbar(
    x=energy_bins_center,
    y=gh_cuts_eff,
    xerr=energy_bins_width,
    label="gamma efficiency",
    marker="o",
)

Now we apply these gammaness cuts to our datasets, such that we end up with a relatively pure sample of $\gamma$-rays.
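
Applying a binned cut means comparing each event with the cut value of the energy bin it falls into. A minimal NumPy sketch of this lookup is shown below; it is not the `pyirf` implementation, and the treatment of events outside the binning (rejected here) is an assumption of this sketch. The helper `apply_binned_cut` and the toy events are hypothetical.

```python
import operator
import numpy as np

def apply_binned_cut(values, bin_values, bin_edges, cuts, op):
    """Boolean mask: True where `op(value, cut of the event's bin)` holds.

    Events outside the binning are rejected in this sketch.
    """
    values = np.asarray(values)
    bin_index = np.digitize(bin_values, bin_edges) - 1
    in_range = (bin_index >= 0) & (bin_index < len(cuts))
    mask = np.zeros(len(values), dtype=bool)
    mask[in_range] = op(values[in_range], np.asarray(cuts)[bin_index[in_range]])
    return mask

edges = np.array([0.1, 1.0, 10.0])  # TeV, two bins
cuts = np.array([0.6, 0.8])         # one gammaness cut per bin
reco_energy = np.array([0.5, 0.5, 5.0, 5.0, 50.0])
gammaness = np.array([0.7, 0.5, 0.9, 0.7, 0.99])

mask = apply_binned_cut(gammaness, reco_energy, edges, cuts, operator.ge)
print(mask)
```

Passing the comparison as `op` lets the same function serve for gammaness cuts (`operator.ge`, keep high values) and $\theta$ cuts (`operator.le`, keep small values).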

In [None]:
print("\nApplying the gammaness cuts...")

mask_gh_eff = evaluate_binned_cut(
    values=signal_table["gammaness"],
    bin_values=signal_table["reco_energy"],
    cut_table=gh_table_eff,
    op=operator.ge,
)

data_eff_gcut = signal_table[mask_gh_eff]
print(f"--> Number of events: {len(data_eff_gcut)}")

# Apply the same cuts to the three zenith-binned samples

mask_gh_eff_26 = evaluate_binned_cut(
    values=signal_table_6_26["gammaness"],
    bin_values=signal_table_6_26["reco_energy"],
    cut_table=gh_table_eff,
    op=operator.ge,
)

data_eff_gcut_26 = signal_table_6_26[mask_gh_eff_26]
  
    
mask_gh_eff_46 = evaluate_binned_cut(
    values=signal_table_26_46["gammaness"],
    bin_values=signal_table_26_46["reco_energy"],
    cut_table=gh_table_eff,
    op=operator.ge,
)

data_eff_gcut_46 = signal_table_26_46[mask_gh_eff_46]

mask_gh_eff_67 = evaluate_binned_cut(
    values=signal_table_46_67["gammaness"],
    bin_values=signal_table_46_67["reco_energy"],
    cut_table=gh_table_eff,
    op=operator.ge,
)

data_eff_gcut_67 = signal_table_46_67[mask_gh_eff_67]

### Check the quality of the reconstructed direction:

We look at the $\theta^2$ distribution. Don't forget that $\theta$ is the offset angle between the real position of the target and the reconstructed position for a single event, as in the figure below:

![theta](./figures/theta.png)
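
For reference, the offset angle between two directions on the sky can be computed with the haversine formula. The sketch below is a plain NumPy illustration, not the code used by the pipeline to fill the `theta` column; the helper `angular_separation_deg` and the example coordinates are hypothetical.

```python
import numpy as np

def angular_separation_deg(lon1, lat1, lon2, lat2):
    """Great-circle separation in degrees between two directions given in degrees."""
    lon1, lat1, lon2, lat2 = map(np.radians, (lon1, lat1, lon2, lat2))
    # Haversine formula, numerically stable for small angles
    a = (
        np.sin((lat2 - lat1) / 2) ** 2
        + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2
    )
    return np.degrees(2 * np.arcsin(np.sqrt(a)))

# Hypothetical true source position and two reconstructed directions (az, alt in deg)
true_az, true_alt = 180.0, 70.0
reco_az = np.array([180.0, 180.3])
reco_alt = np.array([70.1, 70.0])

theta = angular_separation_deg(true_az, true_alt, reco_az, reco_alt)
print(theta, theta**2)
```

Note that the same azimuth difference corresponds to a smaller angle on the sky at high altitude, which is why $\theta$ is not simply the difference of the coordinates.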


In [None]:
theta2_bins = np.linspace(0, 0.5, 91)

plt.figure(dpi=70)
plt.title(f"g/h efficiency = {gh_efficiency}")
plt.xlabel(r"$\theta^2$ [deg$^2$]")
plt.ylabel("Number of events")
plt.yscale("log")
plt.grid()

eff_gcut=np.array(data_eff_gcut["theta"].value)

# Plot the signal theta2 distribution
plt.hist(
    np.square(eff_gcut),
    bins=theta2_bins,
    label="signal",
    histtype="step",
    linewidth=2,
)

plt.legend()

Let's do the same for different energy bins. We see that the higher the energy, the better the direction reconstruction.

In [None]:
n_columns = 3
n_rows = int(np.ceil(len(energy_bins[:-1]) / n_columns))

grid = (n_rows, n_columns)
locs = list(itertools.product(range(n_rows), range(n_columns)))

plt.figure(figsize=(20, n_rows * 8))

# Loop over every energy bin
for i_bin, (eng_lolim, eng_uplim) in enumerate(zip(energy_bins[:-1], energy_bins[1:])):

    plt.subplot2grid(grid, locs[i_bin])
    plt.title(f"{eng_lolim:.3f} < energy < {eng_uplim:.3f} [TeV]")
    plt.xlabel("Theta2 [deg$^2$]")
    plt.yscale("log")
    plt.grid()

    # Apply the energy cuts
    cond_eff_lolim = data_eff_gcut["reco_energy"].value > eng_lolim
    cond_eff_uplim = data_eff_gcut["reco_energy"].value < eng_uplim

    
    
    condition_eff = np.logical_and(cond_eff_lolim, cond_eff_uplim)

    
    dt_eff = data_eff_gcut[condition_eff]
    dte=np.array(dt_eff["theta"].value)
    
    # Plot the theta2 distribution
    if len(dt_eff) > 0:
        plt.hist(
            np.square(dte),
            bins=theta2_bins,
            label="signal",
            histtype="step",
            linewidth=2,
        )

    plt.legend(loc="upper right")


### Check the angular resolution:

We now use the `angular_resolution()` function from `pyirf` to evaluate the angular resolution, which is defined as the $\theta$ value which encloses 68% of the events. 
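
The 68% containment definition can be checked on a toy point spread function. The sketch below draws synthetic offsets from a 2D Gaussian (not from the MC of this notebook) and compares the 68th percentile of $\theta$ with the analytic containment radius of such a distribution; all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic PSF, NOT real MC: 2D Gaussian with sigma = 0.1 deg per axis
sigma = 0.1
dx = rng.normal(0.0, sigma, size=200_000)
dy = rng.normal(0.0, sigma, size=200_000)
theta = np.hypot(dx, dy)

# Angular resolution as the 68% containment radius
r68 = np.percentile(theta, 68)

# Analytic value for a 2D Gaussian: sigma * sqrt(-2 ln(1 - 0.68))
r68_expected = sigma * np.sqrt(-2 * np.log(1 - 0.68))
print(f"r68 = {r68:.4f} deg, expected = {r68_expected:.4f} deg")
```

For a 2D Gaussian the 68% containment radius is about 1.5 times the per-axis sigma, so the two quantities should not be confused when comparing resolutions.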

In [None]:
# Calculate the angular resolution
angres_table_eff = angular_resolution(
    data_eff_gcut, u.Quantity(energy_bins, u.TeV), energy_type="reco"
)

angres_eff = angres_table_eff["angular_resolution"].value
print(f"\n angular resolution:\n{angres_eff} deg")

In [None]:
plt.figure(dpi=70)
gs = gridspec.GridSpec(4, 1)

plt.title(f"Angular resolution (g/h efficiency = {100*gh_efficiency}%)")
plt.ylabel("Angular resolution (68% cont.) [deg]")
plt.xlabel("Energy [TeV]")
plt.semilogx()
plt.grid()

# Plot the angular resolution
plt.errorbar(
    x=energy_bins_center,
    y=angres_eff,
    xerr=energy_bins_width,
    label="signal",
    marker="o",
)


plt.legend()

### Applying dynamic $\theta$ cuts to the data

Similar to what we did for the dynamic gammaness cuts above, here we find the value of $\theta$ that contains 68% (or 80%, for comparison) of the photons in each energy bin, which we call the dynamic $\theta$ cut.

In [None]:
theta_efficiency = 0.8
theta_efficiency_68=0.68

theta_percentile = 100 * theta_efficiency
theta_percentile_68 = 100 * theta_efficiency_68


# Calculate the dynamic theta cuts
theta_table_eff = calculate_percentile_cut(
    values=data_eff_gcut["theta"],
    bin_values=data_eff_gcut["reco_energy"],
    bins=u.Quantity(energy_bins, u.TeV),
    fill_value=data_eff_gcut["theta"].unmasked.max(),
    percentile=theta_percentile,
)
theta_table_eff_68= calculate_percentile_cut(
    values=data_eff_gcut["theta"],
    bin_values=data_eff_gcut["reco_energy"],
    bins=u.Quantity(energy_bins, u.TeV),
    fill_value=data_eff_gcut["theta"].unmasked.max(),
    percentile=theta_percentile_68,
)

theta_cuts_eff = theta_table_eff["cut"]
theta_cuts_eff_68 = theta_table_eff_68["cut"]

theta_cut_eff = theta_table_eff["cut"].value
theta_cut_eff_68 = theta_table_eff_68["cut"].value

plt.figure(dpi=70)
plt.title(f"theta cuts (g/h eff. = {gh_efficiency})")
plt.xlabel("Reconstructed energy [TeV]")
plt.ylabel(r"Cut in $\theta$ [deg]")
plt.semilogx()
plt.grid()

# Plot the 68% dynamic theta cuts
plt.errorbar(
    x=energy_bins_center,
    y=theta_cuts_eff_68,
    xerr=energy_bins_width,
    label="68%",
    marker="o",
)

# Plot the 80% dynamic theta cuts
plt.errorbar(
    x=energy_bins_center,
    y=theta_cuts_eff,
    xerr=energy_bins_width,
    label="80%",
    marker="o",
)

plt.ylim(0.05,0.23)
plt.legend(loc="upper right")

We now apply another filter to our data by selecting only the $\gamma$ rays lying within the 80% $\theta$ cuts (i.e. a bit larger than the angular resolution). Remember that these cuts are listed in the table `theta_table_eff` computed above.

In [None]:
# Apply dynamic theta cuts
print("\nApplying the theta cuts to signal...")

mask_theta_eff = evaluate_binned_cut(
    values=data_eff_gcut["theta"],
    bin_values=data_eff_gcut["reco_energy"],
    cut_table=theta_table_eff,
    op=operator.le,
)

data_eff_gtcuts = data_eff_gcut[mask_theta_eff]
print(f"--> Number of events: {len(data_eff_gtcuts)}")

In [None]:
# Apply dynamic theta cuts
print("\nApplying the theta cuts to signal...")

mask_theta_eff_26 = evaluate_binned_cut(
    values=data_eff_gcut_26["theta"],
    bin_values=data_eff_gcut_26["reco_energy"],
    cut_table=theta_table_eff,
    op=operator.le,
)

data_eff_gtcuts_26 = data_eff_gcut_26[mask_theta_eff_26]
print(f"--> Number of events: {len(data_eff_gtcuts_26)}")

In [None]:
# Apply dynamic theta cuts
print("\nApplying the theta cuts to signal...")

mask_theta_eff_46 = evaluate_binned_cut(
    values=data_eff_gcut_46["theta"],
    bin_values=data_eff_gcut_46["reco_energy"],
    cut_table=theta_table_eff,
    op=operator.le,
)

data_eff_gtcuts_46 = data_eff_gcut_46[mask_theta_eff_46]
print(f"--> Number of events: {len(data_eff_gtcuts_46)}")

In [None]:
# Apply dynamic theta cuts
print("\nApplying the theta cuts to signal...")

mask_theta_eff_67 = evaluate_binned_cut(
    values=data_eff_gcut_67["theta"],
    bin_values=data_eff_gcut_67["reco_energy"],
    cut_table=theta_table_eff,
    op=operator.le,
)

data_eff_gtcuts_67 = data_eff_gcut_67[mask_theta_eff_67]
print(f"--> Number of events: {len(data_eff_gtcuts_67)}")

### Check the effective area

The effective area depends on the zenith and azimuth angles. Below we compute the effective area as a function of true energy for the three zenith ranges defined in the section **Load MC DL2 data files and set some options** and filtered in the section **Applying dynamic $\theta$ cuts to the data**.

Given the dependency on azimuth, we must provide a narrow range of azimuths in an interval no larger than ~$30^{\circ}$.

The effective areas are calculated using the `pyirf` function `effective_area_per_energy`.
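
Conceptually, the effective area in an energy bin is the fraction of simulated showers that survive all cuts, multiplied by the area over which the showers were thrown. The sketch below illustrates this for point-like gamma rays simulated with a power-law spectrum; it is a simplified illustration rather than the `pyirf` code, and the helper names, simulation settings and event counts are all hypothetical.

```python
import numpy as np

def simulated_events_per_bin(n_showers, e_min, e_max, spectral_index, bin_edges):
    """Simulated showers per bin for dN/dE ~ E**spectral_index (index != -1)."""
    g = spectral_index + 1.0
    norm = n_showers / (e_max**g - e_min**g)
    lo = np.clip(bin_edges[:-1], e_min, e_max)
    hi = np.clip(bin_edges[1:], e_min, e_max)
    return norm * (hi**g - lo**g)

def effective_area(n_selected, n_simulated, max_impact):
    """Effective area = selected / simulated fraction times the thrown area."""
    thrown_area = np.pi * max_impact**2
    ratio = np.divide(
        n_selected, n_simulated,
        out=np.zeros_like(n_simulated, dtype=float),
        where=n_simulated > 0,
    )
    return ratio * thrown_area

# Hypothetical simulation settings and selected-event counts
edges = np.logspace(-1, 2, 4)  # TeV, three bins
n_sim = simulated_events_per_bin(1e6, 0.1, 100.0, -2.0, edges)
n_sel = np.array([1_000.0, 5_000.0, 2_000.0])
aeff = effective_area(n_sel, n_sim, max_impact=500.0)  # impact radius in m
print(aeff)
```

Because the number of simulated showers enters the denominator, the simulation information passed to the function has to describe the same set of files as the selected events.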

In [None]:
# Calculate the low zenith effective area
aeff_eff_26 = effective_area_per_energy(
    selected_events=data_eff_gtcuts_26,
    simulation_info=sig_sim_config,
    true_energy_bins=u.Quantity(energy_bins, u.TeV),
)

# Calculate the mid zenith effective area
aeff_eff_46 = effective_area_per_energy(
    selected_events=data_eff_gtcuts_46,
    simulation_info=sig_sim_config,
    true_energy_bins=u.Quantity(energy_bins, u.TeV),
)

# Calculate the high zenith effective area
aeff_eff_67 = effective_area_per_energy(
    selected_events=data_eff_gtcuts_67,
    simulation_info=sig_sim_config,
    true_energy_bins=u.Quantity(energy_bins, u.TeV),
)


Below we plot the effective areas as a function of the true energy. The samples divided into zenith angle bins include events from different azimuths within a ~$30^{\circ}$ range.

In [None]:
plt.figure()
plt.title(f"g/h eff. = {100*gh_efficiency}%, theta eff. = {100*theta_efficiency}%")
plt.xlabel("True energy [TeV]")
plt.ylabel("Effective area [m$^2$]")
plt.loglog()
plt.grid()


# Plot the effective area
plt.errorbar(
    x=energy_bins_center,
    y=aeff_eff_26.value,
    xerr=energy_bins_width,
    label=r"zenith < 26$^{\circ}$",
    marker="o",
)


# Plot the effective area
plt.errorbar(
    x=energy_bins_center,
    y=aeff_eff_46.value,
    xerr=energy_bins_width,
    label=r"26$^{\circ}$ < zenith < 46$^{\circ}$",
    marker="o",
)

# Plot the effective area
plt.errorbar(
    x=energy_bins_center,
    y=aeff_eff_67.value,
    xerr=energy_bins_width,
    label=r"46$^{\circ}$ < zenith < 67$^{\circ}$",
    marker="o",
)

plt.legend(loc="lower right")


We see that, as expected, at high energies the effective area increases with zenith angle, while the energy threshold also increases. This happens because, at high zenith angles, the Cherenkov light pool on the ground becomes very large. Illustration from (Aharonian & Casanova, 2018).


![pool](./figures/pool.png)

### Check the energy bias and energy resolution

The energy bias and resolution are computed from the $(E_{rec} - E_{true})/E_{true}$ distribution by the `pyirf` function `energy_bias_resolution`.
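
As a rough illustration of these two quantities, the sketch below takes the bias as the median of the relative error and the resolution as half the width of its central 68% interval. This is one common convention and an assumption of this sketch; the exact definitions used by `pyirf` may differ between versions. The helper `energy_bias_and_resolution` and the synthetic events are hypothetical.

```python
import numpy as np

def energy_bias_and_resolution(reco_energy, true_energy):
    """Bias as the median relative error, resolution as half the 16%-84% spread."""
    reco_energy = np.asarray(reco_energy, dtype=float)
    true_energy = np.asarray(true_energy, dtype=float)
    rel_error = (reco_energy - true_energy) / true_energy
    bias = np.median(rel_error)
    q16, q84 = np.percentile(rel_error, [16, 84])
    return bias, 0.5 * (q84 - q16)

rng = np.random.default_rng(4)
# Synthetic events, NOT real MC: 5% bias and 15% Gaussian smearing
true_energy = 10 ** rng.uniform(-1, 1, size=100_000)  # TeV
reco_energy = true_energy * (1 + rng.normal(0.05, 0.15, size=100_000))

bias, resolution = energy_bias_and_resolution(reco_energy, true_energy)
print(f"bias = {bias:.3f}, resolution = {resolution:.3f}")
```

In the notebook the same quantities are evaluated separately in each energy bin, which is what makes the plots below energy dependent.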

In [None]:
# Calculate the energy bias and resolution
eneres_table_eff = energy_bias_resolution(
    data_eff_gtcuts, u.Quantity(energy_bins, u.TeV), energy_type="reco"
)

enebias_eff = eneres_table_eff["bias"].value
eneres_eff = eneres_table_eff["resolution"].value

plt.figure()
gs = gridspec.GridSpec(4, 1)

plt.title(f"Energy bias and energy resolution (g/h eff. = {100*gh_efficiency}%, theta eff. = {100*theta_efficiency}%)")
plt.ylabel("Energy bias and resolution")
plt.xlabel("Reconstructed energy [TeV]")

plt.semilogx()
plt.grid()

# Plot the energy bias and resolution
plt.errorbar(
    x=energy_bins_center,
    y=eneres_eff,
    xerr=energy_bins_width,
    label="Energy resolution",
    marker="o",
    color=colors[1],
)

plt.errorbar(
    x=energy_bins_center,
    y=enebias_eff,
    xerr=energy_bins_width,
    label="Energy bias",
    marker="o",
    linestyle="--",
    color=colors[1],
)

plt.legend()

### Effects on background and signal

To finish this class, let's make some diagnostic plots to see the effects of the cuts on the background and signal distributions. Below we plot the distribution of events as a function of reconstructed energy before any cut, after the 90%-efficiency gammaness cut, and after additionally applying the 68% or 80% $\theta$ cut. We see that the cuts remove most of the background while most of the signal survives.
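
The effect of a cut can also be summarized as the fraction of events surviving in each energy bin. The sketch below computes this from two lists of reconstructed energies, before and after a cut; the helper `surviving_fraction` and the toy energies are hypothetical and only meant to illustrate the bookkeeping.

```python
import numpy as np

def surviving_fraction(energy_before, energy_after, bin_edges):
    """Per-bin fraction of events left after a cut (NaN where a bin was empty)."""
    n_before, _ = np.histogram(energy_before, bins=bin_edges)
    n_after, _ = np.histogram(energy_after, bins=bin_edges)
    return np.divide(
        n_after, n_before,
        out=np.full(len(n_before), np.nan),
        where=n_before > 0,
    )

edges = np.array([0.1, 1.0, 10.0, 100.0])  # TeV
# Hypothetical reconstructed energies before and after the cuts
before = np.array([0.2, 0.3, 0.5, 0.8, 2.0, 3.0, 5.0, 7.0])
after = np.array([0.3, 2.0, 3.0, 5.0])

fraction = surviving_fraction(before, after, edges)
print(fraction)
```

Applied to the background and signal samples, a small surviving fraction for protons together with a large one for gamma rays indicates that the cuts are doing their job.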

In [None]:
#gammaness cut
back_gn_cut_eff=QTable()
for i_bin, (eng_lo, eng_hi) in enumerate(zip(energy_bins[:-1], energy_bins[1:])):
    
    mask_table_eff=(background_table["reco_energy"].value >np.repeat(eng_lo,(len(background_table)))) \
        & (background_table["reco_energy"].value <np.repeat(eng_hi,(len(background_table)))) \
            & (background_table["gammaness"].value>np.repeat(gh_table_eff['cut'][i_bin],(len(background_table))))
    
    back_masked_gn_eff=background_table[mask_table_eff]
    back_gn_cut_eff=vstack([back_masked_gn_eff,back_gn_cut_eff])
    

#theta 68%,eff: adding the theta68 cut after the gammaness cut
back_theta_cut_eff_68=QTable()
for i_bin, (eng_lo, eng_hi) in enumerate(zip(energy_bins[:-1], energy_bins[1:])):
    
    mask_table_theta_eff_68=(back_gn_cut_eff["reco_energy"].value >np.repeat(eng_lo,(len(back_gn_cut_eff)))) \
        & (back_gn_cut_eff["reco_energy"].value <np.repeat(eng_hi,(len(back_gn_cut_eff)))) \
            & (back_gn_cut_eff["theta"].value < np.repeat(theta_cut_eff_68[i_bin],(len(back_gn_cut_eff))))
    
    back_masked_theta_eff_68=back_gn_cut_eff[mask_table_theta_eff_68]
    back_theta_cut_eff_68=vstack([back_masked_theta_eff_68,back_theta_cut_eff_68])
    
#theta 80%,eff: adding the theta80 cut after the gammaness cut
back_theta_cut_eff_80=QTable()
for i_bin, (eng_lo, eng_hi) in enumerate(zip(energy_bins[:-1], energy_bins[1:])):
    
    mask_table_theta_eff_80=(back_gn_cut_eff["reco_energy"].value >np.repeat(eng_lo,(len(back_gn_cut_eff)))) \
        & (back_gn_cut_eff["reco_energy"].value <np.repeat(eng_hi,(len(back_gn_cut_eff)))) \
            & (back_gn_cut_eff["theta"].value < np.repeat(theta_cut_eff[i_bin],(len(back_gn_cut_eff))))
    back_masked_theta_eff_80=back_gn_cut_eff[mask_table_theta_eff_80]
    back_theta_cut_eff_80=vstack([back_masked_theta_eff_80,back_theta_cut_eff_80])

#making the plots
plt.figure(dpi=70)
plt.title("background")
plt.xlabel("Reconstructed energy [TeV]")
plt.ylabel("Number of events")
plt.semilogx()
plt.yscale("log")
plt.grid()

theta_80_eff=np.array(back_theta_cut_eff_80["reco_energy"].value)
theta_68_eff=np.array(back_theta_cut_eff_68["reco_energy"].value)
gn_eff=np.array(back_gn_cut_eff["reco_energy"].value)
tot_back=np.array(background_table["reco_energy"].value)

#before cuts
plt.hist(tot_back, bins=energy_bins, label="total", histtype="step", linewidth=3, color="b")

#gammaness cut
plt.hist(gn_eff, bins=energy_bins, label="eff_cut", histtype="step", linewidth=3, color="y")

#gammaness + theta 80%, eff
plt.hist(theta_80_eff, bins=energy_bins, label="eff_cut; theta_80", histtype="step", linewidth=3, color="c")

#gammaness + theta 68%, eff
plt.hist(theta_68_eff, bins=energy_bins, label="eff_cut; theta_68", histtype="step", linewidth=3, color="k")

plt.legend()

Now the same for the signal:

In [None]:
#gammaness cut
sig_gn_cut_eff=QTable()
for i_bin, (eng_lo, eng_hi) in enumerate(zip(energy_bins[:-1], energy_bins[1:])):
    mask_E=np.logical_and((signal_table["reco_energy"].value >np.repeat(eng_lo,(len(signal_table)))), \
        (signal_table["reco_energy"].value <np.repeat(eng_hi,(len(signal_table)))))
    mask_table_eff_sig=np.logical_and(mask_E,(signal_table["gammaness"].value>np.repeat(gh_table_eff['cut'][i_bin],(len(signal_table)))))
    
    sig_masked_gn_eff=signal_table[mask_table_eff_sig]
    sig_gn_cut_eff=vstack([sig_masked_gn_eff,sig_gn_cut_eff])
    
    
#theta 68%,eff adding the theta68 cut after the gammaness cut
sig_theta_cut_eff_68=QTable()
for i_bin, (eng_lo, eng_hi) in enumerate(zip(energy_bins[:-1], energy_bins[1:])):
    mask_table_theta_eff_68_sig=(sig_gn_cut_eff["reco_energy"].value >np.repeat(eng_lo,(len(sig_gn_cut_eff)))) \
        & (sig_gn_cut_eff["reco_energy"].value <np.repeat(eng_hi,(len(sig_gn_cut_eff)))) \
            & (sig_gn_cut_eff["theta"].value<np.repeat(theta_cut_eff_68[i_bin],(len(sig_gn_cut_eff))))
   
    sig_masked_theta_eff_68=sig_gn_cut_eff[mask_table_theta_eff_68_sig]
    sig_theta_cut_eff_68=vstack([sig_masked_theta_eff_68,sig_theta_cut_eff_68])

    
#theta 80%,eff adding the theta80 cut after the gammaness cut
sig_theta_cut_eff_80=QTable()
for i_bin, (eng_lo, eng_hi) in enumerate(zip(energy_bins[:-1], energy_bins[1:])):
    
    mask_table_theta_eff_80_sig=(sig_gn_cut_eff["reco_energy"].value >np.repeat(eng_lo,(len(sig_gn_cut_eff)))) \
        & (sig_gn_cut_eff["reco_energy"].value <np.repeat(eng_hi,(len(sig_gn_cut_eff)))) \
            & (sig_gn_cut_eff["theta"].value<np.repeat(theta_cut_eff[i_bin],(len(sig_gn_cut_eff))))
    sig_masked_theta_eff_80=sig_gn_cut_eff[mask_table_theta_eff_80_sig]
    sig_theta_cut_eff_80=vstack([sig_masked_theta_eff_80,sig_theta_cut_eff_80])

#making the plots    
plt.figure(dpi=70)
plt.title("signal")
plt.xlabel("Reconstructed energy [TeV]")
plt.ylabel("Number of events")
plt.semilogx()
plt.yscale("log")
plt.grid()

theta_80_eff_sig=np.array(sig_theta_cut_eff_80["reco_energy"].value)
theta_68_eff_sig=np.array(sig_theta_cut_eff_68["reco_energy"].value)
gn_eff_sig=np.array(sig_gn_cut_eff["reco_energy"].value)
sE=np.array(signal_table["reco_energy"].value)

# before cuts
plt.hist(sE, bins=energy_bins, label="total", histtype="step", linewidth=3, color="b")

#gammaness cut
plt.hist(gn_eff_sig, bins=energy_bins, label="eff_cut", histtype="step", linewidth=3, color="y")

#theta 80%, eff and gammaness cut
plt.hist(theta_80_eff_sig, bins=energy_bins, label="eff_cut; theta_80", histtype="step", linewidth=3, color="c")

#theta 68%, eff and gammaness cut
plt.hist(theta_68_eff_sig, bins=energy_bins, label="eff_cut; theta_68", histtype="step", linewidth=3, color="k")

plt.legend()