# Calibration

**Author(s):**
 - Dr. Michele Peresano (CEA-Saclay/IRFU/DAp/LEPCHE), 2020

**Description:**

This notebook contains DL1-calibration plots and benchmark proposals for the _protopipe_ pipeline.  
It was mainly motivated by the step-by-step comparison against _CTA-MARS_, but it can be extended to other pipelines as well.  
**NOTE:** Let's try to follow [this](https://www.overleaf.com/16933164ghbhvjtchknf) document, either by adding the benchmarks it lists or by proposing new ones.

**Requirements:**

To run this notebook you will need an _images.h5_ file, which can be generated using _write_dl1.py_.  
The reference simtel file, plots, values and settings can be found [here (please always refer to the latest version)](https://forge.in2p3.fr/projects/benchmarks-reference-analysis/wiki/Comparisons_between_pipelines) until a more automated approach is available (i.e. [cta-benchmarks](https://github.com/cta-observatory/cta-benchmarks)+[ctaplot](https://github.com/cta-observatory/ctaplot)).  

The data format required to run the notebook is the one currently used by _protopipe_. Later on it will be the same as in _ctapipe_.  
**WARNING:** Mono-telescope images (2 triggers - 1 image or 1 trigger - 1 image) are not currently taken into account by the publicly available development version (the new DL1 script will include them); until then, expect somewhat lower statistics.

**Development and testing:**  

For the moment this notebook works only on files produced from LSTCam + NectarCam telescope configurations.  
As with any other part of _protopipe_, and being part of the official repository, this notebook can be further developed by any interested contributor.  
The execution of this notebook is not currently automatic; it must be done locally by the user, preferably _before_ opening a pull request.

**TODO:**  
* R1-data level --> pedestal vs pix_id
* R1-data level --> dc_to_phe vs pix_id

## Imports

In [None]:
import os
from pathlib import Path

import numpy as np
from scipy.stats import binned_statistic, binned_statistic_2d, cumfreq, percentileofscore
import tables

%matplotlib inline
import matplotlib.pyplot as plt
import matplotlib.colors as colors
from matplotlib.colors import LogNorm
import matplotlib.ticker as ticker

## Functions

### Spectral weight from requirement B-TEL-1010 "Intensity Resolution" 

In [None]:
def apply_weight_BTEL1010(tel_data):
    """Apply the weight of requirement B-TEL-1010 "Intensity Resolution" to the reconstructed images.

    Each event is re-weighted from the simulated spectral slope to the target one,
    and its weight is repeated once per pixel, matching the order of the raveled images.
    """
    target_slope = -2.62 # spectral slope required by the B-TEL-1010 "Intensity Resolution" document
    spec_slope = -2.0 # spectral slope of the simtel files
    n_pixels = 1855 # number of pixels of both LSTCam and NectarCam
    energies = tel_data.col("mc_energy")*1.e3 # TeV -> GeV
    # each pixel of an image gets the weight of its event (reference energy: 200 GeV)
    weights = np.repeat(np.power(energies/200., target_slope - spec_slope), n_pixels)
    return weights
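
A minimal, self-contained sketch of the re-weighting above, on made-up energies (the values and the 4-pixel "camera" are hypothetical; the real cameras have 1855 pixels). It shows that each event weight is $(E/200\,\mathrm{GeV})^{-2.62-(-2.0)}$ and that `np.repeat` lays the weights out in the same event-major order as `ravel()` on an (events, pixels) image array, so weights and pixel values stay aligned.

```python
import numpy as np

TARGET_SLOPE = -2.62  # spectral slope from requirement B-TEL-1010
SIM_SLOPE = -2.0      # spectral slope of the simulated (simtel) spectrum
E_REF_GEV = 200.0     # reference energy at which the weight equals 1

def event_weights(energies_gev):
    """Per-event weight turning an E^SIM_SLOPE spectrum into an E^TARGET_SLOPE one."""
    return np.power(energies_gev / E_REF_GEV, TARGET_SLOPE - SIM_SLOPE)

# hypothetical toy data: 3 events with a 4-pixel "camera"
energies_tev = np.array([0.05, 0.2, 2.0])
images = np.arange(12.0).reshape(3, 4)  # shape (n_events, n_pixels)

w_event = event_weights(energies_tev * 1.e3)
# one copy of each event weight per pixel, in the same order as images.ravel()
w_pixel = np.repeat(w_event, images.shape[1])

print(w_event)
print(w_pixel.shape)  # (12,)
```

Since the exponent is negative (-0.62), events below 200 GeV are up-weighted and those above are down-weighted, which steepens the effective spectrum from $E^{-2}$ to $E^{-2.62}$.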

## Load the base data file or reset it if overwritten

In [None]:
def load_reset_images(indir = "./", fileName = "images.h5", config="test"):
    """(Re)load the file containing the images and extract the data per telescope type."""
    # load DL1 images
    data = tables.open_file(f"{indir}/{fileName}")
    data_LST = data.get_node("/images_LSTCam")
    data_MST = data.get_node("/images_NectarCam")
    suffix = config # all generated plots will have this as a suffix in their name
    return data_LST, data_MST, suffix

## Prepare data

First we check whether the _plots_calibration_ folder already exists.  
If not, we create it.

In [None]:
Path("./plots_calibration").mkdir(parents=True, exist_ok=True)

### Load

In [None]:
data_LST, data_MST, config = load_reset_images()

### Setup

In [None]:
# LSTCam
mc_lst = data_LST.col("mc_phe_image").ravel()
dl1_lst = data_LST.col("dl1_phe_image").ravel()
weights_lst = apply_weight_BTEL1010(data_LST)
# NectarCam
mc_mst = data_MST.col("mc_phe_image").ravel()
dl1_mst = data_MST.col("dl1_phe_image").ravel()
weights_mst = apply_weight_BTEL1010(data_MST)
# Group
mc_all = [mc_lst, mc_mst]
reco_all = [dl1_lst, dl1_mst]
weights_all = [weights_lst, weights_mst]

In [None]:
# keep only pixels with positive true and reconstructed number of photoelectrons (needed for the log-log plots)
good_values_mst = np.where((mc_mst>0) & (dl1_mst>0))
good_values_lst = np.where((mc_lst>0) & (dl1_lst>0))
# group by camera
mc = [mc_lst[good_values_lst], mc_mst[good_values_mst]]
reco = [dl1_lst[good_values_lst], dl1_mst[good_values_mst]]
# apply the same filter to the weights
weights = [weights_lst[good_values_lst], weights_mst[good_values_mst]]

In [None]:
print("Total number of pixel-wise values in the input file, without cuts:")
print(f"LSTCam = {len(mc_all[0])}")
print(f"NectarCam = {len(mc_all[1])}")
print("After keeping only pixels with positive true and reconstructed #p.e., this reduces to:")
print(f"LSTCam = {len(mc[0])}")
print(f"NectarCam = {len(mc[1])}")
print("'pixel-wise values' means #pixels * #cameras * #events")
print("In this phase all single-telescope images are considered.")
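
All the counts above refer to flattened arrays with one entry per pixel per image. The toy sketch below (hypothetical values, a 3-pixel camera) illustrates how a single boolean mask, built once and applied to the true charges, the reconstructed charges and the weights, keeps the three arrays aligned entry by entry; the `np.where(...)` index tuples used in the notebook play the same role.

```python
import numpy as np

# hypothetical toy sample: 2 images of a 3-pixel camera
mc_images = np.array([[0., 12., 3.], [40., 0., 1.]])               # true p.e.
reco_images = np.array([[0.4, 11.2, -0.3], [38.9, -0.8, 1.6]])     # reconstructed p.e.
weights_evt = np.array([2.0, 0.5])                                 # one weight per image

mc_flat = mc_images.ravel()
reco_flat = reco_images.ravel()
w_flat = np.repeat(weights_evt, mc_images.shape[1])  # same event-major order as ravel()

# number of pixel-wise values = n_images * n_pixels
n_values = mc_images.size

# same selection as for the log-log plot: both charges strictly positive
keep = (mc_flat > 0) & (reco_flat > 0)
mc_sel, reco_sel, w_sel = mc_flat[keep], reco_flat[keep], w_flat[keep]
print(n_values, keep.sum())
```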

## Comparison plots

### Calibration

#### Correlation between the reconstructed and true number of photoelectrons

In [None]:
nbins_x = 160
nbins_y = 320
cameras = ["LSTCam", "NectarCam"]

for camera_index in range(len(cameras)):
    fig = plt.figure(figsize=(6, 5), tight_layout=False)
    plt.xlabel("log10(true #p.e)")
    plt.ylabel("log10(reco #p.e)")
    
    # Unweighted histogram, used only to count the actual number of pixel-wise values in the plot
    # (the weighted histogram below has sums of weights, not counts, as bin contents)
    h_no_weights = np.histogram2d(np.log10(mc[camera_index]), np.log10(reco[camera_index]),
                                  bins=[nbins_x, nbins_y],
                                  range=[[0,4.2],[-4,4]],
                                 )
    h = plt.hist2d(np.log10(mc[camera_index]), np.log10(reco[camera_index]),
                   bins=[nbins_x, nbins_y],
                   range=[[0,4.2],[-4,4]],
                   norm=LogNorm(),
                   cmap=plt.cm.rainbow,
                   weights=weights[camera_index],
                  )
    
    plt.plot([0, 4], [0, 4], color="black") # line showing perfect correlation
    plt.minorticks_on()
    plt.xticks(ticks=np.arange(-1, 5, 0.5), labels=["",""]+[str(i) for i in np.arange(0, 5, 0.5)])
    plt.xlim(-0.2,4.2)
    plt.colorbar(h[3], ax=plt.gca())
    plt.grid()
    
    fig.savefig(f"./plots_calibration/recoPhesVsTruePhes_{cameras[camera_index]}_protopipe_{config}.png")
    
    # Print some debug/benchmarking information
    print(f"Total number of pixel-wise values in the plot of {cameras[camera_index]} (before re-weighting) = {int(h_no_weights[0].sum())}")
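
The same 2D histogram can be computed without drawing it, using `np.histogram2d`. This sketch on synthetic data (the log-uniform true charges, the 20% log-normal scatter and the random weights are assumptions, not simulation output) shows that the unweighted histogram sums to the number of entries inside the plotted range, while the weighted one sums to their total weight. This is why the raw count has to come from an unweighted histogram.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000
true_pe = 10 ** rng.uniform(0.0, 4.0, n)           # toy true charges, 1 to 10^4 p.e.
reco_pe = true_pe * rng.lognormal(0.0, 0.2, n)     # toy reconstruction with log-normal scatter
w = rng.uniform(0.1, 3.0, n)                       # toy spectral weights

xy_range = [[0, 4.2], [-4, 4]]
x, y = np.log10(true_pe), np.log10(reco_pe)
counts, _, _ = np.histogram2d(x, y, bins=[160, 320], range=xy_range)
weighted, _, _ = np.histogram2d(x, y, bins=[160, 320], range=xy_range, weights=w)

# entries falling inside the plotted range (both edges inclusive, as in np.histogram2d)
in_range = (x >= 0) & (x <= 4.2) & (y >= -4) & (y <= 4)
print(counts.sum(), in_range.sum())
print(weighted.sum(), w[in_range].sum())
```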

#### Charge resolution

In [None]:
# First restore the negative reconstructed values: we now take ratios instead of logarithms,
# so only the true number of photoelectrons (the denominator) must be positive
good_values_lst = np.where(mc_all[0]>0)
good_values_mst = np.where(mc_all[1]>0)
# group by camera
mc = [mc_all[0][good_values_lst], mc_all[1][good_values_mst]]
reco = [reco_all[0][good_values_lst], reco_all[1][good_values_mst]]
weights = [weights_all[0][good_values_lst], weights_all[1][good_values_mst]]

In [None]:
nbins_x = 160
nbins_y = 320
cameras = ["LSTCam", "NectarCam"]

for camera_index in range(len(cameras)):
    
    fig = plt.figure(figsize=(6, 5), tight_layout=False)
    
    plt.xlabel("log10(true #p.e)")
    plt.ylabel("reconstructed #p.e / true #p.e")
    h = plt.hist2d(np.log10(mc[camera_index]), (reco[camera_index]/mc[camera_index]),
                   bins=[nbins_x, nbins_y],
                   range=[[-0.2,4.2],[-2,6]],
                   norm=LogNorm(),
                   cmap=plt.cm.rainbow,
                   weights=weights[camera_index],
                  )
    plt.plot([0, 4], [1, 1], color="black") # line showing an unbiased reconstruction (ratio = 1)
    plt.colorbar(h[3], ax=plt.gca())
    plt.grid()

    fig.savefig(f"./plots_calibration/chargeResolution_{cameras[camera_index]}_protopipe_{config}.png")
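
The 2D histogram above is hard to compare quantitatively between pipelines. One possible reduction, sketched here under assumptions (the helper `ratio_profile` is hypothetical and not part of _protopipe_; the median is chosen because it is less sensitive to outliers than the mean), is the median of reco/true in bins of log10(true #p.e.):

```python
import numpy as np

def ratio_profile(true_pe, reco_pe, log_edges):
    """Median of reco/true in bins of log10(true #p.e.); NaN for empty bins.

    Values of log10(true) outside [log_edges[0], log_edges[-1]) are ignored.
    """
    log_true = np.log10(true_pe)
    ratio = reco_pe / true_pe
    idx = np.digitize(log_true, log_edges) - 1  # bin index; -1 or len(log_edges)-1 if outside
    medians = np.full(len(log_edges) - 1, np.nan)
    for i in range(len(log_edges) - 1):
        in_bin = idx == i
        if in_bin.any():
            medians[i] = np.median(ratio[in_bin])
    return medians

# toy usage: a constant 5% over-estimation shows up in every populated bin
true_pe = np.array([2., 3., 20., 30., 200., 300.])
reco_pe = 1.05 * true_pe
profile = ratio_profile(true_pe, reco_pe, log_edges=np.array([0., 1., 2., 3., 4.]))
print(profile)  # the last bin (1000 to 10000 p.e.) is empty, hence nan
```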

#### Calculate average bias correction

In [None]:
# calculate the average bias between 50 and 500 true phe, safely away from both saturation and NSB noise
# select pixels with true phe between 50 and 500
good_values_lst = np.where((mc_all[0]>=50) & (mc_all[0]<=500))
good_values_mst = np.where((mc_all[1]>=50) & (mc_all[1]<=500))
# the selection is made on the true charge; the same pixels are then taken from the reconstructed sample
mc_lst = mc_all[0][good_values_lst]
mc_mst = mc_all[1][good_values_mst]
reco_lst = reco_all[0][good_values_lst]
reco_mst = reco_all[1][good_values_mst]
# define bias as the difference between reco and true
bias_lst = reco_lst - mc_lst
bias_mst = reco_mst - mc_mst
# take the average
mean_bias_lst = np.mean(bias_lst)
mean_bias_mst = np.mean(bias_mst)
# since on average (reco - true) = mean_bias, i.e. true ~ reco - mean_bias = reco*[1 - (mean_bias/reco)],
# multiplying reco/true by [1 - (mean_bias/reco)] brings it back to 1 (again, on average)
intensity_correction_factor_lst = 1 - mean_bias_lst/reco_lst
# which averages to
print("Intensity correction factor for LSTCam: ", np.mean(intensity_correction_factor_lst))
# same for NectarCam
intensity_correction_factor_mst = 1 - mean_bias_mst/reco_mst
print("Intensity correction factor for NectarCam: ", np.mean(intensity_correction_factor_mst))
# Finally we store these results
corr = [np.mean(intensity_correction_factor_lst),
        np.mean(intensity_correction_factor_mst)]
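
The reasoning behind the correction can be checked on a toy sample. If $\langle r - t \rangle = b$, then on average $t \approx r - b = r\,(1 - b/r)$, so $c = \langle 1 - b/r \rangle$ should bring $\langle c\, r/t \rangle$ back close to 1. The sketch below assumes a purely additive bias of +10 p.e. plus Gaussian noise (made-up numbers, not simulation output).

```python
import numpy as np

rng = np.random.default_rng(1)
# toy true charges in the same 50-500 p.e. window used above
true_pe = rng.uniform(50, 500, 100_000)
# toy reconstruction: constant +10 p.e. additive bias plus 5 p.e. Gaussian noise
reco_pe = true_pe + 10.0 + rng.normal(0.0, 5.0, true_pe.size)

mean_bias = np.mean(reco_pe - true_pe)             # estimate of the additive bias b
correction = np.mean(1.0 - mean_bias / reco_pe)    # c = <1 - b/r>

ratio_before = np.mean(reco_pe / true_pe)
ratio_after = np.mean(correction * reco_pe / true_pe)
print(f"b = {mean_bias:.2f} p.e., c = {correction:.3f}")
print(f"<r/t> before: {ratio_before:.3f}, after: {ratio_after:.3f}")
```

Because the factor is a single multiplicative number while the toy bias is additive, the correction is exact only on average: pixels at the low end of the window remain slightly over-estimated and those at the high end end up slightly under-estimated.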

#### Charge resolution (after bias correction)

In [None]:
nbins_x = 160
nbins_y = 320
cameras = ["LSTCam", "NectarCam"]

for camera_index in range(len(cameras)):
    
    fig = plt.figure(figsize=(6, 5), tight_layout=False)

    plt.xlabel("log10(true #p.e)")
    plt.ylabel("{:.2f}*(reconstructed #p.e / true #p.e)".format(corr[camera_index]))
    h = plt.hist2d(np.log10(mc[camera_index]), corr[camera_index]*(reco[camera_index]/mc[camera_index]),
                   bins=[nbins_x, nbins_y],
                   range=[[-0.2,4.2],[-2,6]],
                   norm=LogNorm(),
                   cmap=plt.cm.rainbow,
                   weights=weights[camera_index],
                  )
    plt.plot([0, 4], [1, 1], color="black") # line showing an unbiased reconstruction (ratio = 1)
    plt.colorbar(h[3], ax=plt.gca())
    plt.grid()

    fig.savefig(f"./plots_calibration/correctedChargeResolution_{cameras[camera_index]}_protopipe_{config}.png")

#### RMS of intensity resolution

In [None]:
# Keep again only the pixels with positive true AND reconstructed number of photoelectrons
# (same selection as for the log-log correlation plot)
good_values_lst = np.where((mc_all[0]>0) & (reco_all[0]>0))
good_values_mst = np.where((mc_all[1]>0) & (reco_all[1]>0))
# combine cameras
mc = [mc_all[0][good_values_lst], mc_all[1][good_values_mst]]
reco = [reco_all[0][good_values_lst], reco_all[1][good_values_mst]]

In [None]:
nbins = 100
cameras = ["LSTCam", "NectarCam"]

for camera_index in range(len(cameras)):
    
    fig = plt.figure(figsize=(6, 5), tight_layout=False)

    plt.xlabel("log10(true #p.e)")
    plt.ylabel("RMS of corrected (reco #p.e / true #p.e)")

    t = mc[camera_index]
    r = reco[camera_index]
    icf = corr[camera_index]

    rms = binned_statistic(x=np.log10(t), values=icf*(r/t), statistic='std', bins=nbins, range=[0.,3.2])
    count = binned_statistic(x=np.log10(t), values=icf*(r/t), statistic='count', bins=nbins, range=[0.,3.2])

    bincenters = (rms.bin_edges[1:] + rms.bin_edges[:-1])/2
    mask = rms.statistic > 0

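    plt.errorbar(
        bincenters[mask],
        rms.statistic[mask],
        # approximate standard error of a standard deviation (Gaussian case): sigma / sqrt(2(N-1))
        yerr=rms.statistic[mask] / np.sqrt(2 * (count.statistic[mask] - 1)),
        fmt=".",
        lw=1,
    )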

    plt.yscale("log")
    plt.minorticks_on()
    plt.xlim(-0.2,4.2)
    plt.ylim(2.e-2,6.)
    plt.grid(which='major', axis='x')
    plt.grid(which='minor', axis='y')
    
    fig.savefig(f"./plots_calibration/rms_{cameras[camera_index]}_protopipe_{config}.png")
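
For Gaussian-distributed values, the standard error of a sample standard deviation is approximately $\sigma/\sqrt{2(N-1)}$, whereas $\sigma/\sqrt{N}$ is the standard error of the mean. Here is a numpy-only sketch of the binned RMS with that error estimate. The helper `binned_std` is hypothetical (not a _protopipe_ or scipy function), and the toy ratios with a 10% scatter are made up.

```python
import numpy as np

def binned_std(x, values, n_bins, x_range):
    """Standard deviation of `values` in equal-width bins of `x`, with an approximate error.

    The error uses the Gaussian approximation sigma / sqrt(2 (N - 1)).
    Bins with fewer than 2 entries are left as NaN. Values at or beyond
    the upper edge of x_range are ignored.
    """
    edges = np.linspace(x_range[0], x_range[1], n_bins + 1)
    idx = np.digitize(x, edges) - 1  # -1 below the range, n_bins at or above the upper edge
    std = np.full(n_bins, np.nan)
    err = np.full(n_bins, np.nan)
    counts = np.zeros(n_bins, dtype=int)
    for i in range(n_bins):
        v = values[idx == i]
        counts[i] = v.size
        if v.size >= 2:
            std[i] = np.std(v)  # population standard deviation (ddof=0)
            err[i] = std[i] / np.sqrt(2 * (v.size - 1))
    centers = 0.5 * (edges[1:] + edges[:-1])
    return centers, std, err, counts

# toy usage: ratios scattered by 10% around 1, independent of log10(true #p.e.)
rng = np.random.default_rng(7)
log_true = rng.uniform(0.0, 3.2, 200_000)
ratio = rng.normal(1.0, 0.1, log_true.size)
centers, std, err, counts = binned_std(log_true, ratio, n_bins=10, x_range=[0.0, 3.2])
print(std)
```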

#### Single-pixel spectrum

In [None]:
nbins = 250
xrange = [-1,4]
cameras = ["LSTCam", "NectarCam"]

for camera_index in range(len(cameras)):
    
    fig = plt.figure(figsize=(6, 5), tight_layout=False)
    plt.xlabel("log10(reconstructed #p.e)")
    plt.ylabel("Fraction of pixels with > x phe")

    # now we use again all the original pixel values
    t = mc_all[camera_index]
    r = reco_all[camera_index]

    signal = r[np.where(t>0)]
    noise = r[np.where(t==0)]

    tot_entries = len(t) # events * camera * pixels

    noise_hist, xbins = np.histogram( np.log10(noise), bins=nbins, range=xrange)
    plt.semilogy(xbins[:-1], noise_hist[::-1].cumsum()[::-1]/tot_entries, drawstyle="steps-post", label="Noise Pixels")

    signal_hist, xbins = np.histogram( np.log10(signal), bins=nbins, range=xrange)
    plt.semilogy(xbins[:-1], signal_hist[::-1].cumsum()[::-1]/tot_entries, drawstyle="steps-post", label="Signal Pixels")

    hist, xbins = np.histogram( np.log10(r), bins=nbins, range=xrange)
    plt.semilogy(xbins[:-1], hist[::-1].cumsum()[::-1]/tot_entries, drawstyle="steps-post",alpha=0.7, label="All Pixels")

    plt.xlim(xrange)
    plt.minorticks_on()
    plt.grid()
    plt.legend()
    
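    # Print info about threshold cuts (as from the tailcut notes of TS and JD)
    
    # This is the phe cut that rejects 99.7% of the noise
    cut = np.quantile(noise, 0.997)
    # percentileofscore gives the % of signal values <= cut (i.e. rejected), so the saved % is its complement
    signal_saved = 100. - percentileofscore(signal, cut, kind="weak")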
    plt.vlines(np.log10(cut), ymin=1.e-7, ymax=1, color='red')
    
    print(f"{cameras[camera_index]}: cutting at {cut} rejects 99.7% of the noise and saves {signal_saved:.1f}% of the signal")

    fig.savefig(f"./plots_calibration/singlePixelSpectrum_{cameras[camera_index]}_protopipe_{config}.png")
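
Note that `scipy.stats.percentileofscore(signal, cut)` returns the percentage of signal pixels *below* the cut, i.e. the rejected part; the kept fraction is its complement. A numpy-only sketch of both the threshold choice and the reverse cumulative curves plotted above follows. The helpers `noise_cut_and_signal_kept` and `fraction_above` are hypothetical, and the Gaussian noise and exponential signal are toy assumptions, not the simulated distributions.

```python
import numpy as np

def noise_cut_and_signal_kept(noise, signal, noise_quantile=0.997):
    """Charge threshold rejecting `noise_quantile` of the noise pixels,
    and the percentage of signal pixels strictly above it."""
    cut = np.quantile(noise, noise_quantile)
    signal_kept = 100.0 * np.mean(signal > cut)
    return cut, signal_kept

def fraction_above(values, edges, n_total):
    """Reverse cumulative histogram: fraction of n_total entries at or above each lower bin edge.

    Entries outside [edges[0], edges[-1]] are not counted.
    """
    hist, _ = np.histogram(values, bins=edges)
    return hist[::-1].cumsum()[::-1] / n_total

# toy usage with hypothetical distributions (in p.e.):
# Gaussian noise of width 1 and exponentially distributed signal with mean 20
rng = np.random.default_rng(3)
noise = rng.normal(0.0, 1.0, 100_000)
signal = rng.exponential(20.0, 100_000)

cut, kept = noise_cut_and_signal_kept(noise, signal)
frac = fraction_above(noise, np.linspace(-5, 5, 11), n_total=noise.size)
print(f"cut = {cut:.2f} p.e., signal kept = {kept:.1f}%")
```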