# Low-Level Corrections, Calibration and Image Extraction: R0 to DL1a

<h1 id="tocheading">Table of Contents</h1>
<div id="toc"></div>

In [None]:
%%javascript
$.getScript('https://kmahelona.github.io/ipython_notebook_goodies/ipython_notebook_toc.js')

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import astropy.units as u
from ipywidgets import interact
from tqdm.auto import tqdm

from IPython.display import display, HTML
display(HTML("<style>.container { width:95% !important; }</style>"))

In [None]:
%matplotlib widget

In [None]:
plt.rcParams['figure.constrained_layout.use'] = True
plt.rcParams['figure.dpi'] = 120

## Relevant Data Levels


### R0

The raw data currently written to disk by LST:
* consists of two gain channels for each pixel
* requires treatment of LST-specific issues
* is not in units of single photoelectrons, but in an arbitrary scale that differs between the two gains
* is only roughly pre-calibrated for the DRS4 baseline offsets

In CTA's terminology, this data level is called **R0**; in the future, it will no longer be written to disk.

### R1

R1 is the first data level that will be transmitted from the telescopes to the central data processing of the array:

* A single time series for each pixel (gain selection)
* All telescope-specific calibration steps already applied
* In units of photoelectrons, at least roughly, so that later calibration steps are only small adjustments

But: 

* R1 still has values for all pixels of the camera.
* This is too much data for long-term storage
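Gain selection, which reduces the two R0 channels to a single time series per pixel, can be sketched as follows. The rule used here (switch to low gain when the high-gain maximum exceeds a threshold), the threshold value and the function name `select_gain` are illustrative assumptions, not the exact LST implementation.

```python
import numpy as np

# channel order as used by ctapipe_io_lst.constants (HIGH_GAIN = 0, LOW_GAIN = 1)
HIGH_GAIN, LOW_GAIN = 0, 1


def select_gain(r0_waveform, threshold):
    """Pick one gain channel per pixel (hypothetical helper).

    r0_waveform: array of shape (n_gains=2, n_pixels, n_samples)
    threshold: assumed saturation threshold on the high-gain R0 value
    Returns the selected waveforms, shape (n_pixels, n_samples),
    and the selected channel for each pixel.
    """
    high_gain_max = r0_waveform[HIGH_GAIN].max(axis=-1)
    # use the low gain where the high gain is close to saturation
    selected = np.where(high_gain_max > threshold, LOW_GAIN, HIGH_GAIN)
    n_pixels = r0_waveform.shape[1]
    # fancy indexing: one (gain, pixel) pair per pixel
    waveform = r0_waveform[selected, np.arange(n_pixels)]
    return waveform, selected
```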

### DL0

DL0 is the same as R1, but with the pixels that likely contain no Cherenkov signal removed, a step known as data volume reduction.
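The idea of data volume reduction can be illustrated with a minimal sketch that keeps only pixels whose summed signal exceeds a threshold. Both the criterion and the threshold are hypothetical; real reduction schemes may be more elaborate, for example also keeping neighbours of selected pixels.

```python
import numpy as np


def reduce_data_volume(r1_waveform, threshold):
    """Keep only pixels whose summed signal exceeds a (hypothetical) threshold.

    r1_waveform: (n_pixels, n_samples), calibrated and gain-selected
    Returns the indices of the kept pixels and their waveforms.
    """
    charge = r1_waveform.sum(axis=-1)
    kept = np.flatnonzero(charge > threshold)
    return kept, r1_waveform[kept]
```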

This is the first data level intended for long-term archiving in CTA.

### DL1

DL1 is split into two sub-levels:

* DL1a is the "image level": from the R1 or DL0 waveforms, we obtain the number of photoelectrons and some kind of "arrival time" for each pixel.
* DL1b consists of parametrizations of the DL1a images, which can be used to estimate event properties, e.g. using random forests.
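A common starting point for DL1b parametrization is the set of intensity-weighted image moments known as Hillas parameters. The numpy sketch below computes size, centre of gravity, length, width and orientation, assuming an already cleaned image and pixel coordinates in consistent units; it is not the lstchain or ctapipe implementation.

```python
import numpy as np


def image_moments(pix_x, pix_y, image):
    """Hillas-style moments of a cleaned camera image (illustrative sketch)."""
    size = image.sum()
    cog_x = np.sum(image * pix_x) / size
    cog_y = np.sum(image * pix_y) / size
    # intensity-weighted covariance of the pixel positions (normalised by size)
    cov = np.cov(pix_x, pix_y, aweights=image, ddof=0)
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    # eigh returns eigenvalues in ascending order: minor axis first
    width, length = np.sqrt(eigenvalues)
    # orientation of the major axis (eigenvector of the largest eigenvalue)
    psi = np.arctan2(eigenvectors[1, 1], eigenvectors[0, 1])
    return size, cog_x, cog_y, length, width, psi
```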

## Reading LST data using the LSTEventSource

There is no common R0, R1 or DL0 format defined for CTA yet.

LST R0 is stored in a custom file format called `zfits`.

It combines the outer shell of a FITS file with custom binary table extensions that store 
data using [Google Protocol Buffers](https://developers.google.com/protocol-buffers).

The low-level C++ reader/writer and python reader source code can be found in the [CTA Gitlab](https://gitlab.cta-observatory.org/cta-computing/common/acada-array-elements/adh-apis).
The corresponding Python package is `protozfits`, which is available from PyPI and `conda-forge`.

This package is used in `ctapipe_io_lst` to implement the `LSTEventSource`, to read LST data into `ctapipe` data structures.

Because the data rate of LST is too high for a single network connection and disk, LST events are actually written into 4 files in parallel.

When given the first of these files, the `LSTEventSource` will open all four, provided the others are available in the same directory.
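Conceptually, opening the four streams and presenting them as one ordered sequence of events can be sketched with the standard library. The naming rule (`LST-1.1.` replaced by `LST-1.<n>.`), the dict-based events and the helper names `stream_paths` and `merge_streams` are assumptions for illustration; the actual logic lives in `ctapipe_io_lst`.

```python
import heapq
import os
import re

N_STREAMS = 4  # LST writes events into 4 files in parallel


def stream_paths(first_path, n_streams=N_STREAMS):
    """Derive the sibling stream file names from the first one (assumed naming)."""
    directory, name = os.path.split(first_path)
    if not re.match(r'LST-1\.1\.', name):
        raise ValueError(f'not a first-stream file: {name}')
    return [
        os.path.join(directory, re.sub(r'^LST-1\.1\.', f'LST-1.{i}.', name))
        for i in range(1, n_streams + 1)
    ]


def merge_streams(*streams):
    """Merge per-stream event iterables (each sorted by event_id) into one sequence."""
    return heapq.merge(*streams, key=lambda event: event['event_id'])
```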


### Accessing R0

In [None]:
from ctapipe_io_lst import LSTEventSource

In [None]:
# by default, the LSTEventSource wants to apply all needed corrections
# and have all needed information to provide R1. 
# If we are only interested in R0, we switch those options off.
source = LSTEventSource(
    input_url='../data/real/R0/20201120/LST-1.1.Run02965.0000_first400.fits.fz',
    apply_drs4_corrections=False,
    pointing_information=False,
    trigger_information=False,
)

source

We can also rely on the ctapipe `EventSource` machinery to identify a compatible source for our input file.

This makes it possible to write programs that can, e.g., handle both sim_telarray files and LST files:
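Roughly, ctapipe's `EventSource` asks each known source class whether it `is_compatible` with the input and instantiates the first one that accepts it. The toy registry below only illustrates that dispatch pattern: all class names are hypothetical, and checking file extensions is a simplification (real sources typically inspect the file contents).

```python
class ToyEventSource:
    """Hypothetical stand-in for a plugin-style event source base class."""

    registry = []

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # every subclass registers itself as a candidate source
        ToyEventSource.registry.append(cls)

    def __init__(self, path):
        self.path = path

    @staticmethod
    def is_compatible(path):
        return False

    @classmethod
    def from_url(cls, path):
        # return an instance of the first registered class accepting the file
        for source_cls in cls.registry:
            if source_cls.is_compatible(path):
                return source_cls(path)
        raise ValueError(f'no compatible event source for {path}')


class ToyZFitsSource(ToyEventSource):
    @staticmethod
    def is_compatible(path):
        return path.endswith('.fits.fz')


class ToySimTelSource(ToyEventSource):
    @staticmethod
    def is_compatible(path):
        return '.simtel' in path
```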

In [None]:
from ctapipe.io import EventSource

type(EventSource('../data/real/R0/20201120/LST-1.1.Run02965.0000_first400.fits.fz'))

In [None]:
type(EventSource('dataset://gamma_test_large.simtel.gz'))

In [None]:
# look at event 31 (it contains a nice little shower); pixel 268 is its brightest pixel
EVENT = 31
PIXEL = 268

for event in source:
    if event.index.event_id == EVENT:
        break
        
event.index.event_id

In [None]:
event.r0.tel[1].waveform.shape

In [None]:
event.r0.tel[1].waveform

In [None]:
from ctapipe_io_lst.constants import HIGH_GAIN, LOW_GAIN, N_PIXELS

fig, ax = plt.subplots()
plot = ax.stairs(event.r0.tel[1].waveform[0, 0], baseline=400)
ax.axhline(400, ls=':', color='gray')
ax.set_xlabel('Sample')
ax.set_ylabel('R0 value')


def plot_waveform(gain, pixel=PIXEL):
    waveform = event.r0.tel[1].waveform[gain, pixel]
    plot.set_data(waveform)
    ax.set_title(f'Run {event.index.obs_id}, Event: {event.index.event_id}, Gain: {gain}, Pixel {pixel}')
    ax.set_ylim(0.9 * waveform.min(), 1.1 * waveform.max())


interact(plot_waveform, gain=(HIGH_GAIN, LOW_GAIN), pixel=(0, N_PIXELS - 1))

### Applying low-level DRS4 corrections

There are three types of DRS4 corrections:

* baseline correction: each capacitor of each DRS4 chip has its own offset from the desired baseline of 400
* spikes: at certain positions in the waveform, the values are raised for 3 consecutive samples
* timelapse: the values are higher by an amount that depends on the time elapsed since the last readout

The timelapse correction is currently applied using a hard-coded power law, so it does not need a calibration file.
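A timelapse correction of this kind can be sketched by remembering when each capacitor was last read out and subtracting a power-law offset in the elapsed time. The coefficients `a`, `b`, `c`, the millisecond time unit and the class name are made up for illustration; the actual hard-coded values live in `ctapipe_io_lst`.

```python
import numpy as np


def timelapse_offset(dt_ms, a, b, c):
    """Baseline offset as a power law in the time since the last readout."""
    return a * np.power(dt_ms, b) + c


class TimelapseCorrector:
    """Track the last readout time of each capacitor (hypothetical helper)."""

    def __init__(self, n_capacitors, a, b, c):
        self.last_readout = np.full(n_capacitors, np.nan)
        self.params = (a, b, c)

    def __call__(self, waveform, capacitor_ids, time_ms):
        # waveform, capacitor_ids: 1d arrays of equal length (one pixel, one gain)
        dt = time_ms - self.last_readout[capacitor_ids]
        corrected = waveform.astype(float)
        # capacitors that were never read before get no correction
        valid = np.isfinite(dt)
        corrected[valid] -= timelapse_offset(dt[valid], *self.params)
        self.last_readout[capacitor_ids] = time_ms
        return corrected
```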

Baseline correction requires the mean baseline value of each capacitor, which is stored in the calibration tree at:  
`<base>/monitoring/PixelCalibration/LevelA/drs4_baseline/<date>/<version>/drs4_pedestal.Run<pedestal_run>.h5`
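The DRS4 chip works as a ring buffer, so the capacitor holding sample `i` is `(first_capacitor + i) mod n_capacitors`. A minimal sketch of the baseline subtraction for one pixel and gain, assuming the per-capacitor mean baseline has already been loaded into an array (the number of capacitors is simply taken from its length):

```python
import numpy as np


def subtract_baseline(waveform, first_capacitor, baseline_per_capacitor):
    """Subtract the per-capacitor mean baseline from one DRS4 readout.

    waveform: (n_samples,) raw values of one pixel and gain
    first_capacitor: index of the capacitor holding the first sample
    baseline_per_capacitor: (n_capacitors,) mean baseline of each capacitor
    """
    n_capacitors = len(baseline_per_capacitor)
    # ring buffer: sample i was stored in capacitor (first + i) mod n
    capacitors = (first_capacitor + np.arange(len(waveform))) % n_capacitors
    return waveform - baseline_per_capacitor[capacitors]
```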

Spike correction is done by subtracting the mean spike height at the spike positions, which can be calculated for each readout.
The mean spike heights are stored in the same file.
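Once the spike positions of a readout are known, the subtraction itself is simple. In this sketch the positions are passed in as an assumed input (how they are computed from the DRS4 readout is not shown), and `spike_heights` holds the mean height of each of the 3 affected samples.

```python
import numpy as np


def subtract_spikes(waveform, spike_positions, spike_heights):
    """Subtract the mean spike height from the 3 samples of each spike.

    spike_positions: sample indices where a spike starts (assumed given)
    spike_heights: mean height of each of the 3 spike samples
    """
    corrected = waveform.astype(float)
    n_samples = len(corrected)
    for start in spike_positions:
        for offset, height in enumerate(spike_heights):
            sample = start + offset
            # spikes may be cut off at the edge of the readout window
            if 0 <= sample < n_samples:
                corrected[sample] -= height
    return corrected
```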

All steps are performed by the `LSTR0Corrections` component, a sub-component of the `LSTEventSource`. The corrected values are converted to floats and stored in the `event.r1` container.

In [None]:
# to set configuration options of sub-components, we need to use a `Config` object
from traitlets.config import Config

config = Config({
    'LSTEventSource': {
        'input_url': '../data/real/R0/20201120/LST-1.1.Run02965.0000_first400.fits.fz',
        'pointing_information': False,
        'trigger_information': False,
        ### new ###
        'LSTR0Corrections': {
            'drs4_pedestal_path': '../data/real/monitoring/PixelCalibration/LevelA/drs4_baseline/20201120/v0.8.3/drs4_pedestal.Run02963.0000.h5',
        }
    }
})

# look at the same event again
source = LSTEventSource(config=config, max_events=EVENT)
for event in source:
    pass

In [None]:
fig, ax = plt.subplots()
ax.stairs(event.r1.tel[1].waveform[0, PIXEL])
ax.set_ylabel('R1 value / adc counts')
ax.axhline(0, ls=':', color='gray')
None

### Obtaining correct trigger times and event types

Unfortunately, during commissioning, the information from UCTS, the main source for the trigger time and event type, has not always been reliable.

Because the information for an event is sometimes missed, there are *jumps* in the data that need to be detected and corrected.

For this, we are using high-precision counters in the dragon modules.

These counters however are only relative to the run start and need an absolute reference to obtain a valid timestamp.

Reference values can be calculated from the first event of the first subrun.

These reference values are stored in the run summaries for each night and can be used by the `EventTimeCalculator`, which also detects and corrects these *UCTS jumps*.
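The idea can be sketched in a few lines: an absolute time follows from the run-relative counter plus a reference pair (counter value and absolute time of the same event), and a UCTS jump shows up as a disagreement between the UCTS time and this counter-based time. The function names, the nanosecond units and the tolerance are assumptions; this is not the `EventTimeCalculator` implementation.

```python
def dragon_timestamp(counter_ns, ref_counter_ns, ref_time_ns):
    """Absolute time from a run-relative counter and a reference pair (all in ns)."""
    return ref_time_ns + (counter_ns - ref_counter_ns)


def find_ucts_jumps(ucts_times_ns, dragon_times_ns, tolerance_ns):
    """Indices of events whose UCTS time disagrees with the counter-based time."""
    return [
        i
        for i, (ucts, dragon) in enumerate(zip(ucts_times_ns, dragon_times_ns))
        if abs(ucts - dragon) > tolerance_ns
    ]
```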

Currently, flat field events are also not tagged before writing the R0 files.
To identify flat field events, there is a heuristic implemented in the `LSTEventSource`.

In [None]:
from astropy.table import Table

Table.read('../data/real/monitoring/RunSummary/RunSummary_20201120.ecsv')

### Pointing Information

Pointing information also needs to be read from another input file, using `PointingSource`.
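In a minimal form, obtaining the pointing for an event amounts to interpolating the drive-log positions to the event time. The sketch below assumes linear interpolation, times as floats in seconds and angles in degrees; it unwraps the azimuth so that interpolation does not jump across 0/360 degrees. It is not the `PointingSource` implementation.

```python
import numpy as np


def interpolate_pointing(event_time, drive_time, drive_alt, drive_az):
    """Linearly interpolate drive-log altitude and azimuth (deg) to an event time."""
    alt = np.interp(event_time, drive_time, drive_alt)
    # unwrap the azimuth to avoid interpolating across the 0/360 deg boundary
    az_unwrapped = np.rad2deg(np.unwrap(np.deg2rad(drive_az)))
    az = np.interp(event_time, drive_time, az_unwrapped) % 360
    return alt, az
```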

With the configuration for both `EventTimeCalculator` and `PointingSource` included, the config now looks like this:

In [None]:
from traitlets.config import Config

# to set configuration options of sub-components, we need to use a `Config` object
config = Config({
    'LSTEventSource': {
        'input_url': '../data/real/R0/20201120/LST-1.1.Run02965.0000_first400.fits.fz',
        'LSTR0Corrections': {
            'drs4_pedestal_path': '../data/real/monitoring/PixelCalibration/LevelA/drs4_baseline/20201120/v0.8.3/drs4_pedestal.Run02963.0000.h5',
        },
        ### new ###
        'EventTimeCalculator': {
            'run_summary_path': '../data/real/monitoring/RunSummary/RunSummary_20201120.ecsv',
        },
        'PointingSource': {
            'drive_report_path': '../data/real/monitoring/DrivePositioning/drive_log_20201120.txt'
        },
    }
})

source = LSTEventSource(config=config, max_events=200)

event = next(iter(source))

event.pointing.tel[1].altitude, event.pointing.tel[1].azimuth, event.trigger.time

In [None]:
from collections import Counter

source = LSTEventSource(config=config, max_events=200)
Counter(e.trigger.event_type.name for e in source)

### Gain Selection and Pixel Calibration

The last step is to apply the pixel calibration, including the conversion to photoelectrons and the calculation of time correction coefficients.

There are two parts to the time correction:
* Differences between the pixels, e.g. due to different signal delays in the hardware
* Differences due to non-uniform sampling of the DRS4 chip

The source needs to select the appropriate gain channel, convert to photoelectrons and fill in the time correction for
later treatment at the DL1 step.

Low-gain and high-gain also need to be scaled by a calibration factor to harmonize their values and to correct for 
the different pulse shapes of the calibration laser pulses and Cherenkov photon pulses.
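A sketch of the conversion of gain-selected waveforms to photoelectrons: for each pixel, the pedestal and conversion factor of the selected gain are picked and the factor is multiplied by the per-gain scale (for example the `calib_scale_high_gain` and `calib_scale_low_gain` values configured in the cell below). The array layout, the per-sample pedestal subtraction and the function name `calibrate_r1` are assumptions, not the exact `ctapipe_io_lst` code.

```python
import numpy as np


def calibrate_r1(waveform, selected_gain, pedestal, dc_to_pe, calib_scale):
    """Convert gain-selected waveforms from ADC counts to photoelectrons.

    waveform: (n_pixels, n_samples) DRS4-corrected, gain-selected waveform
    selected_gain: (n_pixels,) chosen channel per pixel (0 = high, 1 = low)
    pedestal: (n_gains, n_pixels) pedestal level per sample
    dc_to_pe: (n_gains, n_pixels) conversion factors from the calibration file
    calib_scale: (n_gains,) per-gain scale factors
    """
    pixels = np.arange(waveform.shape[0])
    ped = pedestal[selected_gain, pixels][:, np.newaxis]
    scale = np.asarray(calib_scale)[selected_gain]
    factor = (dc_to_pe[selected_gain, pixels] * scale)[:, np.newaxis]
    return (waveform - ped) * factor
```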

Calibration coefficients are read from the calibration file, which is in the tree at:  
`<base>/monitoring/PixelCalibration/LevelA/calibration/<date>/<version>/calibration_filters_52.<calibration run>.h5`

For the DRS4-based time shifts, another calibration file is needed, located at:  
`<base>/monitoring/PixelCalibration/LevelA/drs4_time_sampling_from_FF/<date>/<version>/time_calibration.<run>.h5`

Note that the DRS4 baseline and calibration files are created nightly, as these coefficients vary with several conditions.

The DRS4 time calibration is only redone when hardware in the camera is changed, so use the latest file created before the data you are analyzing.
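Picking "the latest file before the data" can be automated from the `<date>` directory in the paths shown above. This is a sketch assuming the layout `.../<date>/<version>/<file>` with dates as `YYYYMMDD`; the helper name is hypothetical and files from the run date itself are accepted.

```python
from datetime import date
from pathlib import PurePosixPath


def latest_calibration_before(paths, run_date):
    """Pick the newest file whose <date> directory is not after run_date."""
    candidates = []
    for path in paths:
        # assumes the layout .../<date>/<version>/<file>
        day = PurePosixPath(path).parts[-3]
        file_date = date(int(day[:4]), int(day[4:6]), int(day[6:]))
        if file_date <= run_date:
            candidates.append((file_date, path))
    if not candidates:
        raise FileNotFoundError(f'no calibration file on or before {run_date}')
    return max(candidates)[1]
```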


In [None]:
from traitlets.config import Config

config = Config({
    'LSTEventSource': {
        'input_url': '../data/real/R0/20201120/LST-1.1.Run02965.0000_first400.fits.fz',
        'EventTimeCalculator': {
            'run_summary_path': '../data/real/monitoring/RunSummary/RunSummary_20201120.ecsv',
        },
        'PointingSource': {
            'drive_report_path': '../data/real/monitoring/DrivePositioning/drive_log_20201120.txt'
        },
        'LSTR0Corrections': {
            'drs4_pedestal_path': '../data/real/monitoring/PixelCalibration/LevelA/drs4_baseline/20201120/v0.8.3/drs4_pedestal.Run02963.0000.h5',
            'drs4_time_calibration_path': '../data/real/monitoring/PixelCalibration/LevelA/drs4_time_sampling_from_FF/20191124/v0.8.3/time_calibration.Run01625.0000.h5',
            ### new ###
            'calibration_path': '../data/real/monitoring/PixelCalibration/LevelA/calibration/20201120/v0.8.3/calibration_filters_52.Run02964.0000.h5',
            'calib_scale_high_gain': 1.088,
            'calib_scale_low_gain': 1.004,
        },
    }
})

# look at the same event again
source = LSTEventSource(config=config, max_events=EVENT)
for event in source:
    pass

In [None]:
fig, ax = plt.subplots()

# note that the gain index is gone: gain selection has been applied
ax.stairs(event.r1.tel[1].waveform[PIXEL])

ax.set_ylabel('R1 value / p.e.')
None

## Going to DL1a

DL1a requires integrating the waveforms around the peak and somehow determining a "peak time".

In the simplest case, the full waveform is summed, and the peak time is the average of the sample positions, weighted by the sample values.
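This simplest extraction takes only a few lines of numpy. The sketch returns the peak time in units of samples and assumes a non-zero summed charge (a zero sum would cause a division by zero).

```python
import numpy as np


def full_waveform_extraction(waveform):
    """Charge = sum of all samples; peak time = value-weighted mean sample index."""
    samples = np.arange(waveform.shape[-1])
    charge = waveform.sum(axis=-1)
    peak_time = (waveform * samples).sum(axis=-1) / charge
    return charge, peak_time
```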

To reduce noise, we only sum in a smaller window around the highest value, using `LocalPeakWindowSum` from ctapipe.
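A sketch of the local-peak window sum: for each pixel, the window starts `window_shift` samples before that pixel's maximum and is `window_width` samples long, clipped to the readout window, matching the meaning of the parameters set in the configuration below. The function name and the edge-case handling are assumptions; this is not ctapipe's implementation.

```python
import numpy as np


def local_peak_window_sum(waveform, window_width=8, window_shift=4):
    """Sum samples in a window around each pixel's own maximum (sketch)."""
    n_pixels, n_samples = waveform.shape
    peak_index = waveform.argmax(axis=-1)
    # window [peak - shift, peak - shift + width), clipped to the readout
    start = np.clip(peak_index - window_shift, 0, None)
    stop = np.clip(peak_index - window_shift + window_width, None, n_samples)
    charge = np.empty(n_pixels)
    peak_time = np.empty(n_pixels)
    for pixel in range(n_pixels):
        window = waveform[pixel, start[pixel]:stop[pixel]]
        samples = np.arange(start[pixel], stop[pixel])
        charge[pixel] = window.sum()
        # value-weighted mean sample index inside the window
        if charge[pixel] != 0:
            peak_time[pixel] = (window * samples).sum() / charge[pixel]
        else:
            peak_time[pixel] = peak_index[pixel]
    return charge, peak_time
```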

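We also disable the integration correction, which would otherwise rescale the extracted charge to account for the part of the pulse falling outside the integration window.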

In [None]:
from ctapipe.calib import CameraCalibrator

config = Config({
    'CameraCalibrator': {
        'image_extractor_type': 'LocalPeakWindowSum',
        'LocalPeakWindowSum': {
            'window_shift': 4,
            'window_width': 8,
            'apply_integration_correction': False,
        }
    }
})


calibrator = CameraCalibrator(source.subarray, config=config)
calibrator(event)


event.dl1.tel[1].image, event.dl1.tel[1].peak_time

In [None]:
from ctapipe.visualization import CameraDisplay

fig, (ax_image, ax_peaktime)  = plt.subplots(1, 2)

cam = source.subarray.tel[1].camera.geometry

display_image = CameraDisplay(cam, ax=ax_image, cmap='inferno')
display_peaktime = CameraDisplay(cam, ax=ax_peaktime, cmap='RdBu_r')

display_image.add_colorbar()
display_peaktime.add_colorbar()

display_image.image = event.dl1.tel[1].image
display_peaktime.image = event.dl1.tel[1].peak_time

for ax in (ax_image, ax_peaktime):
    ax.set_title('')
    
fig.suptitle(f'Run {event.index.obs_id}, Event: {event.index.event_id}, Type: {event.trigger.event_type}')

## Using the command line tools 

### Using the `lstchain_{data,mc}_r0_to_dl1` program

For regular analysis, there are two command-line programs that run all these steps and store the DL1 data in an HDF5 file: one for observed (`data`) and one for simulated (`mc`) data.

Note that these programs also run the image cleaning, the parametrization (DL1b) and the muon analysis.

In [None]:
!lstchain_data_r0_to_dl1 --help

In [None]:
!rm -rf /tmp/$USER/lstchain-demo 
!mkdir -p /tmp/$USER/lstchain-demo
!lstchain_data_r0_to_dl1 \
  --input-file ../data/real/R0/20201120/LST-1.1.Run02965.0000_first400.fits.fz \
  --output-dir /tmp/$USER/lstchain-demo \
  --pedestal-file ../data/real/monitoring/PixelCalibration/LevelA/drs4_baseline/20201120/v0.8.3/drs4_pedestal.Run02963.0000.h5 \
  --time-calibration-file ../data/real/monitoring/PixelCalibration/LevelA/drs4_time_sampling_from_FF/20191124/v0.8.3/time_calibration.Run01625.0000.h5 \
  --calibration-file ../data/real/monitoring/PixelCalibration/LevelA/calibration/20201120/v0.8.3/calibration_filters_52.Run02964.0000.h5 \
  --run-summary-path ../data/real/monitoring/RunSummary/RunSummary_20201120.ecsv \
  --pointing-file ../data/real/monitoring/DrivePositioning/drive_log_20201120.txt

In [None]:
!ls /tmp/$USER/lstchain-demo/

In [None]:
from ctapipe.io import read_table
import os

lstchain_r0_to_dl1 = read_table(
    f'/tmp/{os.getenv("USER")}/lstchain-demo/dl1_LST-1.Run02965.0000.h5',
    '/dl1/event/telescope/image/LST_LSTCam'
)
lstchain_r0_to_dl1[:5]

### Using ctapipe-stage1 (renamed to ctapipe-process in 0.12)

At the R0 to DL1a stage, there is nothing LST-specific that is not already handled by the `LSTEventSource`.

You can use `ctapipe-stage1` with the correct configuration to obtain images identical to those from `lstchain_data_r0_to_dl1`,
with the advantage of using the standard ctapipe DL1 format instead of the custom lstchain format, which is similar but different.

We can either pass the options for the `LSTEventSource` on the command line or write a JSON config file.

See https://github.com/cta-observatory/ctapipe_io_lst/blob/master/example_stage1_config.json for an example.
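As a sketch, the command-line options used in the next cell can be collected into a nested JSON config, where the top-level keys are component class names, as in traitlets config files. The output file name `lst_stage1_config.json` is an arbitrary choice; the paths and values are the ones used elsewhere in this notebook. Check `ctapipe-stage1 --help` for the option that loads a config file.

```python
import json

BASE = '../data/real/monitoring'
CONFIG_PATH = 'lst_stage1_config.json'  # arbitrary file name

config = {
    'LSTEventSource': {
        'LSTR0Corrections': {
            'drs4_pedestal_path': f'{BASE}/PixelCalibration/LevelA/drs4_baseline/20201120/v0.8.3/drs4_pedestal.Run02963.0000.h5',
            'drs4_time_calibration_path': f'{BASE}/PixelCalibration/LevelA/drs4_time_sampling_from_FF/20191124/v0.8.3/time_calibration.Run01625.0000.h5',
            'calibration_path': f'{BASE}/PixelCalibration/LevelA/calibration/20201120/v0.8.3/calibration_filters_52.Run02964.0000.h5',
            'calib_scale_high_gain': 1.088,
            'calib_scale_low_gain': 1.004,
        },
        'EventTimeCalculator': {
            'run_summary_path': f'{BASE}/RunSummary/RunSummary_20201120.ecsv',
        },
        'PointingSource': {
            'drive_report_path': f'{BASE}/DrivePositioning/drive_log_20201120.txt',
        },
    },
    'CameraCalibrator': {
        'image_extractor_type': 'LocalPeakWindowSum',
        'LocalPeakWindowSum': {
            'window_shift': 4,
            'window_width': 8,
            'apply_integration_correction': False,
        },
    },
}

with open(CONFIG_PATH, 'w') as f:
    json.dump(config, f, indent=2)
```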

In [None]:
!ctapipe-stage1 \
  --input ../data/real/R0/20201120/LST-1.1.Run02965.0000_first400.fits.fz \
  --output /tmp/$USER/lstchain-demo/LST-1.Run02965.0000_first400.dl1.h5 \
  --overwrite \
  --progress \
  --write-images \
  --log-level=INFO \
  --LSTEventSource.LSTR0Corrections.drs4_pedestal_path ../data/real/monitoring/PixelCalibration/LevelA/drs4_baseline/20201120/v0.8.3/drs4_pedestal.Run02963.0000.h5 \
  --LSTEventSource.LSTR0Corrections.drs4_time_calibration_path ../data/real/monitoring/PixelCalibration/LevelA/drs4_time_sampling_from_FF/20191124/v0.8.3/time_calibration.Run01625.0000.h5 \
  --LSTEventSource.LSTR0Corrections.calibration_path ../data/real/monitoring/PixelCalibration/LevelA/calibration/20201120/v0.8.3/calibration_filters_52.Run02964.0000.h5 \
  --LSTEventSource.LSTR0Corrections.calib_scale_high_gain=1.088 \
  --LSTEventSource.LSTR0Corrections.calib_scale_low_gain=1.004 \
  --LSTEventSource.EventTimeCalculator.run_summary_path ../data/real/monitoring/RunSummary/RunSummary_20201120.ecsv \
  --LSTEventSource.PointingSource.drive_report_path ../data/real/monitoring/DrivePositioning/drive_log_20201120.txt \
  --CameraCalibrator.image_extractor_type LocalPeakWindowSum \
  --CameraCalibrator.LocalPeakWindowSum.window_shift=4 \
  --CameraCalibrator.LocalPeakWindowSum.window_width=8 \
  --CameraCalibrator.LocalPeakWindowSum.apply_integration_correction=False

In [None]:
ctapipe_stage_1 = read_table(
    f'/tmp/{os.getenv("USER")}/lstchain-demo/LST-1.Run02965.0000_first400.dl1.h5',
    '/dl1/event/telescope/images/tel_001'
)
ctapipe_stage_1[:5]

### Comparison of lstchain and ctapipe output

In [None]:
from matplotlib.colors import LogNorm


fig, ax = plt.subplots()

bins = np.geomspace(1, 2e3, 101)

cmap = plt.get_cmap('inferno').with_extremes(bad='gray')

ax.hist2d(
    lstchain_r0_to_dl1['image'].ravel(),
    ctapipe_stage_1['image'].ravel(),
    bins=[bins, bins],
    cmap=cmap,
    norm=LogNorm(),
)
ax.set_xscale('log')
ax.set_yscale('log')
ax.set_aspect(1)
ax.set_facecolor('gray')

In [None]:
np.all(lstchain_r0_to_dl1['image'].ravel() == ctapipe_stage_1['image'].ravel())