# DL1 Data Handler Demo

This notebook demonstrates how to use DL1 Data Handler to write/read IACT data for use with machine learning analysis in Python.

## Writing Data

DL1 Data Writer converts data from simtel or other formats into a standardized HDF5 (PyTables) file format that is convenient to use from Python. Note: The code below closely follows dl1-data-handler/scripts/write_data.py.

In [1]:
from dl1_data_handler.writer import DL1DataWriter, CTAMLDataDumper

We first define a runlist, which tells the Data Writer which input files to process and which output HDF5 files to produce (their names, locations, and the input files each one corresponds to). You may need to adjust the paths below to match your own files and locations.

NOTE: Runlists can also be provided on the command line as a YAML file; such files can be generated automatically with the dl1-data-handler/scripts/generate_runlist.py script.

In [4]:
runlist = [
    {
        'inputs': ["dl1_data_handler_demo/gamma_20deg_0deg_run100___cta-prod3-demo_desert-2150m-Paranal-baseline-sample_cone10.simtel.gz",
                    "dl1_data_handler_demo/gamma_20deg_0deg_run101___cta-prod3-demo_desert-2150m-Paranal-baseline-sample_cone10.simtel.gz",
                    "dl1_data_handler_demo/gamma_20deg_0deg_run102___cta-prod3-demo_desert-2150m-Paranal-baseline-sample_cone10.simtel.gz",
                    "dl1_data_handler_demo/gamma_20deg_0deg_run103___cta-prod3-demo_desert-2150m-Paranal-baseline-sample_cone10.simtel.gz"],
        'target': "gamma_20deg_0deg_runs100-103___cta-prod3-demo_desert-2150m-Paranal-baseline-sample_cone10.h5"
    },
    {
        'inputs': ["dl1_data_handler_demo/proton_20deg_0deg_run100___cta-prod3-demo_desert-2150m-Paranal-baseline-sample_cone10.simtel.gz",
                    "dl1_data_handler_demo/proton_20deg_0deg_run101___cta-prod3-demo_desert-2150m-Paranal-baseline-sample_cone10.simtel.gz",
                    "dl1_data_handler_demo/proton_20deg_0deg_run102___cta-prod3-demo_desert-2150m-Paranal-baseline-sample_cone10.simtel.gz",
                    "dl1_data_handler_demo/proton_20deg_0deg_run103___cta-prod3-demo_desert-2150m-Paranal-baseline-sample_cone10.simtel.gz"],
        'target': "proton_20deg_0deg_runs100-103___cta-prod3-demo_desert-2150m-Paranal-baseline-sample_cone10.h5"
    },
]


print("Number of input files in runlist: {}".format(
    len([input_file for run in runlist for input_file in run['inputs']])))
print("Number of output files requested: {}".format(
    len(runlist)))
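Before starting a long conversion, it can help to sanity-check the runlist structure. The following is a minimal sketch: `check_runlist` is a hypothetical helper written for this demo, not part of dl1_data_handler, and it assumes only the `'inputs'`/`'target'` layout shown above.

```python
import os


def check_runlist(runlist, require_inputs_exist=False):
    """Return a list of human-readable problems found in a runlist.

    Hypothetical helper for illustration; not part of dl1_data_handler.
    """
    problems = []
    seen_targets = set()
    for i, run in enumerate(runlist):
        inputs = run.get('inputs', [])
        target = run.get('target')
        if not inputs:
            problems.append("run {}: no input files".format(i))
        if not isinstance(target, str) or not target:
            problems.append(
                "run {}: 'target' must be a non-empty string".format(i))
        elif target in seen_targets:
            problems.append("run {}: duplicate target {}".format(i, target))
        else:
            seen_targets.add(target)
        if require_inputs_exist:
            # Optionally confirm that every input path is an existing file
            for path in inputs:
                if not os.path.isfile(path):
                    problems.append(
                        "run {}: missing input {}".format(i, path))
    return problems


# Made-up example with two deliberate mistakes in the second entry
example = [
    {'inputs': ["a.simtel.gz", "b.simtel.gz"], 'target': "ab.h5"},
    {'inputs': [], 'target': "ab.h5"},
]
for problem in check_runlist(example):
    print(problem)
```

Passing `require_inputs_exist=True` additionally reports input paths that are not present on disk, which is useful after adjusting the runlist to local file locations.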

The writing of data by DL1DataWriter is handled in two stages. First, the data is loaded from the input files into ctapipe containers using ctapipe's event source API. Any input format which is supported by ctapipe event sources (including most IACT data formats) can therefore be read in.

Here we set any desired settings for the ctapipe event source. We could select a specific event source subclass, but we leave it unset: in that case DL1DataWriter simply uses ctapipe.io.event_source(), which automatically chooses the correct class based on the input file.

In [None]:
event_src_settings = {}

The second part of the process, dumping the data from the ctapipe containers to a specific output format, is handled by a custom class called a DL1DataDumper. 

An implementation of DL1DataDumper that outputs PyTables HDF5 files in the standard "CTA ML format" is already provided: CTAMLDataDumper. To support an alternative data format, implement your own Data Dumper class that inherits from the generic DL1DataDumper.

Here we select CTAMLDataDumper as our dumper and set some options related to data dumping. Most of these settings optimize the output HDF5 files (e.g. by compressing them efficiently or adding indexes on columns).
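To illustrate the inheritance pattern described above, here is a minimal sketch. It does not use the real DL1DataDumper interface: `ToyDumper`, `JSONLinesDumper`, and the `dump_event` method are hypothetical stand-ins, and the actual base class may require different methods and arguments.

```python
from abc import ABC, abstractmethod
import json


class ToyDumper(ABC):
    """Hypothetical stand-in for a generic dumper base class."""

    def __init__(self, output_filename):
        self.output_filename = output_filename

    @abstractmethod
    def dump_event(self, event):
        """Write one event to the output."""


class JSONLinesDumper(ToyDumper):
    """Toy dumper that keeps each event as one JSON line in memory."""

    def __init__(self, output_filename):
        super().__init__(output_filename)
        self.lines = []

    def dump_event(self, event):
        # sort_keys gives a stable key order in the serialized line
        self.lines.append(json.dumps(event, sort_keys=True))


dumper = JSONLinesDumper("events.jsonl")
dumper.dump_event({'event_id': 1, 'mc_energy': 0.5})
print(dumper.lines[0])
```

The writer only needs to know the base-class interface, so a new output format can be supported by supplying a different subclass without changing the reading stage.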

In [None]:
data_dumper_class = CTAMLDataDumper

dumper_settings = {
    'filter_settings': {
        'complib': 'lzo',
        'complevel': 1
    },
    'expected_tel_types': 10,
    'expected_tels': 300,
    'expected_events': 10000,
    'expected_images_per_event': {
        'LST:LSTCam': 0.5,
        'MST:NectarCam': 2.0,
        'MST:FlashCam': 2.0,
        'MST-SCT:SCTCam': 1.5,
        'SST:DigiCam': 1.25,
        'SST:ASTRICam': 1.25,
        'SST:CHEC': 1.25
    },
    'index_columns': [
        ['/Events', 'mc_energy'],
        ['/Events', 'alt'],
        ['/Events', 'az'],
        ['tel', 'event_index']
    ]
}

Finally, we set some general settings relating to the writer as a whole. Note that all of these settings can also be specified, like the runlist, using an external YAML config file.

In [None]:
writer_settings = {
    'calibration_settings': {
       'r1_product': 'HESSIOR1Calibrator',
       'extractor_product': 'NeighborPeakIntegrator'
    },
    'output_file_size': 1073741824,  # 1024**3 (1 GiB, assuming the unit is bytes)
    'events_per_file': 1000
}

All that remains is to instantiate our DL1DataWriter and call process_data with our runlist. After a brief wait, the output files requested in the runlist should be written!

In [None]:
data_writer = DL1DataWriter(event_source_class=None,
                            event_source_settings=event_src_settings,
                            data_dumper_class=data_dumper_class,
                            data_dumper_settings=dumper_settings,
                            **writer_settings)

data_writer.process_data(runlist)

We can verify that the output files were created and see their sizes:

In [None]:
import os

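for run in runlist:
    # 'target' is a single output path (a string), not a list of paths,
    # so we must not iterate over it character by character
    output_file = run['target']
    size = os.path.getsize(output_file)
    print("File: {}, Size: {} bytes".format(output_file, size))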
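Raw byte counts become hard to read for large files, such as values near the 1073741824 used for output_file_size earlier. A minimal sketch of a formatter follows; `format_size` is a hypothetical helper, not part of dl1_data_handler, and it uses binary units (1 KiB = 1024 bytes).

```python
def format_size(num_bytes):
    """Format a byte count using binary units (1 KiB = 1024 bytes)."""
    size = float(num_bytes)
    for unit in ("B", "KiB", "MiB", "GiB"):
        # Stop at the first unit that fits, or at the largest unit we know
        if size < 1024 or unit == "GiB":
            return "{:.1f} {}".format(size, unit)
        size /= 1024


print(format_size(1073741824))
```

The result of `os.path.getsize` can be passed straight to `format_size` when printing the file sizes.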

## Reading Data

Having created some sample data files, we can now use DL1DataReader to read examples from them and examine their contents.

In [2]:
from dl1_data_handler.reader import DL1DataReader

In [3]:
reader = DL1DataReader(["/home/bryankim96/dl1_data_handler_demo/gamma_20deg_0deg_runs100-103___cta-prod3-demo_desert-2150m-Paranal-baseline-sample_cone10.h5"])

ValueError: cannot reshape array of size 580644 into shape (118,118)
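The numbers in the error message can be checked with plain arithmetic. This sketch uses only the two values quoted in the message and does not identify the cause of the error; it only shows why a reshape between these sizes cannot succeed.

```python
import math

array_size = 580644        # array size reported in the error message
target_shape = (118, 118)  # target shape reported in the error message

target_size = target_shape[0] * target_shape[1]
side = math.isqrt(array_size)

# A reshape requires the total number of elements to match exactly
print("target size:", target_size)
print("sizes match:", array_size == target_size)
print("perfect square:", side * side == array_size, "side:", side)
```

The array holds 580644 elements, which is 762 x 762, while a (118, 118) array holds only 13924, so the data being reshaped does not have the size the target shape expects.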