# Example Usage of Minitrees Containing Array Branches

This example shows how to include arrays (e.g. of peak- or event-level data) in a minitree. It extracts a few properties of the N largest peaks in each event and stores them in array-valued fields of the dataframe. Note that array fields are already supported when minitrees are pickled (i.e. if you use the `hax.init` option `preferred_minitree_format='pklz'`).
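As a rough illustration of why pickling copes with array fields, the plain-pandas sketch below stores a dataframe whose `area` column holds one NumPy array per event, writes it with gzip-compressed pickling and reads it back. The toy values, the file name, and the assumption that `pklz` means a gzip-compressed pickle are for illustration only; this is not hax's own minitree writer.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Two events, each with a variable-length array of peak areas (toy values)
df = pd.DataFrame({
    'nb_peaks': [3, 1],
    'area': [np.array([120.0, 35.5, 2.1]), np.array([88.0])],
})

# Pickle stores an arbitrary Python object in each cell, so array-valued cells survive.
# Assumption: 'pklz' = gzip-compressed pickle; compression is passed explicitly
# because pandas cannot infer it from the '.pklz' extension.
path = os.path.join(tempfile.mkdtemp(), 'example.pklz')
df.to_pickle(path, compression='gzip')
reloaded = pd.read_pickle(path, compression='gzip')

print(reloaded['area'][0])
```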

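```python
import os

import numpy as np
import pandas as pd
import hax

dataset = '170109_0716'  # dataset (run) name to process
file_header = '/home/jh3226/analysis/single_electron/'
minitree_header = os.path.join(file_header, 'datasets_reduced_practice/')  # directory passed to hax as a minitree path
nb_top_peaks = 5  # number of largest peaks per event to keep
```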

### Building the TreeMaker

As usual, you write a TreeMaker subclass and override its `extract_data` method, which is called on each event and returns a dictionary. The dictionary keys become the branch names and its values become the branch values. Once `uses_arrays` is set to `True` on the TreeMaker, dictionary values may also be arrays. Here I save several fields (`area`, `area_fraction_top` and `width`) as arrays for the five largest TPC peaks that are not of type `lone_hit`.
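A self-contained sketch of the same idea without hax: `FakePeak`, `FakeEvent` and the stand-alone `extract_data` function are hypothetical stand-ins for pax objects and the TreeMaker method, and the plain `pd.DataFrame` call only imitates how each returned dictionary ends up as one row with one column per key. It is not how hax assembles the minitree internally.

```python
import numpy as np
import pandas as pd


# Hypothetical stand-ins for pax peak/event objects, for illustration only
class FakePeak:
    def __init__(self, area):
        self.area = area


class FakeEvent:
    def __init__(self, start_time, peaks):
        self.start_time = start_time
        self.peaks = peaks


def extract_data(event):
    """Return one row: scalar fields plus an array-valued 'area' field."""
    areas = np.array([p.area for p in event.peaks])
    return {'time': event.start_time, 'nb_peaks': len(areas), 'area': areas}


events = [
    FakeEvent(1000, [FakePeak(50.0), FakePeak(3000.0)]),
    FakeEvent(2000, []),  # an event with no peaks gives an empty array
]

# Each dictionary becomes one row; its keys become the column (branch) names
df = pd.DataFrame([extract_data(ev) for ev in events])
print(df.columns.tolist())
```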

Note also that array fields can't currently hold strings, so I convert each peak's `type` attribute to an integer code.
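The integer coding is easy to undo after loading. The sketch below reuses the same `type_ints` mapping as the TreeMaker and builds its inverse; the example list of peak types is made up.

```python
import numpy as np

# Same mapping as in the TreeMaker (lone_hit peaks are filtered out before coding)
type_ints = {'s1': 1, 's2': 2, 'unknown': 3}
# Inverse mapping to turn stored codes back into type strings after loading
int_types = {code: name for name, code in type_ints.items()}

peak_types = ['s2', 's1', 'unknown']  # e.g. the types of the selected peaks
typecode = np.array([type_ints[t] for t in peak_types])
decoded = [int_types[int(c)] for c in typecode]
print(typecode, decoded)
```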

```python
class TopNPeaks(hax.minitrees.TreeMaker):
    """Store a few properties of the nb_top_peaks largest TPC peaks per event as array fields."""

    __version__ = '0.0.1'
    uses_arrays = True  # this tells hax to allow array fields

    def extract_data(self, event):
        peak_field_namelist = ['area', 'area_fraction_top', 'width']
        # keep only TPC peaks that are not lone hits
        peaks = [peak for peak in event.peaks
                 if peak.type != 'lone_hit' and peak.detector == 'tpc']
        nb_peaks = len(peaks)
        result = {}

        # save data in single-valued fields as usual
        result['nb_peaks'] = nb_peaks
        result['time'] = event.start_time

        # get indices of the (up to) nb_top_peaks largest-area peaks, in peak order
        areas = np.array([peak.area for peak in peaks])
        if len(areas) >= nb_top_peaks:
            top_indices = np.sort(np.argpartition(areas, -nb_top_peaks)[-nb_top_peaks:])
        else:
            # use an array rather than range() so it can be stored as an array field
            top_indices = np.arange(len(areas))

        # fill array fields with e.g. peak-level info
        for peak_field in peak_field_namelist:
            if nb_peaks == 0:
                # allow for the case of zero peaks
                result[peak_field] = []
            elif hasattr(peaks[0], peak_field):
                result[peak_field] = np.array([getattr(peaks[i], peak_field) for i in top_indices])
            elif peak_field == 'width':
                # no 'width' attribute: use the 50% area width (range_area_decile[5]) instead
                result[peak_field] = np.array([list(peaks[i].range_area_decile)[5] for i in top_indices])
            else:
                raise ValueError("Field %s doesn't exist" % peak_field)

        # arrays can't hold strings, so convert the peak type to integer codes
        type_ints = {'s1': 1, 's2': 2, 'unknown': 3}
        result['typecode'] = [type_ints[peaks[i].type] for i in top_indices]
        result['peak_index'] = top_indices
        return result


hax.init(experiment='XENON1T', minitree_paths=[minitree_header])
data = hax.minitrees.load(dataset, treemakers=[TopNPeaks])
data.head(2)
```
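To check the top-N selection on its own, the sketch below wraps the same `np.argpartition` logic used in `extract_data` in a hypothetical helper, `top_n_indices`, and runs it on toy areas.

```python
import numpy as np


def top_n_indices(areas, n):
    """Indices of the n largest entries of `areas`, sorted in ascending index order.

    Falls back to all indices when there are fewer than n entries.
    """
    areas = np.asarray(areas)
    if len(areas) >= n:
        # argpartition puts the indices of the n largest values in the last n slots (unordered)
        return np.sort(np.argpartition(areas, -n)[-n:])
    return np.arange(len(areas))


areas = [5.0, 300.0, 1.2, 42.0, 7.7, 1000.0, 0.5]
idx = top_n_indices(areas, 3)
print(idx)
```

Because the indices are sorted, the array fields list the selected peaks in the order they appear in the event, not by decreasing area, so the first element is not necessarily the largest peak.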

hax has also added new fields containing the length of each array in every event; these are needed so that the variable-length array branches can be created and filled in ROOT.
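The role of these length fields can be sketched with plain NumPy: ragged per-event arrays are stored as one flat buffer plus one length per event, and the lengths are all that is needed to split the buffer back up. This only illustrates the general idea behind variable-length branches; the names hax gives the length columns and its actual ROOT-writing code are not shown here.

```python
import numpy as np

# Variable-length per-event arrays, e.g. the 'area' field for three events (toy values)
per_event = [np.array([10.0, 2.0]), np.array([]), np.array([7.0, 3.0, 1.0])]

# One length per event plus a flat buffer is enough to store ragged data in a
# fixed-schema format; the length sizes the array for each entry.
lengths = np.array([len(a) for a in per_event])
flat = np.concatenate(per_event)

# Recover the per-event arrays by splitting the buffer at the cumulative lengths
restored = np.split(flat, np.cumsum(lengths)[:-1])
print(lengths, flat)
```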

The saved ROOT file can be reloaded through hax as usual, with `hax.minitrees.load`.