## CAGE raw energy spectrum checker

This notebook complements `energy_cal.py`.  
It loads the raw (uncalibrated) energy spectrum from a chosen set of cycle files and shows it as an interactive plot. From the plot you can read off the raw locations of the peaks and add them to `metadata/input_peaks.json` as input guesses.

Run this notebook using the `legend-base` Shifter image.  [Here are the instructions to set this up.](https://github.com/legend-exp/legend/wiki/Computing-Resources-at-NERSC)

```python
# install user prerequisites
# !pip install ipympl --user

# Use this at NERSC to get interactive plots.
%matplotlib widget

import h5py
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from pygama import DataGroup, lh5
import pygama.analysis.histograms as pgh
```

#### >>> Users, set config here! <<<
Set the DataGroup query here to select which files to load. You may want to refer to `runDB.json` to see how to vary it. This cell also sets the energy estimator of interest, the LH5 table name, and the histogram range and bin width.

```python
# fileDB query
# que = 'run==66 and cycle > 885'
que = 'run==180 and cycle > 1651'

# energy estimator of interest
# etype = 'energy'
etype = 'trapEftp'

# lh5 table name
tb_in = 'ORSIS3302DecoderForEnergy/dsp'

# uncalibrated energy range (xlo, xhi) and bin width (xpb)
# xlo, xhi, xpb = 0, 3e6, 10000  # much wider alternative range
xlo, xhi, xpb = 0, 10000, 10     # good for trapEmax and trapEftp

# load the fileDB and make sure the query selected at least one file
dg = DataGroup('cage.json', load=True)
dg.fileDB.query(que, inplace=True)
if len(dg.fileDB) == 0:
    raise ValueError('No files found. Check your query and fileDB.h5.')

ecal_cols = ['run', 'cycle', 'skip', 'runtype', 'startTime', 'threshold', 'stopTime', 'runtime']
dg.fileDB[ecal_cols]
```

| | run | cycle | skip | runtype | startTime | threshold | stopTime | runtime |
|---|---|---|---|---|---|---|---|---|
| 1650 | 180.0 | 1652 | False | bkg | 1611819000.0 | 16.0 | 1611821000.0 | 30.004036 |
| 1651 | 180.0 | 1653 | False | bkg | 1611821000.0 | 16.0 | 1611822000.0 | 30.126609 |
| 1652 | 180.0 | 1654 | False | bkg | 1611822000.0 | 16.0 | 1611824000.0 | 29.990888 |
| 1653 | 180.0 | 1655 | False | bkg | 1611824000.0 | 16.0 | 1611826000.0 | 30.01869 |
| 1654 | 180.0 | 1656 | False | bkg | 1611826000.0 | 16.0 | 1611828000.0 | 29.990494 |
| 1655 | 180.0 | 1657 | False | bkg | 1611828000.0 | 16.0 | 1611830000.0 | 30.110609 |
| 1656 | 180.0 | 1658 | False | bkg | 1611830000.0 | 16.0 | 1611831000.0 | 30.129549 |
| 1657 | 180.0 | 1659 | False | bkg | 1611831000.0 | 16.0 | 1611833000.0 | 30.080371 |
| 1658 | 180.0 | 1660 | False | bkg | 1611833000.0 | 16.0 | 1611835000.0 | 30.092051 |
| 1659 | 180.0 | 1661 | False | bkg | 1611835000.0 | 16.0 | 1611837000.0 | 30.077167 |
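
The fileDB is filtered with `.query(que, inplace=True)` and indexed with a list of columns, which suggests it behaves like a pandas DataFrame, so the query string should follow `pandas.DataFrame.query` syntax. A minimal sketch under that assumption; `file_db` is a hypothetical stand-in and its rows are made up for illustration, not taken from the CAGE fileDB:

```python
import pandas as pd

# Hypothetical stand-in for dg.fileDB (assumption: fileDB is a pandas DataFrame).
file_db = pd.DataFrame({
    'run':   [66, 66, 180, 180, 180],
    'cycle': [885, 886, 1651, 1652, 1653],
    'skip':  [False, False, False, False, True],
})

# Same syntax as the notebook's query: note that 'cycle > 1651' excludes cycle 1651 itself.
que = 'run==180 and cycle > 1651'
selected = file_db.query(que)
print(selected[['run', 'cycle']])

# Conditions on any other column can be chained, e.g. dropping cycles flagged 'skip'.
selected_noskip = file_db.query(que + ' and (not skip)')
print(selected_noskip[['run', 'cycle']])
```

The strict `>` is why the run 180 table above starts at cycle 1652.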


#### Load data
Here we use the paths stored in DataGroup's fileDB to load the DSP data for the selected files, print the columns available in the DSP table, and histogram the chosen energy estimator, normalized by the total runtime.

```python
# essentially the same code as in energy_cal.check_raw_spectrum

# load numpy arrays of uncalibrated energy
dsp_list = dg.lh5_dir + dg.fileDB['dsp_path'] + '/' + dg.fileDB['dsp_file']
raw_data = lh5.load_nda(dsp_list, [etype], tb_in, verbose=False)

# total runtime of the selected cycles (the fileDB 'runtime' column is in minutes)
runtime_min = dg.fileDB['runtime'].sum()

# print the columns available in the DSP table of the first file
with h5py.File(dsp_list.iloc[0], 'r') as hf:
    print('\nLH5 columns:', list(hf[tb_in].keys()))

# histogram energy data for this estimator and normalize by runtime
data = raw_data[etype]
hist, bins, var = pgh.get_hist(data, range=(xlo, xhi), dx=xpb)
bins = bins[1:]  # keep the right bin edges so len(bins) == len(hist), as ds='steps' expects
hist_rt = np.divide(hist, runtime_min * 60)  # counts per second

print(f'\nRaw E: {etype}, {len(data)} cts, runtime: {runtime_min:.2f} min')
```


LH5 columns: ['A_10', 'AoE', 'bl', 'bl_sig', 'bl_slope', 'channel', 'dcr', 'energy', 'hf_max', 'lf_max', 'timestamp', 'tp_0', 'tp_10', 'tp_50', 'tp_80', 'tp_90', 'tp_max', 'trapEftp', 'trapEmax', 'triE']

Raw E: trapEftp, 2600438 cts, runtime: 717.77 min
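
The histogram step relies on pygama's `get_hist`. The same bookkeeping can be sketched with plain numpy, assuming `get_hist(data, range=(xlo, xhi), dx=xpb)` builds uniform bins of width `xpb` over `[xlo, xhi]` (an assumption; check the pygama source). `rate_histogram` is a hypothetical helper. The sketch also shows why `bins[1:]` pairs with `ds='steps'`: matplotlib's `'steps'` drawstyle (same as `'steps-pre'`) draws each y value over the interval that ends at its x value, so the right bin edges line up with the counts.

```python
import numpy as np

def rate_histogram(data, xlo, xhi, dx, runtime_min):
    """Hypothetical helper: histogram `data` in uniform bins of width `dx` over
    [xlo, xhi] and convert counts to counts per second."""
    edges = np.arange(xlo, xhi + dx, dx)          # n + 1 edges (fine for integer dx)
    hist, edges = np.histogram(data, bins=edges)  # n counts
    rates = hist / (runtime_min * 60)             # runtime is in minutes -> cts/sec
    return edges[1:], rates                       # right edges pair 1:1 with rates

# Toy data standing in for raw_data[etype]
rng = np.random.default_rng(0)
fake = rng.uniform(0, 10000, size=1000)
right_edges, rates = rate_histogram(fake, 0, 10000, 10, runtime_min=30)
print(len(right_edges), len(rates))
```

With the notebook's settings (`xlo, xhi, xpb = 0, 10000, 10`) this gives 1000 bins, and summing `rates` times the runtime in seconds recovers the number of events inside the range.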


#### Create interactive spectrum

```python
%matplotlib widget
plt.semilogy(bins, hist_rt, ds='steps', c='b', lw=1, label=etype)
plt.xlabel(etype, ha='right', x=1)
plt.ylabel(f'cts/sec, {xpb}/bin', ha='right', y=1)
plt.legend()
plt.show()
```
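
Peak positions are read off the interactive plot by eye. As an optional first pass, candidate bins can also be flagged numerically. A minimal sketch, not the method used by `energy_cal.py`: `find_peak_candidates` is a hypothetical helper, and its window sizes and ratio threshold are arbitrary defaults that would need tuning. A bin is kept if it is the maximum within `window` bins on either side and exceeds `min_ratio` times the median rate within `wide` bins, a crude local-baseline estimate. The demo uses a synthetic falling background with two Gaussian peaks rather than CAGE data.

```python
import numpy as np

def find_peak_candidates(x, rates, window=5, wide=50, min_ratio=1.5):
    """Hypothetical helper: return x values of bins that are the local maximum
    within +/- `window` bins and exceed `min_ratio` times the median rate
    within +/- `wide` bins (used as a rough local baseline)."""
    x = np.asarray(x)
    rates = np.asarray(rates, dtype=float)
    n = len(rates)
    picks = []
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        if rates[i] < rates[lo:hi].max():
            continue  # not a local maximum
        wlo, whi = max(0, i - wide), min(n, i + wide + 1)
        baseline = np.median(rates[wlo:whi])
        # skip bins whose neighbourhood is mostly empty (baseline 0)
        if baseline > 0 and rates[i] >= min_ratio * baseline:
            picks.append(x[i])
    return np.array(picks)

# Synthetic spectrum on right bin edges, like `bins` after bins[1:]
x = np.arange(10, 10001, 10)
bkg = 100 * np.exp(-x / 2000)
peaks = 50 * (np.exp(-(x - 1460)**2 / (2 * 20**2))
              + np.exp(-(x - 2610)**2 / (2 * 20**2)))
print(find_peak_candidates(x, bkg + peaks))
```

On the notebook's own arrays this would be called as `find_peak_candidates(bins, hist_rt)`; any candidates it returns should still be checked against the interactive plot before going into `metadata/input_peaks.json`.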
