**Luca Riitano 6/10/21**

Two algorithms were tried to determine if waveforms were split. The first uses the derivative at block boundaries as a metric. If at least one channel in an ASIC contained a split waveform, we assume the event was split for the entire ASIC. The second uses the peak position of waveforms as a metric. Split waveforms tend to peak very early (~sample 30) or slightly later (sample ~80-90) than clean waveforms. The average peak position of waveforms in all the channels of an ASIC is used to determine the split status. Waveforms that peak at the first or last sample or have peak values of less than 500 ADC are ignored in both algorithms. Note that any missing data will be represented by None.

In [1]:
#Import json to handle json files
import json

In [2]:
#This is how we load the json file
#Load the raw numbers
with open('split_dict_raw.json', 'r') as file:
    data_raw = json.load(file)
#Load the split status
with open('split_dict_status.json', 'r') as file:
    data_status = json.load(file)

The file titled 'split_dict_raw.json' contains the raw number of events that are classified as clean or split by either algorithm. The file contains a multi-level dictionary with keys corresponding to module numbers, ASIC numbers, and run numbers. The value that corresponds to a run number key has an array of dictionaries. Each dictionary in the array corresponds to a subrun. The dictionaries contain the information on the raw event numbers. The example below should make things more clear. 

In [3]:
#Raw numbers example
#Dictionary with modules as keys
module_5 = data_raw['5']
#Dictionary with ASIC's as keys
module_5_asic_0 = data_raw['5']['0']
#Dictionary with runs as keys
module_5_asic_0_run_328555 = data_raw['5']['0']['328555']
print(f"There are {len(module_5_asic_0_run_328555)} subruns in this run: {module_5_asic_0_run_328555}")

There are 31 subruns in this run: [{'split_split': 0.0, 'split_clean': 0.0, 'clean_split': 0.0, 'clean_clean': 1.0, 'total': 1.0}, {'split_split': 0.0, 'split_clean': 251.0, 'clean_split': 4.0, 'clean_clean': 839.0, 'total': 1094.0}, {'split_split': 0.0, 'split_clean': 263.0, 'clean_split': 6.0, 'clean_clean': 847.0, 'total': 1116.0}, {'split_split': 0.0, 'split_clean': 255.0, 'clean_split': 1.0, 'clean_clean': 841.0, 'total': 1097.0}, {'split_split': 0.0, 'split_clean': 228.0, 'clean_split': 3.0, 'clean_clean': 874.0, 'total': 1105.0}, {'split_split': 0.0, 'split_clean': 264.0, 'clean_split': 3.0, 'clean_clean': 841.0, 'total': 1108.0}, {'split_split': 0.0, 'split_clean': 248.0, 'clean_split': 2.0, 'clean_clean': 858.0, 'total': 1108.0}, {'split_split': 0.0, 'split_clean': 254.0, 'clean_split': 7.0, 'clean_clean': 861.0, 'total': 1122.0}, {'split_split': 2.0, 'split_clean': 276.0, 'clean_split': 1.0, 'clean_clean': 828.0, 'total': 1107.0}, {'split_split': 1.0, 'split_clean': 300.0, 'c

In [4]:
#Array of subruns
module_5_asic_0_run_328555_second_subrun = data_raw['5']['0']['328555'][1]
print(module_5_asic_0_run_328555_second_subrun)

{'split_split': 0.0, 'split_clean': 251.0, 'clean_split': 4.0, 'clean_clean': 839.0, 'total': 1094.0}


Note that the first four keys have two words in the key name. The first word is the classification according to the derivative algorithm. The second word is the classification according to the peak position algorithm. The last key is the total number of events used in the subrun.

In [5]:
super_clean_perc = module_5_asic_0_run_328555_second_subrun['clean_clean'] / module_5_asic_0_run_328555_second_subrun['total']
print(f"Fraction of events in this subrun that are very likely clean: {super_clean_perc}")

Fraction of events in this subrun that are very likely clean: 0.7669104204753199


The file titled 'split_dict_status.json' contains the split classification of runs for a given ASIC. It seems that an entire run of data for a given ASIC is either highly split or devoid of splits and therefore can be classified as split or clean. The file contains a multi-level dictionary with keys corresponding to module numbers, ASIC numbers, and run numbers. The value that corresponds to a run number key is a dictionary with three values. The keys for the lowest level dictionary are 'derivative', 'peak', and 'best', corresponding to the split status of the run according to the derivative algorithm, the peak position algorithm, and the best guess, respectively. Keep in mind that True corresponds to split and False corresponds to clean. The example below should make things more clear.

In [6]:
#Run status example
#Dictionary with modules as keys
module_5 = data_status['5']
#Dictionary with ASIC's as keys
module_5_asic_0 = data_status['5']['0']
#Dictionary with runs as keys
module_5_asic_0_run_328555 = data_status['5']['0']['328555']
print(module_5_asic_0_run_328555)

{'derivative': True, 'peak': False, 'best': False}


In [7]:
if data_status['5']['0']['328557']['best']:
    print(f"Best guess is that this run is split.")
else:
    print(f"Best guess is that this run is clean.")

Best guess is that this run is clean.


**List of Runs**

List of Crab runs: 328555, 328557, 328564, 328565, 328567, 328569, 328572, 328573, 328574, 328581, 328583,
         328585, 328592, 328597, 328599, 328606, 328608, 328610, 328615, 328617, 328619, 328627,
         328629, 328630, 328631, 328640, 328642, 328646, 328700, 328717, 328733, 328748, 328750,
         328761, 328770, 328772, 328781, 328792, 328794, 328821, 328846, 328854, 328856, 328865,
         328867 
         
List of Markarian runs: 331543, 331549, 331550, 331551, 331552, 331653, 331654, 331655, 331659, 331661,
         331663, 331664, 331675, 331676, 331760, 331761, 331762, 331775, 331776, 331779, 331780, 331784,
         331787, 331789, 331792, 331798, 331799, 331816, 331817, 331818, 331819, 331822, 331823, 331828,
         331831, 331834, 331838, 331843, 331844, 331847, 331848, 331851, 331857, 331859, 331860, 331861,
         331862, 331865, 331866, 331868, 331869, 331870

**Practical Example**

The function below is useful for determining the module and ASIC number of a given pixel. Entering a pixel number returns the pixel number of all the pixels in the ASIC, the ASIC number, and the module number. The pixel number here starts at zero in the bottom left corner of the camera in sky view and increments by one moving left to right. The first row in sky view contains pixel numbers 0-39, the second 40-79, and so on.

In [8]:
def loc_select(pix):
    """
    Accepts a pixel index as an input and returns all the pixel in the same ASIC as well as the module and ASIC number.
    The pixel index is determined by counting off pixels left to right, then bottom to top from the sky view of the
    camera.
    """
    
    #Determine the pixel, ASIC, and module row and column that the pixel resides in
    pix_row = pix // 40
    pix_col = pix % 40
    asic_row = pix_row // 4
    asic_col = pix_col // 4
    mod_row = asic_row // 2
    mod_col = asic_col // 2
    
    #Populate an array with the index of pixels in the same ASIC
    result = []
    count = asic_row * 160
    count += asic_col * 4
    for i in range(4):
        for j in range(4):
            result.append(count)
            count += 1
        count += 36
    
    #Find the ASIC number (recall that each module column is rotated 180 degrees)
    if mod_col % 2 == 0:
        asic_num = (3 - (2 * (asic_row % 2))) - (1 - (asic_col % 2))
    else:
        asic_num = (2 * (asic_row % 2)) + (1 - (asic_col % 2))
        
    #Module ordering in sky view
    mod_location = [[6,107,114,111,100],
    [7,112,124,123,115],
    [8,121,110,108,119],
    [9,106,126,125,103],
    [2,3,1,5,4]]
    
    #Get the module number
    mod_num = mod_location[mod_row][mod_col]
    
    return result, asic_num, mod_num

Say we want to find the percentage of events across the camera for a run that are clean. Here is how we would determine that.

In [9]:
pixel_list = []
asic_list = []
mod_list = []
#Loop through ASIC's
for n in range(100):
    #Choose a pixel in each each ASIC. Here we use the pixel in the bottom left corner.
    #Increase pixel number by 160 every 10 loops (go up an ASIC row) and 4 every loop (go across an ASIC column)
    pix = ((n // 10) * 160) + (4 * (n % 10))
    pixel_list.append(pix)
    
    #Find the corresponding ASIC and Module number
    r, a, m = loc_select(pix)
    asic_list.append(a)
    mod_list.append(m)
print(pixel_list, asic_list, mod_list)

[0, 4, 8, 12, 16, 20, 24, 28, 32, 36, 160, 164, 168, 172, 176, 180, 184, 188, 192, 196, 320, 324, 328, 332, 336, 340, 344, 348, 352, 356, 480, 484, 488, 492, 496, 500, 504, 508, 512, 516, 640, 644, 648, 652, 656, 660, 664, 668, 672, 676, 800, 804, 808, 812, 816, 820, 824, 828, 832, 836, 960, 964, 968, 972, 976, 980, 984, 988, 992, 996, 1120, 1124, 1128, 1132, 1136, 1140, 1144, 1148, 1152, 1156, 1280, 1284, 1288, 1292, 1296, 1300, 1304, 1308, 1312, 1316, 1440, 1444, 1448, 1452, 1456, 1460, 1464, 1468, 1472, 1476] [2, 3, 1, 0, 2, 3, 1, 0, 2, 3, 0, 1, 3, 2, 0, 1, 3, 2, 0, 1, 2, 3, 1, 0, 2, 3, 1, 0, 2, 3, 0, 1, 3, 2, 0, 1, 3, 2, 0, 1, 2, 3, 1, 0, 2, 3, 1, 0, 2, 3, 0, 1, 3, 2, 0, 1, 3, 2, 0, 1, 2, 3, 1, 0, 2, 3, 1, 0, 2, 3, 0, 1, 3, 2, 0, 1, 3, 2, 0, 1, 2, 3, 1, 0, 2, 3, 1, 0, 2, 3, 0, 1, 3, 2, 0, 1, 3, 2, 0, 1] [6, 6, 107, 107, 114, 114, 111, 111, 100, 100, 6, 6, 107, 107, 114, 114, 111, 111, 100, 100, 7, 7, 112, 112, 124, 124, 123, 123, 115, 115, 7, 7, 112, 112, 124, 124, 123, 123, 115, 1

In [10]:
#Count the number of events that are considered clean by both algorithms
clean_count = 0
#Count the total number of events
total_count = 0
#The run we're interested in
run = 328555
#Loop through the ASIC's
for n, mod in enumerate(mod_list):
    #Check if we have data for that run and ASIC combination
    try:
        subruns = data_raw[str(mod)][str(asic_list[n])][str(run)]
    except:
        print(f"Module {mod}, ASIC {asic_list[n]} has no data for run {run}.")
    #Loop through the subruns
    for m, sub in enumerate(subruns):
        clean_count += sub['clean_clean']
        total_count += sub['total']

#Print results
print(f"Run {run} has {100 * clean_count / total_count}% of events that are very likely clean")    

Module 110, ASIC 2 has no data for run 328555.
Module 110, ASIC 3 has no data for run 328555.
Module 110, ASIC 0 has no data for run 328555.
Module 110, ASIC 1 has no data for run 328555.
Run 328555 has 35.05105367139205% of events that are very likely clean


In [11]:
print(f"Module 110 is the empty module and will always contain {data_raw['110']}.")

Module 110 is the empty module and will always contain None.
