# Investigate afterpulse by day

I currently expect that, on each day with an afterpulse calibration, there are 3 files that are possible candidates for being the afterpulse calibration file:

1. Afterpulse file with closed aperture (desired file)
2. File where aperture is open and data is collected
3. File where data collection resumes

I would like to investigate the order in which these are taken, so that I can use them more reliably in my work.

## Loading the filenames

In [1]:
import mplgz_to_ingested as mplgz
import xarray as xr
import matplotlib.pyplot as plt

In [2]:
dir_root = '/gws/nopw/j04/ncas_radar_vol2/data/ICECAPSarchive/mpl/raw'
candidateFiles = mplgz.afterpulse.get_all_afterpulse(dir_root)

print(candidateFiles)

['201301090011.mpl.gz', '201508161226.mpl.gz', '201508161955.mpl.gz', '201508162015.mpl.gz', '201601192225.mpl.gz', '201601261328.mpl.gz', '201601261338.mpl.gz', '201601261351.mpl.gz', '201602241305.mpl.gz', '201602241314.mpl.gz', '201602241335.mpl.gz', '201603231417.mpl.gz', '201603231426.mpl.gz', '201603231440.mpl.gz', '201604271549.mpl.gz', '201604271605.mpl.gz', '201604271615.mpl.gz', '201605271732.mpl.gz', '201605271741.mpl.gz', '201605271752.mpl.gz', '201605310003.mpl.gz', '201606011603.mpl.gz', '201606051811.mpl.gz', '201606052039.mpl.gz', '201606052250.mpl.gz', '201606251729.mpl.gz', '201606251739.mpl.gz', '201606251749.mpl.gz', '201607251715.mpl.gz', '201607251726.mpl.gz', '201607251736.mpl.gz', '201608301729.mpl.gz', '201609071535.mpl.gz', '201609071547.mpl.gz', '201609071558.mpl.gz', '201610251342.mpl.gz', '201610251350.mpl.gz', '201611291428.mpl.gz', '201611291436.mpl.gz', '201611291446.mpl.gz', '201612281738.mpl.gz', '201612281749.mpl.gz', '201612281758.mpl.gz', '201701301

In [4]:
# the calibration files start in earnest from 2016 onwards, so I'll drop the first 4 elements of the list
tocheck = candidateFiles[4:]
print(tocheck)

['201601192225.mpl.gz', '201601261328.mpl.gz', '201601261338.mpl.gz', '201601261351.mpl.gz', '201602241305.mpl.gz', '201602241314.mpl.gz', '201602241335.mpl.gz', '201603231417.mpl.gz', '201603231426.mpl.gz', '201603231440.mpl.gz', '201604271549.mpl.gz', '201604271605.mpl.gz', '201604271615.mpl.gz', '201605271732.mpl.gz', '201605271741.mpl.gz', '201605271752.mpl.gz', '201605310003.mpl.gz', '201606011603.mpl.gz', '201606051811.mpl.gz', '201606052039.mpl.gz', '201606052250.mpl.gz', '201606251729.mpl.gz', '201606251739.mpl.gz', '201606251749.mpl.gz', '201607251715.mpl.gz', '201607251726.mpl.gz', '201607251736.mpl.gz', '201608301729.mpl.gz', '201609071535.mpl.gz', '201609071547.mpl.gz', '201609071558.mpl.gz', '201610251342.mpl.gz', '201610251350.mpl.gz', '201611291428.mpl.gz', '201611291436.mpl.gz', '201611291446.mpl.gz', '201612281738.mpl.gz', '201612281749.mpl.gz', '201612281758.mpl.gz', '201701301635.mpl.gz', '201701301644.mpl.gz', '201701301704.mpl.gz', '201702271235.mpl.gz', '201702281
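
The printed names look like they encode a timestamp as `YYYYMMDDHHMM`. That is an assumption read off the list above, not something confirmed from the `mplgz` module. Under that assumption, a minimal sketch for turning names into `datetime` objects and measuring the gaps between the candidates of one day could look like this (`parse_mpl_time` is a hypothetical helper, not part of `mplgz`):

```python
from datetime import datetime

def parse_mpl_time(name):
    # Hypothetical helper: assumes the first 12 characters are YYYYMMDDHHMM.
    return datetime.strptime(name[:12], '%Y%m%d%H%M')

# The three candidates listed above for 2016-01-26.
day_files = ['201601261328.mpl.gz', '201601261338.mpl.gz', '201601261351.mpl.gz']

# Sort by time, then take the difference between neighbouring files in minutes.
day_times = sorted(parse_mpl_time(f) for f in day_files)
gaps_min = [(b - a).total_seconds() / 60 for a, b in zip(day_times, day_times[1:])]

print(gaps_min)  # [10.0, 13.0]
```

Working with `datetime` objects rather than string slices makes it easier to compare files in time order later on.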

## Number of files per calibration day

This section determines how many files are generated per day on days when calibrations are performed. This shows whether edge cases occur, such as files on the hour or calibrations that span 2 days.
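
The edge cases mentioned here can also be flagged directly. The sketch below is only an illustration on a handful of names copied from the candidate list. It assumes the names start with `YYYYMMDDHHMM`, and it interprets "on the hour" as a minute field of `00`; both are assumptions. `file_time` is a hypothetical helper, not part of `mplgz`.

```python
from collections import Counter
from datetime import datetime, timedelta

# A few names copied from the candidate list.
sample_files = [
    '201601261328.mpl.gz', '201601261338.mpl.gz', '201601261351.mpl.gz',
    '201605310003.mpl.gz', '201606011603.mpl.gz',
]

def file_time(name):
    # Hypothetical helper: assumes the first 12 characters are YYYYMMDDHHMM.
    return datetime.strptime(name[:12], '%Y%m%d%H%M')

sample_times = [file_time(f) for f in sample_files]

# Number of candidate files on each calendar day.
files_per_day = Counter(t.date() for t in sample_times)

# Days whose previous calendar day also has candidates:
# a hint that one calibration may span two days.
follows_previous_day = sorted(
    d for d in files_per_day if d - timedelta(days=1) in files_per_day
)

# Files whose minute field is 00 (assumed meaning of "on the hour").
on_the_hour = [f for f, t in zip(sample_files, sample_times) if t.minute == 0]

print(files_per_day)
print(follows_previous_day)
print(on_the_hour)
```

Keying the counter on `date` objects means a month or year boundary (for example 31 May to 1 June) is handled by the calendar arithmetic rather than by comparing strings.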

In [8]:
# Group the files in tocheck by year, month and day using nested dicts, then count the files per day and print the result.
tocheck = candidateFiles[4:]

n_per_day = {}

for f in tocheck:
    year = f[:4]
    if year not in n_per_day:
        n_per_day[year] = {}

    month = f[4:6]
    if month not in n_per_day[year]:
        n_per_day[year][month] = {}

    day = f[6:8]
    if day not in n_per_day[year][month]:
        n_per_day[year][month][day] = 0

    n_per_day[year][month][day] += 1

# print results
# dist[i] counts the days that have i+1 files (assumes at most 7 files per day)
dist = [0]*7

print(f' {"year":<6} | {"month":<6} | {"day":<6} | num')
for year in n_per_day:
    for month in n_per_day[year]:
        for day in n_per_day[year][month]:
            print(f' {year:<6} | {month:<6} | {day:<6} | {n_per_day[year][month][day]}')

            dist[n_per_day[year][month][day] - 1] +=1
    print('') # blank line to separate years

print('Distribution:')
print(' 1  | 2  | 3  | 4  | 5  | 6  | 7')
print(f' {dist[0]:<3}| {dist[1]:<3}| {dist[2]:<3}| {dist[3]:<3}| {dist[4]:<3}| {dist[5]:<3}| {dist[6]:<3}')

 year   | month  | day    | num
 2016   | 01     | 19     | 1
 2016   | 01     | 26     | 3
 2016   | 02     | 24     | 3
 2016   | 03     | 23     | 3
 2016   | 04     | 27     | 3
 2016   | 05     | 27     | 3
 2016   | 05     | 31     | 1
 2016   | 06     | 01     | 1
 2016   | 06     | 05     | 3
 2016   | 06     | 25     | 3
 2016   | 07     | 25     | 3
 2016   | 08     | 30     | 1
 2016   | 09     | 07     | 3
 2016   | 10     | 25     | 2
 2016   | 11     | 29     | 3
 2016   | 12     | 28     | 3

 2017   | 01     | 30     | 3
 2017   | 02     | 27     | 1
 2017   | 02     | 28     | 2
 2017   | 03     | 01     | 1
 2017   | 03     | 28     | 3
 2017   | 04     | 27     | 3
 2017   | 05     | 19     | 1
 2017   | 05     | 20     | 4
 2017   | 06     | 20     | 3
 2017   | 06     | 28     | 1
 2017   | 07     | 20     | 3
 2017   | 08     | 21     | 3
 2017   | 09     | 19     | 3
 2017   | 10     | 19     | 3
 2017   | 11     | 23     | 3
 2017   | 12     | 19     | 3

 2018 