## Pandas analysis

The following exercises use a dataset containing timing information from a series of Time-to-Digital Converters (TDCs) implemented in a pair of FPGAs. Each measurement (i.e. each row) consists of the address of the TDC providing the signal, 'FPGA' and 'TDC_CHANNEL', and the timing information itself, 'ORBIT_CNT', 'BX_COUNTER' and 'TDC_MEAS'. Each TDC count corresponds to 25/30 ns, whereas the BX_COUNTER is updated every 25 ns and the ORBIT_CNT every 'x' BX_COUNTER steps. You can think of this way of storing time as similar to hours, minutes and seconds.
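The hours/minutes/seconds analogy can be made concrete with a minimal sketch. The function name `to_ns` is hypothetical, and `x = 100` below is a made-up value used only for illustration; the real value of 'x' is what exercise 2 asks for.

```python
def to_ns(orbit_cnt, bx_counter, tdc_meas, x):
    """Combine the three counters into a single time in ns.

    x is the number of BX_COUNTER steps per ORBIT_CNT step.
    """
    bx_ns = 25.0          # one BX_COUNTER step lasts 25 ns
    tdc_ns = 25.0 / 30.0  # one TDC count lasts 25/30 ns
    return (orbit_cnt * x + bx_counter) * bx_ns + tdc_meas * tdc_ns

# Made-up counter values, with x = 100 chosen only for illustration
example = to_ns(orbit_cnt=2, bx_counter=3, tdc_meas=15, x=100)
```

Here the orbit and bunch-crossing counters contribute (2*100 + 3)*25 ns and the TDC counts add 15*25/30 ns on top.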

1\. Create a Pandas DataFrame by reading N rows of the 'data_000637.txt' dataset. Choose N to be smaller than or equal to the maximum number of rows and larger than 10k.

2\. Find out the value of 'x'

3\. Find out how long the data taking lasted. You can either make an estimate on the basis of the fraction of the measurements (rows) you read, or perform this check precisely by reading out the whole dataset

4\. Create a new column with the actual time in ns (as a combination of the other three columns with timing information)

5\. Replace the values (all 1) of the HEAD column randomly with 0 or 1

6\. Create a new DataFrame with only the rows with HEAD=1

7\. Make two occupancy plots (one per FPGA), i.e. plot the number of counts per TDC channel

8\. Use the groupby method to find out the noisy channels, i.e. the TDC channels with most counts (say the top 3)

9\. Count the number of unique orbits. Count the number of unique orbits with at least one measurement from TDC_CHANNEL=139

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

1\. Create a Pandas DataFrame by reading N rows of the 'data_000637.txt' dataset. Choose N to be smaller than or equal to the maximum number of rows and larger than 10k.

In [None]:
df = pd.read_csv("./data/data_000637.txt")
df
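The cell above reads the whole file, which corresponds to choosing N equal to the maximum number of rows. A smaller N could be selected with the `nrows` argument of `pd.read_csv`. The sketch below uses a tiny in-memory stand-in for the file; its comma-separated layout, column order and values are assumptions made for illustration.

```python
import io
import pandas as pd

# Tiny in-memory stand-in for 'data_000637.txt' (layout and values are made up)
csv_text = """HEAD,FPGA,TDC_CHANNEL,ORBIT_CNT,BX_COUNTER,TDC_MEAS
1,0,123,100,2374,26
1,0,124,100,2374,27
1,1,63,100,2553,28
1,0,64,101,2558,19
1,1,139,101,2560,0
"""

N = 3  # rows to read; the exercise asks for N > 10k on the real file
df_small = pd.read_csv(io.StringIO(csv_text), nrows=N)
n_rows = len(df_small)
```

On the real file the same call would take the file path in place of the `io.StringIO` object.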

2\. Find out the value of 'x'

In [None]:
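# BX_COUNTER starts at 0, so x is its largest value plus one (assuming that value occurs in the rows read)
numb_bx = df['BX_COUNTER'].max() + 1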

3\. Find out how long the data taking lasted. You can either make an estimate on the basis of the fraction of the measurements (rows) you read, or perform this check precisely by reading out the whole dataset

In [None]:
df_0 = df[df['FPGA']==0]
df_1 = df[df['FPGA']==1]

`o` is the ORBIT_CNT contribution in ns (ORBIT_CNT is updated every `numb_bx` BX_COUNTER steps of 25 ns each).

`b` is the BX_COUNTER contribution in ns (BX_COUNTER is updated every 25 ns).

`t` is the TDC_MEAS contribution in TDC counts (converted to ns with the factor 25/30 in the final sum).

For each contribution I take the difference between the maximum and the minimum value, which removes the unknown offset, since I don't know where time 0 is.
In my implementation, time 0 is the minimum time in the dataset.

In [None]:
#TIME FOR FPGA0
o = (df_0['ORBIT_CNT'].max()-df_0['ORBIT_CNT'].min())*numb_bx*25 
b = (df_0[df_0['ORBIT_CNT'] == df_0['ORBIT_CNT'].max()]['BX_COUNTER'].max()-df_0[df_0['ORBIT_CNT'] == df_0['ORBIT_CNT'].min()]['BX_COUNTER'].min())*25

temp = df_0[df_0['ORBIT_CNT'] == df_0['ORBIT_CNT'].max()]
upper = temp[temp['BX_COUNTER']== temp['BX_COUNTER'].max()]['TDC_MEAS'].max()

temp = df_0[df_0['ORBIT_CNT'] == df_0['ORBIT_CNT'].min()]
lower = temp[temp['BX_COUNTER']== temp['BX_COUNTER'].min()]['TDC_MEAS'].min()

t = upper - lower
o+b+t*25/30

In [None]:
#TIME FOR FPGA1
o = (df_1['ORBIT_CNT'].max()-df_1['ORBIT_CNT'].min())*numb_bx*25
b = (df_1[df_1['ORBIT_CNT'] == df_1['ORBIT_CNT'].max()]['BX_COUNTER'].max()-df_1[df_1['ORBIT_CNT'] == df_1['ORBIT_CNT'].min()]['BX_COUNTER'].min())*25 

temp = df_1[df_1['ORBIT_CNT'] == df_1['ORBIT_CNT'].max()]
upper = temp[temp['BX_COUNTER']== temp['BX_COUNTER'].max()]['TDC_MEAS'].max()

temp = df_1[df_1['ORBIT_CNT'] == df_1['ORBIT_CNT'].min()]
lower = temp[temp['BX_COUNTER']== temp['BX_COUNTER'].min()]['TDC_MEAS'].min()
t = upper - lower

o + b + t*25/30

In [None]:
#TIME FOR all dataset
o = (df['ORBIT_CNT'].max()-df['ORBIT_CNT'].min())*numb_bx*25
b = (df[df['ORBIT_CNT'] == df['ORBIT_CNT'].max()]['BX_COUNTER'].max()-df[df['ORBIT_CNT'] == df['ORBIT_CNT'].min()]['BX_COUNTER'].min())*25 

temp = df[df['ORBIT_CNT'] == df['ORBIT_CNT'].max()]
upper = temp[temp['BX_COUNTER']== temp['BX_COUNTER'].max()]['TDC_MEAS'].max()

temp = df[df['ORBIT_CNT'] == df['ORBIT_CNT'].min()]
lower = temp[temp['BX_COUNTER']== temp['BX_COUNTER'].min()]['TDC_MEAS'].min()
t = upper - lower
o+b+t*25/30
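The three cells above repeat the same computation on `df_0`, `df_1` and `df`. As an alternative sketch, not the method used above, the span could be computed by one helper that builds the time of each row and subtracts the minimum from the maximum. The name `duration_ns`, the toy DataFrame and `toy_numb_bx = 100` are all hypothetical.

```python
import pandas as pd

def duration_ns(frame, numb_bx):
    """Span in ns between the earliest and the latest hit in `frame`."""
    # Total number of 25 ns steps, counted from orbit 0
    bx_total = frame['ORBIT_CNT'] * numb_bx + frame['BX_COUNTER']
    time_ns = bx_total * 25 + frame['TDC_MEAS'] * 25 / 30
    return time_ns.max() - time_ns.min()

# Made-up rows, only to exercise the helper
toy = pd.DataFrame({
    'FPGA':       [0, 1, 0, 1],
    'ORBIT_CNT':  [10, 10, 11, 12],
    'BX_COUNTER': [5, 7, 0, 3],
    'TDC_MEAS':   [0, 15, 6, 30],
})
toy_numb_bx = 100  # made-up value, not the one found in exercise 2

total = duration_ns(toy, toy_numb_bx)
per_fpga = {fpga: duration_ns(group, toy_numb_bx)
            for fpga, group in toy.groupby('FPGA')}
```

Because the offset cancels in the subtraction, the helper does not need to know where time 0 is, in the same way as the `o`, `b`, `t` differences.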

4\. Create a new column with the actual time in ns (as a combination of the other three columns with timing information)



In [None]:
df['time'] = (df['ORBIT_CNT']*numb_bx + df['BX_COUNTER'])*25 + df['TDC_MEAS']*25/30

In [None]:
df['time'].max()-df['time'].min()

We can see that the results are comparable. Small precision errors are expected because of the magnitude of the numbers and the order of the floating-point operations (for example, whether the 25/30 multiplication is done before or after the sum).
We also notice a small misalignment between df_0 and df_1.
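One possible way to limit these precision errors, sketched here under assumptions, is to keep the time as an integer number of TDC counts (each 1/30 of a 25 ns step), subtract the minimum, and only then convert to ns. float64 carries about 15 to 16 significant digits, so converting after the subtraction keeps the numbers small. The toy values and `toy_numb_bx = 1000` are made up.

```python
import pandas as pd

# Made-up rows with a large orbit counter
toy = pd.DataFrame({
    'ORBIT_CNT':  [4000000000, 4000000000, 4000000001],
    'BX_COUNTER': [374, 374, 10],
    'TDC_MEAS':   [26, 27, 5],
})
toy_numb_bx = 1000  # made-up value, not the one found in exercise 2

# Exact integer time in units of one TDC count (30 counts per 25 ns step)
ticks = (toy['ORBIT_CNT'] * toy_numb_bx + toy['BX_COUNTER']) * 30 + toy['TDC_MEAS']

# Subtract the offset while still in integers, then convert to ns
ticks_rel = ticks - ticks.min()
time_rel_ns = ticks_rel * 25 / 30
```

The integer arithmetic stays well inside the int64 range for values of this size, and only the final, small relative times are stored as floats.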

5\. Replace the values (all 1) of the HEAD column randomly with 0 or 1

In [None]:
df['HEAD'] = np.random.randint(0,2, len(df['HEAD']))
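The cell above gives a different result on every run. If reproducibility is wanted, a seeded NumPy `Generator` is one option. This is a sketch on a toy DataFrame; the seed value 42 is arbitrary.

```python
import numpy as np
import pandas as pd

# Toy column with all values equal to 1, as in the original HEAD column
toy = pd.DataFrame({'HEAD': [1] * 1000})

rng = np.random.default_rng(seed=42)  # arbitrary seed, fixes the random sequence
toy['HEAD'] = rng.integers(0, 2, size=len(toy))  # upper bound 2 is exclusive
```

Re-running the cell with the same seed reproduces the same column, which makes the HEAD=1 selection in the next exercise repeatable.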

6\. Create a new DataFrame with only the rows with HEAD=1



In [None]:
new_df = df[df['HEAD']==1]

7\. Make two occupancy plots (one per FPGA), i.e. plot the number of counts per TDC channel


In [None]:
a = new_df[new_df['FPGA'] == 0]
a.groupby(['TDC_CHANNEL']).sum()['HEAD'].plot()

In [None]:
a = new_df[new_df['FPGA'] == 1]
a.groupby(['TDC_CHANNEL']).sum()['HEAD'].plot()
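Note that the two cells above count only the rows kept in `new_df` (HEAD=1 after the random replacement). The counting step behind an occupancy plot can also be sketched with `groupby(...).size()`, which does not depend on the HEAD values. The toy DataFrame is made up.

```python
import pandas as pd

# Made-up hits on two FPGAs
toy = pd.DataFrame({
    'FPGA':        [0, 0, 0, 1, 1, 0, 1],
    'TDC_CHANNEL': [1, 1, 2, 1, 3, 2, 3],
})

# Number of hits for each (FPGA, TDC_CHANNEL) pair
counts = toy.groupby(['FPGA', 'TDC_CHANNEL']).size()

# One Series per FPGA, indexed by channel number
occupancy = {fpga: counts.xs(fpga, level='FPGA') for fpga in (0, 1)}
```

Each Series in `occupancy` can then be drawn with its `.plot(kind='bar')` method, which shows the counts per channel without connecting channels that have no hits.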


8\. Use the groupby method to find out the noisy channels, i.e. the TDC channels with most counts (say the top 3)

In [None]:
new_df[['HEAD','TDC_CHANNEL']].groupby(['TDC_CHANNEL']).count().sort_values(['HEAD'], ascending=False).head(3)
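Since the address of a TDC is given by both 'FPGA' and 'TDC_CHANNEL', a variant worth considering is to group by both columns, so that the same channel number on different FPGAs is counted separately. The sketch below uses made-up data and `nlargest` to pick the top 3.

```python
import pandas as pd

# Made-up hits; channel 139 appears on both FPGAs
toy = pd.DataFrame({
    'FPGA':        [0, 0, 0, 0, 1, 1, 1, 0, 0, 1],
    'TDC_CHANNEL': [139, 139, 139, 139, 139, 139, 139, 64, 64, 63],
})

# Hits per (FPGA, TDC_CHANNEL) address
hits = toy.groupby(['FPGA', 'TDC_CHANNEL']).size()

# The three addresses with the most hits, largest first
top3 = hits.nlargest(3)
```

In this toy example the index of `top3` lists the (FPGA, channel) pairs in decreasing order of counts.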


9\. Count the number of unique orbits. Count the number of unique orbits with at least one measurement from TDC_CHANNEL=139


In [None]:
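print(df['ORBIT_CNT'].nunique())  # number of unique orbits
print(df[df['TDC_CHANNEL']==139]['ORBIT_CNT'].nunique())  # unique orbits with at least one hit from TDC_CHANNEL=139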