## Pandas analysis

The following exercises use a dataset containing timing information from a series of Time-to-Digital Converters (TDCs) implemented in a pair of FPGAs. Each measurement (i.e. each row) consists of the address of the TDC providing the signal, 'FPGA' and 'TDC_CHANNEL', and the timing information itself, 'ORBIT_CNT', 'BX_COUNTER' and 'TDC_MEAS'. Each TDC count corresponds to 25/30 ns, whereas the BX_COUNTER is incremented every 25 ns and the ORBIT_CNT every 'x' BX_COUNTER counts. This way of storing time is similar to hours, minutes and seconds.

1\. Create a Pandas DataFrame by reading N rows of the 'data_000637.txt' dataset. Choose N to be smaller than or equal to the maximum number of rows and larger than 10k.

In [1]:
import pandas as pd
import numpy as np

N = 100000

file_name = "~/data/data_000637.txt"
data = pd.read_csv(file_name, nrows=N) 

data

Unnamed: 0,HEAD,FPGA,TDC_CHANNEL,ORBIT_CNT,BX_COUNTER,TDC_MEAS
0,1,0,123,3869200167,2374,26
1,1,0,124,3869200167,2374,27
2,1,0,63,3869200167,2553,28
3,1,0,64,3869200167,2558,19
4,1,0,64,3869200167,2760,25
...,...,...,...,...,...,...
99995,1,0,64,3869201161,2378,29
99996,1,0,70,3869201161,2472,26
99997,1,0,58,3869201161,2558,0
99998,1,0,57,3869201161,2561,23


2\. Find out the value of 'x'

In [2]:
x = np.max(data['BX_COUNTER'])
print(x)

3563
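
Note that the largest observed value is not necessarily the wrap-around period: if BX_COUNTER starts counting from 0, the number of counts per orbit would be the maximum plus one. A minimal sketch on a made-up counter, assuming counting starts at 0 and that the sample contains the highest value (the `toy` frame and the `period` name are illustrative, not part of the dataset):

```python
import pandas as pd

# Hypothetical toy counter that wraps after 4 values: 0, 1, 2, 3, 0, 1, ...
toy = pd.DataFrame({"BX_COUNTER": [0, 1, 2, 3, 0, 1, 2, 3]})

max_seen = toy["BX_COUNTER"].max()
# Number of BX_COUNTER counts per orbit, under the assumption that counting starts at 0
period = max_seen + 1

print(max_seen, period)
```

Whether this applies to the real data depends on whether a BX_COUNTER value of 0 actually occurs, which can be checked with `data['BX_COUNTER'].min()`.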


3\. Find out how long the data taking lasted. You can either make an estimate on the basis of the fraction of the measurements (rows) you read, or perform this check precisely by reading the whole dataset.

In [3]:
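# Read the whole dataset for a precise measurement
data = pd.read_csv(file_name)

# Absolute time in ns: one TDC_MEAS count is 25/30 ns, one BX_COUNTER count is 25 ns,
# one ORBIT_CNT count is x * 25 ns
data['time_ns'] = data['TDC_MEAS']*25/30 + data['BX_COUNTER']*25 + data['ORBIT_CNT']*x*25

# Duration of the data taking: latest minus earliest timestamp
Delta_t = np.max(data['time_ns']) - np.min(data['time_ns'])

print("Delta t =", Delta_t*1e-9, "s")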

Delta t = 0.9801411533125001 s
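
The absolute timestamps are of order 10^14 ns, so it can be convenient to measure time relative to the first orbit and keep the counter arithmetic in integers as long as possible. The following is a sketch on made-up rows, not the calculation used above; `BX_PER_ORBIT` and `t_rel_ns` are hypothetical names, and the value 4 is chosen only for the toy data:

```python
import pandas as pd

# Toy rows using the column names of the dataset; the values are made up
toy = pd.DataFrame({
    "ORBIT_CNT": [10, 10, 11],
    "BX_COUNTER": [0, 2, 1],
    "TDC_MEAS": [0, 15, 29],
})

BX_PER_ORBIT = 4      # assumed value for the toy data, not the real 'x'
BX_NS = 25.0          # duration of one BX_COUNTER count in ns
TDC_NS = BX_NS / 30   # duration of one TDC_MEAS count in ns

# Count bunch crossings since the first orbit (pure integer arithmetic)
orbit0 = toy["ORBIT_CNT"].min()
bx_total = (toy["ORBIT_CNT"] - orbit0) * BX_PER_ORBIT + toy["BX_COUNTER"]

# Convert to ns only at the end
toy["t_rel_ns"] = bx_total * BX_NS + toy["TDC_MEAS"] * TDC_NS

duration_ns = toy["t_rel_ns"].max() - toy["t_rel_ns"].min()
print(duration_ns)
```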


4\. Create a new column with the actual time in ns (as a combination of the other three columns with timing information). The 'time_ns' column was already created in the previous cell, so it is only displayed here.

In [4]:
data

Unnamed: 0,HEAD,FPGA,TDC_CHANNEL,ORBIT_CNT,BX_COUNTER,TDC_MEAS,time_ns
0,1,0,123,3869200167,2374,26,3.446490e+14
1,1,0,124,3869200167,2374,27,3.446490e+14
2,1,0,63,3869200167,2553,28,3.446490e+14
3,1,0,64,3869200167,2558,19,3.446490e+14
4,1,0,64,3869200167,2760,25,3.446490e+14
...,...,...,...,...,...,...,...
1310715,1,0,62,3869211171,762,14,3.446500e+14
1310716,1,1,4,3869211171,763,11,3.446500e+14
1310717,1,0,64,3869211171,764,0,3.446500e+14
1310718,1,0,139,3869211171,769,0,3.446500e+14


5\. Replace the values (all 1) of the HEAD column randomly with 0 or 1

In [5]:
# Draw 0 or 1 with equal probability for every row (the upper bound 2 is exclusive)
data['HEAD'] = np.random.randint(2, size=len(data))
data

Unnamed: 0,HEAD,FPGA,TDC_CHANNEL,ORBIT_CNT,BX_COUNTER,TDC_MEAS,time_ns
0,1,0,123,3869200167,2374,26,3.446490e+14
1,1,0,124,3869200167,2374,27,3.446490e+14
2,0,0,63,3869200167,2553,28,3.446490e+14
3,0,0,64,3869200167,2558,19,3.446490e+14
4,1,0,64,3869200167,2760,25,3.446490e+14
...,...,...,...,...,...,...,...
1310715,0,0,62,3869211171,762,14,3.446500e+14
1310716,1,1,4,3869211171,763,11,3.446500e+14
1310717,1,0,64,3869211171,764,0,3.446500e+14
1310718,0,0,139,3869211171,769,0,3.446500e+14
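
Because the assignment is random, the HEAD values and everything derived from them change on every run. If reproducible results are wanted, one option is a seeded NumPy generator. This is a sketch on a toy frame, and the seed value is arbitrary:

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the real data: HEAD is 1 everywhere
toy = pd.DataFrame({"HEAD": [1] * 1000})

rng = np.random.default_rng(seed=42)  # arbitrary seed, chosen only for illustration
# integers(low, high) draws from [low, high), i.e. 0 or 1 here
toy["HEAD"] = rng.integers(0, 2, size=len(toy))

print(toy["HEAD"].value_counts())
```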


6\. Create a new DataFrame with only the rows with HEAD=1

In [6]:
new_data = data[data['HEAD'] == 1]
new_data

Unnamed: 0,HEAD,FPGA,TDC_CHANNEL,ORBIT_CNT,BX_COUNTER,TDC_MEAS,time_ns
0,1,0,123,3869200167,2374,26,3.446490e+14
1,1,0,124,3869200167,2374,27,3.446490e+14
4,1,0,64,3869200167,2760,25,3.446490e+14
9,1,0,60,3869200167,2788,7,3.446490e+14
13,1,0,36,3869200167,2791,23,3.446490e+14
...,...,...,...,...,...,...,...
1310711,1,1,39,3869211171,430,0,3.446500e+14
1310713,1,0,64,3869211171,758,18,3.446500e+14
1310714,1,0,60,3869211171,762,2,3.446500e+14
1310716,1,1,4,3869211171,763,11,3.446500e+14
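
As the output shows, a boolean mask keeps the original row labels (0, 1, 4, 9, ...). A small sketch on made-up data of how the selection could be turned into an independent frame with consecutive labels, if that is preferred (`selected` and `renumbered` are illustrative names):

```python
import pandas as pd

toy = pd.DataFrame({"HEAD": [1, 0, 1, 0, 1], "TDC_CHANNEL": [5, 6, 7, 8, 9]})

# .copy() makes the selection independent of the original frame
selected = toy[toy["HEAD"] == 1].copy()
print(list(selected.index))  # row labels inherited from toy

# Renumber the rows from 0, discarding the old labels
renumbered = selected.reset_index(drop=True)
print(list(renumbered.index))
```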


7\. Make two occupancy plots (one per FPGA), i.e. plot the number of counts per TDC channel

In [7]:
import matplotlib.pyplot as plt

FPGA_0 = new_data[new_data['FPGA'] == 0]
FPGA_1 = new_data[new_data['FPGA'] == 1]

fig, (ax1, ax2) = plt.subplots(nrows=1, ncols=2,
                               figsize=(8, 4))

TDC_CHANNEL_0 = FPGA_0['TDC_CHANNEL']
TDC_CHANNEL_1 = FPGA_1['TDC_CHANNEL']

# One bin per integer channel: the edges must run up to max + 1, hence arange(min, max + 2)
ax1.hist(TDC_CHANNEL_0, bins=np.arange(TDC_CHANNEL_0.min(), TDC_CHANNEL_0.max() + 2), label='0', alpha=0.5)
ax2.hist(TDC_CHANNEL_1, bins=np.arange(TDC_CHANNEL_1.min(), TDC_CHANNEL_1.max() + 2), label='1', alpha=0.5)

ax1.legend()
ax1.set_title('FPGA 0')
ax2.legend()
ax2.set_title('FPGA 1')

Text(0.5, 1.0, 'FPGA 1')
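
For integer-valued data such as channel numbers, the choice of bin edges decides whether each channel gets its own bin. A minimal sketch with made-up channel numbers, using half-integer edges so that every bin is centred on one channel:

```python
import numpy as np

channels = np.array([1, 1, 2, 3, 3, 3])  # made-up channel numbers

# Edges at 0.5, 1.5, 2.5, 3.5: exactly one bin per integer channel
edges = np.arange(channels.min(), channels.max() + 2) - 0.5
counts, _ = np.histogram(channels, bins=edges)

print(edges)
print(counts)
```

The same `edges` array can be passed as `bins` to `ax.hist`.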

8\. Use the groupby method to find out the noisy channels, i.e. the TDC channels with most counts (say the top 3)

In [8]:
group_0 = FPGA_0.groupby('TDC_CHANNEL').sum()
group_1 = FPGA_1.groupby('TDC_CHANNEL').sum()

most_0 = group_0['HEAD'].nlargest(3)
most_1 = group_1['HEAD'].nlargest(3)

print(most_0)
print('\n')
print(most_1)

TDC_CHANNEL
139    37631
64     32423
63     31858
Name: HEAD, dtype: int64


TDC_CHANNEL
2      16373
139    16142
1      14067
Name: HEAD, dtype: int64
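
Summing HEAD gives the number of rows per channel here only because every remaining row has HEAD=1. An alternative that counts rows directly, sketched on made-up data for a single FPGA, is `groupby(...).size()`:

```python
import pandas as pd

# Made-up measurements for one FPGA
toy = pd.DataFrame({"TDC_CHANNEL": [139, 139, 139, 64, 64, 63]})

# size() counts the rows in each group, independently of any column's values
counts = toy.groupby("TDC_CHANNEL").size()
top = counts.nlargest(2)

print(top)
```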


9\. Count the number of unique orbits. Count the number of unique orbits with at least one measurement from TDC_CHANNEL=139

In [9]:
u_FPGA_0 = pd.unique(FPGA_0['ORBIT_CNT'])
u_FPGA_1 = pd.unique(FPGA_1['ORBIT_CNT'])

print('FPGA 0: ',u_FPGA_0.shape[0])
print('FPGA 1: ',u_FPGA_1.shape[0])

FPGA 0:  10996
FPGA 1:  10969
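
The cell above counts the unique orbits per FPGA. The second part of the exercise, the number of unique orbits with at least one measurement from TDC_CHANNEL=139, could be approached as in the following sketch on made-up data (the `toy` frame and variable names are illustrative):

```python
import pandas as pd

# Made-up measurements: orbits 1 and 3 contain hits from channel 139
toy = pd.DataFrame({
    "ORBIT_CNT":   [1,   1, 2, 3,   3,   4],
    "TDC_CHANNEL": [139, 5, 7, 139, 139, 8],
})

# All distinct orbits
n_orbits = toy["ORBIT_CNT"].nunique()

# Keep only the rows from channel 139, then count the distinct orbits among them
n_orbits_139 = toy.loc[toy["TDC_CHANNEL"] == 139, "ORBIT_CNT"].nunique()

print(n_orbits, n_orbits_139)
```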
