### Note: I received four participant CSV files and determined that the second and third files were from the same participant, so I removed participant_3.csv from the analysis
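
The note does not say how the duplicate was identified. One possible check, assuming the two files were exact copies of each other, is `DataFrame.equals`. The sketch below uses small made-up frames (`file_a`, `file_b`, `file_c` are hypothetical names and values, not the real participant data):

```python
import pandas as pd

# Toy stand-ins for participant files (hypothetical values, not the real data)
file_a = pd.DataFrame({"timestamp": [0.0, 0.1, 0.2], "r_size": [41.2, 40.8, 41.0]})
file_b = file_a.copy()                              # an exact duplicate
file_c = file_a.assign(r_size=[39.0, 38.5, 38.9])   # a different recording

# DataFrame.equals is True only when shape, dtypes, and every value match
print(file_a.equals(file_b))  # True
print(file_a.equals(file_c))  # False
```

If the two recordings came from the same person but were not byte-for-byte identical, a check like this would not flag them, and some other criterion would be needed.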

### Import necessary packages

In [153]:
import pandas as pd
import numpy as np

## Read each participant's CSV file into its own dataframe

In [154]:
p1 = pd.read_csv('participant_1.csv')
p2 = pd.read_csv('participant_2.csv')
p3 = pd.read_csv('participant_4.csv')

## Merge the participant dataframes into a single dataframe

In [186]:
frames = [p1, p2, p3]

In [192]:
participants = pd.concat(frames, keys=['Participant 1', 'Participant 2', 'Participant 3'])
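
Passing `keys` to `pd.concat` produces a two-level index: the outer level is the participant label and the inner level is each file's original row label. A minimal sketch with made-up frames (`a`, `b`, and their values are illustrative assumptions) shows how rows are addressed afterwards:

```python
import pandas as pd

# Toy frames standing in for the participant dataframes (hypothetical values)
a = pd.DataFrame({"timestamp": [0.0, 0.1], "r_conf": [0.9, 0.8]})
b = pd.DataFrame({"timestamp": [0.0, 0.1, 0.2], "r_conf": [0.7, 0.0, 0.6]})

combined = pd.concat([a, b], keys=["Participant 1", "Participant 2"])

# The outer index level is the key; the inner level is each frame's own row label
print(combined.index.nlevels)                # 2
print(combined.loc["Participant 2"].shape)   # (3, 2)

# A single row is addressed with a (key, row label) tuple
print(combined.loc[("Participant 2", 2), "r_conf"])  # 0.6
```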

### Check shape of dataframe to see the number of datapoints before cleaning

In [193]:
participants.shape

(94373, 13)

## Remove null values and missing data

In [194]:
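# Treat zeros as missing values; this applies to every column, so a timestamp of exactly 0 also becomes NaN
ptemp = participants.replace(0, np.nan)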

In [195]:
p_clean = ptemp.dropna(subset=['r_size','r_conf','r_x_pos','r_y_pos','l_size','l_conf','l_x_pos','l_y_pos'])
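
Because `replace(0, np.nan)` acts on the whole dataframe, a legitimate zero in another column (such as a timestamp of 0) is also turned into NaN. An alternative, sketched here on made-up data with only two measurement columns, is to replace zeros only in the columns that are later passed to `dropna`. This is a variation on the approach above, not the code that produced the reported shapes:

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical values); the real frame has eight measurement columns
df = pd.DataFrame({
    "timestamp": [0.0, 0.1, 0.2],
    "r_size": [41.0, 0.0, 40.5],
    "l_size": [39.8, 40.1, 40.3],
})
cols = ["r_size", "l_size"]

# Replace zeros only in the measurement columns, then drop rows missing any of them
tmp = df.copy()
tmp[cols] = tmp[cols].replace(0, np.nan)
clean = tmp.dropna(subset=cols)

# The row with r_size == 0 is gone, and the timestamp of 0.0 is left untouched
print(clean["timestamp"].tolist())  # [0.0, 0.2]
```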

### Fix timestamp at 0 seconds

In [197]:
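# A single .loc call with a (participant, row) tuple avoids chained assignment, which may write to a temporary copy
p_clean.loc[('Participant 1', 0), 'timestamp'] = 0

In [198]:
p_clean.loc[('Participant 2', 0), 'timestamp'] = 0

In [199]:
p_clean.loc[('Participant 3', 0), 'timestamp'] = 0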

### Check shape of files post-cleaning to make sure there are still enough datapoints

In [200]:
p_clean.shape

(88507, 13)

Only ~5,900 datapoints (~6%) were lost from cleaning. It is worth seeing how many more are lost when a threshold is applied to left/right eye confidence.
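
The loss can be worked out directly from the two `.shape` outputs above. The variable names below are just for illustration:

```python
# Row counts reported by the .shape outputs before and after cleaning
rows_before = 94373
rows_after = 88507

lost = rows_before - rows_after
pct_lost = 100 * lost / rows_before

print(lost)                # 5866
print(round(pct_lost, 1))  # 6.2
```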

## Insert a confidence threshold

In [235]:
p_temp = p_clean[p_clean['r_conf'] > 0.2]

In [236]:
p_corr = p_temp[p_temp['l_conf'] > 0.2]

In [238]:
p_corr.shape

(84166, 13)

Requiring each eye's confidence to exceed 0.2 removes only about 4,300 more datapoints (84,166 of the original 94,373 remain), yet likely confers a benefit by removing outliers.
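
The two filtering steps can also be written as one boolean mask, which keeps the threshold in a single place. This is a sketch on made-up confidence values. `min_conf` is a name introduced here for illustration:

```python
import pandas as pd

# Toy confidence values (hypothetical, not the real data)
df = pd.DataFrame({
    "r_conf": [0.9, 0.1, 0.8, 0.2],
    "l_conf": [0.7, 0.9, 0.15, 0.5],
})
min_conf = 0.2  # same threshold as in the cells above

# Keep a row only when both eyes are strictly above the threshold
mask = (df["r_conf"] > min_conf) & (df["l_conf"] > min_conf)
kept = df[mask]

print(len(kept))  # 1
```

Because the comparison is strict, a row whose confidence is exactly 0.2 (the last row here) is dropped.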

I also noticed that some of the pupil sizes were odd (a few had the value of pi, for example), so I will add another threshold.

## Insert a pupil size threshold

In [263]:
p_temp_2 = p_corr[p_corr['r_size'] > 5]

In [264]:
p_final = p_temp_2[p_temp_2['l_size'] > 5]

In [265]:
p_temp_2.shape

(84166, 13)

In [266]:
p_final.shape

(84166, 13)

It seems the previous confidence threshold was enough to take care of the pupil size outliers, since the pupil size threshold removes no further rows.
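
One way to confirm this more directly is to count the suspicious sizes before and after the confidence filter. The sketch below uses made-up data in which the pi-valued size happens to sit on a low-confidence row. That pairing is an assumption for illustration, not something verified on the real data:

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical values): one odd pupil size on a low-confidence row
df = pd.DataFrame({
    "r_size": [41.0, np.pi, 40.2, 39.9],
    "r_conf": [0.9, 0.05, 0.8, 0.7],
})

# Count sizes equal to pi (within floating-point tolerance) before and after filtering
near_pi_before = np.isclose(df["r_size"], np.pi).sum()
filtered = df[df["r_conf"] > 0.2]
near_pi_after = np.isclose(filtered["r_size"], np.pi).sum()

print(near_pi_before, near_pi_after)  # 1 0
```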