# COLTAG Data
This notebook loads six datasets:

| Dataset | Data points | Description |
| - | - | - |
| Clicks | 28660 | Mouse events (button pressed and released), including pixel coordinates. |
| Events | 4078 | Game events: survey points and drag-and-drop interactions. |
| Interventions | 100 | Experimenter notes on interventions such as hints and other help given. |
| MiniPXI | 32* | MiniPXI results, including birth year and gender. |
| Sessions | 40 | Session metadata: start time, software version, and last commit of the prototype. |
| Surveys | 747 | Submitted survey points, each with one or two emotions, ordered alphabetically. |

\* Eight MiniPXI data points were lost due to a bug in the telemetry implementation.
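The Surveys description implies that a two-emotion survey point stores its emotions in alphabetical order, with the second slot left empty when only one emotion was chosen. A minimal sketch of that normalization, assuming the `Emotion1`/`Emotion2` columns used in the cells below; `normalize_pair` is a hypothetical helper, not part of the telemetry code.

```python
from typing import Optional, Tuple


def normalize_pair(first: str, second: Optional[str] = None) -> Tuple[str, Optional[str]]:
    """Return (Emotion1, Emotion2) with two emotions in alphabetical order.

    Hypothetical helper: a single emotion goes into the first slot and the
    second slot stays None (which pandas reads back as NaN).
    """
    if second is None:
        return first, None
    a, b = sorted((first, second))
    return a, b


print(normalize_pair("frustration", "confusion"))  # ('confusion', 'frustration')
print(normalize_pair("engagement"))                # ('engagement', None)
```

Storing pairs this way means each unordered pair of emotions maps to exactly one (`Emotion1`, `Emotion2`) combination.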



```python
import pandas as pd

def load_data(name):
    """Read data/<name>.csv into a DataFrame."""
    return pd.read_csv(f"data/{name}.csv")

clicks =        load_data("clicks")
events =        load_data("events")
interventions = load_data("interventions")
minipxi =       load_data("minipxi")
sessions =      load_data("sessions")
surveys =       load_data("surveys")

# Row counts per dataset (compare with the table above)
print("Clicks:\t\t", len(clicks))
print("Events:\t\t", len(events))
print("Interventions:\t", len(interventions))
print("MiniPXI:\t", len(minipxi))
print("Sessions:\t", len(sessions))
print("Surveys:\t", len(surveys))
```

```
Clicks:		 28660
Events:		 4078
Interventions:	 100
MiniPXI:	 32
Sessions:	 40
Surveys:	 747
```
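The six `load_data` calls can also be written as a loop that collects the frames in a dictionary, which makes it easy to compare the row counts with the dataset table. A sketch, assuming the same `data/<name>.csv` layout; `EXPECTED_ROWS` only copies the counts from the table, and `load_all`/`check_counts` are hypothetical helpers.

```python
from pathlib import Path

import pandas as pd

# Row counts copied from the dataset table at the top of this notebook.
EXPECTED_ROWS = {
    "clicks": 28660,
    "events": 4078,
    "interventions": 100,
    "minipxi": 32,
    "sessions": 40,
    "surveys": 747,
}


def load_all(names, data_dir="data"):
    """Read <data_dir>/<name>.csv for every name into a dict of DataFrames."""
    return {name: pd.read_csv(Path(data_dir) / f"{name}.csv") for name in names}


def check_counts(frames, expected):
    """Return {name: (actual, expected)} for every dataset whose row count differs."""
    return {
        name: (len(frames[name]), n)
        for name, n in expected.items()
        if len(frames[name]) != n
    }
```

In the notebook, `frames = load_all(EXPECTED_ROWS)` followed by `check_counts(frames, EXPECTED_ROWS)` should return an empty dict when the CSV files match the table.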


```python
# Survey statistics: single- vs two-emotion surveys and surveys per session
print("Surveys:\t", len(surveys))
single_emotions = surveys[surveys["Emotion2"].isna()]
n_single = len(single_emotions)
n_duo = len(surveys) - n_single
print("One emotion:\t", n_single, "(", n_single / len(surveys), ")")
print("Two emotions:\t", n_duo, "(", n_duo / len(surveys), ")")

print("Sessions:\t", surveys["SessionID"].nunique())
survey_per_participant = surveys.groupby("SessionID").size()
print("Avg surveys:\t", survey_per_participant.mean(), "(std:", survey_per_participant.std(), ")")

# groupby only yields sessions with at least one matching survey, so sessions
# without any single (or two-emotion) survey are left out of these two means.
single_emotions_pp = single_emotions.groupby("SessionID").size()
print("Avg single:\t", single_emotions_pp.mean(), "(std:", single_emotions_pp.std(), ")")
duo_emotions_pp = surveys[surveys["Emotion2"].notna()].groupby("SessionID").size()
print("Avg Duo:\t", duo_emotions_pp.mean(), "(std:", duo_emotions_pp.std(), ")")
print("MiniPXIs:\t", len(minipxi))
```

```
Surveys:	 747
One emotion:	 440 ( 0.5890227576974565 )
Two emotions:	 307 ( 0.4109772423025435 )
Sessions:	 40
Avg surveys:	 18.675 (std: 6.580419673312872 )
Avg single:	 11.891891891891891 (std: 5.316555817986469 )
Avg Duo:	 8.297297297297296 (std: 6.695209169684381 )
MiniPXIs:	 32
```
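The `Avg single` and `Avg Duo` values are averaged only over sessions that contain at least one survey of that kind, because `groupby` produces no group for a session without matching rows. The printed means are consistent with this: 440/37 ≈ 11.89 and 307/37 ≈ 8.30, whereas `Avg surveys` is 747/40. If sessions without such surveys should count as zero, one option is to reindex the per-session counts on all session IDs. A sketch, assuming `SessionID` identifies a session in `surveys`; `per_session_counts` is a hypothetical helper and the toy frame is made up for illustration.

```python
import pandas as pd


def per_session_counts(surveys: pd.DataFrame, mask: pd.Series) -> pd.Series:
    """Count surveys matching `mask` per session, with 0 for sessions that have none."""
    all_sessions = surveys["SessionID"].unique()
    counts = surveys[mask].groupby("SessionID").size()
    # reindex adds the sessions that groupby skipped, filled with 0
    return counts.reindex(all_sessions, fill_value=0)


# Toy data: session "c" has no two-emotion survey.
toy = pd.DataFrame({
    "SessionID": ["a", "a", "b", "b", "c"],
    "Emotion1": ["confusion", "boredom", "engagement", "confusion", "neutral"],
    "Emotion2": [None, "confusion", "frustration", None, None],
})
duo = per_session_counts(toy, toy["Emotion2"].notna())
print(duo.to_dict())  # {'a': 1, 'b': 1, 'c': 0}
print(duo.mean())     # 0.6666666666666666
```

Applied to the notebook data, `per_session_counts(surveys, surveys["Emotion2"].notna()).mean()` would equal 307/40, and the standard deviation would change as well.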


```python
# Participant information from the MiniPXI responses
print(minipxi["Gender"].fillna("Unknown").value_counts())
# Approximate age: 2024 minus birth year, i.e. the age reached during 2024
ages = 2024 - minipxi["BirthYear"]
print("Average age:", ages.mean(), "(std:", ages.std(), ")")
print("Median age:", ages.median())
print("Age range:", ages.min(), "-", ages.max())
```

```
Gender
Male       23
Female      6
Unknown     3
Name: count, dtype: int64
Average age: 27.9375 (std: 5.945654417439085 )
Median age: 26.5
Age range: 20 - 43
```
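Gender shares are sometimes easier to report than raw counts, and the reference year for ages is worth making explicit, since `2024 - BirthYear` can be one year too high for participants whose birthday had not yet passed at session time. A sketch, assuming the `Gender` and `BirthYear` columns used above; `participant_summary` and its `reference_year` parameter are hypothetical, and the toy frame is made up.

```python
import pandas as pd


def participant_summary(minipxi: pd.DataFrame, reference_year: int = 2024) -> dict:
    """Gender shares and age statistics from MiniPXI rows.

    Age is reference_year - BirthYear (the age reached during that year).
    """
    gender_share = minipxi["Gender"].fillna("Unknown").value_counts(normalize=True)
    ages = reference_year - minipxi["BirthYear"]
    return {
        "gender_share": gender_share.round(3).to_dict(),
        "age_mean": ages.mean(),
        "age_std": ages.std(),  # sample std (ddof=1), the pandas default
        "age_median": ages.median(),
        "age_range": (ages.min(), ages.max()),
    }


toy = pd.DataFrame({"Gender": ["Male", "Female", None, "Male"],
                    "BirthYear": [2000, 1996, 1990, 2002]})
summary = participant_summary(toy)
print(summary["gender_share"])
print(summary["age_mean"], summary["age_median"], summary["age_range"])
```

For the notebook data, `participant_summary(minipxi)` uses the same 2024 reference year as the cell above, so the age statistics should match its output.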


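```python
# Co-occurrence matrix of the first and second emotion of each survey.
# Missing second emotions are filled with "" so single-emotion surveys
# appear as their own column instead of being dropped by crosstab.
cross = pd.crosstab(surveys["Emotion1"], surveys["Emotion2"].fillna(""))
cross
```

| Emotion1 \ Emotion2 | (none) | confusion | engagement | frustration | neutral |
| - | - | - | - | - | - |
| boredom | 10 | 14 | 1 | 4 | 1 |
| confusion | 181 | 0 | 97 | 63 | 45 |
| engagement | 128 | 0 | 0 | 28 | 47 |
| frustration | 42 | 0 | 0 | 0 | 7 |
| neutral | 79 | 0 | 0 | 0 | 0 |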
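Because the two emotions are stored in alphabetical order, every pair count lies above the diagonal of the matrix, and an emotion's total is split between its row and its column. To count how often each emotion was reported at all, regardless of slot, both columns can be stacked. A sketch, assuming the unfilled `surveys` frame with `Emotion1`/`Emotion2` columns; `emotion_frequencies` is a hypothetical helper and the toy frame is made up.

```python
import pandas as pd


def emotion_frequencies(surveys: pd.DataFrame) -> pd.Series:
    """Count how often each emotion was reported in either Emotion1 or Emotion2."""
    both = pd.concat([surveys["Emotion1"], surveys["Emotion2"]], ignore_index=True)
    # Empty second slots (NaN/None) are not emotions, so drop them before counting
    return both.dropna().value_counts()


toy = pd.DataFrame({
    "Emotion1": ["confusion", "boredom", "engagement", "confusion"],
    "Emotion2": [None, "confusion", "frustration", "neutral"],
})
print(emotion_frequencies(toy).to_dict())
```

Applied to `surveys`, the counts should add up to 747 + 307 = 1054 reported emotions (one per survey plus one per two-emotion survey).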
