# Predict and verify task completion times study
### Authors: Claudia, Sarah
### Reviewer: Sarah

## Test Design
We conducted a study in which we measured the completion times of four calculation tasks and compared them with predictions based on our own KLM operator values and on those from Card, Moran and Newell (1980). For this purpose, we built a calculator and reused the KLM operator values determined in our previous experiment. The study was performed at home with an Aspire VX 15 notebook, an external keyboard (QWERTZ layout with numpad) and a mouse to reduce distractions.

The study used a within-subjects design: each participant solved each of the four tasks exactly once, which limits learning effects.

To mitigate order effects and other confounding variables, the tasks were presented in counterbalanced order (see Balanced Latin Squares at https://cs.uwaterloo.ca/~dmasson/tools/latin_square/).
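
The counterbalancing above can be sketched in a few lines. This is a minimal illustration and not the generator behind the linked tool: it builds a balanced Latin square for an even number of conditions. The labels `A` to `D` are assumed here to stand for the four tasks, and the function name `balanced_latin_square` is hypothetical.

```python
def balanced_latin_square(conditions):
    """Return one presentation order per row for an even number of conditions."""
    n = len(conditions)
    if n % 2 != 0:
        raise ValueError("this construction only balances an even number of conditions")

    # The first row follows the pattern 0, 1, n-1, 2, n-2, ...
    first_row = [0]
    low, high = 1, n - 1
    while low <= high:
        first_row.append(low)
        if low != high:
            first_row.append(high)
        low += 1
        high -= 1

    # Every further row shifts all entries of the first row by one (modulo n).
    return [
        [conditions[(index + shift) % n] for index in first_row]
        for shift in range(n)
    ]


# Assumption: "A" to "D" label the four tasks of this study.
orders = balanced_latin_square(["A", "B", "C", "D"])
for row_number, order in enumerate(orders, start=1):
    print(row_number, order)
```

With four conditions, every task appears once in each position and directly follows every other task exactly once. The rows can then be assigned to participants in turn.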

### Tasks
- adding the numbers from 1 to 20 using only the mouse
- adding the numbers from 1 to 20 using only the keyboard
- calculating the result of (3² + 4²) * 15.2 using only the mouse
- calculating the result of (3² + 4²) * 15.2 using only the keyboard
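
To make the comparison with the Keystroke-Level Model concrete, the sketch below predicts the completion time of the keyboard addition task from a sequence of operators. It is only an illustration under stated assumptions. The operator times are the commonly cited values from Card, Moran and Newell (1980), with K = 0.28 s for an average non-secretarial typist; our own measured operator values are not reproduced here. Encoding the task as plain keystrokes (all digits and plus signs plus one final Enter, without mental operators) is likewise an assumption, and the names `OPERATOR_TIMES` and `predict_time` are hypothetical.

```python
# Commonly cited operator times in seconds (Card, Moran and Newell, 1980).
OPERATOR_TIMES = {
    "K": 0.28,  # keystroke, average non-secretarial typist
    "P": 1.10,  # pointing with the mouse
    "H": 0.40,  # homing the hand between devices
    "M": 1.35,  # mental preparation
}


def predict_time(operators, operator_times=OPERATOR_TIMES):
    """Sum the operator times for a sequence such as "MKKK"."""
    return sum(operator_times[operator] for operator in operators)


# Assumed input sequence for "1+2+...+20", followed by one Enter keystroke.
expression = "+".join(str(number) for number in range(1, 21))
keystrokes = len(expression) + 1  # +1 for the final Enter key

print(keystrokes)
print(round(predict_time("K" * keystrokes), 2))
```

Under these assumptions the task consists of 51 keystrokes. A prediction for the mouse variant would replace each `K` with a pointing operator followed by a button press, which again depends on how the task is encoded.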

### Procedure
First, each participant was asked for their age and occupation, and their gender was noted. The procedure of the study was then explained and any questions were answered.
Afterwards, the participant solved the tasks.
For each task, the calculator script was started anew so that the config file could be adjusted, which simplifies the analysis. The participant id, the task name, the input type, the input value and the timestamp were logged.
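
The logging step described above could look roughly like the following. This is a sketch and not the calculator's actual code: the column names follow the CSV files analysed below, while the helper `log_input` and the in-memory buffer are hypothetical.

```python
import csv
import io
from datetime import datetime

FIELDNAMES = ["participant_id", "task", "input_type", "input_value", "timestamp"]


def log_input(writer, participant_id, task, input_type, input_value):
    """Write one input event with the current time as its timestamp."""
    writer.writerow({
        "participant_id": participant_id,
        "task": task,
        "input_type": input_type,
        "input_value": input_value,
        "timestamp": datetime.now().isoformat(sep=" ", timespec="microseconds"),
    })


# An in-memory buffer stands in for the log file of one task.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES)
writer.writeheader()
log_input(writer, 1, "A", "button_clicked", "1")
log_input(writer, 1, "A", "button_clicked", "+")

print(buffer.getvalue())
```

Writing one row per input event keeps the raw log simple; the completion time is derived later from the first and last timestamp of a task.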

### Participants
Due to the current pandemic situation and limited time, the pool of available participants was small.
The study was conducted with four participants (3 female, 1 male): one media informatics student, one mathematics student, one international and cultural business studies student and one pensioner, aged 18, 20, 22 and 68.
Each participated once.

### Variables
The dependent variable is the task completion time, which is calculated from the logged timestamps.
The independent variables are the task and the input type (mouse or keyboard).
As control variables, we documented the keyboard, the mouse and the position of the calculator window, which was placed in the centre of the screen.

# Analysis

### Import all relevant libraries

In [9]:
import pandas as pd
import pingouin as pg
import seaborn as sns
from matplotlib import pyplot as plt

### CSV column names and values that are used more than once, defined as constants

In [10]:
# column names
PARTICIPANT_ID = "participant_id"
TASK = "task"
INPUT_TYPE = "input_type"
TIMESTAMP = "timestamp"

TASK_COMPLETION_TIME = "task_completion_time_in_s"  # TODO which unit?

### Read csv files

In [17]:
# Read every log file into a dict keyed by (participant id, device, task kind)
PARTICIPANT_IDS = range(1, 17)
DEVICES = ("mouse", "key")
TASK_KINDS = ("add", "complex")

raw_data = {
    (participant, device, task_kind): pd.read_csv(
        f"study_results/{participant}_{device}_{task_kind}.csv"
    )
    for participant in PARTICIPANT_IDS
    for device in DEVICES
    for task_kind in TASK_KINDS
}

### Combine the individual tables of the participants and save them

In [19]:
def combine_tables(device):
    """Concatenate all tables of one device, ordered by participant and task kind."""
    return pd.concat([
        raw_data[(participant, device, task_kind)]
        for participant in PARTICIPANT_IDS
        for task_kind in TASK_KINDS
    ])

input_mouse = combine_tables("mouse")
input_keyboard = combine_tables("key")

input_mouse.to_csv("input_mouse.csv", index=False)
input_keyboard.to_csv("input_keyboard.csv", index=False)

In [20]:
# whole data set
task_completion = pd.concat([
    input_mouse,
    input_keyboard
])

task_completion

Unnamed: 0,participant_id,task,input_type,input_value,timestamp
0,1,A,button_clicked,1,2021-05-23 20:01:01.600065
1,1,A,button_clicked,+,2021-05-23 20:01:02.432247
2,1,A,button_clicked,2,2021-05-23 20:01:02.983886
3,1,A,button_clicked,+,2021-05-23 20:01:03.745677
4,1,A,button_clicked,3,2021-05-23 20:01:04.215505
...,...,...,...,...,...
10,16,D,key_pressed,1,2021-05-23 21:43:21.128266
11,16,D,key_pressed,5,2021-05-23 21:43:21.472303
12,16,D,key_pressed,.,2021-05-23 21:43:21.853728
13,16,D,key_pressed,2,2021-05-23 21:43:22.208371


### Statistics for relevant tables

In [21]:
task_completion.describe()

Unnamed: 0,participant_id
count,2151.0
mean,8.426778
std,4.640881
min,1.0
25%,4.0
50%,8.0
75%,12.0
max,16.0


In [22]:
input_mouse.describe()

Unnamed: 0,participant_id
count,1075.0
mean,8.426047
std,4.630768
min,1.0
25%,4.0
50%,8.0
75%,12.0
max,16.0


In [23]:
input_keyboard.describe()

Unnamed: 0,participant_id
count,1076.0
mean,8.427509
std,4.653116
min,1.0
25%,4.0
50%,8.0
75%,12.0
max,16.0


## Calculate task completion time

In [25]:
def calc_time_diff(data):
    """Return the time between the first and the last logged event in milliseconds."""
    # Parse the logged timestamp strings into pandas timestamps
    timestamps = pd.to_datetime(data[TIMESTAMP])

    # iloc selects by position, so this also works if the index does not start at 0
    start_time = timestamps.iloc[0]
    end_time = timestamps.iloc[-1]

    return (end_time - start_time).total_seconds() * 1000
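
As a worked example of this calculation, the sketch below takes the first five events of participant 1 in task A from the table above and computes the time between the first and the last of them, in milliseconds and in seconds. It uses only the standard library so that it runs without the study files. The helper `elapsed_ms` is hypothetical, and five events are only the beginning of a task, not a full completion time.

```python
from datetime import datetime

# First five logged timestamps of participant 1, task A (copied from the output above).
timestamps = [
    "2021-05-23 20:01:01.600065",
    "2021-05-23 20:01:02.432247",
    "2021-05-23 20:01:02.983886",
    "2021-05-23 20:01:03.745677",
    "2021-05-23 20:01:04.215505",
]


def elapsed_ms(timestamps):
    """Return the time between the first and the last timestamp in milliseconds."""
    start = datetime.fromisoformat(timestamps[0])
    end = datetime.fromisoformat(timestamps[-1])
    return (end - start).total_seconds() * 1000


duration_ms = elapsed_ms(timestamps)
duration_s = duration_ms / 1000

print(f"{duration_ms:.3f} ms = {duration_s:.3f} s")
```

Dividing by 1000 yields seconds, the unit implied by the column name `task_completion_time_in_s`.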

### Boxplots

In [24]:
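# Derive one completion time (in seconds) per participant, task and input type,
# because the raw log holds one row per input event and has no such column.
event_times = task_completion.copy()
event_times[TIMESTAMP] = pd.to_datetime(event_times[TIMESTAMP])
grouped = event_times.groupby([PARTICIPANT_ID, TASK, INPUT_TYPE])[TIMESTAMP]
completion_times = (
    (grouped.max() - grouped.min()).dt.total_seconds().reset_index(name=TASK_COMPLETION_TIME)
)

# TODO comparison
box_plot_input_type = sns.boxplot(
    data=completion_times,
    x=INPUT_TYPE,
    y=TASK_COMPLETION_TIME
)

box_plot_input_type.set(xlabel=INPUT_TYPE, ylabel=TASK_COMPLETION_TIME)
plt.show()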

### T-test

In [8]:
# pg.homoscedasticity(data=task_completion, dv=TASK_COMPLETION_TIME, group=INPUT_TYPE, method="bartlett")
# pg.normality(data=task_completion, dv=TASK_COMPLETION_TIME, group=INPUT_TYPE)
# pg.welch_anova(data=task_completion, dv=TASK_COMPLETION_TIME, between=INPUT_TYPE)
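
Because every participant used both input types, the observations are paired, so a paired t-test is one candidate for this comparison; in the notebook this could presumably be done with pingouin (e.g. `pg.ttest(x, y, paired=True)`). The sketch below computes the paired t statistic by hand with the standard library to show what such a test does. The completion times in it are made-up placeholder values, not study results, and `paired_t_statistic` is a hypothetical helper.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(first, second):
    """Return the t statistic and the degrees of freedom of a paired t-test."""
    if len(first) != len(second):
        raise ValueError("paired samples need the same length")
    differences = [a - b for a, b in zip(first, second)]
    n = len(differences)
    # t is the mean difference divided by its standard error
    t = mean(differences) / (stdev(differences) / sqrt(n))
    return t, n - 1


# Placeholder completion times in seconds (NOT measured data).
mouse_times = [30.0, 28.0, 35.0, 31.0]
keyboard_times = [20.0, 21.0, 24.0, 22.0]

t_value, degrees_of_freedom = paired_t_statistic(mouse_times, keyboard_times)
print(f"t({degrees_of_freedom}) = {t_value:.2f}")
```

The p-value then follows from the t distribution with the returned degrees of freedom, which a statistics library reports directly.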

## Results

## Discussion