# Study: predicting and verifying task completion times
### Authors: Claudia, Sarah
### Reviewer: Sarah

## Test Design
We conducted a study in which we measured the completion times of four calculation tasks and compared them with the times predicted from our own KLM operator values and from those of Card, Moran and Newell (1980). For this purpose, we built a calculator and used the KLM operator values determined in our previous experiment. The study was performed at home using an Aspire VX 15 notebook, an external keyboard (with numpad and QWERTZ layout) and a mouse, to reduce distractions.

The study uses a within-subject design: each participant solved all four tasks, and each task only once, to minimize learning effects.

To control for order effects as a confounding variable, the tasks were presented in a counterbalanced order based on a balanced Latin square (see the Balanced Latin Squares tool at https://cs.uwaterloo.ca/~dmasson/tools/latin_square/).
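For an even number of conditions, a balanced Latin square can be built with a well-known construction: the first row follows the pattern 0, 1, n-1, 2, n-2, ... and every further row adds the row index modulo n. In the result, every task appears once in every position and every task directly precedes every other task exactly once. The sketch below illustrates that construction only; it is not necessarily the algorithm behind the linked tool, and the task labels (taken from the CSV file names in the analysis) and the assignment of row i to participant i are assumptions. For an odd number of conditions this square is not balanced on its own; the usual remedy is to add a reversed copy of each row.

```python
def balanced_latin_square(n):
    """Return n condition orders for conditions 0..n-1 (balanced for even n)."""
    # First row alternates low and high indices: 0, 1, n-1, 2, n-2, ...
    first_row = [0]
    low, high = 1, n - 1
    take_low = True
    while len(first_row) < n:
        if take_low:
            first_row.append(low)
            low += 1
        else:
            first_row.append(high)
            high -= 1
        take_low = not take_low
    # Every further row shifts each entry of the first row by the row index (mod n)
    return [[(c + r) % n for c in first_row] for r in range(n)]


# Hypothetical task labels, mirroring the CSV file names used in the analysis
TASKS = ["mouse_add", "key_add", "mouse_complex", "key_complex"]

# Assumption: participant i receives row i of the square
for participant, row in enumerate(balanced_latin_square(len(TASKS)), start=1):
    print(participant, [TASKS[i] for i in row])
```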

### Tasks
- adding the numbers from 1 to 20 using only the mouse
- adding the numbers from 1 to 20 using only the keyboard
- calculating the result of (3² + 4²) * 15.2 using only the mouse
- calculating the result of (3² + 4²) * 15.2 using only the keyboard
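A KLM prediction for each task is the sum of the operator times in its operator sequence. The sketch below assumes the operator values commonly cited from Card, Moran and Newell (1980), with K set to their value for an average non-secretary typist; our own measured operator values are not reproduced here and would be passed in as a second dictionary. The operator sequences are hypothetical: one mental preparation (M) at the start, one keystroke per character of `1+2+...+20=` for the keyboard, and pointing (P) plus a button press (modeled as K) per calculator button for the mouse. Whether `=` is the key that triggers the result in our calculator is also an assumption.

```python
# Operator times in seconds, as commonly cited from Card, Moran and Newell (1980).
# K depends on typing skill; 0.28 s is their value for an average non-secretary typist.
CMN_1980 = {"K": 0.28, "P": 1.10, "H": 0.40, "M": 1.35}


def klm_predict(sequence, operators=CMN_1980):
    """Predicted execution time in seconds: the sum of all operator times."""
    return sum(operators[op] for op in sequence)


# Characters for the addition task: "1+2+...+20" followed by "=" (assumed to show the result)
keys = "+".join(str(i) for i in range(1, 21)) + "="

# Hypothetical operator sequences: one M at the start, then either one keystroke
# per character or one point-and-click (P, then K for the button press) per button.
keyboard_add = "M" + "K" * len(keys)
mouse_add = "M" + "PK" * len(keys)

print(f"keyboard: {klm_predict(keyboard_add):.2f} s, mouse: {klm_predict(mouse_add):.2f} s")
```

Passing a different `operators` dictionary to `klm_predict` yields the prediction for the second model, so both predictions can be set against the measured times.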

### Procedure
First, we asked each participant for their age and occupation and noted their gender. We then explained the procedure of the study and answered any questions.
Afterwards, the participant solved the tasks.
For each task, the calculator script was run again with an adjusted config file to simplify the analysis. The script logs the participant ID, the task name, the input type, the input value and the timestamp.
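Logging of this kind can be sketched as appending one CSV row per input. The sketch assumes the column names used in the analysis notebook plus a hypothetical `input_value` column, timestamps taken from `time.time()` (seconds since the epoch), and hypothetical values such as `"keyboard"` and `"add"`; it is not the calculator script used in the study.

```python
import csv
import tempfile
import time
from pathlib import Path

# Column names as used in the analysis; "input_value" is a hypothetical name
FIELDS = ["participant_id", "task", "input_type", "input_value", "timestamp"]


class InputLogger:
    """Append one CSV row per calculator input (illustrative sketch)."""

    def __init__(self, path, participant_id, task):
        self.path = Path(path)
        self.participant_id = participant_id
        self.task = task
        # Write the header only when the file is created
        if not self.path.exists():
            with self.path.open("w", newline="") as f:
                csv.writer(f).writerow(FIELDS)

    def log(self, input_type, input_value):
        # time.time() returns seconds since the epoch as a float
        row = [self.participant_id, self.task, input_type, input_value, time.time()]
        with self.path.open("a", newline="") as f:
            csv.writer(f).writerow(row)


# Usage: log two key presses into a temporary file and read them back
with tempfile.TemporaryDirectory() as tmp:
    log_path = Path(tmp) / "1_key_add.csv"
    logger = InputLogger(log_path, participant_id=1, task="add")
    logger.log("keyboard", "1")
    logger.log("keyboard", "+")
    with log_path.open(newline="") as f:
        rows = list(csv.DictReader(f))

print(rows)
```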

### Participants
Because of the ongoing pandemic and limited time, the pool of potential participants was limited.
Four participants took part in the study (3 female, 1 male), aged 18, 20, 22 and 68: a media informatics student, a mathematics student, a student of international and cultural business studies, and a pensioner.
Each of them participated once.

### Variables
The dependent variable is the task completion time, calculated from the logged timestamps.
The independent variables are the task and the input type (mouse or keyboard).
As control variables, we documented the keyboard and mouse used and kept the calculator window at the same position in the centre of the screen.
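One way to derive the completion time from the logs is to group them by participant, task and input type and subtract the first logged timestamp from the last. This minimal sketch assumes numeric timestamps in seconds (the unit is still an open TODO in the notebook) and uses a made-up log, not study data. It measures from the first to the last logged input, so the time before the first input is only included if the script logs a start event. If the timestamps are date strings, they can be converted with `pd.to_datetime` first and the difference turned into seconds with `.dt.total_seconds()`.

```python
import pandas as pd

PARTICIPANT_ID = "participant_id"
TASK = "task"
INPUT_TYPE = "input_type"
TIMESTAMP = "timestamp"
TASK_COMPLETION_TIME = "task_completion_time_in_s"


def completion_times(log: pd.DataFrame) -> pd.DataFrame:
    """One row per participant, task and input type: last minus first timestamp."""
    grouped = log.groupby([PARTICIPANT_ID, TASK, INPUT_TYPE])[TIMESTAMP]
    return (grouped.max() - grouped.min()).reset_index(name=TASK_COMPLETION_TIME)


# Made-up example log; timestamps are assumed to be in seconds
log = pd.DataFrame({
    PARTICIPANT_ID: [1, 1, 1, 1, 1],
    TASK: ["add", "add", "add", "complex", "complex"],
    INPUT_TYPE: ["mouse", "mouse", "mouse", "keyboard", "keyboard"],
    TIMESTAMP: [10.0, 12.5, 30.0, 100.0, 112.0],
})

result = completion_times(log)
print(result)
```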

# Analysis

### Import all relevant libraries

In [1]:
import pandas as pd
import pingouin as pg
import seaborn as sns
from matplotlib import pyplot as plt

### CSV column names and other strings used more than once, stored as constants

In [2]:
# column names
PARTICIPANT_ID = "participant_id"
TASK = "task"
INPUT_TYPE = "input_type"
TIMESTAMP = "timestamp"

TASK_COMPLETION_TIME = "task_completion_time_in_s"  # derived from the timestamps; TODO: confirm the timestamp unit

### Read csv files

In [3]:
raw_data_1_mouse_add = pd.read_csv("./1_mouse_add.csv")
raw_data_1_key_add = pd.read_csv("./1_key_add.csv")
raw_data_1_mouse_complex = pd.read_csv("./1_mouse_complex.csv")
raw_data_1_key_complex = pd.read_csv("./1_key_complex.csv")

raw_data_2_mouse_add = pd.read_csv("./2_mouse_add.csv")
raw_data_2_key_add = pd.read_csv("./2_key_add.csv")
raw_data_2_mouse_complex = pd.read_csv("./2_mouse_complex.csv")
raw_data_2_key_complex = pd.read_csv("./2_key_complex.csv")

raw_data_3_mouse_add = pd.read_csv("./3_mouse_add.csv")
raw_data_3_key_add = pd.read_csv("./3_key_add.csv")
raw_data_3_mouse_complex = pd.read_csv("./3_mouse_complex.csv")
raw_data_3_key_complex = pd.read_csv("./3_key_complex.csv")

raw_data_4_mouse_add = pd.read_csv("./4_mouse_add.csv")
raw_data_4_key_add = pd.read_csv("./4_key_add.csv")
raw_data_4_mouse_complex = pd.read_csv("./4_mouse_complex.csv")
raw_data_4_key_complex = pd.read_csv("./4_key_complex.csv")

### Combine the individual tables of the participants and save them

In [None]:
input_mouse = pd.concat([
    raw_data_1_mouse_add,
    raw_data_1_mouse_complex,
    raw_data_2_mouse_add,
    raw_data_2_mouse_complex,
    raw_data_3_mouse_add,
    raw_data_3_mouse_complex,
    raw_data_4_mouse_add,
    raw_data_4_mouse_complex
])

input_keyboard = pd.concat([
    raw_data_1_key_add,
    raw_data_1_key_complex,
    raw_data_2_key_add,
    raw_data_2_key_complex,
    raw_data_3_key_add,
    raw_data_3_key_complex,
    raw_data_4_key_add,
    raw_data_4_key_complex
])

input_mouse.to_csv("input_mouse.csv", index=False)
input_keyboard.to_csv("input_keyboard.csv", index=False)
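The sixteen `read_csv` calls and the two `concat` calls can also be written as a loop over participants, input types and tasks. This sketch assumes the file naming pattern `<participant>_<mouse|key>_<add|complex>.csv` used above and demonstrates itself on dummy one-row files in a temporary directory. Unlike the cell above, it passes `ignore_index=True` so that the combined tables get a fresh index instead of repeated row labels.

```python
import tempfile
from pathlib import Path

import pandas as pd

PARTICIPANTS = [1, 2, 3, 4]
INPUT_PREFIXES = ["mouse", "key"]  # file name prefixes used in the notebook
TASK_SUFFIXES = ["add", "complex"]


def load_logs(directory="."):
    """Return {prefix: DataFrame} with all participants' logs per input type."""
    frames = {prefix: [] for prefix in INPUT_PREFIXES}
    for participant in PARTICIPANTS:
        for prefix in INPUT_PREFIXES:
            for task in TASK_SUFFIXES:
                path = Path(directory) / f"{participant}_{prefix}_{task}.csv"
                frames[prefix].append(pd.read_csv(path))
    # ignore_index=True gives each combined table a fresh 0..n-1 index
    return {prefix: pd.concat(parts, ignore_index=True) for prefix, parts in frames.items()}


# Usage with dummy one-row files in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    for participant in PARTICIPANTS:
        for prefix in INPUT_PREFIXES:
            for task in TASK_SUFFIXES:
                dummy = pd.DataFrame({"participant_id": [participant], "task": [task]})
                dummy.to_csv(Path(tmp) / f"{participant}_{prefix}_{task}.csv", index=False)
    logs = load_logs(tmp)

input_mouse, input_keyboard = logs["mouse"], logs["key"]
print(len(input_mouse), len(input_keyboard))
```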

In [None]:
# whole data set
task_completion = pd.concat([
    input_mouse,
    input_keyboard
])

task_completion

### Statistics for relevant tables

In [4]:
task_completion.describe()

In [5]:
input_mouse.describe()

In [6]:
input_keyboard.describe()
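`describe()` on a pooled table mixes both tasks and both input types. Assuming a table with one completion time per participant, task and input type, the statistics can be split per condition with `groupby`. The values below are placeholders made up for illustration, not study data, and the task and input type labels are hypothetical.

```python
import pandas as pd

TASK = "task"
INPUT_TYPE = "input_type"
TASK_COMPLETION_TIME = "task_completion_time_in_s"

# Placeholder values, not study data: one completion time per participant, task and input type
task_completion = pd.DataFrame({
    TASK: ["add", "add", "complex", "complex"] * 2,
    INPUT_TYPE: ["mouse", "keyboard"] * 4,
    TASK_COMPLETION_TIME: [40.0, 20.0, 30.0, 15.0, 44.0, 22.0, 34.0, 17.0],
})

# Summary statistics per task and input type instead of over the pooled table
summary = task_completion.groupby([TASK, INPUT_TYPE])[TASK_COMPLETION_TIME].describe()
print(summary[["count", "mean", "std", "min", "max"]])
```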

### Boxplots

In [7]:
# TODO: comparison
box_plot_input_type = sns.boxplot(
    data=task_completion,
    x=INPUT_TYPE,
    y=TASK_COMPLETION_TIME
)

box_plot_input_type.set(xlabel=INPUT_TYPE, ylabel=TASK_COMPLETION_TIME)
plt.show()

### T-test

In [8]:
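# Check the assumptions per input type: equal variances (Bartlett's test) and normality (pingouin's default, Shapiro-Wilk)
# pg.homoscedasticity(data=task_completion, dv=TASK_COMPLETION_TIME, group=INPUT_TYPE, method="bartlett")
# pg.normality(data=task_completion, dv=TASK_COMPLETION_TIME, group=INPUT_TYPE)
# Welch's ANOVA; with only two groups (mouse vs. keyboard) it is equivalent to Welch's t-test
# pg.welch_anova(data=task_completion, dv=TASK_COMPLETION_TIME, between=INPUT_TYPE)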
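Because every participant used both input types, a paired (dependent-samples) t-test is one option for comparing mouse and keyboard, ideally run separately per task since the two tasks differ in length. The sketch below uses `scipy.stats.ttest_rel`; pingouin offers the same test as `pg.ttest(x, y, paired=True)`. The numbers are placeholders made up for illustration, not study data, and with only four participants such a test has very little statistical power.

```python
from scipy import stats

# Placeholder completion times in seconds for four participants (made up, not study data).
# Position i in both lists must refer to the same participant.
mouse_times = [42.0, 38.5, 51.0, 60.2]
keyboard_times = [21.3, 19.8, 25.1, 33.0]

# Paired t-test: tests whether the mean per-participant difference is zero
result = stats.ttest_rel(mouse_times, keyboard_times)
print(f"t = {result.statistic:.2f}, p = {result.pvalue:.4f}")
```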

## Results

## Discussion