# Master Thesis - Analyses

The following code is for within-subject analysis of the quantitative and qualitative data gathered.

## Data

### Quantitative
- CAT
- Item ratings {personalised, random, control}

### Qualitative
- Interview themes

## Analyses

- CAT
    - Inter-rater reliability (Krippendorff's alpha)
    - Paired t-test (personalised vs random, constraints vs control)
- Item ratings
    - Personalised vs Random
        - Paired t-test
    - Constraints vs Control
        - Paired t-test
- Interview
    - Find themes, cross-reference with data

In [106]:
# imports
import pandas as pd
import krippendorff
import numpy as np
import math
from scipy import stats

### Data prep

In [33]:
# read in data
data = pd.read_csv('./Data/Numeric export Lab 2_TOTAL.csv', sep = ';')

In [34]:
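# clean data
# keep the 6 participant rows and the first 18 columns; the rest of the export is empty (NaN)
data = data.iloc[:6,:18]
# display with ID as the index (the result is not assigned, so data itself is unchanged)
data.set_index('ID')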

ID,rand_round_rating,rand_const_rating,rand_move,rand_CAT_1,rand_CAT_2,rand_CAT_3,pers_round_rating,pers_const_rating,pers_move,pers_CAT_1,pers_CAT_2,pers_CAT_3,control_round_rating,control_move,control_CAT_1,control_CAT_2,control_CAT_3
1.0,8.0,8.0,"Near the end, I got to a knee on the ground an...",6.0,4.0,2.0,6.0,7.0,"At some point, I got bored of just making stat...",3.0,2.0,4.0,6.0,"Twist on a tango enrosque, keeping it flat and...",6.0,5.0,4.0
3.0,5.0,7.0,Dont remember one particular movement,1.0,1.0,1.0,8.0,8.0,Not any specific but interesting images,4.0,2.0,1.0,5.0,Balance game,3.0,3.0,2.0
9.0,7.0,5.0,Rotations of the arms were symbols of both sid...,6.0,5.0,4.0,7.0,8.0,I tried to make lines/shapes and be fluid with...,6.0,4.0,3.0,6.0,My head was trying to do opposite from my arms...,6.0,6.0,4.0
10.0,6.0,6.0,Steps that resonated in movements in the upper...,2.0,4.0,2.0,4.0,3.0,It was turn that ended up on the floor to than...,4.0,3.0,3.0,5.0,A sequence in the upper body,2.0,3.0,2.0
32.0,8.0,3.0,Small rotations of the pelvic joints on a stat...,3.0,5.0,5.0,7.0,7.0,Initiating movement in my spine.,3.0,3.0,2.0,6.0,Small rotations of the whole shoulder blades.,3.0,3.0,2.0
36.0,8.0,7.0,"The first movement, working with the rotation ...",4.0,6.0,3.0,10.0,9.0,I love the whole bit of material. I find in th...,4.0,6.0,5.0,7.0,a slightly awkward walk backwards with one foo...,5.0,4.0,3.0


In [41]:
# split for convenience
# .copy() gives independent frames, so columns can be added later without a SettingWithCopyWarning
CAT = data.loc[:,['rand_CAT_1', 'rand_CAT_2', 'rand_CAT_3',
                   'pers_CAT_1', 'pers_CAT_2', 'pers_CAT_3',
                   'control_CAT_1', 'control_CAT_2', 'control_CAT_3']].copy()
ratings = data.loc[:,['rand_round_rating', 'pers_round_rating', 'control_round_rating']].copy()

## CAT

### Inter-rater reliability

In [78]:
# Inter-rater reliability
rater_1 = CAT.loc[:,['rand_CAT_1', 'pers_CAT_1', 'control_CAT_1']]
rater_2 = CAT.loc[:,['rand_CAT_2', 'pers_CAT_2', 'control_CAT_2']]
rater_3 = CAT.loc[:,['rand_CAT_3', 'pers_CAT_3', 'control_CAT_3']]

# convert to the format krippendorff expects: rows are raters, columns are rated units
# (6 participants x 3 conditions = 18 units, flattened row by row)
raters_T = pd.DataFrame({
    'Rater_1' : rater_1.to_numpy().ravel(),
    'Rater_2' : rater_2.to_numpy().ravel(),
    'Rater_3' : rater_3.to_numpy().ravel()
}).T

# Krippendorff's alpha is a reliability coefficient, not a correlation
# alpha = 0.456 indicates moderate agreement at best: it is below the 0.667
# that Krippendorff suggests as the minimum for tentative conclusions
krippendorff.alpha(raters_T)

0.45628156565656564
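As a cross-check on the value above, the sketch below computes Krippendorff's alpha by hand with NumPy. It assumes the interval metric and complete data (every rater scored every unit); `interval_alpha` is a hypothetical helper name, not part of the `krippendorff` package. The scores are copied from the data table, one row per rater and one column per participant and condition.

```python
import numpy as np

def interval_alpha(ratings):
    """Krippendorff's alpha with the interval metric, for complete data only.

    ratings: 2-D array-like, rows are raters, columns are rated units.
    """
    x = np.asarray(ratings, dtype=float)
    n_raters, _ = x.shape
    n_values = x.size

    # Observed disagreement: squared differences between raters within each unit.
    within = (x[:, None, :] - x[None, :, :]) ** 2
    d_observed = within.sum() / (n_values * (n_raters - 1))

    # Expected disagreement: squared differences between all pairs of values.
    flat = x.ravel()
    between = (flat[:, None] - flat[None, :]) ** 2
    d_expected = between.sum() / (n_values * (n_values - 1))

    return 1.0 - d_observed / d_expected

# CAT scores from the data table: 18 columns per rater
# (6 participants x rand, pers, control, in that order).
cat_scores = [
    [6, 3, 6, 1, 4, 3, 6, 6, 6, 2, 4, 2, 3, 3, 3, 4, 4, 5],  # rater 1
    [4, 2, 5, 1, 2, 3, 5, 4, 6, 4, 3, 3, 5, 3, 3, 6, 6, 4],  # rater 2
    [2, 4, 4, 1, 1, 2, 4, 3, 4, 2, 3, 2, 5, 2, 2, 3, 5, 3],  # rater 3
]
alpha = interval_alpha(cat_scores)
print(round(alpha, 4))  # 0.4563
```

The hand computation agrees with the reported alpha, which suggests the library call used the interval metric. CAT scores are ordinal, so an ordinal metric may be worth considering as well.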

### Paired t-test

- Shapiro-Wilk test to check the normality assumption
- Paired t-test to check whether the difference in means is significant
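The two steps above can be sketched as a single helper. Strictly, the paired t-test assumes that the paired differences are normally distributed, so this sketch runs Shapiro-Wilk on the differences rather than on each condition separately. `paired_comparison` is a hypothetical name introduced for illustration; the example values are the `rand_round_rating` and `pers_round_rating` columns from the data table.

```python
import numpy as np
from scipy import stats

def paired_comparison(a, b):
    """Shapiro-Wilk on the paired differences, then a paired t-test."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    diff = a - b

    # Normality check on the differences (the quantity the t-test relies on).
    _, normality_p = stats.shapiro(diff)

    # Paired t-test with n - 1 degrees of freedom.
    t_stat, t_p = stats.ttest_rel(a, b)
    return normality_p, t_stat, t_p

# Item ratings per participant, copied from the data table.
rand_rating = [8, 5, 7, 6, 8, 8]
pers_rating = [6, 8, 7, 4, 7, 10]

normality_p, t_stat, t_p = paired_comparison(rand_rating, pers_rating)
print(normality_p, t_stat, t_p)
```

With only six participants the Shapiro-Wilk test has little power, so a non-significant result is weak evidence of normality.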

In [118]:
CAT['rand_CAT_mean'] = CAT.loc[:,['rand_CAT_1', 'rand_CAT_2', 'rand_CAT_3']].mean(axis=1)
CAT['pers_CAT_mean'] = CAT.loc[:,['pers_CAT_1', 'pers_CAT_2', 'pers_CAT_3']].mean(axis=1)
CAT['constr_CAT_mean'] = CAT.loc[:,['rand_CAT_mean', 'pers_CAT_mean']].mean(axis=1)
CAT['control_CAT_mean'] = CAT.loc[:,['control_CAT_1', 'control_CAT_2', 'control_CAT_3']].mean(axis=1)

In [122]:
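# Constraints vs Control

# tests of normality; only the last expression in a cell is displayed,
# so the Shapiro-Wilk results are not shown in the output below
# note: strictly, the paired t-test assumes normality of the paired differences
stats.shapiro(CAT['constr_CAT_mean']) # pass
stats.shapiro(CAT['control_CAT_mean']) # pass

# paired t-test with degrees of freedom n-1 = 5
stats.ttest_rel(CAT['constr_CAT_mean'], CAT['control_CAT_mean']) # not significant (p = 0.70)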

Ttest_relResult(statistic=-0.40378642654362457, pvalue=0.7030599259725309)

In [125]:
# Personalised vs Random

# tests of normality
stats.shapiro(CAT['rand_CAT_mean']) #pass
stats.shapiro(CAT['pers_CAT_mean']) #pass

# with degrees of freedom n-1 = 5
stats.ttest_rel(CAT['rand_CAT_mean'], CAT['pers_CAT_mean']) # not significant (p = 0.82)

Ttest_relResult(statistic=0.23312620206007822, pvalue=0.8249065272991711)

## Item ratings

In [115]:
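# Constraints vs Control

# mean rating over the two constraint conditions (random and personalised);
# this column is not created in any earlier cell
ratings['constr_mean_rating'] = ratings.loc[:,['rand_round_rating', 'pers_round_rating']].mean(axis=1)

# tests of normality
stats.shapiro(ratings['constr_mean_rating']) # pass
stats.shapiro(ratings['control_round_rating']) # pass

# with degrees of freedom n-1 = 5
stats.ttest_rel(ratings['constr_mean_rating'], ratings['control_round_rating']) # significant at p < 0.01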

Ttest_relResult(statistic=4.183300132670378, pvalue=0.008627407845019445)
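To make the statistic above easier to audit, here is a minimal sketch that recomputes it by hand using only the standard library. It assumes the constraint score is the mean of the random and personalised round ratings, and the values are copied from the data table.

```python
import math
import statistics

# Item ratings per participant, copied from the data table.
rand_rating = [8, 5, 7, 6, 8, 8]
pers_rating = [6, 8, 7, 4, 7, 10]
control_rating = [6, 5, 6, 5, 6, 7]

# Constraint score: mean of the random and personalised ratings (assumption).
constr_rating = [(r + p) / 2 for r, p in zip(rand_rating, pers_rating)]

# Paired differences, constraints minus control.
diff = [c - k for c, k in zip(constr_rating, control_rating)]
n = len(diff)
mean_diff = statistics.mean(diff)
sd_diff = statistics.stdev(diff)  # sample standard deviation (n - 1 denominator)

# Paired t statistic: mean difference divided by its standard error.
t_stat = mean_diff / (sd_diff / math.sqrt(n))
degrees_of_freedom = n - 1
print(round(t_stat, 4), degrees_of_freedom)  # 4.1833 5
```

The result matches the `ttest_rel` statistic above, which supports the assumed definition of the constraint score.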

In [109]:
# Personalised vs Random

# tests of normality
stats.shapiro(ratings['rand_round_rating']) #pass
stats.shapiro(ratings['pers_round_rating']) #pass

# with degrees of freedom n-1 = 5
stats.ttest_rel(ratings['rand_round_rating'], ratings['pers_round_rating']) # no difference: the paired differences sum to zero, so t = 0 and p = 1

Ttest_relResult(statistic=0.0, pvalue=1.0)