# Task 2: Does the test Stimulus (independent variable) have a significant influence on speech quality ratings (dependent variable)? If yes, for which cases? Please assume that each of the six files was assessed by a different set of participants AND ONLY USE RATINGS OF THE FIRST REPETITION (repetition = 1). Use the quality ratings provided in the data set speech_quality_repetition_dataset.

### Step 1: Import libraries and read in data set

In [1]:
# install.packages('dplyr')      # processing 
# install.packages('readxl')     # file reading
# install.packages('car')        # homogeneity of variances
# install.packages('rstatix')    # ANOVA
# install.packages('lsr')        # effect size

In [2]:
library(dplyr)     # processing
library(readxl)    # reading in data
library(car)       # homogeneity of variances
library(rstatix)   # ANOVA
library(lsr)       # effect size

"package 'dplyr' was built under R version 3.6.2"
Attaching package: 'dplyr'

The following objects are masked from 'package:stats':

    filter, lag

The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union

"package 'car' was built under R version 3.6.2"
Loading required package: carData

Attaching package: 'car'

The following object is masked from 'package:dplyr':

    recode

"package 'rstatix' was built under R version 3.6.2"
Attaching package: 'rstatix'

The following object is masked from 'package:stats':

    filter



In [3]:
# read in data set
get_quality_data <- function() {
    quality_data <- read.csv("datasets/DB03_speech_quality_repetition_dataset.csv")
    # keep only first-repetition ratings, rename the columns and sort by subject and stimulus
    quality_data <- quality_data %>% 
                        filter(repetition == 1) %>%
                        rename(Subject = subjectCode, Stimulus = testStimulus, Rating = rating) %>%
                        select(Subject, Stimulus, Rating) %>% 
                        arrange(Subject, Stimulus) 
    
    quality_data
}

quality_data <- get_quality_data()
head(quality_data)

Subject,Stimulus,Rating
vp01,haus_m_700_bpf_200_2800_normAsl_-26,2
vp01,haus_m_700_mnru_Q_14_normAsl_-26,3
vp01,haus_m_700_normAsl_-26,4
vp01,maus_m_700_bpf_200_2800_normAsl_-26,2
vp01,maus_m_700_mnru_Q_14_normAsl_-26,3
vp01,maus_m_700_normAsl_-26,5


### Step 2: Decide on which ANOVA test to use

#### => 1 independent variable (testStimulus), 1 dependent variable (rating), NOT repeated measures but independent ratings => one-way independent-measures ANOVA

### Step 3: Check assumptions

#### 1. Dependent variables on interval or ratio scale => check, because rating is discrete & on interval scale
#### 2. Independent variables with two or more groups => check, because testStimulus has multiple different values (6 in total)
#### 3. Independence of observations => check, because observations come from different people in different conditions, i.e. stimuli, therefore independence can be assumed
#### 4. No significant outliers => don't know, need to check that in the next step!
#### 5. Normally distributed population for every single group => don't know, need to check that in the next step!
#### 6. Homogeneity of variances => don't know, need to check that in the next step!

### Step 3.1: Outlier detection

In [4]:
# z score method
z_scores <- quality_data %>% 
                    mutate(Std_Dev_Rating = sd(Rating), 
                           Mean_Rating = mean(Rating)) %>%
                    mutate(Z_Score_Rating = (Rating - Mean_Rating) / Std_Dev_Rating) %>%
                    select(Subject, Stimulus, Rating, Z_Score_Rating) %>%
                    arrange(desc(Z_Score_Rating))

head(z_scores)

Subject,Stimulus,Rating,Z_Score_Rating
vp01,maus_m_700_normAsl_-26,5,1.435227
vp02,maus_m_700_normAsl_-26,5,1.435227
vp03,haus_m_700_normAsl_-26,5,1.435227
vp03,maus_m_700_normAsl_-26,5,1.435227
vp04,haus_m_700_normAsl_-26,5,1.435227
vp04,maus_m_700_normAsl_-26,5,1.435227


#### Criterion checked: no significant outliers (no absolute z score greater than 3.29)
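The sorted `head()` above only shows the largest positive z scores. A minimal sketch of checking the criterion directly, by looking at the largest absolute z score, could look like this. The `ratings` vector is hypothetical toy data, not the study data.

```r
# Hypothetical ratings on a 1-5 scale (NOT the study data)
ratings <- c(2, 3, 4, 2, 3, 5, 1, 4, 5, 3, 2, 4)

# Standardize: z = (x - mean) / sd
z <- (ratings - mean(ratings)) / sd(ratings)

# Largest absolute z score; values above 3.29 would be flagged as outliers
max_abs_z   <- max(abs(z))
outlier_idx <- which(abs(z) > 3.29)

cat("max |z| =", round(max_abs_z, 3), "\n")
cat("number of flagged outliers:", length(outlier_idx), "\n")
```

Using `abs()` covers unusually low ratings as well as unusually high ones in a single check.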

### Step 3.2: Normally distributed population for every single group

In [5]:
# normality checking for groups
check_normality_for_group <- function(stimulus) {
    data <- quality_data %>% dplyr::filter(Stimulus == stimulus) 
    test_result <- ks.test(data[['Rating']], "pnorm", mean=mean(data[['Rating']]), sd=sd(data[['Rating']]))
    result_string <- paste0('Normality for Stimulus ', stimulus, ': ')
    
    if(test_result[['p.value']] < 0.05) {
        result_string <- paste0(result_string, as.character(round(test_result[['p.value']], digits=7)), 
                                ' p-value (Kolmogorov-Smirnov) == NO, ')
    } else {
        result_string <- paste0(result_string, as.character(round(test_result[['p.value']], digits=7)), 
                                ' p-value (Kolmogorov-Smirnov) == YES, ')
    }
    
    test_result <- shapiro.test(data[['Rating']])
    if(test_result[['p.value']] < 0.05) {
        result_string <- paste0(result_string, as.character(round(test_result[['p.value']], digits=7)), 
                                ' p-value (Shapiro-Wilk) == NO!')
    } else {
        result_string <- paste0(result_string, as.character(round(test_result[['p.value']], digits=7)), 
                                ' p-value (Shapiro-Wilk) == YES!')
    }
    
    # print
    cat(result_string)
}

In [6]:
# print normality tests
stimuli <- (quality_data %>% distinct(Stimulus))[['Stimulus']]

for (stimulus in stimuli) {
    check_normality_for_group(stimulus)
}

"ties should not be present for the Kolmogorov-Smirnov test"

Normality for Stimulus haus_m_700_bpf_200_2800_normAsl_-26: 0.0657124 p-value (Kolmogorov-Smirnov) == YES, 0.0013676 p-value (Shapiro-Wilk) == NO!

"ties should not be present for the Kolmogorov-Smirnov test"

Normality for Stimulus haus_m_700_mnru_Q_14_normAsl_-26: 0.0028569 p-value (Kolmogorov-Smirnov) == NO, 0.0001636 p-value (Shapiro-Wilk) == NO!

"ties should not be present for the Kolmogorov-Smirnov test"

Normality for Stimulus haus_m_700_normAsl_-26: 0.0002495 p-value (Kolmogorov-Smirnov) == NO, 2e-07 p-value (Shapiro-Wilk) == NO!

"ties should not be present for the Kolmogorov-Smirnov test"

Normality for Stimulus maus_m_700_bpf_200_2800_normAsl_-26: 0.0406415 p-value (Kolmogorov-Smirnov) == NO, 0.0002901 p-value (Shapiro-Wilk) == NO!

"ties should not be present for the Kolmogorov-Smirnov test"

Normality for Stimulus maus_m_700_mnru_Q_14_normAsl_-26: 0.0052889 p-value (Kolmogorov-Smirnov) == NO, 0.0001113 p-value (Shapiro-Wilk) == NO!

"ties should not be present for the Kolmogorov-Smirnov test"

Normality for Stimulus maus_m_700_normAsl_-26: 0.0001038 p-value (Kolmogorov-Smirnov) == NO, 0 p-value (Shapiro-Wilk) == NO!

#### Normality cannot be assumed for any of the six groups (Shapiro-Wilk p-value < 0.05 in every group) - we are still going to continue though.
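The per-group Shapiro-Wilk check can also be written compactly in base R. This is only a sketch on hypothetical toy data (the `toy` data frame and its two groups are assumptions for illustration), not a replacement for the function above.

```r
# Hypothetical data: two groups of integer ratings (NOT the study data)
toy <- data.frame(
  Stimulus = rep(c("A", "B"), each = 10),
  Rating   = c(1, 2, 2, 3, 3, 3, 4, 4, 5, 5,
               4, 5, 5, 5, 5, 5, 5, 4, 5, 5)
)

# Run shapiro.test() once per group and keep only the p-values
p_values <- sapply(split(toy$Rating, toy$Stimulus),
                   function(x) shapiro.test(x)$p.value)

# TRUE means normality is rejected at alpha = 0.05
rejected <- p_values < 0.05
print(data.frame(p_value = round(p_values, 4), normality_rejected = rejected))
```

`split()` produces one rating vector per stimulus, so the same pattern should extend to any number of groups.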

### Step 3.3: Homogeneity of variances

In [7]:
# Check for homogeneity of variances of the groups' VQ ratings
get_levene_test_results <- function() {

    test_results <- leveneTest(Rating ~ Stimulus, data = quality_data, center = mean)
    test_results

    stimuli <- (quality_data %>% distinct(Stimulus))[['Stimulus']]
    result <- 'F('
    for (stimulus in stimuli) {
        df <- (quality_data %>% dplyr::filter(Stimulus == stimulus) %>% 
               mutate(df = n() - 1))[1,][['df']]
        result <- paste0(result, ' df_{', stimulus, '} = ', df, ';\n')
    }
    result <- paste0(substr(result,1,nchar(result)-2), ' ) = ', 
                     round(test_results[1,2], digits=7), 
                ' | p-value = ', 
                round(test_results[1,3], digits=7))
    
    if(test_results[1,3] > 0.05) {
        result <- paste0(result, ' => homogeneity of variance CAN be assumed')
    } else {
        result <- paste0(result, ' => homogeneity of variance CANNOT be assumed')
    }
    
    # print
    cat(result)
}

get_levene_test_results()

F( df_{haus_m_700_bpf_200_2800_normAsl_-26} = 36;
 df_{haus_m_700_mnru_Q_14_normAsl_-26} = 36;
 df_{haus_m_700_normAsl_-26} = 36;
 df_{maus_m_700_bpf_200_2800_normAsl_-26} = 36;
 df_{maus_m_700_mnru_Q_14_normAsl_-26} = 36;
 df_{maus_m_700_normAsl_-26} = 36 ) = 1.5042309 | p-value = 0.1896423 => homogeneity of variance CAN be assumed

#### As Levene's test delivers p-value > 0.05 => homogeneity of variances CAN be assumed!
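To make clearer what Levene's test with `center = mean` computes, here is a minimal base R sketch on hypothetical toy data: the test amounts to a one-way ANOVA on the absolute deviations of each rating from its group mean. The `toy` data frame and the `AbsDev` column are assumptions for illustration only.

```r
# Hypothetical data: three groups of ratings (NOT the study data)
toy <- data.frame(
  Stimulus = factor(rep(c("A", "B", "C"), each = 6)),
  Rating   = c(2, 3, 3, 4, 2, 3,
               4, 5, 5, 4, 5, 3,
               1, 2, 5, 3, 4, 2)
)

# Absolute deviation of each rating from its own group mean
group_means <- ave(toy$Rating, toy$Stimulus, FUN = mean)
toy$AbsDev  <- abs(toy$Rating - group_means)

# Levene's test (center = mean) is a one-way ANOVA on these deviations
levene_table <- anova(lm(AbsDev ~ Stimulus, data = toy))

f_value <- levene_table[["F value"]][1]
p_value <- levene_table[["Pr(>F)"]][1]
df      <- levene_table[["Df"]]        # c(k - 1, N - k)

cat(sprintf("F(%d, %d) = %.4f, p = %.4f\n", df[1], df[2], f_value, p_value))
```

The F statistic has k - 1 and N - k degrees of freedom. With 6 stimuli and 222 ratings, as in the data set here, that would be 5 and 216.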

### Step 4: Conduct one-way independent measure ANOVA

In [8]:
# conduct one-way independent measure ANOVA
anova_results <- aov(Rating~Stimulus, data = quality_data)
summary(anova_results)

# compute effect size
etaSquared(anova_results, anova=TRUE)

             Df Sum Sq Mean Sq F value Pr(>F)    
Stimulus      5  231.8   46.35   66.55 <2e-16 ***
Residuals   216  150.4    0.70                   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Unnamed: 0,eta.sq,eta.sq.part,SS,df,MS,F,p
Stimulus,0.6063881,0.6063881,231.7523,5,46.3504505,66.55278,0.0
Residuals,0.3936119,,150.4324,216,0.6964464,,


#### Based on the results: Stimulus has a significant effect on speech quality ratings, F(5, 216) = 66.55, p-value < 2e-16 (more on that later)!
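As a quick plausibility check, the F value and the eta² effect size can be recomputed by hand from the sums of squares and degrees of freedom in the ANOVA table above. The variable names in this sketch are illustrative only.

```r
# Sums of squares and degrees of freedom copied from the ANOVA table above
ss_stimulus  <- 231.7523
ss_residuals <- 150.4324
df_stimulus  <- 5
df_residuals <- 216

# Mean squares and F ratio
ms_stimulus  <- ss_stimulus / df_stimulus
ms_residuals <- ss_residuals / df_residuals
f_value      <- ms_stimulus / ms_residuals

# Eta squared: share of the total sum of squares explained by Stimulus
eta_sq <- ss_stimulus / (ss_stimulus + ss_residuals)

# Upper-tail p-value of the F distribution
p_value <- pf(f_value, df_stimulus, df_residuals, lower.tail = FALSE)

cat(sprintf("F(%d, %d) = %.2f, eta^2 = %.4f, p = %.3g\n",
            df_stimulus, df_residuals, f_value, eta_sq, p_value))
```

In a one-way design there is only one effect, which is why `eta.sq` and `eta.sq.part` coincide in the `etaSquared()` output.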

### Step 5: Pairwise comparison / post hoc test

In [9]:
# equal sample sizes (n = 37 per group) and Tukey is the most widely used test: use Tukey's HSD ["Cramming Sam's tips" for post hoc tests (from lecture)]
post_hoc <- TukeyHSD(anova_results)
tibble::rownames_to_column(as.data.frame(post_hoc$Stimulus), 'Compared Stimuli') %>% arrange(desc(`p adj`))

Compared Stimuli,diff,lwr,upr,p adj
maus_m_700_normAsl_-26-haus_m_700_normAsl_-26,0.02702703,-0.5308884,0.584942486,0.99999261
maus_m_700_bpf_200_2800_normAsl_-26-haus_m_700_bpf_200_2800_normAsl_-26,0.05405405,-0.5038614,0.611969513,0.99977121
maus_m_700_mnru_Q_14_normAsl_-26-haus_m_700_mnru_Q_14_normAsl_-26,-0.21621622,-0.7741317,0.341699242,0.87505663
haus_m_700_mnru_Q_14_normAsl_-26-haus_m_700_bpf_200_2800_normAsl_-26,-0.35135135,-0.9092668,0.206564107,0.46094084
maus_m_700_bpf_200_2800_normAsl_-26-haus_m_700_mnru_Q_14_normAsl_-26,0.40540541,-0.1525101,0.963320864,0.29669465
maus_m_700_mnru_Q_14_normAsl_-26-haus_m_700_bpf_200_2800_normAsl_-26,-0.56756757,-1.125483,-0.009652109,0.04358349
maus_m_700_mnru_Q_14_normAsl_-26-maus_m_700_bpf_200_2800_normAsl_-26,-0.62162162,-1.1795371,-0.063706163,0.01921988
haus_m_700_normAsl_-26-haus_m_700_bpf_200_2800_normAsl_-26,1.89189189,1.3339764,2.449807351,0.0
maus_m_700_normAsl_-26-haus_m_700_bpf_200_2800_normAsl_-26,1.91891892,1.3610035,2.476834378,0.0
haus_m_700_normAsl_-26-haus_m_700_mnru_Q_14_normAsl_-26,2.24324324,1.6853278,2.801158702,0.0


#### => More: see interpretation
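Instead of reading the significant pairs off the sorted table by eye, they can be filtered programmatically. The following is a minimal base R sketch on hypothetical, randomly generated toy data (the `toy` data frame, the seed and the group means are assumptions for illustration), not the study data.

```r
# Hypothetical data: three groups with different means (NOT the study data)
set.seed(1)
toy <- data.frame(
  Stimulus = factor(rep(c("A", "B", "C"), each = 20)),
  Rating   = c(rnorm(20, mean = 2), rnorm(20, mean = 2.1), rnorm(20, mean = 4.5))
)

fit      <- aov(Rating ~ Stimulus, data = toy)
post_hoc <- TukeyHSD(fit)

# Turn the comparison matrix into a data frame with one row per pair
pairs <- as.data.frame(post_hoc$Stimulus)
pairs$Comparison <- rownames(pairs)
rownames(pairs)  <- NULL

# Keep only the pairs whose adjusted p-value is below alpha = 0.05
significant <- pairs[pairs[["p adj"]] < 0.05, c("Comparison", "diff", "p adj")]
print(significant)
```

Flipping the comparison to `>= 0.05` would list the non-significant pairs instead.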

### Step 6: Interpretation

In [10]:
# compute individual degrees of freedom for groups
get_df_of <- function(stimulus) {
    df <- (quality_data %>% dplyr::filter(Stimulus == stimulus) %>% 
            mutate(df = n() - 1))[1,][['df']]
    
    # print
    paste0('df_{', stimulus, '} = ', df)
}

cat(paste0('Total amount of samples: ', nrow(quality_data)))

Total amount of samples: 222

In [11]:
# compute statistics for independent variables' values
for (stimulus in stimuli) {
    statistics <- quality_data %>% 
                    filter(Stimulus == stimulus) %>%
                    group_by(Stimulus) %>%
                    summarize(mean = mean(Rating), sd = sd(Rating))
    cat(paste0(stimulus, ' (', get_df_of(stimulus), '): mean around ', round(statistics[['mean']], digits=3), ', standard deviation around ', round(statistics[['sd']], digits=3), '\n'))
}

haus_m_700_bpf_200_2800_normAsl_-26 (df_{haus_m_700_bpf_200_2800_normAsl_-26} = 36): mean around 2.622, standard deviation around 0.982
haus_m_700_mnru_Q_14_normAsl_-26 (df_{haus_m_700_mnru_Q_14_normAsl_-26} = 36): mean around 2.27, standard deviation around 0.871
haus_m_700_normAsl_-26 (df_{haus_m_700_normAsl_-26} = 36): mean around 4.514, standard deviation around 0.559
maus_m_700_bpf_200_2800_normAsl_-26 (df_{maus_m_700_bpf_200_2800_normAsl_-26} = 36): mean around 2.676, standard deviation around 0.915
maus_m_700_mnru_Q_14_normAsl_-26 (df_{maus_m_700_mnru_Q_14_normAsl_-26} = 36): mean around 2.054, standard deviation around 0.815
maus_m_700_normAsl_-26 (df_{maus_m_700_normAsl_-26} = 36): mean around 4.541, standard deviation around 0.803


#### Altering the testStimulus, i.e. the condition, does have a significant effect on the speech quality ratings: 
#### There is a significant (alpha = 0.05) and rather large main effect of the testStimulus on the speech quality ratings (F(5, 216) of around 66.55, p-value < 2e-16, eta² effect size of around 0.6064). 
#### The total degrees of freedom are the total number of observations - 1 => 222 - 1 = 221 [37 samples & 36 df for each distinct testStimulus], split into 5 between groups and 216 within groups. 
#### Regarding pairwise comparisons / post hoc tests (Tukey's HSD), a statistically significant difference in ratings occurs each time we alter the testStimulus, EXCEPT for the following five cases (in descending order of p values):
#### 1. Switching between maus_m_700_normAsl_-26 & haus_m_700_normAsl_-26
#### 2. Switching between maus_m_700_bpf_200_2800_normAsl_-26 & haus_m_700_bpf_200_2800_normAsl_-26
#### 3. Switching between maus_m_700_mnru_Q_14_normAsl_-26 & haus_m_700_mnru_Q_14_normAsl_-26
#### 4. Switching between haus_m_700_mnru_Q_14_normAsl_-26 & haus_m_700_bpf_200_2800_normAsl_-26
#### 5. Switching between maus_m_700_bpf_200_2800_normAsl_-26 & haus_m_700_mnru_Q_14_normAsl_-26
#### All other changes of the testStimulus lead to a significant difference in speech quality ratings.