# Task 1: Does increasing the bitrate or changing the game (independent variables) have a significant effect on the video quality (VQ) ratings (dependent variable)? Please consider ALL RATINGS at a resolution of 1080p and a framerate of 60 fps. Use the ratings provided in the gaming video quality data set.

### Step 1: Import libraries and read in data set

In [1]:
# install.packages('dplyr')      # processing 
# install.packages('readxl')     # file reading
# install.packages('car')        # homogeneity of variances
# install.packages('rstatix')    # Tukey's post hoc test
# install.packages('ez')         # ANOVA table

In [2]:
library(dplyr)     # processing
library(readxl)    # reading in data
library(car)       # homogeneity of variances
library(rstatix)   # Tukey's post hoc test
library(ez)        # ANOVA table



In [3]:
# read in the data set and keep only the 1080p / 60 fps ratings
# (Bitrate is stored as character so that it is treated as a categorical variable)
get_gaming_data <- function() {
    gaming_data <- read_excel("datasets/DB01_gaming_video_quality_dataset.xlsx")
    gaming_data <- gaming_data %>% dplyr::filter(Resolution == 1080, Framerate == 60) %>%
                                    select(PID, Bitrate, Game, VQ) %>%
                                    mutate(Bitrate = as.character(Bitrate), PID = as.factor(PID), Game = as.factor(Game)) %>%
                                    arrange(PID)
    
    gaming_data
}

gaming_data <- get_gaming_data()
head(gaming_data)

| PID | Bitrate | Game | VQ |
|---|---|---|---|
| 1 | 2000 | Game2 | 2.9 |
| 1 | 2000 | Game5 | 2.2 |
| 1 | 2000 | Game1 | 2.9 |
| 1 | 2000 | Game6 | 2.0 |
| 1 | 2000 | Game3 | 3.4 |
| 1 | 2000 | Game4 | 2.6 |
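Because `Bitrate` is converted with `as.character()`, its levels sort as text, which is why later outputs list 50000 kbps before 6000 kbps. A minimal sketch, on a hypothetical toy vector rather than the data set, of ordering the levels numerically instead:

```r
# Hypothetical toy bitrate labels (not taken from the data set)
bitrate_chr <- c("2000", "50000", "4000", "6000", "2000", "6000")

# factor() sorts character levels as text: "2000" "4000" "50000" "6000"
alphabetical_levels <- levels(factor(bitrate_chr))

# Ordering the unique labels by their numeric value gives 2000, 4000, 6000, 50000
unique_labels <- unique(bitrate_chr)
numeric_levels <- unique_labels[order(as.numeric(unique_labels))]
bitrate_fct <- factor(bitrate_chr, levels = numeric_levels)

print(alphabetical_levels)
print(levels(bitrate_fct))
```

In `get_gaming_data()` the same idea would be `mutate(Bitrate = factor(Bitrate, levels = c('2000', '4000', '6000', '50000')))`. The ANOVA itself does not depend on the level order; only the order of printed tables does.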


### Step 2: Decide which ANOVA to use

#### => 2 independent variables (bitrate & game), 1 dependent variable (VQ), repeated measures on the same subjects => two-way repeated-measures ANOVA
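For comparison, the same two-way repeated-measures ANOVA can be written in base R with an `Error()` term, where each effect is tested against its own participant-by-effect error stratum. The sketch below uses simulated, balanced data (5 hypothetical raters, 4 bitrates, 3 games), not the study data; unlike `ezANOVA`, `aov` does not report Mauchly's test or sphericity corrections.

```r
set.seed(1)
bitrate_levels <- c("2000", "4000", "6000", "50000")

# Hypothetical balanced design: every rater rates every Bitrate x Game cell once
sim <- expand.grid(PID = factor(1:5),
                   Bitrate = factor(bitrate_levels, levels = bitrate_levels),
                   Game = factor(paste0("Game", 1:3)))

# Simulated ratings: rise with bitrate, plus a per-rater offset and noise
sim$VQ <- 2 + 0.8 * as.integer(sim$Bitrate) + rnorm(5)[sim$PID] +
    rnorm(nrow(sim), sd = 0.5)

# Error(PID / (Bitrate * Game)) gives the strata PID, PID:Bitrate, PID:Game, PID:Bitrate:Game
fit <- aov(VQ ~ Bitrate * Game + Error(PID / (Bitrate * Game)), data = sim)
summary(fit)
```

With complete, balanced data the F statistics should agree with `ezANOVA`'s uncorrected (sphericity-assumed) results, because the sums-of-squares types coincide in a balanced design.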

### Step 3: Check assumptions

#### 1. Dependent variable on an interval or ratio scale => check, because VQ is treated as continuous
#### 2. Independent variables with two or more levels => check: Game1-Game6 for game, and 2000, 4000, 6000 and 50000 kbps for bitrate
#### 3. Independence of observations => check between participants (each participant rates independently); ratings within one participant are correlated, which the repeated-measures design explicitly accounts for
#### 4. No significant outliers => unknown, checked in Step 3.1
#### 5. Normally distributed population for every group => unknown, checked in Step 3.2
#### 6. Homogeneity of variances => unknown, checked in Step 3.3
#### 7. Sphericity (equal variances of the differences between all pairs of within-subject conditions) => unknown, checked with Mauchly's test in Step 4

### Step 3.1: Outlier detection

In [4]:
# z score method (one mean and standard deviation over all pooled ratings; sorted by signed z score)
head(
    gaming_data %>% 
                mutate(Std_Dev_VQ = sd(VQ), 
                       Mean_VQ = mean(VQ)) %>%
                mutate(Z_Score_VQ = (VQ - Mean_VQ) / Std_Dev_VQ) %>%
                select(VQ, Z_Score_VQ) %>%
                drop_na() %>%
                arrange(desc(Z_Score_VQ))
)

| VQ | Z_Score_VQ |
|---|---|
| 7.0 | 2.334889 |
| 6.7 | 2.076914 |
| 6.7 | 2.076914 |
| 6.6 | 1.990922 |
| 6.5 | 1.904930 |
| 6.4 | 1.818938 |


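#### Criterion checked for the listed ratings: the largest z score is about 2.33, below the 3.29 threshold, so there are no significant high outliers. Because the list is sorted by signed z score, the most negative z scores are not shown and need to be checked as well.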
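Because the bitrate alone shifts mean ratings by more than two points, pooled z scores mix condition differences with genuine outliers. A sketch of a per-group check, assuming z scores are computed within each Game x Bitrate cell and sorted by absolute value so that extreme low ratings are listed too; the ratings below are simulated, not from the data set.

```r
library(dplyr)

set.seed(42)
# Hypothetical ratings: 24 Game x Bitrate cells with 22 ratings each (simulated)
sim <- expand.grid(PID = 1:22, Bitrate = c("2000", "4000", "6000", "50000"),
                   Game = paste0("Game", 1:6), stringsAsFactors = FALSE)
sim$VQ <- round(rnorm(nrow(sim), mean = 4, sd = 1), 1)

# z scores computed within each cell, then ranked by absolute value
outlier_check <- sim %>%
    group_by(Game, Bitrate) %>%
    mutate(Z_Score_VQ = (VQ - mean(VQ)) / sd(VQ)) %>%
    ungroup() %>%
    arrange(desc(abs(Z_Score_VQ)))

head(outlier_check)                          # most extreme ratings in either direction
sum(abs(outlier_check$Z_Score_VQ) > 3.29)    # number of potential outliers
```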

### Step 3.2: Normally distributed population for every single group

In [5]:
# normality checking for one Game x Bitrate group
# (Kolmogorov-Smirnov against a normal with the sample's mean/sd, plus Shapiro-Wilk)
check_normality_for_group <- function(game, bitrate) {
    data <- gaming_data %>% dplyr::filter(Game == game & Bitrate == bitrate)
    result_string <- paste0('Normality for ', game, ' & bitrate of ', bitrate, ' kbps: ')

    test_result <- ks.test(data[['VQ']], "pnorm", mean=mean(data[['VQ']]), sd=sd(data[['VQ']]))
    verdict <- if (test_result[['p.value']] < 0.05) 'NO' else 'YES'
    result_string <- paste0(result_string, round(test_result[['p.value']], digits=3),
                            ' p-value (Kolmogorov-Smirnov) == ', verdict, ', ')

    test_result <- shapiro.test(data[['VQ']])
    verdict <- if (test_result[['p.value']] < 0.05) 'NO' else 'YES'
    result_string <- paste0(result_string, round(test_result[['p.value']], digits=3),
                            ' p-value (Shapiro-Wilk) == ', verdict, '!\n')

    result_string
}

In [6]:
# print normality tests
games <- (gaming_data %>% distinct(Game) %>% arrange(Game))[['Game']]
bitrates <- (gaming_data %>% distinct(Bitrate) %>% arrange(Bitrate))[['Bitrate']]

for (game in games) {
    for (bitrate in bitrates) {
        cat(check_normality_for_group(game, bitrate))
    }  
}

"ties should not be present for the Kolmogorov-Smirnov test"

Normality for Game1 & bitrate of 2000 kbps: 0.249 p-value (Kolmogorov-Smirnov) == YES, 0.04 p-value (Shapiro-Wilk) == NO!
Normality for Game1 & bitrate of 4000 kbps: 0.833 p-value (Kolmogorov-Smirnov) == YES, 0.211 p-value (Shapiro-Wilk) == YES!
Normality for Game1 & bitrate of 50000 kbps: 0.761 p-value (Kolmogorov-Smirnov) == YES, 0.355 p-value (Shapiro-Wilk) == YES!
Normality for Game1 & bitrate of 6000 kbps: 0.975 p-value (Kolmogorov-Smirnov) == YES, 0.308 p-value (Shapiro-Wilk) == YES!
Normality for Game2 & bitrate of 2000 kbps: 0.992 p-value (Kolmogorov-Smirnov) == YES, 0.551 p-value (Shapiro-Wilk) == YES!
Normality for Game2 & bitrate of 4000 kbps: 0.865 p-value (Kolmogorov-Smirnov) == YES, 0.456 p-value (Shapiro-Wilk) == YES!
Normality for Game2 & bitrate of 50000 kbps: 0.062 p-value (Kolmogorov-Smirnov) == YES, 0 p-value (Shapiro-Wilk) == NO!
Normality for Game2 & bitrate of 6000 kbps: 0.591 p-value (Kolmogorov-Smirnov) == YES, 0.085 p-value (Shapiro-Wilk) == YES!
Normality for Game3 & bitrate of 2000 kbps: 0.583 p-value (Kolmogorov-Smirnov) == YES, 0.045 p-value (Shapiro-Wilk) == NO!
Normality for Game3 & bitrate of 4000 kbps: 0.903 p-value (Kolmogorov-Smirnov) == YES, 0.104 p-value (Shapiro-Wilk) == YES!
Normality for Game3 & bitrate of 50000 kbps: 0.289 p-value (Kolmogorov-Smirnov) == YES, 0.055 p-value (Shapiro-Wilk) == YES!
Normality for Game3 & bitrate of 6000 kbps: 0.776 p-value (Kolmogorov-Smirnov) == YES, 0.431 p-value (Shapiro-Wilk) == YES!
Normality for Game4 & bitrate of 2000 kbps: 0.345 p-value (Kolmogorov-Smirnov) == YES, 0.006 p-value (Shapiro-Wilk) == NO!
Normality for Game4 & bitrate of 4000 kbps: 0.794 p-value (Kolmogorov-Smirnov) == YES, 0.871 p-value (Shapiro-Wilk) == YES!
Normality for Game4 & bitrate of 50000 kbps: 0.461 p-value (Kolmogorov-Smirnov) == YES, 0.011 p-value (Shapiro-Wilk) == NO!
Normality for Game4 & bitrate of 6000 kbps: 0.586 p-value (Kolmogorov-Smirnov) == YES, 0.335 p-value (Shapiro-Wilk) == YES!
Normality for Game5 & bitrate of 2000 kbps: 0.524 p-value (Kolmogorov-Smirnov) == YES, 0.284 p-value (Shapiro-Wilk) == YES!
Normality for Game5 & bitrate of 4000 kbps: 0.87 p-value (Kolmogorov-Smirnov) == YES, 0.439 p-value (Shapiro-Wilk) == YES!
Normality for Game5 & bitrate of 50000 kbps: 0.554 p-value (Kolmogorov-Smirnov) == YES, 0.088 p-value (Shapiro-Wilk) == YES!
Normality for Game5 & bitrate of 6000 kbps: 0.411 p-value (Kolmogorov-Smirnov) == YES, 0.051 p-value (Shapiro-Wilk) == YES!
Normality for Game6 & bitrate of 2000 kbps: 0.48 p-value (Kolmogorov-Smirnov) == YES, 0.039 p-value (Shapiro-Wilk) == NO!
Normality for Game6 & bitrate of 4000 kbps: 0.673 p-value (Kolmogorov-Smirnov) == YES, 0.051 p-value (Shapiro-Wilk) == YES!
Normality for Game6 & bitrate of 50000 kbps: 0.267 p-value (Kolmogorov-Smirnov) == YES, 0.001 p-value (Shapiro-Wilk) == NO!
Normality for Game6 & bitrate of 6000 kbps: 0.515 p-value (Kolmogorov-Smirnov) == YES, 0.119 p-value (Shapiro-Wilk) == YES!

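#### Shapiro-Wilk rejects normality (p < 0.05) in 7 of the 24 groups (Game1/2000, Game2/50000, Game3/2000, Game4/2000, Game4/50000, Game6/2000 and Game6/50000 kbps), so normality cannot be assumed for all groups. The Kolmogorov-Smirnov test accepts normality everywhere, but it is unreliable here: the mean and standard deviation it tests against are estimated from the same sample, which makes it rarely reject without the Lilliefors correction, and the ratings contain ties. We still continue, since ANOVA is generally considered fairly robust to moderate violations of normality.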
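Shapiro-Wilk p-values for all 24 groups can also be collected in one table. The sketch below uses base R `aggregate()` on simulated ratings (not the data set) and adds a Holm-adjusted column. That adjustment is our assumption, not part of the original analysis: running 24 tests at alpha = 0.05 is expected to produce about one false rejection even when every group is normal.

```r
set.seed(7)
# Hypothetical ratings: 24 Game x Bitrate cells with 22 ratings each (simulated)
sim <- expand.grid(PID = 1:22, Bitrate = c("2000", "4000", "6000", "50000"),
                   Game = paste0("Game", 1:6), stringsAsFactors = FALSE)
sim$VQ <- rnorm(nrow(sim), mean = 4, sd = 1)

# One Shapiro-Wilk p-value per Game x Bitrate cell
sw <- aggregate(VQ ~ Game + Bitrate, data = sim,
                FUN = function(x) shapiro.test(x)$p.value)
names(sw)[names(sw) == "VQ"] <- "p_shapiro"

# Holm adjustment across the 24 tests, then a YES/NO verdict per cell
sw$p_holm <- p.adjust(sw$p_shapiro, method = "holm")
sw$normal <- sw$p_holm >= 0.05
sw[order(sw$p_shapiro), ]
```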

### Step 3.3: Homogeneity of variances

In [7]:
# Check homogeneity of variances of the groups' VQ ratings (Levene's test, centered on the mean)
get_levene_test_results <- function() {

    test_results <- leveneTest(VQ ~ Game*Bitrate, data = gaming_data %>% mutate(Bitrate = as.character(Bitrate)), center = mean)

    games <- (gaming_data %>% distinct(Game) %>% arrange(Game))[['Game']]
    bitrates <- (gaming_data %>% distinct(Bitrate) %>% arrange(Bitrate))[['Bitrate']]
    result <- 'F('
    for (game in games) {
        for (bitrate in bitrates) {
            df <- (gaming_data %>% dplyr::filter(Game == game & Bitrate == bitrate) %>% mutate(df = n() - 1))[1,][['df']]
            result <- paste0(result, 'df_{Game ', game, ', ', bitrate, ' kbps bitrate} = ', df, ', \n')
        }  
    }
    result <- paste0(substr(result,1,nchar(result)-3), ') = ', 
                     round(test_results[1,2], digits=3), 
                     ' | p-value = ', 
                     round(test_results[1,3], digits=3))
    
    if(test_results[1,3] > 0.05) {
        result <- paste0(result, ' => homogeneity of variance CAN be assumed')
    } else {
        result <- paste0(result, ' => homogeneity of variance CANNOT be assumed')
    }
    
    cat(result)
}

get_levene_test_results()

F(df_{Game Game1, 2000 kbps bitrate} = 24, 
df_{Game Game1, 4000 kbps bitrate} = 24, 
df_{Game Game1, 50000 kbps bitrate} = 24, 
df_{Game Game1, 6000 kbps bitrate} = 24, 
df_{Game Game2, 2000 kbps bitrate} = 24, 
df_{Game Game2, 4000 kbps bitrate} = 24, 
df_{Game Game2, 50000 kbps bitrate} = 24, 
df_{Game Game2, 6000 kbps bitrate} = 24, 
df_{Game Game3, 2000 kbps bitrate} = 24, 
df_{Game Game3, 4000 kbps bitrate} = 24, 
df_{Game Game3, 50000 kbps bitrate} = 24, 
df_{Game Game3, 6000 kbps bitrate} = 24, 
df_{Game Game4, 2000 kbps bitrate} = 21, 
df_{Game Game4, 4000 kbps bitrate} = 21, 
df_{Game Game4, 50000 kbps bitrate} = 21, 
df_{Game Game4, 6000 kbps bitrate} = 21, 
df_{Game Game5, 2000 kbps bitrate} = 21, 
df_{Game Game5, 4000 kbps bitrate} = 21, 
df_{Game Game5, 50000 kbps bitrate} = 21, 
df_{Game Game5, 6000 kbps bitrate} = 21, 
df_{Game Game6, 2000 kbps bitrate} = 21, 
df_{Game Game6, 4000 kbps bitrate} = 21, 
df_{Game Game6, 50000 kbps bitrate} = 21, 
df_{Game Game6, 6000 kbps 

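#### According to Levene's test, homogeneity of variances cannot be assumed. With 24 groups and 564 ratings, the test's F statistic has df = (23, 540); the per-group values listed above are each group's own degrees of freedom (n - 1), not the test's. Note that Levene's test treats all ratings as independent; in a repeated-measures design the more relevant assumption is sphericity, which Mauchly's test checks in Step 4. We still continue.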
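Levene's test with `center = mean` is a one-way ANOVA on the absolute deviations of each rating from its group mean, which is where its degrees of freedom (number of groups - 1, number of ratings - number of groups) come from. A base R sketch, using simulated ratings laid out like the data set (25 raters for Game1-3 and 22 for Game4-6, at each of the 4 bitrates); the ratings themselves are not from the data set.

```r
set.seed(3)
# Hypothetical unbalanced layout: Game1-3 rated by 25 people, Game4-6 by 22
n_per_game <- c(25, 25, 25, 22, 22, 22)
sim <- do.call(rbind, lapply(1:6, function(i) expand.grid(
    PID = seq_len(n_per_game[i]), Bitrate = c("2000", "4000", "6000", "50000"),
    Game = paste0("Game", i), stringsAsFactors = FALSE)))
sim$VQ <- rnorm(nrow(sim), mean = 4, sd = 1)

# Mean-centred Levene's test: ANOVA on |rating - cell mean|
cell <- interaction(sim$Game, sim$Bitrate, drop = TRUE)
abs_dev <- abs(sim$VQ - ave(sim$VQ, cell))
levene <- anova(lm(abs_dev ~ cell))

df1 <- levene$Df[1]   # number of cells - 1
df2 <- levene$Df[2]   # number of ratings - number of cells
cat(sprintf("Levene: F(%d, %d) = %.3f, p = %.3f\n",
            df1, df2, levene$`F value`[1], levene$`Pr(>F)`[1]))
```

`car::leveneTest(VQ ~ Game*Bitrate, data = sim, center = mean)` should give the same F and p; the manual version only makes the degrees of freedom explicit.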

### Step 4: Conduct two-way repeated-measures ANOVA

In [8]:
# conduct two-way repeated-measures ANOVA
# only keep subjects who rated all 4 bitrates x 6 games (24 ratings), since the fully within-subject design needs complete data

PIDs_included <- (gaming_data %>% group_by(PID) %>% summarize(ratings = n()) %>% dplyr::filter(ratings == 6*4))[['PID']]
ezANOVA_data <- gaming_data %>% dplyr::filter(PID %in% PIDs_included) %>% mutate(Bitrate = as.factor(as.character(Bitrate)))

head(ezANOVA_data)

anova_res <- ezANOVA(
    data = ezANOVA_data, 
    dv = .(VQ), 
    wid = .(PID), 
    within = .(Bitrate, Game)
)

anova_res

| PID | Bitrate | Game | VQ |
|---|---|---|---|
| 1 | 2000 | Game2 | 2.9 |
| 1 | 2000 | Game5 | 2.2 |
| 1 | 2000 | Game1 | 2.9 |
| 1 | 2000 | Game6 | 2.0 |
| 1 | 2000 | Game3 | 3.4 |
| 1 | 2000 | Game4 | 2.6 |


"You have removed one or more Ss from the analysis. Refactoring "PID" for ANOVA."

$ANOVA

| Effect | DFn | DFd | F | p | p<.05 | ges |
|---|---|---|---|---|---|---|
| Bitrate | 3 | 63 | 278.877001 | 2.616145e-36 | * | 0.5884888 |
| Game | 5 | 105 | 3.388669 | 0.00705276 | * | 0.04209685 |
| Bitrate:Game | 15 | 315 | 2.078991 | 0.01074579 | * | 0.03757574 |

$`Mauchly's Test for Sphericity`

| Effect | W | p | p<.05 |
|---|---|---|---|
| Bitrate | 0.7873053 | 0.45220179 | |
| Game | 0.2833088 | 0.04632612 | * |
| Bitrate:Game | 6.014363e-05 | 0.03972042 | * |

$`Sphericity Corrections`

| Effect | GGe | p[GG] | p[GG]<.05 | HFe | p[HF] | p[HF]<.05 |
|---|---|---|---|---|---|---|
| Bitrate | 0.8609659 | 1.385466e-31 | * | 0.9922616 | 4.791737e-36 | * |
| Game | 0.7173209 | 0.01636462 | * | 0.883289 | 0.009960099 | * |
| Bitrate:Game | 0.4598203 | 0.05027964 | | 0.7078586 | 0.02438876 | * |


#### Mauchly's test of sphericity gives p < 0.05 for the game main effect (p ≈ 0.046) and the bitrate x game interaction (p ≈ 0.040), so these need a sphericity correction; sphericity can be assumed for the bitrate main effect (p ≈ 0.452).
#### => Bitrate main effect: p ≈ 2.6e-36 < 0.05 (sphericity assumed) => significant effect of bitrate on video quality (VQ) ratings
#### => Game main effect: Greenhouse-Geisser corrected p ≈ 0.0164 < 0.05 => significant effect of game on VQ ratings
#### => Bitrate x game interaction: Greenhouse-Geisser corrected p ≈ 0.0503 > 0.05 => no significant interaction between bitrate and game. This result is borderline: the less conservative Huynh-Feldt correction gives p ≈ 0.0244, but with a Greenhouse-Geisser ε of about 0.46 (well below 0.75) the Greenhouse-Geisser correction is the usual choice.
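The Greenhouse-Geisser correction multiplies both degrees of freedom by ε. A worked sketch using the values reported by `ezANOVA` above; the switch between Greenhouse-Geisser and Huynh-Feldt at ε = 0.75 is a commonly cited rule of thumb, not something `ezANOVA` decides for you.

```r
# Values copied from the ezANOVA output above (effects that failed Mauchly's test)
effects <- data.frame(
    Effect = c("Game", "Bitrate:Game"),
    DFn = c(5, 15), DFd = c(105, 315),
    GGe = c(0.7173209, 0.4598203), HFe = c(0.883289, 0.7078586),
    p_GG = c(0.01636462, 0.05027964), p_HF = c(0.009960099, 0.02438876)
)

# Corrected degrees of freedom: both df multiplied by the GG epsilon
effects$DFn_GG <- effects$DFn * effects$GGe
effects$DFd_GG <- effects$DFd * effects$GGe

# Rule of thumb: Greenhouse-Geisser if epsilon < 0.75, otherwise Huynh-Feldt
effects$correction <- ifelse(effects$GGe < 0.75, "GG", "HF")
effects$p_used <- ifelse(effects$correction == "GG", effects$p_GG, effects$p_HF)

print(effects[, c("Effect", "DFn_GG", "DFd_GG", "correction", "p_used")], digits = 4)
```

This yields F(3.59, 75.32) for the game main effect and F(6.90, 144.84) for the interaction, both using the Greenhouse-Geisser correction.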

### Step 5: Pairwise comparison / post hoc test

In [9]:
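# Tukey's HSD, the most widely used post hoc test ("Cramming Sam's tips" for post hoc tests, from the lecture).
# Caveat: this between-subjects aov uses all ratings in gaming_data (25 raters for Game1-3, 22 for Game4-6)
# and ignores that ratings are repeated within participants; TukeyHSD adjusts for mildly unbalanced designs.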
tukey_results <- TukeyHSD(aov(VQ ~ Bitrate * Game, data = gaming_data))
tukey_results

  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = VQ ~ Bitrate * Game, data = gaming_data)

$Bitrate
                 diff        lwr        upr    p adj
4000-2000   1.0531915  0.8223278  1.2840552 0.00e+00
50000-2000  2.4340426  2.2031789  2.6649062 0.00e+00
6000-2000   1.4787234  1.2478597  1.7095871 0.00e+00
50000-4000  1.3808511  1.1499874  1.6117147 0.00e+00
6000-4000   0.4255319  0.1946682  0.6563956 1.55e-05
6000-50000 -0.9553191 -1.1861828 -0.7244555 0.00e+00

$Game
                   diff         lwr         upr     p adj
Game2-Game1  0.41500000  0.11076727  0.71923273 0.0015007
Game3-Game1  0.21400000 -0.09023273  0.51823273 0.3369045
Game4-Game1  0.36854545  0.05411216  0.68297875 0.0110397
Game5-Game1  0.12309091 -0.19134238  0.43752420 0.8732262
Game6-Game1  0.06740909 -0.24702420  0.38184238 0.9900762
Game3-Game2 -0.20100000 -0.50523273  0.10323273 0.4096916
Game4-Game2 -0.04645455 -0.36088784  0.26797875 0.9982808
Game5-Game2 

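#### All six bitrate comparisons are significant (adjusted p < 0.001), whereas only some game comparisons are (e.g. Game2-Game1 with adjusted p ≈ 0.0015 and Game4-Game1 with adjusted p ≈ 0.011). Switching bitrates therefore appears to affect VQ ratings more than switching games, which matches the much larger effect size of bitrate in the ANOVA (generalized eta squared ≈ 0.59 vs 0.04).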
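Because `TukeyHSD` above treats every rating as independent, a repeated-measures alternative is to compare levels with paired t-tests and a multiplicity correction. A sketch for the bitrate main effect, assuming ratings are first averaged over games per participant and Holm's adjustment is used; the data are simulated (22 hypothetical raters), not the study data.

```r
set.seed(11)
bitrate_levels <- c("2000", "4000", "6000", "50000")

# Hypothetical complete design: 22 raters x 4 bitrates x 6 games (simulated)
sim <- expand.grid(PID = factor(1:22),
                   Bitrate = factor(bitrate_levels, levels = bitrate_levels),
                   Game = paste0("Game", 1:6))
sim$VQ <- 2 + 0.8 * as.integer(sim$Bitrate) + rnorm(22)[sim$PID] +
    rnorm(nrow(sim), sd = 0.6)

# One mean rating per participant and bitrate (averaged over games)
by_subject <- aggregate(VQ ~ PID + Bitrate, data = sim, FUN = mean)

# Paired tests pair observations by position, so sort by participant within each bitrate
by_subject <- by_subject[order(by_subject$Bitrate, by_subject$PID), ]

# Paired t-tests between all bitrate pairs, Holm-adjusted
posthoc <- pairwise.t.test(by_subject$VQ, by_subject$Bitrate,
                           paired = TRUE, p.adjust.method = "holm")
posthoc
```

The same pattern applies to the game main effect by averaging over bitrates instead.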

### Step 6: Interpretation

In [10]:
# compute individual degrees of freedom for groups
print_degrees_of_freedom <- function() {
    result <- ''
    
    for (game in games) {
        for (bitrate in bitrates) {
            df <- (ezANOVA_data %>% dplyr::filter(Game == game & Bitrate == bitrate) %>% mutate(df = n() - 1))[1,][['df']]
            result <- paste0(result, 'df_{Game ', game, ', ', bitrate, ' kbps bitrate} = ', df, '\n')
        }  
    }
    
    # pretty print
    result <- paste0(result, '\ndf_total = ', nrow(ezANOVA_data) - 1)
    cat(result)
}

print_degrees_of_freedom()

df_{Game Game1, 2000 kbps bitrate} = 21
df_{Game Game1, 4000 kbps bitrate} = 21
df_{Game Game1, 50000 kbps bitrate} = 21
df_{Game Game1, 6000 kbps bitrate} = 21
df_{Game Game2, 2000 kbps bitrate} = 21
df_{Game Game2, 4000 kbps bitrate} = 21
df_{Game Game2, 50000 kbps bitrate} = 21
df_{Game Game2, 6000 kbps bitrate} = 21
df_{Game Game3, 2000 kbps bitrate} = 21
df_{Game Game3, 4000 kbps bitrate} = 21
df_{Game Game3, 50000 kbps bitrate} = 21
df_{Game Game3, 6000 kbps bitrate} = 21
df_{Game Game4, 2000 kbps bitrate} = 21
df_{Game Game4, 4000 kbps bitrate} = 21
df_{Game Game4, 50000 kbps bitrate} = 21
df_{Game Game4, 6000 kbps bitrate} = 21
df_{Game Game5, 2000 kbps bitrate} = 21
df_{Game Game5, 4000 kbps bitrate} = 21
df_{Game Game5, 50000 kbps bitrate} = 21
df_{Game Game5, 6000 kbps bitrate} = 21
df_{Game Game6, 2000 kbps bitrate} = 21
df_{Game Game6, 4000 kbps bitrate} = 21
df_{Game Game6, 50000 kbps bitrate} = 21
df_{Game Game6, 6000 kbps bitrate} = 21

df_total = 527
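The degrees of freedom in the `ezANOVA` table follow directly from the design: with n = 22 complete participants, an effect with k levels has k - 1 numerator and (k - 1)(n - 1) denominator degrees of freedom, and the interaction uses the product of both factors' k - 1. A short check of that arithmetic:

```r
n_subjects <- 22   # participants with complete data, as used by ezANOVA above
k_bitrate  <- 4
k_game     <- 6

# Numerator df = k - 1; denominator df = (k - 1) * (n - 1) for within-subject effects
df_table <- rbind(
    Bitrate        = c(k_bitrate - 1, (k_bitrate - 1) * (n_subjects - 1)),
    Game           = c(k_game - 1, (k_game - 1) * (n_subjects - 1)),
    `Bitrate:Game` = c((k_bitrate - 1) * (k_game - 1),
                       (k_bitrate - 1) * (k_game - 1) * (n_subjects - 1))
)
colnames(df_table) <- c("DFn", "DFd")

# Total df: all observations minus 1
df_total <- n_subjects * k_bitrate * k_game - 1

df_table
df_total
```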

In [11]:
# compute mean and standard deviation of VQ for each bitrate and each game (all ratings in gaming_data)
get_statistics <- function(for_bitrate=TRUE) {
    result_string <- ''
    
    if(for_bitrate) { 
        for (bitrate in bitrates) {
            data <- gaming_data %>% 
                        filter(Bitrate == bitrate) %>% 
                        group_by(Bitrate) %>% 
                        summarize(mean = round(mean(VQ), digits=5), sd = round(sd(VQ), digits=5))
            result_string <- paste0(result_string, 'Mean of ', bitrate, ' kbps bitrate = ', data[['mean']], ', standard deviation = ', data[['sd']], '\n')
        }  
    } else {
        for (game in games) {
            data <- gaming_data %>% 
                        filter(Game == game) %>% 
                        group_by(Game) %>% 
                        summarize(mean = round(mean(VQ), digits=5), sd = round(sd(VQ), digits=5))
            result_string <- paste0(result_string, 'Mean of ', game, ' = ', data[['mean']], ', standard deviation = ', data[['sd']], '\n')
        }
    }
    
    result_string
}

# pretty print
cat(get_statistics(for_bitrate=TRUE))
cat(get_statistics(for_bitrate=FALSE))

Mean of 2000 kbps bitrate = 3.04326, standard deviation = 0.76963
Mean of 4000 kbps bitrate = 4.09645, standard deviation = 0.8355
Mean of 50000 kbps bitrate = 5.4773, standard deviation = 0.65448
Mean of 6000 kbps bitrate = 4.52199, standard deviation = 0.80161
Mean of Game1 = 4.086, standard deviation = 1.17284
Mean of Game2 = 4.501, standard deviation = 1.17297
Mean of Game3 = 4.3, standard deviation = 1.11137
Mean of Game4 = 4.45455, standard deviation = 1.09428
Mean of Game5 = 4.20909, standard deviation = 1.21245
Mean of Game6 = 4.15341, standard deviation = 1.18136
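The loops in `get_statistics()` filter one level at a time; a grouped `summarise()` produces the same kind of table in one step. A sketch on simulated ratings (not the data set), keeping `Bitrate` numeric so the rows come out in numeric order:

```r
library(dplyr)

set.seed(5)
# Hypothetical ratings: 22 raters x 4 bitrates x 6 games (simulated)
sim <- expand.grid(PID = 1:22, Bitrate = c(2000, 4000, 6000, 50000),
                   Game = paste0("Game", 1:6), stringsAsFactors = FALSE)
sim$VQ <- rnorm(nrow(sim), mean = 4, sd = 1)

# One grouped summary per factor instead of filtering inside a loop
bitrate_stats <- sim %>%
    group_by(Bitrate) %>%
    summarise(mean = mean(VQ), sd = sd(VQ), n = n())
game_stats <- sim %>%
    group_by(Game) %>%
    summarise(mean = mean(VQ), sd = sd(VQ), n = n())

bitrate_stats   # numeric Bitrate sorts 2000, 4000, 6000, 50000
game_stats
```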


#### Changing the bitrate or the game does have a significant effect on the video quality (VQ) ratings (alpha = 0.05):
#### => Bitrate main effect: significant and large, F(3, 63) ≈ 278.877, p ≈ 2.6e-36 (sphericity assumed), generalized eta squared (η²_G) ≈ 0.588.
#### => Game main effect: significant but small, F(3.59, 75.32) ≈ 3.389 (Greenhouse-Geisser corrected df), p ≈ 0.0164, η²_G ≈ 0.042.
#### => Bitrate x game interaction: not significant, F(6.90, 144.84) ≈ 2.079 (Greenhouse-Geisser corrected df), p ≈ 0.0503, η²_G ≈ 0.038.
#### The total degrees of freedom equal the number of observations in the ANOVA minus 1: 22 participants x 24 conditions = 528 observations, so df_total = 527 (see above for the degrees of freedom and descriptive statistics of the individual groups).
#### The post hoc tests (Tukey's HSD) show statistically significant differences in VQ ratings between Game2 & Game1, between Game4 & Game1 and between Game6 & Game2, and between every pair of bitrates. For more information on the post hoc results, see above.