# 08 ANOVA - Task 1

Provide a script and an HTML file that calculate the suitable ANOVA to answer the
following research question (RQ). Please also report the results as a text
conclusion, including the test statistic value (F) with its degrees of freedom, the
significance value, and the pairwise comparisons.

Does increasing the bitrate or changing the game (independent variables)
have a significant effect on the video quality (VQ) ratings (dependent variable)?
Please consider only ratings at a resolution of 1080p and a framerate of 60 fps
(conditions 36 and 50). Use the ratings provided in the gaming video quality dataset.

## Imports and Initialization

In [159]:
import numpy as np
import pandas as pd
import scipy

# pip install pingouin
import pingouin as pg

import matplotlib.pyplot as plt
import seaborn as sns
sns.set(style="whitegrid", context="talk")
cm = sns.diverging_palette(127, 14, s=99, l=55, as_cmap=True)

FIGSIZE = (20,4)

## Loading the data

In [120]:
dataset = pd.read_excel(
    "../datasets/DB01_gaming_video_quality_dataset.xlsx",
    usecols=["PID", "Game", "Condition", "VQ"],
    dtype={"Condition": str},
).dropna()

mask = (dataset.Condition == "36") | (dataset.Condition == "50")
dataset = dataset.loc[mask]
dataset.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 94 entries, 2867 to 3242
Data columns (total 4 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   PID        94 non-null     int64  
 1   Game       94 non-null     object 
 2   Condition  94 non-null     object 
 3   VQ         94 non-null     float64
dtypes: float64(1), int64(1), object(2)
memory usage: 3.7+ KB


## Check general requirements

### Measurement

Independent variables: `Condition` (encodes bitrate), `Game`.<br>
The dependent variable (`VQ`) is measured at the interval level.

### Balance
Keep only subjects with a complete set of four ratings (two conditions × two games); subjects with fewer ratings are removed.

In [122]:
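# Number of ratings per subject; a complete subject has 2 conditions x 2 games = 4.
ratings_per_subject = dataset.PID.value_counts()
# Select by boolean mask instead of writing pd.NA into an integer Series.
complete_subjects = ratings_per_subject[ratings_per_subject >= 4].index
dataset = dataset.loc[dataset.PID.isin(complete_subjects)]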
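
Counting rows per subject does not catch a subject who has four ratings but a duplicated or missing design cell. A stricter check, sketched here on hypothetical toy data (the `toy` frame and its values are invented for illustration, not taken from the dataset), counts ratings per `Condition` × `Game` cell:

```python
import pandas as pd

# Hypothetical toy data: subject 3 lacks the ("50", "Game6") cell.
toy = pd.DataFrame({
    "PID": [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3],
    "Condition": ["36", "36", "50", "50"] * 2 + ["36", "36", "50"],
    "Game": ["Game1", "Game6"] * 5 + ["Game1"],
    "VQ": [2.0, 2.5, 4.0, 4.5, 1.5, 2.0, 3.5, 4.0, 2.0, 2.5, 4.0],
})

# Count ratings per subject and design cell; absent cells are filled with 0.
cells = (
    toy.groupby(["PID", "Condition", "Game"])
    .size()
    .unstack(["Condition", "Game"], fill_value=0)
)

# A subject is complete when every cell holds exactly one rating.
complete = cells.index[(cells == 1).all(axis=1)]
balanced = toy.loc[toy.PID.isin(complete)]
print(list(complete), len(balanced))
```

The same pattern should carry over to `dataset` unchanged, since it only relies on the `PID`, `Condition` and `Game` columns.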

## Two-way Repeated Measure ANOVA

In [157]:
result = pg.rm_anova(dataset, dv="VQ", subject="PID", within=["Condition", "Game"])
result.style.background_gradient(cmap=cm, subset=["p-unc", "p-GG-corr"])

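| | Source | SS | ddof1 | ddof2 | MS | F | p-unc | p-GG-corr | np2 | eps |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Condition | 35.003 | 1 | 21 | 35.003 | 200.035 | 0.0 | 0.0 | 0.905 | 1.0 |
| 1 | Game | 0.003 | 1 | 21 | 0.003 | 0.004 | 0.947343 | 0.947343 | 0.0 | 1.0 |
| 2 | Condition * Game | 0.05 | 1 | 21 | 0.05 | 0.066 | 0.799766 | 0.799766 | 0.003 | 1.0 |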


- `Source`: Name of the within-group factor
- `ddof1`: Degrees of freedom (numerator)
- `ddof2`: Degrees of freedom (denominator)
- `F`: F-value
- `p-unc`: Uncorrected p-value
- `np2`: Partial eta-squared effect size
- `eps`: Greenhouse-Geisser epsilon factor (= index of sphericity)
- `p-GG-corr`: Greenhouse-Geisser corrected p-value
- `W-spher`: Sphericity test statistic
- `p-spher`: p-value of the sphericity test
- `sphericity`: sphericity of the data (boolean)
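
The `eps` value of 1.0 in every row is expected for this design. Assuming the usual bounds on the Greenhouse-Geisser estimate for a within-subject factor with $k$ levels:

$$
\frac{1}{k-1} \le \hat{\varepsilon} \le 1
$$

With $k = 2$ levels per factor (two conditions, two games) the lower bound is already 1, so $\hat{\varepsilon} = 1$. No correction is applied, which is why `p-GG-corr` equals `p-unc` in every row.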

> [...] The default for two-way design is to return both the uncorrected and Greenhouse-Geisser corrected p-values. Note that sphericity test for two-way design are not currently implemented in Pingouin.

Source: https://pingouin-stats.org/generated/pingouin.rm_anova.html?highlight=rm_anova#pingouin.rm_anova
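
The task asks for a text conclusion with the F value, degrees of freedom and significance. A minimal sketch of how the rows above could be turned into report strings follows; `format_f_test` is a hypothetical helper, not part of Pingouin, and the numbers are copied from the `rm_anova` output.

```python
def format_f_test(source, df1, df2, f_value, p_value, np2):
    """Format one ANOVA row as a report string (hypothetical helper)."""
    # A p-value displayed as 0.0 is only rounded, so report it as "< .001".
    p_text = "p < .001" if p_value < 0.001 else f"p = {p_value:.3f}"
    return f"{source}: F({df1}, {df2}) = {f_value:.3f}, {p_text}, np2 = {np2:.3f}"


# Values copied from the rm_anova output above:
# (Source, ddof1, ddof2, F, p-unc, np2)
anova_rows = [
    ("Condition", 1, 21, 200.035, 0.0, 0.905),
    ("Game", 1, 21, 0.004, 0.947343, 0.0),
    ("Condition * Game", 1, 21, 0.066, 0.799766, 0.003),
]

report = [format_f_test(*row) for row in anova_rows]
for line in report:
    print(line)
```

Read this way, only `Condition` reaches significance at the conventional 0.05 level; neither `Game` nor the `Condition * Game` interaction does.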

In [160]:
result = pg.pairwise_ttests(dataset, dv="VQ", subject="PID", within=["Condition", "Game"])
result.style.background_gradient(cmap=cm, subset=["p-unc"])

| | Contrast | Condition | A | B | Paired | Parametric | T | dof | Tail | p-unc | BF10 | hedges |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Condition | - | 36 | 50 | True | True | -14.143 | 21.0 | two-sided | 0.0 | 2328000000.0 | -2.155 |
| 1 | Game | - | Game1 | Game6 | True | True | -0.067 | 21.0 | two-sided | 0.947343 | 0.223 | -0.017 |
| 2 | Condition * Game | 36 | Game1 | Game6 | True | True | -0.226 | 21.0 | two-sided | 0.823686 | 0.228 | -0.071 |
| 3 | Condition * Game | 50 | Game1 | Game6 | True | True | 0.151 | 21.0 | two-sided | 0.881675 | 0.225 | 0.043 |
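
For a within-subject factor with only two levels, the main-effect F statistic should equal the square of the corresponding paired t statistic, with the same denominator degrees of freedom (21 here). The short check below compares the two outputs using values copied from the tables above; it is a consistency sketch, not a re-analysis of the data.

```python
# (T from pairwise_ttests, F from rm_anova), copied from the outputs above.
main_effects = {
    "Condition": (-14.143, 200.035),
    "Game": (-0.067, 0.004),
}

squared_t = {name: t ** 2 for name, (t, _) in main_effects.items()}

for name, (t, f) in main_effects.items():
    # Small differences come from T being rounded to three decimals.
    print(f"{name}: T^2 = {squared_t[name]:.3f}, F = {f:.3f}")
```

The two `Condition * Game` rows are a different kind of comparison: they test `Game1` against `Game6` separately within condition 36 and within condition 50. Their p-values therefore need not match the interaction row of the ANOVA table.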
