# Task 3: Internal Consistency

>  Do Video Quality, Video Fragmentation, Video Unclearness, and Video Discontinuity measure the same general construct (that is, what is their internal consistency)? If not, which combination does?

## Setup
### Importing required modules

In [48]:
import itertools

import numpy as np  # pip install numpy
import pandas as pd  # pip install pandas

### Loading the dataset

Corresponding columns:
- `VQ`: Video Quality
- `VF`: Video Fragmentation
- `VU`: Video Unclearness
- `VD`: Video Discontinuity

In [50]:
db_01 = pd.read_excel("../datasets/DB01_gaming_video_quality_dataset.xlsx")
db_01.head()

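|   | PID | Game | Condition | Condition_params | Resolution | Framerate | Bitrate | VQ | VF | VU | VD | AC | Age | Gender | Expertise | Monitor |
|---|-----|------|-----------|------------------|------------|-----------|---------|-----|-----|-----|-----|----|------|--------|-----------|------------|
| 0 | 1 | Game1 | 15 | 480_30_300 | 480 | 30 | 300 | 1.4 | 1.3 | 1.6 | 5.7 | 0 | 21.0 | Male | 4.0 | Desktop |
| 1 | 2 | Game1 | 15 | 480_30_300 | 480 | 30 | 300 | 1.2 | 1.2 | 4.6 | 6.3 | 0 | 20.0 | Female | 1.0 | Desktop |
| 2 | 3 | Game1 | 15 | 480_30_300 | 480 | 30 | 300 | 2.5 | 2.3 | 2.8 | 4.4 | 0 |  |  |  |  |
| 3 | 4 | Game1 | 15 | 480_30_300 | 480 | 30 | 300 | 2.0 | 3.0 | 2.0 | 4.8 | 0 | 22.0 | Female | 1.0 | smallPhone |
| 4 | 5 | Game1 | 15 | 480_30_300 | 480 | 30 | 300 | 2.4 | 3.0 | 2.0 | 5.5 | 0 | 23.0 | Male | 3.0 | Desktop |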


## Cronbach's alpha function

> ```py
> def cronbach(itemscores):
>     itemvars = itemscores.var(axis=1, ddof=1)
>     tscores = itemscores.sum(axis=0)
>     nitems = len(itemscores)
>     return nitems / (nitems-1) * (1 - itemvars.sum() / tscores.var(ddof=1))
> ```
> NumPy has a variance function built in. Specifying `ddof=1` uses a denominator of `N-1`, giving a sample variance. There's also a sum builtin.

Source: https://stackoverflow.com/a/20799687
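
Written as a formula, the quoted function computes the standard expression for Cronbach's alpha. The symbols are notation introduced here for illustration: $k$ is the number of items (`nitems`), $s_i^2$ is the sample variance of item $i$ (`itemvars`), and $s_T^2$ is the sample variance of the per-subject total scores (`tscores.var(ddof=1)`).

$$
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} s_i^2}{s_T^2}\right)
$$

For example, with two identical items the item variances sum to $2s^2$ and the total score has variance $4s^2$, so $\alpha = 2\,(1 - 2/4) = 1$.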

## Adapted version for our use case:

In our case:
- columns contain scores for a specific "question" (item) from all subjects
- rows contain scores for all "questions" (items) from a given subject

The function above differs:
- It assumes that each row contains the scores for a single question (item).
- Using it as-is would therefore require transposing our input. To avoid that extra step, we adapt the function to expect our input format instead.
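
As a quick sanity check of the two orientations, the following sketch compares an items-in-rows version with an items-in-columns version on a small made-up table. The function names `cronbach_items_in_rows` and `cronbach_items_in_columns` and the `toy` data are hypothetical and only serve this illustration.

```python
import numpy as np
import pandas as pd


def cronbach_items_in_rows(itemscores):
    # Original orientation: one row per item, one column per subject.
    itemscores = np.asarray(itemscores, dtype=float)
    itemvars = itemscores.var(axis=1, ddof=1)
    tscores = itemscores.sum(axis=0)
    nitems = len(itemscores)
    return nitems / (nitems - 1) * (1 - itemvars.sum() / tscores.var(ddof=1))


def cronbach_items_in_columns(itemscores):
    # Adapted orientation: one column per item, one row per subject.
    itemvars = itemscores.var(axis="index", ddof=1)
    tscores = itemscores.sum(axis="columns")
    nitems = len(itemscores.columns)
    return nitems / (nitems - 1) * (1 - itemvars.sum() / tscores.var(ddof=1))


# Made-up ratings: 5 subjects (rows) x 3 items (columns).
toy = pd.DataFrame({
    "item_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "item_b": [2.0, 2.5, 3.5, 4.0, 5.5],
    "item_c": [1.5, 1.0, 3.0, 4.5, 4.0],
})

alpha_columns = cronbach_items_in_columns(toy)
alpha_rows = cronbach_items_in_rows(toy.to_numpy().T)  # transpose for the original

print(alpha_columns, alpha_rows)
```

Both calls should print the same value, since transposing the input and swapping the axes are equivalent.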

In [42]:
def cronbach(itemscores):
    """
    Each column is assumed to contain scores for a given item.
    """
    itemvars = itemscores.var(axis="index", ddof=1)
    tscores = itemscores.sum(axis="columns")
    nitems = len(itemscores.columns)
    return nitems / (nitems-1) * (1 - itemvars.sum() / tscores.var(ddof=1))
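
One caveat worth keeping in mind: by default, pandas skips `NaN` values in both `var` and `sum`, so a subject with a missing rating would still contribute to some item variances and to the total score. The source does not say whether the rating columns contain missing values; the sketch below assumes they might and drops incomplete rows first. The helper name `cronbach_complete_cases` and the `toy` data are hypothetical.

```python
import numpy as np
import pandas as pd


def cronbach(itemscores):
    """Each column is assumed to contain scores for a given item."""
    itemvars = itemscores.var(axis="index", ddof=1)
    tscores = itemscores.sum(axis="columns")
    nitems = len(itemscores.columns)
    return nitems / (nitems - 1) * (1 - itemvars.sum() / tscores.var(ddof=1))


def cronbach_complete_cases(itemscores):
    # Hypothetical helper: keep only subjects who rated every item.
    complete = itemscores.dropna(axis="index", how="any")
    return cronbach(complete)


# Made-up ratings; the last subject has no score for item_a.
toy = pd.DataFrame({
    "item_a": [1.0, 2.0, 3.0, 4.0, np.nan],
    "item_b": [2.0, 2.5, 3.5, 4.0, 5.5],
})

alpha_complete = cronbach_complete_cases(toy)
alpha_first_four = cronbach(toy.iloc[:4])  # same rows, selected by hand
alpha_naive = cronbach(toy)                # NaN silently skipped

print(alpha_complete, alpha_first_four, alpha_naive)
```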

Example usage for all four columns:

In [47]:
columns = ["VQ", "VF", "VU", "VD"]
cronbach(db_01[columns])

0.7421240885043254

## Calculating Internal Consistency For Selected Columns

As seen above, Cronbach's alpha across all four requested columns is about 0.74 (acceptable). Below we explore the internal consistency of every combination obtained by leaving one or two columns out.

### Generate combinations and construct a dataframe from the combinations

In [106]:
leave_zero_out = itertools.combinations(columns, 4)
leave_one_out = itertools.combinations(columns, 3)
leave_two_out = itertools.combinations(columns, 2)
combinations = itertools.chain(leave_zero_out, leave_one_out, leave_two_out)

grid = [[col in comb for col in columns] for comb in combinations]
column_selection = pd.DataFrame(grid, columns=columns)
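
To make the shape of the selection grid easier to see, here is a self-contained sketch of the same idea. It uses `itertools.chain.from_iterable` over the subset sizes instead of three separate variables, which is a stylistic choice made here and not part of the original cell.

```python
import itertools

import pandas as pd

columns = ["VQ", "VF", "VU", "VD"]

# All subsets of size 4, 3, and 2, in that order.
combinations = itertools.chain.from_iterable(
    itertools.combinations(columns, size) for size in (4, 3, 2)
)

# One boolean row per subset: True means the column is included.
grid = [[col in comb for col in columns] for comb in combinations]
column_selection = pd.DataFrame(grid, columns=columns)

# Number of selected columns per row, and the first leave-one-out row.
print(column_selection.sum(axis="columns").tolist())
print(column_selection.loc[1].to_dict())
```

Each row is a boolean mask over `columns`; for instance, row 1 selects `VQ`, `VF`, and `VU` and leaves `VD` out.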

### Calculate Cronbach's alpha for every combination

In [None]:
def select_columns_and_calc_cronbach(row):
    selected_columns = row.index[row]
    selected_data = db_01[selected_columns]
    return cronbach(selected_data)

cronbach_results = column_selection.aggregate(select_columns_and_calc_cronbach, axis="columns")
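
The same per-combination calculation can also be sketched as a plain loop over the combinations. The example below runs on synthetic data, not on `db_01`: items `a`, `b`, and `c` share a common signal, while `d` is independent noise. All names and numbers in it are assumptions made for illustration.

```python
import itertools

import numpy as np
import pandas as pd


def cronbach(itemscores):
    """Each column is assumed to contain scores for a given item."""
    itemvars = itemscores.var(axis="index", ddof=1)
    tscores = itemscores.sum(axis="columns")
    nitems = len(itemscores.columns)
    return nitems / (nitems - 1) * (1 - itemvars.sum() / tscores.var(ddof=1))


# Synthetic ratings for 200 subjects.
rng = np.random.default_rng(0)
shared = rng.normal(size=200)
toy = pd.DataFrame({
    "a": shared + 0.3 * rng.normal(size=200),
    "b": shared + 0.3 * rng.normal(size=200),
    "c": shared + 0.3 * rng.normal(size=200),
    "d": rng.normal(size=200),  # unrelated to the other items
})

# Cronbach's alpha for every subset of 4, 3, and 2 items.
alphas = {
    comb: cronbach(toy[list(comb)])
    for size in (4, 3, 2)
    for comb in itertools.combinations(toy.columns, size)
}

for comb, alpha in alphas.items():
    print(comb, round(alpha, 3))
```

In this toy setup, subsets that contain the unrelated item `d` should score lower than the corresponding subsets without it.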

### Assign a consistency label to each Cronbach's alpha result

In [None]:
bin_labels = ["Unacceptable", "Poor", "Questionable", "Acceptable", "Good", "Excellent"]
bin_edges = [-np.inf, 0.5, 0.6, 0.7, 0.8, 0.9, np.inf]
bins = pd.IntervalIndex.from_breaks(bin_edges, closed="left")
bin_label_mapping = dict(zip(bins, bin_labels))

consistency_bins = pd.cut(cronbach_results, bins)
consistency = consistency_bins.transform(lambda alpha_bin: bin_label_mapping[alpha_bin])
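
As a possible shortcut, `pd.cut` can attach the labels directly through its `labels` parameter, with `right=False` giving the same left-closed bins. The sketch below applies this to four alpha values copied from the results table in this notebook; it is an alternative formulation, not the cell that produced those results.

```python
import numpy as np
import pandas as pd

bin_labels = ["Unacceptable", "Poor", "Questionable", "Acceptable", "Good", "Excellent"]
bin_edges = [-np.inf, 0.5, 0.6, 0.7, 0.8, 0.9, np.inf]

# A few alpha values taken from the results table.
alphas = pd.Series([0.742124, 0.829821, 0.525368, 0.285473])

# right=False makes each bin left-closed, e.g. [0.7, 0.8) -> "Acceptable".
consistency = pd.cut(alphas, bins=bin_edges, labels=bin_labels, right=False)

print(consistency.tolist())
# ['Acceptable', 'Good', 'Poor', 'Unacceptable']
```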

### Concatenate and display results

In [108]:
pd.concat(
    [column_selection, cronbach_results, consistency],
    keys=["Column Selection", "Cronbach's Alpha", "Consistency"],
    axis="columns"
)

|   | VQ | VF | VU | VD | Cronbach's Alpha | Consistency |
|---|-------|-------|-------|-------|------------------|--------------|
| 0 | True | True | True | True | 0.742124 | Acceptable |
| 1 | True | True | True | False | 0.829821 | Good |
| 2 | True | True | False | True | 0.650001 | Questionable |
| 3 | True | False | True | True | 0.656003 | Questionable |
| 4 | False | True | True | True | 0.525368 | Poor |
| 5 | True | True | False | False | 0.824703 | Good |
| 6 | True | False | True | False | 0.850252 | Good |
| 7 | True | False | False | True | 0.373154 | Unacceptable |
| 8 | False | True | True | False | 0.617272 | Questionable |
| 9 | False | True | False | True | 0.285473 | Unacceptable |


## Discussion

Even though the internal consistency across all four columns is acceptable, Cronbach's alpha is higher whenever the `VD` (Video Discontinuity) column is left out of a combination. Every smaller combination shown above that includes `VD` is rated questionable, poor, or unacceptable, whereas `VQ`, `VF`, and `VU` together reach about 0.83 (good).