In [3]:
import pandas as pd
import numpy as np
import scipy as sp
from scipy import stats
from IPython.display import display, Markdown, Latex

## Cronbach's Alpha

Reliability in general:

$Rel = \frac{Var(t)}{Var(t) + Var(e)} = 1 -\frac{Var(e)}{Var(t) + Var(e)}$

Cronbach's Alpha:

$\alpha = \frac{m}{m-1} * \left( 1 - \frac{\sum_{i=1}^m Var(x_i)}{Var(y)} \right)$, with $y = \sum_{i=1}^m x_i$

Alternatively:

$\alpha = \frac{m * \overline{Cov}}{\overline{Var} + (m-1) * \overline{Cov}}$, with number of items $m$, average covariance of items $\overline{Cov}$ and average variance of items $\overline{Var}$

**Threshold values:**
- $\alpha$ should be higher than .7
- For new scales, values above .6 are often considered acceptable

**Interpretation:**
- Share of variance of the composite explained by the true score
- $\alpha$: the expected correlation of the scale with other scales of similar reliability
- $\sqrt{\alpha}$ corresponds to the correlation of the scale with the true score

==> A high $\alpha$ indicates good reliability at the construct level: the items share a large part of their variance
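As a quick check that the two formulas for $\alpha$ above agree, the following sketch evaluates both on a full (symmetric) item covariance matrix. The matrix values are the covariances used later in this notebook; the function names `alpha_from_variances` and `alpha_from_means` are illustrative and not taken from any library. It relies on $Var(y)$ being the sum of all entries of the covariance matrix.

```python
import numpy as np

# Full symmetric item covariance matrix (values from the covariance table in this notebook).
cov = np.array([
    [1.8225,  1.08864,  0.8991,   1.06029],
    [1.08864, 1.2544,   0.895104, 0.77616],
    [0.8991,  0.895104, 1.2321,   0.94017],
    [1.06029, 0.77616,  0.94017,  2.3716],
])


def alpha_from_variances(cov):
    """alpha = m/(m-1) * (1 - sum_i Var(x_i) / Var(y)), with Var(y) = sum of all entries."""
    m = cov.shape[0]
    return m / (m - 1) * (1 - np.trace(cov) / cov.sum())


def alpha_from_means(cov):
    """alpha = m * mean_cov / (mean_var + (m-1) * mean_cov)."""
    m = cov.shape[0]
    mean_var = np.trace(cov) / m
    # m * (m - 1) off-diagonal entries in the full matrix
    mean_cov = (cov.sum() - np.trace(cov)) / (m * (m - 1))
    return m * mean_cov / (mean_var + (m - 1) * mean_cov)


alpha_a = alpha_from_variances(cov)
alpha_b = alpha_from_means(cov)
print(alpha_a, alpha_b)  # both approximately 0.8385
```

Both versions only rearrange the same quantities, so they should agree up to floating-point error.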

In [14]:
# Alternative way of calculation:
means = np.array([5.57, 5.89, 6.19, 5.15])
standard_deviations = np.array([1.35, 1.12, 1.11, 1.54])
# Lower-triangular correlation matrix. Either provide correlations here, or comment this out and provide the covariances directly below.
correlations = np.array([[1, 0, 0, 0], [0.72, 1, 0, 0], [0.6, 0.72, 1, 0], [0.51, 0.45, 0.55, 1]])

assert np.all(np.abs(correlations) <= 1)
assert len(means) == len(standard_deviations) == len(correlations)

factor = np.outer(standard_deviations, standard_deviations)  # s_i * s_j for every pair of items
covariances = factor * correlations

m = len(means)
mean_variance = np.mean(standard_deviations**2)
covariances_lower_tria = np.tril(covariances, k=-1)
# Divide by the number of entries below the diagonal, m * (m - 1) / 2.
# Counting non-zero entries instead would miscount if a covariance were exactly 0.
mean_covariance = np.sum(covariances_lower_tria) / (m * (m - 1) / 2)

alpha = (m * mean_covariance) / (mean_variance + (m - 1) * mean_covariance)
display(Markdown("Covariances:"), pd.DataFrame(covariances).round(decimals=3))
# Raw f-string so that LaTeX backslashes such as \o are not treated as (invalid) escape sequences
display(Markdown(rf"$\alpha = {alpha}$,    with $\overline{{Var}} = {mean_variance}$,  $\overline{{Cov}} = {mean_covariance}$"))

Covariances:

| | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| 0 | 1.823 | 0.0 | 0.0 | 0.0 |
| 1 | 1.089 | 1.254 | 0.0 | 0.0 |
| 2 | 0.899 | 0.895 | 1.232 | 0.0 |
| 3 | 1.06 | 0.776 | 0.94 | 2.372 |


$\alpha = 0.8384610974243325$,    with $\overline{Var} = 1.67015$,  $\overline{Cov} = 0.9432440000000001$

In [None]:
#TODO: Add standard way of calculation
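
One way the standard formula could be computed from raw responses is sketched below. It assumes a respondents x items score matrix; the data in `example_scores` are made up for illustration, and `cronbach_alpha` is a hypothetical helper name rather than an existing library function.

```python
import numpy as np


def cronbach_alpha(scores):
    """Standard formula: alpha = m/(m-1) * (1 - sum_i Var(x_i) / Var(y)).

    scores: 2D array-like, one row per respondent, one column per item.
    """
    scores = np.asarray(scores, dtype=float)
    m = scores.shape[1]
    # Use the same ddof for both variances; the ratio does not depend on the choice.
    item_variances = scores.var(axis=0, ddof=1)
    total_variance = scores.sum(axis=1).var(ddof=1)
    return m / (m - 1) * (1 - item_variances.sum() / total_variance)


# Hypothetical responses: 6 respondents, 4 items (illustrative values only)
example_scores = np.array([
    [5, 6, 6, 4],
    [7, 7, 7, 6],
    [4, 5, 6, 3],
    [6, 6, 7, 6],
    [3, 4, 5, 4],
    [6, 7, 6, 5],
])

alpha_raw = cronbach_alpha(example_scores)
print(alpha_raw)
```

When only summary statistics are available, as in the cell above that starts from standard deviations and correlations, the covariance-based variant is the practical choice; with raw data this version avoids building the covariance matrix at all.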


## Item-to-total correlation

Diagnostic information at the item level

Defined as: 

$r_{it} = r(x_i, y) = \frac{Cov(x_i,y)}{s_{x_i}*s_y}$, where $x_1,...,x_m$ are the items and $y = \sum_{i=1}^m x_i$ is their sum

With a small number of items, $r_{it}$ is overestimated because $x_i$ is itself part of $y$ ==> use the corrected item-to-total correlation, which removes the focal item from the sum:

$r_{ot}(o) = r(x_o,y)$, where $x_1, ..., x_o, ..., x_m$ are the scale items, $x_o$ is the focal item and $y = \left( \sum_{i=1}^m x_i \right) - x_o$
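
The corrected correlation can also be expressed through the item covariances alone, which is useful when only a covariance matrix is available. In the lines below, $y = \sum_{i=1}^m x_i$ denotes the full sum and $y_{-o} = y - x_o$ the sum without the focal item (the symbol $y_{-o}$ is notation introduced here for clarity):

$Cov(x_o, y_{-o}) = Cov(x_o, y) - Var(x_o) = \sum_{i \neq o} Cov(x_o, x_i)$

$Var(y_{-o}) = Var(y) - 2 * Cov(x_o, y) + Var(x_o) = \sum_{i \neq o} Var(x_i) + 2 * \sum_{i < j;\ i,j \neq o} Cov(x_i, x_j)$

$r(x_o, y_{-o}) = \frac{Cov(x_o, y_{-o})}{s_{x_o} * \sqrt{Var(y_{-o})}}$

Note that the covariance sum in $Var(y_{-o})$ runs only over pairs in which neither item is the focal item.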

**Interpretation:**

An item-to-total correlation below roughly 0.2 to 0.3 indicates that the corresponding item does not correlate well with the scale overall (it may be dropped)

Used to discover items that do not correlate well with the other items and can thus be discarded; eliminating garbage items
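
When raw responses are available, both variants can be computed directly with `np.corrcoef`. The following is a minimal sketch under that assumption; `item_total_correlations` is a hypothetical helper, the scores are invented for illustration, and the 0.3 cutoff is just the upper end of the rule of thumb above.

```python
import numpy as np


def item_total_correlations(scores, corrected=True):
    """Correlation of each item with the sum score.

    corrected=True removes the focal item from the sum before correlating.
    """
    scores = np.asarray(scores, dtype=float)
    total = scores.sum(axis=1)
    result = []
    for i in range(scores.shape[1]):
        reference = total - scores[:, i] if corrected else total
        result.append(np.corrcoef(scores[:, i], reference)[0, 1])
    return np.array(result)


# Hypothetical responses: 6 respondents, 4 items (illustrative values only)
example_scores = np.array([
    [5, 6, 6, 4],
    [7, 7, 7, 6],
    [4, 5, 6, 3],
    [6, 6, 7, 6],
    [3, 4, 5, 4],
    [6, 7, 6, 5],
])

uncorrected = item_total_correlations(example_scores, corrected=False)
corrected = item_total_correlations(example_scores, corrected=True)

# Indices of items whose corrected correlation falls below the chosen cutoff
candidates_to_drop = np.flatnonzero(corrected < 0.3)
print(uncorrected, corrected, candidates_to_drop)
```

The uncorrected value is never smaller than the corrected one for the same item, which is the overestimation mentioned above.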

In [55]:
# Item-to-total correlation

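# Uses Cov(x_i, y) = sum_j Cov(x_i, x_j) (row/column sum of the full covariance matrix)
# and Var(y) = sum_i Var(x_i) + 2 * sum_{i<j} Cov(x_i, x_j)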

# Alternative 1: Calculate using covariance and standard deviation
covariances = np.array((
    [1.8225  , 0.      , 0.      , 0.      ],
    [1.08864 , 1.2544  , 0.      , 0.      ],
    [0.8991  , 0.895104, 1.2321  , 0.      ],
    [1.06029 , 0.77616 , 0.94017 , 2.3716  ]
))

covariances_filled = covariances + np.tril(covariances, k=-1).T
variances = covariances.diagonal()
combined_covariances = np.sum(covariances_filled, axis=0)
summed_covariances = np.sum(np.tril(covariances, k=-1))*2

item_to_total_correlations = combined_covariances / np.sqrt(variances * (np.sum(variances) + summed_covariances))
item_to_total_correlations

array([0.85037776, 0.84481512, 0.8422695 , 0.78796299])

In [69]:
# Corrected Item-to-total correlation

# Alternative 1: Calculate using covariance and standard deviation
covariances = np.array((
    [1.8225  , 0       , 0       , 0       ],
    [1.08864 , 1.2544  , 0       , 0       ],
    [0.8991  , 0.895104, 1.2321  , 0       ],
    [1.06029 , 0.77616 , 0.94017 , 2.3716  ]
))

covariances_filled = covariances + np.tril(covariances, k=-1).T
variances = covariances.diagonal()
np.fill_diagonal(covariances_filled, 0)
# Cov(x_o, y - x_o): sum of the covariances of item o with all other items
combined_covariances = np.sum(covariances_filled, axis=0)

summed_covariances = []
summed_variances = []

for item_index in range(len(variances)):
    # Keep every item except the focal one
    mask = np.ones_like(variances, dtype=bool)
    mask[item_index] = False

    # Covariance part of Var(y - x_o): both the row AND the column of the focal item are excluded.
    # The matrix is symmetric with a zero diagonal, so the sum already counts each pair twice.
    summed_covariance = np.sum(covariances_filled[np.ix_(mask, mask)])
    summed_covariances.append(summed_covariance)

    summed_variance = np.sum(variances[mask])
    summed_variances.append(summed_variance)

summed_covariances = np.array(summed_covariances)
summed_variances = np.array(summed_variances)

corrected_item_to_total_correlations = combined_covariances / np.sqrt(variances * (summed_variances + summed_covariances))
corrected_item_to_total_correlations.round(4)

array([0.7111, 0.7355, 0.7329, 0.568 ])

In [60]:
covariances

array([[1.8225  , 0.      , 0.      , 0.      ],
       [1.08864 , 1.2544  , 0.      , 0.      ],
       [0.8991  , 0.895104, 1.2321  , 0.      ],
       [1.06029 , 0.77616 , 0.94017 , 2.3716  ]])

In [64]:
np.tril(covariances, k=-1).T

array([[0.      , 1.08864 , 0.8991  , 1.06029 ],
       [0.      , 0.      , 0.895104, 0.77616 ],
       [0.      , 0.      , 0.      , 0.94017 ],
       [0.      , 0.      , 0.      , 0.      ]])