# Reliability and Significance Testing
This notebook uses the cleaned and saved dataframes to evaluate the reliability of the survey questions and to prepare the data for significance testing.

## Install Pingouin (optional)
Uncomment the lines below if you need to install or upgrade `pingouin`.

In [None]:
#pip install pingouin

In [None]:
#pip install --upgrade pingouin

## Imports

In [None]:
import pandas as pd
import numpy as np
import scipy.stats as stats
import pingouin as pg
from scipy.stats import spearmanr

import seaborn as sns
sns.set_style('darkgrid', {'axes.facecolor': '0.9', "grid.color": ".6", "grid.linestyle": ":"})
sns.set_context("talk")

import matplotlib.pyplot as plt
%matplotlib inline

import warnings
warnings.filterwarnings('ignore')

## Load data

In [None]:
hypo_df = pd.read_pickle("../saved_data_frames/hypothesis_df.pkl")
#hypo_df.head()

## Randomly extract 50 data points for Reliability testing

In [None]:
reliability_50 = hypo_df.sample(n=50, random_state=186)
#reliability_50.head()

## Convert answers to numerical scales


In [None]:
convert_dict = {
    
    # true/false (knowledge)
    'True':1,
    'False':2,
    
    # frequency (knowledge)
    'Less than 1 time each month':1,
    '1 time each month': 2,
    '1 time each week':3,
    '2 to 3 times each week':4,
    '1 time each day':5,
    '2 to 3 times each day':6,
    
    # percent ranges
    'None of the residents under my care experience bleeding when brushing their teeth':1,
    'Less than 25%':2,
    '25% to 50%':3,
    '50% to 75%':4,
    'Greater than 75%':5, 
    
    # agree - disagree (attitude)
    'Strongly Agree':1,
    'Agree':2,
    'Neutral':3,
    'Disagree':4,
    'Strongly Disagree':5
}

In [None]:
num_answers = reliability_50.replace(convert_dict)
num_answers.head()
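One caveat with a single global dictionary is that `DataFrame.replace` applies every mapping to every column. A hedged alternative, sketched here on a toy frame with hypothetical column names (not those of `hypo_df`), is to map each question group with its own dictionary via `Series.map`. Answers missing from a dictionary then become `NaN`, which makes them easy to spot.

```python
import pandas as pd

# Toy data; the column names are hypothetical and only illustrate the idea.
toy = pd.DataFrame({
    "attitude_q1": ["Agree", "Neutral", "Strongly Disagree"],
    "knowledge_q1": ["True", "False", "True"],
})

# One dictionary per question type instead of one global dictionary.
likert_map = {"Strongly Agree": 1, "Agree": 2, "Neutral": 3,
              "Disagree": 4, "Strongly Disagree": 5}
true_false_map = {"True": 1, "False": 2}

converted = toy.copy()
converted["attitude_q1"] = toy["attitude_q1"].map(likert_map)
converted["knowledge_q1"] = toy["knowledge_q1"].map(true_false_map)

# Any answer that is missing from its dictionary would be NaN here.
unmapped = int(converted.isna().sum().sum())
print(converted)
print("unmapped answers:", unmapped)
```

Checking `unmapped` after the conversion is a cheap way to catch spelling variants in the raw answers.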

## Reliability Coefficient using Cronbach's Alpha
Cronbach's alpha coefficient, often referred to as simply "Cronbach's alpha," is a widely used measure of reliability in the context of Likert scale data and surveys. It assesses the internal consistency of a set of Likert-type items, indicating the degree to which those items are measuring the same underlying construct or dimension.

Here are some key points about using Cronbach's alpha coefficient to assess reliability with Likert data:

1. **Internal Consistency:** Cronbach's alpha assesses the degree to which items in a Likert scale or questionnaire consistently measure the same concept or construct. In other words, it measures how closely related the responses to different items are within the same scale.

2. **Scale Structure:** It's typically used for scales or questionnaires where respondents rate their agreement or disagreement with a series of statements on a scale (e.g., strongly agree, agree, neutral, disagree, strongly disagree).

3. **Range of Values:** Cronbach's alpha usually falls between 0 and 1, with higher values indicating greater internal consistency. It can be negative when the items are negatively correlated on average, which often signals reverse-scored items that were not recoded. An alpha of 0.70 or higher is generally considered acceptable for most research purposes, although the threshold varies by field and context.

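4. **Calculation:** Alpha compares the sum of the individual item variances with the variance of the total (summed) score: $\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i} \sigma_i^2}{\sigma_X^2}\right)$ for $k$ items. The more strongly the items covary, the larger the total-score variance is relative to the item variances, and the higher the alpha.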

5. **Interpreting Results:** If Cronbach's alpha is too low, it may indicate that the items in the scale are not measuring the same underlying construct effectively, and some items may need to be revised or removed. On the other hand, a very high alpha might suggest redundancy among items.

6. **Contextual Considerations:** While Cronbach's alpha is a valuable tool, it's important to consider the context of your study and the specific construct you are measuring. In some cases, lower alpha values may be acceptable if the scale is meant to capture a diverse range of opinions.

7. **Sample Size:** Sample size can influence the stability of Cronbach's alpha estimates. Larger samples tend to produce more reliable estimates of alpha.

In summary, Cronbach's alpha indicates whether the items within a scale or questionnaire measure the intended construct consistently. Adequate reliability is a prerequisite for valid results, but it does not by itself establish validity.
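To make the calculation concrete, here is a minimal sketch of the variance-based formula for alpha, $\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_i \sigma_i^2}{\sigma_X^2}\right)$, using only NumPy. The function name `cronbach_alpha_manual` and the toy scores are illustrative assumptions, not part of the survey data.

```python
import numpy as np

def cronbach_alpha_manual(items):
    """Cronbach's alpha for a (respondents x items) array of numeric scores."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]                          # number of items
    item_vars = items.var(axis=0, ddof=1)       # sample variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # sample variance of the summed score
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Toy Likert-style scores: 5 respondents, 3 items (invented for illustration).
toy_scores = np.array([
    [1, 2, 1],
    [2, 2, 3],
    [3, 4, 3],
    [4, 4, 5],
    [5, 5, 5],
])

alpha = cronbach_alpha_manual(toy_scores)
print(f"alpha = {alpha:.4f}")

# Sanity check: items that are exact copies of each other give alpha = 1.
identical = np.repeat(toy_scores[:, [0]], 3, axis=1)
alpha_identical = cronbach_alpha_manual(identical)
```

A hand-rolled version like this is mainly useful for checking what a library routine returns on the same data.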

### Extract out just Likert Scale Questions


In [None]:
[i for i in enumerate(num_answers.columns)]

In [None]:
likert = num_answers[num_answers.columns[6:12]]

### Calculate Cronbach's Alpha of Likert Data

In [None]:
pg.cronbach_alpha(data=likert)

## Guttman's Lambda
Guttman's Lambda refers to a family of six coefficients ($\lambda_1$ to $\lambda_6$) that give lower bounds on the reliability of a multi-item scale. Like Cronbach's alpha, they assess internal consistency, that is, how well a set of items within a scale measures a single underlying construct or dimension; $\lambda_3$ is in fact identical to Cronbach's alpha. In this notebook the coefficient is applied to the ordinal percentage-range questions.

Here are some key points about Guttman's Lambda:

1. **Purpose:** Guttman's Lambda is used to assess the internal consistency of a set of items or questions. It helps determine the extent to which these items are measuring the same underlying construct or trait.

2. **Ordinal Data:** The coefficients are computed from the item covariance matrix. Applying them to ordinal items, such as Likert-type questions or ordered percentage ranges, therefore treats the numeric codes as if they were equally spaced, which is the same assumption made when Cronbach's alpha is used on such data.

3. **Range:** Guttman's Lambda values typically fall between 0 and 1, similar to Cronbach's alpha, although negative values can occur when items are negatively correlated. Higher values indicate greater internal consistency among the items.

4. **Interpretation:** A value close to 1 suggests high internal consistency among the items, indicating that they effectively measure the intended construct. A lower value may indicate inconsistency or a need to revise the items within the scale.

5. **Advantages:** $\lambda_2$ is never smaller than $\lambda_3$ (Cronbach's alpha), so it is a tighter lower bound on reliability, particularly when the covariances between items are unequal.

In summary, Guttman's Lambda is a useful complement to Cronbach's alpha for assessing the internal consistency of a set of items, including items with ordered response categories. Use it to check that the items within a scale are measuring the same underlying construct.
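As a hedged illustration, the sketch below computes Guttman's $\lambda_1$, $\lambda_2$ and $\lambda_3$ from an item covariance matrix, assuming the textbook definitions: $\lambda_1 = 1 - \frac{\sum_j \sigma_j^2}{\sigma_X^2}$, $\lambda_2 = \lambda_1 + \frac{\sqrt{\frac{n}{n-1} C_2}}{\sigma_X^2}$ with $C_2$ the sum of squared off-diagonal covariances, and $\lambda_3 = \frac{n}{n-1}\lambda_1$. The function name `guttman_lambdas` and the toy data are assumptions made for this example.

```python
import numpy as np

def guttman_lambdas(items):
    """Return (lambda1, lambda2, lambda3) for a (respondents x items) array."""
    items = np.asarray(items, dtype=float)
    n = items.shape[1]                           # number of items
    cov = np.cov(items, rowvar=False)            # item covariance matrix
    total_var = cov.sum()                        # variance of the summed score
    off_diag = cov - np.diag(np.diag(cov))       # inter-item covariances only

    lambda1 = 1 - np.trace(cov) / total_var
    lambda2 = lambda1 + np.sqrt(n / (n - 1) * (off_diag ** 2).sum()) / total_var
    lambda3 = n / (n - 1) * lambda1              # same value as Cronbach's alpha
    return lambda1, lambda2, lambda3

# Toy ordinal scores: 5 respondents, 3 items (invented for illustration).
toy_scores = np.array([
    [1, 2, 1],
    [2, 2, 3],
    [3, 4, 3],
    [4, 4, 5],
    [5, 5, 5],
])

l1, l2, l3 = guttman_lambdas(toy_scores)
print(f"lambda1 = {l1:.4f}, lambda2 = {l2:.4f}, lambda3 = {l3:.4f}")
```

On any data set, $\lambda_1 \le \lambda_3 \le \lambda_2$, which is a convenient sanity check for an implementation.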

### Extract out just Ordinal Questions

In [None]:
percentages = num_answers[num_answers.columns[12:15]]

### Calculate the covariance matrix

In [None]:
cov_mat_percent = np.cov(percentages, rowvar=False)
cov_mat_percent

### Calculate the reduced covariance matrices by excluding diagonal elements

In [None]:
reduced_percent = cov_mat_percent - np.diag(np.diag(cov_mat_percent))
reduced_percent

### Calculate Guttman's Lambda

In [None]:
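# Guttman's coefficients use sums of (co)variances from the matrices above
total_var_percent = cov_mat_percent.sum()    # variance of the summed score
offdiag_sum_percent = reduced_percent.sum()  # sum of inter-item covariances
n_items = cov_mat_percent.shape[0]

# lambda-1: share of the total variance that comes from inter-item covariances
lambda1_percent = offdiag_sum_percent / total_var_percent

# lambda-2: lambda-1 plus a term based on the squared inter-item covariances
guttmans_lambda_percent = lambda1_percent + np.sqrt(
    n_items / (n_items - 1) * (reduced_percent ** 2).sum()
) / total_var_percent
print(f"Guttman's Lambda-2 of Percentage questions: {guttmans_lambda_percent}")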

## Kuder-Richardson Formula (KR-20) 
The Kuder-Richardson Formula 20 (KR-20) is a reliability coefficient used to assess the internal consistency of a test or set of items with dichotomous (binary) responses, such as correct and incorrect answers. KR-20 is a variant of the Kuder-Richardson reliability formula, and it is commonly used in educational and psychological research to measure the consistency of test items that have only two possible responses.

Here are some key points about KR-20:

1. **Purpose:** KR-20 is used to determine how well a set of dichotomous test items (questions or items with only two response options, often correct and incorrect) measures a single underlying construct or trait. It evaluates whether these items consistently assess the same characteristic.

2. **Binary Data:** KR-20 is specifically designed for data with two response categories, typically coded as 1 (correct) and 0 (incorrect). It assumes that items are scored in a binary fashion.

3. **Calculation:** KR-20 compares the sum of the item variances, which is $p_j q_j$ for each dichotomous item, with the variance of the respondents' total scores. It takes into account the number of items, the proportion of correct answers on each item, and the variance of the total scores.

4. **Interpretation:** KR-20 values typically range from 0 to 1, although negative values can occur when items are negatively correlated. Higher values indicate greater internal consistency among the test items, suggesting that the items are measuring the same underlying construct more reliably.

5. **Limitations:** KR-20 assumes that the items measure a single construct and contribute equally to it. It does not require items of equal difficulty; that stronger assumption belongs to the simpler KR-21 formula. KR-20 can understate reliability when items measure multiple dimensions, and the estimate is unstable when there are few items or few respondents.

6. **Usage:** KR-20 is commonly used in educational assessments, such as multiple-choice exams where each question is scored as correct or incorrect.

Here's the formula for calculating KR-20:

$$KR20 = \frac{k}{k-1}\Bigg(1 - \frac{\sum_{j=1}^{k} p_{j} q_{j}}{\sigma^2}\Bigg)$$
where:

- $k$: Total number of questions
- $p_{j}$: Proportion of individuals who answered question j correctly
- $q_{j}$: Proportion of individuals who answered question j incorrectly 
- $\sigma^2$: Variance of the total scores (number of correct answers per individual) across all individuals who took the test
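As a quick check of the formula, the following sketch works through a toy response matrix step by step. The data are invented for illustration, and the population variance (`ddof=0`, NumPy's default) is assumed for $\sigma^2$ so that it is on the same footing as the $p_j q_j$ item variances. Using the sample variance instead gives a somewhat larger coefficient on small samples.

```python
import numpy as np

# Toy data: 5 respondents x 4 items, coded 1 = correct, 0 = incorrect.
responses = np.array([
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
])

k = responses.shape[1]                    # number of items
p = responses.mean(axis=0)                # proportion correct per item
q = 1 - p                                 # proportion incorrect per item
total_scores = responses.sum(axis=1)      # each respondent's total score
sigma2 = total_scores.var()               # population variance (ddof=0)

kr20 = (k / (k - 1)) * (1 - (p * q).sum() / sigma2)
print(f"sum(p*q) = {(p * q).sum():.2f}, sigma^2 = {sigma2:.2f}, KR-20 = {kr20:.2f}")
```

Here $\sum p_j q_j = 0.8$ and $\sigma^2 = 2$, so KR-20 is $\frac{4}{3}(1 - 0.4) = 0.8$.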

In [None]:
def kuder_richardson(df):
    """Return the KR-20 coefficient for a DataFrame of items coded 1 (correct) / 0 (incorrect)."""

    # Total number of questions
    k = len(df.columns)
    
    # Proportion of individuals who answered question j correctly
    pj = df.mean()
    
    # Proportion of individuals who answered question j incorrectly
    qj = 1 - pj
    
    # Sample variance (ddof=1) of each individual's total score
    total_var = np.var(df.sum(axis=1), ddof=1)
    
    # Calculate KR-20 coefficient
    KR_20 = (k / (k - 1)) * (1 - (np.sum(pj * qj) / total_var))
    
    return KR_20

In [None]:
# get dichotomous questions (as copies, so the recoding below does not modify num_answers)
t_f = num_answers[num_answers.columns[2:6]].copy()
frequency = num_answers[num_answers.columns[-2:]].copy()

In [None]:
# list the true/false columns with their positions before converting to correct/incorrect
list(enumerate(t_f.columns))

In [None]:
binary_col0 = [1 if answer == 1 else 0 for answer in t_f[t_f.columns[0]]] # fluoridated products

binary_col1 = [1 if answer == 2 else 0 for answer in t_f[t_f.columns[1]]] # healthy gums bleed

binary_col2 = [1 if answer == 1 else 0 for answer in t_f[t_f.columns[2]]] # Dry mouth

binary_col3 = [1 if answer == 1 else 0 for answer in t_f[t_f.columns[3]]] # Snacking bad

t_f[t_f.columns[0]] = binary_col0
t_f[t_f.columns[1]] = binary_col1
t_f[t_f.columns[2]] = binary_col2
t_f[t_f.columns[3]] = binary_col3

kr_20_tf = kuder_richardson(t_f)

print("KR-20 Coefficient:", kr_20_tf)

In [None]:
[i for i in enumerate(frequency.columns)]

In [None]:
binary_c0 = [1 if answer == 6 \
               else 0 for answer in frequency[frequency.columns[0]]] # How often should residents brush

binary_c1 = [1 if answer == 6 \
               else 0 for answer in frequency[frequency.columns[1]]] # How often should residents floss

frequency[frequency.columns[0]] = binary_c0
frequency[frequency.columns[1]] = binary_c1  

kr_20_freq = kuder_richardson(frequency)

print("KR-20 Coefficient:", kr_20_freq)

In [None]:
binary = pd.concat([t_f, frequency], axis=1)
kr_20_binary = kuder_richardson(binary)

print("KR-20 Coefficient:", kr_20_binary)

In [None]:
from itertools import combinations

def combos(arr, r):

    # return all subsets of arr of length r, as tuples of column positions
    return list(combinations(arr, r))


# every subset of 2 up to (number of columns - 1) questions
n_cols = len(binary.columns)
tup_list = []
for length_of_tuples in range(2, n_cols):
    tup_list.extend(combos(range(n_cols), length_of_tuples))

#tup_list

In [None]:
# search every subset of questions for the highest KR-20 coefficient
best = 0
best_cols = []
for tup in tup_list:
    df = binary[binary.columns[list(tup)]]
    kr_coeff = kuder_richardson(df)
    if kr_coeff > best:
        best = kr_coeff
        best_cols = list(df.columns)

print(best)
print(best_cols)

## Remove `reliability_50` from `hypo_df` to create `sig_testing`

In [None]:
sig_testing = hypo_df.drop(index=list(reliability_50.index))
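Dropping by index labels only yields a clean split when the index is unique, because `drop` removes every row that carries a given label. The following sketch, on an invented toy frame rather than `hypo_df`, shows one way to confirm that the sampled subset and the remainder are disjoint and together cover the full data set.

```python
import pandas as pd

# Toy frame standing in for the survey data (values are placeholders).
toy = pd.DataFrame({"answer": list("abcdefghij")})

# Same pattern as above: sample a subset, then drop it by index label.
holdout = toy.sample(n=3, random_state=0)
remainder = toy.drop(index=holdout.index)

# Checks: unique index, no shared rows, and nothing lost.
index_is_unique = toy.index.is_unique
overlap = holdout.index.intersection(remainder.index)
print("unique index:", index_is_unique)
print("overlapping rows:", len(overlap))
print("rows accounted for:", len(holdout) + len(remainder), "of", len(toy))
```

If the index turns out not to be unique, calling `reset_index(drop=True)` before sampling is one way to restore a one-to-one mapping between labels and rows.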

### Shorten the long "None of the residents..." answer to `None`

In [None]:
sig_testing = sig_testing.replace(
    {'None of the residents under my care experience bleeding when brushing their teeth':
     'None'}
)
sig_testing.head()

In [None]:
sig_testing.shape

In [None]:
pd.to_pickle(sig_testing, "../saved_data_frames/sig_testing_PRE.pkl")

In [None]:
#from scipy.stats import mannwhitneyu