# Introduction

This Jupyter Notebook provides a step-by-step guide to three preliminary statistical tests: the **Kaiser-Meyer-Olkin (KMO) Test**, **Bartlett’s Test of Sphericity**, and **Cronbach’s Alpha**. The first two assess whether your dataset is suitable for **Principal Component Analysis (PCA)**; the third checks whether the items in each group of variables measure the same construct consistently.

- **KMO Test** evaluates sampling adequacy by measuring the proportion of variance among the variables that could be explained by underlying factors. Values range from 0 to 1; values close to 1 indicate that PCA is appropriate, and values below 0.5 are commonly treated as unacceptable.
- **Bartlett’s Test of Sphericity** checks whether the correlation matrix of the data differs significantly from an identity matrix, that is, whether the variables are correlated at all, which is a prerequisite for PCA. A significant p-value (p < 0.05 in this notebook) suggests that the dataset is suitable for dimensionality reduction.
- **Cronbach’s Alpha** measures the internal consistency (reliability) of a set of items or variables, ensuring that they are sufficiently correlated to represent a cohesive construct. This notebook treats α ≥ 0.7 as acceptable.

By following this notebook, you will learn how to calculate these metrics and interpret their results, ensuring that your dataset meets the assumptions and criteria for performing PCA effectively.


In [None]:
# Install the required packages if they are not already available
%pip install factor-analyzer scipy

In [None]:
# Import the required libraries
import os
import numpy as np
import pandas as pd
from factor_analyzer import calculate_kmo, calculate_bartlett_sphericity

In [None]:
# Set the working directory to the path where your files are located
os.chdir('path_to_your_data')

In [None]:
# Load your existing data (replace 'your_data.csv' with your file name or path)
df = pd.read_csv('your_data.csv', encoding='ISO-8859-1')
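
Before subsetting, it can help to confirm that every item column you plan to use is present in the file and listed only once. A missing name raises a `KeyError`, and a repeated name produces two identical columns, which makes the correlation matrix singular. The sketch below is one possible check; `find_column_problems` is a hypothetical helper (not part of pandas), and the column lists are made up for illustration.

```python
from collections import Counter


def find_column_problems(available, wanted):
    """Return (missing, repeated) column names for a planned subset."""
    available = set(available)
    # Names requested but absent from the data
    missing = [name for name in wanted if name not in available]
    # Names that appear more than once in the requested list
    repeated = [name for name, count in Counter(wanted).items() if count > 1]
    return missing, repeated


# Hypothetical example: in the notebook, pass df.columns as `available`.
available_columns = ["Attitude3 Coded", "Attitude4 Coded", "Attitude5 Coded"]
wanted_columns = ["Attitude3 Coded", "Attitude3 Coded", "Attitude4 Coded", "Attitude6 Coded"]

missing, repeated = find_column_problems(available_columns, wanted_columns)
print("Missing columns:", missing)
print("Repeated columns:", repeated)
```

Both lists should be empty before you run the tests on a subset.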


In [None]:
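# Extract the subset of columns for each Theory of Planned Behavior (TPB) construct (example: Attitude)
# List each item column once: a repeated column correlates perfectly with its copy and makes the correlation matrix singular
attitude_data = df[['Attitude3 Coded', 'Attitude4 Coded', 'Attitude5 Coded', 'Attitude6 Coded',
                    'Attitude7 Coded', 'Attitude8 Coded', 'Attitude9 Coded']]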

# Calculate the KMO test
kmo_all, kmo_model = calculate_kmo(attitude_data)
print("KMO Test Value for Attitude:", kmo_model)
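
To see what `calculate_kmo` measures, the statistic can be sketched directly in NumPy: it compares the squared correlations between items with their squared partial correlations (the correlation left between two items once all other items are controlled for). The block below is an illustrative re-implementation under that definition, not the library's code; `kmo_from_correlations` and the synthetic data are assumptions made for the example.

```python
import numpy as np


def kmo_from_correlations(data):
    """Return (per-item KMO, overall KMO) for a 2-D array of item scores."""
    corr = np.corrcoef(data, rowvar=False)
    inv_corr = np.linalg.inv(corr)
    # Partial correlation of items i and j, controlling for all other items
    scale = np.sqrt(np.outer(np.diag(inv_corr), np.diag(inv_corr)))
    partial = -inv_corr / scale
    # Only off-diagonal entries enter the statistic
    np.fill_diagonal(corr, 0.0)
    np.fill_diagonal(partial, 0.0)
    corr_sq = corr ** 2
    partial_sq = partial ** 2
    per_item = corr_sq.sum(axis=0) / (corr_sq.sum(axis=0) + partial_sq.sum(axis=0))
    overall = corr_sq.sum() / (corr_sq.sum() + partial_sq.sum())
    return per_item, overall


# Synthetic data (an assumption for illustration): six items driven by one shared factor
rng = np.random.default_rng(0)
factor = rng.normal(size=(500, 1))
items = 0.8 * factor + 0.6 * rng.normal(size=(500, 6))

per_item, overall = kmo_from_correlations(items)
print("Per-item KMO:", np.round(per_item, 3))
print("Overall KMO:", round(float(overall), 3))
```

When items share a common factor, the partial correlations are small relative to the raw correlations and the overall value approaches 1. The per-item values (the counterpart of `kmo_all` above) can help spot individual items that fit poorly.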

In [None]:
# Calculate Bartlett’s Test for Attitude
chi_square_value, p_value = calculate_bartlett_sphericity(attitude_data)
print("Bartlett’s Test Chi-Square Value:", chi_square_value)
print("Bartlett’s Test p-Value:", p_value)

# Interpretation
if p_value < 0.05:
    print("Bartlett's Test is significant (p < 0.05). The data is suitable for factor analysis.")
else:
    print("Bartlett's Test is not significant (p ≥ 0.05). The data may not be suitable for factor analysis.")
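
For reference, Bartlett's statistic can be computed by hand from the correlation matrix R of p items observed on n rows: χ² = −(n − 1 − (2p + 5)/6) · ln|R|, with p(p − 1)/2 degrees of freedom. The sketch below applies that formula with NumPy and SciPy to synthetic data; `bartlett_by_hand` is a hypothetical name, and the block is meant to illustrate the calculation rather than replace `calculate_bartlett_sphericity`.

```python
import numpy as np
from scipy.stats import chi2


def bartlett_by_hand(data):
    """Return (chi-square statistic, degrees of freedom, p-value)."""
    n_rows, n_items = data.shape
    corr = np.corrcoef(data, rowvar=False)
    # slogdet returns the log-determinant directly, avoiding underflow in det(corr)
    _, log_det = np.linalg.slogdet(corr)
    statistic = -(n_rows - 1 - (2 * n_items + 5) / 6) * log_det
    dof = n_items * (n_items - 1) // 2
    # Upper-tail probability of the chi-square distribution
    p_value = chi2.sf(statistic, dof)
    return statistic, dof, p_value


# Synthetic data (an assumption for illustration): six correlated items
rng = np.random.default_rng(0)
factor = rng.normal(size=(500, 1))
items = 0.8 * factor + 0.6 * rng.normal(size=(500, 6))

statistic, dof, p_value = bartlett_by_hand(items)
print("Chi-square:", round(float(statistic), 2))
print("Degrees of freedom:", dof)
print("p-value:", p_value)
```

The closer the correlation matrix is to an identity matrix, the closer its determinant is to 1 and the smaller the statistic becomes.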

In [None]:
# Function to calculate Cronbach’s Alpha
def cronbach_alpha(df):
    """Return Cronbach's alpha for a DataFrame whose columns are the items of one construct."""
    item_scores = df.to_numpy()
    item_count = item_scores.shape[1]

    # Sample variance (ddof=1) of each item and of each respondent's total score
    variance_items = np.var(item_scores, axis=0, ddof=1)
    total_score = item_scores.sum(axis=1)
    variance_total = np.var(total_score, ddof=1)

    alpha = (item_count / (item_count - 1)) * (1 - (variance_items.sum() / variance_total))
    return alpha

# Calculate Cronbach’s Alpha for Attitude
attitude_alpha = cronbach_alpha(attitude_data)
print("Cronbach’s Alpha for Attitude:", attitude_alpha)

# Interpretation
if attitude_alpha >= 0.7:
    print("Internal consistency is acceptable (α ≥ 0.7).")
else:
    print("Internal consistency is questionable (α < 0.7).")
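
The formula used by `cronbach_alpha` is α = k/(k − 1) · (1 − Σ item variances / variance of total scores), where k is the number of items. A small worked example makes the arithmetic easy to check by hand; the scores below are hypothetical and the block uses only the Python standard library.

```python
from statistics import variance

# Hypothetical scores: 4 respondents (rows) answering 3 items (columns)
scores = [
    [1, 2, 1],
    [2, 3, 3],
    [3, 3, 4],
    [4, 5, 4],
]

k = len(scores[0])
# Sample variance of each item (column) and of each respondent's total score
item_variances = [variance(column) for column in zip(*scores)]
total_variance = variance([sum(row) for row in scores])
alpha = (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

print("Item variances:", [round(v, 4) for v in item_variances])
print("Variance of total scores:", total_variance)
print("Cronbach's alpha:", round(alpha, 4))
```

Here the item variances sum to 5.25 and the total-score variance is 14.25, so α = 1.5 × (1 − 5.25/14.25) = 18/19 ≈ 0.947.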

In [None]:
# Extract the subset of columns for each TPB construct (example 2: Social norms)
snorms_data = df[['SNorms1 Coded', 'SNorms2 Coded', 'SNorms3 Coded', 'SNorms4 Coded']]

# Calculate the KMO test
kmo_all, kmo_model = calculate_kmo(snorms_data)
print("KMO Test Value for Social Norms:", kmo_model)

In [None]:
# Calculate Bartlett’s Test for Social Norms
chi_square_value, p_value = calculate_bartlett_sphericity(snorms_data)
print("Bartlett’s Test Chi-Square Value:", chi_square_value)
print("Bartlett’s Test p-Value:", p_value)

# Interpretation
if p_value < 0.05:
    print("Bartlett's Test is significant (p < 0.05). The data is suitable for factor analysis.")
else:
    print("Bartlett's Test is not significant (p ≥ 0.05). The data may not be suitable for factor analysis.")

In [None]:
# Calculate Cronbach’s Alpha for Social Norms (reuses cronbach_alpha() defined above)
snorms_alpha = cronbach_alpha(snorms_data)
print("Cronbach’s Alpha for Social Norms:", snorms_alpha)

# Interpretation
if snorms_alpha >= 0.7:
    print("Internal consistency is acceptable (α ≥ 0.7).")
else:
    print("Internal consistency is questionable (α < 0.7).")

In [None]:
# Extract the subset of columns for each TPB construct (example 3: Climate change perception)
ccperc_data = df[['CCPerception1 Coded', 'CCPerception2 Coded', 'CCPerception3 Coded', 'CCPerception4 Coded',
                  'CCPerception5 Coded', 'CCPerception6 Coded', 'CCPerception7 Coded', 'CCPerception8 Coded',
                  'CCPerception9 Coded', 'CCPerception10 Coded', 'CCPerception11 Coded', 'CCPerception12 Coded',
                  'CCPerception13 Coded', 'CCPerception14 Coded']]

# Calculate the KMO test
kmo_all, kmo_model = calculate_kmo(ccperc_data)
print("KMO Test Value for Climate Change Perception:", kmo_model)

In [None]:
# Calculate Bartlett’s Test for Climate Change Perception
chi_square_value, p_value = calculate_bartlett_sphericity(ccperc_data)
print("Bartlett’s Test Chi-Square Value:", chi_square_value)
print("Bartlett’s Test p-Value:", p_value)

# Interpretation
if p_value < 0.05:
    print("Bartlett's Test is significant (p < 0.05). The data is suitable for factor analysis.")
else:
    print("Bartlett's Test is not significant (p ≥ 0.05). The data may not be suitable for factor analysis.")

In [None]:
# Calculate Cronbach’s Alpha for Climate Change Perception (reuses cronbach_alpha() defined above)
ccperc_alpha = cronbach_alpha(ccperc_data)
print("Cronbach’s Alpha for Climate Change Perception:", ccperc_alpha)

# Interpretation
if ccperc_alpha >= 0.7:
    print("Internal consistency is acceptable (α ≥ 0.7).")
else:
    print("Internal consistency is questionable (α < 0.7).")

Follow the above examples to run the tests for each group of variables in your data, and adjust your analysis accordingly.
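
One way to apply the same checks to every group is to loop over a dictionary that maps each construct name to its item columns. The sketch below is a minimal version of that idea: `summarize_constructs` is a hypothetical helper, the three test functions are passed in as arguments so the block does not depend on a particular library, and dropping rows with missing values is an assumption you may want to change.

```python
import pandas as pd


def summarize_constructs(df, constructs, kmo_fn, bartlett_fn, alpha_fn):
    """Run the three checks on each construct and collect the results in one table.

    constructs: dict mapping a construct name to its list of item columns.
    kmo_fn, bartlett_fn, alpha_fn: callables with the same return shapes as
    calculate_kmo, calculate_bartlett_sphericity and cronbach_alpha.
    """
    rows = []
    for name, columns in constructs.items():
        # Assumption: rows with any missing item are dropped before testing
        subset = df[columns].dropna()
        _, kmo_total = kmo_fn(subset)
        chi_square, p_value = bartlett_fn(subset)
        rows.append({
            "construct": name,
            "n_items": len(columns),
            "n_rows": len(subset),
            "kmo": kmo_total,
            "bartlett_chi2": chi_square,
            "bartlett_p": p_value,
            "cronbach_alpha": alpha_fn(subset),
        })
    return pd.DataFrame(rows).set_index("construct")
```

In this notebook it could be called as `summarize_constructs(df, constructs, calculate_kmo, calculate_bartlett_sphericity, cronbach_alpha)`, with `constructs` built from the column lists used in the cells above.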