
## Compute PCA-weighted composite scores

I use the following approach, as described in the literature. For examples, see:

- [https://www-users.york.ac.uk/~mb55/msc/clinimet/week7/scales.pdf](https://www-users.york.ac.uk/~mb55/msc/clinimet/week7/scales.pdf)
- [De Pauw SS, Mervielde I, De Clercq BJ, De Fruyt F, Tremmery S, Deboutte D. Personality symptoms and self-esteem as correlates of psychopathology in child psychiatric patients: Evaluating multiple informant data. Child Psychiatry and Human Development. 2009;40:499–515](https://d1wqtxts1xzle7.cloudfront.net/39504333/Personality_symptoms_and_self-esteem_as_20151028-15947-1tfnqo5-libre.pdf?1446062707=&response-content-disposition=inline%3B+filename%3DPersonality_Symptoms_and_Self_Esteem_as.pdf&Expires=1692921439&Signature=Mml6kL2ZfLAuHN4AXfdNsOP4omTK8ohAuoVknj4tJN7BT~IAzBa8tQH1zcbQIHOXf7N0PEoB8hysuegAbj9PzRd8PgN-MB2~j0koxqkxtalcWlhcflIcsaJhdjVZoZ1ENlxvQRC1Khe4-dtx9dVRKAXYJhtQKL-RK3UPZfAM-EvJsdPvNVusAg2nH822pYdaYNqxVAnGvHtVz1q-DBq07k19w9Sjowgx4C7jqiBkObBDCbzCvQL1naH9Oel83eVyd9OouGSVrEM9MS-WqdnDs0fD7PQa-J9G5mA6zxYB0gguaDuTiChZUYIWo3BjgAXuEz3m8mtbFeDcJFP9ilpHmw__&Key-Pair-Id=APKAJLOHF5GGSLRBV4ZA)
- [Song MK, Lin FC, Ward SE, Fine JP. Composite variables: when and how. Nurs Res. 2013 Jan-Feb;62(1):45-9. doi: 10.1097/NNR.0b013e3182741948. PMID: 23114795; PMCID: PMC5459482.](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5459482/)

Procedure:
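- Compute Cronbach's alpha reliability coefficient to check the internal consistency of the $K$ Likert-scale variables $x_1, \dots, x_K$ (stored as the columns of a pandas dataframe, one row per respondent).
- Standardize each variable by subtracting its mean and dividing by its standard deviation.
- Apply PCA to the standardized variables and take the loadings of the first principal component, normalized so that they sum to 1.
- Use the normalized loadings as weights $w_k$ to compute the weighted index of each respondent $i$:

$$\text{comp\_score}_{i}=\sum_{k=1}^{K} w_k \, z_{ik}, \qquad z_{ik}=\frac{x_{ik}-\bar{x}_k}{s_k}, \qquad w_k=\frac{a_{1k}}{\sum_{j=1}^{K} a_{1j}}$$

where $x_{ik}$ is the response of respondent $i$ to variable $k$, $\bar{x}_k$ and $s_k$ are that variable's mean and sample standard deviation, and $a_{1k}$ is the loading of variable $k$ on the first principal component.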
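The procedure above can be sketched with NumPy alone. This is a minimal sketch, assuming the responses arrive as a numeric array with one row per respondent and one column per variable; the function name `pca_weighted_composite` is hypothetical. It relies on the fact that PCA on variables standardized with `ddof=1` amounts to an eigendecomposition of their correlation matrix, so it should reproduce the weights that scikit-learn's `PCA` gives in the class below, up to floating-point error.

```python
import numpy as np


def pca_weighted_composite(X):
    """Hypothetical NumPy-only sketch of the PCA-weighted composite score.

    X: array of shape (n_respondents, K) with Likert responses.
    Returns (composite_scores, weights).
    """
    X = np.asarray(X, dtype=float)

    # Standardize each column: (x - mean) / sample standard deviation (ddof=1)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

    # PCA on standardized data = eigendecomposition of the correlation matrix
    R = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)     # eigh returns eigenvalues in ascending order
    v1 = eigvecs[:, np.argmax(eigvals)]      # eigenvector of the largest eigenvalue (PC1)

    # Normalize so the weights sum to 1; this also cancels PCA's arbitrary sign
    weights = v1 / v1.sum()

    # Composite score of each respondent: weighted sum of its standardized values
    composite_scores = Z @ weights
    return composite_scores, weights


# Same synthetic responses as the test cell, one column per variable
x1 = [4, 5, 1, 5, 5, 4, 3, 4, 5, 4]
x2 = [3, 5, 2, 5, 4, 4, 4, 4, 4, 3]
x3 = [4, 5, 1, 5, 5, 4, 4, 4, 3, 5]
x4 = [4, 4, 3, 3, 5, 5, 4, 5, 4, 5]
X = np.column_stack([x1, x2, x3, x4])

scores, w = pca_weighted_composite(X)
print("Weights:", w)
print("Composite Scores:", scores)
```

Because every standardized column has mean zero, the composite scores also average to zero, so a score above 0 means "above the sample's weighted average".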

The class below provides the following methods:

- `LikertCompositeCalculator(dataframe)`: the constructor; stores the dataframe of responses (one row per respondent, one column per variable).
- `calc_ca()`: computes Cronbach's alpha from scratch.
- `calc2_ca()`: computes Cronbach's alpha with the pingouin package (returns the coefficient and its 95% confidence interval).
- `calc_composite_score()`: returns the composite score vector (one score per respondent, i.e. per row) and the weight vector (length $K$).
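For reference, `calc_ca()` evaluates the standard Cronbach's alpha formula, where $s_k^2$ is the sample variance of variable $k$ and $s_T^2$ is the sample variance of each respondent's total score $T_i=\sum_{k} x_{ik}$ (both with $n-1$ in the denominator, matching `ddof=1`). Worked by hand for the synthetic test data used further down ($K=4$, $n=10$; the sums of squared deviations of the four items are 14, 7.6, 14 and 5.6, and that of the totals is 108):

$$\alpha=\frac{K}{K-1}\left(1-\frac{\sum_{k=1}^{K}s_k^2}{s_T^2}\right)$$

$$\sum_{k=1}^{4}s_k^2=\frac{14+7.6+14+5.6}{9}=\frac{41.2}{9}\approx 4.578, \qquad s_T^2=\frac{108}{9}=12$$

$$\alpha=\frac{4}{3}\left(1-\frac{4.578}{12}\right)\approx 0.8247$$

This agrees with both Cronbach's alpha values printed by the test cell.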


In [None]:
!pip install pingouin --quiet


In [None]:
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from scipy.stats import zscore
import pingouin as pg


class LikertCompositeCalculator:
    def __init__(self, data):
        self.data = data

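    def calc_ca(self):
        # Cronbach's alpha computed manually (to cross-check the pingouin result):
        # alpha = K/(K-1) * (1 - sum of item variances / variance of the total score)
        k = self.data.shape[1]                                   # number of items K
        variance_sum = self.data.var(axis=0, ddof=1).sum()       # sum of item variances
        total_variance = self.data.sum(axis=1).var(ddof=1)       # variance of row totals
        cronbach_alpha = (k / (k - 1)) * (1 - (variance_sum / total_variance))
        return cronbach_alpha

    def calc2_ca(self):
        # Using pingouin: returns a tuple (alpha, 95% confidence interval)
        return pg.cronbach_alpha(self.data)

    def calc_composite_score(self):
        # Step 2: Standardization
        # PCA needs comparable scales, so each variable is expressed in
        # standard deviations: (x_k - mean_k) / sd_k, with sample sd (ddof=1)
        standardized_data = zscore(self.data, ddof=1)

        # Step 3: Perform PCA on the standardized data
        pca = PCA()
        pca.fit(standardized_data)

        # Step 4: Calculate weights from the first principal component,
        # normalized to sum to 1 (this also removes PCA's arbitrary sign)
        weights = pca.components_[0] / np.sum(pca.components_[0])

        # Step 5: Composite score of each respondent (row) = standardized row . weights
        composite_scores = np.dot(standardized_data, weights)

        return composite_scores, weights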
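Step 4 divides the first principal component by its sum. The following sketch with made-up numbers (the vector `v1`, the eigenvalue `lam1`, and the helper `sum_normalize` are hypothetical, for illustration only) suggests why this is convenient: the resulting weights stay the same if PCA returns the component with the opposite sign, and also if one starts from the loadings (eigenvector times the square root of its eigenvalue, available in scikit-learn as `pca.explained_variance_[0]`) instead of `pca.components_[0]`.

```python
import numpy as np


def sum_normalize(v):
    """Hypothetical helper: scale a vector so its entries sum to 1 (as in Step 4)."""
    v = np.asarray(v, dtype=float)
    return v / v.sum()


# Illustrative first-component eigenvector and eigenvalue (not from real data)
v1 = np.array([0.55, 0.49, 0.54, 0.28])
lam1 = 2.6

w_eigvec = sum_normalize(v1)                    # from the eigenvector, like pca.components_[0]
w_flipped = sum_normalize(-v1)                  # PCA may return the opposite sign
w_loading = sum_normalize(v1 * np.sqrt(lam1))   # from loadings = eigenvector * sqrt(eigenvalue)

print(np.allclose(w_eigvec, w_flipped), np.allclose(w_eigvec, w_loading))  # prints: True True
```

The same division becomes unstable if the first-component loadings have mixed signs and their sum is close to zero, so it is worth inspecting the sign pattern of `pca.components_[0]` before relying on the weights.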


In [None]:
# Test the class with synthetic data
responses = pd.DataFrame({
        'x1': [4, 5, 1, 5, 5, 4, 3, 4, 5, 4],
        'x2': [3, 5, 2, 5, 4, 4, 4, 4, 4, 3],
        'x3': [4, 5, 1, 5, 5, 4, 4, 4, 3, 5],
        'x4': [4, 4, 3, 3, 5, 5, 4, 5, 4, 5]
    })

composite = LikertCompositeCalculator(responses)

# Calculate Cronbach's alpha manually
cronbach_alpha = composite.calc_ca()

# Calculate Cronbach's alpha with pingouin (also returns a 95% CI)
cronbach_alpha2 = composite.calc2_ca()

# Calculate composite score and weights using PCA
composite_scores, weights = composite.calc_composite_score()

print("Cronbach's Alpha:", cronbach_alpha)
print("Cronbach's Alpha:", cronbach_alpha2)
print("Composite Scores:", composite_scores)
print("Weights:", weights)

Cronbach's Alpha: 0.8246913580246913
Cronbach's Alpha: (0.8246913580246915, array([0.539, 0.951]))
Composite Scores: [-0.26824542  0.77637853 -2.15439811  0.58579423  0.67930215  0.20999959
 -0.21666205  0.20999959  0.02226736  0.15556413]
Weights: [0.29444015 0.26434194 0.29088298 0.15033493]
