# Load Data

In [1]:
from helpers import res, category, keyword
import pingouin as pg

# Internal Validity - by Category

## Findings

- See the "Correlation within each Category" section for details.

### Policy Transparency
- There are negative correlations in this category, but they can be explained as follows:
- The category contains 2 separate groups: (T3, T11) v. (T6, A1)
- The former is about budget, and the latter is about representation of the undocumented population in policy creation
- So the two groups are not required to be related

### Undocumented Access
- No notable "contradictions"

### Identification and Residency Requirements
- The only 2 questions (T9, T10) have a low correlation of 0.353
- But they ask about unrelated things: ID v. residency

### Marginalized Access
- All questions here have low or negative correlations
- But they are also unrelated questions by nature

### Privacy Guarantees
- A31 is the odd one out (correlations of 0.086 and -0.067), while the other 2 (T12, A5) have a high correlation (0.781)
- But A31 asks an unrelated question

## Finding Low Variance Questions

- Questions with zero variance have an undefined (NaN) correlation with every other question, so we identify low-variance questions first.
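As a minimal sketch of why this happens, the Pearson correlation divides by the product of the two standard deviations, so a constant column leaves the ratio undefined. The `pearson` helper and the sample answers below are hypothetical and written only for illustration; they are not the notebook's data or the pandas implementation.

```python
import math

def pearson(x, y):
    """Pearson correlation; returns NaN when either input has zero variance."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        return math.nan  # zero denominator, so r is undefined
    return cov / math.sqrt(ss_x * ss_y)

# Hypothetical answers, not taken from `res`
constant = [1, 1, 1, 1, 1]       # every respondent gave the same answer
low_var = [-2, -2, -1, -2, -1]   # low but nonzero variance
other = [0, 1, 2, 1, 2]

r_constant = pearson(constant, other)  # undefined (NaN)
r_low_var = pearson(low_var, other)    # a finite number
```

Only the zero-variance column produces NaN; a column with small but nonzero variance still yields a defined correlation.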

In [2]:
desc = res.describe()
display(desc.loc[:, desc.loc['std', :] < 0.5])

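| | T1 | T2 | T11 | A26 |
|---|---|---|---|---|
| count | 21.0 | 21.0 | 21.0 | 21.0 |
| mean | 1.0 | 1.0 | -1.666667 | 1.0 |
| std | 0.0 | 0.0 | 0.483046 | 0.0 |
| min | 1.0 | 1.0 | -2.0 | 1.0 |
| 25% | 1.0 | 1.0 | -2.0 | 1.0 |
| 50% | 1.0 | 1.0 | -2.0 | 1.0 |
| 75% | 1.0 | 1.0 | -1.0 | 1.0 |
| max | 1.0 | 1.0 | -1.0 | 1.0 |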


### Correlation within each Category

In [3]:
for cat in category.unique():
    print('#'*20, cat, '#'*20)
    corr = res[category.index[category == cat]].corr()
    display(corr)

#################### Policy Transparency ####################


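| | T1 | T2 | T3 | T11 | T6 | A1 |
|---|---|---|---|---|---|---|
| T1 | NaN | NaN | NaN | NaN | NaN | NaN |
| T2 | NaN | NaN | NaN | NaN | NaN | NaN |
| T3 | NaN | NaN | 1.0 | 0.993615 | -0.370779 | -0.350868 |
| T11 | NaN | NaN | 0.993615 | 1.0 | -0.346688 | -0.32705 |
| T6 | NaN | NaN | -0.370779 | -0.346688 | 1.0 | 0.997782 |
| A1 | NaN | NaN | -0.350868 | -0.32705 | 0.997782 | 1.0 |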


#################### Undocumented Access ####################


| | T4 | T5 | A3 | A11 | A12 | A13 |
|---|---|---|---|---|---|---|
| T4 | 1.0 | 0.38737 | 0.772653 | 0.369089 | 0.708902 | 0.576995 |
| T5 | 0.38737 | 1.0 | 0.514343 | 0.364308 | 0.629414 | 0.375995 |
| A3 | 0.772653 | 0.514343 | 1.0 | 0.265936 | 0.722762 | 0.496813 |
| A11 | 0.369089 | 0.364308 | 0.265936 | 1.0 | 0.45398 | 0.666867 |
| A12 | 0.708902 | 0.629414 | 0.722762 | 0.45398 | 1.0 | 0.553877 |
| A13 | 0.576995 | 0.375995 | 0.496813 | 0.666867 | 0.553877 | 1.0 |


#################### Identification and Residency Requirements ####################


| | T9 | T10 |
|---|---|---|
| T9 | 1.0 | 0.352911 |
| T10 | 0.352911 | 1.0 |


#################### Marginalized Access ####################


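| | T7 | A18 | A19 | A21 | A26 | A27 |
|---|---|---|---|---|---|---|
| T7 | 1.0 | 0.152504 | 0.015375 | 0.083915 | NaN | 0.344594 |
| A18 | 0.152504 | 1.0 | -0.384895 | -0.238036 | NaN | 0.001659 |
| A19 | 0.015375 | -0.384895 | 1.0 | 0.309141 | NaN | 0.405776 |
| A21 | 0.083915 | -0.238036 | 0.309141 | 1.0 | NaN | 0.445347 |
| A26 | NaN | NaN | NaN | NaN | NaN | NaN |
| A27 | 0.344594 | 0.001659 | 0.405776 | 0.445347 | NaN | 1.0 |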


#################### Privacy Guarantees ####################


| | T12 | A5 | A31 |
|---|---|---|---|
| T12 | 1.0 | 0.780869 | 0.085749 |
| A5 | 0.780869 | 1.0 | -0.066959 |
| A31 | 0.085749 | -0.066959 | 1.0 |


# Internal Validity - by "Keyword"

## Findings

- A3 and T9 both ask about ID requirements, yet their correlation is only 0.106
- T4 and T5 have a fairly low correlation (0.387), but they are not necessarily related
- All other keyword groups have high correlation when appropriate
- Note that the NaN's appear for T1 and T2 because each has zero variance; other single-variable groups show a self-correlation of 1.0

In [4]:
for kw in keyword.unique():
    print('#'*20, kw, '#'*20)
    corr = res[keyword.index[keyword == kw]].corr()
    print(corr)

#################### In public docs ####################
    T1
T1 NaN
#################### In statements ####################
    T2
T2 NaN
#################### Budget ####################
           T3       T11
T3   1.000000  0.993615
T11  0.993615  1.000000
#################### Representation ####################
          T6        A1
T6  1.000000  0.997782
A1  0.997782  1.000000
#################### Inclusion ####################
         T4       T5
T4  1.00000  0.38737
T5  0.38737  1.00000
#################### ID ####################
         A3       T9
A3  1.00000  0.10555
T9  0.10555  1.00000
#################### Same Basis ####################
     A11
A11  1.0
#################### Vaccine Subsidy ####################
     A12
A12  1.0
#################### Vaccine Choice ####################
     A13
A13  1.0
#################### Residency ####################
     T10
T10  1.0
#################### Housing ####################
     T7
T7  1.0
#################### Incarcerat

# Cronbach's Alpha

Cronbach's Alpha measures internal consistency: how closely a set of questions measures the same underlying factor.
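For reference, the standard formula is alpha = k / (k - 1) * (1 - sum of item variances / variance of the total score), where k is the number of questions. The sketch below implements that formula with the standard library only. The `cronbach_alpha` helper and the sample scores are hypothetical illustrations; this is not pingouin's implementation, which the notebook uses for the actual results.

```python
from statistics import variance

def cronbach_alpha(items):
    """Cronbach's alpha for a list of equal-length score lists, one per question."""
    k = len(items)
    if k < 2:
        return float("nan")  # alpha needs at least two questions
    item_variances = [variance(scores) for scores in items]
    # Total score per respondent: sum across questions
    totals = [sum(row) for row in zip(*items)]
    total_variance = variance(totals)
    if total_variance == 0:
        return float("nan")  # undefined when total scores do not vary
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)

# Hypothetical scores: questions that move together perfectly
consistent = [[1, 2, 3, 4], [2, 3, 4, 5], [1, 2, 3, 4]]
# Hypothetical scores: two questions that only partly agree
mixed = [[1, 2, 3], [1, 3, 2]]

alpha_consistent = cronbach_alpha(consistent)
alpha_mixed = cronbach_alpha(mixed)
```

Questions that rise and fall together push alpha toward 1, while questions that disagree pull it down.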

## Overall

- Conclusion: 0.729 (Good)

- Nunnally (1978)
  - 0.70 → Exploratory research
  - 0.80 → Basic research
  - 0.90 → Applied research (e.g. techniques in clinical psychology)
  
https://www.researchgate.net/publication/230786782_The_Sources_of_Four_Commonly_Reported_Cutoff_Criteria_What_Did_They_Really_Say
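The cutoffs listed above can be turned into a small lookup, sketched here. The `nunnally_level` helper and its labels are hypothetical names introduced for illustration; the thresholds are the ones stated in these notes.

```python
# Cutoffs as listed in the notes above (attributed there to Nunnally, 1978),
# ordered from strictest to most lenient.
THRESHOLDS = [
    (0.90, "applied research"),
    (0.80, "basic research"),
    (0.70, "exploratory research"),
]

def nunnally_level(alpha):
    """Return the strictest listed cutoff that alpha satisfies."""
    for cutoff, label in THRESHOLDS:
        if alpha >= cutoff:
            return label
    return "below 0.70"

overall_level = nunnally_level(0.729)  # the overall alpha reported in this notebook
```

Under these cutoffs, the overall score of 0.729 clears only the exploratory-research threshold.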

## Category Level

- Only Undocumented Access gets a high Cronbach's Alpha score (0.868).
- But as noted in the correlation section, questions in the same category do not necessarily measure the same underlying factor.

In [5]:
print("Overall Cronbach's Alpha:", pg.cronbach_alpha(data=res)[0])

Overall Cronbach's Alpha: 0.729089392999182


In [6]:
for cat in category.unique():
    print('#'*20, cat, '#'*20)
    print(pg.cronbach_alpha(res[category.index[category == cat]])[0])

#################### Policy Transparency ####################
0.198157762518841
#################### Undocumented Access ####################
0.8679950186799502
#################### Identification and Residency Requirements ####################
0.5141672425708361
#################### Marginalized Access ####################
0.3420078226857888
#################### Privacy Guarantees ####################
0.5603217158176943
