# Correlation among Prevention Quality Indicators (PQIs)


County-by-county correlations for OSHPD Prevention Quality Indicators. 

A high correlation indicates that, across counties, those with a high risk-adjusted rate for indicator_1 also tend to have a high risk-adjusted rate for indicator_2. The counties in this dataset are the counties where the treating hospitals are located, not the counties where the patients reside. Because small counties have few hospitals and their residents are likely to visit hospitals in other counties, the rates for small counties may be unreliable, which can distort the correlations.

A high correlation does not indicate that individuals in a county are highly likely to have both conditions. For instance, it could be that hospitals that specialize in the diseases of one indicator are also well known for treating the other. In such a case, the correlation arises because the treatments of the diseases are correlated, not because the prevalences of the diseases are correlated.
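To make the quantity concrete, here is a minimal sketch of how a correlation between two indicators is computed across counties. The county names and rates are invented for illustration and are not taken from the OSHPD data; `pearson` is a hypothetical helper written here only to show the formula that `Series.corr` applies by default.

```python
import numpy as np
import pandas as pd

# Made-up risk-adjusted rates for five hypothetical counties (not OSHPD data).
rates = pd.DataFrame(
    {
        "indicator_1": [10.0, 12.0, 15.0, 20.0, 23.0],
        "indicator_2": [30.0, 33.0, 41.0, 52.0, 60.0],
    },
    index=["county_a", "county_b", "county_c", "county_d", "county_e"],
)


def pearson(x, y):
    """Pearson correlation: co-deviation divided by the product of the spreads."""
    dx = x - x.mean()
    dy = y - y.mean()
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))


# Each county contributes one (indicator_1, indicator_2) point.
r_manual = pearson(rates["indicator_1"], rates["indicator_2"])

# pandas computes the same Pearson coefficient by default.
r_pandas = rates["indicator_1"].corr(rates["indicator_2"])
```

Each county is a single data point, so the coefficient describes how the two rates move together from county to county and says nothing about individual patients.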


In [76]:
%matplotlib inline
import pandas as pd
import numpy as np
from ambry import get_library
l = get_library()
b = l.bundle('oshpd.ca.gov-pqi-0.0.3')
p = b.partition('oshpd.ca.gov-pqi-pqi-county-0.0.3').localize()

In [74]:
# Get all of the risk adjusted rates, and remove all of the composites, because those will be, by definition, 
# correlated with their components. 
cols = [c for c in df.columns if (c.endswith('_risk_adjusted_rate') and 'composite' not in c)]
short_cols = [c.replace('_risk_adjusted_rate','') for c in cols]
dfrar = df[cols]
dfrar.columns = short_cols

corrm=dfrar.corr()
# Correlation matrices are symmetric around the diagonal, so we only need one half,
# the lower triangle
corrm.loc[:,:] = np.tril(corrm,k=-1) 


In [75]:
# Stacking puts the column headings into a multi-index
# Resetting the index puts the two levels of the multi-index into columns
corrstk = corrm.stack().reset_index().copy()

corrstk.columns = ["indicator_1", "indicator_2", 'correlation']
# DataFrame.sort() was removed from pandas; sort_values() is the replacement
corrstk.sort_values('correlation', ascending=False)

# Removing correlation == 0 removes the duplicate indicator pairs; since we took only
# the bottom of the triangle, all of the duplicate pairs on the top are 0
corrstk[corrstk.correlation!=0].sort_values('correlation', ascending=False)



| | indicator_1 | indicator_2 | correlation |
|---|---|---|---|
| 67 | pqi_8_heart_failure | pqi_3_diabetes_long_term_complications | 0.770318 |
| 161 | pqi_16_lower_extremity_amputation_among_patien... | pqi_8_heart_failure | 0.754825 |
| 107 | pqi_12_urinary_tract_infection | pqi_5_copd_or_asthma_in_older_adults | 0.714779 |
| 158 | pqi_16_lower_extremity_amputation_among_patien... | pqi_3_diabetes_long_term_complications | 0.702959 |
| 94 | pqi_11_bacterial_pneumonia | pqi_5_copd_or_asthma_in_older_adults | 0.695201 |
| 111 | pqi_12_urinary_tract_infection | pqi_11_bacterial_pneumonia | 0.678049 |
| 81 | pqi_10_dehydration | pqi_5_copd_or_asthma_in_older_adults | 0.674903 |
| 97 | pqi_11_bacterial_pneumonia | pqi_10_dehydration | 0.644036 |
| 68 | pqi_8_heart_failure | pqi_5_copd_or_asthma_in_older_adults | 0.627826 |
| 106 | pqi_12_urinary_tract_infection | pqi_3_diabetes_long_term_complications | 0.624238 |
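
The cells above zero out the upper triangle and then drop rows whose correlation equals 0. One possible variation, sketched below on a small made-up frame (the columns `a`, `b`, `c` are hypothetical, not PQI columns), selects the lower triangle by position with a boolean mask instead. This is an alternative to the approach used in this notebook, and its one practical difference is that a pair whose correlation is genuinely zero stays in the result.

```python
import numpy as np
import pandas as pd

# Toy data: b is a multiple of a, and c is uncorrelated with both.
demo = pd.DataFrame(
    {
        "a": [1.0, 2.0, 3.0, 4.0],
        "b": [2.0, 4.0, 6.0, 8.0],
        "c": [1.0, -1.0, -1.0, 1.0],
    }
)

corr = demo.corr()

# True strictly below the diagonal, False on the diagonal and above it.
mask = np.tril(np.ones(corr.shape, dtype=bool), k=-1)

pairs = (
    corr.where(mask)   # cells outside the lower triangle become NaN
    .stack()           # long format: one row per (row label, column label)
    .dropna()          # explicit, in case stack() keeps NaN entries
    .reset_index()
)
pairs.columns = ["indicator_1", "indicator_2", "correlation"]
pairs = pairs.sort_values("correlation", ascending=False).reset_index(drop=True)
```

With three columns this yields three rows, one per unordered pair, including the two pairs involving `c` whose correlation is zero up to floating-point rounding.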
