---
format: 
  html:
    toc: false
    page-layout: full
execute:
    echo: false
---


<div class="text-box">
    
## 2.4 **BONUS** Statistical Analysis and Altair Heatmap
    
To further investigate the relationships among educational attainment, household income, and the presence of college buildings, I'll:

1) Compute the correlation, p-value, and standard error for each pair of Associates Degree, Bachelors Degree or Higher, Presence of Building in Tract, and Median Household Income across the entirety of Philadelphia.

2) Plot these correlations on an interactive heat map (built with Altair) whose tooltip shows the correlation, p-value, and standard error for each pair of variables.

3) Repeat the previous two steps for tracts that have larger White-Latine populations and tracts that have larger Black-Latine populations.
    
    

In [4]:
import altair as alt
import folium
import geopandas as gpd
import hvplot.pandas  # registers the .hvplot accessor on DataFrames
import numpy as np
import pandas as pd
import panel as pn
import requests
import seaborn as sns
import xyzservices
from matplotlib import pyplot as plt
from scipy.stats import pearsonr

In [2]:
#| echo: true
#| code-fold: true


# Tract-level demographics joined with the per-tract building data.
tracts_all_df = pd.read_csv("demographics_with_tracts_and_buildings_per_tract.csv")

corr_numeric_cols = [
    "Median Household Income",
    "Bachelors Degree or Higher",
    "Associates Degree",
    "buildings_cat",
    "White and Latino/Hispanic",
    "Black and Latino/Hispanic",
]

# Coerce each column to numeric; entries that cannot be parsed become NaN.
for col in corr_numeric_cols:
    tracts_all_df[col] = pd.to_numeric(tracts_all_df[col], errors="coerce")

n = len(corr_numeric_cols)
corr_matrix = np.eye(n)           # each variable correlates perfectly with itself
pval_matrix = np.zeros((n, n))    # diagonal p-values and standard errors stay 0
stderr_matrix = np.zeros((n, n))

# Fill the upper triangle and mirror each value, since Pearson's r is symmetric.
for i in range(n):
    for j in range(i + 1, n):
        col_i, col_j = corr_numeric_cols[i], corr_numeric_cols[j]
        # Pairwise deletion: keep only tracts where both values are present.
        pair_df = tracts_all_df[[col_i, col_j]].dropna()
        n_pairs = len(pair_df)

        if n_pairs < 3:
            # Too few observations for a meaningful estimate.
            r = p = stderr = np.nan
        else:
            r, p = pearsonr(pair_df[col_i], pair_df[col_j])
            # Standard error of r: sqrt((1 - r^2) / (n - 2)).
            stderr = np.sqrt((1 - r**2) / (n_pairs - 2))

        corr_matrix[i, j] = corr_matrix[j, i] = r
        pval_matrix[i, j] = pval_matrix[j, i] = p
        stderr_matrix[i, j] = stderr_matrix[j, i] = stderr

corr_df = pd.DataFrame(corr_matrix, columns=corr_numeric_cols, index=corr_numeric_cols)
pval_df = pd.DataFrame(pval_matrix, columns=corr_numeric_cols, index=corr_numeric_cols)
stderr_df = pd.DataFrame(stderr_matrix, columns=corr_numeric_cols, index=corr_numeric_cols)

corr_df


| | Median Household Income | Bachelors Degree or Higher | Associates Degree | buildings_cat | White and Latino/Hispanic | Black and Latino/Hispanic |
|---|---|---|---|---|---|---|
| Median Household Income | 1.0 | 0.607788 | 0.008413 | 0.079966 | -0.093035 | -0.18144 |
| Bachelors Degree or Higher | 0.607788 | 1.0 | 0.196838 | 0.000716 | -0.007511 | -0.100238 |
| Associates Degree | 0.008413 | 0.196838 | 1.0 | -0.191563 | 0.255804 | 0.17122 |
| buildings_cat | 0.079966 | 0.000716 | -0.191563 | 1.0 | -0.12722 | -0.159477 |
| White and Latino/Hispanic | -0.093035 | -0.007511 | 0.255804 | -0.12722 | 1.0 | 0.405201 |
| Black and Latino/Hispanic | -0.18144 | -0.100238 | 0.17122 | -0.159477 | 0.405201 | 1.0 |
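
As a sanity check on a matrix like this one, pandas can compute a pairwise-complete Pearson matrix directly with `DataFrame.corr`. The sketch below is only illustrative: `toy_df` and its column names are made-up stand-ins for `tracts_all_df`, and the values are not from the data set. It compares the built-in result with `pearsonr` for one pair.

```python
import numpy as np
import pandas as pd
from scipy.stats import pearsonr

# Hypothetical toy data standing in for tracts_all_df; the values are made up.
toy_df = pd.DataFrame({
    "income": [40.0, 55.0, 62.0, np.nan, 80.0, 95.0],
    "bachelors": [0.10, 0.22, 0.25, 0.30, np.nan, 0.55],
    "buildings": [0, 1, 0, 0, 1, 1],
})

# DataFrame.corr excludes missing values pair by pair, which mirrors
# calling dropna() on each two-column slice before pearsonr.
builtin_corr = toy_df.corr(method="pearson", min_periods=3)

# Recompute one pair by hand to confirm that the two approaches agree.
pair = toy_df[["income", "bachelors"]].dropna()
r_manual, _ = pearsonr(pair["income"], pair["bachelors"])

print(np.isclose(builtin_corr.loc["income", "bachelors"], r_manual))
```

`DataFrame.corr` returns only the coefficients, so the explicit loop is still needed for the p-values and standard errors.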


In [3]:
#| echo: true
#| code-fold: true

# Reshape the three matrices to long format: one row per pair of variables.
merged_long = pd.concat([
    corr_df.stack().rename('Correlation'),
    pval_df.stack().rename('p_value'),
    stderr_df.stack().rename('std_err')
], axis=1).reset_index().rename(columns={'level_0': 'Variable1', 'level_1': 'Variable2'})


heatmap = alt.Chart(merged_long).mark_rect().encode(
    x=alt.X('Variable1:O', sort=sorted(merged_long["Variable1"].unique())),
    y=alt.Y('Variable2:O', sort=sorted(merged_long["Variable2"].unique()), scale=alt.Scale(reverse=True)),
    color=alt.Color('Correlation:Q',
                    scale=alt.Scale(scheme='redblue', domain=(-1, 1))),
    tooltip=[
        alt.Tooltip('Variable1:N'),
        alt.Tooltip('Variable2:N'),
        alt.Tooltip('Correlation:Q', format=".3f"),
        alt.Tooltip('p_value:Q', format=".3g"),
        alt.Tooltip('std_err:Q', format=".3g")
    ]
).properties(
    width=450,
    height=450,
    title="Correlation Heatmap (Altair) with p-values & Std. Error"
)


heatmap.display()
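
Step 3 of the plan (repeating the analysis for tracts with larger White-Latine or Black-Latine populations) could be sketched roughly as follows. This is a minimal sketch under stated assumptions: `subgroup_corr` is a hypothetical helper, splitting tracts at the median of the group column is just one possible definition of "larger", and `toy_df` holds randomly generated values, not the Philadelphia data.

```python
import numpy as np
import pandas as pd
from scipy.stats import pearsonr


def subgroup_corr(df, group_col, x_col, y_col):
    """Pearson r, p-value, and n for tracts above the median of group_col.

    Splitting at the median is an assumption made for this sketch; it is
    only one possible definition of a "larger" population.
    """
    subset = df[df[group_col] > df[group_col].median()]
    pair = subset[[x_col, y_col]].dropna()
    if len(pair) < 3:
        return np.nan, np.nan, len(pair)
    r, p = pearsonr(pair[x_col], pair[y_col])
    return r, p, len(pair)


# Hypothetical toy data; in the notebook this would be tracts_all_df.
rng = np.random.default_rng(0)
toy_df = pd.DataFrame({
    "White and Latino/Hispanic": rng.integers(0, 500, size=40),
    "Median Household Income": rng.normal(50_000, 10_000, size=40),
    "Associates Degree": rng.integers(0, 300, size=40),
})

r, p, n_used = subgroup_corr(
    toy_df,
    group_col="White and Latino/Hispanic",
    x_col="Median Household Income",
    y_col="Associates Degree",
)
print(f"r={r:.3f}, p={p:.3g}, n={n_used}")
```

Calling the helper once per group column and per variable pair would produce the inputs for a second and third heatmap.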

</div>
<div class="text-box">
    
## Analysis 

    
Based on the correlation heatmap above, the relationship between post-secondary attainment rates and the presence of post-secondary school buildings within Philadelphia's census tracts appears to be minimal.

Specifically, the college-building variable (`buildings_cat`) shows a correlation of almost 0 with bachelor's degrees or higher (r = 0.001) and a weak negative correlation with associate degrees (r = -0.192), indicating that the presence of college buildings is not associated with higher post-secondary attainment.

When distinguishing between Black-Latine and White-Latine populations, Black and Latino/Hispanic populations show a modest positive correlation with associate degree attainment (r = 0.171), whereas White and Latino/Hispanic populations display a somewhat stronger positive correlation (r = 0.256).

Moreover, building on our findings in 1.3, Black-Latine populations have a modest negative correlation with Median Household Income (r = -0.181), indicating that tracts with higher median household income tend to have smaller Black-Latine populations. White-Latine populations, by comparison, show a weaker, near-zero correlation with Median Household Income (r = -0.093).

However, as noted in **part 2.3**, there are no strong relationships between the presence of post-secondary buildings and any demographic variables (|r| < 0.2 in every case).
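
One way to gauge whether correlations this small are distinguishable from zero is an approximate confidence interval based on Fisher's z-transformation. The sketch below is illustrative only: `fisher_ci` is a hypothetical helper, and `n_tracts` is an assumed sample size rather than the actual number of tracts with complete data.

```python
import math


def fisher_ci(r, n, z_crit=1.96):
    """Approximate 95% confidence interval for Pearson's r via Fisher's z."""
    z = math.atanh(r)                # Fisher transformation of r
    se = 1.0 / math.sqrt(n - 3)      # standard error on the z scale
    lo, hi = z - z_crit * se, z + z_crit * se
    return math.tanh(lo), math.tanh(hi)  # back-transform to the r scale


# n_tracts is a hypothetical sample size, not taken from the data set.
n_tracts = 380

for label, r in [("buildings vs. bachelor's", 0.001),
                 ("buildings vs. associate", -0.192)]:
    lo, hi = fisher_ci(r, n_tracts)
    includes_zero = lo <= 0.0 <= hi
    print(f"{label}: r={r:+.3f}, CI=({lo:+.3f}, {hi:+.3f}), includes 0: {includes_zero}")
```

The result depends on the sample size, so the real pair count for each cell (`n_pairs` in the correlation loop) should replace `n_tracts` before drawing any conclusion.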
     

</div>