---
format: 
  html:
    toc: false
    page-layout: full
execute:
    echo: false
---

<div class="text-box">
    
# 2.3 Bar Graphs of Tracts with College Buildings vs Tracts Without College Buildings
    
    
Here you'll find my steps to make bar graphs comparing the demographic characteristics of Philadelphia census tracts that contain a college building with those of tracts that do not. The Buildings vs. No Buildings comparison is made over three distinct sets of census tracts, which I describe in **part 2.3.3**.
    
    
    
</div>

In [7]:
import altair as alt
import folium
import geopandas as gpd
import hvplot.pandas
import numpy as np
import pandas as pd
import panel as pn
import requests
import seaborn as sns
import xyzservices
from matplotlib import pyplot as plt
from scipy.stats import pearsonr



<div class="text-box">
    
## 2.3.1 Load Data and Counting
    
Here I'll
    
1) Load the buildings per tract data frame from part **2.1**
    
2) Group buildings by census tracts
    
3) Create a count column counting the college buildings per census tract. 

In [3]:
#| echo: true
#| code-fold: true

# Load the college building layer and the tract-level demographics.
college_buildings = gpd.read_file("Universities_Colleges.geojson")
demographics_with_tracts = gpd.read_file("identity_with_tracts.geojson")

# Put both layers in the same coordinate reference system before joining.
if college_buildings.crs != demographics_with_tracts.crs:
    demographics_with_tracts = demographics_with_tracts.to_crs(college_buildings.crs)

# Drop rows with invalid geometries, which can break the spatial join.
college_buildings = college_buildings[college_buildings.is_valid]
demographics_with_tracts = demographics_with_tracts[demographics_with_tracts.is_valid]

# Attach to each building the attributes of the tract that contains it.
buildings_per_tract = gpd.sjoin(
    college_buildings,
    demographics_with_tracts,
    how="left",
    predicate="within",
)

# Count joined rows (one per building) in each tract. size() counts every row,
# so the result does not depend on which column happens to be non-null.
counts = (
    buildings_per_tract
    .groupby("tract")
    .size()
    .reset_index(name="building_count")
)


# Keep only the demographic columns used in this section.
all_tracts_df = demographics_with_tracts[[
    'Total Population', 'Median Household Income',
    'Black and Latino/Hispanic', 'Bachelors Degree or Higher',
    'Associates Degree', 'Masters Degree', 'White and Latino/Hispanic',
    'state', 'county', 'tract', 'geometry'
]].copy()

# Normalize the tract IDs to stripped strings so the merge keys match.
all_tracts_df['tract'] = all_tracts_df['tract'].astype(str).str.strip()
counts['tract'] = counts['tract'].astype(str).str.strip()

# The left merge keeps every tract; tracts with no buildings get a count of 0.
result = all_tracts_df.merge(counts, on="tract", how="left")
result["building_count"] = result["building_count"].fillna(0).astype(int)

# Frequency table: how many tracts have 0, 1, 2, ... buildings.
building_freq = result['building_count'].value_counts().sort_index().reset_index()
building_freq.columns = ['Building Count', 'Frequency']

result.head()







|  | Total Population | Median Household Income | Black and Latino/Hispanic | Bachelors Degree or Higher | Associates Degree | Masters Degree | White and Latino/Hispanic | state | county | tract | geometry | building_count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 4098 | 80470 | 60 | 1083 | 177 | 518 | 475 | 42 | 101 | 2701 | POLYGON ((-75.15600 39.92553, -75.15591 39.925... | 0 |
| 1 | 4300 | 76060 | 0 | 991 | 338 | 427 | 117 | 42 | 101 | 2702 | POLYGON ((-75.15284 39.92511, -75.15277 39.925... | 0 |
| 2 | 4452 | 65847 | 18 | 555 | 404 | 328 | 499 | 42 | 101 | 2801 | POLYGON ((-75.15910 39.92593, -75.15902 39.926... | 0 |
| 3 | 5772 | 67585 | 289 | 1566 | 91 | 698 | 289 | 42 | 101 | 2802 | POLYGON ((-75.16707 39.92680, -75.16693 39.926... | 0 |
| 4 | 3762 | 66932 | 35 | 865 | 80 | 600 | 290 | 42 | 101 | 2900 | POLYGON ((-75.16949 39.92560, -75.16923 39.926... | 0 |
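
The counting and zero-filling logic above can be checked on a small example. This is a minimal sketch using plain pandas with made-up tract IDs and building names; `toy_buildings` and `toy_tracts` are hypothetical stand-ins for the spatially joined buildings and the demographics table, not the real data.

```python
import pandas as pd

# Hypothetical stand-in for the spatial join output: one row per building,
# labeled with the tract that contains it.
toy_buildings = pd.DataFrame({
    "building": ["Hall A", "Hall B", "Library", "Gym"],
    "tract": ["000100", "000100", "000300", "000100"],
})

# Hypothetical stand-in for the demographics table: one row per tract.
toy_tracts = pd.DataFrame({"tract": ["000100", "000200", "000300"]})

# Count buildings per tract.
toy_counts = (
    toy_buildings
    .groupby("tract")
    .size()
    .reset_index(name="building_count")
)

# The left merge keeps every tract; tracts without buildings get NaN,
# which is then replaced by 0.
toy_result = toy_tracts.merge(toy_counts, on="tract", how="left")
toy_result["building_count"] = toy_result["building_count"].fillna(0).astype(int)

print(toy_result)
```

Tract `000200` has no buildings in the toy data, so it only survives because the merge is a left merge on the tract table.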


</div>

<div class="text-box">

## 2.3.2 Categorizing Tracts by Building Presence
    
Many census tracts across Philadelphia contain 0 college buildings.

To address this, I will turn the building count into a binary categorical variable that separates tracts with one or more college buildings (1) from tracts with none (0). This makes it easier to compare tracts with and without college buildings on the other variables of interest.

In [8]:
#| echo: true
#| code-fold: true



numeric_cols = [
    "Median Household Income",
    "Bachelors Degree or Higher",
    "Associates Degree",
    "White and Latino/Hispanic",
    "Black and Latino/Hispanic",
]



tracts_all_df = result.copy()

# Coerce the demographic columns to numeric; unparseable values become NaN.
for col in numeric_cols:
    tracts_all_df[col] = pd.to_numeric(tracts_all_df[col], errors='coerce')

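def categorize_building_count(count):
    """Return 1 if a tract has at least one college building, else 0."""
    if count == 0:
        return 0
    else:
        return 1


# 0 = no college buildings in the tract, 1 = one or more.
tracts_all_df["buildings_cat"] = tracts_all_df["building_count"].apply(categorize_building_count)

# Save the combined table to disk.
tracts_all_df.to_csv("demographics_with_tracts_and_buildings_per_tract.csv")

# Split tracts by which Latine group is larger; tracts where the two counts
# are equal fall into neither subset.
white_greater_df = tracts_all_df[tracts_all_df["White and Latino/Hispanic"] > tracts_all_df["Black and Latino/Hispanic"]]
black_greater_df = tracts_all_df[tracts_all_df["Black and Latino/Hispanic"] > tracts_all_df["White and Latino/Hispanic"]]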
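
As a quick check of the recoding step, the same 0/1 category can be produced with a vectorized comparison instead of a helper function. This is a minimal sketch on made-up counts; the `toy` data frame is hypothetical.

```python
import pandas as pd

toy = pd.DataFrame({"building_count": [0, 3, 1, 0, 7]})

# True where the tract has at least one building, cast to 1/0.
toy["buildings_cat"] = (toy["building_count"] > 0).astype(int)

print(toy["buildings_cat"].tolist())  # [0, 1, 1, 0, 1]
```

Both approaches give the same categories; the vectorized form simply avoids calling a Python function once per row.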




</div>

<div class="text-box">
    
    
## 2.3.3 Bar Graphs Of Means

Now I'll develop three bar graphs with Altair that compare socioeconomic status (Median Household Income) and post-secondary achievement (Associates and Bachelors degrees) between census tracts that have a post-secondary building and those that do not. Each bar chart has two sets of bars (Building vs No Building), and the charts differ in which census tracts are included in the analysis. The tracts are separated as follows:

a) Bar chart containing all census tracts in Philadelphia
    
b) Bar chart of tracts with more Black Latines than White Latines
    
c) Bar chart of tracts with more White Latines than Black Latines
    


In [10]:
#| echo: true
#| code-fold: true


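def create_mean_plot(df, title):
    """Build a grouped bar chart of variable means by building category."""
    selected_cols = [
        "Median Household Income",
        "Bachelors Degree or Higher",
        "Associates Degree",
    ]

    # Coerce the plotted columns to numeric on a copy of the input, so the
    # conversion applies to the data frame being plotted.
    df = df.copy()
    for col in selected_cols:
        df[col] = pd.to_numeric(df[col], errors='coerce')

    # Drop tracts missing any plotted value or the building category.
    df_clean = df.dropna(subset=selected_cols + ["buildings_cat"])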
    
   
    mean_df = df_clean.groupby("buildings_cat").agg({col: "mean" for col in selected_cols}).reset_index()
    
    
    mean_df["Building Category"] = mean_df["buildings_cat"].map({0: "No Buildings", 1: "Has Buildings"})
    
   
    mean_long = mean_df.melt(
        id_vars=["Building Category"],
        value_vars=selected_cols,
        var_name="Variable",
        value_name="Mean"
    )
    
    
    color_scale = alt.Scale(
        domain=selected_cols,
        range=["#1f77b4", "#ff7f0e", "#2ca02c"] 
    )
    
    
    chart = alt.Chart(mean_long).mark_bar().encode(
        x=alt.X("Building Category:N", title="Building Category"),
        y=alt.Y("Mean:Q", title="Mean Value"),
        color=alt.Color("Variable:N", scale=color_scale, title="Variable"),
        xOffset=alt.XOffset("Variable:N"),
        tooltip=[
            alt.Tooltip("Building Category:N", title="Building Category"),
            alt.Tooltip("Variable:N"),
            alt.Tooltip("Mean:Q", format=".2f")
        ]
    ).properties(
        width=300,
        height=400,
        title=title
    ).interactive() 
    
    
    text = chart.mark_text(
        align='center',
        baseline='bottom',
        dy=-5  
    ).encode(
        text=alt.Text('Mean:Q', format=".0f")
    )
    
    
    final_chart = chart + text
    
    return final_chart


# (a) All census tracts in Philadelphia.
Philadelphia_plot = create_mean_plot(
    tracts_all_df,
    "Mean Socioeconomic Variables by Building Category (All of Philadelphia)"
)

# (c) Tracts with more White Latines than Black Latines.
white_plot = create_mean_plot(
    white_greater_df,
    "Mean Socioeconomic Variables by Building Category (White Latino > Black Latino)"
)

# (b) Tracts with more Black Latines than White Latines.
black_plot = create_mean_plot(
    black_greater_df,
    "Mean Socioeconomic Variables by Building Category (Black Latino > White Latino)"
)

# Show the three charts side by side on a shared y-axis, in the order (a), (b), (c).
final_chart = alt.hconcat(Philadelphia_plot, black_plot, white_plot).resolve_scale(
    y='shared'
)

final_chart
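
The reshaping inside `create_mean_plot` is easier to follow on a tiny table. The sketch below uses made-up numbers rather than the Philadelphia data (the `toy` data frame and its values are hypothetical). It shows how the per-category means are computed and then melted into the long format, one row per category and variable, that the Altair encoding reads.

```python
import pandas as pd

toy = pd.DataFrame({
    "buildings_cat": [0, 0, 1, 1],
    "Median Household Income": [40000, 60000, 50000, 70000],
    "Associates Degree": [100, 200, 150, 250],
})
value_cols = ["Median Household Income", "Associates Degree"]

# Mean of each variable within each building category (wide format).
toy_means = toy.groupby("buildings_cat")[value_cols].mean().reset_index()
toy_means["Building Category"] = toy_means["buildings_cat"].map(
    {0: "No Buildings", 1: "Has Buildings"}
)

# Long format: one row per (Building Category, Variable) pair.
toy_long = toy_means.melt(
    id_vars=["Building Category"],
    value_vars=value_cols,
    var_name="Variable",
    value_name="Mean",
)

print(toy_long)
```

With two categories and two variables, the long table has four rows, which is exactly the number of bars the chart would draw.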


</div>



<div class="text-box">

## Analysis 
    
In contrast to my initial insights from part 2.2, the Median Household Income is not lower in tracts that contain a college building. In fact, it appears to be slightly higher than in census tracts without a college building.
    
Moreover, there is little variation in the mean values of Associates and Bachelors degrees between census tracts that have a college building and those that do not, suggesting that the presence of a college building nearby does little to influence post-secondary achievement.
    
    
Additionally, tracts with a larger Black-Latine population than White-Latine population display lower socioeconomic indicators than both the city average and tracts with a larger White-Latine population. Specifically, Black-Latine > White-Latine tracts have lower Median Household Incomes (as displayed in **part 1.3**) and a lower mean number of individuals with a Bachelor's degree. However, Black-Latine predominant tracts have a higher mean number of individuals with Associates degrees, which may indicate easier access to two-year institutions within these tracts. On the other hand, White-Latine predominant tracts show higher Median Household Incomes and a higher number of individuals holding a Bachelors degree, indicating better economic opportunities and higher four-year educational attainment within those tracts.


These findings hint that demographic composition may play a more crucial role in shaping the educational and economic landscape of census tracts than the presence of post-secondary buildings. Future studies should continue to analyze other attributes that influence post-secondary achievement within different Latine racial groups.
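
The comparisons above rest on differences in group means read off the charts. One way to gauge whether such a difference is larger than chance variation would be a permutation test. This is not part of the analysis above, only a minimal sketch on made-up values; the function name `permutation_pvalue` and the sample numbers are hypothetical.

```python
import numpy as np


def permutation_pvalue(group_a, group_b, n_permutations=5000, seed=0):
    """Two-sided permutation p-value for a difference in means."""
    rng = np.random.default_rng(seed)
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    observed = abs(a.mean() - b.mean())
    pooled = np.concatenate([a, b])

    hits = 0
    for _ in range(n_permutations):
        # Shuffle the pooled values and split them back into two groups
        # of the original sizes, as if the labels were assigned at random.
        shuffled = rng.permutation(pooled)
        diff = abs(shuffled[: a.size].mean() - shuffled[a.size:].mean())
        if diff >= observed:
            hits += 1

    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


# Made-up Bachelors degree counts for tracts with and without buildings.
with_buildings = [820, 910, 640, 1010, 760]
without_buildings = [790, 880, 700, 950, 810]

p_value = permutation_pvalue(with_buildings, without_buildings)
print(f"p-value: {p_value:.3f}")
```

A large p-value would be consistent with the two groups of tracts not differing, while a small one would suggest a difference worth a closer look.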
    
</div>

<div class="text-box">
    
    
To answer the second question:

"How is the post-secondary achievement rate (Associates/Bachelors Degree) influenced by living in the same tract as a post-secondary school building(s)?
       
a) How does this trend differ between tracts that have more Black/Latines than White/Latines and vice versa?"
    
    
    
**Overall, the presence of college buildings does not appear to noticeably impact post-secondary degree attainment levels. The influence of college buildings on post-secondary achievement does not appear to differ between census tracts that have more Black/Latines than White/Latines and vice versa.**
    
The lack of influence that living in a tract with a post-secondary building has on the post-secondary achievement rate implies that post-secondary institutions are doing little to influence the college enrollment and persistence of their local communities. This point will be discussed further in the **implications** section.


</div>