# Using ACS 5-Year Estimate Median Household Income Data from LA County to visualize wealth stratification.
## Does wealth stratification in LA County have any visual geospatial correlations with coast live oak locations? What are the intersections between wealth and access to green space, specifically green space with native flora like the coast live oak and live oak?

#### We begin with initial data exploration 

In [29]:
import pandas as pd
import geopandas as gpd

In [3]:
incomedf = pd.read_csv('Group Data/MedianHHI.csv')

In [5]:
# this will help with column id
incomedf.info(verbose=True, show_counts=True)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2498 entries, 0 to 2497
Data columns (total 14 columns):
 #   Column                Non-Null Count  Dtype 
---  ------                --------------  ----- 
 0   GeoID                 2498 non-null   int64 
 1   TotalHH               2498 non-null   int64 
 2   Under10k              2498 non-null   object
 3   $10,000 to $14,999    2498 non-null   object
 4   $15,000 to $24,999    2498 non-null   object
 5   $25,000 to $34,999    2498 non-null   object
 6   $35,000 to $49,999    2498 non-null   object
 7   $50,000 to $74,999    2498 non-null   object
 8   $75,000 to $99,999    2498 non-null   object
 9   $100,000 to $149,999  2498 non-null   object
 10  $150,000 to $199,999  2498 non-null   object
 11  $200,000 or more      2498 non-null   object
 12  Median income         2498 non-null   object
 13  Mean income           2498 non-null   object
dtypes: int64(2), object(12)
memory usage: 273.3+ KB


In [6]:
incomedf.head()

Unnamed: 0,GeoID,TotalHH,Under10k,"$10,000 to $14,999","$15,000 to $24,999","$25,000 to $34,999","$35,000 to $49,999","$50,000 to $74,999","$75,000 to $99,999","$100,000 to $149,999","$150,000 to $199,999","$200,000 or more",Median income,Mean income
0,6037101110,1551,4.3,6.2,6.8,5.7,11.0,19.0,6.6,21.0,7.9,11.4,68972,94342
1,6037101122,1383,6.1,0.0,3.7,3.5,1.0,17.3,12.5,20.5,14.8,20.7,118859,146666
2,6037101220,1349,6.3,4.5,7.4,3.6,12.6,21.5,12.8,11.3,9.6,10.5,65139,98721
3,6037101221,1424,5.9,6.3,13.7,9.3,11.6,16.9,10.5,14.4,7.8,3.6,53348,69670
4,6037101222,928,7.3,19.6,9.6,9.5,10.6,5.0,11.5,17.6,3.8,5.6,36779,67587


In [15]:
incomedf['Median income'].describe()

count     2498
unique    2329
top          -
freq        49
Name: Median income, dtype: object

Going forward, I am going to work with a subset of the data: the tract ID (labeled GeoID) and the median household income.

In [13]:
subsetdf = incomedf[['GeoID', 'Median income']]

In [26]:
subsetdf.head()

Unnamed: 0,GeoID,Median income
0,6037101110,68972
1,6037101122,118859
2,6037101220,65139
3,6037101221,53348
4,6037101222,36779


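Now that I have the data I need (tract ID and median household income), it is time for charts and plots. But first I need to convert median income from object to numeric. The '-' placeholder that describe() reported as the most frequent value cannot be parsed as a number, so it will become NaN.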

In [30]:

# Remove non-numeric characters (e.g., '$', ',' in currency formats).
# The raw string (r'...') keeps Python from treating '\$' as an invalid escape sequence.
incomedf['Median income'] = incomedf['Median income'].replace(r'[\$,]', '', regex=True)

# Convert to numeric
incomedf['Median income'] = pd.to_numeric(incomedf['Median income'], errors='coerce')

# Check if the conversion was successful
print(incomedf[['GeoID', 'Median income']].dtypes)  # Should show 'Median income' as float or int

GeoID              int64
Median income    float64
dtype: object


In [37]:
incomedf['Median income'] = pd.to_numeric(incomedf['Median income'], errors='coerce')
incomedf['GeoID'] = pd.to_numeric(incomedf['GeoID'], errors='coerce')

In [38]:
print(incomedf[['GeoID', 'Median income']].dtypes)

GeoID              int64
Median income    float64
dtype: object


In [39]:
print(incomedf['Median income'].isna().sum())  # Count NaNs
print(incomedf['Median income'].describe())    # Check summary statistics


64
count      2434.000000
mean      88343.781841
std       37617.918565
min        9417.000000
25%       61205.750000
50%       81759.500000
75%      107532.500000
max      249432.000000
Name: Median income, dtype: float64
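The 64 missing values are more than the 49 '-' placeholders that describe() counted before the conversion, so 15 other entries also failed to parse. The sketch below shows one way to list the raw strings that turn into NaN. It runs on a small made-up Series rather than the real file: the two numbers come from the table above, and the '(X)' entry is an invented stand-in for whatever the other unparseable strings actually are.

```python
import pandas as pd

# Hypothetical raw strings: two values from the table above, the '-' placeholder,
# and one invented non-numeric entry ('(X)') standing in for the unknown extras.
raw = pd.Series(["68972", "118859", "-", "(X)", "-"], name="Median income")

# Same cleaning as in the notebook: strip '$' and ',', then coerce to numbers.
numeric = pd.to_numeric(raw.str.replace(r"[\$,]", "", regex=True), errors="coerce")

# Raw strings that became NaN, tallied by value.
failed = raw[numeric.isna()].value_counts()
print(failed)
```

Running the same tally on the real column would require the raw strings, so it has to happen before 'Median income' is overwritten (or on a fresh read of the CSV).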


In [51]:
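# head(25) would plot the first 25 rows in file order;
# nlargest picks the 25 tracts with the highest median income
incomedf.nlargest(25, 'Median income').plot.bar(
    x='GeoID',
    y='Median income',
    title='Top 25 Census Tracts with Highest MHHI in Los Angeles County'
)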

<Axes: title={'center': 'Top 25 Census Tracts with Highest MHHI in Los Angeles County'}, xlabel='GeoID'>

#### Mapping

In [41]:
import geopandas as gpd

In [43]:
tracts = gpd.read_file('Group Data/2020_Census_Tracts.geojson')

In [44]:
tracts.head()

Unnamed: 0,OBJECTID,CT20,LABEL,ShapeSTArea,ShapeSTLength,geometry
0,4992,101110,1011.1,12295620.0,15083.854287,"POLYGON ((-118.29793 34.26323, -118.30082 34.2..."
1,4993,101122,1011.22,28457740.0,31671.455844,"POLYGON ((-118.27743 34.25991, -118.27743 34.2..."
2,4994,101220,1012.2,7522093.0,12698.78381,"POLYGON ((-118.27818 34.25577, -118.27887 34.2..."
3,4995,101221,1012.21,3812000.0,9161.710543,"POLYGON ((-118.28735 34.25591, -118.28863 34.2..."
4,4996,101222,1012.22,3191371.0,9980.600461,"POLYGON ((-118.28594 34.2559, -118.28697 34.25..."


In [45]:
tracts.plot(figsize=(12,10))

<Axes: >

In [46]:
tracts = tracts[['CT20','geometry']]
tracts.head()

Unnamed: 0,CT20,geometry
0,101110,"POLYGON ((-118.29793 34.26323, -118.30082 34.2..."
1,101122,"POLYGON ((-118.27743 34.25991, -118.27743 34.2..."
2,101220,"POLYGON ((-118.27818 34.25577, -118.27887 34.2..."
3,101221,"POLYGON ((-118.28735 34.25591, -118.28863 34.2..."
4,101222,"POLYGON ((-118.28594 34.2559, -118.28697 34.25..."


In [48]:
# create a FIPS column: state (6, leading zero dropped to match GeoID) + county (037) + tract
tracts['FIPS'] = '6' + '037' + tracts['CT20'].astype(str)

In [49]:
tracts.head()

Unnamed: 0,CT20,geometry,FIPS
0,101110,"POLYGON ((-118.29793 34.26323, -118.30082 34.2...",6037101110
1,101122,"POLYGON ((-118.27743 34.25991, -118.27743 34.2...",6037101122
2,101220,"POLYGON ((-118.27818 34.25577, -118.27887 34.2...",6037101220
3,101221,"POLYGON ((-118.28735 34.25591, -118.28863 34.2...",6037101221
4,101222,"POLYGON ((-118.28594 34.2559, -118.28697 34.25...",6037101222
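The FIPS values above are 10 characters long because GeoID was read from the CSV as an integer, which drops the leading zero of California's state code (06). A full tract GEOID has 11 digits: state (2) + county (3) + tract (6). The sketch below shows one way to build zero-padded keys on both sides; it uses toy data, and the GEOID11 column name is hypothetical.

```python
import pandas as pd

# Toy frame: CT20 values copied from the tracts table above.
tracts_toy = pd.DataFrame({"CT20": ["101110", "101122"]})

# Full GEOID = state (2 digits) + county (3 digits) + tract (6 digits).
STATE_FIPS = "06"    # California
COUNTY_FIPS = "037"  # Los Angeles County
tracts_toy["GEOID11"] = STATE_FIPS + COUNTY_FIPS + tracts_toy["CT20"].str.zfill(6)

# An integer GeoID such as 6037101110 has lost its leading zero; zfill restores it.
income_geoid = pd.Series([6037101110, 6037101122]).astype(str).str.zfill(11)

print(tracts_toy["GEOID11"].tolist())
print(income_geoid.tolist())
```

Either convention works for the join as long as both tables use the same one.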


Next, join the income table to the tract geometries to create a new dataframe.

In [61]:
print(tracts['FIPS'].dtype)
print(incomedf['GeoID'].dtype)

object
int64


In [62]:
# give GeoID the same dtype label as FIPS
# (note: astype(object) keeps the values as integers, while FIPS holds strings)
incomedf['GeoID'] = incomedf['GeoID'].astype(object)

In [63]:
# confirm that both keys now report the same dtype
print(tracts['FIPS'].dtype)
print(incomedf['GeoID'].dtype)

object
object


In [64]:
tracts_MHHI = tracts.merge(incomedf, left_on='FIPS', right_on='GeoID')

In [65]:
tracts_MHHI.head()

Unnamed: 0,CT20,geometry,FIPS,GeoID,TotalHH,Under10k,"$10,000 to $14,999","$15,000 to $24,999","$25,000 to $34,999","$35,000 to $49,999","$50,000 to $74,999","$75,000 to $99,999","$100,000 to $149,999","$150,000 to $199,999","$200,000 or more",Median income,Mean income
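The header-only output above means the merge matched no rows. A likely cause is that astype(object) only changes the dtype label: the GeoID values are still integers, while FIPS holds strings, and 6037101110 is not equal to '6037101110'. The sketch below, on toy frames rather than the real data, casts the integer key to str and uses indicator=True to count how many tracts found a match. The tract 6037999999 is made up to show an unmatched row.

```python
import pandas as pd

# Toy stand-ins for the two tables; the first two IDs mirror the rows shown above,
# the third (6037999999) is a made-up tract with no income record.
tracts_toy = pd.DataFrame({"FIPS": ["6037101110", "6037101122", "6037999999"]})
income_toy = pd.DataFrame({
    "GeoID": [6037101110, 6037101122],
    "Median income": [68972.0, 118859.0],
})

# Cast the integer key to str so its values, not just its dtype label, match FIPS.
income_toy["GeoID"] = income_toy["GeoID"].astype(str)

# how="left" keeps every tract; indicator=True adds a _merge column that
# records whether each tract found a matching income row.
merged = tracts_toy.merge(
    income_toy, left_on="FIPS", right_on="GeoID", how="left", indicator=True
)
match_counts = merged["_merge"].value_counts()
print(match_counts)
```

In the notebook, the equivalent change would be incomedf['GeoID'].astype(str) before the merge; checking the 'both' count afterwards confirms the join worked before any plotting.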


In [66]:
# the income column in the merged table is named 'Median income'
tracts_MHHI.plot(figsize=(12,10),
                 column='Median income',
                 legend=True, 
                 scheme='NaturalBreaks')

ValueError: aspect must be finite and positive 
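This error most likely comes from plotting an empty GeoDataFrame: with no geometries there are no bounds from which to compute the map's aspect ratio. Since tracts_MHHI.head() returned no rows, the plot call had nothing to draw. The sketch below is a small guard that fails early with a clearer message. The plot_choropleth helper is hypothetical, not part of geopandas, and the demonstration uses an empty pandas DataFrame as a stand-in for an empty merge result.

```python
import pandas as pd


def plot_choropleth(gdf, column, **plot_kwargs):
    """Plot gdf colored by column, failing early with a readable message."""
    if gdf.empty:
        raise ValueError(
            "Nothing to plot: the GeoDataFrame has no rows. "
            "Check that the merge keys matched."
        )
    if column not in gdf.columns:
        raise KeyError(f"Column {column!r} not found; available: {list(gdf.columns)}")
    return gdf.plot(column=column, legend=True, **plot_kwargs)


# Demonstration with an empty frame, which mimics a merge that matched nothing.
empty = pd.DataFrame(columns=["geometry", "Median income"])
try:
    plot_choropleth(empty, "Median income")
except ValueError as err:
    message = str(err)
    print(message)
```

Once the merge returns rows, a call such as plot_choropleth(tracts_MHHI, 'Median income', figsize=(12, 10), scheme='NaturalBreaks') would draw the map. The scheme argument relies on the mapclassify package being installed.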