## Children Household Count Per County File Processing

In this file, we build a dataframe containing the total number of children across all households in each county of the US.

In [1]:
import warnings
warnings.filterwarnings('ignore')

In [2]:
import pandas as pd
import geopandas as gpd

### Importing county level shape file

This shapefile of US counties was also found online. It provides the polygon geometry of each county in the US, which allows us to make visualizations of the US broken down by county. This processing file prepares the data for a map of the US showing the children count by county.

In [4]:
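# Load the county polygons and their attributes
df_counties = gpd.read_file('/hpc/group/codeplus22-vis/celine_data/source_files/county_shapefiles/counties.shp')

# Build the 5-digit county FIPS code from the 2-digit state code and 3-digit county code
df_counties['county_fips'] = df_counties['STATEFP'] + df_counties['COUNTYFP']
df_counties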

Unnamed: 0,STATEFP,COUNTYFP,COUNTYNS,AFFGEOID,GEOID,NAME,NAMELSAD,STUSPS,STATE_NAME,LSAD,ALAND,AWATER,geometry,county_fips
0,20,161,00485044,0500000US20161,20161,Riley,Riley County,KS,Kansas,06,1579077672,32047392,"POLYGON ((-96.96095 39.28670, -96.96106 39.288...",20161
1,19,159,00465268,0500000US19159,19159,Ringgold,Ringgold County,IA,Iowa,06,1386932347,8723135,"POLYGON ((-94.47167 40.81255, -94.47166 40.819...",19159
2,30,009,01720111,0500000US30009,30009,Carbon,Carbon County,MT,Montana,06,5303728455,35213028,"POLYGON ((-109.79867 45.16734, -109.68779 45.1...",30009
3,16,007,00395090,0500000US16007,16007,Bear Lake,Bear Lake County,ID,Idaho,06,2527123155,191364281,"POLYGON ((-111.63452 42.57034, -111.63010 42.5...",16007
4,55,011,01581065,0500000US55011,55011,Buffalo,Buffalo County,WI,Wisconsin,06,1750290818,87549529,"POLYGON ((-92.08384 44.41200, -92.08310 44.414...",55011
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3229,53,003,01533502,0500000US53003,53003,Asotin,Asotin County,WA,Washington,06,1647427905,11291731,"POLYGON ((-117.47999 46.12199, -117.41948 46.1...",53003
3230,13,043,00342852,0500000US13043,13043,Candler,Candler County,GA,Georgia,06,629520841,15018189,"POLYGON ((-82.25457 32.35150, -82.25276 32.353...",13043
3231,48,451,01384011,0500000US48451,48451,Tom Green,Tom Green County,TX,Texas,06,3941965409,48077315,"POLYGON ((-101.26763 31.55646, -101.25039 31.5...",48451
3232,39,089,01074057,0500000US39089,39089,Licking,Licking County,OH,Ohio,06,1767478831,12761090,"POLYGON ((-82.78181 39.94698, -82.78126 39.955...",39089
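
The ```county_fips``` construction above relies on both code columns being zero-padded strings. As a minimal sketch on a toy dataframe (three rows copied from the output above, not the real shapefile), the concatenation and a quick length check might look like this:

```python
import pandas as pd

# Toy stand-in for the shapefile's attribute table (values copied from the rows above).
counties = pd.DataFrame({
    "STATEFP": ["20", "30", "16"],
    "COUNTYFP": ["161", "009", "007"],
})

# String concatenation keeps the zero padding of both parts.
counties["county_fips"] = counties["STATEFP"] + counties["COUNTYFP"]

# Every combined code should be exactly five characters long.
all_five_chars = counties["county_fips"].str.len().eq(5).all()
print(counties["county_fips"].tolist(), all_five_chars)
```

If either column were numeric, the ```+``` would add the numbers instead of joining the codes, so a length check like this is a cheap guard.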


### Importing infousa data

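This preprocessed infousa household data contains each household's children count, zip code, and county FIPS code. It also includes the household latitude and longitude, both in degrees (the ```_4326``` columns) and transformed to projected coordinates (the ```_3857``` columns). Judging by the values in the output below, ```lat_h_3857``` appears to hold the easting (derived from longitude) and ```lon_h_3857``` the northing, so the two names look swapped; neither column is used in this file.

In this dataset, the ```county_fips``` column identifies the county of each household; this is the column we will group the children count by.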

In [5]:
df_hh = pd.read_parquet('/hpc/group/codeplus22-vis/celine_data/zip_00_99.parquet')
df_hh

Unnamed: 0,zip,county_fips,state,child_num,has_child,age_code,lat_h_4326,lon_h_4326,lat_h_3857,lon_h_3857
0,18833,42113,PA,0,0,K,41.546738,-76.540436,-8.520442e+06,5.093323e+06
1,18833,42015,PA,0,0,H,41.590800,-76.424200,-8.507503e+06,5.099879e+06
2,18833,42015,PA,1,1,C,41.600392,-76.441724,-8.509454e+06,5.101307e+06
3,18833,42015,PA,0,0,L,41.592483,-76.437832,-8.509021e+06,5.100129e+06
4,18833,42015,PA,1,1,H,41.566196,-76.347977,-8.499018e+06,5.096218e+06
...,...,...,...,...,...,...,...,...,...,...
190987608,92003,06073,CA,0,0,C,33.285885,-117.240445,-1.305115e+07,3.933312e+06
190987609,92003,06073,CA,0,0,E,33.284700,-117.210800,-1.304785e+07,3.933154e+06
190987610,92003,06073,CA,0,0,G,33.282869,-117.183963,-1.304486e+07,3.932911e+06
190987611,92003,06073,CA,0,0,H,33.278284,-117.181181,-1.304455e+07,3.932300e+06
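
Because the grouping and the later merge both depend on ```county_fips``` being a 5-character string (note the leading zero in ```06073```), it can help to normalize the column defensively. The sketch below is an assumption about how one might do that; ```normalize_fips``` is a hypothetical helper, and the integer input is made up to show the failure case:

```python
import pandas as pd

def normalize_fips(codes: pd.Series) -> pd.Series:
    """Return county FIPS codes as 5-character, zero-padded strings.

    Assumes the input holds integers or strings (not floats).
    """
    return codes.astype(str).str.zfill(5)

# Hypothetical input where the codes lost their leading zeros by being stored as integers.
raw = pd.Series([6073, 42015, 1001])
fixed = normalize_fips(raw)
print(fixed.tolist())
```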


In [9]:
df_hh = df_hh[['county_fips', 'state', 'child_num', 'has_child', 'age_code']]
df_hh

Unnamed: 0,county_fips,state,child_num,has_child,age_code
0,42113,PA,0,0,K
1,42015,PA,0,0,H
2,42015,PA,1,1,C
3,42015,PA,0,0,L
4,42015,PA,1,1,H
...,...,...,...,...,...
190987608,06073,CA,0,0,C
190987609,06073,CA,0,0,E
190987610,06073,CA,0,0,G
190987611,06073,CA,0,0,H


### Groupby

Here, we take the preprocessed infousa file and calculate the number of children per county. The ```groupby``` function groups the rows that share a ```county_fips``` value, and ```sum``` adds up the ```child_num``` values within each group. The resulting dataframe has one row per ```county_fips``` with the total number of children in that county.

In [10]:
df_child_count = df_hh.groupby('county_fips')['child_num'].sum().reset_index()
df_child_count

Unnamed: 0,county_fips,child_num
0,01001,19566
1,01003,60951
2,01005,6527
3,01007,6087
4,01009,18963
...,...,...
3104,56037,10878
3105,56039,2754
3106,56041,4950
3107,56043,1841
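
To make the aggregation concrete, here is a small sketch of the same ```groupby``` and ```sum``` pattern on a toy dataframe. The rows are made up for illustration and are not taken from the infousa file:

```python
import pandas as pd

# Made-up households: one row per household, as in df_hh.
households = pd.DataFrame({
    "county_fips": ["42015", "42015", "42113", "06073", "42015"],
    "child_num": [0, 1, 0, 2, 1],
})

# One row per county, with child_num summed over that county's households.
child_count = (
    households.groupby("county_fips")["child_num"]
    .sum()
    .reset_index()
)
print(child_count)

# The grand total is unchanged by the grouping.
totals_match = child_count["child_num"].sum() == households["child_num"].sum()
```

Counties whose households all have zero children still appear in the result with a total of 0, since the grouping is over every row.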


### Merging child number counts with county geometries

Now that we have ```child_num``` calculated for each ```county_fips```, we merge the child count dataframe with the county geometries on the ```county_fips``` column. This is a left merge, so every county in the child count dataframe is kept even if it has no matching geometry. Afterwards, we select only the columns we want to keep in this dataframe, before exporting it as a shapefile to be used in future visualizations.

In [11]:
df_final = df_child_count.merge(df_counties, on = ['county_fips'], how = 'left')

df_final

Unnamed: 0,county_fips,child_num,STATEFP,COUNTYFP,COUNTYNS,AFFGEOID,GEOID,NAME,LSAD,ALAND,AWATER,geometry
0,01001,19566,01,001,00161526,0500000US01001,01001,Autauga,06,1.539602e+09,2.570696e+07,"POLYGON ((-86.92120 32.65754, -86.92035 32.658..."
1,01003,60951,01,003,00161527,0500000US01003,01003,Baldwin,06,4.117547e+09,1.133056e+09,"POLYGON ((-88.02858 30.22676, -88.02399 30.230..."
2,01005,6527,01,005,00161528,0500000US01005,01005,Barbour,06,2.292145e+09,5.053870e+07,"POLYGON ((-85.74803 31.61918, -85.74544 31.618..."
3,01007,6087,01,007,00161529,0500000US01007,01007,Bibb,06,1.612167e+09,9.602089e+06,"POLYGON ((-87.42194 33.00338, -87.31854 33.006..."
4,01009,18963,01,009,00161530,0500000US01009,01009,Blount,06,1.670104e+09,1.501542e+07,"POLYGON ((-86.96336 33.85822, -86.95967 33.857..."
...,...,...,...,...,...,...,...,...,...,...,...,...
3104,56037,10878,56,037,01609192,0500000US56037,56037,Sweetwater,06,2.700575e+10,1.662303e+08,"POLYGON ((-110.05438 42.01103, -110.05436 42.0..."
3105,56039,2754,56,039,01605083,0500000US56039,56039,Teton,06,1.035178e+10,5.708649e+08,"POLYGON ((-111.05361 44.66627, -110.75076 44.6..."
3106,56041,4950,56,041,01605084,0500000US56041,56041,Uinta,06,5.391632e+09,1.662582e+07,"POLYGON ((-111.04662 41.15604, -111.04659 41.2..."
3107,56043,1841,56,043,01605085,0500000US56043,56043,Washakie,06,5.798139e+09,1.042960e+07,"POLYGON ((-108.55056 44.16845, -108.50652 44.1..."
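
With a left merge, counties without a matching geometry are kept with missing values rather than dropped, so it can be worth checking for them. The sketch below uses toy data and the ```indicator``` and ```validate``` parameters of ```merge```; the code ```99999``` is made up to force a mismatch:

```python
import pandas as pd

child_count = pd.DataFrame({
    "county_fips": ["01001", "01003", "99999"],  # "99999" is a made-up code with no match
    "child_num": [19566, 60951, 10],
})
counties = pd.DataFrame({
    "county_fips": ["01001", "01003"],
    "NAME": ["Autauga", "Baldwin"],
})

# indicator=True adds a "_merge" column saying where each row came from;
# validate="one_to_one" raises if either side has duplicate county_fips values.
checked = child_count.merge(
    counties, on="county_fips", how="left", indicator=True, validate="one_to_one"
)

# Rows marked "left_only" have a child count but no county record.
unmatched = checked.loc[checked["_merge"] == "left_only", "county_fips"].tolist()
print(unmatched)
```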


In [12]:
df_final = df_final[['STATEFP', 'NAME', 'county_fips','child_num', 'geometry']]
df_final

Unnamed: 0,STATEFP,NAME,county_fips,child_num,geometry
0,01,Autauga,01001,19566,"POLYGON ((-86.92120 32.65754, -86.92035 32.658..."
1,01,Baldwin,01003,60951,"POLYGON ((-88.02858 30.22676, -88.02399 30.230..."
2,01,Barbour,01005,6527,"POLYGON ((-85.74803 31.61918, -85.74544 31.618..."
3,01,Bibb,01007,6087,"POLYGON ((-87.42194 33.00338, -87.31854 33.006..."
4,01,Blount,01009,18963,"POLYGON ((-86.96336 33.85822, -86.95967 33.857..."
...,...,...,...,...,...
3104,56,Sweetwater,56037,10878,"POLYGON ((-110.05438 42.01103, -110.05436 42.0..."
3105,56,Teton,56039,2754,"POLYGON ((-111.05361 44.66627, -110.75076 44.6..."
3106,56,Uinta,56041,4950,"POLYGON ((-111.04662 41.15604, -111.04659 41.2..."
3107,56,Washakie,56043,1841,"POLYGON ((-108.55056 44.16845, -108.50652 44.1..."


In [13]:
# Reassign instead of renaming in place, since df_final is a column subset of the merged dataframe
df_final = df_final.rename(columns = {'STATEFP': 'state', 'NAME':'county'})
df_final

Unnamed: 0,state,county,county_fips,child_num,geometry
0,01,Autauga,01001,19566,"POLYGON ((-86.92120 32.65754, -86.92035 32.658..."
1,01,Baldwin,01003,60951,"POLYGON ((-88.02858 30.22676, -88.02399 30.230..."
2,01,Barbour,01005,6527,"POLYGON ((-85.74803 31.61918, -85.74544 31.618..."
3,01,Bibb,01007,6087,"POLYGON ((-87.42194 33.00338, -87.31854 33.006..."
4,01,Blount,01009,18963,"POLYGON ((-86.96336 33.85822, -86.95967 33.857..."
...,...,...,...,...,...
3104,56,Sweetwater,56037,10878,"POLYGON ((-110.05438 42.01103, -110.05436 42.0..."
3105,56,Teton,56039,2754,"POLYGON ((-111.05361 44.66627, -110.75076 44.6..."
3106,56,Uinta,56041,4950,"POLYGON ((-111.04662 41.15604, -111.04659 41.2..."
3107,56,Washakie,56043,1841,"POLYGON ((-108.55056 44.16845, -108.50652 44.1..."


### Converting to a GeoDataFrame

The merge above returns a plain pandas dataframe, so in order to export it as a shapefile, it must first be converted to a GeoDataFrame using the ```gpd.GeoDataFrame()``` function.

In [15]:
gpd_df = gpd.GeoDataFrame(df_final)
gpd_df

Unnamed: 0,state,county,county_fips,child_num,geometry
0,01,Autauga,01001,19566,"POLYGON ((-86.92120 32.65754, -86.92035 32.658..."
1,01,Baldwin,01003,60951,"POLYGON ((-88.02858 30.22676, -88.02399 30.230..."
2,01,Barbour,01005,6527,"POLYGON ((-85.74803 31.61918, -85.74544 31.618..."
3,01,Bibb,01007,6087,"POLYGON ((-87.42194 33.00338, -87.31854 33.006..."
4,01,Blount,01009,18963,"POLYGON ((-86.96336 33.85822, -86.95967 33.857..."
...,...,...,...,...,...
3104,56,Sweetwater,56037,10878,"POLYGON ((-110.05438 42.01103, -110.05436 42.0..."
3105,56,Teton,56039,2754,"POLYGON ((-111.05361 44.66627, -110.75076 44.6..."
3106,56,Uinta,56041,4950,"POLYGON ((-111.04662 41.15604, -111.04659 41.2..."
3107,56,Washakie,56043,1841,"POLYGON ((-108.55056 44.16845, -108.50652 44.1..."


In [16]:
gpd_df.to_file('/hpc/group/codeplus22-vis/celine_data/children_count_by_county.shp')
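
One thing to watch when writing a shapefile: its attribute table uses the dBASE format, which limits field names to 10 characters, so longer column names are typically truncated on export. A minimal sketch of a pre-export check, using the column names of the final dataframe above:

```python
# Column names of the final dataframe, as shown in the output above.
columns = ["state", "county", "county_fips", "child_num", "geometry"]

# The dBASE attribute table inside a shapefile stores field names of at most 10 characters.
MAX_FIELD_NAME = 10

# The geometry column is stored separately, so it is excluded from the check.
too_long = [c for c in columns if c != "geometry" and len(c) > MAX_FIELD_NAME]
print(too_long)
```

Here ```county_fips``` has 11 characters, so any code that reads the shapefile back should expect a shortened name for that column, or the column could be renamed to something shorter before export.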