# Get Demographic Data by Neighborhood

## Overview

Many people identify with their neighborhood, so I tried to find shapefiles at this level of detail. I found an incomplete dataset, no longer maintained by Zillow, which will suffice for this project. I'm certain that more data is available from paid providers, but my goal was to complete the project using public (free) sources.

ZIP codes proved problematic because they are frequently neither contiguous nor distinct. It's (relatively) common for ZIP codes to overlap or be broken into parts. I found [articles](https://towardsdatascience.com/stop-using-zip-codes-for-geospatial-analysis-ceacb6e80c38) explaining why ZIP codes are not an appropriate drill-down from state/city.

> 'FL', 'NY', 'PA', 'TN', 'IN', 'OH', 'NC', 'UT', 'MI', 'IL', 'KY'

## Imports

In [1]:
import pandas as pd
import geopandas as gpd
from geocodio import GeocodioClient
import json
import os

import shapefile
from shapely.geometry import Point
from shapely.geometry.polygon import Polygon

## Read Shapefiles

In [2]:
shpfile = '../../Shapefiles/Neighborhoods/FL_Neighborhood_Coordinates.shp'
neighborhoods = gpd.read_file(shpfile)
neighborhoods.head()

| | state | city | name | geometry |
|---|---|---|---|---|
| 0 | FL | Jacksonville | Sandalwood (Jacksonville, FL) | POINT (-81.50473 30.30603) |
| 1 | FL | Jacksonville | Beach Haven (Jacksonville, FL) | POINT (-81.46188 30.26974) |
| 2 | FL | Jacksonville | East Arlington (Jacksonville, FL) | POINT (-81.49206 30.33901) |
| 3 | FL | Jacksonville | Golden Glades - The Woods (Jacksonville, FL) | POINT (-81.46221 30.30422) |
| 4 | FL | Jacksonville | Bayard (Jacksonville, FL) | POINT (-81.51440 30.14201) |


In [3]:
len(neighborhoods)

2680

## Read CoreLife Locations

In [4]:
shpfile = 'CoreLife_Coordinates.shp'
corelife = gpd.read_file(shpfile)
corelife.head()

| | city | street | state | zipcode | type | geometry |
|---|---|---|---|---|---|---|
| 0 | Allentown | 833 North Krocks Rd, Suite 101 | PA | 18106 | CoreLife | POINT (-75.56274 40.56345) |
| 1 | American Fork | 197 NW State Street | UT | 84003 | CoreLife | POINT (-111.81399 40.38137) |
| 2 | Amherst | 1595 Niagara Falls Boulevard | NY | 14226 | CoreLife | POINT (-78.82231 43.00002) |
| 3 | Ann Arbor | 205 North Maple Road, Suite 26 | MI | 48103 | CoreLife | POINT (-83.78045 42.28266) |
| 4 | Boardman | 700 Boardman-Poland Road | OH | 44512 | CoreLife | POINT (-80.64279 41.02461) |


In [5]:
%%time
corelife['name'] = corelife.apply(lambda row: row['type'] + ' (' + row['street'] + ')', axis=1)

Wall time: 7 ms


In [6]:
corelife = corelife[['state', 'city', 'name', 'geometry']]

In [7]:
corelife.head()

| | state | city | name | geometry |
|---|---|---|---|---|
| 0 | PA | Allentown | CoreLife (833 North Krocks Rd, Suite 101) | POINT (-75.56274 40.56345) |
| 1 | UT | American Fork | CoreLife (197 NW State Street) | POINT (-111.81399 40.38137) |
| 2 | NY | Amherst | CoreLife (1595 Niagara Falls Boulevard) | POINT (-78.82231 43.00002) |
| 3 | MI | Ann Arbor | CoreLife (205 North Maple Road, Suite 26) | POINT (-83.78045 42.28266) |
| 4 | OH | Boardman | CoreLife (700 Boardman-Poland Road) | POINT (-80.64279 41.02461) |


## Combine DataFrames

In [8]:
df = pd.concat([corelife, neighborhoods], sort=False, ignore_index=True).drop_duplicates().reset_index(drop=True)

In [9]:
df.tail()

| | state | city | name | geometry |
|---|---|---|---|---|
| 2735 | FL | Altoona | Pittman (Altoona, FL) | POINT (-81.64415 28.99689) |
| 2736 | FL | Paisley | Lake Kathryn (Paisley, FL) | POINT (-81.49278 29.00381) |
| 2737 | FL | Bokeelia | Pine Island Center (Bokeelia, FL) | POINT (-82.12879 26.63425) |
| 2738 | FL | Bokeelia | Pineland (Bokeelia, FL) | POINT (-82.14526 26.66511) |
| 2739 | FL | Bokeelia | Useppa Island (Bokeelia, FL) | POINT (-82.21333 26.66124) |


## Build list of coordinates to call Geocodio with

In [11]:
points = list(df.apply(lambda row: (row['geometry'].y, row['geometry'].x, row['name']), axis=1))

In [12]:
points[0:5]

[(40.56345438125713,
  -75.56274122089594,
  'CoreLife (833 North Krocks Rd, Suite 101)'),
 (40.38136506086821, -111.81398896415881, 'CoreLife (197 NW State Street)'),
 (43.000019465722204,
  -78.82230631343448,
  'CoreLife (1595 Niagara Falls Boulevard)'),
 (42.28265869239204,
  -83.78044687481446,
  'CoreLife (205 North Maple Road, Suite 26)'),
 (41.024609999999996,
  -80.64278999999999,
  'CoreLife (700 Boardman-Poland Road)')]
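
The main loop in this notebook sends one request per point. If the Geocodio client in use supports batch reverse geocoding (an assumption to check against its documentation, along with any per-request size limit), the `points` list could first be split into fixed-size chunks. A minimal sketch; the `chunked` helper, the sample tuples, and the chunk size of 3 are all hypothetical:

```python
from typing import Iterator, List, Sequence, Tuple

PointTuple = Tuple[float, float, str]  # (latitude, longitude, name)


def chunked(items: Sequence[PointTuple], size: int) -> Iterator[List[PointTuple]]:
    """Yield consecutive slices of `items` holding at most `size` entries."""
    if size < 1:
        raise ValueError('size must be at least 1')
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


# Hypothetical sample in the same (lat, lon, name) layout as `points`.
sample_points = [
    (40.56345, -75.56274, 'A'),
    (40.38137, -111.81399, 'B'),
    (43.00002, -78.82231, 'C'),
    (42.28266, -83.78045, 'D'),
]

batches = list(chunked(sample_points, size=3))
print([len(batch) for batch in batches])
```

Each chunk keeps the original order, so results returned for a batch can be matched back to location names by position.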

## Geocodio API Call

In [13]:
with open('Geocodio_API.txt', 'r') as f:
    # Strip surrounding whitespace so a trailing newline is not sent as part of the key.
    API_KEY = f.read().strip()
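
Reading the key from a text file works, but the file is easy to commit by accident. One possible alternative, sketched here under assumptions, checks an environment variable first and falls back to the file. The variable name `GEOCODIO_API_KEY` and the helper `load_api_key` are hypothetical, not part of the Geocodio client:

```python
import os
from pathlib import Path


def load_api_key(env_var: str = 'GEOCODIO_API_KEY',
                 key_file: str = 'Geocodio_API.txt') -> str:
    """Return the key from the environment, else from the key file."""
    key = os.environ.get(env_var, '').strip()
    if not key:
        # strip() drops the trailing newline most editors add to the file.
        key = Path(key_file).read_text().strip()
    if not key:
        raise ValueError('No API key found')
    return key


# Usage (commented out so the sketch runs without a key present):
# API_KEY = load_api_key()
```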

In [14]:
client = GeocodioClient(API_KEY)
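
With a few thousand sequential requests, a single transient network failure can interrupt the run. A minimal retry-with-backoff sketch follows. The `with_retries` helper is hypothetical, and because the source does not show which exceptions the client raises on transient failures, it catches `Exception` broadly:

```python
import time
from typing import Callable, TypeVar

T = TypeVar('T')


def with_retries(call: Callable[[], T], attempts: int = 3,
                 base_delay: float = 1.0) -> T:
    """Run `call`, retrying on any exception with exponential backoff."""
    for attempt in range(attempts):
        try:
            return call()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: surface the last error
            time.sleep(base_delay * 2 ** attempt)
    raise ValueError('attempts must be at least 1')


# Usage (commented out; requires a configured client):
# demographics = with_retries(lambda: client.reverse((lat, lon)))
```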

## Main Program

In [15]:
%%time

if __name__ == '__main__':

    n = len(points)
    data = list()

    for i, pt in enumerate(points):

        print(f'\rPoint {i} of {n} - {(i+1)/n*100:3.0f}% complete', end='')

        listing = dict()

        try:
            # Reverse-geocode the point and request the ACS data appends.
            # The call sits inside the try so an API error skips the point
            # instead of stopping the whole run.
            demographics = client.reverse((pt[0], pt[1]),
                                          fields=['acs-demographics',
                                                  'acs-economics',
                                                  'acs-families',
                                                  'acs-housing',
                                                  'acs-social'])

            # Only the first result returned for the point is used.
            result = demographics['results'][0]
            address = result['address_components']
            acs = result['fields']['acs']

            listing['location'] = pt[2]
            listing['latitude'] = pt[0]
            listing['longitude'] = pt[1]
            listing['address'] = result['formatted_address']
            listing['street'] = address['street']
            listing['city'] = address['city']
            listing['county'] = address['county']
            listing['state'] = address['state']
            listing['zipcode'] = address['zip']
            # Male/female shares are read from the education table, so they
            # describe the population that table covers.
            listing['male'] = acs['social']['Population by minimum level of education']['Male']['percentage']
            listing['female'] = acs['social']['Population by minimum level of education']['Female']['percentage']
            listing['veterans'] = acs['social']['Population with veteran status']['Veteran']['percentage']
            listing['owners'] = acs['housing']['Ownership of occupied units']['Owner occupied']['percentage']
            listing['renters'] = acs['housing']['Ownership of occupied units']['Renter occupied']['percentage']
            listing['median_home'] = acs['housing']['Median value of owner-occupied housing units']['Total']['value']
            listing['families'] = acs['families']['Household type by household']['Family households']['percentage']
            listing['median_income'] = acs['economics']['Median household income']['Total']['value']
            listing['median_age'] = acs['demographics']['Median age']['Total']['value']
            listing['hispanic'] = acs['demographics']['Race and ethnicity']['Hispanic or Latino']['percentage']
            listing['non_hispanic'] = acs['demographics']['Race and ethnicity']['Not Hispanic or Latino']['percentage']

            # Keep only points that report both a home value and an income.
            if listing['median_home'] != 0 and listing['median_income'] != 0:
                data.append(listing)

        except Exception as e:
            # A missing key in the response surfaces here as a KeyError.
            print(f'\nError: {e}')

    print('\n')

Point 38 of 2737 -   1% complete
Error: 'Male'
Point 47 of 2737 -   2% complete
Error: 'Male'
Point 59 of 2737 -   2% complete
Error: 'Male'
Point 122 of 2737 -   4% complete
Error: 'street'
Point 157 of 2737 -   6% complete
Error: 'Male'
Point 181 of 2737 -   7% complete
Error: 'Male'
Point 229 of 2737 -   8% complete
Error: 'Male'
Point 247 of 2737 -   9% complete
Error: 'street'
Point 257 of 2737 -   9% complete
Error: 'street'
Point 258 of 2737 -   9% complete
Error: 'street'
Point 266 of 2737 -  10% complete
Error: 'Male'
Point 369 of 2737 -  14% complete
Error: 'street'
Point 408 of 2737 -  15% complete
Error: 'Male'
Point 621 of 2737 -  23% complete
Error: 'Male'
Point 885 of 2737 -  32% complete
Error: 'Male'
Point 1308 of 2737 -  48% complete
Error: 'Male'
Point 1567 of 2737 -  57% complete
Error: 'Male'
Point 1650 of 2737 -  60% complete
Error: 'street'
Point 2559 of 2737 -  94% complete
Error: 'street'
Point 2669 of 2737 -  98% complete
Error: 'Male'
Point 2721 of 2737 -  99
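
The `Error: 'Male'` and `Error: 'street'` lines are `KeyError` messages: the response for that point lacked the named key, and the whole point was dropped. If partial records are acceptable, a lookup that returns a default instead of raising would keep them. A minimal sketch; the `dig` helper and the trimmed `result` fragment are hypothetical:

```python
from collections.abc import Mapping, Sequence
from typing import Any


def dig(mapping: Mapping, path: Sequence[str], default: Any = None) -> Any:
    """Follow `path` through nested dicts; return `default` if a key is missing."""
    current: Any = mapping
    for key in path:
        if not isinstance(current, Mapping) or key not in current:
            return default
        current = current[key]
    return current


# Hypothetical, heavily trimmed response fragment with no 'street' entry.
result = {
    'address_components': {'city': 'Altoona', 'state': 'FL'},
    'fields': {'acs': {'demographics': {'Median age': {'Total': {'value': 41.8}}}}},
}

print(dig(result, ['address_components', 'street']))
print(dig(result, ['fields', 'acs', 'demographics', 'Median age', 'Total', 'value']))
```

The first lookup prints `None` because the key is absent; the second prints `41.8`. Missing values then show up as empty cells in the DataFrame rather than as skipped rows.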

In [16]:
len(data)

2575


## Convert List of Dictionaries to Pandas Dataframe

In [17]:
df = pd.DataFrame(data)

In [18]:
df.head()

| | location | latitude | longitude | address | street | city | county | state | zipcode | male | female | veterans | owners | renters | median_home | families | median_income | median_age | hispanic | non_hispanic |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | CoreLife (833 North Krocks Rd, Suite 101) | 40.563454 | -75.562741 | 5430 Us Hwy 222, Allentown, PA 18106 | Us Hwy 222 | Allentown | Lehigh County | PA | 18106 | 0.43 | 0.57 | 0.064 | 0.866 | 0.134 | 255600 | 0.786 | 74355 | 38.4 | 0.082 | 0.918 |
| 1 | CoreLife (197 NW State Street) | 40.381365 | -111.813989 | 183 N West State Rd, American Fork, UT 84003 | West State | American Fork | Utah County | UT | 84003 | 0.528 | 0.472 | 0.068 | 0.514 | 0.486 | 138000 | 0.734 | 42391 | 29.3 | 0.168 | 0.832 |
| 2 | CoreLife (1595 Niagara Falls Boulevard) | 43.000019 | -78.822306 | 1645 Niagara Falls Blvd, Buffalo, NY 14228 | Niagara Falls | Buffalo | Erie County | NY | 14228 | 0.528 | 0.472 | 0.041 | 0.198 | 0.802 | 85500 | 0.374 | 16005 | 33.3 | 0.033 | 0.967 |
| 3 | CoreLife (205 North Maple Road, Suite 26) | 42.282659 | -83.780447 | 2499 I- 94 Bus Lp, Ann Arbor, MI 48103 | I- 94 Bus Lp | Ann Arbor | Washtenaw County | MI | 48103 | 0.479 | 0.521 | 0.045 | 0.532 | 0.468 | 222000 | 0.542 | 77539 | 38.4 | 0.045 | 0.955 |
| 4 | CoreLife (700 Boardman-Poland Road) | 41.02461 | -80.64279 | 700 Boardman Poland Rd, Youngstown, OH 44512 | Boardman Poland | Youngstown | Mahoning County | OH | 44512 | 0.482 | 0.518 | 0.067 | 0.77 | 0.23 | 121900 | 0.559 | 60242 | 48.1 | 0.018 | 0.982 |


In [19]:
df.describe()

| | latitude | longitude | male | female | veterans | owners | renters | median_home | families | median_income | median_age | hispanic | non_hispanic |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 2575.0 | 2575.0 | 2575.0 | 2575.0 | 2575.0 | 2575.0 | 2575.0 | 2575.0 | 2575.0 | 2575.0 | 2575.0 | 2575.0 | 2575.0 |
| mean | 27.858484 | -81.506866 | 0.476666 | 0.523336 | 0.085115 | 0.671328 | 0.328673 | 245636.9 | 0.644404 | 62807.135146 | 43.333515 | 0.206464 | 0.793537 |
| std | 2.405747 | 1.912336 | 0.058539 | 0.05854 | 0.054091 | 0.216432 | 0.216432 | 217719.5 | 0.163173 | 32306.193125 | 10.578097 | 0.221639 | 0.221639 |
| min | 24.549922 | -111.890963 | 0.234 | 0.12 | 0.0 | 0.058 | 0.0 | 9999.0 | 0.075 | 12742.0 | 19.9 | 0.0 | 0.0 |
| 25% | 26.34084 | -82.476776 | 0.444 | 0.491 | 0.043 | 0.523 | 0.157 | 126700.0 | 0.538 | 40052.0 | 35.6 | 0.056 | 0.7285 |
| 50% | 27.759134 | -81.461244 | 0.475 | 0.525 | 0.079 | 0.709 | 0.291 | 188100.0 | 0.66 | 56414.0 | 41.8 | 0.128 | 0.872 |
| 75% | 28.360768 | -80.235149 | 0.509 | 0.556 | 0.119 | 0.843 | 0.477 | 292050.0 | 0.7655 | 77553.0 | 49.25 | 0.2715 | 0.944 |
| max | 43.213752 | -73.419049 | 0.88 | 0.766 | 0.341 | 1.0 | 0.942 | 2000001.0 | 1.0 | 250001.0 | 88.4 | 1.0 | 1.0 |
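
The `min` and `max` rows show `median_home` values of 9999 and 2000001 and a `median_income` maximum of 250001. These look like bottom- and top-coded placeholders for medians outside the published range rather than real measurements. That reading is an assumption worth checking against the ACS documentation. A minimal sketch that flags such rows in a list of dictionaries shaped like `data`; the `CODED_VALUES` table, the `is_coded` helper, and the sample rows are hypothetical:

```python
# Assumed placeholder values, taken from the min/max rows of the summary.
CODED_VALUES = {
    'median_home': {9999, 2000001},
    'median_income': {250001},
}


def is_coded(listing: dict) -> bool:
    """Return True if any tracked column holds an assumed placeholder value."""
    return any(listing.get(column) in values
               for column, values in CODED_VALUES.items())


# Hypothetical rows in the same shape as the entries of `data`.
sample = [
    {'location': 'A', 'median_home': 188100, 'median_income': 56414},
    {'location': 'B', 'median_home': 2000001, 'median_income': 250001},
    {'location': 'C', 'median_home': 9999, 'median_income': 40052},
]

flagged = [row['location'] for row in sample if is_coded(row)]
print(flagged)
```

Flagging rather than dropping keeps the decision visible: such rows can be excluded from means and standard deviations while still appearing in the output file.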


## Output File to CSV

In [20]:
filename = 'Neighborhood_Demographics.csv'

In [21]:
df.to_csv(filename, index=False)