# IBM Capstone Project: Battle of the Neighborhoods
## Chicago Neighborhood Opportunity Fund Allocation  

See the main part of this project in [Capstone_BattleOfNeighborhoods.ipynb](Capstone_BattleOfNeighborhoods.html).  

## Notes about getting and saving census tract data

1. My cloud environments have some resource limitations; my home PC is relatively well resourced.
2. Installing geopandas on my home PC aborted due to dependency issues.
3. I have geopandas in some cloud environments.
4. I'll do much of the processing on my home PC but use geopandas in the cloud. Some cells are therefore specific to certain environments, and I will identify where cells depend on a specific environment.
5. Create GeoJSON and CSV files for Cook County (Chicago) Census Tracts
6. Read list of Invest SOUTH/WEST RFP locations, assign a Census Tract number, and write to a GeoJSON and CSV file

## Input-Output

**Expected Input**  
The following file is created by the main report notebook [Capstone_BattleOfNeighborhoods.ipynb](Capstone_BattleOfNeighborhoods.html). It must exist before this notebook is run.  
`./data/isw_rfp_updated.csv`  

**Expected Output**  
The following files will be needed by the main report notebook [Capstone_BattleOfNeighborhoods.ipynb](Capstone_BattleOfNeighborhoods.html) as input. They must exist in the local directory structure of the main report for it to finish.  
`./data/cook_tract.geojson`  
`./data/cook_tract.csv`   
`./data/isw_rfp_tract.csv`  
`./data/isw_rfp_tract.geojson`  
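
The hand-off between the two notebooks can be checked before running anything else. The following is a minimal sketch, not part of either notebook: `missing_files` is a hypothetical helper, and the only things it assumes are the paths listed above.

```python
import os

# Paths copied from the Input-Output lists above
EXPECTED_INPUT = [os.path.join('.', 'data', 'isw_rfp_updated.csv')]
EXPECTED_OUTPUT = [
    os.path.join('.', 'data', 'cook_tract.geojson'),
    os.path.join('.', 'data', 'cook_tract.csv'),
    os.path.join('.', 'data', 'isw_rfp_tract.csv'),
    os.path.join('.', 'data', 'isw_rfp_tract.geojson'),
]

def missing_files(paths):
    """Return the paths that do not exist as files (hypothetical helper)."""
    return [p for p in paths if not os.path.isfile(p)]

# Before running this notebook: the input must already be there
print('Missing inputs:', missing_files(EXPECTED_INPUT))
# After running this notebook: the outputs should all be there
print('Missing outputs:', missing_files(EXPECTED_OUTPUT))
```

An empty list for the inputs means this notebook can run; an empty list for the outputs means the main report notebook has everything it needs from this one.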


## Introduction <a name="Introduction"></a>

While researching a topic for this capstone project, I came across an academic paper that influenced where millions of dollars of development funds will be used. The "Neighborhood Opportunity Fund" [(NOF)](https://neighborhoodopportunityfund.com) in Chicago can award up to \$2.5 million for individual projects in "Qualified Investment Areas". Information on the program explained that the choice of geographic areas to receive the grants was influenced by the study ["Mapping the DNA of Urban Neighborhoods: Clustering Longitudinal Sequences of Neighborhood Socioeconomic Change"](https://www.tandfonline.com/doi/full/10.1080/00045608.2015.1096188) by Elizabeth Delmelle. The study classified neighborhoods by socioeconomic, housing, and demographic variables using US census data from 1970 to 2010, so it could be used to identify areas in need of investment. Throughout this report I will refer to the study as "Dr. Delmelle's study", the "DNA of Urban Neighborhoods study", or the "2010 study". 2010 is the year of the latest data used in the study, not its publication date.   

This project will try to repeat and extend this study with the latest census data (2019), then compare the new results with the locations where some of the funds are going to be awarded. The objective is to evaluate whether the funds, given the updated information, are going to the neighborhoods most in need of investment. The original study examined both Chicago and Los Angeles; this project addresses only Chicago.  
  
An organization called [INVEST South/West](https://www.chicago.gov/city/en/sites/invest_sw/home.html) is in charge of allocating much of the NOF for parts of Chicago. They have an [RFP page](https://www.chicago.gov/city/en/sites/invest_sw/home/requests-for-proposals.html) explaining how developers can solicit these funds for projects in predetermined locations. 

## Environment Setup

In [16]:
# Set up file names to work on Windows or Linux file system
import os
datadir = os.path.join('.', 'data')
if not os.path.exists(datadir):
    os.makedirs(datadir)
censusdir = os.path.join(datadir, 'census')
if not os.path.exists(censusdir):
    os.makedirs(censusdir)

zip_tigerline_ill_tract = os.path.join(censusdir, 'tl_2010_17031_tract10.zip')
cook_tract_geojson_file = os.path.join(datadir, 'cook_tract.geojson') 
cook_tract_csv_file = os.path.join(datadir, 'cook_tract.csv')  

isw_rfp_updated_csv=os.path.join(datadir, 'isw_rfp_updated.csv')
isw_rfp_tract_csv=os.path.join(datadir, 'isw_rfp_tract.csv')
isw_rfp_tract_geojson=os.path.join(datadir, 'isw_rfp_tract.geojson')

Import libraries used in all environments

In [17]:
# json, zipfile, io and itertools are part of the Python standard library,
# so they are always available and never need a pip install
import json
import zipfile
from io import StringIO
from io import BytesIO
import itertools
from itertools import zip_longest

# pyshp is a third-party package that provides the shapefile module
try:
    import shapefile
except ImportError as e:
    !pip install pyshp
    import shapefile

import numpy as np
import pandas as pd
import platform
import math

The installs in the following cell are for GeoPandas. I run them only in my IBM cloud environment, because my home PC always errors out while installing one of the GeoPandas prerequisites.

In [18]:
try:
    import rtree
except ImportError as e:
    !pip install rtree
    import rtree

try:
    import pygeos
except ImportError as e:
    !pip install pygeos
    import pygeos

try:
    import psycopg2
except ImportError as e:
    !pip install psycopg2
    import psycopg2

try:
    import geopandas as gpd
    from geopandas.tools import overlay
except ImportError as e:
    !pip install geopandas
    import geopandas as gpd
    from geopandas.tools import overlay

## Download Census Tracts

Get the TIGER/Line shapefiles for Cook County from the Census Bureau. The Census Bureau download used here is not in GeoJSON, a format that works well with GeoPandas, so we will convert the shapefiles.

In [19]:
# Setup functions for the conversion to GeoJson

def make_geojson(content, gfile, tocrs=""):
    # Convert a shape file to a GeoJSON
    # Parameters:
    #  content: A zip file containing shape files
    #  gfile: A string with a file name to save the GeoJSON file
    #  tocrs: A coordinate reference system to convert the coordinates to
    # Returns: None, but does write a file to disk
    shp_file=""
    with zipfile.ZipFile(content) as f:
        for name in f.namelist():
            if name.endswith('.shp'):
                shp_file=os.path.join(censusdir, name)
                f.extract(name, censusdir)
            if name.endswith('.shx'):
                f.extract(name, censusdir)
            if name.endswith('.prj'):
                f.extract(name, censusdir)
            if name.endswith('.dbf'):
                f.extract(name, censusdir)
                
    file = gpd.read_file(shp_file)
    if tocrs != "":
        crs_file=file.to_crs(crs=tocrs)
        crs_file.to_file(gfile, driver='GeoJSON')
    else:
        file.to_file(gfile, driver='GeoJSON')
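
The extraction step in `make_geojson` can be tried on its own, without GeoPandas, which is useful in an environment where GeoPandas will not install. The sketch below is a standard-library-only illustration of that step; `extract_shapefile_parts` and `SHAPEFILE_PARTS` are hypothetical names, not part of the notebook.

```python
import os
import zipfile

# The shapefile components that make_geojson pulls out of the archive
SHAPEFILE_PARTS = ('.shp', '.shx', '.prj', '.dbf')

def extract_shapefile_parts(zip_path, out_dir):
    """Extract only the shapefile components from zip_path into out_dir.

    Returns the path of the extracted .shp file, or '' if the archive has none.
    (Hypothetical helper, shown for illustration.)
    """
    shp_path = ''
    with zipfile.ZipFile(zip_path) as zf:
        for name in zf.namelist():
            # str.endswith accepts a tuple of suffixes
            if name.endswith(SHAPEFILE_PARTS):
                zf.extract(name, out_dir)
                if name.endswith('.shp'):
                    shp_path = os.path.join(out_dir, name)
    return shp_path
```

Returning an empty string when no `.shp` member is found lets the caller stop with a clear message instead of passing an empty path on to `gpd.read_file`.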


Download the shapefiles for census tracts. A census tract is made up of Census Block Groups and is smaller than a Public Use Microdata Area or a neighborhood.

In [20]:
if platform.system() == 'Windows':
    #  Use this to download the Cook County Census Tract shape files only on Windows (home) PC
    !curl https://www2.census.gov/geo/pvs/tiger2010st/17_Illinois/17031/tl_2010_17031_tract10.zip --output .\data\census\tl_2010_17031_tract10.zip
else:
    #  Use this to download the Cook County Census Tract shape files on Linux Environment
    !curl https://www2.census.gov/geo/pvs/tiger2010st/17_Illinois/17031/tl_2010_17031_tract10.zip --output ./data/census/tl_2010_17031_tract10.zip

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 2047k    0 2047k    0     0  4519k      0 --:--:-- --:--:-- --:--:-- 4519k
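
The two `curl` branches differ only in the path separator. As a hedged alternative, the download can be written once with the standard library so the same cell runs on Windows and Linux. This is a sketch rather than the notebook's method: `download_if_missing` is a hypothetical helper, while the URL is the one used in the cell above.

```python
import os
import urllib.request

# Same URL as in the curl commands above
TRACT_URL = ('https://www2.census.gov/geo/pvs/tiger2010st/17_Illinois/17031/'
             'tl_2010_17031_tract10.zip')

def download_if_missing(url, dest):
    """Download url to dest unless dest already exists.

    Returns True if a download happened, False if the file was already there.
    (Hypothetical helper, shown for illustration.)
    """
    if os.path.exists(dest):
        return False
    # Create the target directory if needed; '.' covers a bare file name
    os.makedirs(os.path.dirname(dest) or '.', exist_ok=True)
    urllib.request.urlretrieve(url, dest)
    return True

# Example call (makes a network request, so it is left commented out):
# download_if_missing(TRACT_URL,
#                     os.path.join('.', 'data', 'census', 'tl_2010_17031_tract10.zip'))
```

Skipping the download when the file already exists also avoids fetching the archive again each time the notebook is re-run.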


The next cells will assume we have GeoPandas installed in the current environment

In [21]:
# NB: Census tract CRS is NAD83
make_geojson(zip_tigerline_ill_tract, cook_tract_geojson_file, tocrs="WGS84")
# Delete the fields we don't need and give more intuitive names to those we do 
cook_tract_geojson=gpd.read_file(cook_tract_geojson_file)
cook_tract_geojson=cook_tract_geojson.rename(columns={"GEOID10": "CensusTract", 'COUNTYFP10': 'County'})
cook_tract_geojson=cook_tract_geojson.drop(['STATEFP10', 'TRACTCE10', 'NAME10', 'NAMELSAD10', 'MTFCC10', 
                                                          'FUNCSTAT10', 'ALAND10', 'AWATER10'], axis=1)
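
The `STATEFP10` and `TRACTCE10` columns can be dropped safely because the `CensusTract` value (the original `GEOID10`) already contains them: a 2010 tract GEOID is the 2-digit state FIPS code, the 3-digit county FIPS code, and the 6-digit tract code, concatenated. The sketch below shows that layout; `split_tract_geoid` is a hypothetical helper and the sample GEOID is only an illustrative value.

```python
def split_tract_geoid(geoid):
    """Split an 11-digit census tract GEOID into state, county and tract codes.

    (Hypothetical helper, shown for illustration.)
    """
    geoid = str(geoid)
    if len(geoid) != 11 or not geoid.isdigit():
        raise ValueError('Expected an 11-digit tract GEOID, got %r' % geoid)
    return {
        'state': geoid[:2],    # 17 is Illinois
        'county': geoid[2:5],  # 031 is Cook County
        'tract': geoid[5:],    # 6-digit tract code
    }

# Illustrative value: state 17, county 031, tract code 010100
print(split_tract_geoid('17031010100'))
```

Keeping the GEOID as a string matters in general, because state codes below 10 start with a zero that would be lost if the column were read back as an integer.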

Write the new files to disk

In [22]:
cook_tract_geojson.to_csv(cook_tract_csv_file, index=False)
cook_tract_geojson.to_file(cook_tract_geojson_file, driver='GeoJSON')

The next cells run only in the IBM cloud environment. The credentials cell below should be removed before uploading this notebook to GitHub.

In [23]:
# @hidden_cell
# The following code contains the credentials for a file in your IBM Cloud Object Storage.
# You might want to remove those credentials before you share your notebook.
credentials = {

}


In [24]:
# Use this cell ONLY if we are in the IBM Cloud.
# Assume that the input file created by the main notebook (isw_rfp_updated.csv) has been uploaded to IBM Cloud Object Storage
# We retrieve the object and write it to a local file
import types
import ibm_boto3
from botocore.client import Config

cos = ibm_boto3.client(service_name='s3',
    ibm_api_key_id=credentials['IBM_API_KEY_ID'],
    ibm_auth_endpoint=credentials['IBM_AUTH_ENDPOINT'],
    config=Config(signature_version='oauth'),
    endpoint_url=credentials['ENDPOINT'])


In [25]:
key_isw_rfp_updated_csv=os.path.basename(isw_rfp_updated_csv)
try:
    res=cos.download_file(Bucket=credentials['BUCKET'],Key=key_isw_rfp_updated_csv,
                          Filename=isw_rfp_updated_csv)
except Exception as e:
    print(type(e).__name__, e)
else:
    print('Files Downloaded')

Files Downloaded


## Assign the NOF RFP Location to Census Tract

The cells below read `./data/isw_rfp_updated.csv` and assign a census tract number to each RFP location. First we define a function that assigns each point (RFP location), or each smaller geographic polygon, to the larger polygon (census tract) that contains it.
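
To make the idea of a point-in-polygon test concrete, here is a small pure-Python sketch based on ray casting. It illustrates the concept only and is not how GeoPandas implements `within`. The names `point_in_polygon` and `assign_plane` are hypothetical, and the two squares are made-up shapes, not real census geometry.

```python
def point_in_polygon(x, y, ring):
    """Ray casting test: is (x, y) inside the polygon given by ring?

    ring is a list of (x, y) vertices. Points exactly on the boundary are
    not handled specially in this sketch.
    """
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            # Count only crossings to the right of the point
            if x < x_cross:
                inside = not inside
    return inside

def assign_plane(x, y, planes):
    """Return the name of the first polygon in planes containing (x, y), else None."""
    for name, ring in planes.items():
        if point_in_polygon(x, y, ring):
            return name
    return None

# Two made-up square "tracts" side by side
planes = {'A': [(0, 0), (4, 0), (4, 4), (0, 4)],
          'B': [(4, 0), (8, 0), (8, 4), (4, 4)]}
print(assign_plane(2, 2, planes))  # A
print(assign_plane(6, 1, planes))  # B
print(assign_plane(9, 9, planes))  # None
```

A point is inside when a ray drawn from it to the right crosses the polygon boundary an odd number of times. The function in the next cell applies the same kind of test to every tract, using the GeoPandas `within` and `intersects` methods.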

In [26]:
# Adapted from https://medium.com/analytics-vidhya/point-in-polygon-analysis-using-python-geopandas-27ea67888bff
def get_pip (points, plane): 
    # Parameters:
    #  points: GeoPandas dataframe with points or polygons to be assigned to larger polygons (needs a 'subplane' id column)
    #  plane: GeoPandas dataframe with the larger planes (polygons) that contain the geometries in points (needs a 'larger_plane' id column)
    # Returns: new GeoPandas dataframe laid out like points, with the polygon from plane assigned
    
    # Validate we can handle the type of geometry
    if points.geom_type[0] == 'Point':
        print("\nWorking with Point.")
    elif (points.geom_type[0] == 'Polygon' or points.geom_type[0] == 'MultiPolygon') :
        print("\nWorking with Polygon.")
    else:
        print('Abort function. Cannot handle geom_type ', points.geom_type[0])
        return
    h_list = list(plane.larger_plane)
    # Create empty dataframe
    df = gpd.GeoDataFrame().reindex_like(points).dropna()
    for h in h_list:
        # Get geometry for specific neighborhood
        pol = (plane.loc[plane.larger_plane==h])
        pol.reset_index(drop = True, inplace = True)
        # Identify those records from points that are intersecting with the region polygon 
        #  1st (point) is more precise but does not work with all records
        #  or a polygon intersecting with a polygon. We risk duplicates on 2nd option. 
        if points.geom_type[0] == 'Point':
            pip_mask = points.within(pol.loc[0, 'geometry'])
        elif (points.geom_type[0] == 'Polygon' or points.geom_type[0] == 'MultiPolygon'):
            pip_mask = points.intersects(pol.loc[0, 'geometry'])  
        # Filter points to keep only the intersecting records
        pip_data = points.loc[pip_mask].copy()
        # Create a new column and assign the region name as the value
        pip_data['assigned_plane']= h
        # Append the matching records to the result dataframe
        # (pd.concat replaces DataFrame.append, which was removed in pandas 2.0)
        df = pd.concat([df, pip_data])
    #checking there are no more than one larger shape assigned to a smaller shape or point   
    print('Original dataframe count=',len(points),'\nNew dataframe count=', len(df))
    if df.loc[df.subplane.duplicated() == True].shape[0] > 0:
        print("There are subpolygons assigned to more than one polygon. Count=", df.loc[df.subplane.duplicated() == True].shape[0])
        if df.loc[df.subplane.duplicated() == True].shape[0] < 6:
            print(df.loc[df.subplane.duplicated() == True]['subplane'])
    else:
        print("No duplicates!")    
    # Check whether any of our smaller shapes or points have not been assigned to a larger shape
    if points.loc[~points.subplane.isin(df.subplane)].shape[0] > 0:
        print("There are subpolygons without an assigned polygon. Count=", points.loc[~points.subplane.isin(df.subplane)].shape[0])
        if points.loc[~points.subplane.isin(df.subplane)].shape[0] < 6:
            print(points.loc[~points.subplane.isin(df.subplane)]['subplane'])
    else:
        print("No unassigned!")
    df.reset_index(inplace=True, drop=True)
    #df = df.drop(columns='geometry')
    return df


Load the files into pandas, then convert them to GeoPandas dataframes

In [27]:
isw_rfp_updated = pd.read_csv(isw_rfp_updated_csv)
isw_rfp_updated = isw_rfp_updated.drop_duplicates()
isw_rfp_updated = isw_rfp_updated.reset_index(drop=True)
isw_rfp_updated['id'] = isw_rfp_updated.index
isw_rfp_updated_gpd = gpd.GeoDataFrame(isw_rfp_updated, 
                                       geometry=gpd.points_from_xy(isw_rfp_updated.lng,
                                                                   isw_rfp_updated.lat))
isw_rfp_updated_gpd = isw_rfp_updated_gpd.rename(columns={'id':'subplane'})
isw_rfp_updated_gpd['assigned_plane'] = ''

cook_tract_geojson=gpd.read_file(cook_tract_geojson_file)
cook_tract_geojson=cook_tract_geojson.rename(columns={'CensusTract':'larger_plane'})

Assign a tract number to RFP locations

In [28]:
isw_rfp_tract_gpd = get_pip(isw_rfp_updated_gpd, cook_tract_geojson)


Working with Point.
Original dataframe count= 39 
New dataframe count= 39
No duplicates!
No unassigned!


Save the dataframe to disk as both a GeoJSON file and a CSV file

In [29]:
isw_rfp_tract_gpd.drop('subplane', axis=1, inplace=True)
isw_rfp_tract_gpd = isw_rfp_tract_gpd.rename(columns={'assigned_plane': 'CensusTract'})
isw_rfp_tract_gpd.to_file(isw_rfp_tract_geojson, driver='GeoJSON')
isw_rfp_tract_pd = pd.DataFrame(isw_rfp_tract_gpd.loc[:, isw_rfp_tract_gpd.columns != 'geometry'])
isw_rfp_tract_pd.to_csv(isw_rfp_tract_csv, index=False)      

## Save the files from the file system to the IBM cloud storage

In [30]:
# IBM cloud only cell. Store the file to cloud storage so we can download to other environments
# The file will be in the bucket but not appear as project asset until added on the console.

key_isw_rfp_tract_geojson=os.path.basename(isw_rfp_tract_geojson)
cos.upload_file(Filename=isw_rfp_tract_geojson, Bucket=credentials['BUCKET'],Key=key_isw_rfp_tract_geojson)

key_isw_rfp_tract_csv=os.path.basename(isw_rfp_tract_csv)
cos.upload_file(Filename=isw_rfp_tract_csv, Bucket=credentials['BUCKET'],Key=key_isw_rfp_tract_csv)

key_cook_tract_geojson_file=os.path.basename(cook_tract_geojson_file)
cos.upload_file(Filename=cook_tract_geojson_file, Bucket=credentials['BUCKET'],Key=key_cook_tract_geojson_file)

key_cook_tract_csv_file=os.path.basename(cook_tract_csv_file)
cos.upload_file(Filename=cook_tract_csv_file, Bucket=credentials['BUCKET'],Key=key_cook_tract_csv_file)
