### Data Cleaning: Join BIDs to Blocks dataset

By ADA Group 1

In this Jupyter Notebook, we join Business Improvement District (BID) information to the block-level data, using each BID's unique identifier as the key.

The block-level data were previously joined in GIS with the BID shapefile. As a result, the Blocks dataset already includes a BID dummy variable and a BID unique identifier, which we use to join further BID information, including BID creation year, assessed value and number of businesses.

#### Data Sources

* **BID data** NYC Small Business Services. Obtained from https://www1.nyc.gov/site/sbs/neighborhoods/bid-directory.page and from NYC Open Data: https://data.cityofnewyork.us/Business/Business-Improvement-Districts/ejxk-d93y/data
* **Census Blocks data** Center for Urban Research, The Graduate Center, City University of New York (CUNY). Obtained from http://www.urbanresearchmaps.org/plurality/blockmaps.htm

For the data dictionary, please refer to the notebook 00_ReadMe.

In [1]:
# __future__ imports must come before any other statement
from __future__ import print_function

# general use imports
%matplotlib inline
import datetime
import numpy as np
import os
import six
import warnings
import matplotlib.pyplot as plt
import re

# pandas-related imports
import pandas as pd
import scipy
import sklearn

# record linkage package
import recordlinkage as rl
from recordlinkage.preprocessing import clean, phonenumbers, phonetic

# CSV file reading-related imports
import csv

# sqlalchemy handles SQL connections (psycopg2, the PostgreSQL driver, is not imported here)
from sqlalchemy import create_engine


print( "Imports loaded at " + str( datetime.datetime.now() ) )

Imports loaded at 2018-04-24 23:46:22.016156


### Load Business Improvement Districts data

In [2]:
df = pd.read_csv('../Data/bidsmerged_update__2_.csv') 
df.shape
# 75 rows and 22 columns

(75, 22)
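The merge below treats `org_id` as a one-row-per-BID key, so it is worth confirming that the key is unique before joining. A minimal sketch, using a small hypothetical stand-in for the BID table (the IDs and names are copied from the `df.tail()` preview, not the full file):

```python
import pandas as pd

# Hypothetical stand-in for the BID table (values from the df.tail() preview).
bids = pd.DataFrame({
    "org_id": [57, 75, 1397, 1, 22],
    "org_name": ["Garment District Alliance", "Morris Park",
                 "Court-Livingston-Schermerhorn",
                 "Fulton Mall Improvement Association", "MetroTech"],
})

# A many-to-one merge needs exactly one BID row per key.
key_is_unique = bids["org_id"].is_unique

# List any IDs that appear more than once (empty if the key is clean).
duplicate_ids = bids.loc[bids["org_id"].duplicated(keep=False), "org_id"].unique()

print(key_is_unique, list(duplicate_ids))
```

On the real data, `df['org_id'].is_unique` answers the same question directly.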

In [3]:
df.count()

org_id              75
assessment          75
org_name            75
org_address         73
org_address2        55
org_city            74
org_state           74
org_zip             73
boro_id             75
org_phone           72
org_fax             63
org_website         73
org_email           74
org_boundary        75
org_neighborhood    74
org_date            75
org_month           75
org_day             75
org_year            75
org_realestate      69
org_blocks          72
org_businesses      74
dtype: int64

In [4]:
df.tail()

| | org_id | assessment | org_name | org_address | org_address2 | org_city | org_state | org_zip | boro_id | org_phone | ... | org_email | org_boundary | org_neighborhood | org_date | org_month | org_day | org_year | org_realestate | org_blocks | org_businesses |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 70 | 57 | $8,800,000.00 | Garment District Alliance | 209 West 38th Street | 2nd Floor | New York | NY | 10018 | Manhattan | 2127649600.0 | ... | bblair@garmentdistrictnyc.com | Area generally bounded by Fifth Avenue on the ... | Midtown | Oct-93 | 10 | 1 | 1993 | The Garment District, located in the heart of ... | 95.0 | 500.0 |
| 71 | 75 | $0.00 | Morris Park | | | New York | NY | | Bronx | | ... | morrisparkbid@gmail.com | Morris Park Avenue from Williamsbridge Road to... | Morris Park | 18-Jan | 1 | 1 | 2018 | | | 0.0 |
| 72 | 1397 | $907,000.00 | Court-Livingston-Schermerhorn | c/o Downtown Brooklyn Partnership | 1 MetroTech Center North, Suite 1003 | Brooklyn | NY | 11201 | Brooklyn | 7184031600.0 | ... | rmyer@downtownbrooklyn.com | Area generally bounded by Court Street on the ... | Downtown Brooklyn | 7-Jun | 6 | 1 | 2007 | The area is a mixed-use neighborhood in the he... | 68.0 | 194.0 |
| 73 | 1 | $2,004,500.00 | Fulton Mall Improvement Association | c/o Downtown Brooklyn Partnership | 1 MetroTech Center North, Suite 1003 | Brooklyn | NY | 11201 | Brooklyn | 7184031600.0 | ... | rmyer@downtownbrooklyn.com | Fulton Street from Adams Street to Flatbush Av... | Downtown Brooklyn | Jun-76 | 6 | 1 | 1976 | Fulton Street Mall is considered one of the mo... | 17.0 | 150.0 |
| 74 | 22 | $3,827,675.00 | MetroTech | 1 MetroTech Center North, Suite 1003 | | Brooklyn | NY | 11201-3858 | Brooklyn | 7184031600.0 | ... | rmyer@downtownbrooklyn.com | Area generally bounded by Adams Street to the ... | Downtown Brooklyn | Jan-92 | 1 | 1 | 1992 | Located in the heart of 'America's Fourth Larg... | 95.0 | 134.0 |
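The `assessment` column is stored as formatted currency strings (e.g. `$8,800,000.00`), so it cannot be summed or compared numerically as-is. A minimal sketch of one way to convert it, using the five values from the preview above; `errors="coerce"` turns any value that still fails to parse into `NaN` rather than raising:

```python
import pandas as pd

# Sample values copied from the `assessment` column in the preview above.
assessment = pd.Series(["$8,800,000.00", "$0.00", "$907,000.00",
                        "$2,004,500.00", "$3,827,675.00"])

# Strip the dollar sign and thousands separators, then convert to float.
assessment_num = pd.to_numeric(
    assessment.str.replace(r"[$,]", "", regex=True), errors="coerce"
)

print(assessment_num.tolist())
```

On the real table the same expression can be applied to `df['assessment']`.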


### Load Block Level data
As mentioned above, this dataset was created in GIS by spatially joining the BID shapefile to the Census Blocks shapefile. As a result, blocks that fall within a BID already carry that BID's unique identifier.


In [5]:
blocks= pd.read_csv("../Data/blocks_clean.csv", usecols=range(1,23))
blocks.head()

| | BLOCKID | Pop10 | Pop00 | shWhite00 | shLatino00 | shBlack00 | shAsian00 | shOther00 | pct_ch_white | pct_ch_hisp | ... | pct_ch_other | pop_pct_ch | BoroName | NTACode | NTAName | A_poly | bid_id | bid_name | a_weight | BID_dummy |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 360050300004003 | 249 | 272.0 | 58.088235 | 37.132353 | 1.102941 | 2.941176 | 0.735294 | -17.525986 | 9.453992 | ... | 3.28077 | 91.544118 | Bronx | BX10 | Pelham Bay-Country Club-City Island | 173964 | | | 1.0 | 0 |
| 1 | 360050409001001 | 11 | 1119.0 | 1.340483 | 9.562109 | 86.058981 | 0.983021 | 2.055407 | -1.340483 | -9.562109 | ... | -2.055407 | 0.983021 | Bronx | BX43 | Norwood | 1387986 | | | 1.0 | 0 |
| 2 | 360050409002000 | 3223 | 1886.0 | 26.988335 | 9.80912 | 59.172853 | 1.855779 | 2.173913 | -12.498729 | 1.422652 | ... | 0.401327 | 170.890774 | Bronx | BX05 | Bedford Park-Fordham North | 833865 | | | 1.0 | 0 |
| 3 | 360050419001004 | 225 | 233.0 | 19.313305 | 60.944206 | 6.437768 | 11.587983 | 1.716738 | -5.535527 | -8.944206 | ... | -0.383405 | 96.566524 | Bronx | BX43 | Norwood | 48578 | | | 1.0 | 0 |
| 4 | 360050449011006 | 28 | 28.0 | 71.428571 | 28.571429 | 0.0 | 0.0 | 0.0 | -3.571429 | -7.142857 | ... | 0.0 | 100.0 | Bronx | BX62 | Woodlawn-Wakefield | 49237 | | | 1.0 | 0 |
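Because blocks outside any BID have no `bid_id`, pandas will likely store that column as floats, and `BID_dummy` should equal 1 exactly when `bid_id` is present. A minimal consistency check, assuming that relationship holds by construction in the GIS join; the toy table and its `bid_id` value of 10 are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for blocks_clean.csv; NaN marks blocks outside any BID.
blocks = pd.DataFrame({
    "BLOCKID": [360050300004003, 360050409001001, 360050389002001],
    "bid_id": [np.nan, np.nan, 10.0],  # 10 is a made-up ID for illustration
    "BID_dummy": [0, 0, 1],
})

# The dummy should be 1 exactly when a BID identifier is present.
dummy_matches = (blocks["BID_dummy"] == blocks["bid_id"].notna().astype(int)).all()

# The nullable integer dtype keeps missing IDs as <NA> instead of forcing floats.
blocks["bid_id"] = blocks["bid_id"].astype("Int64")

print(dummy_matches, blocks["bid_id"].dtype)
```

If `dummy_matches` is `False` on the real data, the rows where the two disagree are the ones to inspect before merging.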



### Merge the two datasets
We inner-merge the BID data into the Blocks data, matching bid_id in the Blocks dataset to org_id in the BID dataset. Because this is an inner merge, only blocks that fall within a BID are kept. The resulting dataset gains additional BID information, such as formation year and organization email.
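A toy version of this merge illustrates what the inner join keeps and drops. The tables below are hypothetical; only the column names and the two BID names come from the data above. The `validate="many_to_one"` argument is an optional addition not used in the notebook: it makes pandas raise a `MergeError` if `org_id` turns out not to be unique.

```python
import numpy as np
import pandas as pd

# Hypothetical toy tables mirroring the key columns used in the merge.
blocks = pd.DataFrame({
    "BLOCKID": [101, 102, 103, 104],
    "bid_id": [np.nan, 57.0, 57.0, 22.0],  # NaN = block outside any BID
})
bids = pd.DataFrame({
    "org_id": [57, 22, 75],
    "org_name": ["Garment District Alliance", "MetroTech", "Morris Park"],
    "org_year": [1993, 1992, 2018],
})

# Inner join keeps only blocks whose bid_id matches an org_id;
# validate checks that each org_id appears at most once on the BID side.
blocks_bid = blocks.merge(bids, left_on="bid_id", right_on="org_id",
                          how="inner", validate="many_to_one")

print(blocks_bid[["BLOCKID", "org_name", "org_year"]])
```

Block 101 (no BID) and BID 75 (no matching block) both disappear. If the non-BID blocks are needed later, for example as a comparison group, `how="left"` keeps them with missing BID attributes.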

In [7]:
blocksBID=blocks.merge(df, left_on='bid_id', right_on='org_id', how='inner')
blocksBID.head()

| | BLOCKID | Pop10 | Pop00 | shWhite00 | shLatino00 | shBlack00 | shAsian00 | shOther00 | pct_ch_white | pct_ch_hisp | ... | org_email | org_boundary | org_neighborhood | org_date | org_month | org_day | org_year | org_realestate | org_blocks | org_businesses |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 360050389002001 | 186 | 117.199997 | 25.59727 | 55.460752 | 15.358362 | 0.853242 | 3.412969 | 44.295203 | -32.880107 | ... | execdirector@pitkinbid.org | Pitkin Avenue from Howard Avenue to Mother Gas... | Brownsville | Oct-93 | 10 | 1 | 1993 | The Pitkin Avenue BID is comprised of 10 block... | 32.0 | 180.0 |
| 1 | 360050389002003 | 299 | 299.0 | 33.779264 | 59.866221 | 4.347826 | 1.337793 | 0.668896 | -7.35786 | 5.351171 | ... | execdirector@pitkinbid.org | Pitkin Avenue from Howard Avenue to Mother Gas... | Brownsville | Oct-93 | 10 | 1 | 1993 | The Pitkin Avenue BID is comprised of 10 block... | 32.0 | 180.0 |
| 2 | 360050389003005 | 157 | 146.0 | 47.945205 | 39.726027 | 6.164384 | 0.684932 | 5.479452 | -3.359218 | 4.85996 | ... | execdirector@pitkinbid.org | Pitkin Avenue from Howard Avenue to Mother Gas... | Brownsville | Oct-93 | 10 | 1 | 1993 | The Pitkin Avenue BID is comprised of 10 block... | 32.0 | 180.0 |
| 3 | 360050389003003 | 269 | 210.0 | 27.619048 | 67.619048 | 1.428571 | 0.952381 | 2.380952 | 5.094707 | -6.652505 | ... | execdirector@pitkinbid.org | Pitkin Avenue from Howard Avenue to Mother Gas... | Brownsville | Oct-93 | 10 | 1 | 1993 | The Pitkin Avenue BID is comprised of 10 block... | 32.0 | 180.0 |
| 4 | 360050391006001 | 454 | 452.0 | 28.097345 | 64.159292 | 0.221239 | 3.539823 | 3.982301 | -8.053292 | 3.682118 | ... | execdirector@pitkinbid.org | Pitkin Avenue from Howard Avenue to Mother Gas... | Brownsville | Oct-93 | 10 | 1 | 1993 | The Pitkin Avenue BID is comprised of 10 block... | 32.0 | 180.0 |
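One sanity check on the merged result compares each block's borough with the borough of the BID it was matched to. In the Blocks preview, BLOCKIDs beginning with 36005 have BoroName Bronx (36005 is the state and county FIPS code for Bronx County), while the merged preview pairs 36005 blocks with the Pitkin Avenue BID in Brownsville, Brooklyn. That may simply reflect how the preview rows were chosen, or it may indicate that bid_id and org_id follow different numbering; a cross-check like the sketch below would help tell the two apart. The second BLOCKID and both boro_id values in the toy table are hypothetical.

```python
import pandas as pd

# NYC county FIPS codes (state 36 + county) mapped to borough names.
FIPS_TO_BORO = {"36005": "Bronx", "36047": "Brooklyn", "36061": "Manhattan",
                "36081": "Queens", "36085": "Staten Island"}

# Hypothetical merged rows; the first BLOCKID is taken from the output above.
merged = pd.DataFrame({
    "BLOCKID": [360050389002001, 360470001001000],
    "boro_id": ["Brooklyn", "Brooklyn"],
})

# A 15-digit block GEOID is state(2) + county(3) + tract(6) + block(4).
merged["block_boro"] = merged["BLOCKID"].astype(str).str[:5].map(FIPS_TO_BORO)

# Rows where the block's county disagrees with the BID's borough.
mismatch = merged[merged["block_boro"] != merged["boro_id"]]
print(mismatch)
```

Since the Blocks data already carry BoroName, comparing `blocksBID['BoroName']` with `blocksBID['boro_id']` directly is an equivalent check.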


In [9]:
blocksBID.to_csv("../Data/block_dummies_BIDS.csv", encoding='utf8')
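By default `to_csv` also writes the DataFrame index as an unnamed first column. That appears to be why `blocks_clean.csv` is read with `usecols=range(1,23)` and why such files show an `Unnamed: 0` column when reloaded. A minimal round-trip sketch on a one-row toy frame shows the difference `index=False` makes:

```python
import io

import pandas as pd

# One-row toy frame standing in for blocksBID.
df_out = pd.DataFrame({"BLOCKID": [360050389002001], "org_id": [57]})

# Default to_csv writes the index as an unnamed first column...
with_index = pd.read_csv(io.StringIO(df_out.to_csv()))
# ...while index=False writes only the data columns.
without_index = pd.read_csv(io.StringIO(df_out.to_csv(index=False)))

print(with_index.columns.tolist())     # ['Unnamed: 0', 'BLOCKID', 'org_id']
print(without_index.columns.tolist())  # ['BLOCKID', 'org_id']
```

Whether to drop the index depends on whether downstream notebooks expect that leading column.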