# Capstone Project - Battle of Neighborhoods

### Data Used in the Research

<b>Data Sources</b>
* New York City Open Data
    - NYPD Complaint Historical Data 2006 to 2017
    - NYC Borough Boundaries (JSON)
* New York City Planning
    - NYC Borough Population 1900 to 2010
    - NYC Total Housing Units 1940 to 2010
* Foursquare API

### Data Wrangling and Transformation Approach

<b>New York City Open Data</b>
* NYPD Complaint Historical Data 2006 to 2017<br>
    Due to the space limitations of the Cognitive Class lab environment, this 1.8 GB dataset (6.04 million rows, 35 columns) had to be downloaded to a personal computer and loaded into a Microsoft Access database, where the required data was extracted and transformed into the needed format using SQL.<br>
    Source: https://data.cityofnewyork.us/Public-Safety/NYPD-Complaint-Data-Historic/qgea-i56i
* NYC Borough Boundaries (JSON)<br>
    The encoding format had to be adjusted slightly so that the crime data could be incorporated when generating the Python Folium choropleth map.
    <br>Source: https://data.cityofnewyork.us/City-Government/Borough-Boundaries/tqmj-j8zm
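
One way the boundary file could be extended is to copy each borough's complaint count into the `properties` of the matching GeoJSON feature, so that a choropleth layer can read the value directly. The sketch below is illustrative only: the exact change made to the file is not documented here, the `boro_name` property key is assumed from the published Borough Boundaries file and should be verified, and `add_counts` and the `complaints` key are hypothetical names.

```python
import json

def add_counts(geojson, counts, name_key="boro_name", out_key="complaints"):
    """Return a copy of a GeoJSON FeatureCollection with a count on each feature."""
    enriched = json.loads(json.dumps(geojson))  # deep copy via a JSON round trip
    for feature in enriched["features"]:
        name = feature["properties"].get(name_key)
        # Boroughs without a count get None so the gap stays visible.
        feature["properties"][out_key] = counts.get(name)
    return enriched

# Tiny stand-in for the real boundary file (geometry omitted for brevity).
boundaries = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"boro_name": "Bronx"}, "geometry": None},
        {"type": "Feature", "properties": {"boro_name": "Queens"}, "geometry": None},
    ],
}
# 2015 complaint counts as listed in the sample complaint table of this document.
counts_2015 = {"Bronx": 104912, "Queens": 94842}

enriched = add_counts(boundaries, counts_2015)
print(json.dumps(enriched["features"][0]["properties"]))
```

Working on a copy keeps the downloaded boundary data unchanged, which makes it easy to regenerate the enriched file for a different year.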

<b>New York City Planning</b>
* NYC Borough Total Population 1900 to 2010<br>
    Minimal transformation required.<br>
    Source: https://www1.nyc.gov/site/planning/data-maps/nyc-population/historical-population.page
* NYC Borough Total Housing Units 1940 to 2010<br>
    Minimal transformation required.<br>
    Source: https://www1.nyc.gov/site/planning/data-maps/nyc-population/historical-population.page

<b>Foursquare API</b>

A Foursquare developer account was opened to use the Foursquare API, which provides location data about venues, users, and check-ins. The venue data returned by the API enables the research to perform k-means clustering of New York City's neighborhoods.
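
To make the clustering step concrete, here is a minimal k-means sketch in plain Python. It is not the notebook's implementation, which is not shown here and may well rely on a library such as scikit-learn. The feature vectors, the placeholder neighborhood names, the starting centroids, and the `kmeans` helper are all illustrative assumptions.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm: alternate an assignment step and a centroid update."""
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids

# Hypothetical features: (share of restaurants, share of gyms) per neighborhood.
features = {
    "Neighborhood A": (0.8, 0.1),
    "Neighborhood B": (0.7, 0.2),
    "Neighborhood C": (0.1, 0.9),
    "Neighborhood D": (0.2, 0.7),
}
points = list(features.values())
labels, centroids = kmeans(points, centroids=[points[0], points[2]])
print(dict(zip(features, labels)))
```

Fixed starting centroids are used here only to keep the example deterministic; in practice the initialization is usually randomized and the number of clusters is chosen by inspection.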

### Sample Transformed Datasets

<b>New York City Open Data - NYPD Complaint Historical Data 2006 to 2017</b>

```python
import pandas as pd
df_complaint_hist = pd.read_csv('nypd_complaint_data_2006-2017.csv')
complaint_hist = df_complaint_hist.set_index('Year')
complaint_hist
```

| Year | Bronx | Brooklyn | Manhattan | Queens | Staten Island |
|------|------:|---------:|----------:|-------:|--------------:|
| 2006 | 111263 | 157808 | 127654 | 105425 | 27022 |
| 2007 | 117154 | 155830 | 130216 | 105634 | 26126 |
| 2008 | 114191 | 155632 | 128514 | 102885 | 26735 |
| 2009 | 112973 | 150739 | 124811 | 99187 | 24276 |
| 2010 | 111566 | 151482 | 121391 | 98575 | 24114 |
| 2011 | 108657 | 151497 | 115844 | 98639 | 23424 |
| 2012 | 106790 | 154885 | 119706 | 98927 | 23614 |
| 2013 | 104421 | 150184 | 118294 | 101291 | 22721 |
| 2014 | 105861 | 148700 | 113345 | 100245 | 22846 |
| 2015 | 104912 | 143116 | 113231 | 94842 | 22131 |
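
A year-by-borough table like the one above could also be produced without loading the full file into memory, by streaming the raw CSV and counting rows. This is a sketch of an alternative to the Access/SQL route described earlier, not the method actually used. The column names `CMPLNT_FR_DT` and `BORO_NM` are assumed from the published dataset schema and should be checked against the downloaded file, and the sample rows are made up.

```python
import csv
import io
from collections import Counter

def count_by_year_and_borough(lines, date_col="CMPLNT_FR_DT", boro_col="BORO_NM"):
    """Stream CSV rows and count complaints per (year, borough)."""
    counts = Counter()
    for row in csv.DictReader(lines):
        date, boro = row.get(date_col, ""), row.get(boro_col, "")
        if not date or not boro:
            continue  # skip rows missing either field
        year = int(date.rsplit("/", 1)[-1])  # dates assumed to be MM/DD/YYYY
        counts[(year, boro.title())] += 1
    return counts

# Made-up rows standing in for the 1.8 GB file; with the real data, pass an
# open file handle (opened with newline="") instead of the StringIO object.
sample = io.StringIO(
    "CMPLNT_FR_DT,BORO_NM\n"
    "01/15/2006,BRONX\n"
    "03/02/2006,BRONX\n"
    "07/09/2007,STATEN ISLAND\n"
    "07/10/2007,\n"
)
counts = count_by_year_and_borough(sample)
print(counts[(2006, "Bronx")], counts[(2007, "Staten Island")])
```

Because only the running counts are held in memory, the footprint stays small regardless of how many rows the source file contains.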


<b>New York City Planning - NYC Borough Population 1900 to 2010</b>

```python
df_population_data = pd.read_csv('nyc_total_population_1900-2010.csv')
population_data = df_population_data.set_index('Year')
population_data
```

| Year | Bronx | Brooklyn | Manhattan | Queens | Staten Island |
|------|------:|---------:|----------:|-------:|--------------:|
| 1900 | 200507 | 1166582 | 1850093 | 152999 | 67021 |
| 1910 | 430980 | 1634351 | 2331542 | 284041 | 85969 |
| 1920 | 732016 | 2018356 | 2284103 | 469042 | 116531 |
| 1930 | 1265258 | 2560401 | 1867312 | 1079129 | 158346 |
| 1940 | 1394711 | 2698285 | 1889924 | 1297634 | 174441 |
| 1950 | 1451277 | 2738175 | 1960101 | 1550849 | 191555 |
| 1960 | 1424815 | 2627319 | 1698281 | 1809578 | 221991 |
| 1970 | 1471701 | 2602012 | 1539233 | 1986473 | 295443 |
| 1980 | 1168972 | 2230936 | 1428285 | 1891325 | 352121 |
| 1990 | 1203789 | 2300664 | 1487536 | 1951598 | 378977 |


<b>New York City Planning - NYC Borough Total Housing Units 1940 to 2010</b>

```python
df_housing_data = pd.read_csv('nyc_housing_unit_1940-2010.csv')
housing_data = df_housing_data.set_index('Year')
housing_data
```

| Year | Bronx | Brooklyn | Manhattan | Queens | Staten Island |
|------|------:|---------:|----------:|-------:|--------------:|
| 1940 | 395245 | 762526 | 617373 | 394389 | 48839 |
| 1950 | 432259 | 814134 | 635944 | 495308 | 55820 |
| 1960 | 473160 | 875446 | 727432 | 616922 | 65156 |
| 1970 | 508789 | 902622 | 714593 | 708419 | 89961 |
| 1980 | 451118 | 881367 | 754796 | 740129 | 119000 |
| 1990 | 440955 | 873671 | 785127 | 752690 | 139726 |
| 2000 | 490659 | 930866 | 798144 | 817250 | 163993 |
| 2010 | 511896 | 1000293 | 847090 | 835127 | 176656 |


<b>Foursquare API - Neighborhood Venues</b>

```python
df_borough_venues = pd.read_csv('borough_venues.csv', index_col=0)
borough_venues = df_borough_venues.set_index('Neighborhood')
borough_venues.head()
```

| Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|--------------|----------------------:|-----------------------:|-------|---------------:|----------------:|----------------|
| Astoria | 40.768509 | -73.915654 | Favela Grill | 40.767348 | -73.917897 | Brazilian Restaurant |
| Astoria | 40.768509 | -73.915654 | Orange Blossom | 40.769856 | -73.917012 | Gourmet Shop |
| Astoria | 40.768509 | -73.915654 | Titan Foods Inc. | 40.769198 | -73.919253 | Gourmet Shop |
| Astoria | 40.768509 | -73.915654 | CrossFit Queens | 40.769404 | -73.918977 | Gym |
| Astoria | 40.768509 | -73.915654 | Off The Hook | 40.7672 | -73.918104 | Seafood Restaurant |
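
Rows in this shape are typically condensed into one feature vector per neighborhood before clustering. The sketch below computes, for each neighborhood, the share of its venues that fall in each category. `category_shares` is a hypothetical helper, and whether the notebook uses this exact encoding is an assumption. The input pairs are taken from the five Astoria rows shown above.

```python
from collections import Counter, defaultdict

def category_shares(pairs):
    """Map each neighborhood to {category: share of that neighborhood's venues}."""
    tallies = defaultdict(Counter)
    for neighborhood, category in pairs:
        tallies[neighborhood][category] += 1
    return {
        hood: {cat: n / sum(counter.values()) for cat, n in counter.items()}
        for hood, counter in tallies.items()
    }

# (Neighborhood, Venue Category) pairs from the sample rows.
rows = [
    ("Astoria", "Brazilian Restaurant"),
    ("Astoria", "Gourmet Shop"),
    ("Astoria", "Gourmet Shop"),
    ("Astoria", "Gym"),
    ("Astoria", "Seafood Restaurant"),
]
shares = category_shares(rows)
print(shares["Astoria"]["Gourmet Shop"])
```

With the DataFrame loaded above, the same pairs could probably be obtained with `borough_venues.reset_index()[["Neighborhood", "Venue Category"]].itertuples(index=False, name=None)`.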
