# Capstone Project - The Battle of Neighborhoods

### The Battle of Neighborhoods (Week 1)

### Section 1 - Introduction / Business Problem:

Our client, a Chinese investor living in New York, would like to open a new Asian restaurant in New York City. He has asked me to analyze the city's neighborhoods and its existing Asian restaurants in order to suggest an optimal location.
The investor wants the analysis to focus on two criteria in particular:

* Other Asian restaurants in NY City should be shown on a map of NY.
* The Asian population (number of people) in each borough should be taken into account.

#### Demographic Analysis of New York

According to the most recent research, the racial composition of New York City was:

* White: 42.78%
* Black or African American: 24.32%
* Other race: 15.12%
* Asian: 14.00%
* Two or more races: 3.33%
* Native American: 0.40%
* Native Hawaiian or Pacific Islander: 0.05%

According to the 2010 Census, New York City alone is home to more than one million Asian Americans, greater than the combined totals of San Francisco and Los Angeles. With Asians making up **14.00%** of its total population, New York contains the highest total Asian population of any U.S. city proper. New York also has the largest Chinese population of any city outside of Asia, with an estimated population of **628,763** as of 2017. Residents of Chinese ethnicity make up 6.0% of New York City, with about forty percent of them living in the borough of Queens alone. Koreans make up 1.2% of the city's population and Japanese 0.3%. Filipinos are the largest Southeast Asian ethnic group at 0.8%, followed by Vietnamese, who make up only 0.2% of New York City's population. Indians are the largest South Asian group, comprising 2.4% of the city's population, followed by Bangladeshis and Pakistanis at 0.7% and 0.5%, respectively.
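
As a quick consistency check on the figures quoted above, the short sketch below sums the racial-composition percentages and works out what the quoted shares imply. The variable names are illustrative, and the Queens estimate simply applies the "about forty percent" figure to the 2017 estimate, so it is a rough approximation rather than a census number.

```python
# Percentages quoted in the demographic summary above
racial_composition = {
    "White": 42.78,
    "Black or African American": 24.32,
    "Other race": 15.12,
    "Asian": 14.00,
    "Two or more races": 3.33,
    "Native American": 0.40,
    "Native Hawaiian or Pacific Islander": 0.05,
}
total_pct = round(sum(racial_composition.values()), 2)

# Asian subgroups named in the text, as a share of the whole city
asian_subgroups = {
    "Chinese": 6.0, "Indian": 2.4, "Korean": 1.2, "Filipino": 0.8,
    "Bangladeshi": 0.7, "Pakistani": 0.5, "Japanese": 0.3, "Vietnamese": 0.2,
}
listed_pct = round(sum(asian_subgroups.values()), 1)
# Share of the Asian population not attributed to a named subgroup
unlisted_pct = round(racial_composition["Asian"] - listed_pct, 1)

chinese_population_2017 = 628_763
queens_share = 0.40  # "about forty percent" (approximation from the text)
chinese_in_queens = round(chinese_population_2017 * queens_share)

print(total_pct, listed_pct, unlisted_pct, chinese_in_queens)
# prints: 100.0 12.1 1.9 251505
```

The named subgroups account for 12.1 of the 14.00 percentage points, so roughly 1.9 points belong to Asian groups the text does not list.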

### Section 2 - Gathering Data:

#### New York Population Data

In [None]:
import json
import time

import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import requests
# pandas >= 1.0 exposes json_normalize at the top level;
# on older versions use: from pandas.io.json import json_normalize
from pandas import json_normalize

from geopy.geocoders import Nominatim

# show every column and row when displaying DataFrames
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

!conda install -c conda-forge folium=0.5.0 --yes
import folium

print('Libraries imported.')

In [2]:
#Gathering NY Population data from https://data.cityofnewyork.us/api/views/6khm-nrue/rows.csv?accessType=DOWNLOAD

ny_data = pd.read_csv("https://data.cityofnewyork.us/api/views/6khm-nrue/rows.csv?accessType=DOWNLOAD",index_col=0) 
ny_data

Unnamed: 0_level_0,BoroughName,ProgramTypeName,Female Count,Female Percentage,Male Count,Male Percentage,Gender Nonconforming Count,Gender Nonconforming Percentage,American Indian or Alaskan Native Count,American Indian or Alaskan Native Percentage,Asian Count,Asian Percentage,Black or African American Count,Black or African American Percentage,Multi-race Count,Multi-race Percentage,Native Hawaiian or Other Pacific Islander Count,Native Hawaiian or Other Pacific Islander Percentage,White or Caucasian Count,White or Caucasian Percentage,Hispanic or Latino(a) Count,Hispanic or Latino(a) Percentage,Not Hispanic or Latino(a) Count,Not Hispanic or Latino(a) Percentage
Data os of Date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1,Unnamed: 23_level_1,Unnamed: 24_level_1
06/01/2019,Brooklyn,Adult Literacy - ABE/HSE,242,7.61,234,7.36,0,0.0,0,0.0,16,0.5,129,4.05,4,0.12,1,0.03,147,4.62,0,0.0,0,0.0
06/01/2019,Brooklyn,Adolescent Literacy,23,0.72,13,0.4,0,0.0,0,0.0,0,0.0,20,0.62,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0
06/01/2019,Brooklyn,COMPASS Explore,526,16.54,315,9.9,3,0.09,7,0.22,60,1.88,376,11.82,15,0.47,2,0.06,57,1.79,6,0.18,10,0.31
06/01/2019,Brooklyn,Adult Literacy - BENL/ESOL (Discretionary),677,21.29,245,7.7,1,0.03,0,0.0,414,13.02,15,0.47,4,0.12,0,0.0,36,1.13,0,0.0,0,0.0
06/01/2019,Brooklyn,Adult Literacy - NDA (ABE/HSE/Spanish HSE),363,11.41,176,5.53,0,0.0,7,0.22,11,0.34,198,6.22,6,0.18,3,0.09,19,0.59,0,0.0,0,0.0
06/01/2019,Brooklyn,Adult Literacy - BENL/ESOL,1649,51.87,601,18.9,1,0.03,3,0.09,654,20.57,389,12.23,5,0.15,8,0.25,685,21.54,0,0.0,0,0.0
06/01/2019,Brooklyn,Adult Literacy - ABE/HSE (Discretionary),129,4.05,67,2.1,0,0.0,1,0.03,13,0.4,37,1.16,8,0.25,1,0.03,19,0.59,0,0.0,0,0.0
06/01/2019,Brooklyn,COMPASS SONYC Pilot,122,3.83,126,3.96,0,0.0,3,0.09,4,0.12,170,5.34,2,0.06,0,0.0,12,0.37,0,0.0,7,0.22
06/01/2019,Brooklyn,COMPASS High School,326,10.25,153,4.81,1,0.03,2,0.06,104,3.27,271,8.52,4,0.12,0,0.0,8,0.25,1,0.03,6,0.18
06/01/2019,Brooklyn,Cornerstone,6602,207.67,8310,261.39,15,0.47,110,3.46,282,8.87,11333,356.48,226,7.1,33,1.03,347,10.91,0,0.0,0,0.0


In [3]:
ny_data.columns

Index(['BoroughName', 'ProgramTypeName', 'Female Count', 'Female Percentage',
       'Male Count', 'Male Percentage', 'Gender Nonconforming Count',
       'Gender Nonconforming Percentage',
       'American Indian or Alaskan Native Count',
       'American Indian or Alaskan Native Percentage', 'Asian Count',
       'Asian Percentage', 'Black or African American Count',
       'Black or African American Percentage', 'Multi-race Count',
       'Multi-race Percentage',
       'Native Hawaiian or Other Pacific Islander Count',
       'Native Hawaiian or Other Pacific Islander Percentage',
       'White or Caucasian Count', 'White or Caucasian Percentage',
       'Hispanic or Latino(a) Count', 'Hispanic or Latino(a) Percentage',
       'Not Hispanic or Latino(a) Count',
       'Not Hispanic or Latino(a) Percentage'],
      dtype='object')

Two columns are required: BoroughName and Asian Count.

In [163]:
grouped_ny_data = ny_data.groupby('BoroughName', as_index=False).agg({"Asian Count": "sum"})
grouped_ny_data

Unnamed: 0,BoroughName,Asian Count
0,Bronx,930
1,Brooklyn,12589
2,Manhattan,5199
3,Outside of NYC,156
4,Queens,11914
5,Staten Island,953


In [164]:
grouped_ny_data.columns = ['Borough Name', 'Asian Count']
grouped_ny_data = grouped_ny_data.sort_values('Asian Count', ascending=False)
grouped_ny_data

Unnamed: 0,Borough Name,Asian Count
1,Brooklyn,12589
4,Queens,11914
2,Manhattan,5199
5,Staten Island,953
0,Bronx,930
3,Outside of NYC,156
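
To put the aggregated counts in proportion, the following sketch computes each borough's share of the total Asian count, using the values from the table above. It is plain Python with illustrative names. Because the source dataset is broken down by `ProgramTypeName`, these shares may describe program participation in that dataset rather than resident population.

```python
# Aggregated values copied from the grouped table above
asian_count_by_borough = {
    "Brooklyn": 12589,
    "Queens": 11914,
    "Manhattan": 5199,
    "Staten Island": 953,
    "Bronx": 930,
    "Outside of NYC": 156,
}

total = sum(asian_count_by_borough.values())

# Percentage share of each borough, rounded to one decimal place
shares = {
    borough: round(100 * count / total, 1)
    for borough, count in asian_count_by_borough.items()
}

# Rank boroughs from largest to smallest share
ranked = sorted(shares.items(), key=lambda kv: kv[1], reverse=True)
for borough, pct in ranked:
    print(f"{borough}: {pct}%")
```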


In [6]:
# on_bad_lines='skip' replaces the deprecated error_bad_lines=False (pandas >= 1.3)
ny_geo = pd.read_csv('https://data.cityofnewyork.us/api/views/7t3b-ywvw/rows.csv?accessType=DOWNLOAD', on_bad_lines='skip')
ny_geo

Unnamed: 0,the_geom,BoroCode,BoroName,Shape_Leng,Shape_Area
0,MULTIPOLYGON (((-73.89680883223774 40.79580844...,2,Bronx,462958.186921,1186612000.0
1,MULTIPOLYGON (((-74.05050806403247 40.56642203...,5,Staten Island,330432.867999,1623921000.0
2,MULTIPOLYGON (((-73.83668274106707 40.59494669...,4,Queens,895169.617616,3044779000.0
3,MULTIPOLYGON (((-74.01092841268031 40.68449147...,1,Manhattan,360282.142897,636594000.0
4,MULTIPOLYGON (((-73.86706149472118 40.58208797...,3,Brooklyn,739911.53321,1937597000.0


In [7]:
ny_geo.columns

Index(['the_geom', 'BoroCode', 'BoroName', 'Shape_Leng', 'Shape_Area'], dtype='object')

In [8]:
ny_geo.columns = ['the_geom', 'BoroCode', 'Borough Name', 'Shape_Leng', 'Shape_Area']
ny_geo

Unnamed: 0,the_geom,BoroCode,Borough Name,Shape_Leng,Shape_Area
0,MULTIPOLYGON (((-73.89680883223774 40.79580844...,2,Bronx,462958.186921,1186612000.0
1,MULTIPOLYGON (((-74.05050806403247 40.56642203...,5,Staten Island,330432.867999,1623921000.0
2,MULTIPOLYGON (((-73.83668274106707 40.59494669...,4,Queens,895169.617616,3044779000.0
3,MULTIPOLYGON (((-74.01092841268031 40.68449147...,1,Manhattan,360282.142897,636594000.0
4,MULTIPOLYGON (((-73.86706149472118 40.58208797...,3,Brooklyn,739911.53321,1937597000.0


In [26]:
# download NY geojson file
!wget --quiet https://data.cityofnewyork.us/api/views/7t3b-ywvw/rows.json?accessType=DOWNLOAD
  
print('GeoJSON file downloaded!')

Invalid drive specification.
GeoJSON file downloaded!


'wget' is not recognized as an internal or external command,
operable program or batch file.
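
The `wget` call above failed because the command is not available in this Windows environment, so no file was actually downloaded even though the success message was printed. One portable alternative is to download the file with Python's standard library. The sketch below assumes a helper name `download_file` and a local file name `ny_boroughs.json`, neither of which appears in the original notebook.

```python
import urllib.request
from pathlib import Path


def download_file(url, dest):
    """Download `url` to the local path `dest` and return that path.

    Any failure raises an exception (for example urllib.error.URLError),
    so code after the call runs only if the file really arrived.
    """
    dest = Path(dest)
    with urllib.request.urlopen(url) as response:
        dest.write_bytes(response.read())
    return dest


# Usage (requires network access; the local file name is an assumption):
# path = download_file(
#     "https://data.cityofnewyork.us/api/views/7t3b-ywvw/rows.json?accessType=DOWNLOAD",
#     "ny_boroughs.json",
# )
# print("File downloaded to", path)
```

Unlike the shell command followed by an unconditional `print`, this version reports success only when the download has completed.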


In [37]:
nyc_geo = 'nyu_2451_34572-geojson.json'
nyc_geo

'nyu_2451_34572-geojson.json'

In [167]:
ny_borough = pd.merge(grouped_ny_data, ny_geo, on='Borough Name')
ny_borough

Unnamed: 0,Borough Name,Asian Count,the_geom,BoroCode,Shape_Leng,Shape_Area
0,Brooklyn,12589,MULTIPOLYGON (((-73.86706149472118 40.58208797...,3,739911.53321,1937597000.0
1,Queens,11914,MULTIPOLYGON (((-73.83668274106707 40.59494669...,4,895169.617616,3044779000.0
2,Manhattan,5199,MULTIPOLYGON (((-74.01092841268031 40.68449147...,1,360282.142897,636594000.0
3,Staten Island,953,MULTIPOLYGON (((-74.05050806403247 40.56642203...,5,330432.867999,1623921000.0
4,Bronx,930,MULTIPOLYGON (((-73.89680883223774 40.79580844...,2,462958.186921,1186612000.0


In [229]:
address = 'New York, USA'

# Nominatim expects an explicit user_agent; 'ny_explorer' is an arbitrary application name
geolocator = Nominatim(user_agent='ny_explorer')
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of New York are {}, {}.'.format(latitude, longitude))

The geographical coordinates of New York are 40.7127281, -74.0060152.


In [230]:
latitude = 40.7127281
longitude = -74.0060152
# create map and display it
ny_map = folium.Map(location=[latitude, longitude], zoom_start=12)
# display the map of NY
ny_map

In [231]:
ny_map.choropleth(
    geo_data=nyc_geo,
    data=grouped_ny_data,
    columns=['Borough Name', 'Asian Count'],
    key_on='feature.properties.DISTRICT',
    fill_color='YlOrRd', 
    fill_opacity=0.7, 
    line_opacity=0.2,
    legend_name='Asian Population in NY'
)
# display map
ny_map
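
`key_on` has to point at a property that exists in every GeoJSON feature and whose values match the first column listed in `columns`. Otherwise the choropleth cannot join the data to the shapes. A minimal check is sketched below. The helper name `check_choropleth_keys` is hypothetical, and the toy GeoJSON uses an assumed property name `boro_name`, so the property names in the real file should be inspected first.

```python
def check_choropleth_keys(geojson, key_on, data_keys):
    """Compare the keys found in a GeoJSON with the keys in the data.

    `key_on` uses the same dotted form as folium,
    e.g. 'feature.properties.boro_name'.
    Returns (matched, missing_in_geojson, missing_in_data) as sorted lists.
    """
    path = key_on.split(".")[1:]  # drop the leading 'feature'
    geo_keys = set()
    for feature in geojson["features"]:
        value = feature
        for part in path:
            value = value.get(part) if isinstance(value, dict) else None
        if value is not None:
            geo_keys.add(value)
    data_keys = set(data_keys)
    return (
        sorted(geo_keys & data_keys),
        sorted(data_keys - geo_keys),
        sorted(geo_keys - data_keys),
    )


# Toy GeoJSON; the property name 'boro_name' is an assumption
toy_geojson = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"boro_name": "Brooklyn"}, "geometry": None},
        {"type": "Feature", "properties": {"boro_name": "Queens"}, "geometry": None},
    ],
}

matched, missing_in_geojson, missing_in_data = check_choropleth_keys(
    toy_geojson,
    "feature.properties.boro_name",
    ["Brooklyn", "Queens", "Outside of NYC"],
)
print(matched, missing_in_geojson, missing_in_data)
# prints: ['Brooklyn', 'Queens'] ['Outside of NYC'] []
```

An empty `matched` list would indicate that the `key_on` path does not exist in the file. Rows such as `Outside of NYC` have no matching shape and can be dropped before plotting.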

In [232]:
# @hidden_cell
CLIENT_ID = '@@@@@@@@@@@'  # your Foursquare ID
CLIENT_SECRET = '@@@@@@@@@@@'  # your Foursquare Secret
VERSION = '@@@@@@@@@@@' # Foursquare API version

#print('Your credentails:')
#print('CLIENT_ID: ' + CLIENT_ID)
#print('CLIENT_SECRET:' + CLIENT_SECRET)

In [233]:
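# radius (in meters) and LIMIT are not defined earlier in the notebook;
# the values here are assumed from the defaults of getNearbyVenues
radius = 1000
LIMIT = 500

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    latitude, 
    longitude, 
    radius, 
    LIMIT)
results = requests.get(url).json()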

In [234]:
def getNearbyVenues(names, latitude, longitude, radius=1000, LIMIT=500):
    # query Foursquare for the venues around each (name, lat, lng) triple
    venues_list = []
    for name, lat, lng in zip(names, latitude, longitude):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Street', 
                  'Street Latitude', 
                  'Street Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues

nearby_venues

Unnamed: 0,name,categories,lat,lng
0,The Bar Room at Temple Court,Hotel Bar,40.711448,-74.006802
1,The Beekman - A Thompson Hotel,Hotel,40.711173,-74.006702
2,City Hall Park,Park,40.712415,-74.006724
3,Alba Dry Cleaner & Tailor,Laundry Service,40.711434,-74.006272
4,The Wooly Daily,Coffee Shop,40.712137,-74.008395
5,Gibney Dance Center Downtown,Dance Studio,40.713923,-74.005661
6,Augustine,French Restaurant,40.71131,-74.00666
7,The Class by Taryn Toomey,Gym / Fitness Center,40.712753,-74.008734
8,Takahachi Bakery,Bakery,40.713653,-74.008804
9,Aahar Indian Cuisine,Indian Restaurant,40.713307,-74.007994
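
The table above has the columns `name`, `categories`, `lat` and `lng`, which differ from the columns returned by `getNearbyVenues`, and the cell that produced it is not shown. One way such a table could be derived from an `explore` response is sketched below. The helper name `extract_venues` is hypothetical, and the sample response is a mock built from two rows of the table that only mimics the nesting used in the code above (`response`, then `groups`, then `items`, then `venue`).

```python
def extract_venues(explore_json):
    """Flatten an 'explore' response into a list of row dictionaries."""
    items = explore_json["response"]["groups"][0]["items"]
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        rows.append({
            "name": venue["name"],
            # first category name, or None if the venue has no category
            "categories": categories[0]["name"] if categories else None,
            "lat": venue["location"]["lat"],
            "lng": venue["location"]["lng"],
        })
    return rows


# Mock response (not real API output) built from two rows of the table above
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "City Hall Park",
               "categories": [{"name": "Park"}],
               "location": {"lat": 40.712415, "lng": -74.006724}}},
    {"venue": {"name": "Aahar Indian Cuisine",
               "categories": [{"name": "Indian Restaurant"}],
               "location": {"lat": 40.713307, "lng": -74.007994}}},
]}]}}

rows = extract_venues(sample)
print(rows[1]["categories"])  # prints: Indian Restaurant
```

The resulting list can be passed to `pd.DataFrame(rows)` to obtain a table with the same four columns.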


In [235]:
nearby_venues.groupby('categories').count()

Unnamed: 0_level_0,name,lat,lng
categories,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
American Restaurant,2,2,2
Antique Shop,1,1,1
Art Gallery,1,1,1
Baby Store,1,1,1
Bakery,2,2,2
Bar,1,1,1
Bookstore,1,1,1
Boxing Gym,1,1,1
Bubble Tea Shop,1,1,1
Building,1,1,1


In [236]:
# venue categories treated as "Asian restaurant" in this analysis
asian_categories = ["Indian Restaurant", "Japanese Curry Restaurant", "Japanese Restaurant", "Sushi Restaurant"]
asian_rest = nearby_venues[nearby_venues.categories.isin(asian_categories)]

In [237]:
asian_rest

Unnamed: 0,name,categories,lat,lng
9,Aahar Indian Cuisine,Indian Restaurant,40.713307,-74.007994
51,Nobu Downtown,Japanese Restaurant,40.710532,-74.009593
84,Takahachi,Sushi Restaurant,40.716526,-74.008101
91,Go! Go! Curry,Japanese Curry Restaurant,40.709854,-74.00901


In [238]:

# create map of NY place  using latitude and longitude values
map_ny2 = folium.Map(location=[latitude, longitude], zoom_start=13)
# add markers to map
for lat, lng, label in zip(asian_rest['lat'], asian_rest['lng'], asian_rest['name']):
    label = folium.Popup(label, parse_html=True)
    folium.RegularPolygonMarker(
        [lat, lng],
        number_of_sides=30,
        radius=7,
        popup=label,
        color='blue',
        fill_color='#0f0f0f',
        fill_opacity=0.6,
    ).add_to(map_ny2)  
    
map_ny2

### Section 3 - Results and Discussion:

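As seen in the analysis, the Foursquare search around the New York City center coordinates returned 4 Asian restaurants (one Indian, one Japanese, one sushi and one Japanese curry restaurant). In the population data, Brooklyn has the highest Asian count (12,589), followed by Queens (11,914), which suggests that it has the largest potential customer base for an Asian restaurant. We therefore recommend that the investor review the borough of Brooklyn first.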