# Capstone Project - The Battle of the Neighborhoods (Week 2)
### Applied Data Science Capstone by IBM/Coursera

<a class="anchor" id="introduction"></a>
## Introduction: Business Problem
This project addresses the question of where to open a salad shop in the borough of Manhattan in New York City. I will use Foursquare's location data to find where salad shops are currently located. I will then use a GeoJSON file containing the boundaries of New York City's zip code areas, which serve here as a proxy for neighborhoods, to figure out which areas might have more salad eaters than salad shops. This project would interest the stakeholders of a salad chain such as Chopt, as well as small business owners researching where to open a new location.

<a class="anchor" id="data"></a>
## Data
The first piece of data I need is the latitude and longitude of Manhattan, New York. To get these, I will use the Nominatim geocoder from the geopy.geocoders package. Querying 'Manhattan, NY' returns Manhattan's latitude and longitude, which I will store in the variables manhattan_latitude and manhattan_longitude. These variables serve as the central location for the entire project.

I will then use the Foursquare API to pull data on existing salad shops around Manhattan. The API returns JSON, which I will flatten into a dataframe containing the names, addresses, latitudes, longitudes and zip codes of salad places in the area.
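
Column names such as location.lat come from json_normalize. It turns the nested dictionaries in each venue record into dotted column names and leaves lists, such as categories, as cell values. The standard-library helper below is a simplified, hypothetical stand-in for that behavior, run on an invented record. The real json_normalize also supports options such as record_path and max_level.

```python
def flatten(record, parent_key="", sep="."):
    """Flatten nested dicts into dotted keys; lists are kept as cell values."""
    flat = {}
    for key, value in record.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested objects such as "location".
            flat.update(flatten(value, new_key, sep))
        else:
            flat[new_key] = value
    return flat


# A toy record shaped like one entry of results['response']['venues'];
# the values are illustrative, not real API output.
sample_venue = {
    "id": "abc123",
    "name": "Example Salad Place",
    "categories": [{"id": "4bf58dd8d48988d1bd941735", "name": "Salad Place"}],
    "location": {"address": "1 Example St", "lat": 40.79, "lng": -73.97, "postalCode": "10025"},
}

row = flatten(sample_venue)
print(sorted(row))
# ['categories', 'id', 'location.address', 'location.lat', 'location.lng', 'location.postalCode', 'name']
```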

Finally, I will use a GeoJSON file containing the boundaries of New York City's zip code areas, along with the folium package, to visualize which neighborhoods and areas might benefit from a salad shop.
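
Reading the boundary file only needs the standard library's json module. The sketch below uses a hypothetical helper, zip_to_borough, and an invented FeatureCollection. It collects each feature's postalcode and borough properties (the field names used by the NYC file loaded later in this notebook) into a lookup. That lookup can then be used to keep only Manhattan zip codes. Note that the postal codes in the file are strings.

```python
import json


def zip_to_borough(geojson):
    """Map each feature's postal code to the borough named in its properties."""
    return {
        feature["properties"]["postalcode"]: feature["properties"]["borough"]
        for feature in geojson["features"]
    }


# Toy FeatureCollection using the same property names as the NYC zip code file;
# geometries are emptied for brevity.
sample_geojson = json.loads("""
{"type": "FeatureCollection",
 "features": [
   {"type": "Feature",
    "geometry": {"type": "MultiPolygon", "coordinates": []},
    "properties": {"postalcode": "11693", "borough": "Brooklyn"}},
   {"type": "Feature",
    "geometry": {"type": "MultiPolygon", "coordinates": []},
    "properties": {"postalcode": "10033", "borough": "Manhattan"}}
 ]}
""")

lookup = zip_to_borough(sample_geojson)
manhattan_zips = {z for z, b in lookup.items() if b == "Manhattan"}
print(lookup)  # {'11693': 'Brooklyn', '10033': 'Manhattan'}
```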


## Methodology

First, I will use the latitude and longitude of Manhattan, along with the category ID for salad places, to query the Foursquare API.

Next, I will clean the dataframe from Foursquare to make it easier to work with. I will then use the folium package to plot the locations of each of the salad shops in our dataframe. 

I will then import the GeoJSON file of New York City zip code boundaries and use it to generate a choropleth map showing which zip codes have the most and the fewest salad shops. Finally, I will conduct exploratory data analysis to find the zip codes with the fewest salad shops.

Import necessary libraries

In [None]:
import requests # library to handle HTTP requests
import pandas as pd # library for data analysis
import numpy as np # library to handle data in a vectorized manner
from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values

# transforming JSON into a pandas dataframe (top-level import, pandas >= 1.0)
from pandas import json_normalize

import folium # plotting library

print('Folium installed')
print('Libraries imported.')

Folium installed
Libraries imported.


Set necessary inputs to access the Foursquare API

In [None]:
CLIENT_ID = 'YOUR_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20180604' # API version date
LIMIT = 50 # maximum number of venues to return
CATEGORYID = '4bf58dd8d48988d1bd941735' # Foursquare category ID for salad places
#search_query =  # optional free-text search term
#radius =  # optional search radius in meters

In [None]:
address = 'Manhattan, NY'
geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
manhattan_latitude = location.latitude
manhattan_longitude = location.longitude
print(manhattan_latitude, manhattan_longitude)

40.7896239 -73.9598939


Call the Foursquare API for a list of salad eateries in Manhattan

In [None]:
# alternative URL that also uses search_query and radius (define both above first)
#url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}&categoryId={}'.format(CLIENT_ID, CLIENT_SECRET, manhattan_latitude, manhattan_longitude, VERSION, search_query, radius, LIMIT, CATEGORYID)
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&limit={}&categoryId={}'.format(CLIENT_ID, CLIENT_SECRET, manhattan_latitude, manhattan_longitude, VERSION, LIMIT, CATEGORYID)
url


'https://api.foursquare.com/v2/venues/search?client_id=YOUR_CLIENT_ID&client_secret=YOUR_CLIENT_SECRET&ll=40.7896239,-73.9598939&v=20180604&limit=100&categoryId=4bf58dd8d48988d1bd941735'
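
The same URL can be assembled with urllib.parse.urlencode, which percent-encodes each value instead of pasting it into a format string. The sketch below uses a hypothetical helper, build_search_url, whose defaults copy the constants defined above. The credentials are placeholders.

```python
from urllib.parse import urlencode, urlparse, parse_qs


def build_search_url(client_id, client_secret, lat, lng,
                     version="20180604", limit=50,
                     category_id="4bf58dd8d48988d1bd941735"):
    """Assemble a Foursquare v2 venues/search URL with percent-encoded parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "ll": f"{lat},{lng}",
        "v": version,
        "limit": limit,
        "categoryId": category_id,
    }
    return "https://api.foursquare.com/v2/venues/search?" + urlencode(params)


search_url = build_search_url("YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET", 40.7896239, -73.9598939)
query = parse_qs(urlparse(search_url).query)
print(query["ll"])  # ['40.7896239,-73.9598939']
```

With requests, calling requests.get(base_url, params=params) on the same dictionary has the same effect.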

In [None]:
results = requests.get(url).json()
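
If the credentials are wrong or a quota is exceeded, `results['response']['venues']` raises a KeyError that hides the real cause. A minimal guard, assuming the v2 convention that every response carries a `meta` object whose `code` is 200 on success, could look like this. The helper name extract_venues is hypothetical.

```python
def extract_venues(results):
    """Return the venue list from a Foursquare v2 search response, or raise on error."""
    meta = results.get("meta", {})
    if meta.get("code") != 200:
        # Error responses are assumed to describe the failure inside "meta".
        raise RuntimeError(f"Foursquare request failed: {meta}")
    return results.get("response", {}).get("venues", [])


ok_response = {"meta": {"code": 200}, "response": {"venues": [{"name": "sweetgreen"}]}}
print(len(extract_venues(ok_response)))  # 1
```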

In [None]:
# assign relevant part of JSON to venues
venues = results['response']['venues']

# transform venues into a dataframe
salad_dataframe = json_normalize(venues)
salad_dataframe.head()




| | id | name | categories | referralId | hasPerk | location.address | location.crossStreet | location.lat | location.lng | location.labeledLatLngs | location.distance | location.postalCode | location.cc | location.city | location.state | location.country | location.formattedAddress | delivery.id | delivery.url | delivery.provider.name | delivery.provider.icon.prefix | delivery.provider.icon.sizes | delivery.provider.icon.name | venuePage.id | location.neighborhood |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 5961782bf5e9d7031dd38a93 | sweetgreen | [{'id': '4bf58dd8d48988d1bd941735', 'name': 'S... | v-1606796556 | False | 2460 Broadway | 91st St | 40.79149 | -73.973945 | [{'label': 'display', 'lat': 40.79148984346160... | 1202 | 10025 | US | New York | NY | United States | [2460 Broadway (91st St), New York, NY 10025, ... | 1034027.0 | https://www.seamless.com/menu/sweetgreen-2460-... | seamless | https://fastly.4sqi.net/img/general/cap/ | [40, 50] | /delivery_provider_seamless_20180129.png | | |
| 1 | 5f3c5b789b9e3e7dcb3b7673 | sweetgreen | [{'id': '4bf58dd8d48988d1bd941735', 'name': 'S... | v-1606796556 | False | 1740 Broadway | W 56th St | 40.765232 | -73.98175 | [{'label': 'display', 'lat': 40.765232, 'lng':... | 3281 | 10019 | US | New York | NY | United States | [1740 Broadway (W 56th St), New York, NY 10019... | 2305142.0 | https://www.seamless.com/menu/sweetgreen-1740-... | seamless | https://fastly.4sqi.net/img/general/cap/ | [40, 50] | /delivery_provider_seamless_20180129.png | | |
| 2 | 5e1a0a999ab34d00087e4eb6 | sweetgreen | [{'id': '4bf58dd8d48988d1bd941735', 'name': 'S... | v-1606796556 | False | 347 Bowery | | 40.726361 | -73.991448 | [{'label': 'display', 'lat': 40.7263613, 'lng'... | 7528 | 10003 | US | New York | NY | United States | [347 Bowery, New York, NY 10003, United States] | 1739666.0 | https://www.seamless.com/menu/sweetgreen-347-b... | seamless | https://fastly.4sqi.net/img/general/cap/ | [40, 50] | /delivery_provider_seamless_20180129.png | | |
| 3 | 5d66a35c927f8b0008a59e2b | sweetgreen | [{'id': '4bf58dd8d48988d1bd941735', 'name': 'S... | v-1606796556 | False | 606 1st Ave | | 40.744243 | -73.972906 | [{'label': 'display', 'lat': 40.7442426, 'lng'... | 5169 | 10016 | US | New York | NY | United States | [606 1st Ave, New York, NY 10016, United States] | 1701182.0 | https://www.seamless.com/menu/sweetgreen-606-1... | seamless | https://fastly.4sqi.net/img/general/cap/ | [40, 50] | /delivery_provider_seamless_20180129.png | | |
| 4 | 5b181bcad1a402002cce93ef | Fresh & Co | [{'id': '4bf58dd8d48988d1bd941735', 'name': 'S... | v-1606796556 | False | 62 Chelsea Piers | | 40.748508 | -74.008738 | [{'label': 'display', 'lat': 40.74850846533291... | 6156 | 10011 | US | New York | NY | United States | [62 Chelsea Piers, New York, NY 10011, United ... | | | | | | | | |


In [None]:
print(salad_dataframe.shape)

(50, 25)


Clean up the dataframe

In [None]:
# keep only the columns we need, as an independent copy of the dataframe
salad_joints = salad_dataframe[['name', 'location.formattedAddress', 'location.lat', 'location.lng', 'location.postalCode']].copy()
salad_joints.columns = ['Name', 'Address', 'Latitude', 'Longitude', 'Zip Code']
salad_joints

| | Name | Address | Latitude | Longitude | Zip Code |
|---|---|---|---|---|---|
| 0 | sweetgreen | [2460 Broadway (91st St), New York, NY 10025, ... | 40.79149 | -73.973945 | 10025 |
| 1 | sweetgreen | [1740 Broadway (W 56th St), New York, NY 10019... | 40.765232 | -73.98175 | 10019 |
| 2 | sweetgreen | [347 Bowery, New York, NY 10003, United States] | 40.726361 | -73.991448 | 10003 |
| 3 | sweetgreen | [606 1st Ave, New York, NY 10016, United States] | 40.744243 | -73.972906 | 10016 |
| 4 | Fresh & Co | [62 Chelsea Piers, New York, NY 10011, United ... | 40.748508 | -74.008738 | 10011 |
| 5 | sweetgreen | [2 Park Ave (at E 32nd St), New York, NY 10016... | 40.746307 | -73.982236 | 10016 |
| 6 | Bagel Pub Park Slope | [287 9th St (btwn 4th & 5th Ave), Brooklyn, NY... | 40.669526 | -73.986995 | 11215 |
| 7 | Bagel Pub | [775 Franklin Ave (at St. Johns Pl), Brooklyn,... | 40.672343 | -73.957283 | 11238 |
| 8 | Napolini Express | [323 Oak St (Meadow Street), Uniondale, NY 115... | 40.722067 | -73.607023 | 11553 |
| 9 | sweetgreen | [311 Amsterdam Ave (btwn W 74th St & W 75th St... | 40.780298 | -73.980179 | 10023 |
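
The table shows that the search, although centered on Manhattan, also returned venues in Brooklyn (11215, 11238) and Uniondale (11553). One way to drop them is to keep only the zip codes whose borough is Manhattan, according to the borough property in the GeoJSON file loaded later in this notebook. The helper and inputs below are hypothetical.

```python
def keep_manhattan(rows, zip_lookup):
    """Keep only rows whose zip code maps to 'Manhattan' in a zip -> borough lookup."""
    return [r for r in rows if zip_lookup.get(r["zip"]) == "Manhattan"]


# Hypothetical inputs: three (name, zip) rows from the table above and a small
# zip -> borough lookup. 11553 (Uniondale) is assumed absent from the NYC file,
# so it is dropped along with the Brooklyn row.
sample_rows = [
    {"name": "sweetgreen", "zip": "10025"},
    {"name": "Bagel Pub Park Slope", "zip": "11215"},
    {"name": "Napolini Express", "zip": "11553"},
]
sample_lookup = {"10025": "Manhattan", "11215": "Brooklyn"}

print([r["name"] for r in keep_manhattan(sample_rows, sample_lookup)])  # ['sweetgreen']
```

In pandas, the same filter could be written as `salad_joints[salad_joints['Zip Code'].map(lookup) == 'Manhattan']`, where `lookup` is a hypothetical dict built from the GeoJSON features.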


In [None]:
salad_joints['Latitude']

0     40.791490
1     40.765232
2     40.726361
3     40.744243
4     40.748508
5     40.746307
6     40.669526
7     40.672343
8     40.722067
9     40.780298
10    40.739183
11    40.749332
12    40.727435
13    40.689910
14    40.762149
15    40.761165
16    40.752706
17    40.735970
18    40.715804
19    40.763714
20    40.702812
21    40.738061
22    40.705626
23    40.774260
24    40.746338
25    40.722076
26    40.733965
27    40.777488
28    40.706239
29    40.729794
30    41.003402
31    40.754640
32    40.765468
33    40.744803
34    40.740738
35    40.767128
36    40.729992
37    40.721184
38    40.679066
39    40.776467
40    40.765453
41    40.741240
42    40.752502
43    40.807284
44    40.737216
45    40.740729
46    40.777037
47    40.773797
48    40.781358
49    40.778012
Name: Latitude, dtype: float64

Map our locations

In [None]:
# create map of New York using latitude and longitude values
map_newyork = folium.Map(location=[manhattan_latitude, manhattan_longitude], zoom_start=10)

# add a marker for each salad shop, with its name as a popup
for name, lat, lng in zip(salad_joints['Name'], salad_joints['Latitude'], salad_joints['Longitude']):
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=name,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_newyork)

map_newyork


Import the JSON file containing New York's zip code boundaries

In [None]:
import json # library to handle JSON files
with open('/nyc_zip_code_tabulation_areas_polygons.geojson') as json_data:
    newyork_data = json.load(json_data)
newyork_data

{'features': [{'geometry': {'coordinates': [[[[-73.840759, 40.625364],
       [-73.843062, 40.627933],
       [-73.844219, 40.629957],
       [-73.842498, 40.631778],
       [-73.836267, 40.633592],
       [-73.835921, 40.633354],
       [-73.835506, 40.63264],
       [-73.839544, 40.626779],
       [-73.840688, 40.62723],
       [-73.841587, 40.628103],
       [-73.841813, 40.628011],
       [-73.840845, 40.626662],
       [-73.8401, 40.626436],
       [-73.840759, 40.625364]]]],
    'type': 'MultiPolygon'},
   'properties': {'bldgpostalcode': 0,
    'borough': 'Brooklyn',
    'cartodb_id': 173,
    'cty_fips': '047',
    'id': 'http://nyc.pediacities.com/Resource/PostalCode/11693',
    'objectid': 173,
    'po_name': 'Far Rockaway',
    'postalcode': '11693',
    'shape_area': 3497515.77978,
    'shape_leng': 9479.91727716,
    'st_fips': '36',
    'state': 'NY'},
   'type': 'Feature'},
  {'geometry': {'coordinates': [[[[-73.915441, 40.875591],
       [-73.915087, 40.876496],
       

In [None]:
data = newyork_data['features']
df = json_normalize(data)
df.head()

  


| | type | geometry.type | geometry.coordinates | properties.cartodb_id | properties.objectid | properties.postalcode | properties.po_name | properties.state | properties.borough | properties.st_fips | properties.cty_fips | properties.bldgpostalcode | properties.shape_leng | properties.shape_area | properties.id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Feature | MultiPolygon | [[[[-73.840759, 40.625364], [-73.843062, 40.62... | 173 | 173 | 11693 | Far Rockaway | NY | Brooklyn | 36 | 47 | 0 | 9479.917277 | 3497516.0 | http://nyc.pediacities.com/Resource/PostalCode... |
| 1 | Feature | MultiPolygon | [[[[-73.915441, 40.875591], [-73.915087, 40.87... | 48 | 48 | 10463 | Bronx | NY | Manhattan | 36 | 61 | 0 | 7791.517127 | 3119702.0 | http://nyc.pediacities.com/Resource/PostalCode... |
| 2 | Feature | MultiPolygon | [[[[-73.932131, 40.869451], [-73.931277, 40.86... | 51 | 51 | 10033 | New York | NY | Manhattan | 36 | 61 | 0 | 29415.607123 | 16156050.0 | http://nyc.pediacities.com/Resource/PostalCode... |
| 3 | Feature | MultiPolygon | [[[[-73.837608, 40.743238], [-73.835162, 40.74... | 120 | 120 | 11367 | Flushing | NY | Queens | 36 | 81 | 0 | 33506.126605 | 72810150.0 | http://nyc.pediacities.com/Resource/PostalCode... |
| 4 | Feature | MultiPolygon | [[[[-73.880111, 40.837234], [-73.877472, 40.83... | 64 | 64 | 10472 | Bronx | NY | Bronx | 36 | 5 | 0 | 27006.042411 | 30963250.0 | http://nyc.pediacities.com/Resource/PostalCode... |


Conduct exploratory data analysis to count how many salad shops are in each zip code

In [None]:
df2 = df[['properties.postalcode']].copy() # copy so new columns can be added without a SettingWithCopyWarning
df2.columns = ['Zip Code']
df2['Contains Salad Place'] = 0
df2.head(10)


| | Zip Code | Contains Salad Place |
|---|---|---|
| 0 | 11693 | 0 |
| 1 | 10463 | 0 |
| 2 | 10033 | 0 |
| 3 | 11367 | 0 |
| 4 | 10472 | 0 |
| 5 | 10037 | 0 |
| 6 | 10026 | 0 |
| 7 | 10119 | 0 |
| 8 | 10044 | 0 |
| 9 | 10036 | 0 |


In [None]:
all_zip_codes=df['properties.postalcode'].tolist()
zip_with_salad=salad_joints['Zip Code'].tolist()
print("There are {} zip codes total!".format(str(len(all_zip_codes))))

There are 262 zip codes total!


In [None]:
from collections import Counter
Count=Counter(zip_with_salad)


In [None]:
# look up each zip code's salad shop count in Count; zip codes with no salad shops get 0
df2['Contains Salad Place'] = df2['Zip Code'].map(Count).fillna(0).astype(int)


In [None]:
df2.head(50)

| | Zip Code | Contains Salad Place |
|---|---|---|
| 0 | 11693 | 0 |
| 1 | 10463 | 0 |
| 2 | 10033 | 0 |
| 3 | 11367 | 0 |
| 4 | 10472 | 0 |
| 5 | 10037 | 0 |
| 6 | 10026 | 0 |
| 7 | 10119 | 0 |
| 8 | 10044 | 0 |
| 9 | 10036 | 1 |


In [None]:
print(max(df2['Contains Salad Place']))

5
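
`max()` reports the highest count but not which zip code reaches it, and it hides ties. The small sketch below uses a hypothetical helper, busiest_zips, and returns every zip code tied for the maximum. The sample values are a few of the counts from the sorted Counter output later in this notebook.

```python
from collections import Counter


def busiest_zips(counts):
    """Return every zip code tied for the highest count, along with that count."""
    if not counts:
        return [], 0
    top = max(counts.values())
    return sorted(z for z, n in counts.items() if n == top), top


sample_counts = Counter({"10003": 5, "10019": 5, "10025": 2, "10036": 1})
print(busiest_zips(sample_counts))  # (['10003', '10019'], 5)
```

With the notebook's dataframe, `df2.loc[df2['Contains Salad Place'] == df2['Contains Salad Place'].max(), 'Zip Code']` selects the same rows.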


Create a choropleth map with our data

In [None]:
nyc_geo = r'/content/nyc_zip_code_tabulation_areas_polygons.geojson' # GeoJSON file of NYC zip code boundaries

# create a plain map centered on New York City
# (default tiles; 'Mapbox Bright' may not be available in recent folium releases)
nyc_map = folium.Map(location=[40.693943, -73.985880], zoom_start=12)

# generate a choropleth map shading each zip code by its number of salad places
folium.Choropleth(
    geo_data=nyc_geo,
    data=df2,
    columns=['Zip Code', 'Contains Salad Place'],
    key_on='feature.properties.postalcode',
    fill_color='YlOrRd',
    fill_opacity=0.7,
    line_opacity=0.2,
    legend_name='Number of Salad Places per Zip Code'
).add_to(nyc_map)

# display map
nyc_map
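
folium joins the first column named in `columns` to the GeoJSON property given by `key_on`, so both must hold the same values and the same types. Features without a match are drawn with the `nan_fill_color` rather than a shade of the scale. The GeoJSON stores postal codes as strings (for example '11693'), so a quick check before plotting can catch mismatches. The helper below, unmatched_keys, is hypothetical.

```python
def unmatched_keys(data_keys, geojson, prop="postalcode"):
    """Compare choropleth data keys with the GeoJSON property they are joined on."""
    geo_keys = {f["properties"][prop] for f in geojson["features"]}
    data_keys = set(data_keys)
    geo_only = sorted(geo_keys - data_keys, key=str)   # polygons with no data row
    data_only = sorted(data_keys - geo_keys, key=str)  # data rows with no polygon
    return geo_only, data_only


toy_geo = {"features": [{"properties": {"postalcode": "10003"}},
                        {"properties": {"postalcode": "10019"}}]}
# The integer 10003 does not match the string "10003" in the GeoJSON.
print(unmatched_keys([10003, "10019"], toy_geo))  # (['10003'], [10003])
```

In the notebook, this would be called as `unmatched_keys(df2['Zip Code'], newyork_data)`.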

Find which zip codes have the fewest salad shops

In [None]:
sorted(Count.items(), key=lambda kv: kv[1], reverse=False)

[('10573', 1),
 ('11211', 1),
 ('10065', 1),
 ('11105', 1),
 ('10009', 1),
 ('10021', 1),
 ('11217', 1),
 ('11553', 1),
 ('10024', 1),
 ('11238', 1),
 ('11215', 1),
 ('10075', 1),
 ('10018', 1),
 ('10036', 1),
 ('10025', 2),
 ('10005', 2),
 ('10011', 2),
 ('10012', 2),
 ('11201', 2),
 ('10010', 2),
 ('10028', 2),
 ('10016', 3),
 ('10023', 3),
 ('10001', 3),
 ('10014', 3),
 ('10003', 5),
 ('10019', 5)]
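
This ranking is built from Count, which only holds zip codes where Foursquare returned at least one salad shop. As a result, it starts at 1 and omits zip codes with none, which are arguably the first candidates to examine. The sketch below uses a hypothetical helper and invented data to rank every zip code, including those with zero shops.

```python
from collections import Counter


def rank_zip_codes(all_zips, venue_zips):
    """Count venues per zip code, keeping zip codes with zero venues, fewest first."""
    counts = Counter(venue_zips)
    # dict.fromkeys removes duplicate zip codes while keeping their order;
    # Counter returns 0 for keys it has never seen.
    ranked = [(z, counts[z]) for z in dict.fromkeys(all_zips)]
    return sorted(ranked, key=lambda kv: (kv[1], kv[0]))


sample_all_zips = ["10003", "10004", "10019", "10036"]    # hypothetical subset of GeoJSON zip codes
sample_venue_zips = ["10003", "10003", "10019", "10036"]  # hypothetical venue zip codes
print(rank_zip_codes(sample_all_zips, sample_venue_zips))
# [('10004', 0), ('10019', 1), ('10036', 1), ('10003', 2)]
```

In the notebook itself, `df2.sort_values('Contains Salad Place')` gives the same zero-inclusive ranking, because df2 already holds every zip code.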

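## Results
Among the zip codes where the Foursquare search found at least one salad shop, 10573, 11211, 10065, 11105, 10009, 10021, 11217, 11553, 10024, 11238, 11215, 10075, 10018 and 10036 each contain only one. Several of these lie outside Manhattan; for example, the cleaned dataframe lists 11215 and 11238 as Brooklyn addresses and 11553 as Uniondale. This is because the search was centered on Manhattan but not restricted to it. The counts also reflect only the 50 venues the search returned (the LIMIT used in the query), so they are a sample rather than a complete census.

The zip codes 10003 and 10019, in Gramercy Park and West Midtown, each have five salad shops. This is the most of any zip code in the results.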

## Observations
The densest concentrations of salad shops are south of Central Park, in Midtown Manhattan. The sparsest are in lower Manhattan, around the Financial District and the lower East and West Sides.

## Conclusion

In conclusion, I would recommend looking further into opening a salad shop in lower Manhattan. This area has the fewest salad shops. It is also a densely populated part of the city with many office buildings, whose workers may want a salad for lunch.