# Battle of the Neighborhoods

## Which Neighborhood to Open a New Flower Shop?


# Introduction:

In the race for space in New York's Flower District, a client would like to examine the competition among existing flower shops in Manhattan and Queens. Because the client already has a distribution network running through Queens, the question to settle is: would a new flower shop serve the business better in Manhattan or in Queens?

## Executive Summary:
Manhattan is known for its many "districts", areas where a single commodity is sold: the garment district, the meat district, the flower district, and so on. Our client would like to make the most of their new distribution from Queens while limiting competition, and needs to decide whether a new shop would do better in the Manhattan flower district or in Queens. Using neighborhood data from a JSON file, we clean the data with Pandas to look at the different neighborhoods of New York City. Using Foursquare, we then extract the flower shop venues and compare the two boroughs through K-Means clustering. Our result is that flower shops are concentrated in Manhattan, with only a few in Queens, and that our client would be best served by focusing ongoing efforts on Queens.

# Methodologies:
The methods used for this analysis are as follows:
- Pandas for cleaning and processing the data
- K-Means clustering for the statistical analysis
- Foursquare API for gathering venue data
- Folium, with Matplotlib color utilities, for visualizing the clustered venues

In [1]:
# import necessary libraries
!pip install geopy
!pip install folium
import pandas as pd
import numpy as np
import folium
import requests
import json
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans
from geopy.geocoders import Nominatim
from IPython.display import Image, HTML
from pandas import json_normalize  # pandas.io.json.json_normalize is deprecated
print("Libraries Imported")

Libraries Imported


In [17]:
url = 'https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DS0701EN-SkillsNetwork/labs/newyork_data.json'
resp = requests.get(url)
newyork_data = json.loads(resp.text)
newyork_data


{'type': 'FeatureCollection',
 'totalFeatures': 306,
 'features': [{'type': 'Feature',
   'id': 'nyu_2451_34572.1',
   'geometry': {'type': 'Point',
    'coordinates': [-73.84720052054902, 40.89470517661]},
   'geometry_name': 'geom',
   'properties': {'name': 'Wakefield',
    'stacked': 1,
    'annoline1': 'Wakefield',
    'annoline2': None,
    'annoline3': None,
    'annoangle': 0.0,
    'borough': 'Bronx',
    'bbox': [-73.84720052054902,
     40.89470517661,
     -73.84720052054902,
     40.89470517661]}},
  {'type': 'Feature',
   'id': 'nyu_2451_34572.2',
   'geometry': {'type': 'Point',
    'coordinates': [-73.82993910812398, 40.87429419303012]},
   'geometry_name': 'geom',
   'properties': {'name': 'Co-op City',
    'stacked': 2,
    'annoline1': 'Co-op',
    'annoline2': 'City',
    'annoline3': None,
    'annoangle': 0.0,
    'borough': 'Bronx',
    'bbox': [-73.82993910812398,
     40.87429419303012,
     -73.82993910812398,
     40.87429419303012]}},
  {'type': 'Feature',
 

In [20]:
neighborhoods_data = newyork_data['features']

In [21]:
neighborhoods_data[0]

{'type': 'Feature',
 'id': 'nyu_2451_34572.1',
 'geometry': {'type': 'Point',
  'coordinates': [-73.84720052054902, 40.89470517661]},
 'geometry_name': 'geom',
 'properties': {'name': 'Wakefield',
  'stacked': 1,
  'annoline1': 'Wakefield',
  'annoline2': None,
  'annoline3': None,
  'annoangle': 0.0,
  'borough': 'Bronx',
  'bbox': [-73.84720052054902,
   40.89470517661,
   -73.84720052054902,
   40.89470517661]}}

In [22]:
# define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 

# instantiate the dataframe
neighborhoods = pd.DataFrame(columns=column_names)

In [23]:
# collect one row per neighborhood, then build the dataframe in a single step
rows = []
for data in neighborhoods_data:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']

    # GeoJSON stores coordinates as [longitude, latitude]
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]

    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})

neighborhoods = pd.DataFrame(rows, columns=column_names)
neighborhoods

           Borough  Neighborhood   Latitude  Longitude
0            Bronx     Wakefield  40.894705 -73.847201
1            Bronx    Co-op City  40.874294 -73.829939
2            Bronx   Eastchester  40.887556 -73.827806
3            Bronx     Fieldston  40.895437 -73.905643
4            Bronx     Riverdale  40.890834 -73.912585
..             ...           ...        ...        ...
301      Manhattan  Hudson Yards  40.756658 -74.000111
302         Queens       Hammels  40.587338 -73.805530
303         Queens     Bayswater  40.611322 -73.765968
304         Queens  Queensbridge  40.756091 -73.945631
305  Staten Island     Fox Hills  40.617311 -74.081740

[306 rows x 4 columns]

In [62]:
# Get coordinates for Queens, which serve as the center point for the Foursquare search and the cluster map
address = 'Queens, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Queens are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Queens are 40.7498243, -73.7976337.


In [94]:
# Superimpose the Manhattan neighborhoods over a Folium map
# keep only the Manhattan rows of the neighborhoods dataframe
manhattan_data = neighborhoods[neighborhoods['Borough'] == 'Manhattan'].reset_index(drop=True)

# create map of Manhattan, centered on the mean of its neighborhood coordinates
map_manhattan = folium.Map(
    location=[manhattan_data['Latitude'].mean(), manhattan_data['Longitude'].mean()],
    zoom_start=11)

# add markers to map
for lat, lng, label in zip(manhattan_data['Latitude'], manhattan_data['Longitude'], manhattan_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_manhattan)

map_manhattan

In [65]:
# Foursquare API call, searching around the Queens coordinates obtained above
import os
url = 'https://api.foursquare.com/v2/venues/search'

params = dict(
    # credentials are read from environment variables instead of being hard-coded;
    # the variable names below are placeholders
    client_id=os.environ['FOURSQUARE_CLIENT_ID'],
    client_secret=os.environ['FOURSQUARE_CLIENT_SECRET'],
    v='20180323',
    ll='40.7498243, -73.7976337',
    query='flower shop',
    limit=100
)

resp = requests.get(url=url, params=params)
data = json.loads(resp.text)
data


{'meta': {'code': 200, 'requestId': '606b2e434ba45b203b3ea937'},
 'response': {'venues': [{'id': '49c4489df964a520b9561fe3',
    'name': 'Sycamore Flower Shop + Bar',
    'contact': {},
    'location': {'address': '1118 Cortelyou Rd',
     'crossStreet': 'btwn Stratford & Westminster Rd.',
     'lat': 40.63967564567376,
     'lng': -73.96715718953989,
     'labeledLatLngs': [{'label': 'display',
       'lat': 40.63967564567376,
       'lng': -73.96715718953989},
      {'label': 'entrance', 'lat': 40.639799, 'lng': -73.967221}],
     'distance': 18843,
     'postalCode': '11218',
     'cc': 'US',
     'city': 'Brooklyn',
     'state': 'NY',
     'country': 'United States',
     'formattedAddress': ['1118 Cortelyou Rd (btwn Stratford & Westminster Rd.)',
      'Brooklyn, NY 11218',
      'United States']},
    'categories': [{'id': '4bf58dd8d48988d116941735',
      'name': 'Bar',
      'pluralName': 'Bars',
      'shortName': 'Bar',
      'icon': {'prefix': 'https://ss3.4sqi.net/img/cate

In [66]:
# Transform the JSON results into a Pandas dataframe
# assign relevant part of JSON to venues
venues = data['response']['venues']
nearby_venues = pd.json_normalize(venues)
nearby_venues.head()


Unnamed: 0,id,name,categories,verified,referralId,venueChains,hasPerk,location.address,location.crossStreet,location.lat,...,delivery.provider.icon.sizes,delivery.provider.icon.name,beenHere.count,beenHere.lastCheckinExpiredAt,beenHere.marked,beenHere.unconfirmedCount,venuePage.id,hereNow.count,hereNow.summary,hereNow.groups
0,49c4489df964a520b9561fe3,Sycamore Flower Shop + Bar,"[{'id': '4bf58dd8d48988d116941735', 'name': 'B...",True,v-1617636931,[],False,1118 Cortelyou Rd,btwn Stratford & Westminster Rd.,40.639676,...,"[40, 50]",/delivery_provider_seamless_20180129.png,0,0,False,0,68938809.0,0,Nobody here,[]
1,50d85651e4b07f4e6e4d02b6,Ditmars Flower Shop,"[{'id': '4bf58dd8d48988d11b951735', 'name': 'F...",True,v-1617636931,[],False,29-11 Ditmars Blvd,,40.776651,...,,,0,0,False,0,102342337.0,0,Nobody here,[]
2,4b080841f964a520980223e3,Soy Bean Chen Flower Shop,"[{'id': '4bf58dd8d48988d11b951735', 'name': 'F...",False,v-1617636931,[],False,135-26 Roosevelt Ave,,40.75927,...,,,0,0,False,0,,0,Nobody here,[]
3,4f911b136b747fc20fbe5fcc,Summit Flower Shop,"[{'id': '4bf58dd8d48988d11b951735', 'name': 'F...",False,v-1617636931,[],False,912 Summit Ave,,40.759702,...,,,0,0,False,0,,0,Nobody here,[]
4,51e6d199498e9176091f9fb3,Flower Shop,"[{'id': '4bf58dd8d48988d11b951735', 'name': 'F...",False,v-1617636931,[],False,,,40.585746,...,,,0,0,False,0,,0,Nobody here,[]


In [69]:
nearby_venues.shape

(50, 37)

In [72]:
# filter columns
filtered_columns = ['name', 'categories', 'location.lat', 'location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
#nearby_venues['categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Sycamore Flower Shop + Bar,"[{'id': '4bf58dd8d48988d116941735', 'name': 'B...",40.639676,-73.967157
1,Ditmars Flower Shop,"[{'id': '4bf58dd8d48988d11b951735', 'name': 'F...",40.776651,-73.91121
2,Soy Bean Chen Flower Shop,"[{'id': '4bf58dd8d48988d11b951735', 'name': 'F...",40.75927,-73.831125
3,Summit Flower Shop,"[{'id': '4bf58dd8d48988d11b951735', 'name': 'F...",40.759702,-74.042698
4,Flower Shop,"[{'id': '4bf58dd8d48988d11b951735', 'name': 'F...",40.585746,-73.954768


In [73]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

50 venues were returned by Foursquare.
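The category filter is commented out in the cell above, so venues such as a bar with "Flower Shop" in its name stay in the data. The following is a minimal sketch of how that filter could work, assuming each `categories` entry is a list of dicts with a `name` key, as in the Foursquare response shown earlier. The helper `get_category_type` is referenced but never defined in this notebook, so its body here is an assumption, as are the two sample rows and the category name `'Flower Shop'` (the printed output truncates it).

```python
import pandas as pd

def get_category_type(row):
    """Return the name of the first category attached to a venue row, or None."""
    categories = row['categories']
    if not categories:
        return None
    return categories[0]['name']

# Two illustrative rows shaped like the Foursquare results (assumed values)
sample_venues = pd.DataFrame({
    'name': ['Sycamore Flower Shop + Bar', 'Ditmars Flower Shop'],
    'categories': [[{'name': 'Bar'}], [{'name': 'Flower Shop'}]],
    'lat': [40.639676, 40.776651],
    'lng': [-73.967157, -73.911210],
})

# Replace the list of category dicts with the primary category name
sample_venues['categories'] = sample_venues.apply(get_category_type, axis=1)

# Keep only the venues whose primary category is a flower shop
flower_only = sample_venues[sample_venues['categories'] == 'Flower Shop']
print(flower_only[['name', 'categories']])
```

Applied to `nearby_venues`, a filter like this would likely reduce the number of rows that feed into the clustering step.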


In [80]:
df_flower_shop = nearby_venues.drop(['name', 'categories'], axis =1)
df_flower_shop.shape

(50, 2)

In [81]:
# set number of clusters
kclusters = 5

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(df_flower_shop)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([2, 0, 0, 2, 2, 0, 0, 2, 2, 2])
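The value `kclusters = 5` is fixed by hand above. One common way to sanity-check such a choice is the elbow method: fit K-Means for several values of k and look for the point where inertia stops dropping sharply. The sketch below illustrates the idea on synthetic coordinates, not on the Foursquare data; the two blobs, their locations, and the range of k are all assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for (lat, lng) venue coordinates: two well-separated blobs
rng = np.random.default_rng(0)
points = np.vstack([
    rng.normal(loc=[40.75, -73.98], scale=0.01, size=(30, 2)),
    rng.normal(loc=[40.72, -73.80], scale=0.01, size=(30, 2)),
])

# Inertia (within-cluster sum of squared distances) for each candidate k
candidate_ks = range(1, 8)
inertias = []
for k in candidate_ks:
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(points)
    inertias.append(model.inertia_)

for k, inertia in zip(candidate_ks, inertias):
    print(k, round(inertia, 5))
```

With the real data, `points` would be replaced by the `lat` and `lng` columns of `df_flower_shop`, taken before the cluster label column is added.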

In [84]:
cluster_centers = kmeans.cluster_centers_
cluster_centers

array([[ 40.7953633 , -73.90782286],
       [ 40.95343853, -73.84575536],
       [ 40.69796661, -73.96308048],
       [ 40.943469  , -74.21133   ],
       [ 40.75819907, -73.67709444]])
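The cluster centers are raw coordinates, so it is not obvious which borough each one falls in. One way to make them interpretable is to look up the nearest neighborhood for each center. The sketch below assumes a haversine (great-circle) distance and uses only five rows copied from the `neighborhoods` dataframe; the helper names `haversine_km` and `nearest_neighborhood` are hypothetical, and a real run would pass all 306 rows.

```python
import numpy as np
import pandas as pd

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between points given in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * np.arcsin(np.sqrt(a))

def nearest_neighborhood(lat, lng, neighborhoods):
    """Return (neighborhood, borough) of the row closest to (lat, lng)."""
    distances = haversine_km(lat, lng,
                             neighborhoods['Latitude'], neighborhoods['Longitude'])
    closest = neighborhoods.loc[distances.idxmin()]
    return closest['Neighborhood'], closest['Borough']

# A few rows copied from the neighborhoods dataframe (illustrative subset only)
sample_neighborhoods = pd.DataFrame({
    'Borough': ['Bronx', 'Manhattan', 'Queens', 'Queens', 'Staten Island'],
    'Neighborhood': ['Wakefield', 'Hudson Yards', 'Bayswater', 'Queensbridge', 'Fox Hills'],
    'Latitude': [40.894705, 40.756658, 40.611322, 40.756091, 40.617311],
    'Longitude': [-73.847201, -74.000111, -73.765968, -73.945631, -74.081740],
})

# Cluster centers as printed above (latitude, longitude)
centers = [
    (40.7953633, -73.90782286),
    (40.95343853, -73.84575536),
    (40.69796661, -73.96308048),
    (40.943469, -74.21133),
    (40.75819907, -73.67709444),
]

for lat, lng in centers:
    print((lat, lng), nearest_neighborhood(lat, lng, sample_neighborhoods))
```

Some centers may lie outside New York City altogether, in which case the nearest neighborhood is only a rough label and the distance itself is worth inspecting.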

In [112]:
#Add cluster name 
df_flower_shop['Cluster Labels'] = kmeans.labels_
df_flower_shop.head()

Unnamed: 0,lat,lng,Cluster Labels
0,40.639676,-73.967157,2
1,40.776651,-73.91121,0
2,40.75927,-73.831125,0
3,40.759702,-74.042698,2
4,40.585746,-73.954768,2


In [101]:
import numpy as np
from numpy import linspace
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
print(x,ys)

[0 1 2 3 4] [array([0, 1, 2, 3, 4]), array([ 1,  3,  7, 13, 21]), array([ 2,  7, 20, 41, 70]), array([  3,  13,  41,  87, 151]), array([  4,  21,  70, 151, 264])]


In [121]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one distinct color per cluster label
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map, colored by cluster
for lat, lon, cluster in zip(df_flower_shop['lat'], df_flower_shop['lng'], df_flower_shop['Cluster Labels']):
    label = folium.Popup('Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters

# Summary:
By examining the centers of the K-Means clusters of flower shop venues, we found a clear result: Queens has very few flower shops, because they are concentrated in Manhattan. With this in mind, we advise the client to look further into Queens real estate and to focus marketing and benchmarking for the new flower shop on that borough. If the client does not wish to open a shop in Queens, this analysis should still allow an informed, data-driven decision about opening a shop in Manhattan, given the cluster centers of the flower shops located there. This concludes the study.