# The Best Location for a New Juice Bar in Manhattan




## Introduction

In this project we will calculate the best location to open a juice bar in the borough of Manhattan in New York City. The best location will be determined based on the vicinity of other points of interest for our potential customers and the maximal distance from other similar venues. The dataset of venues and their description and location is available through the FourSquare Places API. 

#### Import of all neccessary libraries

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analsysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


<a id='item1'></a>

#### New York has a total of 5 boroughs and 306 neighborhoods. In order to segement the neighborhoods and explore them, we will essentially need a dataset that contains the 5 boroughs and the neighborhoods that exist in each borough as well as the the latitude and logitude coordinates of each neighborhood. 


In [2]:
!wget -q -O 'newyork_data.json' https://cocl.us/new_york_dataset
print('Data downloaded!')

Data downloaded!


#### Load and explore the data

Next, let's load the data.

In [3]:
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)

Let's take a quick look at the data.

In [4]:
newyork_data

{'type': 'FeatureCollection',
 'totalFeatures': 306,
 'features': [{'type': 'Feature',
   'id': 'nyu_2451_34572.1',
   'geometry': {'type': 'Point',
    'coordinates': [-73.84720052054902, 40.89470517661]},
   'geometry_name': 'geom',
   'properties': {'name': 'Wakefield',
    'stacked': 1,
    'annoline1': 'Wakefield',
    'annoline2': None,
    'annoline3': None,
    'annoangle': 0.0,
    'borough': 'Bronx',
    'bbox': [-73.84720052054902,
     40.89470517661,
     -73.84720052054902,
     40.89470517661]}},
  {'type': 'Feature',
   'id': 'nyu_2451_34572.2',
   'geometry': {'type': 'Point',
    'coordinates': [-73.82993910812398, 40.87429419303012]},
   'geometry_name': 'geom',
   'properties': {'name': 'Co-op City',
    'stacked': 2,
    'annoline1': 'Co-op',
    'annoline2': 'City',
    'annoline3': None,
    'annoangle': 0.0,
    'borough': 'Bronx',
    'bbox': [-73.82993910812398,
     40.87429419303012,
     -73.82993910812398,
     40.87429419303012]}},
  {'type': 'Feature',
 

Notice how all the relevant data is in the *features* key, which is basically a list of the neighborhoods. So, let's define a new variable that includes this data.

In [5]:
neighborhoods_data = newyork_data['features']

Let's take a look at the first item in this list.

In [6]:
neighborhoods_data[0]

{'type': 'Feature',
 'id': 'nyu_2451_34572.1',
 'geometry': {'type': 'Point',
  'coordinates': [-73.84720052054902, 40.89470517661]},
 'geometry_name': 'geom',
 'properties': {'name': 'Wakefield',
  'stacked': 1,
  'annoline1': 'Wakefield',
  'annoline2': None,
  'annoline3': None,
  'annoangle': 0.0,
  'borough': 'Bronx',
  'bbox': [-73.84720052054902,
   40.89470517661,
   -73.84720052054902,
   40.89470517661]}}

#### Tranform the data into a *pandas* dataframe

The next task is essentially transforming this data of nested Python dictionaries into a *pandas* dataframe. So let's start by creating an empty dataframe.

In [7]:
# define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 

# instantiate the dataframe
neighborhoods = pd.DataFrame(columns=column_names)

Take a look at the empty dataframe to confirm that the columns are as intended.

In [8]:
neighborhoods

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude


Then let's loop through the data and fill the dataframe one row at a time.

In [9]:
for data in neighborhoods_data:
    borough = neighborhood_name = data['properties']['borough'] 
    neighborhood_name = data['properties']['name']
        
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]
    
    neighborhoods = neighborhoods.append({'Borough': borough,
                                          'Neighborhood': neighborhood_name,
                                          'Latitude': neighborhood_lat,
                                          'Longitude': neighborhood_lon}, ignore_index=True)

Quickly examine the resulting dataframe.

In [10]:
neighborhoods.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Bronx,Wakefield,40.894705,-73.847201
1,Bronx,Co-op City,40.874294,-73.829939
2,Bronx,Eastchester,40.887556,-73.827806
3,Bronx,Fieldston,40.895437,-73.905643
4,Bronx,Riverdale,40.890834,-73.912585


And make sure that the dataset has all 5 boroughs and 306 neighborhoods.

In [11]:
print('The dataframe has {} boroughs and {} neighborhoods.'.format(
        len(neighborhoods['Borough'].unique()),
        neighborhoods.shape[0]
    )
)

The dataframe has 5 boroughs and 306 neighborhoods.


### Lets create a table consisting only of Manhattan neighborhoods

However, for illustration purposes, let's simplify the above map and segment and cluster only the neighborhoods in Manhattan. So let's slice the original dataframe and create a new dataframe of the Manhattan data.

In [14]:
manhattan_data = neighborhoods[neighborhoods['Borough'] == 'Manhattan'].reset_index(drop=True)
manhattan_data.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Manhattan,Marble Hill,40.876551,-73.91066
1,Manhattan,Chinatown,40.715618,-73.994279
2,Manhattan,Washington Heights,40.851903,-73.9369
3,Manhattan,Inwood,40.867684,-73.92121
4,Manhattan,Hamilton Heights,40.823604,-73.949688


Let's get the geographical coordinates of Manhattan.

In [16]:
address = 'Manhattan, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Manhattan are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of Manhattan are 40.7896239, -73.9598939.


As we did with all of New York City, let's visualizat Manhattan the neighborhoods in it.

In [17]:
# create map of Manhattan using latitude and longitude values
map_manhattan = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(manhattan_data['Latitude'], manhattan_data['Longitude'], manhattan_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_manhattan)  
    
map_manhattan

Next, we are going to start utilizing the Foursquare API to explore the neighborhoods and segment them.

#### Define Foursquare Credentials and Version

In [21]:
CLIENT_ID = 'FMPS5PKTOW4TFKW53XU3ES4AAUOWLCVHK0KVI131CG5RCLHI' # your Foursquare ID
CLIENT_SECRET = 'VNEBHFEU4QBZ02ESPKVJQKEDGX2R1F2OXUYC40RKCGTVHRBG' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

LIMIT=100
radius=500

Your credentails:
CLIENT_ID: FMPS5PKTOW4TFKW53XU3ES4AAUOWLCVHK0KVI131CG5RCLHI
CLIENT_SECRET:VNEBHFEU4QBZ02ESPKVJQKEDGX2R1F2OXUYC40RKCGTVHRBG


## 2. Explore Neighborhoods in Manhattan

#### Function to repeat the same process to all the neighborhoods in Manhattan

In [22]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [23]:


manhattan_venues = getNearbyVenues(names=manhattan_data['Neighborhood'],
                                   latitudes=manhattan_data['Latitude'],
                                   longitudes=manhattan_data['Longitude']
                                  )



Marble Hill
Chinatown
Washington Heights
Inwood
Hamilton Heights
Manhattanville
Central Harlem
East Harlem
Upper East Side
Yorkville
Lenox Hill
Roosevelt Island
Upper West Side
Lincoln Square
Clinton
Midtown
Murray Hill
Chelsea
Greenwich Village
East Village
Lower East Side
Tribeca
Little Italy
Soho
West Village
Manhattan Valley
Morningside Heights
Gramercy
Battery Park City
Financial District
Carnegie Hill
Noho
Civic Center
Midtown South
Sutton Place
Turtle Bay
Tudor City
Stuyvesant Town
Flatiron
Hudson Yards


Double-click __here__ for the solution.
<!-- The correct answer is:
manhattan_venues = getNearbyVenues(names=manhattan_data['Neighborhood'],
                                   latitudes=manhattan_data['Latitude'],
                                   longitudes=manhattan_data['Longitude']
                                  )
--> 

#### Resulting dataframe

In [24]:
print(manhattan_venues.shape)
manhattan_venues.head()

(2996, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Marble Hill,40.876551,-73.91066,Arturo's,40.874412,-73.910271,Pizza Place
1,Marble Hill,40.876551,-73.91066,Bikram Yoga,40.876844,-73.906204,Yoga Studio
2,Marble Hill,40.876551,-73.91066,Tibbett Diner,40.880404,-73.908937,Diner
3,Marble Hill,40.876551,-73.91066,Starbucks,40.877531,-73.905582,Coffee Shop
4,Marble Hill,40.876551,-73.91066,Dunkin',40.877136,-73.906666,Donut Shop


Number of venues returned for each neighborhood

In [25]:
manhattan_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Battery Park City,59,59,59,59,59,59
Carnegie Hill,85,85,85,85,85,85
Central Harlem,45,45,45,45,45,45
Chelsea,97,97,97,97,97,97
Chinatown,100,100,100,100,100,100
Civic Center,87,87,87,87,87,87
Clinton,100,100,100,100,100,100
East Harlem,43,43,43,43,43,43
East Village,100,100,100,100,100,100
Financial District,100,100,100,100,100,100


#### Let's find out how many unique categories can be curated from all the returned venues

In [26]:
print('There are {} uniques categories.'.format(len(manhattan_venues['Venue Category'].unique())))

There are 329 uniques categories.


In [58]:
x=manhattan_venues.groupby('Venue Category').count().reset_index()
x

Unnamed: 0,Venue Category,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude
0,Accessories Store,1,1,1,1,1,1
1,Adult Boutique,1,1,1,1,1,1
2,Afghan Restaurant,1,1,1,1,1,1
3,African Restaurant,3,3,3,3,3,3
4,American Restaurant,62,62,62,62,62,62
5,Antique Shop,1,1,1,1,1,1
6,Arcade,1,1,1,1,1,1
7,Arepa Restaurant,3,3,3,3,3,3
8,Argentinian Restaurant,4,4,4,4,4,4
9,Art Gallery,35,35,35,35,35,35


In [75]:
points_of_interest=x.iloc[[13,29,37,62,76,],0]

points_of_interest=['Athletics & Sports','Bike Rental / Bike Share','Boxing Gym','Climbing Gym','Cosmetics Shop','Cycle Studio',
                     'Dance Studio','Dog Run','Farmers Market', 'Garden','Gym','Gym / Fitness Center','Health & Beauty Service','Health Food Store','Martial Arts Dojo','Massage Studio','Park','Pilates Studio','Playground','Salad Place','Sporting Goods Shop','Supplement Shop','Yoga Studio']

points_of_interest

['Athletics & Sports',
 'Bike Rental / Bike Share',
 'Boxing Gym',
 'Climbing Gym',
 'Cosmetics Shop',
 'Cycle Studio',
 'Dance Studio',
 'Dog Run',
 'Farmers Market',
 'Garden',
 'Gym',
 'Gym / Fitness Center',
 'Health & Beauty Service',
 'Health Food Store',
 'Martial Arts Dojo',
 'Massage Studio',
 'Park',
 'Pilates Studio',
 'Playground',
 'Salad Place',
 'Sporting Goods Shop',
 'Supplement Shop',
 'Yoga Studio']

In [89]:

poi=manhattan_venues[manhattan_venues['Venue Category'].isin(points_of_interest)].reset_index(drop=True)
print(poi.shape)
poi.head()

(398, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Marble Hill,40.876551,-73.91066,Bikram Yoga,40.876844,-73.906204,Yoga Studio
1,Marble Hill,40.876551,-73.91066,Astral Fitness & Wellness Center,40.876705,-73.906372,Gym
2,Marble Hill,40.876551,-73.91066,Blink Fitness,40.877271,-73.905595,Gym
3,Marble Hill,40.876551,-73.91066,Vitamin Shoppe,40.87716,-73.905632,Supplement Shop
4,Chinatown,40.715618,-73.994279,Sky Ting Yoga,40.715352,-73.992666,Yoga Studio


In [121]:
map_poi = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(poi['Venue Latitude'], poi['Venue Longitude'], poi['Venue']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='green',
        fill=True,
        fill_color='green',
        fill_opacity=0.4,
        parse_html=False).add_to(map_poi)  
    
map_poi

## DBSCAN clustering of points of interest

<a id='item3'></a>

In [87]:
import numpy as np 
from sklearn.cluster import DBSCAN 
import sklearn.utils
from sklearn.preprocessing import StandardScaler

In [105]:

sklearn.utils.check_random_state(1000)
Clus_dataSet = poi[['Venue Latitude','Venue Longitude']]
Clus_dataSet = np.nan_to_num(Clus_dataSet)
Clus_dataSet = StandardScaler().fit_transform(Clus_dataSet)

# Compute DBSCAN
db = DBSCAN(eps=0.15, min_samples=10).fit(Clus_dataSet)
core_samples_mask = np.zeros_like(db.labels_, dtype=bool)
core_samples_mask[db.core_sample_indices_] = True
labels = db.labels_
poi["Clus_Db"]=labels

realClusterNum=len(set(labels)) - (1 if -1 in labels else 0)
clusterNum = len(set(labels)) 


# A sample of clusters
x=poi['Clus_Db'].unique()
x


print('We have', len(x), 'clusters of interest')

poi[["Venue","Venue Latitude","Venue Longitude","Clus_Db"]].head(20)

We have 10 clusters of interest


Unnamed: 0,Venue,Venue Latitude,Venue Longitude,Clus_Db
0,Bikram Yoga,40.876844,-73.906204,-1
1,Astral Fitness & Wellness Center,40.876705,-73.906372,-1
2,Blink Fitness,40.877271,-73.905595,-1
3,Vitamin Shoppe,40.87716,-73.905632,-1
4,Sky Ting Yoga,40.715352,-73.992666,-1
5,oo35mm.com,40.716605,-73.99789,5
6,Highest Natural Point In Manhattan,40.852843,-73.93765,-1
7,Bennett Park,40.852967,-73.937874,-1
8,The Vitamin Shoppe,40.850117,-73.935131,-1
9,Planet Fitness,40.847536,-73.937937,-1


### Cluster Visualization

In [106]:
kclusters=len(x)

map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, p, cluster in zip(poi['Venue Latitude'], poi['Venue Longitude'], poi['Venue'], poi['Clus_Db']):
    label = folium.Popup(str(p) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

### Determening the location of existing Juice Bars

#### In order not to set up our juice bar near other venues of the same king

In [107]:
juice_bars=manhattan_venues[manhattan_venues['Venue Category']=='Juice Bar'].reset_index(drop=True)
print(juice_bars.shape)
juice_bars.head()

(26, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Chinatown,40.715618,-73.994279,Hawa Smoothies,40.7142,-73.98939,Juice Bar
1,Inwood,40.867684,-73.92121,disFruta,40.864613,-73.919199,Juice Bar
2,Manhattanville,40.816934,-73.957385,Oasis Juice Bar,40.815017,-73.958879,Juice Bar
3,Central Harlem,40.815976,-73.943211,Rejuvenate,40.813872,-73.944142,Juice Bar
4,Upper East Side,40.775639,-73.960508,JOE & THE JUICE,40.775665,-73.958473,Juice Bar


#### We can visualize the location of other juice bars

In [108]:
map_juice = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(juice_bars['Venue Latitude'],juice_bars['Venue Longitude'], juice_bars['Venue']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_juice)  
    
map_juice

#### And now, to determine the best location for our new juice bar we need to cross-refference the locations of existing juice bars with the spread of clusters of points of interes

In [119]:
competition = folium.map.FeatureGroup()


for lat, lng, in zip(juice_bars['Venue Latitude'],juice_bars['Venue Longitude']):
    competition.add_child(
        folium.features.CircleMarker(
            [lat, lng],
            icon=folium.Icon(color='blue'),
            radius=8, 
            color='yellow',
            fill=True,
            fill_color='blue',
            fill_opacity=0.6
        )
    )

# add incidents to map
map_clusters.add_child(competition)

<a id='item4'></a>

In [131]:


r=manhattan_venues[manhattan_venues['Venue Category']=='Sporting Goods Shop']
r



Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
203,Washington Heights,40.851903,-73.9369,Modell's Sporting Goods,40.849397,-73.934406,Sporting Goods Shop
713,Lenox Hill,40.768113,-73.95886,New Balance,40.76755,-73.962297,Sporting Goods Shop
730,Lenox Hill,40.768113,-73.95886,Nike Running,40.766419,-73.962846,Sporting Goods Shop
943,Clinton,40.759101,-73.996119,Pan Aqua Diving,40.759939,-73.99452,Sporting Goods Shop
1058,Midtown,40.754691,-73.981669,NBA Store,40.755305,-73.979377,Sporting Goods Shop
1063,Midtown,40.754691,-73.981669,adidas,40.756222,-73.979094,Sporting Goods Shop
1404,Greenwich Village,40.726933,-73.999914,Nike Soho,40.723241,-73.998898,Sporting Goods Shop
1706,Little Italy,40.719324,-73.997305,Nike Soho,40.723241,-73.998898,Sporting Goods Shop
1728,Soho,40.722184,-74.000657,Nike Soho,40.723241,-73.998898,Sporting Goods Shop
2442,Civic Center,40.715229,-74.005415,Best Made Company,40.718881,-74.004745,Sporting Goods Shop


In [None]:
['Athletics & Sports',
 'Bike Rental / Bike Share',
 'Boxing Gym',
 'Climbing Gym',
 'Cosmetics Shop',
 'Cycle Studio',
 'Dance Studio',
 'Dog Run',
 'Farmers Market',
 'Garden',
 'Gym',
 'Gym / Fitness Center',
 'Health & Beauty Service',
 'Health Food Store',
 'Martial Arts Dojo',
 'Massage Studio',
 'Park',
 'Pilates Studio',
 'Playground',
 'Salad Place',
 'Sporting Goods Shop',
 'Supplement Shop',
 'Yoga Studio']