##    RECOMMENDING WHERE TO OPEN A RESTAURANT 

#### In this project we aim to identify the neighborhoods that have fewer restaurants, since those neighborhoods may offer a better opportunity for someone who wants to open a restaurant

Before we get the data and start exploring it, let's download all the dependencies that we will need.

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes # installs geopy; comment this line out if geopy is already installed
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

import folium # map rendering library

print('Libraries imported.')

Collecting package metadata (current_repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /home/jupyterlab/conda/envs/python

  added / updated specs:
    - geopy


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    geographiclib-1.50         |             py_0          34 KB  conda-forge
    geopy-1.22.0               |     pyh9f0ad1d_0          63 KB  conda-forge
    ------------------------------------------------------------
                                           Total:          97 KB

The following NEW packages will be INSTALLED:

  geographiclib      conda-forge/noarch::geographiclib-1.50-py_0
  geopy              conda-forge/noarch::geopy-1.22.0-pyh9f0ad1d_0



Downloading and Extracting Packages
geopy-1.22.0         | 63 KB     | ##################################### | 100% 
geographiclib-1.50   | 34 KB     | ###############################

<a id='item1'></a>

## Download and Explore Dataset

New York City has a total of 5 boroughs and 306 neighborhoods. In order to segment the neighborhoods and explore them, we need a dataset that contains the 5 boroughs and the neighborhoods that exist in each borough, as well as the latitude and longitude coordinates of each neighborhood.

Luckily, this dataset is freely available on the web at https://geo.nyu.edu/catalog/nyu_2451_34572

Simply run a `wget` command to download the data.

In [2]:
!wget -q -O 'newyork_data.json' https://cocl.us/new_york_dataset
print('Data downloaded!')

Data downloaded!


#### Load and explore the data

Next, let's load the data.

In [3]:
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)

Let's take a quick look at the data.

In [4]:
newyork_data

{'type': 'FeatureCollection',
 'totalFeatures': 306,
 'features': [{'type': 'Feature',
   'id': 'nyu_2451_34572.1',
   'geometry': {'type': 'Point',
    'coordinates': [-73.84720052054902, 40.89470517661]},
   'geometry_name': 'geom',
   'properties': {'name': 'Wakefield',
    'stacked': 1,
    'annoline1': 'Wakefield',
    'annoline2': None,
    'annoline3': None,
    'annoangle': 0.0,
    'borough': 'Bronx',
    'bbox': [-73.84720052054902,
     40.89470517661,
     -73.84720052054902,
     40.89470517661]}},
  {'type': 'Feature',
   'id': 'nyu_2451_34572.2',
   'geometry': {'type': 'Point',
    'coordinates': [-73.82993910812398, 40.87429419303012]},
   'geometry_name': 'geom',
   'properties': {'name': 'Co-op City',
    'stacked': 2,
    'annoline1': 'Co-op',
    'annoline2': 'City',
    'annoline3': None,
    'annoangle': 0.0,
    'borough': 'Bronx',
    'bbox': [-73.82993910812398,
     40.87429419303012,
     -73.82993910812398,
     40.87429419303012]}},
  {'type': 'Feature',
 

Notice how all the relevant data is in the *features* key, which is basically a list of the neighborhoods. So, let's define a new variable that includes this data.

In [5]:
neighborhoods_data = newyork_data['features']

Let's take a look at the first item in this list.

In [6]:
neighborhoods_data[0]

{'type': 'Feature',
 'id': 'nyu_2451_34572.1',
 'geometry': {'type': 'Point',
  'coordinates': [-73.84720052054902, 40.89470517661]},
 'geometry_name': 'geom',
 'properties': {'name': 'Wakefield',
  'stacked': 1,
  'annoline1': 'Wakefield',
  'annoline2': None,
  'annoline3': None,
  'annoangle': 0.0,
  'borough': 'Bronx',
  'bbox': [-73.84720052054902,
   40.89470517661,
   -73.84720052054902,
   40.89470517661]}}

#### Transform the data into a *pandas* dataframe

The next task is essentially transforming this data of nested Python dictionaries into a *pandas* dataframe. So let's start by creating an empty dataframe.

In [7]:
# define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 

# instantiate the dataframe
neighborhoods = pd.DataFrame(columns=column_names)

Take a look at the empty dataframe to confirm that the columns are as intended.

In [8]:
neighborhoods

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude


Then let's loop through the data, collect one row per neighborhood, and build the dataframe from those rows.

In [9]:
rows = []
for data in neighborhoods_data:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']

    # GeoJSON stores coordinates as [longitude, latitude]
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]

    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})

# DataFrame.append was removed in pandas 2.0, so build the frame from the collected rows
neighborhoods = pd.DataFrame(rows, columns=column_names)

In [10]:
neighborhoods.head()


Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Bronx,Wakefield,40.894705,-73.847201
1,Bronx,Co-op City,40.874294,-73.829939
2,Bronx,Eastchester,40.887556,-73.827806
3,Bronx,Fieldston,40.895437,-73.905643
4,Bronx,Riverdale,40.890834,-73.912585
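The same table can also be built without an explicit loop. The following is a minimal sketch, assuming pandas 1.0 or later (where `pd.json_normalize` is available at the top level). The two sample features are copied from the output above and stand in for `newyork_data['features']`; the variable names `flat`, `coords` and `table` are illustrative only.

```python
import pandas as pd

# Two features copied from the notebook output; the real input would be newyork_data['features'].
features = [
    {"geometry": {"type": "Point", "coordinates": [-73.84720052054902, 40.89470517661]},
     "properties": {"name": "Wakefield", "borough": "Bronx"}},
    {"geometry": {"type": "Point", "coordinates": [-73.82993910812398, 40.87429419303012]},
     "properties": {"name": "Co-op City", "borough": "Bronx"}},
]

# Nested dicts become dotted column names; the coordinate list stays a list.
flat = pd.json_normalize(features)

# GeoJSON order is [longitude, latitude], so name the columns accordingly.
coords = pd.DataFrame(flat["geometry.coordinates"].tolist(),
                      columns=["Longitude", "Latitude"])

table = pd.DataFrame({
    "Borough": flat["properties.borough"],
    "Neighborhood": flat["properties.name"],
    "Latitude": coords["Latitude"],
    "Longitude": coords["Longitude"],
})
print(table)
```

Whether this reads better than the loop is a matter of taste; the result should have the same four columns.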


In [11]:
len(neighborhoods['Borough'].unique())

5

In [12]:
print('The dataframe has {} boroughs and {} neighborhoods.'.format(
        len(neighborhoods['Borough'].unique()),
        neighborhoods.shape[0]
    )
)

The dataframe has 5 boroughs and 306 neighborhoods.


In [35]:
neighborhoods.shape

(306, 4)

Use the geopy library to get the latitude and longitude values of New York City.

In order to define an instance of the geocoder, we need to define a user_agent. We will name our agent <em>ny_explorer</em>, as shown below.

In [40]:
address = 'New York City, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of New York City are {}, {}.'.format(latitude, longitude))

The geographical coordinates of New York City are 40.7127281, -74.0060152.
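geopy geocoders return `None` when an address cannot be matched, in which case `location.latitude` would raise an `AttributeError`. A minimal sketch of a guard follows. The helper name `coordinates_or_default` and the offline stub `fake_geocode` are hypothetical; in the notebook the first argument would be `geolocator.geocode`.

```python
from types import SimpleNamespace


def coordinates_or_default(geocode, address, default):
    """Return (latitude, longitude) for address, or default when nothing matches."""
    location = geocode(address)
    if location is None:  # no match found for this address
        return default
    return (location.latitude, location.longitude)


def fake_geocode(address):
    """Hypothetical stand-in for geolocator.geocode so the sketch runs offline."""
    known = {
        "New York City, NY": SimpleNamespace(latitude=40.7127281, longitude=-74.0060152),
    }
    return known.get(address)


found = coordinates_or_default(fake_geocode, "New York City, NY", (0.0, 0.0))
missing = coordinates_or_default(fake_geocode, "No Such Place", (0.0, 0.0))
print(found, missing)
```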


#### Create a map of New York with neighborhoods superimposed on top.

In [41]:
# create map of New York using latitude and longitude values
map_newyork = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(neighborhoods['Latitude'], neighborhoods['Longitude'], neighborhoods['Borough'], neighborhoods['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_newyork)  
    
map_newyork

**Folium** is a great visualization library. Feel free to zoom into the above map, and click on each circle mark to reveal the name of the neighborhood and its respective borough.

#### Define Foursquare Credentials and Version

In [15]:
CLIENT_ID = '3ZHPI4H2VNH0BYJURIRTVF244KLZEA2N51VUYOXD233GG3WR' # your Foursquare ID
CLIENT_SECRET = 'ZQZNVKQQFFVQLMZJERF2V1B0GUCPQMUSCPVHOIURDHGWQRVU' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: 3ZHPI4H2VNH0BYJURIRTVF244KLZEA2N51VUYOXD233GG3WR
CLIENT_SECRET:ZQZNVKQQFFVQLMZJERF2V1B0GUCPQMUSCPVHOIURDHGWQRVU
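Hard-coded credentials end up in every shared copy of a notebook. One common alternative is to read them from environment variables. This is a sketch under assumptions: the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `load_credentials` are made up for illustration and are not part of the Foursquare API.

```python
import os


def load_credentials(env=os.environ):
    """Read the two credentials from a mapping (the variable names are assumptions)."""
    try:
        return env["FOURSQUARE_CLIENT_ID"], env["FOURSQUARE_CLIENT_SECRET"]
    except KeyError as missing:
        raise RuntimeError(f"Set the environment variable {missing} first") from None


# A plain dict stands in for os.environ so the sketch runs anywhere.
demo_env = {"FOURSQUARE_CLIENT_ID": "my-id", "FOURSQUARE_CLIENT_SECRET": "my-secret"}
client_id, client_secret = load_credentials(demo_env)
print("credentials loaded:", bool(client_id and client_secret))
```

In the notebook itself, `CLIENT_ID, CLIENT_SECRET = load_credentials()` would then replace the two literal strings.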


Using a latitude and longitude in New York City, we query Foursquare for restaurants within a given radius of that point.

In [16]:
neighborhood_latitude=40.7280
neighborhood_longitude=-74.0060
radius=1000
LIMIT=100
query='Restaurant'
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, neighborhood_latitude, neighborhood_longitude, VERSION,query, radius, LIMIT)
url

'https://api.foursquare.com/v2/venues/search?client_id=3ZHPI4H2VNH0BYJURIRTVF244KLZEA2N51VUYOXD233GG3WR&client_secret=ZQZNVKQQFFVQLMZJERF2V1B0GUCPQMUSCPVHOIURDHGWQRVU&ll=40.728,-74.006&v=20180605&query=Restaurant&radius=1000&limit=100'
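The URL above is assembled with `str.format`, which works for a one-word query but would not escape spaces or other special characters. A minimal sketch using the standard library's `urlencode` follows; the function name `build_search_url`, the placeholder credentials and the example query `"Thai Restaurant"` are illustrative assumptions, while the endpoint and parameter names are the ones used in the cell above.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_search_url(client_id, client_secret, lat, lng, version, query, radius, limit):
    """Build the venues/search URL with every parameter percent-encoded."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "ll": f"{lat},{lng}",
        "v": version,
        "query": query,
        "radius": radius,
        "limit": limit,
    }
    return "https://api.foursquare.com/v2/venues/search?" + urlencode(params)


url = build_search_url("YOUR_ID", "YOUR_SECRET", 40.728, -74.006,
                       "20180605", "Thai Restaurant", 1000, 100)

# Decode the query string again to confirm the values survive the round trip.
decoded = parse_qs(urlsplit(url).query)
print(decoded["query"], decoded["ll"])
```

`requests.get(base_url, params=params)` performs the same encoding internally, so passing the dictionary to `requests` directly is another option.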

Send the GET request and examine the results

In [17]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5ebc408b963d29001bcb3217'},
 'response': {'venues': [{'id': '4bbf6b6430c99c74302a5511',
    'name': 'PJ Charlton Italian Restaurant',
    'location': {'address': '549 Greenwich St',
     'lat': 40.72744633153894,
     'lng': -74.00865622880806,
     'labeledLatLngs': [{'label': 'display',
       'lat': 40.72744633153894,
       'lng': -74.00865622880806},
      {'label': 'entrance', 'lat': 40.727328, 'lng': -74.008749}],
     'distance': 232,
     'postalCode': '10013',
     'cc': 'US',
     'city': 'New York',
     'state': 'NY',
     'country': 'United States',
     'formattedAddress': ['549 Greenwich St',
      'New York, NY 10013',
      'United States']},
    'categories': [{'id': '4bf58dd8d48988d110941735',
      'name': 'Italian Restaurant',
      'pluralName': 'Italian Restaurants',
      'shortName': 'Italian',
      'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/italian_',
       'suffix': '.png'},
      'primary': True}],
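Each entry under `response` → `venues` is a nested dictionary like the one shown above. As a rough sketch of how one entry might be flattened into a row, the helper below pulls out the name, coordinates and first listed category. The name `venue_to_row` is hypothetical, and the sample is trimmed from the output above.

```python
def venue_to_row(venue):
    """Flatten one venue dict into (name, lat, lng, first listed category name)."""
    location = venue["location"]
    categories = venue.get("categories") or []
    category = categories[0]["name"] if categories else None
    return (venue["name"], location["lat"], location["lng"], category)


# Trimmed copy of the first venue in the response above.
sample = {
    "id": "4bbf6b6430c99c74302a5511",
    "name": "PJ Charlton Italian Restaurant",
    "location": {"lat": 40.72744633153894, "lng": -74.00865622880806},
    "categories": [{"name": "Italian Restaurant", "primary": True}],
}

row = venue_to_row(sample)
uncategorized = venue_to_row({"name": "X", "location": {"lat": 0.0, "lng": 0.0}})
print(row)
```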

#### Now we are ready to clean the json and structure it into a *pandas* dataframe.

<a id='item2'></a>

In [18]:
def getNearbyVenues(names, latitudes, longitudes, radius=1000,LIMIT=60):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
       
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/search?&client_id={}&client_secret={}&v={}&ll={},{}&query={}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng,
            query,
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['venues']
        
   
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['name'], 
            v['location']['lat'], 
            v['location']['lng']) 
            for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude']
    
    return(nearby_venues)
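`getNearbyVenues` indexes `["response"]['venues']` directly, so a failed request (for example, one rejected because of rate limits) would surface as a `KeyError` partway through the 306 calls. A minimal sketch of a check on `meta.code`, the field visible in the response shown earlier, follows. The helper name `venues_from_payload` and the shape of the failing payload are assumptions.

```python
def venues_from_payload(payload):
    """Return the venue list from a search response, or raise if the call failed."""
    code = payload.get("meta", {}).get("code")
    if code != 200:
        raise RuntimeError(f"Foursquare request failed with meta.code={code}")
    return payload.get("response", {}).get("venues", [])


ok = {"meta": {"code": 200}, "response": {"venues": [{"name": "Bay restaurant"}]}}
failed = {"meta": {"code": 429}, "response": {}}  # hypothetical error payload

names = [v["name"] for v in venues_from_payload(ok)]
try:
    venues_from_payload(failed)
    error_message = None
except RuntimeError as exc:
    error_message = str(exc)
print(names, error_message)
```

Inside the loop, `results = venues_from_payload(requests.get(url).json())` would be the corresponding call.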

#### Now run the above function on the New York data and create a new dataframe called *newyork_venues*.

In [19]:
newyork_venues = getNearbyVenues(names=neighborhoods['Neighborhood'],
                                   latitudes=neighborhoods['Latitude'],
                                   longitudes=neighborhoods['Longitude']
                                  )

newyork_venues.head()

Wakefield
Co-op City
Eastchester
Fieldston
Riverdale
Kingsbridge
Marble Hill
Woodlawn
Norwood
Williamsbridge
Baychester
Pelham Parkway
City Island
Bedford Park
University Heights
Morris Heights
Fordham
East Tremont
West Farms
High  Bridge
Melrose
Mott Haven
Port Morris
Longwood
Hunts Point
Morrisania
Soundview
Clason Point
Throgs Neck
Country Club
Parkchester
Westchester Square
Van Nest
Morris Park
Belmont
Spuyten Duyvil
North Riverdale
Pelham Bay
Schuylerville
Edgewater Park
Castle Hill
Olinville
Pelham Gardens
Concourse
Unionport
Edenwald
Bay Ridge
Bensonhurst
Sunset Park
Greenpoint
Gravesend
Brighton Beach
Sheepshead Bay
Manhattan Terrace
Flatbush
Crown Heights
East Flatbush
Kensington
Windsor Terrace
Prospect Heights
Brownsville
Williamsburg
Bushwick
Bedford Stuyvesant
Brooklyn Heights
Cobble Hill
Carroll Gardens
Red Hook
Gowanus
Fort Greene
Park Slope
Cypress Hills
East New York
Starrett City
Canarsie
Flatlands
Mill Island
Manhattan Beach
Coney Island
Bath Beach
Borough Park
Dyker

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude
0,Wakefield,40.894705,-73.847201,Bay 241 Restaurant & Lounge,40.901347,-73.846554
1,Wakefield,40.894705,-73.847201,Big Daddy's Caribbean Taste Restaurant,40.899767,-73.857135
2,Wakefield,40.894705,-73.847201,Kaieteur Restaurant & Bakery,40.899768,-73.857184
3,Wakefield,40.894705,-73.847201,Cool Running Restaurant,40.898399,-73.84881
4,Wakefield,40.894705,-73.847201,Bay restaurant,40.89085,-73.84886



#### Let's check the size of the resulting dataframe

In [22]:
print(newyork_venues.shape)


(9042, 6)


Let's check how many venues were returned for each neighborhood

In [23]:
restaurants=newyork_venues.groupby('Neighborhood').count()
restaurants=restaurants.sort_values(by='Venue',ascending=True)
restaurants=restaurants.reset_index()
restaurants.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude
0,Somerville,1,1,1,1,1
1,Neponsit,1,1,1,1,1
2,Lighthouse Hill,1,1,1,1,1
3,Breezy Point,1,1,1,1,1
4,Manor Heights,1,1,1,1,1
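`groupby(...).count()` repeats the same number in every column, and a neighborhood for which the search returned nothing would not appear at all, even though such a neighborhood is exactly what this project is looking for. The sketch below, on a hypothetical miniature of `newyork_venues`, keeps one count column and restores zero-count neighborhoods with `reindex`. The venue name `"Some Diner"` and the column name `"Restaurant Count"` are made up for illustration.

```python
import pandas as pd

# Hypothetical miniature of newyork_venues: one row per returned venue.
venues = pd.DataFrame({
    "Neighborhood": ["Wakefield", "Wakefield", "Fieldston"],
    "Venue": ["Bay restaurant", "Cool Running Restaurant", "Some Diner"],
})

# Every neighborhood that was queried, including one with no results.
all_neighborhoods = ["Wakefield", "Fieldston", "Riverdale"]

counts = (
    venues.groupby("Neighborhood").size()        # one count per neighborhood
    .reindex(all_neighborhoods, fill_value=0)    # add back neighborhoods with no venues
    .rename("Restaurant Count")
    .sort_values()                               # fewest restaurants first
)
print(counts)
```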


### Apply k-means clustering to the neighborhoods

Run k-means to group the neighborhoods into 3 clusters.

In [24]:
kclusters = 3

restaurants_grouped_clustering = restaurants.drop('Neighborhood',axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(restaurants_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:190] 

array([2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], dtype=int32)
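The label numbers that k-means assigns are arbitrary: in the array above the rows are ordered by ascending restaurant count, yet the labels appear in the order 2, 0, 1. One way to avoid misreading them is to name each label by the mean count of its members. The sketch below assumes hypothetical counts and a helper called `name_clusters_by_mean`; in the notebook the inputs would be `restaurants['Venue']` and `kmeans.labels_`.

```python
from collections import defaultdict


def name_clusters_by_mean(counts, labels, names=("few", "moderate", "plenty")):
    """Map each cluster label to a name, ordered by the cluster's mean count."""
    totals = defaultdict(lambda: [0, 0])  # label -> [sum of counts, number of members]
    for count, label in zip(counts, labels):
        totals[label][0] += count
        totals[label][1] += 1
    # Sort the labels from lowest mean count to highest.
    by_mean = sorted(totals, key=lambda label: totals[label][0] / totals[label][1])
    return {label: names[rank] for rank, label in enumerate(by_mean)}


# Hypothetical counts sorted ascending, with labels in the same 2, 0, 1 order as above.
counts = [1, 2, 20, 22, 48, 50]
labels = [2, 2, 0, 0, 1, 1]

label_names = name_clusters_by_mean(counts, labels)
print(label_names)
```

With this mapping, the per-cluster tables can be titled by meaning rather than by label number.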

<a id='item3'></a>

<a id='item4'></a>

In [25]:
neighborhoods.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Bronx,Wakefield,40.894705,-73.847201
1,Bronx,Co-op City,40.874294,-73.829939
2,Bronx,Eastchester,40.887556,-73.827806
3,Bronx,Fieldston,40.895437,-73.905643
4,Bronx,Riverdale,40.890834,-73.912585


In [29]:
restaurants.insert(0,'Cluster Labels',kmeans.labels_)

Count the number of neighborhoods in each cluster.

In [30]:
restaurants1=restaurants.iloc[0:298,0:4]
restaurants1.head()
restaurants1['Cluster Labels'].value_counts()

1    137
2     94
0     67
Name: Cluster Labels, dtype: int64

##### Merge the cluster labels with the neighborhoods data for visualization

In [31]:
restaurants_merged = neighborhoods.join(restaurants1.set_index('Neighborhood'), on='Neighborhood')
restaurants_merged.head()
restaurants_merged1=restaurants_merged.drop(columns={'Neighborhood Latitude','Neighborhood Longitude'})
restaurants_merged1.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Cluster Labels
0,Bronx,Wakefield,40.894705,-73.847201,1.0
1,Bronx,Co-op City,40.874294,-73.829939,2.0
2,Bronx,Eastchester,40.887556,-73.827806,2.0
3,Bronx,Fieldston,40.895437,-73.905643,0.0
4,Bronx,Riverdale,40.890834,-73.912585,0.0


In [32]:
restaurants_merged1=restaurants_merged1.dropna(subset=['Latitude','Longitude','Cluster Labels'])
restaurants_merged1.isnull().sum()

Borough           0
Neighborhood      0
Latitude          0
Longitude         0
Cluster Labels    0
dtype: int64
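The labels show up as `1.0`, `2.0` and so on because the join leaves `NaN` in rows without a match, which forces the column to a floating-point type. A small sketch of the effect, and of converting back to integers once the missing rows are dropped, is shown below. The frames are hypothetical miniatures, and the third neighborhood name is invented to create an unmatched row.

```python
import pandas as pd

left = pd.DataFrame({"Neighborhood": ["Wakefield", "Co-op City", "Unmatched (hypothetical)"]})
labels = pd.DataFrame({"Neighborhood": ["Wakefield", "Co-op City"],
                       "Cluster Labels": [1, 2]})

# Left join on the neighborhood name; the unmatched row gets NaN.
merged = left.join(labels.set_index("Neighborhood"), on="Neighborhood")
print(merged["Cluster Labels"].dtype)  # float64, because of the NaN

# Drop the unmatched row, then restore integer labels.
cleaned = merged.dropna(subset=["Cluster Labels"]).copy()
cleaned["Cluster Labels"] = cleaned["Cluster Labels"].astype(int)
print(cleaned)
```

Integer labels also remove the need for `int(cluster)` when indexing a color list later.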

#### This is the final merged data

In [33]:
restaurants_merged1.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Cluster Labels
0,Bronx,Wakefield,40.894705,-73.847201,1.0
1,Bronx,Co-op City,40.874294,-73.829939,2.0
2,Bronx,Eastchester,40.887556,-73.827806,2.0
3,Bronx,Fieldston,40.895437,-73.905643,0.0
4,Bronx,Riverdale,40.890834,-73.912585,0.0


#### Finally, let's visualize the resulting clusters using folium

In [34]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(restaurants_merged1['Latitude'], restaurants_merged1['Longitude'], restaurants_merged1['Neighborhood'], restaurants_merged1['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)-1],
        fill=True,
        fill_color=rainbow[int(cluster)-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
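The cluster labels here start at 0, so `rainbow[int(cluster)-1]` relies on Python's negative indexing for label 0 and picks the last color in the list. Every label still gets its own color, only rotated by one position, as the sketch below illustrates. The hex values in `palette` are hypothetical stand-ins for the contents of `rainbow`.

```python
palette = ["#8000ff", "#80ffb4", "#ff0000"]  # hypothetical stand-in for `rainbow`

shifted = {}
direct = {}
for label in range(len(palette)):
    shifted[label] = palette[label - 1]  # as in the cell above; label 0 wraps to the last color
    direct[label] = palette[label]       # one color per label, in list order

print(shifted)
print(direct)
```

Indexing with `rainbow[int(cluster)]` would give the same map with the colors in list order.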

<a id='item5'></a>

## Examine Clusters

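#### This results section shows which neighborhoods belong to each cluster, which is useful for someone who wants to open a restaurant.

Because `restaurants` was sorted by ascending restaurant count before clustering, the label array printed earlier reads 2, then 0, then 1 from the fewest restaurants to the most.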

#### Cluster label 2: neighborhoods with very few restaurants

In [156]:
restaurants_merged1.loc[restaurants_merged1['Cluster Labels'] == 2, restaurants_merged1.columns[[1] + list(range(5, restaurants_merged1.shape[1]))]]

Unnamed: 0,Neighborhood
1,Co-op City
2,Eastchester
10,Baychester
12,City Island
24,Hunts Point
27,Clason Point
28,Throgs Neck
36,North Riverdale
40,Castle Hill
42,Pelham Gardens


#### Cluster label 1: neighborhoods with plenty of restaurants

In [157]:
restaurants_merged1.loc[restaurants_merged1['Cluster Labels'] == 1, restaurants_merged1.columns[[1] + list(range(5, restaurants_merged1.shape[1]))]]

Unnamed: 0,Neighborhood
0,Wakefield
5,Kingsbridge
6,Marble Hill
8,Norwood
9,Williamsbridge
11,Pelham Parkway
13,Bedford Park
14,University Heights
15,Morris Heights
16,Fordham


#### Cluster label 0: neighborhoods with a moderate number of restaurants

In [158]:
restaurants_merged1.loc[restaurants_merged1['Cluster Labels'] == 0, restaurants_merged1.columns[[1] + list(range(5, restaurants_merged1.shape[1]))]]

Unnamed: 0,Neighborhood
3,Fieldston
4,Riverdale
7,Woodlawn
22,Port Morris
26,Soundview
29,Country Club
31,Westchester Square
33,Morris Park
35,Spuyten Duyvil
37,Pelham Bay


### This is the end of the project
### This project aims to provide information about the number of restaurants in each neighborhood, which can help someone who is planning to open a restaurant