# Description of the problem<br>
New York City is the most populous city in the United States. With an estimated population of 8,336,817, New York is also the most densely populated major city in the United States. Located at the southern tip of the U.S. state of New York, the city is the center of the New York metropolitan area, the largest metropolitan area in the world by urban landmass. New York City comprises 5 boroughs sitting where the Hudson River meets the Atlantic Ocean.<br> 
Due to the high population the city has been hit hard by the recent pandemic. In these odds and unsafe times, the idea is to open a food delivery company which delivers menu items from every restaurant in a specific borough. An analysis for every borough will be made, the most suitable to open the food delivery company will be the one with the most restaurants in it. Finally, a map with the most crowded borough will be displayed.<br>
This analysis will be used to open the company as close as possible to the center of the borough (within 500 m). Another approach will be to consider opening the food delivery company in the neighborhood that has restaurants as common venues.  

# Data section

The FourSquare locations will be used in order to find the borough with the biggest number of restaurants. 
There will be analyzed all 5 boroughs which are: Bronx, Queens, Brooklyn, Manhattan and Staten Island.<br>
After importing all the necessary libraries, every borough's address will be used in order to obtain the geographical coordinates of them as in the following example:

#### Import necessary Libraries

In [1]:
import requests # library to handle requests
import pandas as pd # library for data analsysis
import numpy as np # library to handle data in a vectorized manner
import random # library for random number generation

!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values

# libraries for displaying images
from IPython.display import Image 
from IPython.core.display import HTML 
    
# tranforming json file into a pandas dataframe library
from pandas.io.json import json_normalize

!conda install -c conda-forge folium=0.5.0 --yes
import folium # plotting library

print('Folium installed')
print('Libraries imported.')

Collecting package metadata (current_repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /home/jupyterlab/conda/envs/python

  added / updated specs:
    - geopy


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    ca-certificates-2020.4.5.2 |       hecda079_0         147 KB  conda-forge
    certifi-2020.4.5.2         |   py36h9f0ad1d_0         152 KB  conda-forge
    geographiclib-1.50         |             py_0          34 KB  conda-forge
    geopy-1.22.0               |     pyh9f0ad1d_0          63 KB  conda-forge
    ------------------------------------------------------------
                                           Total:         395 KB

The following NEW packages will be INSTALLED:

  geographiclib      conda-forge/noarch::geographiclib-1.50-py_0
  geopy              conda-forge/noarch::geopy-1.22.0-pyh9f0ad1d_0

The following packages will b

#### Use geopy library to get the latitude and longitude values of Brooklyn.<br>
#### For each borough the address was changed, in order to obtain the correct geographical coordinates.

In [2]:
address = 'Brooklyn, New York, NY'

geolocator = Nominatim(user_agent="ny_boroughs")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Brooklyn borough are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Brooklyn borough are 40.6501038, -73.9495823.


# Methodology

#### Define Foursquare Credentials and Version

In [3]:
CLIENT_ID = '2ZPJZI54TDGE1YS52RQOZLAD2DPE4LL0EJWVEO2APDIIPUFS' # your Foursquare ID
CLIENT_SECRET = '0R0FJBE52HE51VYIGAAGLFLKKMWSSWCHBXCUANUCFE4C5PYE' # your Foursquare Secret
VERSION = '20200604'
LIMIT = 100
print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: 2ZPJZI54TDGE1YS52RQOZLAD2DPE4LL0EJWVEO2APDIIPUFS
CLIENT_SECRET:0R0FJBE52HE51VYIGAAGLFLKKMWSSWCHBXCUANUCFE4C5PYE


#### Search for restaurant  category within 500 meters from the borough's location

In [4]:
search_query = 'Restaurant'
radius = 500
print(search_query + ' .... OK!')

Restaurant .... OK!


#### Define the corresponding URL

In [5]:
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, search_query, radius, LIMIT)
url

'https://api.foursquare.com/v2/venues/search?client_id=2ZPJZI54TDGE1YS52RQOZLAD2DPE4LL0EJWVEO2APDIIPUFS&client_secret=0R0FJBE52HE51VYIGAAGLFLKKMWSSWCHBXCUANUCFE4C5PYE&ll=40.6501038,-73.9495823&v=20200604&query=Restaurant&radius=500&limit=100'

#### Send the GET Request and examine the results

In [6]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5ee64d284d8e2536ebddb460'},
 'response': {'venues': [{'id': '50b91d3de4b0802eb8488f39',
    'name': 'Kreyol Flavor Bakery & Restaurant',
    'location': {'address': '2816 Church Ave',
     'lat': 40.650820092517755,
     'lng': -73.95077705383301,
     'labeledLatLngs': [{'label': 'display',
       'lat': 40.650820092517755,
       'lng': -73.95077705383301}],
     'distance': 128,
     'postalCode': '11226',
     'cc': 'US',
     'city': 'Brooklyn',
     'state': 'NY',
     'country': 'United States',
     'formattedAddress': ['2816 Church Ave',
      'Brooklyn, NY 11226',
      'United States']},
    'categories': [{'id': '4bf58dd8d48988d144941735',
      'name': 'Caribbean Restaurant',
      'pluralName': 'Caribbean Restaurants',
      'shortName': 'Caribbean',
      'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/caribbean_',
       'suffix': '.png'},
      'primary': True}],
    'referralId': 'v-1592151318',
    'hasPerk': False

In [7]:
# assign relevant part of JSON to venues
venues = results['response']['venues']

# tranform venues into a dataframe
dataframe = json_normalize(venues)
#number of rows

print('Number of restaurants in Brooklyn:',dataframe.shape[0])
dataframe

  """


Number of restaurants in Brooklyn: 28


Unnamed: 0,id,name,categories,referralId,hasPerk,location.address,location.lat,location.lng,location.labeledLatLngs,location.distance,location.postalCode,location.cc,location.city,location.state,location.country,location.formattedAddress,location.crossStreet
0,50b91d3de4b0802eb8488f39,Kreyol Flavor Bakery & Restaurant,"[{'id': '4bf58dd8d48988d144941735', 'name': 'C...",v-1592151318,False,2816 Church Ave,40.65082,-73.950777,"[{'label': 'display', 'lat': 40.65082009251775...",128,11226.0,US,Brooklyn,NY,United States,"[2816 Church Ave, Brooklyn, NY 11226, United S...",
1,4b66446af964a520621b2be3,Golden Krust Caribbean Restaurant,"[{'id': '4bf58dd8d48988d144941735', 'name': 'C...",v-1592151318,False,2223 Church Avenue,40.649106,-73.949252,"[{'label': 'display', 'lat': 40.64910636834001...",114,11226.0,US,Brooklyn,NY,United States,"[2223 Church Avenue (at Nostrand Ave), Brookly...",at Nostrand Ave
2,4f32047919833175d60a1dc3,Annettes Restaurant,"[{'id': '4d4b7105d754a06374d81259', 'name': 'F...",v-1592151318,False,2847 Church Ave,40.650928,-73.949867,"[{'label': 'entrance', 'lat': 40.650901, 'lng'...",94,11226.0,US,Brooklyn,NY,United States,"[2847 Church Ave, Brooklyn, NY 11226, United S...",
3,4be36a22d27a20a12c41925b,Bake & Things Restaurant,"[{'id': '4bf58dd8d48988d144941735', 'name': 'C...",v-1592151318,False,184 E 35th St,40.650835,-73.944777,"[{'label': 'display', 'lat': 40.65083532574861...",413,11203.0,US,Brooklyn,NY,United States,[184 E 35th St (E35th Street and Church Avenue...,E35th Street and Church Avenue
4,4c0c025ca1b32d7fa0e49bf0,Kal's Bakery & Restaurant,"[{'id': '4bf58dd8d48988d144941735', 'name': 'C...",v-1592151318,False,3401 Church Ave,40.651083,-73.94581,"[{'label': 'display', 'lat': 40.65108332491816...",336,11203.0,US,Brooklyn,NY,United States,"[3401 Church Ave, Brooklyn, NY 11203, United S...",
5,4f32721319836c91c7d88db5,Fish Yard Restaurant,"[{'id': '4d4b7105d754a06374d81259', 'name': 'F...",v-1592151318,False,2713 Church Ave,40.650877,-73.951942,"[{'label': 'display', 'lat': 40.650877, 'lng':...",217,11226.0,US,Brooklyn,NY,United States,"[2713 Church Ave, Brooklyn, NY 11226, United S...",
6,4f321b2519836c91c7b676d1,Andy's Restaurant,"[{'id': '4d4b7105d754a06374d81259', 'name': 'F...",v-1592151318,False,3209 Church Ave,40.651131,-73.94731,"[{'label': 'display', 'lat': 40.651131, 'lng':...",223,11226.0,US,Brooklyn,NY,United States,"[3209 Church Ave, Brooklyn, NY 11226, United S...",
7,51be39c2498ec841cbc9fcd2,Healthy Juice and Restaurant,"[{'id': '4bf58dd8d48988d112941735', 'name': 'J...",v-1592151318,False,2180 Bedford Ave,40.650761,-73.956101,"[{'label': 'display', 'lat': 40.65076099999999...",555,11226.0,US,Brooklyn,NY,United States,"[2180 Bedford Ave (Church), Brooklyn, NY 11226...",Church
8,55962573498e4220d728565d,Family Stylez Restaurant,"[{'id': '4bf58dd8d48988d144941735', 'name': 'C...",v-1592151318,False,2710 Church Ave,40.650789,-73.952339,"[{'label': 'display', 'lat': 40.65078910490234...",245,11226.0,US,Brooklyn,NY,United States,"[2710 Church Ave (Rodgers), Brooklyn, NY 11226...",Rodgers
9,4f44b01e19836ed00195d6b0,Macky's Restaurant and Bakery,"[{'id': '4d4b7105d754a06374d81259', 'name': 'F...",v-1592151318,False,794 Rogers Ave,40.651342,-73.95253,"[{'label': 'display', 'lat': 40.651342, 'lng':...",284,11226.0,US,Brooklyn,NY,United States,"[794 Rogers Ave, Brooklyn, NY 11226, United St...",


#### Define information of interest and filter dataframe

In [8]:
# keep only columns that include venue name, and anything that is associated with location
filtered_columns = ['name', 'categories'] + [col for col in dataframe.columns if col.startswith('location.')] + ['id']
dataframe_filtered = dataframe.loc[:, filtered_columns]

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# filter the category for each row
dataframe_filtered['categories'] = dataframe_filtered.apply(get_category_type, axis=1)

# clean column names by keeping only last term
dataframe_filtered.columns = [column.split('.')[-1] for column in dataframe_filtered.columns]

print('Number of restaurants Brooklyn:',dataframe_filtered.shape[0])
dataframe_filtered

Number of restaurants Brooklyn: 28


Unnamed: 0,name,categories,address,lat,lng,labeledLatLngs,distance,postalCode,cc,city,state,country,formattedAddress,crossStreet,id
0,Kreyol Flavor Bakery & Restaurant,Caribbean Restaurant,2816 Church Ave,40.65082,-73.950777,"[{'label': 'display', 'lat': 40.65082009251775...",128,11226.0,US,Brooklyn,NY,United States,"[2816 Church Ave, Brooklyn, NY 11226, United S...",,50b91d3de4b0802eb8488f39
1,Golden Krust Caribbean Restaurant,Caribbean Restaurant,2223 Church Avenue,40.649106,-73.949252,"[{'label': 'display', 'lat': 40.64910636834001...",114,11226.0,US,Brooklyn,NY,United States,"[2223 Church Avenue (at Nostrand Ave), Brookly...",at Nostrand Ave,4b66446af964a520621b2be3
2,Annettes Restaurant,Food,2847 Church Ave,40.650928,-73.949867,"[{'label': 'entrance', 'lat': 40.650901, 'lng'...",94,11226.0,US,Brooklyn,NY,United States,"[2847 Church Ave, Brooklyn, NY 11226, United S...",,4f32047919833175d60a1dc3
3,Bake & Things Restaurant,Caribbean Restaurant,184 E 35th St,40.650835,-73.944777,"[{'label': 'display', 'lat': 40.65083532574861...",413,11203.0,US,Brooklyn,NY,United States,[184 E 35th St (E35th Street and Church Avenue...,E35th Street and Church Avenue,4be36a22d27a20a12c41925b
4,Kal's Bakery & Restaurant,Caribbean Restaurant,3401 Church Ave,40.651083,-73.94581,"[{'label': 'display', 'lat': 40.65108332491816...",336,11203.0,US,Brooklyn,NY,United States,"[3401 Church Ave, Brooklyn, NY 11203, United S...",,4c0c025ca1b32d7fa0e49bf0
5,Fish Yard Restaurant,Food,2713 Church Ave,40.650877,-73.951942,"[{'label': 'display', 'lat': 40.650877, 'lng':...",217,11226.0,US,Brooklyn,NY,United States,"[2713 Church Ave, Brooklyn, NY 11226, United S...",,4f32721319836c91c7d88db5
6,Andy's Restaurant,Food,3209 Church Ave,40.651131,-73.94731,"[{'label': 'display', 'lat': 40.651131, 'lng':...",223,11226.0,US,Brooklyn,NY,United States,"[3209 Church Ave, Brooklyn, NY 11226, United S...",,4f321b2519836c91c7b676d1
7,Healthy Juice and Restaurant,Juice Bar,2180 Bedford Ave,40.650761,-73.956101,"[{'label': 'display', 'lat': 40.65076099999999...",555,11226.0,US,Brooklyn,NY,United States,"[2180 Bedford Ave (Church), Brooklyn, NY 11226...",Church,51be39c2498ec841cbc9fcd2
8,Family Stylez Restaurant,Caribbean Restaurant,2710 Church Ave,40.650789,-73.952339,"[{'label': 'display', 'lat': 40.65078910490234...",245,11226.0,US,Brooklyn,NY,United States,"[2710 Church Ave (Rodgers), Brooklyn, NY 11226...",Rodgers,55962573498e4220d728565d
9,Macky's Restaurant and Bakery,Food,794 Rogers Ave,40.651342,-73.95253,"[{'label': 'display', 'lat': 40.651342, 'lng':...",284,11226.0,US,Brooklyn,NY,United States,"[794 Rogers Ave, Brooklyn, NY 11226, United St...",,4f44b01e19836ed00195d6b0


New York has a total of 5 boroughs and 306 neighborhoods. In order to segement the neighborhoods and explore them, a dataset is downloaded. It  contains the 5 boroughs and the neighborhoods that exist in each borough as well as the the latitude and logitude coordinates of each neighborhood. 

In [9]:
import json # library to handle JSON files
!wget -q -O 'newyork_data.json' https://cocl.us/new_york_dataset
print('Data downloaded!')

Data downloaded!


#### Load and explore the data

Next, let's load the data.

In [10]:
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)

Let's take a quick look at the data.

In [11]:
newyork_data

{'type': 'FeatureCollection',
 'totalFeatures': 306,
 'features': [{'type': 'Feature',
   'id': 'nyu_2451_34572.1',
   'geometry': {'type': 'Point',
    'coordinates': [-73.84720052054902, 40.89470517661]},
   'geometry_name': 'geom',
   'properties': {'name': 'Wakefield',
    'stacked': 1,
    'annoline1': 'Wakefield',
    'annoline2': None,
    'annoline3': None,
    'annoangle': 0.0,
    'borough': 'Bronx',
    'bbox': [-73.84720052054902,
     40.89470517661,
     -73.84720052054902,
     40.89470517661]}},
  {'type': 'Feature',
   'id': 'nyu_2451_34572.2',
   'geometry': {'type': 'Point',
    'coordinates': [-73.82993910812398, 40.87429419303012]},
   'geometry_name': 'geom',
   'properties': {'name': 'Co-op City',
    'stacked': 2,
    'annoline1': 'Co-op',
    'annoline2': 'City',
    'annoline3': None,
    'annoangle': 0.0,
    'borough': 'Bronx',
    'bbox': [-73.82993910812398,
     40.87429419303012,
     -73.82993910812398,
     40.87429419303012]}},
  {'type': 'Feature',
 

Notice how all the relevant data is in the *features* key, which is basically a list of the neighborhoods. So, let's define a new variable that includes this data.

In [12]:
neighborhoods_data = newyork_data['features']

#### Tranform the data into a *pandas* dataframe

The next task is essentially transforming this data of nested Python dictionaries into a *pandas* dataframe. So let's start by creating an empty dataframe.

In [13]:
# define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 

# instantiate the dataframe
neighborhoods = pd.DataFrame(columns=column_names)

Take a look at the empty dataframe to confirm that the columns are as intended.

In [14]:
neighborhoods

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude


Then let's loop through the data and fill the dataframe one row at a time.

In [15]:
for data in neighborhoods_data:
    borough = neighborhood_name = data['properties']['borough'] 
    neighborhood_name = data['properties']['name']
        
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]
    
    neighborhoods = neighborhoods.append({'Borough': borough,
                                        'Neighborhood': neighborhood_name,
                                          'Latitude': neighborhood_lat,
                                          'Longitude': neighborhood_lon}, ignore_index=True)

Quickly examine the resulting dataframe.

In [16]:
neighborhoods.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Bronx,Wakefield,40.894705,-73.847201
1,Bronx,Co-op City,40.874294,-73.829939
2,Bronx,Eastchester,40.887556,-73.827806
3,Bronx,Fieldston,40.895437,-73.905643
4,Bronx,Riverdale,40.890834,-73.912585


In [17]:
brooklyn_data = neighborhoods[neighborhoods['Borough']== 'Brooklyn'].reset_index(drop=True)
brooklyn_data.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Brooklyn,Bay Ridge,40.625801,-74.030621
1,Brooklyn,Bensonhurst,40.611009,-73.99518
2,Brooklyn,Sunset Park,40.645103,-74.010316
3,Brooklyn,Greenpoint,40.730201,-73.954241
4,Brooklyn,Gravesend,40.59526,-73.973471


Let's visualize Brooklyn with the neighborhoods in it.

In [38]:
# create map of Brooklyn using latitude and longitude values
map_brooklyn = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(brooklyn_data['Latitude'], brooklyn_data['Longitude'], brooklyn_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_brooklyn)  
    
map_brooklyn

#### Explore Neighborhoods in Brooklyn


In [19]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
#run the above function on each neighborhood and create a new dataframe called *brooklyn_venues*.
brooklyn_venues = getNearbyVenues(names=brooklyn_data['Neighborhood'],
                                   latitudes=brooklyn_data['Latitude'],
                                   longitudes=brooklyn_data['Longitude']
                                  )


Bay Ridge
Bensonhurst
Sunset Park
Greenpoint
Gravesend
Brighton Beach
Sheepshead Bay
Manhattan Terrace
Flatbush
Crown Heights
East Flatbush
Kensington
Windsor Terrace
Prospect Heights
Brownsville
Williamsburg
Bushwick
Bedford Stuyvesant
Brooklyn Heights
Cobble Hill
Carroll Gardens
Red Hook
Gowanus
Fort Greene
Park Slope
Cypress Hills
East New York
Starrett City
Canarsie
Flatlands
Mill Island
Manhattan Beach
Coney Island
Bath Beach
Borough Park
Dyker Heights
Gerritsen Beach
Marine Park
Clinton Hill
Sea Gate
Downtown
Boerum Hill
Prospect Lefferts Gardens
Ocean Hill
City Line
Bergen Beach
Midwood
Prospect Park South
Georgetown
East Williamsburg
North Side
South Side
Ocean Parkway
Fort Hamilton
Ditmas Park
Wingate
Rugby
Remsen Village
New Lots
Paerdegat Basin
Mill Basin
Fulton Ferry
Vinegar Hill
Weeksville
Broadway Junction
Dumbo
Homecrest
Highland Park
Madison
Erasmus


#### Analyze Each Neighborhood

In [21]:
# one hot encoding
brooklyn_onehot = pd.get_dummies(brooklyn_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
brooklyn_onehot['Neighborhood'] = brooklyn_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [brooklyn_onehot.columns[-1]] + list(brooklyn_onehot.columns[:-1])
brooklyn_onehot = brooklyn_onehot[fixed_columns]

brooklyn_onehot.head()

Unnamed: 0,Yoga Studio,Accessories Store,Adult Boutique,Airport Terminal,American Restaurant,Animal Shelter,Antique Shop,Arepa Restaurant,Argentinian Restaurant,Art Gallery,...,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Waterfront,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


#### Next, let's group rows by neighborhood and by taking the mean of the frequency of occurrence of each category

In [22]:
brooklyn_grouped = brooklyn_onehot.groupby('Neighborhood').mean().reset_index()
brooklyn_grouped

Unnamed: 0,Neighborhood,Yoga Studio,Accessories Store,Adult Boutique,Airport Terminal,American Restaurant,Animal Shelter,Antique Shop,Arepa Restaurant,Argentinian Restaurant,...,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Waterfront,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store
0,Bath Beach,0.000000,0.0,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,...,0.0,0.021739,0.021739,0.000000,0.0,0.000000,0.000000,0.000000,0.0,0.0
1,Bay Ridge,0.000000,0.0,0.0,0.0,0.036145,0.0,0.000000,0.0,0.0,...,0.0,0.012048,0.000000,0.012048,0.0,0.000000,0.000000,0.000000,0.0,0.0
2,Bedford Stuyvesant,0.000000,0.0,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,...,0.0,0.000000,0.000000,0.000000,0.0,0.000000,0.034483,0.034483,0.0,0.0
3,Bensonhurst,0.000000,0.0,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,...,0.0,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.0,0.0
4,Bergen Beach,0.000000,0.0,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,...,0.0,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
65,Vinegar Hill,0.000000,0.0,0.0,0.0,0.034483,0.0,0.034483,0.0,0.0,...,0.0,0.000000,0.000000,0.000000,0.0,0.034483,0.034483,0.034483,0.0,0.0
66,Weeksville,0.000000,0.0,0.0,0.0,0.062500,0.0,0.000000,0.0,0.0,...,0.0,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.0,0.0
67,Williamsburg,0.029412,0.0,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,...,0.0,0.000000,0.000000,0.000000,0.0,0.000000,0.029412,0.000000,0.0,0.0
68,Windsor Terrace,0.000000,0.0,0.0,0.0,0.035714,0.0,0.035714,0.0,0.0,...,0.0,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.035714,0.0,0.0


#### Let's print each neighborhood along with the top 5 most common venues

In [24]:
num_top_venues = 5

for hood in brooklyn_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = brooklyn_grouped[brooklyn_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Bath Beach----
                  venue  freq
0    Chinese Restaurant  0.07
1              Pharmacy  0.07
2           Pizza Place  0.04
3            Donut Shop  0.04
4  Fast Food Restaurant  0.04


----Bay Ridge----
                 venue  freq
0   Italian Restaurant  0.07
1                  Spa  0.07
2          Pizza Place  0.06
3      Thai Restaurant  0.04
4  American Restaurant  0.04


----Bedford Stuyvesant----
           venue  freq
0    Coffee Shop  0.10
1  Deli / Bodega  0.07
2           Café  0.07
3    Pizza Place  0.07
4            Bar  0.07


----Bensonhurst----
                venue  freq
0  Chinese Restaurant  0.13
1      Ice Cream Shop  0.06
2          Donut Shop  0.06
3    Sushi Restaurant  0.06
4  Italian Restaurant  0.06


----Bergen Beach----
                venue  freq
0     Harbor / Marina   0.4
1          Playground   0.2
2  Athletics & Sports   0.2
3      Baseball Field   0.2
4         Yoga Studio   0.0


----Boerum Hill----
           venue  freq
0   Dance Stud

#### Let's put that into a *pandas* dataframe

First, let's write a function to sort the venues in descending order.

In [25]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Now let's create the new dataframe and display the top 10 venues for each neighborhood.

In [26]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = brooklyn_grouped['Neighborhood']

for ind in np.arange(brooklyn_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(brooklyn_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Bath Beach,Pharmacy,Chinese Restaurant,Gas Station,Bubble Tea Shop,Donut Shop,Pizza Place,Italian Restaurant,Fast Food Restaurant,Sandwich Place,Flower Shop
1,Bay Ridge,Spa,Italian Restaurant,Pizza Place,American Restaurant,Bar,Thai Restaurant,Greek Restaurant,Grocery Store,Playground,Sandwich Place
2,Bedford Stuyvesant,Coffee Shop,Pizza Place,Bar,Café,Deli / Bodega,Boutique,Tiki Bar,New American Restaurant,Basketball Court,Gift Shop
3,Bensonhurst,Chinese Restaurant,Ice Cream Shop,Sushi Restaurant,Donut Shop,Italian Restaurant,Grocery Store,Bakery,Hotpot Restaurant,Bagel Shop,Shabu-Shabu Restaurant
4,Bergen Beach,Harbor / Marina,Baseball Field,Playground,Athletics & Sports,Entertainment Service,Ethiopian Restaurant,Event Space,Factory,Falafel Restaurant,Farm


## Cluster Neighborhoods

Run *k*-means to cluster the neighborhood into 5 clusters.

In [28]:
# import k-means from clustering stage
from sklearn.cluster import KMeans
# set number of clusters
kclusters = 5

brooklyn_grouped_clustering = brooklyn_grouped.drop('Neighborhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(brooklyn_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([1, 1, 1, 1, 0, 1, 3, 1, 4, 1], dtype=int32)

Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

In [29]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

brooklyn_merged = brooklyn_data

# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
brooklyn_merged = brooklyn_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

brooklyn_merged.head() # check the last columns!

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Brooklyn,Bay Ridge,40.625801,-74.030621,1,Spa,Italian Restaurant,Pizza Place,American Restaurant,Bar,Thai Restaurant,Greek Restaurant,Grocery Store,Playground,Sandwich Place
1,Brooklyn,Bensonhurst,40.611009,-73.99518,1,Chinese Restaurant,Ice Cream Shop,Sushi Restaurant,Donut Shop,Italian Restaurant,Grocery Store,Bakery,Hotpot Restaurant,Bagel Shop,Shabu-Shabu Restaurant
2,Brooklyn,Sunset Park,40.645103,-74.010316,1,Mexican Restaurant,Bakery,Mobile Phone Shop,Pizza Place,Latin American Restaurant,Bank,Fried Chicken Joint,Gym,Creperie,Pharmacy
3,Brooklyn,Greenpoint,40.730201,-73.954241,1,Bar,Pizza Place,Coffee Shop,Cocktail Bar,Sushi Restaurant,French Restaurant,Deli / Bodega,Café,Yoga Studio,Polish Restaurant
4,Brooklyn,Gravesend,40.59526,-73.973471,1,Pizza Place,Lounge,Chinese Restaurant,Italian Restaurant,Bakery,Metro Station,Baseball Field,Liquor Store,Furniture / Home Store,Spa


Finally, let's visualize the resulting clusters

In [31]:

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(brooklyn_merged['Latitude'], brooklyn_merged['Longitude'], brooklyn_merged['Neighborhood'], brooklyn_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

## Examine Clusters

#### Cluster 1

In [33]:
brooklyn_merged.loc[brooklyn_merged['Cluster Labels'] == 0, brooklyn_merged.columns[[1] + list(range(5, brooklyn_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
45,Bergen Beach,Harbor / Marina,Baseball Field,Playground,Athletics & Sports,Entertainment Service,Ethiopian Restaurant,Event Space,Factory,Falafel Restaurant,Farm


#### Cluster 2

In [34]:
brooklyn_merged.loc[brooklyn_merged['Cluster Labels'] == 1, brooklyn_merged.columns[[1] + list(range(5, brooklyn_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Bay Ridge,Spa,Italian Restaurant,Pizza Place,American Restaurant,Bar,Thai Restaurant,Greek Restaurant,Grocery Store,Playground,Sandwich Place
1,Bensonhurst,Chinese Restaurant,Ice Cream Shop,Sushi Restaurant,Donut Shop,Italian Restaurant,Grocery Store,Bakery,Hotpot Restaurant,Bagel Shop,Shabu-Shabu Restaurant
2,Sunset Park,Mexican Restaurant,Bakery,Mobile Phone Shop,Pizza Place,Latin American Restaurant,Bank,Fried Chicken Joint,Gym,Creperie,Pharmacy
3,Greenpoint,Bar,Pizza Place,Coffee Shop,Cocktail Bar,Sushi Restaurant,French Restaurant,Deli / Bodega,Café,Yoga Studio,Polish Restaurant
4,Gravesend,Pizza Place,Lounge,Chinese Restaurant,Italian Restaurant,Bakery,Metro Station,Baseball Field,Liquor Store,Furniture / Home Store,Spa
5,Brighton Beach,Restaurant,Eastern European Restaurant,Russian Restaurant,Beach,Mobile Phone Shop,Sushi Restaurant,Gourmet Shop,Bank,Non-Profit,Bakery
6,Sheepshead Bay,Dessert Shop,Turkish Restaurant,Sandwich Place,Yoga Studio,Grocery Store,Creperie,Diner,Outlet Store,Restaurant,Chinese Restaurant
7,Manhattan Terrace,Pizza Place,Donut Shop,Ice Cream Shop,Coffee Shop,Chinese Restaurant,Steakhouse,Bank,Bagel Shop,Grocery Store,Organic Grocery
8,Flatbush,Deli / Bodega,Caribbean Restaurant,Pharmacy,Mexican Restaurant,Coffee Shop,Juice Bar,Liquor Store,Sandwich Place,Chinese Restaurant,Bank
9,Crown Heights,Pizza Place,Museum,Café,Liquor Store,Sushi Restaurant,Burger Joint,Candy Store,Supermarket,Bakery,Bagel Shop


#### Cluster 3

In [35]:
brooklyn_merged.loc[brooklyn_merged['Cluster Labels'] == 2, brooklyn_merged.columns[[1] + list(range(5, brooklyn_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
30,Mill Island,Pool,Other Repair Shop,Women's Store,Farmers Market,Ethiopian Restaurant,Event Space,Factory,Falafel Restaurant,Farm,Fast Food Restaurant


#### Cluster 4

In [36]:
brooklyn_merged.loc[brooklyn_merged['Cluster Labels'] == 3, brooklyn_merged.columns[[1] + list(range(5, brooklyn_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
14,Brownsville,Restaurant,Moving Target,Spanish Restaurant,Fried Chicken Joint,Pool,Pizza Place,Performing Arts Venue,Park,Chinese Restaurant,Farmers Market
25,Cypress Hills,Latin American Restaurant,Ice Cream Shop,Donut Shop,Fast Food Restaurant,Fried Chicken Joint,Metro Station,Spanish Restaurant,Bank,Pizza Place,Coffee Shop
26,East New York,Spanish Restaurant,Fried Chicken Joint,Pizza Place,Deli / Bodega,Pharmacy,Chinese Restaurant,Salon / Barbershop,Caribbean Restaurant,Bus Station,Music Venue
34,Borough Park,Bank,Pizza Place,Fast Food Restaurant,Deli / Bodega,Pharmacy,Restaurant,Bistro,Farmers Market,Café,Chinese Restaurant
46,Midwood,Pizza Place,Bakery,Candy Store,Video Game Store,Convenience Store,Pharmacy,Ice Cream Shop,Deli / Bodega,Food,Flower Shop
56,Rugby,Bank,Grocery Store,Caribbean Restaurant,Mobile Phone Shop,Bus Station,Fried Chicken Joint,Seafood Restaurant,Sandwich Place,Chinese Restaurant,Deli / Bodega
58,New Lots,Spanish Restaurant,Pizza Place,Chinese Restaurant,Fried Chicken Joint,Grocery Store,Park,Salon / Barbershop,Furniture / Home Store,Bank,Metro Station
60,Mill Basin,Chinese Restaurant,Pizza Place,Japanese Restaurant,Bagel Shop,Italian Restaurant,Bank,Frozen Yogurt Shop,Sushi Restaurant,Supermarket,Liquor Store
63,Weeksville,Chinese Restaurant,Discount Store,Laundry Service,Café,Liquor Store,Gas Station,Park,Grocery Store,Donut Shop,Juice Bar


#### Cluster 5

In [37]:
brooklyn_merged.loc[brooklyn_merged['Cluster Labels'] == 4, brooklyn_merged.columns[[1] + list(range(5, brooklyn_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
10,East Flatbush,Caribbean Restaurant,Pharmacy,Liquor Store,Food & Drink Shop,Chinese Restaurant,Park,Supermarket,Fast Food Restaurant,Moving Target,Department Store
27,Starrett City,Bus Station,Caribbean Restaurant,Bus Stop,Pizza Place,American Restaurant,Supermarket,Pharmacy,Donut Shop,Ethiopian Restaurant,Event Space
28,Canarsie,Gym,Food,Chinese Restaurant,Asian Restaurant,Caribbean Restaurant,Women's Store,Falafel Restaurant,Ethiopian Restaurant,Event Space,Factory
29,Flatlands,Pharmacy,Caribbean Restaurant,Fried Chicken Joint,Fast Food Restaurant,Paper / Office Supplies Store,Bar,Video Store,Discount Store,Lounge,Dry Cleaner
57,Remsen Village,Caribbean Restaurant,Fast Food Restaurant,Sandwich Place,Gym,Fish Market,Fried Chicken Joint,Supermarket,Spa,Donut Shop,Breakfast Spot
64,Broadway Junction,Donut Shop,Bus Stop,Diner,Bus Station,Fried Chicken Joint,Sandwich Place,Gas Station,Nightclub,Supermarket,Caribbean Restaurant
69,Erasmus,Caribbean Restaurant,Yoga Studio,Juice Bar,Convenience Store,Pharmacy,Donut Shop,Sandwich Place,Music Venue,Mobile Phone Shop,Food Truck


Analyzing the above map and dataframes, one can see the most common venue for each cluster:<br>
*Cluster 1* (label =0) – entertainment such as baseball field, playground, event space<br>
*Cluster 2* (label =1) – mostly restaurants and coffee shops<br>
*Cluster 3* (label =2) – it look a lot like cluster 1, having mostly entertainment venues<br>
*Cluster 4* (label =3) – Chinese and Japanese restaurants as well as pizza places<br>
*Cluster 5* (label =4) – Caribbean Restaurant, pharmacies and supermarkets