<h2>Predictive Analysis for the ride-hailing service</h2>

This notebook finds the venues currently trending across New York City and clusters them for analysis.
The first step is to import all the necessary packages.

In [1]:
# Install and import all the dependencies
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
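try:
    from pandas import json_normalize # transform JSON into a pandas dataframe (pandas >= 1.0)
except ImportError:
    from pandas.io.json import json_normalize # location of json_normalize in older pandas releases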

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes
import folium # map rendering library

print('Libraries imported.')

Solving environment: done

## Package Plan ##

  environment location: /home/jupyterlab/conda

  added / updated specs: 
    - geopy


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    geopy-1.18.1               |             py_0          51 KB  conda-forge
    geographiclib-1.49         |             py_0          32 KB  conda-forge
    ------------------------------------------------------------
                                           Total:          84 KB

The following NEW packages will be INSTALLED:

    geographiclib: 1.49-py_0     conda-forge

The following packages will be UPDATED:

    geopy:         1.11.0-py36_0 conda-forge --> 1.18.1-py_0 conda-forge


Downloading and Extracting Packages
geopy-1.18.1         | 51 KB     | ##################################### | 100% 
geographiclib-1.49   | 32 KB     | ##################################### | 100% 
Preparing transaction: done

<h3>1. Analyzing the New York Data set</h3>

The neighborhood data has already been gathered and saved to the 'Newyork_data.csv' file. We read the file here and load the data into the newyork_data dataframe.

In [2]:
#Read the New York neighborhoods data from the CSV file into a dataframe
newyork_data = pd.read_csv('Newyork_data.csv')
newyork_data.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
0,10453,Bronx,Central Bronx
1,10457,Bronx,Central Bronx
2,10460,Bronx,Central Bronx
3,10458,Bronx,Bronx Park and Fordham
4,10467,Bronx,Bronx Park and Fordham


<b>Getting the latitude and longitude values for New York City</b>

Using the geopy package, get the latitude and longitude of the location.

In [3]:
address = 'Newyork, NY'

geolocator = Nominatim(user_agent="ny_explorer") # recent geopy releases require a custom user_agent; "ny_explorer" is a placeholder
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of New York City are {}, {}.'.format(latitude, longitude))

The geographical coordinates of New York City are 40.83975585, -73.9414480148711.
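
The geocoding step assumes that a location is always found. geopy's `geocode` returns None when an address cannot be resolved, so a small guard makes the failure explicit. The sketch below is only an illustration: `coordinates_for` and `fake_geocode` are hypothetical names, and the fake geocoder stands in for the real service so the example runs offline.

```python
from collections import namedtuple


def coordinates_for(geocode, address):
    """Return (latitude, longitude) for an address, or raise if nothing is found.

    `geocode` is any callable that behaves like geopy's Nominatim.geocode:
    it returns an object with .latitude and .longitude, or None on no match.
    """
    location = geocode(address)
    if location is None:
        raise ValueError('No geocoding result for {!r}'.format(address))
    return location.latitude, location.longitude


# Stand-in geocoder (hypothetical, for illustration only) so no network is needed.
FakeLocation = namedtuple('FakeLocation', ['latitude', 'longitude'])


def fake_geocode(address):
    return FakeLocation(40.7128, -74.0060) if address == 'New York, NY' else None


print(coordinates_for(fake_geocode, 'New York, NY'))
```

In the notebook itself, the call would be `coordinates_for(geolocator.geocode, address)`.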


<b>Defining the credentials to be used for the Foursquare API</b>

In [4]:
CLIENT_ID = '4AEKLZ1UYSPNLB5EX2GGFFPGBWP2CPOOEXKSUTEOGOBAPLG4' # Needs to be changed when replicating
CLIENT_SECRET = 'H2EUNDFBGLAJDJZ5HYRSF3E2LJ3DYIVYFHNMLTTJ3UMQUISQ' # Needs to be changed when replicating
VERSION = '20190119'

In [5]:
LIMIT = 100 # maximum number of venues returned by the Foursquare API
radius = 32000 # search radius in meters

# create URL
url = 'https://api.foursquare.com/v2/venues/trending?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    latitude, 
    longitude, 
    radius, 
    LIMIT)

# display URL
print(url)

https://api.foursquare.com/v2/venues/trending?&client_id=4AEKLZ1UYSPNLB5EX2GGFFPGBWP2CPOOEXKSUTEOGOBAPLG4&client_secret=H2EUNDFBGLAJDJZ5HYRSF3E2LJ3DYIVYFHNMLTTJ3UMQUISQ&v=20190119&ll=40.83975585,-73.9414480148711&radius=32000&limit=100
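
As an alternative to formatting the query string by hand, the parameters can be kept in a dictionary and encoded with the standard library. This is a sketch rather than the notebook's own method: `build_trending_url` is a hypothetical helper and the credentials are placeholders. It also avoids the stray `&` that follows the `?` in the hand-built URL.

```python
from urllib.parse import urlencode


def build_trending_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Assemble the trending-venues URL from named parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    # safe=',' keeps the comma in "lat,lng" readable instead of encoding it as %2C
    return 'https://api.foursquare.com/v2/venues/trending?' + urlencode(params, safe=',')


# Placeholder credentials, for illustration only.
example_url = build_trending_url('YOUR_ID', 'YOUR_SECRET', '20190119', 40.75, -73.98, 32000, 100)
print(example_url)
```

Printing a URL built this way still exposes the client secret, so the `print` call is best dropped before a notebook is shared.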


<b>Let's call the API and store the returned data in results</b>

We use the requests package to fetch the URL and parse the JSON response.

In [6]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5c428d14f594df0e8ef29219'},
 'response': {'venues': [{'id': '4ace6c89f964a52078d020e3',
    'name': 'LaGuardia Airport (LGA) (LaGuardia Airport)',
    'location': {'address': 'Grand Central Pkwy',
     'lat': 40.77288813003166,
     'lng': -73.86880874633789,
     'labeledLatLngs': [{'label': 'display',
       'lat': 40.77288813003166,
       'lng': -73.86880874633789}],
     'distance': 9636,
     'postalCode': '11369',
     'cc': 'US',
     'city': 'East Elmhurst',
     'state': 'NY',
     'country': 'United States',
     'formattedAddress': ['Grand Central Pkwy',
      'East Elmhurst, NY 11369',
      'United States']},
    'categories': [{'id': '4bf58dd8d48988d1ed931735',
      'name': 'Airport',
      'pluralName': 'Airports',
      'shortName': 'Airport',
      'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/travel/airport_',
       'suffix': '.png'},
      'primary': True}],
    'venuePage': {'id': '72484665'}},
   {'id': '584f1224
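
The response carries its own status in `meta.code`, as the output above shows. A small check before the venues are used can surface quota or credential problems early. The sketch below assumes a response shaped like the one above. `extract_venues` is a hypothetical helper, and both sample payloads are made up for illustration.

```python
def extract_venues(payload):
    """Return the list of venues from a Foursquare-style response dictionary."""
    meta = payload.get('meta', {})
    if meta.get('code') != 200:
        raise RuntimeError('Foursquare request failed: {}'.format(meta))
    return payload.get('response', {}).get('venues', [])


# Made-up payloads that mimic the structure of the real response.
sample_ok = {'meta': {'code': 200}, 'response': {'venues': [{'name': 'Times Square'}]}}
sample_err = {'meta': {'code': 429, 'errorType': 'quota_exceeded'}, 'response': {}}

print(len(extract_venues(sample_ok)))
```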

<b>Let's normalize the JSON and store the data in t_venues</b>

json_normalize() flattens the nested JSON and presents the data in a tabular format. Nested keys become dotted column names such as 'location.lat'.

In [7]:
venues = results["response"]['venues']
t_venues = json_normalize(venues)
t_venues

Unnamed: 0,categories,delivery.id,delivery.provider.icon.name,delivery.provider.icon.prefix,delivery.provider.icon.sizes,delivery.provider.name,delivery.url,events.count,events.summary,id,location.address,location.cc,location.city,location.country,location.crossStreet,location.distance,location.formattedAddress,location.labeledLatLngs,location.lat,location.lng,location.neighborhood,location.postalCode,location.state,name,venuePage.id
0,"[{'id': '4bf58dd8d48988d1ed931735', 'name': 'A...",,,,,,,,,4ace6c89f964a52078d020e3,Grand Central Pkwy,US,East Elmhurst,United States,,9636,"[Grand Central Pkwy, East Elmhurst, NY 11369, ...","[{'label': 'display', 'lat': 40.77288813003166...",40.772888,-73.868809,,11369,NY,LaGuardia Airport (LGA) (LaGuardia Airport),72484665.0
1,"[{'id': '4bf58dd8d48988d1e5931735', 'name': 'M...",,,,,,,,,584f1224a55db06c5d0b0d17,319 Frost St,US,Brooklyn,United States,at Debevoise St,13413,"[319 Frost St (at Debevoise St), Brooklyn, NY ...","[{'label': 'display', 'lat': 40.71927698026867...",40.719277,-73.938823,,11222,NY,Brooklyn Steel,
2,"[{'id': '4bf58dd8d48988d17f941735', 'name': 'M...",,,,,,,30.0,30 movies,40afe980f964a5203bf31ee3,234 W 42nd St,US,New York,United States,btwn 7th & 8th Ave,10064,"[234 W 42nd St (btwn 7th & 8th Ave), New York,...",,40.756823,-73.98902,,10036,NY,AMC Empire 25,
3,"[{'id': '4bf58dd8d48988d137941735', 'name': 'T...",,,,,,,,,50de3d05e4b0b7819ae447ec,213 West 42nd Street,US,New York,United States,,10060,"[213 West 42nd Street, New York, NY 10036, Uni...","[{'label': 'display', 'lat': 40.7565, 'lng': -...",40.7565,-73.98788,,10036,NY,Lyric Theatre,
4,"[{'id': '4bf58dd8d48988d164941735', 'name': 'P...",,,,,,,,,49b7ed6df964a52030531fe3,Broadway & 7th Ave,US,New York,United States,btwn 42nd & 47th St,9791,"[Broadway & 7th Ave (btwn 42nd & 47th St), New...",,40.758323,-73.985376,,10036,NY,Times Square,
5,"[{'id': '4bf58dd8d48988d129951735', 'name': 'T...",,,,,,,,,42829c80f964a5206a221fe3,87 E 42nd St,US,New York,United States,btwn Vanderbilt & Park Ave,10148,"[87 E 42nd St (btwn Vanderbilt & Park Ave), Ne...",,40.752672,-73.977077,,10017,NY,Grand Central Terminal,91385129.0
6,"[{'id': '4bf58dd8d48988d180941735', 'name': 'M...",,,,,,,19.0,19 movies,453cacc7f964a520153c1fe3,570 2nd Ave,US,New York,United States,btwn 30th & 31st St,11187,"[570 2nd Ave (btwn 30th & 31st St), New York, ...",,40.742972,-73.977192,,10016,NY,AMC Loews Kips Bay 15,
7,"[{'id': '4bf58dd8d48988d11e941735', 'name': 'C...",672586.0,/delivery_provider_seamless_20180129.png,https://fastly.4sqi.net/img/general/cap/,"[40, 50]",seamless,https://www.seamless.com/menu/tanner-smiths-20...,,,54f276c5498e7a6fbeb24115,204 W 55th St,US,New York,United States,btwn 7th Ave & Broadway,9037,"[204 W 55th St (btwn 7th Ave & Broadway), New ...","[{'label': 'display', 'lat': 40.76448552196337...",40.764486,-73.981652,,10019,NY,Tanner Smiths,
8,"[{'id': '4bf58dd8d48988d17f941735', 'name': 'M...",,,,,,,16.0,16 movies,45b893e3f964a520cf411fe3,312 W 34th St,US,New York,United States,btwn 8th & 9th Ave,10691,"[312 W 34th St (btwn 8th & 9th Ave), New York,...",,40.752429,-73.994266,,10001,NY,AMC Loews 34th Street 14,
9,"[{'id': '4bf58dd8d48988d136941735', 'name': 'O...",,,,,,,,,48e480eef964a52022521fe3,70 Lincoln Center Plz,US,New York,United States,at Columbus Ave & W 64th St,8291,[70 Lincoln Center Plz (at Columbus Ave & W 64...,"[{'label': 'display', 'lat': 40.77274188001071...",40.772742,-73.984401,,10023,NY,The Metropolitan Opera (Metropolitan Opera),35504286.0
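
To see where the dotted column names in the table come from, here is a minimal sketch of the flattening on two hand-written records. It assumes pandas 1.0 or later, where `pd.json_normalize` is available at the top level. The records reuse values from the output above but are trimmed to a few keys.

```python
import pandas as pd

# Two trimmed-down venue records with a nested "location" dictionary.
sample_venues = [
    {'name': 'Times Square',
     'location': {'lat': 40.758323, 'lng': -73.985376, 'city': 'New York'}},
    {'name': 'Brooklyn Steel',
     'location': {'lat': 40.719277, 'lng': -73.938823, 'city': 'Brooklyn'}},
]

# Each nested key becomes a column named "<parent>.<child>".
flat = pd.json_normalize(sample_venues)
print(list(flat.columns))
```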


<b>Let's remove all the venues where the city is not one of the five boroughs of New York City</b>

Since the search radius is 32,000 meters (32 km), the API also returns some venues from neighboring cities. We keep only the rows whose 'location.city' value is 'New York' or one of the borough names.
The resulting dataset is stored in trending_venues.

In [8]:
# keep only the rows whose city is 'New York' or one of the borough names
nyc_cities = ['New York', 'Bronx', 'Brooklyn', 'Queens', 'Manhattan', 'Staten Island']
trending_venues = t_venues[t_venues['location.city'].isin(nyc_cities)]
trending_venues

Unnamed: 0,categories,delivery.id,delivery.provider.icon.name,delivery.provider.icon.prefix,delivery.provider.icon.sizes,delivery.provider.name,delivery.url,events.count,events.summary,id,location.address,location.cc,location.city,location.country,location.crossStreet,location.distance,location.formattedAddress,location.labeledLatLngs,location.lat,location.lng,location.neighborhood,location.postalCode,location.state,name,venuePage.id
1,"[{'id': '4bf58dd8d48988d1e5931735', 'name': 'M...",,,,,,,,,584f1224a55db06c5d0b0d17,319 Frost St,US,Brooklyn,United States,at Debevoise St,13413,"[319 Frost St (at Debevoise St), Brooklyn, NY ...","[{'label': 'display', 'lat': 40.71927698026867...",40.719277,-73.938823,,11222,NY,Brooklyn Steel,
2,"[{'id': '4bf58dd8d48988d17f941735', 'name': 'M...",,,,,,,30.0,30 movies,40afe980f964a5203bf31ee3,234 W 42nd St,US,New York,United States,btwn 7th & 8th Ave,10064,"[234 W 42nd St (btwn 7th & 8th Ave), New York,...",,40.756823,-73.98902,,10036,NY,AMC Empire 25,
3,"[{'id': '4bf58dd8d48988d137941735', 'name': 'T...",,,,,,,,,50de3d05e4b0b7819ae447ec,213 West 42nd Street,US,New York,United States,,10060,"[213 West 42nd Street, New York, NY 10036, Uni...","[{'label': 'display', 'lat': 40.7565, 'lng': -...",40.7565,-73.98788,,10036,NY,Lyric Theatre,
4,"[{'id': '4bf58dd8d48988d164941735', 'name': 'P...",,,,,,,,,49b7ed6df964a52030531fe3,Broadway & 7th Ave,US,New York,United States,btwn 42nd & 47th St,9791,"[Broadway & 7th Ave (btwn 42nd & 47th St), New...",,40.758323,-73.985376,,10036,NY,Times Square,
5,"[{'id': '4bf58dd8d48988d129951735', 'name': 'T...",,,,,,,,,42829c80f964a5206a221fe3,87 E 42nd St,US,New York,United States,btwn Vanderbilt & Park Ave,10148,"[87 E 42nd St (btwn Vanderbilt & Park Ave), Ne...",,40.752672,-73.977077,,10017,NY,Grand Central Terminal,91385129.0
6,"[{'id': '4bf58dd8d48988d180941735', 'name': 'M...",,,,,,,19.0,19 movies,453cacc7f964a520153c1fe3,570 2nd Ave,US,New York,United States,btwn 30th & 31st St,11187,"[570 2nd Ave (btwn 30th & 31st St), New York, ...",,40.742972,-73.977192,,10016,NY,AMC Loews Kips Bay 15,
7,"[{'id': '4bf58dd8d48988d11e941735', 'name': 'C...",672586.0,/delivery_provider_seamless_20180129.png,https://fastly.4sqi.net/img/general/cap/,"[40, 50]",seamless,https://www.seamless.com/menu/tanner-smiths-20...,,,54f276c5498e7a6fbeb24115,204 W 55th St,US,New York,United States,btwn 7th Ave & Broadway,9037,"[204 W 55th St (btwn 7th Ave & Broadway), New ...","[{'label': 'display', 'lat': 40.76448552196337...",40.764486,-73.981652,,10019,NY,Tanner Smiths,
8,"[{'id': '4bf58dd8d48988d17f941735', 'name': 'M...",,,,,,,16.0,16 movies,45b893e3f964a520cf411fe3,312 W 34th St,US,New York,United States,btwn 8th & 9th Ave,10691,"[312 W 34th St (btwn 8th & 9th Ave), New York,...",,40.752429,-73.994266,,10001,NY,AMC Loews 34th Street 14,
9,"[{'id': '4bf58dd8d48988d136941735', 'name': 'O...",,,,,,,,,48e480eef964a52022521fe3,70 Lincoln Center Plz,US,New York,United States,at Columbus Ave & W 64th St,8291,[70 Lincoln Center Plz (at Columbus Ave & W 64...,"[{'label': 'display', 'lat': 40.77274188001071...",40.772742,-73.984401,,10023,NY,The Metropolitan Opera (Metropolitan Opera),35504286.0
10,"[{'id': '4bf58dd8d48988d11e941735', 'name': 'C...",,,,,,,,,4dfd49c6813092a26e4eae3f,196 5th Ave,US,Brooklyn,United States,btwn Union St & Berkeley Pl,18428,"[196 5th Ave (btwn Union St & Berkeley Pl), Br...","[{'label': 'display', 'lat': 40.67683673413956...",40.676837,-73.980225,,11217,NY,Blueprint,97630057.0


Although we do not need the category data for the venues in this project, we extract the categories anyway to leave room for future extensions.

In [9]:
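# function that extracts the name of the first category listed for a venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        # some Foursquare endpoints nest the venue fields under a 'venue.' prefix
        categories_list = row['venue.categories']

    # a venue may have no categories at all
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']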
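
To illustrate what the helper returns, the sketch below repeats its logic so that it runs on its own. It is applied to three hand-written rows: one with a category, one that uses the 'venue.categories' key, and one with an empty list. The rows are made up for illustration. Plain dictionaries stand in for dataframe rows here because both raise KeyError on a missing key.

```python
def get_category_type(row):
    """Same logic as the notebook's helper, repeated so the sketch is self-contained."""
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
    if len(categories_list) == 0:
        return None
    return categories_list[0]['name']


# Hand-written rows covering the three cases the helper handles.
rows = [
    {'categories': [{'name': 'Airport', 'primary': True}]},
    {'venue.categories': [{'name': 'Plaza'}]},
    {'categories': []},
]

extracted = [get_category_type(row) for row in rows]
print(extracted)  # ['Airport', 'Plaza', None]
```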

Now, let's keep only the columns we need for the trending venues and replace each category list with the name of its first category.

In [10]:
# filter columns
filtered_columns = ['name', 'categories', 'location.lat', 'location.lng', 'location.postalCode']
trending_venues = trending_venues.loc[:, filtered_columns]

# filter the category for each row
trending_venues['categories'] = trending_venues.apply(get_category_type, axis=1)

# clean columns
trending_venues.columns = [col.split(".")[-1] for col in trending_venues.columns]

trending_venues

Unnamed: 0,name,categories,lat,lng,postalCode
1,Brooklyn Steel,Music Venue,40.719277,-73.938823,11222
2,AMC Empire 25,Movie Theater,40.756823,-73.98902,10036
3,Lyric Theatre,Theater,40.7565,-73.98788,10036
4,Times Square,Plaza,40.758323,-73.985376,10036
5,Grand Central Terminal,Train Station,40.752672,-73.977077,10017
6,AMC Loews Kips Bay 15,Multiplex,40.742972,-73.977192,10016
7,Tanner Smiths,Cocktail Bar,40.764486,-73.981652,10019
8,AMC Loews 34th Street 14,Movie Theater,40.752429,-73.994266,10001
9,The Metropolitan Opera (Metropolitan Opera),Opera House,40.772742,-73.984401,10023
10,Blueprint,Cocktail Bar,40.676837,-73.980225,11217


Let's rename 'postalCode' to 'PostalCode' so that it matches the column name in the newyork_data dataframe, and convert it to a numeric type so that the two columns can be merged.

In [11]:
#Let's rename postalCode to PostalCode and convert it to int64 datatype
trending_venues = trending_venues.rename(columns ={"postalCode": "PostalCode"})
trending_venues['PostalCode'] = pd.to_numeric(trending_venues['PostalCode'])
trending_venues.dtypes

name           object
categories     object
lat           float64
lng           float64
PostalCode      int64
dtype: object
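
The conversion above works because every postal code in this run is a clean five-digit string. Real-time data may include missing or extended codes, which `pd.to_numeric` rejects by default. One possible way to handle that, shown here on made-up values, is to coerce unparseable entries to NaN and drop them before casting to integers.

```python
import pandas as pd

# Made-up postal codes: two clean values, one missing, one in ZIP+4 form.
postal = pd.Series(['10036', '11222', None, '10001-2345'])

# errors='coerce' turns anything that is not a plain number into NaN.
codes = pd.to_numeric(postal, errors='coerce')

# Drop the NaN entries, then cast the remaining values to int64.
clean = codes.dropna().astype('int64')
print(clean.tolist())  # [10036, 11222]
```

Whether dropping such rows is acceptable depends on the analysis. Truncating a ZIP+4 code to its first five digits would be another option.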

In [12]:
#Let's merge trending_venues with newyork_data to get the neighborhood names for venues
newyork_venues = trending_venues.merge(newyork_data, on='PostalCode', how='left')

#Remove the rows for which no matching borough was found (NaN)
newyork_venues = newyork_venues.dropna(subset=['Borough'])
newyork_venues

Unnamed: 0,name,categories,lat,lng,PostalCode,Borough,Neighborhood
0,Brooklyn Steel,Music Venue,40.719277,-73.938823,11222,Brooklyn,Greenpoint
1,AMC Empire 25,Movie Theater,40.756823,-73.98902,10036,Manhattan,Chelsea and Clinton
2,Lyric Theatre,Theater,40.7565,-73.98788,10036,Manhattan,Chelsea and Clinton
3,Times Square,Plaza,40.758323,-73.985376,10036,Manhattan,Chelsea and Clinton
4,Grand Central Terminal,Train Station,40.752672,-73.977077,10017,Manhattan,Gramercy Park and Murray Hill
5,AMC Loews Kips Bay 15,Multiplex,40.742972,-73.977192,10016,Manhattan,Gramercy Park and Murray Hill
6,Tanner Smiths,Cocktail Bar,40.764486,-73.981652,10019,Manhattan,Chelsea and Clinton
7,AMC Loews 34th Street 14,Movie Theater,40.752429,-73.994266,10001,Manhattan,Chelsea and Clinton
8,The Metropolitan Opera (Metropolitan Opera),Opera House,40.772742,-73.984401,10023,Manhattan,Upper West Side
9,Blueprint,Cocktail Bar,40.676837,-73.980225,11217,Brooklyn,Northwest Brooklyn
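
When rows disappear after a left merge followed by a NaN filter, it helps to see which venues failed to match. The sketch below uses two made-up venues and a one-row lookup table. It relies on the `indicator` and `validate` options of `DataFrame.merge`. The `validate='many_to_one'` check assumes each postal code appears only once in the lookup table, which may not hold for the real file.

```python
import pandas as pd

# Made-up data: venue B has a postal code that is absent from the lookup table.
venues = pd.DataFrame({'name': ['A', 'B'], 'PostalCode': [10036, 99999]})
lookup = pd.DataFrame({'PostalCode': [10036],
                       'Borough': ['Manhattan'],
                       'Neighborhood': ['Chelsea and Clinton']})

# indicator=True adds a "_merge" column saying where each row came from;
# validate raises if a postal code occurs more than once in the lookup table.
merged = venues.merge(lookup, on='PostalCode', how='left',
                      indicator=True, validate='many_to_one')

unmatched = merged.loc[merged['_merge'] == 'left_only', 'name'].tolist()
matched = merged[merged['_merge'] == 'both'].drop(columns='_merge')
print(unmatched)  # ['B']
```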


<b>Let's create a map of New York City with the trending venues superimposed on it</b>

The map is generated using the Folium package. It gives a better understanding of how the trending venues are distributed across New York City in real time.

In [16]:
map_newyork = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for name, lat, lng, borough, neighborhood in zip(newyork_venues['name'],newyork_venues['lat'], newyork_venues['lng'], newyork_venues['Borough'], newyork_venues['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_newyork)  
    

map_newyork

<h3>2. Analyzing the Neighborhood using One hot encoding</h3>

K-means works on numeric features, so we one-hot encode the 'Neighborhood' column: each neighborhood becomes its own 0/1 column.

In [17]:
# one hot encoding
newyork_onehot = pd.get_dummies(newyork_venues[['Neighborhood']], prefix="", prefix_sep="")

# add the venue name column to the dataframe
newyork_onehot['name'] = newyork_venues['name'] 

# move the name column to the first position
fixed_columns = [newyork_onehot.columns[-1]] + list(newyork_onehot.columns[:-1])
newyork_onehot = newyork_onehot[fixed_columns]

newyork_onehot

Unnamed: 0,name,Chelsea and Clinton,Gramercy Park and Murray Hill,Greenpoint,Greenwich Village and Soho,Lower East Side,Northwest Brooklyn,Upper East Side,Upper West Side
0,Brooklyn Steel,0,0,1,0,0,0,0,0
1,AMC Empire 25,1,0,0,0,0,0,0,0
2,Lyric Theatre,1,0,0,0,0,0,0,0
3,Times Square,1,0,0,0,0,0,0,0
4,Grand Central Terminal,0,1,0,0,0,0,0,0
5,AMC Loews Kips Bay 15,0,1,0,0,0,0,0,0
6,Tanner Smiths,1,0,0,0,0,0,0,0
7,AMC Loews 34th Street 14,1,0,0,0,0,0,0,0
8,The Metropolitan Opera (Metropolitan Opera),0,0,0,0,0,0,0,1
9,Blueprint,0,0,0,0,0,1,0,0
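
A small made-up example shows what the encoding does. Note that recent pandas releases return boolean columns from `get_dummies` by default, so passing `dtype=int` is one way to keep the 0/1 values shown in the table above.

```python
import pandas as pd

# Three made-up venues in two neighborhoods.
demo = pd.DataFrame({
    'name': ['A', 'B', 'C'],
    'Neighborhood': ['Greenpoint', 'Chelsea and Clinton', 'Greenpoint'],
})

# One column per neighborhood, with a 1 marking the neighborhood of each venue.
onehot = pd.get_dummies(demo[['Neighborhood']], prefix='', prefix_sep='', dtype=int)
print(onehot)
```

Each row contains exactly one 1, so venues in the same neighborhood end up with identical feature rows.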


<h3>3. Cluster the Trending venues</h3>

Run k-means to group the trending venues across New York City into clusters. Because the data is fetched in real time, the number of venues changes from run to run, so the number of clusters should be chosen with the number of rows in newyork_onehot in mind. For this run we use 5 clusters.

In [18]:
# set number of clusters
kclusters = 5

newyork_cluster = newyork_onehot.drop(columns='name')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(newyork_cluster)

# check cluster labels generated for each row in the dataframe
kmeans.labels_

array([0, 2, 2, 2, 1, 1, 2, 2, 0, 4, 0, 0, 2, 1, 2, 1, 1, 1, 1, 2, 3, 4,
       2, 2, 3, 4, 3, 2], dtype=int32)
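
Since the number of venues varies between runs, a fixed cluster count can exceed what the data supports. scikit-learn's KMeans raises an error when there are fewer samples than clusters. One possible rule, sketched below, caps the count at the number of distinct feature rows. `choose_k` is a hypothetical helper, and the cap of 5 simply mirrors the value used in this notebook.

```python
import pandas as pd


def choose_k(features, max_k=5):
    """Pick a cluster count no larger than the number of distinct feature rows."""
    distinct_rows = len(features.drop_duplicates())
    return max(1, min(max_k, distinct_rows))


# Made-up one-hot features: three venues, two distinct neighborhoods.
features = pd.DataFrame({
    'Greenpoint': [1, 0, 0],
    'Chelsea and Clinton': [0, 1, 1],
})

print(choose_k(features))  # 2
```

With one-hot neighborhood features, the number of distinct rows equals the number of distinct neighborhoods, so this rule never asks for more clusters than there are neighborhoods.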

In [19]:
newyork_merged = newyork_venues.copy() # copy so that newyork_venues itself is left unchanged

# add clustering labels
newyork_merged['Cluster Labels'] = kmeans.labels_

newyork_merged # check the last columns!

Unnamed: 0,name,categories,lat,lng,PostalCode,Borough,Neighborhood,Cluster Labels
0,Brooklyn Steel,Music Venue,40.719277,-73.938823,11222,Brooklyn,Greenpoint,0
1,AMC Empire 25,Movie Theater,40.756823,-73.98902,10036,Manhattan,Chelsea and Clinton,2
2,Lyric Theatre,Theater,40.7565,-73.98788,10036,Manhattan,Chelsea and Clinton,2
3,Times Square,Plaza,40.758323,-73.985376,10036,Manhattan,Chelsea and Clinton,2
4,Grand Central Terminal,Train Station,40.752672,-73.977077,10017,Manhattan,Gramercy Park and Murray Hill,1
5,AMC Loews Kips Bay 15,Multiplex,40.742972,-73.977192,10016,Manhattan,Gramercy Park and Murray Hill,1
6,Tanner Smiths,Cocktail Bar,40.764486,-73.981652,10019,Manhattan,Chelsea and Clinton,2
7,AMC Loews 34th Street 14,Movie Theater,40.752429,-73.994266,10001,Manhattan,Chelsea and Clinton,2
8,The Metropolitan Opera (Metropolitan Opera),Opera House,40.772742,-73.984401,10023,Manhattan,Upper West Side,0
9,Blueprint,Cocktail Bar,40.676837,-73.980225,11217,Brooklyn,Northwest Brooklyn,4


<h3>4. Visualizing the data</h3>

Let's visualize the data by plotting the clustered venues on a map of New York City.

In [20]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one evenly spaced rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(newyork_merged['lat'], newyork_merged['lng'], newyork_merged['name'], newyork_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster], # labels are 0-based, so they index the color list directly
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

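<h3>5. Examine the Clusters</h3>

The cluster headings below are numbered from 1, while the k-means labels start at 0, so Cluster 1 holds the rows labeled 0, Cluster 2 the rows labeled 1, and so on.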

<b>Cluster 1</b>

In [21]:
newyork_merged.loc[newyork_merged['Cluster Labels'] == 0, newyork_merged.columns[[0] + list(range(5, newyork_merged.shape[1]))]].reset_index(drop=True)

Unnamed: 0,name,Borough,Neighborhood,Cluster Labels
0,Brooklyn Steel,Brooklyn,Greenpoint,0
1,The Metropolitan Opera (Metropolitan Opera),Manhattan,Upper West Side,0
2,92nd Street Y,Manhattan,Upper East Side,0
3,Whitney Museum of American Art,Manhattan,Greenwich Village and Soho,0


<b>Cluster 2</b>

In [22]:
newyork_merged.loc[newyork_merged['Cluster Labels'] == 1, newyork_merged.columns[[0] + list(range(5, newyork_merged.shape[1]))]].reset_index(drop=True)

Unnamed: 0,name,Borough,Neighborhood,Cluster Labels
0,Grand Central Terminal,Manhattan,Gramercy Park and Murray Hill,1
1,AMC Loews Kips Bay 15,Manhattan,Gramercy Park and Murray Hill,1
2,Baekjeong NYC (Kang Ho Dong Baekjeong),Manhattan,Gramercy Park and Murray Hill,1
3,The Lobster Club,Manhattan,Gramercy Park and Murray Hill,1
4,Sai Gon Dep,Manhattan,Gramercy Park and Murray Hill,1
5,Eataly,Manhattan,Gramercy Park and Murray Hill,1
6,Nonono,Manhattan,Gramercy Park and Murray Hill,1


<b>Cluster 3</b>

In [23]:
newyork_merged.loc[newyork_merged['Cluster Labels'] == 2, newyork_merged.columns[[0] + list(range(5, newyork_merged.shape[1]))]].reset_index(drop=True)

Unnamed: 0,name,Borough,Neighborhood,Cluster Labels
0,AMC Empire 25,Manhattan,Chelsea and Clinton,2
1,Lyric Theatre,Manhattan,Chelsea and Clinton,2
2,Times Square,Manhattan,Chelsea and Clinton,2
3,Tanner Smiths,Manhattan,Chelsea and Clinton,2
4,AMC Loews 34th Street 14,Manhattan,Chelsea and Clinton,2
5,New York Penn Station,Manhattan,Chelsea and Clinton,2
6,Del Frisco's Double Eagle Steakhouse,Manhattan,Chelsea and Clinton,2
7,John Golden Theatre,Manhattan,Chelsea and Clinton,2
8,Harry Potter And The Cursed Child,Manhattan,Chelsea and Clinton,2
9,Pioneers Bar,Manhattan,Chelsea and Clinton,2


<b>Cluster 4</b>

In [24]:
newyork_merged.loc[newyork_merged['Cluster Labels'] == 3, newyork_merged.columns[[0] + list(range(5, newyork_merged.shape[1]))]].reset_index(drop=True)

Unnamed: 0,name,Borough,Neighborhood,Cluster Labels
0,The Ten Bells,Manhattan,Lower East Side,3
1,Bowery Meat Company,Manhattan,Lower East Side,3
2,99 Favor Taste,Manhattan,Lower East Side,3


<b>Cluster 5</b>

In [25]:
newyork_merged.loc[newyork_merged['Cluster Labels'] == 4, newyork_merged.columns[[0] + list(range(5, newyork_merged.shape[1]))]].reset_index(drop=True)

Unnamed: 0,name,Borough,Neighborhood,Cluster Labels
0,Blueprint,Brooklyn,Northwest Brooklyn,4
1,Circa Brewing Co,Brooklyn,Northwest Brooklyn,4
2,Vekslers,Brooklyn,Northwest Brooklyn,4
