<h2>Predictive Analysis for the ride-hailing service</h2>

This Notebook is used to find the trending venues across New york city and then cluster them for analysis.
First step is to import all the necessary packages.

In [2]:
# Downloading all the dependencies
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analsysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes
import folium # map rendering library

print('Libraries imported.')

Solving environment: done

## Package Plan ##

  environment location: /home/jupyterlab/conda

  added / updated specs: 
    - geopy


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    geopy-1.18.1               |             py_0          51 KB  conda-forge
    geographiclib-1.49         |             py_0          32 KB  conda-forge
    ------------------------------------------------------------
                                           Total:          84 KB

The following NEW packages will be INSTALLED:

    geographiclib: 1.49-py_0     conda-forge

The following packages will be UPDATED:

    geopy:         1.11.0-py36_0 conda-forge --> 1.18.1-py_0 conda-forge


Downloading and Extracting Packages
geopy-1.18.1         | 51 KB     | ##################################### | 100% 
geographiclib-1.49   | 32 KB     | ##################################### | 100% 
Preparing transaction: done

<h3>1. Analyzing the New York Data set</h3>

Data source for gathering the data and converting it into a pandas dataframe is performed in the 'Newyork_data.csv' file. We read the file here and write the data into newyork_data dataframe.

In [4]:
#Read the Newyork neighborhoods data from a dataframe
newyork_data = pd.read_csv('Newyork_data.csv')
newyork_data.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
0,10453,Bronx,Central Bronx
1,10457,Bronx,Central Bronx
2,10460,Bronx,Central Bronx
3,10458,Bronx,Bronx Park and Fordham
4,10467,Bronx,Bronx Park and Fordham


<b>Getting the latitude and longitude values for the Newyork City</b>

Using the geopy package, get the latitude and longitude of the location.

In [5]:
address = 'Newyork, NY'

geolocator = Nominatim()
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of New York City are {}, {}.'.format(latitude, longitude))

  This is separate from the ipykernel package so we can avoid doing imports until


The geograpical coordinate of New York City are 40.83975585, -73.9414480148711.


<b>Defining the credentials to be used for the Four Square API </b>

In [6]:
CLIENT_ID = '4AEKLZ1UYSPNLB5EX2GGFFPGBWP2CPOOEXKSUTEOGOBAPLG4' # Needs to be changed when replicating
CLIENT_SECRET = 'H2EUNDFBGLAJDJZ5HYRSF3E2LJ3DYIVYFHNMLTTJ3UMQUISQ' # Needs to be changed when replicating
VERSION = '20190119'

In [7]:
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 32000 # defining the radius of the New York city

# create URL
url = 'https://api.foursquare.com/v2/venues/trending?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    latitude, 
    longitude, 
    radius, 
    LIMIT)

# display URL
print(url)

https://api.foursquare.com/v2/venues/trending?&client_id=4AEKLZ1UYSPNLB5EX2GGFFPGBWP2CPOOEXKSUTEOGOBAPLG4&client_secret=H2EUNDFBGLAJDJZ5HYRSF3E2LJ3DYIVYFHNMLTTJ3UMQUISQ&v=20190119&ll=40.83975585,-73.9414480148711&radius=32000&limit=100


<b>Let's call the API and store the returned data in results</b>

We use the requests package to get the JSON file for the URL

In [8]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5c436da5dd57975fd53fc09b'},
 'response': {'venues': [{'id': '42911d00f964a520f5231fe3',
    'name': 'New York Penn Station',
    'location': {'address': '1 Penn Plz',
     'crossStreet': 'btwn W 31st & W 33rd St',
     'lat': 40.75073286835804,
     'lng': -73.99233168961688,
     'distance': 10797,
     'postalCode': '10001',
     'cc': 'US',
     'neighborhood': 'Chelsea',
     'city': 'New York',
     'state': 'NY',
     'country': 'United States',
     'formattedAddress': ['1 Penn Plz (btwn W 31st & W 33rd St)',
      'New York, NY 10001',
      'United States']},
    'categories': [{'id': '4bf58dd8d48988d129951735',
      'name': 'Train Station',
      'pluralName': 'Train Stations',
      'shortName': 'Train Station',
      'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/travel/trainstation_',
       'suffix': '.png'},
      'primary': True}],
    'venuePage': {'id': '85006792'}},
   {'id': '4abb7c09f964a520d18320e3',
    'name': 'F

<b>Let's normalize the json and store the data in t_venues</b>

json_normalize() is used to flatten the json file and present the data in a tabular format

In [9]:
venues = results["response"]['venues']
t_venues = json_normalize(venues)
t_venues

Unnamed: 0,categories,delivery.id,delivery.provider.icon.name,delivery.provider.icon.prefix,delivery.provider.icon.sizes,delivery.provider.name,delivery.url,events.count,events.summary,id,location.address,location.cc,location.city,location.country,location.crossStreet,location.distance,location.formattedAddress,location.labeledLatLngs,location.lat,location.lng,location.neighborhood,location.postalCode,location.state,name,venuePage.id
0,"[{'id': '4bf58dd8d48988d129951735', 'name': 'T...",,,,,,,,,42911d00f964a520f5231fe3,1 Penn Plz,US,New York,United States,btwn W 31st & W 33rd St,10797,"[1 Penn Plz (btwn W 31st & W 33rd St), New Yor...",,40.750733,-73.992332,Chelsea,10001,NY,New York Penn Station,85006792.0
1,"[{'id': '4bf58dd8d48988d163941735', 'name': 'P...",,,,,,,,,4abb7c09f964a520d18320e3,148-152 Worth St,US,New York,United States,at Centre St,14874,"[148-152 Worth St (at Centre St), New York, NY...",,40.714509,-74.002919,,10013,NY,Foley Square,
2,"[{'id': '4bf58dd8d48988d163941735', 'name': 'P...",,,,,,,,,412d2800f964a520df0c1fe3,59th St to 110th St,US,New York,United States,5th Ave to Central Park West,6503,[59th St to 110th St (5th Ave to Central Park ...,"[{'label': 'display', 'lat': 40.78408342593807...",40.784083,-73.964853,,10028,NY,Central Park,
3,"[{'id': '4bf58dd8d48988d136941735', 'name': 'O...",,,,,,,,,48e480eef964a52022521fe3,70 Lincoln Center Plz,US,New York,United States,at Columbus Ave & W 64th St,8291,[70 Lincoln Center Plz (at Columbus Ave & W 64...,"[{'label': 'display', 'lat': 40.77274188001071...",40.772742,-73.984401,,10023,NY,The Metropolitan Opera (Metropolitan Opera),35504286.0
4,"[{'id': '4bf58dd8d48988d1fa941735', 'name': 'F...",,,,,,,,,49eb2940f964a520a8661fe3,1 Grand Army Plz,US,Brooklyn,United States,,18767,"[1 Grand Army Plz, Brooklyn, NY 11238, United ...","[{'label': 'display', 'lat': 40.67252608292926...",40.672526,-73.969673,,11238,NY,Grand Army Plaza Greenmarket,32897581.0
5,"[{'id': '4bf58dd8d48988d111941735', 'name': 'J...",,,,,,,,,5bf07de3bcbf7a002cd8d9f3,934 3rd Ave,US,Brooklyn,United States,btwn 35th & 36th St,21180,"[934 3rd Ave (btwn 35th & 36th St), Brooklyn, ...","[{'label': 'display', 'lat': 40.65590374239456...",40.655904,-74.00611,,11232,NY,Japan Village,
6,"[{'id': '4bf58dd8d48988d17f941735', 'name': 'M...",,,,,,,22.0,22 movies,5722dcad498e1dc59a10bca0,445 Albee Square West,US,Brooklyn,United States,Fulton St.,16936,"[445 Albee Square West (Fulton St.), Brooklyn,...","[{'label': 'display', 'lat': 40.69101558292192...",40.691016,-73.983686,Downtown Brooklyn,11201,NY,Alamo Drafthouse Cinema - Brooklyn,210169222.0
7,"[{'id': '4bf58dd8d48988d18f941735', 'name': 'A...",,,,,,,,,427c0500f964a52097211fe3,1000 5th Ave,US,New York,United States,btwn E 80th & E 84th St,6933,"[1000 5th Ave (btwn E 80th & E 84th St), New Y...","[{'label': 'display', 'lat': 40.77972902126812...",40.779729,-73.963416,,10028,NY,The Metropolitan Museum of Art (Metropolitan M...,
8,"[{'id': '4bf58dd8d48988d1ed941735', 'name': 'S...",,,,,,,,,58772ec5fb9d897349b74f2c,660 River Rd,US,Edgewater,United States,,3931,"[660 River Rd, Edgewater, NJ 07020, United Sta...","[{'label': 'display', 'lat': 40.81878541641831...",40.818785,-73.979009,,7020,NJ,SoJo Spa Club,
9,"[{'id': '4bf58dd8d48988d176941735', 'name': 'G...",,,,,,,,,574dc71c498ef19a1c178913,203 East 92nd Street,US,New York,United States,3rd Ave,6419,"[203 East 92nd Street (3rd Ave), New York, NY ...","[{'label': 'display', 'lat': 40.7825, 'lng': -...",40.7825,-73.95058,,10128,NY,Equinox East 92nd Street,


<b>Let's remove all the venues where the city is not among the 5 boroughs in New York City</b>

Since the radius provided is 32000 miles, it is returning some venues from the neighboring cities. The 'drop' function is used to drop all those rows where the borough name doesn't belong to the New york city.
The resulting dataset is stored in trending_venues

In [10]:
trending_venues = t_venues.drop(t_venues[(t_venues['location.city'] != 'New York') & \
                                         (t_venues['location.city'] != 'Bronx') & \
                                         (t_venues['location.city'] != 'Brooklyn') & \
                                         (t_venues['location.city'] != 'Queens') & \
                                         (t_venues['location.city'] != 'Manhattan') & \
                                         (t_venues['location.city'] != 'Staten Island') \
                                        ].index)
trending_venues

Unnamed: 0,categories,delivery.id,delivery.provider.icon.name,delivery.provider.icon.prefix,delivery.provider.icon.sizes,delivery.provider.name,delivery.url,events.count,events.summary,id,location.address,location.cc,location.city,location.country,location.crossStreet,location.distance,location.formattedAddress,location.labeledLatLngs,location.lat,location.lng,location.neighborhood,location.postalCode,location.state,name,venuePage.id
0,"[{'id': '4bf58dd8d48988d129951735', 'name': 'T...",,,,,,,,,42911d00f964a520f5231fe3,1 Penn Plz,US,New York,United States,btwn W 31st & W 33rd St,10797,"[1 Penn Plz (btwn W 31st & W 33rd St), New Yor...",,40.750733,-73.992332,Chelsea,10001,NY,New York Penn Station,85006792.0
1,"[{'id': '4bf58dd8d48988d163941735', 'name': 'P...",,,,,,,,,4abb7c09f964a520d18320e3,148-152 Worth St,US,New York,United States,at Centre St,14874,"[148-152 Worth St (at Centre St), New York, NY...",,40.714509,-74.002919,,10013,NY,Foley Square,
2,"[{'id': '4bf58dd8d48988d163941735', 'name': 'P...",,,,,,,,,412d2800f964a520df0c1fe3,59th St to 110th St,US,New York,United States,5th Ave to Central Park West,6503,[59th St to 110th St (5th Ave to Central Park ...,"[{'label': 'display', 'lat': 40.78408342593807...",40.784083,-73.964853,,10028,NY,Central Park,
3,"[{'id': '4bf58dd8d48988d136941735', 'name': 'O...",,,,,,,,,48e480eef964a52022521fe3,70 Lincoln Center Plz,US,New York,United States,at Columbus Ave & W 64th St,8291,[70 Lincoln Center Plz (at Columbus Ave & W 64...,"[{'label': 'display', 'lat': 40.77274188001071...",40.772742,-73.984401,,10023,NY,The Metropolitan Opera (Metropolitan Opera),35504286.0
4,"[{'id': '4bf58dd8d48988d1fa941735', 'name': 'F...",,,,,,,,,49eb2940f964a520a8661fe3,1 Grand Army Plz,US,Brooklyn,United States,,18767,"[1 Grand Army Plz, Brooklyn, NY 11238, United ...","[{'label': 'display', 'lat': 40.67252608292926...",40.672526,-73.969673,,11238,NY,Grand Army Plaza Greenmarket,32897581.0
5,"[{'id': '4bf58dd8d48988d111941735', 'name': 'J...",,,,,,,,,5bf07de3bcbf7a002cd8d9f3,934 3rd Ave,US,Brooklyn,United States,btwn 35th & 36th St,21180,"[934 3rd Ave (btwn 35th & 36th St), Brooklyn, ...","[{'label': 'display', 'lat': 40.65590374239456...",40.655904,-74.00611,,11232,NY,Japan Village,
6,"[{'id': '4bf58dd8d48988d17f941735', 'name': 'M...",,,,,,,22.0,22 movies,5722dcad498e1dc59a10bca0,445 Albee Square West,US,Brooklyn,United States,Fulton St.,16936,"[445 Albee Square West (Fulton St.), Brooklyn,...","[{'label': 'display', 'lat': 40.69101558292192...",40.691016,-73.983686,Downtown Brooklyn,11201,NY,Alamo Drafthouse Cinema - Brooklyn,210169222.0
7,"[{'id': '4bf58dd8d48988d18f941735', 'name': 'A...",,,,,,,,,427c0500f964a52097211fe3,1000 5th Ave,US,New York,United States,btwn E 80th & E 84th St,6933,"[1000 5th Ave (btwn E 80th & E 84th St), New Y...","[{'label': 'display', 'lat': 40.77972902126812...",40.779729,-73.963416,,10028,NY,The Metropolitan Museum of Art (Metropolitan M...,
9,"[{'id': '4bf58dd8d48988d176941735', 'name': 'G...",,,,,,,,,574dc71c498ef19a1c178913,203 East 92nd Street,US,New York,United States,3rd Ave,6419,"[203 East 92nd Street (3rd Ave), New York, NY ...","[{'label': 'display', 'lat': 40.7825, 'lng': -...",40.7825,-73.95058,,10128,NY,Equinox East 92nd Street,
11,"[{'id': '4bf58dd8d48988d129951735', 'name': 'T...",,,,,,,,,42829c80f964a5206a221fe3,87 E 42nd St,US,New York,United States,btwn Vanderbilt & Park Ave,10148,"[87 E 42nd St (btwn Vanderbilt & Park Ave), Ne...",,40.752672,-73.977077,,10017,NY,Grand Central Terminal,91385129.0


Although, we do not need the category data for the venues in this project, I'm just getting the categories for further scope improvement.

In [11]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

Now, let's find the venues which are trending currently

In [12]:
# filter columns
filtered_columns = ['name', 'categories', 'location.lat', 'location.lng', 'location.postalCode']
trending_venues = trending_venues.loc[:, filtered_columns]

# filter the category for each row
trending_venues['categories'] = trending_venues.apply(get_category_type, axis=1)

# clean columns
trending_venues.columns = [col.split(".")[-1] for col in trending_venues.columns]

trending_venues

Unnamed: 0,name,categories,lat,lng,postalCode
0,New York Penn Station,Train Station,40.750733,-73.992332,10001
1,Foley Square,Park,40.714509,-74.002919,10013
2,Central Park,Park,40.784083,-73.964853,10028
3,The Metropolitan Opera (Metropolitan Opera),Opera House,40.772742,-73.984401,10023
4,Grand Army Plaza Greenmarket,Farmers Market,40.672526,-73.969673,11238
5,Japan Village,Japanese Restaurant,40.655904,-74.00611,11232
6,Alamo Drafthouse Cinema - Brooklyn,Movie Theater,40.691016,-73.983686,11201
7,The Metropolitan Museum of Art (Metropolitan M...,Art Museum,40.779729,-73.963416,10028
9,Equinox East 92nd Street,Gym,40.7825,-73.95058,10128
11,Grand Central Terminal,Train Station,40.752672,-73.977077,10017


Let's rename the 'postalCode' to 'PostalCode' so that it matches with the column name in newyork_data dataframe

In [13]:
#Let's rename postalCode to PostalCode and convert it to int64 datatype
trending_venues = trending_venues.rename(columns ={"postalCode": "PostalCode"})
trending_venues['PostalCode'] = pd.to_numeric(trending_venues['PostalCode'])
trending_venues.dtypes

name           object
categories     object
lat           float64
lng           float64
PostalCode      int64
dtype: object

In [40]:
#Let's merge trending_venues with newyork_data to get the neighborhood names for venues
newyork_venues = trending_venues.merge(newyork_data, on='PostalCode', how='left')

#Remove the values having NAN 
newyork_venues = newyork_venues.drop(newyork_venues[newyork_venues['Borough'].isnull()].index)
newyork_venues

Unnamed: 0,name,categories,lat,lng,PostalCode,Borough,Neighborhood
0,New York Penn Station,Train Station,40.750733,-73.992332,10001,Manhattan,Chelsea and Clinton
1,Foley Square,Park,40.714509,-74.002919,10013,Manhattan,Greenwich Village and Soho
2,Central Park,Park,40.784083,-73.964853,10028,Manhattan,Upper East Side
3,The Metropolitan Opera (Metropolitan Opera),Opera House,40.772742,-73.984401,10023,Manhattan,Upper West Side
4,Grand Army Plaza Greenmarket,Farmers Market,40.672526,-73.969673,11238,Brooklyn,Central Brooklyn
5,Japan Village,Japanese Restaurant,40.655904,-74.00611,11232,Brooklyn,Sunset Park
6,Alamo Drafthouse Cinema - Brooklyn,Movie Theater,40.691016,-73.983686,11201,Brooklyn,Northwest Brooklyn
7,The Metropolitan Museum of Art (Metropolitan M...,Art Museum,40.779729,-73.963416,10028,Manhattan,Upper East Side
8,Equinox East 92nd Street,Gym,40.7825,-73.95058,10128,Manhattan,Upper East Side
9,Grand Central Terminal,Train Station,40.752672,-73.977077,10017,Manhattan,Gramercy Park and Murray Hill


<b>Let's create a map of the New York city with trending venues superimposed on it</b>

The map is generated using the Folium package. This done to get a better understanding of the distribution of the trending venues across New York city in Real-Time.

In [16]:
map_newyork = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for name, lat, lng, borough, neighborhood in zip(newyork_venues['name'],newyork_venues['lat'], newyork_venues['lng'], newyork_venues['Borough'], newyork_venues['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_newyork)  
    

map_newyork

<h3>2. Analyzing the Neighborhood using One hot encoding</h3>

One hot encoding is done so that we can use the K means algorithm on the dataset.

In [17]:
# one hot encoding
newyork_onehot = pd.get_dummies(newyork_venues[['Neighborhood']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
newyork_onehot['name'] = newyork_venues['name'] 

# move neighborhood column to the first column
fixed_columns = [newyork_onehot.columns[-1]] + list(newyork_onehot.columns[:-1])
newyork_onehot = newyork_onehot[fixed_columns]

newyork_onehot

Unnamed: 0,name,Bronx Park and Fordham,Central Brooklyn,Chelsea and Clinton,Gramercy Park and Murray Hill,Greenwich Village and Soho,Lower East Side,Lower Manhattan,Northwest Brooklyn,Sunset Park,Upper East Side,Upper West Side
0,New York Penn Station,0,0,1,0,0,0,0,0,0,0,0
1,Foley Square,0,0,0,0,1,0,0,0,0,0,0
2,Central Park,0,0,0,0,0,0,0,0,0,1,0
3,The Metropolitan Opera (Metropolitan Opera),0,0,0,0,0,0,0,0,0,0,1
4,Grand Army Plaza Greenmarket,0,1,0,0,0,0,0,0,0,0,0
5,Japan Village,0,0,0,0,0,0,0,0,1,0,0
6,Alamo Drafthouse Cinema - Brooklyn,0,0,0,0,0,0,0,1,0,0,0
7,The Metropolitan Museum of Art (Metropolitan M...,0,0,0,0,0,0,0,0,0,1,0
8,Equinox East 92nd Street,0,0,0,0,0,0,0,0,0,1,0
9,Grand Central Terminal,0,0,0,1,0,0,0,0,0,0,0


<h3>3. Cluster the Trending venues</h3>

Run the k-means to cluster the trending venues across New york city into clusters. Since the data is in Real-Time, we will decide the number of clusters based on the number of rows present in newyork_onehot

In [23]:
# set number of clusters
kclusters = 4

newyork_cluster = newyork_onehot.drop('name', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(newyork_cluster)

# check cluster labels generated for each row in the dataframe
kmeans.labels_

array([1, 0, 3, 2, 0, 0, 0, 3, 3, 0, 2, 1, 1, 2, 1, 3, 0, 1, 0, 1, 0, 0,
       0, 0, 0, 3, 0], dtype=int32)

In [24]:
newyork_merged = newyork_venues

# add clustering labels
newyork_merged['Cluster Labels'] = kmeans.labels_

newyork_merged # check the last columns!

Unnamed: 0,name,categories,lat,lng,PostalCode,Borough,Neighborhood,Cluster Labels
0,New York Penn Station,Train Station,40.750733,-73.992332,10001,Manhattan,Chelsea and Clinton,1
1,Foley Square,Park,40.714509,-74.002919,10013,Manhattan,Greenwich Village and Soho,0
2,Central Park,Park,40.784083,-73.964853,10028,Manhattan,Upper East Side,3
3,The Metropolitan Opera (Metropolitan Opera),Opera House,40.772742,-73.984401,10023,Manhattan,Upper West Side,2
4,Grand Army Plaza Greenmarket,Farmers Market,40.672526,-73.969673,11238,Brooklyn,Central Brooklyn,0
5,Japan Village,Japanese Restaurant,40.655904,-74.00611,11232,Brooklyn,Sunset Park,0
6,Alamo Drafthouse Cinema - Brooklyn,Movie Theater,40.691016,-73.983686,11201,Brooklyn,Northwest Brooklyn,0
7,The Metropolitan Museum of Art (Metropolitan M...,Art Museum,40.779729,-73.963416,10028,Manhattan,Upper East Side,3
8,Equinox East 92nd Street,Gym,40.7825,-73.95058,10128,Manhattan,Upper East Side,3
9,Grand Central Terminal,Train Station,40.752672,-73.977077,10017,Manhattan,Gramercy Park and Murray Hill,0


<h3>4. Visualizing the data</h3>

Let's visualize the data by plotting the clustered venues on the map of New York city

In [39]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)*2))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(newyork_merged['lat'], newyork_merged['lng'], newyork_merged['name'], newyork_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

<h3>5. Examine the Clusters</h3>

<b>Cluster 1</b>

In [35]:
newyork_merged.loc[newyork_merged['Cluster Labels'] == 0, newyork_merged.columns[[0] + list(range(5, newyork_merged.shape[1]))]].reset_index(drop=True)

Unnamed: 0,name,Borough,Neighborhood,Cluster Labels
0,Foley Square,Manhattan,Greenwich Village and Soho,0
1,Grand Army Plaza Greenmarket,Brooklyn,Central Brooklyn,0
2,Japan Village,Brooklyn,Sunset Park,0
3,Alamo Drafthouse Cinema - Brooklyn,Brooklyn,Northwest Brooklyn,0
4,Grand Central Terminal,Manhattan,Gramercy Park and Murray Hill,0
5,Equinox Bond Street,Manhattan,Greenwich Village and Soho,0
6,Equinox Flatiron,Manhattan,Lower East Side,0
7,Katz's Delicatessen,Manhattan,Lower East Side,0
8,The New York Botanical Garden,Bronx,Bronx Park and Fordham,0
9,Craft,Manhattan,Lower East Side,0


<b>Cluster 2</b>

In [36]:
newyork_merged.loc[newyork_merged['Cluster Labels'] == 1, newyork_merged.columns[[0] + list(range(5, newyork_merged.shape[1]))]].reset_index(drop=True)

Unnamed: 0,name,Borough,Neighborhood,Cluster Labels
0,New York Penn Station,Manhattan,Chelsea and Clinton,1
1,MTA Subway - 42nd St/Times Square/Port Authori...,Manhattan,Chelsea and Clinton,1
2,Macy's,Manhattan,Chelsea and Clinton,1
3,Lyric Theatre,Manhattan,Chelsea and Clinton,1
4,Manhattan Neighborhood Network,Manhattan,Chelsea and Clinton,1
5,Times Square,Manhattan,Chelsea and Clinton,1


<b>Cluster 3</b>

In [37]:
newyork_merged.loc[newyork_merged['Cluster Labels'] == 2, newyork_merged.columns[[0] + list(range(5, newyork_merged.shape[1]))]].reset_index(drop=True)

Unnamed: 0,name,Borough,Neighborhood,Cluster Labels
0,The Metropolitan Opera (Metropolitan Opera),Manhattan,Upper West Side,2
1,New York Women's March 2019,Manhattan,Upper West Side,2
2,American Museum of Natural History,Manhattan,Upper West Side,2


<b>Cluster 4</b>

In [38]:
newyork_merged.loc[newyork_merged['Cluster Labels'] == 3, newyork_merged.columns[[0] + list(range(5, newyork_merged.shape[1]))]].reset_index(drop=True)

Unnamed: 0,name,Borough,Neighborhood,Cluster Labels
0,Central Park,Manhattan,Upper East Side,3
1,The Metropolitan Museum of Art (Metropolitan M...,Manhattan,Upper East Side,3
2,Equinox East 92nd Street,Manhattan,Upper East Side,3
3,Solomon R Guggenheim Museum,Manhattan,Upper East Side,3
4,EJ's Luncheonette,Manhattan,Upper East Side,3
