<a href="https://colab.research.google.com/github/earldennison/ibm_coursera_capstone/blob/master/Segmenting_and_Clustering_Neighborhoods_in_Toronto.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Standard Imports
Before we proceed, we import the standard libraries we will use for manipulating the data.

In [0]:
import numpy as np
import pandas as pd

## Getting The Data
Contrary to what the Coursera instructions suggest, it is fairly straightforward to load tabular data into a data frame when it fits on a single page. We do not need BeautifulSoup for this: the ```pd.read_html``` method fetches the page and returns a list of all the tables found at the URL passed in as an argument.

In [0]:
pcodes = pd.read_html('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M')

## Creating the Data Frame

I have assigned the returned data to ```pcodes```, short for postal codes. Let us check what type of data ```pcodes``` holds.

In [3]:
type(pcodes)

list

The type is actually a list. But what we want is a data frame, right? Let's investigate further.

In [4]:
len(pcodes)

3

Three? It is a list with 3 elements in it? Let's check it out.

In [5]:
pcodes

[    Postcode  ...                                      Neighbourhood
 0        M1A  ...                                       Not assigned
 1        M2A  ...                                       Not assigned
 2        M3A  ...                                          Parkwoods
 3        M4A  ...                                   Victoria Village
 4        M5A  ...                                       Harbourfront
 5        M5A  ...                                        Regent Park
 6        M6A  ...                                   Lawrence Heights
 7        M6A  ...                                     Lawrence Manor
 8        M7A  ...                                       Not assigned
 9        M8A  ...                                       Not assigned
 10       M9A  ...                                   Islington Avenue
 11       M1B  ...                                              Rouge
 12       M1B  ...                                            Malvern
 13       M2B  ...  

```pcodes``` actually has three tables in it! That is weird... Let's check what type each individual element is.

In [6]:
for item in pcodes:
  print(type(item))

<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.frame.DataFrame'>


Bingo! Each element is a data frame, so we can simply pick the one we need by its index. We want the first table, so the index is ```0```. Let's double-check the content to be sure it is what we want.

In [7]:
pcodes[0].head()

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront


Since we don't need the other tables, let's assign element 0 of `pcodes` back to `pcodes` itself, discarding the other tables.

In [0]:
pcodes = pcodes[0]

## Cleaning the Data
The data frame contains a lot of bad data because of the `Not assigned` values. To deal with them we will follow the Coursera guidelines. But first, let's check the data frame, as the diligent data scientists that we are.



In [9]:
pcodes.columns

Index(['Postcode', 'Borough', 'Neighbourhood'], dtype='object')

Now that we know the columns, we can manipulate the data better. Let's check the Coursera guidelines for transforming the data.

> - The dataframe will consist of three columns: PostalCode, Borough, and Neighborhood
> - Only process the cells that have an assigned borough. Ignore cells with a borough that is Not assigned.
> - More than one neighborhood can exist in one postal code area. For example, in the table on the Wikipedia page, you will notice that M5A is listed twice and has two neighborhoods: Harbourfront and Regent Park. These two rows will be combined into one row with the neighborhoods separated with a comma as shown in row 11 in the above table.
> - If a cell has a borough but a Not assigned neighborhood, then the neighborhood will be the same as the borough. So for the 9th cell in the table on the Wikipedia page, the value of the Borough and the Neighborhood columns will be Queen's Park.
> - Clean your Notebook and add Markdown cells to explain your work and any assumptions you are making.
> - In the last cell of your notebook, use the .shape method to print the number of rows of your dataframe.

We already have the three columns (named `Postcode`, `Borough`, and `Neighbourhood` in the scraped table), so that's done. Next we have to drop the rows that have no borough assigned. Let's replace ```Not assigned``` with ```NaN``` so that we can easily drop those rows.

In [0]:
pcodes.replace('Not assigned', np.nan, inplace=True)

Now we drop the missing values using ```dropna()```. However, we only want to drop the rows where the borough is not assigned. Thankfully, ```dropna()``` has a ```subset``` parameter that restricts the check to the columns we name.

In [0]:
pcodes.dropna(subset=['Borough'], axis=0,inplace=True)

In [12]:
pcodes.head()

Unnamed: 0,Postcode,Borough,Neighbourhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront
5,M5A,Downtown Toronto,Regent Park
6,M6A,North York,Lawrence Heights


Looks like it was a success: M1A and M2A, whose `Borough` values were `Not assigned`, have been dropped.
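
To make the effect of `subset` concrete, here is a minimal sketch on a toy table. The three rows are hand-picked for illustration and `demo` and `cleaned` are hypothetical names, not part of the notebook's data.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the scraped table (values chosen for illustration only)
demo = pd.DataFrame({
    'Postcode': ['M1A', 'M3A', 'M7A'],
    'Borough': ['Not assigned', 'North York', "Queen's Park"],
    'Neighbourhood': ['Not assigned', 'Parkwoods', 'Not assigned'],
})

# Turn the placeholder string into a real missing value
demo = demo.replace('Not assigned', np.nan)

# Drop a row only when its Borough is missing;
# a row that is missing just its Neighbourhood is kept
cleaned = demo.dropna(subset=['Borough'])

print(cleaned)
```

The M7A row survives even though its `Neighbourhood` is missing, which is exactly what we need for the next guideline.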

Let's try to comply with the other guidelines Coursera has given:

> - More than one neighborhood can exist in one postal code area. For example, in the table on the Wikipedia page, you will notice that M5A is listed twice and has two neighborhoods: Harbourfront and Regent Park. These two rows will be combined into one row with the neighborhoods separated with a comma as shown in row 11 in the above table.
> - If a cell has a borough but a Not assigned neighborhood, then the neighborhood will be the same as the borough. So for the 9th cell in the table on the Wikipedia page, the value of the Borough and the Neighborhood columns will be Queen's Park.

Uh oh, if we did the former before the latter it would give us a headache: executing the first guideline packs several names into a single string in the `Neighbourhood` column, and processing those strings afterwards would be much more complex. So I'll do the latter first. I'm going to go through the data frame and, wherever the `Neighbourhood` is not assigned, replace its value with the `Borough`.


Since we changed `Not assigned` to `NaN`, pandas gives us an easy way to do this sort of transformation: a forward fill.

In [13]:
pcodes = pcodes.ffill(axis=1)  # forward fill along each row; replaces the deprecated fillna(method='ffill')
pcodes.loc[8].to_frame()

Unnamed: 0,8
Postcode,M7A
Borough,Queen's Park
Neighbourhood,Queen's Park


A little discussion is in order. As the output above shows, the `Neighbourhood` at index 8 has been filled in with its borough. The forward fill (`ffill`) replaces each `NaN` with the last valid value that precedes it. The `axis` parameter sets the direction of the scan: with `axis=1` pandas moves across the columns of each row, so a missing `Neighbourhood` receives the value of the `Borough` column to its left. Neat, huh?
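
The row-wise direction of the fill is easy to verify on a two-row toy frame. This is only a sketch: `demo` and `filled` are hypothetical names, and the rows are copied by hand from the table above.

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame({
    'Postcode': ['M5A', 'M7A'],
    'Borough': ['Downtown Toronto', "Queen's Park"],
    'Neighbourhood': ['Harbourfront', np.nan],
})

# axis=1 fills along each row, left to right, so a missing Neighbourhood
# takes the value of the column immediately to its left (Borough)
filled = demo.ffill(axis=1)

print(filled.loc[1, 'Neighbourhood'])
```

Rows that already have a neighbourhood are left untouched. The column order matters here: the trick works because `Borough` sits directly to the left of `Neighbourhood`.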

Next we are going to put all the `Neighbourhood` values that share a postcode into a single row.

In [0]:
pcodes = pcodes.groupby(
    by=['Postcode','Borough'])['Neighbourhood'].apply(','.join).reset_index()

In [15]:
pcodes.head()

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1B,Scarborough,"Rouge,Malvern"
1,M1C,Scarborough,"Highland Creek,Rouge Hill,Port Union"
2,M1E,Scarborough,"Guildwood,Morningside,West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae


So what happened? It is basically a one-liner, but quite a dense one. The ```groupby``` method returns a `groupby` object rather than a data frame, but we can still select the `['Neighbourhood']` column from it and apply a function to each group. We applied `','.join` to concatenate the neighbourhood names within each group. Finally, we called `reset_index()`, because otherwise the grouping keys `Postcode` and `Borough` would be left as the index instead of regular columns.
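
Splitting the one-liner into two steps shows what each part contributes. The sketch below uses a hand-made three-row frame; `demo`, `joined`, and `flat` are hypothetical names introduced only for this illustration.

```python
import pandas as pd

demo = pd.DataFrame({
    'Postcode': ['M5A', 'M5A', 'M3A'],
    'Borough': ['Downtown Toronto', 'Downtown Toronto', 'North York'],
    'Neighbourhood': ['Harbourfront', 'Regent Park', 'Parkwoods'],
})

# Selecting one column from the grouped object and applying ','.join
# yields a Series indexed by the grouping keys (Postcode, Borough)
joined = demo.groupby(['Postcode', 'Borough'])['Neighbourhood'].apply(','.join)
print(joined.index.names)

# reset_index() turns those keys back into ordinary columns
flat = joined.reset_index()
print(flat)
```

Grouping on both `Postcode` and `Borough` keeps the borough in the result; grouping on `Postcode` alone would drop it from the output.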



In [16]:
pcodes.shape

(103, 3)

## Getting the Longitudes and Latitudes
Now that we have cleaned the data, we need the latitude and longitude of each postcode, so we will import geocoder as per the Coursera instructions.

In [17]:
!pip install geocoder  # <- comment out if already installed
import geocoder



However, the `geocoder.google` method doesn't seem to be working, so we will use `geocoder.arcgis` instead; my thanks to [Asim Islam](https://www.coursera.org/learn/applied-data-science-capstone/profiles/8d41d6357cf7033b900aa7daaafdf2c1) for pointing me in the right direction. Let's test whether it really returns the latitude and longitude.

In [18]:
g= geocoder.arcgis('M5A, Toronto, Ontario')
g.latlng

[43.65512000000007, -79.36263979699999]

Cool. Now we write a function to make our lives easier. It returns the latitude and longitude as a list, and its `city` and `state` parameters default to Toronto and Ontario respectively.

In [0]:
def get_geocode(postal_code, city='Toronto', state='Ontario'):
  return geocoder.arcgis(f'{postal_code}, {city}, {state}').latlng

Now we create the two new columns. `apply` runs `get_geocode` on every postcode, and `zip(*...)` regroups the resulting `[latitude, longitude]` pairs into one tuple of latitudes and one tuple of longitudes, which are assigned to the new columns.

In [0]:
pcodes['Latitude'], pcodes['Longitude'] = zip(*pcodes['Postcode'].apply(get_geocode))

Let's check our data frame.

In [21]:
pcodes.head()

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge,Malvern",43.811525,-79.195517
1,M1C,Scarborough,"Highland Creek,Rouge Hill,Port Union",43.78573,-79.15875
2,M1E,Scarborough,"Guildwood,Morningside,West Hill",43.76569,-79.175256
3,M1G,Scarborough,Woburn,43.768359,-79.21759
4,M1H,Scarborough,Cedarbrae,43.769688,-79.23944


Looks like it worked! On to the next part.
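
The `zip(*...)` unpacking used above can be tried offline with a stub. In this sketch `fake_geocode` and `FAKE_COORDS` are hypothetical stand-ins for `get_geocode`, and the coordinates are placeholders rather than real geocoding results.

```python
import pandas as pd

# Hypothetical stand-in for get_geocode so the sketch runs offline;
# the coordinates are placeholders, not real geocoding results
FAKE_COORDS = {'M1B': [1.0, -1.0], 'M1C': [2.0, -2.0]}

def fake_geocode(postal_code):
    return FAKE_COORDS[postal_code]

demo = pd.DataFrame({'Postcode': ['M1B', 'M1C']})

# apply() gives a Series of [lat, lng] lists; zip(*...) transposes it
# into one tuple of latitudes and one tuple of longitudes
pairs = demo['Postcode'].apply(fake_geocode)
demo['Latitude'], demo['Longitude'] = zip(*pairs)

print(demo)
```

One thing to keep in mind with the real geocoder: if a lookup fails and returns no coordinates, the unpacking raises an error, so it may be worth checking the results before assigning them.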

## Filtering out Boroughs only in Toronto
For this part we keep only the boroughs that are in Toronto.

Using the string method `str.contains` on the `Borough` column, we create a mask that keeps only the rows with Toronto in the borough name.

In [0]:
toronto = pcodes[pcodes['Borough'].str.contains('Toronto')].reset_index(drop=True)

In [23]:
toronto

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M4E,East Toronto,The Beaches,43.676845,-79.295225
1,M4K,East Toronto,"The Danforth West,Riverdale",43.683262,-79.35512
2,M4L,East Toronto,"The Beaches West,India Bazaar",43.667965,-79.314673
3,M4M,East Toronto,Studio District,43.662766,-79.33483
4,M4N,Central Toronto,Lawrence Park,43.72816,-79.387085
5,M4P,Central Toronto,Davisville North,43.712815,-79.388526
6,M4R,Central Toronto,North Toronto West,43.714523,-79.40696
7,M4S,Central Toronto,Davisville,43.703395,-79.385964
8,M4T,Central Toronto,"Moore Park,Summerhill East",43.690655,-79.383561
9,M4V,Central Toronto,"Deer Park,Forest Hill SE,Rathnelly,South Hill,...",43.686083,-79.402335


## Visualizing The Markers

We first import folium and Nominatim. Nominatim is a geocoder that looks up the coordinates of a named place.

In [0]:
import folium
import requests
from geopy.geocoders import Nominatim

Let's get the latitude and longitude of Toronto for our map, using Nominatim.



In [25]:
toronto_loc = Nominatim(user_agent='explorer').geocode('Toronto, Ontario')

print(toronto_loc.latitude)
print(toronto_loc.longitude)

43.653963
-79.387207


In [26]:
toronto_map = folium.Map((toronto_loc.latitude+.02, toronto_loc.longitude), zoom_start =12)
# I just added the .02 since the original latitude didn't pan well with the map
for lon, lat, postcode, neigh in zip(toronto['Longitude'], toronto['Latitude'], toronto['Postcode'], toronto['Neighbourhood']):
  label = f"{postcode}, {neigh}"
  label = folium.Popup(label, parse_html=True)
  folium.CircleMarker([lat,lon],
                      popup= label,
                      radius = 6,
                      color='red',
                      fill=True,
                      fill_color='#3186cc',
                     ).add_to(toronto_map)
toronto_map


## Exploring Toronto

First we need to define the Foursquare API credentials.

In [0]:
CLIENT_ID = ''  # your Foursquare ID
CLIENT_SECRET = ''  # your Foursquare Secret
VERSION = ''  # Foursquare API version


Once that is done, we define a function that fetches the venues so we can analyze the data further. The function takes a data frame as an argument and returns a data frame with all the fields we need.

In [0]:
def get_venues(df):
  venue_list = []
  LIMIT = 100   # maximum number of venues returned per request
  radius = 500  # search radius in meters around each postcode
  for name, lat, lng in zip(df['Postcode'], df['Latitude'], df['Longitude']):
    url = f'https://api.foursquare.com/v2/venues/explore?&client_id={CLIENT_ID}&client_secret={CLIENT_SECRET}&v={VERSION}&ll={lat},{lng}&radius={radius}&limit={LIMIT}'

    # each item in the first group of the response describes one venue
    results = requests.get(url).json()['response']['groups'][0]['items']
    for result in results:
      venue = result['venue']
      row = (name, lat, lng,
             venue['name'],
             venue['categories'][0]['name'],
             venue['location']['lat'],
             venue['location']['lng'])
      venue_list.append(row)

  return pd.DataFrame(venue_list, columns=['Postcode','Latitude','Longitude','Venue','Category','Venue Latitude','Venue Longitude'])

Once that is done, we store the result in a data frame.

In [0]:
venue_df = get_venues(toronto)

Let's have a look at the data frame we have created.

In [30]:
venue_df

Unnamed: 0,Postcode,Latitude,Longitude,Venue,Category,Venue Latitude,Venue Longitude
0,M4E,43.676845,-79.295225,Glen Manor Ravine,Trail,43.676821,-79.293942
1,M4E,43.676845,-79.295225,The Big Carrot Natural Food Market,Health Food Store,43.678879,-79.297734
2,M4E,43.676845,-79.295225,Grover Pub and Grub,Pub,43.679181,-79.297215
3,M4E,43.676845,-79.295225,Upper Beaches,Neighborhood,43.680563,-79.292869
4,M4K,43.683262,-79.355120,Dairy Queen,Fast Food Restaurant,43.684223,-79.357062
5,M4K,43.683262,-79.355120,Dollarama,Discount Store,43.686300,-79.355893
6,M4K,43.683262,-79.355120,Sobeys Urban Fresh,Grocery Store,43.684690,-79.356350
7,M4K,43.683262,-79.355120,Charles Sauriol Parkette,Park,43.685270,-79.356588
8,M4K,43.683262,-79.355120,TTC Bus #8 Broadview,Bus Line,43.687101,-79.355078
9,M4L,43.667965,-79.314673,System Fitness,Gym,43.667171,-79.312733


Now let's check the number of unique categories in the data frame.

In [31]:
venue_df['Category'].nunique()

211

Let's add the venues to the map to see what it looks like.

In [32]:

for venue, category, lat, lng, in zip(venue_df['Venue'],
                                      venue_df['Category'],
                                      venue_df['Venue Latitude'],
                                      venue_df['Venue Longitude']):

  folium.CircleMarker([lat,lng],

                      radius = 3,
                      color='blue',
                      fill=True,
                      fill_color='#3186cc',
                     ).add_to(toronto_map)

toronto_map

With this information, let us build a one-hot encoded version of `venue_df`.

In [0]:
one_hot_venues = pd.get_dummies(venue_df['Category'],prefix="", prefix_sep="")
one_hot_venues['Postcode'] = venue_df['Postcode']
one_hot_venues=one_hot_venues[[one_hot_venues.columns[-1]] + list(one_hot_venues.columns[:-1])] 

Let's group the rows by postcode and take the mean of each column, which gives the frequency of each venue category in a given area.

In [0]:
venues_grouped = one_hot_venues.groupby('Postcode').mean().reset_index()

Now that we have one-hot encoded our data frame and grouped it, let's find the most common venues per `Postcode`. The following code creates an empty data frame and fills it in the loop, one row per postcode; the columns are the `Postcode` and the venue categories by rank.

In [35]:
by_rank_df = pd.DataFrame(columns=['Postcode','1st','2nd','3rd','4th','5th','6th','7th','8th','9th','10th'])
grouped_venues = one_hot_venues.groupby('Postcode').mean().T
for i,col in enumerate(grouped_venues.columns):
  print(col, '\n' ,grouped_venues[col].sort_values(ascending=False).to_frame().head(), '\n')
  temp = grouped_venues[col].sort_values(ascending=False).to_frame().reset_index().head(10).T
  temp.columns = ['1st','2nd','3rd','4th','5th','6th','7th','8th','9th','10th']
  row = [col]+temp.loc['index'].to_list()
  by_rank_df.loc[i] = row


M4E 
                     M4E
Health Food Store  0.25
Pub                0.25
Trail              0.25
Neighborhood       0.25
Yoga Studio        0.00 

M4K 
                       M4K
Park                  0.2
Bus Line              0.2
Discount Store        0.2
Grocery Store         0.2
Fast Food Restaurant  0.2 

M4L 
                  M4L
Park            0.10
Sandwich Place  0.10
Liquor Store    0.05
Pub             0.05
Burrito Place   0.05 

M4M 
                           M4M
Bakery               0.058824
Diner                0.058824
Pizza Place          0.058824
Italian Restaurant   0.058824
American Restaurant  0.039216 

M4N 
                    M4N
Bus Line           0.5
Swim School        0.5
Yoga Studio        0.0
Elementary School  0.0
Flea Market        0.0 

M4P 
                         M4P
Food & Drink Shop  0.166667
Clothing Store     0.166667
Hotel              0.166667
Breakfast Spot     0.166667
Park               0.166667 

M4R 
                               M4R


Let's check the newly created data frame.

In [36]:
by_rank_df

Unnamed: 0,Postcode,1st,2nd,3rd,4th,5th,6th,7th,8th,9th,10th
0,M4E,Health Food Store,Pub,Trail,Neighborhood,Yoga Studio,Eastern European Restaurant,Fish Market,Fish & Chips Shop,Fast Food Restaurant,Farmers Market
1,M4K,Park,Bus Line,Discount Store,Grocery Store,Fast Food Restaurant,Yoga Studio,Elementary School,Flea Market,Fish Market,Fish & Chips Shop
2,M4L,Park,Sandwich Place,Liquor Store,Pub,Burrito Place,Burger Joint,Food & Drink Shop,Steakhouse,Sushi Restaurant,Board Shop
3,M4M,Bakery,Diner,Pizza Place,Italian Restaurant,American Restaurant,Café,Gastropub,Arts & Crafts Store,Bar,Brewery
4,M4N,Bus Line,Swim School,Yoga Studio,Elementary School,Flea Market,Fish Market,Fish & Chips Shop,Fast Food Restaurant,Farmers Market,Farm
5,M4P,Food & Drink Shop,Clothing Store,Hotel,Breakfast Spot,Park,Gym,Fish Market,Fish & Chips Shop,Fast Food Restaurant,Farmers Market
6,M4R,Playground,Gym Pool,Park,Garden,Eastern European Restaurant,Fish & Chips Shop,Fast Food Restaurant,Farmers Market,Farm,Falafel Restaurant
7,M4S,Dessert Shop,Coffee Shop,Italian Restaurant,Café,Sandwich Place,Pizza Place,Thai Restaurant,Indian Restaurant,Seafood Restaurant,Chinese Restaurant
8,M4T,Tennis Court,Playground,Park,Gym,Eastern European Restaurant,Fish Market,Fish & Chips Shop,Fast Food Restaurant,Farmers Market,Farm
9,M4V,Light Rail Station,Coffee Shop,Liquor Store,Supermarket,Yoga Studio,Ethiopian Restaurant,Flea Market,Fish Market,Fish & Chips Shop,Fast Food Restaurant
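
The one-hot encoding, averaging, and ranking steps above can be followed end to end on a tiny example. This is a sketch with made-up venues; `venues`, `one_hot`, `freq`, and `top` are hypothetical names, and the dictionary comprehension is a compact alternative to the loop used above, not the notebook's own code.

```python
import pandas as pd

# Toy venue table (categories chosen for illustration only)
venues = pd.DataFrame({
    'Postcode': ['M4E', 'M4E', 'M4E', 'M4K'],
    'Category': ['Pub', 'Pub', 'Trail', 'Park'],
})

# One indicator column per category, then the mean per postcode:
# the mean of 0/1 indicators is the share of venues in that category
one_hot = pd.get_dummies(venues['Category'], dtype=float)
one_hot['Postcode'] = venues['Postcode']
freq = one_hot.groupby('Postcode').mean()

# For each postcode, list category names from most to least frequent
top = {pc: row.sort_values(ascending=False).index.tolist()[:2]
       for pc, row in freq.iterrows()}
print(top)
```

When a postcode has fewer venue categories than ranking slots, the remaining slots are filled with categories whose frequency is zero, in no meaningful order. That is why `Yoga Studio` appears in fifth place for M4E above even though its frequency there is 0.00.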


Now that we have this information, let's try to cluster and see what happens.

In [0]:
from sklearn.cluster import KMeans

In [38]:
km = KMeans(n_clusters= 5,
            max_iter = 1000, random_state = 0).fit(venues_grouped.drop('Postcode', axis = 1))
km.labels_

array([0, 0, 0, 0, 2, 0, 3, 0, 3, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0,
       1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0], dtype=int32)

Now we insert the cluster labels into the `by_rank_df` data frame.

In [0]:
by_rank_df.insert(1, 'Cluster Label', km.labels_)

To check the data we have created, we join this data frame with the `toronto` data frame.

In [40]:
toronto.join(by_rank_df.set_index('Postcode'), on='Postcode')

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude,Cluster Label,1st,2nd,3rd,4th,5th,6th,7th,8th,9th,10th
0,M4E,East Toronto,The Beaches,43.676845,-79.295225,0.0,Health Food Store,Pub,Trail,Neighborhood,Yoga Studio,Eastern European Restaurant,Fish Market,Fish & Chips Shop,Fast Food Restaurant,Farmers Market
1,M4K,East Toronto,"The Danforth West,Riverdale",43.683262,-79.35512,0.0,Park,Bus Line,Discount Store,Grocery Store,Fast Food Restaurant,Yoga Studio,Elementary School,Flea Market,Fish Market,Fish & Chips Shop
2,M4L,East Toronto,"The Beaches West,India Bazaar",43.667965,-79.314673,0.0,Park,Sandwich Place,Liquor Store,Pub,Burrito Place,Burger Joint,Food & Drink Shop,Steakhouse,Sushi Restaurant,Board Shop
3,M4M,East Toronto,Studio District,43.662766,-79.33483,0.0,Bakery,Diner,Pizza Place,Italian Restaurant,American Restaurant,Café,Gastropub,Arts & Crafts Store,Bar,Brewery
4,M4N,Central Toronto,Lawrence Park,43.72816,-79.387085,2.0,Bus Line,Swim School,Yoga Studio,Elementary School,Flea Market,Fish Market,Fish & Chips Shop,Fast Food Restaurant,Farmers Market,Farm
5,M4P,Central Toronto,Davisville North,43.712815,-79.388526,0.0,Food & Drink Shop,Clothing Store,Hotel,Breakfast Spot,Park,Gym,Fish Market,Fish & Chips Shop,Fast Food Restaurant,Farmers Market
6,M4R,Central Toronto,North Toronto West,43.714523,-79.40696,3.0,Playground,Gym Pool,Park,Garden,Eastern European Restaurant,Fish & Chips Shop,Fast Food Restaurant,Farmers Market,Farm,Falafel Restaurant
7,M4S,Central Toronto,Davisville,43.703395,-79.385964,0.0,Dessert Shop,Coffee Shop,Italian Restaurant,Café,Sandwich Place,Pizza Place,Thai Restaurant,Indian Restaurant,Seafood Restaurant,Chinese Restaurant
8,M4T,Central Toronto,"Moore Park,Summerhill East",43.690655,-79.383561,3.0,Tennis Court,Playground,Park,Gym,Eastern European Restaurant,Fish Market,Fish & Chips Shop,Fast Food Restaurant,Farmers Market,Farm
9,M4V,Central Toronto,"Deer Park,Forest Hill SE,Rathnelly,South Hill,...",43.686083,-79.402335,0.0,Light Rail Station,Coffee Shop,Liquor Store,Supermarket,Yoga Studio,Ethiopian Restaurant,Flea Market,Fish Market,Fish & Chips Shop,Fast Food Restaurant


Row 22 has `NaN` values. Judging by the map, this looks like a purely residential district, which would explain why no venues were returned for it.

In [41]:
toronto.join(by_rank_df.set_index('Postcode'), on='Postcode').loc[22,:].to_frame().T

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude,Cluster Label,1st,2nd,3rd,4th,5th,6th,7th,8th,9th,10th
22,M5N,Central Toronto,Roselawn,43.7119,-79.4191,,,,,,,,,,,


Let's drop this row for now.

In [0]:
labeled_df=toronto.join(by_rank_df.set_index('Postcode'), on='Postcode').drop(22, axis =0)
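
As a possible alternative to dropping the row by its hard-coded index, the unmatched rows can be removed by content. The sketch below uses two toy frames; `left`, `right`, `joined`, and `labeled` are hypothetical names and the cluster labels are made up.

```python
import pandas as pd

left = pd.DataFrame({'Postcode': ['M4E', 'M5N', 'M4K']})
right = pd.DataFrame({'Postcode': ['M4E', 'M4K'], 'Cluster Label': [0, 3]})

# join() is a left join by default: M5N has no match, so it gets NaN,
# and the integer labels are upcast to float to accommodate it
joined = left.join(right.set_index('Postcode'), on='Postcode')

# Drop unmatched rows by content rather than by a hard-coded index,
# then restore the integer dtype
labeled = joined.dropna(subset=['Cluster Label']).astype({'Cluster Label': int})
print(labeled)
```

This also explains why the cluster labels show up as `0.0`, `2.0`, and so on in the joined table above: a column containing `NaN` cannot keep an integer dtype.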

Let's visualize the labels on the map to see what it looks like.

In [43]:
import matplotlib.cm as cm
import matplotlib.colors as colors
map_clusters = folium.Map((toronto_loc.latitude+.02, toronto_loc.longitude), zoom_start =12)

# set color scheme for the clusters
x = np.arange(5)
ys = [i + x + (i*x)**2 for i in range(5)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(labeled_df['Latitude'], labeled_df['Longitude'], labeled_df['Postcode'],labeled_df['Cluster Label'].astype(int)):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

And it looks like we are done. I hope you liked this notebook.