# Peer Graded Assignment: Segmenting and Clustering Neighborhoods in Toronto



### Problem Statement:
<p>You are new to Toronto. You want to discover interesting places nearby and see what similar places exist.</p>


### Step 1: Gather neighborhood data
Scrape the Wikipedia page https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M to obtain the table of postal codes, then transform the data into a pandas dataframe like the one shown below:

![image.png](attachment:image.png)

In [1]:
import pandas as pd
import requests as rq
from bs4 import BeautifulSoup

In [2]:
src_url='https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
response = rq.get(src_url)
html_content = response.content
soup = BeautifulSoup(html_content,'html.parser')
bs_table=soup.find(name='table',attrs={'class':'wikitable sortable'})

### Step 2: Create a dataframe from the gathered data

1. The dataframe will consist of three columns: PostalCode, Borough, and Neighborhood
2. Only process the cells that have an assigned borough. Ignore cells with a borough that is Not assigned.
3. More than one neighborhood can exist in one postal code area. For example, in the table on the Wikipedia page, you will notice that M5A is listed twice and has two neighborhoods: Harbourfront and Regent Park. These two rows will be combined into one row with the neighborhoods separated with a comma as shown in row 11 in the above table.
4. If a cell has a borough but a Not assigned neighborhood, then the neighborhood will be the same as the borough.
5. Clean your Notebook and add Markdown cells to explain your work and any assumptions you are making.
6. In the last cell of your notebook, use the .shape attribute to print the number of rows of your dataframe.
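
Rules 2 to 4 above can be sketched on a tiny hand-made table. This is a minimal illustration under stated assumptions, not the notebook's own method: the rows in `raw` are invented for the example, and the variable names are hypothetical.

```python
import pandas as pd

# Illustrative rows only (not scraped data); they mimic the raw table's shape.
raw = pd.DataFrame(
    {
        "PostalCode": ["M1A", "M5A", "M5A", "M7A"],
        "Borough": ["Not assigned", "Downtown Toronto", "Downtown Toronto", "Queen's Park"],
        "Neighborhood": ["Not assigned", "Harbourfront", "Regent Park", "Not assigned"],
    }
)

# Rule 2: keep only rows with an assigned borough.
assigned = raw[raw["Borough"] != "Not assigned"].copy()

# Rule 4: an unassigned neighborhood takes the borough's name.
missing = assigned["Neighborhood"] == "Not assigned"
assigned.loc[missing, "Neighborhood"] = assigned.loc[missing, "Borough"]

# Rule 3: one row per postal code, neighborhoods joined with a comma.
combined = (
    assigned.groupby(["PostalCode", "Borough"], as_index=False)["Neighborhood"]
    .agg(", ".join)
)

print(combined)
print(combined.shape)
```

Filling the missing neighborhoods before grouping keeps the join step free of placeholder strings.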

In [3]:
cols=bs_table.find_all('th')
ls = [el.getText().rstrip('\n') for el in cols]

In [4]:
df = pd.DataFrame(columns=ls)
rows = bs_table.find_all('tr')
for row in rows:
    vals = row.find_all('td')
    to_inserted=[val.getText().rstrip('\n') for val in vals]
    # Skip the header row (it has no <td> cells) and rows whose borough is 'Not assigned'
    if(len(to_inserted)!=0 and to_inserted[1]!='Not assigned'):
        df.loc[len(df)] = to_inserted
df.columns=['Postal Code','Borough','Neighborhood']
# An empty or 'Not assigned' neighborhood takes its own row's borough name
unassigned = df['Neighborhood'].isin(['', 'Not assigned'])
df.loc[unassigned, 'Neighborhood'] = df.loc[unassigned, 'Borough']
df['Neighborhood'] = df['Neighborhood'].apply(lambda x: x.replace(' / ',',') )
if(len(df['Postal Code'].unique())==df.shape[0]):
    print("All postal codes are unique , hence no need to combine")

All postal codes are unique , hence no need to combine


In [5]:
df

Unnamed: 0,Postal Code,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park,Harbourfront"
3,M6A,North York,"Lawrence Manor,Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park,Ontario Provincial Government"
...,...,...,...
98,M8X,Etobicoke,"The Kingsway,Montgomery Road ,Old Mill North"
99,M4Y,Downtown Toronto,Church and Wellesley
100,M7Y,East Toronto,Business reply mail Processing CentrE
101,M8Y,Etobicoke,"Old Mill South,King's Mill Park,Sunnylea,Humbe..."


In [6]:
df.shape

(103, 3)

### Step 3: Populate the latitude and longitude columns for each postal code
Now that the dataframe holds each neighborhood's postal code, borough name, and neighborhood name, we need the latitude and longitude coordinates of each neighborhood in order to use the Foursquare location data.

In [7]:
# geocoder was not working, hence reading the coordinates from a CSV file
df_coord = pd.read_csv("http://cocl.us/Geospatial_data")
df_coord

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476
...,...,...,...
98,M9N,43.706876,-79.518188
99,M9P,43.696319,-79.532242
100,M9R,43.688905,-79.554724
101,M9V,43.739416,-79.588437


In [8]:
df=df.merge(df_coord,how='left',on=['Postal Code'])
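
A left merge silently leaves `NaN` coordinates for any postal code missing from the CSV. The sketch below shows one way to surface such rows using the `validate` and `indicator` parameters of `DataFrame.merge`. The frames are illustrative: the two coordinate rows are copied from the CSV preview above, and `M0X` is a made-up code with no match.

```python
import pandas as pd

# Illustrative frames; "M0X" is a made-up code with no coordinates.
codes = pd.DataFrame({"Postal Code": ["M1B", "M1C", "M0X"]})
coords = pd.DataFrame(
    {
        "Postal Code": ["M1B", "M1C"],
        "Latitude": [43.806686, 43.784535],
        "Longitude": [-79.194353, -79.160497],
    }
)

merged = codes.merge(
    coords,
    how="left",
    on="Postal Code",
    validate="one_to_one",  # raise if a postal code repeats on either side
    indicator=True,         # adds a "_merge" column: "both" or "left_only"
)

# Postal codes that found no coordinates in the right-hand frame.
unmatched = merged.loc[merged["_merge"] == "left_only", "Postal Code"].tolist()
print(unmatched)
```

An empty `unmatched` list would indicate that every postal code received coordinates.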

In [9]:
df=df[df['Borough'].str.contains("Toronto", regex=False)].reset_index(drop=True)
df

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M5A,Downtown Toronto,"Regent Park,Harbourfront",43.65426,-79.360636
1,M7A,Downtown Toronto,"Queen's Park,Ontario Provincial Government",43.662301,-79.389494
2,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937
3,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418
4,M4E,East Toronto,The Beaches,43.676357,-79.293031
5,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306
6,M5G,Downtown Toronto,Central Bay Street,43.657952,-79.387383
7,M6G,Downtown Toronto,Christie,43.669542,-79.422564
8,M5H,Downtown Toronto,"Richmond,Adelaide,King",43.650571,-79.384568
9,M6H,West Toronto,"Dufferin,Dovercourt Village",43.669005,-79.442259


### Step 4: Explore neighborhoods and segment them

In [10]:
import folium

In [11]:
#Toronto latitude and longitude from google : 43.6532,-79.3832
mp=folium.Map(location=[43.6532, -79.3832],zoom_start=11)

In [12]:
for lat,lng,borough,neigh,postal_code in zip(df['Latitude'],df['Longitude'],df['Borough'],df['Neighborhood'],df['Postal Code']):
    folium.Marker(
        location = [lat,lng],
        popup=borough
    ).add_to(mp)
mp

### Define Foursquare Credentials

In [13]:
CLIENT_ID = 'BQBN0VJCQYWDGHJYQS4P0WTSM3QY0X0SDSKQ0BQJLA0TYWAR' # your Foursquare ID
CLIENT_SECRET = 'LOQPZR5YABSVVI3M4O3PMCXFGONYZUNY3AD3NXQ51L410EVG' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: BQBN0VJCQYWDGHJYQS4P0WTSM3QY0X0SDSKQ0BQJLA0TYWAR
CLIENT_SECRET:LOQPZR5YABSVVI3M4O3PMCXFGONYZUNY3AD3NXQ51L410EVG


### Build the API URL for the explore-venues endpoint, limited to 100 venues within a 500 m radius of a neighborhood

In [14]:
import requests
# URL template for the explore endpoint; fill it in with
# .format(CLIENT_ID, CLIENT_SECRET, VERSION, latitude, longitude, 500, 100)
explore_url="https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}"
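
As an alternative to filling a format string by hand, the query string can be assembled from a dictionary with the standard library's `urlencode`, which also escapes special characters. This is only a sketch: `build_explore_url` is a hypothetical helper, and the credentials are placeholders.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=100):
    """Return the explore URL for one point (hypothetical helper)."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",
        "radius": radius,
        "limit": limit,
    }
    # urlencode percent-encodes values, so the comma in "ll" becomes %2C.
    return f"{EXPLORE_ENDPOINT}?{urlencode(params)}"


# Placeholder credentials; the coordinates are Toronto's from the map cell above.
url = build_explore_url("YOUR_ID", "YOUR_SECRET", "20180605", 43.6532, -79.3832)
print(url)
```

Naming each parameter makes it harder to pass the arguments in the wrong order than with positional `{}` placeholders.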


In [15]:
# Explore neighborhoods of Downtown Toronto Borough

downtown_toronto_df = df[df['Borough']=='Downtown Toronto']
downtown_toronto_df

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M5A,Downtown Toronto,"Regent Park,Harbourfront",43.65426,-79.360636
1,M7A,Downtown Toronto,"Queen's Park,Ontario Provincial Government",43.662301,-79.389494
2,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937
3,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418
5,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306
6,M5G,Downtown Toronto,Central Bay Street,43.657952,-79.387383
7,M6G,Downtown Toronto,Christie,43.669542,-79.422564
8,M5H,Downtown Toronto,"Richmond,Adelaide,King",43.650571,-79.384568
10,M5J,Downtown Toronto,"Harbourfront East,Union Station,Toronto Islands",43.640816,-79.381752
13,M5K,Downtown Toronto,"Toronto Dominion Centre,Design Exchange",43.647177,-79.381576


In [16]:
# function that extracts the category of the venue
def get_category_type(row):
    # the category list sits under 'categories' or, in flattened results, 'venue.categories'
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

### Step 5: Explore Neighborhoods in Downtown Toronto

In [17]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    LIMIT = 100
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
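
The function above indexes `categories[0]` directly, which would raise an `IndexError` for a venue that has no category. A more defensive flattening step might look like the sketch below. `venue_rows` is a hypothetical helper, and `sample_items` is hand-written to mirror only the fields read above; it is not real API output.

```python
def venue_rows(name, lat, lng, items):
    """Flatten explore 'items' into tuples, tolerating missing categories."""
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # Fall back to None when the venue carries no category at all.
        category = categories[0]["name"] if categories else None
        rows.append(
            (name, lat, lng, venue["name"],
             venue["location"]["lat"], venue["location"]["lng"], category)
        )
    return rows


# Hand-written sample shaped like the fields read above; not real API output.
sample_items = [
    {"venue": {"name": "Venue A", "location": {"lat": 43.65, "lng": -79.36},
               "categories": [{"name": "Bakery"}]}},
    {"venue": {"name": "Venue B", "location": {"lat": 43.66, "lng": -79.37},
               "categories": []}},
]

rows = venue_rows("Sample Neighborhood", 43.65426, -79.360636, sample_items)
print(rows)
```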

In [18]:
downtown_toronto_venues = getNearbyVenues(names=downtown_toronto_df['Neighborhood'],
                                   latitudes=downtown_toronto_df['Latitude'],
                                   longitudes=downtown_toronto_df['Longitude']
                                  )



In [19]:
print(downtown_toronto_venues.shape)
downtown_toronto_venues.head()

(1231, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Regent Park,Harbourfront",43.65426,-79.360636,Roselle Desserts,43.653447,-79.362017,Bakery
1,"Regent Park,Harbourfront",43.65426,-79.360636,Tandem Coffee,43.653559,-79.361809,Coffee Shop
2,"Regent Park,Harbourfront",43.65426,-79.360636,Morning Glory Cafe,43.653947,-79.361149,Breakfast Spot
3,"Regent Park,Harbourfront",43.65426,-79.360636,Cooper Koo Family YMCA,43.653249,-79.358008,Distribution Center
4,"Regent Park,Harbourfront",43.65426,-79.360636,Body Blitz Spa East,43.654735,-79.359874,Spa


In [20]:
downtown_toronto_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Berczy Park,58,58,58,58,58,58
"CN Tower,King and Spadina,Railway Lands,Harbourfront West,Bathurst Quay,South Niagara,Island airport",18,18,18,18,18,18
Central Bay Street,64,64,64,64,64,64
Christie,18,18,18,18,18,18
Church and Wellesley,76,76,76,76,76,76
"Commerce Court,Victoria Hotel",100,100,100,100,100,100
"First Canadian Place,Underground city",100,100,100,100,100,100
"Garden District, Ryerson",100,100,100,100,100,100
"Harbourfront East,Union Station,Toronto Islands",100,100,100,100,100,100
"Kensington Market,Chinatown,Grange Park",55,55,55,55,55,55


In [21]:
print('There are {} unique categories.'.format(len(downtown_toronto_venues['Venue Category'].unique())))

There are 206 unique categories.


### Step 6: Analyse each neighborhood and find the top 5 most frequently occurring categories of each neighborhood

In [22]:
df_neigh = downtown_toronto_venues[["Neighborhood","Venue Category"]]
df_neigh_cnt=pd.DataFrame(df_neigh.groupby(['Neighborhood','Venue Category'])['Venue Category'].count())
df_neigh_cnt.columns=["Count"]
#sr.reset_index()
df_neigh_cnt.head()

Unnamed: 0_level_0,Unnamed: 1_level_0,Count
Neighborhood,Venue Category,Unnamed: 2_level_1
Berczy Park,Art Gallery,1
Berczy Park,BBQ Joint,1
Berczy Park,Bagel Shop,1
Berczy Park,Bakery,2
Berczy Park,Basketball Stadium,1


In [23]:
df_neigh_cnt_=df_neigh_cnt.sort_values('Count',ascending=False).sort_index(level=0,sort_remaining=False)
df_neigh_cnt_

Unnamed: 0_level_0,Unnamed: 1_level_0,Count
Neighborhood,Venue Category,Unnamed: 2_level_1
Berczy Park,Coffee Shop,4
Berczy Park,Cocktail Bar,3
Berczy Park,Farmers Market,2
Berczy Park,Seafood Restaurant,2
Berczy Park,Restaurant,2
...,...,...
"University of Toronto,Harbord",Noodle House,1
"University of Toronto,Harbord",Coffee Shop,1
"University of Toronto,Harbord",College Arts Building,1
"University of Toronto,Harbord",Nightclub,1


In [24]:
df_neigh_cnt_=df_neigh_cnt_.reset_index()
df_neigh_cnt_.head(15)

Unnamed: 0,Neighborhood,Venue Category,Count
0,Berczy Park,Coffee Shop,4
1,Berczy Park,Cocktail Bar,3
2,Berczy Park,Farmers Market,2
3,Berczy Park,Seafood Restaurant,2
4,Berczy Park,Restaurant,2
5,Berczy Park,Italian Restaurant,2
6,Berczy Park,Café,2
7,Berczy Park,Beer Bar,2
8,Berczy Park,Bakery,2
9,Berczy Park,Cheese Shop,2


In [25]:
df_neigh_cnt_=df_neigh_cnt_.groupby(['Neighborhood']).head(5).reset_index()

In [26]:
df_neigh_cnt_ = df_neigh_cnt_[df_neigh_cnt_['Neighborhood']!='Rosedale'] 
df_neigh_cnt_.shape

(90, 4)

In [27]:
df_neigh_cnt_.reset_index(inplace=True)

In [28]:
df_final = pd.DataFrame(columns=['Neighborhood','Most occuring Category','2nd Most occuring Category','3rd Most occuring Category','4th Most occuring Category','5th Most occuring Category'])
cnt=0
lst=[]
index_col = df_neigh_cnt_.index
for i,n,cat in zip(index_col,df_neigh_cnt_['Neighborhood'],df_neigh_cnt_['Venue Category']):
    if i%5==0:
        lst.append(str(n))
    lst.append(cat)
    if i%5==4:
        df_final.loc[cnt]=lst
        cnt=cnt+1
        lst=[]

df_final

Unnamed: 0,Neighborhood,Most occuring Category,2nd Most occuring Category,3rd Most occuring Category,4th Most occuring Category,5th Most occuring Category
0,Berczy Park,Coffee Shop,Cocktail Bar,Farmers Market,Seafood Restaurant,Restaurant
1,"CN Tower,King and Spadina,Railway Lands,Harbou...",Airport Service,Airport Terminal,Airport Lounge,Airport,Airport Food Court
2,Central Bay Street,Coffee Shop,Italian Restaurant,Sandwich Place,Café,Thai Restaurant
3,Christie,Grocery Store,Café,Park,Baby Store,Athletics & Sports
4,Church and Wellesley,Coffee Shop,Sushi Restaurant,Japanese Restaurant,Restaurant,Pub
5,"Commerce Court,Victoria Hotel",Coffee Shop,Restaurant,Café,Hotel,Gym
6,"First Canadian Place,Underground city",Coffee Shop,Café,Hotel,Japanese Restaurant,Restaurant
7,"Garden District, Ryerson",Clothing Store,Coffee Shop,Café,Restaurant,Middle Eastern Restaurant
8,"Harbourfront East,Union Station,Toronto Islands",Coffee Shop,Aquarium,Café,Hotel,Italian Restaurant
9,"Kensington Market,Chinatown,Grange Park",Café,Coffee Shop,Bakery,Vietnamese Restaurant,Mexican Restaurant
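
The loop above relies on every neighborhood contributing exactly five rows. A sketch of an alternative that tolerates neighborhoods with fewer categories ranks the categories with `cumcount` and reshapes with `pivot`. The data and the `TOP_N` value are illustrative assumptions, not taken from the Foursquare results.

```python
import pandas as pd

# Illustrative venues only; neighborhood "B" has just two categories.
venues = pd.DataFrame(
    {
        "Neighborhood": ["A", "A", "A", "A", "A", "A", "B", "B", "B"],
        "Venue Category": ["Café", "Café", "Café", "Park", "Park", "Pub",
                           "Gym", "Gym", "Hotel"],
    }
)

TOP_N = 3

# Count venues per (neighborhood, category), most frequent first within each neighborhood.
counts = (
    venues.groupby(["Neighborhood", "Venue Category"])
    .size()
    .reset_index(name="Count")
    .sort_values(["Neighborhood", "Count"], ascending=[True, False])
)

# Rank categories within each neighborhood, then keep the top N.
counts["Rank"] = counts.groupby("Neighborhood").cumcount() + 1
top = counts[counts["Rank"] <= TOP_N]

# One row per neighborhood, one column per rank; missing ranks become NaN.
wide = top.pivot(index="Neighborhood", columns="Rank", values="Venue Category")
print(wide)
```

With this shape, a neighborhood that has fewer than `TOP_N` categories keeps its row and simply shows `NaN` in the unfilled ranks.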


In [29]:
# one hot encoding
dtoronto_onehot = pd.get_dummies(downtown_toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
dtoronto_onehot['Neighborhood'] = downtown_toronto_venues['Neighborhood'] 

cols = list(dtoronto_onehot.columns)
cols.insert(0,cols.pop(cols.index('Neighborhood')))

dtoronto_onehot = dtoronto_onehot.loc[:,cols]

dtoronto_onehot.head()

Unnamed: 0,Neighborhood,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,Aquarium,...,Theater,Theme Restaurant,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Women's Store,Yoga Studio
0,"Regent Park,Harbourfront",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,"Regent Park,Harbourfront",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,"Regent Park,Harbourfront",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,"Regent Park,Harbourfront",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,"Regent Park,Harbourfront",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [30]:
dtoronto_grouped = dtoronto_onehot.groupby('Neighborhood').mean().reset_index()
dtoronto_grouped

Unnamed: 0,Neighborhood,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,Aquarium,...,Theater,Theme Restaurant,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Women's Store,Yoga Studio
0,Berczy Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.017241,0.0,0.0,0.0,0.0,0.0
1,"CN Tower,King and Spadina,Railway Lands,Harbou...",0.055556,0.055556,0.055556,0.111111,0.166667,0.111111,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Central Bay Street,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.015625,0.0,0.0,0.0,0.0,0.015625
3,Christie,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Church and Wellesley,0.0,0.0,0.0,0.0,0.0,0.0,0.013158,0.0,0.0,...,0.013158,0.013158,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.026316
5,"Commerce Court,Victoria Hotel",0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,...,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0
6,"First Canadian Place,Underground city",0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,...,0.01,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.0
7,"Garden District, Ryerson",0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,...,0.02,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0
8,"Harbourfront East,Union Station,Toronto Islands",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,...,0.01,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.0
9,"Kensington Market,Chinatown,Grange Park",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.036364,0.0,0.054545,0.018182,0.0,0.0
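
Taking the mean of one-hot columns per neighborhood yields the share of that neighborhood's venues falling in each category, so every row sums to 1. The small sketch below, on invented data, illustrates this property.

```python
import pandas as pd

# Illustrative venues only.
venues = pd.DataFrame(
    {
        "Neighborhood": ["A", "A", "A", "A", "B", "B"],
        "Venue Category": ["Café", "Café", "Park", "Pub", "Gym", "Café"],
    }
)

# One indicator column per category, one row per venue.
onehot = pd.get_dummies(venues["Venue Category"], dtype=float)

# Mean of the indicators per neighborhood = relative frequency of each category.
freq = onehot.groupby(venues["Neighborhood"]).mean()

print(freq)
print(freq.sum(axis=1))
```

Because the rows are relative frequencies, a neighborhood with 18 venues and one with 100 are placed on the same scale before clustering.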


### Step 7: Cluster Neighborhoods

In [31]:
# import k-means from clustering stage
from sklearn.cluster import KMeans

# set number of clusters
kclusters = 5

dtoronto_grouped_clustering = dtoronto_grouped[dtoronto_grouped['Neighborhood']!='Rosedale']
dtoronto_grouped_clustering = dtoronto_grouped_clustering.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(dtoronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

len(kmeans.labels_)

18
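
The notebook fixes `kclusters = 5` without comparing alternatives. One common way to sanity-check such a choice is to compare the k-means inertia across several values of k (the "elbow" heuristic). The sketch below runs on synthetic data, not on the Toronto frequencies, and the `n_init=10` setting is an assumption made for reproducibility.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic frequency-like rows (illustrative, not the Toronto data):
# three loose groups of six points each.
rng = np.random.default_rng(0)
centers = np.array([[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]])
X = np.vstack([c + rng.normal(scale=0.03, size=(6, 3)) for c in centers])

inertias = {}
for k in range(1, 6):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = model.inertia_  # within-cluster sum of squared distances

for k, value in inertias.items():
    print(k, round(value, 4))
```

A sharp drop in inertia followed by a plateau suggests a reasonable k; with only 18 neighborhoods, any such reading should be treated cautiously.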

In [33]:
# add clustering labels

df_final.insert(0, 'Cluster Labels', kmeans.labels_)

dtoronto_merged = downtown_toronto_df

# join the top-category table (df_final) onto the Downtown Toronto dataframe, which carries latitude/longitude for each neighborhood
dtoronto_merged = dtoronto_merged.join(df_final.set_index('Neighborhood'), on='Neighborhood')

dtoronto_merged.head() # check the last columns!

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,Most occuring Category,2nd Most occuring Category,3rd Most occuring Category,4th Most occuring Category,5th Most occuring Category
0,M5A,Downtown Toronto,"Regent Park,Harbourfront",43.65426,-79.360636,0.0,Coffee Shop,Bakery,Park,Pub,Theater
1,M7A,Downtown Toronto,"Queen's Park,Ontario Provincial Government",43.662301,-79.389494,4.0,Coffee Shop,Sushi Restaurant,Diner,Theater,Yoga Studio
2,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937,0.0,Clothing Store,Coffee Shop,Café,Restaurant,Middle Eastern Restaurant
3,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418,0.0,Café,Coffee Shop,American Restaurant,Cocktail Bar,Gastropub
5,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306,0.0,Coffee Shop,Cocktail Bar,Farmers Market,Seafood Restaurant,Restaurant


In [34]:
kmeans.labels_

array([0, 3, 4, 2, 0, 0, 0, 0, 0, 1, 4, 0, 0, 0, 0, 0, 0, 1])
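
Counting how many neighborhoods fall under each label gives a quick view of how balanced the clustering is. The sketch below uses the label array printed above.

```python
import numpy as np

# Labels copied from the kmeans.labels_ output above.
labels = np.array([0, 3, 4, 2, 0, 0, 0, 0, 0, 1, 4, 0, 0, 0, 0, 0, 0, 1])

# bincount returns the number of occurrences of each label 0..max(label).
sizes = np.bincount(labels)
for cluster, size in enumerate(sizes):
    print(f"label {cluster}: {size} neighborhoods")
```

Here 12 of the 18 neighborhoods share label 0, while labels 2 and 3 each hold a single neighborhood.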

In [36]:
# .copy() makes an independent dataframe, so the assignment in the next cell does not trigger SettingWithCopyWarning
dtoronto_merged_final = dtoronto_merged[dtoronto_merged['Neighborhood']!='Rosedale'].copy()

In [45]:
dtoronto_merged_final['Cluster Labels'] = dtoronto_merged_final['Cluster Labels'].astype(int)
dtoronto_merged_final


Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,Most occuring Category,2nd Most occuring Category,3rd Most occuring Category,4th Most occuring Category,5th Most occuring Category
0,M5A,Downtown Toronto,"Regent Park,Harbourfront",43.65426,-79.360636,0,Coffee Shop,Bakery,Park,Pub,Theater
1,M7A,Downtown Toronto,"Queen's Park,Ontario Provincial Government",43.662301,-79.389494,4,Coffee Shop,Sushi Restaurant,Diner,Theater,Yoga Studio
2,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937,0,Clothing Store,Coffee Shop,Café,Restaurant,Middle Eastern Restaurant
3,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418,0,Café,Coffee Shop,American Restaurant,Cocktail Bar,Gastropub
5,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306,0,Coffee Shop,Cocktail Bar,Farmers Market,Seafood Restaurant,Restaurant
6,M5G,Downtown Toronto,Central Bay Street,43.657952,-79.387383,4,Coffee Shop,Italian Restaurant,Sandwich Place,Café,Thai Restaurant
7,M6G,Downtown Toronto,Christie,43.669542,-79.422564,2,Grocery Store,Café,Park,Baby Store,Athletics & Sports
8,M5H,Downtown Toronto,"Richmond,Adelaide,King",43.650571,-79.384568,0,Coffee Shop,Café,Restaurant,American Restaurant,Clothing Store
10,M5J,Downtown Toronto,"Harbourfront East,Union Station,Toronto Islands",43.640816,-79.381752,0,Coffee Shop,Aquarium,Café,Hotel,Italian Restaurant
13,M5K,Downtown Toronto,"Toronto Dominion Centre,Design Exchange",43.647177,-79.381576,0,Coffee Shop,Hotel,Café,Japanese Restaurant,Restaurant


In [46]:
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors
# create map
map_clusters = folium.Map(location=[43.6532, -79.3832], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(dtoronto_merged_final['Latitude'], dtoronto_merged_final['Longitude'], dtoronto_merged_final['Neighborhood'], dtoronto_merged_final['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

### Step 8: Examine the clusters

### Cluster 1 (label 0)

In [49]:
dtoronto_merged_final[dtoronto_merged_final['Cluster Labels']==0].loc[:,['Neighborhood','Most occuring Category','2nd Most occuring Category','3rd Most occuring Category','4th Most occuring Category','5th Most occuring Category']]

Unnamed: 0,Neighborhood,Most occuring Category,2nd Most occuring Category,3rd Most occuring Category,4th Most occuring Category,5th Most occuring Category
0,"Regent Park,Harbourfront",Coffee Shop,Bakery,Park,Pub,Theater
2,"Garden District, Ryerson",Clothing Store,Coffee Shop,Café,Restaurant,Middle Eastern Restaurant
3,St. James Town,Café,Coffee Shop,American Restaurant,Cocktail Bar,Gastropub
5,Berczy Park,Coffee Shop,Cocktail Bar,Farmers Market,Seafood Restaurant,Restaurant
8,"Richmond,Adelaide,King",Coffee Shop,Café,Restaurant,American Restaurant,Clothing Store
10,"Harbourfront East,Union Station,Toronto Islands",Coffee Shop,Aquarium,Café,Hotel,Italian Restaurant
13,"Toronto Dominion Centre,Design Exchange",Coffee Shop,Hotel,Café,Japanese Restaurant,Restaurant
16,"Commerce Court,Victoria Hotel",Coffee Shop,Restaurant,Café,Hotel,Gym
34,Stn A PO Boxes,Coffee Shop,Italian Restaurant,Café,Japanese Restaurant,Hotel
35,"St. James Town,Cabbagetown",Coffee Shop,Italian Restaurant,Café,Bakery,Restaurant


### Cluster 2 (label 1)

In [50]:
dtoronto_merged_final[dtoronto_merged_final['Cluster Labels']==1].loc[:,['Neighborhood','Most occuring Category','2nd Most occuring Category','3rd Most occuring Category','4th Most occuring Category','5th Most occuring Category']]

Unnamed: 0,Neighborhood,Most occuring Category,2nd Most occuring Category,3rd Most occuring Category,4th Most occuring Category,5th Most occuring Category
27,"University of Toronto,Harbord",Café,Japanese Restaurant,Italian Restaurant,Bar,Bookstore
30,"Kensington Market,Chinatown,Grange Park",Café,Coffee Shop,Bakery,Vietnamese Restaurant,Mexican Restaurant


### Cluster 3 (label 2)

In [51]:
dtoronto_merged_final[dtoronto_merged_final['Cluster Labels']==2].loc[:,['Neighborhood','Most occuring Category','2nd Most occuring Category','3rd Most occuring Category','4th Most occuring Category','5th Most occuring Category']]

Unnamed: 0,Neighborhood,Most occuring Category,2nd Most occuring Category,3rd Most occuring Category,4th Most occuring Category,5th Most occuring Category
7,Christie,Grocery Store,Café,Park,Baby Store,Athletics & Sports


### Cluster 4 (label 3)

In [52]:
dtoronto_merged_final[dtoronto_merged_final['Cluster Labels']==3].loc[:,['Neighborhood','Most occuring Category','2nd Most occuring Category','3rd Most occuring Category','4th Most occuring Category','5th Most occuring Category']]

Unnamed: 0,Neighborhood,Most occuring Category,2nd Most occuring Category,3rd Most occuring Category,4th Most occuring Category,5th Most occuring Category
32,"CN Tower,King and Spadina,Railway Lands,Harbou...",Airport Service,Airport Terminal,Airport Lounge,Airport,Airport Food Court


### Cluster 5 (label 4)

In [53]:
dtoronto_merged_final[dtoronto_merged_final['Cluster Labels']==4].loc[:,['Neighborhood','Most occuring Category','2nd Most occuring Category','3rd Most occuring Category','4th Most occuring Category','5th Most occuring Category']]

Unnamed: 0,Neighborhood,Most occuring Category,2nd Most occuring Category,3rd Most occuring Category,4th Most occuring Category,5th Most occuring Category
1,"Queen's Park,Ontario Provincial Government",Coffee Shop,Sushi Restaurant,Diner,Theater,Yoga Studio
6,Central Bay Street,Coffee Shop,Italian Restaurant,Sandwich Place,Café,Thai Restaurant
