
### **1. Import Libraries**

In [1]:
import numpy as np
import pandas as pd
import folium
from geopy.geocoders import Nominatim
import requests
import lxml.html as lh
from sklearn.cluster import KMeans
print("Imports Ready")

Imports Ready


### **2. Scraping Data from Wikipedia**

In [2]:
url='https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
#Fetch the page once; page holds the HTTP response
page = requests.get(url)
#Parse the HTML of the response into an element tree
doc = lh.fromstring(page.content)
#Select every table row (<tr>..</tr>) in the document
tr_elements = doc.xpath('//tr')

#Create empty list
col=[]
i=0
#For each row, store each first element (header) and an empty list
for t in tr_elements[0]:
    i+=1
    name=t.text_content()
    print ('%d:"%s"'%(i,name))
    col.append((name,[]))

size_of_col = 3
for j in range(1,len(tr_elements)):
    #T is our j'th row
    T=tr_elements[j]
    
    #If row is not of size 3, the //tr data is not from our table 
    if len(T)!=size_of_col:
        break
    
    #i is the index of our column
    i=0
    
    #Iterate through each element of the row
    for t in T.iterchildren():
        data=t.text_content() 
        #Columns after the first may hold numbers: convert them to integers when possible
        if i>0:
            try:
                data=int(data)
            except ValueError:
                #Not a number: keep the text as it is
                pass
        #Append the data to the empty list of the i'th column
        col[i][1].append(data)
        #Increment i for the next column
        i+=1
        
Dict={title:column for (title,column) in col}
df=pd.DataFrame(Dict)
df = df.replace(r'\n','', regex=True) 
df.columns = ['PostalCode', 'Borough', 'Neighborhood']
df.drop(df.tail(1).index,inplace=True)

df = df[df.Borough != "Not assigned"].reset_index(drop=True)

df = df.replace(r'/',', ', regex=True) 

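# Fall back to the borough name when no neighborhood is assigned.
# Assigning to the rows yielded by iterrows() would only change copies, not df.
mask = df["Neighborhood"] == "Not assigned"
df.loc[mask, "Neighborhood"] = df.loc[mask, "Borough"]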
df.head()

1:"Postal Code
"
2:"Borough
"
3:"Neighbourhood
"


Unnamed: 0,PostalCode,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
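
The row-by-row extraction above can be tried offline on a small inline HTML snippet. The sketch below is only an illustration: it swaps `lxml` for the standard-library `html.parser` so that it runs without network access, and the `TableRows` class and `SAMPLE_HTML` snippet are hypothetical names that are not part of the notebook.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text outside a cell (whitespace between tags) is ignored
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Hypothetical sample with the same three columns as the Wikipedia table
SAMPLE_HTML = """
<table>
  <tr><th>Postal Code</th><th>Borough</th><th>Neighbourhood</th></tr>
  <tr><td>M1A</td><td>Not assigned</td><td>Not assigned</td></tr>
  <tr><td>M3A</td><td>North York</td><td>Parkwoods</td></tr>
</table>
"""

parser = TableRows()
parser.feed(SAMPLE_HTML)
header, *body = parser.rows
records = [dict(zip(header, row)) for row in body]
print(records)
```

Stripping each cell while it is collected removes the trailing newlines that the notebook cleans up afterwards with `df.replace`.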


### **3. Adding geo columns**

In [3]:
df["Latitude"] = ""
df["Longitude"] = ""
df.shape

(103, 5)

### **4. Cleaning rows with multiple neighborhoods**

In [4]:
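# Keep only the first neighborhood listed for each postal code.
# .str[0] picks the first piece of each split and always yields a single column.
df["Neighborhood"] = df["Neighborhood"].str.split(",", n=1).str[0]
df["Neighborhood"] = df["Neighborhood"].str.split("-", n=1).str[0]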
df["Neighborhood"].head(5)

0           Parkwoods
1    Victoria Village
2         Regent Park
3      Lawrence Manor
4        Queen's Park
Name: Neighborhood, dtype: object

In [6]:
df.head(10)

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,,
1,M4A,North York,Victoria Village,,
2,M5A,Downtown Toronto,Regent Park,,
3,M6A,North York,Lawrence Manor,,
4,M7A,Downtown Toronto,Queen's Park,,
5,M9A,Etobicoke,Islington Avenue,,
6,M1B,Scarborough,Malvern,,
7,M3B,North York,Don Mills,,
8,M4B,East York,Parkview Hill,,
9,M5B,Downtown Toronto,Garden District,,


### **5. GeoLocation**

In [8]:
to_drop_unknown = []
geolocator = Nominatim(user_agent="ny_explorer")
for index, row in df.iterrows():
    address = row['Neighborhood'] + ', Toronto'
    try:
        # geocode() returns None when nothing is found, so the attribute
        # access below raises AttributeError for unknown addresses
        location = geolocator.geocode(address)
        latitude = location.latitude
        longitude = location.longitude
        print('The geographical coordinate of {} are {}, {}.'.format(address, latitude, longitude))
        df.loc[index, 'Latitude'] = latitude
        df.loc[index, 'Longitude'] = longitude
    except AttributeError:
        print('Cannot do: {}, will drop index: {}'.format(address, index))
        to_drop_unknown.append(index)

The geographical coordinate of Parkwoods, Toronto are 43.7587999, -79.3201966.
The geographical coordinate of Victoria Village, Toronto are 43.732658, -79.3111892.
The geographical coordinate of Regent Park, Toronto are 43.6607056, -79.3604569.
The geographical coordinate of Lawrence Manor, Toronto are 43.7220788, -79.4375067.
The geographical coordinate of Queen's Park, Toronto are 43.659659, -79.3903399.
The geographical coordinate of Islington Avenue, Toronto are 43.6389593, -79.5210499.
The geographical coordinate of Malvern, Toronto are 43.8091955, -79.2217008.
The geographical coordinate of Don Mills, Toronto are 43.775347, -79.3459439.
The geographical coordinate of Parkview Hill, Toronto are 43.7062977, -79.3219073.
The geographical coordinate of Garden District, Toronto are 43.6564995, -79.3771141.
The geographical coordinate of Glencairn, Toronto are 43.7087117, -79.4406853.
The geographical coordinate of West Deane Park, Toronto are 43.6631995, -79.5685684.
The geographical coordinate of
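
The loop above relies on `geocode` returning `None` for addresses it cannot resolve. A minimal sketch of handling that case explicitly is shown below, with a pause between lookups because the public Nominatim service asks clients to stay at roughly one request per second. The names `geocode_all` and `fake_geocode` are hypothetical; `fake_geocode` stands in for `geolocator.geocode` so the example runs offline.

```python
import time
from types import SimpleNamespace


def geocode_all(addresses, geocode, delay_seconds=1.0):
    """Geocode each address; return (found, missing).

    `geocode` is any callable that returns an object with .latitude and
    .longitude, or None when the address is unknown.
    """
    found, missing = {}, []
    for address in addresses:
        location = geocode(address)
        if location is None:
            missing.append(address)
        else:
            found[address] = (location.latitude, location.longitude)
        time.sleep(delay_seconds)  # be polite to the geocoding service
    return found, missing


def fake_geocode(address):
    """Hypothetical offline stand-in for geolocator.geocode."""
    known = {
        "Parkwoods, Toronto": SimpleNamespace(latitude=43.7587999, longitude=-79.3201966),
    }
    return known.get(address)


found, missing = geocode_all(
    ["Parkwoods, Toronto", "Nowhere, Toronto"], fake_geocode, delay_seconds=0
)
print(found)
print(missing)
```

Collecting the unresolved addresses in a list plays the same role as `to_drop_unknown` in the notebook, without depending on an exception to detect a failed lookup.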

In [9]:
df.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.7588,-79.3202
1,M4A,North York,Victoria Village,43.7327,-79.3112
2,M5A,Downtown Toronto,Regent Park,43.6607,-79.3605
3,M6A,North York,Lawrence Manor,43.7221,-79.4375
4,M7A,Downtown Toronto,Queen's Park,43.6597,-79.3903


### **6. Cleaning failed locations**

In [10]:
geo_df = df.drop(to_drop_unknown)
# Assign the result back: an in-place replace on a selected column may not update geo_df
geo_df['Latitude'] = geo_df['Latitude'].replace('', np.nan)
geo_df.dropna(subset=['Latitude'], inplace=True)
geo_df.shape

(99, 5)

### **7. Filter just Toronto Boroughs**

In [12]:
toronto = geo_df[geo_df['Borough'].str.contains("Toronto")].reset_index(drop=True)
toronto.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M5A,Downtown Toronto,Regent Park,43.660706,-79.3605
1,M7A,Downtown Toronto,Queen's Park,43.659659,-79.3903
2,M5B,Downtown Toronto,Garden District,43.6565,-79.3771
3,M5C,Downtown Toronto,St. James Town,43.669403,-79.3727
4,M4E,East Toronto,The Beaches,43.671024,-79.2967


In [13]:
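# Keep the first row per postal code and renumber the index
# (reset_index returns a new frame, so the result has to be assigned)
toronto = toronto.drop_duplicates(subset="PostalCode", keep="first").reset_index(drop=True)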
print(toronto.shape)

(37, 5)


### **8. Foursquare credentials**

In [14]:
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret (keep it out of shared notebooks)
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
toronto_df_new = toronto.copy()

Your credentials:
CLIENT_ID: YOUR_FOURSQUARE_CLIENT_ID
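
API credentials are usually better kept out of a notebook that is published on GitHub. One possible approach, sketched below, is to read them from environment variables. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `load_foursquare_credentials` are assumptions made for this illustration, not something Foursquare or the notebook prescribes.

```python
import os


def load_foursquare_credentials(environ=os.environ):
    """Return (client_id, client_secret), or raise if either one is missing."""
    try:
        return environ["FOURSQUARE_CLIENT_ID"], environ["FOURSQUARE_CLIENT_SECRET"]
    except KeyError as missing:
        raise RuntimeError(
            "Set the environment variable {} first".format(missing)
        ) from None


# Demonstration with a hypothetical in-memory environment
demo_env = {
    "FOURSQUARE_CLIENT_ID": "demo-id",
    "FOURSQUARE_CLIENT_SECRET": "demo-secret",
}
client_id, client_secret = load_foursquare_credentials(demo_env)
print("CLIENT_ID: " + client_id)  # the secret is deliberately not printed
```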


### **9. Top 100 venues within a 500 m radius**

In [15]:
radius = 500
LIMIT = 100

venues = []

for lat, long, post, borough, neighborhood in zip(toronto_df_new['Latitude'], toronto_df_new['Longitude'], toronto_df_new['PostalCode'], toronto_df_new['Borough'], toronto_df_new['Neighborhood']):
    # build the explore request for this neighborhood's coordinates
    url = "https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}".format(
        CLIENT_ID, CLIENT_SECRET, VERSION, lat, long, radius, LIMIT)

    # each item in the first group describes one venue
    results = requests.get(url).json()["response"]['groups'][0]['items']

    for venue in results:
        venues.append((
            post, borough, neighborhood, lat, long,
            venue['venue']['name'],
            venue['venue']['location']['lat'],
            venue['venue']['location']['lng'],
            venue['venue']['categories'][0]['name']))
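
The loop above indexes straight into `["response"]['groups'][0]['items']` and `['categories'][0]`, so a response without groups or a venue without categories would stop the whole run. A minimal sketch of a more defensive parser follows. The function name `extract_venues`, the `"Unknown"` fallback and `sample_payload` are hypothetical; the payload only mimics the keys that the notebook itself reads.

```python
def extract_venues(payload):
    """Return (name, lat, lng, category) tuples from an explore-style response."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        # Fall back to a placeholder when a venue carries no category
        categories = venue.get("categories") or [{"name": "Unknown"}]
        rows.append((
            venue["name"],
            venue["location"]["lat"],
            venue["location"]["lng"],
            categories[0]["name"],
        ))
    return rows


# Hypothetical payload shaped like the keys used in the loop above
sample_payload = {
    "response": {
        "groups": [{
            "items": [
                {"venue": {"name": "Sumach Espresso",
                           "location": {"lat": 43.658135, "lng": -79.359515},
                           "categories": [{"name": "Coffee Shop"}]}},
                {"venue": {"name": "Unnamed Spot",
                           "location": {"lat": 43.66, "lng": -79.36},
                           "categories": []}},
            ]
        }]
    }
}

rows = extract_venues(sample_payload)
print(rows)
print(extract_venues({"response": {}}))  # no groups at all
```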

In [16]:
venues_df = pd.DataFrame(venues)

# define the column names
venues_df.columns = ['PostalCode', 'Borough', 'Neighborhood', 'BoroughLatitude', 'BoroughLongitude', 'VenueName', 'VenueLatitude', 'VenueLongitude', 'VenueCategory']

print(venues_df.shape)
venues_df.head()

(1862, 9)


Unnamed: 0,PostalCode,Borough,Neighborhood,BoroughLatitude,BoroughLongitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
0,M5A,Downtown Toronto,Regent Park,43.660706,-79.360457,Regent Park Aquatic Centre,43.6606,-79.361392,Pool
1,M5A,Downtown Toronto,Regent Park,43.660706,-79.360457,Sumach Espresso,43.658135,-79.359515,Coffee Shop
2,M5A,Downtown Toronto,Regent Park,43.660706,-79.360457,Daniels Spectrum,43.660137,-79.361808,Performing Arts Venue
3,M5A,Downtown Toronto,Regent Park,43.660706,-79.360457,Thai To Go,43.663418,-79.36071,Thai Restaurant
4,M5A,Downtown Toronto,Regent Park,43.660706,-79.360457,Paintbox Bistro,43.66005,-79.362855,Restaurant


### **10. How many venues were returned per postal code**

In [17]:
venues_df.groupby(["PostalCode", "Borough", "Neighborhood"]).count().head()

PostalCode,Borough,Neighborhood,BoroughLatitude,BoroughLongitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
M4E,East Toronto,The Beaches,48,48,48,48,48,48
M4K,East Toronto,The Danforth West,32,32,32,32,32,32
M4L,East Toronto,India Bazaar,34,34,34,34,34,34
M4M,East Toronto,Studio District,100,100,100,100,100,100
M4N,Central Toronto,Lawrence Park,52,52,52,52,52,52


In [18]:
print('There are {} unique categories.'.format(len(venues_df['VenueCategory'].unique())))

There are 241 unique categories.


In [19]:
venues_df['VenueCategory'].unique()[:10]

array(['Pool', 'Coffee Shop', 'Performing Arts Venue', 'Thai Restaurant',
       'Restaurant', 'Pub', 'Animal Shelter', 'Sushi Restaurant',
       'Auto Dealership', 'Food Truck'], dtype=object)

### **11. Analyzing Areas**

In [20]:
# one hot encoding
toronto_onehot = pd.get_dummies(venues_df[['VenueCategory']], prefix="", prefix_sep="")

# add postal, borough and neighborhood column back to dataframe
toronto_onehot['PostalCode'] = venues_df['PostalCode'] 
toronto_onehot['Borough'] = venues_df['Borough'] 
toronto_onehot['Neighborhoods'] = venues_df['Neighborhood'] 

# move postal, borough and neighborhood column to the first column
fixed_columns = list(toronto_onehot.columns[-3:]) + list(toronto_onehot.columns[:-3])
toronto_onehot = toronto_onehot[fixed_columns]

print(toronto_onehot.shape)
toronto_onehot.head()

(1862, 244)


Unnamed: 0,PostalCode,Borough,Neighborhoods,Accessories Store,African Restaurant,American Restaurant,Animal Shelter,Antique Shop,Aquarium,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Dealership,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Baseball Stadium,Basketball Stadium,Beach,Beach Bar,Bed & Breakfast,Beer Bar,Beer Store,Belgian Restaurant,Bike Trail,Bistro,Bookstore,Boutique,Bowling Alley,Brazilian Restaurant,Breakfast Spot,Brewery,Bubble Tea Shop,Building,Burger Joint,Burrito Place,Bus Line,...,Shipping Store,Shoe Store,Shopping Mall,Shopping Plaza,Skating Rink,Smoke Shop,Smoothie Shop,Snack Place,Soccer Field,Soup Place,Souvlaki Shop,Spa,Speakeasy,Sporting Goods Shop,Sports Bar,Steakhouse,Supermarket,Sushi Restaurant,Taco Place,Tailor Shop,Tea Room,Tennis Court,Thai Restaurant,Theater,Thrift / Vintage Store,Tibetan Restaurant,Toy / Game Store,Trail,Train Station,Tree,Turkish Restaurant,University,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Whisky Bar,Wine Bar,Wings Joint,Yoga Studio
0,M5A,Downtown Toronto,Regent Park,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,M5A,Downtown Toronto,Regent Park,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,M5A,Downtown Toronto,Regent Park,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,M5A,Downtown Toronto,Regent Park,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,M5A,Downtown Toronto,Regent Park,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


### **12. Group rows by Neighborhood (Mean & Frequency)**

In [21]:
toronto_grouped = toronto_onehot.groupby(["PostalCode", "Borough", "Neighborhoods"]).mean().reset_index()

print(toronto_grouped.shape)
toronto_grouped.head()

(37, 244)


Unnamed: 0,PostalCode,Borough,Neighborhoods,Accessories Store,African Restaurant,American Restaurant,Animal Shelter,Antique Shop,Aquarium,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Dealership,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Baseball Stadium,Basketball Stadium,Beach,Beach Bar,Bed & Breakfast,Beer Bar,Beer Store,Belgian Restaurant,Bike Trail,Bistro,Bookstore,Boutique,Bowling Alley,Brazilian Restaurant,Breakfast Spot,Brewery,Bubble Tea Shop,Building,Burger Joint,Burrito Place,Bus Line,...,Shipping Store,Shoe Store,Shopping Mall,Shopping Plaza,Skating Rink,Smoke Shop,Smoothie Shop,Snack Place,Soccer Field,Soup Place,Souvlaki Shop,Spa,Speakeasy,Sporting Goods Shop,Sports Bar,Steakhouse,Supermarket,Sushi Restaurant,Taco Place,Tailor Shop,Tea Room,Tennis Court,Thai Restaurant,Theater,Thrift / Vintage Store,Tibetan Restaurant,Toy / Game Store,Trail,Train Station,Tree,Turkish Restaurant,University,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Whisky Bar,Wine Bar,Wings Joint,Yoga Studio
0,M4E,East Toronto,The Beaches,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020833,0.0,0.020833,0.020833,0.041667,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.020833,0.0,0.0,...,0.0,0.0,0.020833,0.0,0.020833,0.0,0.0,0.0,0.0,0.0,0.0,0.020833,0.0,0.020833,0.0,0.0,0.0,0.0,0.0,0.0,0.020833,0.0,0.020833,0.0,0.0,0.0,0.020833,0.0,0.0,0.020833,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,M4K,East Toronto,The Danforth West,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.03125,0.0,0.03125,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,...,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,M4L,East Toronto,India Bazaar,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.029412,0.0,0.029412,...,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,M4M,East Toronto,Studio District,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.02,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.02,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.02,0.0,...,0.0,0.03,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.0,0.01,0.0,0.0,0.01,0.0,0.02,0.01,0.0,0.0,0.0,0.02,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.04,0.01,0.0,0.0,0.0,0.0,0.0,0.02
4,M4N,Central Toronto,Lawrence Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.019231,0.0,0.057692,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.019231,0.0,0.019231,0.0,0.019231,...,0.019231,0.0,0.019231,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.019231,0.076923,0.0,0.0,0.019231,0.0,0.0,0.0,0.0,0.0,0.019231,0.0,0.0,0.0,0.019231,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
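
Why the mean of the one-hot columns can be read as a frequency is easiest to see on a toy example. The sketch below uses a hypothetical four-row table (`toy`) with two postal codes; it is only an illustration of the idea, not the notebook's data.

```python
import pandas as pd

# Hypothetical venues: three in postal code A, one in postal code B
toy = pd.DataFrame({
    "PostalCode": ["A", "A", "A", "B"],
    "VenueCategory": ["Coffee Shop", "Coffee Shop", "Park", "Park"],
})

# One column per category, 1.0 where the venue belongs to it
onehot = pd.get_dummies(toy["VenueCategory"], dtype=float)
onehot.insert(0, "PostalCode", toy["PostalCode"])

# The mean of a 0/1 column is the share of venues in that category
frequencies = onehot.groupby("PostalCode").mean()
print(frequencies)
```

Each row of `frequencies` sums to 1, so postal codes with very different venue counts (for example 32 versus 100 venues) remain comparable when they are clustered.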


### **13. 10 venues per postal code**

In [22]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
areaColumns = ['PostalCode', 'Borough', 'Neighborhoods']
freqColumns = []
for ind in np.arange(num_top_venues):
    try:
        freqColumns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # positions beyond the third use the generic 'th' suffix
        freqColumns.append('{}th Most Common Venue'.format(ind+1))
columns = areaColumns+freqColumns

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['PostalCode'] = toronto_grouped['PostalCode']
neighborhoods_venues_sorted['Borough'] = toronto_grouped['Borough']
neighborhoods_venues_sorted['Neighborhoods'] = toronto_grouped['Neighborhoods']

for ind in np.arange(toronto_grouped.shape[0]):
    row_categories = toronto_grouped.iloc[ind, :].iloc[3:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    neighborhoods_venues_sorted.iloc[ind, 3:] = row_categories_sorted.index.values[0:num_top_venues]

# neighborhoods_venues_sorted.sort_values(freqColumns, inplace=True)
print(neighborhoods_venues_sorted.shape)
neighborhoods_venues_sorted.head()

(37, 13)


Unnamed: 0,PostalCode,Borough,Neighborhoods,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M4E,East Toronto,The Beaches,Beach,Japanese Restaurant,Pub,Breakfast Spot,Park,Bar,Salon / Barbershop,Pizza Place,Shopping Mall,Nail Salon
1,M4K,East Toronto,The Danforth West,Grocery Store,Skating Rink,Pharmacy,Coffee Shop,Bus Line,Caribbean Restaurant,French Restaurant,Fish & Chips Shop,Optical Shop,Café
2,M4L,East Toronto,India Bazaar,Indian Restaurant,Grocery Store,Restaurant,Café,Halal Restaurant,Theater,Diner,Pizza Place,Platform,Egyptian Restaurant
3,M4M,East Toronto,Studio District,Coffee Shop,Café,Vegetarian / Vegan Restaurant,Cosmetics Shop,Clothing Store,Shoe Store,Japanese Restaurant,Yoga Studio,Bar,Italian Restaurant
4,M4N,Central Toronto,Lawrence Park,Sushi Restaurant,Italian Restaurant,Bakery,Coffee Shop,Fast Food Restaurant,Cosmetics Shop,Pharmacy,Pub,Asian Restaurant,Bank
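
The column labels and the ranking step can also be written as two small helpers. This is a sketch under assumptions: `ordinal` and `top_categories` are hypothetical names, and the explicit tie-break by category name is a choice made here, since categories with equal frequencies may otherwise come out in an order that is not obvious.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return "{}{}".format(n, suffix)


def top_categories(frequencies, n):
    """Return the n most frequent categories; ties are broken alphabetically."""
    ranked = sorted(frequencies.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:n]]


columns = ["{} Most Common Venue".format(ordinal(i)) for i in range(1, 4)]
# Hypothetical frequencies for one postal code
top = top_categories({"Pub": 0.1, "Beach": 0.3, "Bar": 0.1, "Park": 0.2}, 3)
print(dict(zip(columns, top)))
```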


### **14. Clustering**

In [23]:


# set number of clusters
kclusters = 5

Toronto_grouped_clustering = toronto_grouped.drop(["PostalCode", "Borough", "Neighborhoods"], axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(Toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]



array([0, 3, 3, 1, 1, 1, 0, 1, 2, 1], dtype=int32)

In [24]:


print(toronto_grouped.shape)
print(neighborhoods_venues_sorted.shape)
print(toronto_df_new.shape)
print(kmeans.labels_.size)



(37, 244)
(37, 13)
(37, 5)
37


In [25]:
# create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.
toronto_merged = toronto_df_new.copy()
# add clustering labels
toronto_merged["Cluster Labels"] = kmeans.labels_
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.drop(["Borough", "Neighborhoods"], axis=1).set_index("PostalCode"), on="PostalCode")
toronto_merged.rename(columns = {'BoroughLatitude':'Latitude', 'BoroughLongitude':'Longitude'}, inplace = True) 
print(toronto_merged.shape)
toronto_merged.head() # check the last columns!

(37, 16)


Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M5A,Downtown Toronto,Regent Park,43.660706,-79.3605,0,Coffee Shop,Restaurant,Thai Restaurant,Pet Store,Indian Restaurant,Fast Food Restaurant,Food Truck,Pub,Beer Store,Sushi Restaurant
1,M7A,Downtown Toronto,Queen's Park,43.659659,-79.3903,3,Coffee Shop,Café,Sandwich Place,Italian Restaurant,Bubble Tea Shop,Bank,Japanese Restaurant,French Restaurant,Thai Restaurant,Restaurant
2,M5B,Downtown Toronto,Garden District,43.6565,-79.3771,3,Clothing Store,Hotel,Coffee Shop,Japanese Restaurant,Café,Electronics Store,Theater,Lingerie Store,Sandwich Place,Restaurant
3,M5C,Downtown Toronto,St. James Town,43.669403,-79.3727,1,Coffee Shop,Café,Pizza Place,Grocery Store,Pharmacy,Bar,Food & Drink Shop,Market,Breakfast Spot,Restaurant
4,M4E,East Toronto,The Beaches,43.671024,-79.2967,1,Beach,Japanese Restaurant,Pub,Breakfast Spot,Park,Bar,Salon / Barbershop,Pizza Place,Shopping Mall,Nail Salon
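
One point worth checking in the cell above: `kmeans.labels_` has one entry per row of `toronto_grouped`, whose rows come out of `groupby` sorted by postal code (M4E, M4K, ...), while `toronto_df_new` keeps the scrape order (M5A, M7A, ...). Assigning the array positionally can therefore attach a label to a different postal code than the one it was computed for. A minimal sketch of aligning the labels by `PostalCode` instead is shown below; the three-row frames and the `labels` array are hypothetical stand-ins for `toronto_df_new`, `toronto_grouped` and `kmeans.labels_`.

```python
import numpy as np
import pandas as pd

# Stand-in for toronto_df_new: rows in scrape order
original = pd.DataFrame({
    "PostalCode": ["M5A", "M4E", "M4K"],
    "Neighborhood": ["Regent Park", "The Beaches", "The Danforth West"],
})
# Stand-in for toronto_grouped: rows sorted by PostalCode after groupby
grouped = pd.DataFrame({"PostalCode": ["M4E", "M4K", "M5A"]})
# Stand-in for kmeans.labels_: one label per row of `grouped`
labels = np.array([0, 3, 1])

# Key the labels by postal code, then look each row up by its own code
label_by_postcode = pd.Series(labels, index=grouped["PostalCode"])
merged = original.copy()
merged["Cluster Labels"] = merged["PostalCode"].map(label_by_postcode)
print(merged)
```

With this lookup, M5A receives the label computed for M5A (1 in the toy data) rather than the first element of the array.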


In [26]:


# sort the results by Cluster Labels
print(toronto_merged.shape)
toronto_merged.sort_values(["Cluster Labels"], inplace=True)
toronto_merged.head()



(37, 16)


Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M5A,Downtown Toronto,Regent Park,43.660706,-79.3605,0,Coffee Shop,Restaurant,Thai Restaurant,Pet Store,Indian Restaurant,Fast Food Restaurant,Food Truck,Pub,Beer Store,Sushi Restaurant
34,M4X,Downtown Toronto,St. James Town,43.669403,-79.3727,0,Coffee Shop,Café,Pizza Place,Grocery Store,Pharmacy,Bar,Food & Drink Shop,Market,Breakfast Spot,Restaurant
32,M5V,Downtown Toronto,CN Tower,43.642564,-79.3871,0,Hotel,Pizza Place,Coffee Shop,Scenic Lookout,Gym,Bar,Baseball Stadium,Theater,Aquarium,Ice Cream Shop
31,M4V,Central Toronto,Summerhill West,43.681678,-79.3905,0,Italian Restaurant,Coffee Shop,Sushi Restaurant,Café,Spa,Gym / Fitness Center,Pub,Sporting Goods Shop,Beer Store,French Restaurant
30,M5T,Downtown Toronto,Kensington Market,43.655214,-79.4023,0,Café,Vegetarian / Vegan Restaurant,Coffee Shop,Bar,Mexican Restaurant,Vietnamese Restaurant,Bakery,Grocery Store,Hostel,Farmers Market


### **15. Plotting the final Toronto neighborhood clusters**

In [27]:


# center the map on the mean coordinates of the plotted neighborhoods
# rather than on whichever address happened to be geocoded last
map_center = [toronto_merged['Latitude'].astype(float).mean(), toronto_merged['Longitude'].astype(float).mean()]
map_clusters = folium.Map(location=map_center, zoom_start=12)
# set color scheme for the clusters
rainbow = ['red', 'blue', 'orange', 'darkgreen', 'darkblue', 'black']
# add markers to map
for lat, lon, post, bor, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['PostalCode'], toronto_merged['Borough'], toronto_merged['Neighborhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup('{} ({}): {} - Cluster {}'.format(bor, post, poi, cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],  # cluster 0 wraps around to the last color
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters
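
In the cell above, `rainbow[cluster-1]` sends cluster 0 to the last palette entry through negative indexing. If a direct cluster-to-color mapping is preferred, it could be sketched as follows; `PALETTE` and `cluster_color` are hypothetical names, and the colors are the ones already listed in `rainbow`.

```python
PALETTE = ["red", "blue", "orange", "darkgreen", "darkblue", "black"]


def cluster_color(cluster, palette=PALETTE):
    """Map a cluster label to a color, cycling if there are more clusters than colors."""
    return palette[cluster % len(palette)]


# One color per cluster label for kclusters = 5
colors = {cluster: cluster_color(cluster) for cluster in range(5)}
print(colors)
```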



### **Static image of the map, in case GitHub does not render it**

In [28]:
from IPython.display import Image
Image(url="https://github.com/AlucarD980/Coursera_Capstone/blob/main/Toronto_Map_Cluster.jpg?raw=true")

### **16. Looking at Clusters**

In [29]:
# Cluster 0
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Downtown Toronto,0,Coffee Shop,Restaurant,Thai Restaurant,Pet Store,Indian Restaurant,Fast Food Restaurant,Food Truck,Pub,Beer Store,Sushi Restaurant
34,Downtown Toronto,0,Coffee Shop,Café,Pizza Place,Grocery Store,Pharmacy,Bar,Food & Drink Shop,Market,Breakfast Spot,Restaurant
32,Downtown Toronto,0,Hotel,Pizza Place,Coffee Shop,Scenic Lookout,Gym,Bar,Baseball Stadium,Theater,Aquarium,Ice Cream Shop
31,Central Toronto,0,Italian Restaurant,Coffee Shop,Sushi Restaurant,Café,Spa,Gym / Fitness Center,Pub,Sporting Goods Shop,Beer Store,French Restaurant
30,Downtown Toronto,0,Café,Vegetarian / Vegan Restaurant,Coffee Shop,Bar,Mexican Restaurant,Vietnamese Restaurant,Bakery,Grocery Store,Hostel,Farmers Market
6,Downtown Toronto,0,Coffee Shop,Italian Restaurant,Sandwich Place,Café,Bubble Tea Shop,Middle Eastern Restaurant,Chinese Restaurant,Yoga Studio,Clothing Store,Smoothie Shop
26,Central Toronto,0,Sushi Restaurant,Italian Restaurant,Furniture / Home Store,Coffee Shop,Park,Pub,Ice Cream Shop,Convenience Store,Mexican Restaurant,Middle Eastern Restaurant
25,West Toronto,0,Diner,Tibetan Restaurant,Restaurant,Indian Restaurant,Pharmacy,Bakery,Pizza Place,French Restaurant,Donut Shop,Japanese Restaurant


In [30]:
# Cluster 1
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
33,Downtown Toronto,1,Park,Japanese Restaurant,Playground,Bike Trail,Yoga Studio,Ethiopian Restaurant,Dumpling Restaurant,Eastern European Restaurant,Egyptian Restaurant,Electronics Store
29,Central Toronto,1,Convenience Store,Gym,Restaurant,Tennis Court,Doctor's Office,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Event Space,Ethiopian Restaurant
27,Downtown Toronto,1,Café,Park,Bookstore,Japanese Restaurant,Bakery,Bubble Tea Shop,Restaurant,French Restaurant,Museum,College Arts Building
24,Central Toronto,1,Pizza Place,Sushi Restaurant,Gym,Grocery Store,Thai Restaurant,Bistro,Park,Donut Shop,Fried Chicken Joint,Bookstore
23,Central Toronto,1,Bar,Vegetarian / Vegan Restaurant,Asian Restaurant,Restaurant,Vietnamese Restaurant,Café,Theater,Men's Store,Brewery,Miscellaneous Shop
22,West Toronto,1,Pet Store,Convenience Store,Pizza Place,Tennis Court,Mexican Restaurant,Pool,Food & Drink Shop,School,Café,Gym
18,Central Toronto,1,Sushi Restaurant,Italian Restaurant,Bakery,Coffee Shop,Fast Food Restaurant,Cosmetics Shop,Pharmacy,Pub,Asian Restaurant,Bank
3,Downtown Toronto,1,Coffee Shop,Café,Pizza Place,Grocery Store,Pharmacy,Bar,Food & Drink Shop,Market,Breakfast Spot,Restaurant
4,East Toronto,1,Beach,Japanese Restaurant,Pub,Breakfast Spot,Park,Bar,Salon / Barbershop,Pizza Place,Shopping Mall,Nail Salon
5,Downtown Toronto,1,Coffee Shop,Restaurant,Café,Hotel,Seafood Restaurant,Italian Restaurant,Japanese Restaurant,Gym,Gastropub,Cocktail Bar


In [31]:
# Cluster 2
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
8,Downtown Toronto,2,Coffee Shop,Café,Hotel,Shoe Store,Thai Restaurant,Restaurant,Clothing Store,Speakeasy,Beer Bar,Bookstore


In [32]:
# Cluster 3
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
15,East Toronto,3,Indian Restaurant,Grocery Store,Restaurant,Café,Halal Restaurant,Theater,Diner,Pizza Place,Platform,Egyptian Restaurant
1,Downtown Toronto,3,Coffee Shop,Café,Sandwich Place,Italian Restaurant,Bubble Tea Shop,Bank,Japanese Restaurant,French Restaurant,Thai Restaurant,Restaurant
2,Downtown Toronto,3,Clothing Store,Hotel,Coffee Shop,Japanese Restaurant,Café,Electronics Store,Theater,Lingerie Store,Sandwich Place,Restaurant
28,West Toronto,3,Café,Coffee Shop,Bakery,Bank,Pizza Place,Liquor Store,Flower Shop,Frozen Yogurt Shop,Bookstore,Soccer Field
36,Downtown Toronto,3,Coffee Shop,Pizza Place,Grocery Store,Sushi Restaurant,Caribbean Restaurant,Filipino Restaurant,Market,Restaurant,Food & Drink Shop,Breakfast Spot
12,East Toronto,3,Grocery Store,Skating Rink,Pharmacy,Coffee Shop,Bus Line,Caribbean Restaurant,French Restaurant,Fish & Chips Shop,Optical Shop,Café
13,Downtown Toronto,3,Coffee Shop,Hotel,Café,American Restaurant,Seafood Restaurant,Japanese Restaurant,Restaurant,Asian Restaurant,Deli / Bodega,Beer Bar
21,Central Toronto,3,Pizza Place,Sushi Restaurant,Bagel Shop,Italian Restaurant,Israeli Restaurant,Gastropub,Bank,Intersection,Korean Restaurant,Trail
20,Central Toronto,3,Sushi Restaurant,Italian Restaurant,Furniture / Home Store,Coffee Shop,Park,Pub,Ice Cream Shop,Convenience Store,Mexican Restaurant,Middle Eastern Restaurant
19,Central Toronto,3,Sushi Restaurant,Fast Food Restaurant,Gas Station,Persian Restaurant,Deli / Bodega,Pool,Pub,Restaurant,Skating Rink,Spa


In [33]:
# Cluster 4
toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
10,Downtown Toronto,4,Coffee Shop,Café,Restaurant,Hotel,Pizza Place,Italian Restaurant,Bank,Sandwich Place,Brewery,Plaza


# **Final Observations**

Most neighborhoods were assigned to clusters 1 and 3, which correspond to highly commercial areas. The neighborhoods in the remaining clusters are less commercial.