# Scraping Wikipedia Tables
I am going to scrape a particular Wikipedia page, extract a table, and transform it into a pandas DataFrame.

The tasks are the following:
1. The dataframe will consist of three columns: PostalCode, Borough, and Neighborhood
2. Only process the cells that have an assigned borough. Ignore cells with a borough that is Not assigned.
3. More than one neighborhood can exist in one postal code area. For example, in the table on the Wikipedia page, you will notice that M5A is listed twice and has two neighborhoods: Harbourfront and Regent Park. These two rows will be combined into one row with the neighborhoods separated with a comma as shown in row 11 in the above table.
4. If a cell has a borough but a Not assigned neighborhood, then the neighborhood will be the same as the borough. So for the 9th cell in the table on the Wikipedia page, the value of the Borough and the Neighborhood columns will be Queen's Park.
5. Clean your Notebook and add Markdown cells to explain your work and any assumptions you are making.
6. In the last cell of your notebook, use the `.shape` attribute to print the number of rows of your dataframe.
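Tasks 2 and 4 can also be expressed without a row-by-row loop. Below is a minimal vectorized sketch in pandas. It assumes a frame that already holds the three scraped columns. The `raw` frame and its `M1A` row are made up for illustration and are not scraped data:

```python
import pandas as pd

# Made-up rows shaped like the scraped table (illustrative only)
raw = pd.DataFrame({
    "PostalCode": ["M1A", "M5A", "M5A", "M7A"],
    "Borough": ["Not assigned", "Downtown Toronto", "Downtown Toronto", "Queen's Park"],
    "Neighborhood": ["Not assigned", "Harbourfront", "Regent Park", "Not assigned"],
})

# Task 2: keep only rows whose borough is assigned
clean = raw[raw["Borough"] != "Not assigned"].copy()

# Task 4: an unassigned neighborhood takes the borough's name
unassigned = clean["Neighborhood"] == "Not assigned"
clean.loc[unassigned, "Neighborhood"] = clean.loc[unassigned, "Borough"]

print(clean.reset_index(drop=True))
```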


### Notebook summary:
1. Create an empty dataframe df (**task1**)
2. Scrape the Wikipedia page
3. Populate df, completing tasks 2 and 4 (**task2, task4**)
4. Once the data is in df, wrangle it (**task3**)
5. Print the shape of the dataframe (**task6**)

## 1. Create an empty dataframe 

In [1]:
import pandas as pd 
import requests

In [2]:
columns_name = ["PostalCode","Borough", "Neighborhood" ]
df = pd.DataFrame(columns = columns_name)

## 2. Scrape the Wiki page 
I'm using BeautifulSoup with the lxml parser.

In [3]:
from bs4 import BeautifulSoup

# download the raw HTML of the page (this is page source, not a URL)
source = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').text

soup = BeautifulSoup(source, 'lxml')
#print(soup.prettify())

## 3. Populate the dataframe 


Using the browser's Inspect Element tool, I can see that the data lives in a `<table>` tag with the class `wikitable sortable`. I drop the first `<tr>` tag because it holds the table headers.

- `content` holds all the rows of the table, and `row` is one row in the loop.

- I wrap each cell lookup in `try`/`except` to avoid errors during scraping (playing it safe). In this particular case it is not strictly needed, because the table is complete.

- I skip any row whose Borough is Not assigned (**task2**). When the Neighborhood is Not assigned, I replace it with the `borough` value (**task4**).

- Each parsed row is added to the `df` dataframe.
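The extraction steps in the bullets above can be sketched with Python's standard-library `html.parser` instead of BeautifulSoup. This is a swapped-in technique for illustration only. It runs on a small hand-written table rather than the live page, and the `TableRows` class is hypothetical:

```python
from html.parser import HTMLParser

# A tiny hand-written table standing in for the Wikipedia markup (illustrative only)
HTML = """
<table class="wikitable sortable">
<tr><th>Postcode</th><th>Borough</th><th>Neighbourhood</th></tr>
<tr><td>M1A</td><td>Not assigned</td><td>Not assigned</td></tr>
<tr><td>M3A</td><td>North York</td><td>Parkwoods</td></tr>
<tr><td>M7A</td><td>Queen's Park</td><td>Not assigned</td></tr>
</table>
"""


class TableRows(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows, self._row, self._cell = [], None, None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:  # only keep text that sits inside a <td>
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # the header row has only <th> cells, so it ends up empty
                self.rows.append(self._row)
            self._row = None


parser = TableRows()
parser.feed(HTML)

records = []
for postalcode, borough, neighborhood in parser.rows:
    if borough == "Not assigned":        # task 2
        continue
    if neighborhood == "Not assigned":   # task 4
        neighborhood = borough
    records.append((postalcode, borough, neighborhood))

print(records)
```

The header row contains only `<th>` cells, so it comes out empty and is dropped automatically. This plays the same role as `del content[0]`.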

In [4]:
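table = soup.find('table', {'class': 'wikitable sortable'})
content = table.find_all('tr')
del content[0]  # the first <tr> holds the column headers

rows = []
for row in content:
    element = row.find_all('td')
    # task 2: skip rows whose borough is "Not assigned" (and rows without data cells)
    if len(element) > 1 and element[1].text.strip() != 'Not assigned':
        try:
            postalcode = element[0].text.strip()
        except IndexError:
            postalcode = None
        try:
            borough = element[1].text.strip()
        except IndexError:
            borough = None
        try:
            neighborhood = element[2].text.split("\n")[0].strip()
            # task 4: an unassigned neighborhood takes the borough's name
            if neighborhood == "Not assigned":
                neighborhood = borough
        except IndexError:
            neighborhood = None

        rows.append({"PostalCode": postalcode,
                     "Borough": borough,
                     "Neighborhood": neighborhood})

# DataFrame.append was removed in pandas 2.0, so build df once from the collected rows
df = pd.DataFrame(rows, columns=columns_name)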

In [5]:
df.head(10)

|   | PostalCode | Borough | Neighborhood |
|---|---|---|---|
| 0 | M3A | North York | Parkwoods |
| 1 | M4A | North York | Victoria Village |
| 2 | M5A | Downtown Toronto | Harbourfront |
| 3 | M5A | Downtown Toronto | Regent Park |
| 4 | M6A | North York | Lawrence Heights |
| 5 | M6A | North York | Lawrence Manor |
| 6 | M7A | Queen's Park | Queen's Park |
| 7 | M9A | Etobicoke | Islington Avenue |
| 8 | M1B | Scarborough | Rouge |
| 9 | M1B | Scarborough | Malvern |


Looks OK!

## 4. Wrangle the dataframe 
To complete task 3, I first check how many unique postal codes there are. Task 3 asks for every postal code that appears on several rows to be collapsed into a single row, with its neighborhoods separated by commas. For example, M5A has two rows, Harbourfront and Regent Park.

Which postcodes have more than one neighborhood?

In [6]:
postcode = df.groupby("PostalCode")[["Neighborhood"]].count()
postcode.head()

| PostalCode | Neighborhood |
|---|---|
| M1B | 2 |
| M1C | 3 |
| M1E | 3 |
| M1G | 1 |
| M1H | 1 |
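To answer the question directly, a boolean mask on the counts keeps only the postal codes with more than one neighborhood. Below is a minimal sketch on a toy copy of `df`, built from rows shown in the `df.head(10)` output:

```python
import pandas as pd

# Toy copy of df using rows from the df.head(10) output
df = pd.DataFrame({
    "PostalCode": ["M3A", "M5A", "M5A", "M6A", "M6A", "M7A"],
    "Borough": ["North York", "Downtown Toronto", "Downtown Toronto",
                "North York", "North York", "Queen's Park"],
    "Neighborhood": ["Parkwoods", "Harbourfront", "Regent Park",
                     "Lawrence Heights", "Lawrence Manor", "Queen's Park"],
})

# Same count as in the notebook: neighborhoods per postal code
postcode = df.groupby("PostalCode")[["Neighborhood"]].count()

# Keep only the postal codes that will need merging (task 3)
multi = postcode[postcode["Neighborhood"] > 1]
print(multi)
print(df["PostalCode"].nunique())  # number of rows expected after merging
```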


How many rows will the final dataframe need?

In [7]:
df["PostalCode"].nunique()

103

Prepare a brand new dataframe for the final changes. 

In [8]:
columns_name = ["PostalCode","Borough", "Neighborhood" ]
df2 = pd.DataFrame(columns = columns_name)


I use the `postcode` table above as a guide for restructuring the dataframe.

- If the postal code has only one neighborhood, copy its row from df to df2 unchanged.

- If the postal code has more than one neighborhood, reset the index of its rows and concatenate the remaining neighborhoods onto the first row, separated by commas. Then append only that combined first row to df2.
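The same merge can probably be done in one step with `groupby` and a string join. Below is a minimal sketch on a toy copy of `df`. It assumes each postal code belongs to exactly one borough, so grouping on both columns loses nothing:

```python
import pandas as pd

# Toy copy of df using rows from the df.head(10) output
df = pd.DataFrame({
    "PostalCode": ["M3A", "M5A", "M5A", "M6A", "M6A", "M7A"],
    "Borough": ["North York", "Downtown Toronto", "Downtown Toronto",
                "North York", "North York", "Queen's Park"],
    "Neighborhood": ["Parkwoods", "Harbourfront", "Regent Park",
                     "Lawrence Heights", "Lawrence Manor", "Queen's Park"],
})

# One row per (PostalCode, Borough); neighborhoods joined with ", "
df2 = (
    df.groupby(["PostalCode", "Borough"], as_index=False)["Neighborhood"]
      .agg(", ".join)
)
print(df2)
```

Rows keep their original order inside each group, so the neighborhoods are joined in the order they appear on the page.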

In [9]:

rows = []
for index, row in postcode.iterrows():

    # all rows of df that share this postal code
    one_postcode = df.loc[df["PostalCode"] == index, ['PostalCode', 'Borough', 'Neighborhood']]
    one_postcode = one_postcode.reset_index(drop=True)

    if len(one_postcode) > 1:
        # combine every neighborhood into (a copy of) the first row, comma-separated;
        # this avoids chained assignment, which may silently write to a copy
        first = one_postcode.iloc[0].copy()
        first["Neighborhood"] = ", ".join(one_postcode["Neighborhood"])
        rows.append(first)

    else:
        rows.append(one_postcode.iloc[0])

# DataFrame.append was removed in pandas 2.0, so build df2 once from the collected rows
df2 = pd.DataFrame(rows, columns=columns_name).reset_index(drop=True)

In [10]:
df2.head()

|   | PostalCode | Borough | Neighborhood |
|---|---|---|---|
| 0 | M1B | Scarborough | Rouge, Malvern |
| 1 | M1C | Scarborough | Highland Creek, Rouge Hill, Port Union |
| 2 | M1E | Scarborough | Guildwood, Morningside, West Hill |
| 3 | M1G | Scarborough | Woburn |
| 4 | M1H | Scarborough | Cedarbrae |


## 5. Shape of the final dataframe

In [11]:
df2.shape

(103, 3)

## 6. Add coordinates  

In [12]:
import numpy as np
df2["latitude"] = np.nan
df2["longitude"] = np.nan

In [None]:
!pip install geocoder

In [None]:
import geocoder # import geocoder

for index, ps in df2.iterrows(): 
    # initialize your variable to None
    lat_lng_coords = None

    # loop until you get the coordinates (this never ends if the service keeps failing)
    while lat_lng_coords is None:
        g = geocoder.google('{}, Toronto, Ontario'.format(ps["PostalCode"]))
        lat_lng_coords = g.latlng

    # .loc writes into df2 directly; chained indexing like df2["latitude"][index] may only modify a copy
    df2.loc[index, "latitude"] = lat_lng_coords[0]
    df2.loc[index, "longitude"] = lat_lng_coords[1]

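This call never returned a response, so the `while` loop above never finished. The Google geocoding endpoint behind `geocoder.google` generally needs an API key, which may explain the failure. I drop the two empty columns and use a CSV file of coordinates instead.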
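The unbounded `while` loop is what turns a failing service into a hang. Below is a minimal sketch of a bounded retry. `geocode_with_retries` and the `flaky_lookup` stub are hypothetical. The stub stands in for `geocoder.google(...).latlng` and returns the M1B coordinates listed in `df3` on its third call:

```python
import time


def geocode_with_retries(lookup, query, max_attempts=5, delay=1.0):
    """Call lookup(query) until it returns coordinates or the attempts run out.

    `lookup` is any function returning [lat, lng] or None. This helper is a
    hypothetical sketch, not part of the geocoder package.
    """
    for _ in range(max_attempts):
        coords = lookup(query)
        if coords is not None:
            return coords
        time.sleep(delay)  # back off before retrying
    return None  # give up instead of looping forever


# Stub standing in for geocoder.google(...).latlng: fails twice, then succeeds
calls = {"n": 0}


def flaky_lookup(query):
    calls["n"] += 1
    return [43.806686, -79.194353] if calls["n"] >= 3 else None


print(geocode_with_retries(flaky_lookup, "M1B, Toronto, Ontario", delay=0))
```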

In [13]:
del df2["latitude"]
del df2["longitude"]

## Let's use the CSV file


In [14]:
coordinates = pd.read_csv("Geospatial_Coordinates.csv")



In [15]:
df3 = df2.set_index('PostalCode').join(coordinates.set_index('Postal Code'))
df3.reset_index(level=0, inplace=True)
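The same lookup can be written as a single `merge` with differently named key columns, which avoids setting and resetting the index. Below is a minimal sketch on two toy rows. It assumes the CSV has `Postal Code`, `Latitude` and `Longitude` columns, as the join above implies:

```python
import pandas as pd

# Toy versions of df2 and the coordinates CSV (values from the df3 output)
df2 = pd.DataFrame({"PostalCode": ["M1B", "M1C"],
                    "Borough": ["Scarborough", "Scarborough"],
                    "Neighborhood": ["Rouge, Malvern",
                                     "Highland Creek, Rouge Hill, Port Union"]})
coordinates = pd.DataFrame({"Postal Code": ["M1C", "M1B"],
                            "Latitude": [43.784535, 43.806686],
                            "Longitude": [-79.160497, -79.194353]})

# Left merge on differently named keys, then drop the duplicate key column
df3 = (df2.merge(coordinates, left_on="PostalCode", right_on="Postal Code",
                 how="left", validate="one_to_one")
          .drop(columns="Postal Code"))
print(df3)
```

`how="left"` keeps postal codes that are missing from the CSV and gives them NaN coordinates. `validate="one_to_one"` raises an error if either side has duplicate keys.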

In [16]:
df3

|   | PostalCode | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M1B | Scarborough | Rouge, Malvern | 43.806686 | -79.194353 |
| 1 | M1C | Scarborough | Highland Creek, Rouge Hill, Port Union | 43.784535 | -79.160497 |
| 2 | M1E | Scarborough | Guildwood, Morningside, West Hill | 43.763573 | -79.188711 |
| 3 | M1G | Scarborough | Woburn | 43.770992 | -79.216917 |
| 4 | M1H | Scarborough | Cedarbrae | 43.773136 | -79.239476 |
| 5 | M1J | Scarborough | Scarborough Village | 43.744734 | -79.239476 |
| 6 | M1K | Scarborough | East Birchmount Park, Ionview, Kennedy Park | 43.727929 | -79.262029 |
| 7 | M1L | Scarborough | Clairlea, Golden Mile, Oakridge | 43.711112 | -79.284577 |
| 8 | M1M | Scarborough | Cliffcrest, Cliffside, Scarborough Village West | 43.716316 | -79.239476 |
| 9 | M1N | Scarborough | Birch Cliff, Cliffside West | 43.692657 | -79.264848 |


## 7. Explore the neighborhood 

In [17]:
CLIENT_ID = 'YOUR_CLIENT_ID' # your Foursquare ID (keep real credentials out of shared notebooks)
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100
radius = 500 

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)  # the secret is deliberately not printed

Your credentials:
CLIENT_ID: YOUR_CLIENT_ID
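Hard-coding the Foursquare ID and secret in the notebook, and printing them, exposes them to anyone who sees the notebook. Below is a minimal sketch that reads them from environment variables instead. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `load_foursquare_credentials` are hypothetical:

```python
import os


def load_foursquare_credentials():
    """Read Foursquare credentials from (hypothetically named) environment variables."""
    client_id = os.environ.get("FOURSQUARE_CLIENT_ID")
    client_secret = os.environ.get("FOURSQUARE_CLIENT_SECRET")
    if not client_id or not client_secret:
        raise RuntimeError("Set FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET first")
    return client_id, client_secret


# Example only: set the variables for this process (normally done in the shell)
os.environ["FOURSQUARE_CLIENT_ID"] = "demo-id"
os.environ["FOURSQUARE_CLIENT_SECRET"] = "demo-secret"

CLIENT_ID, CLIENT_SECRET = load_foursquare_credentials()
print("CLIENT_ID loaded:", bool(CLIENT_ID))  # never print the secret itself
```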


In [18]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
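`getNearbyVenues` reads `v['venue']['categories'][0]`, which raises `IndexError` for a venue with no categories. Below is a minimal sketch of the same flattening step with a guard. `parse_venues` is a hypothetical helper, and the payload shape is assumed from the keys the function above uses. The second venue is made up:

```python
def parse_venues(name, lat, lng, items):
    """Flatten Foursquare 'explore' items into tuples, as getNearbyVenues does.

    A venue with an empty 'categories' list gets None as its category
    instead of raising IndexError.
    """
    rows = []
    for v in items:
        venue = v["venue"]
        categories = venue.get("categories") or []
        category = categories[0]["name"] if categories else None
        rows.append((name, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"], category))
    return rows


# Hand-written payload mimicking response['groups'][0]['items']
items = [
    {"venue": {"name": "Wendy's", "location": {"lat": 43.807448, "lng": -79.199056},
               "categories": [{"name": "Fast Food Restaurant"}]}},
    {"venue": {"name": "Unlabelled spot", "location": {"lat": 43.8, "lng": -79.2},
               "categories": []}},
]
print(parse_venues("Scarborough", 43.806686, -79.194353, items))
```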

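Optionally, restrict the analysis to the Toronto boroughs. The cell below is commented out, so every borough in `df3` is used. Note that `getNearbyVenues` is called with the borough names, so the `Neighborhood` column of `toronto_venues` actually holds boroughs.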

In [None]:
#only_toronto = ["East Toronto", "Central Toronto", "Downtown Toronto", "West Toronto", "East York", "Etobicoke", "Mississauga", "North York", "Queen's Park", "Scarborough", "York"]
#df4 = df3[df3["Borough"].isin(only_toronto)]
#df4.head()

In [19]:
toronto_venues = getNearbyVenues(names=df3['Borough'],
                                   latitudes=df3['Latitude'],
                                   longitudes=df3['Longitude']
                                  )



In [20]:
print(toronto_venues.shape)
toronto_venues.head()

(2235, 7)


|   | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Scarborough | 43.806686 | -79.194353 | Wendy's | 43.807448 | -79.199056 | Fast Food Restaurant |
| 1 | Scarborough | 43.806686 | -79.194353 | Interprovincial Group | 43.80563 | -79.200378 | Print Shop |
| 2 | Scarborough | 43.784535 | -79.160497 | Royal Canadian Legion | 43.782533 | -79.163085 | Bar |
| 3 | Scarborough | 43.784535 | -79.160497 | Affordable Toronto Movers | 43.787919 | -79.162977 | Moving Target |
| 4 | Scarborough | 43.784535 | -79.160497 | Scarborough Historical Society | 43.788755 | -79.162438 | History Museum |


In [21]:
toronto_venues.groupby('Neighborhood').count()

| Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|
| Central Toronto | 109 | 109 | 109 | 109 | 109 | 109 |
| Downtown Toronto | 1279 | 1279 | 1279 | 1279 | 1279 | 1279 |
| East Toronto | 125 | 125 | 125 | 125 | 125 | 125 |
| East York | 73 | 73 | 73 | 73 | 73 | 73 |
| Etobicoke | 71 | 71 | 71 | 71 | 71 | 71 |
| Mississauga | 11 | 11 | 11 | 11 | 11 | 11 |
| North York | 239 | 239 | 239 | 239 | 239 | 239 |
| Queen's Park | 40 | 40 | 40 | 40 | 40 | 40 |
| Scarborough | 90 | 90 | 90 | 90 | 90 | 90 |
| West Toronto | 177 | 177 | 177 | 177 | 177 | 177 |


#### Let's find out how many unique categories can be curated from all the returned venues

In [23]:
print('There are {} unique categories.'.format(toronto_venues['Venue Category'].nunique()))

There are 275 unique categories.


## 8. Analyze Each Neighborhood

In [24]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head()

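|   | Yoga Studio | Accessories Store | Adult Boutique | Afghan Restaurant | Airport | Airport Food Court | Airport Gate | Airport Lounge | Airport Service | Airport Terminal | ... | Turkish Restaurant | Vegetarian / Vegan Restaurant | Video Game Store | Video Store | Vietnamese Restaurant | Warehouse Store | Wine Bar | Wine Shop | Wings Joint | Women's Store |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |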


In [26]:
toronto_onehot.shape

(2235, 275)

Next, group the rows by neighborhood and take the mean of each category column. This gives the frequency of occurrence of each category.
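The mean works because each one-hot column is 1 when a venue belongs to that category, so its mean within a group is the share of that group's venues in the category. Below is a small worked example on made-up venues, checked against `value_counts(normalize=True)`:

```python
import pandas as pd

# Made-up venues for two neighborhoods
venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Coffee Shop", "Coffee Shop", "Park", "Café", "Park", "Park"],
})

# One 0/1 column per category, plus the grouping key in front
onehot = pd.get_dummies(venues["Venue Category"], dtype=int)
onehot.insert(0, "Neighborhood", venues["Neighborhood"])

# Mean of each 0/1 column within a group = share of that group's venues in the category
grouped = onehot.groupby("Neighborhood").mean()
print(grouped)

# The same shares computed directly from the category counts
shares = venues.groupby("Neighborhood")["Venue Category"].value_counts(normalize=True)
print(shares)
```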

In [27]:
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped

|   | Neighborhood | Yoga Studio | Accessories Store | Adult Boutique | Afghan Restaurant | Airport | Airport Food Court | Airport Gate | Airport Lounge | Airport Service | ... | Turkish Restaurant | Vegetarian / Vegan Restaurant | Video Game Store | Video Store | Vietnamese Restaurant | Warehouse Store | Wine Bar | Wine Shop | Wings Joint | Women's Store |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Central Toronto | 0.009174 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.009174 | 0.0 | 0.0 | 0.009174 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | Downtown Toronto | 0.002346 | 0.0 | 0.000782 | 0.000782 | 0.000782 | 0.000782 | 0.000782 | 0.001564 | 0.001564 | ... | 0.0 | 0.0086 | 0.001564 | 0.0 | 0.004691 | 0.0 | 0.006255 | 0.000782 | 0.000782 | 0.001564 |
| 2 | East Toronto | 0.016 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | East York | 0.013699 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.013699 | 0.0 | 0.013699 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | Etobicoke | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.014085 | 0.0 |
| 5 | Mississauga | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 6 | North York | 0.0 | 0.004184 | 0.0 | 0.0 | 0.004184 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.004184 | 0.004184 | 0.008368 | 0.0 | 0.0 | 0.0 | 0.004184 | 0.008368 |
| 7 | Queen's Park | 0.025 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.025 | 0.0 |
| 8 | Scarborough | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.011111 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 9 | West Toronto | 0.011299 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.00565 | 0.0 | 0.0 | 0.011299 | 0.0 | 0.00565 | 0.0 | 0.0 | 0.0 |


In [29]:
toronto_grouped.shape

(11, 275)

#### Top 5 most common venues for each Neighborhood

In [30]:
num_top_venues = 5

for hood in toronto_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = toronto_grouped[toronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Central Toronto----
            venue  freq
0     Coffee Shop  0.08
1  Sandwich Place  0.07
2            Park  0.06
3     Pizza Place  0.06
4            Café  0.05


----Downtown Toronto----
         venue  freq
0  Coffee Shop  0.09
1         Café  0.06
2        Hotel  0.03
3   Restaurant  0.03
4    Gastropub  0.02


----East Toronto----
                venue  freq
0    Greek Restaurant  0.08
1         Coffee Shop  0.07
2      Ice Cream Shop  0.04
3  Italian Restaurant  0.04
4                Park  0.03


----East York----
          venue  freq
0   Coffee Shop  0.07
1   Pizza Place  0.04
2          Park  0.04
3  Burger Joint  0.04
4          Bank  0.04


----Etobicoke----
                  venue  freq
0           Pizza Place  0.11
1        Sandwich Place  0.07
2              Pharmacy  0.06
3  Fast Food Restaurant  0.04
4         Grocery Store  0.04


----Mississauga----
                  venue  freq
0           Coffee Shop  0.18
1                 Hotel  0.18
2   American Restaurant 

#### Let's write a function to sort the venues in descending order.

In [31]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
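Below, `return_most_common_venues` is applied to one made-up row shaped like `toronto_grouped`, to show what it returns. The function is repeated so the snippet runs on its own:

```python
import pandas as pd


def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]  # skip the 'Neighborhood' label in position 0
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]


# One made-up row: neighborhood label first, then category frequencies
row = pd.Series({"Neighborhood": "Example", "Café": 0.10, "Coffee Shop": 0.40,
                 "Park": 0.30, "Pizza Place": 0.20})
print(return_most_common_venues(row, 3))
```

With ties, the relative order of equally frequent categories depends on the sort algorithm. `sort_values` defaults to quicksort, which is not stable.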

#### Let's create the new dataframe and display the top 10 venues for each neighborhood.

In [41]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # positions 4 and up have no entry in `indicators`
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

|   | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Central Toronto | Coffee Shop | Sandwich Place | Park | Pizza Place | Café | Sushi Restaurant | Dessert Shop | Pharmacy | Clothing Store | Pub |
| 1 | Downtown Toronto | Coffee Shop | Café | Restaurant | Hotel | Bakery | Italian Restaurant | Bar | Japanese Restaurant | American Restaurant | Park |
| 2 | East Toronto | Greek Restaurant | Coffee Shop | Ice Cream Shop | Italian Restaurant | Brewery | Park | Café | Bakery | Pub | Bookstore |
| 3 | East York | Coffee Shop | Park | Sporting Goods Shop | Burger Joint | Bank | Pizza Place | Pharmacy | Pet Store | Supermarket | Indian Restaurant |
| 4 | Etobicoke | Pizza Place | Sandwich Place | Pharmacy | Café | Fast Food Restaurant | Grocery Store | Gym | Coffee Shop | Beer Store | Park |


In [43]:
neighborhoods_venues_sorted.shape

(11, 12)
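The `indicators` list with a fallback to `th` works for the first ten positions. It would produce `21th`, `22th` and `23th` if `num_top_venues` grew past 20. Below is a small hypothetical helper, `ordinal`, that also handles the 11th to 13th exceptions:

```python
def ordinal(n):
    """Return n with its English ordinal suffix: 1st, 2nd, 3rd, 4th, 11th, 21st, ..."""
    if 10 <= n % 100 <= 20:
        suffix = "th"  # 10th to 20th, including the 11th/12th/13th exceptions
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return "{}{}".format(n, suffix)


num_top_venues = 10
columns = ['Neighborhood'] + [
    '{} Most Common Venue'.format(ordinal(i + 1)) for i in range(num_top_venues)
]
print(columns[1], '|', columns[-1])
```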

## 9. Cluster Neighborhoods
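Run *k*-means to cluster the neighborhoods into 5 clusters. Because the venues were fetched per borough, each of the 11 rows of `toronto_grouped` is actually a borough.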
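For intuition about what `KMeans` does with these frequency vectors, here is a minimal NumPy sketch of Lloyd's algorithm. It assigns each row to the nearest centroid, moves each centroid to the mean of its rows, and repeats until nothing changes. `kmeans_lloyd` is a hypothetical helper run on made-up 2-D points. scikit-learn's `KMeans` adds k-means++ initialisation and optional restarts (`n_init`) on top of this loop:

```python
import numpy as np


def kmeans_lloyd(X, init_centroids, n_iter=100):
    """Cluster the rows of X with Lloyd's algorithm, starting from init_centroids."""
    X = np.asarray(X, dtype=float)
    centroids = np.asarray(init_centroids, dtype=float).copy()
    for _ in range(n_iter):
        # squared Euclidean distance from every row to every centroid
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)  # index of the nearest centroid per row
        # move each centroid to the mean of its rows (keep it if it has none)
        new_centroids = np.array([
            X[labels == k].mean(axis=0) if np.any(labels == k) else centroids[k]
            for k in range(len(centroids))
        ])
        if np.allclose(new_centroids, centroids):
            break  # centroids are stable, so assignments will not change
        centroids = new_centroids
    return labels, centroids


# Made-up "frequency vectors": two obvious groups of three rows each
X = np.array([[0.0, 0.1], [0.1, 0.0], [0.05, 0.05],
              [0.9, 1.0], [1.0, 0.9], [0.95, 0.95]])
labels, centroids = kmeans_lloyd(X, X[[0, 3]])
print(labels, centroids)
```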

In [37]:
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library


In [38]:
# set number of clusters
kclusters = 5

toronto_grouped_clustering = toronto_grouped.drop(columns='Neighborhood')  # positional axis argument no longer works in pandas 2.0

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([2, 1, 1, 2, 2, 0, 2, 3, 2, 1], dtype=int32)

In [None]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

In [50]:
df5 = df.set_index('PostalCode').join(coordinates.set_index('Postal Code'))
df5.reset_index(level=0, inplace=True)

In [51]:
toronto_merged = df5

# join the cluster labels and top venues (keyed by borough) onto every row of df5, which already has latitude/longitude
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Borough')

toronto_merged.head() # check the last columns!

|   | index | Borough | Neighborhood | Latitude | Longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | M1B | Scarborough | Rouge | 43.806686 | -79.194353 | 2 | Chinese Restaurant | Fast Food Restaurant | Coffee Shop | Breakfast Spot | Bakery | Pizza Place | Sandwich Place | Park | Bus Line | Bus Station |
| 1 | M1B | Scarborough | Malvern | 43.806686 | -79.194353 | 2 | Chinese Restaurant | Fast Food Restaurant | Coffee Shop | Breakfast Spot | Bakery | Pizza Place | Sandwich Place | Park | Bus Line | Bus Station |
| 2 | M1C | Scarborough | Highland Creek | 43.784535 | -79.160497 | 2 | Chinese Restaurant | Fast Food Restaurant | Coffee Shop | Breakfast Spot | Bakery | Pizza Place | Sandwich Place | Park | Bus Line | Bus Station |
| 3 | M1C | Scarborough | Rouge Hill | 43.784535 | -79.160497 | 2 | Chinese Restaurant | Fast Food Restaurant | Coffee Shop | Breakfast Spot | Bakery | Pizza Place | Sandwich Place | Park | Bus Line | Bus Station |
| 4 | M1C | Scarborough | Port Union | 43.784535 | -79.160497 | 2 | Chinese Restaurant | Fast Food Restaurant | Coffee Shop | Breakfast Spot | Bakery | Pizza Place | Sandwich Place | Park | Bus Line | Bus Station |


### Let's visualise the clustering 


In [47]:
## Toronto 
latitude = 43.653963
longitude = -79.387207

In [52]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighborhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

## 10. Examine Clusters

In [62]:
#CLUSTER 1 - West of Toronto (red)

toronto_merged.loc[toronto_merged['Cluster Labels'] == 0]

|   | index | Borough | Neighborhood | Latitude | Longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 161 | M7R | Mississauga | Canada Post Gateway Processing Centre | 43.636966 | -79.615819 | 0 | Coffee Shop | Hotel | Burrito Place | Gym / Fitness Center | Fried Chicken Joint | Middle Eastern Restaurant | Sandwich Place | Mediterranean Restaurant | American Restaurant | Women's Store |


In [63]:
# CLUSTER 2 - Near city centre (purple)

toronto_merged.loc[toronto_merged['Cluster Labels'] == 1]

|   | index | Borough | Neighborhood | Latitude | Longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 68 | M4E | East Toronto | The Beaches | 43.676357 | -79.293031 | 1 | Greek Restaurant | Coffee Shop | Ice Cream Shop | Italian Restaurant | Brewery | Park | Café | Bakery | Pub | Bookstore |
| 72 | M4K | East Toronto | The Danforth West | 43.679557 | -79.352188 | 1 | Greek Restaurant | Coffee Shop | Ice Cream Shop | Italian Restaurant | Brewery | Park | Café | Bakery | Pub | Bookstore |
| 73 | M4K | East Toronto | Riverdale | 43.679557 | -79.352188 | 1 | Greek Restaurant | Coffee Shop | Ice Cream Shop | Italian Restaurant | Brewery | Park | Café | Bakery | Pub | Bookstore |
| 74 | M4L | East Toronto | The Beaches West | 43.668999 | -79.315572 | 1 | Greek Restaurant | Coffee Shop | Ice Cream Shop | Italian Restaurant | Brewery | Park | Café | Bakery | Pub | Bookstore |
| 75 | M4L | East Toronto | India Bazaar | 43.668999 | -79.315572 | 1 | Greek Restaurant | Coffee Shop | Ice Cream Shop | Italian Restaurant | Brewery | Park | Café | Bakery | Pub | Bookstore |
| 76 | M4M | East Toronto | Studio District | 43.659526 | -79.340923 | 1 | Greek Restaurant | Coffee Shop | Ice Cream Shop | Italian Restaurant | Brewery | Park | Café | Bakery | Pub | Bookstore |
| 88 | M4W | Downtown Toronto | Rosedale | 43.679563 | -79.377529 | 1 | Coffee Shop | Café | Restaurant | Hotel | Bakery | Italian Restaurant | Bar | Japanese Restaurant | American Restaurant | Park |
| 89 | M4X | Downtown Toronto | Cabbagetown | 43.667967 | -79.367675 | 1 | Coffee Shop | Café | Restaurant | Hotel | Bakery | Italian Restaurant | Bar | Japanese Restaurant | American Restaurant | Park |
| 90 | M4X | Downtown Toronto | St. James Town | 43.667967 | -79.367675 | 1 | Coffee Shop | Café | Restaurant | Hotel | Bakery | Italian Restaurant | Bar | Japanese Restaurant | American Restaurant | Park |
| 91 | M4Y | Downtown Toronto | Church and Wellesley | 43.66586 | -79.38316 | 1 | Coffee Shop | Café | Restaurant | Hotel | Bakery | Italian Restaurant | Bar | Japanese Restaurant | American Restaurant | Park |


In [64]:
#CLUSTER 3 - the rest of the city (blue)

toronto_merged.loc[toronto_merged['Cluster Labels'] == 2]

|   | index | Borough | Neighborhood | Latitude | Longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | M1B | Scarborough | Rouge | 43.806686 | -79.194353 | 2 | Chinese Restaurant | Fast Food Restaurant | Coffee Shop | Breakfast Spot | Bakery | Pizza Place | Sandwich Place | Park | Bus Line | Bus Station |
| 1 | M1B | Scarborough | Malvern | 43.806686 | -79.194353 | 2 | Chinese Restaurant | Fast Food Restaurant | Coffee Shop | Breakfast Spot | Bakery | Pizza Place | Sandwich Place | Park | Bus Line | Bus Station |
| 2 | M1C | Scarborough | Highland Creek | 43.784535 | -79.160497 | 2 | Chinese Restaurant | Fast Food Restaurant | Coffee Shop | Breakfast Spot | Bakery | Pizza Place | Sandwich Place | Park | Bus Line | Bus Station |
| 3 | M1C | Scarborough | Rouge Hill | 43.784535 | -79.160497 | 2 | Chinese Restaurant | Fast Food Restaurant | Coffee Shop | Breakfast Spot | Bakery | Pizza Place | Sandwich Place | Park | Bus Line | Bus Station |
| 4 | M1C | Scarborough | Port Union | 43.784535 | -79.160497 | 2 | Chinese Restaurant | Fast Food Restaurant | Coffee Shop | Breakfast Spot | Bakery | Pizza Place | Sandwich Place | Park | Bus Line | Bus Station |
| 5 | M1E | Scarborough | Guildwood | 43.763573 | -79.188711 | 2 | Chinese Restaurant | Fast Food Restaurant | Coffee Shop | Breakfast Spot | Bakery | Pizza Place | Sandwich Place | Park | Bus Line | Bus Station |
| 6 | M1E | Scarborough | Morningside | 43.763573 | -79.188711 | 2 | Chinese Restaurant | Fast Food Restaurant | Coffee Shop | Breakfast Spot | Bakery | Pizza Place | Sandwich Place | Park | Bus Line | Bus Station |
| 7 | M1E | Scarborough | West Hill | 43.763573 | -79.188711 | 2 | Chinese Restaurant | Fast Food Restaurant | Coffee Shop | Breakfast Spot | Bakery | Pizza Place | Sandwich Place | Park | Bus Line | Bus Station |
| 8 | M1G | Scarborough | Woburn | 43.770992 | -79.216917 | 2 | Chinese Restaurant | Fast Food Restaurant | Coffee Shop | Breakfast Spot | Bakery | Pizza Place | Sandwich Place | Park | Bus Line | Bus Station |
| 9 | M1H | Scarborough | Cedarbrae | 43.773136 | -79.239476 | 2 | Chinese Restaurant | Fast Food Restaurant | Coffee Shop | Breakfast Spot | Bakery | Pizza Place | Sandwich Place | Park | Bus Line | Bus Station |


In [65]:
# CLUSTER 4  - City Centre (green)

toronto_merged.loc[toronto_merged['Cluster Labels'] == 3]

|   | index | Borough | Neighborhood | Latitude | Longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 160 | M7A | Queen's Park | Queen's Park | 43.662301 | -79.389494 | 3 | Coffee Shop | Japanese Restaurant | Gym | Burger Joint | Diner | Seafood Restaurant | Sushi Restaurant | Café | Bar | Yoga Studio |


In [66]:
# CLUSTER 5 - York Area (Orange)

toronto_merged.loc[toronto_merged['Cluster Labels'] == 4]

|   | index | Borough | Neighborhood | Latitude | Longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 135 | M6C | York | Humewood-Cedarvale | 43.693781 | -79.428191 | 4 | Fast Food Restaurant | Park | Women's Store | Sandwich Place | Hockey Arena | Fried Chicken Joint | Field | Market | Convenience Store | Pizza Place |
| 136 | M6E | York | Caledonia-Fairbanks | 43.689026 | -79.453512 | 4 | Fast Food Restaurant | Park | Women's Store | Sandwich Place | Hockey Arena | Fried Chicken Joint | Field | Market | Convenience Store | Pizza Place |
| 148 | M6M | York | Del Ray | 43.691116 | -79.476013 | 4 | Fast Food Restaurant | Park | Women's Store | Sandwich Place | Hockey Arena | Fried Chicken Joint | Field | Market | Convenience Store | Pizza Place |
| 149 | M6M | York | Keelesdale | 43.691116 | -79.476013 | 4 | Fast Food Restaurant | Park | Women's Store | Sandwich Place | Hockey Arena | Fried Chicken Joint | Field | Market | Convenience Store | Pizza Place |
| 150 | M6M | York | Mount Dennis | 43.691116 | -79.476013 | 4 | Fast Food Restaurant | Park | Women's Store | Sandwich Place | Hockey Arena | Fried Chicken Joint | Field | Market | Convenience Store | Pizza Place |
| 151 | M6M | York | Silverthorn | 43.691116 | -79.476013 | 4 | Fast Food Restaurant | Park | Women's Store | Sandwich Place | Hockey Arena | Fried Chicken Joint | Field | Market | Convenience Store | Pizza Place |
| 152 | M6N | York | The Junction North | 43.673185 | -79.487262 | 4 | Fast Food Restaurant | Park | Women's Store | Sandwich Place | Hockey Arena | Fried Chicken Joint | Field | Market | Convenience Store | Pizza Place |
| 153 | M6N | York | Runnymede | 43.673185 | -79.487262 | 4 | Fast Food Restaurant | Park | Women's Store | Sandwich Place | Hockey Arena | Fried Chicken Joint | Field | Market | Convenience Store | Pizza Place |
| 197 | M9N | York | Weston | 43.706876 | -79.518188 | 4 | Fast Food Restaurant | Park | Women's Store | Sandwich Place | Hockey Arena | Fried Chicken Joint | Field | Market | Convenience Store | Pizza Place |



## Clustering map

<img src="toronto_cluster.png">