# <span style="color:blue">Segmenting and Clustering Neighborhoods in Toronto Assignment</span>
<br /><br />

## <span style="color:blue">Part 1 - Preparation of our dataframe</span>

### <span style="color:blue">We will start by installing the required libraries so that we have all the tools we need for the work in this notebook.</span>


In [1]:
!pip install requests
!conda install -c conda-forge geopy --yes
!conda install -c conda-forge folium=0.5.0 --yes

Solving environment: done

# All requested packages already installed.

Solving environment: done

# All requested packages already installed.



### <span style="color:blue">Now we will import the libraries we are going to use in this notebook.</span>

In [2]:
import requests
import pandas as pd
import numpy as np
from sklearn.cluster import KMeans
import matplotlib.cm as cm
import matplotlib.colors as colors
from pandas.io.json import json_normalize
from geopy.geocoders import Nominatim
import folium

### <span style="color:blue">Reading the data table from Wikipedia</span>
<br />
<span style="color:blue">First we define the URL of the Wikipedia page that contains the table we want to analyze.</span>
<br />
<span style="color:blue">After defining the URL, we will let pandas parse the page's HTML into tables.</span>

In [3]:
# Open Canada information link

url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'

#response = requests.get(url)
#soup = BeautifulSoup(response.text, 'html.parser')

<span style="color:blue">Next we will fetch all the tables from this web page and print the first 5 rows of each table, so we can see which table we want to use.</span>

In [4]:
# Fetch the table with the data
df_wiki = pd.read_html(url,header=0)

# Print out all tables on the requested web page (first 5 rows of each table)
for n, table in enumerate(df_wiki, start=1):
    print('_' * 50)
    print('This is table #' + str(n) + ' on the requested web page:')
    print('_' * 50 + '\n')
    print(table.head())
    print('\n\n')

__________________________________________________
This is table #1 on the requested web page:
__________________________________________________

  Postcode           Borough     Neighbourhood
0      M1A      Not assigned      Not assigned
1      M2A      Not assigned      Not assigned
2      M3A        North York         Parkwoods
3      M4A        North York  Victoria Village
4      M5A  Downtown Toronto      Harbourfront



__________________________________________________
This is table #2 on the requested web page:
__________________________________________________

                                          Unnamed: 0  \
0  NL NS PE NB QC ON MB SK AB BC NU/NT YT A B C E...   
1                                                 NL   
2                                                  A   

                               Canadian postal codes  \
0  NL NS PE NB QC ON MB SK AB BC NU/NT YT A B C E...   
1                                                 NS   
2                           

<span style="color:blue">We can see that the table we want to use is the first table on the requested web page.</span>
<br />
<span style="color:blue">So now we will assign the first table to our dataframe.</span>

In [5]:
# Assign the first table to our dataframe and rename the Neighbourhood column.
pre_df = df_wiki[0]
pre_df = pre_df.rename(columns = {'Neighbourhood': 'Neighborhood'})

pre_df.head()

Unnamed: 0,Postcode,Borough,Neighborhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront
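
<span style="color:blue">Picking the table by position works for the page as it was fetched here, but the position could change if the page is edited. As a hedged alternative, the sketch below picks the first table that has the columns we need. The helper name `pick_table` and the small stand-in tables are made up for illustration; in the notebook the list would be `df_wiki`.</span>

```python
import pandas as pd


def pick_table(tables, required_columns):
    """Return the first DataFrame whose columns include all required_columns."""
    required = set(required_columns)
    for table in tables:
        if required.issubset(table.columns):
            return table
    raise ValueError('No table contains the columns {}'.format(sorted(required)))


# Stand-in tables (hypothetical data) that mimic the structure of pd.read_html's output
tables = [
    pd.DataFrame({'Unnamed: 0': ['NL', 'A']}),
    pd.DataFrame({'Postcode': ['M3A'],
                  'Borough': ['North York'],
                  'Neighbourhood': ['Parkwoods']}),
]

picked = pick_table(tables, ['Postcode', 'Borough', 'Neighbourhood'])
print(picked)
```

<span style="color:blue">Selecting by column names fails loudly with a `ValueError` if the expected table is missing, instead of silently continuing with the wrong table.</span>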


### <span style="color:blue">Checking the shape of our table</span>

In [6]:
pre_df.shape

(287, 3)

### <span style="color:blue">Preparation of the dataframe</span>
<br />
<span style="color:blue">We will start by cleaning up the table and removing every row that does not have an assigned borough.</span>

In [7]:
# Check how many rows do not have their borough specified
pre_df['Borough'].value_counts()

Not assigned        77
Etobicoke           44
North York          38
Scarborough         37
Downtown Toronto    37
Central Toronto     17
West Toronto        13
York                 9
East Toronto         7
East York            6
Mississauga          1
Queen's Park         1
Name: Borough, dtype: int64

### <span style="color:blue">We can see that we need to drop 77 rows from our dataframe.</span>

In [8]:
# Keep only the rows that have a borough assigned to them (the original index labels are kept)
df = pre_df[pre_df['Borough'] != 'Not assigned'].copy()

df.head()

Unnamed: 0,Postcode,Borough,Neighborhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront
5,M6A,North York,Lawrence Heights
6,M6A,North York,Lawrence Manor


In [9]:
df.shape

(210, 3)

In [10]:
# Reset the index numbers
df.reset_index(drop = True, inplace = True)

df.head()

Unnamed: 0,Postcode,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Harbourfront
3,M6A,North York,Lawrence Heights
4,M6A,North York,Lawrence Manor


### <span style="color:blue">Now we will change the Neighborhood value to the Borough value if the Neighborhood value is "Not assigned".</span>

In [11]:
# Where the Neighborhood is "Not assigned", use the row's Borough value instead
not_assigned = df['Neighborhood'] == 'Not assigned'
df.loc[not_assigned, 'Neighborhood'] = df.loc[not_assigned, 'Borough']

In [12]:
# Group all Neighborhoods that share a Postcode into one row and separate them by commas
df = df.groupby(['Postcode', 'Borough'])['Neighborhood'].agg(', '.join)

# Reset the index so that Postcode and Borough become regular columns again
df = df.reset_index()

df.head()

Unnamed: 0,Postcode,Borough,Neighborhood
0,M1B,Scarborough,"Rouge, Malvern"
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae


In [13]:
df.shape

(103, 3)
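
<span style="color:blue">The three cleaning steps above can be exercised end to end on a small table. The sketch below uses made-up rows that only mimic the structure of the Wikipedia table; the names `raw` and `clean` are illustrative and are not used elsewhere in this notebook.</span>

```python
import pandas as pd

# Made-up rows that mimic the structure of the Wikipedia table
raw = pd.DataFrame({
    'Postcode': ['M1A', 'M5A', 'M6A', 'M6A', 'M7A'],
    'Borough': ['Not assigned', 'Downtown Toronto', 'North York',
                'North York', "Queen's Park"],
    'Neighborhood': ['Not assigned', 'Harbourfront', 'Lawrence Heights',
                     'Lawrence Manor', 'Not assigned'],
})

# 1. Drop the rows without a borough
clean = raw[raw['Borough'] != 'Not assigned'].copy()

# 2. Fall back to the borough name where the neighborhood is missing
missing = clean['Neighborhood'] == 'Not assigned'
clean.loc[missing, 'Neighborhood'] = clean.loc[missing, 'Borough']

# 3. One row per postal code, with its neighborhoods joined by commas
clean = (clean.groupby(['Postcode', 'Borough'])['Neighborhood']
              .agg(', '.join)
              .reset_index())

print(clean)
```

<span style="color:blue">After the last step every postal code should appear exactly once, which can be checked with `clean['Postcode'].is_unique`.</span>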

### <span style="color:blue">The last cell concludes Part 1 of this project, and we can see that our dataframe has 103 rows and 3 columns.</span>
### <span style="color:blue">-----------------------------------------------------------------------------------------------------------------------------------------</span>
<br /><br /><br />

## <span style="color:blue">Part 2 - Adding the coordinate data to our dataframe</span>

### <span style="color:blue">Let's create a dataframe with the postal code coordinates from the published CSV file.</span>

In [14]:
# Download the postal codes coordinates
df_postal = pd.read_csv('https://cocl.us/Geospatial_data')
df_postal.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [15]:
df_postal.shape

(103, 3)

### <span style="color:blue">We can see that both of our dataframes have the same number of rows, so now let's join them together using the postal codes (as they are unique per row).</span>

In [16]:
# Create a merged dataframe that includes the PostalCode, Borough, Neighborhood, Latitude and Longitude columns
df_final = df.set_index('Postcode').join(df_postal.set_index('Postal Code'))

# Reset the index of the new dataframe
df_final.reset_index(drop = False, inplace = True)

# Rename column Postcode to PostalCode
df_final = df_final.rename(columns={'Postcode': 'PostalCode'})

# Print a summary message
print('The dataframe has {} postal codes and neighborhood groups, and a total of {} boroughs.'.format(
    df_final.shape[0],    
    len(df_final['Borough'].unique())
    )
)

df_final.head()

The dataframe has 103 postal codes and neighborhood groups, and a total of 11 boroughs.


Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476


In [17]:
df_final.shape

(103, 5)
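
<span style="color:blue">Equal row counts do not by themselves guarantee that every postal code found a match. As a hedged sketch, `DataFrame.merge` can check this explicitly through its `validate` and `indicator` parameters. The two small frames below are stand-ins for `df` and `df_postal`, and the names `boroughs`, `coords` and `unmatched` are illustrative only.</span>

```python
import pandas as pd

# Small stand-ins for df and df_postal
boroughs = pd.DataFrame({'Postcode': ['M1B', 'M1C'],
                         'Borough': ['Scarborough', 'Scarborough']})
coords = pd.DataFrame({'Postal Code': ['M1B', 'M1C'],
                       'Latitude': [43.806686, 43.784535],
                       'Longitude': [-79.194353, -79.160497]})

merged = boroughs.merge(
    coords,
    how='left',
    left_on='Postcode',
    right_on='Postal Code',
    validate='one_to_one',  # raises MergeError if a postal code repeats on either side
    indicator=True,         # adds a _merge column: 'both', 'left_only' or 'right_only'
)

# Rows that did not find coordinates would be marked 'left_only'
unmatched = merged[merged['_merge'] != 'both']
print('Rows without coordinates:', len(unmatched))
```

<span style="color:blue">Unlike the index-based `join` used above, `merge` keeps both key columns, so `Postal Code` and `_merge` would need to be dropped afterwards.</span>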

### <span style="color:blue">Now we have our final dataframe stored as "df_final"</span>
<br /><br /><br />

## <span style="color:blue">Part 3 - Exploring the data</span>

### <span style="color:blue">Let's start by getting the coordinates of Toronto.</span>

In [18]:
address = 'Toronto, ON'

geolocator = Nominatim(user_agent="toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.653963, -79.387207.


### <span style="color:blue">Now we will visualize our data on a map.</span>

In [19]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, borough, neighborhood in zip(df_final['Latitude'], df_final['Longitude'], df_final['Borough'], df_final['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='#4B0082',
        fill=True,
        fill_color='#9400D3',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

### <span style="color:blue">We will focus on the boroughs whose names include "Toronto", so let's create a dataframe that contains only those.</span>

In [20]:
# Create a dataframe that focuses on boroughs containing "Toronto" in their name
toronto_data = df_final[df_final['Borough'].str.contains('Toronto')]

# Print a summary message
print('The dataframe has {} postal codes and neighborhood groups, and a total of {} boroughs.'.format(
    toronto_data.shape[0],    
    len(toronto_data['Borough'].unique())
    )
)

toronto_data.head()

The dataframe has 39 postal codes and neighborhood groups, and a total of 4 boroughs.


Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
37,M4E,East Toronto,The Beaches,43.676357,-79.293031
41,M4K,East Toronto,"The Danforth West, Riverdale",43.679557,-79.352188
42,M4L,East Toronto,"The Beaches West, India Bazaar",43.668999,-79.315572
43,M4M,East Toronto,Studio District,43.659526,-79.340923
44,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879


In [21]:
toronto_data.shape

(39, 5)

### <span style="color:blue">Now we will visualize our new dataframe.</span>

In [22]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=12)

# add markers to map
for lat, lng, borough, neighborhood in zip(toronto_data['Latitude'], toronto_data['Longitude'], toronto_data['Borough'], toronto_data['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='#4B0082',
        fill=True,
        fill_color='#9400D3',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

### <span style="color:blue">The next step is to explore our data on Foursquare.</span>

In [23]:
# The code was removed by Watson Studio for sharing.
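
<span style="color:blue">The hidden cell is not shown, but the code further down refers to `CLIENT_ID`, `CLIENT_SECRET` and `VERSION`, so it presumably defines them. A minimal sketch of one way to do that without putting credentials in the notebook is shown below. The environment variable names and the `VERSION` value are assumptions, not the values used by the original cell.</span>

```python
import os

# Hypothetical environment variable names; set them outside the notebook
CLIENT_ID = os.environ.get('FOURSQUARE_CLIENT_ID', '')
CLIENT_SECRET = os.environ.get('FOURSQUARE_CLIENT_SECRET', '')

# Placeholder API version date in YYYYMMDD format (an assumption)
VERSION = '20180605'

if not CLIENT_ID or not CLIENT_SECRET:
    print('Foursquare credentials are not set; API requests will fail.')
```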

### <span style="color:blue">We are now going to explore the data of the boroughs that include "Toronto" in their names.</span>
<br />
<span style="color:blue">We will start by focusing our map on the center of our data by getting the average coordinates from our dataframe.</span>

In [24]:
# Get the average coordinates of all neighborhoods in our "Toronto" boroughs
# (only the numeric columns are selected, since the mean of text columns is undefined)
toronto_coordinates = toronto_data[['Latitude', 'Longitude']].mean()

toronto_latitude = toronto_coordinates['Latitude'] # Toronto latitude value
toronto_longitude = toronto_coordinates['Longitude'] # Toronto longitude value


# Print a message with the average coordinates
print('Latitude and longitude values of {} are {}, {}.'.format('Toronto', 
                                                               toronto_latitude, 
                                                               toronto_longitude))

Latitude and longitude values of Toronto are 43.66727739999999, -79.39353346923077.


### <span style="color:blue">We will need to use the "get_category_type" function for the Foursquare lab, so let's define it real quick.</span>


In [25]:
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        # some results store the categories under 'venue.categories'
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

### <span style="color:blue">We are now going to automate the process of retrieving the top venues for each neighborhood (borrowed from the Segmenting and Clustering Neighborhoods in New York City lab).</span>


In [26]:
def getNearbyVenues(names, latitudes, longitudes, radius=500, LIMIT=100):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
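
<span style="color:blue">`getNearbyVenues` indexes `categories[0]` directly, which would raise an `IndexError` for a venue that has no category. A minimal sketch of a more defensive parsing step is shown below. `extract_venue_rows` is a hypothetical helper, and the sample items are made up to match the fields that the function above reads.</span>

```python
def extract_venue_rows(name, lat, lng, items):
    """Turn the 'items' list of an explore response into flat tuples.

    Venues without any category get None instead of raising an IndexError.
    """
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng,
                     venue['name'],
                     venue['location']['lat'],
                     venue['location']['lng'],
                     category))
    return rows


# Hypothetical items shaped like the fields read in getNearbyVenues
items = [
    {'venue': {'name': 'Glen Manor Ravine',
               'location': {'lat': 43.676821, 'lng': -79.293942},
               'categories': [{'name': 'Trail'}]}},
    {'venue': {'name': 'Unnamed Spot',
               'location': {'lat': 43.676, 'lng': -79.293},
               'categories': []}},
]

rows = extract_venue_rows('The Beaches', 43.676357, -79.293031, items)
print(rows)
```

<span style="color:blue">Keeping the parsing in its own function also makes it possible to test it without calling the API.</span>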

### <span style="color:blue">Let's make use of our getNearbyVenues function and retrieve the data we need for our clustering analysis.</span>

In [27]:
toronto_venues = getNearbyVenues(names=toronto_data['Neighborhood'], latitudes=toronto_data['Latitude'], longitudes=toronto_data['Longitude'])

The Beaches
The Danforth West, Riverdale
The Beaches West, India Bazaar
Studio District
Lawrence Park
Davisville North
North Toronto West
Davisville
Moore Park, Summerhill East
Deer Park, Forest Hill SE, Rathnelly, South Hill, Summerhill West
Rosedale
Cabbagetown, St. James Town
Church and Wellesley
Harbourfront
Ryerson, Garden District
St. James Town
Berczy Park
Central Bay Street
Adelaide, King, Richmond
Harbourfront East, Toronto Islands, Union Station
Design Exchange, Toronto Dominion Centre
Commerce Court, Victoria Hotel
Roselawn
Forest Hill North, Forest Hill West
The Annex, North Midtown, Yorkville
Harbord, University of Toronto
Chinatown, Grange Park, Kensington Market
CN Tower, Bathurst Quay, Island airport, Harbourfront West, King and Spadina, Railway Lands, South Niagara
Stn A PO Boxes 25 The Esplanade
First Canadian Place, Underground city
Christie
Dovercourt Village, Dufferin
Little Portugal, Trinity
Brockton, Exhibition Place, Parkdale Village
High Park, The Junction Sout

In [28]:
print(toronto_venues.shape)
print('There are {} unique categories.'.format(len(toronto_venues['Venue Category'].unique())))
toronto_venues.head()

(1685, 7)
There are 232 unique categories.


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,The Beaches,43.676357,-79.293031,Glen Manor Ravine,43.676821,-79.293942,Trail
1,The Beaches,43.676357,-79.293031,The Big Carrot Natural Food Market,43.678879,-79.297734,Health Food Store
2,The Beaches,43.676357,-79.293031,Grover Pub and Grub,43.679181,-79.297215,Pub
3,The Beaches,43.676357,-79.293031,Glen Stewart Ravine,43.6763,-79.294784,Other Great Outdoors
4,The Beaches,43.676357,-79.293031,Upper Beaches,43.680563,-79.292869,Neighborhood


### <span style="color:blue">Now we will start our analysis for each neighborhood.</span>

In [30]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

# Group rows by the mean frequency of each category for each neighborhood
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()


toronto_grouped

Unnamed: 0,Neighborhood,Yoga Studio,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,...,Theme Restaurant,Thrift / Vintage Store,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wings Joint
0,"Adelaide, King, Richmond",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,...,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0
1,Berczy Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.018182,0.0,0.0,0.0,0.0
2,"Brockton, Exhibition Place, Parkdale Village",0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Business Reply Mail Processing Centre 969 Eastern,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,"CN Tower, Bathurst Quay, Island airport, Harbo...",0.0,0.0,0.066667,0.066667,0.066667,0.133333,0.133333,0.133333,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,"Cabbagetown, St. James Town",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Central Bay Street,0.011905,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011905,...,0.0,0.0,0.0,0.0,0.0,0.011905,0.0,0.0,0.011905,0.0
7,"Chinatown, Grange Park, Kensington Market",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.010638,0.0,0.0,0.0,0.031915,0.0,0.042553,0.010638,0.0
8,Christie,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Church and Wellesley,0.012048,0.012048,0.0,0.0,0.0,0.0,0.0,0.0,0.012048,...,0.012048,0.0,0.0,0.0,0.0,0.0,0.0,0.012048,0.0,0.012048


In [31]:
toronto_grouped.shape

(38, 232)
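
<span style="color:blue">One detail is worth noting here. Foursquare has a venue category that is itself called "Neighborhood" (it appears in the `toronto_venues.head()` output above). Assigning `toronto_onehot['Neighborhood']` therefore appears to overwrite that category's dummy column instead of adding a new one, which would explain why "Yoga Studio" ends up right after "Neighborhood" and why 232 unique categories result in only 232 columns in total. The sketch below shows one possible way to avoid the collision on made-up data. The new column name `Neighborhood (category)` is an assumption chosen for illustration, and applying this to the notebook would change the numbers shown in the following cells.</span>

```python
import pandas as pd

# Made-up venues, including one whose category is literally "Neighborhood"
venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'B'],
    'Venue Category': ['Park', 'Neighborhood', 'Café'],
})

# One-hot encode the categories as floats
onehot = pd.get_dummies(venues['Venue Category'], dtype=float)

# Rename the colliding category so it cannot clash with the grouping key
onehot = onehot.rename(columns={'Neighborhood': 'Neighborhood (category)'})

# Group by the neighborhood names directly instead of adding them as a column
grouped = onehot.groupby(venues['Neighborhood']).mean().reset_index()

print(grouped)
```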

### <span style="color:blue">Now we are going to sort the venues in descending order for each row in the dataframe (borrowed from the Segmenting and Clustering Neighborhoods in New York City lab).</span> 

In [32]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

### <span style="color:blue">The next step is to create our new sorted dataframe including the top 10 venues for each neighborhood.</span> 

In [33]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Adelaide, King, Richmond",Coffee Shop,Café,Thai Restaurant,Steakhouse,Asian Restaurant,Salad Place,Restaurant,Bar,Bakery,Burger Joint
1,Berczy Park,Coffee Shop,Beer Bar,Steakhouse,Bakery,Farmers Market,Cocktail Bar,Seafood Restaurant,Cheese Shop,Café,Breakfast Spot
2,"Brockton, Exhibition Place, Parkdale Village",Breakfast Spot,Coffee Shop,Café,Yoga Studio,Pet Store,Restaurant,Italian Restaurant,Intersection,Burrito Place,Bar
3,Business Reply Mail Processing Centre 969 Eastern,Skate Park,Burrito Place,Recording Studio,Fast Food Restaurant,Auto Workshop,Farmers Market,Spa,Pizza Place,Restaurant,Smoke Shop
4,"CN Tower, Bathurst Quay, Island airport, Harbo...",Airport Service,Airport Terminal,Airport Lounge,Sculpture Garden,Rental Car Location,Harbor / Marina,Boat or Ferry,Bar,Airport Gate,Airport Food Court


### <span style="color:blue">We are now going to run K-means to cluster our neighborhoods into 5 clusters and add those clusters to our dataframe.</span> 

In [34]:
# set number of clusters
kclusters = 5

# Remove the "Neighborhood" column
toronto_grouped_clustering = toronto_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

toronto_merged = toronto_data

# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood').dropna()

toronto_merged.reset_index(drop = True, inplace = True) 
toronto_merged.head() # check the last columns!

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M4E,East Toronto,The Beaches,43.676357,-79.293031,0.0,Other Great Outdoors,Pub,Trail,Health Food Store,Wings Joint,Diner,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant
1,M4K,East Toronto,"The Danforth West, Riverdale",43.679557,-79.352188,0.0,Greek Restaurant,Coffee Shop,Italian Restaurant,Ice Cream Shop,Furniture / Home Store,Restaurant,Bookstore,Bubble Tea Shop,Caribbean Restaurant,Brewery
2,M4L,East Toronto,"The Beaches West, India Bazaar",43.668999,-79.315572,0.0,Sandwich Place,Park,Sushi Restaurant,Fast Food Restaurant,Steakhouse,Ice Cream Shop,Fish & Chips Shop,Burrito Place,Pizza Place,Food & Drink Shop
3,M4M,East Toronto,Studio District,43.659526,-79.340923,0.0,Café,Coffee Shop,Bakery,Italian Restaurant,American Restaurant,Yoga Studio,Bookstore,Sandwich Place,Brewery,Cheese Shop
4,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879,2.0,Park,Gym / Fitness Center,Bus Line,Swim School,Dance Studio,Eastern European Restaurant,Dumpling Restaurant,Donut Shop,Doner Restaurant,Dog Run
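
<span style="color:blue">The number of clusters was fixed at 5 above. One common way to sanity check such a choice is to compare the silhouette score for several values of k. The sketch below does this on synthetic data with three obvious groups; it is not run on `toronto_grouped_clustering`, and it makes no claim about which k would be best for the Toronto data.</span>

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic data: three tight, well separated groups in 4 dimensions
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=center, scale=0.05, size=(20, 4))
               for center in (0.0, 1.0, 2.0)])

# Fit k-means for several k and record the silhouette score of each result
scores = {}
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print(scores)
print('k with the highest silhouette score:', best_k)
```

<span style="color:blue">A higher silhouette score means the points sit closer to their own cluster than to the neighboring ones, so the k with the highest score is a reasonable candidate.</span>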


### <span style="color:blue">Let's visualize our clusters on a map.</span> 

In [35]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=12)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighborhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)-1],
        fill=True,
        fill_color=rainbow[int(cluster)-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

## <span style="color:blue">We will now analyze our clusters and try to name them.</span> 

### <span style="color:blue">Cluster 1:</span> 

In [36]:
cluster_1 = toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[2] + list(range(6, toronto_merged.shape[1]))]]
cluster_1

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,The Beaches,Other Great Outdoors,Pub,Trail,Health Food Store,Wings Joint,Diner,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant
1,"The Danforth West, Riverdale",Greek Restaurant,Coffee Shop,Italian Restaurant,Ice Cream Shop,Furniture / Home Store,Restaurant,Bookstore,Bubble Tea Shop,Caribbean Restaurant,Brewery
2,"The Beaches West, India Bazaar",Sandwich Place,Park,Sushi Restaurant,Fast Food Restaurant,Steakhouse,Ice Cream Shop,Fish & Chips Shop,Burrito Place,Pizza Place,Food & Drink Shop
3,Studio District,Café,Coffee Shop,Bakery,Italian Restaurant,American Restaurant,Yoga Studio,Bookstore,Sandwich Place,Brewery,Cheese Shop
5,Davisville North,Hotel,Pizza Place,Breakfast Spot,Sandwich Place,Food & Drink Shop,Park,Gym,Clothing Store,Comfort Food Restaurant,Deli / Bodega
6,North Toronto West,Clothing Store,Coffee Shop,Sporting Goods Shop,Park,Chinese Restaurant,Cosmetics Shop,Mexican Restaurant,Burger Joint,Rental Car Location,Restaurant
7,Davisville,Pizza Place,Dessert Shop,Sandwich Place,Italian Restaurant,Gym,Café,Sushi Restaurant,Coffee Shop,Fried Chicken Joint,Farmers Market
9,"Deer Park, Forest Hill SE, Rathnelly, South Hi...",Coffee Shop,Pub,Fried Chicken Joint,Light Rail Station,Restaurant,Liquor Store,Sports Bar,Bagel Shop,Supermarket,Pizza Place
11,"Cabbagetown, St. James Town",Coffee Shop,Restaurant,Pizza Place,Flower Shop,Italian Restaurant,Pub,Bakery,Café,Liquor Store,Deli / Bodega
12,Church and Wellesley,Sushi Restaurant,Coffee Shop,Gay Bar,Japanese Restaurant,Restaurant,Mediterranean Restaurant,Gastropub,Gym,Hotel,Café


### <span style="color:blue">This is our largest cluster, so we will check the 5 most common venues across all the neighborhoods in this cluster and name it accordingly.</span> 

In [37]:
cluster_1.stack().value_counts().head()

Coffee Shop           24
Café                  23
Restaurant            19
Italian Restaurant    13
Bakery                11
dtype: int64

### <span style="color:blue">We can see that coffee shops and restaurants are the most common categories in this cluster, so we will name it "Food Oriented Neighborhoods".</span> 
<br /><br />

### <span style="color:blue">For the next 4 clusters we will check all counts since they do not have as many categories in them as in cluster 1.</span> 

### <span style="color:blue">Cluster 2:</span> 

In [38]:
cluster_2 = toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[2] + list(range(6, toronto_merged.shape[1]))]]
cluster_2

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
10,Rosedale,Park,Trail,Playground,Cuban Restaurant,Dumpling Restaurant,Donut Shop,Doner Restaurant,Dog Run,Discount Store,Diner
23,"Forest Hill North, Forest Hill West",Park,Jewelry Store,Trail,Sushi Restaurant,Cupcake Shop,Dumpling Restaurant,Donut Shop,Doner Restaurant,Dog Run,Discount Store


### <span style="color:blue">Let's see what our category counts are for this cluster.</span> 

In [39]:
cluster_2.stack().value_counts()

Park                                   2
Discount Store                         2
Trail                                  2
Dumpling Restaurant                    2
Dog Run                                2
Donut Shop                             2
Doner Restaurant                       2
Rosedale                               1
Forest Hill North, Forest Hill West    1
Cupcake Shop                           1
Cuban Restaurant                       1
Jewelry Store                          1
Playground                             1
Sushi Restaurant                       1
Diner                                  1
dtype: int64
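
<span style="color:blue">The counts above include the neighborhood names themselves ("Rosedale" and "Forest Hill North, Forest Hill West"), because `stack()` also stacks the Neighborhood column. A minimal sketch of counting only the venue categories is shown below. The small table is an excerpt-like stand-in for `cluster_2`, and the names `cluster` and `category_counts` are illustrative.</span>

```python
import pandas as pd

# Small stand-in with the same layout as the cluster tables
cluster = pd.DataFrame({
    'Neighborhood': ['Rosedale', 'Forest Hill North, Forest Hill West'],
    '1st Most Common Venue': ['Park', 'Park'],
    '2nd Most Common Venue': ['Trail', 'Jewelry Store'],
})

# Drop the name column before stacking so that only venue categories are counted
category_counts = cluster.drop(columns='Neighborhood').stack().value_counts()
print(category_counts)
```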

### <span style="color:blue">It looks like most of our categories for cluster 2 have to do with the outdoors, so let's name this cluster "Outdoor Activities Neighborhood".</span> 
<br /><br />

### <span style="color:blue">Cluster 3:</span> 

In [40]:
cluster_3 = toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[2] + list(range(6, toronto_merged.shape[1]))]]
cluster_3

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,Lawrence Park,Park,Gym / Fitness Center,Bus Line,Swim School,Dance Studio,Eastern European Restaurant,Dumpling Restaurant,Donut Shop,Doner Restaurant,Dog Run


### <span style="color:blue">Let's see what our category counts are for this cluster.</span> 

In [41]:
cluster_3.stack().value_counts()

Dance Studio                   1
Dumpling Restaurant            1
Swim School                    1
Park                           1
Lawrence Park                  1
Donut Shop                     1
Doner Restaurant               1
Bus Line                       1
Dog Run                        1
Gym / Fitness Center           1
Eastern European Restaurant    1
dtype: int64

### <span style="color:blue">This cluster is harder to name since it contains only a few categories and they do not all fall into one area. Most of them can be described as recreational, though, so we will name this cluster "Recreational Activities Neighborhood".</span> 
<br /><br />

### <span style="color:blue">Cluster 4:</span> 

In [42]:
cluster_4 = toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[2] + list(range(6, toronto_merged.shape[1]))]]
cluster_4

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
22,Roselawn,Health & Beauty Service,Garden,Wings Joint,Dance Studio,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Donut Shop,Doner Restaurant,Dog Run


### <span style="color:blue">Let's see what our category counts are for this cluster.</span> 

In [43]:
cluster_4.stack().value_counts()

Wings Joint                    1
Dance Studio                   1
Health & Beauty Service        1
Dumpling Restaurant            1
Electronics Store              1
Garden                         1
Donut Shop                     1
Doner Restaurant               1
Roselawn                       1
Dog Run                        1
Eastern European Restaurant    1
dtype: int64

### <span style="color:blue">The results for this cluster are too generic, so we will leave it without a defining name.</span> 
<br /><br />

### <span style="color:blue">Cluster 5:</span> 

In [44]:
cluster_5 = toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[2] + list(range(6, toronto_merged.shape[1]))]]
cluster_5

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
8,"Moore Park, Summerhill East",Restaurant,Playground,Intersection,Wings Joint,Eastern European Restaurant,Dumpling Restaurant,Donut Shop,Doner Restaurant,Dog Run,Discount Store


### <span style="color:blue">Let's see what our category counts are for this cluster.</span> 

In [45]:
cluster_5.stack().value_counts()

Wings Joint                    1
Dumpling Restaurant            1
Intersection                   1
Playground                     1
Restaurant                     1
Moore Park, Summerhill East    1
Donut Shop                     1
Doner Restaurant               1
Discount Store                 1
Dog Run                        1
Eastern European Restaurant    1
dtype: int64

### <span style="color:blue">As we said about cluster 4, the results for this cluster are too generic, so we will leave it without a defining name.</span> 
<br /><br />

### <span style="color:blue">We will finish with a table that shows the clusters and their names.</span> 

In [46]:
cluster_table = pd.DataFrame()
cluster_table['Cluster Number'] = ['Cluster 1', 'Cluster 2', 'Cluster 3', 'Cluster 4', 'Cluster 5']
cluster_table['Cluster Name'] = ['Food Oriented Neighborhoods', 'Outdoor Activities Neighborhood', 'Recreational Activities Neighborhood', '---', '---']
cluster_table

Unnamed: 0,Cluster Number,Cluster Name
0,Cluster 1,Food Oriented Neighborhoods
1,Cluster 2,Outdoor Activities Neighborhood
2,Cluster 3,Recreational Activities Neighborhood
3,Cluster 4,---
4,Cluster 5,---


<br /><br /><br /><br /><br /><br />
### <span style="color:blue">Thank you for taking the time to read through this notebook. I hope you enjoyed it!</span> 
