<h1 align=center><font size = 5>Segmenting and Clustering Neighborhoods in Toronto</font></h1>

## Problem 1

Use the Notebook to build the code to scrape the following Wikipedia page, https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M, in order to obtain the data that is in the table of postal codes and to transform the data into a pandas dataframe.

To create the above dataframe:

- The dataframe will consist of three columns: PostalCode, Borough, and Neighborhood
- Only process the cells that have an assigned borough. Ignore cells with a borough that is Not assigned.
- More than one neighborhood can exist in one postal code area. For example, in the table on the Wikipedia page, you will notice that - M5A is listed twice and has two neighborhoods: Harbourfront and Regent Park. These two rows will be combined into one row with the - neighborhoods separated with a comma as shown in row 11 in the above table.
- If a cell has a borough but a Not assigned neighborhood, then the neighborhood will be the same as the borough. So for the 9th cell in the table on the Wikipedia page, the value of the Borough and the Neighborhood columns will be Queen's Park.
- Clean your Notebook and add Markdown cells to explain your work and any assumptions you are making.
- In the last cell of your notebook, use the .shape method to print the number of rows of your dataframe.

In [1]:
from bs4 import BeautifulSoup
import requests
import pandas as pd

###### Scrape the List of postal codes of Canada

In [2]:
List_url = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
source = requests.get(List_url).text

In [3]:
soup = BeautifulSoup(source, 'xml')

In [4]:
table=soup.find('table')

In [6]:
# create three lists to store table data
postalCodeList = []
boroughList = []
neighborhoodList = []

# append the data into the respective lists
for row in table.find_all('tr'):
    for cell in table.find_all('td'):    
        postalCodeList.append(cell.find("b").text)  
        if (len(cell.find("span")) > 1):
            lst = cell.find("span").text.split("(")
            boroughList.append(lst[0])
            neighborhoodList.append(lst[1][:-1])
        else:
            boroughList.append(cell.find("span").text)
            neighborhoodList.append("Not assigned")

In [7]:
# create a new DataFrame from the three lists
df = pd.DataFrame({"Postalcode": postalCodeList,
                           "Borough": boroughList,
                           "Neighborhood": neighborhoodList})

In [8]:
df.head()

Unnamed: 0,Postalcode,Borough,Neighborhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Regent Park / Harbourfront


###### Data Cleaning
remove rows where Borough is 'Not assigned'

In [9]:
#drop cells with Borough not assigned
df_dropped = df[df.Borough != 'Not assigned'].reset_index(drop=True)
df_dropped.head()

Unnamed: 0,Postalcode,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Regent Park / Harbourfront
3,M6A,North York,Lawrence Manor / Lawrence Heights
4,M7A,Queen's Park,Ontario Provincial Government


In [10]:
# group multiple neighborhoods having same postal code
toronto_group = df_dropped.groupby(['Postalcode', 'Borough'], as_index=False).agg(lambda x: ", ".join(x))
toronto_group['Neighborhood'] = toronto_group['Neighborhood'].str.replace('/', ',')
toronto_group.head()

Unnamed: 0,Postalcode,Borough,Neighborhood
0,M1B,Scarborough,"Malvern , Rouge, Malvern , Rouge, Malvern , Ro..."
1,M1C,Scarborough,"Rouge Hill , Port Union , Highland Creek, Roug..."
2,M1E,Scarborough,"Guildwood , Morningside , West Hill, Guildwood..."
3,M1G,Scarborough,"Woburn, Woburn, Woburn, Woburn, Woburn, Woburn..."
4,M1H,Scarborough,"Cedarbrae, Cedarbrae, Cedarbrae, Cedarbrae, Ce..."


In [11]:
toronto_group.shape

(103, 3)

## Problem 2:

We have built a dataframe of the postal code of each neighborhood along with the borough name and neighborhood name, in order to utilize the Foursquare location data, we need to get the latitude and the longitude coordinates of each neighborhood.

In [12]:
def get_geocode(postal_code):
    # initialize your variable to None
    lat_lng_coords = None
    while(lat_lng_coords is None):
        g = geocoder.google('{}, Toronto, Ontario'.format(postal_code))
        lat_lng_coords = g.latlng
    latitude = lat_lng_coords[0]
    longitude = lat_lng_coords[1]
    return latitude,longitude

In [13]:
#geo_df=pd.read_csv('http://cocl.us/Geospatial_data')
geo_df = pd.read_csv("/Users/anamikasharma/Downloads/Geospatial_Coordinates.csv")

In [14]:
geo_df.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [15]:
geo_df.rename(columns={'Postal Code':'Postalcode'},inplace=True)
geo_merged = pd.merge(geo_df, toronto_group, on='Postalcode')

In [16]:
geo_data=geo_merged[['Postalcode','Borough','Neighborhood','Latitude','Longitude']]

In [17]:
geo_data.head()

Unnamed: 0,Postalcode,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Malvern , Rouge, Malvern , Rouge, Malvern , Ro...",43.806686,-79.194353
1,M1C,Scarborough,"Rouge Hill , Port Union , Highland Creek, Roug...",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood , Morningside , West Hill, Guildwood...",43.763573,-79.188711
3,M1G,Scarborough,"Woburn, Woburn, Woburn, Woburn, Woburn, Woburn...",43.770992,-79.216917
4,M1H,Scarborough,"Cedarbrae, Cedarbrae, Cedarbrae, Cedarbrae, Ce...",43.773136,-79.239476


In [18]:
toronto_data=geo_data[geo_data['Borough'].str.contains("Toronto")]
toronto_data.head()

Unnamed: 0,Postalcode,Borough,Neighborhood,Latitude,Longitude
37,M4E,East Toronto,"The Beaches, The Beaches, The Beaches, The Bea...",43.676357,-79.293031
40,M4J,East YorkEast Toronto,"The Danforth East, The Danforth East, The Da...",43.685347,-79.338106
41,M4K,East Toronto,"The Danforth West , Riverdale, The Danforth We...",43.679557,-79.352188
42,M4L,East Toronto,"India Bazaar , The Beaches West, India Bazaar ...",43.668999,-79.315572
43,M4M,East Toronto,"Studio District, Studio District, Studio Distr...",43.659526,-79.340923


In [19]:
CLIENT_ID = '2OQOZHBFA51TIMEGH3FSFT5VFRASXIU3YGAHHMXNNQECIVTA' # your Foursquare ID
CLIENT_SECRET = 'R5IYM2203NHGAD5FKP4RFCMHXHANGOGYTKBALODAP5GVZXGO' # your Foursquare Secret
VERSION = '20180604'

In [20]:
def getNearbyVenues(names, latitudes, longitudes):
    radius=500
    LIMIT=100
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [21]:
toronto_venues = getNearbyVenues(names=toronto_data['Neighborhood'],
                                   latitudes=toronto_data['Latitude'],
                                   longitudes=toronto_data['Longitude']
                                  )

The Beaches, The Beaches, The Beaches, The Beaches, The Beaches, The Beaches, The Beaches, The Beaches, The Beaches, The Beaches, The Beaches, The Beaches, The Beaches, The Beaches, The Beaches, The Beaches, The Beaches, The Beaches, The Beaches, The Beaches
The Danforth  East, The Danforth  East, The Danforth  East, The Danforth  East, The Danforth  East, The Danforth  East, The Danforth  East, The Danforth  East, The Danforth  East, The Danforth  East, The Danforth  East, The Danforth  East, The Danforth  East, The Danforth  East, The Danforth  East, The Danforth  East, The Danforth  East, The Danforth  East, The Danforth  East, The Danforth  East
The Danforth West , Riverdale, The Danforth West , Riverdale, The Danforth West , Riverdale, The Danforth West , Riverdale, The Danforth West , Riverdale, The Danforth West , Riverdale, The Danforth West , Riverdale, The Danforth West , Riverdale, The Danforth West , Riverdale, The Danforth West , Riverdale, The Danforth West , Riverdale, T

Berczy Park, Berczy Park, Berczy Park, Berczy Park, Berczy Park, Berczy Park, Berczy Park, Berczy Park, Berczy Park, Berczy Park, Berczy Park, Berczy Park, Berczy Park, Berczy Park, Berczy Park, Berczy Park, Berczy Park, Berczy Park, Berczy Park, Berczy Park
Central Bay Street, Central Bay Street, Central Bay Street, Central Bay Street, Central Bay Street, Central Bay Street, Central Bay Street, Central Bay Street, Central Bay Street, Central Bay Street, Central Bay Street, Central Bay Street, Central Bay Street, Central Bay Street, Central Bay Street, Central Bay Street, Central Bay Street, Central Bay Street, Central Bay Street, Central Bay Street
Richmond , Adelaide , King, Richmond , Adelaide , King, Richmond , Adelaide , King, Richmond , Adelaide , King, Richmond , Adelaide , King, Richmond , Adelaide , King, Richmond , Adelaide , King, Richmond , Adelaide , King, Richmond , Adelaide , King, Richmond , Adelaide , King, Richmond , Adelaide , King, Richmond , Adelaide , King, Richmo

Enclave of M5E, Enclave of M5E, Enclave of M5E, Enclave of M5E, Enclave of M5E, Enclave of M5E, Enclave of M5E, Enclave of M5E, Enclave of M5E, Enclave of M5E, Enclave of M5E, Enclave of M5E, Enclave of M5E, Enclave of M5E, Enclave of M5E, Enclave of M5E, Enclave of M5E, Enclave of M5E, Enclave of M5E, Enclave of M5E
First Canadian Place , Underground city, First Canadian Place , Underground city, First Canadian Place , Underground city, First Canadian Place , Underground city, First Canadian Place , Underground city, First Canadian Place , Underground city, First Canadian Place , Underground city, First Canadian Place , Underground city, First Canadian Place , Underground city, First Canadian Place , Underground city, First Canadian Place , Underground city, First Canadian Place , Underground city, First Canadian Place , Underground city, First Canadian Place , Underground city, First Canadian Place , Underground city, First Canadian Place , Underground city, First Canadian Place , Un

In [22]:
toronto_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"The Beaches, The Beaches, The Beaches, The Bea...",43.676357,-79.293031,The Big Carrot Natural Food Market,43.678879,-79.297734,Health Food Store
1,"The Beaches, The Beaches, The Beaches, The Bea...",43.676357,-79.293031,Grover Pub and Grub,43.679181,-79.297215,Pub
2,"The Beaches, The Beaches, The Beaches, The Bea...",43.676357,-79.293031,Upper Beaches,43.680563,-79.292869,Neighborhood
3,"The Beaches, The Beaches, The Beaches, The Bea...",43.676357,-79.293031,Skaut Design,43.680263,-79.290581,Construction & Landscaping
4,"The Danforth East, The Danforth East, The Da...",43.685347,-79.338106,The Path,43.683923,-79.335007,Park


In [23]:
toronto_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
"Berczy Park, Berczy Park, Berczy Park, Berczy Park, Berczy Park, Berczy Park, Berczy Park, Berczy Park, Berczy Park, Berczy Park, Berczy Park, Berczy Park, Berczy Park, Berczy Park, Berczy Park, Berczy Park, Berczy Park, Berczy Park, Berczy Park, Berczy Park",47,47,47,47,47,47
"Brockton , Parkdale Village , Exhibition Place, Brockton , Parkdale Village , Exhibition Place, Brockton , Parkdale Village , Exhibition Place, Brockton , Parkdale Village , Exhibition Place, Brockton , Parkdale Village , Exhibition Place, Brockton , Parkdale Village , Exhibition Place, Brockton , Parkdale Village , Exhibition Place, Brockton , Parkdale Village , Exhibition Place, Brockton , Parkdale Village , Exhibition Place, Brockton , Parkdale Village , Exhibition Place, Brockton , Parkdale Village , Exhibition Place, Brockton , Parkdale Village , Exhibition Place, Brockton , Parkdale Village , Exhibition Place, Brockton , Parkdale Village , Exhibition Place, Brockton , Parkdale Village , Exhibition Place, Brockton , Parkdale Village , Exhibition Place, Brockton , Parkdale Village , Exhibition Place, Brockton , Parkdale Village , Exhibition Place, Brockton , Parkdale Village , Exhibition Place, Brockton , Parkdale Village , Exhibition Place",22,22,22,22,22,22
"CN Tower , King and Spadina , Railway Lands , Harbourfront West , Bathurst Quay , South Niagara , Island airport, CN Tower , King and Spadina , Railway Lands , Harbourfront West , Bathurst Quay , South Niagara , Island airport, CN Tower , King and Spadina , Railway Lands , Harbourfront West , Bathurst Quay , South Niagara , Island airport, CN Tower , King and Spadina , Railway Lands , Harbourfront West , Bathurst Quay , South Niagara , Island airport, CN Tower , King and Spadina , Railway Lands , Harbourfront West , Bathurst Quay , South Niagara , Island airport, CN Tower , King and Spadina , Railway Lands , Harbourfront West , Bathurst Quay , South Niagara , Island airport, CN Tower , King and Spadina , Railway Lands , Harbourfront West , Bathurst Quay , South Niagara , Island airport, CN Tower , King and Spadina , Railway Lands , Harbourfront West , Bathurst Quay , South Niagara , Island airport, CN Tower , King and Spadina , Railway Lands , Harbourfront West , Bathurst Quay , South Niagara , Island airport, CN Tower , King and Spadina , Railway Lands , Harbourfront West , Bathurst Quay , South Niagara , Island airport, CN Tower , King and Spadina , Railway Lands , Harbourfront West , Bathurst Quay , South Niagara , Island airport, CN Tower , King and Spadina , Railway Lands , Harbourfront West , Bathurst Quay , South Niagara , Island airport, CN Tower , King and Spadina , Railway Lands , Harbourfront West , Bathurst Quay , South Niagara , Island airport, CN Tower , King and Spadina , Railway Lands , Harbourfront West , Bathurst Quay , South Niagara , Island airport, CN Tower , King and Spadina , Railway Lands , Harbourfront West , Bathurst Quay , South Niagara , Island airport, CN Tower , King and Spadina , Railway Lands , Harbourfront West , Bathurst Quay , South Niagara , Island airport, CN Tower , King and Spadina , Railway Lands , Harbourfront West , Bathurst Quay , South Niagara , Island airport, CN Tower , King and Spadina , Railway Lands , Harbourfront West , Bathurst Quay , South Niagara , Island airport, CN Tower , King and Spadina , Railway Lands , Harbourfront West , Bathurst Quay , South Niagara , Island airport, CN Tower , King and Spadina , Railway Lands , Harbourfront West , Bathurst Quay , South Niagara , Island airport",17,17,17,17,17,17
"Central Bay Street, Central Bay Street, Central Bay Street, Central Bay Street, Central Bay Street, Central Bay Street, Central Bay Street, Central Bay Street, Central Bay Street, Central Bay Street, Central Bay Street, Central Bay Street, Central Bay Street, Central Bay Street, Central Bay Street, Central Bay Street, Central Bay Street, Central Bay Street, Central Bay Street, Central Bay Street",62,62,62,62,62,62
"Christie, Christie, Christie, Christie, Christie, Christie, Christie, Christie, Christie, Christie, Christie, Christie, Christie, Christie, Christie, Christie, Christie, Christie, Christie, Christie",15,15,15,15,15,15
"Church and Wellesley, Church and Wellesley, Church and Wellesley, Church and Wellesley, Church and Wellesley, Church and Wellesley, Church and Wellesley, Church and Wellesley, Church and Wellesley, Church and Wellesley, Church and Wellesley, Church and Wellesley, Church and Wellesley, Church and Wellesley, Church and Wellesley, Church and Wellesley, Church and Wellesley, Church and Wellesley, Church and Wellesley, Church and Wellesley",66,66,66,66,66,66
"Commerce Court , Victoria Hotel, Commerce Court , Victoria Hotel, Commerce Court , Victoria Hotel, Commerce Court , Victoria Hotel, Commerce Court , Victoria Hotel, Commerce Court , Victoria Hotel, Commerce Court , Victoria Hotel, Commerce Court , Victoria Hotel, Commerce Court , Victoria Hotel, Commerce Court , Victoria Hotel, Commerce Court , Victoria Hotel, Commerce Court , Victoria Hotel, Commerce Court , Victoria Hotel, Commerce Court , Victoria Hotel, Commerce Court , Victoria Hotel, Commerce Court , Victoria Hotel, Commerce Court , Victoria Hotel, Commerce Court , Victoria Hotel, Commerce Court , Victoria Hotel, Commerce Court , Victoria Hotel",100,100,100,100,100,100
"Davisville North, Davisville North, Davisville North, Davisville North, Davisville North, Davisville North, Davisville North, Davisville North, Davisville North, Davisville North, Davisville North, Davisville North, Davisville North, Davisville North, Davisville North, Davisville North, Davisville North, Davisville North, Davisville North, Davisville North",9,9,9,9,9,9
"Davisville, Davisville, Davisville, Davisville, Davisville, Davisville, Davisville, Davisville, Davisville, Davisville, Davisville, Davisville, Davisville, Davisville, Davisville, Davisville, Davisville, Davisville, Davisville, Davisville",28,28,28,28,28,28
"Dufferin , Dovercourt Village, Dufferin , Dovercourt Village, Dufferin , Dovercourt Village, Dufferin , Dovercourt Village, Dufferin , Dovercourt Village, Dufferin , Dovercourt Village, Dufferin , Dovercourt Village, Dufferin , Dovercourt Village, Dufferin , Dovercourt Village, Dufferin , Dovercourt Village, Dufferin , Dovercourt Village, Dufferin , Dovercourt Village, Dufferin , Dovercourt Village, Dufferin , Dovercourt Village, Dufferin , Dovercourt Village, Dufferin , Dovercourt Village, Dufferin , Dovercourt Village, Dufferin , Dovercourt Village, Dufferin , Dovercourt Village, Dufferin , Dovercourt Village",14,14,14,14,14,14


In [24]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")
toronto_onehot.drop(['Neighborhood'],axis=1,inplace=True) 
toronto_onehot.insert(loc=0, column='Neighborhood', value=toronto_venues['Neighborhood'] )
toronto_onehot.shape

(1500, 219)

In [25]:
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped.head()

Unnamed: 0,Neighborhood,Adult Boutique,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,...,Thrift / Vintage Store,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Yoga Studio
0,"Berczy Park, Berczy Park, Berczy Park, Berczy ...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.042553,0.0,0.0,0.0,0.0,0.0
1,"Brockton , Parkdale Village , Exhibition Place...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"CN Tower , King and Spadina , Railway Lands , ...",0.0,0.0,0.058824,0.058824,0.058824,0.117647,0.176471,0.058824,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,"Central Bay Street, Central Bay Street, Centra...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.016129,0.0,0.0,0.016129,0.0,0.016129
4,"Christie, Christie, Christie, Christie, Christ...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [26]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [27]:
import numpy as np 

num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Berczy Park, Berczy Park, Berczy Park, Berczy ...",Cocktail Bar,Bakery,Sandwich Place,Coffee Shop,Farmers Market,Beer Bar,Seafood Restaurant,Vegetarian / Vegan Restaurant,Cheese Shop,Liquor Store
1,"Brockton , Parkdale Village , Exhibition Place...",Café,Bakery,Breakfast Spot,Coffee Shop,Sandwich Place,Bar,Italian Restaurant,Intersection,Japanese Restaurant,Restaurant
2,"CN Tower , King and Spadina , Railway Lands , ...",Airport Service,Airport Lounge,Coffee Shop,Harbor / Marina,Rental Car Location,Sculpture Garden,Plane,Boat or Ferry,Bar,Airport Terminal
3,"Central Bay Street, Central Bay Street, Centra...",Coffee Shop,Sandwich Place,Sushi Restaurant,Italian Restaurant,Café,Japanese Restaurant,Salad Place,Burger Joint,Bank,Restaurant
4,"Christie, Christie, Christie, Christie, Christ...",Grocery Store,Café,Park,Athletics & Sports,Nightclub,Italian Restaurant,Baby Store,Restaurant,Coffee Shop,Distribution Center


###### Making Clusters for Neighborhood

In [28]:
from sklearn.cluster import KMeans 

# set number of clusters
kclusters = 5

toronto_grouped_clustering = toronto_grouped.drop('Neighborhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int32)

In [29]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

toronto_merged = toronto_data

# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

toronto_merged.head() 

Unnamed: 0,Postalcode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
37,M4E,East Toronto,"The Beaches, The Beaches, The Beaches, The Bea...",43.676357,-79.293031,0,Health Food Store,Pub,Construction & Landscaping,Yoga Studio,Dance Studio,Eastern European Restaurant,Donut Shop,Doner Restaurant,Dog Run,Distribution Center
40,M4J,East YorkEast Toronto,"The Danforth East, The Danforth East, The Da...",43.685347,-79.338106,2,Park,Convenience Store,Yoga Studio,Dance Studio,Eastern European Restaurant,Donut Shop,Doner Restaurant,Dog Run,Distribution Center,Discount Store
41,M4K,East Toronto,"The Danforth West , Riverdale, The Danforth We...",43.679557,-79.352188,0,Greek Restaurant,Coffee Shop,Italian Restaurant,Ice Cream Shop,Yoga Studio,Dessert Shop,Bakery,Bank,Café,Spa
42,M4L,East Toronto,"India Bazaar , The Beaches West, India Bazaar ...",43.668999,-79.315572,0,Park,Fast Food Restaurant,Pub,Food & Drink Shop,Board Shop,Coffee Shop,Sandwich Place,Brewery,Restaurant,Burrito Place
43,M4M,East Toronto,"Studio District, Studio District, Studio Distr...",43.659526,-79.340923,0,Coffee Shop,Gastropub,Bakery,Café,Ice Cream Shop,Italian Restaurant,Bar,Bank,Stationery Store,Fish Market


In [30]:
neighborhoods_venues_sorted.head()

Unnamed: 0,Cluster Labels,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,0,"Berczy Park, Berczy Park, Berczy Park, Berczy ...",Cocktail Bar,Bakery,Sandwich Place,Coffee Shop,Farmers Market,Beer Bar,Seafood Restaurant,Vegetarian / Vegan Restaurant,Cheese Shop,Liquor Store
1,0,"Brockton , Parkdale Village , Exhibition Place...",Café,Bakery,Breakfast Spot,Coffee Shop,Sandwich Place,Bar,Italian Restaurant,Intersection,Japanese Restaurant,Restaurant
2,0,"CN Tower , King and Spadina , Railway Lands , ...",Airport Service,Airport Lounge,Coffee Shop,Harbor / Marina,Rental Car Location,Sculpture Garden,Plane,Boat or Ferry,Bar,Airport Terminal
3,0,"Central Bay Street, Central Bay Street, Centra...",Coffee Shop,Sandwich Place,Sushi Restaurant,Italian Restaurant,Café,Japanese Restaurant,Salad Place,Burger Joint,Bank,Restaurant
4,0,"Christie, Christie, Christie, Christie, Christ...",Grocery Store,Café,Park,Athletics & Sports,Nightclub,Italian Restaurant,Baby Store,Restaurant,Coffee Shop,Distribution Center


In [31]:
import pandas, os, geopy
from geopy.geocoders import Nominatim

address = 'Toronto, CA'

geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Toronto are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of Toronto are 43.6534817, -79.3839347.


###### Creating Map

In [32]:
import folium
from matplotlib import cm , colors

map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighborhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters