# Segmenting and Clustering Neighborhoods in Toronto

In this assignment, you will explore, segment, and cluster the neighborhoods in the city of Toronto. However, unlike New York, Toronto's neighborhood data is not readily available on the internet. What is interesting about the field of data science is that each project can be challenging in its own way, so you need to be agile and hone the ability to learn new libraries and tools quickly, depending on the project.

For the Toronto neighborhood data, a Wikipedia page exists that has all the information we need to explore and cluster the neighborhoods in Toronto. You will be required to scrape the Wikipedia page and wrangle the data, clean it, and then read it into a pandas dataframe so that it is in a structured format like the New York dataset.

Once the data is in a structured format, you can replicate the analysis that we did to the New York City dataset to explore and cluster the neighborhoods in the city of Toronto.

# Part I

For this assignment, you will be required to explore and cluster the neighborhoods in Toronto.

- Start by creating a new Notebook for this assignment.
- Use the Notebook to build the code to scrape the following Wikipedia page, https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M, in order to obtain the data that is in the table of postal codes and to transform the data into a pandas dataframe like the one shown below:
- To create the above dataframe:
    - The dataframe will consist of three columns: PostalCode, Borough, and Neighborhood
    - Only process the cells that have an assigned borough. Ignore cells with a borough that is Not assigned.
    - More than one neighborhood can exist in one postal code area. For example, in the table on the Wikipedia page, you will notice that M5A is listed twice and has two neighborhoods: Harbourfront and Regent Park. These two rows will be combined into one row with the neighborhoods separated with a comma as shown in row 11 in the above table.
    - If a cell has a borough but a Not assigned neighborhood, then the neighborhood will be the same as the borough. So for the 9th cell in the table on the Wikipedia page, the value of the Borough and the Neighborhood columns will be Queen's Park.
    - Clean your Notebook and add Markdown cells to explain your work and any assumptions you are making.
    - In the last cell of your notebook, use the .shape attribute to print the number of rows of your dataframe.
- Submit a link to your Notebook on your GitHub repository.
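
As a minimal sketch of the last requirement, assuming a small hand-made dataframe in place of the scraped one (the rows below are illustrative only), `.shape` holds a `(rows, columns)` tuple, so the row count is its first element:

```python
import pandas as pd

# Illustrative rows only; the real dataframe comes from the scraped Wikipedia table.
sample = pd.DataFrame({
    'PostalCode': ['M3A', 'M4A', 'M5A'],
    'Borough': ['North York', 'North York', 'Downtown Toronto'],
    'Neighborhood': ['Parkwoods', 'Victoria Village', 'Harbourfront, Regent Park'],
})

# .shape is an attribute (no parentheses) holding (number_of_rows, number_of_columns).
n_rows, n_cols = sample.shape
print('Rows: {}, columns: {}'.format(n_rows, n_cols))
```

`sample.shape[0]` gives the same row count without unpacking the tuple.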

Note: There are different website scraping libraries and packages in Python. One of the most common packages is BeautifulSoup. Here is the package's main documentation page: http://beautiful-soup-4.readthedocs.io/en/latest/

The package is so popular that there is a plethora of tutorials and examples of how to use it. Here is a very good YouTube video on how to use the BeautifulSoup package: https://www.youtube.com/watch?v=ng2o98k983k

Use the BeautifulSoup package, or any other way you are comfortable with, to transform the data in the table on the Wikipedia page into the above pandas dataframe.

#### Import Libraries

In [9]:
import numpy as np
import pandas as pd
import json
from geopy.geocoders import Nominatim
import requests
from pandas.io.json import json_normalize
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans
from bs4 import BeautifulSoup
import xml
!conda install -c conda-forge folium=0.5.0 --yes
import folium 
!conda install -c conda-forge geocoder --yes
import geocoder 
print('Libraries imported.')


#### Scraping data from the website and creating the dataframe

- HTML tags meaning
    - td: tag for a table Data/cell.
    - th: tag for a table Header.
    - tr: tag for a table Row.
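
To make the role of these three tags concrete, here is a minimal sketch that uses only Python's standard-library `html.parser` on a hand-written HTML snippet. The class name `TableCellParser` and the snippet are assumptions for illustration; this is not the BeautifulSoup approach the notebook uses.

```python
from html.parser import HTMLParser


class TableCellParser(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self._row = []
        elif tag in ('td', 'th') and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ('td', 'th') and self._cell is not None:
            self._row.append(''.join(self._cell).strip())
            self._cell = None
        elif tag == 'tr' and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Hand-written snippet that mimics the layout of a three-column postal code table.
SAMPLE_HTML = """
<table>
  <tr><th>Postcode</th><th>Borough</th><th>Neighbourhood</th></tr>
  <tr><td>M1A</td><td>Not assigned</td><td>Not assigned</td></tr>
  <tr><td>M3A</td><td>North York</td><td>Parkwoods</td></tr>
</table>
"""

parser = TableCellParser()
parser.feed(SAMPLE_HTML)
header, *body = parser.rows
print(header)
print(body)
```

Each `tr` becomes one list, and its `th` or `td` cells become the items of that list, which is the row structure a dataframe needs.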

In [10]:
# Download the page and collect every <td> cell of the first table on it
html = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').text
soup = BeautifulSoup(html, 'lxml')

table_post = soup.find('table')
fields = table_post.find_all('td')

postcode = []
borough = []
neighbourhood = []

# Cells come in groups of three: postal code, borough, neighbourhood
for i in range(0, len(fields), 3):
    postcode.append(fields[i].text.strip())
    borough.append(fields[i+1].text.strip())
    neighbourhood.append(fields[i+2].text.strip())
        
df = pd.DataFrame(data=[postcode, borough, neighbourhood]).transpose()
df.columns = ['Postcode', 'Borough', 'Neighbourhood']
df.head()

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront


#### Dropping rows with the value "Not assigned" in the Borough column

In [11]:
df2 = df[df['Borough'] != 'Not assigned']
df2.head(10)

Unnamed: 0,Postcode,Borough,Neighbourhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront
5,M5A,Downtown Toronto,Regent Park
6,M6A,North York,Lawrence Heights
7,M6A,North York,Lawrence Manor
8,M7A,Queen's Park,Not assigned
10,M9A,Etobicoke,Islington Avenue
11,M1B,Scarborough,Rouge
12,M1B,Scarborough,Malvern


#### Replacing 'Not assigned' in the Neighbourhood column with the Borough name

In [12]:
# Work on an explicit copy so the assignment below does not raise SettingWithCopyWarning
df2 = df2.copy()
df2.loc[df2['Neighbourhood'] == 'Not assigned', 'Neighbourhood'] = df2['Borough']
df2[df2['Neighbourhood'] == 'Not assigned']


Unnamed: 0,Postcode,Borough,Neighbourhood


#### Grouping rows by Postcode and Borough

In [13]:
df3=df2.groupby(['Postcode','Borough'])['Neighbourhood'].apply(lambda x: ','.join(x.astype(str))).reset_index()
df3.head(10)

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1B,Scarborough,"Rouge,Malvern"
1,M1C,Scarborough,"Highland Creek,Rouge Hill,Port Union"
2,M1E,Scarborough,"Guildwood,Morningside,West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
5,M1J,Scarborough,Scarborough Village
6,M1K,Scarborough,"East Birchmount Park,Ionview,Kennedy Park"
7,M1L,Scarborough,"Clairlea,Golden Mile,Oakridge"
8,M1M,Scarborough,"Cliffcrest,Cliffside,Scarborough Village West"
9,M1N,Scarborough,"Birch Cliff,Cliffside West"


# Part II

Now that you have built a dataframe with the postal code, borough name, and neighborhood name of each neighborhood, we need the latitude and longitude coordinates of each neighborhood in order to use the Foursquare location data.

In an older version of this course, we were leveraging the Google Maps Geocoding API to get the latitude and the longitude coordinates of each neighborhood. However, recently Google started charging for their API: http://geoawesomeness.com/developers-up-in-arms-over-google-maps-api-insane-price-hike/, so we will use the Geocoder Python package instead: https://geocoder.readthedocs.io/index.html.

The problem with this package is that you sometimes have to be persistent to get the geographical coordinates of a given postal code. A call for the latitude and longitude of a postal code may return None, and the same call made again may return the coordinates. So, to make sure that you get the coordinates for all of the neighborhoods, you can run a while loop for each postal code.
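
The retry idea can be sketched without any network access. In the sketch below, `geocode_with_retry`, `flaky_lookup`, and the `max_attempts` cap are hypothetical names introduced for illustration; `flaky_lookup` stands in for a geocoder call that returns None before it succeeds. Unlike an unbounded while loop, this version gives up after a fixed number of attempts.

```python
import time


def geocode_with_retry(lookup, query, max_attempts=5, delay_seconds=0.0):
    """Call lookup(query) until it returns coordinates or the attempts run out."""
    for _ in range(max_attempts):
        coords = lookup(query)
        if coords is not None:
            return coords
        time.sleep(delay_seconds)  # optional pause between attempts
    return None


# Stand-in for an unreliable geocoder: fails twice, then succeeds.
calls = {'count': 0}


def flaky_lookup(query):
    calls['count'] += 1
    if calls['count'] < 3:
        return None
    return [43.811525, -79.195517]


result = geocode_with_retry(flaky_lookup, 'M1B, Toronto, Ontario')
print(result, 'after', calls['count'], 'calls')
```

Capping the attempts is a design choice: a postal code that never resolves then yields None instead of looping forever.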

Because this package can be very unreliable, in case you cannot get the geographical coordinates of the neighborhoods with the Geocoder package, here is a link to a CSV file that has the geographical coordinates of each postal code: http://cocl.us/Geospatial_data
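
If you take the CSV route, the coordinates can be attached with a left merge on the postal code. This is a minimal sketch under stated assumptions: the header names `Postal Code`, `Latitude`, and `Longitude` and the inline rows are stand-ins, so check them against the actual file before relying on them.

```python
import io

import pandas as pd

# Stand-in for the downloaded file; header names and rows are assumptions.
csv_text = """Postal Code,Latitude,Longitude
M1B,43.811525,-79.195517
M1C,43.78573,-79.15875
"""
coords = pd.read_csv(io.StringIO(csv_text))

postcodes = pd.DataFrame({
    'Postcode': ['M1B', 'M1C', 'M1E'],
    'Borough': ['Scarborough'] * 3,
})

# A left merge keeps every postal code, even one missing from the CSV (NaN coordinates).
merged = postcodes.merge(coords, how='left', left_on='Postcode', right_on='Postal Code')
merged = merged.drop(columns='Postal Code')
print(merged)
```

Rows left with NaN coordinates are easy to spot afterwards with `merged['Latitude'].isna()`.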

Important Note: There is a limit on how many times you can call the geocoder.google function: 2500 times per day. This should be more than enough for you to get acquainted with the package and to use it to get the geographical coordinates of the neighborhoods in Toronto.
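
One way to stay well under a daily quota is to remember results, so that re-running a cell does not repeat the same requests. The sketch below is an illustration only: `make_cached_lookup` and `fake_service` are hypothetical names, and `fake_service` stands in for a real geocoder call.

```python
def make_cached_lookup(lookup):
    """Wrap lookup so each distinct query reaches the service at most once."""
    cache = {}

    def cached(query):
        if query not in cache:
            result = lookup(query)
            if result is None:
                return None  # failures are not cached, so a later retry can still succeed
            cache[query] = result
        return cache[query]

    return cached


# Stand-in service that records every call it receives.
service_calls = []


def fake_service(query):
    service_calls.append(query)
    return [43.653963, -79.387207]


lookup = make_cached_lookup(fake_service)
first = lookup('Toronto, Ontario')
second = lookup('Toronto, Ontario')
print(first == second, len(service_calls))
```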

#### Creating a function to get the latitude and longitude from the postal code

In [14]:
def get_geocoder(postal_code_from_df):
    # Retry until the geocoder returns coordinates; g.latlng is None on a failed lookup
    lat_lng_coords = None
    while lat_lng_coords is None:
        g = geocoder.arcgis('{}, Toronto, Ontario'.format(postal_code_from_df.strip()))
        lat_lng_coords = g.latlng
    # Unpack only after a successful lookup, so None is never indexed
    latitude, longitude = lat_lng_coords
    return latitude, longitude

#### Adding Latitude and Longitude columns to dataframe

In [15]:
df3['Latitude'], df3['Longitude'] = zip(*df3['Postcode'].apply(get_geocoder))
df3.head()

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge,Malvern",43.811525,-79.195517
1,M1C,Scarborough,"Highland Creek,Rouge Hill,Port Union",43.78573,-79.15875
2,M1E,Scarborough,"Guildwood,Morningside,West Hill",43.76569,-79.175256
3,M1G,Scarborough,Woburn,43.768359,-79.21759
4,M1H,Scarborough,Cedarbrae,43.769688,-79.23944


## Toronto Map

#### Getting the latitude and longitude of Toronto

In [16]:
address = 'Toronto, Ontario'

geolocator = Nominatim(user_agent="toronto_ontario")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('Toronto, Ontario Latitude and Longitude: {}, {}.'.format(latitude, longitude))

Toronto, Ontario Latitude and Longitude: 43.653963, -79.387207.


#### Plotting the Folium map

In [17]:
Tmap = folium.Map(location=[latitude, longitude], zoom_start=11)

for lat, long, post, borough, neigh in zip(df3['Latitude'], df3['Longitude'], df3['Postcode'], df3['Borough'], df3['Neighbourhood']):
    label = "{} ({}): {}".format(borough, post, neigh)
    popup = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, long],
        radius=5,
        popup=popup,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(Tmap)
    
Tmap

# Part III

Explore and cluster the neighborhoods in Toronto. You can decide to work with only boroughs that contain the word Toronto and then replicate the same analysis we did to the New York City data. It is up to you.

Just make sure:

- to add enough Markdown cells to explain what you decided to do and to report any observations you make.
- to generate maps to visualize your neighborhoods and how they cluster together.


### Counting the number of boroughs and neighborhoods in Toronto

In [18]:
print('The dataframe contains {} boroughs and {} neighborhoods.'.format(
        len(df3['Borough'].unique()),
        df3.shape[0]
    )
)

The dataframe contains 11 boroughs and 103 neighborhoods.


#### Plotting the Toronto map to visualize the neighbourhoods

In [19]:
Tmap2 = folium.Map(location=[latitude, longitude], zoom_start=10)

for lat, lng, borough, neighborhood in zip(df3['Latitude'], df3['Longitude'], df3['Borough'], df3['Neighbourhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(Tmap2)  
    
Tmap2

#### Using the Foursquare API to explore the neighborhoods and segment them

In [20]:
LIMIT = 100

CLIENT_ID = 'IP4BV5KG2VGZXNWTDR4JAGV0H1CWPXOTOFIPCS5YFVJNE5TO' # your Foursquare ID
CLIENT_SECRET = 'R4D1BJYGYFLSH453ZDQ04VHLTK2LLBKUXXWE11TSDKDMFSOX' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: IP4BV5KG2VGZXNWTDR4JAGV0H1CWPXOTOFIPCS5YFVJNE5TO
CLIENT_SECRET:R4D1BJYGYFLSH453ZDQ04VHLTK2LLBKUXXWE11TSDKDMFSOX


### Exploring Toronto Neighborhoods

#### Automating data gathering from Foursquare API 

In [21]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
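
The function above indexes `categories[0]` directly, which would raise an IndexError for a venue whose category list is empty. Here is a minimal sketch of a more defensive extraction step. The helper name `extract_venue_rows` is hypothetical, and `sample_items` is hand-written to mimic only the nesting that the function above reads; it is not a real API response.

```python
def extract_venue_rows(name, lat, lng, items):
    """Flatten venue items into tuples, tolerating venues without a category."""
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        category = categories[0]['name'] if categories else None
        rows.append((
            name, lat, lng,
            venue['name'],
            venue['location']['lat'],
            venue['location']['lng'],
            category,
        ))
    return rows


# Hand-written items; the second venue is made up and has no category on purpose.
sample_items = [
    {'venue': {'name': 'Royal Canadian Legion',
               'location': {'lat': 43.782533, 'lng': -79.163085},
               'categories': [{'name': 'Bar'}]}},
    {'venue': {'name': 'Unnamed Spot',
               'location': {'lat': 43.78, 'lng': -79.16},
               'categories': []}},
]

rows = extract_venue_rows('Highland Creek,Rouge Hill,Port Union',
                          43.78573, -79.15875, sample_items)
for row in rows:
    print(row)
```

Venues without a category end up with None in the last position and can be dropped or labelled later, instead of aborting the whole run.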

#### Getting Venues from neighbourhoods

In [22]:
Tven = getNearbyVenues(names = df3['Neighbourhood'],
                                   latitudes=df3['Latitude'],
                                   longitudes=df3['Longitude'],
                                  )

Rouge,Malvern
Highland Creek,Rouge Hill,Port Union
Guildwood,Morningside,West Hill
Woburn
Cedarbrae
Scarborough Village
East Birchmount Park,Ionview,Kennedy Park
Clairlea,Golden Mile,Oakridge
Cliffcrest,Cliffside,Scarborough Village West
Birch Cliff,Cliffside West
Dorset Park,Scarborough Town Centre,Wexford Heights
Maryvale,Wexford
Agincourt
Clarks Corners,Sullivan,Tam O'Shanter
Agincourt North,L'Amoreaux East,Milliken,Steeles East
L'Amoreaux West
Upper Rouge
Hillcrest Village
Fairview,Henry Farm,Oriole
Bayview Village
Silver Hills,York Mills
Newtonbrook,Willowdale
Willowdale South
York Mills West
Willowdale West
Parkwoods
Don Mills North
Flemingdon Park,Don Mills South
Bathurst Manor,Downsview North,Wilson Heights
Northwood Park,York University
CFB Toronto,Downsview East
Downsview West
Downsview Central
Downsview Northwest
Victoria Village
Woodbine Gardens,Parkview Hill
Woodbine Heights
The Beaches
Leaside
Thorncliffe Park
East Toronto
The Danforth West,Riverdale
The Beaches West,Indi

#### Checking new dataset

In [23]:
Tven.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Rouge,Malvern",43.811525,-79.195517,Canadian Appliance Source Whitby,43.808353,-79.191331,Home Service
1,"Highland Creek,Rouge Hill,Port Union",43.78573,-79.15875,Chris Effects Painting,43.784343,-79.163742,Construction & Landscaping
2,"Highland Creek,Rouge Hill,Port Union",43.78573,-79.15875,Royal Canadian Legion,43.782533,-79.163085,Bar
3,"Guildwood,Morningside,West Hill",43.76569,-79.175256,The Strawberry Patch,43.764738,-79.173081,Tea Room
4,"Guildwood,Morningside,West Hill",43.76569,-79.175256,Homestead Roofing Repair,43.76514,-79.178663,Construction & Landscaping


In [24]:
Tven.shape

(2480, 7)

### Analysing venues by Neighborhood

#### Grouping dataframe by Neighborhood

In [25]:
Tven.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
"Adelaide,King,Richmond",100,100,100,100,100,100
Agincourt,15,15,15,15,15,15
"Agincourt North,L'Amoreaux East,Milliken,Steeles East",1,1,1,1,1,1
"Albion Gardens,Beaumond Heights,Humbergate,Jamestown,Mount Olive,Silverstone,South Steeles,Thistletown",14,14,14,14,14,14
"Alderwood,Long Branch",4,4,4,4,4,4
"Bathurst Manor,Downsview North,Wilson Heights",1,1,1,1,1,1
Bayview Village,4,4,4,4,4,4
"Bedford Park,Lawrence Manor East",22,22,22,22,22,22
Berczy Park,63,63,63,63,63,63
"Birch Cliff,Cliffside West",6,6,6,6,6,6


#### Checking how many unique venue categories are represented in the dataframe

In [26]:
print('There are a total of {} unique venue categories.'.format(len(Tven['Venue Category'].unique())))

There are a total of 260 unique venue categories.


In [30]:
Tven1 = pd.get_dummies(Tven[['Venue Category']], prefix="", prefix_sep="")
Tven1['Neighborhood'] = Tven['Neighborhood'] 
fixed_columns = [Tven1.columns[-1]] + list(Tven1.columns[:-1])
Tven1 = Tven1[fixed_columns]

Tven1.head()

Unnamed: 0,Yoga Studio,Adult Boutique,Afghan Restaurant,Airport,American Restaurant,Antique Shop,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,...,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Wine Bar,Wings Joint,Women's Store
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
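
In the output above, the first column is 'Yoga Studio' rather than 'Neighborhood', and the shape reported for this frame has 260 columns, the same as the number of unique categories. One possible explanation, stated here as an assumption, is that a venue category is itself named 'Neighborhood', so the assignment overwrote that column instead of appending a new one. The sketch below reproduces that situation on made-up data and shows one way to keep both columns; the renamed label 'Neighborhood (category)' is hypothetical.

```python
import pandas as pd

# Made-up venues; one of them has the category 'Neighborhood' on purpose.
venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'B'],
    'Venue Category': ['Cafe', 'Neighborhood', 'Park'],
})

# One-hot encode the categories as 0/1 integers.
onehot = pd.get_dummies(venues['Venue Category']).astype(int)

# Rename the colliding category first, then insert the label column at position 0.
onehot = onehot.rename(columns={'Neighborhood': 'Neighborhood (category)'})
onehot.insert(0, 'Neighborhood', venues['Neighborhood'])
print(onehot)
```

`DataFrame.insert` raises a ValueError if the column name already exists, so a collision fails loudly instead of silently replacing a category column.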


In [32]:
Tven1.shape

(2480, 260)

#### Grouping rows by neighborhood and taking the mean frequency of occurrence of each category

In [33]:
TvenGr = Tven1.groupby('Neighborhood').mean().reset_index()
TvenGr.head()

Unnamed: 0,Neighborhood,Yoga Studio,Adult Boutique,Afghan Restaurant,Airport,American Restaurant,Antique Shop,Art Gallery,Art Museum,Arts & Crafts Store,...,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Wine Bar,Wings Joint,Women's Store
0,"Adelaide,King,Richmond",0.000000,0.000000,0.000000,0.0,0.030000,0.000000,0.010000,0.00,0.000000,...,0.000000,0.00000,0.000000,0.010000,0.000000,0.000000,0.000000,0.010000,0.000000,0.000000
1,Agincourt,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.00,0.000000,...,0.000000,0.00000,0.000000,0.000000,0.000000,0.000000,0.066667,0.000000,0.000000,0.000000
2,"Agincourt North,L'Amoreaux East,Milliken,Steel...",0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.00,0.000000,...,0.000000,0.00000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
3,"Albion Gardens,Beaumond Heights,Humbergate,Jam...",0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.00,0.000000,...,0.000000,0.00000,0.000000,0.000000,0.000000,0.071429,0.000000,0.000000,0.000000,0.000000
4,"Alderwood,Long Branch",0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.00,0.000000,...,0.000000,0.00000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
5,"Bathurst Manor,Downsview North,Wilson Heights",0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.00,0.000000,...,0.000000,0.00000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
6,Bayview Village,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.00,0.000000,...,0.000000,0.25000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
7,"Bedford Park,Lawrence Manor East",0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.00,0.000000,...,0.000000,0.00000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
8,Berczy Park,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.015873,0.00,0.000000,...,0.000000,0.00000,0.000000,0.015873,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
9,"Birch Cliff,Cliffside West",0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.00,0.000000,...,0.000000,0.00000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000


In [34]:
TvenGr.shape

(101, 260)
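
To see why the mean of one-hot rows is a frequency, consider this minimal sketch on made-up data: each row marks exactly one category, so the per-neighborhood mean of a column is the share of venues in that category.

```python
import pandas as pd

# Made-up one-hot rows: neighborhood A has four venues, B has one.
onehot = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'A', 'B'],
    'Coffee Shop':  [1, 1, 0, 0, 0],
    'Park':         [0, 0, 1, 0, 1],
    'Pub':          [0, 0, 0, 1, 0],
})

freq = onehot.groupby('Neighborhood').mean().reset_index()
print(freq)

# Each row of frequencies sums to 1 because every venue has exactly one category here.
row_sums = freq.drop(columns='Neighborhood').sum(axis=1)
print(row_sums.tolist())
```

This also explains the frequencies of 1.0 in the notebook's output: a neighborhood with a single returned venue puts all of its weight on one category.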

#### Printing each neighborhood with its top 5 venues

In [35]:
top5 = 5

for hood in TvenGr['Neighborhood']:
    print("----"+hood+"----")
    temp = TvenGr[TvenGr['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(top5))
    print('\n')

----Adelaide,King,Richmond----
           venue  freq
0    Coffee Shop  0.08
1          Hotel  0.07
2           Café  0.06
3  Deli / Bodega  0.03
4   Burger Joint  0.03


----Agincourt----
                 venue  freq
0   Chinese Restaurant  0.13
1        Shopping Mall  0.13
2          Supermarket  0.13
3  Shanghai Restaurant  0.07
4     Sushi Restaurant  0.07


----Agincourt North,L'Amoreaux East,Milliken,Steeles East----
                      venue  freq
0                  Pharmacy   1.0
1            Pilates Studio   0.0
2            Mattress Store   0.0
3  Mediterranean Restaurant   0.0
4               Men's Store   0.0


----Albion Gardens,Beaumond Heights,Humbergate,Jamestown,Mount Olive,Silverstone,South Steeles,Thistletown----
                 venue  freq
0        Grocery Store  0.14
1          Pizza Place  0.07
2  Fried Chicken Joint  0.07
3           Beer Store  0.07
4          Coffee Shop  0.07


----Alderwood,Long Branch----
            venue  freq
0             Pub  0.25
1 

              venue  freq
0        Playground  0.25
1    Discount Store  0.25
2       Coffee Shop  0.25
3  Department Store  0.25
4       Yoga Studio  0.00


----East Toronto----
                venue  freq
0                 Bar  0.25
1      Farmers Market  0.25
2                Park  0.25
3  Italian Restaurant  0.25
4       Moving Target  0.00


----Emery,Humberlea----
               venue  freq
0        Coffee Shop  0.50
1          Nightclub  0.25
2               Park  0.25
3        Yoga Studio  0.00
4  Mobile Phone Shop  0.00


----Fairview,Henry Farm,Oriole----
                  venue  freq
0        Clothing Store  0.15
1  Fast Food Restaurant  0.11
2           Coffee Shop  0.06
3            Kids Store  0.04
4              Tea Room  0.04


----First Canadian Place,Underground city----
         venue  freq
0  Coffee Shop  0.12
1         Café  0.07
2        Hotel  0.07
3          Bar  0.03
4       Bakery  0.03


----Flemingdon Park,Don Mills South----
           venue  freq
0  Grocer

                       venue  freq
0                Coffee Shop  0.11
1             Clothing Store  0.06
2             Cosmetics Shop  0.04
3                       Café  0.04
4  Middle Eastern Restaurant  0.03


----Scarborough Village----
               venue  freq
0  Indian Restaurant  0.25
1      Train Station  0.25
2      Grocery Store  0.25
3         Restaurant  0.25
4      Metro Station  0.00


----Silver Hills,York Mills----
                      venue  freq
0               Music Venue   1.0
1               Yoga Studio   0.0
2                    Museum   0.0
3            Mattress Store   0.0
4  Mediterranean Restaurant   0.0


----St. James Town----
         venue  freq
0         Café  0.06
1   Restaurant  0.05
2        Hotel  0.05
3  Coffee Shop  0.05
4       Bakery  0.04


----Stn A PO Boxes 25 The Esplanade----
         venue  freq
0  Coffee Shop  0.08
1   Steakhouse  0.04
2        Hotel  0.04
3         Café  0.04
4          Bar  0.04


----Studio District----
               

#### Putting the most common venues into a dataframe

In [48]:
def return_most_common_venues(row, top10):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:top10]


top10 = 10

indicators = ['st', 'nd', 'rd']

columns = ['Neighborhood']
for ind in np.arange(top10):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

TvenSr = pd.DataFrame(columns=columns)
TvenSr['Neighborhood'] = TvenGr['Neighborhood']

for ind in np.arange(TvenGr.shape[0]):
    TvenSr.iloc[ind, 1:] = return_most_common_venues(TvenGr.iloc[ind, :], top10)

TvenSr.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Adelaide,King,Richmond",Coffee Shop,Hotel,Café,Japanese Restaurant,Restaurant,Burger Joint,Steakhouse,Bar,Bakery,Deli / Bodega
1,Agincourt,Supermarket,Chinese Restaurant,Shopping Mall,Bakery,Grocery Store,Shanghai Restaurant,Sushi Restaurant,Badminton Court,Pool,Bubble Tea Shop
2,"Agincourt North,L'Amoreaux East,Milliken,Steel...",Pharmacy,Food,Fish Market,Fish & Chips Shop,Field,Fast Food Restaurant,Farmers Market,Farm,Falafel Restaurant,Event Space
3,"Albion Gardens,Beaumond Heights,Humbergate,Jam...",Grocery Store,Park,Liquor Store,Sandwich Place,Beer Store,Fried Chicken Joint,Fast Food Restaurant,Coffee Shop,Pizza Place,Japanese Restaurant
4,"Alderwood,Long Branch",Gym,Sandwich Place,Pub,Dance Studio,Doctor's Office,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Women's Store
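
The column names above rely on a three-item suffix list plus an exception for everything else, which is enough for ten columns. As an alternative sketch, the hypothetical helper `ordinal` below computes the suffix directly and also handles values such as 11, 12, and 22.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, for example 1 -> '1st'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'  # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)


columns = ['Neighborhood'] + [
    '{} Most Common Venue'.format(ordinal(i)) for i in range(1, 11)
]
print(columns)
```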


### Cluster Neighborhoods - K-Means

In [49]:
k = 5
TvenGrCluster = TvenGr.drop('Neighborhood', axis=1)
kmeans = KMeans(n_clusters=k, random_state=0).fit(TvenGrCluster)
kmeans.labels_[0:10]

array([0, 0, 2, 0, 0, 4, 3, 0, 0, 3], dtype=int32)
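
A quick check after fitting is to count how many neighborhoods landed in each cluster, since very small clusters often deserve a closer look. This minimal sketch counts only the ten labels printed above; in the notebook, the full array would come from `kmeans.labels_`.

```python
from collections import Counter

# First ten labels as printed above, not the full result.
labels = [0, 0, 2, 0, 0, 4, 3, 0, 0, 3]

sizes = Counter(labels)
for cluster_id, size in sorted(sizes.items()):
    print('cluster {}: {} neighborhoods'.format(cluster_id, size))
```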

### Creating a dataframe with the cluster label and top 10 venues of each neighborhood

In [50]:
TvenSr.insert(0, 'Cluster Labels', kmeans.labels_)
Tmerge = df3
Tmerge = Tmerge.join(TvenSr.set_index('Neighborhood'), on='Neighbourhood')
Tmerge.head()

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M1B,Scarborough,"Rouge,Malvern",43.811525,-79.195517,1.0,Home Service,Women's Store,Farm,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Ethiopian Restaurant,Event Space,Falafel Restaurant,Farmers Market
1,M1C,Scarborough,"Highland Creek,Rouge Hill,Port Union",43.78573,-79.15875,0.0,Construction & Landscaping,Bar,Donut Shop,Flower Shop,Fish Market,Fish & Chips Shop,Field,Fast Food Restaurant,Farmers Market,Farm
2,M1E,Scarborough,"Guildwood,Morningside,West Hill",43.76569,-79.175256,3.0,Construction & Landscaping,Park,Tea Room,Gym / Fitness Center,Event Space,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Ethiopian Restaurant
3,M1G,Scarborough,Woburn,43.768359,-79.21759,3.0,Business Service,Park,Korean Restaurant,Coffee Shop,Women's Store,Eastern European Restaurant,Electronics Store,Ethiopian Restaurant,Event Space,Falafel Restaurant
4,M1H,Scarborough,Cedarbrae,43.769688,-79.23944,0.0,Playground,Lounge,Women's Store,Falafel Restaurant,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Ethiopian Restaurant,Event Space,Farmers Market
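
The cluster labels appear as 1.0, 0.0, 3.0 rather than integers. A likely reason, given that the postal code dataframe has 103 rows while the grouped venue dataframe has 101, is that the join leaves some rows without a match, and pandas stores the resulting NaN in a float column. Here is a minimal sketch on made-up data of that effect and of one way to restore integer labels:

```python
import pandas as pd

# Made-up data: area 'B' has no venues, so it has no cluster label.
areas = pd.DataFrame({'Neighbourhood': ['A', 'B', 'C']})
labelled = pd.DataFrame({'Neighborhood': ['A', 'C'], 'Cluster Labels': [0, 3]})

merged = areas.join(labelled.set_index('Neighborhood'), on='Neighbourhood')
print(merged['Cluster Labels'].dtype)  # the NaN for 'B' makes this a float column

# Drop unmatched rows on a copy, then convert the labels back to integers.
clean = merged.dropna(subset=['Cluster Labels']).copy()
clean['Cluster Labels'] = clean['Cluster Labels'].astype(int)
print(clean)
```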


### Plotting Clusters

In [77]:
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(k)
ys = [i+x+(i*x)**2 for i in range(k)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

Tmerge1 = Tmerge.dropna()

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(Tmerge1['Latitude'], Tmerge1['Longitude'], Tmerge1['Neighbourhood'], Tmerge1['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(int(cluster)), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)],
        fill=True,
        fill_color=rainbow[int(cluster)],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
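
The map only needs one distinct hex colour per cluster. As an illustration that swaps in the standard-library `colorsys` module for matplotlib's `cm.rainbow` (so this is not the notebook's method), the hypothetical helper `cluster_palette` below spaces `k` hues evenly around the colour wheel:

```python
import colorsys


def cluster_palette(k):
    """Return k evenly spaced hues as '#rrggbb' strings."""
    palette = []
    for i in range(k):
        r, g, b = colorsys.hsv_to_rgb(i / k, 0.9, 0.9)
        palette.append('#{:02x}{:02x}{:02x}'.format(
            int(r * 255), int(g * 255), int(b * 255)))
    return palette


palette = cluster_palette(5)
for cluster_id, colour in enumerate(palette):
    print(cluster_id, colour)
```

A cluster label can then index the list directly, for example `palette[int(cluster)]`.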

### Cluster 1

In [78]:
Tmerge.loc[Tmerge['Cluster Labels'] == 0, Tmerge.columns[[1] + list(range(5, Tmerge.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Scarborough,0.0,Construction & Landscaping,Bar,Donut Shop,Flower Shop,Fish Market,Fish & Chips Shop,Field,Fast Food Restaurant,Farmers Market,Farm
4,Scarborough,0.0,Playground,Lounge,Women's Store,Falafel Restaurant,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Ethiopian Restaurant,Event Space,Farmers Market
5,Scarborough,0.0,Indian Restaurant,Train Station,Grocery Store,Restaurant,Women's Store,Ethiopian Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Electronics Store
6,Scarborough,0.0,Discount Store,Playground,Department Store,Coffee Shop,Event Space,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Ethiopian Restaurant,Falafel Restaurant
7,Scarborough,0.0,Bakery,Intersection,Coffee Shop,Bus Line,Bus Station,Soccer Field,Metro Station,Fast Food Restaurant,Farmers Market,Eastern European Restaurant
8,Scarborough,0.0,Fast Food Restaurant,Discount Store,Coffee Shop,Liquor Store,Sandwich Place,Furniture / Home Store,Burger Joint,Wings Joint,Pizza Place,Pharmacy
10,Scarborough,0.0,Brewery,Gift Shop,Bakery,Farm,Eastern European Restaurant,Electronics Store,Ethiopian Restaurant,Event Space,Falafel Restaurant,Farmers Market
11,Scarborough,0.0,Convenience Store,Auto Garage,Intersection,Falafel Restaurant,Eastern European Restaurant,Electronics Store,Ethiopian Restaurant,Event Space,Farm,Donut Shop
12,Scarborough,0.0,Supermarket,Chinese Restaurant,Shopping Mall,Bakery,Grocery Store,Shanghai Restaurant,Sushi Restaurant,Badminton Court,Pool,Bubble Tea Shop
13,Scarborough,0.0,Pharmacy,Pizza Place,Bus Stop,Thai Restaurant,Hobby Shop,Chinese Restaurant,Shopping Mall,Fried Chicken Joint,Coffee Shop,Dumpling Restaurant


### Cluster 2

In [79]:
Tmerge.loc[Tmerge['Cluster Labels'] == 1, Tmerge.columns[[1] + list(range(5, Tmerge.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Scarborough,1.0,Home Service,Women's Store,Farm,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Ethiopian Restaurant,Event Space,Falafel Restaurant,Farmers Market


### Cluster 3

In [80]:
Tmerge.loc[Tmerge['Cluster Labels'] == 2, Tmerge.columns[[1] + list(range(5, Tmerge.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
14,Scarborough,2.0,Pharmacy,Food,Fish Market,Fish & Chips Shop,Field,Fast Food Restaurant,Farmers Market,Farm,Falafel Restaurant,Event Space


### Cluster 4

In [81]:
Tmerge.loc[Tmerge['Cluster Labels'] == 3, Tmerge.columns[[1] + list(range(5, Tmerge.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,Scarborough,3.0,Construction & Landscaping,Park,Tea Room,Gym / Fitness Center,Event Space,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Ethiopian Restaurant
3,Scarborough,3.0,Business Service,Park,Korean Restaurant,Coffee Shop,Women's Store,Eastern European Restaurant,Electronics Store,Ethiopian Restaurant,Event Space,Falafel Restaurant
9,Scarborough,3.0,Gym,Park,College Stadium,General Entertainment,Skating Rink,Gym Pool,Ethiopian Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant
19,North York,3.0,Construction & Landscaping,Park,Dog Run,Trail,Fish & Chips Shop,Field,Fast Food Restaurant,Farmers Market,Fish Market,Farm
23,North York,3.0,Bank,Convenience Store,Park,Speakeasy,Falafel Restaurant,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Ethiopian Restaurant,Event Space
25,North York,3.0,Food & Drink Shop,Park,Fast Food Restaurant,Dog Run,Fish Market,Fish & Chips Shop,Field,Farmers Market,Farm,Falafel Restaurant
26,North York,3.0,Park,Burger Joint,Intersection,Coffee Shop,Women's Store,Event Space,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Ethiopian Restaurant
30,North York,3.0,Coffee Shop,Park,Airport,Other Repair Shop,Food Court,Women's Store,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Ethiopian Restaurant
31,North York,3.0,Park,Hotel,Mobile Phone Shop,Gym / Fitness Center,Moving Target,Women's Store,Ethiopian Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant
32,North York,3.0,Moving Target,Park,Business Service,Construction & Landscaping,Costume Shop,Dumpling Restaurant,Fish Market,Fish & Chips Shop,Field,Fast Food Restaurant


### Cluster 5

In [83]:
Tmerge.loc[Tmerge['Cluster Labels'] == 4, Tmerge.columns[[1] + list(range(5, Tmerge.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
28,North York,4.0,Men's Store,Women's Store,Food,Fish Market,Fish & Chips Shop,Field,Fast Food Restaurant,Farmers Market,Farm,Falafel Restaurant
