### Exploring the Neighborhoods in Toronto with Python

In [1]:
from bs4 import BeautifulSoup
import requests

Scrape the page contents with BeautifulSoup and find the table that contains the neighborhood data

In [2]:
url = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
data = requests.get(url)
soup = BeautifulSoup(data.text, 'html.parser')
table = soup.find('table',{'class':'wikitable sortable'} )


Get all rows in the table. Put the cell strings in the 1st row into an array for column names.

In [3]:
rows = table.findChildren(['tr'])
row1cells = rows[0].findChildren(['th'])
colnames = []

for cell in row1cells:
    colnames.append(cell.string.rstrip())


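Loop through all rows to extract the three cell values. If the string in the 2nd cell (the borough) is 'Not assigned', discard the row. If the string in the 3rd cell (the neighbourhood) is 'Not assigned', use the borough name instead.
If the cell string is empty, look for a string in the 'a' tag.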

In [4]:
allRows = []
for row in rows[1:]:
    rowVals = []
    discardRow = False
    for ind, cell in enumerate(row.findChildren(['td'])):
        #print(cell.string)
        if cell.string and cell.string.startswith('Not assigned'):
            if ind == 1:
                discardRow = True
                break
            elif ind == 2:
                rowVals.append(rowVals[1])
        elif not cell.string:
            rowVals.append(cell.a.string.rstrip())
        else:
            rowVals.append(cell.string.rstrip())
    if not discardRow: 
        allRows.append(rowVals)
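
The filtering rules in the loop above can also be written as a small pure function, which makes them easier to check in isolation. This is a minimal sketch rather than the notebook's own code: `clean_row` is a hypothetical helper, and it assumes each row has already been reduced to three strings (postcode, borough, neighbourhood).

```python
def clean_row(cells):
    """Apply the 'Not assigned' rules to one [postcode, borough, neighbourhood] row.

    Returns the cleaned row, or None when the row should be discarded.
    """
    postcode, borough, neighbourhood = (c.strip() for c in cells)
    if borough.startswith('Not assigned'):
        return None  # no borough: drop the row
    if neighbourhood.startswith('Not assigned'):
        neighbourhood = borough  # fall back to the borough name
    return [postcode, borough, neighbourhood]


# Illustrative rows in the shape the scraper produces
sample_rows = [
    ['M1A', 'Not assigned', 'Not assigned\n'],
    ['M3A', 'North York', 'Parkwoods\n'],
    ['M7A', "Queen's Park", 'Not assigned\n'],
]
cleaned = [row for row in map(clean_row, sample_rows) if row is not None]
print(cleaned)
```

Keeping the rules separate from the HTML traversal means they can be tested without fetching the Wikipedia page.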


Convert allRows to DataFrame

In [5]:
import pandas as pd
df = pd.DataFrame.from_records(allRows, columns=colnames)


Group df by Postcode and Borough and concatenate the Neighbourhoods of each group then reset index.

In [6]:
tor_df = df.groupby(['Postcode','Borough'])['Neighbourhood'].apply(', '.join).to_frame()
tor_df.reset_index(inplace=True)
tor_df.head()

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1B,Scarborough,"Rouge, Malvern"
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
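
To see what the group-and-join step does, here is a minimal sketch on a three-row toy frame. The frame `toy` and the result name `combined` are illustrative only; the column names match the ones used above.

```python
import pandas as pd

# Toy input: two neighbourhoods share postcode M1B, one belongs to M1G
toy = pd.DataFrame({
    'Postcode': ['M1B', 'M1B', 'M1G'],
    'Borough': ['Scarborough', 'Scarborough', 'Scarborough'],
    'Neighbourhood': ['Rouge', 'Malvern', 'Woburn'],
})

# One row per (Postcode, Borough), with the neighbourhood names joined by ', '
combined = (
    toy.groupby(['Postcode', 'Borough'])['Neighbourhood']
       .agg(', '.join)
       .reset_index()
)
print(combined)
```

Rows within a group keep their original order, so the joined string lists neighbourhoods in the order they were scraped.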


In [7]:
tor_df.shape

(103, 3)

Import Geospatial_Coordinates.csv

In [8]:
geocoords = pd.read_csv('Geospatial_Coordinates.csv')
geocoords.columns = ['Postcode','Latitude', 'Longitude']
geocoords.head()

Unnamed: 0,Postcode,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


Merge geocoords and tor_df

In [9]:
tor_df = tor_df.merge(geocoords, how='left',on='Postcode')
tor_df.head()

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
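
A left merge silently leaves `NaN` coordinates for any postcode that is missing from the CSV. The sketch below shows one way to surface such rows with the `indicator` and `validate` options of `DataFrame.merge`. The frames are toy data, and `M9Z` is a made-up postcode used only to trigger the missing case.

```python
import pandas as pd

left = pd.DataFrame({'Postcode': ['M1B', 'M1C', 'M9Z'],  # M9Z is hypothetical
                     'Borough': ['Scarborough', 'Scarborough', 'Example']})
coords = pd.DataFrame({'Postcode': ['M1B', 'M1C'],
                       'Latitude': [43.806686, 43.784535],
                       'Longitude': [-79.194353, -79.160497]})

# validate raises if a postcode appears more than once on either side;
# indicator adds a _merge column saying where each row came from
merged = left.merge(coords, how='left', on='Postcode',
                    validate='one_to_one', indicator=True)
missing = merged.loc[merged['_merge'] == 'left_only', 'Postcode'].tolist()
print(missing)
```

An empty `missing` list would confirm that every postcode received coordinates.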


In [10]:
tor_df['Borough'].nunique()

11

## Explore the Neighborhoods of Toronto Using Foursquare

#### A map of Toronto including all boroughs and neighborhoods will be created. Then I will select the boroughs whose names contain 'Toronto' and create a map that includes only those boroughs. I will only explore neighborhoods in the 4 Toronto boroughs, Central Toronto, Downtown To

Import geocoder, KMeans, Folium libraries

In [11]:
!conda install -c conda-forge geopy --yes # comment this line out if geopy is already installed
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

# import k-means for the clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes # comment this line out if folium is already installed
import folium # map rendering library

Solving environment: done

## Package Plan ##

  environment location: /home/jupyterlab/conda

  added / updated specs: 
    - geopy


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    openssl-1.0.2p             |       h470a237_1         3.1 MB  conda-forge
    certifi-2018.10.15         |        py36_1000         138 KB  conda-forge
    geopy-1.17.0               |             py_0          49 KB  conda-forge
    ca-certificates-2018.10.15 |       ha4d7672_0         135 KB  conda-forge
    conda-4.5.11               |        py36_1000         651 KB  conda-forge
    geographiclib-1.49         |             py_0          32 KB  conda-forge
    ------------------------------------------------------------
                                           Total:         4.1 MB

The following NEW packages will be INSTALLED:

    geographiclib:   1.49-py_0            conda-forge
    geopy:           

Get coordinates of Toronto

In [12]:
address = 'Toronto, Ontario, Canada'

geolocator = Nominatim(user_agent='toronto_explorer')  # recent geopy versions require a user_agent; this name is arbitrary
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))



The geographical coordinates of Toronto are 43.653963, -79.387207.
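
`geocode` returns `None` when the service finds no match, in which case `location.latitude` raises an `AttributeError`. The sketch below wraps the lookup with an explicit check. `lookup_coordinates` and `fake_geocode` are hypothetical names; the fake geocoder stands in for `geolocator.geocode` so the example runs offline, and it returns the coordinates printed above.

```python
from collections import namedtuple


def lookup_coordinates(geocode, address):
    """Return (latitude, longitude) for address, or raise if nothing was found.

    geocode is any callable shaped like Nominatim.geocode: it takes an address
    string and returns an object with .latitude and .longitude, or None.
    """
    location = geocode(address)
    if location is None:
        raise ValueError('No geocoding result for {!r}'.format(address))
    return location.latitude, location.longitude


# Offline stand-in for a real geocoder (assumption, for illustration only)
FakeLocation = namedtuple('FakeLocation', ['latitude', 'longitude'])


def fake_geocode(address):
    known = {'Toronto, Ontario, Canada': FakeLocation(43.653963, -79.387207)}
    return known.get(address)


print(lookup_coordinates(fake_geocode, 'Toronto, Ontario, Canada'))
```

In the notebook the call would be `lookup_coordinates(geolocator.geocode, address)`.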


Plot all neighborhoods on Toronto map

In [13]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(tor_df['Latitude'], tor_df['Longitude'], tor_df['Borough'], tor_df['Neighbourhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

First, select the boroughs with 'Toronto' in their name

In [19]:
toronto_bor = tor_df[tor_df['Borough'].str.contains('Toronto')].reset_index(drop=True)
toronto_bor.head()

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M4E,East Toronto,The Beaches,43.676357,-79.293031
1,M4K,East Toronto,"The Danforth West, Riverdale",43.679557,-79.352188
2,M4L,East Toronto,"The Beaches West, India Bazaar",43.668999,-79.315572
3,M4M,East Toronto,Studio District,43.659526,-79.340923
4,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879


In [20]:
toronto_bor.groupby('Borough').count()

Borough,Postcode,Neighbourhood,Latitude,Longitude
Central Toronto,9,9,9,9
Downtown Toronto,18,18,18,18
East Toronto,5,5,5,5
West Toronto,6,6,6,6


Continue using the geographical coordinates of Toronto

In [16]:
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.653963, -79.387207.


In [25]:
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

Plot the neighborhoods in toronto_bor, using a different marker color for each of the 4 boroughs

In [35]:
import numpy as np  # needed for np.arange and np.linspace below

# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=11)

boroughs = {'Central Toronto':0,'Downtown Toronto':1,'East Toronto':2,'West Toronto':3}

# set color scheme for the 4 boroughs
x = np.arange(4)
ys = [i+x+(i*x)**2 for i in range(4)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to map
for lat, lng, label, bor in zip(toronto_bor['Latitude'], toronto_bor['Longitude'], toronto_bor['Neighbourhood'], toronto_bor['Borough']):
    bor_id = boroughs[bor]
    popup_label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=popup_label,
        color=rainbow[bor_id],
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto
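
The borough-to-color lookup does not strictly need matplotlib. As a dependency-free alternative (not what the notebook uses), the sketch below spaces hues evenly with the standard-library `colorsys` module. `distinct_hex_colors` and `borough_colors` are hypothetical names.

```python
import colorsys


def distinct_hex_colors(n):
    """Return n hex colors with evenly spaced hues at full saturation and value."""
    result = []
    for i in range(n):
        r, g, b = colorsys.hsv_to_rgb(i / n, 1.0, 1.0)
        result.append('#{:02x}{:02x}{:02x}'.format(
            round(r * 255), round(g * 255), round(b * 255)))
    return result


borough_names = ['Central Toronto', 'Downtown Toronto', 'East Toronto', 'West Toronto']
# Map each borough name directly to a color, avoiding a separate index lookup
borough_colors = dict(zip(borough_names, distinct_hex_colors(len(borough_names))))
print(borough_colors)
```

With a name-to-color dictionary, a marker's color can be read as `borough_colors[bor]` inside the plotting loop.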

### Use Foursquare to explore Toronto neighborhoods

Define Foursquare Credentials

In [36]:
CLIENT_ID = 'WS4Z4AYZGKHD3SK1PBQFEV2DDDTOEXKOTV0TO2WXREAOM5EW' # your Foursquare ID
CLIENT_SECRET = 'G4AZHVDLQYA02BN1AEUHPQY2AT35CTRZ1CEWWJ0HJZY3X4WK' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: WS4Z4AYZGKHD3SK1PBQFEV2DDDTOEXKOTV0TO2WXREAOM5EW
CLIENT_SECRET:G4AZHVDLQYA02BN1AEUHPQY2AT35CTRZ1CEWWJ0HJZY3X4WK
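
Hard-coding and printing API credentials exposes them to anyone who sees the notebook. A common alternative is to read them from environment variables. The sketch below assumes the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET`, which are hypothetical, and uses a plain dictionary in place of `os.environ` for the demonstration.

```python
import os


def load_foursquare_credentials(env=os.environ):
    """Read the client ID and secret from environment variables.

    The variable names are assumptions; use whatever your setup defines.
    """
    try:
        return env['FOURSQUARE_CLIENT_ID'], env['FOURSQUARE_CLIENT_SECRET']
    except KeyError as missing:
        raise RuntimeError('Missing environment variable: {}'.format(missing)) from None


# Demonstration with a dict standing in for os.environ
demo_env = {'FOURSQUARE_CLIENT_ID': 'my-id', 'FOURSQUARE_CLIENT_SECRET': 'my-secret'}
client_id, client_secret = load_foursquare_credentials(demo_env)
print('CLIENT_ID loaded:', bool(client_id))
```

Printing only whether a value was loaded, rather than the value itself, keeps the secret out of the saved cell output.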


#### Explore first neighborhood

In [37]:
toronto_bor.loc[0, 'Neighbourhood']

'The Beaches'

Get the neighborhood coordinates

In [39]:
neighborhood_latitude = toronto_bor.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = toronto_bor.loc[0, 'Longitude'] # neighborhood longitude value

neighborhood_name = toronto_bor.loc[0, 'Neighbourhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of The Beaches are 43.67635739999999, -79.2930312.


Get the first 100 venues in The Beaches within a radius of 500 meters

In [40]:
import requests # library to handle requests
from pandas import json_normalize # transform JSON into a pandas dataframe (pandas >= 1.0; older versions: from pandas.io.json import json_normalize)
import json # library to handle JSON files

In [41]:
rds = 500
lmt = 100
url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, VERSION,neighborhood_latitude,neighborhood_longitude,rds,lmt)
url

'https://api.foursquare.com/v2/venues/explore?client_id=WS4Z4AYZGKHD3SK1PBQFEV2DDDTOEXKOTV0TO2WXREAOM5EW&client_secret=G4AZHVDLQYA02BN1AEUHPQY2AT35CTRZ1CEWWJ0HJZY3X4WK&v=20180605&ll=43.67635739999999,-79.2930312&radius=500&limit=100'
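
Building the query string by hand works, but it is easy to misplace a `&` or forget to escape a value. The sketch below builds the same kind of URL with `urllib.parse.urlencode`. `build_explore_url` is a hypothetical helper, and the ID and secret passed in are placeholders.

```python
from urllib.parse import urlencode


def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=100):
    """Assemble a venues/explore URL from its query parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    # urlencode escapes each value and joins the pairs with '&'
    return 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params)


example_url = build_explore_url('ID', 'SECRET', '20180605', 43.6763574, -79.2930312)
print(example_url)
```

Note that `urlencode` percent-encodes the comma in `ll` as `%2C`, which is the standard escaped form of the same value. With `requests`, passing the dictionary as `params=` to `requests.get` achieves the same effect.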

Send the GET request and examine the results

In [42]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5bd3ead4dd5797073f1b58a0'},
  'headerLocation': 'The Beaches',
  'headerFullLocation': 'The Beaches, Toronto',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 3,
  'suggestedBounds': {'ne': {'lat': 43.680857404499996,
    'lng': -79.28682091449052},
   'sw': {'lat': 43.67185739549999, 'lng': -79.29924148550948}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4e77e3861f6ecf8d3648300c',
       'name': 'Starbucks',
       'location': {'address': '637 Kingston Rd.',
        'crossStreet': 'at Main St.',
        'lat': 43.67879837444001,
        'lng': -79.2980449760153,
        'labeledLatLngs': [{'label': 'display',
          'lat': 43.67879837444001,
          'lng': -79.2980449760153}],
        'distance': 486,
     

In [43]:
# re-use function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

Convert the results json into a dataframe

In [44]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Starbucks,Coffee Shop,43.678798,-79.298045
1,Grover Pub and Grub,Pub,43.679181,-79.297215
2,Upper Beaches,Neighborhood,43.680563,-79.292869


Number of venues returned by Foursquare

In [45]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

3 venues were returned by Foursquare.


#### Explore other neighborhoods in Toronto

Create a function to repeat the same process for all neighborhoods in Toronto

In [46]:
def getNearbyVenues(names, latitudes, longitudes, radius=500, limit=100):
    # limit is a parameter so the function does not depend on a global LIMIT
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            limit)
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
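
The list comprehension in `getNearbyVenues` assumes that every response contains a group and that every venue has at least one category. A venue with an empty `categories` list would raise an `IndexError`. The sketch below separates the parsing into a hypothetical helper, `extract_venues`, that tolerates both cases. The sample payload only mimics the shape of the explore response, and its second venue is made up.

```python
def extract_venues(payload, name, lat, lng):
    """Flatten one explore response into row tuples, tolerating missing pieces."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng, venue['name'],
                     venue['location']['lat'], venue['location']['lng'], category))
    return rows


sample_payload = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Starbucks',
               'location': {'lat': 43.678798, 'lng': -79.298045},
               'categories': [{'name': 'Coffee Shop'}]}},
    {'venue': {'name': 'Uncategorised Place',  # made-up venue with no category
               'location': {'lat': 43.679, 'lng': -79.297},
               'categories': []}},
]}]}}

rows = extract_venues(sample_payload, 'The Beaches', 43.676357, -79.293031)
print(rows)
```

Because the helper takes an already-decoded dictionary, it can be exercised without calling the API.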

In [47]:
LIMIT = 100
toronto_venues = getNearbyVenues(names=toronto_bor['Neighbourhood'], latitudes=toronto_bor['Latitude'],longitudes= toronto_bor['Longitude'])


The Beaches
The Danforth West, Riverdale
The Beaches West, India Bazaar
Studio District
Lawrence Park
Davisville North
North Toronto West
Davisville
Moore Park, Summerhill East
Deer Park, Forest Hill SE, Rathnelly, South Hill, Summerhill West
Rosedale
Cabbagetown, St. James Town
Church and Wellesley
Harbourfront, Regent Park
Ryerson, Garden District
St. James Town
Berczy Park
Central Bay Street
Adelaide, King, Richmond
Harbourfront East, Toronto Islands, Union Station
Design Exchange, Toronto Dominion Centre
Commerce Court, Victoria Hotel
Roselawn
Forest Hill North, Forest Hill West
The Annex, North Midtown, Yorkville
Harbord, University of Toronto
Chinatown, Grange Park, Kensington Market
CN Tower, Bathurst Quay, Island airport, Harbourfront West, King and Spadina, Railway Lands, South Niagara
Stn A PO Boxes 25 The Esplanade
First Canadian Place, Underground city
Christie
Dovercourt Village, Dufferin
Little Portugal, Trinity
Brockton, Exhibition Place, Parkdale Village
High Park, The 

Check the size of toronto_venues

In [49]:
print(toronto_venues.shape)
toronto_venues.head()

(1705, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,The Beaches,43.676357,-79.293031,Starbucks,43.678798,-79.298045,Coffee Shop
1,The Beaches,43.676357,-79.293031,Grover Pub and Grub,43.679181,-79.297215,Pub
2,The Beaches,43.676357,-79.293031,Upper Beaches,43.680563,-79.292869,Neighborhood
3,"The Danforth West, Riverdale",43.679557,-79.352188,Pantheon,43.677621,-79.351434,Greek Restaurant
4,"The Danforth West, Riverdale",43.679557,-79.352188,Dolce Gelato,43.677773,-79.351187,Ice Cream Shop


Count number of venues in each neighbourhood

In [50]:
toronto_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
"Adelaide, King, Richmond",100,100,100,100,100,100
Berczy Park,53,53,53,53,53,53
"Brockton, Exhibition Place, Parkdale Village",21,21,21,21,21,21
Business reply mail Processing Centre969 Eastern,17,17,17,17,17,17
"CN Tower, Bathurst Quay, Island airport, Harbourfront West, King and Spadina, Railway Lands, South Niagara",14,14,14,14,14,14
"Cabbagetown, St. James Town",48,48,48,48,48,48
Central Bay Street,82,82,82,82,82,82
"Chinatown, Grange Park, Kensington Market",100,100,100,100,100,100
Christie,16,16,16,16,16,16
Church and Wellesley,88,88,88,88,88,88


Count number of unique categories in all the returned venues

In [51]:
print('There are {} unique categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 234 unique categories.


### Analyse Each Neighborhood

Encode Venue Category with pd.get_dummies()

In [53]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood, neighborhood Latitude and neighborhood Longitude columns back to dataframe
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood'] 
toronto_onehot['Latitude'] = toronto_venues['Neighborhood Latitude']
toronto_onehot['Longitude'] = toronto_venues['Neighborhood Longitude']

# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-3]] + [toronto_onehot.columns[-2]] + [toronto_onehot.columns[-1]] +list(toronto_onehot.columns[:-3])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head()

Unnamed: 0,Yoga Studio,Latitude,Longitude,Accessories Store,Adult Boutique,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,...,Theater,Thrift / Vintage Store,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Women's Store
0,0,43.676357,-79.293031,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,43.676357,-79.293031,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,43.676357,-79.293031,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,43.679557,-79.352188,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,43.679557,-79.352188,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [56]:
toronto_onehot.shape

(1705, 236)
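
The shape `(1705, 236)` equals the 234 category columns plus `Latitude` and `Longitude`, so the `Neighborhood` assignment does not appear to have added a column. A likely explanation is that Foursquare returns a venue category that is itself called `Neighborhood` (the 'Upper Beaches' row earlier has it), so the assignment overwrote the dummy column of that name. That would also explain why `Yoga Studio`, not `Neighborhood`, was moved to the front. The sketch below reproduces the collision on toy data and shows one possible workaround. The renamed column label is an arbitrary choice.

```python
import pandas as pd

venues = pd.DataFrame({
    'Neighborhood': ['The Beaches', 'The Beaches', 'Christie'],
    'Venue Category': ['Coffee Shop', 'Neighborhood', 'Park'],
})

onehot = pd.get_dummies(venues[['Venue Category']], prefix='', prefix_sep='')
columns_before = list(onehot.columns)

# Assigning to an existing column name replaces that column instead of adding one
collided = onehot.copy()
collided['Neighborhood'] = venues['Neighborhood']
print(len(columns_before), len(collided.columns))

# Workaround: rename the clashing category column, then insert the label first
safe = onehot.rename(columns={'Neighborhood': 'Neighborhood (category)'})
safe.insert(0, 'Neighborhood', venues['Neighborhood'])
print(list(safe.columns))
```

Using `insert(0, ...)` also places the label column first without the manual column reordering.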

Next, let's group rows by neighborhood, latitude, and longitude, taking the mean of the frequency of occurrence of each category

In [58]:
toronto_grouped = toronto_onehot.groupby(['Neighborhood','Latitude','Longitude']).mean().reset_index()
toronto_grouped

Unnamed: 0,Neighborhood,Latitude,Longitude,Yoga Studio,Accessories Store,Adult Boutique,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,...,Theater,Thrift / Vintage Store,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Women's Store
0,"Adelaide, King, Richmond",43.650571,-79.384568,0.0,0.01,0.0,0.0,0.0,0.0,0.0,...,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.01
1,Berczy Park,43.644771,-79.373306,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Brockton, Exhibition Place, Parkdale Village",43.636847,-79.428191,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Business reply mail Processing Centre969 Eastern,43.662744,-79.321558,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,"CN Tower, Bathurst Quay, Island airport, Harbo...",43.628947,-79.39442,0.0,0.0,0.0,0.0,0.071429,0.071429,0.071429,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,"Cabbagetown, St. James Town",43.667967,-79.367675,0.020833,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Central Bay Street,43.657952,-79.387383,0.012195,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.012195,0.0,0.0,0.012195,0.0
7,"Chinatown, Grange Park, Kensington Market",43.653206,-79.400049,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.01,0.01,0.0,0.0,0.06,0.0,0.04,0.01,0.0
8,Christie,43.669542,-79.422564,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Church and Wellesley,43.66586,-79.38316,0.011364,0.0,0.011364,0.011364,0.0,0.0,0.0,...,0.011364,0.0,0.0,0.0,0.0,0.011364,0.011364,0.011364,0.0,0.0


In [59]:
toronto_grouped.shape

(38, 236)
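
Taking the mean of one-hot columns gives, for each neighborhood, the share of its venues that fall in each category. A minimal sketch on toy data (the four venues below are illustrative, not taken from the Foursquare results):

```python
import pandas as pd

toy_venues = pd.DataFrame({
    'Neighborhood': ['The Beaches', 'The Beaches', 'The Beaches', 'Christie'],
    'Venue Category': ['Coffee Shop', 'Pub', 'Coffee Shop', 'Park'],
})

# One indicator column per category, stored as floats so the mean is a share
toy_onehot = pd.get_dummies(toy_venues['Venue Category'], dtype=float)
toy_onehot.insert(0, 'Neighborhood', toy_venues['Neighborhood'])

# Mean of the indicators per neighborhood = fraction of venues in that category
toy_freq = toy_onehot.groupby('Neighborhood').mean().reset_index()
print(toy_freq)
```

Each row of the result sums to 1 across the category columns, which is a quick sanity check on the grouped frame.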

Print each neighborhood along with the top 5 most common venues

In [60]:
num_top_venues = 5

for hood in toronto_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = toronto_grouped[toronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Adelaide, King, Richmond----
             venue   freq
0         Latitude  43.65
1      Coffee Shop   0.07
2             Café   0.06
3  Thai Restaurant   0.04
4       Steakhouse   0.04


----Berczy Park----
                venue   freq
0            Latitude  43.64
1         Coffee Shop   0.09
2        Cocktail Bar   0.06
3      Farmers Market   0.04
4  Seafood Restaurant   0.04


----Brockton, Exhibition Place, Parkdale Village----
               venue   freq
0           Latitude  43.64
1        Coffee Shop   0.14
2     Breakfast Spot   0.10
3               Café   0.10
4  Convenience Store   0.05


----Business reply mail Processing Centre969 Eastern----
              venue   freq
0          Latitude  43.66
1  Recording Studio   0.06
2        Skate Park   0.06
3               Spa   0.06
4           Brewery   0.06


----CN Tower, Bathurst Quay, Island airport, Harbourfront West, King and Spadina, Railway Lands, South Niagara----
              venue   freq
0          Latitude  43.63


### Put Venue Data Into a Dataframe

Create a function to sort venues in descending order

In [69]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Create the new dataframe and display the top 10 venues of each neighborhood

In [70]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Adelaide, King, Richmond",Latitude,Coffee Shop,Café,Thai Restaurant,American Restaurant,Steakhouse,Hotel,Cosmetics Shop,Restaurant,Gym
1,Berczy Park,Latitude,Coffee Shop,Cocktail Bar,Cheese Shop,Beer Bar,Steakhouse,Seafood Restaurant,Restaurant,Bakery,Café
2,"Brockton, Exhibition Place, Parkdale Village",Latitude,Coffee Shop,Breakfast Spot,Café,Convenience Store,Performing Arts Venue,Italian Restaurant,Gym,Furniture / Home Store,Falafel Restaurant
3,Business reply mail Processing Centre969 Eastern,Latitude,Butcher,Skate Park,Light Rail Station,Spa,Farmers Market,Fast Food Restaurant,Brewery,Restaurant,Recording Studio
4,"CN Tower, Bathurst Quay, Island airport, Harbo...",Latitude,Airport Lounge,Airport Service,Airport Terminal,Harbor / Marina,Airport,Airport Food Court,Airport Gate,Sculpture Garden,Boutique
5,"Cabbagetown, St. James Town",Latitude,Coffee Shop,Restaurant,Bakery,Indian Restaurant,Italian Restaurant,Chinese Restaurant,Pizza Place,Café,Pub
6,Central Bay Street,Latitude,Coffee Shop,Café,Italian Restaurant,Bar,Bubble Tea Shop,Burger Joint,Sandwich Place,Japanese Restaurant,Spa
7,"Chinatown, Grange Park, Kensington Market",Latitude,Café,Vegetarian / Vegan Restaurant,Bar,Chinese Restaurant,Bakery,Vietnamese Restaurant,Mexican Restaurant,Coffee Shop,Dumpling Restaurant
8,Christie,Latitude,Café,Grocery Store,Park,Coffee Shop,Convenience Store,Athletics & Sports,Italian Restaurant,Baby Store,Diner
9,Church and Wellesley,Latitude,Japanese Restaurant,Coffee Shop,Gay Bar,Sushi Restaurant,Burger Joint,Restaurant,Café,Men's Store,Mediterranean Restaurant


In [71]:
neighborhoods_venues_sorted.shape

(38, 11)
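
Every row in the table above lists `Latitude` as the most common venue. This happens because `row.iloc[1:]` skips only the first column, so the coordinate columns are ranked together with the category frequencies, and a latitude near 43.6 outranks any frequency, which is at most 1. The sketch below drops the non-category columns by name before sorting. `top_categories` is a hypothetical replacement for `return_most_common_venues`, and the toy row reuses two frequencies printed earlier for Berczy Park.

```python
import pandas as pd

NON_CATEGORY_COLUMNS = ['Neighborhood', 'Latitude', 'Longitude']


def top_categories(row, num_top_venues):
    """Return the names of the most frequent venue categories in one grouped row."""
    frequencies = row.drop(labels=NON_CATEGORY_COLUMNS, errors='ignore').astype(float)
    return list(frequencies.sort_values(ascending=False).index[:num_top_venues])


toy_grouped = pd.DataFrame({
    'Neighborhood': ['Berczy Park'],
    'Latitude': [43.644771],
    'Longitude': [-79.373306],
    'Coffee Shop': [0.09],
    'Cocktail Bar': [0.06],
    'Yoga Studio': [0.0],
})
print(top_categories(toy_grouped.iloc[0], 2))
```

Dropping by name rather than by position keeps the function correct even if the column order changes.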

### Cluster Neighborhoods

Run kmeans to cluster the neighborhoods into 4 clusters

In [73]:
# set number of clusters
kclusters = 4

toronto_grouped_clustering = toronto_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int32)
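
Dropping only `Neighborhood` leaves `Latitude` and `Longitude` in the feature matrix, so the clusters reflect geography as well as venue mix. If the goal is to group neighborhoods by venue profile alone, one option is to drop the coordinates too. The sketch below does this on a four-row toy frame. The neighborhood names and frequencies are invented, and `n_init` is set explicitly because its default differs between scikit-learn versions.

```python
import pandas as pd
from sklearn.cluster import KMeans

toy_grouped = pd.DataFrame({
    'Neighborhood': ['A', 'B', 'C', 'D'],  # placeholder names
    'Latitude': [43.65, 43.70, 43.66, 43.71],
    'Longitude': [-79.38, -79.40, -79.39, -79.41],
    'Coffee Shop': [0.9, 0.8, 0.1, 0.0],
    'Park': [0.1, 0.2, 0.9, 1.0],
})

# Keep only the venue-frequency columns as clustering features
features = toy_grouped.drop(columns=['Neighborhood', 'Latitude', 'Longitude'])
toy_kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(features)
print(toy_kmeans.labels_)
```

Here the two coffee-heavy rows land in one cluster and the two park-heavy rows in the other, regardless of where they sit on the map.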

Create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

In [74]:
toronto_merged = toronto_grouped.copy()  # copy so the label column is not also added to toronto_grouped

# add clustering labels
toronto_merged['Cluster Labels'] = kmeans.labels_

toronto_merged = toronto_merged.set_index('Neighborhood').join(neighborhoods_venues_sorted.set_index('Neighborhood'))

toronto_merged.head() # check the last columns!

Neighborhood,Latitude,Longitude,Yoga Studio,Accessories Store,Adult Boutique,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,...,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
"Adelaide, King, Richmond",43.650571,-79.384568,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,...,Latitude,Coffee Shop,Café,Thai Restaurant,American Restaurant,Steakhouse,Hotel,Cosmetics Shop,Restaurant,Gym
Berczy Park,43.644771,-79.373306,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,Latitude,Coffee Shop,Cocktail Bar,Cheese Shop,Beer Bar,Steakhouse,Seafood Restaurant,Restaurant,Bakery,Café
"Brockton, Exhibition Place, Parkdale Village",43.636847,-79.428191,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,Latitude,Coffee Shop,Breakfast Spot,Café,Convenience Store,Performing Arts Venue,Italian Restaurant,Gym,Furniture / Home Store,Falafel Restaurant
Business reply mail Processing Centre969 Eastern,43.662744,-79.321558,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,Latitude,Butcher,Skate Park,Light Rail Station,Spa,Farmers Market,Fast Food Restaurant,Brewery,Restaurant,Recording Studio
"CN Tower, Bathurst Quay, Island airport, Harbourfront West, King and Spadina, Railway Lands, South Niagara",43.628947,-79.39442,0.0,0.0,0.0,0.0,0.071429,0.071429,0.071429,0.142857,...,Latitude,Airport Lounge,Airport Service,Airport Terminal,Harbor / Marina,Airport,Airport Food Court,Airport Gate,Sculpture Garden,Boutique


In [75]:
toronto_merged.reset_index(inplace=True)

In [105]:
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))

rainbow = [colors.rgb2hex(i) for i in colors_array]
rainbow

['#8000ff', '#2adddd', '#d4dd80', '#ff0000']

Visualize the clusters

In [114]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
rainbow = ['#F01010','#3F1AF9','#CA1AF6','#31BC89']

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighborhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

### Examine Clusters

Cluster 1

In [115]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Latitude,Adult Boutique,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,...,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,43.650571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,...,Latitude,Coffee Shop,Café,Thai Restaurant,American Restaurant,Steakhouse,Hotel,Cosmetics Shop,Restaurant,Gym
1,43.644771,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,Latitude,Coffee Shop,Cocktail Bar,Cheese Shop,Beer Bar,Steakhouse,Seafood Restaurant,Restaurant,Bakery,Café
2,43.636847,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,Latitude,Coffee Shop,Breakfast Spot,Café,Convenience Store,Performing Arts Venue,Italian Restaurant,Gym,Furniture / Home Store,Falafel Restaurant
3,43.662744,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,Latitude,Butcher,Skate Park,Light Rail Station,Spa,Farmers Market,Fast Food Restaurant,Brewery,Restaurant,Recording Studio
4,43.628947,0.0,0.0,0.071429,0.071429,0.071429,0.142857,0.142857,0.142857,0.0,...,Latitude,Airport Lounge,Airport Service,Airport Terminal,Harbor / Marina,Airport,Airport Food Court,Airport Gate,Sculpture Garden,Boutique
5,43.667967,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,Latitude,Coffee Shop,Restaurant,Bakery,Indian Restaurant,Italian Restaurant,Chinese Restaurant,Pizza Place,Café,Pub
6,43.657952,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012195,...,Latitude,Coffee Shop,Café,Italian Restaurant,Bar,Bubble Tea Shop,Burger Joint,Sandwich Place,Japanese Restaurant,Spa
7,43.653206,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,Latitude,Café,Vegetarian / Vegan Restaurant,Bar,Chinese Restaurant,Bakery,Vietnamese Restaurant,Mexican Restaurant,Coffee Shop,Dumpling Restaurant
8,43.669542,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,Latitude,Café,Grocery Store,Park,Coffee Shop,Convenience Store,Athletics & Sports,Italian Restaurant,Baby Store,Diner
9,43.66586,0.011364,0.011364,0.0,0.0,0.0,0.0,0.0,0.0,0.011364,...,Latitude,Japanese Restaurant,Coffee Shop,Gay Bar,Sushi Restaurant,Burger Joint,Restaurant,Café,Men's Store,Mediterranean Restaurant


Cluster 2

In [116]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Latitude,Adult Boutique,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,...,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
17,43.696948,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,Latitude,Park,Trail,Jewelry Store,Sushi Restaurant,Department Store,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Donut Shop
24,43.689574,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,Latitude,Park,Playground,Restaurant,Tennis Court,Diner,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant
27,43.679563,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,Latitude,Park,Playground,Trail,Dance Studio,Eastern European Restaurant,Dumpling Restaurant,Donut Shop,Doner Restaurant,Dog Run


Cluster 3

In [117]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Latitude,Adult Boutique,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,...,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
22,43.72802,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,Latitude,Park,Dim Sum Restaurant,Bus Line,Swim School,Deli / Bodega,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Donut Shop


Cluster 4

In [118]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Latitude,Adult Boutique,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,...,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
28,43.711695,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,Latitude,Garden,Ice Cream Shop,Deli / Bodega,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Donut Shop,Doner Restaurant
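
Scanning four wide tables makes it hard to compare clusters at a glance. A compact summary of cluster size and the most frequent top venue can help. The sketch below uses toy data with placeholder neighborhood names. The column names match `toronto_merged`, and named aggregation requires pandas 0.25 or later.

```python
import pandas as pd

toy_merged = pd.DataFrame({
    'Neighborhood': ['N1', 'N2', 'N3', 'N4', 'N5'],  # placeholder names
    'Cluster Labels': [0, 0, 1, 0, 2],
    '1st Most Common Venue': ['Coffee Shop', 'Café', 'Park', 'Coffee Shop', 'Park'],
})

# Per cluster: how many neighborhoods it holds and their most frequent top venue
summary = (
    toy_merged.groupby('Cluster Labels')
              .agg(neighborhoods=('Neighborhood', 'count'),
                   top_venue=('1st Most Common Venue', lambda s: s.mode().iloc[0]))
)
print(summary)
```

Applied to the real frame, a summary like this would show directly how unevenly the neighborhoods are distributed across the four clusters.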
