# Analyzing Toronto Neighborhoods 

## Segmenting and Clustering Neighborhoods in Toronto

For this assignment, you will be required to explore and cluster the neighborhoods in Toronto.

# Part III Final

Explore and cluster the neighborhoods in Toronto. You can decide to work with only the boroughs that contain the word Toronto and then replicate the same analysis we did on the New York City data. It is up to you.

Just make sure:

* to add enough Markdown cells to explain what you decided to do and to report any observations you make.
* to generate maps to visualize your neighborhoods and how they cluster together. 

## Loading Part I and Part II source code from previous notebooks

In [1]:
import json
import pandas as pd
from bs4 import BeautifulSoup
import requests

# Part I
#Loading the dataframe from Wiki page
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
page = requests.get(url)

#Parsing the HTML and getting the first table on the page
soup = BeautifulSoup(page.text, 'html.parser')
table = soup.find('table')

#Parsing the table assigning it to a DataFrame and setting column names
table_rows = table.find_all('tr')
l = []
for tr in table_rows:
    td = tr.find_all('td')
    row = [cell.text for cell in td]  # text of each <td> cell in this row
    l.append(row)
df = pd.DataFrame(l, columns = ['PostalCode', 'Borough', 'Neighborhood'])

#Data Cleanup
df['Neighborhood'] = df['Neighborhood'].str.rstrip('\n')
df.drop(0, inplace = True)
df.sort_values(by = 'PostalCode', inplace = True)

#Remove the 'Not assigned' values from Borough column
df.drop(df[df['Borough'] == 'Not assigned'].index, inplace = True)

#Merging rows with duplicate PostalCode and Borough, retain Neighborhood values and separate by comma
df_boroughs = df[['PostalCode', 'Borough']].copy()
df_boroughs.drop_duplicates(keep = 'first', inplace = True)
df_boroughs.reset_index(drop = True, inplace = True)
df_grouped = df.groupby('PostalCode')['Neighborhood'].apply(', '.join).reset_index()
df_new = pd.merge(df_boroughs, df_grouped, on = 'PostalCode')

#If Neighborhood is 'Not assigned', set it to the value of Borough (e.g. M7A, Queen's Park)
not_assigned = df_new['Neighborhood'] == 'Not assigned'
df_new.loc[not_assigned, 'Neighborhood'] = df_new.loc[not_assigned, 'Borough']

#Part II
#Getting Latitude and Longitude coordinates from each neighborhood by getting the Postal Code
df_postal_code = df_new[['PostalCode']].copy()

#Getting the Geospatial data
!wget -q 'https://cocl.us/Geospatial_data'

#Loading and renaming Postal Code column
df_coords = pd.read_csv('Geospatial_data')
df_coords.rename(columns = {'Postal Code': 'PostalCode'}, inplace = True)

#Merging data sorted alphanumerically by PostalCode
df_full = pd.merge(df_new, df_coords, on = 'PostalCode')
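To make the grouping and "Not assigned" steps in this cell easier to follow, here is a minimal sketch on a small hand-made DataFrame. The rows are invented for illustration (not read from the Wikipedia table), and the helper name `clean_postal_codes` is hypothetical.

```python
import pandas as pd

def clean_postal_codes(df):
    """Hypothetical helper mirroring the Part I cleanup on a small frame."""
    # Drop rows whose Borough is 'Not assigned'
    df = df[df['Borough'] != 'Not assigned']
    # Join all neighborhoods that share a postal code (and borough) with ', '
    grouped = (df.groupby(['PostalCode', 'Borough'], sort = True)['Neighborhood']
                 .agg(', '.join)
                 .reset_index())
    # A neighborhood that is still 'Not assigned' takes the borough's name
    mask = grouped['Neighborhood'] == 'Not assigned'
    grouped.loc[mask, 'Neighborhood'] = grouped.loc[mask, 'Borough']
    return grouped

toy = pd.DataFrame({
    'PostalCode':   ['M1A', 'M5A', 'M5A', 'M7A'],
    'Borough':      ['Not assigned', 'Downtown Toronto', 'Downtown Toronto', "Queen's Park"],
    'Neighborhood': ['Not assigned', 'Harbourfront', 'Regent Park', 'Not assigned'],
})
print(clean_postal_codes(toy))
```

Grouping on both `PostalCode` and `Borough` folds the separate `drop_duplicates` and `merge` steps into one, under the assumption that each postal code belongs to a single borough.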

## Part III Final  

We'll explore the Little Portugal neighborhood and look for Italian restaurants.

Importing additional libraries

In [2]:
try:
    import folium
except ImportError:
    !pip install folium
    import folium
    
from geopy.geocoders import Nominatim
from sklearn.cluster import KMeans
import matplotlib.cm as cm
import matplotlib.colors as colors
import numpy as np

Getting Toronto's coordinates

In [3]:
address = 'Toronto, Ontario, Canada'
geolocator = Nominatim(user_agent = 'toronto_explorer')
location = geolocator.geocode(address)
toronto_lat = location.latitude
toronto_lon = location.longitude
print('Geographical coordinates of Toronto are Latitude {} and Longitude {}.'.format(toronto_lat, toronto_lon))

Geographical coordinates of Toronto are Latitude 43.653963 and Longitude -79.387207.


## Exploring our dataframe  
Getting the coordinates of Little Portugal and its borough

In [4]:
littlepor = df_full[df_full['Neighborhood'].str.contains('Little Portugal')].copy()
littlepor.reset_index(drop = True, inplace = True)
# Read scalars from the first matching row (float() on a one-row Series is deprecated in recent pandas)
lat = float(littlepor.loc[0, 'Latitude'])
lon = float(littlepor.loc[0, 'Longitude'])
borough = littlepor.loc[0, 'Borough']
print('Geographical coordinates of Little Portugal neighborhood are Latitude %f and Longitude %f.' % (lat, lon))
print('Little Portugal is in the %s borough.' % borough)

Geographical coordinates of Little Portugal neighborhood are Latitude 43.647927 and Longitude -79.419750.
Little Portugal is in the West Toronto borough.


In [5]:
map_toronto = folium.Map(location = [toronto_lat, toronto_lon], zoom_start = 13)
label = 'Little Portugal, %s' % borough
label = folium.Popup(label, parse_html = True)
# Large translucent circle marking the Little Portugal centroid
folium.CircleMarker([lat, lon], popup = label, radius = 20, color = 'green', fill = True, fill_color = 'white', fill_opacity = 0.1).add_to(map_toronto)

map_toronto

Exploring the neighborhood using Foursquare

In [6]:
# The code was removed by Watson Studio for sharing.
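The credentials cell was removed before sharing, but the functions that follow expect `CLIENT_ID`, `CLIENT_SECRET`, and `VERSION` to exist. A minimal sketch, assuming the values are supplied through environment variables whose names (`FOURSQUARE_CLIENT_ID`, `FOURSQUARE_CLIENT_SECRET`, `FOURSQUARE_VERSION`) are hypothetical; the `VERSION` fallback is an assumed example date in the `YYYYMMDD` form that the v2 API's `v` parameter takes.

```python
import os

# Hypothetical environment variable names; placeholders are used if they are unset
CLIENT_ID = os.environ.get('FOURSQUARE_CLIENT_ID', 'your-client-id')
CLIENT_SECRET = os.environ.get('FOURSQUARE_CLIENT_SECRET', 'your-client-secret')
# Assumed example API version date (YYYYMMDD)
VERSION = os.environ.get('FOURSQUARE_VERSION', '20180605')
```

Keeping secrets out of the notebook this way means the cell can be shared without being stripped.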

Getting venues

In [7]:
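def ExploreNearbyVenues(names, lats, lons, radius = 100, LIMIT = 50):
    
    venues_list = []
    
    # Accept a single coordinate or a sequence, and pair each name with its own
    # coordinates instead of silently reading the global lat/lon
    for name, lat, lon in zip(names, np.atleast_1d(lats), np.atleast_1d(lons)):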

        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lon, 
            radius, 
            LIMIT)
            
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        venues_list.append([(
            name, 
            lat, 
            lon, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']

    return(nearby_venues)
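The function depends on the nested layout of the explore response. The sketch below, assuming the JSON shape implied by the keys the function reads (`response.groups[0].items[*].venue`), isolates that parsing step so it can be checked against a hand-written payload with made-up venue names. The helper `flatten_explore_items` is hypothetical, and unlike the original `[0]` indexing it tolerates a venue with an empty `categories` list.

```python
def flatten_explore_items(name, lat, lon, response_json):
    """Turn one explore response into rows matching the DataFrame columns."""
    items = response_json['response']['groups'][0]['items']
    rows = []
    for item in items:
        venue = item['venue']
        # Fall back to a None category when Foursquare returns no categories
        categories = venue.get('categories') or [{'name': None}]
        rows.append((name, lat, lon,
                     venue['name'],
                     venue['location']['lat'],
                     venue['location']['lng'],
                     categories[0]['name']))
    return rows

# Hand-written payload with the same nesting as a real response
sample = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Example Cafe',
               'location': {'lat': 43.6480, 'lng': -79.4200},
               'categories': [{'name': 'Coffee Shop'}]}},
    {'venue': {'name': 'Example Shop',
               'location': {'lat': 43.6475, 'lng': -79.4195},
               'categories': []}},
]}]}}
rows = flatten_explore_items('Little Portugal', 43.647927, -79.41975, sample)
```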

Getting all the venues within 100 m of the center of Little Portugal

In [8]:
littlepor_venues = ExploreNearbyVenues(['Little Portugal'], lats = lat, lons = lon, radius = 100, LIMIT = 50)

Using the `.shape` attribute to print the dimensions (rows, columns) of the dataframe

In [9]:
print(littlepor_venues.shape)
littlepor_venues.head(10)

(8, 7)


| | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Little Portugal | 43.647927 | -79.41975 | Bellwoods Brewery | 43.647097 | -79.419955 | Brewery |
| 1 | Little Portugal | 43.647927 | -79.41975 | Foxley Bistro | 43.648643 | -79.420495 | Asian Restaurant |
| 2 | Little Portugal | 43.647927 | -79.41975 | Reposado | 43.647321 | -79.420032 | Bar |
| 3 | Little Portugal | 43.647927 | -79.41975 | YogaSpace | 43.647607 | -79.420133 | Yoga Studio |
| 4 | Little Portugal | 43.647927 | -79.41975 | Rotate This | 43.648544 | -79.420518 | Record Shop |
| 5 | Little Portugal | 43.647927 | -79.41975 | Superpoint | 43.648439 | -79.420514 | Pizza Place |
| 6 | Little Portugal | 43.647927 | -79.41975 | i deal coffee | 43.647844 | -79.420294 | Coffee Shop |
| 7 | Little Portugal | 43.647927 | -79.41975 | Bobbie Sue's Mac + Cheese | 43.647783 | -79.420419 | Mac & Cheese Joint |


There are only 8 venues within 100 m.  
Checking the venue categories in Little Portugal

In [10]:
print('There are {} venue categories.'.format(len(littlepor_venues['Venue Category'].unique())))

There are 8 venue categories.


Checking how many Italian venues there are

In [11]:
littlepor_venues[littlepor_venues['Venue Category'].str.contains('Italian')]

| | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|


No Italian restaurants were found.  
To find more, we query the Foursquare search endpoint with the Italian restaurant category for every neighborhood in Toronto, widening the radius from 100 m to 200 m.

In [12]:
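radius = 200  # search radius in metres
LIMIT = 50

# Foursquare category ID for Italian restaurants (the 'pt_' prefix is kept from the original notebook)
pt_restos = '4bf58dd8d48988d110941735'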

def SearchNearbyVenues(neighborhoods, lats, lons, categoryId, radius, LIMIT):
    
    venues_list = []
    
    print('Getting results from:')
    for neighborhood, lat, lon in zip(neighborhoods, lats, lons):

        url = 'https://api.foursquare.com/v2/venues/search?&client_id={}&client_secret={}&v={}&ll={},{}&categoryId={}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lon,
            categoryId,
            radius, 
            LIMIT)
            
        results = requests.get(url).json()['response']['venues']
        
        venues_list.append([(
            neighborhood, 
            lat, 
            lon, 
            v['name'], 
            v['location']['lat'], 
            v['location']['lng']) for v in results])
        
        print('%s ... ' % neighborhood, end = '')

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhoods', 
                  'Neighborhoods Lat', 
                  'Neighborhoods Lon', 
                  'Venue Name', 
                  'Venue Lat', 
                  'Venue Lon']
    
    print('DONE.')

    return(nearby_venues)
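The search URL above is assembled with `str.format`, which does not escape its values. A minimal sketch of the same request URL built with the standard-library `urllib.parse.urlencode`, which percent-encodes every parameter; the helper name `build_search_url` is hypothetical and the credentials in the usage line are placeholders.

```python
from urllib.parse import urlencode

def build_search_url(client_id, client_secret, version, lat, lon, category_id, radius, limit):
    """Build a venues/search URL with every query value percent-encoded."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lon),
        'categoryId': category_id,
        'radius': radius,
        'limit': limit,
    }
    return 'https://api.foursquare.com/v2/venues/search?' + urlencode(params)

# Placeholder credentials; coordinates are Little Portugal's from earlier
url = build_search_url('id', 'secret', '20180605', 43.647927, -79.41975,
                       '4bf58dd8d48988d110941735', 200, 50)
```

With `requests`, passing the same dictionary as `requests.get(base_url, params = params)` gives equivalent encoding without building the string by hand.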

In [13]:
df_full.head()

| | PostalCode | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M1B | Scarborough | Malvern, Rouge | 43.806686 | -79.194353 |
| 1 | M1C | Scarborough | Port Union, Rouge Hill, Highland Creek | 43.784535 | -79.160497 |
| 2 | M1E | Scarborough | West Hill, Morningside, Guildwood | 43.763573 | -79.188711 |
| 3 | M1G | Scarborough | Woburn | 43.770992 | -79.216917 |
| 4 | M1H | Scarborough | Cedarbrae | 43.773136 | -79.239476 |


In [14]:
pt_venues = SearchNearbyVenues(neighborhoods = df_full['Neighborhood'], lats = df_full['Latitude'], \
                               lons = df_full['Longitude'], categoryId = pt_restos, radius = 200, LIMIT = 50)

Getting results from:
Malvern, Rouge ... Port Union, Rouge Hill, Highland Creek ... West Hill, Morningside, Guildwood ... Woburn ... Cedarbrae ... Scarborough Village ... East Birchmount Park, Ionview, Kennedy Park ... Oakridge, Golden Mile, Clairlea ... Cliffcrest, Cliffside, Scarborough Village West ... Cliffside West, Birch Cliff ... Scarborough Town Centre, Dorset Park, Wexford Heights ... Maryvale, Wexford ... Agincourt ... Sullivan, Clarks Corners, Tam O'Shanter ... Milliken, L'Amoreaux East, Agincourt North, Steeles East ... L'Amoreaux West, Steeles West ... Upper Rouge ... Hillcrest Village ... Oriole, Henry Farm, Fairview ... Bayview Village ... Silver Hills, York Mills ... Newtonbrook, Willowdale ... Willowdale South ... York Mills West ... Willowdale West ... Parkwoods ... Don Mills North ... Don Mills South, Flemingdon Park ... Wilson Heights, Bathurst Manor, Downsview North ... Northwood Park, York University ... CFB Toronto, Downsview East ... Downsview West ... Downsview

Using the `.shape` attribute to print the dimensions (rows, columns) of the dataframe

In [15]:
print(pt_venues.shape)
pt_venues.head(20)

(51, 6)


| | Neighborhoods | Neighborhoods Lat | Neighborhoods Lon | Venue Name | Venue Lat | Venue Lon |
|---|---|---|---|---|---|---|
| 0 | Cedarbrae | 43.773136 | -79.239476 | terry's restaurant | 43.774969 | -79.240872 |
| 1 | Don Mills South, Flemingdon Park | 43.7259 | -79.340923 | Sorento Restaurant | 43.726575 | -79.341989 |
| 2 | Studio District | 43.659526 | -79.340923 | Frankie’s Italian | 43.660411 | -79.343097 |
| 3 | Studio District | 43.659526 | -79.340923 | Lil' Baci | 43.660512 | -79.343151 |
| 4 | Davisville | 43.704324 | -79.38879 | Florentia Ristorante | 43.703594 | -79.387985 |
| 5 | Davisville | 43.704324 | -79.38879 | Positano | 43.704558 | -79.388639 |
| 6 | Davisville | 43.704324 | -79.38879 | Tavolino | 43.704115 | -79.388434 |
| 7 | St. James Town, Cabbagetown | 43.667967 | -79.367675 | F'Amelia | 43.667536 | -79.368613 |
| 8 | St. James Town, Cabbagetown | 43.667967 | -79.367675 | La locale | 43.665912 | -79.368514 |
| 9 | Church and Wellesley | 43.66586 | -79.38316 | Olympic 76 Pizza | 43.666707 | -79.384719 |


Removing duplicate entries and counting the Italian restaurants found across Toronto

In [16]:
# The same venue can be returned for more than one neighborhood; keep one row per venue
pt_venues.drop_duplicates('Venue Lat', keep = 'first', inplace = True)
pt_venues.drop(['Neighborhoods Lat', 'Neighborhoods Lon'], axis = 1, inplace = True)
print('There are %i Italian restaurants in Toronto.' % len(pt_venues))
pt_venues.tail()

There are 44 Italian restaurants in Toronto.


| | Neighborhoods | Venue Name | Venue Lat | Venue Lon |
|---|---|---|---|---|
| 46 | Glencairn | Pasta Goodness | 43.709457 | -79.443863 |
| 47 | Little Portugal, Trinity | Superpoint | 43.648439 | -79.420514 |
| 48 | Little Portugal, Trinity | Pizzeria Libretto | 43.648979 | -79.420604 |
| 49 | Queen's Park | Mercatto | 43.660391 | -79.387664 |
| 50 | Bloordale Gardens, Eringate, Markland Wood, Ol... | Cafe Sympatico | 43.64182 | -79.576721 |
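The drop from 51 rows to 44 is most likely caused by overlapping 200 m search circles returning the same restaurant for neighboring postal codes. The toy example below (invented rows, not notebook data) shows the effect. It deduplicates on both coordinates, which is an assumption on our part and slightly safer than `Venue Lat` alone, since two distinct venues could in principle share a latitude.

```python
import pandas as pd

# Invented rows: the same trattoria is found from two neighborhoods
venues = pd.DataFrame({
    'Neighborhoods': ['Trinity', 'Little Portugal', 'Little Portugal'],
    'Venue Name':    ['Example Trattoria', 'Example Trattoria', 'Example Pizzeria'],
    'Venue Lat':     [43.6490, 43.6490, 43.6485],
    'Venue Lon':     [-79.4206, -79.4206, -79.4205],
})
# Keep the first row seen for each (lat, lon) pair
unique_venues = venues.drop_duplicates(subset = ['Venue Lat', 'Venue Lon'], keep = 'first')
print(len(unique_venues))
```

With `keep = 'first'`, a duplicated venue stays attributed to whichever neighborhood was queried first.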


Plotting the Italian restaurants on the Toronto map

In [17]:
for venue, lat, lon in zip(pt_venues['Venue Name'], pt_venues['Venue Lat'], pt_venues['Venue Lon']):
    label = folium.Popup(venue, parse_html = True)
    folium.CircleMarker([lat, lon], popup = label, radius = 5, color = 'green', fill = True, fill_color = 'red', fill_opacity = 0.3).add_to(map_toronto)

map_toronto

Clustering with k-means

In [18]:
kclusters = 5

# Cluster on the venue coordinates only (Venue Lat, Venue Lon)
pt_coords = pt_venues.drop(['Venue Name', 'Neighborhoods'], axis = 1)

# n_init is set explicitly because recent scikit-learn versions changed its default
kmeans = KMeans(n_clusters = kclusters, random_state = 0, n_init = 10).fit(pt_coords)

# Prepend the cluster label to each venue row
pt_venues.insert(0, 'Cluster Label', kmeans.labels_)
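k-means uses Euclidean distance, and here it runs on raw latitude/longitude degrees. At Toronto's latitude a degree of longitude is noticeably shorter than a degree of latitude, so east-west separations are over-weighted. The sketch below is an optional refinement, not what the notebook ran: an equirectangular projection (a swapped-in approximation, adequate at city scale) that converts degrees to approximate kilometres. The names `to_local_km` and `approx_distance_km` are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0

def to_local_km(lat, lon, ref_lat):
    """Equirectangular projection: degrees -> approximate km offsets."""
    km_per_deg = EARTH_RADIUS_KM * math.pi / 180.0
    x = lon * km_per_deg * math.cos(math.radians(ref_lat))  # east-west, shrunk by cos(latitude)
    y = lat * km_per_deg                                     # north-south
    return x, y

def approx_distance_km(lat1, lon1, lat2, lon2):
    """Straight-line distance between two nearby points in km."""
    ref_lat = (lat1 + lat2) / 2.0
    x1, y1 = to_local_km(lat1, lon1, ref_lat)
    x2, y2 = to_local_km(lat2, lon2, ref_lat)
    return math.hypot(x2 - x1, y2 - y1)

# Little Portugal centroid vs. Toronto's geocoded center (values from earlier cells)
d = approx_distance_km(43.647927, -79.41975, 43.653963, -79.387207)
```

To cluster in these units, one could build `pt_coords` from `to_local_km` outputs (for example with `ref_lat = toronto_lat`) before calling `KMeans`. On raw degrees, east-west distances are inflated by roughly 1/cos(43.65°) ≈ 1.38 relative to north-south ones.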

Creating a new map with color scheme for clusters visualization

In [19]:
map_clusters = folium.Map(location = [toronto_lat, toronto_lon], zoom_start = 11)

# One evenly spaced rainbow color per cluster, as hex strings for folium
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# One marker per restaurant, colored by its cluster label
for lat, lon, neighborhood, cluster in zip(pt_venues['Venue Lat'], pt_venues['Venue Lon'], pt_venues['Neighborhoods'], \
                                    pt_venues['Cluster Label']):
    label = folium.Popup('Cluster ' + str(cluster) + ' - ' + str(neighborhood), parse_html = True)
    folium.CircleMarker(
        [lat, lon],
        radius = 7,
        popup = label,
        # cluster - 1 maps cluster 0 to the last color via negative indexing
        color = rainbow[cluster - 1],
        fill = True,
        fill_color = rainbow[cluster - 1],
        fill_opacity = 0.4).add_to(map_clusters)
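The lookup `rainbow[cluster - 1]` relies on Python's negative indexing: cluster 0 wraps to the last color and every other cluster shifts down by one, so each cluster still gets a distinct color. The sketch below illustrates that mapping with a dependency-free palette built from the standard-library `colorsys` module. This palette is a stand-in for `cm.rainbow` (its hues differ from matplotlib's), and the helper name `hue_palette` is hypothetical.

```python
import colorsys

def hue_palette(k):
    """Return k hex colors with evenly spaced hues (full saturation and value)."""
    palette = []
    for i in range(k):
        r, g, b = colorsys.hsv_to_rgb(i / k, 1.0, 1.0)
        palette.append('#{:02x}{:02x}{:02x}'.format(round(r * 255), round(g * 255), round(b * 255)))
    return palette

palette = hue_palette(5)
# Offset indexing as in the notebook: cluster 0 -> last color, cluster 1 -> first color
offset_colors = [palette[c - 1] for c in range(5)]
```

Using `palette[cluster]` would avoid the wrap-around, but it would also change which cluster appears in which color on the map.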

In [20]:
map_clusters

The red cluster covers the Little Portugal area and a second, yellow cluster forms another large group. The remaining three clusters are outliers with only a few members each.