# Segmenting and Clustering Neighborhoods in Toronto

# Instructions
In this assignment, you will explore, segment, and cluster the neighborhoods in the city of Toronto. Unlike the New York data, however, the Toronto neighborhood data is not readily available on the internet. What is interesting about data science is that each project can be challenging in its own way, so you need to be agile and able to learn new libraries and tools quickly as a project demands.

For the Toronto neighborhood data, a Wikipedia page exists that has all the information we need to explore and cluster the neighborhoods in Toronto. You will be required to scrape the Wikipedia page and wrangle the data, clean it, and then read it into a pandas dataframe so that it is in a structured format like the New York dataset.

Once the data is in a structured format, you can replicate the analysis that we performed on the New York City dataset to explore and cluster the neighborhoods in the city of Toronto.

Your submission will be a link to your Jupyter Notebook in your GitHub repository.

# Part 1

In [1]:
import numpy as np
import pandas as pd

Reading the table from the URL with pandas, encoding 'Not assigned' values as NaN:

In [2]:
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
df = pd.read_html(url, na_values='Not assigned')[0]

Filtering out the rows with NaN values on the column 'Borough':

In [3]:
df = df[df['Borough'].notna()]

Resetting the dataframe index:

In [4]:
df.reset_index(inplace=True, drop=True)
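
The filter-then-reset step can be checked on a few made-up rows. This is a minimal sketch, assuming a frame shaped like the scraped table; `sample`, `kept`, and `labels_before_reset` are hypothetical names, and the rows are invented for illustration.

```python
import numpy as np
import pandas as pd

# Made-up rows in the same shape as the scraped table (not real page content).
sample = pd.DataFrame({
    "Postal Code": ["M1A", "M3A", "M2A", "M4A"],
    "Borough": [np.nan, "North York", np.nan, "North York"],
    "Neighbourhood": [np.nan, "Parkwoods", np.nan, "Victoria Village"],
})

# Keep only rows that have a borough; the surviving rows keep their old labels.
kept = sample[sample["Borough"].notna()]
labels_before_reset = kept.index.tolist()

# drop=True discards the old labels instead of adding them as a new column.
kept = kept.reset_index(drop=True)
labels_after_reset = kept.index.tolist()
```

Without the reset, the filtered frame would keep gaps in its index, which matters later if rows are paired by index label.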

Grouping the data using the postal code:

In [5]:
df_grouped = df.groupby(by='Postal Code').first()
df_grouped.reset_index(inplace=True)
df_grouped.head()

|   | Postal Code | Borough | Neighbourhood |
|---|---|---|---|
| 0 | M1B | Scarborough | Malvern, Rouge |
| 1 | M1C | Scarborough | Rouge Hill, Port Union, Highland Creek |
| 2 | M1E | Scarborough | Guildwood, Morningside, West Hill |
| 3 | M1G | Scarborough | Woburn |
| 4 | M1H | Scarborough | Cedarbrae |
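
Note that `first()` keeps only the first row for each postal code. If a postal code ever appeared on more than one row, the other neighbourhoods would be dropped. A minimal sketch of an alternative that joins them with commas instead, on made-up rows (`rows` and `combined` are hypothetical names):

```python
import pandas as pd

# Made-up rows where one postal code is listed twice.
rows = pd.DataFrame({
    "Postal Code": ["M5A", "M5A", "M1B"],
    "Borough": ["Downtown Toronto", "Downtown Toronto", "Scarborough"],
    "Neighbourhood": ["Regent Park", "Harbourfront", "Malvern"],
})

# One row per postal code: keep the first borough, join all neighbourhoods.
combined = (
    rows.groupby("Postal Code", as_index=False)
        .agg({"Borough": "first", "Neighbourhood": ", ".join})
)
```

When every postal code already occupies a single row, both approaches give the same result.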


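Replacing missing 'Neighbourhood' values (read in as NaN from 'Not assigned') with the respective 'Borough':

In [6]:
# 'Not assigned' was parsed as NaN on read, so fill the gaps and assign the result back
df_grouped['Neighbourhood'] = df_grouped['Neighbourhood'].fillna(df_grouped['Borough'])
df_grouped.head()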

|   | Postal Code | Borough | Neighbourhood |
|---|---|---|---|
| 0 | M1B | Scarborough | Malvern, Rouge |
| 1 | M1C | Scarborough | Rouge Hill, Port Union, Highland Creek |
| 2 | M1E | Scarborough | Guildwood, Morningside, West Hill |
| 3 | M1G | Scarborough | Woburn |
| 4 | M1H | Scarborough | Cedarbrae |
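
The first five rows above have no missing neighbourhood, so they do not show the fill taking effect. Because 'Not assigned' was read in as NaN, a missing neighbourhood can be filled from the borough column with `fillna`. A minimal sketch on made-up rows (`sample` is a hypothetical name):

```python
import numpy as np
import pandas as pd

# Made-up rows: the first one has a borough but no neighbourhood.
sample = pd.DataFrame({
    "Postal Code": ["M7A", "M1B"],
    "Borough": ["Queen's Park", "Scarborough"],
    "Neighbourhood": [np.nan, "Malvern, Rouge"],
})

# fillna with a Series aligns on the index, so each gap takes its own row's borough.
# The result must be assigned back; fillna does not modify the column in place here.
sample["Neighbourhood"] = sample["Neighbourhood"].fillna(sample["Borough"])
```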


In [7]:
df_grouped.shape

(103, 3)

# Part 2

Importing the geocoder library:

In [13]:
import geocoder

Definition of the geodata search function:

In [26]:
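def geodataSearch(postal_code):
    """Return [latitude, longitude] for a postal code, or [nan, nan] if every call fails."""
    lat_lng_coords = None
    num_calls = 0
    LIMIT = 10  # maximum number of attempts
    while lat_lng_coords is None and num_calls < LIMIT:
        g = geocoder.google('{}, Toronto, Ontario'.format(postal_code))
        lat_lng_coords = g.latlng
        num_calls += 1

    # Test the result rather than the call count: a success on the last attempt still counts
    if lat_lng_coords is None:
        print('Google is not responding')
        return [np.nan, np.nan]

    return lat_lng_coords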
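
The bounded-retry logic in this function can be exercised without network access by passing the lookup in as a function. This is a minimal sketch; `retry_lookup` and `fake_lookup` are hypothetical names that are not part of the notebook or of the geocoder library, and the coordinates are placeholders.

```python
import math

def retry_lookup(lookup, query, limit=10):
    """Call lookup(query) up to `limit` times; return [nan, nan] if every call returns None."""
    for _ in range(limit):
        coords = lookup(query)
        if coords is not None:
            return coords
    return [math.nan, math.nan]

# Hypothetical stand-in for a geocoding service: fails twice, then answers.
calls = {"n": 0}

def fake_lookup(query):
    calls["n"] += 1
    return None if calls["n"] < 3 else [43.0, -79.0]  # placeholder coordinates

coords = retry_lookup(fake_lookup, "M5G, Toronto, Ontario")

# A service that never answers exhausts the limit and yields the NaN pair.
failed = retry_lookup(lambda q: None, "M5G, Toronto, Ontario", limit=3)
```

Deciding success by the returned value, not by the number of attempts, means a result obtained on the final attempt is not thrown away.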

In [27]:
latitude, longitude = geodataSearch('M5G')

Google is not responding


Since the geocoder calls are not returning results, I'll work with the CSV file provided on the course website:

In [28]:
geodata = pd.read_csv('Geospatial_Coordinates.csv')
geodata.head()

|   | Postal Code | Latitude | Longitude |
|---|---|---|---|
| 0 | M1B | 43.806686 | -79.194353 |
| 1 | M1C | 43.784535 | -79.160497 |
| 2 | M1E | 43.763573 | -79.188711 |
| 3 | M1G | 43.770992 | -79.216917 |
| 4 | M1H | 43.773136 | -79.239476 |


Creating a unified dataframe from the information in df_grouped and geodata:

In [34]:
# Join on the postal code rather than by row position, so the result does not depend on row order
df_geoToronto = df_grouped.merge(geodata, on='Postal Code', how='left')
df_geoToronto.head(15)

|   | Postal Code | Borough | Neighbourhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M1B | Scarborough | Malvern, Rouge | 43.806686 | -79.194353 |
| 1 | M1C | Scarborough | Rouge Hill, Port Union, Highland Creek | 43.784535 | -79.160497 |
| 2 | M1E | Scarborough | Guildwood, Morningside, West Hill | 43.763573 | -79.188711 |
| 3 | M1G | Scarborough | Woburn | 43.770992 | -79.216917 |
| 4 | M1H | Scarborough | Cedarbrae | 43.773136 | -79.239476 |
| 5 | M1J | Scarborough | Scarborough Village | 43.744734 | -79.239476 |
| 6 | M1K | Scarborough | Kennedy Park, Ionview, East Birchmount Park | 43.727929 | -79.262029 |
| 7 | M1L | Scarborough | Golden Mile, Clairlea, Oakridge | 43.711112 | -79.284577 |
| 8 | M1M | Scarborough | Cliffside, Cliffcrest, Scarborough Village West | 43.716316 | -79.239476 |
| 9 | M1N | Scarborough | Birch Cliff, Cliffside West | 43.692657 | -79.264848 |
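
Pairing rows by index label (for example with `pd.concat(..., axis=1)`) gives correct coordinates only when both frames list the postal codes in the same order. A key-based merge does not rely on that. A minimal sketch, using two rows from the tables above with the coordinate rows deliberately reversed (`left`, `right`, and `merged` are hypothetical names):

```python
import pandas as pd

left = pd.DataFrame({
    "Postal Code": ["M1B", "M1C"],
    "Borough": ["Scarborough", "Scarborough"],
})

# Same postal codes, listed in the opposite order.
right = pd.DataFrame({
    "Postal Code": ["M1C", "M1B"],
    "Latitude": [43.784535, 43.806686],
    "Longitude": [-79.160497, -79.194353],
})

# validate="one_to_one" raises if a postal code is duplicated on either side.
merged = left.merge(right, on="Postal Code", how="left", validate="one_to_one")
```

With `how="left"`, a postal code missing from the coordinates file would show up as NaN in the latitude and longitude columns instead of silently shifting the remaining rows.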
