<html><center>
    <h1>Segmenting and Clustering Neighborhoods in Toronto</h1>
    <br>
    by Célestin PHAM
    <br>
</center></html>

In [57]:
# import dependencies

import pandas as pd
import numpy as np

## Part 1 - Retrieve Neighborhoods from Wikipedia

In [58]:
import requests
from bs4 import BeautifulSoup

soup = BeautifulSoup(requests.get("https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M").text, 'html.parser')


1. Go through all "td" tags to extract the postal code information.
2. Discard the cells whose borough is "Not assigned" (marked by an "i" tag).
3. Parse the remaining text to extract the borough and the neighborhoods, which are listed between brackets.
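
The three steps above can be tried on plain strings before running them against the live page. This is a minimal sketch, assuming cell text of the form `Borough(Neighborhood / Neighborhood)`; the helper name `split_borough_and_neighborhoods` and the sample strings are illustrative and not part of the notebook.

```python
def split_borough_and_neighborhoods(cell_text):
    """Split 'Borough(Hood A / Hood B)' into ('Borough', 'Hood A, Hood B').

    Hypothetical helper for illustration; returns None for unassigned cells.
    """
    text = cell_text.strip()
    if not text or text == "Not assigned":
        return None

    # everything before the first "(" is the borough, the rest is the bracketed part
    borough, _, rest = text.partition("(")
    neighborhoods = [part.strip() for part in rest.replace(")", "").split("/")]
    neighborhoods = [name for name in neighborhoods if name]

    # assumption: a cell without a bracketed part reuses the borough name
    if not neighborhoods:
        neighborhoods = [borough.strip()]

    return borough.strip(), ", ".join(neighborhoods)


print(split_borough_and_neighborhoods("Downtown Toronto(Regent Park / Harbourfront)"))
```

Keeping the string handling in a small pure function like this makes it easy to check edge cases without re-downloading the page.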

In [66]:
zip_codes = []

# Look for all "td" tags in the wiki webpage
for tag in soup.find_all('td'):

    # Skip cells that are not postal code cells (no "p" tag) or whose borough is "Not assigned"
    if not tag.find("p") or str(tag.i) == '<i>Not assigned</i>':
        continue

    # Cell text looks like "Borough(Neighborhood / Neighborhood)": split the borough from the bracketed part
    to_parse = tag.p.span.text.replace(")", "").split("(")

    # No brackets in the cell: fall back to splitting borough and neighborhood on "/"
    if len(to_parse) == 1:
        to_parse = to_parse[0].split("/")

    # Add [postal code, borough, comma-separated neighborhoods] to the zip codes list
    zip_codes.append([
        tag.p.b.text.strip(),
        to_parse[0].strip(),
        to_parse[1].replace(" /", ",").strip()])
    
zip_codes = pd.DataFrame(data=zip_codes, columns=["PostalCode","Borough","Neighborhood"])
zip_codes.head()

|   | PostalCode | Borough | Neighborhood |
|---|------------|---------|--------------|
| 0 | M3A | North York | Parkwoods |
| 1 | M4A | North York | Victoria Village |
| 2 | M5A | Downtown Toronto | Regent Park, Harbourfront |
| 3 | M6A | North York | Lawrence Manor, Lawrence Heights |
| 4 | M7A | Queen's Park | Ontario Provincial Government |


In [67]:
zip_codes.shape

(103, 3)

## Part 2 - Get neighborhood latitude / longitude from Geocoder

### Geocoder could not find the latitude / longitude, so the coordinates are loaded directly from the CSV file

In [107]:
!wget -q -O geospatial_data.csv "http://cocl.us/Geospatial_data"
geo_data = pd.read_csv("geospatial_data.csv")
geo_data.head()

|   | Postal Code | Latitude | Longitude |
|---|-------------|----------|-----------|
| 0 | M1B | 43.806686 | -79.194353 |
| 1 | M1C | 43.784535 | -79.160497 |
| 2 | M1E | 43.763573 | -79.188711 |
| 3 | M1G | 43.770992 | -79.216917 |
| 4 | M1H | 43.773136 | -79.239476 |


In [108]:
geo_data.rename(columns={'Postal Code': 'PostalCode'}, inplace=True)
df_zip_codes = pd.merge(zip_codes, geo_data, on="PostalCode", how='left')
df_zip_codes.head()

|   | PostalCode | Borough | Neighborhood | Latitude | Longitude |
|---|------------|---------|--------------|----------|-----------|
| 0 | M3A | North York | Parkwoods | 43.753259 | -79.329656 |
| 1 | M4A | North York | Victoria Village | 43.725882 | -79.315572 |
| 2 | M5A | Downtown Toronto | Regent Park, Harbourfront | 43.65426 | -79.360636 |
| 3 | M6A | North York | Lawrence Manor, Lawrence Heights | 43.718518 | -79.464763 |
| 4 | M7A | Queen's Park | Ontario Provincial Government | 43.662301 | -79.389494 |
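
Because the merge uses `how='left'`, a postal code with no match in the coordinates file is kept, but its latitude and longitude are NaN. A quick way to spot such rows is the `indicator` option of `pd.merge`. This is a minimal sketch on toy frames: `codes` and `coords` stand in for `zip_codes` and `geo_data`, and the postal code `M0Z` is made up to force a miss.

```python
import pandas as pd

# toy stand-ins for the notebook's zip_codes and geo_data frames
codes = pd.DataFrame({"PostalCode": ["M3A", "M4A", "M0Z"]})  # "M0Z" is a made-up code
coords = pd.DataFrame({
    "PostalCode": ["M3A", "M4A"],
    "Latitude": [43.753259, 43.725882],
    "Longitude": [-79.329656, -79.315572],
})

# indicator=True adds a "_merge" column telling where each row came from
merged = pd.merge(codes, coords, on="PostalCode", how="left", indicator=True)

# rows marked "left_only" had no coordinates in the right-hand frame
missing = merged.loc[merged["_merge"] == "left_only", "PostalCode"].tolist()
print(missing)
```

Running the same check on the real frames before plotting would show whether any neighborhood would end up without a marker position.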


## Part 3 - Explore and cluster the neighborhoods in Toronto

Use the Nominatim geocoder from geopy to retrieve the latitude / longitude of Toronto

In [115]:
# !conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim

In [112]:
location = Nominatim(user_agent="toronto_explorer").geocode('Toronto, Canada')
toronto = (location.latitude, location.longitude)
print('The geographical coordinates of the City of Toronto are {}, {}.'.format(toronto[0], toronto[1]))

The geographical coordinates of the City of Toronto are 43.653963, -79.387207.
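
As a sanity check before mapping, the city coordinates can be compared with a postal code's coordinates from the merged table. This is a minimal sketch using the haversine formula. The helper `haversine_km` is illustrative and not part of the notebook, and the two coordinate pairs are copied from the outputs above (the Toronto center and postal code M5A).

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


toronto = (43.653963, -79.387207)  # geocoded city center from the output above
m5a = (43.65426, -79.360636)       # M5A (Regent Park, Harbourfront) from the merged table

distance = haversine_km(*toronto, *m5a)
print(round(distance, 1), "km")
```

A downtown postal code should come out at a couple of kilometres from the center; a value in the hundreds would suggest swapped or mismatched coordinates.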


### Install Folium to display map

In [114]:
!conda install -c conda-forge folium=0.5.0 --yes # installs folium; comment this line out if folium is already available
import folium # map rendering library


In [None]:
# create map of Toronto using latitude and longitude values
toronto_map = folium.Map(location=[toronto[0], toronto[1]], zoom_start=10)

# add one circle marker per postal code, labelled with its neighborhood(s) and borough
for lat, lng, borough, neighborhood in zip(df_zip_codes['Latitude'], df_zip_codes['Longitude'], df_zip_codes['Borough'], df_zip_codes['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(toronto_map)

# display the map (named toronto_map to avoid shadowing the built-in map)
toronto_map