### Geocoding Toronto Neighborhoods

First I install the needed packages: beautifulsoup4 and lxml for web scraping, and geocoder for, well, geocoding.

In [1]:
import sys
!{sys.executable} -m pip install beautifulsoup4
!{sys.executable} -m pip install lxml
!{sys.executable} -m pip install geocoder



In [2]:
from bs4 import BeautifulSoup
import requests
import pandas as pd
import geocoder

In [3]:
source = requests.get("https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M").text
soup = BeautifulSoup(source, "lxml")
html_table = soup.table

In [4]:
postal_codes = pd.read_html(str(html_table))[0]
postal_codes = postal_codes.loc[postal_codes.Borough != "Not assigned"]
postal_list = postal_codes.loc[:,"Postcode"].drop_duplicates().to_list()

I request the page from Wikipedia and take the first table found in it, which I checked is indeed the postcode data set. I then drop the rows whose Borough is "Not assigned" and build a list of the unique postcodes.
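
As a small offline illustration of the filtering step, the sketch below applies the same two operations to a hand-made frame. The sample rows and the names `sample`, `assigned` and `unique_postcodes` are made up for this example and are not part of the notebook.

```python
import pandas as pd

# Hypothetical sample rows that mimic the layout of the Wikipedia table.
sample = pd.DataFrame({
    "Postcode": ["M1A", "M3A", "M5A", "M5A"],
    "Borough": ["Not assigned", "North York", "Downtown Toronto", "Downtown Toronto"],
    "Neighbourhood": ["Not assigned", "Parkwoods", "Harbourfront", "Regent Park"],
})

# Keep only the rows that have a borough assigned.
assigned = sample.loc[sample["Borough"] != "Not assigned"]

# Collect each postcode once, in order of first appearance.
unique_postcodes = assigned["Postcode"].drop_duplicates().to_list()
print(unique_postcodes)
```

A postcode that spans several neighbourhoods still appears on several rows of `assigned`; only the list is de-duplicated.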

In [5]:
list_of_df = list()
for postal_code in postal_list:
    df = postal_codes.loc[postal_codes.Postcode == postal_code]
    neighborhoods = ', '.join(df.loc[:,"Neighbourhood"].to_list())
    df_2 = df.loc[:,["Postcode","Borough"]].drop_duplicates()
    df_2["Neighborhoods"] = neighborhoods
    list_of_df.append(df_2)
postal_codes = pd.concat(list_of_df)

To retrieve the data in the desired manner, I loop through the postcode list. For each value, I subset the original dataframe to get the name of the Borough and a list of neighborhoods, convert this list of neighborhoods to a comma-separated string, and assign it to the Neighborhoods column. After doing this for every postcode, I concatenate all the data into a single data frame.
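
The same reshaping can probably be expressed without an explicit loop. The sketch below is an alternative, not the method used in this notebook: it groups by postcode and borough and joins the neighbourhood names within each group. The sample rows are made up for illustration.

```python
import pandas as pd

# Hypothetical sample rows: one postcode with two neighbourhoods, one with a single one.
sample = pd.DataFrame({
    "Postcode": ["M3A", "M5A", "M5A"],
    "Borough": ["North York", "Downtown Toronto", "Downtown Toronto"],
    "Neighbourhood": ["Parkwoods", "Harbourfront", "Regent Park"],
})

grouped = (
    sample.groupby(["Postcode", "Borough"], sort=False)["Neighbourhood"]
    .agg(", ".join)  # join the names within each group into one string
    .reset_index()
    .rename(columns={"Neighbourhood": "Neighborhoods"})
)
print(grouped)
```

Unlike the loop, this version resets the index to 0, 1, 2, ..., so the original row labels are not preserved.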

In [6]:
postal_codes.sample(20)

| | Postcode | Borough | Neighborhoods |
|---|---|---|---|
| 127 | M3M | North York | Downsview Central |
| 139 | M1N | Scarborough | Birch Cliff, Cliffside West |
| 242 | M8W | Etobicoke | Alderwood, Long Branch |
| 125 | M2M | North York | Newtonbrook, Willowdale |
| 30 | M3C | North York | Flemingdon Park, Don Mills South |
| 165 | M2R | North York | Willowdale West |
| 34 | M6C | York | Humewood-Cedarvale |
| 149 | M9N | York | Weston |
| 239 | M5W | Downtown Toronto | Stn A PO Boxes 25 The Esplanade |
| 144 | M5N | Central Toronto | Roselawn |


In [7]:
list_of_longitudes = list()
list_of_latitudes = list()
for postal_code in postal_list:
    search = '{}, Toronto, Ontario'.format(postal_code)
    lat_lng_coords = None
    k_tries = 0

    # Retry until the geocoder returns coordinates, up to 100 attempts.
    while lat_lng_coords is None and k_tries < 100:
        g = geocoder.arcgis(search)
        lat_lng_coords = g.latlng
        k_tries = k_tries + 1
    # Fail with a clear message instead of a TypeError on None.
    if lat_lng_coords is None:
        raise RuntimeError('No coordinates found for {}'.format(search))
    list_of_latitudes.append(lat_lng_coords[0])
    list_of_longitudes.append(lat_lng_coords[1])
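
The retry loop above could also be factored into a small helper. The sketch below is one possible shape, not part of the original notebook: `geocode_with_retry` and `flaky_lookup` are hypothetical names, and the stub returns placeholder coordinates so the example runs without network access.

```python
def geocode_with_retry(query, lookup, max_tries=100):
    """Call lookup(query) until it returns coordinates or max_tries is used up.

    Returns a (latitude, longitude) tuple, or (None, None) if every attempt fails.
    """
    for _ in range(max_tries):
        coords = lookup(query)
        if coords is not None:
            return coords[0], coords[1]
    return None, None


# Hypothetical stand-in for a geocoding service: fails twice, then succeeds.
calls = {"n": 0}

def flaky_lookup(query):
    calls["n"] += 1
    if calls["n"] < 3:
        return None
    return [43.65, -79.38]  # placeholder coordinates, not a real lookup result


lat, lng = geocode_with_retry("M5A, Toronto, Ontario", flaky_lookup)
print(lat, lng, calls["n"])
```

With the geocoder package, `lookup` could be something like `lambda q: geocoder.arcgis(q).latlng`, which is the same call the loop makes. Returning `(None, None)` instead of raising leaves it to the caller to decide how to handle postcodes that could not be geocoded.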

In [8]:
postal_codes["latitude"] = list_of_latitudes
postal_codes["longitude"] = list_of_longitudes



In [9]:
postal_codes.sample(10)

| | Postcode | Borough | Neighborhoods | latitude | longitude |
|---|---|---|---|---|---|
| 183 | M5S | Downtown Toronto | Harbord, University of Toronto | 43.66311 | -79.401801 |
| 98 | M5K | Downtown Toronto | Design Exchange, Toronto Dominion Centre | 43.6471 | -79.381531 |
| 109 | M2L | North York | Silver Hills, York Mills | 43.757095 | -79.38032 |
| 264 | M7Y | East Toronto | Business Reply Mail Processing Centre 969 Eastern | 43.64869 | -79.38544 |
| 245 | M1X | Scarborough | Upper Rouge | 43.834215 | -79.216701 |
| 116 | M6L | North York | Downsview, North Park, Upwood Park | 43.71381 | -79.488301 |
| 215 | M5V | Downtown Toronto | CN Tower, Bathurst Quay, Island airport, Harbo... | 43.640815 | -79.399538 |
| 127 | M3M | North York | Downsview Central | 43.73369 | -79.49674 |
| 52 | M1G | Scarborough | Woburn | 43.768369 | -79.21759 |
| 112 | M4L | East Toronto | The Beaches West, India Bazaar | 43.667965 | -79.314667 |


In [10]:
postal_codes.shape

(103, 5)