# Toronto Clustering

The Wikipedia table is already close to the format we want. Nonetheless, I treated it as if it were not and cleaned it anyway, to demonstrate how this could be done.

First we have to import pandas and download the table from the Wikipedia entry.

In [119]:
import pandas as pd

url = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
df = pd.read_html(url)[0]

We do not want any entries where "Borough" is "Not assigned".

In [120]:
df = df[df["Borough"] != "Not assigned"].copy()  # copy so later column assignments do not act on a view

If "Neighbourhood" is "Not assigned", we want to replace it with the value in "Borough".

In [121]:
df["Neighbourhood"] = df.apply(lambda x: x["Borough"] 
                               if x["Neighbourhood"] == "Not assigned" 
                               else x["Neighbourhood"], axis=1)
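The same replacement can also be written without a row-wise `apply`. The following is a minimal sketch on a made-up two-row frame (the name `toy` and its values are illustrative only, not taken from the Wikipedia table); it uses `Series.mask` to take the value from "Borough" wherever "Neighbourhood" is "Not assigned".

```python
import pandas as pd

# Toy frame with the same column names as the Wikipedia table (values are made up).
toy = pd.DataFrame({
    "Borough": ["North York", "Queen's Park"],
    "Neighbourhood": ["Parkwoods", "Not assigned"],
})

# Where the condition is True, mask() swaps in the aligned value from "Borough".
toy["Neighbourhood"] = toy["Neighbourhood"].mask(
    toy["Neighbourhood"] == "Not assigned", toy["Borough"]
)
print(toy)
```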

We want to merge the rows that share the same "Postal Code" into a single row whose "Neighbourhood" cell lists their neighbourhoods separated by commas. Finally, we show the first eleven rows of our dataframe.

In [122]:
replace = dict((key,"") for key in df[df.duplicated(subset="Postal Code",keep=False)]["Postal Code"].unique())

for index, row in df.iterrows():
    if row["Postal Code"] in replace.keys():
        if replace[row["Postal Code"]] == "":
            replace[row["Postal Code"]] += row["Neighbourhood"]
        else:
            replace[row["Postal Code"]] += (", "+row["Neighbourhood"])

df["Neighbourhood"] = df.apply(lambda x: replace[x["Postal Code"]]
                               if x["Postal Code"] in replace.keys() 
                               else x["Neighbourhood"], axis=1)

df.drop_duplicates(subset="Postal Code",keep="first",inplace=True)
df = df.reset_index(drop=True)
df.head(11)

,Postal Code,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
5,M9A,Etobicoke,"Islington Avenue, Humber Valley Village"
6,M1B,Scarborough,"Malvern, Rouge"
7,M3B,North York,Don Mills
8,M4B,East York,"Parkview Hill, Woodbine Gardens"
9,M5B,Downtown Toronto,"Garden District, Ryerson"
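For comparison, the merging step can be sketched with `groupby` instead of an explicit loop. This is an alternative, not the approach used above, and it assumes that every postal code belongs to exactly one borough (the loop above would instead keep the first borough it sees). The frame `toy` is made up for illustration.

```python
import pandas as pd

# Toy frame: "M5A" appears twice and should collapse into one row.
toy = pd.DataFrame({
    "Postal Code": ["M5A", "M3A", "M5A"],
    "Borough": ["Downtown Toronto", "North York", "Downtown Toronto"],
    "Neighbourhood": ["Regent Park", "Parkwoods", "Harbourfront"],
})

# Group by postal code (and borough), then join the neighbourhood names.
# sort=False keeps the groups in order of first appearance.
merged = (
    toy.groupby(["Postal Code", "Borough"], sort=False)["Neighbourhood"]
       .agg(", ".join)
       .reset_index()
)
print(merged)
```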


The last step is to print out the shape of our table.

In [123]:
df.shape

(103, 3)

Here I used the code snippet shown in the exercise, but changed the provider in order to get the latitude and longitude.

In [114]:
import geocoder  # third-party geocoding package

latitude = []
longitude = []

for i in range(0,df.shape[0]):
    # initialize your variable to None
    lat_lng_coords = None

    postal_code = df.iloc[i]["Postal Code"]

    # loop until you get the coordinates
    while lat_lng_coords is None:
        g = geocoder.geocodefarm('Ontario {}, Toronto'.format(postal_code))
        lat_lng_coords = g.latlng

    latitude.append(lat_lng_coords[0])
    longitude.append(lat_lng_coords[1])


Status code Unknown from https://www.geocode.farm/v3/json/forward/: ERROR - HTTPSConnectionPool(host='www.geocode.farm', port=443): Read timed out. (read timeout=5.0)
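Because the loop above retries until a result arrives, a provider that keeps timing out would stall it indefinitely. A minimal sketch of a bounded retry follows; the helper `geocode_with_retries`, its parameters, and the stand-in `flaky_lookup` with its placeholder coordinates are all hypothetical and not part of the `geocoder` package.

```python
import time

def geocode_with_retries(lookup, query, max_attempts=5, delay_seconds=0.0):
    """Call lookup(query) until it returns coordinates or attempts run out."""
    for _ in range(max_attempts):
        coords = lookup(query)
        if coords is not None:
            return coords
        time.sleep(delay_seconds)  # pause before the next attempt
    return None  # give up instead of looping forever

# Stand-in lookup that fails twice, then returns placeholder coordinates.
calls = {"count": 0}

def flaky_lookup(query):
    calls["count"] += 1
    return None if calls["count"] < 3 else [43.75, -79.33]

result = geocode_with_retries(flaky_lookup, "Ontario M3A, Toronto")
print(result, calls["count"])
```

With the real provider, `lookup` could be a small function such as `lambda q: geocoder.geocodefarm(q).latlng`, and postal codes that still return `None` could be collected and retried later.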

We then have to join the new information with the previous dataframe.

In [124]:
df_lat_lng = df.join(pd.DataFrame({"Latitude" : latitude, "Longitude" : longitude}))
df_lat_lng

,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.756123,-79.329636
1,M4A,North York,Victoria Village,43.726780,-79.310738
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.655354,-79.365044
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.721996,-79.445915
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.663910,-79.388733
...,...,...,...,...,...
98,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North",43.652699,-79.511276
99,M4Y,Downtown Toronto,Church and Wellesley,43.666286,-79.382446
100,M7Y,East Toronto,"Business reply mail Processing Centre, South C...",43.663506,-79.317429
101,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu...",43.633709,-79.496521
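Note that `DataFrame.join` aligns rows by index label, which is why the earlier `reset_index(drop=True)` matters: the coordinate lists get a default 0..n-1 index when wrapped in a dataframe. The sketch below uses made-up frames (`left`, `coords`, `shifted` and their values are illustrative only) to show what happens when the indexes do and do not match.

```python
import pandas as pd

left = pd.DataFrame({"Postal Code": ["M3A", "M4A"]})  # default index 0, 1
coords = pd.DataFrame({"Latitude": [1.0, 2.0], "Longitude": [-1.0, -2.0]})

# Matching index labels: each row receives its coordinates.
joined = left.join(coords)

# Non-matching index labels (as after filtering without reset_index):
# join finds no partner rows and fills the new columns with NaN.
shifted = left.copy()
shifted.index = [5, 7]
misaligned = shifted.join(coords)

print(joined)
print(misaligned)
```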
