# Mapping Toronto Neighbourhoods
This program scrapes the Wikipedia list of Toronto postal codes (with their boroughs and neighbourhoods), adds longitude and latitude, and then maps the neighbourhoods.

In [348]:
from bs4 import BeautifulSoup
import requests
import pandas as pd
import numpy as np


## Get Data
Use Beautiful Soup to scrape the Wikipedia list of Canadian postal codes beginning with M (the Toronto area), then find the wikitable in the soup and store it in postalTable.

In [349]:
# Download the page and stop with an error if the request failed
page = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M')
page.raise_for_status()
soup = BeautifulSoup(page.text, 'html.parser')
# The postcode table is the first <table> carrying the "wikitable" class
postalTable = soup.find('table', class_='wikitable')
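As a sketch of an alternative to splitting postalTable.text on line breaks, the rows can be read cell by cell with find_all('tr') and find_all('td'). The HTML below is a hypothetical stand-in for the live page (its two data rows are copied from this notebook's printed output) so the snippet runs offline. It assumes the header row uses <th> cells and the data rows use <td> cells; the function name table_rows is illustrative.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the Wikipedia table (not the live page).
SAMPLE_HTML = """
<table class="wikitable sortable">
<tbody>
<tr><th>Postcode</th><th>Borough</th><th>Neighbourhood</th></tr>
<tr><td>M1A</td><td>Not assigned</td><td>Not assigned</td></tr>
<tr><td>M3A</td><td>North York</td><td>Parkwoods</td></tr>
</tbody>
</table>
"""

def table_rows(html):
    """Return each data row of the first wikitable as a list of cell strings."""
    soup = BeautifulSoup(html, 'html.parser')
    # class_ matches any one of the element's classes, so "wikitable sortable" matches
    table = soup.find('table', class_='wikitable')
    rows = []
    for tr in table.find_all('tr'):
        # get_text(strip=True) removes surrounding whitespace such as trailing newlines
        cells = [td.get_text(strip=True) for td in tr.find_all('td')]
        if len(cells) == 3:  # the header row has only <th> cells, so it is skipped
            rows.append(cells)
    return rows

rows = table_rows(SAMPLE_HTML)
print(rows)
```

Reading cells directly avoids depending on the exact number of blank entries between rows in the flattened text.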


## Cleaning Data & Creating DataFrame
Currently, postalTable.text represents each row of the wikitable as five entries separated by line breaks: two blank entries followed by the three column values. The first row holds the column headers rather than data. This section splits the text into a list and walks it with three strided slices (one per column, each stepping five entries at a time and starting after the header row) to gather the data into a three-column list, which is then converted to a DataFrame.

The same loop also cleans the data: it skips postcodes with an unassigned borough, merges consecutive rows that share a postcode into a single comma-separated Neighbourhood entry, and gives unassigned neighbourhoods the name of their borough.
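To see why the slices start at indices 7, 8 and 9 and step by 5, here is a small sketch on a hypothetical string built in the layout described above (each row is two blank entries followed by three values, with the header row first). The string is illustrative, not the live page text.

```python
# Hypothetical text in the layout described above: every row is two blank
# entries followed by its three cell values, and the first row is the header.
text = ("\n\nPostcode\nBorough\nNeighbourhood"
        "\n\n\nM1A\nNot assigned\nNot assigned"
        "\n\n\nM3A\nNorth York\nParkwoods\n")
parts = text.split('\n')

# Indices 0-1 are blank, 2-4 are the headers, 5-6 are blank,
# so the first data row occupies indices 7, 8 and 9.
headers = parts[2:5]

# One slice per column, each stepping over a whole 5-entry row.
rows = list(zip(parts[7::5], parts[8::5], parts[9::5]))
print(headers)
print(rows)
```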

In [350]:
list_postalTable = postalTable.text.split('\n')
length = len(list_postalTable) - 2  # stop before the trailing blank entries
last = None  # postcode of the most recently kept row
rows = []    # [Postcode, Borough, Neighbourhood] lists, converted to a DataFrame below
# Each row spans 5 entries; the first data row starts at index 7, after the header row
for post, bor, neigh in zip(list_postalTable[7:length:5], list_postalTable[8:length:5], list_postalTable[9:length:5]):
    print(f'{post}, {bor}, {neigh}')  # echo each raw row
    if bor == 'Not assigned':
        continue  # skip postcodes without a borough
    if neigh == 'Not assigned':
        neigh = bor  # an unnamed neighbourhood takes its borough's name
    if post == last:
        # same postcode as the previous kept row: append this neighbourhood to it
        rows[-1][2] = f'{rows[-1][2]}, {neigh}'
        continue
    # otherwise this is a new postcode
    rows.append([post, bor, neigh])
    last = post
df_postalTable = pd.DataFrame(rows, columns=['Postcode', 'Borough', 'Neighbourhood'])
df_postalTable

M1A, Not assigned, Not assigned
M2A, Not assigned, Not assigned
M3A, North York, Parkwoods
M4A, North York, Victoria Village
M5A, Downtown Toronto, Harbourfront
M5A, Downtown Toronto, Regent Park
M6A, North York, Lawrence Heights
M6A, North York, Lawrence Manor
M7A, Queen's Park, Not assigned
M8A, Not assigned, Not assigned
M9A, Etobicoke, Islington Avenue
M1B, Scarborough, Rouge
M1B, Scarborough, Malvern
M2B, Not assigned, Not assigned
M3B, North York, Don Mills North
M4B, East York, Woodbine Gardens
M4B, East York, Parkview Hill
M5B, Downtown Toronto, Ryerson
M5B, Downtown Toronto, Garden District
M6B, North York, Glencairn
M7B, Not assigned, Not assigned
M8B, Not assigned, Not assigned
M9B, Etobicoke, Cloverdale
M9B, Etobicoke, Islington
M9B, Etobicoke, Martin Grove
M9B, Etobicoke, Princess Gardens
M9B, Etobicoke, West Deane Park
M1C, Scarborough, Highland Creek
M1C, Scarborough, Rouge Hill
M1C, Scarborough, Port Union
M2C, Not assigned, Not assigned
M3C, North York, Flemingdon Park

| | Postcode | Borough | Neighbourhood |
|---|---|---|---|
| 0 | M3A | North York | Parkwoods |
| 1 | M4A | North York | Victoria Village |
| 2 | M5A | Downtown Toronto | Harbourfront, Regent Park |
| 3 | M6A | North York | Lawrence Heights, Lawrence Manor |
| 4 | M7A | Queen's Park | Queen's Park |
| ... | ... | ... | ... |
| 98 | M8X | Etobicoke | The Kingsway, Montgomery Road, Old Mill North |
| 99 | M4Y | Downtown Toronto | Church and Wellesley |
| 100 | M7Y | East Toronto | Business Reply Mail Processing Centre 969 Eastern |
| 101 | M8Y | Etobicoke | Humber Bay, King's Mill Park, Kingsway Park So... |
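The loop above only merges a row into the previous one when their postcodes match. A sketch of the same three cleaning steps using pandas operations might look like the following; unlike the loop, groupby merges every row sharing a postcode wherever it appears. The four sample rows are copied from the printed output above, the variable names are illustrative, and grouping on both Postcode and Borough assumes each postcode belongs to a single borough.

```python
import pandas as pd

# Sample rows copied from the printed output above.
raw = pd.DataFrame(
    [['M1A', 'Not assigned', 'Not assigned'],
     ['M5A', 'Downtown Toronto', 'Harbourfront'],
     ['M5A', 'Downtown Toronto', 'Regent Park'],
     ['M7A', "Queen's Park", 'Not assigned']],
    columns=['Postcode', 'Borough', 'Neighbourhood'],
)

# 1. Drop postcodes with no borough (copy to avoid modifying a view).
cleaned = raw[raw['Borough'] != 'Not assigned'].copy()

# 2. Unnamed neighbourhoods take their borough's name.
mask = cleaned['Neighbourhood'] == 'Not assigned'
cleaned.loc[mask, 'Neighbourhood'] = cleaned.loc[mask, 'Borough']

# 3. One row per postcode, neighbourhoods joined with ", ";
#    sort=False keeps postcodes in order of first appearance.
grouped = (cleaned.groupby(['Postcode', 'Borough'], sort=False)['Neighbourhood']
           .agg(', '.join)
           .reset_index())
print(grouped)
```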
