## Importing the data from Wikipedia

In [1]:
import pandas as pd
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
df_list = pd.read_html(url)

In [2]:
len(df_list)

3

In [3]:
df_list

[    Postal Code           Borough  \
 0           M1A      Not assigned   
 1           M2A      Not assigned   
 2           M3A        North York   
 3           M4A        North York   
 4           M5A  Downtown Toronto   
 ..          ...               ...   
 175         M5Z      Not assigned   
 176         M6Z      Not assigned   
 177         M7Z      Not assigned   
 178         M8Z         Etobicoke   
 179         M9Z      Not assigned   
 
                                          Neighbourhood  
 0                                         Not assigned  
 1                                         Not assigned  
 2                                            Parkwoods  
 3                                     Victoria Village  
 4                            Regent Park, Harbourfront  
 ..                                                 ...  
 175                                       Not assigned  
 176                                       Not assigned  
 177                

The list above contains three DataFrames, one for each table found on the page. This assignment only needs the first table, so the first element of the list is saved under a new name.

In [4]:
Cdf_list = df_list[0]
Cdf_list

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
...,...,...,...
175,M5Z,Not assigned,Not assigned
176,M6Z,Not assigned,Not assigned
177,M7Z,Not assigned,Not assigned
178,M8Z,Etobicoke,"Mimico NW, The Queensway West, South of Bloor,..."
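
Selecting the table by position (`df_list[0]`) works as long as the page layout stays the same. A minimal sketch of a more defensive selection, assuming the wanted table is the one that has the `Postal Code`, `Borough`, and `Neighbourhood` columns. The helper `pick_table` and the sample frames are hypothetical stand-ins for `df_list`, not part of the original notebook:

```python
import pandas as pd

# Hypothetical stand-ins for the list returned by pd.read_html(url).
sample_tables = [
    pd.DataFrame({"Postal Code": ["M1A", "M3A"],
                  "Borough": ["Not assigned", "North York"],
                  "Neighbourhood": ["Not assigned", "Parkwoods"]}),
    pd.DataFrame({"0": ["navigation"]}),
    pd.DataFrame({"0": ["footer"]}),
]

def pick_table(tables, required=("Postal Code", "Borough", "Neighbourhood")):
    """Return the first table that contains every required column."""
    for table in tables:
        if set(required).issubset(table.columns):
            return table
    raise ValueError("no table with the expected columns was found")

postal_table = pick_table(sample_tables)
print(postal_table.shape)  # (2, 3)
```

If the column names on the page change, this raises an error instead of silently picking the wrong table.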


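The selected table is then stored as the DataFrame `df`. `pd.read_html` already returns a list of DataFrames, so the `pd.DataFrame` call below does not change its contents.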

In [5]:
df = pd.DataFrame(Cdf_list)
df.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"


In [6]:
df.Neighbourhood.value_counts()

Not assigned                               77
Downsview                                   4
Don Mills                                   2
Glencairn                                   1
Weston                                      1
                                           ..
Woodbine Heights                            1
Islington Avenue, Humber Valley Village     1
Hillcrest Village                           1
Wexford, Maryvale                           1
Runnymede, The Junction North               1
Name: Neighbourhood, Length: 100, dtype: int64

In [7]:
df[(df['Neighbourhood']== 'Not assigned') & (df['Borough'] !='Not assigned')]

Unnamed: 0,Postal Code,Borough,Neighbourhood


There are no rows in the data where the neighbourhood is not assigned but the borough is assigned. In the original data, no postal code appears more than once either. It seems the original data on the wiki page has been changed since this assignment was posted.
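
The two checks described above can be sketched on a small sample. The rows below are hypothetical (the last one is made up to trigger the second check) and the final step, reusing the borough name as the neighbourhood, is only one possible way to handle such rows if they did appear:

```python
import pandas as pd

# Hypothetical sample rows; in the notebook the real frame is `df`.
sample = pd.DataFrame({
    "Postal Code": ["M1A", "M3A", "M5A", "M7A"],
    "Borough": ["Not assigned", "North York",
                "Downtown Toronto", "Downtown Toronto"],
    "Neighbourhood": ["Not assigned", "Parkwoods",
                      "Regent Park, Harbourfront", "Not assigned"],
})

# Check 1: does any postal code appear more than once?
has_repeats = sample["Postal Code"].duplicated().any()

# Check 2: rows that have a borough but no neighbourhood.
missing = (sample["Neighbourhood"] == "Not assigned") & (
    sample["Borough"] != "Not assigned"
)

# One possible fix (an assumption): reuse the borough as the neighbourhood.
sample.loc[missing, "Neighbourhood"] = sample.loc[missing, "Borough"]

print(has_repeats)
print(int(missing.sum()))
```

On the current Wikipedia data both checks come back empty, so the fix would leave `df` unchanged.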

In [8]:
cdf = df.drop(df[df.Borough == "Not assigned"].index)
cdf

Unnamed: 0,Postal Code,Borough,Neighbourhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
...,...,...,...
160,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North"
165,M4Y,Downtown Toronto,Church and Wellesley
168,M7Y,East Toronto,"Business reply mail Processing Centre, South C..."
169,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu..."


In [10]:
cdf.shape

(103, 3)

In [11]:
ldf = pd.read_csv(r'/home/reika/Downloads/Geospatial_Coordinates.csv')
ldf

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476
...,...,...,...
98,M9N,43.706876,-79.518188
99,M9P,43.696319,-79.532242
100,M9R,43.688905,-79.554724
101,M9V,43.739416,-79.588437


## Merging the data with location data

In [17]:
locationdf = pd.merge(cdf,
                 ldf[['Postal Code', 'Latitude', 'Longitude']],
                 on='Postal Code')
locationdf.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
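
`pd.merge` defaults to an inner join, so a postal code that is missing from the coordinates file would be dropped without warning. A minimal sketch, on hypothetical sample frames, of a left join with `indicator=True` that makes such rows visible. The coordinates for `M5A` are deliberately left out of the sample to show the effect:

```python
import pandas as pd

# Hypothetical sample frames standing in for `cdf` and `ldf`.
boroughs = pd.DataFrame({
    "Postal Code": ["M3A", "M4A", "M5A"],
    "Borough": ["North York", "North York", "Downtown Toronto"],
})
coords = pd.DataFrame({
    "Postal Code": ["M3A", "M4A"],
    "Latitude": [43.753259, 43.725882],
    "Longitude": [-79.329656, -79.315572],
})

# Keep every borough row; validate raises if a postal code repeats on either side.
merged = pd.merge(boroughs, coords, on="Postal Code", how="left",
                  validate="one_to_one", indicator=True)

# Rows marked "left_only" found no coordinates.
unmatched = merged.loc[merged["_merge"] == "left_only", "Postal Code"].tolist()
print(unmatched)  # ['M5A']
```

An empty `unmatched` list would confirm that the inner join used above loses no rows.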
