Importing pandas library

In [1]:
import pandas as pd

Installing lxml, the HTML parser that pandas uses by default to read tables from a web page such as the Wikipedia page.

In [4]:
pip install lxml

Collecting lxml
  Downloading https://files.pythonhosted.org/packages/ec/be/5ab8abdd8663c0386ec2dd595a5bc0e23330a0549b8a91e32f38c20845b6/lxml-4.4.1-cp36-cp36m-manylinux1_x86_64.whl (5.8MB)
Installing collected packages: lxml
Successfully installed lxml-4.4.1
Note: you may need to restart the kernel to use updated packages.


In [2]:
import lxml

Using pandas to retrieve tables from Wikipedia

In [3]:
url = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
tables = pd.read_html(url)

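Selecting the first table on the page, which holds the postal codes. pd.read_html already returns every table as a pandas DataFrame, so no conversion is needed.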

In [4]:
df = tables[0]
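
Because pd.read_html returns a list with one DataFrame per table it finds, tables[0] is only correct while the postal code table is the first table on the page. A minimal sketch of a more defensive selection follows. It assumes a hypothetical helper, pick_table, which is not part of pandas, and it uses a toy list in place of the real result of pd.read_html(url).

```python
import pandas as pd


def pick_table(tables, required_columns):
    """Return the first DataFrame whose columns include all required_columns.

    Hypothetical helper for illustration; it is not part of pandas.
    """
    required = set(required_columns)
    for table in tables:
        if required.issubset(table.columns):
            return table
    raise ValueError(f"no table with columns {sorted(required)}")


# Toy stand-ins for the list returned by pd.read_html(url).
tables_demo = [
    pd.DataFrame({"Note": ["unrelated table"]}),
    pd.DataFrame({
        "Postcode": ["M1A", "M3A"],
        "Borough": ["Not assigned", "North York"],
        "Neighbourhood": ["Not assigned", "Parkwoods"],
    }),
]

# Select by column names instead of by position in the list.
df_demo = pick_table(tables_demo, ["Postcode", "Borough", "Neighbourhood"])
```

Selecting by column names fails loudly with a ValueError if the page layout changes, instead of silently continuing with the wrong table.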

In [5]:
df.head()

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront


Dropping rows whose Borough is 'Not assigned'.

In [6]:
df = df[df.Borough != 'Not assigned']

In [7]:
df

Unnamed: 0,Postcode,Borough,Neighbourhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront
5,M5A,Downtown Toronto,Regent Park
6,M6A,North York,Lawrence Heights
...,...,...,...
282,M8Z,Etobicoke,Kingsway Park South West
283,M8Z,Etobicoke,Mimico NW
284,M8Z,Etobicoke,The Queensway West
285,M8Z,Etobicoke,Royal York South West
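
The filter above keeps the original row labels, which is why the output starts at index 2. A small sketch on toy data (the frame df_demo and the names keep, filtered and filtered_reset are made up for illustration) shows the boolean mask and an optional index reset:

```python
import pandas as pd

df_demo = pd.DataFrame({
    "Postcode": ["M1A", "M3A", "M4A"],
    "Borough": ["Not assigned", "North York", "North York"],
    "Neighbourhood": ["Not assigned", "Parkwoods", "Victoria Village"],
})

# Boolean mask: True for the rows to keep.
keep = df_demo["Borough"] != "Not assigned"
filtered = df_demo[keep]

# The surviving rows keep their original labels (1 and 2 here).
# reset_index(drop=True) renumbers them from 0 if that is preferred.
filtered_reset = filtered.reset_index(drop=True)
```

In this notebook the reset is not needed at this point, because the later groupby and reset_index steps produce a fresh index anyway.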


Grouping by Postcode and Borough, then joining the Neighbourhood values of each group into a single comma-separated string.

In [8]:
df_toronto = df.groupby(['Postcode', 'Borough'], sort=False).agg(lambda x: ', '.join(x))

In [9]:
df_toronto

Postcode,Borough,Neighbourhood
M3A,North York,Parkwoods
M4A,North York,Victoria Village
M5A,Downtown Toronto,"Harbourfront, Regent Park"
M6A,North York,"Lawrence Heights, Lawrence Manor"
M7A,Queen's Park,Not assigned
...,...,...
M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North"
M4Y,Downtown Toronto,Church and Wellesley
M7Y,East Toronto,Business Reply Mail Processing Centre 969 Eastern
M8Y,Etobicoke,"Humber Bay, King's Mill Park, Kingsway Park So..."


Resetting the index so that Postcode and Borough become regular columns again. Borough has to be a column before it can be copied into the Neighbourhood values that are 'Not assigned'.

In [10]:
df_toronto.reset_index(inplace=True)

In [11]:
df_toronto

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Harbourfront, Regent Park"
3,M6A,North York,"Lawrence Heights, Lawrence Manor"
4,M7A,Queen's Park,Not assigned
...,...,...,...
98,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North"
99,M4Y,Downtown Toronto,Church and Wellesley
100,M7Y,East Toronto,Business Reply Mail Processing Centre 969 Eastern
101,M8Y,Etobicoke,"Humber Bay, King's Mill Park, Kingsway Park So..."
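
The grouping and the index reset can be reproduced on a toy frame. This is a sketch with made-up variable names (df_demo, grouped). It selects the Neighbourhood column explicitly before aggregating, a small variation on the cell above that makes clear which column is being joined.

```python
import pandas as pd

df_demo = pd.DataFrame({
    "Postcode": ["M5A", "M5A", "M3A"],
    "Borough": ["Downtown Toronto", "Downtown Toronto", "North York"],
    "Neighbourhood": ["Harbourfront", "Regent Park", "Parkwoods"],
})

grouped = (
    df_demo
    # sort=False keeps the groups in order of first appearance.
    .groupby(["Postcode", "Borough"], sort=False)["Neighbourhood"]
    # Join the neighbourhood names of each group into one string.
    .agg(", ".join)
    # Turn the (Postcode, Borough) index back into regular columns.
    .reset_index()
)
```

After reset_index the result has the three plain columns Postcode, Borough and Neighbourhood, with one row per postal code.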


Assigning Borough values to the 'Not assigned' Neighbourhood values also affects Postcode, so I separate the frame into two DataFrames and concatenate them later.

In [12]:
df_torontosep = df_toronto[['Borough','Neighbourhood']]
df_torontosep


Unnamed: 0,Borough,Neighbourhood
0,North York,Parkwoods
1,North York,Victoria Village
2,Downtown Toronto,"Harbourfront, Regent Park"
3,North York,"Lawrence Heights, Lawrence Manor"
4,Queen's Park,Not assigned
...,...,...
98,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North"
99,Downtown Toronto,Church and Wellesley
100,East Toronto,Business Reply Mail Processing Centre 969 Eastern
101,Etobicoke,"Humber Bay, King's Mill Park, Kingsway Park So..."


In [13]:
df_torontosep = df_torontosep.copy()  # independent copy, avoids SettingWithCopyWarning
mask = df_torontosep['Neighbourhood'] == 'Not assigned'
df_torontosep.loc[mask, 'Neighbourhood'] = df_torontosep.loc[mask, 'Borough']
df_torontosep


Unnamed: 0,Borough,Neighbourhood
0,North York,Parkwoods
1,North York,Victoria Village
2,Downtown Toronto,"Harbourfront, Regent Park"
3,North York,"Lawrence Heights, Lawrence Manor"
4,Queen's Park,Queen's Park
...,...,...
98,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North"
99,Downtown Toronto,Church and Wellesley
100,East Toronto,Business Reply Mail Processing Centre 969 Eastern
101,Etobicoke,"Humber Bay, King's Mill Park, Kingsway Park So..."


In [14]:
df_toronto.drop(['Borough','Neighbourhood'], axis=1, inplace = True)
df_toronto

Unnamed: 0,Postcode
0,M3A
1,M4A
2,M5A
3,M6A
4,M7A
...,...
98,M8X
99,M4Y
100,M7Y
101,M8Y


In [15]:
df_toronto = pd.concat([df_toronto, df_torontosep], axis=1)
df_toronto

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Harbourfront, Regent Park"
3,M6A,North York,"Lawrence Heights, Lawrence Manor"
4,M7A,Queen's Park,Queen's Park
...,...,...,...
98,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North"
99,M4Y,Downtown Toronto,Church and Wellesley
100,M7Y,East Toronto,Business Reply Mail Processing Centre 969 Eastern
101,M8Y,Etobicoke,"Humber Bay, King's Mill Park, Kingsway Park So..."
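
Postcode is affected in the one-frame approach because .loc with only a row mask writes to every column of the selected rows. A minimal sketch on toy data (df_demo and not_assigned are made-up names) shows that naming the target column in .loc limits the assignment to Neighbourhood, which would make the split and the later concatenation unnecessary:

```python
import pandas as pd

df_demo = pd.DataFrame({
    "Postcode": ["M6A", "M7A"],
    "Borough": ["North York", "Queen's Park"],
    "Neighbourhood": ["Lawrence Heights, Lawrence Manor", "Not assigned"],
})

# Rows whose Neighbourhood still holds the placeholder value.
not_assigned = df_demo["Neighbourhood"] == "Not assigned"

# Passing the column label as the second argument restricts the write to
# Neighbourhood, so Postcode and Borough are left untouched.
df_demo.loc[not_assigned, "Neighbourhood"] = df_demo.loc[not_assigned, "Borough"]
```

The result matches the concatenated frame above for these rows: M7A ends up with Queen's Park in both Borough and Neighbourhood.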


Total number of rows and columns, respectively.

In [16]:
df_toronto.shape

(103, 3)

In [18]:
# Load the latitude and longitude of each postal code.
df_toronto_loc = pd.read_csv('http://cocl.us/Geospatial_data')
df_toronto_loc.head(5)

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [20]:
# Use the same column name as the geospatial data so the two frames can be merged on it.
df_toronto.rename(columns={'Postcode':'Postal Code'}, inplace=True)

In [21]:
df_toronto

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Harbourfront, Regent Park"
3,M6A,North York,"Lawrence Heights, Lawrence Manor"
4,M7A,Queen's Park,Queen's Park
...,...,...,...
98,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North"
99,M4Y,Downtown Toronto,Church and Wellesley
100,M7Y,East Toronto,Business Reply Mail Processing Centre 969 Eastern
101,M8Y,Etobicoke,"Humber Bay, King's Mill Park, Kingsway Park So..."


In [25]:
# Inner join (the pd.merge default) on the shared 'Postal Code' column.
df_torontofull = pd.merge(df_toronto, df_toronto_loc, on='Postal Code')
df_torontofull


Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Harbourfront, Regent Park",43.654260,-79.360636
3,M6A,North York,"Lawrence Heights, Lawrence Manor",43.718518,-79.464763
4,M7A,Queen's Park,Queen's Park,43.662301,-79.389494
...,...,...,...,...,...
98,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North",43.653654,-79.506944
99,M4Y,Downtown Toronto,Church and Wellesley,43.665860,-79.383160
100,M7Y,East Toronto,Business Reply Mail Processing Centre 969 Eastern,43.662744,-79.321558
101,M8Y,Etobicoke,"Humber Bay, King's Mill Park, Kingsway Park So...",43.636258,-79.498509
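
pd.merge performs an inner join by default, so a postal code with no entry in the coordinates file would be dropped without any warning. The sketch below, on toy data, shows one way to check for that with a left join plus the indicator and validate parameters of pd.merge. The frames left and right and the postal code M9Z are hypothetical and only serve the illustration. The coordinates are copied from the output above.

```python
import pandas as pd

left = pd.DataFrame({
    "Postal Code": ["M3A", "M4A", "M9Z"],  # M9Z is a made-up code with no coordinates
    "Borough": ["North York", "North York", "Hypothetical"],
})
right = pd.DataFrame({
    "Postal Code": ["M3A", "M4A"],
    "Latitude": [43.753259, 43.725882],
    "Longitude": [-79.329656, -79.315572],
})

merged = pd.merge(
    left, right,
    on="Postal Code",
    how="left",             # keep every row of the left frame
    validate="one_to_one",  # raise if a postal code is duplicated on either side
    indicator=True,         # add a _merge column recording where each row came from
)

# Postal codes that found no match in the coordinates frame.
unmatched = merged.loc[merged["_merge"] == "left_only", "Postal Code"].tolist()
```

An empty unmatched list would confirm that the inner join used above loses no rows.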
