# 1. Importing data from a webpage
The following lines import all the tables from the page and count them.

In [21]:
import pandas as pd
# read_html returns a list with one DataFrame per <table> element on the page
df_CP = pd.read_html('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M')
print('There are', len(df_CP), 'tables in the Wikipedia page')

There are 3 tables in the Wikipedia page


Inspecting the web page shows that the postal code table is the first one.
The following lines keep only that first table in the DataFrame df_CP and display it:

In [22]:
df_CP = df_CP[0]
df_CP.head(12)

|   | Postal Code | Borough | Neighborhood |
|---|---|---|---|
| 0 | M1A | Not assigned | |
| 1 | M2A | Not assigned | |
| 2 | M3A | North York | Parkwoods |
| 3 | M4A | North York | Victoria Village |
| 4 | M5A | Downtown Toronto | Regent Park, Harbourfront |
| 5 | M6A | North York | Lawrence Manor, Lawrence Heights |
| 6 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government |
| 7 | M8A | Not assigned | |
| 8 | M9A | Etobicoke | Islington Avenue |
| 9 | M1B | Scarborough | Malvern, Rouge |
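Selecting the table by position works as long as the page layout does not change. A slightly more robust option is to pick the table by its columns. The sketch below assumes a hypothetical helper `find_table` (not part of the original notebook) and uses two small stand-in DataFrames instead of a live `pd.read_html` call, with the postal code table deliberately placed second:

```python
import pandas as pd


def find_table(tables, required_column):
    """Return the first DataFrame in `tables` that has `required_column`."""
    for table in tables:
        if required_column in table.columns:
            return table
    raise ValueError(f'no table with a {required_column!r} column')


# Stand-ins for the list returned by pd.read_html (rows copied from the output above)
tables = [
    pd.DataFrame({'Other': [1, 2]}),
    pd.DataFrame({'Postal Code': ['M1A', 'M3A'],
                  'Borough': ['Not assigned', 'North York'],
                  'Neighborhood': [None, 'Parkwoods']}),
]

postal_table = find_table(tables, 'Postal Code')
print(postal_table.shape)
```

The lookup fails loudly with a `ValueError` instead of silently returning the wrong table if the page structure changes.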


### Cleaning data
Let's see how the 'Borough' and 'Neighborhood' columns are organized by counting both the 'Not assigned' entries and the missing values in each:

In [6]:
# Count 'Not assigned' entries and missing values (NaN) in each column
Not_assigned_Neighborhood = (df_CP['Neighborhood'] == 'Not assigned').sum()
NaN_Neighborhood = df_CP['Neighborhood'].isna().sum()
Not_assigned_Borough = (df_CP['Borough'] == 'Not assigned').sum()
NaN_Borough = df_CP['Borough'].isna().sum()

print('There are', Not_assigned_Neighborhood, "'Not assigned' neighborhoods and", Not_assigned_Borough, "'Not assigned' boroughs")
print('There are', NaN_Neighborhood, 'missing values in the Neighborhood column and', NaN_Borough, 'in the Borough column')

There are 0 'Not assigned' neighborhoods and 77 'Not assigned' boroughs
There are 77 missing values in the Neighborhood column and 0 in the Borough column


From these results we can hypothesize that every row with a 'Not assigned' borough also has a missing value in the 'Neighborhood' column, since both counts equal 77. Let's drop the rows that contain missing data:

In [23]:
df_CP = df_CP.dropna()
df_CP.head(12)

|    | Postal Code | Borough | Neighborhood |
|---|---|---|---|
| 2 | M3A | North York | Parkwoods |
| 3 | M4A | North York | Victoria Village |
| 4 | M5A | Downtown Toronto | Regent Park, Harbourfront |
| 5 | M6A | North York | Lawrence Manor, Lawrence Heights |
| 6 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government |
| 8 | M9A | Etobicoke | Islington Avenue |
| 9 | M1B | Scarborough | Malvern, Rouge |
| 11 | M3B | North York | Don Mills |
| 12 | M4B | East York | Parkview Hill, Woodbine Gardens |
| 13 | M5B | Downtown Toronto | Garden District, Ryerson |
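The matching counts only suggest that the 'Not assigned' boroughs and the missing neighborhoods are on the same rows. The sketch below checks this row by row on a few rows copied from the raw table at the top of this section (not on the full data). It also compares `dropna()`, which drops a row with a NaN in *any* column, with an explicit filter on the Borough column, which states the intent directly:

```python
import pandas as pd

# A few rows copied from the raw table above; None marks a missing Neighborhood
raw = pd.DataFrame({
    'Postal Code': ['M1A', 'M2A', 'M3A', 'M4A', 'M7A', 'M8A', 'M9A'],
    'Borough': ['Not assigned', 'Not assigned', 'North York', 'North York',
                'Downtown Toronto', 'Not assigned', 'Etobicoke'],
    'Neighborhood': [None, None, 'Parkwoods', 'Victoria Village',
                     "Queen's Park, Ontario Provincial Government", None,
                     'Islington Avenue'],
})

# Hypothesis: 'Not assigned' boroughs are exactly the rows with a missing Neighborhood
not_assigned = raw['Borough'] == 'Not assigned'
missing = raw['Neighborhood'].isna()
print('Hypothesis holds:', (not_assigned == missing).all())

# dropna() removes rows with a NaN anywhere; the filter only looks at Borough
by_dropna = raw.dropna()
by_filter = raw[raw['Borough'] != 'Not assigned']
print('Same rows kept:', by_dropna.index.equals(by_filter.index))
```

When the hypothesis holds, both approaches keep the same rows. If it failed on the real data, the two would diverge and `dropna()` might remove rows that have a valid borough.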


Let's now check whether any 'Not assigned' values or missing data remain after dropping those rows:

In [8]:
# Repeat the same counts on the cleaned DataFrame
Not_assigned_Neighborhood = (df_CP['Neighborhood'] == 'Not assigned').sum()
NaN_Neighborhood = df_CP['Neighborhood'].isna().sum()
Not_assigned_Borough = (df_CP['Borough'] == 'Not assigned').sum()
NaN_Borough = df_CP['Borough'].isna().sum()

print('There are', Not_assigned_Neighborhood, "'Not assigned' neighborhoods and", Not_assigned_Borough, "'Not assigned' boroughs after processing")
print('There are', NaN_Neighborhood, 'missing values in the Neighborhood column and', NaN_Borough, 'in the Borough column after processing')

There are 0 'Not assigned' neighborhoods and 0 'Not assigned' boroughs after processing
There are 0 missing values in the Neighborhood column and 0 in the Borough column after processing
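The counting code appears twice, once before and once after cleaning. A hypothetical helper such as `missing_report` (not part of the original notebook) could produce both sets of counts for every column in a single call. Here it is applied to a small sample copied from the raw table:

```python
import pandas as pd


def missing_report(df, placeholder='Not assigned'):
    """Per-column counts of a placeholder string and of missing values (NaN/None)."""
    return pd.DataFrame({
        placeholder: (df == placeholder).sum(),  # exact matches of the placeholder
        'missing': df.isna().sum(),              # NaN or None entries
    })


# Sample rows copied from the raw table (before dropna)
sample = pd.DataFrame({
    'Borough': ['Not assigned', 'North York', 'Not assigned'],
    'Neighborhood': [None, 'Parkwoods', None],
})
print(missing_report(sample))
```

Calling `missing_report(df_CP)` before and after `dropna()` would replace both counting cells.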


The DataFrame now contains no missing data and no 'Not assigned' boroughs. Let's check how many rows it has:

In [9]:
size = df_CP.shape
print('There are', size[0], 'rows in the data frame now')

There are 103 rows in the data frame now
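The row count means one row per postal code only if no code appears twice. This also matters for the merge with the coordinates file in part 2, because a repeated code would produce duplicated rows there. The sketch below shows a quick uniqueness check on a few rows copied from the cleaned table. On the real data, the same two lines would run on `df_CP`:

```python
import pandas as pd

# Rows copied from the cleaned table above
cleaned = pd.DataFrame({
    'Postal Code': ['M3A', 'M4A', 'M5A'],
    'Borough': ['North York', 'North York', 'Downtown Toronto'],
    'Neighborhood': ['Parkwoods', 'Victoria Village', 'Regent Park, Harbourfront'],
})

# True when every postal code appears exactly once
print('Postal codes unique:', cleaned['Postal Code'].is_unique)

# keep=False marks every occurrence of a repeated code, so nothing is hidden
duplicates = cleaned[cleaned['Postal Code'].duplicated(keep=False)]
print(len(duplicates), 'rows share a postal code')
```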


# 2. Getting the coordinates of each postal code

In [10]:
!conda install -c conda-forge geocoder --yes 


I wanted to try the geocoder package, but every request returned 'Request Denied' (most likely because Google's geocoding service requires an API key):

In [24]:

import geocoder  # third-party package installed above

# start with no coordinates
lat_lng_coords = None

# loop until the geocoder returns coordinates
# (this never ends if every request is denied)
while lat_lng_coords is None:
    g = geocoder.google('{}, Toronto, Ontario'.format('M5G'))
    lat_lng_coords = g.latlng
    print(g)

latitude = lat_lng_coords[0]
longitude = lat_lng_coords[1]

Because the `while` loop never ends when every request is denied, I interrupted it and used the provided CSV file of coordinates instead.
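An unbounded `while` loop keeps retrying forever when the service refuses every request. A minimal sketch of a bounded alternative is shown below. `geocode_with_retries` is a hypothetical helper, and `lookup` is any callable that returns a `[lat, lng]` list or `None`. With the real package this could be something like `lambda q: geocoder.google(q, key=API_KEY).latlng`, where passing an API key this way is an assumption to check against the geocoder documentation. A fake lookup that always refuses stands in for the 'Request Denied' case:

```python
import time


def geocode_with_retries(lookup, query, max_attempts=5, delay_seconds=0.0):
    """Call lookup(query) until it returns coordinates or max_attempts is reached."""
    for _ in range(max_attempts):
        coords = lookup(query)
        if coords is not None:
            return coords
        time.sleep(delay_seconds)  # optional pause before the next attempt
    return None  # give up instead of looping forever


# Stand-in geocoder that always refuses, like the 'Request Denied' case above
def always_denied(query):
    return None


print(geocode_with_retries(always_denied, 'M5G, Toronto, Ontario'))
```

Returning `None` after a fixed number of attempts lets the notebook fall back to another data source, such as the CSV file, without a manual interrupt.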

In [13]:
!wget -q -O 'location_data.csv' http://cocl.us/Geospatial_data  # download the data and save it as 'location_data.csv'
df_coord = pd.read_csv('location_data.csv')                     # read the CSV file
DF_Located = pd.merge(df_CP, df_coord, on='Postal Code')        # add the coordinates by matching on the 'Postal Code' column

Let's check the merged data:

In [15]:
DF_Located.head(12)

|   | Postal Code | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M3A | North York | Parkwoods | 43.753259 | -79.329656 |
| 1 | M4A | North York | Victoria Village | 43.725882 | -79.315572 |
| 2 | M5A | Downtown Toronto | Regent Park, Harbourfront | 43.65426 | -79.360636 |
| 3 | M6A | North York | Lawrence Manor, Lawrence Heights | 43.718518 | -79.464763 |
| 4 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government | 43.662301 | -79.389494 |
| 5 | M9A | Etobicoke | Islington Avenue | 43.667856 | -79.532242 |
| 6 | M1B | Scarborough | Malvern, Rouge | 43.806686 | -79.194353 |
| 7 | M3B | North York | Don Mills | 43.745906 | -79.352188 |
| 8 | M4B | East York | Parkview Hill, Woodbine Gardens | 43.706397 | -79.309937 |
| 9 | M5B | Downtown Toronto | Garden District, Ryerson | 43.657162 | -79.378937 |
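`pd.merge` performs an inner join by default, so a postal code that is missing from the CSV file would silently disappear from `DF_Located`. The sketch below uses small frames copied from the outputs above, with M5A deliberately left out of the coordinates. It shows one way to surface such gaps with `how='left'` and `indicator=True`. `validate='one_to_one'` additionally raises an error if either side repeats a postal code:

```python
import pandas as pd

# Rows copied from the outputs above
postal = pd.DataFrame({
    'Postal Code': ['M3A', 'M4A', 'M5A'],
    'Borough': ['North York', 'North York', 'Downtown Toronto'],
})
coords = pd.DataFrame({
    'Postal Code': ['M3A', 'M4A'],  # M5A left out on purpose to show an unmatched row
    'Latitude': [43.753259, 43.725882],
    'Longitude': [-79.329656, -79.315572],
})

# how='left' keeps every postal code; indicator=True adds a '_merge' column
merged = pd.merge(postal, coords, on='Postal Code', how='left',
                  indicator=True, validate='one_to_one')
unmatched = merged.loc[merged['_merge'] != 'both', 'Postal Code']
print('Postal codes without coordinates:', list(unmatched))
```

On the real data, an empty `unmatched` list would confirm that every one of the cleaned postal codes received coordinates.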
