<p style="font-size:36px"><b>Coursera Capstone Project</b></p>
<p style="font-size:20px">This notebook contains the code for the capstone project of the IBM Data Science course on Coursera.</p>

<p style="font-size:24px"><b>Week 3: Exercise 1</b></p>
<p>The next cells build the initial dataframe of Toronto neighborhoods by scraping the Wikipedia list of postal codes.</p>

In [1]:
import pandas as pd
import numpy as np
import requests
from bs4 import BeautifulSoup

In [2]:
toronto_table = []

# Download the HTML page from Wikipedia
html = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M')
html.raise_for_status()  # stop early if the download failed
soup = BeautifulSoup(html.text, 'html.parser')

# Find the first table on the page and collect one row per assigned postal code,
# skipping the cells marked 'Not assigned'
table = soup.find('table')
for cell in table.find_all('td'):
    # Each cell holds the postal code in <b> and 'Borough(Neighborhood / ...)' in <span>
    postal_code = cell.find('b').text
    cell_text = cell.find('span').text
    borough = cell_text.split('(')[0]
    if borough != 'Not assigned':
        neighborhood = cell_text.split('(')[1]
        neighborhood = neighborhood.replace(' /', ',').strip(')').replace(')', ' ')
        toronto_table.append([postal_code, borough, neighborhood])

toronto_df = pd.DataFrame(toronto_table, columns=['PostalCode', 'Borough', 'Neighborhood'])

# Clean the Borough cells with wrong formatting (extra text fused onto the borough name)

toronto_df['Borough']=toronto_df['Borough'].replace({'Downtown TorontoStn A PO Boxes25 The Esplanade':'Downtown Toronto Stn A',
                                                     'East TorontoBusiness reply mail Processing Centre969 Eastern':'East Toronto Business',
                                                     'EtobicokeNorthwest':'Etobicoke Northwest','East YorkEast Toronto':'East York/East Toronto',
                                                     'MississaugaCanada Post Gateway Processing Centre':'Mississauga'})

toronto_df.head(10)

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Queen's Park,Ontario Provincial Government
5,M9A,Etobicoke,Islington Avenue
6,M1B,Scarborough,"Malvern, Rouge"
7,M3B,North York,Don Mills North
8,M4B,East York,"Parkview Hill, Woodbine Gardens"
9,M5B,Downtown Toronto,"Garden District, Ryerson"
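
<p>To make the string handling in the scraping loop easier to follow, here is a minimal sketch of the same split-and-clean steps applied to a single cell text. The helper name <code>parse_cell</code> and the sample string are hypothetical, and the sketch assumes the cell text has the form <code>Borough(Neighborhood / Neighborhood)</code>, which is what the loop's <code>split('(')</code> and <code>replace(' /', ',')</code> calls imply.</p>

```python
def parse_cell(cell_text):
    """Split 'Borough(Neigh A / Neigh B)' into (borough, 'Neigh A, Neigh B').

    Hypothetical helper mirroring the string operations in the scraping loop.
    Returns None for cells whose borough is 'Not assigned'.
    """
    borough = cell_text.split('(')[0]
    if borough == 'Not assigned':
        return None
    neighborhood = cell_text.split('(')[1]
    # ' /' separates neighborhoods on the page; turn it into ', ' and drop the closing ')'
    neighborhood = neighborhood.replace(' /', ',').strip(')').replace(')', ' ')
    return borough, neighborhood


# Made-up sample string in the assumed format
sample = 'North York(Lawrence Manor / Lawrence Heights)'
result = parse_cell(sample)
print(result)
```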


<p style="font-size:24px"><b>Week 3: Exercise 2</b></p>
<p>The next cells add the latitude and longitude of each Toronto neighborhood using the geocoder package.</p>

In [None]:
!pip install geocoder

In [None]:
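import geocoder

toronto_df['Latitude'] = np.nan
toronto_df['Longitude'] = np.nan

MAX_ATTEMPTS = 5  # stop retrying after a few failures instead of looping forever

for PC in toronto_df['PostalCode']:
    lat_lng_coords = None

    # Retry a bounded number of times; g.latlng is None when the lookup fails
    for attempt in range(MAX_ATTEMPTS):
        g = geocoder.google('{}, Toronto, Ontario'.format(PC))
        lat_lng_coords = g.latlng
        if lat_lng_coords is not None:
            break

    if lat_lng_coords is None:
        continue  # leave Latitude/Longitude as NaN for this postal code

    toronto_df.loc[toronto_df['PostalCode']==PC,'Latitude'] = lat_lng_coords[0]
    toronto_df.loc[toronto_df['PostalCode']==PC,'Longitude'] = lat_lng_coords[1]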

<p>Because the geocoder package did not work, the coordinates are loaded instead from the CSV file provided by the course.</p>

In [3]:
!wget -O GeospatialCoordinates.csv https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DS0701EN-SkillsNetwork/labs_v1/Geospatial_Coordinates.csv

--2021-06-19 10:30:11--  https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DS0701EN-SkillsNetwork/labs_v1/Geospatial_Coordinates.csv
Resolving cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud (cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud)... 198.23.119.245
Connecting to cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud (cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud)|198.23.119.245|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2788 (2.7K) [text/csv]
Saving to: ‘GeospatialCoordinates.csv’


2021-06-19 10:30:11 (88.1 MB/s) - ‘GeospatialCoordinates.csv’ saved [2788/2788]



In [4]:
# Load the course-provided coordinates and index them by postal code
coord_df = pd.read_csv('GeospatialCoordinates.csv')
coords_by_code = coord_df.set_index('Postal Code')

# Look up each postal code; codes missing from the file are left as NaN
toronto_df['Latitude'] = toronto_df['PostalCode'].map(coords_by_code['Latitude'])
toronto_df['Longitude'] = toronto_df['PostalCode'].map(coords_by_code['Longitude'])

toronto_df.head(10)

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Queen's Park,Ontario Provincial Government,43.662301,-79.389494
5,M9A,Etobicoke,Islington Avenue,43.667856,-79.532242
6,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
7,M3B,North York,Don Mills North,43.745906,-79.352188
8,M4B,East York,"Parkview Hill, Woodbine Gardens",43.706397,-79.309937
9,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937
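
<p>As a cross-check on the join, the sketch below uses a left merge with <code>indicator=True</code> to list postal codes that have no match in the coordinates file. The two small dataframes are stand-ins for <code>toronto_df</code> and <code>coord_df</code>; the code <code>M0X</code> and its borough are made up for illustration, and the merge is an alternative to the lookup used above, not the notebook's own method.</p>

```python
import pandas as pd

# Toy stand-ins for toronto_df and coord_df (coordinates copied from the output above)
neighborhoods = pd.DataFrame({
    'PostalCode': ['M3A', 'M4A', 'M0X'],  # 'M0X' is a made-up code with no coordinates
    'Borough': ['North York', 'North York', 'Hypothetical'],
})
coords = pd.DataFrame({
    'Postal Code': ['M3A', 'M4A'],
    'Latitude': [43.753259, 43.725882],
    'Longitude': [-79.329656, -79.315572],
})

# Left merge keeps every neighborhood row; '_merge' records whether a match was found
merged = neighborhoods.merge(
    coords.rename(columns={'Postal Code': 'PostalCode'}),
    on='PostalCode',
    how='left',
    indicator=True,
)

# Postal codes present in the neighborhoods table but absent from the coordinates
unmatched = merged.loc[merged['_merge'] == 'left_only', 'PostalCode'].tolist()
print(unmatched)
```

<p>An empty list here would mean every postal code received coordinates.</p>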


<p style="font-size:24px"><b>Week 3: Exercise 3</b></p>
<p>The next cells contain the code to add the latitude and longitude of each neighborhood in Toronto using the package geocoder.</p>