# CAPSTONE PROJECT - IBM Data Science Professional Certificate
## Author: Eduardo Gaona P.

This Jupyter notebook is the main workspace for solving the tasks of the capstone project of the IBM Data Science Professional Certificate.

## Importing the libraries

In [52]:
import pandas as pd
import numpy as np
from bs4 import BeautifulSoup
import requests  # this module helps us to download a web page

Setting the URL of the Wikipedia page that contains the table, so that `requests` can fetch its HTML

In [53]:
url_wiki = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'

Downloading the page with `requests` and using BeautifulSoup to get the "table" object

In [70]:
data = requests.get(url_wiki).text  # raw HTML of the page
soup = BeautifulSoup(data, "html5lib")
tables = soup.find_all('table')[0]  # in HTML a table is represented by the tag <table>; keep the first one

Showing an example entry of the table

In [74]:
tables.find_all("td")[54]

<td style="vertical-align:top;">
<p>M1J<br/><span style="font-size:85%;"><a href="/wiki/Scarborough,_Toronto" title="Scarborough, Toronto">Scarborough</a><br/>(<a href="/wiki/Scarborough_Village" title="Scarborough Village">Scarborough Village</a>)</span>
</p>
</td>

In [76]:
tables.find_all("td")[54].text

'\nM1JScarborough(Scarborough Village)\n\n'
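The flattened text above runs the postal code, the borough and the neighborhood together, so it has to be split by position. A minimal sketch of that split, assuming every cell follows the `CODEBorough(Neighborhood)` pattern seen in this example; `parse_cell` is a hypothetical helper name, not part of the original notebook:

```python
def parse_cell(text):
    """Split a flattened table cell into (code, borough, neighborhood)."""
    flat = text.replace("\n", "")      # drop the newlines around the cell content
    code = flat[:3]                    # the postal code is the first three characters
    paren = flat.find("(")             # the neighborhood list starts at the first "("
    if paren == -1:                    # assumption: a cell without "(" has no neighborhood
        return code, flat[3:], ""
    borough = flat[3:paren]
    neighborhood = flat[paren + 1:flat.rfind(")")].replace(" / ", ",")
    return code, borough, neighborhood

example = parse_cell('\nM1JScarborough(Scarborough Village)\n\n')
print(example)  # ('M1J', 'Scarborough', 'Scarborough Village')
```

Keeping the split in one function makes it easy to try on a single cell before looping over the whole table.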

Going through all the entries, extracting the text and forming the DataFrame

In [77]:
rows = []  # collect one dict per assigned postal code

for cell in tables.find_all("td"):
  if 'Not assigned' not in cell.text:
    text = cell.text.replace('\n', '')  # flatten the cell text once
    paren = text.find('(')  # the neighborhood list starts at the first "("
    rows.append({"PostalCode": text[0:3],
                 "Borough": text[3:paren],
                 "Neighborhood": text[paren+1:-1].replace(' / ', ',')})

# DataFrame.append was removed in pandas 2.0, so the frame is built in one step
postal_codes = pd.DataFrame(rows, columns=["PostalCode", "Borough", "Neighborhood"])

postal_codes.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park,Harbourfront"
3,M6A,North York,"Lawrence Manor,Lawrence Heights"
4,M7A,Queen's Park,Ontario Provincial Government


In [79]:
postal_codes[postal_codes['PostalCode'] == 'M5G']

Unnamed: 0,PostalCode,Borough,Neighborhood
24,M5G,Downtown Toronto,Central Bay Street


In [78]:
postal_codes.shape

(103, 3)
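The shape above only confirms the number of rows. A small sketch of two extra sanity checks, assuming every Toronto code has the form "M", digit, letter and should appear only once; `check_codes` and `FSA_PATTERN` are hypothetical names, and the sample list reuses codes shown in the outputs above:

```python
import re

# Assumption: Toronto forward sortation areas look like "M" + digit + uppercase letter
FSA_PATTERN = re.compile(r"M\d[A-Z]")

def check_codes(codes):
    """Return (invalid, duplicated) lists for a sequence of postal codes."""
    invalid = [c for c in codes if not FSA_PATTERN.fullmatch(c)]
    seen, duplicated = set(), []
    for c in codes:
        if c in seen and c not in duplicated:
            duplicated.append(c)   # record each repeated code once
        seen.add(c)
    return invalid, duplicated

sample = ["M3A", "M4A", "M5A", "M6A", "M7A", "M5G"]
invalid, duplicated = check_codes(sample)
print(invalid, duplicated)  # [] []
```

On the real data the same function could be called with `postal_codes['PostalCode']`.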

In [81]:
pip install geocoder

Collecting geocoder
  Downloading geocoder-1.38.1-py2.py3-none-any.whl (98 kB)
Collecting ratelim
  Downloading ratelim-0.1.6-py2.py3-none-any.whl (4.0 kB)
Installing collected packages: ratelim, geocoder
Successfully installed geocoder-1.38.1 ratelim-0.1.6


In [None]:
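###### This approach did not work (the Google geocoder most likely needs an API key) ######
import geocoder

MAX_ATTEMPTS = 5  # assumption: cap the retries so the loop cannot run forever

# initialize the variable to None
lat_lng_coords = None

# retry until coordinates are returned or the attempts run out
for _ in range(MAX_ATTEMPTS):
  g = geocoder.google('{}, Toronto, Ontario'.format('M5G'))
  lat_lng_coords = g.latlng
  if lat_lng_coords:
    break

# only unpack when the lookup actually returned a [lat, lng] pair
if lat_lng_coords:
  latitude, longitude = lat_lng_coords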

In [86]:
Path_csv_loc = '/content/drive/MyDrive/Coursera/Geospatial_Coordinates.csv'
df_loc = pd.read_csv(Path_csv_loc)

In [89]:
df_loc.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [96]:
postal_codes_loc = postal_codes.set_index('PostalCode').join(df_loc.set_index('Postal Code')).reset_index()
postal_codes_loc.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park,Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor,Lawrence Heights",43.718518,-79.464763
4,M7A,Queen's Park,Ontario Provincial Government,43.662301,-79.389494


In [97]:
postal_codes_loc[postal_codes_loc['PostalCode'] == 'M5G']

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
24,M5G,Downtown Toronto,Central Bay Street,43.657952,-79.387383
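
The join above matches the two tables on the postal code. A hedged alternative sketch using `DataFrame.merge`, which can also flag postal codes that have no coordinates; the two small frames below are stand-ins built from rows shown in the outputs above, not the full data:

```python
import pandas as pd

# Stand-in for postal_codes (two rows taken from the outputs above)
codes = pd.DataFrame({
    "PostalCode": ["M3A", "M5G"],
    "Borough": ["North York", "Downtown Toronto"],
    "Neighborhood": ["Parkwoods", "Central Bay Street"],
})
# Stand-in for df_loc, with the column name used in the CSV file
coords = pd.DataFrame({
    "Postal Code": ["M3A", "M5G"],
    "Latitude": [43.753259, 43.657952],
    "Longitude": [-79.329656, -79.387383],
})

merged = codes.merge(
    coords,
    how="left",                 # keep every postal code, even without coordinates
    left_on="PostalCode",
    right_on="Postal Code",
    validate="one_to_one",      # raise if a code appears twice on either side
    indicator=True,             # adds a "_merge" column describing the match
)
unmatched = merged[merged["_merge"] == "left_only"]   # codes with no coordinates
merged = merged.drop(columns=["Postal Code", "_merge"])
print(merged)
```

An empty `unmatched` frame would indicate that every postal code received a latitude and longitude.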
