# Capstone Project Week 3 Part 1

For this project, we'll scrape a Wikipedia page containing a list of postal codes in Canada starting with the letter M.

As a first step, we'll use BeautifulSoup to retrieve the related table.

In [6]:
import pandas as pd
import numpy as np
import requests
from bs4 import BeautifulSoup

res = requests.get("https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M")
soup = BeautifulSoup(res.content,'html.parser')
table = soup.find('table', attrs={'class':'wikitable sortable'})
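
The lookup above matches the literal class string `'wikitable sortable'`, so it returns `None` if the page ever adds or reorders classes on the table. Below is a minimal sketch of a more tolerant lookup. The helper name `find_postal_table` and the inline HTML sample are hypothetical and exist only for illustration; the sample is not the real Wikipedia markup.

```python
from bs4 import BeautifulSoup

def find_postal_table(html):
    """Return the first table carrying the 'wikitable' class, or raise if none exists."""
    soup = BeautifulSoup(html, 'html.parser')
    # class_='wikitable' matches any table whose class list contains 'wikitable',
    # regardless of which other classes are present.
    table = soup.find('table', class_='wikitable')
    if table is None:
        raise ValueError("No wikitable found; the page layout may have changed")
    return table

# Made-up sample markup, used only to exercise the helper.
sample_html = """
<table class="wikitable sortable jquery-tablesorter">
  <tr><th>Postal Code</th><th>Borough</th><th>Neighborhood</th></tr>
  <tr><td>M1B</td><td>Scarborough</td><td>Rouge</td></tr>
</table>
"""
sample_table = find_postal_table(sample_html)
sample_row_count = len(sample_table.find_all('tr'))
```

With the live page, calling `res.raise_for_status()` before parsing would also surface HTTP errors early instead of failing later on a missing table.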

Then, we'll need to convert the table into a Pandas dataframe.

In [7]:
table_rows = table.find_all('tr')

emptyEntryStr = "Not assigned"

entries = []
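for tr in table_rows:
    # Header rows use <th> cells, so they yield an empty list here and are skipped.
    row = [cell.text.strip() for cell in tr.find_all('td')]
    if len(row) == 3:
        borough = row[1]
        # Drop postal codes that have no borough assigned.
        if borough != emptyEntryStr:
            # A neighborhood that is not assigned takes the borough's name.
            if row[2] == emptyEntryStr:
                row[2] = borough
            entries.append(row)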

df = pd.DataFrame(entries, columns=["Postal Code", "Borough", "Neighborhood"])
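
The cleaning rule applied in the loop can be isolated from the scraping and checked on its own. The sketch below assumes rows already arrive as three-item lists; the function name `clean_rows` and the sample rows are illustrative and not part of the original notebook.

```python
EMPTY = "Not assigned"

def clean_rows(rows, empty=EMPTY):
    """Keep rows with an assigned borough; fill unassigned neighborhoods with the borough."""
    cleaned = []
    for postal_code, borough, neighborhood in rows:
        if borough == empty:
            continue  # no borough: the postal code is dropped
        if neighborhood == empty:
            neighborhood = borough  # fall back to the borough name
        cleaned.append([postal_code, borough, neighborhood])
    return cleaned

# Illustrative rows in the same three-column layout as the scraped table.
sample_rows = [
    ["M1A", "Not assigned", "Not assigned"],
    ["M3A", "North York", "Parkwoods"],
    ["M7A", "Queen's Park", "Not assigned"],
]
cleaned_rows = clean_rows(sample_rows)
```

Keeping the rule in a small function makes it possible to test the two special cases without fetching the page.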

Finally, we combine the neighborhoods that share a postal code into a single comma-separated entry, and print the first 5 rows of the dataframe.

In [8]:
df = df.groupby(['Postal Code','Borough'])['Neighborhood'].apply(lambda x: ','.join(x.astype(str))).reset_index()
df.head()

|   | Postal Code | Borough | Neighborhood |
|---|---|---|---|
| 0 | M1B | Scarborough | Rouge,Malvern |
| 1 | M1C | Scarborough | Highland Creek,Rouge Hill,Port Union |
| 2 | M1E | Scarborough | Guildwood,Morningside,West Hill |
| 3 | M1G | Scarborough | Woburn |
| 4 | M1H | Scarborough | Cedarbrae |
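
To see what the grouping step does in isolation, here is a small sketch on a made-up three-row frame. It uses `.agg(",".join)` as an assumed equivalent of the `apply` with a lambda used above; the variable names `sample` and `grouped` are illustrative.

```python
import pandas as pd

# Made-up rows: two neighborhoods share M1B, one belongs to M1G.
sample = pd.DataFrame(
    [["M1B", "Scarborough", "Rouge"],
     ["M1B", "Scarborough", "Malvern"],
     ["M1G", "Scarborough", "Woburn"]],
    columns=["Postal Code", "Borough", "Neighborhood"],
)

grouped = (
    sample.groupby(["Postal Code", "Borough"])["Neighborhood"]
    .agg(",".join)   # concatenate the neighborhoods of each group
    .reset_index()   # turn the group keys back into ordinary columns
)
```

The result has one row per postal code, which is why the row count of the grouped dataframe equals the number of distinct postal codes.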


We then check its shape:

In [9]:
df.shape

(103, 3)

# Capstone Project Week 3 Part 2

The geocoder package did not return any results, so I used the CSV file of geospatial coordinates instead.

In [10]:
spatial_df = pd.read_csv('https://cocl.us/Geospatial_data')

In [16]:
results_columns = ["Postal Code", "Borough", "Neighborhood", "Latitude", "Longitude"]
merged_df = df.merge(spatial_df, on='Postal Code', how='left')[results_columns]
merged_df

|   | Postal Code | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M1B | Scarborough | Rouge,Malvern | 43.806686 | -79.194353 |
| 1 | M1C | Scarborough | Highland Creek,Rouge Hill,Port Union | 43.784535 | -79.160497 |
| 2 | M1E | Scarborough | Guildwood,Morningside,West Hill | 43.763573 | -79.188711 |
| 3 | M1G | Scarborough | Woburn | 43.770992 | -79.216917 |
| 4 | M1H | Scarborough | Cedarbrae | 43.773136 | -79.239476 |
| 5 | M1J | Scarborough | Scarborough Village | 43.744734 | -79.239476 |
| 6 | M1K | Scarborough | East Birchmount Park,Ionview,Kennedy Park | 43.727929 | -79.262029 |
| 7 | M1L | Scarborough | Clairlea,Golden Mile,Oakridge | 43.711112 | -79.284577 |
| 8 | M1M | Scarborough | Cliffcrest,Cliffside,Scarborough Village West | 43.716316 | -79.239476 |
| 9 | M1N | Scarborough | Birch Cliff,Cliffside West | 43.692657 | -79.264848 |
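
A left merge silently leaves `NaN` coordinates for any postal code missing from the CSV. One way to check for that, sketched here on made-up frames, is to pass `indicator=True` and `validate="one_to_one"` to `merge`. The postal code `M9Z` and the borough `Hypothetical` are invented to force a miss.

```python
import pandas as pd

# Made-up left frame; "M9Z" is an invented code with no coordinates.
left = pd.DataFrame({"Postal Code": ["M1B", "M1C", "M9Z"],
                     "Borough": ["Scarborough", "Scarborough", "Hypothetical"]})
coords = pd.DataFrame({"Postal Code": ["M1B", "M1C"],
                       "Latitude": [43.806686, 43.784535],
                       "Longitude": [-79.194353, -79.160497]})

# indicator=True adds a "_merge" column; validate raises if keys are duplicated.
checked = left.merge(coords, on="Postal Code", how="left",
                     indicator=True, validate="one_to_one")

# Rows marked "left_only" found no match in the coordinates table.
missing_codes = checked.loc[checked["_merge"] == "left_only", "Postal Code"].tolist()
```

An empty `missing_codes` list on the real data would confirm that every postal code received coordinates.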


# Capstone Project Week 3 Part 3

First of all, let's show a map of Toronto with our dataframe superimposed. Let's import the required packages.

In [11]:
!conda install -c conda-forge folium=0.5.0 --yes # comment this line out if folium is already installed
import folium # map rendering library

!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim




We then use geopy to get the coordinates of Toronto.

In [14]:
address = 'Toronto, Canada'

geolocator = Nominatim(user_agent="toronto_search")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
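
`geolocator.geocode` returns `None` when it finds no match, in which case `location.latitude` raises an `AttributeError`. The sketch below guards against that. The helper `coords_or_default`, the fallback coordinates (an assumed approximate location of downtown Toronto), and the stand-in object are all illustrative and not part of the original notebook.

```python
from types import SimpleNamespace

# Assumed approximate coordinates of downtown Toronto, used only as a fallback.
TORONTO_FALLBACK = (43.6532, -79.3832)

def coords_or_default(location, default=TORONTO_FALLBACK):
    """Return (latitude, longitude) from a geocoding result, or the default if it is None."""
    if location is None:
        return default
    return (location.latitude, location.longitude)

# Stand-in for a geopy Location; only the two attributes used above are mimicked.
fake_location = SimpleNamespace(latitude=43.7, longitude=-79.4)
found = coords_or_default(fake_location)
fallback = coords_or_default(None)
```

In the notebook this could be used as `latitude, longitude = coords_or_default(geolocator.geocode(address))`.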

And finally we generate a map using Folium.

In [20]:
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=11)

for lat, lng, borough, neighborhood in zip(merged_df['Latitude'], merged_df['Longitude'], merged_df['Borough'], merged_df['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.8).add_to(map_toronto)
    
map_toronto
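
Because the coordinates come from a left merge, a row without a match would carry `NaN` values, which cannot be plotted. The sketch below prepares the marker data and skips such rows before anything is handed to Folium. The generator name `marker_specs` and the sample rows are hypothetical.

```python
import math

def marker_specs(rows):
    """Yield (lat, lng, label) for rows with usable coordinates."""
    for lat, lng, borough, neighborhood in rows:
        # A left merge leaves NaN where no coordinates were found; skip those rows.
        if math.isnan(lat) or math.isnan(lng):
            continue
        yield lat, lng, '{}, {}'.format(neighborhood, borough)

# Illustrative rows; the second one is invented to show the NaN case.
sample_rows = [
    (43.770992, -79.216917, "Scarborough", "Woburn"),
    (float("nan"), float("nan"), "Hypothetical", "Nowhere"),
]
specs = list(marker_specs(sample_rows))
```

In the notebook, `rows` could be the same `zip` over the `Latitude`, `Longitude`, `Borough` and `Neighborhood` columns of `merged_df` that the loop above iterates, with each yielded tuple feeding one `folium.CircleMarker`.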