# IBM Applied Data Science Capstone

## Segmenting & Clustering Toronto Neighborhoods: Assignment Part I

Build a dataframe with the name and postal code of each Toronto neighborhood, along with its respective borough name.

### Import Python libraries

In [1]:
import numpy as np
import pandas as pd
import requests
from bs4 import BeautifulSoup

### Scrape the Wikipedia page and transform the data into a dataframe

In [2]:
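# Download the Wikipedia page and parse its HTML
wikidata = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').text
soup = BeautifulSoup(wikidata, 'html.parser')
postcode, borough, neighbor = [], [], []
# soup.find('table') returns the first table on the page; header rows have no <td> cells
for row in soup.find('table').find_all('tr'):
    cells = row.find_all('td')
    if len(cells) > 0:
        # Strip surrounding whitespace (including trailing newlines) from every cell
        postcode.append(cells[0].text.strip())
        borough.append(cells[1].text.strip())
        neighbor.append(cells[2].text.strip())
dfW = pd.DataFrame({'PostalCode': postcode, 'Borough': borough, 'Neighborhood': neighbor})
dfW.head()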

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront

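The row-by-row parsing can be tried offline on a small HTML fragment. The following is a minimal sketch, not the notebook's actual run: `sample_html`, `soup_sample`, `records`, and `df_sample` are hypothetical names, and the fragment only imitates the three-column layout the notebook expects from the page.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page (assumption: same 3-column layout)
sample_html = """
<table>
  <tr><th>Postcode</th><th>Borough</th><th>Neighbourhood</th></tr>
  <tr><td>M1A</td><td>Not assigned</td><td>Not assigned
</td></tr>
  <tr><td>M3A</td><td>North York</td><td>Parkwoods
</td></tr>
</table>
"""

soup_sample = BeautifulSoup(sample_html, 'html.parser')
records = []
for row in soup_sample.find('table').find_all('tr'):
    cells = row.find_all('td')
    if len(cells) == 3:  # the header row has only <th> cells, so it is skipped
        # get_text(strip=True) trims whitespace, including the trailing newline
        records.append([cell.get_text(strip=True) for cell in cells])

df_sample = pd.DataFrame(records, columns=['PostalCode', 'Borough', 'Neighborhood'])
print(df_sample)
```

Collecting one list per row, rather than three parallel lists, keeps each record together and makes it harder for the columns to drift out of step.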

### Drop rows with a "Not assigned" borough

In [3]:
# Keep only rows with an assigned borough, then renumber the index from 0
dfT = dfW[dfW['Borough'] != 'Not assigned'].reset_index(drop=True)
dfT.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Harbourfront
3,M5A,Downtown Toronto,Regent Park
4,M6A,North York,Lawrence Heights

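The call to `reset_index(drop=True)` matters because boolean filtering keeps the original row labels. A small sketch on a toy frame (the frame `toy` and the name `kept` are illustrative, not part of the notebook) shows the difference:

```python
import pandas as pd

# Toy frame shaped like the scraped table (illustrative data only)
toy = pd.DataFrame({
    'PostalCode': ['M1A', 'M3A', 'M4A'],
    'Borough': ['Not assigned', 'North York', 'North York'],
    'Neighborhood': ['Not assigned', 'Parkwoods', 'Victoria Village'],
})

kept = toy[toy['Borough'] != 'Not assigned']
print(list(kept.index))                         # [1, 2]: original labels survive
print(list(kept.reset_index(drop=True).index))  # [0, 1]: relabelled from 0
```

With `drop=True` the old labels are discarded instead of being added back as a new column.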

### Combine neighborhoods that share a postal code

In [4]:
# One row per (PostalCode, Borough); neighborhood names are joined with ', '
dfTG = dfT.groupby(['PostalCode', 'Borough'], as_index=False).agg(lambda x: ', '.join(x))
dfTG.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M1B,Scarborough,"Rouge, Malvern"
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae

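To see the grouping step in isolation, here is a hedged sketch using three rows taken from the filtered output above. The names `toy` and `merged` are illustrative, and selecting the `Neighborhood` column explicitly before aggregating is a stylistic variation on the notebook's lambda.

```python
import pandas as pd

# Three rows from the filtered table: M5A appears twice
toy = pd.DataFrame({
    'PostalCode': ['M5A', 'M5A', 'M6A'],
    'Borough': ['Downtown Toronto', 'Downtown Toronto', 'North York'],
    'Neighborhood': ['Harbourfront', 'Regent Park', 'Lawrence Heights'],
})

# Group on both keys and join the neighborhood names of each group
merged = (
    toy.groupby(['PostalCode', 'Borough'], as_index=False)['Neighborhood']
       .agg(', '.join)
)
print(merged)
```

The two M5A rows collapse into a single row whose `Neighborhood` value is `Harbourfront, Regent Park`, while M6A is left unchanged.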

### Make "Not assigned" neighborhoods the same as their borough

In [5]:
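# Where a neighborhood is "Not assigned", fall back to the borough name of the same row
not_assigned = dfTG['Neighborhood'] == 'Not assigned'
dfTG.loc[not_assigned, 'Neighborhood'] = dfTG.loc[not_assigned, 'Borough']
dfTG.head()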

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M1B,Scarborough,"Rouge, Malvern"
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae

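The `head()` output above looks unchanged because none of the first five rows has a "Not assigned" neighborhood. The effect is easier to see on a toy frame; in this sketch the second row (`M9Z`, `Example Borough`) is made up for illustration and does not come from the Wikipedia table.

```python
import pandas as pd

# Toy frame: the second row is hypothetical and exists only to trigger the fallback
toy = pd.DataFrame({
    'PostalCode': ['M1B', 'M9Z'],
    'Borough': ['Scarborough', 'Example Borough'],
    'Neighborhood': ['Rouge, Malvern', 'Not assigned'],
})

# Boolean mask of rows whose neighborhood is missing
missing = toy['Neighborhood'] == 'Not assigned'
# Copy the borough name into the neighborhood column for those rows only
toy.loc[missing, 'Neighborhood'] = toy.loc[missing, 'Borough']
print(toy)
```

Assigning through `.loc` with a mask changes only the selected rows and aligns the right-hand side by row label, so each row receives its own borough.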

### Cleaned dataframe size

In [6]:
dfTG.to_csv('Toronto_dataframe.csv')
dfTG.shape

(103, 3)
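
One detail of the export step: `to_csv` writes the row index as an unnamed first column by default, and pandas labels that column `Unnamed: 0` when the file is read back. The sketch below round-trips a one-row toy frame through in-memory buffers (the names `toy`, `with_index`, and `without_index` are illustrative) to compare the default with `index=False`.

```python
import io
import pandas as pd

toy = pd.DataFrame({
    'PostalCode': ['M1B'],
    'Borough': ['Scarborough'],
    'Neighborhood': ['Rouge, Malvern'],
})

# Default: the index is written as an extra, unnamed first column
with_index = io.StringIO()
toy.to_csv(with_index)
with_index.seek(0)
cols_with = list(pd.read_csv(with_index).columns)

# index=False: only the three data columns are written
without_index = io.StringIO()
toy.to_csv(without_index, index=False)
without_index.seek(0)
cols_without = list(pd.read_csv(without_index).columns)

print(cols_with)     # ['Unnamed: 0', 'PostalCode', 'Borough', 'Neighborhood']
print(cols_without)  # ['PostalCode', 'Borough', 'Neighborhood']
```

Whether to pass `index=False` depends on how the file is read later; reading with `index_col=0` is another way to avoid the extra column.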