<h1>Exploring the Neighborhoods in Toronto</h1>

<h3>Utilizing the Foursquare API and data analysis methods to explore and cluster the neighborhoods in North York, Toronto.</h3>

<h2>Part 1:</h2>

<h4>First, we import the necessary libraries:</h4>

In [45]:
import pandas as pd
from bs4 import BeautifulSoup
import requests
import csv

<h4>Then we scrape the Wikipedia page for the required table data and save it in a .CSV file:</h4>

In [46]:
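# Download the page and parse it (the 'lxml' parser requires the lxml package)
source = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').text
tree = BeautifulSoup(source, 'lxml')

# Select the first <table> element on the page
table_tag = tree.select("table")[0]
# Keep only the text before the first newline in each header/data cell
tab_data = [[item.text.split('\n')[0] for item in row_data.select("th,td")]
                for row_data in table_tag.select("tr")]

# The with-block closes the file, so every row is flushed before it is read back
with open("table_data.csv", "w", newline='') as csv_file:
    csv_writer = csv.writer(csv_file)
    for data in tab_data:
        csv_writer.writerow(data)
        print(' '.join(data))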

Postcode Borough Neighbourhood
M1A Not assigned Not assigned
M2A Not assigned Not assigned
M3A North York Parkwoods
M4A North York Victoria Village
M5A Downtown Toronto Harbourfront
M5A Downtown Toronto Regent Park
M6A North York Lawrence Heights
M6A North York Lawrence Manor
M7A Queen's Park Not assigned
M8A Not assigned Not assigned
M9A Etobicoke Islington Avenue
M1B Scarborough Rouge
M1B Scarborough Malvern
M2B Not assigned Not assigned
M3B North York Don Mills North
M4B East York Woodbine Gardens
M4B East York Parkview Hill
M5B Downtown Toronto Ryerson
M5B Downtown Toronto Garden District
M6B North York Glencairn
M7B Not assigned Not assigned
M8B Not assigned Not assigned
M9B Etobicoke Cloverdale
M9B Etobicoke Islington
M9B Etobicoke Martin Grove
M9B Etobicoke Princess Gardens
M9B Etobicoke West Deane Park
M1C Scarborough Highland Creek
M1C Scarborough Rouge Hill
M1C Scarborough Port Union
M2C Not assigned Not assigned
M3C North York Flemingdon Park
M3C North York Don Mills South
...
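
<p>The cell-extraction step can be checked offline on a tiny hand-written table. This is only a minimal sketch: the HTML snippet is made up for illustration (it is not the Wikipedia markup), the <code>sample_*</code> names are hypothetical, and it uses Python's built-in "html.parser" instead of "lxml" so that no extra parser is needed.</p>

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the real page is larger and may be structured differently.
sample_html = """
<table>
<tr><th>Postcode</th><th>Borough</th><th>Neighbourhood
</th></tr>
<tr><td>M3A</td><td>North York</td><td>Parkwoods
</td></tr>
</table>
"""

sample_tree = BeautifulSoup(sample_html, "html.parser")
sample_table = sample_tree.select("table")[0]

# Same idea as the scraping cell: take each cell's text up to the first newline.
sample_rows = [[cell.text.split("\n")[0] for cell in row.select("th,td")]
               for row in sample_table.select("tr")]

print(sample_rows)
```

<p>Splitting on the newline matters because the last cell of each row in this sample ends with a line break, which would otherwise end up in the CSV.</p>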

<h4>We read our .CSV file into a dataframe:</h4>

In [47]:
df = pd.read_csv('table_data.csv')
df.head()

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront


<h4>Remove all table rows without assigned boroughs:</h4>

In [48]:
df = df.drop(df.index[df.Borough == 'Not assigned'])
df.head()

Unnamed: 0,Postcode,Borough,Neighbourhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront
5,M5A,Downtown Toronto,Regent Park
6,M6A,North York,Lawrence Heights
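
<p>The same filtering can also be written as a boolean mask, which some readers find easier to follow than dropping by index. A minimal sketch on a small made-up frame (the <code>sample</code> and <code>assigned</code> names are hypothetical):</p>

```python
import pandas as pd

# Small made-up frame that mimics the scraped columns.
sample = pd.DataFrame({
    "Postcode": ["M1A", "M3A", "M7A"],
    "Borough": ["Not assigned", "North York", "Queen's Park"],
    "Neighbourhood": ["Not assigned", "Parkwoods", "Not assigned"],
})

# Keep only the rows whose Borough is assigned; the original index labels are preserved.
assigned = sample[sample["Borough"] != "Not assigned"]

print(assigned)
```

<p>As with the drop-based version, the surviving rows keep their original index labels, which is why the index in the output above starts at 2.</p>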


<h4>Replace 'Not assigned' in the Neighbourhood column with the corresponding value from the Borough column:</h4>

In [49]:
df.loc[df.Neighbourhood == 'Not assigned', 'Neighbourhood'] = df['Borough']  

df

Unnamed: 0,Postcode,Borough,Neighbourhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront
5,M5A,Downtown Toronto,Regent Park
6,M6A,North York,Lawrence Heights
7,M6A,North York,Lawrence Manor
8,M7A,Queen's Park,Queen's Park
10,M9A,Etobicoke,Islington Avenue
11,M1B,Scarborough,Rouge
12,M1B,Scarborough,Malvern
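
<p>The assignment above relies on pandas aligning the Borough values with the selected rows by index label. An equivalent way to express it, sketched here on a made-up two-row frame with hypothetical names, is <code>Series.mask</code>, which states the condition and the replacement explicitly:</p>

```python
import pandas as pd

# Made-up sample: one neighbourhood is missing and should fall back to the borough name.
sample = pd.DataFrame({
    "Postcode": ["M3A", "M7A"],
    "Borough": ["North York", "Queen's Park"],
    "Neighbourhood": ["Parkwoods", "Not assigned"],
})

# Where the condition is True, take the value from Borough; elsewhere keep Neighbourhood.
unassigned = sample["Neighbourhood"] == "Not assigned"
sample["Neighbourhood"] = sample["Neighbourhood"].mask(unassigned, sample["Borough"])

print(sample)
```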


<h4>Group data by Postcode and join all neighbourhoods that exist in the same postal code area:</h4>

In [50]:
df = df.groupby('Postcode').agg(lambda x: ','.join(set(x))).reset_index()
df.head()

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1B,Scarborough,"Rouge,Malvern"
1,M1C,Scarborough,"Rouge Hill,Port Union,Highland Creek"
2,M1E,Scarborough,"Guildwood,Morningside,West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
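
<p>Because <code>set</code> does not guarantee an iteration order, the joined neighbourhood strings can come out in a different order from one run to the next. If a stable order is wanted, one possible variant is to join the values directly. The sketch below uses made-up sample rows and assumes that every postcode belongs to exactly one borough, so taking the first Borough value per group is safe:</p>

```python
import pandas as pd

# Made-up rows in the same shape as the cleaned dataframe.
sample = pd.DataFrame({
    "Postcode": ["M1C", "M1C", "M1C", "M1B", "M1B"],
    "Borough": ["Scarborough"] * 5,
    "Neighbourhood": ["Highland Creek", "Rouge Hill", "Port Union", "Rouge", "Malvern"],
})

# Assumption: one borough per postcode, so "first" is enough for the Borough column.
# Joining the Series directly keeps the neighbourhoods in their original row order.
grouped = (
    sample.groupby("Postcode", as_index=False)
          .agg({"Borough": "first", "Neighbourhood": ",".join})
)

print(grouped)
```

<p>Unlike the <code>set</code>-based version, this variant does not remove duplicate neighbourhood names within a postcode.</p>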


<h4>Finally, we print the number of rows and columns in the resulting dataframe:</h4>

In [51]:
df.shape

(103, 3)
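
<p>Beyond the shape, it can be useful to confirm that the cleaning steps did what they were meant to do. The helper below is a hypothetical sketch, not part of the original notebook: <code>check_cleaned</code> is an illustrative name, and the frame it is applied to here is made up.</p>

```python
import pandas as pd

def check_cleaned(frame):
    """Return True if the frame has one row per postcode and no 'Not assigned' values."""
    one_row_per_postcode = frame["Postcode"].is_unique
    no_unassigned_borough = not (frame["Borough"] == "Not assigned").any()
    no_unassigned_neighbourhood = not (frame["Neighbourhood"] == "Not assigned").any()
    return bool(one_row_per_postcode and no_unassigned_borough and no_unassigned_neighbourhood)

# Made-up frame in the same shape as the final dataframe.
sample = pd.DataFrame({
    "Postcode": ["M1B", "M7A"],
    "Borough": ["Scarborough", "Queen's Park"],
    "Neighbourhood": ["Rouge,Malvern", "Queen's Park"],
})

print(check_cleaned(sample))
```

<p>Calling <code>check_cleaned(df)</code> on the notebook's dataframe would apply the same three checks to the real data.</p>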