# Segmenting and Clustering Neighborhoods in Toronto
Alex P. Blizzard
## Problem 2

For this assignment, you will be required to explore and cluster the neighborhoods in Toronto.

1. The Geocoder package can be very unreliable. If you are not able to get the geographical coordinates of the neighborhoods with it, use this csv file, which has the geographical coordinates of each postal code: http://cocl.us/Geospatial_data

2. Use the Geocoder package or the csv file to create the following dataframe:

In [18]:
from bs4 import BeautifulSoup
import requests   # library to handle requests
import numpy as np
import pandas as pd

# Scrape the Wikipedia page; 'html.parser' names the built-in parser explicitly,
# so the result does not depend on which optional parsers are installed
source = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').text
soup = BeautifulSoup(source, 'html.parser')

#Scrape website into list
data = []
columns = []
table = soup.find(class_='wikitable')
for index, tr in enumerate(table.find_all('tr')):
    section = []
    for td in tr.find_all(['th','td']):
        section.append(td.text.rstrip())
    
    #First row of data is the header
    if (index == 0):
        columns = section
    else:
        data.append(section)

#convert list into Pandas DataFrame
canada_df = pd.DataFrame(data = data,columns = columns)

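# Only process the cells that have an assigned borough; ignore rows whose borough is "Not assigned".
# .copy() makes the filtered frame independent, so the column assignments below do not act on a slice.
canada_df = canada_df[canada_df['Borough'] != 'Not assigned'].copy()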

#More than one neighborhood can exist in one postal code area
canada_df["Neighborhood"] = \
canada_df.groupby("Postal Code")["Neighborhood"].transform(lambda neigh: ', '.join(neigh))
canada_df = canada_df.drop_duplicates()

# If a cell has a borough but a "Not assigned" neighborhood,
# then the neighborhood will be the same as the borough
not_assigned = canada_df['Neighborhood'] == 'Not assigned'
canada_df.loc[not_assigned, 'Neighborhood'] = canada_df.loc[not_assigned, 'Borough']
canada_df.head()

|   | Postal Code | Borough | Neighborhood |
|---|---|---|---|
| 2 | M3A | North York | Parkwoods |
| 3 | M4A | North York | Victoria Village |
| 4 | M5A | Downtown Toronto | Regent Park, Harbourfront |
| 5 | M6A | North York | Lawrence Manor, Lawrence Heights |
| 6 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government |
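
The three cleaning rules used above (drop unassigned boroughs, fall back to the borough name, combine neighborhoods per postal code) can be checked on a small table without scraping anything. The following is a minimal sketch under that assumption: the rows in `raw` are made up for illustration, and the steps are applied in a slightly different order than in the cell above.

```python
import pandas as pd

# Toy rows (made up for illustration) in the same shape as the scraped table.
raw = pd.DataFrame({
    "Postal Code": ["M1A", "M5A", "M5A", "M7A"],
    "Borough": ["Not assigned", "Downtown Toronto", "Downtown Toronto", "Queen's Park"],
    "Neighborhood": ["Not assigned", "Regent Park", "Harbourfront", "Not assigned"],
})

# 1. Drop postal codes that have no borough.
clean = raw[raw["Borough"] != "Not assigned"].copy()

# 2. A neighborhood that is "Not assigned" takes the borough name.
clean["Neighborhood"] = clean["Neighborhood"].mask(
    clean["Neighborhood"] == "Not assigned", clean["Borough"]
)

# 3. Collapse to one row per postal code, joining neighborhoods with commas.
clean = (
    clean.groupby(["Postal Code", "Borough"], as_index=False, sort=False)["Neighborhood"]
    .agg(", ".join)
)
print(clean)
```

Aggregating with `groupby(...).agg` yields one row per postal code directly, so no separate `drop_duplicates()` call is needed in this variant.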


In [27]:
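# Get coordinates into a dataframe
def get_geocode(postal_code, max_attempts=10):
    # Imported here so the cell still runs when the Geocoder package is not installed
    import geocoder
    # Retry a bounded number of times (max_attempts is an added safeguard),
    # because the service can keep returning None
    for _ in range(max_attempts):
        g = geocoder.google('{}, Toronto, Ontario'.format(postal_code))
        lat_lng_coords = g.latlng
        if lat_lng_coords is not None:
            return lat_lng_coords[0], lat_lng_coords[1]
    return None, None

# get_geocode is not called here; the coordinates are read from the csv file instead
geo_df = pd.read_csv('http://cocl.us/Geospatial_data')
geo_df.head()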

|   | Postal Code | Latitude | Longitude |
|---|---|---|---|
| 0 | M1B | 43.806686 | -79.194353 |
| 1 | M1C | 43.784535 | -79.160497 |
| 2 | M1E | 43.763573 | -79.188711 |
| 3 | M1G | 43.770992 | -79.216917 |
| 4 | M1H | 43.773136 | -79.239476 |
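
Because the geocoding service may return nothing for a given query, one option is to cap the number of retries instead of looping until a result arrives. The sketch below illustrates that idea only: `lookup_with_retries` and `flaky_lookup` are hypothetical names, and the stand-in lookup replaces the real service so the example runs offline.

```python
import time

def lookup_with_retries(lookup, query, max_attempts=5, delay=0.0):
    """Call lookup(query) until it returns a (lat, lng) pair or attempts run out."""
    for _ in range(max_attempts):
        coords = lookup(query)
        if coords is not None:
            return tuple(coords)
        time.sleep(delay)  # optional pause between attempts
    return None  # caller decides what to do when every attempt failed

# Stand-in lookup that fails twice and then succeeds (purely illustrative).
calls = {"n": 0}

def flaky_lookup(query):
    calls["n"] += 1
    return None if calls["n"] < 3 else [43.806686, -79.194353]

result = lookup_with_retries(flaky_lookup, "M1B, Toronto, Ontario")
print(result, "after", calls["n"], "calls")
```

With the Geocoder package installed, the `lookup` argument could be something like `lambda q: geocoder.google(q).latlng`, mirroring the call in `get_geocode`.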


In [30]:
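# set_index returns a new DataFrame rather than modifying in place, so the two calls
# that were here had no effect; merging on the shared column is enough
canada_merge_df = pd.merge(canada_df, geo_df, on='Postal Code')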
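
A default inner merge silently drops any postal code that has no match in the coordinate file. A hedged way to check for that is a left merge with pandas' `indicator` and `validate` options, sketched here on toy frames; the postal codes and coordinates below are placeholders, not real Toronto data.

```python
import pandas as pd

# Placeholder frames in the same shape as canada_df and geo_df.
neigh = pd.DataFrame({
    "Postal Code": ["A1A", "B2B", "C3C"],
    "Borough": ["Borough 1", "Borough 1", "Borough 2"],
    "Neighborhood": ["Area 1", "Area 2", "Area 3"],
})
coords = pd.DataFrame({
    "Postal Code": ["A1A", "B2B"],
    "Latitude": [1.0, 2.0],
    "Longitude": [-1.0, -2.0],
})

# how="left" keeps every neighborhood row; validate raises if a postal code
# is duplicated on either side; indicator adds a "_merge" column.
merged = pd.merge(
    neigh, coords, on="Postal Code", how="left",
    validate="one_to_one", indicator=True,
)

# Postal codes that found no coordinates.
missing = merged.loc[merged["_merge"] == "left_only", "Postal Code"].tolist()
print(missing)
```

If `missing` comes back empty on the real data, the inner merge loses nothing.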