# Applied Data Science Capstone

## Week 3

### This notebook illustrates segmenting and clustering neighborhoods in Toronto, Canada

#### Import Statements

In [12]:
%%capture
!pip install geocoder
!pip install folium

In [13]:
import pandas as pd
import geocoder
from geopy.geocoders import Nominatim
import requests
from bs4 import BeautifulSoup
from tabulate import tabulate
import folium 

#### The cell below scrapes the postal-code table from Wikipedia with BeautifulSoup, converts the table's HTML to a string, and parses it into a dataframe with `pd.read_html`

In [3]:
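from io import StringIO
# Download the Wikipedia page and parse its HTML
res = requests.get("https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M")
soup = BeautifulSoup(res.content, 'lxml')
# The postal-code table is the first <table> on the page
table = soup.find_all('table')[0]
# read_html returns a list of DataFrames; wrapping the HTML string in StringIO
# avoids the FutureWarning that newer pandas versions emit for literal HTML
df = pd.concat(pd.read_html(StringIO(str(table))))
df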

|   | Postcode | Borough | Neighbourhood |
|---|---|---|---|
| 0 | M1A | Not assigned | Not assigned |
| 1 | M2A | Not assigned | Not assigned |
| 2 | M3A | North York | Parkwoods |
| 3 | M4A | North York | Victoria Village |
| 4 | M5A | Downtown Toronto | Harbourfront |
| 5 | M6A | North York | Lawrence Heights |
| 6 | M6A | North York | Lawrence Manor |
| 7 | M7A | Queen's Park | Not assigned |
| 8 | M8A | Not assigned | Not assigned |
| 9 | M9A | Queen's Park | Queen's Park |
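To make the "HTML to dataframe" step concrete, here is a minimal stdlib-only sketch of the same idea: walk the table's `<tr>` rows, collect the text of each `<th>`/`<td>` cell, and treat the first row as the header. This is a simplified stand-in, not how `pd.read_html` is implemented; the class `TableCellParser`, the helper `html_table_to_df`, and the `sample` HTML are hypothetical, and the sketch assumes explicit closing tags and ignores `colspan`/`rowspan`.

```python
from html.parser import HTMLParser

import pandas as pd


class TableCellParser(HTMLParser):
    """Hypothetical helper: collect <th>/<td> cell text, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows; each row is a list of cell strings
        self._row = None   # cells of the row currently being parsed
        self._cell = None  # text fragments of the cell currently being parsed

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Text is only kept while inside a cell (this includes text in nested <a> tags)
        if self._cell is not None:
            self._cell.append(data)


def html_table_to_df(html):
    """Use the first row as column names and the remaining rows as data."""
    parser = TableCellParser()
    parser.feed(html)
    header, *body = parser.rows
    return pd.DataFrame(body, columns=header)


# Hypothetical sample shaped like the first rows of the Wikipedia table
sample = """
<table>
  <tr><th>Postcode</th><th>Borough</th><th>Neighbourhood</th></tr>
  <tr><td>M1A</td><td>Not assigned</td><td>Not assigned</td></tr>
  <tr><td>M3A</td><td>North York</td><td>Parkwoods</td></tr>
</table>
"""
print(html_table_to_df(sample))
```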


#### Here I drop the rows whose Borough is Not assigned

In [4]:
 
# Keep only the rows that have a borough assigned, then renumber the index
newdf = df[df.Borough != 'Not assigned'].reset_index(drop=True)
newdf
    


|   | Postcode | Borough | Neighbourhood |
|---|---|---|---|
| 0 | M3A | North York | Parkwoods |
| 1 | M4A | North York | Victoria Village |
| 2 | M5A | Downtown Toronto | Harbourfront |
| 3 | M6A | North York | Lawrence Heights |
| 4 | M6A | North York | Lawrence Manor |
| 5 | M7A | Queen's Park | Not assigned |
| 6 | M9A | Queen's Park | Queen's Park |
| 7 | M1B | Scarborough | Rouge |
| 8 | M1B | Scarborough | Malvern |
| 9 | M3B | North York | Don Mills North |
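Row 5 above (M7A) still has a Neighbourhood of "Not assigned" even though its borough is known. If every row should carry a usable neighborhood name, one common convention (an assumption here, not a step this notebook performs) is to reuse the borough name. The sketch runs on a small hypothetical `sample_df` copied from rows 5 and 6; the same two lines would apply to `newdf`.

```python
import pandas as pd

# Hypothetical sample mirroring rows 5-6 of the filtered table above
sample_df = pd.DataFrame({
    "Postcode": ["M7A", "M9A"],
    "Borough": ["Queen's Park", "Queen's Park"],
    "Neighbourhood": ["Not assigned", "Queen's Park"],
})

# Where the neighbourhood is unassigned, fall back to the borough name
unassigned = sample_df["Neighbourhood"] == "Not assigned"
sample_df.loc[unassigned, "Neighbourhood"] = sample_df.loc[unassigned, "Borough"]
print(sample_df)
```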


#### Here I group the neighborhoods that share a postal code and borough into a single comma-separated row

In [5]:

finaldf = newdf.groupby(['Postcode','Borough'])['Neighbourhood'].apply(','.join).reset_index()
finaldf


|   | Postcode | Borough | Neighbourhood |
|---|---|---|---|
| 0 | M1B | Scarborough | Rouge,Malvern |
| 1 | M1C | Scarborough | Highland Creek,Rouge Hill,Port Union |
| 2 | M1E | Scarborough | Guildwood,Morningside,West Hill |
| 3 | M1G | Scarborough | Woburn |
| 4 | M1H | Scarborough | Cedarbrae |
| 5 | M1J | Scarborough | Scarborough Village |
| 6 | M1K | Scarborough | East Birchmount Park,Ionview,Kennedy Park |
| 7 | M1L | Scarborough | Clairlea,Golden Mile,Oakridge |
| 8 | M1M | Scarborough | Cliffcrest,Cliffside,Scarborough Village West |
| 9 | M1N | Scarborough | Birch Cliff,Cliffside West |
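The grouping step can be checked on a tiny hypothetical sample: rows sharing the same (Postcode, Borough) key collapse into one row, and `groupby` keeps the original row order inside each group, so "Rouge" stays ahead of "Malvern". The sketch uses `", ".join` instead of `','.join` purely as an illustrative choice, since a space after each comma reads better in the map popups later on.

```python
import pandas as pd

# Hypothetical sample: two neighbourhoods share postcode M1B
rows = pd.DataFrame({
    "Postcode": ["M1B", "M1B", "M1G"],
    "Borough": ["Scarborough", "Scarborough", "Scarborough"],
    "Neighbourhood": ["Rouge", "Malvern", "Woburn"],
})

# Join the neighbourhood names within each (Postcode, Borough) group
grouped = (
    rows.groupby(["Postcode", "Borough"])["Neighbourhood"]
        .apply(", ".join)
        .reset_index()
)
print(grouped)
```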


#### Here I read the postal codes and their latitude and longitude coordinates from a CSV file into a dataframe and show the first 5 rows

In [6]:
path="http://cocl.us/Geospatial_data"
toronto=pd.read_csv(path)
toronto.head()

|   | Postal Code | Latitude | Longitude |
|---|---|---|---|
| 0 | M1B | 43.806686 | -79.194353 |
| 1 | M1C | 43.784535 | -79.160497 |
| 2 | M1E | 43.763573 | -79.188711 |
| 3 | M1G | 43.770992 | -79.216917 |
| 4 | M1H | 43.773136 | -79.239476 |


#### Here I rename the column Postal Code to Postcode so it matches the dataframe I am merging with, and show the first 5 rows

In [7]:
toronto.rename({'Postal Code':'Postcode'},axis=1,inplace=True)
toronto.head()

|   | Postcode | Latitude | Longitude |
|---|---|---|---|
| 0 | M1B | 43.806686 | -79.194353 |
| 1 | M1C | 43.784535 | -79.160497 |
| 2 | M1E | 43.763573 | -79.188711 |
| 3 | M1G | 43.770992 | -79.216917 |
| 4 | M1H | 43.773136 | -79.239476 |


#### Here I merge the two dataframes: the grouped one with boroughs and neighborhoods, and the one with latitude and longitude coordinates. A left merge on Postcode keeps every postal code from the grouped dataframe

In [8]:
mergeddf=pd.merge(finaldf,toronto,how='left',on='Postcode')
mergeddf

|   | Postcode | Borough | Neighbourhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M1B | Scarborough | Rouge,Malvern | 43.806686 | -79.194353 |
| 1 | M1C | Scarborough | Highland Creek,Rouge Hill,Port Union | 43.784535 | -79.160497 |
| 2 | M1E | Scarborough | Guildwood,Morningside,West Hill | 43.763573 | -79.188711 |
| 3 | M1G | Scarborough | Woburn | 43.770992 | -79.216917 |
| 4 | M1H | Scarborough | Cedarbrae | 43.773136 | -79.239476 |
| 5 | M1J | Scarborough | Scarborough Village | 43.744734 | -79.239476 |
| 6 | M1K | Scarborough | East Birchmount Park,Ionview,Kennedy Park | 43.727929 | -79.262029 |
| 7 | M1L | Scarborough | Clairlea,Golden Mile,Oakridge | 43.711112 | -79.284577 |
| 8 | M1M | Scarborough | Cliffcrest,Cliffside,Scarborough Village West | 43.716316 | -79.239476 |
| 9 | M1N | Scarborough | Birch Cliff,Cliffside West | 43.692657 | -79.264848 |
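A left merge silently leaves NaN coordinates for any postcode missing from the CSV, and silently duplicates rows if a postcode appears twice. `pd.merge` can check both: `validate="one_to_one"` raises a `MergeError` on duplicate keys, and `indicator=True` adds a `_merge` column recording where each row was found. The sketch uses small hypothetical stand-ins for `finaldf` and `toronto`; postcode M9Z is invented to show an unmatched row.

```python
import pandas as pd

# Hypothetical stand-ins for finaldf and toronto (M9Z is made up)
sample_final = pd.DataFrame({"Postcode": ["M1B", "M1C", "M9Z"],
                             "Borough": ["Scarborough", "Scarborough", "Etobicoke"]})
sample_coords = pd.DataFrame({"Postcode": ["M1B", "M1C"],
                              "Latitude": [43.806686, 43.784535],
                              "Longitude": [-79.194353, -79.160497]})

# validate raises if either side has duplicate postcodes;
# indicator adds a _merge column: 'both', 'left_only' or 'right_only'
checked = pd.merge(sample_final, sample_coords, how="left", on="Postcode",
                   validate="one_to_one", indicator=True)

# Postcodes that found no coordinates
missing = checked.loc[checked["_merge"] == "left_only", "Postcode"].tolist()
print(missing)  # ['M9Z']
```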


#### Here I geocode "Toronto" with Nominatim to get the coordinates used as the map centre

In [9]:
# geopy 2.x refuses Nominatim's default user_agent; any descriptive string works here
geolocator = Nominatim(user_agent='toronto_explorer')

address = 'Toronto'
location = geolocator.geocode(address)

latitude = location.latitude
longitude = location.longitude

print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.653963, -79.387207.
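Geocoding needs network access, and Nominatim's usage policy limits request rates, so the call can fail. A network-free alternative (an assumption, not what this notebook does) is to centre the map on the mean of the postcode coordinates already in `mergeddf`. The sketch copies the first five coordinate pairs from the merged table; note that their centroid lies in Scarborough, whereas the full dataset's centroid would sit closer to the middle of the city.

```python
import pandas as pd

# First five (Latitude, Longitude) pairs copied from the merged table
coords = pd.DataFrame({
    "Latitude": [43.806686, 43.784535, 43.763573, 43.770992, 43.773136],
    "Longitude": [-79.194353, -79.160497, -79.188711, -79.216917, -79.239476],
})

# Centroid of the points: a map centre that needs no geocoding request
latitude = coords["Latitude"].mean()
longitude = coords["Longitude"].mean()
print(round(latitude, 6), round(longitude, 6))
```

With the real data, `mergeddf[['Latitude', 'Longitude']].mean()` gives the same kind of centre for all postcodes.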


#### Here I create a map of Toronto with a circle marker for each postal-code area on top

In [16]:
# Centre the map on the geocoded Toronto coordinates
tmap = folium.Map(location=[latitude, longitude], zoom_start=11)

# One circle marker per postal-code area, with a "neighbourhoods, borough" popup
for lat, lng, borough, neighborhood in zip(mergeddf['Latitude'], mergeddf['Longitude'],
                                           mergeddf['Borough'], mergeddf['Neighbourhood']):
    label = folium.Popup('{}, {}'.format(neighborhood, borough), parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(tmap)
tmap

#### Observation: The neighborhoods near Downtown Toronto appear to be similar to one another
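The notebook's title mentions clustering, but the map above only plots markers. As a hedged first step, here is a minimal k-means (Lloyd's algorithm) sketch in NumPy that clusters postal-code areas by location alone. This is an illustration under stated assumptions: `kmeans` is a hypothetical helper written here, the points are the ten coordinate pairs from the merged table, and raw latitude/longitude are treated as Euclidean (at Toronto's latitude a degree of longitude spans roughly 0.72 of a degree of latitude, so distances are only approximate). Clustering on neighborhood features such as venue categories, which this notebook does not collect, would be needed to test the observation about similarity; scikit-learn's `KMeans` offers the same algorithm with k-means++ initialisation.

```python
import numpy as np


def kmeans(points, k, n_iter=100, seed=0):
    """Hypothetical helper: plain Lloyd's k-means returning (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Initialise centroids with k distinct data points chosen at random
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its nearest centroid (squared Euclidean distance)
        dists = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its points; keep it if its cluster is empty
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


# (Latitude, Longitude) pairs copied from the merged table
points = np.array([
    [43.806686, -79.194353], [43.784535, -79.160497], [43.763573, -79.188711],
    [43.770992, -79.216917], [43.773136, -79.239476], [43.744734, -79.239476],
    [43.727929, -79.262029], [43.711112, -79.284577], [43.716316, -79.239476],
    [43.692657, -79.264848],
])
labels, centroids = kmeans(points, k=2)
print(labels)
```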