# Data Science Capstone Project - Week 3 Assignment

## Table of Contents

<div class="alert alert-block alert-info" style="margin-top: 20px">

<font size = 3>

1. <a href="#item1">Download Dependencies</a>

2. <a href="#item2">Scrape and Parse Table</a>

3. <a href="#item3">Dataframe after Screenscrape (First Submission)</a>

4. <a href="#item4">Gather Latitude and Longitude (Second Submission)</a>

5. <a href="#item5">Explore Toronto Neighborhoods (Third Submission)</a>

</font>
</div>

<a id='item1'></a>

## Download Dependencies

In [1]:
import numpy as np # library to handle data in a vectorized manner
import pandas as pd # library for data analysis


In [2]:
# Import Beautiful Soup
#from bs4 import BeautifulSoup

## Beautiful Soup not used: pandas.read_html is used instead

In [3]:
## Beautiful Soup not used: pandas.read_html is used instead
#Get Toronto Data: Postal Codes starting with 'M'
#import urllib2 #enable ability to read URLs
#from urllib.request import urlopen #enable ability to read URLs

#postal_url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'

#page = urlopen(postal_url).read()
#soup = BeautifulSoup(page, 'html.parser')

#postal_frame = []
#postal_table =  soup.find('table', {'class': 'wikitable sortable'})

#rows = postal_table.findAll('tr')
#for row in rows:
#    cols=row.findAll('td')
#    cols=[ele.text.strip() for ele in cols]
#    postal_frame.append([ele for ele in cols if ele])

##postal_frame.shape
##postal_table
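The commented-out Beautiful Soup loop above collects the stripped text of every `<td>` cell in each table row. As a rough, dependency-free sketch of the same idea, Python's standard-library `html.parser` can do this as well. `TableRowParser` and `parse_table_rows` are hypothetical names introduced here, and the sketch assumes a single, well-formed table (no nested tables, no `rowspan`/`colspan` handling). It does not filter on the `wikitable sortable` class, so it should be fed only the table's HTML. Unlike the original loop, it skips rows with no `<td>` data, such as the header row.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the stripped text of each <td> cell, one list per <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # completed rows of cell text
        self._row = None   # cells of the row currently open, or None
        self._cell = None  # text fragments of the <td> currently open, or None

    def _flush_cell(self):
        # Close the current cell (if any) and store its stripped text
        if self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._flush_cell()  # tolerate a missing </td>
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "td" and self._row is not None:
            self._flush_cell()
        elif tag == "tr" and self._row is not None:
            self._flush_cell()
            cells = [c for c in self._row if c]  # drop empty cells, like the original loop
            if cells:  # header rows contain only <th> cells, so they are skipped
                self.rows.append(cells)
            self._row = None

    def handle_data(self, data):
        # Text inside nested tags such as <a> also lands in the open cell
        if self._cell is not None:
            self._cell.append(data)


def parse_table_rows(html):
    """Return a list of rows, each a list of non-empty cell strings."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows


sample_html = (
    "<table><tr><th>Postcode</th><th>Borough</th></tr>"
    "<tr><td>M3A</td><td>North York</td><td><a href='#'>Parkwoods</a></td></tr></table>"
)
print(parse_table_rows(sample_html))
```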

<a id='item2'></a>

## Scrape and Parse Table using pandas

In [4]:
#Get Toronto Data: Postal Codes starting with 'M' using pandas.read_html

import requests
postal_url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'

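page = requests.get(postal_url)
page.raise_for_status()  # stop early if the request failed

# read_html returns a list of DataFrames; keep the first table with the matching class
postal_frame = pd.read_html(page.content, attrs={'class': 'wikitable sortable'})[0]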


postal_frame.shape


(288, 3)

In [5]:
postal_frame.head()

|   | Postcode | Borough | Neighbourhood |
|---|---|---|---|
| 0 | M1A | Not assigned | Not assigned |
| 1 | M2A | Not assigned | Not assigned |
| 2 | M3A | North York | Parkwoods |
| 3 | M4A | North York | Victoria Village |
| 4 | M5A | Downtown Toronto | Harbourfront |


In [8]:
# Process dataframe
# Remove rows whose Borough is 'Not assigned'; .copy() gives an independent frame to modify
postal_frame_clean = postal_frame[postal_frame['Borough'] != 'Not assigned'].copy()

# Where Neighbourhood is 'Not assigned', use the Borough name instead
# (.loc avoids chained assignment, which can silently fail to update the frame)
not_assigned = postal_frame_clean['Neighbourhood'] == 'Not assigned'
postal_frame_clean.loc[not_assigned, 'Neighbourhood'] = postal_frame_clean.loc[not_assigned, 'Borough']

# Combine neighbourhoods that share a postal code into one comma-separated row
postal_frame_clean = postal_frame_clean.groupby(['Postcode', 'Borough'])['Neighbourhood'].apply(','.join).reset_index()

# Rename the Postcode column to PostalCode
postal_frame_clean.rename(columns={'Postcode': 'PostalCode'}, inplace=True)

#postal_frame_clean.head()
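The three cleaning steps above can be checked on a tiny hand-made frame before running them on the scraped table. The sketch below wraps them in a hypothetical `clean_postal_frame` helper; the sample rows are made up for illustration (only the Rouge/Malvern pair mirrors the merged output further down).

```python
import pandas as pd


def clean_postal_frame(df):
    """Apply the notebook's cleaning steps to a frame with Postcode/Borough/Neighbourhood columns."""
    # Keep only rows with an assigned borough; .copy() avoids chained-assignment warnings
    out = df[df["Borough"] != "Not assigned"].copy()

    # Where the neighbourhood is unassigned, fall back to the borough name
    mask = out["Neighbourhood"] == "Not assigned"
    out.loc[mask, "Neighbourhood"] = out.loc[mask, "Borough"]

    # One row per postal code: join its neighbourhoods with commas
    out = (
        out.groupby(["Postcode", "Borough"])["Neighbourhood"]
        .agg(",".join)
        .reset_index()
        .rename(columns={"Postcode": "PostalCode"})
    )
    return out


# Hypothetical sample rows, for illustration only
sample = pd.DataFrame({
    "Postcode": ["M1A", "M1B", "M1B", "M9Z"],
    "Borough": ["Not assigned", "Scarborough", "Scarborough", "Sample Borough"],
    "Neighbourhood": ["Not assigned", "Rouge", "Malvern", "Not assigned"],
})
print(clean_postal_frame(sample))
```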

<a id='item3'></a>

## Answer to Question 3 (First Submission)

In [9]:

postal_frame_clean.shape


(103, 3)

<a id='item4'></a>

## Gather Latitude and Longitude for Each Postal Code (Second Submission)

In [None]:
# The geocoder approach below did not work, so the coordinates file provided in class is used instead

#import geocoder # import geocoder
#postal_code = 'M5G'
## initialize your variable to None
#lat_lng_coords = None

## loop until you get the coordinates
#while(lat_lng_coords is None):
#  g = geocoder.google('{}, Toronto, Ontario'.format(postal_code))
#  lat_lng_coords = g.latlng

#latitude = lat_lng_coords[0]
#longitude = lat_lng_coords[1]
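The commented-out geocoder loop retries until it gets coordinates, so it would spin forever if the service kept returning `None`. A minimal sketch of a bounded retry follows, assuming a hypothetical `lookup` callable that returns a `(lat, lng)` pair or `None` (for example, a wrapper around `geocoder.google(...).latlng`). `geocode_with_retries`, `max_attempts`, and `delay_seconds` are names introduced here for illustration.

```python
import time


def geocode_with_retries(lookup, query, max_attempts=5, delay_seconds=1.0):
    """Call lookup(query) until it returns coordinates or the attempt budget runs out."""
    for attempt in range(1, max_attempts + 1):
        coords = lookup(query)
        if coords is not None:
            return coords
        if attempt < max_attempts:
            time.sleep(delay_seconds)  # wait before retrying
    return None  # give up instead of looping forever


# Usage with a stand-in lookup that fails twice, then returns dummy coordinates
responses = iter([None, None, (1.0, 2.0)])
print(geocode_with_retries(lambda q: next(responses), "M5G, Toronto, Ontario", delay_seconds=0))
```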

In [11]:
#Read File provided with Coordinates for each postal code

coords_file = r'Geospatial_Coordinates.csv'
coords = pd.read_csv(coords_file)

coords.rename(columns={'Postal Code': 'PostalCode'}, inplace=True)

In [12]:
coords.head()

|   | PostalCode | Latitude | Longitude |
|---|---|---|---|
| 0 | M1B | 43.806686 | -79.194353 |
| 1 | M1C | 43.784535 | -79.160497 |
| 2 | M1E | 43.763573 | -79.188711 |
| 3 | M1G | 43.770992 | -79.216917 |
| 4 | M1H | 43.773136 | -79.239476 |


In [13]:
coords.shape

(103, 3)

In [15]:
# Add latitude/longitude to postal_frame_clean by joining on PostalCode
toronto_df = pd.merge(postal_frame_clean, coords, on='PostalCode')
toronto_df.head()


|   | PostalCode | Borough | Neighbourhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M1B | Scarborough | Rouge,Malvern | 43.806686 | -79.194353 |
| 1 | M1C | Scarborough | Highland Creek,Rouge Hill,Port Union | 43.784535 | -79.160497 |
| 2 | M1E | Scarborough | Guildwood,Morningside,West Hill | 43.763573 | -79.188711 |
| 3 | M1G | Scarborough | Woburn | 43.770992 | -79.216917 |
| 4 | M1H | Scarborough | Cedarbrae | 43.773136 | -79.239476 |
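`pd.merge` defaults to an inner join, so a postal code missing from the coordinates file would be dropped without warning. One way to make such gaps visible, sketched here with a hypothetical `attach_coordinates` helper, is a left join using pandas' `validate` and `indicator` options.

```python
import pandas as pd


def attach_coordinates(postal_df, coords_df):
    """Left-join coordinates onto postal codes and fail loudly on gaps or duplicates."""
    merged = pd.merge(
        postal_df,
        coords_df,
        on="PostalCode",
        how="left",
        validate="one_to_one",  # raises MergeError if either side repeats a PostalCode
        indicator=True,         # adds a _merge column: 'both' or 'left_only'
    )
    missing = merged.loc[merged["_merge"] == "left_only", "PostalCode"].tolist()
    if missing:
        raise ValueError(f"No coordinates for postal codes: {missing}")
    return merged.drop(columns="_merge")


postal = pd.DataFrame({"PostalCode": ["M1B", "M1C"], "Borough": ["Scarborough", "Scarborough"]})
coords = pd.DataFrame({
    "PostalCode": ["M1C", "M1B"],
    "Latitude": [43.784535, 43.806686],
    "Longitude": [-79.160497, -79.194353],
})
print(attach_coordinates(postal, coords))
```

A left join keeps the scraped table's row order, and the explicit error makes a mismatch between the two 103-row inputs obvious instead of silently shrinking `toronto_df`.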


In [16]:
toronto_df

|   | PostalCode | Borough | Neighbourhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M1B | Scarborough | Rouge,Malvern | 43.806686 | -79.194353 |
| 1 | M1C | Scarborough | Highland Creek,Rouge Hill,Port Union | 43.784535 | -79.160497 |
| 2 | M1E | Scarborough | Guildwood,Morningside,West Hill | 43.763573 | -79.188711 |
| 3 | M1G | Scarborough | Woburn | 43.770992 | -79.216917 |
| 4 | M1H | Scarborough | Cedarbrae | 43.773136 | -79.239476 |
| ... | ... | ... | ... | ... | ... |
| 98 | M9N | York | Weston | 43.706876 | -79.518188 |
| 99 | M9P | Etobicoke | Westmount | 43.696319 | -79.532242 |
| 100 | M9R | Etobicoke | Kingsview Village,Martin Grove Gardens,Richvie... | 43.688905 | -79.554724 |
| 101 | M9V | Etobicoke | Albion Gardens,Beaumond Heights,Humbergate,Jam... | 43.739416 | -79.588437 |


<a id='item5'></a>

## Analysis of Toronto Neighborhoods (Submission 3)

### Import modules needed for data exploration

In [21]:
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes 

import folium # map rendering library



In [24]:
#Toronto Latitude/Longitude
latitude = 43.653908
longitude = -79.384293

# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add a circle marker for each postal code, with a "Neighbourhood, Borough" popup
for lat, lng, borough, neighborhood in zip(toronto_df['Latitude'], toronto_df['Longitude'], toronto_df['Borough'], toronto_df['Neighbourhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_toronto)

map_toronto
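The map above is centred on a hard-coded Toronto coordinate and builds one popup label per row. As an alternative sketch, the centre and a bounding box can be derived from the data itself. `map_center`, `map_bounds`, and `popup_labels` are hypothetical helpers; the `[[south, west], [north, east]]` box is the two-corner form that folium's `Map.fit_bounds` accepts.

```python
from statistics import mean


def map_center(latitudes, longitudes):
    """Mean latitude and longitude of all points, as [lat, lng]."""
    return [mean(latitudes), mean(longitudes)]


def map_bounds(latitudes, longitudes):
    """Smallest box enclosing all points, as [[south, west], [north, east]]."""
    return [[min(latitudes), min(longitudes)], [max(latitudes), max(longitudes)]]


def popup_labels(boroughs, neighbourhoods):
    """'Neighbourhood, Borough' strings, matching the marker loop above."""
    return ["{}, {}".format(n, b) for b, n in zip(boroughs, neighbourhoods)]


lats = [43.806686, 43.706876]
lngs = [-79.194353, -79.518188]
print(map_center(lats, lngs))
print(map_bounds(lats, lngs))
print(popup_labels(["Scarborough", "York"], ["Rouge,Malvern", "Weston"]))
```

On the notebook's data, something like `map_toronto.fit_bounds(map_bounds(toronto_df['Latitude'], toronto_df['Longitude']))` should zoom the map to the postal-code markers rather than relying on a fixed `zoom_start`.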

In [26]:
zip(toronto_df['Latitude'], toronto_df['Longitude'], toronto_df['Borough'], toronto_df['Neighbourhood'])

<zip at 0x1cc8ba69e88>

## Explore Neighbourhood Data

## Segment Data

## Cluster Data
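The notebook imports `KMeans` for this step, but the clustering features are not chosen yet. Purely as a hypothetical sketch, the snippet below applies `KMeans` to raw latitude/longitude pairs; `cluster_coordinates` is an illustrative name, and coordinates are an assumed feature set. Euclidean distance on degrees is only a rough stand-in for geographic distance, though it is usually adequate at city scale. `n_init` is set explicitly because recent scikit-learn versions changed its default.

```python
import numpy as np
from sklearn.cluster import KMeans


def cluster_coordinates(latitudes, longitudes, n_clusters=5, random_state=0):
    """Assign each (lat, lng) point to one of n_clusters k-means clusters."""
    X = np.column_stack([latitudes, longitudes])  # shape (n_points, 2)
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
    return model.fit_predict(X)  # one integer cluster label per point


# Two made-up groups of nearby points
lats = [43.70, 43.71, 43.70, 43.80, 43.81, 43.80]
lngs = [-79.40, -79.41, -79.40, -79.20, -79.21, -79.20]
print(cluster_coordinates(lats, lngs, n_clusters=2))
```

On the real data this would be called with `toronto_df['Latitude']` and `toronto_df['Longitude']`, and the returned labels could be added as a column for colouring the map markers.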