<h1 align=center><font size = 5>Capstone Week 3 - Segmenting and Clustering Neighborhoods in Toronto</font></h1>

## Part 2 - Bring in the Location Data
The overall goal is to explore and cluster the neighborhoods in Toronto; this part brings in their location data.

## Table of Contents

<div class="alert alert-block alert-info" style="margin-top: 20px">

<font size = 3>

1. <a href="#item1">Part 1 - Data Scraping of Neighborhood Names and Postal Codes</a>

2. <a href="#item2">Part 2 - Get Latitude and Longitude Coordinates of Neighborhoods</a>

</font>
</div>

Before we get the data and start exploring it, let's install and import all the dependencies we will need.

In [5]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

from pandas import json_normalize # transform JSON into a pandas DataFrame (pandas.io.json.json_normalize is deprecated)

# used for scraping
import requests
from bs4 import BeautifulSoup

!conda install -c conda-forge geocoder --yes # skip this line if geocoder is already installed
import geocoder # import geocoder

print('Libraries imported.')

Solving environment: done

## Package Plan ##

  environment location: /opt/conda/envs/Python36

  added / updated specs: 
    - geocoder


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    ratelim-0.1.6              |             py_2           6 KB  conda-forge
    geocoder-1.38.1            |             py_1          53 KB  conda-forge
    ------------------------------------------------------------
                                           Total:          59 KB

The following NEW packages will be INSTALLED:

    geocoder: 1.38.1-py_1 conda-forge
    ratelim:  0.1.6-py_2  conda-forge


Downloading and Extracting Packages
ratelim-0.1.6        | 6 KB      | ##################################### | 100% 
geocoder-1.38.1      | 53 KB     | ##################################### | 100% 
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
Libraries imported.


<a id='item1'></a>

## 1. Download and Explore Dataset

In [82]:
from io import StringIO

# Download the Wikipedia page and parse its HTML with Beautiful Soup
res = requests.get("https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M")
soup = BeautifulSoup(res.content, 'lxml')
table = soup.find_all('table')[0]

# read_html returns a list of DataFrames; wrapping the HTML in StringIO avoids the
# deprecation warning that newer pandas versions raise for literal HTML strings
df = pd.read_html(StringIO(str(table)))[0]

# The Wikipedia page isn't one record per row; it's more like a grid of cells.
# Stack every column of the grid into a single column of raw text.
df = pd.concat([df.iloc[:, i] for i in range(df.shape[1])], ignore_index=True).dropna()
df = df.to_frame(name='raw')

# Get rid of rows for postal codes that aren't assigned
df = df[~df['raw'].str.contains("Not assigned")]

# Ditch the "enclaves" too. They're related to bulk mailing and not interesting for the
# purposes of this analysis, which is centered on fun places to live.
# They also don't follow exactly the same format and would mess up the cleaning process.
df = df[~df['raw'].str.contains("Enclave")]
df = df.reset_index(drop=True)

# Each cell holds the postal code, then the borough, then the neighborhood(s) in parentheses.
# Regex would probably be prettier, but let's pull the data out with plain slicing:
# the postal code is the first three characters
df['Postal Code'] = df['raw'].str[:3]
df['raw'] = df['raw'].str[3:-1]  # -1 removes the closing parenthesis we don't need

# Split what's left into the borough (before the "(") and the neighborhood(s) (after it)
df[['Borough', 'Neighborhood']] = df['raw'].str.split("(", n=1, expand=True)
df['Borough'] = df['Borough'].str.strip()
del df['raw']
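The comment in the cell above notes that a regex would be prettier. Here is a minimal sketch of that alternative using pandas' `Series.str.extract` with named groups. It assumes each raw cell is the postal code, then the borough, then the neighborhood(s) in parentheses, which is the layout the slicing code relies on; the pattern, the helper name `parse_raw_cells`, and the sample strings are hypothetical, made up for illustration.

```python
import pandas as pd

# Assumed cell layout: a three-character postal code (letter, digit, letter),
# the borough up to the first "(", then the neighborhood list in parentheses.
PATTERN = r"^(?P<postal_code>[A-Z]\d[A-Z])\s*(?P<borough>[^(]+?)\s*\((?P<neighborhood>.*)\)\s*$"

def parse_raw_cells(raw):
    """Split raw cell strings into postal code, borough and neighborhood columns.

    Hypothetical helper: cells that don't match PATTERN come back as NaN.
    """
    parts = raw.str.extract(PATTERN)
    return parts.rename(columns={
        "postal_code": "Postal Code",
        "borough": "Borough",
        "neighborhood": "Neighborhood",
    })

# Made-up sample cells shaped like the strings the slicing code expects
raw = pd.Series([
    "M1BScarborough(Malvern / Rouge)",
    "M5ADowntown Toronto(Regent Park / Harbourfront)",
])
print(parse_raw_cells(raw))
```

A regex also makes malformed cells easy to spot: they show up as all-NaN rows instead of silently producing shifted slices.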

<a id='item2'></a>

## 2. Get Geolocation Data

In [80]:
# Read the provided CSV, and convert to a dataframe
# I probably spent three hours trying to get the coordinates the hard way, and just came up empty.  Even the code provided by an instructor on the discussion boards didn't work for me. Sad.
!wget -O Geospatial_Coordinates.csv http://cocl.us/Geospatial_data
locationDF = pd.read_csv("Geospatial_Coordinates.csv")

--2020-03-22 02:07:12--  http://cocl.us/Geospatial_data
Resolving cocl.us (cocl.us)... 158.85.108.83, 158.85.108.86, 169.48.113.194
Connecting to cocl.us (cocl.us)|158.85.108.83|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://cocl.us/Geospatial_data [following]
--2020-03-22 02:07:12--  https://cocl.us/Geospatial_data
Connecting to cocl.us (cocl.us)|158.85.108.83|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://ibm.box.com/shared/static/9afzr83pps4pwf2smjjcf1y5mvgb18rr.csv [following]
--2020-03-22 02:07:14--  https://ibm.box.com/shared/static/9afzr83pps4pwf2smjjcf1y5mvgb18rr.csv
Resolving ibm.box.com (ibm.box.com)... 107.152.27.197, 107.152.26.197
Connecting to ibm.box.com (ibm.box.com)|107.152.27.197|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: /public/static/9afzr83pps4pwf2smjjcf1y5mvgb18rr.csv [following]
--2020-03-22 02:07:14--  https://ib
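For reference, the "hard way" mentioned in the cell above usually means calling a geocoding service once per postal code and retrying until it answers. A common pitfall is an unbounded `while` loop that hangs when the service never responds. Below is a minimal sketch with a bounded number of attempts; the helper name `geocode_postal_code` and its parameters are hypothetical, and `lookup` stands in for whatever geocoding call you use, assumed to return a `[latitude, longitude]` pair or `None`.

```python
import time

def geocode_postal_code(postal_code, lookup, max_tries=5, delay_seconds=1.0):
    """Return [lat, lng] for a Toronto postal code, or None after max_tries failures.

    `lookup` is any callable that takes a query string and returns a
    [latitude, longitude] pair, or None when the service gives no answer.
    """
    query = "{}, Toronto, Ontario".format(postal_code)
    for attempt in range(max_tries):
        coords = lookup(query)
        if coords is not None:
            return coords
        if attempt < max_tries - 1:
            time.sleep(delay_seconds)  # wait briefly before retrying
    return None

# Usage with a fake lookup that fails once, then succeeds (made-up coordinates)
answers = iter([None, [43.8, -79.2]])
print(geocode_postal_code("M1B", lambda q: next(answers), delay_seconds=0))  # [43.8, -79.2]
```

With the `geocoder` package imported earlier, `lookup` could wrap one of its providers (for example something like `lambda q: geocoder.arcgis(q).latlng`), but treat that provider choice as an assumption and check the library's documentation.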

## Merge the location data with the postal code data from Part 1

In [85]:
toronto_df = pd.merge(df, locationDF, on='Postal Code')
toronto_df

|   | Postal Code | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M1B | Scarborough | Malvern / Rouge | 43.806686 | -79.194353 |
| 1 | M1C | Scarborough | Rouge Hill / Port Union / Highland Creek | 43.784535 | -79.160497 |
| 2 | M1E | Scarborough | Guildwood / Morningside / West Hill | 43.763573 | -79.188711 |
| 3 | M1G | Scarborough | Woburn | 43.770992 | -79.216917 |
| 4 | M1H | Scarborough | Cedarbrae | 43.773136 | -79.239476 |
| 5 | M1J | Scarborough | Scarborough Village | 43.744734 | -79.239476 |
| 6 | M1K | Scarborough | Kennedy Park / Ionview / East Birchmount Park | 43.727929 | -79.262029 |
| 7 | M1L | Scarborough | Golden Mile / Clairlea / Oakridge | 43.711112 | -79.284577 |
| 8 | M1M | Scarborough | Cliffside / Cliffcrest / Scarborough Village West | 43.716316 | -79.239476 |
| 9 | M1N | Scarborough | Birch Cliff / Cliffside West | 43.692657 | -79.264848 |
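The merge above uses pandas' default inner join, so a postal code missing from either table silently disappears from the result. Below is a minimal sketch for checking coverage with a left join and `indicator=True`, which adds a `_merge` column marking each row as `"both"` or `"left_only"`. The helper name `unmatched_postal_codes` and the toy frames are hypothetical, made up for illustration rather than taken from the real datasets.

```python
import pandas as pd

def unmatched_postal_codes(codes_df, coords_df, key="Postal Code"):
    """Return the postal codes in codes_df that have no row in coords_df."""
    # A left join keeps every row of codes_df; indicator=True records whether
    # each row found a match in coords_df ("both") or not ("left_only").
    merged = pd.merge(codes_df, coords_df, on=key, how="left", indicator=True)
    return merged.loc[merged["_merge"] == "left_only", key].tolist()

# Hypothetical toy data
codes = pd.DataFrame({"Postal Code": ["M1B", "M1C", "M9Z"]})
coords = pd.DataFrame({
    "Postal Code": ["M1B", "M1C"],
    "Latitude": [43.806686, 43.784535],
    "Longitude": [-79.194353, -79.160497],
})
print(unmatched_postal_codes(codes, coords))  # ['M9Z']
```

Running the same check against `df` and `locationDF` should return an empty list if every scraped postal code has coordinates.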
