<h3>Segmenting and Clustering Neighborhoods</h3>

Import the packages needed to load the Wikipedia table

In [2]:
import pandas as pd
import numpy as np

In [3]:
raw_wiki = pd.read_html("https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M")

In [11]:
wiki = pd.DataFrame(raw_wiki[0])

Remove any rows that have no Borough assigned, and rename the Postal Code column to "PostalCode"

In [59]:
wiki.drop(wiki[wiki['Borough'] == 'Not assigned'].index, inplace=True)
wiki.rename(columns = {'Postal Code': 'PostalCode'}, inplace=True)

In [60]:
wiki.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
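
As a minimal sketch of the filtering and renaming step, the same result can be obtained without `inplace=True` by chaining a boolean filter with `rename`. The `toy` dataframe below is a hypothetical stand-in for the Wikipedia table, and its rows are illustrative only.

```python
import pandas as pd

# Hypothetical toy data standing in for the Wikipedia table (illustrative values).
toy = pd.DataFrame({
    "Postal Code": ["M1A", "M3A", "M4A"],
    "Borough": ["Not assigned", "North York", "North York"],
    "Neighborhood": ["Not assigned", "Parkwoods", "Victoria Village"],
})

# Keep only rows whose borough is assigned, then rename the postal code column.
cleaned = (
    toy[toy["Borough"] != "Not assigned"]
    .rename(columns={"Postal Code": "PostalCode"})
    .reset_index(drop=True)
)
print(cleaned)
```

Chaining returns a new dataframe and leaves the original untouched, which can make it easier to re-run a cell without side effects.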


To merge rows that share a postal code and combine their Neighborhood names into a single comma-separated string, we use the groupby method. After grouping by postal code, we reset the index

In [63]:
df = wiki.groupby('PostalCode').agg({'Borough':'first','Neighborhood':', '.join}).reset_index()
df.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M1B,Scarborough,"Malvern, Rouge"
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
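
To illustrate what the grouping does when a postal code appears on more than one row, here is a small sketch on hypothetical toy data (the split rows are invented for illustration; the table loaded above may already list one row per postal code). It assumes `", "` as the separator so that the merged names stay readable.

```python
import pandas as pd

# Hypothetical toy data: postal code M5A appears on two rows.
toy = pd.DataFrame({
    "PostalCode": ["M5A", "M5A", "M6A"],
    "Borough": ["Downtown Toronto", "Downtown Toronto", "North York"],
    "Neighborhood": ["Regent Park", "Harbourfront", "Lawrence Manor"],
})

# One row per postal code: keep the first borough, join neighborhood names.
grouped = (
    toy.groupby("PostalCode")
    .agg({"Borough": "first", "Neighborhood": ", ".join})
    .reset_index()
)
print(grouped)
```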


If any Neighborhood has no name assigned, we replace its value with the Borough name

In [64]:
mask = df['Neighborhood'] == 'Not assigned'
df.loc[mask, 'Neighborhood'] = df.loc[mask, 'Borough']

In [65]:
df.shape

(103, 3)
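
The mask-based replacement can be checked on a small example. The sketch below uses hypothetical toy rows (the values are illustrative, not taken from the table) and counts how many rows the mask matches before overwriting them.

```python
import pandas as pd

# Hypothetical toy data: the first row has no neighborhood name.
toy = pd.DataFrame({
    "PostalCode": ["M7A", "M9A"],
    "Borough": ["Queen's Park", "Etobicoke"],
    "Neighborhood": ["Not assigned", "Islington Avenue"],
})

# Boolean mask selecting rows whose neighborhood is missing.
mask = toy["Neighborhood"] == "Not assigned"
n_missing = int(mask.sum())

# Copy the borough name into the neighborhood column for those rows only.
toy.loc[mask, "Neighborhood"] = toy.loc[mask, "Borough"]
print(n_missing)
print(toy)
```

If `n_missing` is 0 on the real data, the replacement is a no-op and the dataframe is unchanged.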

<h4>Use Geocoder to find location coordinates for each neighborhood</h4>

In [70]:

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes # installs geopy; skip this line if geopy is already installed
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

Read the location coordinate data from a CSV file and store it in a dataframe

In [84]:
location_df = pd.read_csv("http://cocl.us/Geospatial_data")
location_df.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


Loop through each postal code, find the corresponding coordinates in the location dataframe, then add the latitude and longitude as separate columns to the original dataframe

In [102]:
Latitude = []
Longitude = []

for postal_code in df['PostalCode']:
    # Look up the matching row once, then read both coordinates from it
    match = location_df.loc[location_df['Postal Code'] == postal_code]
    Latitude.append(match['Latitude'].values[0])
    Longitude.append(match['Longitude'].values[0])

# Attach the collected coordinates as new columns
df['Latitude'] = Latitude
df['Longitude'] = Longitude

In [103]:
df.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
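
As an alternative to the explicit loop, the same lookup can be sketched as a join on the postal code using `DataFrame.merge`. This is not the approach used above, only a possible equivalent. The names `df_toy`, `location_toy`, and `merged` are hypothetical, and the two rows reuse values shown in the outputs above.

```python
import pandas as pd

# Small stand-ins for df and location_df (names are hypothetical).
df_toy = pd.DataFrame({
    "PostalCode": ["M1B", "M1C"],
    "Borough": ["Scarborough", "Scarborough"],
    "Neighborhood": ["Malvern, Rouge", "Rouge Hill, Port Union, Highland Creek"],
})
location_toy = pd.DataFrame({
    "Postal Code": ["M1C", "M1B"],  # deliberately in a different order
    "Latitude": [43.784535, 43.806686],
    "Longitude": [-79.160497, -79.194353],
})

# Align the key column name, then left-join the coordinates onto df_toy.
merged = df_toy.merge(
    location_toy.rename(columns={"Postal Code": "PostalCode"}),
    on="PostalCode",
    how="left",
)
print(merged)
```

One difference worth noting: with `how="left"`, a postal code missing from the coordinates table yields NaN in the new columns, whereas the loop raises an IndexError on `.values[0]`.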
