# Segmenting and Clustering Neighborhoods in Toronto


## Importing Numpy and Pandas, and Extracting the Data

In [3]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis

**Step 1: Importing Requests and Beautiful Soup to extract data from the Wikipedia page into a list of dataframes**

In [4]:


import requests
from bs4 import BeautifulSoup

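res = requests.get("https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M")
soup = BeautifulSoup(res.content, 'lxml')
table = soup.find_all('table')[0]  # the first table on the page holds the postal codes
df = pd.read_html(str(table))  # read_html returns a list of dataframes, one per table found

toronto_area_codes = df[0]  # the list has a single entry because only one table was passed in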
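
Selecting the table by position (`[0]`) works as long as the page layout stays the same. A slightly more defensive option, sketched below, is to pick the dataframe by its column names. The helper `pick_table` and the toy dataframes are hypothetical and only illustrate the idea; they are not part of the notebook.

```python
import pandas as pd

def pick_table(tables, required_columns):
    """Return the first dataframe whose columns include all required_columns."""
    required = set(required_columns)
    for candidate in tables:
        if required.issubset(candidate.columns):
            return candidate
    raise ValueError("no table with columns {}".format(sorted(required)))

# Toy stand-ins for the list of dataframes that pd.read_html returns
tables = [
    pd.DataFrame({"Legend": ["x"]}),
    pd.DataFrame({
        "Postal Code": ["M3A"],
        "Borough": ["North York"],
        "Neighborhood": ["Parkwoods"],
    }),
]

codes = pick_table(tables, ["Postal Code", "Borough", "Neighborhood"])
print(codes.shape)
```

If the page ever gains an extra table before the postal code table, this lookup still finds the right one, and it raises an error instead of silently returning the wrong data.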

**Step 2: Confirm that the dataframe has only three columns: Postal Code, Borough and Neighborhood**

In [5]:
toronto_area_codes

| | Postal Code | Borough | Neighborhood |
|---|---|---|---|
| 0 | M1A | Not assigned | |
| 1 | M2A | Not assigned | |
| 2 | M3A | North York | Parkwoods |
| 3 | M4A | North York | Victoria Village |
| 4 | M5A | Downtown Toronto | Regent Park, Harbourfront |
| 5 | M6A | North York | Lawrence Manor, Lawrence Heights |
| 6 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government |
| 7 | M8A | Not assigned | |
| 8 | M9A | Etobicoke | Islington Avenue, Humber Valley Village |
| 9 | M1B | Scarborough | Malvern, Rouge |


**Step 3: Create a new dataframe that filters out codes that do not have an assigned Borough**

In [6]:
resultdf = toronto_area_codes[toronto_area_codes['Borough'] != 'Not assigned']

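*Resetting the index after filtering (the result is displayed but not assigned back, so `resultdf` itself keeps its original index labels)*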

In [7]:
resultdf.reset_index()

| | index | Postal Code | Borough | Neighborhood |
|---|---|---|---|---|
| 0 | 2 | M3A | North York | Parkwoods |
| 1 | 3 | M4A | North York | Victoria Village |
| 2 | 4 | M5A | Downtown Toronto | Regent Park, Harbourfront |
| 3 | 5 | M6A | North York | Lawrence Manor, Lawrence Heights |
| 4 | 6 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government |
| 5 | 8 | M9A | Etobicoke | Islington Avenue, Humber Valley Village |
| 6 | 9 | M1B | Scarborough | Malvern, Rouge |
| 7 | 11 | M3B | North York | Don Mills |
| 8 | 12 | M4B | East York | Parkview Hill, Woodbine Gardens |
| 9 | 13 | M5B | Downtown Toronto | Garden District, Ryerson |
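
The filter-then-reindex pattern from Step 3 can be sketched on a toy dataframe. `reset_index()` returns a new dataframe rather than modifying the original, and `drop=True` discards the old labels instead of keeping them in an extra `index` column. The toy data and variable names here are assumptions made for illustration.

```python
import pandas as pd

toy = pd.DataFrame({
    "Postal Code": ["M1A", "M2A", "M3A", "M4A"],
    "Borough": ["Not assigned", "Not assigned", "North York", "North York"],
    "Neighborhood": [None, None, "Parkwoods", "Victoria Village"],
})

# Boolean mask keeps only rows with an assigned borough; .copy() makes the
# result an independent dataframe rather than a slice of toy
assigned = toy[toy["Borough"] != "Not assigned"].copy()

# reset_index returns a new dataframe, so the result is assigned back;
# drop=True avoids adding the old labels as an extra "index" column
assigned = assigned.reset_index(drop=True)

print(assigned)
```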


*Rows no longer need to be combined with a group-by: the Wikipedia page already lists all neighbourhoods for a postal code in a single cell, separated by commas*

**Step 4: If neighbourhood is not assigned, replace it with the value of the Borough**

In [8]:
resultdf['Neighborhood'].replace('Not assigned', resultdf['Borough'])

2                                              Parkwoods
3                                       Victoria Village
4                              Regent Park, Harbourfront
5                       Lawrence Manor, Lawrence Heights
6            Queen's Park, Ontario Provincial Government
8                Islington Avenue, Humber Valley Village
9                                         Malvern, Rouge
11                                             Don Mills
12                       Parkview Hill, Woodbine Gardens
13                              Garden District, Ryerson
14                                             Glencairn
17     West Deane Park, Princess Gardens, Martin Grov...
18                Rouge Hill, Port Union, Highland Creek
20                                             Don Mills
21                                      Woodbine Heights
22                                        St. James Town
23                                    Humewood-Cedarvale
26     Eringate, Bloordale Gard
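
As written, `replace` returns a new Series and leaves `resultdf` unchanged unless the result is assigned back. In addition, the table in Step 2 shows an empty Neighborhood cell, not the text 'Not assigned', for rows without a neighbourhood. That suggests those entries may be missing values (NaN), which a string match on 'Not assigned' would not catch. A minimal sketch that covers both cases, using a made-up toy dataframe:

```python
import pandas as pd

toy = pd.DataFrame({
    "Borough": ["North York", "Queen's Park", "Etobicoke"],
    "Neighborhood": ["Parkwoods", None, "Not assigned"],
})

# Treat both missing values and the literal text as "unassigned"
unassigned = toy["Neighborhood"].isna() | (toy["Neighborhood"] == "Not assigned")

# Series.mask replaces values where the condition is True, taking the
# replacement from the Borough value in the same row; assigning the result
# back is what makes the change stick
toy["Neighborhood"] = toy["Neighborhood"].mask(unassigned, toy["Borough"])

print(toy)
```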

In [10]:
resultdf.reset_index()

| | index | Postal Code | Borough | Neighborhood |
|---|---|---|---|---|
| 0 | 2 | M3A | North York | Parkwoods |
| 1 | 3 | M4A | North York | Victoria Village |
| 2 | 4 | M5A | Downtown Toronto | Regent Park, Harbourfront |
| 3 | 5 | M6A | North York | Lawrence Manor, Lawrence Heights |
| 4 | 6 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government |
| 5 | 8 | M9A | Etobicoke | Islington Avenue, Humber Valley Village |
| 6 | 9 | M1B | Scarborough | Malvern, Rouge |
| 7 | 11 | M3B | North York | Don Mills |
| 8 | 12 | M4B | East York | Parkview Hill, Woodbine Gardens |
| 9 | 13 | M5B | Downtown Toronto | Garden District, Ryerson |


**Step 5: Print the shape of the dataframe**

In [48]:
resultdf.shape

(103, 3)

**Step 6: Share on GitHub**

*Done*

## Adding Latitude and Longitude for the Neighborhoods

In [16]:
!pip install geocoder
import geocoder




In [35]:
postal_codes = resultdf['Postal Code'].values

In [36]:
API_KEY = 'YOUR_OPENCAGE_API_KEY' # replace with your own OpenCage key; avoid committing real keys
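
Hard-coding an API key in a notebook exposes it to anyone who can read the file. One common alternative, sketched here, is to read the key from an environment variable. The variable name `OPENCAGE_API_KEY` and the helper `load_api_key` are assumptions made for this example.

```python
import os

def load_api_key(var_name="OPENCAGE_API_KEY"):
    """Read the API key from an environment variable, failing loudly if unset."""
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError("set the {} environment variable first".format(var_name))
    return key

# Set the variable for this process only, for demonstration;
# normally it would be set in the shell before starting the notebook
os.environ["OPENCAGE_API_KEY"] = "example-key"
API_KEY = load_api_key()
```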

In [43]:
latitudes = []  # latitude of each postal code, in the same order as postal_codes
longitudes = []  # longitude of each postal code, in the same order as postal_codes

for postal_code in postal_codes:
    place_name = postal_code + " Toronto, Canada"  # query text, e.g. "M3A Toronto, Canada"
    # Passing the query through params lets requests URL-encode the space and comma
    response = requests.get(
        'https://api.opencagedata.com/geocode/v1/json',
        params={'q': place_name, 'key': API_KEY},
    )
    obj = response.json()  # parses the JSON response body into a Python dictionary

    results = obj['results']  # list of candidate matches; only the first one is used
    lat = results[0]['geometry']['lat']  # latitude of the first match
    lng = results[0]['geometry']['lng']  # longitude of the first match

    latitudes.append(lat)
    longitudes.append(lng)

resultdf['Latitude'] = latitudes
resultdf['Longitude'] = longitudes

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
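
This warning appears because `resultdf` was created by filtering `toronto_area_codes`, so pandas warns that the new columns may be written to a temporary copy. A common way to avoid it, shown here on toy data, is to take an explicit `.copy()` when filtering. The toy dataframe and the placeholder coordinate values are illustrative only.

```python
import pandas as pd

toy = pd.DataFrame({
    "Postal Code": ["M1A", "M3A", "M4A"],
    "Borough": ["Not assigned", "North York", "North York"],
})

# .copy() makes the filtered result an independent dataframe
assigned = toy[toy["Borough"] != "Not assigned"].copy()

# New columns are added to the independent copy; toy is left untouched.
# The numbers are placeholders, not real coordinates.
assigned["Latitude"] = [1.0, 2.0]
assigned["Longitude"] = [-1.0, -2.0]

print(assigned)
```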


In [44]:
resultdf.head(10)

| | Postal Code | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 2 | M3A | North York | Parkwoods | 43.653482 | -79.383935 |
| 3 | M4A | North York | Victoria Village | 43.7276 | -79.3148 |
| 4 | M5A | Downtown Toronto | Regent Park, Harbourfront | 43.6555 | -79.3626 |
| 5 | M6A | North York | Lawrence Manor, Lawrence Heights | 43.7223 | -79.4504 |
| 6 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government | 43.653482 | -79.383935 |
| 8 | M9A | Etobicoke | Islington Avenue, Humber Valley Village | 43.6662 | -79.5282 |
| 9 | M1B | Scarborough | Malvern, Rouge | 43.653482 | -79.383935 |
| 11 | M3B | North York | Don Mills | 43.745 | -79.359 |
| 12 | M4B | East York | Parkview Hill, Woodbine Gardens | 43.7063 | -79.3094 |
| 13 | M5B | Downtown Toronto | Garden District, Ryerson | 43.6572 | -79.3783 |
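
In this output, M3A, M7A and M1B all share the coordinates (43.653482, -79.383935) even though they belong to different boroughs. That may mean the geocoder returned a generic fallback location for those queries, not a match for the postal code itself. A minimal sketch for flagging such rows, using values from the output above (the column name `Suspect` is an assumption for this example):

```python
import pandas as pd

coords = pd.DataFrame({
    "Postal Code": ["M3A", "M4A", "M5A", "M7A", "M1B"],
    "Latitude": [43.653482, 43.7276, 43.6555, 43.653482, 43.653482],
    "Longitude": [-79.383935, -79.3148, -79.3626, -79.383935, -79.383935],
})

# keep=False marks every row whose (Latitude, Longitude) pair occurs more than once
coords["Suspect"] = coords.duplicated(subset=["Latitude", "Longitude"], keep=False)

suspect_codes = coords.loc[coords["Suspect"], "Postal Code"].tolist()
print(suspect_codes)
```

Flagged postal codes can then be checked by hand or geocoded again with a different query before they are used for clustering.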
