# Clustering Toronto Neighbourhoods

### A. Gutmanas
### Feb 2020



---
## Part A: Get, clean and load neighbourhood locations

_A quick Google search, followed by some critical review of the results, points to the City of Toronto website (https://www.toronto.ca) and its "Open Data" portal: https://open.toronto.ca.
The neighbourhood geodata is available from https://open.toronto.ca/dataset/neighbourhoods, which offers downloads in CSV, GeoJSON and a few other formats. This is easier than scraping Wikipedia, which also uses a different list of neighbourhoods: some City of Toronto neighbourhoods are split, e.g. Runnymede-Bloor West Village, and some are combined, e.g. Agincourt North and Agincourt South._

_For this project, I will start by downloading the CSV from https://ckan0.cf.opendata.inter.prod-toronto.ca/download_resource/a083c865-6d60-4d1d-b6c6-b0c8a85f9c15?format=csv&projection=4326, as it is the easiest format to load into a pandas dataframe. This dataset lacks "boroughs", i.e. the old municipalities that existed before the City of Toronto was amalgamated in 1998. This information could be useful at some point, and the geographic boundaries for these areas are available from https://open.toronto.ca/dataset/former-municipality-boundaries/. The GeoJSON file can be downloaded from https://ckan0.cf.opendata.inter.prod-toronto.ca/download_resource/f82dbe76-928e-4cec-8147-a21882f575e2?format=geojson&projection=4326_


In [1]:
# import libraries
import json
from shapely.geometry import shape, Point
import pandas as pd
import requests
import folium
import bs4  # the beautifulsoup4 package is imported under the name bs4



In [2]:
# Download the CSV with Toronto neighbourhoods and load into a dataframe
toronto_raw = pd.read_csv("https://ckan0.cf.opendata.inter.prod-toronto.ca/download_resource/a083c865-6d60-4d1d-b6c6-b0c8a85f9c15?format=csv&projection=4326")
toronto_raw.head()

Unnamed: 0,_id,AREA_ID,AREA_ATTR_ID,PARENT_AREA_ID,AREA_SHORT_CODE,AREA_LONG_CODE,AREA_NAME,AREA_DESC,X,Y,LONGITUDE,LATITUDE,OBJECTID,Shape__Area,Shape__Length,geometry
0,3221,25886861,25926662,49885,94,94,Wychwood (94),Wychwood (94),,,-79.425515,43.676919,16491505,3217960.0,7515.779658,"{u'type': u'Polygon', u'coordinates': (((-79.4..."
1,3222,25886820,25926663,49885,100,100,Yonge-Eglinton (100),Yonge-Eglinton (100),,,-79.40359,43.704689,16491521,3160334.0,7872.021074,"{u'type': u'Polygon', u'coordinates': (((-79.4..."
2,3223,25886834,25926664,49885,97,97,Yonge-St.Clair (97),Yonge-St.Clair (97),,,-79.397871,43.687859,16491537,2222464.0,8130.411276,"{u'type': u'Polygon', u'coordinates': (((-79.3..."
3,3224,25886593,25926665,49885,27,27,York University Heights (27),York University Heights (27),,,-79.488883,43.765736,16491553,25418210.0,25632.335242,"{u'type': u'Polygon', u'coordinates': (((-79.5..."
4,3225,25886688,25926666,49885,31,31,Yorkdale-Glen Park (31),Yorkdale-Glen Park (31),,,-79.457108,43.714672,16491569,11566690.0,13953.408098,"{u'type': u'Polygon', u'coordinates': (((-79.4..."


In [3]:
# create a new dataframe with relevant columns only 
toronto_base = toronto_raw[["AREA_SHORT_CODE", "LONGITUDE", "LATITUDE"]]
toronto_base.head()

Unnamed: 0,AREA_SHORT_CODE,LONGITUDE,LATITUDE
0,94,-79.425515,43.676919
1,100,-79.40359,43.704689
2,97,-79.397871,43.687859
3,27,-79.488883,43.765736
4,31,-79.457108,43.714672


In [4]:
# add cleaned up names of neighbourhoods
# strip the trailing " (code)" suffix, e.g. "Wychwood (94)" -> "Wychwood";
# assign() returns a new dataframe, which avoids pandas' SettingWithCopyWarning
toronto_base = toronto_base.assign(
    AREA_NAME=toronto_raw["AREA_NAME"].str.replace(r"\s*\(\d+\)$", "", regex=True)
)

toronto_base.head()


Unnamed: 0,AREA_SHORT_CODE,LONGITUDE,LATITUDE,AREA_NAME
0,94,-79.425515,43.676919,Wychwood
1,100,-79.40359,43.704689,Yonge-Eglinton
2,97,-79.397871,43.687859,Yonge-St.Clair
3,27,-79.488883,43.765736,York University Heights
4,31,-79.457108,43.714672,Yorkdale-Glen Park
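
The name clean-up in cell 4 relies on every name ending in a bracketed code such as "(94)". As a minimal sketch of that step in isolation, assuming all names follow this pattern, the suffix can be stripped with a regular expression. `clean_name` is a hypothetical helper and not part of the notebook:

```python
import re

# Optional whitespace followed by a bracketed numeric code at the end of the string.
_CODE_SUFFIX = re.compile(r"\s*\(\d+\)$")

def clean_name(raw_name):
    """Return the neighbourhood name without its trailing '(code)' suffix."""
    return _CODE_SUFFIX.sub("", raw_name)

# Sample names copied from the AREA_NAME column shown earlier.
sample_names = ["Wychwood (94)", "Yonge-St.Clair (97)", "York University Heights (27)"]
cleaned = [clean_name(name) for name in sample_names]
print(cleaned)
```

One difference from slicing at `x.find('(')`: a name without a bracketed code is returned unchanged here, whereas `find` would return -1 and the slice would silently drop the last two characters.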


In [5]:
# download GeoJSON with data for old municipalities (i.e., boroughs)
url = "https://ckan0.cf.opendata.inter.prod-toronto.ca/download_resource/f82dbe76-928e-4cec-8147-a21882f575e2?format=geojson&projection=4326"

boroughs_geoJSON = requests.get(url).json()


In [6]:
# boroughs_geoJSON
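
Cell 6 leaves the raw GeoJSON dump commented out. As a rough guide to the structure that the borough lookup relies on, here is a minimal sketch using a hand-made FeatureCollection. The coordinates and the name `EXAMPLE BOROUGH` are invented for illustration; only the `features`, `properties`, `AREA_NAME` and `geometry` keys are taken from the notebook, and `summarise_features` is a hypothetical helper:

```python
import json

# Hand-made stand-in for the downloaded file (invented name and coordinates).
example_geojson = json.loads("""
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": {"AREA_NAME": "EXAMPLE BOROUGH"},
      "geometry": {
        "type": "Polygon",
        "coordinates": [[[-79.5, 43.6], [-79.3, 43.6], [-79.3, 43.8],
                         [-79.5, 43.8], [-79.5, 43.6]]]
      }
    }
  ]
}
""")

def summarise_features(geojson):
    """List (name, geometry type, outer-ring vertex count) per feature.

    Assumes every geometry is a simple Polygon, whose first coordinate
    list is the outer ring; a MultiPolygon would need one more level.
    """
    summary = []
    for feature in geojson["features"]:
        geometry = feature["geometry"]
        outer_ring = geometry["coordinates"][0]
        summary.append(
            (feature["properties"]["AREA_NAME"], geometry["type"], len(outer_ring))
        )
    return summary

print(summarise_features(example_geojson))
```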


In [7]:
# pre-build one shapely polygon per former municipality
borough_shapes = [
    (feature['properties']['AREA_NAME'], shape(feature['geometry']))
    for feature in boroughs_geoJSON['features']
]

boroughs = []
for index, neighbourhood in toronto_base.iterrows():
    # shapely points are (x, y), i.e. (longitude, latitude)
    point = Point(neighbourhood["LONGITUDE"], neighbourhood["LATITUDE"])
    for borough_name, polygon in borough_shapes:
        if polygon.contains(point):
            boroughs.append(borough_name)
            break
    else:
        # no polygon contains this point; keep the list aligned with the dataframe
        boroughs.append(None)

# assign() and sort_values() return new dataframes, which avoids SettingWithCopyWarning
toronto_base = toronto_base.assign(BOROUGH=boroughs).sort_values(by=["BOROUGH", "AREA_SHORT_CODE"])


In [8]:
toronto_base

Unnamed: 0,AREA_SHORT_CODE,LONGITUDE,LATITUDE,AREA_NAME,BOROUGH
29,54,-79.312228,43.706800,O'Connor-Parkview,EAST YORK
57,55,-79.349984,43.707749,Thorncliffe Park,EAST YORK
9,56,-79.366072,43.703797,Leaside-Bennington,EAST YORK
91,57,-79.355630,43.688825,Broadview North,EAST YORK
32,58,-79.335488,43.696781,Old East York,EAST YORK
...,...,...,...,...,...
41,111,-79.494420,43.674790,Rockcliffe-Smythe,YORK
84,112,-79.479473,43.693216,Beechborough-Greenbrook,YORK
65,113,-79.515723,43.702716,Weston,YORK
5,114,-79.496045,43.657420,Lambton Baby Point,YORK


In [9]:
!pip install folium

#import folium # map rendering library




In [10]:
import folium

In [15]:
# Coordinates for Yonge and Eglinton (roughly central)
longitude = -79.403590
latitude = 43.704689
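
Yonge and Eglinton is picked by hand as a roughly central point. An alternative, sketched here on the five sample rows shown earlier rather than on the full dataframe, would be to centre the map on the mean of the neighbourhood coordinates. `mean_centre` is a hypothetical helper:

```python
def mean_centre(coordinates):
    """Return (latitude, longitude) averaged over (latitude, longitude) pairs."""
    latitudes = [lat for lat, _ in coordinates]
    longitudes = [lon for _, lon in coordinates]
    return sum(latitudes) / len(latitudes), sum(longitudes) / len(longitudes)

# (LATITUDE, LONGITUDE) of the first five neighbourhoods in the table shown earlier.
sample_coordinates = [
    (43.676919, -79.425515),  # Wychwood
    (43.704689, -79.403590),  # Yonge-Eglinton
    (43.687859, -79.397871),  # Yonge-St.Clair
    (43.765736, -79.488883),  # York University Heights
    (43.714672, -79.457108),  # Yorkdale-Glen Park
]

centre_latitude, centre_longitude = mean_centre(sample_coordinates)
print(round(centre_latitude, 6), round(centre_longitude, 6))
```

With the full dataframe, the same values would come from `toronto_base["LATITUDE"].mean()` and `toronto_base["LONGITUDE"].mean()`.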

In [20]:
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=11)
# add markers to map
for lat, lng, borough, neighborhood in zip(toronto_base['LATITUDE'], toronto_base['LONGITUDE'], toronto_base['BOROUGH'], toronto_base['AREA_NAME']):
    # popup text, e.g. "Wychwood, TORONTO"; parse_html escapes special characters
    label = folium.Popup(f'{neighborhood}, {borough}', parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_toronto)
map_toronto
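
All markers on this map share one colour. If the boroughs should be visually distinguishable, one option is to build a borough-to-colour lookup first. The sketch below is an assumption about how that could be done, not part of the notebook; `borough_colours` and the palette are hypothetical, and the sample uses only borough names that appear in the table shown earlier:

```python
from itertools import cycle

def borough_colours(boroughs, palette=("blue", "red", "green", "purple", "orange", "gray")):
    """Assign one palette colour per distinct borough, in sorted order.

    The palette is reused from the start if there are more boroughs than colours.
    """
    return dict(zip(sorted(set(boroughs)), cycle(palette)))

# Sample values as they appear in the BOROUGH column shown earlier.
sample_boroughs = ["EAST YORK", "YORK", "EAST YORK", "YORK"]
colours = borough_colours(sample_boroughs)
print(colours)
```

Inside the marker loop, `color=colours[borough]` could then replace the fixed `color='blue'`.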