# Data for the Toronto neighbourhoods assignment

This notebook contains the operations to obtain and manipulate geographical data for Toronto neighbourhoods. It is the week three assignment in the Coursera Data Science Capstone project.

In [2]:
import pandas as pd
import numpy as np
import requests

from IPython.display import Image, HTML
# json_normalize is available at the top level since pandas 1.0;
# the older pandas.io.json import path is deprecated
from pandas import json_normalize

import folium 

## Import '.csv'

Information on the Toronto neighbourhoods can be obtained from [https://open.toronto.ca/dataset/neighbourhoods/](https://open.toronto.ca/dataset/neighbourhoods/), which is published by the Toronto city government. The site was found by examining the [Wikipedia list of neighbourhoods](https://en.wikipedia.org/wiki/List_of_neighbourhoods_in_Toronto) and provides _open data_ that is more readily accessible than data scraped from Wikipedia. Furthermore, the Wikipedia list is ambiguous about what counts as a neighbourhood, as it also includes informal designations.

The information from the Toronto city government can be downloaded as a '.csv' file, which must be stored locally before it can be read into a dataframe.

In [3]:
path = '~/Documents/Projects/Coursera-Capstone/geodata/Neighbourhoods.csv'

toronto_data = pd.read_csv(path)


## Clean data

The data is cleaned by dropping unwanted columns. The neighbourhood names are reformatted to be more compatible with external data sources.

In [4]:
# Select neighbourhood names and locations and store in a smaller dataframe.
# .copy() makes nbhs independent of toronto_data, which avoids pandas'
# SettingWithCopyWarning on the assignments below.
nbhs = toronto_data[['AREA_NAME', 'AREA_LONG_CODE', 'LONGITUDE', 'LATITUDE']].copy()

# Clean neighbourhood names by removing the area codes in brackets
nbhs['AREA_NAME'] = nbhs['AREA_NAME'].str.split('(').str[0]

# Rename columns
nbhs.columns = ['Neighbourhood', 'Area code', 'Longitude', 'Latitude']

# Sort by area code
nbhs = nbhs.sort_values(by='Area code')

# Inspect first five rows of the dataframe
nbhs.head()


Unnamed: 0,Neighbourhood,Area code,Longitude,Latitude
24,West Humber-Clairville,1,,
34,Mount Olive-Silverstone-Jamestown,2,,
124,Thistletown-Beaumond Heights,3,,
122,Rexdale-Kipling,4,,
48,Elms-Old Rexdale,5,,
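
Splitting on `(` keeps the space that preceded the bracket, so the cleaned names appear to end in a trailing space. The sketch below, on a hypothetical two-row sample in the assumed `Name (code)` format, adds `.str.strip()` to remove that space and takes an explicit `.copy()` of the column subset so that pandas treats it as an independent dataframe. The sample values and the names `raw` and `subset` are illustrative only.

```python
import pandas as pd

# Hypothetical sample mimicking the assumed raw AREA_NAME format 'Name (code)'
raw = pd.DataFrame({
    'AREA_NAME': ['West Humber-Clairville (1)', 'Rexdale-Kipling (4)'],
    'AREA_LONG_CODE': [1, 4],
})

# .copy() makes the subset an independent dataframe, so assigning to one of
# its columns does not trigger SettingWithCopyWarning
subset = raw[['AREA_NAME', 'AREA_LONG_CODE']].copy()

# Keep the text before '(' and strip the space that preceded the bracket
subset['AREA_NAME'] = subset['AREA_NAME'].str.split('(').str[0].str.strip()

print(subset['AREA_NAME'].tolist())
```

Names without trailing whitespace are also easier to match against external data sources, where an exact string comparison would otherwise fail.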


In [5]:
# Check that the dataframe contains all 140 neighbourhoods
nbhs.describe(include='all')

Unnamed: 0,Neighbourhood,Area code,Longitude,Latitude
count,140,140.0,0.0,0.0
unique,140,,,
top,Birchcliffe-Cliffside,,,
freq,1,,,
mean,,70.5,,
std,,40.5586,,
min,,1.0,,
25%,,35.75,,
50%,,70.5,,
75%,,105.25,,
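
The `count` row above reports 0 non-null values for Longitude and Latitude. A more direct check for empty columns could look like the following sketch, which uses a hypothetical two-row stand-in (`sample`) rather than the real `nbhs` dataframe.

```python
import numpy as np
import pandas as pd

# Hypothetical two-row stand-in for the nbhs dataframe
sample = pd.DataFrame({
    'Neighbourhood': ['West Humber-Clairville', 'Rexdale-Kipling'],
    'Area code': [1, 4],
    'Longitude': [np.nan, np.nan],
    'Latitude': [np.nan, np.nan],
})

# Number of missing values per column
missing = sample.isna().sum()

# Names of the columns in which every value is missing
empty_columns = [col for col in sample.columns if sample[col].isna().all()]

print(missing)
print(empty_columns)
```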


## Add location data

The summary shows a count of 0 for both the Longitude and Latitude columns, so they are empty and need to be populated.
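
Populating the columns comes down to looking up each name and keeping track of which lookups fail. The following is a minimal sketch of that bookkeeping, not the notebook's own method: `geocode_names`, `fake_geocode` and `FakeLocation` are hypothetical names, and the stand-in geocoder with its placeholder coordinates only exists so that the sketch runs without network access.

```python
from collections import namedtuple


def geocode_names(names, geocode, suffix=', Toronto, Canada'):
    """Look up each name and return (coords, failed).

    coords maps name -> (latitude, longitude); failed lists the names
    for which the geocode callable returned None.
    """
    coords, failed = {}, []
    for name in names:
        location = geocode(name.strip() + suffix)
        if location is None:
            failed.append(name)
        else:
            coords[name] = (location.latitude, location.longitude)
    return coords, failed


# Stand-in geocoder with placeholder coordinates (illustrative values only)
FakeLocation = namedtuple('FakeLocation', ['latitude', 'longitude'])
_known = {'Markland Wood, Toronto, Canada': FakeLocation(43.63, -79.59)}


def fake_geocode(address):
    # Mimics a geocoder that returns None for unknown addresses
    return _known.get(address)


coords, failed = geocode_names(['Markland Wood', 'Princess-Rosethorn'], fake_geocode)
print(coords)
print(failed)
```

With geopy, the `geocode` argument could be `geolocator.geocode` wrapped in `RateLimiter` from `geopy.extra.rate_limiter` (for example with `min_delay_seconds=1`), since the public Nominatim service asks for at most one request per second. The returned dictionary could then be written into the Latitude and Longitude columns by mapping over the Neighbourhood column.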

In [6]:
# geopy converts an address into latitude and longitude values.
# Nominatim is the open-access geocoding service (based on OpenStreetMap data) used here.
from geopy.geocoders import Nominatim

In [6]:
geolocator = Nominatim(user_agent='coursera_capstone')

# Count the addresses for which Nominatim returns no result
n_failed = 0
for neighbourhood in nbhs['Neighbourhood']:
    address = '{}, Toronto, Canada'.format(neighbourhood)
    location = geolocator.geocode(address)
    if location is None:
        n_failed += 1
        print('{} could not be geocoded'.format(address))
    else:
        lat = location.latitude
        lon = location.longitude
        print(address, lat, lon)

print('{} instances could not be geocoded'.format(n_failed))


West Humber-Clairville , Toronto, Canada 43.72337025 -79.59745741095173
Mount Olive-Silverstone-Jamestown , Toronto, Canada could not be geocoded
Thistletown-Beaumond Heights , Toronto, Canada could not be geocoded
Rexdale-Kipling , Toronto, Canada 43.722114149999996 -79.57229244708017
Elms-Old Rexdale , Toronto, Canada 43.72176985 -79.55217331972301
Kingsview Village-The Westway , Toronto, Canada could not be geocoded
Willowridge-Martingrove-Richview , Toronto, Canada could not be geocoded
Humber Heights-Westmount , Toronto, Canada 43.6977767 -79.5212217
Edenbridge-Humber Valley , Toronto, Canada 43.670672 -79.5188545
Princess-Rosethorn , Toronto, Canada could not be geocoded
Eringate-Centennial-West Deane , Toronto, Canada could not be geocoded
Markland Wood , Toronto, Canada 43.63123865 -79.58543401986114
Etobicoke West Mall , Toronto, Canada 43.620635 -79.560287
Islington-City Centre West , Toronto, Canada 43.645335 -79.5248163
Kingsway South , Toronto, Canada 43.6473811 -79.511332