## Get the Geo-Data of the Neighbourhoods <a name='neighbourhood_data'></a>

### First Method 

This method takes the neighbourhood information for the City of Toronto from a wiki page. The page groups the neighbourhoods mainly by postcode, so the dataset contains only 103 groups. The geo information (latitude and longitude) is then added to the dataset; it is acquired from the IBM server.

#### 1 Load the table that contains the neighbourhood information from the wiki page

In [1]:
# ignore SSL certificate errors
import ssl

ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE


# load the data
import urllib.request
from bs4 import BeautifulSoup

url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
html = urllib.request.urlopen(url, context=ctx).read()
soup = BeautifulSoup(html, 'html.parser')
table = soup.find('table')

#### 2 Transform the HTML table into a Pandas DataFrame

In [2]:
import pandas as pd

fields = table.find_all('td') 
# postcode, borough and neighbourhood information are all in 'td' tags

postcode = []
borough = []
neighbourhood = []

for i in range(0, len(fields), 3):  # step through the cells three at a time (one table row per step)
    postcode.append(fields[i].text.strip())        # create postcode list
    borough.append(fields[i+1].text.strip())       # create borough list
    neighbourhood.append(fields[i+2].text.strip()) # create neighbourhood list
    
toronto_geo = pd.DataFrame(data=[postcode, borough, neighbourhood]).transpose()
toronto_geo.columns = ['Postcode', 'Borough', 'Neighbourhood']
toronto_geo.head()

,Postcode,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
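
The fixed stride of three only works while every `td` cell belongs to a complete three-column row. As a hedged alternative, the sketch below groups cells by their parent `tr` using only the standard library's `html.parser`, so a malformed row can be detected and skipped. The `RowCollector` class and the `sample_html` string are illustrative assumptions, not part of the original notebook, and the sample markup only imitates the shape of the wiki table.

```python
from html.parser import HTMLParser


class RowCollector(HTMLParser):
    """Collect the text of every <td> cell, grouped by its parent <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []      # completed rows, each a list of cell strings
        self._row = None    # cells of the row currently being parsed
        self._cell = None   # text fragments of the cell currently being parsed

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self._row = []
        elif tag == 'td' and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == 'td' and self._cell is not None:
            self._row.append(''.join(self._cell).strip())
            self._cell = None
        elif tag == 'tr' and self._row is not None:
            if self._row:   # skip header rows, which hold only <th> cells
                self.rows.append(self._row)
            self._row = None


# made-up sample markup standing in for the downloaded page (assumption)
sample_html = """
<table>
  <tr><th>Postcode</th><th>Borough</th><th>Neighbourhood</th></tr>
  <tr><td>M1A</td><td>Not assigned</td><td>Not assigned</td></tr>
  <tr><td>M3A</td><td>North York</td><td>Parkwoods</td></tr>
</table>
"""

collector = RowCollector()
collector.feed(sample_html)

# keep only complete three-column rows
rows = [row for row in collector.rows if len(row) == 3]
```

Each entry of `rows` is one `[postcode, borough, neighbourhood]` triple, so the list could be passed to `pd.DataFrame` with the same three column names used above.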


#### 3 Remove rows with 'Not assigned' information

In [3]:
import numpy as np

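toronto_geo['Borough'] = toronto_geo['Borough'].replace('Not assigned', np.nan)
# replace 'Not assigned' with NaN; the result is assigned back because an
# in-place replace on a selected column may not update the DataFrame itself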

toronto_geo.dropna(subset=['Borough'], inplace=True)   # drop NaN data
toronto_geo.reset_index(inplace=True, drop=True)       # reset index
toronto_geo.head()

,Postcode,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
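
The same filtering can also be written without the intermediate NaN step. The sketch below is one possible shortcut, assuming pandas is installed; the three-row `sample` frame is made up for illustration and only mirrors the column layout used above.

```python
import pandas as pd

# small made-up frame with the same columns as toronto_geo (assumption)
sample = pd.DataFrame({
    'Postcode': ['M1A', 'M3A', 'M4A'],
    'Borough': ['Not assigned', 'North York', 'North York'],
    'Neighbourhood': ['Not assigned', 'Parkwoods', 'Victoria Village'],
})

# keep rows whose borough is assigned, then renumber the index from 0
assigned = sample[sample['Borough'] != 'Not assigned'].reset_index(drop=True)
```

A boolean mask keeps the intent in one line and leaves any genuine missing values in other columns untouched.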


#### 4 Add geo information to the Dataframe

In [4]:
# ignore ssl certificate errors
import ssl
ssl._create_default_https_context = ssl._create_unverified_context

# load geo information data from the server (provided by the capstone)
geo_data = pd.read_csv('http://cocl.us/Geospatial_data')
geo_data.columns = ['Postcode', 'Latitude', 'Longitude'] # rename the columns

# merge the neighbourhood DataFrame with the geo information DataFrame
toronto_geo = pd.merge(toronto_geo, geo_data, on=['Postcode'], how='inner')
toronto_geo.head()

,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
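
An inner merge silently drops any postcode that has no match in the geo table. The sketch below shows one way to check for such losses, assuming pandas is installed; the two small frames are made up for illustration, and the postcode `M9Z` is a hypothetical entry chosen so that one row has no coordinates.

```python
import pandas as pd

# made-up neighbourhood rows; 'M9Z' is hypothetical and has no coordinates
places = pd.DataFrame({
    'Postcode': ['M3A', 'M4A', 'M9Z'],
    'Borough': ['North York', 'North York', 'Etobicoke'],
})

# coordinates for the first two postcodes, copied from the output above
coords = pd.DataFrame({
    'Postcode': ['M3A', 'M4A'],
    'Latitude': [43.753259, 43.725882],
    'Longitude': [-79.329656, -79.315572],
})

# a left merge keeps every neighbourhood row; indicator=True adds a '_merge'
# column, and validate raises an error if a postcode is duplicated on either side
checked = pd.merge(places, coords, on='Postcode', how='left',
                   indicator=True, validate='one_to_one')

# postcodes that found no coordinates
unmatched = checked.loc[checked['_merge'] == 'left_only', 'Postcode'].tolist()
```

If `unmatched` is empty, an inner merge and a left merge give the same rows, so the inner merge used above loses nothing.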


### Second Method

This method takes both the neighbourhood information and the geo information from the GeoJSON file of the City of Toronto, which reports 140 neighbourhoods. The same file is used in the following sections of this study, so the neighbourhood definitions stay consistent throughout.

#### 1 Load the geojson file of the City of Toronto

In [5]:
import json

with open('Neighbourhoods.geojson') as json_file:
    data = json.load(json_file)

# inspect the record of one neighbourhood
data['features'][2]

{'type': 'Feature',
 'properties': {'_id': 5323,
  'AREA_ID': 25886834,
  'AREA_ATTR_ID': 25926664,
  'PARENT_AREA_ID': 49885,
  'AREA_SHORT_CODE': 97,
  'AREA_LONG_CODE': 97,
  'AREA_NAME': 'Yonge-St.Clair (97)',
  'AREA_DESC': 'Yonge-St.Clair (97)',
  'X': None,
  'Y': None,
  'LONGITUDE': -79.3978707687,
  'LATITUDE': 43.687858872,
  'OBJECTID': 16491537,
  'Shape__Area': 2222464.265625,
  'Shape__Length': 8130.41127575658},
 'geometry': {'type': 'Polygon',
  'coordinates': [[[-79.391194825918, 43.681081122778],
    [-79.391405429411, 43.6809695537531],
    [-79.3932237727444, 43.6801656402164],
    [-79.3958088314622, 43.6789799421746],
    [-79.3973493866807, 43.6782748136808],
    [-79.3974560560976, 43.6782254111802],
    [-79.397563894062, 43.67816700831],
    [-79.397671319664, 43.6781175972839],
    [-79.3977795512042, 43.6780682048796],
    [-79.3978885343628, 43.6780142943828],
    [-79.3979313672526, 43.6779949618024],
    [-79.3979440512351, 43.6780262977103],
    [-79.39

#### 2 Convert the JSON file to a Pandas DataFrame

In [6]:
import pandas as pd

neighbourhood = []
neighbourhood_id = []
latitude = []
longitude = []

for item in data['features']:  # one feature per neighbourhood

    # drop the trailing code, e.g. 'Yonge-St.Clair (97)' -> 'Yonge-St.Clair'
    neighbourhood_name = ' '.join(item['properties']['AREA_NAME'].split(' ')[0: -1])
    neighbourhood.append(neighbourhood_name)
    neighbourhood_id.append(item['properties']['AREA_SHORT_CODE'])
    latitude.append(item['properties']['LATITUDE']) 
    longitude.append(item['properties']['LONGITUDE'])

toronto_geo = pd.DataFrame(data=[neighbourhood, neighbourhood_id, latitude, longitude]).transpose()
toronto_geo.columns = ['Neighbourhood', 'Neighbourhood ID', 'Latitude', 'Longitude']
toronto_geo.head()

,Neighbourhood,Neighbourhood ID,Latitude,Longitude
0,Wychwood,94,43.6769,-79.4255
1,Yonge-Eglinton,100,43.7047,-79.4036
2,Yonge-St.Clair,97,43.6879,-79.3979
3,York University Heights,27,43.7657,-79.4889
4,Yorkdale-Glen Park,31,43.7147,-79.4571
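
Splitting `AREA_NAME` on spaces works as long as every name ends with a code in parentheses. A slightly more defensive sketch is given below: it strips the suffix with a regular expression and builds one record per feature. The `to_record` helper and the `CODE_SUFFIX` pattern are illustrative names, not part of the original notebook, and the two `features` entries are trimmed-down samples in the shape of the GeoJSON output shown earlier.

```python
import re

# trimmed-down sample features (assumption); only the properties used below are kept
features = [
    {'properties': {'AREA_NAME': 'Yonge-St.Clair (97)', 'AREA_SHORT_CODE': 97,
                    'LATITUDE': 43.687858872, 'LONGITUDE': -79.3978707687}},
    {'properties': {'AREA_NAME': 'York University Heights (27)', 'AREA_SHORT_CODE': 27,
                    'LATITUDE': 43.7657, 'LONGITUDE': -79.4889}},
]

# matches a trailing code such as ' (97)'; names without one are left unchanged
CODE_SUFFIX = re.compile(r'\s*\(\d+\)$')


def to_record(feature):
    """Turn one GeoJSON feature into a flat dict of the four columns used here."""
    props = feature['properties']
    return {
        'Neighbourhood': CODE_SUFFIX.sub('', props['AREA_NAME']),
        'Neighbourhood ID': props['AREA_SHORT_CODE'],
        'Latitude': props['LATITUDE'],
        'Longitude': props['LONGITUDE'],
    }


records = [to_record(feature) for feature in features]
```

A list of dicts like `records` can be passed directly to `pd.DataFrame`, which keeps each column's own dtype instead of the object dtype produced by transposing a list of lists.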


#### 3 Examine how many neighbourhoods are in the dataset

In [7]:
print('Neighbourhood Count:', len(toronto_geo['Neighbourhood'].value_counts()))
print('Neighbourhood ID Count:', len(toronto_geo['Neighbourhood ID'].value_counts()))

Neighbourhood Count: 140
Neighbourhood ID Count: 140


#### 4 Save the DataFrame as 'toronto_geo.csv' for later use

In [8]:
toronto_geo.to_csv('toronto_geo.csv', index=False)

## Visualize the Neighbourhoods in Toronto <a name='neighbourhoods_visualization'></a>

#### 1 Get the geographical coordinates of the City of Toronto

In [9]:
# ignore ssl certificate errors
import ssl
ssl._create_default_https_context = ssl._create_unverified_context

from geopy.geocoders import Nominatim

address = 'Lawrence Park, Toronto' # approximate spatial centre of Toronto

geolocator = Nominatim(user_agent='toronto_explorer')
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of the City of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of the City of Toronto are 43.729199, -79.4032525.
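
The geocoder lookup needs network access and relies on a place name picked by eye. As a hedged alternative, the map centre can be approximated by averaging the neighbourhood centroids. The sketch below uses only the standard library; the five `centroids` are copied from the first rows of the table in the previous section, and the names `centre_lat` and `centre_lng` are illustrative. In the notebook, the full `Latitude` and `Longitude` columns would be averaged instead.

```python
from statistics import fmean

# (latitude, longitude) centroids of the first five neighbourhoods listed earlier
centroids = [
    (43.6769, -79.4255),
    (43.7047, -79.4036),
    (43.6879, -79.3979),
    (43.7657, -79.4889),
    (43.7147, -79.4571),
]

# a plain arithmetic mean is adequate at city scale
centre_lat = fmean(lat for lat, _ in centroids)
centre_lng = fmean(lng for _, lng in centroids)
```

The resulting pair can be passed as `location=[centre_lat, centre_lng]` when the map is created.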


#### 2 Create the map of Toronto using the latitude and longitude values

In [10]:
import folium

# create the map
map_neighbourhoods = folium.Map(location=[latitude, longitude],
                                zoom_start=11,
                                control_scale=True,
                                # disable zooming, scrolling and dragging
                                zoom_control=False, scrollWheelZoom=False, dragging=False)
# add markers to map
marker_count = 0

for lat, lng, neighbourhood in zip(toronto_geo['Latitude'],
                                   toronto_geo['Longitude'],
                                   toronto_geo['Neighbourhood']):
    
    label = '{}'.format(neighbourhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker([lat, lng],
                        radius=5,
                        popup=label,
                        color='yellow',
                        fill=True,
                        fill_color='blue',
                        fill_opacity=0.8).add_to(map_neighbourhoods)

    marker_count += 1
    
print('{} markers have been added to the map.'.format(marker_count))
map_neighbourhoods

140 markers have been added to the map.
