# Toronto Neighbourhood clustering - Part 3 - Clustering

Firstly, the necessary packages are imported. `requests` is used to fetch the raw HTML, which `pandas` then parses into a dataframe. `folium` is used to render maps, and the `Nominatim` geocoder from `geopy` is used to obtain coordinate data.

```python
import requests
import pandas as pd
import geocoder
import folium
from geopy.geocoders import Nominatim
```

The data is fetched from Wikipedia, and `read_html` parses every HTML table on the page into a list of dataframes. Index `0` selects the first table, which holds the postal code data. The last step removes the rows whose 'Borough' column is 'Not assigned'.

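```python
from io import StringIO
data = requests.get("https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M").text
# read_html returns one dataframe per <table>; the first one holds the postal codes
df = pd.read_html(StringIO(data))[0]
df = df[df.Borough != 'Not assigned']  # keep only postcodes with an assigned borough
```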
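As a small illustration of the filtering step, the sketch below applies the same boolean mask to a hypothetical three-row dataframe. The postcodes and names are illustrative values chosen for the example, not rows copied from the Wikipedia table.

```python
import pandas as pd

# Hypothetical rows in the same shape as the Wikipedia table (illustrative only)
raw = pd.DataFrame({
    'Postcode': ['M1A', 'M3A', 'M5A'],
    'Borough': ['Not assigned', 'North York', 'Downtown Toronto'],
    'Neighbourhood': ['Not assigned', 'Parkwoods', 'Harbourfront'],
})

# Boolean mask: True wherever a borough has been assigned
assigned = raw.Borough != 'Not assigned'

# Indexing with the mask keeps only the True rows; reset_index renumbers them 0..n-1
filtered = raw[assigned].reset_index(drop=True)
print(filtered)
```

The mask is evaluated for every row at once, so no explicit loop is needed.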

A function called `commajoin` is defined. It takes a list of strings and joins them into a single string, separated by a comma and a space. It is passed to the `apply` method so that the neighbourhoods sharing a postal code end up in one row.
`groupby` first groups the entries by 'Postcode' and 'Borough', and `apply(list)` collects each group's 'Neighbourhood' values into a list, which `commajoin` then turns into one string. Because the result is a Series, `to_frame` converts it back into a dataframe, and `reset_index` turns the group keys back into ordinary columns with a default integer index.

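```python
def commajoin(items):
    return ', '.join(items)  # e.g. ['A', 'B'] -> 'A, B'
# One row per (Postcode, Borough); neighbourhoods sharing a postcode become one string
df = df.groupby(['Postcode', 'Borough'])['Neighbourhood'].apply(list).apply(commajoin).to_frame().reset_index()
```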
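To make the chain of calls concrete, the sketch below runs the same `groupby` pipeline on a hypothetical dataframe in which one postcode appears twice. It also shows a shorter variant using `.agg(', '.join)`, which should give the same strings because `str.join` accepts any iterable of strings; this variant is an alternative, not the approach used above.

```python
import pandas as pd

def commajoin(items):
    # Join a list of strings with ", "
    return ', '.join(items)

# Hypothetical rows: postcode M5A is listed twice with different neighbourhoods
rows = pd.DataFrame({
    'Postcode': ['M5A', 'M5A', 'M3A'],
    'Borough': ['Downtown Toronto', 'Downtown Toronto', 'North York'],
    'Neighbourhood': ['Harbourfront', 'Regent Park', 'Parkwoods'],
})

grouped = (
    rows.groupby(['Postcode', 'Borough'])['Neighbourhood']
        .apply(list)        # Series of lists, indexed by (Postcode, Borough)
        .apply(commajoin)   # Series of joined strings
        .to_frame()         # single-column dataframe
        .reset_index()      # group keys become ordinary columns again
)

# Alternative: aggregate each group's values with str.join directly
one_step = rows.groupby(['Postcode', 'Borough'])['Neighbourhood'].agg(', '.join).reset_index()

print(grouped)
```

`groupby` sorts the keys by default, so M3A comes before M5A in the output, while the order of neighbourhoods within a group follows their order in the input.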

Next, wherever the 'Neighbourhood' value is 'Not assigned', it is replaced with the name of the borough in the same row.

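```python
# Select rows with no neighbourhood name and copy the borough into that column.
# Assigning through df.loc updates the dataframe itself.
mask = df['Neighbourhood'] == 'Not assigned'
df.loc[mask, 'Neighbourhood'] = df.loc[mask, 'Borough']
```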
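The pandas documentation warns that the rows yielded by `iterrows` may be copies, so writing to them is not guaranteed to change the dataframe; the replacement is safer when expressed as a selection on the dataframe itself. The sketch below shows this on a hypothetical two-row dataframe, once with `.loc` assignment and once with the non-mutating `Series.mask`, which returns a new Series with the selected values swapped in.

```python
import pandas as pd

# Hypothetical grouped rows; the second has no neighbourhood name (illustrative only)
df_demo = pd.DataFrame({
    'Postcode': ['M5A', 'M7A'],
    'Borough': ['Downtown Toronto', "Queen's Park"],
    'Neighbourhood': ['Harbourfront', 'Not assigned'],
})

unassigned = df_demo['Neighbourhood'] == 'Not assigned'

# Option 1: non-mutating; where unassigned is True, take the value from Borough
fixed = df_demo['Neighbourhood'].mask(unassigned, df_demo['Borough'])

# Option 2: in-place assignment on the dataframe itself
df_demo.loc[unassigned, 'Neighbourhood'] = df_demo.loc[unassigned, 'Borough']
print(df_demo)
```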

Next, a second dataframe is created from the CSV file containing Toronto geospatial data. The first column header is changed from 'Postal Code' to 'Postcode' so that the two dataframes can be joined with `pd.merge` on that column. The columns are then reordered to match the required output.

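```python
df2 = pd.read_csv('http://cocl.us/Geospatial_data')
# Rename 'Postal Code' so the key column matches the one in df
df2.columns = ['Postcode', 'Latitude', 'Longitude']
# Inner join (the default): keeps only postcodes present in both dataframes
df = pd.merge(df2, df, on='Postcode')
df = df[['Postcode', 'Borough', 'Neighbourhood', 'Latitude', 'Longitude']]
```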
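The merge step can be sketched on small hypothetical inputs. This version renames the key column with `rename`, which only touches the named column and so does not depend on the CSV's column order, and passes `validate='one_to_one'`, which makes `pd.merge` raise a `MergeError` if either side has duplicate postcodes. The coordinates below are placeholder values, not real locations.

```python
import pandas as pd

# Hypothetical geospatial rows; coordinates are placeholders, not real locations
geo = pd.DataFrame({
    'Postal Code': ['M3A', 'M5A', 'M9Z'],
    'Latitude': [43.0, 43.1, 43.2],
    'Longitude': [-79.0, -79.1, -79.2],
})
# Hypothetical output of the groupby step
names = pd.DataFrame({
    'Postcode': ['M3A', 'M5A'],
    'Borough': ['North York', 'Downtown Toronto'],
    'Neighbourhood': ['Parkwoods', 'Harbourfront'],
})

# Rename only the key column so both frames share 'Postcode'
geo = geo.rename(columns={'Postal Code': 'Postcode'})

# Inner join: 'M9Z' has no match in names and is dropped
merged = pd.merge(geo, names, on='Postcode', how='inner', validate='one_to_one')
merged = merged[['Postcode', 'Borough', 'Neighbourhood', 'Latitude', 'Longitude']]
print(merged)
```

Because the join is inner, any postcode missing from either table silently disappears; comparing `len(merged)` with the input lengths is a quick way to notice this.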

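In the next step, the coordinates of Toronto are obtained with geopy's `Nominatim` geocoder, which queries OpenStreetMap data. Nominatim requires a `user_agent` string that identifies the application.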

```python
address = 'Toronto'

geolocator = Nominatim(user_agent="toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print(latitude, longitude)
```

(43.653963, -79.387207)
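geopy's `geocode` returns `None` when it finds no match, in which case `location.latitude` above would raise an `AttributeError`. The sketch below wraps the lookup in a hypothetical helper, `get_coordinates`, that falls back to known coordinates instead. It accepts any geocode callable, so it can be exercised offline with a stand-in (`fake_geocode`, also hypothetical) in place of Nominatim.

```python
from collections import namedtuple

def get_coordinates(geocode, address, fallback):
    """Return (latitude, longitude) for address, or fallback if nothing is found.

    geocode is any callable returning an object with .latitude and .longitude,
    or None when there is no match (as geopy's geocoders do).
    """
    location = geocode(address)
    if location is None:
        return fallback
    return location.latitude, location.longitude

# Offline stand-in for Nominatim(...).geocode, used only for this example
FakeLocation = namedtuple('FakeLocation', ['latitude', 'longitude'])

def fake_geocode(address):
    known = {'Toronto': FakeLocation(43.653963, -79.387207)}
    return known.get(address)  # None for unknown addresses

print(get_coordinates(fake_geocode, 'Toronto', (0.0, 0.0)))
print(get_coordinates(fake_geocode, 'Atlantis', (0.0, 0.0)))
```

With the real geocoder, the call would look like `get_coordinates(geolocator.geocode, 'Toronto', fallback=(43.653963, -79.387207))`, reusing the coordinates printed above as the fallback. Network errors are not handled here; geopy signals those with its own exceptions.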


A map of Toronto is created with `folium`, centred on these coordinates, with an initial zoom level of 10.

```python
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)
```

A circle marker is added for each postal code, with a popup showing its neighbourhoods and borough. Evaluating `map_toronto` as the last expression of a cell then renders the map in the notebook.

```python
for lat, lng, borough, neighbourhood in zip(df['Latitude'], df['Longitude'], df['Borough'], df['Neighbourhood']):
    # Popup text of the form "Neighbourhood, Borough"
    label = folium.Popup(f'{neighbourhood}, {borough}', parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
    ).add_to(map_toronto)
```
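An alternative to zipping four columns is `itertuples`, which yields one named tuple per row so each field can be read by column name (this works because the column names are valid Python identifiers). The sketch below separates the data preparation into a hypothetical `marker_records` helper, which can be checked without folium, from a hypothetical `add_circle_markers` helper that reuses the marker styling from the loop above. The demo row is illustrative and its coordinates are placeholders.

```python
import pandas as pd

def marker_records(df):
    """Return one (latitude, longitude, popup label) tuple per row of df."""
    return [
        (row.Latitude, row.Longitude, f'{row.Neighbourhood}, {row.Borough}')
        for row in df.itertuples(index=False)
    ]

def add_circle_markers(map_obj, df):
    """Add a CircleMarker per row to a folium map and return the map."""
    import folium  # imported here so marker_records can be used without folium
    for lat, lng, label in marker_records(df):
        folium.CircleMarker(
            [lat, lng],
            radius=5,
            popup=folium.Popup(label, parse_html=True),
            color='blue',
            fill=True,
            fill_color='#3186cc',
            fill_opacity=0.7,
        ).add_to(map_obj)
    return map_obj

# Hypothetical row with placeholder coordinates
demo = pd.DataFrame({
    'Postcode': ['M3A'],
    'Borough': ['North York'],
    'Neighbourhood': ['Parkwoods'],
    'Latitude': [43.0],
    'Longitude': [-79.0],
})
print(marker_records(demo))
```

In the notebook, `add_circle_markers(map_toronto, df)` would play the role of the loop above.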

```python
# As the last expression in a notebook cell, the map object is rendered inline
map_toronto
```