# Segmenting and Clustering Neighborhoods in the city of Toronto

## Part-1: Explore and Cluster the neighborhoods in Toronto

In [1]:
import requests
import pandas as pd


Fetch the HTML of the Wikipedia page and parse it with pd.read_html, which reads every HTML table on the page into a list of DataFrame objects. Then drop the rows whose Borough is "Not assigned".

In [28]:
wiki = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
wikipedia_page = requests.get(wiki)
# stop early if the request did not succeed (4xx/5xx status)
wikipedia_page.raise_for_status()


In [29]:
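# read_html returns a list of DataFrames; the postal code table is the first one
df_raw = pd.read_html(wikipedia_page.content, header=0)[0]
# keep only the rows that have an assigned borough
df_new = df_raw[df_raw.Borough != 'Not assigned']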

df_new.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"


Check whether any remaining row still has "Not assigned" as its Neighbourhood.

In [27]:
df_new.loc[df_new.Neighbourhood == 'Not assigned']

Unnamed: 0,Postal Code,Borough,Neighbourhood


In [30]:
# Fall back to the borough name wherever the neighbourhood is "Not assigned".
# assign() returns a new DataFrame, so nothing is written in place to a slice of df_raw
# (the in-place replace raised a SettingWithCopyWarning).
df_new = df_new.assign(
    Neighbourhood=df_new.Neighbourhood.where(df_new.Neighbourhood != 'Not assigned', df_new.Borough)
)
df_new.head(8)


Unnamed: 0,Postal Code,Borough,Neighbourhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
8,M9A,Etobicoke,"Islington Avenue, Humber Valley Village"
9,M1B,Scarborough,"Malvern, Rouge"
11,M3B,North York,Don Mills
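
The filter-then-fallback steps above can be tried on a tiny hand-made frame. This is a minimal sketch, not the notebook's data: the rows in `toy` are invented for illustration, and `Series.where` is used here as one possible way to apply the fallback on an explicit copy.

```python
import pandas as pd

# Hypothetical rows, invented for illustration only.
toy = pd.DataFrame({
    'Postal Code': ['M1A', 'M2A', 'M3A'],
    'Borough': ['Not assigned', "Queen's Park", 'North York'],
    'Neighbourhood': ['Not assigned', 'Not assigned', 'Parkwoods'],
})

# Drop rows without a borough; .copy() makes the result an independent frame.
toy_clean = toy[toy['Borough'] != 'Not assigned'].copy()

# Keep the neighbourhood where it is assigned, otherwise use the borough.
assigned = toy_clean['Neighbourhood'] != 'Not assigned'
toy_clean['Neighbourhood'] = toy_clean['Neighbourhood'].where(assigned, toy_clean['Borough'])

print(toy_clean)
```

Working on an explicit copy keeps the original frame untouched and avoids the ambiguity pandas warns about when writing to a filtered slice.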


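Neighbourhoods that share a Postal Code already appear on one comma-separated row (for example M5A above), so no further grouping is needed here. Check the shape of the cleaned table.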

In [32]:
df_new.shape

(103, 3)
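
If the scraped table instead listed one row per neighbourhood, the grouping could be sketched as below. The long-format layout of `long_df` is an assumption made for illustration; it is not what the page returned in this notebook.

```python
import pandas as pd

# Hypothetical long-format input: one row per neighbourhood.
long_df = pd.DataFrame({
    'Postal Code': ['M5A', 'M5A', 'M3A'],
    'Borough': ['Downtown Toronto', 'Downtown Toronto', 'North York'],
    'Neighbourhood': ['Regent Park', 'Harbourfront', 'Parkwoods'],
})

# Combine neighbourhoods sharing a postal code into one comma-separated string.
# sort=False keeps the postal codes in their order of first appearance.
grouped = (
    long_df.groupby(['Postal Code', 'Borough'], sort=False)['Neighbourhood']
    .agg(', '.join)
    .reset_index()
)

print(grouped)
```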

## Part-2: Get the latitude and the longitude coordinates of each neighborhood

In [34]:
url = 'http://cocl.us/Geospatial_data'
df_geo=pd.read_csv(url)
df_geo.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [35]:
df_geo.shape

(103, 3)

Both tables have the same number of rows (103), so add the Latitude and Longitude columns to df_new by joining on Postal Code.

In [42]:
df_new = df_new.join(df_geo.set_index('Postal Code'), on='Postal Code')
df_new.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
2,M3A,North York,Parkwoods,43.753259,-79.329656
3,M4A,North York,Victoria Village,43.725882,-79.315572
4,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.654260,-79.360636
5,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
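
Equal shapes do not by themselves guarantee that every postal code found a match. As a hedged sketch of one way to check this, `DataFrame.merge` can flag unmatched keys with `indicator=True` and reject duplicate keys with `validate`. The frames `left` and `right` and the code 'M9Z' are made up for illustration.

```python
import pandas as pd

# Hypothetical inputs; 'M9Z' is an invented code with no coordinates.
left = pd.DataFrame({
    'Postal Code': ['M3A', 'M4A', 'M9Z'],
    'Borough': ['North York', 'North York', 'Nowhere'],
})
right = pd.DataFrame({
    'Postal Code': ['M3A', 'M4A'],
    'Latitude': [43.753259, 43.725882],
    'Longitude': [-79.329656, -79.315572],
})

# validate='one_to_one' raises if a postal code is duplicated on either side;
# indicator=True adds a '_merge' column saying where each row came from.
merged = left.merge(right, on='Postal Code', how='left',
                    validate='one_to_one', indicator=True)

# Postal codes that received no coordinates.
unmatched = merged.loc[merged['_merge'] == 'left_only', 'Postal Code'].tolist()
print(unmatched)
```

An empty `unmatched` list on the real tables would confirm that the join filled Latitude and Longitude for every row.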

## Part-3: Use the Foursquare API to segment and cluster the neighborhoods of Toronto

In [44]:
# geopy provides the Nominatim geocoder used below; the geocoder package was never used,
# and installing it failed with an UnsatisfiableError in this environment
!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim

address = 'Toronto, Ontario'

# look up the coordinates of Toronto to centre the map
geolocator = Nominatim(user_agent="toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

In [45]:
# assumes folium is installed, e.g. via: conda install -c conda-forge folium
import folium

# create map of Toronto using latitude and longitude values
map_Toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add one marker per postal code area; the data lives in df_new and the column is spelled 'Neighbourhood'
for lat, lng, borough, neighborhood in zip(df_new['Latitude'], df_new['Longitude'], df_new['Borough'], df_new['Neighbourhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
    ).add_to(map_Toronto)

map_Toronto

In [46]:
# placeholders: substitute your own values and keep real secrets out of shared notebooks
CLIENT_ID = 'YOUR_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20180604' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: YOUR_CLIENT_ID
CLIENT_SECRET: YOUR_CLIENT_SECRET


In [48]:
# df_new keeps the row labels of the Wikipedia table (the first one is 2, not 0),
# so df_new.loc[0, ...] raises KeyError: 0; select the first row by position instead
first_row = df_new.iloc[0]

neighborhood_latitude = first_row['Latitude'] # neighborhood latitude value
neighborhood_longitude = first_row['Longitude'] # neighborhood longitude value

neighborhood_name = first_row['Neighbourhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))
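
With the credentials and one neighbourhood's coordinates in hand, the next step would be a venue query. The sketch below only builds the request URL and makes no network call. It assumes the legacy Foursquare v2 `venues/explore` endpoint, which may not be available to newer accounts. The helper name `build_explore_url` and the `radius` and `limit` defaults are illustrative choices, not values taken from this notebook.

```python
from urllib.parse import urlencode

def build_explore_url(client_id, client_secret, version, lat, lng,
                      radius=500, limit=100):
    """Build a query URL for the legacy Foursquare v2 venues/explore endpoint.

    radius (metres) and limit (max venues) are assumed defaults.
    """
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),  # "latitude,longitude"
        'radius': radius,
        'limit': limit,
    }
    # urlencode percent-escapes the values, including the comma in 'll'
    return 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params)

# Placeholder credentials; the coordinates are those listed for M3A (Parkwoods).
url = build_explore_url('YOUR_CLIENT_ID', 'YOUR_CLIENT_SECRET', '20180604',
                        43.753259, -79.329656)
print(url)
```

The resulting URL could then be passed to `requests.get(url).json()`, as was done for the Wikipedia page in Part-1.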