<h1 align=center><font size = 5>Segmenting and Clustering Neighborhoods in Toronto</font></h1>

## Introduction

In this lab, you will learn how to convert addresses into their equivalent latitude and longitude values. You will also use the Foursquare API to explore neighborhoods in Toronto. You will use the **explore** function to get the most common venue categories in each neighborhood, and then use these categories as features to group the neighborhoods into clusters with the *k*-means clustering algorithm. Finally, you will use the Folium library to visualize the neighborhoods in Toronto and their emerging clusters.
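The feature-building step described above can be sketched with pandas alone: one-hot encode each venue's category, average the indicators per neighborhood to get the share of venues in each category, and take the most frequent category. This is a minimal sketch on a small hypothetical venue table. The `venues` frame, its column names and its values are assumptions, not real Foursquare output. In the lab, that table would be built from the **explore** responses.

```python
import pandas as pd

# Hypothetical venue list (not Foursquare data): one row per venue found near a neighborhood.
venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "B", "B"],
    "Venue Category": ["Coffee Shop", "Coffee Shop", "Park", "Park", "Bakery"],
})

# One-hot encode the categories: one 0/1 indicator column per category.
onehot = pd.get_dummies(venues["Venue Category"]).astype(int)
onehot.insert(0, "Neighborhood", venues["Neighborhood"])

# The mean of the indicators per neighborhood is the share of venues in each category.
freq = onehot.groupby("Neighborhood").mean()

# Most common category per neighborhood (ties resolve to the first column).
top_category = freq.idxmax(axis=1)
print(freq)
print(top_category)
```

Rows of a frequency table like `freq` are the kind of input you would pass to scikit-learn's `KMeans(n_clusters=k).fit(...)` to cluster neighborhoods.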

### Setup: Install and import the dependencies

In [2]:
!conda install -c conda-forge geopy --yes # skip this line if you have already completed the Foursquare API lab
!conda install -c anaconda beautifulsoup4 --yes
!conda install -c anaconda html5lib --yes
!conda install -c anaconda lxml --yes
!conda install -c conda-forge folium=0.5.0 --yes # skip this line if you have already completed the Foursquare API lab


import numpy as np # library to handle data in a vectorized manner
import pandas as pd # library for data analysis
import json # library to handle JSON files
import requests # library to handle requests
import urllib.request # library to download web pages
import matplotlib.cm as cm # Matplotlib and associated plotting modules
import matplotlib.colors as colors # Matplotlib and associated plotting modules
import folium # map rendering library

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
from pandas import json_normalize # transform a JSON file into a pandas dataframe (pandas >= 1.0)
from sklearn.cluster import KMeans # import k-means from clustering stage
from bs4 import BeautifulSoup # parse HTML pages

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)


print('Libraries imported.')


## 1. Download and Explore the Dataset

In [3]:
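# download the Wikipedia page that lists the postal codes starting with "M"
source = urllib.request.urlopen('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').read()
soup = BeautifulSoup(source, 'lxml')

# parse the page's tables with pandas and keep the first one (the postal code table)
table = soup.find_all('table')
df = pd.read_html(str(table))[0]

df.head(5)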

|   | Postal Code | Borough | Neighborhood |
|---|-------------|---------|--------------|
| 0 | M1A | Not assigned | |
| 1 | M2A | Not assigned | |
| 2 | M3A | North York | Parkwoods |
| 3 | M4A | North York | Victoria Village |
| 4 | M5A | Downtown Toronto | Regent Park, Harbourfront |


In [4]:
df.shape

(180, 3)

### 1.1 Drop rows whose borough is "Not assigned"

In [5]:
Toronto_df = df[df['Borough'] != 'Not assigned']
Toronto_df.shape

(103, 3)

### 1.2 Combine neighborhoods that share a postal code and borough into one comma-separated row

In [6]:
# one row per (Postal Code, Borough); the neighborhoods of each group are joined with ', '
Toronto_DataSet = pd.pivot_table(Toronto_df, index=['Postal Code', 'Borough'],
                        values=['Neighborhood'],
                        aggfunc=lambda x: ', '.join(map(str, x)))

In [7]:
Toronto_DataSet.reset_index(inplace=True)
Toronto_DataSet.head(5)

|   | Postal Code | Borough | Neighborhood |
|---|-------------|---------|--------------|
| 0 | M1B | Scarborough | Malvern, Rouge |
| 1 | M1C | Scarborough | Rouge Hill, Port Union, Highland Creek |
| 2 | M1E | Scarborough | Guildwood, Morningside, West Hill |
| 3 | M1G | Scarborough | Woburn |
| 4 | M1H | Scarborough | Cedarbrae |
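The same grouping can also be written with `groupby` and a string join, which some readers find easier to follow than `pivot_table` with a lambda. This is a minimal sketch on a tiny hypothetical input in which one postal code is split across two rows. The split rows are an assumption for illustration, and the joined value matches the `M1B` row above. Unlike the `map(str, x)` version, `", ".join` fails on missing (NaN) neighborhoods. That is harmless here only because those rows were already dropped.

```python
import pandas as pd

# Hypothetical input: postal code M1B appears on two rows, one per neighborhood.
rows = pd.DataFrame({
    "Postal Code": ["M1B", "M1B", "M1G"],
    "Borough": ["Scarborough", "Scarborough", "Scarborough"],
    "Neighborhood": ["Malvern", "Rouge", "Woburn"],
})

# Join the neighborhoods of each (Postal Code, Borough) group into one string.
combined = (
    rows.groupby(["Postal Code", "Borough"])["Neighborhood"]
        .agg(", ".join)
        .reset_index()
)
print(combined)
```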


### 1.3 Use the borough as the neighborhood when a row has a borough but its neighborhood is "Not assigned"

In [8]:
Filter = (Toronto_DataSet['Neighborhood'] == 'Not assigned') & (Toronto_DataSet['Borough'].notnull())
Toronto_DataSet.loc[Filter, 'Neighborhood'] = Toronto_DataSet.loc[Filter, 'Borough']

In [9]:
Toronto_DataSet.shape

(103, 3)
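The filter above only matches the literal text `'Not assigned'`. In the first table, unassigned neighborhoods show up as empty cells, which pandas reads as NaN. A slightly more defensive variant treats both cases as unassigned and fills them from the borough with `Series.mask`. This is a minimal sketch on a hypothetical frame. The postal codes and names are placeholders, not rows from the Wikipedia table.

```python
import numpy as np
import pandas as pd

# Hypothetical rows: one placeholder string, one empty (NaN) cell, one real value.
data = pd.DataFrame({
    "Postal Code": ["X1A", "X1B", "X1C"],
    "Borough": ["Borough A", "Borough B", "Borough C"],
    "Neighborhood": ["Not assigned", np.nan, "Neighborhood C"],
})

# Treat both the placeholder text and missing values as "no neighborhood".
unassigned = data["Neighborhood"].isna() | (data["Neighborhood"] == "Not assigned")

# Where the condition holds, replace the neighborhood with the row's borough.
data["Neighborhood"] = data["Neighborhood"].mask(unassigned, data["Borough"])
print(data)
```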

## 2. Load the geographical coordinates of each postal code into a dataframe

In [10]:
import pandas as pd
Geospatial_Dataset = pd.read_csv('Geospatial_data.csv')
Geospatial_Dataset.head(5)

|   | Postal Code | Latitude | Longitude |
|---|-------------|----------|-----------|
| 0 | M1B | 43.806686 | -79.194353 |
| 1 | M1C | 43.784535 | -79.160497 |
| 2 | M1E | 43.763573 | -79.188711 |
| 3 | M1G | 43.770992 | -79.216917 |
| 4 | M1H | 43.773136 | -79.239476 |


### 2.1 Merge the neighborhood dataset with the geographical coordinates

In [11]:
# join on the shared 'Postal Code' column (inner join by default)
Toronto_With_Geospatial_Data = Toronto_DataSet.merge(Geospatial_Dataset, on='Postal Code')
Toronto_With_Geospatial_Data.head(5)

|   | Postal Code | Borough | Neighborhood | Latitude | Longitude |
|---|-------------|---------|--------------|----------|-----------|
| 0 | M1B | Scarborough | Malvern, Rouge | 43.806686 | -79.194353 |
| 1 | M1C | Scarborough | Rouge Hill, Port Union, Highland Creek | 43.784535 | -79.160497 |
| 2 | M1E | Scarborough | Guildwood, Morningside, West Hill | 43.763573 | -79.188711 |
| 3 | M1G | Scarborough | Woburn | 43.770992 | -79.216917 |
| 4 | M1H | Scarborough | Cedarbrae | 43.773136 | -79.239476 |
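An inner join silently drops any postal code that has no coordinates. A quick way to check the join is to run it once as a left join with pandas' `validate` and `indicator` options. `validate="one_to_one"` raises an error if a postal code is duplicated on either side, and `indicator=True` adds a `_merge` column that marks rows found only on the left. The sketch below assumes a small in-memory stand-in for `Geospatial_data.csv`, built from two rows shown above. The switch to `how="left"` is a choice made for this check, not what the cell above does.

```python
import io
import pandas as pd

# In-memory stand-in for Geospatial_data.csv (two rows copied from the output above).
csv_text = """Postal Code,Latitude,Longitude
M1B,43.806686,-79.194353
M1C,43.784535,-79.160497
"""
geo = pd.read_csv(io.StringIO(csv_text))

neighborhoods = pd.DataFrame({
    "Postal Code": ["M1B", "M1C"],
    "Borough": ["Scarborough", "Scarborough"],
    "Neighborhood": ["Malvern, Rouge", "Rouge Hill, Port Union, Highland Creek"],
})

# Left join keeps every neighborhood row; validate raises MergeError on duplicate keys,
# and indicator adds a '_merge' column ('both', 'left_only' or 'right_only').
merged = neighborhoods.merge(
    geo, on="Postal Code", how="left", validate="one_to_one", indicator=True
)
missing = merged[merged["_merge"] != "both"]
print(merged.drop(columns="_merge"))
print(f"{len(missing)} postal code(s) without coordinates")
```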


In [12]:
Toronto_With_Geospatial_Data['Borough'].value_counts()

North York          24
Downtown Toronto    19
Scarborough         17
Etobicoke           12
Central Toronto      9
West Toronto         6
York                 5
East York            5
East Toronto         5
Mississauga          1
Name: Borough, dtype: int64

In [103]:
Toronto_With_Geospatial_Data.shape

(103, 5)

In [14]:
# keep only the boroughs whose name contains "Toronto"
Toronto_With_Geospatial_Data = Toronto_With_Geospatial_Data[Toronto_With_Geospatial_Data['Borough'].str.contains('Toronto')]
Toronto_With_Geospatial_Data.shape

(39, 5)
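The 39 remaining rows can be checked against the borough counts printed earlier. Four boroughs contain "Toronto" in their name: Downtown, Central, West and East Toronto. Their counts add up to 19 + 9 + 6 + 5 = 39, while "York" and "East York" are excluded. The sketch below copies those counts into a Series and applies the same `str.contains` test to its index.

```python
import pandas as pd

# Borough counts copied from the value_counts() output above.
counts = pd.Series({
    "North York": 24, "Downtown Toronto": 19, "Scarborough": 17,
    "Etobicoke": 12, "Central Toronto": 9, "West Toronto": 6,
    "York": 5, "East York": 5, "East Toronto": 5, "Mississauga": 1,
})

# Same test the filter applies: does the borough name contain "Toronto"?
is_toronto = counts.index.str.contains("Toronto")
print(counts[is_toronto])
print(counts[is_toronto].sum())
```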

### 2.2 Geocode the address of Toronto and create a map of the neighborhoods

In [15]:
from geopy.geocoders import Nominatim
address = 'Toronto, CA'

# Nominatim's usage policy asks for an identifying user_agent
geolocator = Nominatim(user_agent="toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.6534817, -79.3839347.


In [16]:
# create map of Toronto using latitude and longitude values
map_Toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add a circle marker for each neighborhood, with a "neighborhood, borough" popup
for lat, lng, borough, neighborhood in zip(
        Toronto_With_Geospatial_Data['Latitude'],
        Toronto_With_Geospatial_Data['Longitude'],
        Toronto_With_Geospatial_Data['Borough'],
        Toronto_With_Geospatial_Data['Neighborhood']):
    label = folium.Popup('{}, {}'.format(neighborhood, borough), parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_Toronto)

map_Toronto