<h1>Exploring the Neighborhoods in Toronto (Skip to part 3)</h1>

<h3>Utilizing the Foursquare API and data analysis methods to explore and cluster the neighborhoods in North York, Toronto.</h3>

<h2>Part 1:</h2>

In [None]:
#First, we import the necessary libraries
import pandas as pd
from bs4 import BeautifulSoup
import requests
import csv

#Then we scrape the Wikipedia page for the required table data and save it in a .CSV file:
source = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').text
tree = BeautifulSoup(source, 'lxml')

#The first table on the page holds the postal codes; keep the first line of text in each cell
table_tag = tree.select("table")[0]
tab_data = [[item.text.split('\n')[0] for item in row_data.select("th,td")]
                for row_data in table_tag.select("tr")]

#Write the rows inside a with-block so the file is flushed and closed before it is read back
with open("table_data.csv", "w", newline='') as csv_file:
    csv_writer = csv.writer(csv_file)
    csv_writer.writerows(tab_data)

#We read our .CSV file into a dataframe:
df = pd.read_csv('table_data.csv')

#Remove all table rows without assigned boroughs:
df = df.drop(df.index[df.Borough == 'Not assigned'])

#Replace 'Not assigned' in Neighbourhood column with corresponding value from Borough column:
df.loc[df.Neighbourhood == 'Not assigned', 'Neighbourhood'] = df['Borough']

#Group data by Postcode and join all neighbourhoods that exist in the same postal code area:
df = df.groupby('Postcode').agg(lambda x: ','.join(set(x))).reset_index()

#Finally, we print the final number of rows and columns:
df.shape

In [None]:
df.head()
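
The cleaning steps in Part 1 can be checked on a few hand-made rows before running them on the scraped table. The block below is a minimal sketch: the `toy` dataframe and its values are invented for illustration, and `sorted()` is added (an assumption, not part of the cell above) so that the joined neighbourhood names come out in a stable order.

```python
import pandas as pd

# Toy rows standing in for the scraped table (illustrative values, not real data)
toy = pd.DataFrame({
    'Postcode': ['M1A', 'M2A', 'M2A', 'M3A'],
    'Borough': ['Not assigned', 'North York', 'North York', 'Downtown'],
    'Neighbourhood': ['Not assigned', 'Alpha', 'Beta', 'Not assigned'],
})

# Drop rows that have no borough
toy = toy[toy['Borough'] != 'Not assigned'].copy()

# Where the neighbourhood is missing, fall back to the borough name
missing = toy['Neighbourhood'] == 'Not assigned'
toy.loc[missing, 'Neighbourhood'] = toy.loc[missing, 'Borough']

# Collapse to one row per postcode, joining the distinct values in each column
toy = (toy.groupby('Postcode')
          .agg(lambda x: ', '.join(sorted(set(x))))
          .reset_index())

print(toy)
```

Running the same three steps on a small frame makes it easy to confirm that unassigned boroughs disappear and that neighbourhoods sharing a postcode end up in a single row.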

<h2>Part 2:</h2>

In [None]:
#First, we read the .CSV containing the longitude and latitude data into a new dataframe:
geodf = pd.read_csv('https://cocl.us/Geospatial_data')

#Then, we add the columns from the new geo-dataframe to our existing dataframe:
df = df.join(geodf)
df.head()
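
Note that `df.join(geodf)` aligns the two dataframes on their row index, so it only pairs the right coordinates with the right postcode when both tables happen to be sorted the same way. A key-based merge avoids that dependency. The sketch below uses two small invented frames; the column name `Postal Code` for the coordinates table is an assumption about the downloaded file, and the coordinate values are made up.

```python
import pandas as pd

# Two toy tables listed in different row orders (illustrative values only)
left = pd.DataFrame({
    'Postcode': ['M2A', 'M1A'],
    'Borough': ['North York', 'Scarborough'],
})
geo = pd.DataFrame({
    'Postal Code': ['M1A', 'M2A'],   # assumed column name
    'Latitude': [43.8, 43.7],
    'Longitude': [-79.2, -79.4],
})

# Match rows on the postal code itself rather than on row position
merged = (left.merge(geo, left_on='Postcode', right_on='Postal Code', how='left')
              .drop(columns='Postal Code'))

print(merged)
```

With an index-based join, the row for `M2A` would have received the coordinates listed first in `geo`; the merge pairs it with its own entry regardless of order.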

<h2>Part 3:</h2>

<h3>Download all the dependencies that we will need:</h3>

In [None]:
import numpy as np # library to handle data in a vectorized manner

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes # comment out this line if folium is already installed
import folium # map rendering library

print('Libraries imported.')

<h3>Use geopy library to get the latitude and longitude values of Toronto:</h3>

In [None]:
address = 'TORONTO'

geolocator = Nominatim(user_agent="toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

<h3>Create a map of Toronto with neighborhoods superimposed on top:</h3>

In [None]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighbourhood in zip(df['Latitude'], df['Longitude'], df['Borough'], df['Neighbourhood']):
    label = '{}, {}'.format(neighbourhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

<h3>Slice the original dataframe and create a new dataframe of the North York data:</h3>
<h4>(We will segment and cluster only the neighborhoods in North York)</h4>

In [None]:
northyork_data = df[df['Borough'] == 'North York'].reset_index(drop=True)
northyork_data.head()

<h3>Get the geographical coordinates of North York:</h3>

In [None]:
address = 'NORTH YORK, TORONTO'

geolocator = Nominatim(user_agent="nyork_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of North York are {}, {}.'.format(latitude, longitude))

<h3>Create a map of North York and the neighborhoods in it:</h3>

In [None]:
# create map of North York using latitude and longitude values
map_northyork = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(northyork_data['Latitude'], northyork_data['Longitude'], northyork_data['Neighbourhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_northyork)  
    
map_northyork

<h3>Next, we are going to start utilizing the Foursquare API to explore the neighborhoods and segment them:</h3>

In [None]:
#Define Foursquare Credentials and Version
#Read the keys from environment variables instead of hard-coding them in the notebook
#(the variable names FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET are assumptions)
import os
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID']
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET']
VERSION = '20180605'
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)

<h3>Explore the first neighbourhood in our dataframe:</h3>

In [None]:
#Get the neighbourhood's name
northyork_data.loc[0, 'Neighbourhood']

In [None]:
#Get the neighbourhood's latitude and longitude values
neighbourhood_latitude = northyork_data.loc[0, 'Latitude'] # neighborhood latitude value
neighbourhood_longitude = northyork_data.loc[0, 'Longitude'] # neighborhood longitude value

neighbourhood_name = northyork_data.loc[0, 'Neighbourhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighbourhood_name, 
                                                               neighbourhood_latitude, 
                                                               neighbourhood_longitude))

<h3>Now, let's get the top 100 venues that are in Hillcrest Village within a radius of 500 meters:</h3>

In [None]:
#Create the GET request URL

LIMIT = 100
radius = 500 

#Create URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighbourhood_latitude, 
    neighbourhood_longitude, 
    radius, 
    LIMIT)
url
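
The request URL can also be assembled from a dictionary of query parameters, which handles escaping and keeps each parameter on its own line. This is a sketch under stated assumptions: `build_explore_url` is a hypothetical helper, not part of the notebook, and the credential and coordinate values passed to it are placeholders.

```python
from urllib.parse import urlencode

def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=100):
    """Build the venues/explore URL used above from named parameters (hypothetical helper)."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),  # latitude,longitude pair
        'radius': radius,
        'limit': limit,
    }
    # urlencode percent-escapes reserved characters such as the comma in 'll'
    return 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params)

# Placeholder credentials and coordinates, for illustration only
example_url = build_explore_url('MY_ID', 'MY_SECRET', '20180605', 43.8, -79.36)
print(example_url)
```

The same dictionary could be passed to `requests.get` through its `params` argument instead of being encoded by hand.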

In [None]:
#Send the GET request and examine the results
results = requests.get(url).json()
results

<h3>Return venues from Foursquare:</h3>

In [None]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [None]:
#Clean the JSON and structure it into a pandas dataframe
venues = results['response']['groups'][0]['items']

# flatten JSON (pd.json_normalize is available in pandas 1.0 and later)
nearby_venues = pd.json_normalize(venues)

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

In [None]:
#Number of venues returned by Foursquare:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

<h3>Explore neighbourhoods in North York:</h3>

In [None]:
#create a function to repeat the same process to all the neighborhoods in North York
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighbourhood', 
                  'Neighbourhood Latitude', 
                  'Neighbourhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
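
The list comprehension in `getNearbyVenues` indexes `categories[0]` directly, which raises an `IndexError` if a venue comes back with an empty category list. A more defensive extraction step might look like the sketch below. `venue_rows` is a hypothetical helper, and the sample `items` list is hand-written to imitate the shape of the response used above rather than taken from a real API call.

```python
def venue_rows(name, lat, lng, items):
    """Turn the 'items' list of one response into flat tuples (hypothetical helper)."""
    rows = []
    for item in items:
        venue = item.get('venue', {})
        location = venue.get('location', {})
        categories = venue.get('categories') or []
        # Use None when a venue has no category instead of failing on categories[0]
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng,
                     venue.get('name'),
                     location.get('lat'),
                     location.get('lng'),
                     category))
    return rows

# Hand-written sample in the same shape as the parsed response (not real data)
items = [
    {'venue': {'name': 'Sample Park', 'location': {'lat': 43.80, 'lng': -79.36},
               'categories': [{'name': 'Park'}]}},
    {'venue': {'name': 'Unlabelled Spot', 'location': {'lat': 43.81, 'lng': -79.37},
               'categories': []}},
]

rows = venue_rows('Toy Neighbourhood', 43.8, -79.36, items)
for row in rows:
    print(row)
```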

In [None]:
#run the above function on each neighborhood and create a new dataframe called northyork_venues
northyork_venues = getNearbyVenues(names=northyork_data['Neighbourhood'],
                                   latitudes=northyork_data['Latitude'],
                                   longitudes=northyork_data['Longitude']
                                  )

In [None]:
#check the size of the resulting dataframe
print(northyork_venues.shape)
northyork_venues.head()

In [None]:
#check how many venues were returned for each neighborhood
northyork_venues.groupby('Neighbourhood').count()

In [None]:
#find out how many unique categories can be curated from all the returned venues
print('There are {} unique categories.'.format(len(northyork_venues['Venue Category'].unique())))

<h3>Analyzing Each Neighbourhood:</h3>

In [None]:
# one hot encoding
northyork_onehot = pd.get_dummies(northyork_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighbourhood column back to dataframe
northyork_onehot['Neighbourhood'] = northyork_venues['Neighbourhood'] 

# move neighbourhood column to the first column
fixed_columns = [northyork_onehot.columns[-1]] + list(northyork_onehot.columns[:-1])
northyork_onehot = northyork_onehot[fixed_columns]

northyork_onehot.head()

In [None]:
#examine the new dataframe size
northyork_onehot.shape

In [None]:
#group rows by neighbourhood and take the mean of each one-hot column, which gives the frequency of each category
northyork_grouped = northyork_onehot.groupby('Neighbourhood').mean().reset_index()
northyork_grouped

In [None]:
#confirm the new size
northyork_grouped.shape
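
To see why the mean of the one-hot columns equals the share of venues in each category, it helps to run the two steps on a handful of rows. The sketch below uses an invented `venues` frame; `dtype=float` is passed to `get_dummies` so the averaging does not depend on how a given pandas version treats boolean columns.

```python
import pandas as pd

# Six invented venues across two neighbourhoods
venues = pd.DataFrame({
    'Neighbourhood': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Park', 'Park', 'Gym', 'Bank', 'Bank', 'Bank'],
})

# One column per category, 1.0 where the venue belongs to it
onehot = pd.get_dummies(venues[['Venue Category']], prefix='', prefix_sep='', dtype=float)
onehot.insert(0, 'Neighbourhood', venues['Neighbourhood'])

# Averaging the indicator columns gives each category's share of the neighbourhood's venues
grouped = onehot.groupby('Neighbourhood').mean().reset_index()
print(grouped)
```

Neighbourhood A has four venues, two of them parks, so its `Park` value is 0.5; neighbourhood B contains only banks, so its `Bank` value is 1.0.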

In [None]:
#print each neighborhood along with the top 5 most common venues
num_top_venues = 5

for hood in northyork_grouped['Neighbourhood']:
    print("----"+hood+"----")
    temp = northyork_grouped[northyork_grouped['Neighbourhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

<h3>Put the new analyzed data into a pandas dataframe:</h3>

In [None]:
#write a function to sort the venues in descending order
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [None]:
#create the new dataframe and display the top 10 venues for each neighbourhood
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighbourhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighbourhoods_venues_sorted = pd.DataFrame(columns=columns)
neighbourhoods_venues_sorted['Neighbourhood'] = northyork_grouped['Neighbourhood']

for ind in np.arange(northyork_grouped.shape[0]):
    neighbourhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(northyork_grouped.iloc[ind, :], num_top_venues)

neighbourhoods_venues_sorted.head()
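
When a neighbourhood has fewer distinct categories than `num_top_venues`, sorting the whole row fills the remaining slots with categories whose frequency is zero. A variant that keeps only categories actually present might look like the following sketch. `top_categories` is a hypothetical helper and the `grouped` frame is invented for illustration.

```python
import pandas as pd

def top_categories(row, n):
    """Return up to n category names with non-zero frequency, most frequent first."""
    freqs = row.iloc[1:].astype(float)      # skip the 'Neighbourhood' entry
    freqs = freqs[freqs > 0]                # drop categories that never occur
    return list(freqs.sort_values(ascending=False).index[:n])

# Invented frequency table in the same layout as the grouped dataframe
grouped = pd.DataFrame({
    'Neighbourhood': ['A', 'B'],
    'Bank': [0.4, 1.0],
    'Gym': [0.0, 0.0],
    'Park': [0.6, 0.0],
})

top = {row['Neighbourhood']: top_categories(row, 3) for _, row in grouped.iterrows()}
print(top)
```

Neighbourhood B yields a single entry here rather than three, which makes it visible that its profile rests on one category.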

<h3>Clustering the neighbourhoods:</h3>

<h4>Run k-means to cluster the neighbourhoods into 5 clusters</h4>

In [None]:
# set number of clusters
kclusters = 5

northyork_grouped_clustering = northyork_grouped.drop(columns='Neighbourhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(northyork_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 
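
As a quick sanity check of what k-means does with frequency profiles, the sketch below clusters four invented rows that fall into two obvious groups. The `profiles` array is made up, and `n_init=10` is set explicitly (an assumption, not taken from the cell above) so the behaviour does not depend on the scikit-learn version's default.

```python
import numpy as np
from sklearn.cluster import KMeans

# Four invented category-frequency profiles: two dominated by the first
# category, two dominated by the last
profiles = np.array([
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.0],
    [0.0, 0.1, 0.9],
    [0.1, 0.0, 0.9],
])

kmeans_toy = KMeans(n_clusters=2, random_state=0, n_init=10).fit(profiles)
labels = kmeans_toy.labels_

# Rows with similar profiles share a label; the label numbers themselves are arbitrary
print(labels)
```

The label values carry no meaning beyond group membership, which is why the cluster tables later in the notebook need to be read for their venue mix rather than their number.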

<h4>Create a new dataframe that includes the cluster as well as the top 10 venues for each neighbourhood</h4>

In [None]:
# add clustering labels
neighbourhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

northyork_merged = northyork_data

# join the sorted venues (with cluster labels) onto northyork_data to add latitude/longitude for each neighbourhood
northyork_merged = northyork_merged.join(neighbourhoods_venues_sorted.set_index('Neighbourhood'), on='Neighbourhood')

northyork_merged.head() # check the last columns!

<h4>Visualize the resulting clusters</h4>

In [187]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(northyork_merged['Latitude'], northyork_merged['Longitude'], northyork_merged['Neighbourhood'], northyork_merged['Cluster Labels']):
    # a neighbourhood with no returned venues gets no label from the join (NaN), so skip it
    if pd.isna(cluster):
        continue
    cluster = int(cluster)  # the join can leave labels as floats, which cannot index a list
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

<h3>Examining each cluster:</h3>

In [188]:
#Cluster 1
northyork_merged.loc[northyork_merged['Cluster Labels'] == 0, northyork_merged.columns[[1] + list(range(5, northyork_merged.shape[1]))]]

Unnamed: 0,Borough,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,North York,-79.363452,0.0,Golf Course,Dog Run,Pool,Athletics & Sports,Mediterranean Restaurant,Coffee Shop,Comfort Food Restaurant,Construction & Landscaping,Cosmetics Shop,Deli / Bodega
2,North York,-79.385975,0.0,Japanese Restaurant,Bank,Café,Chinese Restaurant,Frozen Yogurt Shop,Discount Store,Coffee Shop,Comfort Food Restaurant,General Entertainment,Construction & Landscaping
9,North York,-79.352188,0.0,Caribbean Restaurant,Gym / Fitness Center,Café,Basketball Court,Japanese Restaurant,Fried Chicken Joint,Discount Store,Comfort Food Restaurant,Furniture / Home Store,Construction & Landscaping
16,North York,-79.520999,0.0,Grocery Store,Gym / Fitness Center,Athletics & Sports,Discount Store,Liquor Store,Women's Store,Dog Run,Comfort Food Restaurant,Construction & Landscaping,Cosmetics Shop
20,North York,-79.445073,0.0,Japanese Restaurant,Pizza Place,Metro Station,Pub,Park,Women's Store,Clothing Store,Coffee Shop,Comfort Food Restaurant,Construction & Landscaping
21,North York,-79.490074,0.0,Park,Construction & Landscaping,Bakery,Basketball Court,Electronics Store,Coffee Shop,Comfort Food Restaurant,Cosmetics Shop,Deli / Bodega,Department Store


In [189]:
#Cluster 2
northyork_merged.loc[northyork_merged['Cluster Labels'] == 1, northyork_merged.columns[[1] + list(range(5, northyork_merged.shape[1]))]]

Unnamed: 0,Borough,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
6,North York,-79.400049,1.0,Park,Bank,Women's Store,Dog Run,Coffee Shop,Comfort Food Restaurant,Construction & Landscaping,Cosmetics Shop,Deli / Bodega,Department Store
8,North York,-79.329656,1.0,Park,Food & Drink Shop,Fast Food Restaurant,Women's Store,Discount Store,Coffee Shop,Comfort Food Restaurant,Construction & Landscaping,Cosmetics Shop,Deli / Bodega
13,North York,-79.464763,1.0,Park,Airport,Bus Stop,Dog Run,Coffee Shop,Comfort Food Restaurant,Construction & Landscaping,Cosmetics Shop,Deli / Bodega,Department Store


In [190]:
#Cluster 3
northyork_merged.loc[northyork_merged['Cluster Labels'] == 2, northyork_merged.columns[[1] + list(range(5, northyork_merged.shape[1]))]]

Unnamed: 0,Borough,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
14,North York,-79.506944,2.0,Grocery Store,Bank,Women's Store,Electronics Store,Coffee Shop,Comfort Food Restaurant,Construction & Landscaping,Cosmetics Shop,Deli / Bodega,Department Store


In [191]:
#Cluster 4
northyork_merged.loc[northyork_merged['Cluster Labels'] == 3, northyork_merged.columns[[1] + list(range(5, northyork_merged.shape[1]))]]

Unnamed: 0,Borough,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
15,North York,-79.495697,3.0,Business Service,Food Truck,Baseball Field,Women's Store,Dog Run,Coffee Shop,Comfort Food Restaurant,Construction & Landscaping,Cosmetics Shop,Deli / Bodega
23,North York,-79.532242,3.0,Furniture / Home Store,Baseball Field,Women's Store,Dog Run,Coffee Shop,Comfort Food Restaurant,Construction & Landscaping,Cosmetics Shop,Deli / Bodega,Department Store


In [192]:
#Cluster 5
northyork_merged.loc[northyork_merged['Cluster Labels'] == 4, northyork_merged.columns[[1] + list(range(5, northyork_merged.shape[1]))]]

Unnamed: 0,Borough,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,North York,-79.346556,4.0,Clothing Store,Fast Food Restaurant,Coffee Shop,Restaurant,Asian Restaurant,Bakery,Women's Store,Food Court,Gift Shop,Cosmetics Shop
5,North York,-79.408493,4.0,Restaurant,Ramen Restaurant,Café,Pizza Place,Coffee Shop,Japanese Restaurant,Sandwich Place,Shopping Mall,Pharmacy,Plaza
7,North York,-79.442259,4.0,Pharmacy,Pizza Place,Grocery Store,Coffee Shop,Butcher,Discount Store,Clothing Store,Comfort Food Restaurant,Construction & Landscaping,Cosmetics Shop
10,North York,-79.340923,4.0,Coffee Shop,Asian Restaurant,Gym,Beer Store,Bike Shop,Clothing Store,Chinese Restaurant,Dim Sum Restaurant,Bus Line,Japanese Restaurant
11,North York,-79.442259,4.0,Coffee Shop,Fried Chicken Joint,Shopping Mall,Frozen Yogurt Shop,Pharmacy,Pizza Place,Deli / Bodega,Bridal Shop,Restaurant,Sandwich Place
12,North York,-79.487262,4.0,Coffee Shop,Miscellaneous Shop,Massage Studio,Bar,Women's Store,Discount Store,Comfort Food Restaurant,Construction & Landscaping,Cosmetics Shop,Deli / Bodega
17,North York,-79.315572,4.0,Pizza Place,Coffee Shop,Portuguese Restaurant,Hockey Arena,Women's Store,Discount Store,Clothing Store,Comfort Food Restaurant,Construction & Landscaping,Cosmetics Shop
18,North York,-79.41975,4.0,Coffee Shop,Fast Food Restaurant,Pizza Place,Italian Restaurant,Sandwich Place,Comfort Food Restaurant,Café,Butcher,Liquor Store,Juice Bar
19,North York,-79.464763,4.0,Accessories Store,Event Space,Gift Shop,Miscellaneous Shop,Coffee Shop,Shoe Store,Boutique,Clothing Store,Vietnamese Restaurant,Furniture / Home Store
22,North York,-79.565963,4.0,Pizza Place,Empanada Restaurant,Dog Run,Clothing Store,Coffee Shop,Comfort Food Restaurant,Construction & Landscaping,Cosmetics Shop,Deli / Bodega,Department Store
