<a href="https://colab.research.google.com/github/Shakespeare1998/Coursera_Capstone/blob/master/Toronto_complete.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### **Scraping wiki page for Toronto neighbourhood info**

In [0]:
import urllib.request
from bs4 import BeautifulSoup


url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
page = urllib.request.urlopen(url)

In [0]:
soup = BeautifulSoup(page, 'lxml')
#print(soup.prettify())

In [3]:
table = soup.find_all('table', class_='wikitable sortable')
type(table[0])

bs4.element.Tag

In [0]:
postal_code = []
borough = []
neighbourhood = []


for row in table[0].find_all('tr'):
    cells=row.find_all('td')
    if len(cells)==3:
        postal_code.append((cells[0].find(text=True)).strip())
        borough.append((cells[1].find(text=True)).strip())
        neighbourhood.append((cells[2].find(text=True)).strip())

#postal_code


### **Converting scraped data into Dataframe**

In [5]:
import pandas as pd

df=pd.DataFrame(postal_code, columns=["Postal Code"])
df["Borough"] = borough
df["Neighbourhood"] = neighbourhood
df

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M1A,Not assigned,
1,M2A,Not assigned,
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
...,...,...,...
175,M5Z,Not assigned,
176,M6Z,Not assigned,
177,M7Z,Not assigned,
178,M8Z,Etobicoke,"Mimico NW, The Queensway West, South of Bloor,..."


### Cleaning data


In [0]:
df = df[df["Borough"]!="Not assigned"]

In [0]:
df = df.reset_index()

In [0]:
del df["index"]

In [9]:
df

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
...,...,...,...
98,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North"
99,M4Y,Downtown Toronto,Church and Wellesley
100,M7Y,East Toronto,Business reply mail Processing Centre
101,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu..."


In [10]:
df.shape

(103, 3)

### **Retrieving Location data (Longitudes and Latitudes)**

This the data from the csv file provided. It has been uploaded onto the github repository from which it is being retrieved.

In [11]:
!git clone https://github.com/Shakespeare1998/Coursera_Capstone

Cloning into 'Coursera_Capstone'...
remote: Enumerating objects: 27, done.[K
remote: Counting objects:   3% (1/27)[Kremote: Counting objects:   7% (2/27)[Kremote: Counting objects:  11% (3/27)[Kremote: Counting objects:  14% (4/27)[Kremote: Counting objects:  18% (5/27)[Kremote: Counting objects:  22% (6/27)[Kremote: Counting objects:  25% (7/27)[Kremote: Counting objects:  29% (8/27)[Kremote: Counting objects:  33% (9/27)[Kremote: Counting objects:  37% (10/27)[Kremote: Counting objects:  40% (11/27)[Kremote: Counting objects:  44% (12/27)[Kremote: Counting objects:  48% (13/27)[Kremote: Counting objects:  51% (14/27)[Kremote: Counting objects:  55% (15/27)[Kremote: Counting objects:  59% (16/27)[Kremote: Counting objects:  62% (17/27)[Kremote: Counting objects:  66% (18/27)[Kremote: Counting objects:  70% (19/27)[Kremote: Counting objects:  74% (20/27)[Kremote: Counting objects:  77% (21/27)[Kremote: Counting objects:  81% (22/27)[Kremote:

In [0]:
!cd Coursera_Capstone

In [0]:
lo = pd.read_csv("Coursera_Capstone/Geospatial_Coordinates.csv")

### Merging the two datasets

In [0]:
final_df = pd.merge(df, lo, how="inner", left_on="Postal Code", right_on="Postal Code")

In [15]:
final_df.head(2)

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572


### **Creating a map of Toronto**

In [16]:
import folium
from geopy.geocoders import Nominatim

address = 'Toronto'

geolocator = Nominatim(user_agent="Toronto")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
#print('The geograpical coordinate of New York City are {}, {}.'.format(latitude, longitude))
map_tor = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(final_df['Latitude'], final_df['Longitude'], final_df['Borough'], final_df['Neighbourhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_tor)  
    
map_tor

### **Exploring and retrieving data from Foursquare**

---



In [17]:
import json
import requests
from pandas.io.json import json_normalize 

CLIENT_ID = 'T5BPTTCUI0BGDLUMJNP4WJFT4OFLIZKATZ0JOESYZIGXIGPX' # your Foursquare ID
CLIENT_SECRET = 'EGFOCBEUDL5RNKFRPLD1MW2VCS5310SAQO5JLRANKT5GRJ2K' # your Foursquare Secret
VERSION = '20180605'
venue_name = []
venue_categories = []
venue_location_lat = []
venue_location_lng = []
n = []

for i in final_df.index:
  lat = final_df.loc[i,"Latitude"]
  lng = final_df.loc[i,"Longitude"]
  url='https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, VERSION, lat, lng, 750, 50)
  results = requests.get(url).json()
  #print(final_df.loc[i, "Neighbourhood"])
  try:
    data = results['response']['groups'][0]['items']
    for j in data:
      n.append(final_df.loc[i,"Neighbourhood"])
      venue_name.append(j['venue']['name'])
      venue_location_lat.append(j['venue']['location']['lat'])
      venue_location_lng.append(j['venue']['location']['lng'])
      venue_categories.append(j['venue']['categories'][0]['name'])

  except:
     print(final_df.loc[i, "Neighbourhood"] + "Unsuccessful")
print(len(n))
print(len(venue_name))
print(len(venue_location_lat))
print(len(venue_location_lng))
print(len(venue_categories))

2661
2661
2661
2661
2661


In [18]:
#lat = final_df[final_df['Neighbourhood']=="Steeles West, L'Amoreaux West"]['Latitude']
#lng = final_df[final_df['Neighbourhood']=="Steeles West, L'Amoreaux West"]['Longitude']
#url='https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, VERSION, lat, lng, 750, 50)
#results = requests.get(url).json()
  #print(final_df.loc[i, "Neighbourhood"])
#try:
'''  data = results['response']
  print(data.keys())
  
  for j in data:
    n.append(final_df.loc[i,"Neighbourhood"])
    venue_name.append(j['venue']['name'])
    venue_location_lat.append(j['venue']['location']['lat'])
    venue_location_lng.append(j['venue']['location']['lng'])
    venue_categories.append(j['venue']['categories'][0]['name'])'''

except:
   print( " Unsuccessful")'''

dict_keys([])


In [0]:
data_dict = {'Neighbourhood' : n, 'Venue' : venue_name, 'Category' : venue_categories, 'Latitude' : venue_location_lat, 'Longitude' : venue_location_lng}

### Creating dataframe with data from Foursquare

In [20]:
import pandas as pd
nearby = pd.DataFrame.from_dict(data_dict)
nearby.head()

Unnamed: 0,Neighbourhood,Venue,Category,Latitude,Longitude
0,Parkwoods,Brookbanks Park,Park,43.751976,-79.33214
1,Parkwoods,Variety Store,Food & Drink Shop,43.751974,-79.333114
2,Parkwoods,DVP at York Mills,Road,43.758899,-79.334099
3,Parkwoods,TTC Stop #09083,Bus Stop,43.759655,-79.332223
4,Victoria Village,Victoria Village Arena,Hockey Arena,43.723481,-79.315635


In [21]:
pd.set_option('display.max_rows', None)
len(nearby['Category'].unique())

307

### Readying the dataframe for training using onehot encoding

In [22]:
nearby_onehot = pd.get_dummies(nearby[['Category']], prefix="", prefix_sep="")
nearby_onehot['Neighbourhood'] = nearby['Neighbourhood']
nearby_onehot.head()

Unnamed: 0,Accessories Store,Afghan Restaurant,African Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Amphitheater,Animal Shelter,Antique Shop,Aquarium,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Dealership,Auto Garage,Automotive Shop,BBQ Joint,Baby Store,Badminton Court,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Baseball Stadium,Basketball Court,Basketball Stadium,Beach,Beach Bar,Beer Bar,Beer Store,Belgian Restaurant,Bike Shop,Bistro,...,Sports Club,Stationery Store,Steakhouse,Supermarket,Supplement Shop,Sushi Restaurant,Swim School,Syrian Restaurant,Taco Place,Tailor Shop,Taiwanese Restaurant,Tanning Salon,Tapas Restaurant,Tea Room,Tech Startup,Tennis Court,Thai Restaurant,Theater,Theme Restaurant,Thrift / Vintage Store,Tibetan Restaurant,Toy / Game Store,Track,Trail,Train Station,Tunnel,Turkish Restaurant,Udon Restaurant,University,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio,Neighbourhood
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Parkwoods
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Parkwoods
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Parkwoods
3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Parkwoods
4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Victoria Village


In [23]:
nearby_onehot_grouped = nearby_onehot.groupby('Neighbourhood').mean().reset_index()
#len(nearby['Neighbourhood'].unique())
nearby_onehot_grouped

Unnamed: 0,Neighbourhood,Accessories Store,Afghan Restaurant,African Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Amphitheater,Animal Shelter,Antique Shop,Aquarium,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Dealership,Auto Garage,Automotive Shop,BBQ Joint,Baby Store,Badminton Court,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Baseball Stadium,Basketball Court,Basketball Stadium,Beach,Beach Bar,Beer Bar,Beer Store,Belgian Restaurant,Bike Shop,...,Sports Bar,Sports Club,Stationery Store,Steakhouse,Supermarket,Supplement Shop,Sushi Restaurant,Swim School,Syrian Restaurant,Taco Place,Tailor Shop,Taiwanese Restaurant,Tanning Salon,Tapas Restaurant,Tea Room,Tech Startup,Tennis Court,Thai Restaurant,Theater,Theme Restaurant,Thrift / Vintage Store,Tibetan Restaurant,Toy / Game Store,Track,Trail,Train Station,Tunnel,Turkish Restaurant,Udon Restaurant,University,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,Agincourt,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.076923,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,"Alderwood, Long Branch",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Bathurst Manor, Wilson Heights, Downsview North",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.041667,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Bayview Village,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,"Bedford Park, Lawrence Manor East",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.028571,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.028571,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.028571,0.0,0.0
5,Berczy Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.02,0.02,0.0,0.06,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.02,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,"Birch Cliff, Cliffside West",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,"Brockton, Parkdale Village, Exhibition Place",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.06,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.04,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Business reply mail Processing Centre,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.04,0.02,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,...,0.0,0.0,0.0,0.02,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0
9,"CN Tower, King and Spadina, Railway Lands, Har...",0.0,0.0,0.0,0.04,0.04,0.04,0.04,0.12,0.12,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [0]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    try:
      return row_categories_sorted.index.values[0:num_top_venues]
    except:
      return row_categories_sorted.index.values[0:]

### Finding the top venues for each neighbourhood

In [0]:
import numpy as np
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighbourhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighbourhood'] = nearby_onehot_grouped['Neighbourhood']

for ind in np.arange(nearby_onehot_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(nearby_onehot_grouped.iloc[ind, :], num_top_venues)

In [26]:
neighborhoods_venues_sorted

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Agincourt,Fabric Shop,Lounge,Discount Store,Badminton Court,Breakfast Spot,Skating Rink,Latin American Restaurant,Mediterranean Restaurant,Pool Hall,Supermarket
1,"Alderwood, Long Branch",Pizza Place,Convenience Store,Sandwich Place,Pharmacy,Gym,Gas Station,Coffee Shop,Pool,Donut Shop,Pub
2,"Bathurst Manor, Wilson Heights, Downsview North",Coffee Shop,Bank,Pizza Place,Park,Sandwich Place,Fried Chicken Joint,Chinese Restaurant,Supermarket,Middle Eastern Restaurant,Sushi Restaurant
3,Bayview Village,Japanese Restaurant,Bank,Chinese Restaurant,Grocery Store,Skating Rink,Café,Discount Store,Distribution Center,Dive Bar,Dog Run
4,"Bedford Park, Lawrence Manor East",Italian Restaurant,Coffee Shop,Sandwich Place,Sports Club,Bagel Shop,Bakery,Bank,Sushi Restaurant,Indian Restaurant,Intersection
5,Berczy Park,Coffee Shop,Beer Bar,Cocktail Bar,Japanese Restaurant,Seafood Restaurant,Restaurant,Cheese Shop,Café,Creperie,Greek Restaurant
6,"Birch Cliff, Cliffside West",Park,College Stadium,Skating Rink,Diner,Thai Restaurant,Farm,General Entertainment,Café,Cosmetics Shop,Coworking Space
7,"Brockton, Parkdale Village, Exhibition Place",Café,Coffee Shop,Restaurant,Bakery,Gift Shop,Supermarket,Furniture / Home Store,Arts & Crafts Store,Thrift / Vintage Store,Food
8,Business reply mail Processing Centre,Fast Food Restaurant,Bakery,Light Rail Station,Burrito Place,Brewery,Restaurant,Bar,Harbor / Marina,Coffee Shop,Italian Restaurant
9,"CN Tower, King and Spadina, Railway Lands, Har...",Airport Service,Airport Terminal,Harbor / Marina,Rental Car Location,Sculpture Garden,Boat or Ferry,Airport Gate,Tunnel,Music Venue,Park


### ***Clustering the data using KMeans***

In [32]:
from sklearn.cluster import KMeans

neighborhood_cluster = nearby_onehot_grouped.drop('Neighbourhood', axis = 1)

k_cluster = KMeans(n_clusters = 5, random_state=0)
k_cluster.fit(neighborhood_cluster)

KMeans(algorithm='auto', copy_x=True, init='k-means++', max_iter=300,
       n_clusters=5, n_init=10, n_jobs=None, precompute_distances='auto',
       random_state=0, tol=0.0001, verbose=0)

In [0]:
neighborhoods_venues_sorted["Cluster"] = k_cluster.labels_


In [0]:
#final_df = final_df.drop('Postal Code', axis=1)
end_data = pd.merge(final_df,neighborhoods_venues_sorted, on = 'Neighbourhood')

In [35]:
end_data['Cluster'].unique()

array([1, 0, 3, 2, 4], dtype=int32)

### **Creating map of Toronto with clusters**

In [36]:
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

import matplotlib.colors as colors
import matplotlib.cm as cm
# set color scheme for the clusters
x = np.arange(5)
ys = [i + x + (i*x)**2 for i in range(5)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(end_data['Latitude'], end_data['Longitude'], end_data['Neighbourhood'], end_data['Cluster']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

### Analyzing the clusters

In [47]:
end_data[end_data['Cluster']==0].iloc[:,5:-1]

Unnamed: 0,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Playground,Portuguese Restaurant,Pizza Place,Financial or Legal Service,Park,Hockey Arena,Sporting Goods Shop,Coffee Shop,Concert Hall,Eastern European Restaurant
2,Coffee Shop,Café,Theater,Pub,Park,Bakery,Breakfast Spot,Restaurant,Italian Restaurant,Dessert Shop
3,Clothing Store,Miscellaneous Shop,Coffee Shop,Vietnamese Restaurant,Furniture / Home Store,Dessert Shop,Restaurant,Fast Food Restaurant,Event Space,Sushi Restaurant
4,Coffee Shop,Sushi Restaurant,Italian Restaurant,Park,Gastropub,Japanese Restaurant,Café,Yoga Studio,College Theater,Fried Chicken Joint
5,Pharmacy,Playground,Park,Bank,Shopping Mall,Skating Rink,Grocery Store,Café,Elementary School,Electronics Store
6,Fast Food Restaurant,Coffee Shop,Martial Arts Dojo,African Restaurant,Spa,Bus Station,Hobby Shop,Paper / Office Supplies Store,Trail,Dumpling Restaurant
7,Japanese Restaurant,Gym,Coffee Shop,Beer Store,Café,Asian Restaurant,Restaurant,Italian Restaurant,Athletics & Sports,Supermarket
8,Japanese Restaurant,Gym,Coffee Shop,Beer Store,Café,Asian Restaurant,Restaurant,Italian Restaurant,Athletics & Sports,Supermarket
9,Fast Food Restaurant,Pizza Place,Pharmacy,Gastropub,Brewery,Café,Bank,Rock Climbing Spot,Intersection,Restaurant
10,Coffee Shop,Japanese Restaurant,Restaurant,Bubble Tea Shop,Park,Gastropub,Theater,Plaza,Shopping Mall,Sandwich Place


We can clearly see the common venues in the above cluster

# This also shows the general necesseties of the public in general

In [48]:
end_data[end_data['Cluster']==1].iloc[:,5:-1]

Unnamed: 0,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Food & Drink Shop,Bus Stop,Road,Park,Yoga Studio,Distribution Center,Dive Bar,Dog Run,Doner Restaurant,Donut Shop
59,Park,College Stadium,Skating Rink,Diner,Thai Restaurant,Farm,General Entertainment,Café,Cosmetics Shop,Coworking Space
67,Park,Tennis Court,Pet Store,Convenience Store,Bank,Eastern European Restaurant,Dive Bar,Dog Run,Doner Restaurant,Donut Shop
69,Gym / Fitness Center,Jewelry Store,Park,Trail,Sushi Restaurant,Yoga Studio,Drugstore,Distribution Center,Dive Bar,Dog Run
83,Park,Grocery Store,Café,Japanese Restaurant,Gym / Fitness Center,Gym,Thai Restaurant,Sandwich Place,Candy Store,Dog Run
91,Park,Trail,Playground,Candy Store,Drugstore,Discount Store,Distribution Center,Dive Bar,Dog Run,Doner Restaurant
100,Baseball Field,Construction & Landscaping,Gym / Fitness Center,Park,Yoga Studio,Dive Bar,Dog Run,Doner Restaurant,Donut Shop,Drugstore


In [49]:
end_data[end_data['Cluster']==2].iloc[:,5:-1]

Unnamed: 0,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
48,Pool,Yoga Studio,Eastern European Restaurant,Distribution Center,Dive Bar,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant


In [50]:
end_data[end_data['Cluster']==3].iloc[:,5:-1]

Unnamed: 0,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
22,Coffee Shop,Park,Business Service,Eastern European Restaurant,Distribution Center,Dive Bar,Dog Run,Doner Restaurant,Donut Shop,Drugstore
62,Bus Line,Coffee Shop,Park,Swim School,Yoga Studio,Dumpling Restaurant,Dive Bar,Dog Run,Doner Restaurant,Donut Shop


In [51]:
end_data[end_data['Cluster']==4].iloc[:,5:-1]

Unnamed: 0,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
63,Playground,Home Service,Garden,Drugstore,Diner,Discount Store,Distribution Center,Dive Bar,Dog Run,Doner Restaurant


We can see why this one neighbourhood is in a different cluster from the very different top venues