# Capstone Project

## Introduction<a name="introduction"></a>

### Canada is known for its respect for people and its high quality of life, which makes it a favored destination for people migrating from around the world. People love food, and a city can never have too many dining options. Toronto, arguably one of the most exciting places in the world, holds a lot of potential for business people. The city has close to 8,100 restaurants and pubs, representing 6.5% of all stores in the city. As more people move to the city, the market and the need for more establishments, restaurants, and pubs will grow. In this project we investigate which area of Toronto would be the best place to set up a new establishment.

#### Retail experts advise paying particular attention to nearby recreation services, because they attract potential customers. People commonly go to a restaurant after a show or another cultural activity.
#### Other food services in the area also matter: in some cases they are competition, and in other cases they are complementary.

## Data<a name="data"></a>

#### To reach the above goals, several data sources are needed. We need to determine which area of the city offers the best chance for a new restaurant to thrive.

### Canada Data
##### Web scraping of Canada data on Wikipedia for a list of postal codes and areas.

### Geolocation
##### Neighborhood coordinates are loaded from a CSV file keyed by postal code, and the Python geopy library (Nominatim) is used to locate the center of Toronto.

### Foursquare location data
##### Foursquare provides venue data for the Toronto establishments of interest.



### Import Libraries needed

In [1]:
# import all libraries

from bs4 import BeautifulSoup            # HTML parsing
import pandas as pd
import numpy as np
from sklearn.cluster import KMeans       # clustering
from geopy.geocoders import Nominatim    # convert an address into latitude and longitude values
import folium                            # map rendering
import os
import requests                          # HTTP requests
import json
from pandas import json_normalize        # flatten nested JSON (top-level import, pandas >= 1.0)
import matplotlib.cm as cm
import matplotlib.colors as colors

#### Web Scraping

In [2]:
# web scraping
web_data = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').text

In [3]:
#parse web_data from html
soup = BeautifulSoup(web_data, 'html.parser')

#### Create a list with specified values

In [4]:
# create lists for columns specified
postalCodeList = []
boroughList = []
neighborhoodList = []

#### Utilize BeautifulSoup

In [5]:
# utilize BeautifulSoup
# locate the first table on the page
table = soup.find('table')

# locate all the rows of the table
rows = table.find_all('tr')

# locate all the table data cells per row
for row in rows:
    cells = row.find_all('td')

#### Import data into list

In [6]:
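# import data into the different lists, skipping rows that have no <td> cells (the header row)
for row in soup.find('table').find_all('tr'):
    cells = row.find_all('td')
    if len(cells) > 0:
        # strip surrounding whitespace and newlines from every cell
        postalCodeList.append(cells[0].text.strip())
        boroughList.append(cells[1].text.strip())
        neighborhoodList.append(cells[2].text.strip())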

#### Creating a Dataframe

In [7]:
# create a DataFrame from the specified lists
tor_df = pd.DataFrame({"PostalCode": postalCodeList,
                           "Borough": boroughList,
                           "Neighborhood": neighborhoodList})

In [8]:
tor_df.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront


#### Removing 'Not assigned' cells

In [9]:
# drop rows whose borough is 'Not assigned'
tor_df = tor_df[tor_df.Borough != "Not assigned"].reset_index(drop=True)
tor_df.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Harbourfront
3,M6A,North York,Lawrence Heights
4,M6A,North York,Lawrence Manor


In [10]:
# drop rows whose neighborhood is 'Not assigned'
tor_df = tor_df[tor_df.Neighborhood != "Not assigned"].reset_index(drop=True)
tor_df

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Harbourfront
3,M6A,North York,Lawrence Heights
4,M6A,North York,Lawrence Manor
5,M7A,Downtown Toronto,Queen's Park
6,M1B,Scarborough,Rouge
7,M1B,Scarborough,Malvern
8,M3B,North York,Don Mills North
9,M4B,East York,Woodbine Gardens
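
The two filtering cells above can also be combined into one boolean mask. The sketch below is only an illustration: it rebuilds the first five rows shown earlier as a small sample frame (not the full scraped table), and the names `sample`, `assigned`, and `cleaned` are made up for this example.

```python
import pandas as pd

# Small sample mirroring the first rows shown above (not the full scraped table).
sample = pd.DataFrame({
    "PostalCode": ["M1A", "M2A", "M3A", "M4A", "M5A"],
    "Borough": ["Not assigned", "Not assigned", "North York",
                "North York", "Downtown Toronto"],
    "Neighborhood": ["Not assigned", "Not assigned", "Parkwoods",
                     "Victoria Village", "Harbourfront"],
})

# Keep only rows where both the borough and the neighborhood are assigned.
assigned = (sample["Borough"] != "Not assigned") & (sample["Neighborhood"] != "Not assigned")
cleaned = sample[assigned].reset_index(drop=True)

print(cleaned)
```

Building the mask once keeps both conditions visible in one place, which makes it harder for the comments and the code to drift apart.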


#### Load Coordinates

In [11]:
# load the coordinates from the csv file
coordinates = pd.read_csv("https://cocl.us/Geospatial_data")
coordinates.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [12]:
# rename column "PostalCode"
coordinates.rename(columns={"Postal Code": "PostalCode"}, inplace=True)
coordinates.head()

Unnamed: 0,PostalCode,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


#### Merge two tables together

In [13]:
# merge the two tables on the "PostalCode" column
tor_df = tor_df.merge(coordinates, on="PostalCode", how="left")
tor_df.head(12)

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,Harbourfront,43.65426,-79.360636
3,M6A,North York,Lawrence Heights,43.718518,-79.464763
4,M6A,North York,Lawrence Manor,43.718518,-79.464763
5,M7A,Downtown Toronto,Queen's Park,43.662301,-79.389494
6,M1B,Scarborough,Rouge,43.806686,-79.194353
7,M1B,Scarborough,Malvern,43.806686,-79.194353
8,M3B,North York,Don Mills North,43.745906,-79.352188
9,M4B,East York,Woodbine Gardens,43.706397,-79.309937
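
A left merge keeps every neighborhood row even when no coordinates match, so it can be worth checking for postal codes that came back empty. A minimal sketch, assuming pandas' `indicator=True` option is acceptable here; the frames are tiny stand-ins and the postal code `M9Z` is a made-up value used only to force a miss.

```python
import pandas as pd

# Stand-in frames; "M9Z" is a hypothetical code with no coordinates on purpose.
left = pd.DataFrame({
    "PostalCode": ["M3A", "M4A", "M9Z"],
    "Neighborhood": ["Parkwoods", "Victoria Village", "Hypothetical"],
})
coords = pd.DataFrame({
    "PostalCode": ["M3A", "M4A"],
    "Latitude": [43.753259, 43.725882],
    "Longitude": [-79.329656, -79.315572],
})

# indicator=True adds a "_merge" column saying where each row came from.
merged = left.merge(coords, on="PostalCode", how="left", indicator=True)

# Rows marked "left_only" found no match in the coordinates table.
missing = merged.loc[merged["_merge"] == "left_only", "PostalCode"].tolist()
print(missing)
```

If `missing` is empty on the real data, the `_merge` column can simply be dropped before continuing.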


#### Create Dataframe with only Toronto data

In [14]:
# keep only the boroughs whose name contains the word 'Toronto'
tor_df = tor_df[tor_df['Borough'].str.contains('Toronto')].reset_index(drop=True)
print(tor_df.shape)
tor_df.head()

(74, 5)


Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M5A,Downtown Toronto,Harbourfront,43.65426,-79.360636
1,M7A,Downtown Toronto,Queen's Park,43.662301,-79.389494
2,M5B,Downtown Toronto,Ryerson,43.657162,-79.378937
3,M5B,Downtown Toronto,Garden District,43.657162,-79.378937
4,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418


In [15]:
tor_df.dtypes

PostalCode       object
Borough          object
Neighborhood     object
Latitude        float64
Longitude       float64
dtype: object

In [16]:
# save the cleaned dataframe; to_csv returns None when given a path, so nothing is assigned
tor_df.to_csv('cleaned_dataframe.csv', index=False, header=True)

#### Retrieve Long and Lat for Toronto using Geopy Library

In [17]:
# Use the geopy library to get the latitude and longitude values of Toronto.
address = 'Toronto'
# recent geopy releases require an identifying user_agent; the name used here is arbitrary
geolocator = Nominatim(user_agent="toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinate of {} is {}, {}.'.format(address, latitude, longitude))

The geographical coordinate of Toronto is 43.653963, -79.387207.


#### Create map of Toronto using latitude and longitude

In [18]:
# create a map of Toronto centered on the geocoded latitude and longitude
map_tor = folium.Map(location=[latitude, longitude], zoom_start=13)

# mark the city center and draw a circle with a 1,700 m radius around it
folium.Marker([latitude, longitude], popup='Toronto').add_to(map_tor)
folium.Circle([latitude, longitude], radius=1700, color='blue', fill=False).add_to(map_tor)
map_tor

## Methodology <a name="methodology"></a>

In this project, we focus on identifying areas of Toronto that have a high density of establishments, especially those that are rated highly by customers. We also want to see which types of food are favored in different parts of Toronto. Before opening an establishment, you need to know the demand in the area and how well your business is likely to do where that demand is high.

In the first step, we collect the required data: the location and type of restaurants across Toronto.
The next step in our analysis is the calculation and exploration of 'establishment density' across the various areas of Toronto. Based on that, we want to see what kinds of customers the different areas of Toronto serve and direct our attention to those areas.

In the final step, we concentrate on the most promising areas and, within them, create groups of locations that satisfy some essential requirements established in consultation with stakeholders: we want to find the areas with the highest concentration of establishments, as well as the ethnic backgrounds of the people there. We will show a map of all such places.

#### Define Foursquare Credentials and Version

In [19]:
CLIENT_ID = 'YEHSJ2APYDNXQSTWLTM0ERU4PJD2N4BE00QFJQ1HW1ZHZA1U' # your Foursquare ID
CLIENT_SECRET = 'QV50BF0XHFJIZEP4HZMBW0PYXOBXCARJGQZDB4F31PULZTEB' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: YEHSJ2APYDNXQSTWLTM0ERU4PJD2N4BE00QFJQ1HW1ZHZA1U
CLIENT_SECRET:QV50BF0XHFJIZEP4HZMBW0PYXOBXCARJGQZDB4F31PULZTEB
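
Keys that are typed into a cell and printed end up in the saved notebook output. One possible alternative, sketched here under the assumption that the keys are exported as environment variables, is to read them at run time and print only a masked form. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and both helper functions are hypothetical, not part of the Foursquare API.

```python
import os

def load_foursquare_credentials(env=os.environ):
    """Return (client_id, client_secret) read from a mapping of environment variables."""
    # The two variable names are assumptions made for this sketch.
    try:
        return env["FOURSQUARE_CLIENT_ID"], env["FOURSQUARE_CLIENT_SECRET"]
    except KeyError as missing:
        raise RuntimeError("Missing environment variable: {}".format(missing)) from None

def mask(secret, visible=4):
    """Show only the first few characters of a secret when printing it."""
    return secret[:visible] + "*" * max(len(secret) - visible, 0)

# Demo with a plain dict standing in for os.environ; the values are placeholders.
demo_env = {
    "FOURSQUARE_CLIENT_ID": "demo-id-123",
    "FOURSQUARE_CLIENT_SECRET": "demo-secret-456",
}
client_id, client_secret = load_foursquare_credentials(demo_env)
print("CLIENT_ID:", mask(client_id))
print("CLIENT_SECRET:", mask(client_secret))
```

In the notebook itself, calling `load_foursquare_credentials()` with no argument would read from the real environment.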


#### Let's explore the first neighborhood in our dataframe.

In [20]:
tor_df.loc[0, 'Neighborhood']

'Harbourfront'

Get the neighborhood's latitude and longitude values.

In [21]:
neighborhood_latitude = tor_df.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = tor_df.loc[0, 'Longitude'] # neighborhood longitude value

neighborhood_name = tor_df.loc[0, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Harbourfront are 43.6542599, -79.3606359.


#### Now, let's get the top 100 venues that are in Harbourfront within a radius of 500 meters.

First, let's create the GET request URL. Name your URL **url**.

In [22]:

LIMIT = 100 # limit of number of venues returned by Foursquare API

radius = 500 # define radius



# create URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
url # display URL 

'https://api.foursquare.com/v2/venues/explore?&client_id=YEHSJ2APYDNXQSTWLTM0ERU4PJD2N4BE00QFJQ1HW1ZHZA1U&client_secret=QV50BF0XHFJIZEP4HZMBW0PYXOBXCARJGQZDB4F31PULZTEB&v=20180605&ll=43.6542599,-79.3606359&radius=500&limit=100'
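
The same URL can be assembled from a dictionary of parameters instead of one long format string, which makes each query field easier to read and change. This is a sketch using the standard library's `urlencode`; the helper name `build_explore_url` and the `DEMO_*` values are placeholders invented for the example, while the endpoint and parameter names are the ones used in the cell above.

```python
from urllib.parse import urlencode

def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=100):
    """Build a venues/explore URL with the same query parameters as the cell above."""
    base = "https://api.foursquare.com/v2/venues/explore"
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),  # latitude,longitude pair
        "radius": radius,
        "limit": limit,
    }
    # urlencode percent-encodes the values, so the comma in "ll" becomes %2C.
    return base + "?" + urlencode(params)

demo_url = build_explore_url("DEMO_ID", "DEMO_SECRET", "20180605", 43.6542599, -79.3606359)
print(demo_url)
```

`requests.get` also accepts a `params` dictionary directly, which would avoid building the string by hand at all.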

Send the GET request and examine the results.

In [23]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5e345d07df2774001b0de422'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Corktown',
  'headerFullLocation': 'Corktown, Toronto',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 48,
  'suggestedBounds': {'ne': {'lat': 43.6587599045, 'lng': -79.3544279001486},
   'sw': {'lat': 43.6497598955, 'lng': -79.36684389985142}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '54ea41ad498e9a11e9e13308',
       'name': 'Roselle Desserts',
       'location': {'address': '362 King St E',
        'crossStreet': 'Trinity St',
        'lat': 43.653446723052674,
        'lng': -79.3620167174383,
        'labeledLatLngs': [{'label': 'display',
 

From the Foursquare lab in the previous module, we know that all the information is in the *items* key. Before we proceed, let's borrow the **get_category_type** function from the Foursquare lab.

In [24]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:  # flattened frames use the 'venue.categories' column name instead
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

Now we are ready to clean the json and structure it into a *pandas* dataframe.

In [25]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Roselle Desserts,Bakery,43.653447,-79.362017
1,Tandem Coffee,Coffee Shop,43.653559,-79.361809
2,Cooper Koo Family YMCA,Gym / Fitness Center,43.653191,-79.357947
3,Body Blitz Spa East,Spa,43.654735,-79.359874
4,Impact Kitchen,Restaurant,43.656369,-79.35698
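
To see what the flattening step does, it helps to run it on a tiny hand-written input. The two items below only imitate the shape of the `items` list in the response above and include just the fields this notebook uses; the second venue is hypothetical and has no category on purpose.

```python
import pandas as pd

# Hand-written items shaped like the Foursquare response (illustrative only).
sample_items = [
    {"venue": {"name": "Roselle Desserts",
               "categories": [{"name": "Bakery"}],
               "location": {"lat": 43.653447, "lng": -79.362017}}},
    {"venue": {"name": "Uncategorized Place (hypothetical)",
               "categories": [],
               "location": {"lat": 43.65, "lng": -79.36}}},
]

# Nested dictionaries become dotted column names; lists are kept as lists.
flat = pd.json_normalize(sample_items)
print(sorted(flat.columns))

# Take the first category name, or None when the list is empty.
first_category = [cats[0]["name"] if cats else None for cats in flat["venue.categories"]]
print(first_category)
```

This is why the filtered column names above start with `venue.` and why the category column still needs `get_category_type` applied to it.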


And how many venues were returned by Foursquare?

In [26]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

48 venues were returned by Foursquare.


#### Let's create a function to repeat the same process for all the neighborhoods in Toronto

In [27]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
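
The list comprehension above indexes `categories[0]` directly, which would raise an `IndexError` for a venue that comes back with an empty category list. A defensive variant of the row-building step might look like the sketch below; `venue_rows` is a hypothetical helper, and the sample items are hand-written rather than real API output.

```python
def venue_rows(name, lat, lng, items):
    """Turn one neighborhood's response items into flat tuples, tolerating missing categories."""
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # Fall back to None instead of failing when a venue has no category.
        category = categories[0]["name"] if categories else None
        rows.append((name, lat, lng,
                     venue["name"],
                     venue["location"]["lat"],
                     venue["location"]["lng"],
                     category))
    return rows

# Hand-written items in the same shape as the API response (illustrative only).
sample_items = [
    {"venue": {"name": "Roselle Desserts",
               "categories": [{"name": "Bakery"}],
               "location": {"lat": 43.653447, "lng": -79.362017}}},
    {"venue": {"name": "No Category (hypothetical)",
               "categories": [],
               "location": {"lat": 43.65, "lng": -79.36}}},
]
rows = venue_rows("Harbourfront", 43.65426, -79.360636, sample_items)
print(rows)
```

Rows with a `None` category would then simply be dropped by the later `str.contains('Restaurant')` filter if `na=False` is passed to it.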

#### Now write the code to run the above function on each neighborhood and create a new dataframe called *toronto_venues*.

In [28]:
toronto_venues = getNearbyVenues(names=tor_df['Neighborhood'],
                                   latitudes=tor_df['Latitude'],
                                   longitudes=tor_df['Longitude']
                                  )

Harbourfront
Queen's Park
Ryerson
Garden District
St. James Town
The Beaches
Berczy Park
Central Bay Street
Christie
Adelaide
King
Richmond
Dovercourt Village
Dufferin
Harbourfront East
Toronto Islands
Union Station
Little Portugal
Trinity
The Danforth West
Riverdale
Design Exchange
Toronto Dominion Centre
Brockton
Exhibition Place
Parkdale Village
The Beaches West
India Bazaar
Commerce Court
Victoria Hotel
Studio District
Lawrence Park
Roselawn
Davisville North
Forest Hill North
Forest Hill West
High Park
The Junction South
North Toronto West
The Annex
North Midtown
Yorkville
Parkdale
Roncesvalles
Davisville
Harbord
University of Toronto
Runnymede
Swansea
Moore Park
Summerhill East
Chinatown
Grange Park
Kensington Market
Deer Park
Forest Hill SE
Rathnelly
South Hill
Summerhill West
CN Tower
Bathurst Quay
Island airport
Harbourfront West
King and Spadina
Railway Lands
South Niagara
Rosedale
Stn A PO Boxes 25 The Esplanade
Cabbagetown
St. James Town
First Canadian Place
Underground city

#### Let's check the size of the resulting dataframe

In [29]:
print(toronto_venues.shape)
toronto_venues.head()

(3226, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Harbourfront,43.65426,-79.360636,Roselle Desserts,43.653447,-79.362017,Bakery
1,Harbourfront,43.65426,-79.360636,Tandem Coffee,43.653559,-79.361809,Coffee Shop
2,Harbourfront,43.65426,-79.360636,Cooper Koo Family YMCA,43.653191,-79.357947,Gym / Fitness Center
3,Harbourfront,43.65426,-79.360636,Body Blitz Spa East,43.654735,-79.359874,Spa
4,Harbourfront,43.65426,-79.360636,Impact Kitchen,43.656369,-79.35698,Restaurant


Let's check how many venues were returned for each neighborhood

In [30]:
toronto_venues.groupby('Neighborhood').count()


Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Adelaide,100,100,100,100,100,100
Bathurst Quay,18,18,18,18,18,18
Berczy Park,55,55,55,55,55,55
Brockton,22,22,22,22,22,22
Business Reply Mail Processing Centre 969 Eastern,17,17,17,17,17,17
CN Tower,18,18,18,18,18,18
Cabbagetown,44,44,44,44,44,44
Central Bay Street,79,79,79,79,79,79
Chinatown,84,84,84,84,84,84
Christie,18,18,18,18,18,18


#### Let's find out how many unique categories can be curated from all the returned venues

In [31]:
print('There are {} unique categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 233 unique categories.


#### We only need the restaurant data

In [32]:
toronto_venues = toronto_venues[toronto_venues['Venue Category'].str.contains('Restaurant')].reset_index(drop=True)

In [33]:
toronto_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Harbourfront,43.65426,-79.360636,Impact Kitchen,43.656369,-79.35698,Restaurant
1,Harbourfront,43.65426,-79.360636,Cluny Bistro & Boulangerie,43.650565,-79.357843,French Restaurant
2,Harbourfront,43.65426,-79.360636,El Catrin,43.650601,-79.35892,Mexican Restaurant
3,Harbourfront,43.65426,-79.360636,Cocina Economica,43.654959,-79.365657,Mexican Restaurant
4,Harbourfront,43.65426,-79.360636,Flame Shack,43.656844,-79.358917,Restaurant


##  Analyze Each Neighborhood in Toronto <a name="analysis"></a>

In [34]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head()

Unnamed: 0,Neighborhood,Afghan Restaurant,American Restaurant,Asian Restaurant,Belgian Restaurant,Brazilian Restaurant,Cajun / Creole Restaurant,Caribbean Restaurant,Chinese Restaurant,Colombian Restaurant,...,Ramen Restaurant,Restaurant,Seafood Restaurant,Southern / Soul Food Restaurant,Sushi Restaurant,Taiwanese Restaurant,Thai Restaurant,Theme Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant
0,Harbourfront,0,0,0,0,0,0,0,0,0,...,0,1,0,0,0,0,0,0,0,0
1,Harbourfront,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Harbourfront,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Harbourfront,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Harbourfront,0,0,0,0,0,0,0,0,0,...,0,1,0,0,0,0,0,0,0,0


And let's examine the new dataframe size.

In [35]:
toronto_onehot.shape

(745, 46)

#### Next, let's group rows by neighborhood, taking the mean of the frequency of occurrence of each category

In [36]:
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped

Unnamed: 0,Neighborhood,Afghan Restaurant,American Restaurant,Asian Restaurant,Belgian Restaurant,Brazilian Restaurant,Cajun / Creole Restaurant,Caribbean Restaurant,Chinese Restaurant,Colombian Restaurant,...,Ramen Restaurant,Restaurant,Seafood Restaurant,Southern / Soul Food Restaurant,Sushi Restaurant,Taiwanese Restaurant,Thai Restaurant,Theme Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant
0,Adelaide,0.0,0.071429,0.107143,0.0,0.035714,0.0,0.0,0.0,0.035714,...,0.035714,0.107143,0.071429,0.0,0.071429,0.0,0.107143,0.0,0.071429,0.0
1,Berczy Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.090909,0.181818,0.0,0.0,0.0,0.090909,0.0,0.090909,0.0
2,Brockton,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Business Reply Mail Processing Centre 969 Eastern,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Cabbagetown,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.1,0.0,...,0.0,0.2,0.0,0.0,0.0,0.1,0.1,0.0,0.0,0.0
5,Central Bay Street,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.1,0.0,...,0.05,0.0,0.05,0.0,0.05,0.0,0.05,0.0,0.05,0.0
6,Chinatown,0.0,0.0,0.0,0.034483,0.0,0.0,0.034483,0.137931,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.137931,0.206897
7,Christie,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Church and Wellesley,0.037037,0.037037,0.0,0.0,0.0,0.0,0.037037,0.037037,0.0,...,0.037037,0.111111,0.037037,0.0,0.148148,0.0,0.037037,0.037037,0.0,0.037037
9,Commerce Court,0.0,0.08,0.08,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.2,0.12,0.0,0.0,0.0,0.08,0.0,0.08,0.0


#### Let's confirm the new size

In [37]:
toronto_grouped.shape

(59, 46)
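
Why does the mean of one-hot columns give a frequency? Each venue contributes a 1 to exactly one category column, so averaging within a neighborhood yields the share of that neighborhood's restaurants in each category. The toy frame below (neighborhoods `A` and `B` are made up) works through this on four rows.

```python
import pandas as pd

# Toy data: neighborhood A has three restaurants, B has one.
toy = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "B"],
    "Venue Category": ["Restaurant", "Thai Restaurant", "Restaurant", "Thai Restaurant"],
})

# One column per category, with 1.0 marking the category of each venue.
onehot = pd.get_dummies(toy["Venue Category"], dtype=float)
onehot.insert(0, "Neighborhood", toy["Neighborhood"])

# The per-neighborhood mean of each column is that category's share.
grouped = onehot.groupby("Neighborhood").mean().reset_index()
print(grouped)

# Shares within a neighborhood add up to 1.
row_sums = grouped.drop(columns="Neighborhood").sum(axis=1)
print(row_sums.tolist())
```

For `A`, two of three venues are plain restaurants, so its `Restaurant` share is 2/3 and its `Thai Restaurant` share is 1/3.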

#### Let's print each neighborhood along with the top 5 most common venues

In [38]:
num_top_venues = 5

for hood in toronto_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = toronto_grouped[toronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Adelaide----
                           venue  freq
0               Asian Restaurant  0.11
1                Thai Restaurant  0.11
2                     Restaurant  0.11
3  Vegetarian / Vegan Restaurant  0.07
4            American Restaurant  0.07


----Berczy Park----
                         venue  freq
0           Seafood Restaurant  0.18
1             Greek Restaurant  0.09
2            French Restaurant  0.09
3  Eastern European Restaurant  0.09
4           Italian Restaurant  0.09


----Brockton----
                     venue  freq
0       Italian Restaurant   0.5
1               Restaurant   0.5
2  New American Restaurant   0.0
3      Japanese Restaurant   0.0
4        Korean Restaurant   0.0


----Business Reply Mail Processing Centre 969 Eastern----
                     venue  freq
0     Fast Food Restaurant   0.5
1               Restaurant   0.5
2        Afghan Restaurant   0.0
3  New American Restaurant   0.0
4      Japanese Restaurant   0.0


----Cabbagetown----
        

#### Let's put that into a *pandas* dataframe

First, let's write a function to sort the venues in descending order.

In [39]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
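
One side effect of ranking every category is that neighborhoods with only a few restaurants fill the remaining slots with categories whose frequency is zero (Brockton above has only two non-zero categories). A possible variant, sketched under the assumption that zero-frequency categories should be left out, is shown below; `top_nonzero_categories` is a hypothetical name and the row is a hand-made stand-in for one row of `toronto_grouped`.

```python
import pandas as pd

def top_nonzero_categories(row, num_top_venues):
    """Return up to num_top_venues category names, ignoring categories with zero share."""
    shares = row.iloc[1:].astype(float)   # skip the leading 'Neighborhood' entry
    shares = shares[shares > 0]           # drop categories that never occur
    ranked = shares.sort_values(ascending=False, kind="stable")
    return ranked.index[:num_top_venues].tolist()

# Stand-in for one grouped row, using the Brockton frequencies printed above.
toy_row = pd.Series({
    "Neighborhood": "Brockton",
    "Italian Restaurant": 0.5,
    "Restaurant": 0.5,
    "Vietnamese Restaurant": 0.0,
    "Cuban Restaurant": 0.0,
})
top = top_nonzero_categories(toy_row, 5)
print(top)
```

The result can be shorter than `num_top_venues`, so a table built from it would need padding (for example with `None`) to keep a fixed number of columns.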

Now let's create the new dataframe and display the top 10 venues for each neighborhood.

In [40]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # beyond 'st', 'nd', 'rd' every rank uses the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Adelaide,Thai Restaurant,Asian Restaurant,Restaurant,American Restaurant,Sushi Restaurant,Seafood Restaurant,Vegetarian / Vegan Restaurant,Greek Restaurant,Colombian Restaurant,Italian Restaurant
1,Berczy Park,Seafood Restaurant,Greek Restaurant,Vegetarian / Vegan Restaurant,Italian Restaurant,Japanese Restaurant,Eastern European Restaurant,Restaurant,Comfort Food Restaurant,French Restaurant,Thai Restaurant
2,Brockton,Italian Restaurant,Restaurant,Vietnamese Restaurant,Cuban Restaurant,Filipino Restaurant,Fast Food Restaurant,Falafel Restaurant,Ethiopian Restaurant,Eastern European Restaurant,Dumpling Restaurant
3,Business Reply Mail Processing Centre 969 Eastern,Fast Food Restaurant,Restaurant,Vietnamese Restaurant,Cuban Restaurant,French Restaurant,Filipino Restaurant,Falafel Restaurant,Ethiopian Restaurant,Eastern European Restaurant,Dumpling Restaurant
4,Cabbagetown,Italian Restaurant,Restaurant,Thai Restaurant,Taiwanese Restaurant,Indian Restaurant,Japanese Restaurant,Caribbean Restaurant,Chinese Restaurant,Vietnamese Restaurant,Dim Sum Restaurant


## Cluster Neighborhoods

Run *k*-means to cluster the neighborhoods into 5 clusters.

In [41]:
# set number of clusters
kclusters = 5

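# keep only the numeric frequency columns (the positional axis argument was removed in pandas 2.0)
toronto_grouped_clustering = toronto_grouped.drop(columns='Neighborhood')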

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 0, 1, 0, 0, 0, 0, 1, 0, 0], dtype=int32)
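
To build some intuition for what the labels mean, here is *k*-means on four made-up frequency profiles: two dominated by the first category and two by the last. This is only a toy sketch; the numbers are invented, and `n_init=10` is set explicitly as an assumption so the call behaves the same across scikit-learn versions.

```python
import numpy as np
from sklearn.cluster import KMeans

# Each row is a made-up neighborhood profile: shares of three restaurant categories.
profiles = np.array([
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.0],
    [0.0, 0.1, 0.9],
    [0.1, 0.0, 0.9],
])

model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(profiles)
labels = model.labels_
print(labels)

# Neighborhoods with similar profiles share a label; the label numbers themselves are arbitrary.
print(labels[0] == labels[1], labels[2] == labels[3], labels[0] != labels[2])
```

Because the label numbers carry no meaning, clusters should be described by the profiles of their members rather than by their index.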

Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

In [42]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

toronto_merged = tor_df

# join the sorted venue table onto tor_df so each neighborhood keeps its latitude/longitude
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

toronto_merged.head() # check the last columns!

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M5A,Downtown Toronto,Harbourfront,43.65426,-79.360636,0.0,Restaurant,Mexican Restaurant,French Restaurant,Cuban Restaurant,Filipino Restaurant,Fast Food Restaurant,Falafel Restaurant,Ethiopian Restaurant,Eastern European Restaurant,Dumpling Restaurant
1,M7A,Downtown Toronto,Queen's Park,43.662301,-79.389494,0.0,Mexican Restaurant,Fast Food Restaurant,Sushi Restaurant,Italian Restaurant,Seafood Restaurant,Portuguese Restaurant,Chinese Restaurant,Vietnamese Restaurant,Falafel Restaurant,Ethiopian Restaurant
2,M5B,Downtown Toronto,Ryerson,43.657162,-79.378937,0.0,Japanese Restaurant,Middle Eastern Restaurant,Restaurant,Italian Restaurant,Ramen Restaurant,Fast Food Restaurant,Vietnamese Restaurant,American Restaurant,Sushi Restaurant,Ethiopian Restaurant
3,M5B,Downtown Toronto,Garden District,43.657162,-79.378937,0.0,Japanese Restaurant,Middle Eastern Restaurant,Restaurant,Italian Restaurant,Ramen Restaurant,Fast Food Restaurant,Vietnamese Restaurant,American Restaurant,Sushi Restaurant,Ethiopian Restaurant
4,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418,0.0,Restaurant,Italian Restaurant,American Restaurant,Thai Restaurant,Seafood Restaurant,Indian Restaurant,Japanese Restaurant,Middle Eastern Restaurant,Comfort Food Restaurant,French Restaurant


In [43]:
toronto_merged.dtypes

PostalCode                 object
Borough                    object
Neighborhood               object
Latitude                  float64
Longitude                 float64
Cluster Labels            float64
1st Most Common Venue      object
2nd Most Common Venue      object
3rd Most Common Venue      object
4th Most Common Venue      object
5th Most Common Venue      object
6th Most Common Venue      object
7th Most Common Venue      object
8th Most Common Venue      object
9th Most Common Venue      object
10th Most Common Venue     object
dtype: object
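
`Cluster Labels` shows up as `float64` rather than an integer type. The most likely reason is that the join keeps every row of `tor_df` (74 rows) while only 59 neighborhoods had restaurant venues to cluster, so the unmatched rows receive `NaN`, and a column containing `NaN` is stored as float. The toy frames below (neighborhoods `A`, `B`, `C` are made up) reproduce the effect and show one way to get integer labels back.

```python
import pandas as pd

# B has no cluster label, just like a neighborhood with no restaurant venues.
hoods = pd.DataFrame({"Neighborhood": ["A", "B", "C"]})
labels = pd.DataFrame({"Neighborhood": ["A", "C"], "Cluster Labels": [0, 1]})

joined = hoods.join(labels.set_index("Neighborhood"), on="Neighborhood")
print(joined["Cluster Labels"].dtype)  # float64, because B's label is NaN

# Drop the unlabeled rows, then convert the remaining labels back to integers.
labelled = joined.dropna(subset=["Cluster Labels"]).copy()
labelled["Cluster Labels"] = labelled["Cluster Labels"].astype(int)
print(labelled)
```

Integer labels are convenient later on, since they can be used directly as list indices when picking a color per cluster.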

Finally, let's visualize the resulting clusters

In [44]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

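# add markers to the map, one color per cluster
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighborhood'], toronto_merged['Cluster Labels']):
    if pd.isna(cluster):
        continue  # no cluster label for this neighborhood after the join, so there is no color to assign
    cluster = int(cluster)  # labels are stored as floats; use an integer to index the color list
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)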
       
map_clusters

#### Cluster 1

In [45]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Downtown Toronto,0.0,Restaurant,Mexican Restaurant,French Restaurant,Cuban Restaurant,Filipino Restaurant,Fast Food Restaurant,Falafel Restaurant,Ethiopian Restaurant,Eastern European Restaurant,Dumpling Restaurant
1,Downtown Toronto,0.0,Mexican Restaurant,Fast Food Restaurant,Sushi Restaurant,Italian Restaurant,Seafood Restaurant,Portuguese Restaurant,Chinese Restaurant,Vietnamese Restaurant,Falafel Restaurant,Ethiopian Restaurant
2,Downtown Toronto,0.0,Japanese Restaurant,Middle Eastern Restaurant,Restaurant,Italian Restaurant,Ramen Restaurant,Fast Food Restaurant,Vietnamese Restaurant,American Restaurant,Sushi Restaurant,Ethiopian Restaurant
3,Downtown Toronto,0.0,Japanese Restaurant,Middle Eastern Restaurant,Restaurant,Italian Restaurant,Ramen Restaurant,Fast Food Restaurant,Vietnamese Restaurant,American Restaurant,Sushi Restaurant,Ethiopian Restaurant
4,Downtown Toronto,0.0,Restaurant,Italian Restaurant,American Restaurant,Thai Restaurant,Seafood Restaurant,Indian Restaurant,Japanese Restaurant,Middle Eastern Restaurant,Comfort Food Restaurant,French Restaurant
6,Downtown Toronto,0.0,Seafood Restaurant,Greek Restaurant,Vegetarian / Vegan Restaurant,Italian Restaurant,Japanese Restaurant,Eastern European Restaurant,Restaurant,Comfort Food Restaurant,French Restaurant,Thai Restaurant
7,Downtown Toronto,0.0,Italian Restaurant,Japanese Restaurant,Chinese Restaurant,Thai Restaurant,Sushi Restaurant,French Restaurant,Seafood Restaurant,Korean Restaurant,Portuguese Restaurant,Vegetarian / Vegan Restaurant
9,Downtown Toronto,0.0,Thai Restaurant,Asian Restaurant,Restaurant,American Restaurant,Sushi Restaurant,Seafood Restaurant,Vegetarian / Vegan Restaurant,Greek Restaurant,Colombian Restaurant,Italian Restaurant
10,Downtown Toronto,0.0,Thai Restaurant,Asian Restaurant,Restaurant,American Restaurant,Sushi Restaurant,Seafood Restaurant,Vegetarian / Vegan Restaurant,Greek Restaurant,Colombian Restaurant,Italian Restaurant
11,Downtown Toronto,0.0,Thai Restaurant,Asian Restaurant,Restaurant,American Restaurant,Sushi Restaurant,Seafood Restaurant,Vegetarian / Vegan Restaurant,Greek Restaurant,Colombian Restaurant,Italian Restaurant


Cluster 1 lies entirely in Downtown Toronto. The mix of cuisines is varied, but Italian restaurants appear among the ten most common venues in nearly every neighborhood.

#### Cluster 2

In [46]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
8,Downtown Toronto,1.0,Italian Restaurant,Restaurant,Vietnamese Restaurant,Cuban Restaurant,Filipino Restaurant,Fast Food Restaurant,Falafel Restaurant,Ethiopian Restaurant,Eastern European Restaurant,Dumpling Restaurant
23,West Toronto,1.0,Italian Restaurant,Restaurant,Vietnamese Restaurant,Cuban Restaurant,Filipino Restaurant,Fast Food Restaurant,Falafel Restaurant,Ethiopian Restaurant,Eastern European Restaurant,Dumpling Restaurant
24,West Toronto,1.0,Italian Restaurant,Restaurant,Vietnamese Restaurant,Cuban Restaurant,Filipino Restaurant,Fast Food Restaurant,Falafel Restaurant,Ethiopian Restaurant,Eastern European Restaurant,Dumpling Restaurant
25,West Toronto,1.0,Italian Restaurant,Restaurant,Vietnamese Restaurant,Cuban Restaurant,Filipino Restaurant,Fast Food Restaurant,Falafel Restaurant,Ethiopian Restaurant,Eastern European Restaurant,Dumpling Restaurant
42,West Toronto,1.0,Cuban Restaurant,Italian Restaurant,Eastern European Restaurant,Restaurant,Vietnamese Restaurant,Filipino Restaurant,Fast Food Restaurant,Falafel Restaurant,Ethiopian Restaurant,Dumpling Restaurant
43,West Toronto,1.0,Cuban Restaurant,Italian Restaurant,Eastern European Restaurant,Restaurant,Vietnamese Restaurant,Filipino Restaurant,Fast Food Restaurant,Falafel Restaurant,Ethiopian Restaurant,Dumpling Restaurant


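In cluster 2, located mostly in West Toronto, Italian and Cuban restaurants are the most common venues, with Vietnamese restaurants the highest-ranked Asian option.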

#### Cluster 3

In [47]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
12,West Toronto,2.0,Fast Food Restaurant,Middle Eastern Restaurant,Vietnamese Restaurant,Dim Sum Restaurant,French Restaurant,Filipino Restaurant,Falafel Restaurant,Ethiopian Restaurant,Eastern European Restaurant,Dumpling Restaurant
13,West Toronto,2.0,Fast Food Restaurant,Middle Eastern Restaurant,Vietnamese Restaurant,Dim Sum Restaurant,French Restaurant,Filipino Restaurant,Falafel Restaurant,Ethiopian Restaurant,Eastern European Restaurant,Dumpling Restaurant
39,Central Toronto,2.0,American Restaurant,Vegetarian / Vegan Restaurant,Indian Restaurant,Middle Eastern Restaurant,Vietnamese Restaurant,Doner Restaurant,Filipino Restaurant,Fast Food Restaurant,Falafel Restaurant,Ethiopian Restaurant
40,Central Toronto,2.0,American Restaurant,Vegetarian / Vegan Restaurant,Indian Restaurant,Middle Eastern Restaurant,Vietnamese Restaurant,Doner Restaurant,Filipino Restaurant,Fast Food Restaurant,Falafel Restaurant,Ethiopian Restaurant
41,Central Toronto,2.0,American Restaurant,Vegetarian / Vegan Restaurant,Indian Restaurant,Middle Eastern Restaurant,Vietnamese Restaurant,Doner Restaurant,Filipino Restaurant,Fast Food Restaurant,Falafel Restaurant,Ethiopian Restaurant


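In cluster 3, the West Toronto neighborhoods are led by fast food and Middle Eastern restaurants, while the Central Toronto neighborhoods lean toward American and vegetarian / vegan restaurants.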

#### Cluster 4

In [48]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
34,Central Toronto,3.0,Sushi Restaurant,Mexican Restaurant,Vietnamese Restaurant,Cuban Restaurant,Filipino Restaurant,Fast Food Restaurant,Falafel Restaurant,Ethiopian Restaurant,Eastern European Restaurant,Dumpling Restaurant
35,Central Toronto,3.0,Sushi Restaurant,Mexican Restaurant,Vietnamese Restaurant,Cuban Restaurant,Filipino Restaurant,Fast Food Restaurant,Falafel Restaurant,Ethiopian Restaurant,Eastern European Restaurant,Dumpling Restaurant


Most of this cluster is in the Midtown area, listed here under Central Toronto, and its eating habits differ from the other clusters: sushi and Mexican restaurants are the most common venues.

#### Cluster 5

In [49]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
54,Central Toronto,4.0,Vietnamese Restaurant,American Restaurant,Sushi Restaurant,Restaurant,Cuban Restaurant,Filipino Restaurant,Fast Food Restaurant,Falafel Restaurant,Ethiopian Restaurant,Eastern European Restaurant
55,Central Toronto,4.0,Vietnamese Restaurant,American Restaurant,Sushi Restaurant,Restaurant,Cuban Restaurant,Filipino Restaurant,Fast Food Restaurant,Falafel Restaurant,Ethiopian Restaurant,Eastern European Restaurant
56,Central Toronto,4.0,Vietnamese Restaurant,American Restaurant,Sushi Restaurant,Restaurant,Cuban Restaurant,Filipino Restaurant,Fast Food Restaurant,Falafel Restaurant,Ethiopian Restaurant,Eastern European Restaurant
57,Central Toronto,4.0,Vietnamese Restaurant,American Restaurant,Sushi Restaurant,Restaurant,Cuban Restaurant,Filipino Restaurant,Fast Food Restaurant,Falafel Restaurant,Ethiopian Restaurant,Eastern European Restaurant
58,Central Toronto,4.0,Vietnamese Restaurant,American Restaurant,Sushi Restaurant,Restaurant,Cuban Restaurant,Filipino Restaurant,Fast Food Restaurant,Falafel Restaurant,Ethiopian Restaurant,Eastern European Restaurant


Cluster 5 lies entirely in Central Toronto, where Vietnamese, American, and sushi restaurants are the most common venues. Vegetarian / vegan restaurants rank highly in Central Toronto only in the cluster 3 neighborhoods.
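The per-cluster observations above can be cross-checked by counting which venue type most often ranks first within each cluster. The following is a minimal sketch, not the notebook's own code: `sample` is a hypothetical miniature of `toronto_merged` built from a few rows of the outputs above, and `top_venue_per_cluster` is an illustrative helper name.

```python
import pandas as pd

# Hypothetical miniature of toronto_merged; values copied from a few rows
# of the cluster outputs shown above (rows 9, 10, 8, 23, 42, 34, 35).
sample = pd.DataFrame({
    "Borough": [
        "Downtown Toronto", "Downtown Toronto",
        "Downtown Toronto", "West Toronto", "West Toronto",
        "Central Toronto", "Central Toronto",
    ],
    "Cluster Labels": [0, 0, 1, 1, 1, 3, 3],
    "1st Most Common Venue": [
        "Thai Restaurant", "Thai Restaurant",
        "Italian Restaurant", "Italian Restaurant", "Cuban Restaurant",
        "Sushi Restaurant", "Sushi Restaurant",
    ],
})

def top_venue_per_cluster(df, venue_col="1st Most Common Venue"):
    """Return the most frequent value of venue_col for each cluster label."""
    return (
        df.groupby("Cluster Labels")[venue_col]
          .agg(lambda s: s.value_counts().index[0])
    )

summary = top_venue_per_cluster(sample)
print(summary)
```

Running the same helper on the full `toronto_merged` frame, assuming it has the columns shown in the outputs, would give one leading venue type per cluster to compare against the written summaries.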

## Results and Discussion <a name="results"></a>

Our analysis shows that Toronto has a large number of restaurants and that its residents are very active, meaning that they eat out often and visit restaurants regularly. The area covered was mainly Central Toronto, but the analysis also included the other major boroughs of the city.

We also gained some useful insights from the dataset. Every area in Toronto has its own preferences. It's almost as if every area has its own professional cook, since we rarely find the same leading cuisine in two or three different areas. The number of venues also suggests that the demand for food and restaurants in Toronto is very high.

For an entrepreneur or business owner, this indicates an opportunity to start a restaurant business. Beyond the overall demand, you also have to know what type of food people like in a specific area, because different areas have different preferences. I established these differences by clustering the data with k-means.
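The k-means step mentioned above can be illustrated with a small sketch. It assumes each neighborhood is described by the relative frequency of each venue category. The neighborhood names and numbers below are made up for illustration, and two clusters are used only because the toy data is tiny (the analysis above uses five).

```python
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical venue-frequency table: one row per neighborhood,
# one column per venue category. All values are illustrative only.
freq = pd.DataFrame(
    {
        "Italian Restaurant": [0.8, 0.7, 0.1, 0.0],
        "Sushi Restaurant": [0.1, 0.2, 0.8, 0.9],
        "Fast Food Restaurant": [0.1, 0.1, 0.1, 0.1],
    },
    index=["Neighborhood A", "Neighborhood B", "Neighborhood C", "Neighborhood D"],
)

# Fit k-means on the frequency vectors; random_state makes the run repeatable.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(freq.values)

# Attach the resulting label to each neighborhood.
labels = pd.Series(kmeans.labels_, index=freq.index, name="Cluster Labels")
print(labels)
```

Neighborhoods with similar venue profiles receive the same label, which is what allows the clusters to be read as areas with shared food preferences. The label numbers themselves are arbitrary and carry no ranking.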


## Conclusion <a name="conclusion"></a>

The data was gathered from trusted sources, and a well-known, robust methodology was applied to process it.

A group of five neighborhoods has been selected from the more than one hundred that Toronto has.

In these neighborhoods, there are English restaurants, fast food outlets, and pizza places. The variety of cuisines across the different areas was also a notable discovery.

We consider that a profitable and fruitful business could be started in one of them.
