### Introduction 

A recent university graduate has decided to move from Miami, Florida, to Toronto, Canada, and open a Cuban restaurant. In Miami, 54% of the city's population is Cuban, which helps explain the popularity of the area's many Cuban restaurants. For a new business owner, however, opening a restaurant in a market already full of the same cuisine means heavy competition, which could cause the business to fail very early on. The graduate therefore decides to take a leap of faith and move to Canada, where the third-largest Cuban immigrant community resides, in hope of success. To do well, the following factors will have to be evaluated: the location and its demographics, the price of menu items, and the surrounding competition.

### Business Problem

In which neighborhood of Toronto should the graduate open their Cuban restaurant?

### Data

To solve this business problem, I will combine several data sources. Toronto's postal codes and neighborhoods are scraped from Wikipedia and joined with a geospatial CSV of their coordinates. The number of Latin American residents in each postal code (2016) shows which neighborhoods have a strong Latin American presence. The Foursquare API is used to find where popular Cuban restaurants are located and to evaluate the price of popular menu items at those restaurants. With this data, I will then use K-Means Clustering to find the best neighborhood for the restaurant.

```python
# Import libraries
import json

import folium
import matplotlib.cm as cm
import matplotlib.colors as colors
import numpy as np
import pandas as pd
import requests
from geopy.geocoders import Nominatim
from pandas import json_normalize  # pandas.io.json.json_normalize is deprecated since pandas 1.0
from sklearn.cluster import KMeans

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
```

```python
# Get Canada data: scrape the list of "M" postal codes from Wikipedia
from bs4 import BeautifulSoup

html = requests.get("https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M").text
soup = BeautifulSoup(html, 'lxml')
# Extract the table rows; the header row has no <td> cells and yields an empty list
table = soup.find('table', {'class': 'wikitable sortable'})
data = []
for row in table.find_all('tr'):
    data.append([td.text.strip() for td in row.find_all('td')])
# Create DataFrame, dropping the empty header row and unassigned boroughs
df = pd.DataFrame(data, columns=['PostalCode', 'Borough', 'Neighborhood'])
df = df[df['Borough'].notna() & (df['Borough'] != 'Not assigned')]
# Combine neighborhoods that share a postal code into one row
df = df.groupby(['PostalCode', 'Borough'])['Neighborhood'].apply(', '.join).reset_index()
# Use the borough name where the neighborhood is unassigned
unassigned = df['Neighborhood'] == 'Not assigned'
df.loc[unassigned, 'Neighborhood'] = df.loc[unassigned, 'Borough']
# Read the geographical CSV and merge coordinates on postal code
geo_df = pd.read_csv("https://cocl.us/Geospatial_data")
geo_df.rename(columns={'Postal Code': 'PostalCode'}, inplace=True)
merge = pd.merge(df, geo_df, on='PostalCode')
# Keep only boroughs whose name contains "Toronto"
toronto = merge[merge['Borough'].str.contains('Toronto', regex=False)]
toronto
```

| | PostalCode | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 37 | M4E | East Toronto | The Beaches | 43.676357 | -79.293031 |
| 41 | M4K | East Toronto | The Danforth West, Riverdale | 43.679557 | -79.352188 |
| 42 | M4L | East Toronto | India Bazaar, The Beaches West | 43.668999 | -79.315572 |
| 43 | M4M | East Toronto | Studio District | 43.659526 | -79.340923 |
| 44 | M4N | Central Toronto | Lawrence Park | 43.72802 | -79.38879 |
| 45 | M4P | Central Toronto | Davisville North | 43.712751 | -79.390197 |
| 46 | M4R | Central Toronto | North Toronto West, Lawrence Park | 43.715383 | -79.405678 |
| 47 | M4S | Central Toronto | Davisville | 43.704324 | -79.38879 |
| 48 | M4T | Central Toronto | Moore Park, Summerhill East | 43.689574 | -79.38316 |
| 49 | M4V | Central Toronto | Summerhill West, Rathnelly, South Hill, Forest... | 43.686412 | -79.400049 |
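The scraping cell applies three cleaning rules. It drops rows whose borough is `Not assigned`, merges neighborhoods that share a postal code into one row, and uses the borough name when the neighborhood is unassigned. The sketch below pulls those rules into a helper, `clean_postal_table` (a hypothetical name), and runs it on a few made-up rows so the behaviour can be checked without downloading the Wikipedia page. The postal codes and names are placeholders, not real data.

```python
import pandas as pd

# Hypothetical toy rows shaped like the scraped table (placeholder codes and names)
raw = pd.DataFrame(
    [
        ["X1A", "Not assigned", "Not assigned"],
        ["X1B", "Borough A", "Hood 1"],
        ["X1B", "Borough A", "Hood 2"],
        ["X1C", "Borough B", "Not assigned"],
    ],
    columns=["PostalCode", "Borough", "Neighborhood"],
)


def clean_postal_table(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the three cleaning rules used in the scraping cell."""
    # 1. Drop rows whose borough is unassigned
    df = df[df["Borough"] != "Not assigned"]
    # 2. Combine neighborhoods that share a postal code into one comma-separated row
    df = (
        df.groupby(["PostalCode", "Borough"])["Neighborhood"]
        .apply(", ".join)
        .reset_index()
    )
    # 3. Use the borough name where the neighborhood is still unassigned
    mask = df["Neighborhood"] == "Not assigned"
    df.loc[mask, "Neighborhood"] = df.loc[mask, "Borough"]
    return df


cleaned = clean_postal_table(raw)
print(cleaned)
```

X1A disappears, X1B's two neighborhoods become "Hood 1, Hood 2", and X1C takes its borough name.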


Now that we have data on the neighborhoods in Toronto, we need the demographics of each one.

```python
# Create DataFrame for demographics data:
# (postal code, Latin American minority ranking, number of Latin Americans in 2016)
info = [('M4E','#5',215),('M4K','#6',125),('M4L','#5',140),('M4M','#6',305),
        ('M4P','#6',70),('M4R','#6',115),('M4S','#5',285),('M4T','#5',270),
        ('M4V','#4',220),('M4W','#5',270),('M4X','#5',260),('M4Y','#5',700),
        ('M5A','#5',1405),('M5B','#5',700),('M5C','#5',700),('M5E','#5',700),
        ('M5G','#7',455),('M5H','#7',455),('M5J','#5',1405),('M5K','#5',455),
        ('M5L','#5',455),('M5N','#7',115),('M5P','#7',215),('M5R','#5',410),
        ('M5S','#6',125),('M5T','#7',325),('M5V','#4',1005),('M5W','#7',115),
        ('M5X','#5',455),('M6G','#4',235),('M6H','#4',1515),('M6J','#4',185),
        ('M6K','#6',510),('M6P','#5',490),('M6R','#6',285),('M6S','#4',475),
        ('M7A','#5',400),('M7Y','#7',115)]
demo = pd.DataFrame(info, columns=["PostalCode", "Latin American Minority Ranking", "Number of Latin Americans (2016)"])
demo
```

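| | PostalCode | Latin American Minority Ranking | Number of Latin Americans (2016) |
|---|---|---|---|
| 0 | M4E | #5 | 215 |
| 1 | M4K | #6 | 125 |
| 2 | M4L | #5 | 140 |
| 3 | M4M | #6 | 305 |
| 4 | M4P | #6 | 70 |
| 5 | M4R | #6 | 115 |
| 6 | M4S | #5 | 285 |
| 7 | M4T | #5 | 270 |
| 8 | M4V | #4 | 220 |
| 9 | M4W | #5 | 270 |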


```python
# Merge Toronto data and demographics data on postal code
# (inner join: postal codes missing from `demo` are dropped)
combo = pd.merge(toronto, demo, on='PostalCode')
combo
```

| | PostalCode | Borough | Neighborhood | Latitude | Longitude | Latin American Minority Ranking | Number of Latin Americans (2016) |
|---|---|---|---|---|---|---|---|
| 0 | M4E | East Toronto | The Beaches | 43.676357 | -79.293031 | #5 | 215 |
| 1 | M4K | East Toronto | The Danforth West, Riverdale | 43.679557 | -79.352188 | #6 | 125 |
| 2 | M4L | East Toronto | India Bazaar, The Beaches West | 43.668999 | -79.315572 | #5 | 140 |
| 3 | M4M | East Toronto | Studio District | 43.659526 | -79.340923 | #6 | 305 |
| 4 | M4P | Central Toronto | Davisville North | 43.712751 | -79.390197 | #6 | 70 |
| 5 | M4R | Central Toronto | North Toronto West, Lawrence Park | 43.715383 | -79.405678 | #6 | 115 |
| 6 | M4S | Central Toronto | Davisville | 43.704324 | -79.38879 | #5 | 285 |
| 7 | M4T | Central Toronto | Moore Park, Summerhill East | 43.689574 | -79.38316 | #5 | 270 |
| 8 | M4V | Central Toronto | Summerhill West, Rathnelly, South Hill, Forest... | 43.686412 | -79.400049 | #4 | 220 |
| 9 | M4W | Downtown Toronto | Rosedale | 43.679563 | -79.377529 | #5 | 270 |
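`pd.merge` performs an inner join by default, so any postal code missing from `demo` silently disappears from `combo`. Comparing the two printed tables, M4N (Lawrence Park) appears in `toronto` but not in `combo`. A left merge with `indicator=True` makes such gaps explicit. The sketch below uses only the postal codes visible in the printed outputs above.

```python
import pandas as pd

# Postal codes from the first ten rows of `toronto` printed earlier
toronto_codes = pd.DataFrame(
    {"PostalCode": ["M4E", "M4K", "M4L", "M4M", "M4N", "M4P", "M4R", "M4S", "M4T", "M4V"]}
)
# Postal codes from the first ten rows of `demo`
demo_codes = pd.DataFrame(
    {"PostalCode": ["M4E", "M4K", "M4L", "M4M", "M4P", "M4R", "M4S", "M4T", "M4V", "M4W"]}
)

# A left merge keeps every Toronto row; `_merge` records whether a demographic match was found
check = pd.merge(toronto_codes, demo_codes, on="PostalCode", how="left", indicator=True)
missing = check.loc[check["_merge"] == "left_only", "PostalCode"].tolist()
print(missing)  # ['M4N']: the postal code the inner merge drops
```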


With the combined DataFrame, we can compare the presence of Latin Americans across Toronto's neighborhoods. Based on the rankings and the number of Latin Americans, we will use the Foursquare API to explore the following neighborhoods further: Summerhill West, Regent Park, Harbourfront East, CN Tower, Christie, Dufferin, Little Portugal, and Runnymede.
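The text does not state the selection rule. However, the eight neighborhoods listed match every postal code in `demo` where Latin Americans rank #4 or number more than 1,000. The sketch below applies that inferred rule to the `info` list from the demographics cell. The rule is an assumption, not the author's stated method.

```python
import pandas as pd

# (postal code, Latin American minority ranking, number of Latin Americans in 2016)
info = [('M4E','#5',215),('M4K','#6',125),('M4L','#5',140),('M4M','#6',305),
        ('M4P','#6',70),('M4R','#6',115),('M4S','#5',285),('M4T','#5',270),
        ('M4V','#4',220),('M4W','#5',270),('M4X','#5',260),('M4Y','#5',700),
        ('M5A','#5',1405),('M5B','#5',700),('M5C','#5',700),('M5E','#5',700),
        ('M5G','#7',455),('M5H','#7',455),('M5J','#5',1405),('M5K','#5',455),
        ('M5L','#5',455),('M5N','#7',115),('M5P','#7',215),('M5R','#5',410),
        ('M5S','#6',125),('M5T','#7',325),('M5V','#4',1005),('M5W','#7',115),
        ('M5X','#5',455),('M6G','#4',235),('M6H','#4',1515),('M6J','#4',185),
        ('M6K','#6',510),('M6P','#5',490),('M6R','#6',285),('M6S','#4',475),
        ('M7A','#5',400),('M7Y','#7',115)]
demo = pd.DataFrame(info, columns=['PostalCode', 'Ranking', 'LatinAmericans'])

# Inferred rule (assumption): ranked #4, or more than 1,000 Latin American residents
selected = demo[(demo['Ranking'] == '#4') | (demo['LatinAmericans'] > 1000)]
print(selected['PostalCode'].tolist())
# ['M4V', 'M5A', 'M5J', 'M5V', 'M6G', 'M6H', 'M6J', 'M6S']
```

These eight postal codes correspond, in order, to Summerhill West, Regent Park, Harbourfront East, CN Tower, Christie, Dufferin, Little Portugal, and Runnymede.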

### Using the Foursquare API to find Cuban and Caribbean restaurants in Toronto

```python
# Import libraries
import folium
import requests
from geopy.geocoders import Nominatim
from pandas import json_normalize
```

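```python
# Foursquare credentials, read from environment variables instead of being hard-coded
# (the names FOURSQUARE_CLIENT_ID / FOURSQUARE_CLIENT_SECRET are placeholders)
import os

CLIENT_ID = os.environ["FOURSQUARE_CLIENT_ID"]
CLIENT_SECRET = os.environ["FOURSQUARE_CLIENT_SECRET"]
VERSION = "20180604"
LIMIT = 50
```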

```python
# Find the latitude and longitude of Toronto
address = 'Toronto, Canada'
geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print(latitude, longitude)
```

```
43.6534817 -79.3839347
```


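```python
# Search for Caribbean restaurants within 750 m of the geocoded centre of Toronto;
# passing params lets requests URL-encode the query string
search_query = 'Caribbean Restaurant'
radius = 750
url = 'https://api.foursquare.com/v2/venues/search'
params = dict(client_id=CLIENT_ID, client_secret=CLIENT_SECRET, ll='{},{}'.format(latitude, longitude),
              v=VERSION, query=search_query, radius=radius, limit=LIMIT)
results = requests.get(url, params=params).json()
results
```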

```
{'meta': {'code': 200, 'requestId': '5ecff7fd0be7b4001b5399a5'},
 'response': {'venues': [{'id': '5414a874498ef49ca335992d',
    'name': 'Carribean Roti',
    'location': {'lat': 43.658216187158324,
     'lng': -79.38226093721481,
     'labeledLatLngs': [{'label': 'display',
       'lat': 43.658216187158324,
       'lng': -79.38226093721481}],
     'distance': 544,
     'cc': 'CA',
     'country': 'Canada',
     'formattedAddress': ['Canada']},
    'categories': [{'id': '4bf58dd8d48988d144941735',
      'name': 'Caribbean Restaurant',
      'pluralName': 'Caribbean Restaurants',
      'shortName': 'Caribbean',
      'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/caribbean_',
       'suffix': '.png'},
      'primary': True}],
    'referralId': 'v-1590687750',
    'hasPerk': False},
   {'id': '4ad4c05ff964a52048f720e3',
    'name': 'Hemispheres Restaurant & Bistro',
    'location': {'address': '110 Chestnut Street',
     'lat': 43.65488413420439,
     'lng': -79.38593077
```

```python
# Flatten the venue JSON into a DataFrame
venues = results['response']['venues']
dataframe = json_normalize(venues)
# Keep the name, categories, location fields and id
filtered_columns = ['name', 'categories'] + [col for col in dataframe.columns if col.startswith('location.')] + ['id']
dataframe_filtered = dataframe.loc[:, filtered_columns].copy()

def get_category_type(row):
    """Return the name of a venue's first category, or None if it has none."""
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
    if len(categories_list) == 0:
        return None
    return categories_list[0]['name']

dataframe_filtered['categories'] = dataframe_filtered.apply(get_category_type, axis=1)
# Strip the "location." prefix from column names
dataframe_filtered.columns = [column.split('.')[-1] for column in dataframe_filtered.columns]
dataframe_filtered
```

| | name | categories | address | cc | city | country | crossStreet | distance | formattedAddress | labeledLatLngs | lat | lng | neighborhood | postalCode | state | id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Carribean Roti | Caribbean Restaurant | | CA | | Canada | | 544 | [Canada] | [{'label': 'display', 'lat': 43.65821618715832... | 43.658216 | -79.382261 | | | | 5414a874498ef49ca335992d |
| 1 | Hemispheres Restaurant & Bistro | American Restaurant | 110 Chestnut Street | CA | Toronto | Canada | | 224 | [110 Chestnut Street, Toronto ON M5G 1R3, Canada] | [{'label': 'display', 'lat': 43.65488413420439... | 43.654884 | -79.385931 | | M5G 1R3 | ON | 4ad4c05ff964a52048f720e3 |
| 2 | Osgoode Hall Restaurant | New American Restaurant | 130 Queen St W | CA | | Canada | University Ave | 189 | [130 Queen St W (University Ave), M5H 2N6, Can... | [{'label': 'display', 'lat': 43.65197895903515... | 43.651979 | -79.385049 | | M5H 2N6 | | 4cffc78a75d3236a3b10e7f7 |
| 3 | Caribbean Queen | Caribbean Restaurant | 10 Dundas St E | CA | Toronto | Canada | at Yonge St | 396 | [10 Dundas St E (at Yonge St), Toronto ON, Can... | [{'label': 'display', 'lat': 43.6563137870131,... | 43.656314 | -79.380959 | | | ON | 4ad896f0f964a520981221e3 |
| 4 | Cali Restaurant | Vietnamese Restaurant | 179 Dundas St. W. | CA | Toronto | Canada | at Chestnut | 264 | [179 Dundas St. W. (at Chestnut), Toronto ON M... | [{'label': 'display', 'lat': 43.65506808, 'lng... | 43.655068 | -79.386375 | | M5G | ON | 4c476d6719fde21e32410876 |
| 5 | Wah Too Seafood Restaurant | Chinese Restaurant | 56 Centre Ave. | CA | Toronto | Canada | | 303 | [56 Centre Ave., Toronto ON M5G 1R5, Canada] | [{'label': 'display', 'lat': 43.65483285234745... | 43.654833 | -79.387206 | | M5G 1R5 | ON | 4c69740b8d22c9284d42b745 |
| 6 | Yueh Tung Chinese Restaurant | Chinese Restaurant | 126 Elizabeth St. | CA | Toronto | Canada | Dundas St. | 229 | [126 Elizabeth St. (Dundas St.), Toronto ON, C... | [{'label': 'display', 'lat': 43.65528126342919... | 43.655281 | -79.385337 | | | ON | 52a7ae41498eed3af4d0a3fa |
| 7 | Richtree Natural Market Restaurants | Restaurant | 14 Queen St W | CA | Toronto | Canada | | 313 | [14 Queen St W, Toronto ON M5H 3X4, Canada] | [{'label': 'display', 'lat': 43.65261436174172... | 43.652614 | -79.380231 | | M5H 3X4 | ON | 4b295e10f964a520ba9d24e3 |
| 8 | Some Time BBQ Grill Restaurant 碳烤屋 | Szechuan Restaurant | 988 Baldwin Street | CA | Toronto | Canada | | 839 | [988 Baldwin Street, Toronto ON, Canada] | [{'label': 'display', 'lat': 43.655874, 'lng':... | 43.655874 | -79.393826 | | | ON | 5750b013498e755287c6de97 |
| 9 | New Treasure Restaurant | Dim Sum Restaurant | 150 Dundas St W | CA | Toronto | Canada | at Elizabeth | 240 | [150 Dundas St W (at Elizabeth), Toronto ON, C... | [{'label': 'display', 'lat': 43.65538444237565... | 43.655384 | -79.385362 | | | ON | 4b2a674ef964a52074a824e3 |
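One way to pick the Cuban and Caribbean venues out of `dataframe_filtered` is a case-insensitive match on the `categories` column. The sketch below applies it to the name/category pairs from the ten rows printed above. The regular expression `Cuban|Caribbean` is an assumption about which categories count.

```python
import pandas as pd

# Name/category pairs from the ten venues printed above
venues = pd.DataFrame({
    'name': ['Carribean Roti', 'Hemispheres Restaurant & Bistro', 'Osgoode Hall Restaurant',
             'Caribbean Queen', 'Cali Restaurant', 'Wah Too Seafood Restaurant',
             'Yueh Tung Chinese Restaurant', 'Richtree Natural Market Restaurants',
             'Some Time BBQ Grill Restaurant 碳烤屋', 'New Treasure Restaurant'],
    'categories': ['Caribbean Restaurant', 'American Restaurant', 'New American Restaurant',
                   'Caribbean Restaurant', 'Vietnamese Restaurant', 'Chinese Restaurant',
                   'Chinese Restaurant', 'Restaurant', 'Szechuan Restaurant', 'Dim Sum Restaurant'],
})

# Case-insensitive regex match; na=False treats venues without a category as non-matches
is_cuban_or_caribbean = venues['categories'].str.contains('Cuban|Caribbean', case=False, na=False)
print(venues.loc[is_cuban_or_caribbean, 'name'].tolist())
# ['Carribean Roti', 'Caribbean Queen']
```

On the real search results, the same mask can be applied directly to `dataframe_filtered['categories']`.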


From this DataFrame, we can see that the two Cuban/Caribbean restaurants are located in postal codes M6H and M8V. Postal code M6H means a restaurant is already operating near Dufferin, so to avoid opening next to an existing competitor, we remove Dufferin from our list of candidate neighborhoods. M8V is not among the candidates, so no other neighborhood is affected.
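Removing neighborhoods that already host a competitor amounts to a set difference on postal codes. The sketch below starts from the eight candidates chosen earlier, using the names given in this section. It then drops any postal code that already holds a Cuban/Caribbean restaurant (M6H and M8V). What remains is the seven-neighborhood list used in the next step.

```python
# Candidate neighborhoods chosen from the demographics table
candidates = {
    'M4V': 'Summerhill West', 'M5A': 'Regent Park', 'M5J': 'Harbourfront East',
    'M5V': 'CN Tower', 'M6G': 'Christie', 'M6H': 'Dufferin',
    'M6J': 'Little Portugal', 'M6S': 'Runnymede',
}
# Postal codes that already have a Cuban/Caribbean restaurant
occupied = {'M6H', 'M8V'}

# Keep only candidates without an existing competitor (dict order is preserved)
remaining = {code: name for code, name in candidates.items() if code not in occupied}
print(len(remaining), list(remaining.values()))
```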

### Using K-Means Clustering and a Folium Map

```python
# Create dataset with the 7 remaining neighborhoods (coordinates stored as floats)
neigh = [('M4V', 'Summerhill West', 43.686412, -79.400049),
         ('M5A', 'Regent Park', 43.654260, -79.360636),
         ('M5J', 'Harbourfront East', 43.640816, -79.381752),
         ('M5V', 'CN Tower', 43.628947, -79.394420),
         ('M6G', 'Christie', 43.669542, -79.422564),
         ('M6J', 'Little Portugal', 43.647927, -79.419750),
         ('M6S', 'Runnymede', 43.651571, -79.484450)]
df = pd.DataFrame(neigh, columns=['PostalCode', 'Neighborhood', 'Latitude', 'Longitude'])
df
```

| | PostalCode | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|
| 0 | M4V | Summerhill West | 43.686412 | -79.400049 |
| 1 | M5A | Regent Park | 43.65426 | -79.360636 |
| 2 | M5J | Harbourfront East | 43.640816 | -79.381752 |
| 3 | M5V | CN Tower | 43.628947 | -79.39442 |
| 4 | M6G | Christie | 43.669542 | -79.422564 |
| 5 | M6J | Little Portugal | 43.647927 | -79.41975 |
| 6 | M6S | Runnymede | 43.651571 | -79.48445 |


```python
# Plot the remaining neighborhoods on a map centred on Toronto;
# zoom 12 (wider than 15) keeps outlying markers such as Runnymede in view
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=12)
for lat, lng, postal, neighborhood in zip(df['Latitude'], df['Longitude'], df['PostalCode'], df['Neighborhood']):
    label = folium.Popup('{}, {}'.format(neighborhood, postal), parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
    ).add_to(map_toronto)
map_toronto
```

This map shows where the selected neighborhoods are in Toronto. Notice that four of them are very close to one another. If we narrow our choices to these four and select the one with the highest number of Latin Americans, we arrive at CN Tower (postal code M5V) as the area where the Cuban restaurant should be opened.
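The heading names K-Means clustering, but the cells above only plot the neighborhoods. The sketch below shows how `KMeans` (imported at the top of the notebook) could group the seven candidates by location, clustering them on raw latitude/longitude. Both `n_clusters=3` and the choice to cluster on coordinates alone are assumptions. Raw degrees also slightly overweight east-west distance, since a degree of longitude is shorter than a degree of latitude at Toronto's latitude.

```python
import pandas as pd
from sklearn.cluster import KMeans

# The seven remaining neighborhoods (same coordinates as the dataset above)
neigh = pd.DataFrame(
    [('M4V', 'Summerhill West', 43.686412, -79.400049),
     ('M5A', 'Regent Park', 43.654260, -79.360636),
     ('M5J', 'Harbourfront East', 43.640816, -79.381752),
     ('M5V', 'CN Tower', 43.628947, -79.394420),
     ('M6G', 'Christie', 43.669542, -79.422564),
     ('M6J', 'Little Portugal', 43.647927, -79.419750),
     ('M6S', 'Runnymede', 43.651571, -79.484450)],
    columns=['PostalCode', 'Neighborhood', 'Latitude', 'Longitude'],
)

# Assumption: 3 clusters on raw coordinates; n_init and random_state make runs repeatable
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
neigh['Cluster'] = kmeans.fit_predict(neigh[['Latitude', 'Longitude']].to_numpy())

# Neighborhoods that share a label lie close together geographically
print(neigh.sort_values('Cluster')[['Cluster', 'PostalCode', 'Neighborhood']])
```

Cluster labels are arbitrary integers. What matters is which neighborhoods share a label. Adding a demographic feature such as the number of Latin Americans would require scaling it against the coordinates first, or it would dominate the distance.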