# Define a Problem

#### The problem

A business selling kitchen ingredients to food places in New York is thinking of expanding its operations to another city. It wants to open its operations in Toronto but is unsure in which neighborhood it should house its office which also doubles as a warehouse. The **ideal location** would be in a **neighborhood surrounded by the most food places**, that way it can maximise visibility among its potential clients as well as reduce transportation distance between the food places and the business' office/warehouse, which will ultimately reduce delivery cost. 

#### The Task

To help determine the ideal location for its new office/warehouse, I have been tasked to utilise foursquare API to determine the most suitable neighborhood for the business to open in.

#### Data needed

The data required to undertake this analysis would include:
1. the longitude and latitude of all the neighborhoods in Toronto
2. the longitude and latitude of all venues in Toronto
3. the category of all venues in Toronto 


# The Plan

I will use Foursquare API to get every single venue within a 2-kilometer radius for all neighborhoods in Toronto. I will then filter the venues for restaurants, and rank the neighborhoods based on how many restaurants are within 2 kilometers of its radius. The no 1 ranking neighborhood will be recommended to the business to open its new operations there. 

In [1]:
# Importing all the libraries 

import pandas as pd
import numpy as np

# in case if not installed
!pip install lxml
!pip3 install lxml

import lxml

import folium
import matplotlib.cm as cm
import matplotlib.colors as colors
import requests
import json
from pandas.io.json import json_normalize

# in case if not installed, used for webscrapping
! pip install beautifulsoup4
! pip3 install beautifulsoup4

from bs4 import BeautifulSoup



In [2]:
# Getting a list of postal codes in Toronto from a wikipedia page and reading it into a dataframe

df_postalcodes = pd.read_html('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M')[0]

df_postalcodes

Unnamed: 0,Postal Code,Borough,Neighborhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
...,...,...,...
175,M5Z,Not assigned,Not assigned
176,M6Z,Not assigned,Not assigned
177,M7Z,Not assigned,Not assigned
178,M8Z,Etobicoke,"Mimico NW, The Queensway West, South of Bloor,..."


In [3]:
# Dropping all rows with an unassiged Borough

df_postalcodes = df_postalcodes[df_postalcodes['Borough'] != 'Not assigned']

# Resetting index
df_postalcodes.reset_index(inplace = True)

# Changing column name "Post Code" to "Postal Code"
df_postalcodes = df_postalcodes.rename(columns={'Post Code': 'Postal Code'})

# Dropping "index" column
df_postalcodes.drop(['index'], axis = 1, inplace = True)

In [4]:
# Cleaning potential "Not assigned" neighborhood by replacing "Not assigned" to its borough

for rows in df_postalcodes.index:
    if df_postalcodes['Neighborhood'][rows] == 'Not assigned':
        df_postalcodes.at[rows,'Neighborhood'] = df_postalcodes['Borough'][rows]    

In [5]:
# Downloading a CSV file of long lat data for all neighborhood from website and chucking it into a pandas dataframe 

url = "http://cocl.us/Geospatial_data"
df_longlat = pd.read_csv(url)

In [6]:
# merging the 2 dataframes into df_postalcodes using a common column value, keeping all rows 
# from the dataframe "df_postalcodes" using a left join

df_postalcodes = pd.merge(df_postalcodes, df_longlat, on="Postal Code", how="left")

df_postalcodes

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.654260,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
...,...,...,...,...,...
98,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North",43.653654,-79.506944
99,M4Y,Downtown Toronto,Church and Wellesley,43.665860,-79.383160
100,M7Y,East Toronto,"Business reply mail Processing Centre, South C...",43.662744,-79.321558
101,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu...",43.636258,-79.498509


In [7]:
# Defining foursquare credentials 

CLIENT_ID = 'P3YC5DTMGZ0KUPU5ZMW3DGFFINJBHLHA4NORJQC0GVUBAFP0' # use your Foursquare ID
CLIENT_SECRET = 'M1DZ4NAOQLBLXJHPL1HFLAFCWRVTAULDHZESE04R05UOJRUG' # use your Foursquare Secret
VERSION = '20180605' # Foursquare API version

In [8]:
# defining a function that extracts all venues within a 2-kilometer radius of a neighborhood

radius = 2000

def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius,)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [9]:
# Create a df of all venues within 2-kilometers of all neighborhood in Toronto using the defined function

df_venues = getNearbyVenues(names = df_postalcodes['Neighborhood'],
                                   latitudes = df_postalcodes['Latitude'],
                                   longitudes = df_postalcodes['Longitude']
                                  )

Parkwoods
Victoria Village
Regent Park, Harbourfront
Lawrence Manor, Lawrence Heights
Queen's Park, Ontario Provincial Government
Islington Avenue, Humber Valley Village
Malvern, Rouge
Don Mills
Parkview Hill, Woodbine Gardens
Garden District, Ryerson
Glencairn
West Deane Park, Princess Gardens, Martin Grove, Islington, Cloverdale
Rouge Hill, Port Union, Highland Creek
Don Mills
Woodbine Heights
St. James Town
Humewood-Cedarvale
Eringate, Bloordale Gardens, Old Burnhamthorpe, Markland Wood
Guildwood, Morningside, West Hill
The Beaches
Berczy Park
Caledonia-Fairbanks
Woburn
Leaside
Central Bay Street
Christie
Cedarbrae
Hillcrest Village
Bathurst Manor, Wilson Heights, Downsview North
Thorncliffe Park
Richmond, Adelaide, King
Dufferin, Dovercourt Village
Scarborough Village
Fairview, Henry Farm, Oriole
Northwood Park, York University
East Toronto, Broadview North (Old East York)
Harbourfront East, Union Station, Toronto Islands
Little Portugal, Trinity
Kennedy Park, Ionview, East Birchmo

In [10]:
df_venues

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Parkwoods,43.753259,-79.329656,Brookbanks Park,43.751976,-79.332140,Park
1,Parkwoods,43.753259,-79.329656,Variety Store,43.751974,-79.333114,Food & Drink Shop
2,Victoria Village,43.725882,-79.315572,Victoria Village Arena,43.723481,-79.315635,Hockey Arena
3,Victoria Village,43.725882,-79.315572,Portugril,43.725819,-79.312785,Portuguese Restaurant
4,Victoria Village,43.725882,-79.315572,Tim Hortons,43.725517,-79.313103,Coffee Shop
...,...,...,...,...,...,...,...
1310,"Mimico NW, The Queensway West, South of Bloor,...",43.628841,-79.520999,RONA,43.629393,-79.518320,Hardware Store
1311,"Mimico NW, The Queensway West, South of Bloor,...",43.628841,-79.520999,Royal Canadian Legion #210,43.628855,-79.518903,Social Club
1312,"Mimico NW, The Queensway West, South of Bloor,...",43.628841,-79.520999,Koala Tan Tanning Salon & Sunless Spa,43.631370,-79.519006,Tanning Salon
1313,"Mimico NW, The Queensway West, South of Bloor,...",43.628841,-79.520999,Kingsway Boxing Club,43.627254,-79.526684,Gym


In [11]:

# Use this to filter df_venues

In [12]:
# Web Scrapping to get a list of venues category under the "food" umbrella

# Getting a html output
URL = 'https://developer.foursquare.com/docs/build-with-foursquare/categories/'
html_page = requests.get(URL)

In [13]:
# Creating a beautiful soup object from the html output
soup = BeautifulSoup(html_page.content, 'html.parser')

In [14]:
# Creating a list of all venues category
result = []

for li_tag in soup.find_all('ul', {'class':'VenueCategories__Wrapper-sc-1ysxg0y-0 bYmzDC'}):
    for span_tag in li_tag.find_all('li'):
        for div_tag in span_tag.find_all('div'):
            all_cat = div_tag.find('h3').text
            result.append(all_cat)

In [15]:
# Slicing the list to only include venues categories under the food umbrella

food_cat = result[result.index('Food')+1:result.index('Nightlife Spot')]

food_cat

['Afghan Restaurant',
 'African Restaurant',
 'Ethiopian Restaurant',
 'American Restaurant',
 'New American Restaurant',
 'Asian Restaurant',
 'Burmese Restaurant',
 'Cambodian Restaurant',
 'Chinese Restaurant',
 'Anhui Restaurant',
 'Beijing Restaurant',
 'Cantonese Restaurant',
 'Cha Chaan Teng',
 'Chinese Aristocrat Restaurant',
 'Chinese Breakfast Place',
 'Dim Sum Restaurant',
 'Dongbei Restaurant',
 'Fujian Restaurant',
 'Guizhou Restaurant',
 'Hainan Restaurant',
 'Hakka Restaurant',
 'Henan Restaurant',
 'Hong Kong Restaurant',
 'Huaiyang Restaurant',
 'Hubei Restaurant',
 'Hunan Restaurant',
 'Imperial Restaurant',
 'Jiangsu Restaurant',
 'Jiangxi Restaurant',
 'Macanese Restaurant',
 'Manchu Restaurant',
 'Peking Duck Restaurant',
 'Shaanxi Restaurant',
 'Shandong Restaurant',
 'Shanghai Restaurant',
 'Shanxi Restaurant',
 'Szechuan Restaurant',
 'Taiwanese Restaurant',
 'Tianjin Restaurant',
 'Xinjiang Restaurant',
 'Yunnan Restaurant',
 'Zhejiang Restaurant',
 'Filipino R

In [16]:
# Filtering df_venues by the list of venue category under the food umbrella into a new df "df_food"
df_food = df_venues[df_venues['Venue Category'].isin(food_cat)]

# Resetting index
df_food.reset_index(inplace = True)

# Dropping "index" column
df_food.drop(['index'], axis = 1, inplace = True)

df_food

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  errors=errors,


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Victoria Village,43.725882,-79.315572,Portugril,43.725819,-79.312785,Portuguese Restaurant
1,Victoria Village,43.725882,-79.315572,Tim Hortons,43.725517,-79.313103,Coffee Shop
2,Victoria Village,43.725882,-79.315572,The Frig,43.727051,-79.317418,French Restaurant
3,Victoria Village,43.725882,-79.315572,Pizza Nova,43.725824,-79.312860,Pizza Place
4,"Regent Park, Harbourfront",43.654260,-79.360636,Roselle Desserts,43.653447,-79.362017,Bakery
...,...,...,...,...,...,...,...
686,"Mimico NW, The Queensway West, South of Bloor,...",43.628841,-79.520999,South St. Burger,43.631314,-79.518408,Burger Joint
687,"Mimico NW, The Queensway West, South of Bloor,...",43.628841,-79.520999,Artisano Bakery Café,43.631006,-79.518172,Bakery
688,"Mimico NW, The Queensway West, South of Bloor,...",43.628841,-79.520999,McDonald's,43.630002,-79.518198,Fast Food Restaurant
689,"Mimico NW, The Queensway West, South of Bloor,...",43.628841,-79.520999,Subway,43.631659,-79.519001,Sandwich Place


In [17]:
# Creating a ranked pandas series of Neigborhood based on the no of restaurants
rank = df_food.groupby(['Neighborhood']).count()['Venue Category'].sort_values(ascending = False)

In [18]:
print("The neighborhood with the most restaurants in Toronto is", rank.index[0])

The neighborhood with the most restaurants in Toronto is Central Bay Street


# Solution

According to the analysis undertaken, the business should **expand its operations** in Toronto **in the neighborhood "Central Bay Street"**