# Capstone Project: The Battle of Neighborhoods (Week 1)

## 1. Introduction/Business Problem

This project identifies promising neighborhoods in Toronto, Canada, for opening an upscale, all-you-can-eat Brazilian steakhouse. The report is intended for stakeholders interested in opening such a restaurant. We locate the existing upscale restaurants in each Toronto neighborhood and use that information to recommend the neighborhoods best suited to a new upscale Brazilian steakhouse.

## 2. Data

1. Use neighborhood data for the City of Toronto from the Wikipedia page https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M. From this data, we extract the 'Postcode', 'Borough' and 'Neighbourhood' columns for Toronto.

2. Download location data from http://cocl.us/Geospatial_data and extract the 'Latitude' and 'Longitude' of each Toronto postal code.

3. Use the Foursquare API to retrieve geo-location information for the existing upscale restaurants in each Toronto neighborhood.

4. Use the results to identify suitable Toronto neighborhoods in which to open a Brazilian steakhouse restaurant.

### 2.1 Import Libraries

In [1]:
import pandas as pd
import numpy as np
import json, requests
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
from pandas import json_normalize # transform a JSON file into a pandas dataframe (top-level import, pandas >= 1.0)
# Use the geopy library to get the latitude and longitude values
!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim
!pip install beautifulsoup4
from bs4 import BeautifulSoup
!pip install lxml 
import lxml


In [2]:
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes 
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


### 2.2 Scrape the Toronto neighborhood data from the Wikipedia page into a pandas dataframe using the BeautifulSoup package

In [3]:
import requests
CA_website_url = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').text

In [4]:
soup = BeautifulSoup(CA_website_url,'lxml')
htmltable = soup.find('table', { 'class' : 'wikitable sortable' })

In [5]:
def tableDataText(table):       
    rows = []
    trs = table.find_all('tr')
    headerow = [td.get_text(strip=True) for td in trs[0].find_all('th')] # header row
    if headerow: # if there is a header row include first
        rows.append(headerow)
        trs = trs[1:]
    for tr in trs: # for every table row
        rows.append([td.get_text(strip=True) for td in tr.find_all('td')]) # data row
    return rows

In [6]:
list_table = tableDataText(htmltable)

In [7]:
import pandas as pd
dftable = pd.DataFrame(list_table[1:], columns=list_table[0])
dftable.head(10)

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront
5,M5A,Downtown Toronto,Regent Park
6,M6A,North York,Lawrence Heights
7,M6A,North York,Lawrence Manor
8,M7A,Queen's Park,Not assigned
9,M8A,Not assigned,Not assigned


In [8]:
indexNames = dftable[ dftable['Borough'] =='Not assigned'].index

dftable.drop(indexNames , inplace=True)

In [9]:
dftable.loc[dftable['Neighbourhood'] =='Not assigned' , 'Neighbourhood'] = dftable['Borough']

In [10]:
result = dftable.groupby(['Postcode','Borough'], sort=False).agg( ', '.join)
df_new=result.reset_index()
df_new.head(15)

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Harbourfront, Regent Park"
3,M6A,North York,"Lawrence Heights, Lawrence Manor"
4,M7A,Queen's Park,Queen's Park
5,M9A,Etobicoke,Islington Avenue
6,M1B,Scarborough,"Rouge, Malvern"
7,M3B,North York,Don Mills North
8,M4B,East York,"Woodbine Gardens, Parkview Hill"
9,M5B,Downtown Toronto,"Ryerson, Garden District"


In [11]:
df_new.shape

(103, 3)
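
The three cleaning steps above (drop unassigned boroughs, fill missing neighbourhoods from the borough, combine rows that share a postcode) can be checked on a handful of rows. The sketch below is a minimal illustration, not the notebook's own cell: the rows are copied from the raw table preview, and the names `raw`, `clean` and `toy_result` are hypothetical.

```python
import pandas as pd

# Toy rows copied from the raw table preview; variable names are illustrative only.
raw = pd.DataFrame(
    [
        ("M1A", "Not assigned", "Not assigned"),
        ("M5A", "Downtown Toronto", "Harbourfront"),
        ("M5A", "Downtown Toronto", "Regent Park"),
        ("M7A", "Queen's Park", "Not assigned"),
    ],
    columns=["Postcode", "Borough", "Neighbourhood"],
)

# 1. Drop postcodes that have no borough.
clean = raw[raw["Borough"] != "Not assigned"].copy()

# 2. Where the neighbourhood is missing, reuse the borough name.
missing = clean["Neighbourhood"] == "Not assigned"
clean.loc[missing, "Neighbourhood"] = clean.loc[missing, "Borough"]

# 3. Keep one row per postcode, joining its neighbourhoods with ", ".
toy_result = (
    clean.groupby(["Postcode", "Borough"], sort=False)["Neighbourhood"]
    .agg(", ".join)
    .reset_index()
)
print(toy_result)
```

On these four rows the result has two rows: M5A with "Harbourfront, Regent Park" and M7A with "Queen's Park", which matches the corresponding rows of the grouped table.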

### 2.3 Use the location data CSV file to create a dataframe with latitude and longitude values

In [12]:
!wget -q -O 'Toronto_long_lat_data.csv'  http://cocl.us/Geospatial_data
df_long_lat = pd.read_csv('Toronto_long_lat_data.csv')
df_long_lat.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [13]:
df_long_lat.columns=['Postalcode','Latitude','Longitude']
df_long_lat.head()

Unnamed: 0,Postalcode,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [14]:
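df_pc_long_lat = df_long_lat.rename(columns={'Postalcode':'Postcode'})
# join the coordinates to the neighbourhood table on the shared 'Postcode' column
# (the earlier set_index calls discarded their results, so they had no effect)
toronto_data = pd.merge(df_new, df_pc_long_lat, on='Postcode')
toronto_data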

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Harbourfront, Regent Park",43.65426,-79.360636
3,M6A,North York,"Lawrence Heights, Lawrence Manor",43.718518,-79.464763
4,M7A,Queen's Park,Queen's Park,43.662301,-79.389494
5,M9A,Etobicoke,Islington Avenue,43.667856,-79.532242
6,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353
7,M3B,North York,Don Mills North,43.745906,-79.352188
8,M4B,East York,"Woodbine Gardens, Parkview Hill",43.706397,-79.309937
9,M5B,Downtown Toronto,"Ryerson, Garden District",43.657162,-79.378937
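
Because the merge defaults to an inner join, a postcode with no coordinates would be dropped without warning. One way to check for this, sketched here on toy data, is a left join with pandas' `indicator` and `validate` options. The postcode `M0Z` and the names `left`, `coords`, `checked` and `unmatched` are made up for the illustration; the two coordinate pairs are taken from the merged table.

```python
import pandas as pd

# Toy inputs: "M0Z" is a made-up postcode with no coordinates.
left = pd.DataFrame(
    {
        "Postcode": ["M3A", "M4A", "M0Z"],
        "Borough": ["North York", "North York", "Hypothetical"],
    }
)
coords = pd.DataFrame(
    {
        "Postcode": ["M3A", "M4A"],
        "Latitude": [43.753259, 43.725882],
        "Longitude": [-79.329656, -79.315572],
    }
)

# A left join keeps every postcode; the "_merge" column records where each row came from.
# validate="one_to_one" raises if a postcode is duplicated on either side.
checked = pd.merge(
    left, coords, on="Postcode", how="left", indicator=True, validate="one_to_one"
)

# Postcodes present in the neighbourhood table but missing from the coordinates.
unmatched = checked.loc[checked["_merge"] == "left_only", "Postcode"].tolist()
print(unmatched)
```

An empty `unmatched` list on the real data would confirm that the inner join lost no postcodes.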


### 2.4 Get geographical coordinates of Toronto

In [26]:
address = 'Toronto, ON'

geolocator = Nominatim(user_agent="Toronto")
location = geolocator.geocode(address)
latitude_toronto = location.latitude
longitude_toronto = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude_toronto, longitude_toronto))

The geographical coordinates of Toronto are 43.653963, -79.387207.
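
The geocoder's `geocode` call returns `None` when it finds no match, in which case `location.latitude` would raise an `AttributeError`. A small guard, sketched below, avoids that. `resolve_coordinates` is a hypothetical helper, and `fake_geocode` is a stand-in used only so the example runs without a network call; in the notebook one would pass `geolocator.geocode` instead.

```python
from collections import namedtuple

# Stand-in for the object a geocoder returns (only the two attributes used here).
Location = namedtuple("Location", ["latitude", "longitude"])


def fake_geocode(address):
    """Hypothetical offline geocoder: knows Toronto, returns None otherwise."""
    if address == "Toronto, ON":
        return Location(43.653963, -79.387207)
    return None


def resolve_coordinates(geocode, address, fallback):
    """Return (latitude, longitude) for address, or fallback if nothing is found."""
    location = geocode(address)
    if location is None:
        return fallback
    return location.latitude, location.longitude


print(resolve_coordinates(fake_geocode, "Toronto, ON", (0.0, 0.0)))
print(resolve_coordinates(fake_geocode, "Nowhere", (0.0, 0.0)))
```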


### 2.5 Use the Foursquare API to retrieve geo-location information for the upscale restaurants in each Toronto neighborhood, ordered by top rated

In [16]:
CLIENT_ID = 'YOUR_CLIENT_ID' # your Foursquare ID (redacted)
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # your Foursquare Secret (redacted)
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: YOUR_CLIENT_ID
CLIENT_SECRET: YOUR_CLIENT_SECRET
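
API credentials are usually kept out of a shared notebook. One common approach, sketched here as an assumption rather than as part of the original workflow, is to read them from environment variables and to print only a masked form. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and both helper functions are hypothetical.

```python
import os


def load_foursquare_credentials(env=os.environ):
    """Read the Foursquare ID and secret from (hypothetical) environment variables."""
    try:
        return env["FOURSQUARE_CLIENT_ID"], env["FOURSQUARE_CLIENT_SECRET"]
    except KeyError as missing:
        raise RuntimeError("Missing environment variable: {}".format(missing)) from None


def mask(value, visible=4):
    """Show only the first few characters of a secret when printing it."""
    return value[:visible] + "*" * max(len(value) - visible, 0)


# Demonstration with a plain dict standing in for os.environ.
demo_env = {"FOURSQUARE_CLIENT_ID": "abc123", "FOURSQUARE_CLIENT_SECRET": "s3cr3tvalue"}
client_id, client_secret = load_foursquare_credentials(demo_env)
print("CLIENT_ID: " + mask(client_id))
print("CLIENT_SECRET: " + mask(client_secret))
```

In the notebook, calling `load_foursquare_credentials()` with no argument would read the real environment.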


In [17]:
# defining radius and limit of venues to get
radius=500
LIMIT=100

In [22]:
def getUpscaleRestaurantVenues(names, latitudes, longitudes, radius=500):
    """Return one row per upscale (price tier 4) food venue near each neighbourhood."""
    url = 'https://api.foursquare.com/v2/venues/explore'
    columns = ['Neighbourhood',
               'Neighbourhood Latitude',
               'Neighbourhood Longitude',
               'Venue',
               'Venue Category']

    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):

        # query parameters for the API request
        params = {
            'client_id': CLIENT_ID,
            'client_secret': CLIENT_SECRET,
            'v': VERSION,
            'll': '{},{}'.format(lat, lng),
            'radius': radius,
            'limit': LIMIT,
            'section': 'food',
            'price': 4}

        # make the GET request
        results = requests.get(url, params=params).json()['response']['groups'][0]['items']

        # keep only the relevant information for each nearby venue
        venues_list.extend(
            (name, lat, lng, v['venue']['name'], v['venue']['categories'][0]['name'])
            for v in results)

    # passing the column names here also works when no venues were found
    nearby_food_venues = pd.DataFrame(venues_list, columns=columns)

    return nearby_food_venues
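
The function above indexes `['groups'][0]` and `['categories'][0]` directly, so a response with no groups or a venue with no category would raise an error. A more tolerant way to flatten one response is sketched below. `extract_venue_rows` is a hypothetical helper, and the sample payload only mimics the keys the function above already relies on; it is not a real API response.

```python
def extract_venue_rows(payload, name, lat, lng):
    """Flatten one explore response into (neighbourhood, lat, lng, venue, category) rows."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item.get("venue", {})
        # Fall back to a placeholder when a venue has no category listed.
        categories = venue.get("categories") or [{}]
        rows.append(
            (name, lat, lng, venue.get("name"), categories[0].get("name", "Unknown"))
        )
    return rows


# Hand-written payload for illustration only.
sample_payload = {
    "response": {
        "groups": [
            {
                "items": [
                    {"venue": {"name": "Harbour 60 Toronto",
                               "categories": [{"name": "Steakhouse"}]}},
                    {"venue": {"name": "Uncategorised Place", "categories": []}},
                ]
            }
        ]
    }
}

rows = extract_venue_rows(sample_payload, "Berczy Park", 43.644771, -79.373306)
empty = extract_venue_rows({"response": {}}, "Berczy Park", 43.644771, -79.373306)
print(rows)
print(empty)
```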

In [23]:
toronto_venues = getUpscaleRestaurantVenues(names=toronto_data['Neighbourhood'],
                                   latitudes=toronto_data['Latitude'],
                                   longitudes=toronto_data['Longitude']
                                  )

In [24]:
toronto_venues

Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Category
0,"Harbourfront, Regent Park",43.65426,-79.360636,Cluny Bistro & Boulangerie,French Restaurant
1,"Harbourfront, Regent Park",43.65426,-79.360636,Pure Spirits Oyster House & Grill,Seafood Restaurant
2,Don Mills North,43.745906,-79.352188,Gonoe Sushi,Japanese Restaurant
3,"Ryerson, Garden District",43.657162,-79.378937,Barberian's Steak House,Steakhouse
4,St. James Town,43.651494,-79.375418,GEORGE Restaurant,Restaurant
5,St. James Town,43.651494,-79.375418,Carisma,Italian Restaurant
6,St. James Town,43.651494,-79.375418,Wildfire Steakhouse Cosmopolitan,Steakhouse
7,Berczy Park,43.644771,-79.373306,Harbour 60 Toronto,Steakhouse
8,Leaside,43.70906,-79.363452,GRILLTIME,Steakhouse
9,Central Bay Street,43.657952,-79.387383,Barberian's Steak House,Steakhouse
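
A per-neighborhood count is one simple way to read this table. The sketch below is an illustration only, not the Week 2 methodology: it uses six rows copied from the output above, and the names `sample`, `summary`, `upscale_venues` and `steakhouses` are hypothetical.

```python
import pandas as pd

# Rows copied from the venue table above (neighbourhood, venue, category).
sample = pd.DataFrame(
    [
        ("Harbourfront, Regent Park", "Cluny Bistro & Boulangerie", "French Restaurant"),
        ("Harbourfront, Regent Park", "Pure Spirits Oyster House & Grill", "Seafood Restaurant"),
        ("Ryerson, Garden District", "Barberian's Steak House", "Steakhouse"),
        ("St. James Town", "GEORGE Restaurant", "Restaurant"),
        ("St. James Town", "Carisma", "Italian Restaurant"),
        ("St. James Town", "Wildfire Steakhouse Cosmopolitan", "Steakhouse"),
    ],
    columns=["Neighbourhood", "Venue", "Venue Category"],
)

# Count all upscale venues and, separately, the steakhouses in each neighbourhood.
summary = (
    sample.assign(is_steakhouse=sample["Venue Category"] == "Steakhouse")
    .groupby("Neighbourhood", sort=False)
    .agg(upscale_venues=("Venue", "count"), steakhouses=("is_steakhouse", "sum"))
    .reset_index()
)
print(summary)
```

The same aggregation applied to `toronto_venues` would give the counts for every neighborhood returned by the API.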


In [25]:
map_toronto = folium.Map(location=[latitude_toronto, longitude_toronto], zoom_start=10)

# add neighborhood markers to map
for lat, lng, borough, neighbourhood in zip(toronto_data['Latitude'], toronto_data['Longitude'], toronto_data['Borough'], toronto_data['Neighbourhood']):
    label = '{}, {}'.format(neighbourhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_toronto)

# mark in red the neighborhoods that have at least one upscale restaurant
# (markers sit at the neighborhood coordinates; venue coordinates were not retrieved)
for lat, lng, neighbourhood in zip(toronto_venues['Neighbourhood Latitude'], toronto_venues['Neighbourhood Longitude'], toronto_venues['Neighbourhood']):
    label = '{}'.format(neighbourhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=1,
        popup=label,
        color='red',
        fill=False).add_to(map_toronto)

map_toronto

## Methodology, Results, Discussion and Conclusion sections will be included in Week 2.