# Battle of the Neighborhoods

## Introduction

Toronto is a diverse metropolis that is visited by thousands of tourists each year. 
While discovering the city most tourists are happy to get to know the best restaurants in town.

In this project the best-rated restaurants of the category "Italian Restaurant" are projected on a map so that tourists can see 
in which neighborhoods the best-rated Italian restaurants are located and if any of the best-rated Italian restaurants is closeby. 

## Data

For this project the following data is used:
   
   1) Toronto data that contains Boroughs & Neighborhoods along with their latitude and longitude
   
   2) FourSquare data that contains venues close to a specific location
   
   3) FourSquare data that contains information (e.g. rating) of a specific venue

## Methodology

Import libraries used in the project

In [429]:
import random # library for random number generation
import numpy as np # library for vectorized computation
import pandas as pd # library to process data as dataframes

import matplotlib.pyplot as plt # plotting library

import urllib.request

from bs4 import BeautifulSoup

In [430]:
import json # library to handle JSON files
import requests # library to handle requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

! pip install folium==0.5.0
import folium # map rendering library

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values



## Get location data of Toronto's neighbourhoods

In the following steps we get a dataframe with all neighborhoods of Toronto and their respective locations

In [431]:
url="https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
page = urllib.request.urlopen(url)

In [432]:
soup = BeautifulSoup(page, "lxml")

In [433]:
all_tables=soup.find_all("table")
my_table=all_tables[0]

In [434]:
table_contents=[]

for row in my_table.findAll('td'):
    cell = {}
    if row.span.text=='Not assigned':
        pass
    else:
        cell['PostalCode'] = row.p.text[:3]
        cell['Borough'] = (row.span.text).split('(')[0]
        cell['Neighborhood'] = (((((row.span.text).split('(')[1]).strip(')')).replace(' /',',')).replace(')',' ')).strip(' ')
        table_contents.append(cell)

In [435]:
df=pd.DataFrame(table_contents)
df['Borough']=df['Borough'].replace({'Downtown TorontoStn A PO Boxes25 The Esplanade':'Downtown Toronto Stn A',
                                             'East TorontoBusiness reply mail Processing Centre969 Eastern':'East Toronto Business',
                                             'EtobicokeNorthwest':'Etobicoke Northwest','East YorkEast Toronto':'East York/East Toronto',
                                             'MississaugaCanada Post Gateway Processing Centre':'Mississauga'})

Add longitude & latitude to the dataframe

In [436]:
coordinates=pd.read_csv("https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DS0701EN-SkillsNetwork/labs_v1/Geospatial_Coordinates.csv")

In [437]:
df["Latitude"]=coordinates["Latitude"]
df["Longitude"]=coordinates["Longitude"]

In [438]:
df

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.806686,-79.194353
1,M4A,North York,Victoria Village,43.784535,-79.160497
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.763573,-79.188711
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.770992,-79.216917
4,M7A,Queen's Park,Ontario Provincial Government,43.773136,-79.239476
...,...,...,...,...,...
98,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North",43.706876,-79.518188
99,M4Y,Downtown Toronto,Church and Wellesley,43.696319,-79.532242
100,M7Y,East Toronto Business,Enclave of M4L,43.688905,-79.554724
101,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu...",43.739416,-79.588437


To keep the scope of the project reasonable, we will only have a look at neighborhoods that are part of the Borough "Downtown Toronto"

In [439]:
toronto_data = df[df['Borough'] == 'Downtown Toronto'].reset_index(drop=True)
toronto_data.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.763573,-79.188711
1,M5B,Downtown Toronto,"Garden District, Ryerson",43.692657,-79.264848
2,M5C,Downtown Toronto,St. James Town,43.799525,-79.318389
3,M5E,Downtown Toronto,Berczy Park,43.75749,-79.374714
4,M5G,Downtown Toronto,Central Bay Street,43.782736,-79.442259


In [440]:
toronto_data.shape

(17, 5)

### Get data abour locations that are within a specific radius of the neighborhoods

FourSquare credentials to get access to the FourSquare-API

For data-privacy I deleted my credentials before I uploaded the Jupyter-Notebook

In [441]:
CLIENT_ID = 'WZS531VYLY11XGK2RBKIKCVHKLSGKVRJIAY1SEGNGBFNTSML' # your Foursquare ID
CLIENT_SECRET = '14N01KRGOOUJXI5D0DAGG0S4Q0OUPSX00S5OTKUIDZ1WH2ZO' # your Foursquare Secret
ACCESS_TOKEN = 'B3YOYJUBMJ22D2XKZVS4NSV0SXACPKIY2XARHGBTOC1PHQ5U' # your FourSquare Access Token
VERSION = '20180604'
LIMIT = 5

Create a dataframe that includes the top 5 venues of the category "Italian Restaurant" for each neighborhood:

In [442]:
def getNearbyVenues(names, latitudes, longitudes, radius=2000, categoryid='4bf58dd8d48988d110941735'):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&oauth_token={}&v={}&categoryId={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude,ACCESS_TOKEN, VERSION, categoryid, radius, LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['venues']
       
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['name'], 
            v['id'],
            v['location']['lat'], 
            v['location']['lng'],  
            v['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue ID',
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [443]:
toronto_venues = getNearbyVenues(names=toronto_data['Neighborhood'],
                                   latitudes=toronto_data['Latitude'],
                                   longitudes=toronto_data['Longitude']
                                  )

In [444]:
#toronto_venues.head()

In [445]:
italian_toronto_venues=toronto_venues.loc[toronto_venues['Venue Category'] == 'Italian Restaurant']
italian_toronto_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue ID,Venue Latitude,Venue Longitude,Venue Category
1,"Regent Park, Harbourfront",43.763573,-79.188711,F'Amelia,4e4e7aa06365e1419d021044,43.667536,-79.368613,Italian Restaurant
2,"Regent Park, Harbourfront",43.763573,-79.188711,Gusto 101,4ee8f32602d5895bd7dce1b1,43.644988,-79.40027,Italian Restaurant
3,"Regent Park, Harbourfront",43.763573,-79.188711,Pizzeria Libretto,56549cc9498eb2df2e1c437c,43.644449,-79.398759,Italian Restaurant
4,"Regent Park, Harbourfront",43.763573,-79.188711,Bar Buca,52e88e92498e30016f1b3cf0,43.643918,-79.399742,Italian Restaurant
6,"Garden District, Ryerson",43.692657,-79.264848,F'Amelia,4e4e7aa06365e1419d021044,43.667536,-79.368613,Italian Restaurant


In [446]:
italian_toronto_venues.shape

(68, 8)

## Get the rating of each of the venues

In [447]:
def get_venue_details(neighborhood, neighborhood_lat, neighborhood_long, name, venue_id, venue_lat, venue_long, venue_cat):

    venue_details=[]
    for neighborhood, neighborhood_lat, neighborhood_long, name, venue_id, venue_lat, venue_long, venue_cat in zip(neighborhood, neighborhood_lat, neighborhood_long, name, venue_id, venue_lat, venue_long, venue_cat):
    
    #url to fetch data from foursquare api
        url = 'https://api.foursquare.com/v2/venues/{}?&client_id={}&client_secret={}&v={}'.format(
            venue_id,
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION)
        
        
    
    # get all the data
        results = requests.get(url).json()['response']['venue']
        
        # return only relevant information
        venue_details.append([(
            neighborhood,
            neighborhood_lat,
            neighborhood_long,
            name,
            venue_id, 
            results['rating'],
            venue_lat,
            venue_long,
            venue_cat)])
        
        
    df = pd.DataFrame([item for venue_details in venue_details for item in venue_details])
    df.columns = ['Neighborhood',
                  'Neighborhood Latitude',
                  'Neighborhood Longitude',
                  'Venue',
                  'Venue ID', 
                  'Rating',
                  'Venue Latitude',
                  'Venue Longitude',
                  'Venue Category']    
        
    return df

In [448]:
toronto_details = get_venue_details(neighborhood=italian_toronto_venues['Neighborhood'],
                                    neighborhood_lat=italian_toronto_venues['Neighborhood Latitude'],
                                    neighborhood_long=italian_toronto_venues['Neighborhood Longitude'],
                                    name=italian_toronto_venues['Venue'],
                                    venue_id=italian_toronto_venues['Venue ID'],
                                    venue_lat=italian_toronto_venues['Venue Latitude'],
                                    venue_long=italian_toronto_venues['Venue Longitude'],
                                    venue_cat=italian_toronto_venues['Venue Category']
                                  )

KeyError: 'venue'

"Because of the limitations of the free FourSquare-API" it is only possible to get 10 results of premium-data per day. As venue-ratings are considered as such, the code gives out an error-message after 10 ratings.

Because of that it is not possible to get the ratings of all venues.

## Results

In [453]:
toronto_details

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue ID,Rating,Venue Latitude,Venue Longitude,Venue Category
0,"Regent Park, Harbourfront",43.763573,-79.188711,F'Amelia,4e4e7aa06365e1419d021044,8.6,43.667536,-79.368613,Italian Restaurant
1,"Regent Park, Harbourfront",43.763573,-79.188711,Gusto 101,4ee8f32602d5895bd7dce1b1,8.7,43.644988,-79.40027,Italian Restaurant


In [454]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, name, neighborhood, rating in zip(toronto_details['Venue Latitude'], toronto_details['Venue Longitude'], toronto_details['Venue'],toronto_details['Neighborhood'], toronto_details['Rating']):
    label = '{}, {}, {}'.format(neighborhood, name, rating)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

In the above-shown map you can see the location of the top-rated Italian restaurant in each neighborhood

Moreover you can also view the top restaurants of a specific neighborhood. E.g. if we concentrate on the neighborhood "Regent Park, Harbourfront" we see that there are the following top-rated restaurants:

In [455]:
regent_park=toronto_details.loc[toronto_details['Neighborhood']=='Regent Park, Harbourfront']
regent_park

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue ID,Rating,Venue Latitude,Venue Longitude,Venue Category
0,"Regent Park, Harbourfront",43.763573,-79.188711,F'Amelia,4e4e7aa06365e1419d021044,8.6,43.667536,-79.368613,Italian Restaurant
1,"Regent Park, Harbourfront",43.763573,-79.188711,Gusto 101,4ee8f32602d5895bd7dce1b1,8.7,43.644988,-79.40027,Italian Restaurant


## Discussion

The results show that the top-rated Italian restaurants of the neighborhood Regent Park, Harbourfront are F'Amelia rated with 8.6 & Gusto 101 rated with 8.7

## Conclusion

The purpose of this project was to give an overview about the best-rated Italian restaurants in town and to show which neighborhoods have a variety of good-rated ones and which neighborhoods are lacking them.

All in all it was possible to get a good overview but there have been a few limitations as well. Because of the limited function of the free FourSquare-API it is only possible to get a limited amount of ratings of the different restaurants. By that it is not possible to give a generalistic overview about the situation but it is only possible to do show the methodology how to do it and execute the code only for an excerpt. 

