# Introduction
Many people are lured by life in a big city. The promise of a city that never sleeps, with more nationalities than you can count and the possibility to eat food from every region in the world, is a massive attraction. But what is the real difference between a city like Amsterdam and one like New York? Is the difference really that big? For this exercise we'll explore the culinary options in both cities to figure out whether making the jump overseas will make a foodie's heart jump with joy. We will dive into the neighbourhoods to find out how many different cuisines we can find and how many venues there are for each of them.



# Data
We will be using the Foursquare API to get information about the venues. Using the Places API and its "explore" call, we can get a list of venues around a specific location (in this case, the center of a neighbourhood). For New York we will rely on the "newyork_data.json" data file, which contains the boroughs, the neighbourhoods and their locations. For Amsterdam, we will use the data from https://maps.amsterdam.nl/open_geodata/ to find the names and postal codes of all the neighbourhoods. We'll grab the remaining location data with the Nominatim geocoder from the geopy Python package.
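A minimal sketch of how an "explore" request can be assembled, assuming the same query parameters that the `getNearbyVenues` helper further down uses (`client_id`, `client_secret`, `v`, `ll`, `radius`, `limit`). The helper name `build_explore_url`, the credential strings and the version date are hypothetical placeholders, not real values. Building the query with `urlencode` rather than string formatting takes care of escaping, for example the comma inside `ll`.

```python
from urllib.parse import urlencode

# Base URL of the Foursquare v2 "explore" endpoint (as used later in this notebook)
EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Hypothetical helper: URL for venues within `radius` metres of (lat, lng)."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,           # API version date (placeholder value below)
        "ll": f"{lat},{lng}",   # search centre as "latitude,longitude"
        "radius": radius,       # search radius in metres
        "limit": limit,         # maximum number of venues to return
    }
    return EXPLORE_URL + "?" + urlencode(params)


# Placeholder credentials; the real ones live in a hidden cell
url = build_explore_url("MY_ID", "MY_SECRET", "20180605", 52.3733, 4.9037, 500, 100)
print(url)
```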

In [93]:

import types
import pandas as pd
from botocore.client import Config
import ibm_boto3
import io
import numpy as np

#!pip install folium
import folium # map rendering library
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import requests # library to handle requests
print('Libraries imported')

Libraries imported


In [None]:
# The cell below is hidden to protect credentials. It reads an Excel file containing the Amsterdam boroughs and their location coordinates from IBM Cloud Object Storage.

In [49]:
# The code was removed by Watson Studio for sharing.

In [50]:
#parse the Excel data
amsterdam_data = pd.read_excel(io.BytesIO(body.read()))

# promote the first data row to the header, then drop that row
amsterdam_data.columns = amsterdam_data.iloc[0]
amsterdam_data = amsterdam_data.reindex(amsterdam_data.index.drop(0)).reset_index(drop=True)
amsterdam_data.columns.name = None

# drop obsolete columns
amsterdam_data.drop(columns=['OBJECTNUMMER','Stadsdeel_code','Opp_m2','WKT_LNG_LAT', 'WKT_LAT_LNG'],inplace=True)

# rename from Dutch to international terms
amsterdam_data.rename(columns={"Stadsdeel": "Borough"},inplace=True)

# rename for better readability
amsterdam_data.rename(columns={"LAT": "Latitude"},inplace=True)
amsterdam_data.rename(columns={"LNG": "Longitude"},inplace=True)

amsterdam_data.head()

| | Borough | Longitude | Latitude | NaN |
|---|---|---|---|---|
| 0 | Centrum | 4.90371155 | 52.37329735 | |
| 1 | Westpoort | 4.8073194 | 52.41146535 | |
| 2 | West | 4.865216 | 52.37787885 | |
| 3 | Nieuw-West | 4.8026762 | 52.3635909 | |
| 4 | Zuid | 4.8660631 | 52.3417212 | |
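The header clean-up above is easier to follow on a toy frame. The sketch below assumes, as the cell does, that the sheet's real column names end up in the first data row; the values are illustrative stand-ins, not the real file contents. Passing `header=1` to `pd.read_excel` should have the same effect in one step, assuming the column names sit in the sheet's second row.

```python
import pandas as pd

# Toy frame mimicking a sheet whose real column names sit in the first data row
raw = pd.DataFrame([["Stadsdeel", "LAT", "LNG"],
                    ["Centrum", 52.373, 4.904],
                    ["West", 52.378, 4.865]])

df = raw.copy()
df.columns = df.iloc[0]                        # promote row 0 to the header
df = df.drop(index=0).reset_index(drop=True)   # drop that row and renumber from 0
df.columns.name = None                         # clear the leftover columns name (0)

# Translate and tidy the column names, as the notebook does
df = df.rename(columns={"Stadsdeel": "Borough", "LAT": "Latitude", "LNG": "Longitude"})
print(df)
```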


In [113]:

body = client_3a0ef4c344124746a61e93ed7750c729.get_object(Bucket='courseradatasciencecapstoneprojec-donotdelete-pr-vavfhljeds8ejj',Key='GEBIED_BUURTEN.xlsx')['Body']
# add missing __iter__ method, so pandas accepts body as file-like object
if not hasattr(body, "__iter__"): body.__iter__ = types.MethodType( __iter__, body )


#parse the Excel data
amsterdam_data = pd.read_excel(io.BytesIO(body.read()))


# promote the first data row to the header, then drop that row
amsterdam_data.columns = amsterdam_data.iloc[0]
amsterdam_data = amsterdam_data.reindex(amsterdam_data.index.drop(0)).reset_index(drop=True)
amsterdam_data.columns.name = None

# drop obsolete columns
amsterdam_data.drop(columns=['OBJECTNUMMER','Buurt_code','Buurtcombinatie_code','Opp_m2','WKT_LNG_LAT', 'WKT_LAT_LNG'],inplace=True)

# rename from Dutch to international terms
amsterdam_data.rename(columns={"Buurt": "Neighbourhood"},inplace=True)
amsterdam_data.rename(columns={"Stadsdeel_code": "Borough"},inplace=True)


# rename for better readability
amsterdam_data.rename(columns={"LAT": "Latitude"},inplace=True)
amsterdam_data.rename(columns={"LNG": "Longitude"},inplace=True)

# focus on the center of Amsterdam (borough code 'A', Centrum)
amsterdam_data = amsterdam_data[amsterdam_data['Borough'] == 'A']

amsterdam_data.head()
#amsterdam_data.shape


| | Neighbourhood | Borough | Longitude | Latitude | NaN |
|---|---|---|---|---|---|
| 36 | Kop Zeedijk | A | 4.9001715 | 52.3757235 | |
| 37 | BG-terrein e.o. | A | 4.89557815 | 52.369559 | |
| 38 | Stationsplein e.o. | A | 4.9009435 | 52.3797652 | |
| 39 | Hemelrijk | A | 4.8949027 | 52.37821835 | |
| 40 | Spuistraat Noord | A | 4.8915324 | 52.37508835 | |


In [114]:
address = 'Amsterdam, Netherlands'

geolocator = Nominatim(user_agent="ams_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Amsterdam are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Amsterdam are 52.3727598, 4.8936041.


In [48]:
# create map of Amsterdam using latitude and longitude values
map_amsterdam = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighbourhood in zip(amsterdam_data['Latitude'], amsterdam_data['Longitude'], amsterdam_data['Borough'], amsterdam_data['Neighbourhood']):
    label = '{}, {}'.format(neighbourhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_amsterdam)
    
map_amsterdam

In [67]:
# The code was removed by Watson Studio for sharing.

In [68]:
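def get_category_type(row):
    # return the name of a venue's first category, or None if it has none
    try:
        categories_list = row['categories']
    except KeyError:
        # rows flattened from the API JSON use the prefixed column name
        categories_list = row['venue.categories']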
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
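To show the two row shapes the helper is written for, here is a small usage sketch. Plain dicts stand in for pandas rows (both raise `KeyError` for a missing key); the first mimics a raw venue object and the second a row whose keys carry the `venue.` prefix. The sample values are taken from the venue table further down, except the empty-category entry, which is hypothetical.

```python
def get_category_type(row):
    """Return the name of the first category, or None if there are none."""
    try:
        categories_list = row["categories"]          # raw venue object
    except KeyError:
        categories_list = row["venue.categories"]    # flattened row with prefixed keys
    if len(categories_list) == 0:
        return None
    return categories_list[0]["name"]


raw_venue = {"name": "OCHA", "categories": [{"name": "Thai Restaurant"}]}
flat_row = {"venue.name": "HPS", "venue.categories": [{"name": "Cocktail Bar"}]}
no_category = {"name": "Hypothetical venue", "categories": []}

for row in (raw_venue, flat_row, no_category):
    print(get_category_type(row))
```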

In [73]:
# helper function to parse the results from an exploration of a list of locations
def getNearbyVenues(names, latitudes, longitudes, radius=radius):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighbourhood', 
                  'Neighbourhood Latitude', 
                  'Neighbourhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
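The core of `getNearbyVenues` is walking the nested JSON (`response` → `groups[0]` → `items` → `venue`) and flattening each venue into one row. The sketch below runs that step offline on a trimmed, hypothetical response that contains only the keys the helper reads; real responses carry many more fields. Unlike the list comprehension above, `parse_items` (a hypothetical name) returns `None` for a venue without categories instead of raising an `IndexError`.

```python
# Hypothetical, trimmed-down explore response with only the keys the notebook reads
sample_response = {
    "response": {
        "groups": [{
            "items": [
                {"venue": {"name": "OCHA",
                           "location": {"lat": 52.374024, "lng": 4.901683},
                           "categories": [{"name": "Thai Restaurant"}]}},
                {"venue": {"name": "HPS",
                           "location": {"lat": 52.371683, "lng": 4.907673},
                           "categories": [{"name": "Cocktail Bar"}]}},
                {"venue": {"name": "Hypothetical venue",
                           "location": {"lat": 52.372, "lng": 4.903},
                           "categories": []}},
            ]
        }]
    }
}


def parse_items(name, lat, lng, items):
    """Flatten explore items into (neighbourhood, lat, lng, venue, vlat, vlng, category)."""
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue["categories"]
        rows.append((
            name, lat, lng,
            venue["name"],
            venue["location"]["lat"],
            venue["location"]["lng"],
            categories[0]["name"] if categories else None,  # tolerate empty lists
        ))
    return rows


items = sample_response["response"]["groups"][0]["items"]
rows = parse_items("Centrum", 52.37329735, 4.90371155, items)
for r in rows:
    print(r)
```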

In [88]:
amsterdam_venues = getNearbyVenues(names=amsterdam_data['Borough'],
                                   latitudes=amsterdam_data['Latitude'],
                                   longitudes=amsterdam_data['Longitude']
                                  )
amsterdam_venues.head()

Centrum
Westpoort
West
Nieuw-West
Zuid
Oost
Noord
Zuidoost


| | Neighbourhood | Neighbourhood Latitude | Neighbourhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Centrum | 52.37329735 | 4.90371155 | The Hendrick's Hotel | 52.373597 | 4.906002 | Hotel |
| 1 | Centrum | 52.37329735 | 4.90371155 | OCHA | 52.374024 | 4.901683 | Thai Restaurant |
| 2 | Centrum | 52.37329735 | 4.90371155 | HPS | 52.371683 | 4.907673 | Cocktail Bar |
| 3 | Centrum | 52.37329735 | 4.90371155 | Rosalia's Menagerie | 52.371678 | 4.899174 | Cocktail Bar |
| 4 | Centrum | 52.37329735 | 4.90371155 | De Koffieschenkerij | 52.374043 | 4.898427 | Coffee Shop |
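Each row pairs a neighbourhood center with a venue location, so a quick sanity check is to compute how far the venue lies from the center and compare it with the `radius` passed to the explore call (that value is set in a hidden cell, so it is not shown here). The sketch below uses the haversine great-circle formula with a mean Earth radius of 6,371 km; `haversine_m` is a hypothetical helper, and the coordinates are the Centrum center and The Hendrick's Hotel from the table above.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in metres between two (lat, lng) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lng2 - lng1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


# Centrum center -> The Hendrick's Hotel
d = haversine_m(52.37329735, 4.90371155, 52.373597, 4.906002)
print(f"{d:.0f} m")
```

Applied row by row, the same function could flag venues that fall outside the requested radius.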


In [89]:
# check the list of venues per neighbourhood
amsterdam_venues.groupby('Neighbourhood').count()

| Neighbourhood | Neighbourhood Latitude | Neighbourhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|
| Centrum | 100 | 100 | 100 | 100 | 100 | 100 |
| Nieuw-West | 95 | 95 | 95 | 95 | 95 | 95 |
| Noord | 27 | 27 | 27 | 27 | 27 | 27 |
| Oost | 92 | 92 | 92 | 92 | 92 | 92 |
| West | 100 | 100 | 100 | 100 | 100 | 100 |
| Westpoort | 18 | 18 | 18 | 18 | 18 | 18 |
| Zuid | 100 | 100 | 100 | 100 | 100 | 100 |
| Zuidoost | 63 | 63 | 63 | 63 | 63 | 63 |
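Because no field has missing values, `count()` repeats the same number in every column. `size()` returns a single count per group, which is easier to read. Note also that three neighbourhoods stop at exactly 100 venues; this most likely reflects the `LIMIT` set in the hidden credentials cell rather than the true number of venues, but that value is not shown, so treat it as an assumption. The toy frame below is illustrative only.

```python
import pandas as pd

# Illustrative stand-in for amsterdam_venues
venues = pd.DataFrame({
    "Neighbourhood": ["Centrum", "Centrum", "Noord"],
    "Venue": ["OCHA", "HPS", "Hypothetical venue"],
    "Venue Category": ["Thai Restaurant", "Cocktail Bar", "Harbor / Marina"],
})

# One count per neighbourhood instead of one identical column per field
counts = venues.groupby("Neighbourhood").size()
print(counts)
```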


In [90]:
# find out how many unique categories we have
print('There are {} unique categories.'.format(len(amsterdam_venues['Venue Category'].unique())))

There are 171 unique categories.


In [96]:
# drop venue categories that are not relevant for the food comparison
amsterdam_venues = amsterdam_venues[~amsterdam_venues['Venue Category'].isin(['Hotel', 'Train Station'])]
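Filtering out unwanted categories comes down to building a boolean mask. `np.isin(..., invert=True)` and pandas' own `~Series.isin(...)` produce the same mask; the pandas form keeps the index and avoids the NumPy round trip. The same pattern (or a plain `==` comparison) also works for the borough filter earlier in the notebook. The sample categories are illustrative.

```python
import numpy as np
import pandas as pd

categories = pd.Series(["Hotel", "Thai Restaurant", "Train Station", "Cocktail Bar"])
excluded = ["Hotel", "Train Station"]

mask_numpy = np.isin(categories, excluded, invert=True)  # NumPy boolean array
mask_pandas = ~categories.isin(excluded)                  # pandas boolean Series

kept = categories[mask_pandas]
print(kept.tolist())
```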

In [95]:
# find out how many unique categories we have
print('There are {} unique categories.'.format(len(amsterdam_venues['Venue Category'].unique())))

There are 169 unique categories.


In [99]:
# one hot encoding to do analysis on categorical variables
amsterdam_onehot = pd.get_dummies(amsterdam_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighbourhood column back to dataframe
amsterdam_onehot['Neighbourhood'] = amsterdam_venues['Neighbourhood']

# move neighbourhood column to the first column
fixed_columns = [amsterdam_onehot.columns[-1]] + list(amsterdam_onehot.columns[:-1])
amsterdam_onehot = amsterdam_onehot[fixed_columns]

amsterdam_onehot.head()

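| | Neighbourhood | Arcade | Art Museum | Arts & Crafts Store | Arts & Entertainment | Asian Restaurant | Athletics & Sports | Australian Restaurant | BBQ Joint | Bagel Shop | ... | Trail | Tram Station | Turkish Restaurant | Vegetarian / Vegan Restaurant | Vietnamese Restaurant | Wine Bar | Wine Shop | Women's Store | Yoga Studio | Zoo |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Centrum | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | Centrum | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | Centrum | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | Centrum | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 5 | Centrum | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |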
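On a toy frame, the one-hot step is easier to see. Setting both `prefix` and `prefix_sep` to empty strings keeps the bare category names as column names, and `dtype=int` gives 0/1 integers regardless of pandas version (recent versions default to booleans). `DataFrame.insert` at position 0 is a one-line alternative to the two-step column reordering in the cell above. The data is illustrative.

```python
import pandas as pd

venues = pd.DataFrame({
    "Neighbourhood": ["Centrum", "Centrum", "Noord"],
    "Venue Category": ["Thai Restaurant", "Cocktail Bar", "Thai Restaurant"],
})

# One column per category; empty prefix and separator keep the bare category names
onehot = pd.get_dummies(venues[["Venue Category"]], prefix="", prefix_sep="", dtype=int)

# Put the neighbourhood label in the first column
onehot.insert(0, "Neighbourhood", venues["Neighbourhood"])
print(onehot)
```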


In [100]:
# group by neighbourhood
amsterdam_grouped = amsterdam_onehot.groupby('Neighbourhood').mean().reset_index()
amsterdam_grouped

| | Neighbourhood | Arcade | Art Museum | Arts & Crafts Store | Arts & Entertainment | Asian Restaurant | Athletics & Sports | Australian Restaurant | BBQ Joint | Bagel Shop | ... | Trail | Tram Station | Turkish Restaurant | Vegetarian / Vegan Restaurant | Vietnamese Restaurant | Wine Bar | Wine Shop | Women's Store | Yoga Studio | Zoo |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Centrum | 0.0 | 0.011494 | 0.011494 | 0.0 | 0.011494 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.011494 | 0.0 | 0.0 | 0.011494 | 0.0 | 0.011494 |
| 1 | Nieuw-West | 0.0 | 0.0 | 0.0 | 0.010753 | 0.021505 | 0.010753 | 0.0 | 0.0 | 0.0 | ... | 0.010753 | 0.053763 | 0.075269 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | Noord | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.037037 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | Oost | 0.0 | 0.0 | 0.0 | 0.0 | 0.011111 | 0.011111 | 0.0 | 0.0 | 0.011111 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | West | 0.010417 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.010417 | 0.0 | ... | 0.010417 | 0.0 | 0.010417 | 0.0 | 0.0 | 0.010417 | 0.0 | 0.0 | 0.020833 | 0.0 |
| 5 | Westpoort | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 6 | Zuid | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.010753 | 0.010753 | 0.010753 | 0.010753 | ... | 0.0 | 0.0 | 0.0 | 0.010753 | 0.0 | 0.0 | 0.010753 | 0.0 | 0.0 | 0.0 |
| 7 | Zuidoost | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
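Taking the mean of 0/1 indicator columns per group gives the share of a neighbourhood's venues that fall into each category. In the table above, for example, Centrum's 0.011494 is 1/87, i.e. each single venue accounts for 1/87 of the 87 Centrum venues left after hotels and train stations are dropped. The sketch below reproduces this on a toy frame and adds a hypothetical `top_categories` helper that picks the N most frequent categories per neighbourhood with `nlargest`, as an alternative to the transpose-and-sort loop in the next cell. Ties are broken by column order, so equally frequent categories may come out in a different order than in the loop.

```python
import pandas as pd

# Illustrative one-hot frame: one row per venue, one 0/1 column per category
onehot = pd.DataFrame({
    "Neighbourhood": ["Centrum", "Centrum", "Centrum", "Noord"],
    "Cocktail Bar": [1, 0, 0, 0],
    "Thai Restaurant": [0, 1, 1, 0],
    "Harbor / Marina": [0, 0, 0, 1],
})

# Mean of 0/1 indicators = share of the neighbourhood's venues in each category
grouped = onehot.groupby("Neighbourhood").mean()


def top_categories(grouped, n):
    """Hypothetical helper: the n most frequent categories per neighbourhood."""
    return {hood: row.nlargest(n).index.tolist() for hood, row in grouped.iterrows()}


top2 = top_categories(grouped, 2)
print(grouped)
print(top2)
```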


In [101]:
# print the top 5 venue categories for every neighbourhood
num_top_venues = 5

for hood in amsterdam_grouped['Neighbourhood']:
    print("----"+hood+"----")
    temp = amsterdam_grouped[amsterdam_grouped['Neighbourhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Centrum----
            venue  freq
0      Restaurant  0.06
1  Breakfast Spot  0.06
2     Coffee Shop  0.05
3           Plaza  0.05
4       Bookstore  0.03


----Nieuw-West----
                venue  freq
0         Supermarket  0.11
1  Turkish Restaurant  0.08
2                Park  0.05
3        Tram Station  0.05
4           Drugstore  0.04


----Noord----
             venue  freq
0  Harbor / Marina  0.07
1         Bus Stop  0.07
2     Soccer Field  0.07
3            Plaza  0.07
4      Supermarket  0.07


----Oost----
             venue  freq
0      Supermarket  0.06
1         Bus Stop  0.06
2     Soccer Field  0.04
3    Shopping Mall  0.04
4  Harbor / Marina  0.03


----West----
                venue  freq
0         Coffee Shop  0.09
1                Café  0.05
2                 Bar  0.05
3  Italian Restaurant  0.05
4       Deli / Bodega  0.04


----Westpoort----
                  venue  freq
0       Harbor / Marina  0.17
1         Boat or Ferry  0.17
2  Fast Food Restaurant  0.