# **Exploring the food options in New York and Amsterdam**
# **Dennis van Nooij**


# Introduction
Many people are drawn to life in a big city. The promise of a city that never sleeps, with more nationalities than you can count and the possibility to eat food from every region in the world, is a massive attraction. But what is the difference, really, between a city like Amsterdam and one like New York? Is the gap really that large? For this exercise we'll explore the culinary options in both cities to figure out whether making the jump overseas will make a foodie's heart jump with joy.

## Problem
To answer this question we analyse how many different restaurant categories there are in each city, and how often each category appears.

# Data

## Data Sources
We will be using the Foursquare API to get information about the venues. Using the Places API and the "explore" call we can get a list of venues around a specific location (the center of a neighbourhood in this case). For New York we will rely on the "newyork_data.json" data file, which contains the boroughs, the neighbourhoods and their locations. For Amsterdam, we will use the data from https://maps.amsterdam.nl/open_geodata/ to find the names and postal codes of all the neighbourhoods. We'll get the location data from the Geocoder Python package.



## Data cleaning
The Amsterdam data had to be cleaned up and translated. It contained polygons describing the shape of each neighbourhood, which are not used, so that column was dropped. I decided to focus only on the center of Amsterdam, hence the filter on borough code "A".

As we only want to see venues that focus on food we applied an extra filter called "section" on the Foursquare "explore" call, with value "food". This leaves out parks, train stations etc.
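As a rough illustration of the "explore" call with the food filter, the sketch below only builds the request URL and sends nothing. It assumes the legacy Foursquare v2 `venues/explore` endpoint; the function name `build_explore_url`, the radius, the version date and the credentials are placeholders and not values taken from the notebook.

```python
from urllib.parse import urlencode

# Legacy Foursquare v2 "explore" endpoint (assumed; check the current API docs).
EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(lat, lng, client_id, client_secret,
                      radius=500, limit=100, version="20180605"):
    """Build an explore request URL for food venues around one point.

    radius (in metres) and version are illustrative values only.
    """
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",
        "radius": radius,
        "limit": limit,
        "section": "food",  # restrict results to food venues
    }
    return f"{EXPLORE_URL}?{urlencode(params)}"


# Placeholder credentials and an approximate coordinate for central Amsterdam.
url = build_explore_url(52.3728, 4.8936, "YOUR_ID", "YOUR_SECRET")
print(url)
```

The resulting URL could then be fetched once per neighbourhood center, for example with `requests.get(url).json()`.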

To visualize the categories and their weight I decided to use a word cloud. To prevent generic terms from dominating the cloud I used the stopwords 'Park', 'Fitness', 'Art Museum', 'Gym' and 'Fitness Center'.
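The stopword filtering can be sketched as a simple frequency count. The helper name `category_frequencies` and the sample list are made up for illustration; only the stopwords come from the text above.

```python
from collections import Counter

# Stopwords quoted in the text above.
STOPWORDS = {"Park", "Fitness", "Art Museum", "Gym", "Fitness Center"}


def category_frequencies(categories, stopwords=STOPWORDS):
    """Count how often each venue category occurs, skipping stopwords."""
    return Counter(c for c in categories if c not in stopwords)


# Made-up sample, not data from the report.
sample = ["Café", "Park", "Café", "Bakery", "Gym", "Pizza Place"]
freqs = category_frequencies(sample)
print(freqs.most_common())
```

If the `wordcloud` package is used for the rendering (an assumption), a mapping like `freqs` can be passed to `WordCloud().generate_from_frequencies(freqs)`.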

Because the Foursquare calls use a radius around the center of a neighbourhood, and some neighbourhoods in Amsterdam are quite small, results from adjacent neighbourhoods were very likely to be included as well. To correct for this, duplicates were removed from the list of venues.
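A minimal sketch of the duplicate filter with pandas is shown below. The column names and the sample rows are assumptions for illustration; the notebook may identify duplicates differently, for instance by venue id.

```python
import pandas as pd

# Made-up venues; the column names are assumptions, not the notebook's.
venues = pd.DataFrame({
    "Neighbourhood": ["Jordaan", "Grachtengordel", "Jordaan", "De Wallen"],
    "Venue": ["Cafe A", "Cafe A", "Bakery B", "Pizzeria C"],
    "Venue Latitude": [52.3740, 52.3740, 52.3755, 52.3720],
    "Venue Longitude": [4.8830, 4.8830, 4.8810, 4.8980],
    "Venue Category": ["Café", "Café", "Bakery", "Pizza Place"],
})

# A venue is treated as a duplicate when name and coordinates match,
# regardless of which neighbourhood's request returned it.
unique_venues = venues.drop_duplicates(
    subset=["Venue", "Venue Latitude", "Venue Longitude"], keep="first"
).reset_index(drop=True)

print(len(venues), "->", len(unique_venues))
```

Keeping the first occurrence means a venue stays attached to the first neighbourhood whose request returned it.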

To be able to compare the cities, the venue lists of New York and Amsterdam were combined. As the New York dataset had more results, and we are interested in the relative values, the counts were turned into percentages.
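One way to combine the two lists and normalize them is sketched here. The helper `category_percentages` and the sample data are hypothetical; the `ams_only` and `ny_only` flags mirror the column names of the combined table shown in the Results section.

```python
import pandas as pd


def category_percentages(categories, city):
    """Share (in %) of a city's venues that fall in each category."""
    pct = pd.Series(categories).value_counts(normalize=True) * 100
    return pct.rename(city).rename_axis("Category").reset_index()


# Made-up category lists, one entry per venue.
ams = ["Restaurant", "Restaurant", "Café", "Dutch Restaurant"]
nyc = ["Café", "Pizza Place", "Pizza Place", "Restaurant", "Arepa Restaurant"]

combined = (
    category_percentages(ams, "Amsterdam")
    .merge(category_percentages(nyc, "New York"), on="Category", how="outer")
)
# Flag categories present in only one city before filling the gaps with 0.
combined["ams_only"] = combined["New York"].isna().astype(int)
combined["ny_only"] = combined["Amsterdam"].isna().astype(int)
combined = combined.fillna(0)
print(combined)
```

The outer join keeps categories that occur in only one city, which is what makes the per-city unique counts possible.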

For an overview per region, a lookup table was created to link categories to their regions. See the source code for details.

![image](regions.png)
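The idea of the lookup table can be sketched as a dictionary that maps each category to a region, followed by a sum per region. The mapping and the percentages below are invented for illustration; the real lookup is in the source code.

```python
import pandas as pd

# Hypothetical lookup; the report's actual mapping is in its source code.
REGION_LOOKUP = {
    "Italian Restaurant": "Europe",
    "French Restaurant": "Europe",
    "Restaurant": "General",
    "Café": "General",
}

# Made-up percentages per category and city.
categories = pd.DataFrame({
    "Category": ["Italian Restaurant", "French Restaurant", "Restaurant",
                 "Café", "Ramen Restaurant"],
    "Amsterdam": [7.0, 6.0, 11.0, 9.0, 1.0],
    "New York": [8.0, 4.0, 2.0, 6.0, 3.0],
})

# Categories missing from the lookup fall back to "Other".
categories["Region"] = categories["Category"].map(REGION_LOOKUP).fillna("Other")
regions = categories.groupby("Region")[["Amsterdam", "New York"]].sum()
print(regions)
```

Falling back to "Other" is a design choice of this sketch; it makes unmapped categories visible instead of silently dropping them.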


# Methodology

## Amsterdam

### Neighbourhoods
For Amsterdam a total of 70 neighbourhoods was retrieved after filtering on borough 'A' (which represents the center of Amsterdam).

![Image](amsterdam_neighbourhoods.png)

### Venues
For each of these neighbourhoods we made a Foursquare call that retrieved 100 results per neighbourhood, for a total of 7000 venues. After filtering out the duplicates, 440 venues were left.

![Image](amsterdam_venues.png)

Summarizing the venues by category gives this overview, in which a total of 73 categories are shown.

![Image](amsterdam_categories.png)

The same table visualized as a word cloud gives this result. Note that generic terms like "Restaurant" have been filtered out here as well, as they would dominate the result and reduce readability.

![Image](ams_wordcloud.png)


## New York

### Neighbourhoods

For New York 306 neighbourhoods were retrieved, spread over 5 boroughs. For this exercise we only use the 39 neighbourhoods in Manhattan.

![Image](newyork_neighbourhoods.png)

For every neighbourhood we retrieved 100 results. Here too there was some overlap between neighbourhoods: after filtering out the duplicates we are left with 1297 unique venues.

![Image](newyork_venues.png)

### Venues
To get an understanding of the total number of venues in a category we have summarized them in the following table. Manhattan has 106 unique categories of venues in the food category.

![Image](manhattan_categories.png)


If we plot these categories and their counts in a word cloud we get this picture. Note that generic terms like "Restaurant" have been filtered out here as well.

![Image](new_york_wordcloud.png)

# Results

## Comparing Amsterdam and New York

To compare both cities the two category tables were combined, after which the categories unique to each city were counted. In total we now have 117 categories.

![Image](combined_categories.png)

There are 44 categories unique to New York. A few of them are American-style kitchens like American and New American, but more specialized international restaurants like "Arepa restaurant" and "Churrascaria" are also only present in New York. Indonesian restaurants and doner restaurants are only present in Amsterdam, as are some specialities like "Afghan restaurant" and "Tibetan restaurant". Unsurprisingly, another is the "Dutch kitchen".

As we wanted to compare both the number of unique categories and the number of restaurants within a category, the results have been normalized and turned into percentages. The following list shows the categories and the percentage of venues that fall in each category for both cities.




```python
import pandas as pd

combined_categories = pd.read_csv('newyork_amsterdam_categories.csv')
pd.options.display.max_rows = 120  # show every category instead of a truncated view
combined_categories
```

| | Category | Amsterdam (%) | New York (%) | ams_only | ny_only |
|---|---|---|---|---|---|
| 0 | Restaurant | 11.363636 | 1.54202 | 0 | 0 |
| 1 | Café | 9.090909 | 6.013878 | 0 | 0 |
| 2 | Italian Restaurant | 7.272727 | 8.404009 | 0 | 0 |
| 3 | French Restaurant | 6.136364 | 4.009252 | 0 | 0 |
| 4 | Sandwich Place | 4.318182 | 2.621434 | 0 | 0 |
| 5 | Bakery | 4.090909 | 3.777949 | 0 | 0 |
| 6 | Pizza Place | 3.863636 | 6.861989 | 0 | 0 |
| 7 | Deli / Bodega | 3.636364 | 1.927525 | 0 | 0 |
| 8 | Breakfast Spot | 3.181818 | 0.462606 | 0 | 0 |
| 9 | Steakhouse | 2.727273 | 1.850424 | 0 | 0 |


Here's the same table visualized in a (large) bar chart. As the range of kitchens is important I decided to show all categories even if only 1 restaurant was present.

![Image](nyc_ams_bar.png)

To get a better overview, the categories have been summarized into regions. A practical approach was chosen: categories were combined where possible and kept separate if enough entries were present. This resulted in the following chart.

![Image](nyc_ams_regions_bar.png)


# Discussion
The venues were retrieved using the center of each neighbourhood with a fixed radius around it. This works best if a neighbourhood is round with exactly that radius. Obviously neighbourhoods have different shapes and sizes, so it is likely that venues outside the neighbourhood were retrieved as well. Also, the number of results was limited to 100 per request. For neighbourhoods with a lot of venues, that could mean that restaurants beyond that cut-off are not taken into account. As we want to focus on the most popular restaurants, this last limitation is less of an issue. The large number of duplicates across the per-neighbourhood requests suggests that the search areas overlapped considerably, so our assumptions are probably correct here.
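To put a number on the duplication, the small calculation below uses the counts reported in the Methodology section. It assumes every request returned the full 100 results, so the retrieved totals are upper bounds; the helper name `unique_share` is made up for this sketch.

```python
def unique_share(neighbourhoods, unique_venues, limit=100):
    """Fraction of retrieved results that remain after removing duplicates.

    Assumes every request returned the full `limit` results, so the
    retrieved total is an upper bound.
    """
    retrieved = neighbourhoods * limit
    return unique_venues / retrieved


ams_share = unique_share(70, 440)    # 440 of 7000
nyc_share = unique_share(39, 1297)   # 1297 of at most 3900
print(f"Amsterdam: {ams_share:.1%}, Manhattan: {nyc_share:.1%}")
```

Under that assumption only about 6% of the Amsterdam results were unique, against roughly a third for Manhattan, which fits the picture of small, heavily overlapping search areas in the center of Amsterdam.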

As a next step it would be interesting to bring in census data to determine the number of nationalities present in both cities and correlate that with the number of categories found here.



# Conclusion
Although it is difficult to draw scientific conclusions from a word cloud, it shows some interesting trends. New York shows some local trends like Mexican, American, Southern Soul and New American. Amsterdam shows French restaurants, sandwich places and typical Dutch snack places.

The largest single category in Amsterdam (about 11% of venues) is plain "Restaurant", without specifying which kitchen it represents. That happens often in the Netherlands (the kitchen would typically have a French influence). The same goes for Café, Sandwich Place, Deli / Bodega and Breakfast Spot, which are popular in both cities.

French restaurants are popular in both cities, as are Italian restaurants (the number 1 category in New York). Both kitchens manage to secure a place among the "generic" categories mentioned before.

In New York there seems to be more variation, with a total of 106 unique categories versus 73 in Amsterdam. There are 44 categories present in New York that are not found in Amsterdam. Examples of those are Venezuelan, Arepa and Filipino. 11 categories are only present in Amsterdam, "Dutch kitchen" being one of them.

The regional chart shows similar trends. Amsterdam has more restaurants in the general category, and nearly all regions have a smaller presence in Amsterdam than in New York. South East Asia is the exception, with a similar presence in both cities at around 8 percent. American and South American kitchens are not present in Amsterdam. African kitchens have very little presence in both cities.

Based on these numbers and the discussion above it is safe to conclude that New York has a richer offering of restaurants. Amsterdam has fewer options and also a higher percentage of generic restaurants.