



# The Impact of Multiculturalism on Restaurant Cuisine



William Hanan

December 15 2020


## 1. Introduction 

In our modern globalised world, we have witnessed steadily increasing levels of migration [1]. This has led to many cities and countries becoming increasingly multicultural in nature over time. 

The data provided by Foursquare contains particularly rich information on the variety of cuisines available in a specific location [2]. That is, it provides the cuisine type served in a restaurant, e.g. Mexican, Sushi, French, Italian, Cajun, etc. There are more than 230 such restaurant categories [3].

In this project we investigate the following hypothesis:

> The level of multiculturalism in a country is reflected in the diversity of cuisines in its major cities.

That is, the more multicultural a city, the wider the range and number of restaurant types one would expect to find in the city.

In order to calculate the level of multiculturalism in a country, we need national immigration statistics, that is, a breakdown of immigrant numbers by nationality. As a measure of diversity in the population (i.e. multiculturalism), we then compute an entropy value from these statistics [4].

Entropy is a concept originally defined in physics, where it was used as a measure of disorder in a system. However, it can also be used to quantify diversity [5]. For example, a country whose population contains very few immigrants will have a much lower entropy value than a country with a large number and wide range of nationalities. Going forward, we shall refer to this entropy value as the *diversity index*. Precise details of how the diversity index is calculated are provided in the methodology section.

Quantifying the diversity of cuisines in a city is done in a similar fashion. Using the Foursquare API, we obtain the number of restaurants of each cuisine type in a city, e.g. the number of Chinese restaurants, the number of Italian restaurants, etc. We then compute the entropy from these counts to quantify the diversity of cuisines.

For each city, we therefore calculate a diversity index for the population and a diversity index for the restaurant cuisine types. If the hypothesis holds, one would expect a positive correlation between the population diversity index and the cuisine diversity index. We perform regression analysis to investigate this possibility.

The proposed hypothesis is primarily of academic interest. However, if it were found to hold, the diversity of cuisines in a city could serve as a rough measure of the variety of inward migration into a country. Such insight could be particularly useful to demographers in countries lacking accurate records or reliable census data, and could help inform national government policy.

In addition to performing regression analysis, we also compare cities by clustering them on their available cuisine types. This analysis provides further insight into the primary factors that determine the makeup of restaurants in major European cities.



## 2. Data Sets

Our primary data source is the location data obtained from the Foursquare API [6]. Given the latitude and longitude of a location in a city, this API can search for venues serving food within a 500 metre radius. The returned results include the latitude and longitude of each eatery and its category, e.g. Italian Restaurant, Chinese Restaurant, Hotpot Restaurant, Japanese Curry Restaurant, etc.

To obtain the full list of restaurants in a city, I first created a grid of equally spaced location points, each 1 km apart, within the city boundaries, and then performed a restaurant venue search at each point. I then combined the results of these searches and removed any duplicate entries. As an example of the resulting raw data, here are the entries for three restaurants in Dublin City:

| Venue Name        | Latitude           | Longitude  | Category |
| ------------- |:-------------:|:-----:|------:|
| Hard Rock Cafe Dublin     | 53.345878 | -6.260866 | American Restaurant |
| Musashi | 53.351138 | -6.264117 | Sushi Restaurant |
| Nando's | 53.348829 | -6.267252 | Portuguese Restaurant |
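
The grid-and-deduplicate procedure described above can be sketched as follows. This is a minimal illustration, not the code used in the project: it assumes a rectangular bounding box rather than the true city boundary, uses a flat-earth approximation of 111.32 km per degree of latitude, and treats each venue as a hypothetical dict with `name`, `lat` and `lng` keys (the actual Foursquare request and response handling are omitted).

```python
import math

KM_PER_DEG_LAT = 111.32  # approximate length of one degree of latitude in km


def grid_points(lat_min, lat_max, lng_min, lng_max, spacing_km=1.0):
    """Return (lat, lng) points roughly spacing_km apart inside a bounding box.

    Uses an equirectangular (flat-earth) approximation, adequate at city scale.
    """
    mid_lat = (lat_min + lat_max) / 2
    dlat = spacing_km / KM_PER_DEG_LAT
    # a degree of longitude shrinks by cos(latitude)
    dlng = spacing_km / (KM_PER_DEG_LAT * math.cos(math.radians(mid_lat)))
    n_lat = int((lat_max - lat_min) / dlat) + 1
    n_lng = int((lng_max - lng_min) / dlng) + 1
    return [
        (lat_min + i * dlat, lng_min + j * dlng)
        for i in range(n_lat)
        for j in range(n_lng)
    ]


def deduplicate(venues):
    """Merge results of overlapping searches, keeping the first copy of each venue.

    Hypothetical key: (name, lat, lng); a unique venue id would be better if available.
    """
    seen = set()
    unique = []
    for v in venues:
        key = (v["name"], round(v["lat"], 6), round(v["lng"], 6))
        if key not in seen:
            seen.add(key)
            unique.append(v)
    return unique
```

One geometric caveat on the design: with a 500 m search radius, circles centred 1 km apart on a square grid do not fully cover the area, since the centre of each grid cell lies about 707 m from its nearest grid point; complete coverage would need a spacing of roughly 707 m or less.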

From this raw data, we then count the number of restaurants in each cuisine category, i.e. the number of American restaurants, the number of Sushi restaurants, etc. After processing all the data for Dublin, we end up with a dataframe that looks as follows:

|    &nbsp;     | abruzzo | afghan  | african | american | asian | australian | austrian |bangladeshi | bavarian | ... | vietnamese | xinjiang |
| ------------- |:-------------:|:-----:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|------:|
| Dublin | 0| 1| 5| 11| 72 | 4 | 0 | 0 | 0 |...| 2 | 0|
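
The counting step can be sketched with the standard library alone. The sketch assumes, as the column names above suggest, that each cuisine label is the lower-cased Foursquare category with the trailing " Restaurant" removed; the input mapping from city to a list of category strings is hypothetical.

```python
from collections import Counter


def cuisine_counts(categories):
    """Count restaurants per cuisine, e.g. 'Sushi Restaurant' -> 'sushi'."""
    return Counter(cat.lower().removesuffix(" restaurant").strip() for cat in categories)


def count_table(city_categories):
    """Build a city x cuisine table, filling in zeros for cuisines a city lacks.

    Returns (sorted cuisine names, {city: [count per cuisine]}).
    """
    per_city = {city: cuisine_counts(cats) for city, cats in city_categories.items()}
    all_cuisines = sorted(set().union(*per_city.values()))
    # Counter returns 0 for missing keys, so absent cuisines become zero counts
    table = {city: [c[k] for k in all_cuisines] for city, c in per_city.items()}
    return all_cuisines, table
```

With pandas, a similar table could be obtained from a dataframe of raw venues with `pd.crosstab(df["City"], df["Category"])`, assuming columns with those names.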

We can produce such counts for any city for which we have collected raw data from Foursquare. Note that there are 137 distinct cuisine types/categories in our dataset in total.

While this tells us which cuisine types are available in a city, we also need information on the level of multiculturalism in the city. To measure this, we require detailed statistics on the breakdown of immigrant numbers within a country, that is, the breakdown of nationalities residing in the country. Such information can be obtained from the Eurostat website [7]. For example, using their data extraction tool, the following data on the top 7 nationalities living in Austria in 2018 was downloaded as a CSV file:

| Citizenship | Number of residents |
| ------------- |:-------------:|
| Austria | 7,426,387 |
| Germany | 186,841 |
| Serbia | 120,174 |
| Turkey | 117,297 |
| Romania | 102,270 |
| Bosnia | 95,189 |
| Hungary | 77,113 |

Note that though only 7 nationalities are shown above, we in fact have data on 230 nationalities in total. One can subsequently use these retrieved statistics to construct a measure for the level of multiculturalism in Austria. As stated in the introduction, we do this by calculating the (entropy) diversity index. 

Demographic data on 22 European countries was collected from the Eurostat website. Then, using the Foursquare API, restaurant information was collected and processed for one major city in each of these countries. The countries and cities are listed below, along with the number of restaurants found in each city.

| Country        | City           | Restaurant Count  |
| ------------- |:-------------:| -----:|
| United Kingdom | London   |    6588 |
| Spain| Barcelona  |   3236 |
| Poland | Warsaw   |     1758 |
| Italy |Milan     |    1598 |
| Austria | Vienna   |     1475 |
| Germany | Munich   |     1468 |
| Czechia | Prague    |    1331 |
| Greece | Athens   |     1265 |
| Netherlands |Amsterdam  |   1062 |
| Switzerland | Zurich    |    1007 |
| Romania | Bucharest   |  1001 |
| Denmark |Copenhagen   |  961 |
| Portugal | Porto    |      894 |
| Bulgaria | Sofia    |      868 |
| France     |  Lyon  |    737  |
| Latvia | Riga     |      680 |
| Ireland | Dublin     |    628 |
| Norway |Oslo     |      432 |
| Luxembourg | Luxembourg City |   216 |
| Iceland |Reykjavik   |    90 |
| Montenegro | Podgorica   |    79 |
| Liechtenstein | Vaduz      |     48 |

The table above summarises the dataset used in this project. The dataset would have been larger, but the Foursquare API limit of 950 requests per day capped the amount of data that could be collected in a short timeframe.


## 3. Methodology

### 3.1 Diversity Measures

Diversity is a concept central to this project. So how is one to quantify it? Suppose, for example, that we are interested in the animal species living on two islands. Which island should be considered more diverse? One could simply count the number of different species on each island and declare the island with the greater number to be the more diverse. This measure of diversity, the category count, is referred to as *richness*.

However, though intuitive, richness is a crude measure of diversity, as it does not take into account the number of animals in each species. For example, the island with the greater species count might have a very large number of squirrels and very small numbers of the other species, whereas the other island, though having fewer species, may have large numbers across all of them. So which island is truly more diverse?

An alternative and more sophisticated diversity measure that is often used is the Shannon entropy, defined as:

$$ E = - \sum_{i=1}^{R} p_i \ln p_i $$

where $R$ is the total number of species and $p_i$ is the proportion of the total population that is of species $i$. If every species has the same count, so that the population is as evenly spread as possible, the entropy attains its maximum value of $\ln R$. If instead one species dominates in number over all the others, the entropy approaches zero. Therefore, the larger the entropy value, the more diverse the population.
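
A minimal sketch of both measures, applied to two hypothetical islands in the spirit of the example above (the species counts are invented purely for illustration). Island A has more species but is dominated by one of them; island B has fewer species in roughly equal numbers.

```python
import math


def richness(counts):
    """Number of categories with at least one member."""
    return sum(1 for c in counts if c > 0)


def shannon_entropy(counts):
    """Shannon entropy E = -sum p_i ln p_i of a list of category counts.

    Zero counts are skipped, following the convention 0 * ln 0 = 0.
    """
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


# Hypothetical islands: A is dominated by squirrels, B is evenly spread
island_a = [970, 10, 10, 10]  # richness 4, entropy about 0.17
island_b = [340, 330, 330]    # richness 3, entropy about 1.10 (close to ln 3)
```

Richness ranks island A as more diverse, while the entropy ranks island B as more diverse. The same `shannon_entropy` function can be applied to a city's row of cuisine counts to obtain its cuisine diversity index, or to a country's nationality counts to obtain its population diversity index.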

Note that this concept of entropy can be applied to any distribution. So rather than applying it to counts of different species on an island, we might also apply it to counts of different restaurant cuisines in a city. This is precisely what we shall be doing in this project. That is, we shall be using the entropy value as a measure of the diversity of available cuisines in a city. We shall also use it to measure the diversity of nationalities present in any given country.

Going forward, I shall refer to the entropy as the entropy diversity index, or simply the *diversity index*.


### 3.2 Regression Analysis

In this project we wish to investigate the following idea:

> The level of multiculturalism in a country is reflected in the diversity of cuisines in its major cities.

Is this true? How can we test the above statement in a scientific fashion? We shall perform a hypothesis test, beginning with the following null hypothesis:

> The diversity of cuisines available in a city is **unrelated** to the level of multiculturalism in the country.

Now for each of the 22 cities in our dataset, we shall calculate a diversity index value $E_c$ from the distribution of restaurant cuisines. Likewise we shall calculate a diversity index $E_n$ from the distribution of nationalities in each country. 

We shall then plot $E_c$ against $E_n$ to obtain a scatter plot with 22 points. If the two diversity indices are related, we should see a clear correlation, linear or otherwise, in the plot. More specifically, we shall perform linear regression on these points to obtain the best-fit line and a p-value for the slope coefficient. If the p-value is below 0.05, the data provide enough evidence to reject the null hypothesis at the 5% significance level. If not, we fail to reject the null hypothesis, which remains our default position. In the next section, we present our data and the results of this regression.
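
The regression test described above can be sketched without third-party libraries: ordinary least squares for $y = b_0 + b_1 x$, the standard error and t-statistic of the slope, and a two-sided p-value from Student's t distribution with $n - 2$ degrees of freedom (integrated numerically here with Simpson's rule). This is an illustrative sketch; the original analysis may well have used a library such as statsmodels (`sm.OLS(y, sm.add_constant(x)).fit()`), which reports the same quantities.

```python
import math


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)
    return c * (1 + t * t / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """P(|T| > |t|), integrating the density over [0, |t|] with Simpson's rule."""
    a = abs(t)
    h = a / steps  # steps must be even for Simpson's rule
    s = t_pdf(0, df) + t_pdf(a, df)
    s += sum((4 if k % 2 else 2) * t_pdf(k * h, df) for k in range(1, steps))
    return 1 - 2 * (s * h / 3)


def ols_fit(x, y):
    """Fit y = b0 + b1*x by least squares.

    Returns (b0, b1, se_b1, t_b1, p_b1), where p_b1 tests the null hypothesis b1 = 0.
    Assumes the points do not lie exactly on a line (otherwise se_b1 is zero).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    se_b1 = math.sqrt(rss / (n - 2) / sxx)
    t = b1 / se_b1
    return b0, b1, se_b1, t, two_sided_p(t, n - 2)
```

As a sanity check against the tables in the results section, a t-statistic of $-0.729$ with $22 - 2 = 20$ degrees of freedom gives a p-value of approximately 0.474, and $-0.292$ gives approximately 0.773.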


### 3.3 Clustering

In addition to the above regression analysis, we shall also examine in more detail the cuisines available in our 22 cities. We will do this by performing clustering on these cities so as to investigate which cities share similar cuisines. Having grouped cities into clusters, we shall then examine the demographics of the countries in each cluster. If the demographics were to differ significantly between clusters, then this would be evidence that there is some correlation between the diversity in restaurant cuisines and the makeup of the population.

This analysis will be more qualitative in nature (though one could perform ANOVA hypothesis testing here if one were so inclined). 

After some experimentation, hierarchical clustering was chosen. Recall that in the Data Sets section above, we described how we processed the raw Foursquare data to calculate the counts of each cuisine type in a city. For example, here is a section of the collated data for three cities:

|    &nbsp;     | abruzzo | afghan  | african | american | asian | australian | austrian |bangladeshi | bavarian | ... | vietnamese | xinjiang |
| ------------- |:-------------:|:-----:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|------:|
| Dublin | 0| 1| 5| 11| 72 | 4 | 0 | 0 | 0 |...| 2 | 0|
| Milan | 2| 0| 13| 10| 64 | 0 | 0 | 0 | 0 |...| 3 | 0|
| Warsaw | 0| 17| 16| 44| 203 | 2 | 1 | 0 | 0 |...| 76 | 0|

Each row above constitutes a feature vector that is essentially a signature of the restaurant types that exist in the city. It is these feature vectors that we cluster on. More details on the clustering will be provided in the results section.
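
As a sketch of this clustering step, here is a naive agglomerative clustering in plain Python, using the choices reported in Section 4.2: unit-length normalisation, cosine distance (one minus the cosine similarity) and complete linkage. With only 22 cities, the simple cubic-time loop is perfectly adequate. It assumes every city has at least one restaurant, so that no feature vector is all zeros.

```python
import math


def normalize(v):
    """Scale a count vector to unit Euclidean length (assumes v is not all zeros)."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def cosine_distance(u, v):
    """One minus the cosine similarity of u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    return 1 - dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def complete_linkage(vectors, n_clusters):
    """Merge the two closest clusters until n_clusters remain.

    Cluster distance is the largest pairwise distance between members (complete linkage).
    Returns a list of clusters, each a sorted list of row indices.
    """
    unit = [normalize(v) for v in vectors]
    dist = [[cosine_distance(u, w) for w in unit] for u in unit]
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = max(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]  # b > a, so index a is unaffected
    return clusters
```

In practice, SciPy offers the same procedure through `scipy.cluster.hierarchy.linkage(X, method="complete", metric="cosine")`, with `fcluster(Z, t=10, criterion="maxclust")` to cut the tree into 10 clusters and `dendrogram(Z)` to plot it.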

## 4. Results

### 4.1 Regression Results

Richness, the number of distinct non-zero categories, is a simple and intuitive measure of diversity. We therefore begin by calculating the number of distinct cuisine types $R_c$ that can be found within each of our 22 cities. Likewise we use the national demographic data of its country to estimate the number of distinct nationalities $R_n$ living in the city. A plot of $R_c$ versus $R_n$ is shown below. Each point represents a city.

<img src="RichnessRegressionPlot.png">

There is clearly no obvious relationship between these two quantities. That is, there is no visual evidence that a more multicultural society leads to a greater diversity of cuisines on offer in the city's restaurants. This is supported quantitatively by the regression results obtained by fitting the data to the line $y = b_0 + b_1 x$. The fitted line is shown above, and the fitted coefficients with their t-statistics are shown below:

|    &nbsp;     | coeff | Std Error  | t |  P > \|t\|  |
| ------------- |:-------------:|:-----:|:------:|------:|
|$b_0$| 61.8518 |    10.304  |    6.003  |    0.000 |
| $b_1$ | -0.0497 |    0.068 |   -0.729  |   0.474  |

We can see that the p-value of the t-test for the slope coefficient $b_1$ is 0.474. Since this is greater than the significance level of 0.05, $b_1$ is not statistically different from zero, and we cannot reject the null hypothesis.

As previously mentioned, the Shannon entropy (i.e. the entropy diversity index) is a more sophisticated measure of diversity. For each city, I therefore calculated the diversity index $E_c$ of the available cuisines. Likewise, a diversity index $E_n$ for the population was estimated from the national demographic data.

<img src="DiversityRegressionPlot.png">

The plot above and the accompanying regression statistics below lead to the same conclusion. The p-value of 0.773 for $b_1$ again provides no evidence of a relationship between the diversity of cuisines and the diversity of the population.

|    &nbsp;     | coeff | Std Error  | t |  P > \|t\|  |
| ------------- |:-------------:|:-----:|:------:|------:|
|$b_0$| 2.7523   |   0.144  |   19.124   |   0.000  |
| $b_1$ | -0.0538  |    0.184   |  -0.292    |  0.773 |


### 4.2 Cluster Analysis

To gain some intuitive insight into the range of cuisines available in European cities, I performed hierarchical clustering on the cuisine feature vectors of the 22 cities (see the Data Sets and Methodology sections for details of these feature vectors).

The cuisine feature vectors were first normalized to unit length and then clustered hierarchically. After some experimentation, the *cosine distance* (i.e. one minus the cosine similarity) was chosen as the distance metric, and *complete linkage* was used to calculate the distance between clusters. The results are summarised in the dendrogram below.

<img src="Dendogram.png">

Let us begin by observing which cities are most similar to each other in terms of the cuisines they offer. Reykjavik, Oslo and Copenhagen clearly form a very tight cluster, as do Zurich and Vaduz, and Dublin and London. In the map below, we show these and other strong localised clusters by segmenting the 22 cities into 10 clusters (note that Iceland has been excluded from the map for clarity).


<img src="Cluster1.png" width="550">

We can already see that cities in the same cluster tend to be geographically close to each other. That is, the geographic location of a city appears to be strongly associated with the restaurant cuisines available there.

For a more coarse-grained view of European cuisine, I aggregated the cities into 5 clusters to obtain the following map.

<img src="Cluster2.png" width="550">

We can now see a north/south divide. That is, the cuisines available in cities in northern and central Europe tend to be similar, though oddly Zurich and Vaduz are exceptions. In southern Europe, we see the cluster of Sofia, Bucharest and Podgorica in the east. Curiously, Barcelona is also grouped with these, though I do not believe this link to be genuine (during my clustering work, I found that Barcelona was quite unstable, in that it could easily be grouped with other cities depending on the clustering parameters chosen).

It is interesting that the restaurant cuisines available in Porto and Athens are outliers: they are quite distinct from each other and from all the other cities in our dataset.

To conclude this results section, let us return to our previous aggregation of 10 clusters. In the boxplot below, I show the spread in the population diversity indices $E_n$ of the countries in each of our 10 clusters.

<img src="BoxPlot.png">

The cities in each of the above clusters are listed in the table below for clarity. So why show this boxplot? The primary goal of this project is to test whether the cuisine types available in a city are influenced by the demographic makeup of the country. We have grouped cities into clusters with similar cuisines. If the diversity of a population influences the cuisines available, one might expect to see significant differences in population diversity between the cuisine clusters.

Although clusters 1 and 10 appear to have notably low and high diversity indices respectively, the diversity indices across the clusters otherwise show no clear pattern. That is, again we see little evidence that demographic diversity is connected to the available cuisines.


| Cluster ID        | Cities           | 
| ------------- |:-------------:|
| 1 | Bucharest, Podgorica, Sofia | 
| 2 | Munich, Prague, Vienna, Warsaw |
| 3 | Copenhagen, Oslo, Reykjavik |
| 4 | Amsterdam, Dublin, London, Milan |
| 5 | Luxembourg, Lyon |
| 6 | Porto |
| 7 | Barcelona |
| 8 | Athens|
| 9 | Riga |
| 10 | Vaduz, Zurich|


## 5. Discussion

Over the last few decades, we have seen increased levels of migration across the world, due primarily to ease of travel and access to information. As a result, many of the world's major cities have become increasingly multicultural in their populations.

The primary goal of this project was to investigate whether increased diversity in the population results in a corresponding increase in the diversity of cuisines available in a city or country. The reasoning is that as a particular migrant population grows over time, it creates demand for the cuisine of its country of origin, so one might expect restaurants offering that cuisine to appear.

To address this question, detailed restaurant data was obtained using the Foursquare API [6]. Demographic information on each country was also obtained from the Eurostat website [7]. It should be noted that for a specific country, this demographic information consisted of the population count broken down by **citizenship**. This is not ideal for two reasons:

1. An immigrant can become a citizen of the host nation. As such, counts of non-nationals likely underestimate the immigrant population.

2. It is well known that immigrants tend to concentrate in major cities, where economic activity is high. As such, national population counts again likely underestimate the actual proportion of different nationalities living in the cities.

Although the demographic statistics gathered suffer from the above biases, these are unlikely to have materially affected the results, as we are more interested in relative than absolute effects. That is, our analysis compares different cities/countries relative to one another, and since the population statistics from all countries arguably suffer from similar biases, the comparison should be largely unaffected. Nonetheless, it is wise to be aware of the shortcomings of these statistics.

The regression analysis of Section 4.1 finds no evidence of a correlation between the diversity of a city's population and the diversity of its available cuisines. This suggests that my reasoning regarding the growth of a particular immigrant population leading to the appearance of new types of restaurants is flawed, or at least incomplete. Perhaps most immigrant populations are simply too small for their demand to be significant?

The cluster analysis performed in Section 4.2 suggests that the primary driver of the cuisines available in city restaurants is cultural. That is, we found that cities offering similar cuisine choices tend to be located geographically close to each other. Examples include Dublin and London, Zurich and Vaduz, Bucharest and Sofia, and Vienna and Munich. This is perhaps best exemplified by the cluster of Copenhagen, Oslo and Reykjavik, which are clearly linked by their shared Nordic culture.



## 6. Conclusion

Is the culture of a city reflected in its restaurants? Or, more specifically, is the culture of the immigrant population in a city reflected in its restaurants?

At the high level addressed in this project, the answer to the second question appears to be no. The results instead suggest that it is demand from the native population that drives the types of restaurant cuisines available in a city.

However, this study is admittedly not detailed enough to be definitive. It is likely, for example, that many immigrant groups are simply not numerous enough to create demand for new restaurants catering to their particular tastes. It might therefore be more informative to study specific immigrant groups that are relatively large in number and analyse their impact on the restaurant scene. This could be a fruitful avenue for future work.

Finally, there is no reason to restrict oneself to only European cities. That we did so in this study was purely due to time constraints. The research could therefore be easily expanded to take in other geographic regions.



[1] https://www.un.org/en/development/desa/population/migration/data/estimates2/estimatesgraphs.asp?0g0

[2] https://foursquare.com/


[3] https://developer.foursquare.com/docs/build-with-foursquare/categories/

[4] https://en.wikipedia.org/wiki/Entropy

[5] https://en.wikipedia.org/wiki/Diversity_index

[6] https://developer.foursquare.com/developer/

[7] https://appsso.eurostat.ec.europa.eu/nui/show.do?dataset=migr_pop1ctz&lang=en