



# Multiculturalism and Diversity in Restaurant Cuisine



William Hanan

December 15 2020


## Introduction 

In our modern globalised world, we have witnessed steadily increasing levels of migration [1]. This has led to many cities and countries becoming increasingly multicultural in nature over time. 

The data provided by Foursquare contains particularly rich information on the variety of cuisines available in a specific location [2]. That is, it provides the cuisine type served in each restaurant, e.g. Mexican, Sushi, French, Italian, Cajun, etc. There are 230+ such restaurant categories [3].

In this project we investigate the following hypothesis:

> The level of multiculturalism in a country is reflected by the diversity of cuisines in its major cities.

That is, the more multicultural a city, the wider the range and number of restaurant types one would expect to find in the city.

To calculate the level of multiculturalism in a country, we need national immigration statistics, that is, a breakdown of immigrant numbers by nationality. We then compute an entropy value from these statistics [4].

Entropy is a concept originally defined in physics, where it was used as a measure of disorder in a system. It can, however, also be used to quantify diversity [5]. So for example, if we have a population with very few immigrants, the entropy value will be much lower than for a country with a large number and wide range of nationalities. Going forward, we shall refer to this entropy value as the *diversity index*. Precise details on how the diversity index is calculated are provided in the Methodology section.

Quantifying the diversity of cuisines in a city is done in a similar fashion. That is, using the Foursquare API, we obtain the number of restaurants for each cuisine type in a city, e.g. the number of Chinese restaurants, the number of Italian restaurants, etc. We can then compute the entropy from these statistics to quantify the diversity in cuisines.

For each country, we therefore calculate a diversity index value for the population, and a diversity index value for the restaurant cuisine types in its major city. If the hypothesis holds true, one would expect a positive correlation between the population diversity index and the cuisine diversity index. We test for this relationship using regression analysis.

The proposed hypothesis is primarily of academic interest. However, if the hypothesis were discovered to hold true, the diversity of cuisines in a city could then be used as a rough measure of the variety of inward migration into a country. Any such insight could be particularly useful to demographers in countries lacking accurate records and/or having poor census data, and could be used to inform national government policy. 

In addition to investigating our primary hypothesis, we also compare cities by clustering them on their available cuisine types. An analysis of the resultant clusters provides further insights.



## Data Sets

Our primary data source is the location data obtained from the Foursquare API [6]. One can input the latitude and longitude of a city location into this API and perform a search for venues serving food within a 500 metre radius. The returned search results include the latitude and longitude coordinates of each eatery and its category, e.g. Italian Restaurant, Chinese Restaurant, Hotpot Restaurant, Japanese Curry Restaurant, etc.

To obtain the full list of restaurants in a city, we first create a grid of equally spaced location points (each separated by a distance of 1km) within the city boundaries and then perform our restaurant search query at each such point. We then combine the results of our multiple search queries together and remove any duplicate entries. As an example, we end up with results that look as follows:
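The grid construction and de-duplication steps described above could be sketched roughly as follows. This is a minimal illustration, not the code used in the project: it assumes a rectangular bounding box in place of the true city boundary, uses an approximate conversion from kilometres to degrees, and identifies duplicates by name and coordinates. The function names `make_grid` and `merge_results`, the dictionary keys, and the bounding box values are all hypothetical.

```python
import math


def make_grid(lat_min, lat_max, lon_min, lon_max, spacing_km=1.0):
    """Return (lat, lon) points spaced roughly `spacing_km` apart inside a bounding box."""
    km_per_deg_lat = 111.32  # approximate length of one degree of latitude
    mid_lat = math.radians((lat_min + lat_max) / 2)
    km_per_deg_lon = km_per_deg_lat * math.cos(mid_lat)  # shrinks away from the equator
    dlat = spacing_km / km_per_deg_lat
    dlon = spacing_km / km_per_deg_lon
    n_lat = int((lat_max - lat_min) / dlat) + 1
    n_lon = int((lon_max - lon_min) / dlon) + 1
    return [(lat_min + i * dlat, lon_min + j * dlon)
            for i in range(n_lat) for j in range(n_lon)]


def merge_results(batches):
    """Combine per-point search results, keeping the first copy of each venue."""
    seen = {}
    for batch in batches:
        for venue in batch:
            # Assumption: name plus coordinates is enough to identify a venue.
            key = (venue["name"], venue["lat"], venue["lon"])
            seen.setdefault(key, venue)
    return list(seen.values())


# Hypothetical bounding box around central Dublin.
grid = make_grid(53.33, 53.36, -6.29, -6.23)

# Two overlapping search results that both return the same venue.
musashi = {"name": "Musashi", "lat": 53.351138, "lon": -6.264117,
           "category": "Sushi Restaurant"}
nandos = {"name": "Nando's", "lat": 53.348829, "lon": -6.267252,
          "category": "Portuguese Restaurant"}
venues = merge_results([[musashi, nandos], [musashi]])
print(len(grid), "grid points;", len(venues), "unique venues")
```

In this sketch each grid point would be passed to the venue search in turn, and the per-point results collected into `batches` before merging.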

| Venue Name | Latitude | Longitude | Category |
| ------------- |:-------------:|:-----:|------:|
| Hard Rock Cafe Dublin | 53.345878 | -6.260866 | American Restaurant |
| Musashi | 53.351138 | -6.264117 | Sushi Restaurant |
| Nando's | 53.348829 | -6.267252 | Portuguese Restaurant |

Using the returned restaurant categories for a city, we can count the number of restaurants in each cuisine type. 
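As a small illustration of this counting step, the sketch below tallies categories with `collections.Counter`, using the three Dublin rows shown in the table. The variable names are illustrative only.

```python
from collections import Counter

# (venue name, category) pairs taken from the example table.
venues = [
    ("Hard Rock Cafe Dublin", "American Restaurant"),
    ("Musashi", "Sushi Restaurant"),
    ("Nando's", "Portuguese Restaurant"),
]

# Count how many restaurants fall into each cuisine category.
cuisine_counts = Counter(category for _, category in venues)

for category, count in cuisine_counts.most_common():
    print(f"{category}: {count}")
```

The resulting counts are the input to the diversity calculation for the city.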

While this provides us with information on the available cuisine types in a city, we also need information on the level of multiculturalism in the country. To measure this accurately, we require detailed statistics on the breakdown of nationalities that reside in a country. Such information can be obtained from the Eurostat website [7]. For example, using their data extraction tool, the following data on the top 7 nationalities living in Austria in 2018 was downloaded as a CSV file:

| Citizenship        | Number           | 
| ------------- |:-------------:|
| Austria | 7,426,387 | 
| Germany	     | 186,841 |
| Serbia     | 120,174 | 
| Turkey	     | 117,297 |
| Romania     | 102,270|
| Bosnia	     | 95,189 |
| Hungary     | 77,113|

One can subsequently use these retrieved statistics to construct a measure for the level of multiculturalism in Austria. As stated in the introduction, we do this by calculating the (entropy) diversity index. 

Demographic data on 19 European countries was collected from the Eurostat website. Then using the Foursquare API, restaurant information in one major city in each of these countries was collected and processed. The list of countries and cities is shown below.

| Country        | City           | Restaurant Count  |
| ------------- |:-------------:| -----:|
| United Kingdom | London   |    6588 |
| Spain| Barcelona  |   3236 |
| Poland | Warsaw   |     1758 |
| Italy |Milan     |    1598 |
| Austria | Vienna   |     1475 |
| Germany | Munich   |     1468 |
| Czechia | Prague    |    1331 |
| Greece | Athens   |     1265 |
| Netherlands |Amsterdam  |   1062 |
| Switzerland | Zurich    |    1007 |
| Romania | Bucharest   |  1001 |
| Denmark |Copenhagen   |  961 |
| Portugal | Porto    |      894 |
| Bulgaria | Sofia    |      868 |
| France     |  Lyon  |    737  |
| Latvia | Riga     |      680 |
| Ireland | Dublin     |    628 |
| Norway |Oslo     |      432 |
| Luxembourg | Luxembourg City |   216 |
| Iceland |Reykjavik   |    90 |
| Montenegro | Podgorica   |    79 |
| Liechtenstein | Vaduz      |     48 |

The above table summarises the dataset used in this project. The dataset would have been much larger, but the Foursquare API limit of 950 requests per day capped the amount of data that could be collected in a short timeframe.


## Methodology

### Diversity Measures

Diversity is a concept that is central to this project. So how is one to quantify diversity? If we were interested in, for example, the animal species that exist on two islands, which island would be considered more diverse? One could simply count the number of different species on each island and declare the island with the greater number the more diverse. This measure of diversity, the category count, is referred to as *richness*.

However, though it is intuitive, it is also a crude measure of diversity, as it does not take into account the number of animals in each species. For example, the island with the greater species count might have a very large number of tigers and very small numbers of the other species, whereas the other island, though having fewer species, may have large numbers across all its species types. So which island is truly more diverse?

An alternative diversity measure that is often used is the Shannon entropy defined as:

$$ E = - \sum_{i=1}^{R} p_i \ln p_i $$

where $R$ is the total number of species and $p_i$ is the proportion of the total population that belongs to species $i$. Note that if every species has the same count, so that the population is as evenly spread as possible, the entropy attains its maximum value of $\ln R$. If, however, one species were to dominate in number over all the others, the entropy approaches zero. Therefore, the larger the entropy value, the more diverse the population.

Note that this concept of entropy can be applied to any distribution. So rather than applying it to counts of different species on an island, we might also apply it to counts of different restaurant cuisines in a city. This is precisely what we shall be doing in this project. That is, we shall be using the entropy value as a measure of the diversity of available cuisines in a city. We shall also use it to measure the diversity of nationalities present in any given country.
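To make the formula concrete, here is a minimal sketch of the entropy calculation from a list of category counts. The function name `shannon_entropy` is illustrative. The Austrian figures are the seven citizenship groups listed in the Data Sets section, so the value printed for them covers only those groups and is not the full diversity index for the country.

```python
import math


def shannon_entropy(counts):
    """Shannon entropy (natural log) of a list of non-negative category counts."""
    total = sum(counts)
    # Categories with zero count contribute nothing and are skipped to avoid log(0).
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


# An evenly spread population reaches the maximum, ln R (here ln 4, about 1.386).
even = shannon_entropy([25, 25, 25, 25])

# A population dominated by one category gives a value much closer to zero.
dominated = shannon_entropy([97, 1, 1, 1])

# Illustration only: the seven citizenship counts shown for Austria in 2018.
austria_top7 = [7426387, 186841, 120174, 117297, 102270, 95189, 77113]

print(even, dominated, shannon_entropy(austria_top7))
```

The same function applies unchanged to restaurant counts per cuisine category.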

Going forward, we shall refer to the entropy as the entropy diversity index, or simply the *diversity index*.

### Regression Analysis

In this project we wish to investigate the following idea:

> The level of multiculturalism in a country is reflected in the diversity of cuisines in its major cities.

Is this true? How are we to prove or disprove the above statement in a scientific fashion? Let us begin by stating the following null hypothesis:

> The level of multiculturalism in a country is **unrelated** to the diversity of cuisines in its major cities.

Now for each of the 19 countries in our dataset, we shall calculate a diversity index value $E_n$ from the distribution of nationalities in the country. Likewise we shall calculate a diversity index $E_c$ from the distribution of restaurant cuisines in its major city. 

We shall then plot $E_c$ against $E_n$. If the two diversity indices are related, we should see a clear correlation, linear or otherwise, in the plot. More specifically, we perform linear regression on these points to obtain the best-fit line and a p-value for the slope (the non-constant coefficient). If the p-value is below 0.05, we take this as sufficient evidence to reject the null hypothesis. In the next section, we will present our data and the results of this regression.
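The regression step could be sketched as below. Two caveats apply. First, the $E_n$ and $E_c$ values are made up purely for illustration and are not results from this project. Second, to keep the sketch dependent only on the standard library, the p-value is estimated with a permutation test on the correlation coefficient rather than the usual t-test on the slope; a statistics package such as SciPy (`scipy.stats.linregress`) would return the slope, intercept and p-value directly. The function names are hypothetical.

```python
import math
import random


def linear_fit(x, y):
    """Ordinary least squares fit of y on x: returns (slope, intercept, r)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)  # Pearson correlation coefficient
    return slope, intercept, r


def permutation_p_value(x, y, n_permutations=10_000, seed=0):
    """Two-sided p-value for 'no relationship', estimated by shuffling y."""
    rng = random.Random(seed)
    observed = abs(linear_fit(x, y)[2])
    shuffled = list(y)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(linear_fit(x, shuffled)[2]) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


# Made-up diversity indices for six hypothetical countries and cities.
E_n = [0.2, 0.5, 0.8, 1.1, 1.4, 1.7]
E_c = [2.1, 2.3, 2.2, 2.6, 2.8, 2.9]

slope, intercept, r = linear_fit(E_n, E_c)
p_value = permutation_p_value(E_n, E_c)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, r={r:.3f}, p={p_value:.4f}")
```

A positive slope together with a p-value below 0.05 would be the pattern expected if the hypothesis holds.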


### Clustering

In addition to the above regression analysis, we shall also examine in more detail the cuisines available in our 19 cities. We will do this by performing clustering on these cities to investigate which cities share similar cuisines. Having grouped cities into clusters, we shall then examine the demographics in the countries in each cluster. If the demographics were to differ significantly between clusters, then this would be evidence that there is some correlation between the available restaurant cuisines and the makeup of the population.

This analysis will be more qualitative in nature (though one could perform ANOVA hypothesis testing here if one were so inclined). 

Hierarchical clustering was ultimately the method used here for clustering. More details will be provided in the results section.
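For readers unfamiliar with the technique, the following is a minimal sketch of agglomerative (bottom-up) hierarchical clustering. The linkage rule and distance metric used in the project are not stated here, so average linkage with Euclidean distance is assumed. The city names and cuisine proportions are hypothetical, and the function names are illustrative.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def agglomerative(profiles, n_clusters):
    """Merge the two closest clusters (average linkage) until n_clusters remain."""
    clusters = [[name] for name in profiles]

    def linkage(c1, c2):
        # Average pairwise distance between the members of two clusters.
        dists = [euclidean(profiles[a], profiles[b]) for a in c1 for b in c2]
        return sum(dists) / len(dists)

    while len(clusters) > n_clusters:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters


# Hypothetical cities described by the share of restaurants in three cuisine types.
profiles = {
    "city_a": [0.50, 0.30, 0.20],
    "city_b": [0.45, 0.35, 0.20],
    "city_c": [0.10, 0.10, 0.80],
    "city_d": [0.15, 0.05, 0.80],
}

clusters = agglomerative(profiles, n_clusters=2)
print(sorted(sorted(c) for c in clusters))
```

Using proportions rather than raw counts in this sketch keeps large cities from dominating the distance calculation; whether the project normalised its data this way is not stated.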


[1] https://www.un.org/en/development/desa/population/migration/data/estimates2/estimatesgraphs.asp?0g0

[2] https://foursquare.com/


[3] https://developer.foursquare.com/docs/build-with-foursquare/categories/

[4] https://en.wikipedia.org/wiki/Entropy

[5] https://en.wikipedia.org/wiki/Diversity_index

[6] https://developer.foursquare.com/developer/

[7] https://appsso.eurostat.ec.europa.eu/nui/show.do?dataset=migr_pop1ctz&lang=en