# Where to go when you want to eat Chinese food in Stockholm, Sweden?

## Introduction

With globalization, a large number of products from other countries have become alternatives to domestic products. Some consumers prefer domestic products in order to support national industry; others like to try something from abroad. At the same time, suppliers want to develop overseas markets, which bring more opportunities and wealth. Many foreign restaurants can be seen on the streets of Stockholm, and many kinds of imported food can be found in local or Asian supermarkets in Sweden.

According to a questionnaire survey carried out by Jie Chen under the supervision of Ola Feurst at Gotland University (shorturl.at/ekrDL), Swedes like Chinese food because they consider it healthy, with natural and safe ingredients. Respondents also said that delicious Chinese food puts them in a good mood and that Chinese food is cheaper than a Swedish meal. Compared to fast food, a Chinese buffet offers a wider choice of dishes.

Because cooking Chinese food is a complicated process, Swedes who do not want to spend time cooking rarely prepare Chinese food at home. The easiest option is to go to a Chinese restaurant.

The aim of this project is to use Foursquare location data and regional clustering of venue information to determine the best Chinese restaurant in Stockholm County. 

This project is aimed at Swedes who like Chinese food, citizens and residents of Sweden who were born in or have ancestry from Asian countries, tourists from Asian countries, and Chinese food suppliers.

## Data 

The data used in this project is combined from multiple sources. The list of cities in Stockholm County and their geographical locations are taken as a CSV file from the Sweden Cities Database (shorturl.at/lzHLS). The venue data pertaining to Chinese restaurants is obtained from the Foursquare API, called through the `requests` library in Python.

## Methodology

1. Data acquisition
2. Data cleaning
3. Data exploration

## Results

In [1]:
# Libraries are imported
import pandas as pd
import numpy as np
from pandas import DataFrame
import json
import requests
from io import StringIO
!pip install folium
import folium

print('Libraries Imported')

Libraries Imported


In [2]:
CLIENT_ID = 'YOUR_CLIENT_ID' # Foursquare ID
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # Foursquare Secret
ACCESS_TOKEN = 'YOUR_ACCESS_TOKEN' # Foursquare Access Token
VERSION = '20180605' # Foursquare API version

print('My credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

My credentials:
CLIENT_ID: YOUR_CLIENT_ID
CLIENT_SECRET: YOUR_CLIENT_SECRET


### 1. Data Acquisition

In [3]:
# Getting the data of cities in Stockholm County
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.76 Safari/537.36'}
url1 = 'https://simplemaps.com/static/data/country-cities/se/se.csv'
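resp = requests.get(url1, headers=headers)
resp.encoding = 'utf-8'  # the CSV is UTF-8; otherwise Swedish characters (ä, ö, å) are decoded incorrectly
req = resp.text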

In [4]:
# Reading the data into a Pandas dataframe
df = pd.read_csv(StringIO(req), sep=",")
df.head()

| | city | lat | lng | country | iso2 | admin_name | capital | population | population_proper |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Stockholm | 59.3294 | 18.0686 | Sweden | SE | Stockholm | primary | 972647.0 | 972647.0 |
| 1 | Gothenburg | 57.6717 | 11.981 | Sweden | SE | Västra Götaland | admin | 600473.0 | 600473.0 |
| 2 | Malmö | 55.5932 | 13.0214 | Sweden | SE | Skåne | admin | 321845.0 | 321845.0 |
| 3 | Uppsala | 59.8498 | 17.6389 | Sweden | SE | Uppsala | admin | 164535.0 | 164535.0 |
| 4 | Uppsala | 59.8601 | 17.64 | Sweden | SE | Stockholm | | 133117.0 | 127734.0 |
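
Swedish place names contain characters such as ä, ö and å, so the text encoding used when reading the CSV matters. The following is a minimal sketch, not part of the original notebook, of what happens when UTF-8 bytes are decoded with the wrong codec (Latin-1 is assumed here as the wrong guess) and how such text can be repaired:

```python
# UTF-8 bytes for a Swedish city name
raw = 'Malmö'.encode('utf-8')

garbled = raw.decode('latin-1')  # wrong codec: each byte of 'ö' becomes its own character
correct = raw.decode('utf-8')    # right codec

print(garbled)
print(correct)

# Text that was garbled this way can be repaired by reversing the wrong step
repaired = garbled.encode('latin-1').decode('utf-8')
print(repaired)
```

The repair only works when every garbled character survives as a valid Latin-1 character, so decoding correctly at download time is the more reliable option.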


### 2. Data Cleaning

In [5]:
df_dropna = df.drop(columns=['country', 'iso2', 'capital','population','population_proper'])
df_dropna.rename(columns={'lat': 'latitude','lng': 'longitude','admin_name': 'county'}, inplace=True)
df = df_dropna
df

| | city | latitude | longitude | county |
|---|---|---|---|---|
| 0 | Stockholm | 59.3294 | 18.0686 | Stockholm |
| 1 | Gothenburg | 57.6717 | 11.9810 | Västra Götaland |
| 2 | Malmö | 55.5932 | 13.0214 | Skåne |
| 3 | Uppsala | 59.8498 | 17.6389 | Uppsala |
| 4 | Uppsala | 59.8601 | 17.6400 | Stockholm |
| ... | ... | ... | ... | ... |
| 293 | Jokkmokk | 66.6050 | 19.8329 | Norrbotten |
| 294 | Sorsele | 65.5324 | 17.5431 | Västerbotten |
| 295 | Sölvesborg | 56.0500 | 14.5500 | Blekinge |
| 296 | Arjeplog | 66.0486 | 17.8850 | Norrbotten |


The dataset contains two entries named 'Uppsala': one in Uppsala county and one in Stockholm county.
In this project, the data from the Sweden Cities Database is used as is, without alteration.
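
Repeated city names like this can be listed with pandas before deciding how to treat them. The sketch below uses a small hand-made frame (`toy`, a hypothetical stand-in for `df`) with the same `city` and `county` columns:

```python
import pandas as pd

# Toy rows mirroring the 'city' and 'county' columns used in this notebook
toy = pd.DataFrame({
    'city': ['Stockholm', 'Uppsala', 'Uppsala', 'Solna'],
    'county': ['Stockholm', 'Uppsala', 'Stockholm', 'Stockholm'],
})

# keep=False marks every row whose city name occurs more than once
dupes = toy[toy.duplicated(subset='city', keep=False)]
print(dupes)
```

Running the same check on the real `df` would show whether any other city names appear under more than one county.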

In [6]:
df_stockholm = df.loc[(df['county'] == 'Stockholm')]
df = df_stockholm
df.head()

| | city | latitude | longitude | county |
|---|---|---|---|---|
| 0 | Stockholm | 59.3294 | 18.0686 | Stockholm |
| 4 | Uppsala | 59.8601 | 17.64 | Stockholm |
| 64 | Sundbyberg | 59.3667 | 17.9667 | Stockholm |
| 65 | Solna | 59.3667 | 18.0167 | Stockholm |
| 69 | Lidingö | 59.3667 | 18.1333 | Stockholm |


In [7]:
df.shape

(27, 4)

### 3. Data Exploration

In [8]:
# List of cities in Stockholm county
cities = df['city'].unique().tolist()

In [9]:
# Obtaining average latitude and longitude of Stockholm
lat_stockholm = df['latitude'].mean()
long_stockholm = df['longitude'].mean()
print('The geographical coordinates of Stockholm county are {}, {}'.format(lat_stockholm, long_stockholm))

The geographical coordinates of Stockholm county are 59.37138518518518, 17.993388888888887


In [10]:
# Showing cities in Stockholm county with color
city_color = {}
for city in cities:
    city_color[city]= '#%02X%02X%02X' % tuple(np.random.choice(range(256), size=3)) #Random color

In [11]:
map_stockholm = folium.Map(location=[lat_stockholm, long_stockholm], zoom_start=10.5)

# Adding markers to map
for lat, long, city, county in zip(
    df['latitude'], 
    df['longitude'],
    df['city'], 
    df['county']):
    label_text = city + ' - ' + county
    label = folium.Popup(label_text)
    folium.CircleMarker(
        [lat, long],
        radius=5,
        popup=label,
        color=city_color[city],
        fill_color=city_color[city],
        fill_opacity=0.8).add_to(map_stockholm)  
    
map_stockholm

The map above shows the cities in Stockholm county.
Where is the best Chinese restaurant located?

In [12]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    LIMIT = 100 # maximum number of venues returned per Foursquare API request
    # radius (in meters) comes from the function argument; the default is 500
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # Creating the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
        
        # making the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # returning only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'],
            v['venue']['id'],
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['City', 
                  'City Latitude', 
                  'City Longitude', 
                  'Venue',
                  'Venue ID',
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
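
The request URL in `getNearbyVenues` is assembled by string formatting. An alternative is to let the standard library encode the query string. The sketch below is an illustration only: `build_explore_url` is a hypothetical helper that is not part of the notebook, and the credential values are dummies.

```python
from urllib.parse import urlencode

def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=100):
    """Hypothetical helper: build the explore URL with encoded query parameters."""
    base = 'https://api.foursquare.com/v2/venues/explore'
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),  # latitude,longitude of the search center
        'radius': radius,                # search radius in meters
        'limit': limit,                  # maximum number of venues to return
    }
    return base + '?' + urlencode(params)

# Dummy credentials, Stockholm coordinates from the city table
url = build_explore_url('ID', 'SECRET', '20180605', 59.3294, 18.0686)
print(url)
```

Keeping the parameters in a dictionary makes it easier to change the radius or limit per call without editing the URL template.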

In [13]:
# Getting venues for all cities in Stockholm county
stockholm_venues = getNearbyVenues(
    names=df['city'],
    latitudes=df['latitude'],
    longitudes=df['longitude'])

Stockholm
Uppsala
Sundbyberg
Solna
Lidingö
Jakobsberg
Sollentuna
Djursholm
Nacka
Täby
Tumba
Södertälje
Tyresö
Huddinge
Norrtälje
Nynäshamn
Märsta
Ekerö
Rönninge
Upplands Väsby
Åkersberga
Gustavsberg
Kungsängen
Vallentuna
Västerhaninge
Nykvarn
Vaxholm


Below are the venues located in all the cities in Stockholm county.

In [14]:
stockholm_venues.head()

| | City | City Latitude | City Longitude | Venue | Venue ID | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|---|
| 0 | Stockholm | 59.3294 | 18.0686 | At Six | 58d36bd98ab03f3dceb3d9fe | 59.331057 | 18.06693 | Hotel |
| 1 | Stockholm | 59.3294 | 18.0686 | Bakfickan | 4adcdaf0f964a520485b21e3 | 59.330194 | 18.070884 | Scandinavian Restaurant |
| 2 | Stockholm | 59.3294 | 18.0686 | Kungliga Operan | 4adcdaf2f964a520ff5b21e3 | 59.329498 | 18.069324 | Opera House |
| 3 | Stockholm | 59.3294 | 18.0686 | Bastard Burgers | 5cb1cd6375eee4002c92ed6d | 59.331553 | 18.066903 | Burger Joint |
| 4 | Stockholm | 59.3294 | 18.0686 | Bengans Skivbutik | 4bc9a629cc8cd13af7e6bbcf | 59.330098 | 18.065146 | Record Shop |
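
Each venue is returned for a search radius of 500 m around the city coordinates. As a sanity check, the distance between a city center and a venue can be estimated with the haversine formula. This is a minimal sketch; `haversine_m` is a hypothetical helper, and the mean Earth radius of 6,371 km is an approximation.

```python
from math import radians, sin, cos, asin, sqrt

def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in meters between two points given in degrees."""
    earth_radius_m = 6371000.0  # mean Earth radius (approximation)
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lng2 - lng1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * earth_radius_m * asin(sqrt(a))

# Stockholm city coordinates and the first venue (At Six) from the table
d = haversine_m(59.3294, 18.0686, 59.331057, 18.06693)
print(round(d), 'meters')
```

A distance well below 500 m for this venue is consistent with the radius used in the search.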


In [15]:
# Checking how many venues per city
stockholm_venues.groupby('City').count()

| City | City Latitude | City Longitude | Venue | Venue ID | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| Djursholm | 8 | 8 | 8 | 8 | 8 | 8 | 8 |
| Gustavsberg | 4 | 4 | 4 | 4 | 4 | 4 | 4 |
| Huddinge | 17 | 17 | 17 | 17 | 17 | 17 | 17 |
| Jakobsberg | 5 | 5 | 5 | 5 | 5 | 5 | 5 |
| Kungsängen | 6 | 6 | 6 | 6 | 6 | 6 | 6 |
| Lidingö | 14 | 14 | 14 | 14 | 14 | 14 | 14 |
| Märsta | 8 | 8 | 8 | 8 | 8 | 8 | 8 |
| Nacka | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| Norrtälje | 5 | 5 | 5 | 5 | 5 | 5 | 5 |
| Nykvarn | 1 | 1 | 1 | 1 | 1 | 1 | 1 |


In [16]:
# Checking how many unique venue categories there are
print('There are {} unique venue categories.'.format(len(stockholm_venues['Venue Category'].unique())))

There are 109 unique venue categories.


In [17]:
print("The Venue Categories are", stockholm_venues['Venue Category'].unique())

The Venue Categories are ['Hotel' 'Scandinavian Restaurant' 'Opera House' 'Burger Joint'
 'Record Shop' 'Park' 'Salad Place' 'Gift Shop' 'Plaza' 'Clothing Store'
 'Department Store' 'Hotel Bar' 'Furniture / Home Store'
 'Outdoor Sculpture' 'Gym' 'Bakery' 'Seafood Restaurant'
 'Gym / Fitness Center' 'Jazz Club' 'Cocktail Bar' 'Nightclub'
 'Monument / Landmark' 'Mexican Restaurant' 'Italian Restaurant'
 'Camera Store' 'Lounge' 'Historic Site' 'Café' 'Liquor Store' 'Museum'
 'Juice Bar' 'Bookstore' 'Pub' 'Theater' 'Coffee Shop' 'Restaurant'
 'Creperie' 'American Restaurant' 'Sushi Restaurant' 'Beer Bar'
 'Middle Eastern Restaurant' 'Movie Theater' 'Church' 'Grocery Store'
 'Bar' 'French Restaurant' 'Steakhouse' 'Indian Restaurant'
 'Electronics Store' 'Fish Market' 'Gourmet Shop' 'Irish Pub'
 'Japanese Restaurant' 'Greek Restaurant' 'Concert Hall'
 'Vegetarian / Vegan Restaurant' 'Garden' 'Tapas Restaurant'
 'Argentinian Restaurant' 'Chinese Restaurant' 'Taco Place'
 'Thai Restaurant' 'As

In [18]:
# Checking if there are any Chinese Restaurants in the venues
"Chinese Restaurant" in stockholm_venues['Venue Category'].unique()

True

In [19]:
# Analyzing each city
# one hot encoding
to_onehot = pd.get_dummies(stockholm_venues[['Venue Category']], prefix="", prefix_sep="")

# add city column back to dataframe
to_onehot['City'] = stockholm_venues['City'] 

# move city column to the first column
fixed_columns = [to_onehot.columns[-1]] + list(to_onehot.columns[:-1])
to_onehot = to_onehot[fixed_columns]

print(to_onehot.shape)
to_onehot.head()

(323, 110)


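| | City | American Restaurant | Antique Shop | Argentinian Restaurant | Asian Restaurant | Athletics & Sports | Auto Workshop | Bakery | Bar | Bed & Breakfast | ... | Supermarket | Sushi Restaurant | Taco Place | Tapas Restaurant | Tennis Stadium | Thai Restaurant | Theater | Thrift / Vintage Store | Train Station | Vegetarian / Vegan Restaurant |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Stockholm | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | Stockholm | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | Stockholm | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | Stockholm | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | Stockholm | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |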


In [20]:
# Grouping rows by city and taking the mean of the frequency of occurrence of each category
to_grouped = to_onehot.groupby(["City"]).mean().reset_index()

print(to_grouped.shape)
to_grouped.head()

(26, 110)


| | City | American Restaurant | Antique Shop | Argentinian Restaurant | Asian Restaurant | Athletics & Sports | Auto Workshop | Bakery | Bar | Bed & Breakfast | ... | Supermarket | Sushi Restaurant | Taco Place | Tapas Restaurant | Tennis Stadium | Thai Restaurant | Theater | Thrift / Vintage Store | Train Station | Vegetarian / Vegan Restaurant |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Djursholm | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.125 | 0.0 | 0.0 | ... | 0.125 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | Gustavsberg | 0.0 | 0.0 | 0.0 | 0.25 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | Huddinge | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.058824 | 0.0 | 0.0 | ... | 0.0 | 0.058824 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | Jakobsberg | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | Kungsängen | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
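
Each value in the grouped table is the share of a city's venues that fall in a given category: one-hot encoding turns every venue into a row of 0s and 1s, and the per-city mean of those rows gives the category frequencies. A minimal sketch on made-up data (the cities `A` and `B` and their venues are hypothetical):

```python
import pandas as pd

# Made-up venues: city A has four venues, city B has two
toy = pd.DataFrame({
    'City': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Hotel', 'Chinese Restaurant', 'Park', 'Hotel', 'Park', 'Café'],
})

# One column per category, 1 where the venue belongs to it
onehot = pd.get_dummies(toy['Venue Category'], dtype=int)
onehot.insert(0, 'City', toy['City'])

# The mean of a 0/1 column per city is the share of that city's venues in the category
freq = onehot.groupby('City').mean()
print(freq)
```

In city A, one of four venues is a Chinese restaurant, so its frequency is 0.25; each row of `freq` sums to 1.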


In [21]:
cr = to_grouped[["City","Chinese Restaurant"]]
cr #.head() 

| | City | Chinese Restaurant |
|---|---|---|
| 0 | Djursholm | 0.0 |
| 1 | Gustavsberg | 0.0 |
| 2 | Huddinge | 0.0 |
| 3 | Jakobsberg | 0.0 |
| 4 | Kungsängen | 0.0 |
| 5 | Lidingö | 0.0 |
| 6 | Märsta | 0.0 |
| 7 | Nacka | 0.0 |
| 8 | Norrtälje | 0.0 |
| 9 | Nykvarn | 0.0 |


According to the full table (only the first rows are shown above), Chinese restaurants are found only in Södertälje and Uppsala.

In [22]:
cr = stockholm_venues[(stockholm_venues['Venue Category']=="Chinese Restaurant")]
cr

| | City | City Latitude | City Longitude | Venue | Venue ID | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|---|
| 112 | Uppsala | 59.8601 | 17.64 | China River | 4c7123a0b5a5236a8e2c5252 | 59.858641 | 17.643428 | Chinese Restaurant |
| 133 | Uppsala | 59.8601 | 17.64 | Golden China | 4d00d0deffcea1435a7a2f91 | 59.856489 | 17.642677 | Chinese Restaurant |
| 139 | Uppsala | 59.8601 | 17.64 | China Garden | 4bec0370a9900f478d6e1840 | 59.85874 | 17.64327 | Chinese Restaurant |
| 232 | Södertälje | 59.2 | 17.6167 | Asian Roxy Södertälje | 4c34be3a3ffc9521e6f890f5 | 59.197415 | 17.623835 | Chinese Restaurant |


In [23]:
venues_ids = cr['Venue ID'].values.tolist()

likes = []
for venue_id in venues_ids:
    url = 'https://api.foursquare.com/v2/venues/{}?client_id={}&client_secret={}&v={}'.format(venue_id, CLIENT_ID, CLIENT_SECRET, VERSION)
    result = requests.get(url).json()
    try:
        venues_likes = result['response']['venue']['likes']
        likes.append(venues_likes)
    except KeyError:
        # the response has no 'venue' entry for this ID
        print('No data available for id=', venue_id)
likes

When this notebook was run, the venue-details response did not contain a 'venue' entry (KeyError: 'venue'), so the like counts are entered manually below.
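
A failed venue-details call can be handled without an exception by checking for the nested keys before reading them. The sketch below is an illustration only: `extract_like_count` is a hypothetical helper, the two payloads are mock dictionaries rather than real API output, and the like count is assumed to sit under `response`, `venue`, `likes`, `count`.

```python
def extract_like_count(payload):
    """Hypothetical helper: return the like count, or None when the payload lacks it."""
    venue = payload.get('response', {}).get('venue')
    if venue is None:
        return None  # no venue details in this response
    # Assumed payload shape: 'likes' is a dict with a 'count' field
    return venue.get('likes', {}).get('count')

# Mock payloads for illustration
ok = {'response': {'venue': {'likes': {'count': 8}}}}
failed = {'response': {}}

print(extract_like_count(ok))
print(extract_like_count(failed))
```

Returning `None` for a missing entry lets the caller decide whether to skip the venue, retry, or fill the value in by hand.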

In [24]:
venue_like = [6, 8, 5, 4]
cr = cr.copy()  # work on an explicit copy to avoid pandas' SettingWithCopyWarning
cr['Venue Likes'] = venue_like
cr

| | City | City Latitude | City Longitude | Venue | Venue ID | Venue Latitude | Venue Longitude | Venue Category | Venue Likes |
|---|---|---|---|---|---|---|---|---|---|
| 112 | Uppsala | 59.8601 | 17.64 | China River | 4c7123a0b5a5236a8e2c5252 | 59.858641 | 17.643428 | Chinese Restaurant | 6 |
| 133 | Uppsala | 59.8601 | 17.64 | Golden China | 4d00d0deffcea1435a7a2f91 | 59.856489 | 17.642677 | Chinese Restaurant | 8 |
| 139 | Uppsala | 59.8601 | 17.64 | China Garden | 4bec0370a9900f478d6e1840 | 59.85874 | 17.64327 | Chinese Restaurant | 5 |
| 232 | Södertälje | 59.2 | 17.6167 | Asian Roxy Södertälje | 4c34be3a3ffc9521e6f890f5 | 59.197415 | 17.623835 | Chinese Restaurant | 4 |
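
One way to read this table is to rank the restaurants by like count. The sketch below assumes that the number of likes is an acceptable proxy for "best", which is an assumption rather than a criterion stated in the notebook; the rows are copied by hand from the table.

```python
# (city, venue, likes) copied from the table of Chinese restaurants
rows = [
    ('Uppsala', 'China River', 6),
    ('Uppsala', 'Golden China', 8),
    ('Uppsala', 'China Garden', 5),
    ('Södertälje', 'Asian Roxy Södertälje', 4),
]

# Sort by like count, highest first
ranked = sorted(rows, key=lambda row: row[2], reverse=True)
best_city, best_venue, best_likes = ranked[0]

for city, venue, likes in ranked:
    print('{:<22} {:<12} {}'.format(venue, city, likes))
```

With so few likes per venue, the ranking is sensitive to small changes, so it is better treated as a rough indication than as a firm result.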
