# The Melting Pot of Berlin

### Introduction

Have you ever asked yourself where currywurst was invented? The invention of currywurst is attributed to a woman called Herta Heuwer in Berlin in 1949, after she obtained ketchup and curry powder from British soldiers in Germany. She mixed these ingredients with other spices and poured it over grilled pork sausage. Heuwer started selling the cheap but filling snack at a street stand in the Charlottenburg district of Berlin, where it became popular with construction workers rebuilding the devastated city. Although, currywurst is the most popular and well-known food of Berlin, the city still can offer a lot more. Especially when it comes to international food, since it's the second home for more than million people with migration background. Therfore, it's somehow interesting to learn more about food in Berlin and how migration affects local food and traditional restaurants. 
In the following notebook, we will analyze the impact of migration on local and traditional restaurants in Berlin based on the distribution of local restaurants around the city.

### Data

Therefor, we will use official data from the Statistical Office of Berlin-Brandenburg about registered residents with migration background which was published by the Federal State of Berlin in 2018. Moreover, we will be using data about the boroughs of Berlin, also published by the Federal State, in combination with Foursquare location data in order to learn more about local restaurants in Berlin. So let's get started!

### Code

Download all dependencies that we will need.

In [1]:
import numpy as np # Library to handle data in a vectorized manner

import pandas as pd # Library for data analsysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # Library to handle JSON files

#!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # Library to handle requests
from pandas.io.json import json_normalize # Tranform JSON file into a pandas dataframe

import bs4 as bs
from bs4 import BeautifulSoup as soup

from urllib.request import urlopen as uReq

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# Import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes 
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


Download data about registered residents with migration background from csv-file.

In [2]:
br_data = pd.read_csv("https://www.statistik-berlin-brandenburg.de/opendata/EWRMIGRA201812H_Matrix.csv", sep=";")
br_data.head()

Unnamed: 0,ZEIT,RAUMID,BEZ,PGR,BZR,PLR,STADTRAUM,MH_E,HK_EU15,HK_EU28,HK_Polen,HK_EheJug,HK_EheSU,HK_Turk,HK_Arab,HK_Sonst,HK_NZOrd
0,201812,4051655,4,5,16,55,1,3162,539,1015,273,134,329,289,379,810,206
1,201812,4051656,4,5,16,56,1,2505,596,1059,211,75,186,169,186,711,119
2,201812,4061757,4,6,17,57,2,24,3,3,0,0,0,0,9,12,0
3,201812,5010101,5,1,1,1,2,2874,201,802,440,168,283,393,537,478,213
4,201812,5010102,5,1,1,2,2,2230,146,755,341,207,210,320,248,386,104


Let's take a look at the structure of the dataframe.

In [3]:
print("The dataframe has {} rows and {} columns.".format(br_data.shape[0], br_data.shape[1]))

The dataframe has 447 rows and 17 columns.


Let's take a look at the data types of the columns.

In [4]:
br_data.dtypes

ZEIT         int64
RAUMID       int64
BEZ          int64
PGR          int64
BZR          int64
PLR          int64
STADTRAUM    int64
MH_E         int64
HK_EU15      int64
HK_EU28      int64
HK_Polen     int64
HK_EheJug    int64
HK_EheSU     int64
HK_Turk      int64
HK_Arab      int64
HK_Sonst     int64
HK_NZOrd     int64
dtype: object

As you can see all columns contain data of type int.

Let's get a statistical summary of the data.

In [5]:
br_data.describe()

Unnamed: 0,ZEIT,RAUMID,BEZ,PGR,BZR,PLR,STADTRAUM,MH_E,HK_EU15,HK_EU28,HK_Polen,HK_EheJug,HK_EheSU,HK_Turk,HK_Arab,HK_Sonst,HK_NZOrd
count,447.0,447.0,447.0,447.0,447.0,447.0,447.0,447.0,447.0,447.0,447.0,447.0,447.0,447.0,447.0,447.0,447.0
mean,201812.0,6280031.0,6.237136,4.194631,9.344519,13.85906,1.684564,2856.756152,410.485459,922.449664,259.704698,145.449664,289.850112,405.991051,327.369128,604.715884,160.930649
std,0.0,3353580.0,3.331755,5.521679,9.270347,13.279854,0.46521,2494.304461,503.505032,817.719453,217.986161,157.272896,291.381059,672.526779,359.229164,519.020047,201.891686
min,201812.0,1011101.0,1.0,1.0,1.0,1.0,1.0,12.0,3.0,3.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,201812.0,4020206.0,4.0,2.0,3.0,3.0,1.0,1112.5,106.0,353.0,109.0,42.0,107.5,49.0,99.0,236.0,35.0
50%,201812.0,6020411.0,6.0,3.0,7.0,9.0,2.0,2147.0,212.0,712.0,209.0,97.0,213.0,152.0,211.0,473.0,93.0
75%,201812.0,9030902.0,9.0,4.0,12.0,23.0,2.0,4062.0,504.0,1283.0,341.0,191.5,363.5,413.0,440.5,827.0,199.5
max,201812.0,12304310.0,12.0,30.0,45.0,57.0,2.0,17200.0,3451.0,5152.0,1252.0,1232.0,2243.0,5435.0,2541.0,2749.0,1441.0


As you can see for some columns it's not that useful to determine descriptive statistics. So let's start cleaning the data by dropping all columns that we won't use in our analysis.

In [6]:
br_data.drop(["ZEIT","RAUMID", "PGR", "BZR", "PLR", "STADTRAUM", "HK_EU15", "HK_EU28", "HK_Polen", "HK_EheJug", "HK_EheSU", "HK_Turk","HK_Arab", "HK_Sonst", "HK_NZOrd"], axis=1, inplace=True)

# BEZ stands for the Admin. Nr. of the borough and MH_E for the total number of registered people with migration background. So let's rename them.
br_data.rename(columns={"BEZ": "Ad. Nr.", "MH_E":"People with migration background"}, inplace=True)

# Sort values by Ad. Nr. in ascending order 
br_data.sort_values("Ad. Nr.", inplace=True)

# Group data by Ad. Nr. and count the sum of people with migration background in each borough
br_data = br_data.groupby("Ad. Nr.").sum()
br_data

Unnamed: 0_level_0,People with migration background
Ad. Nr.,Unnamed: 1_level_1
1,204267
2,126654
3,88243
4,142723
5,88611
6,84995
7,131439
8,152870
9,41865
10,52508


Now, we need the official names of the boroughs. So let's download them by importing the following csv-file.

In [7]:
brgh = pd.read_csv("https://tsb-opendata.s3.eu-central-1.amazonaws.com/bezirksgrenzen/bezirksgrenzen.csv")
brgh.head()

Unnamed: 0,gml_id,Gemeinde_name,Gemeinde_schluessel,Land_name,Land_schluessel,Schluessel_gesamt
0,s_wfs_alkis_bezirk.F176__1,Reinickendorf,12,Berlin,11,11000012
1,s_wfs_alkis_bezirk.F176__2,Charlottenburg-Wilmersdorf,4,Berlin,11,11000004
2,s_wfs_alkis_bezirk.F176__3,Treptow-Köpenick,9,Berlin,11,11000009
3,s_wfs_alkis_bezirk.F176__4,Pankow,3,Berlin,11,11000003
4,s_wfs_alkis_bezirk.F176__5,Neukölln,8,Berlin,11,11000008


As you can see, the dataframe needs some cleaning work.

Therefor, rename important columns, sort the data by Ad. Nr. in ascending order and set it as index.

In [8]:
brgh.rename(columns={"Gemeinde_schluessel": "Ad. Nr.", "Gemeinde_name":"Borough"}, inplace=True)
brgh.sort_values("Ad. Nr.", ascending=True, inplace=True)
brgh.reset_index(drop=True, inplace=True)
brgh.set_index("Ad. Nr.", inplace=True)
brgh

Unnamed: 0_level_0,gml_id,Borough,Land_name,Land_schluessel,Schluessel_gesamt
Ad. Nr.,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
1,s_wfs_alkis_bezirk.F176__10,Mitte,Berlin,11,11000001
2,s_wfs_alkis_bezirk.F176__11,Friedrichshain-Kreuzberg,Berlin,11,11000002
3,s_wfs_alkis_bezirk.F176__4,Pankow,Berlin,11,11000003
4,s_wfs_alkis_bezirk.F176__2,Charlottenburg-Wilmersdorf,Berlin,11,11000004
5,s_wfs_alkis_bezirk.F176__8,Spandau,Berlin,11,11000005
6,s_wfs_alkis_bezirk.F176__9,Steglitz-Zehlendorf,Berlin,11,11000006
7,s_wfs_alkis_bezirk.F176__12,Tempelhof-Schöneberg,Berlin,11,11000007
8,s_wfs_alkis_bezirk.F176__5,Neukölln,Berlin,11,11000008
9,s_wfs_alkis_bezirk.F176__3,Treptow-Köpenick,Berlin,11,11000009
10,s_wfs_alkis_bezirk.F176__7,Marzahn-Hellersdorf,Berlin,11,11000010


Now, we will merge both dataframes and set Ad. Nr. as index.

In [9]:
brgh = brgh[["Borough"]]
brgh["People with migration background"] = br_data["People with migration background"]
br_data = brgh
br_data

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  from ipykernel import kernelapp as app


Unnamed: 0_level_0,Borough,People with migration background
Ad. Nr.,Unnamed: 1_level_1,Unnamed: 2_level_1
1,Mitte,204267
2,Friedrichshain-Kreuzberg,126654
3,Pankow,88243
4,Charlottenburg-Wilmersdorf,142723
5,Spandau,88611
6,Steglitz-Zehlendorf,84995
7,Tempelhof-Schöneberg,131439
8,Neukölln,152870
9,Treptow-Köpenick,41865
10,Marzahn-Hellersdorf,52508


Now, let's get geolocation data for each borough.

In [10]:
# Import dependencies
from geopy.extra.rate_limiter import RateLimiter 

locator = Nominatim(user_agent="myGeocoder")
location = locator.geocode("Berlin, DE")

# Conveneint function to delay between geocoding calls
geocode = RateLimiter(locator.geocode, min_delay_seconds=1)

# Create location column
br_data['Location'] = br_data['Borough'].apply(geocode)

# Create longitude, laatitude and altitude from location column (returns tuple)
br_data['Point'] = br_data['Location'].apply(lambda loc: tuple(loc.point) if loc else None)

# Split point column into latitude, longitude and altitude columns
br_data[['Latitude', 'Longitude', 'Altitude']] = pd.DataFrame(br_data['Point'].tolist(), index=br_data.index)
br_data.head()

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy


Unnamed: 0_level_0,Borough,People with migration background,Location,Point,Latitude,Longitude,Altitude
Ad. Nr.,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
1,Mitte,204267,"(Mitte, Berlin, Deutschland, (52.5176896, 13.4...","(52.5176896, 13.4023757, 0.0)",52.51769,13.402376,0.0
2,Friedrichshain-Kreuzberg,126654,"(Friedrichshain-Kreuzberg, Berlin, Deutschland...","(52.5153063, 13.4616117, 0.0)",52.515306,13.461612,0.0
3,Pankow,88243,"(Pankow, Berlin, Deutschland, (52.597636699999...","(52.597636699999995, 13.436373975411648, 0.0)",52.597637,13.436374,0.0
4,Charlottenburg-Wilmersdorf,142723,"(Charlottenburg-Wilmersdorf, Deutschland, (52....","(52.5078558, 13.2639518, 0.0)",52.507856,13.263952,0.0
5,Spandau,88611,"(Spandau, Deutschland, (52.535788, 13.1977924))","(52.535788, 13.1977924, 0.0)",52.535788,13.197792,0.0


Let's drop unnecessary columns like Location, Point and Altitude.

In [11]:
br_data.drop(["Location", "Point", "Altitude"], axis=1, inplace=True)
br_data

Unnamed: 0_level_0,Borough,People with migration background,Latitude,Longitude
Ad. Nr.,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
1,Mitte,204267,52.51769,13.402376
2,Friedrichshain-Kreuzberg,126654,52.515306,13.461612
3,Pankow,88243,52.597637,13.436374
4,Charlottenburg-Wilmersdorf,142723,52.507856,13.263952
5,Spandau,88611,52.535788,13.197792
6,Steglitz-Zehlendorf,84995,52.429205,13.229974
7,Tempelhof-Schöneberg,131439,52.440603,13.373703
8,Neukölln,152870,52.48115,13.43535
9,Treptow-Köpenick,41865,52.417893,13.600185
10,Marzahn-Hellersdorf,52508,52.522523,13.587663


As you may have noticed, the latitude and longitude of Lichtenberg are obviously wrong. So let's correct them.

In [12]:
location = "Lichtenberg, Berlin, DE"

geolocator = Nominatim(user_agent = "br_explorer")
location = geolocator.geocode(location)
latitude = location.latitude
longitude = location.longitude
print("The geograpical coordinates of Lichtenberg are {}, {}.".format(latitude, longitude))

The geograpical coordinates of Lichtenberg are 52.5321606, 13.5118927.


Now, we have the right coordinates. Let's add them to our data.

In [13]:
br_data.loc[11,"Latitude"] = latitude
br_data.loc[11,"Longitude"] = longitude
br_data

Unnamed: 0_level_0,Borough,People with migration background,Latitude,Longitude
Ad. Nr.,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
1,Mitte,204267,52.51769,13.402376
2,Friedrichshain-Kreuzberg,126654,52.515306,13.461612
3,Pankow,88243,52.597637,13.436374
4,Charlottenburg-Wilmersdorf,142723,52.507856,13.263952
5,Spandau,88611,52.535788,13.197792
6,Steglitz-Zehlendorf,84995,52.429205,13.229974
7,Tempelhof-Schöneberg,131439,52.440603,13.373703
8,Neukölln,152870,52.48115,13.43535
9,Treptow-Köpenick,41865,52.417893,13.600185
10,Marzahn-Hellersdorf,52508,52.522523,13.587663


And there you have it, a nice dataframe with all the data we need.

In the **second step**, we will use Foursquare location data to learn more about local restaurants in Berlin.

Let's define Foursquare credentials and version.

In [14]:
CLIENT_ID = "HQHI11JH4DAVEGSIYZHDM4BA5DFETP3MQBXKJOSV4X3YX4RN" # Foursquare ID
CLIENT_SECRET = "ITTCVGBQRSJVDQQDGPCJ3SDUG5FV5B1MHBGYRAYQJLH2DT0V" # Foursquare Secret
VERSION = "20180605" # Foursquare API version

print("Your credentails:")
print("CLIENT_ID: " + CLIENT_ID)
print("CLIENT_SECRET:" + CLIENT_SECRET)

Your credentails:
CLIENT_ID: HQHI11JH4DAVEGSIYZHDM4BA5DFETP3MQBXKJOSV4X3YX4RN
CLIENT_SECRET:ITTCVGBQRSJVDQQDGPCJ3SDUG5FV5B1MHBGYRAYQJLH2DT0V


Let's explore the boroughs of Berlin. We can start with Berlin Mitte, since it's one of the largest boroughs in Berlin.

In [15]:
br_loc = br_data.loc[1, "Borough"] # borough name
br_lat = br_data.loc[1, "Latitude"] # neighborhood latitude value
br_long = br_data.loc[1, "Longitude"] # neighborhood longitude value

print("Latitude and longitude values of {} are {}, {}.".format(br_loc, br_lat, br_long))

Latitude and longitude values of Mitte are 52.5176896, 13.4023757.


Now, let's get the top 100 venues which are located in **Berlin Mitte**.

In [16]:
radius = 5000 # Search radius in m
search_query = 'venues' # Search query
LIMIT = 100 # Results limit
exp_url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET,
                                                                                                                               br_lat, br_long,
                                                                                                                               VERSION, radius, LIMIT)
exp_url

'https://api.foursquare.com/v2/venues/explore?client_id=HQHI11JH4DAVEGSIYZHDM4BA5DFETP3MQBXKJOSV4X3YX4RN&client_secret=ITTCVGBQRSJVDQQDGPCJ3SDUG5FV5B1MHBGYRAYQJLH2DT0V&ll=52.5176896,13.4023757&v=20180605&radius=5000&limit=100'

Send the GET request and examine the results.

In [17]:
results = requests.get(exp_url).json()

Now, extract the category of the venues above in order to get restaurants data and clean the results.

In [18]:
def get_category_type(row):
    try:
        categories_list = row["categories"]
    except:
        categories_list = row["venue.categories"]
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]["name"]
    
venues = results["response"]["groups"][0]["items"]

# flatten JSON
nearby_venues = json_normalize(venues)
# filter columns
filtered_columns = ["venue.name", "venue.categories", "venue.location.lat", "venue.location.lng"]
nearby_venues =nearby_venues.loc[:, filtered_columns]
# filter the category for each row
nearby_venues["venue.categories"] = nearby_venues.apply(get_category_type, axis=1)
# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

Now, we have a dataframe with all venues within a radius of 5000 m around **Berlin Mitte**. Let's take a look at it.

In [32]:
nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Lustgarten,Garden,52.518469,13.399454
1,Buchhandlung Walther König,Bookstore,52.521301,13.400758
2,Pierre Boulez Saal,Concert Hall,52.515333,13.396218
3,Fat Tire Bike Tours,Bike Rental / Bike Share,52.521233,13.40911
4,19grams,Coffee Shop,52.522697,13.40744


Now, let's create a function to apply the same process on all boroughs.

In [20]:
def getNearbyVenues(names, latitudes, longitudes, radius=5000):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # Create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # Make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # Return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ["Borough", 
                  'Borough Latitude', 
                  'Borough Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

Now write the code to run the above function on each borough, create a new dataframe and name it **br_venues**.

In [33]:
br_venues = getNearbyVenues(names=br_data["Borough"],
                                   latitudes=br_data["Latitude"],
                                   longitudes=br_data["Longitude"]
                                  )

Mitte
Friedrichshain-Kreuzberg
Pankow
Charlottenburg-Wilmersdorf
Spandau
Steglitz-Zehlendorf
Tempelhof-Schöneberg
Neukölln
Treptow-Köpenick
Marzahn-Hellersdorf
Lichtenberg
Reinickendorf


Let's check the size of the resulting dataframe and take a look at it.

In [34]:
print(br_venues.shape)
br_venues.head()

(1167, 7)


Unnamed: 0,Borough,Borough Latitude,Borough Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Mitte,52.51769,13.402376,Lustgarten,52.518469,13.399454,Garden
1,Mitte,52.51769,13.402376,Buchhandlung Walther König,52.521301,13.400758,Bookstore
2,Mitte,52.51769,13.402376,Pierre Boulez Saal,52.515333,13.396218,Concert Hall
3,Mitte,52.51769,13.402376,Fat Tire Bike Tours,52.521233,13.40911,Bike Rental / Bike Share
4,Mitte,52.51769,13.402376,19grams,52.522697,13.40744,Coffee Shop


Now, let's filter our venues in order to get an extra dataframe for restaurants.

In [35]:
idx = np.where((br_venues['Venue Category'].str.endswith('Restaurant')))
br_rest = br_venues.loc[idx]
br_rest.reset_index(drop=True, inplace=True)
br_rest.drop("Borough Latitude",axis=1, inplace=True)
br_rest.drop("Borough Longitude",axis=1, inplace=True)
br_rest.rename(columns={"Venue": "Restaurant", "Venue Latitude":"Latitude", "Venue Longitude":"Longitude", "Venue Category":"Category"}, inplace=True)
print(br_rest.shape)
br_rest.head()

(240, 5)


Unnamed: 0,Borough,Restaurant,Latitude,Longitude,Category
0,Mitte,Kin-Za,52.524928,13.395808,Caucasian Restaurant
1,Mitte,W - Der Imbiss,52.53423,13.405117,Vegetarian / Vegan Restaurant
2,Mitte,Steckerlfisch & Co. Arkonaplatz,52.536958,13.402143,Seafood Restaurant
3,Mitte,La Criolla Empanadas,52.534793,13.424607,Empanada Restaurant
4,Mitte,Gasthaus Alt Wien,52.531547,13.432682,Austrian Restaurant


Let's check how many restaurants were returned for each category.

In [41]:
br_rest.groupby(["Category"]).count().head(15)

Unnamed: 0_level_0,Borough,Restaurant,Latitude,Longitude
Category,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
African Restaurant,4,4,4,4
American Restaurant,1,1,1,1
Argentinian Restaurant,8,8,8,8
Asian Restaurant,7,7,7,7
Austrian Restaurant,2,2,2,2
Caucasian Restaurant,1,1,1,1
Chinese Restaurant,7,7,7,7
Doner Restaurant,4,4,4,4
Dumpling Restaurant,2,2,2,2
Eastern European Restaurant,4,4,4,4


Let's find out how many unique categories can be curated from all the returned restaurants.

In [25]:
print("There are {} unique categories.".format(len(br_rest["Category"].unique())))

There are 37 unique categories.


In order to analyze local restaurants, let's filter the category "German Restaurant".

In [38]:
idx = np.where((br_rest['Category'].str.startswith('German'))) # Filter category "German"
br_rest_gr = br_rest.loc[idx]
br_rest_gr = br_rest_gr.groupby("Borough").count() # Group by bourough name and count
br_rest_gr.reset_index(inplace=True) # Reset index
br_rest_gr.head()

Unnamed: 0,Borough,Restaurant,Latitude,Longitude,Category
0,Charlottenburg-Wilmersdorf,2,2,2,2
1,Lichtenberg,1,1,1,1
2,Marzahn-Hellersdorf,1,1,1,1
3,Neukölln,1,1,1,1
4,Pankow,2,2,2,2


Now, let's try to visulize both dataframes br_data and  br_rest_gr using folium and choropleth maps. Therefor, download geojson file for boroughs.

In [26]:
# download boroughs geojson file
!wget --quiet https://tsb-opendata.s3.eu-central-1.amazonaws.com/bezirksgrenzen/bezirksgrenzen.geojson
print("Download completed.")

Download completed.


Create a map of Berlin using folium.

In [43]:
berlin_geo = r'https://tsb-opendata.s3.eu-central-1.amazonaws.com/bezirksgrenzen/bezirksgrenzen.geojson' # Read geojson file
lat = 52.5
lon = 13.42

# Create map
berlin_map1 = folium.Map(location=[lat, lon], zoom_start=10)
berlin_map1

Now, generate a choropleth map using the total number of registered people with migration background in Berlin.

In [44]:
# Generate choropleth map 
berlin_map1.choropleth(
    geo_data=berlin_geo,
    data=br_data,
    columns=['Borough', 'People with migration background'],
    key_on='feature.properties.Gemeinde_name',
    fill_color='Blues', 
    fill_opacity=0.7, 
    line_opacity=0.2,
    legend_name='People with migration background in Berlin'
)

berlin_map1

Now, let's create a new map of Berlin in order to visualize the distribution of local restaurants in Berlin.

In [45]:
# Create a new Berlin map
berlin_map2 = folium.Map(location=[lat, lon], zoom_start=10)
berlin_map2

Create choropleth map of German restaurants.

In [46]:
berlin_map2.choropleth(
    geo_data=berlin_geo,
    data=br_rest_gr,
    columns=['Borough', 'Restaurant'],
    key_on='feature.properties.Gemeinde_name',
    fill_color='Blues', 
    fill_opacity=0.7, 
    line_opacity=0.2,
    legend_name='German restaurants in Berlin'
)

berlin_map2

As you may have noticed when you compare both choropleth maps, boroughs with a high proportion of people with migration background have generally less local restaurants than boroughs with less proportion of people with migration background.

### Final report

Coming soon