# Applied Data Science Capstone (IBM) - Week 4

#### Peer-graded Assignment: Capstone Project - The Battle of Neighborhoods (Week 1) - Part 2 (Data)

<div style="text-align: right"> by Tim Kießling </div> 

For this project, two data sources are used. The first data source[<sup>1</sup>](#fn1) is provided by the city of Munich and contains the aging quotient of each of Munich's districts. The district names are taken from this dataset and passed to the geopy library to obtain their geographic coordinates. A second dataset from Inside Airbnb [<sup>2</sup>](#fn2) is used to draw the borders of each district on a map. Afterwards, the coordinates are used with the Foursquare API to retrieve up to 100 venues within a radius of 1500 meters of each district's coordinates.

### Table of Contents

* [Dataset - Age Quotient](#chapter1)
    * [Data Description](#section_1_1)
    * [Data Display](#section_1_2)
    * [Basic Data Cleanup](#section_1_3)
    * [Clean Dataset](#section_1_4)

* [Dataset - Geospatial Coordinates](#chapter2)
    * [Get Address and Coordinates](#section_2_1)
    * [Get District Borders](#section_2_2)
    
* [Dataset - Venues](#chapter3)
    * [Create and Send GET Request](#section_3_1)
    * [Extract Venues](#section_3_2)
    * [Explore all Districts](#section_3_3)
    
* [Sources](#chapter4)

### Dataset - Age Quotient <a class="anchor" id="chapter1"></a>

#### Data Description <a class="anchor" id="section_1_1"></a>

This dataset[<sup>1</sup>](#fn1) includes the following data about Munich from the year 2000 to 2017: 
- all 25 district names 
- "aging quotient" in percent for each district

The "aging quotient" (aq) is the number of citizens aged 65 or older divided by the number of citizens aged 0 to 15, multiplied by 100.

$aq = \frac{citizens \, (age\, \ge \, 65)}{citizens \, (age \, is \, [0, 15])} \times 100$

Interpretation of the aq: 
- aq > 100: There are fewer citizens of age $[0, 15]$ than there are citizens of age $\ge 65$. 
- aq = 100: The number of citizens of age $[0, 15]$ and of age $\ge 65$ is equal.
- aq < 100: There are more citizens of age $[0, 15]$ than there are citizens of age $\ge 65$.
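As a quick check of the formula, the following sketch recomputes the aq from the two base counts. The function name `aging_quotient` is an illustrative assumption; the sample counts are the base values of the first row of the raw dataset (Stadt München, 2017, foreign citizens).

```python
def aging_quotient(citizens_65_plus, citizens_0_to_15):
    """Return the aging quotient: older citizens per 100 young citizens."""
    if citizens_0_to_15 <= 0:
        raise ValueError("the number of young citizens must be positive")
    return citizens_65_plus / citizens_0_to_15 * 100


# base counts from the raw dataset (Stadt München, 2017, foreign citizens)
aq_city = aging_quotient(44352, 33935)
print(round(aq_city, 1))  # 130.7, the value listed as "Indikatorwert"
```

Since the result is above 100, there are more citizens aged 65 or older than citizens aged 0 to 15 in this group.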

#### Data Display <a class="anchor" id="section_1_2"></a>

Load data and display first five rows

In [124]:
import pandas as pd

path = "data/"
fn = "indikatorenatlas1812bevoelkerungueberalterungsquotient.csv"

df = pd.read_csv(path + fn)
df.head(5)

Unnamed: 0,Indikator,Ausprägung,Jahr,Räumliche Gliederung,Indikatorwert,Basiswert 1,Basiswert 2,Basiswert 3,Basiswert 4,Basiswert 5,Name Basiswert 1,Name Basiswert 2,Name Basiswert 3,Name Basiswert 4,Name Basiswert 5
0,Überalterungsquotient,Ausländer_innen,2017,Stadt München,130.7,44352,33935,,,,Anzahl Einwohner ab 65 (Ausländer),Anzahl Einwohner jünger 15 (Ausländer),,,
1,Überalterungsquotient,Ausländer_innen,2017,01 Altstadt - Lehel,187.2,646,345,,,,Anzahl Einwohner ab 65 (Ausländer),Anzahl Einwohner jünger 15 (Ausländer),,,
2,Überalterungsquotient,Ausländer_innen,2017,02 Ludwigsvorstadt - Isarvorstadt,183.3,1593,869,,,,Anzahl Einwohner ab 65 (Ausländer),Anzahl Einwohner jünger 15 (Ausländer),,,
3,Überalterungsquotient,Ausländer_innen,2017,03 Maxvorstadt,202.4,1178,582,,,,Anzahl Einwohner ab 65 (Ausländer),Anzahl Einwohner jünger 15 (Ausländer),,,
4,Überalterungsquotient,Ausländer_innen,2017,04 Schwabing - West,175.2,1689,964,,,,Anzahl Einwohner ab 65 (Ausländer),Anzahl Einwohner jünger 15 (Ausländer),,,


#### Basic Data Cleanup <a class="anchor" id="section_1_3"></a>

Select only the relevant columns

In [125]:
# the aging quotient is listed in three categories:
# 'immigrants', 'germans' and 'total'
# select only the rows with 'total' ('gesamt') and the relevant columns
df_clean_ger = df.loc[df['Ausprägung'] == 'gesamt', ["Jahr", "Räumliche Gliederung", "Indikatorwert"]]
df_clean_ger.reset_index(drop=True, inplace=True) # reset index
df_clean_ger.head(5)

Unnamed: 0,Jahr,Räumliche Gliederung,Indikatorwert
0,2017,Stadt München,134.5
1,2017,01 Altstadt - Lehel,161.3
2,2017,02 Ludwigsvorstadt - Isarvorstadt,107.2
3,2017,03 Maxvorstadt,138.7
4,2017,04 Schwabing - West,148.6


Rename the columns to English

In [126]:
df_clean_eng = df_clean_ger.rename(columns={"Jahr": "year", 
                                            "Räumliche Gliederung": "district",
                                            "Indikatorwert": "aq"})

df_clean_eng.head(5)

Unnamed: 0,year,district,aq
0,2017,Stadt München,134.5
1,2017,01 Altstadt - Lehel,161.3
2,2017,02 Ludwigsvorstadt - Isarvorstadt,107.2
3,2017,03 Maxvorstadt,138.7
4,2017,04 Schwabing - West,148.6


Check the shape and data types of the dataframe. There should be 468 rows, since the dataset covers 18 years (2000-2017) and each year has 26 rows (25 districts plus one row for the city as a whole).

In [127]:
import pprint

pprint.pp(df_clean_eng.shape)
pprint.pp(df_clean_eng.dtypes)

(468, 3)
year          int64
district     object
aq          float64
dtype: object
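
The expected row count can be derived and checked with a few lines of plain Python. This is a minimal sketch; the helper `rows_per_year_ok` is a hypothetical name, and it takes any iterable of years (for example the values of the `year` column).

```python
from collections import Counter

years = range(2000, 2018)   # 2000 through 2017 inclusive
rows_per_year = 25 + 1      # 25 districts plus the city-wide row

expected_rows = len(years) * rows_per_year
print(expected_rows)  # 468


def rows_per_year_ok(year_column, expected=26):
    """Return True if every year occurs exactly `expected` times."""
    counts = Counter(year_column)
    return all(n == expected for n in counts.values())
```

A call such as `rows_per_year_ok(df_clean_eng['year'])` would flag years with missing or duplicated districts, which the total row count alone cannot reveal.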


Convert column "district" to string and column "year" to int16

In [128]:
df_clean = df_clean_eng.astype({"district": "string", "year": "int16"})
pprint.pp(df_clean.dtypes)

year          int16
district     string
aq          float64
dtype: object


Remove enumeration of districts

In [129]:
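# remove the enumeration of districts; regex=True is needed in recent
# pandas versions, where str.replace treats the pattern literally by default
df_clean['district'] = df_clean['district'].str.replace(r'\d+', '', regex=True)
df_clean['district'] = df_clean['district'].str.lstrip() # strip leading whitespace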
df_clean.head()

Unnamed: 0,year,district,aq
0,2017,Stadt München,134.5
1,2017,Altstadt - Lehel,161.3
2,2017,Ludwigsvorstadt - Isarvorstadt,107.2
3,2017,Maxvorstadt,138.7
4,2017,Schwabing - West,148.6
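
The effect of the pattern can be tried on a few names with the standard `re` module. In this sketch the pattern is anchored to the start of the string, which is an assumption that differs slightly from the cell above: it removes only a leading number and the whitespace after it, instead of all digits. The function name `strip_enumeration` is illustrative.

```python
import re


def strip_enumeration(name):
    """Remove a leading district number such as '01 ' from a district name."""
    return re.sub(r'^\d+\s*', '', name)


names = ['Stadt München', '01 Altstadt - Lehel', '04 Schwabing - West']
print([strip_enumeration(n) for n in names])
# ['Stadt München', 'Altstadt - Lehel', 'Schwabing - West']
```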


Save dataframe to new file for next assignment

In [130]:
df_clean.to_csv(path + "aq_munich.csv")
print("file saved")

file saved


#### Clean Dataset <a class="anchor" id="section_1_4"></a>

Display first 10 rows. The first row of each year contains the aq for the city of Munich as a whole ("Stadt München")

In [131]:
df_clean.head(10)

Unnamed: 0,year,district,aq
0,2017,Stadt München,134.5
1,2017,Altstadt - Lehel,161.3
2,2017,Ludwigsvorstadt - Isarvorstadt,107.2
3,2017,Maxvorstadt,138.7
4,2017,Schwabing - West,148.6
5,2017,Au - Haidhausen,112.1
6,2017,Sendling,127.8
7,2017,Sendling - Westpark,141.6
8,2017,Schwanthalerhöhe,101.5
9,2017,Neuhausen - Nymphenburg,134.9


Display last 10 rows. They show the aq of districts in 2000, since the dataset contains the aq of all districts from 2000 to 2017

In [132]:
df_clean.tail(10)

Unnamed: 0,year,district,aq
458,2000,Ramersdorf - Perlach,97.7
459,2000,Obergiesing - Fasangarten,142.9
460,2000,Untergiesing - Harlaching,183.2
461,2000,Thalkirchen - Obersendling - Forstenried - Für...,168.5
462,2000,Hadern,133.5
463,2000,Pasing - Obermenzing,123.7
464,2000,Aubing - Lochhausen - Langwied,107.7
465,2000,Allach - Untermenzing,120.5
466,2000,Feldmoching - Hasenbergl,110.3
467,2000,Laim,187.3


### Dataset - Geospatial Coordinates <a class="anchor" id="chapter2"></a>

geopy is a Python library that returns the geographic coordinates, i.e. latitude and longitude, of an address such as a city, district or neighborhood. The coordinates are needed by the Foursquare API to get a list of the venues of each district.

In [133]:
!pip install -U geopy #install geopy if not present
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import pandas as pd

Defaulting to user installation because normal site-packages is not writeable
Requirement already up-to-date: geopy in /home/tbfk/.local/lib/python3.8/site-packages (2.0.0)


In [134]:
# load dataset
path = 'data/'
fn = 'aq_munich.csv'
df_clean = pd.read_csv(path + fn)
df_clean = df_clean.drop(df_clean.columns[0], axis=1) #drop first unnamed column
df_clean.head(5)

Unnamed: 0,year,district,aq
0,2017,Stadt München,134.5
1,2017,Altstadt - Lehel,161.3
2,2017,Ludwigsvorstadt - Isarvorstadt,107.2
3,2017,Maxvorstadt,138.7
4,2017,Schwabing - West,148.6


#### Get Address and Coordinates <a class="anchor" id="section_2_1"></a>

Get geo coordinates of the first district 'Altstadt - Lehel'

In [135]:
# select name of first district in the dataset and 
# add 'Munich, Germany' to it
district = df_clean['district'].iloc[1]
print(district)
address = district + ', Munich, Germany'
print(address)

Altstadt - Lehel
Altstadt - Lehel, Munich, Germany


In [136]:
# get coordinates of address
geolocator = Nominatim(user_agent="munich_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of {} are {}, {}.'.format(address, latitude, longitude))

The geographical coordinates of Altstadt - Lehel, Munich, Germany are 48.1378285, 11.5745823.


Get coordinates of all 25 districts of Munich and create a new dataset

In [137]:
# select all data of year 2017
df_recent = df_clean.loc[df_clean['year'] == 2017]

df_recent

Unnamed: 0,year,district,aq
0,2017,Stadt München,134.5
1,2017,Altstadt - Lehel,161.3
2,2017,Ludwigsvorstadt - Isarvorstadt,107.2
3,2017,Maxvorstadt,138.7
4,2017,Schwabing - West,148.6
5,2017,Au - Haidhausen,112.1
6,2017,Sendling,127.8
7,2017,Sendling - Westpark,141.6
8,2017,Schwanthalerhöhe,101.5
9,2017,Neuhausen - Nymphenburg,134.9


Loop through all districts, get coordinates and store them in lists

In [138]:
import time

lat_list = []
long_list = []

geolocator = Nominatim(user_agent="munich_explorer")

for district in df_recent['district']:
    if district == "Stadt München":
        address = 'Munich, Germany'
    else:
        address = district + ', Munich, Germany'

    # geocode() returns None when nothing is found, so retry a few times
    location = None
    for attempt in range(5):
        location = geolocator.geocode(address)
        if location is not None:
            break
        time.sleep(1) # pause before asking the service again
    if location is None:
        raise RuntimeError('No coordinates found for {}'.format(address))

    latitude = location.latitude
    longitude = location.longitude
    print('The geographical coordinates of {} are {}, {}.'.format(address, latitude, longitude))
    lat_list.append(latitude)
    long_list.append(longitude)

print('Done')

The geographical coordinates of Munich, Germany are 48.1371079, 11.5753822.
The geographical coordinates of Altstadt - Lehel, Munich, Germany are 48.1378285, 11.5745823.
The geographical coordinates of Ludwigsvorstadt - Isarvorstadt, Munich, Germany are 48.1303398, 11.5733658.
The geographical coordinates of Maxvorstadt, Munich, Germany are 48.1510916, 11.5624179.
The geographical coordinates of Schwabing - West, Munich, Germany are 48.1682709, 11.5698727.
The geographical coordinates of Au - Haidhausen, Munich, Germany are 48.1287531, 11.5905362.
The geographical coordinates of Sendling, Munich, Germany are 48.1180125, 11.5390832.
The geographical coordinates of Sendling - Westpark, Munich, Germany are 48.11803085, 11.519332770284128.
The geographical coordinates of Schwanthalerhöhe, Munich, Germany are 48.1337822, 11.5410566.
The geographical coordinates of Neuhausen - Nymphenburg, Munich, Germany are 48.1542217, 11.5315172.
The geographical coordinates of Moosach, Munich, Germany are 48.179894
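
The retry idea in the loop above can be isolated in a small helper. This is a sketch under stated assumptions: `geocode_with_retry` is a hypothetical name, the `geocode` argument is any callable that returns an object with `latitude` and `longitude` attributes or `None` (as `geolocator.geocode` does), and `fake_geocode` is a stand-in so the example runs without network access. Exceptions raised by the geocoding service are not handled here.

```python
import time
from types import SimpleNamespace


def geocode_with_retry(geocode, address, attempts=3, delay=1.0):
    """Call geocode(address) up to `attempts` times; return (lat, long) or None."""
    for attempt in range(attempts):
        location = geocode(address)
        if location is not None:
            return location.latitude, location.longitude
        if attempt < attempts - 1:
            time.sleep(delay)  # wait before asking the service again
    return None


def fake_geocode(address):
    """Stand-in for geolocator.geocode that knows a single address."""
    known = {'Munich, Germany': SimpleNamespace(latitude=48.1371079,
                                                longitude=11.5753822)}
    return known.get(address)


print(geocode_with_retry(fake_geocode, 'Munich, Germany', delay=0))
# (48.1371079, 11.5753822)
print(geocode_with_retry(fake_geocode, 'Nowhere, Munich, Germany', attempts=2, delay=0))
# None
```

Bounding the number of attempts keeps the loop from running forever when an address cannot be resolved at all.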

In [139]:
# assign columns lat and long to the dataframe; assign() returns a new
# dataframe, which avoids the SettingWithCopyWarning raised when writing to a slice
df_recent = df_recent.assign(lat=lat_list, long=long_list)

df_recent.to_csv('data/munich_coord.csv')

df_recent


Unnamed: 0,year,district,aq,lat,long
0,2017,Stadt München,134.5,48.137108,11.575382
1,2017,Altstadt - Lehel,161.3,48.137828,11.574582
2,2017,Ludwigsvorstadt - Isarvorstadt,107.2,48.13034,11.573366
3,2017,Maxvorstadt,138.7,48.151092,11.562418
4,2017,Schwabing - West,148.6,48.168271,11.569873
5,2017,Au - Haidhausen,112.1,48.128753,11.590536
6,2017,Sendling,127.8,48.118012,11.539083
7,2017,Sendling - Westpark,141.6,48.118031,11.519333
8,2017,Schwanthalerhöhe,101.5,48.133782,11.541057
9,2017,Neuhausen - Nymphenburg,134.9,48.154222,11.531517


#### Get District Borders <a class="anchor" id="section_2_2"></a>

The dataset for the district borders was downloaded from Inside Airbnb [<sup>2</sup>](#fn2).

In [36]:
!pip install -U geopandas
import geopandas as gpd

Defaulting to user installation because normal site-packages is not writeable
Requirement already up-to-date: geopandas in /home/tbfk/.local/lib/python3.8/site-packages (0.8.1)


In [56]:
# load and display data
df_recent = pd.read_csv('data/munich_coord.csv')
border_df = gpd.read_file('data/munich_neighbourhoods.geojson')
border_df.head(5)

Unnamed: 0,neighbourhood,neighbourhood_group,geometry
0,Altstadt-Lehel,,"MULTIPOLYGON (((11.59520 48.14170, 11.59500 48..."
1,Ludwigsvorstadt-Isarvorstadt,,"MULTIPOLYGON (((11.55600 48.14080, 11.55930 48..."
2,Maxvorstadt,,"MULTIPOLYGON (((11.58430 48.14420, 11.58310 48..."
3,Schwabing-West,,"MULTIPOLYGON (((11.58170 48.17630, 11.58320 48..."
4,Au-Haidhausen,,"MULTIPOLYGON (((11.59560 48.14050, 11.59590 48..."


In [58]:
# merge border coordinates with aq dataset
df_districts = df_recent.drop(df_recent.index[0]) # drop first row (Stadt München)
df_districts = df_districts.drop(df_districts.columns[0], axis=1) #drop first unnamed column
df_districts = df_districts.reset_index(drop=True)
# the borders are assigned by row position, which assumes that both
# datasets list the districts in the same order
df_districts["border"] = border_df["geometry"]

df_districts.head(5)

Unnamed: 0,year,district,aq,lat,long,border
0,2017,Altstadt - Lehel,161.3,48.137828,11.574582,"MULTIPOLYGON (((11.59520 48.14170, 11.59500 48..."
1,2017,Ludwigsvorstadt - Isarvorstadt,107.2,48.13034,11.573366,"MULTIPOLYGON (((11.55600 48.14080, 11.55930 48..."
2,2017,Maxvorstadt,138.7,48.151092,11.562418,"MULTIPOLYGON (((11.58430 48.14420, 11.58310 48..."
3,2017,Schwabing - West,148.6,48.168271,11.569873,"MULTIPOLYGON (((11.58170 48.17630, 11.58320 48..."
4,2017,Au - Haidhausen,112.1,48.128753,11.590536,"MULTIPOLYGON (((11.59560 48.14050, 11.59590 48..."
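
Assigning the borders by row position only works if both datasets list the districts in the same order. A more defensive alternative is to match on the district name. The sketch below assumes that the two naming schemes differ only in the spaces around hyphens, which holds for the rows displayed above but is not verified here for all 25 districts. The function names and the placeholder geometry strings are hypothetical.

```python
import re


def normalize_name(name):
    """Collapse spaces around hyphens so 'Au - Haidhausen' equals 'Au-Haidhausen'."""
    return re.sub(r'\s*-\s*', '-', name.strip())


def match_borders(districts, borders_by_name):
    """Return {district: border}, raising if a district has no border entry."""
    lookup = {normalize_name(k): v for k, v in borders_by_name.items()}
    matched = {}
    for district in districts:
        key = normalize_name(district)
        if key not in lookup:
            raise KeyError('no border found for {}'.format(district))
        matched[district] = lookup[key]
    return matched


districts = ['Altstadt - Lehel', 'Au - Haidhausen']
# placeholder geometries, deliberately listed in a different order
borders = {'Au-Haidhausen': 'MULTIPOLYGON B', 'Altstadt-Lehel': 'MULTIPOLYGON A'}
print(match_borders(districts, borders))
# {'Altstadt - Lehel': 'MULTIPOLYGON A', 'Au - Haidhausen': 'MULTIPOLYGON B'}
```

A missing name raises an error instead of silently pairing a district with the wrong border.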


In [61]:
# save dataset
df_districts.to_csv('data/munich_districts.csv')

### Dataset - Venues <a class="anchor" id="chapter3"></a>

To get the venues of all districts, the coordinates from the previous chapters will be used in conjunction with the Foursquare API

In [84]:
import json # library to handle JSON files
import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe

In [64]:
# define Foursquare credentials and version
# the credentials are read from environment variables instead of being
# hard-coded; the variable names below are an assumption
import os

CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID'] # your Foursquare ID
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET'] # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

print('Credentials loaded.')

Credentials loaded.


#### Create and Send GET Request <a class="anchor" id="section_3_1"></a>

In [103]:
# select first district and its coordinates
df_districts = pd.read_csv('data/munich_districts.csv')

district_lat = df_districts.loc[0, 'lat']
district_long = df_districts.loc[0, 'long']
district_name = df_districts.loc[0, 'district']

print('Latitude and longitude values of {} are {}, {}.'.format(district_name, 
                                                               district_lat, 
                                                               district_long))

Latitude and longitude values of Altstadt - Lehel are 48.1378285, 11.5745823.


In [113]:
# get top 100 venues of first district
# create search call 
search_query = district_name
radius = 1500
url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    district_lat, 
    district_long, 
    VERSION, 
    search_query, 
    radius, 
    LIMIT)
url


'https://api.foursquare.com/v2/venues/explore?client_id=<CLIENT_ID>&client_secret=<CLIENT_SECRET>&ll=48.1378285,11.5745823&v=20180605&query=Altstadt - Lehel&radius=1500&limit=100'
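
The URL above contains unencoded spaces in the `query` value. One way to avoid this is to build the query string with `urllib.parse.urlencode`. The following is a minimal sketch: the helper name `build_explore_url` and the placeholder credentials are assumptions, while the endpoint and parameter names are the ones used in the cell above.

```python
from urllib.parse import urlencode


def build_explore_url(client_id, client_secret, lat, lng, version,
                      query=None, radius=1500, limit=100):
    """Build an explore URL with properly encoded query parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'll': '{},{}'.format(lat, lng),
        'v': version,
        'radius': radius,
        'limit': limit,
    }
    if query is not None:
        params['query'] = query
    return 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params)


# placeholder credentials, not real values
url = build_explore_url('YOUR_CLIENT_ID', 'YOUR_CLIENT_SECRET',
                        48.1378285, 11.5745823, '20180605',
                        query='Altstadt - Lehel')
print(url)
```

`requests.get` also accepts such a dictionary directly through its `params` argument, which performs the same encoding.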

In [114]:
# send the GET request
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5fd666c878c4fb260af1260b'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Kreuzviertel',
  'headerFullLocation': 'Kreuzviertel, Munich',
  'headerLocationGranularity': 'neighborhood',
  'query': 'altstadt lehel',
  'totalResults': 12,
  'suggestedBounds': {'ne': {'lat': 48.15132851350001,
    'lng': 11.594774087402236},
   'sw': {'lat': 48.124328486499984, 'lng': 11.554390512597763}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4b27f300f964a520c68c24e3',
       'name': 'Liebighof im Lehel',
       'location': {'address': 'Liebigstr. 5',
        'crossStreet': 'Tattenbachstr.',
        'lat': 48.14164,
        'lng': 11.5864696,
        'la

#### Extract Venues <a class="anchor" id="section_3_2"></a>

In [115]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [116]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Liebighof im Lehel,German Restaurant,48.14164,11.58647
1,Mercure Hotel München Altstadt,Hotel,48.137061,11.57091
2,LEHEL Bar*Food*Club,Restaurant,48.139243,11.584722
3,H Lehel,Tram Station,48.139544,11.588764
4,ZebraMobil Südliches Lehel,Rental Car Location,48.13512,11.586585


In [117]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

12 venues were returned by Foursquare.
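
The same extraction can be written without pandas, which makes the nested structure of the response easier to see. This sketch assumes items shaped like the response excerpt above (`venue.name`, `venue.location.lat`, `venue.location.lng`, `venue.categories`). The function name `venue_rows` and the second sample entry are hypothetical.

```python
def venue_rows(items):
    """Flatten Foursquare 'items' into dicts with name, category, lat and lng."""
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories', [])
        rows.append({
            'name': venue['name'],
            # use the first category if there is one, otherwise None
            'category': categories[0]['name'] if categories else None,
            'lat': venue['location']['lat'],
            'lng': venue['location']['lng'],
        })
    return rows


sample_items = [
    {'venue': {'name': 'Liebighof im Lehel',
               'location': {'lat': 48.14164, 'lng': 11.5864696},
               'categories': [{'name': 'German Restaurant'}]}},
    {'venue': {'name': 'Venue without category',  # hypothetical entry
               'location': {'lat': 48.0, 'lng': 11.0},
               'categories': []}},
]

for row in venue_rows(sample_items):
    print(row['name'], '|', row['category'])
```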


#### Explore all Districts <a class="anchor" id="section_3_3"></a>

In [118]:
# Function to get all the Venues from all districts in Munich 
def getNearbyVenues(names, latitudes, longitudes, radius=1500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['district', 
                  'district_lat', 
                  'district_long', 
                  'venue', 
                  'venue_lat', 
                  'venue_long', 
                  'venue_category']
    
    return nearby_venues
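
The function above indexes directly into the response, so a failed request surfaces as a `KeyError` or `IndexError`. A small guard can check the status first. This sketch assumes the payload layout shown in the earlier response output (`meta.code` and `response.groups[0].items`); the function name `items_from_payload` and the sample payloads, including the error code 429, are hypothetical.

```python
def items_from_payload(payload):
    """Return the venue items of a response, or raise if the request failed."""
    code = payload.get('meta', {}).get('code')
    if code != 200:
        raise RuntimeError('Foursquare request failed with code {}'.format(code))
    groups = payload.get('response', {}).get('groups', [])
    return groups[0]['items'] if groups else []


ok_payload = {'meta': {'code': 200},
              'response': {'groups': [{'items': [{'venue': {'name': 'Marienplatz'}}]}]}}
print(len(items_from_payload(ok_payload)))  # 1

try:
    items_from_payload({'meta': {'code': 429}})
except RuntimeError as error:
    print(error)  # Foursquare request failed with code 429
```

Inside `getNearbyVenues`, such a check could be applied to `requests.get(url).json()` before the list comprehension runs.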

In [119]:
munich_venues = getNearbyVenues(names=df_districts['district'],
                                   latitudes=df_districts['lat'],
                                   longitudes=df_districts['long']
                                  )

Altstadt - Lehel
Ludwigsvorstadt - Isarvorstadt
Maxvorstadt
Schwabing - West
Au - Haidhausen
Sendling
Sendling - Westpark
Schwanthalerhöhe
Neuhausen - Nymphenburg
Moosach
Milbertshofen - Am Hart
Schwabing - Freimann
Bogenhausen
Berg am Laim
Trudering - Riem
Ramersdorf - Perlach
Obergiesing - Fasangarten
Untergiesing - Harlaching
Thalkirchen - Obersendling - Forstenried - Fürstenried - Solln
Hadern
Pasing - Obermenzing
Aubing - Lochhausen - Langwied
Allach - Untermenzing
Feldmoching - Hasenbergl
Laim


In [120]:
print(munich_venues.shape)
munich_venues.head()

(1834, 7)


Unnamed: 0,district,district_lat,district_long,venue,venue_lat,venue_long,venue_category
0,Altstadt - Lehel,48.137828,11.574582,Marienplatz,48.137125,11.575483,Plaza
1,Altstadt - Lehel,48.137828,11.574582,Fischbrunnen,48.137211,11.576047,Fountain
2,Altstadt - Lehel,48.137828,11.574582,Alois Dallmayr,48.138469,11.577372,Gourmet Shop
3,Altstadt - Lehel,48.137828,11.574582,Kustermann,48.136242,11.574897,Department Store
4,Altstadt - Lehel,48.137828,11.574582,St. Peter,48.13653,11.575615,Church


In [121]:
# Count the number of venues for each neighbourhood
munich_venues.groupby('district').count()

Unnamed: 0_level_0,district_lat,district_long,venue,venue_lat,venue_long,venue_category
district,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Allach - Untermenzing,16,16,16,16,16,16
Altstadt - Lehel,100,100,100,100,100,100
Au - Haidhausen,100,100,100,100,100,100
Aubing - Lochhausen - Langwied,11,11,11,11,11,11
Berg am Laim,56,56,56,56,56,56
Bogenhausen,97,97,97,97,97,97
Feldmoching - Hasenbergl,8,8,8,8,8,8
Hadern,46,46,46,46,46,46
Laim,76,76,76,76,76,76
Ludwigsvorstadt - Isarvorstadt,100,100,100,100,100,100


In [122]:
print('There are {} unique categories.'.format(len(munich_venues['venue_category'].unique())))

There are 227 unique categories.


In [123]:
# save venues to .csv file
munich_venues.to_csv('data/munich_venues.csv')
print('file saved')

file saved


### Sources <a class="anchor" id="chapter4"></a>

[<sup>1</sup>] Age Quotient Dataset <span id="fn1"> https://www.opengov-muenchen.de/ar/dataset/indikatorenatlas-bevoelkerung-ueberalterungsquotient-83r65mct (last visited 2020.12.12) </span>

[<sup>2</sup>] Munich District Borders <span id="fn2"> http://insideairbnb.com/get-the-data.html (last visited 2020.12.13) </span>