# Applied Data Science Capstone (IBM) - Week 4

#### Peer-graded Assignment: Capstone Project - The Battle of Neighborhoods (Week 1) - Part 2 (Data)

<div style="text-align: right"> by Tim Kießling </div> 

This project uses two data sources. The first[<sup>1</sup>](#fn1) is published by the City of Munich and describes the age structure of the citizens in each of Munich's districts. The district names are taken from this dataset and passed to the geocoder library to obtain their geographic coordinates. These coordinates are then used with the Foursquare API, the second data source, to retrieve up to 100 venues within a radius of 750 m of each district.
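The geocoding and venue lookup described above can be sketched roughly as follows. This is only an illustration under assumptions: the ArcGIS provider of the geocoder library, the legacy Foursquare v2 `venues/explore` endpoint, the helper names `district_coordinates` and `explore_url`, the placeholder credentials, and the example coordinates are not taken from the original project.

```python
from urllib.parse import urlencode

# Assumption: the legacy Foursquare v2 "explore" endpoint.
FOURSQUARE_EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def district_coordinates(district, city="Munich, Germany"):
    """Return (lat, lng) for a district name, or None if geocoding fails."""
    import geocoder  # third-party; imported lazily so the rest runs without it

    # Assumption: the ArcGIS provider; other geocoder providers work similarly.
    result = geocoder.arcgis(f"{district}, {city}")
    return tuple(result.latlng) if result.latlng else None


def explore_url(lat, lng, client_id, client_secret,
                version="20201212", radius=750, limit=100):
    """Build a request URL for up to `limit` venues within `radius` meters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,            # API version date in YYYYMMDD format
        "ll": f"{lat},{lng}",    # latitude,longitude
        "radius": radius,
        "limit": limit,
    }
    return f"{FOURSQUARE_EXPLORE_URL}?{urlencode(params)}"


# Placeholder coordinates and credentials, for illustration only.
url = explore_url(48.1371, 11.5754, "YOUR_ID", "YOUR_SECRET")
print(url)
```

Building the URL separately from sending the request keeps the sketch runnable without network access or API credentials.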

The dataset provided by the City of Munich is described in more detail below.

## Data Description

This dataset[<sup>1</sup>](#fn1) includes the following data about Munich from the year 2000 to 2017: 
- all 25 district names
- "aging quotient" in percent for each district

The "aging quotient" (aq) is the number of citizens aged 65 or older divided by the number of citizens younger than 15, multiplied by 100.

$aq = \frac{citizens \, (age\, \ge \, 65)}{citizens \, (age \, \in \, [0, 15))} \times 100$

Interpretation of the aq: 
- aq > 100: There are fewer citizens of age $[0, 15)$ than there are citizens of age $\ge 65$.
- aq = 100: The number of citizens of age $[0, 15)$ and of age $\ge 65$ is equal.
- aq < 100: There are more citizens of age $[0, 15)$ than there are citizens of age $\ge 65$.
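As a small worked example of the formula and its interpretation, the sketch below recomputes the aq from two base counts. The function names `aging_quotient` and `interpret` are hypothetical helpers; the counts 646 and 345 are the base values listed for "01 Altstadt - Lehel" in the first rows of the raw data in the Data Display section.

```python
def aging_quotient(n_older, n_younger):
    """Older citizens per 100 younger citizens."""
    if n_younger <= 0:
        raise ValueError("the younger group must contain at least one citizen")
    return n_older / n_younger * 100


def interpret(aq):
    """Map an aq value to the three cases listed above."""
    if aq > 100:
        return "more older than younger citizens"
    if aq < 100:
        return "more younger than older citizens"
    return "both groups are equally large"


aq = aging_quotient(646, 345)
print(f"{aq:.1f}: {interpret(aq)}")  # 187.2: more older than younger citizens
```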

[<sup>1</sup>] <span id="fn1"> https://www.opengov-muenchen.de/ar/dataset/indikatorenatlas-bevoelkerung-ueberalterungsquotient-83r65mct (last visited 2020.12.12) </span>

## Data Display

Load data and display first five rows

In [3]:
import pandas as pd

path = "data/"
fn = "indikatorenatlas1812bevoelkerungueberalterungsquotient.csv"

df = pd.read_csv(path + fn)
df.head(5)

Unnamed: 0,Indikator,Ausprägung,Jahr,Räumliche Gliederung,Indikatorwert,Basiswert 1,Basiswert 2,Basiswert 3,Basiswert 4,Basiswert 5,Name Basiswert 1,Name Basiswert 2,Name Basiswert 3,Name Basiswert 4,Name Basiswert 5
0,Überalterungsquotient,Ausländer_innen,2017,Stadt München,130.7,44352,33935,,,,Anzahl Einwohner ab 65 (Ausländer),Anzahl Einwohner jünger 15 (Ausländer),,,
1,Überalterungsquotient,Ausländer_innen,2017,01 Altstadt - Lehel,187.2,646,345,,,,Anzahl Einwohner ab 65 (Ausländer),Anzahl Einwohner jünger 15 (Ausländer),,,
2,Überalterungsquotient,Ausländer_innen,2017,02 Ludwigsvorstadt - Isarvorstadt,183.3,1593,869,,,,Anzahl Einwohner ab 65 (Ausländer),Anzahl Einwohner jünger 15 (Ausländer),,,
3,Überalterungsquotient,Ausländer_innen,2017,03 Maxvorstadt,202.4,1178,582,,,,Anzahl Einwohner ab 65 (Ausländer),Anzahl Einwohner jünger 15 (Ausländer),,,
4,Überalterungsquotient,Ausländer_innen,2017,04 Schwabing - West,175.2,1689,964,,,,Anzahl Einwohner ab 65 (Ausländer),Anzahl Einwohner jünger 15 (Ausländer),,,


## Basic Data Cleanup

Select only the important columns

In [11]:
df_clean_ger = df[["Jahr", "Räumliche Gliederung", "Indikatorwert"]]
df_clean_ger.head(5)

Unnamed: 0,Jahr,Räumliche Gliederung,Indikatorwert
0,2017,Stadt München,130.7
1,2017,01 Altstadt - Lehel,187.2
2,2017,02 Ludwigsvorstadt - Isarvorstadt,183.3
3,2017,03 Maxvorstadt,202.4
4,2017,04 Schwabing - West,175.2


Rename the columns to English

In [24]:
df_clean_eng = df_clean_ger.rename(columns={"Jahr": "year", 
                                            "Räumliche Gliederung": "district",
                                            "Indikatorwert": "aq"})
df_clean_eng.head(5)

Unnamed: 0,year,district,aq
0,2017,Stadt München,130.7
1,2017,01 Altstadt - Lehel,187.2
2,2017,02 Ludwigsvorstadt - Isarvorstadt,183.3
3,2017,03 Maxvorstadt,202.4
4,2017,04 Schwabing - West,175.2


Shape and column dtypes of the dataframe

In [44]:
import pprint

pprint.pp(df_clean_eng.shape)
pprint.pp(df_clean_eng.dtypes)

(1404, 3)
year          int64
district     object
aq          float64
dtype: object
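The shape alone does not show whether each (year, district) pair occurs only once. The raw data carries an `Ausprägung` (subgroup) column that the column selection drops, so the same district may appear several times per year; the row count would be consistent with that, since 18 years × 26 areas (25 districts plus the city as a whole) gives 468, and 1404 is three times that. A hedged sketch of how one might check for repeated pairs, using a small made-up frame (the aq values 110.0 and 200.0 are invented for illustration):

```python
import pandas as pd

# Toy frame standing in for df_clean_eng; some values are made up.
toy = pd.DataFrame({
    "year": [2017, 2017, 2017, 2016],
    "district": ["03 Maxvorstadt", "03 Maxvorstadt",
                 "01 Altstadt - Lehel", "03 Maxvorstadt"],
    "aq": [202.4, 110.0, 187.2, 200.0],
})

# Count rows per (year, district) and keep the pairs that occur more than once.
rows_per_pair = toy.groupby(["year", "district"]).size()
repeated = rows_per_pair[rows_per_pair > 1]
print(repeated)
```

If repeated pairs show up on the real data, filtering on the subgroup column before dropping it would be one way to keep a single row per district and year.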


Convert column "district" to string and "year" to int16

In [59]:
# Convert both columns in one call; two separate calls on df_clean_eng would keep only the last conversion
df_clean = df_clean_eng.astype({"district": "string", "year": "int16"})
pprint.pp(df_clean.dtypes)

year          int16
district     string
aq          float64
dtype: object
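Because `astype` returns a new dataframe instead of modifying the original, two separate calls on the same source frame keep only the last conversion. The following sketch illustrates this on a toy frame; the names `toy`, `lost`, and `kept` are hypothetical.

```python
import pandas as pd

toy = pd.DataFrame({
    "year": [2017, 2016],
    "district": ["01 Altstadt - Lehel", "03 Maxvorstadt"],
})

# Two separate calls on the same source: the second result replaces the first,
# so "district" keeps its original dtype.
lost = toy.astype({"district": "string"})
lost = toy.astype({"year": "int16"})

# One call with both columns keeps both conversions.
kept = toy.astype({"district": "string", "year": "int16"})

print(lost.dtypes)
print(kept.dtypes)
```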


Remove enumeration of districts

In [62]:
df_clean.district = df_clean.district.str.replace(r'\d+', '', regex=True)  # remove enumeration of districts
df_clean.district = df_clean.district.str.lstrip()  # strip leading whitespace
df_clean.head()

Unnamed: 0,year,district,aq
0,2017,Stadt München,130.7
1,2017,Altstadt - Lehel,187.2
2,2017,Ludwigsvorstadt - Isarvorstadt,183.3
3,2017,Maxvorstadt,202.4
4,2017,Schwabing - West,175.2
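As a possible variation, the number and the whitespace after it can be removed in one step with a pattern anchored at the start of the string, which leaves any digits elsewhere in a name untouched. A minimal sketch on a toy series (the variable names are illustrative):

```python
import pandas as pd

districts = pd.Series(["01 Altstadt - Lehel", "03 Maxvorstadt", "Stadt München"])

# ^\d+\s* matches a leading number plus any whitespace that follows it.
cleaned = districts.str.replace(r"^\d+\s*", "", regex=True)
print(cleaned.tolist())  # ['Altstadt - Lehel', 'Maxvorstadt', 'Stadt München']
```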


Save dataframe to new file for next assignment

In [66]:
df_clean.to_csv(path + "aq_munich.csv")
print("file saved")

file saved
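Note that `to_csv` writes the dataframe index as an unnamed first column by default. When the file is loaded again, passing `index_col=0` to `read_csv` restores it as the index instead of creating an `Unnamed: 0` column. The sketch below shows the round trip on a toy frame, using an in-memory buffer in place of the real file:

```python
import io

import pandas as pd

toy = pd.DataFrame({
    "year": [2017, 2017],
    "district": ["Altstadt - Lehel", "Maxvorstadt"],
    "aq": [187.2, 202.4],
})

buffer = io.StringIO()
toy.to_csv(buffer)  # default index=True writes the index as the first column

buffer.seek(0)
naive = pd.read_csv(buffer)  # index comes back as a column named "Unnamed: 0"

buffer.seek(0)
restored = pd.read_csv(buffer, index_col=0)  # first column becomes the index again

print(naive.columns.tolist())
print(restored.columns.tolist())
```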


<div style="text-align: right"> last edited on 2020.12.12 </div> 