# Week 5 Assignment

## Prelude

In this assignment, we use a dataset on labour force participation in Germany. The data
has multiple columns and comes as a DataFrame, so we will work with that: slicing and
manipulating it, adding columns, etc.

We use these data to plot line and bar graphs and calculate statistics familiar from
previous weeks. 

In [73]:
%pip install "pandas" "plotly" "nbformat>=4.2.0"

Note: you may need to restart the kernel to use updated packages.


In [74]:
import pandas as pd

pd.options.plotting.backend = "plotly"

This cell is just to get the data from the web to the correct location.

You must execute it, but please do not try to understand what it is doing!

In [75]:
try:
    # necessary within jupyteach
    import pyodide_http

    pyodide_http.patch_all()
except ImportError:
    # outside a Pyodide environment the package is absent and not needed
    pass

base_url = (
    "https://raw.githubusercontent.com/OpenSourceEconomics/ada_course_materials/main/"
)


def get_employment_data():
    url = base_url + "estat_lfsi_emp_a_cleaned.csv"
    return pd.read_csv(url, header=0, index_col=0)


def get_empshare_data():
    url = base_url + "empshare_0919.csv"
    return pd.read_csv(url, header=0, index_col=0)


def get_active_data():
    url = base_url + "estat_lfsi_act_a_cleaned.csv"
    return pd.read_csv(url, header=0, index_col=0)

## Exercise 1

First, we need to load the data.

In [76]:
employment_df = get_employment_data()

**1.1** Employment Dataset

`employment_df` is a DataFrame that contains information on working-age (15-64) people
in Germany by gender for the years 2009 to 2023. All values are in thousands. Here is the
variable description:

- `F Total`: total number of working-age women
- `F Emp`: number of employed working-age women
- `M Total`: total number of working-age men
- `M Emp`: number of employed working-age men


First, look at the data and define a variable `f_emp_2021` that states how many women
were working in 2021. 

:::{note}
Report this number as an integer with units being people, not thousands of
people.
:::




In [77]:
# Values are stored in thousands, so multiply by 1,000 to get people.
f_emp_2021 = int(round(employment_df.loc[2021, "F Emp"] * 1000))

print(f"Number of women employed in 2021: {f_emp_2021:,}")


Number of women employed in 2021: 18,797,000


**1.2** Are employment and population stocks or flows? Why? 

If they are stocks, what would be the corresponding flows? If they are flows, what would
be the corresponding stocks? 

Note that we are looking for the mechanisms/separate names here if they exist, not simply the
definitions. E.g., if we were asking about prices, saying "the price level is a stock
variable and its change between two points in time is a flow variable" would not give
the full amount of points. We would be looking for the term "inflation."


`employment` and `population` are **stocks**, because they give the number of people at a **specific point in time** (as in the example above, for the _specific_ year _2021_). The values measure the current state, not a change.

A **flow** would be, for example, **employment growth** (`job_growth`) or **population growth** (`population_growth`) over a given period. These **flows** represent the changes in the stocks over time.

This parallels the price level and the inflation rate: the price level (stock) describes prices at a specific point in time, while the inflation rate (flow) is the percentage change in the price level over time. There, the **inflation rate** is the specific mechanism that captures the change in the price level.

In the same way, the **employment growth rate** is the mechanism for changes in the stock of employment, and the **population growth rate** describes changes in the stock of population.
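The link between a stock and its flow can be made concrete with a few rows of the data. The sketch below copies the number of employed women for 2009 to 2011 from the `emp_df_2009_2019` table printed in 1.4 and derives the year-to-year change. The variable names `f_emp`, `net_flow`, and `growth_rate` are illustrative only. Note that this net change does not separate people entering employment from people leaving it.

```python
import pandas as pd

# Employed women in thousands for 2009-2011, copied from the table in 1.4.
f_emp = pd.Series([17178.0, 17090.0, 17474.0], index=[2009, 2010, 2011])

# Stock: the level observed in each year.
# Flow: the change in that level from one year to the next.
net_flow = f_emp.diff()                 # absolute change, in thousands
growth_rate = f_emp.pct_change() * 100  # relative change, in percent

print(net_flow)
print(growth_rate.round(2))
```

The first year has no predecessor, so both derived series start with `NaN`.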





**1.3** What to plot?

Would you want to plot the entire `employment_df`, i.e., call
`employment_df.plot.xxx()`? Why or why not?

It would be best **not** to plot the entire `employment_df`, because the dataset also covers the COVID-19 years. Restricting the data to 2009 through 2019 first avoids distortions and yields a stable period for comparison.

**1.4** Slicing a DataFrame

Next, we want to slice our dataframe, since we want to exclude the post-Covid years from our analysis on employment. Create a DataFrame `emp_df_2009_2019` that contains only years 2009 to 2019.

:::{note}
We have not covered slicing of indices in the videos yet. However, it is a
straightforward extension of what we have seen: With `.loc[start:end]` you will get all
labels from `start` to `end` inclusive.
:::
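To illustrate the inclusive behaviour of `.loc[start:end]`, here is a minimal sketch on made-up data. The DataFrame `df` and its values are hypothetical. It contrasts label-based slicing with position-based `.iloc` slicing, which excludes the end.

```python
import pandas as pd

# Toy data with a year index; the values are made up for illustration.
df = pd.DataFrame(
    {"value": [10, 20, 30, 40, 50]},
    index=[2017, 2018, 2019, 2020, 2021],
)

by_label = df.loc[2018:2020]  # label-based: the end label 2020 is included
by_position = df.iloc[1:3]    # position-based: the end position 3 is excluded

print(by_label.index.tolist())     # [2018, 2019, 2020]
print(by_position.index.tolist())  # [2018, 2019]
```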




In [78]:
emp_df_2009_2019 = employment_df.loc[2009:2019]
print(emp_df_2009_2019)

      F Total    F Emp  M Total    M Emp
2009  26715.4  17178.0  27059.1  20132.0
2010  26131.5  17090.0  26235.1  19755.0
2011  26119.6  17474.0  26199.7  20069.0
2012  26150.3  17573.0  26321.2  20241.0
2013  26163.0  17817.0  26413.5  20312.0
2014  26193.9  17969.0  26524.7  20424.0
2015  26268.1  18125.0  26700.9  20533.0
2016  26525.0  18541.0  27250.6  21092.0
2017  26456.1  18678.0  27335.5  21267.0
2018  26406.5  18775.0  27126.0  21321.0
2019  26466.6  19003.0  27099.5  21517.0


**1.5** Descriptive Statistics from DataFrame

Create variables `f_emp_mean` and `m_emp_mean` that hold the mean number of employed women and men between 2009 and 2019, in thousands.




In [79]:
f_emp_mean = emp_df_2009_2019["F Emp"].mean()
m_emp_mean = emp_df_2009_2019["M Emp"].mean()

print(f"Mean number of employed women (2009 to 2019): {f_emp_mean:,.0f} thousand")
print(f"Mean number of employed men (2009 to 2019): {m_emp_mean:,.0f} thousand")


Mean number of employed women (2009 to 2019): 18,020 thousand
Mean number of employed men (2009 to 2019): 20,606 thousand


**1.6** Creating Series from DataFrames

To compare the labour force participation of men and women, we want to know the share of
working-age women and men who are employed. To do this, create two pandas Series
`f_empshare_2009_2019` and `m_empshare_2009_2019` that contain the corresponding shares,
computed from `emp_df_2009_2019` for each year.




In [80]:
# Employed divided by total working-age population, expressed in percent.
f_empshare_2009_2019 = emp_df_2009_2019["F Emp"] / emp_df_2009_2019["F Total"] * 100
m_empshare_2009_2019 = emp_df_2009_2019["M Emp"] / emp_df_2009_2019["M Total"] * 100

print("Share of working-age women employed, 2009 to 2019 (%):")
print(f_empshare_2009_2019)
print("Share of working-age men employed, 2009 to 2019 (%):")
print(m_empshare_2009_2019)

Share of working-age women employed, 2009 to 2019 (%):
2009    64.299992
2010    65.399996
2011    66.899953
2012    67.199994
2013    68.099989
2014    68.599941
2015    69.000042
2016    69.900094
2017    70.599975
2018    71.099919
2019    71.799929
dtype: float64
Share of working-age men employed, 2009 to 2019 (%):
2009    74.400109
2010    75.299885
2011    76.600114
2012    76.899989
2013    76.900070
2014    76.999928
2015    76.900030
2016    77.400131
2017    77.799930
2018    78.599867
2019    79.399989
dtype: float64


**1.7** Creating a new DataFrame

Finally, create a new DataFrame `empshare_2009_2019` that has both series from 1.6 as
columns of data. Give the columns names "F Emp Share" and "M Emp Share".




In [92]:
empshare_2009_2019 = pd.DataFrame({
    "F Emp Share": f_empshare_2009_2019,
    "M Emp Share": m_empshare_2009_2019,
})

**1.8** Overall Change between 2009 and 2019.

Based on your new dataset `empshare_2009_2019`, calculate the overall change in the
employment rate for males between 2009 and 2019. Assign this overall change to variable
`m_change_2009_2019`. Alternatively, you can also use the series `m_empshare_2009_2019`
from 1.6.




In [94]:
# Relative change of the 2019 rate compared with the 2009 rate, in percent.
m_change_2009_2019 = (m_empshare_2009_2019.loc[2019] - m_empshare_2009_2019.loc[2009]) / m_empshare_2009_2019.loc[2009] * 100
print(f"Relative change in the male employment rate (2009-2019): {m_change_2009_2019:,.5f} %")

Relative change in the male employment rate (2009-2019): 6.72026 %
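A change in a rate can be stated in percentage points or as a percentage of the starting level, and the two numbers differ. The sketch below shows both, using the 2009 and 2019 male employment rates copied from the output of 1.6. The variable names are illustrative, and which measure the exercise intends is left open here.

```python
# Male employment rates in percent, copied from the 1.6 output.
rate_2009 = 74.400109
rate_2019 = 79.399989

# Difference of the two rates: a change in percentage points.
change_pp = rate_2019 - rate_2009

# Difference relative to the starting level: a change in percent.
change_pct = (rate_2019 - rate_2009) / rate_2009 * 100

print(f"{change_pp:.2f} percentage points")  # 5.00 percentage points
print(f"{change_pct:.2f} %")                 # 6.72 %
```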


**1.9** Yearly Change

Finally, calculate the average yearly change of the employment rate based on
`m_change_2009_2019`. Store this value to variable `m_change_2009_2019_yrly`.

:::{note}
It is fine to use the arithmetic mean, we are just looking for a rough indication and
compounding is not extremely important here. If you want to be correct and fancy, feel
free to use the geometric mean!
:::
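For readers who want to try the geometric mean mentioned in the note, here is a minimal sketch. It assumes the 2009 and 2019 male employment rates from the output of 1.6 and compares the arithmetic shortcut with the compounded average. The names `arithmetic` and `geometric` are illustrative.

```python
# Male employment rates in percent, copied from the 1.6 output.
rate_2009 = 74.400109
rate_2019 = 79.399989
n_years = 2019 - 2009  # ten yearly steps

total_growth = rate_2019 / rate_2009  # growth factor over the whole period

# Arithmetic mean: spread the total relative change evenly over the years.
arithmetic = (total_growth - 1) / n_years * 100

# Geometric mean: the constant yearly rate that compounds to the total growth.
geometric = (total_growth ** (1 / n_years) - 1) * 100

print(f"Arithmetic mean: {arithmetic:.4f} % per year")
print(f"Geometric mean:  {geometric:.4f} % per year")
```

The geometric figure is slightly smaller because each year's growth builds on the previous year's level.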






In [93]:
# Arithmetic mean over the ten yearly steps from 2009 to 2019.
m_change_2009_2019_yrly = m_change_2009_2019 / (2019 - 2009)
print(f"Average yearly relative change in the male employment rate (2009-2019): {m_change_2009_2019_yrly:,.5f} % per year")


Average yearly relative change in the male employment rate (2009-2019): 0.67203 % per year


## Exercise 2

**2.1** Which types of graphs?

Now, we are ready to plot employment rates by gender. Which types of graphs will you use
to depict 
1. the mean employment rates of men and women over the period 2009-2019?
2. the changes in the two employment rates over this time period?

Explain your choices.

1. A **bar chart** (`plot.bar`) is the best fit here. The mean values for men and women can be shown as two bars, so the comparison is visible at a glance.
2. A **line chart** (`plot.line`) is the best fit here. It shows the year-to-year movements and makes it much easier to compare how the two rates developed.

We will work with a downloaded version of the employment shares so that your answer here
does not depend on whether you solved the previous exercise.

In [86]:
empshare_2009_2019_download = get_empshare_data()

**2.2** Plotting, Part 1

Based on your answer to 2.1, plot employment rates by gender, averaged over the period
2009-2019
- There should be one graph only.
- Give the graph a fitting title.
- Give labels to both axes.
- You do not need to worry about the ticks on either axis.
- Remove the legend if it does not contain any useful information. 





In [87]:

# Average each gender's employment share over 2009-2019.
mean_female_emp = empshare_2009_2019_download["F Emp Share"].mean()
mean_male_emp = empshare_2009_2019_download["M Emp Share"].mean()

# One row per gender so that each mean becomes one bar.
mean_emp_df = pd.DataFrame({
    "Gender": ["Women", "Men"],
    "Mean employment rate": [mean_female_emp, mean_male_emp],
})

fig = mean_emp_df.plot.bar(
    x="Gender",
    y="Mean employment rate",
    title="Mean employment rate of women and men (2009 to 2019)",
)

fig.update_layout(
    xaxis_title="Gender",
    yaxis_title="Mean employment rate (%)",
    showlegend=False,
)

fig.show()


**2.3** Plotting, Part 2

Based on your answer to 2.1, plot changes in employment rates by gender over the period
2009-2019.
- There should be one graph only.
- Give the graph a fitting title.
- Give labels to both axes.
- You do not need to worry about the ticks on either axis.
- Remove the legend if it does not contain any useful information. 

Again, use `empshare_2009_2019_download` as the data source.




**2.4** Employment share trends

Based on the graph you created in 2.3, what are the overall trends of employment rate in
Germany between 2009-2019 for males and females? That is, has one been increasing and
the other decreasing, does one have a U-shaped or inverted U-shaped trend, etc.?

**2.5** Comparing changes

Based on the graph, which gender saw larger changes (in percentage points) in employment rate between 2010 and 2016? How much did the employment rate (roughly) change for that gender?

### Female labour force participation

Finally, we have data on the share of women who participate in the labour force.
Participating or being "active" in the labour force means that an individual is either
employed or actively seeking work. Typically, the status of "actively seeking work" is
measured by being registered as unemployed. Conversely, students, stay-at-home parents,
early retirees, people disabled for work, etc., are considered to be outside of the
labour force, or inactive. 


The `df_active` data contains both employment rate and labour force participation rate
(active) for females between 2009 and 2023. 

In [None]:
df_active = get_active_data()

**2.6** Plot the contents of `df_active`.

Do so in a way that is suitable for comparing changes over time.
- There should be one graph only.
- Give the graph a fitting title.
- Give labels to both axes.
- You do not need to worry about the ticks on either axis.
- Remove the legend if it does not contain any useful information. 




**2.7** Reading changes off the graph

Based on the figure, how much has the employment rate increased between 2009 and 2023
(you can hover over the graph to see exact values)? What about labour force participation
("active")?

**2.8** Decomposing changes

Based on the graph and your calculations from the previous exercise, how much has the
unemployment rate changed for women? Do changes in labour force participation or in
unemployment rate explain more of the employment rate change (in terms of changes in
percentage points)?
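The relationship between the three rates can be sketched as follows. The example assumes that both the employment rate and the participation rate are expressed in percent of the working-age population. All numbers are hypothetical and are not taken from `df_active`, and the helper `unemployment_rate` is an illustrative name rather than part of the course materials.

```python
def unemployment_rate(emp_rate, active_rate):
    """Unemployed as a percentage of the labour force.

    Both inputs are in percent of the working-age population.
    """
    return (1 - emp_rate / active_rate) * 100


# Hypothetical start and end values, NOT taken from df_active.
emp_start, emp_end = 60.0, 70.0  # employment rate
act_start, act_end = 66.0, 73.0  # labour force participation rate

u_start = unemployment_rate(emp_start, act_start)
u_end = unemployment_rate(emp_end, act_end)

# In percentage points of the population, the employment rate change equals
# the participation change minus the change in the unemployed share,
# where the unemployed share is (active - employed).
d_emp = emp_end - emp_start
d_active = act_end - act_start
d_unemployed_share = (act_end - emp_end) - (act_start - emp_start)

print(f"Unemployment rate: {u_start:.1f} % -> {u_end:.1f} %")
print(f"Employment rate change:  {d_emp:+.1f} pp")
print(f"  from participation:    {d_active:+.1f} pp")
print(f"  from fewer unemployed: {-d_unemployed_share:+.1f} pp")
```

Note that the unemployed share of the population used in the decomposition differs from the unemployment rate, which is measured relative to the labour force.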

## Exercise 3

In this exercise, we want to calculate the aggregate employment share and plot it along
with the employment shares by gender.

Again, we first download the employment shares by gender and assign them to a new
DataFrame. This is simply so we do not get into conflict with previous exercises.

In [88]:
empshare_with_total = get_empshare_data()

**3.1** Calculating the aggregate employment share

Create a series `total_empshare` containing the share of employed over the period
2009-2019, regardless of gender.




In [89]:
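# Unweighted mean of the two shares; adding them up would not give a share.
# This approximates the aggregate because the male and female working-age
# populations are of nearly equal size.
total_empshare = (empshare_with_total["F Emp Share"] + empshare_with_total["M Emp Share"]) / 2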
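The aggregate share is the total number of employed people divided by the total working-age population, which is the same as a population-weighted average of the two gender shares. The sketch below checks this on the 2009 and 2019 rows copied from the table printed in 1.4. The names `levels`, `exact`, `weighted`, `simple_mean`, and `naive_sum` are illustrative. It also shows how far an unweighted mean and a plain sum of the two shares are from the exact value.

```python
import pandas as pd

# 2009 and 2019 rows copied from the table in 1.4 (thousands of people).
levels = pd.DataFrame(
    {
        "F Total": [26715.4, 26466.6],
        "F Emp": [17178.0, 19003.0],
        "M Total": [27059.1, 27099.5],
        "M Emp": [20132.0, 21517.0],
    },
    index=[2009, 2019],
)

# Exact aggregate share: all employed over the whole working-age population.
total_pop = levels["F Total"] + levels["M Total"]
exact = (levels["F Emp"] + levels["M Emp"]) / total_pop * 100

# The same number, written as a population-weighted average of both shares.
f_share = levels["F Emp"] / levels["F Total"] * 100
m_share = levels["M Emp"] / levels["M Total"] * 100
f_weight = levels["F Total"] / total_pop
weighted = f_weight * f_share + (1 - f_weight) * m_share

simple_mean = (f_share + m_share) / 2  # close, since both groups are similar in size
naive_sum = f_share + m_share          # not a share: exceeds 100 %

print(pd.DataFrame({
    "exact": exact,
    "weighted": weighted,
    "simple_mean": simple_mean,
    "naive_sum": naive_sum,
}).round(2))
```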

**3.2** Including total employment in the DataFrame

Now assign the series `total_empshare` to a column in the DataFrame
`empshare_with_total` with the name `Total Emp Share`.

:::{note}
In order to do so, just use standard column indexing on the left-hand side of the
assignment operator, i.e., `df["new_column"] = some_series`.
:::




In [90]:
# Unweighted mean of the two shares (adding them up would not give a share).
total_empshare = (empshare_with_total["F Emp Share"] + empshare_with_total["M Emp Share"]) / 2
empshare_with_total["Total Emp Share"] = total_empshare


**3.3** Plot all employment shares

Plot the evolution of employment rates by gender and the total employment rate for the period 2009-2019.
- There should be one graph only.
- Give the graph a fitting title.
- Give labels to both axes.
- You do not need to worry about the ticks on either axis.
- Remove the legend if it does not contain any useful information. 




In [91]:
# One line per column: female, male, and total employment share.
fig = empshare_with_total.plot.line(
    title="Employment shares by gender and in total (2009-2019)"
)

fig.update_layout(
    xaxis_title="Year",
    yaxis_title="Employment share (%)",
    showlegend=True,  # the legend distinguishes the three lines
)

fig.show()