# Introduction to Model/Approach

## Goal of this tutorial

The goal of this tutorial is to present the features of the European labor market in Python, consistently and with commentary. From a programming perspective, the main objective is working with data. From an economic perspective, the main objective is implementing an economic model and using programming to demonstrate features that are otherwise more difficult to demonstrate.

This tutorial will focus mainly on the learnings from a programming perspective. However, this introduction will set up the background from an economic side to orientate the reader. There will be additional economic commentary throughout the sections.

## A simple model of the labor market

The simple model of the labor market used in this tutorial categorizes a working-age person into one of three states:
1. Employed E
2. Unemployed U
3. Inactive or out of the labor force N

Using these three states, you can analyze the following labor market indicators:

* **Participation Rate**: $$ \frac{E + U}{E + U + N}\label{eq1}\tag{1} $$

* **Employment Rate**: $$ \frac{E}{E + U + N}\label{eq2}\tag{2} $$

* **Unemployment Rate**: $$ \frac{U}{E + U}\label{eq3}\tag{3} $$
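
The three indicators can be computed directly from the counts of people in each state. The following is a minimal sketch: the function name `labor_market_rates` and the numbers are made up for illustration and are not Eurostat figures.

```python
def labor_market_rates(employed, unemployed, inactive):
    """Return the participation, employment and unemployment rates (equations 1 to 3)."""
    labor_force = employed + unemployed
    working_age = labor_force + inactive
    return {
        "participation_rate": labor_force / working_age,
        "employment_rate": employed / working_age,
        "unemployment_rate": unemployed / labor_force,
    }


# Hypothetical counts in thousands of persons, not real data
rates = labor_market_rates(employed=600, unemployed=50, inactive=350)
for name, value in rates.items():
    print(f"{name}: {value:.1%}")
```

Note that the unemployment rate uses the labor force (E + U) as its denominator, while the other two rates use the whole working-age population.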

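The flows of people from one state to another are called **transitions**. Transition rates are calculated from data that record how many people are in each state at the beginning and at the end of a period: the number of people who changed state is divided by the number of people in the original state at the beginning of the period.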

For example, UE transition rates represent the percentage of people unemployed at the beginning of the period that transition to being employed.

The UE transition rate can be used to approximate how likely it is for an unemployed person to find a job (the job finding probability).

Transition rates will be elaborated upon later in the tutorial, in the sections where they are used.
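
As a small illustration of the UE transition rate described above, the sketch below divides the number of people who moved from unemployment to employment by the number of unemployed people at the start of the period. The helper `transition_rate` and the numbers are hypothetical.

```python
def transition_rate(stock_at_start, movers):
    """Share of people in a state at the start of the period who moved to another state."""
    return movers / stock_at_start


# Hypothetical numbers: 200 unemployed at the start of the quarter,
# 30 of them are employed at the end of the quarter
ue_rate = transition_rate(stock_at_start=200, movers=30)
print(f"UE transition rate: {ue_rate:.1%}")
```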

# Data

## Eurostat Data

In order to compare the dynamics of transition rates and unemployment rates across countries, data needs to be collected systematically and reliably. All data used in this tutorial is extracted from Eurostat, an administrative branch of the European Commission located in Luxembourg. Its responsibility is to provide statistical information to the institutions of the European Union and to encourage the harmonisation of statistical methods in order to ease comparison between data. In this section, we will discuss how Eurostat gathers data and how reliable its operations are. Eurostat publishes its statistical database online for free on its [website](https://ec.europa.eu/eurostat).

The data of interest in this tutorial are those related to the **European labor market**. They come from the European Labor Force Survey, a survey coordinated by Eurostat. The data are obtained by interviewing a large sample of individuals directly. Data collection takes place on a monthly, quarterly and annual basis. The European Labor Force Survey collects data in four ways:

- Personal visits
- Telephone interviews
- Web interviews
- Self-administered questionnaires (questionnaire that has been designed specifically to be completed by a respondent without intervention of the researchers)

The overall accuracy of these methods has been shown to be high. Retrospectively, the results have been found to lie within a 95% confidence interval. For more information on how Eurostat collects its data, you can consult this [page](https://ec.europa.eu/eurostat/cros/content/data-collection_en).


## Data for this tutorial

Eurostat lets you download data as a .tsv file. TSV files are similar to CSV files but separate values with tabs instead of commas. Alternatively, you can use the Pyrostat API for Python. However, at the time of writing its documentation is not clear enough to present here. It builds on the JSON and Unicode REST API of the Eurostat website, which is beyond the scope of this tutorial. For more information see [eurostat web services](https://ec.europa.eu/eurostat/web/json-and-unicode-web-services) and [pyrostat](https://github.com/eurostat/pyrostat).

The data used for this tutorial comes from the Eurostat website, in particular the data set sdg_08_30, which is documented [here](https://ec.europa.eu/eurostat/cache/metadata/en/sdg_08_30_esmsip2.htm).

The rest of the tutorial will therefore focus on data import, cleaning and analysis, starting from the Excel files provided for this class.

## Data in Python: Pandas and Data Formats

This section gives a basic introduction to how data is handled in Python. Data can be stored in several formats. The basic in-memory formats have been introduced in past tutorials (lists, dictionaries, tuples). Data can also be stored in files:
1. Unstructured data files such as .txt files.
2. Structured data files:
    * CSV files, which separate data cells with commas and can be opened as tables in programs such as Excel.
    * TSV files, which separate data cells with tabs.

They can be imported into Python and stored in a data frame using the Python library pandas. The advantage of data frames is the wide variety of operations you can perform on them, since a data frame is a Python object with many built-in methods ([source](https://towardsdatascience.com/a-quick-introduction-to-the-pandas-python-library-f1b678f34673)).

As a pandas dataframe is structured by rows and columns, it is easier to select data from it than from a list or a dictionary. You can also easily filter by column or row in order to derive conclusions or structure the data set for analysis. Joining different pandas datasets and cleaning data are likewise easier than with other data types. These examples are elaborated upon below.

### Setting up your pandas environment

It is common to import pandas as `pd` so that it is easier to call later on.

`import pandas as pd`

Additionally, it is common to import the package *numpy* alongside it.

`import numpy as np`

Numpy is used for analysis and computing in python.

### Reading data into a dataframe

To convert structured files into a pandas data frame there are a variety of reader functions, listed here: [List of pandas functionalities](https://pandas.pydata.org/pandas-docs/stable/reference/index.html). Among others, pandas can read Excel, JSON, HTML, CSV, pickle, and SQL data. The reader functions are generally named `read_<format>`, for example:

* `pd.read_excel`
* `pd.read_pickle`
* `pd.read_json`
* `pd.read_html`
* `pd.read_csv`
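
As a minimal sketch of a reader function in action, the snippet below reads tab-separated text with `pd.read_csv` by setting `sep="\t"`. The text is read from an in-memory string so that the example is self-contained. The column names and values are invented, and real Eurostat .tsv files are laid out differently.

```python
import io

import pandas as pd

# A tiny, made-up TSV snippet (tabs between cells, one row per line)
tsv_text = "country\tyear\tunemployment_rate\nAA\t2018\t5.0\nBB\t2018\t7.5\n"

# read_csv also handles TSV data once the separator is set to a tab
sample_df = pd.read_csv(io.StringIO(tsv_text), sep="\t")
print(sample_df)
```

To read a file from disk instead, pass its path as the first argument in place of the `io.StringIO` object.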



### Creating a dataframe manually

There is also the option with pandas to create a dataframe either manually or by using existing lists or dictionaries.

For example, dataframes can be created from dictionaries.

```python
import pandas as pd

# Create a food dictionary that maps each food to its category
food_categories = {"apples": "fruit",
                   "oranges": "fruit",
                   "cucumber": "vegetable",
                   "spinach": "vegetable",
                   "beef": "meat",
                   "pepper": "vegetable",
                   "banana": "fruit"
                   }
# Create the dataframe, using "category" as the index (row label)
food_cat_df = pd.DataFrame(food_categories, index=["category"])
food_cat_df
```

| | apples | oranges | cucumber | spinach | beef | pepper | banana |
|---|---|---|---|---|---|---|---|
| category | fruit | fruit | vegetable | vegetable | meat | vegetable | fruit |


Some other useful dataframe operations, several of which are used below, are:

* Transposing, or transforming columns into rows: `df.T`
* Sorting by axis or values: `df.sort_index(axis=1)` or `df.sort_values(by='column_name')`
* Selecting a single column: `df['column_name']`
* Statistical operations such as `df.mean()`
* Merging and concatenating: `pd.merge()`, `pd.concat()` (the older `df.append()` was removed in pandas 2.0)
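
The operations listed above can be tried out on a small numeric dataframe. This is only a sketch: the dataframe names, column names and values are invented for illustration.

```python
import pandas as pd

# Made-up example data, for illustration only
df = pd.DataFrame(
    {"price": [1.2, 0.8, 2.5], "quantity": [10, 25, 4]},
    index=["apples", "oranges", "beef"],
)

transposed = df.T                      # rows become columns and vice versa
by_price = df.sort_values(by="price")  # sort rows by the values of one column
prices = df["price"]                   # select a single column (a Series)
means = df.mean()                      # mean of every numeric column

# Merge with a second (hypothetical) dataframe that shares the same index
categories = pd.DataFrame(
    {"category": ["fruit", "fruit", "meat"]},
    index=["apples", "oranges", "beef"],
)
merged = pd.merge(df, categories, left_index=True, right_index=True)

# Stack two dataframes on top of each other
extra = pd.DataFrame({"price": [3.0], "quantity": [7]}, index=["spinach"])
combined = pd.concat([df, extra])
print(combined)
```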

The pandas documentation provides a very good description of what you can do with dataframes and if there is something that interests you beyond the application of this tutorial, it may very likely be found at [Pandas Documentation](https://pandas.pydata.org/pandas-docs/stable/getting_started/overview.html).

# Set-Up

## Import all packages

Here, we will import all the packages that are used in the program. As packages have not been introduced in past tutorials, we will explain them in Python briefly.

First, if you run a Python program locally, for example in Spyder, you need to install each package on your computer or in a virtual environment (venv or virtualenv). Otherwise, Python will not know what you are referring to when you import it. Virtual environments allow you to separate packages and versions for individual projects. For more information see [section for creating virtual environments](https://packaging.python.org/tutorials/installing-packages/). We suggest using [pip](https://pip.pypa.io/en/stable/user_guide/), which allows you to install packages from the [Python Package Index](https://packaging.python.org/tutorials/installing-packages/).

Once a package has been installed, you can import it into your code.
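
If you are unsure whether a package is installed, you can check before importing it. The sketch below uses `importlib.util.find_spec` from the standard library. The list `required` simply names the packages imported in this section.

```python
import importlib.util

# Packages imported in this section of the tutorial
required = ["pandas", "numpy", "matplotlib", "statsmodels"]

# find_spec returns None when a top-level package cannot be found
missing = [name for name in required if importlib.util.find_spec(name) is None]

if missing:
    print("Missing packages, install them with: pip install " + " ".join(missing))
else:
    print("All required packages are installed.")
```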

However, for the purpose of this tutorial, the Jupyter environment does not require you to pip install the following packages. For convenience and clarity, they are all imported in this section, and we will point out the specific libraries in notes throughout the tutorial.

```python
import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
from pandas import ExcelWriter
from pandas import ExcelFile
import math
import statsmodels.api as sm  # conventional import; statsmodels.api exposes the commonly used functions
```

## Define the Folder

Our data is in three different local folders:

* `Total/` for all the working-age population.
* `Female/` for only the working-age females.
* `Male/` for only the working-age males.

In this case it makes sense to specify which folder to use in advance. If we want to change from `Total/` to `Female/` later on, we can simply change the variable `sex` instead of searching for the folder name throughout the code. The same holds for the variable `myfolder`: if the location of the data changes later on, we can apply that change right here instead of searching for every place where we used that filepath.

In general, the easiest way to find locally saved files with Python is to use a relative filepath. This means Python starts searching for the file from the location of the Python code we are currently working with. Here, the folder *EuropesLM_Data/* is located within the same folder as the notebook. From there, Python proceeds by entering the *Male/* folder and then accesses whatever we specify later on. To execute everything the way we do in this tutorial, you should place the Jupyter notebook in the same folder as the file *country_codes.xlsx* and the folder *EuropesLM_Data*.

On Windows, relative filepaths work as well. If a file is not found, you can instead use a full filepath, starting at the `C:/` directory and specifying all steps from there.

```python
sex = 'Male'
myfolder = "EuropesLM_Data/" + sex + "/"
```
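
As an alternative to string concatenation, the same relative path can be built with `pathlib` from the standard library, which picks the right path separator for the operating system. This is a sketch, not the approach used in the rest of the tutorial. The names `myfolder_path` and `example_file` and the file name `example.xlsx` are hypothetical.

```python
from pathlib import Path

sex = "Male"

# Same relative location as above, built with pathlib instead of string concatenation
myfolder_path = Path("EuropesLM_Data") / sex

# Hypothetical file name, used only to show how a full path is put together
example_file = myfolder_path / "example.xlsx"
print(example_file)

# True only if the folder actually exists next to the notebook
print(myfolder_path.is_dir())
```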

# Unemployment Rates

Almost all countries in the world have developed a system and the infrastructure to record and approximate the unemployment rate of the labor force as accurately as possible. To understand its significance, we need a clearer and more detailed definition than the one provided earlier. There are numerous definitions, since the methodology for calculating the unemployment rate often varies among countries. Different definitions of employment and unemployment, as well as different data sources, are used, but the consensus is the following: unemployed people are those who are willing and available to work, and who have actively sought work within the past four weeks. People who are not available for work or not looking for it, for example full-time students, prisoners, or people unable to work because of a disability, do not match this definition and are counted as out of the labor force rather than unemployed.

To calculate the unemployment rate, the number of unemployed people is divided by the number of people in the labor force, which consists of all employed and unemployed people.

***

**Unemployment rate**

$$\textrm{Unemployment rate} = \frac{\textrm{Number of unemployed people}}{\textrm{Total labor force}} = \frac{U}{E + U}$$

***
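
With data in a pandas dataframe, the same formula can be applied to every row at once. The sketch below assumes a dataframe with one row per country and columns `employed` and `unemployed`. The country labels and counts are invented and are not Eurostat data.

```python
import pandas as pd

# Hypothetical counts in thousands of persons; labels and numbers are invented
labor_df = pd.DataFrame(
    {"employed": [900, 425], "unemployed": [100, 75]},
    index=["Country A", "Country B"],
)

# Labor force = employed + unemployed, computed row by row
labor_df["labor_force"] = labor_df["employed"] + labor_df["unemployed"]

# Unemployment rate = unemployed / labor force
labor_df["unemployment_rate"] = labor_df["unemployed"] / labor_df["labor_force"]
print(labor_df)
```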

Before we plot the unemployment rate for different countries, we want to give a little introduction to plotting in general.

## Matplotlib

The most common library for plotting in python is [`Matplotlib`](https://matplotlib.org/stable/index.html). We are going to introduce its most important functionalities with some basic examples in the following section. In the following parts, we will further specify the functionalities when we use them.

```python
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import math
import pandas as pd

```

To show the basic functionality of Matplotlib, we create a sample plot here.

* The two lists passed to `plt.plot` contain the x and y coordinates. To refer to the line later, for example in a legend, we can add the argument `label`.
* If we want to see the plot in the Jupyter Notebook, we have to call the `plt.show()` function.

```python
plt.plot([1, 5, 3, 4, 7, 9],[1, 3, 3, 5, 7, 9], label = "Random Graph")
plt.show()
```
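
The `label` argument is stored with the line but is only drawn once a legend is requested. The following sketch plots the same points using Matplotlib's figure and axes objects, then adds axis labels and a legend. The axis label texts are placeholders.

```python
import matplotlib.pyplot as plt

# Create a figure with a single set of axes
fig, ax = plt.subplots()

# Same sample points as above; the label is used by the legend
ax.plot([1, 5, 3, 4, 7, 9], [1, 3, 3, 5, 7, 9], label="Random Graph")

ax.set_xlabel("x")  # placeholder axis labels
ax.set_ylabel("y")
ax.legend()         # without this call the label is stored but never drawn

plt.show()
```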
