<a href="https://www.kaggle.com/code/stefanraychev/analysis-of-lifelong-learning?scriptVersionId=98365831" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

Lifelong learning is an important element of a country's economic and social system. On the one hand, it raises the quality of the workforce by keeping knowledge and skills up to date. On the other hand, knowledge is the main tool against the defects of the democratic and market system: corruption, populism, labor and market exploitation, market and non-market imperfections, etc. For this reason, the analysis and promotion of lifelong learning are of vital importance for any government policy.

In this analysis, we take a brief look at the process of lifelong learning in Europe. The analysis uses Eurostat's database and Eurostat's definition of lifelong learning: "Lifelong learning covers all lifelong learning activities aimed at improving knowledge, skills and competences in personal, civic, social or employment-related perspectives. The intention or goal of learning is the critical point that distinguishes these activities from non-learning activities, such as cultural or sports activities.

Participation in education and training is a measure of lifelong learning. The participation rate in education and training covers participation in formal and non-formal education and training. In this section the reference period for the participation in education and training is the four weeks prior to the interview. Participation rates in education and training for various age groups and by different breakdowns are presented.

The data shown are calculated as annual averages of quarterly EU Labour Force Survey data (EU-LFS)."

For more information about the data and methodology: https://ec.europa.eu/eurostat/cache/metadata/en/trng_lfs_4w0_esms.htm

In this first step we take a brief exploratory look at participation in education and training (which covers both formal and non-formal education and training) over an 18-year period from 2004 to 2021, broken down by sex, education level and age. The data include EU member states and other European countries.

Let's first install the eurostat Python package.

In [None]:
!pip install 'eurostat'



The next step is to import the libraries and tools used in the analysis.


In [None]:
import eurostat
import warnings
import pandas as pd
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt
import pytz
import seaborn as sns
import scipy.stats as stats
import statsmodels as sm
import plotly.express as px
from pandas import DataFrame
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"
warnings.filterwarnings('ignore')

Let's import the data and take a look.

In [None]:
df_lll = eurostat.get_data_df("trng_lfs_02")
df_lll.head(5)

Before the next step: whenever we work with data imported from Eurostat, it is a good idea to rename the column "geo\time". The backslash in the name makes the column awkward to reference later. I chose to rename it to simply "geo".

In [None]:
df_lll.rename({"geo\\time":"geo"}, inplace=True, axis=1)
df_lll.head(5)
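The rename above depends on the exact column label. As a minimal sketch, assuming the label may differ between versions of the eurostat package (an assumption, not something this notebook shows), a small helper can rename whichever column starts with `geo\`. The helper name `rename_geo_column` and the toy table are hypothetical and only illustrate the idea.

```python
import pandas as pd


def rename_geo_column(df: pd.DataFrame, new_name: str = "geo") -> pd.DataFrame:
    # Hypothetical helper: rename the first column whose label starts
    # with "geo" followed by a backslash, whatever comes after it.
    for col in df.columns:
        if isinstance(col, str) and col.startswith("geo\\"):
            return df.rename(columns={col: new_name})
    return df  # no matching column, return the table unchanged


# Toy table standing in for the Eurostat data (values are made up)
toy = pd.DataFrame({"sex": ["F", "M"], "geo\\time": ["BG", "DK"], 2021: [1.0, 2.0]})
renamed = rename_geo_column(toy)
print(list(renamed.columns))
```

Returning a new DataFrame instead of using `inplace=True` keeps the original table available for comparison.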

After importing the data, it is important to understand the parameters and dimensions of the data, including data size, columns, rows, etc.

In [None]:
df_lll.shape
df_lll.columns
df_lll.unit.unique()
df_lll.age.unique()
df_lll.sex.unique()
df_lll.isced11.unique()
df_lll.geo.unique()

A quick glance at the imported dataset shows that it has 6563 rows and 23 columns.
* The only unit of measure is percentage (PC).
* There are 10 age groups: several broad age ranges and one for young people (18-24).
* Sex can be Total, Male or Female.
* There are 4 education levels: ED0-2 (pre-primary, primary and lower secondary education, levels 0-2), ED3-8 (upper secondary, post-secondary non-tertiary and tertiary education, levels 3-8), ED3_4 (upper secondary and post-secondary non-tertiary education, levels 3 and 4) and ED5-8 (tertiary education, levels 5-8). There are also the categories TOTAL and NRP (no response).
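The per-column checks above can also be collected in a single pass. The following is a minimal sketch on a made-up toy table; the column list `dims` is an assumption about which columns hold categories rather than yearly values.

```python
import pandas as pd

# Toy table standing in for the Eurostat data (values are made up)
toy = pd.DataFrame({
    "unit": ["PC", "PC", "PC"],
    "sex": ["F", "M", "T"],
    "age": ["Y18-74", "Y18-24", "Y18-74"],
    2021: [12.0, 30.0, 11.5],
})

# Columns assumed to hold categories rather than yearly values
dims = ["unit", "sex", "age"]

# Map each categorical column to its sorted list of distinct values
summary = {col: sorted(toy[col].unique()) for col in dims}

print(toy.shape)
for col, values in summary.items():
    print(col, values)
```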


Now, for convenience, let's map the country codes to country names, copy the education column (isced11) to a clearer name, and drop the columns we no longer need.

In [None]:
df_lll["country"] = df_lll["geo"].replace(pytz.country_names)
df_lll["education"] = df_lll["isced11"]
df_lll.drop(["geo", "unit" , "isced11"], axis=1, inplace=True)
df_lll.head()
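One caveat with this mapping: `pytz.country_names` is keyed by ISO 3166 country codes, while Eurostat uses `EL` for Greece and `UK` for the United Kingdom, so those two codes, like aggregates such as `EU28`, are left as they are. A minimal sketch of how they could be patched, using a small hand-written dictionary in place of the full mapping (both dictionaries below are illustrative assumptions):

```python
import pandas as pd

# Illustrative subset of ISO code -> name pairs (not the full pytz mapping)
iso_names = {"BG": "Bulgaria", "DK": "Denmark", "GR": "Greece", "GB": "United Kingdom"}

# Eurostat codes that differ from ISO 3166 (assumed patch table)
eurostat_extra = {"EL": "Greece", "UK": "United Kingdom"}

geo = pd.Series(["BG", "DK", "EL", "UK", "EU28"])

# Codes missing from the mapping (here the aggregate EU28) stay unchanged
country = geo.replace({**iso_names, **eurostat_extra})
print(country.tolist())
```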

The table looks almost right. Now let's rearrange the columns into a more logical order.

In [None]:
df_lll = df_lll[list(df_lll.columns[:2]) + list(df_lll.columns[:1:-1])]
df_lll.head()
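The slicing expression is compact, so here is a minimal sketch of what it does on a toy table with the same layout (two leading columns, year columns in descending order, then `country` and `education`). The layout and the values are assumptions for illustration.

```python
import pandas as pd

# Toy table with an assumed layout mirroring the one above (values are made up)
toy = pd.DataFrame(
    [["Y18-74", "F", 12.0, 11.0, 10.0, "Bulgaria", "TOTAL"]],
    columns=["age", "sex", 2021, 2020, 2019, "country", "education"],
)

cols = toy.columns
# cols[:2]    -> the first two columns, kept in place
# cols[:1:-1] -> everything from the last column back to index 2, reversed,
#                so the labels come first and the years run in ascending order
new_order = list(cols[:2]) + list(cols[:1:-1])
toy = toy[new_order]
print(list(toy.columns))
```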

A similar approach to data manipulation is used in the visualization steps that follow. First, the unnecessary data are removed (or the necessary data selected); second, the table is reshaped into a form suitable for plotting.

Lifelong learning in EU28 by education and sex

We remove ED3-8 and NRP from the education levels and keep only the 18-74 age group.

In [None]:
df_EU28=df_lll[(df_lll['age']=="Y18-74") & (df_lll['country']=="EU28") & (df_lll['education'] != "NRP") & (df_lll['education'] != "ED3-8")]
df_EU28g=df_EU28.melt(id_vars=["country", "age", "sex" ,"education"], 
        var_name="Date", 
        value_name="Value")
df_EU28g.head()
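To make the reshaping concrete, here is a minimal sketch of `melt` on a toy wide table with two year columns. The values are made up; only the column names follow the ones used above.

```python
import pandas as pd

# Toy wide table: one row per group, one column per year (values are made up)
wide = pd.DataFrame({
    "country": ["EU28", "EU28"],
    "age": ["Y18-74", "Y18-74"],
    "sex": ["F", "M"],
    "education": ["TOTAL", "TOTAL"],
    2004: [10.0, 9.0],
    2005: [11.0, 9.5],
})

# Each year column becomes rows: one (Date, Value) pair per group and year
long = wide.melt(
    id_vars=["country", "age", "sex", "education"],
    var_name="Date",
    value_name="Value",
)
print(long)
```

The long format has one observation per row, which is the shape seaborn expects for the `x`, `y` and `hue` arguments.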

In [None]:
plt.figure(figsize=(10,4))
ax = sns.boxplot(x='education', y='Value', data=df_EU28g, hue="sex")
plt.title("EU28 - participation rate by sex and education level", loc="left")
plt.show()

The EU28 graph above shows the distribution of the participation rate by sex and level of education. Participation is highest at education level ED5-8, averaging about 20%; level ED3_4 averages about 16%, and the lowest level, ED0-2, about 10%.

Women have the higher participation rate at education levels ED5-8 and ED3_4, whereas at level ED0-2 men participate more.

Looking at the total across education levels, the average participation rate for the EU28 is around 14%, and women have a higher rate than men.

Now let's check the descriptive statistics to confirm these conclusions for the EU28.

First, we rearrange the data by sex.

In [None]:
df_EU28_sex = df_EU28.set_index(['sex', 'age', "education" , "country"]).rename_axis(['Year'], axis=1).stack().unstack("sex").reset_index()
df_EU28_sex.head()
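The chain of `set_index`, `stack` and `unstack` can be hard to follow, so here is a minimal sketch of the same pattern on a toy table (values are made up): the years move from columns into rows, and the `sex` categories move from rows into columns.

```python
import pandas as pd

# Toy wide table: one row per sex, one column per year (values are made up)
wide = pd.DataFrame({
    "sex": ["F", "M", "T"],
    "age": ["Y18-74"] * 3,
    "education": ["TOTAL"] * 3,
    "country": ["EU28"] * 3,
    2004: [10.0, 9.0, 9.5],
    2005: [11.0, 9.5, 10.25],
})

by_sex = (
    wide.set_index(["sex", "age", "education", "country"])
    .rename_axis("Year", axis=1)  # name the column axis so the stacked level is "Year"
    .stack()                      # years: columns -> innermost index level
    .unstack("sex")               # sex: index level -> columns F, M, T
    .reset_index()
)
print(by_sex)
```

With one column per sex, `describe()` on `["F", "M", "T"]` summarizes each group side by side.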

In [None]:
df_EU28_sex[['F', 'M', 'T']].describe()

Second, we rearrange the data by education level.

In [None]:
df_EU28_edu = df_EU28.set_index(['sex', 'age', "education" , "country"]).rename_axis(['Year'], axis=1).stack().unstack("education").reset_index()
df_EU28_edu.head()

In [None]:
df_EU28_edu[["ED0-2","ED3_4","ED5-8", "TOTAL"]].describe()

To summarize, the mean participation rate in the EU28 by sex is: female 15.43%, male 13.60%, total 14.51%; by education level it is: ED0-2 9.4%, ED3_4 15.2%, ED5-8 19.12%, TOTAL 14.34%.
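Group means like these can also be computed directly from a long-format table with `groupby`, without reshaping first. A minimal sketch on made-up values (the numbers below are not the Eurostat figures):

```python
import pandas as pd

# Toy long table: one row per (sex, education) observation (values are made up)
long = pd.DataFrame({
    "sex": ["F", "F", "M", "M"],
    "education": ["ED0-2", "ED5-8", "ED0-2", "ED5-8"],
    "Value": [8.0, 20.0, 10.0, 18.0],
})

# Mean of Value within each category of the grouping column
mean_by_sex = long.groupby("sex")["Value"].mean()
mean_by_edu = long.groupby("education")["Value"].mean()

print(mean_by_sex)
print(mean_by_edu)
```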

Now let's look at the participation rate by sex for all countries.

In [None]:
df_all=df_lll[(df_lll['age']=="Y18-74") & (df_lll['education']=="TOTAL")]
df_allg=df_all.melt(id_vars=["country", "age", "sex" ,"education"], 
        var_name="Date", 
        value_name="Value")
df_allg.head()

In [None]:
g_lll_all = sns.catplot(x="sex", y="Value", col="country", col_wrap=8, data=df_allg, kind="box", height=2.4, aspect=0.9);

The countries with the highest participation rates are Denmark, Switzerland, Finland, Iceland and Sweden. The lowest rates are in Bulgaria, Croatia, Montenegro, Romania, Serbia, Slovakia and Turkey.