# Are youths disproportionately affected by the Covid-19 crisis in the labour market?

***Youth unemployment*** has long been a pressing problem in the European Union, and the Covid-19 pandemic has brought the issue to the fore. The International Labour Organization has warned that the pandemic is having a "devastating and disproportionate" impact on youth unemployment.

Many worry about a "lockdown generation" or "lost generation", as the current crisis might blight young people's long-term employment and earnings prospects. The Centre for Economic Policy Research found that one month of unemployment at ages 18-20 is associated with a lifetime income loss of 2%, as unemployed youths miss out on the skills and experience needed to climb the career ladder.

In this project, I ***investigate whether youths are disproportionately affected by the Covid-19 pandemic in the job market.*** I use monthly unemployment data by sex and age for the European Union (obtained from the EU Open Data Portal), together with an ISO dataset that maps ISO country codes to country names. I first compare the average youth unemployment rate before and after March 2020, and then use ***difference-in-differences*** (DiD) to isolate the differential impact of the pandemic on youth unemployment in Europe. This approach rests on the parallel-trends assumption: absent the pandemic, youth and adult unemployment rates would have moved in parallel, so no other factor causes them to change at different rates over the period studied.



In [None]:
# Load packages: tidyverse attaches dplyr, tidyr, readr and ggplot2, among others
library(tidyverse)
library(ggrepel)

# Input data files are available in the read-only "../input/" directory;
# list every file under it
list.files(path = "../input", recursive = TRUE)

# 1) Import data

In [None]:
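# Read every column as character: the Eurostat TSV mixes numbers with ":" (missing) and flag letters
df <- read.table(file = '/kaggle/input/unemployment-in-european-union/une_rt_m.tsv', sep = '\t', header = TRUE,
                 colClasses = "character")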

In [None]:
head(df)
# The dataset needs cleaning. Each column header after the first is a month, with monthly unemployment
# figures going back to 1983. The first column packs five comma-separated fields: seasonal adjustment,
# age group (total, <25, 25-74), data unit (percentage of active population or thousands unemployed),
# sex (male, female, both) and country

# 2) Data cleaning

In [None]:
# Split the packed first column into one column per dimension
df_separated <- separate(df, s_adj.age.unit.sex.geo.time,
                         c("seasonal_adjustment", "age_group", "data_unit", "sex", "country"),
                         sep = ",", remove = TRUE)
# Convert the new dimension columns into factors
df_separated <- mutate(df_separated, across(c(age_group, data_unit, sex, country), as.factor))
head(df_separated)
# Note that countries are identified by Eurostat geo codes, which are mostly 2-letter ISO codes
# (exceptions include EL for Greece and UK for the United Kingdom, plus aggregates such as the euro area)
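To see what `separate()` does to the packed key, here is a minimal sketch on a hypothetical two-row extract. The key strings follow the shape of the file's first column, and the values are made-up placeholders, not real data:

```r
library(tibble)
library(tidyr)

# Hypothetical extract mimicking the packed Eurostat key column (placeholder values)
raw <- tibble(
  key = c("SA,Y_LT25,PC_ACT,T,ES", "SA,Y25-74,PC_ACT,T,EL"),
  X2020M06 = c("10.0 ", "20.0 ")
)

# Split the comma-separated key into one column per dimension;
# the month column is left untouched
split_df <- separate(raw, key,
                     into = c("seasonal_adjustment", "age_group", "data_unit", "sex", "country"),
                     sep = ",")
print(split_df)
```

Each field of the key becomes its own column in the same position the key occupied, which is why `age_group:X2019M01` later selects a contiguous range.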

In [None]:
# We import the ISO dataset that maps the codes to their country names
country_codes <- read.delim(file='/kaggle/input/iso-country-codes/iso_country_codes.csv',sep=',')
country_codes <- country_codes %>% rename(country=Code.Value)
head(country_codes)

In [None]:
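# Attach country names by joining on the 2-letter code, then replace the code with the name
df_country_corrected <- df_separated %>% left_join(country_codes, by = "country")
df_country_corrected$country <- NULL
df_country_corrected <- df_country_corrected %>% rename(country = Definition)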
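A left join silently gives a missing country name to any code not found in the lookup, and rows with a missing name are dropped in the next step. Eurostat labels Greece `EL` and the United Kingdom `UK`, whereas ISO 3166-1 alpha-2 uses `GR` and `GB`, so these two countries may be dropped along with the EU aggregates. The sketch below uses small hypothetical tables (the lookup slice is an assumption, not the actual ISO file) to list unmatched codes with `anti_join()` and to show one possible recode before joining:

```r
library(dplyr)

# Hypothetical geo codes as they might appear in the Eurostat file
eurostat_codes <- tibble(country = c("ES", "EL", "UK", "EA19"))
# Hypothetical slice of an ISO 3166-1 alpha-2 lookup table
iso_lookup <- tibble(country = c("ES", "GR", "GB"),
                     Definition = c("Spain", "Greece", "United Kingdom"))

# Codes with no match in the lookup: these would end up with a missing name
unmatched <- anti_join(eurostat_codes, iso_lookup, by = "country")
print(unmatched)

# One possible fix (an assumed mapping): recode Eurostat-specific codes before joining
fixed <- eurostat_codes %>%
  mutate(country = recode(country, EL = "GR", UK = "GB")) %>%
  left_join(iso_lookup, by = "country")
print(fixed)
```

The aggregate code (`EA19` here) still has no name after the recode, so filtering out missing names continues to drop aggregates as intended.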

In [None]:
# Keep only:
# - unemployment as a percentage of the active population (PC_ACT), not thousands of people unemployed
# - seasonally adjusted figures (SA)
# - rows whose code matched a country name (this drops aggregates and unmatched codes)
# - months from January 2019 onwards

df_filtered <- df_country_corrected %>%
    filter(data_unit == "PC_ACT", !is.na(country), seasonal_adjustment == "SA") %>%
    select(age_group:X2019M01, country)

In [None]:
# Pivot the month columns into rows
df_pivoted <- df_filtered %>% pivot_longer(cols = X2020M07:X2019M01, names_to = 'month', values_to = 'unemployment_rate')
df_pivoted$data_unit <- NULL

# Convert unemployment rate to numeric: trim whitespace, drop trailing flag letters (e.g. "p") and treat ":" as missing
df_pivoted$unemployment_rate <- df_pivoted$unemployment_rate %>% trimws() %>% parse_number(na = c("", "NA", ":"))

# Tidy up the month string
df_pivoted$month <- gsub("X", "", df_pivoted$month)
df_pivoted$month <- gsub("M", "/", df_pivoted$month)
head(df_pivoted)
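Eurostat TSV cells can carry a flag letter after the value (for example `p` for provisional) and use `:` for missing data. `as.numeric()` turns a flagged cell into `NA`. Here is a minimal sketch of the pivot-and-parse steps on hypothetical cells (placeholder values and made-up flags). It uses `readr::parse_number()` to keep only the numeric part:

```r
library(tibble)
library(tidyr)
library(readr)

# Hypothetical wide extract: value plus optional flag letter, ":" for missing
wide <- tibble(
  country  = c("Spain", "Greece"),
  X2020M02 = c("10.0 ", "20.0 p"),
  X2020M01 = c(": ", "30.0 ")
)

# One row per country-month
long <- pivot_longer(wide, cols = starts_with("X"),
                     names_to = "month", values_to = "unemployment_rate")

# Trim whitespace, then keep the numeric part; ":" is treated as missing
long$unemployment_rate <- parse_number(trimws(long$unemployment_rate), na = c("", "NA", ":"))

# "X2020M02" -> "2020/02"
long$month <- sub("M", "/", sub("X", "", long$month))
print(long)
```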

# 3) Exploratory data analysis

In [None]:
# Compare how total and youth unemployment rates vary across a selection of countries
# For most countries, unemployment rises sharply in early 2020
# Youth unemployment is already high before the pandemic and increases substantially in early 2020
country_list <- c("Spain", "Italy", "Sweden", "Portugal", "Norway", "Netherlands")
df_unemp <- df_pivoted %>% filter(sex == 'T', country %in% country_list, age_group != "Y25-74")

ggplot(df_unemp, aes(x=month, y=unemployment_rate, color=country))+ geom_line(aes(group=country)) + 
    theme(axis.text.x = element_text(angle = 90), panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
    panel.background = element_blank(),axis.line = element_line(colour = "black")) +
    facet_wrap(~age_group, labeller=labeller(age_group = c("TOTAL"="Total Unemployment", "Y_LT25"="Youth Unemployment")))

In [None]:
# Compare average unemployment between youths and adults (unweighted mean across all countries)
# Consistent with the previous graph, youth unemployment is well above adult unemployment,
# and all unemployment rates rise in early 2020

df_aggregate <- df_pivoted %>% filter(sex == "T") %>% group_by(age_group, month) %>%
    summarize(unemployment_rate = mean(unemployment_rate, na.rm = TRUE))

ggplot(df_aggregate, aes(x=month, y=unemployment_rate, color=age_group))+geom_line(aes(group=age_group))+
    ggtitle("Comparison of unemployment rates for individuals below and above 25 years old")+ 
    theme(axis.text.x = element_text(angle = 90), panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
    panel.background = element_blank(),axis.line = element_line(colour = "black")) +
    scale_color_hue(breaks = c("TOTAL", "Y_LT25", "Y25-74"),
                    labels = c("Total Unemployment", "<25 years old", "25-74 years old"))
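The plot compares levels by eye, but the youth-adult gap can also be computed directly. The sketch below uses hypothetical rates for two made-up countries, `A` and `B`, with placeholder numbers. It reproduces the cell's unweighted cross-country mean, then takes the difference after `pivot_wider()`. An unweighted mean gives every country the same weight regardless of the size of its labour force:

```r
library(dplyr)
library(tidyr)

# Hypothetical country-level rates (placeholder numbers, not real data)
rates <- tibble(
  country = rep(c("A", "B"), each = 4),
  age_group = rep(c("Y_LT25", "Y25-74"), times = 4),
  month = rep(rep(c("2020/01", "2020/04"), each = 2), times = 2),
  unemployment_rate = c(20, 6, 26, 7, 30, 8, 34, 9)
)

gap <- rates %>%
  # Unweighted mean across countries, as in df_aggregate
  group_by(age_group, month) %>%
  summarize(unemployment_rate = mean(unemployment_rate, na.rm = TRUE), .groups = "drop") %>%
  # One column per age group, one row per month
  pivot_wider(names_from = age_group, values_from = unemployment_rate) %>%
  # Youth minus adult unemployment, in percentage points
  mutate(youth_gap = Y_LT25 - `Y25-74`)
print(gap)
```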

In [None]:
# As an aside, explore whether unemployment rates differ by sex
# There is no visually apparent difference in total unemployment between males and females
# Young men are slightly more likely to be unemployed than young women, but the gap seems to have
# narrowed after the onset of the pandemic

df_gender <- df_pivoted %>% filter(age_group != "Y25-74", sex != "T") %>% group_by(age_group, sex, month) %>%
    summarize(unemployment_rate = mean(unemployment_rate, na.rm = TRUE))

ggplot(df_gender, aes(x=month, y=unemployment_rate, color=sex))+ geom_line(aes(group=sex))+
    theme(axis.text.x = element_text(angle = 90), panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
    panel.background = element_blank(),axis.line = element_line(colour = "black")) +
    facet_wrap(~age_group, labeller=labeller(age_group = c("TOTAL"="Total Unemployment", "Y_LT25"="Youth Unemployment")))

# 4) Hypothesis testing: Are youths affected differently by the coronavirus pandemic?

The dataset comprises 2 age groups:
* Y_LT25 for individuals below 25 years old (I will broadly refer to this group as 'youths')
* Y25-74 for individuals aged 25-74 (I will broadly refer to this group as 'adults')

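To test whether the pandemic had a differential impact on youths vis-a-vis adults, I estimate the difference-in-differences regression below and conduct a two-tailed test of the null hypothesis that the interaction coefficient, $\beta_3$, equals zero.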

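$$\text{unemployment rate} = \beta_0 + \beta_1\,\text{post} + \beta_2\,\text{youths} + \beta_3\,(\text{post} \times \text{youths}) + \varepsilon$$

where post equals 1 for months after March 2020 (0 before), and youths equals 1 for the under-25 group (0 for ages 25-74).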
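The post-March-2020 and youths variables are both 0/1 dummies, so each group-period combination corresponds to one cell mean. Write $u$ for the unemployment rate and assume the error term has mean zero in every cell. The coefficients then map onto cell means as follows, which makes $\beta_3$ exactly the difference between the two before-after changes:

$$
\begin{aligned}
E[u \mid \text{adults, pre}] &= \beta_0 \\
E[u \mid \text{adults, post}] &= \beta_0 + \beta_1 \\
E[u \mid \text{youths, pre}] &= \beta_0 + \beta_2 \\
E[u \mid \text{youths, post}] &= \beta_0 + \beta_1 + \beta_2 + \beta_3 \\
\beta_3 &= \big(E[u \mid \text{youths, post}] - E[u \mid \text{youths, pre}]\big) - \big(E[u \mid \text{adults, post}] - E[u \mid \text{adults, pre}]\big)
\end{aligned}
$$

$\beta_1$ captures the change common to both groups, and $\beta_3$ captures the additional change for youths.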


In [None]:
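# Pre-pandemic window: Dec 2019 - Feb 2020; post window: Apr - Jun 2020
# (March 2020 is excluded as a transition month)
pre_months  <- c("2019/12", "2020/01", "2020/02")
post_months <- c("2020/04", "2020/05", "2020/06")
df_hypo_test <- df_pivoted %>%
    filter(age_group %in% c("Y_LT25", "Y25-74"), sex == "T", month %in% c(pre_months, post_months))
df_hypo_test$month <- ifelse(df_hypo_test$month %in% pre_months, "pre-mar-2020", "post-mar-2020")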

df_hypo_test$sex = NULL
df_hypo_test <- df_hypo_test %>% group_by(country,age_group,month) %>% summarize(mean_unemp = mean(unemployment_rate, na.rm=TRUE))

# The latest unemployment data is unavailable for Germany, Italy and Romania, so these countries are excluded
# The dataset also includes a few non-EU countries (Japan, Turkey, United States)
# Filter out all of these countries
df_hypo_test %>% filter(is.na(mean_unemp))
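Instead of maintaining the list of countries with missing data by hand, the incomplete countries can be found programmatically, although the non-EU countries would still need an explicit list. Here is a minimal sketch on a hypothetical table with made-up countries and values. `mean(..., na.rm = TRUE)` over all-missing values returns `NaN`, which `is.na()` also catches:

```r
library(dplyr)

# Hypothetical per-country cell means; country "C" is missing its post-period value
cells <- tibble(
  country = rep(c("A", "B", "C"), each = 2),
  month = rep(c("pre-mar-2020", "post-mar-2020"), times = 3),
  mean_unemp = c(10, 12, 8, 9, 15, NA)
)

# Countries with at least one missing cell
incomplete <- cells %>% filter(is.na(mean_unemp)) %>% distinct(country) %>% pull(country)

# Drop those countries entirely so every remaining country has all cells
balanced <- cells %>% filter(!country %in% incomplete)
print(balanced)
```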

In [None]:
exclude_list = c("Japan","Turkey","United States","Germany","Italy","Romania")
df_hypo_test <- df_hypo_test %>% filter(!country %in% exclude_list)
head(df_hypo_test)

In [None]:
# Convert variables to binary (1,0) and then perform DID regression
df_hypo_test$youths <- ifelse(df_hypo_test$age_group=="Y_LT25",1,0)
df_hypo_test$post <- ifelse(df_hypo_test$month=="post-mar-2020",1,0)
df_hypo_test$difference_in_difference <- df_hypo_test$youths * df_hypo_test$post
didreg = lm(formula = mean_unemp ~ youths + post + difference_in_difference, data = df_hypo_test)
summary(didreg)
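The model has an intercept, both dummies and their interaction, so it fits the four group-by-period cell means exactly. The DiD coefficient therefore equals the difference of differences in those means. Here is a quick check on a hypothetical balanced panel with placeholder numbers. It also shows that the formula shorthand `youths * post` gives the same estimate as building the product column by hand:

```r
# Hypothetical 2x2 panel: two countries, adults/youths, pre/post (placeholder numbers)
panel <- data.frame(
  mean_unemp = c(6, 7, 20, 26, 8, 9, 30, 34),
  youths     = c(0, 0, 1, 1, 0, 0, 1, 1),
  post       = c(0, 1, 0, 1, 0, 1, 0, 1)
)

# Shorthand: youths * post expands to youths + post + youths:post
fit <- lm(mean_unemp ~ youths * post, data = panel)

# Same specification with the interaction built by hand, as in didreg
panel$difference_in_difference <- panel$youths * panel$post
fit2 <- lm(mean_unemp ~ youths + post + difference_in_difference, data = panel)

# Difference of differences computed directly from the four cell means
cell_mean <- function(y, p) mean(panel$mean_unemp[panel$youths == y & panel$post == p])
manual_did <- (cell_mean(1, 1) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(0, 0))

print(coef(fit))
print(manual_did)
```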

# 5) Discuss results

From the regression results above, the ***coefficient on *youths* is positive and statistically significant at the 0.1% level***. This is congruent with our existing understanding that youths in the European Union face substantially higher unemployment rates than adults.

***While the DiD coefficient is positive, it is not statistically significant***. In other words, ***we find no statistically significant evidence that youths in Europe are disproportionately affected by the ongoing global pandemic***, at least for the selected countries and time period. This is not the same as evidence of no effect. With only a few months of post-pandemic data, the test may lack the power to detect a modest differential effect, so it would be useful to keep monitoring the numbers in the coming months before drawing a firmer conclusion.

Regardless, this project underlines the severity of youth unemployment in Europe. In July 2020 the EU announced the Youth Employment Support package, which comprises improved vocational education and training, apprenticeships and guidance. The package is a step in the right direction and will hopefully ease the pressure on youth unemployment.

# Acknowledgements
https://www.europarl.europa.eu/news/en/headlines/society/20200709STO83004/covid-19-how-the-eu-fights-youth-unemployment

https://www.reuters.com/article/us-health-coronavirus-unemployment-youth-idUSKBN24A0LN

