# **DATA PROJECT 2024: DO INTERNATIONAL WORKERS IN DENMARK REMEDY LABOR SHORTAGE?**

By Emma Knippel, Anna Abildsjov and Oscar Nyholm

# Table of contents
* [Setup](#toc0_)   

* [Read and clean data](#toc1_) 

* [Question 2: Market Clearing Error](#toc2_)    

* [Question 3: Market Clearing Price](#toc3_)    

* [Question 4: A as Price Setter](#toc4_)   

* [Question 5: A as Market Maker](#toc5_) 

* [Question 6: Utilitarian Social Planner](#toc6_) 

* [Question 7: Random Draw](#toc7_) 

* [Question 8: Market Equilibrium](#toc8_) 



## <a id='toc0_'></a>[Setup](#toc0_)

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import ipywidgets as widgets
from matplotlib_venn import venn2
import json

# autoreload modules when code is run
%load_ext autoreload
%autoreload 2


In [2]:
# Install the API reader that lets us load data from Statistics Denmark (DST).
%pip install git+https://github.com/alemartinello/dstapi
%pip install pandas-datareader

import pandas_datareader # install with `pip install pandas-datareader`
from dstapi import DstApi # install with `pip install git+https://github.com/alemartinello/dstapi`

Collecting git+https://github.com/alemartinello/dstapi
  Cloning https://github.com/alemartinello/dstapi to /private/var/folders/24/czmv85dj1_3dcc2tc4x8kd0r0000gn/T/pip-req-build-eme0qo8p
  Running command git clone --quiet https://github.com/alemartinello/dstapi /private/var/folders/24/czmv85dj1_3dcc2tc4x8kd0r0000gn/T/pip-req-build-eme0qo8p
  Resolved https://github.com/alemartinello/dstapi to commit d9eeb5a82cbc70b7d63b2ff44d92632fd77123a4
  Preparing metadata (setup.py) ... done
Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.


## <a id='toc1_'></a>[Read and clean data](#toc1_)

In [3]:
# importing the actual data from DST
employees = DstApi('LBESK03')
lb_short_service = DstApi('KBS2')
lb_short_manu = DstApi('BARO3')
lb_short_cons = DstApi('KBYG33')
with open('International Labor.json', 'r') as f:
    int_data = json.load(f)
int_lb = pd.DataFrame(int_data)

In [4]:
tabsum = employees.tablesummary(language = 'en')
display(tabsum)
for variable in tabsum['variable name']:
    print(variable+':')
    display(employees.variable_levels(variable, language='en'))

Table LBESK03: Employees by industry (DB07 10- and 19-grouping) and time
Last update: 2024-03-22T08:00:00


Unnamed: 0,variable name,# values,First value,First value label,Last value,Last value label,Time variable
0,BRANCHEDB071038,32,TOT,"TOT Industry, total",X,X Activity not stated,False
1,Tid,193,2008M01,2008M01,2024M01,2024M01,True


BRANCHEDB071038:


Unnamed: 0,id,text
0,TOT,"TOT Industry, total"
1,1,"1 Agriculture, forestry and fishing"
2,A,"A Agriculture, forestry and fishing"
3,2,"2 Manufacturing, mining and quarrying, and uti..."
4,B,B Mining and quarrying
5,C,C Manufacturing
6,D,"D Electricity, gas, steam and air conditioning..."
7,E,"E Water supply, sewerage and waste management"
8,3,3 Construction
9,F,F Construction


Tid:


Unnamed: 0,id,text
0,2008M01,2008M01
1,2008M02,2008M02
2,2008M03,2008M03
3,2008M04,2008M04
4,2008M05,2008M05
...,...,...
188,2023M09,2023M09
189,2023M10,2023M10
190,2023M11,2023M11
191,2023M12,2023M12


In [5]:
tabsum2 = lb_short_service.tablesummary(language = 'en')
display(tabsum2)
for variable in tabsum2['variable name']:
    print(variable+':')
    display(lb_short_service.variable_levels(variable, language='en'))

Table KBS2: Production limitations in Services by industry (DB07), type and time
Last update: 2024-03-21T08:00:00


Unnamed: 0,variable name,# values,First value,First value label,Last value,Last value label,Time variable
0,BRANCHE07,19,000,SERVICES TOTAL,090,Other service activities (94-95),False
1,TYPE,6,INGEN,No limitations,ANDÅS,Other factors,False
2,Tid,155,2011M05,2011M05,2024M03,2024M03,True


BRANCHE07:


Unnamed: 0,id,text
0,0,SERVICES TOTAL
1,5,TRANSPORT (49-53)
2,10,Land transport and others (49)
3,15,TOURISME (55-56; 79)
4,20,Hotels and similar accommodation (55)
5,25,Restaurants (56)
6,30,Travel agent activities (79)
7,35,COMMUNICATION AND INFORMATION (58; 61-63)
8,40,Information technology service activities (62)
9,45,"FINANCE, INSURANCE AND REAL ESTATE (64-65; 68)"


TYPE:


Unnamed: 0,id,text
0,INGEN,No limitations
1,MEFT,Insufficient demand
2,MAAK,Shortage of labour force
3,MALOK,Shortage of space and/or equipment
4,FINBR,Financial constraints
5,ANDÅS,Other factors


Tid:


Unnamed: 0,id,text
0,2011M05,2011M05
1,2011M06,2011M06
2,2011M07,2011M07
3,2011M08,2011M08
4,2011M09,2011M09
...,...,...
150,2023M11,2023M11
151,2023M12,2023M12
152,2024M01,2024M01
153,2024M02,2024M02


In [6]:
tabsum3 = lb_short_manu.tablesummary(language = 'en')
display(tabsum3)
for variable in tabsum3['variable name']:
    print(variable+':')
    display(lb_short_manu.variable_levels(variable, language='en'))

Table BARO3: Production limitations in industry by industry (DB07), type and time
Last update: 2024-01-24T08:00:00


Unnamed: 0,variable name,# values,First value,First value label,Last value,Last value label,Time variable
0,BRANCHE07,20,BC,BC Mining and quarrying and manufacturing,S4,Non-durable consumer goods (MIG),False
1,TYPE,6,INGEN,No limitations,ANDÅS,Other factors,False
2,Tid,77,2005K1,2005Q1,2024K1,2024Q1,True


BRANCHE07:


Unnamed: 0,id,text
0,BC,BC Mining and quarrying and manufacturing
1,B,B Mining and quarrying
2,C,C Manufacturing
3,CA,"CA Manufacture of food products, beverages and..."
4,CB,CB Textiles and leather products
5,CC,CC Wood and paper products and printing
6,CD,CD Oil refinery etc.
7,CE,CE Manufacture of chemicals
8,CF,CF Pharmaceuticals
9,CG,"CG Manufacture of plastic, glass and concrete"


TYPE:


Unnamed: 0,id,text
0,INGEN,No limitations
1,AMA,Shortage of labour force
2,UKA,Shortage of material and/or equipment
3,UEF,Insufficient demand
4,FINBE,Financial constraints
5,ANDÅS,Other factors


Tid:


Unnamed: 0,id,text
0,2005K1,2005Q1
1,2005K2,2005Q2
2,2005K3,2005Q3
3,2005K4,2005Q4
4,2006K1,2006Q1
...,...,...
72,2023K1,2023Q1
73,2023K2,2023Q2
74,2023K3,2023Q3
75,2023K4,2023Q4


In [7]:
tabsum4 = lb_short_cons.tablesummary(language = 'en')
display(tabsum4)
for variable in tabsum4['variable name']:
    print(variable+':')
    display(lb_short_cons.variable_levels(variable, language='en'))

Table KBYG33: Production limitations in Construction by industry (DB07), type and time
Last update: 2024-03-21T08:00:00


Unnamed: 0,variable name,# values,First value,First value label,Last value,Last value label,Time variable
0,BRANCHE07,9,F,F Construction,43003,43003 Other specialized construction activitie...,False
1,TYPE,7,INGEN,No limitations,ANDÅS,Other factors,False
2,Tid,231,2005M01,2005M01,2024M03,2024M03,True


BRANCHE07:


Unnamed: 0,id,text
0,F,F Construction
1,41000,41000 Construction of buildings
2,42000,42000 Civil engineering
3,43201,43201 Electrical installation etc.
4,432200,"432200 Plumbing, heat and air-conditioning ins..."
5,43301,43301 Joinery installation etc.
6,43302,43302 Painting and Glazing etc.
7,439910,439910 Bricklayers
8,43003,43003 Other specialized construction activitie...


TYPE:


Unnamed: 0,id,text
0,INGEN,No limitations
1,MEFT,Insufficient demand
2,DVEJR,Bad weather
3,MAT,Shortage of material and/or equipment
4,AMA,Shortage of labour force
5,FB,Financial contraints
6,ANDÅS,Other factors


Tid:


Unnamed: 0,id,text
0,2005M01,2005M01
1,2005M02,2005M02
2,2005M03,2005M03
3,2005M04,2005M04
4,2005M05,2005M05
...,...,...
226,2023M11,2023M11
227,2023M12,2023M12
228,2024M01,2024M01
229,2024M02,2024M02
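
Cells 4 to 7 repeat the same summary-and-levels loop for each table. A minimal sketch of a helper that wraps the two calls already used above (`tablesummary` and `variable_levels`) could look like this. The name `describe_table` is our own and is not part of `dstapi`.

```python
def describe_table(api, language='en'):
    """Collect the table summary and the levels of every variable.

    `api` is assumed to be a DstApi instance; any object exposing
    `tablesummary` and `variable_levels` with these arguments works.
    """
    summary = api.tablesummary(language=language)
    levels = {
        variable: api.variable_levels(variable, language=language)
        for variable in summary['variable name']
    }
    return summary, levels
```

With it, `summary, levels = describe_table(lb_short_cons)` followed by `display(levels['TYPE'])` would show the limitation types for KBYG33.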


In [8]:
params = employees._define_base_params(language='en')

params = {'table': 'LBESK03',
 'format': 'BULK',
 'lang': 'en',
 'variables': [{'code': 'BRANCHEDB071038', 'values': ['*']},
  {'code': 'Tid', 'values': ['>2013M12<=2024M01']}]}

empl = employees.get_data(params=params)
empl.drop(['BRANCHEDB071038'], axis=1, inplace=True)
empl.rename(columns = {'INDHOLD':'Employees', 'TID':'Time'}, inplace=True)
empl.head(5)

Unnamed: 0,Time,Employees
0,2018M07,42896
1,2018M07,4700
2,2018M07,304551
3,2018M07,10324
4,2018M07,11666


In [9]:
params2 = lb_short_service._define_base_params(language='en')

params2 = {'table': 'KBS2',
 'format': 'BULK',
 'lang': 'en',
 'variables': [{'code': 'BRANCHE07', 'values': ['*']},
  {'code': 'TYPE', 'values': ['MAAK']},
  {'code': 'Tid', 'values': ['>2013M12<=2024M01']}]}

lab_short_service = lb_short_service.get_data(params=params2)
# get_data returns the time column as 'TID' (upper case), so we sort on that name
lab_short_service.sort_values(by = ['TID', 'BRANCHE07'], inplace=True)

# TYPE is constant (only 'MAAK' was requested), so we drop it and give the rest readable names
lab_short_service.drop(['TYPE'], axis = 1, inplace = True)
lab_short_service.rename(columns = {'BRANCHE07':'industry', 'TID':'time', 'INDHOLD':'labor_shortage'}, inplace=True)
lab_short_service.head(5)

In [None]:
lab_short_service.pivot(index='industry', columns='time', values='labor_shortage')

In [None]:
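# Remove the rows for this industry; DataFrame.drop would look for the label in the index, not in the 'industry' column
lab_short_service = lab_short_service[lab_short_service['industry'] != 'Arts, sports and recreation activities (90-93)']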

In [None]:
drop_industry = [
    'Arts, sports and recreation activities (90-93)',
    'Financial and insurance activities (64-65)',
    'Hotels and similar accommodation (55)',
    'Information technology service activities (62)',
    'Other service activities (94-95)',
    'Real estate activities (68)',
    'Rental and leasing activities (77)',
    'Restaurants (56)',
    'Services to buildings, cleaning and landscape activities (81)',
    'Travel agent activities (79)'
]


In [None]:
# Flag every row whose industry is listed in drop_industry (exact match, so the parentheses in the labels are harmless)
I = lab_short_service['industry'].isin(drop_industry)
# Keep only the remaining industries
lab_short_service = lab_short_service.loc[~I].reset_index(drop=True)
lab_short_service.head(5)
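
The industry labels contain parentheses, which `str.contains` interprets as a regular-expression group by default, so a pattern such as `'Restaurants (56)'` does not match the literal label. The sketch below illustrates this on a toy frame and shows exact matching with `isin` as an alternative. The shortage values are made up; only the label format mirrors KBS2.

```python
import pandas as pd

# Toy frame with made-up shortage values
toy = pd.DataFrame({
    'industry': ['Restaurants (56)', 'TRANSPORT (49-53)', 'Real estate activities (68)'],
    'time': ['2014M01'] * 3,
    'labor_shortage': [10, 20, 30],
})

drop_labels = ['Restaurants (56)', 'Real estate activities (68)']

# Default regex matching: '(56)' is read as a group, so the literal label is not found
regex_hits = toy['industry'].str.contains('Restaurants (56)')

# Literal matching finds the label
literal_hits = toy['industry'].str.contains('Restaurants (56)', regex=False)

# Exact matching against a list, then keep the complement
kept = toy.loc[~toy['industry'].isin(drop_labels)].reset_index(drop=True)
print(kept)
```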

In [None]:
params3 = lb_short_manu._define_base_params(language='en')

params3 = {'table': 'BARO3',
 'format': 'BULK',
 'lang': 'en',
 'variables': [{'code': 'BRANCHE07', 'values': ['C']},
  {'code': 'TYPE', 'values': ['AMA']},
  {'code': 'Tid', 'values': ['>2013K4<=2024K1']}]}

lab_short_manu = lb_short_manu.get_data(params=params3)
lab_short_manu.sort_values(by = ['TID'], inplace=True)
lab_short_manu.head(5)
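
BARO3 is quarterly (`2005Q1`), while KBS2, KBYG33 and LBESK03 are monthly (`2011M05`), so the series need a common time axis before they can be compared. The following is a sketch under the assumption that the time column holds labels in the formats shown in the level tables above. The function names are hypothetical, the values are made up, and forward-filling quarters to months is only one possible choice.

```python
import pandas as pd

def parse_dst_month(labels):
    """Turn labels such as '2014M01' into month-start timestamps."""
    return pd.to_datetime(list(labels), format='%YM%m')

def parse_dst_quarter(labels):
    """Turn labels such as '2014Q1' (or the id form '2014K1') into quarter-start timestamps."""
    normalized = [str(label).replace('K', 'Q') for label in labels]
    return pd.PeriodIndex(normalized, freq='Q').to_timestamp()

# Made-up quarterly values, forward-filled to month-start frequency
quarterly = pd.Series([5.0, 7.0], index=parse_dst_quarter(['2014Q1', '2014K2']))
monthly = quarterly.resample('MS').ffill()
print(monthly)
```

Note that the forward fill stops at the first month of the last quarter in the index; later months would need an explicit reindex.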

In [None]:
params4 = lb_short_cons._define_base_params(language='en')

params4 = {'table': 'KBYG33',
 'format': 'BULK',
 'lang': 'en',
 'variables': [{'code': 'BRANCHE07', 'values': ['F']},
  {'code': 'TYPE', 'values': ['AMA']},
  {'code': 'Tid', 'values': ['>2013M12<=2024M01']}]}

lab_short_cons = lb_short_cons.get_data(params=params4)
lab_short_cons.sort_values(by = ['TID'], inplace=True)
lab_short_cons.head(5)
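
The four request dictionaries above share one shape: a table id, the `BULK` format, a language, and a list of `{'code', 'values'}` entries. A small builder, sketched here with the hypothetical name `build_params`, could reduce the repetition. It only uses keys that already appear in the cells above.

```python
def build_params(table, selections, language='en'):
    """Build a request dict in the shape used in the cells above.

    `selections` maps a variable code to the list of values to request.
    """
    return {
        'table': table,
        'format': 'BULK',
        'lang': language,
        'variables': [
            {'code': code, 'values': list(values)}
            for code, values in selections.items()
        ],
    }

# Same request as the construction cell above
params4 = build_params('KBYG33', {
    'BRANCHE07': ['F'],
    'TYPE': ['AMA'],
    'Tid': ['>2013M12<=2024M01'],
})
print(params4)
```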

**Cleaning the data on international workers sourced from Jobindsats**

In [None]:
# Now moving on to the data from Jobindsats.
print(f'Before cleaning, the JSON datafile from JobIndsats contains {int_lb.shape[0]} observations and {int_lb.shape[1]} variables')

# Copying the DataFrame, which we will clean, in case we need the original data.
int_lb_copy = int_lb.copy()

# As we've only extracted the data from 2014 and after, we do not need to drop any time-dependent variables.
# Firstly, we don't need the second and last column, so we drop these.
int_lb_copy.drop(1, axis=1, inplace=True)
int_lb_copy.drop(4, axis=1, inplace=True)

# As seen above, the columns are currently named 0,1,...,4. This doesn't say a lot, so we rename all columns:
int_lb_copy.rename(columns = {0:'time'}, inplace=True)
int_lb_copy.rename(columns= {2:'industry'}, inplace=True)
int_lb_copy.rename(columns={3:'int_empl'}, inplace=True)


display(int_lb_copy.head(5))
print(f'After cleaning, the dataset contains {int_lb_copy.shape[0]} observations and {int_lb_copy.shape[1]} variables')

In [None]:
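# We sort the observations by time (in place, since sort_values otherwise returns a new DataFrame)
int_lb_copy.sort_values(by='time', inplace=True)
# and reshape to wide format: one row per period and one column per industry
int_lb_copy.pivot(index='time', columns='industry', values='int_empl')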
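
As an illustration of the reshaping step, the sketch below sorts and pivots a toy long-format frame. The industry names and numbers are made up and do not come from the Jobindsats file. Note that `pivot` raises a `ValueError` if a (time, industry) pair occurs more than once.

```python
import pandas as pd

# Made-up long-format data: one row per (time, industry) pair
long = pd.DataFrame({
    'time': ['2014M02', '2014M01', '2014M01', '2014M02'],
    'industry': ['Construction', 'Construction', 'Manufacturing', 'Manufacturing'],
    'int_empl': [110, 100, 200, 210],
})

# sort_values returns a new frame, so the result is assigned back
long = long.sort_values(by=['time', 'industry']).reset_index(drop=True)

# One row per period, one column per industry
wide = long.pivot(index='time', columns='industry', values='int_empl')
print(wide)
```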

## Explore each data set

To **explore the raw data**, you may provide **static** and **interactive plots** that show important developments.

Static plot of:
- labor shortage in the three sectors, January 2014 to January 2024
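
A minimal sketch of such a static plot, with one line per sector on a shared time axis. The values are made up and the column names are placeholders; the real series would come from `lab_short_service`, `lab_short_manu` and `lab_short_cons` once their dates are on a common axis. The helper name `plot_shortage` is hypothetical.

```python
import pandas as pd
import matplotlib.pyplot as plt

def plot_shortage(df):
    """Plot one line per column of a wide frame indexed by date."""
    fig, ax = plt.subplots(figsize=(8, 4))
    for sector in df.columns:
        ax.plot(df.index, df[sector], label=sector)
    ax.set_xlabel('Time')
    ax.set_ylabel('Labor shortage indicator')
    ax.set_title('Labor shortage by sector')
    ax.legend()
    return fig, ax

# Made-up values for illustration only
dates = pd.date_range('2014-01-01', periods=4, freq='MS')
toy_shortage = pd.DataFrame(
    {
        'Services': [5, 6, 8, 7],
        'Manufacturing': [3, 4, 4, 6],
        'Construction': [10, 12, 15, 14],
    },
    index=dates,
)
fig, ax = plot_shortage(toy_shortage)
```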

**Interactive plot** :

Here we need the development in the number of international workers in each of the service industries, selectable with a drop-down.

In [None]:
def plot_func():
    # Function that operates on data set
    pass

widgets.interact(plot_func, 
    # Let the widget interact with data through plot_func()    
); 
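
One way the placeholder could be filled in is sketched below: a plotting function that takes a wide frame (one column per industry) and the name of the industry to show. The name `plot_int_empl` and the toy numbers are assumptions for illustration, not the project's data.

```python
import pandas as pd
import matplotlib.pyplot as plt

def plot_int_empl(df, industry):
    """Plot the number of international workers over time for one industry column."""
    fig, ax = plt.subplots(figsize=(8, 4))
    ax.plot(df.index, df[industry])
    ax.set_title(f'International workers: {industry}')
    ax.set_xlabel('Time')
    ax.set_ylabel('International employees')

# Made-up wide-format data: one column per service industry
toy_wide = pd.DataFrame(
    {'Restaurants (56)': [100, 120, 130], 'TRANSPORT (49-53)': [300, 310, 305]},
    index=pd.date_range('2014-01-01', periods=3, freq='MS'),
)
plot_int_empl(toy_wide, 'Restaurants (56)')
```

In the notebook, the drop-down could then be attached with `widgets.interact(plot_int_empl, df=widgets.fixed(toy_wide), industry=widgets.Dropdown(options=list(toy_wide.columns), description='Industry'))`.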


Explain what you see when moving elements of the interactive plot around. 

# Analysis

To get a quick overview of the data, we show some **summary statistics** on a meaningful aggregation. 
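
As a sketch of what such an aggregation might look like, the example below computes the mean, minimum and maximum shortage per industry with `groupby`. The frame is a made-up stand-in for the cleaned long-format data, and the column names follow the ones used earlier.

```python
import pandas as pd

# Made-up long-format data standing in for the cleaned shortage series
toy = pd.DataFrame({
    'industry': ['Construction'] * 3 + ['Manufacturing'] * 3,
    'labor_shortage': [10, 20, 30, 4, 6, 8],
})

# Summary statistics per industry
summary = toy.groupby('industry')['labor_shortage'].agg(['mean', 'min', 'max'])
print(summary)
```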

MAKE FURTHER ANALYSIS. EXPLAIN THE CODE BRIEFLY AND SUMMARIZE THE RESULTS.

# Conclusion

ADD CONCISE CONCLUSION.