# Data project - Fertility and unemployment rates

#### In this data project we examine whether there is any correlation between unemployment rates and fertility rates at the municipality level in Denmark from 2007 to 2017. First we download the tables of interest from Statistics Denmark (DST) and merge them into a combined dataset. On this combined dataset we then explore graphically how the unemployment rate and the fertility rate evolve over time and how the two are correlated.

In [1]:
# Importing crucial packages

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import pydst
dst = pydst.Dst(lang='en')

## 1 Downloading data from DST
We first check which subjects are available from DST.

In [2]:
dst.get_subjects()

|   | active | desc | hasSubjects | id |
|---|--------|------|-------------|----|
| 0 | True | Population and elections | True | 2 |
| 1 | True | Living conditions | True | 5 |
| 2 | True | Education and knowledge | True | 3 |
| 3 | True | Culture and National Church | True | 18 |
| 4 | True | Labour, income and wealth | True | 4 |
| 5 | True | Prices and consumption | True | 6 |
| 6 | True | National accounts and government finances | True | 14 |
| 7 | True | Money and credit market | True | 16 |
| 8 | True | External economy | True | 13 |
| 9 | True | Business sector in general | True | 7 |


We look at the subject "Population and elections" (id 02), since this is where we can find information about fertility rates.

In [3]:
dst.get_tables(subjects=['02'])

|   | active | firstPeriod | id | latestPeriod | text | unit | updated | variables |
|---|--------|-------------|----|--------------|------|------|---------|-----------|
| 0 | True | 2008Q1 | FOLK1A | 2019Q1 | Population at the first day of the quarter | number | 2019-02-11 08:00:00 | [region, sex, age, marital status, time] |
| 1 | True | 2008Q1 | FOLK1B | 2019Q1 | Population at the first day of the quarter | number | 2019-02-11 08:00:00 | [region, sex, age, citizenship, time] |
| 2 | True | 2008Q1 | FOLK1C | 2019Q1 | Population at the first day of the quarter | number | 2019-02-11 08:00:00 | [region, sex, age, ancestry, country of origin... |
| 3 | True | 2008Q1 | FOLK1D | 2019Q1 | Population at the first day of the quarter | number | 2019-02-11 08:00:00 | [region, sex, age, citizenship, time] |
| 4 | True | 2008Q1 | FOLK1E | 2019Q1 | Population at the first day of the quarter | number | 2019-02-11 08:00:00 | [region, sex, age, ancestry, time] |
| 5 | True | 1980 | FOLK2 | 2019 | Population 1. January | number | 2019-02-11 08:00:00 | [age, sex, ancestry, citizenship, country of o... |
| 6 | True | 2008 | FOLK3 | 2019 | Population 1. January | number | 2019-02-11 08:00:00 | [day of birth, birth month, year of birth, time] |
| 7 | True | 1769 | FT | 2019 | Population figures from the censuses | number | 2019-02-11 08:00:00 | [national part, time] |
| 8 | True | 2008 | BEF5F | 2019 | People born in Faroe Islands and living in Den... | number | 2019-02-11 08:00:00 | [sex, age, parents place of birth, time] |
| 9 | True | 2008 | BEF5G | 2019 | People born in Greenland and living in Denmark... | number | 2019-02-11 08:00:00 | [sex, age, parents place of birth, time] |


We use the table 'FOD407', which contains fertility rates by municipality, age group and year. We first list the values of its age variable.

In [64]:
FOD407_vars = dst.get_variables(table_id='FOD407')
FOD407_vars['values'][1]  # values of the age variable (ALDER); TOT1 covers all ages

[{'id': 'TOT1', 'text': 'Total fertility rate'},
 {'id': '15-19', 'text': '15-19 years'},
 {'id': '20-24', 'text': '20-24 years'},
 {'id': '25-29', 'text': '25-29 years'},
 {'id': '30-34', 'text': '30-34 years'},
 {'id': '35-39', 'text': '35-39 years'},
 {'id': '40-44', 'text': '40-44 years'},
 {'id': '45-49', 'text': '45-49 years'}]
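Indexing with `[1]` relies on the age variable being the second row of the frame returned by `get_variables`. A minimal sketch of looking a variable up by its id instead, assuming the frame has an `id` column and a `values` column holding lists of `{'id', 'text'}` dicts (the layout shown in the outputs of this notebook). The helper `value_ids` and the miniature frame are hypothetical:

```python
import pandas as pd


def value_ids(variables: pd.DataFrame, var_id: str) -> list:
    """Return the value ids of one variable in a get_variables()-style frame."""
    # Select the row by variable id instead of by position
    values = variables.loc[variables["id"] == var_id, "values"]
    if values.empty:
        raise KeyError(f"variable {var_id!r} not found")
    # Each entry is a dict like {'id': 'TOT1', 'text': 'Total fertility rate'}
    return [v["id"] for v in values.iloc[0]]


# Hypothetical miniature of dst.get_variables(table_id='FOD407')
fod407_vars = pd.DataFrame({
    "id": ["OMRÅDE", "ALDER", "TID"],
    "values": [
        [{"id": "000", "text": "All Denmark"}],
        [{"id": "TOT1", "text": "Total fertility rate"},
         {"id": "15-19", "text": "15-19 years"}],
        [{"id": "2006", "text": "2006"}],
    ],
})

print(value_ids(fod407_vars, "ALDER"))  # ['TOT1', '15-19']
```

Looking the variable up by id keeps working if DST ever reorders the variables of a table.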

In [65]:
# We are only interested in the total fertility rate, so we request ALDER = TOT1
FOD407 = dst.get_data(table_id='FOD407', variables={'OMRÅDE': ['*'], 'TID': ['*'], 'ALDER': ['TOT1']})
FOD407.head()

|   | OMRÅDE | TID | ALDER | INDHOLD |
|---|--------|-----|-------|---------|
| 0 | All Denmark | 2006 | Total fertility rate | 1847.6 |
| 1 | Region Hovedstaden | 2006 | Total fertility rate | 1706.2 |
| 2 | Region Sjælland | 2006 | Total fertility rate | 2061.2 |
| 3 | Region Syddanmark | 2006 | Total fertility rate | 1928.0 |
| 4 | Region Midtjylland | 2006 | Total fertility rate | 1919.1 |
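pandas' `DataFrame.sort_values` returns a new, sorted DataFrame and leaves the original unchanged (unless `inplace=True` is passed), so its result has to be assigned to have any effect. A small sketch on three rows copied from the FOD407 outputs in this notebook:

```python
import pandas as pd

# Three rows copied from the FOD407 outputs in this notebook
df = pd.DataFrame({
    "OMRÅDE": ["All Denmark", "Aabenraa", "Aabenraa"],
    "TID": [2006, 2007, 2006],
    "INDHOLD": [1847.6, 2021.9, 2146.1],
})

sorted_df = df.sort_values(by=["OMRÅDE", "TID"])  # a new, sorted DataFrame
print(df["OMRÅDE"].tolist())      # ['All Denmark', 'Aabenraa', 'Aabenraa'] (unchanged)
print(sorted_df["TID"].tolist())  # [2006, 2007, 2006]

# Assign the result back to keep the sorted order
df = df.sort_values(by=["OMRÅDE", "TID"])
```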


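We rename the columns to English. Note that the column we call BIRTH RATE contains the total fertility rate, which DST reports per 1,000 women.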

In [71]:
FOD407_en=FOD407.rename(columns={"OMRÅDE": "AREA", "TID": "YEAR", "INDHOLD": "BIRTH RATE", "ALDER": "AGE"})
FOD407_en.head(10)

|   | AREA | YEAR | AGE | BIRTH RATE |
|---|------|------|-----|------------|
| 0 | All Denmark | 2006 | Total fertility rate | 1847.6 |
| 1 | Region Hovedstaden | 2006 | Total fertility rate | 1706.2 |
| 2 | Region Sjælland | 2006 | Total fertility rate | 2061.2 |
| 3 | Region Syddanmark | 2006 | Total fertility rate | 1928.0 |
| 4 | Region Midtjylland | 2006 | Total fertility rate | 1919.1 |
| 5 | Region Nordjylland | 2006 | Total fertility rate | 1902.8 |
| 6 | Province Byen København | 2006 | Total fertility rate | 1555.0 |
| 7 | Province Københavns omegn | 2006 | Total fertility rate | 1924.9 |
| 8 | Province Nordsjælland | 2006 | Total fertility rate | 2143.5 |
| 9 | Province Bornholm | 2006 | Total fertility rate | 1998.8 |
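Since the total fertility rate is given per 1,000 women, dividing by 1,000 expresses it as the expected number of children per woman (the national value of 1847.6 for 2006 then reads as about 1.85). A minimal sketch on three rows copied from the output above; the column name `CHILDREN PER WOMAN` is our own, hypothetical choice:

```python
import pandas as pd

# Three 2006 rows copied from the FOD407_en output above
rates = pd.DataFrame({
    "AREA": ["All Denmark", "Region Hovedstaden", "Province Byen København"],
    "YEAR": [2006, 2006, 2006],
    "BIRTH RATE": [1847.6, 1706.2, 1555.0],
})

# Assumption: BIRTH RATE is the total fertility rate per 1,000 women,
# so dividing by 1,000 gives the expected number of children per woman
rates["CHILDREN PER WOMAN"] = rates["BIRTH RATE"] / 1000
print(rates[["AREA", "CHILDREN PER WOMAN"]])
```

The rescaling does not affect correlations, so it is purely a matter of readability in tables and plots.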


In [72]:
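# With only TOT1 requested, each (AREA, YEAR) pair should have a single row, so the sum just
# reshapes the data; min_count=1 keeps a missing rate as NaN instead of turning it into 0
FOD407_en_group = FOD407_en.groupby(['AREA', 'YEAR'])['BIRTH RATE'].sum(min_count=1).reset_index()
FOD407_en_group.head()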

|   | AREA | YEAR | BIRTH RATE |
|---|------|------|------------|
| 0 | Aabenraa | 2006 | 2146.1 |
| 1 | Aabenraa | 2007 | 2021.9 |
| 2 | Aabenraa | 2008 | 2075.0 |
| 3 | Aabenraa | 2009 | 2177.6 |
| 4 | Aabenraa | 2010 | 2037.3 |
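By default, pandas' groupby `sum()` returns 0 for a group whose values are all missing, and a 0 is not caught by `dropna()` later on. Passing `min_count=1` keeps such a group as NaN. A minimal sketch, assuming missing rates arrive as NaN; the Læsø row is hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical rows: the second municipality has no published fertility rate
df = pd.DataFrame({
    "AREA": ["Aabenraa", "Læsø"],
    "YEAR": [2007, 2007],
    "BIRTH RATE": [2021.9, np.nan],
})

grouped = df.groupby(["AREA", "YEAR"])["BIRTH RATE"]
default_sum = grouped.sum().reset_index()            # all-NaN group -> 0.0
strict_sum = grouped.sum(min_count=1).reset_index()  # all-NaN group -> NaN

print(default_sum["BIRTH RATE"].tolist())  # [2021.9, 0.0]
print(strict_sum["BIRTH RATE"].tolist())   # [2021.9, nan]
```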


We now turn to the subject "Labour, income and wealth" (id 04) at Statistics Denmark, where the unemployment data are found.

In [16]:
dst.get_tables(subjects=['04'])



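From this subject we use the table 'AULP01', which contains the unemployment rate (in per cent) by municipality, age and sex. We first look at its variables.

In [17]: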
AULP01_vars = dst.get_variables(table_id='AULP01')
AULP01_vars

|   | elimination | id | map | text | time | values |
|---|-------------|----|-----|------|------|--------|
| 0 | True | OMRÅDE | denmark_municipality_07 | region | False | [{'id': '000', 'text': 'All Denmark'}, {'id': ... |
| 1 | True | ALDER |  | age | False | [{'id': 'TOT', 'text': 'Age, total'}, {'id': '... |
| 2 | True | KØN |  | sex | False | [{'id': 'TOT', 'text': 'Total'}, {'id': 'M', '... |
| 3 | False | Tid |  | time | True | [{'id': '2007', 'text': '2007'}, {'id': '2008'... |


In [39]:
AULP01 = dst.get_data(table_id = 'AULP01', variables={'OMRÅDE':['*'], 'ALDER':['TOT'], 'KØN':['TOT'], 'TID':['*'] })
AULP01.head()

|   | OMRÅDE | ALDER | KØN | TID | INDHOLD |
|---|--------|-------|-----|-----|---------|
| 0 | Svendborg | Age, total | Total | 2011 | 6.8 |
| 1 | Nordfyns | Age, total | Total | 2011 | 7.4 |
| 2 | Langeland | Age, total | Total | 2011 | 7.9 |
| 3 | Ærø | Age, total | Total | 2011 | 4.5 |
| 4 | Haderslev | Age, total | Total | 2011 | 6.0 |


In [41]:
# No backslash is needed to continue a line inside the braces
AULP01_en = AULP01.rename(columns={"OMRÅDE": "AREA", "ALDER": "AGE", "KØN": "GENDER",
                                   "TID": "YEAR", "INDHOLD": "UNEMPLOYMENT RATE"})
AULP01_en.head(6)

|   | AREA | AGE | GENDER | YEAR | UNEMPLOYMENT RATE |
|---|------|-----|--------|------|-------------------|
| 0 | Svendborg | Age, total | Total | 2011 | 6.8 |
| 1 | Nordfyns | Age, total | Total | 2011 | 7.4 |
| 2 | Langeland | Age, total | Total | 2011 | 7.9 |
| 3 | Ærø | Age, total | Total | 2011 | 4.5 |
| 4 | Haderslev | Age, total | Total | 2011 | 6.0 |
| 5 | Billund | Age, total | Total | 2011 | 3.7 |


In [46]:
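# Only the totals for age and sex were requested, so each (AREA, YEAR) pair should have a
# single row; min_count=1 keeps a missing rate as NaN instead of turning it into 0
AULP01_en_group = AULP01_en.groupby(['AREA', 'YEAR'])['UNEMPLOYMENT RATE'].sum(min_count=1).reset_index()
AULP01_en_group.head()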

|   | AREA | YEAR | UNEMPLOYMENT RATE |
|---|------|------|-------------------|
| 0 | Aabenraa | 2007 | 3.8 |
| 1 | Aabenraa | 2008 | 2.8 |
| 2 | Aabenraa | 2009 | 5.2 |
| 3 | Aabenraa | 2010 | 6.8 |
| 4 | Aabenraa | 2011 | 6.4 |


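We now merge the two datasets on AREA and YEAR. We use a left join, so every area-year in the fertility data is kept, and the unemployment rate is NaN wherever AULP01 has no matching row (for example Aabenraa in 2006).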

In [49]:
merged_data = pd.merge(FOD407_en_group, AULP01_en_group, on=['YEAR', 'AREA'], how='left')
merged_data.head(10)

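|   | AREA | YEAR | BIRTH RATE | UNEMPLOYMENT RATE |
|---|------|------|------------|-------------------|
| 0 | Aabenraa | 2006 | 2146.1 | NaN |
| 1 | Aabenraa | 2007 | 2021.9 | 3.8 |
| 2 | Aabenraa | 2008 | 2075.0 | 2.8 |
| 3 | Aabenraa | 2009 | 2177.6 | 5.2 |
| 4 | Aabenraa | 2010 | 2037.3 | 6.8 |
| 5 | Aabenraa | 2011 | 1962.2 | 6.4 |
| 6 | Aabenraa | 2012 | 2003.8 | 6.9 |
| 7 | Aabenraa | 2013 | 1884.5 | 6.4 |
| 8 | Aabenraa | 2014 | 1868.2 | 5.2 |
| 9 | Aabenraa | 2015 | 1757.9 | 4.7 |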
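`pd.merge` can also check its own assumptions: `validate="one_to_one"` raises a `MergeError` if an (AREA, YEAR) key is duplicated on either side, and `indicator=True` adds a `_merge` column recording whether each row was found only in the left frame or in both. A sketch on a few Aabenraa rows copied from the outputs above:

```python
import pandas as pd

# Rows copied from the FOD407_en_group and AULP01_en_group outputs
fertility = pd.DataFrame({
    "AREA": ["Aabenraa"] * 3,
    "YEAR": [2006, 2007, 2008],
    "BIRTH RATE": [2146.1, 2021.9, 2075.0],
})
unemployment = pd.DataFrame({
    "AREA": ["Aabenraa"] * 2,
    "YEAR": [2007, 2008],
    "UNEMPLOYMENT RATE": [3.8, 2.8],
})

# validate raises pandas.errors.MergeError on duplicated keys;
# indicator adds a "_merge" column ('left_only', 'right_only' or 'both')
merged = pd.merge(fertility, unemployment, on=["YEAR", "AREA"], how="left",
                  validate="one_to_one", indicator=True)
print(merged["_merge"].astype(str).tolist())  # ['left_only', 'both', 'both']
```

Counting the `left_only` rows per year on the full data is a quick way to see which years lack unemployment figures.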


We now drop the years 2006 and 2018, since we have no information about the unemployment rate in these years. The fertility rates for the municipalities Læsø, Samsø, Ærø, Fanø and Christiansø are not given because there are too few observations, so these are dropped from the dataset as well. Calling dropna() removes every row with a missing value in either column, which covers both cases.
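Because `dropna()` removes rows with a missing value for any reason, it would also silently drop gaps we did not expect. A sketch that states the two rules explicitly and then checks for leftovers, on hypothetical rows (the Samsø unemployment figure is invented):

```python
import numpy as np
import pandas as pd

# Hypothetical merged rows (the Samsø unemployment figure is invented)
merged = pd.DataFrame({
    "AREA": ["Aabenraa", "Aabenraa", "Samsø"],
    "YEAR": [2006, 2007, 2007],
    "BIRTH RATE": [2146.1, 2021.9, np.nan],
    "UNEMPLOYMENT RATE": [np.nan, 3.8, 4.0],
})

# Rule 1: keep only the years of the study period (2007-2017)
in_period = merged["YEAR"].between(2007, 2017)
# Rule 2: drop rows without a published fertility rate
has_fertility = merged["BIRTH RATE"].notna()
explicit = merged[in_period & has_fertility]

# Any missing value left after the two rules would be an unexpected gap
leftover_gaps = explicit[explicit.isna().any(axis=1)]
print(explicit.index.tolist(), len(leftover_gaps))  # [1] 0
```

When `leftover_gaps` is empty, the explicit filter and `dropna()` keep the same rows.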

In [60]:
merged_data = merged_data.dropna()
merged_data.head(10)

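|    | AREA | YEAR | BIRTH RATE | UNEMPLOYMENT RATE |
|----|------|------|------------|-------------------|
| 1 | Aabenraa | 2007 | 2021.9 | 3.8 |
| 2 | Aabenraa | 2008 | 2075.0 | 2.8 |
| 3 | Aabenraa | 2009 | 2177.6 | 5.2 |
| 4 | Aabenraa | 2010 | 2037.3 | 6.8 |
| 5 | Aabenraa | 2011 | 1962.2 | 6.4 |
| 6 | Aabenraa | 2012 | 2003.8 | 6.9 |
| 7 | Aabenraa | 2013 | 1884.5 | 6.4 |
| 8 | Aabenraa | 2014 | 1868.2 | 5.2 |
| 9 | Aabenraa | 2015 | 1757.9 | 4.7 |
| 10 | Aabenraa | 2016 | 2018.9 | 4.0 |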
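The correlation announced in the introduction can be computed directly on this dataset. A minimal sketch using the Pearson correlation of pandas' `Series.corr`, on the Aabenraa rows shown above; the helper `corr_by_group` is hypothetical:

```python
import pandas as pd

# Aabenraa, 2007-2016, copied from the merged output above
data = pd.DataFrame({
    "AREA": ["Aabenraa"] * 10,
    "YEAR": list(range(2007, 2017)),
    "BIRTH RATE": [2021.9, 2075.0, 2177.6, 2037.3, 1962.2,
                   2003.8, 1884.5, 1868.2, 1757.9, 2018.9],
    "UNEMPLOYMENT RATE": [3.8, 2.8, 5.2, 6.8, 6.4, 6.9, 6.4, 5.2, 4.7, 4.0],
})


def corr_by_group(df: pd.DataFrame, by: str) -> pd.Series:
    """Pearson correlation between the two rates within each group."""
    cols = ["BIRTH RATE", "UNEMPLOYMENT RATE"]
    return df.groupby(by)[cols].apply(
        lambda g: g["BIRTH RATE"].corr(g["UNEMPLOYMENT RATE"])
    )


overall = data["BIRTH RATE"].corr(data["UNEMPLOYMENT RATE"])  # Pearson by default
print(round(overall, 3))
print(corr_by_group(data, "AREA"))  # one value, equal to overall here
```

On the full merged_data, grouping by "YEAR" would give the correlation across municipalities in each year, while grouping by "AREA" gives each municipality's correlation over time. Aggregate rows such as All Denmark may also survive the merge and would need to be excluded for a purely municipal analysis.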
