# Statistics Denmark Data

Overview:
 - DST - RAS301 (RAS: Register-based Workforce Statistics)
     - Employment by region of work (not by residence)
 - Retrieved through the DST API
 - Region-level data, matching the regions in our primary scraped data
 - May be used for:
     - Sanity check
     - First difference analysis

In addition to the data previously obtained by scraping the Jobindex archives, we have gathered supplementary data through the API offered by Statistics Denmark (DST). The resulting dataframe is shown below:


|      | Region Nordjylland   | Region Midtjylland   | Region Syddanmark   | Region Hovedstaden   | Region Sjælland   |
|-----|---------------------|---------------------|--------------------|---------------------|------------------|
| 2016 | 274420               | 636253               | 568033              | 995027               | 325465            |
| 2015 | 270376               | 625794               | 560323              | 974969               | 321685            |
| 2014 | 268616               | 619296               | 556958              | 950965               | 319632            |
| 2013 | 267540               | 613280               | 551530              | 936275               | 318862            |
| 2012 | 269597               | 611487               | 551809              | 927242               | 318994            |
| 2011 | 271664               | 615713               | 557038              | 924653               | 323771            |
| 2010 | 273005               | 617706               | 562768              | 917723               | 328039            |
| 2009 | 273155               | 620708               | 570034              | 922408               | 332640            |
| 2008 | 288348               | 652242               | 600270              | 949840               | 347400            |
  




In contrast to the rather involved procedure of scraping by working out URLs and HTML markup, as was done for the Jobindex data, retrieving data from an API is rather straightforward. In the present case, the API (Application Programming Interface) is a documented interface offered by DST itself, through which all of its published data is available. We therefore do not believe that collecting the data, or the nature of the data, raises any immediate ethical concerns.
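
To make the "straightforward" part concrete, the sketch below composes a request URL from a table id and a set of variable selections. The base URL and the selections are the ones used in the code cells of this notebook; `build_url` and `API_ROOT` are hypothetical names introduced only for illustration, and the sketch only builds the URL without sending a request.

```python
from urllib.parse import urlencode

API_ROOT = "https://api.statbank.dk/v1/data"  # base URL as used in this notebook


def build_url(table_id, selections, lang="en"):
    """Compose a data request URL from a table id and variable selections."""
    query = [("lang", lang)] + list(selections.items())
    # Keep '*' (used in this notebook to select all values) readable; other
    # characters, such as the 'Å' in 'OMRÅDE', are percent-encoded.
    return "{root}/{id}/JSONSTAT?{q}".format(
        root=API_ROOT, id=table_id, q=urlencode(query, safe="*")
    )


# All years for Region Nordjylland (code 081) from table RAS301
url = build_url("RAS301", {"Tid": "*", "OMRÅDE": "081"})
print(url)
```

Percent-encoding the non-ASCII variable name by hand is not strictly required, since `requests` encodes the URL before sending it, but it makes the request easier to log and to paste into a browser.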

The data covers employment by place of work for the years 2008 through 2016, both included. Areas are categorised by the five Danish regions, matching the areas covered by the Jobindex data. All data from DST stems from RAS (Register-based Workforce Statistics), an individual-level statistic that records labour market status at the end of November each year. Because it builds on nationwide labour market registers rather than on a sample, it gives a highly precise picture of the Danish labour market.  
**Perhaps add something on the "realness" of the DST data as contrasted with the "proxy" data from Jobindex?**

By including this supplementary data, we aim to perform comparisons, sanity checks, and possibly cross-validation in our further analysis.
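
As an illustration of the first difference analysis mentioned in the overview, the following is a minimal sketch using pandas. The figures are copied from the table above for two regions and four years; the variable names are chosen for this example only and are not part of the analysis itself.

```python
import pandas as pd

# Employment counts copied from the table above (two regions, 2008-2011),
# arranged in chronological order so that differences run forward in time.
employment = pd.DataFrame(
    {
        "Region Nordjylland": [288348, 273155, 273005, 271664],
        "Region Hovedstaden": [949840, 922408, 917723, 924653],
    },
    index=[2008, 2009, 2010, 2011],
)

# Year-over-year change in absolute numbers; the first year has no
# predecessor and is dropped.
first_diff = employment.diff().dropna()

# The same change expressed in percent of the previous year's level.
pct_change = (employment.pct_change().dropna() * 100).round(2)

print(first_diff)
print(pct_change)
```

Note that the dataframe built in this notebook is ordered with the newest year first, so it would need a `sort_index()` before calling `diff()`; otherwise the signs of the differences are flipped.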

________

Literature:
_RAS-documentation:_
https://www.dst.dk/da/Statistik/dokumentation/statistikdokumentation/registerbaseret-arbejdsstyrkestatistik


_DST API-documentation:_
https://www.dst.dk/da/Statistik/statistikbanken/api


In [21]:
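import requests
import numpy as np
import pandas as pd
from tabulate import tabulate

def get_data(table_id, variables):
    """Fetch a table from the Statistics Denmark API as parsed JSON-stat.

    `variables` is a list of 'NAME=value' selections, e.g. 'Tid=*'.
    """
    base = 'https://api.statbank.dk/v1/data/{id}/JSONSTAT?lang=en'.format(id=table_id)

    # Append each variable selection as a query parameter
    for var in variables:
        base += '&{v}'.format(v=var)

    response = requests.get(base)
    response.raise_for_status()  # Fail on HTTP errors instead of parsing an error body
    return response.json()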


In [13]:
# DST region codes and the column names used in the dataframe
regions = {
    '081': 'Region Nordjylland',
    '082': 'Region Midtjylland',
    '083': 'Region Syddanmark',
    '084': 'Region Hovedstaden',
    '085': 'Region Sjælland',
}

# Hard-coded year labels: this assumes 'Tid=*' returns exactly 2008-2016
indexlabels = [str(year) for year in range(2008, 2017)]

# Build one single-column dataframe per region
dfs = []
for code, name in regions.items():
    data = get_data('RAS301', ['Tid=*', 'OMRÅDE={c}'.format(c=code)])
    dfs.append(pd.DataFrame(data['dataset']['value'], index=indexlabels, columns=[name]))

df = pd.concat(dfs, axis=1)
df = df.reindex(index=df.index[::-1])  # Reverse row order: newest year first
df

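|      | Region Nordjylland   | Region Midtjylland   | Region Syddanmark   | Region Hovedstaden   | Region Sjælland   |
|------|----------------------|----------------------|---------------------|----------------------|-------------------|
| 2016 | 274420               | 636253               | 568033              | 995027               | 325465            |
| 2015 | 270376               | 625794               | 560323              | 974969               | 321685            |
| 2014 | 268616               | 619296               | 556958              | 950965               | 319632            |
| 2013 | 267540               | 613280               | 551530              | 936275               | 318862            |
| 2012 | 269597               | 611487               | 551809              | 927242               | 318994            |
| 2011 | 271664               | 615713               | 557038              | 924653               | 323771            |
| 2010 | 273005               | 617706               | 562768              | 917723               | 328039            |
| 2009 | 273155               | 620708               | 570034              | 922408               | 332640            |
| 2008 | 288348               | 652242               | 600270              | 949840               | 347400            |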
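
The cell above labels the rows with a hard-coded list of years, which only works as long as the API returns exactly those nine observations. A possible alternative, sketched below, reads the year labels from the response itself. The layout of the `sample` payload is an assumption based on the JSON-stat format (a `dimension` entry per variable with a `category` index) and has not been checked against a live response; `time_labels` is a hypothetical helper name.

```python
def time_labels(payload, dim="Tid"):
    """Return the category codes of one dimension, ordered by position."""
    index = payload["dataset"]["dimension"][dim]["category"]["index"]
    if isinstance(index, dict):
        # Mapping of code -> position: sort the codes by their position
        return sorted(index, key=index.get)
    # Otherwise the index is assumed to be an already ordered list of codes
    return list(index)


# Hand-written stand-in for an API response (assumed layout, see above).
# The values are the 2008-2010 figures for Region Nordjylland.
sample = {
    "dataset": {
        "dimension": {
            "Tid": {"category": {"index": {"2008": 0, "2009": 1, "2010": 2}}},
        },
        "value": [288348, 273155, 273005],
    }
}

labels = time_labels(sample)
series = dict(zip(labels, sample["dataset"]["value"]))
print(series)
```

If the assumption about the payload holds, `time_labels(data)` could replace `indexlabels` when constructing each region's dataframe, so that newly published years no longer cause a length mismatch.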


In [23]:
def df_to_markdown(*dfs, sep_line='\n---\n', **kwargs):
    """Print each pandas dataframe as a markdown (pipe) table."""
    import tabulate

    disable_numparse = kwargs.pop('disable_numparse', True)
    tablefmt = kwargs.pop('tablefmt', 'pipe')
    headers = kwargs.pop('headers', 'keys')
    
    for df in dfs:
        print(tabulate.tabulate(df, tablefmt=tablefmt, headers=headers,
                                disable_numparse=disable_numparse, **kwargs))
        if sep_line is not None:
            print(sep_line)
            
            
#df_to_markdown(df)