# Statistics Denmark Data

Overview - lacks:
 - API-scraping
 - Usage: 
  - Sanity check
  - First difference analysis 

In addition to the data previously gained from scraping the archives of Jobindex, we have gathered supplementary data using the API offered by Statistics Denmark (DST). The resulting dataframes are presented below:  

GDP (BNP):

|                                        | 2007        | 2008        | 2009        | 2010        | 2011        | 2012        | 2013        | 2014        | 2015       | 2016        | 2017        |
|:---------------------------------------|:------------|:------------|:------------|:------------|:------------|:------------|:------------|:------------|:-----------|:------------|:------------|
| 2010 prices, chained values (bn DKK)   | 1879.008892 | 1869.388052 | 1777.665635 | 1810.925601 | 1835.133652 | 1839.290226 | 1856.457075 | 1886.520426 | 1916.82888 | 1954.476749 | 1998.975791 |

---


  

Highest level of education completed:


|      | H10 Primary education   | H20 Upper secondary education   | H30 Vocational Education and Training (VET)   | H40 Short cycle higher education   | H50 Vocational bachelors educations   | H60 Bachelors programmes   | H70 Masters programmes   | H80 PhD programmes   |
|:-----|:------------------------|:--------------------------------|:----------------------------------------------|:-----------------------------------|:--------------------------------------|:---------------------------|:-------------------------|:---------------------|
| 2017 | 1064772                 | 400171                          | 1220960                                       | 192492                             | 577111                                | 92867                      | 365171                   | 32252                |
| 2016 | 1085004                 | 390781                          | 1235197                                       | 187986                             | 565252                                | 92193                      | 342964                   | 30121                |
| 2015 | 1101935                 | 376739                          | 1248007                                       | 183151                             | 549623                                | 88097                      | 322129                   | 27424                |
| 2014 | 1125995                 | 364873                          | 1257089                                       | 179361                             | 538315                                | 83445                      | 305861                   | 24548                |
| 2013 | 1148172                 | 353567                          | 1265639                                       | 175299                             | 529218                                | 79491                      | 292015                   | 22274                |
| 2012 | 1169093                 | 341942                          | 1271608                                       | 171649                             | 522103                                | 76516                      | 279730                   | 20347                |
| 2011 | 1186300                 | 332671                          | 1276224                                       | 168142                             | 514202                                | 73344                      | 268105                   | 18571                |
| 2010 | 1197785                 | 327522                          | 1280268                                       | 164640                             | 505990                                | 69208                      | 256719                   | 16904                |
| 2009 | 1207060                 | 324166                          | 1281338                                       | 161422                             | 497912                                | 65821                      | 244863                   | 15477                |
| 2008 | 1204469                 | 322250                          | 1282490                                       | 157421                             | 490644                                | 62197                      | 234119                   | 14139                |
| 2007 | 1214970                 | 320412                          | 1284917                                       | 154031                             | 482492                                | 59779                      | 223024                   | 12987                |

____

The data presented above was gathered using the API offered by DST. An API (Application Programming Interface) is a programmatic interface, offered by the data provider, through which the published data can be requested. The source is openly available and yields official statistical data; usage is permitted provided that DST is properly credited as the source.
 - **Possibly some literature concerning API scraping**
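
As a minimal sketch of what such a request looks like, the snippet below composes a query URL for the same `data` endpoint that the `get_data` function in this notebook calls. The helper name `build_statbank_url` and its parameters are illustrative only and are not part of the DST API; the snippet builds the URL but sends no request.

```python
from urllib.parse import urlencode

def build_statbank_url(table_id, selections, lang="en"):
    """Compose a data URL for a table (hypothetical helper, no network access)."""
    base = "https://api.statbank.dk/v1/data/{id}/JSONSTAT".format(id=table_id)
    # Keep '*' (all values) and ',' (value lists) readable; other special
    # characters, such as the 'Å' in 'OMRÅDE', are percent-encoded.
    query = [("lang", lang)] + list(selections.items())
    return base + "?" + urlencode(query, safe="*,")

url = build_statbank_url("NAN1", {"Tid": "2007,2008", "TRANSAKT": "B1GQK"})
print(url)
```

Each key in `selections` names a variable of the table and each value selects one or more of its codes, which mirrors the `'Tid=*'` style strings used in the cells further down.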

The data that we chose to include covers 1) the development of the Gross Domestic Product (GDP; Danish: BNP) in 2010 prices, chained values, and 2) the development in the highest level of education completed, both for the years 2007 to 2017 inclusive. The data comes from the NAN1 (annual national accounts) and the HFUDD (highest completed education) statistics, respectively.  
  - **Perhaps something concerning the "realness" of the DST data as contrasted with the "proxy" data from Jobindex?**

The supplementary data is included to support comparison, sanity checking, and cross-validation in our further analysis.
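
The first-difference analysis listed in the overview could start from something like the sketch below. It uses the GDP figures from the table above and plain Python, so it runs without the API; the variable names are illustrative, and this is one possible way to set up the comparison rather than the analysis itself.

```python
# GDP in 2010 prices, chained values (billion DKK), copied from the table above.
gdp = {
    2007: 1879.008892, 2008: 1869.388052, 2009: 1777.665635,
    2010: 1810.925601, 2011: 1835.133652, 2012: 1839.290226,
    2013: 1856.457075, 2014: 1886.520426, 2015: 1916.82888,
    2016: 1954.476749, 2017: 1998.975791,
}

years = sorted(gdp)
# First difference: change from the previous year, in billion DKK.
gdp_diff = {y: gdp[y] - gdp[p] for p, y in zip(years, years[1:])}
# Relative change from the previous year.
gdp_growth = {y: gdp[y] / gdp[p] - 1 for p, y in zip(years, years[1:])}

for year in sorted(gdp_diff):
    print(year, round(gdp_diff[year], 3), "{:.2%}".format(gdp_growth[year]))
```

On a pandas Series the same quantities are available through `Series.diff()` and `Series.pct_change()`. Differencing the Jobindex series in the same way would put both sources on a comparable year-on-year scale.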

________

Literature:
_HFUDD-documentation:_
https://www.dst.dk/da/Statistik/dokumentation/statistikdokumentation/hoejest-fuldfoert-uddannelse

_NAN1-documentation:_
https://www.dst.dk/da/Statistik/dokumentation/statistikdokumentation/aarligt-nationalregnskab--hele-oekonomien

_Terms and conditions for using DST data:_
https://www.dst.dk/da/OmDS/omweb#

_DST API-documentation:_
https://www.dst.dk/da/Statistik/statistikbanken/api


In [2]:
import requests
import numpy as np
import pandas as pd

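def get_data(table_id,variables):
    """Fetch a table from the Statistics Denmark API and return the parsed JSON-stat response."""
    base = 'https://api.statbank.dk/v1/data/{id}/JSONSTAT?lang=en'.format(id = table_id)

    # Append each selection, e.g. 'Tid=*' or 'OMRÅDE=081', as a query parameter
    for var in variables:
        base += '&{v}'.format(v = var)

    response=requests.get(base)
    response.raise_for_status() # Fail loudly on HTTP errors instead of parsing an error body
    data_json=response.json()
    return data_json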
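
The cells that follow pair the returned values with a hardcoded list of year labels, which can fall out of step with the data when `Tid=*` starts returning an extra year. The sketch below reads the labels from the response instead. It assumes the `Tid` dimension carries `category.index` and `category.label` entries in the JSON-stat 1.x style (the notebook itself only confirms `category.label` for `HFUDD`), so the layout should be checked against a real response. The `sample` dictionary is a hand-written stand-in, not actual API output, and the helper names are hypothetical.

```python
# Hand-written stand-in for a response; the structure is an assumption.
sample = {
    "dataset": {
        "dimension": {
            "Tid": {
                "category": {
                    "index": {"2008": 0, "2009": 1, "2010": 2},
                    "label": {"2008": "2008", "2009": "2009", "2010": "2010"},
                }
            }
        },
        "value": [288348, 273155, 273005],
    }
}

def ordered_labels(data_json, dim="Tid"):
    """Return the labels of one dimension in the order of the value list."""
    category = data_json["dataset"]["dimension"][dim]["category"]
    index = category["index"]
    # JSON-stat allows 'index' to be a list of codes or a code -> position map.
    codes = list(index) if isinstance(index, list) else sorted(index, key=index.get)
    return [category["label"][code] for code in codes]

def to_series(data_json, dim="Tid"):
    """Pair labels with values for a response that varies along one dimension."""
    labels = ordered_labels(data_json, dim)
    values = data_json["dataset"]["value"]
    if len(labels) != len(values):
        raise ValueError("labels and values differ in length")
    return dict(zip(labels, values))

print(to_series(sample))
```

The resulting dictionary can be passed to `pd.Series` or `pd.DataFrame` in place of the separate `index=indexlabels` argument.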


In [63]:
data5=get_data('RAS301',['Tid=*','OMRÅDE=085']) # Sjælland
data4=get_data('RAS301',['OMRÅDE=084','Tid=*']) # Hovedstaden
data3=get_data('RAS301',['Tid=*','OMRÅDE=083']) # Syddanmark
data2=get_data('RAS301',['Tid=*','OMRÅDE=082']) # Midtjylland
data1=get_data('RAS301',['Tid=*','OMRÅDE=081']) # Nordjylland

indexlabels=['2008','2009','2010','2011','2012','2013','2014','2015','2016']

df1=pd.DataFrame(data1['dataset']['value'],index=indexlabels,columns=['Region Nordjylland'])
df2=pd.DataFrame(data2['dataset']['value'],index=indexlabels,columns=['Region Midtjylland'])
df3=pd.DataFrame(data3['dataset']['value'],index=indexlabels,columns=['Region Syddanmark'])
df4=pd.DataFrame(data4['dataset']['value'],index=indexlabels,columns=['Region Hovedstaden'])
df5=pd.DataFrame(data5['dataset']['value'],index=indexlabels,columns=['Region Sjælland'])

dfs=[df1,df2,df3,df4,df5]
df_Employment_in_Area=pd.concat(dfs,axis=1)
df_Employment_in_Area=df_Employment_in_Area.reindex(index=df_Employment_in_Area.index[::-1]) # Reversing row index
df_Employment_in_Area

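|      | Region Nordjylland | Region Midtjylland | Region Syddanmark | Region Hovedstaden | Region Sjælland |
|:-----|:-------------------|:-------------------|:------------------|:-------------------|:----------------|
| 2016 | 274420             | 636253             | 568033            | 995027             | 325465          |
| 2015 | 270376             | 625794             | 560323            | 974969             | 321685          |
| 2014 | 268616             | 619296             | 556958            | 950965             | 319632          |
| 2013 | 267540             | 613280             | 551530            | 936275             | 318862          |
| 2012 | 269597             | 611487             | 551809            | 927242             | 318994          |
| 2011 | 271664             | 615713             | 557038            | 924653             | 323771          |
| 2010 | 273005             | 617706             | 562768            | 917723             | 328039          |
| 2009 | 273155             | 620708             | 570034            | 922408             | 332640          |
| 2008 | 288348             | 652242             | 600270            | 949840             | 347400          |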


In [173]:
data5=get_data('RAS201',['Tid=*','OMRÅDE=085','SOCIO=50']) # Sjælland
data4=get_data('RAS201',['OMRÅDE=084','Tid=*','SOCIO=50']) # Hovedstaden
data3=get_data('RAS201',['Tid=*','OMRÅDE=083','SOCIO=50']) # Syddanmark
data2=get_data('RAS201',['Tid=*','OMRÅDE=082','SOCIO=50']) # Midtjylland
data1=get_data('RAS201',['Tid=*', 'OMRÅDE=081','SOCIO=50']) # Nordjylland

indexlabels=['2008','2009','2010','2011','2012','2013','2014','2015','2016']

df1=pd.DataFrame(data1['dataset']['value'],index=indexlabels,columns=['Region Nordjylland'])
df2=pd.DataFrame(data2['dataset']['value'],index=indexlabels,columns=['Region Midtjylland'])
df3=pd.DataFrame(data3['dataset']['value'],index=indexlabels,columns=['Region Syddanmark'])
df4=pd.DataFrame(data4['dataset']['value'],index=indexlabels,columns=['Region Hovedstaden'])
df5=pd.DataFrame(data5['dataset']['value'],index=indexlabels,columns=['Region Sjælland'])

dfs=[df1,df2,df3,df4,df5]
df_Unemployment_by_residence=pd.concat(dfs,axis=1)
df_Unemployment_by_residence=df_Unemployment_by_residence.reindex(index=df_Unemployment_by_residence.index[::-1]) # Reversing row index
df_Unemployment_by_residence

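|      | Region Nordjylland | Region Midtjylland | Region Syddanmark | Region Hovedstaden | Region Sjælland |
|:-----|:-------------------|:-------------------|:------------------|:-------------------|:----------------|
| 2016 | 10707              | 19109              | 20452             | 34000              | 13775           |
| 2015 | 11411              | 19503              | 21172             | 34790              | 13963           |
| 2014 | 12084              | 20561              | 22021             | 39002              | 15062           |
| 2013 | 12940              | 24274              | 26377             | 44381              | 17445           |
| 2012 | 13693              | 26283              | 29481             | 46451              | 19290           |
| 2011 | 13420              | 24661              | 27462             | 42233              | 18023           |
| 2010 | 14348              | 26969              | 27850             | 44008              | 18883           |
| 2009 | 15246              | 28284              | 27938             | 40451              | 18139           |
| 2008 | 7556               | 11779              | 12445             | 22971              | 9458            |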


In [5]:
indexlabels_bnp=['2007','2008','2009','2010','2011','2012','2013','2014','2015','2016','2017']
data_bnp=get_data('NAN1',['Tid=2007,2008,2009,2010,2011,2012,2013,2014,2015,2016,2017','TRANSAKT=B1GQK','PRISENHED=LAN_M']) # GDP for Denmark as a whole (2010 prices, chained values)
BNP=pd.DataFrame(data_bnp['dataset']['value'],index=indexlabels_bnp,columns=['2010-priser, kædede værdier, (mia kr.)'])
#BNP=BNP.reindex(index=BNP.index[::-1])
BNP=BNP.transpose()
BNP

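|                                        | 2007        | 2008        | 2009        | 2010        | 2011        | 2012        | 2013        | 2014        | 2015       | 2016        | 2017        |
|:---------------------------------------|:------------|:------------|:------------|:------------|:------------|:------------|:------------|:------------|:-----------|:------------|:------------|
| 2010-priser, kædede værdier, (mia kr.) | 1879.008892 | 1869.388052 | 1777.665635 | 1810.925601 | 1835.133652 | 1839.290226 | 1856.457075 | 1886.520426 | 1916.82888 | 1954.476749 | 1998.975791 |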


In [126]:
#Gathering DST for highest education in DK
HFUDD=['H10','H20','H30','H40','H50','H60','H70','H80']
indexlabels_HFUDD=['2007','2008','2009','2010','2011','2012','2013','2014','2015','2016','2017']
HFUDD_col=[]
HFUDD_dfs=[]

for uddannelse in HFUDD:
    data=get_data('HFUDD10',['Tid=2007,2008,2009,2010,2011,2012,2013,2014,2015,2016,2017',\
                'BOPOMR=000','HFUDD={u}'.format(u=uddannelse)])
    df=pd.DataFrame(data['dataset']['value'],index=indexlabels_HFUDD,columns=\
                   [data['dataset']['dimension']['HFUDD']['category']['label']\
                    ['{u}'.format(u=uddannelse)]])
    HFUDD_dfs.append(df)

DF_HFUDD=pd.concat(HFUDD_dfs,axis=1)
DF_HFUDD=DF_HFUDD.reindex(index=DF_HFUDD.index[::-1]) # Reversing row index
DF_HFUDD

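|      | H10 Primary education   | H20 Upper secondary education   | H30 Vocational Education and Training (VET)   | H40 Short cycle higher education   | H50 Vocational bachelors educations   | H60 Bachelors programmes   | H70 Masters programmes   | H80 PhD programmes   |
|:-----|:------------------------|:--------------------------------|:----------------------------------------------|:-----------------------------------|:--------------------------------------|:---------------------------|:-------------------------|:---------------------|
| 2017 | 1064772                 | 400171                          | 1220960                                       | 192492                             | 577111                                | 92867                      | 365171                   | 32252                |
| 2016 | 1085004                 | 390781                          | 1235197                                       | 187986                             | 565252                                | 92193                      | 342964                   | 30121                |
| 2015 | 1101935                 | 376739                          | 1248007                                       | 183151                             | 549623                                | 88097                      | 322129                   | 27424                |
| 2014 | 1125995                 | 364873                          | 1257089                                       | 179361                             | 538315                                | 83445                      | 305861                   | 24548                |
| 2013 | 1148172                 | 353567                          | 1265639                                       | 175299                             | 529218                                | 79491                      | 292015                   | 22274                |
| 2012 | 1169093                 | 341942                          | 1271608                                       | 171649                             | 522103                                | 76516                      | 279730                   | 20347                |
| 2011 | 1186300                 | 332671                          | 1276224                                       | 168142                             | 514202                                | 73344                      | 268105                   | 18571                |
| 2010 | 1197785                 | 327522                          | 1280268                                       | 164640                             | 505990                                | 69208                      | 256719                   | 16904                |
| 2009 | 1207060                 | 324166                          | 1281338                                       | 161422                             | 497912                                | 65821                      | 244863                   | 15477                |
| 2008 | 1204469                 | 322250                          | 1282490                                       | 157421                             | 490644                                | 62197                      | 234119                   | 14139                |
| 2007 | 1214970                 | 320412                          | 1284917                                       | 154031                             | 482492                                | 59779                      | 223024                   | 12987                |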
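
Absolute head counts from DST and posting counts from Jobindex are on different scales, so a sanity check may be easier on shares. The sketch below is one hedged way to do that: it turns the 2017 and 2008 rows of the table above into shares of the population covered and compares them. The variable names are illustrative, and the choice of years is arbitrary.

```python
levels = ["H10", "H20", "H30", "H40", "H50", "H60", "H70", "H80"]
# Persons per highest completed education, copied from the table above.
counts = {
    2017: [1064772, 400171, 1220960, 192492, 577111, 92867, 365171, 32252],
    2008: [1204469, 322250, 1282490, 157421, 490644, 62197, 234119, 14139],
}

def shares(row):
    """Return each entry of row as a fraction of the row total."""
    total = sum(row)
    return [n / total for n in row]

share_by_year = {year: dict(zip(levels, shares(row))) for year, row in counts.items()}
# Change in share between 2008 and 2017 for each level (positive means growth).
change = {lvl: share_by_year[2017][lvl] - share_by_year[2008][lvl] for lvl in levels}

for lvl in levels:
    print(lvl, "{:+.2%}".format(change[lvl]))
```

The same share calculation applied to education requirements in the Jobindex data would give two series that can be compared directly.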


In [None]:
area_list=['081','082','083','084','085']
year_list=['2008K1','2008K4','2009K4',\
           '2010K4','2011K4','2012K4','2013K4','2014K4',\
           '2015K4','2016K4','2017K4']
age_list=['20','21','22','23','24','25','26','27','28','29','30','31'\
         ,'32','33','34','35','36','37','38','39','40','41','42','43','44'\
         ,'45','46','47','48','49','50','51','52','53','54','55','56','57','58',\
         '59','60','61','62','63','64']

all_df=[]

for area in area_list[:1]: # Only the first region while testing; drop [:1] to cover all regions
    print(area)

    for year in year_list:
        print(year)
        age_sum=[]
        for age in age_list:
            data=get_data('FOLK1A',['Tid={t}'.format(t=year),\
                  'OMRÅDE={o}'.format(o=area),'ALDER={a}'.format(a=age)])
            age_value=data['dataset']['value'] # List of values returned for this selection
            print(age_value)
            age_sum+=age_value # Extend in place; a bare age_sum+age_value discards the result

        age_to_df=sum(age_sum) # Population aged 20-64 for this region and quarter
        df=pd.DataFrame(age_to_df,index=[area]\
                    ,columns=['{t}'.format(t=year)]) # Label the row by region, not by the last age

        all_df.append(df)

#all_df=pd.concat(all_df,axis=1)
#all_df=all_df.reindex(index=all_df.index[::-1]) # Reversing row index
#all_df

data=get_data('FOLK1A',['Tid=2008K1,2008K4,2009K4,2010K4,2011K4,2012K4,2013K4,2014K4,2015K4,2016K4,2017K4'\
                        ,'OMRÅDE=081,082,083,084,085','ALDER=20'])
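
A request that selects several regions, ages, and quarters at once, like the one just above, returns a single flat value list that has to be mapped back to its dimensions. The sketch below shows one way to do that and to sum over age. It assumes the JSON-stat 1.x layout, in which `dimension.id` lists the dimension names and the values are stored in row-major order (the last dimension varies fastest); this should be verified against a real response, which may also contain further dimensions. The `sample` payload and its numbers are invented for illustration, and the helper names are hypothetical.

```python
from itertools import product

# Invented stand-in for a response covering 2 regions x 2 ages x 2 quarters.
sample = {
    "dataset": {
        "dimension": {
            "id": ["OMRÅDE", "ALDER", "Tid"],
            "size": [2, 2, 2],
            "OMRÅDE": {"category": {"index": {"081": 0, "082": 1}}},
            "ALDER": {"category": {"index": {"20": 0, "21": 1}}},
            "Tid": {"category": {"index": {"2008K1": 0, "2008K4": 1}}},
        },
        "value": [100, 101, 110, 111, 200, 201, 210, 211],
    }
}

def ordered_codes(category):
    """Return the category codes in the order used by the value list."""
    index = category["index"]
    return list(index) if isinstance(index, list) else sorted(index, key=index.get)

def cells(data_json):
    """Pair every value with the codes of the dimensions it belongs to."""
    dims = data_json["dataset"]["dimension"]
    names = list(dims["id"])
    code_lists = [ordered_codes(dims[name]["category"]) for name in names]
    # product() iterates with the last dimension varying fastest (row-major).
    coords = product(*code_lists)
    return [dict(zip(names, c), value=v)
            for c, v in zip(coords, data_json["dataset"]["value"])]

def sum_over(data_json, drop):
    """Sum the values over one dimension, keyed by the remaining codes."""
    totals = {}
    for cell in cells(data_json):
        key = tuple(v for k, v in cell.items() if k not in (drop, "value"))
        totals[key] = totals.get(key, 0) + cell["value"]
    return totals

print(sum_over(sample, "ALDER"))
```

If the assumption about the layout holds, one request per region, or a single request for all of them, could replace the separate request per age and quarter made in the loop above.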

In [6]:
def df_to_markdown(*dfs, sep_line='\n---\n', **kwargs):
    """Convert pandas dataframe to markdown table."""
    import tabulate

    disable_numparse = kwargs.pop('disable_numparse', True)
    tablefmt = kwargs.pop('tablefmt', 'pipe')
    headers = kwargs.pop('headers', 'keys')
    
    for df in dfs:
        print(tabulate.tabulate(df, tablefmt=tablefmt, headers=headers,
                                disable_numparse=disable_numparse, **kwargs))
        if sep_line is not None:
            print(sep_line)
            
            
df_to_markdown(BNP)

|                                        | 2007        | 2008        | 2009        | 2010        | 2011        | 2012        | 2013        | 2014        | 2015       | 2016        | 2017        |
|:---------------------------------------|:------------|:------------|:------------|:------------|:------------|:------------|:------------|:------------|:-----------|:------------|:------------|
| 2010-priser, kædede værdier, (mia kr.) | 1879.008892 | 1869.388052 | 1777.665635 | 1810.925601 | 1835.133652 | 1839.290226 | 1856.457075 | 1886.520426 | 1916.82888 | 1954.476749 | 1998.975791 |

---

