# Analysis of Gross national income (GNI)

Imports and set magics:

In [170]:
# First, we import an API from DST, specifying the language
import pydst                          
Dst = pydst.Dst(lang='en')            
# Next, the packages for data analysis are imported
import pandas as pd                   
import numpy as np                    
import matplotlib.pyplot as plt       
import ipywidgets as widgets          

# Autoreloading modules when code is run
%load_ext autoreload
%autoreload 2

# Local modules
import dataproject


The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


# Read and clean data

**Read the data** from DST. We analyze the table `NKN2`, which includes the Danish gross national income by transaction and price unit:

In [171]:
#In order to retrieve the dataset 'NKN2' from DST, we run the following code
Dst.get_data(table_id = 'NKN2')      


Unnamed: 0,TRANSAKT,PRISENHED,SÆSON,TID,INDHOLD
0,B.1*g Gross domestic product,Current prices,Non-seasonally adjusted,1990Q1,210169


**Next, we want to get an overview** of the variables in the dataset:

In [172]:
#To retrieve the different variables of the dataset, we run the following code
Dst.get_variables(table_id = 'NKN2') 

Unnamed: 0,id,text,elimination,time,values
0,TRANSAKT,transaction,False,False,"[{'id': 'B1GQD', 'text': 'B.1*g Gross domestic..."
1,PRISENHED,price unit,False,False,"[{'id': 'V', 'text': 'Current prices'}, {'id':..."
2,SÆSON,seasonal adjustment,False,False,"[{'id': 'N', 'text': 'Non-seasonally adjusted'..."
3,Tid,time,False,True,"[{'id': '1990K1', 'text': '1990Q1'}, {'id': '1..."
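
The `values` column is truncated in the output above. Below is a minimal sketch of how the nested lists could be expanded into one row per option, assuming `values` holds lists of `{'id', 'text'}` dictionaries as the output suggests. The `variables` frame is a hand-built stand-in containing only the entries visible above, not the actual return value of `Dst.get_variables`, and the names `options`, `option_id` and `option_text` are illustrative.

```python
import pandas as pd

# Hand-built stand-in for Dst.get_variables(table_id='NKN2');
# only the entries visible in the truncated output are included.
variables = pd.DataFrame({
    'id': ['TRANSAKT', 'PRISENHED', 'SÆSON'],
    'text': ['transaction', 'price unit', 'seasonal adjustment'],
    'values': [
        [{'id': 'B1GQD', 'text': 'B.1*g Gross domestic product'}],
        [{'id': 'V', 'text': 'Current prices'}],
        [{'id': 'N', 'text': 'Non-seasonally adjusted'}],
    ],
})

# One row per (variable, option): explode the lists, then unpack each dict.
exploded = variables[['id', 'values']].explode('values').reset_index(drop=True)
options = pd.DataFrame({
    'variable': exploded['id'],
    'option_id': exploded['values'].map(lambda d: d['id']),
    'option_text': exploded['values'].map(lambda d: d['text']),
})
print(options)
```

A flat table like this makes it easier to look up the ids that go into the `variables` argument of `Dst.get_data`.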


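**Now, we want to extract the seasonally adjusted GNI for the entire period**. Note that the transaction code `B1GQD` used below is labelled "B.1*g Gross domestic product" in the output, so the series named `GNI` in this notebook holds that transaction: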

In [173]:
#The following code extracts the desired variables and shows the first and last four rows of the dataset

df = Dst.get_data(table_id = 'NKN2',
     variables = {'TRANSAKT':['B1GQD'],
     'PRISENHED':['V'],
     'SÆSON':['Y'],'TID':['*']})        
pd.concat([df.head(4), df.tail(4)])     

Unnamed: 0,TRANSAKT,PRISENHED,SÆSON,TID,INDHOLD
0,B.1*g Gross domestic product,Current prices,Seasonally adjusted,1990Q1,210498
1,B.1*g Gross domestic product,Current prices,Seasonally adjusted,1990Q2,215917
2,B.1*g Gross domestic product,Current prices,Seasonally adjusted,1990Q3,214405
3,B.1*g Gross domestic product,Current prices,Seasonally adjusted,1990Q4,214737
116,B.1*g Gross domestic product,Current prices,Seasonally adjusted,2019Q1,571565
117,B.1*g Gross domestic product,Current prices,Seasonally adjusted,2019Q2,577887
118,B.1*g Gross domestic product,Current prices,Seasonally adjusted,2019Q3,584047
119,B.1*g Gross domestic product,Current prices,Seasonally adjusted,2019Q4,587989


In [174]:
#We now remove redundant variables, so only time and GNI remain.

df1 = df.drop(columns=['TRANSAKT', 'PRISENHED', 'SÆSON'])
pd.concat([df1.head(5), df1.tail(5)])                           

Unnamed: 0,TID,INDHOLD
0,1990Q1,210498
1,1990Q2,215917
2,1990Q3,214405
3,1990Q4,214737
4,1991Q1,222620
115,2018Q4,571330
116,2019Q1,571565
117,2019Q2,577887
118,2019Q3,584047
119,2019Q4,587989


Time and GNI are extracted as series. Missing cells (marked `'..'`) are removed, and the values are converted to float:

In [175]:
#We then extract time and GNI as series. Missing cells ('..') in GNI are removed and the values are converted to float.

Time = df1['TID']
GNI  = df1['INDHOLD']

#Non-inplace calls avoid modifying a slice of df1
GNI = GNI.replace('..', np.nan).dropna()
GNI = GNI.astype(float)

GNI = GNI.reset_index(drop=True)

#Show the first and last rows of the dataset
pd.concat([GNI.head(5), GNI.tail(5)])   

0      210498.0
1      215917.0
2      214405.0
3      214737.0
4      222620.0
115    571330.0
116    571565.0
117    577887.0
118    584047.0
119    587989.0
Name: INDHOLD, dtype: float64
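
An alternative sketch of the cleaning step, assuming missing values appear as the string `'..'` as in the cell above. `pd.to_numeric(..., errors='coerce')` turns any non-numeric entry into `NaN`, so the explicit replacement is not needed. The `raw` series is made-up illustration data, and `clean` is an illustrative name.

```python
import pandas as pd

# Illustrative input: numbers stored as strings, '..' marking a missing cell.
raw = pd.Series(['210498', '215917', '..', '214737'], name='INDHOLD')

# Coerce non-numeric entries to NaN, drop them, and renumber the index.
clean = pd.to_numeric(raw, errors='coerce').dropna().reset_index(drop=True)
print(clean)
```

Dropping rows from the value series alone leaves it out of step with `TID`. If missing cells do occur, dropping them on the full data frame keeps the two columns aligned.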

Next, we drop the first four quarters (all of 1990) from the time variable in order to create a balanced time series data set. We then reset the index and display the first and last observations:

In [176]:
t = Time[4:]
t = t.reset_index(drop=True)        
pd.concat([t.head(5), t.tail(5)])   

0      1991Q1
1      1991Q2
2      1991Q3
3      1991Q4
4      1992Q1
111    2018Q4
112    2019Q1
113    2019Q2
114    2019Q3
115    2019Q4
Name: TID, dtype: object

The time variable `t` now consists of 116 observations (index 0 to 115). Note that `GNI` still holds 120 observations at this point, as the summary below shows.
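
The quarter labels are stored as plain strings. Below is a minimal sketch of how they could be parsed into quarterly periods with `pd.PeriodIndex`, which gives access to the year and quarter and can be converted to timestamps for plotting. The short series stands in for the time variable shown above, and the names `labels`, `quarters` and `dates` are illustrative.

```python
import pandas as pd

# Quarter labels in the format shown above.
labels = pd.Series(['1991Q1', '1991Q2', '1991Q3', '1991Q4', '1992Q1'], name='TID')

# Parse the strings into quarterly periods.
quarters = pd.PeriodIndex(labels, freq='Q')

print(quarters.year.tolist())
print(quarters.quarter.tolist())

# Timestamps at the start of each quarter, convenient as a plot axis.
dates = quarters.to_timestamp()
print(dates[0])
```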

## Growth rate and descriptive stats

**Summarizing the data**:

In [177]:
pd.DataFrame(GNI.describe())

Unnamed: 0,INDHOLD
count,120.0
mean,385179.391667
std,110077.898437
min,210498.0
25%,287099.5
50%,385882.5
75%,474208.5
max,587989.0
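
The section title mentions growth rates, but no growth computation is shown. Below is a hedged sketch of one possibility: the year-over-year growth rate of a quarterly series, computed with `Series.pct_change(periods=4)`. This leaves the first four quarters undefined, which may be why four observations are dropped from the time variable above; that link is an assumption. The input reuses the first five values from the output shown earlier, and `series` and `growth` are illustrative names.

```python
import pandas as pd

# First five quarterly values from the output above (1990Q1 to 1991Q1).
series = pd.Series([210498.0, 215917.0, 214405.0, 214737.0, 222620.0])

# Year-over-year growth in percent: each quarter is compared
# with the same quarter one year (four observations) earlier.
growth = series.pct_change(periods=4) * 100

# The first four quarters have no earlier observation to compare with.
growth = growth.dropna().reset_index(drop=True)
print(growth.round(2))
```

Using `periods=1` instead would give quarter-on-quarter growth and lose only the first observation.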



## Explore data set

To **explore the raw data**, we provide an **interactive plot** showing the employment and income level in each municipality.

The **static plot** is:

In [178]:
#def plot_empl_inc(empl,inc,dataset,municipality):
#
#    if dataset == 'Employment':
#        df = empl
#        y = 'employment'
#    else:
#        df = inc
#        y = 'income'
#
#    I = df['municipality'] == municipality
#    ax = df.loc[I,:].plot(x='year', y=y, style='-o')

The **interactive plot** is:

In [179]:
#widgets.interact(plot_empl_inc,
#
#    empl = widgets.fixed(empl_long),
#    inc = widgets.fixed(inc_long),
#    dataset = widgets.Dropdown(description='Dataset',
#                               options=['Employment','Income']),
#    municipality = widgets.Dropdown(description='Municipality',
#                                    options=empl_long.municipality.unique())
#
#);

ADD SOMETHING HERE IF THE READER SHOULD KNOW THAT E.G. SOME MUNICIPALITY IS SPECIAL.

# Merge data sets

We now create a data set with **municipalities which are in both of our data sets**. We can illustrate this **merge** as:

In [180]:
#plt.figure(figsize=(15,7))
#v = venn2(subsets = (4, 4, 10), set_labels = ('inc', 'empl'))
#v.get_label_by_id('100').set_text('dropped')
#v.get_label_by_id('010').set_text('dropped' )
#v.get_label_by_id('110').set_text('included')
#plt.show()

In [181]:
#merged = pd.merge(empl_long, inc_long, how='inner',on=['municipality','year'])

#print(f'Number of municipalities = {len(merged.municipality.unique())}')
#print(f'Number of years          = {len(merged.year.unique())}')

# Analysis

To get a quick overview of the data, we show some **summary statistics by year**:

In [182]:
#merged.groupby('year').agg(['mean','std']).round(2)

ADD FURTHER ANALYSIS. EXPLAIN THE CODE BRIEFLY AND SUMMARIZE THE RESULTS.

# Conclusion

ADD CONCISE CONCLUSION.