# Analysis of Gross national income (GNI)

Imports and set magics:

```python
# First, we import an API from DST, specifying the language
import pydst
Dst = pydst.Dst(lang='en')

# Next, the packages for data analysis are imported
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import ipywidgets as widgets

# Autoreloading modules when code is run
%load_ext autoreload
%autoreload 2

# Local modules
import dataproject
```




# Read and clean data

**Read the data** from DST (Statistics Denmark). We want to analyze the table named NKN2, which contains the Danish gross national income by transaction, price unit and seasonal adjustment:

```python
# In order to retrieve the dataset 'NKN2' from DST, we run the following code
Dst.get_data(table_id='NKN2')
```


| | TRANSAKT | PRISENHED | SÆSON | TID | INDHOLD |
|---|---|---|---|---|---|
| 0 | B.1\*g Gross domestic product | Current prices | Non-seasonally adjusted | 1990Q1 | 210169 |


**Next, we want to get an overview** of the variables in the dataset:

```python
# To retrieve the different variables of the dataset, we run the following code
Dst.get_variables(table_id='NKN2')
```

| | id | text | elimination | time | values |
|---|---|---|---|---|---|
| 0 | TRANSAKT | transaction | False | False | `[{'id': 'B1GQD', 'text': 'B.1*g Gross domestic...` |
| 1 | PRISENHED | price unit | False | False | `[{'id': 'V', 'text': 'Current prices'}, {'id':...` |
| 2 | SÆSON | seasonal adjustment | False | False | `[{'id': 'N', 'text': 'Non-seasonally adjusted'...` |
| 3 | Tid | time | False | True | `[{'id': '1990K1', 'text': '1990Q1'}, {'id': '1...` |


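**Now, we want to extract the seasonally adjusted GNI for the entire period.** We select transaction `B1GQD` (labelled *B.1\*g Gross domestic product* by DST), current prices (`V`), seasonally adjusted figures (`Y`) and all quarters (`*`):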

```python
# The following code extracts the desired variables and shows the first and last four rows of the dataset
df = Dst.get_data(table_id='NKN2',
                  variables={'TRANSAKT': ['B1GQD'],
                             'PRISENHED': ['V'],
                             'SÆSON': ['Y'],
                             'TID': ['*']})
pd.concat([df.head(4), df.tail(4)])
```

| | TRANSAKT | PRISENHED | SÆSON | TID | INDHOLD |
|---|---|---|---|---|---|
| 0 | B.1\*g Gross domestic product | Current prices | Seasonally adjusted | 1990Q1 | 210498 |
| 1 | B.1\*g Gross domestic product | Current prices | Seasonally adjusted | 1990Q2 | 215917 |
| 2 | B.1\*g Gross domestic product | Current prices | Seasonally adjusted | 1990Q3 | 214405 |
| 3 | B.1\*g Gross domestic product | Current prices | Seasonally adjusted | 1990Q4 | 214737 |
| 116 | B.1\*g Gross domestic product | Current prices | Seasonally adjusted | 2019Q1 | 571565 |
| 117 | B.1\*g Gross domestic product | Current prices | Seasonally adjusted | 2019Q2 | 577887 |
| 118 | B.1\*g Gross domestic product | Current prices | Seasonally adjusted | 2019Q3 | 584047 |
| 119 | B.1\*g Gross domestic product | Current prices | Seasonally adjusted | 2019Q4 | 587989 |


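```python
# We now remove redundant variables, so only time and GNI remain.
# The columns are named with the columns= keyword; passing the axis positionally
# (df.drop('TRANSAKT', 1)) is not accepted by pandas 2.0 and later.
df1 = df.drop(columns=['TRANSAKT', 'PRISENHED', 'SÆSON'])
pd.concat([df1.head(5), df1.tail(5)])
```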

| | TID | INDHOLD |
|---|---|---|
| 0 | 1990Q1 | 210498 |
| 1 | 1990Q2 | 215917 |
| 2 | 1990Q3 | 214405 |
| 3 | 1990Q4 | 214737 |
| 4 | 1991Q1 | 222620 |
| 115 | 2018Q4 | 571330 |
| 116 | 2019Q1 | 571565 |
| 117 | 2019Q2 | 577887 |
| 118 | 2019Q3 | 584047 |
| 119 | 2019Q4 | 587989 |


Time and GNI are extracted as pandas Series. Missing cells, which DST marks as `..`, are removed, and GNI is converted to a float type:

```python
# We then extract time and GNI as pandas Series. Missing cells, marked '..' by DST,
# are replaced with NaN and dropped, and GNI is converted to float.
Time = df1['TID']
GNI = (df1['INDHOLD']
       .replace('..', np.nan)
       .dropna()
       .astype(float)
       .reset_index(drop=True))

# Showing the first and last rows of the dataset
pd.concat([GNI.head(5), GNI.tail(5)])
```

```
0      210498.0
1      215917.0
2      214405.0
3      214737.0
4      222620.0
115    571330.0
116    571565.0
117    577887.0
118    584047.0
119    587989.0
Name: INDHOLD, dtype: float64
```
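The same cleaning can also be written with `pd.to_numeric(..., errors='coerce')`, which converts every entry that cannot be parsed as a number, such as `..`, to NaN in one call. The sketch below is an alternative to the `replace`/`astype` steps above, not the method used in this notebook, and it runs on a small made-up Series rather than the DST data.

```python
import numpy as np
import pandas as pd

# Made-up raw values, stored as strings, with '..' marking a missing cell
raw = pd.Series(['210498', '215917', '..', '214737'], name='INDHOLD')

# Coerce to numbers: anything that cannot be parsed (here '..') becomes NaN
values = pd.to_numeric(raw, errors='coerce')

# Drop the missing cells and renumber the index from 0
clean = values.dropna().reset_index(drop=True)

print(clean.dtype)     # float64, because the NaN forced a float dtype
print(clean.tolist())  # [210498.0, 215917.0, 214737.0]
```

One difference worth noting: `errors='coerce'` also turns any other unexpected text into NaN silently, whereas `astype(float)` raises an error, which can be the safer choice when unknown markers might appear.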

Next, we drop the first four quarters (1990Q1 to 1990Q4) from the time variable so that it starts in 1991Q1, reset the index, and display the first and last observations:

```python
# Drop the four quarters of 1990 and renumber the index from 0
t = Time.iloc[4:].reset_index(drop=True)
pd.concat([t.head(5), t.tail(5)])
```

```
0      1991Q1
1      1991Q2
2      1991Q3
3      1991Q4
4      1992Q1
111    2018Q4
112    2019Q1
113    2019Q2
114    2019Q3
115    2019Q4
Name: TID, dtype: object
```

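The time variable `t` now contains 116 quarters (index 0 to 115, i.e. 1991Q1 to 2019Q4), while `GNI` still covers all 120 quarters from 1990Q1, so the two have to be aligned explicitly before they are plotted together.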
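Slicing `Time` by position works, but it is easy for the time labels and the GNI values to drift out of step. A sketch of an alternative, assuming every `TID` label has the `YYYYQn` form shown above: parse the labels into a pandas `PeriodIndex` and use it as the index of the values, so that selecting by quarter keeps labels and values together. The labels and values below are made up for illustration.

```python
import pandas as pd

# Made-up labels in the same 'YYYYQn' format as the TID column, with made-up values
labels = ['1990Q1', '1990Q2', '1990Q3', '1990Q4', '1991Q1', '1991Q2']
values = [100.0, 101.0, 102.0, 103.0, 104.0, 105.0]

# Parse the labels into quarterly periods and use them as the index
quarters = pd.PeriodIndex(labels, freq='Q')
gni = pd.Series(values, index=quarters, name='GNI')

# Select by quarter rather than by position: values and labels are sliced together
gni_from_1991 = gni.loc[pd.Period('1991Q1', freq='Q'):]
print(gni_from_1991)
```

On the real data the same pattern would apply to `df1['TID']` and `df1['INDHOLD']`, provided no rows are dropped from one column but not the other.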

## Analyzing the dataset

**Now, we are interested in the growth rate and the descriptive statistics of the dataset.** Summarizing the data, we find:

```python
pd.DataFrame(GNI.describe())
```


| | INDHOLD |
|---|---|
| count | 120.0 |
| mean | 385179.391667 |
| std | 110077.898437 |
| min | 210498.0 |
| 25% | 287099.5 |
| 50% | 385882.5 |
| 75% | 474208.5 |
| max | 587989.0 |


**Note:** COMMENT ON THESE FINDINGS

To **explore the raw data**, we now plot the quarterly growth rate of GNI.
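The quarterly growth rate can be written with log differences, as in `(np.exp(np.diff(np.log(GNI))) - 1) * 100`, and this reduces to the simple quarter-on-quarter change. Writing $Y_t$ for GNI in quarter $t$:

$$g_t = \left(e^{\ln Y_t - \ln Y_{t-1}} - 1\right) \times 100 = \left(\frac{Y_t}{Y_{t-1}} - 1\right) \times 100.$$

The sketch below checks this numerically on a made-up series and shows the detail that matters for plotting: `np.diff` returns one value fewer than its input, while `Series.pct_change()` computes the same ratio but keeps the original length and index, with NaN in the first quarter.

```python
import numpy as np
import pandas as pd

# Made-up quarterly levels, not DST data
levels = pd.Series([200.0, 210.0, 205.0, 215.25])

# Log-difference form: one value fewer than the input
growth_log = (np.exp(np.diff(np.log(levels))) - 1) * 100

# Simple ratio form, Y_t / Y_{t-1} - 1
growth_ratio = (levels.to_numpy()[1:] / levels.to_numpy()[:-1] - 1) * 100

# pandas form: same length as the input, NaN in the first quarter
growth_pct = levels.pct_change() * 100

print(len(levels), len(growth_log), len(growth_pct))  # 4 3 4
print(np.allclose(growth_log, growth_ratio))          # True
print(np.allclose(growth_log, growth_pct.iloc[1:]))   # True
```

Because the $n-1$ growth rates start in the second quarter, the matching time labels are the quarters from the second one onwards.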

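```python
# Quarterly growth rate in percent, (GNI_t / GNI_{t-1} - 1) * 100.
# pct_change() gives the same values as (np.exp(np.diff(np.log(GNI))) - 1) * 100
# but keeps the index, so each rate stays attached to its quarter.
Growth = GNI.pct_change() * 100

# Keep the rates from 1991Q1 onwards so they line up with t (116 quarters)
Growth = Growth.iloc[4:].reset_index(drop=True)

# Plotting the figure: one bar per quarter, labelled at the first quarter of each year
x = np.arange(len(t))
plt.figure(figsize=(14, 7))
plt.bar(x, Growth, width=0.8, align='center')
plt.xticks(x[::4], t.iloc[::4], rotation='vertical')
plt.xlabel('Quarter', size=12)
plt.ylabel('Growth rate (%)', size=12)
plt.title('Quarterly growth rate of GNI', size=25)
plt.show()
```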



**Note:** COMMENT ON THESE FINDINGS

# Calculating moving averages

We now want to calculate the moving average: the sum of the latest four quarters of GNI divided by 4. This smooths out the quarterly variation and gives a clearer indication of the overall trend.
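A minimal sketch of the four-quarter moving average described above, using pandas' `Series.rolling(window=4).mean()` on a made-up series. Each value is the mean of the current quarter and the three before it, so the first three quarters have no value (NaN). For this notebook the corresponding call would presumably be `GNI.rolling(window=4).mean()`; that line is a suggestion, not part of the original analysis.

```python
import pandas as pd

# Made-up quarterly values, not DST data
series = pd.Series([100.0, 104.0, 96.0, 100.0, 108.0, 112.0])

# Four-quarter trailing moving average: (x_t + x_{t-1} + x_{t-2} + x_{t-3}) / 4
ma4 = series.rolling(window=4).mean()

# The same average written out explicitly with shifted copies of the series
manual = (series + series.shift(1) + series.shift(2) + series.shift(3)) / 4

print(ma4.tolist())  # [nan, nan, nan, 100.0, 102.0, 104.0]
```

`rolling` is a trailing window by default; passing `center=True` would instead place each average in the middle of its four quarters.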


ADD FURTHER ANALYSIS. EXPLAIN THE CODE BRIEFLY AND SUMMARIZE THE RESULTS.

# Conclusion

ADD CONCISE CONCLUSION.