# Data project

**Our project is titled Data Project and examines whether different economic variables are procyclical or countercyclical relative to GDP. A variable is procyclical when it grows together with GDP and countercyclical when it moves in the opposite direction of GDP. The variables are the components of the national accounts (private consumption, public consumption, investment and net exports) and the employment level. The data cover Denmark in the period 2000-2023. The analysis is a visual comparison of GDP and each of the other variables.**

# Imports

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import ipywidgets as widgets
import bqplot as bq
from IPython.display import display
import plotly.graph_objects as go

%load_ext autoreload
%autoreload 2

# Import data sets

We get the data from statistikbanken.dk/NKN1. Although both data sets come from the same source, we could not export them in a single file, so we combine them manually in Python. We import two files with Danish data from 2000 to 2023: the first contains the components of the national accounts and the second contains the level of employment. Since the files use Danish notation, we tell pandas which symbols are used as the field separator and the decimal mark. Because both files come from the same source, we can run the same cleaning code on both. Lastly, we combine the two files.

In [2]:
# a. We import the datasets
NKN1a = pd.read_csv('NKN1a.csv', sep=";", decimal=",", skiprows=2 )
NKN1a = NKN1a.iloc[:-2]
drop = ['Unnamed: ' + str(num) for num in range(2)]
NKN1a.drop(drop, axis=1, inplace=True)
NKN1a_transposed = NKN1a.T
NKN1a_transposed = NKN1a_transposed.iloc[1:]
NKN1a_transposed = NKN1a_transposed.reset_index()
NKN1a_transposed.rename(columns = {'index':'Date', 0:'Y', 1:'M', 2:'X', 3:'C', 4:'G', 5:'I'}, inplace=True)

NKN1b = pd.read_csv('NKN1b.csv', sep=";", decimal=",", skiprows=2 )
NKN1b = NKN1b.iloc[:-2]
drop = ['Unnamed: ' + str(num) for num in range(2)]
NKN1b.drop(drop, axis=1, inplace=True)
NKN1b_transposed = NKN1b.T
NKN1b_transposed = NKN1b_transposed.iloc[1:]
NKN1b_transposed = NKN1b_transposed.reset_index()
NKN1b_transposed.rename(columns = {'index':'Date', 0:'N'}, inplace=True)

# b. We select the desired columns from both datasets
NKN1a_selected = NKN1a_transposed[['Date','Y', 'M', 'X', 'C', 'G', 'I']]
NKN1b_selected = NKN1b_transposed[['N']]
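Since the cleaning steps are identical for the two files, they could also be wrapped in one function. The sketch below is only an illustration: the name `load_statbank_table` and the miniature sample file are made up, and the sample merely mimics the layout that the cell above assumes (two lines to skip, two unnamed leading columns, a label column, and two footer rows).

```python
import io
import pandas as pd

def load_statbank_table(source, names):
    """Read one exported table and return it with one row per quarter.

    Hypothetical helper; it repeats the steps of the cell above.
    """
    raw = pd.read_csv(source, sep=";", decimal=",", skiprows=2)
    raw = raw.iloc[:-2]                                   # drop the two footer rows
    raw = raw.drop(columns=["Unnamed: 0", "Unnamed: 1"])  # drop the empty leading columns
    tidy = raw.T.iloc[1:].reset_index()                   # quarters become rows; drop the label row
    return tidy.rename(columns={"index": "Date", **dict(enumerate(names))})

# Made-up miniature file that only mimics the assumed layout of the export
sample = io.StringIO(
    "Title line\n"
    "Unit line\n"
    ";;;2000K1;2000K2\n"
    "a;b;GDP;406,0;418,8\n"
    ";;Imports;130,4;134,5\n"
    "Footnote 1\n"
    "Footnote 2\n"
)

demo = load_statbank_table(sample, ["Y", "M"])
print(demo)
```

With the real files the calls would look like `load_statbank_table('NKN1a.csv', ['Y', 'M', 'X', 'C', 'G', 'I'])` and `load_statbank_table('NKN1b.csv', ['N'])`, provided the exports really have the layout assumed here.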


# Data exploration

We concatenate the two datasets and convert the 'Date' column to a purely numeric format. We construct a new variable NX (net exports) from the two variables X and M. We then make an interactive figure to illustrate the different components of the data set.

In [3]:
# a. We combine the selected columns into a new dataset
df = pd.concat([NKN1a_selected, NKN1b_selected], axis=1)

# b. We replace 'K' with 'Q' in 'Date' column
df['Date'] = df['Date'].str.replace('K', 'Q')

# c. We convert 'Date' column to PeriodIndex
df['Date'] = pd.PeriodIndex(df['Date'], freq='Q')

# d. We convert 'Date' column to a decimal year marking the end of the quarter (Q1 -> .25, ..., Q4 -> next year .00)
df['Date'] = df['Date'].map(lambda x: x.year + x.quarter / 4)

# e. We create a new column by subtracting 'M' from 'X' which gives net export 'NX'
df['NX'] = df['X'] - df['M']

# f. To see if the data looks correct
df.head(97)

Unnamed: 0,Date,Y,M,X,C,G,I,N,NX
0,2000.25,406.0,130.4,158.6,184.7,101.5,87.5,2703.7,28.2
1,2000.50,418.8,134.5,167.3,187.5,101.8,92.1,2775.9,32.8
2,2000.75,413.2,137.2,180.4,184.7,103.1,78.9,2771.0,43.2
3,2001.00,439.3,146.7,190.7,195.7,105.8,89.8,2770.0,44.0
4,2001.25,410.9,139.2,179.0,185.5,102.4,79.9,2728.3,39.8
...,...,...,...,...,...,...,...,...,...
92,2023.25,565.1,309.6,369.7,250.8,126.5,122.5,3176.5,60.1
93,2023.50,575.2,327.5,380.2,256.8,133.7,127.4,3222.7,52.7
94,2023.75,564.2,355.6,412.6,246.3,127.6,125.6,3230.7,57.0
95,2024.00,610.8,380.0,450.7,272.4,135.6,122.7,3227.2,70.7
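The date handling in steps b to d can be checked in isolation on a few quarter labels. This is a minimal sketch: the three labels are picked for illustration, and the names `labels`, `periods` and `decimal_year` are not part of the notebook.

```python
import pandas as pd

# Illustrative labels in the Danish notation, where 'K' (kvartal) marks the quarter
labels = pd.Series(['2000K1', '2000K4', '2023K2'])

# Same steps as above: 'K' -> 'Q', parse as quarterly periods, convert to a decimal year
periods = pd.PeriodIndex(labels.str.replace('K', 'Q'), freq='Q')
decimal_year = [p.year + p.quarter / 4 for p in periods]

print(decimal_year)  # [2000.25, 2001.0, 2023.5]
```

The value marks the end of each quarter, which is why the fourth quarter of 2000 appears as 2001.0 and the last row of the table above is dated 2024.00.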


We have now created the desired dataset. To see how it looks, we create an interactive figure that shows the development of each variable. Note that the values are stored as objects, so we have to convert them to floats before creating the figure; otherwise Python would not treat them as numbers.

In [4]:


# We create a dropdown menu with the variable names
dropdown = widgets.Dropdown(options=['Y', 'C', 'G', 'I', 'NX', 'N'])

# We create a dictionary that maps the original variable names to the new names
name_dict = {'Y': 'GDP', 'C': 'Consumption', 'G': 'Public consumption', 'I': 'Investment', 'NX': 'Net exports', 'N': 'Employment level'}

# We create a dictionary for the y-axis labels
ylabel_dict = {'Y': 'Billions in DKK', 'C': 'Billions in DKK', 'G': 'Billions in DKK', 'I': 'Billions in DKK', 'NX': 'Billions in DKK', 'N': 'Thousands of people'}

# We set up the figure
x_sc = bq.LinearScale(min=2000, max=2024)
y_sc = bq.LinearScale()

line = bq.Lines(x=df['Date'], y=[], scales={'x': x_sc, 'y': y_sc})

x_ax = bq.Axis(scale=x_sc)
y_ax = bq.Axis(scale=y_sc, label='', orientation='vertical')

fig = bq.Figure(marks=[line], axes=[x_ax, y_ax], title='', layout=widgets.Layout(width='600px', height='500px'))

# We create a function to update the figure
def update_figure(change):
    df[dropdown.value] = pd.to_numeric(df[dropdown.value], errors='coerce')
    line.y = df[dropdown.value]
    fig.title = f'{name_dict[dropdown.value]} development from 2000 to 2023'
    y_ax.label = ylabel_dict[dropdown.value]

# This line updates the figure as the dropdown value changes
dropdown.observe(update_figure, 'value')

# We display and initialize the figure
display(dropdown)
display(fig)

update_figure(None)


[Interactive output: dropdown with the options Y, C, G, I, NX and N (Y selected), and a line figure with the x-axis running from 2000 to 2024]
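The type conversion inside `update_figure` is needed because transposing a table that mixes text labels and numbers leaves every column with dtype `object`. The toy frame below is made up to show the effect and is not the project's data.

```python
import pandas as pd

# Made-up frame with a text label column next to numeric columns
raw = pd.DataFrame({
    'label': ['GDP', 'Imports'],
    '2000K1': [406.0, 130.4],
    '2000K2': [418.8, 134.5],
})

# After transposing, text and numbers share columns, so pandas falls back to 'object'
transposed = raw.T.iloc[1:]
print(transposed.dtypes)

# pd.to_numeric turns each column into floats; errors='coerce' maps unparsable entries to NaN
numeric = transposed.apply(pd.to_numeric, errors='coerce')
print(numeric.dtypes)
```

Converting all columns once, right after the data set is built, would be an alternative to converting one column at a time inside the callback.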

It is clear that there is a lot of seasonality in our data. Therefore we smooth each variable with a centered rolling mean over five quarters. This means that each value becomes the average of itself and the two values on either side.

In [5]:
# We define the number of periods for the rolling mean
window_size = 5

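# We create a new DataFrame without the 'Date' column and make sure all values are numeric,
# since the columns are stored as objects and the rolling mean needs numeric data
df_without_date = df.drop('Date', axis=1).apply(pd.to_numeric, errors='coerce')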

# We calculate the centered rolling mean for each column
df_rolling = df_without_date.rolling(window_size, center=True).mean()

# We add the 'Date' column back to the DataFrame
df_rolling['Date'] = df['Date']

# We drop rows with NaN values
df_rolling = df_rolling.dropna()
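To see what the centered window does, it may help to apply it to a short series. The numbers below are made up for illustration.

```python
import pandas as pd

# Made-up series of seven observations
values = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])

# Centered window of 5: each value is averaged with its two neighbours on either side
smoothed = values.rolling(5, center=True).mean()
print(smoothed.tolist())  # [nan, nan, 3.0, 4.0, 5.0, nan, nan]

# The first two and last two observations lack a full window and disappear with dropna()
print(len(smoothed.dropna()))  # 3
```

Applied to the project data, the same logic means that the smoothed series loses two quarters at each end of the sample.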

# Analysis

We calculate the growth rates of all the components and assign these new values to a new data set. 

In [6]:
# We convert the columns to numeric, excluding 'Date'
for column in df_rolling.columns:
    if column != 'Date':
        df_rolling[column] = pd.to_numeric(df_rolling[column], errors='coerce')

# We create a new DataFrame for the growth rates
df_growth = pd.DataFrame()
df_growth['Date'] = df_rolling['Date']

# We calculate the growth rates and add them as new columns, excluding 'Date'
for column in df_rolling.columns:
    if column != 'Date':
        df_growth[column + '_growth'] = df_rolling[column].pct_change(fill_method=None) * 100  # Multiply by 100 to get percentage

# We drop rows with NaN values
df_growth = df_growth.dropna()
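The growth rate computed by `pct_change` is the percentage change from the previous quarter, that is (x_t / x_{t-1} - 1) * 100. The sketch below verifies this on made-up levels; `levels`, `growth` and `manual` are illustrative names only.

```python
import pandas as pd

# Made-up levels: up 10 percent, then down 10 percent
levels = pd.Series([100.0, 110.0, 99.0])

# Same call as in the cell above
growth = levels.pct_change(fill_method=None) * 100

# The same calculation written out by hand
manual = (levels / levels.shift(1) - 1) * 100

print(growth.round(6).tolist())
print(manual.round(6).tolist())
```

The first observation has no predecessor, so its growth rate is NaN, which is why the rows with missing values are dropped afterwards.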

We make an interactive figure that shows the growth rate of each component against the growth rate of GDP. To keep the figure readable, we group the growth rates by year and take the average.

In [7]:
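# We recover the calendar year from 'Date'. 'Date' marks the end of each quarter, so Q4 is
# stored as the following year .00; subtracting 0.25 first keeps Q4 in its own year
df_growth['Year'] = (df_growth['Date'] - 0.25).astype(int)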

# We group by 'Year' and calculate the mean
df_growth_yearly = df_growth.groupby('Year').mean().reset_index()

fig = go.Figure()

# We add a trace for 'Y_growth' that is always visible
fig.add_trace(
    go.Bar(name='Y_growth', x=df_growth_yearly['Year'], y=df_growth_yearly['Y_growth'], visible=True)
)

# We add traces for all other variables
variables = ['C_growth', 'G_growth', 'I_growth', 'NX_growth', 'N_growth']
for variable in variables:
    fig.add_trace(
        go.Bar(name=variable, x=df_growth_yearly['Year'], y=df_growth_yearly[variable], visible=(variable=='C_growth'))
    )

# We create a dropdown menu
buttons = []
for i, variable in enumerate(variables):
    visibility = [True] + [False]*i + [True] + [False]*(len(variables)-i-1)
    buttons.append(dict(label=variable, method='update', args=[{'visible': visibility}, {'title': variable  + " compared to Y_growth"}]))

fig.update_layout(
    updatemenus=[
        dict(
            active=0,
            buttons=buttons,
        )
    ],
)

fig.update_layout(title= variables[0] + " compared to Y_growth")

# We show the figure
fig.show()
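The dropdown works by passing one list of booleans per button, with one entry per trace in the order the traces were added. The sketch below rebuilds these lists without plotly to make the pattern visible; the helper name `visibility_mask` is made up for illustration.

```python
variables = ['C_growth', 'G_growth', 'I_growth', 'NX_growth', 'N_growth']

def visibility_mask(selected, n_variables):
    """Return one flag per trace: Y_growth first (always shown), then each variable."""
    return [True] + [j == selected for j in range(n_variables)]

for i, variable in enumerate(variables):
    print(variable, visibility_mask(i, len(variables)))

# The helper gives the same lists as the expression used in the cell above
for i in range(len(variables)):
    original = [True] + [False] * i + [True] + [False] * (len(variables) - i - 1)
    assert visibility_mask(i, len(variables)) == original
```

Each mask has six entries because the figure holds six traces: the GDP bars plus one set of bars per variable.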

From our analysis we are able to conclude that:
* Private consumption is very procyclical, as the growth rates of C and Y are positive and negative in the same periods.
* Public consumption is harder to classify than private consumption. In theory, fiscal policy should make G countercyclical to GDP growth. On the other hand, the government can also spend more during booms.
* Investment is also procyclical, although this is harder to conclude in periods with low growth rates.
* From our plot it is impossible to draw any conclusions about whether net exports are procyclical or countercyclical. One reason could be that net exports depend on economic activity in the rest of the world.
* The employment level is very procyclical.
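The conclusions above rest on a visual comparison. As a possible complement, and not something done in this project, the co-movement with GDP could be summarised by the correlation between each growth rate and GDP growth. The sketch below uses made-up numbers, a hypothetical function `cyclicality`, a hypothetical variable `Z_growth`, and an arbitrary cut-off of 0.2.

```python
import pandas as pd

def cyclicality(growth, reference='Y_growth', threshold=0.2):
    """Label each column by its correlation with the reference growth rate.

    Hypothetical helper; the threshold is an arbitrary assumption.
    """
    corr = growth.drop(columns=[reference]).corrwith(growth[reference])

    def label(r):
        if r > threshold:
            return 'procyclical'
        if r < -threshold:
            return 'countercyclical'
        return 'acyclical'

    return pd.DataFrame({'corr': corr, 'label': corr.map(label)})

# Made-up yearly growth rates, only to show the mechanics
toy = pd.DataFrame({
    'Y_growth': [2.0, 1.0, -3.0, 1.5, 2.5],
    'C_growth': [1.8, 0.9, -2.0, 1.0, 2.0],    # moves with Y
    'Z_growth': [-1.0, -0.5, 2.0, -0.8, -1.2]  # moves against Y
})

result = cyclicality(toy)
print(result)
```

On the project data the call would be `cyclicality(df_growth_yearly.drop(columns=['Year', 'Date']))`, assuming the yearly frame built above. A correlation based on about twenty yearly observations should be read with caution.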