# YOUR PROJECT TITLE

> **Note the following:** 
> 1. This is *not* meant to be an example of an actual **data analysis project**, just an example of how to structure such a project.
> 1. Remember the general advice on structuring and commenting your code
> 1. The `dataproject.py` file includes a function which can be used multiple times in this notebook.

Imports and set magics:

In [1]:
!pip install matplotlib-venn



In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import ipywidgets as widgets
from matplotlib_venn import venn2

# autoreload modules when code is run
%load_ext autoreload
%autoreload 2

# user written modules
import dataproject

# Read and clean data

We are importing our data using an API for Danmarks statistik.

In [3]:
# installing API reader, that will allow to load data from DST.
%pip install git+https://github.com/alemartinello/dstapi
%pip install pandas-datareader

import pandas_datareader # install with `pip install pandas-datareader`
from dstapi import DstApi # install with `pip install git+https://github.com/alemartinello/dstapi`

Note: you may need to restart the kernel to use updated packages.
Collecting git+https://github.com/alemartinello/dstapi
  Cloning https://github.com/alemartinello/dstapi to c:\users\jacob\appdata\local\temp\pip-req-build-7dk0h6wx
  Resolved https://github.com/alemartinello/dstapi to commit d9eeb5a82cbc70b7d63b2ff44d92632fd77123a4
  Preparing metadata (setup.py): started
  Preparing metadata (setup.py): finished with status 'done'


  Running command git clone --filter=blob:none --quiet https://github.com/alemartinello/dstapi 'C:\Users\jacob\AppData\Local\Temp\pip-req-build-7dk0h6wx'


Note: you may need to restart the kernel to use updated packages.


## Explore each data set

In order to be able to **explore the raw data**, you may provide **static** and **interactive plots** to show important developments 

**Interactive plot** :

Explain what you see when moving elements of the interactive plot around. 

In [4]:
forsyningsbalanen = DstApi('NAN1')
tabsum = forsyningsbalanen.tablesummary(language = 'en')
display(tabsum)
for variable in tabsum['variable name']:
    print(variable+':')
    display(forsyningsbalanen.variable_levels(variable, language='en'))

Table NAN1: Demand and supply by transaction, price unit and time
Last update: 2024-03-27T08:00:00


Unnamed: 0,variable name,# values,First value,First value label,Last value,Last value label,Time variable
0,TRANSAKT,31,B1GQK,B.1*g Gross domestic product,EMPM_DC,"Total employment (1,000 persons)",False
1,PRISENHED,6,V_M,"Current prices, (bill. DKK.)",LAN_C,"Pr. capita, 2010-prices, chained values, (1000...",False
2,Tid,58,1966,1966,2023,2023,True


TRANSAKT:


Unnamed: 0,id,text
0,B1GQK,B.1*g Gross domestic product
1,P7K,P.7 Imports of goods and services
2,P71K,P.71 Import of goods
3,P72K,P.72 Import of services
4,TFSPR,Supply
5,P6D,P.6 Exports of goods and services
6,P61D,P.61 Export of goods
7,P62D,P.62 Export of services
8,P31S1MD,P.31 Private consumption
9,P31S14D,P.31 Household consumption expenditure


PRISENHED:


Unnamed: 0,id,text
0,V_M,"Current prices, (bill. DKK.)"
1,LAN_M,"2010-prices, chained values, (bill. DKK.)"
2,L_V,Period-to-period real growth (per cent)
3,V_C,"Pr. capita. Current prices, (1000 DKK.)"
4,L_VB,"Contribution to GDP growth, (percentage point)"
5,LAN_C,"Pr. capita, 2010-prices, chained values, (1000..."


Tid:


Unnamed: 0,id,text
0,1966,1966
1,1967,1967
2,1968,1968
3,1969,1969
4,1970,1970
5,1971,1971
6,1972,1972
7,1973,1973
8,1974,1974
9,1975,1975


In [5]:
kvartal_bnp = DstApi('NKN1')
tabsum2 = kvartal_bnp.tablesummary(language = 'en')
display(tabsum2)
for variable in tabsum['variable name']:
    print(variable+':')
    display(kvartal_bnp.variable_levels(variable, language='en'))

Table NKN1: Demand and supply by transaction, price unit, seasonal adjustment and time
Last update: 2024-03-27T08:00:00


Unnamed: 0,variable name,# values,First value,First value label,Last value,Last value label,Time variable
0,TRANSAKT,31,B1GQK,B.1*g Gross domestic product,EMPM_DC,"Total employment (1,000 persons)",False
1,PRISENHED,6,V_M,"Current prices, (bill. DKK.)",LKV_C,"Pr. capita, 2010-prices, chained values, (1000...",False
2,SÆSON,2,N,Non-seasonally adjusted,Y,Seasonally adjusted,False
3,Tid,136,1990K1,1990Q1,2023K4,2023Q4,True


TRANSAKT:


Unnamed: 0,id,text
0,B1GQK,B.1*g Gross domestic product
1,P7K,P.7 Imports of goods and services
2,P71K,P.71 Import of goods
3,P72K,P.72 Import of services
4,TFSPR,Supply
5,P6D,P.6 Exports of goods and services
6,P61D,P.61 Export of goods
7,P62D,P.62 Export of services
8,P31S1MD,P.31 Private consumption
9,P31S14D,P.31 Household consumption expenditure


PRISENHED:


Unnamed: 0,id,text
0,V_M,"Current prices, (bill. DKK.)"
1,L_V,Period-to-period real growth (per cent)
2,V_C,"Pr. capita. Current prices, (1000 DKK.)"
3,LKV_M,"2010-prices, chained values, (bill. DKK.)"
4,L_VB,"Contribution to GDP growth, (percentage point)"
5,LKV_C,"Pr. capita, 2010-prices, chained values, (1000..."


Tid:


Unnamed: 0,id,text
0,1990K1,1990Q1
1,1990K2,1990Q2
2,1990K3,1990Q3
3,1990K4,1990Q4
4,1991K1,1991Q1
...,...,...
131,2022K4,2022Q4
132,2023K1,2023Q1
133,2023K2,2023Q2
134,2023K3,2023Q3


# Merge data sets

Now you create combinations of your loaded data sets. Remember the illustration of a (inner) **merge**:

In [6]:
#Supply balance BNP
print(dataproject.supply_balance_BNP())

        TID INDHOLD
8568   1990  1288.6
745    1991  1306.6
7080   1992  1332.2
5581   1993  1332.3
9487   1994  1403.3
6697   1995  1445.8
3532   1996  1487.8
8010   1997  1536.3
5767   1998  1570.3
2295   1999  1616.6
7824   2000  1677.2
6325   2001  1691.0
4279   2002  1698.9
6876   2003  1705.5
8196   2004  1751.0
373    2005  1792.0
9673   2006  1862.1
931    2007  1879.0
2416   2008  1869.4
8382   2009  1777.7
1489   2010  1810.9
1675   2011  1835.1
7452   2012  1839.3
5023   2013  1856.5
3346   2014  1886.5
2044   2015  1930.7
10603  2016  1993.4
2788   2017  2049.6
5953   2018  2090.4
1861   2019  2121.6
1117   2020  2070.2
5395   2021  2211.9
187    2022  2272.3
6894   2023  2315.2


In [7]:
#Supply balance import
print(dataproject.supply_balance_import())

        TID INDHOLD
4801   1966   112.2
8904   1967   120.0
10381  1968   127.7
3310   1969   144.0
4987   1970   157.2
10009  1971   159.4
10567  1972   161.3
6661   1973   190.0
9451   1974   186.1
4243   1975   176.9
9090   1976   206.6
2752   1977   208.4
5359   1978   210.4
151    1979   224.4
7416   1980   211.9
10195  1981   213.1
1449   1982   219.8
3124   1983   224.1
4615   1984   236.1
709    1985   259.6
4036   1986   281.6
6840   1987   278.3
6289   1988   290.0
7788   1989   305.7
8718   1990   312.9
895    1991   325.6
7230   1992   325.1
5731   1993   320.5
9637   1994   363.1
9265   1995   388.6
3682   1996   400.7
8160   1997   437.6
5917   1998   470.7
2259   1999   482.7
7974   2000   548.7
6475   2001   561.9
4429   2002   597.7
3850   2003   591.6
8346   2004   633.8
523    2005   705.3
9823   2006   803.9
1081   2007   850.8
2566   2008   891.4
8532   2009   784.9
1639   2010   789.1
1825   2011   847.9
7602   2012   870.9
5173   2013   883.7
3496   2014   918.1


In [8]:
#Supply balance export
print(dataproject.supply_balance_export())

        TID INDHOLD
4783   1966   123.6
8886   1967   128.2
10363  1968   140.7
3292   1969   149.2
4969   1970   154.9
9991   1971   164.7
10549  1972   173.4
6643   1973   187.8
9433   1974   194.2
4225   1975   192.8
9072   1976   199.5
2734   1977   206.6
5341   1978   209.5
133    1979   232.3
7398   1980   245.3
10177  1981   266.5
1431   1982   275.0
3106   1983   287.6
4597   1984   297.1
691    1985   315.0
4018   1986   319.2
6822   1987   334.7
6271   1988   365.3
7770   1989   382.3
8700   1990   407.3
877    1991   432.4
7212   1992   433.6
5713   1993   438.9
9619   1994   475.1
9247   1995   488.8
3664   1996   511.5
8142   1997   534.5
5899   1998   556.4
2241   1999   619.1
7956   2000   696.9
6457   2001   720.3
4411   2002   751.7
3832   2003   742.7
8328   2004   765.1
505    2005   824.2
9805   2006   909.4
1063   2007   942.6
2548   2008   979.1
8514   2009   888.8
1621   2010   914.9
1807   2011   980.8
7584   2012   992.2
5155   2013  1008.1
3478   2014  1039.7


In [9]:
#Supply balance privat
print(dataproject.supply_balance_privat())

        TID INDHOLD
4675   1966   372.8
8778   1967   397.9
10255  1968   412.2
3184   1969   435.8
4861   1970   446.6
9883   1971   450.3
10441  1972   450.6
6535   1973   484.7
9325   1974   473.7
4117   1975   488.5
8964   1976   525.8
2626   1977   536.2
5233   1978   545.5
25     1979   551.1
7290   1980   538.8
10069  1981   532.0
1323   1982   539.5
2998   1983   549.9
4489   1984   569.2
583    1985   594.7
3910   1986   637.1
6714   1987   625.0
6163   1988   613.6
7662   1989   615.5
8592   1990   620.4
769    1991   630.5
7104   1992   646.7
5605   1993   641.1
9511   1994   683.8
9139   1995   695.3
3556   1996   712.4
8034   1997   733.3
5791   1998   750.6
2319   1999   749.7
7848   2000   752.6
6349   2001   754.3
4303   2002   765.2
3724   2003   775.4
8220   2004   811.5
397    2005   841.5
9697   2006   866.3
955    2007   881.6
2440   2008   885.9
8406   2009   855.5
1513   2010   862.2
1699   2011   864.6
7476   2012   869.0
5047   2013   871.5
3370   2014   879.4


In [10]:
#Quarterly BNP, seasonally adjusted
print(dataproject.quarterly_BNP())

          TID INDHOLD
7299   1990Q1      ..
40777  1990Q2      ..
29914  1990Q3      ..
15804  1990Q4      ..
12552  1991Q1   326.6
...       ...     ...
8529   2022Q4   570.1
24692  2023Q1   577.5
11246  2023Q2   572.9
23305  2023Q3   574.9
46360  2023Q4   589.9

[136 rows x 2 columns]


# Analysis

To get a quick overview of the data, we show some **summary statistics** on a meaningful aggregation. 

In [11]:
# 3* flot graf

MAKE FURTHER ANALYSIS. EXPLAIN THE CODE BRIEFLY AND SUMMARIZE THE RESULTS.

# Conclusion

ADD CONCISE CONLUSION.

In [13]:
moving_avg_len = 4

Q_BNP_rollingavg = dataproject.quarterly_BNP_for_mov_avg()

# Convert 'INDHOLD' column to numeric
Q_BNP_rollingavg['INDHOLD'] = pd.to_numeric(Q_BNP_rollingavg['INDHOLD'], errors='coerce')

# Calculate the rolling mean
moving_avg = Q_BNP_rollingavg['INDHOLD'].rolling(window=moving_avg_len).mean()

print(moving_avg)

7299         NaN
40777        NaN
29914        NaN
15804        NaN
12552        NaN
          ...   
8529     568.075
24692    571.950
11246    572.975
23305    573.850
46360    578.800
Name: INDHOLD, Length: 136, dtype: float64
