# YOUR PROJECT TITLE

> **Note the following:** 
> 1. This is *not* meant to be an example of an actual **data analysis project**, just an example of how to structure such a project.
> 1. Remember the general advice on structuring and commenting your code.
> 1. The `dataproject.py` file includes a function which can be used multiple times in this notebook.

Imports and set magics:

In [63]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import ipywidgets as widgets
from dstapi import DstApi # install with `pip install git+https://github.com/alemartinello/dstapi`
pd.set_option('display.float_format', lambda x: '%.2f' % x) # formatting: show floats with two decimals
from matplotlib.dates import date2num
#from matplotlib_venn import venn2

# autoreload modules when code is run
%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


# Read and clean data

In [64]:
GDP = DstApi('NABP10')
INVEST = DstApi('NAN1') 

# a. Tables of the variables
summary_GDP = GDP.tablesummary(language='en')
display(summary_GDP)
summary_INVEST = INVEST.tablesummary(language='en')
display(summary_INVEST)

Table NABP10: 1-2.1.1 Production and generation of income (10a3-grouping) by transaction, industry, price unit and time
Last update: 2023-03-31T08:00:00


|  | variable name | # values | First value | First value label | Last value | Last value label | Time variable |
|---|---|---|---|---|---|---|---|
| 0 | TRANSAKT | 10 | P1K | P.1 Output | B1N2D | B.1n Net value added | False |
| 1 | BRANCHE | 15 | V | Total | VR_S | R_S Arts, entertainment and other services | False |
| 2 | PRISENHED | 2 | V | Current prices | LAN | 2010-prices, chained values | False |
| 3 | Tid | 57 | 1966 | 1966 | 2022 | 2022 | True |


Table NAN1: Demand and supply by transaction, price unit and time
Last update: 2023-03-31T08:00:00


|  | variable name | # values | First value | First value label | Last value | Last value label | Time variable |
|---|---|---|---|---|---|---|---|
| 0 | TRANSAKT | 31 | B1GQK | B.1*g Gross domestic product | EMPM_DC | Total employment (1,000 persons) | False |
| 1 | PRISENHED | 6 | V_M | Current prices, (bill. DKK.) | LAN_C | Pr. capita, 2010-prices, chained values, (1000... | False |
| 2 | Tid | 57 | 1966 | 1966 | 2022 | 2022 | True |


Import your data, either through an API or manually, and load it. 
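If you load data manually instead of through the API, a minimal sketch could look like the one below. The file content, the `;` separator and the `..` placeholder for missing values are assumptions made for illustration; check them against the file you actually download.

```python
import io
import pandas as pd

# Hypothetical CSV content; in a real project, pass a file path to read_csv instead
raw = io.StringIO("TID;INDHOLD\n2020;..\n2021;105.5\n2022;110.0\n")

# na_values turns the assumed placeholder '..' into NaN, so INDHOLD is read as numeric
manual = pd.read_csv(raw, sep=';', na_values=['..'])

print(manual.dtypes)
print(manual)
```

Declaring the missing-value marker at load time avoids a later cleaning step where a text column has to be converted to numbers.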

## Explore each data set

To **explore the raw data**, you may provide **static** and **interactive plots** that show important developments.

**Interactive plot**:

In [65]:
# We show available values for each variable in GDP dataset: 
for variable in summary_GDP['variable name']:
    print(variable+':')
    display(GDP.variable_levels(variable, language='en'))

TRANSAKT:


|  | id | text |
|---|---|---|
| 0 | P1K | P.1 Output |
| 1 | P2D | P.2 Intermediate consumption |
| 2 | B1GD | B.1g Gross value added |
| 3 | D29X39D | D.29-D.39 Other taxes less subsidies on produc... |
| 4 | B1GFD | B.1GF Gross domestic product at factor cost |
| 5 | D1D | D.1 Compensation of employees |
| 6 | B2A3GD | B.2g+B.3g Gross operating surplus and mixed in... |
| 7 | P51CD | P.51c Consumption of fixed capital |
| 8 | B2A3ND | B.2n+B.3n Net operating surplus and mixed income |
| 9 | B1N2D | B.1n Net value added |


BRANCHE:


|  | id | text |
|---|---|---|
| 0 | V | Total |
| 1 | VMEMO | Of which: General government |
| 2 | VA | A Agriculture, forestry and fishing |
| 3 | VB | B Mining and quarrying |
| 4 | VC | C Manufacturing |
| 5 | VD_E | D_E Utility services |
| 6 | VF | F Construction |
| 7 | VG_I | G_I Trade and transport etc. |
| 8 | VJ | J Information and communication |
| 9 | VK | K Financial and insurance |


PRISENHED:


|  | id | text |
|---|---|---|
| 0 | V | Current prices |
| 1 | LAN | 2010-prices, chained values |


Tid:


|  | id | text |
|---|---|---|
| 0 | 1966 | 1966 |
| 1 | 1967 | 1967 |
| 2 | 1968 | 1968 |
| 3 | 1969 | 1969 |
| 4 | 1970 | 1970 |
| 5 | 1971 | 1971 |
| 6 | 1972 | 1972 |
| 7 | 1973 | 1973 |
| 8 | 1974 | 1974 |
| 9 | 1975 | 1975 |


In [66]:
# We show available values for each variable in investment dataset: 
for variable in summary_INVEST['variable name']:
    print(variable+':')
    display(INVEST.variable_levels(variable, language='en'))

TRANSAKT:


|  | id | text |
|---|---|---|
| 0 | B1GQK | B.1*g Gross domestic product |
| 1 | P7K | P.7 Imports of goods and services |
| 2 | P71K | P.71 Import of goods |
| 3 | P72K | P.72 Import of services |
| 4 | TFSPR | Supply |
| 5 | P6D | P.6 Exports of goods and services |
| 6 | P61D | P.61 Export of goods |
| 7 | P62D | P.62 Export of services |
| 8 | P31S1MD | P.31 Private consumption |
| 9 | P31S14D | P.31 Household consumption expenditure |


PRISENHED:


|  | id | text |
|---|---|---|
| 0 | V_M | Current prices, (bill. DKK.) |
| 1 | LAN_M | 2010-prices, chained values, (bill. DKK.) |
| 2 | L_V | Period-to-period real growth (per cent) |
| 3 | V_C | Pr. capita. Current prices, (1000 DKK.) |
| 4 | L_VB | Contribution to GDP growth, (percentage point) |
| 5 | LAN_C | Pr. capita, 2010-prices, chained values, (1000... |


Tid:


|  | id | text |
|---|---|---|
| 0 | 1966 | 1966 |
| 1 | 1967 | 1967 |
| 2 | 1968 | 1968 |
| 3 | 1969 | 1969 |
| 4 | 1970 | 1970 |
| 5 | 1971 | 1971 |
| 6 | 1972 | 1972 |
| 7 | 1973 | 1973 |
| 8 | 1974 | 1974 |
| 9 | 1975 | 1975 |


In [67]:
# Both tables have columns named TRANSAKT and PRISENHED, so the merge on TID
# suffixes them with _x (GDP) and _y (INVEST). The filter must use these names.

# a. Here we define the parameters
pars_GDP = GDP._define_base_params(language='en')
pars_INVEST = INVEST._define_base_params(language='en')

# b. load data through the API
GDP_api = GDP.get_data(params=pars_GDP)
INVEST_api = INVEST.get_data(params=pars_INVEST)

# c. left join data by TID
api = pd.merge(GDP_api, INVEST_api, on='TID', how='left')
api.rename(columns = {'INDHOLD_x':'GDP', 'INDHOLD_y':'INVEST'}, inplace=True)

# d. filter data (regex=False matches the '.' in 'P.1' literally)
I = api['TRANSAKT_x'].str.contains('P.1', regex=False)
I &= api['PRISENHED_x'].str.contains('Current prices', regex=False)
I &= api['TRANSAKT_y'].str.contains('P.5g', regex=False)
I &= api['PRISENHED_y'].str.contains('Period-to-period real growth', regex=False)
api = api.loc[I].copy()

# e. new indexing and atomic types (non-numeric entries become NaN)
api.reset_index(inplace = True, drop = True)
api[['GDP', 'INVEST']] = api[['GDP', 'INVEST']].apply(pd.to_numeric, errors='coerce')
api['TID'] = pd.to_datetime(api['TID'].astype(str), format='%Y')

# f. show data
api.head(5)

In [None]:
def plot_func():
    # Function that operates on data set
    pass

widgets.interact(plot_func, 
    # Let the widget interact with data through plot_func()    
); 



Explain what you see when moving elements of the interactive plot around. 

In [None]:
# Fetch data for production across industries. Compare this with investments. 
# Possibly look at employment and compare this with investments

# Merge data sets

Now you create combinations of your loaded data sets. Remember the illustration of an (inner) **merge**:

In [None]:
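# venn2 comes from the matplotlib-venn package
from matplotlib_venn import venn2

plt.figure(figsize=(15,7))
v = venn2(subsets = (4, 4, 10), set_labels = ('Data X', 'Data Y'))
# relabel the regions: only the intersection survives an inner merge
v.get_label_by_id('100').set_text('dropped')
v.get_label_by_id('010').set_text('dropped')
v.get_label_by_id('110').set_text('included')
plt.show()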

Here we are dropping elements from both data set X and data set Y. A left join would keep all observations in data X intact and subset only from Y. 

Make sure that your resulting data sets have the correct number of rows and columns. That is, be clear about which observations are thrown away. 
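One way to check which observations are thrown away is sketched below. The two small data frames are hypothetical toy data, not the tables loaded above; only the column names `TID`, `GDP` and `INVEST` are borrowed from this notebook.

```python
import pandas as pd

# Hypothetical toy data: keys and values are made up for illustration
data_x = pd.DataFrame({'TID': [2019, 2020, 2021, 2022], 'GDP': [1.0, 2.0, 3.0, 4.0]})
data_y = pd.DataFrame({'TID': [2021, 2022, 2023], 'INVEST': [10.0, 20.0, 30.0]})

# Inner merge: keeps only the keys present in both data sets
inner = pd.merge(data_x, data_y, on='TID', how='inner')

# Left merge: keeps every row of data_x and fills missing INVEST with NaN
left = pd.merge(data_x, data_y, on='TID', how='left')

# Outer merge with indicator=True records where each row came from,
# which makes it easy to list the observations an inner merge drops
outer = pd.merge(data_x, data_y, on='TID', how='outer', indicator=True)
dropped = outer.loc[outer['_merge'] != 'both', ['TID', '_merge']]

print(f'rows: X = {len(data_x)}, Y = {len(data_y)}, inner = {len(inner)}, left = {len(left)}')
print(dropped)
```

Comparing the row counts before and after the merge, and listing the rows marked `left_only` or `right_only`, documents exactly what was lost.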

**Note:** Don't make Venn diagrams in your own data project. It is just for exposition. 

# Analysis

To get a quick overview of the data, we show some **summary statistics** on a meaningful aggregation. 
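A minimal sketch of such summary statistics, assuming the data has one row per industry and year. The data frame `df` and its numbers are hypothetical; replace them with your own merged data set.

```python
import pandas as pd

# Hypothetical toy data standing in for the merged data set
df = pd.DataFrame({
    'BRANCHE': ['Construction', 'Construction', 'Manufacturing', 'Manufacturing'],
    'TID': [2021, 2022, 2021, 2022],
    'GDP': [100.0, 110.0, 300.0, 330.0],
})

# Summary statistics aggregated per industry
summary = df.groupby('BRANCHE')['GDP'].agg(['mean', 'min', 'max'])

# Year-on-year growth in per cent, computed within each industry
df['growth'] = df.groupby('BRANCHE')['GDP'].pct_change() * 100

print(summary)
print(df)
```

Grouping before computing growth keeps the first year of each industry as missing instead of comparing it with the last year of another industry.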

MAKE FURTHER ANALYSIS. EXPLAIN THE CODE BRIEFLY AND SUMMARIZE THE RESULTS.

# Conclusion

ADD CONCISE CONCLUSION.