# YOUR PROJECT TITLE

> **Note the following:** 
> 1. This is *not* meant to be an example of an actual **data analysis project**, just an example of how to structure such a project.
> 1. Remember the general advice on structuring and commenting your code.
> 1. The `dataproject.py` file includes a function which can be used multiple times in this notebook.

Imports and set magics:

In [132]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import ipywidgets as widgets
#from matplotlib_venn import venn2
plt.style.use('seaborn-whitegrid')
#import pydst
#dst = pydst.Dst(lang='en')

# autoreload modules when code is run
%load_ext autoreload
%autoreload 2

# user written modules
import dataproject




# Read and clean data

In [133]:
#Loading the data for GDP and house prices in Denmark from 2012-2022 
df_gdp = pd.read_excel('GDP.xlsx', skiprows=2)
df_hp = pd.read_excel('Houseprices.xlsx', skiprows=2)

In [None]:
#Cleaning data for GDP


In [134]:
#Cleaning data for Houseprices 

#Giving a column title for the municipalities:
df_hp.rename(columns={'Unnamed: 2': 'municipality'}, inplace=True)

#Removing the columns that are not used
del df_hp['Unnamed: 0']
del df_hp['Unnamed: 1']

#Renaming time columns, e.g. 2012K1 -> price2012 q1
timecolumn_dict = {} 
for y in range(2012,2022+1): 
    for k in range(1,4+1): 
        q_from = f'{y}K{k}'
        q_to   = f'price{y} q{k}'
        timecolumn_dict[q_from] = q_to
df_hp = df_hp.rename(columns = timecolumn_dict)

#Dropping missing values
df_hp = df_hp.dropna()


df_hp.head()

|  | municipality | price2012 q1 | price2012 q2 | price2012 q3 | price2012 q4 | price2013 q1 | price2013 q2 | price2013 q3 | price2013 q4 | price2014 q1 | ... | price2020 q3 | price2020 q4 | price2021 q1 | price2021 q2 | price2021 q3 | price2021 q4 | price2022 q1 | price2022 q2 | price2022 q3 | price2022 q4 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Hele landet | 10972 | 11072 | 11084 | 10931 | 11041 | 11167 | 11128 | 11082 | 11100 | ... | 14752 | 15076 | 15736 | 16377 | 16595 | 16528 | 16747 | 16898 | 16412 | 15491 |
| 1 | København | 22289 | 23782 | 23343 | 22932 | 24217 | 24611 | 24334 | 25883 | 25429 | ... | 40458 | 42173 | 45636 | 46966 | 49455 | 47726 | 48374 | 50551 | 48954 | 42327 |
| 2 | Frederiksberg | 27052 | 27321 | 33137 | 34696 | 35653 | 34015 | 30697 | 33955 | 38057 | ... | 54290 | 64237 | 63713 | 74204 | 73317 | 77872 | 80966 | 90034 | 76108 | 54166 |
| 3 | Dragør | 23083 | 19853 | 20733 | 21718 | 22948 | 23766 | 23803 | 23063 | 24022 | ... | 33646 | 31565 | 35614 | 39863 | 37155 | 38977 | 39221 | 39417 | 41378 | 34750 |
| 4 | Tårnby | 17417 | 18042 | 19078 | 18386 | 18540 | 19386 | 21018 | 20756 | 20155 | ... | 29363 | 29819 | 32239 | 36583 | 36839 | 34563 | 35810 | 35703 | 33769 | 29328 |


In [137]:
#Producing a summary statistics for house prices
df_hp.describe()

|  | price2013 q1 | price2016 q2 | price2016 q3 | price2016 q4 | price2017 q3 | price2017 q4 | price2018 q1 | price2018 q3 | price2019 q1 | price2019 q2 | price2019 q3 | price2019 q4 | price2020 q2 | price2020 q4 | price2021 q1 | price2021 q2 | price2021 q3 | price2022 q1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 99.0 | 99.0 | 99.0 | 99.0 | 99.0 | 99.0 | 99.0 | 99.0 | 99.0 | 99.0 | 99.0 | 99.0 | 99.0 | 99.0 | 99.0 | 99.0 | 99.0 | 99.0 |
| mean | 11584.212121 | 13308.20202 | 13398.494949 | 13333.464646 | 14044.808081 | 14073.151515 | 14257.212121 | 14655.434343 | 14515.525253 | 14950.555556 | 15027.79798 | 15086.212121 | 15326.353535 | 16156.929293 | 16876.484848 | 17797.757576 | 18007.666667 | 18288.494949 |
| std | 6151.255638 | 8045.081508 | 8077.341822 | 8136.096769 | 8869.715439 | 8942.745513 | 8919.626617 | 8987.988865 | 9044.633961 | 9236.066443 | 9286.963506 | 9612.462419 | 9675.658839 | 10839.403469 | 11376.872476 | 12561.635234 | 12827.987292 | 13375.719587 |
| min | 4591.0 | 3753.0 | 3542.0 | 3341.0 | 4015.0 | 4041.0 | 4270.0 | 4885.0 | 3845.0 | 3577.0 | 3782.0 | 3300.0 | 3860.0 | 4088.0 | 4553.0 | 5134.0 | 4746.0 | 4872.0 |
| 25% | 7051.0 | 7516.0 | 7317.0 | 7141.0 | 7499.5 | 7291.5 | 7472.5 | 7678.5 | 7515.5 | 7796.0 | 7892.5 | 7987.0 | 7955.5 | 8364.0 | 8348.0 | 8826.5 | 8800.5 | 8683.0 |
| 50% | 9591.0 | 10465.0 | 10570.0 | 10560.0 | 10857.0 | 10651.0 | 11158.0 | 11458.0 | 11228.0 | 11722.0 | 11660.0 | 11738.0 | 12061.0 | 11972.0 | 12297.0 | 13131.0 | 12833.0 | 13136.0 |
| 75% | 15836.5 | 19116.0 | 19488.0 | 19382.5 | 20122.5 | 19739.0 | 20351.5 | 20486.0 | 20970.0 | 21202.5 | 20890.5 | 21494.0 | 21605.0 | 23290.5 | 24441.5 | 24928.0 | 25350.0 | 25932.0 |
| max | 35653.0 | 44624.0 | 42008.0 | 42545.0 | 52597.0 | 53700.0 | 49203.0 | 49540.0 | 48487.0 | 50732.0 | 48625.0 | 53979.0 | 52770.0 | 64237.0 | 63713.0 | 74204.0 | 73317.0 | 80966.0 |
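The summary covers only 18 of the 44 quarterly price columns. By default `describe()` reports numeric columns only, so the remaining columns were presumably read as text. A minimal sketch of one possible fix follows, assuming the cause is a placeholder string for missing values. The `'..'` marker, the toy frame `df_demo`, and its values are hypothetical and only illustrate the idea.

```python
import pandas as pd

# Hypothetical toy frame: one price column is numeric, the other was read as text
# because it contains an assumed '..' placeholder for a missing value.
df_demo = pd.DataFrame({
    'municipality': ['A', 'B', 'C'],
    'price2012 q1': [100, 200, 300],
    'price2012 q2': ['110', '..', '330'],
})

# Select every price column by its name prefix
price_cols = [c for c in df_demo.columns if c.startswith('price')]

# Convert to numbers; anything that cannot be parsed becomes NaN
df_demo[price_cols] = df_demo[price_cols].apply(pd.to_numeric, errors='coerce')

# Drop rows with missing prices only after the conversion
df_clean = df_demo.dropna(subset=price_cols)

# describe() now includes both price columns
print(df_clean.describe())
```

Checking `df_hp.dtypes` before and after such a conversion shows which columns were affected.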


## Explore each data set

To **explore the raw data**, you may provide **static** and **interactive plots** that show important developments.

**Interactive plot**:

In [135]:
def plot_func():
    # Function that operates on data set
    pass

widgets.interact(plot_func, 
    # Let the widget interact with data through plot_func()    
); 



Explain what you see when moving elements of the interactive plot around. 
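As a hedged illustration of what `plot_func` could look like for the house price data, the sketch below plots the quarterly prices of one municipality. The names `df_demo`, `price_series`, and `plot_prices` are hypothetical. The toy frame reuses the 2012 values shown in the `df_hp.head()` output and stands in for the full `df_hp`.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Stand-in for df_hp, using the 2012 values from the head() output
df_demo = pd.DataFrame({
    'municipality': ['København', 'Frederiksberg'],
    'price2012 q1': [22289, 27052],
    'price2012 q2': [23782, 27321],
    'price2012 q3': [23343, 33137],
    'price2012 q4': [22932, 34696],
})

def price_series(df, municipality):
    """Return the quarterly prices of one municipality, indexed by quarter."""
    row = df.loc[df['municipality'] == municipality].drop(columns='municipality')
    series = row.iloc[0]
    # Shorten the labels: 'price2012 q1' -> '2012 q1'
    series.index = series.index.str.replace('price', '', regex=False)
    return series

def plot_prices(municipality):
    """Plot the price development for the chosen municipality."""
    series = price_series(df_demo, municipality)
    fig, ax = plt.subplots(figsize=(8, 4))
    ax.plot(series.index, series.values, marker='o')
    ax.set_title(f'House prices in {municipality}')
    ax.set_xlabel('Quarter')
    ax.set_ylabel('Price')
    plt.show()
```

In a notebook cell, `widgets.interact(plot_prices, municipality=list(df_demo['municipality']))` would then render a dropdown that redraws the figure whenever another municipality is selected.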

# Merge data sets

Now create combinations of your loaded data sets. Remember the illustration of an (inner) **merge**:

In [136]:
# venn2 comes from the third-party matplotlib-venn package
from matplotlib_venn import venn2

plt.figure(figsize=(15,7))
v = venn2(subsets = (4, 4, 10), set_labels = ('Data X', 'Data Y'))
v.get_label_by_id('100').set_text('dropped')
v.get_label_by_id('010').set_text('dropped')
v.get_label_by_id('110').set_text('included')
plt.show()

Here we drop observations from both data set X and data set Y. A left join would instead keep every observation in X and add only the matching observations from Y.

Make sure that your resulting data sets have the correct number of rows and columns. That is, be clear about which observations are thrown away. 
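The difference between an inner and a left merge, and the row-count check, can be sketched with pandas as follows. The frames `data_x` and `data_y` are hypothetical toy data: the prices are the 2012 q1 values from the `df_hp.head()` output, while `value_y` is a made-up placeholder column.

```python
import pandas as pd

# Hypothetical toy data sets sharing the key column 'municipality'
data_x = pd.DataFrame({
    'municipality': ['København', 'Frederiksberg', 'Dragør'],
    'price': [22289, 27052, 23083],
})
data_y = pd.DataFrame({
    'municipality': ['København', 'Dragør', 'Tårnby'],
    'value_y': [1, 2, 3],  # placeholder values, not real data
})

# Inner merge: keep only municipalities present in both data sets
inner = pd.merge(data_x, data_y, on='municipality', how='inner')

# Left merge: keep every row of data_x; unmatched rows get NaN in value_y
left = pd.merge(data_x, data_y, on='municipality', how='left')

# Outer merge with an indicator column to see which rows an inner merge drops
outer = pd.merge(data_x, data_y, on='municipality', how='outer', indicator=True)
dropped = outer.loc[outer['_merge'] != 'both', ['municipality', '_merge']]

print(inner.shape, left.shape)
print(dropped)
```

The `_merge` column marks each row as `left_only`, `right_only`, or `both`, which makes it explicit which observations are thrown away.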

**Note:** Don't make Venn diagrams in your own data project. It is just for exposition. 

# Analysis

To get a quick overview of the data, we show some **summary statistics** on a meaningful aggregation. 
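One possible aggregation is the annual average price, computed from the quarterly columns. The sketch below assumes the `price<year> q<quarter>` column naming used above. The toy frame `df_demo` is hypothetical and reuses the 2012 and 2013 values for København and Dragør from the `df_hp.head()` output.

```python
import pandas as pd

# Stand-in for df_hp with eight quarterly columns
df_demo = pd.DataFrame({
    'municipality': ['København', 'Dragør'],
    'price2012 q1': [22289, 23083], 'price2012 q2': [23782, 19853],
    'price2012 q3': [23343, 20733], 'price2012 q4': [22932, 21718],
    'price2013 q1': [24217, 22948], 'price2013 q2': [24611, 23766],
    'price2013 q3': [24334, 23803], 'price2013 q4': [25883, 23063],
})

# Reshape from wide (one column per quarter) to long (one row per quarter)
long = df_demo.melt(id_vars='municipality', var_name='period', value_name='price')

# Extract the year from labels such as 'price2012 q1'
long['year'] = long['period'].str.extract(r'price(\d{4})', expand=False).astype(int)

# Annual average price per municipality, one column per year
annual = long.groupby(['municipality', 'year'])['price'].mean().unstack('year')

# Summary statistics across municipalities for each year
summary = long.groupby('year')['price'].agg(['mean', 'min', 'max'])

print(annual)
print(summary)
```

Working in long format keeps the aggregation independent of how many quarters the data set contains.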

MAKE FURTHER ANALYSIS. EXPLAIN THE CODE BRIEFLY AND SUMMARIZE THE RESULTS.

# Conclusion

ADD CONCISE CONCLUSION.