# YOUR PROJECT TITLE

> **Note the following:** 
> 1. This is *not* meant to be an example of an actual **data analysis project**, just an example of how to structure such a project.
> 1. Remember the general advice on structuring and commenting your code.
> 1. The `dataproject.py` file includes a function which can be used multiple times in this notebook.

Imports and set magics:

In [20]:
import pandas as pd
import numpy as np
import datetime

import matplotlib.pyplot as plt
plt.rcParams.update({"axes.grid":True,"grid.color":"black","grid.alpha":"0.25","grid.linestyle":"--"})
plt.rcParams.update({'font.size': 14})
import ipywidgets as widgets

from dstapi import DstApi # install with `pip install git+https://github.com/alemartinello/dstapi`

# autoreload modules when code is run
%load_ext autoreload
%autoreload 2

# user written modules
import dataproject

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


# Read and clean data

In [49]:
data = DstApi('EJ55') 

Import your data, either through an API or manually, and load it. 

## Explore each data set

In [50]:
tabsum = data.tablesummary(language='en')
display(tabsum)

Table EJ55: Price index for sales of property by region, category of real property, unit and time
Last update: 2023-03-31T08:00:00


Unnamed: 0,variable name,# values,First value,First value label,Last value,Last value label,Time variable
0,OMRÅDE,17,000,All Denmark,11,Province Nordjylland,False
1,EJENDOMSKATE,3,0111,One-family houses,2103,"Owner-occupied flats, total",False
2,TAL,3,100,Index,310,Percentage change compared to same quarter the...,False
3,Tid,124,1992K1,1992Q1,2022K4,2022Q4,True


In [51]:
# The available values for each variable:
for variable in tabsum['variable name']:
    print(variable+':')
    display(data.variable_levels(variable, language='en'))

OMRÅDE:


Unnamed: 0,id,text
0,0,All Denmark
1,84,Region Hovedstaden
2,1,Province Byen København
3,2,Province Københavns omegn
4,3,Province Nordsjælland
5,4,Province Bornholm
6,85,Region Sjælland
7,5,Province Østsjælland
8,6,Province Vest- og Sydsjælland
9,83,Region Syddanmark


EJENDOMSKATE:


Unnamed: 0,id,text
0,111,One-family houses
1,801,Weekend cottages
2,2103,"Owner-occupied flats, total"


TAL:


Unnamed: 0,id,text
0,100,Index
1,210,Percentage change compared to previous quarter
2,310,Percentage change compared to same quarter the...


Tid:


Unnamed: 0,id,text
0,1992K1,1992Q1
1,1992K2,1992Q2
2,1992K3,1992Q3
3,1992K4,1992Q4
4,1993K1,1993Q1
...,...,...
119,2021K4,2021Q4
120,2022K1,2022Q1
121,2022K2,2022Q2
122,2022K3,2022Q3


**Write that we can see that the data set contains x, y and z, which we are not interested in, and so on.**

In [52]:
params = data._define_base_params(language='en')
params

{'table': 'ej55',
 'format': 'BULK',
 'lang': 'en',
 'variables': [{'code': 'OMRÅDE', 'values': ['*']},
  {'code': 'EJENDOMSKATE', 'values': ['*']},
  {'code': 'TAL', 'values': ['*']},
  {'code': 'Tid', 'values': ['*']}]}

In [66]:
params = {'table': 'ej55',
 'format': 'BULK',
 'lang': 'en',
 'variables': [{'code': 'OMRÅDE', 'values': ['000']},
  {'code': 'EJENDOMSKATE', 'values': ['*']},
  {'code': 'TAL', 'values': ['100', '210']},
  {'code': 'Tid', 'values': ['*']}]}

In [67]:
sales_api = data.get_data(params=params)
sales_api.sort_values(by=['OMRÅDE', 'TID', 'EJENDOMSKATE'], inplace=True)
sales_api.head(10)

Unnamed: 0,OMRÅDE,EJENDOMSKATE,TAL,TID,INDHOLD
324,All Denmark,One-family houses,Index,1992Q1,31.5
325,All Denmark,One-family houses,Percentage change compared to previous quarter,1992Q1,..
328,All Denmark,"Owner-occupied flats, total",Index,1992Q1,23.7
329,All Denmark,"Owner-occupied flats, total",Percentage change compared to previous quarter,1992Q1,..
326,All Denmark,Weekend cottages,Index,1992Q1,29.5
327,All Denmark,Weekend cottages,Percentage change compared to previous quarter,1992Q1,..
636,All Denmark,One-family houses,Index,1992Q2,31.5
637,All Denmark,One-family houses,Percentage change compared to previous quarter,1992Q2,0.0
640,All Denmark,"Owner-occupied flats, total",Index,1992Q2,23.2
641,All Denmark,"Owner-occupied flats, total",Percentage change compared to previous quarter,1992Q2,-2.1


In [101]:
sales_api.reset_index(inplace = True, drop = True)

In [102]:
sales_api.rename(columns = {'OMRÅDE':'REGION', 'EJENDOMSKATE':'CATEGORY', 'TAL':'UNIT', 'TID':'TIME', 'INDHOLD':'VALUE'}, inplace=True)
sales_api.head(5)

Unnamed: 0,REGION,CATEGORY,UNIT,TIME,VALUE
0,All Denmark,One-family houses,Index,1992Q1,31.5
1,All Denmark,One-family houses,Percentage change compared to previous quarter,1992Q1,..
2,All Denmark,"Owner-occupied flats, total",Index,1992Q1,23.7
3,All Denmark,"Owner-occupied flats, total",Percentage change compared to previous quarter,1992Q1,..
4,All Denmark,Weekend cottages,Index,1992Q1,29.5


In [107]:
sales_api = sales_api.replace('..', np.nan)
sales_api.head(5)

Unnamed: 0,REGION,CATEGORY,UNIT,TIME,VALUE
0,All Denmark,One-family houses,Index,1992Q1,31.5
1,All Denmark,One-family houses,Percentage change compared to previous quarter,1992Q1,
2,All Denmark,"Owner-occupied flats, total",Index,1992Q1,23.7
3,All Denmark,"Owner-occupied flats, total",Percentage change compared to previous quarter,1992Q1,
4,All Denmark,Weekend cottages,Index,1992Q1,29.5


In [108]:
sales_api.describe()

Unnamed: 0,REGION,CATEGORY,UNIT,TIME,VALUE
count,744,744,744,744,741.0
unique,1,3,2,124,409.0
top,All Denmark,One-family houses,Index,1992Q1,0.0
freq,744,248,372,6,12.0


In [109]:
sales_api.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 744 entries, 0 to 743
Data columns (total 5 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   REGION    744 non-null    object
 1   CATEGORY  744 non-null    object
 2   UNIT      744 non-null    object
 3   TIME      744 non-null    object
 4   VALUE     741 non-null    object
dtypes: object(5)
memory usage: 29.2+ KB


In [110]:
sales_api.VALUE = sales_api.VALUE.astype('float')
sales_api.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 744 entries, 0 to 743
Data columns (total 5 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   REGION    744 non-null    object 
 1   CATEGORY  744 non-null    object 
 2   UNIT      744 non-null    object 
 3   TIME      744 non-null    object 
 4   VALUE     741 non-null    float64
dtypes: float64(1), object(4)
memory usage: 29.2+ KB
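As a possible alternative to the two-step cleaning above (replacing `'..'` with `np.nan` and then casting to `float`), `pd.to_numeric` with `errors='coerce'` can do both in one call. The snippet below is only a sketch on a small made-up series, not the actual `sales_api` data:

```python
import pandas as pd

# Made-up values mimicking the raw VALUE column, where '..' marks a missing observation
raw = pd.Series(['31.5', '..', '23.7', '-2.1'], name='VALUE')

# errors='coerce' turns anything that cannot be parsed as a number into NaN
clean = pd.to_numeric(raw, errors='coerce')

print(clean.dtype)         # numeric dtype instead of object
print(clean.isna().sum())  # number of entries that could not be parsed
```

Note that coercion silently converts every unparsable entry to `NaN`, so it is worth checking the count of missing values afterwards.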


In [None]:
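# Base values for re-indexing to 2019Q1 = 100: the index level in 2019Q1 for each category
base_rows = sales_api[(sales_api['UNIT'] == 'Index') & (sales_api['TIME'] == '2019Q1')]
index_2019Q1 = base_rows.set_index('CATEGORY')['VALUE']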

Above, we want to re-index the series so that 2019Q1 = 100. That lets us compare (nominal) housing prices with the period before COVID-19.

We may need a loop that, for each housing category, divides the index value in each quarter by that category's index value in 2019Q1.
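The rebasing described above does not necessarily require an explicit loop. One possible approach, sketched here on made-up numbers, looks up each category's 2019Q1 value and divides through with `Series.map`. The helper `rebase_index` and the column `VALUE_REBASED` are hypothetical names introduced for illustration; the other column names follow the renamed table.

```python
import pandas as pd

# Toy data with the same columns as the cleaned table (values are made up)
df = pd.DataFrame({
    'CATEGORY': ['Houses'] * 3 + ['Flats'] * 3,
    'UNIT': ['Index'] * 6,
    'TIME': ['2018Q4', '2019Q1', '2019Q2'] * 2,
    'VALUE': [76.0, 80.0, 88.0, 45.0, 50.0, 55.0],
})

def rebase_index(df, base_period='2019Q1'):
    """Return the 'Index' rows with an extra column rebased so base_period = 100."""
    idx = df.loc[df['UNIT'] == 'Index'].copy()
    # One base value per category: the index level in the base period
    base = idx.loc[idx['TIME'] == base_period].set_index('CATEGORY')['VALUE']
    # Map each row's category to its base value and rescale
    idx['VALUE_REBASED'] = idx['VALUE'] / idx['CATEGORY'].map(base) * 100
    return idx

rebased = rebase_index(df)
print(rebased)
```

Only the `Index` rows are rebased, since percentage changes are unaffected by the choice of base period.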

In order to be able to **explore the raw data**, you may provide **static** and **interactive plots** to show important developments 
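For a static plot, it can help to reshape the long table into a wide one with one column per category and a proper quarterly index. The following is a minimal sketch on made-up numbers, assuming the column names of the renamed table; `wide` is a hypothetical variable name.

```python
import pandas as pd

# Toy long-format data with the same columns as the cleaned table (values are made up)
df = pd.DataFrame({
    'CATEGORY': ['Houses'] * 3 + ['Flats'] * 3,
    'UNIT': ['Index'] * 6,
    'TIME': ['2018Q4', '2019Q1', '2019Q2'] * 2,
    'VALUE': [76.0, 80.0, 88.0, 45.0, 50.0, 55.0],
})

# Keep the index series and pivot to one column per category
wide = df[df['UNIT'] == 'Index'].pivot(index='TIME', columns='CATEGORY', values='VALUE')

# Parse strings such as '2019Q1' into quarterly periods so the time axis is ordered
wide.index = pd.PeriodIndex(wide.index, freq='Q')

print(wide)
```

With this layout, calling `wide.plot()` should draw one line per category on a shared time axis.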

**Interactive plot** :

In [141]:
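def plot_value(df, category, unit):
    """Plot VALUE over TIME for a single property category and unit."""
    # Boolean mask selecting the rows for the chosen category and unit
    I = (df['CATEGORY'] == category) & (df['UNIT'] == unit)
    df.loc[I, :].plot(x='TIME', y='VALUE', style='-', legend=False)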

widgets.interact(plot_value, 
    df = widgets.fixed(sales_api),
    category = widgets.Dropdown(description='Category', 
                                    options=sales_api.CATEGORY.unique(), 
                                    value='One-family houses'),
    unit = widgets.Dropdown(description='Unit', 
                                    options=sales_api.UNIT.unique(), 
                                    value='Index')
)


interactive(children=(Dropdown(description='Category', options=('One-family houses', 'Owner-occupied flats, to…

<function __main__.plot_value(df, category, unit)>

Explain what you see when moving elements of the interactive plot around. 

# Merge data sets

# Analysis

To get a quick overview of the data, we show some **summary statistics** on a meaningful aggregation. 
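One way such summary statistics could be computed is with a `groupby` on the property category, restricted to the index series. This is a sketch on made-up numbers, assuming the column names of the renamed table; `summary` is a hypothetical variable name.

```python
import pandas as pd

# Toy data with the same columns as the cleaned table (values are made up)
df = pd.DataFrame({
    'CATEGORY': ['Houses'] * 3 + ['Flats'] * 3,
    'UNIT': ['Index'] * 6,
    'TIME': ['2018Q4', '2019Q1', '2019Q2'] * 2,
    'VALUE': [76.0, 80.0, 88.0, 45.0, 50.0, 55.0],
})

# Summarise the index level per category; percentage-change rows are excluded
index_rows = df[df['UNIT'] == 'Index']
summary = index_rows.groupby('CATEGORY')['VALUE'].agg(['count', 'mean', 'min', 'max'])

print(summary)
```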

MAKE FURTHER ANALYSIS. EXPLAIN THE CODE BRIEFLY AND SUMMARIZE THE RESULTS.

# Conclusion

ADD CONCISE CONCLUSION.