# YOUR PROJECT TITLE

> **Note the following:** 
> 1. This is *not* meant to be an example of an actual **data analysis project**, just an example of how to structure such a project.
> 1. Remember the general advice on structuring and commenting your code.
> 1. The `dataproject.py` file includes a function which can be used multiple times in this notebook.

Imports and set magics:

In [30]:
import pandas as pd
import numpy as np
import datetime

import matplotlib.pyplot as plt
plt.rcParams.update({"axes.grid":True,"grid.color":"black","grid.alpha":"0.25","grid.linestyle":"-"})
plt.rcParams.update({'font.size': 14})
import ipywidgets as widgets

from dstapi import DstApi 

# autoreload modules when code is run
%load_ext autoreload
%autoreload 2

# user written modules
import dataproject

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


# Read and clean data

1. Data is imported using the API of Statistics Denmark (Danmarks Statistik).

In [16]:
data = DstApi('EJ55') 

## Explore each data set

1. The available values for each variable are displayed in order to select the relevant ones.

In [18]:
# Table summary: one row per variable in the table
tabsum = data.tablesummary(language='en')

# The available values for each variable:
for variable in tabsum['variable name']:
    print(variable+':')
    display(data.variable_levels(variable, language='en'))

OMRÅDE:


Unnamed: 0,id,text
0,0,All Denmark
1,84,Region Hovedstaden
2,1,Province Byen København
3,2,Province Københavns omegn
4,3,Province Nordsjælland
5,4,Province Bornholm
6,85,Region Sjælland
7,5,Province Østsjælland
8,6,Province Vest- og Sydsjælland
9,83,Region Syddanmark


EJENDOMSKATE:


Unnamed: 0,id,text
0,111,One-family houses
1,801,Weekend cottages
2,2103,"Owner-occupied flats, total"


TAL:


Unnamed: 0,id,text
0,100,Index
1,210,Percentage change compared to previous quarter
2,310,Percentage change compared to same quarter the...


Tid:


Unnamed: 0,id,text
0,1992K1,1992Q1
1,1992K2,1992Q2
2,1992K3,1992Q3
3,1992K4,1992Q4
4,1993K1,1993Q1
...,...,...
119,2021K4,2021Q4
120,2022K1,2022Q1
121,2022K2,2022Q2
122,2022K3,2022Q3


We are only interested in some of these values.

1. A parameter dictionary is defined in order to specify the data we want.

In [19]:
params = data._define_base_params(language='en')
params

{'table': 'ej55',
 'format': 'BULK',
 'lang': 'en',
 'variables': [{'code': 'OMRÅDE', 'values': ['*']},
  {'code': 'EJENDOMSKATE', 'values': ['*']},
  {'code': 'TAL', 'values': ['*']},
  {'code': 'Tid', 'values': ['*']}]}

1. We select the data we want: only "All Denmark", with the index values and the percentage change compared to the previous quarter.

In [20]:
params = {'table': 'ej55',
 'format': 'BULK',
 'lang': 'en',
 'variables': [{'code': 'OMRÅDE', 'values': ['000']},
  {'code': 'EJENDOMSKATE', 'values': ['*']},
  {'code': 'TAL', 'values': ['100', '210']},
  {'code': 'Tid', 'values': ['*']}]}
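
The selection above can also be built from the base parameters instead of retyping the whole dictionary. The following is a minimal sketch: `select_values` is a hypothetical helper (it is not part of `dstapi`), and `base_params` simply copies the dictionary printed above.

```python
import copy

def select_values(base_params, selections):
    """Return a copy of base_params with 'values' replaced for the chosen variable codes.

    Hypothetical helper for illustration; not part of dstapi.
    """
    params = copy.deepcopy(base_params)  # leave the base dictionary untouched
    known_codes = {var['code'] for var in params['variables']}
    unknown = set(selections) - known_codes
    if unknown:
        raise KeyError(f"Unknown variable code(s): {sorted(unknown)}")
    for var in params['variables']:
        if var['code'] in selections:
            var['values'] = list(selections[var['code']])
    return params

# Base parameters as printed above ('*' selects all values of a variable)
base_params = {'table': 'ej55',
               'format': 'BULK',
               'lang': 'en',
               'variables': [{'code': 'OMRÅDE', 'values': ['*']},
                             {'code': 'EJENDOMSKATE', 'values': ['*']},
                             {'code': 'TAL', 'values': ['*']},
                             {'code': 'Tid', 'values': ['*']}]}

params_sketch = select_values(base_params, {'OMRÅDE': ['000'], 'TAL': ['100', '210']})
```

Variables that are not mentioned in `selections` keep `'*'`, so the result should match the hand-written dictionary above.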

1. The index is reset and the data is sorted by region, time, and category.
2. Columns are renamed.

In [33]:
sales_api = data.get_data(params=params)
sales_api.reset_index(inplace = True, drop = True)
sales_api.sort_values(by=['OMRÅDE', 'TID', 'EJENDOMSKATE'], inplace=True)
sales_api.rename(columns = {'OMRÅDE':'REGION', 'EJENDOMSKATE':'CATEGORY', 'TAL':'UNIT', 'TID':'TIME', 'INDHOLD':'VALUE'}, inplace=True)
sales_api.head(5)

Unnamed: 0,REGION,CATEGORY,UNIT,TIME,VALUE
306,All Denmark,One-family houses,Index,1992Q1,31.5
307,All Denmark,One-family houses,Percentage change compared to previous quarter,1992Q1,..
310,All Denmark,"Owner-occupied flats, total",Index,1992Q1,23.7
311,All Denmark,"Owner-occupied flats, total",Percentage change compared to previous quarter,1992Q1,..
308,All Denmark,Weekend cottages,Index,1992Q1,29.5


1. Missing values, coded as `..` in the raw data, are replaced with NaN.

In [24]:
sales_api = sales_api.replace('..', np.nan)
sales_api.head(5)

Unnamed: 0,REGION,CATEGORY,UNIT,TIME,VALUE
0,All Denmark,One-family houses,Index,1992Q1,31.5
1,All Denmark,One-family houses,Percentage change compared to previous quarter,1992Q1,
2,All Denmark,"Owner-occupied flats, total",Index,1992Q1,23.7
3,All Denmark,"Owner-occupied flats, total",Percentage change compared to previous quarter,1992Q1,
4,All Denmark,Weekend cottages,Index,1992Q1,29.5


1. The data types are inspected. All columns, including `VALUE`, are stored as `object`.

In [26]:
sales_api.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 744 entries, 0 to 743
Data columns (total 5 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   REGION    744 non-null    object
 1   CATEGORY  744 non-null    object
 2   UNIT      744 non-null    object
 3   TIME      744 non-null    object
 4   VALUE     741 non-null    object
dtypes: object(5)
memory usage: 29.2+ KB


1. The `VALUE` column is converted to a float type.

In [27]:
sales_api.VALUE = sales_api.VALUE.astype('float')
sales_api.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 744 entries, 0 to 743
Data columns (total 5 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   REGION    744 non-null    object 
 1   CATEGORY  744 non-null    object 
 2   UNIT      744 non-null    object 
 3   TIME      744 non-null    object 
 4   VALUE     741 non-null    float64
dtypes: float64(1), object(4)
memory usage: 29.2+ KB
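
The two cleaning steps above (replacing `..` and casting to float) can be compared with a one-call alternative. This is a sketch on a small toy frame, not on the downloaded data; the name `toy` and the value for 1992Q2 are made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy frame mimicking the VALUE column above; '..' marks a missing observation
toy = pd.DataFrame({'TIME': ['1992Q1', '1992Q1', '1992Q2'],
                    'VALUE': ['31.5', '..', '31.9']})

# Two-step approach used in this notebook: replace the marker, then cast
two_step = toy['VALUE'].replace('..', np.nan).astype('float')

# Alternative: pd.to_numeric turns every non-numeric string into NaN in one call
one_step = pd.to_numeric(toy['VALUE'], errors='coerce')
```

Note that `errors='coerce'` also silently converts any other unexpected string to NaN, so the explicit two-step version may be preferable when you want such cases to raise an error.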


In [46]:
# Boolean mask selecting all rows for 2019Q1
I = sales_api['TIME'] == '2019Q1'
selected_row = sales_api[I]
print(selected_row)

#value_2019Q1 = selected_row.loc['VALUE', 0 ]
#sales_api['2019Q1'] = value_2019Q1


# make 2019Q1 index 100
# = sales_api.value(sales_api['value'])

          REGION                     CATEGORY  \
102  All Denmark            One-family houses   
103  All Denmark            One-family houses   
106  All Denmark  Owner-occupied flats, total   
107  All Denmark  Owner-occupied flats, total   
104  All Denmark             Weekend cottages   
105  All Denmark             Weekend cottages   

                                               UNIT    TIME  VALUE  
102                                           Index  2019Q1  108.4  
103  Percentage change compared to previous quarter  2019Q1    1.2  
106                                           Index  2019Q1  123.2  
107  Percentage change compared to previous quarter  2019Q1    0.1  
104                                           Index  2019Q1   88.5  
105  Percentage change compared to previous quarter  2019Q1    0.7  


Above, we would like to re-index the series so that 2019Q1 = 100. This lets us compare (nominal) house prices with the period before the COVID-19 pandemic.

We (perhaps?) need a loop that divides the index value in a given quarter for a given housing category by the index value in 2019Q1 for the same category.
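
One possible way to do the re-indexing without an explicit loop is sketched below. `rebase_index` and the column name `VALUE_REBASED` are hypothetical, and the sketch assumes a single region with exactly one `Index` row per category in the base quarter. In the toy data, the 2019Q1 values are taken from the output above, while the 2019Q2 values are made up.

```python
import pandas as pd

def rebase_index(df, base_period='2019Q1'):
    """Return the 'Index' rows of df with a column rebased so that base_period = 100.

    Hypothetical helper; assumes one 'Index' row per category in base_period.
    """
    index_rows = df.loc[df['UNIT'] == 'Index'].copy()
    # One base value per category: the index level observed in the base period
    base = (index_rows.loc[index_rows['TIME'] == base_period]
                      .set_index('CATEGORY')['VALUE'])
    # Look up each row's base value by category and rescale
    index_rows['VALUE_REBASED'] = (
        index_rows['VALUE'] / index_rows['CATEGORY'].map(base) * 100
    )
    return index_rows

toy = pd.DataFrame({
    'CATEGORY': ['One-family houses'] * 2 + ['Weekend cottages'] * 2,
    'UNIT': ['Index'] * 4,
    'TIME': ['2019Q1', '2019Q2', '2019Q1', '2019Q2'],
    'VALUE': [108.4, 110.0, 88.5, 90.0],  # 2019Q2 values are made up
})
rebased = rebase_index(toy)
```

The percentage-change rows are dropped on purpose: rescaling the index does not change quarter-on-quarter growth rates, so only the `Index` series needs rebasing.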

In order to be able to **explore the raw data**, you may provide **static** and **interactive plots** to show important developments 
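
A static plot could look like the sketch below, which draws one line per category for a chosen unit. `plot_static` is a hypothetical helper, and the toy data stands in for `sales_api` (the 2019Q1 values come from the output above; the 2019Q2 values are made up).

```python
import matplotlib.pyplot as plt
import pandas as pd

def plot_static(df, unit='Index'):
    """Plot VALUE over TIME with one line per CATEGORY (hypothetical helper)."""
    subset = df.loc[df['UNIT'] == unit]
    # Wide format: one row per quarter, one column per category
    wide = subset.pivot(index='TIME', columns='CATEGORY', values='VALUE')
    fig, ax = plt.subplots(figsize=(10, 5))
    wide.plot(ax=ax)
    ax.set_xlabel('Quarter')
    ax.set_ylabel(unit)
    return ax

toy = pd.DataFrame({
    'CATEGORY': ['One-family houses'] * 2 + ['Weekend cottages'] * 2,
    'UNIT': ['Index'] * 4,
    'TIME': ['2019Q1', '2019Q2', '2019Q1', '2019Q2'],
    'VALUE': [108.4, 110.0, 88.5, 90.0],  # 2019Q2 values are made up
})
ax = plot_static(toy)
```

With the real data, the call would presumably be `plot_static(sales_api)`, since the renamed columns match the ones used here.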

**Interactive plot** :

In [31]:
def plot_value(df, category, unit): 
    # Select the rows for the chosen category and unit
    I = (df['CATEGORY'] == category) & (df['UNIT'] == unit)
    # Plot the selected series over time
    ax=df.loc[I,:].plot(x='TIME', y='VALUE', legend=False)

widgets.interact(plot_value, 
    df = widgets.fixed(sales_api),
    category = widgets.Dropdown(description='Category', 
                                    options=sales_api.CATEGORY.unique(), 
                                    value='One-family houses'),
    unit = widgets.Dropdown(description='Unit', 
                                    options=sales_api.UNIT.unique(), 
                                    value='Index')
)



Explain what you see when moving elements of the interactive plot around. 

# Analysis

To get a quick overview of the data, we show some **summary statistics** on a meaningful aggregation. 
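
As a sketch of what such an aggregation could look like, the example below groups by category and unit and computes a few statistics of `VALUE`. The toy frame stands in for `sales_api` (the 2019Q1 values come from the output above; the 2019Q2 values are made up), and the choice of statistics is only an assumption about what is meaningful here.

```python
import pandas as pd

toy = pd.DataFrame({
    'CATEGORY': ['One-family houses'] * 2 + ['Weekend cottages'] * 2,
    'UNIT': ['Index'] * 4,
    'TIME': ['2019Q1', '2019Q2', '2019Q1', '2019Q2'],
    'VALUE': [108.4, 110.0, 88.5, 90.0],  # 2019Q2 values are made up
})

# One row per (category, unit) pair, one column per statistic
summary = toy.groupby(['CATEGORY', 'UNIT'])['VALUE'].agg(['count', 'mean', 'min', 'max'])
```

Grouping by `UNIT` as well as `CATEGORY` keeps index levels and percentage changes apart, which matters because averaging the two together would not be meaningful.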

MAKE FURTHER ANALYSIS. EXPLAIN THE CODE BRIEFLY AND SUMMARIZE THE RESULTS.

# Conclusion

ADD CONCISE CONCLUSION.