# YOUR PROJECT TITLE

> **Note the following:** 
> 1. This is *not* meant to be an example of an actual **data analysis project**, just an example of how to structure such a project.
> 1. Remember the general advice on structuring and commenting your code.
> 1. The `dataproject.py` file includes a function which can be used multiple times in this notebook.

Imports and set magics:

In [156]:
import pandas as pd
import numpy as np
import datetime

import matplotlib.pyplot as plt
plt.rcParams.update({"axes.grid":True,"grid.color":"black","grid.alpha":"0.25","grid.linestyle":"-"})
plt.rcParams.update({'font.size': 10})
import ipywidgets as widgets

from dstapi import DstApi 

# autoreload modules when code is run
%load_ext autoreload
%autoreload 2

# user written modules
import dataproject

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


# Read and clean data

1. Data is imported through the API of Danmarks Statistik (Statistics Denmark).

In [157]:
data = DstApi('EJ55') 

## Explore each data set

1. The available values for each variable are listed in order to select the relevant ones.

In [158]:
# An overview of the available data.
tabsum = data.tablesummary(language='en')
display(tabsum)

# The available values for each variable:
for variable in tabsum['variable name']:
    print(variable+':')
    display(data.variable_levels(variable, language='en'))

Table EJ55: Price index for sales of property by region, category of real property, unit and time
Last update: 2023-03-31T08:00:00


Unnamed: 0,variable name,# values,First value,First value label,Last value,Last value label,Time variable
0,OMRÅDE,17,000,All Denmark,11,Province Nordjylland,False
1,EJENDOMSKATE,3,0111,One-family houses,2103,"Owner-occupied flats, total",False
2,TAL,3,100,Index,310,Percentage change compared to same quarter the...,False
3,Tid,124,1992K1,1992Q1,2022K4,2022Q4,True


OMRÅDE:


Unnamed: 0,id,text
0,0,All Denmark
1,84,Region Hovedstaden
2,1,Province Byen København
3,2,Province Københavns omegn
4,3,Province Nordsjælland
5,4,Province Bornholm
6,85,Region Sjælland
7,5,Province Østsjælland
8,6,Province Vest- og Sydsjælland
9,83,Region Syddanmark


EJENDOMSKATE:


Unnamed: 0,id,text
0,111,One-family houses
1,801,Weekend cottages
2,2103,"Owner-occupied flats, total"


TAL:


Unnamed: 0,id,text
0,100,Index
1,210,Percentage change compared to previous quarter
2,310,Percentage change compared to same quarter the...


Tid:


Unnamed: 0,id,text
0,1992K1,1992Q1
1,1992K2,1992Q2
2,1992K3,1992Q3
3,1992K4,1992Q4
4,1993K1,1993Q1
...,...,...
119,2021K4,2021Q4
120,2022K1,2022Q1
121,2022K2,2022Q2
122,2022K3,2022Q3


We are only interested in a subset of these values.

1. A parameter dictionary is defined to specify the data we want.

In [159]:
params = data._define_base_params(language='en')
params

{'table': 'ej55',
 'format': 'BULK',
 'lang': 'en',
 'variables': [{'code': 'OMRÅDE', 'values': ['*']},
  {'code': 'EJENDOMSKATE', 'values': ['*']},
  {'code': 'TAL', 'values': ['*']},
  {'code': 'Tid', 'values': ['*']}]}

1. We select the data we want: only "All Denmark" (code `000`), and only the index (code `100`) and the percentage change compared to the previous quarter (code `210`). All property categories and all quarters are kept.

In [160]:
params = {'table': 'ej55',
 'format': 'BULK',
 'lang': 'en',
 'variables': [{'code': 'OMRÅDE', 'values': ['000']},
  {'code': 'EJENDOMSKATE', 'values': ['*']},
  {'code': 'TAL', 'values': ['100', '210']},
  {'code': 'Tid', 'values': ['*']}]}
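
Instead of retyping the whole dictionary, the selection can also be applied on top of the base parameters. The following is a minimal sketch; `select_values` is a hypothetical helper written for illustration and is not part of `dstapi`. The base dictionary is copied from the output shown earlier.

```python
import copy

def select_values(base_params, selections):
    """Return a copy of base_params with 'values' replaced for the given variable codes.

    Hypothetical helper for illustration; not part of dstapi.
    """
    params = copy.deepcopy(base_params)  # leave the base dictionary untouched
    for variable in params['variables']:
        if variable['code'] in selections:
            variable['values'] = list(selections[variable['code']])
    return params

# Base dictionary as printed above ('*' means "all values")
base_params = {'table': 'ej55',
               'format': 'BULK',
               'lang': 'en',
               'variables': [{'code': 'OMRÅDE', 'values': ['*']},
                             {'code': 'EJENDOMSKATE', 'values': ['*']},
                             {'code': 'TAL', 'values': ['*']},
                             {'code': 'Tid', 'values': ['*']}]}

# Keep "All Denmark" only, with the index and the quarterly percentage change
params_sketch = select_values(base_params, {'OMRÅDE': ['000'], 'TAL': ['100', '210']})
```

Variables that are not mentioned in `selections` keep the wildcard `'*'`, so only the restrictions need to be written down.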

1. The index is reset and the data is sorted by region, time and category.
2. Columns are renamed to English.

In [161]:
sales_api = data.get_data(params=params)
sales_api.reset_index(inplace = True, drop = True)
sales_api.sort_values(by=['OMRÅDE', 'TID', 'EJENDOMSKATE'], inplace=True)
sales_api.rename(columns = {'OMRÅDE':'REGION', 'EJENDOMSKATE':'CATEGORY', 'TAL':'UNIT', 'TID':'TIME', 'INDHOLD':'VALUE'}, inplace=True)
sales_api.head(5)

Unnamed: 0,REGION,CATEGORY,UNIT,TIME,VALUE
324,All Denmark,One-family houses,Index,1992Q1,31.5
325,All Denmark,One-family houses,Percentage change compared to previous quarter,1992Q1,..
328,All Denmark,"Owner-occupied flats, total",Index,1992Q1,23.7
329,All Denmark,"Owner-occupied flats, total",Percentage change compared to previous quarter,1992Q1,..
326,All Denmark,Weekend cottages,Index,1992Q1,29.5
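
Because the index is reset before sorting, the sorted frame keeps the old row labels (324, 325, ...) seen in the output. If a clean 0, 1, 2, ... index is preferred, one option is to sort first and reset afterwards. A small sketch on made-up toy rows, not the actual API result:

```python
import pandas as pd

# Toy rows in arbitrary order, standing in for the API result (values are made up)
toy = pd.DataFrame({'TIME': ['1992Q2', '1992Q1', '1992Q2', '1992Q1'],
                    'CATEGORY': ['Weekend cottages', 'One-family houses',
                                 'One-family houses', 'Weekend cottages'],
                    'VALUE': [4.0, 1.0, 3.0, 2.0]})

# Sort first, then reset, so that the index follows the sorted order
toy_sorted = (toy.sort_values(by=['TIME', 'CATEGORY'])
                 .reset_index(drop=True))
```

With this order, positional and label-based lookups agree, which makes later row selections easier to reason about.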


1. Missing values, which the table marks as `..`, are replaced with NaN.

In [162]:
sales_api = sales_api.replace('..', np.nan)

1. The column types are inspected.

In [163]:
sales_api.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 744 entries, 324 to 225
Data columns (total 5 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   REGION    744 non-null    object
 1   CATEGORY  744 non-null    object
 2   UNIT      744 non-null    object
 3   TIME      744 non-null    object
 4   VALUE     741 non-null    object
dtypes: object(5)
memory usage: 34.9+ KB


1. The `VALUE` column is converted to float.

In [164]:
sales_api.VALUE = sales_api.VALUE.astype('float')
sales_api.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 744 entries, 324 to 225
Data columns (total 5 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   REGION    744 non-null    object 
 1   CATEGORY  744 non-null    object 
 2   UNIT      744 non-null    object 
 3   TIME      744 non-null    object 
 4   VALUE     741 non-null    float64
dtypes: float64(1), object(4)
memory usage: 34.9+ KB
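
The two cleaning steps (replacing `..` and casting to float) could also be done in one call with `pd.to_numeric`. The sketch below compares both routes on a small made-up series, assuming the values arrive as strings as in the output above:

```python
import numpy as np
import pandas as pd

# Toy column of strings, with '..' marking a missing observation
raw = pd.Series(['31.5', '..', '23.7'], name='VALUE')

# Two-step version, as in the notebook: replace the marker, then cast
two_step = raw.replace('..', np.nan).astype('float')

# One-step alternative: anything that cannot be parsed as a number becomes NaN
one_step = pd.to_numeric(raw, errors='coerce')
```

Note that `errors='coerce'` turns every unparseable string into NaN, not just `..`, so it is worth comparing the number of missing values before and after.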


In [165]:
# For each category, add a column holding that category's index value in 2019Q1.
# These base values are used to rebase the index below.
base_columns = {'One-family houses': 'OFH_2019Q1',
                'Owner-occupied flats, total': 'OOF_2019Q1',
                'Weekend cottages': 'WC_2019Q1'}

for category, column in base_columns.items():
    I = (sales_api['TIME'] == '2019Q1') & (sales_api['CATEGORY'] == category) & (sales_api['UNIT'] == 'Index')
    # Exactly one row matches, so take it by position rather than by a hard-coded row label
    sales_api[column] = sales_api.loc[I, 'VALUE'].iloc[0]

In [166]:
# Rebase the index so that 2019Q1 = 100 for each category
sales_api['index_2019Q1'] = np.nan

one_family_houses = (sales_api['CATEGORY'] == 'One-family houses') & (sales_api['UNIT'] == 'Index')
owner_occupied_flats = (sales_api['CATEGORY'] == 'Owner-occupied flats, total') & (sales_api['UNIT'] == 'Index')
weekend_cottages = (sales_api['CATEGORY'] == 'Weekend cottages') & (sales_api['UNIT'] == 'Index')

sales_api.loc[one_family_houses, 'index_2019Q1'] = (sales_api.loc[one_family_houses, 'VALUE'] / sales_api.loc[one_family_houses, 'OFH_2019Q1']) * 100
sales_api.loc[owner_occupied_flats, 'index_2019Q1'] = (sales_api.loc[owner_occupied_flats, 'VALUE'] / sales_api.loc[owner_occupied_flats, 'OOF_2019Q1']) * 100
sales_api.loc[weekend_cottages, 'index_2019Q1'] = (sales_api.loc[weekend_cottages, 'VALUE'] / sales_api.loc[weekend_cottages, 'WC_2019Q1']) * 100

sales_api.sample(10)

Unnamed: 0,REGION,CATEGORY,UNIT,TIME,VALUE,OFH_2019Q1,OOF_2019Q1,WC_2019Q1,index_2019Q1
502,All Denmark,"Owner-occupied flats, total",Index,2004Q3,66.8,108.4,123.2,88.5,54.220779
646,All Denmark,"Owner-occupied flats, total",Index,1992Q4,21.7,108.4,123.2,88.5,17.613636
296,All Denmark,Weekend cottages,Index,2011Q1,86.3,108.4,123.2,88.5,97.514124
656,All Denmark,Weekend cottages,Index,1993Q4,29.8,108.4,123.2,88.5,33.672316
320,All Denmark,Weekend cottages,Index,2010Q2,89.6,108.4,123.2,88.5,101.242938
116,All Denmark,Weekend cottages,Index,2021Q1,108.0,108.4,123.2,88.5,122.033898
468,All Denmark,One-family houses,Index,1995Q4,38.7,108.4,123.2,88.5,35.701107
694,All Denmark,"Owner-occupied flats, total",Index,2003Q1,60.0,108.4,123.2,88.5,48.701299
680,All Denmark,Weekend cottages,Index,1999Q1,39.9,108.4,123.2,88.5,45.084746
36,All Denmark,One-family houses,Index,1998Q1,47.9,108.4,123.2,88.5,44.188192
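
The rebasing computes, for each category, `index_2019Q1 = VALUE / VALUE in 2019Q1 * 100`. As a more compact alternative, the base value can be looked up per category with `map` instead of storing it in helper columns. A sketch on four rows whose numbers are taken from the sample output above:

```python
import pandas as pd

# Index values taken from the sample above, plus the 2019Q1 base values
toy = pd.DataFrame({
    'CATEGORY': ['One-family houses', 'One-family houses',
                 'Owner-occupied flats, total', 'Owner-occupied flats, total'],
    'TIME': ['1998Q1', '2019Q1', '2004Q3', '2019Q1'],
    'VALUE': [47.9, 108.4, 66.8, 123.2],
})

# Base value per category: the VALUE observed in the base quarter
base = toy.loc[toy['TIME'] == '2019Q1'].set_index('CATEGORY')['VALUE']

# Map each row's category to its base value and rebase so that 2019Q1 = 100
toy['index_2019Q1'] = toy['VALUE'] / toy['CATEGORY'].map(base) * 100
```

The rebased values for 1998Q1 and 2004Q3 agree with the `index_2019Q1` column in the sample (about 44.19 and 54.22), and the base quarter itself becomes 100.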


In order to be able to **explore the raw data**, you may provide **static** and **interactive plots** to show important developments 
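
A static plot could, for instance, show the rebased index for all categories in one figure. The sketch below uses made-up toy values rather than the downloaded data, and converts the `YYYYQn` strings to dates so that the x-axis is treated as time; the column name `DATE` is an assumption introduced here.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up values for illustration only
toy = pd.DataFrame({
    'CATEGORY': ['One-family houses'] * 3 + ['Weekend cottages'] * 3,
    'TIME': ['2018Q4', '2019Q1', '2019Q2'] * 2,
    'index_2019Q1': [99.0, 100.0, 101.5, 98.0, 100.0, 103.0],
})

# Parse 'YYYYQn' strings into quarterly periods, then into timestamps (start of quarter)
toy['DATE'] = pd.PeriodIndex(toy['TIME'], freq='Q').to_timestamp()

# One line per category on a shared axis
fig, ax = plt.subplots(figsize=(8, 4))
for category, group in toy.groupby('CATEGORY'):
    ax.plot(group['DATE'], group['index_2019Q1'], label=category)
ax.set_xlabel('Quarter')
ax.set_ylabel('Price index (2019Q1 = 100)')
ax.legend()
```

In the notebook itself the `matplotlib.use('Agg')` line is not needed, and `toy` would be replaced by the rows of `sales_api` where `UNIT` is `'Index'`.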

**Interactive plot** :

1. We make an interactive plot

In [167]:
def plot_value(df, category, unit): 
    I = (df['CATEGORY'] == category) & (df['UNIT'] == unit)
    ax=df.loc[I,:].plot(x='TIME', y='VALUE', legend=False)

widgets.interact(plot_value, 
    df = widgets.fixed(sales_api),
    category = widgets.Dropdown(description='Category', 
                                    options=sales_api.CATEGORY.unique(), 
                                    value='One-family houses'),
    unit = widgets.Dropdown(description='Unit', 
                                    options=sales_api.UNIT.unique(), 
                                    value='Index')
)


interactive(children=(Dropdown(description='Category', options=('One-family houses', 'Owner-occupied flats, to…

<function __main__.plot_value(df, category, unit)>

Explain what you see when moving elements of the interactive plot around. 

In [168]:
def plot_value_2(df, category, unit): 
    I = (df['CATEGORY'] == category) & (df['UNIT'] == unit)
    ax=df.loc[I,:].plot(x='TIME', y='index_2019Q1', legend=False)

widgets.interact(plot_value_2, 
    df = widgets.fixed(sales_api),
    category = widgets.Dropdown(description='Category', 
                                    options=sales_api.CATEGORY.unique(), 
                                    value='One-family houses'),
    unit = widgets.fixed('Index')
)

interactive(children=(Dropdown(description='Category', options=('One-family houses', 'Owner-occupied flats, to…

<function __main__.plot_value_2(df, category, unit)>

# Analysis

To get a quick overview of the data, we show some **summary statistics** on a meaningful aggregation. 
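
One possible aggregation is the quarterly percentage change summarized by property category. The sketch below shows the pattern on made-up numbers; the choice of statistics is an assumption and can be adjusted.

```python
import pandas as pd

# Made-up quarterly percentage changes for illustration only
toy = pd.DataFrame({
    'CATEGORY': ['One-family houses'] * 3 + ['Weekend cottages'] * 3,
    'UNIT': ['Percentage change compared to previous quarter'] * 6,
    'VALUE': [1.0, -0.5, 2.0, 0.0, 3.0, 1.5],
})

# Keep only the percentage-change rows, then summarize per category
is_change = toy['UNIT'] == 'Percentage change compared to previous quarter'
summary = (toy.loc[is_change]
              .groupby('CATEGORY')['VALUE']
              .agg(['count', 'mean', 'std', 'min', 'max']))
```

Applied to `sales_api`, the same pattern would give the average quarterly price growth and its volatility for each property type.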

MAKE FURTHER ANALYSIS. EXPLAIN THE CODE BRIEFLY AND SUMMARIZE THE RESULTS.

# Conclusion

ADD CONCISE CONCLUSION.