# Introduction

This notebook explores the metadata of the datasets we intend to use. The metadata will serve as the basis for the data queries and will support the data analysis.

# Involved datasets

We want to analyse some facts about energy in the EU. To do so, we plan to use the following datasets:


|Agency|Dataset ID|Name|URL|
|--|--|--|-- |
|eurostat|demo_pjan| Population on 1 January by age and sex  |https://ec.europa.eu/eurostat/web/products-datasets/-/DEMO_PJAN|
|eurostat|nrg_bal_s| Simplified Energy Balances | https://ec.europa.eu/eurostat/databrowser/view/nrg_bal_s|
|eurostat|ilc_lvph01| Average household size |https://ec.europa.eu/eurostat/databrowser/view/ilc_lvph01/|
|eurostat|nrg_pc_202| Gas prices for household consumers |https://ec.europa.eu/eurostat/databrowser/view/nrg_pc_202/|
|eurostat|nrg_pc_203| Gas prices for non-household consumers |https://ec.europa.eu/eurostat/databrowser/view/nrg_pc_203/|
|eurostat|nrg_pc_204| Electricity prices for household consumers |https://ec.europa.eu/eurostat/databrowser/view/nrg_pc_204/|
|eurostat|nrg_pc_205| Electricity prices for non-household consumers |https://ec.europa.eu/eurostat/databrowser/view/nrg_pc_205/|
|ecb|MNA|Main aggregates, national accounts|https://data.ecb.europa.eu/data/datasets/MNA/data-information|


In [1]:
import utils
import json

with open('sub_codelists.json', 'r') as f:
    sub_codelists = json.load(f)

We need to understand each dataset before using it.
We do this by inspecting its data structure (i.e., the columns and the possible values for each of them).

The SDMX standard API allows querying the exact codes used by a dataflow. Unfortunately, that method is not always available, and it is not available for Eurostat. Therefore, to find out which combinations a dataset actually uses, we need to retrieve actual data. Our proposal is to request only the last observation, which should provide the maximum number of combinations with the minimum number of periods.

We have created a helper in `utils` to make this as simple as possible. We only need to pass the ID of the dataset we want to research, and we get a summary of its contents.
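
To make the idea concrete, here is a minimal sketch of what such a summary could compute once the last observations have been retrieved: the distinct codes found in each dimension. It is not the implementation of `utils.get_summary_dataset`, which is not shown in this notebook. The function name `summarise_dimensions`, the `value` key and the sample rows are assumptions made up for illustration.

```python
from collections import defaultdict


def summarise_dimensions(observations, value_key="value"):
    """Return {dimension: sorted distinct codes} for a list of observation dicts.

    Hypothetical helper: each observation is assumed to be a dict that maps
    dimension names to codes, plus one key holding the observed value.
    """
    codes = defaultdict(set)
    for obs in observations:
        for dimension, code in obs.items():
            if dimension != value_key:  # skip the measure, keep only dimensions
                codes[dimension].add(code)
    return {dimension: sorted(values) for dimension, values in codes.items()}


# Invented sample rows standing in for "one last observation per series".
sample = [
    {"freq": "A", "unit": "NR", "age": "TOTAL", "sex": "T", "geo": "DE", "value": 1.0},
    {"freq": "A", "unit": "NR", "age": "TOTAL", "sex": "F", "geo": "DE", "value": 2.0},
    {"freq": "A", "unit": "NR", "age": "Y_LT5", "sex": "T", "geo": "FR", "value": 3.0},
]

summary = summarise_dimensions(sample)
for dimension, values in summary.items():
    print(dimension, values)
```

Because only one period per series is needed, the amount of data to download stays small while every combination of codes that carries data still shows up in the summary.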

In [None]:
utils.get_summary_dataset('demo_pjan', agency='estat')

In [None]:
utils.get_summary_dataset('nrg_bal_s', agency='estat')

In [None]:
utils.get_summary_dataset('ilc_lvph01', agency='estat')


In [None]:
utils.get_summary_dataset('nrg_pc_202', agency='estat')

In [None]:
utils.get_summary_dataset('nrg_pc_204', agency='estat')

We may need to check that the breakdowns we have selected are correct:

In [None]:
# Energy balances: the aggregated selection (totals) ...
nrg_bal_aggregated = {
    'freq': 'A',
    'nrg_bal': sub_codelists['nrg_bal_total_consumption'],
    'siec': sub_codelists['nrg_bal_siec_total'],
    'unit': 'GWH',
    'geo': sub_codelists['geo_total_eu'],
}

# ... and the breakdown it is compared against.
nrg_bal_disaggregated = {
    'freq': 'A',
    'nrg_bal': sub_codelists['nrg_bal_consumption_basic_plus_other'],
    'siec': sub_codelists['nrg_bal_siec_breakdown'],
    'unit': 'GWH',
    'geo': sub_codelists['geo_eu_countries'],
}

# Population: same filters on both sides, only the geo selection differs.
demo_pjan_aggregated = {
    'freq': 'A',
    'unit': 'NR',
    'age': 'TOTAL',
    'sex': 'T',
    'geo': sub_codelists['geo_total_eu'],
}

demo_pjan_disaggregated = {
    'freq': 'A',
    'unit': 'NR',
    'age': 'TOTAL',
    'sex': 'T',
    'geo': sub_codelists['geo_eu_countries'],
}

comparison_nrg_bal = utils.compare_aggregates('nrg_bal_s', nrg_bal_aggregated, nrg_bal_disaggregated, agency='estat')
comparison_demo_pjan = utils.compare_aggregates('demo_pjan', demo_pjan_aggregated, demo_pjan_disaggregated, agency='estat')
display(comparison_nrg_bal)
display(comparison_demo_pjan)
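
The kind of check intended here can be sketched as follows: for each period, sum the disaggregated values and compare the result with the reported aggregate. This is only an illustration of the idea, not the code behind `utils.compare_aggregates`, whose implementation is not shown in this notebook. The function `compare_totals`, its tolerance parameter and the figures below are invented for the example.

```python
def compare_totals(aggregate_by_period, components_by_period, rel_tol=1e-6):
    """Compare reported aggregates with the sum of their components.

    Hypothetical helper: both arguments are dicts keyed by period; the first
    maps to a single reported total, the second to a list of component values.
    """
    rows = []
    for period, reported in sorted(aggregate_by_period.items()):
        summed = sum(components_by_period.get(period, []))
        difference = summed - reported
        # Relative tolerance, with a floor of 1.0 to avoid dividing the
        # comparison by a reported total that is zero or close to it.
        matches = abs(difference) <= rel_tol * max(abs(reported), 1.0)
        rows.append({
            "period": period,
            "reported": reported,
            "summed": summed,
            "difference": difference,
            "matches": matches,
        })
    return rows


# Invented figures: 2020 adds up, 2021 is missing 5 units in the breakdown.
aggregate = {"2020": 100.0, "2021": 110.0}
components = {"2020": [60.0, 40.0], "2021": [70.0, 35.0]}

result = compare_totals(aggregate, components)
for row in result:
    print(row)
```

A non-zero difference suggests that the selected breakdown either misses a component or double counts one, so the code lists should be reviewed before the data is used.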

In [None]:

utils.get_summary_dataset('MNA', agency='ecb')
