# CSV Hierarchies

In this short demo notebook, we see how to describe the experiment hierarchy in a CSV file.

First, we define the basics of a simple Brightway-based calculation, without any hierarchy.

In [1]:
import pandas as pd

from enbios import Experiment

BRIGHTWAY_ECOINVENT_391_PROJECT_NAME = "ecoinvent_391"
experiment_config = {
    "adapters": [
        {
            "adapter_name": "brightway-adapter",
            "config": {"bw_project": BRIGHTWAY_ECOINVENT_391_PROJECT_NAME},
            "methods": {
                "GWP1000": (
                    "ReCiPe 2016 v1.03, midpoint (H)",
                    "climate change",
                    "global warming potential (GWP1000)",
                )
            },
        }
    ]
}

Enbios ships with several hierarchy test files.

In [2]:
import os
from enbios.const import BASE_TEST_DATA_PATH

csv_hierarchy_files = BASE_TEST_DATA_PATH / "hierarchy_csvs/inputs/"
os.listdir(csv_hierarchy_files)
# do not try out files ending with "_x", these are invalid test files

['hierarchy1.csv',
 'hierarchy2.csv',
 'hierarchy3.csv',
 'hierarchy4.csv',
 'hierarchy1_x.csv']

Let's look at a first very simple example: `hierarchy1.csv`

In [3]:
pd.read_csv(csv_hierarchy_files / "hierarchy1.csv").fillna("")

Unnamed: 0,level_0,level_1,module,config.code
0,root,,sum,
1,,biomass,bw,4a9e1a0ac89a66b2fe1565cbf2628d9c


The columns that describe the structure of the hierarchy are named __`level_{i}`__. Internally, Enbios first collects all columns whose names have the form `level_<number>` and then sorts them alphabetically. If your hierarchy is very deep, check the resulting order: an alphabetical sort places `level_10` before `level_2`. Each row describes one node, and the column that holds the node's name defines its level.
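
To illustrate why the column order deserves a check, here is a minimal sketch in plain Python. It does not call Enbios; the column list is made up for the example, and the numeric sort key is just one possible way to order the columns, not something Enbios is known to do.

```python
import re

# Hypothetical column names of a deep hierarchy file (illustrative only).
columns = ["level_0", "level_1", "level_2", "level_10", "module", "config.code"]

# Keep only the columns whose whole name has the form level_<number>.
level_pattern = re.compile(r"^level_\d+$")
level_columns = [c for c in columns if level_pattern.match(c)]

# Plain string sorting compares character by character.
alphabetical = sorted(level_columns)
# Sorting by the numeric suffix gives the intuitive depth order.
numerical = sorted(level_columns, key=lambda c: int(c.split("_")[1]))

print(alphabetical)  # ['level_0', 'level_1', 'level_10', 'level_2']
print(numerical)     # ['level_0', 'level_1', 'level_2', 'level_10']
```

With fewer than ten level columns the two orders are identical, so the issue only shows up in very deep hierarchies.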

The __module__ column defines the adapter or aggregator for that node. Enbios automatically decides which of the two it is, since that depends only on whether the node is at the bottom of the hierarchy (structural node: adapter) or not (functional node: aggregator).

All other columns should start with `config.` and hold values for the node's configuration. Enbios treats these column names as keys of a structured config dictionary, where `.` is used as a separator to create a nested dictionary, in case the config requires that.

For example, the columns
```
config.loc.0, config.loc.1
eu, es
```
will be turned into:
`{'config': {'loc': ['eu', 'es']}}`
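
The conversion from dotted column names to a nested structure can be sketched as follows. This is not the Enbios implementation; `nest_dotted_keys` and `_digit_dicts_to_lists` are hypothetical helpers written for this illustration. The rule that purely numeric key parts produce a list is an assumption inferred from the example above.

```python
def nest_dotted_keys(flat: dict) -> dict:
    """Turn keys like 'config.loc.0' into nested dictionaries and lists."""
    nested: dict = {}
    for dotted_key, value in flat.items():
        parts = dotted_key.split(".")
        current = nested
        # Walk (and create) the intermediate dictionaries.
        for part in parts[:-1]:
            current = current.setdefault(part, {})
        current[parts[-1]] = value
    return _digit_dicts_to_lists(nested)


def _digit_dicts_to_lists(node):
    """Replace dictionaries whose keys are all digits by index-ordered lists."""
    if not isinstance(node, dict):
        return node
    converted = {key: _digit_dicts_to_lists(child) for key, child in node.items()}
    if converted and all(key.isdigit() for key in converted):
        return [converted[key] for key in sorted(converted, key=int)]
    return converted


# One CSV row, reduced to its config columns.
row = {"config.loc.0": "eu", "config.loc.1": "es"}
result = nest_dotted_keys(row)
print(result)  # {'config': {'loc': ['eu', 'es']}}
```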

Let's insert the hierarchy into the experiment config and print it:

In [4]:
experiment_config["hierarchy"] = csv_hierarchy_files / "hierarchy1.csv"
experiment = Experiment(experiment_config)

experiment.get_simplified_hierarchy(print_it=True);

{
  "root": {
    "biomass": null
  }
}


A more complex example:

In [5]:
pd.read_csv(csv_hierarchy_files / "hierarchy4.csv").fillna("")

Unnamed: 0,level_0,level_1,level_2,module,config.code
0,root,,,sum,
1,,intermediate,,sum,
2,,,biomass,bw,4a9e1a0ac89a66b2fe1565cbf2628d9c
3,,,biomass2,bw,4a9e1a0ac89a66b2fe1565cbf2628d9c
4,,intermediate2,,sum,
5,,,biomass3,bw,4a9e1a0ac89a66b2fe1565cbf2628d9c
6,,,biomass4,bw,4a9e1a0ac89a66b2fe1565cbf2628d9c


In [6]:
experiment_config["hierarchy"] = csv_hierarchy_files / "hierarchy4.csv"
experiment = Experiment(experiment_config)

experiment.get_simplified_hierarchy(print_it=True);

{
  "root": {
    "intermediate": {
      "biomass": null,
      "biomass2": null
    },
    "intermediate2": {
      "biomass3": null,
      "biomass4": null
    }
  }
}


If you need different level column names, you can call the function `csv2hierarchy` directly and pass one of two arguments: `level_cols`, a list of strings naming the columns that contain the node names, or `levels_regex`, a regular expression that matches those columns (this is the default, with the regular expression `^level_\d+$`). The result is a dictionary which can be inserted into the experiment config as the hierarchy.

In [7]:
from enbios.base.tree_operations import csv2hierarchy

csv2hierarchy(csv_hierarchy_files / "hierarchy4.csv")

{'name': 'root',
 'children': [{'name': 'intermediate',
   'children': [{'name': 'biomass',
     'config': {'code': '4a9e1a0ac89a66b2fe1565cbf2628d9c'},
     'adapter': 'bw'},
    {'name': 'biomass2',
     'config': {'code': '4a9e1a0ac89a66b2fe1565cbf2628d9c'},
     'adapter': 'bw'}],
   'aggregator': 'sum'},
  {'name': 'intermediate2',
   'children': [{'name': 'biomass3',
     'config': {'code': '4a9e1a0ac89a66b2fe1565cbf2628d9c'},
     'adapter': 'bw'},
    {'name': 'biomass4',
     'config': {'code': '4a9e1a0ac89a66b2fe1565cbf2628d9c'},
     'adapter': 'bw'}],
   'aggregator': 'sum'}],
 'aggregator': 'sum'}
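
To make the mapping from rows to nested nodes more concrete, the following standalone sketch rebuilds the same dictionary from the contents of `hierarchy4.csv` using only the standard library. It is an illustration under assumptions, not the code behind `csv2hierarchy`: the helpers `rows_to_hierarchy` and `_assign_module_keys` are hypothetical, the index column is left out of the inline CSV text, and only the `config.code` column is handled.

```python
import csv
import io

# Contents of hierarchy4.csv as shown above, without the index column.
CSV_TEXT = """level_0,level_1,level_2,module,config.code
root,,,sum,
,intermediate,,sum,
,,biomass,bw,4a9e1a0ac89a66b2fe1565cbf2628d9c
,,biomass2,bw,4a9e1a0ac89a66b2fe1565cbf2628d9c
,intermediate2,,sum,
,,biomass3,bw,4a9e1a0ac89a66b2fe1565cbf2628d9c
,,biomass4,bw,4a9e1a0ac89a66b2fe1565cbf2628d9c
"""

LEVEL_COLS = ["level_0", "level_1", "level_2"]


def rows_to_hierarchy(rows, level_cols):
    """Build a nested node dictionary from rows with one filled level column."""
    root = None
    stack = []  # stack[i] is the most recent node seen at depth i
    for row in rows:
        # The depth of a node is the index of the level column that is filled.
        depth = next(i for i, col in enumerate(level_cols) if row[col])
        node = {"name": row[level_cols[depth]], "module": row["module"]}
        if row.get("config.code"):
            node["config"] = {"code": row["config.code"]}
        del stack[depth:]  # drop nodes that are not ancestors of this row
        if stack:
            stack[-1].setdefault("children", []).append(node)
        else:
            root = node
        stack.append(node)
    _assign_module_keys(root)
    return root


def _assign_module_keys(node):
    """Store the module as 'aggregator' for inner nodes, 'adapter' for leaves."""
    module = node.pop("module")
    if node.get("children"):
        node["aggregator"] = module
        for child in node["children"]:
            _assign_module_keys(child)
    else:
        node["adapter"] = module


hierarchy = rows_to_hierarchy(csv.DictReader(io.StringIO(CSV_TEXT)), LEVEL_COLS)
print([child["name"] for child in hierarchy["children"]])
# ['intermediate', 'intermediate2']
```

The stack of most recent ancestors is what lets a flat list of rows express nesting: a row at depth 2 is attached to the last node seen at depth 1.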