## 0.0. setup

### 0.0.0. development environment setup

1. [set up your Conda environment](https://conda.io/projects/conda/en/latest/user-guide/getting-started.html) using the provided environment file `conda_env_hybridization.yml`:

```
conda env create -f <path to environment yaml file>
```

2. clone the [`pylcaio` repository](https://github.com/OASES-project/pylcaio) into your user directory (`~/`)

In [16]:
%%capture
pylcaio_directory = os.path.join(Path.home(), 'pylcaio')
!git clone https://github.com/OASES-project/pylcaio.git $pylcaio_directory

3. copy files from the shared drive (`/srv/data/autumn_school/hybridization`) to your user directory (`~/hybridization_data`)

In [3]:
# start from a fresh copy of the input data, since ecospold2matrix writes files into the Ecoinvent directory
!rm -rf ~/hybridization_data/input 
!mkdir -p ~/hybridization_data/input
!cp -a /srv/data/autumn_school/hybridization/* ~/hybridization_data/input

In [4]:
!mkdir -p ~/hybridization_data/output

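the shell commands above can also be expressed in plain Python, which helps when the notebook is run outside IPython. the following is a minimal sketch, not part of the original workflow: the helper name `reset_directory` is hypothetical, and `shutil.copytree` only approximates `cp -a` (file metadata is copied on a best-effort basis).

```python
import shutil
from pathlib import Path


def reset_directory(source: Path, target: Path) -> None:
    """Replace `target` with a fresh copy of `source` (hypothetical helper)."""
    # remove any previous copy, including files written by earlier runs
    if target.exists():
        shutil.rmtree(target)
    # copytree creates `target` and any missing parent directories;
    # symlinks=True copies symbolic links as links instead of following them
    shutil.copytree(source, target, symlinks=True)
```

for example, `reset_directory(Path('/srv/data/autumn_school/hybridization'), Path.home() / 'hybridization_data' / 'input')` would mirror the three shell commands used for the input directory.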
## 0.1. imports
### 0.1.1. regular imports

In [1]:
# i/o
import sys
import os
from pathlib import Path
import gzip
import pickle
# configuration
import yaml
# lca
import ecospold2matrix as e2m
import pymrio
import brightway2 as bw
# type hints
from ecospold2matrix import ecospold2matrix
from pymrio import IOSystem
# data science
import pandas as pd
# deep copy
import copy

Using environment variable BRIGHTWAY2_DIR for data directory:
/home/weinold/bw_data


### 0.1.2. local imports

append the `pylcaio` source directory to the system path so that Python can import it

In [18]:
sys.path.append(os.path.join(pylcaio_directory, 'src')) # required for local import of pylcaio
import pylcaio

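re-running the cell above appends the same directory to `sys.path` again each time. this is harmless, but a guarded variant keeps the path list clean. the sketch below assumes a hypothetical helper name, `add_to_sys_path`, which is not part of `pylcaio`.

```python
import sys
from pathlib import Path


def add_to_sys_path(directory) -> bool:
    """Append `directory` to sys.path unless already listed; return True if it was added."""
    directory = str(Path(directory).expanduser())
    if directory in sys.path:
        return False  # already importable from this location, nothing to do
    sys.path.append(directory)
    return True
```

in this notebook it could be called as `add_to_sys_path(os.path.join(pylcaio_directory, 'src'))` before `import pylcaio`.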
## 0.2. file paths

set location of databases (Ecoinvent and Exiobase) for use by the appropriate Python packages

### 0.2.1. directories

In [7]:
%%capture
print(path_dir_hybridization_input := os.path.join(Path.home(), 'hybridization_data/input'))
print(path_dir_hybridization_output := os.path.join(Path.home(), 'hybridization_data/output'))

### 0.2.2. databases

In [8]:
%%capture
# Exiobase
print(path_file_exiobase_input := os.path.join(path_dir_hybridization_input, 'IOT_2012_pxp.zip'))
print(path_file_exiobase_output := os.path.join(path_dir_hybridization_output, 'exiobase_monetary_pxp_2012.pickle'))
# Ecoinvent
print(path_dir_ecoinvent_input := os.path.join(path_dir_hybridization_input, 'ecoinvent-3.5-cutoff'))

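since the parsing steps that follow take several minutes, it can be worth checking that the configured paths exist before starting them. a minimal sketch, assuming a hypothetical helper `missing_paths`:

```python
from pathlib import Path
from typing import Iterable, List


def missing_paths(paths: Iterable) -> List[str]:
    """Return those entries of `paths` that do not exist on disk."""
    return [str(p) for p in paths if not Path(p).exists()]
```

calling `missing_paths([path_file_exiobase_input, path_dir_ecoinvent_input])` should return an empty list if the copy step in the setup section succeeded.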
## 1.1. read databases and save to disk
### 1.1.1. read Exiobase database and save `pickle` to disk

❔ creates a `pymrio.IOSystem` class instance (a collection of `pd.DataFrame` objects etc.) \
⏳ ~1min

In [9]:
%%time
exiobase: pymrio.IOSystem = pymrio.parse_exiobase3(path_file_exiobase_input)
with open(path_file_exiobase_output, 'wb') as file_handle:    
    pickle.dump(obj = exiobase, file = file_handle, protocol=pickle.HIGHEST_PROTOCOL)

CPU times: user 1min 4s, sys: 2.19 s, total: 1min 6s
Wall time: 1min 6s

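the point of saving the pickle is that later sessions can load the parsed object in seconds instead of re-parsing the archive. a minimal round-trip sketch using only the standard library; the helper names `save_pickle` and `load_pickle` are hypothetical and not part of `pymrio`:

```python
import pickle
from pathlib import Path
from typing import Any


def save_pickle(obj: Any, path: Path) -> None:
    """Serialize `obj` to `path` using the highest available pickle protocol."""
    with open(path, 'wb') as file_handle:
        pickle.dump(obj, file_handle, protocol=pickle.HIGHEST_PROTOCOL)


def load_pickle(path: Path) -> Any:
    """Read back an object previously written with `save_pickle`."""
    with open(path, 'rb') as file_handle:
        return pickle.load(file_handle)
```

in a later session, `exiobase = load_pickle(path_file_exiobase_output)` would restore the object saved above, provided `pymrio` is importable. only unpickle files you created yourself, since loading a pickle can execute arbitrary code.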

### 1.1.2. read Ecoinvent database and save `pickle` to disk

❔ creates an `e2m.Ecospold2Matrix` class instance \
⏳ ~12min

In [11]:
%%capture
print(e2m_project_name := 'ecoinvent_3_5_cutoff')
print(tmp_dir_e2m := os.path.join(path_dir_hybridization_output, e2m_project_name + '_log'))
print(tmp_pattern_e2m := '*.db')

#### 1.1.2.1. run `ecospold2matrix`

In [12]:
parser = e2m.Ecospold2Matrix(
    sys_dir = path_dir_ecoinvent_input,
    project_name = e2m_project_name,
    out_dir = path_dir_hybridization_output,
    positive_waste = False,
    nan2null = True
)

2022-10-20 17:41:49,732 - ecoinvent_3_5_cutoff - INFO - Ecospold2Matrix Processing
INFO:ecoinvent_3_5_cutoff:Ecospold2Matrix Processing
2022-10-20 17:41:49,735 - ecoinvent_3_5_cutoff - INFO - Current git commit: 52113ceb55775a6adab801376a53a22ed64b54d3
INFO:ecoinvent_3_5_cutoff:Current git commit: 52113ceb55775a6adab801376a53a22ed64b54d3
2022-10-20 17:41:49,736 - ecoinvent_3_5_cutoff - INFO - Project name: ecoinvent_3_5_cutoff
INFO:ecoinvent_3_5_cutoff:Project name: ecoinvent_3_5_cutoff
2022-10-20 17:41:49,737 - ecoinvent_3_5_cutoff - INFO - Unit process and Master data directory: /home/weinold/hybridization_data/input/ecoinvent-3.5-cutoff
INFO:ecoinvent_3_5_cutoff:Unit process and Master data directory: /home/weinold/hybridization_data/input/ecoinvent-3.5-cutoff
2022-10-20 17:41:49,737 - ecoinvent_3_5_cutoff - INFO - Data saved in: /home/weinold/hybridization_data/output
INFO:ecoinvent_3_5_cutoff:Data saved in: /home/weinold/hybridization_data/output
2022-10-20 17:41:49,738 - ecoinven

In [13]:
parser.ecospold_to_Leontief(
    fileformats = 'Pandas',
    with_absolute_flows=True)

2022-10-20 17:41:54,637 - ecoinvent_3_5_cutoff - INFO - Products extracted from IntermediateExchanges.xml with SHA-1 of b2c87a5bf5982a60515a6e1160e43c620a218369
INFO:ecoinvent_3_5_cutoff:Products extracted from IntermediateExchanges.xml with SHA-1 of b2c87a5bf5982a60515a6e1160e43c620a218369
2022-10-20 17:42:03,445 - ecoinvent_3_5_cutoff - INFO - Activities extracted from ActivityIndex.xml with SHA-1 of 3ac94e9826a9a031ff2e0bfbdceeecaeb72a9117
INFO:ecoinvent_3_5_cutoff:Activities extracted from ActivityIndex.xml with SHA-1 of 3ac94e9826a9a031ff2e0bfbdceeecaeb72a9117
2022-10-20 17:42:03,466 - ecoinvent_3_5_cutoff - INFO - Processing 16022 files in /home/weinold/hybridization_data/input/ecoinvent-3.5-cutoff/datasets
INFO:ecoinvent_3_5_cutoff:Processing 16022 files in /home/weinold/hybridization_data/input/ecoinvent-3.5-cutoff/datasets
2022-10-20 17:43:02,325 - ecoinvent_3_5_cutoff - INFO - Flows saved in /home/weinold/hybridization_data/input/ecoinvent-3.5-cutoff/flows.pickle with SHA-1 o

#### 1.1.2.2. clean up temporary files

unfortunately, `ecospold2matrix` creates many files (`.log`, `.db`) whose output directory cannot be set. they are not cleaned up automatically and may interfere with repeated runs of the code, so we delete them manually.

In [14]:
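def delete_e2m_files(paths: list) -> None:
    """Remove temporary ecospold2matrix files; each entry is a path or a shell glob pattern."""
    for path in paths:
        # IPython shell escape: $path is substituted by IPython, glob patterns are
        # expanded by the shell relative to the notebook's working directory
        !rm -rf $path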

In [15]:
delete_e2m_files(
        [
            tmp_dir_e2m,
            tmp_pattern_e2m,
        ]
)
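
the `!rm -rf` shell escape only works inside IPython/Jupyter. if the cleanup has to run in a plain Python script, the same effect can be sketched with `glob` and `shutil`. this is an illustrative alternative, not the notebook's method; the helper name `delete_matching` is hypothetical.

```python
import glob
import os
import shutil
from typing import Iterable, List


def delete_matching(patterns: Iterable[str]) -> List[str]:
    """Delete every file or directory matching the given glob patterns.

    Returns the list of paths that were removed.
    """
    removed = []
    for pattern in patterns:
        # a pattern without wildcards matches itself if the path exists
        for path in glob.glob(pattern):
            if os.path.isdir(path) and not os.path.islink(path):
                shutil.rmtree(path)  # directories, e.g. the log directory
            else:
                os.remove(path)      # regular files and symlinks, e.g. *.db
            removed.append(path)
    return removed
```

`delete_matching([tmp_dir_e2m, tmp_pattern_e2m])` would then correspond to the call above. returning the removed paths makes it easy to check what was actually deleted.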