## 0.0. setup

1. [set up your Conda environment](https://conda.io/projects/conda/en/latest/user-guide/getting-started.html) using the provided environment file `conda_env_hybridization.yml`:

```
conda env create -f <path to environment yaml file>
```

2. create directories and copy database files from the shared drive (`/srv/data/autumn_school/hybridization/`)

In [1]:
# ensures that we start with a fresh directory, since ecospold2matrix can mess up Ecoinvent files
!rm -rf ~/hybridization_data/databases_raw 
!mkdir -p ~/hybridization_data/databases_raw
!cp -a /srv/data/autumn_school/hybridization/databases_raw/* ~/hybridization_data/databases_raw

In [2]:
!mkdir -p ~/hybridization_data/databases_pickle
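
the two `mkdir -p` calls above can also be written in plain Python, which may help when the notebook runs outside a Unix shell. the following is only a minimal sketch: the helper name `make_data_directories` is hypothetical, and unlike the `rm -rf` step above it does not wipe an existing `databases_raw` directory or copy any files.

```python
from pathlib import Path


def make_data_directories(base_dir: Path) -> dict:
    """Create the raw and pickle database directories below `base_dir`.

    Hypothetical helper, not part of any of the packages used in this notebook.
    """
    directories = {
        'raw': base_dir / 'databases_raw',
        'pickle': base_dir / 'databases_pickle',
    }
    for directory in directories.values():
        # parents=True mirrors `mkdir -p`; exist_ok=True makes repeated runs harmless
        directory.mkdir(parents=True, exist_ok=True)
    return directories


# usage in this notebook (not executed here):
# make_data_directories(Path.home() / 'hybridization_data')
```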

## 0.1. imports
### 0.1.1. regular imports

In [3]:
# i/o
import sys
import os
from pathlib import Path
import gzip
import pickle
# lca
import ecospold2matrix as e2m
import pymrio
import brightway2 as bw
# data science
import pandas as pd

Using environment variable BRIGHTWAY2_DIR for data directory:
/home/weinold/bw_data


### 0.1.2. local imports

In [6]:
%%capture
pylcaio_directory = os.path.join(Path.home(), 'pylcaio')
!git clone https://github.com/michaelweinold/pylcaio_integration_with_brightway.git $pylcaio_directory # this is a fork with various fixes

In [5]:
sys.path.append(os.path.join(Path.home(), 'pylcaio', 'src')) # required for local import of pylcaio
import pylcaio
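
because the `git clone` cell runs under `%%capture`, a failed clone produces no visible output, and the problem only surfaces later as an `ImportError`. a small guard around the `sys.path` modification can make this easier to diagnose. this is a sketch under assumptions: the helper name `add_source_directory` is hypothetical and not part of `pylcaio`.

```python
import sys
from pathlib import Path


def add_source_directory(path_src: Path) -> None:
    """Append `path_src` to `sys.path` once, failing early if it does not exist.

    Hypothetical helper for illustration only.
    """
    if not path_src.is_dir():
        raise FileNotFoundError(
            f'{path_src} not found: did the git clone succeed?'
        )
    if str(path_src) not in sys.path:
        # avoid duplicate entries when the cell is executed repeatedly
        sys.path.append(str(path_src))


# usage in this notebook (not executed here):
# add_source_directory(Path.home() / 'pylcaio' / 'src')
# import pylcaio
```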

## 0.2. file paths

set the locations of the databases (Ecoinvent and Exiobase) for use by the appropriate Python packages

### 0.2.1. directories

In [6]:
%%capture
print(path_dir_databases_raw := os.path.join(Path.home(), 'hybridization_data/databases_raw'))
print(path_dir_databases_pickle := os.path.join(Path.home(), 'hybridization_data/databases_pickle'))

### 0.2.2. databases

In [11]:
%%capture
# Exiobase
print(path_file_exiobase_input := os.path.join(path_dir_databases_raw, 'IOT_2012_pxp.zip'))
print(path_file_exiobase_output := os.path.join(path_dir_databases_pickle, 'exiobase_monetary_pxp_2012.pickle'))
# Ecoinvent
print(path_dir_ecoinvent_input := os.path.join(path_dir_databases_raw, 'ecoinvent-3.5-cutoff'))
print(path_file_ecoinvent_characterisation := os.path.join(path_dir_databases_raw, 'LCIA_implementation_3.5.xlsx'))
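
since parsing the databases takes several minutes, it can be worth checking that all input paths exist before starting. the following sketch assumes the path variables defined above; the helper name `find_missing_paths` is hypothetical.

```python
from pathlib import Path


def find_missing_paths(paths: dict) -> list:
    """Return the names of all entries in `paths` that do not exist on disk.

    Hypothetical helper for illustration only.
    """
    return [name for name, path in paths.items() if not Path(path).exists()]


# usage in this notebook (not executed here):
# missing = find_missing_paths({
#     'exiobase': path_file_exiobase_input,
#     'ecoinvent': path_dir_ecoinvent_input,
#     'characterisation': path_file_ecoinvent_characterisation,
# })
# assert not missing, f'missing input files: {missing}'
```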

## 1.1. read databases and save to disk
### 1.1.1. read Exiobase database and save `pickle` to disk

❔ creates `pymrio.IOSystem` class instance (collection of pd.DataFrames etc.) \
⏳ ~1min

In [8]:
%%time
exiobase: pymrio.IOSystem = pymrio.parse_exiobase3(path_file_exiobase_input)
with open(path_file_exiobase_output, 'wb') as file_handle:    
    pickle.dump(obj = exiobase, file = file_handle, protocol=pickle.HIGHEST_PROTOCOL)

CPU times: user 1min 1s, sys: 2.31 s, total: 1min 4s
Wall time: 1min 5s
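
in a later session, the saved file can be read back without parsing the zip archive again. a minimal sketch, assuming the file was written with `pickle.dump` as above; the helper name `load_pickle` is hypothetical. note that `pymrio` must be importable when loading, and that pickle files should only be loaded from trusted sources.

```python
import pickle


def load_pickle(path_file):
    """Load and return the object stored in the pickle file at `path_file`.

    Hypothetical helper for illustration only.
    """
    with open(path_file, 'rb') as file_handle:
        return pickle.load(file_handle)


# usage in this notebook (not executed here):
# exiobase = load_pickle(path_file_exiobase_output)
```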


### 1.1.2. read Ecoinvent database and save `pickle` to disk

❔ creates `e2m.Ecospold2Matrix` class instance \
⏳ ~12min

In [10]:
%%capture
print(e2m_project_name := 'ecoinvent_3_5_cutoff')
print(tmp_dir_e2m := os.path.join(path_dir_databases_pickle, e2m_project_name + '_log'))
print(tmp_pattern_e2m := '*.db')

#### 1.1.2.1. run `ecospold2matrix`

In [13]:
parser = e2m.Ecospold2Matrix(
    sys_dir = path_dir_ecoinvent_input,
    project_name = e2m_project_name,
    characterisation_file = path_file_ecoinvent_characterisation,
    out_dir = path_dir_databases_pickle,
    positive_waste = False,
    nan2null = True
)

2022-10-22 08:11:12,619 - ecoinvent_3_5_cutoff - INFO - Ecospold2Matrix Processing
INFO:ecoinvent_3_5_cutoff:Ecospold2Matrix Processing
2022-10-22 08:11:12,621 - ecoinvent_3_5_cutoff - INFO - Current git commit: df4b52cf0ef8bafafa69e933ddd512ee51431e38
INFO:ecoinvent_3_5_cutoff:Current git commit: df4b52cf0ef8bafafa69e933ddd512ee51431e38
2022-10-22 08:11:12,622 - ecoinvent_3_5_cutoff - INFO - Project name: ecoinvent_3_5_cutoff
INFO:ecoinvent_3_5_cutoff:Project name: ecoinvent_3_5_cutoff
2022-10-22 08:11:12,622 - ecoinvent_3_5_cutoff - INFO - Unit process and Master data directory: /home/weinold/hybridization_data/databases_raw/ecoinvent-3.5-cutoff
INFO:ecoinvent_3_5_cutoff:Unit process and Master data directory: /home/weinold/hybridization_data/databases_raw/ecoinvent-3.5-cutoff
2022-10-22 08:11:12,623 - ecoinvent_3_5_cutoff - INFO - Data saved in: /home/weinold/hybridization_data/databases_pickle
INFO:ecoinvent_3_5_cutoff:Data saved in: /home/weinold/hybridization_data/databases_pickl

In [14]:
parser.ecospold_to_Leontief(
    fileformats = 'Pandas',
    with_absolute_flows=True)

2022-10-22 08:11:16,393 - ecoinvent_3_5_cutoff - INFO - Products extracted from IntermediateExchanges.xml with SHA-1 of b2c87a5bf5982a60515a6e1160e43c620a218369
INFO:ecoinvent_3_5_cutoff:Products extracted from IntermediateExchanges.xml with SHA-1 of b2c87a5bf5982a60515a6e1160e43c620a218369
2022-10-22 08:11:25,473 - ecoinvent_3_5_cutoff - INFO - Activities extracted from ActivityIndex.xml with SHA-1 of 3ac94e9826a9a031ff2e0bfbdceeecaeb72a9117
INFO:ecoinvent_3_5_cutoff:Activities extracted from ActivityIndex.xml with SHA-1 of 3ac94e9826a9a031ff2e0bfbdceeecaeb72a9117
2022-10-22 08:11:25,495 - ecoinvent_3_5_cutoff - INFO - Processing 16022 files in /home/weinold/hybridization_data/databases_raw/ecoinvent-3.5-cutoff/datasets
INFO:ecoinvent_3_5_cutoff:Processing 16022 files in /home/weinold/hybridization_data/databases_raw/ecoinvent-3.5-cutoff/datasets
2022-10-22 08:12:24,700 - ecoinvent_3_5_cutoff - INFO - Flows saved in /home/weinold/hybridization_data/databases_raw/ecoinvent-3.5-cutoff/f

starting characterisation


2022-10-22 08:14:03,979 - ecoinvent_3_5_cutoff - INFO - Will use column 7, named CF_35, for characterisation factors
INFO:ecoinvent_3_5_cutoff:Will use column 7, named CF_35, for characterisation factors
2022-10-22 08:14:03,980 - ecoinvent_3_5_cutoff - INFO - Starting characterisation matching
INFO:ecoinvent_3_5_cutoff:Starting characterisation matching
2022-10-22 08:14:08,272 - ecoinvent_3_5_cutoff - INFO - Characterisation matching done. C matrix created
INFO:ecoinvent_3_5_cutoff:Characterisation matching done. C matrix created
2022-10-22 08:14:08,273 - ecoinvent_3_5_cutoff - INFO - Starting to export to file
INFO:ecoinvent_3_5_cutoff:Starting to export to file
2022-10-22 08:14:08,274 - ecoinvent_3_5_cutoff - INFO - about to write to file
INFO:ecoinvent_3_5_cutoff:about to write to file
2022-10-22 08:16:59,033 - ecoinvent_3_5_cutoff - INFO - Final, symmetric, normalized matrices saved in /home/weinold/hybridization_data/databases_pickle/ecoinvent_3_5_cutoffPandas_symmNorm.gz.pickle w
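
the log above reports an output file ending in `.gz.pickle`. the sketch below shows one way to read it back, under the assumption (suggested by the file name and the `gzip` import, but not confirmed here) that it is a gzip-compressed pickle. the helper name `load_gzip_pickle` is hypothetical.

```python
import gzip
import pickle


def load_gzip_pickle(path_file):
    """Load an object from a gzip-compressed pickle file.

    Hypothetical helper; assumes the file was written as a gzip-compressed pickle.
    """
    with gzip.open(path_file, 'rb') as file_handle:
        return pickle.load(file_handle)


# usage in this notebook (not executed here), with the file name taken from the log above:
# ecoinvent = load_gzip_pickle(
#     os.path.join(path_dir_databases_pickle, 'ecoinvent_3_5_cutoffPandas_symmNorm.gz.pickle')
# )
```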

#### 1.1.2.2. clean up temporary files

unfortunately, `ecospold2matrix` creates many files (`.log`, `.db`) in locations where the output directory cannot be set. these files are not cleaned up automatically and might interfere with repeated runs of the code, so we delete them here.

In [None]:
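def delete_e2m_files(list_string: list) -> None:
    """Delete every path or shell glob pattern in `list_string` (relies on IPython `!` shell syntax)."""
    for i in list_string:
        # unquoted on purpose, so that the shell expands patterns such as '*.db'
        !rm -rf $i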

In [None]:
delete_e2m_files(
    [
        tmp_dir_e2m,
        tmp_pattern_e2m,
    ]
)
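
the cleanup above relies on IPython's `!` shell syntax, so it only works inside a notebook on a system with `rm`. a plain-Python alternative could look like the sketch below. the helper name `delete_paths` is hypothetical; like the shell version, relative patterns such as `'*.db'` are resolved against the current working directory.

```python
import glob
import shutil
from pathlib import Path


def delete_paths(patterns: list) -> list:
    """Delete all files and directories matching the given paths or glob patterns.

    Hypothetical helper for illustration only. Returns the deleted paths.
    """
    deleted = []
    for pattern in patterns:
        # glob.glob returns an empty list if nothing matches, so missing paths are skipped
        for match in glob.glob(str(pattern)):
            path = Path(match)
            if path.is_dir() and not path.is_symlink():
                shutil.rmtree(path)  # remove a directory and its contents
            else:
                path.unlink()  # remove a file or a symbolic link
            deleted.append(match)
    return deleted


# usage in this notebook (not executed here):
# delete_paths([tmp_dir_e2m, tmp_pattern_e2m])
```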