### Prerequisites

1. Create the conda environment `pylcaio_ecoinvent_3_8` from [`pylcaio_ecoinvent_3_8.yml`](https://github.com/michaelweinold/config_conda/blob/main/pylcaio_ecoinvent_3_8.yml)
2. Create the conda environment `pylcaio_ecoinvent_3_9` from [`pylcaio_ecoinvent_3_9.yml`](https://github.com/michaelweinold/config_conda/blob/main/pylcaio_ecoinvent_3_9.yml)
3. The Ecoinvent database (ecoSpold files, etc.) in a directory named `ecoinvent_3.9.1_cutoff_ecoSpold02` inside the raw data directory (`path_dir_data_raw`)
4. The Ecoinvent characterization file `LCIA_Implementation_v3.9.1.xlsx` in the same `ecoinvent_3.9.1_cutoff_ecoSpold02` directory
5. `Exiobase` data in the directory `exiobase_3_8` inside `path_dir_data_raw`, or an internet connection for automatic download from zenodo.org

## 0. Setup
### 0.1. Imports

In [1]:
# i/o
import sys
import os
from pathlib import Path
import gzip
import pickle
# git
import git
# configuration
import yaml
# data science
import pandas as pd
import copy
# lca
import pymrio
import ecospold2matrix as e2m
# type hints
from pymrio import IOSystem
from pathlib import PosixPath

### 0.2. Variables

In [2]:
str_ecoinvent_version: str = '3.8'
str_exiobase_system: str = 'pxp'
str_exiobase_year: str = '2011'
str_exiobase_zip_file: str = 'IOT_' + str_exiobase_year + '_' + str_exiobase_system + '.zip'

### 0.3. File Paths
#### 0.3.1. Directories

In [3]:
%%capture
print(path_dir_data := Path(Path.home() / 'data'))
print(path_dir_data_raw := Path(path_dir_data / 'data_raw'))
print(path_dir_data_processed := Path(path_dir_data / 'data_processed'))

In [4]:
%%capture
print(path_dir_repo_pylcaio_parent := Path(Path.home() / 'github'))
print(path_dir_repo_pylcaio_src := Path(path_dir_repo_pylcaio_parent / 'pylcaio/src'))

In [5]:
%%capture
print(path_dir_exiobase_raw := path_dir_data_raw / 'exiobase_3_8')
print(path_dir_ecoinvent_raw := path_dir_data_raw / str('ecoinvent_' + str_ecoinvent_version + '_cutoff_ecoSpold02'))

#### 0.3.2. Files

In [6]:
%%capture
print(path_file_exiobase_processed := path_dir_data_processed / 'exiobase_3_8.pickle')
print(path_file_ecoinvent_processed := path_dir_data_processed / str('ecoinvent_' + str_ecoinvent_version + 'Pandas_symmNorm.gz.pickle'))

In [7]:
%%capture
print(path_file_hybrid_system := Path(path_dir_data_processed / 'hybrid.pickle'))

In [8]:
%%capture
print(path_file_ecoinvent_LCIA_implementation := str(path_dir_ecoinvent_raw / str('LCIA_Implementation_v' + str_ecoinvent_version + '.xlsx')))

## 1. Data Preparation

### 1.1. Download `Exiobase` from zenodo.org 

In [12]:
# download only if the raw Exiobase directory does not exist yet
if not path_dir_exiobase_raw.exists():
    pymrio.download_exiobase3(
        storage_folder = path_dir_exiobase_raw,
        system = str_exiobase_system,
        years = str_exiobase_year
    )
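
The cell above only tests whether the directory exists, so an empty or partially filled directory would skip the download. A minimal sketch of a stricter check follows, assuming the archive name keeps the `IOT_<year>_<system>.zip` pattern from section 0.2. The helper names `exiobase_zip_name` and `exiobase_archive_missing` are hypothetical and not part of `pymrio`.

```python
from pathlib import Path


def exiobase_zip_name(year: str, system: str) -> str:
    """Build the archive name used in section 0.2, e.g. 'IOT_2011_pxp.zip'."""
    return f'IOT_{year}_{system}.zip'


def exiobase_archive_missing(storage_folder: Path, year: str, system: str) -> bool:
    """Return True if the expected archive is not a file inside storage_folder."""
    return not (Path(storage_folder) / exiobase_zip_name(year, system)).is_file()
```

In the notebook, the download call could then be guarded by `exiobase_archive_missing(path_dir_exiobase_raw, str_exiobase_year, str_exiobase_system)` instead of the directory test.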

### 1.2. Check if `Ecoinvent` data is present

In [13]:
assert path_dir_ecoinvent_raw.exists(), 'Ecoinvent data not found.'
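
The assertion above confirms only that the top-level directory exists. A slightly more informative check could list every expected input that is absent. The sketch below assumes that the unit-process files sit in a `datasets` subfolder (as the parser log in section 2.2 suggests) and that the characterization spreadsheet sits directly in the Ecoinvent directory. `missing_ecoinvent_inputs` is a hypothetical helper name.

```python
from pathlib import Path


def missing_ecoinvent_inputs(ecoinvent_dir: Path, lcia_file_name: str) -> list:
    """List expected Ecoinvent inputs that are absent (empty list: all found)."""
    ecoinvent_dir = Path(ecoinvent_dir)
    expected = [
        ecoinvent_dir,                   # root of the ecoSpold02 export
        ecoinvent_dir / 'datasets',      # assumption: unit-process files live here
        ecoinvent_dir / lcia_file_name,  # characterization spreadsheet
    ]
    return [str(path) for path in expected if not path.exists()]
```

An empty return value means that all three inputs were found; otherwise the list can be passed to the assertion message to show exactly what is missing.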

## 2. Parse Databases (needs to be run only once)
### 2.1. Parse `Exiobase`

⏳ ~1.5 min on MacBook Pro

In [11]:
exiobase: pymrio.IOSystem = pymrio.parse_exiobase3(path_dir_exiobase_raw / str_exiobase_zip_file)
with open(path_file_exiobase_processed, 'wb') as file_handle:
    pickle.dump(obj = exiobase, file = file_handle, protocol=pickle.HIGHEST_PROTOCOL)
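
Because parsing needs to run only once, the parse-then-pickle step can be wrapped in a small cache helper that skips the expensive call when the pickle already exists. This is a minimal sketch using only the standard library; `load_or_build` is a hypothetical helper name, and the approach assumes the cached file was written by this notebook (pickles from untrusted sources should not be loaded).

```python
import pickle
from pathlib import Path
from typing import Any, Callable


def load_or_build(cache_file: Path, build: Callable[[], Any]) -> Any:
    """Return the pickled object if cache_file exists, else build and cache it."""
    cache_file = Path(cache_file)
    if cache_file.is_file():
        # cache hit: skip the expensive build step
        with open(cache_file, 'rb') as file_handle:
            return pickle.load(file_handle)
    result = build()
    # make sure the target directory exists before writing
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    with open(cache_file, 'wb') as file_handle:
        pickle.dump(result, file_handle, protocol=pickle.HIGHEST_PROTOCOL)
    return result
```

In the notebook, the call would look like `load_or_build(path_file_exiobase_processed, lambda: pymrio.parse_exiobase3(path_dir_exiobase_raw / str_exiobase_zip_file))`.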

### 2.2. Parse `Ecoinvent`

⏳ ~15 min on MacBook Pro

In [9]:
parser = e2m.Ecospold2Matrix(
    sys_dir = str(path_dir_ecoinvent_raw), # must be a string: passing a PosixPath object breaks the parser
    project_name = str('ecoinvent_' + str_ecoinvent_version),
    out_dir = path_dir_data_processed,
    characterisation_file = path_file_ecoinvent_LCIA_implementation,
    positive_waste = False,
    nan2null = True
)
parser.save_interm = False
parser.prefer_pickles = True

2023-02-23 16:36:28,043 - ecoinvent_3.8 - INFO - Ecospold2Matrix Processing
2023-02-23 16:36:28,057 - ecoinvent_3.8 - INFO - Current git commit: f81888470d18f0c102d0b6e5272ccbf6bf07dd1a
2023-02-23 16:36:28,058 - ecoinvent_3.8 - INFO - Project name: ecoinvent_3.8
2023-02-23 16:36:28,059 - ecoinvent_3.8 - INFO - Unit process and Master data directory: /Users/michaelweinold/data/data_raw/ecoinvent_3.8_cutoff_ecoSpold02
2023-02-23 16:36:28,059 - ecoinvent_3.8 - INFO - Data saved in: /Users/michaelweinold/data/data_processed
2023-02-23 16:36:28,059 - ecoinvent_3.8 - INFO - Replace Not-a-Number instances with 0.0 in all matrices
2023-02-23 16:36:28,060 - ecoinvent_3.8 - INFO - Pickle intermediate results to files
2023-02-23 16:36:28,060 - ecoinvent_3.8 - INFO - Order processes based on: ISIC, activityName
2023-02-23 16:36:28,060 - ecoinvent_3.8 - INFO - Order elementary exchanges based on: comp, name, subcomp
rm: ecoinvent_3.8_characterisation.db: No such file or directory


In [10]:
parser.ecospold_to_Leontief(
    fileformats = 'Pandas',
    with_absolute_flows = True
)

2023-02-23 16:36:34,004 - ecoinvent_3.8 - INFO - Products extracted from IntermediateExchanges.xml with SHA-1 of 1da23bc8fd24d97422a2a21ba3626d2cdfa6a428
2023-02-23 16:36:56,747 - ecoinvent_3.8 - INFO - Activities extracted from ActivityIndex.xml with SHA-1 of 03403c01ac6f74a5d6cc5ca8820593f7e516b709
2023-02-23 16:36:56,775 - ecoinvent_3.8 - INFO - Processing 19565 files in /Users/michaelweinold/data/data_raw/ecoinvent_3.8_cutoff_ecoSpold02/datasets
2023-02-23 16:38:07,333 - ecoinvent_3.8 - INFO - Processing 19565 files - this may take a while ...
2023-02-23 16:39:37,374 - ecoinvent_3.8 - INFO - Elementary flows extracted from ElementaryExchanges.xml with SHA-1 of f65edb9180cc5fb6df99289157b5aab92d30c0d1
2023-02-23 16:39:37,393 - ecoinvent_3.8 - INFO - OK.   No untraceable flows.
2023-02-23 16:39:37,625 - ecoinvent_3.8 - INFO - OK. Source activities seem in order. Each product traceable to an activity that actually does produce or distribute this product.
2023-02-23 16:39:37,882 - ecoi

starting characterisation
            cas                                        aName     bad_cas  \
0       93-65-2                                     mecoprop   7085-19-0   
1   107534-96-3                                 tebuconazole  80443-41-0   
2      302-04-5                                          NaN  71048-69-6   
3   138261-41-3                                          NaN  38261-41-3   
4      108-62-3                                  metaldehyde   9002-91-9   
5      107-15-3                                          NaN    117-15-3   
6       74-89-5                                 methyl amine     75-89-5   
7    17428-41-0                                 arsenic, ion   7440-38-2   
8    22537-48-0                                 cadmium, ion   7440-43-9   
9    18540-29-9                                  chromium vi   7440-47-3   
10   17493-86-6                                  Copper, ion   7440-50-8   
11   14701-22-5                                  Nickel, ion  

  self.STR.cas = self.STR.cas.str.replace('^[0]*','')


OperationalError: duplicate column name: id

#### 2.2.1. Remove Temporary Files

In [3]:
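# the characterisation database is named after the project (see the `rm:` line in the log above)
(Path.cwd() / ('ecoinvent_' + str_ecoinvent_version + '_characterisation.db')).unlink(missing_ok = True)
(Path.cwd() / 'C_long').unlink(missing_ok = True)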
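
The `rm:` line in the parser log above suggests that the characterisation database is named after the project (`ecoinvent_3.8_characterisation.db`), so the file to delete depends on the version being parsed. The sketch below is one way to make the cleanup report what it actually removed; `remove_temporary_files` is a hypothetical helper name.

```python
from pathlib import Path


def remove_temporary_files(directory: Path, file_names: list) -> list:
    """Delete the named files in directory if present; return the names removed."""
    removed = []
    for name in file_names:
        candidate = Path(directory) / name
        if candidate.is_file():
            candidate.unlink()
            removed.append(name)
    return removed
```

Calling it with `Path.cwd()` and the names `'ecoinvent_' + str_ecoinvent_version + '_characterisation.db'` and `'C_long'` would return an empty list when nothing was left behind, which makes a misspelled file name easier to notice.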

## 3. Load Databases

In [24]:
exiobase: IOSystem = pd.read_pickle(path_file_exiobase_processed)
with gzip.open(path_file_ecoinvent_processed,'rb') as f:
    ecoinvent = pd.read_pickle(f)
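
The processed Ecoinvent file is a gzip-compressed pickle, which is why it is opened with `gzip.open` before unpickling. The round trip can be sketched with the standard library alone. The helper names below are hypothetical, and unpickling objects that contain `pandas` data still requires `pandas` to be installed.

```python
import gzip
import pickle
from pathlib import Path


def save_gzip_pickle(obj, file_path: Path) -> None:
    """Pickle obj and write it gzip-compressed to file_path."""
    with gzip.open(file_path, 'wb') as file_handle:
        pickle.dump(obj, file_handle, protocol=pickle.HIGHEST_PROTOCOL)


def load_gzip_pickle(file_path: Path):
    """Read a gzip-compressed pickle and return the stored object."""
    with gzip.open(file_path, 'rb') as file_handle:
        return pickle.load(file_handle)
```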

## 4. `pylcaio`
### 4.1. `pylcaio` Import

⏳ ~5 min on MacBook Pro

In [25]:
# clone the repository only if its source directory is not present yet
if not path_dir_repo_pylcaio_src.exists():
    git.Git(path_dir_repo_pylcaio_parent).clone("https://github.com/MaximeAgez/pylcaio")

In [26]:
sys.path.append(str(path_dir_repo_pylcaio_src))
import pylcaio 

In [27]:
database_loader: pylcaio.DatabaseLoader  = pylcaio.DatabaseLoader(
    lca_database_processed = ecoinvent,
    io_database_processed = exiobase,
    lca_database_name_and_version = 'ecoinvent' + str_ecoinvent_version, # must match the version parsed in section 2.2
    io_database_name_and_version = 'exiobase3'
)

In [28]:
lcaio_object: pylcaio.LCAIO = database_loader.combine_ecoinvent_exiobase(
    complete_extensions = False,
    impact_world = True,
    regionalized = False
)

No path for the capital folder was provided. Capitals will not be endogenized


In [31]:
lcaio_object.hybridize(
    method_double_counting = 'STAM',
    capitals = False,
)

KeyError: "None of [Index(['1048ed29-dc9e-5866-8b1c-d8a72a8267ed_3f6dada9-2497-4e1c-9e1b-eabafa6920f8',\n       '332eb8bb-add8-5317-8710-96775c4e0953_3f6dada9-2497-4e1c-9e1b-eabafa6920f8',\n       '3957068b-07f6-586c-83ea-5911f7c1d7e0_3f6dada9-2497-4e1c-9e1b-eabafa6920f8',\n       'd2080a25-f29e-5e9d-8fd1-55d77ec791e7_3f6dada9-2497-4e1c-9e1b-eabafa6920f8',\n       '02f02f1a-e177-5c99-ab52-fb2768e76071_a235b2ff-3237-44b0-a445-b852376a1939',\n       '02f02f1a-e177-5c99-ab52-fb2768e76071_d7f544d6-c372-4fc3-81ae-44aa3614c9fc',\n       '99808fb5-5e9a-5b68-970b-bad146e2de29_a235b2ff-3237-44b0-a445-b852376a1939',\n       '99808fb5-5e9a-5b68-970b-bad146e2de29_d7f544d6-c372-4fc3-81ae-44aa3614c9fc',\n       '07c1b13f-25ff-54ff-846c-54d0920de46c_692b4f7e-9e79-4f69-b22f-b66f68f2f9cc',\n       '07c1b13f-25ff-54ff-846c-54d0920de46c_f467c4d0-ea1c-4ae3-8d69-712598a0478a',\n       ...\n       'a078a64e-ed02-5fae-8868-a02f3368f7e1_57143a43-20ab-44a5-a4a2-26effbdfafd1',\n       'bf68033b-50f5-596a-a605-247a69f1a1b4_57143a43-20ab-44a5-a4a2-26effbdfafd1',\n       '30f669a7-4631-5081-9769-bb9ccb7d5810_aa6252cb-4154-42a5-8d03-6a345ee108b8',\n       '579c86f3-072d-5967-b391-86cbbc0b4048_71ea78cc-59a2-469a-bf4e-ac659ca0b32c',\n       '39964ce2-5de7-5d31-b31c-46ebb0bdd0e5_c784e421-2f79-47e4-8b6d-2c29fa41aad0',\n       '4242d0f1-ffbe-5679-93e8-4314a71646a8_cc7e810b-79d2-4c9c-96e6-60a3c739952c',\n       '96b5d631-96e5-508e-880a-381a93402b23_bc5bb983-c18d-4c7c-9035-850b88974d59',\n       '85e606eb-2a57-5221-b5b4-401371acb39d_923a8462-8631-4096-bf70-1aa37f90341b',\n       '881f0e6f-186e-569a-9a7e-f38f779c9551_8e8b442a-da4e-42e1-9b6c-bd83619f8cb0',\n       '98f1a041-6f8a-5279-83b4-15a8217f4f1c_24423cba-455e-4dd1-a473-65f0e667397c'],\n      dtype='object', length=5789)] are in the [index]"
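
The `KeyError` above reports that none of the 5789 requested labels are present in the index being queried. A generic first step for narrowing down such a mismatch is to compare the requested labels with the available ones. The sketch below uses plain Python sets and works on any iterable of labels, including a `pandas` index. `compare_labels` is a hypothetical helper, and which `pylcaio` attributes hold the two label collections is not shown here and would have to be checked in the `pylcaio` source.

```python
def compare_labels(requested, available) -> dict:
    """Summarize how many requested labels exist among the available labels."""
    requested_set = set(requested)
    available_set = set(available)
    return {
        'n_requested': len(requested_set),
        'n_found': len(requested_set & available_set),
        # a few sorted examples of labels that could not be found
        'missing_examples': sorted(requested_set - available_set)[:5],
    }
```

If `n_found` is zero, as in the error above, the two collections probably come from different database versions or use different label formats, so inspecting a few examples from each side is a reasonable next step.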

In [20]:
lcaio_object.save_system(
    file_name = 'hybrid_iwp.pickle',
    file_path = path_dir_data_processed,
    format = 'pickle'
)

Database saved to /Users/michaelweinold/data/data_processed/hybrid_iwp.pickle
Description file saved to /Users/michaelweinold/data/data_processed/description_hybrid_iw.txt
