# Getting started with pyLCAIO

Welcome to the pyLCAIO tutorial, which shows how to hybridize an LCA database and an IO database using default values. If you run into problems, refer to the FAQ section at the end of this notebook. If it does not answer your question, raise an issue on GitHub.

PyLCAIO is written in Python, so you need Python installed on your machine. You can either download it from https://www.python.org/ or use https://www.anaconda.com/, which makes life easier.

Furthermore, running pyLCAIO __requires 12GB RAM__. Below this amount of RAM, there is a high chance that you will encounter a "MemoryError" while using pyLCAIO, for which there is currently no solution.

**Check the versions of the required modules**

To make sure that everything runs smoothly when you are using pyLCAIO, we recommend pinning the versions of the modules it relies on as follows (just run these commands in your command prompt):
* pip install pandas==0.23.4
* pip install numpy==1.13.1

If the system states that you cannot install these specific versions because you are not allowed to, enter these commands:
* pip install pandas==0.23.4 --user
* pip install numpy==1.13.1 --user

Working with later updates of these modules can result in bugs.
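
Before going further, it can help to confirm which versions are actually installed. The snippet below is a minimal sketch and not part of pyLCAIO: the helper names `installed_version` and `check_versions` are made up for this illustration, and it only assumes that each module exposes a `__version__` attribute, as pandas and numpy do.

```python
from importlib import import_module

# Versions recommended in this tutorial.
RECOMMENDED = {"pandas": "0.23.4", "numpy": "1.13.1"}


def installed_version(name):
    """Return the installed version string of a module, or None if it is missing."""
    try:
        module = import_module(name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


def check_versions(recommended, lookup=installed_version):
    """Map each module name to 'ok', 'mismatch' or 'missing' (hypothetical helper)."""
    status = {}
    for name, wanted in recommended.items():
        found = lookup(name)
        if found is None:
            status[name] = "missing"
        elif found == wanted:
            status[name] = "ok"
        else:
            status[name] = "mismatch"
    return status
```

Calling `check_versions(RECOMMENDED)` returns a dictionary with one entry per module. Any entry that is not `'ok'` points to a module worth reinstalling with the pip commands listed in this section.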

For this tutorial, ecoinvent3.5 and exiobase3 were chosen as our LCA and IO databases, respectively. Therefore, you need to download the ecoinvent 3.5_cutoff_ecoSpold02.7z and ecoinvent 3.5_LCIA_implementation.7z files from https://v35.ecoquery.ecoinvent.org/File/Files and __unzip them__. You also need to download a pxp (product by product) file for the reference year of your choice from the monetary versions of exiobase3, found here: https://www.exiobase.eu/index.php/data-download/exiobase3mon

In the format you just downloaded, these databases are not readable by Python (and therefore not by pyLCAIO) and must be converted to a Python-readable format. PyLCAIO relies on two external modules to do this: ecospold2matrix (https://github.com/majeau-bettez/ecospold2matrix) and pymrio (https://github.com/konstantinstadler/pymrio). You need to download these two modules.

For ecospold2matrix, you need to download the dev branch and not the master branch.

<img src="images/prtsc1.png">

### Create the dataframe of ecoinvent

To create the dataframe of ecoinvent, follow this other tutorial (https://github.com/majeau-bettez/ecospold2matrix/blob/master/doc/ecospold2matrix_demo.ipynb).

The code you are running with ecospold2matrix should look like this:

<img src="images/prtsc2.png">

Use your own paths to the different modules and databases, of course, and whatever project name you find suitable. Creating the dataframe takes about 30 minutes. Do not panic when red lines appear, it's normal!

After running ecospold2matrix, the "out_dir" you specified should contain a pickle file with a name similar to ecoinvent3.5.cutoffPandas_symmNorm.gz.pickle, which contains the transformed ecoinvent.

Note: pickle is a storage format, just like .docx or .pdf, except it's only readable by Python.
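
To make this note concrete, here is a small sketch of a gzip-compressed pickle round trip using only the standard library. The toy dictionary, the helper names `save_gz_pickle` and `load_gz_pickle`, and the file name `example.gz.pickle` are placeholders invented for the illustration; the real ecoinvent pickle is much larger.

```python
import gzip
import os
import pickle
import tempfile


def save_gz_pickle(obj, path):
    """Write a Python object to a gzip-compressed pickle file."""
    with gzip.open(path, "wb") as f:
        pickle.dump(obj, f)


def load_gz_pickle(path):
    """Read a Python object back from a gzip-compressed pickle file."""
    with gzip.open(path, "rb") as f:
        return pickle.load(f)


example = {"name": "toy database", "values": [1.0, 2.5, 4.0]}

# Write the object to a temporary folder, then read it back.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "example.gz.pickle")
    save_gz_pickle(example, path)
    restored = load_gz_pickle(path)

print(restored == example)  # True
```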

We now have ecoinvent in dataframes, but it is in a pickle format. We need to unpickle it:

In [None]:
import gzip
import pickle
import pandas as pd
# do not forget to change the path to your ecoinvent pickle
with gzip.open('my_path_to_the_ecoinvent_pickle','rb') as f:
    ecoinvent = pd.read_pickle(f)

Now the ecoinvent variable contains a dictionary (a collection) of dataframes for the different matrices ecoinvent provides (technosphere, biosphere, metadata, characterization).
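
A quick way to see what was loaded is to list the entries of the dictionary. The sketch below is illustrative only: `summarize` is a hypothetical helper, and the toy dictionary stands in for the real `ecoinvent` variable, whose actual keys depend on ecospold2matrix and are not listed here.

```python
def summarize(database):
    """Return (key, type name, shape) for every entry of a dictionary.

    Entries without a .shape attribute (anything that is not a dataframe
    or array) get None as their shape.
    """
    rows = []
    for key, value in database.items():
        rows.append((key, type(value).__name__, getattr(value, "shape", None)))
    return rows


class ToyTable:
    """Stand-in for a dataframe: it only carries a shape."""

    def __init__(self, shape):
        self.shape = shape


# Toy stand-in; with the real data, call summarize(ecoinvent) instead.
toy = {"first_matrix": ToyTable((3, 3)), "some_label": "text"}
for key, kind, shape in summarize(toy):
    print(key, kind, shape)
```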

### Create the dataframe of exiobase

This step is much simpler, since pymrio already includes a parser. You just need to run the parse_exiobase3 function as follows:

In [None]:
import sys
# remember to change these paths too!
sys.path.append('my_path_to_pymrio')
import pymrio
io = pymrio.parse_exiobase3('my_path_to_exiobase')

### Run pyLCAIO

We have now imported everything we need, except pylcaio itself.

In [None]:
import sys
# one last path to change!
sys.path.append('my_path_to_pylcaio/src/')
import pylcaio

PyLCAIO is divided into three classes: DatabaseLoader, LCAIO and Analysis.

* DatabaseLoader extracts all the information needed from the ecoinvent and exiobase dataframes we previously imported and modifies it where necessary. It also loads the appropriate product concordance, geography concordance, filter, STAM filter matrices and STAM categories.
* LCAIO performs the hybridization itself. Once everything has been extracted by DatabaseLoader, LCAIO takes in this data and uses it to create the hybrid database.
* Analysis performs analyses on the resulting hybrid databases, such as the calculation of life cycle emissions or contribution analyses.

In object-oriented programming, a class must be instantiated to create an object of that class (here we call it database_loader). To initialize DatabaseLoader, four arguments are needed: the unpickled dictionary of dataframes created through ecospold2matrix (which we called ecoinvent), the parsed exiobase object created with pymrio (which we called io), and the names and versions of the two databases to hybridize (here 'ecoinvent3.5' and 'exiobase3').

In [None]:
database_loader = pylcaio.DatabaseLoader(ecoinvent, io, 'ecoinvent3.5', 'exiobase3')

The object database_loader was successfully created. We can now run its ".combine_ecoinvent_exiobase()" method, whose output initializes a second object, lcaio_object, of the class LCAIO.

The operation takes 1 or 2 minutes.

In [None]:
lcaio_object = database_loader.combine_ecoinvent_exiobase()

The lcaio_object has just been initialized, and the ecoinvent and exiobase dataframes, along with other parameters, were transferred to it. We can now hybridize both databases with the method ".hybridize()". This method requires one argument: the name of the method used to correct double counting. There are currently two choices available: 'STAM' or 'binary'. For details on these methods and guidance on which one to use, refer to:
* Agez, Maxime, Guillaume Majeau-Bettez, Manuele Margni, Anders Hammer Strømman, and Réjean Samson. 2019. “Lifting the Veil on the Correction of Double Counting Incidents in Hybrid Life Cycle Assessment.” Journal of Industrial Ecology.
* Agez, Maxime, Richard Wood, Manuele Margni, Anders Hammer Strømman, Réjean Samson, and Guillaume Majeau-Bettez. 2019. “Hybridization of Complete LCA and MRIO Databases for a Comprehensive Product System Coverage.” Journal of Industrial Ecology.

In [None]:
lcaio_object.hybridize('STAM')

The operation should take around 15 minutes (we are inverting a 26000x26000 matrix here, so it takes a while).
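
To get a feel for why this step is slow and memory-hungry, the arithmetic below estimates the size of a single 26000x26000 matrix. It assumes dense storage in 64-bit floats (8 bytes per entry), which is an assumption made for this illustration and not something stated about pyLCAIO's internals. The function name `dense_matrix_bytes` is likewise hypothetical.

```python
def dense_matrix_bytes(n_rows, n_cols, bytes_per_entry=8):
    """Memory needed to store a dense matrix, in bytes.

    The default of 8 bytes per entry corresponds to 64-bit floats
    (an assumption, see the text above).
    """
    return n_rows * n_cols * bytes_per_entry


size = dense_matrix_bytes(26000, 26000)
print(size)                  # 5408000000
print(round(size / 1e9, 2))  # 5.41 (gigabytes)
```

Under that assumption, one such matrix takes about 5.4 GB, and an inversion needs room for at least the input and the result, which would be in line with the 12GB RAM requirement stated at the start of this notebook.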

We can now save the resulting hybrid database in a pickle so that the hybridization process does not have to be rerun every time. By default, the created pickle will appear in the /src/Databases/hybrid_databases/ folder of your pylcaio package.

In [None]:
lcaio_object.save_system('pickle')
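
The steps of this section can be gathered into a single function. This is a sketch and not an official entry point of pyLCAIO: `run_hybridization` is a hypothetical name, it only chains the calls shown in this section (`DatabaseLoader`, `combine_ecoinvent_exiobase`, `hybridize`, `save_system`), and it takes the imported pylcaio module as an argument so that the function itself does not depend on where pylcaio is installed.

```python
def run_hybridization(pylcaio_module, ecoinvent, io, method="STAM", save=True):
    """Chain the tutorial steps: load, combine, hybridize and optionally save.

    pylcaio_module is the module obtained with `import pylcaio`;
    ecoinvent and io are the objects created earlier in this tutorial.
    """
    # Only the two double counting correction methods named in this tutorial.
    if method not in ("STAM", "binary"):
        raise ValueError("method must be 'STAM' or 'binary', got %r" % (method,))
    loader = pylcaio_module.DatabaseLoader(ecoinvent, io, "ecoinvent3.5", "exiobase3")
    lcaio_object = loader.combine_ecoinvent_exiobase()
    lcaio_object.hybridize(method)
    if save:
        lcaio_object.save_system("pickle")
    return lcaio_object
```

With the variables of this tutorial, it would be called as `lcaio_object = run_hybridization(pylcaio, ecoinvent, io, 'STAM')`. Checking the method name first means a typo is reported immediately instead of after the database loading step.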

# FAQ

__*I just saved the hybridized system and everything went smoothly. What do I do now?*__

To perform analyses on the hybridized database use the *Analysis* class of pyLCAIO. To know how to use it refer to the following notebook: *Coming soon!*

To add processes and create your own system and hybridize it, refer to the following notebook: *Coming soon!*

__*Error ModuleNotFound*__

To install a missing module, use the pip install command. If the module is already installed but Python still says it cannot find it, Python is not looking in the right place, so we need to guide it. After importing the sys module, call sys.path.append() with the location of the module, in quotes, inside the parentheses.
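
As an illustration, the sketch below wraps this in a small function that also reports whether the module can be found afterwards. `add_module_folder` is a hypothetical helper written for this FAQ entry, not something provided by pyLCAIO.

```python
import importlib.util
import os
import sys


def add_module_folder(folder, module_name):
    """Add a folder to sys.path and report whether module_name is now importable."""
    folder = os.path.abspath(folder)
    if not os.path.isdir(folder):
        raise FileNotFoundError("No such folder: " + folder)
    if folder not in sys.path:
        sys.path.append(folder)
    # Make sure Python notices files added since the last import attempt.
    importlib.invalidate_caches()
    return importlib.util.find_spec(module_name) is not None
```

For example, `add_module_folder('my_path_to_pylcaio/src/', 'pylcaio')` returns True once the path is correct. A FileNotFoundError means the folder itself does not exist, which usually points to a typo in the path.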

__*Python cannot find pylcaio even though I entered the path in sys.path.append*__

Be sure to include src/ in your path to pylcaio.

__*I get an error in ecospold2matrix that says that it cannot find IntermediateExchanges.xml*__

You most probably forgot to unzip the ecospold file of ecoinvent.

__*I don't know what a path is and how to find it*__

A path is the address of a file or folder on your computer. To obtain the path of a file or folder, open its Properties and copy its location. For Windows users the path typically starts with C:/Users/ (forward slashes work in Python), for Mac users with /Users/, and for Linux users with /home/.
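
If you are unsure whether a path is correct, Python can check it for you. The sketch below uses the standard pathlib module; `describe_path` is a hypothetical helper name chosen for this example.

```python
from pathlib import Path


def describe_path(path_text):
    """Say whether a path points to a folder, a file, or nothing at all."""
    path = Path(path_text).expanduser()
    if path.is_dir():
        return "folder"
    if path.is_file():
        return "file"
    return "not found"


# The folder Python is currently running in always exists.
here = Path.cwd()
print(here)
print(describe_path(here))  # folder
```

Passing one of your own paths, for instance `describe_path('my_path_to_exiobase')`, tells you whether Python can see it before you hand it to ecospold2matrix, pymrio or pylcaio.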