# Structured Publication of Life Cycle Assessment Models
## Example using US LCI
### Brandon Kuczenski

Last update 11/16/2017

This document accompanies a paper under submission to the _Journal of Industrial Ecology_.  The notebook describes how to reproduce the "secondary aluminum" example described in the manuscript, and illustrates how a set of free tools by the author can be used to generate structured publications of any other product system in the US LCI database as well.

The example derived from Ecoinvent can also be easily reproduced; however, it requires an Ecoinvent license.  

To reproduce the example, you must clone three repositories from http://github.com to your local machine, all of which use Python 3.  Python 2 is not supported.  

To begin, ensure you have a working Python 3 environment: at a command line, enter `python --version`.  This code has been tested on Python 3.6.0:

    $ python --version
    Python 3.6.0
    
You also need a package installer; I use `pip`:

    $ pip --version
    pip 9.0.1 from /usr/lib/python3.6/site-packages (python 3.6)

(If your pip is out of date, try `pip install --upgrade pip`.)

After that, clone the following three repositories from GitHub:

    git clone git@github.com:bkuczenski/lca-tools.git
    git clone git@github.com:bkuczenski/lca-matrix.git
    git clone git@github.com:bkuczenski/lca-tools-datafiles.git
    
In `lca-tools` and `lca-matrix`, check out the `streamline` branch, which contains the most current work:

    cd lca-tools
    git checkout streamline
    cd ../lca-matrix
    git checkout streamline

Next, prepare your Python environment.  I recommend creating a new `virtualenv` for your work, but this is not required.  The command below assumes `virtualenvwrapper` is installed; adjust the interpreter path as appropriate for your environment:

    mkvirtualenv -p /usr/bin/python3 bk-JIE

Then install the following packages with your package installer:

 * `eight` -- for some early attempts at backward compatibility 
 * `xlrd` -- read Excel files
 * `xlwt` -- write Excel files
 * `lxml` -- handle XML
 * `pylzma` -- read 7zip files (commonly used by Ecoinvent; ~5x better compression than zip)
 * `scipy` -- for sparse matrices; may require other (non-Python) components
 * `matplotlib` -- for plotting data
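
Before launching the notebook, it can help to confirm that the interpreter and the packages listed above are actually visible.  The following is a minimal sketch, not part of the author's tools; it assumes each package is importable under the same name it is installed with, and the helper name `missing_packages` is made up for illustration.

```python
import importlib.util
import sys

# Package names are taken from the list above; each is assumed to be
# importable under the same name it is installed with.
REQUIRED = ['eight', 'xlrd', 'xlwt', 'lxml', 'pylzma', 'scipy', 'matplotlib']


def missing_packages(names):
    """Return the subset of `names` that cannot be found for import."""
    return [n for n in names if importlib.util.find_spec(n) is None]


if sys.version_info < (3,):
    print('Python 3 is required; found %s' % sys.version.split()[0])

missing = missing_packages(REQUIRED)
if missing:
    print('Missing packages: ' + ' '.join(missing))
else:
    print('All required packages were found.')
```

Anything reported as missing can then be installed with `pip install` followed by the package name.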

You should also ensure that you have `ipython` and `jupyter`, both of which can be installed with `pip` if they are not already on your system.  After that you are ready to go! 

    ipython
    

#### Environment config

In [7]:
import os
import sys
GITHUB_PATH = '/data/GitHub/'  # customize for your environment
PUBLISH_PATH = '/data/GitHub/2017/Publication-JIE/examples'
sys.path.append(os.path.join(GITHUB_PATH, 'lca-tools'))
sys.path.append(os.path.join(GITHUB_PATH, 'lca-matrix'))

import lcatools
import lcamatrix
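
If the imports above fail, the most likely cause is a `GITHUB_PATH` that does not match your system.  A small check along the following lines may help locate the problem; `missing_repos` is a hypothetical helper written for this note, and the path is the example value used above.

```python
import os


def missing_repos(base_path, repo_names):
    """List the repositories not found as directories under base_path."""
    return [name for name in repo_names
            if not os.path.isdir(os.path.join(base_path, name))]


# '/data/GitHub/' is the example path used above; substitute your own.
absent = missing_repos('/data/GitHub/',
                       ['lca-tools', 'lca-matrix', 'lca-tools-datafiles'])
if absent:
    print('Not found under GITHUB_PATH: ' + ', '.join(absent))
```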


#### Identify the data files

In [10]:
CATALOG = os.path.join(GITHUB_PATH, 'lca-tools-datafiles/catalogs')

USLCI = 'uslci_clean.json.gz'
EI_LCIA = 'ei_lcia.json.gz'

In [11]:
US = lcatools.archive_from_json(os.path.join(CATALOG, USLCI))

Loading JSON data from /data/GitHub/lca-tools-datafiles/catalogs/uslci_clean.json.gz:
Found Extension: zip
701 new process entities added (701 total)
4176 new flow entities added (4176 total)
20 new quantity entities added (20 total)
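
Judging by their file names, the catalog files are gzip-compressed JSON, so they can also be inspected with the standard library alone.  The sketch below assumes the top level of the file is a JSON object; the internal layout of the catalogs is not documented here, so the demonstration uses a throwaway file with made-up keys rather than guessing at the real ones.

```python
import gzip
import json
import os
import tempfile


def top_level_keys(path):
    """Return the sorted top-level keys of a gzip-compressed JSON object."""
    with gzip.open(path, 'rt', encoding='utf-8') as fp:
        data = json.load(fp)
    return sorted(data.keys())


# Demonstration on a throwaway file; the keys below are made up.
demo_path = os.path.join(tempfile.mkdtemp(), 'demo.json.gz')
with gzip.open(demo_path, 'wt', encoding='utf-8') as fp:
    json.dump({'flows': [], 'processes': []}, fp)
print(top_level_keys(demo_path))
```

Pointing `top_level_keys` at `os.path.join(CATALOG, USLCI)` would show how the real catalog is organized.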


For now, the technology matrix works only with single-output processes, so multi-output processes must be allocated first.  USLCI contains many multi-output processes, so here we apply allocations.  The allocation factors can be modified by editing the `uslci.py` file in the `lca-matrix` repository.

In [12]:
from lcamatrix.uslci import uslci_allocations, uslci_flow_characterizations
from lcamatrix.catalog import apply_allocation, apply_flow_config

In [13]:
apply_flow_config(US, uslci_flow_characterizations)
apply_allocation(US, uslci_allocations)
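
To make the idea of allocation concrete, here is a minimal sketch of how the burdens of a multi-output process can be partitioned among its co-products.  It is not the code in `lca-matrix`: the function `allocate`, the process, and the numbers are all hypothetical, and the sketch only splits the inventory by share without renormalizing to a unit of each co-product.

```python
def allocate(exchanges, allocation_factors):
    """Split a multi-output inventory into single-output inventories.

    exchanges: dict mapping flow name -> amount for the whole process
    allocation_factors: dict mapping co-product name -> share (sums to 1)
    Returns a dict mapping each co-product to its own scaled inventory.
    """
    total = sum(allocation_factors.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError('allocation factors must sum to 1, got %g' % total)
    return {product: {flow: amount * share
                      for flow, amount in exchanges.items()}
            for product, share in allocation_factors.items()}


# Hypothetical process with two co-products and illustrative numbers
inventory = {'electricity [MJ]': 10.0, 'carbon dioxide [kg]': 2.0}
shares = {'product A': 0.75, 'product B': 0.25}
allocated = allocate(inventory, shares)
print(allocated['product A'])
```

Each co-product then behaves like a single-output process whose inventory is its share of the original.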

### Prepare the Flow Database -- for LCIA

In [19]:
from lcatools.flowdb.flowdb import FlowDB
db = FlowDB()

The Flow Database stores collections of _flowables_, which are substances (mainly distinguished by CAS number), and _quantities_, which include both physical quantities and LCIA characterization quantities.  For a given quantity, each flowable can be characterized with respect to a particular _compartment_.  Compartment names are pre-harmonized across USLCI, Ecoinvent, ILCD, and GaBi.  

The flow database starts out empty and gets populated with data by the user.  Each flow is parsed into a flowable and a compartment, and then that flow's characterizations are added to the database.  A flow's identity is detected by comparing its CAS number, UUID, and name to a giant list of about 6,000,000 synonyms (drawn from ILCD and Ecoinvent) to map it to one of about 7,400 known flowables.  The synonym list needs some work, but it's serviceable.
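
The lookup described above can be pictured with a toy version.  This is a sketch under stated assumptions, not the `FlowDB` implementation: the class `TinyFlowDB` and its method names are invented for illustration, and matching is reduced to a case-insensitive dictionary lookup.

```python
class TinyFlowDB:
    """Toy flow database: synonyms -> flowable, then CF lookup."""

    def __init__(self):
        self._synonyms = {}   # lower-cased synonym -> canonical flowable
        self._cfs = {}        # (flowable, compartment, quantity) -> factor

    def add_flowable(self, canonical, *synonyms):
        for term in (canonical,) + synonyms:
            self._synonyms[term.strip().lower()] = canonical

    def find_flowable(self, *terms):
        """Return the first flowable matched by any of the given terms."""
        for term in terms:
            hit = self._synonyms.get(term.strip().lower())
            if hit is not None:
                return hit
        return None

    def add_cf(self, flowable, compartment, quantity, value):
        self._cfs[(flowable, compartment, quantity)] = value

    def lookup_cf(self, terms, compartment, quantity):
        flowable = self.find_flowable(*terms)
        if flowable is None:
            return None
        return self._cfs.get((flowable, compartment, quantity))


toy = TinyFlowDB()
toy.add_flowable('carbon dioxide', '124-38-9', 'CO2')
toy.add_cf('carbon dioxide', 'air', 'global warming', 1.0)
print(toy.lookup_cf(['CO2'], 'air', 'global warming'))
```

A flow is identified by trying each of its identifiers (name, CAS number, and so on) in turn; a flow whose identifiers match nothing returns `None`, which corresponds to the "not found among the synonyms" case.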

In [22]:
EL = lcatools.archive_from_json(os.path.join(CATALOG, EI_LCIA))

Loading JSON data from /data/GitHub/lca-tools-datafiles/catalogs/ei_lcia.json.gz:
**Upstream reference encountered: /data/LCI/Ecoinvent/3.2/cutoff

0 new process entities added (0 total)
3255 new flow entities added (3255 total)
708 new quantity entities added (708 total)


The `import_archive_cfs` function returns a list of flows that were not found among the synonyms; for now we are ignoring them.

In [23]:
_ = db.import_archive_cfs(EL)

At this point the flow database has been populated with the characterization factors for the roughly 700 LCIA methods (708 quantity entities were loaded) that are part of Ecoinvent's LCIA implementation.

### Prepare the background LCI Database

In [14]:
from lcamatrix.background import BackgroundManager

The background manager performs a partial ordering of the database, identifying foreground, background, and cutoff flows.

In [15]:
B = BackgroundManager(US)
B.add_all_ref_products()

self-dependency detected! Paper, freesheet, uncoated, average production, at mill, 2006 [RNA]
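
The partition into foreground, background, and cutoff can be illustrated on a small dependency graph.  The sketch below is one plausible reading, not the `BackgroundManager` algorithm: it assumes that a process belongs to the background if it depends on itself (directly or through a cycle) or is needed by such a process, that an input with no providing process is a cutoff, and that everything else is foreground.  It uses a simple reachability search; a strongly-connected-components algorithm such as Tarjan's would scale better.  All process names are made up.

```python
def reachable(requirements, start):
    """Return every node reachable from `start` in one or more steps."""
    seen = set()
    stack = list(requirements.get(start, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(requirements.get(node, []))
    return seen


def partition(requirements, reference):
    """Classify everything needed by `reference`.

    requirements maps each process to the list of inputs it needs.
    Returns (foreground, background, cutoff) as sets.
    """
    nodes = {reference} | reachable(requirements, reference)
    cutoff = {n for n in nodes if n not in requirements}
    background = set()
    for n in nodes - cutoff:
        downstream = reachable(requirements, n)
        if n in downstream:  # n depends on itself
            background |= ({n} | downstream) - cutoff
    foreground = nodes - cutoff - background
    return foreground, background, cutoff


# Hypothetical network: electricity and coal require each other
requirements = {
    'ingot': ['scrap prep', 'electricity'],
    'scrap prep': ['electricity', 'scrap'],
    'electricity': ['coal'],
    'coal': ['electricity', 'diesel'],
    'diesel': ['crude oil'],
}
foreground, background, cutoff = partition(requirements, 'ingot')
print(sorted(foreground), sorted(background), sorted(cutoff))
```

In this toy network the ingot and scrap preparation steps form the foreground, the energy processes form the background, and the two inputs without providers are cutoffs.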


Outputs are product flows in the foreground that are not required by any other process in the database.

In [17]:
outputs = [k for k in B.product_flows()]
len(outputs) == 395

True
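
The notion of an output used here, a product that no other process requires, amounts to a set difference.  A minimal sketch with a made-up network follows; `terminal_products` is a hypothetical helper, not part of `lcamatrix`.

```python
def terminal_products(requirements):
    """Return processes whose product is not an input to any other process."""
    consumed = {inp for proc, inputs in requirements.items()
                for inp in inputs if inp != proc}
    return sorted(p for p in requirements if p not in consumed)


# Hypothetical network mapping each process to the inputs it needs
network = {
    'ingot': ['scrap prep', 'electricity'],
    'scrap prep': ['electricity'],
    'electricity': ['coal'],
    'coal': ['electricity'],
}
print(terminal_products(network))
```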

Here I identify the product systems that contain more than one foreground node.

In [18]:
product_systems = [h for h in outputs if len(B.foreground(h)) > 1]
len(product_systems) == 103

True

### Pick our product system

In [54]:
al = [p for p in B.product_flows('aluminum')]
for i, ps in enumerate(al):
    print('%2d:  %s' % (i, ps.process))

 0:  Aluminum, secondary, rolled [RNA]
 1:  Aluminum, secondary, shape casted [RNA]
 2:  Semi-permanent mold (SPM) casting, aluminum [RNA]
 3:  Aluminum, secondary, ingot, at plant, 1998 [RNA]
 4:  Aluminum, extrusion, at plant [RNA]
 5:  Precision sand casting, aluminum [RNA]
 6:  Aluminum, secondary, ingot, from automotive scrap, at plant [RNA]
 7:  Aluminum, hot rolling, at plant [RNA]
 8:  Aluminum, secondary, extruded [RNA]
 9:  Aluminum, secondary, ingot, from beverage cans, at plant [RNA]
10:  Lost foam casting, aluminum [RNA]
11:  Aluminum, primary, ingot, at plant, 1998 [RNA]
12:  Aluminum, cold rolling, at plant [RNA]


In [28]:
my_ps = al[6]  # this is the product system used in the JIE paper as an example

#### Inspect our product system.

In [29]:
from lcamatrix.foreground import ForegroundFragment
from lcamatrix.foreground_publication import ForegroundPublication
from lcamatrix.foreground_table import ForegroundTeX

In [30]:
frag = ForegroundFragment(B, db, my_ps)

Fragment with 4 foreground flows
 Ad: 39x4, 17 nonzero
 Bf: 3427x4, 26 nonzero


In [31]:
frag.Af

<4x4 sparse matrix of type '<class 'numpy.float64'>'
	with 3 stored elements in Compressed Sparse Row format>

In [32]:
frag.Af.todense()

matrix([[  0.00000000e+00,   0.00000000e+00,   0.00000000e+00,
           0.00000000e+00],
        [  1.03200000e+00,   0.00000000e+00,   0.00000000e+00,
           0.00000000e+00],
        [  2.35000000e-05,   0.00000000e+00,   0.00000000e+00,
           0.00000000e+00],
        [  0.00000000e+00,   0.00000000e+00,   1.87000000e+00,
           0.00000000e+00]])
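
One way to read this matrix is as a list of direct requirements: assuming that entry (i, j) is the amount of foreground node i needed per unit of node j (an interpretation not stated explicitly here), the total activity of each node per unit of the reference product is the series x = e + Af e + Af² e + ...  The sketch below evaluates that series with plain lists to stay dependency-free; the helper names are made up, and the values are copied from the output above.

```python
# Nonzero entries of Af as (row, column, value), copied from the output above
AF_ENTRIES = [(1, 0, 1.032), (2, 0, 2.35e-05), (3, 2, 1.87)]
N = 4


def mat_vec(entries, vec):
    """Multiply a sparse matrix, given as triples, by a dense vector."""
    out = [0.0] * len(vec)
    for row, col, value in entries:
        out[row] += value * vec[col]
    return out


def total_activity(entries, n, reference=0):
    """Sum the series x = e + A e + A^2 e + ... for a unit of `reference`.

    The series terminates because the foreground has no cycles.
    """
    term = [0.0] * n
    term[reference] = 1.0
    total = list(term)
    for _ in range(n):
        term = mat_vec(entries, term)
        if not any(term):
            break
        total = [t + s for t, s in zip(total, term)]
    return total


x = total_activity(AF_ENTRIES, N)
print(x)
```

Under this reading, node 3 is reached only through node 2, so its activity is the product of the two coefficients along that path.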

The `DisplayFragment` class uses `pandas` to print the foreground data in nice tables; take a look at it if you are interested.  For now, let's just generate our outputs.

#### Characterize the product system with respect to Ecoinvent TRACI

In [33]:
qs_traci = EL.search(entity_type='quantity', Method='traci')
[str(q) for q in qs_traci]

['TRACI, human health, carcinogenics [kg benzene-Eq] [LCIA]',
 'TRACI, environmental impact, ecotoxicity [kg 2,4-D-Eq] [LCIA]',
 'TRACI, environmental impact, ozone depletion [kg CFC-11-Eq] [LCIA]',
 'TRACI, environmental impact, acidification [moles of H+-Eq] [LCIA]',
 'TRACI, environmental impact, global warming [kg CO2-Eq] [LCIA]',
 'TRACI, human health, respiratory effects, average [kg PM2.5-Eq] [LCIA]',
 'TRACI, environmental impact, photochemical oxidation [kg NOx-Eq] [LCIA]',
 'TRACI, environmental impact, eutrophication [kg N] [LCIA]',
 'TRACI, human health, non-carcinogenics [kg toluene-Eq] [LCIA]']

Because of the way the Flow Database works, it is possible to make approximate matches for flows whose characteristics don't correspond exactly.  If there are multiple possible characterizations, the user must specify which ones to use.  

The user gets prompted to make this decision when the product system is characterized.  In this case, the user must decide whether the flow "glyphosate [soil, unspecified]" should be characterized as an emission to agricultural or industrial soil.
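
The situation can be sketched as follows.  This is an illustration only, not how `FlowDB` stores compartments: it assumes compartments are tuples ordered from general to specific, and the names `candidate_cfs` and `resolve` are invented.  The two factors are the glyphosate carcinogenics values that appear in the prompt output of this notebook.

```python
def candidate_cfs(cf_table, flowable, compartment):
    """Return the characterization factors that could apply to a flow.

    cf_table maps (flowable, compartment) -> factor, where a compartment
    is a tuple such as ('soil', 'agricultural').  A flow reported for a
    less specific compartment, e.g. ('soil',), matches every entry whose
    compartment begins with it.
    """
    return {comp: cf for (name, comp), cf in cf_table.items()
            if name == flowable and comp[:len(compartment)] == compartment}


def resolve(candidates, choose):
    """Return the single CF, or defer to `choose` when several match."""
    if not candidates:
        return None
    if len(candidates) == 1:
        return next(iter(candidates.values()))
    return candidates[choose(sorted(candidates))]


cfs = {
    ('glyphosate', ('soil', 'agricultural')): 1.0558e-06,
    ('glyphosate', ('soil', 'industrial')): 7.8959e-06,
}
matches = candidate_cfs(cfs, 'glyphosate', ('soil',))
# Stand-in for the interactive prompt: always take the first option
print(resolve(matches, lambda options: options[0]))
```

In the notebook the `choose` step is an interactive prompt; here a function that picks the first option stands in for the user.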



In [35]:
for q in qs_traci:
    frag.characterize(q)

Multiple CFs found: [1.0558e-06, 7.8959e-06]
Flow: Glyphosate [unspecified] [kg]
Quantity: TRACI, human health, carcinogenics [kg benzene-Eq] [LCIA]
Pick characterization to apply

Select item: 

Choice Item
 [0]      7.9e-06 [GLO] [kg benzene-Eq] Glyphosate (CAS 001071-83-6) [industrial]
 [1]     1.06e-06 [GLO] [kg benzene-Eq] Glyphosate (CAS 001071-83-6) [agricultural]
------ ----------------------------------------------------------------------
Enter choice (or "None"): 0
Multiple CFs found: [0.27015, 0.24539]
Flow: Glyphosate [unspecified] [kg]
Quantity: TRACI, environmental impact, ecotoxicity [kg 2,4-D-Eq] [LCIA]
Pick characterization to apply

Select item: 

Choice Item
 [0]         0.27 [GLO] [kg 2,4-D-Eq] Glyphosate (CAS 001071-83-6) [agricultural]
 [1]        0.245 [GLO] [kg 2,4-D-Eq] Glyphosate (CAS 001071-83-6) [industrial]
------ ----------------------------------------------------------------------
Enter choice (or "None"): 1
Multiple CFs found: [0.0087974, 0.033012]
Flow

In [36]:
frag.fg_lcia()  # foreground LCIA results

matrix([[  1.25787600e-05],
        [  3.10154400e-07],
        [  0.00000000e+00],
        [  1.79034750e-07],
        [  1.80480000e-05],
        [  8.49912750e-10],
        [  0.00000000e+00],
        [  4.15675000e-08],
        [  4.69519200e-01]])

In [38]:
frag.bg_lcia().todense()  # background LCIA results

matrix([[  4.20592170e-04],
        [  2.71670005e-02],
        [  5.57925193e-12],
        [  2.51700469e-01],
        [  1.07362785e+00],
        [  8.77619877e-04],
        [  2.56952813e-03],
        [  9.13285980e-05],
        [  1.06014598e+00]])

In [42]:
res = frag.lcia_results()
res.show()  # more comprehensive results

completed 20 iterations
LCIA Results
Aluminum, secondary, ingot, from automotive scrap, at plant [RNA]:==Aluminum, secondary, ingot, from automotive scrap, at plant [Other Aluminum Rolling and Drawing]
------------------------------------------------------------
[ 0] 025  0.00043317 TRACI, human health, carcinogenics [kg benzene-Eq] [LCIA]
[ 1] 118    0.027167 TRACI, environmental impact, ecotoxicity [kg 2,4-D-Eq] [LCIA]
[ 2] 1cb  5.5793e-12 TRACI, environmental impact, ozone depletion [kg CFC-11-Eq] [LCIA]
[ 3] 44f      0.2517 TRACI, environmental impact, acidification [moles of H+-Eq] [LCIA]
[ 4] 5f9      1.0736 TRACI, environmental impact, global warming [kg CO2-Eq] [LCIA]
[ 5] 780  0.00087762 TRACI, human health, respiratory effects, average [kg PM2.5-Eq] [LCIA]
[ 6] 9b8   0.0025695 TRACI, environmental impact, photochemical oxidation [kg NOx-Eq] [LCIA]
[ 7] c58   9.137e-05 TRACI, environmental impact, eutrophication [kg N] [LCIA]
[ 8] d93      1.5297 TRACI, human health, non-carci
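
The totals reported here appear to be the sum of the foreground and background results printed earlier.  The following check, with all numbers copied from the outputs in this notebook, confirms that the two vectors add up to the displayed totals within rounding.

```python
import math

# Values copied from the fg_lcia() and bg_lcia() outputs above
foreground = [1.25787600e-05, 3.10154400e-07, 0.0, 1.79034750e-07,
              1.80480000e-05, 8.49912750e-10, 0.0, 4.15675000e-08,
              4.69519200e-01]
background = [4.20592170e-04, 2.71670005e-02, 5.57925193e-12,
              2.51700469e-01, 1.07362785e+00, 8.77619877e-04,
              2.56952813e-03, 9.13285980e-05, 1.06014598e+00]
# Totals as printed by res.show(), rounded to five significant figures
reported = [0.00043317, 0.027167, 5.5793e-12, 0.2517, 1.0736,
            0.00087762, 0.0025695, 9.137e-05, 1.5297]

totals = [f + b for f, b in zip(foreground, background)]
for total, shown in zip(totals, reported):
    print('%12.5g %12.5g %s' % (total, shown,
                                math.isclose(total, shown, rel_tol=1e-3)))
```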

In [43]:
res['44f'].show_details()  # select an LCIA method by the leading portion of its UUID 

TRACI, environmental impact, acidification [moles of H+-Eq] [LCIA] moles of H+-Eq
------------------------------------------------------------
   0.00307 x       50.8 =      0.156 [GLO] Sulfur dioxide  [unspecified]
     0.002 x         40 =       0.08 [GLO] Nitrogen oxides  [unspecified]
  0.000186 x       50.8 =    0.00945 [GLO] Sulfur oxides [unspecified]
   0.00011 x       44.7 =    0.00491 [GLO] Hydrogen Chloride [unspecified]
  1.37e-05 x       81.3 =    0.00111 [GLO] Hydrogen fluoride [unspecified]
  8.83e-07 x       95.5 =   8.43e-05 [GLO] Ammonia  [unspecified]
             Total score: 0.251701 
     0.252 TRACI, environmental impact, acidification [moles of H+-Eq] [LCIA]
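
Each row of this table is an inventory amount multiplied by a characterization factor, and the score is the sum of the rows.  The arithmetic can be repeated from the displayed values; because those values are rounded to three significant figures, the sum only approximately reproduces the reported total of 0.251701.

```python
# (inventory amount, characterization factor, flow) copied from the table
rows = [
    (0.00307, 50.8, 'Sulfur dioxide'),
    (0.002, 40.0, 'Nitrogen oxides'),
    (0.000186, 50.8, 'Sulfur oxides'),
    (0.00011, 44.7, 'Hydrogen Chloride'),
    (1.37e-05, 81.3, 'Hydrogen fluoride'),
    (8.83e-07, 95.5, 'Ammonia'),
]

score = sum(amount * factor for amount, factor, _ in rows)
for amount, factor, flow in rows:
    print('%10.3g x %6.1f = %10.3g  %s' % (amount, factor,
                                          amount * factor, flow))
print('Total from rounded values: %.4f' % score)
```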


At this time, the background manager does not automatically perform contribution analysis over the fragment, but that is easy to do from the publication.

### Publish our foreground model

In [47]:
# TeX table
t = ForegroundTeX(frag)
with open(os.path.join(PUBLISH_PATH, 'aluminum-secondary_doco.tex'), 'w') as fp:
    fp.write(t.foreground_table(aggregate=True))
    

In [49]:
# XLS spreadsheet
pub = ForegroundPublication(frag, audit_cf=False)  # enter audit_cf=True to see all characterization factors
pub.publish(os.path.join(PUBLISH_PATH, 'aluminum-secondary_doco.xls'), 
            full=False)  # full=True for non-sparse matrices

completed 20 iterations
AD01
completed 20 iterations
AD04
completed 21 iterations
AD13
completed 20 iterations
AD18
completed 19 iterations
AD20
completed 20 iterations
AD24
completed 20 iterations
AD30
completed 20 iterations
AD33
completed 20 iterations
AD34
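
The `full` flag above chooses between sparse and non-sparse matrices.  The layout of the published spreadsheet is not shown here, but the general difference can be illustrated with the 4x4 `Af` matrix from earlier: a sparse representation stores only the nonzero entries as (row, column, value) triples.  The helper names below are made up for illustration.

```python
def to_triples(dense):
    """Convert a dense matrix (list of rows) to (row, col, value) triples."""
    return [(i, j, v) for i, row in enumerate(dense)
            for j, v in enumerate(row) if v != 0]


def to_dense(triples, n_rows, n_cols):
    """Rebuild the dense matrix from its triples."""
    dense = [[0.0] * n_cols for _ in range(n_rows)]
    for i, j, v in triples:
        dense[i][j] = v
    return dense


# Af as printed earlier in this notebook
AF = [[0.0, 0.0, 0.0, 0.0],
      [1.032, 0.0, 0.0, 0.0],
      [2.35e-05, 0.0, 0.0, 0.0],
      [0.0, 0.0, 1.87, 0.0]]
triples = to_triples(AF)
print(len(triples), 'stored values instead of', len(AF) * len(AF[0]))
```

The saving grows with matrix size: the `Bf` matrix reported above is 3427x4 with only 26 nonzero entries.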
