# Import Brightway libraries

In [1]:
import bw2io as bi
import bw2data as bd
import bw2calc as bc
from functools import partial

# Import ecoinvent 3.6

In [2]:
bd.projects.set_current("ecoinvent 3.6")

In [3]:
bi.bw2setup()

Creating default biosphere



Writing activities to SQLite3 database:


Applying strategy: normalize_units
Applying strategy: drop_unspecified_subcategories
Applied 2 strategies in 0.01 seconds


0% [##############################] 100% | ETA: 00:00:00
Total time elapsed: 00:00:00


Title: Writing activities to SQLite3 database:
  Started: 02/25/2021 12:20:41
  Finished: 02/25/2021 12:20:42
  Total time elapsed: 00:00:00
  CPU %: 56.00
  Memory %: 0.98
Created database: biosphere3
Creating default LCIA methods

Applying strategy: normalize_units
Applying strategy: set_biosphere_type
Applying strategy: drop_unspecified_subcategories
Applying strategy: link_iterable_by_fields
Applied 4 strategies in 1.36 seconds
Wrote 850 LCIA methods with 219059 characterization factors
Creating core data migrations



In [6]:
ei = bi.SingleOutputEcospold2Importer("/Users/cmutel/Documents/LCA/Ecoinvent/3.6/cutoff/datasets/", "ecoinvent 3.6")
ei.apply_strategies()

Extracting XML data from 18121 datasets
Extracted 18121 datasets in 138.26 seconds
Applying strategy: normalize_units
Applying strategy: update_ecoinvent_locations
Applying strategy: remove_zero_amount_coproducts
Applying strategy: remove_zero_amount_inputs_with_no_activity
Applying strategy: remove_unnamed_parameters
Applying strategy: es2_assign_only_product_with_amount_as_reference_product
Applying strategy: assign_single_product_as_activity
Applying strategy: create_composite_code
Applying strategy: drop_unspecified_subcategories
Applying strategy: fix_ecoinvent_flows_pre35
Applying strategy: drop_temporary_outdated_biosphere_flows
Applying strategy: link_biosphere_by_flow_uuid
Applying strategy: link_internal_technosphere_by_composite_code
Applying strategy: delete_exchanges_missing_activity
Applying strategy: delete_ghost_exchanges
Applying strategy: remove_uncertainty_from_negative_loss_exchanges
Applying strategy: fix_unreasonably_high_lognormal_uncertainties
Applying strategy:

In [7]:
ei.statistics()

18121 datasets
615644 exchanges
0 unlinked exchanges
  


(18121, 615644, 0)

In [8]:
ei.write_database()

Writing activities to SQLite3 database:
0% [##############################] 100% | ETA: 00:00:00
Total time elapsed: 00:01:09


Title: Writing activities to SQLite3 database:
  Started: 02/25/2021 12:26:12
  Finished: 02/25/2021 12:27:21
  Total time elapsed: 00:01:09
  CPU %: 80.00
  Memory %: 19.24
Created database: ecoinvent 3.6


Brightway2 SQLiteBackend: ecoinvent 3.6

In [9]:
bd.databases

Databases dictionary with 2 object(s):
	biosphere3
	ecoinvent 3.6

# Provide method metadata

In [10]:
DESCRIPTION=("A midpoint-level mineral resource impact assessment method matching this perspective is proposed, "
             "called the crustal scarcity indicator (CSI), with characterization factors called crustal scarcity "
             "potentials (CSPs) measured as kg silicon equivalents per kg element. They are based on crustal "
             "concentrations, which have been suggested to correlate with several important resource metrics "
             "(reserves, reserve base, reserves plus cumulative production, and ore deposits), thereby constituting "
             "proxies for long-term global elemental scarcity.")

In [11]:
ei = bi.ExcelLCIAImporter(
    filepath="CSI_CF.xlsx", 
    name=("Crustal Scarcity Index", "1.0"), 
    description=DESCRIPTION, 
    unit="kilogram Silicon equivalent",
    # Can add any additional metadata
    article_url="https://link.springer.com/article/10.1007%2Fs11367-020-01781-1",
    doi="https://doi.org/10.1007/s11367-020-01781-1",
    authors="Rickard Arvidsson, Maria Ljunggren Söderman, Björn A. Sandén, Anders Nordelöf, Hampus André, Anne-Marie Tillman",
)

# Data cleanup

## Step 1: Remove empty lines

Because of the way Excel works, rows that you have touched but that are still empty can be considered part of the worksheet. We remove these.

In [12]:
[cf for cf in ei.data[0]['exchanges'] if not cf['Flow']]

[{'Type': None,
  'Flow': None,
  'Category': None,
  'Sub category': None,
  'Flow property': None,
  'Chracterization Factor': None,
  'Unit': None,
  None: None},
 {'Type': None,
  'Flow': None,
  'Category': None,
  'Sub category': None,
  'Flow property': None,
  'Chracterization Factor': None,
  'Unit': None,
  None: None}]

In [13]:
def drop_empty_lines(data):
    for method in data:
        method['exchanges'] = [obj for obj in method['exchanges'] if obj['Flow']]
    return data    

In [14]:
ei.apply_strategy(drop_empty_lines)

Applying strategy: drop_empty_lines


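## Step 2: Rename `Flow` to `name` and `Chracterization Factor` to `amount`

We need to match the `name` in the CF to the `name` in the biosphere database, and Brightway expects the CF value in the `amount` field. (The column header in the CSI spreadsheet really is spelled `Chracterization Factor`, so we keep that spelling when reading it.)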

In [15]:
def rename_flow(data):
    for method in data:
        for cf in method['exchanges']:
            cf['name'] = cf.pop('Flow')
            cf['amount'] = cf.pop('Chracterization Factor')
    return data

In [16]:
ei.apply_strategy(rename_flow)

Applying strategy: rename_flow


## Step 3: Change category to ecoinvent standard

CSI input data uses `Resource`, but ecoinvent data uses `natural resource`.

In [17]:
def rename_resource_category(data):
    for method in data:
        for cf in method['exchanges']:
            if cf['Category'] == 'Resource':
                cf['Category'] = 'natural resource'
    return data

In [18]:
ei.apply_strategy(rename_resource_category)

Applying strategy: rename_resource_category


## Step 4: Create tuple of `categories`

Brightway supports a [tuple](https://en.wikipedia.org/wiki/Tuple) of categories of arbitrary length, while CSI uses categories of length 1 or 2, stored in the fields `Category` and `Sub category`. For length 1, `Sub category` is set to the dummy value `unspecified`. We convert these two fields to a single tuple of uniform style, without dummy values.

In [19]:
def create_categories_as_tuple(data):
    for method in data:
        for cf in method['exchanges']:
            if cf['Sub category'] == 'unspecified':
                cf['categories'] = (cf['Category'],)
            else:
                cf['categories'] = (cf['Category'], cf['Sub category'])
    return data

In [20]:
ei.apply_strategy(create_categories_as_tuple)

Applying strategy: create_categories_as_tuple


# Run the normal strategies

This includes linking against the ecoinvent biosphere database.
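Conceptually, this linking step looks up each CF by a tuple of field values and, on an exact match, stores the biosphere flow's key in the CF's `input` field. The function below is a simplified sketch of that idea, not bw2io's actual `link_iterable_by_fields`; the flow and the key `("biosphere3", "a1")` are made-up toy data. Because matching is exact, a CF with categories `('natural resource',)` won't link to a flow filed under `('natural resource', 'in ground')`.

```python
def _as_key(obj, fields):
    # Lists (e.g. categories) are converted to tuples so the key is hashable
    return tuple(tuple(obj[f]) if isinstance(obj[f], list) else obj[f] for f in fields)


def link_by_fields(cfs, biosphere_flows, fields=("name", "categories")):
    """Simplified sketch: give each unlinked CF the key of the biosphere flow
    whose `fields` match exactly. Not bw2io's actual implementation."""
    index = {_as_key(flow, fields): flow["key"] for flow in biosphere_flows}
    for cf in cfs:
        if "input" in cf:  # already linked
            continue
        match = index.get(_as_key(cf, fields))
        if match is not None:
            cf["input"] = match
    return cfs


# Toy data: one biosphere flow, two CFs that differ only in their categories
biosphere_flows = [
    {"name": "Gold", "categories": ("natural resource", "in ground"), "key": ("biosphere3", "a1")},
]
cfs = [
    {"name": "Gold", "categories": ("natural resource", "in ground"), "amount": 1.0},
    {"name": "Gold", "categories": ("natural resource",), "amount": 1.0},
]
link_by_fields(cfs, biosphere_flows)
print([cf.get("input") for cf in cfs])
# [('biosphere3', 'a1'), None]
```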

In [21]:
ei.apply_strategies()

Applying strategy: csv_restore_tuples
Applying strategy: csv_numerize
Applying strategy: csv_drop_unknown
Applying strategy: set_biosphere_type
Applying strategy: drop_unspecified_subcategories
Applying strategy: link_iterable_by_fields
Applying strategy: drop_falsey_uncertainty_fields_but_keep_zeros
Applying strategy: convert_uncertainty_types_to_integers
Applied 8 strategies in 0.23 seconds


Check our results.

In [22]:
ei.statistics()

1 methods
597 cfs
405 unlinked cfs


(1, 597, 405)

Not great: 405 of the 597 CFs are unlinked. As with all IO operations, we now enter a period of iterative improvement...
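To see where the unlinked CFs come from, it can help to tally them by `categories`. This hypothetical helper assumes, as the strategies in this notebook do, that a CF counts as linked once it has an `input` value; on the real importer it could be called as `unlinked_by_category(ei.data).most_common(10)`.

```python
from collections import Counter


def unlinked_by_category(methods):
    """Count unlinked CFs per `categories` tuple across all methods.

    Assumes a CF is linked once it has a truthy `input` value.
    """
    counts = Counter()
    for method in methods:
        for cf in method["exchanges"]:
            if not cf.get("input"):
                counts[tuple(cf["categories"])] += 1
    return counts


# Toy data shaped like `ei.data`
methods = [{"exchanges": [
    {"name": "A", "categories": ("natural resource",)},
    {"name": "B", "categories": ("natural resource",)},
    {"name": "C", "categories": ("natural resource", "in ground"), "input": ("biosphere3", "x")},
]}]
print(unlinked_by_category(methods).most_common())
# [(('natural resource',), 2)]
```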

# Iterative improvement

## Step 1: Name discrepancies

First, we notice some name discrepancies. CSI provides reasonable names:

In [23]:
[x for x in ei.data[0]['exchanges'] if x['name'] == 'Copper, 0.59% in sulfide, Cu 0.22% and Mo 8.2E-3% in crude ore']

[{'Type': 'A',
  'Category': 'natural resource',
  'Sub category': 'unspecified',
  'Flow property': 'Mass',
  'Unit': 'kg Si-Eq/kg',
  None: None,
  'name': 'Copper, 0.59% in sulfide, Cu 0.22% and Mo 8.2E-3% in crude ore',
  'amount': 10000,
  'categories': ('natural resource',),
  'type': 'biosphere'}]

But ecoinvent does something weird:

In [24]:
[x 
 for x in bd.Database(bd.config.biosphere) 
 if 'Copper, 0.59% in sulfide, Cu 0.22% and Mo 8.2E-3% in crude ore' in x['name'] 
 and x['categories'][0] == 'natural resource']

['Copper, 0.59% in sulfide, Cu 0.22% and Mo 8.2E-3% in crude ore, in ground' (kilogram, None, ('natural resource', 'in ground'))]

It is subtle, but for some reason ecoinvent adds the suffix **`, in ground`** to the name, even though this is already stated by the category `in ground`.

We also notice that the `categories` don't match: CSI gives this flow the categories `('natural resource',)`, but ecoinvent only has it under `('natural resource', 'in ground')`.

We will need two strategies to fix these two issues.

In [25]:
def add_in_ground_if_helpful(data):
    biosphere = {o['name'] 
                 for o in bd.Database(bd.config.biosphere) 
                 if o['categories'][0] == 'natural resource'}
    
    for method in data:
        for cf in method['exchanges']:
            if not cf.get('input'):  # Skip already linked CFs
                if cf['name'] not in biosphere and cf['name'] + ", in ground" in biosphere:
                    cf['name'] += ", in ground"
    
    return data

In [26]:
ei.apply_strategy(add_in_ground_if_helpful)

Applying strategy: add_in_ground_if_helpful


In [27]:
def change_categories_if_helpful(data):
    biosphere = {(o['name'], tuple(o['categories'])) for o in bd.Database(bd.config.biosphere)}
    
    for method in data:
        for cf in method['exchanges']:
            if cf.get('input'):  # Skip already linked CFs
                continue
            if (
                    (cf['name'], cf['categories']) not in biosphere 
                    and (cf['name'], cf['categories'] + ("in ground",)) in biosphere
                ):
                cf['categories'] = cf['categories'] + ("in ground",)
    
    return data    

In [28]:
ei.apply_strategy(change_categories_if_helpful)

Applying strategy: change_categories_if_helpful
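The two strategies above read the biosphere database inside the function, which makes them hard to try out on their own. Below is a sketch of the same logic with the biosphere flows passed in as an argument (the function names are hypothetical), run on a toy biosphere containing only the copper flow shown earlier.

```python
def add_in_ground_suffix(data, biosphere_flows):
    """Same logic as add_in_ground_if_helpful, with the flows passed in."""
    names = {f["name"] for f in biosphere_flows if f["categories"][0] == "natural resource"}
    for method in data:
        for cf in method["exchanges"]:
            if cf.get("input"):  # skip already linked CFs
                continue
            if cf["name"] not in names and cf["name"] + ", in ground" in names:
                cf["name"] += ", in ground"
    return data


def add_in_ground_category(data, biosphere_flows):
    """Same logic as change_categories_if_helpful, with the flows passed in."""
    pairs = {(f["name"], tuple(f["categories"])) for f in biosphere_flows}
    for method in data:
        for cf in method["exchanges"]:
            if cf.get("input"):  # skip already linked CFs
                continue
            current = tuple(cf["categories"])
            extended = current + ("in ground",)
            if (cf["name"], current) not in pairs and (cf["name"], extended) in pairs:
                cf["categories"] = extended
    return data


# Toy biosphere with the single flow shown earlier
flow = "Copper, 0.59% in sulfide, Cu 0.22% and Mo 8.2E-3% in crude ore"
biosphere_flows = [{"name": flow + ", in ground", "categories": ("natural resource", "in ground")}]
data = [{"exchanges": [{"name": flow, "categories": ("natural resource",), "amount": 10000}]}]

data = add_in_ground_suffix(data, biosphere_flows)    # fix the name first...
data = add_in_ground_category(data, biosphere_flows)  # ...so this check sees the new name
cf = data[0]["exchanges"][0]
print(cf["name"].endswith(", in ground"), cf["categories"])
# True ('natural resource', 'in ground')
```

With Brightway, the real flows could be supplied with `partial`, as the notebook does for `link_iterable_by_fields`, e.g. `ei.apply_strategy(partial(add_in_ground_suffix, biosphere_flows=list(bd.Database(bd.config.biosphere))))`. The order matters: the name is fixed first, so the category check looks up the suffixed name.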


Try linking with our changes.

In [29]:
ei.apply_strategy(partial(
    bi.strategies.link_iterable_by_fields,
    other=bd.Database(bd.config.biosphere),
    kind="biosphere",
    fields=("name", "categories"),
))

Applying strategy: link_iterable_by_fields


In [30]:
ei.statistics()

1 methods
597 cfs
231 unlinked cfs


(1, 597, 231)

## Step 2: Pain

Ecoinvent has the following flows for copper:

* Copper, 1.18% in sulfide, Cu 0.39% and Mo 8.2E-3% in crude ore, in ground
* Copper, 1.25% in sulfide, Cu 0.24% and Zn 0,1% in crude ore, in ground
* Copper, 1.42% in sulfide, Cu 0.81% and Mo 8.2E-3% in crude ore, in ground

etc. Oh, and also:

* Cu, Cu 5.2E-2%, Pt 4.8E-4%, Pd 2.0E-4%, Rh 2.4E-5%, Ni 3.7E-2% in ore, in ground

<img src="consistency.png">
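With naming this inconsistent, exact matching can't do everything. One hypothetical triage aid (not what this notebook does) is to ask the standard library's `difflib` for the biosphere names closest to an unlinked CF name, and review the suggestions by hand. The candidate names below are the ecoinvent flows listed above.

```python
import difflib


def suggest_matches(name, candidate_names, n=3, cutoff=0.6):
    """Return up to `n` candidate names similar to `name`, best match first.

    Uses difflib's ratio-based matching; meant for manual review only, since
    a similar-looking name is not necessarily the same elementary flow.
    """
    return difflib.get_close_matches(name, candidate_names, n=n, cutoff=cutoff)


candidates = [
    "Copper, 1.18% in sulfide, Cu 0.39% and Mo 8.2E-3% in crude ore, in ground",
    "Copper, 1.42% in sulfide, Cu 0.81% and Mo 8.2E-3% in crude ore, in ground",
    "Cu, Cu 5.2E-2%, Pt 4.8E-4%, Pd 2.0E-4%, Rh 2.4E-5%, Ni 3.7E-2% in ore, in ground",
]
query = "Copper, 1.18% in sulfide, Cu 0.39% and Mo 8.2E-3% in crude ore"
for match in suggest_matches(query, candidates):
    print(match)
```

Against the real biosphere, the candidates could be `[f["name"] for f in bd.Database(bd.config.biosphere)]`.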

Let's see what the CSI developers have and don't have:

In [31]:
ei.write_excel("CSI")

Wrote matching file to:
/Users/cmutel/Library/Application Support/Brightway3/ecoinvent-36.d2073cbbe9aff3f7cc76c4f9a36d6789/output/lcia-matching-CSI.xlsx


This file is also available [here](https://github.com/brightway-lca/brightway2/raw/master/notebooks/files/lcia-matching-CSI.xlsx).

After checking these against the list of biosphere flows in ecoinvent 3.7.1, it looks like the CSI developers got everything! The unlinked flows exist only in SimaPro, for compatibility with other databases.

BTW, if you want to see a more reasonable list of flows, check out https://github.com/USEPA/Federal-LCA-Commons-Elementary-Flow-List.

## Step 3: Drop the missing CFs

These are for flows not present in the "clean" version of ecoinvent, and so won't matter for our results.

In [32]:
ei.drop_unlinked(True)

Applying strategy: drop_unlinked_cfs
Applied 1 strategies in 0.00 seconds


In [33]:
ei.statistics()

1 methods
366 cfs
0 unlinked cfs


(1, 366, 0)

## Step 4: Check to make sure we don't have duplicates

Due to the way the [scipy sparse](https://docs.scipy.org/doc/scipy/reference/sparse.html) matrix constructors work, duplicate CFs would be summed during the construction of the characterization matrix. This would be bad :)
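A minimal illustration with scipy: Brightway's characterization matrix has one diagonal entry per biosphere flow, so if two CFs point at the same flow, both `(row, column, value)` triplets land on the same position and are added together. The amount 10000 is the copper CF shown earlier, duplicated for the example.

```python
import numpy as np
from scipy import sparse

# Two CFs for the same biosphere flow: same (row, column), same amount
rows = np.array([0, 0])
cols = np.array([0, 0])
values = np.array([10000.0, 10000.0])

matrix = sparse.coo_matrix((values, (rows, cols)), shape=(1, 1))
print(matrix.toarray())       # [[20000.]]  the duplicates were summed
print(matrix.tocsr()[0, 0])   # 20000.0     same after conversion to CSR
```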

In [34]:
len(ei.data[0]['exchanges']), len({tuple(o['input']) for o in ei.data[0]['exchanges']})

(366, 216)

That's no good... but at least it's easy to fix.
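Deduplicating is only harmless if the duplicate CFs agree. The dictionary comprehension that follows keeps the last CF seen for each flow, so it is worth checking first whether any flow has CFs with different amounts. A hypothetical helper for that check, shown on toy data:

```python
from collections import defaultdict


def conflicting_duplicates(cfs):
    """Return {input key: sorted distinct amounts} for every linked flow
    whose CFs disagree on `amount`. An empty dict means no conflicts."""
    amounts = defaultdict(list)
    for cf in cfs:
        amounts[tuple(cf["input"])].append(cf["amount"])
    return {
        key: sorted(set(values))
        for key, values in amounts.items()
        if len(set(values)) > 1
    }


cfs = [
    {"input": ("biosphere3", "a"), "amount": 10000},
    {"input": ("biosphere3", "a"), "amount": 10000},  # harmless duplicate
    {"input": ("biosphere3", "b"), "amount": 5},
    {"input": ("biosphere3", "b"), "amount": 7},      # conflicting duplicate
]
print(conflicting_duplicates(cfs))
# {('biosphere3', 'b'): [5, 7]}
```

On the real data this would be `conflicting_duplicates(ei.data[0]['exchanges'])`; an empty result means it doesn't matter which copy is kept.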

In [52]:
# Keep one CF per biosphere flow; if a flow appears more than once, the last CF wins
unique_exchanges = {tuple(o['input']): o for o in ei.data[0]['exchanges']}
ei.data[0]['exchanges'] = list(unique_exchanges.values())

In [53]:
ei.write_methods(overwrite=True)

Wrote 1 LCIA methods with 216 characterization factors


# Check against reference values
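The three checks in this section repeat the same pattern. A hypothetical helper that wraps it, using the same `bw2data`/`bw2calc` calls as the cells below; the relative tolerance of `1e-3` is an arbitrary choice, not something from the CSI authors.

```python
import math


def brightway_score(activity_name, method=("Crustal Scarcity Index", "1.0"), database="ecoinvent 3.6"):
    """Score 1 unit of the first activity named `activity_name`, as in the
    cells below. Assumes the "ecoinvent 3.6" project is the current project."""
    import bw2calc as bc  # imported here so the helper can be defined without Brightway
    import bw2data as bd
    activity = next(x for x in bd.Database(database) if x["name"] == activity_name)
    lca = bc.LCA({activity: 1}, method)
    lca.lci()
    lca.lcia()
    return lca.score


def compare_to_references(references, score_fn=brightway_score, rel_tol=1e-3):
    """Return {activity name: (score / reference, within rel_tol?)}."""
    results = {}
    for name, reference in references.items():
        score = score_fn(name)
        results[name] = (score / reference, math.isclose(score, reference, rel_tol=rel_tol))
    return results


REFERENCES = {
    "market for steel, low-alloyed": 433.46247,
    "market for copper": 23868.3,
    "market for nickel, 99.5%": 4.78e4,
}
# compare_to_references(REFERENCES)  # requires the ecoinvent 3.6 project
```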

In [54]:
steel = next(x for x in bd.Database("ecoinvent 3.6") if x['name'] == 'market for steel, low-alloyed')
steel

'market for steel, low-alloyed' (kilogram, GLO, None)

In [59]:
REFERENCE = 433.46247

In [60]:
lca = bc.LCA({steel: 1}, ("Crustal Scarcity Index", "1.0"))
lca.lci()
lca.lcia()
print(lca.score / REFERENCE)

0.9999999298305655


In [58]:
copper = next(x for x in bd.Database("ecoinvent 3.6") if x['name'] == 'market for copper')
copper

'market for copper' (kilogram, GLO, None)

In [62]:
REFERENCE = 23868.3

In [63]:
lca = bc.LCA({copper: 1}, ("Crustal Scarcity Index", "1.0"))
lca.lci()
lca.lcia()
print(lca.score / REFERENCE)

0.999999266738787


In [64]:
nickel = next(x for x in bd.Database("ecoinvent 3.6") if x['name'] == 'market for nickel, 99.5%')
nickel

'market for nickel, 99.5%' (kilogram, GLO, None)

In [65]:
REFERENCE = 4.78E+04

In [66]:
lca = bc.LCA({nickel: 1}, ("Crustal Scarcity Index", "1.0"))
lca.lci()
lca.lcia()
print(lca.score / REFERENCE)

1.0002587730554768


Pretty good. The details on the [SimaPro matrix algorithm](https://pre-sustainability.com/files/2014/05/The_Improved_SimaPro_8_Calculation_Engine_Whitepaper_A4.pdf) are not transparent, but I think they do something iterative. We use [pardiso](https://www.pardiso-project.org/), which in my judgment is more reliable, but I don't have actual evidence either way.
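We don't know what SimaPro actually does, but the general point is easy to illustrate: an iterative solver's answer depends on when it stops, while a direct solve (pardiso is a direct sparse solver) does not. The toy sketch below compares an exact solve of a 2x2 system `A x = f` with a simple fixed-point iteration stopped at two different tolerances. The matrix, demand and impact weights are made up.

```python
def solve_direct(A, f):
    """Solve the 2x2 system A x = f exactly with Cramer's rule."""
    (a, b), (c, d) = A
    det = a * d - b * c
    return [(d * f[0] - b * f[1]) / det, (a * f[1] - c * f[0]) / det]


def solve_iterative(A, f, tol):
    """Fixed-point iteration x <- f + (I - A) x, stopping once the largest
    change between iterations is below `tol`. Converges here because the
    spectral radius of (I - A) is below 1."""
    n = len(f)
    x = list(f)
    while True:
        new = [
            f[i] + sum(((1.0 if i == j else 0.0) - A[i][j]) * x[j] for j in range(n))
            for i in range(n)
        ]
        if max(abs(new[i] - x[i]) for i in range(n)) < tol:
            return new
        x = new


A = [[1.0, -0.5], [-0.2, 1.0]]  # toy technosphere matrix
f = [1.0, 0.0]                  # demand: one unit of product 0
weights = [2.0, 3.0]            # toy impact per unit of each activity


def score(x):
    return sum(w * xi for w, xi in zip(weights, x))


exact = score(solve_direct(A, f))
loose = score(solve_iterative(A, f, tol=1e-3))
tight = score(solve_iterative(A, f, tol=1e-12))
print(loose / exact, tight / exact)
```

With the loose tolerance the ratio is off by a few parts in ten thousand; with the tight tolerance it agrees with the direct solve to many digits.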