## Getting started

Import Brightway2 functionality

In [1]:
from brightway2 import *

Create new project

In [2]:
projects.current = "find-multioutput-processes"

Install base data

In [3]:
bw2setup()

Creating default biosphere

Applying strategy: drop_unspecified_subcategories

Writing activities to SQLite3 database:
0%                          100%
[##############################] | ETA[sec]: 0.000 
Total time elapsed: 0.362 sec



Title: Writing activities to SQLite3 database:
  Started: 08/25/2015 10:41:24
  Finished: 08/25/2015 10:41:24
  Total time elapsed: 0.362 sec
  CPU %: 94.100000
  Memory %: 0.262725
Created database: biosphere3
Creating default LCIA methods

Applying strategy: set_biosphere_type
Applying strategy: drop_unspecified_subcategories
Applying strategy: link_iterable_by_fields
Wrote 692 LCIA methods with 170915 characterization factors
Creating core data migrations



## Importing ecoinvent 3.1

Import ecoinvent 3.1 cutoff, but don't do any data manipulation yet

In [4]:
ei31cutoff = SingleOutputEcospold2Importer(
    "/Users/cmutel/Documents/LCA Documents/Ecoinvent/3.1/cutoff/datasets",
    "ecoinvent 3.1 cutoff"
)

Extracting ecospold2 files:
0%                          100%
[##############################] | ETA[sec]: 0.000 | Item ID: fff527b1-0fe4-4
Total time elapsed: 84.606 sec


Title: Extracting ecospold2 files:
  Started: 08/25/2015 10:42:13
  Finished: 08/25/2015 10:43:37
  Total time elapsed: 84.606 sec
  CPU %: 82.800000
  Memory %: 2.945280
Extracted 11301 datasets in 84.80 seconds


## Get multioutput datasets

Multioutput datasets are easy to find: each allocated dataset still lists all of its products, but sets the amounts of the non-reference product flows to zero.

In [12]:
multioutput = [x for x in ei31cutoff.data if len([y for y in x['exchanges'] if y['type'] == 'production']) > 1]
len(ei31cutoff.data), len(multioutput)

(11301, 1836)
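The filter above can be tried on toy data without an ecoinvent license. This is a minimal sketch: the two datasets and the helper `production_exchanges` are made up for illustration and only mimic the `exchanges` structure used in this notebook.

```python
# Toy datasets mimicking the structure of `ei31cutoff.data` (all values are made up)
toy_data = [
    {"name": "single product", "exchanges": [
        {"type": "production", "amount": 1.0},
        {"type": "technosphere", "amount": 0.5},
    ]},
    {"name": "two products", "exchanges": [
        {"type": "production", "amount": 1.0},
        # Non-reference product: still listed, but with a zero amount
        {"type": "production", "amount": 0.0},
    ]},
]


def production_exchanges(ds):
    """Return all exchanges of type 'production' in a dataset (hypothetical helper)."""
    return [exc for exc in ds["exchanges"] if exc["type"] == "production"]


# Same test as in the cell above: more than one production exchange
toy_multioutput = [ds for ds in toy_data if len(production_exchanges(ds)) > 1]
print([ds["name"] for ds in toy_multioutput])
```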

Examine a single multioutput dataset

In [23]:
example = multioutput[1226]
print(example['name'])
[exc for exc in example['exchanges'] if exc['type'] == 'production']

treatment of coal gas, in power plant


[{'activity': None,
  'amount': 1.0,
  'comment': 'Calculated value. 1kWh = 3.6 MJ',
  'flow': '66c93e71-f32b-4591-901c-55395db5c132',
  'loc': 1.0,
  'name': 'electricity, high voltage',
  'production volume': 835000000.0,
  'type': 'production',
  'uncertainty type': 0,
  'unit': 'kilowatt hour'},
 {'activity': None,
  'amount': 0.0,
  'comment': 'Calculated value',
  'flow': '71e2f1db-a2c5-44d0-8337-dfff15be974d',
  'loc': 0.0,
  'name': 'heat, district or industrial, other than natural gas',
  'production volume': 1220924369.7479,
  'type': 'production',
  'uncertainty type': 0,
  'unit': 'megajoule'}]

We can also find the "partner" dataset - the production of heat instead of electricity - for our example dataset.

In [29]:
partner = [ds for ds in multioutput if ds['activity'] == example['activity'] and ds != example]
assert len(partner) == 1
partner = partner[0]
[exc for exc in partner['exchanges'] if exc['type'] == 'production']

[{'activity': None,
  'amount': 0.0,
  'comment': 'Calculated value. 1kWh = 3.6 MJ',
  'flow': '66c93e71-f32b-4591-901c-55395db5c132',
  'loc': 0.0,
  'name': 'electricity, high voltage',
  'production volume': 835000000.0,
  'type': 'production',
  'uncertainty type': 0,
  'unit': 'kilowatt hour'},
 {'activity': None,
  'amount': 1.0,
  'comment': 'Calculated value',
  'flow': '71e2f1db-a2c5-44d0-8337-dfff15be974d',
  'loc': 1.0,
  'name': 'heat, district or industrial, other than natural gas',
  'production volume': 1220924369.7479,
  'type': 'production',
  'uncertainty type': 0,
  'unit': 'megajoule'}]

We don't know anything about the allocation procedure applied, but we can use the production volumes to reverse the allocation! This way we don't need access to all the master data (which isn't available for download in any case).
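As a first step in that direction, the production volumes give the relative amounts of the co-products. The sketch below is one possible reading, not necessarily the procedure used later: it assumes that the ratio of production volumes equals the physical output ratio of the unallocated process. The function `coproduct_ratios` is a hypothetical helper; the numbers are copied from the example dataset above.

```python
def coproduct_ratios(dataset):
    """Amount of each product per unit of reference product.

    Assumption: production volume ratios reflect the physical output ratio
    of the unallocated process.
    """
    production = [exc for exc in dataset["exchanges"] if exc["type"] == "production"]
    # The reference product is the only production exchange with a nonzero amount
    reference = [exc for exc in production if exc["amount"]]
    assert len(reference) == 1, "expected exactly one reference product"
    ref_volume = reference[0]["production volume"]
    return {exc["name"]: exc["production volume"] / ref_volume for exc in production}


# Production exchanges of the example dataset, reduced to the fields used here
example_like = {"exchanges": [
    {"name": "electricity, high voltage", "amount": 1.0,
     "production volume": 835000000.0, "type": "production"},
    {"name": "heat, district or industrial, other than natural gas", "amount": 0.0,
     "production volume": 1220924369.7479, "type": "production"},
]}

ratios = coproduct_ratios(example_like)
for name, ratio in ratios.items():
    print(name, round(ratio, 4))
```

Under this assumption, each kilowatt hour of electricity comes with roughly 1.46 megajoules of heat.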

## Wait a minute...

However, our list of processes isn't exactly correct, as not all multioutput processes have datasets that produce all possible reference products. Let's get only the multioutput processes where we have access to both (or more) allocated products.

In [26]:
count_dict = {}
for ds in multioutput:
    count_dict[ds['activity']] = count_dict.get(ds['activity'], 0) + 1

true_multioutput = [ds for ds in multioutput if count_dict[ds['activity']] > 1]

len(true_multioutput)

1489

So, there are 1489 allocated datasets for which we have both (or more) allocated reference products. How many activities are there?

In [27]:
len({x for x, y in count_dict.items() if y > 1})

615
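The same two counts can also be obtained with `collections.Counter` from the standard library. This is a sketch on made-up data: the activity codes `a1` to `a3` are hypothetical placeholders for the `activity` values in the imported datasets.

```python
from collections import Counter

# Toy stand-in for `multioutput`: only the 'activity' field matters here
toy_datasets = [
    {"activity": "a1"}, {"activity": "a1"},                      # both products available
    {"activity": "a2"},                                          # only one allocated dataset
    {"activity": "a3"}, {"activity": "a3"}, {"activity": "a3"},  # three products
]

# Number of allocated datasets per activity
counts = Counter(ds["activity"] for ds in toy_datasets)

# Datasets whose activity appears more than once
toy_true_multioutput = [ds for ds in toy_datasets if counts[ds["activity"]] > 1]

# Number of distinct activities with more than one allocated dataset
n_activities = sum(1 for n in counts.values() if n > 1)

print(len(toy_true_multioutput), n_activities)
```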

## Where can we apply substitution?

We can only apply substitution if another process produces the "byproduct" flow as its reference product - the flow, unit, and location have to match to do this automatically.

In [32]:
[y for y in [
    x for x in true_multioutput if x['name'] == 'treatment of coal gas, in power plant'
][0]['exchanges'] if y['type'] == 'production']

[{'activity': None,
  'amount': 0.0,
  'comment': 'Calculated value',
  'flow': '71e2f1db-a2c5-44d0-8337-dfff15be974d',
  'loc': 0.0,
  'name': 'heat, district or industrial, other than natural gas',
  'production volume': 4386554.62184873,
  'type': 'production',
  'uncertainty type': 0,
  'unit': 'megajoule'},
 {'activity': None,
  'amount': 1.0,
  'comment': 'Calculated value. 1kWh = 3.6 MJ',
  'flow': '66c93e71-f32b-4591-901c-55395db5c132',
  'loc': 1.0,
  'name': 'electricity, high voltage',
  'production volume': 3000000.0,
  'type': 'production',
  'uncertainty type': 0,
  'unit': 'kilowatt hour'}]

In [41]:
can_be_substituted = 0

for ds in true_multioutput:
    # The reference product is the only production exchange with a nonzero amount
    ref_prod_list = [exc for exc in ds['exchanges'] if exc['amount'] and exc['type'] == 'production']
    assert len(ref_prod_list) == 1
    ref_prod = ref_prod_list[0]

    # Look for a market dataset for this product in the same location
    if any(x for x in ei31cutoff.data
            if x != ds
            and x['location'] == ds['location']
            and x['name'] == 'market for ' + ref_prod['name']
            ):
        can_be_substituted += 1

print(len(true_multioutput), can_be_substituted)

1489 618
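The loop above rescans `ei31cutoff.data` for every dataset, which gets slow with more than 11,000 datasets. A possible alternative is to index the `(name, location)` pairs once and look them up. This is a sketch on toy data: `count_substitutable` is a hypothetical helper, it mirrors the name and location test as written in the cell above, and it assumes every candidate is itself part of the full dataset list.

```python
from collections import Counter


def count_substitutable(all_datasets, candidates):
    """Count candidates with a same-location 'market for <reference product>' dataset."""
    # Build the lookup index once instead of scanning the full list per candidate
    key_counts = Counter((ds["name"], ds["location"]) for ds in all_datasets)
    total = 0
    for ds in candidates:
        ref = [exc for exc in ds["exchanges"]
               if exc["type"] == "production" and exc["amount"]]
        assert len(ref) == 1
        target = ("market for " + ref[0]["name"], ds["location"])
        # Do not let a dataset match itself (mirrors the `x != ds` condition)
        own = 1 if (ds["name"], ds["location"]) == target else 0
        if key_counts[target] - own > 0:
            total += 1
    return total


def make_ds(name, location):
    """Build a made-up dataset that produces heat as its reference product."""
    return {"name": name, "location": location, "exchanges": [
        {"name": "heat", "type": "production", "amount": 1.0},
        {"name": "electricity", "type": "production", "amount": 0.0},
    ]}


toy_all = [make_ds("market for heat", "CH"),
           make_ds("cogeneration", "CH"),
           make_ds("cogeneration", "DE")]

# Only the CH cogeneration dataset has a matching market in its location
print(count_substitutable(toy_all, toy_all[1:]))
```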
