# Faster calculations with aggregation and simplifications

This notebook shows how to get very fast calculations in one specific use case: the foreground is dynamic, the background does not change, and the LCIA methods are fixed ahead of time.

We know that [precalculated aggregated datasets can be fast](https://link.springer.com/article/10.1007/s11367-018-1444-x), and they are easy to generate in Brightway. If we know that the background won't change, we can replace *only the needed background activities* with precalculated activities, and skip loading and processing the whole background system completely.

<img src="files/aggregated-background.png">
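To see why this substitution does not change results, here is a minimal NumPy sketch on a made-up three-process system. It is not ecoinvent data and does not use the Brightway API; the matrices, index layout, and variable names are all assumptions for illustration. Each background column is replaced by its precalculated cumulative emissions, and the foreground total stays the same.

```python
import numpy as np

# Hypothetical system, not ecoinvent data. Columns are processes, rows are products.
# Process 0 is the foreground; processes 1 and 2 are the static background.
technosphere = np.array([
    [1.0, 0.0, 0.0],    # product 0
    [-2.0, 1.0, 0.0],   # product 1: the foreground consumes 2 units
    [0.0, -0.5, 1.0],   # product 2: process 1 consumes 0.5 units
])
biosphere = np.array([[1.0, 3.0, 4.0]])  # one elementary flow, direct emissions
demand = np.array([1.0, 0.0, 0.0])

# Reference result: solve the full system
full_total = (biosphere @ np.linalg.solve(technosphere, demand)).item()

# Precalculate cumulative emissions per unit of each background product
background = [1, 2]
aggregated_flows = np.zeros((1, len(background)))
for position, index in enumerate(background):
    unit_demand = np.zeros(3)
    unit_demand[index] = 1.0
    aggregated_flows[:, position] = biosphere @ np.linalg.solve(technosphere, unit_demand)

# Aggregated system: background columns keep only their production exchange,
# and their biosphere columns hold the cumulative emissions
aggregated_technosphere = technosphere.copy()
aggregated_technosphere[2, 1] = 0.0
aggregated_biosphere = biosphere.copy()
aggregated_biosphere[:, background] = aggregated_flows

aggregated_total = (
    aggregated_biosphere @ np.linalg.solve(aggregated_technosphere, demand)
).item()

print(full_total, aggregated_total)
```

In a real case only the background activities that the foreground actually consumes need aggregated columns, so the matrices stay small.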

We can also reduce the biosphere matrix to only the flows needed to calculate the preselected LCIA method scores.

<img src="files/reduced-biosphere.png">
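The reduction can be sketched in plain Python, without Brightway. The method names, flow keys, characterization factors, and inventory amounts below are invented for illustration. The idea is only that a flow with no characterization factor in any preselected method cannot affect any score, so it can be dropped.

```python
# Hypothetical characterization factors: {method: [(flow key, factor), ...]}
methods = {
    ("example", "climate"): [
        (("biosphere3", "co2"), 1.0),
        (("biosphere3", "ch4"), 28.0),
    ],
    ("example", "acidification"): [(("biosphere3", "so2"), 1.0)],
}

# Hypothetical life cycle inventory: {flow key: amount}
inventory = {
    ("biosphere3", "co2"): 10.0,
    ("biosphere3", "ch4"): 0.5,
    ("biosphere3", "so2"): 0.25,
    ("biosphere3", "water"): 100.0,
    ("biosphere3", "nitrogen"): 3.0,
}


def score(cfs, flows):
    """Sum factor * amount over the characterized flows."""
    return sum(factor * flows.get(flow, 0.0) for flow, factor in cfs)


# Union of all flows characterized by at least one preselected method
needed_flows = {flow for cfs in methods.values() for flow, _ in cfs}
reduced_inventory = {
    flow: amount for flow, amount in inventory.items() if flow in needed_flows
}

full_scores = {name: score(cfs, inventory) for name, cfs in methods.items()}
reduced_scores = {name: score(cfs, reduced_inventory) for name, cfs in methods.items()}

print(len(inventory), "->", len(reduced_inventory), "flows")
print(full_scores == reduced_scores)
```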

In [1]:
import brightway2 as bw
import numpy as np
import random

## Generate example database `B`

In our example, we use ecoinvent as the background database `A`, and make up a random foreground database `B`.

In [2]:
ECOINVENT_LABEL = "ecoinvent 3.5 cutoff"
assert ECOINVENT_LABEL in bw.databases

In [3]:
some_methods = [bw.methods.random() for _ in range(3)]
all_flows = [o[0] for method in some_methods for o in bw.Method(method).load()]
some_flows = [random.choice(all_flows) for _ in range(10)]

In [4]:
db = iter(bw.Database(ECOINVENT_LABEL))

some_inputs = [next(db) for _ in range(5)]

We can now build our example database.

In [5]:
example_data = [{
    'name': 'fake {}'.format(index),
    'code': str(index),
    'exchanges': [{
        'input': ('example', str(index)),
        'amount': 1,
        'type': 'production',
    }]
} for index in range(5)]

In [6]:
# Add links to background database
for _ in range(10):
    random.choice(example_data)['exchanges'].append({
        'input': random.choice(some_inputs),
        'amount': np.random.normal() + 1,
        'type': 'technosphere'
    })
    
    
# Add links to foreground database
for _ in range(8):
    random.choice(example_data)['exchanges'].append({
        'input': ('example', random.choice(example_data)['code']),
        'amount': np.random.normal() + 1,
        'type': 'technosphere'
    })

# Add links to biosphere flows
for _ in range(6):
    random.choice(example_data)['exchanges'].append({
        'input': random.choice(some_flows),
        'amount': np.random.normal(),
        'type': 'biosphere'
    })

In [7]:
bw.Database("example").write({('example', o.pop("code")): o for o in example_data})

Writing activities to SQLite3 database:
0% [#####] 100% | ETA: 00:00:00
Total time elapsed: 00:00:00


Title: Writing activities to SQLite3 database:
  Started: 08/22/2019 14:59:02
  Finished: 08/22/2019 14:59:02
  Total time elapsed: 00:00:00
  CPU %: 120.90
  Memory %: 0.43


Check to make sure that all our foreground activities have LCIA scores for our methods:

In [16]:
lca = bw.LCA({bw.Database("example").random(): 1}, method=some_methods[0])
lca.lci()
lca.lcia()

for act in bw.Database("example"):
    for method in some_methods:
        lca.switch_method(method)
        lca.redo_lcia({act: 1})
        assert lca.score != 0

## Aggregate background activities

Ultimately, we want the [processed](https://docs.brightwaylca.org/intro.html#intermediate-and-processed-data) [array](https://docs.brightwaylca.org/lca.html#building-matrices) for this database, but it is generated automatically when we write a new database.

In [17]:
background_scores = {}

for act in some_inputs:
    background_scores[act.key] = {}
    for method in some_methods:
        lca.switch_method(method)
        lca.redo_lcia({act: 1})
        background_scores[act.key][method] = lca.score

In [19]:
NEW_NAME = "Aggregated ecoinvent background"

In [27]:
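def as_tuple(o):
    """Return the ``(database, code)`` key for a key tuple or an activity-like object."""
    if isinstance(o, tuple):
        return o
    elif hasattr(o, "key"):
        return o.key
    else:
        raise ValueError("Can't convert {!r} to a key tuple".format(o))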

In [36]:
def extract_relevant_flows(act, lca, flows):
    """Return the life cycle inventory elementary flow amounts for one unit of demand of ``act``.
    
    ``lca`` is a pre-calculated ``LCA`` object in which ``act`` is in the technosphere.
    ``flows`` is an iterable of biosphere flows.
    
    Returns a dictionary like ``{flow_key: amount}``."""
    lca.redo_lci({act: 1})
    flows_present = {o for o in flows if o in lca.biosphere_dict}
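    # Get total inventory (for entire supply chain). Summing a sparse matrix
    # returns a 2-d ``np.matrix``, so flatten it to get plain scalar amounts.
    inventory_vector = np.asarray(lca.inventory.sum(axis=1)).ravel()
    data = {flow: float(inventory_vector[lca.biosphere_dict[flow]]) for flow in flows_present}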
    return {k: v for k, v in data.items() if v != 0}
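The row-sum step can be illustrated with a small dense array standing in for `lca.inventory`. This is a sketch under assumptions: the matrix values, the flow keys, and the `biosphere_dict` contents are made up, and the real inventory is a sparse matrix rather than a dense array.

```python
import numpy as np

# Hypothetical inventory: rows are elementary flows, columns are activities
inventory = np.array([
    [1.0, 2.0, 0.0],
    [0.0, 0.0, 0.0],
    [0.5, 0.0, 0.25],
])

# Hypothetical stand-in for ``lca.biosphere_dict``: flow key -> row index
biosphere_dict = {
    ("biosphere3", "co2"): 0,
    ("biosphere3", "ch4"): 1,
    ("biosphere3", "so2"): 2,
}

flows = [
    ("biosphere3", "co2"),
    ("biosphere3", "ch4"),
    ("biosphere3", "so2"),
    ("biosphere3", "not in matrix"),
]

# Summing over columns gives the total amount of each flow across the supply chain
totals = inventory.sum(axis=1)

# Keep only flows that have a row in the matrix and a nonzero total
present = {
    flow: float(totals[biosphere_dict[flow]])
    for flow in flows
    if flow in biosphere_dict
}
relevant = {flow: amount for flow, amount in present.items() if amount != 0}

print(relevant)
```

The flow with a zero total and the flow missing from the matrix are both dropped, which keeps the aggregated dataset as small as possible.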

In [47]:
def format_as_brightway_dataset(act, new_db_name, biosphere_exchanges):
    """Minimally format an activity and a dict of biosphere exchange amounts as a Brightway database ready for writing."""
    # Because we have calculated the inventory for 1 unit already, don't need to rescale
    # or deal with other production exchange weirdness. A production exchange will
    # be added automatically
    return {'exchanges': [{
            'input': as_tuple(flow),
            'output': (new_db_name, act[1]),
            'amount': amount,
            'type': "biosphere",
        } for flow, amount in biosphere_exchanges.items()]
    }

In [48]:
new_background_database = {
    (NEW_NAME, act['code']): format_as_brightway_dataset(act, NEW_NAME, extract_relevant_flows(act, lca, some_flows)) 
    for act in some_inputs
}

In [49]:
bw.Database(NEW_NAME).write(new_background_database)

Writing activities to SQLite3 database:
0% [#####] 100% | ETA: 00:00:00
Total time elapsed: 00:00:00


Title: Writing activities to SQLite3 database:
  Started: 08/23/2019 11:59:06
  Finished: 08/23/2019 11:59:06
  Total time elapsed: 00:00:00
  CPU %: 118.10
  Memory %: 0.90


Check to make sure our aggregated database gives the right results:

In [50]:
processed_array_filepaths = [bw.Database(NEW_NAME).filepath_processed(), bw.Database("biosphere3").filepath_processed()]

In [52]:
class IndependentLCA(bw.IndepentLCAMixin, bw.LCA):
    pass

In [56]:
bw.Method(method).filepath_processed()

'/Users/cmutel/Library/Application Support/Brightway3/default.c21f969b5f03d33d43e04f8f136e7682/processed/edip-wo-ltel.7c25f011c60ae201b58dee42dde662b0.npy'

In [57]:
for method in some_methods:
    original_lca = bw.LCA({some_inputs[0]: 1}, method=method)
    original_lca.lci()
    original_lca.lcia()
    
    independent_lca = IndependentLCA(
        {bw.mapping[some_inputs[0]]: 1}, 
        database_filepath=processed_array_filepaths,
        method=[bw.Method(method).filepath_processed()]
    )
    independent_lca.lci()
    independent_lca.lcia()

TypeError: 'int' object is not subscriptable

In [26]:
def map_model_outputs(list_of_keys, offset=100_000_000):
    """Generate a mapping dictionary of ``(key: integer)`` for use in preparation of processed arrays.
    
    Requires Python 3.6 for fancy number stuff :)"""
    return {key: offset + index for index, key in enumerate(sorted(set(list_of_keys)))}
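As a quick usage sketch, the function is repeated here so the snippet runs on its own; the `("model", ...)` keys are hypothetical. Duplicates collapse and the integers follow sorted key order.

```python
def map_model_outputs(list_of_keys, offset=100_000_000):
    """Same logic as the cell above, repeated so this snippet is self-contained."""
    return {key: offset + index for index, key in enumerate(sorted(set(list_of_keys)))}


# Hypothetical model output keys, with one duplicate
keys = [("model", "b"), ("model", "a"), ("model", "a")]
model_mapping = map_model_outputs(keys)

print(model_mapping)
```

The large default offset is presumably meant to keep these integers clear of the ones Brightway has already assigned; whether that holds depends on the size of the existing mapping.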

In [20]:
from bw2data.utils import TYPE_DICTIONARY, MAX_INT_32
import numpy as np


def process_list_of_exchanges(data, filepath):
    """Given an iterable of ``(input_key, output_key, type, amount)``, generate a Brightway processed array at the given filepath.
    
    The input and output keys must already be mapped, i.e. they must be integers between 0 and 4294967295. Be careful to ensure they don't overlap with other keys used!"""
    dtype = [
        ('input', np.uint32),
        ('output', np.uint32),
        ('row', np.uint32),
        ('col', np.uint32),
        ('type', np.uint8),
        ('amount', np.float32),
    ]
    arr = np.zeros((len(data),), dtype=dtype)
    
    for index, line in enumerate(data):
        arr[index] = (
            line['input'],
            line['output'],
            MAX_INT_32,
            MAX_INT_32,
            TYPE_DICTIONARY[line["type"]],
            line['amount'],
        )

    arr.sort(order=['input', 'output', 'amount'])
    np.save(filepath, arr, allow_pickle=False)
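A round trip through the same array layout can be sketched without Brightway installed. The two constants below are stand-ins for the `bw2data.utils` values and are assumptions here, as are the mapped integer keys and amounts; the loop mirrors the body of the function above rather than calling it.

```python
import os
import tempfile

import numpy as np

# Assumed stand-ins for the bw2data constants, so the snippet runs without Brightway
MAX_INT_32 = 4294967295
TYPE_DICTIONARY = {"production": 0, "technosphere": 1, "biosphere": 2}

dtype = [
    ("input", np.uint32),
    ("output", np.uint32),
    ("row", np.uint32),
    ("col", np.uint32),
    ("type", np.uint8),
    ("amount", np.float32),
]

# Hypothetical exchanges whose keys are already mapped to integers
exchanges = [
    {"input": 100_000_001, "output": 100_000_001, "type": "production", "amount": 1.0},
    {"input": 100_000_000, "output": 100_000_001, "type": "technosphere", "amount": 0.5},
]

arr = np.zeros((len(exchanges),), dtype=dtype)
for index, line in enumerate(exchanges):
    arr[index] = (
        line["input"],
        line["output"],
        MAX_INT_32,  # row placeholder, as in the function above
        MAX_INT_32,  # col placeholder, as in the function above
        TYPE_DICTIONARY[line["type"]],
        line["amount"],
    )
arr.sort(order=["input", "output", "amount"])

# Save to a temporary file and read it back
with tempfile.TemporaryDirectory() as dirpath:
    filepath = os.path.join(dirpath, "example.npy")
    np.save(filepath, arr, allow_pickle=False)
    loaded = np.load(filepath)

print(loaded["input"].tolist(), loaded["amount"].tolist())
```

Sorting puts the exchange with the smaller `input` first, so the technosphere row precedes the production row in the saved file.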