# Exploration functions - `bw2analyzer` interpretation

This notebook shows some new exploration functions added to `bw2analyzer`.

In [None]:
import bw2data as bd
import bw2analyzer as ba
import bw2calc as bc
import bw2io as bi

We use `ecoinvent` as an example database, but the functions shown here are generic.

This is set up for the DdS infrastructure and courses, where we have access to ecoinvent. Adjust as needed.

In [None]:
if 'ei38-teaching-25' not in bd.projects:
    bi.restore_project_directory("/srv/data/projects/ecoinvent38-25.tar.gz")

In [None]:
bd.projects.set_current('ei38-teaching-25')

## `print_recursive_supply_chain`

Sometimes it is convenient to print the supply chain of an activity. This function is only for exploration; use `bw2calc.GraphTraversal` in production.

In [None]:
ei = bd.Database("ei 3.8 cutoff")

In [None]:
act = ei.random()
act

In [None]:
ba.print_recursive_supply_chain(act)

This function also supports:

* Using a custom string instead of tabs for indentation
* Writing to a file-like object instead of printing to `stdout`
* Rescaling all amounts to an arbitrary value
* Using a cutoff to limit what is returned
* Specifying an arbitrary maximum recursion depth

Here is a silly example of most of these options:

In [None]:
ba.print_recursive_supply_chain(act, max_level=5, cutoff=0.02, tab_character="🐶", amount=2)
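To make the options above concrete without needing ecoinvent, here is a minimal pure-Python sketch of a recursive supply chain printer over a toy dictionary. It is not the `bw2analyzer` implementation: the `SUPPLY_CHAIN` data, the `print_chain` helper, and the exact cutoff rule (skip inputs whose scaled amount is below `cutoff`) are assumptions made for illustration.

```python
import io
import sys

# Hypothetical toy data: activity -> list of (input activity, amount per unit of output)
SUPPLY_CHAIN = {
    "bread": [("flour", 0.8), ("electricity", 0.2)],
    "flour": [("wheat", 1.1), ("electricity", 0.05)],
    "wheat": [],
    "electricity": [("coal", 0.4)],
    "coal": [],
}


def print_chain(activity, amount=1.0, max_level=2, cutoff=0.0,
                tab_character="  ", file_obj=None, _level=0):
    """Write `activity` and, recursively, its inputs rescaled to `amount`."""
    out = file_obj if file_obj is not None else sys.stdout
    out.write(f"{tab_character * _level}{amount:.3g}: {activity}\n")
    if _level >= max_level:
        return  # maximum recursion depth reached
    for name, per_unit in SUPPLY_CHAIN[activity]:
        scaled = amount * per_unit
        if abs(scaled) < cutoff:
            continue  # assumed cutoff rule: drop small scaled amounts
        print_chain(name, scaled, max_level, cutoff, tab_character, out, _level + 1)


# Write to an in-memory buffer instead of stdout
buffer = io.StringIO()
print_chain("bread", amount=2, max_level=2, cutoff=0.1,
            tab_character="--", file_obj=buffer)
report = buffer.getvalue()
print(report)
```

The second electricity input of flour (0.08 after rescaling) falls below the cutoff and is dropped, and nothing below the second level is visited.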

## `print_recursive_calculation`

We can do the same thing, but filter by each input's environmental impact rather than by the amount consumed.

In [None]:
ipcc = ('IPCC 2013', 'climate change', 'GWP 100a')

In [None]:
ba.print_recursive_calculation(act, ipcc)

This function supports the same arguments as `print_recursive_supply_chain`.

## `find_differences_in_inputs`

Some databases have multiple activities that seem similar, but it is hard to tell how different they really are. `find_differences_in_inputs` is one of three functions that help distinguish between such activities. It looks through the database that the activity came from, finds all other activities with the same name and reference product, and checks whether their inputs are substantially different.

In our example activity, there are 60 different activities with the same name and reference product, so we limit the result to certain locations.

In [None]:
act = bd.get_node(
    name='heat and power co-generation, natural gas, conventional power plant, 100MW electrical',
    unit='kilowatt hour',
    location='PT',
    database="ei 3.8 cutoff",
)

In [None]:
canada = ['CA-BC', 'CA-AB', 'CA-NS', 'CA-MB', 'CA-QC', 'CA-PE', 'CA-ON', 'CA-NB', 'CA-SK', 'CA-NT']

This function compares the net amount of each flow, adding up multiple exchanges that reference the same flow. The result without `as_dataframe` can be a bit hard to interpret; returning the analysis as a dataframe allows for quicker interpretation:

In [None]:
df = ba.find_differences_in_inputs(act, locations=canada, as_dataframe=True)
df / df.iloc[0]
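The idea of netting exchanges and then comparing them can be sketched in plain Python. This is only an illustration under stated assumptions: the `EXCHANGES` data, the helpers `net_inputs` and `differing_inputs`, and the tolerance values are hypothetical and do not reproduce the library's code.

```python
import math
from collections import defaultdict

# Hypothetical exchange lists per location: (input name, amount)
EXCHANGES = {
    "CA-QC": [("natural gas", 0.20), ("natural gas", 0.05), ("water", 1.0)],
    "CA-ON": [("natural gas", 0.25), ("water", 1.5)],
}


def net_inputs(exchanges):
    """Add up all exchanges that reference the same input."""
    totals = defaultdict(float)
    for name, amount in exchanges:
        totals[name] += amount
    return dict(totals)


def differing_inputs(reference, other, rel_tol=1e-4, abs_tol=1e-9):
    """Names of inputs whose net amounts differ beyond the given tolerances."""
    names = set(reference) | set(other)
    return sorted(
        name for name in names
        if not math.isclose(reference.get(name, 0.0), other.get(name, 0.0),
                            rel_tol=rel_tol, abs_tol=abs_tol)
    )


netted = {location: net_inputs(excs) for location, excs in EXCHANGES.items()}
different = differing_inputs(netted["CA-QC"], netted["CA-ON"])
print(different)
```

Here the two natural gas exchanges in `CA-QC` net to the same amount as the single exchange in `CA-ON`, so only water is reported. In the cell above, `df / df.iloc[0]` divides every row by the first row, so each value reads as a ratio relative to the first activity.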

## `compare_activities_by_lcia_score`

A common question when trying to choose between different activities with similar-sounding names is: Does it matter which one we choose? Are they actually any different? `compare_activities_by_lcia_score` allows for a comparison of any activities. If we look at very similar activities, we don't see a real difference:

In [None]:
justin = [
    a for a in ei 
    if a['name'] == act['name'] 
    and a['reference product'] == act['reference product']
    and a['location'] in canada
]
justin

In [None]:
ba.compare_activities_by_lcia_score(
    [
        a for a in justin 
        if a['location'] != 'CA-QC'  # Je me souviens ;)
    ],
    ipcc
)

Allowing for Québec already produces different results:

In [None]:
ba.compare_activities_by_lcia_score(
    justin,
    ipcc
)

Note that the Brightway developers take no position on Québec, Canada, or the correctness of the results provided above!

You can set the cutoff for what a "real" difference is with the `band` argument. Just for fun, let's look at many different CHP activities.

In [None]:
{act['name'] for act in ei if act['name'].startswith('heat and power co-generation, natural gas')}

In [None]:
ba.compare_activities_by_lcia_score(
    [a for a in ei 
     if a['name'].startswith('heat and power co-generation, natural gas')
     and a['reference product'] == act['reference product']
    ], 
    ipcc,
    band=1
)

In [None]:
ba.compare_activities_by_lcia_score(
    [a for a in ei 
     if a['name'].startswith('heat and power co-generation, natural gas')
     and a['reference product'] == act['reference product']
    ], 
    ipcc,
    band=0.25
)
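One way to think about `band` is as a relative tolerance on the spread of scores. The sketch below assumes that activities count as similar when the gap between the highest and lowest score is smaller than `band` times the highest score; that rule, the helper name `scores_are_similar`, and the example scores are assumptions for illustration, so check the `bw2analyzer` source for the exact criterion.

```python
def scores_are_similar(scores, band=0.1):
    """Assumed rule: the spread of scores stays below band * highest score."""
    highest, lowest = max(scores), min(scores)
    return abs(highest - lowest) < band * abs(highest)


# Hypothetical LCIA scores for three activities
scores = [0.60, 0.62, 0.75]

wide = scores_are_similar(scores, band=1)      # very tolerant band
medium = scores_are_similar(scores, band=0.25)
narrow = scores_are_similar(scores, band=0.1)  # strict band
print(wide, medium, narrow)
```

Under this assumed rule, a smaller `band` makes the comparison stricter, so the same set of scores can switch from "similar" to "different" as `band` shrinks.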

## `compare_activities_by_grouped_leaves`

Knowing that there are differences isn't always super helpful - one wants to know *why* there are differences. We can't really go through the supply chains of many different activities manually and keep all the differences straight. The function `compare_activities_by_grouped_leaves` will sum the *impacts* of the different supply chain activities by their [Central Product Classification](https://unstats.un.org/unsd/classifications/Econ/cpc) code. This will only work on databases which provide CPC codes in the same format as ecoinvent.

This function has multiple output formats. The default is a Python tuple of `(labels, data)`, but this is not shown here, as it is intended for programmatic use. Humans would choose either an HTML table:

In [None]:
ba.compare_activities_by_grouped_leaves(
    [a for a in ei 
     if a['name'].startswith('heat and power co-generation, natural gas')
     and a['reference product'] == act['reference product']
    ][:5], 
    ipcc,
    output_format="html",
)

Or a pandas `DataFrame`:

In [None]:
df = ba.compare_activities_by_grouped_leaves(
    [a for a in ei 
     if a['name'].startswith('heat and power co-generation, natural gas')
     and a['reference product'] == act['reference product']
    ][:5], 
    ipcc,
    output_format="pandas",
)
df

What is included in this result? Direct emissions are those coming from the functional unit. Then, the function goes through the supply chain of each activity and applies a cutoff criterion. If a given input matches the cutoff criterion (either it is too deep in the supply chain, or its impact is too small), we look up that input's CPC code, and add its impact to any existing inputs with the same CPC code. We also add impacts from the direct emissions of an activity even if it doesn't meet the cutoff criterion (if we didn't do this, the shares wouldn't sum to one).

As in the above functions, you can control how deep the search goes with the `max_level` and `cutoff` arguments. You can also switch from the default result type, which is fractional share of total impact, to absolute impact amounts, with the `mode` argument.
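The grouping logic described above can be sketched on a toy supply chain. Everything here is hypothetical: the `NODES` data, the CPC-style labels, and the helpers `total_impact` and `grouped_leaves` are illustrative assumptions, not the function's actual implementation.

```python
from collections import defaultdict

# Hypothetical toy data:
# name -> (CPC label, direct impact per unit, inputs as (name, amount))
NODES = {
    "printing": ("printing services", 0.1, [("paper", 1.0), ("electricity", 0.5)]),
    "paper": ("paper", 0.2, [("pulp", 1.2), ("electricity", 0.3)]),
    "pulp": ("pulp", 0.25, []),
    "electricity": ("electrical energy", 0.4, []),
}


def total_impact(name):
    """Cumulative impact of one unit of `name`, including its supply chain."""
    _, direct, inputs = NODES[name]
    return direct + sum(amount * total_impact(child) for child, amount in inputs)


def grouped_leaves(root, max_level=2, cutoff=0.1):
    """Fractional share of the total impact, grouped by CPC label."""
    total = total_impact(root)
    groups = defaultdict(float)
    groups["Direct emissions"] += NODES[root][1]

    def visit(name, amount, level):
        cpc, direct, inputs = NODES[name]
        cumulative = amount * total_impact(name)
        too_deep = level >= max_level
        too_small = abs(cumulative) < cutoff * abs(total)
        if too_deep or too_small or not inputs:
            groups[cpc] += cumulative  # leaf: count its whole supply chain
            return
        groups[cpc] += amount * direct  # keep direct emissions so shares sum to one
        for child, child_amount in inputs:
            visit(child, amount * child_amount, level + 1)

    for child, amount in NODES[root][2]:
        visit(child, amount, 1)
    return {label: value / total for label, value in groups.items()}


shares = grouped_leaves("printing")
for label, share in shares.items():
    print(f"{share:.3f}  {label}")
```

Electricity enters both directly and through paper, and both contributions end up in the same group, which is the point of grouping by product classification.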

You can also use this function to explore the supply chain of a single activity. This is helpful when databases like ecoinvent provide many similar input activities (e.g. many electricity providers), because grouping by product classification makes the result easier to read.

In [None]:
offset_printing = bd.get_node(
    name="offset printing, per kg printed paper",
    location='CH',
    database="ei 3.8 cutoff",
)
offset_printing

In [None]:
ba.compare_activities_by_grouped_leaves(
    [offset_printing], 
    ipcc,
    output_format="html",
)

In this particular instance, it was interesting for me to see the high fraction of impact coming from the use of aluminium, which (probably) isn't the first thing one thinks of when considering printing.