# 1 - The supply chain graph

This notebook describes how supply chain graphs can be entered, edited, and used in the latest version of the Brightway LCA framework. It follows our recommended practice for using Brightway. When in doubt about which commands to use, please [check the cheat sheet](https://docs.brightway.dev/en/latest/content/cheatsheet/index.html), and let us know if you think something there is missing.

The most common approach in life cycle assessment, and the approach that Brightway uses, is to model systems as [graphs](https://en.wikipedia.org/wiki/Graph_(discrete_mathematics)). Graphs have nodes and edges, and data attributes can be attached to both nodes and edges. Here is an example of a graph for a simple product system making a bicycle and the associated LCA objects:

<img src="images/supply-chain-simple.png">

In our graph, edges are [directed](https://en.wikipedia.org/wiki/Graph_(discrete_mathematics)#Directed_graph) - each edge has a _source_ and a _target_. 

We have also added types to both the nodes and edges. These types are metadata, just like names, units, and locations. This metadata helps us make sense of the graph in the context of LCA.
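To make the idea concrete, here is a minimal plain-Python sketch of a typed, directed graph. It is not Brightway's data model: the `Node` and `Edge` classes, the `kind` field, and the edge orientation (process to product for production, product to process for consumption) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    kind: str  # hypothetical type label, e.g. "process" or "product"


@dataclass(frozen=True)
class Edge:
    source: Node
    target: Node
    kind: str  # hypothetical type label, e.g. "production" or "consumption"
    amount: float


bicycle = Node("bicycle", "product")
bike_production = Node("bike production", "process")
carbon_fibre = Node("carbon fibre", "product")

edges = [
    # The process produces the bicycle: edge points from process to product
    Edge(bike_production, bicycle, "production", 1.0),
    # The process consumes carbon fibre: edge points from product to process
    Edge(carbon_fibre, bike_production, "consumption", 2.5),
]

# Because edges are directed, we can ask "what flows into this node?"
inputs_to_bike_production = [e.source.name for e in edges if e.target == bike_production]
print(inputs_to_bike_production)
```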

Brightway has a suggested set of labels to use for metadata for the different node and edge types - see the [Brightway interface schemas](https://github.com/brightway-lca/bw_interface_schemas/blob/main/bw_interface_schemas/models.py).

This graph can be logically divided into processes (things that do something: action words, or verbs) and products and elementary flows (the physical things being acted upon, or nouns):

<img src="images/bipartite.png">

Brightway is designed around flexibility, and so this partition is a convention, not a rule. However, for the sake of your and our sanity, we strongly recommend following this pattern :)

It can be helpful to split an LCA graph into a set of [subgraphs](https://en.wikipedia.org/wiki/Glossary_of_graph_theory#subgraph). For example, we might want to separate the work of two analysts, or separate a foreground and a background system. In Brightway, a subgraph is called a `Database`. Let's add a `Database` to our graph:

<img src="images/with-database.png">

A `Database` is just a collection of nodes - it can be large or small, there aren't any general rules. Edges don't belong to a database, as they can cross from one database to another.
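As a rough plain-Python sketch of this partition (the database names `foreground` and `background` are made up, and this is not how Brightway stores anything), each node records its database while edges stay free to connect nodes in different ones:

```python
# Each node records which database (subgraph) it belongs to
nodes = {
    "bicycle": "foreground",
    "bike production": "foreground",
    "carbon fibre": "background",
}

# Edges are (source, target) pairs and carry no database of their own
edges = [
    ("bike production", "bicycle"),
    ("carbon fibre", "bike production"),
]

# An edge crosses databases when its endpoints live in different subgraphs
crossing = [(s, t) for s, t in edges if nodes[s] != nodes[t]]
print(crossing)
```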

In Brightway, we currently have "graphy"-type methods to access nodes and edges, and non-"graphy"-type methods to access database relationships and LCIA, but you should be thinking of them mentally as being part of a large graph. Let's make that graph:

In [None]:
import bw2data as bd

In [None]:
db = bd.Database("Demo")
# Let the metadata system know this database exists. Not necessary if using a `bw2io` importer.
db.register(hi_mom=True, language="Singlish")

Registering a database allows us to find it in our registry of databases (sorry if that was a bit recursive!):

In [None]:
bd.databases

Creating a `Database` also created its metadata:

In [None]:
db.metadata

Our first two nodes are the bicycle itself and the bicycle production activity.

Here we use some [fixed values for type labels](https://github.com/brightway-lca/brightway2-data/blob/main/bw2data/configuration.py). This is preferable to entering the strings ourselves, as it avoids human error and inconsistency.

In [None]:
bicycle = db.new_node(
    name="bicycle",
    unit="number",
    type=bd.labels.product_node_default,
)

bike_production = db.new_node(
    name="bike production",
    location="DK",
    type=bd.labels.process_node_default,
)

bicycle.save()
bike_production.save()

*Question* for reflection: Why does the bicycle production have a location and not a unit (and the opposite for the bicycle)?

*Question*: What other attributes like location could we add to bicycle production?

Let's add the rest of the life cycle inventory:

In [None]:
natural_gas = db.new_node(
    name='natural gas',
    unit='megajoule',
    type=bd.labels.product_node_default,
)
natural_gas_production = db.new_node(
    name='natural gas production',
    location='NO',
    type=bd.labels.process_node_default,
)

natural_gas.save()
natural_gas_production.save()

Brightway doesn't enforce any uniqueness constraints on fields like `name`. The only fields that must be unique are a combination of the `database` and the node `code`. If we specify the `code` ourselves, we can run the same cell twice safely:

In [None]:
cf_production = db.new_node(
    code="cf-production",
    name='carbon fibre production',
    location='DE',
    type=bd.labels.process_node_default,
)
cf = db.new_node(
    code="cf",
    name='carbon fibre',
    unit="kilogram",
    type=bd.labels.product_node_default,
)


cf_production.save()
cf.save()
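The role of the `(database, code)` pair can be sketched with a plain dictionary. This is only an illustration of the keying idea: the `store` dictionary and `add_node` helper are hypothetical, and how Brightway itself reacts to a repeated key (overwrite, skip, or raise) is not modelled here.

```python
store = {}  # maps (database, code) -> node data; a stand-in for the real database


def add_node(database, code, **data):
    """Add a node, refusing to create a second one with the same key."""
    key = (database, code)
    if key in store:
        return False  # already present: nothing new is created
    store[key] = data
    return True


# Two nodes may share a name as long as their codes differ
add_node("Demo", "cf-1", name="carbon fibre")
add_node("Demo", "cf-2", name="carbon fibre")

# "Running the cell again" with the same code does not add a third node
created_again = add_node("Demo", "cf-1", name="carbon fibre")
print(len(store), created_again)
```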

Brightway allows you complete flexibility to store any additional fields that you want on nodes and edges, but our recommendation is to use the following for fields outside the base set of fields given in [bw_interface_schemas](https://github.com/brightway-lca/bw_interface_schemas/blob/main/bw_interface_schemas/models.py):

* `documentation`
    * `dict[str, str]`, e.g. `node['documentation'] = {"treatment_standards_routes": "from processing of high-energy waste"}`
    * For documentation fields outside the general comment
* `tags`
    * `dict[str, JsonValue]`, e.g. `node['tags'] = {"CN 2024": "http://data.europa.eu/xsp/cn2024/681511000080", "start_year": 2024}`
    * For items which come from a pre-defined and finite set of possible values
* `attrs`
    * `dict[str, JsonValue]`
    * For items whose values are different for each node, and/or are not known in advance or chosen from a given list
* `properties`
    * `dict[str, JsonValue]`, e.g. `node['properties'] = {"carbon_mass_fraction": 0.5}`
    * Quantitative measure of the process or product

Be careful setting `node['tags'] = {<something>}` as this will overwrite any data that was already given as `tags`. If you aren't sure if `tags` exists, you can do `node.setdefault('tags', {})['<something>'] = '<something>'` ([`setdefault` documentation](https://docs.python.org/3/library/stdtypes.html#dict.setdefault)). 
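The difference is easy to see with plain dictionaries standing in for nodes (the tag names below are invented for the example):

```python
node = {"name": "carbon fibre", "tags": {"start_year": 2024}}

# Plain assignment replaces the whole dictionary and drops "start_year"
overwritten = dict(node)
overwritten["tags"] = {"material": "composite"}

# setdefault returns the existing dictionary (or inserts an empty one),
# so keys that were already there survive
node.setdefault("tags", {})["material"] = "composite"

# It also works when the key is missing entirely
bare = {"name": "bicycle"}
bare.setdefault("tags", {})["material"] = "steel"

print(overwritten["tags"], node["tags"], bare["tags"])
```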

In [None]:
co2 = db.new_node(
    name="Carbon Dioxide", 
    context=('air',),
    tags={'CAS Number': '124-38-9'},
    unit='kilogram',
    type=bd.labels.biosphere_node_default,
)

co2.save()

We also need to create edges between the nodes. We can do this in many ways; here is one. Let's add the production of products by processes:

In [None]:
bike_production.new_edge(
    amount=1,
    input=bicycle,
    type=bd.labels.production_edge_default,
).save()
cf_production.new_edge(
    amount=1,
    input=cf,
    type=bd.labels.production_edge_default,
).save()
natural_gas_production.new_edge(
    amount=1,
    input=natural_gas,
    type=bd.labels.production_edge_default,
).save()    

Using `input` for the product of a production edge is a bit weird, since the product is an output of the process - this will change as it is incorrect.

If we run the same cell multiple times, we can create duplicate exchanges. Brightway **will allow** you to create multiple edges between the same source and target.

*Question* for reflection: Why does Brightway allow such seemingly duplicate edges? Does this reflect real-world conditions?

Brightway has a utility function to fix these errors:

In [None]:
db.delete_duplicate_exchanges()
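To get a feel for what "duplicate" means, here is a plain-Python sketch that finds and removes repeated edges. It assumes two edges are duplicates when every field matches; which fields Brightway actually compares is not covered in this notebook.

```python
from collections import Counter

# Each edge is (source, target, type, amount); the second entry repeats the first
edges = [
    ("bike production", "bicycle", "production", 1),
    ("bike production", "bicycle", "production", 1),
    ("carbon fibre", "bike production", "consumption", 2.5),
]

counts = Counter(edges)
duplicates = [edge for edge, n in counts.items() if n > 1]

# Keep the first occurrence of each edge, preserving order
deduplicated = list(dict.fromkeys(edges))
print(duplicates, len(deduplicated))
```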

Now let's add the material and energy inputs:

In [None]:
bike_production.new_edge(
    amount=2.5, 
    type=bd.labels.consumption_edge_default,
    input=cf
).save()

What about some uncertainty? We use [stats_arrays](https://stats-arrays.readthedocs.io/en/latest/) to model probability distribution functions.

In [None]:
cf_production.new_edge(
    amount=237.3,  # plus 58 kWh of electricity, in ecoinvent 3.8 
    uncertainty_type=5, 
    minimum=200, 
    maximum=300, 
    type=bd.labels.consumption_edge_default,
    input=natural_gas,
).save()
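In `stats_arrays`, `uncertainty_type=5` is the triangular distribution. To see what such a distribution looks like, we can sample one directly with NumPy. This is a sketch, not what Brightway does internally, and it assumes the mode of the triangle is the edge `amount`, as no separate `loc` is given above.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Triangular distribution: lower bound, mode, upper bound
# Assumption: the mode is taken to be the edge's `amount` (237.3)
samples = rng.triangular(left=200, mode=237.3, right=300, size=10_000)

# All samples stay inside [200, 300]; the mean is pulled above the mode
# because the right-hand tail is longer
print(samples.min(), samples.mean(), samples.max())
```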

And our emission of carbon dioxide:

In [None]:
cf_production.new_edge(
    amount=26.6, 
    uncertainty_type=5, 
    minimum=26,
    maximum=27.2, 
    type=bd.labels.biosphere_edge_default,
    input=co2,
).save()

This is a life cycle inventory, and is enough to do an LCI calculation. Let's do a quick calculation:

In [None]:
import bw2calc as bc

Don't worry about the syntax here right now, we will talk about it later.

In [None]:
functional_unit, data_objs, _ = bd.prepare_lca_inputs({bicycle: 1}, remapping=False)

In [None]:
lca = bc.LCA(demand=functional_unit, data_objs=data_objs)
lca.lci()

*Question*: How much CO2 should be emitted by our functional unit? You can do this calculation by examining the graph manually.

In [None]:
lca.inventory[lca.dicts.biosphere[co2.id], :].sum()
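The same number can be reproduced by hand with a few lines of NumPy. This sketch builds the technosphere and biosphere matrices for our three processes directly from the amounts entered above; the row and column order is our own choice, not something Brightway guarantees.

```python
import numpy as np

# Columns are processes: bike production, carbon fibre production, natural gas production
# Rows are products: bicycle, carbon fibre (kg), natural gas (MJ)
# Production is positive, consumption is negative
technosphere = np.array([
    [1.0, 0.0, 0.0],
    [-2.5, 1.0, 0.0],
    [0.0, -237.3, 1.0],
])

# One row per elementary flow (here only CO2, in kg), one column per process
biosphere = np.array([[0.0, 26.6, 0.0]])

demand = np.array([1.0, 0.0, 0.0])  # one bicycle

# How much each process must run to satisfy the demand
supply = np.linalg.solve(technosphere, demand)
co2_total = (biosphere @ supply).item()
print(supply, co2_total)
```

One bicycle needs 2.5 kg of carbon fibre, and each kilogram emits 26.6 kg of CO2, so the total is 2.5 × 26.6 = 66.5 kg.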

## Exercise

Create a new bicycle made of steel. You will need some coal and some iron ore mining for the steel, and some steel for the bicycle.

Your use of steel consumed some iron ore, a natural resource. In a **new database**, create a biosphere flow for this iron ore, and add the iron ore flow. You might not be sure about the numbers - you can reflect that in the uncertainty you assign to the exchanges.

Finally, if we want to do impact assessment, we need some nodes to represent impact categories, and edges to represent characterization factors:

<img src="images/with-lcia.png">

## LCIA

To define characterization nodes and edges, we use a different data structure:

In [None]:
import stats_arrays as sa

In [None]:
ipcc = bd.Method(('IPCC',))
ipcc.write([
    (co2, {'amount': 1, 'uncertainty_type': sa.NormalUncertainty.id, 'loc': 1, 'scale': 0.05}),
])

We can now do a full LCIA, not just an LCI:

In [None]:
functional_unit, data_objs, _ = bd.prepare_lca_inputs({bicycle: 1}, method=('IPCC',), remapping=False)

In [None]:
lca = bc.LCA(demand=functional_unit, data_objs=data_objs)
lca.lci()
lca.lcia()
lca.score

To use uncertainty, we tell the `LCA` object to use the probability distributions:

In [None]:
import pandas as pd

In [None]:
lca = bc.LCA(demand=functional_unit, data_objs=data_objs, use_distributions=True)
lca.lci()
lca.lcia()

df = pd.DataFrame([{'score': lca.score} for _ in zip(lca, range(25))])
df.hist()
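For intuition, the spread in that histogram can be approximated by hand. The sketch below samples the two uncertain quantities that affect the score independently with NumPy; it assumes the triangular mode equals the edge `amount`, and it is not how `bw2calc` draws its samples.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n = 10_000

carbon_fibre_per_bike = 2.5  # kg, entered without uncertainty

# Triangular CO2 emission per kg carbon fibre; mode assumed equal to the amount
co2_per_kg = rng.triangular(left=26, mode=26.6, right=27.2, size=n)

# Normally distributed characterization factor, as written to the method
characterization_factor = rng.normal(loc=1, scale=0.05, size=n)

scores = carbon_fibre_per_bike * co2_per_kg * characterization_factor
print(scores.mean(), scores.std())
```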

## Exercise

Create a new LCIA method for your iron ore consumption. Calculate the LCIA result you should get, and then verify that you have the correct value.

## Switching graphs - `projects`

A Brightway `project` is a separate graph - completely self-contained, and independent of other projects. This independence can lead to data duplication, but helps keep each project safe from the changes in the others.

We start in the `default` project:

In [None]:
bd.projects.current

We can change to a new project:

In [None]:
bd.projects.set_current("Another project")
bd.projects.current

We now don't have any databases or any other data in the graph:

In [None]:
bd.databases

Let's switch back to our default project. It isn't best practice to do data development in the `default` project, so let's rename it:

In [None]:
bd.projects.set_current("default")
bd.projects.rename_project("class-exercise-bicycle")

During development, it is very convenient to just delete everything and rerun the notebook to get the right data. Let's make a shortcut to purge the project:

In [None]:
I_SCREWED_UP = False

if I_SCREWED_UP:
    try:
        bd.projects.delete_project("class-exercise-bicycle", delete_dir=True)
    except ValueError:
        pass

## Searching through the database

In addition to storing and using nodes and edges, our graph database can be searched in different ways. Let's show this with a larger database.

We can use a shortcut to install some data:

In [None]:
import bw2io as bi
bi.install_project("USEEIO-1.1", overwrite_existing=True)

# If that doesn't work for whatever reason, we can import the original data with this:
# bi.useeio11()

In [None]:
bd.projects.set_current("USEEIO-1.1")

In [None]:
bd.databases

In [None]:
db = bd.Database("USEEIO-1.1")
db.name

We can search with the `search` method.

In [None]:
fun = db.search("amusement")[0]
fun['name'] = 'fun'
fun.save()

In [None]:
db.search('amusement')

In [None]:
{node['name'] for node in db if node['type'] == 'product'}

## Interacting with the graph

In [None]:
moo = bd.get_node(name='Cattle ranches and feedlots', type='product')

In [None]:
type(moo) == bd.Node

We can assign any attributes to nodes (and to edges).

In [None]:
moo['moo'] = 'loud'

Note that these attributes are not saved to the database by default - we have to tell Brightway to save changed data!

We have some attributes which are common to all inventory databases:

In [None]:
moo['categories'], moo['location'], moo['unit']

The node classes act like dictionaries, and raise an error for missing keys:

In [None]:
moo['missing']
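A plain dictionary shows the same behaviour. The sketch below uses a `dict` as a stand-in for a node (the values are invented); if the node class follows the usual mapping interface, `.get` with a fallback is a way to avoid the error.

```python
node = {"name": "example product", "unit": "kilogram"}  # stand-in for a node

# Square-bracket access raises KeyError for a missing key
try:
    node["missing"]
    raised = False
except KeyError:
    raised = True

# .get returns a fallback instead of raising
fallback = node.get("missing", "not set")
print(raised, fallback)
```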

## Exercise

Iterate through all `product` nodes in the US EEIO and tag every node whose combined name and unit is more than 40 characters long with `"long" = True`.

## Exchange iterators

We have shortcuts to traverse the supply chain graph. For inputs, we have `.technosphere()` and `.biosphere()`; for producing edges, we have `.producers()`, and for edges to other nodes which consume the outputs of our node there is `.consumers()`. You can also get all edges with `.edges()`.

These are all [iterators](https://jakevdp.github.io/WhirlwindTourOfPython/10-iterators.html).
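As a reminder of how iterators behave in general Python, here is a small generator standing in for an edge iterator. The `edge_amounts` function is hypothetical, and Brightway's own objects may well allow repeated iteration; the point is only that wrapping an iterator in `list()` is the usual way to see everything at once.

```python
def edge_amounts():
    """A generator standing in for an edge iterator."""
    yield 2.5
    yield 237.3


iterator = edge_amounts()
first_pass = list(iterator)   # consumes the generator
second_pass = list(iterator)  # nothing left to yield
print(first_pass, second_pass)
```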

In [None]:
bd.projects.set_current("class-exercise-bicycle")

In [None]:
list(cf_production.technosphere())

In [None]:
list(cf_production.biosphere())

In [None]:
list(cf.consumers())

In [None]:
list(cf_production.producers())

The US EEIO works the same way; it is normalized to the production of one USD. It can be interesting to sum the costs of the inputs:

In [None]:
bd.projects.set_current("USEEIO-1.1")

In [None]:
sum([o['amount'] for o in bd.get_node(name='Cattle ranches and feedlots', type='process').technosphere()])

## Contribution

Let's show a little bit of what Brightway can do. We can compare the correlation of LCA scores across a variety of categories.

There is an automatic way to do this in Brightway, but we can also program it manually to see how it works.

Stop for a bit and think about what one would need to calculate LCA scores for 380 products and ~10 impact categories.

In [None]:
products_in_order = [obj for obj in db if obj['type'] == 'product']
categories_in_order = [method for method in bd.methods if method[0] == 'Impact Potential']

In general, in Brightway there is *one secret* to getting good performance: Don't rebuild matrices unless you really need to. Rebuilding a matrix is not slow, but the time adds up if you do it a lot. But most importantly, if you are using `pypardiso` (normally everything except for ARM machines), and you keep the technosphere matrix the same, it will secretly remember all the preparation work it did to solve the linear system and you will get a factor of 100 speed increase on subsequent calculations.
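The "do the preparation once" idea can be illustrated with SciPy alone. This sketch uses `scipy.sparse.linalg.factorized` on the small bicycle technosphere matrix from earlier; it stands in for what a solver such as `pypardiso` does behind the scenes and is not Brightway's own code.

```python
import numpy as np
from scipy.sparse import csc_matrix
from scipy.sparse.linalg import factorized, spsolve

technosphere = csc_matrix(np.array([
    [1.0, 0.0, 0.0],
    [-2.5, 1.0, 0.0],
    [0.0, -237.3, 1.0],
]))

# Factorize once; the returned function reuses that work for every new demand
solve = factorized(technosphere)

demands = np.eye(3)  # one unit of each product in turn
supplies = [solve(demand) for demand in demands]

# Same answers as solving from scratch each time
reference = [spsolve(technosphere, demand) for demand in demands]
print(np.allclose(supplies, reference))
```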

So, in this case we will use one `LCA` object, and use the `lci` method repeatedly.

In [None]:
import numpy as np

results = np.zeros((len(products_in_order), len(categories_in_order)))

def get_lcia_scores(products, categories, results):
    lca = bc.LCA({products[0]: 1}, categories[0])
    lca.lci()
    lca.lcia()
    
    method_matrices = [lca.characterization_matrix.copy()]
    
    for other_method in categories[1:]:
        # Only build each characterization matrix once instead of once per product
        lca.switch_method(other_method)
        method_matrices.append(lca.characterization_matrix.copy())
    
    for i, product in enumerate(products):
        lca.lci({product.id: 1})
        for j, characterization_matrix in enumerate(method_matrices):
            results[i, j] = (characterization_matrix * lca.inventory).sum()
    
    return results

In [None]:
from time import time

start = time()
results = get_lcia_scores(products_in_order, categories_in_order, results)
print(time() - start)

## Exercise

Compare how long it would take to calculate LCA results for 3 products and 3 impact categories if you had to create a new LCA object each time.

In [None]:
from scipy import stats

def create_correlation_matrix(scores_array):
    num_methods = scores_array.shape[1]
    correlations = np.zeros((num_methods, num_methods))

    for row in range(num_methods):
        for col in range(num_methods):
            if col <= row:
                continue                               # Only need to compute correlation once
            dataset_1 = scores_array[:, row]
            dataset_2 = scores_array[:, col]
            mask = (dataset_1 != 0) * (dataset_2 != 0) # Ignore activities that have zero score
            corr = stats.kendalltau( # Get tau value, drop p-statistic
                dataset_1[mask], 
                dataset_2[mask]
            )[0]
            if np.isnan(corr):
                correlations[row, col] = 0
            else:
                correlations[row, col] = corr

    correlations = correlations + correlations.T       # Fill in the lower-left triangle to make sorting easier
    return correlations

In [None]:
correlation_matrix = create_correlation_matrix(results)
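To check our understanding of the masking and of Kendall's tau, here is a toy example with made-up scores. After dropping the row where one score is zero, the two remaining series rank the items identically, so tau is 1.

```python
import numpy as np
from scipy import stats

scores_a = np.array([1.0, 2.0, 0.0, 4.0, 5.0])
scores_b = np.array([2.0, 3.0, 7.0, 9.0, 11.0])

# Drop rows where either score is zero, as in the function above
mask = (scores_a != 0) & (scores_b != 0)

# Index 0 is the tau statistic; index 1 is the p-value
tau = stats.kendalltau(scores_a[mask], scores_b[mask])[0]
print(tau)
```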

In [None]:
%matplotlib inline

In [None]:
import matplotlib.pyplot as plt

fig = plt.gcf()
fig.set_size_inches(12, 12)

masked_correlation = np.ma.array(correlation_matrix, mask=correlation_matrix == 0).T
plt.pcolor(masked_correlation, cmap=plt.cm.cubehelix_r)
plt.colorbar()
plt.ylim(None, correlation_matrix.shape[1])
plt.xlim(None, correlation_matrix.shape[0])
tick_positions = np.arange(len(categories_in_order)) + 0.5  # centre each label on its cell
tick_labels = [obj[1] for obj in categories_in_order]
plt.xticks(tick_positions, tick_labels)
plt.yticks(tick_positions, tick_labels)
plt.tight_layout()

In [None]:
for category in categories_in_order:
    print(category[1], bd.methods[category]['description'])