# What's new in Brightway 2.5

## Backwards compatibility

Compatibility with Brightway 2 has been maintained whenever possible, but there are a few cases where compatibility could not be kept. These are noted **in bold** when they occur, and listed below:

* In `bw2data`, `Database.get()` and `Database().get()` are no longer supported. Use `get_node(database="something", **other_filters)` instead.
* In `bw2calc`, the `LCA` class now takes over responsibility for all types of LCA calculations, including Monte Carlo. Depending on the type of sampling strategy desired, use `LCA(use_distributions=True)` or `LCA(use_arrays=True)` instead of `MonteCarloLCA`.

Before we get started, let's install a simple database for examples:

In [None]:
import bw2data as bd
import bw2calc as bc
import bw2io as bi

In [None]:
bd.projects.set_current("2.5 examples in action")

In [None]:
bi.add_example_database()

# `bw2data`

## `Node` and `Edge`

In addition to `process` and `activity`, we now have `node`. Before you throw things at the wall (how many names can they come up with?), let me explain. `bw2data` uses what is essentially a graph database, with two main tables: Nodes and edges (the actual table names are `ActivityDataset` and `ExchangeDataset`). Nodes can serve as processes/activities, but are also elementary/biosphere flows, and anything else we want to store in the database (logical relationships, impact assessment, named parameters, etc). This isn't to say that you store everything in the Brightway database, but `node` is clearly a better name than `process` for e.g. CO2. Here are the node accessors:

### `bw2data.get_node()`

`get_node` behaves differently than `get_activity`. `get_activity` assumes an input of a Brightway key - a combination of database and code. This won't work with `get_node`:

In [None]:
bd.get_activity(('Mobility example', 'Steel'))

In [None]:
bd.get_node(key=('Mobility example', 'Steel'))

The reason why this doesn't work is that `get_node` only looks for specific attributes of the `node` itself, not composite ones like the `key`. If you need to pass a key, use `get_activity`; otherwise, rewrite your query:

In [None]:
bd.get_node(database='Mobility example', code='Steel')

We can also filter on other attributes, both those stored as "core" attributes (code, database, name, product, type, location):

In [None]:
bd.get_node(name='Steel')

But also other arbitrary attributes:

In [None]:
steel = bd.get_node(name='Steel')
steel['foo'] = 'bar'
steel.save()
bd.get_node(foo='bar')

`get_node` will raise `bw2data.errors.UnknownObject` if no node can be found that matches the given filters, or `bw2data.errors.MultipleResults` if more than one node fits the given filters.
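
To make the "exactly one match" rule concrete, here is a minimal pure-Python sketch. It does not use `bw2data` at all: `toy_get_node`, the `NODES` list, and the two exception classes are hypothetical stand-ins, and the real implementation queries the SQLite database instead of a list.

```python
# Toy stand-ins for the real exceptions in `bw2data.errors`
class UnknownObject(Exception):
    pass


class MultipleResults(Exception):
    pass


# Hypothetical in-memory "database"; real nodes live in SQLite
NODES = [
    {"database": "Mobility example", "code": "Steel", "name": "Steel", "foo": "bar"},
    {"database": "Mobility example", "code": "CO2", "name": "CO2"},
    {"database": "Mobility example", "code": "Electric car", "name": "Electric car"},
]


def toy_get_node(**filters):
    """Return the single node whose attributes match every filter."""
    matches = [
        node
        for node in NODES
        if all(node.get(field) == value for field, value in filters.items())
    ]
    if not matches:
        raise UnknownObject(f"No node matches {filters}")
    if len(matches) > 1:
        raise MultipleResults(f"Found {len(matches)} nodes matching {filters}")
    return matches[0]


steel = toy_get_node(name="Steel")

try:
    toy_get_node(database="Mobility example")  # three candidates, so ambiguous
except MultipleResults as err:
    ambiguous = str(err)

try:
    toy_get_node(name="Unobtainium")  # no candidates at all
except UnknownObject as err:
    missing = str(err)
```

In practice this means that a filter which is too loose is an error, not a "first result wins" lookup, so add filters until the match is unique.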

You can also use `Database().get_node()`; it works the same way. However, **`Database.get()`** is deprecated, as this is now a [core Peewee method](https://docs.peewee-orm.com/en/latest/peewee/api.html#Model.get).

## `get_id` and the removal of `mapping`

Previously, the `mapping` object was stored as a pickle file, and linked activity/process/node *keys* to *integer ids*. This was very dumb, as we already have an indexed mapping of these objects to unique integer ids in the database itself: the primary key `id` column. `mapping` still exists, but only as a compatibility layer; instead, use `node.id` and `get_id()`:

In [None]:
steel.id

In [None]:
bd.get_id(steel)

In [None]:
bd.get_id(steel.key)

In [None]:
bd.mapping[steel.key]

In [None]:
str(bd.mapping)
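
The idea of a compatibility layer on top of the primary key can be sketched with the standard library alone. Everything here is a toy: `toy_get_id` and `ToyMapping` are hypothetical names, the table only has the key columns, and the real `bw2data` code is organised differently.

```python
import sqlite3

# Toy node table with the same key columns as `activitydataset`
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE activitydataset ('
    '"id" INTEGER NOT NULL PRIMARY KEY, '
    '"code" TEXT NOT NULL, '
    '"database" TEXT NOT NULL)'
)
conn.executemany(
    'INSERT INTO activitydataset ("database", "code") VALUES (?, ?)',
    [("Mobility example", "Steel"), ("Mobility example", "CO2")],
)


def toy_get_id(key):
    """Look up the primary key for a (database, code) tuple."""
    row = conn.execute(
        'SELECT "id" FROM activitydataset WHERE "database" = ? AND "code" = ?',
        key,
    ).fetchone()
    if row is None:
        raise KeyError(key)
    return row[0]


class ToyMapping:
    """Dict-like wrapper kept only so that old `mapping[key]` code keeps working."""

    def __getitem__(self, key):
        return toy_get_id(key)


mapping = ToyMapping()
steel_key = ("Mobility example", "Steel")
print(toy_get_id(steel_key), mapping[steel_key])
```

Both calls return the same integer because there is only one source of truth, the `id` column, and no separate file that can drift out of sync.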

There is an important implication of removing `mapping` - it is **no longer possible to reference non-existent nodes in edges**. This was allowed previously to allow for a very high degree of flexibility, but this is no longer technically possible, nor is it reasonable, as it is much too easy to cause unintentional errors.

## `bw2data` uses the database columns differently

The database schema for nodes (`ActivityDataset`) is:

```SQL
CREATE TABLE "activitydataset" (
    "id" INTEGER NOT NULL PRIMARY KEY, 
    "data" BLOB NOT NULL, 
    "code" TEXT NOT NULL, 
    "database" TEXT NOT NULL, 
    "location" TEXT, 
    "name" TEXT, 
    "product" TEXT, 
    "type" TEXT
)
```

Previously, when loading or saving rows to this table, all data including `code`, `database`, `location`, `name`, `product`, and `type` was serialized to the `data` blob as a [pickle](https://docs.python.org/3/library/pickle.html). This made loading the Brightway objects easy, but it effectively made the other columns read-only - changes made to the database directly were not propagated when loading Brightway objects. This has now changed: we use the values in the database columns, so you can use them directly:

In [None]:
from bw2data.backends import ActivityDataset as AD

In [None]:
AD.update(name="Wow, this is some car!").where(AD.name == 'Combustion car').execute()

In [None]:
bd.get_node(code="Steel")
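
If the difference between the old and new behaviour is hard to picture, the following standalone sketch reproduces it with `sqlite3` and `pickle` only. The table is a reduced version of the schema above, the record is made up, and the way the column values are merged over the blob is an assumption for illustration, not the actual `bw2data` loading code.

```python
import pickle
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE activitydataset ('
    '"id" INTEGER NOT NULL PRIMARY KEY, "data" BLOB NOT NULL, '
    '"code" TEXT NOT NULL, "database" TEXT NOT NULL, "name" TEXT)'
)

# Hypothetical record, stored both in the columns and in the pickled blob
record = {"database": "Mobility example", "code": "Combustion car", "name": "Combustion car"}
conn.execute(
    'INSERT INTO activitydataset ("data", "code", "database", "name") VALUES (?, ?, ?, ?)',
    (pickle.dumps(record), record["code"], record["database"], record["name"]),
)

# A bulk update that only touches the `name` column
conn.execute(
    'UPDATE activitydataset SET "name" = ? WHERE "name" = ?',
    ("Wow, this is some car!", "Combustion car"),
)

blob, code, database, name = conn.execute(
    'SELECT "data", "code", "database", "name" FROM activitydataset'
).fetchone()

# Old behaviour (as described above): trust only the pickled blob
old_style = pickle.loads(blob)

# New behaviour (sketch): column values take precedence over the blob
new_style = dict(pickle.loads(blob), code=code, database=database, name=name)

print(old_style["name"])  # still the stale name
print(new_style["name"])  # reflects the UPDATE
```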

Why should you care?

For one thing, in Brightway 3, there will no longer be an `ActivityDataset` and a separate `Activity` class; rather, they will be unified, and we will be able to use [peewee](https://docs.peewee-orm.com/en/latest/index.html) query methods natively. This won't necessarily be easier in all cases, but will expose more functionality, and will use less magic to hide the underlying database schema, which is better hygiene in the long run.

But learning to write SQL is also a good idea in itself - it is a different way of thinking about data, and things like bulk updates are always nice (unless they go wrong 😛).

## More powerful `Activity` attribute lookups

Some `Activity` objects store data for things like industry or product classifications, or properties like price or carbon content, but these can be awkward to retrieve:

In [None]:
steel = bd.get_node(name='Steel')
steel['properties'] = {'carbon content': {'amount': 0.01}}
steel['classifications'] = {'ISIC': {'code': '2410', 'system': 'ISIC Rev. 4'}}
steel.save()

In [None]:
[value for key, value in steel['properties'].items() if key == 'carbon content']

Instead, we can now just do:

In [None]:
steel['carbon content']

In [None]:
steel['ISIC']

Please note the following:

* `classifications` are looked up before `properties`.
* Looking up normal attributes (even arbitrary ones) happens before traversing the `classifications` and `properties`.

## Easier access to reference products

If you have a suitably formatted activity, you can do:

In [None]:
steel.rp_exchange()

This works by looking through all exchanges with the type `production`; if there is only one, that is returned; otherwise, return the exchange whose input name is the same as the node's `reference product`. Raises `ValueError` if no suitable exchange is found.
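
The selection rule can be written out as a small function over plain dictionaries. This is only a sketch of the logic described here: `toy_rp_exchange` and the `input_name` field are hypothetical, and the real method works on `Exchange` objects and looks up the input node's name itself.

```python
def toy_rp_exchange(node, exchanges):
    """Pick the reference product exchange from a list of exchange dicts."""
    candidates = [exc for exc in exchanges if exc["type"] == "production"]
    if len(candidates) == 1:
        return candidates[0]
    # Zero or several production exchanges: fall back to matching by name
    matches = [
        exc
        for exc in candidates
        if exc["input_name"] == node.get("reference product")
    ]
    if len(matches) == 1:
        return matches[0]
    raise ValueError("Can't find a unique reference product exchange")


# Hypothetical multi-output process
node = {"name": "Steel mill", "reference product": "Steel"}
exchanges = [
    {"type": "production", "input_name": "Steel", "amount": 1},
    {"type": "production", "input_name": "Slag", "amount": 0.2},
    {"type": "technosphere", "input_name": "Electricity", "amount": 3},
]

chosen = toy_rp_exchange(node, exchanges)
print(chosen["input_name"])
```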

We can also look up attributes of the reference product exchange:

In [None]:
exc = steel.rp_exchange()
exc['properties'] = {'iron content': 0.98}
exc.save()

In [None]:
steel['iron content']

This lookup occurs after the `classifications` and `properties` of the node itself.
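
Putting the whole lookup order together, a rough sketch could look like the following. It uses plain dictionaries and a hypothetical `toy_lookup` function; the real `Activity` lookup may handle more cases (for example other ways of storing classifications) than shown here.

```python
def toy_lookup(node, key, rp_exchange=None):
    """Resolve `key` in the order described in this section."""
    # 1. Normal attributes, including arbitrary ones
    if key in node:
        return node[key]
    # 2. `classifications`, then `properties`, of the node itself
    for section in ("classifications", "properties"):
        if key in node.get(section, {}):
            return node[section][key]
    # 3. `properties` of the reference product exchange
    if rp_exchange is not None and key in rp_exchange.get("properties", {}):
        return rp_exchange["properties"][key]
    raise KeyError(key)


steel = {
    "name": "Steel",
    "properties": {"carbon content": {"amount": 0.01}},
    "classifications": {"ISIC": {"code": "2410", "system": "ISIC Rev. 4"}},
}
exc = {"type": "production", "properties": {"iron content": 0.98}}

print(toy_lookup(steel, "name"))
print(toy_lookup(steel, "ISIC"))
print(toy_lookup(steel, "carbon content"))
print(toy_lookup(steel, "iron content", rp_exchange=exc))
```

One consequence of this order is that an ordinary attribute shadows a classification or property with the same label, so pick labels that don't collide.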

## Easier data cleanup

If you are building inventories manually, it is easy to accidentally add an exchange too many times:

In [None]:
steel, co2 = bd.get_node(name="Steel"), bd.get_node(name="CO2")

for _ in range(10):
    steel.new_edge(input=co2, amount=1.5, type="biosphere").save()

We can now easily clean up these duplicates:

In [None]:
bd.Database('Mobility example').delete_duplicate_exchanges()

In [None]:
for exc in steel.biosphere():
    print(exc)
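
Conceptually, the cleanup keeps one exchange from each group of identical exchanges. The sketch below shows that idea on plain dictionaries. Which fields define "identical" is an assumption here (input, output, `amount`, and `type`), and `toy_delete_duplicates` is a hypothetical name, so check the docstring of `delete_duplicate_exchanges` for the actual rule.

```python
def toy_delete_duplicates(exchanges, fields=("amount", "type")):
    """Keep the first exchange of each identical group and drop the rest."""
    seen = set()
    unique = []
    for exc in exchanges:
        # Assumed identity: same input, same output, same values for `fields`
        identity = (exc["input"], exc["output"]) + tuple(exc.get(f) for f in fields)
        if identity in seen:
            continue
        seen.add(identity)
        unique.append(exc)
    return unique


co2_edge = {
    "input": ("Mobility example", "CO2"),
    "output": ("Mobility example", "Steel"),
    "amount": 1.5,
    "type": "biosphere",
}
exchanges = [dict(co2_edge) for _ in range(10)]
exchanges.append(dict(co2_edge, amount=2.0))  # different amount, so not a duplicate

cleaned = toy_delete_duplicates(exchanges)
print(len(exchanges), "->", len(cleaned))
```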

As always, best practice is to always have a completely reproducible workflow, so that if things get messed up you can delete everything and regenerate the data from scratch!

## Filepaths are instances of `pathlib.Path`

`Path` objects are [pretty great](https://treyhunner.com/2018/12/why-you-should-be-using-pathlib/), you should [use them](https://docs.python.org/3/library/pathlib.html).

In [None]:
type(bd.projects.dir), type(bd.projects.logs_dir)

In [None]:
type(bd.Database('Mobility example').dirpath_processed()), type(bd.Database('Mobility example').filepath_processed())

## Easier access to `Datapackages`

Brightway `Datapackages` are a replacement for the previous processed arrays. They bring a range of new functionality, including the ability to be stored on many different kinds of physical or virtual filesystems using [PyFilesystem](https://docs.pyfilesystem.org/en/latest/). But this means that loading them requires the specification of a filesystem, which can be extra code, and a pain if you don't remember the exact command. Instead, you can use:

In [None]:
bd.Database('Mobility example').datapackage()

Note that, in combination with the above, local PyFilesystem `OSFS` objects [**need a string, not a `Path`**](https://github.com/PyFilesystem/pyfilesystem2/issues/238).
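
Converting is a one-liner. The sketch below only uses the standard library, and the directory is a made-up relative path standing in for something like the result of `dirpath_processed()`.

```python
import os
from pathlib import Path

# Hypothetical directory, standing in for a real Brightway data directory
processed_dir = Path("brightway-example") / "processed"

# Both conversions give the plain string form that string-only APIs expect
as_string = str(processed_dir)
also_string = os.fspath(processed_dir)

print(type(processed_dir).__name__, "->", type(as_string).__name__)
```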

## IOTable improvements

This is being rewritten in the `file-remover-progressive` branch, so will be demonstrated instead of provided as an example.

## Brightway ❤️ Pandas

The previous mix of functions for importing some data to dataframes has been replaced with a common set of methods which work with both `bw2data` and `bw2calc`.

They also work with IOTables.

There is a utility function to easily get the US EEIO database and its LCIA methods. US EEIO has its own biosphere flows, so we don't run `bw2setup`.

In [None]:
bi.useeio11()

Pick a product and an activity node at random.

In [None]:
product = next(node for node in bd.Database("US EEIO 1.1") if node['type'] == 'product')
activity = next(node for node in bd.Database("US EEIO 1.1") if node['type'] == 'process')
product, activity

The first dataframe is all the nodes (processes or activities) in the given database:

In [None]:
df = bd.Database("US EEIO 1.1").nodes_to_dataframe()
df

The columns come from the data attributes stored on the nodes. If one node has the attribute, it is added as a column. You can control which columns get returned, and how they are sorted; see the docstring.

This is a normal dataframe, so you can filter it, add or remove columns, and sort as desired.

In [None]:
df.columns

We can also list all the edges (exchanges) as a dataframe. This is normally too much information, and can take a bit of time to produce, but can be useful.

In [None]:
df = bd.Database("US EEIO 1.1").edges_to_dataframe()
df

Now we have standard column labels. As these are directed edges, they have a source and a target. Most of the columns should be self-explanatory. Note that we differentiate between `'target_reference_product'` and `'source_product'`, and only provide the `categories` on the `source`.

In [None]:
df.columns

If you want to add or remove columns, you can pass in an iterable of formatting functions. These functions must satisfy the following rules:

* They take the keyword arguments `node`, `edge`, and `row`.
* They modify the dictionary `row` in place. Any return value is ignored.
* `node` and `edge` are dictionaries following the [wurst internal format](https://wurst.readthedocs.io/#internal-data-format). `node` is the target, and `edge` is both attributes of the edge and of the source.

Here is a simple example:

In [None]:
def remove_target_database(node, edge, row):
    del row['target_database']
    
def food_sector(node, edge, row):
    row['is_food'] = 'food' in edge['name'].lower()

In [None]:
df = bd.Database("US EEIO 1.1").edges_to_dataframe(formatters=[remove_target_database, food_sector])
df

To save on memory, we turn some columns into categorical columns, where each unique value is only stored once.

In the case of `target_name`, the dataframe has more than 150,000 rows, but only 388 unique values.

You can skip the conversion to categorical columns by passing `categorical=False`.

In [None]:
df.dtypes

In [None]:
df['target_name']

We can also get a dataframe of the edges for a specific node. Here we get all edges, but you can filter this further with the edge constructors `.production()`, `.technosphere()`, and `.biosphere()`.

In [None]:
df = activity.exchanges().to_dataframe()
df

Same columns as before.

In [None]:
df.columns

We can also get dataframes for LCA calculation results.

In [None]:
lca = bc.LCA({product: 1}, method=('Impact Potential', 'HRSP'))
lca.lci()
lca.lcia()

By default, this method looks at the `characterized_inventory` matrix, and sorts by the top 200 values (using absolute value).

In [None]:
df = lca.to_dataframe()
df
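
The "top values by absolute value" selection is worth spelling out, because large negative entries rank as highly as large positive ones. Here is a pure-Python sketch on a made-up sparse matrix stored as a dictionary. `top_entries` and its `cutoff` argument are hypothetical names for illustration, not the `bw2calc` implementation.

```python
def top_entries(matrix_entries, cutoff=200):
    """Sort sparse matrix entries by absolute value and keep the largest ones."""
    ranked = sorted(
        matrix_entries.items(), key=lambda item: abs(item[1]), reverse=True
    )
    return ranked[:cutoff]


# Hypothetical characterized inventory as {(row, col): value}
characterized = {(0, 0): 0.5, (0, 1): -3.0, (1, 1): 2.0, (2, 0): -0.1}

top_three = top_entries(characterized, cutoff=3)
print(top_three)
# [((0, 1), -3.0), ((1, 1), 2.0), ((0, 0), 0.5)]
```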

The column labels are a bit different, as we don't have target and source but instead matrix rows and columns. The meaning of these rows and columns changes from matrix to matrix. The same pattern with `'row_product'`, `'col_reference_product'`, and `'row_categories'` applies though.

In [None]:
df.columns

We can get dataframes for any matrix. In standard LCA, the matrices are:

* inventory
* technosphere_matrix
* biosphere_matrix
* characterization_matrix
* characterized_inventory

Regionalization adds more matrices. Note that for other matrices you will need to specify the row and column mapping dictionaries, see the docstring.

In [None]:
lca.to_dataframe(matrix_label='biosphere_matrix')

# bw2calc

## Specify `data_objs` and new functional unit

The biggest change in Brightway 2.5 is the use of [bw_processing](https://github.com/brightway-lca/bw_processing) and [matrix_utils](https://github.com/brightway-lca/matrix_utils), which in turn leads to a new calling convention for LCA class instantiation. These new libraries allow for calculations to be shipped off to cloud servers, which means that they have no connection to `bw2data` or metadata about the meaning of any ID values - instead, they only solve linear problems.

The new calling convention is therefore functional unit **with the IDs for the nodes** and **datapackages as `data_objs`**. Here is an example:

In [None]:
product = next(node for node in bd.Database("US EEIO 1.1") if node['type'] == 'product')
product

In [None]:
database_dp = bd.Database("US EEIO 1.1").datapackage()
lcia_dp = bd.Method(('Impact Potential', 'HC')).datapackage()

In [None]:
lca = bc.LCA({product.id: 1}, data_objs=[database_dp, lcia_dp])
lca.lci()
lca.lcia()

In [None]:
lca.score

The old calling convention will still work, but only if you have `bw2data` installed, the correct project selected, etc.

## `bw2data.prepare_lca_inputs`

If you don't want to remember the new calling convention, you can use a helper function: `bw2data.prepare_lca_inputs`. It will return three things: A new demand dictionary, the datapackages, and (if `remapping=True`, the default) dictionaries to allow you to map matrix indices back to Brightway (database, code) keys.

In [None]:
fu, dps, remapping = bd.prepare_lca_inputs({bd.get_node(name='Steel'): 1}, ('IPCC', 'simple'))

In [None]:
fu

In [None]:
dps

In [None]:
remapping

## No automatic remapping

Previously, `bw2calc.LCA` would automatically change the integer values given in `bw2data.mapping` to keys. This is no longer the case, as we assume normal behaviour in the future is to prefer node IDs to keys, and also don't have any guarantees on whether `bw2data` is available. Therefore, **you need to call `LCA.remap_inventory_dicts()` manually**.

In [None]:
lca = bc.LCA(demand=fu, data_objs=dps, remapping_dicts=remapping)
lca.lci()
lca.lcia()
lca.demand

In [None]:
lca.dicts.product[bd.get_node(name='Steel').id]

In [None]:
lca.remap_inventory_dicts()

In [None]:
lca.demand

In [None]:
lca.dicts.product[bd.get_node(name='Steel').key]

In [None]:
list(bd.Database("Mobility example"))
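
What remapping does to a dictionary can be sketched in a few lines. The IDs, codes, and the `remap` helper below are all made up for illustration; the real remapping dictionaries come from `prepare_lca_inputs`, and the real method updates the `LCA` object's dictionaries in place.

```python
# Toy matrix-index dictionary keyed by integer node IDs (hypothetical values)
product_dict = {101: 0, 102: 1, 103: 2}

# Toy remapping dictionary: node ID -> (database, code) key
remapping = {
    101: ("Mobility example", "Steel"),
    102: ("Mobility example", "Electric car"),
    103: ("Mobility example", "Driving an electric car"),
}


def remap(index_dict, remapping):
    """Re-key a matrix index dictionary from node IDs to Brightway keys."""
    return {
        remapping.get(node_id, node_id): index
        for node_id, index in index_dict.items()
    }


by_key = remap(product_dict, remapping)
print(by_key[("Mobility example", "Steel")])
```

After remapping, the integer IDs are no longer valid lookups in the re-keyed dictionary, which matches the difference between the two `lca.dicts.product[...]` cells above.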

## `.redo_lci` ➡️ `.lci`, `.redo_lcia` ➡️ `.lcia`

The preferred way to do additional LCI or LCIA calculations has changed, and has been simplified to the same way we do an initial calculation, namely with `.lci(new_demand_dict)` and `.lcia(new_demand_dict)`.

In [None]:
lca = bc.LCA(demand=fu, data_objs=dps)
lca.lci()
lca.lcia()

for act in [
    bd.get_node(name='Driving an combustion car'),
    bd.get_node(name='Driving an electric car'), 
]:
    lca.lcia({act.id: 1})
    print(lca.score, act)

Note the following **important condition**: The functional unit must be specified either as integer IDs (see above), or as Brightway (database, code) keys *if you have run `remap_inventory_dicts()`*. The software will help you as much as possible:

In [None]:
lca.lcia({bd.get_node(name='Driving an combustion car').key: 1})

## `LCA` object can now do Monte Carlo

There is no longer a separate Monte Carlo class, as the sampling of values has moved to `matrix_utils` and is more generic than before - we don't need to assume that people only have probability distribution functions, but are open to many more types of quantitative uncertainty or scenarios.

Therefore, calls to `MonteCarloLCA` need to **change to `LCA(..., use_distributions=True)`**, and you can use the normal `.lci()` and `.lcia()` functions.

In [None]:
ipcc = bd.Method(('IPCC', 'uncertain'))
ipcc.register()
ipcc.write([(('Mobility example', 'CO2'), {'uncertainty_type': 3, 'amount': 1, 'loc': 1, 'scale': 0.1})])

In [None]:
lca = bc.LCA(
    demand={bd.get_node(name='Driving an electric car'): 1}, 
    method=('IPCC', 'uncertain'),
    use_distributions=True
)
lca.lci()
lca.lcia()

for _ in zip(range(10), lca):
    print(lca.score)

### `keep_first_iteration`

Sometimes you want to keep the values sampled when you set up the LCA object, instead of getting the next value in a sequence when you start iterating over Monte Carlo results. In this case, just call `.keep_first_iteration()` and it will skip the first iteration step.

In [None]:
lca = bc.LCA(
    demand={bd.get_node(name='Driving an electric car'): 1}, 
    method=('IPCC', 'uncertain'),
    use_distributions=True
)
lca.lci()
lca.lcia()
lca.keep_first_iteration()
print("Score after setup:", lca.score)

for _ in zip(range(4), lca):
    print("In loop:", lca.score)
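
Two things in these loops are easier to see in isolation: the LCA object behaves like an iterator that never ends (hence the `zip(range(n), lca)` idiom), and `keep_first_iteration()` makes the first step of the loop reuse the values from setup. The toy class below mimics both with the standard library. `ToyMonteCarlo` is hypothetical, and it assumes a normal distribution with `loc=1` and `scale=0.1` purely for illustration.

```python
import random


class ToyMonteCarlo:
    """Infinite iterator that draws a new score on every step."""

    def __init__(self, loc=1.0, scale=0.1, seed=42):
        self.rng = random.Random(seed)
        self.loc, self.scale = loc, scale
        self.keep_first = False
        self.score = self._sample()  # value drawn during setup

    def _sample(self):
        return self.rng.gauss(self.loc, self.scale)

    def keep_first_iteration(self):
        self.keep_first = True

    def __iter__(self):
        return self

    def __next__(self):
        if self.keep_first:
            # Skip one resampling step so the setup value is seen in the loop
            self.keep_first = False
        else:
            self.score = self._sample()
        return self.score


mc = ToyMonteCarlo()
setup_score = mc.score
mc.keep_first_iteration()

# The iterator never stops by itself, so `zip` with a `range` bounds the loop
scores = [mc.score for _ in zip(range(4), mc)]
print(setup_score, scores)
```

The first entry in `scores` equals `setup_score`; without the call to `keep_first_iteration()` it would already be a fresh sample.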

## New `.dicts` accessor

The old methods still work (well, they will in the next dev release :), but the preferred way is to do the following:

In [None]:
product = next(node for node in bd.Database("US EEIO 1.1") if node['type'] == 'product')
activity = next(node for node in bd.Database("US EEIO 1.1") if node['type'] == 'process')
emission = next(node for node in bd.Database("US EEIO 1.1") if node['type'] == 'emission')

In [None]:
lca = bc.LCA({product: 1})
lca.lci()

To matrix index:

In [None]:
act_matrix = lca.dicts.activity[activity.id]
act_matrix

To database ID:

In [None]:
lca.dicts.activity.reversed[act_matrix], activity.id

And the same for products and biosphere flows:

In [None]:
lca.dicts.product[product.id], lca.dicts.biosphere[emission.id]
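
If you are wondering how a dictionary can offer a `.reversed` view, here is one possible way to build it. This is a sketch under my own assumptions, with a hypothetical `ToyIndexDict` class and made-up IDs; the class used by `bw2calc` may be implemented differently.

```python
class ToyIndexDict(dict):
    """Maps database IDs to matrix indices, with a lazily built reverse view."""

    _reversed = None

    @property
    def reversed(self):
        # Build the reverse mapping only on first access, then reuse it
        if self._reversed is None:
            self._reversed = {index: node_id for node_id, index in self.items()}
        return self._reversed


# Hypothetical database IDs mapped to matrix indices
activity_dict = ToyIndexDict({4021: 0, 4022: 1, 4023: 2})

matrix_index = activity_dict[4022]
node_id = activity_dict.reversed[matrix_index]
print(matrix_index, node_id)
```

Building the reverse view lazily avoids doing the work for users who never need it. The trade-off in this toy version is that the cached view goes stale if the dictionary is modified afterwards.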