# Introduction to `Wurst`

This notebook shows some basics on how to use the [Wurst](https://github.com/IndEcol/wurst) library.

The life cycle of a Wurst run is to load some data, perform some data transformations, and then save the data in a suitable form. Wurst knows about the following five generic transformation types:

* Add new datasets. This doesn't require a transformation function, just add the new datasets to your existing list of datasets, making sure that they have the same format.
* Scale inputs and emissions by a constant factor. In many cases, you know that the efficiency of a process will increase over time. You can then scale all fuel and raw material inputs and emissions by that increased efficiency.
* Change specific emissions or inputs. Not all emissions are correlated with their inputs - some are driven by pollution control technologies. Making specific changes does require a transformation function, but there isn't a generic one in Wurst, as these types of changes are specific enough that it makes sense to write the function yourself.
* Break up regional datasets. This is relatively simple - just copy a dataset using `copy_to_new_location`, and then use `relink_technosphere_exchanges`. See below for implementation details.
* Redefine markets. A common output from a future scenario is a new market mix of technologies. It is normally easier to add the new market inputs manually, but Wurst does have a function to empty markets so that they can be redefined.
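Adding new datasets only requires that they match the existing format. As a rough sketch, the hypothetical helpers `looks_like_wurst_dataset` and `add_datasets` below check a new dataset for the keys this notebook's own examples use (`name`, `reference product`, `unit`, `location`, `exchanges`, plus exactly one `production` exchange) before appending it. That key set is an assumption; the authoritative field list is the Wurst internal data format.

```python
# Hypothetical helpers: a minimal sanity check before appending new datasets.
# REQUIRED_KEYS is an assumption based on the fields used in this notebook,
# not the full Wurst data format specification.
REQUIRED_KEYS = {"name", "reference product", "unit", "location", "exchanges"}


def looks_like_wurst_dataset(ds):
    """Return True if ``ds`` has the assumed keys and one production exchange."""
    if not REQUIRED_KEYS.issubset(ds):
        return False
    production = [exc for exc in ds["exchanges"] if exc.get("type") == "production"]
    return len(production) == 1


def add_datasets(datasets, new_datasets):
    """Append ``new_datasets`` to ``datasets`` after checking their shape."""
    for ds in new_datasets:
        if not looks_like_wurst_dataset(ds):
            raise ValueError(f"Dataset {ds.get('name')!r} has an unexpected format")
        datasets.append(ds)
    return datasets


datasets = []  # a separate toy list, so the notebook's `data` is not overwritten
new_dataset = {
    "name": "A", "reference product": "B", "unit": "C", "location": "SE",
    "exchanges": [{"type": "production", "amount": 1, "production volume": 2}],
}
add_datasets(datasets, [new_dataset])
print(len(datasets))  # 1
```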

## Loading data

You can load data from whatever data source you prefer; in this case, we extract it from a database already imported into Brightway. However, you can use any other data source, as long as it conforms to the [Wurst expected data format](https://wurst.readthedocs.io/#internal-data-format).

In [1]:
import wurst as w
import brightway2 as bw

In [4]:
data = w.extract_brightway2_databases(["ecoinvent 3.3 cutoff"])

Getting activity data


100%|██████████| 13831/13831 [00:00<00:00, 101845.35it/s]


Adding exchange data to activities


100%|██████████| 501148/501148 [00:43<00:00, 11422.64it/s]
  0%|          | 20/13831 [00:00<01:31, 150.45it/s]

Filling out exchange data


100%|██████████| 13831/13831 [00:02<00:00, 4703.96it/s]


## Searching

One fundamental part of all transformations is finding the data you are interested in. The Wurst framework is built around the idea of composable filters: you define filters for a broad group of datasets (e.g. all electricity), and then make them more specific by adding further filters. Here is an example:

In [5]:
electricity_unit = [w.equals("unit", "kilowatt hour")]
natural_gas = electricity_unit + [w.contains("name", "natural gas")]
combined_cycle = natural_gas + [w.contains("name", "combined cycle")]

You can find all the searching filters listed in the [documentation](https://wurst.readthedocs.io/technical.html#searching).
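To make the idea of composable filters concrete, here is a plain-Python sketch. The `equals`, `contains` and `matches_all` functions below are a minimal re-implementation written for illustration, not Wurst's source, and the dataset names are toy values. Each filter is a function that takes a dataset and returns `True` or `False`, so extending a list of filters narrows the selection.

```python
# Minimal re-implementation of the *idea* behind Wurst's filters, for
# illustration only; Wurst's own functions may differ in details.
def equals(field, value):
    """Return a predicate testing ``ds[field] == value``."""
    return lambda ds: ds.get(field) == value


def contains(field, value):
    """Return a predicate testing whether ``value`` is a substring of ``ds[field]``."""
    return lambda ds: value in (ds.get(field) or "")


def matches_all(ds, filters):
    """True if every predicate in ``filters`` accepts ``ds``."""
    return all(f(ds) for f in filters)


toy_datasets = [
    {"name": "electricity, natural gas, combined cycle", "unit": "kilowatt hour"},
    {"name": "electricity, natural gas, conventional", "unit": "kilowatt hour"},
    {"name": "electricity, hard coal", "unit": "kilowatt hour"},
    {"name": "natural gas, high pressure", "unit": "cubic meter"},
]

# Each list extends the previous one, so each step narrows the selection.
toy_electricity = [equals("unit", "kilowatt hour")]
toy_natural_gas = toy_electricity + [contains("name", "natural gas")]
toy_combined_cycle = toy_natural_gas + [contains("name", "combined cycle")]

for filters in (toy_electricity, toy_natural_gas, toy_combined_cycle):
    print(sum(matches_all(ds, filters) for ds in toy_datasets))
# prints 3, then 2, then 1
```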

Our search definitions are lists of filters, which can't be called directly:

In [6]:
combined_cycle(data)

TypeError: 'list' object is not callable

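Instead, we pass the filters to `get_one` or `get_many`. `get_one` returns a single matching dataset, while `get_many` returns a lazy iterator (here a `filter` object, which behaves much like a [generator](https://www.learnpython.org/en/Generators)); it doesn't do anything until you iterate over the results: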

In [7]:
w.get_many(data, *combined_cycle)

<filter at 0x11e8bb128>
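The lazy behaviour of `get_many`, and the "exactly one result" contract of `get_one`, can be sketched with built-ins only. `get_many_sketch` and `get_one_sketch` are illustrative stand-ins, not Wurst's implementation; in particular, the sketch raises a plain `ValueError`, so check the Wurst documentation for the exceptions its `get_one` actually raises.

```python
# Sketch of lazy filtering using only built-ins; these are illustrative
# stand-ins, not Wurst's implementation.
def get_many_sketch(data, *filters):
    """Chain built-in ``filter`` objects; nothing is evaluated yet."""
    for fltr in filters:
        data = filter(fltr, data)
    return data


def get_one_sketch(data, *filters):
    """Return the single match, or raise if there are zero or several."""
    results = list(get_many_sketch(data, *filters))
    if len(results) != 1:
        raise ValueError(f"Expected exactly one result, got {len(results)}")
    return results[0]


calls = []


def is_even(x):
    calls.append(x)  # record each evaluation to show when work happens
    return x % 2 == 0


lazy = get_many_sketch(range(6), is_even)
print(len(calls))   # 0 - no element has been tested yet
print(list(lazy))   # [0, 2, 4]
print(len(calls))   # 6 - each element was tested during iteration
print(get_one_sketch(range(6), is_even, lambda x: x > 3))  # 4
```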

Normally we would iterate over this sequence and modify each dataset; here, we will just get the number of datasets returned:

In [9]:
len(list(w.get_many(data, *combined_cycle)))

121

`get_one` and `get_many` take the data as their first argument, followed by any number of filter functions. For example, we could also write out the `natural_gas` filters directly:

In [10]:
len(list(w.get_many(data, w.equals("unit", "kilowatt hour"), w.contains("name", "natural gas"))))

306

Because our filters are already in a list, we need to expand that list into a series of input arguments; `*` passes each filter as a separate argument. See also [Stack Overflow](https://stackoverflow.com/questions/36901/what-does-double-star-asterisk-and-star-asterisk-do-for-parameters).
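The difference that `*` makes can be seen with a plain-Python function that, like `get_many`, accepts the data followed by any number of filters. `count_args` is a hypothetical helper used only for this demonstration.

```python
# Plain-Python illustration of argument unpacking with ``*``.
def count_args(data, *funcs):
    """Return how many filter functions were passed after ``data``."""
    return len(funcs)


filters = [str.isdigit, str.isalpha]

print(count_args([], filters))                   # 1 - the whole list is one argument
print(count_args([], *filters))                  # 2 - each element is its own argument
print(count_args([], str.isdigit, str.isalpha))  # 2 - same as the unpacked call
```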

## Linking and unlinking

In Wurst, the links between input and output datasets are deliberately broken. Instead of an exchange being defined by `{'input': 'some unique id', 'output': 'some other unique id'}`, its input is defined by its properties, such as the activity name, the product name, the location, and the unit. These properties can be changed to point to new activities, or the exchange can even be split among several different providers. Relinking is done using the [linking functions](http://wurst.readthedocs.io/technical.html#linking); relinking also happens automatically when writing to Brightway.
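A toy illustration of why property-based links are easy to manipulate: the exchange below names its provider only through `name`, `product`, `unit` and `location`, and a hypothetical `link_exchange` function resolves those properties to a provider id. The matching key and the `code` field are assumptions for this sketch, not Wurst's linking code.

```python
# Toy illustration of property-based linking; the matching key
# (name, reference product, unit, location) and the "code" field are
# assumptions for this sketch.
providers = [
    {"name": "A", "reference product": "B", "unit": "C", "location": "SE", "code": "id-se"},
    {"name": "A", "reference product": "B", "unit": "C", "location": "NO", "code": "id-no"},
]


def link_exchange(exc, providers):
    """Return the unique id of the provider whose properties match ``exc``."""
    key = (exc["name"], exc["product"], exc["unit"], exc["location"])
    matches = [
        p["code"] for p in providers
        if (p["name"], p["reference product"], p["unit"], p["location"]) == key
    ]
    if len(matches) != 1:
        raise LookupError(f"Found {len(matches)} providers for {exc['name']!r}")
    return matches[0]


exc = {"name": "A", "product": "B", "unit": "C", "location": "NO", "type": "technosphere"}
print(link_exchange(exc, providers))  # id-no

# Because the link is just data, pointing the exchange elsewhere is a field update.
exc["location"] = "SE"
print(link_exchange(exc, providers))  # id-se
```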

## Scaling by a constant factor

The function [`change_exchanges_by_constant_factor`](http://wurst.readthedocs.io/technical.html#wurst.transformations.activity.change_exchanges_by_constant_factor) takes the following arguments:

* ``ds`` is a dataset document.
* ``value`` is a number. Existing exchange amounts will be multiplied by this number.
* ``technosphere_filters`` is an iterable of filter functions. Optional.
* ``biosphere_filters`` is an iterable of filter functions. Optional.

The filter arguments use the same searching syntax as above.

Note that production exchanges are not scaled, only technosphere inputs and biosphere flows.
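The semantics described above can be sketched in plain Python. The hypothetical `scale_exchanges` function assumes that each filter list *selects* which exchanges are scaled (an omitted list means all exchanges of that type), that production exchanges are never touched, and it ignores uncertainty fields for brevity. The exchange names and amounts are toy values.

```python
# Sketch of "scale by a constant factor"; assumes each filter list selects
# the exchanges to scale (None means all of that type). Uncertainty fields
# are ignored here for brevity.
def scale_exchanges(ds, value, technosphere_filters=None, biosphere_filters=None):
    """Multiply selected technosphere and biosphere amounts by ``value``."""
    selectors = {
        "technosphere": technosphere_filters or [],
        "biosphere": biosphere_filters or [],
    }
    for exc in ds["exchanges"]:
        filters = selectors.get(exc["type"])
        if filters is None:
            continue  # production exchanges (and other types) are left alone
        if all(f(exc) for f in filters):
            exc["amount"] *= value
    return ds


def not_water(exc):
    """Filter accepting every exchange whose name doesn't mention water."""
    return "water" not in exc["name"].lower()


toy_ds = {"exchanges": [
    {"type": "production", "name": "electricity", "amount": 1.0},
    {"type": "technosphere", "name": "natural gas", "amount": 0.2},
    {"type": "biosphere", "name": "Carbon dioxide, fossil", "amount": 0.5},
    {"type": "biosphere", "name": "Water", "amount": 0.002},
]}
scale_exchanges(toy_ds, 0.9, biosphere_filters=[not_water])
print([exc["amount"] for exc in toy_ds["exchanges"]])
```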

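Here is an example: we reduce the inputs and emissions of all combined cycle plants by 10% (i.e. they use 10% less fuel per kWh), but leave the water flows unchanged, assuming, purely for illustration, that these scale with power output. Because the filters select the exchanges that *are* scaled, the biosphere filter has to match every flow except water.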

In [13]:
for ds in w.get_many(data, *combined_cycle):
    # Scale all technosphere inputs and every biosphere flow except water
    w.change_exchanges_by_constant_factor(ds, 0.9, biosphere_filters=[w.exclude(w.contains('name', 'water'))])

## Substitute specific exchanges

As stated above, there isn't a generic function for this; instead, you search for the appropriate exchanges and change them manually.

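For example, let's change the amount of fossil carbon dioxide emitted by a particular plant from 0.58 kg per kWh to 1.0 kg per kWh. The scaling factor is `1 / amount`, so the new amount is 1.0. As this exchange is uncertain, we use the utility function `rescale_exchange` rather than editing `amount` directly; otherwise, the `amount` field would not be in agreement with the uncertainty fields. In the output below, `loc` equals the new amount and `uncertainty type` is 0.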

In [17]:
ng_in_shangdong = w.get_one(data, w.equals('location', 'CN-SD'), *combined_cycle)
our_exc = w.get_one(w.biosphere(ng_in_shangdong), w.equals('name', 'Carbon dioxide, fossil'))
w.rescale_exchange(our_exc, 1. / our_exc['amount'])

{'amount': 1.0,
 'categories': ('air', 'non-urban air or from high stacks'),
 'input': ('biosphere3', 'aa7cac3a-3625-41d4-bc54-33e2cf11ec46'),
 'loc': 1.0,
 'location': None,
 'name': 'Carbon dioxide, fossil',
 'pedigree': {'completeness': 3,
  'further technological correlation': 3,
  'geographical correlation': 3,
  'reliability': 3,
  'temporal correlation': 5},
 'product': None,
 'production volume': 0.0,
 'scale': 0.2264950330581225,
 'type': 'biosphere',
 'uncertainty type': 0,
 'unit': 'kilogram'}
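The arithmetic behind this edit can be sketched with a hypothetical `rescale_to` helper: the factor is `target / current amount` (the same as `1. / our_exc['amount']` above when the target is 1.0), and `loc` and `uncertainty type` are updated to mirror the output above. The input values are made up (a lognormal `loc` of roughly ln(0.58)), and this is not Wurst's `rescale_exchange`.

```python
# Hypothetical helper mirroring the output above: rescale an exchange so that
# its amount becomes ``target`` and its uncertainty fields stay consistent.
def rescale_to(exc, target):
    """Rescale ``exc`` to ``target`` and drop its uncertainty distribution."""
    factor = target / exc["amount"]
    exc["amount"] *= factor
    exc["loc"] = exc["amount"]    # keep loc in step with the new amount
    exc["uncertainty type"] = 0   # 0: no distribution defined (stats_arrays convention)
    return exc


# Made-up input values for illustration (2 = lognormal in stats_arrays).
toy_exc = {"amount": 0.58, "loc": -0.545, "uncertainty type": 2, "scale": 0.2265}
rescale_to(toy_exc, 1.0)
print(toy_exc["amount"], toy_exc["loc"], toy_exc["uncertainty type"])
```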

## Relinking in space

Adding new activities often means that exchanges need to be adjusted to include or substitute the new activities. Wurst provides the function [`relink_technosphere_exchanges`](https://wurst.readthedocs.io/#spatial-relationships), which will try to find inputs with the correct activity name, product, and unit.

The behaviour of this function changes considerably depending on its arguments. `contained` (default `True`) controls whether inputs are only considered if their location is completely contained within the location of the consuming activity; otherwise, all locations that intersect the consuming activity's location are considered. `exclusive` (default `True`) controls whether the input locations are allowed to overlap; if `exclusive` is true, no overlap is allowed. In that case, the search starts with the largest input locations if `biggest_first` is true, and with the smallest input locations otherwise (default `False`).
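A toy model can make `contained`, `exclusive` and `biggest_first` more concrete. Below, each location is a set of made-up face IDs; `candidates` and `exclusive_choice` are hypothetical functions, and the greedy smallest-first or largest-first selection is an assumed heuristic. This is not the `constructive_geometries` or Wurst implementation.

```python
# Toy model: each location is a set of topological "faces" (made-up IDs).
# Illustrates contained/exclusive/biggest_first; not Wurst's implementation.
FACES = {
    "REGION": {1, 2, 3, 4},
    "X": {1, 2},
    "Y": {2},
    "Z": {3},
    "W": {4, 5},  # partly outside REGION
}


def candidates(consumer, providers, contained=True):
    """Providers inside the consumer's location, or merely intersecting it."""
    target = FACES[consumer]
    if contained:
        return [p for p in providers if FACES[p] <= target]
    return [p for p in providers if FACES[p] & target]


def exclusive_choice(locations, biggest_first=False):
    """Greedily keep locations that don't overlap those already chosen."""
    ordered = sorted(locations, key=lambda loc: len(FACES[loc]), reverse=biggest_first)
    chosen, covered = [], set()
    for loc in ordered:
        if not FACES[loc] & covered:
            chosen.append(loc)
            covered |= FACES[loc]
    return chosen


providers = ["X", "Y", "Z", "W"]
print(candidates("REGION", providers))                        # ['X', 'Y', 'Z']
print(candidates("REGION", providers, contained=False))       # ['X', 'Y', 'Z', 'W']
print(exclusive_choice(["X", "Y", "Z"]))                      # ['Y', 'Z']
print(exclusive_choice(["X", "Y", "Z"], biggest_first=True))  # ['X', 'Z']
```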

A `RoW` input will be added if there is at least one topological face in the consuming activity's location which isn't covered by the location of any providing activity.
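In the same toy face model (made-up face IDs, hypothetical function name), the `RoW` rule reduces to a set difference: if any face of the consumer's location remains after removing the faces covered by the chosen providers, a `RoW` input is needed.

```python
# Toy continuation: a RoW input is needed when some face of the consumer's
# location is left uncovered by the selected providers (made-up face IDs).
def needs_row(consumer_faces, provider_faces):
    """True if any consumer face is not covered by the providers' faces."""
    covered = set().union(*provider_faces)
    return bool(consumer_faces - covered)


region = {1, 2, 3, 4}
print(needs_row(region, [{1, 2}, {3}]))     # True - face 4 is uncovered
print(needs_row(region, [{1, 2}, {3, 4}]))  # False - fully covered
```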

If no providers can be found, `relink_technosphere_exchanges` will try to add a `GLO` provider, if one is available. If there are still no valid providers, an `InvalidLink` exception is raised, unless `drop_invalid` is `True`, in which case the exchange is deleted.
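The fallback order just described can be sketched as a small decision function. `choose_providers` and `InvalidLinkSketch` are hypothetical stand-ins (the latter for Wurst's `InvalidLink`), and returning `None` to mean "drop the exchange" is a convention invented for this sketch.

```python
# Sketch of the fallback order: found providers, then GLO, then either drop
# the exchange or raise. InvalidLinkSketch stands in for Wurst's InvalidLink.
class InvalidLinkSketch(Exception):
    """Hypothetical stand-in for Wurst's InvalidLink exception."""


def choose_providers(found, available_locations, drop_invalid=False):
    """Return provider locations; None tells the caller to delete the exchange."""
    if found:
        return found
    if "GLO" in available_locations:
        return ["GLO"]
    if drop_invalid:
        return None
    raise InvalidLinkSketch("No valid provider found")


print(choose_providers(["SE", "NO"], {"SE", "NO", "GLO"}))  # ['SE', 'NO']
print(choose_providers([], {"GLO"}))                        # ['GLO']
print(choose_providers([], set(), drop_invalid=True))       # None
```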

Allocation among input activities is done using the function [`allocate_inputs`](https://wurst.readthedocs.io/technical.html#geo-functions).
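A minimal sketch of the allocation rule, with the rule itself taken as an assumption from the comment and output of the example below: split by production volume, but split evenly when a `RoW` provider is present. The even split when all volumes are zero is an extra assumption added to avoid dividing by zero. `allocate` is a hypothetical function, not `allocate_inputs`.

```python
# Sketch of splitting an exchange amount among providers. Assumption: split
# by production volume, but evenly if a RoW provider is present (or, as an
# extra assumption, if all production volumes are zero).
def allocate(amount, providers):
    """``providers`` maps location -> production volume."""
    total = sum(providers.values())
    if "RoW" in providers or total == 0:
        share = amount / len(providers)
        return {loc: share for loc in providers}
    return {loc: amount * pv / total for loc, pv in providers.items()}


print(allocate(10, {"SE": 2, "NO": 4}))             # SE ~3.33, NO ~6.67
print(allocate(10, {"SE": 2, "NO": 4, "RoW": 14}))  # ~3.33 each
```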

For details on the underlying `Geomatcher` object, see the [constructive_geometries library](https://github.com/cmutel/constructive_geometries) and [example notebook](https://github.com/cmutel/constructive_geometries/blob/master/examples/Geomatching.ipynb).

The following example shows one possible application for the UN subregion [Northern Europe](https://unstats.un.org/unsd/methodology/m49/).

In [2]:
ds = {
    'location': ('ecoinvent', 'UN-NEUROPE'),
    'exchanges': [{
        'name': 'A', 'product': 'B', 'unit': 'C',
        'amount': 10,
        'type': 'technosphere',
    }]
}
given_data = [{
    'name': 'A', 'reference product': 'B', 'unit': 'C',
    'location': 'SE',
    'exchanges': [{
        'type': 'production', 'amount': 1,
        'production volume': 2,
    }]
}, {
    'name': 'A', 'reference product': 'B', 'unit': 'C',
    'location': 'NO',
    'exchanges': [{
        'type': 'production', 'amount': 1,
        'production volume': 4,
    }]
}]
new_data = [{
    'name': 'A', 'reference product': 'B', 'unit': 'C',
    'location': 'RoW', # RoW means allocation by production volume isn't possible, instead split evenly
    'exchanges': [{
        'type': 'production', 'amount': 1,
        'production volume': 14,
    }]
}, {
    'name': 'D', 'reference product': 'E', 'unit': 'F',
    'location': 'DK', # Right location but wrong activity
    'exchanges': [{
        'type': 'production', 'amount': 1,
        'production volume': 1,
    }]
}]
w.relink_technosphere_exchanges(ds, given_data + new_data)

{'exchanges': [{'amount': 3.333333333333333,
   'loc': 3.333333333333333,
   'location': 'SE',
   'name': 'A',
   'product': 'B',
   'type': 'technosphere',
   'uncertainty type': 0,
   'unit': 'C'},
  {'amount': 3.333333333333333,
   'loc': 3.333333333333333,
   'location': 'NO',
   'name': 'A',
   'product': 'B',
   'type': 'technosphere',
   'uncertainty type': 0,
   'unit': 'C'},
  {'amount': 3.333333333333333,
   'loc': 3.333333333333333,
   'location': 'RoW',
   'name': 'A',
   'product': 'B',
   'type': 'technosphere',
   'uncertainty type': 0,
   'unit': 'C'}],
 'location': ('ecoinvent', 'UN-NEUROPE')}