# `ecoinvent_migrate` Library

`ecoinvent_migrate` makes the change report Excel files from ecoinvent usable.

These files are designed to allow for relinking against new versions of ecoinvent, **not** for updating an existing ecoinvent installation.

This library requires a valid ecoinvent license for all functionality. Output files are provided in the `outputs` directory.

## Usage

Migrations are from one release to the next, e.g. from 3.5 to 3.6. There are separate files, and separate functions, for technosphere and biosphere edges. The files produced are serialized to JSON and are software-agnostic, but play well with the Brightway ecosystem.

### Technosphere

Technosphere mapping requires installing both ecoinvent releases, as the provided change report file does not give enough information to do the mapping on its own. Specifically, we use the data in the releases to look up production volumes in order to do production-weighted `1-to-N` disaggregations.
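A production-weighted split gives each target dataset a share equal to its production volume divided by the summed production volume of all targets. The helper below is a minimal sketch of that arithmetic, assuming the volumes have already been read from the target release; `production_weighted_allocation` is a hypothetical name, not a function exported by `ecoinvent_migrate`, and the volumes in the example are made up.

```python
from __future__ import annotations


def production_weighted_allocation(production_volumes: dict[str, float]) -> dict[str, float]:
    """Return each target's share of a 1-to-N split, weighted by production volume.

    `production_volumes` maps a target label (here a location) to that target
    dataset's production volume, as read from the target release.
    """
    if any(volume < 0 for volume in production_volumes.values()):
        raise ValueError("Production volumes must be non-negative")
    total = sum(production_volumes.values())
    if total == 0:
        raise ValueError("Cannot weight a split when all production volumes are zero")
    return {label: volume / total for label, volume in production_volumes.items()}


# Hypothetical volumes, not taken from any ecoinvent release
shares = production_weighted_allocation({"Canada without Quebec": 1.0, "RoW": 3.0})
print(shares)  # {'Canada without Quebec': 0.25, 'RoW': 0.75}
```

The resulting shares correspond to the `allocation` values you will see in the `disaggregate` section of a generated file.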

```python
from ecoinvent_migrate import generate_technosphere_mapping

filepath = generate_technosphere_mapping("3.7.1", "3.8")
```

```console
2024-09-25 23:33:22.908 | INFO     | ecoinvent_migrate.utils:configure_logs:18 - Writing logs to /Users/cmutel/Library/Logs/ecoinvent_migrate/2024-09-25T23-33-22
2024-09-25 23:33:23.885 | INFO     | ecoinvent_migrate.main:get_change_report_context:61 - Versions available for this license: ['3.10', '3.9.1', '3.9', '3.8', '3.7.1', '3.7', '3.6', '3.5', '3.4', '3.3', '3.2', '3.1', '3.01', '2']
2024-09-25 23:33:24.472 | INFO     | ecoinvent_migrate.main:get_change_report_context:74 - Using change report annex file Change Report Annex v3.7.1 - v3.8.xlsx
2024-09-25 23:33:27.741 | INFO     | ecoinvent_migrate.data_io:get_brightway_databases:113 - Loading source database ecoinvent-3.7.1-cutoff to cache data attributes
...
```

This produces a file which allows foreground inventory datasets that linked to 3.7.1 to find suitable replacement datasets in 3.8. The metadata in the produced file looks like this:

```python
import json

with open(filepath) as f:
    data = json.load(f)

# Show only the metadata, leaving out the verb sections
{key: value for key, value in data.items() if key not in ("create", "replace", "update", "delete", "disaggregate")}
```

```
{'name': 'ecoinvent-3.7.1-cutoff-ecoinvent-3.8-cutoff',
 'description': 'Data migration file from ecoinvent-3.7.1-cutoff to ecoinvent-3.8-cutoff generated with `ecoinvent_migrate` version 0.4.1',
 'contributors': [{'title': 'ecoinvent association',
   'path': 'https://ecoinvent.org/',
   'role': 'author'},
  {'title': 'Chris Mutel',
   'path': 'https://chris.mutel.org/',
   'role': 'wrangler'}],
 'created': '2024-09-25T21:33:29.837546+00:00',
 'version': '2.0.0',
 'licenses': [{'name': 'CC-BY-4.0',
   'path': 'https://creativecommons.org/licenses/by/4.0/legalcode',
   'title': 'Creative Commons Attribution 4.0 International'}],
 'graph_context': ['edges'],
 'mapping': {'source': {'expression language': 'XPath',
   'labels': {'filename': "concat(//*:activity/@id, '_', //*:intermediateExchange[*:outputGroup = '0' and @amount > 0]/@intermediateExchangeId, '.spold')",
    'name': '//*:activityName/text()',
    'location': '//*:geography/*:shortname/text()',
    'reference product': ...
```

```python
data['replace'][0]
```

```
{'source': {'name': 'assembly of generator and motor, auxilliaries and energy use, heat and power co-generation unit, 160kW electrical',
  'location': 'RoW',
  'reference product': 'assembly of generator and motor, auxilliaries and energy use, for heat and power co-generation unit, 160 KW electrical',
  'unit': 'unit'},
 'target': {'name': 'assembly of generator and motor, auxilliaries and energy use, heat and power co-generation unit, 160kW electrical',
  'location': 'RoW',
  'reference product': 'assembly of generator and motor, auxilliaries and energy use, heat and power co-generation unit, 160kW electrical',
  'unit': 'unit'}}
```

```python
data['disaggregate'][0]
```

```
{'source': {'name': 'application of plant protection product, by field sprayer',
  'location': 'RoW',
  'reference product': 'application of plant protection product, by field sprayer',
  'unit': 'ha'},
 'targets': [{'name': 'application of plant protection product, by field sprayer',
   'location': 'Canada without Quebec',
   'reference product': 'application of plant protection product, by field sprayer',
   'unit': 'ha',
   'allocation': 0.025737164985667214},
  {'name': 'application of plant protection product, by field sprayer',
   'location': 'RoW',
   'reference product': 'application of plant protection product, by field sprayer',
   'unit': 'ha',
   'allocation': 0.9742628350143328}]}
```

To use this file, you need to handle the two verbs differently. For `replace`, foreground edges that link to an ecoinvent dataset whose attributes match the `source` section can be relinked one-to-one to the dataset in the later release whose attributes match the `target` section. For `disaggregate`, you need to split the original foreground edge into two or more edges, one per entry in `targets`, and scale the original amount and uncertainty information by each target's `allocation` value.
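If you are not using Brightway, applying the two verbs by hand could look roughly like the sketch below. It assumes each foreground edge is a flat dict holding the four matching attributes plus an `amount`; that edge layout and the `migrate_edges` helper are hypothetical, and only `amount` is scaled, since scaling uncertainty parameters depends on the distribution type.

```python
from __future__ import annotations

import copy

# The four attributes used to identify datasets in `source` and `target` entries
MATCH_KEYS = ("name", "location", "reference product", "unit")


def _key(attributes: dict) -> tuple:
    return tuple(attributes.get(k) for k in MATCH_KEYS)


def migrate_edges(edges: list[dict], migration: dict) -> list[dict]:
    """Relink foreground edges using the `replace` and `disaggregate` verbs.

    Each edge is a hypothetical flat dict with the four matching attributes and
    an `amount`. Edges that match no `source` entry are returned unchanged.
    """
    replacements = {_key(e["source"]): e["target"] for e in migration.get("replace", [])}
    splits = {_key(e["source"]): e["targets"] for e in migration.get("disaggregate", [])}
    migrated = []
    for edge in edges:
        key = _key(edge)
        if key in replacements:
            # One-to-one: overwrite the matching attributes with the target's
            migrated.append({**copy.deepcopy(edge), **replacements[key]})
        elif key in splits:
            # One-to-N: one new edge per target, scaled by its allocation factor
            for target in splits[key]:
                new_edge = copy.deepcopy(edge)
                new_edge.update({k: v for k, v in target.items() if k != "allocation"})
                # Only `amount` is scaled; uncertainty fields need
                # distribution-specific handling that this sketch leaves out
                new_edge["amount"] = edge["amount"] * target["allocation"]
                migrated.append(new_edge)
        else:
            migrated.append(edge)
    return migrated
```

With a real file, `migration` would be the `data` dictionary loaded from `filepath` in the examples earlier in this section.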

If you are using Brightway, there are convenience functions in `randonneur` and cached migration files which will be used for you automatically.

Migrations are designed and only tested for forward progress, i.e. from one release to the next subsequent release. Going in the opposite direction is not recommended.

You can't skip across multiple releases; attempting to do so raises a `VersionJump` error:

```python
generate_technosphere_mapping("3.7.1", "3.9.1")
```

```console
2024-09-25 23:33:29.916 | INFO     | ecoinvent_migrate.utils:configure_logs:18 - Writing logs to /Users/cmutel/Library/Logs/ecoinvent_migrate/2024-09-25T23-33-29
2024-09-25 23:33:30.810 | INFO     | ecoinvent_migrate.main:get_change_report_context:61 - Versions available for this license: ['3.10', '3.9.1', '3.9', '3.8', '3.7.1', '3.7', '3.6', '3.5', '3.4', '3.3', '3.2', '3.1', '3.01', '2']

VersionJump:
Source (3.7.1) and target (3.9.1) don't have a change report.
Usually this is the case when one jumps across multiple releases, but not always.
For example, the change report for 3.7.1 is from 3.6, not 3.7.
The change report we have available is:
Change Report Annex v3.9 - v3.9.1.xlsx
```

Technosphere mapping files are system-model-specific, and the default system model is `cutoff`. You can specify a different system model with the `system_model` parameter, following the `ecoinvent_interface` [function specification](https://github.com/brightway-lca/ecoinvent_interface?tab=readme-ov-file#database-releases), e.g. `generate_technosphere_mapping(..., system_model='apos')`.
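Because each file only covers a single release step, moving a foreground database across several releases means generating one file per step. The sketch below shows one way to script that; `consecutive_steps` and `generate_chain` are hypothetical helpers, and the release path in the commented usage is an assumption that must follow the change reports ecoinvent actually publishes (as the `VersionJump` message notes, these don't always match the order of version numbers).

```python
from __future__ import annotations

from typing import Any, Callable


def consecutive_steps(release_path: list[str]) -> list[tuple[str, str]]:
    """Pair each release with the next one, e.g. [a, b, c] -> [(a, b), (b, c)]."""
    if len(release_path) < 2:
        raise ValueError("Need at least a source and a target release")
    return list(zip(release_path, release_path[1:]))


def generate_chain(release_path: list[str], generate: Callable[..., Any], **kwargs: Any) -> list[Any]:
    """Call `generate(source, target, **kwargs)` once per single-release step."""
    return [generate(source, target, **kwargs) for source, target in consecutive_steps(release_path)]


# Usage with the real library (needs an ecoinvent license; the path is an assumption):
# from ecoinvent_migrate import generate_technosphere_mapping
# filepaths = generate_chain(["3.7.1", "3.8", "3.9", "3.9.1"],
#                            generate_technosphere_mapping, system_model="apos")
```

Passing the generator function in as an argument keeps the helper independent of the library, so the same loop works for `generate_biosphere_mapping`. The resulting files then have to be applied in order, one step at a time.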

### Biosphere

The same procedure applies for biosphere edges:

```python
from ecoinvent_migrate import generate_biosphere_mapping

filepath = generate_biosphere_mapping("3.9.1", "3.10", keep_deletions=True)
```

```console
2024-09-25 23:33:35.720 | INFO     | ecoinvent_migrate.utils:configure_logs:18 - Writing logs to /Users/cmutel/Library/Logs/ecoinvent_migrate/2024-09-25T23-33-35
2024-09-25 23:33:35.737 | INFO     | ecoinvent_migrate.main:generate_biosphere_mapping:215 - The `EE Deletions` format is not consistent across versions.
Please check the outputs carefully before applying them.
2024-09-25 23:33:36.651 | INFO     | ecoinvent_migrate.main:get_change_report_context:61 - Versions available for this license: ['3.10', '3.9.1', '3.9', '3.8', '3.7.1', '3.7', '3.6', '3.5', '3.4', '3.3', '3.2', '3.1', '3.01', '2']
2024-09-25 23:33:37.260 | INFO     | ecoinvent_migrate.main:get_change_report_context:74 - Using change report annex file Change Report Annex v3.9.1 - v3.10.xlsx
...
```

```python
import json

with open(filepath) as f:
    data = json.load(f)

# Show only the metadata, leaving out the verb sections
{key: value for key, value in data.items() if key not in ("create", "replace", "update", "delete", "disaggregate")}
```

```
{'name': 'ecoinvent-3.9.1-biosphere-ecoinvent-3.10-biosphere',
 'description': 'Data migration file from ecoinvent-3.9.1-biosphere to ecoinvent-3.10-biosphere generated with `ecoinvent_migrate` version 0.4.1',
 'contributors': [{'title': 'ecoinvent association',
   'path': 'https://ecoinvent.org/',
   'role': 'author'},
  {'title': 'Chris Mutel',
   'path': 'https://chris.mutel.org/',
   'role': 'wrangler'}],
 'created': '2024-09-25T21:33:38.131050+00:00',
 'version': '2.0.0',
 'licenses': [{'name': 'CC-BY-4.0',
   'path': 'https://creativecommons.org/licenses/by/4.0/legalcode',
   'title': 'Creative Commons Attribution 4.0 International'}],
 'graph_context': ['edges'],
 'mapping': {'source': {'expression language': 'XPath',
   'labels': {'name': '//*:elementaryExchange/*:name/text()',
    'unit': '//*:elementaryExchange/*:unitName/text()',
    'uuid': '//*:elementaryExchange/@elementaryExchangeId',
    'formula': '//*:elementaryExchange/@formula',
    'context': ...
```

```python
data["replace"][0]
```

```
{'source': {'uuid': '4f777e05-70f9-4a18-a406-d8232325073f',
  'name': '2,4-D amines'},
 'target': {'uuid': 'b6b4201e-0561-5992-912f-e729fbf04e41',
  'name': '2,4-D dimethylamine salt'}}
```

```python
data["delete"][0]
```

```
{'source': {'uuid': '91861063-1826-4860-9957-7c5bde5817a6',
  'name': 'Salt water (obsolete)'},
 'comment': 'There is no salt water flow in ecoinvent.'}
```

By default, the `delete` verb is skipped, as this is a more cautious approach to existing data. To have the `delete` section included, call `generate_biosphere_mapping(..., keep_deletions=True)`.
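Biosphere entries carry the flow UUID, so a hand-rolled migration can match on `uuid` alone. The sketch below is a hypothetical helper (not part of `ecoinvent_migrate` or `randonneur`) that applies `replace` and, when the file has one, `delete`; dropping deleted edges outright is only one possible policy, and you may prefer to flag them for review instead.

```python
from __future__ import annotations


def migrate_biosphere(flows: list[dict], migration: dict) -> list[dict]:
    """Apply the biosphere `replace` and, if present, `delete` verbs by flow UUID.

    Each flow is a hypothetical dict with at least a `uuid`; other keys such as
    `amount` are carried over unchanged.
    """
    replacements = {e["source"]["uuid"]: e["target"] for e in migration.get("replace", [])}
    # Files generated without keep_deletions=True may have no (or an empty) `delete` section
    deletions = {e["source"]["uuid"] for e in migration.get("delete", [])}
    migrated = []
    for flow in flows:
        if flow["uuid"] in deletions:
            continue  # drop edges to flows that no longer exist in the target release
        if flow["uuid"] in replacements:
            # Overwrite `uuid` and `name` with the target values
            flow = {**flow, **replacements[flow["uuid"]]}
        migrated.append(flow)
    return migrated
```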

### Common input arguments

Both `generate_technosphere_mapping` and `generate_biosphere_mapping` accept the following input arguments:

* source_version (str): String representation of an ecoinvent version, e.g. "3.8"
* target_version (str): String representation of an ecoinvent version, e.g. "3.9"
* ecoinvent_username (str, optional): Ecoinvent account username
* ecoinvent_password (str, optional): Ecoinvent account password
* write_logs (bool, default `True`): Create detailed and high-level logs during mapping file creation
* output_directory (`pathlib.Path`, default is `platformdirs.user_data_dir`): Directory for the result files

The following input parameters should normally be left to their default values:

* project_name (str): The Brightway project name into which we install ecoinvent releases to check change report data validity.
* output_version (str, default is "1.0.0"): [Datapackage version number](https://specs.frictionlessdata.io/data-package/#version)
* licenses (list, default is CC-BY): Licenses following the [frictionless data datapackage standard](https://specs.frictionlessdata.io/data-package/#licenses)
* description (str, default is auto-generated): Description of generated datapackage.

### How does this library work?

We start by using [ecoinvent_interface](https://github.com/brightway-lca/ecoinvent_interface) to download the change report Excel file, and the two ecoinvent releases (source and target). We need to download the ecoinvent data because the change report is for the unlinked and unallocated "master" data; there are some changes needed for the specific system models.

For biosphere mapping, we read the Excel file, search around for the correct worksheet and column names, and map the data to "replace" and "delete" sections. This is pretty simple.
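The exact search logic isn't described here, so the following is only a minimal sketch of a tolerant worksheet or column lookup, assuming names drift between releases mainly in case and whitespace. `find_label` is a hypothetical helper, and apart from `EE Deletions` (which appears in the log output earlier) the labels are made up. With `openpyxl`, the `available` list could come from `load_workbook(path, read_only=True).sheetnames`.

```python
from __future__ import annotations


def _normalize(label: object) -> str:
    # Ignore case and runs of whitespace: "EE  Deletions " and "ee deletions" match
    return " ".join(str(label).split()).lower()


def find_label(available: list[str], candidates: list[str]) -> str:
    """Return the label in `available` matching the earliest-listed candidate."""
    lookup = {_normalize(label): label for label in available}
    for candidate in candidates:
        match = lookup.get(_normalize(candidate))
        if match is not None:
            return match
    raise KeyError(f"None of {candidates!r} found in {available!r}")
```

Listing several candidate spellings, newest first, lets one function cover worksheets whose names changed from one change report to the next.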

For technosphere mapping, we need to check whether the indicated datasets are actually in `GLO` or in `RoW` (and analogously in `RER` or `RoE`). We do this by finding the corresponding datasets in the actual database releases. We also need to use the actual data to look up the allocation factors when a single dataset is split into multiple datasets.
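A minimal sketch of the `GLO`/`RoW` check, assuming the release has already been reduced to a set of `(name, location, reference product, unit)` tuples. The symmetric fallback table and the `resolve_location` helper are assumptions for illustration, not the library's code, and the process names in the test data are shortened placeholders.

```python
from __future__ import annotations

# Assumed pairs of locations that can stand in for each other
LOCATION_FALLBACKS = {"GLO": "RoW", "RoW": "GLO", "RER": "RoE", "RoE": "RER"}


def _key(process: dict) -> tuple:
    return (process["name"], process["location"], process["reference product"], process["unit"])


def resolve_location(process: dict, available: set[tuple]) -> dict | None:
    """Return `process`, or a copy with the fallback location, if it exists in `available`.

    `available` holds (name, location, reference product, unit) tuples built from
    the actual database release. Returns None when neither variant exists, which
    the caller should log as a missing process.
    """
    if _key(process) in available:
        return process
    fallback = LOCATION_FALLBACKS.get(process["location"])
    if fallback is not None:
        candidate = {**process, "location": fallback}
        if _key(candidate) in available:
            return candidate
    return None
```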

Not every line in the change report Excel file can be used, either because of the specifics of the system model, or some other unknown discrepancy. These exceptions are logged to both the log files and `sys.stderr`:

```console
2024-06-14 14:17:38.641 | WARNING  | ecoinvent_migrate.wrangling:resolve_glo_row_rer_roe:219 -
    Target process given in change report but missing in ecoinvent-3.8-cutoff lookup:
    {'name': 'rutile production, synthetic, 95% titanium dioxide, Benelite process',
     'location': 'GLO',
     'reference product': 'rutile, 95% titanium dioxide', 'unit': 'kg'}
```

We also need to [supplement the change report](https://github.com/brightway-lca/ecoinvent_migrate/blob/main/ecoinvent_migrate/patches.py) with changes in the data that are not included in the change report. These patches are incomplete; we welcome help in creating a complete migration set.

Once the given change data is segregated and cleaned, it is serialized to JSON and manually added to `randonneur_data`.
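The serialization step can be sketched with the standard library alone. This hypothetical `write_migration` helper writes only a subset of the top-level keys seen in the example files earlier in this README (`name`, `description`, `created`, `graph_context`, plus the verb sections). It omits `contributors`, `licenses`, `mapping`, and `version`, and it is not the library's own serializer.

```python
from __future__ import annotations

import json
from datetime import datetime, timezone
from pathlib import Path

VERBS = ("create", "replace", "update", "delete", "disaggregate")


def write_migration(path: Path, source: str, target: str, verbs: dict[str, list]) -> Path:
    """Write a simplified migration file; key names follow the example outputs."""
    unknown = set(verbs) - set(VERBS)
    if unknown:
        raise ValueError(f"Unknown verbs: {sorted(unknown)}")
    data = {
        "name": f"{source}-{target}",
        "description": f"Data migration file from {source} to {target}",
        "created": datetime.now(timezone.utc).isoformat(),
        "graph_context": ["edges"],
        # Empty verb sections are left out of the file
        **{verb: entries for verb, entries in verbs.items() if entries},
    }
    path.write_text(json.dumps(data, indent=2, ensure_ascii=False), encoding="utf-8")
    return path
```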