# `ecoinvent_interface`

This library is the "missing API" for ecoinvent data and files. Please note that **it is unofficial and not supported** by the ecoinvent centre, and they could break it at any time.

## Authentication

The `ecoinvent_interface` library requires a valid ecoinvent account. Authentication is done via the `Settings` object, and accessing ecoinvent requires supplying a username and password.

Note that you **must accept** the ecoinvent license and personal identifying information agreement **on the website** before using your user account via this library.

You can provide credentials in three ways:

* Manually, via arguments to the `Settings` object instantiation:

```python
from ecoinvent_interface import Settings
my_settings = Settings(username="bob", password="example")
```

* Via the `EI_PASSWORD` and `EI_USERNAME` environment variables:

```bash
export EI_USERNAME=bob
export EI_PASSWORD=example
```

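If your environment variable values have special characters, wrapping them in single quotes should work, e.g. `export EI_PASSWORD='compl!cat$d'`. Inside single quotes the shell takes every character literally, so a backslash there would become part of the password.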

Followed by:

```python
from ecoinvent_interface import Settings
# Environment variables read automatically
my_settings = Settings()
```

* Or with the use of a [pydantic_settings secrets directory](https://docs.pydantic.dev/latest/usage/pydantic_settings/#secrets). The easiest way to create the correct files is via the utility function `permanent_setting`:

```python
from ecoinvent_interface import Settings, permanent_setting
permanent_setting("username", "bob")
permanent_setting("password", "example")
# Secrets files read automatically
my_settings = Settings()
```

Secrets files are stored in `ecoinvent_interface.storage.secrets_dir`.

For each value, manually set values always *take precedence* over environment variables, which in turn *take precedence* over secrets files.
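The precedence rule can be sketched in plain Python. This is a hypothetical `resolve_setting` helper, not the library's code: the `EI_` prefix comes from the variable names above, and the assumption that each secrets file is named after the setting and holds only its value is ours.

```python
import os
import tempfile
from pathlib import Path
from typing import Mapping, Optional


def resolve_setting(
    name: str,
    manual: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
    secrets_dir: Optional[Path] = None,
    env_prefix: str = "EI_",
) -> Optional[str]:
    """Resolve one setting as manual value > environment variable > secrets file.

    Hypothetical helper mirroring the documented order, not the library's
    implementation. Assumes each secrets file is named after the setting
    (e.g. ``password``) and contains only the value.
    """
    # 1. A value passed explicitly always wins.
    if manual is not None:
        return manual
    # 2. Next, an environment variable such as EI_PASSWORD.
    env_value = env.get(env_prefix + name.upper())
    if env_value is not None:
        return env_value
    # 3. Finally, a file in the secrets directory (surrounding whitespace stripped).
    if secrets_dir is not None:
        secret_file = Path(secrets_dir) / name
        if secret_file.is_file():
            return secret_file.read_text().strip()
    return None


with tempfile.TemporaryDirectory() as tmp:
    secrets = Path(tmp)
    (secrets / "password").write_text("from-secrets\n")
    print(resolve_setting("password", env={}, secrets_dir=secrets))
    print(resolve_setting("password", env={"EI_PASSWORD": "from-env"}, secrets_dir=secrets))
    print(resolve_setting("password", manual="example", env={"EI_PASSWORD": "from-env"}, secrets_dir=secrets))
```

Passing `env` explicitly keeps the sketch from touching the real process environment; by default it reads `os.environ`.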

A reasonable guide for choosing between the three is to use secrets on your private, local machine, and to use environment variables on servers or containers. We will use environment variables in this demo.

```python
%set_env EI_USERNAME <your username; spaces are allowed>
%set_env EI_PASSWORD <your password; spaces are allowed>
```


# Database releases

To get a database release, we need to make three selections. First, the version:

```python
from ecoinvent_interface import EcoinventRelease, Settings, ReleaseType, CachedStorage
my_settings = Settings()
ei = EcoinventRelease(my_settings)
```

```python
ei.list_versions()
```

```
['3.10',
 '3.9.1',
 '3.9',
 '3.8',
 '3.7.1',
 '3.7',
 '3.6',
 '3.5',
 '3.4',
 '3.3',
 '3.2',
 '3.1',
 '3.01',
 '2']
```

Second, the system model:

```python
ei.list_system_models('3.7.1')
```

```
['cutoff', 'consequential', 'apos']
```

The ecoinvent API uses short and long forms of the system model names; you can get the longer names by passing `translate=False`. You can use either form in all `EcoinventRelease` methods.

```python
ei.list_system_models('3.7.1', translate=False)
```

```
['Allocation cut-off by classification',
 'Substitution, consequential, long-term',
 'Allocation at the Point of Substitution']
```
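Accepting either form amounts to a two-way lookup between short and long names. A minimal sketch with a hypothetical `to_short_name` helper; the table is copied from the two `list_system_models('3.7.1')` outputs above, so it assumes version `3.7.1` and may not cover system models offered in other releases:

```python
# Short and long system model names, copied from the list_system_models('3.7.1')
# outputs above; other releases may offer additional system models.
SYSTEM_MODELS = {
    "cutoff": "Allocation cut-off by classification",
    "consequential": "Substitution, consequential, long-term",
    "apos": "Allocation at the Point of Substitution",
}
LONG_TO_SHORT = {long: short for short, long in SYSTEM_MODELS.items()}


def to_short_name(system_model: str) -> str:
    """Accept either name form and return the short form (hypothetical helper)."""
    if system_model in SYSTEM_MODELS:
        return system_model
    try:
        return LONG_TO_SHORT[system_model]
    except KeyError:
        raise ValueError(f"Unknown system model: {system_model!r}") from None


print(to_short_name("apos"))
print(to_short_name("Allocation cut-off by classification"))
```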

Finally, the type of release. These are stored in an `Enum`. There are six release types; if you just want the database to do calculations, choose the `ecospold` type.

* `ReleaseType.ecospold`: The single-output unit process files in ecospold2 XML format
* `ReleaseType.matrix`: The so-called "universal matrix export"
* `ReleaseType.lci`: LCI data in ecospold2 XML format
* `ReleaseType.lcia`: LCIA data in ecospold2 XML format
* `ReleaseType.cumulative_lci`: LCI data in Excel
* `ReleaseType.cumulative_lcia`: LCIA data in Excel

```python
list(ReleaseType)
```

```
[<ReleaseType.ecospold: 'ecoinvent {version}_{system_model_abbr}_ecoSpold02.7z'>,
 <ReleaseType.matrix: 'universal_matrix_export_{version}_{system_model_abbr}.7z'>,
 <ReleaseType.lci: 'ecoinvent {version}_{system_model_abbr}_lci_ecoSpold02.7z'>,
 <ReleaseType.lcia: 'ecoinvent {version}_{system_model_abbr}_lcia_ecoSpold02.7z'>,
 <ReleaseType.cumulative_lci: 'ecoinvent {version}_{system_model_abbr}_cumulative_lci_xlsx.7z'>,
 <ReleaseType.cumulative_lcia: 'ecoinvent {version}_{system_model_abbr}_cumulative_lcia_xlsx.7z'>]
```

See the ecoinvent website for more information on what these values mean.
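The enum values printed above look like filename templates with `{version}` and `{system_model_abbr}` placeholders. Under that reading, a release archive name can be assembled with `str.format`. This sketch uses a hypothetical local `ReleaseFilename` enum holding two of the values shown, not the library's `ReleaseType`, and `archive_name` is likewise illustrative:

```python
from enum import Enum


class ReleaseFilename(Enum):
    """Hypothetical local copy of two ReleaseType values shown above."""

    ecospold = "ecoinvent {version}_{system_model_abbr}_ecoSpold02.7z"
    matrix = "universal_matrix_export_{version}_{system_model_abbr}.7z"


def archive_name(release: ReleaseFilename, version: str, system_model_abbr: str) -> str:
    # Each value is treated as a str.format template with two named placeholders.
    return release.value.format(version=version, system_model_abbr=system_model_abbr)


print(archive_name(ReleaseFilename.matrix, "3.7.1", "apos"))
print(archive_name(ReleaseFilename.ecospold, "3.10", "cutoff"))
```

Both results match archive names that appear in the cache catalogue listed in the next section.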

Once we have made a selection for all three choices, we can get the release files. They are saved to a cache directory and extracted by default.

```python
ei.get_release(version='3.7.1', system_model='apos', release_type=ReleaseType.matrix)
```

```
PosixPath('/Users/cmutel/Library/Application Support/EcoinventInterface/cache/universal_matrix_export_3.7.1_apos')
```

## Cached storage

The default cache uses [platformdirs](https://platformdirs.readthedocs.io/en/latest/), and the directory location is OS-dependent. You can use a custom cache directory by specifying `output_dir` when creating the `Settings` class instance.

You can work with the cache when offline:

```python
cs = CachedStorage()
list(cs.catalogue)
```

```
['ecoinvent 3.10_cutoff_ecoSpold02.7z',
 'ecoinvent 3.10_LCIA_implementation.7z',
 'ecoinvent 3.9.1_cutoff_ecoSpold02.7z',
 'ecoinvent 3.8_cutoff_ecoSpold02.7z',
 'ecoinvent 3.9.1_LCIA_implementation.7z',
 'ecoinvent 3.8_LCIA_implementation.7z',
 'universal_matrix_export_3.7.1_apos.7z']
```

```python
cs.catalogue['ecoinvent 3.9.1_LCIA_implementation.7z']
```

```
{'path': '/Users/cmutel/Library/Application Support/EcoinventInterface/cache/ecoinvent 3.9.1_LCIA_implementation',
 'archive': 'ecoinvent 3.9.1_LCIA_implementation.7z',
 'extracted': True,
 'created': '2024-06-08T22:37:35.870041',
 'system_model': None,
 'version': '3.9.1',
 'kind': 'extra'}
```
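Since `cs.catalogue` supports `list()` and key lookup, it appears to behave like a mapping from archive names to metadata dictionaries. Assuming that, you can filter the cache offline. A sketch with a hypothetical `find_cached` helper, run here against the one entry shown above; with a real cache you would pass `cs.catalogue` instead:

```python
from typing import Any, List, Mapping, Optional

# One catalogue entry, copied from the output above.
catalogue = {
    "ecoinvent 3.9.1_LCIA_implementation.7z": {
        "path": "/Users/cmutel/Library/Application Support/EcoinventInterface/cache/ecoinvent 3.9.1_LCIA_implementation",
        "archive": "ecoinvent 3.9.1_LCIA_implementation.7z",
        "extracted": True,
        "created": "2024-06-08T22:37:35.870041",
        "system_model": None,
        "version": "3.9.1",
        "kind": "extra",
    },
}


def find_cached(
    catalogue: Mapping[str, Mapping[str, Any]],
    version: Optional[str] = None,
    kind: Optional[str] = None,
) -> List[str]:
    """Return archive names whose metadata matches every filter that is given."""
    return [
        name
        for name, meta in catalogue.items()
        if (version is None or meta.get("version") == version)
        and (kind is None or meta.get("kind") == kind)
    ]


print(find_cached(catalogue, version="3.9.1", kind="extra"))
```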

## `EcoinventRelease` *extra* files

There are two other kinds of files available: *reports*, and what we call *extra* files. Let's see the *extra* files for version `'3.7.1'`:

```python
ei.list_extra_files('3.7.1')
```

```
{'ecoinvent 3.7.1_Change Report_including Annex.zip.7z': {'uuid': 'bfa0318b-9b5a-4d6c-b9f2-eb49c1719b1a',
  'size': 5211864,
  'modified': datetime.datetime(2023, 4, 25, 0, 0)},
 'ecoinvent 3.7.1_LCIA_implementation.7z': {'uuid': '5d92d6da-9f05-463f-a7c9-f436f4e2eaec',
  'size': 8292322,
  'modified': datetime.datetime(2023, 4, 25, 0, 0)},
 'electricity_analysis_3.7.1_Allocation, APOS.xlsx': {'uuid': 'c3593a36-b11b-4f82-b183-2624a18addf8',
  'size': 40350040,
  'modified': datetime.datetime(2023, 4, 25, 0, 0)},
 'electricity_analysis_3.7.1_Allocation, cut-off.xlsx': {'uuid': '7577d77f-1274-44fd-9fc7-eb77c4e15286',
  'size': 40283505,
  'modified': datetime.datetime(2023, 4, 25, 0, 0)},
 'electricity_analysis_3.7.1_Consequential.xlsx': {'uuid': 'dd378ebd-a063-41c1-8230-911bf2f01c72',
  'size': 4408638,
  'modified': datetime.datetime(2023, 4, 25, 0, 0)},
 'market_composition_3.7.1.xlsx': {'uuid': '0ea836e7-8b24-44c2-bcaf-b2f398a44e18',
  'size': 3476642,
  'modified': datetime.datetime(
...
```

This returns a dictionary of filenames and metadata. We can download the `ecoinvent 3.7.1_LCIA_implementation.7z` file; by default it will automatically be extracted.

```python
ei.get_extra(version='3.7.1', filename='ecoinvent 3.7.1_LCIA_implementation.7z')
```

```
PosixPath('/Users/cmutel/Library/Application Support/EcoinventInterface/cache/ecoinvent 3.7.1_LCIA_implementation')
```
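Because `list_extra_files` returns a plain dictionary, you can filter it before deciding what to fetch with `get_extra`. A small sketch using a subset of the sizes listed above (only the `size` field is kept); `large_files` is a hypothetical helper, and "megabyte" here means 10**6 bytes:

```python
from typing import Any, Dict, List, Mapping

# Subset of the list_extra_files('3.7.1') output above (only "size" kept).
extra_files: Dict[str, Dict[str, Any]] = {
    "ecoinvent 3.7.1_Change Report_including Annex.zip.7z": {"size": 5211864},
    "ecoinvent 3.7.1_LCIA_implementation.7z": {"size": 8292322},
    "electricity_analysis_3.7.1_Allocation, APOS.xlsx": {"size": 40350040},
    "electricity_analysis_3.7.1_Allocation, cut-off.xlsx": {"size": 40283505},
    "electricity_analysis_3.7.1_Consequential.xlsx": {"size": 4408638},
}


def large_files(files: Mapping[str, Mapping[str, Any]], min_mb: float) -> List[str]:
    """Return filenames of at least `min_mb` megabytes (10**6 bytes), largest first."""
    threshold = min_mb * 1_000_000
    big = [(meta["size"], name) for name, meta in files.items() if meta["size"] >= threshold]
    # Sort by size, descending; names only break ties.
    return [name for _, name in sorted(big, reverse=True)]


print(large_files(extra_files, 10))
```

Each returned name can then be passed as `filename` to `ei.get_extra(version='3.7.1', filename=...)`.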

## `EcoinventRelease` *reports*

Reports require a login but not a version number:

```python
ei.list_report_files()
```

```
{'Allocation, cut-off, EN15804_documentation.pdf': {'uuid': 'a90b3cbf-bbf6-4b49-aef7-4f7a53250287',
  'size': 5078006,
  'modified': datetime.datetime(2021, 10, 1, 0, 0),
  'description': 'This document provides a documentation on the calculation of the indicators in the “Allocation, cut-off, EN15804” system model.'},
 'Consideration of land use change in ecoinvent version 3.3.pdf': {'uuid': '0a9ff340-abce-42b3-8424-3ec53468bb40',
  'size': 1170983,
  'modified': datetime.datetime(2021, 2, 5, 0, 0),
  'description': 'This document describes the LUC accounting method applied to ecoinvent v3.3'},
 'ecoinvent 3 report_Agriculture.zip': {'uuid': '136cfe6e-da14-41cb-960e-378d07f96261',
  'size': 5710488,
  'modified': datetime.datetime(2017, 5, 31, 0, 0),
  'description': 'This file contains several .pdf reports and xlsx files dealing with the agricultural sector in ecoinvent v3.'},
 'ecoinvent 3 report_Life cycle inventories for the treatment of iron and steel industry by-products.pdf': {'
...
```

Downloading follows the same pattern as before:

```python
ei.get_report('Allocation, cut-off, EN15804_documentation.pdf')
```

```
PosixPath('/Users/cmutel/Library/Application Support/EcoinventInterface/cache/Allocation, cut-off, EN15804_documentation.pdf')
```

Zip and 7z files are extracted by default.
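The report listing includes a `description` for each file, so a keyword search over names and descriptions is a quick way to find the filename to pass to `get_report`. A sketch with a hypothetical `search_reports` helper, using the first three entries of the `list_report_files()` output above (other fields omitted):

```python
from typing import Any, List, Mapping

# Subset of the list_report_files() output above ("uuid", "size", "modified" omitted).
reports = {
    "Allocation, cut-off, EN15804_documentation.pdf": {
        "description": "This document provides a documentation on the calculation of the indicators in the “Allocation, cut-off, EN15804” system model.",
    },
    "Consideration of land use change in ecoinvent version 3.3.pdf": {
        "description": "This document describes the LUC accounting method applied to ecoinvent v3.3",
    },
    "ecoinvent 3 report_Agriculture.zip": {
        "description": "This file contains several .pdf reports and xlsx files dealing with the agricultural sector in ecoinvent v3.",
    },
}


def search_reports(reports: Mapping[str, Mapping[str, Any]], keyword: str) -> List[str]:
    """Return report filenames whose name or description contains `keyword` (case-insensitive)."""
    needle = keyword.casefold()
    return [
        name
        for name, meta in reports.items()
        if needle in name.casefold() or needle in meta.get("description", "").casefold()
    ]


print(search_reports(reports, "land use"))
print(search_reports(reports, "agricultur"))
```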

# `EcoinventProcess` interface

This class gets data and reports for specific processes. It first needs to know what release version and system model to work with:

```python
from ecoinvent_interface import EcoinventProcess, Settings
my_settings = Settings()
ep = EcoinventProcess(my_settings)
ep.set_release(version="3.7.1", system_model="apos")
```

### Finding a dataset id

The ecoinvent API uses integer indices (e.g. `https://ecoquery.ecoinvent.org/3.10/cutoff/dataset/7957`), and these values aren't found in the release files. We have cached these indices for versions `3.7.1`, `3.8`, and `3.9.1`. If you already know the integer index, you can use that:

```python
ep.select_process(dataset_id="1")
```
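If you have an ecoQuery link rather than a bare index, the index is the last path segment. A sketch assuming every dataset URL follows the `<version>/<system model>/dataset/<index>` layout of the example URL above; `ecoquery_url` and `dataset_id_from_url` are hypothetical helpers:

```python
from urllib.parse import urlparse


def ecoquery_url(version: str, system_model: str, dataset_id: str) -> str:
    """Build a dataset URL following the pattern of the example above (assumed layout)."""
    return f"https://ecoquery.ecoinvent.org/{version}/{system_model}/dataset/{dataset_id}"


def dataset_id_from_url(url: str) -> str:
    """Extract the integer dataset index, as a string, from such a URL."""
    parts = urlparse(url).path.strip("/").split("/")
    # Expected layout: <version>/<system model>/dataset/<index>
    if len(parts) != 4 or parts[2] != "dataset" or not parts[3].isdigit():
        raise ValueError(f"Unexpected ecoQuery dataset URL: {url!r}")
    return parts[3]


url = ecoquery_url("3.10", "cutoff", "7957")
print(url)
print(dataset_id_from_url(url))
```

The result is a string, like the `dataset_id="1"` argument above. Indices are presumably specific to a release, so the URL's version and system model should match what you passed to `set_release`.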

You can also use the filename, if you know it:

```python
F = "b0eb27dd-b87f-4ae9-9f69-57d811443a30_66c93e71-f32b-4591-901c-55395db5c132.spold"
ep.select_process(filename=F)
ep.dataset_id
```

```
'1'
```
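Release filenames like `F` consist of two UUIDs joined by an underscore. A sketch that splits and validates them with the standard `uuid` module, assuming the usual ecoinvent `<activity UUID>_<reference product UUID>.spold` naming; `split_spold_filename` is a hypothetical helper:

```python
import uuid
from typing import Tuple


def split_spold_filename(filename: str) -> Tuple[uuid.UUID, uuid.UUID]:
    """Split an ecoSpold2 filename into its two UUIDs.

    Assumes the ``<activity UUID>_<reference product UUID>.spold`` naming
    used by ecoinvent release files.
    """
    stem, dot, extension = filename.rpartition(".")
    if not dot or extension != "spold":
        raise ValueError(f"Not a .spold filename: {filename!r}")
    activity, sep, product = stem.partition("_")
    if not sep:
        raise ValueError(f"Expected two UUIDs joined by '_': {filename!r}")
    # uuid.UUID raises ValueError if either part is not a valid UUID.
    return uuid.UUID(activity), uuid.UUID(product)


F = "b0eb27dd-b87f-4ae9-9f69-57d811443a30_66c93e71-f32b-4591-901c-55395db5c132.spold"
activity_id, product_id = split_spold_filename(F)
print(activity_id)
print(product_id)
```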

Finally, you can pass in a set of `attributes`. You should use the name, reference product, and/or location to uniquely identify a process. You don't need to give all attributes, but you will get an error if the attributes aren't specific enough.

`attributes` is a dictionary, and can take the following keys: `name` or `activity_name`, `reference product` or `reference_product`, and `location` or `geography`. The system will adapt the names as needed to find a match.

```python
ep.select_process(
    attributes={
        "name": "rye seed production, Swiss integrated production, for sowing",
        "location": "CH",
        "reference product": "rye seed, Swiss integrated production, for sowing",
    }
)
ep.dataset_id
```

```
'40'
```
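The key aliasing described above can be pictured as a lookup table that maps every accepted spelling onto one canonical key. This is a hypothetical sketch, not the library's matching code; the choice of canonical names is ours:

```python
from typing import Dict

# Hypothetical alias table built from the accepted keys listed above.
ALIASES = {
    "name": "name",
    "activity_name": "name",
    "reference product": "reference product",
    "reference_product": "reference product",
    "location": "location",
    "geography": "location",
}


def normalize_attributes(attributes: Dict[str, str]) -> Dict[str, str]:
    """Map alias keys onto one canonical spelling; reject unknown or conflicting keys."""
    normalized: Dict[str, str] = {}
    for key, value in attributes.items():
        try:
            canonical = ALIASES[key]
        except KeyError:
            raise ValueError(f"Unsupported attribute key: {key!r}") from None
        if canonical in normalized and normalized[canonical] != value:
            raise ValueError(f"Conflicting values for {canonical!r}")
        normalized[canonical] = value
    return normalized


print(normalize_attributes({
    "activity_name": "rye seed production, Swiss integrated production, for sowing",
    "geography": "CH",
}))
```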

### Basic process information

Once you have selected the process, you can get basic information about that process:

```python
ep.get_basic_info()
```

```
{'index': 40,
 'version': '3.7.1',
 'system_model': 'apos',
 'activity_name': 'rye seed production, Swiss integrated production, for sowing',
 'geography': {'comment': None,
  'short_name': 'CH',
  'long_name': 'Switzerland'},
 'reference_product': 'rye seed, Swiss integrated production, for sowing',
 'has_access': True,
 'unit': 'kg',
 'sector': 'Agriculture & Animal Husbandry'}
```

You can also call `ep.get_documentation()` to get a representation of the ecospold2 XML file in Python.

### Process documents

You can use `ep.get_file` with one of the following file types to download process files:

* `ProcessFileType.upr`: Unit Process ecospold XML
* `ProcessFileType.lci`: Life Cycle Inventory ecospold XML
* `ProcessFileType.lcia`: Life Cycle Impact Assessment ecospold XML
* `ProcessFileType.pdf`: PDF Dataset Report
* `ProcessFileType.undefined`: Undefined (unlinked and multi-output) Dataset PDF Report

For example:

```python
from ecoinvent_interface import ProcessFileType
from pathlib import Path
ep.get_file(file_type=ProcessFileType.lcia, directory=Path.cwd())
```

```
PosixPath('/Users/cmutel/Code/DdS/brightcon-2024-material/talks/Thursday/ecoinvent_XXX talk/ecoinvent-3.7.1-apos-lcia-40.xml')
```

This downloads the life cycle impact assessment ecospold XML file to the current working directory. The `get_file` method requires specifying the `directory`.

# Relationship to EIDL

This library initially started as a fork of [EIDL](https://github.com/haasad/EcoInventDownLoader), the ecoinvent downloader. As of version 2.0, it has been completely rewritten. Currently only the authentication code comes from `EIDL`.

Differences with `EIDL`:

* Designed to be a lower-level infrastructure library. All user and web browser interaction was removed.
* Username and password can be specified using [pydantic_settings](https://docs.pydantic.dev/latest/usage/pydantic_settings/).
* Can download all release files, plus reports and "extra" files.
* Will autocorrect filenames when possible for ecoinvent inconsistencies.
* Can download data on inventory processes.
* Can find inventory processes using their filename or attributes.
* Uses a more robust caching and cache validation strategy.
* More reasonable token refresh strategy.
* No HTML parsing or filename string hacks.
* Streaming downloads.
* Descriptive logging and error messages.
* No shortcuts for Brightway or other LCA software.
* Custom library headers are set to allow users of this library to be identified. No user information is transmitted.
* Comprehensive tests.

# Usage in `brightway`

You will be much happier in life if you use `bw2io.import_ecoinvent_release` instead of trying to work with the raw ecoinvent data.

Here are the input arguments for this function:

* `version`: The ecoinvent release version as a string, e.g. `'3.9.1'`
* `system_model`: The system model as a string in short or long form, e.g. `'cutoff'` or `'Allocation cut-off by classification'`
* `username`: ecoinvent username
* `password`: ecoinvent password
* `lci`: Flag on whether to import the inventory database
* `lcia`: Flag on whether to import the LCIA impact categories. The biosphere database must exist if `lci` is `False`
* `biosphere_name`: Name of database to store biosphere flows. If not specified, a default name is used; in the example run below it is `ecoinvent-3.8-biosphere`.
* `biosphere_write_mode`: How to handle an existing biosphere database. Must be either `replace` or `patch`
* `importer_signal`: Used by the Activity Browser to provide feedback during the import
* `namespace_lcia_methods`: Add ecoinvent version as a prefix to LCIA impact categories, e.g. `("ecoinvent-3.9.1", "global warming")`. Helps clarify the version intended for use, and allows for multiple LCIA implementation versions to be installed in parallel
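The effect of `namespace_lcia_methods` on an impact category name can be illustrated with a tiny helper that reproduces the example above. `namespace_method` is hypothetical and not part of `bw2io`; it only shows why two LCIA implementation versions no longer collide:

```python
from typing import Tuple


def namespace_method(version: str, method: Tuple[str, ...]) -> Tuple[str, ...]:
    """Prefix an LCIA impact category tuple with its ecoinvent version label."""
    return (f"ecoinvent-{version}",) + tuple(method)


print(namespace_method("3.9.1", ("global warming",)))

# The same category from two releases now gets two distinct keys.
print(namespace_method("3.8", ("global warming",)) != namespace_method("3.9.1", ("global warming",)))
```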

See [the source code](https://github.com/brightway-lca/brightway2-io/blob/main/bw2io/ecoinvent.py#L174) for all the things that we need to fix to get imports working.

```python
import bw2data as bd
import bw2io as bi

bd.projects.set_current("ecoinvent_interface demo")
```

```python
bi.import_ecoinvent_release("3.8", "cutoff")
```

```
Applying strategy: normalize_units
Applying strategy: drop_unspecified_subcategories
Applying strategy: ensure_categories_are_tuples
Applied 3 strategies in 0.00 seconds
4421 datasets
    0 exchanges
    Links to the following databases:

    0 unlinked exchanges (0 types)

100%|████████████████████████████████████| 4421/4421 [00:00<00:00, 59619.06it/s]
Vacuuming database
Created database: ecoinvent-3.8-biosphere
Extracting XML data from 19565 datasets
Extracted 19565 datasets in 25.04 seconds
Applying strategy: normalize_units
Applying strategy: update_ecoinvent_locations
Applying strategy: remove_zero_amount_coproducts
Applying strategy: remove_zero_amount_inputs_with_no_activity
Applying strategy: remove_unnamed_parameters
Applying strategy: es2_assign_only_product_with_amount_as_reference_product
Applying strategy: assign_single_product_as_activity
Applying strategy: create_composite_code
Applying strategy: drop_unspecified_subcategories
Applying strategy: fix_ecoinvent_flows_pre35
Applying strategy: drop_temporary_outdated_biosphere_flows
Applying strategy: link_biosphere_by_flow_uuid
Applying strategy: link_internal_technosphere_by_composite_code
Applying strategy: delete_exchanges_missing_activity
Applying strategy: delete_ghost_exchanges
Applying strategy: remove_uncertainty_from_negative_loss_exchanges
Applying strategy: fix_unreasonably_high_l

100%|████████████████████████████████████| 19565/19565 [00:23<00:00, 835.84it/s]
Vacuuming database
Created database: ecoinvent-3.8-cutoff
Substituting Chlortoluron for Chlorotoluron
Skipping unmatched flow Cyfluthrin:(soil, agricultural)
Substituting Thiophanate-methyl for Thiophanat-methyl
Substituting Fluorochloridone for Flurochloridone
```
