<img src="images/sentierdev_logo.png">

# Need for change

The current way we work with industrial ecology data does not scale. Examples:

* ~Name systems~
* ~Searching for names~
* Unit process inventories and models

What has excited the community in the last few years?

* [ESSD](https://essd.copernicus.org/)
* [Climate TRACE](https://climatetrace.org/)
* [HESTIA](https://www.hestia.earth/)

*My* conclusion: Insanity is doing the same thing over and over again and expecting different results.

# The problems we face creating something new

* ~Finding~
* Maintaining
* Adapting

# Our proposal

<img src="images/overview.png">

# Data

We imagine a data storage organized around column-oriented formats and workflows. Each data object would have some metadata (lineage, license, logs) and a data frame.

If the data frame includes multiple columns, then the observations in each row are correlated.

Each column header is a term in the Sentier.dev vocabulary.

There are some columns which give the spatial, temporal, and technological specificity of the observations.

| timestamp | location IRI | model IRI | fuel consumption IRI | CO2 IRI | Particulate IRI |
| --- | --- | --- | --- | --- | --- |
| 2024-09-26 08:06:20Z | https://geonames.org/3208175/ | https://example.org/EASA.A.064 | ... | ... | ... |
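
As a minimal sketch of what such a data object could look like, the following uses only the Python standard library. The class names `Metadata` and `DataObject`, the `https://example.org/vocab/` namespace, and the fuel consumption value are all assumptions made for illustration; they are not part of any Sentier.dev API or vocabulary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metadata:
    lineage: tuple  # where the data came from
    license: str
    logs: tuple = ()


@dataclass(frozen=True)
class DataObject:
    metadata: Metadata
    # Column-oriented storage: one tuple of values per column header.
    # Each header is meant to be a vocabulary term (an IRI).
    columns: dict

    def __post_init__(self):
        # Every column must have one value per observation (row).
        lengths = {len(values) for values in self.columns.values()}
        if len(lengths) > 1:
            raise ValueError("All columns must have the same number of rows")

    def row(self, index):
        """Return one observation as a mapping from column header to value."""
        return {header: values[index] for header, values in self.columns.items()}


VOCAB = "https://example.org/vocab/"  # placeholder namespace (assumption)

obs = DataObject(
    metadata=Metadata(lineage=("hypothetical flight log",), license="CC-BY-4.0"),
    columns={
        VOCAB + "timestamp": ("2024-09-26T08:06:20Z",),
        VOCAB + "location": ("https://geonames.org/3208175/",),
        VOCAB + "model": ("https://example.org/EASA.A.064",),
        VOCAB + "fuel-consumption": (1.0,),  # made-up value, not a measurement
    },
)
```

Keeping the values of one row together is what preserves the correlation between columns; a real implementation would more likely wrap an existing data frame library than plain tuples.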

When we want to do a calculation, the data storage is queried for all relevant data at the spatiotemporal context and term specificity requested; the model can choose to broaden its query if not enough data is found initially.

Multiple data frames for a given input term can be returned; it is up to the model to define how to weight or resample these different data sets, though we will provide generic tools for resampling, data quality assessment, and gap filling.
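
The query and weighting behaviour described above could work roughly as follows. This is a sketch under stated assumptions: the `STORE` and `PARENT` lookups, the location names, and the numbers are all invented, and pooling every observation with equal weight is only one of many choices a model could make.

```python
from statistics import fmean

# Hypothetical parent lookup used to broaden the spatial context.
PARENT = {"city-x": "country-y", "country-y": "global"}

# Hypothetical store: (term, location) -> list of data sets (lists of values).
STORE = {
    ("fuel-consumption", "country-y"): [[2.0, 2.2], [2.6]],
    ("fuel-consumption", "global"): [[3.0, 3.1, 2.9]],
}


def query(term, location, min_datasets=1):
    """Return (location used, data sets), broadening until enough data is found."""
    current = location
    while current is not None:
        found = STORE.get((term, current), [])
        if len(found) >= min_datasets:
            return current, found
        current = PARENT.get(current)  # None once we run out of broader contexts
    return None, []


def pooled_mean(datasets):
    """One possible model choice: give every observation the same weight."""
    return fmean(value for dataset in datasets for value in dataset)


used_location, datasets = query("fuel-consumption", "city-x")
estimate = pooled_mean(datasets)
```

Here nothing is stored for `city-x`, so the query broadens to `country-y` and returns both data sets found there.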

One awesome benefit of this approach is how easy maintenance becomes: as new data enters the system, the continuous integration does an assessment across the whole database of what has changed, why it changed, and how much closer the database as a whole is to our defined validation goals. An example of a validation goal could be that we include enough emissions of PFAS to account for the marginal change in observed regional concentrations. We can then automatically approve the new data (and send the submitter the report on why their contribution mattered) if the changes are within our accepted norms, or escalate the data to a working group or the broader community.
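
To make the approve-or-escalate step concrete, here is a rough sketch. The `triage` function, the `Report` fields, and the 10% threshold are assumptions chosen for illustration; the actual accepted norms and validation goals are not defined yet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Report:
    gap_before: float  # distance to the validation goal before the new data
    gap_after: float   # distance to the validation goal after the new data
    decision: str      # "approve" or "escalate"


def triage(modelled_before, modelled_after, observed, max_relative_change=0.1):
    """Compare a database-wide total before and after a contribution."""
    gap_before = abs(observed - modelled_before)
    gap_after = abs(observed - modelled_after)
    if modelled_before == 0:
        relative_change = float("inf")  # no baseline to compare against
    else:
        relative_change = abs(modelled_after - modelled_before) / abs(modelled_before)
    decision = "approve" if relative_change <= max_relative_change else "escalate"
    return Report(gap_before, gap_after, decision)


small_change = triage(modelled_before=90.0, modelled_after=95.0, observed=100.0)
large_change = triage(modelled_before=50.0, modelled_after=95.0, observed=100.0)
```

The same report can be sent back to the submitter, since it records how far the contribution moved the database toward the validation goal.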

The volunteers currently supporting DdS have no interest in arguing about system models, and will start with data and models which can reproduce measured flows in the economy and into and out of the environment. We welcome people building other system models in our platform.

# Models

Models are mathematical formulae implemented in code which transform input data sets into output data sets. 

They should be well tested and documented, preferably with reference to literature which describes why they are constructed the way they are.

Models need to be implemented in a FOSS language which can run easily in a container. We will start with Python but are open to other languages. Our intent is to not have any step of the pipeline assume a certain language; rather, we will be strict on the data schema and validation constraints.

Models built on our base classes can also do the following:

* Local and semi-local sensitivity analysis (global sensitivity analysis across the whole database would be quite expensive and of little practical value, or at least that's what we think now)
* Describe themselves graphically via code introspection
* Perform outlier detection
* Give feedback to users on more

## Functional paradigm

Our design follows functional principles: functions are not allowed to change state, but only to produce new objects. We can't design efficient graph traversal without guarantees on graph structure.
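
In Python, one way to approximate this is with frozen dataclasses and functions that return new objects. The `Flow` class and `scale` function below are hypothetical names used only to illustrate the idea.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Flow:
    term: str
    amount: float
    unit: str


def scale(flow, factor):
    """Return a new Flow; the input object is left untouched."""
    return replace(flow, amount=flow.amount * factor)


original = Flow(term="https://example.org/vocab/co2", amount=2.0, unit="kg")
scaled = scale(original, 2.5)
```

Because `Flow` is frozen, an accidental `original.amount = ...` raises an error instead of silently changing shared state.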

## Ancillary models

Models can call additional models needed to satisfy the necessary properties and the spatial/temporal input requirements. For example, if an input is available in, or commonly imported from, another place, we call the transport model to supplement our production model. We have similar ancillary models for stocks/flows and certain types of storage. In this sense, a given model can change its local graph structure, but not the global graph structure. We're still thinking about how to implement this; you are free to read and contribute to these discussions:

* [Transport in the network](https://github.com/sentier-dev/sentier.dev/discussions/18)
* [Generic models](https://github.com/sentier-dev/sentier.dev/discussions/17)
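
The transport example above could look something like the sketch below. Both functions, their parameters, and all emission factors are made up for illustration, and the design is still under discussion in the threads linked above.

```python
def transport_model(amount_kg, distance_km):
    """Hypothetical ancillary model: emissions from moving goods."""
    return 0.0001 * amount_kg * distance_km  # made-up emission factor


def production_model(amount_kg, local_supply_kg, import_distance_km):
    """Hypothetical production model that calls the transport model when needed."""
    direct = 1.5 * amount_kg  # made-up emission factor
    imported_kg = max(amount_kg - local_supply_kg, 0.0)
    if imported_kg == 0:
        return direct
    # The model extends its own local graph with a transport step.
    return direct + transport_model(imported_kg, import_distance_km)


with_imports = production_model(100.0, local_supply_kg=60.0, import_distance_km=500.0)
local_only = production_model(100.0, local_supply_kg=100.0, import_distance_km=500.0)
```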

## Hybrid calculation structure

We think that we will end up running nonlinear functions (models) until a certain cutoff criterion is met, and then we will drop down to the linearized representation of supply chains (matrix). It could be that there is more than one matrix for different sectors or locations.
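
A toy version of this hybrid structure is sketched below. Everything in it is an assumption: the two models, their formulae, the use of the demanded amount as the cutoff criterion, and the `LINEAR_INTENSITY` values, which stand in for a matrix-based solution.

```python
# Hypothetical nonlinear models: demanded amount -> (direct emissions, inputs).
def steel(amount):
    # Made-up formula: emissions grow slightly less than linearly with scale.
    return 2.0 * amount ** 0.9, {"electricity": 0.5 * amount}


def electricity(amount):
    return 0.4 * amount, {"steel": 0.01 * amount}


MODELS = {"steel": steel, "electricity": electricity}

# Hypothetical linearized cradle-to-gate intensities (emissions per unit),
# standing in for the result of the matrix calculation.
LINEAR_INTENSITY = {"steel": 2.3, "electricity": 0.45}


def footprint(product, amount, cutoff=0.01):
    """Traverse the supply chain with models, then fall back to the linear result."""
    if cutoff <= 0:
        raise ValueError("cutoff must be positive, or the traversal never stops")
    if amount < cutoff:
        return LINEAR_INTENSITY[product] * amount
    direct, inputs = MODELS[product](amount)
    return direct + sum(footprint(p, a, cutoff) for p, a in inputs.items())


total = footprint("steel", 1.0)
```

In this example the traversal runs the steel and electricity models once each, then uses the linear intensity for the small steel input to electricity, which falls below the cutoff.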

# Group task

We are collecting models and data sources here: [https://github.com/sentier-dev/sentier.dev/discussions](https://github.com/sentier-dev/sentier.dev/discussions).

Your task is to suggest new data sources and models.

## Data

Your task is to imagine data that you think would be useful for LCA, MFA, IO, circular economy, other areas of industrial ecology, or related sectors. We would rather have too much data than too little. We are especially interested in data sets with many data points for a given model coefficient; this can include time series data.

In addition to classic IE data, we would love to have things like:

* Household consumption by product category
* Prices (any type)
* Population
* Trade volumes, with as much detail as possible
* Measured or estimated fluxes into air and water of elementary flows
* Geographic patterns of activity

[Data suggestion form](https://github.com/sentier-dev/sentier.dev/discussions/new?category=data-suggestion)

## Models

For the sake of this exercise, a model is a formal description of the logic behind a product system. This could be written in code, but could also be equations in a paper. We will need to adapt everything at first until we find patterns or tools to make this process easier, so we are open to all kinds of input sources.

Here are some of the models we have thought about using:

* [HESTIA models](https://gitlab.com/hestia-earth/hestia-engine-models)
* [US EPA WARM](https://www.epa.gov/warm)
* [carculator](https://github.com/romainsacchi/carculator)
* [GREET](https://www.energy.gov/eere/greet) (we can make GREET great ;)
* [PRELIM](https://www.ucalgary.ca/energy-technology-assessment/open-source-models/prelim)

[Model suggestion form](https://github.com/sentier-dev/sentier.dev/discussions/new?category=model-suggestion)