# Interacting with openLCA using Python
### The olca-schema module
<subtitle>Created: Monday, February 5, 2024</subtitle>

This notebook examines the olca-schema Python package, developed by GreenDelta for interacting with openLCA.

This notebook was written by Priyadarshini and Tyler W. Davis (2023–2024).
The requirements for executing this code are:

- Python 3.11 (or higher)
- Jupyter Lab 4.0 (or higher)
- olca-ipc 2.0 (or higher)
- PyYaml 6.0 (or higher)
- Pandas 2.0 (or higher)

The `olca-schema` package (referenced as `olca_schema` by the Python interpreter) is a dependency of `olca-ipc`, which provides the definitions and methods for working with the openLCA app through the IPC service.

In [None]:
import olca_schema as o

When imported, a package becomes an object of type 'module'; you should see it in your list of local variables (remember that we gave the package the alias 'o').

In [None]:
type(o)

In [None]:
dir()

The main interaction with openLCA is through the **root entities**.
These are class definitions in the `olca-schema` package.
Let's look to see what's available.

In [None]:
dir(o)

We see 'Actor' as one of the root entities.
Let's see what its documentation says about it.
A quick shorthand for accessing the documentation for a class, attribute, or method is to add a question mark (?) after it.

In [None]:
o.Actor?

> What happens when you add a second question mark after the class name above?

It looks like an Actor can have several optional **attributes**.
Let's try to instantiate an Actor object with several variations of parameters.

In [None]:
o.Actor() # no parameters

In [None]:
o.Actor(name='James Bond')

In [None]:
o.Actor(name='James Bond', city='London', country='UK')

> What happens with each initialization?
> What changes?
> What stays the same?

Let's try saving our Actor to a variable. 
Try using your information below.

In [None]:
a = o.Actor(
    name='', 
    city='', 
    country=''
)

Let's see what we made.

In [None]:
a

> What attributes and methods are associated with the Actor class?

In [None]:
dir(a)

We see the same attributes from the initialization and a few "to" methods (e.g., "to_dict," "to_json," and "to_ref").

Let's start with attributes.
For example, let's try to add our email address.

In [None]:
a.email

Nothing!

We know the expected data type is string (from the documentation); let's set it.

In [None]:
a.email = ''  # add your email

In [None]:
a

That seems to have done the trick.

It's still not a pretty view.
Let's take a closer look at those methods starting with "to_dict".

In [None]:
a.to_dict?

Unfortunately, we live in a world with less-than-perfect documentation.
But, fortunately, it is open-source; you can add the second question mark to see the code.

It looks like it turns the Actor class into a dictionary object.
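
The `?` and `??` shorthands are IPython features, so they only work in Jupyter or an IPython shell. As a rough sketch of the plain-Python equivalents, the standard library's `inspect` module can fetch the same docstring and source text. The example uses `json.dumps` purely as a stand-in target; in this notebook you could pass `o.Actor.to_dict` instead, assuming its source file is available on your system.

```python
import inspect
import json

# Roughly what `json.dumps?` displays: the cleaned-up docstring.
doc_text = inspect.getdoc(json.dumps)

# Roughly what `json.dumps??` adds: the source code of the function.
source_text = inspect.getsource(json.dumps)

# Show just the first line of each to keep the output short.
print(doc_text.splitlines()[0])
print(source_text.splitlines()[0])
```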

In [None]:
a.to_dict()

Yep. 

Now guess what the other two do.

In [None]:
a.to_json()

Note that `to_json` returns a string object.

Try printing it to see how it looks.
Notice how similar it is to a Python dictionary!
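
To make the comparison concrete, here is a minimal sketch using only the standard library's `json` module. The dictionary below is a made-up stand-in, not actual `to_dict` output; the point is that `json.dumps` yields a string and `json.loads` turns that string back into a dictionary.

```python
import json

# Hypothetical stand-in for the kind of dictionary `to_dict` returns.
actor_dict = {"@type": "Actor", "name": "James Bond", "city": "London"}

# Serialize to a JSON string; `indent` adds line breaks for readability.
actor_json = json.dumps(actor_dict, indent=2)
print(type(actor_json).__name__)
print(actor_json)

# Parsing the string recovers an equal dictionary.
round_trip = json.loads(actor_json)
print(round_trip == actor_dict)
```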

In [None]:
a.to_ref()

Reference classes are a non-root entity.

These were created because LCA references many other entities.
Think about a process with many input and output flows---each with its own flow definition and (if it's a product flow) its own reference process and each of those with their own inputs and outputs.

How cumbersome to manage all that information in just one process!

The solution?

Create small versions of each class with just enough info to "find the original" in the database.
Enter the `Ref` class, which can be built from any root entity.
This is the premise behind the 'LD' in JSON-LD.
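
As a rough illustration of the idea (not olca-schema's actual implementation), the sketch below uses plain dataclasses. `DemoEntity`, `DemoRef`, and their fields are hypothetical names chosen for this example; the real `Ref` class may carry different or additional fields.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class DemoRef:
    """A small pointer to a full entity (hypothetical stand-in)."""
    ref_type: str
    id: str
    name: str


@dataclass
class DemoEntity:
    """A full entity with several descriptive fields (hypothetical stand-in)."""
    name: str
    description: str = ""
    city: str = ""
    country: str = ""
    id: str = field(default_factory=lambda: str(uuid.uuid4()))

    def to_ref(self) -> DemoRef:
        # Copy only what is needed to look the entity up again.
        return DemoRef(ref_type=type(self).__name__, id=self.id, name=self.name)


entity = DemoEntity(name="James Bond", city="London", country="UK")
ref = entity.to_ref()

# A "database" keyed by ID lets the reference find the original.
database = {entity.id: entity}
print(database[ref.id] is entity)
```

The reference drops the descriptive fields (city, country, and so on) and keeps only what is needed for the lookup, which is what keeps a process with many exchanges manageable.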

> What is the minimum information needed to find a root entity?

## Unit Process

Empowered with our new knowledge, let's try to create a simple unit process for the classic PET Bottle Production (see [here](https://www.openlca.org/wp-content/uploads/2020/03/GreenDelta-Bottle-Tutorial_1.10.pdf) for reference).

This example uses three unit processes:

- PET Granulate Production (Unit Process)
    - Quantitative reference flow: Granulates (PET, HDPE, PP)
    - Inputs:
        - polyethylene terephthalate (PET) granulate: 60 g
        - polyethylene high density granulate (PE-HD): 4 g
        - polypropylene granulate (PP): 1 g
- PET Transport A (Unit Process)
    - Quantitative reference flow: Granulates (PET, HDPE, PP), transported
    - Inputs:
        - Granulates (PET, HDPE, PP): 0.065 kg
        - Transport in t*km: 0.065 kg \* 500 km
- PET Bottle Filling (Unit Process)
    - Quantitative reference flow: PET Bottle, filled
    - Inputs:
        - Granulates (PET, HDPE, PP), transported: 1 item
        - Drinking water: 1 kg
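
The quantities in the list above can be checked with a little arithmetic before any modeling. This sketch only converts the listed amounts; the variable names are illustrative.

```python
# Granulate inputs from the list above, in grams.
inputs_g = {"PET": 60, "PE-HD": 4, "PP": 1}

# Total granulate mass in kilograms (1 kg = 1000 g).
total_kg = sum(inputs_g.values()) / 1000
print(total_kg)

# Transport demand is mass times distance.
distance_km = 500
transport_kg_km = total_kg * distance_km
transport_t_km = transport_kg_km / 1000  # 1 t = 1000 kg
print(transport_kg_km, transport_t_km)
```

The total matches the 0.065 kg of granulates listed under PET Transport A. Note that the transport amount is given as kg times km, which is a factor of 1000 larger than the same quantity expressed in t*km.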

In [None]:
o.Process?

Read the documentation for this class and see what attributes and methods we have.

Let's start with the first process for PET Granulate Production.

In [None]:
gp_up = o.Process(name='PET Granulate Production')

In [None]:
gp_up.category = "A Water Bottle"

In [None]:
gp_up

One of the missing attributes is 'process_type' and we know this is supposed to be a unit process. The expected data type for this attribute is 'ProcessType'.

In [None]:
o.ProcessType

Enum is a data type for enumerations.
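
For readers new to enumerations, here is a minimal sketch with Python's built-in `enum` module. `DemoProcessType` and its string values are stand-ins defined only for this example; inspect `o.ProcessType` itself to see the real members.

```python
from enum import Enum


class DemoProcessType(Enum):
    """Stand-in enumeration (hypothetical; not the olca-schema class)."""
    LCI_RESULT = "LCI_RESULT"
    UNIT_PROCESS = "UNIT_PROCESS"


# Iterating over an Enum class yields its members in definition order.
member_names = [member.name for member in DemoProcessType]
print(member_names)

# Members can be looked up by attribute or by value.
print(DemoProcessType.UNIT_PROCESS is DemoProcessType("UNIT_PROCESS"))
```

The same iteration pattern, `[m.name for m in SomeEnum]`, works on any class derived from `Enum`.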

> What are the possible values?

In [None]:
gp_up.process_type = o.ProcessType.UNIT_PROCESS

In [None]:
gp_up

Unit processes have inputs and outputs.
In the world of openLCA, these are defined as **exchanges**.
Each exchange refers to a flow, is either an input or an output, and has an amount with a unit.

In [None]:
o.Exchange?

Let's look at the first exchange: 60 g of polyethylene terephthalate (PET) granulate.

We see that an exchange needs a flow.

In [None]:
o.Flow?

A flow has several optional attributes.
The name, flow properties, and flow type stand out as important for this example.
Let's create a new flow for this exchange.

In [None]:
pet_gran_flow = o.Flow(name="polyethylene terephthalate (PET) granulate")

In [None]:
pet_gran_flow

Let's look at Flow Type first.
We see that it is another enumeration data type.
These granules are a product flow.
Let's pick the right one.

In [None]:
dir(o.FlowType)

In [None]:
pet_gran_flow.flow_type = o.FlowType.PRODUCT_FLOW

In [None]:
pet_gran_flow

We know that product flows are technosphere flows; we can update the category attribute for materials production: plastics.

In [None]:
pet_gran_flow.category = "Materials production/Plastics"

In [None]:
pet_gran_flow

Next is the flow properties attribute, which is a list of FlowPropertyFactor objects.
The [FlowPropertyFactor](https://greendelta.github.io/olca-schema/classes/FlowPropertyFactor.html) is a means for converting between flow properties (e.g., unit conversion from mass to volume or from mass to energy).

In [None]:
o.FlowPropertyFactor?

The flow property factor has a flow property that it references.
A [flow property](https://greendelta.github.io/olca-schema/classes/FlowProperty.html) is a quantity used to express the amounts of a flow.

In [None]:
o.FlowProperty?

Again, we see several optional attributes to describe a flow property, namely: a category, type, name, and unit group.

A common flow property is the physical property, _Mass_.

A flow property includes a reference to a unit group.
A unit group is a collection of related units defined by a base unit (e.g., the kilogram) and additional units that can be derived from the base unit (e.g., milligram).
The derivation is a simple conversion factor (e.g., 0.000001 kg = 1 milligram).
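
As a small sketch of how such conversion factors work, the dictionary below maps a few mass units to kilograms. The factors are standard SI values, but the dictionary and function names are illustrative and not part of olca-schema.

```python
# Conversion factors to the base unit (kg) for a few mass units.
MASS_TO_KG = {"kg": 1.0, "g": 1e-3, "mg": 1e-6, "t": 1e3}


def to_base_unit(amount: float, unit: str) -> float:
    """Convert an amount in the given unit to kilograms."""
    return amount * MASS_TO_KG[unit]


def convert(amount: float, from_unit: str, to_unit: str) -> float:
    """Convert between two units of the same group via the base unit."""
    return to_base_unit(amount, from_unit) / MASS_TO_KG[to_unit]


print(to_base_unit(60, "g"))      # 60 g expressed in kg
print(convert(0.065, "kg", "g"))  # 0.065 kg expressed in g
```

Routing every conversion through the base unit means each unit needs only one factor, rather than one factor per pair of units.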

When you create a new database in openLCA, there is an option to include the default flow properties and unit groups, which is really helpful.
Unfortunately, there isn't an easy way to access these outside the software.

The Federal LCA Commons' Elementary Flow List has these defaults defined.
The code below reads the data from this list and generates lists of the olca-schema objects.

In [None]:
# CODE EXCERPT FROM ELECTRICITYLCI

import json
import io
import logging
import os
from zipfile import ZipFile

import requests


def _archive_json(data_list, file_path):
    """Write a list of dictionaries to a JSON file.

    Parameters
    ----------
    data_list : list
        A list of dictionaries.
    file_path : str
        A valid filepath to be written to (CAUTION: overwrites existing data)
    """
    logging.debug("Writing %d items to %s" % (len(data_list), file_path))
    out_str = ",".join([json.dumps(x.to_dict()) for x in data_list])
    out_str = "[%s]" % out_str
    with open(file_path, 'w') as f:
        f.write(out_str)


def _read_fedefl(data_dir="."):
    """Return list of GreenDelta's unit group and flow property objects.

    A local copy of the LCA Commons' Federal Elementary Flow List unit groups
    is either accessed (in eLCI's data directory) or created (using requests).

    Notes
    -----
    This method writes up to two files in electricitylci's data directory:

    -   flow_properties.json
    -   unit_groups.json

    Returns
    -------
    tuple
        A tuple of length two.
        First item is a list of 27 olca-schema UnitGroup objects.
        Second item is a list of 33 olca-schema FlowProperty objects.
    """
    url = (
        "https://www.lcacommons.gov/"
        "lca-collaboration/ws/public/download/json/"
        "repository_Federal_LCA_Commons@elementary_flow_list"
    )
    u_file = "unit_groups.json"
    u_path = os.path.join(data_dir, u_file)
    u_list = []

    # HOTFIX: add flow properties
    p_file = "flow_properties.json"
    p_path = os.path.join(data_dir, p_file)
    p_list = []

    if not os.path.exists(u_path) or not os.path.exists(p_path):
        # Pull from Federal Elementary Flow List
        logging.info("Reading data from Federal LCA Commons")
        r = requests.get(url, stream=True)
        r.raise_for_status()  # stop early if the download failed
        with ZipFile(io.BytesIO(r.content)) as zippy:
            # Find the unit groups, convert them to UnitGroup class
            for name in zippy.namelist():
                # Note there are only three folders in the zip file:
                # 'flow_properties', 'flows', and 'unit_groups';
                # we want the 27 JSON files under unit_groups
                # and the 33 JSON files under flow_properties.
                if name.startswith("unit") and name.endswith("json"):
                    u_dict = json.loads(zippy.read(name))
                    u_obj = o.UnitGroup.from_dict(u_dict)
                    u_list.append(u_obj)
                elif name.startswith("flow_") and name.endswith("json"):
                    p_dict = json.loads(zippy.read(name))
                    p_obj = o.FlowProperty.from_dict(p_dict)
                    p_list.append(p_obj)

        # Archive to avoid running requests again.
        _archive_json(u_list, u_path)
        logging.info("Saved unit groups from LCA Commons to JSON")

        _archive_json(p_list, p_path)
        logging.info("Saved flow properties from LCA Commons to JSON")

    # Only read locally if needed (i.e., if data wasn't just downloaded)
    if os.path.exists(u_path) and len(u_list) == 0:
        logging.info("Reading unit groups from local JSON")
        with open(u_path, 'r') as f:
            my_list = json.load(f)
        for my_item in my_list:
            u_list.append(o.UnitGroup.from_dict(my_item))

    if os.path.exists(p_path) and len(p_list) == 0:
        logging.info("Reading flow properties from local JSON")
        with open(p_path, 'r') as f:
            my_list = json.load(f)
        for my_item in my_list:
            p_list.append(o.FlowProperty.from_dict(my_item))

    return (u_list, p_list)

In [None]:
u_groups, f_props = _read_fedefl()

With these, we can take advantage of olca-schema's units module to find the correct flow property for our unit, kilograms.

In [None]:
import olca_schema.units as o_units

In [None]:
o_units.property_ref('kg')

Now we can find the FlowProperty class object by its **UUID**.

In [None]:
[x.id for x in f_props].index(o_units.property_ref('kg').id)

In [None]:
f_props[15]
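
The index returned above depends on the order of the downloaded list, so hard-coding it (here, 15) can break if the list changes. A minimal sketch of a lookup by ID follows; `find_by_id` is a hypothetical helper, and the `SimpleNamespace` objects with made-up IDs stand in for FlowProperty objects.

```python
from types import SimpleNamespace


def find_by_id(items, target_id):
    """Return the first item whose `id` matches, or None if absent."""
    return next((x for x in items if x.id == target_id), None)


# Stand-ins for FlowProperty objects; IDs and names are made up.
demo_props = [
    SimpleNamespace(id="id-001", name="Number of items"),
    SimpleNamespace(id="id-002", name="Mass"),
]

mass_prop = find_by_id(demo_props, "id-002")
print(mass_prop.name)
```

In this notebook, the equivalent call would be `find_by_id(f_props, o_units.property_ref('kg').id)`.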

We are finally ready to create our FlowPropertyFactor for our product flow!

In [None]:
pet_gran_flow.flow_properties = [
    o.FlowPropertyFactor(
        conversion_factor=1,
        flow_property=f_props[15],
        is_ref_flow_property=True,
    )
]

Now with our flow, we can create our exchange.

In [None]:
o.Exchange?

Note that we can access the flow property using the class dot operator and list indexing.

In [None]:
pet_gran_flow.flow_properties[0].flow_property

In [None]:
pet_gran_ex = o.Exchange(
    amount=0.06,
    flow=pet_gran_flow,
    flow_property=pet_gran_flow.flow_properties[0].flow_property.to_ref(),
    internal_id=2,
    is_avoided_product=False,
    is_input=True,
    is_quantitative_reference=False,
    unit=o_units.unit_ref('kg'),
)

In [None]:
pet_gran_ex

In one fell swoop, we can define the other input exchanges.

In [None]:
pe_hd_flow = o.Flow(
    name="polyethylene high density granulate (PE-HD)",
    flow_type = o.FlowType.PRODUCT_FLOW,
    category = "Materials production/Plastics",
    flow_properties = [
        o.FlowPropertyFactor(
            conversion_factor=1,
            flow_property=f_props[15],
            is_ref_flow_property=True,
        )
    ]
)

pe_hd_ex = o.Exchange(
    amount=0.004,
    flow=pe_hd_flow,
    flow_property=pe_hd_flow.flow_properties[0].flow_property.to_ref(),
    internal_id=3,
    is_avoided_product=False,
    is_input=True,
    is_quantitative_reference=False,
    unit=o_units.unit_ref('kg'),
)

In [None]:
pp_flow = o.Flow(
    name="polypropylene granulate (PP)",
    flow_type = o.FlowType.PRODUCT_FLOW,
    category = "Materials production/Plastics",
    flow_properties = [
        o.FlowPropertyFactor(
            conversion_factor=1,
            flow_property=f_props[15],
            is_ref_flow_property=True,
        )
    ]
)

pp_ex = o.Exchange(
    amount=0.001,
    flow=pp_flow,
    flow_property=pp_flow.flow_properties[0].flow_property.to_ref(),
    internal_id=4,
    is_avoided_product=False,
    is_input=True,
    is_quantitative_reference=False,
    unit=o_units.unit_ref('kg'),
)

In [None]:
pp_ex.to_dict()

And lastly, the output exchange.
Note that I reserved the internal_id of 1 for this exchange.

In [None]:
gran_flow = o.Flow(
    name="Granulates (PET, HDPE, PP)",
    flow_type = o.FlowType.PRODUCT_FLOW,
    category = "A Water Bottle",
    flow_properties = [
        o.FlowPropertyFactor(
            conversion_factor=1,
            flow_property=f_props[15],
            is_ref_flow_property=True,
        )
    ]
)

gran_ex = o.Exchange(
    amount=0.065,
    flow=gran_flow,
    flow_property=gran_flow.flow_properties[0].flow_property.to_ref(),
    internal_id=1,
    is_avoided_product=False,
    is_input=False,
    is_quantitative_reference=True,
    unit=o_units.unit_ref('kg'),
)

Now, add these exchanges to our process.

In [None]:
gp_up.exchanges = [
    gran_ex,
    pet_gran_ex,
    pe_hd_ex,
    pp_ex
]

In [None]:
gp_up

In [None]:
# Find the flow property for Item(s)
[x.id for x in f_props].index(o_units.property_ref('Item(s)').id)

In [None]:
# Find the flow property for kg*km
[x.id for x in f_props].index(o_units.property_ref("kg*km").id)

Note that Transport has an exchange with a provider!

In [None]:
gran_t_flow = o.Flow(
    name="Granulates (PET, HDPE, PP), transported",
    flow_type = o.FlowType.PRODUCT_FLOW,
    category = "A Water Bottle",
    flow_properties = [
        o.FlowPropertyFactor(
            conversion_factor=1,
            flow_property=f_props[1],
            is_ref_flow_property=True,
        )
    ]
)
t_flow = o.Flow(
    name="Transport in t*km",
    flow_type = o.FlowType.PRODUCT_FLOW,
    category = "Transport services/Other transport",
    flow_properties = [
        o.FlowPropertyFactor(
            conversion_factor=1,
            flow_property=f_props[11],
            is_ref_flow_property=True,
        )
    ]
)

trans_up = o.Process(
    name="PET Transport A",
    category="A Water Bottle",
    process_type=o.ProcessType.UNIT_PROCESS,
    is_infrastructure_process=False,
    exchanges=[
        o.Exchange(
            amount=1,
            flow=gran_t_flow,
            flow_property=gran_t_flow.flow_properties[0].flow_property.to_ref(),
            internal_id=1,
            is_avoided_product=False,
            is_input=False,
            is_quantitative_reference=True,
            unit=o_units.unit_ref('Item(s)'),
        ),
        o.Exchange(
            amount=0.065,
            flow=gran_flow,
            flow_property=gran_flow.flow_properties[0].flow_property.to_ref(),
            internal_id=2,
            is_avoided_product=False,
            is_input=True,
            is_quantitative_reference=False,
            default_provider=gp_up.to_ref(),  # NEW!
            unit=o_units.unit_ref('kg'),
        ),
        o.Exchange(
            amount_formula="0.065*500",
            flow=t_flow,
            flow_property=t_flow.flow_properties[0].flow_property.to_ref(),
            internal_id=3,
            is_avoided_product=False,
            is_input=True,
            is_quantitative_reference=False,
            unit=o_units.unit_ref('kg*km'),
        ),
    ],
)

Note that PET Bottle Filling also has an exchange with a provider!

In [None]:
pet_fill_flow = o.Flow(
    name="PET Bottle, filled",
    flow_type = o.FlowType.PRODUCT_FLOW,
    category = "A Water Bottle",
    flow_properties = [
        o.FlowPropertyFactor(
            conversion_factor=1,
            flow_property=f_props[15],
            is_ref_flow_property=True,
        )
    ]
)
dw_flow = o.Flow(
    name="drinking water",
    flow_type = o.FlowType.PRODUCT_FLOW,
    category = "Materials production/Water",
    flow_properties = [
        o.FlowPropertyFactor(
            conversion_factor=1,
            flow_property=f_props[15],
            is_ref_flow_property=True,
        )
    ]
)

bf_up = o.Process(
    name="PET Bottle Filling",
    category="A Water Bottle",
    process_type=o.ProcessType.UNIT_PROCESS,
    is_infrastructure_process=False,
    exchanges=[
        o.Exchange(
            amount=1.065,
            flow=pet_fill_flow,
            flow_property=pet_fill_flow.flow_properties[0].flow_property.to_ref(),
            internal_id=1,
            is_avoided_product=False,
            is_input=False,
            is_quantitative_reference=True,
            unit=o_units.unit_ref('kg'),
        ),
        o.Exchange(
            amount=1,
            flow=gran_t_flow,
            flow_property=gran_t_flow.flow_properties[0].flow_property.to_ref(),
            internal_id=2,
            is_avoided_product=False,
            is_input=True,
            is_quantitative_reference=False,
            default_provider=trans_up.to_ref(),
            unit=o_units.unit_ref('Item(s)'),
        ),
        o.Exchange(
            amount=1.0,
            flow=dw_flow,
            flow_property=dw_flow.flow_properties[0].flow_property.to_ref(),
            internal_id=3,
            is_avoided_product=False,
            is_input=True,
            is_quantitative_reference=False,
            unit=o_units.unit_ref('kg'),
        ),
    ],
)

Now we have a set of unit processes, flows, flow properties, and unit groups.
Let's gather together our flows!

In [None]:
%whos

In [None]:
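flow_list = [
    pet_gran_flow, pe_hd_flow, pp_flow, gran_flow,  # granulate production
    gran_t_flow, t_flow,                            # transport
    pet_fill_flow, dw_flow,                         # bottle filling
]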

Let's make process documentation for our processes!

In [None]:
o.ProcessDocumentation?

In [None]:
from datetime import datetime

cur_time = datetime.now().isoformat()

In [None]:
from datetime import timedelta

In [None]:
next_time = (datetime.now() + timedelta(days=365)).isoformat()

In [None]:
next_time

In [None]:
pdoc = o.ProcessDocumentation(
    data_collection_description="Based on the openlca.org GreenDelta Bottle Tutorial",
    data_documentor=a.to_ref(),
    data_generator=a.to_ref(),
    geography_description="United States",
    intended_application="For educational purposes only.",
    is_copyright_protected=False,
    project_description="The LCA for producing a plastic water bottle",
    reviewer=a.to_ref(),
    creation_date=cur_time,
    valid_from=cur_time,
    valid_until=next_time,
)

In [None]:
pdoc.to_dict()

Let's put together our processes and add the documentation.

In [None]:
bf_up.process_documentation = pdoc
gp_up.process_documentation = pdoc
trans_up.process_documentation = pdoc

p_list = [bf_up, gp_up, trans_up]

Now, let's export everything to a JSON-LD archive!

In [None]:
import olca_schema.zipio as zipio

In [None]:
my_jsonld = "a_water_bottle.zip"
with zipio.ZipWriter(my_jsonld) as writer:
    for x in u_groups:
        writer.write(x)
    for x in f_props:
        writer.write(x)
    for x in flow_list:
        writer.write(x)
    for x in p_list:
        writer.write(x)
    writer.write(a)
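
Before importing, it can be useful to peek inside the archive. The sketch below counts files per top-level folder using the standard `zipfile` module. To stay self-contained, it builds a tiny in-memory archive with made-up file names; the folder layout is an assumption modeled on the folders seen in the Federal LCA Commons download.

```python
import io
from collections import Counter
from zipfile import ZipFile


def count_by_folder(zip_source):
    """Count the files under each top-level folder of a zip archive."""
    with ZipFile(zip_source) as archive:
        # Skip directory entries, which end with a slash.
        names = [n for n in archive.namelist() if not n.endswith("/")]
    return Counter(n.split("/")[0] for n in names)


# Build a small in-memory archive; the file names are hypothetical.
buffer = io.BytesIO()
with ZipFile(buffer, "w") as archive:
    archive.writestr("flows/a.json", "{}")
    archive.writestr("flows/b.json", "{}")
    archive.writestr("unit_groups/c.json", "{}")

buffer.seek(0)
counts = count_by_folder(buffer)
print(dict(counts))
```

Passing the path `"a_water_bottle.zip"` instead of the buffer should summarize the exported file in the same way.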

Try opening our JSON-LD in openLCA!

- create a new empty database
- import "a_water_bottle.zip"
- overwrite all existing data