# Characterized cumulative elementary flows

This notebook performs a meta-analysis of the relative importance of cumulative, characterized elementary flows.

For each elementary (biosphere) flow, each activity in ecoinvent-3.9.1-cutoff, and each impact category implemented in 3.9.1, we calculate:

* The average normalized score across all activities per flow and impact category
* The median normalized score across all activities per flow and impact category
* The median rank across all activities per flow and impact category (the average rank is currently commented out in the code below)

## Set up environment and project with ecoinvent database

In [1]:
import bw2data as bd
import bw2calc as bc
import bw_processing as bwp
import numpy as np
import itertools
import tqdm
import pandas as pd
from pathlib import Path

In [2]:
bd.projects.set_current("ecoinvent-3.9-cutoff")

In [3]:
ROOT_DIR = Path.cwd().parent.resolve()
CACHE_DIR = ROOT_DIR / "Cache"
OUTPUT_DIR = ROOT_DIR / "Outputs"

In [4]:
methods = sorted(bd.methods)
len(methods)

762

In [5]:
ei = bd.Database("ecoinvent-3.9-cutoff")
len(ei)

21255

## Filter activities to only consider markets

This isn't perfect, but it is a reasonable attempt to only get "real" activities which produce useful products or services.

In [6]:
markets = sorted(
    [x for x in ei if x['name'].startswith("market for")],
    key=lambda x: (x['name'], x['reference product'], x['location'], x['unit'], x['code'])
)
len(markets)

6403

## Set up calculations

Reusing one `LCA` object for all calculations avoids rebuilding and refactorizing the technosphere matrix each time.

In [7]:
lca = bc.LCA({markets[0]: 1}, method=methods[0])
lca.lci()
lca.lcia()
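
The benefit of factorizing once can be sketched with SciPy on a toy system. This is only an illustration of the idea, not a description of what `bw2calc` does internally; the 3×3 matrix and the names `toy_technosphere`, `solve`, and `toy_supply` are made up for this example.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import factorized

# Hypothetical technosphere matrix: production on the diagonal,
# inputs from other activities as negative off-diagonal values
toy_technosphere = sparse.csc_matrix(np.array([
    [1.0, -0.2, 0.0],
    [0.0, 1.0, -0.5],
    [-0.1, 0.0, 1.0],
]))

# Factorize once; `solve` can then be called for any demand vector
solve = factorized(toy_technosphere)

toy_supply = np.zeros((3, 3))
for index in range(3):
    demand = np.zeros(3)
    demand[index] = 1.0  # one unit of the product of activity `index`
    toy_supply[:, index] = solve(demand)
```

Each call to `solve` reuses the stored factorization, so the per-demand cost is only the back-substitution.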



## Get all characterization matrices

Instead of building them each time for each activity, only build them once.

In [8]:
method_matrices = {}

In [9]:
for method_tuple in tqdm.tqdm(methods):
    lca.switch_method(method_tuple)
    method_matrices[method_tuple] = lca.characterization_matrix.copy()

100%|████████████████████████████████████████| 762/762 [00:02<00:00, 311.47it/s]
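
As a rough sketch of why this helps, assume each characterization matrix is a diagonal matrix with one characterization factor per elementary flow. The method names and numbers below are invented, and dense arrays are used for brevity where the real matrices are sparse. Once stored in a dictionary, each matrix can be applied to any inventory without being rebuilt.

```python
import numpy as np

# Hypothetical characterization factors for three elementary flows
toy_factors = {
    ("toy", "climate"): [1.0, 28.0, 0.0],
    ("toy", "acidification"): [0.0, 0.0, 1.3],
}

# Build each diagonal matrix once and keep it for later reuse
toy_matrices = {method: np.diag(values) for method, values in toy_factors.items()}

# Hypothetical inventory: rows are flows, columns are activities
toy_inventory = np.array([
    [2.0, 0.5],
    [0.1, 0.0],
    [0.0, 3.0],
])

# Total score per activity for every stored method
toy_scores = {
    method: (matrix @ toy_inventory).sum(axis=0)
    for method, matrix in toy_matrices.items()
}
```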


In [10]:
def calculate_inventory_results() -> np.ndarray:
    """Calculate inventory results, or load from cache"""
    cache_filepath = CACHE_DIR / "inventory-all-activities-391.npy"

    if cache_filepath.is_file():
        return np.load(cache_filepath)

    # Rows are elementary flows, columns are market activities
    results = np.zeros((lca.biosphere_matrix.shape[0], len(markets)), dtype=np.float32)

    for market_index, market in enumerate(tqdm.tqdm(markets)):
        lca.lci({market.id: 1})
        # Sum each flow over the whole supply chain to get its cumulative amount
        results[:, market_index] = lca.inventory.sum(axis=1).ravel()

    # Create the cache directory if needed so that saving does not fail
    cache_filepath.parent.mkdir(parents=True, exist_ok=True)
    np.save(cache_filepath, results)
    return results

In [11]:
results = calculate_inventory_results()
results.shape

(2420, 6403)

## Construct containers for results

In [12]:
results_mean = np.zeros((lca.biosphere_matrix.shape[0], len(methods)))
results_median = np.zeros((lca.biosphere_matrix.shape[0], len(methods)))
# Not helpful, or maybe calculation bug?
# results_mean_rank = np.zeros((lca.biosphere_matrix.shape[0], len(methods)))
results_median_rank = np.zeros((lca.biosphere_matrix.shape[0], len(methods)))

## Calculate indices for elementary flows

Note that we do the following:

* Take the absolute value of all characterized inventory results - we want importance, not the numeric sign
* Normalize by the total per activity and impact category
* Fix division by zero errors by setting `NaN` to zero.
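
The three steps above can be sketched on a small made-up matrix (rows are flows, columns are activities). The names `toy`, `absolute`, and `shares` are illustrative only; the middle column has a total of zero to show the `NaN` handling.

```python
import numpy as np

# Hypothetical characterized results: 3 flows x 3 activities
toy = np.array([
    [2.0, 0.0, -1.0],
    [-6.0, 0.0, 3.0],
    [0.0, 0.0, 0.0],
])

# 1. Importance, not sign
absolute = np.abs(toy)

# 2. Share of each flow in the activity total (column sum);
#    0 / 0 produces NaN, so silence the warning here
with np.errstate(divide="ignore", invalid="ignore"):
    shares = absolute / absolute.sum(axis=0)

# 3. Activities with a total of zero get a share of zero
shares[np.isnan(shares)] = 0
```

After this, every column with a nonzero total sums to one, and columns with a zero total contain only zeros.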

In [14]:
for method_index, method_tuple in enumerate(tqdm.tqdm(methods)):
    matrix = np.abs(method_matrices[method_tuple] @ results)
    normalized = (matrix / matrix.sum(axis=0))
    normalized[np.isnan(normalized)] = 0
    results_mean[:, method_index] = np.mean(normalized, axis=1)
    results_median[:, method_index] = np.median(normalized, axis=1)    
    
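    # Rank flows within each activity: the double argsort gives ascending
    # positions, and subtracting the number of flows makes 1 the largest share
    ranks = np.abs(normalized.argsort(axis=0).argsort(axis=0) - normalized.shape[0])
    # Flows which never contribute to this impact category get rank 0
    row_zero = normalized.sum(axis=1) == 0
    ranks[row_zero, :] = 0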
    # results_mean_rank[:, method_index] = np.mean(ranks, axis=1)
    results_median_rank[:, method_index] = np.median(ranks, axis=1)        

100%|█████████████████████████████████████████| 762/762 [08:26<00:00,  1.50it/s]
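
The ranking step relies on a double `argsort`, which can be checked on a small made-up example. The names `toy_shares`, `ascending`, and `toy_ranks` are illustrative only.

```python
import numpy as np

# Hypothetical normalized shares: 3 flows x 2 activities
toy_shares = np.array([
    [0.1, 0.7],
    [0.6, 0.2],
    [0.3, 0.1],
])

# The first argsort gives the sorting order, the second gives each
# element's position in that order (0 = smallest in its column)
ascending = toy_shares.argsort(axis=0).argsort(axis=0)

# Flip so that 1 is the largest share in each column
toy_ranks = toy_shares.shape[0] - ascending
```

Note that `argsort` assigns distinct positions to tied values, so flows with identical shares within one activity (for example, many zeros) receive different ranks in an order that is not meaningful. This may be worth keeping in mind when interpreting the rank statistics.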


## Export results

In [15]:
def get_labelled_results(array):
    formatted_results = []

    for index in tqdm.tqdm(range(lca.biosphere_matrix.shape[0])):
        node = bd.get_node(id=lca.dicts.biosphere.reversed[index])
        row = {
            'name': node['name'],
            'unit': node['unit'],
            'category_primary': node['categories'][0],
            'category_secondary': node['categories'][1] if len(node['categories']) > 1 else None,
            'id': node['code'],
        } | {
            "|".join(method_tuple): float(array[index, method_index]) or None
            for method_index, method_tuple in enumerate(methods)
        }
        formatted_results.append(row)

    formatted_results.sort(key=lambda x: (x['name'], x['category_primary'], x['category_secondary'] or '', x['unit']))
        
    return formatted_results
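
The row construction combines two small idioms: the dictionary union operator `|` (Python 3.9 or later) and `float(...) or None`, which turns a score of exactly zero into `None`. A minimal sketch with made-up method names and values follows; `toy_array`, `toy_methods`, and `label_row` are hypothetical.

```python
import numpy as np

# Hypothetical results: 2 flows x 2 impact categories
toy_array = np.array([
    [0.0, 0.25],
    [0.5, 0.0],
])
toy_methods = [("Method A", "climate change"), ("Method B", "acidification")]


def label_row(index):
    """Merge flow metadata with one column per impact category."""
    return {"name": f"flow {index}"} | {
        # 0.0 is falsy, so zero scores become None
        "|".join(method_tuple): float(toy_array[index, method_index]) or None
        for method_index, method_tuple in enumerate(toy_methods)
    }


toy_rows = [label_row(index) for index in range(toy_array.shape[0])]
```

When such rows are written to CSV, the `None` values should appear as empty cells, which keeps the exported files sparse and easy to scan.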

In [16]:
pd.DataFrame(get_labelled_results(results_mean)).to_csv(OUTPUT_DIR / "3.9.1-elementary-mean.csv", index=False)

100%|█████████████████████████████████████| 2420/2420 [00:00<00:00, 2705.37it/s]


In [17]:
pd.DataFrame(get_labelled_results(results_median)).to_csv(OUTPUT_DIR / "3.9.1-elementary-median.csv", index=False)

100%|█████████████████████████████████████| 2420/2420 [00:00<00:00, 2918.25it/s]


In [18]:
pd.DataFrame(get_labelled_results(results_median_rank)).to_csv(OUTPUT_DIR / "3.9.1-elementary-median-rank.csv", index=False)

100%|█████████████████████████████████████| 2420/2420 [00:00<00:00, 2934.77it/s]
