# Normalization and weighting - discussion with examples

*Note*: This notebook uses Brightway 2, not the development version 2.5.

In [None]:
import bw2data as bd
import bw2calc as bc
import bw2io as bi

## Set up a project with a database

In [2]:
if "biosphere3" not in bd.databases:
    bi.bw2setup()

In [3]:
FP = "/Users/chrismutel/Sync/Documents/LCA/Ecoinvent/3.9.1/cutoff/datasets"

In [4]:
if "ecoinvent-3.9-cutoff" not in bd.databases:
    ei = bi.SingleOutputEcospold2Importer(
        FP, 
        "ecoinvent-3.9-cutoff"
    )
    ei.apply_strategies()
    ei.statistics()    
    ei.write_database()

## Example of an LCA without weighting or normalization

The fundamental equation is:

$$h = CB \cdot diag( A^{-1} f )$$

Where:

* $h$ is the result matrix (biosphere flows by activities)
* $C$ is the characterization matrix (diagonal, biosphere flows by biosphere flows)
* $B$ is the biosphere matrix (biosphere flows by activities)
* $A$ is the technosphere matrix (products by activities)
* $f$ is the demand vector (products)
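To make the matrix dimensions concrete, here is a minimal sketch of the same equation in plain NumPy. The two-activity, three-flow system and all of its numbers are made up for illustration; this is not how Brightway builds or solves its matrices internally.

```python
import numpy as np

# Toy system: 2 products/activities, 3 biosphere flows (all numbers made up).
A = np.array([[1.0, -0.5],
              [0.0,  1.0]])      # technosphere: products by activities
B = np.array([[2.0, 0.0],
              [0.0, 1.0],
              [0.5, 0.5]])       # biosphere: flows by activities
C = np.diag([0.0, 3.0, 1.0])     # characterization: diagonal, flows by flows
f = np.array([0.0, 1.0])         # demand: one unit of the second product

s = np.linalg.solve(A, f)        # supply vector, A^{-1} f
h = C @ B @ np.diag(s)           # result matrix: flows by activities
score = h.sum()                  # total impact is the sum of all entries

print(h.shape, score)
```

The total score is the sum over all entries of $h$; keeping $h$ as a matrix preserves the contribution of each flow and each activity.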

In [5]:
act = bd.Database("ecoinvent-3.9-cutoff").random()

In [6]:
meth = bd.methods.random()

# Fix for reproducibility
meth = ('ReCiPe 2016 v1.03, endpoint (H) no LT',
 'human health no LT',
 'water use: human health no LT')

In [7]:
lca = bc.LCA(demand={act: 1}, method=meth)
lca.lci()
lca.lcia()



The `characterization_matrix` is diagonal, and has dimensions `(biosphere flows, biosphere flows)`. However, Brightway will filter flows to only include those actually used by the activities at hand, i.e. those for which there are entries in the `biosphere_matrix`.

In [8]:
lca.characterization_matrix.shape, len(bd.Database("biosphere3"))

((2420, 2420), 4709)
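The filtering idea can be sketched as follows. The flow keys, codes, and the `biosphere_dict` contents below are hypothetical, and the loop is only an illustration of the principle, not Brightway's actual implementation.

```python
import numpy as np

# Hypothetical flow keys and CFs; none of these codes come from a real database.
all_cfs = {
    ("biosphere3", "water-air"): 2.22e-06,
    ("biosphere3", "water-soil"): 1.0e-06,
    ("biosphere3", "unused-flow"): 5.0e-06,
}
# Hypothetical row indices of the flows that appear in the biosphere matrix.
biosphere_dict = {
    ("biosphere3", "water-air"): 0,
    ("biosphere3", "co2-air"): 1,
    ("biosphere3", "water-soil"): 2,
}

diagonal = np.zeros(len(biosphere_dict))
for key, cf in all_cfs.items():
    if key in biosphere_dict:          # skip CFs for flows without a matrix row
        diagonal[biosphere_dict[key]] = cf

C = np.diag(diagonal)                  # square, sized by the flows in use
print(C.shape, np.count_nonzero(C))
```

The matrix is sized by the flows in use, and flows without a characterization factor simply get a zero on the diagonal.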

The number of elements in the `characterization_matrix` varies by the impact category (`Method` in Brightway vocabulary). This one happens to only have a few:

In [9]:
lca.characterization_matrix

<2420x2420 sparse matrix of type '<class 'numpy.float64'>'
	with 4 stored elements in Compressed Sparse Row format>

Our random impact category is:

In [10]:
meth

('ReCiPe 2016 v1.03, endpoint (H) no LT',
 'human health no LT',
 'water use: human health no LT')

The data format for `Method` is a list of `(flow key, value)` tuples, where the flow key is itself a tuple of database name and code (`str`), and the value is a numeric coefficient or an uncertainty distribution.

In [11]:
for key, value in bd.Method(meth).load():
    print(bd.get_activity(key), value)

'Water' (cubic meter, None, ('air', 'lower stratosphere + upper troposphere')) 2.22e-06
'Water' (cubic meter, None, ('air', 'non-urban air or from high stacks')) 2.22e-06
'Water' (cubic meter, None, ('air',)) 2.22e-06
'Water' (cubic meter, None, ('air', 'urban air close to ground')) 2.22e-06


And we see these values in the `characterization_matrix`:

In [12]:
for key, value in bd.Method(meth).load():
    index = lca.biosphere_dict[key]
    print(lca.characterization_matrix[index, index], value)

2.2199999420990935e-06 2.22e-06
2.2199999420990935e-06 2.22e-06
2.2199999420990935e-06 2.22e-06
2.2199999420990935e-06 2.22e-06
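The small mismatch between the two columns is what one would expect if the matrix values pass through single-precision storage at some point. That is an assumption about Brightway 2 internals, not something shown in this notebook; the sketch below only checks that a float32 round trip produces an error of the same size.

```python
import numpy as np

cf = 2.22e-06
stored = float(np.float32(cf))            # round trip through single precision
relative_error = abs(stored - cf) / cf    # float32 keeps about 7 significant digits

print(stored, relative_error)
```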


## Specifying a normalization using `Normalization`

I am going to make up some values for the normalization factors. Let's say that an average person uses around 5,500 litres of water a day, or around 2,000,000 litres a year. Our unit for water is cubic meters, so this is 2,000 cubic meters per person per year. I can apply this as a total normalization factor in the same data format as the characterization factors:

In [13]:
n_total = bd.Normalization(("ReCiPe water use", "total"))
n_total.write([
    (key, 1 / 2000) # Division because we are normalizing to this usage amount
    for key, _ in bd.Method(meth).load()
])
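The unit conversion behind the `1 / 2000` factor can be checked with plain arithmetic. This sketch takes the made-up annual figure of 2,000,000 litres as the reference and derives the other numbers from it; the constant and variable names are only illustrative.

```python
LITRES_PER_CUBIC_METRE = 1000
DAYS_PER_YEAR = 365

annual_use_litres = 2_000_000                      # made-up reference value
annual_use_m3 = annual_use_litres / LITRES_PER_CUBIC_METRE
daily_use_litres = annual_use_litres / DAYS_PER_YEAR
normalization_factor = 1 / annual_use_m3           # per person per year

print(annual_use_m3, round(daily_use_litres), normalization_factor)
```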

In [14]:
lca_n1 = bc.LCA(demand={act: 1}, method=meth, normalization=("ReCiPe water use", "total"))
lca_n1.lci()
lca_n1.lcia()
lca_n1.normalize()

Our fundamental equation is now:

$$h = NCB \cdot diag( A^{-1} f )$$

Where:

* $N$ is the normalization matrix (diagonal, biosphere flows by biosphere flows)
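As a rough illustration with made-up numbers, the normalization step is one more diagonal matrix multiplied from the left. The toy matrices below are assumptions for the sake of the example and are unrelated to the ecoinvent data.

```python
import numpy as np

B = np.array([[2.0, 0.0],
              [0.0, 1.0]])             # biosphere: flows by activities
C = np.diag([3.0, 1.0])                # characterization factors
N = np.diag([1 / 2000, 1 / 2000])      # same normalization factor for every flow
s = np.array([0.5, 1.0])               # supply vector, A^{-1} f

characterized = C @ B @ np.diag(s)     # C B diag(A^{-1} f)
normalized = N @ characterized         # N C B diag(A^{-1} f)

print(characterized.sum(), normalized.sum())
```

Because $N$ is diagonal, each row (flow) of the characterized inventory is scaled by its own normalization factor.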

In [15]:
for key, value in bd.Normalization(("ReCiPe water use", "total")).load():
    index = lca_n1.biosphere_dict[key]
    print(lca_n1.normalization_matrix[index, index], value)

0.0005000000237487257 0.0005
0.0005000000237487257 0.0005
0.0005000000237487257 0.0005
0.0005000000237487257 0.0005


The results matrix $h$ is normally `characterized_inventory`, but after normalization it is `normalized_inventory`:

In [16]:
lca_n1.normalized_inventory

<2420x21238 sparse matrix of type '<class 'numpy.float64'>'
	with 4411 stored elements in Compressed Sparse Row format>

`characterized_inventory` is still available; it holds the values before normalization:

In [17]:
lca_n1.characterized_inventory

<2420x21238 sparse matrix of type '<class 'numpy.float64'>'
	with 4411 stored elements in Compressed Sparse Row format>

We can also have *flow-specific* normalization factors:

In [18]:
N_FACTORS = {
    ('air', 'lower stratosphere + upper troposphere'): 0.05 / 2000,
    ('air', 'non-urban air or from high stacks'): 0.2 / 2000,
    ('air',): 0.5 / 2000,
    ('air', 'urban air close to ground'): 0.25 / 2000,
}    

n_specific = bd.Normalization(("ReCiPe water use", "flow-specific"))
n_specific.write([
    (key, N_FACTORS[bd.get_activity(key)['categories']])
    for key, _ in bd.Method(meth).load()
])

In [19]:
lca_n2 = bc.LCA(demand={act: 1}, method=meth, normalization=("ReCiPe water use", "flow-specific"))
lca_n2.lci()
lca_n2.lcia()
lca_n2.normalize()

In [20]:
for key, value in bd.Normalization(("ReCiPe water use", "flow-specific")).load():
    index = lca_n2.biosphere_dict[key]
    flow = bd.get_activity(key)
    print(flow, lca_n2.normalization_matrix[index, index], value)

'Water' (cubic meter, None, ('air', 'lower stratosphere + upper troposphere')) 2.499999936844688e-05 2.5e-05
'Water' (cubic meter, None, ('air', 'non-urban air or from high stacks')) 9.999999747378752e-05 0.0001
'Water' (cubic meter, None, ('air',)) 0.0002500000118743628 0.00025
'Water' (cubic meter, None, ('air', 'urban air close to ground')) 0.0001250000059371814 0.000125


## Pre-calculating normalized characterization factors

Instead of specifying normalization separately, I think it can be easier to pre-multiply the characterization factors by the normalization factors. In cases where both characterization and normalization are static (i.e. don't have uncertainty), this doesn't really lose any information. Power users who have uncertainty distributions or other estimates of uncertainty, and want to e.g. perform comparative sensitivity analysis on weighting, normalization, and characterization, should stick with the more complete but more complicated formulation.
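The reason no information is lost in the static case is that diagonal matrices multiply element-wise, so $NC$ can be collapsed into a single set of factors. A minimal NumPy sketch, using the characterization and flow-specific normalization values from above and a made-up vector of flow totals:

```python
import numpy as np

cf = np.array([2.22e-06, 2.22e-06, 2.22e-06, 2.22e-06])   # CFs of the method above
nf = np.array([0.05, 0.2, 0.5, 0.25]) / 2000                # flow-specific factors
inventory = np.array([1.0, 4.0, 2.5, 0.5])                  # made-up flow totals, m3

# Two-step version: characterize, then normalize.
two_step = (np.diag(nf) @ np.diag(cf) @ inventory).sum()

# One-step version: pre-multiplied factors used as ordinary CFs.
combined_cf = cf * nf
one_step = (np.diag(combined_cf) @ inventory).sum()

print(two_step, one_step)
```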

In [21]:
method_normalized_name = meth + ("normalized",)
method_normalized_name

('ReCiPe 2016 v1.03, endpoint (H) no LT',
 'human health no LT',
 'water use: human health no LT',
 'normalized')

In [22]:
N_FACTORS = {
    ('air', 'lower stratosphere + upper troposphere'): 0.05 / 2000,
    ('air', 'non-urban air or from high stacks'): 0.2 / 2000,
    ('air',): 0.5 / 2000,
    ('air', 'urban air close to ground'): 0.25 / 2000,
}    

method_normalized = bd.Method(method_normalized_name)
method_normalized.register()
method_normalized.write([
    (
        key, 
        cf * N_FACTORS[bd.get_activity(key)['categories']]
    )
    for key, cf in bd.Method(meth).load()
])

We don't need to give a `normalization` or call `normalize`:

In [23]:
lca_n3 = bc.LCA(demand={act: 1}, method=method_normalized_name)
lca_n3.lci()
lca_n3.lcia()

The result is the same, up to floating point error:

In [29]:
lca_n3.score, lca_n2.normalized_inventory.sum()

(3.921549984454231e-15, 3.921550135307141e-15)
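The size of the discrepancy can be checked directly from the two printed values. The sketch below just redoes that arithmetic; the variable names are illustrative.

```python
import math

precomputed = 3.921549984454231e-15   # lca_n3.score, as printed above
two_step = 3.921550135307141e-15      # lca_n2.normalized_inventory.sum(), as printed above

relative_difference = abs(precomputed - two_step) / two_step

print(relative_difference)
print(math.isclose(precomputed, two_step, rel_tol=1e-6))
```

The relative difference is a few parts in $10^{8}$, far below anything that matters for interpreting an LCA score.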

This approach makes data management simpler with few, if any, practical tradeoffs.

# Weighting

Our fundamental equation is now:

$$h = w_{x}NCB \cdot diag( A^{-1} f )$$

Where:

* $w_{x}$ is the weighting coefficient for the impact category given in $C$

*Note:* In Brightway, weighting can be done [with *or without* normalization](https://github.com/brightway-lca/brightway2-calc/blob/bw2legacy/bw2calc/lca.py#L408).

The data format in Brightway 2 is a **one-element list**. In the development version, this will change, but the exact format is not yet decided.

Brightway **does not create a weighting matrix** in version 2; it does in the development version, to simplify everything to a single code path and expose the power available in datapackages.
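Without a weighting matrix, weighting reduces to multiplying the inventory by a scalar. The following is a sketch of that idea with a made-up normalized inventory, not a copy of the Brightway code:

```python
import numpy as np

normalized = np.array([[1.5e-09, 0.0],
                       [0.0, 5.0e-10]])   # made-up normalized inventory
weighting_data = [1_000_000]              # one-element list, as described above

weight = weighting_data[0]                # the single weighting coefficient
weighted = weight * normalized            # scalar times matrix

print(weighted.sum())
```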

In [34]:
weight = bd.Weighting(("ReCiPe", "water use bad"))
weight.register()
weight.write([1_000_000])

In [39]:
lca_n4 = bc.LCA(demand={act: 1}, method=method_normalized_name, weighting=("ReCiPe", "water use bad"))
lca_n4.lci()
lca_n4.lcia()



In [40]:
lca_n4.characterized_inventory.sum()

3.921549984454231e-15

In [42]:
lca_n4.weight()
lca_n4.weighting_calculation()  # This is a bug, should be done by `weight`

The new score is one million times the previous value, as this was our weight:

In [43]:
lca_n4.weighted_inventory.sum()

3.921549984454204e-09
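The factor of one million can be confirmed from the two printed sums. This is only a quick arithmetic check on the values shown above:

```python
unweighted = 3.921549984454231e-15   # characterized_inventory sum, as printed above
weighted = 3.921549984454204e-09     # weighted_inventory sum, as printed above

ratio = weighted / unweighted        # should be close to the weight of 1_000_000
print(ratio)
```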

## Pre-calculating weighted and normalized factors

We can do the same thing here, for the same reasons:

In [44]:
method_nw_name = meth + ("normalized & weighted",)
method_nw_name

('ReCiPe 2016 v1.03, endpoint (H) no LT',
 'human health no LT',
 'water use: human health no LT',
 'normalized & weighted')

In [45]:
N_FACTORS = {
    ('air', 'lower stratosphere + upper troposphere'): 0.05 / 2000,
    ('air', 'non-urban air or from high stacks'): 0.2 / 2000,
    ('air',): 0.5 / 2000,
    ('air', 'urban air close to ground'): 0.25 / 2000,
}    
WEIGHTING_FACTOR = 1_000_000

method_nw = bd.Method(method_nw_name)
method_nw.register()
method_nw.write([
    (
        key, 
        cf * N_FACTORS[bd.get_activity(key)['categories']] * WEIGHTING_FACTOR
    )
    for key, cf in bd.Method(meth).load()
])

In [46]:
lca_n5 = bc.LCA(demand={act: 1}, method=method_nw_name)
lca_n5.lci()
lca_n5.lcia()



In [48]:
lca_n4.weighted_inventory.sum(), lca_n5.score

(3.921549984454204e-09, 3.921549889138143e-09)