# Interpretation

This notebook shows some of the functionality of the [bw2analyzer](https://bitbucket.org/cmutel/brightway2-analyzer) library.

In [None]:
!conda install -y seaborn

In [None]:
import brightway2 as bw
import bw2analyzer as bwa
import pyprind
import seaborn as sns
import numpy as np

In [None]:
bw.projects.set_current("bw2_seminar_2017")

Let's get the inventory and characterized inventory matrices, and make sure we have an LCIA score to interpret!

In [None]:
lca = bw.LCA({bw.Database('ecoinvent 2.2').random(): 1}, bw.methods.random())
lca.lci()
lca.lcia()
lca.score

# Contribution analysis

Find the most damaging activities and biosphere flows.

`annotated_top_processes` will return a list of tuples: ``(lca score, supply amount, activity name)``.

In [None]:
ca = bwa.ContributionAnalysis()
ca.annotated_top_processes(lca, limit=5)

In [None]:
lca.demand

In [None]:
lca.method

We can also set a limit below one together with `limit_type='percent'`, which will filter by fraction of impact instead of number of activities; we can also return activity *keys* instead of names with `names=False`.

In [None]:
ca = bwa.ContributionAnalysis()
ca.annotated_top_processes(lca, names=False, limit=0.02, limit_type='percent')

And of course we can do the same for elementary flows:

In [None]:
ca = bwa.ContributionAnalysis()
ca.annotated_top_emissions(lca, limit=0.02, limit_type='percent')
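
The two kinds of limit can be sketched in plain Python, without a Brightway project. This is only an illustration: `top_contributions` is a hypothetical helper, the contributions are made up, and the sketch assumes that a fractional limit keeps every entry whose absolute contribution is at least `limit` times the total absolute contribution. The exact rule in `bw2analyzer` may differ.

```python
def top_contributions(scored, limit=5, limit_type="number"):
    """Rank (score, label) pairs by absolute score and apply a cutoff.

    Hypothetical helper for illustration; not part of bw2analyzer.
    """
    ranked = sorted(scored, key=lambda pair: abs(pair[0]), reverse=True)
    if limit_type == "number":
        # Keep a fixed number of entries
        return ranked[:limit]
    if limit_type == "percent":
        # Keep entries above a fraction of the total absolute contribution
        total = sum(abs(score) for score, _ in scored)
        return [pair for pair in ranked if abs(pair[0]) >= limit * total]
    raise ValueError("limit_type must be 'number' or 'percent'")


# Made-up contributions, for illustration only
toy = [
    (0.5, "electricity"),
    (6.0, "steel"),
    (-1.5, "recycling credit"),
    (2.0, "transport"),
]

top_two = top_contributions(toy, limit=2)
above_tenth = top_contributions(toy, limit=0.1, limit_type="percent")
```

Ranking by absolute value keeps large negative contributions (credits) visible instead of pushing them to the end of the list.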

In addition to contribution analysis, we can also look at individual activities (columns) or flows (rows); you (should have) learned to do this already :)
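
As a reminder of what rows and columns mean here, the following sketch uses a small made-up characterized inventory stored as nested lists. The flow names, activity names, and numbers are all assumptions for illustration. With a real `lca` object you would index `lca.characterized_inventory` instead.

```python
# Toy characterized inventory: rows are biosphere flows, columns are activities.
# All names and numbers are made up for illustration.
flows = ["CO2", "CH4", "phosphate"]
activities = ["steel", "transport"]
characterized = [
    [4.0, 1.5],  # CO2
    [1.0, 0.0],  # CH4
    [0.0, 0.5],  # phosphate
]

# Summing a row gives the total impact of one flow across all activities
flow_scores = {flow: sum(row) for flow, row in zip(flows, characterized)}

# Summing a column gives the impact caused directly by one activity
activity_scores = {
    activity: sum(row[col] for row in characterized)
    for col, activity in enumerate(activities)
}

# Both views add up to the same total score
total = sum(flow_scores.values())
```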

What if we want to group names together, i.e. to get the total impact for all "phosphates"? There isn't a built-in function for this, but it is relatively easy to do.

In [None]:
from collections import defaultdict

all_unique_names_and_their_keys = defaultdict(list)

for flow in bw.Database("biosphere3"):
    if flow.key in lca.biosphere_dict:
        all_unique_names_and_their_keys[flow['name']].append(flow.key)
    
all_unique_names_and_their_keys

In [None]:
all_unique_names_and_their_rows = {
    name: [lca.biosphere_dict[key] for key in keys] 
    for name, keys in all_unique_names_and_their_keys.items()
}

all_unique_names_and_their_rows

In [None]:
all_unique_names_and_their_scores = {
    name: [lca.characterized_inventory[row, :].sum() for row in rows]
    for name, rows in all_unique_names_and_their_rows.items()
}

all_unique_names_and_their_scores

Now I am going to switch the order, from `name, scores` to `total score, name`, because I want to sort by the impact. I will also ignore zeros, as they are uninteresting in this context.

By default, sorting will be smallest number first, so we say `reverse=True` to get highest number first.

In [None]:
sorted_scores = sorted(
    [(sum(scores), name) for name, scores in all_unique_names_and_their_scores.items()
     if sum(scores) != 0],  # ignore names with zero total impact
    reverse=True
)

sorted_scores[:10]

We can also encapsulate all this functionality in a single function (which should be added to `bw2analyzer`!)

In [None]:
from collections import defaultdict

def top_emissions_by_name(lca, biosphere_database='biosphere3'):
    names = defaultdict(list)

    for flow in bw.Database(biosphere_database):
        if flow.key in lca.biosphere_dict:
            names[flow['name']].append(
                lca.characterized_inventory[lca.biosphere_dict[flow.key], :].sum()
            )
    
    return sorted(
        [(sum(scores), name) for name, scores in names.items()], 
        reverse=True
    )

In [None]:
top_emissions_by_name(lca)[:5]
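
Once the scores are grouped by name, answering a question like "what about the phosphates?" is a matter of filtering the list. The sketch below uses made-up `(score, name)` pairs in the same shape as the function's return value. `filter_by_substring` is a hypothetical helper, not part of `bw2analyzer`.

```python
# Made-up (score, name) pairs, shaped like the output of top_emissions_by_name
results = [
    (3.2, "Carbon dioxide, fossil"),
    (0.8, "Phosphate"),
    (0.3, "Phosphorus"),
    (0.1, "Methane, fossil"),
]


def filter_by_substring(results, substring):
    """Keep the pairs whose name contains `substring`, ignoring case.

    Hypothetical helper for illustration; not part of bw2analyzer.
    """
    needle = substring.lower()
    return [(score, name) for score, name in results if needle in name.lower()]


phosphorus_related = filter_by_substring(results, "phosph")
phosphorus_total = sum(score for score, _ in phosphorus_related)
```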

# Graph the technosphere matrix

In [None]:
%matplotlib inline

In [None]:
from bw2analyzer.matrix_grapher import SparseMatrixGrapher

In [None]:
SparseMatrixGrapher(lca.technosphere_matrix).graph()

In [None]:
lca.technosphere_matrix

Try to do this as well for the biosphere matrix and characterization matrix.

We can also change the way the rows are ordered to be prettier:

In [None]:
SparseMatrixGrapher(lca.technosphere_matrix).ordered_graph()

# Econometric indicators

`bw2analyzer` provides the [Gini coefficient](https://en.wikipedia.org/wiki/Gini_coefficient), [Herfindahl index](https://en.wikipedia.org/wiki/Herfindahl_index), [Theil index](https://en.wikipedia.org/wiki/Theil_index), and [concentration ratio](https://en.wikipedia.org/wiki/Concentration_ratio).
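
To build some intuition for what these numbers measure, here is a minimal sketch of the textbook definitions in plain Python. It is not the `bw2analyzer` implementation, which may differ in details such as normalization. The function names, the default `number=4`, and the two toy datasets are assumptions for illustration. The sketch also assumes strictly positive values, because the Theil index takes a logarithm.

```python
import math


def gini(values):
    """Gini coefficient: 0 for equal values, close to 1 when one value dominates."""
    xs = sorted(values)
    n = len(xs)
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * sum(xs)) - (n + 1) / n


def herfindahl(values):
    """Sum of squared shares: 1/n for equal shares, 1 for a single contributor."""
    total = sum(values)
    return sum((x / total) ** 2 for x in values)


def theil(values):
    """Theil T index: 0 for equal values, at most ln(n). Needs positive values."""
    mean = sum(values) / len(values)
    return sum((x / mean) * math.log(x / mean) for x in values) / len(values)


def concentration_ratio(values, number=4):
    """Combined share of the `number` largest values."""
    return sum(sorted(values, reverse=True)[:number]) / sum(values)


# Two made-up datasets: evenly spread versus dominated by one value
equal = [1.0, 1.0, 1.0, 1.0]
skewed = [97.0, 1.0, 1.0, 1.0]

for label, data in [("equal", equal), ("skewed", skewed)]:
    print(label, gini(data), herfindahl(data), theil(data),
          concentration_ratio(data, number=1))
```

All four indices rise as the data become more concentrated, so they largely tell the same story from different angles.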

These indices can be applied to any matrix, or subsection of a matrix. Let's consider the Gini coefficient for all elements in the characterized inventory matrix. We will filter to only consider non-zero values.

In [None]:
all_values_in_characterized_inventory = lca.characterized_inventory.data
all_values_in_characterized_inventory.shape

In [None]:
non_zero_values = all_values_in_characterized_inventory != 0
non_zero_values.sum()

In [None]:
bwa.econ.gini_coefficient(all_values_in_characterized_inventory[non_zero_values])

A high Gini coefficient (very unequal contributions) is quite typical in LCA, as most activities and flows do not contribute to a given impact category, either because they are not characterized, or because the activities have no biosphere flows at all.

Let's also calculate our other econometric indices:

In [None]:
data = all_values_in_characterized_inventory[non_zero_values]

bwa.econ.herfindahl_index(data), bwa.econ.theil_index(data), bwa.econ.concentration_ratio(data)

There is not much we can learn from that... What if we examine everything in ecoinvent?

In [None]:
index_scores = {
    'gini': [],
    'concentration': [],
    'theil': [],
    'herfindahl': []
}

lca.switch_method(bw.methods.random())

for act in pyprind.prog_bar(bw.Database("ecoinvent 2.2")):
    lca.redo_lcia({act: 1})
    data = lca.characterized_inventory.data[lca.characterized_inventory.data != 0]
    if not data.shape[0]:
        continue
    index_scores['gini'].append(bwa.econ.gini_coefficient(data))
    index_scores['concentration'].append(bwa.econ.concentration_ratio(data))
    index_scores['theil'].append(bwa.econ.theil_index(data))
    index_scores['herfindahl'].append(bwa.econ.herfindahl_index(data))    

In [None]:
sns.distplot(index_scores['gini'])

In [None]:
sns.distplot(index_scores['concentration'])

In [None]:
sns.distplot(index_scores['theil'])

In [None]:
scores = np.array(index_scores['herfindahl'])
filtered_herfindahl = scores[scores > 0]

sns.distplot(filtered_herfindahl)