# Brightway program structure and technical basics

Note: This notebook assumes you have followed the [Seminar installation instructions](https://github.com/PoutineAndRosti/Brightway-Seminar-2017).

## Learning objectives

* Define the data and metadata files which make a project

### Setting up

#### Accessing Brightway2 libraries

The different modules in Brightway2 are Python libraries. This means that, to use them, you can use any environment from which you normally use Python (Idle, command prompt, Spyder or, as is the case today, Jupyter Notebooks).  

We will favour Jupyter Notebooks in this seminar because they allow us to integrate code and text. They also allow us to provide code snippets for you to complete.  

Note that the [Brightway2 installation package](https://docs.brightwaylca.org/installation.html) installs Brightway2 in a separate [Conda environment](https://conda.io/docs/using/using.html). This isolates Brightway2 from your other Python installations. It does, however, require you to activate the bw2 environment. You can do this the same way you [normally activate Conda environments](https://conda.io/docs/using/envs.html#change-environments-activate-deactivate), or by executing the `bw2-env.bat` batch file installed in your `bw2-python` directory (located at `C:\bw2-python` in Windows).  

The `bw2-python` directory also offers two other ways to run Brightway2: via IPython (run the `bw2-ipython.bat` file) or via Jupyter Notebooks (`bw2-notebook.bat`).  

For this course, you should run `bw2-notebook.bat` and open the Notebooks (such as this one), allowing you to directly run the code and get some hands-on experience. 

Like all other Python packages, you need to `import` Brightway2 modules. We will here import everything from Brightway2:

In [1]:
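import brightway2 as bw      # gives access to e.g. bw.projects
from brightway2 import *     # also exposes Database, methods, LCA, etc. directly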



### Projects, set-up

The top-level container in Brightway2 is the project (see [here](https://docs.brightwaylca.org/intro.html#projects) for a description and [here](https://docs.brightwaylca.org/technical/bw2data.html#projects) for the docs). A project contains LCI databases, LCIA methods and other less often used objects. Objects from one project do not interact with objects within other projects. By analogy, projects are like databases in openLCA and SimaPro.  
When you first launch Brightway2, you will be in the `default` project. You can check this using the `current` property of the `projects` object: 

In [2]:
bw.projects.current

'default'

Let's create a new project for this seminar, unsurprisingly called "bw2_seminar_2017". There are two ways of doing this:  
* `projects.create_project('bw2_seminar_2017')` will create the project, but you will remain in your current project.
* `projects.set_current('bw2_seminar_2017')` will switch you to the project passed as argument, and create it first if it doesn't exist.  Let's do the latter:

In [3]:
# The name of the project is entered as string; 
# it doesn't really have any restrictions, so can include spaces, 
# special characters, other languages, or even emoji.

bw.projects.set_current('bw2_seminar_2017') 

You can always see what projects you have on your computer by running `list(bw.projects)`. Unless you have worked with Brightway2 before on your computer, your list should contain two projects: 'default' and 'bw2_seminar_2017'.

In [None]:
# Exercise: list the projects on your computer.


As with all Python modules, you can get additional information on the `projects` object and its associated properties and methods by typing `help(projects)`. The [docs](https://docs.brightwaylca.org/technical/bw2data.html#projects) are of course more verbose.

One property of `projects` is its location, given by `projects.dir`:

In [None]:
bw.projects.dir

Looking at what is inside:  
<img src="project_folder_before_setup.JPG">

All the directories are empty except the `lci` directory, which contains an empty database.

All in all, the project takes up 4KB.  
It's now time to start populating the project.

The first thing you should do is add elementary flows and LCIA methods. This is done by running the `bw2setup` function:

In [None]:
bw2setup()

The output tells us that `bw2setup` created some very useful things:  
  - a database called "biosphere3", which contains elementary flows (called biosphere exchanges in Brightway2)  
  - 718 impact assessment methods  
  
It also created a `mapping` between the imported exchanges and integers: more on this later.  
The whole directory now takes up 125MB.

Looking at the contents of the `databases.db` database, we see we have imported 4029 exchanges:  
<img src="database_after_setup_data.JPG">

While it is not impossible to interact with the data at this level, you probably never will unless you are developing some funky program. Instead, it is strongly recommended to learn to work with abstractions. Let's explore some now:

#### Biosphere database  
The data in Brightway is stored in databases. When you run `bw2setup()`, the first database that is created is the 'biosphere3' database, as mentioned above.  
You can always list the databases inside a project by simply typing `databases`. This accesses the `databases.json` file in your `projects.dir` (I learned the latter by typing `databases?`; you should try it too).

In [None]:
databases

To actually access a database, you need to use `Database` (again, you can type `Database?` for more information; this is the last time I'll mention it).

In [None]:
Database('biosphere3')

It doesn't actually return anything other than information about the Backend.  
However, there are many properties and functions associated with this database object.  These are found [here](https://docs.brightwaylca.org/technical/bw2data.html). We can also have a look through the autocomplete. Let's assign the database to a variable:

In [None]:
my_bio = Database('biosphere3')

Let's check the type of `my_bio`:

In [None]:
type(my_bio)

Let's check its length:

In [None]:
len(my_bio)

This is exactly the number of items we saw had been added to `databases.db`.

If you type `my_bio.` and press Tab, you should get a list of properties and methods associated with database objects. Try this now:

In [None]:
my_bio.        #Type my_bio. and click tab. Have a look at the different properties and objects

Some of the more basic ones we will be using now are :  
  - random() - returns a random activity in the database
  - get(*code*) - returns an activity, but you must know the activity code
  - load() - loads the whole database as a dictionary.
  - make_searchable - allows searching of the database (by default, it is already searchable)
  - search - search the database  
  
Let's start with random:

In [None]:
my_bio.random()

This returns a biosphere activity, but without assigning it to a variable, there is not much we can do with it directly.  
Let's assign it to a variable:

In [None]:
random_biosphere = my_bio.random()
random_biosphere

In [None]:
type(random_biosphere)

To see what it contains, we can convert it to a dictionary:

In [None]:
random_biosphere.as_dict()

We can see that the activities in the biosphere3 database have unique codes, which we can use with the `get` function:

In [None]:
my_bio.get(random_biosphere['code'])

Activities can also be "gotten" via `get_activity`, but the argument is then a tuple with the database name and the activity code:

In [None]:
get_activity(('biosphere3', 'ffcd4d88-aeb9-491c-ae8c-98838ed38b4d'))

In [None]:
random_biosphere.key

If we are looking for a specific elementary flow, we can use `search`:

In [None]:
Database('biosphere3').search('carbon dioxide')

The database object is also iterable, allowing "home-made" searches through list comprehensions.

In [None]:
[act for act in my_bio if 'Carbon dioxide' in act['name'] 
                                            and 'fossil' in act['name']
                                            and 'urban air' in str(act['categories'])
         ]

In [None]:
act_I_want = [act for act in my_bio if 'Carbon dioxide' in act['name'] 
                                            and 'fossil' in act['name']
                                            and 'urban air' in str(act['categories'])
         ][0]

In [None]:
act_I_want.as_dict()['code']
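
Note that indexing a list comprehension with `[0]` raises an `IndexError` when nothing matches. Here is a minimal sketch of a more forgiving pattern, using plain dictionaries as stand-ins for biosphere flows. The flow entries and the helper name `find_first` are invented for illustration and are not part of Brightway2:

```python
# Plain dicts stand in for Brightway activities; the entries are made up.
flows = [
    {"name": "Carbon dioxide, fossil", "categories": ("air", "urban air close to ground"), "code": "a1"},
    {"name": "Carbon dioxide, fossil", "categories": ("air",), "code": "a2"},
    {"name": "Methane, fossil", "categories": ("air", "urban air close to ground"), "code": "a3"},
]

def find_first(candidates, name_parts, category_part):
    """Return the first flow matching all name parts and the category, or None."""
    matches = (
        flow for flow in candidates
        if all(part in flow["name"] for part in name_parts)
        and category_part in str(flow["categories"])
    )
    return next(matches, None)

hit = find_first(flows, ["Carbon dioxide", "fossil"], "urban air")
miss = find_first(flows, ["Dinitrogen monoxide"], "urban air")
print(hit["code"], miss)
```

The same generator expression should work when iterating over `my_bio`, since activities support `act['name']` lookups as in the cells above.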

In [None]:
# Exercise: look for and assign to a variable an emission of nitrous oxide emitted to air in the "urban air" subcompartment.

Let's leave the biosphere database here for now.

#### Methods

`bw2setup()` also installed LCIA methods.

In [None]:
methods

One can load a random method:

In [None]:
methods.random()

In [None]:
type(methods.random())

Here, the random method returns the tuple by which the method is identified. To get to an actual method, the following syntax is used:

In [None]:
Method(methods.random())

Of course, a random method is probably not useful except to play around. To find an actual method, one can again use list comprehensions. Let's say I am interested in using the IPCC2013 100 years method:

In [None]:
[m for m in methods if "IPCC" in str(m) and ("2013") in str(m) and "100" in str(m)]

I am interested in the last of these, and will assign it to a variable

In [None]:
ipcc2013_name = [m for m in methods if "IPCC" in str(m) and ("2013") in str(m) and "100" in str(m)][2]
ipcc2013_name

In [None]:
type(ipcc2013_name)

In [None]:
ipcc_2013_method = Method(ipcc2013_name)

In [None]:
type(ipcc_2013_method)

Again, there are a bunch of methods and attributes associated with a method object. You can access these by typing `ipcc_2013_method.` and pressing Tab.  
For example, the name and metadata:

In [None]:
ipcc_2013_method.name

In [None]:
ipcc_2013_method.metadata

In [None]:
ipcc_2013_method.metadata['unit']

Let's use the `load` method to see what is in the object:

In [None]:
ipcc_2013_method.load()

This contains tuples of (elementary flow, characterization factor). I can make this more human-readable by doing something like this:

In [None]:
[(get_activity(ef[0])['name'], ef[1]) for ef in ipcc_2013_method.load()]
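
The same idea can be sketched without a Brightway project at hand. In the toy example below, a plain dictionary plays the role of `get_activity`, and the codes, names and factors are all invented for illustration (they are not real characterization factors). It also sorts the result so that the largest factors come first:

```python
# Invented stand-ins: the codes, names and factors below are illustrative only.
flow_names = {
    ("biosphere3", "code-co2"): "Carbon dioxide, fossil",
    ("biosphere3", "code-ch4"): "Methane, fossil",
}
loaded_cfs = [
    (("biosphere3", "code-co2"), 1.0),
    (("biosphere3", "code-ch4"), 30.0),
]

# Replace each key by its name, then list the largest factors first.
readable = sorted(
    ((flow_names[key], cf) for key, cf in loaded_cfs),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, cf in readable:
    print(f"{name}: {cf}")
```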

Enough said for now about methods.

### LCI databases

There is much information on the structure of LCI databases in Brightway2 [here](https://docs.brightwaylca.org/intro.html#inventory-databases), [here](http://nbviewer.jupyter.org/urls/bitbucket.org/cmutel/brightway2/raw/default/notebooks/Databases.ipynb) and [here](https://docs.brightwaylca.org/technical/bw2data.html#databases).  Probably the easiest way to learn about them, however, is to import one and have a look.  

Here is the code to import the ecoinvent v3.3 database. Don't do it though, not now: it takes too long, and we will be unable to do anything else:

In [None]:
fpei33 = r'C:\Users\pasca\Dropbox (MAGI)\temp\ecoinvent33_cutoff\datasets'

In [None]:
ei33 = SingleOutputEcospold2Importer(fpei33, 'ecoinvent 3.3 cutoff')
ei33.apply_strategies()
ei33.statistics()

In [None]:
ei33.write_database()

Let's instead import ecoinvent v2.2:

In [None]:
fpei22 = r'E://datasets'

In [None]:
ei22 = SingleOutputEcospold1Importer(
        fpei22,
        'ecoinvent 2.2'
    )
ei22.apply_strategies()
ei22.statistics()
ei22.write_database()

Importers for LCI databases in other formats are found [here](https://bitbucket.org/cmutel/brightway2-io/src/211f748e7b9987aef452a1ead1f483cc0b4bc25c/bw2io/importers/?at=default).

You can check that the database has actually been added to your project: 

In [None]:
databases

Note: the `ei22` (or `ei33`) object created above is not the actual database, but an object used strictly for importing. 

In [None]:
type(ei22)

To access the actual database, you need to use `Database`: 

In [None]:
#Uncomment the one you actually imported. 

Database('ecoinvent 2.2')
# Database('ecoinvent 3.3 cutoff')

This is a more advanced topic, but note that there are alternative backends. See [here](https://docs.brightwaylca.org/technical/bw2data.html#inventory-data-backends).

Let's assign the database to a variable and see what we can do:

In [None]:
#Uncomment the one you actually imported. 

eidb = Database('ecoinvent 2.2')
# eidb = Database('ecoinvent 3.3 cutoff')

In [None]:
# Check the length of the database:
len(eidb)

Again, we can get an idea of useful methods and attributes by typing eidb. and Tab. Do this now.

In [None]:
eidb. #Press tab!

One can again load the entire database to iterate over activities or exchanges within the database. This is quite a big object, but your computers can take it.

In [None]:
eidb_loaded = eidb.load()

However, you often will not need to do that at all. The most common interaction with the database object is to access activities, add activities, save, etc. although this really depends on what you are doing with Brightway2...

#### Activities and exchanges

In the context of LCI databases, activities are the nodes "within the technosphere". They are therefore the columns in the technosphere matrix $A$.  
There are different ways to get access to an activity. Let's use the `random()` method for now to explore a random activity in the ecoinvent database.

In [None]:
random_act = eidb.random()

In [None]:
random_act

In [None]:
type(random_act)

To see what is stored in an activity object, let's convert our random activity into a dictionary: 

In [None]:
random_act.as_dict()

Notice one important thing: no exchanges!  

Exchanges are the edges between nodes. These can be the edges between two activities within the technosphere (an element $a_{ij}$ of matrix $A$) or an edge between an activity in the technosphere and an activity in the "biosphere" (an element $b_{kj}$ of the biosphere matrix $B$).

One can however iterate through the exchanges. At this point, it is actually the best way to get to an exchange:

In [None]:
# All exchanges:
[exc for exc in random_act.exchanges()]

One could also decide to only iterate through the technosphere exchanges, biosphere exchanges or production exchanges using, respectively, `random_act.technosphere()`, `random_act.biosphere()` and `random_act.production()`.  

In [None]:
# Biosphere exchanges (i.e. elementary flows)

# Production exchanges

# Technosphere exchanges

Let's look at one of these exchanges by assigning one to a variable and exploring it:

In [None]:
random_exchange = [exc for exc in random_act.exchanges()][2]

In [None]:
type(random_exchange)

Again, you can have an idea of what is readily accessible in terms of methods and attributes by typing `random_exchange.` + Tab.

In [None]:
# random_exchange.   (type this, then press Tab)

Let's see what makes up an exchange by converting our random exchange to a dictionary.

In [None]:
random_exchange.as_dict()

Of prime interest to identify the exchange:  
  - The `input` is the activity the exchange is originating from
  - The `output` is the activity the exchange is terminating in  
    - If `input` == `output`, the exchange is actually the reference flow (production exchange)  
    - If the exchange is a "biosphere exchange" (i.e. an elementary flow), then its `input` will be in the biosphere database.
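
As a rough sketch of this logic, the function below classifies exchange dictionaries from their `input` and `output` keys alone. The keys are invented, the helper name `classify_exchange` is hypothetical, and the rule "anything whose input sits in `biosphere3` is a biosphere exchange" is a simplifying assumption. Brightway exchange dictionaries also carry a `type` entry, which is the more direct thing to check in practice.

```python
def classify_exchange(exc, biosphere_db="biosphere3"):
    """Classify an exchange dict from its (database, code) input and output keys."""
    if exc["input"] == exc["output"]:
        return "production"
    if exc["input"][0] == biosphere_db:
        return "biosphere"
    return "technosphere"

# Invented keys for illustration.
activity = ("ecoinvent 2.2", "act-1")
exchanges = [
    {"input": activity, "output": activity, "amount": 1.0},
    {"input": ("ecoinvent 2.2", "act-2"), "output": activity, "amount": 0.5},
    {"input": ("biosphere3", "flow-1"), "output": activity, "amount": 2.0},
]
kinds = [classify_exchange(exc) for exc in exchanges]
print(kinds)
```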

#### Our first LCA!

Brightway has a so-called LCA object. It is instantiated using `LCA(args)`. The only required argument is a functional unit, described by a dictionary with keys = activities and values = amounts (more [here](https://docs.brightwaylca.org/lca.html#specifying-a-functional-unit)). A second argument that is often passed is an LCIA method, passed using the method tuple.  
Let's create our first LCA object using our random activity and our IPCC method.  

In [None]:
myFirstLCA = LCA({random_act:1}, ('IPCC 2013', 'climate change', 'GWP 100a'))

We can now explore the methods and properties of the LCA object:

In [None]:
myFirstLCA. #Press Tab

Let's explore a few:

#### Demand

In [None]:
myFirstLCA.demand

To access the actual activity from the demand, you would do this:

In [None]:
list(myFirstLCA.demand.keys())[0]

In [None]:
demanded_act = list(myFirstLCA.demand.keys())[0]

In [None]:
demanded_act == random_act

There are also other attributes that have simply not been built yet: 

In [None]:
myFirstLCA.demand_array

In [None]:
myFirstLCA.score

This is because the actual matrices have not yet been built. Running `myFirstLCA.lci()` will:
 - attribute row and column numbers to all elements in our $A$ and $B$ matrices and store these in a parameter array (NumPy structured array): the processed data.
 - build coordinate (COO) matrices based on this information: the actual matrices.  
 
The turning of the processed data (structured arrays) into matrices is described [here](https://docs.brightwaylca.org/lca.html#building-matrices).  
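
To give a feel for this step, here is a minimal sketch that turns a small structured array of (row, col, amount) records into a matrix. The records are invented, a dense NumPy array is used instead of a sparse matrix for brevity, and deciding the sign from `row == col` is a simplifying assumption made for this toy case (inputs are entered as negative values, production as positive):

```python
import numpy as np

# Invented processed data: one record per exchange, with matrix coordinates.
params = np.array(
    [(0, 0, 1.0), (1, 1, 1.0), (1, 0, 0.5)],
    dtype=[("row", np.int64), ("col", np.int64), ("amount", np.float64)],
)

n = 2
A = np.zeros((n, n))
# Production exchanges (row == col) stay positive; inputs are entered as negatives.
signs = np.where(params["row"] == params["col"], 1.0, -1.0)
# np.add.at accumulates amounts, so repeated coordinates would be summed.
np.add.at(A, (params["row"], params["col"]), signs * params["amount"])
print(A)
```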

Let's run the lci() method now:

In [None]:
myFirstLCA.lci()

Now we have access to many other attributes and methods.

**Demand array**, the $f$ in $As=f$

The demand array is a numpy array, where all elements are = 0 except for the ones specified in the functional unit.

In [None]:
myFirstLCA.demand_array

In [None]:
type(myFirstLCA.demand_array)

In [None]:
myFirstLCA.demand_array.shape

In [None]:
myFirstLCA.demand_array.sum()

So where is this "1"? This is where we need to start talking about indices. The row and column indices are stored in LCA-specific dictionaries. For example, we have product_dict that links the types of products used in our LCA (rows in the $A$ matrix) with the row numbers in the actual $A$ matrix that Brightway built.

In [None]:
myFirstLCA.product_dict

There are three such dictionaries: 
 - `activity_dict`: Columns in the technosphere matrix $A$ or biosphere matrix $B$
 - `product_dict` : Rows in the technosphere matrix $A$  
 - `biosphere_dict`: Rows in the biosphere matrix $B$

In passing, note that our (square) $A$ matrix has the same row and column dimensions, in other words:

In [None]:
myFirstLCA.activity_dict == myFirstLCA.product_dict

So, to our question (where is this "1"), we need to use the `activity_dict` to find out. The key we need is the (database, code) tuple of our demand:

In [None]:
demand_database = list(myFirstLCA.demand.keys())[0]['database']
demand_code = list(myFirstLCA.demand.keys())[0]['code']
(demand_database, demand_code)

In [None]:
row_of_demand = myFirstLCA.activity_dict[(demand_database, demand_code)]
row_of_demand # Row number of our demand vector containing the functional unit.

In [None]:
myFirstLCA.demand_array[row_of_demand]
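
The link between the functional unit, the index dictionary and the demand array can be sketched in a few lines. The keys and the dictionary name `activity_index` below are invented for illustration; only the principle (look up the index, write the amount there) mirrors the cells above:

```python
import numpy as np

# Invented index dictionary: (database, code) keys mapped to matrix indices.
activity_index = {("db", "steel"): 0, ("db", "electricity"): 1, ("db", "transport"): 2}
demand = {("db", "electricity"): 1.0}

# Start from zeros and write each demanded amount at its index.
f = np.zeros(len(activity_index))
for key, amount in demand.items():
    f[activity_index[key]] = amount
print(f)
```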

The `lci()` call also created other very important arrays:

**$A$ matrix**

In [None]:
myFirstLCA.technosphere_matrix

In [None]:
print(myFirstLCA.technosphere_matrix)

**$B$ matrix**

In [None]:
myFirstLCA.biosphere_matrix

In [None]:
print(myFirstLCA.biosphere_matrix)

**Supply array**: Vector containing the amount each activity will need to provide to meet the functional unit, i.e. $s=A^{-1}f$.

In [None]:
myFirstLCA.supply_array
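
As a worked toy example of $s=A^{-1}f$, consider a system with two activities in which producing one unit of the first requires 0.5 units of the second. The numbers are invented, and the system is solved with `numpy.linalg.solve` on a dense matrix rather than with the sparse solver Brightway uses:

```python
import numpy as np

# Invented 2x2 technosphere matrix: activity 0 consumes 0.5 units from activity 1.
A = np.array([[1.0, 0.0],
              [-0.5, 1.0]])
f = np.array([1.0, 0.0])  # functional unit: one unit of activity 0

# Solve A s = f instead of explicitly inverting A.
s = np.linalg.solve(A, f)
print(s)
```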

**Inventory matrix**: Contains the inventory *by activity* (i.e. not summed). In other words, we do not have $g=BA^{-1}f$, but rather $G=B\,\operatorname{diag}(A^{-1}f)$.

In [None]:
myFirstLCA.inventory

In [None]:
print(myFirstLCA.inventory)
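
The difference between the summed inventory $g$ and the by-activity inventory $G$ can be checked on a toy system. All numbers below are invented, and dense NumPy arrays stand in for Brightway's sparse matrices:

```python
import numpy as np

# Invented biosphere matrix: 2 elementary flows (rows) x 2 activities (columns).
B = np.array([[2.0, 4.0],
              [0.0, 1.0]])
s = np.array([1.0, 0.5])  # supply vector, as obtained from A s = f

G = B @ np.diag(s)        # inventory by activity: each column scaled by its supply
g = G.sum(axis=1)         # summing over activities gives the aggregated inventory
print(G)
print(g)
```

Summing $G$ over its columns gives the same vector as computing $Bs$ directly.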

**`tech_params` and `bio_params`**  
The matrices above only get populated with numbers. However, behind these are the structured arrays mentioned above. These are also accessible:

In [None]:
import pandas as pd

In [None]:
pd.DataFrame(myFirstLCA.tech_params).head(2)

In [None]:
myFirstLCA.tech_params

In [None]:
myFirstLCA.bio_params

We can manually aggregate the LCI if we want:

In [None]:
myFirstLCA.inventory.shape

In [None]:
LCI_summed = myFirstLCA.inventory.sum(axis=1)

Again, to identify what number corresponds to what, you need to use the biosphere_dict or its reverse (see [here](http://stackoverflow.com/questions/39494583/connecting-exchange-names-and-codes-to-lca-inventory-results/39518156#39518156) and upvote if useful ;))
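
A minimal sketch of that reverse lookup, with an invented index dictionary and invented summed amounts standing in for `biosphere_dict` and the summed inventory:

```python
# Invented stand-ins for the biosphere index dictionary and the summed inventory.
biosphere_index = {("biosphere3", "co2"): 0, ("biosphere3", "ch4"): 1}
summed = [4.0, 0.5]

# Reverse the dictionary so that row numbers point back to flow keys.
reverse_index = {index: key for key, index in biosphere_index.items()}
labelled = {reverse_index[i]: amount for i, amount in enumerate(summed)}
print(labelled)
```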

Next step: **LCIA**

In [None]:
myFirstLCA.lcia()

A number of other matrices are now available:

In [None]:
myFirstLCA.characterization_matrix

In [None]:
myFirstLCA.characterization_matrix.shape

In [None]:
myFirstLCA.characterized_inventory

The overall score is now an attribute of the LCA object: 

In [None]:
myFirstLCA.score
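
To see how these pieces fit together, here is a toy calculation that assumes the characterization matrix is diagonal, with one factor per elementary flow. The factors and inventory values are invented, and dense arrays are used for brevity:

```python
import numpy as np

# Invented characterization factors, one per elementary flow, on the diagonal.
C = np.diag([1.0, 30.0])
# Invented inventory by activity: 2 elementary flows x 2 activities.
G = np.array([[2.0, 2.0],
              [0.0, 0.5]])

characterized = C @ G        # impact by flow and by activity
score = characterized.sum()  # the overall score is the sum of all entries
print(characterized)
print(score)
```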

In [None]:
# switch_method needs the tuple of the new LCIA method as its argument
myFirstLCA.switch_method(methods.random())