# Data input/output

Chris Mutel ([PSI](https://www.psi.ch/)), Pascal Lesage ([CIRAIG](http://www.ciraig.org/en/))

## Learning objectives  
  - Learn how to input LCI data to Brightway in different ways:  
    - Programmatically, via dictionaries created directly in Python  
    - Excel/CSV importers  
    - SimaPro csv
      - Two unit-process example
      - Agribalyse
    - Importing ecoSpold, ecoSpold2

## Standard inputs and setup

In [None]:
import brightway2 as bw
import os
from pathlib import Path

In [None]:
import bw2io as bi
bi.restore_project_directory("/srv/data/projects/ecoinvent38.tar.gz")

In [None]:
bw.projects

Setting the project

This is how we could have done it with the actual ecoinvent files:

```python
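# Path to the unpacked ecoinvent "datasets" directory (adjust to your system)
ei_path = "/home/user/Documents/dbs/ecoinvent/3.8/cutoff/datasets"
ei = bw.SingleOutputEcospold2Importer(ei_path, "ecoinvent 3.8 cutoff")
ei.apply_strategies()  # clean up and link the raw data
ei.statistics()        # report datasets, exchanges, and unlinked exchanges
ei.write_database()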
```

Assigning our database to a variable

In [None]:
eidb = bw.Database('ecoinvent 3.8 cutoff')

In [None]:
len(eidb)

## Context

Performing LCA generally requires:
  - Background LCI data (e.g. an LCI database)  
  - Foreground LCI data  
  - Sets of characterization factors

This section deals with how data is input to Brightway.
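
To make the role of the three ingredients concrete, here is a toy calculation in plain Python. It is only a sketch: every name and number is invented for illustration, and Brightway itself solves a matrix problem rather than looping over dictionaries like this.

```python
# All names and numbers below are made up for illustration.
# Foreground LCI: what our own activity consumes from the background.
foreground = {"bottle production": {"aluminium": 0.33}}  # kg product per unit
# Background LCI: what each background product emits.
background = {"aluminium": {"CO2": 2.0}}                 # kg emitted per kg product
# Characterization factors: impact per unit of emission.
characterization = {"CO2": 1.0}                          # kg CO2-eq per kg emitted


def toy_score(activity, amount=1.0):
    """Multiply foreground demand, background emissions, and factors, then sum."""
    score = 0.0
    for product, product_amount in foreground[activity].items():
        for flow, emitted in background[product].items():
            score += amount * product_amount * emitted * characterization[flow]
    return score


print(toy_score("bottle production"))
```

If any one of the three dictionaries is missing, no score can be computed, which is why all three kinds of data must be available in a Brightway project before an LCA can run.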

Useful documentation on what a Brightway database is can be found in [this notebook](https://github.com/brightway-lca/brightway2/blob/master/notebooks/Databases.ipynb) and in [the Brightway documentation](https://2.docs.brightway.dev/intro.html#inventory-databases).

# Creating a database programmatically

One can create a database from a dictionary. This dictionary contains the activities; each activity holds information about the activity itself as well as information about the exchanges attached to it (although these get separated out when the data is written to the actual `database.db`). Let's look at these components for a random activity:

In [None]:
random_act = eidb.random()

In [None]:
# Information about activities themselves:
random_act.as_dict()

In [None]:
# Information associated with the exchanges of this activity:
list(random_act.exchanges())[1].as_dict()

Let's create one for our simple LCA:

In [None]:
database_as_dictionary = bw.Database("Database as dict")

water_bottle_data = {
    ("Database as dict", "Some code for the bottle production"): {
        "name": "Water bottle production",
        'unit': 'unit',
        'location': 'CH',
        'categories': ("Some made up", "category here"),
        "exchanges": [{
            "amount": 0.33,
            "input": ('ecoinvent 3.8 cutoff', ''), #Aluminium
            "type": "technosphere",
            "uncertainty type":0,
            "unit": "kg"},
                      {
            "amount": 0.33,
            "input": ('ecoinvent 3.8 cutoff', ''), #Deformation stroke
            "type": "technosphere",
            "uncertainty type":0,
            "unit": "kg"}
        ],
        },
    ("Database as dict", "Some code for drinking a bottle full of water"): {
        "name": "Water drinking",
        'unit': 'liter',
        'location': 'CH',
        'categories': ("Another made up", "category here"),
        "exchanges": [{
            "amount": 1,
            "input": ("Database as dict", "Some code for the bottle production"), #Our water bottle
            "type": "technosphere",
            "uncertainty type":5,
            "loc":0.005,
            "minimum":0.0005,
            "maximum":0.05,
            "unit": "kg"},
                      {
            "amount": 1,
            "input": ('ecoinvent 3.8 cutoff', ''), #Water
            "type": "technosphere",
            "uncertainty type":0,
            "unit": "kg"}],
        }
}
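
Before writing a hand-made dictionary like this one, it can help to check it for simple slips such as a misspelled key or an empty activity code. The helper below is a hypothetical sketch in plain Python, not part of Brightway. The set of expected keys is an assumption based on the exchanges used in this notebook, and the sample data is invented.

```python
# Hypothetical pre-write check; not a Brightway function.
# Keys every exchange in this notebook carries (an assumption, not a Brightway rule).
EXPECTED_EXCHANGE_KEYS = {"amount", "input", "type"}
# Other keys used in this notebook, plus common uncertainty fields.
KNOWN_EXCHANGE_KEYS = EXPECTED_EXCHANGE_KEYS | {
    "unit", "uncertainty type", "loc", "scale", "shape", "minimum", "maximum",
}


def check_database_dict(data):
    """Return a list of human-readable problems found in `data`."""
    problems = []
    for key, dataset in data.items():
        for index, exc in enumerate(dataset.get("exchanges", [])):
            where = f"{key[1]!r}, exchange {index}"
            for missing in sorted(EXPECTED_EXCHANGE_KEYS - set(exc)):
                problems.append(f"{where}: missing key {missing!r}")
            for unknown in sorted(set(exc) - KNOWN_EXCHANGE_KEYS):
                problems.append(f"{where}: unexpected key {unknown!r}")
            target = exc.get("input")
            if not (isinstance(target, tuple) and len(target) == 2 and all(target)):
                problems.append(
                    f"{where}: 'input' should be a (database, code) pair with no empty part"
                )
    return problems


# Invented sample data: the second exchange has a misspelled key and an empty code.
sample = {
    ("demo db", "bottle"): {
        "name": "Water bottle production",
        "unit": "unit",
        "exchanges": [
            {"amount": 0.33, "input": ("demo background", "made-up-code"),
             "type": "technosphere", "unit": "kg"},
            {"amount": 0.33, "input": ("demo background", ""),
             "type": "technosphere", "unit=": "kg"},
        ],
    }
}

for problem in check_database_dict(sample):
    print(problem)
```

A check like this is useful because a misspelled key in a hand-written dictionary is easy to miss by eye. An empty code, as in the placeholders above, has to be replaced with the code of a real activity before the exchange can point anywhere.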

In [None]:
database_as_dictionary.write(water_bottle_data)

In [None]:
len(database_as_dictionary)

In [None]:
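# Key of the "Water drinking" activity defined above
drinking_act = ("Database as dict", "Some code for drinking a bottle full of water")
lca = bw.LCA({drinking_act:1}, ('IPCC 2013', 'climate change', 'GWP 100a'))
lca.load_lci_data()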

In [None]:
water_bottle_LCA_from_dict_input = bw.LCA({drinking_act:1}, ('IPCC 2013', 'climate change', 'GWP 100a'))

In [None]:
# Your turn:
# Do the LCI + LCIA and get the score

# Importing from CSV or Excel

## Choose: the easy way or the hard way

If you choose the easy way, just skip to the "easy way" section.

### The hard way

We will import a dataset in Excel, but this dataset has errors that we will have to fix.

In [None]:
imp = bw.ExcelImporter(Path(".") / "files" / "excel_importer_example_problems.xlsx")
imp.apply_strategies()
imp.match_database(fields=('name', 'unit', 'location'))
imp.match_database("ecoinvent 3.8 cutoff", fields=('name', 'unit', 'location'))
imp.statistics()

Oops... we have some problems. Let's export our in-process data to an Excel sheet and see what went wrong. It is probably also sensible to open our input Excel sheet at the same time.

In [None]:
imp.write_excel()

It looks like we have a few small inconsistencies, like the name of the processes or their locations. Try to fix these on your own, and re-run the code two cells above to get an idea of your progress. The attributes used for matching (`fields`) must match exactly, though they are case-insensitive.
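
To see why a single wrong location or a slightly different name leaves an exchange unlinked, here is a minimal sketch of matching on `fields` in plain Python. It only illustrates the idea and is not the code the importer runs. The helper names `make_key` and `link_exchanges` are hypothetical, and the sample data is invented.

```python
def make_key(obj, fields):
    """Build a lookup key from `fields`, lower-casing strings so case is ignored."""
    values = (obj.get(field) for field in fields)
    return tuple(v.lower() if isinstance(v, str) else v for v in values)


def link_exchanges(exchanges, candidates, fields=("name", "unit", "location")):
    """Set exc["input"] for every exchange with a match; return the unlinked ones."""
    lookup = {make_key(c, fields): (c["database"], c["code"]) for c in candidates}
    unlinked = []
    for exc in exchanges:
        target = lookup.get(make_key(exc, fields))
        if target is None:
            unlinked.append(exc)
        else:
            exc["input"] = target
    return unlinked


# Invented data: one candidate activity in a made-up background database.
candidates = [
    {"database": "demo background", "code": "made-up-code",
     "name": "Aluminium production", "unit": "kilogram", "location": "CH"},
]
exchanges = [
    # Same fields, different capitalization: links.
    {"name": "aluminium PRODUCTION", "unit": "kilogram", "location": "ch"},
    # Different location: every field must match, so this stays unlinked.
    {"name": "Aluminium production", "unit": "kilogram", "location": "DE"},
]

unlinked = link_exchanges(exchanges, candidates)
print(len(unlinked), "unlinked exchange(s)")
```

Because the key is built from all of the fields together, fixing only the name while the location is still wrong does not help: each unlinked exchange needs every matching field corrected.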

If you can't get it to work, you can change the name of the import file to "excel_importer_example_fixed.xlsx"; this already has the necessary corrections.

If you need to find the ecoinvent processes, you can search ecoinvent (as you should already know how to do):

In [None]:
for u in imp.unlinked:
    print(u)

In [None]:
db = bw.Database("ecoinvent 3.8 cutoff")
db.search("aluminium from scrap")

### The easy way

Import a file where everything should link without any problems. Faster, but you don't learn about how, why, and all those other silly questions.

In [None]:
bw.ExcelImporter?

In [None]:
imp = bw.ExcelImporter(Path(".") / "files" / "excel_importer_example.xlsx")
imp.apply_strategies()
imp.match_database(fields=('name', 'unit', 'location'))
imp.match_database("ecoinvent 3.8 cutoff", fields=('name', 'unit', 'location'))
imp.statistics()

In [None]:
imp.metadata

### Writing the database

Finally, we need to write this data to a new `Database`.

In [None]:
imp.write_database()

In [None]:
imp.metadata

Test to make sure we can do an LCA

In [None]:
lca = bw.LCA(
    {("BW2 Excel water bottle import", "WriteSomeCode_UUID_isFineButNotNecessary"): 1}, 
    ('IPCC 2013', 'climate change', 'GWP 100a')
)
lca.lci()
lca.lcia()
lca.score

# Importing from ecoSpold 1

**Note:** The following notebooks (and databases) are too old to be fully executed, but we can take a look at them to see how to proceed when there are problems importing.

Please open the [following notebook](http://nbviewer.jupyter.org/urls/bitbucket.org/cmutel/brightway2/raw/default/notebooks/IO%20-%20Importing%20Agribalyse%20with%20Ecoinvent%202.2.ipynb), download it to this directory, and run it separately. For now, you can find a copy of [agribalyse 1.2 here](http://brightwaylca.org/data/agribalyse.zip); this will need to be unzipped in a child directory of the current directory.

# Importing from other formats

The procedure for importing from other formats is quite similar; you just have to choose the correct importer class. This does not mean that it is easy! Some formats are harder than others, but the [example notebooks](https://docs.brightwaylca.org/notebooks.html#example-notebooks) should at least point you in the right direction.