# Enter and process data

This notebook is part of the supporting information for "Matrix-based Methods for Regionalized Life Cycle Assessment" by Chris Mutel and Stefanie Hellweg, submitted to ES&T.

The most recent version of these notebooks can be found at https://github.com/cmutel/regionalized-lca-examples.

It will not run without the following:

* bw2data, version >= 3.4.2
* bw2calc, version >= 1.7
* bw2regional, version >= 0.5.1
* rower, version >= 0.1
* bw2_lcimpact, version >= 0.2

In [None]:
def test_installed_software():
    import bw2data
    import bw2calc
    import bw2regional
    import rower
    import bw2_lcimpact

    assert bw2data.__version__ >= (3, 4, 2)
    assert bw2calc.__version__ >= (1, 7)
    assert bw2regional.__version__ >= (0, 5, 1)
    assert rower.__version__ >= (0, 1)
    assert bw2_lcimpact.__version__ >= (0, 2)
    
test_installed_software()
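
The assertions above compare tuples, which assumes each package exposes `__version__` as a tuple of integers. Some releases may expose a dotted string instead. A minimal sketch of a helper that normalizes both forms follows; the name `as_version_tuple` is hypothetical and is not part of any of these packages.

```python
def as_version_tuple(version):
    """Normalize a version given as a tuple or a dotted string to a tuple of ints.

    Hypothetical helper for illustration; non-numeric parts such as "dev1" are dropped.
    """
    if isinstance(version, str):
        parts = version.split(".")
    else:
        parts = [str(part) for part in version]
    return tuple(int(part) for part in parts if part.isdigit())


print(as_version_tuple("3.4.2") >= (3, 4, 2))
print(as_version_tuple((1, 7, 1)) >= (1, 7))
print(as_version_tuple("0.4") >= (0, 5, 1))
```

With such a helper, a check could be written as `as_version_tuple(bw2data.__version__) >= (3, 4, 2)` regardless of which form the package uses.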

# Setup

In [None]:
import brightway2 as bw
import bw2regional as bwr
import numpy as np
import pandarus
import xlrd
import os
import json
import csv

In [None]:
bw.projects.set_current("computational methods paper")

In [None]:
bw.create_core_migrations()

Get basic information about this Brightway2 installation

In [None]:
import bw2data, bw2calc, bw2regional, bw2io, pandarus
print("bw2data:", bw2data.__version__)
print("bw2calc:", bw2calc.__version__)
print("bw2regional:", bw2regional.__version__)
print("bw2io:", bw2io.__version__)
print("pandarus:", pandarus.__version__)

# Geocollections: `water cfs`, `water-xt`, and `states`

Create geocollections for three new maps:

* `water cfs`: Gridded spatial scale as defined in Pfister et al 2009
* `water-xt`: Finely detailed raster cells used in the extension table for water use
* `states`: US state boundaries, from the national atlas

Here are the CFs given on the `water cfs` spatial scale:

<img src='images/raster-cfs.png'>

In [None]:
water_cfs_vector = pandarus.convert_to_vector(
    pandarus.round_raster(
        pandarus.clean_raster(os.path.abspath(os.path.join("data", 'clipped.tiff')))
    ),
    "output"
)

In [None]:
water_cfs_vector

In [None]:
bwr.geocollections['water cfs'] = {
    'filepath': os.path.abspath(water_cfs_vector),
    'field': "id",
}

In [None]:
bwr.geocollections['states'] = {
    'filepath': os.path.abspath(os.path.join('data', 'state_boundaries.gpkg')),
    'field': 'STATE'
}

## Convert extension table inputs from mass to mass per area

The extension table data are given per cell, not per area, so we divide each cell's value by its area to get the correct units.

In [None]:
source_fp = os.path.join("data", "blue-water.tiff")
destination_fp = os.path.join("output", 'blue-water-per-area.tiff')

bwr.divide_by_area(source_fp, destination_fp)
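
To illustrate why this step matters: on a latitude/longitude grid, cells of equal angular size cover less area toward the poles, so a mass per cell is not comparable across latitudes until it is divided by the cell area. The sketch below assumes a spherical Earth with a radius of 6371 km and a regular grid. It is not the implementation of `bwr.divide_by_area`, and all names in it are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: spherical Earth, mean radius


def cell_area_km2(lat_south, lat_north, lon_width):
    """Area of a latitude/longitude cell on a sphere; all angles in degrees."""
    band = math.sin(math.radians(lat_north)) - math.sin(math.radians(lat_south))
    return EARTH_RADIUS_KM ** 2 * math.radians(lon_width) * band


def mass_per_area(mass, lat_south, lat_north, lon_width):
    """Convert a per-cell mass to a mass per square kilometer."""
    return mass / cell_area_km2(lat_south, lat_north, lon_width)


# Same mass in two half-degree cells at different latitudes
equator = mass_per_area(1000.0, 0.0, 0.5, 0.5)
north = mass_per_area(1000.0, 45.0, 45.5, 0.5)
print(equator, north)
```

The cell at 45 degrees north is smaller, so the same mass gives a higher value per square kilometer there.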

We can now point to the correct extension table data.

In [None]:
bwr.geocollections['water-xt'] = {'filepath': os.path.join("output", 'blue-water-per-area.tiff')}

Check to make sure that it worked

In [None]:
sorted(bwr.geocollections)

# Biosphere database

Download the default biosphere database if necessary (this requires an internet connection), then add surface water and groundwater irrigation flows.

In [None]:
if "biosphere3" not in bw.databases:
    bw.create_default_biosphere3()

In [None]:
biosphere = bw.Database("biosphere3")

In [None]:
data = [
    {
        'code': 'water, irrigation, surface',
        'name': 'Surfacewater irrigation',
        'unit': 'kg',
        'type': 'biosphere',
    }, {
        'code': 'water, irrigation, groundwater',
        'name': 'Groundwater irrigation',
        'unit': 'kg',
        'type': 'biosphere'
    }
]

for ds in data:
    biosphere.new_activity(**ds).save()

# New Intersections: `("water cfs", "states")` and `("states", "water cfs")`

Create a new `Intersection` object that maps state boundaries to LCIA CF raster cells.

In [None]:
pandarus_result = pandarus.intersect(
    bwr.geocollections['water cfs']['filepath'],
    bwr.geocollections['water cfs']['field'],    
    bwr.geocollections['states']['filepath'],
    bwr.geocollections['states']['field'],
)

In [None]:
bwr.import_from_pandarus(pandarus_result[1])

Check to make sure data is reasonable

In [None]:
print(len(bwr.Intersection(('water cfs', 'states')).load()))
[x for x in bwr.Intersection(('water cfs', 'states')).load() if x[0][1] == '978']

In [None]:
0.799 / 1.14, 0.239 / 0.34, 1.12 / 1.61

I checked these values independently using QGIS. The areas are not the same because the projections differ, but the absolute values don't matter: we only use the relative values, and the ratios are nearly constant (about 0.70).

<img src="images/intersection-tests.png" width="300">

Pandarus | Manual | Ratio
--- | --- | ---
0.799 | 1.14 | 0.701
0.239 | 0.34 | 0.703
1.12 | 1.61 | 0.696
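
Since only relative values are used, one way to compare the two sets of areas is to normalize each so that it sums to one and then look at the differences between the resulting shares. A minimal sketch using the three values from the table; the helper name `shares` is hypothetical.

```python
pandarus_areas = [0.799, 0.239, 1.12]  # values from the table above
manual_areas = [1.14, 0.34, 1.61]


def shares(values):
    """Normalize values so that they sum to one."""
    total = sum(values)
    return [value / total for value in values]


differences = [
    abs(p - m) for p, m in zip(shares(pandarus_areas), shares(manual_areas))
]
print(max(differences))
```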

## Add intersection of `water cfs` and `states` as new geocollection

We can save some work by breaking things down to the smallest possible spatial units - the actual polygons that are the intersections between the spatial units of `water cfs` and `states`.

In [None]:
bwr.remote.calculate_intersection("water cfs", "states")

In [None]:
bwr.remote.intersection_as_new_geocollection("water cfs", "states", "states to water cfs")

# New LCIA Method: `('irrigation water', 'surface', 'grid scale')`

Create regionalized LCIA method. Start by creating and registering a new Method object.

In [None]:
m_name = ('irrigation water', 'surface', 'grid scale')
m = bw.Method(m_name)
if m_name not in bw.methods:
    m.register(description="Water consumption CFs from Pfister et al 2009", unit="PDF m2 / yr")

The `import_regionalized_cfs` utility looks at the spatial data file for the given geocollection and imports the specified band or field (here `val`) as CFs for the given flow.

In [None]:
from bw2regional.utils import import_regionalized_cfs

import_regionalized_cfs(
    geocollection="water cfs", 
    method=bw.Method(m_name), 
    # "val" is the default field label when converted to a vector
    mapping={"val": [('biosphere3', 'water, irrigation, surface')]},
    scaling_factor=1 / 1000,  # Convert CFs from per m3 to per kg
) 
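
The `scaling_factor` above rescales the CFs from a per-cubic-meter basis to a per-kilogram basis, because the biosphere flows are defined in kg. A minimal sketch of that arithmetic, assuming a water density of 1000 kg per cubic meter; the CF value and the names are hypothetical.

```python
WATER_DENSITY_KG_PER_M3 = 1000.0  # assumption: density of fresh water


def cf_per_kg(cf_per_m3):
    """Convert a characterization factor from per cubic meter to per kilogram."""
    return cf_per_m3 / WATER_DENSITY_KG_PER_M3


# Impact of 250 kg of irrigation water with a hypothetical CF of 0.5 per m3
impact = 250.0 * cf_per_kg(0.5)
print(impact)
```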

Check to make sure results are reasonable

In [None]:
bw.Method(m_name).load()[:5]

# New LCIA Method: `('irrigation water', 'surface', 'state scale')`

Create another regionalized LCIA method, at the scale of the inventory.

In [None]:
m_name = ('irrigation water', 'surface', 'state scale')
m = bw.Method(m_name)
if m_name not in bw.methods:
    m.register(
        description="Water consumption CFs from Pfister et al 2009", 
        unit="PDF m2 / yr", 
        geocollections=['states']
    )

In [None]:
wb = xlrd.open_workbook(os.path.join("data", "State_level EI99 CF.xlsx"))
sheet = wb.sheet_by_name("US state level")

In [None]:
state_cf_data = [(
         ('biosphere3', 'water, irrigation, surface'),
         sheet.row(x)[4].value / 1000.,  # Convert CFs from per m3 to per kg
         ('states', sheet.row(x)[1].value)) for x in range(2, sheet.nrows)
        ]

m.write(state_cf_data)

state_cf_data[:3]
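
Each regionalized CF written above is a tuple of (flow key, value, location), where the location is itself a (geocollection, feature id) pair. A minimal sketch of building that structure from plain rows, with made-up state values standing in for the spreadsheet contents:

```python
FLOW = ("biosphere3", "water, irrigation, surface")

# Hypothetical rows of (state name, CF per m3); not the real spreadsheet values
rows = [("Alabama", 0.02), ("Arizona", 1.5)]

example_cfs = [
    (FLOW, value / 1000.0, ("states", state))  # per m3 -> per kg
    for state, value in rows
]

for flow, value, location in example_cfs:
    print(flow, value, location)
```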

# New LCIA Method: `('irrigation water', 'surface', 'site-generic')`

Create a final, site-generic LCIA method.

In [None]:
m_name = ('irrigation water', 'surface', 'site-generic')
m = bw.Method(m_name)
if m_name not in bw.methods:
    m.register(description="Water consumption CFs from Pfister et al 2009", unit="PDF m2 / yr")

In [None]:
m.write([(('biosphere3', 'water, irrigation, surface'), 0.310016 / 1000.)])
m.load()

# New extension table: `blue water consumption`

In [None]:
stats_output = pandarus.raster_statistics(
    bwr.geocollections['states to water cfs']['filepath'],
    bwr.geocollections['states to water cfs']['field'],    
    bwr.geocollections['water-xt']['filepath']
)

In [None]:
bwr.pandarus.import_xt_from_rasterstats(
    stats_output, 
    "blue water consumption",
    'states to water cfs'
)

Check that values are reasonable

In [None]:
bwr.ExtensionTable("blue water consumption").load()[:4]

# New Loading: `irrigation water surface withdrawals`

In [None]:
l_name = 'irrigation water surface withdrawals'
loading = bwr.Loading(l_name)
if l_name not in bwr.loadings:
    loading.register(unit="MGal/day/km^2")

In [None]:
with open(os.path.join("output", "loading.json")) as f:
    loading_data = json.load(f)

# Scale values by 1e6 and add the geocollection id to each location
loading_data = [[x[0] * 1e6, ('water cfs', x[1])] for x in loading_data]
loading.validate(loading_data)
loading.write(loading_data)

loading_data[0]

# New LCI database: `crops`

We use the LCI datasets for the US from the LCA Data Commons; specifically, version 1 prepared in Ecospold version 1 format, and available from https://data.nal.usda.gov/dataset/unit-process-data-field-crop-production-version-1/resource/31ee2655-a96b-4d16-82d7-48e53575a501. This data is public domain.

We have filtered these datasets to the crops of interest in the case study, and use only data for the year 2000.

In the code repository, these files are compressed to save space. You should unzip `data/crops.zip`.

In [None]:
if "crops" in bw.databases:
    del bw.databases['crops']

importer = bw.SingleOutputEcospold1Importer(os.path.join("data", "crops"), "crops")

In [None]:
importer.apply_strategies()

In [None]:
importer.match_database(fields=['name', 'location'])

In [None]:
importer.statistics()

## Cleanup Step 1: Fix state names

We need to go from `AL` to `('states', 'Alabama')` to match the locations in the `'states'` geocollection and the state-level CFs.

In [None]:
state_mapping = {
    'AL': 'ALABAMA',
    'AR': 'ARKANSAS',
    'AZ': 'ARIZONA',
    'CA': 'CALIFORNIA',
    'CO': 'COLORADO',
    'GA': 'GEORGIA',
    'IA': 'IOWA',
    'ID': 'IDAHO',
    'IL': 'ILLINOIS',
    'IN': 'INDIANA',
    'KS': 'KANSAS',
    'KY': 'KENTUCKY',
    'LA': 'LOUISIANA',
    'MI': 'MICHIGAN',
    'MN': 'MINNESOTA',
    'MO': 'MISSOURI',
    'MS': 'MISSISSIPPI',
    'MT': 'MONTANA',
    'NC': 'NORTH CAROLINA',
    'ND': 'NORTH DAKOTA',
    'NE': 'NEBRASKA',
    'NY': 'NEW YORK',
    'OH': 'OHIO',
    'OK': 'OKLAHOMA',
    'OR': 'OREGON',
    'PA': 'PENNSYLVANIA',
    'SD': 'SOUTH DAKOTA',
    'TN': 'TENNESSEE',
    'TX': 'TEXAS',
    'WA': 'WASHINGTON',
    'WI': 'WISCONSIN',
}

In [None]:
for ds in importer.data:
    ds['location'] = ('states', state_mapping[ds['location']].title())
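
The `.title()` call converts the upper-case names in `state_mapping` to the capitalization shown above, including multi-word names. A small sketch with two entries; the helper name `to_location` is hypothetical.

```python
abbreviations = {"AL": "ALABAMA", "NC": "NORTH CAROLINA"}


def to_location(abbreviation):
    """Map a two-letter state code to a ('states', name) location tuple."""
    return ("states", abbreviations[abbreviation].title())


print(to_location("AL"))
print(to_location("NC"))
```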

## Cleanup Step 2: Make dataset names simpler

In [None]:
name_mapping = {
    r'corn grain; at harvest in 2000; at farm; 85%-91% moisture': 'corn',
    r'cotton lint; at harvest in 2000; at farm; 90%-93% moisture': 'cotton',
    r'soybeans; at harvest in 2000; at farm; 85%-92% moisture': 'soybeans',
    r'winter wheat; at harvest in 2000; at farm; 86%-90% moisture': 'winter wheat'
}

In [None]:
for ds in importer.data:
    ds['name'] = name_mapping[ds['name']]
    
    for exc in ds['exchanges']:
        if exc['name'] in name_mapping:
            exc['name'] = name_mapping[exc['name']]

## Cleanup Step 3: Change `from unspec. source` irrigation to surface water

This is a conservative assumption, and we don't have any better guesses.

We also clean up the names by removing the products.

In [None]:
{exc['name'] for ds in importer.data for exc in ds['exchanges'] if 'water; withdrawal' in exc['name']}

In [None]:
for ds in importer.data:
    for exc in ds['exchanges']:
        if exc['name'].startswith('water; withdrawal from unspec. source'):
            exc['name'] = 'water; withdrawal from surface water' 
        elif exc['name'].startswith('water; withdrawal from surface water'):
            exc['name'] = 'water; withdrawal from surface water' 
        elif exc['name'].startswith('water; withdrawal from groundwater'):
            exc['name'] = 'water; withdrawal from groundwater' 

## Cleanup Step 4: Link irrigation water to biosphere flows

In [None]:
for ds in importer.data:
    for exc in ds['exchanges']:
        if exc['name'] == 'water; withdrawal from surface water':
            exc['input'] = ("biosphere3", 'water, irrigation, surface')
        elif exc['name'] == 'water; withdrawal from groundwater':
            exc['input'] = ("biosphere3", 'water, irrigation, groundwater')

## Internal linking and write database

In [None]:
importer.drop_unlinked(True)

In [None]:
importer.write_database()

In [None]:
[ds for ds in bw.Database("crops")]

## Create weighted national production mixes database

We still need to create the production mix for each crop, with weights from state crop production (normalized to sum to one).
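
Normalizing here means dividing each state's production by the total over the states that have LCI data, so that the resulting weights sum to one. A minimal sketch with made-up production figures; the state selection and the numbers are hypothetical.

```python
# Hypothetical production figures; the real values come from state-production.csv
production = {"Iowa": 300, "Illinois": 500, "Nebraska": 200}
states_with_lci = {"Iowa", "Illinois"}  # states that have an LCI dataset

covered_total = sum(
    amount for state, amount in production.items() if state in states_with_lci
)
weights = {
    state: amount / covered_total
    for state, amount in production.items()
    if state in states_with_lci
}
coverage = covered_total / sum(production.values())

print(weights)
print(coverage)
```

The `coverage` value is the fraction of national production represented by the states in the mix, which is a useful check on how representative the mix is.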

In [None]:
def to_dict(row):
    return {
            'crop': row[2],
            'state': row[0],
            'production': int(row[3].replace(",", ""))
            }

with open(os.path.join("data", "state-production.csv"), "r") as f:
    reader = csv.reader(f)
    next(reader)
    production_data = [to_dict(x) for x in reader if x[2] != 'Spring Wheat']
    
production_data[:3]

In [None]:
for crop in ('Corn', 'Cotton', 'Soybeans', 'Winter Wheat'):
    inputs = {ds['location'][1]: ds for ds in importer.data if ds['name'] == crop.lower()}
    productions = {line['state'].title(): line for line in production_data if line['crop'] == crop}
    total = sum([o['production'] for o in productions.values()])
    mapped_total = sum([o['production'] for k, o in productions.items() if k in inputs])
    print("For crop", crop, "found this fraction of national production:", mapped_total / total)

It is not great, but not so bad - in any case, it will have to do, as we don't have any additional LCI data.

For whatever reason, the [USDA LCA commons](http://lcacommons.gov) website has data for winter wheat for Michigan, but only for 2004 and 2009.

In [None]:
from IPython.display import Image

Image(filename='images/lca-commons.png')

In [None]:
mix_db = bw.Database("production mixes")
if "production mixes" not in bw.databases:
    mix_db.register(depends=["crops"], geocollections=['states'])

In [None]:
data = {}

for crop in ('Corn', 'Cotton', 'Soybeans', 'Winter Wheat'):
    inputs = {ds['location'][1]: ds for ds in importer.data if ds['name'] == crop.lower()}
    productions = {line['state'].title(): line for line in production_data if line['crop'] == crop}
    total = sum([o['production'] for k, o in productions.items() if k in inputs])

    exchanges = [{
        'input': ("production mixes", crop),
        'amount': 1,
        'type': 'production',
    }]
    
    for loc, ds in inputs.items():
        if loc not in productions:
            continue
            
        exchanges.append({
            'input': (ds['database'], ds['code']),
            'amount': productions[loc]['production'] / total,
            'type': 'technosphere',
            'location': ds['location']
        })
    
    data[("production mixes", crop)] = {
        'type': 'process',
        'name': crop,
        'location': 'GLO',
        'exchanges': exchanges
    }

In [None]:
mix_db.write(data)

# Set database geocollections

In [None]:
bw.databases["biosphere3"]['geocollections'] = []
bw.databases["crops"]['geocollections'] = ['states']
bw.databases['production mixes']['geocollections'] = ['states']
bw.databases.flush()