# Write CFs as JSON with loading

This notebook is part of supporting information for "Matrix-based Methods for Regionalized Life Cycle Assessment" by Chris Mutel and Stefanie Hellweg, submitted to ES&T.

The most recent version of these notebooks can be found at https://github.com/cmutel/regionalized-lca-examples.

It will not run without the following:

* bw2data, version >= 3.4.2
* bw2calc, version >= 1.7
* bw2regional, version >= 0.5.1
* rower, version >= 0.1
* bw2_lcimpact, version >= 0.2

In [None]:
def test_installed_software():
    import bw2data
    import bw2calc
    import bw2regional
    import rower
    import bw2_lcimpact

    assert bw2data.__version__ >= (3, 4, 2)
    assert bw2calc.__version__ >= (1, 7)
    assert bw2regional.__version__ >= (0, 5, 1)
    assert rower.__version__ >= (0, 1)
    assert bw2_lcimpact.__version__ >= (0, 2)
    
test_installed_software()
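The check above relies on each package exposing `__version__` as a tuple of integers, which Python compares element by element. Some packages expose a dotted string instead, and comparing a string with a tuple raises `TypeError`. A minimal sketch of a defensive helper follows; the name `as_version_tuple` is hypothetical and not part of any package listed above.

```python
import re


def as_version_tuple(version):
    """Coerce a version given as a tuple or a dotted string to a tuple of ints.

    Hypothetical helper for illustration only; pre-release suffixes such as
    "rc1" are dropped rather than ordered.
    """
    if isinstance(version, str):
        parts = []
        for piece in version.split("."):
            match = re.match(r"\d+", piece)
            if match is None:
                break  # stop at the first non-numeric component
            parts.append(int(match.group()))
        return tuple(parts)
    return tuple(int(x) for x in version)


# Tuples compare element by element, so (3, 10, 0) is newer than (3, 4, 2),
# whereas the strings "3.10.0" and "3.4.2" compare the wrong way round.
assert as_version_tuple("3.10.0") >= (3, 4, 2)
assert as_version_tuple((1, 7)) >= (1, 7)
assert "3.10.0" < "3.4.2"  # lexicographic string comparison is misleading
```

With such a helper, a check could be written as `as_version_tuple(package.__version__) >= (0, 1)` regardless of how the package stores its version.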

# Setup

In [None]:
import bz2
import fiona
import os
import json
import xlrd
import pandarus
import pyprind
import numpy as np
import rasterio
from collections import defaultdict

In [None]:
import brightway2 as bw
import bw2regional as bwr

In [None]:
bw.projects.set_current("computational methods paper")

# Introduction

Loadings, as conceptualized in our paper, are weights used to calculate weighted averages. In this case, the weights are the existing emissions burden, and the weighted average is the characterization factor. The spatial pattern of the loadings is an approximation of *where* emissions happen.

Normally, loadings are provided by LCIA method developers, as they need to know the background level of emissions to calculate marginal CFs. In this case, we need to transform a dataset of irrigation withdrawals at the county level to our LCIA spatial scale, which consists of raster cells. This notebook describes how such a transformation can take place.

Given no additional information, we assume that irrigation has a uniform spatial pattern in each county. If this is true, then we need to calculate the weighted average irrigation per raster cell. The weights in this case would be the relative area of each county in each raster cell. The basic procedure here is identical to the way we transform spatial scales in the regionalized LCA methods developed in our paper.
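As a minimal sketch of the area-weighted average described above, consider a single raster cell overlapped by two counties. All names and numbers below are hypothetical toy values, not data from the USGS spreadsheet.

```python
# Hypothetical toy data: one raster cell overlapped by two counties.
# Overlap areas are in square meters; irrigation values are per county.
overlap_area = {"county_a": 3.0e6, "county_b": 1.0e6}
irrigation = {"county_a": 2.0, "county_b": 10.0}

total_area = sum(overlap_area.values())

# Each county's weight is its share of the overlapped area in this cell.
weights = {county: area / total_area for county, area in overlap_area.items()}

# Weighted average: county values weighted by their area share.
cell_value = sum(weights[county] * irrigation[county] for county in overlap_area)

print(weights)
print(cell_value)
```

The county covering three quarters of the cell contributes three quarters of the weight, so the cell value lies closer to that county's value.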

# Irrigation water consumption

Get irrigation data from the USGS spreadsheet:

In [None]:
water_use_book = xlrd.open_workbook(os.path.join("data", "usco2005.xls"))
water_use = water_use_book.sheet_by_name('County')
water_use.nrows

In [None]:
row = water_use.row(0)

for index, value in enumerate(row):
    if index in (3, 33, 40, 101):
        print(index, value)

## Which column is the most realistic for actual water withdrawals?

We are interested in the following columns:

  * 3: Concatenated state-county FIPS code
  * 33: `IR-WSWFr`: Irrigation, surface-water withdrawals, fresh, in Mgal/d
  * 40: `IC-WSWFr`: Irrigation-Crop, surface-water withdrawals, fresh, in Mgal/d (really poor spatial coverage)
  * 101: `TO-WSWFr`: Total surface-water withdrawals, fresh, in Mgal/d

Column 40: *Irrigation-Crop, surface-water withdrawals, fresh, in Mgal/d* has really poor spatial coverage:

<img src='images/col-40.png'>

Column 101: *Total surface-water withdrawals, fresh, in Mgal/d* is dominated by power production (maybe once-through plants?)

<img src='images/col-101.png'>

However, column 33: *Irrigation, surface-water withdrawals, fresh* seems like what we would expect (in addition to crops, it includes golf irrigation!?).

<img src='images/col-33.png'>

# What does the actual data look like?

The FIPS code is a string, the irrigation amount is a float.

In [None]:
values_row = water_use.row(1)
values_row[3].value, values_row[33].value

In [None]:
water_use_dict = defaultdict(float)

for i in range(1, water_use.nrows):
    water_use_dict[water_use.cell(i, 3).value] = water_use.cell(i, 33).value
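Because the FIPS code is the join key between the spreadsheet and the county map, both sides need the same representation. Spreadsheet readers sometimes return numeric cells as floats, which drops leading zeros. The sketch below shows one way to coerce such values to five-character strings; `normalize_fips` is a hypothetical helper, and whether it is needed depends on how the cells are stored in the workbook.

```python
def normalize_fips(value):
    """Return a 5-character, zero-padded county FIPS string.

    Hypothetical helper: accepts a string such as "1001" or "01001", or a
    number such as 1001 or 1001.0.
    """
    if isinstance(value, float):
        if not value.is_integer():
            raise ValueError(f"Not a whole-number FIPS code: {value!r}")
        value = int(value)
    text = str(value).strip()
    if not text.isdigit() or len(text) > 5:
        raise ValueError(f"Not a county FIPS code: {value!r}")
    return text.zfill(5)


print(normalize_fips(1001.0))
print(normalize_fips("51560"))
```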

# Intersect US counties with our grid CFs

Here is the map of our characterization factors:

<img src='images/cfs.png'>

To calculate the intersection of these two scales, we use our [pandarus](http://pandarus.readthedocs.org/) library.

In [None]:
pandarus_result = pandarus.intersect(
    os.path.join("output", "956da8c2f7dca5ad8687266eaae0694a08ba7fb7decd794b38a916a430c67f9a.1.geojson"),
    "id",
    os.path.join("data", "county_borders.gpkg"),
    'CO2000P020',
)[1]

In [None]:
with bz2.BZ2File(pandarus_result) as f:
    county_matches = json.load(f)['data']

Our intersection data is a list of:

    (
        grid CF ids - can be used directly, 
        CO2000P020 field - must be matched to our string FIPS codes,
        intersected area in square meters
    )

In [None]:
print(len(county_matches))
county_matches[:5]
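To make the row structure concrete, the following sketch writes a few hypothetical rows to a bz2-compressed JSON file and reads them back in the same way as the cell above. The file name, the row values, and the assumption that the rows sit under a `data` key (taken from the cell above) are for illustration only.

```python
import bz2
import json
import os
import tempfile

# Hypothetical rows in the shape described above:
# (grid cell id, CO2000P020 value, intersected area).
rows = [("cell-1", 1308, 2.5e6), ("cell-1", 1309, 1.5e6), ("cell-2", 1309, 4.0e6)]

with tempfile.TemporaryDirectory() as dirpath:
    filepath = os.path.join(dirpath, "intersections.json.bz2")

    # Write a compressed JSON document with the rows under a "data" key.
    with bz2.open(filepath, "wt", encoding="utf-8") as f:
        json.dump({"data": rows}, f)

    # Read it back from the compressed binary stream.
    with bz2.BZ2File(filepath) as f:
        loaded = json.load(f)["data"]

# JSON has no tuple type, so each row comes back as a list.
print(loaded[0])
```

Since rows are returned as lists, unpacking them as `for x, y, z in rows` works, but they cannot be used as dictionary keys without converting to tuples first.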

Get map from 'CO2000P020' value to county FIPS code:

In [None]:
fips_mapping = {}

with fiona.open(os.path.join("data", "county_borders.gpkg")) as f:
    for feat in f:
        fips_mapping[feat['properties']['CO2000P020']] = feat['properties']['FIPS']

In [None]:
fips_mapping[1308]

# Merge datasets

## Data consistency check

We need to check whether the FIPS codes from the counties map match the FIPS codes from the irrigation spreadsheet. FIPS codes can change (see e.g. [this web page](https://www.census.gov/geo/reference/county-changes.html)).

In [None]:
fips_in_gpkg = set(fips_mapping.values())
fips_in_spreadsheet = {water_use.cell(i, 3).value for i in range(1, water_use.nrows)}

In [None]:
fips_in_gpkg.difference(fips_in_spreadsheet)

* 51560 is Clifton Forge City, VA, which has not been an independent city since July 2001 (see [FIPS changes](https://www.ddorn.net/data/FIPS_County_Code_Changes.pdf)). Previous code is 51005. We ignore this for now...
* 23000 is the entire state of Maine, and can be ignored in this case study (Sorry Mainers!)
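The cell above checks one direction only: codes in the map that are missing from the spreadsheet. Because `water_use_dict` is a `defaultdict(float)`, such codes silently receive an irrigation value of 0.0 when looked up later. A sketch of checking both directions, using hypothetical FIPS sets:

```python
# Hypothetical FIPS codes, for illustration only.
fips_in_map = {"01001", "01003", "51560", "23000"}
fips_in_table = {"01001", "01003", "51005"}

only_in_map = fips_in_map - fips_in_table    # polygons without irrigation data
only_in_table = fips_in_table - fips_in_map  # irrigation rows without a polygon
shared = fips_in_map & fips_in_table

print(sorted(only_in_map))
print(sorted(only_in_table))
print(len(shared))
```

Codes that appear only in the table would contribute no loading at all, so that direction is worth inspecting as well.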

## Merge `CO2000P020` codes to FIPS codes

The geopackage stores polygons, not multipolygons, so some counties are split across more than one feature. These need to be aggregated.

In [None]:
len(fips_mapping), len(set(fips_mapping.values()))

In [None]:
temp_dict = defaultdict(float)

for x, y, z in county_matches:
    temp_dict[(x, fips_mapping[y])] += z

In [None]:
county_matches = [(x[0], x[1], y) for x, y in temp_dict.items()]
len(county_matches), county_matches[0]
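The aggregation step can be illustrated on toy data. In this sketch, two hypothetical features (10 and 11) belong to the same county, so their intersected areas within one grid cell are summed; all ids and areas are made up.

```python
from collections import defaultdict

# Hypothetical rows: (grid cell id, CO2000P020 value, intersected area in m2).
# Features 10 and 11 are two polygons of the same county.
rows = [("cell-1", 10, 1.0e6), ("cell-1", 11, 0.5e6), ("cell-1", 12, 2.0e6)]
feature_to_fips = {10: "01001", 11: "01001", 12: "01003"}

area_by_cell_and_county = defaultdict(float)
for cell, feature, area in rows:
    # Rows that map to the same (cell, county) pair are summed together.
    area_by_cell_and_county[(cell, feature_to_fips[feature])] += area

merged = [(cell, fips, area) for (cell, fips), area in area_by_cell_and_county.items()]
print(merged)
```

Three input rows collapse to two, one per (grid cell, county) pair.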

## County areas

Get total intersected area for each county

In [None]:
total_intersected_area_per_county = defaultdict(float)

for _, y, z in county_matches:
    total_intersected_area_per_county[y] += z

# Create Loading

[Loadings](http://brightway2-regional.readthedocs.org/formats.html#loadings) in ``bw2regional`` should have the form:

    [
        [amount, IA spatial unit id]
    ]

However, the ``amount`` should have units of mass per area, so we need to calculate:

   $$value_{grid} = \frac{\sum_{county \in counties} area(county \cap grid) \cdot irrigation_{county} }{ [ \sum_{county \in counties} area(county \cap grid) ]^{2} }$$
   
The denominator is squared because we need it twice: first, to normalize the weights in the numerator, and second, to convert from irrigation to irrigation per square meter. We know that our grid completely covers all counties, so the sum of the intersected area is the total area of the county.

In [None]:
grid_values = {}

for grid_id in set([x[0] for x in county_matches]):
    grid_values[grid_id] = (
        sum([area * water_use_dict[county] for g, county, area in county_matches if g == grid_id]) /
        sum(total_intersected_area_per_county[county] for g, county, _ in county_matches if g == grid_id) ** 2
    )
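The cell above rescans `county_matches` once for every grid cell, so its running time grows with the number of grid cells times the number of rows. The sketch below computes the same quantities in a single pass with accumulators and cross-checks them against the nested-scan form. The inputs are hypothetical toy values that reuse the notebook's variable names.

```python
import math
from collections import defaultdict

# Hypothetical inputs in the same shape as the notebook's variables.
county_matches = [("g1", "A", 2.0), ("g1", "B", 2.0), ("g2", "B", 4.0)]
water_use_dict = {"A": 10.0, "B": 30.0}

# Total intersected area per county, as in the "County areas" step.
total_area = defaultdict(float)
for _, county, area in county_matches:
    total_area[county] += area

# Accumulate numerator and denominator per grid cell in one pass.
numerator = defaultdict(float)
denominator = defaultdict(float)
for grid_id, county, area in county_matches:
    numerator[grid_id] += area * water_use_dict[county]
    denominator[grid_id] += total_area[county]

grid_values = {g: numerator[g] / denominator[g] ** 2 for g in numerator}

# Cross-check against the nested-scan formulation used in the cell above.
for g in grid_values:
    expected = (
        sum(a * water_use_dict[c] for gg, c, a in county_matches if gg == g)
        / sum(total_area[c] for gg, c, _ in county_matches if gg == g) ** 2
    )
    assert math.isclose(grid_values[g], expected)

print(grid_values)
```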

In [None]:
with open(os.path.join("output", "loading.json"), "w") as f:
    json.dump([(v, k) for k, v in grid_values.items()], f, indent=2)
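Before handing the file to other tools, it can help to confirm that every entry has the `[amount, IA spatial unit id]` shape described above. The following sketch is a hypothetical validation helper, not part of `bw2regional`, and assumes that spatial unit ids are strings or integers.

```python
import json
import math


def check_loading(entries):
    """Return a list of problems found in ``[amount, spatial unit id]`` pairs.

    Hypothetical validation helper for illustration only.
    """
    problems = []
    seen = set()
    for index, entry in enumerate(entries):
        if len(entry) != 2:
            problems.append(f"entry {index}: expected 2 items, got {len(entry)}")
            continue
        amount, unit_id = entry
        if not isinstance(amount, (int, float)) or not math.isfinite(amount):
            problems.append(f"entry {index}: amount is not a finite number")
        elif amount < 0:
            problems.append(f"entry {index}: negative amount")
        if unit_id in seen:
            problems.append(f"entry {index}: duplicate spatial unit id {unit_id!r}")
        seen.add(unit_id)
    return problems


# Round-trip through JSON text, as the cell above does through a file.
text = json.dumps([(1.25, "g1"), (0.0, "g2")], indent=2)
assert check_loading(json.loads(text)) == []
```

An empty list means no problems were found; otherwise each message names the offending entry.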

## Write results to check reasonableness

In [None]:
with fiona.open(
        os.path.join("output", "956da8c2f7dca5ad8687266eaae0694a08ba7fb7decd794b38a916a430c67f9a.1.geojson")
    ) as source:

    schema = {
        'geometry': 'Polygon',
        'properties': {'id': 'str', 'val': 'float'}
    }
    
    with fiona.open(
            os.path.join("output", "loading_check.geojson"), 
            'w',
            crs=source.crs,
            driver=source.driver,
            schema=schema,
        ) as sink:

        for f in source:
            try:
                f['properties'] = {
                    'id': f['properties']['id'], 
                    'val': grid_values[f['properties']['id']] * 1e6
                }
                sink.write(f)
            except KeyError:
                pass

The result seems reasonable enough:

<img src='images/loadings.png'>
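A short numeric summary can complement the visual check. The sketch below uses hypothetical values in place of the notebook's `grid_values` and reports the number of cells, how many are zero, and the minimum, median, and maximum.

```python
import statistics

# Hypothetical loading values keyed by grid cell id.
grid_values = {"g1": 0.0, "g2": 1.25, "g3": 3.5, "g4": 0.25}

values = sorted(grid_values.values())
summary = {
    "count": len(values),
    "zeros": sum(1 for v in values if v == 0),
    "min": values[0],
    "median": statistics.median(values),
    "max": values[-1],
}
print(summary)
```

A large share of zeros or a maximum far above the median would be a reason to look more closely at the corresponding counties.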