# Write CFs as JSON with loading

This notebook is part of supporting information for "Computational Methods for Regionalized Life Cycle Assessment" by Chris Mutel and Stefanie Hellweg.

It will not run without an up-to-date installation of the following:
* [brightway2](https://docs.brightwaylca.org/installation.html)
* [brightway2-regional](http://brightway2-regional.readthedocs.org/)

# Setup

In [18]:
import bz2
import fiona
import os
import json
import xlrd
import pyprind
import numpy as np
import rasterio

In [2]:
import brightway2 as bw
import bw2regional as bwr

In [3]:
bw.projects.set_current("computational methods paper")

# Loading data on intersecting spatial supports

This is the output of a [pandarus](http://pandarus.readthedocs.org/) job. Each record relates a county shapefile integer index (not a FIPS code) to a raster cell label, together with the area of their intersection.

In [4]:
with bz2.BZ2File(os.path.join("data", "county-cfs.json.bz2")) as f:
    county_matches = json.load(f)

In [5]:
print(len(county_matches))
county_matches[:10]

15316


[[3318, 'Cell(-109.75, 33.25)', 2578653127.8129683],
 [2, 'Cell(-122.25, 49.25)', 9850880.43477085],
 [229, 'Cell(-109.75, 46.25)', 190192847.98099387],
 [1165, 'Cell(-94.75, 41.75)', 165914939.1054086],
 [101, 'Cell(-88.75, 48.25)', 346677.09238775267],
 [1990, 'Cell(-85.75, 38.25)', 8641929.401829962],
 [2666, 'Cell(-82.25, 36.25)', 417109652.08117104],
 [1789, 'Cell(-100.75, 39.25)', 775938289.607816],
 [3596, 'Cell(-95.75, 31.75)', 392467981.4989172],
 [470, 'Cell(-90.75, 45.75)', 386487454.33441937]]
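The cell labels are plain strings. If you need their coordinates, for example to check the grid against the raster, a small parser helps. This is a minimal sketch with hypothetical `parse_cell_label` and `cell_bounds` helpers. It assumes that the two numbers are the longitude and latitude of the cell center on a 0.5-degree grid. The .25/.75 coordinates suggest this, but it is an assumption and is not confirmed here.

```python
import re

# Hypothetical helper. Labels look like "Cell(-109.75, 33.25)".
# ASSUMPTION: the two numbers are (longitude, latitude) of the cell center.
CELL_PATTERN = re.compile(r"^Cell\((-?\d+(?:\.\d+)?), (-?\d+(?:\.\d+)?)\)$")


def parse_cell_label(label):
    """Return (lon, lat) floats from a label such as 'Cell(-109.75, 33.25)'."""
    match = CELL_PATTERN.match(label)
    if match is None:
        raise ValueError(f"Unexpected cell label: {label!r}")
    return float(match.group(1)), float(match.group(2))


def cell_bounds(label, size=0.5):
    """Bounding box (west, south, east, north), ASSUMING `size`-degree cells
    centered on the parsed coordinates."""
    lon, lat = parse_cell_label(label)
    half = size / 2
    return (lon - half, lat - half, lon + half, lat + half)


county_index, label, area = [3318, 'Cell(-109.75, 33.25)', 2578653127.8129683]
print(parse_cell_label(label))  # (-109.75, 33.25)
print(cell_bounds(label))       # (-110.0, 33.0, -109.5, 33.5)
```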

## Map to county FIPS codes

The shapefile doesn't give us one feature per county FIPS code. Its geometry type is Polygon rather than MultiPolygon, so counties made of several parts were split into separate features that share a FIPS code. We therefore need to aggregate the shapefile indices to FIPS codes ourselves.

In [7]:
counties = fiona.open(os.path.join("data", "county_borders.gpkg"))
print(len(counties))

for x in counties:
    print(x['properties'])
    break

4693
OrderedDict([('AREA', 0.562), ('PERIMETER', 4.169), ('CO2000P020', 1306), ('STATE', 'MN'), ('COUNTY', 'Lake of the Woods County'), ('FIPS', '27077'), ('STATE_FIPS', '27'), ('SQUARE_MIL', 1296.704)])


Technically, the county FIPS code is 077 and the state FIPS code is 27, but we use the concatenated five-digit version, 27077.
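FIPS codes should stay zero-padded strings. An integer like 4009 would not match a string key like '04009'. The following is a minimal sketch of concatenating and splitting the codes. `full_fips` and `split_fips` are hypothetical helpers and are not used elsewhere in this notebook.

```python
def full_fips(state_fips, county_fips):
    """Concatenate a 2-digit state code and a 3-digit county code, zero-padding both."""
    return f"{int(state_fips):02d}{int(county_fips):03d}"


def split_fips(fips):
    """Split a 5-digit county FIPS string into (state, county) parts."""
    if len(fips) != 5 or not fips.isdigit():
        raise ValueError(f"Not a 5-digit FIPS code: {fips!r}")
    return fips[:2], fips[2:]


print(full_fips("27", "077"))  # 27077
print(full_fips(4, 9))         # 04009
print(split_fips("27077"))     # ('27', '077')
```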

In [8]:
county_indices_to_fips = {index: obj['properties']['FIPS'] for index, obj in enumerate(counties)}
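To see which counties were split into several polygons, count how many shapefile indices map to each FIPS code. In the notebook, this would be `Counter(county_indices_to_fips.values())`. The toy dictionary below is hypothetical.

```python
from collections import Counter

# Hypothetical toy stand-in for county_indices_to_fips:
# indices 1 and 2 are two polygons of the same county.
toy_indices_to_fips = {0: "27077", 1: "53055", 2: "53055", 3: "04009"}

polygons_per_fips = Counter(toy_indices_to_fips.values())
split_counties = {fips: n for fips, n in polygons_per_fips.items() if n > 1}
print(split_counties)  # {'53055': 2}
```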

Consolidate the county indices to FIPS codes by summing the intersected areas of features that share a FIPS code.

This creates a dictionary of the form:

    {
        (FIPS code, cell-label): intersected area
    }

In [9]:
county_fips_cell_dict = {}

for county_id, cell_id, area in county_matches:
    key = (county_indices_to_fips[county_id], cell_id)
    county_fips_cell_dict[key] = county_fips_cell_dict.get(key, 0) + area

In [10]:
print(len(county_fips_cell_dict))
list(county_fips_cell_dict.items())[0]

13632


(('04009', 'Cell(-109.75, 33.25)'), 2578653127.8129683)
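The same consolidation can be written with `collections.defaultdict`, which avoids the `.get(key, 0)` call. This sketch uses hypothetical toy records in the pandarus layout. Two polygons with the same FIPS code in one cell collapse into a single key with their areas summed.

```python
from collections import defaultdict

# Toy pandarus-style records: [shapefile index, cell label, intersected area]
toy_matches = [
    [1, "Cell(-122.75, 48.75)", 10.0],
    [2, "Cell(-122.75, 48.75)", 5.0],   # second polygon of the same county
    [2, "Cell(-122.25, 48.75)", 7.0],
]
toy_indices_to_fips = {1: "53055", 2: "53055"}

fips_cell_area = defaultdict(float)
for county_index, cell_label, area in toy_matches:
    fips_cell_area[(toy_indices_to_fips[county_index], cell_label)] += area

print(dict(fips_cell_area))
# {('53055', 'Cell(-122.75, 48.75)'): 15.0, ('53055', 'Cell(-122.25, 48.75)'): 7.0}
```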

## County areas

Get the total intersected area for each county, summed over all raster cells.

In [11]:
total_intersected_area_per_county = {
    fips: sum([value for key, value in county_fips_cell_dict.items() if key[0] == fips]) 
    for fips in {key[0] for key in county_fips_cell_dict}
}

list(total_intersected_area_per_county.items())[0]

('37185', 1152114945.786825)
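The comprehension above rescans all of *county_fips_cell_dict* once for every FIPS code. Its cost therefore grows with the number of counties times the number of entries. A single-pass version gives the same totals, up to floating-point rounding. This is a minimal sketch with a hypothetical `total_area_per_county` helper, shown on toy data.

```python
from collections import defaultdict


def total_area_per_county(fips_cell_area):
    """Sum intersected areas per FIPS code in one pass over the dictionary."""
    totals = defaultdict(float)
    for (fips, _cell), area in fips_cell_area.items():
        totals[fips] += area
    return dict(totals)


toy = {
    ("37185", "Cell(-78.25, 36.25)"): 3.0,
    ("37185", "Cell(-77.75, 36.25)"): 2.0,
    ("04009", "Cell(-109.75, 33.25)"): 4.0,
}
print(total_area_per_county(toy))  # {'37185': 5.0, '04009': 4.0}
```

In the notebook, this would be `total_intersected_area_per_county = total_area_per_county(county_fips_cell_dict)`. The iteration order may differ, so `list(...)[0]` can show a different county.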

# Irrigation water consumption

Get irrigation data from the USGS 2005 county-level water-use spreadsheet.

In [12]:
water_use_book = xlrd.open_workbook(os.path.join("data", "usco2005.xls"))
water_use = water_use_book.sheet_by_name('County')
print(water_use.nrows)

3223


In [13]:
row = water_use.row(0)

for index, value in enumerate(row):
    if index in (3, 33, 40, 101):
        print(index, value)

3 text:'FIPS'
33 text:'IR-WSWFr'
40 text:'IC-WSWFr'
101 text:'TO-WSWFr'
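The column positions 3, 33, 40 and 101 are hard-coded. A slightly more robust approach looks them up by header text. `column_indices` and `as_number` are hypothetical helpers, and the header list below is a toy. In the notebook, `header` would come from `water_use.row_values(0)`. xlrd returns `''` for empty cells, which `as_number` treats as zero, like the `or 0` used below.

```python
def column_indices(header, names):
    """Map each requested header name to its column index.
    If a header name were duplicated, the last occurrence would win."""
    positions = {name: index for index, name in enumerate(header)}
    missing = [name for name in names if name not in positions]
    if missing:
        raise KeyError(f"Columns not found: {missing}")
    return {name: positions[name] for name in names}


def as_number(value):
    """Treat blank spreadsheet cells ('' in xlrd) as zero."""
    return 0.0 if value == "" else float(value)


toy_header = ["A", "B", "C", "FIPS", "IR-WSWFr"]  # hypothetical toy header
print(column_indices(toy_header, ["FIPS", "IR-WSWFr"]))  # {'FIPS': 3, 'IR-WSWFr': 4}
print(as_number(""), as_number(12.5))  # 0.0 12.5
```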


## Which column is the most realistic for actual water withdrawals?

We are interested in the following columns:

  * 3 (`FIPS`): Concatenated state-county FIPS code
  * 33 (`IR-WSWFr`): Irrigation, surface-water withdrawals, fresh, in Mgal/d
  * 40 (`IC-WSWFr`): Irrigation-Crop, surface-water withdrawals, fresh, in Mgal/d
  * 101 (`TO-WSWFr`): Total surface-water withdrawals, fresh, in Mgal/d

Column 40, *Irrigation-Crop, surface-water withdrawals, fresh, in Mgal/d*, has very poor spatial coverage:

<img src='images/col-40.png'>

Column 101, *Total surface-water withdrawals, fresh, in Mgal/d*, is dominated by power production (possibly once-through cooling plants):

<img src='images/col-101.png'>

However, column 33, *Irrigation, surface-water withdrawals, fresh*, looks like what we would expect. Besides crops, it also includes golf course irrigation.

<img src='images/col-33.png'>

Create a dictionary linking county FIPS codes to column 33 water withdrawals:

In [14]:
irrigation_data = {}

for row_index in range(1, water_use.nrows):
    row = water_use.row(row_index)
    irrigation_data[row[3].value] = row[33].value or 0

# Create loading

[Loadings](http://brightway2-regional.readthedocs.org/formats.html#loadings) in ``bw2regional`` should have the form:

    [
        [amount, IA spatial unit id]
    ]

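However, the ``amount`` should have units of mass per area. We therefore need to calculate, for each raster cell, the sum over all counties that intersect it:

$$\text{amount}_{\text{cell}} = \sum_{\text{counties}} \frac{\text{water consumption}_{\text{county}} \times \text{intersected area}_{\text{county, cell}}}{\text{total intersected area}_{\text{county}}}$$
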
To do this, we use the data we already have:

    total_intersected_area_per_county: {FIPS code: area}
    county_fips_cell_dict: {(FIPS code, raster cell): area}
    irrigation_data: {FIPS code: water consumption}
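Here is a worked example of this formula with toy numbers:

* County A (10 Mgal/d) overlaps cell 1 with 30 area units and cell 2 with 70.
* County B (5 Mgal/d) lies entirely in cell 2, with 50 units.

Cell 1 gets 10 × 30/100 = 3. Cell 2 gets 10 × 70/100 + 5 × 50/50 = 12. The 15 Mgal/d total is therefore preserved. The sketch below uses a hypothetical `allocate_to_cells` helper. It computes this in a single pass over the (county, cell) pairs instead of scanning every pair once per raster cell.

```python
from collections import defaultdict


def allocate_to_cells(fips_cell_area, county_totals, water, excluded=frozenset()):
    """Distribute each county's water use to raster cells in proportion to the
    county's intersected area in each cell, summing contributions per cell."""
    loading = defaultdict(float)
    for (fips, cell), area in fips_cell_area.items():
        if fips in excluded:
            continue
        loading[cell] += water[fips] * area / county_totals[fips]
    return [(amount, cell) for cell, amount in loading.items()]


# Toy data: keys are (county, cell), values are intersected areas
fips_cell_area = {("A", "cell 1"): 30.0, ("A", "cell 2"): 70.0, ("B", "cell 2"): 50.0}
county_totals = {"A": 100.0, "B": 50.0}
water = {"A": 10.0, "B": 5.0}

print(allocate_to_cells(fips_cell_area, county_totals, water))
# [(3.0, 'cell 1'), (12.0, 'cell 2')]
```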

## County FIPS code correspondence and integrity

However, we first have to check data integrity, because FIPS codes can change over time (see e.g. [this web page](https://www.census.gov/geo/reference/county-changes.html)). We only care about FIPS codes in *county_fips_cell_dict* which aren't in *irrigation_data* or *total_intersected_area_per_county*.

In [15]:
county_fips = {key[0] for key in county_fips_cell_dict}
irrigation_fips = set(irrigation_data)
area_fips = set(total_intersected_area_per_county)

print("Codes missing in `irrigation_data`:", county_fips.difference(irrigation_fips))
print("Codes missing in `total_intersected_area_per_county`:", county_fips.difference(area_fips))

Codes missing in `irrigation_data`: {'23000', '51560'}
Codes missing in `total_intersected_area_per_county`: set()


* 51560 is Clifton Forge city, VA, which has not been an independent city since July 2001 (see [FIPS changes](https://www.ddorn.net/data/FIPS_County_Code_Changes.pdf)). It is now part of Alleghany County, code 51005. We ignore this for now.
* 23000 is the entire state of Maine, and can be ignored in this case study (sorry, Mainers!).
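Rather than hard-coding the two codes, the set to skip can be derived from the difference computed above. The sketch also shows one possible way to handle renamed units, which this notebook does not do: remap a retired code to the code that absorbed it before computing county totals. `FIPS_REMAP` and `remap_keys` are hypothetical, and the sets are toy data. Whether the USGS 2005 figures for 51005 already cover the former Clifton Forge area is an assumption.

```python
# Toy stand-ins for the notebook's sets
shapefile_fips = {"51005", "51560", "23000", "04009"}
irrigation_fips = {"51005", "04009"}

# Derive the codes to skip instead of hard-coding them in the loop
excluded = shapefile_fips - irrigation_fips
print(sorted(excluded))  # ['23000', '51560']

# Hypothetical alternative: remap retired codes and merge their areas
FIPS_REMAP = {"51560": "51005"}  # ASSUMED: Clifton Forge city -> Alleghany County


def remap_keys(fips_cell_area, remap):
    """Rename FIPS codes in (fips, cell) keys, summing areas that collide."""
    merged = {}
    for (fips, cell), area in fips_cell_area.items():
        key = (remap.get(fips, fips), cell)
        merged[key] = merged.get(key, 0) + area
    return merged


toy = {("51560", "c1"): 2.0, ("51005", "c1"): 3.0, ("51005", "c2"): 4.0}
print(remap_keys(toy, FIPS_REMAP))  # {('51005', 'c1'): 5.0, ('51005', 'c2'): 4.0}
```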

In [16]:
raster_cells = {key[1] for key in county_fips_cell_dict}

# FIPS codes with no USGS irrigation data (see above)
excluded_fips = {'51560', '23000'}

output_data = []

for raster_cell in pyprind.prog_bar(raster_cells):
    total_water = 0
    # Sum each intersecting county's area-weighted share of its irrigation water
    for (county, cell), value in county_fips_cell_dict.items():
        if cell != raster_cell or county in excluded_fips:
            continue
        total_water += irrigation_data[county] * value / total_intersected_area_per_county[county]
    output_data.append((total_water, raster_cell))

output_data[:5]



[(96.96912132466527, 'Cell(-80.75, 27.75)'),
 (0.0021942331058926886, 'Cell(-94.75, 42.75)'),
 (0.07515092995552938, 'Cell(-118.75, 33.25)'),
 (3.5811588223260755, 'Cell(-115.75, 48.75)'),
 (1.238084743991787, 'Cell(-121.75, 47.75)')]

This isn't perfect (for example, West Virginia is completely missing), but it is good enough for this case study.
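Here is a simple sanity check. Each county's share is divided by that county's total intersected area, so its shares over all cells sum to one. The loading total should therefore equal the irrigation water of the counties that were included. This is a minimal sketch with a hypothetical `check_conservation` helper, on toy data.

```python
import math


def check_conservation(output_data, water, included_fips, rel_tol=1e-9):
    """Return (allocated, expected, agree) for the loading totals."""
    allocated = sum(amount for amount, _cell in output_data)
    expected = sum(water[fips] for fips in included_fips)
    return allocated, expected, math.isclose(allocated, expected, rel_tol=rel_tol)


toy_output = [(3.0, "cell 1"), (12.0, "cell 2")]
toy_water = {"A": 10.0, "B": 5.0, "C": 99.0}  # "C" has no intersected cells
print(check_conservation(toy_output, toy_water, {"A", "B"}))  # (15.0, 15.0, True)
```

In the notebook, `included_fips` would be `county_fips - {'51560', '23000'}`. Comparing against `sum(irrigation_data.values())` also shows how much irrigation water falls in counties with no intersected cells.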

Write results to a file that will be used in the next notebook.

In [22]:
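# Create the output directory if it doesn't exist yet (json is imported above)
os.makedirs("output", exist_ok=True)

with open(os.path.join("output", "loading.json"), "w") as f:
    json.dump(output_data, f, indent=2)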
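JSON has no tuple type. The `(amount, cell)` tuples are therefore written as arrays and read back as lists, which matches the `[[amount, IA spatial unit id]]` loading form. Here is a quick round-trip check on the first two rows shown above, done in memory rather than through the file.

```python
import json

output_data = [
    (96.96912132466527, "Cell(-80.75, 27.75)"),
    (0.0021942331058926886, "Cell(-94.75, 42.75)"),
]

# Tuples become JSON arrays; json.loads returns them as Python lists
text = json.dumps(output_data, indent=2)
reloaded = json.loads(text)
print(reloaded[0])  # [96.96912132466527, 'Cell(-80.75, 27.75)']
```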