# Building water availability suitability layers for GRIDCERF

The following code was used to build the water availability suitability layers for GRIDCERF. GRIDCERF does not provide the source data directly because of license restrictions on redistributing the unaltered source data. However, the following details the provenance of each source dataset and how it was processed.


## 1. Downloading the data

### 1.1 Download GRIDCERF

Download the GRIDCERF package, if you have not yet done so, from https://doi.org/10.5281/zenodo.6601789. Extract GRIDCERF inside the `data` directory of this repository, as the paths in this notebook assume that location.


### 1.2 Download the available water data


- **Title**:  Surface Water Flow
- **Description from Source**:  This data is a representation of stream and river water bodies derived from the National Hydrography Dataset (NHD) and is symbolized by flow rate from gauge adjusted values from the USGS Extended Unit Runoff Method (EROM) table. The EROM table contains other mean annual flow/velocity statistics for the NHDFlowline features such as Flow from runoff or Velocity from runoff. Subject matter experts on this data agreed that for the purposes of finding flow discharge rates of cooling water for power plants, the gauge adjusted values were the ideal empirical values for this study. It should also be noted that the flow rates were originally in cubic feet per second (cfs) but were converted to gallons per minute (gpm). The geospatial data sets included in NHDPlusV2 are intended to support a variety of water-related applications. They already have been used in an application to develop estimates of mean annual streamflow and velocity for each NHDFlowline feature in the conterminous United States. The results of these analyses are included with the NHDPlusV2 data. A water-quality model developed by the U.S. Geological Survey (USGS) called SPARROW (Spatially Referenced Regressions on Watershed Attributes) can utilize the NHDPlusV2 network functionality to track the downstream transport of nutrients, sediments, or other substances. NHDPlusV2 water bodies and estimates of streamflow and velocity are used in SPARROW to identify reservoir retention and in-stream loss factors. NHDPlusV2 climatic and land surface attributes can be used in SPARROW to identify potential factors in the delivery of nutrients from the land surface to streams. NHDPlusV2 data is also being used in select areas for a USGS Web-based application called StreamStats. StreamStats provides tools to interactively select any point in the implemented areas, delineate watersheds, and obtain streamflow and watershed characteristics for the selected point. NHDPlusV2 has been designed to accommodate many users' needs for future applications. NHDPlusV2 provides the framework and tools necessary to customize the behavior of the network relationships as well as to build upon the attribute database, to which users can assign their own data.
- **Source URL**:  https://ezmt.anl.gov/mapexport/surface_water_flow_nhdplus_v2_erom_eispc_v2.zip
- **Date Accessed**:  10/14/22
- **Citation**:
> Moore, R. B. et al. User’s guide for the national hydrography dataset plus (NHDPlus) high resolution: U.S. Geological Survey Open-File Report 2019–1096. https://pubs.er.usgs.gov/publication/ofr20191096 (2019).


## 2. Setup environment

### 2.1 Install GDAL

This application requires GDAL to be installed.  We will call GDAL directly from your command prompt or terminal, so please ensure that you can do so before running the following cells.  More information on how to install GDAL can be found here:  https://gdal.org/download.html
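
Before running the cells below, it can help to confirm that the `gdal_rasterize` executable is discoverable on your `PATH`. The following is a minimal sketch using only the Python standard library; the helper name `find_executable` is hypothetical and is not part of GRIDCERF.

```python
import shutil
from typing import Optional


def find_executable(name: str = "gdal_rasterize") -> Optional[str]:
    """Return the full path to an executable on PATH, or None if it is absent."""
    return shutil.which(name)


# report whether the GDAL command line tool can be called from this environment
gdal_path = find_executable()
if gdal_path is None:
    print("gdal_rasterize was not found on PATH; install GDAL before continuing.")
else:
    print(f"Found gdal_rasterize at: {gdal_path}")
```

If the tool is reported as missing even though GDAL is installed, the environment that launched the notebook probably differs from the one where GDAL was installed.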


### 2.2 Import necessary Python packages

In [1]:
import os
import glob

import rasterio
import numpy as np
import pandas as pd
import geopandas as gpd


## 3. Configuration

In [2]:
# get the parent directory path to where this notebook is currently stored
root_dir = os.path.dirname(os.getcwd())

# data directory in repository
data_dir = os.path.join(root_dir, "data")

# GRIDCERF data directory from downloaded archive
gridcerf_dir = os.path.join(data_dir, "gridcerf")

# GRIDCERF reference data directory
reference_dir = os.path.join(gridcerf_dir, "reference")

# GRIDCERF common data directory
common_dir = os.path.join(gridcerf_dir, "common")

# GRIDCERF technology_specific data directory
technology_specific_dir = os.path.join(gridcerf_dir, "technology_specific")

# GRIDCERF compiled final suitability data directory
compiled_dir = os.path.join(gridcerf_dir, "compiled")

# template land mask raster
template_raster = os.path.join(reference_dir, "gridcerf_landmask.tif")

# source data directory
source_dir = os.path.join(gridcerf_dir, "source", "water")

# temporary output raster for processing
temp_output_raster = os.path.join(source_dir, "temporary_raster.tif")

# generate a list of all common exclusion files
common_raster_list = glob.glob(os.path.join(common_dir, "*.tif"))

# source NHD shapefile
nhd_shapefile = os.path.join(source_dir, "surface_water_flow_nhdplus_v2_erom_eispc_v2", "ez_gis.surface_water_flow_nhdplus_v2_erom_eispc_v2.shp")

# minimum mean annual flow thresholds in million gallons per day (MGD);
#  each threshold is also embedded in the name of its output file
mgd_list = [2, 10, 25, 35, 40, 55, 70, 75, 95, 110, 120, 135]
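
Because every path above assumes that GRIDCERF and the NHD archive were extracted to specific locations, a quick existence check can catch a misplaced download early. This is a minimal sketch; the helper `find_missing_paths` is hypothetical, and the commented usage assumes the variable names defined in the configuration cell.

```python
import os
from typing import Dict, List


def find_missing_paths(paths: Dict[str, str]) -> List[str]:
    """Return the labels of any paths that do not exist on disk."""
    return [label for label, path in paths.items() if not os.path.exists(path)]


# Hypothetical usage with the variables from the configuration cell:
# missing = find_missing_paths({
#     "template_raster": template_raster,
#     "nhd_shapefile": nhd_shapefile,
#     "technology_specific_dir": technology_specific_dir,
# })
# if missing:
#     raise FileNotFoundError(f"Missing required inputs: {missing}")
```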


## 4. Generate water availability suitability rasters

### 4.1 Functions to build suitability

In [3]:
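def preprocess_nhd_data(nhd_shapefile: str,
                        template_raster: str) -> gpd.GeoDataFrame:
    """Preprocess NHD flowlines by reprojecting them to the template raster's
    coordinate reference system, converting flow from gallons per minute (gpm)
    to million gallons per day (MGD), and adding a field to rasterize on.
    
    :param nhd_shapefile:    full path to the source NHD flowline shapefile
    :param template_raster:  full path to the raster that provides the target CRS
    
    :return:                 GeoDataFrame with 'mgd', 'value', and 'geometry' fields
    
    """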
    
    # get target coordinate reference system from template raster
    with rasterio.open(template_raster) as src:
        target_crs = src.crs
    
    # only keep gallons per minute flow and geometry and reproject
    gdf = gpd.read_file(nhd_shapefile)[['q_gpm', 'geometry']].to_crs(target_crs)

    # convert gallons per minute to million gallons per day (1,440 minutes per day)
    gdf['mgd'] = (gdf['q_gpm'] / 1000000) * 60 * 24

    # drop gpm field
    gdf.drop(columns=['q_gpm'], inplace=True)

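    # value burned into cells covered by the buffered flowlines; all other
    #  cells are initialized to 1 during rasterization
    gdf['value'] = 0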
    
    return gdf
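
The unit conversion inside `preprocess_nhd_data` can be checked by hand: one gallon per minute sustained for a day is 60 × 24 = 1,440 gallons, so dividing by one million gives the flow in MGD. The sketch below restates that arithmetic as two standalone helpers; the function names are hypothetical and are not used elsewhere in this notebook.

```python
MINUTES_PER_DAY = 60 * 24  # 1,440 minutes in a day


def gpm_to_mgd(q_gpm: float) -> float:
    """Convert a flow in gallons per minute to million gallons per day."""
    return q_gpm * MINUTES_PER_DAY / 1_000_000


def mgd_to_gpm(q_mgd: float) -> float:
    """Convert a flow in million gallons per day to gallons per minute."""
    return q_mgd * 1_000_000 / MINUTES_PER_DAY


print(gpm_to_mgd(1_000_000))    # 1440.0
print(round(mgd_to_gpm(2), 1))  # 1388.9
```

Read the other way, the smallest threshold in `mgd_list` (2 MGD) corresponds to a flowline carrying roughly 1,389 gpm.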
    

### 4.2 Generate available water suitability rasters

In [8]:
# preprocess NHD flowlines for rasterization to MGD thresholds
print("Preprocessing NHD data...")
gdf = preprocess_nhd_data(nhd_shapefile=nhd_shapefile,
                          template_raster=template_raster)

# create buffered flowlines matching the flow requirement
for i in mgd_list:
    
    print(f"Processing flow threshold of:  {i} MGD")
    
    # construct a file basename
    basename = f"gridcerf_nhd2plus_surfaceflow_greaterthan{i}mgd_buffer20km"
    
    # extract the flowlines that support the minimum flow requirement
    gdx = gdf.loc[gdf['mgd'] >= i].copy()

    # buffer by 20 km (20000 meters)
    gdx['geometry'] = gdx.buffer(20000)
    
    # construct temporary shapefile output file path
    output_shp = os.path.join(source_dir, f"{basename}.shp")
    
    # write output shapefile
    gdx[['value', 'geometry']].to_file(output_shp)
    
    # construct the water availability raster output raster name
    output_raster = os.path.join(technology_specific_dir, f"{basename}.tif")

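    # construct the GDAL raster command; paths are quoted so directories containing spaces still work
    gdal_rasterize_cmd = f'gdal_rasterize -l {basename} -a value -tr 1000.0 1000.0 -init 1.0 -te -2405552.8355 -1389065.2005 2287447.1645 1609934.7995 -ot Int16 -of GTiff "{output_shp}" "{output_raster}"'
    
    # execute the GDAL command via the system terminal and stop if it reports a failure
    if os.system(gdal_rasterize_cmd) != 0:
        raise RuntimeError(f"gdal_rasterize failed for threshold {i} MGD")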
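
As an alternative to composing a single shell string, the same `gdal_rasterize` call can be expressed as an argument list and passed to `subprocess.run`, which avoids shell quoting issues and raises an error on failure when `check=True` is set. The sketch below is not the code used to produce GRIDCERF; the helper names `grid_shape`, `build_rasterize_args`, and `run_rasterize` are hypothetical, and the extent and cell size are copied from the command above. It also shows the grid dimensions implied by the `-te` extent and `-tr` resolution.

```python
import subprocess
from typing import List, Tuple

# extent (xmin, ymin, xmax, ymax) and cell size copied from the command above
EXTENT = (-2405552.8355, -1389065.2005, 2287447.1645, 1609934.7995)
CELL_SIZE = 1000.0


def grid_shape(extent: Tuple[float, float, float, float] = EXTENT,
               cell_size: float = CELL_SIZE) -> Tuple[int, int]:
    """Return (n_rows, n_cols) implied by the extent and cell size."""
    xmin, ymin, xmax, ymax = extent
    n_cols = round((xmax - xmin) / cell_size)
    n_rows = round((ymax - ymin) / cell_size)
    return n_rows, n_cols


def build_rasterize_args(layer: str,
                         input_shp: str,
                         output_raster: str,
                         extent: Tuple[float, float, float, float] = EXTENT,
                         cell_size: float = CELL_SIZE) -> List[str]:
    """Build the gdal_rasterize call as a list of arguments."""
    xmin, ymin, xmax, ymax = extent
    return [
        "gdal_rasterize",
        "-l", layer,                              # layer name within the shapefile
        "-a", "value",                            # attribute to burn into the raster
        "-tr", str(cell_size), str(cell_size),    # output resolution
        "-init", "1.0",                           # value for cells not covered by a feature
        "-te", str(xmin), str(ymin), str(xmax), str(ymax),
        "-ot", "Int16",
        "-of", "GTiff",
        input_shp,
        output_raster,
    ]


def run_rasterize(args: List[str]) -> None:
    """Run the command and raise CalledProcessError on a nonzero exit status."""
    subprocess.run(args, check=True)


print(grid_shape())  # (2999, 4693)

# Hypothetical usage inside the loop:
# run_rasterize(build_rasterize_args(basename, output_shp, output_raster))
```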


Preprocessing NHD data...
Processing flow threshold of:  2 MGD
0...10...20...30...40...50...60...70...80...90...100 - done.
Processing flow threshold of:  10 MGD
0...10...20...30...40...50...60...70...80...90...100 - done.
Processing flow threshold of:  25 MGD
0...10...20...30...40...50...60...70...80...90...100 - done.
Processing flow threshold of:  35 MGD
0...10...20...30...40...50...60...70...80...90...100 - done.
Processing flow threshold of:  40 MGD
0...10...20...30...40...50...60...70...80...90...100 - done.
Processing flow threshold of:  55 MGD
0...10...20...30...40...50...60...70...80...90...100 - done.
Processing flow threshold of:  70 MGD
0...10...20...30...40...50...60...70...80...90...100 - done.
Processing flow threshold of:  75 MGD
0...10...20...30...40...50...60...70...80...90...100 - done.
Processing flow threshold of:  95 MGD
0...10...20...30...40...50...60...70...80...90...100 - done.
Processing flow threshold of:  110 MGD
0...10...20...30...40...50...60...70...80...9