## Construct Least Cost Paths - sorted on potential
This notebook constructs a pipeline network connecting all biogas sources using the following workflow: 
1.	Start with the biogas source with the largest potential (m^3/yr) and find its least cost path to the existing pipeline network.
2.	Move to the site with the next highest potential and find its least cost path to the updated pipeline (original + the pipeline linking the 1st site to the original pipe network). 
3.	Proceed with remaining sites, in order of declining potential, until all are connected.
4.	Then, when all sites have been connected, compute the accumulated volume of biogas passing through each segment from all “upstream sites”. 
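Step 4 can be approached as a downstream walk over the routes. The following is a minimal sketch that rests on a few assumptions. Each route is kept as the list of (row, col) pixels that `lc_graph.traceback` returns, with the site first and the joined pipe pixel last. A route's last pixel belongs to whatever it joined. Only pixels added by new routes are tracked, not the original pipeline. `accumulate_flow`, `routes` and `potentials` are hypothetical names.

```python
from collections import defaultdict


def accumulate_flow(routes, potentials):
    """Accumulate biogas volume on every pixel added by the new routes (sketch).

    routes: {site_id: [(row, col), ...]} in connection order; each path runs from
            the site to the pixel where it joins the existing network (assumed layout).
    potentials: {site_id: biogas volume per year}
    Returns {(row, col): accumulated volume per year}.
    """
    # Map each new pixel to the route that created it and its position on that route.
    # A route's last pixel belongs to whatever it joined, so it is skipped here.
    owner = {}
    for site_id, path in routes.items():
        for pos, px in enumerate(path[:-1]):
            owner.setdefault(px, (site_id, pos))

    flow = defaultdict(float)
    for site_id, path in routes.items():
        volume = potentials[site_id]
        segment = path
        while True:
            # Every pixel on this stretch carries this site's gas
            for px in segment[:-1]:
                flow[px] += volume
            join = segment[-1]
            if join not in owner:
                break  # reached the original pipeline
            # Continue downstream along the earlier route that was joined
            parent_id, pos = owner[join]
            segment = routes[parent_id][pos:]
    return dict(flow)


# Toy network: A reaches the original pipe at (0, 3), B joins A, C joins B
routes = {
    'A': [(0, 0), (0, 1), (0, 2), (0, 3)],
    'B': [(2, 1), (1, 1), (0, 1)],
    'C': [(3, 1), (2, 1)],
}
potentials = {'A': 10.0, 'B': 5.0, 'C': 2.0}
flow = accumulate_flow(routes, potentials)
print(flow)
```

Because B joins A and C joins B, the pixels of A downstream of B's junction carry 10 + 5 + 2 = 17 units. The pixel A starts on carries only A's 10.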

#### The datasets required for this analysis include:
* The spreadsheet of biogas sources, including coordinates, total waste, and biogas potential
* The MIT pipeline cost surface (all US)
* The REXTAG natural gas pipeline raster (all US)

#### Analysis consists of the following functions:
* **Data Prep**:
  * Import biogas locations as a Pandas dataframe: `df_biogas`
  * Convert biogas locations to a GeoPandas geodataframe, using the WGS 84 coordinate reference system (CRS): `gdf_biogas`
  * Import the MIT pipeline cost surface as a rasterio raster: `ds_pipecost_full`
  * Import the REXTAG pipeline raster as a rasterio raster: `ds_pipelines_full`
  * Transform the biogas geodataframe to the same CRS as the pipeline cost raster: `gdf_biogas`
  * Subset the pipeline cost raster to the extent of the biogas locations, buffered by 5 km: `ds_costs`
  * Subset the pipeline raster to the same extent: `ds_pipeline`

* **Functions**:
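  * `cost_distance(source_pt, cost_array)`: Compute the cost distance away from a site. Outputs are a cost-distance array and a traceback array.
  * `compute_lcp(source_pt, cost_array, pipes_array)`: Find the least cost path from a site to the pipeline pixel with the lowest accumulated cost, and add that path to the pipe network.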
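The two functions listed above can be sketched as follows. This is a self-contained stand-in, not the notebook's code. It replaces `skimage.graph.MCP_Geometric` with a plain Dijkstra search over an 8-connected grid. Edges use the distance-weighted (c1 + c2) / 2 × step-length cost that `MCP_Geometric` describes. The traceback is stored as predecessor flat indices rather than skimage's offset indices. The `cell_size` and `nodata` parameters are assumptions; 255 marks non-pipe pixels, as in the subset pipeline raster.

```python
import heapq
import math

import numpy as np

# 8-connected neighbour offsets (row, col)
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def cost_distance(source_pt, cost_array, cell_size=1.0):
    """Dijkstra cost distance from source_pt (row, col) across cost_array.

    Returns (cost-distance array, traceback array of predecessor flat indices).
    """
    n_rows, n_cols = cost_array.shape
    cd = np.full(cost_array.shape, np.inf)
    tb = np.full(cost_array.shape, -1, dtype=np.int64)
    start = tuple(source_pt)
    cd[start] = 0.0
    heap = [(0.0, start)]
    while heap:
        d, (r, c) = heapq.heappop(heap)
        if d > cd[r, c]:
            continue  # stale heap entry
        for dr, dc in OFFSETS:
            nr, nc = r + dr, c + dc
            if 0 <= nr < n_rows and 0 <= nc < n_cols:
                # Mean of the two cell costs times the step length (diagonals are longer)
                step = math.hypot(dr, dc) * cell_size
                nd = d + 0.5 * (cost_array[r, c] + cost_array[nr, nc]) * step
                if nd < cd[nr, nc]:
                    cd[nr, nc] = nd
                    tb[nr, nc] = r * n_cols + c
                    heapq.heappush(heap, (nd, (nr, nc)))
    return cd, tb


def compute_lcp(source_pt, cost_array, pipes_array, nodata=255):
    """Least cost path from source_pt to the cheapest-to-reach pipe pixel.

    The route is burned into pipes_array in place so later sites can join it.
    Returns (list of (row, col) from site to pipe, cost of reaching the pipe).
    """
    cd, tb = cost_distance(source_pt, cost_array)
    # Hide non-pipe pixels so argmin only considers the pipe network
    cd_pipes = np.ma.masked_array(cd, mask=pipes_array == nodata)
    end = tuple(int(i) for i in np.unravel_index(cd_pipes.argmin(), cd.shape))
    # Walk the predecessors back from the pipe pixel to the site
    n_cols = cost_array.shape[1]
    path = [end]
    while path[-1] != tuple(source_pt):
        path.append(divmod(int(tb[path[-1]]), n_cols))
    path.reverse()
    # Give the new route the pipe type code of the pixel it joined
    code = pipes_array[end]
    for r, c in path:
        pipes_array[r, c] = code
    return path, float(cd[end])


# Toy example: uniform costs and a transmission pipe (code 1) along the top row
cost = np.ones((5, 5))
pipes = np.full((5, 5), 255, dtype=np.uint8)
pipes[0, :] = 1
path_a, cost_a = compute_lcp((4, 2), cost, pipes)  # first (largest) site
path_b, cost_b = compute_lcp((4, 4), cost, pipes)  # next site
print(path_a, cost_a)
print(path_b, cost_b)
```

Connecting sites in order of declining potential is then a loop over `compute_lcp`. Each call burns its route into `pipes_array`, so in the toy example the second site joins the first site's new route two cells away (cost 2.0) rather than the original pipe four cells away.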

---
## Data Prep

```python
#Imports
import numpy as np
import pandas as pd
import geopandas as gpd
from shapely.geometry import Point,LineString
import rasterio
import rasterio.mask

from skimage import graph

import matplotlib.pyplot as plt
%matplotlib inline
```

* Filenames

```python
#Inputs
fn_biogas_excel = '..\\data\\NCSwineBiogasPotentialUniqueFarms.xlsx'
fn_MIT_cost_surface = '..\\data\\MIT_Surface_Full\\costsurface500m.img'
fn_REXTAG_pipe_raster = '..\\data\\processed\\USA_pipes_operational.tif'

#Outputs
fn_biogas_points = '..\\data\\processed\\biogas_sources.shp'
fn_subset_costs = '..\\data\\processed\\subset_costs.tif'
fn_subset_pipes = '..\\data\\processed\\subset_pipelines.tif'
fn_biogas_routes = '..\\data\\processed\\Routes.shp'
```

* Import biogas data and convert to geodataframe

```python
#Read the excel sheet into a dataframe
df_biogas = pd.read_excel(fn_biogas_excel,sheet_name='Sheet2')
```

```python
#Isolate records with biogas potential values and sort from highest to lowest
df_biogas = (
    #Remove records missing biogas potential data and the Total row
    df_biogas.loc[(df_biogas['Biogas Potential (m^3 / year)'] > 0) &
                  (df_biogas['Facility ID'] != 'TOTAL')]
    #Sort on biogas potential from highest to lowest
    .sort_values(by='Biogas Potential (m^3 / year)',ascending=False)
    #Reset the index
    .reset_index(drop=True))
```

```python
#Display the contents
df_biogas.head()
```

| | Facility ID | Facility Name | Address | City | County Name | Zip | Latitude | Longitude | Regulated Activity | Allowable Count | Total Waste (tons / year) | Biogas Potential (m^3 / year) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 291 | Magnolia III DM Section 4 Sites 1-4 Section 3 ... | 1114 Beasley Mill Rd | Warsaw | Duplin | 28398.0 | 34.8875 | -78.1314 | Swine - Farrow to Wean | 106640 | 490225.577124 | 13726320.0 |
| 1 | 154 | Mr. Holmes Sites #1 - #14 #1 7 #18 Blueberry S... | 2313 Mr Holmes Farm Rd | Garland | Bladen | 28441.0 | 34.819191 | -78.544711 | Swine - Farrow to Wean | 78630 | 301328.821333 | 8437207.0 |
| 2 | 1930 | White Oaks Farm Inc | 604 Benton Pond Rd | Fremont | Wayne | 27830.0 | 35.5161 | -77.9231 | Swine - Farrow to Wean | 65550 | 266616.481879 | 7465261.0 |
| 3 | 292 | DM Farms Sec 2 Sites 1-4 | 419 Dail Rd | Magnolia | Duplin | 28453.0 | 34.8672 | -78.1514 | Swine - Feeder to Finish | 63360 | 243401.613396 | 6815245.0 |
| 4 | 1925 | Somerset Farm | Sr 1139 1855-A N Line Rd | Roper | Washington | 27970.0 | 35.8889 | -76.5358 | Swine - Feeder to Finish | 59000 | 226652.386212 | 6346267.0 |


```python
#Create a series of Point geometries for each record
geom = [Point(xy) for xy in zip(df_biogas.Longitude, df_biogas.Latitude)]
#Construct the geodataframe from the data and geometry, setting crs to WGS84 (EPSG:4326)
gdf_biogas = gpd.GeoDataFrame(df_biogas, geometry = geom, crs = 4326)
#Delete the original dataframe (to conserve memory)
del df_biogas
```

### Subset the cost and pipeline rasters to the extent of the biogas sites
* Read in the raster datasets

```python
#Read the MIT img file into a rasterio raster
ds_pipecost_full = rasterio.open(fn_MIT_cost_surface)

#Read the REXTAG tif file into a rasterio raster
ds_pipelines_full = rasterio.open(fn_REXTAG_pipe_raster)
```

* Transform the biogas points to the same crs as the MIT raster

```python
#Get the MIT coordinate reference system
crs_pipecost = ds_pipecost_full.crs
#Apply the transformation
gdf_biogas = gdf_biogas.to_crs(crs_pipecost)
#Save to file
gdf_biogas.to_file(fn_biogas_points)
```

* Compute the extent of the biogas features, buffered by 5000 m

```python
#Collapse the biogas points to a multipoint object, buffer 5000 m, and pull its extent
bg_extent = gdf_biogas.geometry.unary_union.buffer(5000).envelope
```
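Because `.envelope` takes the bounding rectangle of the buffered points, `bg_extent` is just the bounding box of the sites grown by 5000 m on each side. The only difference is rounding from shapely's polygonal approximation of each circle. A minimal pure-Python sketch of that equivalence follows; `buffered_extent` is a hypothetical helper, not part of the notebook.

```python
def buffered_extent(points, distance):
    """Bounding box (minx, miny, maxx, maxy) of (x, y) points grown by `distance`.

    Matches MultiPoint(points).buffer(distance).envelope.bounds up to the rounding
    of shapely's polygonal circle approximation.
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs) - distance, min(ys) - distance,
            max(xs) + distance, max(ys) + distance)


# Two sites in a projected CRS (metres), buffered by 5 km
extent = buffered_extent([(0.0, 0.0), (10000.0, 2000.0)], 5000)
print(extent)
```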

* Subset & save the cost raster. *Note: we also reclassify zero-value cells to a high cost (10x the raster maximum) so that least cost paths cannot cross them for free.*

```python
#Subset the cost raster using the bg_extent shape as a mask
cost_image, cost_transform = rasterio.mask.mask(ds_pipecost_full,[bg_extent],crop=True)
#Reclassify zero values in cost_image to a high cost (10x the maximum)
cost_image[cost_image == 0] = cost_image.max() * 10
#Create and revise the output metadata
cost_meta = ds_pipecost_full.meta
cost_meta.update({"driver":"GTiff",
                  "height":cost_image.shape[1],
                  "width":cost_image.shape[2],
                  "transform":cost_transform})
#Write to a geoTiff file
with rasterio.open(fn_subset_costs,'w',**cost_meta) as dst:
    dst.write(cost_image)
```

* Subset & save the pipeline raster.

```python
#Subset the pipes raster using the bg_extent shape as a mask
pipe_image, pipe_transform = rasterio.mask.mask(ds_pipelines_full,[bg_extent],crop=True)

#Start from the pipeline raster's own metadata (not the cost raster's) and revise it
pipe_meta = ds_pipelines_full.meta
pipe_meta.update({"driver":"GTiff",
                  "height":pipe_image.shape[1],
                  "width":pipe_image.shape[2],
                  "nodata":255,
                  "dtype":pipe_image.dtype,
                  "transform":pipe_transform})
#Write the subset pipeline raster to a geoTiff file
with rasterio.open(fn_subset_pipes,'w',**pipe_meta) as dst:
    dst.write(pipe_image)
```

```python
#Read the rasters back in (as read only)
ds_costs = rasterio.open(fn_subset_costs)
ds_pipeline = rasterio.open(fn_subset_pipes)

#Read in bands as arrays
arr_cost = ds_costs.read(1)
arr_pipes = ds_pipeline.read(1)
```

```python
#Create a masked array from the pipeline dataset (255 = no pipe)
pipe_mask = np.ma.masked_array(arr_pipes,mask=arr_pipes==255)
```

---
## Analysis: Functions
1. Compute the least cost path from each biogas source to the nearest pipe

```python
#Get the cell size of the cost raster (res is ordered x, y)
x_size,y_size = ds_costs.res

#Create a graph from the cost raster; sampling follows array axes (rows = y, cols = x)
lc_graph = graph.MCP_Geometric(arr_cost,sampling=(y_size,x_size))

#Make a working copy of the biogas features
gdf_biogas_routes = gdf_biogas.copy(deep=True)

#Create lookup dictionary of pipe type
pipeDict = {1:'Transmission',2:'Distribution',3:'Gathering'}

#Iterate through each record, from highest to lowest biogas potential
for layer_number,row in gdf_biogas_routes.iterrows():
    #Status: report the number of sites remaining every 20th run
    if layer_number%20 == 0: print(gdf_biogas_routes.shape[0] - layer_number,end=' ')

    #Get the biogas feature for the specified layer number
    fac_id = row['Facility ID']
    bg_point = row['geometry']

    #Get the (row, col) index of the site
    idx = ds_pipeline.index(bg_point.x,bg_point.y)

    #Compute the cost-distance and traceback arrays
    cd_array, tb_array = lc_graph.find_costs(starts=[idx])

    ##Find the pipe coord with lowest cost in cd_array

    #Mask non-pipeline pixels (255) in the cost distance output;
    #arr_pipes includes the routes added in earlier iterations
    arr_cd_pipes = np.ma.masked_array(cd_array,mask=arr_pipes==255)

    #Locate the min value, i.e. where the LCP should end
    minPipeCost = arr_cd_pipes.min()

    #Determine the row and column where the min occurs (argmin skips masked pixels)
    pipe_coords = np.unravel_index(arr_cd_pipes.argmin(),arr_cd_pipes.shape)

    #Get the pipe type at the output
    pipe_type_code = arr_pipes[pipe_coords]
    pipe_type = pipeDict[pipe_type_code]

    #Get the row/col indices of pixels in the LCP (site first, pipe pixel last)
    lcp_indices = lc_graph.traceback(pipe_coords)

    #If the site already lies on a pipeline pixel, the LCP is a single pixel and cannot
    #form a LineString, so connect the site point to that pixel's center instead
    if len(lcp_indices) < 2:
        r, c = pipe_coords
        toPt = Point(ds_pipeline.xy(r,c))
        gdf_biogas_routes.loc[layer_number,'geometry'] = LineString((bg_point,toPt))
        gdf_biogas_routes.loc[layer_number,'TYPE'] = pipe_type
        gdf_biogas_routes.loc[layer_number,'cost'] = minPipeCost
        continue

    ##Add the route polyline to the geodataframe
    #Create linestring from lcp_indices (pixel centers)
    pipe_line = LineString([ds_pipeline.xy(r,c) for r,c in lcp_indices])

    #Update the dataset so that the geometry is the route
    gdf_biogas_routes.loc[layer_number,'geometry'] = pipe_line
    gdf_biogas_routes.loc[layer_number,'TYPE'] = pipe_type
    gdf_biogas_routes.loc[layer_number,'cost'] = minPipeCost

    #Burn the new route into the pipe array so later sites can connect to it
    for r,c in lcp_indices:
        arr_pipes[r,c] = pipe_type_code
```

```
2040 2020 2000 1980 1960 1940 1920 1900 1880 1860 1840 1820 1800 1780 1760 1740 1720 1700 1680 1660 1640 1620 1600 1580 1560 1540 1520 1500 1480 1460 1440 1420 1400 1380 1360 1340 1320 1300 1280 1260 1240 1220 1200 1180 1160 1140 1120 1100 1080 1060 1040 1020 1000 980 960 940 920 900 880 860 840 820 800 780 760 740 720 700 680 660 640 620 600 580 560 540 520 500 480 460 440 420 400 380 360 340 320 300 280 260 240 220 200 180 160 140 120 100 80 60 40 20
```

* Save results

```python
#Rename columns (shapefile field names are limited to 10 characters)
gdf_out = gdf_biogas_routes.rename({'Facility ID':'Fac_ID',
                                    'Total Waste (tons / year)':'Waste',
                                    'Biogas Potential (m^3 / year)':'Biogas'
                                   },axis=1)

#Filter for LineString records, select columns, and write to output
outColumns = ['Fac_ID','Waste','Biogas','TYPE','geometry']
gdf_out.loc[gdf_out.geometry.geom_type=='LineString',outColumns].to_file(fn_biogas_routes)
```

```python
#Write route attributes to csv file
gdf_biogas_routes.to_csv('..\\data\\processed\\RouteData.csv')
```