# WNTR Geospatial Tutorial
The following tutorial illustrates how the `wntr.gis` module brings geospatial data into resilience analysis.  The tutorial uses a water network model from Kentucky coupled with GIS data to quantify potential water service disruptions from pipes damaged in a landslide.

## Imports
Import WNTR and the additional Python packages needed for the tutorial:
- Geopandas is used to load geospatial data
- Shapely is used to define the region of interest
- Matplotlib is used to create subplots

In [None]:
import geopandas as gpd
from shapely.geometry import box
import matplotlib.pyplot as plt
import wntr

# Water Network Model
Water network models can be created from EPANET INP files, from GIS data in GeoJSON or Shapefile format, or from scratch using methods such as `add_junction` and `add_pipe`. The following section creates a water network model from an EPANET INP file and illustrates how models are created from GIS data.

## Create a Water Network Model from an EPANET INP file
The water distribution network model used in this tutorial was downloaded from the [UKnowledge Water Distribution Systems Research Database](https://uknowledge.uky.edu/wdsrd/). KY10 was selected for the analysis.

Citation: Hoagland, Steven, "10 KY 10" (2016). Kentucky Dataset. 12. https://uknowledge.uky.edu/wdst/12. Accessed on 4/4/2024.

In [None]:
# Create a water network model from an INP file
inp_file = '../networks/ky10.inp'
wn = wntr.network.WaterNetworkModel(inp_file)

In [None]:
# Print a basic description of the model
wn.describe(level=1)

In [None]:
# Create a basic network graphic, showing junction elevation
# Note, the remaining graphics in this tutorial are created from the geospatial data directly, rather than the plot_network function
ax = wntr.graphics.plot_network(wn, node_attribute='elevation', node_range=(175, 300), title='ky10 elevation')

## Convert the Water Network Model to GIS data
The Water Network Model is converted to a collection of GeoDataFrames and the coordinate reference system (CRS) is set to EPSG:3089.  Data for junctions, tanks, reservoirs, pipes, pumps, and valves are stored in separate GeoDataFrames.

In [None]:
# Convert the Water Network Model to GIS data and preview the pipe data (the CRS is set in the next cell)
wn_gis = wntr.network.to_gis(wn)
wn_gis.pipes.head()

In [None]:
# Set the CRS to EPSG:3089, NAD83 / Kentucky Single Zone (ftUS)
crs = 'EPSG:3089'
wn_gis.set_crs(crs)

## Save the GIS data to GeoJSON files or Shapefiles
The GeoDataFrames are written to GeoJSON files or Shapefiles.  One file is created for each of junctions, tanks, reservoirs, pipes, pumps, and valves.

In [None]:
wn_gis.write_geojson('ky10')

## Generate a Water Network Model from GIS data
Water network models can be created from GeoJSON files or Shapefiles. A specific set of column names is required to define junctions, tanks, reservoirs, pipes, pumps, and valves.  Model attributes like patterns, curves, and options need to be added separately.

In [None]:
# Print the valid GeoJSON or Shapefile column names used to build a model
column_names = wntr.network.io.valid_gis_names()
print("Junction column names", column_names['junctions'])
print("Tank column names", column_names['tanks'])
print("Reservoir column names", column_names['reservoirs'])
print("Pipe column names", column_names['pipes'])
print("Pump column names", column_names['pumps'])
print("Valve column names", column_names['valves'])

In [None]:
# Build a water network model from a set of GeoJSON files
geojson_files = {'junctions': 'ky10_junctions.geojson',
                 'tanks': 'ky10_tanks.geojson',
                 'reservoirs': 'ky10_reservoirs.geojson',
                 'pipes': 'ky10_pipes.geojson',
                 'pumps': 'ky10_pumps.geojson',
                 'valves': 'ky10_valves.geojson'}
wn2 = wntr.network.read_geojson(geojson_files)

In [None]:
# Compare model attributes of the original model with the model built from GeoJSON files (note the absence of patterns and controls)
print(wn.describe(level=1))
print(wn2.describe(level=1))

# External GIS Data
The external data used in this tutorial includes landslide footprint data and social vulnerability data.

## Create a region of interest (ROI) for the analysis
The region of interest (ROI) is defined by a bounding box around all pipes, with a 5000 ft buffer. The ROI is used to clip external data to the area covered by the analysis.

In [None]:
# Region of interest
bounds = wn_gis.pipes.total_bounds
geom = box(*bounds)
ROI = geom.buffer(5000) # feet
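
For intuition, the ROI is close to the pipe bounding box grown by 5000 ft on every side. The sketch below computes that rectangle in plain Python from hypothetical pipe coordinates; `expanded_bounds` is an illustrative helper, not part of WNTR or Shapely. Shapely's `buffer` rounds the corners by default, so the actual ROI is slightly smaller near the corners than this rectangle.

```python
# Hypothetical pipe endpoints in a projected CRS with units of feet
pipe_coords = [
    [(1000.0, 2000.0), (1500.0, 2600.0)],
    [(1500.0, 2600.0), (900.0, 3100.0)],
]

def expanded_bounds(lines, buffer_distance):
    """Return (minx, miny, maxx, maxy) around all points, grown by buffer_distance."""
    xs = [x for line in lines for x, _ in line]
    ys = [y for line in lines for _, y in line]
    return (min(xs) - buffer_distance, min(ys) - buffer_distance,
            max(xs) + buffer_distance, max(ys) + buffer_distance)

roi_bounds = expanded_bounds(pipe_coords, 5000.0)
print(roi_bounds)
```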

## Load landslide GIS data
The landslide data used in this tutorial was downloaded from the [UKnowledge Kentucky Geological Survey Research Data](https://uknowledge.uky.edu/kgs_data/).  The Kentucky Geological Survey Landslide Inventory from March 2023 was selected for the analysis.  The data contains landslide areas derived from aerial photography. 

Citation: Crawford, M.M., 2023. Kentucky Geological Survey landslide inventory [2023-03]: Kentucky Geological Survey Research Data, https://uknowledge.uky.edu/kgs_data/7/, Accessed on 4/4/2024.

In [None]:
# To reduce the file size checked into the WNTR repository, the following code was run on the raw data file

#landslide_file = '../data/KGS_Landslide_Inventory_exp.gdb'
#landslide_data = gpd.read_file(landslide_file, driver="FileGDB", layer='Areas_derived_from_aerial_photography')
#print(landslide_data.crs)
#landslide_data = landslide_data.clip(ROI)
#landslide_data.to_file("../data/ky10_landslide_data.geojson", index=True, driver='GeoJSON')

In [None]:
# Load the landslide data from file and print the CRS (which is already in EPSG:3089)
landslide_file = '../data/ky10_landslide_data.geojson'
landslide_data = gpd.read_file(landslide_file) 
landslide_data.set_index('index', inplace=True)
landslide_data.index.name = None
print(landslide_data.crs)

landslide_data.head()

In [None]:
# Each landslide is extended to include the surrounding 1000 ft, to create a region that might be impacted by an individual landslide.  
# Other datasets or methods could be used to define landslide susceptibility or vulnerability.
landslide_regions = landslide_data.copy()
landslide_regions['geometry'] = landslide_data.buffer(1000)

In [None]:
# Plot the landslide data and landslide regions along with pipes
ax = landslide_regions.plot(color='gray', alpha=0.5)
ax = landslide_data.plot(color='red', label='Landslide data', ax=ax)
ax = wn_gis.pipes.plot(color='black', ax=ax)
ax.set_title('Landslide and pipe data')
# Uncomment the following 2 lines to zoom in on a specific area
#ax.set_xlim(5.74e6, 5.76e6)
#ax.set_ylim(3.82e6, 3.84e6)

## Load Social Vulnerability Index (SVI) GIS data
The social vulnerability data used in this tutorial was downloaded from the [Centers for Disease Control and Prevention/Agency for Toxic Substances and Disease Registry](https://www.atsdr.cdc.gov/placeandhealth/svi/index.html). The data contains census and social vulnerability metrics for each census tract. 

The quantity of interest used in this analysis is "RPL_THEMES" which ranks vulnerability across socioeconomic status, household characteristics, racial and ethnic minority status, and housing type and transportation.  The value ranges between 0 and 1, where higher values are associated with higher vulnerability.

Citation: Centers for Disease Control and Prevention/Agency for Toxic Substances and Disease Registry/Geospatial Research, Analysis, and Services Program. CDC/ATSDR Social Vulnerability Index 2020 Database Kentucky. https://www.atsdr.cdc.gov/placeandhealth/svi/data_documentation_download.html. Accessed on 4/4/2024.

In [None]:
# To reduce the file size checked into the WNTR repository, the following code was run on the raw data file

#svi_file = '../data/SVI2020_KENTUCKY_tract.gdb'
#svi_data = gpd.read_file(svi_file, driver="FileGDB", layer='SVI2020_KENTUCKY_tract')
#print(svi_data.crs)
#svi_data.to_crs(crs, inplace=True)
#svi_data = svi_data.clip(ROI)
#svi_data.to_file("../data/ky10_svi_data.geojson", index=True, driver='GeoJSON')

In [None]:
# Load the SVI data from file and print the CRS (which is already in EPSG:3089)
svi_file = '../data/ky10_svi_data.geojson'
svi_data = gpd.read_file(svi_file)
print(svi_data.crs)
svi_data.set_index('index', inplace=True)
svi_data.index.name = None

svi_data.head()

In [None]:
# Plot SVI data and pipes (higher values of SVI are associated with higher vulnerability)
ax = svi_data.plot(column='RPL_THEMES', label='SVI data', cmap='RdYlGn_r', vmin=0, vmax=1, legend=True)
ax = wn_gis.pipes.plot(color='black', ax=ax)
ax.set_title('SVI and pipe data')

# Intersect Water Network Model with GIS data
In this section, landslide and SVI data are intersected with the water network model.

## Intersect pipes with landslide regions
Pipes are intersected with landslide regions to determine the landslides that could impact each pipe. This information could be used to compute the likelihood that a pipe will be impacted by landslides.

In [None]:
# Determine landslide regions that intersect each pipe and print in order of descending number of intersections.
pipe_intersect = wntr.gis.intersect(wn_gis.pipes, landslide_regions)

pipe_intersect.sort_values('n', ascending=False).head()
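
The result holds, for each pipe, the list of intersecting landslide regions (`intersections`) and their count (`n`). The sketch below reproduces that output shape in plain Python under a strong simplification: pipes are straight segments and each landslide region is a 1000 ft circle around a point, so a pipe is hit when the point lies within 1000 ft of the segment. All names and coordinates are hypothetical; `wntr.gis.intersect` works on the real polygon geometries.

```python
import math

# Hypothetical data: pipes as straight segments, landslides as center points (ft)
pipes = {"P1": ((0.0, 0.0), (4000.0, 0.0)),
         "P2": ((0.0, 5000.0), (4000.0, 5000.0))}
landslides = {"L1": (2000.0, 800.0), "L2": (4500.0, 0.0), "L3": (2000.0, 3000.0)}
radius = 1000.0  # buffer distance around each landslide

def point_segment_distance(point, start, end):
    """Shortest distance from a point to the segment start-end."""
    (px, py), (x1, y1), (x2, y2) = point, start, end
    dx, dy = x2 - x1, y2 - y1
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return math.hypot(px - x1, py - y1)
    # Projection of the point onto the segment, clamped to its endpoints
    t = max(0.0, min(1.0, ((px - x1) * dx + (py - y1) * dy) / length_sq))
    return math.hypot(px - (x1 + t * dx), py - (y1 + t * dy))

pipe_hits = {}
for pipe_name, (start, end) in pipes.items():
    hits = [name for name, center in landslides.items()
            if point_segment_distance(center, start, end) <= radius]
    pipe_hits[pipe_name] = {"intersections": hits, "n": len(hits)}

print(pipe_hits)
```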

In [None]:
# Add the intersection data to the water network pipe data
wn_gis.pipes[['intersections', 'n']] = pipe_intersect
wn_gis.pipes.sort_values('n', ascending=False).head()

## Intersect landslide regions with pipes
Landslide regions are intersected with pipes to determine the pipes that could be impacted by each landslide.  This information is used to build landslide scenarios.

In [None]:
# Determine pipes that intersect each landslide region, remove landslides that intersect no pipes, and print in order of descending number of intersections.
landslide_intersect = wntr.gis.intersect(landslide_regions, wn_gis.pipes)
landslide_intersect = landslide_intersect[landslide_intersect['n'] > 0]

landslide_intersect.sort_values('n', ascending=False).head()

In [None]:
# Add the intersection data to the landslide regions data
landslide_regions[['intersections', 'n']] = landslide_intersect

landslide_regions.sort_values('n', ascending=False).head()

In [None]:
# Plot intersection results
fig, axes = plt.subplots(1,2, figsize=(15,5))

wn_gis.pipes.plot(color='gray', alpha=0.5, ax=axes[0])
wn_gis.pipes[wn_gis.pipes['n'] > 0].plot(column='n', legend=True, ax=axes[0])
axes[0].set_title('Number of landslide regions that intersect each pipe')

wn_gis.pipes.plot(color='gray', alpha=0.5, ax=axes[1])
landslide_regions.plot(column='n', vmax=10, legend=True, ax=axes[1])
axes[1].set_title('Number of pipes that intersect each landslide region')

## Intersect junctions with SVI data
Junctions are intersected with SVI to determine the social vulnerability of the population at each junction.  This information is used to determine the social vulnerability of individuals that experience water service disruptions.

In [None]:
# Determine the SVI of each junction using "RPL_THEMES", which ranks vulnerability across socioeconomic status, household characteristics, 
# racial and ethnic minority status, and housing type and transportation. The value ranges between 0 and 1, where higher values are associated with higher vulnerability.
junction_svi = wntr.gis.intersect(wn_gis.junctions, svi_data, 'RPL_THEMES')
junction_svi.head()

In [None]:
# Select the mean value to use in the analysis
wn_gis.junctions['RPL_THEMES'] = junction_svi['mean']

In [None]:
# Plot SVI for each census tract and SVI assigned to each junction
fig, axes = plt.subplots(1,2, figsize=(15,5))

svi_data.plot(column='RPL_THEMES', label='SVI data', vmin=0, vmax=1, legend=True, ax=axes[0])
wn_gis.pipes.plot(color='black', ax=axes[0])
axes[0].set_title('SVI and pipe data')

wn_gis.pipes.plot(color='gray', alpha=0.5, ax=axes[1])
wn_gis.junctions.plot(column='RPL_THEMES', vmin=0, vmax=1, legend=True, ax=axes[1])
axes[1].set_title('SVI value at each junction')

# Hydraulic Simulations
The following section runs hydraulic simulations for the baseline (no landslide) and landslide scenarios. A subset of landslide scenarios is run to simplify the tutorial. For each simulation, the water service availability (WSA) at each junction is computed.  WSA is defined as the ratio of delivered demand to the expected demand. A value below 1 indicates that expected demand is not met.

In [None]:
# Create a function to set up the model for hydraulic simulations
def model_setup(inp_file):
    wn = wntr.network.WaterNetworkModel(inp_file)
    wn.options.hydraulic.demand_model = 'PDD'
    wn.options.hydraulic.required_pressure = 20 # m
    wn.options.hydraulic.minimum_pressure  = 0 # m
    wn.options.time.duration = 48*3600 # 48 hour simulation
    return wn
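
With the `'PDD'` demand model, the demand delivered at a junction depends on pressure. The sketch below shows one common pressure-dependent formulation, using the same required and minimum pressures as the setup function. The function `pdd_demand` is hypothetical, and the exponent of 0.5 is an assumption about the default; consult the WNTR documentation for the exact equation the simulator applies.

```python
def pdd_demand(expected_demand, pressure, minimum_pressure=0.0,
               required_pressure=20.0, exponent=0.5):
    """Delivered demand as a function of pressure (assumed formulation)."""
    if pressure <= minimum_pressure:
        return 0.0  # no water delivered at or below the minimum pressure
    if pressure >= required_pressure:
        return expected_demand  # full demand delivered
    # Partial delivery between the minimum and required pressure
    fraction = (pressure - minimum_pressure) / (required_pressure - minimum_pressure)
    return expected_demand * fraction ** exponent

for p in (-3.0, 5.0, 25.0):
    print(p, pdd_demand(1.0, p))
```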

## Run baseline simulation

In [None]:
# Run a baseline simulation, with no landslides or damage.  Compute water service availability (WSA) for each junction.
wn = model_setup(inp_file)
sim = wntr.sim.EpanetSimulator(wn)
baseline_results = sim.run_sim()

expected_demand = wntr.metrics.expected_demand(wn)
demand = baseline_results.node['demand'].loc[:,wn.junction_name_list]
wsa = wntr.metrics.water_service_availability(expected_demand.sum(axis=0), demand.sum(axis=0))

wsa.head()
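
Because demand is summed over time (`axis=0`) before the ratio is taken, each junction gets a single WSA value for the whole simulation. The sketch below repeats that calculation with plain Python dictionaries; the junction names, demand values, and the helper `service_availability` are made up for illustration and are not part of the WNTR API.

```python
# Hypothetical per-junction demand time series (m3/s), one value per timestep
expected = {"J1": [0.010, 0.012, 0.011], "J2": [0.020, 0.020, 0.020]}
delivered = {"J1": [0.010, 0.012, 0.011], "J2": [0.020, 0.005, 0.000]}

def service_availability(expected, delivered):
    """Return delivered/expected demand per junction, summed over time."""
    wsa = {}
    for name, expected_series in expected.items():
        total_expected = sum(expected_series)
        total_delivered = sum(delivered[name])
        # WSA is undefined for junctions with no expected demand
        wsa[name] = (total_delivered / total_expected
                     if total_expected else float("nan"))
    return wsa

wsa_example = service_availability(expected, delivered)
print(wsa_example)
```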

In [None]:
# Add WSA from the base simulation to the junction GIS data
wn_gis.junctions['wsa_base'] = wsa

In [None]:
# Plot WSA from the base simulation
ax = wn_gis.pipes.plot(color='black', alpha=0.5)
ax = wn_gis.junctions.plot(column='wsa_base', cmap='RdYlGn', vmin=0, vmax=1, legend=True, ax=ax)
ax.set_title('Baseline WSA')

## Run landslide scenarios
Landslide scenarios are downselected by identifying the landslides that impact a unique set of pipes.  The list is then reduced to 6 scenarios to simplify the tutorial.

In [None]:
# Down select landslide regions that impact a unique set of pipes
duplicated_intersections = landslide_regions['intersections'].astype(str).duplicated()
landslide_scenarios = landslide_regions.loc[~duplicated_intersections, :]
landslide_scenarios = landslide_scenarios.sort_values('n', ascending=False)

landslide_scenarios.head()
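
The cell above compares the string form of each intersection list, so two landslides count as duplicates only when their pipes are listed in the same order. The sketch below shows an order-insensitive variant that keys on a `frozenset` of pipe names. The landslide IDs and pipe names are hypothetical, and unlike the cell above this version also drops landslides that intersect no pipes.

```python
# Hypothetical landslide IDs mapped to the pipes they intersect
intersections = {
    101: ["P1", "P2"],
    102: ["P2", "P1"],  # same pipes as 101, different order
    103: ["P3"],
    104: [],            # intersects no pipes
}

seen = set()
unique_scenarios = {}
for landslide_id, pipes in intersections.items():
    key = frozenset(pipes)
    if not key or key in seen:
        continue  # skip empty sets and pipe sets already covered
    seen.add(key)
    unique_scenarios[landslide_id] = sorted(pipes)

print(unique_scenarios)
```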

In [None]:
# Further down select the landslide scenarios to a small set for demonstration purposes. Comment out the following line to run a full analysis.
landslide_scenarios = landslide_scenarios.loc[[6980, 7003, 7202, 7028, 6966, 7058], :]

landslide_scenarios

In [None]:
# Plot the location of landslide regions used in the analysis
ax = landslide_scenarios.plot(color='blue')
wn_gis.pipes.plot(color='black', alpha=0.5, ax=ax)
ax.set_title('Landslide scenarios')

In [None]:
# Run hydraulic simulations and extract water service availability for each landslide scenario.  
# Print the landslide number, the number of pipes that intersect the landslide, and the average WSA
results = {}
for i, scenario in landslide_scenarios.iterrows():
    wn = model_setup(inp_file)
    for pipe_i in scenario['intersections']:
        pipe_object = wn.get_link(pipe_i)
        pipe_object.initial_status = 'CLOSED'
    sim = wntr.sim.EpanetSimulator(wn)
    results[i] = sim.run_sim()
    
    # Compute WSA
    demand = results[i].node['demand'].loc[:,wn.junction_name_list]
    wsa = wntr.metrics.water_service_availability(expected_demand.sum(axis=0), demand.sum(axis=0))
    
    # Store WSA in the junctions GeoDataFrame
    column_name = 'wsa_'+str(i)
    wn_gis.junctions[column_name] = wsa
    print(i, len(scenario['intersections']), wsa.mean())

# Analysis Results
The following section plots analysis results, including water service availability for the landslide scenarios and SVI of impacted junctions.

## Water Service Availability
Each scenario includes WSA for each junction.  Note that WSA can be greater than 1 or less than 0 due to numerical differences between expected and actual demand. For certain types of analysis, WSA should be truncated to values between 0 and 1.
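
Truncation is a simple clamp. The sketch below applies it to made-up WSA values in a plain dictionary; on the `wsa_results` DataFrame created in the next cell, pandas `DataFrame.clip(lower=0, upper=1)` should give the same result.

```python
# Hypothetical WSA values, including small numerical overshoots
wsa_values = {"J1": 1.02, "J2": 0.47, "J3": -0.01}

# Clamp each value to the range [0, 1]
wsa_clipped = {name: min(max(value, 0.0), 1.0)
               for name, value in wsa_values.items()}

print(wsa_clipped)
```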

In [None]:
# Extract and plot WSA for 6 scenarios. 
column_names = ['wsa_'+str(i) for i in landslide_scenarios.iloc[0:6,:].index]
wsa_results = wn_gis.junctions[column_names]

ax = wsa_results.plot()
ax.set_ylim(-0.1, 1.1)

In [None]:
# Plot WSA for each scenario
fig, axes = plt.subplots(2,3, figsize=(15,10))
axes = axes.flatten()
axes_counter = 0

for i, scenario in landslide_scenarios.iterrows():
    wsa_column_name = 'wsa_'+str(i)
    ax = axes[axes_counter]
    ax = wn_gis.pipes.plot(color='black', alpha=0.5, ax=ax)
    wn_gis.junctions.plot(column=wsa_column_name, cmap='RdYlGn', vmin=0, vmax=1, legend=True, ax=ax)
    ax = landslide_scenarios.loc[[i],:].boundary.plot(color='blue', ax=ax)
    ax.set_title('Landslide ' + str(i))
    axes_counter = axes_counter + 1

## SVI of impacted junctions
In this analysis, impacted junctions are defined as junctions where WSA falls below 0.5 (less than 50% of the expected water was delivered) in at least one landslide scenario. Other criteria could also be used to define impact.

In [None]:
# Identify and print junctions that have WSA < 0.5 in any scenario
impacted = (wsa_results < 0.5).any(axis=1)
impacted_junctions = impacted[impacted].index

impacted_junctions

In [None]:
# Plot the SVI of impacted junctions
ax = wn_gis.pipes.plot(color='black', alpha=0.5)
wn_gis.junctions.loc[impacted_junctions,:].plot(column='RPL_THEMES', cmap='RdYlGn_r',vmin=0, vmax=1, legend=True, ax=ax)
ax.set_title('SVI of impacted junctions')
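
The map can be complemented by a single summary number, such as the mean SVI of the impacted junctions. The sketch below shows the selection and the average in plain Python; the junction names, WSA values, and SVI values are hypothetical, and the 0.5 threshold matches the criterion used above.

```python
# Hypothetical WSA per junction for two scenarios, and SVI per junction
wsa_by_scenario = {
    "wsa_1": {"J1": 1.0, "J2": 0.3, "J3": 0.9},
    "wsa_2": {"J1": 0.4, "J2": 0.8, "J3": 1.0},
}
svi = {"J1": 0.25, "J2": 0.75, "J3": 0.5}
threshold = 0.5

# A junction is impacted if WSA is below the threshold in any scenario
impacted = sorted(
    name for name in svi
    if any(scenario[name] < threshold for scenario in wsa_by_scenario.values())
)
mean_svi = sum(svi[name] for name in impacted) / len(impacted)

print(impacted, mean_svi)
```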

## Save analysis results to GIS files
Results that are added to the `wn_gis` object can be saved to GIS formatted files.  Note that lists (such as the information stored in 'intersections') are not JSON serializable and must first be removed.  The resulting GIS files contain WSA per scenario and can be loaded into GIS platforms for further analysis.

In [None]:
# Remove the list-valued column before writing the GeoJSON files
del wn_gis.pipes['intersections']
wn_gis.write_geojson('ky10_analysis_results')
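
Deleting the column discards the record of which landslides intersect each pipe. If that information is needed downstream, one alternative is to flatten each list into a delimited string before writing. The sketch below shows the idea on plain dictionaries with made-up pipe records; applying the same conversion to the `intersections` column of a GeoDataFrame is an assumption that is not tested here.

```python
# Hypothetical pipe records with a list-valued field
rows = [
    {"name": "P1", "intersections": [6980, 7003], "n": 2},
    {"name": "P2", "intersections": [], "n": 0},
]

# Replace each list with a comma-separated string
flattened = [
    {**row, "intersections": ",".join(str(i) for i in row["intersections"])}
    for row in rows
]

# The original IDs can be recovered by splitting the string
recovered = [int(i) for i in flattened[0]["intersections"].split(",")]

print(flattened)
```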