# WNTR Model Development Tutorial

WNTR includes capabilities to help build water distribution system models from geospatial data files (e.g., GeoJSON files and shapefiles). The following tutorial illustrates how to generate water distribution system models from perfect and imperfect geospatial datasets. A perfect dataset represents high-quality utility data that can be used to generate a model without additional modification. An imperfect dataset represents utility data that requires modifications before it can be used to generate a model.

The tutorial uses the ky4 water distribution network model downloaded from the [UKnowledge Water Distribution Systems Research Database](https://uknowledge.uky.edu/wdsrd/). This model is used to create "perfect" geospatial data which accurately reflects junctions, tanks, reservoirs, pipes, and pumps in the original model. The "imperfect" geospatial data was generated by truncating, skewing, and omitting certain aspects of the perfect datasets.

Note that additional attributes not contained in the geospatial data (e.g., controls, patterns, simulation options) are added directly to the model to replicate the original conditions in the ky4 model.

The following tutorial contains three WaterNetworkModels:
- wn0 is the base model built from the original INP file
- wn1 is a model built from perfect geospatial data
- wn2 is a model built from imperfect geospatial data

## Imports
Import WNTR and the additional Python packages needed for the tutorial:
- Geopandas is used to load geospatial data
- NetworkX is used to compute distances on the network
- Shapely is used to adjust network geometry
- Matplotlib is used to create subplots

In [None]:
## Imports
import geopandas as gpd
import networkx as nx
from shapely import LineString
import matplotlib.pyplot as plt
import wntr

## Coordinate reference system
Define the coordinate reference system (CRS) of the geospatial data. The geospatial functions used later in the tutorial to connect lines and snap geospatial data (`wntr.gis.connect_lines` and `wntr.gis.snap`) take a distance threshold expressed in the units of the CRS (ft for the CRS used here).

In [None]:
crs = "EPSG:3547"  # ft

In [None]:
# The following defines coordinates used to zoom in on network graphics
zoom_coords = [(4978500, 4982000), (3903000, 3905500)]

# Create a base model from the INP file (wn0)
The following section creates a `WaterNetworkModel` object from an INP file.

In [None]:
wn0 = wntr.network.WaterNetworkModel("../networks/ky4.inp")

## Run a hydraulic simulation and compute metrics
Compute pressure and average expected demand for use in later comparisons with wn1 and wn2. Pressure is extracted at the first reported time step (time 0), and negative pressures are set to 0, since negative pressures do not have a physical meaning.

In [None]:
# Run simulation and extract pressure and expected demand
sim = wntr.sim.EpanetSimulator(wn0)
results0 = sim.run_sim()
pressure0 = results0.node["pressure"].loc[0, :]
pressure0[pressure0<0] = 0 # remove negative pressure
aed0 = wntr.metrics.average_expected_demand(wn0)

In [None]:
# Plot metrics
fig, axes = plt.subplots(1,2, figsize=(12,3.5))
ax = wntr.graphics.plot_network(wn0, node_attribute=aed0, node_size=30, title="wn0 Average Expected Demand", show_plot=False, ax=axes[0])
ax = wntr.graphics.plot_network(wn0, node_attribute=pressure0, node_size=30, title="wn0 Pressure", show_plot=False, ax=axes[1])

## Create perfect geospatial data
The `write_geojson` function is used to export the water network model to GeoJSON files, which serve as the perfect geospatial data.

In [None]:
wntr.network.io.write_geojson(wn0, "../data/ky4", crs=crs)

# Create a model from perfect geospatial data (wn1)
The following section creates a `WaterNetworkModel` object from perfect geospatial data. Information not included in the geospatial data (e.g., controls, patterns, initial status, simulation options) is then added to the model.

## Load perfect geospatial data

GeoJSON files are loaded into WNTR using the `read_geojson` function. The GeoJSON files contain complete attributes for junctions, tanks, reservoirs, pipes, and pumps.  

In [None]:
geojson_files = {
    "junctions": "../data/ky4_junctions.geojson",
    "tanks": "../data/ky4_tanks.geojson",
    "reservoirs": "../data/ky4_reservoirs.geojson",
    "pipes": "../data/ky4_pipes.geojson",
    "pumps": "../data/ky4_pumps.geojson",
}

wn1 = wntr.network.read_geojson(geojson_files)
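The dictionary keys name the model components, and each file name in this tutorial follows a `<prefix>_<component>.geojson` pattern, where the prefix is the one passed to `write_geojson` above. A small sketch that builds the same dictionary from the prefix; `geojson_paths` is a hypothetical helper, not part of WNTR.

```python
def geojson_paths(prefix, components=("junctions", "tanks", "reservoirs", "pipes", "pumps")):
    """Map each component to '<prefix>_<component>.geojson' (the naming used in this tutorial)."""
    return {component: f"{prefix}_{component}.geojson" for component in components}

files = geojson_paths("../data/ky4")
print(files["pipes"])  # ../data/ky4_pipes.geojson
```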

## Add controls
Controls are added to the model using the string format from EPANET, with values in SI units.

In [None]:
line = "LINK ~@Pump-1 OPEN IF NODE T-3 BELOW  27.6606"  # 90.75 ft
wn1.add_control("Pump1_open", line)

line = "LINK ~@Pump-1 CLOSED IF NODE T-3 ABOVE  32.2326"  # 105.75 ft
wn1.add_control("Pump1_closed", line)
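The control thresholds above are tank levels in metres, with the equivalent level in feet given in the trailing comments. A minimal sketch, assuming the international foot (0.3048 m), of generating such control lines from values in feet so that the conversion is explicit; `make_level_control` is a hypothetical helper, not a WNTR function.

```python
FT_TO_M = 0.3048  # international foot (assumption)

def make_level_control(link, status, tank, comparison, level_ft):
    """Build an EPANET-style simple control line, converting the tank level from ft to m."""
    level_m = level_ft * FT_TO_M
    return f"LINK {link} {status} IF NODE {tank} {comparison} {level_m:.4f}"

open_line = make_level_control("~@Pump-1", "OPEN", "T-3", "BELOW", 90.75)
closed_line = make_level_control("~@Pump-1", "CLOSED", "T-3", "ABOVE", 105.75)
print(open_line)    # LINK ~@Pump-1 OPEN IF NODE T-3 BELOW 27.6606
print(closed_line)  # LINK ~@Pump-1 CLOSED IF NODE T-3 ABOVE 32.2326
```

The resulting strings match the hard-coded lines above (apart from whitespace) and could be passed to `add_control` in the same way.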

## Add a demand pattern
Demand patterns are added to the model using multipliers and the default pattern name.

In [None]:
multipliers = [
    0.33, 0.25, 0.209, 0.209, 0.259, 0.36,
    0.529, 0.91, 1.2, 1.299, 1.34, 1.34,
    1.32, 1.269, 1.25, 1.25, 1.279, 1.37,
    1.519, 1.7, 1.75, 1.669, 0.899, 0.479,
]
default_pattern_name = wn1.options.hydraulic.pattern
wn1.add_pattern(default_pattern_name, multipliers)
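Average expected demand, used throughout this tutorial, combines each junction's base demand with its demand pattern. A simplified sketch of the idea, assuming a single demand entry per junction, a 24-value pattern that spans exactly one day, and a demand multiplier of 1: the daily average is then the base demand times the mean pattern multiplier. This is not WNTR's `average_expected_demand` implementation, which handles the general case, and the base demand value is hypothetical.

```python
# Multipliers copied from the cell above (24 values, assumed to cover one day)
multipliers = [
    0.33, 0.25, 0.209, 0.209, 0.259, 0.36,
    0.529, 0.91, 1.2, 1.299, 1.34, 1.34,
    1.32, 1.269, 1.25, 1.25, 1.279, 1.37,
    1.519, 1.7, 1.75, 1.669, 0.899, 0.479,
]

def simple_average_expected_demand(base_demand, multipliers, demand_multiplier=1.0):
    """Daily average demand for ONE demand entry whose pattern spans exactly one day."""
    return base_demand * demand_multiplier * sum(multipliers) / len(multipliers)

mean_multiplier = sum(multipliers) / len(multipliers)
print(round(mean_multiplier, 5))  # 0.99954
aed_example = simple_average_expected_demand(0.001, multipliers)  # hypothetical base demand (m³/s)
```

Because the mean multiplier is close to 1, the average expected demand at each junction is close to its base demand for this pattern.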

## Add pump initial status
The initial status of the pump named ~@Pump-1 is set to Closed.

In [None]:
pump = wn1.get_link("~@Pump-1")
pump.initial_status = "Closed"

## Run a hydraulic simulation and compute metrics
Compute pressure and average expected demand for later comparison with the wn0 results. As before, pressure is taken at time 0 and negative pressures are set to 0.

In [None]:
sim = wntr.sim.EpanetSimulator(wn1)
results1 = sim.run_sim()

pressure1 = results1.node["pressure"].loc[0, :]
pressure1[pressure1<0] = 0 # remove negative pressure
aed1 = wntr.metrics.average_expected_demand(wn1)

## Compare the base model to the model created from perfect geospatial data
Compare the number of components and the difference in average expected demand and pressure (wn0 compared to wn1).

In [None]:
print(f"Base network attributes: {wn0.describe()}")
print(f"Perfect network attributes: {wn1.describe()}")

In [None]:
# Compute absolute difference in average expected demand and pressure
aed_diff1 = (aed0 - aed1).abs()
pressure_diff1 = (pressure0 - pressure1).abs()

In [None]:
fig, axes = plt.subplots(1,2, figsize=(12,3.5))
ax = wntr.graphics.plot_network(wn0, node_attribute=aed0, node_size=30, title="wn0 Average Expected Demand", show_plot=False, ax=axes[0])
ax = wntr.graphics.plot_network(wn0, node_attribute=pressure0, node_size=30, title="wn0 Pressure", show_plot=False, ax=axes[1])

fig, axes = plt.subplots(1,2, figsize=(12,3.5))
ax = wntr.graphics.plot_network(wn1, node_attribute=aed1, node_size=30, title="wn1 Average Expected Demand", show_plot=False, ax=axes[0])
ax = wntr.graphics.plot_network(wn1, node_attribute=pressure1, node_size=30, title="wn1 Pressure", show_plot=False, ax=axes[1])

fig, axes = plt.subplots(1,2, figsize=(12,3.5))
ax = wntr.graphics.plot_network(wn1, node_attribute=aed_diff1, node_size=30, title="Difference in Average Expected Demand", show_plot=False, ax=axes[0])
ax = wntr.graphics.plot_network(wn1, node_attribute=pressure_diff1, node_size=30, title="Difference in Pressure", show_plot=False, ax=axes[1])

In [None]:
# Check that the demand and pressure differences between networks are small (< 1e-3)
print(f"Average absolute difference in average expected demand: {aed_diff1.mean()}")
print(f"Average absolute difference in pressure: {pressure_diff1.mean()}")
assert (aed_diff1.mean() < 1e-3), "Average expected demand difference is greater than tolerance"
assert (pressure_diff1.mean() < 1e-3), "Pressure difference is greater than tolerance"

# Create a model from imperfect geospatial data (wn2)

The imperfect geospatial data includes the following imperfections:
- Junction data does not exist (no elevation, demand, or coordinates)
- Pipe endpoints do not align, and the pipe data does not contain start and end node names
- Pump data does not contain start and end node names

Tank and reservoir data are complete, but each tank and reservoir needs to be associated with the nearest junction.

## Refine the geospatial data

### Load imperfect geospatial data, elevation data, and building data
Elevation data can be obtained from the [USGS National Map](https://apps.nationalmap.gov/downloader/) and building data can be obtained from [OSM Buildings](https://osmbuildings.org/data/).

In [None]:
disconnected_pipes = gpd.read_file("../data/ky4_disconnected_pipes.geojson", crs=crs)
disconnected_pumps = gpd.read_file("../data/ky4_disconnected_pumps.geojson", crs=crs)
tanks = gpd.read_file("../data/ky4_tanks.geojson", crs=crs)
reservoirs = gpd.read_file("../data/ky4_reservoirs.geojson", crs=crs)

disconnected_pipes.set_index("name", inplace=True)
disconnected_pumps.set_index("name", inplace=True)
tanks.set_index("name", inplace=True)
reservoirs.set_index("name", inplace=True)

In [None]:
# Additional datasets include elevation data and building data
elevation_data_file = '../data/ky4_elevation.tif' 

buildings = gpd.read_file("../data/ky4_buildings.geojson", crs=crs)
buildings.to_crs(crs, inplace=True)

In [None]:
fig, ax = plt.subplots(figsize=(12,5))
disconnected_pipes.plot(color="b", label='disconnected pipes', ax=ax)
buildings.plot(label='buildings', ax=ax)
ax.legend()
tmp = ax.set_xlim(zoom_coords[0])
tmp = ax.set_ylim(zoom_coords[1])
# Note that this plot creates a UserWarning regarding the legend, which will not show polygons.
# This is a known limitation of geopandas/matplotlib.

### Rename columns
Rename columns whose names do not conform to the WNTR naming convention. In this case, 'cv' is changed to 'check_valve'. Use `valid_gis_names` to print a list of valid column names.

In [None]:
disconnected_pipes.rename(columns={'cv':'check_valve'}, inplace=True)

In [None]:
print(wntr.network.io.valid_gis_names())
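Renaming is a mapping from dataset-specific column names to WNTR names. A sketch of planning the renames and flagging columns that are still unrecognized afterwards, assuming a hypothetical alias table and an illustrative subset of pipe column names (taken from the attributes used elsewhere in this tutorial); the authoritative list comes from `valid_gis_names()`.

```python
# Illustrative subset only; use wntr.network.io.valid_gis_names() for the real list
VALID_PIPE_COLUMNS = {
    "name", "start_node_name", "end_node_name", "length", "diameter",
    "roughness", "minor_loss", "initial_status", "check_valve", "geometry",
}
ALIASES = {"cv": "check_valve"}  # hypothetical dataset-specific column names

def plan_renames(columns, aliases, valid):
    """Return (rename mapping, column names still not recognized after renaming)."""
    renames = {col: aliases[col] for col in columns if col in aliases}
    unrecognized = [renames.get(col, col) for col in columns if renames.get(col, col) not in valid]
    return renames, unrecognized

# 'material' is a hypothetical extra column
renames, unrecognized = plan_renames(["name", "cv", "diameter", "material"], ALIASES, VALID_PIPE_COLUMNS)
print(renames)       # {'cv': 'check_valve'}
print(unrecognized)  # ['material']
```

The `renames` mapping has the same form as the one passed to `rename(columns=...)` above; unrecognized columns can then be reviewed before the model is built.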

### Connect pipes and define junctions
The `connect_lines` function is used to connect pipes within a user-specified distance threshold (the threshold is in the same units as the CRS).

In [None]:
distance_threshold = 100.0 # ft, used to connect pipes

print('Number of disconnected pipes', disconnected_pipes.shape[0])
pipes, junctions = wntr.gis.connect_lines(disconnected_pipes, distance_threshold)
print('Number of connected pipes', pipes.shape[0])
print(pipes.head())
print(junctions.head())

fig, ax = plt.subplots(figsize=(12,5))
disconnected_pipes.plot(color="b", linewidth=4, alpha=0.5, label='disconnected pipes', ax=ax)
pipes.plot(color="r", linewidth=2, alpha=0.5, label='connected pipes', ax=ax)
junctions.plot(color="k", label='junctions', ax=ax)
ax.legend()
tmp = ax.set_xlim(zoom_coords[0])
tmp = ax.set_ylim(zoom_coords[1])
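Conceptually, connecting lines means merging pipe endpoints that lie within the distance threshold into a shared junction. The sketch below illustrates that idea with union-find over endpoint pairs, placing each junction at the mean of its clustered endpoints. It is not WNTR's `connect_lines` algorithm, which works on GeoDataFrames and may treat cases differently; pipe names, junction names, and coordinates are hypothetical.

```python
import math
from itertools import combinations

def connect_endpoints(pipe_coords, threshold):
    """pipe_coords: name -> ((x1, y1), (x2, y2)). Returns (pipe -> (start, end) junction, junction -> (x, y))."""
    endpoints = []  # (pipe name, 0 for start / 1 for end, point)
    for name, (start, end) in pipe_coords.items():
        endpoints.append((name, 0, start))
        endpoints.append((name, 1, end))
    parent = list(range(len(endpoints)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Merge every pair of endpoints within the threshold (O(n^2), acceptable for a sketch)
    for i, j in combinations(range(len(endpoints)), 2):
        if math.dist(endpoints[i][2], endpoints[j][2]) <= threshold:
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(endpoints)):
        clusters.setdefault(find(i), []).append(i)

    junction_xy = {}
    pipe_nodes = {name: [None, None] for name in pipe_coords}
    for k, members in enumerate(sorted(clusters.values())):
        jname = f"J{k}"
        xs = [endpoints[m][2][0] for m in members]
        ys = [endpoints[m][2][1] for m in members]
        junction_xy[jname] = (sum(xs) / len(xs), sum(ys) / len(ys))
        for m in members:
            name, end_index, _ = endpoints[m]
            pipe_nodes[name][end_index] = jname
    return {name: tuple(nodes) for name, nodes in pipe_nodes.items()}, junction_xy

example_pipes = {"P1": ((0.0, 0.0), (100.0, 0.0)), "P2": ((102.0, 1.0), (200.0, 0.0))}
pipe_nodes, junction_xy = connect_endpoints(example_pipes, threshold=5.0)
print(pipe_nodes)  # {'P1': ('J0', 'J1'), 'P2': ('J1', 'J2')}
```

Too small a threshold leaves gaps (extra, disconnected junctions); too large a threshold can merge endpoints that should stay separate.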

### Check connectivity
Create a water network model with only junctions and pipes, and check whether the graph is connected. This tutorial assumes that the network is connected at this point; other datasets may still be disconnected due to missing tanks, pumps, or valves, or other inaccuracies.

In [None]:
gis_data = wntr.gis.WaterNetworkGIS({"junctions": junctions,
                                     "pipes": pipes})
wn2_temp = wntr.network.from_gis(gis_data)
G = wn2_temp.to_graph()

uG = G.to_undirected()
print(nx.is_connected(uG))
print(nx.number_connected_components(uG))

assert nx.is_connected(uG)
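`nx.is_connected` and `nx.number_connected_components` answer this question on the undirected graph. For intuition, a dependency-free sketch that counts connected components with breadth-first search; the junction names and edge list are hypothetical.

```python
from collections import defaultdict, deque

def count_components(nodes, edges):
    """Count connected components of an undirected graph given as node and edge lists."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen = set()
    count = 0
    for start in nodes:
        if start in seen:
            continue
        count += 1  # start is not reachable from any earlier component
        queue = deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
    return count

nodes = ["J1", "J2", "J3", "J4", "J5"]
edges = [("J1", "J2"), ("J2", "J3"), ("J4", "J5")]  # J4-J5 is an isolated piece
print(count_components(nodes, edges))  # 2
```

A count greater than 1 points to isolated pieces of the network, which often indicate that the connection threshold was too small in that area.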

### Assign elevation to junctions using a raster
The `sample_raster` function is used to assign an elevation to each junction.

In [None]:
junction_elevations = wntr.gis.sample_raster(junctions, elevation_data_file)
junctions["elevation"] = junction_elevations
print(junctions.head())
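Sampling a raster at a point amounts to converting the point's map coordinates into a row and column index and reading that cell. A simplified sketch, assuming a north-up grid with no rotation, an upper-left origin, square cells, and a cell-containing lookup without interpolation; the grid and its values are hypothetical. `wntr.gis.sample_raster` reads the actual raster file and should be preferred in practice.

```python
import math

def sample_grid(grid, origin_x, origin_y, cell_size, x, y, nodata=None):
    """Return the value of the grid cell containing (x, y), or nodata if the point is outside."""
    col = math.floor((x - origin_x) / cell_size)
    row = math.floor((origin_y - y) / cell_size)  # rows increase southward from the top edge
    if 0 <= row < len(grid) and 0 <= col < len(grid[0]):
        return grid[row][col]
    return nodata

# Hypothetical 2 x 3 elevation grid with its upper-left corner at (1000, 2000) and 10-unit cells
grid = [
    [101.0, 102.0, 103.0],
    [104.0, 105.0, 106.0],
]
print(sample_grid(grid, 1000, 2000, 10, 1015, 1995))  # 102.0
print(sample_grid(grid, 1000, 2000, 10, 900, 2000))   # None
```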

### Snap reservoirs and tanks to the nearest junction
The `snap` function is used to snap reservoirs and tanks to junctions within a user-specified distance threshold (the threshold is in the same units as the CRS).

In [None]:
distance_threshold = 100.0 # ft, used to connect tanks and reservoirs

snap_reservoirs = wntr.gis.snap(reservoirs, junctions, distance_threshold)
print(reservoirs.head())
print(snap_reservoirs)

snap_tanks = wntr.gis.snap(tanks, junctions, distance_threshold)
print(tanks.head())
print(snap_tanks)
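For point-to-point snapping, each tank or reservoir is matched to its nearest junction, and the match is kept only if the distance is within the threshold (in CRS units). A brute-force sketch with hypothetical names and coordinates; `wntr.gis.snap` also supports snapping points to lines (used for pumps later in this tutorial), which this sketch does not.

```python
import math

def snap_points(points, targets, threshold):
    """points/targets: name -> (x, y). Returns name -> (nearest target, distance) within threshold."""
    snapped = {}
    for name, p in points.items():
        nearest, best = None, math.inf
        for target_name, t in targets.items():
            d = math.dist(p, t)
            if d < best:
                nearest, best = target_name, d
        if nearest is not None and best <= threshold:
            snapped[name] = (nearest, best)
    return snapped

example_junctions = {"J1": (0.0, 0.0), "J2": (50.0, 0.0)}
example_points = {"T-1": (48.0, 3.0), "R-1": (500.0, 500.0)}
snapped = snap_points(example_points, example_junctions, threshold=100.0)
print(sorted(snapped))  # ['T-1']  (R-1 is farther than the threshold from every junction)
```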

### Connect reservoirs and tanks with a pipe
The `add_connector` function defined below is used to add a pipe between each reservoir/tank and the nearest junction so that they are connected to the network.

In [None]:
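def add_connector(snap_attribute, pipes):
    # Add a short pipe between each snapped reservoir/tank and its nearest junction
    pipe_crs = pipes.crs  # store the CRS, since adding rows with .loc may drop it
    for name, row in snap_attribute.iterrows():
        attributes = {'check_valve': 0, 
                      'diameter': 0.3,  # SI units, as used by WNTR
                      'initial_status': 'Open',
                      'length': 1, 
                      'minor_loss': 0,
                      'roughness': 150,
                      # degenerate line: both vertices are the snap result's point geometry
                      'geometry': LineString([row['geometry'], row['geometry']]),
                      'start_node_name': row['node'],  # nearest junction
                      'end_node_name': name}  # reservoir or tank
        pipes.loc[name+'_connector'] = attributes
        pipes.set_crs(pipe_crs, inplace=True)  # restore the CRS after adding the row
    return pipes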
    
pipes = add_connector(snap_reservoirs, pipes)
pipes = add_connector(snap_tanks, pipes)
print(pipes.tail())

### Estimate demands from building size
Estimate demand using the following steps:
1. Estimate each building's demand by distributing the total system demand in proportion to building area
2. Snap building centroids to junctions 
3. Assign a junction to each building
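Step 1 is a proportional allocation: each building receives the total system demand multiplied by its share of the total building area, so the building demands sum back to the total. A sketch with hypothetical building names, areas, and total demand.

```python
def allocate_by_area(areas, total_demand):
    """Split total_demand among buildings in proportion to their footprint areas."""
    total_area = sum(areas.values())
    return {name: area / total_area * total_demand for name, area in areas.items()}

areas = {"B1": 200.0, "B2": 300.0, "B3": 500.0}  # hypothetical footprint areas (CRS units squared)
demands = allocate_by_area(areas, total_demand=0.01)  # hypothetical total demand (m³/s)
print(round(demands["B3"], 6))  # 0.005
```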

In [None]:
# Assume that total network demand is known
total_demand = aed0.sum()

In [None]:
# Proportionally distribute total demand to buildings by area
buildings["area"] = buildings.area
total_building_area = buildings["area"].sum()
buildings["base_demand"] = (buildings["area"] / total_building_area)*total_demand

fig, ax = plt.subplots(figsize=(12,5))
ax = buildings.plot(column='base_demand', vmin=0, vmax=0.0002, legend=True, zorder=1, ax=ax)
ax = pipes.plot(zorder=0, ax=ax)
tmp = ax.set_xlim(zoom_coords[0])
tmp = ax.set_ylim(zoom_coords[1])

In [None]:
distance_threshold = 1000.0 # ft, used to snap buildings to junctions

building_centroid = buildings.copy()
building_centroid.geometry = buildings.geometry.centroid
snap_buildings = wntr.gis.snap(building_centroid, junctions, distance_threshold)
buildings["junction"] = None
buildings.loc[snap_buildings.index, "junction"] = snap_buildings.loc[:, "node"]

print(buildings.head())
print(snap_buildings.head())

## Build the model

### Add geospatial data
Add junctions, tanks, reservoirs, and pipes. Pumps are added to the water network model later.

In [None]:
gis_data = wntr.gis.WaterNetworkGIS({"junctions": junctions,
                                     "tanks": tanks,
                                     "reservoirs": reservoirs,
                                     "pipes": pipes})
wn2 = wntr.network.from_gis(gis_data)

### Add pumps
Add pumps to the model using the following steps:
1. Snap disconnected pumps to pipes
2. Break the pipe that is closest to each pump
3. Determine pump flow direction, based on distance to the nearest reservoir
4. Add the pump to the model

In [None]:
# Snap disconnected pumps to pipes
distance_threshold = 100.0 # ft, used to snap pumps to pipes

snap_pumps = wntr.gis.snap(disconnected_pumps, pipes, distance_threshold)
print(disconnected_pumps.head())
print(snap_pumps.head())

In [None]:
# Compute the distance to the nearest reservoir (there is only 1 reservoir in ky4)
length = wn2.query_link_attribute('length')
G = wn2.to_graph(link_weight = length)
uG = G.to_undirected()
distance_to_reservoir = nx.multi_source_dijkstra_path_length(uG, wn2.reservoir_name_list, weight='weight')
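`nx.multi_source_dijkstra_path_length` returns, for every node, the shortest pipe-length distance to any of the source nodes (here, the reservoirs). A heap-based sketch of the same computation on a hypothetical undirected network, plus the orientation rule used for each pump: water is assumed to flow from the side nearer a reservoir to the side farther away. That rule is the tutorial's heuristic and may not hold in every network; node and pump-side names are hypothetical.

```python
import heapq
from collections import defaultdict

def multi_source_distances(edges, sources):
    """Shortest weighted distance from any source to every reachable node (undirected)."""
    adjacency = defaultdict(list)
    for u, v, length in edges:
        adjacency[u].append((v, length))
        adjacency[v].append((u, length))
    dist = {s: 0.0 for s in sources}
    heap = [(0.0, s) for s in sources]
    heapq.heapify(heap)
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale entry: a shorter path was already found
        for neighbor, length in adjacency[node]:
            candidate = d + length
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist

def orient_pump(start_side, end_side, dist_start, dist_end):
    """Return (pump start, pump end) so the pump moves water away from the reservoir."""
    if dist_start <= dist_end:
        return start_side, end_side
    return end_side, start_side

# Hypothetical network: (node, node, pipe length)
edges = [("R1", "J1", 10.0), ("J1", "J2", 100.0), ("J2", "J3", 50.0), ("R1", "J3", 500.0)]
distance = multi_source_distances(edges, ["R1"])
print(distance["J3"])  # 160.0 (via J1 and J2, shorter than the direct 500.0 pipe)
print(orient_pump("P1A", "P1B", distance["J2"], distance["J1"]))  # ('P1B', 'P1A')
```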

In [None]:
# Break pipes and update the pumps dataframe
pumps = disconnected_pumps.copy()
for pump_name in disconnected_pumps.index:
    nearest_pipe = snap_pumps.loc[pump_name, 'link']
    pipe = wn2.get_link(nearest_pipe)
    distanceA = distance_to_reservoir[pipe.start_node_name]
    distanceB = distance_to_reservoir[pipe.end_node_name]
    # break_pipe (default add_pipe_at_end=True) attaches the first new junction to the
    # original pipe's start-node side and the second to its end-node side
    junction_A = pump_name+'A'
    junction_B = pump_name+'B'
    wn2 = wntr.morph.break_pipe(wn2, nearest_pipe, nearest_pipe+'_pump_connector', junction_A, junction_B)
    # Orient the pump away from the reservoir: from the closer side to the farther side
    if distanceA <= distanceB:
        start_node_name, end_node_name = junction_A, junction_B
    else:
        start_node_name, end_node_name = junction_B, junction_A
    # Assign per pump (assigning the whole column would overwrite earlier pumps)
    pumps.loc[pump_name, 'start_node_name'] = start_node_name
    pumps.loc[pump_name, 'end_node_name'] = end_node_name

In [None]:
# Add pumps (note that this could be done in the loop above with add_pump)
gis_data = wntr.gis.WaterNetworkGIS({"pumps": pumps})
wn2 = wntr.network.from_gis(gis_data, append=wn2)

### Add controls
Controls are added to the model using the string format from EPANET, with values in SI units.

In [None]:
line = "LINK ~@Pump-1 OPEN IF NODE T-3 BELOW  27.6606"  # 90.75 ft
wn2.add_control("Pump1_open", line)

line = "LINK ~@Pump-1 CLOSED IF NODE T-3 ABOVE  32.2326"  # 105.75 ft
wn2.add_control("Pump1_closed", line)

### Add demand pattern and base demand
Demand patterns are added to the model using multipliers and the default pattern name.  The base demand was estimated from building footprint (above).

In [None]:
multipliers = [
    0.33, 0.25, 0.209, 0.209, 0.259, 0.36,
    0.529, 0.91, 1.2, 1.299, 1.34, 1.34,
    1.32, 1.269, 1.25, 1.25, 1.279, 1.37,
    1.519, 1.7, 1.75, 1.669, 0.899, 0.479,
]
default_pattern_name = wn2.options.hydraulic.pattern
wn2.add_pattern(default_pattern_name, multipliers)

# Add building demands to snapped junction
category = None
for i, row in buildings.iterrows():
    junction_name = buildings.loc[i, "junction"]
    if junction_name is None:
        continue
    base_demand = buildings.loc[i, "base_demand"]
    junction = wn2.get_node(junction_name)
    junction.demand_timeseries_list.append((base_demand, default_pattern_name, category))
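Appending one demand entry per building means that a junction serving several buildings carries several base demands, whose sum is the junction's total base demand. The equivalent aggregation in plain Python, with hypothetical building-to-junction assignments, where `None` marks a building that was not snapped within the threshold and is skipped, as in the loop above.

```python
from collections import defaultdict

def junction_base_demands(building_demands, building_junction):
    """Sum building demands per assigned junction, skipping unassigned buildings."""
    totals = defaultdict(float)
    for building, demand in building_demands.items():
        junction = building_junction.get(building)
        if junction is None:  # not snapped within the distance threshold
            continue
        totals[junction] += demand
    return dict(totals)

building_demands = {"B1": 0.002, "B2": 0.003, "B3": 0.005}  # hypothetical values (m³/s)
building_junction = {"B1": "J1", "B2": "J1", "B3": None}
print(junction_base_demands(building_demands, building_junction))
```

Demand from skipped buildings is dropped, so the model's total demand can be lower than `total_demand` when some buildings fall outside the snap threshold.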

### Add pump initial status
The initial status of the pump named ~@Pump-1 is set to Closed.

In [None]:
pump = wn2.get_link("~@Pump-1")
pump.initial_status = "Closed"

## Run a hydraulic simulation and compute metrics
Compute pressure and average expected demand for later comparison with the wn0 results. As before, pressure is taken at time 0 and negative pressures are set to 0.

In [None]:
sim = wntr.sim.EpanetSimulator(wn2)
results2 = sim.run_sim()
pressure2 = results2.node["pressure"].loc[0, :]
pressure2[pressure2<0] = 0 # remove negative pressure
aed2 = wntr.metrics.average_expected_demand(wn2)

In [None]:
# Plot data and water network model
fig, ax = plt.subplots(figsize=(12,5))
ax = buildings.plot(column='base_demand', vmin=0, vmax=0.0002, zorder=1, legend=True, legend_kwds={"label":"Demand"}, ax=ax)
ax = disconnected_pipes.plot(zorder=0, ax=ax)
ax.set_xticks([])
ax.set_yticks([])
tmp = ax.set_xlim(zoom_coords[0])
tmp = ax.set_ylim(zoom_coords[1])
tmp = ax.set_title('Imperfect geospatial data\nDisconnected pipes and building demand estimated from area')

fig, ax = plt.subplots(figsize=(12,5))
junctions['aed2'] = aed2
ax = junctions.plot(column='aed2', vmin=0, vmax=0.0002, zorder=1, legend=True, legend_kwds={"label":"Demand"}, ax=ax)
ax = pipes.plot(zorder=0, ax=ax)
ax.set_xticks([])
ax.set_yticks([])
tmp = ax.set_xlim(zoom_coords[0])
tmp = ax.set_ylim(zoom_coords[1])
tmp = ax.set_title('Water network model\nConnected pipes and junction demands')

## Compare the base model to the model created from imperfect geospatial data
Compare the number of components and the difference in average expected demand and pressure (wn0 compared to wn2).

Note that direct node-by-node or link-by-link comparisons between wn0 and wn2 are not possible because the models do not share the same node and link names. Instead, the absolute difference of the means is used rather than the mean of the absolute differences.
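The difference of means only needs one summary number per model, so node names do not have to match, but it can hide local errors because positive and negative differences cancel. A small illustration with hypothetical pressures on two models that do share node names.

```python
from statistics import mean

base = {"J1": 40.0, "J2": 50.0, "J3": 60.0}   # hypothetical pressures
other = {"J1": 50.0, "J2": 50.0, "J3": 50.0}

diff_of_means = abs(mean(base.values()) - mean(other.values()))  # works without matching names
mean_of_diffs = mean(abs(base[n] - other[n]) for n in base)        # requires matching names
print(diff_of_means)  # 0.0
print(mean_of_diffs)
```

Here the difference of means is zero even though two of the three nodes differ by 10, so a small difference of means is a necessary, not a sufficient, sign of agreement.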

In [None]:
print(f"Base network attributes: {wn0.describe()}")
print(f"Imperfect network attributes: {wn2.describe()}")

In [None]:
# Compute absolute difference in mean average expected demand and mean pressure
aed_diff2 = abs(aed0.mean() - aed2.mean())
pressure_diff2 = abs(pressure0.mean() - pressure2.mean())

In [None]:
fig, axes = plt.subplots(1,2, figsize=(12,3.5))
ax = wntr.graphics.plot_network(wn0, node_attribute=aed0, node_size=30, title="wn0 Average Expected Demand", show_plot=False, ax=axes[0])
ax = wntr.graphics.plot_network(wn0, node_attribute=pressure0, node_size=30, title="wn0 Pressure", show_plot=False, ax=axes[1])

fig, axes = plt.subplots(1,2, figsize=(12,3.5))
ax = wntr.graphics.plot_network(wn2, node_attribute=aed2, node_size=30, title="wn2 Average Expected Demand", show_plot=False, ax=axes[0])
ax = wntr.graphics.plot_network(wn2, node_attribute=pressure2, node_size=30, title="wn2 Pressure", show_plot=False, ax=axes[1])

In [None]:
# Check that the demand and pressure differences between networks are small (the pressure tolerance is looser than in the wn0/wn1 comparison)
print(f"Absolute difference in mean average expected demand: {aed_diff2}")
print(f"Absolute difference in mean pressure: {pressure_diff2}")
assert (aed_diff2 < 1e-5), "Average expected demand difference is greater than tolerance"
assert (pressure_diff2 < 0.05), "Pressure difference is greater than tolerance"

## Troubleshooting
This notebook shows how to work with a hypothetical imperfect dataset to create a water network model; however, other datasets may require different settings or approaches.
- The ideal snap thresholds for connecting the different geospatial datasets will likely differ in other cases. Keep in mind that snap thresholds are expressed in the units of the CRS of the geospatial data. In some difficult cases, it may be useful to take an iterative approach using multiple snap thresholds.
- GeoJSON files for water network models need to have column names that are compatible with WNTR. Details can be found in the [documentation](https://usepa.github.io/WNTR/model_io.html#geojson-files).
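One way to apply the iterative approach is to try thresholds from smallest to largest and record the threshold at which each feature was first matched, so matches that needed a large threshold can be reviewed manually. A sketch under that assumption, using a brute-force nearest-neighbour search; names, coordinates, and thresholds are hypothetical and in CRS units.

```python
import math

def snap_with_thresholds(points, targets, thresholds):
    """Return name -> (nearest target, distance, threshold that accepted it); unmatched names are omitted."""
    result = {}
    for threshold in sorted(thresholds):
        for name, p in points.items():
            if name in result:
                continue  # already matched at a smaller threshold
            target_name, t = min(targets.items(), key=lambda item: math.dist(p, item[1]))
            d = math.dist(p, t)
            if d <= threshold:
                result[name] = (target_name, d, threshold)
    return result

targets = {"J1": (0.0, 0.0), "J2": (1000.0, 0.0)}
points = {"A": (5.0, 0.0), "B": (1000.0, 150.0), "C": (5000.0, 5000.0)}
matches = snap_with_thresholds(points, targets, [10, 100, 1000])
print(matches["A"])  # ('J1', 5.0, 10)
print(matches["B"])  # ('J2', 150.0, 1000)
```

Feature C stays unmatched at every threshold, which flags it for manual inspection rather than forcing a distant match.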