# PyFlowline Tutorial

Welcome to the PyFlowline tutorial notebook! 👋

This tutorial serves as an example of the PyFlowline application using a DGGRID (Discrete Global Grid) mesh.

For additional information on this application and the DGGRID mesh, please refer to the following publication:

Liao, C., Engwirda, D., Cooper, M., Li, M., and Fang, Y.: Discrete Global Grid System-based Flow Routing Datasets in the Amazon and Yukon Basins, Earth Syst. Sci. Data Discuss. [preprint], https://doi.org/10.5194/essd-2023-398, in review, 2024.

If you are running this notebook directly from the Binder platform, then all the dependencies are already installed. Otherwise, you must install the PyFlowline package and its dependencies (and/or update your existing installation/environment). Additionally, visualization requires optional dependency packages (refer to the full documentation installation section).

Feel free to modify the notebook to use a different visualization method as needed. 

Enjoy exploring PyFlowline!

---

## 1. Preliminaries

First, let's load some Python libraries.

In [None]:
import os
import json
import shutil
from pathlib import Path
from os.path import realpath
import importlib.util
from shutil import copy2
from datetime import datetime
import geopandas as gpd
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches

### Check the pyflowline installation.

In [None]:
if importlib.util.find_spec("pyflowline") is not None:
    print('The pyflowline package is installed.')
else:
    # Raise instead of calling exit(), which can terminate the notebook kernel.
    raise ImportError('The pyflowline package is not installed. Please install it first.')
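
The same `importlib.util.find_spec` check can be applied to several packages at once. The following is a minimal sketch; the helper name `check_packages` and the list of package names are illustrative assumptions, not part of pyflowline.

```python
import importlib.util


def check_packages(names):
    """Return a dict mapping each top-level package name to True if it is importable."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# The names below are illustrative; adjust them to your environment.
status = check_packages(["pyflowline", "geopandas", "matplotlib"])
for name, found in status.items():
    print(f"{name}: {'installed' if found else 'missing'}")
```

Reporting every missing package in one pass avoids fixing them one at a time.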

### Add dggrid into the system path.

In [None]:
# If running locally, replace this with the path to the folder containing the dggrid binary
sPath_dggrid_bin = os.pathsep + "/home/jovyan/"
os.environ["PATH"] += sPath_dggrid_bin
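
To confirm that the binary can be found after updating `PATH`, the standard library's `shutil.which` can be used. This is a minimal sketch; it assumes the executable is named `dggrid`, which may differ on your system.

```python
import shutil


def find_executable(name):
    """Return the full path of an executable found on PATH, or None if absent."""
    return shutil.which(name)


# "dggrid" is the assumed binary name; change it if yours differs.
dggrid_path = find_executable("dggrid")
if dggrid_path is None:
    print("dggrid was not found on PATH; check the folder you appended.")
else:
    print(f"dggrid found at: {dggrid_path}")
```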

### Prepare the input/output workspace folders.

In [None]:
sPath_notebook = Path().resolve()
sPath_parent = str(sPath_notebook.parents[1])
print(f"Parent path: {sPath_parent}")

sWorkspace_data = os.path.join(sPath_parent, 'data', 'yukon')
if not os.path.exists(sWorkspace_data):
    print(sWorkspace_data)
    os.makedirs(sWorkspace_data)

sWorkspace_input = os.path.join(sWorkspace_data, 'input')
if not os.path.exists(sWorkspace_input):
    print(sWorkspace_input)
    os.makedirs(sWorkspace_input)

sWorkspace_output = os.path.join(sWorkspace_data, 'output')
if not os.path.exists(sWorkspace_output):
    print(sWorkspace_output)
    os.makedirs(sWorkspace_output)

print(f"Output path: {sWorkspace_output}")
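
The folder setup above can also be written more compactly with `pathlib`. The sketch below is one possible alternative; the helper name `make_workspace` is hypothetical, and the demonstration runs in a temporary directory rather than the real workspace.

```python
from pathlib import Path
import tempfile


def make_workspace(root, region):
    """Create <root>/data/<region>/{input,output} and return the three paths."""
    data = Path(root) / "data" / region
    paths = {"data": data, "input": data / "input", "output": data / "output"}
    for path in paths.values():
        # parents=True creates missing parents; exist_ok=True makes reruns safe.
        path.mkdir(parents=True, exist_ok=True)
    return paths


# Demonstrate in a throwaway directory rather than the real workspace.
with tempfile.TemporaryDirectory() as tmp:
    workspace = make_workspace(tmp, "yukon")
    for label, path in workspace.items():
        print(label, path.is_dir())
```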

### Create a temp folder to download the data requirements.

In [None]:
sPath_temp = os.path.join(sPath_parent, 'data', 'tmp')
if not os.path.exists(sPath_temp):
    print(sPath_temp)
    os.makedirs(sPath_temp)
else:
    shutil.rmtree(sPath_temp)

# Specify the repository's URL
hexwatershed_data_repo = 'https://github.com/changliao1025/hexwatershed_data.git'

# Clone the repository
os.system(f'git clone {hexwatershed_data_repo} {sPath_temp}')
sPath_temp_data = os.path.join(sPath_parent, 'data', 'tmp', 'data', 'yukon', 'input')

# If the destination directory already exists, remove it
if os.path.exists(sWorkspace_input):
    shutil.rmtree(sWorkspace_input)

# Copy all the files under the temp data folder using shutil
shutil.copytree(sPath_temp_data, sWorkspace_input)

shutil.rmtree(sPath_temp_data)
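
`os.system` does not raise an error if `git clone` fails, so a failed download would only surface later as a missing folder. The sketch below shows one way to make the failure explicit with `subprocess.run(..., check=True)`. The helper name `clone_repository` and the shallow-clone option (`--depth 1`) are choices made for this illustration, not part of the original workflow.

```python
import subprocess


def clone_repository(url, destination, run=subprocess.run):
    """Shallow-clone `url` into `destination`; raise if git exits with an error."""
    command = ["git", "clone", "--depth", "1", url, str(destination)]
    # check=True raises subprocess.CalledProcessError on a non-zero exit code.
    run(command, check=True)
    return command
```

In this notebook, it could be called as `clone_repository(hexwatershed_data_repo, sPath_temp)` in place of the `os.system` line.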

---
## 2. Configuration files

The pyflowline package uses json configuration files. Example configuration files are provided in the `data/` folder of this repo.

To configure a new case, pyflowline provides functions to read the configuration files and programmatically change the configuration parameters (json key values).

### Import the pyflowline package configuration functions.

In [None]:
# Load functions to read the configuration file and change the json key values.
from pyflowline.configuration.read_configuration_file import pyflowline_read_configuration_file
from pyflowline.configuration.change_json_key_value import change_json_key_value

### Set the file names for the domain configuration and basin configuration.

In [None]:
sFilename_configuration_in = realpath( os.path.join(sWorkspace_input, 'pyhexwatershed_yukon_dggrid.json') )
sFilename_basins_in = realpath( os.path.join(sWorkspace_input, 'pyflowline_yukon_basins.json') )

### Check whether the domain configuration file exists.

In [None]:
if not os.path.isfile(sFilename_configuration_in):
    print(f'The domain configuration file does not exist: {sFilename_configuration_in}')

print('Finished the data preparation step.')

### Check the contents of the json configuration file.

In [None]:
with open(sFilename_configuration_in, 'r') as pJSON:
    parsed = json.load(pJSON)
    print(json.dumps(parsed, indent=4))
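
When the printed configuration is long, grouping the keys can make it easier to scan. The sketch below assumes the keys follow a one-letter prefix convention, as the names used in this notebook suggest (for example `sFilename_...` for strings and `iFlag_...` for integers). The dictionary `example` is made up for illustration; in the notebook, `parsed` could be passed instead.

```python
from collections import defaultdict


def group_keys_by_prefix(config):
    """Group the keys of a dict by their first character, sorted within each group."""
    groups = defaultdict(list)
    for key in config:
        groups[key[:1]].append(key)
    return {prefix: sorted(keys) for prefix, keys in sorted(groups.items())}


# Hypothetical configuration content used only for this demonstration.
example = {
    "sWorkspace_output": "/tmp/output",
    "sFilename_mesh_boundary": "boundary.geojson",
    "iFlag_user_provided_binary": 0,
    "iResolution_index": 10,
}
for prefix, keys in group_keys_by_prefix(example).items():
    print(prefix, keys)
```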

The meanings of these json keywords are explained in the [pyflowline documentation](https://pyflowline.readthedocs.io/en/latest/data/data.html#inputs).

---
## 3. Configure a new case: Yukon River Basin with dggrid mesh.

The pyflowline package uses an object-oriented programming (OOP) approach to manage each simulation. A flowline object, called a `flowlinecase`, is created by reading the model configuration file (also referred to as the "domain" or "parent" configuration file).

The first step to setting up a new `flowlinecase` is to configure the pyflowline simulation. This can be done by directly editing the json configuration files, or programmatically. Below we demonstrate several ways to achieve this programmatically.

### Create copies of the configuration files.

For this example, instead of editing the template configuration files directly (which overwrites them), we will make copies and edit them.

In [None]:
# Copy the configuration file to the output directory.
sFilename_configuration_copy = os.path.join(sWorkspace_output, 'pyflowline_configuration_copy.json')
copy2(sFilename_configuration_in, sFilename_configuration_copy)

# Also copy the basin configuration file to the output directory.
sFilename_basins_configuration_copy = os.path.join(sWorkspace_output, 'pyflowline_configuration_basins_copy.json')
copy2(sFilename_basins_in, sFilename_basins_configuration_copy)

### Change configuration file parameters.

Now we will update a few parameters in the configuration files. It is often convenient (and sometimes required) to set file paths first, either directly in a text editor or with the `change_json_key_value` function, which modifies the json files on disk. Parameters for a specific case can then be updated programmatically using keyword arguments, as demonstrated in the next section. Here, we set the file paths using the `change_json_key_value` function.

Since the json file will be overwritten, you may want to make a copy of it first. Here we use the copies we created above.

In [None]:
sFilename_configuration = sFilename_configuration_copy
sFilename_basins = sFilename_basins_configuration_copy

#### Set the output folder parameter.

In [None]:
change_json_key_value(sFilename_configuration, 'sWorkspace_output', sWorkspace_output)

#### Set the basin configuration file name parameter.

In [None]:
change_json_key_value(sFilename_configuration, 'sFilename_basins', sFilename_basins)
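
Conceptually, each call above reads a json file, changes one key, and writes the file back. The sketch below illustrates that idea with the standard library only. It is not pyflowline's implementation: the helper `set_json_key` and the demo file are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def set_json_key(filename, key, value):
    """Read a JSON file, set one top-level key, and write the file back in place."""
    path = Path(filename)
    config = json.loads(path.read_text())
    config[key] = value
    path.write_text(json.dumps(config, indent=4))
    return config


# Demonstrate on a throwaway file so no real configuration is touched.
with tempfile.TemporaryDirectory() as tmp:
    demo_file = Path(tmp) / "demo_configuration.json"
    demo_file.write_text(json.dumps({"sWorkspace_output": ""}))
    set_json_key(demo_file, "sWorkspace_output", "/path/to/output")
    print(demo_file.read_text())
```

Because the file is rewritten in place, working on copies (as done here) keeps the template files unchanged.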

#### Set the mesh boundary file name (used to define the domain extent and clip the mesh).

In [None]:
sFilename_mesh_boundary = realpath(os.path.join(sWorkspace_input, 'boundary.geojson'))
change_json_key_value(sFilename_configuration, 'sFilename_mesh_boundary', sFilename_mesh_boundary)

**Note**: In Section 6 (see "Step 2: Create the mesh"), we set a flag which tells the pyflowline software to use this mesh boundary file for the domain instead of the (optional) DEM file, which isn't used in this example.

#### Set the dggrid binary path.

The dggrid binary path was added to the Binder environment path at the beginning of this notebook. 

To run in your local environment, either edit the path or use the example below to directly set the full path to the dggrid binary file using the parameter in the configuration file. Note that the `iFlag_user_provided_binary` flag must also be set if the example below is used.

In [None]:
# This is intentionally commented out to demonstrate how to set the dggrid binary path if running this notebook in your local environment.

# sFilename_dggrid = "" # set the full path to the dggrid binary file (note: filename, not parent folder) in your local environment
# change_json_key_value(sFilename_configuration, 'sFilename_dggrid', sFilename_dggrid)
# change_json_key_value(sFilename_configuration, 'iFlag_user_provided_binary', 1)

To adapt this example to your workflow, feel free to open the configuration files and directly edit the parameter value pairs, especially workspace paths, for your local setup. Depending on the type of simulation, some of the paths are ignored. Some trial and error may be required, but if you encounter errors, refer to the [pyflowline documentation](https://pyflowline.readthedocs.io) and to the [pyflowline examples](https://github.com/changliao1025/pyflowline/tree/main/examples) in the pyflowline repo.

---
## 4. Create a PyFlowline object

In the prior section, we used the `change_json_key_value` function to programmatically modify parameters (mainly file paths) in the pyflowline configuration files before setting up a new pyflowline simulation. 

Here, we use the `pyflowline_read_configuration_file` function to create a new `flowlinecase` by **reading the domain configuration file**. The function also accepts name-value arguments to set parameter values on the fly.

### Set keywords to define the case.

In [None]:
sRegion = 'yukon'
sMesh_type = 'dggrid'
sDggrid_type = 'ISEA3H'
iCase_index = 1 # an arbitrary index used to track simulations
iResolution_index = 10 # dggrid resolution index
sDate = datetime.now().strftime('%Y%m%d') # today's date
print("Today's date:", sDate)

### Get the dggrid mesh resolution.

In [None]:
from pyflowline.mesh.dggrid.create_dggrid_mesh import dggrid_find_resolution_by_index
dResolution = dggrid_find_resolution_by_index(sDggrid_type, iResolution_index)
print(f"DGGRID spatial resolution: {dResolution} m")
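
For intuition about what a resolution index means, the number of cells in an aperture-3 icosahedral hexagonal grid such as ISEA3H grows as 10 · 3^r + 2 with the index r. The sketch below derives a mean cell area from that count, assuming a spherical Earth surface area of about 5.1 × 10^8 km². The square root of the area is used here only as a rough length scale and may differ from the value returned by `dggrid_find_resolution_by_index`.

```python
import math

# Approximate surface area of a spherical Earth (assumption for this sketch).
EARTH_SURFACE_AREA_KM2 = 510_065_622


def isea3h_cell_count(resolution_index):
    """Number of cells in an aperture-3 hexagonal grid on an icosahedron."""
    return 10 * 3 ** resolution_index + 2


def isea3h_mean_cell_area_km2(resolution_index):
    """Mean cell area, assuming the cells share the Earth's surface equally."""
    return EARTH_SURFACE_AREA_KM2 / isea3h_cell_count(resolution_index)


for index in (8, 10, 12):
    area = isea3h_mean_cell_area_km2(index)
    print(f"index {index}: {isea3h_cell_count(index):,} cells, "
          f"mean area {area:,.1f} km2, sqrt(area) {math.sqrt(area):.1f} km")
```

Each step up in the index triples the number of cells, so the mean cell area shrinks by a factor of three.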

### Create a new `flowlinecase`.

The first argument to the function is the configuration file name, followed by name-value keywords that correspond to parameters in the json configuration files. 

In [None]:
oPyflowline = pyflowline_read_configuration_file(
    sFilename_configuration,
    iCase_index_in=iCase_index,
    sMesh_type_in=sMesh_type,
    iResolution_index_in=iResolution_index,
    sDate_in=sDate,
)

**Note**: The warning message above will be addressed in the next section.
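
The name-value pattern used in this section (a keyword such as `iCase_index_in` overriding the `iCase_index` value read from the file) can be sketched in plain Python. This is only an illustration of the pattern, not pyflowline's implementation; the helper `apply_overrides` and the default values are hypothetical.

```python
def apply_overrides(config, **overrides):
    """Return a copy of config with each '<key>_in' keyword applied to '<key>'."""
    updated = dict(config)
    for name, value in overrides.items():
        # Strip the trailing "_in" so the keyword maps onto the configuration key.
        key = name[:-3] if name.endswith("_in") else name
        updated[key] = value
    return updated


# Hypothetical defaults standing in for values read from a configuration file.
defaults = {"iCase_index": 0, "sMesh_type": "mpas", "iResolution_index": 8}
case = apply_overrides(
    defaults, iCase_index_in=1, sMesh_type_in="dggrid", iResolution_index_in=10
)
print(case)
```

Returning a copy leaves the values read from the file untouched, which matches the idea of setting parameters "on the fly" for one case.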

---
## 5. Change model parameters

Model parameters can be updated after creating the model object. In this section, we'll set the basin outlet location, and the path to the input flowline. Note that these parameters are for the *basin configuration*, rather than the *domain* configuration, which we were updating in the prior section.

Review the case settings before proceeding.

In [None]:
print(oPyflowline.tojson())

### Set the basin outlet location coordinates

The approximate basin outlet location is an important parameter, used by `pyflowline` as a starting point for its upstream walk. Note that this parameter is set in the basin configuration file (also referred to as the "child" configuration file).

In a typical workflow, we suggest plotting your flowline in software such as QGIS, visually identifying the outlet coordinates, and either typing them directly into the basin configuration file or updating them programmatically as shown below.

Use the `pyflowline_change_model_parameter` function to set the outlet coordinates. Note that, when updating the *basin* configuration file, set `iFlag_basin_in=1`.

In [None]:
# Set the basin outlet coordinates
dLongitude_outlet_degree = -164.47594
dLatitude_outlet_degree = 63.04269

oPyflowline.pyflowline_change_model_parameter('dLongitude_outlet_degree', dLongitude_outlet_degree, iFlag_basin_in=1)

oPyflowline.pyflowline_change_model_parameter('dLatitude_outlet_degree', dLatitude_outlet_degree, iFlag_basin_in=1) # set iFlag_basin_in=1 for basin configuration
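
A common mistake when typing coordinates by hand is swapping longitude and latitude. The sketch below is a simple range check that catches many such cases before the values are passed on. The helper name `validate_outlet` is hypothetical.

```python
def validate_outlet(longitude, latitude):
    """Return (longitude, latitude) if both are in valid degree ranges, else raise."""
    if not -180.0 <= longitude <= 180.0:
        raise ValueError(f"Longitude out of range [-180, 180]: {longitude}")
    if not -90.0 <= latitude <= 90.0:
        raise ValueError(f"Latitude out of range [-90, 90]: {latitude}")
    return longitude, latitude


# The outlet used in this tutorial passes the check.
print(validate_outlet(-164.47594, 63.04269))
```

Swapping the two values for this outlet would raise a `ValueError`, because -164.47594 is not a valid latitude.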

### Set the input flowline filename
(This is the missing file that `flowlinecase` warned about in the prior section.)

In [None]:
sFilename_flowline = realpath(os.path.join(sWorkspace_input, 'dggrid10/river_networks.geojson') )
oPyflowline.pyflowline_change_model_parameter('sFilename_flowline_filter', sFilename_flowline, iFlag_basin_in=1)

### Turn debugging off

In [None]:
oPyflowline.pyflowline_change_model_parameter('iFlag_debug', 0, iFlag_basin_in=1)

### Setting parameters for individual basins

In this example, the domain consists of a single basin. When there are multiple basins, their parameters can be viewed and set by indexing into `aBasin`, using the following syntax.

In [None]:
# Check the setting for a single basin
print(oPyflowline.aBasin[0].tojson())

# Set the flowline river length threshold
oPyflowline.aBasin[0].dThreshold_small_river = dResolution * 5
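
To illustrate what a length threshold of five cell widths means in practice, the sketch below splits a set of reaches into those at or above the threshold and those below it. The reach names, lengths, and resolution are invented for this example, and the rules pyflowline actually applies when removing small rivers may be more involved than a plain length comparison.

```python
def split_by_length(reach_lengths_m, threshold_m):
    """Split a {name: length} dict into (kept, short) dicts by a length threshold."""
    kept = {n: length for n, length in reach_lengths_m.items() if length >= threshold_m}
    short = {n: length for n, length in reach_lengths_m.items() if length < threshold_m}
    return kept, short


resolution_m = 30_000.0           # hypothetical mesh resolution in metres
threshold_m = resolution_m * 5    # same multiplier as used in the cell above
reaches = {"main_stem": 420_000.0, "tributary_a": 160_000.0, "tributary_b": 45_000.0}

kept, short = split_by_length(reaches, threshold_m)
print("kept:", sorted(kept))
print("below threshold:", sorted(short))
```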

---
## 6. Run a PyFlowline simulation

After the case object is created, we can set up the model and run each step of the pyflowline algorithm, visualizing the results as we go.

### Set up the model

In [None]:
oPyflowline.iFlag_user_provided_binary = 0 # set = 1 if setting the path to the binary
oPyflowline.pyflowline_setup()

Before running any operations, we can visualize the original or raw flowline dataset. 

In [None]:
sFilename_geojson = oPyflowline.aBasin[0].sFilename_flowline_filter_geojson
gdf = gpd.read_file(sFilename_geojson)
gdf.plot()
plt.show()

PyFlowline provides built-in visualization through PyEarth (this feature is experimental).

In [None]:
oPyflowline.plot( sVariable_in = 'flowline_filter' )

You can also inspect the files in QGIS.

The plot function accepts a few optional arguments, such as the map projection and the spatial extent. By default, the full spatial extent is plotted, but you can set the extent to zoom in on a region.
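
One simple way to define a zoomed-in extent is a box centred on a point of interest, such as the basin outlet. The sketch below computes such a box in degrees. The helper name `zoom_extent` and the ordering of the returned tuple are choices made for this illustration, so check the plot function's documentation for the format it expects.

```python
def zoom_extent(longitude, latitude, half_width_deg):
    """Return (min_lon, max_lon, min_lat, max_lat) for a box centred on a point."""
    return (
        longitude - half_width_deg,
        longitude + half_width_deg,
        max(latitude - half_width_deg, -90.0),  # clamp to the south pole
        min(latitude + half_width_deg, 90.0),   # clamp to the north pole
    )


# A 2-degree-wide box around the outlet used in this tutorial.
extent = zoom_extent(-164.47594, 63.04269, 1.0)
print(extent)
```

On a custom matplotlib plot, the same values could be applied with `ax.set_xlim(extent[0], extent[1])` and `ax.set_ylim(extent[2], extent[3])`.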

Now let's run the three major steps/operations in the pyflowline algorithm one by one.

### Step 1: Flowline simplification

In [None]:
oPyflowline.pyflowline_flowline_simplification();

In [None]:
# Visualize the result using a built-in visualization method.
oPyflowline.plot( sVariable_in = 'flowline_simplified' )

In [None]:
# Check the result using a custom plot.
sFilename_geojson = oPyflowline.aBasin[0].sFilename_flowline_simplified
gdf = gpd.read_file(sFilename_geojson)
gdf.plot()
plt.show()

### Step 2: Create the mesh

In [None]:
# Set the flag to use the provided sFilename_mesh_boundary file
oPyflowline.iFlag_mesh_boundary = 1
aCell = oPyflowline.pyflowline_mesh_generation()

In [None]:
# Visualize the mesh boundary we provided earlier.
sFilename_geojson = oPyflowline.sFilename_mesh_boundary
gdf = gpd.read_file(sFilename_geojson)
gdf.plot()
plt.show()

In [None]:
# Visualize the generated mesh using a custom plot.
sFilename_geojson = oPyflowline.sFilename_mesh
gdf = gpd.read_file(sFilename_geojson)
gdf.plot()
plt.show()

In [None]:
# Visualize the generated mesh using a built-in visualization method.
oPyflowline.plot( sVariable_in = 'mesh')

### Step 3: Create the conceptual flowline

Last, we can generate the conceptual flowline. We refer to the final flowline as "conceptual" because it has been modified relative to the input flowline, which often represents a "real" flowline. The conceptual flowline has been simplified (e.g., small reaches, loops, and braided channels removed) and adjusted to align with the mesh. These modifications ensure the final flowline is suitable for hydrological modeling, while remaining consistent with the real flowline.

In [None]:
oPyflowline.pyflowline_reconstruct_topological_relationship();

Visualize the conceptual flowline using a built-in method.

In [None]:
oPyflowline.plot( sVariable_in = 'flowline_conceptual')

Visualize the result by overlaying the flowlines on the mesh using a custom plot.

In [None]:
# Read the datasets into memory
sFilename_mesh = oPyflowline.sFilename_mesh
sFilename_input_flowline = oPyflowline.aBasin[0].sFilename_flowline_filter
sFilename_conceptual_flowline = oPyflowline.aBasin[0].sFilename_flowline_conceptual
gdf1 = gpd.read_file(sFilename_mesh)
gdf2 = gpd.read_file(sFilename_input_flowline)
gdf3 = gpd.read_file(sFilename_conceptual_flowline)

In [None]:
# Plot the input flowline, and the final conceptual flowline
fig, ax = plt.subplots()
gdf1.plot(ax=ax, facecolor='lightgrey', edgecolor='black', alpha=0.3, label='Mesh')
gdf2.plot(ax=ax, color='deepskyblue', linewidth=3, label='Input Flowline')
gdf3.plot(ax=ax, color='darkred', linewidth=1, label='Conceptual Flowline')

# handles for the legend
mesh_patch = mpatches.Patch(facecolor='lightgrey', label='Mesh', edgecolor='black', alpha=0.3)
input_line = plt.Line2D([0], [0], color='deepskyblue', label='Input Flowline')
conceptual_line = plt.Line2D([0], [0], color='darkred', label='Conceptual Flowline')

ax.legend(handles=[mesh_patch, input_line, conceptual_line], loc='lower left')
ax.set_title('Comparison of Input and Conceptual Flowlines')
ax.set_xlabel("Longitude")
ax.set_ylabel("Latitude")
ax.set_xticks([])
ax.set_yticks([])
plt.show()

The built-in visualization method also supports an overlap plot.

In [None]:
oPyflowline.plot( sVariable_in = 'overlap')

### Save the model output into a json file

In [None]:
# Export output
oPyflowline.pyflowline_export();

The contents of one of the exported json files can be checked:

In [None]:
with open(oPyflowline.sFilename_mesh_info, 'r') as pJSON:
    parsed = json.load(pJSON)
    print(json.dumps(parsed[0], indent=4))

### Congratulations! You have successfully finished a pyflowline simulation. 🎉