## Example: Working with flow direction data

In this example we illustrate some common hydrology GIS problems based on [flow direction data](https://deltares.github.io/pyflwdir/latest/flwdir.html). HydroMT uses [pyflwdir](https://deltares.github.io/pyflwdir/latest/) to work with this type of data and wraps part of its functionality to make it easier to work with [raster datasets](https://deltares.github.io/hydromt/latest/user_guide/data_types.html). However, **pyflwdir** itself offers much more functionality. An overview of all the flow direction methods in HydroMT can be found in the [Reference API](https://deltares.github.io/hydromt/latest/api.html#flow-direction-methods).

Here, we will showcase the following flow direction GIS cases:

1. Derive basin and stream geometries
2. Derive flow directions from elevation data
3. Reproject flow direction data
4. Upscale flow directions


In [None]:
# import hydromt and geopandas
import hydromt
from hydromt.log import setuplog
from hydromt.gis_utils import utm_crs
import geopandas as gpd
from pprint import pprint

First, we load some data to play with from the pre-defined artifact_data data catalog. For more information about working with data in HydroMT, see [the user guide](https://deltares.github.io/hydromt/latest/user_guide/data_overview.html). As an example we will use the [MERIT Hydro](http://hydro.iis.u-tokyo.ac.jp/~yamadai/MERIT_Hydro) dataset, which is a set of GeoTIFF files with identical grids, one for each variable of the dataset. We use the flow direction (flwdir), elevation (elevtn) and upstream area (uparea) layers.

In [None]:
# initialize a logger
logger = setuplog("flow direction data", log_level=10)
# initialize a data catalog based on the pre-defined artifact_data catalog
data_catalog = hydromt.DataCatalog(data_libs=["artifact_data=v0.0.8"], logger=logger)

# load the flow direction (flwdir), elevation (elevtn) and upstream area (uparea) layers
ds = data_catalog.get_rasterdataset(
    "merit_hydro",
    bbox=[11.7, 45.8, 12.8, 46.7],
    variables=["flwdir", "elevtn", "uparea"],
)
ds

### Derive basin and stream geometries

If you have existing [flow direction data](https://deltares.github.io/pyflwdir/latest/flwdir.html) from sources such as MERIT Hydro or HydroSHEDS, you can use these to delineate basins and extract streams based on a user-defined threshold. To do this we need to transform the gridded flow direction data into a `FlwdirRaster` object using the [flwdir_from_da()](https://deltares.github.io/hydromt/latest/_generated/hydromt.flw.flwdir_from_da.html) method. This object is at the core of the **pyflwdir** package. It turns a flow direction raster into an actionable common format that describes the relations between cells.

NOTE: For most methods the first call might be a bit slow because the numba code is compiled just in time. A second call of the same method (also with different arguments) will be much faster!

In [None]:
# instantiate a FlwdirRaster object
flwdir = hydromt.flw.flwdir_from_da(ds["flwdir"], ftype="d8")
print(type(flwdir))
print(flwdir)
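
To make the idea of "relations between cells" more concrete, the following is a minimal sketch in plain NumPy of how a D8 raster can be read: each value points to one of the eight neighbouring cells. This is not how pyflwdir implements it internally. The code-to-direction table follows the D8 convention described in the pyflwdir documentation, while the helper `flow_path` and the 3x3 grid `d8_demo` are hypothetical names used only for illustration. The sketch assumes a valid grid without loops.

```python
import numpy as np

# D8 code -> (row offset, column offset) of the downstream neighbour
D8_OFFSETS = {
    1: (0, 1),  # east
    2: (1, 1),  # south-east
    4: (1, 0),  # south
    8: (1, -1),  # south-west
    16: (0, -1),  # west
    32: (-1, -1),  # north-west
    64: (-1, 0),  # north
    128: (-1, 1),  # north-east
}
PIT = 0  # a cell that does not drain to a neighbour (outlet or sink)


def flow_path(d8, row, col):
    """Follow the D8 pointers from (row, col) until a pit or the grid edge."""
    path = [(row, col)]
    while d8[row, col] != PIT:
        dr, dc = D8_OFFSETS[int(d8[row, col])]
        row, col = row + dr, col + dc
        if not (0 <= row < d8.shape[0] and 0 <= col < d8.shape[1]):
            break  # flow leaves the grid
        path.append((row, col))
    return path


# hypothetical 3x3 grid in which every cell drains to the lower-right pit
d8_demo = np.array(
    [
        [2, 4, 4],
        [1, 2, 4],
        [1, 1, 0],
    ],
    dtype=np.uint8,
)
print(flow_path(d8_demo, 0, 0))
```

The `FlwdirRaster` object stores this kind of upstream-downstream information once, so that methods such as `streams()` or `upstream_area()` do not have to decode the raster again.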

Next, we derive streams based on a 10 km² upstream area threshold using the pyflwdir [streams](https://deltares.github.io/pyflwdir/latest/reference.html#pyflwdir.FlwdirRaster.streams) method. Pyflwdir returns a GeoJSON-like representation of the streams, one feature per stream segment, which we parse into a GeoPandas GeoDataFrame so that it is easy to plot.

In [None]:
feats = flwdir.streams(
    mask=ds["uparea"].values > 10,
    strord=flwdir.stream_order(),  # set stream order property
    uparea=ds["uparea"].values,  # set upstream area property
)
gdf_riv = gpd.GeoDataFrame.from_features(feats, crs=ds.raster.crs)
pprint(gdf_riv.head())
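
The stream mask used above is simply "all cells whose upstream area exceeds a threshold". As a rough sketch of what that means, the snippet below counts the upstream cells on a tiny hypothetical D8 grid and applies a threshold. The function `upstream_cells`, the grid `d8_demo`, the cell size of 1 km² and the threshold of 3 km² are all assumptions made for illustration. The brute-force loop is for readability only and is far slower than what pyflwdir does.

```python
import numpy as np

# D8 code -> (row offset, column offset) of the downstream neighbour
D8_OFFSETS = {
    1: (0, 1), 2: (1, 1), 4: (1, 0), 8: (1, -1),
    16: (0, -1), 32: (-1, -1), 64: (-1, 0), 128: (-1, 1),
}


def upstream_cells(d8):
    """Count for every cell how many cells drain through it (itself included)."""
    nrow, ncol = d8.shape
    acc = np.zeros(d8.shape, dtype=np.int64)
    for r0 in range(nrow):
        for c0 in range(ncol):
            r, c = r0, c0
            while True:
                acc[r, c] += 1  # the start cell contributes to every cell on its path
                code = int(d8[r, c])
                if code not in D8_OFFSETS:  # pit (0) or any non-direction value
                    break
                dr, dc = D8_OFFSETS[code]
                r, c = r + dr, c + dc
                if not (0 <= r < nrow and 0 <= c < ncol):
                    break  # flow leaves the grid
    return acc


# hypothetical 3x3 grid in which every cell drains to the lower-right pit
d8_demo = np.array([[2, 4, 4], [1, 2, 4], [1, 1, 0]], dtype=np.uint8)

cell_area_km2 = 1.0  # assumed constant cell size
uparea_demo = upstream_cells(d8_demo) * cell_area_km2
stream_mask = uparea_demo > 3  # assumed threshold of 3 km2 for this tiny grid
print(uparea_demo)
print(stream_mask)
```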

Using the [basin_map()](https://deltares.github.io/hydromt/latest/_generated/hydromt.flw.basin_map.html) method we can delineate all basins in our domain.
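
Conceptually, a basin is the set of cells that drain to the same outlet. The sketch below labels basins on a small hypothetical D8 grid by following every cell downstream until it reaches a pit. It is only meant to illustrate the concept and is not the algorithm used by `basin_map()` or pyflwdir. The names `label_basins` and `d8_two_basins` are made up for this example, and the grid is assumed to contain no loops or nodata cells.

```python
import numpy as np

# D8 code -> (row offset, column offset) of the downstream neighbour
D8_OFFSETS = {
    1: (0, 1), 2: (1, 1), 4: (1, 0), 8: (1, -1),
    16: (0, -1), 32: (-1, -1), 64: (-1, 0), 128: (-1, 1),
}


def label_basins(d8):
    """Give every cell the ID of the outlet it eventually drains to."""
    nrow, ncol = d8.shape
    basins = np.zeros(d8.shape, dtype=np.int32)
    outlet_ids = {}  # (row, col) of an outlet -> basin ID, starting at 1
    for r0 in range(nrow):
        for c0 in range(ncol):
            r, c = r0, c0
            while int(d8[r, c]) in D8_OFFSETS:
                dr, dc = D8_OFFSETS[int(d8[r, c])]
                rn, cn = r + dr, c + dc
                if not (0 <= rn < nrow and 0 <= cn < ncol):
                    break  # treat a cell draining off the grid as an outlet
                r, c = rn, cn
            basins[r0, c0] = outlet_ids.setdefault((r, c), len(outlet_ids) + 1)
    return basins


# hypothetical 2x4 grid with pits in the upper-left and upper-right corners
d8_two_basins = np.array(
    [
        [0, 16, 1, 0],
        [64, 32, 128, 64],
    ],
    dtype=np.uint8,
)
basins_demo = label_basins(d8_two_basins)
print(basins_demo)
```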

In [None]:
# get the best utm zone CRS for a projected CRS
utm = utm_crs(ds.raster.bounds)
ds["basins"] = hydromt.flw.basin_map(
    ds,
    flwdir,
)[0]
# use the HydroMT "raster" data accessor to vectorize the basin raster
gdf_bas = ds["basins"].raster.vectorize()
# calculate the area of each basin in the domain and sort the dataframe
gdf_bas["area"] = gdf_bas.to_crs(utm).area / 1e6  # km2
gdf_bas = gdf_bas.sort_values("area", ascending=False)
pprint(gdf_bas.head())
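
The basin areas are computed in a projected UTM CRS because areas in a geographic CRS would be expressed in square degrees. As a rough illustration of how a suitable UTM zone can be picked from a bounding box, the sketch below uses the standard 6-degree zone rule on the centre of the box. The helper `utm_epsg_from_bounds` is hypothetical and is not the implementation of `utm_crs`. It also ignores the special zones around Norway and Svalbard.

```python
import math


def utm_epsg_from_bounds(west, south, east, north):
    """Return the WGS84 / UTM EPSG code for the centre of a lon/lat bounding box."""
    lon = (west + east) / 2
    lat = (south + north) / 2
    zone = int(math.floor((lon + 180) / 6) % 60) + 1  # zones are 6 degrees wide
    base = 32600 if lat >= 0 else 32700  # northern or southern hemisphere
    return base + zone


# bounding box used to read the data in this example
epsg = utm_epsg_from_bounds(11.7, 45.8, 12.8, 46.7)
print(epsg)
```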

In [None]:
# plot the results
ax = gdf_bas[:5].boundary.plot(color="r", lw=1, zorder=2)
gdf_riv.plot(
    zorder=2,
    ax=ax,
    color="darkblue",
    lw=gdf_riv["strord"] / 8,
)
ds["elevtn"].plot(cmap="terrain", ax=ax, vmin=-500, vmax=2000, alpha=0.7)
ax.set_title("Streams (darkblue) and basins (red)")

### Derive flow directions from elevation data 

If you don't have flow direction data available, these can be derived from an elevation raster. HydroMT implements the algorithm proposed by [Wang & Liu (2006)](https://www.tandfonline.com/doi/abs/10.1080/13658810500433453) to do this. We use the [d8_from_dem()](https://deltares.github.io/hydromt/latest/_generated/hydromt.flw.d8_from_dem.html) method, which wraps the pyflwdir [fill_depressions()](https://deltares.github.io/pyflwdir/latest/reference.html#pyflwdir.dem.fill_depressions) method.

The derivation of flow directions can be aided by river geometries with an upstream area ("uparea") property. Try uncommenting the `gdf_stream` argument and compare the results.

In [None]:
# derive flow directions raster from elevation
da_flw = hydromt.flw.d8_from_dem(
    ds["elevtn"],
    # gdf_stream=gdf_riv,
)
# parse it into a FlwdirRaster object
flwdir1 = hydromt.flw.flwdir_from_da(da_flw, ftype="d8")
# derive streams based on a 10 km2 threshold
feats1 = flwdir1.streams(mask=flwdir1.upstream_area("km2") > 10)
gdf_riv1 = gpd.GeoDataFrame.from_features(feats1, crs=ds.raster.crs)

# plot the new streams (red) and compare with the original (darkblue)
ax = gdf_riv.plot(zorder=2, color="darkblue")
gdf_riv1.plot(zorder=2, ax=ax, color="r")
ds["elevtn"].plot(cmap="terrain", ax=ax, vmin=-500, vmax=2000, alpha=0.7)
ax.set_title("Original (darkblue) and new (red) streams")
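
Depression filling in the spirit of Wang & Liu (2006) processes cells in order of elevation, starting from the edge of the grid and growing inward, so that every cell ends up at least as high as the lowest point over which water can leave. The sketch below is a simplified illustration of that priority-queue idea, not the pyflwdir implementation. The function name `priority_flood_fill` and the demo grid are assumptions, and nodata handling, outlets and the derivation of the D8 codes themselves are left out.

```python
import heapq

import numpy as np


def priority_flood_fill(dem):
    """Fill depressions in a 2D elevation array using a priority queue."""
    nrow, ncol = dem.shape
    filled = dem.astype(float).copy()
    done = np.zeros(dem.shape, dtype=bool)
    heap = []
    # seed the queue with all edge cells, where water can always leave the grid
    for r in range(nrow):
        for c in range(ncol):
            if r in (0, nrow - 1) or c in (0, ncol - 1):
                heapq.heappush(heap, (float(filled[r, c]), r, c))
                done[r, c] = True
    # repeatedly take the lowest cell on the boundary and visit its neighbours
    while heap:
        z, r, c = heapq.heappop(heap)
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                rn, cn = r + dr, c + dc
                inside = 0 <= rn < nrow and 0 <= cn < ncol
                if (dr or dc) and inside and not done[rn, cn]:
                    # a neighbour below the current spill level is raised to it
                    filled[rn, cn] = max(filled[rn, cn], z)
                    done[rn, cn] = True
                    heapq.heappush(heap, (float(filled[rn, cn]), rn, cn))
    return filled


# hypothetical 5x5 grid: a closed depression that can only spill at the edge cell (3, 4)
dem_demo = np.array(
    [
        [9, 9, 9, 9, 9],
        [9, 1, 1, 1, 9],
        [9, 1, 0, 1, 9],
        [9, 1, 1, 1, 4],
        [9, 9, 9, 9, 9],
    ]
)
filled_demo = priority_flood_fill(dem_demo)
print(filled_demo)
```

Because each cell is reached from an already processed neighbour that lies closer to an outlet, a flow direction can in principle be recorded at the moment a cell is visited.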

### Reproject flow direction data

Unlike continuous data such as elevation or data with discrete classes such as land use, flow direction data cannot simply be reprojected using common resampling methods. Instead, the [reproject_hydrography_like()](https://deltares.github.io/hydromt/latest/_generated/hydromt.flw.reproject_hydrography_like.html) method creates a synthetic elevation grid based on an upstream area raster, reprojects it, and uses it to derive a new flow direction grid with the method described above. Note that this works well if the resolution stays approximately the same. Upscaling to larger grid cells requires different algorithms, see the next example.
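
A small numeric illustration of why common resampling fails: D8 values are codes, not quantities, so combining them arithmetically yields either an invalid code or a valid code with the wrong meaning. The snippet below is a minimal sketch using the D8 codes from the pyflwdir documentation. The variable names are made up for this example.

```python
import numpy as np

# valid D8 codes: 0 is a pit, the others each point to one neighbour
VALID_D8 = {0, 1, 2, 4, 8, 16, 32, 64, 128}

# one cell draining north (64) and one draining south (4)
north_south = np.array([64, 4], dtype=np.uint8)
mean_ns = float(north_south.mean())  # not a D8 code at all

# a pit (0) and a cell draining south-west (8)
pit_southwest = np.array([0, 8], dtype=np.uint8)
mean_psw = float(pit_southwest.mean())  # equals the code for "south"

print(mean_ns, mean_ns in VALID_D8)
print(mean_psw, mean_psw in VALID_D8)
```

Nearest-neighbour resampling keeps the codes valid, but on a grid with a different orientation or cell layout the neighbour that a code points to is no longer guaranteed to be the original downstream cell, so the river network can still break.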

In [None]:
# reproject the elevation grid first
da_elv_reproj = ds["elevtn"].raster.reproject(dst_crs=utm)  # , dst_res=50)
# reproject the flow direction data
ds_reproj = hydromt.flw.reproject_hydrography_like(
    ds,  # flow direction and upstream area grids
    da_elv=da_elv_reproj,  # destination grid
    logger=logger,
)
# parse it into a FlwdirRaster object
flwdir_reproj = hydromt.flw.flwdir_from_da(ds_reproj["flwdir"], ftype="d8")
# derive streams based on a 10 km2 threshold
feats_reproj = flwdir_reproj.streams(mask=flwdir_reproj.upstream_area("km2") > 10)
gdf_riv_reproj = gpd.GeoDataFrame.from_features(feats_reproj, crs=ds_reproj.raster.crs)

# plot the streams from the reprojected data (red) and compare with the original (darkblue)
# NOTE the different coordinates on the figure axis
ax = gdf_riv_reproj.plot(zorder=3, color="r")
gdf_riv.to_crs(utm).plot(ax=ax, zorder=2, color="darkblue")
da_elv_reproj.raster.mask_nodata().plot(
    cmap="terrain", ax=ax, vmin=-500, vmax=2000, alpha=0.7
)
ax.set_title("Original (darkblue) and new reprojected (red) streams")

### Upscale flow directions

Methods to upscale flow directions are required because models often have a coarser resolution than the elevation data used to build them. Instead of deriving flow directions from upscaled elevation data, it is better to directly upscale the flow direction data itself. The [upscale_flwdir()](https://deltares.github.io/hydromt/latest/_generated/hydromt.flw.upscale_flwdir.html) method wraps a pyflwdir method that implements the Iterative Hydrography Upscaling (IHU) algorithm [(Eilander et al., 2021)](https://hess.copernicus.org/articles/25/5287/2021/). Try different upscale factors and see the difference!

In [None]:
# upscale flow direction with a factor "scale_ratio"
# this returns both a flow direction grid and a new FlwdirRaster object
da_flw_lowres, flwdir_lowres = hydromt.flw.upscale_flwdir(
    ds,  # flow direction and upstream area grids
    flwdir,  # pyflwdir FlwdirRaster object
    scale_ratio=20,  # upscaling factor
    logger=logger,
)

# derive streams based on a 10 km2 threshold
feats_lowres = flwdir_lowres.streams(mask=flwdir_lowres.upstream_area("km2") > 10)
gdf_riv_lowres = gpd.GeoDataFrame.from_features(feats_lowres, crs=ds.raster.crs)

# plot the streams from the upscaled flow direction (red) and compare with the original (darkblue)
ax = gdf_riv_lowres.plot(zorder=3, color="r")
gdf_riv.plot(ax=ax, zorder=2, color="darkblue")
ds["elevtn"].raster.mask_nodata().plot(
    cmap="terrain", ax=ax, vmin=-500, vmax=2000, alpha=0.7
)
ax.set_title("Original (darkblue) and new upscaled (red) streams")
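
When trying different upscale factors it helps to know what grid to expect. The sketch below is back-of-the-envelope arithmetic only: it assumes a source resolution of 3 arc-seconds (1/1200 degree), a grid of roughly 1080 x 1320 cells for the bounding box used here, and that the number of rows and columns is rounded up when it is not a multiple of the factor. The helper `upscaled_grid` is hypothetical, and the actual shape returned by `upscale_flwdir()` may differ slightly.

```python
import math


def upscaled_grid(nrow, ncol, res, scale_ratio):
    """Estimate the shape and cell size of a grid after upscaling."""
    nrow_out = math.ceil(nrow / scale_ratio)  # assumed: partial blocks are kept
    ncol_out = math.ceil(ncol / scale_ratio)
    return nrow_out, ncol_out, res * scale_ratio


src_res = 1 / 1200  # assumed 3 arc-second source resolution, in degrees
for ratio in (5, 10, 20):
    nrow_out, ncol_out, res_out = upscaled_grid(1080, 1320, src_res, ratio)
    print(ratio, (nrow_out, ncol_out), round(res_out * 3600), "arcsec")
```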