## Example: Exporting data using a data catalogue

This example illustrates how to read and export data for a specific region and period using the HydroMT [DataCatalog](../_generated/hydromt.data_catalog.DataCatalog.rst) and its `export_data` method.

In [None]:
# import hydromt and setup logging
import os
import hydromt
from hydromt.log import setuplog

logger = setuplog("export data", log_level=10)

### Explore the current data catalogue
For this exercise, we will use the pre-defined catalogue `artifact_data`, which contains extracts of global datasets for the Piave basin in Northern Italy. This data catalogue and the data linked to it cover only a small geographic extent, as they are intended for documentation and testing purposes only. If you have another data catalogue (and the linked data) available, you can use it instead.

First let's read the pre-defined artifact data catalogue:

In [None]:
# Download and read artifacts for the Piave basin to `~/.hydromt_data/`.
data_catalog = hydromt.DataCatalog(logger=logger)
data_catalog.from_predefined_catalogs('artifact_data')

The `artifact_data` catalogue is one of the pre-defined data catalogues available in HydroMT. Here is how you can get a list of all pre-defined data catalogues:

In [None]:
from hydromt.cli import api as hydromt_api
hydromt_api.get_predifined_catalogs()

To read your own data catalogue (or a pre-defined one), you can use the **data_libs** argument of the [DataCatalog](../_generated/hydromt.data_catalog.DataCatalog.rst), which accepts either an absolute or relative path to a data catalogue YAML file, or the name of a pre-defined catalogue.

Here is an example for the pre-defined `artifact_data` catalogue:

In [None]:
data_catalog = hydromt.DataCatalog(data_libs='artifact_data')
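If you want to point **data_libs** to your own catalogue instead, a minimal sketch of such a YAML file is shown below. The entry name `my_dem`, its path and its metadata are hypothetical, and the field names (`path`, `data_type`, `driver`, `crs`, `meta`) assume the HydroMT v0.x catalogue format, so check the data catalogue documentation of your HydroMT version. The commented-out last line additionally assumes that `data_libs` accepts a list of catalogues.

```python
from pathlib import Path
import tempfile

# Hypothetical catalogue entry: "my_dem", its path and metadata are placeholders,
# and the keys assume the HydroMT v0.x data catalogue YAML format.
CATALOG_YAML = """\
my_dem:
  path: /path/to/my_dem.tif
  data_type: RasterDataset
  driver: raster
  crs: 4326
  meta:
    category: topography
"""


def write_catalog(folder: Path) -> Path:
    """Write the minimal catalogue into `folder` and return the path of the YAML file."""
    path = folder / "my_catalog.yml"
    path.write_text(CATALOG_YAML, encoding="utf-8")
    return path


catalog_path = write_catalog(Path(tempfile.mkdtemp()))
print(catalog_path.read_text(encoding="utf-8"))

# With HydroMT installed and real data behind the path, your own catalogue could be
# combined with a pre-defined one (assumption, not executed here):
# data_catalog = hydromt.DataCatalog(data_libs=[str(catalog_path), 'artifact_data'])
```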

Let's now check which data sources are available in the catalogue:

In [None]:
# For a list of sources including their attributes
# data_catalog.sources
# For a list of source names by data_type
hydromt_api.get_datasets('artifact_data')

Now let's open one of the available datasets to check its extent and available dates:

In [None]:
ds = data_catalog.get_rasterdataset('era5')
print(f"Available extent: {ds.raster.bounds}")
print(f"Available dates: {ds.time.values[0]} to {ds.time.values[-1]}")
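Before exporting, it can help to check that the region and period you plan to request actually lie within what the dataset offers. The sketch below does this in plain Python; the helper names `bbox_within` and `period_within` are hypothetical, and `available_bounds` and the available dates are placeholders to be replaced with the values printed above.

```python
from datetime import date


def bbox_within(inner, outer):
    """Return True if bbox `inner` lies inside bbox `outer`.

    Both boxes are (xmin, ymin, xmax, ymax), the order used for `bbox` in this example.
    """
    return (
        inner[0] >= outer[0]
        and inner[1] >= outer[1]
        and inner[2] <= outer[2]
        and inner[3] <= outer[3]
    )


def period_within(start, end, first, last):
    """Return True if [start, end] lies inside [first, last]; all are 'YYYY-MM-DD' strings."""
    start, end, first, last = (date.fromisoformat(d) for d in (start, end, first, last))
    return first <= start <= end <= last


# Placeholder values: replace them with the extent and dates printed above;
# str(ds.time.values[0])[:10] turns a numpy datetime64 into a 'YYYY-MM-DD' string.
available_bounds = (11.6, 45.2, 13.0, 46.8)
print(bbox_within([12.0, 46.0, 13.0, 46.5], available_bounds))
print(period_within('2010-02-10', '2010-02-15', '2010-02-01', '2010-03-31'))
```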

### Export an extract of the data

Now we will export a subset of the data in our `artifact_data` catalogue using the [DataCatalog.export_data](../_generated/hydromt.data_catalog.DataCatalog.export_data.rst) method.

With this method, you can choose:
  - **data_root**: where to save the exported data
  - **bbox**: bounding box of the exported data, as [xmin, ymin, xmax, ymax] (by default, the full extent of the original data).
  - **time_tuple**: start and end time of the exported data (by default, the full period of the original data).
  - **source_names**: names of the data sources to export (by default, all sources).
  - **unit_conversion**: when exporting data, HydroMT renames the variables and converts their units as defined in the catalogue. Set this option to False to skip the unit conversion.
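To make the **unit_conversion** option more concrete, here is a minimal pure-Python sketch. It assumes the conversion is a multiplication followed by an offset per variable, mirroring the `unit_mult` and `unit_add` fields of HydroMT data catalogue entries. The function `convert_units` and the sample values are hypothetical; HydroMT itself applies the conversion to xarray objects and also takes care of nodata values, which this sketch ignores.

```python
def convert_units(values, unit_mult=None, unit_add=None):
    """Return a copy of `values` with `value * mult + add` applied per variable.

    Variables without an entry get a multiplier of 1 and an offset of 0 (unchanged).
    Nodata handling is omitted in this sketch.
    """
    unit_mult = unit_mult or {}
    unit_add = unit_add or {}
    converted = {}
    for name, data in values.items():
        mult = unit_mult.get(name, 1.0)
        add = unit_add.get(name, 0.0)
        converted[name] = [v * mult + add for v in data]
    return converted


# Hypothetical raw values: temperature in kelvin, precipitation in metres, pressure in hPa
raw = {"temp": [273.15, 283.15], "precip": [0.002, 0.0], "press": [1013.0]}
# Kelvin -> degrees Celsius (offset) and metres -> millimetres (multiplier)
converted = convert_units(raw, unit_mult={"precip": 1000.0}, unit_add={"temp": -273.15})
print(converted)
```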

Let's select the data sources and the extent we want (based on the exploration above):

In [None]:
# List of data sources to export
source_list = ['merit_hydro_1k', 'soilgrids', 'vito', 'era5']
# Geographic extent as [xmin, ymin, xmax, ymax]
bbox = [12.0, 46.0, 13.0, 46.5]
# Time extent as (start, end)
time_tuple = ('2010-02-10', '2010-02-15')
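A simplified sketch of what this selection means for a gridded dataset: keep the grid cells whose centres fall inside `bbox` and the time steps within `time_tuple`, treating both ends as inclusive (as xarray's label-based slicing does). The 0.25-degree grid, the daily time axis and the helper names are hypothetical, and HydroMT's own clipping may treat cells on the bounding-box edges differently.

```python
from datetime import date, timedelta


def cells_in_bbox(xs, ys, bbox):
    """Return the x and y cell-centre coordinates inside bbox (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = bbox
    return ([x for x in xs if xmin <= x <= xmax],
            [y for y in ys if ymin <= y <= ymax])


def dates_in_period(dates, time_tuple):
    """Keep the dates within the inclusive period given as ISO 'YYYY-MM-DD' strings."""
    start, end = (date.fromisoformat(t) for t in time_tuple)
    return [d for d in dates if start <= d <= end]


# Hypothetical 0.25-degree grid of cell centres and a daily time axis for February 2010
xs = [11.125 + 0.25 * i for i in range(10)]
ys = [45.125 + 0.25 * j for j in range(8)]
days = [date(2010, 2, 1) + timedelta(days=k) for k in range(28)]

sub_x, sub_y = cells_in_bbox(xs, ys, [12.0, 46.0, 13.0, 46.5])
sub_days = dates_in_period(days, ('2010-02-10', '2010-02-15'))
print(len(sub_x), len(sub_y), len(sub_days))
```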

And let's export the data to a *data_extract* folder:

In [None]:
data_catalog.export_data(
    data_root=os.path.join(os.getcwd(), 'data_extract'),
    bbox=bbox,
    time_tuple=time_tuple,
    source_names=source_list,
)

### Open and explore the exported data

We now have the extracted data, and HydroMT has also saved a new data catalogue file that goes with it:

In [None]:
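import os

# Walk the export folder and list its content,
# skipping auxiliary .xml files (e.g. GDAL .aux.xml sidecar files)
root = "data_extract"
for path, _, files in os.walk(root):
    print(path)
    for name in files:
        if name.endswith(".xml"):
            continue
        print(f" - {name}")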

Let's open the extracted data catalogue:

In [None]:
data_catalog_extract = hydromt.DataCatalog(data_libs=os.path.join('data_extract', 'data_catalog.yml'))
data_catalog_extract.sources

Finally, let's open the extracted data again and plot it against the original data to compare their extents.

In [None]:
ds_extract = data_catalog_extract.get_rasterdataset('era5')

In [None]:
import matplotlib.pyplot as plt
import geopandas as gpd
from shapely.geometry import box
import cartopy.crs as ccrs

proj = ccrs.PlateCarree()  # plot projection

# Get both DEMs: the original artifact data and the extract
dem = data_catalog.get_rasterdataset('merit_hydro_1k', variables=['elevtn'])
dem_extract = data_catalog_extract.get_rasterdataset('merit_hydro_1k', variables=['elevtn'])
# Get the bounding box of each DEM as a GeoDataFrame
# (new names, so the `bbox` list used for the export is not overwritten)
gdf_bbox = gpd.GeoDataFrame(geometry=[box(*dem.raster.bounds)], crs=4326)
gdf_bbox_extract = gpd.GeoDataFrame(geometry=[box(*dem_extract.raster.bounds)], crs=4326)

# Initialise plot
figsize = (10, 8)
fig = plt.figure(figsize=figsize)
ax = fig.add_subplot(projection=proj)

# Plot elevation: original DEM in gray, extracted DEM in color on top of it
dem.raster.mask_nodata().plot(ax=ax, cmap="gray")
dem_extract.raster.mask_nodata().plot(ax=ax, cmap="terrain")

# Plot the bounding boxes on top of the rasters (original in black, extract in red)
gdf_bbox.boundary.plot(ax=ax, color="k", linewidth=0.8)
gdf_bbox_extract.boundary.plot(ax=ax, color="red", linewidth=0.8)