## Reading raster data

This example shows various commonly used options to read single-file or multi-file raster datasets into an `xarray.Dataset` or `xarray.DataArray` object with geospatial attributes.

In **hydroMT** we typically read data using the `DataCatalog`, which allows for some minimal pre-processing in order to get uniform variable names and units. Here we show the methods that are used *under the hood* by the `DataCatalog.get_rasterdataset` method.

In [None]:
import numpy as np
import xarray as xr
from pprint import pprint
import glob
import os
import hydromt

In [None]:
# setup logging
from hydromt.log import setuplog

logger = setuplog("read raster data", log_level=10)

In [None]:
# Download artifacts for the Piave basin to `~/.hydromt_data/`.
data_catalog = hydromt.DataCatalog(logger=logger)
data_catalog.from_artifacts()

As an example we will use the [MERIT Hydro](http://hydro.iis.u-tokyo.ac.jp/~yamadai/MERIT_Hydro) dataset from the downloaded artifacts. This data is stored as several GeoTIFF files with identical grids in one folder.

In [None]:
path = os.path.join(os.path.dirname(data_catalog["merit_hydro"].path), "*.tif")
fns = glob.glob(path)
fns

### open_raster

To read raster data and parse it into an `xarray.DataArray` we use the `hydromt.open_raster` method.
This method is based on `xarray.open_rasterio`, but additionally parses the coordinate reference system metadata. It reads data from a single [gdal raster file](https://gdal.org/drivers/raster/index.html). Tiled data of a single variable can also be passed as a [virtual raster tileset (vrt) file](https://gdal.org/drivers/raster/vrt.html).

In [None]:
# read a single raster file as DataArray
# the chunks argument provides lazy loading of the data, see xarray.open_rasterio
da = hydromt.open_raster(fns[0], chunks={"x": 1000, "y": 1000})
print(da)

#### (geospatial) attributes

Many (geospatial) attributes can be accessed through the DataArray/Dataset [raster accessors](https://deltares.github.io/hydromt/latest/api/api_methods.html#attributes).

In [None]:
# coordinate reference system
da.raster.crs

In [None]:
# geospatial transform, see https://www.perrygeo.com/python-affine-transforms.html
da.raster.transform
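
The transform is an affine mapping from pixel (column, row) indices to coordinates in the dataset CRS. The sketch below spells out that arithmetic in plain Python for a north-up grid. The coefficient values and the helper `pixel_to_xy` are made up for illustration and are not read from the MERIT Hydro file.

```python
# Affine coefficients in (a, b, c, d, e, f) order, such that
#   x = a * col + b * row + c
#   y = d * col + e * row + f
# Hypothetical values: 0.25-degree cells, upper-left corner at (11.0, 47.0).
transform = (0.25, 0.0, 11.0, 0.0, -0.25, 47.0)


def pixel_to_xy(transform, col, row, offset=0.5):
    """Return the (x, y) coordinate of a pixel; offset=0.5 gives the cell center."""
    a, b, c, d, e, f = transform
    col, row = col + offset, row + offset
    return a * col + b * row + c, d * col + e * row + f


# upper-left corner of the grid (offset=0) and center of the first cell
corner = pixel_to_xy(transform, 0, 0, offset=0.0)
center = pixel_to_xy(transform, 0, 0)
print(corner, center)
```

The negative `e` coefficient reflects that row indices increase downwards while y coordinates increase upwards.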

In [None]:
# names of the x and y dimensions
(da.raster.x_dim, da.raster.y_dim)

In [None]:
# nodata value (or fillvalue)
da.raster.nodata
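
The nodata value marks cells without valid data, so it usually has to be masked before computing statistics. The sketch below shows the idea on a small made-up NumPy array. The array values and the nodata value of -9999 are assumptions for illustration, not values taken from the dataset above.

```python
import numpy as np

# hypothetical elevation grid with -9999 as nodata value
nodata = -9999.0
elevtn = np.array([[12.0, 15.0, nodata], [nodata, 20.0, 25.0]])

# replace nodata cells by NaN so that NaN-aware reductions skip them
masked = np.where(elevtn == nodata, np.nan, elevtn)

naive_mean = elevtn.mean()  # biased by the nodata cells
valid_mean = np.nanmean(masked)  # mean over valid cells only
print(naive_mean, valid_mean)
```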

### open_mfraster

To read multiple raster files with an identical grid, but each with a different variable, into a single `xarray.Dataset` we can use the `hydromt.open_mfraster` method. The same method can be used to concatenate multiple raster files with an identical grid and the same variable, but a different *layer*, along a single dimension.

In [None]:
# this method takes either a list of paths or a single path with a glob pattern, as used here:
print(path)
ds = hydromt.open_mfraster(path, chunks={"x": 1000, "y": 1000})
ds

TIP: To write a dataset back to a stack of raster files in a single folder, use the `<dataset>.raster.to_mapstack` method.

To concatenate multiple layers of [soilgrids data](https://www.isric.org/explore/soilgrids/faq-soilgrids-2017) into a single-variable dataset using `hydromt.open_mfraster`, we simply need to set the argument `concat=True` and optionally provide a `concat_dim` dimension name:

In [None]:
path = os.path.join(os.path.dirname(data_catalog["soilgrids"].path), "bd*.tif")
fns = glob.glob(path)
fns

In [None]:
ds = hydromt.open_mfraster(fns, concat=True, concat_dim="layer")
ds
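
Conceptually, the two modes of `open_mfraster` correspond to merging arrays as separate variables versus concatenating them along a new dimension. The sketch below mimics this with plain `xarray` on small in-memory arrays. The variable names, layer labels, and values are made up, and this is not necessarily how `hydromt` implements it internally.

```python
import numpy as np
import xarray as xr

coords = {"y": [1.5, 0.5], "x": [0.5, 1.5, 2.5]}


def make_layer(name, value):
    """Hypothetical stand-in for a single-band raster read from file."""
    data = np.full((2, 3), value, dtype="float32")
    return xr.DataArray(data, coords=coords, dims=("y", "x"), name=name)


layers = [make_layer(f"bd_sl{i}", 1000.0 + i) for i in (1, 2, 3)]

# default mode: each file becomes its own variable in the Dataset
ds_vars = xr.merge(layers)

# concat mode: one shared variable name, stacked along a new "layer" dimension
da_stack = xr.concat([da.rename("bd") for da in layers], dim="layer")
da_stack = da_stack.assign_coords(layer=[1, 2, 3])
print(list(ds_vars.data_vars), da_stack.dims, da_stack.shape)
```

With the layers stacked along one dimension, a single layer can be selected by label, for example `da_stack.sel(layer=2)`.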

### open_raster_from_tindex

If the raster data is tiled and each tile uses a different CRS (for instance a different UTM projection for each UTM zone), the dataset cannot be described with a VRT file. In this case a vector file can be built with [gdaltindex](https://gdal.org/programs/gdaltindex.html) to serve as a raster tile index, which is then read using `hydromt.open_raster_from_tindex`. To read the data into a single `xarray.Dataset`, the tiles need to be reprojected and mosaicked to a single CRS while reading. As this type of data cannot be loaded lazily, the method is typically used with an area of interest for which the data is loaded and combined. As an example we use the [GRWL mask](https://doi.org/10.5281/zenodo.1297434) raster tiles, for which we have created a tile index using the aforementioned `gdaltindex` command line tool.

In [None]:
# area of interest based on the bounding box of the previously loaded soilgrids data
bbox = ds.raster.bounds
print(bbox)

In [None]:
# the tileindex is a GeoPackage vector file
# with an attribute column 'location' containing the relative paths to the raster file data
import geopandas as gpd

fn_tindex = data_catalog["grwl_mask"].path
print(fn_tindex)
gpd.read_file(fn_tindex, rows=5)
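
The role of the tile index can be pictured as a lookup from tile footprint to file path: only tiles whose footprint intersects the area of interest need to be read. The plain-Python sketch below illustrates that selection step. The tile footprints, file names, and the helper `intersects` are hypothetical, footprints are assumed to be stored in one common CRS, and the actual `hydromt` implementation may differ.

```python
# each hypothetical tile: footprint as (xmin, ymin, xmax, ymax) plus a 'location' path
tiles = [
    {"location": "tile_a.tif", "bounds": (10.0, 44.0, 12.0, 46.0)},
    {"location": "tile_b.tif", "bounds": (12.0, 44.0, 14.0, 46.0)},
    {"location": "tile_c.tif", "bounds": (10.0, 46.0, 12.0, 48.0)},
]


def intersects(a, b):
    """True if two (xmin, ymin, xmax, ymax) boxes overlap with non-zero area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


# hypothetical area of interest that straddles tiles a and b
aoi = (11.5, 45.0, 12.5, 45.9)
selected = [t["location"] for t in tiles if intersects(t["bounds"], aoi)]
print(selected)
```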

In [None]:
# set destination CRS to EPSG:32633 (UTM zone 33N) to keep a projected crs
ds = hydromt.open_raster_from_tindex(
    fn_tindex, bbox=bbox, nodata=0, mosaic_kwargs={"dst_crs": 32633}
)
ds