## Example: Working with geodatasets

In [1]:
from hydromt import DataCatalog
from hydromt.log import setuplog

logger = setuplog("raster data", log_level=10)
dc = DataCatalog(logger=logger, data_libs=["artifact_data"])

2023-02-09 17:21:21,011 - raster data - log - INFO - HydroMT version: 0.6.1.dev
2023-02-09 17:21:21,379 - raster data - data_catalog - INFO - Reading data catalog artifact_data v0.0.8 from archive
2023-02-09 17:21:21,380 - raster data - data_catalog - INFO - Parsing data catalog from C:\Users\dalmijn\.hydromt_data\artifact_data\v0.0.8\data_catalog.yml


Here we illustrate some common GIS operations and show how the `vector` accessor of a DataArray/Dataset can be used to perform them. The data is accessed through the HydroMT [DataCatalog](https://deltares.github.io/hydromt/latest/api.html#data-catalog). For more information, see the [Reading vector data](https://deltares.github.io/hydromt/latest/_examples/reading_vector_data.html) example.

### Geospatial attributes 

Some of the available geospatial attributes are listed below. For the full list of attributes and methods, see the [HydroMT API reference](https://deltares.github.io/hydromt/latest/api.html).

In [16]:
# Get the water level dataset from the data catalog
# The water levels are time series per location (points; here in lat/lon)
ds = dc.get_geodataset("gtsmv3_eu_era5")

2023-02-09 17:36:51,321 - raster data - data_catalog - INFO - DataCatalog: Getting gtsmv3_eu_era5 GeoDataset netcdf data from C:\Users\dalmijn\.hydromt_data\artifact_data\v0.0.8\gtsmv3_eu_era5.nc
2023-02-09 17:36:51,322 - raster data - geodataset - INFO - GeoDataset: Read netcdf data.


In [18]:
# Coordinate reference system 
ds.vector.crs

<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- undefined
Datum: World Geodetic System 1984
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich

In [4]:
# Geospatial data as geometry objects
ds.vector.geometry

stations
13670    POINT (12.25342 45.25635)
2798     POINT (12.22412 45.27100)
2799     POINT (12.29736 45.22705)
13775    POINT (12.75879 45.24902)
13723    POINT (12.50244 45.25635)
2797     POINT (12.26807 45.34424)
13822    POINT (12.99316 45.24902)
2796     POINT (12.34131 45.37353)
23305    POINT (12.45850 45.41748)
22721    POINT (12.42920 45.41748)
2795     POINT (12.42920 45.41748)
2794     POINT (12.50244 45.46143)
2793     POINT (12.59033 45.49072)
13774    POINT (12.75146 45.50537)
13722    POINT (12.50244 45.50537)
2792     POINT (12.67822 45.52002)
2791     POINT (12.76611 45.54932)
2790     POINT (12.83936 45.57861)
2789     POINT (12.92725 45.62256)
dtype: geometry

In [15]:
# Names of the x and y coordinates

(ds.vector.x_name, ds.vector.y_name)

('lon', 'lat')

In [17]:
# Bounding box of the geospatial data     
ds.vector.bounds

array([12.22412, 45.22705, 12.99316, 45.62256])

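The bounds are ordered as (xmin, ymin, xmax, ymax). As a minimal plain-Python sketch of what such a bounding box represents (this is not HydroMT's implementation; `total_bounds` is a hypothetical helper, and the points are a handful of the stations listed above, including the outermost ones):

```python
from typing import Iterable, Tuple


def total_bounds(points: Iterable[Tuple[float, float]]) -> Tuple[float, float, float, float]:
    """Return (xmin, ymin, xmax, ymax) for a collection of (x, y) points."""
    xs, ys = zip(*points)
    return (min(xs), min(ys), max(xs), max(ys))


# (lon, lat) pairs copied from the geometry output above
stations = [
    (12.25342, 45.25635),  # station 13670
    (12.22412, 45.27100),  # station 2798 (westernmost)
    (12.29736, 45.22705),  # station 2799 (southernmost)
    (12.99316, 45.24902),  # station 13822 (easternmost)
    (12.92725, 45.62256),  # station 2789 (northernmost)
]

print(total_bounds(stations))
```

The result should agree with the array returned by `ds.vector.bounds`.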
### Reprojection

In [23]:
# Reproject the data to Pseudo-Mercator (EPSG:3857)

ds_pm = ds.vector.to_crs(3857)

In [24]:
# Coordinate reference system
ds_pm.vector.crs

<Derived Projected CRS: EPSG:3857>
Name: WGS 84 / Pseudo-Mercator
Axis Info [cartesian]:
- X[east]: Easting (metre)
- Y[north]: Northing (metre)
Area of Use:
- name: World between 85.06°S and 85.06°N.
- bounds: (-180.0, -85.06, 180.0, 85.06)
Coordinate Operation:
- name: unnamed
- method: Popular Visualisation Pseudo Mercator
Datum: World Geodetic System 1984
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich

In [25]:
# Check that the coordinate values are indeed no longer in degrees
ds_pm.lon.values[0]

1364044.4748761142

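As an independent cross-check of the value above, the spherical Pseudo-Mercator forward formulas can be evaluated by hand. This is only a sketch under the assumption that EPSG:3857 treats the Earth as a sphere with the WGS 84 semi-major axis as radius; `lonlat_to_pseudo_mercator` is a hypothetical helper and not part of HydroMT, which handles the reprojection through `to_crs`.

```python
import math

# Assumption: sphere radius equal to the WGS 84 semi-major axis (metres)
EARTH_RADIUS_M = 6378137.0


def lonlat_to_pseudo_mercator(lon_deg: float, lat_deg: float) -> tuple:
    """Convert geographic coordinates (degrees) to Pseudo-Mercator x, y (metres)."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


# First station in the dataset (13670), coordinates from the geometry output
x, y = lonlat_to_pseudo_mercator(12.25342, 45.25635)
print(x, y)
```

The printed x coordinate should be close to the first `lon` value of `ds_pm` shown above.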
### Conversion

In [26]:
# Create an OGR-compliant dataset from ds
# When written in NetCDF4 format, it can be read by OGR (osgeo; QGIS)
from numpy import mean
ds_ogr = ds.vector.ogr_compliant(reducer=mean)
ds_ogr

| | Array | Chunk |
|---|---|---|
| Bytes | 152 B | 152 B |
| Shape | (19,) | (19,) |
| Dask graph | 1 chunks in 6 graph layers | 1 chunks in 6 graph layers |
| Data type | float64 numpy.ndarray | float64 numpy.ndarray |


In [27]:
# Convert the geospatial data to WKT strings
ds_wkt = ds.vector.to_wkt()
ds_wkt

| | Array | Chunk |
|---|---|---|
| Bytes | 299.25 kiB | 299.25 kiB |
| Shape | (2016, 19) | (2016, 19) |
| Dask graph | 1 chunks in 2 graph layers | 1 chunks in 2 graph layers |
| Data type | float64 numpy.ndarray | float64 numpy.ndarray |

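WKT (well-known text) encodes a geometry as a plain string. For points the encoding is simple enough to sketch by hand. The helpers below (`point_to_wkt` and `wkt_to_point`) are hypothetical and for illustration only; they fix the number of decimals, which real WKT writers do not necessarily do, and they are not what `to_wkt` uses internally.

```python
def point_to_wkt(x: float, y: float, precision: int = 5) -> str:
    """Format an (x, y) pair as a WKT point string with fixed decimals."""
    return f"POINT ({x:.{precision}f} {y:.{precision}f})"


def wkt_to_point(wkt: str) -> tuple:
    """Parse a 2D WKT point string back into an (x, y) tuple of floats."""
    text = wkt.strip()
    if not text.upper().startswith("POINT"):
        raise ValueError(f"Not a WKT point: {wkt!r}")
    # Take everything between the outer parentheses and split on whitespace
    coords = text[text.index("(") + 1 : text.rindex(")")].split()
    return float(coords[0]), float(coords[1])


wkt = point_to_wkt(12.25342, 45.25635)
print(wkt)
print(wkt_to_point(wkt))
```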

In [38]:
# Convert the geospatial data to geometry objects

ds_geom = ds.vector.to_geom()

ds_geom

| | Array | Chunk |
|---|---|---|
| Bytes | 299.25 kiB | 299.25 kiB |
| Shape | (2016, 19) | (2016, 19) |
| Dask graph | 1 chunks in 2 graph layers | 1 chunks in 2 graph layers |
| Data type | float64 numpy.ndarray | float64 numpy.ndarray |


### I/O (Internal and External)

In [32]:
# Create a dummy GeoDataFrame for this example
from geopandas import GeoDataFrame
from shapely.geometry import Point
from pyproj import CRS

gdf = GeoDataFrame(
    [{"Loc": "I", "Stuff": 1},{"Loc": "II", "Stuff": 2}],
    geometry=[Point(0,0), Point(1,1)],
    crs = CRS.from_epsg(4326)
    )

gdf

| | Loc | Stuff | geometry |
|---|---|---|---|
| 0 | I | 1 | POINT (0.00000 0.00000) |
| 1 | II | 2 | POINT (1.00000 1.00000) |


In [33]:
# Create a dataset from a GeoDataFrame using the vector accessor
from hydromt.vector import GeoDataset

ds_gdf = GeoDataset.from_gdf(gdf)

ds_gdf

In [37]:
# Convert the Dataset from the DataCatalog to a GeoDataFrame
# Besides stations, waterlevel also has a time dimension
# A GeoDataFrame cannot hold vectors or 2D arrays per feature, so to keep the variable it has to be reduced along the time dimension

gdf = ds.vector.to_gdf(reducer=mean)

gdf

| stations | geometry | waterlevel |
|---|---|---|
| 13670 | POINT (12.25342 45.25635) | 0.138058 |
| 2798 | POINT (12.22412 45.27100) | 0.140681 |
| 2799 | POINT (12.29736 45.22705) | 0.133489 |
| 13775 | POINT (12.75879 45.24902) | 0.121096 |
| 13723 | POINT (12.50244 45.25635) | 0.126971 |
| 2797 | POINT (12.26807 45.34424) | 0.134395 |
| 13822 | POINT (12.99316 45.24902) | 0.116675 |
| 2796 | POINT (12.34131 45.37353) | 0.130066 |
| 23305 | POINT (12.45850 45.41748) | 0.128541 |
| 22721 | POINT (12.42920 45.41748) | 0.128677 |

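The `reducer` collapses the time dimension so that each station ends up with a single value. A minimal sketch of that idea in plain Python follows. The station IDs are taken from the output above, but the series are made up for illustration, `reduce_over_time` is a hypothetical helper, and this is not how HydroMT implements the reduction.

```python
from statistics import fmean
from typing import Callable, Dict, Sequence

# Invented water level series per station (one value per time step)
series_per_station: Dict[int, Sequence[float]] = {
    13670: [0.10, 0.20, 0.30],
    2798: [0.00, 0.25, 0.50],
}


def reduce_over_time(
    series: Dict[int, Sequence[float]],
    reducer: Callable[[Sequence[float]], float] = fmean,
) -> Dict[int, float]:
    """Apply the reducer to each station's series, leaving one value per station."""
    return {station: reducer(values) for station, values in series.items()}


reduced = reduce_over_time(series_per_station)
peaks = reduce_over_time(series_per_station, reducer=max)
print(reduced)
print(peaks)
```

Any callable that maps a sequence to a single number can serve as the reducer in this sketch, for example `max` to keep the peak value instead of the mean.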

In [40]:
# Write the Dataset to an OGR-compliant NetCDF file
from os.path import join

ds.vector.to_netcdf(join("tmpdir","ds_ogr.nc"), ogr_compliant=True)

In [49]:
# The file is indeed OGR compliant, as ogrinfo is able to read it

!ogrinfo tmpdir/ds_ogr.nc

INFO: Open of `tmpdir/ds_ogr.nc'
      using driver `netCDF' successful.
Metadata:
  NC_GLOBAL#Conventions=CF-1.6
  NC_GLOBAL#coordinates=spatial_ref ogc_wkt
  NC_GLOBAL#GDAL=GDAL 3.6.1
  NC_GLOBAL#ogr_geometry_field=ogc_wkt
  NC_GLOBAL#ogr_layer_type=Point


