# Example: Working with geodatasets

In [1]:
from hydromt import DataCatalog

dc = DataCatalog(data_libs=["artifact_data=v1.0.0"])

This example illustrates some common GIS tasks and shows how the `vector` accessor of a DataArray/Dataset can be used to handle them. The data is accessed through the HydroMT [DataCatalog](../_generated/hydromt.data_catalog.DataCatalog.rst). For more information, see the [Reading vector data](reading_vector_data.ipynb) example.

## Geospatial attributes 

Some of the available geospatial attributes are listed below. For all attributes and methods, check the [HydroMT API reference](../api/api.rst).

In [2]:
# Get the water level dataset from the data catalog
# The water levels are time series per location (points; here in lat/lon)
ds = dc.get_geodataset("gtsmv3_eu_era5")

object: GeoDatasetXarrayDriver does not use kwarg predicate with value intersects.


In [3]:
# Coordinate reference system
ds.vector.crs

<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- undefined
Datum: World Geodetic System 1984
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich

In [4]:
# Geospatial data as geometry objects (we show the first 5 points)
ds.vector.geometry.head()

stations
13670    POINT (12.25342 45.25635)
2798       POINT (12.22412 45.271)
2799     POINT (12.29736 45.22705)
13775    POINT (12.75879 45.24902)
13723    POINT (12.50244 45.25635)
dtype: geometry

In [5]:
# Names of the x and y coordinates
(ds.vector.x_name, ds.vector.y_name)

('lon', 'lat')

In [6]:
# Bounding box of the geospatial data
ds.vector.bounds

array([12.22412, 45.22705, 12.99316, 45.62256])
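
The order of the values above reads as `(xmin, ymin, xmax, ymax)`. As a minimal sketch of how such a bounding box follows from point coordinates, the snippet below uses only the five stations printed earlier, so its upper bounds differ from the full 19-station result. The helper name `total_bounds` is hypothetical and is not part of the HydroMT API.

```python
# The first five station coordinates (lon, lat) shown in the geometry output above
points = [
    (12.25342, 45.25635),
    (12.22412, 45.271),
    (12.29736, 45.22705),
    (12.75879, 45.24902),
    (12.50244, 45.25635),
]


def total_bounds(coords):
    """Return (xmin, ymin, xmax, ymax) for an iterable of (x, y) pairs.

    Hypothetical helper for illustration only.
    """
    xs, ys = zip(*coords)  # split the pairs into x values and y values
    return (min(xs), min(ys), max(xs), max(ys))


bounds = total_bounds(points)
print(bounds)
```

The lower-left corner agrees with the notebook output because the westernmost and southernmost stations happen to be among the first five.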

## Reprojection

In [7]:
# Reproject data to Pseudo Mercator (EPSG: 3857)
ds_pm = ds.vector.to_crs(3857)

# Coordinate reference system
ds_pm.vector.crs

<Projected CRS: EPSG:3857>
Name: WGS 84 / Pseudo-Mercator
Axis Info [cartesian]:
- X[east]: Easting (metre)
- Y[north]: Northing (metre)
Area of Use:
- name: World between 85.06°S and 85.06°N.
- bounds: (-180.0, -85.06, 180.0, 85.06)
Coordinate Operation:
- name: Popular Visualisation Pseudo-Mercator
- method: Popular Visualisation Pseudo Mercator
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich

In [8]:
# Check that the coordinate values are indeed no longer in degrees
ds_pm.lon.values[0]

np.float64(1364044.4748761142)
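
To make the magnitude of this value less mysterious, here is a minimal sketch of the spherical forward formulas usually given for Pseudo-Mercator (EPSG:3857), applied to the first station. It assumes a sphere with the WGS 84 semi-major axis as radius; the function name is hypothetical, and this is an illustration, not how HydroMT or pyproj perform the transformation internally.

```python
import math

# Assumption: EPSG:3857 treats the Earth as a sphere whose radius equals
# the WGS 84 semi-major axis (in metres).
EARTH_RADIUS_M = 6378137.0


def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Convert geographic degrees to Pseudo-Mercator metres (hypothetical helper)."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


# First station from the geometry output: POINT (12.25342 45.25635)
x, y = lonlat_to_web_mercator(12.25342, 45.25635)
print(x, y)
```

The easting `x` can be compared with the value of `ds_pm.lon.values[0]` printed above.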

## Conversion

In [9]:
# Create an OGR-compliant dataset from ds
# When written in netCDF4 format, it can be read by OGR (osgeo; QGIS)
from numpy import mean

ds_ogr = ds.vector.ogr_compliant(reducer=mean)
ds_ogr

            Array                       Chunk
Bytes       152 B                       152 B
Shape       (19,)                       (19,)
Dask graph  1 chunks in 6 graph layers
Data type   float64 numpy.ndarray


In [10]:
# Convert the geospatial data to wkt strings
ds_wkt = ds.vector.to_wkt()
ds_wkt

            Array                       Chunk
Bytes       299.25 kiB                  299.25 kiB
Shape       (2016, 19)                  (2016, 19)
Dask graph  1 chunks in 2 graph layers
Data type   float64 numpy.ndarray


In [11]:
# Convert the geospatial data to geometry objects
ds_geom = ds.vector.to_geom()
ds_geom

            Array                       Chunk
Bytes       299.25 kiB                  299.25 kiB
Shape       (2016, 19)                  (2016, 19)
Dask graph  1 chunks in 2 graph layers
Data type   float64 numpy.ndarray


## I/O (Internal and External)

In [12]:
# Create a dummy GeoDataFrame for this example
import numpy as np
from geopandas import GeoDataFrame
from pyproj import CRS
from shapely.geometry import Point

from hydromt.gis import GeoDataArray

gdf = GeoDataFrame(
    [{"Loc": "I", "Stuff": 1}, {"Loc": "II", "Stuff": 2}],
    geometry=[Point(0, 0), Point(1, 1)],
    crs=CRS.from_epsg(4326),
)

ds_gdf = GeoDataArray.from_gdf(gdf, np.arange(gdf.index.size))
ds_gdf

In [13]:
# Convert the Dataset from the DataCatalog to a GeoDataFrame
# Besides stations, waterlevel also has a time dimension
# A GeoDataFrame cannot hold vector or 2D array values per feature, so to keep
# the variable it has to be reduced along the time dimension

gdf = ds.vector.to_gdf(reducer=mean)
gdf.head()

                           geometry  waterlevel
stations
13670     POINT (12.25342 45.25635)    0.138058
2798        POINT (12.22412 45.271)    0.140681
2799      POINT (12.29736 45.22705)    0.133489
13775     POINT (12.75879 45.24902)    0.121096
13723     POINT (12.50244 45.25635)    0.126971
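
Conceptually, `reducer=mean` collapses the time dimension so that one value per station remains. The sketch below shows that idea with the standard library only. The station IDs are taken from the output above, but the water level values are made up for illustration, and the code does not reproduce what HydroMT does internally.

```python
from statistics import fmean

station_ids = [13670, 2798, 2799]

# Hypothetical water levels: one row per time step, one column per station
waterlevel = [
    [0.10, 0.20, 0.30],  # t0
    [0.20, 0.30, 0.40],  # t1
    [0.30, 0.40, 0.50],  # t2
]

# zip(*rows) transposes the table, yielding one time series per station;
# fmean then reduces each series to a single value
reduced = {
    station: fmean(series)
    for station, series in zip(station_ids, zip(*waterlevel))
}
print(reduced)
```

Any callable that maps a series to a single number (for example a maximum or a median) could take the place of the mean in this sketch.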


In [14]:
# Write the Dataset to an ogr compliant netcdf
from os.path import join

ds.vector.to_netcdf(join("tmpdir", "ds_ogr.nc"), ogr_compliant=True)

# If GDAL's ogrinfo is available on the PATH, it can be used to check that the file is OGR-readable
!ogrinfo tmpdir/ds_ogr.nc

/usr/bin/bash: line 1: ogrinfo: command not found
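
The shell call fails here because the GDAL command-line tools are not installed in this environment. A minimal sketch of a more defensive check, assuming `ogrinfo` would be on the `PATH` when GDAL is installed: look the executable up first and only run it when it is found. The helper name `run_ogrinfo` is hypothetical.

```python
import shutil
import subprocess


def run_ogrinfo(path, executable="ogrinfo"):
    """Return the text output of ogrinfo for `path`, or None if the tool is missing.

    Hypothetical helper; it only wraps the command-line call used above.
    """
    exe = shutil.which(executable)  # None when the executable is not on the PATH
    if exe is None:
        return None
    result = subprocess.run(
        [exe, str(path)], capture_output=True, text=True, check=False
    )
    return result.stdout


output = run_ogrinfo("tmpdir/ds_ogr.nc")
if output is None:
    print("ogrinfo not found; install GDAL to verify the file")
else:
    print(output)
```

Returning `None` instead of raising keeps the notebook running on machines without GDAL, at the cost of skipping the verification there.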
