new GeoDataset #208

Closed
wants to merge 14 commits into from

DirkEilander
Contributor

@DirkEilander DirkEilander commented Oct 6, 2022

This new implementation of the GeoDataset allows any geometry type (not only points) to be combined with multi-dimensional data. My suggestion is to develop it next to the current implementation, which we will remove in a future release.

TODO:

  • parse GeoDataFrames to GeoDataset
  • GeoDataset to netcdf with conversion of geometry to wkt
  • parse netcdf with wkt to GeoDataset
  • write unit tests
  • decide on data accessor name (geo instead of vector ?)

Related issue: #177

Goal

  • easily work with geometry with associated multidimensional data variables
  • compared to the current implementation, it should support geometries other than points.

implementation notes:

  • use "vector" accessor for data array and dataset
  • keep vector accessor backwards compatible
  • use a geometry property to get a geopandas.array.GeometryArray for each of the dataset types listed below
  • expose some of the GeometryArray methods directly as methods of vector.
  • staticmethods: from_netcdf / from_gdf / from_wkt
  • methods for to_gdf / to_netcdf
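The accessor idea in these notes can be sketched as follows. This is a minimal illustration, not the code in this PR: the accessor name `vector_sketch`, the class name, and the `total_bounds` property are hypothetical, and the geometries are assumed to live in a `geometry` coordinate holding shapely objects.

```python
import numpy as np
import pandas as pd
import xarray as xr
from shapely.geometry import LineString, Point


# "vector_sketch" is a hypothetical accessor name used for this example only.
@xr.register_dataset_accessor("vector_sketch")
class VectorSketchAccessor:
    def __init__(self, ds: xr.Dataset):
        self._ds = ds

    @property
    def geometry(self) -> np.ndarray:
        """Object array of shapely geometries from the 'geometry' coordinate."""
        return self._ds["geometry"].values

    @property
    def total_bounds(self) -> tuple:
        """(xmin, ymin, xmax, ymax) over all geometries, exposed on the accessor."""
        bounds = np.array([geom.bounds for geom in self.geometry])
        return (
            bounds[:, 0].min(),
            bounds[:, 1].min(),
            bounds[:, 2].max(),
            bounds[:, 3].max(),
        )


def object_array(items) -> np.ndarray:
    """Build a 1-D object array element by element, so NumPy never unpacks geometries."""
    arr = np.empty(len(items), dtype=object)
    for i, item in enumerate(items):
        arr[i] = item
    return arr


# A point and a line share one dataset; the accessor does not care about the type.
geoms = object_array([Point(4.0, 52.0), LineString([(4.0, 52.0), (5.0, 53.0)])])
ds = xr.Dataset(
    data_vars={"waterlevel": (("time", "stations"), np.zeros((3, 2)))},
    coords={
        "time": pd.date_range("2022-10-06", periods=3),
        "stations": [1, 2],
        "geometry": ("stations", geoms),
    },
)
print(ds.vector_sketch.total_bounds)
```

Keeping the geometry logic behind an accessor means the underlying object stays a regular xarray Dataset, so existing selection and I/O code keeps working.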

Types of dataset on which the accessor should work:

    Dimensions:      (stations: N, time: NT)
    Coordinates:
      * time         (time) [np.datetime64]
      * stations     (stations) [int]
        lon          (stations) [float]
        lat          (stations) [float]
    Data variables:
        waterlevel   (time, stations) [float]

    Dimensions:      (stations: N, time: NT)
    Coordinates:
      * time         (time) [np.datetime64]
      * stations     (stations) [int]
        geometry     (stations) [numpy object with geometries]
    Data variables:
        waterlevel   (time, stations)

    Dimensions:      (stations: N, time: NT)
    Coordinates:
      * time         (time) [np.datetime64]
      * stations     (stations) [int]
        geometry     (stations) [numpy array of wkt str]
    Data variables:
        waterlevel   (time, stations)
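To make the relation between these three layouts concrete, here is a rough sketch of converting a dataset with lon/lat coordinates into one with shapely geometry objects, then into one with WKT strings, and back. The helper names (`lonlat_to_geometry`, `geometry_to_wkt`, `wkt_to_geometry`) are made up for this example and are not the API of this PR.

```python
import numpy as np
import pandas as pd
import xarray as xr
from shapely import wkt
from shapely.geometry import Point


def _object_array(items) -> np.ndarray:
    """Fill a 1-D object array element by element."""
    arr = np.empty(len(items), dtype=object)
    for i, item in enumerate(items):
        arr[i] = item
    return arr


def lonlat_to_geometry(ds: xr.Dataset) -> xr.Dataset:
    """lon/lat coordinates -> 'geometry' coordinate of shapely Points."""
    points = [Point(x, y) for x, y in zip(ds["lon"].values, ds["lat"].values)]
    return ds.drop_vars(["lon", "lat"]).assign_coords(
        geometry=("stations", _object_array(points))
    )


def geometry_to_wkt(ds: xr.Dataset) -> xr.Dataset:
    """shapely objects -> WKT strings (a form that can be written to netCDF)."""
    strings = [geom.wkt for geom in ds["geometry"].values]
    return ds.assign_coords(geometry=("stations", np.array(strings, dtype=str)))


def wkt_to_geometry(ds: xr.Dataset) -> xr.Dataset:
    """WKT strings -> shapely objects."""
    geoms = [wkt.loads(str(s)) for s in ds["geometry"].values]
    return ds.assign_coords(geometry=("stations", _object_array(geoms)))


ds_lonlat = xr.Dataset(
    data_vars={"waterlevel": (("time", "stations"), np.zeros((3, 2)))},
    coords={
        "time": pd.date_range("2022-10-06", periods=3),
        "stations": [1, 2],
        "lon": ("stations", [4.5, 5.25]),
        "lat": ("stations", [52.0, 51.5]),
    },
)

ds_geom = lonlat_to_geometry(ds_lonlat)  # geometry objects
ds_wkt = geometry_to_wkt(ds_geom)        # WKT strings
ds_back = wkt_to_geometry(ds_wkt)        # geometry objects again
print(ds_wkt["geometry"].values)
```

Only the lon/lat layout is restricted to points; the other two carry any geometry type, which is the motivation for the new implementation.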

Notes on geometry object
Right now the geometries are stored in the geometry property of the GeoDataset as a geopandas.array.GeometryArray, a NumPy duck-array type with additional geometry methods. It does not yet implement the __array_function__ protocol and therefore cannot yet be used directly as a coordinate or variable of an xarray Dataset. Instead, inside an xarray Dataset it is reduced to a plain NumPy object array.
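The difference between the two representations can be illustrated with a small sketch. It assumes geopandas and shapely are installed and only shows the general behaviour of converting a geometry array with `np.asarray`; it is not taken from the PR code.

```python
import numpy as np
from geopandas.array import from_shapely
from shapely.geometry import Point

# GeometryArray: geometry-aware container with vectorized properties such as .area
geom_array = from_shapely([Point(0.0, 0.0).buffer(1.0), Point(2.0, 2.0)])
areas = geom_array.area

# Converting to a plain NumPy array keeps the shapely objects but drops the methods.
plain = np.asarray(geom_array)
print(type(geom_array).__name__, plain.dtype)

# On the plain object array, geometry operations need an explicit loop.
areas_plain = np.array([geom.area for geom in plain])
```

This is why the geometry property rebuilds the geometry-aware array from the coordinate on access, instead of storing it in the Dataset itself.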

@DirkEilander
Contributor Author

@hboisgon Could you have a look at the implementation and let me know whether you agree with the approach?

@DirkEilander DirkEilander requested review from dalmijn and removed request for dalmijn January 12, 2023 16:00
@dalmijn
Contributor

dalmijn commented Jan 17, 2023

I wrote a unit test for an OGR-compliant dataset.

Maybe now or in the near future, rename the registration from 'geo' to 'vector'.

@dalmijn dalmijn closed this Jan 25, 2023