
ENH: expose all shapely functions as GeoSeries/GeometryArray methods #2010

Open
40 of 57 tasks
Tracked by #3201
martinfleis opened this issue Jul 16, 2021 · 30 comments

@martinfleis
Member

martinfleis commented Jul 16, 2021

shapely now contains several functions that could be exposed as methods at the GeoSeries/GeometryArray level.

A probably incomplete list of candidates (updated Feb 13, 2023):

From the upcoming Shapely 2.1:

  • has_m
  • get_m / .m
@Zeroto521
Contributor

About the API design:

Should these pygeos functions become GeoSeries/GeometryArray methods, or should they be grouped into accessors (after #1952 is done)?

Taking the coordinate operations as an example:

  • GeoSeries.count_coordinates(): simpler to use.
  • GeoSeries.coord.count_coordinates() (or GeoSeries.coord.count()): gives the different methods a hierarchical grouping.
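To make the two options concrete, here is a minimal sketch with hypothetical toy classes (`ToyGeoSeries` and `CoordAccessor` are not GeoPandas API) in which each geometry is just a list of (x, y) tuples. Both spellings compute the same per-geometry counts; they differ only in where the name lives.

```python
from typing import List, Tuple

Coords = List[Tuple[float, float]]  # toy geometry: a plain list of (x, y) vertices


class CoordAccessor:
    """Hypothetical accessor grouping coordinate operations under `.coord`."""

    def __init__(self, series: "ToyGeoSeries") -> None:
        self._series = series

    def count(self) -> List[int]:
        # Same computation as the flat method, reached through a namespace.
        return [len(geom) for geom in self._series.geoms]


class ToyGeoSeries:
    """Toy stand-in for a GeoSeries holding a list of toy geometries."""

    def __init__(self, geoms: List[Coords]) -> None:
        self.geoms = geoms

    # Option 1: a flat method on the series itself.
    def count_coordinates(self) -> List[int]:
        return [len(geom) for geom in self.geoms]

    # Option 2: an accessor namespace, used as s.coord.count().
    @property
    def coord(self) -> CoordAccessor:
        return CoordAccessor(self)


s = ToyGeoSeries([[(0, 0), (1, 1)], [(0, 0), (1, 0), (1, 1), (0, 0)]])
print(s.count_coordinates())  # [2, 4]
print(s.coord.count())        # [2, 4]
```

With pandas, the accessor variant would normally be registered through `pandas.api.extensions.register_series_accessor` rather than a hand-written property; the property here only keeps the sketch dependency-free.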

@martinfleis
Member Author

I imagined methods. They are shorter, there is surely no naming overlap since pygeos exposes everything as top-level functions, and they are consistent with the existing methods.

Do you think that there are some significant benefits of preferring accessors?

@bitanb1999

Hello @martinfleis, to implement measurements such as the Fréchet and Hausdorff distances, would we have to consider combinations of the different trajectories formed by the edges of the polygons in the GeoSeries, or is it simply a distance between two shapely objects?

@martinfleis
Member Author

@bitanb1999 No, it is a distance between two geometries. See the pygeos API.

We need to turn this API:

pygeos.hausdorff_distance(a, b, densify=None, **kwargs)

To this API:

geopandas.GeoSeries.hausdorff_distance(b, densify=None)

It is only about exposing those functions, as they are, as GeoPandasBase methods; nothing else. In most cases it should be straightforward, as we just need to mirror how other functions are already exposed.
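The function-to-method mapping can be sketched as follows, under toy assumptions: geometries are plain lists of vertices, and `hausdorff_distance` here is a hypothetical stand-in that computes the discrete Hausdorff distance over vertices only (no `densify`, no `**kwargs`). The point is the shape of the wrapper: the method supplies its own geometries as `a` and delegates everything else to the element-wise function.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Geom = List[Point]  # toy geometry: a list of vertices


def _directed_hausdorff(a: Geom, b: Geom) -> float:
    # Largest distance from any vertex of `a` to its nearest vertex of `b`.
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff_distance(a: Sequence[Geom], b: Sequence[Geom]) -> List[float]:
    """Hypothetical top-level, element-wise function (stand-in for the library one)."""
    return [max(_directed_hausdorff(x, y), _directed_hausdorff(y, x)) for x, y in zip(a, b)]


class ToyGeoSeries:
    def __init__(self, geoms: List[Geom]) -> None:
        self.geoms = geoms

    def hausdorff_distance(self, other: "ToyGeoSeries") -> List[float]:
        # The method adds no logic of its own: `self` becomes `a`, `other` becomes `b`.
        return hausdorff_distance(self.geoms, other.geoms)


a = ToyGeoSeries([[(0.0, 0.0), (1.0, 0.0)]])
b = ToyGeoSeries([[(0.0, 0.0), (1.0, 0.0), (1.0, 2.0)]])
print(a.hausdorff_distance(b))  # [2.0]
```

Real GeoSeries binary methods also deal with things like index alignment between the two series, which this sketch leaves out.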

@bitanb1999

@martinfleis Thank you for the clarification! That indeed seems pretty straightforward! Is this issue up for the GSoC 2022?

@martinfleis
Member Author

Is this issue up for the GSoC 2022?

Probably not; a small-size GSoC project is 175 hours, while implementing everything listed above will take under 10 hours. Have a look at the GSoC project ideas here https://github.com/geopandas/geopandas/wiki/Google-Summer-of-Code-2022 to get a sense of scale. Or pick one of them directly :).

@bitanb1999

Oh okay! Thank you!

@bitanb1999

Another question: is it possible to integrate GEE (Google Earth Engine) with GeoPandas for heavy, research-based visualisations?

@martinfleis
Member Author

Is it possible to integrate GEE with GeoPandas for heavy research-based visualisations?

I think geemap is your best option at the moment: https://geemap.org/notebooks/geopandas/. But we're off topic here :).

@bitanb1999

Yes, the question was associated with the GSoC ideas. Thank you for answering!

@martinfleis
Member Author

@bitanb1999 If you want to chat about GSoC ideas, feel free to move the discussion to our Gitter: https://gitter.im/geopandas/geopandas. Happy to discuss there!

@martinfleis martinfleis changed the title ENH: expose pygeos functions as GeoSeries/GeometryArray methods ENH: expose all shapely functions as GeoSeries/GeometryArray methods Aug 5, 2022
SimoParmeg added a commit to SimoParmeg/geopandas that referenced this issue Aug 27, 2022
@EwoutH
Contributor

EwoutH commented Dec 12, 2022

Probably not, the small-size GSoC project is 175 hours. Implementing all listed above will be a matter of <10 hours.

Do you still think this estimate is accurate?

It would be very nice to have all those functions exposed, and if this could be a potential GSoC project, the time to start thinking about that is around now.

@martinfleis
Member Author

Do you still think this estimate is accurate?

No. Given the number and variety of shapely functions not yet exposed, it will take significantly longer once proper tests and documentation are included. Some are easy, but many are not, and we'll need to figure out how to implement them seamlessly here.

We usually do GSoC under the NumFOCUS umbrella, so we have a bit more time.

@bretttully
Contributor

I'm very keen to get the precision functions from shapely 2/pygeos in. If someone can point me to the best place to put them, I'm happy to have a go at a PR.

Ideally, I'd like to have an API like

gdf = gpd.GeoDataFrame(...)  # maybe an optional constructor parameter?
gdf.set_precision(...)  # with and without inplace

One thing I would really like is to be able to (de)serialize this via file formats like GeoParquet, but I suspect that is asking way too much, or at least can be split into a separate issue :-)
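As a rough illustration of what a precision setter does to coordinates, here is a toy grid snap on a plain list of (x, y) tuples. `snap_to_grid` is a hypothetical helper, not shapely's `set_precision`, which additionally deals with topology (collapsed or invalid output) according to its `mode` argument.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def snap_to_grid(coords: List[Point], grid_size: float) -> List[Point]:
    """Round every coordinate to the nearest multiple of `grid_size` (toy model).

    Unlike shapely.set_precision, this does not repair collapsed or invalid
    geometries and does not drop duplicate vertices created by snapping.
    """
    if grid_size < 0:
        raise ValueError("grid_size must be non-negative")
    if grid_size == 0:
        # Mirrors shapely's convention that grid_size=0 means floating precision.
        return list(coords)
    # Python's round() uses round-half-to-even, which may differ from GEOS on ties.
    return [(round(x / grid_size) * grid_size, round(y / grid_size) * grid_size) for x, y in coords]


print(snap_to_grid([(1.26, 2.74)], 0.5))  # [(1.5, 2.5)]
```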

@martinfleis
Member Author

Historical reasons. As far as I remember, the GeoPandas API predates the pandas accessor API. Accessors would make more sense if the original object were still a pandas.DataFrame with an extension array, but we predate that possibility as well. With the Geo* subclasses, it feels okay to expose all of these directly.

@EwoutH
Copy link
Contributor

EwoutH commented Nov 18, 2023

I would love to be able to use GeoSeries.contains_xy() in the future!
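A `contains_xy` method would test raw x/y coordinate arrays against geometries without building Point objects first. The sketch below swaps in the even-odd (ray casting) rule on a single exterior ring with no holes as a stand-in for the GEOS-based implementation behind `shapely.contains_xy`; `point_in_ring` and this toy `contains_xy` are hypothetical helpers.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def point_in_ring(ring: Sequence[Point], x: float, y: float) -> bool:
    """Even-odd (ray casting) test of (x, y) against one closed ring.

    Points exactly on the boundary are not handled consistently here, whereas
    shapely.contains_xy follows the "contains" predicate and returns False for them.
    """
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            # x coordinate where this edge meets that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def contains_xy(ring: Sequence[Point], xs: Sequence[float], ys: Sequence[float]) -> List[bool]:
    # Element-wise over the coordinate arrays; no point geometries are created.
    return [point_in_ring(ring, x, y) for x, y in zip(xs, ys)]


square = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
print(contains_xy(square, [1.0, 3.0], [1.0, 1.0]))  # [True, False]
```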

@martinfleis
Member Author

martinfleis commented Nov 19, 2023

@EwoutH feel free to open a PR for that. Otherwise I'll tackle it before 1.0 is out, for sure.

@martinfleis
Member Author

get_dimensions overlaps in usage with has_z and is_empty, so I'm not sure it is worth implementing.

@martinfleis
Member Author

get_num_points, which returns the number of points of LineString and LinearRing geometries only, is also a candidate to skip, given that get_num_coordinates provides the same number and works for any geometry type.
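The overlap can be illustrated with a hypothetical toy representation, `(geom_type, coordinates)` with GeoJSON-like nesting. The two toy functions below only mimic the behaviour described above: a coordinate count that works for any type, and a point count that is meaningful for linear geometries and 0 otherwise. They are not shapely's implementations.

```python
from typing import Any, Tuple

Geom = Tuple[str, Any]  # toy geometry: (geom_type, GeoJSON-like coordinates)


def get_num_coordinates(geom: Geom) -> int:
    """Count every coordinate pair, for any toy type (closing vertices included)."""
    geom_type, coords = geom
    if geom_type == "Point":
        return 1 if coords else 0  # an empty point has no coordinates
    if geom_type in ("LineString", "LinearRing"):
        return len(coords)
    if geom_type == "Polygon":
        return sum(len(ring) for ring in coords)  # exterior ring plus any holes
    raise ValueError(f"unsupported toy type: {geom_type}")


def get_num_points(geom: Geom) -> int:
    """Only meaningful for linear geometries; 0 for every other type."""
    geom_type, coords = geom
    return len(coords) if geom_type in ("LineString", "LinearRing") else 0


line = ("LineString", [(0, 0), (1, 1), (2, 0)])
poly = ("Polygon", [[(0, 0), (1, 0), (1, 1), (0, 0)]])
print(get_num_points(line), get_num_coordinates(line))  # 3 3
print(get_num_points(poly), get_num_coordinates(poly))  # 0 4
```

For linear geometries the two counts coincide, which is the redundancy the comment points out.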
