Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Add estimate_utm_crs method to GeoSeries and GeoDataFrame #1646

Merged
merged 8 commits into from
Nov 27, 2020

Conversation

snowman2
Copy link
Contributor

@snowman2 snowman2 commented Oct 4, 2020

Closes #1528

Depends on release of pyproj 3+, but figured it wouldn't hurt to get the review process started.

@snowman2 snowman2 changed the title ENH: Add estimate_tum_crs method to GeoSeries and GeoDataFrame ENH: Add estimate_utm_crs method to GeoSeries and GeoDataFrame Oct 4, 2020
@snowman2 snowman2 marked this pull request as ready for review October 4, 2020 18:28
Copy link
Member

@martinfleis martinfleis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is already looking great. Thanks a lot. Let's wait to see if we can make it to 0.9 or 0.10.

geopandas/geodataframe.py Outdated Show resolved Hide resolved
geopandas/geodataframe.py Show resolved Hide resolved
geopandas/geoseries.py Outdated Show resolved Hide resolved
Copy link
Member

@martinfleis martinfleis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Few minor comments, but otherwise seems ready to go.

geopandas/geodataframe.py Outdated Show resolved Hide resolved
geopandas/geoseries.py Outdated Show resolved Hide resolved
geopandas/tests/test_geoseries.py Outdated Show resolved Hide resolved
snowman2 and others added 4 commits November 9, 2020 09:32
Co-authored-by: Martin Fleischmann <martin@martinfleischmann.net>
Co-authored-by: Martin Fleischmann <martin@martinfleischmann.net>
Co-authored-by: Martin Fleischmann <martin@martinfleischmann.net>

# ensure using geographic coordinates
if not self.crs.is_geographic:
geoseries = self.to_crs("EPSG:4326")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a thought. Would it make sense to reproject only total_bounds in this case and use them directly below? So we avoid potentially costly reprojection of all geometries to 4326. Truth is that I am not entirely certain if it could have some adverse consequences or not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point to bring up. Reprojecting just the total bounds would be more efficient, but would get a different set of corners in some cases. This would require some investigation to see what the potential issues would be and how much they matter. rasterio has a transform_bounds method that would be nice to be able to use. But, definitely not a dependency we would want to add.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They densify bounds with additional points before reprojection, that is smart. Probably a bit too complex for us if its going to be used only here, so let's keep it as it is. I guess that people will be using this mostly with 4326 anyway, so the reprojection will not even happen.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason we densify bounds in rasterio is to help account for projection distortion along the edges between the original total bounds, which would otherwise be ignored by only reprojecting the vertices of the outer bounds.

The actual implementation is pretty simple and could probably be done even more efficiently with a few lines using numpy (probably linspace).

However, for purposes here, you don't need to get the complete outer bounds, you need just enough to come up with a centroid that can be used to find an appropriate UTM zone. If the distortions are symmetric (e.g., if a function of latitude), then perhaps they are negligible here, since they'd be equal for parallel edges of the bounds.

One possible simplification would be to project the individual bounds of the geometries, then take the outer bounds of this. It is still an approximation, but possibly faster than projecting all vertices of all geometries first.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And the race is on: Toblerity/Fiona#982

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably better in pygeos/shapely, but not sure how fast the GEOS release process is.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For purposes of choosing the best UTM zone, is it necessary to preserve the overall shape of the geometries? It seems like mostly you want a reasonable approximation of the outer bounds of the geometry once projected to geographic coordinates. Thus densifying the box represented by total_bounds is a reasonable, fast approximation.

Also - to some degree extra densification, etc seems unnecessary, if the total_bounds fall within a single UTM zone, right? So another simplification (perhaps this is done in pyproj?) would be to see if the bounds overlap multiple UTM zones; if so, perhaps a bit more work densifying & reprojecting the box may be helpful, and if not, then proceed directly to returning that singular UTM zone.
(these optimizations could be done later)

It might be worthwhile to note in the doctstring that this will reproject all geometries to geographic coordinates if needed; that way the user knows they can sidestep that by using a simplified derived data frame (e.g., bounding boxes of all geometries, densified outer box, etc) to arrive at a faster result.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to some degree extra densification, etc seems unnecessary

The "estimate" part of the function says that we aren't claiming it is the best possible UTM for this dataframe, just one that is reasonable. Way too many ways to complicate this I think (I have thought of a couple since your post that I discarded). With that in mind, we could just transform the bounding coordinates directly with pyproj and call it done 😄 .

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could just transform the bounding coordinates directly with pyproj and call it done

That sounds reasonable 👍

@martinfleis martinfleis added this to the 0.9.0 milestone Nov 23, 2020
@martinfleis martinfleis merged commit a0edbee into geopandas:master Nov 27, 2020
@martinfleis
Copy link
Member

Thanks @snowman2!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ENH: method to estimate UTM zone of a gdf
3 participants