ENH: sjoin() should allow to return the distance when using the dwithin predicate, just like sjoin_nearest() #3270

Open
ShootingStarD opened this issue Apr 27, 2024 · 3 comments

Comments

@ShootingStarD

Is your feature request related to a problem?

GeoPandas recently added the dwithin method on GeoSeries, which checks whether geometry A lies within a maximum distance d of geometry B. The dwithin predicate can also be used for spatial joins.

It would be very neat to allow the two methods (especially geopandas.sjoin()) to return the distance value used for the dwithin match. For now I have to do the dwithin join first and then compute the distance. It would be better to do it in one go, just like geopandas.sjoin_nearest().

Describe the solution you'd like

def sjoin(
    left_df,
    right_df,
    how="inner",
    predicate="intersects",
    lsuffix="left",
    rsuffix="right",
    distance=None,
    distance_col=None,
    **kwargs,
):
    """Spatial join of two GeoDataFrames.

    See the User Guide page :doc:`../../user_guide/mergingdata` for details.


    Parameters
    ----------
    left_df, right_df : GeoDataFrames
    how : string, default 'inner'
        The type of join:

        * 'left': use keys from left_df; retain only left_df geometry column
        * 'right': use keys from right_df; retain only right_df geometry column
        * 'inner': use intersection of keys from both dfs; retain only
          left_df geometry column
    predicate : string, default 'intersects'
        Binary predicate. Valid values are determined by the spatial index used.
        You can check the valid values in left_df or right_df as
        ``left_df.sindex.valid_query_predicates`` or
        ``right_df.sindex.valid_query_predicates``
        Replaces deprecated ``op`` parameter.
    lsuffix : string, default 'left'
        Suffix to apply to overlapping column names (left GeoDataFrame).
    rsuffix : string, default 'right'
        Suffix to apply to overlapping column names (right GeoDataFrame).
    distance : number or array_like, optional
        Distance(s) around each input geometry within which to query the tree
        for the 'dwithin' predicate. If array_like, must be
        one-dimensional with length equal to length of left GeoDataFrame.
        Required if ``predicate='dwithin'``.
    distance_col : string, default None
        If set, save the distances computed between matching geometries under a
        column of this name in the joined GeoDataFrame.

    """

API breaking implications

Except for the added distance_col parameter, I do not see any breaking changes.

Describe alternatives you've considered

For now I have to do the dwithin join first and then compute the distance. It would be better and faster to do it in one go, just like geopandas.sjoin_nearest().

Linked Issues and PR

This feature request is linked to the following PR: #2900

@ShootingStarD
Author

@chris-hedemann I currently have no idea how to implement it.

The thing is that the current dwithin predicate support in sjoin is based on the query function from shapely. However, the current query() implementation does not return distances, unlike query_nearest().

Therefore I suppose we would have to wait for this issue to be resolved on the shapely side before implementing the behaviour in geopandas.
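To illustrate the difference, here is a small sketch against Shapely 2.x's STRtree (assuming a GEOS build that supports the dwithin predicate; the points are made up). query() yields index pairs only, while query_nearest() can also hand back distances.

```python
import numpy as np
from shapely import Point, STRtree

tree = STRtree([Point(3, 4), Point(10, 1)])
queries = np.array([Point(0, 0), Point(10, 0)])

# query(): only indices come back. For array input the result has shape
# (2, n): row 0 indexes into `queries`, row 1 into the tree geometries.
pairs = tree.query(queries, predicate="dwithin", distance=6)

# query_nearest(): with return_distance=True the distances are returned
# alongside the index pairs.
nearest_pairs, nearest_dist = tree.query_nearest(queries, return_distance=True)

print(pairs)
print(nearest_pairs, nearest_dist)
```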

What do you think?

@martinfleis
Member

GeoSeries.dwithin and shapely.dwithin are predicate checks, and it is easy for you to get the distance if you need it. It would be a call to shapely.distance internally anyway. I don't think this use case justifies changing and complicating the API, or the higher maintenance cost that comes with it.

I'd be fine with including the distance in sjoin, as it is a bit more complex to do that yourself and we have the existing sjoin_nearest that does that (though differently). You would need to compute distances after the indexing step and ensure they are correctly included in the final joined data frame.

@ShootingStarD
Author

@martinfleis I will update the name of the issue to align with your proposition

@ShootingStarD changed the title from "ENH: GeoDataFrame and GeoSeries.dwithin should allow to return the distances values" to "ENH: sjoin() should allow to return the distance when using the dwithin predicate, just lke sjoin_nearest()" on Apr 28, 2024
@ShootingStarD changed the title from "ENH: sjoin() should allow to return the distance when using the dwithin predicate, just lke sjoin_nearest()" to "ENH: sjoin() should allow to return the distance when using the dwithin predicate, just like sjoin_nearest()" on Apr 28, 2024