Conversation

@chay0112
Contributor

Did you read the Contributor Guide?

Is this PR related to a ticket?

What changes were proposed in this PR?

  • Implemented minimum_bounding_circle

How was this patch tested?

  • Included unit and parity tests

Did this PR include necessary documentation updates?

  • Yes, I have updated the documentation.

Collaborator

@petern48 left a comment

When you follow existing patterns, and things fail unexpectedly, it is totally fine to ask for help. Just be a bit more patient waiting for a reply because I (and most people) have very busy lives.

Having AI agent mode go crazy and make super odd changes is more likely to make your PR take longer to merge. I (and most reviewers) don't just care about tests passing, but also about code readability. Rule of thumb: if the generated code is harder for you to understand than the original, then it probably isn't a good change.

Honestly, I think we should revert this entire last commit; you're farther off now than you originally were. Your initial commit looked great. It just needed a bit of tweaking, and I'll point you in the right direction.

Comment on lines 1033 to 1040
-    @property
-    def minimum_bounding_circle(self) -> "GeoSeries":
-        spark_expr = stf.ST_MinimumBoundingCircle(self.spark.column)
-        return self._query_geometry_column(
-            spark_expr,
-            returns_geom=True,
-        )
+    def minimum_bounding_circle(self, quadrant_segments: int = None):
+        if quadrant_segments is None:
+            spark_expr = stf.ST_MinimumBoundingCircle(self.spark.column)
+        else:
+            spark_expr = stf.ST_MinimumBoundingCircle(
+                self.spark.column, quadrant_segments
+            )
+        return self._query_geometry_column(spark_expr, returns_geom=True)
Collaborator

We want to follow the Geopandas API. The docs for .minimum_bounding_circle() show that it doesn't have a quadrant_segments parameter, so we don't want to add one to our Python API either.
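
A minimal sketch of the method shape this points to, with no quadrant_segments argument. The `stf` object and the `GeoSeriesSketch` class below are stand-ins invented for illustration so the snippet runs without Spark; only the body of `minimum_bounding_circle` mirrors the code under review, and the final merged code may differ.

```python
import inspect
from types import SimpleNamespace

# Stand-in for the Sedona functions module (assumption for this sketch);
# it only records the call as a string instead of building a Spark column.
stf = SimpleNamespace(
    ST_MinimumBoundingCircle=lambda column: f"ST_MinimumBoundingCircle({column})"
)


class GeoSeriesSketch:
    """Hypothetical stand-in for the Sedona GeoSeries class."""

    def __init__(self, column: str):
        self.spark = SimpleNamespace(column=column)

    def _query_geometry_column(self, spark_expr, returns_geom: bool):
        # The real helper runs a Spark query; this stub returns the expression.
        return spark_expr

    def minimum_bounding_circle(self):
        # A plain method with no extra parameters, as the review comment asks.
        spark_expr = stf.ST_MinimumBoundingCircle(self.spark.column)
        return self._query_geometry_column(spark_expr, returns_geom=True)


series = GeoSeriesSketch("geometry")
result = series.minimum_bounding_circle()
params = list(inspect.signature(GeoSeriesSketch.minimum_bounding_circle).parameters)
print(result)  # ST_MinimumBoundingCircle(geometry)
print(params)  # ['self']
```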

Comment on lines 1323 to 1312
-        df_result = s.to_geoframe().minimum_bounding_circle
-        self.check_sgpd_equals_gpd(df_result, gpd_res)
+        tg = getattr(s, "to_geoframe")
+        gdf = tg() if callable(tg) else tg
+        mbc = getattr(gdf, "minimum_bounding_circle")
+        df_result = mbc() if callable(mbc) else mbc
+        self.check_sgpd_equals_gpd(df_result, expected)
Collaborator

This unnecessarily complicates the code and hurts readability. Let's stick with existing patterns. When things don't behave the way we want them to, we should adjust things a bit, not change completely different parts of the code.

Contributor Author

Totally get it. Sorry for making these a little complicated. I'll try my best to make it simpler.

Collaborator

What you had before is what we're looking for. Just need to add parentheses () to the end of the call to minimum_bounding_circle.

        df_result = s.to_geoframe().minimum_bounding_circle()
        self.check_sgpd_equals_gpd(df_result, gpd_res)
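
The parentheses matter because minimum_bounding_circle is a method, whereas accessors such as centroid are properties. A minimal sketch with a hypothetical stand-in class (`Shape` is not part of Sedona or Geopandas) shows what each form returns:

```python
class Shape:
    """Hypothetical stand-in that mimics the two accessor styles."""

    @property
    def centroid(self):
        # Properties are evaluated on attribute access; no call is needed.
        return "centroid result"

    def minimum_bounding_circle(self):
        # Methods must be called to produce a result.
        return "circle result"


shape = Shape()

without_parens = shape.minimum_bounding_circle  # a bound method object
with_parens = shape.minimum_bounding_circle()   # the actual result

print(callable(without_parens))  # True
print(with_parens)               # circle result
print(shape.centroid)            # centroid result
```

Without the parentheses, the test would pass a bound method to the comparison helper rather than a series of geometries.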

@chay0112
Contributor Author

Thank you for the feedback and for taking the time to review this. I completely understand your point and appreciate the guidance. I’ll revert the last commit and go back to the earlier version so we can refine it from there.

@petern48
Collaborator

Now, let's address the CI failure of your original commit. The reality is that the issue was in the test itself. I could explain why it was causing problems, but honestly it's a little complicated.

           # 2) Coverage parity — compare only where inputs are valid for both backends
           nonnull_nonempty_ps = (gs_in.isna() == False) & (gs_in.is_empty == False)

Instead, I'd like you to take a step back and rewrite the test. Don't overthink it. You honestly don't need to understand much about Sedona as a project to contribute these Geopandas functions. A lot of it is just copy-paste and following patterns. Here's what I'll ask you to do:

  1. Go to test_match_geopandas_series.py, and delete all of the contents of the test_minimum_bounding_circle test.
  2. Take a look at the following functions in test_match_geopandas_series.py. Then, guess how you should implement test_minimum_bounding_circle. It's very simple, much simpler than the original test in this PR. You shouldn't need to ask AI at all. You're welcome to copy and paste.

def test_centroid(self):
    for geom in self.geoms:
        sgpd_result = GeoSeries(geom).centroid
        gpd_result = gpd.GeoSeries(geom).centroid
        self.check_sgpd_equals_gpd(sgpd_result, gpd_result)

If you're still confused, you're welcome to ask for help. We might need to make some small adjustments to make things pass, but this is a lot closer to what our desired end result is.

@chay0112
Contributor Author

Excellent, thanks for the direction. I'll check this.

@chay0112
Contributor Author

chay0112 commented Nov 11, 2025

@petern48 Thanks for the great suggestion! I tried a similar approach to what’s used in the centroid test.

def test_minimum_bounding_circle(self):
    for geom in self.geoms:
        sgpd_result = GeoSeries(geom).minimum_bounding_circle()
        gpd_result = gpd.GeoSeries(geom).minimum_bounding_circle()
        self.check_sgpd_equals_gpd(sgpd_result, gpd_result)

The above test fails because the tolerance in the check_sgpd_equals_gpd method is currently set to 1e-2.
However, increasing the tolerance to 0.5 allows the test to pass. Would you prefer that I update the tolerance to 0.5 in the check_sgpd_equals_gpd method, or handle this as a case-specific adjustment directly within my test, as shown below? Or am I completely wrong in my analysis?

def test_minimum_bounding_circle(self):
    for geom in self.geoms:
        sgpd_result = GeoSeries(geom).minimum_bounding_circle()
        gpd_result = gpd.GeoSeries(geom).minimum_bounding_circle()
        for s, g in zip(sgpd_result.to_pandas(), gpd_result):
            assert s.is_valid and g.is_valid
            assert abs(s.area - g.area) < 0.5

No rush! Please take your time to reply.

@petern48
Collaborator

def test_minimum_bounding_circle(self):
    for geom in self.geoms:
        sgpd_result = GeoSeries(geom).minimum_bounding_circle()
        gpd_result = gpd.GeoSeries(geom).minimum_bounding_circle()
        self.check_sgpd_equals_gpd(sgpd_result, gpd_result)

Yes! Now we're on the right track. I'll explain the code a little more, just for your knowledge. test_match_geopandas_series.py is for directly comparing our GeoSeries() results with the original geopandas gpd.GeoSeries results. This code does exactly that: it loops through a ton of different geometries (self.geoms) and checks that our results (sgpd_result) are equal (within a tolerance) to the original geopandas results (gpd_result).

However, increasing the tolerance to 0.5 allows the test to pass. Would you prefer that I update the tolerance to 0.5 in the check_sgpd_equals_gpd method

Nice job figuring this out. I think the best way forward is to add a parameter to check_sgpd_equals_gpd and override the tolerance to use 0.5 just for test_minimum_bounding_circle. Something like this:

    @classmethod
    def check_sgpd_equals_gpd(
        cls,
        actual: GeoSeries,
        expected: gpd.GeoSeries,
+        tolerance: float = 1e-2
    ):
        ...
        for a, e in zip(sgpd_result, expected):
            ...
            cls.assert_geometry_almost_equal(
-                a, e, tolerance=1e-2 # 1e-2
+                a, e, tolerance
            )
            ...

Then you can call it as self.check_sgpd_equals_gpd(sgpd_result, gpd_result, tolerance=0.5) in your test function. This approach still loosens the test so it can pass, while avoiding loosening the tests for other functions. If other functions still pass with a stricter test, we don't need to loosen it for them too. The definition of that function is in test_geopandas_base.py.
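
The override pattern can be sketched without Sedona or Geopandas. This is a simplified illustration, not the project's test code: `ParityTestBase`, `check_values_close`, and the float inputs are hypothetical stand-ins for the shared base class, check_sgpd_equals_gpd, and the geometry series.

```python
import math


class ParityTestBase:
    """Hypothetical stand-in for the shared test base class."""

    @classmethod
    def check_values_close(cls, actual, expected, tolerance: float = 1e-2):
        # The default stays strict; individual tests opt in to a looser bound.
        assert len(actual) == len(expected)
        for a, e in zip(actual, expected):
            assert math.isclose(a, e, abs_tol=tolerance), (
                f"{a} != {e} within tolerance {tolerance}"
            )


class ParityTests(ParityTestBase):
    def test_strict_case(self):
        # Uses the default tolerance of 1e-2.
        self.check_values_close([1.000, 2.000], [1.005, 1.995])

    def test_looser_case(self):
        # Only this test relaxes the tolerance.
        self.check_values_close([1.0, 2.0], [1.3, 1.8], tolerance=0.5)


tests = ParityTests()
tests.test_strict_case()
tests.test_looser_case()
```

Keeping the default at 1e-2 means every existing caller behaves exactly as before, and the looser bound is visible at the one call site that needs it.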

@chay0112
Contributor Author

Awesome, that’s one of the best explanations I’ve ever received!

Collaborator

@petern48 left a comment

Looks great! Just one last minor nit to address.

Now that we've slowed down and spent a bit more time helping you understand a few aspects of the code, you'll (hopefully) be better prepared to contribute more functions 😉

Co-authored-by: Peter Nguyen <petern0408@gmail.com>
@chay0112
Contributor Author

Absolutely! I really appreciate you taking the time to walk me through the details. It definitely helped me understand things better and feel more confident about contributing further.

@petern48 merged commit 8de7008 into apache:master on Nov 11, 2025
30 of 31 checks passed
@petern48
Collaborator

Thanks @chay0112!

Development

Successfully merging this pull request may close these issues.

Geopandas: Implement minimum_bounding_circle

2 participants