Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix a bug in ShapeTestUtil #12287

Merged
merged 1 commit into from
Jan 24, 2024
Merged

Fix a bug in ShapeTestUtil #12287

merged 1 commit into from
Jan 24, 2024

Conversation

heemin32
Copy link
Contributor

Description

boxPolygon and trianglePolygon only uses minX and minY value of give XY Rectangle which results in a polygon with points in single place. With this changes, both methods generate correct polygons.

Copy link
Contributor

@stefanvodita stefanvodita left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice find! I wonder if we could put a test in place that would have reliably detected this issue.

@heemin32
Copy link
Contributor Author

Nice find! I wonder if we could put a test in place that would have reliably detected this issue.

This is a code for test actually. If we are writing a test using this method, we will detect the issue and that is how I found it.

@stefanvodita
Copy link
Contributor

I should have been clearer. I'm assuming the existing tests that call nextPolygon() are passing before and after the fix, which warrants some suspicion. Could we devise a test that would use these utility methods and fail without this fix? It sounds like you were already writing a test that would serve this purpose. Let me know if I've got it wrong.

@heemin32
Copy link
Contributor Author

I should have been clearer. I'm assuming the existing tests that call nextPolygon() are passing before and after the fix, which warrants some suspicion. Could we devise a test that would use these utility methods and fail without this fix? It sounds like you were already writing a test that would serve this purpose. Let me know if I've got it wrong.

Got your point. Actually, I am using this utility methods in other dependency repos, OpenSearch. There, the test fails without this fix. Let me see if I can add a test using this utility method in lucene.

@mikemccand
Copy link
Member

Thanks @heemin32 for taking the effort to bring the fix down to Lucene, from OpenSearch test failures. A dedicated Lucene unit test would be great. Maybe @nknize could help evaluate the change?

@stefanvodita
Copy link
Contributor

stefanvodita commented Nov 4, 2023

The test could call the modified methods with a random box and assert that all vertices of the given polygon are different. We can reuse hasDuplicateVertices from #12757.

Copy link

github-actions bot commented Jan 8, 2024

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the dev@lucene.apache.org list. Thank you for your contribution!

@github-actions github-actions bot added the Stale label Jan 8, 2024
Copy link
Member

@nknize nknize left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@heemin32
Copy link
Contributor Author

heemin32 commented Jan 8, 2024

There are existing tests which should fails. https://github.com/apache/lucene/blob/main/lucene/core/src/test/org/apache/lucene/document/TestShapeDocValues.java#L67

However because we are catching exception and try until we get the correct polygon, it never fails.
https://github.com/apache/lucene/blob/main/lucene/core/src/test/org/apache/lucene/document/BaseXYShapeTestCase.java#L228-L235

I cannot remove the try-catch statement as of now because there are bug in surpriseMePolygon() and createRegularPolygon() which fails to create a valid polygon because of type casting from double to float. One of attempt to fix it is in #12757.

I would like to wait until the other issues are resolved and remove the try-catch statement rather than writing a unit test for just the test utility code.

@github-actions github-actions bot removed the Stale label Jan 9, 2024
@stefanvodita
Copy link
Contributor

We would still have a few more issues to solve after #12757, which are documented in #12596, but overall I agree that it makes sense to merge this as is.

@nknize
Copy link
Member

nknize commented Jan 9, 2024

There are existing tests which should fails. https://github.com/apache/lucene/blob/main/lucene/core/src/test/org/apache/lucene/document/TestShapeDocValues.java#L67
However because we are catching exception and try until we get the correct polygon, it never fails.

That try-catch is intentional. The evil, esoteric / invalid test geometry generation is also intentional so we can maximize our test coverage across the tessellator and Shape indexing. This is why we have Tessellator logic to filter colinear and coplanar points so we can best effort index shapes that may not be well formed or may not have been pre-validated or cleaned. While we expect well formed data, that's not the real world. Users (especially geo users) often throw dirty data at the API. And to pre-validate geometries is an expensive operation. Not all users want to pay that price. This is why we left much of that diligence out of Lucene and in the hands of the integrating API. For example, see Elasticsearch ignore_malformed parameter. For the explicit shape doc value bounding box test referenced, we expect a valid polygon bounding box for test purposes, but (as just pointed out) that reused polygon generator could create a malformed one and an invalid bounding box. Rather than write a new polygon generator I opted to reuse the existing one and just retry until a valid polygon is created. What you could do is just write a new polygon generator that always creates a valid well-formed polygon. I'd be concerned, though, about new contributors always using that to create perfect data, which is not real world.

@heemin32
Copy link
Contributor Author

heemin32 commented Jan 9, 2024

That try-catch is intentional.

However, the implementation between latlon and xy are different then.

public void testLatLonPolygonCentroid() {
Polygon p = GeoTestUtil.nextPolygon();
Point expected = (Point) computeCentroid(p);
List<ShapeField.DecodedTriangle> tess = getTessellation(p);
LatLonShapeDocValuesField dvField = LatLonShape.createDocValueField(FIELD_NAME, p);
assertEquals(tess.size(), dvField.numberOfTerms());
assertEquals(expected.getLat(), dvField.getCentroid().getLat(), TOLERANCE);
assertEquals(expected.getLon(), dvField.getCentroid().getLon(), TOLERANCE);
assertEquals(TYPE.TRIANGLE, dvField.getHighestDimensionType());
}
public void testXYPolygonCentroid() {
XYPolygon p = (XYPolygon) BaseXYShapeTestCase.ShapeType.POLYGON.nextShape();
XYPoint expected = (XYPoint) computeCentroid(p);
XYShapeDocValuesField dvField = XYShape.createDocValueField(FIELD_NAME, getTessellation(p));
assertEquals(expected.getX(), dvField.getCentroid().getX(), TOLERANCE);
assertEquals(expected.getY(), dvField.getCentroid().getY(), TOLERANCE);
assertEquals(TYPE.TRIANGLE, dvField.getHighestDimensionType());
}

For latlon, it does not have try-catch statement wrapping GeoTestUtil.nextPolygon(). I believe GeoTestUtil.nextPolygon() returns valid polygon always but ShapeTestUtil.nextPolygon() does not.

@nknize
Copy link
Member

nknize commented Jan 9, 2024

... I believe GeoTestUtil.nextPolygon() returns valid polygon always but ShapeTestUtil.nextPolygon() does not.

A quick glance looks like the difference is from LUCENE-9192 where ShapeTestUtil#nextPolygon will only consider executing the try-catch path on Nightly runs. GeoTestUtil will consider it 10% of the time..

@heemin32
Copy link
Contributor Author

heemin32 commented Jan 9, 2024

Yes. But what I meant is testLatLonPolygonCentroid() test calls Polygon p = GeoTestUtil.nextPolygon(); directly, but testXYPolygonCentroid() test calls XYPolygon p = (XYPolygon) BaseXYShapeTestCase.ShapeType.POLYGON.nextShape(); which calls ShapeTestUtil.nextPolygon(); inside try-catch. We need to call ShapeTestUtil.nextPolygon(); directly from testXYPolygonCentroid().

@nknize
Copy link
Member

nknize commented Jan 9, 2024

We need to call ShapeTestUtil.nextPolygon(); directly from testXYPolygonCentroid().

Doing that could create a polygon that can't be tessellated.

@heemin32
Copy link
Contributor Author

heemin32 commented Jan 9, 2024

Doing that could create a polygon that can't be tessellated.

But testLatLonPolygonCentroid() does call GeoTestUtils.nextPolygon() directly and tessellation works fine.

public void testLatLonPolygonCentroid() {
Polygon p = GeoTestUtil.nextPolygon();
Point expected = (Point) computeCentroid(p);
List<ShapeField.DecodedTriangle> tess = getTessellation(p);
LatLonShapeDocValuesField dvField = LatLonShape.createDocValueField(FIELD_NAME, p);
assertEquals(tess.size(), dvField.numberOfTerms());
assertEquals(expected.getLat(), dvField.getCentroid().getLat(), TOLERANCE);
assertEquals(expected.getLon(), dvField.getCentroid().getLon(), TOLERANCE);
assertEquals(TYPE.TRIANGLE, dvField.getHighestDimensionType());
}

Copy link

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the dev@lucene.apache.org list. Thank you for your contribution!

@github-actions github-actions bot added the Stale label Jan 24, 2024
@heemin32
Copy link
Contributor Author

Any update?

@stefanvodita
Copy link
Contributor

Maybe we can merge this fix as-is and continue the conversation about where test polygon generation should produce valid polygons in #12596. It's not clear to me now if there is anything to fix in #12596, but I think we agree that the fix in this PR should go ahead.

@heemin32 - do you want to write a CHANGES entry for Lucene 9.10?

@mikemccand
Copy link
Member

Maybe we can merge this fix as-is and continue the conversation about where test polygon generation should produce valid polygons in #12596. I

+1. Progress not perfection!

boxPolygon and trianglePolygon only uses minX and minY value of give XY Rectangle which results in a polygon with points in single place. With this changes, both methods generate correct polygons.
@heemin32
Copy link
Contributor Author

Added CHANGELOG entry

@stefanvodita stefanvodita merged commit 3a205fe into apache:main Jan 24, 2024
4 checks passed
@stefanvodita
Copy link
Contributor

Thank you for persevering @heemin32!

@heemin32 heemin32 deleted the test-bug-fix branch January 24, 2024 18:19
stefanvodita pushed a commit that referenced this pull request Jan 24, 2024
boxPolygon and trianglePolygon only uses minX and minY value of give XY Rectangle which results in a polygon with points in single place. With this changes, both methods generate correct polygons.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants