Zero-area polygons in POLYGON_INDEX_CELL result in incorrect responses #58639

Open
cpg314 opened this issue Jan 9, 2024 · 0 comments
Labels
potential bug To be reviewed by developers and confirmed/rejected.

cpg314 commented Jan 9, 2024

Describe what's wrong

A polygon dictionary with latitudes and longitudes is created with

CREATE DICTIONARY dictionary
(
     ident String,
     geom MultiPolygon
)
PRIMARY KEY geom
SOURCE(CLICKHOUSE(TABLE 'source'))
LIFETIME(MIN 0 MAX 0)
LAYOUT(POLYGON(STORE_POLYGON_KEY_COLUMN 1))

The multipolygons are simple: each is composed of a single axis-aligned rectangle.

Trying to avoid the segfault from #58612, I clamped the coordinates, which resulted in rectangles with zero area in the index.
This led to `dictGet`/`dictHas` queries returning nothing for points clearly inside polygons that were not clamped.

In other words, zero-area polygons in the index lead to incorrect query results.
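
To illustrate how clamping can produce such degenerate entries, here is a minimal Python sketch. It is not taken from the issue: the clamp bounds, the sample rectangle, and the helper names `clamp_ring` and `shoelace_area` are all hypothetical, since the report does not say which bounds were used.

```python
def clamp_ring(ring, lon_min, lon_max, lat_min, lat_max):
    """Clamp every (lon, lat) vertex of a ring into the given bounding box."""
    return [
        (min(max(lon, lon_min), lon_max), min(max(lat, lat_min), lat_max))
        for lon, lat in ring
    ]


def shoelace_area(ring):
    """Absolute area of a closed ring (first vertex repeated last), shoelace formula."""
    twice_area = sum(
        x0 * y1 - x1 * y0
        for (x0, y0), (x1, y1) in zip(ring, ring[1:])
    )
    return abs(twice_area) / 2.0


# Hypothetical rectangle lying entirely east of the (assumed) clamp box.
outside = [(20.0, 47.0), (21.0, 47.0), (21.0, 48.0), (20.0, 48.0), (20.0, 47.0)]
clamped = clamp_ring(outside, lon_min=0.0, lon_max=10.0, lat_min=40.0, lat_max=50.0)

print(shoelace_area(outside))  # 1.0
print(shoelace_area(clamped))  # 0.0: every longitude collapses onto lon_max
```

A check like this could be run over the source table before loading the dictionary, to filter out zero-area rows as a possible workaround.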

For example, with

geom=[[[(8.501196095367856,47.42624617738168),(8.594972446886741,47.42624617738168),(8.594972446886741,47.4898464715465),(8.501196095367856,47.4898464715465),(8.501196095367856,47.42624617738168)]]]
position=(8.548056,47.458056) 

one gets

SELECT
    pointInPolygon(position, geom[1][1]),
    dictGet(dictionary, 'ident', position)

┌─pointInPolygon(position, _subquery9)─┬─dictGet(dictionary, 'ident', position)─┐
│                                    1 │                                        │
└──────────────────────────────────────┴────────────────────────────────────────┘
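
As an independent cross-check of the `pointInPolygon` result above, the following Python sketch runs a plain even-odd ray-casting test on the same rectangle and point. The helper `point_in_ring` is hypothetical and is not how ClickHouse implements either function; it ignores points lying exactly on the boundary.

```python
def point_in_ring(point, ring):
    """Even-odd ray-casting test; ring is closed (first vertex repeated last)."""
    x, y = point
    inside = False
    for (x0, y0), (x1, y1) in zip(ring, ring[1:]):
        # Only edges that straddle the horizontal line through the point matter.
        if (y0 > y) != (y1 > y):
            # x coordinate at which the edge crosses that horizontal line.
            x_cross = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
            if x < x_cross:
                inside = not inside
    return inside


# Outer ring of the multipolygon from the issue (geom[1][1]).
ring = [
    (8.501196095367856, 47.42624617738168),
    (8.594972446886741, 47.42624617738168),
    (8.594972446886741, 47.4898464715465),
    (8.501196095367856, 47.4898464715465),
    (8.501196095367856, 47.42624617738168),
]
position = (8.548056, 47.458056)

print(point_in_ring(position, ring))  # True
```

This agrees with `pointInPolygon`, so an empty `dictGet` result for the same point is inconsistent with the geometry.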

Other observations that point to a bug in the index creation:

  • When switching to the naive POLYGON_SIMPLE implementation, the queries are correctly answered
  • The pointInPolygon result above is also correct
  • The error does not appear when the zero-area polygons are not added.
  • When switching to POLYGON_INDEX_EACH, I even get clearly erroneous answers: points are matched to a polygon they do not belong to.

Does it reproduce on recent release?

This is present on version 23.12.2.59 (official build; build id: 7F4C1A822F9C67A4D137A58F9A95BD4B0F1B6A8A, git hash: 17ab210).

How to reproduce

I can try to create a minimal reproducible example.

Expected behavior

  • The three polygon layouts (POLYGON_SIMPLE, POLYGON_INDEX_EACH and POLYGON_INDEX_CELL) should differ only in runtime, and their results should match pointInPolygon.
  • Zero-area polygons in the index should not result in incorrect results for other polygons.
@cpg314 cpg314 added the potential bug To be reviewed by developers and confirmed/rejected. label Jan 9, 2024