Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LUCENE-10470: [Tessellator] Prevent bridges that introduce collinear edges #756

Merged
merged 25 commits into from
Apr 27, 2022

Conversation

iverase
Copy link
Contributor

@iverase iverase commented Mar 21, 2022

Similar to LUCENE-9580, It might happen that the new edge introduced when removing a tile is collinear with edges from the resulting polygon. This situation makes this edges not eligible for filtering because of LUCENE-8620 and it makes the logic fail.

This change makes sure we don't introduce collinear edges when removing holes.

@yixunx
Copy link
Contributor

yixunx commented Mar 31, 2022

Thank you for fixing this @iverase. However I just ran into the same Tessellation error using this branch on a different polygon (repro here, and I also attached the geojson to https://issues.apache.org/jira/browse/LUCENE-10470). Interestingly, the error only starts happening after the changes in this branch, and does not happen on the current main branch (tested using my branch that runs different Tessellators), so it's probably a side effect of the changes in this PR. Do you know what might be causing this error?

@iverase
Copy link
Contributor Author

iverase commented Apr 1, 2022

@yixunx thanks for the input.

I revised the issue again and it seems the fix is much simpler. We are actually tessellating properly the polygons but we are not detecting it properly because at the end of the process we still have some edges but those edges are just a line that can be discarded.

I have added a check that test if the final edges have an area and if they don't, it won't fail. Test are passing and visual inspection of the tessellation looks good.

@yixunx
Copy link
Contributor

yixunx commented Apr 4, 2022

@iverase As mentioned in my latest comment on Jira, I found a shape that fails after the latest fix because the final edges still have an area. I was able to get that shape to work by re-introducing your checkNonCollinearLines fix, but doing that made testComplexPolygon47 fail again. I created a branch with the repro if you want to take a look.

@iverase
Copy link
Contributor Author

iverase commented Apr 5, 2022

Thanks @Yinux! your dataset is very interesting because it seems your geometries have many parallel edges which are challenging the tessellator.

In order to fix that I went back to the PR that changed the behaviour of the tessellator. In that PR we added the capability of labelling the edges of the triangles to know if they belong to the original polygon or they have been introduced by the tessellation process. One side effect is that the method filterPoints which removes collinear points is more picky now because it can only remove collinear points when they have the same label (belonging to the original polygon or not).

In this case the issue was a collinear edge where one of the edges was returning over the previous edge and had a point on top of that edge. We can actually remove that collinearity as well so we end up with only one edge.

I added that change in line 1306 and now all test are passing. I run this version over my dataset (over 400 million polygons) and all passed too.

Let me know if you find more issues.

@iverase
Copy link
Contributor Author

iverase commented Apr 12, 2022

@yixunx if there is no further input I am planning to push this change shortly.

@yixunx
Copy link
Contributor

yixunx commented Apr 12, 2022

@iverase The latest run of our indexing pipeline revealed some more shapes that fail to tessellate. This time I made changes so that we get all failing shapes instead of just one. There are 1.3 million failing shapes, and I'm planning to submit a sample of them this week. However, I haven't looked into the shapes yet so I'm not sure if the failures are related to this issue. I can open a separate ticket with the new shapes if you are going to merge this PR soon. Thanks!

@iverase
Copy link
Contributor Author

iverase commented Apr 12, 2022

Ups I was not expecting so many failures. I prefer to wait until we find what is hopefully some pathological issue that can be fixed.

@yixunx
Copy link
Contributor

yixunx commented Apr 17, 2022

Hi @iverase it turns out that many of my failed shapes failed because they contain self-intersections and I set checkSelfIntersections = true when running the tessellator. They do not run into exceptions if I set the flag to false. I did get one shape that failed with "unable to tessellate" though:

{
  "type": "Polygon",
  "coordinates": [
    [
      [
        33.45959656411607,
        -55.440650754647535
      ],
      [
        33.45959656411607,
        -55.4409285324253
      ],
      [
        33.45904100856052,
        -55.4409285324253
      ],
      [
        33.45904100856052,
        -55.441206310203086
      ],
      [
        33.458763230782736,
        -55.441206310203086
      ],
      [
        33.458763230782736,
        -55.44231742131419
      ],
      [
        33.4593187863383,
        -55.44231742131419
      ],
      [
        33.4593187863383,
        -55.44259519909197
      ],
      [
        33.45959656411607,
        -55.44259519909197
      ],
      [
        33.45959656411607,
        -55.44231742131419
      ],
      [
        33.45987434189385,
        -55.44231742131419
      ],
      [
        33.45987434189385,
        -55.44259519909197
      ],
      [
        33.46015211967163,
        -55.44259519909197
      ],
      [
        33.46015211967163,
        -55.44287297686975
      ],
      [
        33.460429897449416,
        -55.44287297686975
      ],
      [
        33.460429897449416,
        -55.44315075464752
      ],
      [
        33.46070767522719,
        -55.44315075464752
      ],
      [
        33.46070767522719,
        -55.44287297686975
      ],
      [
        33.460985453004966,
        -55.44287297686975
      ],
      [
        33.460985453004966,
        -55.44231742131419
      ],
      [
        33.46126323078274,
        -55.44231742131419
      ],
      [
        33.46126323078274,
        -55.44176186575864
      ],
      [
        33.460985453004966,
        -55.44176186575864
      ],
      [
        33.460985453004966,
        -55.441206310203086
      ],
      [
        33.46126323078274,
        -55.441206310203086
      ],
      [
        33.46126323078274,
        -55.440650754647535
      ],
      [
        33.461541008560516,
        -55.440650754647535
      ],
      [
        33.461541008560516,
        -55.44037297686975
      ],
      [
        33.4618187863383,
        -55.44037297686975
      ],
      [
        33.4618187863383,
        -55.44009519909197
      ],
      [
        33.463207675227196,
        -55.44009519909197
      ],
      [
        33.463207675227196,
        -55.43926186575864
      ],
      [
        33.46292989744941,
        -55.43926186575864
      ],
      [
        33.46292989744941,
        -55.43898408798086
      ],
      [
        33.46265211967163,
        -55.43898408798086
      ],
      [
        33.46265211967163,
        -55.43926186575864
      ],
      [
        33.46070767522719,
        -55.43926186575864
      ],
      [
        33.46070767522719,
        -55.43898408798086
      ],
      [
        33.46015211967163,
        -55.43898408798086
      ],
      [
        33.46015211967163,
        -55.43953964353642
      ],
      [
        33.45987434189385,
        -55.43953964353642
      ],
      [
        33.45987434189385,
        -55.43981742131419
      ],
      [
        33.45959656411607,
        -55.43981742131419
      ],
      [
        33.45959656411607,
        -55.44037297686975
      ],
      [
        33.4593187863383,
        -55.44037297686975
      ],
      [
        33.4593187863383,
        -55.440650754647535
      ],
      [
        33.45959656411607,
        -55.440650754647535
      ]
    ],
    [
      [
        33.460985453004966,
        -55.4409285324253
      ],
      [
        33.460985453004966,
        -55.441206310203086
      ],
      [
        33.46070767522719,
        -55.441206310203086
      ],
      [
        33.46070767522719,
        -55.4409285324253
      ],
      [
        33.460985453004966,
        -55.4409285324253
      ]
    ],
    [
      [
        33.46070767522719,
        -55.44009519909197
      ],
      [
        33.46070767522719,
        -55.4409285324253
      ],
      [
        33.460429897449416,
        -55.4409285324253
      ],
      [
        33.460429897449416,
        -55.44037297686975
      ],
      [
        33.45987434189385,
        -55.44037297686975
      ],
      [
        33.45987434189385,
        -55.44009519909197
      ],
      [
        33.46070767522719,
        -55.44009519909197
      ]
    ],
    [
      [
        33.460429897449416,
        -55.441206310203086
      ],
      [
        33.46070767522719,
        -55.441206310203086
      ],
      [
        33.46070767522719,
        -55.44203964353641
      ],
      [
        33.45987434189385,
        -55.44203964353641
      ],
      [
        33.45987434189385,
        -55.44176186575864
      ],
      [
        33.460429897449416,
        -55.44176186575864
      ],
      [
        33.460429897449416,
        -55.441206310203086
      ]
    ],
    [
      [
        33.45959656411607,
        -55.441206310203086
      ],
      [
        33.45959656411607,
        -55.4409285324253
      ],
      [
        33.46015211967163,
        -55.4409285324253
      ],
      [
        33.46015211967163,
        -55.441206310203086
      ],
      [
        33.45959656411607,
        -55.441206310203086
      ]
    ],
    [
      [
        33.460985453004966,
        -55.44037297686975
      ],
      [
        33.460985453004966,
        -55.43981742131419
      ],
      [
        33.46126323078274,
        -55.43981742131419
      ],
      [
        33.46126323078274,
        -55.44037297686975
      ],
      [
        33.460985453004966,
        -55.44037297686975
      ]
    ]
  ]
}

I ran it through different tessellator versions using my branch and this is the first shape I got that failed in Lucene 8.2.0 (all previous shapes only failed on Lucene 8.3.0 or higher).

By the way I have a tangential question: the Tessellator javadoc says the polygon cannot have self-intersections, but it seems that the tessellator runs fine on shapes with self intersections unless I set checkSelfIntersections = true. Is this expected? What would the tessellator return when I set the flag to false? (I saw you implemented the flag in #428 so figured you might have some context).

@iverase
Copy link
Contributor Author

iverase commented Apr 25, 2022

Hi @yixunx,

I had a look to the polygon and it seems to be a totally different type of error and I would prefer to handle it as a separate issue. In the meanwhile I propose to push this change as it is as it contains good fixes for the other cases, what do you think?

By the way I have a tangential question: the Tessellator javadoc says the polygon cannot have self-intersections, but it seems that the tessellator runs fine on shapes with self intersections unless I set checkSelfIntersections = true

I was not the original contributor for this class so I cannot answer that question on the javadocs and I have always found estrange that javadocs says that polygons should not have self-intersections and then we have a method called cureLocalIntersections which it seems to expect self-intersections (currently this method never gets exercise on the test and I am always considering if it should be removed). I wonder if this method is making your polygons to succeed.

I would porpoase to open another issue with this subject so we can capture the intention of the javadocs and see if we should take any action.

@yixunx
Copy link
Contributor

yixunx commented Apr 25, 2022

Sounds good, thank you for looking into this @iverase. I think it makes sense to merge this PR as you suggested, and I'll open separate tickets for the latest failing shape and for my questions about self intersections.

@iverase iverase merged commit 5d3ab09 into apache:main Apr 27, 2022
iverase added a commit that referenced this pull request Apr 27, 2022
…r edges (#756)

 Check if polygon has been successfully tessellated before we fail (we are failing some valid
  tessellations) and allow filtering edges that fold on top of the previous one
@iverase iverase deleted the LUCENE-10470 branch April 27, 2022 08:25
wjp719 added a commit to wjp719/lucene that referenced this pull request Apr 28, 2022
* main: (68 commits)
  LUCENE-10292: Suggest: Fix FreeTextSuggester so that getCount() returned results consistent with lookup() during concurrent build()
  LUCENE-10525 Improve WindowsFS emulation to catch invalid file names (apache#829)
  LUCENE-10508:  Use MIN_WIDE_EXTENT for all wide rectangles (apache#845)
  LUCENE-10470: [Tessellator] Fix some failing polygons due to collinear edges (apache#756)
  LUCENE-10508: Fix error for rectangles with an extent close to 180 degrees (apache#824)
  LUCENE-10529: Fix TestTaxonomyFacetAssociations NPE when randomly indexing no documents for dim
  fix path to jar file in demo documentation
  LUCENE-10499: reduce unnecessary copy data overhead when growing array size (apache#786)
  LUCENE-10535: upgrade com.palantir.consistent-versions to 2.10.0
  LUCENE-10533: SpellChecker.formGrams is missing bounds check (apache#836)
  Upgrade spotless and use runToFixMessage for 'gradlew tidy' hint. (apache#834)
  Fix JVM error branch logic. (apache#835)
  LUCENE-10493: factor out Viterbi algorithm and share it between kuromoji and nori (apache#805)
  LUCENE-8836: Speed up TermsEnum#lookupOrd on increasing sequences of ords. (apache#827)
  LUCENE-10528: use Xvfb in test to avoid messing up user's desktop (apache#828)
  LUCENE-10315: Speed up DocIdsWriter by ForUtil (apache#797)
  LUCENE-10517: Improve performance of SortedSetDV faceting by iterating on class types (apache#812)
  Add two facet tests (apache#826)
  fail clearly on too-new JDK (apache#819)
  improve spotless error to suggest running 'gradlew tidy' (apache#817)
  ...
@yixunx
Copy link
Contributor

yixunx commented May 9, 2022

@iverase I opened https://issues.apache.org/jira/browse/LUCENE-10563 with the latest failing shape I attached above plus a couple more shapes that I found that also cause tessellation failures.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants