Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Concurrency bug DocumentsWriterPerThreadPool.getAndLock() uncovered by OpenJ9 test failures? #12916

Closed
mikemccand opened this issue Dec 11, 2023 · 24 comments
Assignees
Labels
Milestone

Comments

@mikemccand
Copy link
Member

Description

There is an exciting upstream (OpenJ9) comment here (thank you @singh264), copied below:

The intermittent assertion failure in testSegmentCountOnFlushRandom seems to have started after bae5b33 was added in the lucene repo, and a solution to fix the intermittent assertion failure is to make the DocumentsWriterPerThreadPool.getAndLock() method synchronized as it seems to fix a race condition that occurs while multiple threads concurrently add documents to an index during the test.

I have not looked closely yet ...

Version and environment details

No response

@uschindler
Copy link
Contributor

Hi,
The proposed fix is looking fine to me. Please note that all other accessible methods in that class are synchronized (or private ones implicitly synchronized as only called from synchronized ones).

Just this one isn't and this causes trouble.

I think the fix mentioned here is fine and fixes a real bug which may also happen on Hotspot on some other hardware platform. It does not fail on x86 with hotspot. ARM could fail due to other memory behaviour and order.

I will provide a PR to add synchronized tomorrow.

@uschindler uschindler self-assigned this Dec 18, 2023
@uschindler uschindler added this to the 9.10.0 milestone Dec 18, 2023
@jpountz
Copy link
Contributor

jpountz commented Dec 20, 2023

Does someone understand if adding synchronization is fixing a real bug of it it just helps hide a J9 bug? This method is subject to contention and #12199 was about avoiding locking on this method, which proved to help significantly when indexing cheap documents on several threads. I'm looking at the test and the code and can't see what sort of race condition may cause this test to fail.

@uschindler
Copy link
Contributor

This is a real bug and not one of openj9. You can reproduce this bug with enough tries on hotspot, too.

@uschindler
Copy link
Contributor

They made statistics in the linked issue. Hotspot also fails. So they rejected it as openj9 issue.

@jpountz
Copy link
Contributor

jpountz commented Dec 20, 2023

Thanks I had missed that, I'll look more into it.

@uschindler
Copy link
Contributor

See this comment: eclipse-openj9/openj9#18400 (comment)

@jpountz
Copy link
Contributor

jpountz commented Dec 20, 2023

Thanks. I'm trying to reproduce failures locally with the following command, without luck so far with JDK 17 and JDK 21. I'll dig more tomorrow. If any of you manages to reproduce, I'm interested in the command that you used.

$ ./gradlew -p lucene/core -Dtests.seed=F7B4CD7A5624D5EC test --tests TestIndexWriterThreadsToSegments.testSegmentCountOnFlushRandom -Dtests.jvmargs="-XX:+UseCompressedOops" -Ptests.iters=1000

@uschindler
Copy link
Contributor

uschindler commented Dec 20, 2023

Thanks. Maybe the OpenJ9 people can help how they reproduced. You can use gradlew beast instead of gradlew test to run the forked copies inside gradle.

More looking into that code; I think the whole code here (and IndexWriter in general) should be freed of ancient "synchronized" blocks and methods and should instead use the more modern Java synchronization patterns inclusive volatile / opaque reads and barriers. The problematic concurrency could for sure be solved with some better algorithm that allows to read lockfree and only write with locks (e.g., ReadWriteLock instead of synchronized).

@uschindler
Copy link
Contributor

uschindler commented Dec 20, 2023

I made test fail on my AMD Ryzen 3700 (the Policeman Jenkins Sever) with Hotspot 17.0.9:

$ java -version
openjdk version "17.0.9" 2023-10-17
OpenJDK Runtime Environment Temurin-17.0.9+9 (build 17.0.9+9)
OpenJDK 64-Bit Server VM Temurin-17.0.9+9 (build 17.0.9+9, mixed mode, sharing)

$ ./gradlew -p lucene/core -Dtests.seed=F7B4CD7A5624D5EC beast --tests TestIndexWriterThreadsToSegments.testSegmentCountOnFlushRandom -Dtests.jvmargs="-XX:+UseCompressedOops" -Ptests.iters=1000 -Ptests.dups=100

It failed on the 3rd beasting round:

> Task :lucene:core:test_1
WARNING: A command line option has enabled the Security Manager
WARNING: The Security Manager is deprecated and will be removed in a future release
:lucene:core:test_1 (SUCCESS): 1000 test(s)

> Task :lucene:core:test_10
WARNING: A command line option has enabled the Security Manager
WARNING: The Security Manager is deprecated and will be removed in a future release
:lucene:core:test_10 (SUCCESS): 1000 test(s)

> Task :lucene:core:test_100
WARNING: A command line option has enabled the Security Manager
WARNING: The Security Manager is deprecated and will be removed in a future release

org.apache.lucene.index.TestIndexWriterThreadsToSegments > testSegmentCountOnFlushRandom {seed=[F7B4CD7A5624D5EC:A0DDCC17DE66DB34]} FAILED
    java.lang.AssertionError
        at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
        at org.junit.Assert.fail(Assert.java:87)
        at org.junit.Assert.assertTrue(Assert.java:42)
        at org.junit.Assert.assertTrue(Assert.java:53)
        at org.apache.lucene.index.TestIndexWriterThreadsToSegments$CheckSegmentCount.run(TestIndexWriterThreadsToSegments.java:150)
        at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:222)
        at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
        at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)

    com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=1006, name=Thread-981, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]

        Caused by:
        java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
            at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
            at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)

            Caused by:
            java.util.concurrent.BrokenBarrierException
                at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
                at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
                at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)

    com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=1004, name=Thread-979, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]

        Caused by:
        java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
            at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
            at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)

            Caused by:
            java.util.concurrent.BrokenBarrierException
                at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
                at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
                at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)

    com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=1002, name=Thread-977, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]

        Caused by:
        java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
            at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
            at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)

            Caused by:
            java.util.concurrent.BrokenBarrierException
                at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
                at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
                at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)

    com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=999, name=Thread-974, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]

        Caused by:
        java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
            at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
            at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)

            Caused by:
            java.util.concurrent.BrokenBarrierException
                at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
                at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
                at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)

    com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=1000, name=Thread-975, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]

        Caused by:
        java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
            at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
            at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)

            Caused by:
            java.util.concurrent.BrokenBarrierException
                at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
                at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
                at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)

    com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=1003, name=Thread-978, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]

        Caused by:
        java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
            at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
            at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)

            Caused by:
            java.util.concurrent.BrokenBarrierException
                at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
                at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
                at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)

    com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=1001, name=Thread-976, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]

        Caused by:
        java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
            at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
            at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)

            Caused by:
            java.util.concurrent.BrokenBarrierException
                at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
                at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
                at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)

    com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=1005, name=Thread-980, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]

        Caused by:
        java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
            at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
            at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)

            Caused by:
            java.util.concurrent.BrokenBarrierException
                at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
                at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
                at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)

    com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=998, name=Thread-973, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]

        Caused by:
        java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
            at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
            at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)

            Caused by:
            java.util.concurrent.BrokenBarrierException
                at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
                at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
                at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)

org.apache.lucene.index.TestIndexWriterThreadsToSegments > test suite's output saved to /home/thetaphi/repro/lucene/lucene/core/build/test-results/test_100/outputs/OUTPUT-org.apache.lucene.index.TestIndexWriterThreadsToSegments.txt, copied below:
  2> Dec 20, 2023 8:49:28 AM com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException
  2> WARNING: Uncaught exception in thread: Thread[Thread-977,5,TGRP-TestIndexWriterThreadsToSegments]
  2> java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
  2>    at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
  2>    at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)
  2> Caused by: java.util.concurrent.BrokenBarrierException
  2>    at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
  2>    at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
  2>    at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)
  2>
  2> Dec 20, 2023 8:49:28 AM com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException
  2> WARNING: Uncaught exception in thread: Thread[Thread-979,5,TGRP-TestIndexWriterThreadsToSegments]
  2> java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
  2>    at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
  2>    at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)
  2> Caused by: java.util.concurrent.BrokenBarrierException
  2>    at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
  2>    at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
  2>    at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)
  2>
  2> Dec 20, 2023 8:49:28 AM com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException
  2> WARNING: Uncaught exception in thread: Thread[Thread-981,5,TGRP-TestIndexWriterThreadsToSegments]
  2> java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
  2>    at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
  2>    at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)
  2> Caused by: java.util.concurrent.BrokenBarrierException
  2>    at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
  2>    at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
  2>    at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)
  2>
  2> Dec 20, 2023 8:49:28 AM com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException
  2> WARNING: Uncaught exception in thread: Thread[Thread-974,5,TGRP-TestIndexWriterThreadsToSegments]
  2> java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
  2>    at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
  2>    at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)
  2> Caused by: java.util.concurrent.BrokenBarrierException
  2>    at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
  2>    at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
  2>    at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)
  2>
  2> Dec 20, 2023 8:49:28 AM com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException
  2> WARNING: Uncaught exception in thread: Thread[Thread-976,5,TGRP-TestIndexWriterThreadsToSegments]
  2> java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
  2>    at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
  2>    at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)
  2> Caused by: java.util.concurrent.BrokenBarrierException
  2>    at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
  2>    at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
  2>    at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)
  2>
  2> Dec 20, 2023 8:49:28 AM com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException
  2> WARNING: Uncaught exception in thread: Thread[Thread-973,5,TGRP-TestIndexWriterThreadsToSegments]
  2> java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
  2>    at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
  2>    at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)
  2> Caused by: java.util.concurrent.BrokenBarrierException
  2>    at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
  2>    at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
  2>    at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)
  2>
  2> Dec 20, 2023 8:49:28 AM com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException
  2> WARNING: Uncaught exception in thread: Thread[Thread-982,5,TGRP-TestIndexWriterThreadsToSegments]
  2> java.lang.AssertionError
  2>    at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
  2>    at org.junit.Assert.fail(Assert.java:87)
  2>    at org.junit.Assert.assertTrue(Assert.java:42)
  2>    at org.junit.Assert.assertTrue(Assert.java:53)
  2>    at org.apache.lucene.index.TestIndexWriterThreadsToSegments$CheckSegmentCount.run(TestIndexWriterThreadsToSegments.java:150)
  2>    at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:222)
  2>    at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
  2>    at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)
  2>
  2> Dec 20, 2023 8:49:28 AM com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException
  2> WARNING: Uncaught exception in thread: Thread[Thread-975,5,TGRP-TestIndexWriterThreadsToSegments]
  2> java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
  2>    at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
  2>    at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)
  2> Caused by: java.util.concurrent.BrokenBarrierException
  2>    at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
  2>    at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
  2>    at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)
  2>
  2> Dec 20, 2023 8:49:28 AM com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException
  2> WARNING: Uncaught exception in thread: Thread[Thread-978,5,TGRP-TestIndexWriterThreadsToSegments]
  2> java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
  2>    at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
  2>    at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)
  2> Caused by: java.util.concurrent.BrokenBarrierException
  2>    at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
  2>    at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
  2>    at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)
  2>
  2> Dec 20, 2023 8:49:28 AM com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException
  2> WARNING: Uncaught exception in thread: Thread[Thread-980,5,TGRP-TestIndexWriterThreadsToSegments]
  2> java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
  2>    at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
  2>    at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)
  2> Caused by: java.util.concurrent.BrokenBarrierException
  2>    at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
  2>    at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
  2>    at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)
  2>
   >     java.lang.AssertionError
   >         at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
   >         at org.junit.Assert.fail(Assert.java:87)
   >         at org.junit.Assert.assertTrue(Assert.java:42)
   >         at org.junit.Assert.assertTrue(Assert.java:53)
   >         at org.apache.lucene.index.TestIndexWriterThreadsToSegments$CheckSegmentCount.run(TestIndexWriterThreadsToSegments.java:150)
   >         at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:222)
   >         at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
   >         at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)
   >
   >     com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=1006, name=Thread-981, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]
   >
   >         Caused by:
   >         java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
   >             at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
   >             at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)
   >
   >             Caused by:
   >             java.util.concurrent.BrokenBarrierException
   >                 at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
   >                 at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
   >                 at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)
   >
   >     com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=1004, name=Thread-979, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]
   >
   >         Caused by:
   >         java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
   >             at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
   >             at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)
   >
   >             Caused by:
   >             java.util.concurrent.BrokenBarrierException
   >                 at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
   >                 at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
   >                 at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)
   >
   >     com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=1002, name=Thread-977, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]
   >
   >         Caused by:
   >         java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
   >             at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
   >             at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)
   >
   >             Caused by:
   >             java.util.concurrent.BrokenBarrierException
   >                 at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
   >                 at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
   >                 at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)
   >
   >     com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=999, name=Thread-974, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]
   >
   >         Caused by:
   >         java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
   >             at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
   >             at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)
   >
   >             Caused by:
   >             java.util.concurrent.BrokenBarrierException
   >                 at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
   >                 at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
   >                 at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)
   >
   >     com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=1000, name=Thread-975, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]
   >
   >         Caused by:
   >         java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
   >             at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
   >             at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)
   >
   >             Caused by:
   >             java.util.concurrent.BrokenBarrierException
   >                 at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
   >                 at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
   >                 at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)
   >
   >     com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=1003, name=Thread-978, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]
   >
   >         Caused by:
   >         java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
   >             at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
   >             at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)
   >
   >             Caused by:
   >             java.util.concurrent.BrokenBarrierException
   >                 at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
   >                 at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
   >                 at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)
   >
   >     com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=1001, name=Thread-976, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]
   >
   >         Caused by:
   >         java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
   >             at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
   >             at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)
   >
   >             Caused by:
   >             java.util.concurrent.BrokenBarrierException
   >                 at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
   >                 at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
   >                 at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)
   >
   >     com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=1005, name=Thread-980, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]
   >
   >         Caused by:
   >         java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
   >             at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
   >             at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)
   >
   >             Caused by:
   >             java.util.concurrent.BrokenBarrierException
   >                 at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
   >                 at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
   >                 at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)
   >
   >     com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=998, name=Thread-973, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]
   >
   >         Caused by:
   >         java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
   >             at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
   >             at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)
   >
   >             Caused by:
   >             java.util.concurrent.BrokenBarrierException
   >                 at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
   >                 at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
   >                 at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)
  2> NOTE: reproduce with: gradlew test --tests TestIndexWriterThreadsToSegments.testSegmentCountOnFlushRandom -Dtests.seed=F7B4CD7A5624D5EC -Dtests.locale=en-DK -Dtests.timezone=HST -Dtests.asserts=true -Dtests.file.encoding=UTF-8

:lucene:core:test_100 (FAILURE): 99 test(s), 1 failure(s), 1 skipped

99 tests completed, 1 failed, 1 skipped

> Task :lucene:core:test_100 FAILED

@singh264
Copy link

If any of you manages to reproduce, I'm interested in the command that you used

I was able to reproduce the failure in TestIndexWriterThreadsToSegments.testSegmentCountOnFlushRandom in branch_9x on a x86_64 Linux machine:

$JAVA_HOME/bin/java -version
java version "17.0.9" 2023-10-17 LTS
Java(TM) SE Runtime Environment (build 17.0.9+11-LTS-201)
Java HotSpot(TM) 64-Bit Server VM (build 17.0.9+11-LTS-201, mixed mode, sharing)

$RUNTIME_JAVA_HOME/bin/java -version
openjdk version "17.0.9-internal" 2023-10-17
OpenJDK Runtime Environment (build 17.0.9-internal+0-adhoc.root.openj9-openjdk-jdk17)
Eclipse OpenJ9 VM (build openj9-0.41.0, JRE 17 Linux amd64-64-Bit Compressed References 20231124_000000 (JIT enabled, AOT enabled)
OpenJ9   - 461bf3c70
OMR      - 5eee6ad9d
JCL      - 3699725139c based on jdk-17.0.9+9)
./gradlew -p lucene/core -Dtests.seed=F7B4CD7A5624D5EC test --tests TestIndexWriterThreadsToSegments.testSegmentCountOnFlushRandom -Dtests.jvmargs="-XX:+UseCompressedOops" -Ptests.iters=1000

...

> Task :altJvmWarning
NOTE: Alternative java toolchain will be used for compilation and tests:
  Project will use 17 (Eclipse OpenJ9 JDK 17.0.9-internal+0-adhoc.root.openj9-openjdk-jdk17, home at: /root/openj9_issues_18400/openj9-openjdk-jdk17/build/linux-x86_64-server-release/images/jdk)
  Gradle runs with 17 (Oracle JDK 17.0.9+11-LTS-201, home at: /root/openj9_issues_18400/jdk-17.0.9)

...

> Task :lucene:core:test
WARNING: A command line option has enabled the Security Manager
WARNING: The Security Manager is deprecated and will be removed in a future release
WARNING: A terminally deprecated method in java.lang.System has been called
WARNING: System::setSecurityManager has been called by java.lang.System
WARNING: Please consider reporting this to the maintainers of java.lang.System
WARNING: System::setSecurityManager will be removed in a future release

org.apache.lucene.index.TestIndexWriterThreadsToSegments > testSegmentCountOnFlushRandom {seed=[F7B4CD7A5624D5EC:59A28958CC8D8396]} FAILED
    com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=126, name=Thread-97, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]

        Caused by:
        java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
            at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
            at app//org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)

            Caused by:
            java.util.concurrent.BrokenBarrierException
                at java.base@17.0.9-internal/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
                at java.base@17.0.9-internal/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
                at app//org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)

    com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=127, name=Thread-98, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]

        Caused by:
        java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
            at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
            at app//org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)

            Caused by:
            java.util.concurrent.BrokenBarrierException
                at java.base@17.0.9-internal/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
                at java.base@17.0.9-internal/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
                at app//org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)

    com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=130, name=Thread-101, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]

        Caused by:
        java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
            at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
            at app//org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)

            Caused by:
            java.util.concurrent.BrokenBarrierException
                at java.base@17.0.9-internal/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
                at java.base@17.0.9-internal/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
                at app//org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)

    com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=125, name=Thread-96, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]

        Caused by:
        java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
            at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
            at app//org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)

            Caused by:
            java.util.concurrent.BrokenBarrierException
                at java.base@17.0.9-internal/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
                at java.base@17.0.9-internal/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
                at app//org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)

    com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=129, name=Thread-100, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]

        Caused by:
        java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
            at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
            at app//org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)

            Caused by:
            java.util.concurrent.BrokenBarrierException
                at java.base@17.0.9-internal/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
                at java.base@17.0.9-internal/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
                at app//org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)

    com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=122, name=Thread-93, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]

        Caused by:
        java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
            at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
            at app//org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)

            Caused by:
            java.util.concurrent.BrokenBarrierException
                at java.base@17.0.9-internal/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
                at java.base@17.0.9-internal/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
                at app//org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)

    com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=131, name=Thread-102, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]

        Caused by:
        java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
            at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
            at app//org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)

            Caused by:
            java.util.concurrent.BrokenBarrierException
                at java.base@17.0.9-internal/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
                at java.base@17.0.9-internal/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
                at app//org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)

    com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=123, name=Thread-94, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]

        Caused by:
        java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
            at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
            at app//org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)

            Caused by:
            java.util.concurrent.BrokenBarrierException
                at java.base@17.0.9-internal/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
                at java.base@17.0.9-internal/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
                at app//org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)

    com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=128, name=Thread-99, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]

        Caused by:
        java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
            at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
            at app//org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)

            Caused by:
            java.util.concurrent.BrokenBarrierException
                at java.base@17.0.9-internal/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
                at java.base@17.0.9-internal/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
                at app//org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)

    java.lang.AssertionError
        at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
        at app//org.junit.Assert.fail(Assert.java:87)
        at app//org.junit.Assert.assertTrue(Assert.java:42)
        at app//org.junit.Assert.assertTrue(Assert.java:53)
        at app//org.apache.lucene.index.TestIndexWriterThreadsToSegments$CheckSegmentCount.run(TestIndexWriterThreadsToSegments.java:150)
        at java.base@17.0.9-internal/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:222)
        at java.base@17.0.9-internal/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
        at app//org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)
<============-> 93% EXECUTING [3m 57s]
> :lucene:core:test > Executing test org.apache.lucene.index.TestIndexWriterThreadsToSegments
> IDLE
> IDLE
> :lucene:core:test > 18 tests completed, 1 failed

jpountz added a commit to jpountz/lucene that referenced this issue Dec 21, 2023
…ll` on a non-empty queue.

Before this change, `ConcurrentApproximatePriorityQueue#poll` could sometimes
return `null` even though the queue was empty at no point in time. The
practical implication is that we can end up with more DWPTs in memory than
indexing threads, which, while not a bug, is probably not efficient.

I ran luceneutil's `IndexGeonames` with this change and could not notice a
slowdown while this benchmark has been good at exhibiting contention issues in
the past. This is likely due to the fact that most of the time, a DWPT can be
obtained from the pool using the optimistic path, which doesn't block.

Closes apache#12649 apache#12916
jpountz added a commit to jpountz/lucene that referenced this issue Dec 21, 2023
…ll` on a non-empty queue.

Before this change, `ConcurrentApproximatePriorityQueue#poll` could sometimes
return `null` even though the queue was empty at no point in time. The
practical implication is that we can end up with more DWPTs in memory than
indexing threads, which, while not strictly a bug, may require doing more
merging than we'd like later on.

I ran luceneutil's `IndexGeonames` with this change, and
`ConcurrentApproximatePriorityQueue#poll` was not the main source of
contention. I instrumented the code to check how many DWPTs got pulled from the
queue using the optimistic path vs. pessimistic path and got 8525598 for the
optimistic path vs. 12247 for the pessimistic path.

Closes apache#12649 apache#12916
@jpountz
Copy link
Contributor

jpountz commented Dec 21, 2023

I was not able to reproduce the failure, but could create an isolated test on ConcurrentApproximatePriorityQueue that consistently reproduces what I think is the same problem. I opened a PR at #12959.

@uschindler
Copy link
Contributor

Ok, maybe the issue happens more often with OpenJ9 or AMD Ryzen CPUs. Should I take your PR and check with beasting on Policeman's Ryzen to see if I can reproduce with the above command?

I can also test OpenJ9.

@uschindler
Copy link
Contributor

Is there an explanation why it works better when making the method synchronized as suggested by the OpenJ9 people?

@jpountz
Copy link
Contributor

jpountz commented Dec 21, 2023

Should I take your PR and check with beasting on Policeman's Ryzen to see if I can reproduce with the above command?

That would be great.

I have a Ryzen too, though a AMD Ryzen 9 3900X. I'm not sure what factors influence the reproducibility of this bug. The good news is that the new test on #12959 fails 100% of the time for me without the fix.

Is there an explanation why it works better when making the method synchronized as suggested by the OpenJ9 people?

My understanding is that the way that the bug manifests looks like this:

  • 2 threads T1 and T2 doing indexing,
  • a ConcurrentyApproximatePriorityQueue with concurrency = 2, ie. 2 sub queues, initially empty

And then the following sequence of events:
1 T1 tries to pull a DWPT from the queue, it's empty so DocumentsWriterPerThreadPool creates a new DWPT. There is now 1 DWPT.
2 T2 tries to pull a DWPT from the queue, it's empty (because the only one created so far is taken by T1) so DocumentsWriterPerThreadPool creates a new DWPT. There are now 2 DWPTs.
3 T2 indexes a document
4 T1 indexes a document
5 T2 puts the DWPT back into the sub queue at index 1
6 T1 puts the DWPT back into the sub queue at index 0
7 T1 starts pulling a DWPT from the queue
8 T2 pulls the DWPT from the sub queue at index 0
9 T1 checks the sub queue at index 0, it's empty (since T2 just took it) so it keeps going
10 T2 indexes a document
11 T2 puts the DWPT back into the sub queue at index 0
12 T2 pulls the DWPT from the sub queue at index 1
13 T1 checks the sub queue at index 1, it's empty (since T2 just took it) so it keeps going
14 T1 checked all sub queues and could not get a DWPT, so DocumentsWriterPerThreadPool creates a one. There are now 3 DWPTs even though there are only 2 indexing threads.

Making the method synchronized helps by forcing T2 to wait until T1 is done pulling a DWPT from the queue before starting doing the same. The PR tries to go a bit more granular to limit contention.

@uschindler
Copy link
Contributor

OK, it is running now with above command line (beasting) on OpenJDK Temurin 17.0.9.

@uschindler
Copy link
Contributor

Hi,
it still fails - this time in org.apache.lucene.index.TestIndexWriterThreadsToSegments$CheckSegmentCount.run(TestIndexWriterThreadsToSegments.java:150

It is a different assertion error without message.

> Task :lucene:core:test_1
WARNING: A command line option has enabled the Security Manager
WARNING: The Security Manager is deprecated and will be removed in a future release
:lucene:core:test_1 (SUCCESS): 1000 test(s)

> Task :lucene:core:test_10
WARNING: A command line option has enabled the Security Manager
WARNING: The Security Manager is deprecated and will be removed in a future release
:lucene:core:test_10 (SUCCESS): 1000 test(s)

> Task :lucene:core:test_100
WARNING: A command line option has enabled the Security Manager
WARNING: The Security Manager is deprecated and will be removed in a future release
:lucene:core:test_100 (SUCCESS): 1000 test(s)

> Task :lucene:core:test_11
WARNING: A command line option has enabled the Security Manager
WARNING: The Security Manager is deprecated and will be removed in a future release
:lucene:core:test_11 (SUCCESS): 1000 test(s)

> Task :lucene:core:test_12
WARNING: A command line option has enabled the Security Manager
WARNING: The Security Manager is deprecated and will be removed in a future release

org.apache.lucene.index.TestIndexWriterThreadsToSegments > testSegmentCountOnFlushRandom {seed=[F7B4CD7A5624D5EC:C3D392823281AA8D]} FAILED
    com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=67, name=Thread-42, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]

        Caused by:
        java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
            at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
            at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)

            Caused by:
            java.util.concurrent.BrokenBarrierException
                at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
                at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
                at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)

    com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=64, name=Thread-39, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]

        Caused by:
        java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
            at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
            at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)

            Caused by:
            java.util.concurrent.BrokenBarrierException
                at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
                at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
                at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)

    com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=65, name=Thread-40, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]

        Caused by:
        java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
            at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
            at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)

            Caused by:
            java.util.concurrent.BrokenBarrierException
                at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
                at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
                at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)

    com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=61, name=Thread-36, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]

        Caused by:
        java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
            at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
            at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)

            Caused by:
            java.util.concurrent.BrokenBarrierException
                at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
                at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
                at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)

    com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=62, name=Thread-37, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]

        Caused by:
        java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
            at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
            at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)

            Caused by:
            java.util.concurrent.BrokenBarrierException
                at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
                at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
                at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)

    com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=60, name=Thread-35, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]

        Caused by:
        java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
            at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
            at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)

            Caused by:
            java.util.concurrent.BrokenBarrierException
                at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
                at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
                at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)

    com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=59, name=Thread-34, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]

        Caused by:
        java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
            at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
            at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)

            Caused by:
            java.util.concurrent.BrokenBarrierException
                at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
                at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
                at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)

    com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=58, name=Thread-33, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]

        Caused by:
        java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
            at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
            at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)

            Caused by:
            java.util.concurrent.BrokenBarrierException
                at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
                at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
                at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)

    com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=63, name=Thread-38, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]

        Caused by:
        java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
            at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
            at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)

            Caused by:
            java.util.concurrent.BrokenBarrierException
                at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
                at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
                at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)

    java.lang.AssertionError
        at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
        at org.junit.Assert.fail(Assert.java:87)
        at org.junit.Assert.assertTrue(Assert.java:42)
        at org.junit.Assert.assertTrue(Assert.java:53)
        at org.apache.lucene.index.TestIndexWriterThreadsToSegments$CheckSegmentCount.run(TestIndexWriterThreadsToSegments.java:150)
        at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:222)
        at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
        at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)

org.apache.lucene.index.TestIndexWriterThreadsToSegments > test suite's output saved to /home/thetaphi/repro/lucene/lucene/core/build/test-results/test_12/outputs/OUTPUT-org.apache.lucene.index.TestIndexWriterThreadsToSegments.txt, copied below:
  2> Dec 21, 2023 5:16:25 AM com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException
  2> WARNING: Uncaught exception in thread: Thread[Thread-40,5,TGRP-TestIndexWriterThreadsToSegments]
  2> java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
  2>    at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
  2>    at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)
  2> Caused by: java.util.concurrent.BrokenBarrierException
  2>    at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
  2>    at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
  2>    at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)
  2>
  2> Dec 21, 2023 5:16:25 AM com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException
  2> WARNING: Uncaught exception in thread: Thread[Thread-33,5,TGRP-TestIndexWriterThreadsToSegments]
  2> java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
  2>    at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
  2>    at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)
  2> Caused by: java.util.concurrent.BrokenBarrierException
  2>    at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
  2>    at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
  2>    at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)
  2>
  2> Dec 21, 2023 5:16:25 AM com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException
  2> WARNING: Uncaught exception in thread: Thread[Thread-41,5,TGRP-TestIndexWriterThreadsToSegments]
  2> java.lang.AssertionError
  2>    at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
  2>    at org.junit.Assert.fail(Assert.java:87)
  2>    at org.junit.Assert.assertTrue(Assert.java:42)
  2>    at org.junit.Assert.assertTrue(Assert.java:53)
  2>    at org.apache.lucene.index.TestIndexWriterThreadsToSegments$CheckSegmentCount.run(TestIndexWriterThreadsToSegments.java:150)
  2>    at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:222)
  2>    at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
  2>    at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)
  2>
  2> Dec 21, 2023 5:16:25 AM com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException
  2> WARNING: Uncaught exception in thread: Thread[Thread-37,5,TGRP-TestIndexWriterThreadsToSegments]
  2> java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
  2>    at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
  2>    at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)
  2> Caused by: java.util.concurrent.BrokenBarrierException
  2>    at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
  2>    at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
  2>    at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)
  2>
  2> Dec 21, 2023 5:16:25 AM com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException
  2> WARNING: Uncaught exception in thread: Thread[Thread-36,5,TGRP-TestIndexWriterThreadsToSegments]
  2> java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
  2>    at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
  2>    at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)
  2> Caused by: java.util.concurrent.BrokenBarrierException
  2>    at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
  2>    at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
  2>    at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)
  2>
  2> Dec 21, 2023 5:16:25 AM com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException
  2> WARNING: Uncaught exception in thread: Thread[Thread-34,5,TGRP-TestIndexWriterThreadsToSegments]
  2> java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
  2>    at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
  2>    at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)
  2> Caused by: java.util.concurrent.BrokenBarrierException
  2>    at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
  2>    at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
  2>    at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)
  2>
  2> Dec 21, 2023 5:16:25 AM com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException
  2> WARNING: Uncaught exception in thread: Thread[Thread-35,5,TGRP-TestIndexWriterThreadsToSegments]
  2> java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
  2>    at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
  2>    at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)
  2> Caused by: java.util.concurrent.BrokenBarrierException
  2>    at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
  2>    at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
  2>    at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)
  2>
  2> Dec 21, 2023 5:16:25 AM com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException
  2> WARNING: Uncaught exception in thread: Thread[Thread-38,5,TGRP-TestIndexWriterThreadsToSegments]
  2> java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
  2>    at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
  2>    at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)
  2> Caused by: java.util.concurrent.BrokenBarrierException
  2>    at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
  2>    at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
  2>    at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)
  2>
  2> Dec 21, 2023 5:16:25 AM com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException
  2> WARNING: Uncaught exception in thread: Thread[Thread-42,5,TGRP-TestIndexWriterThreadsToSegments]
  2> java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
  2>    at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
  2>    at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)
  2> Caused by: java.util.concurrent.BrokenBarrierException
  2>    at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
  2>    at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
  2>    at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)
  2>
  2> Dec 21, 2023 5:16:25 AM com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException
  2> WARNING: Uncaught exception in thread: Thread[Thread-39,5,TGRP-TestIndexWriterThreadsToSegments]
  2> java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
  2>    at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
  2>    at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)
  2> Caused by: java.util.concurrent.BrokenBarrierException
  2>    at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
  2>    at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
  2>    at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)
  2>
   >     com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=67, name=Thread-42, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]
   >
   >         Caused by:
   >         java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
   >             at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
   >             at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)
   >
   >             Caused by:
   >             java.util.concurrent.BrokenBarrierException
   >                 at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
   >                 at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
   >                 at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)
   >
   >     com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=64, name=Thread-39, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]
   >
   >         Caused by:
   >         java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
   >             at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
   >             at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)
   >
   >             Caused by:
   >             java.util.concurrent.BrokenBarrierException
   >                 at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
   >                 at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
   >                 at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)
   >
   >     com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=65, name=Thread-40, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]
   >
   >         Caused by:
   >         java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
   >             at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
   >             at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)
   >
   >             Caused by:
   >             java.util.concurrent.BrokenBarrierException
   >                 at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
   >                 at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
   >                 at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)
   >
   >     com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=61, name=Thread-36, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]
   >
   >         Caused by:
   >         java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
   >             at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
   >             at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)
   >
   >             Caused by:
   >             java.util.concurrent.BrokenBarrierException
   >                 at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
   >                 at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
   >                 at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)
   >
   >     com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=62, name=Thread-37, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]
   >
   >         Caused by:
   >         java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
   >             at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
   >             at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)
   >
   >             Caused by:
   >             java.util.concurrent.BrokenBarrierException
   >                 at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
   >                 at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
   >                 at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)
   >
   >     com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=60, name=Thread-35, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]
   >
   >         Caused by:
   >         java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
   >             at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
   >             at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)
   >
   >             Caused by:
   >             java.util.concurrent.BrokenBarrierException
   >                 at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
   >                 at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
   >                 at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)
   >
   >     com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=59, name=Thread-34, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]
   >
   >         Caused by:
   >         java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
   >             at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
   >             at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)
   >
   >             Caused by:
   >             java.util.concurrent.BrokenBarrierException
   >                 at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
   >                 at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
   >                 at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)
   >
   >     com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=58, name=Thread-33, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]
   >
   >         Caused by:
   >         java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
   >             at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
   >             at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)
   >
   >             Caused by:
   >             java.util.concurrent.BrokenBarrierException
   >                 at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
   >                 at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
   >                 at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)
   >
   >     com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=63, name=Thread-38, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]
   >
   >         Caused by:
   >         java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
   >             at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
   >             at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)
   >
   >             Caused by:
   >             java.util.concurrent.BrokenBarrierException
   >                 at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
   >                 at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
   >                 at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)
   >
   >     java.lang.AssertionError
   >         at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
   >         at org.junit.Assert.fail(Assert.java:87)
   >         at org.junit.Assert.assertTrue(Assert.java:42)
   >         at org.junit.Assert.assertTrue(Assert.java:53)
   >         at org.apache.lucene.index.TestIndexWriterThreadsToSegments$CheckSegmentCount.run(TestIndexWriterThreadsToSegments.java:150)
   >         at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:222)
   >         at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
   >         at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)
  2> NOTE: reproduce with: gradlew test --tests TestIndexWriterThreadsToSegments.testSegmentCountOnFlushRandom -Dtests.seed=F7B4CD7A5624D5EC -Dtests.locale=en-DK -Dtests.timezone=HST -Dtests.asserts=true -Dtests.file.encoding=UTF-8

:lucene:core:test_12 (FAILURE): 5 test(s), 1 failure(s), 1 skipped

5 tests completed, 1 failed, 1 skipped

@uschindler
Copy link
Contributor

Nah it is exactly same error.

I verified I am on right branch:

thetaphi@serv1:~/repro/lucene$ git status
On branch concurrent_pq_poll_non_null_on_non_empty_queue
Your branch is up to date with 'jpountz/concurrent_pq_poll_non_null_on_non_empty_queue'.

nothing to commit, working tree clean

@jpountz
Copy link
Contributor

jpountz commented Dec 21, 2023

Thanks I'll keep digging.

@uschindler
Copy link
Contributor

Did you run my above "beasting" command? It runs everything 100 times with each 1000 iterations. For me it fails in most cases after 3rd or fourth try.

@uschindler
Copy link
Contributor

Also keep to have the machine busy at same moment (Jenkins jobs are running at same time). This may increase chance to hit this.

@jpountz
Copy link
Contributor

jpountz commented Dec 21, 2023

Did you run my above "beasting" command? It runs everything 100 times with each 1000 iterations. For me it fails in most cases after 3rd or fourth try.

I tried it this morning and stopped it after some time (I don't remember how long, but tens of minutes). I just started it again and no failures in the first 6 rounds (40 minutes), I'll let it run overnight to see if I can get it to fail.

However I can get the test to fail with OpenJ9. Maybe I can use it to better understand what is happening.

Also keep to have the machine busy at same moment (Jenkins jobs are running at same time). This may increase chance to hit this.

Good idea.

@jpountz
Copy link
Contributor

jpountz commented Dec 23, 2023

It took a long time (3.5h) but it ended up failing. And then I could also get it to fail on my branch, with both Hotspot and J9. I'll keep looking, most likely after the holidays.

@uschindler
Copy link
Contributor

The latest PR seems to fix the issue.

jpountz added a commit that referenced this issue Jan 12, 2024
…on a non-empty queue. (#12959)

Before this change, `DocumentsWriterPerThread#getAndLock` could sometimes
return `null` even though the queue was empty at no point in time. The
practical implication is that we can end up with more DWPTs in memory than
indexing threads, which, while not strictly a bug, may require doing more
merging than we'd like later on.

I ran luceneutil's `IndexGeonames` with this change, and
`DocumentsWriterPerThread#getAndLock` was not the main source of
contention.

Closes #12649 #12916
jpountz added a commit that referenced this issue Jan 12, 2024
…on a non-empty queue. (#12959)

Before this change, `DocumentsWriterPerThread#getAndLock` could sometimes
return `null` even though the queue was empty at no point in time. The
practical implication is that we can end up with more DWPTs in memory than
indexing threads, which, while not strictly a bug, may require doing more
merging than we'd like later on.

I ran luceneutil's `IndexGeonames` with this change, and
`DocumentsWriterPerThread#getAndLock` was not the main source of
contention.

Closes #12649 #12916
slow-J pushed a commit to slow-J/lucene that referenced this issue Jan 16, 2024
…on a non-empty queue. (apache#12959)

Before this change, `DocumentsWriterPerThread#getAndLock` could sometimes
return `null` even though the queue was empty at no point in time. The
practical implication is that we can end up with more DWPTs in memory than
indexing threads, which, while not strictly a bug, may require doing more
merging than we'd like later on.

I ran luceneutil's `IndexGeonames` with this change, and
`DocumentsWriterPerThread#getAndLock` was not the main source of
contention.

Closes apache#12649 apache#12916
@jpountz
Copy link
Contributor

jpountz commented Feb 12, 2024

Closing as this issue has been addressed.

@jpountz jpountz closed this as completed Feb 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants