Concurrency bug `DocumentsWriterPerThreadPool.getAndLock()` uncovered by OpenJ9 test failures? #12916

mikemccand · 2023-12-11T15:13:20Z

Description

There is an exciting upstream (OpenJ9) comment here (thank you @singh264), copied below:

The intermittent assertion failure in testSegmentCountOnFlushRandom seems to have started after bae5b33 was added in the lucene repo, and a solution to fix the intermittent assertion failure is to make the DocumentsWriterPerThreadPool.getAndLock() method synchronized as it seems to fix a race condition that occurs while multiple threads concurrently add documents to an index during the test.

I have not looked closely yet ...

Version and environment details

No response

The text was updated successfully, but these errors were encountered:

uschindler · 2023-12-18T23:58:13Z

Hi,
The proposed fix is looking fine to me. Please note that all other accessible methods in that class are synchronized (or private ones implicitly synchronized as only called from synchronized ones).

Just this one isn't and this causes trouble.

I think the fix mentioned here is fine and fixes a real bug which may also happen on Hotspot on some other hardware platform. It does not fail on x86 with hotspot. ARM could fail due to other memory behaviour and order.

I will provide a PR to add synchronized tomorrow.

jpountz · 2023-12-20T17:12:02Z

Does someone understand if adding synchronization is fixing a real bug of it it just helps hide a J9 bug? This method is subject to contention and #12199 was about avoiding locking on this method, which proved to help significantly when indexing cheap documents on several threads. I'm looking at the test and the code and can't see what sort of race condition may cause this test to fail.

uschindler · 2023-12-20T17:39:58Z

This is a real bug and not one of openj9. You can reproduce this bug with enough tries on hotspot, too.

uschindler · 2023-12-20T17:41:01Z

They made statistics in the linked issue. Hotspot also fails. So they rejected it as openj9 issue.

jpountz · 2023-12-20T17:42:23Z

Thanks I had missed that, I'll look more into it.

uschindler · 2023-12-20T17:42:54Z

See this comment: eclipse-openj9/openj9#18400 (comment)

jpountz · 2023-12-20T18:10:48Z

Thanks. I'm trying to reproduce failures locally with the following command, without luck so far with JDK 17 and JDK 21. I'll dig more tomorrow. If any of you manages to reproduce, I'm interested in the command that you used.

$ ./gradlew -p lucene/core -Dtests.seed=F7B4CD7A5624D5EC test --tests TestIndexWriterThreadsToSegments.testSegmentCountOnFlushRandom -Dtests.jvmargs="-XX:+UseCompressedOops" -Ptests.iters=1000

uschindler · 2023-12-20T18:15:55Z

Thanks. Maybe the OpenJ9 people can help how they reproduced. You can use gradlew beast instead of gradlew test to run the forked copies inside gradle.

More looking into that code; I think the whole code here (and IndexWriter in general) should be freed of ancient "synchronized" blocks and methods and should instead use the more modern Java synchronization patterns inclusive volatile / opaque reads and barriers. The problematic concurrency could for sure be solved with some better algorithm that allows to read lockfree and only write with locks (e.g., ReadWriteLock instead of synchronized).

uschindler · 2023-12-20T19:11:34Z

I made test fail on my AMD Ryzen 3700 (the Policeman Jenkins Sever) with Hotspot 17.0.9:

$ java -version
openjdk version "17.0.9" 2023-10-17
OpenJDK Runtime Environment Temurin-17.0.9+9 (build 17.0.9+9)
OpenJDK 64-Bit Server VM Temurin-17.0.9+9 (build 17.0.9+9, mixed mode, sharing)

$ ./gradlew -p lucene/core -Dtests.seed=F7B4CD7A5624D5EC beast --tests TestIndexWriterThreadsToSegments.testSegmentCountOnFlushRandom -Dtests.jvmargs="-XX:+UseCompressedOops" -Ptests.iters=1000 -Ptests.dups=100

It failed on the 3rd beasting round:

> Task :lucene:core:test_1
WARNING: A command line option has enabled the Security Manager
WARNING: The Security Manager is deprecated and will be removed in a future release
:lucene:core:test_1 (SUCCESS): 1000 test(s)

> Task :lucene:core:test_10
WARNING: A command line option has enabled the Security Manager
WARNING: The Security Manager is deprecated and will be removed in a future release
:lucene:core:test_10 (SUCCESS): 1000 test(s)

> Task :lucene:core:test_100
WARNING: A command line option has enabled the Security Manager
WARNING: The Security Manager is deprecated and will be removed in a future release

org.apache.lucene.index.TestIndexWriterThreadsToSegments > testSegmentCountOnFlushRandom {seed=[F7B4CD7A5624D5EC:A0DDCC17DE66DB34]} FAILED
    java.lang.AssertionError
        at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
        at org.junit.Assert.fail(Assert.java:87)
        at org.junit.Assert.assertTrue(Assert.java:42)
        at org.junit.Assert.assertTrue(Assert.java:53)
        at org.apache.lucene.index.TestIndexWriterThreadsToSegments$CheckSegmentCount.run(TestIndexWriterThreadsToSegments.java:150)
        at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:222)
        at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
        at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)

    com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=1006, name=Thread-981, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]

        Caused by:
        java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
            at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
            at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)

            Caused by:
            java.util.concurrent.BrokenBarrierException
                at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
                at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
                at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)

    com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=1004, name=Thread-979, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]

        Caused by:
        java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
            at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
            at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)

            Caused by:
            java.util.concurrent.BrokenBarrierException
                at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
                at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
                at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)

    com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=1002, name=Thread-977, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]

        Caused by:
        java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
            at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
            at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)

            Caused by:
            java.util.concurrent.BrokenBarrierException
                at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
                at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
                at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)

    com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=999, name=Thread-974, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]

        Caused by:
        java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
            at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
            at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)

            Caused by:
            java.util.concurrent.BrokenBarrierException
                at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
                at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
                at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)

    com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=1000, name=Thread-975, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]

        Caused by:
        java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
            at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
            at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)

            Caused by:
            java.util.concurrent.BrokenBarrierException
                at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
                at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
                at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)

    com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=1003, name=Thread-978, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]

        Caused by:
        java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
            at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
            at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)

            Caused by:
            java.util.concurrent.BrokenBarrierException
                at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
                at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
                at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)

    com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=1001, name=Thread-976, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]

        Caused by:
        java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
            at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
            at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)

            Caused by:
            java.util.concurrent.BrokenBarrierException
                at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
                at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
                at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)

    com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=1005, name=Thread-980, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]

        Caused by:
        java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
            at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
            at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)

            Caused by:
            java.util.concurrent.BrokenBarrierException
                at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
                at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
                at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)

    com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=998, name=Thread-973, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]

        Caused by:
        java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
            at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
            at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)

            Caused by:
            java.util.concurrent.BrokenBarrierException
                at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
                at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
                at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)

org.apache.lucene.index.TestIndexWriterThreadsToSegments > test suite's output saved to /home/thetaphi/repro/lucene/lucene/core/build/test-results/test_100/outputs/OUTPUT-org.apache.lucene.index.TestIndexWriterThreadsToSegments.txt, copied below:
  2> Dec 20, 2023 8:49:28 AM com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException
  2> WARNING: Uncaught exception in thread: Thread[Thread-977,5,TGRP-TestIndexWriterThreadsToSegments]
  2> java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
  2>    at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
  2>    at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)
  2> Caused by: java.util.concurrent.BrokenBarrierException
  2>    at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
  2>    at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
  2>    at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)
  2>
  2> Dec 20, 2023 8:49:28 AM com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException
  2> WARNING: Uncaught exception in thread: Thread[Thread-979,5,TGRP-TestIndexWriterThreadsToSegments]
  2> java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
  2>    at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
  2>    at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)
  2> Caused by: java.util.concurrent.BrokenBarrierException
  2>    at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
  2>    at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
  2>    at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)
  2>
  2> Dec 20, 2023 8:49:28 AM com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException
  2> WARNING: Uncaught exception in thread: Thread[Thread-981,5,TGRP-TestIndexWriterThreadsToSegments]
  2> java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
  2>    at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
  2>    at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)
  2> Caused by: java.util.concurrent.BrokenBarrierException
  2>    at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
  2>    at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
  2>    at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)
  2>
  2> Dec 20, 2023 8:49:28 AM com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException
  2> WARNING: Uncaught exception in thread: Thread[Thread-974,5,TGRP-TestIndexWriterThreadsToSegments]
  2> java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
  2>    at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
  2>    at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)
  2> Caused by: java.util.concurrent.BrokenBarrierException
  2>    at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
  2>    at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
  2>    at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)
  2>
  2> Dec 20, 2023 8:49:28 AM com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException
  2> WARNING: Uncaught exception in thread: Thread[Thread-976,5,TGRP-TestIndexWriterThreadsToSegments]
  2> java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
  2>    at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
  2>    at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)
  2> Caused by: java.util.concurrent.BrokenBarrierException
  2>    at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
  2>    at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
  2>    at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)
  2>
  2> Dec 20, 2023 8:49:28 AM com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException
  2> WARNING: Uncaught exception in thread: Thread[Thread-973,5,TGRP-TestIndexWriterThreadsToSegments]
  2> java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
  2>    at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
  2>    at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)
  2> Caused by: java.util.concurrent.BrokenBarrierException
  2>    at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
  2>    at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
  2>    at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)
  2>
  2> Dec 20, 2023 8:49:28 AM com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException
  2> WARNING: Uncaught exception in thread: Thread[Thread-982,5,TGRP-TestIndexWriterThreadsToSegments]
  2> java.lang.AssertionError
  2>    at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
  2>    at org.junit.Assert.fail(Assert.java:87)
  2>    at org.junit.Assert.assertTrue(Assert.java:42)
  2>    at org.junit.Assert.assertTrue(Assert.java:53)
  2>    at org.apache.lucene.index.TestIndexWriterThreadsToSegments$CheckSegmentCount.run(TestIndexWriterThreadsToSegments.java:150)
  2>    at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:222)
  2>    at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
  2>    at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)
  2>
  2> Dec 20, 2023 8:49:28 AM com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException
  2> WARNING: Uncaught exception in thread: Thread[Thread-975,5,TGRP-TestIndexWriterThreadsToSegments]
  2> java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
  2>    at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
  2>    at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)
  2> Caused by: java.util.concurrent.BrokenBarrierException
  2>    at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
  2>    at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
  2>    at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)
  2>
  2> Dec 20, 2023 8:49:28 AM com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException
  2> WARNING: Uncaught exception in thread: Thread[Thread-978,5,TGRP-TestIndexWriterThreadsToSegments]
  2> java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
  2>    at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
  2>    at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)
  2> Caused by: java.util.concurrent.BrokenBarrierException
  2>    at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
  2>    at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
  2>    at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)
  2>
  2> Dec 20, 2023 8:49:28 AM com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException
  2> WARNING: Uncaught exception in thread: Thread[Thread-980,5,TGRP-TestIndexWriterThreadsToSegments]
  2> java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
  2>    at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
  2>    at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)
  2> Caused by: java.util.concurrent.BrokenBarrierException
  2>    at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
  2>    at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
  2>    at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)
  2>
   >     java.lang.AssertionError
   >         at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
   >         at org.junit.Assert.fail(Assert.java:87)
   >         at org.junit.Assert.assertTrue(Assert.java:42)
   >         at org.junit.Assert.assertTrue(Assert.java:53)
   >         at org.apache.lucene.index.TestIndexWriterThreadsToSegments$CheckSegmentCount.run(TestIndexWriterThreadsToSegments.java:150)
   >         at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:222)
   >         at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
   >         at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)
   >
   >     com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=1006, name=Thread-981, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]
   >
   >         Caused by:
   >         java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
   >             at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
   >             at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)
   >
   >             Caused by:
   >             java.util.concurrent.BrokenBarrierException
   >                 at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
   >                 at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
   >                 at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)
   >
   >     com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=1004, name=Thread-979, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]
   >
   >         Caused by:
   >         java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
   >             at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
   >             at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)
   >
   >             Caused by:
   >             java.util.concurrent.BrokenBarrierException
   >                 at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
   >                 at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
   >                 at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)
   >
   >     com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=1002, name=Thread-977, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]
   >
   >         Caused by:
   >         java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
   >             at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
   >             at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)
   >
   >             Caused by:
   >             java.util.concurrent.BrokenBarrierException
   >                 at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
   >                 at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
   >                 at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)
   >
   >     com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=999, name=Thread-974, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]
   >
   >         Caused by:
   >         java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
   >             at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
   >             at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)
   >
   >             Caused by:
   >             java.util.concurrent.BrokenBarrierException
   >                 at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
   >                 at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
   >                 at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)
   >
   >     com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=1000, name=Thread-975, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]
   >
   >         Caused by:
   >         java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
   >             at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
   >             at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)
   >
   >             Caused by:
   >             java.util.concurrent.BrokenBarrierException
   >                 at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
   >                 at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
   >                 at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)
   >
   >     com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=1003, name=Thread-978, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]
   >
   >         Caused by:
   >         java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
   >             at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
   >             at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)
   >
   >             Caused by:
   >             java.util.concurrent.BrokenBarrierException
   >                 at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
   >                 at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
   >                 at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)
   >
   >     com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=1001, name=Thread-976, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]
   >
   >         Caused by:
   >         java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
   >             at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
   >             at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)
   >
   >             Caused by:
   >             java.util.concurrent.BrokenBarrierException
   >                 at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
   >                 at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
   >                 at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)
   >
   >     com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=1005, name=Thread-980, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]
   >
   >         Caused by:
   >         java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
   >             at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
   >             at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)
   >
   >             Caused by:
   >             java.util.concurrent.BrokenBarrierException
   >                 at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
   >                 at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
   >                 at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)
   >
   >     com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=998, name=Thread-973, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]
   >
   >         Caused by:
   >         java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
   >             at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
   >             at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)
   >
   >             Caused by:
   >             java.util.concurrent.BrokenBarrierException
   >                 at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
   >                 at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
   >                 at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)
  2> NOTE: reproduce with: gradlew test --tests TestIndexWriterThreadsToSegments.testSegmentCountOnFlushRandom -Dtests.seed=F7B4CD7A5624D5EC -Dtests.locale=en-DK -Dtests.timezone=HST -Dtests.asserts=true -Dtests.file.encoding=UTF-8

:lucene:core:test_100 (FAILURE): 99 test(s), 1 failure(s), 1 skipped

99 tests completed, 1 failed, 1 skipped

> Task :lucene:core:test_100 FAILED

singh264 · 2023-12-20T19:25:10Z

If any of you manages to reproduce, I'm interested in the command that you used

I was able to reproduce the failure in TestIndexWriterThreadsToSegments.testSegmentCountOnFlushRandom in branch_9x on a x86_64 Linux machine:

$JAVA_HOME/bin/java -version
java version "17.0.9" 2023-10-17 LTS
Java(TM) SE Runtime Environment (build 17.0.9+11-LTS-201)
Java HotSpot(TM) 64-Bit Server VM (build 17.0.9+11-LTS-201, mixed mode, sharing)

$RUNTIME_JAVA_HOME/bin/java -version
openjdk version "17.0.9-internal" 2023-10-17
OpenJDK Runtime Environment (build 17.0.9-internal+0-adhoc.root.openj9-openjdk-jdk17)
Eclipse OpenJ9 VM (build openj9-0.41.0, JRE 17 Linux amd64-64-Bit Compressed References 20231124_000000 (JIT enabled, AOT enabled)
OpenJ9   - 461bf3c70
OMR      - 5eee6ad9d
JCL      - 3699725139c based on jdk-17.0.9+9)

./gradlew -p lucene/core -Dtests.seed=F7B4CD7A5624D5EC test --tests TestIndexWriterThreadsToSegments.testSegmentCountOnFlushRandom -Dtests.jvmargs="-XX:+UseCompressedOops" -Ptests.iters=1000

...

> Task :altJvmWarning
NOTE: Alternative java toolchain will be used for compilation and tests:
  Project will use 17 (Eclipse OpenJ9 JDK 17.0.9-internal+0-adhoc.root.openj9-openjdk-jdk17, home at: /root/openj9_issues_18400/openj9-openjdk-jdk17/build/linux-x86_64-server-release/images/jdk)
  Gradle runs with 17 (Oracle JDK 17.0.9+11-LTS-201, home at: /root/openj9_issues_18400/jdk-17.0.9)

...

> Task :lucene:core:test
WARNING: A command line option has enabled the Security Manager
WARNING: The Security Manager is deprecated and will be removed in a future release
WARNING: A terminally deprecated method in java.lang.System has been called
WARNING: System::setSecurityManager has been called by java.lang.System
WARNING: Please consider reporting this to the maintainers of java.lang.System
WARNING: System::setSecurityManager will be removed in a future release

org.apache.lucene.index.TestIndexWriterThreadsToSegments > testSegmentCountOnFlushRandom {seed=[F7B4CD7A5624D5EC:59A28958CC8D8396]} FAILED
    com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=126, name=Thread-97, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]

        Caused by:
        java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
            at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
            at app//org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)

            Caused by:
            java.util.concurrent.BrokenBarrierException
                at java.base@17.0.9-internal/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
                at java.base@17.0.9-internal/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
                at app//org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)

    com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=127, name=Thread-98, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]

        Caused by:
        java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
            at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
            at app//org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)

            Caused by:
            java.util.concurrent.BrokenBarrierException
                at java.base@17.0.9-internal/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
                at java.base@17.0.9-internal/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
                at app//org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)

    com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=130, name=Thread-101, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]

        Caused by:
        java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
            at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
            at app//org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)

            Caused by:
            java.util.concurrent.BrokenBarrierException
                at java.base@17.0.9-internal/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
                at java.base@17.0.9-internal/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
                at app//org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)

    com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=125, name=Thread-96, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]

        Caused by:
        java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
            at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
            at app//org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)

            Caused by:
            java.util.concurrent.BrokenBarrierException
                at java.base@17.0.9-internal/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
                at java.base@17.0.9-internal/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
                at app//org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)

    com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=129, name=Thread-100, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]

        Caused by:
        java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
            at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
            at app//org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)

            Caused by:
            java.util.concurrent.BrokenBarrierException
                at java.base@17.0.9-internal/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
                at java.base@17.0.9-internal/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
                at app//org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)

    com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=122, name=Thread-93, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]

        Caused by:
        java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
            at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
            at app//org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)

            Caused by:
            java.util.concurrent.BrokenBarrierException
                at java.base@17.0.9-internal/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
                at java.base@17.0.9-internal/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
                at app//org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)

    com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=131, name=Thread-102, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]

        Caused by:
        java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
            at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
            at app//org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)

            Caused by:
            java.util.concurrent.BrokenBarrierException
                at java.base@17.0.9-internal/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
                at java.base@17.0.9-internal/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
                at app//org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)

    com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=123, name=Thread-94, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]

        Caused by:
        java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
            at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
            at app//org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)

            Caused by:
            java.util.concurrent.BrokenBarrierException
                at java.base@17.0.9-internal/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
                at java.base@17.0.9-internal/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
                at app//org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)

    com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=128, name=Thread-99, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]

        Caused by:
        java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
            at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
            at app//org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)

            Caused by:
            java.util.concurrent.BrokenBarrierException
                at java.base@17.0.9-internal/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
                at java.base@17.0.9-internal/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
                at app//org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)

    java.lang.AssertionError
        at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
        at app//org.junit.Assert.fail(Assert.java:87)
        at app//org.junit.Assert.assertTrue(Assert.java:42)
        at app//org.junit.Assert.assertTrue(Assert.java:53)
        at app//org.apache.lucene.index.TestIndexWriterThreadsToSegments$CheckSegmentCount.run(TestIndexWriterThreadsToSegments.java:150)
        at java.base@17.0.9-internal/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:222)
        at java.base@17.0.9-internal/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
        at app//org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)
<============-> 93% EXECUTING [3m 57s]
> :lucene:core:test > Executing test org.apache.lucene.index.TestIndexWriterThreadsToSegments
> IDLE
> IDLE
> :lucene:core:test > 18 tests completed, 1 failed

…ll` on a non-empty queue. Before this change, `ConcurrentApproximatePriorityQueue#poll` could sometimes return `null` even though the queue was empty at no point in time. The practical implication is that we can end up with more DWPTs in memory than indexing threads, which, while not a bug, is probably not efficient. I ran luceneutil's `IndexGeonames` with this change and could not notice a slowdown while this benchmark has been good at exhibiting contention issues in the past. This is likely due to the fact that most of the time, a DWPT can be obtained from the pool using the optimistic path, which doesn't block. Closes apache#12649 apache#12916

…ll` on a non-empty queue. Before this change, `ConcurrentApproximatePriorityQueue#poll` could sometimes return `null` even though the queue was empty at no point in time. The practical implication is that we can end up with more DWPTs in memory than indexing threads, which, while not strictly a bug, may require doing more merging than we'd like later on. I ran luceneutil's `IndexGeonames` with this change, and `ConcurrentApproximatePriorityQueue#poll` was not the main source of contention. I instrumented the code to check how many DWPTs got pulled from the queue using the optimistic path vs. pessimistic path and got 8525598 for the optimistic path vs. 12247 for the pessimistic path. Closes apache#12649 apache#12916

jpountz · 2023-12-21T11:02:07Z

I was not able to reproduce the failure, but could create an isolated test on ConcurrentApproximatePriorityQueue that consistently reproduces what I think is the same problem. I opened a PR at #12959.

uschindler · 2023-12-21T11:42:35Z

Ok, maybe the issue happens more often with OpenJ9 or AMD Ryzen CPUs. Should I take your PR and check with beasting on Policeman's Ryzen to see if I can reproduce with the above command?

I can also test OpenJ9.

uschindler · 2023-12-21T11:45:47Z

Is there an explanation why it works better when making the method synchronized as suggested by the OpenJ9 people?

jpountz · 2023-12-21T13:08:52Z

Should I take your PR and check with beasting on Policeman's Ryzen to see if I can reproduce with the above command?

That would be great.

I have a Ryzen too, though a AMD Ryzen 9 3900X. I'm not sure what factors influence the reproducibility of this bug. The good news is that the new test on #12959 fails 100% of the time for me without the fix.

Is there an explanation why it works better when making the method synchronized as suggested by the OpenJ9 people?

My understanding is that the way that the bug manifests looks like this:

2 threads T1 and T2 doing indexing,
a ConcurrentyApproximatePriorityQueue with concurrency = 2, ie. 2 sub queues, initially empty

And then the following sequence of events:
1 T1 tries to pull a DWPT from the queue, it's empty so DocumentsWriterPerThreadPool creates a new DWPT. There is now 1 DWPT.
2 T2 tries to pull a DWPT from the queue, it's empty (because the only one created so far is taken by T1) so DocumentsWriterPerThreadPool creates a new DWPT. There are now 2 DWPTs.
3 T2 indexes a document
4 T1 indexes a document
5 T2 puts the DWPT back into the sub queue at index 1
6 T1 puts the DWPT back into the sub queue at index 0
7 T1 starts pulling a DWPT from the queue
8 T2 pulls the DWPT from the sub queue at index 0
9 T1 checks the sub queue at index 0, it's empty (since T2 just took it) so it keeps going
10 T2 indexes a document
11 T2 puts the DWPT back into the sub queue at index 0
12 T2 pulls the DWPT from the sub queue at index 1
13 T1 checks the sub queue at index 1, it's empty (since T2 just took it) so it keeps going
14 T1 checked all sub queues and could not get a DWPT, so DocumentsWriterPerThreadPool creates a one. There are now 3 DWPTs even though there are only 2 indexing threads.

Making the method synchronized helps by forcing T2 to wait until T1 is done pulling a DWPT from the queue before starting doing the same. The PR tries to go a bit more granular to limit contention.

uschindler · 2023-12-21T14:48:34Z

OK, it is running now with above command line (beasting) on OpenJDK Temurin 17.0.9.

uschindler · 2023-12-21T16:25:11Z

Hi,
it still fails - this time in org.apache.lucene.index.TestIndexWriterThreadsToSegments$CheckSegmentCount.run(TestIndexWriterThreadsToSegments.java:150

It is a different assertion error without message.

> Task :lucene:core:test_1
WARNING: A command line option has enabled the Security Manager
WARNING: The Security Manager is deprecated and will be removed in a future release
:lucene:core:test_1 (SUCCESS): 1000 test(s)

> Task :lucene:core:test_10
WARNING: A command line option has enabled the Security Manager
WARNING: The Security Manager is deprecated and will be removed in a future release
:lucene:core:test_10 (SUCCESS): 1000 test(s)

> Task :lucene:core:test_100
WARNING: A command line option has enabled the Security Manager
WARNING: The Security Manager is deprecated and will be removed in a future release
:lucene:core:test_100 (SUCCESS): 1000 test(s)

> Task :lucene:core:test_11
WARNING: A command line option has enabled the Security Manager
WARNING: The Security Manager is deprecated and will be removed in a future release
:lucene:core:test_11 (SUCCESS): 1000 test(s)

> Task :lucene:core:test_12
WARNING: A command line option has enabled the Security Manager
WARNING: The Security Manager is deprecated and will be removed in a future release

org.apache.lucene.index.TestIndexWriterThreadsToSegments > testSegmentCountOnFlushRandom {seed=[F7B4CD7A5624D5EC:C3D392823281AA8D]} FAILED
    com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=67, name=Thread-42, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]

        Caused by:
        java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
            at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
            at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)

            Caused by:
            java.util.concurrent.BrokenBarrierException
                at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
                at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
                at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)

    com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=64, name=Thread-39, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]

        Caused by:
        java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
            at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
            at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)

            Caused by:
            java.util.concurrent.BrokenBarrierException
                at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
                at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
                at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)

    com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=65, name=Thread-40, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]

        Caused by:
        java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
            at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
            at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)

            Caused by:
            java.util.concurrent.BrokenBarrierException
                at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
                at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
                at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)

    com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=61, name=Thread-36, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]

        Caused by:
        java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
            at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
            at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)

            Caused by:
            java.util.concurrent.BrokenBarrierException
                at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
                at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
                at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)

    com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=62, name=Thread-37, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]

        Caused by:
        java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
            at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
            at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)

            Caused by:
            java.util.concurrent.BrokenBarrierException
                at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
                at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
                at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)

    com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=60, name=Thread-35, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]

        Caused by:
        java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
            at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
            at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)

            Caused by:
            java.util.concurrent.BrokenBarrierException
                at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
                at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
                at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)

    com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=59, name=Thread-34, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]

        Caused by:
        java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
            at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
            at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)

            Caused by:
            java.util.concurrent.BrokenBarrierException
                at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
                at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
                at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)

    com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=58, name=Thread-33, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]

        Caused by:
        java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
            at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
            at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)

            Caused by:
            java.util.concurrent.BrokenBarrierException
                at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
                at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
                at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)

    com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=63, name=Thread-38, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]

        Caused by:
        java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
            at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
            at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)

            Caused by:
            java.util.concurrent.BrokenBarrierException
                at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
                at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
                at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)

    java.lang.AssertionError
        at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
        at org.junit.Assert.fail(Assert.java:87)
        at org.junit.Assert.assertTrue(Assert.java:42)
        at org.junit.Assert.assertTrue(Assert.java:53)
        at org.apache.lucene.index.TestIndexWriterThreadsToSegments$CheckSegmentCount.run(TestIndexWriterThreadsToSegments.java:150)
        at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:222)
        at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
        at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)

org.apache.lucene.index.TestIndexWriterThreadsToSegments > test suite's output saved to /home/thetaphi/repro/lucene/lucene/core/build/test-results/test_12/outputs/OUTPUT-org.apache.lucene.index.TestIndexWriterThreadsToSegments.txt, copied below:
  2> Dec 21, 2023 5:16:25 AM com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException
  2> WARNING: Uncaught exception in thread: Thread[Thread-40,5,TGRP-TestIndexWriterThreadsToSegments]
  2> java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
  2>    at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
  2>    at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)
  2> Caused by: java.util.concurrent.BrokenBarrierException
  2>    at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
  2>    at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
  2>    at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)
  2>
  2> Dec 21, 2023 5:16:25 AM com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException
  2> WARNING: Uncaught exception in thread: Thread[Thread-33,5,TGRP-TestIndexWriterThreadsToSegments]
  2> java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
  2>    at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
  2>    at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)
  2> Caused by: java.util.concurrent.BrokenBarrierException
  2>    at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
  2>    at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
  2>    at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)
  2>
  2> Dec 21, 2023 5:16:25 AM com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException
  2> WARNING: Uncaught exception in thread: Thread[Thread-41,5,TGRP-TestIndexWriterThreadsToSegments]
  2> java.lang.AssertionError
  2>    at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
  2>    at org.junit.Assert.fail(Assert.java:87)
  2>    at org.junit.Assert.assertTrue(Assert.java:42)
  2>    at org.junit.Assert.assertTrue(Assert.java:53)
  2>    at org.apache.lucene.index.TestIndexWriterThreadsToSegments$CheckSegmentCount.run(TestIndexWriterThreadsToSegments.java:150)
  2>    at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:222)
  2>    at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
  2>    at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)
  2>
  2> Dec 21, 2023 5:16:25 AM com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException
  2> WARNING: Uncaught exception in thread: Thread[Thread-37,5,TGRP-TestIndexWriterThreadsToSegments]
  2> java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
  2>    at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
  2>    at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)
  2> Caused by: java.util.concurrent.BrokenBarrierException
  2>    at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
  2>    at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
  2>    at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)
  2>
  2> Dec 21, 2023 5:16:25 AM com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException
  2> WARNING: Uncaught exception in thread: Thread[Thread-36,5,TGRP-TestIndexWriterThreadsToSegments]
  2> java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
  2>    at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
  2>    at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)
  2> Caused by: java.util.concurrent.BrokenBarrierException
  2>    at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
  2>    at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
  2>    at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)
  2>
  2> Dec 21, 2023 5:16:25 AM com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException
  2> WARNING: Uncaught exception in thread: Thread[Thread-34,5,TGRP-TestIndexWriterThreadsToSegments]
  2> java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
  2>    at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
  2>    at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)
  2> Caused by: java.util.concurrent.BrokenBarrierException
  2>    at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
  2>    at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
  2>    at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)
  2>
  2> Dec 21, 2023 5:16:25 AM com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException
  2> WARNING: Uncaught exception in thread: Thread[Thread-35,5,TGRP-TestIndexWriterThreadsToSegments]
  2> java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
  2>    at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
  2>    at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)
  2> Caused by: java.util.concurrent.BrokenBarrierException
  2>    at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
  2>    at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
  2>    at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)
  2>
  2> Dec 21, 2023 5:16:25 AM com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException
  2> WARNING: Uncaught exception in thread: Thread[Thread-38,5,TGRP-TestIndexWriterThreadsToSegments]
  2> java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
  2>    at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
  2>    at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)
  2> Caused by: java.util.concurrent.BrokenBarrierException
  2>    at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
  2>    at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
  2>    at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)
  2>
  2> Dec 21, 2023 5:16:25 AM com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException
  2> WARNING: Uncaught exception in thread: Thread[Thread-42,5,TGRP-TestIndexWriterThreadsToSegments]
  2> java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
  2>    at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
  2>    at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)
  2> Caused by: java.util.concurrent.BrokenBarrierException
  2>    at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
  2>    at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
  2>    at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)
  2>
  2> Dec 21, 2023 5:16:25 AM com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException
  2> WARNING: Uncaught exception in thread: Thread[Thread-39,5,TGRP-TestIndexWriterThreadsToSegments]
  2> java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
  2>    at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
  2>    at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)
  2> Caused by: java.util.concurrent.BrokenBarrierException
  2>    at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
  2>    at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
  2>    at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)
  2>
   >     com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=67, name=Thread-42, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]
   >
   >         Caused by:
   >         java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
   >             at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
   >             at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)
   >
   >             Caused by:
   >             java.util.concurrent.BrokenBarrierException
   >                 at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
   >                 at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
   >                 at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)
   >
   >     com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=64, name=Thread-39, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]
   >
   >         Caused by:
   >         java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
   >             at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
   >             at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)
   >
   >             Caused by:
   >             java.util.concurrent.BrokenBarrierException
   >                 at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
   >                 at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
   >                 at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)
   >
   >     com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=65, name=Thread-40, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]
   >
   >         Caused by:
   >         java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
   >             at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
   >             at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)
   >
   >             Caused by:
   >             java.util.concurrent.BrokenBarrierException
   >                 at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
   >                 at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
   >                 at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)
   >
   >     com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=61, name=Thread-36, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]
   >
   >         Caused by:
   >         java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
   >             at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
   >             at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)
   >
   >             Caused by:
   >             java.util.concurrent.BrokenBarrierException
   >                 at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
   >                 at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
   >                 at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)
   >
   >     com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=62, name=Thread-37, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]
   >
   >         Caused by:
   >         java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
   >             at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
   >             at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)
   >
   >             Caused by:
   >             java.util.concurrent.BrokenBarrierException
   >                 at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
   >                 at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
   >                 at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)
   >
   >     com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=60, name=Thread-35, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]
   >
   >         Caused by:
   >         java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
   >             at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
   >             at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)
   >
   >             Caused by:
   >             java.util.concurrent.BrokenBarrierException
   >                 at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
   >                 at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
   >                 at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)
   >
   >     com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=59, name=Thread-34, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]
   >
   >         Caused by:
   >         java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
   >             at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
   >             at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)
   >
   >             Caused by:
   >             java.util.concurrent.BrokenBarrierException
   >                 at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
   >                 at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
   >                 at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)
   >
   >     com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=58, name=Thread-33, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]
   >
   >         Caused by:
   >         java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
   >             at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
   >             at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)
   >
   >             Caused by:
   >             java.util.concurrent.BrokenBarrierException
   >                 at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
   >                 at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
   >                 at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)
   >
   >     com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=63, name=Thread-38, state=RUNNABLE, group=TGRP-TestIndexWriterThreadsToSegments]
   >
   >         Caused by:
   >         java.lang.RuntimeException: java.util.concurrent.BrokenBarrierException
   >             at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
   >             at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:239)
   >
   >             Caused by:
   >             java.util.concurrent.BrokenBarrierException
   >                 at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:252)
   >                 at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
   >                 at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)
   >
   >     java.lang.AssertionError
   >         at __randomizedtesting.SeedInfo.seed([F7B4CD7A5624D5EC]:0)
   >         at org.junit.Assert.fail(Assert.java:87)
   >         at org.junit.Assert.assertTrue(Assert.java:42)
   >         at org.junit.Assert.assertTrue(Assert.java:53)
   >         at org.apache.lucene.index.TestIndexWriterThreadsToSegments$CheckSegmentCount.run(TestIndexWriterThreadsToSegments.java:150)
   >         at java.base/java.util.concurrent.CyclicBarrier.dowait(CyclicBarrier.java:222)
   >         at java.base/java.util.concurrent.CyclicBarrier.await(CyclicBarrier.java:364)
   >         at org.apache.lucene.index.TestIndexWriterThreadsToSegments$2.run(TestIndexWriterThreadsToSegments.java:236)
  2> NOTE: reproduce with: gradlew test --tests TestIndexWriterThreadsToSegments.testSegmentCountOnFlushRandom -Dtests.seed=F7B4CD7A5624D5EC -Dtests.locale=en-DK -Dtests.timezone=HST -Dtests.asserts=true -Dtests.file.encoding=UTF-8

:lucene:core:test_12 (FAILURE): 5 test(s), 1 failure(s), 1 skipped

5 tests completed, 1 failed, 1 skipped

uschindler · 2023-12-21T17:50:00Z

Nah it is exactly same error.

I verified I am on right branch:

thetaphi@serv1:~/repro/lucene$ git status
On branch concurrent_pq_poll_non_null_on_non_empty_queue
Your branch is up to date with 'jpountz/concurrent_pq_poll_non_null_on_non_empty_queue'.

nothing to commit, working tree clean

jpountz · 2023-12-21T18:18:41Z

Thanks I'll keep digging.

uschindler · 2023-12-21T18:21:47Z

Did you run my above "beasting" command? It runs everything 100 times with each 1000 iterations. For me it fails in most cases after 3rd or fourth try.

uschindler · 2023-12-21T18:22:23Z

Also keep to have the machine busy at same moment (Jenkins jobs are running at same time). This may increase chance to hit this.

jpountz · 2023-12-21T20:45:13Z

Did you run my above "beasting" command? It runs everything 100 times with each 1000 iterations. For me it fails in most cases after 3rd or fourth try.

I tried it this morning and stopped it after some time (I don't remember how long, but tens of minutes). I just started it again and no failures in the first 6 rounds (40 minutes), I'll let it run overnight to see if I can get it to fail.

However I can get the test to fail with OpenJ9. Maybe I can use it to better understand what is happening.

Also keep to have the machine busy at same moment (Jenkins jobs are running at same time). This may increase chance to hit this.

Good idea.

jpountz · 2023-12-23T06:20:20Z

It took a long time (3.5h) but it ended up failing. And then I could also get it to fail on my branch, with both Hotspot and J9. I'll keep looking, most likely after the holidays.

uschindler · 2024-01-11T18:28:53Z

The latest PR seems to fix the issue.

…on a non-empty queue. (#12959) Before this change, `DocumentsWriterPerThread#getAndLock` could sometimes return `null` even though the queue was empty at no point in time. The practical implication is that we can end up with more DWPTs in memory than indexing threads, which, while not strictly a bug, may require doing more merging than we'd like later on. I ran luceneutil's `IndexGeonames` with this change, and `DocumentsWriterPerThread#getAndLock` was not the main source of contention. Closes #12649 #12916

…on a non-empty queue. (apache#12959) Before this change, `DocumentsWriterPerThread#getAndLock` could sometimes return `null` even though the queue was empty at no point in time. The practical implication is that we can end up with more DWPTs in memory than indexing threads, which, while not strictly a bug, may require doing more merging than we'd like later on. I ran luceneutil's `IndexGeonames` with this change, and `DocumentsWriterPerThread#getAndLock` was not the main source of contention. Closes apache#12649 apache#12916

jpountz · 2024-02-12T07:40:00Z

Closing as this issue has been addressed.

mikemccand added the type:bug label Dec 11, 2023

github-project-automation bot added this to OpenSearch Lucene & Core Performance Tracking Dec 11, 2023

github-project-automation bot moved this to Open in OpenSearch Lucene & Core Performance Tracking Dec 11, 2023

uschindler self-assigned this Dec 18, 2023

uschindler added this to the 9.10.0 milestone Dec 18, 2023

jpountz mentioned this issue Dec 21, 2023

Make sure DocumentsWriterPerThread#getAndLock never returns null on a non-empty queue. #12959

Merged

jpountz closed this as completed Feb 12, 2024

github-project-automation bot moved this from Open to Closed in OpenSearch Lucene & Core Performance Tracking Feb 12, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Concurrency bug `DocumentsWriterPerThreadPool.getAndLock()` uncovered by OpenJ9 test failures? #12916

Concurrency bug `DocumentsWriterPerThreadPool.getAndLock()` uncovered by OpenJ9 test failures? #12916

mikemccand commented Dec 11, 2023

uschindler commented Dec 18, 2023

jpountz commented Dec 20, 2023

uschindler commented Dec 20, 2023

uschindler commented Dec 20, 2023

jpountz commented Dec 20, 2023

uschindler commented Dec 20, 2023

jpountz commented Dec 20, 2023

uschindler commented Dec 20, 2023 •

edited

Loading

uschindler commented Dec 20, 2023 •

edited

Loading

singh264 commented Dec 20, 2023

jpountz commented Dec 21, 2023

uschindler commented Dec 21, 2023

uschindler commented Dec 21, 2023

jpountz commented Dec 21, 2023

uschindler commented Dec 21, 2023

uschindler commented Dec 21, 2023

uschindler commented Dec 21, 2023

jpountz commented Dec 21, 2023

uschindler commented Dec 21, 2023

uschindler commented Dec 21, 2023

jpountz commented Dec 21, 2023 •

edited

Loading

jpountz commented Dec 23, 2023

uschindler commented Jan 11, 2024

jpountz commented Feb 12, 2024

Concurrency bug DocumentsWriterPerThreadPool.getAndLock() uncovered by OpenJ9 test failures? #12916

Concurrency bug DocumentsWriterPerThreadPool.getAndLock() uncovered by OpenJ9 test failures? #12916

Comments

mikemccand commented Dec 11, 2023

Description

Version and environment details

uschindler commented Dec 18, 2023

jpountz commented Dec 20, 2023

uschindler commented Dec 20, 2023

uschindler commented Dec 20, 2023

jpountz commented Dec 20, 2023

uschindler commented Dec 20, 2023

jpountz commented Dec 20, 2023

uschindler commented Dec 20, 2023 • edited Loading

uschindler commented Dec 20, 2023 • edited Loading

singh264 commented Dec 20, 2023

jpountz commented Dec 21, 2023

uschindler commented Dec 21, 2023

uschindler commented Dec 21, 2023

jpountz commented Dec 21, 2023

uschindler commented Dec 21, 2023

uschindler commented Dec 21, 2023

uschindler commented Dec 21, 2023

jpountz commented Dec 21, 2023

uschindler commented Dec 21, 2023

uschindler commented Dec 21, 2023

jpountz commented Dec 21, 2023 • edited Loading

jpountz commented Dec 23, 2023

uschindler commented Jan 11, 2024

jpountz commented Feb 12, 2024

Concurrency bug `DocumentsWriterPerThreadPool.getAndLock()` uncovered by OpenJ9 test failures? #12916

Concurrency bug `DocumentsWriterPerThreadPool.getAndLock()` uncovered by OpenJ9 test failures? #12916

uschindler commented Dec 20, 2023 •

edited

Loading

uschindler commented Dec 20, 2023 •

edited

Loading

jpountz commented Dec 21, 2023 •

edited

Loading