
Make sure DocumentsWriterPerThread#getAndLock never returns null on a non-empty queue. #12959

Merged

Conversation

@jpountz (Contributor) commented Dec 21, 2023

Before this change, `DocumentsWriterPerThread#getAndLock` could sometimes
return `null` even though the queue was never empty at any point in time. The
practical implication is that we can end up with more DWPTs in memory than
indexing threads, which, while not strictly a bug, may require doing more
merging than we'd like later on.

I ran luceneutil's `IndexGeonames` with this change, and
`DocumentsWriterPerThread#getAndLock` was not the main source of
contention.

Closes #12649 #12916

Make sure `ConcurrentApproximatePriorityQueue#poll` never returns `null` on a non-empty queue.

Before this change, `ConcurrentApproximatePriorityQueue#poll` could sometimes
return `null` even though the queue was never empty at any point in time. The
practical implication is that we can end up with more DWPTs in memory than
indexing threads, which, while not strictly a bug, may require doing more
merging than we'd like later on.

I ran luceneutil's `IndexGeonames` with this change, and
`ConcurrentApproximatePriorityQueue#poll` was not the main source of
contention. I instrumented the code to check how many DWPTs got pulled from the
queue using the optimistic path vs. pessimistic path and got 8525598 for the
optimistic path vs. 12247 for the pessimistic path.

Closes apache#12649 apache#12916
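
To make the optimistic vs. pessimistic distinction above concrete, here is a minimal, hypothetical sketch of a lock-striped queue with two poll paths. The class and member names are invented for illustration and this is not the actual `ConcurrentApproximatePriorityQueue` code; it only shows the general shape: a cheap `tryLock` scan first, and a blocking scan as the fallback.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch, not the Lucene implementation: entries live in a few
// lock-guarded sub-queues; poll() tries an optimistic pass before falling
// back to a pessimistic one.
final class StripedQueueSketch<T> {
  private final ReentrantLock[] locks;
  private final Deque<T>[] slots;

  @SuppressWarnings("unchecked")
  StripedQueueSketch(int stripes) {
    locks = new ReentrantLock[stripes];
    slots = (Deque<T>[]) new Deque[stripes];
    for (int i = 0; i < stripes; i++) {
      locks[i] = new ReentrantLock();
      slots[i] = new ArrayDeque<>();
    }
  }

  void add(T entry, int stripe) {
    locks[stripe].lock();
    try {
      slots[stripe].addLast(entry);
    } finally {
      locks[stripe].unlock();
    }
  }

  T poll() {
    // Optimistic path: only touch sub-queues whose lock is free right now.
    // In the instrumentation above, this kind of path served the vast
    // majority of polls (8525598 vs. 12247).
    for (int i = 0; i < locks.length; i++) {
      if (locks[i].tryLock()) {
        try {
          T entry = slots[i].pollFirst();
          if (entry != null) {
            return entry;
          }
        } finally {
          locks[i].unlock();
        }
      }
    }
    // Pessimistic path: block on each lock in turn so a sub-queue that was
    // briefly held by another thread is not skipped.
    for (int i = 0; i < locks.length; i++) {
      locks[i].lock();
      try {
        T entry = slots[i].pollFirst();
        if (entry != null) {
          return entry;
        }
      } finally {
        locks[i].unlock();
      }
    }
    return null;
  }
}
```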
@uschindler (Contributor)

It looks like this does not fix the issue in #12916. The bug is real, but it does not look like the cause of #12916.

@dweiss (Contributor) commented Dec 21, 2023

For what it's worth, I've tried reproducing this on Adrien's branch with:

gradlew -p lucene/core -Dtests.seed=F7B4CD7A5624D5EC beast --tests TestIndexWriterThreadsToSegments.testSegmentCountOnFlushRandom -Dtests.jvmargs="-XX:+UseCompressedOops" -Ptests.iters=1000 -Ptests.dups=100

but nope, no fails.

@uschindler (Contributor)

I got it to fail with both OpenJ9 and Hotspot on this branch; see #12916.

@jpountz (Contributor, Author) commented Dec 23, 2023

OK, I think I understand why the test is still failing: it seems to be because we unlock the DWPT after putting it back into the queue. But there is also a good reason to do it that way. I need to think more about it.
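
As an aside, the interleaving described above can be walked through deterministically. The sketch below is hypothetical (the `FakeDwpt` class and the `Semaphore` standing in for the DWPT lock are invented for illustration; this is not Lucene code): because the DWPT is re-enqueued before its lock is released, another thread can poll it from a non-empty queue, fail to lock it, and wrongly conclude that nothing is available.

```java
import java.util.ArrayDeque;
import java.util.concurrent.Semaphore;

// Hypothetical walk-through of the race, compressed into a single thread.
public class SpuriousNullDemo {
  static final class FakeDwpt {
    // Non-reentrant stand-in for the DWPT lock.
    final Semaphore lock = new Semaphore(1);
  }

  public static void main(String[] args) {
    ArrayDeque<FakeDwpt> pool = new ArrayDeque<>();
    FakeDwpt dwpt = new FakeDwpt();

    // "Thread A": checks the DWPT out and holds its lock while indexing.
    dwpt.lock.acquireUninterruptibly();

    // "Thread A": puts the DWPT back into the pool before releasing the lock.
    pool.addLast(dwpt);

    // "Thread B": the pool is non-empty, so polling succeeds ...
    FakeDwpt polled = pool.pollFirst();

    // ... but the lock is still held, so tryAcquire fails and a naive
    // getAndLock would conclude nothing is available and return null.
    System.out.println("B locked the polled DWPT: " + polled.lock.tryAcquire()); // false

    // Only now does "thread A" release the lock; too late for thread B.
    dwpt.lock.release();
  }
}
```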

@jpountz (Contributor, Author) commented Jan 8, 2024

I have a new iteration of this change, which now also accounts for how `DocumentsWriterPerThreadPool` may lock `DocumentsWriterPerThread` instances while they are still in the pool. I'm at 2 hours of beasting with J9 without a failure so far.

@jpountz (Contributor, Author) commented Jan 11, 2024

If there are no concerns, I plan on merging this PR soon.

@uschindler (Contributor)

Hi,
sorry, I missed testing this. I have started my beasting runs and will try to reproduce it.
Will report back.

@uschindler (Contributor) left a comment

I beasted this PR and it produced no failures on my Ryzen CPU under load. So it looks like it fixes the issue.

The AtomicInteger is only ever incremented, so at some point it will overflow. But from analyzing the code, the actual number does not matter; it just needs to be different to trigger a retry, correct? So when it overflows it does not do any harm.
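
For what the overflow question is worth, here is a minimal sketch of the retry pattern as described, under the assumption (not verified against the actual Lucene code) that the counter is bumped on every add and compared only for equality in poll. Under that assumption, wrap-around at `Integer.MAX_VALUE` is indeed harmless, since only "did the value change" matters, never its magnitude.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Assumed sketch of the counter-based retry, not the actual Lucene code.
final class VersionedPollSketch<T> {
  private final AtomicInteger version = new AtomicInteger();

  // Called whenever an entry is added; may eventually overflow, which is fine
  // because the value is only ever compared for (in)equality below.
  void onAdd() {
    version.incrementAndGet();
  }

  // tryPoll is whatever single-pass poll the underlying queue implements.
  T pollOrRetry(Supplier<T> tryPoll) {
    while (true) {
      int observed = version.get();
      T entry = tryPoll.get();
      if (entry != null) {
        return entry;
      }
      // Nothing found and no add happened since we started: genuinely empty.
      if (version.get() == observed) {
        return null;
      }
      // An add raced with our scan; retry instead of returning null.
    }
  }
}
```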

@uschindler (Contributor)

FYI, I did not try to run the beasting for hours because previously it was failing very fast on a heavily loaded Ryzen CPU with both Hotspot and OpenJ9. Now it's stable, so I trust the results even after only 2 hours (1 hour with Hotspot, 1 hour with OpenJ9).

@jpountz changed the title from "Make sure ConcurrentApproximatePriorityQueue#poll never returns null on a non-empty queue." to "Make sure DocumentsWriterPerThread#getAndLock never returns null on a non-empty queue." on Jan 12, 2024.
@jpountz (Contributor, Author) commented Jan 12, 2024

Thanks a lot @uschindler !

@jpountz merged commit e0daca1 into apache:main on Jan 12, 2024 (4 checks passed).
@jpountz deleted the concurrent_pq_poll_non_null_on_non_empty_queue branch on January 12, 2024 at 15:21.
@jpountz added a commit that referenced this pull request on Jan 12, 2024: Make sure `DocumentsWriterPerThread#getAndLock` never returns `null` on a non-empty queue. (#12959)
@uschindler added this to the 9.10.0 milestone on Jan 12, 2024.
slow-J pushed a commit to slow-J/lucene that referenced this pull request on Jan 16, 2024: Make sure `DocumentsWriterPerThread#getAndLock` never returns `null` on a non-empty queue. (apache#12959)
Successfully merging this pull request may close these issues:

TestIndexWriterThreadsToSegments.testSegmentCountOnFlushRandom fails randomly