Fix the race condition in realtime text index refresh thread (#6858) #6990
The refresh thread maintains a single global queue of realtime text index readers for all consuming segments of all tables.
A reader is polled, refreshed and supposed to be added back to the queue so that it can be refreshed again in the next cycle.
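A minimal single-threaded sketch of the refresh cycle just described: poll a reader, refresh it, and add it back so the next cycle sees it again. `RefreshableReader` and `runOneCycle` are hypothetical names, not Pinot's API, and the locking discussed in the rest of this description is deliberately left out.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical stand-in for a realtime text index reader; not Pinot's interface.
interface RefreshableReader {
  void refresh();
}

class RefreshCycleSketch {
  // One pass over the readers queued at the start of the cycle: poll, refresh, re-queue.
  // No locking here; this only illustrates the queue's round-robin behavior.
  static void runOneCycle(Queue<RefreshableReader> readers) {
    int pending = readers.size(); // readers added back below are not revisited in this pass
    for (int i = 0; i < pending; i++) {
      RefreshableReader reader = readers.poll();
      reader.refresh();
      readers.add(reader); // added back so it is refreshed again in the next cycle
    }
  }
}
```

Capturing `pending` up front keeps a cycle from looping forever over readers it has just re-queued.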
There is a race condition: if the refresh thread finds the queue empty, but a reader is added before the thread locks the mutex, the signal is lost and the thread stays in await() until the next segment signals.
Sequence of steps for the race condition:
After the signal is missed, the refresh thread awaits a signal on the condition variable (this is the bug!) and may have to wait until some other segment signals it.
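The interleaving above can be replayed deterministically on a single thread. This is a hypothetical reduction using `java.util.concurrent.locks`, not Pinot code: `LostWakeupReplay`, the `"segment_0"` reader name and the 100 ms timeout are illustrative assumptions, and a bounded `await` stands in for the unbounded wait so the lost wakeup shows up as a timeout.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical names; replays the racy interleaving step by step on one thread.
class LostWakeupReplay {
  final Lock mutex = new ReentrantLock();
  final Condition notEmpty = mutex.newCondition();
  final Queue<String> readers = new ArrayDeque<>();

  // Returns true if the "refresh thread" was woken up, false if its wait timed out.
  boolean replay() throws InterruptedException {
    // Refresh thread: checks emptiness WITHOUT holding the lock (the bug).
    boolean sawEmpty = readers.isEmpty();

    // Segment thread runs in the gap: adds a reader and signals, but nobody waits yet.
    mutex.lock();
    try {
      readers.add("segment_0");
      notEmpty.signal(); // lost: no thread is waiting on the condition
    } finally {
      mutex.unlock();
    }

    // Refresh thread: acquires the lock and waits based on its stale observation.
    mutex.lock();
    try {
      if (sawEmpty) {
        // Bounded wait stands in for "wait until some other segment signals".
        return notEmpty.await(100, TimeUnit.MILLISECONDS);
      }
      return true;
    } finally {
      mutex.unlock();
    }
  }
}
```

`replay()` reports a timeout even though a reader is sitting in the queue: the decision to wait was based on a check made outside the lock.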
There are several ways to solve this problem. The fix in this PR works as follows:
i. When the refresh thread reaches https://github.com/apache/incubator-pinot/blob/master/pinot-segment-local/src/main/java/org/apache/pinot/segment/local/realtime/impl/invertedindex/RealtimeLuceneIndexReaderRefreshThread.java#L68, it tries to acquire the lock.
ii. If it acquires the lock first, it checks whether the queue is empty. If the queue is not empty, it skips the while loop; otherwise it enters the while loop and awaits the signal (releasing the lock while in await()).
iii. If the segment thread at https://github.com/apache/incubator-pinot/blob/master/pinot-segment-local/src/main/java/org/apache/pinot/segment/local/realtime/impl/invertedindex/RealtimeLuceneIndexRefreshState.java#L86 acquires the lock first, it adds the segment to the queue and signals the condition variable in case the refresh thread is waiting on it.
iv. In step iii, after the segment is added to the queue, the segment thread releases the lock. When the background refresh thread then acquires the lock, it evaluates the while condition (whether the queue is empty), which is now false, so it never waits. If an exception is thrown inside the while loop, the lock held by the background refresh thread is still released in the finally{} block, because the loop sits inside a try block.
Essentially, the mutex ensures that only one thread (either the background refresh thread or a segment thread) checks or updates the queue at any given moment.
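The locking discipline in steps i-iv follows the standard condition-variable pattern: test the predicate only while holding the lock, and re-test it in a `while` loop after every wakeup. The sketch below is a hypothetical reduction; `ReaderRefreshQueue`, `addReader` and `takeReader` are illustrative names, not the classes linked above.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical names; a reduction of the locking discipline described in steps i-iv.
class ReaderRefreshQueue<T> {
  private final Lock mutex = new ReentrantLock();
  private final Condition notEmpty = mutex.newCondition();
  private final Queue<T> readers = new ArrayDeque<>();

  // Segment thread (step iii): add under the lock, then signal a possible waiter.
  void addReader(T reader) {
    mutex.lock();
    try {
      readers.add(reader);
      notEmpty.signal();
    } finally {
      mutex.unlock(); // released so the refresh thread can acquire it (step iv)
    }
  }

  // Refresh thread (steps i-ii): the emptiness check happens while holding the lock.
  T takeReader() throws InterruptedException {
    mutex.lock(); // step i
    try {
      while (readers.isEmpty()) { // step ii: re-checked after every wakeup
        notEmpty.await(); // atomically releases the lock while waiting
      }
      return readers.poll();
    } finally {
      mutex.unlock(); // released even if await() throws
    }
  }
}
```

Because the emptiness check and `await()` run under the same lock that `addReader` holds while adding and signalling, a signal can no longer fall into the gap between the check and the wait.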
Issue #6858
cc @siddharthteotia @mcvsubbu