-
Notifications
You must be signed in to change notification settings - Fork 9.1k
HDFS-16184. De-flake TestBlockScanner#testSkipRecentAccessFile #3329
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
💔 -1 overall
This message was automatically generated. |
|
🎊 +1 overall
This message was automatically generated. |
|
🎊 +1 overall
This message was automatically generated. |
|
Can you update the description with the reason for failure and detail about the fix |
|
Done @ayushtkn. Thanks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanx @virajjasani for the fix, I tried reproducing this with a simple sleep, and got the same exception, and with the your fix that got sorted.
Changes LGTM
Will wait some time for Takanobu before pushing this.
|
Thanks @ayushtkn for detailed review with cross-verification. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the nice fix, @virajjasani. LGTM.
|
Also, it's good to see two consecutive +1s from QA bot for hadoop-hdfs. Haven't seen more than single +1 from QA build for long time on trunk. |
|
Merged. Thanks for your contribution, @virajjasani. Thanks for your review, @ayushtkn.
That's true. QA results seem pretty good recently. Really thanks for fixing several flaky tests, @virajjasani. |
Reviewed-by: Ayush Saxena <ayushsaxena@apache.org> Signed-off-by: Takanobu Asanuma <tasanuma@apache.org> (cherry picked from commit 1b9927a)
Thanks @tasanuma for all the reviews!! |
…e#3329) Reviewed-by: Ayush Saxena <ayushsaxena@apache.org> Signed-off-by: Takanobu Asanuma <tasanuma@apache.org>
Description of PR
Test TestBlockScanner#testSkipRecentAccessFile is flaky:
Reason for failure:
Thread in VolumnScanner keeps scanning blocks. Before block scan, it executes
TestScanResultHandler#setupand it keeps waiting oninfoobject until main thread intestSkipRecentAccessFile()notifiesinfoobject and sets it'sshouldRunto true for VolumnScanner thread to successfully return fromTestScanResultHandler#setup. Now in main test, we try to assert thatinfo.blocksScannedstays 0 which is only possible if VolumnScanner's block scanner thread does not reachinfo.blocksScanned++point, which cannot be guaranteed always and hence this test is flaky.Fix:
In order to fix this, the main thread in
testSkipRecentAccessFile()should initializeinfo.semas Semaphore (permitting only single thread to take a lock, Mutex) and take a lock by main thread so that until released, block scan thread cannot reach pointinfo.blocksScanned++and hence the test never fails. Before the block scanner thread reach the point of incrementing blocksScanned count, it has to acquire Semaphore and it gets blocked here until main thread releases the lock.How was this patch tested?
Unit tests