New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
KAFKA-15481: Fix concurrency bug in RemoteIndexCache #14483
Merged
Merged
Changes from 14 commits
Commits
Show all changes
16 commits
Select commit
Hold shift + click to select a range
84fff3e
KAFKA-15481: Fix concurrency bug in RemoteIndexCache
jeel2420 79d0e8a
KAFKA-15481: Fix concurrency bug in RemoteIndexCache
jeel2420 0904561
KAFKA-15481: Fix concurrency bug in RemoteIndexCache
jeel2420 6a3bc42
KAFKA-15481: Fix concurrency bug in RemoteIndexCache
jeel2420 b338b45
Merge branch 'trunk' into KAFKA-15481
jeel2420 48a82eb
KAFKA-15481: Fix concurrency bug in RemoteIndexCache
jeel2420 8ea3985
Merge branch 'trunk' into KAFKA-15481
jeel2420 9211e6b
KAFKA-15481: Fix concurrency bug in RemoteIndexCache
jeel2420 cacf9f1
KAFKA-15481: Fix concurrency bug in RemoteIndexCache
jeel2420 07ca66b
KAFKA-15481: Fix concurrency bug in RemoteIndexCache
jeel2420 419f596
Merge branch 'trunk' into KAFKA-15481
jeel2420 851593d
KAFKA-15481: Fix concurrency bug in RemoteIndexCache
jeel2420 6b4fae6
KAFKA-15481: Fix concurrency bug in RemoteIndexCache
jeel2420 c35c3b2
KAFKA-15481: Fix concurrency bug in RemoteIndexCache
jeel2420 882864d
KAFKA-15481: Fix concurrency bug in RemoteIndexCache
jeel2420 71fcad5
KAFKA-15481: Fix concurrency bug in RemoteIndexCache
jeel2420 File filter
Filter by extension
Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jeel2420 Why we are invoking this again
invocation.callRealMethod()
It is already called in
712 line when(spyEntry).markForCleanup()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
invocation.callRealMethod() is called to call the markForCleanup() after read is called and before we start asserting to make sure indexes get renamed before we assert the results.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jeel2420
markForCleanUp
should be called only one time. We need to test the behaviour when there are concurrent read/remove happens on the cache for the same entry.In the test we just need to assert the way @showuon suggested
// So, maybe we verify with this:
if (Files.exists(entry.offsetIndex().file().toPath)) {
assertCacheSize(1)
} else {
assertCacheSize(0)
}
Calling '
markForCleanUp
' twice will always result in cacheSize 0 eventually.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@iit2009060 Read
markForCleanUp()
is not getting called twice. Please see the mock, inside that during the first execution ofmarkForCleanUp()
I am calling the actual markForCleanup() function (i.e Index are getting renamed) but for subsequent calls, mock is doing nothing so we actual markForCleanup() function to rename the indexes is getting called once only and it is as expected.I have verified this behaviour as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jeel2420 During read
MarkforCleanUp
should not be called not even once as per the functionality.Then why we need to call it explicitly again here
invocation.callRealMethod()
.I am seeing two invocation of
markForCleanUp
}).when(spyEntry).markForCleanup()
712 line noinvocation.callRealMethod()
708 line noThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, but they are "different"
markForCleanUp
.For (1), the
markForCleanUp
is an injected method for controlling the invoking order. So there are latches wait/countdown.For (2), it's the real
markForCleanUp
method to rename the cache files.The goal is to simulate the race condition happened in KAFKA-15481.
Yes, I think so. But in some cases, there could be 1 if getEntry goes after. The thread management are all decided by OS, we can't assure which one will go first, right?
I think the goal of this test is to make sure the issue in KAFKA-15481 will not happen again. That's why I added this comment.
I'm not following you here. What we're doing in this test is to
read and remove concurrently in the separate thread
. Aboutvalidate the cacheSize based on the order of execution
, since we can't assure which thread will be executed first, we can't do this, right? If we can decide the execution order, then it means they are not running concurrently, is that right?Maybe you can show us if it were you, what test you'll create. Some pseudo code are enough. Thank you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@showuon
For (1), the markForCleanUp is an injected method for controlling the invoking order. So there are latches wait/countdown.
Do you mean this is a mock method and no rename would happen in this case ?
Effectively the functionality/logic of
markCleanUp
is called one time only ?I was thinking something like this
In the test case mentioned in the jira KAFKA-15481
the execution order is remove,read and the overall result is cache size 0 which is wrong because of timegap between removal and renaming the files. Here we are validating the same with rsm call count. If they are atomic rsm execution should happen and files should be restored.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct.
Yes, they are basically similar with what we have now. By injecting mock implementation for
markForCleanUp
is just to make the 2 thread execution more close. In the end, what we have now is to invoke "realMethod", which is what you did above. I'm fine if you think we should validate the rsm call count. But again, they are basically testing the same thing.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@showuon Yes correct , it is testing the same thing. I am also fine. But from readability perspective the one I propose is simpler to understand and does not require any future change if markForCleanUp function changed. I left @jeel2420 to make a decision here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@iit2009060 As current test case is able to reproduce the case mentioned in the jira KAFKA-15481 I think we should be fine.
The only reason to have
markForCleanUp
mock is to have control over the 2 thread execution.