New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MINOR: use monotonic clock for replica fetcher DelayedItem #8517
base: trunk
Are you sure you want to change the base?
Conversation
We perform an unnecessary conversion in the replica fetcher hotpath.
@@ -33,7 +33,11 @@ class DelayedItem(val delayMs: Long) extends Delayed with Logging { | |||
* The remaining delay time | |||
*/ | |||
def getDelay(unit: TimeUnit): Long = { | |||
unit.convert(max(dueMs - Time.SYSTEM.milliseconds, 0), TimeUnit.MILLISECONDS) | |||
unit.convert(getDelayMs, TimeUnit.MILLISECONDS) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this still used after this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like it's only used in a test. So I suggest we remove it (as well as the Delayed implementation).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure thing. I'll drop it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ijuma I have removed getDelay and we no longer implement Delayed. In doing so I also switched the fetcher to a monotonic clock, as our existing implementation is dangerous.
ok to test |
Same test failed in both jobs:
Seems unrelated, but will retest just in case. |
retest this please |
Unrelated flaky test:
|
@@ -21,24 +21,19 @@ import java.util.concurrent._ | |||
|
|||
import org.apache.kafka.common.utils.Time | |||
|
|||
import scala.math._ | |||
class DelayedItem(val delayMs: Long, val time: Time) extends Logging { | |||
def this(delayMs: Long) = this(delayMs, Time.SYSTEM) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this? Can we not pass the time
from every caller?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can, I was just trying to avoid the replica fetcher having to pass Time.SYSTEM in everywhere. If you prefer that I can change it. Is there any downside what I did though?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should pass the actual time
instance so that we can mock if we want. It has one in its constructor already:
class ReplicaFetcherThread(name: String,
fetcherId: Int,
sourceBroker: BrokerEndPoint,
brokerConfig: KafkaConfig,
failedPartitions: FailedPartitions,
replicaMgr: ReplicaManager,
metrics: Metrics,
time: Time,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, yes, I was just coming back to say your suggestion was better for testability reasons.
ok to test |
@@ -21,24 +21,19 @@ import java.util.concurrent._ | |||
|
|||
import org.apache.kafka.common.utils.Time | |||
|
|||
import scala.math._ | |||
class DelayedItem(val delayMs: Long, val time: Time) extends Logging { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does time
need to be a public val
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. I'll make it private.
@@ -60,6 +61,7 @@ class ReplicaAlterLogDirsThreadTest { | |||
|
|||
when(replicaManager.futureLogExists(t1p0)).thenReturn(false) | |||
|
|||
val time = new SystemTime() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should use Time.SYSTEM
, not create a new instance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should consider making SystemTime
package private.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, I followed the abstract fetcher tests which created new SystemTime(s) too. I'll see if it's easy to make SystemTime package private.
I updated the PR to pass Time everywhere that's required. Note that I did not switch the fetcher tests to use MockTime as it appears they are timing dependent and may depend on AbstractFetcherThread's use of await, so I would want to make the change carefully. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
retest this please |
@ijuma can this be merged? |
ok to test |
@lbradstreet Compile errors:
|
@ijuma thanks, I probably should have merged in before asking. |
@ijuma the updated PR should compile OK. I removed the change to make SystemTime package private as it seems to be used in other tooling now. |
ok to test |
retest this please |
ok to test |
retest this please |
|
@chia7712 thanks, nice catch. That does make things trickier. |
Switches the replica fetcher to use a monotonic clock to calculate delays.
In doing so an unnecessary time conversion was removed the replica fetcher hot path. This is a small but significant (2.2%) cost when testing clusters with high replica counts when throttling has engaged (see attached profile).