Skip to content

Commit 56efdf8

Browse files
committed
[KYUUBI #2594] Fix flaky Test - support engine alive probe to fast fail on engine broken
### _Why are the changes needed?_ To close #2594 For the liveness probe, it marks the remote engine broken if the last alive time detection is timeout. ``` if (now - engineLastAlive > engineAliveTimeout) { error("Mark the engine not alive with no recent alive probe success:" + s" ${now - engineLastAlive} ms exceeds timeout $engineAliveTimeout ms") remoteEngineBroken = true } ``` But now the engineLastAlive variable is not volatile. So we set the engineLastAlive as volatile variable in this pr to prevent flaky test. And now we are using `scheduleAtFixedRate`, it will trigger the liveness detection in fixed rate and regardless whether the last one has completed. For scheduleWithFixedDelay, ``` Creates and executes a periodic action that becomes enabled first after the given initial delay, and subsequently with the given delay between the termination of one execution and the commencement of the next. If any execution of the task encounters an exception, subsequent executions are suppressed. Otherwise, the task will only terminate via cancellation or termination of the executor. ``` For scheduleAtFixedRate, ``` Creates and executes a periodic action that becomes enabled first after the given initial delay, and subsequently with the given period; that is executions will commence after initialDelay then initialDelay+period, then initialDelay + 2 * period, and so on. If any execution of the task encounters an exception, subsequent executions are suppressed. Otherwise, the task will only terminate via cancellation or termination of the executor. If any execution of this task takes longer than its period, then subsequent executions may start late, but will not concurrently execute. ``` We should use scheduleWithFixedDelay, so that the liveness probe detections are in sequence. ### _How was this patch tested?_ - [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible - [ ] Add screenshots for manual tests if appropriate - [x] [Run test](https://kyuubi.apache.org/docs/latest/develop_tools/testing.html#running-tests) locally before make a pull request Closes #2685 from turboFei/engine_alive. Closes #2594 0c96938 [Fei Wang] set engineLastAlive volatile 960d608 [Fei Wang] Use scheduleWithFixedDelay instead of scheduleAtFixedRate Authored-by: Fei Wang <fwang12@ebay.com> Signed-off-by: Fei Wang <fwang12@ebay.com>
1 parent b2495e9 commit 56efdf8

File tree

1 file changed

+2
-2
lines changed

1 file changed

+2
-2
lines changed

kyuubi-server/src/main/scala/org/apache/kyuubi/client/KyuubiSyncThriftClient.scala

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -53,7 +53,7 @@ class KyuubiSyncThriftClient private (
5353
@volatile private var remoteEngineBroken: Boolean = false
5454
private val engineAliveProbeClient = engineAliveProbeProtocol.map(new TCLIService.Client(_))
5555
private var engineAliveThreadPool: ScheduledExecutorService = _
56-
private var engineLastAlive: Long = _
56+
@volatile private var engineLastAlive: Long = _
5757

5858
private def startEngineAliveProbe(): Unit = {
5959
engineAliveThreadPool = ThreadUtils.newDaemonSingleThreadScheduledExecutor(
@@ -85,7 +85,7 @@ class KyuubiSyncThriftClient private (
8585
}
8686
}
8787
engineLastAlive = System.currentTimeMillis()
88-
engineAliveThreadPool.scheduleAtFixedRate(
88+
engineAliveThreadPool.scheduleWithFixedDelay(
8989
task,
9090
engineAliveProbeInterval,
9191
engineAliveProbeInterval,

0 commit comments

Comments
 (0)