[KYUUBI #2102] Support to retry the internal thrift request call and add engine liveness probe to enable fast fail before retry
### _Why are the changes needed?_
To close #2102
Support retrying all internal thrift request calls (except RenewDelegationToken for now), and fail fast if the remote engine is unstable or not alive.
This PR also adds an engine liveness probe.
If the probe is enabled, a companion thrift client is created and opens a liveness probe session when the remote engine session is opened.
The companion client sends a simple thrift request (GetInfo) to check whether the remote engine is alive, so a call can fail fast before a retry if the remote engine is not connectable.
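The retry-then-fail-fast behaviour described above can be sketched roughly as follows. This is a minimal Python illustration, not the code in this PR: `TransportError`, `EngineNotAliveError`, `call_with_retry`, and the `engine_alive` callback are all hypothetical names chosen for the example.

```python
class TransportError(Exception):
    """Hypothetical stand-in for a raw transport failure (e.g. TTransportException)."""


class EngineNotAliveError(Exception):
    """Hypothetical error used to fail fast when the engine is reported dead."""


def call_with_retry(request, max_attempts, engine_alive):
    """Run request(); on a transport failure, check liveness before retrying.

    request      -- zero-argument callable that performs one thrift call
    max_attempts -- total number of attempts allowed (must be >= 1)
    engine_alive -- zero-argument callable backed by the liveness probe
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return request()
        except TransportError as err:
            last_error = err
            if attempt == max_attempts:
                break  # attempts exhausted, surface the last transport error
            # The liveness check runs only before a retry, never before the first call.
            if not engine_alive():
                raise EngineNotAliveError("engine is not alive, skipping retry") from err
    raise last_error
```

With a probe that reports a dead engine, the first transport failure ends the call immediately instead of consuming the remaining attempts.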
#### Why not use the same thrift client to check engine liveness before retry?
I tried that, but ran into an `out of resp sequence` error.
For example:
1. send a getOperationStatus request
2. the read times out
3. send a GetInfoType request
4. receive the getOperationStatus response (out of resp sequence)
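The four steps above can be reproduced with a toy model of a connection that tags each request with a sequence id. This is only a sketch of the failure mode; `FakeConnection` and `Client` are hypothetical classes and do not mirror the Thrift client implementation.

```python
from collections import deque


class FakeConnection:
    """Hypothetical connection: every request gets a reply, buffered until read."""

    def __init__(self):
        self.replies = deque()

    def send(self, seq_id, name):
        # The remote side answers in order; the reply waits in the buffer.
        self.replies.append((seq_id, name))

    def read(self):
        return self.replies.popleft()


class Client:
    """Hypothetical client that expects the reply to match its latest request."""

    def __init__(self, connection):
        self.connection = connection
        self.seq_id = 0

    def send(self, name):
        self.seq_id += 1
        self.connection.send(self.seq_id, name)
        return self.seq_id

    def receive(self, expected_seq_id):
        seq_id, name = self.connection.read()
        if seq_id != expected_seq_id:
            raise RuntimeError(
                f"{name} response out of sequence: got {seq_id}, expected {expected_seq_id}"
            )
        return name


# Shared client: the timed-out reply is still buffered when the probe is sent.
shared = Client(FakeConnection())
shared.send("GetOperationStatus")  # step 1; step 2: the read times out, reply stays unread
probe_id = shared.send("GetInfo")  # step 3
try:
    shared.receive(probe_id)       # step 4: reads the stale GetOperationStatus reply
    shared_result = "in sequence"
except RuntimeError:
    shared_result = "out of sequence"

# Companion client: its own connection holds only its own replies.
companion = Client(FakeConnection())
companion_result = companion.receive(companion.send("GetInfo"))
```

Because the companion client never shares a reply buffer with the operation client, a stale reply on one connection cannot be mistaken for the probe's answer on the other.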
### _How was this patch tested?_
- [x] Add some test cases that check the changes thoroughly including negative and positive cases if possible
- [ ] Add screenshots for manual tests if appropriate
- [x] [Run test](https://kyuubi.apache.org/docs/latest/develop_tools/testing.html#running-tests) locally before making a pull request
Closes #2122 from turboFei/retry_rpc.
Closes #2102
3926ba0 [Fei Wang] adress comments
ade4ede [Fei Wang] add timeout
1b7a64f [Fei Wang] Only check remote engine alive before retry
98e03f8 [Fei Wang] refactor
fac388c [Fei Wang] remove unused import
9c6d873 [Fei Wang] add ut
9b59565 [Fei Wang] Support to retry the thrift request and engine alive probe
Authored-by: Fei Wang <fwang12@ebay.com>
Signed-off-by: Fei Wang <fwang12@ebay.com>
docs/deployment/settings.md: 5 additions & 1 deletion
@@ -301,8 +301,9 @@ Key | Default | Meaning | Type | Since
<code>kyuubi.operation.plan.only.mode</code>|<div style='width: 65pt;word-wrap: break-word;white-space: normal'>NONE</div>|<div style='width: 170pt;word-wrap: break-word;white-space: normal'>Whether to perform the statement in a PARSE, ANALYZE, OPTIMIZE, PHYSICAL, EXECUTION only way without executing the query. When it is NONE, the statement will be fully executed</div>|<div style='width: 30pt'>string</div>|<div style='width: 20pt'>1.4.0</div>
<code>kyuubi.operation.query.timeout</code>|<div style='width: 65pt;word-wrap: break-word;white-space: normal'><undefined></div>|<div style='width: 170pt;word-wrap: break-word;white-space: normal'>Timeout for query executions at server-side, take affect with client-side timeout(`java.sql.Statement.setQueryTimeout`) together, a running query will be cancelled automatically if timeout. It's off by default, which means only client-side take fully control whether the query should timeout or not. If set, client-side timeout capped at this point. To cancel the queries right away without waiting task to finish, consider enabling kyuubi.operation.interrupt.on.cancel together.</div>|<div style='width: 30pt'>duration</div>|<div style='width: 20pt'>1.2.0</div>
<code>kyuubi.operation.scheduler.pool</code>|<div style='width: 65pt;word-wrap: break-word;white-space: normal'><undefined></div>|<div style='width: 170pt;word-wrap: break-word;white-space: normal'>The scheduler pool of job. Note that, this config should be used after change Spark config spark.scheduler.mode=FAIR.</div>|<div style='width: 30pt'>string</div>|<div style='width: 20pt'>1.1.1</div>
-<code>kyuubi.operation.status.polling.max.attempts</code>|<div style='width: 65pt;word-wrap: break-word;white-space: normal'>5</div>|<div style='width: 170pt;word-wrap: break-word;white-space: normal'>Max attempts for long polling asynchronous running sql query's status on raw transport failures, e.g. TTransportException</div>|<div style='width: 30pt'>int</div>|<div style='width: 20pt'>1.4.0</div>
<code>kyuubi.operation.status.polling.timeout</code>|<div style='width: 65pt;word-wrap: break-word;white-space: normal'>PT5S</div>|<div style='width: 170pt;word-wrap: break-word;white-space: normal'>Timeout(ms) for long polling asynchronous running sql query's status</div>|<div style='width: 30pt'>duration</div>|<div style='width: 20pt'>1.0.0</div>
+<code>kyuubi.operation.thrift.client.request.max.attempts</code>|<div style='width: 65pt;word-wrap: break-word;white-space: normal'>5</div>|<div style='width: 170pt;word-wrap: break-word;white-space: normal'>Max attempts for operation thrift request call at server-side on raw transport failures, e.g. TTransportException</div>|<div style='width: 30pt'>int</div>|<div style='width: 20pt'>1.6.0</div>
### Server
@@ -321,6 +322,9 @@ Key | Default | Meaning | Type | Since
<code>kyuubi.session.conf.advisor</code>|<div style='width: 65pt;word-wrap: break-word;white-space: normal'><undefined></div>|<div style='width: 170pt;word-wrap: break-word;white-space: normal'>A config advisor plugin for Kyuubi Server. This plugin can provide some custom configs for different user or session configs and overwrite the session configs before open a new session. This config value should be a class which is a child of 'org.apache.kyuubi.plugin.SessionConfAdvisor' which has zero-arg constructor.</div>|<div style='width: 30pt'>string</div>|<div style='width: 20pt'>1.5.0</div>
<code>kyuubi.session.conf.ignore.list</code>|<div style='width: 65pt;word-wrap: break-word;white-space: normal'></div>|<div style='width: 170pt;word-wrap: break-word;white-space: normal'>A comma separated list of ignored keys. If the client connection contains any of them, the key and the corresponding value will be removed silently during engine bootstrap and connection setup. Note that this rule is for server-side protection defined via administrators to prevent some essential configs from tampering but will not forbid users to set dynamic configurations via SET syntax.</div>|<div style='width: 30pt'>seq</div>|<div style='width: 20pt'>1.2.0</div>
<code>kyuubi.session.conf.restrict.list</code>|<div style='width: 65pt;word-wrap: break-word;white-space: normal'></div>|<div style='width: 170pt;word-wrap: break-word;white-space: normal'>A comma separated list of restricted keys. If the client connection contains any of them, the connection will be rejected explicitly during engine bootstrap and connection setup. Note that this rule is for server-side protection defined via administrators to prevent some essential configs from tampering but will not forbid users to set dynamic configurations via SET syntax.</div>|<div style='width: 30pt'>seq</div>|<div style='width: 20pt'>1.2.0</div>
+<code>kyuubi.session.engine.alive.probe.enabled</code>|<div style='width: 65pt;word-wrap: break-word;white-space: normal'>false</div>|<div style='width: 170pt;word-wrap: break-word;white-space: normal'>Whether to enable the engine alive probe, it true, we will create a companion thrift client that sends simple request to check whether the engine is keep alive.</div>|<div style='width: 30pt'>boolean</div>|<div style='width: 20pt'>1.6.0</div>
+<code>kyuubi.session.engine.alive.timeout</code>|<div style='width: 65pt;word-wrap: break-word;white-space: normal'>PT2M</div>|<div style='width: 170pt;word-wrap: break-word;white-space: normal'>The timeout for engine alive. If there is no alive probe success in the last timeout window, the engine will be marked as no-alive.</div>|<div style='width: 30pt'>duration</div>|<div style='width: 20pt'>1.6.0</div>
<code>kyuubi.session.engine.flink.main.resource</code>|<div style='width: 65pt;word-wrap: break-word;white-space: normal'><undefined></div>|<div style='width: 170pt;word-wrap: break-word;white-space: normal'>The package used to create Flink SQL engine remote job. If it is undefined, Kyuubi will use the default</div>|<div style='width: 30pt'>string</div>|<div style='width: 20pt'>1.4.0</div>
<code>kyuubi.session.engine.flink.max.rows</code>|<div style='width: 65pt;word-wrap: break-word;white-space: normal'>1000000</div>|<div style='width: 170pt;word-wrap: break-word;white-space: normal'>Max rows of Flink query results. For batch queries, rows that exceeds the limit would be ignored. For streaming queries, the query would be canceled if the limit is reached.</div>|<div style='width: 30pt'>int</div>|<div style='width: 20pt'>1.5.0</div>
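
The `kyuubi.session.engine.alive.timeout` description above (no successful probe within the last timeout window marks the engine as not alive) could be modelled along these lines. This is a hedged Python sketch: `EngineAliveTracker` is a hypothetical name, times are plain seconds passed in by the caller, and treating the window boundary as inclusive is an assumption rather than documented behaviour.

```python
class EngineAliveTracker:
    """Hypothetical tracker: the engine counts as alive while the most recent
    successful probe is no older than timeout_seconds."""

    def __init__(self, timeout_seconds, now):
        self.timeout_seconds = timeout_seconds
        # Assumption: opening the probe session counts as the first success.
        self.last_success = now

    def record_probe(self, succeeded, now):
        # Failed probes do not move the window; only successes refresh it.
        if succeeded:
            self.last_success = now

    def is_alive(self, now):
        return now - self.last_success <= self.timeout_seconds


# Usage with the default PT2M (120 seconds) window.
tracker = EngineAliveTracker(timeout_seconds=120, now=0)
tracker.record_probe(succeeded=True, now=30)
tracker.record_probe(succeeded=False, now=140)
alive_inside_window = tracker.is_alive(now=150)   # 120 s since the last success
alive_outside_window = tracker.is_alive(now=151)  # 121 s since the last success
```

A single failed probe does not mark the engine dead in this model; only a full window without any success does, which tolerates short network hiccups.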