Core : Make REST scan planning poll timeout configurable#15863
Core : Make REST scan planning poll timeout configurable#15863amogh-jahagirdar merged 5 commits intoapache:mainfrom
Conversation
Co-authored-by: Isaac
There was a problem hiding this comment.
@rahulsmahadev this looks good to me, thank you! I'll leave it up in case @singhpk234 @danielcweeks @nastra or any others have any other comments.
| RESTTableScan scan = restTableScanFor(table); | ||
|
|
||
| // With a 1ms timeout and a server that never completes, planFiles should fail | ||
| assertThatThrownBy(scan::planFiles).isInstanceOf(RuntimeException.class).hasMessage(null); |
There was a problem hiding this comment.
I think the reason this asserts isInstanceOf(RuntimeException) is because the NotCompleteException is private , yet we bubble it up in case we hit the timeout case.
It's something we can look at separately, but maybe there's a proper public facing execption we want to throw in case of timeouts; not a big deal but that lets clients who have the ability to fall back to client side planning in such cases be able to do that a bit more cleanly.
There was a problem hiding this comment.
This is actually an issue because we're throwing a private exception with no message, which doesn't allow proper handling. This should be something more like RemotePlanTimeoutException with a specific error. NotCompleteException is not only semantically incorrect, but it isn't distinguishable at all from the calling client as it's only catchable as a RuntimeException.
I guess Tasks doesn't really allow us to change the thrown exception, so we're probably stuck with this approach.
There was a problem hiding this comment.
Agree that we need a semantically clear exception like RemotePlanTimeoutException. I'm not sure we're really stuck with this, we can just update it to wrap the Tasks invocation in a try/catch, remap the Not complete to the new exception type. We already follow that pattern in a few other places like BaseTransaction and CatalogHandlers
There was a problem hiding this comment.
I added a new exception to api (not sure if thats the best place) and wrap the exception here, let me know if this makes sense
There was a problem hiding this comment.
Thanks for adding @rahulsmahadev! Left a comment below on why I think core/rest is a better place for this particular exception rather than the API module. You may want to get different opinions here but just my take.
There was a problem hiding this comment.
thanks added it in core, I think it makes sense there
There was a problem hiding this comment.
I don't think this works. We're now assuming a failure is a timeout. How do we distinguish other non-timeout errors?
| public static final String REST_SCAN_PLAN_ID = "rest-scan-plan-id"; | ||
|
|
||
| public static final String REST_SCAN_PLANNING_POLL_TIMEOUT_MS = | ||
| "rest-scan-planning.poll-timeout-ms"; |
There was a problem hiding this comment.
Name makes sense to me!
| * <p>Clients may catch this to fall back to client-side planning when server-side planning is too | ||
| * slow. |
There was a problem hiding this comment.
isn't this then violation of server mode being sent as part of loadTable ? would it be better to say try with larger timeout ?
There was a problem hiding this comment.
I would just remove these comments out about how a client could use it, and just stick with something simple like "Thrown when server-side scan planning does not complete before the client deadline"
But to answer the question, no I don't think it's a violation. The example I provided was about the case where server side planning is supported and configured for usage by a client but not indicated as required by a server.
There was a problem hiding this comment.
IMO I also think this should be in core/rest rather than API. The public API exceptions are more for implementation agnostic exceptions along the read/commit path or REST protocol defined exceptions.
However, this exception is a bit more specific for our particular REST client implementation for remote scanning (it's not something generic on the TableScan path). Maybe worth getting different opinions here but just my take.
| configurePlanningBehavior(TestPlanningBehavior.Builder::asynchronous); | ||
| RESTTable table = restTableFor(restCatalog, "custom_timeout_success"); | ||
| setParserContext(table); | ||
| // default catalog uses the default 5-minute timeout, which is sufficient for async planning |
There was a problem hiding this comment.
The test name is asyncPlanningSucceedsWithCustomTimeout but we are relying on default timeout, is this intented ?
[optional] also i would say set the value explicitly in case some one fiddles with the default
There was a problem hiding this comment.
Good catch! Updated the test to create its own catalog with an explicit poll-timeout-ms=30000 instead of relying on the default
amogh-jahagirdar
left a comment
There was a problem hiding this comment.
Thanks @rahulsmahadev ! The latest updates look right to me.
|
Merging, thank you @rahulsmahadev , and @singhpk234 @danielcweeks for reviewing! |
Summary
rest-scan-planning.poll-timeout-mscatalog property to control the maximum wait time when polling for async scan planning results inRESTTableScan(default remains 5 minutes)RemotePlanTimeoutExceptionincore/rest(package-private) so the timeout failure is semantically clear and distinguishable from other runtime errorsNotCompleteExceptionand rethrow asRemotePlanTimeoutExceptionwith a descriptive message including planId, timeout, and retry limitsTest plan
asyncPlanningRespectsConfigurablePollTimeout: sets 1ms timeout against a server that never completes, verifiesRemotePlanTimeoutExceptionis thrownasyncPlanningSucceedsWithCustomTimeout: sets an explicit 30s timeout and verifies async planning completes successfullyasyncPlanningRejectsInvalidTimeout: verifies negative timeout values are rejected withIllegalArgumentException