Skip to content

Core : Make REST scan planning poll timeout configurable#15863

Merged
amogh-jahagirdar merged 5 commits intoapache:mainfrom
rahulsmahadev:make-rest-scan-polling-configurable
Apr 7, 2026
Merged

Core : Make REST scan planning poll timeout configurable#15863
amogh-jahagirdar merged 5 commits intoapache:mainfrom
rahulsmahadev:make-rest-scan-polling-configurable

Conversation

@rahulsmahadev
Copy link
Copy Markdown
Contributor

@rahulsmahadev rahulsmahadev commented Apr 2, 2026

Summary

  • Add configurable rest-scan-planning.poll-timeout-ms catalog property to control the maximum wait time when polling for async scan planning results in RESTTableScan (default remains 5 minutes)
  • Introduce RemotePlanTimeoutException in core/rest (package-private) so the timeout failure is semantically clear and distinguishable from other runtime errors
  • Wrap the polling loop to catch the internal NotCompleteException and rethrow as RemotePlanTimeoutException with a descriptive message including planId, timeout, and retry limits
  • Validate that the configured timeout is a positive value

Test plan

  • asyncPlanningRespectsConfigurablePollTimeout: sets 1ms timeout against a server that never completes, verifies RemotePlanTimeoutException is thrown
  • asyncPlanningSucceedsWithCustomTimeout: sets an explicit 30s timeout and verifies async planning completes successfully
  • asyncPlanningRejectsInvalidTimeout: verifies negative timeout values are rejected with IllegalArgumentException

@github-actions github-actions Bot added the core label Apr 2, 2026
Copy link
Copy Markdown
Contributor

@amogh-jahagirdar amogh-jahagirdar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rahulsmahadev this looks good to me, thank you! I'll leave it up in case @singhpk234 @danielcweeks @nastra or any others have any other comments.

RESTTableScan scan = restTableScanFor(table);

// With a 1ms timeout and a server that never completes, planFiles should fail
assertThatThrownBy(scan::planFiles).isInstanceOf(RuntimeException.class).hasMessage(null);
Copy link
Copy Markdown
Contributor

@amogh-jahagirdar amogh-jahagirdar Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the reason this asserts isInstanceOf(RuntimeException) is because the NotCompleteException is private , yet we bubble it up in case we hit the timeout case.

It's something we can look at separately, but maybe there's a proper public facing execption we want to throw in case of timeouts; not a big deal but that lets clients who have the ability to fall back to client side planning in such cases be able to do that a bit more cleanly.

Copy link
Copy Markdown
Contributor

@danielcweeks danielcweeks Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually an issue because we're throwing a private exception with no message, which doesn't allow proper handling. This should be something more like RemotePlanTimeoutException with a specific error. NotCompleteException is not only semantically incorrect, but it isn't distinguishable at all from the calling client as it's only catchable as a RuntimeException.

I guess Tasks doesn't really allow us to change the thrown exception, so we're probably stuck with this approach.

Copy link
Copy Markdown
Contributor

@amogh-jahagirdar amogh-jahagirdar Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree that we need a semantically clear exception like RemotePlanTimeoutException. I'm not sure we're really stuck with this, we can just update it to wrap the Tasks invocation in a try/catch, remap the Not complete to the new exception type. We already follow that pattern in a few other places like BaseTransaction and CatalogHandlers

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a new exception to api (not sure if thats the best place) and wrap the exception here, let me know if this makes sense

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding @rahulsmahadev! Left a comment below on why I think core/rest is a better place for this particular exception rather than the API module. You may want to get different opinions here but just my take.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks added it in core, I think it makes sense there

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this works. We're now assuming a failure is a timeout. How do we distinguish other non-timeout errors?

public static final String REST_SCAN_PLAN_ID = "rest-scan-plan-id";

public static final String REST_SCAN_PLANNING_POLL_TIMEOUT_MS =
"rest-scan-planning.poll-timeout-ms";
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Name makes sense to me!

@github-actions github-actions Bot added the API label Apr 2, 2026
@rahulsmahadev rahulsmahadev changed the title Core: Make REST scan planning poll timeout configurable API, Core: Make REST scan planning poll timeout configurable Apr 2, 2026
Comment on lines +26 to +27
* <p>Clients may catch this to fall back to client-side planning when server-side planning is too
* slow.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't this then violation of server mode being sent as part of loadTable ? would it be better to say try with larger timeout ?

Copy link
Copy Markdown
Contributor

@amogh-jahagirdar amogh-jahagirdar Apr 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would just remove these comments out about how a client could use it, and just stick with something simple like "Thrown when server-side scan planning does not complete before the client deadline"

But to answer the question, no I don't think it's a violation. The example I provided was about the case where server side planning is supported and configured for usage by a client but not indicated as required by a server.

Copy link
Copy Markdown
Contributor

@amogh-jahagirdar amogh-jahagirdar Apr 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO I also think this should be in core/rest rather than API. The public API exceptions are more for implementation agnostic exceptions along the read/commit path or REST protocol defined exceptions.

However, this exception is a bit more specific for our particular REST client implementation for remote scanning (it's not something generic on the TableScan path). Maybe worth getting different opinions here but just my take.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

configurePlanningBehavior(TestPlanningBehavior.Builder::asynchronous);
RESTTable table = restTableFor(restCatalog, "custom_timeout_success");
setParserContext(table);
// default catalog uses the default 5-minute timeout, which is sufficient for async planning
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test name is asyncPlanningSucceedsWithCustomTimeout but we are relying on default timeout, is this intented ?

[optional] also i would say set the value explicitly in case some one fiddles with the default

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! Updated the test to create its own catalog with an explicit poll-timeout-ms=30000 instead of relying on the default

@amogh-jahagirdar amogh-jahagirdar self-requested a review April 3, 2026 02:52
@github-actions github-actions Bot removed the API label Apr 3, 2026
@rahulsmahadev rahulsmahadev changed the title API, Core: Make REST scan planning poll timeout configurable Core : Make REST scan planning poll timeout configurable Apr 3, 2026
@rahulsmahadev rahulsmahadev requested a review from singhpk234 April 3, 2026 21:31
Copy link
Copy Markdown
Contributor

@amogh-jahagirdar amogh-jahagirdar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @rahulsmahadev ! The latest updates look right to me.

@amogh-jahagirdar
Copy link
Copy Markdown
Contributor

Merging, thank you @rahulsmahadev , and @singhpk234 @danielcweeks for reviewing!

@amogh-jahagirdar amogh-jahagirdar merged commit e1a2713 into apache:main Apr 7, 2026
36 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants