Skip to content

Add missing HTTP timeouts across multiple providers#63042

Closed
Ironankit525 wants to merge 1 commit intoapache:mainfrom
Ironankit525:fix/add-missing-http-timeouts
Closed

Add missing HTTP timeouts across multiple providers#63042
Ironankit525 wants to merge 1 commit intoapache:mainfrom
Ironankit525:fix/add-missing-http-timeouts

Conversation

@Ironankit525
Copy link
Contributor

This PR adds a default timeout of 30 seconds to several HTTP requests across multiple providers that were missing it. This prevents tasks from hanging indefinitely in case of network issues or unresponsive servers.

Affected providers:

This PR adds a default timeout of 30 seconds to several HTTP requests across multiple providers that were missing it. This prevents tasks from hanging indefinitely in case of network issues or unresponsive servers.\n\nAffected providers:\n- OpenFaaS\n- Apache Druid\n- FAB\n- CNCF Kubernetes\n- Google Cloud Dataprep\n\nCloses: apache#63033
Copy link
Collaborator

@jroachgolf84 jroachgolf84 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is quite dangerous, and arbitrary timeouts should not be added without justification. I'd consider doing a thorough analysis for each of these changes and justifying WHY a timeout needs to be added (add comments, unit-tests, etc.).

Again, this scares me a good bit.

@jroachgolf84
Copy link
Collaborator

For example; what if I have my Task-level execution_timeout set for 5 minutes. If I have a request timeout of 30 seconds, we could have an issue.

def get_yaml_content_from_file(kueue_yaml_url) -> list[dict]:
"""Download content of YAML file and separate it into several dictionaries."""
response = requests.get(kueue_yaml_url, allow_redirects=True)
response = requests.get(kueue_yaml_url, allow_redirects=True, timeout=30)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be okay to apply 30s timeout across K8s provider as usually all calls should be short. But this here is a simple search&add. The K8s provider uses a K8s API Client, is a timeout should be applied (and is not already) then it needs to be passed down to API client (where a generic retry layer was added recently)

Copy link
Contributor

@jscheffl jscheffl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would recommend as of the comments from @jroachgolf84 to split the PR into per provider. At least fro K8s please respect the comments.

@Ironankit525
Copy link
Contributor Author

Thanks for the thorough review, @jroachgolf84 and @jscheffl! I completely agree that adding blanket timeouts without justification can be risky, especially for methods that might block for long-running operations.

I will close this combined PR and open separate, per-provider PRs that include proper justification, configuration options where necessary, and only apply timeouts to control-plane/metadata operations rather than blocking execution tasks.

@jscheffl regarding the Kubernetes Hook get_yaml_content_from_file: this method actually uses requests.get to fetch a raw YAML file from a generic URL (like GitHub) rather than hitting the K8s API server via the kubernetes Python client. That's why the timeout is applied here directly to requests.get. I'll address this in a separate focused PR.

Closing this PR in favor of splitting.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Missing timeouts in HTTP requests across multiple providers

3 participants