Add missing HTTP timeouts across multiple providers#63042
Add missing HTTP timeouts across multiple providers#63042Ironankit525 wants to merge 1 commit intoapache:mainfrom
Conversation
This PR adds a default timeout of 30 seconds to several HTTP requests across multiple providers that were missing it. This prevents tasks from hanging indefinitely in case of network issues or unresponsive servers.\n\nAffected providers:\n- OpenFaaS\n- Apache Druid\n- FAB\n- CNCF Kubernetes\n- Google Cloud Dataprep\n\nCloses: apache#63033
jroachgolf84
left a comment
There was a problem hiding this comment.
I think this is quite dangerous, and arbitrary timeouts should not be added without justification. I'd consider doing a thorough analysis for each of these changes and justifying WHY a timeout needs to be added (add comments, unit-tests, etc.).
Again, this scares me a good bit.
|
For example; what if I have my Task-level |
| def get_yaml_content_from_file(kueue_yaml_url) -> list[dict]: | ||
| """Download content of YAML file and separate it into several dictionaries.""" | ||
| response = requests.get(kueue_yaml_url, allow_redirects=True) | ||
| response = requests.get(kueue_yaml_url, allow_redirects=True, timeout=30) |
There was a problem hiding this comment.
I'd be okay to apply 30s timeout across K8s provider as usually all calls should be short. But this here is a simple search&add. The K8s provider uses a K8s API Client, is a timeout should be applied (and is not already) then it needs to be passed down to API client (where a generic retry layer was added recently)
jscheffl
left a comment
There was a problem hiding this comment.
I would recommend as of the comments from @jroachgolf84 to split the PR into per provider. At least fro K8s please respect the comments.
|
Thanks for the thorough review, @jroachgolf84 and @jscheffl! I completely agree that adding blanket timeouts without justification can be risky, especially for methods that might block for long-running operations. I will close this combined PR and open separate, per-provider PRs that include proper justification, configuration options where necessary, and only apply timeouts to control-plane/metadata operations rather than blocking execution tasks. @jscheffl regarding the Kubernetes Hook get_yaml_content_from_file: this method actually uses requests.get to fetch a raw YAML file from a generic URL (like GitHub) rather than hitting the K8s API server via the kubernetes Python client. That's why the timeout is applied here directly to requests.get. I'll address this in a separate focused PR. Closing this PR in favor of splitting. |
This PR adds a default timeout of 30 seconds to several HTTP requests across multiple providers that were missing it. This prevents tasks from hanging indefinitely in case of network issues or unresponsive servers.
Affected providers:
OpenFaasHook)DruidHook)FabAirflowSecurityManagerOverride)KubernetesHook)DataprepHook)Closes: Missing timeouts in HTTP requests across multiple providers #63033