Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix for cloud fetch #362

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open

Conversation

andrefurlan-db
Copy link
Contributor

@andrefurlan-db andrefurlan-db commented Feb 21, 2024

  • Throw when failed to download file

  • Retry properly while downloading file

  • Add a bunch of debug logs

  • Prevent thread issues

TODO: http connection pools for cloud storage, proxies, etc.

Also backported to version 2

* fixes for cloud fetch

Signed-off-by: Andre Furlan <andre.furlan@databricks.com>
---------

Signed-off-by: Andre Furlan <andre.furlan@databricks.com>
Co-authored-by: Raymond Cypher <raymond.cypher@databricks.com>
self._shutdown_manager()
return None
raise ResultSetDownloadError(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per the change in the comment above, there is no retry attempted?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or is it just handled by raising the exception?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the retry is done outside this function, closer to the actual http request

Signed-off-by: Andre Furlan <andre.furlan@databricks.com>
@@ -171,3 +202,40 @@ def decompress_data(compressed_data: bytes) -> bytes:
uncompressed_data += data
start += num_bytes
return uncompressed_data


def http_get_with_retry(url, max_retries=5, backoff_factor=2, download_timeout=60):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are we implementing retry behavior here rather than using a Retry passed to the session?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed. It is in the TODO to also have connection pools

Copy link
Collaborator

@benc-db benc-db left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approve, but consider if we can implement with urllib3 Retry.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants