Do not check for S3 key before attempting download #19504
- s3_obj = self.get_key(key, bucket_name)
+ try:
+     s3_obj = self.get_key(key, bucket_name)
+ except ClientError as e:
My preference would be not to catch this error at all, but catching it keeps the behavior consistent with the existing implementation.
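A minimal sketch of what "catching it" can look like, assuming botocore's convention that a `ClientError` carries the service error code in `response['Error']['Code']`. To keep the sketch runnable without botocore, `ClientError` here is a hypothetical local stand-in with the same `response` shape, `fetch` stands in for `get_key`, and `KeyNotFound` stands in for whatever exception the existing behavior raises. The set of "not found" codes is also an assumption.

```python
class ClientError(Exception):
    """Hypothetical stand-in for botocore.exceptions.ClientError."""

    def __init__(self, error_response: dict, operation_name: str) -> None:
        super().__init__(f"{operation_name} failed: {error_response['Error']['Code']}")
        self.response = error_response
        self.operation_name = operation_name


class KeyNotFound(Exception):
    """Hypothetical exception that preserves the old 'key does not exist' behavior."""


# Assumed codes: S3 reports a missing key as "404" for HEAD and "NoSuchKey" for GET.
NOT_FOUND_CODES = {"404", "NoSuchKey"}


def get_or_raise(fetch, key: str, bucket_name: str):
    """Fetch the object directly and translate only 'not found' errors."""
    try:
        return fetch(key, bucket_name)
    except ClientError as e:
        code = e.response.get("Error", {}).get("Code")
        if code in NOT_FOUND_CODES:
            raise KeyNotFound(f"The key {key} does not exist in bucket {bucket_name}") from e
        # Anything else (permissions, throttling, ...) propagates unchanged.
        raise
```

Only the missing-key case is converted; every other error keeps its original type, which is what keeps the new code's behavior aligned with the old existence check.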
S3Hook.download_file first checks that the object exists and then downloads it, which resolves credentials twice. We don't need the existence check: just ask for the key, and if it isn't there, the error will tell you. If it is there, credentials are resolved only once.
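To make the "resolves credentials twice" point concrete, here is a toy model, not Airflow's code. `FakeS3` is a hypothetical client in which every request resolves credentials once and increments a counter; that one-resolution-per-request cost model is an assumption. `download_lbyl` mirrors the old flow (check existence, then fetch) and `download_eafp` the proposed one (fetch, and let the error report a missing key).

```python
class NoSuchKey(Exception):
    """Hypothetical error raised when the requested key is absent."""


class FakeS3:
    """Toy S3 stand-in: every request resolves credentials once (an assumption)."""

    def __init__(self, objects: dict[tuple[str, str], bytes]) -> None:
        self.objects = objects
        self.credential_resolutions = 0

    def _resolve_credentials(self) -> None:
        self.credential_resolutions += 1

    def head(self, bucket: str, key: str) -> bool:
        self._resolve_credentials()
        return (bucket, key) in self.objects

    def get(self, bucket: str, key: str) -> bytes:
        self._resolve_credentials()
        try:
            return self.objects[(bucket, key)]
        except KeyError:
            raise NoSuchKey(f"s3://{bucket}/{key}") from None


def download_lbyl(s3: FakeS3, bucket: str, key: str) -> bytes:
    """Old flow, 'look before you leap': two requests for an existing key."""
    if not s3.head(bucket, key):
        raise NoSuchKey(f"s3://{bucket}/{key}")
    return s3.get(bucket, key)


def download_eafp(s3: FakeS3, bucket: str, key: str) -> bytes:
    """New flow: just ask for the object; a missing key surfaces as NoSuchKey."""
    return s3.get(bucket, key)


s3 = FakeS3({("bkt", "data.csv"): b"a,b\n"})
download_lbyl(s3, "bkt", "data.csv")
print(s3.credential_resolutions)  # 2
s3.credential_resolutions = 0
download_eafp(s3, "bkt", "data.csv")
print(s3.credential_resolutions)  # 1
```

For a missing key both flows make a single request, so dropping the check only removes work in the common case where the object exists.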
The PR is likely OK to be merged with just a subset of tests for the default Python and database versions, without running the full test matrix, because it does not modify the core of Airflow. If the committers decide that the full test matrix is needed, they will add the label 'full tests needed'. Then you should rebase onto the latest main or amend the last commit of the PR, and push it with --force-with-lease.
But the main issue is that it does not reuse the connection, right? Not that we make two requests.
So I would suggest we reuse the connection instead.
I suspect it's to do with …
Correct. This is also my assumption, @ashb.
Oh yeah, so get_client_type and get_resource_type both call … This change makes sense anyway, though: there's no point in making two requests.
But the first one is only doing a HEAD operation, so technically it is not the same. 😅 But yes, we can get rid of this request if you want. 👍
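The other fix discussed in this thread is to reuse the connection so that credentials are not re-resolved on every client or resource lookup. A minimal sketch of that idea, assuming a hypothetical `ToyAwsHook` in which `_resolve_credentials` is the expensive step: `functools.cached_property` (Python 3.8+) computes the session on first access and stores it on the instance, so later `client()` and `resource()` calls reuse it. The class and method names are illustrative only; this is not the code that was merged.

```python
from functools import cached_property


class ToySession:
    """Hypothetical session object holding already-resolved credentials."""

    def __init__(self, credentials: dict[str, str]) -> None:
        self.credentials = credentials

    def client(self, service: str) -> tuple[str, str]:
        return ("client", service)

    def resource(self, service: str) -> tuple[str, str]:
        return ("resource", service)


class ToyAwsHook:
    """Hypothetical hook that resolves credentials at most once per instance."""

    def __init__(self) -> None:
        self.credential_resolutions = 0

    def _resolve_credentials(self) -> dict[str, str]:
        # Stands in for the expensive step (connection lookup, role assumption, ...).
        self.credential_resolutions += 1
        return {"aws_access_key_id": "example", "aws_secret_access_key": "example"}

    @cached_property
    def session(self) -> ToySession:
        # Computed on first access, then stored on the instance and reused.
        return ToySession(self._resolve_credentials())

    def client(self, service: str) -> tuple[str, str]:
        return self.session.client(service)

    def resource(self, service: str) -> tuple[str, str]:
        return self.session.resource(service)


hook = ToyAwsHook()
hook.client("s3")
hook.resource("s3")
print(hook.credential_resolutions)  # 1
```

One trade-off of caching is that a long-lived instance keeps the same credentials, so temporary credentials could expire while still cached; a real implementation would need to account for that.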
LGTM -- one nit
Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>
It's true that not reusing credentials is also a problem, but why make two calls when you can make one? Easier to beg forgiveness than to ask permission :)
When you download a key that exists, notice that it retrieves credentials twice.
The first retrieval is for the existence check and the second is for fetching the object.
We don't need to check for existence. We can just ask for the object, and if it's not there, the API will let us know. And when the object is there, we'll only have retrieved credentials once.