-
Notifications
You must be signed in to change notification settings - Fork 14.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
S3KeySensor to use S3Hook url parser #21500
Conversation
Rather than relying on the poke method to parse the attrs we should either do it in `__init__` or in a cached property. Since it's a pretty insignifigant computation let's just do it in `__init__`. And anyway, if the params are bad then best to get a warning before deploying your code.
Parsing |
FWIW, datatype checks on templated fields within |
darnit, good catch... maybe i'll go cached prop route |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Trivial nit + a musing but LGTM
' should be relative path from root' | ||
' level, rather than a full s3:// url' | ||
) | ||
raise AirflowException('If bucket_name provided, bucket_key must be relative path, not URI.') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
raise AirflowException('If bucket_name provided, bucket_key must be relative path, not URI.') | |
raise AirflowException('If bucket_name is provided, bucket_key must be a relative path, not a URI.') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i was trying to keep it on one line :) lemme see if that works
raise AirflowException('If key is a relative path from root, please provide a bucket_name') | ||
self.bucket_name = parsed_url.netloc | ||
self.bucket_key = parsed_url.path.lstrip('/') | ||
self.bucket_name, self.bucket_key = S3Hook.parse_s3_url(self.bucket_key) | ||
else: | ||
parsed_url = urlparse(self.bucket_key) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't have to happen in this PR but I wonder if S3Hook.parse_s3_url()
could be augmented with some sort of configurable check between validating a URI vs. a relative path? Centralize some of this logic there without have urlparse()
is different places. Anyway, just a thought.
S3KeySensor resolves the bucket_name and key in the poke method.
Pulling this "resolution" logic out into a
_resolve_bucket_and_key
method makes it a little clearer what's happening there.Unfortunately we can't just do it in
__init__
because we have to wait for templates to render first.I took the opportunity here to remove redundant s3 uri parsing logic, which already exists in a static method on the hook.
And since we use that hook method now, I had to update the tests to not mock the entire hook class but only the check methods.
Finally, since while I was at it I strengthened the test slightly, replacing
assert op.poke(None)
withassert op.poke(None) is True
etc, since mocks can evaluate to "truthy"