Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

S3KeySensor to use S3Hook url parser #21500

Merged
merged 5 commits into from
Feb 15, 2022

Conversation

dstandish
Copy link
Contributor

@dstandish dstandish commented Feb 10, 2022

S3KeySensor resolves the bucket_name and key in the poke method.

Pulling this "resolution" logic out into a _resolve_bucket_and_key method makes it a little clearer what's happening there.

Unfortunately we can't just do it in __init__ because we have to wait for templates to render first.

I took the opportunity here to remove redundant s3 uri parsing logic, which already exists in a static method on the hook.

And since we use that hook method now, I had to update the tests to not mock the entire hook class but only the check methods.

Finally, since while I was at it I strengthened the test slightly, replacing assert op.poke(None) with assert op.poke(None) is True etc, since mocks can evaluate to "truthy"

Rather than relying on the poke method to parse the attrs we should either do it in `__init__` or in a cached property.

Since it's a pretty insignifigant computation let's  just do it in `__init__`.

And anyway, if the params are bad then best to get a warning before deploying your code.
@josh-fell
Copy link
Contributor

Parsing bucket_key in the __init__() won't work if the value is an XComArg or a Jinja string. We've seen several issues, namely AttributeErrors, when some sort of string parsing is used on templated fields in __init__() since the value doesn't get rendered until the execute() method. Check out #14682 as an example.

@josh-fell
Copy link
Contributor

josh-fell commented Feb 10, 2022

FWIW, datatype checks on templated fields within __init__() have their issues as well. See #20347.

@dstandish
Copy link
Contributor Author

Parsing bucket_key in the __init__() won't work if the value is an XComArg or a Jinja string. We've seen several issues, namely AttributeErrors, when some sort of string parsing is used on templated fields in __init__() since the value doesn't get rendered until the execute() method. Check out #14682 as an example.

darnit, good catch... maybe i'll go cached prop route

@dstandish dstandish marked this pull request as draft February 10, 2022 19:11
@dstandish dstandish changed the title S3KeySensor should parse bucket and key outside of poke S3KeySensor to use S3Hook url parser Feb 10, 2022
@dstandish dstandish marked this pull request as ready for review February 11, 2022 06:37
Copy link
Contributor

@josh-fell josh-fell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trivial nit + a musing but LGTM

' should be relative path from root'
' level, rather than a full s3:// url'
)
raise AirflowException('If bucket_name provided, bucket_key must be relative path, not URI.')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
raise AirflowException('If bucket_name provided, bucket_key must be relative path, not URI.')
raise AirflowException('If bucket_name is provided, bucket_key must be a relative path, not a URI.')

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i was trying to keep it on one line :) lemme see if that works

raise AirflowException('If key is a relative path from root, please provide a bucket_name')
self.bucket_name = parsed_url.netloc
self.bucket_key = parsed_url.path.lstrip('/')
self.bucket_name, self.bucket_key = S3Hook.parse_s3_url(self.bucket_key)
else:
parsed_url = urlparse(self.bucket_key)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't have to happen in this PR but I wonder if S3Hook.parse_s3_url() could be augmented with some sort of configurable check between validating a URI vs. a relative path? Centralize some of this logic there without have urlparse() is different places. Anyway, just a thought.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:providers provider:amazon-aws AWS/Amazon - related issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants