feat(providers/amazon): allow disabling hook-level lineage in S3Hook …#63499
shnhdan wants to merge 9 commits into apache:main
Conversation
|
cc @kacpermuda can you clarify what the desired behavior is? I think this also applies to Google Cloud Storage and other cloud vendors. I'd rather we not have different customizations for each provider. |
|
@shnhdan I believe the PR now contains only tests, with no implementation in the actual hook. Maybe a faulty merge/rebase happened? |
|
Responded on the linked issue. I'm not sure how, but we should make sure this arg is named consistently across providers. |
|
@kacpermuda Fixed the rebase error and restored the implementation. All 154 S3 tests are passing locally. I’m following the naming discussion and am open to renaming the parameter to align with Airflow's consistency standards. |
9bb9c06 to 0b3dec6
Co-authored-by: Vincent <97131062+vincbeck@users.noreply.github.com>
|
Thanks for working on this - the use case makes sense, and having a way to limit hook-level lineage emission can definitely be useful. One thing I’d like to raise before we settle on the exact shape of the solution is that hook-level lineage is fundamentally a core feature, but the places where we will likely want to control it live in multiple providers (S3, GCS, etc.). Because of that, it would be good to think a bit about how to keep it consistent across providers without introducing a dependency on newer core versions. In particular:
Because of that, it might be worth having a short discussion (either here or on devlist) about the intended approach. Maybe we end up doing exactly what this PR proposes, but it would be good to confirm that this is the direction we want and that we apply it consistently across providers. Curious what others think. cc @eladkal @potiuk @mobuchowski |
|
Honestly the name of this isn't right: "enable_hook_level_lineage" describes what it does right now, but the intent is "no auto-created assets", not disabling lineage entirely. It also sounds too broad: we might want to disable the auto creation of assets but keep the metadata that "hook X accessed asset Y". Additionally, back to the original ask, maybe you need this at a per-call level too: you might want to disable it for the individual parts when they are uploaded, but still keep it when the single "upload complete" file is uploaded. As for whether this goes into BaseHook or not: yes perhaps, but I think that also depends on the answers to ^^ |
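To make the per-call idea concrete, here is a rough sketch, not a proposal for the final API. `SketchUploader`, its `upload` method, the `emit_lineage` argument, and `upload_in_parts` are all hypothetical names invented for illustration; only `enable_hook_level_lineage` comes from this PR. The per-call argument falls back to the hook-level default when left as `None`.

```python
from __future__ import annotations


class SketchUploader:
    """Hypothetical stand-in for a hook; it records URIs instead of emitting lineage."""

    def __init__(self, enable_hook_level_lineage: bool = True) -> None:
        self.enable_hook_level_lineage = enable_hook_level_lineage
        self.recorded: list[str] = []

    def _should_emit(self, emit_lineage: bool | None) -> bool:
        # None means "no per-call preference": use the hook-level default.
        if emit_lineage is None:
            return self.enable_hook_level_lineage
        return emit_lineage

    def upload(self, uri: str, emit_lineage: bool | None = None) -> None:
        # The actual upload is omitted; only the lineage decision is sketched.
        if self._should_emit(emit_lineage):
            self.recorded.append(uri)


def upload_in_parts(uploader: SketchUploader, base_uri: str, part_count: int) -> None:
    # Suppress lineage for each individual part...
    for index in range(part_count):
        uploader.upload(f"{base_uri}/part-{index}", emit_lineage=False)
    # ...but keep it for the single "upload complete" marker file.
    uploader.upload(f"{base_uri}/_SUCCESS")
```

With this shape, a caller that leaves the hook-level flag at its default would record only the completion marker for a multi-part upload, while a caller that disables lineage on the hook could still opt in for a single call.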
|
I agree that standardising this in BaseHook is the better path for long-term consistency. Even if this shift moves the work toward the Core team, I'd like to remain involved and work alongside the maintainers to implement the final version. Following this conversation for the final direction. |
|
@shnhdan There is no "core team" (or there is, but we don't have a monopoly on making changes to task-sdk etc.). PRs welcome, in other words. Feel free to update this PR in place to become a change to core. |
|
@ashb Thanks for the clarification! I'm happy to take this on. I'll work on moving the implementation to Core/Task SDK and will update this PR once the refactor is ready for review. |
Description
Adds enable_hook_level_lineage: bool = True parameter to S3Hook.__init__. When set to False, all get_hook_lineage_collector() calls in the hook are skipped. Default behavior is unchanged.
Changes:
- hooks/s3.py: new enable_hook_level_lineage param + all lineage calls gated behind it
- tests/test_s3.py: tests covering default (enabled), disabled, and explicit enabled behavior
Closes #63371
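For readers skimming the description, the gating pattern can be sketched roughly as follows. This is a self-contained illustration under stated assumptions, not the code in this PR: `FakeLineageCollector`, `SketchS3Hook`, and the local `get_hook_lineage_collector` below are stand-ins defined here so the example runs without Airflow, and the real hook's method bodies and collector calls may differ.

```python
from __future__ import annotations


class FakeLineageCollector:
    """Stand-in collector (assumption): it just remembers output URIs."""

    def __init__(self) -> None:
        self.outputs: list[str] = []

    def add_output_asset(self, context: object, uri: str) -> None:
        self.outputs.append(uri)


_COLLECTOR = FakeLineageCollector()


def get_hook_lineage_collector() -> FakeLineageCollector:
    # Local stub sharing the real function's name; it returns one shared collector.
    return _COLLECTOR


class SketchS3Hook:
    """Hypothetical, trimmed-down hook showing only the lineage gate."""

    def __init__(self, enable_hook_level_lineage: bool = True) -> None:
        # Defaults to True so existing behavior is unchanged.
        self.enable_hook_level_lineage = enable_hook_level_lineage

    def load_string(self, string_data: str, key: str, bucket_name: str) -> None:
        # The S3 upload itself is omitted from this sketch.
        if self.enable_hook_level_lineage:
            get_hook_lineage_collector().add_output_asset(
                context=self, uri=f"s3://{bucket_name}/{key}"
            )
```

A hook created with `enable_hook_level_lineage=False` would then never reach the collector, while a hook created with the default keeps reporting as before.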
Was generative AI tooling used to co-author this PR?
Generated-by: Gemini following the guidelines