Skip to content

feat(providers/amazon): allow disabling hook-level lineage in S3Hook …#63499

Open
shnhdan wants to merge 9 commits intoapache:mainfrom
shnhdan:feature/disable-s3-hook-lineage
Open

feat(providers/amazon): allow disabling hook-level lineage in S3Hook …#63499
shnhdan wants to merge 9 commits intoapache:mainfrom
shnhdan:feature/disable-s3-hook-lineage

Conversation

@shnhdan
Copy link

@shnhdan shnhdan commented Mar 13, 2026

Description

Adds enable_hook_level_lineage: bool = True parameter to S3Hook.__init__.

When set to False, all get_hook_lineage_collector() calls in the hook are skipped.
Default behavior is unchanged.

Changes:

  • hooks/s3.py — new enable_hook_level_lineage param + all lineage calls gated behind it
  • tests/test_s3.py — tests covering default (enabled), disabled, and explicit enabled behavior

Closes #63371


Was generative AI tooling used to co-author this PR?
  • Yes (Gemini)

Generated-by: Gemini following the guidelines

@shnhdan shnhdan requested a review from o-nikolas as a code owner March 13, 2026 09:17
@boring-cyborg boring-cyborg bot added area:providers provider:amazon AWS/Amazon - related issues labels Mar 13, 2026
@eladkal
Copy link
Contributor

eladkal commented Mar 13, 2026

cc @kacpermuda can you clarify what is the desired behavior? I think this apply also for google storage and other cloud vendors. I'd rather we won't have different customizations for each provider.

@SamWheating
Copy link
Contributor

@eladkal see the attached ticket which I opened earlier this week with a description of the motivations.

#63371

Agreed that this is definitely something we want to standardize across all relevant hooks, lets discuss it on the issue and maybe some of the other lineage contributors can weigh in.

@kacpermuda
Copy link
Collaborator

@shnhdan I believe the PR now contains only test, with no implementation on the actual hook. Maybe some faulty merge/rebase happened?

@kacpermuda
Copy link
Collaborator

Responded on the linked issue, not sure how, but we should make sure this arg is named consistently across providers.

@shnhdan
Copy link
Author

shnhdan commented Mar 16, 2026

@kacpermuda Fixed the rebase error and restored the implementation.All 154 S3 tests are passing locally. I’m following the naming discussion and am open to renaming the parameter to align with Airflow's consistency standards.

@shnhdan shnhdan force-pushed the feature/disable-s3-hook-lineage branch from 9bb9c06 to 0b3dec6 Compare March 16, 2026 14:07
@kacpermuda
Copy link
Collaborator

Thanks for working on this - the use case makes sense, and having a way to limit hook-level lineage emission can definitely be useful.

One thing I’d like to raise before we settle on the exact shape of the solution is that hook-level lineage is fundamentally a core feature, but the places where we will likely want to control it live in multiple providers (S3, GCS, etc.). Because of that, it would be good to think a bit about how to keep it consistent across providers without introducing a dependency on newer core versions.

In particular:

  1. Global vs local control
    It probably makes sense to have a cluster-level/default configuration, and then allow providers or hooks to override it when needed. That way DAG authors do not need to explicitly configure every hook if a cluster operator wants to disable or limit hook-level lineage globally. We can already effectively disable hook-level lineage since AF 3.2 by setting the asset collection limit to 0 (PR feat: Make Hook Level Lineage limits configurable #62010), but maybe we should consider introducing a more explicit config (for example a simple boolean to enable/disable HLL), and then allow hooks/providers to override it if needed.

  2. Consistency across providers without a core dependency
    This is a core feature that we will likely want to control in multiple providers, so we probably want to keep naming and semantics consistent. At the same time, we would like to do this without introducing a dependency on core, so that providers can implement it while still supporting multiple Airflow versions (basically so that what we introduce today still works on AF2.11).
    It might be good to discuss this with the more task-sdk oriented contributors as well, as it feels more like an API boundary question between core and providers. One possible idea could be to keep an internal _BaseHook in core, while also exposing a BaseHook in a common provider layer, so that _BaseHook remains an internal implementation detail and BaseHook can be used more freely by providers. cc @ashb , curious what you think.

Because of that, it might be worth having a short discussion (either here or on devlist) about the intended approach. Maybe we end up doing exactly what this PR proposes, but it would be good to confirm that this is the direction we want and that we apply it consistently across providers.

Curious what others think. cc @eladkal @potiuk @mobuchowski

@ashb
Copy link
Member

ashb commented Mar 16, 2026

Honestly the name of this isn't right - "enable_hook_level_lineage" is what it does right now, but the intent is "no auto create assets". Not about disabling lineage entirely...

Also this sounds like it is too broad -- we might want to disable the auto creation of assets, but keep the metadata that "hook X accessed asset Y".

Additionally back to the original ask, maybe you need this at a per call level too -- you might want to disable it for the individual parts when uploaded, but still keep it when the single "upload complete" file is uploaded.

As for into basehook or not: yes perhaps, but I think that also depends on answers to ^^

@shnhdan
Copy link
Author

shnhdan commented Mar 17, 2026

I agree that standardising this in BaseHook is the better path for long-term consistency. Even if this shift moves the work toward the Core team, I'd like to remain involved and work alongside the maintainers to implement the final version. Following this conversation for the final direction.

@ashb
Copy link
Member

ashb commented Mar 17, 2026

@shnhdan There is no "core team" (or there is, but we don't have a monopoly on making changes to task-sdk etc.) PRs welcome in other words. Feel free to update this PR in place to become change to core.

@shnhdan
Copy link
Author

shnhdan commented Mar 17, 2026

@ashb Thanks for the clarification ! I'm happy to take this on. I'll work on moving the implementation to Core/Task SDK and will update this PR once the refactor is ready for review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:providers provider:amazon AWS/Amazon - related issues

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Allow disabling hook-level lineage in Hook constructors

6 participants