Skip to content

Isolate openlineage extractor test from hook lineage collector pollution#67057

Merged
eladkal merged 1 commit into
apache:mainfrom
potiuk:fix/openlineage-test-pollution
May 17, 2026
Merged

Isolate openlineage extractor test from hook lineage collector pollution#67057
eladkal merged 1 commit into
apache:mainfrom
potiuk:fix/openlineage-test-pollution

Conversation

@potiuk
Copy link
Copy Markdown
Member

@potiuk potiuk commented May 17, 2026

Summary

providers/openlineage/tests/unit/openlineage/extractors/test_base.py::test_default_extractor_uses_wrong_operatorlineage_class deterministically fails on the Compat 3.0.6:P3.10 and Compat 3.1.8:P3.10 provider-distributions jobs (e.g. this run) because the process-wide singleton hook lineage collector gets populated by ObjectStoragePath operations from providers/common/ai/.../durable/test_storage.py running earlier in the same pytest session.

ExtractorManager.extract_metadata (providers/openlineage/src/airflow/providers/openlineage/extractors/manager.py:130-135) falls through to get_hook_lineage() whenever the extractor's lineage is unusable, so the polluted singleton's datasets get merged into the assertion and == OperatorLineage() fails.

This is invisible on main because the local SDK returns a NoOpCollector when no hook lineage reader plugin is installed, and the scheduled CI only runs Compat 2.10.5/3.0.0 (where common.ai isn't installed). The regression dates from #64199 (durable execution for AgentOperator).

Fix

Patch get_hook_lineage_collector with a fresh HookLineageCollector() for the duration of the test. Minimal, scoped to the one assertion, no changes to the polluting tests.

Full root cause and the durable fix (resetting the collector in teardown for tests that exercise ObjectStoragePath) tracked at #67044.

Test plan

  • Local: uv run --project providers/openlineage pytest providers/openlineage/tests/unit/openlineage/extractors/test_base.py — all 36 tests pass.
  • All prek hooks pass on the changed file.
  • CI: Compat 3.0.6/3.1.8 jobs become green.

related: #67044, #66825 (rebase blocked on this fix)


Was generative AI tooling used to co-author this PR?
  • Yes — Claude Code (Opus 4.7)

Generated-by: Claude Code (Opus 4.7) following the guidelines

The hook lineage collector is a process-wide singleton. Tests that exercise
ObjectStoragePath operations (notably providers/common/ai/.../durable/
test_storage.py) populate it, and ExtractorManager.extract_metadata falls
through to that collector when the extractor returns an unusable lineage
class. That leaks the polluting test's datasets into the openlineage
assertion when both run in the same pytest session - the deterministic
failure mode on the Compat 3.0.6/3.1.8 provider-distributions jobs.

Pin a fresh HookLineageCollector for the duration of the test to isolate it.

The broader fix - clearing the collector in teardown for any test that
exercises ObjectStoragePath - is tracked separately at apache#67044.
@eladkal eladkal merged commit c2246ef into apache:main May 17, 2026
93 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants