ExecutorManager re-uses fixed module names; sys.modules pollution leaks TestOperator into other specs on Python 3.11 #4705

@Yicong-Huang

What happened?

ExecutorManager reuses fixed module names (udf-v1, udf-v2, ...) per instance and registers them in sys.modules via importlib.import_module. Each test fixture creates a fresh ExecutorManager with executor_version = 0, so every spec's first executor lands on the name udf-v1. The teardown only closes the temp filesystem; it does not remove the entry from sys.modules or pop the temp directory from sys.path.

When the second spec hits udf-v1, load_executor_definition enters the cached branch:

# amber/src/main/python/core/architecture/managers/executor_manager.py
if module_name in sys.modules:
    executor_module = importlib.import_module(module_name)
    executor_module.__dict__.clear()
    executor_module.__dict__["__name__"] = module_name
    executor_module = importlib.reload(executor_module)

reload() then tries to find the module again by name. On CPython 3.11 (Linux, ubuntu-latest), the path lookup occasionally still resolves to the previous spec / loader cache instead of the freshly written udf-v1.py in the new temp filesystem. The cached class definition (TestOperator from core/architecture/managers/test_executor_manager.py's SAMPLE_OPERATOR_CODE) leaks into the next spec's executor, and the new spec's assertions on the expected class (e.g. CountBatchOperator.count) fail with an AttributeError.
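The underlying hazard can be shown in isolation. The following is a minimal sketch, not the Texera code path: it skips the clear+reload branch and the 3.11 timing dependence, and only demonstrates that a fixed name left in sys.modules is handed back to the next caller. The module name udf_demo_v1 and the helper write_and_import are hypothetical, introduced only for illustration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical fixed module name, standing in for udf-v1.
MODULE_NAME = "udf_demo_v1"


def write_and_import(source: str):
    """Write source into a fresh temp dir and import it under MODULE_NAME."""
    tmp_dir = tempfile.mkdtemp()
    Path(tmp_dir, f"{MODULE_NAME}.py").write_text(source)
    sys.path.append(tmp_dir)
    importlib.invalidate_caches()
    # If MODULE_NAME is already in sys.modules, this returns the cached
    # module and never looks at the file that was just written.
    return importlib.import_module(MODULE_NAME)


# "Spec 1" loads TestOperator; nothing is cleaned up afterwards.
first = write_and_import("class TestOperator:\n    pass\n")

# "Spec 2" writes different code under the same module name.
second = write_and_import("class CountBatchOperator:\n    count = 0\n")

print(second is first)                        # True
print(hasattr(second, "CountBatchOperator"))  # False
```

In the real code the cached branch clears and reloads the module rather than returning it as-is, so the leak there is intermittent rather than guaranteed; the sketch only shows why the shared name is the common precondition.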

Repro

CI run: https://github.com/apache/texera/actions/runs/25263695023/job/74074970899?pr=4636, job backport (release/v1.1.0-incubating) / python (ubuntu-latest, 3.11).

Test order in that run:

  1. core/architecture/managers/test_executor_manager.py::TestExecutorManager::test_accept_python_language_regular_operator — passes; loads udf-v1 with class TestOperator(UDFOperatorV2) from SAMPLE_OPERATOR_CODE.
  2. (~30 specs later, alphabetical order)
  3. core/runnables/test_main_loop.py::TestMainLoop::test_batch_dp_thread_can_process_batch — fixture mock_initialize_batch_count_executor sends OpExecWithCode(inspect.getsource(CountBatchOperator), "python"). The handler calls executor_manager.initialize_executor(code, ...), which calls load_executor_definition(code). The new ExecutorManager starts at executor_version = 0 → generates udf-v1. The cached entry from step 1 wins.
  4. The test eventually does assert executor.count == 1 and gets AttributeError: 'TestOperator' object has no attribute 'count'.

The same code passes on Python 3.10 / 3.12 / 3.13 in the same backport job, and on the direct build / python (3.11) job for the same PR; the 3.11 importlib path on this particular filesystem/timing combination is what trips the cache. PR #4636 (pip → uv install switch) does not introduce the bug; it merely shifts transitive package versions and timing enough to change the latent collision rate.

Branch

main (also reproducible on release/v1.1.0-incubating)

Commit Hash (Optional)

8ce4ad5

Relevant log output

core/runnables/test_main_loop.py:851: AttributeError
>       assert executor.count == 1
E       AttributeError: 'TestOperator' object has no attribute 'count'
================== 1 failed, 218 passed, 5 warnings in 45.60s ==================

Likely fix direction

The collision goes away if module names are unique per ExecutorManager instance instead of starting at udf-v1 every time. Two reasonable shapes:

  • Per-instance UUID prefix: module_name = f"udf-{uuid.uuid4().hex}-v{version}". Names never collide across specs, the clear+reload branch becomes unreachable in tests, and production behavior is unchanged.
  • Lifecycle-aware close — also pop self.operator_module_name from sys.modules and remove the temp dir from sys.path in close(). Strictly fewer leaks but still relies on every test path calling close.

The per-instance UUID prefix is the smaller, more defensive change.
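Both directions can be sketched together. This is a rough illustration under stated assumptions, not a patch: MiniExecutorManager, its load method, and its attributes are hypothetical stand-ins and do not mirror the real ExecutorManager API. The naming scheme follows the f-string proposed above, and close() undoes the sys.modules and sys.path changes.

```python
import importlib
import sys
import tempfile
import uuid
from pathlib import Path


class MiniExecutorManager:
    """Hypothetical stand-in for ExecutorManager; not the Texera class."""

    def __init__(self) -> None:
        self._tmp = tempfile.TemporaryDirectory()
        # UUID prefix: unique per instance, so names never collide
        # across managers even if an old entry is left behind.
        self._prefix = uuid.uuid4().hex
        self._version = 0
        self._loaded: list[str] = []
        sys.path.append(self._tmp.name)

    def load(self, source: str):
        """Write source to a uniquely named module file and import it."""
        self._version += 1
        name = f"udf-{self._prefix}-v{self._version}"
        Path(self._tmp.name, f"{name}.py").write_text(source)
        importlib.invalidate_caches()
        module = importlib.import_module(name)
        self._loaded.append(name)
        return module

    def close(self) -> None:
        # Lifecycle-aware close: remove this instance's global state.
        for name in self._loaded:
            sys.modules.pop(name, None)
        if self._tmp.name in sys.path:
            sys.path.remove(self._tmp.name)
        self._tmp.cleanup()


first = MiniExecutorManager()
a = first.load("class TestOperator:\n    pass\n")
first.close()

second = MiniExecutorManager()
b = second.load("class CountBatchOperator:\n    count = 0\n")
print(hasattr(b, "CountBatchOperator"))  # True
second.close()
```

With unique names alone, the second manager gets its own module even if first.close() is never called; the cleanup in close() only limits how much state accumulates in sys.modules and sys.path over a long test run.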

Out of scope

  • Reproducing this with uv vs pip. The cause is the static module name and sys.modules reuse; transitive package versions only affect the timing.
