What happened?
ExecutorManager reuses fixed module names (udf-v1, udf-v2, ...) per instance and registers them in sys.modules via importlib.import_module. Each test fixture creates a fresh ExecutorManager with executor_version = 0, so every spec's first executor lands on the name udf-v1. The teardown only closes the temp filesystem; it does not remove the entry from sys.modules or pop the temp directory from sys.path.
When the second spec hits udf-v1, load_executor_definition enters the cached branch:
    # amber/src/main/python/core/architecture/managers/executor_manager.py
    if module_name in sys.modules:
        executor_module = importlib.import_module(module_name)
        executor_module.__dict__.clear()
        executor_module.__dict__["__name__"] = module_name
        executor_module = importlib.reload(executor_module)
reload() then tries to find the module again by name. On CPython 3.11 (Linux, ubuntu-latest), the path lookup occasionally still resolves to the previous spec / loader cache instead of the freshly written udf-v1.py in the new temp filesystem. The cached class definition (TestOperator from SAMPLE_OPERATOR_CODE in core/architecture/managers/test_executor_manager.py) leaks into the next spec's executor, and the new spec's assertions on the expected class (e.g. CountBatchOperator.count) fail with an AttributeError.
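The underlying hazard can be shown deterministically without the CI timing. The sketch below is not the project's code: the module name demo-udf-v1 and the two tiny classes are stand-ins chosen for illustration. It only demonstrates that importlib.import_module returns whatever is already registered in sys.modules under a name, even when a different file with that name has just been written to a directory on sys.path.

```python
import importlib
import sys
import tempfile
import types
from pathlib import Path

MODULE_NAME = "demo-udf-v1"  # hypothetical stand-in for udf-v1

# First "spec": a module defining TestOperator is registered and never removed.
stale = types.ModuleType(MODULE_NAME)
exec("class TestOperator:\n    pass\n", stale.__dict__)
sys.modules[MODULE_NAME] = stale

# Second "spec": different code is written under the same module name
# in a fresh temp directory that is then added to sys.path.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, f"{MODULE_NAME}.py").write_text(
        "class CountBatchOperator:\n    count = 0\n"
    )
    sys.path.append(tmp)
    try:
        # The sys.modules entry is returned; the new file is never read.
        module = importlib.import_module(MODULE_NAME)
    finally:
        sys.path.remove(tmp)

has_old = hasattr(module, "TestOperator")
has_new = hasattr(module, "CountBatchOperator")

# Clean up the global registry so the demo leaves no trace behind.
del sys.modules[MODULE_NAME]
```

In the real code path the cached branch additionally clears the module and calls reload(), so whether the stale class survives depends on what reload() resolves; the sketch only covers the part that is independent of timing.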
Repro
CI run: https://github.com/apache/texera/actions/runs/25263695023/job/74074970899?pr=4636 — backport (release/v1.1.0-incubating) / python (ubuntu-latest, 3.11).
Test order in that run:
1. core/architecture/managers/test_executor_manager.py::TestExecutorManager::test_accept_python_language_regular_operator passes; it loads udf-v1 with class TestOperator(UDFOperatorV2) from SAMPLE_OPERATOR_CODE.
2. About 30 specs later (alphabetical order), core/runnables/test_main_loop.py::TestMainLoop::test_batch_dp_thread_can_process_batch runs. Its fixture mock_initialize_batch_count_executor sends OpExecWithCode(inspect.getsource(CountBatchOperator), "python"). The handler calls executor_manager.initialize_executor(code, ...), which calls load_executor_definition(code). The new ExecutorManager starts at executor_version = 0 and therefore generates udf-v1. The cached entry from step 1 wins.
3. The test eventually does assert executor.count == 1 and gets AttributeError: 'TestOperator' object has no attribute 'count'.
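The name collision in steps 1 and 2 follows directly from the per-instance counter. As a minimal sketch, assuming a naming scheme of the form udf-v<version> (the class and method names here are hypothetical and only mirror the behavior described above, not the actual ExecutorManager implementation):

```python
class SketchExecutorManager:
    """Hypothetical stand-in that mirrors only the naming scheme."""

    def __init__(self) -> None:
        # Every fresh instance starts counting from zero again.
        self.executor_version = 0

    def gen_module_name(self) -> str:
        self.executor_version += 1
        return f"udf-v{self.executor_version}"


# Two unrelated test fixtures each build their own manager...
first_spec_name = SketchExecutorManager().gen_module_name()
second_spec_name = SketchExecutorManager().gen_module_name()

# ...and both end up asking the import system for the same module name.
names_collide = first_spec_name == second_spec_name
```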
The same code passes on Python 3.10, 3.12, and 3.13 in the same backport job, and on the direct build / python (3.11) job for the same PR; the 3.11 importlib path on this particular filesystem/timing combination is what trips the cache. PR #4636 (pip → uv install switch) does not introduce the bug; it merely shifts transitive package versions and timing enough to change how often the latent collision occurs.
Branch
main (also reproducible on release/v1.1.0-incubating)
Commit Hash (Optional)
8ce4ad5
Relevant log output
core/runnables/test_main_loop.py:851: AttributeError
> assert executor.count == 1
E AttributeError: 'TestOperator' object has no attribute 'count'
================== 1 failed, 218 passed, 5 warnings in 45.60s ==================
Likely fix direction
The collision goes away if module names are unique per ExecutorManager instance instead of starting at udf-v1 every time. Two reasonable shapes:
- A. Per-instance UUID prefix: module_name = f"udf-{uuid.uuid4().hex}-v{version}". Names never collide across specs, the clear+reload branch becomes unreachable in tests, and production behavior is unchanged.
- B. Lifecycle-aware close: also pop self.operator_module_name from sys.modules and remove the temp dir from sys.path in close(). This leaves strictly fewer leaks, but it still relies on every test path calling close.
Option A is the smaller, more defensive change.
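Both shapes can be combined in one small sketch. This is an illustration under assumptions, not a patch: SketchExecutorManager, its attributes, and the use of tempfile.TemporaryDirectory are hypothetical stand-ins, and how the real ExecutorManager manages its temp filesystem and sys.path is not shown in this report.

```python
import importlib
import sys
import tempfile
import uuid
from pathlib import Path


class SketchExecutorManager:
    """Hypothetical stand-in for ExecutorManager; not the project's code."""

    def __init__(self) -> None:
        self.executor_version = 0
        # Option A: a per-instance prefix keeps names unique across instances.
        self._prefix = uuid.uuid4().hex
        self._tmp = tempfile.TemporaryDirectory()
        self._module_names = []
        sys.path.append(self._tmp.name)

    def load_executor_definition(self, code: str):
        self.executor_version += 1
        name = f"udf-{self._prefix}-v{self.executor_version}"
        Path(self._tmp.name, f"{name}.py").write_text(code)
        # Let the path finders notice the file that was just written.
        importlib.invalidate_caches()
        self._module_names.append(name)
        return importlib.import_module(name)

    def close(self) -> None:
        # Option B: undo the global side effects, not just the temp dir.
        for name in self._module_names:
            sys.modules.pop(name, None)
        if self._tmp.name in sys.path:
            sys.path.remove(self._tmp.name)
        self._tmp.cleanup()


first = SketchExecutorManager()
second = SketchExecutorManager()
mod_a = first.load_executor_definition("class TestOperator:\n    pass\n")
mod_b = second.load_executor_definition(
    "class CountBatchOperator:\n    count = 0\n"
)
names_differ = mod_a.__name__ != mod_b.__name__

first.close()
second.close()
leaked = [n for n in (mod_a.__name__, mod_b.__name__) if n in sys.modules]
```

With unique names the second manager always gets its own module, even if an earlier manager was never closed; the cleanup in close() is then a hygiene measure rather than the thing correctness depends on.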
Out of scope
- Reproducing this with uv vs pip. The cause is the static module name and sys.modules reuse; transitive package versions only affect the timing.