[Fix](pyudf) Fix concurrent race condition when import module#61280
[Fix](pyudf) Fix concurrent race condition when import module#61280hello-stephen merged 4 commits intoapache:masterfrom
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
run buildall |
TPC-H: Total hot run time: 27357 ms |
TPC-DS: Total hot run time: 153468 ms |
|
run beut |
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
|
/review |
There was a problem hiding this comment.
Code Review Summary
PR Goal: Fix concurrent race condition when importing Python UDF modules with the same name from different locations. The root cause is that sys.modules is a global dict keyed by module name only, but the import lock was keyed by (location, module_name), allowing two threads to concurrently modify sys.modules for the same module name without mutual exclusion.
Verdict: The fix is logically correct and well-scoped. Changing the _import_locks key from (location, module_name) to just module_name properly serializes concurrent imports of same-named modules regardless of their source location. Two minor issues found (stale comments and a formatting nit).
Critical Checkpoint Conclusions
- Does the code accomplish its goal? Yes. By keying the import lock on
module_namealone, two threads importing the same-named module from different locations will now properly serialize, preventing thesys.modulesrace condition. - Is this modification small, clear, and focused? Yes. Only changes the lock key granularity, preserving all other logic.
- Concurrency correctness? The lock is now correctly aligned with the shared resource (
sys.modules, keyed by module name). The_module_cacheremains keyed by(location, module_name)which is correct since it needs location-awareness. The double-checked locking pattern on_import_locksitself is fine since Python's GIL provides safe dict reads. - Are there functionally parallel code paths? The
_clear_modules_from_locationcall site is also updated consistently. - Configuration items? None added. N/A.
- Incompatible changes? None. Internal-only change.
- Test coverage? No test added. This is a concurrency race condition that is inherently difficult to reproduce in a unit test. The scenario is well-described in the PR body and related PR #60630. A regression test exercising concurrent same-named module imports from different locations would be ideal but is understandably hard to write deterministically.
- Observability? N/A for this change.
- Performance? Slightly broader lock scope (per-module vs per-location+module), meaning imports of the same module name from different locations will now serialize. This is correct behavior and the performance impact is negligible since module imports are infrequent.
- Other issues? Two stale comments found (see inline comments), plus one minor formatting nit.
| # Use a per-(location, module) lock to prevent race conditions during import | ||
| import_lock = ModuleUDFLoader._get_import_lock(location, full_module_name) | ||
| import_lock = ModuleUDFLoader._get_import_lock(full_module_name) | ||
|
|
There was a problem hiding this comment.
Nit: Stale comment. This now uses a per-module lock (not per-(location, module) lock). Should be updated to:
# Use a per-module lock to prevent race conditions during import
zclllyybb
left a comment
There was a problem hiding this comment.
need more investigation
|
run buildall |
TPC-H: Total hot run time: 26720 ms |
TPC-DS: Total hot run time: 167763 ms |
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
|
run buildall |
TPC-H: Total hot run time: 26551 ms |
TPC-DS: Total hot run time: 168525 ms |
8dcf3ed to
61785f6
Compare
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
61785f6 to
7538c27
Compare
|
run buildall |
TPC-H: Total hot run time: 26538 ms |
TPC-DS: Total hot run time: 169310 ms |
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
|
PR approved by at least one committer and no changes requested. |
|
PR approved by anyone and no changes requested. |
What problem does this PR solve?
Issue Number: close #xxx
Related PR: #60630
Problem Summary:
Release note
modules_nameas a key to distinguish imported modules, but the current behavior useslocation + modules_nameas a key to differentiate and acquire a lock for lib loading, which can cause a concurrent race condition when two threads import the same named module from different locations at the same time.In the
python_server, thelocationis the cache directory, composed of thefunction_id. Thus, the internalmodule_cacheonly needslocationas its key. After importing a cache, immediately remove the mapping maintained insys.now
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)