
[fix](cloud) Deduplicate pending one-shot warm up jobs#62384

Open
freemandealer wants to merge 3 commits into apache:master from freemandealer:task-master-pick-pr-8320-to-master

Conversation

@freemandealer
Member

@freemandealer freemandealer commented Apr 11, 2026

Proposed changes
Issue Number: N/A

- Deduplicate equivalent PENDING one-shot TABLE warm up jobs by destination cluster, normalized table set, and force flag.
- Deduplicate equivalent PENDING one-shot CLUSTER warm up jobs by source/destination cluster pair.
- Reuse the oldest matching pending job and return its job id instead of appending another pending duplicate.
- Keep RUNNING jobs out of deduplication and preserve the existing PERIODIC / EVENT_DRIVEN behavior.
- Add unit tests for table/cluster deduplication, replay handling, and regression coverage.
Test:

Not run in this backport branch per request.
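
As a rough illustration of the TABLE deduplication rule described above, here is a minimal sketch. It is not the code in this PR: `WarmUpDedupSketch`, `TableKey`, `createOrReuse`, and `markRunning` are hypothetical names, and "normalized table set" is assumed here to mean sorted with repeats removed.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.TreeSet;

// Hypothetical sketch; names are illustrative and not taken from CacheHotspotManager.
class WarmUpDedupSketch {
    enum State { PENDING, RUNNING }

    // Dedup key for a one-shot TABLE job: destination cluster, normalized table set, force flag.
    static final class TableKey {
        final String dstCluster;
        final List<String> tables; // assumption: normalization = sorted, repeats removed
        final boolean force;

        TableKey(String dstCluster, Collection<String> tables, boolean force) {
            this.dstCluster = dstCluster;
            this.tables = new ArrayList<>(new TreeSet<>(tables));
            this.force = force;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof TableKey)) {
                return false;
            }
            TableKey k = (TableKey) o;
            return force == k.force && dstCluster.equals(k.dstCluster) && tables.equals(k.tables);
        }

        @Override
        public int hashCode() {
            return Objects.hash(dstCluster, tables, force);
        }
    }

    static final class Job {
        final long id;
        final TableKey key;
        State state = State.PENDING;

        Job(long id, TableKey key) {
            this.id = id;
            this.key = key;
        }
    }

    // LinkedHashMap iterates in insertion order, so the first match is the oldest job.
    private final Map<Long, Job> jobs = new LinkedHashMap<>();
    private long nextId = 1;

    // Returns the id of the oldest equivalent PENDING job, or creates a new job.
    synchronized long createOrReuse(TableKey key) {
        for (Job j : jobs.values()) {
            if (j.state == State.PENDING && j.key.equals(key)) {
                return j.id;
            }
        }
        Job job = new Job(nextId++, key);
        jobs.put(job.id, job);
        return job.id;
    }

    // RUNNING jobs are skipped by createOrReuse, so a later request gets a fresh job.
    synchronized void markRunning(long id) {
        jobs.get(id).state = State.RUNNING;
    }
}
```

In this sketch, two requests for the same destination that list the same tables in a different order map to the same key and share one job id, while a request that differs only in the force flag gets its own job.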

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@freemandealer
Member Author

run buildall

@freemandealer
Member Author

run buildall

@freemandealer
Member Author

run buildall

return lock;
}
ReentrantLock newLock = new ReentrantLock();
ReentrantLock existingLock = oncePendingCreateLocks.putIfAbsent(key, newLock);
Contributor

oncePendingCreateLocks will keep growing. Doesn't that mean a memory leak?

gavinchou previously approved these changes Apr 11, 2026
@github-actions github-actions Bot added the "approved" label (Indicates a PR has been approved by one committer.) Apr 11, 2026
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

PR approved by anyone and no changes requested.

@freemandealer
Member Author

run buildall

freemandealer and others added 3 commits May 26, 2026 21:01
Issue Number: N/A

Related PR: selectdb/selectdb-core#8320

Problem Summary: Equivalent one-shot warm up requests could accumulate duplicate PENDING jobs for the same destination. Reuse the oldest matching pending job for TABLE and CLUSTER once jobs instead of appending another duplicate.

None

- Test: No need to test (picked directly per request without compile/test)
- Behavior changed: Yes (equivalent pending one-shot warm up requests now reuse the existing job id)
- Does this need documentation: No
### What problem does this PR solve?

Issue Number: None

Related PR: None

Problem Summary: Replace once-pending warm up creation locks with reference-counted per-key locks so dedupe entries are removed after the last holder or waiter exits instead of growing without bound.

### Release note

None

### Check List (For Author)

- Test: Unit Test
    - ./run-fe-ut.sh --run org.apache.doris.cloud.cache.CacheHotspotManagerTest (with CUSTOM_MVN=/tmp/mvn-fe-core-am)
- Behavior changed: Yes (once warm up dedupe lock entries are released after the last holder or waiter exits; duplicate job reuse behavior stays the same)
- Does this need documentation: No
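
The reference-counted per-key lock idea from the problem summary above could look roughly like the sketch below. This is only an illustration of the general technique, not the code in this PR: `RefCountedKeyLocks`, `withLock`, and `size` are hypothetical names. The count covers holders and waiters, and the map entry is dropped when it reaches zero, so the map does not grow without bound.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

// Hypothetical sketch; class and method names are not taken from the PR.
class RefCountedKeyLocks<K> {
    private static final class Entry {
        final ReentrantLock lock = new ReentrantLock();
        int refs; // holders + waiters; only mutated inside compute(), which is atomic per key
    }

    private final ConcurrentHashMap<K, Entry> entries = new ConcurrentHashMap<>();

    <T> T withLock(K key, Supplier<T> body) {
        // Register interest before blocking so the entry cannot be removed under a waiter.
        Entry e = entries.compute(key, (k, cur) -> {
            Entry x = (cur == null) ? new Entry() : cur;
            x.refs++;
            return x;
        });
        e.lock.lock();
        try {
            return body.get();
        } finally {
            e.lock.unlock();
            // Returning null from compute() removes the mapping once the last user leaves.
            entries.compute(key, (k, cur) -> (--cur.refs == 0) ? null : cur);
        }
    }

    // Number of keys that currently have a holder or waiter.
    int size() {
        return entries.size();
    }
}
```

Compared with a plain `putIfAbsent` of a new `ReentrantLock` per key, which never removes entries, this variant keeps an entry only while some thread holds or waits on it.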
### What problem does this PR solve?

Issue Number: None

Related PR: apache#8320

Problem Summary: After rebasing onto the latest master, the FE build failed because the Immutables-generated FlightAuthResult code references SuppressFBWarnings, but fe-core did not include the findbugs annotations artifact. The full FE UT run also failed, for four reasons:

- The COS test expected constructor-time region validation, although the typed S3-compatible properties now support endpoint-only configs.
- The authentication handler test expected an OIDC plugin that is not present in the current authentication plugin modules.
- The warm up job lock concurrency test relied on a main-thread static Env mock from worker threads.
- The java-common readable vector metadata omitted the const-column flag that its reader and meta size accounting expect.

This change:

- Adds the annotation dependency for the generated Immutables code.
- Updates the COS missing-region assertion to the native presigned URL path.
- Aligns the authentication plugin manager tests with the password and LDAP plugins available on the test classpath.
- Registers the warm up lock concurrency test Env mock inside each worker thread.
- Writes the const flag into vector metadata to keep the writable and readable metadata layouts consistent.

### Release note

None

### Check List (For Author)

- Test:
    - ./build.sh --fe
    - ./run-fe-ut.sh
    - mvn test -pl fe-filesystem/fe-filesystem-cos -am -Dcheckstyle.skip=true -DfailIfNoTests=false -Dmaven.build.cache.enabled=false -Dtest=CosObjStorageTest
    - mvn test -pl fe-authentication/fe-authentication-handler -am -Dcheckstyle.skip=true -DfailIfNoTests=false -Dmaven.build.cache.enabled=false -Dtest=AuthenticationPluginManagerTest
    - mvn test -pl fe-core -am -Dcheckstyle.skip=true -DfailIfNoTests=false -Dmaven.build.cache.enabled=false -Dtest=org.apache.doris.cloud.cache.CacheHotspotManagerTest#testConcurrentCreateClusterOnceJobReleasesRefCountedLockAfterWaiterCompletes
    - mvn test -pl be-java-extensions/java-common -am -Dcheckstyle.skip=true -DfailIfNoTests=false -Dmaven.build.cache.enabled=false -Dtest=JniScannerTest
- Behavior changed: No
- Does this need documentation: No
@freemandealer freemandealer force-pushed the task-master-pick-pr-8320-to-master branch from 756bb4e to d2b8bbb Compare May 26, 2026 18:28
<version>${project.version}</version>
<scope>test</scope>
</dependency>
<dependency>
Contributor

Why do we need this change?

}

public void updateMeta(VectorColumn meta) {
meta.appendLong(isConst ? 1 : 0);
Contributor

What is this change for?

@github-actions github-actions Bot removed the "approved" label (Indicates a PR has been approved by one committer.) May 28, 2026
3 participants