Skip to content

[AMORO-4103] Release table runtime when the group name is changed to a non-existent resource group#4104

Open
xxubai wants to merge 6 commits intoapache:masterfrom
xxubai:fix-pending-hang
Open

[AMORO-4103] Release table runtime when the group name is changed to a non-existent resource group#4104
xxubai wants to merge 6 commits intoapache:masterfrom
xxubai:fix-pending-hang

Conversation

@xxubai
Copy link
Contributor

@xxubai xxubai commented Mar 3, 2026

When a table's optimizer group configuration is changed (e.g., from "old_group" to "default"), if the AMS restarts before the optimizing process is properly closed, the table runtime may remain stuck in PENDING status forever. This happens in two scenarios:

  • AMS restart with non-existent resource group: During loadOptimizingQueues, tables whose persisted optimizer group no longer exists are left in a groupToTableRuntimes map but never released — the previous code only logged a warning without taking any corrective action.
  • Runtime config change to non-existent group: When handleConfigChanged is triggered and the table's new optimizer group does not exist, there is no fallback to release the table's optimizing process, causing it to hang indefinitely.

Why are the changes needed?

Close #4103.

Brief change log

  • DefaultOptimizingService.loadOptimizingQueues(): Replaced the original logic that merely logged a warning for unloaded table runtimes in non-existent groups. Now, tables in PLANNING or PENDING status whose resource group does not exist are actively released back to IDLE via completeEmptyProcess().

  • DefaultOptimizingService.ConfigChangeHandler.handleConfigChanged(): After refreshing the table in the new group's queue, added logic to release the table from the queue. If the new group's queue does not exist, completeEmptyProcess() is called directly on the table runtime to prevent it from being stuck.

  • TestOptimizingQueue: Added two new test cases:

    • testReleaseOrphanedPlanningTableOnRestart: Verifies that a table persisted with PLANNING status in a non-existent group is correctly released to IDLE during AMS restart.
    • testReleaseOrphanedPendingTableOnRestart: Verifies the same behavior for tables persisted with PENDING status.
    • Added helper method simulateLoadOptimizingQueuesForNonExistentGroup() to simulate the loadOptimizingQueues logic in tests.
  • TestDefaultOptimizingService: Added two new test cases:

    • testHandleConfigChangedGroupChanged: Verifies that when the optimizer group changes to a different existing group, the table is properly released from both old and new queues without exceptions.
    • testHandleConfigChangedGroupNotExist: Verifies that when the optimizer group changes to a non-existent group, completeEmptyProcess() is called on the table runtime.

How was this patch tested?

  • Add some test cases that check the changes thoroughly including negative and positive cases if possible

  • Add screenshots for manual tests if appropriate

  • Run test locally before making a pull request

Documentation

  • Does this pull request introduce a new feature? No
  • If yes, how is the feature documented? Not applicable

…t resource group.

Improve handling of orphaned PLANNING/PENDING tables
@github-actions github-actions bot added the module:ams-server Ams server module label Mar 3, 2026
@xxubai xxubai changed the title Release table runtime when the group name is changed to a non-existent resource group [AMORO-4103] Release table runtime when the group name is changed to a non-existent resource group Mar 3, 2026
@codecov-commenter
Copy link

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 22.44%. Comparing base (e02295b) to head (45b4013).
⚠️ Report is 6 commits behind head on master.

Additional details and impacted files
@@             Coverage Diff              @@
##             master    #4104      +/-   ##
============================================
- Coverage     22.44%   22.44%   -0.01%     
  Complexity     2552     2552              
============================================
  Files           458      458              
  Lines         42022    42036      +14     
  Branches       5915     5917       +2     
============================================
  Hits           9433     9433              
- Misses        31777    31791      +14     
  Partials        812      812              
Flag Coverage Δ
trino 22.44% <ø> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Contributor

@j1wonpark j1wonpark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix! The loadOptimizingQueues change makes sense to me. I have a question about the handleConfigChanged part.

@xxubai xxubai requested a review from j1wonpark March 6, 2026 06:24
Copy link
Contributor

@j1wonpark j1wonpark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix! The approach looks clean — handling both AMS restart and config-change scenarios makes sense, and completeEmptyProcess() being idempotent keeps it safe.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

module:ams-server Ams server module

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Tables stuck in PENDING status forever after clear self-optimizing config

3 participants