Skip to content

Fix gc lock removals#6166

Merged
ddanielr merged 3 commits intoapache:2.1from
ddanielr:bugfix/fix-gc-lock-deletions
Mar 4, 2026
Merged

Fix gc lock removals#6166
ddanielr merged 3 commits intoapache:2.1from
ddanielr:bugfix/fix-gc-lock-deletions

Conversation

@ddanielr
Copy link
Contributor

@ddanielr ddanielr commented Mar 3, 2026

Does not delete the top level gc/lock/ path, instead it just deletes all locks underneath the path.

Added validation checks from other lock deletion code paths.
Updates ITs and the Admin utility to delete gc locks the same way.

Fixes IT failures caused by #6077

Does not delete the top level `gc/lock/` path, instead it just deletes
all locks underneath the path.

Updates ITs and the Admin utility to delete gc locks the same way.
@ddanielr ddanielr added this to the 2.1.5 milestone Mar 3, 2026
Copy link
Contributor

@keith-turner keith-turner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which test was failing? Was there a problem with the ZooZap code?

@ddanielr
Copy link
Contributor Author

ddanielr commented Mar 4, 2026

Which test was failing?

BigRootTabletIT.java and AdvertiseAndBindIT.java were the two I noticed failing, but any test that stopped and started the cluster should have been failing.

Was there a problem with the ZooZap code?

The ZooZap code was incorrect. #6077 replaced the use of removeSingletonLock with a new method in ServiceLock.

removeSingletonLock calls zapDirectory first which takes a path and only calls recursiveDelete for each child of the path.

The new ServiceLock code was calling recursiveDelete directly with the top-level path. So it was removing all the gc locks, but then also removing the path node gc/lock so new gc processes could not find the gc/lock path node so all new locks were failing.

Copy link
Contributor

@keith-turner keith-turner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new ServiceLock code was calling recursiveDelete directly with the top-level path. So it was removing all the gc locks

Ok, so it seems after these changes that the ServiceLock method and ZooZap.removeSingletonLock now behave in a very similar way, except they seem to report/log differently. Noticed that could lead to inconsistent logging/printing.

@ddanielr
Copy link
Contributor Author

ddanielr commented Mar 4, 2026

The new ServiceLock code was calling recursiveDelete directly with the top-level path. So it was removing all the gc locks

Ok, so it seems after these changes that the ServiceLock method and ZooZap.removeSingletonLock now behave in a very similar way, except they seem to report/log differently. Noticed that could lead to inconsistent logging/printing.

Yes I think we can iterate a bit further on this and remove the removeSingletonLock method from ZooZap.
I was trying to get all of the lock removals moved into ServiceLock so the lock creation and deletion is centralized.

ddanielr added 2 commits March 4, 2026 18:15
Switches to using ServiceLock.deleteLocks for tserver locks so lock
deletes can show up in the Admin log (if debug is enabled).

Fixed a bug where the command would fail if no scan servers were running.
Added the ability to also force stop the compaction-coordinator server
getLockData() also throws NoNodeException if nothing exists.
@ddanielr ddanielr merged commit b9cdb06 into apache:2.1 Mar 4, 2026
9 checks passed
@ddanielr ddanielr deleted the bugfix/fix-gc-lock-deletions branch March 4, 2026 22:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants