GC deleted live tserver's WAL node in ZooKeeper.

**Describe the bug**

Saw this while testing #6217 3908c7e710de2b2fd7f2c453f79f1bb0385453a7, but I suspect it's a more general issue in main, as the WAL/tserver/GC code was not modified.

At startup, tservers create a node in ZK to track their active WALs.  Later, this node must exist when the tserver creates a WAL.

In the tserver logs, I saw that it obtained its lock:

```
2026-04-03T19:44:16,303 Thread[57] [tserver.TabletServer] DEBUG: Obtained tablet server lock /tservers/accumulo/localhost:9800/zlock#3e1f9d39-5a26-4794-95c1-5238103b9ac8#0000000000 localhost:9800[10000ff8e2b001a]
```

Then a bit later the GC process removed the tserver's WAL node in ZK.  The GC only does this if the tserver is not in the live tserver set and it has no WALs registered in ZK.

```
2026-04-03T19:44:50,545 Thread[52] [gc.GarbageCollectWriteAheadLogs] INFO : Removing znode for localhost:9800[10000ff8e2b001a]
```

Much later the tserver failed to create a new WAL because its node had been deleted in ZK.  This caused all writes on the tserver to hang.

```
2026-04-03T21:28:19,818 Thread[142] [log.TabletServerLogger] ERROR: Failed to add new WAL marker for hdfs://10.113.13.85:8020/accumulo/wal/localhost+9800/0286a9ab-1735-428f-b091-263b2e42396b
org.apache.accumulo.server.log.WalStateManager$WalMarkerException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /wals/localhost:9800[10000ff8e2b001a]/0286a9ab-1735-428f-b091-263b2e42396b
	at org.apache.accumulo.server.log.WalStateManager.updateState(WalStateManager.java:138)
	at org.apache.accumulo.server.log.WalStateManager.addNewWalMarker(WalStateManager.java:124)
	at org.apache.accumulo.tserver.TabletServer.addNewLogMarker(TabletServer.java:1032)
	at org.apache.accumulo.tserver.log.TabletServerLogger.lambda$startLogMaker$0(TabletServerLogger.java:294)
	at org.apache.accumulo.core.util.threads.Threads.lambda$createCriticalThread$0(Threads.java:76)
	at org.apache.accumulo.core.trace.TraceWrappedRunnable.run(TraceWrappedRunnable.java:52)
	at java.base/java.lang.Thread.run(Thread.java:1583)
Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /wals/localhost:9800[10000ff8e2b001a]/0286a9ab-1735-428f-b091-263b2e42396b
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:117)
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:53)
	at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:1347)
	at org.apache.accumulo.core.zookeeper.ZooSession.create(ZooSession.java:272)
	at org.apache.accumulo.core.fate.zookeeper.ZooReaderWriter.lambda$putPersistentData$1(ZooReaderWriter.java:92)
	at org.apache.accumulo.core.fate.zookeeper.ZooReader.retryLoopMutator(ZooReader.java:174)
	at org.apache.accumulo.core.fate.zookeeper.ZooReader.retryLoop(ZooReader.java:153)
	at org.apache.accumulo.core.fate.zookeeper.ZooReaderWriter.putPersistentData(ZooReaderWriter.java:90)
	at org.apache.accumulo.core.fate.zookeeper.ZooReaderWriter.putPersistentData(ZooReaderWriter.java:65)
	at org.apache.accumulo.server.log.WalStateManager.updateState(WalStateManager.java:136)
```
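The sequence above can be reproduced in a toy in-memory model. This is only a sketch to illustrate the stated GC condition, not Accumulo code: `WalNodeModel`, `registerServer`, `addWalMarker`, and `gcPass` are hypothetical names, a `HashMap` stands in for the `/wals` tree in ZK, and the empty live set passed to `gcPass` is an assumption about what the GC saw, not a confirmed root cause.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model of the WAL bookkeeping described in this issue (hypothetical, not Accumulo code). */
public class WalNodeModel {
  // server id -> WAL ids registered under that server's node (stand-in for /wals/<server>)
  private final Map<String, Set<String>> walNodes = new HashMap<>();

  /** Tserver startup: create the (initially empty) node for this server. */
  public void registerServer(String server) {
    walNodes.putIfAbsent(server, new HashSet<>());
  }

  /** Tserver creating a WAL: fails if the server's node is missing, mirroring NoNode. */
  public void addWalMarker(String server, String walId) {
    Set<String> wals = walNodes.get(server);
    if (wals == null) {
      throw new IllegalStateException("NoNode for " + server + "/" + walId);
    }
    wals.add(walId);
  }

  /** GC pass: remove nodes of servers that are not in liveServers and have no WALs. */
  public Set<String> gcPass(Set<String> liveServers) {
    Set<String> removed = new HashSet<>();
    for (Map.Entry<String, Set<String>> e : walNodes.entrySet()) {
      if (!liveServers.contains(e.getKey()) && e.getValue().isEmpty()) {
        removed.add(e.getKey());
      }
    }
    walNodes.keySet().removeAll(removed);
    return removed;
  }

  public boolean hasNode(String server) {
    return walNodes.containsKey(server);
  }

  public static void main(String[] args) {
    WalNodeModel zk = new WalNodeModel();
    String tserver = "localhost:9800[10000ff8e2b001a]";

    zk.registerServer(tserver);                      // tserver starts, no WALs yet
    Set<String> removed = zk.gcPass(new HashSet<>()); // assumed: GC's live set lacks the tserver
    System.out.println("Removed by GC: " + removed);

    try {
      zk.addWalMarker(tserver, "0286a9ab-1735-428f-b091-263b2e42396b");
    } catch (IllegalStateException e) {
      System.out.println("WAL creation failed: " + e.getMessage());
    }
  }
}
```

In this model, a tserver that has created its node but has not yet registered a WAL satisfies the removal condition whenever the live set handed to the GC pass does not contain it, and every later `addWalMarker` call fails.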




GC deleted live tserver's WAL node in ZooKeeper. #6298
