
HDDS-8740. Commit batch in SCMHADBTransactionBufferImpl on close() #4866

Closed

adoroszlai wants to merge 1 commit into apache:master from adoroszlai:HDDS-8740


Conversation

@adoroszlai
Contributor

What changes were proposed in this pull request?

TestScmHAFinalization intermittently reports "Found 2 leaked objects" (CodecBuffer instances). FinalizationStateManagerImpl leaks a pair of CodecBuffers allocated for writing the layout version to the SCM metadata store:

2023-06-08 19:50:12,664 [Finalizer] WARN  db.CodecBuffer (CodecBuffer.java:finalize(129)) - LEAK 1: org.apache.hadoop.hdds.utils.db.CodecBuffer@292f4cc4, refCnt=1, capacity=3 allocation:
org.apache.hadoop.hdds.utils.db.CodecBuffer.allocate(CodecBuffer.java:74)
...
org.apache.hadoop.hdds.utils.db.TypedTable.putWithBatch(TypedTable.java:172)
org.apache.hadoop.hdds.scm.ha.SCMHADBTransactionBufferImpl.addToBuffer(SCMHADBTransactionBufferImpl.java:70)
org.apache.hadoop.hdds.scm.server.upgrade.FinalizationStateManagerImpl.finalizeLayoutFeatureLocal(FinalizationStateManagerImpl.java:156)
org.apache.hadoop.hdds.scm.server.upgrade.FinalizationStateManagerImpl.reinitialize(FinalizationStateManagerImpl.java:257)
...
org.apache.hadoop.hdds.scm.server.upgrade.FinalizationManagerImpl.reinitialize(FinalizationManagerImpl.java:147)
org.apache.hadoop.hdds.scm.ha.SCMHAManagerImpl.startServices(SCMHAManagerImpl.java:445)
org.apache.hadoop.hdds.scm.ha.SCMHAManagerImpl.reloadSCMState(SCMHAManagerImpl.java:357)
org.apache.hadoop.hdds.scm.ha.SCMHAManagerImpl.installCheckpoint(SCMHAManagerImpl.java:308)
org.apache.hadoop.hdds.scm.ha.SCMHAManagerImpl.installCheckpoint(SCMHAManagerImpl.java:258)
org.apache.hadoop.hdds.scm.ha.SCMStateMachine.reinitialize(SCMStateMachine.java:391)
org.apache.ratis.server.impl.StateMachineUpdater.reload(StateMachineUpdater.java:218)

finalizeLayoutFeatureLocal adds these to the transactionBuffer, collecting them in a batch operation:

transactionBuffer.addToBuffer(finalizationStore,
    OzoneConsts.LAYOUT_VERSION_KEY, String.valueOf(layoutVersion));
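
For illustration, addToBuffer presumably stages the entry into the current batch roughly as in the following simplified sketch (assumed from the stack trace above, not the actual SCMHADBTransactionBufferImpl code):

// Assumed sketch: stage the put into the current batch; the CodecBuffers
// allocated for the key and value are only released once the batch is committed.
public <KEY, VALUE> void addToBuffer(Table<KEY, VALUE> table, KEY key, VALUE value)
    throws IOException {
  table.putWithBatch(currentBatchOperation, key, value);
}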

CodecBuffers are released on commit, which happens when the transaction buffer is flushed:

public void commit(RocksDatabase db) throws IOException {
  debug(() -> String.format("%s: commit %s",
      name, opCache.getCommitString()));
  try (Closeable ignored = opCache.prepareBatchWrite()) {
    db.batchWrite(writeBatch);
  }
}
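
For context, the SCM-side flush presumably looks roughly like this simplified sketch (assumed from the description above, not the actual code; bookkeeping such as the transaction-info update is omitted):

// Simplified, assumed sketch of the flush path: committing the batch writes all
// buffered entries to RocksDB and releases their CodecBuffers; a new, empty
// batch is then started for subsequent operations.
public void flush() throws IOException {
  metadataStore.getStore().commitBatchOperation(currentBatchOperation);
  currentBatchOperation.close();
  currentBatchOperation = metadataStore.getStore().initBatchOperation();
}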

The problem is that the buffer may be closed without being flushed. In this case the contents of the current batch, if any, are not committed. (The batch is closed, so RocksDB objects are not leaked.)

We could fix it by explicitly flushing the buffer when finalization completes. However, the problem is more general: any metadata stored by SCM via the transaction buffer may be lost. So this PR proposes to commit the in-progress batch operation, if any, before closing it (see the sketch after the list below).

Also:

  • Add a closed flag: allow closing only once, and reject operations once closed.
  • Add a precondition that currentBatchOperation is non-null.
  • Replace the AtomicLong txFlushPending with a boolean, since the exact count does not matter; only the > 0 and == 0 cases are distinguished.
  • Make the SCMMetadataStore field final, both in SCMHADBTransactionBufferImpl and in StorageContainerManager.
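
A rough sketch of the proposed close() behavior, for illustration only; the field and method names used here (closed, currentBatchOperation, metadataStore) are assumptions based on the description above, not the actual patch:

// Illustrative sketch, not the patch itself: commit the in-progress batch, if
// any, before closing the buffer, and allow closing only once.
@Override
public synchronized void close() throws IOException {
  if (closed) {
    return;                                   // closing is allowed only once
  }
  closed = true;
  Preconditions.checkNotNull(currentBatchOperation,
      "currentBatchOperation == null");
  // commit the pending batch so buffered SCM metadata is not lost
  metadataStore.getStore().commitBatchOperation(currentBatchOperation);
  currentBatchOperation.close();
}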

https://issues.apache.org/jira/browse/HDDS-8740

How was this patch tested?

TestScmHAFinalization#testSnapshotFinalization passed in 30x10 runs:
https://github.com/adoroszlai/hadoop-ozone/actions/runs/5221131625

Regular CI:
https://github.com/adoroszlai/hadoop-ozone/actions/runs/5221128846

@szetszwo
Contributor

@adoroszlai , how about we clear the cache in close? Then, it will also work for OM, datanodes and other cases.

+++ b/hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/utils/db/RDBBatchOperation.java
@@ -350,6 +350,7 @@ public void commit(RocksDatabase db, ManagedWriteOptions writeOptions)
   @Override
   public void close() {
     debug(() -> String.format("%s: close", name));
+    opCache.clear();
     writeBatch.close();
   }

@adoroszlai
Contributor Author

adoroszlai commented Jun 10, 2023

how about we clear the cache in close? Then, it will also work for OM, datanodes and other cases.

Thanks @szetszwo for the suggestion. After posting this PR I was thinking along the same lines, but I haven't updated it yet.

adoroszlai@7a4448c

Can we assume that all batches should be committed (to avoid data loss)? If so, we might want to combine commit and close.

@szetszwo
Contributor

Can we assume that all batches should be committed (to avoid data loss)? ...

We should support close without commit. It is like abort. It is useful when a sequence of operations fails in the middle (e.g. atomically put 10 records, but reading the 8th record from another source fails). Then it should abort instead of committing partially.
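
A hypothetical usage pattern illustrating these abort semantics (store, table, and records are illustrative names, not code from this PR):

// Illustrative only: commit is issued only after every step has succeeded;
// leaving the try block through an exception closes the batch without
// committing it, i.e. the partially built batch is aborted.
try (BatchOperation batch = store.initBatchOperation()) {
  for (Map.Entry<String, String> e : records.entrySet()) {
    table.putWithBatch(batch, e.getKey(), e.getValue());  // any step may throw
  }
  store.commitBatchOperation(batch);  // reached only if all puts succeeded
}                                     // close() without commit acts as abort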

@szetszwo
Contributor

@adoroszlai , after HDDS-8803, do we still need this?

@adoroszlai
Contributor Author

after HDDS-8803, do we still need this?

@szetszwo This is no longer necessary for the leak. It would be nice if @errose28 could check whether losing the layout version due to an uncommitted batch (buffer not flushed) could be a problem.

@errose28
Contributor

In what cases would the buffer be closed before flush other than shutdown? Shutdown should not be a problem since the information would be replayed from the Ratis log on restart. If the buffer is closed and entries are not flushed but the SCM keeps accepting transactions, that seems like a larger problem outside of finalization.

@adoroszlai
Contributor Author

should not be a problem since the information would be replayed from the Ratis log on restart

Then fixing the leak itself in HDDS-8803 seems enough.

adoroszlai closed this on Jun 14, 2023