
HDDS-8740. Commit batch in SCMHADBTransactionBufferImpl on close() #4866

Closed

adoroszlai wants to merge 1 commit into apache:master from adoroszlai:HDDS-8740


Conversation

@adoroszlai
Contributor

What changes were proposed in this pull request?

TestScmHAFinalization intermittently reports "Found 2 leaked objects" (CodecBuffer instances). FinalizationStateManagerImpl leaks a pair of CodecBuffers allocated for writing the layout version to the SCM metadata store:

2023-06-08 19:50:12,664 [Finalizer] WARN  db.CodecBuffer (CodecBuffer.java:finalize(129)) - LEAK 1: org.apache.hadoop.hdds.utils.db.CodecBuffer@292f4cc4, refCnt=1, capacity=3 allocation:
org.apache.hadoop.hdds.utils.db.CodecBuffer.allocate(CodecBuffer.java:74)
...
org.apache.hadoop.hdds.utils.db.TypedTable.putWithBatch(TypedTable.java:172)
org.apache.hadoop.hdds.scm.ha.SCMHADBTransactionBufferImpl.addToBuffer(SCMHADBTransactionBufferImpl.java:70)
org.apache.hadoop.hdds.scm.server.upgrade.FinalizationStateManagerImpl.finalizeLayoutFeatureLocal(FinalizationStateManagerImpl.java:156)
org.apache.hadoop.hdds.scm.server.upgrade.FinalizationStateManagerImpl.reinitialize(FinalizationStateManagerImpl.java:257)
...
org.apache.hadoop.hdds.scm.server.upgrade.FinalizationManagerImpl.reinitialize(FinalizationManagerImpl.java:147)
org.apache.hadoop.hdds.scm.ha.SCMHAManagerImpl.startServices(SCMHAManagerImpl.java:445)
org.apache.hadoop.hdds.scm.ha.SCMHAManagerImpl.reloadSCMState(SCMHAManagerImpl.java:357)
org.apache.hadoop.hdds.scm.ha.SCMHAManagerImpl.installCheckpoint(SCMHAManagerImpl.java:308)
org.apache.hadoop.hdds.scm.ha.SCMHAManagerImpl.installCheckpoint(SCMHAManagerImpl.java:258)
org.apache.hadoop.hdds.scm.ha.SCMStateMachine.reinitialize(SCMStateMachine.java:391)
org.apache.ratis.server.impl.StateMachineUpdater.reload(StateMachineUpdater.java:218)

finalizeLayoutFeatureLocal adds these to the transactionBuffer, collecting them in a batch operation:

transactionBuffer.addToBuffer(finalizationStore,
    OzoneConsts.LAYOUT_VERSION_KEY, String.valueOf(layoutVersion));
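
For illustration, addToBuffer presumably stages the entry into the current batch roughly as in the following simplified sketch (assumed from the stack trace above, not the actual SCMHADBTransactionBufferImpl code):

// Assumed sketch: stage the put into the current batch; the CodecBuffers
// allocated for the key and value are only released once the batch is committed.
public <KEY, VALUE> void addToBuffer(Table<KEY, VALUE> table, KEY key, VALUE value)
    throws IOException {
  table.putWithBatch(currentBatchOperation, key, value);
}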

CodecBuffers are released on commit, which happens when the transaction buffer is flushed:

public void commit(RocksDatabase db) throws IOException {
  debug(() -> String.format("%s: commit %s",
      name, opCache.getCommitString()));
  try (Closeable ignored = opCache.prepareBatchWrite()) {
    db.batchWrite(writeBatch);
  }
}
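
For context, the SCM-side flush presumably looks roughly like this simplified sketch (assumed from the description above, not the actual code; bookkeeping such as the transaction-info update is omitted):

// Simplified, assumed sketch of the flush path: committing the batch writes all
// buffered entries to RocksDB and releases their CodecBuffers; a new, empty
// batch is then started for subsequent operations.
public void flush() throws IOException {
  metadataStore.getStore().commitBatchOperation(currentBatchOperation);
  currentBatchOperation.close();
  currentBatchOperation = metadataStore.getStore().initBatchOperation();
}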

The problem is that the buffer may be closed without being flushed. In this case the contents of the current batch, if any, are not committed. (The batch is closed, so RocksDB objects are not leaked.)

We could fix it by explicitly flushing the buffer when finalization completes. However, the problem is more general: any metadata stored by SCM via the transaction buffer may be lost. So this PR proposes to commit the in-progress batch operation, if any, before closing it (see the sketch after the list below).

Also:

  • Add a closed flag: allow closing only once, and reject operations once closed.
  • Add a precondition that currentBatchOperation is non-null.
  • Replace the AtomicLong txFlushPending with a boolean, since the exact count does not matter; only the > 0 and == 0 cases are distinguished.
  • Make the SCMMetadataStore field final, both in SCMHADBTransactionBufferImpl and in StorageContainerManager.
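
A rough sketch of the proposed close() behavior, for illustration only; the field and method names used here (closed, currentBatchOperation, metadataStore) are assumptions based on the description above, not the actual patch:

// Illustrative sketch, not the patch itself: commit the in-progress batch, if
// any, before closing the buffer, and allow closing only once.
@Override
public synchronized void close() throws IOException {
  if (closed) {
    return;                                   // closing is allowed only once
  }
  closed = true;
  Preconditions.checkNotNull(currentBatchOperation,
      "currentBatchOperation == null");
  // commit the pending batch so buffered SCM metadata is not lost
  metadataStore.getStore().commitBatchOperation(currentBatchOperation);
  currentBatchOperation.close();
}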

https://issues.apache.org/jira/browse/HDDS-8740

How was this patch tested?

TestScmHAFinalization#testSnapshotFinalization passed in 30x10 runs:
https://github.com/adoroszlai/hadoop-ozone/actions/runs/5221131625

Regular CI:
https://github.com/adoroszlai/hadoop-ozone/actions/runs/5221128846

@szetszwo
Contributor

@adoroszlai , how about we clear the cache in close? Then, it will also work for OM, datanodes and other cases.

+++ b/hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/utils/db/RDBBatchOperation.java
@@ -350,6 +350,7 @@ public void commit(RocksDatabase db, ManagedWriteOptions writeOptions)
   @Override
   public void close() {
     debug(() -> String.format("%s: close", name));
+    opCache.clear();
     writeBatch.close();
   }

@adoroszlai
Contributor Author

adoroszlai commented Jun 10, 2023

how about we clear the cache in close? Then, it will also work for OM, datanodes and other cases.

Thanks @szetszwo for the suggestion. After posting this PR I was thinking along the same lines, but I haven't updated it yet.

adoroszlai@7a4448c

Can we assume that all batches should be committed (to avoid data loss)? If so, we might want to combine commit and close.

@szetszwo
Contributor

Can we assume that all batches should be committed (to avoid data loss)? ...

We should support close without commit. It is like abort. It is useful when a sequence of operations fails in the middle (e.g. atomically put 10 records, but reading the 8th record from another source fails). Then it should abort instead of committing partially.
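
A hypothetical usage pattern illustrating these abort semantics (store, table, and records are illustrative names, not code from this PR):

// Illustrative only: commit is issued only after every step has succeeded;
// leaving the try block through an exception closes the batch without
// committing it, i.e. the partially built batch is aborted.
try (BatchOperation batch = store.initBatchOperation()) {
  for (Map.Entry<String, String> e : records.entrySet()) {
    table.putWithBatch(batch, e.getKey(), e.getValue());  // any step may throw
  }
  store.commitBatchOperation(batch);  // reached only if all puts succeeded
}                                     // close() without commit acts as abort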

@szetszwo
Contributor

@adoroszlai , after HDDS-8803, do we still need this?

@adoroszlai
Contributor Author

after HDDS-8803, do we still need this?

@szetszwo This is no longer necessary for the leak. It would be nice if @errose28 could check whether losing the layout version due to an uncommitted batch (buffer not flushed) could be a problem.

@errose28
Contributor

In what cases would the buffer be closed before flush other than shutdown? Shutdown should not be a problem since the information would be replayed from the Ratis log on restart. If the buffer is closed and entries are not flushed but the SCM keeps accepting transactions, that seems like a larger problem outside of finalization.

@adoroszlai
Contributor Author

should not be a problem since the information would be replayed from the Ratis log on restart

Then fixing the leak itself in HDDS-8803 seems enough.

adoroszlai closed this on Jun 14, 2023