HDDS-10134. ManagedReadOptions is not closed properly #6013

Merged: 4 commits, Jan 19, 2024

adoroszlai
Contributor

What changes were proposed in this pull request?

RocksDatabase.newIterator(ColumnFamily, boolean) returns an iterator whose readOptions is closed right after the iterator is created. This seems wrong, since closing readOptions invalidates its native handle. However, I'm not sure how this could lead to the leak of the readOptions object reported by the leak detector:

[LeakDetector-ManagedRocksObject0] WARN  managed.ManagedRocksObjectUtils (ManagedRocksObjectUtils.java:reportLeak(62)) - ManagedReadOptions is not closed properly
StackTrace for unclosed instance: org.apache.hadoop.hdds.utils.db.managed.ManagedReadOptions.<init>(ManagedReadOptions.java:30)
org.apache.hadoop.hdds.utils.db.RocksDatabase.newIterator(RocksDatabase.java:778)
org.apache.hadoop.hdds.utils.db.RDBTable.iterator(RDBTable.java:232)
org.apache.hadoop.hdds.utils.db.TypedTable.iterator(TypedTable.java:419)
org.apache.hadoop.hdds.utils.db.TypedTable.iterator(TypedTable.java:410)
org.apache.hadoop.hdds.utils.db.TypedTable.iterator(TypedTable.java:55)
org.apache.hadoop.ozone.om.service.DirectoryDeletingService$DirDeletingTask.call(DirectoryDeletingService.java:169)
org.apache.hadoop.hdds.utils.BackgroundService$PeriodicalTask.lambda$run$0(BackgroundService.java:140)

from:

org.apache.hadoop.fs.ozone.TestRootedDDSWithFSO -- Time elapsed: 23.00 s <<< FAILURE!
java.lang.AssertionError: Found 1 leaked objects, check logs
	at org.apache.hadoop.hdds.utils.db.managed.ManagedRocksObjectMetrics.assertNoLeaks(ManagedRocksObjectMetrics.java:61)
	at org.apache.hadoop.ozone.MiniOzoneClusterImpl.shutdown(MiniOzoneClusterImpl.java:457)
	at org.apache.hadoop.fs.ozone.TestRootedDDSWithFSO.teardown(TestRootedDDSWithFSO.java:123)

This patch proposes to keep track of the ReadOptions (and possibly other objects, e.g. lower/upper boundary Slice) in the ManagedIterator, and to close them when the iterator is closed.
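The ownership idea described above could look roughly like the following sketch. It does not use the real RocksDB or Ozone classes: FakeNativeResource, OwningIterator and OwningIteratorDemo are hypothetical names made up for illustration, and the real ManagedIterator may be structured differently.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for a native-backed object (iterator, ReadOptions, Slice). */
class FakeNativeResource implements AutoCloseable {
  private final String name;
  private boolean closed;

  FakeNativeResource(String name) {
    this.name = name;
  }

  boolean isClosed() {
    return closed;
  }

  @Override
  public void close() {
    closed = true;
    System.out.println("closed " + name);
  }
}

/** Hypothetical wrapper: the iterator owns the objects it was created with. */
class OwningIterator implements AutoCloseable {
  private final FakeNativeResource iterator;
  private final List<FakeNativeResource> dependencies;

  OwningIterator(FakeNativeResource iterator, FakeNativeResource... dependencies) {
    this.iterator = iterator;
    this.dependencies = Arrays.asList(dependencies);
  }

  @Override
  public void close() {
    try {
      // Close the iterator first, while its dependencies are still valid.
      iterator.close();
    } finally {
      // Then release everything the iterator was created with.
      for (FakeNativeResource dependency : dependencies) {
        dependency.close();
      }
    }
  }
}

class OwningIteratorDemo {
  public static void main(String[] args) {
    FakeNativeResource readOptions = new FakeNativeResource("readOptions");
    FakeNativeResource rawIterator = new FakeNativeResource("iterator");
    try (OwningIterator it = new OwningIterator(rawIterator, readOptions)) {
      // readOptions stays open for as long as the iterator is in use.
      System.out.println("readOptions closed while iterating: " + readOptions.isClosed());
    }
    System.out.println("readOptions closed afterwards: " + readOptions.isClosed());
  }
}
```

With this shape, callers only need to close the iterator; the dependent objects cannot be closed too early or forgotten.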

https://issues.apache.org/jira/browse/HDDS-10134

How was this patch tested?

TestRootedDDSWithFSO passed in 10x10 runs (but it also passed with master, so more repetitions may be needed).

Regular CI:
https://github.com/adoroszlai/ozone/actions/runs/7539928782

@adoroszlai adoroszlai self-assigned this Jan 16, 2024
@duongkame
Contributor

Is the error happening with the latest code on master? I don't think we're comfortable not understanding how this happens, tbh (I tried, but I couldn't comprehend it either).

Also, is there a full log file of the test failure? There may be something else going on, e.g. an error while closing the resource.

@adoroszlai
Contributor Author

> Is the error happening with the latest code on master?

Yes.

> Also, is there a full log file of the test failure?

Attached to HDDS-10134. I don't see any errors in it, but please double-check.

@szetszwo
Contributor

I suspect the cause is in close(); see

public void close() {
  super.close();
  leakTracker.close();
}

If super.close() throws an exception, leakTracker won't be closed. If that is the case, try-finally will fix it.
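A minimal sketch of that failure mode, using made-up classes rather than the real managed RocksDB wrappers (DemoLeakTracker, DemoNativeObject, NaiveManaged, SafeManaged and TryFinallyDemo are all hypothetical, and the failure in close() is simulated):

```java
/** Hypothetical leak tracker: records whether close() was reached. */
class DemoLeakTracker {
  private boolean closed;

  void close() {
    closed = true;
  }

  boolean isClosed() {
    return closed;
  }
}

/** Hypothetical base class whose close() always fails, standing in for a native object. */
class DemoNativeObject implements AutoCloseable {
  @Override
  public void close() {
    throw new IllegalStateException("simulated failure in native close");
  }
}

/** Closes the tracker only if super.close() returns normally. */
class NaiveManaged extends DemoNativeObject {
  final DemoLeakTracker leakTracker = new DemoLeakTracker();

  @Override
  public void close() {
    super.close();
    leakTracker.close(); // skipped when super.close() throws
  }
}

/** Closes the tracker even if super.close() throws. */
class SafeManaged extends DemoNativeObject {
  final DemoLeakTracker leakTracker = new DemoLeakTracker();

  @Override
  public void close() {
    try {
      super.close();
    } finally {
      leakTracker.close(); // always runs; the exception still propagates
    }
  }
}

class TryFinallyDemo {
  /** Calls close(), swallows the simulated failure, and reports the tracker state. */
  static boolean trackerClosedAfterFailedClose(DemoNativeObject obj, DemoLeakTracker tracker) {
    try {
      obj.close();
    } catch (IllegalStateException expected) {
      // The simulated failure is expected in this demo.
    }
    return tracker.isClosed();
  }

  public static void main(String[] args) {
    NaiveManaged naive = new NaiveManaged();
    SafeManaged safe = new SafeManaged();
    System.out.println("naive tracker closed: "
        + trackerClosedAfterFailedClose(naive, naive.leakTracker)); // false
    System.out.println("safe tracker closed: "
        + trackerClosedAfterFailedClose(safe, safe.leakTracker)); // true
  }
}
```

In the naive variant the tracker is never marked closed, so a leak detector built on it would report the object as leaked even though the application did call close().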

@szetszwo
Contributor

@adoroszlai, are we able to reproduce the failure by running the test repeatedly? If yes, we can test whether try-finally fixes it.

@adoroszlai
Contributor Author

Thanks @szetszwo for the idea about try-finally, I'll add it. I have not yet been able to reproduce the problem with repeated runs.

@adoroszlai adoroszlai marked this pull request as ready for review January 18, 2024 10:22
@adoroszlai
Contributor Author

Updated the PR to ensure leakTracker.close() is not skipped if the underlying object's close() (or similar) fails. This should avoid false positives in leak detection. If we see a similar problem in the future, at least we will have ruled this out.

The original change (to keep ReadOptions open until iterator is closed) is dropped for now.

Contributor

@szetszwo szetszwo left a comment

+1 the change looks good.

@adoroszlai adoroszlai merged commit 0fa0671 into apache:master Jan 19, 2024
45 checks passed
@adoroszlai adoroszlai deleted the HDDS-10134 branch January 19, 2024 07:48
@adoroszlai
Contributor Author

Thanks @duongkame, @szetszwo for the review.

    try {
      super.close();
    } finally {
      leakTracker.close();
Contributor

Sorry for the late comment @adoroszlai. We could've just called leakTracker.close() before the main close. This would mean that leakTracker tracks whether the application code invoked close() on the resource, regardless of the result (success or failure).

  public void close() {
    leakTracker.close();
    super.close();

Contributor Author

Thanks @duongkame for the suggestion. I had the same idea, but preferred the one in the PR: a problem in leakTracker.close() might prevent release of the underlying (real) resource. try-finally is not that much more complex.
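The trade-off between the two orderings can be illustrated with a small sketch. All names here (ThrowingTracker, RealResource, TrackerFirst, ResourceFirst, CloseOrderDemo) are hypothetical, and the tracker failure is simulated; whether the real leakTracker.close() can actually throw is not established in this thread.

```java
/** Hypothetical tracker whose close() fails. */
class ThrowingTracker {
  void close() {
    throw new IllegalStateException("simulated failure in tracker");
  }
}

/** Hypothetical stand-in for the underlying (real) resource. */
class RealResource implements AutoCloseable {
  private boolean released;

  boolean isReleased() {
    return released;
  }

  @Override
  public void close() {
    released = true;
  }
}

/** Tracker closed first: a tracker failure prevents the resource from being released. */
class TrackerFirst extends RealResource {
  private final ThrowingTracker leakTracker = new ThrowingTracker();

  @Override
  public void close() {
    leakTracker.close(); // throws in this demo
    super.close();       // never reached
  }
}

/** Resource closed first inside try-finally: it is released before the tracker runs. */
class ResourceFirst extends RealResource {
  private final ThrowingTracker leakTracker = new ThrowingTracker();

  @Override
  public void close() {
    try {
      super.close();
    } finally {
      leakTracker.close(); // still throws, but the resource is already released
    }
  }
}

class CloseOrderDemo {
  /** Calls close(), swallows the simulated tracker failure, and reports the resource state. */
  static boolean releasedAfterClose(RealResource resource) {
    try {
      resource.close();
    } catch (IllegalStateException expected) {
      // The simulated tracker failure is expected in this demo.
    }
    return resource.isReleased();
  }

  public static void main(String[] args) {
    System.out.println("tracker-first released: " + releasedAfterClose(new TrackerFirst())); // false
    System.out.println("try-finally released: " + releasedAfterClose(new ResourceFirst())); // true
  }
}
```

Under these assumptions, the try-finally form releases the real resource in both failure scenarios, while the tracker-first form only protects the tracker.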

Tejaskriya pushed a commit to Tejaskriya/ozone that referenced this pull request Jan 24, 2024
adoroszlai added a commit to adoroszlai/ozone that referenced this pull request Mar 5, 2024