Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KAFKA-8637: WriteBatch objects leak off-heap memory #7050

Merged
merged 2 commits into from Jul 15, 2019

Conversation

@ableegoldman
Copy link
Contributor

commented Jul 8, 2019

Should be cherry-picked back to 2.3 (picked from 2.2 to 2.1 in 7077 )

@ableegoldman

This comment has been minimized.

Copy link
Contributor Author

commented Jul 8, 2019

@bbejeck bbejeck added the streams label Jul 10, 2019

@@ -403,10 +403,10 @@ public synchronized void close() {
}

dbAccessor.close();
db.close();

This comment has been minimized.

Copy link
@mjsax

mjsax Jul 10, 2019

Member

Can we add a comment and maybe link to RocksDB docs to clarify the correct order of close() calls? Otherwise, we might forget about it and reorder accidentally...

Also: this may affect more versions compared to the memory leak? Should we extract it into it's own PR to easily backport?

This comment has been minimized.

Copy link
@pkleindl

pkleindl Jul 11, 2019

Contributor

Looking again at the docs https://github.com/facebook/rocksdb/wiki/RocksJava-Basics I am still not sure if the order is now completely correct.
The doc says ColumnFamilyHandle > RocksDB > DBOptions > ColumnFamilyOptions
but the code does ColumnFamilyHandle > ColumnFamilyOptions > DBOptions > RocksDB due to userSpecifiedOptions.
Additionally, only some are set to null afterwards, does this have a special reason?

This comment has been minimized.

Copy link
@ableegoldman

ableegoldman Jul 11, 2019

Author Contributor

This PR is moving the db.close() to before userSpecifiedOptions.close() so aren't we now closing the db in the right order? That said, the docs do suggest we should flip the order in which DBOptions & ColumnFamilyOptions are closed in RocksDBGenericOptionsToDbOptionsColumnFamilyOptionsAdapter#close (which, side note, is an obscenely long name...)

But Matthias is right, this should be split into a separate PR (and well documented) -- will open another one

Also, I think on this branch everything is set to null -- but if you were looking at the 2.2 branch, yeah I did forget to set filter to null. We were rushing to get that in before 2.2.1 was cut 😄

This comment has been minimized.

Copy link
@mjsax

mjsax Jul 11, 2019

Member

Thanks for splitting out #7076 @ableegoldman

Please review the new PR, too, @pkleindl

This comment has been minimized.

Copy link
@pkleindl

pkleindl Jul 12, 2019

Contributor

Will do.
My question regarding the null setting was about the DBOptions & ColumnFamilyOptions in RocksDBGenericOptionsToDbOptionsColumnFamilyOptionsAdapter, why they are not set to null too.
As for the long name, it's probably a german language thing ;-)

@@ -233,6 +233,7 @@ void restoreAllInternal(final Collection<KeyValue<byte[], byte[]>> records) {
final S segment = entry.getKey();
final WriteBatch batch = entry.getValue();
segment.write(batch);
batch.close();

This comment has been minimized.

Copy link
@mjsax

mjsax Jul 10, 2019

Member

Can we add a test this this?

This comment has been minimized.

Copy link
@ableegoldman

ableegoldman Jul 11, 2019

Author Contributor

Not really sure what a good way to test this would be...thoughts?

This comment has been minimized.

Copy link
@mjsax

mjsax Jul 11, 2019

Member

I would be tricky. We would need to intercept the call to new WriteBatch within getWriteBatches() using power-mock and later verify if close() was called on the mock. Might not be worth it...

@bbejeck

This comment has been minimized.

Copy link
Contributor

commented Jul 10, 2019

retest this please

2 similar comments
@bbejeck

This comment has been minimized.

Copy link
Contributor

commented Jul 11, 2019

retest this please

@bbejeck

This comment has been minimized.

Copy link
Contributor

commented Jul 11, 2019

retest this please

@mjsax

This comment has been minimized.

Copy link
Member

commented Jul 12, 2019

Java11 2.12 timed out. Java11 2.13 and Java8 failed with flaky tests.

Retest this please.

@ableegoldman

This comment has been minimized.

Copy link
Contributor Author

commented Jul 12, 2019

Java 8 passed, Java11 2.13 failed with flaky tests. retest this, please

@mjsax

This comment has been minimized.

Copy link
Member

commented Jul 15, 2019

Java8 and 11/2.12 passed. 11/2.13 failed (test result not available any longer).

Retest this please.

@guozhangwang guozhangwang merged commit 8cefe97 into apache:trunk Jul 15, 2019

0 of 3 checks passed

JDK 8 and Scala 2.11 FAILURE No test results found.
Details
JDK 11 and Scala 2.12 Build started for merge commit.
Details
JDK 11 and Scala 2.13 Build started for merge commit.
Details
guozhangwang added a commit that referenced this pull request Jul 15, 2019
KAFKA-8637: WriteBatch objects leak off-heap memory (#7050)
Should be cherry-picked back to 2.3 (picked from 2.2 to 2.1 in 7077 )

Reviewers: pkleindl <44436474+pkleindl@users.noreply.github.com>, Matthias J. Sax <mjsax@apache.org>, Bill Bejeck <bbejeck@gmail.com>, Guozhang Wang <wangguoz@gmail.com>
@guozhangwang

This comment has been minimized.

Copy link
Contributor

commented Jul 15, 2019

Cherry-picked to 2.3 as well.

ijuma added a commit to confluentinc/kafka that referenced this pull request Jul 16, 2019
KAFKA-8637: WriteBatch objects leak off-heap memory (apache#7050)
Should be cherry-picked back to 2.3 (picked from 2.2 to 2.1 in 7077 )

Reviewers: pkleindl <44436474+pkleindl@users.noreply.github.com>, Matthias J. Sax <mjsax@apache.org>, Bill Bejeck <bbejeck@gmail.com>, Guozhang Wang <wangguoz@gmail.com>
ijuma added a commit to confluentinc/kafka that referenced this pull request Jul 20, 2019
Merge remote-tracking branch 'apache-github/2.3' into ccs-2.3
* apache-github/2.3:
  MINOR: Update documentation for enabling optimizations (apache#7099)
  MINOR: Remove stale streams producer retry default docs. (apache#6844)
  KAFKA-8635; Skip client poll in Sender loop when no request is sent (apache#7085)
  KAFKA-8615: Change to track partition time breaks TimestampExtractor (apache#7054)
  KAFKA-8670; Fix exception for kafka-topics.sh --describe without --topic mentioned (apache#7094)
  KAFKA-8602: Separate PR for 2.3 branch (apache#7092)
  KAFKA-8530; Check for topic authorization errors in OffsetFetch response (apache#6928)
  KAFKA-8662; Fix producer metadata error handling and consumer manual assignment (apache#7086)
  KAFKA-8637: WriteBatch objects leak off-heap memory (apache#7050)
  KAFKA-8620: fix NPE due to race condition during shutdown while rebalancing (apache#7021)
  HOT FIX: close RocksDB objects in correct order (apache#7076)
  KAFKA-7157: Fix handling of nulls in TimestampConverter (apache#7070)
  KAFKA-6605: Fix NPE in Flatten when optional Struct is null (apache#5705)
  Fixes #8198 KStreams testing docs use non-existent method pipe (apache#6678)
  KAFKA-5998: fix checkpointableOffsets handling (apache#7030)
  KAFKA-8653; Default rebalance timeout to session timeout for JoinGroup v0 (apache#7072)
  KAFKA-8591; WorkerConfigTransformer NPE on connector configuration reloading (apache#6991)
  MINOR: add upgrade text (apache#7013)
  Bump version to 2.3.1-SNAPSHOT
xiowu0 added a commit to linkedin/kafka that referenced this pull request Aug 22, 2019
[LI-CHERRY-PICK] [a32918b] KAFKA-8637: WriteBatch objects leak off-he…
…ap memory (apache#7050)

TICKET = KAFKA-8637
LI_DESCRIPTION =
EXIT_CRITERIA = HASH [a32918b]
ORIGINAL_DESCRIPTION =

Should be cherry-picked back to 2.3 (picked from 2.2 to 2.1 in 7077 )

Reviewers: pkleindl <44436474+pkleindl@users.noreply.github.com>, Matthias J. Sax <mjsax@apache.org>, Bill Bejeck <bbejeck@gmail.com>, Guozhang Wang <wangguoz@gmail.com>
(cherry picked from commit a32918b)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants
You can’t perform that action at this time.