HADOOP-16854. ABFS: Fix for the OutOfMemoryException in AbfsOutputStream #2014

Closed
wants to merge 6 commits into from

Conversation

bilaharith
Contributor

@bilaharith bilaharith commented May 12, 2020

Currently, in memory-restricted environments, a large number of buffers is sometimes needed for execution, and these buffers are kept in the buffer pool forever. This leads to OutOfMemory exceptions.
This change makes certain improvements to AbfsOutputStream that help fix the issue.

  • Getting rid of the ElasticByteBufferPool. It holds on to the buffers it creates forever, and there is no upper bound on the number of buffers the pool can contain. This could lead to high memory consumption.
  • A new buffer pool is implemented to overcome the shortcomings of ElasticByteBufferPool. For more information, see the doc attached to the JIRA.
  • Sharing the thread pool across all the AbfsOutputStream instances.
  • Sharing the buffer pool across all the AbfsOutputStream instances.
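
To illustrate the idea of a bounded pool, here is a minimal sketch, not the AbfsByteBufferPool from this patch. The class name `BoundedBufferPoolSketch`, its constructor parameters, and the choice to return `null` when the cap is reached are all assumptions made for the example. It caps both the number of idle buffers it retains and the number of buffers handed out at once, which are the two limits an unbounded pool lacks.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical bounded pool of fixed-size byte[] buffers (illustration only). */
final class BoundedBufferPoolSketch {
  private final int bufferSize;
  private final int maxOutstanding;
  // Idle buffers kept for reuse; the queue capacity bounds retained memory.
  private final ArrayBlockingQueue<byte[]> free;
  // Buffers currently handed out to callers.
  private final AtomicInteger outstanding = new AtomicInteger();

  BoundedBufferPoolSketch(int bufferSize, int maxPooled, int maxOutstanding) {
    this.bufferSize = bufferSize;
    this.maxOutstanding = maxOutstanding;
    this.free = new ArrayBlockingQueue<>(maxPooled);
  }

  /** Returns a buffer, or null when the cap on outstanding buffers is reached. */
  byte[] tryAcquire() {
    // Reserve a slot with compare-and-set so the cap is never exceeded.
    while (true) {
      int current = outstanding.get();
      if (current >= maxOutstanding) {
        return null;
      }
      if (outstanding.compareAndSet(current, current + 1)) {
        break;
      }
    }
    byte[] pooled = free.poll();
    return pooled != null ? pooled : new byte[bufferSize];
  }

  /**
   * Gives a buffer back. Returns true if it was retained for reuse and false
   * if the free list was full, in which case the buffer is left to the GC.
   */
  boolean release(byte[] buffer) {
    if (buffer == null || buffer.length != bufferSize) {
      throw new IllegalArgumentException("Buffer does not belong to this pool");
    }
    outstanding.decrementAndGet();
    // offer() reports failure through its return value, so check it.
    return free.offer(buffer);
  }
}
```

A caller that gets `null` from `tryAcquire()` has to wait or retry; a blocking variant is equally possible and would be a design choice for the real pool.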

Driver test results using accounts in Central India
mvn -T 1C -Dparallel-tests=abfs -Dscale -DtestsThreadCount=8 clean verify

Account with HNS Support
[INFO] Tests run: 67, Failures: 0, Errors: 0, Skipped: 0
[WARNING] Tests run: 426, Failures: 0, Errors: 0, Skipped: 66
[WARNING] Tests run: 206, Failures: 0, Errors: 0, Skipped: 24

Account without HNS support
[INFO] Tests run: 67, Failures: 0, Errors: 0, Skipped: 0
[WARNING] Tests run: 426, Failures: 0, Errors: 0, Skipped: 240
[WARNING] Tests run: 206, Failures: 0, Errors: 0, Skipped: 24

@bilaharith bilaharith marked this pull request as draft May 12, 2020 18:52
@bilaharith bilaharith marked this pull request as ready for review May 13, 2020 03:53
@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 1m 35s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 2 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 22m 19s trunk passed
+1 💚 compile 0m 27s trunk passed
+1 💚 checkstyle 0m 22s trunk passed
+1 💚 mvnsite 0m 34s trunk passed
+1 💚 shadedclient 16m 29s branch has no errors when building and testing our client artifacts.
+1 💚 javadoc 0m 23s trunk passed
+0 🆗 spotbugs 0m 55s Used deprecated FindBugs config; considering switching to SpotBugs.
+1 💚 findbugs 0m 52s trunk passed
-0 ⚠️ patch 1m 9s Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 34s the patch passed
+1 💚 compile 0m 26s the patch passed
+1 💚 javac 0m 26s the patch passed
+1 💚 checkstyle 0m 15s the patch passed
+1 💚 mvnsite 0m 26s the patch passed
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 xml 0m 2s The patch has no ill-formed XML file.
+1 💚 shadedclient 15m 51s patch has no errors when building and testing our client artifacts.
+1 💚 javadoc 0m 21s the patch passed
-1 ❌ findbugs 1m 1s hadoop-tools/hadoop-azure generated 3 new + 0 unchanged - 0 fixed = 3 total (was 0)
_ Other Tests _
+1 💚 unit 120m 51s hadoop-azure in the patch passed.
+1 💚 asflicense 0m 39s The patch does not generate ASF License warnings.
185m 5s
Reason Tests
FindBugs module:hadoop-tools/hadoop-azure
Integral value cast to double and then passed to Math.ceil in org.apache.hadoop.fs.azurebfs.services.AbfsByteBufferPool.isPossibleToIssueNewBuffer() At AbfsByteBufferPool.java:[line 86]
Inconsistent synchronization of org.apache.hadoop.fs.azurebfs.services.AbfsByteBufferPool.numBuffersInUse; locked 81% of time Unsynchronized access at AbfsByteBufferPool.java:[line 88]
Exceptional return value of java.util.concurrent.ArrayBlockingQueue.offer(Object) ignored in org.apache.hadoop.fs.azurebfs.services.AbfsByteBufferPool.release(byte[]) At AbfsByteBufferPool.java:[line 144]
Subsystem Report/Notes
Docker ClientAPI=1.40 ServerAPI=1.40 base: https://builds.apache.org/job/hadoop-multibranch/job/PR-2014/2/artifact/out/Dockerfile
GITHUB PR #2014
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient xml findbugs checkstyle
uname Linux 3552590ba86c 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 08:06:28 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality personality/hadoop.sh
git revision trunk / 743c2e9
Default Java Private Build-1.8.0_252-8u252-b09-1~18.04-b09
findbugs https://builds.apache.org/job/hadoop-multibranch/job/PR-2014/2/artifact/out/new-findbugs-hadoop-tools_hadoop-azure.html
Test Results https://builds.apache.org/job/hadoop-multibranch/job/PR-2014/2/testReport/
Max. process+thread count 334 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-azure U: hadoop-tools/hadoop-azure
Console output https://builds.apache.org/job/hadoop-multibranch/job/PR-2014/2/console
versions git=2.17.1 maven=3.6.0 findbugs=3.1.0-RC1
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 22m 19s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 2 new or modified test files.
_ trunk Compile Tests _
+0 🆗 mvndep 0m 51s Maven dependency ordering for branch
+1 💚 mvninstall 18m 52s trunk passed
+1 💚 compile 17m 3s trunk passed
+1 💚 checkstyle 2m 44s trunk passed
+1 💚 mvnsite 1m 24s trunk passed
+1 💚 shadedclient 18m 33s branch has no errors when building and testing our client artifacts.
+1 💚 javadoc 1m 12s trunk passed
+0 🆗 spotbugs 1m 5s Used deprecated FindBugs config; considering switching to SpotBugs.
+0 🆗 findbugs 0m 36s branch/hadoop-project no findbugs output file (findbugsXml.xml)
-0 ⚠️ patch 1m 26s Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 29s Maven dependency ordering for patch
+1 💚 mvninstall 0m 42s the patch passed
-1 ❌ compile 15m 15s root in the patch failed.
-1 ❌ javac 15m 15s root in the patch failed.
+1 💚 checkstyle 2m 40s the patch passed
+1 💚 mvnsite 1m 23s the patch passed
-1 ❌ whitespace 0m 0s The patch has 4 line(s) that end in whitespace. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply
+1 💚 xml 0m 3s The patch has no ill-formed XML file.
+1 💚 shadedclient 14m 4s patch has no errors when building and testing our client artifacts.
+1 💚 javadoc 1m 14s the patch passed
+0 🆗 findbugs 0m 35s hadoop-project has no data from findbugs
_ Other Tests _
+1 💚 unit 0m 33s hadoop-project in the patch passed.
+1 💚 unit 120m 51s hadoop-azure in the patch passed.
+1 💚 asflicense 0m 53s The patch does not generate ASF License warnings.
244m 19s
Subsystem Report/Notes
Docker ClientAPI=1.40 ServerAPI=1.40 base: https://builds.apache.org/job/hadoop-multibranch/job/PR-2014/3/artifact/out/Dockerfile
GITHUB PR #2014
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient xml findbugs checkstyle
uname Linux a95c7e5558b3 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality personality/hadoop.sh
git revision trunk / 108ecf9
Default Java Private Build-1.8.0_252-8u252-b09-1~18.04-b09
compile https://builds.apache.org/job/hadoop-multibranch/job/PR-2014/3/artifact/out/patch-compile-root.txt
javac https://builds.apache.org/job/hadoop-multibranch/job/PR-2014/3/artifact/out/patch-compile-root.txt
whitespace https://builds.apache.org/job/hadoop-multibranch/job/PR-2014/3/artifact/out/whitespace-eol.txt
Test Results https://builds.apache.org/job/hadoop-multibranch/job/PR-2014/3/testReport/
Max. process+thread count 414 (vs. ulimit of 5500)
modules C: hadoop-project hadoop-tools/hadoop-azure U: .
Console output https://builds.apache.org/job/hadoop-multibranch/job/PR-2014/3/console
versions git=2.17.1 maven=3.6.0 findbugs=3.1.0-RC1
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@bilaharith
Contributor Author

Driver test results using accounts in Central India
mvn -T 1C -Dparallel-tests=abfs -Dscale -DtestsThreadCount=8 clean verify

Account with HNS Support
[INFO] Tests run: 71, Failures: 0, Errors: 0, Skipped: 0
[WARNING] Tests run: 434, Failures: 0, Errors: 0, Skipped: 74
[WARNING] Tests run: 206, Failures: 0, Errors: 0, Skipped: 24

Account without HNS support
[INFO] Tests run: 71, Failures: 0, Errors: 0, Skipped: 0
[WARNING] Tests run: 434, Failures: 0, Errors: 0, Skipped: 248
[WARNING] Tests run: 206, Failures: 0, Errors: 0, Skipped: 24

Contributor

@steveloughran steveloughran left a comment


Initial review; I have not looked at the tests yet.

@@ -172,6 +172,12 @@
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
</dependency>
<dependency>

  1. Can you add this to the hadoop-project POM and then refer to it here? That is how we guarantee consistent versions.
  2. Do we really need to add a new JAR to production just for annotations? If that is all it is for, maybe we could somehow avoid doing that.

Which annotations is it actually for? VisibleForTesting is in Guava.

import static org.apache.hadoop.fs.azurebfs.constants.FileSystemConfigurations.MIN_VALUE_MAX_AZURE_WRITE_MEM_USAGE_PERCENTAGE;

/**
* Pool for byte[]

Add a "." at the end, for Javadoc in Java 8.

}

private synchronized boolean isPossibleToIssueNewBuffer() {
Runtime rt = Runtime.getRuntime();

Could this be done outside a sync block? It probably does a JNI call.

private synchronized boolean isPossibleToIssueNewBuffer() {
Runtime rt = Runtime.getRuntime();
int bufferCountByMaxFreeBuffers =
maxBuffersToPool + rt.availableProcessors();

Store availableProcessors in the constructor.
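
A rough sketch of what the two suggestions above could look like together, assuming the limit is `maxBuffersToPool + availableProcessors` as in the quoted snippet. The class `BufferLimitSketch` and its methods `tryReserve()` and `unreserve()` are hypothetical names for illustration; the memory-based part of the real check is not shown in this thread and is left out.

```java
/** Hypothetical limit check with the processor count cached at construction. */
final class BufferLimitSketch {
  private final int maxBuffersToPool;
  // Read once in the constructor instead of on every call.
  private final int availableProcessors;
  // Only read or written while holding the lock on this object.
  private int numBuffersInUse;

  BufferLimitSketch(int maxBuffersToPool) {
    this(maxBuffersToPool, Runtime.getRuntime().availableProcessors());
  }

  // Second constructor so a test can pass a fixed processor count.
  BufferLimitSketch(int maxBuffersToPool, int availableProcessors) {
    this.maxBuffersToPool = maxBuffersToPool;
    this.availableProcessors = availableProcessors;
  }

  int maxBufferCount() {
    return maxBuffersToPool + availableProcessors;
  }

  /** Reserves one buffer slot if the limit allows it. */
  boolean tryReserve() {
    // Computed from final fields, so no lock is needed for this part.
    int limit = maxBufferCount();
    synchronized (this) {
      if (numBuffersInUse >= limit) {
        return false;
      }
      numBuffersInUse++;
      return true;
    }
  }

  /** Frees one previously reserved slot. */
  synchronized void unreserve() {
    if (numBuffersInUse > 0) {
      numBuffersInUse--;
    }
  }
}
```

Keeping every access to `numBuffersInUse` under the same lock would also address the kind of inconsistent-synchronization warning FindBugs reported for that field.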


initWriteBufferPool(abfsOutputStreamContext);

ThreadFactory daemonThreadFactory = new ThreadFactory() {

There's some Hadoop thread factory you should be able to lift. Also: name the threads.
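
For reference, a JDK-only sketch of a thread factory that creates named daemon threads. The class name `NamedDaemonThreadFactory` and the prefix used in the usage note are made up for this example; it is not the Hadoop factory the comment refers to. Guava's `ThreadFactoryBuilder` (with `setNameFormat` and `setDaemon`) achieves the same result in one expression.

```java
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical factory producing daemon threads named "<prefix>-<n>". */
final class NamedDaemonThreadFactory implements ThreadFactory {
  private final String prefix;
  private final AtomicInteger counter = new AtomicInteger();

  NamedDaemonThreadFactory(String prefix) {
    this.prefix = prefix;
  }

  @Override
  public Thread newThread(Runnable task) {
    // A stable, numbered name makes the threads easy to spot in thread dumps.
    Thread thread = new Thread(task, prefix + "-" + counter.getAndIncrement());
    // Daemon threads do not keep the JVM alive on shutdown.
    thread.setDaemon(true);
    return thread;
  }
}
```

It could be passed to an executor, for example `Executors.newFixedThreadPool(4, new NamedDaemonThreadFactory("abfs-write"))`, where the prefix is only an illustration.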

return null;
}));
}
for (Future<Void> futureTask : futureTasks) {

See if you can use org.apache.hadoop.fs.impl.FutureIOSupport here. Also, somewhere there's a method to block waiting for futures to complete without doing it sequentially; I believe it is faster.
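
One JDK-only way to wait for a batch of tasks as a group rather than calling `get()` on each future in turn is `CompletableFuture.allOf`. The sketch below is an assumption about what such a helper might look like; it is not the Hadoop method mentioned above, and `AwaitAllSketch.runAll` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;

/** Hypothetical helper that submits tasks and waits for all of them at once. */
final class AwaitAllSketch {
  private AwaitAllSketch() {
  }

  /**
   * Runs every task on the given executor and blocks until all have finished.
   * If any task throws, join() rethrows it wrapped in a CompletionException.
   */
  static void runAll(List<Runnable> tasks, Executor executor) {
    List<CompletableFuture<Void>> futures = new ArrayList<>();
    for (Runnable task : tasks) {
      futures.add(CompletableFuture.runAsync(task, executor));
    }
    // allOf completes only when every future in the array has completed.
    CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
  }
}
```

Whether this is measurably faster than a sequential loop depends on the workload; its main benefit is a single wait point and a single place to handle failures.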

@bilaharith bilaharith closed this Aug 17, 2022
3 participants