MAPREDUCE-7431. ShuffleHandler refactor and fix after Netty4 upgrade. #5311
Conversation
Change-Id: Ifedad2fae1ddd8f22623b5d44875b20a3b3fd318
Change-Id: Iaf37e847d689c6b46a5d819f3397865e1d2c05d7
💔 -1 overall
This message was automatically generated.
💔 -1 overall
Change-Id: I1d60976cc8113fc9c5f736de6869f40e93b5063a
💔 -1 overall
Change-Id: I3f17d76b3629a67ec8846327fed30fc927b10ec7
🎊 +1 overall
boolean keepAliveParam = false;
if (keepAliveList != null && keepAliveList.size() == 1) {
  keepAliveParam = Boolean.parseBoolean(keepAliveList.get(0));
  if (LOG.isDebugEnabled()) {
Why is this if required here?
I guess it was used like this to perform better when DEBUG log is not activated (the log parameters are not evaluated). This is just copy-pasted code from the original solution, no change here.
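To illustrate the point above, here is a minimal, self-contained sketch of why the `isDebugEnabled()` guard can matter. The `DebugGuardDemo` class, its stub `debug()` method, and `expensiveDescription()` are hypothetical stand-ins (not the actual Hadoop/SLF4J classes) that make the argument-evaluation cost observable:

```java
// Sketch: without a guard, the log argument expression is evaluated even
// when DEBUG is off; with the guard, it is skipped entirely.
public class DebugGuardDemo {
  static boolean debugEnabled = false;
  static int expensiveCalls = 0;

  // Stands in for a costly string construction passed to the logger.
  static String expensiveDescription() {
    expensiveCalls++;
    return "mapId=attempt_0_0_m_0_0";
  }

  // Stub logger: only prints when DEBUG is active.
  static void debug(String msg) {
    if (debugEnabled) {
      System.out.println("DEBUG " + msg);
    }
  }

  public static void main(String[] args) {
    // Unguarded call: the argument string is still built.
    debug("shuffle for " + expensiveDescription());
    // Guarded call (the pattern in the snippet above): argument skipped.
    if (debugEnabled) {
      debug("shuffle for " + expensiveDescription());
    }
    System.out.println("expensive calls: " + expensiveCalls); // prints 1
  }
}
```

Note that SLF4J-style parameterized logging (`LOG.debug("... {}", arg)`) avoids the string concatenation but still evaluates `arg` itself, so the guard is the only way to skip a genuinely expensive argument expression.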
}
if (mapIds == null || reduceQ == null || jobQ == null) {
  sendError(ctx, "Required param job, map and reduce", BAD_REQUEST);
Maybe an error log could be added here stating which parameter was null?
No change here from my side.
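For reference, the reviewer's suggestion could look something like the sketch below. `MissingParamsDemo` and its `missingParams()` helper are hypothetical illustrations (not code from the patch) that build a message naming exactly which of the `job`/`map`/`reduce` query parameters was missing:

```java
// Sketch: report which required shuffle query parameters were absent,
// instead of the generic "Required param job, map and reduce" message.
public class MissingParamsDemo {
  // Returns a space-separated list of the missing parameter names.
  static String missingParams(Object mapIds, Object reduceQ, Object jobQ) {
    StringBuilder missing = new StringBuilder();
    if (jobQ == null) {
      missing.append("job ");
    }
    if (mapIds == null) {
      missing.append("map ");
    }
    if (reduceQ == null) {
      missing.append("reduce ");
    }
    return missing.toString().trim();
  }

  public static void main(String[] args) {
    // Example: only 'reduce' was present in the request.
    System.out.println("Required param missing: "
        + missingParams(null, "0", null)); // prints "Required param missing: job map"
  }
}
```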
    new ShuffleHeader(mapId, info.partLength, info.rawLength, reduce)));
final File spillFile =
    new File(mapOutputInfo.mapOutputFileName.toString());
RandomAccessFile spill = SecureIOUtils.openForRandomRead(spillFile, "r", user, null);
Shouldn't this be closed?
The FadvisedFileRegion/FadvisedChunkedFile closes the file.
(The openForRandomRead() call could be moved into each Fadvised* constructor to make it a bit harder to misuse, but the current code can't throw an exception between openForRandomRead() and handing the file to the Fadvised* constructors.)
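The ownership transfer being discussed can be sketched as follows. `Resource` and `Wrapper` are hypothetical stubs standing in for `RandomAccessFile` and `FadvisedFileRegion`/`FadvisedChunkedFile`: once the resource is handed to the wrapper's constructor, the wrapper (not the caller) is responsible for closing it:

```java
// Sketch of constructor-takes-ownership: the wrapper closes the resource,
// so the caller must not close it separately (which would double-close).
public class OwnershipDemo {
  // Stub standing in for RandomAccessFile.
  static class Resource implements AutoCloseable {
    boolean closed = false;
    @Override
    public void close() {
      closed = true;
    }
  }

  // Stub standing in for FadvisedFileRegion/FadvisedChunkedFile:
  // takes ownership of the resource in its constructor.
  static class Wrapper implements AutoCloseable {
    private final Resource resource;
    Wrapper(Resource resource) {
      this.resource = resource;
    }
    @Override
    public void close() {
      resource.close(); // ownership: wrapper releases the underlying resource
    }
  }

  public static void main(String[] args) {
    Resource file = new Resource();
    try (Wrapper region = new Wrapper(file)) {
      // use region; do not close 'file' directly, the wrapper owns it
    }
    System.out.println("closed: " + file.closed); // prints "closed: true"
  }
}
```

Moving the open into the wrapper's constructor, as the comment suggests, shrinks the window in which an exception could leak the open file, at the cost of the constructor doing I/O.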
log4j.logger.io.netty=TRACE
I think this can stay as INFO, to reduce the build test times
It doesn't log that much at all.
+1, non-binding
Thanks @tomicooler, the patch looks good to me, merged to trunk. Thanks @K0K0V0K for the review.
The test is failing now, though it passes locally for me. Anyone planning to fix it, or should we revert this?
Thank you Tamas for volunteering!!!
…tty4 upgrade. (apache#5311) This commit does not contain secrets Change-Id: Ic693413cba6aed4cea3b645c4b65c559450ec40d
…ter Netty4 upgrade. (apache#5311) (cherry picked from commit 151b71d) Change-Id: I5575dbebaf379aa908312ae0b9cda3f81a694738
This list captures the current state of non-upstream changes in our branch that are not in the public repo.

---Changes cherry-picked to branch-3.3.6-dremio from branch-3.3.2-dremio---

The below changes were on branch-3.3.2-dremio and needed to be brought to branch-3.3.6-dremio to prevent regressing the scenarios these changes addressed.

- HADOOP-18928: S3AFileSystem URL encodes twice where Path has trailing / (proposed)
- DX-69726: Bumping okio from 1.6.0 to 3.4.0 (CVE-2023-3635)
- DX-66470: Allow for custom shared key signer for ABFS
- DX-66673: Backport HADOOP-18602. Remove netty3 dependency
- DX-66673: Backport MAPREDUCE-7434. Fix ShuffleHandler tests. Contributed by Tamas Domok
- DX-66673: Backport MAPREDUCE-7431. ShuffleHandler refactor and fix after Netty4 upgrade. (apache#5311)
- DX-66673: Backport HADOOP-15327. Upgrade MR ShuffleHandler to use Netty4 apache#3259. Contributed by Szilard Nemeth.
- DX-66673: Backport HADOOP-17115. Replace Guava Sets usage by Hadoop's own Sets in hadoop-common and hadoop-tools (apache#2985)
- HADOOP-18676. jettison dependency override in hadoop-common lib
- DX-52816: Downgrade azure-data-lake-store-sdk to 2.3.3 to support dremio version.
- DX-52701: Remove node based module by Naveen Kumar
- DX-32012: Adding BatchList Iterator for ListFiles by "ajmeera.nagaraju"
- DX-18552: Make file status check optional in S3AFileSystem create()
- Add flag to skip native tests by Laurent Goujon
- DX-21904: Support S3 requester-pays headers by Brandon Huang
- DX-21471: Fix checking of use of OAuth credentials with AzureNativeFileSystem
- DX-19314: make new kms format configurable
- DX-17058: Add FileSystem to META-INF/services
- DX-17317: Fix incorrect parameter passed into AzureADAuthenticator-getTokenUsingClientCreds by TiffanyLam
- DX-17276: Azure AD support for StorageV1 by James Duong
- DX-17276: Add Azure AD support in Dremio's hadoop-azure library for Storage V1 support
- Unwraps BindException in HttpServer2

---Changes picked up by moving to 3.3.6---

The below changes were on branch-3.3.2-dremio and did not need to come to branch-3.3.6-dremio, as the public 3.3.6 branch contained the fixes already.

- DX-67500: Backport HADOOP-18136. Verify FileUtils.unTar() handling of missing .tar files.
- DX-66673: Backport HADOOP-18079. Upgrade Netty to 4.1.77. (apache#3977)
- DX-66673: Backport HADOOP-11245. Update NFS gateway to use Netty4 (apache#2832) (apache#4997)
- DX-64051: Bump jettison from 1.1 to 1.5.4 in hadoop/branch-3.3.2-dremio
- DX-63800: Bump commons-net from 3.6 to 3.9.0 to address CVE-2021-37533
- DX-27168: removing org.codehaus.jackson

Change-Id: I6cdb968e33826105caff96e1c3d2c6313a550689
Change-Id: Ifedad2fae1ddd8f22623b5d44875b20a3b3fd318
Description of PR
Key takeaways:
- ResourceLeakDetector.setLevel(PARANOID)
Check Jira for more details: https://issues.apache.org/jira/browse/MAPREDUCE-7431
How was this patch tested?
The new unit tests are better at validating the content of the response.
Generated a 10 GB input.
{code}
hdfs dfs -rm -r -skipTrash /tmp/sort_input
yarn jar hadoop-3.4.0-SNAPSHOT/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.4.0-SNAPSHOT.jar randomwriter "-Dmapreduce.randomwriter.totalbytes=10000000000" /tmp/sort_input
{code}
Then sorted it, tested with -r 1 and -r 40 too.
{code}
hdfs dfs -rm -r -skipTrash /tmp/sort_output
yarn jar hadoop-3.4.0-SNAPSHOT/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.4.0-SNAPSHOT.jar sort -Dmapreduce.job.reduce.slowstart.completedmaps=1 -r 40 /tmp/sort_input /tmp/sort_output | tee sort_app_output.txt
{code}
Tested with SSL enabled/disabled too.
I saw no performance degradation during my measurements (compared to the latest netty4 version).
For code changes:
Does the title of this PR start with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
I added ch.qos.logback as a test dependency only. LGPL.
If applicable, have you updated the LICENSE, LICENSE-binary, NOTICE-binary files?