[SPARK-37847][CORE][SHUFFLE] PushBlockStreamCallback#isStale should check null to avoid NPE #35146
Can one of the admins verify this patch?
Given that We also need to ensure we have tests which check for stale but not too late as well - this specific codepath is an excellent test case for it.
I'm not sure if such a change will break the semantics of
There are no issues with making the change, please go ahead.
.../network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java
…first to avoid NPE" This reverts commit 30be05b.
LGTM, but this comment is not yet addressed, right?
LGTM. @venkata91 I think that comment has been reverted.
Just a minor nit. Otherwise looks good to me
Merged to master.
…heck null to avoid NPE

### What changes were proposed in this pull request?

Check `null` in `isStale` to avoid NPE.

### Why are the changes needed?

There is a chance that a late push shuffle block request invokes `PushBlockStreamCallback#onData` after the merged partition is finalized, which causes an NPE.

```
2022-01-07 21:06:14,464 INFO shuffle.RemoteBlockPushResolver: shuffle partition application_1640143179334_0149_-1 102 6922, chunk_size=1, meta_length=138, data_length=112632
2022-01-07 21:06:14,615 ERROR shuffle.RemoteBlockPushResolver: Encountered issue when merging shufflePush_102_0_279_6922
java.lang.NullPointerException
	at org.apache.spark.network.shuffle.RemoteBlockPushResolver$AppShuffleMergePartitionsInfo.access$200(RemoteBlockPushResolver.java:1017)
	at org.apache.spark.network.shuffle.RemoteBlockPushResolver$PushBlockStreamCallback.isStale(RemoteBlockPushResolver.java:806)
	at org.apache.spark.network.shuffle.RemoteBlockPushResolver$PushBlockStreamCallback.onData(RemoteBlockPushResolver.java:840)
	at org.apache.spark.network.server.TransportRequestHandler$3.onData(TransportRequestHandler.java:209)
	at org.apache.spark.network.client.StreamInterceptor.handle(StreamInterceptor.java:79)
	at org.apache.spark.network.util.TransportFrameDecoder.feedInterceptor(TransportFrameDecoder.java:263)
	at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:87)
	at org.sparkproject.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
	at org.sparkproject.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
	at org.sparkproject.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
	at org.sparkproject.io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
	at org.sparkproject.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
	at org.sparkproject.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
	at org.sparkproject.io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
	at org.sparkproject.io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
	at org.sparkproject.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:722)
	at org.sparkproject.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:658)
	at org.sparkproject.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:584)
	at org.sparkproject.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496)
	at org.sparkproject.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
	at org.sparkproject.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at org.sparkproject.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.lang.Thread.run(Thread.java:748)
```

`isTooLate` checks null but `isStale` does not, so check `isTooLate` first to avoid NPE

```java
private boolean isTooLate(
    AppShuffleMergePartitionsInfo appShuffleMergePartitionsInfo,
    int reduceId) {
  return null == appShuffleMergePartitionsInfo ||
    INDETERMINATE_SHUFFLE_FINALIZED == appShuffleMergePartitionsInfo.shuffleMergePartitions ||
    !appShuffleMergePartitionsInfo.shuffleMergePartitions.containsKey(reduceId);
}
```

### Does this PR introduce _any_ user-facing change?

Bugfix, to avoid NPE in Yarn ESS.

### How was this patch tested?

I don't think it's easy to write a unit test for this issue based on the current code. Since it's a minor change, use existing UTs to ensure the change doesn't break the current functionality.

Closes apache#35146 from pan3793/SPARK-37847.

Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Mridul Muralidharan <mridul<at>gmail.com>
### What changes were proposed in this pull request?

Check `null` in `isStale` to avoid NPE.

### Why are the changes needed?

There is a chance that a late push shuffle block request invokes `PushBlockStreamCallback#onData` after the merged partition is finalized, which causes an NPE.

`isTooLate` checks null but `isStale` does not, so check `isTooLate` first to avoid NPE.

### Does this PR introduce any user-facing change?

Bugfix, to avoid NPE in Yarn ESS.

### How was this patch tested?

I don't think it's easy to write a unit test for this issue based on the current code. Since it's a minor change, use existing UTs to ensure the change doesn't break the current functionality.
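
For readers who want to see the shape of the problem in isolation, here is a minimal, self-contained sketch of a null-safe staleness check. It is not the code merged in this PR: `StaleCheckSketch`, `MergeInfo`, and `classify` are hypothetical stand-ins, and treating a missing entry as "not stale" (so that the too-late check reports it) is an assumption made only for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class StaleCheckSketch {

  /** Hypothetical stand-in for AppShuffleMergePartitionsInfo; only the fields used below. */
  static final class MergeInfo {
    final int shuffleMergeId;
    final Map<Integer, String> shuffleMergePartitions = new ConcurrentHashMap<>();

    MergeInfo(int shuffleMergeId) {
      this.shuffleMergeId = shuffleMergeId;
    }
  }

  /**
   * Null-safe staleness check. Dereferencing info without the null guard is
   * what throws when the entry has already been removed.
   * Assumption: a missing entry is reported as "not stale" here.
   */
  static boolean isStale(MergeInfo info, int shuffleMergeId) {
    return info != null && info.shuffleMergeId != shuffleMergeId;
  }

  /** Same null handling as the isTooLate quoted in the description, minus the sentinel check. */
  static boolean isTooLate(MergeInfo info, int reduceId) {
    return info == null || !info.shuffleMergePartitions.containsKey(reduceId);
  }

  /** Hypothetical helper showing how a late push could be classified without an NPE. */
  static String classify(MergeInfo info, int shuffleMergeId, int reduceId) {
    if (isStale(info, shuffleMergeId)) {
      return "stale";
    }
    if (isTooLate(info, reduceId)) {
      return "too late";
    }
    return "accept";
  }

  public static void main(String[] args) {
    MergeInfo info = new MergeInfo(0);
    info.shuffleMergePartitions.put(6922, "partition");

    System.out.println(classify(null, 0, 6922)); // entry already removed
    System.out.println(classify(info, 1, 6922)); // merge id mismatch
    System.out.println(classify(info, 0, 6922)); // normal push
  }
}
```

With the guard in place, the order in which `isStale` and `isTooLate` are called no longer matters for safety, and the "stale but not too late" case raised in review (matching entry present, different merge id) stays distinguishable from the "too late" case.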