
[Bug] MR client may lose data or throw an exception when rss.storage.type does not include MEMORY. #886

Closed
3 tasks done
zhengchenyu opened this issue May 16, 2023 · 3 comments · Fixed by #887

Comments

@zhengchenyu
Collaborator

Code of Conduct

Search before asking

  • I have searched in the issues and found no similar issues.

Describe the bug

1 Bug description

When rss.storage.type does not include MEMORY, client-mr may raise an exception like the one below:

2023-05-16 18:58:52,191 INFO mapreduce.Job: Task Id : attempt_1683514063269_3300_r_000025_0, Status : FAILED
Error: org.apache.uniffle.common.exception.RssException: Blocks read inconsistent: expected 13 blocks, actual 8 blocks
	at org.apache.uniffle.common.util.RssUtils.checkProcessedBlockIds(RssUtils.java:287)
	at org.apache.uniffle.client.impl.ShuffleReadClientImpl.checkProcessedBlockIds(ShuffleReadClientImpl.java:253)
	at org.apache.hadoop.mapreduce.task.reduce.RssFetcher.copyFromRssServer(RssFetcher.java:193)
	at org.apache.hadoop.mapreduce.task.reduce.RssFetcher.fetchAllRssBlocks(RssFetcher.java:133)
	at org.apache.hadoop.mapreduce.task.reduce.RssShuffle.run(RssShuffle.java:202)
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:377)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)

In fact, the problem first appeared in our internal version of the client-tez module. Below is the Tez error stack:

2023-05-16 18:35:18,591 INFO impl.ComposedClientReadHandler: Failed to read shuffle data caused by
org.apache.uniffle.common.exception.RssException: Can't get FileSystem for hdfs://devtest-ns-fed/uniffle-rss/appattempt_1684233307050_0001_000000/1/0-0
	at org.apache.uniffle.storage.handler.impl.HdfsClientReadHandler.init(HdfsClientReadHandler.java:113)
	at org.apache.uniffle.storage.handler.impl.HdfsClientReadHandler.readShuffleData(HdfsClientReadHandler.java:162)
	at org.apache.uniffle.storage.handler.impl.ComposedClientReadHandler.readShuffleData(ComposedClientReadHandler.java:101)
	at org.apache.uniffle.storage.handler.impl.ComposedClientReadHandler.readShuffleData(ComposedClientReadHandler.java:129)
	at org.apache.uniffle.client.impl.ShuffleReadClientImpl.read(ShuffleReadClientImpl.java:238)
	at org.apache.uniffle.client.impl.ShuffleReadClientImpl.readShuffleBlockData(ShuffleReadClientImpl.java:162)
	at org.apache.tez.runtime.library.common.shuffle.rss.RssFetcherOrderedGrouped.copyFromRssServer(RssFetcherOrderedGrouped.java:166)
	at org.apache.tez.runtime.library.common.shuffle.rss.RssFetcherOrderedGrouped.fetchAllRssBlocks(RssFetcherOrderedGrouped.java:151)
	at org.apache.tez.runtime.library.common.shuffle.rss.RssFetcherOrderedGrouped.callInternal(RssFetcherOrderedGrouped.java:296)
	at org.apache.tez.runtime.library.common.shuffle.rss.RssFetcherOrderedGrouped.callInternal(RssFetcherOrderedGrouped.java:29)
	at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
	at org.apache.uniffle.com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:131)
	at org.apache.uniffle.com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:74)
	at org.apache.uniffle.com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:82)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
2023-05-16 18:35:18,595 ERROR rss.RssShuffleScheduler: Summation: Fetcher failed with error
org.apache.uniffle.common.exception.RssFetchFailedException: Failed to read shuffle data from WARM handler
	at org.apache.uniffle.storage.handler.impl.ComposedClientReadHandler.readShuffleData(ComposedClientReadHandler.java:109)
	at org.apache.uniffle.storage.handler.impl.ComposedClientReadHandler.readShuffleData(ComposedClientReadHandler.java:129)
	at org.apache.uniffle.client.impl.ShuffleReadClientImpl.read(ShuffleReadClientImpl.java:238)
	at org.apache.uniffle.client.impl.ShuffleReadClientImpl.readShuffleBlockData(ShuffleReadClientImpl.java:162)
	at org.apache.tez.runtime.library.common.shuffle.rss.RssFetcherOrderedGrouped.copyFromRssServer(RssFetcherOrderedGrouped.java:166)
	at org.apache.tez.runtime.library.common.shuffle.rss.RssFetcherOrderedGrouped.fetchAllRssBlocks(RssFetcherOrderedGrouped.java:151)
	at org.apache.tez.runtime.library.common.shuffle.rss.RssFetcherOrderedGrouped.callInternal(RssFetcherOrderedGrouped.java:296)
	at org.apache.tez.runtime.library.common.shuffle.rss.RssFetcherOrderedGrouped.callInternal(RssFetcherOrderedGrouped.java:29)
	at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
	at org.apache.uniffle.com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:131)
	at org.apache.uniffle.com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:74)
	at org.apache.uniffle.com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:82)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.uniffle.common.exception.RssException: Can't get FileSystem for hdfs://devtest-ns-fed/uniffle-rss/appattempt_1684233307050_0001_000000/1/0-0
	at org.apache.uniffle.storage.handler.impl.HdfsClientReadHandler.init(HdfsClientReadHandler.java:113)
	at org.apache.uniffle.storage.handler.impl.HdfsClientReadHandler.readShuffleData(HdfsClientReadHandler.java:162)
	at org.apache.uniffle.storage.handler.impl.ComposedClientReadHandler.readShuffleData(ComposedClientReadHandler.java:101)
	... 14 more

In fact, the reproduction probability is very high in Tez local mode but low in MR-on-YARN mode. After I added a one-second sleep before shuffleWriteClient.sendShuffleData in SortWriteBufferManager, the reproduction probability became very high there as well.
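The delay-injection trick can be sketched as a minimal, hypothetical Java model (the class and method names below are illustrative, not Uniffle's real API): delaying the asynchronous send makes it essentially certain that a finish step running immediately afterwards observes an unflushed block.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of the reproduction trick: an injected sleep before the
// async send widens the race window between sending and finishing.
public class RaceReproSketch {
    // Returns {blocks seen at "finish" time, blocks after waiting for the send}.
    static int[] run(long sendDelayMillis) {
        AtomicInteger flushed = new AtomicInteger();
        // Simulate handing one block to an async sender with an injected delay.
        CompletableFuture<Void> send = CompletableFuture.runAsync(() -> {
            try { Thread.sleep(sendDelayMillis); } catch (InterruptedException ignored) {}
            flushed.incrementAndGet(); // the block finally reaches the shuffle server
        });
        int atFinishTime = flushed.get(); // "finishShuffle" called immediately
        send.join();                      // let the send complete for comparison
        return new int[] {atFinishTime, flushed.get()};
    }

    public static void main(String[] args) {
        int[] counts = run(1000); // injected 1-second sleep, as in the reproduction
        System.out.println("flushed at finish time: " + counts[0]
            + ", after waiting: " + counts[1]);
    }
}
```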

2 Reason

When the bug happens, the value of expect committed in the log below is a random value.

[INFO] 2023-05-16 19:07:20,436 Grpc-272 ShuffleTaskManager commitShuffle - Checking commit result for appId[appattempt_1683514060868_9741_000001], shuffleId[0], expect committed[390], remain[390]

Here we know that shuffleWriteClient.sendShuffleData runs in an async thread. When we call finishShuffle, sendShuffleData may not have happened yet, so some data is never flushed to the shuffle server.
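The race and the direction of the fix can be sketched in Java. This is a minimal, hypothetical model (the class and method names are illustrative, not Uniffle's actual API): the buggy finish commits immediately while sends are still in flight, whereas the fixed finish joins every pending send future first.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of the race between async sends and finishShuffle.
public class ShuffleFinishSketch {
    private final List<CompletableFuture<Void>> pendingSends = new CopyOnWriteArrayList<>();
    private final AtomicInteger committedBlocks = new AtomicInteger();

    // Each block is handed to an async sender, as sendShuffleData does.
    void sendShuffleDataAsync(int blockId) {
        pendingSends.add(CompletableFuture.runAsync(() -> {
            try { Thread.sleep(50); } catch (InterruptedException ignored) {} // slow network/flush
            committedBlocks.incrementAndGet();
        }));
    }

    // BUG: reports whatever happens to be committed; in-flight sends are missed.
    int finishShuffleBuggy() {
        return committedBlocks.get();
    }

    // FIX: wait for every queued send to complete before committing.
    int finishShuffleFixed() {
        CompletableFuture.allOf(pendingSends.toArray(new CompletableFuture[0])).join();
        return committedBlocks.get();
    }

    public static void main(String[] args) {
        ShuffleFinishSketch s = new ShuffleFinishSketch();
        for (int i = 0; i < 13; i++) s.sendShuffleDataAsync(i);
        System.out.println("fixed finish sees " + s.finishShuffleFixed() + " of 13 blocks");
    }
}
```

With the buggy finish, the committed count is whatever fraction of sends happened to complete, which matches the random "expect committed" value in the log above.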

Affects Version(s)

master

Uniffle Server Log Output

No response

Uniffle Engine Log Output

No response

Uniffle Server Configurations

No response

Uniffle Engine Configurations

No response

Additional context

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!
@jerqi
Contributor

jerqi commented May 16, 2023

We recommend using the MEMORY_LOCALFILE or MEMORY_LOCALFILE_HDFS storageType in your production environment, but it is also fine to fix this issue. Have you developed Uniffle on Tez? Some people at Huolala are focusing on it, and we have a meeting tomorrow to discuss the issues. Are you interested in it? We also have a WeChat group for communication.

@zhengchenyu
Collaborator Author

@jerqi I will use MEMORY_LOCALFILE in the production environment; LOCALFILE_HDFS is just my development environment.
Oh, I only just learned that others are focusing on client-tez. Because our Hadoop version is 3.2.1, which is incompatible with master, I worked on it privately before.
BTW, Tez is our primary Hive engine at BEIKE, with nearly 150K apps running per day. I am very interested in it.

@jerqi
Contributor

jerqi commented May 16, 2023


For Hadoop version 3.1, you can add an extra profile for it.

jerqi pushed a commit that referenced this issue May 19, 2023
…torage.type without MEMORY. (#887)

### What changes were proposed in this pull request?

Make sure finishShuffle is called only after all shuffle data has been sent.

### Why are the changes needed?

If the storage type does not include MEMORY, some data will never be flushed.

### How was this patch tested?

I tested in two modes:
* Tez local debug mode
* MR on YARN mode

Also added a new UT.

Co-authored-by: zhengchenyu001 <zhengchenyu001@ke.com>
jerqi added a commit that referenced this issue May 19, 2023
…en rss.storage.type without MEMORY. (#887)"

This reverts commit 4423b43.
xianjingfeng pushed a commit to xianjingfeng/incubator-uniffle that referenced this issue Jun 20, 2023
…ion when rss.storage.type without MEMORY. (apache#887)"

This reverts commit 4423b43.
zhengchenyu added a commit to zhengchenyu/incubator-uniffle that referenced this issue Jun 27, 2023
jerqi pushed a commit that referenced this issue Jun 27, 2023