[Problem] Inconsistent blocks when reading shuffle data #198

Description

@zuston

I found that some tasks in our Spark jobs throw an exception reporting an inconsistent block count. The stack trace is as follows:

22/09/03 15:29:21 ERROR Executor: Exception in task 330.0 in stage 9.0 (TID 59001)
org.apache.uniffle.common.exception.RssException: Blocks read inconsistent: expected 30000 blocks, actual 15636 blocks
	at org.apache.uniffle.client.impl.ShuffleReadClientImpl.checkProcessedBlockIds(ShuffleReadClientImpl.java:215)
	at org.apache.spark.shuffle.reader.RssShuffleDataIterator.hasNext(RssShuffleDataIterator.java:135)
	at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:31)

I didn't find any error or warning logs on the shuffle server that stored the corresponding partition data.

We don't set any replica config and use the MEMORY_LOCALFILE storageType directly. Could this exception be caused by a disk error?
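
For readers unfamiliar with the message, here is a rough sketch of what a check like this typically does: it compares the set of block IDs the reader expected against the set it actually processed, and fails if any are missing. This is only an illustration built on plain `java.util` sets. The class name `BlockCheckSketch`, the method signature, and the use of `IllegalStateException` are all assumptions for the example, not the actual Uniffle code, and the block IDs are made up to mirror the counts in the log above.

```java
import java.util.HashSet;
import java.util.Set;

public class BlockCheckSketch {

    // Hypothetical helper (not the Uniffle implementation): fails when some
    // expected block IDs were never processed by the reader.
    static void checkProcessedBlockIds(Set<Long> expected, Set<Long> processed) {
        // Keep only the expected IDs that were actually processed.
        Set<Long> matched = new HashSet<>(expected);
        matched.retainAll(processed);
        if (matched.size() != expected.size()) {
            throw new IllegalStateException(
                "Blocks read inconsistent: expected " + expected.size()
                    + " blocks, actual " + matched.size() + " blocks");
        }
    }

    public static void main(String[] args) {
        // Made-up IDs chosen only to reproduce the counts from the log.
        Set<Long> expected = new HashSet<>();
        for (long id = 0; id < 30000; id++) {
            expected.add(id);
        }
        Set<Long> processed = new HashSet<>();
        for (long id = 0; id < 15636; id++) {
            processed.add(id);
        }
        try {
            checkProcessedBlockIds(expected, processed);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Under this reading, the exception only says that fewer blocks were read than were reported as written. It does not by itself tell whether the missing blocks were lost on disk, never flushed, or filtered out on the read path.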