Skip to content

[CELEBORN-1657] Throw exception to client when exception occurred in reading sort shuffle files#2818

Closed
FMX wants to merge 2 commits into
apache:mainfrom
FMX:b1657
Closed

[CELEBORN-1657] Throw exception to client when exception occurred in reading sort shuffle files#2818
FMX wants to merge 2 commits into
apache:mainfrom
FMX:b1657

Conversation

@FMX
Copy link
Copy Markdown
Contributor

@FMX FMX commented Oct 17, 2024

What changes were proposed in this pull request?

To report errors in sorting shuffle files.

Why are the changes needed?

Report errors to client as soon as possible when a shuffle file is corrupted.

Does this PR introduce any user-facing change?

NO.

How was this patch tested?

Cluster test.

@FMX FMX marked this pull request as draft October 17, 2024 06:14
@FMX FMX marked this pull request as ready for review October 17, 2024 06:28
Copy link
Copy Markdown
Contributor

@RexXiong RexXiong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better add an ut test for this

logger.error(
"Sorting shuffle file for " + fileId + " " + originFilePath + " failed, detail: ", e);
sortExceptions
.computeIfAbsent(fileId, v -> JavaUtils.newConcurrentHashMap())
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fileid->shuffleKey

@RexXiong RexXiong changed the title [CELEBORN-1657] Fix IndexOutOfBoundsExceptions occurred in reading shuffle files [CELEBORN-1657] Throw exception to client when exception occurred in reading sort shuffle files Oct 17, 2024
@FMX FMX closed this Oct 17, 2024

if (sortExceptions.containsKey(shuffleKey)
&& sortExceptions.get(shuffleKey).containsKey(fileId)) {
throw sortExceptions.get(shuffleKey).get(fileId);
Copy link
Copy Markdown
Contributor

@mridulm mridulm Nov 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can there be a concurrent cleanup (which can result in NPE) here ?
If yes, we should try to make this atomic, something like:

  AtomicReference<IOException> sortExceptionRef = new AtomicReference<>(null);
  sortExceptions.compute(shuffleKey,(key,  current) -> {
    if (null != current) {
      sortExceptionRef.set(current.get(fileId);
    }
    return current;
  });
  if (null != sortExceptionRef.get()) throw sortExceptionRef.get();

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants