[CELEBORN-1567] Support throw FetchFailedException when Data corruption detected#2691
cxzl25 wants to merge 2 commits into apache:main
client/src/main/java/org/apache/celeborn/client/read/CelebornInputStream.java
mridulm left a comment
Looks good to me.
Btw, support for checksum/validation of data would be a good feature to add IMO. There were corner cases where this helped catch issues in Spark, instead of relying on compression/deserialization failing, which need not always happen.
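To illustrate the checksum idea raised above, here is a minimal sketch of validating a chunk on the read side before it reaches decompression. It uses the JDK's `java.util.zip.CRC32`; the class `ChunkChecksumSketch`, the exception `CorruptChunkException`, and the methods `checksum` and `verify` are hypothetical names for this example and are not part of Celeborn or Spark.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

public class ChunkChecksumSketch {

  /** Hypothetical exception type for this sketch; not a Celeborn or Spark class. */
  static class CorruptChunkException extends RuntimeException {
    CorruptChunkException(String message) {
      super(message);
    }
  }

  /** Computes the CRC32 of a chunk; a writer would store this value next to the data. */
  static long checksum(byte[] chunk) {
    CRC32 crc = new CRC32();
    crc.update(chunk, 0, chunk.length);
    return crc.getValue();
  }

  /** Reader side: recompute and compare before handing the bytes to decompression. */
  static byte[] verify(byte[] chunk, long expected) {
    long actual = checksum(chunk);
    if (actual != expected) {
      throw new CorruptChunkException(
          "checksum mismatch: expected " + expected + ", got " + actual);
    }
    return chunk;
  }

  public static void main(String[] args) {
    byte[] chunk = "shuffle-bytes".getBytes(StandardCharsets.UTF_8);
    long expected = checksum(chunk);
    verify(chunk, expected); // an intact chunk passes silently

    byte[] damaged = chunk.clone();
    damaged[0] ^= 0x01; // flip a single bit to simulate corruption
    try {
      verify(damaged, expected);
    } catch (CorruptChunkException e) {
      System.out.println("detected: " + e.getMessage());
    }
  }
}
```

The point of the explicit check is that corruption is reported deterministically, rather than only when a decompressor or deserializer happens to fail on the damaged bytes.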
I have a question: should we retry fetching another replication before throwing a FetchFailedException when the conf
This is not necessarily safe: the task may already have read part of the data, so it is safer to retry the whole task. This is how Spark handles it.
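The concern about partially read data can be shown with a small sketch. It assumes a reader that hands records to a consumer as it goes; a `null` entry stands in for a corrupt chunk. All names here (`PartialReadSketch`, `FetchFailure`, `readAll`, `runTaskWithRetry`) are hypothetical and do not reflect Celeborn's or Spark's actual classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PartialReadSketch {

  /** Hypothetical stand-in for a fetch failure; not Spark's FetchFailedException. */
  static class FetchFailure extends RuntimeException {
    FetchFailure(String message) {
      super(message);
    }
  }

  /** Streams records into 'consumed'; a null entry simulates a corrupt chunk. */
  static void readAll(List<String> chunks, List<String> consumed) {
    for (int i = 0; i < chunks.size(); i++) {
      String record = chunks.get(i);
      if (record == null) {
        // Records before index i have already been handed to the consumer.
        throw new FetchFailure("corrupt chunk at index " + i);
      }
      consumed.add(record);
    }
  }

  /** Task-level retry: each attempt starts with empty output, so partial results are dropped. */
  static List<String> runTaskWithRetry(List<List<String>> attempts) {
    for (List<String> source : attempts) {
      List<String> consumed = new ArrayList<>();
      try {
        readAll(source, consumed);
        return consumed;
      } catch (FetchFailure e) {
        // Fall through: the partially filled 'consumed' list is discarded.
      }
    }
    throw new FetchFailure("all attempts failed");
  }

  public static void main(String[] args) {
    List<String> corrupt = Arrays.asList("a", null, "c");
    List<String> healthy = Arrays.asList("a", "b", "c");
    System.out.println(runTaskWithRetry(Arrays.asList(corrupt, healthy)));
  }
}
```

Switching to another replica in the middle of `readAll` would require knowing exactly which records the consumer has already seen; restarting the attempt avoids that bookkeeping at the cost of redoing the read.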
It looks like we've already done this.
Thank you, merging to main (v0.6.0), branch-0.5 (v0.5.2), and branch-0.4 (v0.4.3).
…on detected

### What changes were proposed in this pull request?

### Why are the changes needed?
#2655 (review)

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
GA

Closes #2691 from cxzl25/CELEBORN-1567.

Authored-by: sychen <sychen@ctrip.com>
Signed-off-by: Shaoyun Chen <csy@apache.org>
(cherry picked from commit b8f275d)
Signed-off-by: Shaoyun Chen <csy@apache.org>
What changes were proposed in this pull request?
Why are the changes needed?
#2655 (review)
Does this PR introduce any user-facing change?
No
How was this patch tested?
GA