[SPARK-30228][BUILD][2.4] Update zstd-jni to 1.4.4-3 #31645
cc @dongjoon-hyun FYI |
Can you also run `./dev/test-dependencies.sh --replace-manifest`?
Usually, we don't upgrade dependencies in released branches. But this is a correctness bug and I think we can make an exception here. |
ok to test |
OK |
cc @srowen as well |
Kubernetes integration test starting |
Yeah this one is fine. |
Kubernetes integration test status success |
Since this is marked as a correctness issue, do you think we can have a test case, @seayoun ?
Also, I'm wondering why 1.4.4-3? The target commit is in 1.3.5-x, too.
In the JIRA, I found the following, but the reason for the version choice is missing.
|
Test build #135466 has finished for PR 31645 at commit
|
It could be hard to reproduce zstd returning 0 bytes before EOF. If we could, we'd mostly be testing what the zstd tests already cover: the stream simply never returns 0 bytes before it is done. If it's pretty hard to test, I think it's OK to merge without a specific test. |
+1, LGTM.
### What changes were proposed in this pull request?
Change zstd-jni version to version 1.4.4-3

### Why are the changes needed?
The old zstd-jni (tag 1.3.3-2) can occasionally read less data than expected during shuffle reads.
The `ZstdInputStream` in zstd-jni (tag 1.3.3-2) may return 0 from a read call. This violates the `InputStream` contract, under which a read does not return 0 unless len is 0. Spark wraps `ZstdInputStream` in a `BufferedInputStream`; when the `ZstdInputStream` read call returns 0, `BufferedInputStream` treats the 0 as the end of the read and stops, which can lead to data loss.

zstd-jni issues: luben/zstd-jni#159
zstd-jni commits: luben/zstd-jni@7eec558

### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
Closes #31645 from seayoun/yuhaiyang_update_zstd_jni.

Authored-by: yuhaiyang <yuhaiyang@yuhaiyangs-MacBook-Pro.local>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
Merged to branch-2.4. |
Another issue for ZstdInputStream. It happens in 1.4.4-3 |
Thanks, @SeeyouN . You meant that another issue is fixed at 1.4.4-3, right? (Instead of
|
Change zstd-jni version to version 1.4.4-3

The old zstd-jni (tag 1.3.3-2) can occasionally read less data than expected during shuffle reads.
The `ZstdInputStream` in zstd-jni (tag 1.3.3-2) may return 0 from a read call. This violates the `InputStream` contract, under which a read does not return 0 unless len is 0. Spark wraps `ZstdInputStream` in a `BufferedInputStream`; when the `ZstdInputStream` read call returns 0, `BufferedInputStream` treats the 0 as the end of the read and stops, which can lead to data loss.

zstd-jni issues: luben/zstd-jni#159
zstd-jni commits: luben/zstd-jni@7eec558

Closes apache#31645 from seayoun/yuhaiyang_update_zstd_jni.

Authored-by: yuhaiyang <yuhaiyang@yuhaiyangs-MacBook-Pro.local>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(cherry picked from commit 96f5137)

RB=3332514 BUG=LIHADOOP-63952 G=spark-reviewers R=tgudivad,ekrogen A=ekrogen
What changes were proposed in this pull request?
Change zstd-jni version to version 1.4.4-3
Why are the changes needed?
The old zstd-jni (tag 1.3.3-2) can occasionally read less data than expected during shuffle reads.
The `ZstdInputStream` in zstd-jni (tag 1.3.3-2) may return 0 from a read call. This violates the `InputStream` contract, under which a read does not return 0 unless len is 0. Spark wraps `ZstdInputStream` in a `BufferedInputStream`; when the `ZstdInputStream` read call returns 0, `BufferedInputStream` treats the 0 as the end of the read and stops, which can lead to data loss.
zstd-jni issues: luben/zstd-jni#159
zstd-jni commits: luben/zstd-jni@7eec558
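The failure mode described above can be illustrated with a small, self-contained Java sketch. It does not use zstd-jni: `StallingStream` is a hypothetical stand-in that returns 0 once in the middle of its data, which is the behavior attributed to the old `ZstdInputStream`. The class names, the 8-byte payload, and the 4-byte buffer size are all assumptions chosen for illustration.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class ZeroReadDemo {

    /** Hypothetical stream (not zstd-jni) that returns 0 once, halfway through its data. */
    static class StallingStream extends InputStream {
        private final byte[] data;
        private int pos = 0;
        private boolean stalled = false;

        StallingStream(byte[] data) {
            this.data = data;
        }

        @Override
        public int read() {
            // Single-byte reads never stall in this sketch.
            return pos < data.length ? (data[pos++] & 0xff) : -1;
        }

        @Override
        public int read(byte[] b, int off, int len) {
            if (len == 0) return 0;             // the only legal case for returning 0
            if (pos >= data.length) return -1;  // real end of stream
            int half = data.length / 2;
            if (!stalled && pos >= half) {
                stalled = true;
                return 0;                       // contract violation: len > 0, not at EOF
            }
            // Serve bytes up to the halfway point first, then the rest after the stall.
            int limit = stalled ? data.length : half;
            int n = Math.min(len, limit - pos);
            System.arraycopy(data, pos, b, off, n);
            pos += n;
            return n;
        }
    }

    /** Counts bytes by calling read() until it reports end of stream (-1). */
    static int countBytes(InputStream in) {
        try {
            int count = 0;
            while (in.read() != -1) {
                count++;
            }
            return count;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = new byte[8];
        // Small buffer so the wrapper has to refill exactly at the stall point.
        InputStream buffered = new BufferedInputStream(new StallingStream(data), 4);
        // With OpenJDK's BufferedInputStream this prints: read 4 of 8 bytes
        System.out.println("read " + countBytes(buffered) + " of " + data.length + " bytes");
    }
}
```

When the wrapper's buffer is empty it refills from the underlying stream; a refill that yields 0 bytes leaves the buffer empty, so the wrapper reports end of stream even though half of the data is still unread. The consumer sees a clean EOF with no exception, which is why the loss is silent.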
Does this PR introduce any user-facing change?
How was this patch tested?