-
Notifications
You must be signed in to change notification settings - Fork 3.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[C++] Enabling CacheOptions::LazyDefault caused Parquet fuzzing failure #38071
Comments
…efault in Parquet reader
I've re-produce this problem on my Local PC, seems that this will not exists in the production env.
I think this could be fixed 🤔 And only DEBUG build will raise this request. And I think change to Default will not improve this 🤔 |
Thanks for taking a look!
Indeed, #38072 confirms this. The default CacheOptions show the same issue. Is the fix to make this an actual error, instead of only a debug check? (because it should still error properly when reading an invalid file?) |
https://github.com/apache/arrow/pull/38073/files I've a basic fixing, but I don't know if putting the check here is ok(maybe there're better place). Waiting for @pitrou review |
🤔 Maybe it's ok to change the - std::vector<ReadRange> Coalesce(std::vector<ReadRange> ranges)
+ Result<std::vector<ReadRange>> Coalesce(std::vector<ReadRange> ranges) But here it only affect the debug build, and not report this on release mode. So waiting for your advices... |
If it's possible for a file with invalid/overlapping ranges to make it this far, then yeah, we should make Coalesce return Result instead of asserting/doing something incorrect silently. |
…38073) ### Rationale for this change The C++ Parquet Arrow fuzz will generate bad Parquet file with bad row-range, this patch change the `CoalesceReadRanges` to return `Result<>`. ### What changes are included in this PR? Just a checking, change `CoalesceReadRanges` to return `Result<>`. ### Are these changes tested? No. ### Are there any user-facing changes? No. * Closes: #38071 Authored-by: mwish <maplewish117@gmail.com> Signed-off-by: Sutou Kouhei <kou@clear-code.com>
…fer (apache#38073) ### Rationale for this change The C++ Parquet Arrow fuzz will generate bad Parquet file with bad row-range, this patch change the `CoalesceReadRanges` to return `Result<>`. ### What changes are included in this PR? Just a checking, change `CoalesceReadRanges` to return `Result<>`. ### Are these changes tested? No. ### Are there any user-facing changes? No. * Closes: apache#38071 Authored-by: mwish <maplewish117@gmail.com> Signed-off-by: Sutou Kouhei <kou@clear-code.com>
…fer (apache#38073) ### Rationale for this change The C++ Parquet Arrow fuzz will generate bad Parquet file with bad row-range, this patch change the `CoalesceReadRanges` to return `Result<>`. ### What changes are included in this PR? Just a checking, change `CoalesceReadRanges` to return `Result<>`. ### Are these changes tested? No. ### Are there any user-facing changes? No. * Closes: apache#38071 Authored-by: mwish <maplewish117@gmail.com> Signed-off-by: Sutou Kouhei <kou@clear-code.com>
…fer (apache#38073) ### Rationale for this change The C++ Parquet Arrow fuzz will generate bad Parquet file with bad row-range, this patch change the `CoalesceReadRanges` to return `Result<>`. ### What changes are included in this PR? Just a checking, change `CoalesceReadRanges` to return `Result<>`. ### Are these changes tested? No. ### Are there any user-facing changes? No. * Closes: apache#38071 Authored-by: mwish <maplewish117@gmail.com> Signed-off-by: Sutou Kouhei <kou@clear-code.com>
#37854 introduced a failure in the "AMD64 Ubuntu 22.04 C++ ASAN UBSAN" build (https://github.com/apache/arrow/actions/runs/6430392691/job/17462667620?pr=38069#logs), related to the LazyCache coalesced reads. See details below.
I assume this is an existing bug, given that PR only changed a default for an option a user could already set before as well. But changing the default of course makes it more visible.
Potentially short term option is to only change
pre_buffer
and keep the current non-lazy defaultcache_options
(if that fixes it). Or revert the PR entirely until this is resolved (I don't have time today to look into more detail).Although the R bindings also already use the CacheOptions::LazyDefault by default for a while.
Originally posted by @jorisvandenbossche in #37854 (comment)
The text was updated successfully, but these errors were encountered: