-
Notifications
You must be signed in to change notification settings - Fork 3.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[C++][Parquet] Fuzz using several batch sizes #15193
Comments
Something like this could work: diff --git a/cpp/src/parquet/arrow/reader.cc b/cpp/src/parquet/arrow/reader.cc
index b57b5062a..717f49629 100644
--- a/cpp/src/parquet/arrow/reader.cc
+++ b/cpp/src/parquet/arrow/reader.cc
@@ -20,6 +20,7 @@
#include <algorithm>
#include <cstring>
#include <memory>
+#include <optional>
#include <unordered_set>
#include <utility>
#include <vector>
@@ -1385,13 +1386,21 @@ Status FuzzReader(std::unique_ptr<FileReader> reader) {
Status FuzzReader(const uint8_t* data, int64_t size) {
auto buffer = std::make_shared<::arrow::Buffer>(data, size);
- auto file = std::make_shared<::arrow::io::BufferReader>(buffer);
- FileReaderBuilder builder;
- RETURN_NOT_OK(builder.Open(std::move(file)));
+ auto st = Status::OK();
+
+ for (auto batch_size : std::vector<std::optional<int>>{std::nullopt, 1, 13, 300}) {
+ auto file = std::make_shared<::arrow::io::BufferReader>(buffer);
+ FileReaderBuilder builder;
+ RETURN_NOT_OK(builder.Open(std::move(file)));
- std::unique_ptr<FileReader> reader;
- RETURN_NOT_OK(builder.Build(&reader));
- return FuzzReader(std::move(reader));
+ std::unique_ptr<FileReader> reader;
+ RETURN_NOT_OK(builder.Build(&reader));
+ if (batch_size) {
+ reader->set_batch_size(*batch_size);
+ }
+ st &= FuzzReader(std::move(reader));
+ }
+ return st;
}
} // namespace internal |
I'd like to pickup this issue, but I need to learn fuzz first :) |
take |
pitrou
pushed a commit
that referenced
this issue
Feb 1, 2023
…#33942) ### Rationale for this change Currently, parquet FuzzReader will only read fixed batch size. This patch change it using some fixed batch size. ### What changes are included in this PR? Change parquet FuzzReader use some fixed batch size. ### Are these changes tested? These changes are already testing in parquet :) ### Are there any user-facing changes? Because `FuzzReader` is an EXPORT api, a Fuzzer user may facing "error" when reading. But I think it's ok * Closes: #15193 Authored-by: mwish <maplewish117@gmail.com> Signed-off-by: Antoine Pitrou <antoine@python.org>
sjperkins
pushed a commit
to sjperkins/arrow
that referenced
this issue
Feb 10, 2023
…h size (apache#33942) ### Rationale for this change Currently, parquet FuzzReader will only read fixed batch size. This patch change it using some fixed batch size. ### What changes are included in this PR? Change parquet FuzzReader use some fixed batch size. ### Are these changes tested? These changes are already testing in parquet :) ### Are there any user-facing changes? Because `FuzzReader` is an EXPORT api, a Fuzzer user may facing "error" when reading. But I think it's ok * Closes: apache#15193 Authored-by: mwish <maplewish117@gmail.com> Signed-off-by: Antoine Pitrou <antoine@python.org>
gringasalpastor
pushed a commit
to gringasalpastor/arrow
that referenced
this issue
Feb 17, 2023
…h size (apache#33942) ### Rationale for this change Currently, parquet FuzzReader will only read fixed batch size. This patch change it using some fixed batch size. ### What changes are included in this PR? Change parquet FuzzReader use some fixed batch size. ### Are these changes tested? These changes are already testing in parquet :) ### Are there any user-facing changes? Because `FuzzReader` is an EXPORT api, a Fuzzer user may facing "error" when reading. But I think it's ok * Closes: apache#15193 Authored-by: mwish <maplewish117@gmail.com> Signed-off-by: Antoine Pitrou <antoine@python.org>
fatemehp
pushed a commit
to fatemehp/arrow
that referenced
this issue
Feb 24, 2023
…h size (apache#33942) ### Rationale for this change Currently, parquet FuzzReader will only read fixed batch size. This patch change it using some fixed batch size. ### What changes are included in this PR? Change parquet FuzzReader use some fixed batch size. ### Are these changes tested? These changes are already testing in parquet :) ### Are there any user-facing changes? Because `FuzzReader` is an EXPORT api, a Fuzzer user may facing "error" when reading. But I think it's ok * Closes: apache#15193 Authored-by: mwish <maplewish117@gmail.com> Signed-off-by: Antoine Pitrou <antoine@python.org>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Describe the enhancement requested
Exercising several batch sizes in
parquet::arrow::internal::FuzzReader
may help find more reader/decoder bugs.Component(s)
C++
The text was updated successfully, but these errors were encountered: