Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[C++][Parquet] Fuzz using several batch sizes #15193

Closed
pitrou opened this issue Jan 4, 2023 · 3 comments · Fixed by #33942
Closed

[C++][Parquet] Fuzz using several batch sizes #15193

pitrou opened this issue Jan 4, 2023 · 3 comments · Fixed by #33942

Comments

@pitrou
Copy link
Member

pitrou commented Jan 4, 2023

Describe the enhancement requested

Exercising several batch sizes in parquet::arrow::internal::FuzzReader may help find more reader/decoder bugs.

Component(s)

C++

@pitrou
Copy link
Member Author

pitrou commented Jan 4, 2023

Something like this could work:

diff --git a/cpp/src/parquet/arrow/reader.cc b/cpp/src/parquet/arrow/reader.cc
index b57b5062a..717f49629 100644
--- a/cpp/src/parquet/arrow/reader.cc
+++ b/cpp/src/parquet/arrow/reader.cc
@@ -20,6 +20,7 @@
 #include <algorithm>
 #include <cstring>
 #include <memory>
+#include <optional>
 #include <unordered_set>
 #include <utility>
 #include <vector>
@@ -1385,13 +1386,21 @@ Status FuzzReader(std::unique_ptr<FileReader> reader) {
 
 Status FuzzReader(const uint8_t* data, int64_t size) {
   auto buffer = std::make_shared<::arrow::Buffer>(data, size);
-  auto file = std::make_shared<::arrow::io::BufferReader>(buffer);
-  FileReaderBuilder builder;
-  RETURN_NOT_OK(builder.Open(std::move(file)));
+  auto st = Status::OK();
+
+  for (auto batch_size : std::vector<std::optional<int>>{std::nullopt, 1, 13, 300}) {
+    auto file = std::make_shared<::arrow::io::BufferReader>(buffer);
+    FileReaderBuilder builder;
+    RETURN_NOT_OK(builder.Open(std::move(file)));
 
-  std::unique_ptr<FileReader> reader;
-  RETURN_NOT_OK(builder.Build(&reader));
-  return FuzzReader(std::move(reader));
+    std::unique_ptr<FileReader> reader;
+    RETURN_NOT_OK(builder.Build(&reader));
+    if (batch_size) {
+      reader->set_batch_size(*batch_size);
+    }
+    st &= FuzzReader(std::move(reader));
+  }
+  return st;
 }
 
 }  // namespace internal

@mapleFU
Copy link
Member

mapleFU commented Jan 17, 2023

I'd like to pickup this issue, but I need to learn fuzz first :)

@mapleFU
Copy link
Member

mapleFU commented Jan 31, 2023

take

pitrou pushed a commit that referenced this issue Feb 1, 2023
…#33942)

### Rationale for this change

Currently, parquet FuzzReader will only read fixed batch size. This patch change it using some fixed batch size.

### What changes are included in this PR?

Change parquet FuzzReader use some fixed batch size.

### Are these changes tested?

These changes are already testing in parquet :)

### Are there any user-facing changes?

Because `FuzzReader` is an EXPORT api, a Fuzzer user may facing "error" when reading. But I think it's ok

* Closes: #15193

Authored-by: mwish <maplewish117@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
@pitrou pitrou added this to the 12.0.0 milestone Feb 1, 2023
sjperkins pushed a commit to sjperkins/arrow that referenced this issue Feb 10, 2023
…h size (apache#33942)

### Rationale for this change

Currently, parquet FuzzReader will only read fixed batch size. This patch change it using some fixed batch size.

### What changes are included in this PR?

Change parquet FuzzReader use some fixed batch size.

### Are these changes tested?

These changes are already testing in parquet :)

### Are there any user-facing changes?

Because `FuzzReader` is an EXPORT api, a Fuzzer user may facing "error" when reading. But I think it's ok

* Closes: apache#15193

Authored-by: mwish <maplewish117@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
gringasalpastor pushed a commit to gringasalpastor/arrow that referenced this issue Feb 17, 2023
…h size (apache#33942)

### Rationale for this change

Currently, parquet FuzzReader will only read fixed batch size. This patch change it using some fixed batch size.

### What changes are included in this PR?

Change parquet FuzzReader use some fixed batch size.

### Are these changes tested?

These changes are already testing in parquet :)

### Are there any user-facing changes?

Because `FuzzReader` is an EXPORT api, a Fuzzer user may facing "error" when reading. But I think it's ok

* Closes: apache#15193

Authored-by: mwish <maplewish117@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
fatemehp pushed a commit to fatemehp/arrow that referenced this issue Feb 24, 2023
…h size (apache#33942)

### Rationale for this change

Currently, parquet FuzzReader will only read fixed batch size. This patch change it using some fixed batch size.

### What changes are included in this PR?

Change parquet FuzzReader use some fixed batch size.

### Are these changes tested?

These changes are already testing in parquet :)

### Are there any user-facing changes?

Because `FuzzReader` is an EXPORT api, a Fuzzer user may facing "error" when reading. But I think it's ok

* Closes: apache#15193

Authored-by: mwish <maplewish117@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants