Add integration tests #61
Codecov Report
✅ All modified and coverable lines are covered by tests.
| Metric | main | #61 |
|---|---|---|
| Coverage | ? | 82.31% |
| Files | ? | 35 |
| Lines | ? | 1742 |
| Branches | ? | 0 |
| Hits | ? | 1434 |
| Misses | ? | 308 |
| Partials | ? | 0 |
265a553 to 02539e5
Pull request overview
This PR adds comprehensive integration tests for Arrow IPC file format serialization and deserialization, including tools for converting between JSON, stream, and file formats. The implementation properly tracks record batch blocks in the footer to enable random access to batches within Arrow files.
Key Changes
- Implements footer block tracking with offset, metadata_length, and body_length for each record batch
- Adds nullable flag handling in deserialization to properly preserve schema information
- Creates integration testing tools compatible with Apache Arrow's Archery framework
- Adds Docker-based integration testing workflow
Reviewed changes
Copilot reviewed 31 out of 31 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| include/sparrow_ipc/serialize.hpp | Adds serialized_record_batch_info struct and updates serialize_record_batch to return block metadata |
| src/serialize.cpp | Implements block info calculation with proper 8-byte alignment per Arrow spec |
| include/sparrow_ipc/stream_file_serializer.hpp | Adds record_batch_block struct and tracking vector for footer generation |
| src/stream_file_serializer.cpp | Updates write_footer to populate footer with tracked record batch blocks |
| src/deserialize.cpp | Adds nullable flag parameter to deserialization functions |
| include/sparrow_ipc/deserialize_*.hpp | Updates function signatures to accept nullable parameter |
| src/deserialize_fixedsizebinary_array.cpp | Implements nullable flag handling in schema construction |
| integration_tests/* | New integration test infrastructure with JSON/stream/file conversion tools |
| tests/test_stream_file_serializer.cpp | Adds extensive footer validation tests |
| .github/workflows/integration_tests.yaml | New CI workflow for Docker-based integration testing |
| CMakeLists.txt | Adds SPARROW_IPC_BUILD_INTEGRATION_TESTS option and output directory configuration |
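The block bookkeeping summarized in the overview can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `block_info`, `align_to_8`, and `block_tracker` are hypothetical names, and the sketch assumes that each encapsulated message starts with an 8-byte prefix (continuation marker plus metadata size) and that both the metadata and the body are padded to a multiple of 8 bytes.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical mirror of the three fields the footer stores per record batch.
struct block_info
{
    std::int64_t offset;           // Position of the message in the file
    std::int32_t metadata_length;  // Prefix + flatbuffer metadata + padding
    std::int64_t body_length;      // Body buffers, including padding
};

// Round n up to the next multiple of 8.
constexpr std::int64_t align_to_8(std::int64_t n)
{
    return (n + 7) & ~std::int64_t{7};
}

// Records one block per written record batch (illustrative only).
class block_tracker
{
public:

    explicit block_tracker(std::int64_t start_offset)
        : m_position(start_offset)
    {
    }

    // raw_metadata_size: unpadded flatbuffer size; raw_body_size: unpadded body size.
    // Returns a reference that is only valid until the next call to append.
    const block_info& append(std::int32_t raw_metadata_size, std::int64_t raw_body_size)
    {
        constexpr std::int64_t prefix_size = 8;  // Assumed: continuation marker + size field
        const auto metadata_length = static_cast<std::int32_t>(align_to_8(prefix_size + raw_metadata_size));
        const std::int64_t body_length = align_to_8(raw_body_size);
        m_blocks.push_back(block_info{m_position, metadata_length, body_length});
        m_position += metadata_length + body_length;
        return m_blocks.back();
    }

    const std::vector<block_info>& blocks() const
    {
        return m_blocks;
    }

private:

    std::int64_t m_position;
    std::vector<block_info> m_blocks;
};
```

Under these assumptions, a tracker starting at offset 8 that appends a batch with 100 bytes of metadata and 20 bytes of body would record offset 8, metadata_length 112, and body_length 24, and the next block would start at offset 144.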
Pull request overview
Copilot reviewed 31 out of 31 changed files in this pull request and generated 3 comments.
Pull request overview
Copilot reviewed 31 out of 31 changed files in this pull request and generated 1 comment.
const org::apache::arrow::flatbuf::Footer* get_footer_from_file_data(const std::vector<uint8_t>& file_data)
{
    // Footer size is stored 4 bytes before the trailing magic
    const size_t footer_size_offset = file_data.size() - sparrow_ipc::arrow_file_magic_size - sizeof(int32_t);
    int32_t footer_size = 0;
    std::memcpy(&footer_size, file_data.data() + footer_size_offset, sizeof(int32_t));

    // Footer data starts at footer_size_offset - footer_size
    const size_t footer_offset = footer_size_offset - footer_size;

    return org::apache::arrow::flatbuf::GetFooter(file_data.data() + footer_offset);
}
Copilot AI, Dec 1, 2025
This helper function get_footer_from_file_data is duplicated in both tests/test_stream_file_serializer.cpp (line 270) and integration_tests/test_integration_tools.cpp (line 24). Consider extracting this into a shared test utility header (e.g., tests/include/test_helpers.hpp) to avoid code duplication and make maintenance easier.
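A shared helper along the lines suggested here might look like the sketch below. It is not the code from this PR: the `test_helpers` namespace, `footer_location`, and `locate_footer` are hypothetical names, and the local `magic_size` constant stands in for `sparrow_ipc::arrow_file_magic_size` (assumed to be the 6 bytes of "ARROW1"). To stay self-contained, the sketch only locates the footer bytes and leaves the flatbuffer parsing to the caller; it also adds bounds checks that the quoted helper does not perform.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <vector>

namespace test_helpers  // Hypothetical namespace for shared test utilities
{
    // Assumed length of the trailing "ARROW1" magic.
    constexpr std::size_t magic_size = 6;

    struct footer_location
    {
        std::size_t offset;  // Index of the first footer byte
        std::size_t size;    // Footer length in bytes
    };

    // Returns where the footer sits in the file, or nullopt when the
    // trailer is missing or the stored size is inconsistent.
    inline std::optional<footer_location> locate_footer(const std::vector<std::uint8_t>& file_data)
    {
        constexpr std::size_t trailer_size = magic_size + sizeof(std::int32_t);
        if (file_data.size() < trailer_size)
        {
            return std::nullopt;
        }

        // The footer size is stored just before the trailing magic.
        // Like the quoted helper, this assumes a little-endian host.
        const std::size_t size_offset = file_data.size() - trailer_size;
        std::int32_t footer_size = 0;
        std::memcpy(&footer_size, file_data.data() + size_offset, sizeof(footer_size));

        if (footer_size <= 0 || static_cast<std::size_t>(footer_size) > size_offset)
        {
            return std::nullopt;
        }

        const auto size = static_cast<std::size_t>(footer_size);
        return footer_location{size_offset - size, size};
    }
}
```

A caller could then pass `file_data.data() + location->offset` to `GetFooter`, as the quoted helper does, after checking that a location was returned.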
87b0774 to 204eb30
.github/copilot-instructions.md (outdated)
1. Schema mismatches in stream → `std::invalid_argument`
2. Deallocating source buffer while arrays in use → undefined behavior
3. Missing RPATH on Linux → runtime linking errors
4. Only LZ4 compression supported (not ZSTD yet)
Both are supported now.
fixed
- name: Install specific version of tzdata
  run: sudo apt-get install tzdata
I don't think we need this?
Removed
- name: List all folders and subfolders
  run: |
    echo "Listing all folders and subfolders:"
    find . -type d
Is this a leftover from debugging?
Yes, I will delete that
# Clone the arrow monorepo // TODO: change to the official repo
RUN git clone --depth 1 --branch archery_supports_external_libraries https://github.com/Alex-PLACET/arrow.git /arrow-integration --recurse-submodules
Is there a reason why we're not using the official repo already?
Yes, the official repo doesn't support testing external libraries.
@@ -1,18 +1,22 @@
#include "sparrow_ipc/serialize.hpp"
Project headers should be included after the C++ standard ones (i.e. <cstdint> and <optional>).
That is not what the current formatting rules specify.
 *
 * Each block describes the location and size of a record batch in the file.
 */
struct record_batch_block
The name record_batch_block isn't very explicit about what it represents; maybe rename it to clarify? record_batch_metadata, record_batch_info, or record_batch_offsets (preference for the last one)...
It refers to this: https://github.com/apache/arrow/blob/main/format/File.fbs#L39
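For context, the `Block` struct in Arrow's File.fbs stores an offset, a metadata length, and a body length, which is what lets a reader jump straight to a given batch. The sketch below shows one way a reader might turn such a block into byte ranges. It is an illustration only: the local `record_batch_block` merely mirrors those three fields, and `byte_range`, `block_ranges`, and `ranges_for` are hypothetical names that are not part of sparrow-ipc.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>

// Local mirror of the three Block fields (illustrative, not the PR's type).
struct record_batch_block
{
    std::int64_t offset;
    std::int32_t metadata_length;
    std::int64_t body_length;
};

// Half-open byte range [begin, end) within the file.
struct byte_range
{
    std::size_t begin;
    std::size_t end;
};

struct block_ranges
{
    byte_range metadata;
    byte_range body;
};

// Computes where the metadata and body of one record batch live,
// or nullopt when the block does not fit inside the file.
inline std::optional<block_ranges> ranges_for(const record_batch_block& block, std::size_t file_size)
{
    if (block.offset < 0 || block.metadata_length < 0 || block.body_length < 0)
    {
        return std::nullopt;
    }

    const auto begin = static_cast<std::uint64_t>(block.offset);
    const auto metadata_end = begin + static_cast<std::uint64_t>(block.metadata_length);
    const auto body_end = metadata_end + static_cast<std::uint64_t>(block.body_length);

    // Reject wrap-around and blocks that extend past the end of the file.
    if (body_end < metadata_end || body_end > file_size)
    {
        return std::nullopt;
    }

    return block_ranges{
        byte_range{static_cast<std::size_t>(begin), static_cast<std::size_t>(metadata_end)},
        byte_range{static_cast<std::size_t>(metadata_end), static_cast<std::size_t>(body_end)}
    };
}
```

Seen this way, the struct describes a contiguous block of the file rather than just offsets or metadata, which is consistent with the naming used in the Arrow format.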
#include "sparrow_ipc/stream_file_serializer.hpp"

// Helper function to extract and parse the footer from Arrow IPC file data
const org::apache::arrow::flatbuf::Footer* get_footer_from_file_data(const std::vector<uint8_t>& file_data)
As this file (test_integration_tools.cpp) is testing a local one (integration_tools.cpp), we may want to move it to the global tests folder?
That way we could also move this helper function to the dedicated sparrow_ipc_tests_helpers.hpp file and avoid duplicated code in the same project.
ok fixed
The integration tests job seems to be still failing? EDIT: Ok, it's the other layouts I suppose?
Maybe I missed it but are we actually running
This reverts commit 280467d.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
204eb30 to 85569fd
The point of the PR is to be able to run the integration tests with Archery, and to run the tests on primitive.json without any issue.
You will see that the workflow is red; that is because of the other layouts.
The plan is to fix the other layouts in follow-up PRs.