Skip to content

Conversation

@Alex-PLACET
Copy link
Member

@Alex-PLACET Alex-PLACET commented Nov 10, 2025

The point of the PR is to be able to run the integration test with archery and be able to run the tests on the primitive.json without any issue.
You will see that the workflow is red, it's because of the other layouts.
The plan is to fix the other layout in next PRs.

@Alex-PLACET Alex-PLACET self-assigned this Nov 10, 2025
@codecov-commenter
Copy link

codecov-commenter commented Nov 10, 2025

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (main@93c2fe8). Learn more about missing BASE report.
❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@           Coverage Diff           @@
##             main      #61   +/-   ##
=======================================
  Coverage        ?   82.31%           
=======================================
  Files           ?       35           
  Lines           ?     1742           
  Branches        ?        0           
=======================================
  Hits            ?     1434           
  Misses          ?      308           
  Partials        ?        0           
Flag Coverage Δ
unittests 82.31% <100.00%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds comprehensive integration tests for Arrow IPC file format serialization and deserialization, including tools for converting between JSON, stream, and file formats. The implementation properly tracks record batch blocks in the footer to enable random access to batches within Arrow files.

Key Changes

  • Implements footer block tracking with offset, metadata_length, and body_length for each record batch
  • Adds nullable flag handling in deserialization to properly preserve schema information
  • Creates integration testing tools compatible with Apache Arrow's Archery framework
  • Adds Docker-based integration testing workflow

Reviewed changes

Copilot reviewed 31 out of 31 changed files in this pull request and generated 10 comments.

Show a summary per file
File Description
include/sparrow_ipc/serialize.hpp Adds serialized_record_batch_info struct and updates serialize_record_batch to return block metadata
src/serialize.cpp Implements block info calculation with proper 8-byte alignment per Arrow spec
include/sparrow_ipc/stream_file_serializer.hpp Adds record_batch_block struct and tracking vector for footer generation
src/stream_file_serializer.cpp Updates write_footer to populate footer with tracked record batch blocks
src/deserialize.cpp Adds nullable flag parameter to deserialization functions
include/sparrow_ipc/deserialize_*.hpp Updates function signatures to accept nullable parameter
src/deserialize_fixedsizebinary_array.cpp Implements nullable flag handling in schema construction
integration_tests/* New integration test infrastructure with JSON/stream/file conversion tools
tests/test_stream_file_serializer.cpp Adds extensive footer validation tests
.github/workflows/integration_tests.yaml New CI workflow for Docker-based integration testing
CMakeLists.txt Adds SPARROW_IPC_BUILD_INTEGRATION_TESTS option and output directory configuration

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 31 out of 31 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 31 out of 31 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 270 to 289
const org::apache::arrow::flatbuf::Footer* get_footer_from_file_data(const std::vector<uint8_t>& file_data)
{
// Footer size is stored 4 bytes before the trailing magic
const size_t footer_size_offset = file_data.size() - sparrow_ipc::arrow_file_magic_size - sizeof(int32_t);
int32_t footer_size = 0;
std::memcpy(&footer_size, file_data.data() + footer_size_offset, sizeof(int32_t));

// Footer data starts at footer_size_offset - footer_size
const size_t footer_offset = footer_size_offset - footer_size;

return org::apache::arrow::flatbuf::GetFooter(file_data.data() + footer_offset);
}
Copy link

Copilot AI Dec 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This helper function get_footer_from_file_data is duplicated in both tests/test_stream_file_serializer.cpp (line 270) and integration_tests/test_integration_tools.cpp (line 24). Consider extracting this into a shared test utility header (e.g., tests/include/test_helpers.hpp) to avoid code duplication and make maintenance easier.

Copilot uses AI. Check for mistakes.
@Alex-PLACET Alex-PLACET marked this pull request as ready for review December 1, 2025 13:44
@Alex-PLACET Alex-PLACET marked this pull request as draft December 1, 2025 15:52
@Alex-PLACET Alex-PLACET force-pushed the add_integration_tests branch from 87b0774 to 204eb30 Compare December 2, 2025 09:14
@Alex-PLACET Alex-PLACET marked this pull request as ready for review December 2, 2025 10:04
1. Schema mismatches in stream → `std::invalid_argument`
2. Deallocating source buffer while arrays in use → undefined behavior
3. Missing RPATH on Linux → runtime linking errors
4. Only LZ4 compression supported (not ZSTD yet)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both are supported now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Comment on lines 18 to 19
- name: Install specific version of tzdata
run: sudo apt-get install tzdata
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed

Comment on lines 49 to 52
- name: List all folders and subfolders
run: |
echo "Listing all folders and subfolders:"
find . -type d
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a leftover from debugging?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I will delete that

Comment on lines +35 to +36
# Clone the arrow monorepo // TODO: change to the official repo
RUN git clone --depth 1 --branch archery_supports_external_libraries https://github.com/Alex-PLACET/arrow.git /arrow-integration --recurse-submodules
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason why we're not using the official repo already?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the official repo don't support external library to be tested

@@ -1,18 +1,22 @@
#include "sparrow_ipc/serialize.hpp"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Including project headers should be done after c++ standard ones (i.e <cstdint> and <optional>).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not the current format rules

*
* Each block describes the location and size of a record batch in the file.
*/
struct record_batch_block
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

record_batch_block name isn't really explicit regarding what it should be, maybe rename to something else to clarify? record_batch_metadata or record_batch_info or record_batch_offsets (preference for the latter)...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#include "sparrow_ipc/stream_file_serializer.hpp"

// Helper function to extract and parse the footer from Arrow IPC file data
const org::apache::arrow::flatbuf::Footer* get_footer_from_file_data(const std::vector<uint8_t>& file_data)
Copy link
Member

@Hind-M Hind-M Dec 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As this file (test_integration_tools.cpp) is testing a local one (integration_tools.cpp), we may want to move it to the global tests folder?
That way we could also move this helper function to the dedicated sparrow_ipc_tests_helpers.hpp file and avoid duplicated code in the same project.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok fixed

@Hind-M
Copy link
Member

Hind-M commented Dec 2, 2025

The integration tests job seems to be still failing?

EDIT: Ok, it's the other layouts I suppose?

@Hind-M
Copy link
Member

Hind-M commented Dec 2, 2025

Maybe I missed it but are we actually running test_integration_tools in the CI?

@Alex-PLACET Alex-PLACET force-pushed the add_integration_tests branch from 204eb30 to 85569fd Compare December 3, 2025 10:52
@Alex-PLACET Alex-PLACET requested a review from Hind-M December 4, 2025 09:04
@Alex-PLACET Alex-PLACET merged commit b00c770 into QuantStack:main Dec 5, 2025
26 of 28 checks passed
@Alex-PLACET Alex-PLACET deleted the add_integration_tests branch December 5, 2025 09:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants