Verify: BINARY types preserve data integrity (legacy #147) #21
Merged
HenryNebula merged 5 commits into dev on Apr 23, 2026
Also need to add integration tests against an external DB, not just the mock.
Force-pushed: e74e894 to 73e8149
The legacy issue (baztian/jaydebeapi#147) about BINARY types being decoded as UTF-8 strings does not affect jaydebeapiarrow: the Arrow JDBC adapter handles binary types natively and returns Python bytes.

Add mockBinaryResult to MockConnection and three test cases that verify binary data round-trips correctly, including non-UTF-8 byte sequences (0x80, 0xff, 0xfe) and all 256 byte values.

Closes #20

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
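To make the failure mode concrete, here is a minimal sketch of why routing binary data through UTF-8 text is lossy while keeping it as bytes is not. The function names are hypothetical and this is not the actual jaydebeapi or jaydebeapiarrow code; it only assumes that the legacy path treated the value as text at some point.

```python
# Three bytes that are not valid UTF-8, plus every possible byte value.
non_utf8 = bytes([0x80, 0xFF, 0xFE])
all_values = bytes(range(256))


def lossy_text_roundtrip(data: bytes) -> bytes:
    """Hypothetical buggy path: interpret binary data as UTF-8 text."""
    # Invalid byte sequences are replaced by U+FFFD, so the originals are lost.
    text = data.decode("utf-8", errors="replace")
    return text.encode("utf-8")


def native_roundtrip(data: bytes) -> bytes:
    """Binary-safe path: the value is a bytes object from start to finish."""
    return bytes(data)


corrupted = lossy_text_roundtrip(non_utf8)
preserved = native_roundtrip(non_utf8)

print(corrupted == non_utf8)   # False: each byte became an encoded U+FFFD
print(preserved == non_utf8)   # True
```

Only the ASCII range (0x00 to 0x7f) survives the text path unchanged, which is why the tests cover both the specific non-UTF-8 bytes and the full 0 to 255 range.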
Addresses PR review feedback requesting integration tests beyond the mock driver. Tests that binary data containing non-UTF-8 bytes (0x80, 0xff, 0xfe) round-trips correctly through the Arrow path against a real HSQLDB database.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Address reviewer feedback: verify binary data integrity on an external database. The PostgresTest override tests the full 256-byte spectrum and common non-UTF-8 sequences that historically get corrupted.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
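The shape of such an integration check can be sketched with any DB-API 2.0 connection. The example below uses the standard-library sqlite3 module purely as a stand-in so that it runs without a server; it is not the project's PostgresTest, and the table name and helper function are assumptions made for illustration.

```python
import sqlite3


def blob_roundtrip(conn, payload: bytes) -> bytes:
    """Insert payload into a binary column and read it back."""
    cur = conn.cursor()
    # Hypothetical table name; sqlite3 calls its binary type BLOB.
    cur.execute(
        "CREATE TABLE IF NOT EXISTS binary_check "
        "(id INTEGER PRIMARY KEY, data BLOB)"
    )
    cur.execute("DELETE FROM binary_check")
    # Parameterized insert, so the driver transfers the raw bytes.
    cur.execute("INSERT INTO binary_check (data) VALUES (?)", (payload,))
    cur.execute("SELECT data FROM binary_check")
    (value,) = cur.fetchone()
    return value


conn = sqlite3.connect(":memory:")
full_spectrum = bytes(range(256))
result = blob_roundtrip(conn, full_spectrum)

# The value must come back as bytes, not str, and be byte-for-byte identical.
assert isinstance(result, bytes)
assert result == full_spectrum
```

Against a real external database the same two assertions apply; only the connection setup and the column type name would differ.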
Rebase onto dev restored timestamp tests that were previously removed. Add Drill-specific test_binary_non_utf8_roundtrip using CTAS (Drill doesn't support parameterized INSERT for binary). Skip on Trino since the memory connector doesn't support VARBINARY round-trip via CTAS.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Force-pushed: 2e12898 to 4a92bb1
Drill cannot create VARBINARY columns with non-UTF-8 bytes via CTAS (hex literal conversion is unsupported). Binary data integrity is already verified via mock tests, HSQLDB, and PostgreSQL integration tests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
Fixes #20 (legacy baztian/jaydebeapi#147): Verifies that BINARY/VARBINARY data round-trips correctly through the Arrow JDBC adapter without data loss.
The upstream jaydebeapi had a bug where binary data was decoded as UTF-8 strings via str(java_val), corrupting non-UTF-8 bytes. This does not affect jaydebeapiarrow because the Arrow JDBC adapter handles binary types natively, returning Python bytes objects.

Changes
- mockBinaryResult(byte[]) to MockConnection: mocks getBinaryStream() (the method the Arrow consumer actually calls), getBytes(), wasNull(), and getObject()
- test_mock.py:
  - test_binary_non_utf8_bytes_preserved: verifies bytes like 0x80, 0xff, 0xfe survive round-trip
  - test_binary_all_byte_values: all 256 byte values round-trip correctly
  - test_binary_empty: empty binary data handled correctly

Test plan
- bytes (not str), confirming no UTF-8 decoding occurs

Closes #20
Generated with Claude Code
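
For readers without the diff at hand, the three test cases named in this description might look roughly like the sketch below. The test method names come from the description; FakeBinaryCursor, the class name, and run_sketch are hypothetical stand-ins, since the real tests go through MockConnection and the Arrow JDBC adapter rather than a pure-Python fake.

```python
import unittest


class FakeBinaryCursor:
    """Hypothetical stand-in for a cursor whose only row holds one binary value."""

    def __init__(self, payload: bytes):
        self._row = (bytes(payload),)

    def fetchone(self):
        return self._row


class BinaryRoundTripSketch(unittest.TestCase):
    def _fetch(self, payload: bytes):
        (value,) = FakeBinaryCursor(payload).fetchone()
        return value

    def _assert_preserved(self, payload: bytes):
        value = self._fetch(payload)
        # The result must be bytes (not str) and identical to the input.
        self.assertIsInstance(value, bytes)
        self.assertEqual(value, payload)

    def test_binary_non_utf8_bytes_preserved(self):
        self._assert_preserved(bytes([0x80, 0xFF, 0xFE]))

    def test_binary_all_byte_values(self):
        self._assert_preserved(bytes(range(256)))

    def test_binary_empty(self):
        self._assert_preserved(b"")


def run_sketch():
    """Run the three cases and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(BinaryRoundTripSketch)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Checking the type as well as the value matters here: a str result would indicate that a text decoding step had crept back into the path.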