feat(blob): add blob descriptor write support for append-only tables #270
Introduce BlobDescriptor serialization/deserialization, AppendBlobFileWriter for writing blob format files, blob format writer implementation, and descriptor mode in the reader that returns BlobDescriptor references instead of inline data. Update row tracking in commit to use per-column counters for blob files. Allow BlobType in append-only tables (still unsupported with primary keys).
/// Comma-separated BLOB field names stored as serialized BlobDescriptor
/// bytes inline in normal data files (no .blob files for these fields).
pub fn blob_descriptor_fields(&self) -> HashSet<String> {
blob-descriptor-field needs schema-level validation before we use this set. In Java, every configured field must exist and must be a top-level BLOB field. Here we only parse the option string, so typos or nested / non-BLOB fields are silently accepted and we diverge from Java behavior. Can we validate this during schema construction / table initialization instead of letting it flow into the writer path?
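One way the requested check could look is sketched below. It is a minimal illustration under stated assumptions, not the crate's actual API: `DataType`, `Field`, `parse_descriptor_fields`, and `validate_descriptor_fields` are hypothetical, simplified stand-ins, and the error type is a plain `String`. The idea is to reject any configured name that is missing from the schema or is not a top-level BLOB before the set ever reaches the writer path.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified stand-in for the crate's data type enum.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int,
    Blob,
    Row(Vec<Field>),
}

/// Hypothetical, simplified stand-in for a schema field.
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
}

/// Parse a comma-separated option value into trimmed, non-empty names.
fn parse_descriptor_fields(raw: &str) -> HashSet<String> {
    raw.split(',')
        .map(str::trim)
        .filter(|s| !s.is_empty())
        .map(str::to_string)
        .collect()
}

/// Fail if a configured name is absent from the top-level fields
/// or refers to a field whose type is not BLOB.
fn validate_descriptor_fields(
    fields: &[Field],
    configured: &HashSet<String>,
) -> Result<(), String> {
    for name in configured {
        match fields.iter().find(|f| &f.name == name) {
            None => {
                return Err(format!(
                    "blob descriptor field '{name}' does not exist in the schema"
                ))
            }
            Some(f) if f.data_type != DataType::Blob => {
                return Err(format!(
                    "blob descriptor field '{name}' is not a top-level BLOB field"
                ))
            }
            Some(_) => {}
        }
    }
    Ok(())
}
```

Because only top-level fields are searched, a name that matches a field nested inside a `Row` is reported as missing, which is the strict behavior the comment asks for. Calling this during schema construction or table initialization would surface typos as configuration errors rather than as silent no-ops.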
};

let has_blob_fields = schema.fields().iter().any(|f| {
    f.data_type().contains_blob_type() && !blob_descriptor_fields.contains(f.name())
This is broader than the Java blob contract. contains_blob_type() recurses into nested types, so a column like ROW<blob BLOB> now flips has_blob_fields and routes the whole top-level column into AppendBlobFileWriter, but the blob writer still only accepts a single top-level BinaryArray. That makes the new nested-blob acceptance path a false positive: the schema is accepted, then the write path fails at runtime. I think this needs to be tightened to top-level DataType::Blob(_) only (matching Java BlobType.fieldsInBlobFile) or explicitly rejected during schema validation.
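The difference between the recursive check and a top-level-only check can be sketched as follows. This is an illustrative model, not the PR's code: `DataType` and `Field` are hypothetical simplified types (the real variant is described above as `DataType::Blob(_)`; a unit variant is used here for brevity), and `fields_in_blob_file` and `reject_nested_blobs` are assumed names for the two options the comment proposes.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified stand-in for the crate's data type enum.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int,
    Blob,
    Row(Vec<Field>),
}

/// Hypothetical, simplified stand-in for a schema field.
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
}

impl DataType {
    /// Recursive check as described in the review: also true for nested blobs.
    fn contains_blob_type(&self) -> bool {
        match self {
            DataType::Blob => true,
            DataType::Row(fields) => fields.iter().any(|f| f.data_type.contains_blob_type()),
            DataType::Int => false,
        }
    }
}

/// Option 1: only top-level BLOB fields that are not descriptor fields
/// are routed to blob files.
fn fields_in_blob_file(fields: &[Field], descriptor_fields: &HashSet<String>) -> Vec<String> {
    fields
        .iter()
        .filter(|f| matches!(f.data_type, DataType::Blob) && !descriptor_fields.contains(&f.name))
        .map(|f| f.name.clone())
        .collect()
}

/// Option 2: reject schemas that nest a BLOB inside another type,
/// so the failure happens at validation time instead of at write time.
fn reject_nested_blobs(fields: &[Field]) -> Result<(), String> {
    for f in fields {
        if !matches!(f.data_type, DataType::Blob) && f.data_type.contains_blob_type() {
            return Err(format!(
                "field '{}' nests a BLOB type, which is not supported",
                f.name
            ));
        }
    }
    Ok(())
}
```

With a `ROW<blob BLOB>` column in the schema, `contains_blob_type()` returns true for that column, while `fields_in_blob_file` ignores it and `reject_nested_blobs` returns an error. Either option keeps schema acceptance and writer capability in agreement.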
Brief change log
Tests
API and Format
Documentation