Conversation
refactor(version-control): remove last_pruned column
fix(tests): update assertions for external files
Pull request overview
This PR bumps the DUC schema to 3000002 and refactors external file storage so revision metadata can be read/parsed without loading large binary blobs, while also removing the version-graph pruning fields/APIs.
Changes:
- Split external file revision blobs into a dedicated external_file_revision_data table and a separate exported filesData map across Rust/JS/Python.
- Remove pruning-related schema fields/types/APIs (PRUNING_LEVEL, pruning_level, last_pruned) from schema + language bindings.
- Update package READMEs/docs to load the logo via jsDelivr instead of raw GitHub URLs.
Reviewed changes
Copilot reviewed 36 out of 36 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| schema/version_control.sql | Drops version_graph.last_pruned from the declared schema. |
| schema/migrations/3000001_to_3000002.sql | Adds migration for splitting external file blobs into a new table and bumps user_version. |
| schema/duc.sql | Bumps schema version to 3000002; removes duc_global_state.pruning_level; splits external file revision blob storage into new table. |
| packages/ducrs/src/types.rs | Updates Rust types for split external file blobs; removes pruning-related types/fields; adds ExternalFileLoaded and filesData. |
| packages/ducrs/src/parse.rs | Parses split blob storage (external_files + filesData), updates get_external_file to return metadata + blobs. |
| packages/ducrs/src/serialize.rs | Serializes revision metadata and writes blobs from filesData into the new blob table; removes pruning fields from writes. |
| packages/ducrs/src/api/version_control.rs | Removes pruning read/write paths and drops last_pruned from queries. |
| packages/ducrs/src/lib.rs | Adjusts tests for lazy parse + updated get_external_file return type. |
| packages/ducpy/src/ducpy/serialize.py | Splits external file metadata vs blob bytes into files + filesData for Rust serialization. |
| packages/ducpy/src/ducpy/classes/DataStateClass.py | Removes pruning fields; adds files_data on exported state; external revision no longer carries blob inline. |
| packages/ducpy/src/ducpy/builders/state_builders.py | Removes pruning builder methods; attaches blob bytes via a side-channel for serialization. |
| packages/ducpy/src/ducpy/enums.py | Removes PRUNING_LEVEL. |
| packages/ducpy/src/tests/src/*.py | Updates SQL tests to insert revision blobs into external_file_revision_data and removes pruning columns from inserts. |
| packages/ducjs/src/types/index.ts | Introduces filesData typing and splits revision metadata from blob data; removes pruning typing. |
| packages/ducjs/src/restore/restoreDataState.ts | Restores filesData and keeps backward-compat extraction from legacy inline data; removes pruning restoration. |
| packages/ducjs/src/lazy-files.ts | Updates lazy file loading to handle {...fileMeta, data} shape and provides ResolvedFileData. |
| packages/ducjs/src/parse.ts | Ensures lazy parse initializes filesData. |
| packages/ducjs/src/utils/state/index.ts | Removes default pruningLevel from global state defaults. |
| packages/ducjs/src/transform.ts | Removes pruning normalization. |
| packages/ducjs/src/enums.ts | Removes PRUNING_LEVEL. |
| packages/ducjs/src/utils/normalize.ts | Removes unused grid normalization helpers. |
| packages/ducjs/src/utils/constants.ts | Removes DEFAULT_GRID_STEP. |
| packages/ducpdf/src/duc2pdf/src/builder.rs | Threads active revision blob bytes separately into image/svg/pdf processing. |
| packages/ducpdf/src/duc2pdf/src/streaming/stream_resources.rs | Accepts optional revision blob bytes for resource processing. |
| packages/ducpdf/src/duc2pdf/src/utils/svg_to_pdf.rs | Updates SVG conversion entrypoint to accept revision bytes directly. |
| README.md + packages/*/README.md + packages/ducpy/docs/index.rst | Switches logo URL to jsDelivr CDN. |
Comments suppressed due to low confidence (1)
schema/migrations/3000001_to_3000002.sql:40
- This migration bumps user_version to 3000002, and the 3000002 schema files remove duc_global_state.pruning_level and version_graph.last_pruned. As written, the migration only changes the external file tables, so upgraded databases will still contain those removed columns and won’t match the declared 3000002 schema. Consider recreating those tables to drop the columns (or keep the columns in the schema if intentional).
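If the columns should indeed go away on upgrade, the portable rename/recreate/copy pattern works on any SQLite version (ALTER TABLE ... DROP COLUMN needs 3.35+). A minimal sketch using a hypothetical two-column stand-in for duc_global_state (the real table has more columns):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical stand-in for the v3000001 duc_global_state table.
con.execute("CREATE TABLE duc_global_state (id INTEGER PRIMARY KEY, pruning_level INTEGER)")
con.execute("INSERT INTO duc_global_state (id, pruning_level) VALUES (1, 2)")

# Portable recreate-and-copy pattern for dropping a column.
con.executescript("""
    ALTER TABLE duc_global_state RENAME TO _duc_global_state_old;
    CREATE TABLE duc_global_state (id INTEGER PRIMARY KEY);
    INSERT INTO duc_global_state (id) SELECT id FROM _duc_global_state_old;
    DROP TABLE _duc_global_state_old;
""")

# pruning_level is gone; existing rows are preserved.
cols = [row[1] for row in con.execute("PRAGMA table_info(duc_global_state)")]
```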
@@ -479,36 +480,34 @@ impl DucToPdfBuilder {

```rust
        Some(rev) => rev.mime_type.clone(),
        None => continue,
    };

    let rev_data = files_data
        .as_ref()
        .and_then(|d| d.get(&file.active_revision_id))
        .map(|b| b.as_ref());
```
external_files_data.clone() will deep-clone the entire revision-blob map (potentially very large) even though only the active revision’s bytes are needed per file. Consider avoiding the full clone (e.g., pull out only the needed blob per iteration into an owned Vec<u8>/ByteBuf, or restructure so you don’t have to hold a long-lived borrow of self while calling self.process_*).
```rust
// Write data blobs from the separate filesData map
if let Some(data_map) = files_data {
    for (rev_id, blob) in data_map {
        data_stmt.execute(params![rev_id, blob.as_ref()])?;
    }
}
```
filesData can contain revision IDs that don’t exist in external_file_revisions (e.g., stale entries). With the new FK from external_file_revision_data.revision_id → external_file_revisions.id, inserting such entries will cause serialization to fail. Consider filtering data_map to only known revision IDs (from files) or validating and returning a clearer error when an unknown revision ID is encountered.
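The suggested filtering can be sketched against the table shapes from the migration above; a Python/sqlite3 stand-in for the Rust write path (the filesData dict and the "stale" entry are illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # FK enforcement is off by default in SQLite
con.execute("CREATE TABLE external_file_revisions (id TEXT PRIMARY KEY) WITHOUT ROWID")
con.execute(
    "CREATE TABLE external_file_revision_data ("
    " revision_id TEXT PRIMARY KEY REFERENCES external_file_revisions(id) ON DELETE CASCADE,"
    " data BLOB NOT NULL) WITHOUT ROWID"
)
con.execute("INSERT INTO external_file_revisions (id) VALUES ('rev1')")

files_data = {"rev1": b"\x00", "stale": b"\x01"}  # 'stale' has no revision row
known = {row[0] for row in con.execute("SELECT id FROM external_file_revisions")}
for rev_id, blob in files_data.items():
    if rev_id not in known:
        continue  # skip stale entries instead of failing the whole serialization
    con.execute("INSERT INTO external_file_revision_data VALUES (?, ?)", (rev_id, blob))

count = con.execute("SELECT COUNT(*) FROM external_file_revision_data").fetchone()[0]
```

Without the `known` filter, the second insert would raise a FOREIGN KEY constraint error and abort serialization.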
```diff
  const revId = isValidString(r.id);
  const mimeType = isValidString(r.mimeType);
- const dataSource = r.data ?? (r as any).dataURL;
- const data = isValidUint8Array(dataSource);
- if (!revId || !mimeType || !data) continue;
+ if (!revId || !mimeType) continue;

  restoredRevisions[revKey] = {
    id: revId,
-   sizeBytes: isFiniteNumber(r.sizeBytes) ? (r.sizeBytes as number) : data.byteLength,
+   sizeBytes: isFiniteNumber(r.sizeBytes) ? (r.sizeBytes as number) : 0,
    checksum: isValidString(r.checksum) || undefined,
    sourceName: isValidString(r.sourceName) || undefined,
    mimeType,
    message: isValidString(r.message) || undefined,
    created: isFiniteNumber(r.created) ? (r.created as number) : Date.now(),
    lastRetrieved: isFiniteNumber(r.lastRetrieved) ? (r.lastRetrieved as number) : undefined,
-   data,
  };
```
restoreFiles() now falls back to sizeBytes: 0 when r.sizeBytes is missing, even if the revision still includes legacy inline data (which restoreFilesData() can extract). This makes the metadata inaccurate and can break any logic that relies on sizeBytes (e.g., comparisons/merging). Consider computing the fallback from the legacy inline blob length when present, or deriving it from filesData[revId] when available.
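The fallback chain being suggested can be sketched as a small helper; this is a hypothetical Python illustration of the logic, not the actual restoreFiles() TypeScript code:

```python
def restored_size_bytes(rev: dict, files_data: dict) -> int:
    """Derive a revision's sizeBytes, preferring real metadata over 0."""
    size = rev.get("sizeBytes")
    if isinstance(size, (int, float)):
        return int(size)                 # explicit metadata wins
    legacy = rev.get("data") or rev.get("dataURL")
    if legacy is not None:
        return len(legacy)               # legacy inline blob still present
    blob = files_data.get(rev.get("id"))
    if blob is not None:
        return len(blob)                 # derive from the split filesData map
    return 0                             # truly unknown; last resort
```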
@@ -298,20 +294,86 @@ export const restoreFiles = (importedFiles: unknown): DucExternalFiles => {

```diff
    revisions: {
      [revId]: {
        id: revId,
-       sizeBytes: data.byteLength,
+       sizeBytes: 0,
        mimeType,
        created,
        lastRetrieved: isFiniteNumber(legacyData.lastRetrieved)
          ? (legacyData.lastRetrieved as number)
          : undefined,
-       data,
      },
    },
  };
```
For legacy flat external_files entries, restoreFiles() no longer requires data/dataURL to be present and sets sizeBytes to 0. If callers pass legacy file objects without the blob, you’ll end up with an external file that can’t be rendered or re-serialized correctly. Consider either keeping the legacy requirement (skip entries without data) or ensuring the blob is moved into filesData and sizeBytes is derived accordingly.
```sql
CREATE TABLE external_file_revision_data (
    revision_id TEXT PRIMARY KEY REFERENCES external_file_revisions(id) ON DELETE CASCADE,
    data BLOB NOT NULL
) WITHOUT ROWID;

-- 2. Move existing data blobs into the new table
INSERT INTO external_file_revision_data (revision_id, data)
SELECT id, data FROM external_file_revisions;

-- 3. Recreate external_file_revisions without the data column
ALTER TABLE external_file_revisions RENAME TO _ext_revisions_old_v3000001;
```
external_file_revision_data is created with a FK to external_file_revisions, but the migration later renames external_file_revisions. In SQLite, ALTER TABLE ... RENAME TO rewrites foreign-key references, so this FK will end up pointing at _ext_revisions_old_v3000001 and then the referenced table is dropped. Reorder the migration to create external_file_revision_data after the new external_file_revisions table exists (or recreate the data table) so the FK/cascade points at the final table.
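The rename-rewrites-the-FK behavior is easy to demonstrate with Python's bundled SQLite (3.25+ with the default legacy_alter_table setting); the table definitions below are trimmed stand-ins for the ones in the migration:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE external_file_revisions (id TEXT PRIMARY KEY)")
con.execute(
    "CREATE TABLE external_file_revision_data ("
    " revision_id TEXT PRIMARY KEY REFERENCES external_file_revisions(id))"
)

# Renaming the parent rewrites the FK clause stored in the child's definition.
con.execute("ALTER TABLE external_file_revisions RENAME TO _ext_revisions_old_v3000001")

child_sql = con.execute(
    "SELECT sql FROM sqlite_master WHERE name = 'external_file_revision_data'"
).fetchone()[0]
# child_sql now references _ext_revisions_old_v3000001, not external_file_revisions
```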
```sql
ALTER TABLE external_file_revisions RENAME TO _ext_revisions_old_v3000001;

CREATE TABLE external_file_revisions (
    id TEXT PRIMARY KEY,
    file_id TEXT NOT NULL REFERENCES external_files(id) ON DELETE CASCADE,
    size_bytes INTEGER NOT NULL DEFAULT 0,
    checksum TEXT,
    source_name TEXT,
    mime_type TEXT NOT NULL,
    message TEXT,
    created INTEGER NOT NULL,
    last_retrieved INTEGER
) WITHOUT ROWID;

CREATE INDEX idx_external_file_revisions_file ON external_file_revisions(file_id);
```
After ALTER TABLE external_file_revisions RENAME ..., the existing idx_external_file_revisions_file index from v3000001 will still exist (now attached to the renamed table) and will keep its name. Creating a new index with the same name on the new table will fail with “index already exists”. Drop the old index after the rename (or create the new index under a different name) before attempting to create the replacement index.
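The name collision can be reproduced in a few lines of Python/sqlite3 (index names are schema-global in SQLite and survive a table rename):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE external_file_revisions (id TEXT PRIMARY KEY, file_id TEXT)")
con.execute("CREATE INDEX idx_external_file_revisions_file ON external_file_revisions(file_id)")

# Renaming the table does NOT rename its indexes; the old index keeps its name.
con.execute("ALTER TABLE external_file_revisions RENAME TO _ext_revisions_old_v3000001")
con.execute("CREATE TABLE external_file_revisions (id TEXT PRIMARY KEY, file_id TEXT)")

try:
    con.execute(
        "CREATE INDEX idx_external_file_revisions_file ON external_file_revisions(file_id)"
    )
    failed = False
except sqlite3.OperationalError:
    failed = True  # "index idx_external_file_revisions_file already exists"
```

Dropping the old index right after the rename (or dropping the old table first) avoids the error.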
🎉 This PR is included in version 3.2.0 🎉 The release is available on:
Your semantic-release bot 📦🚀

🎉 This PR is included in version 3.7.0 🎉 The release is available on:
Your semantic-release bot 📦🚀

🎉 This PR is included in version 3.6.0 🎉 The release is available on:
Your semantic-release bot 📦🚀
No description provided.