Conversation
refactor(version-control): remove last_pruned column
fix(tests): update assertions for external files
Pull request overview
This PR bumps the DUC schema to 3000002 and refactors external file storage so revision metadata can be read/parsed without loading large binary blobs, while also removing the version-graph pruning fields/APIs.
Changes:
- Split external file revision blobs into a dedicated external_file_revision_data table and a separate exported filesData map across Rust/JS/Python.
- Remove pruning-related schema fields/types/APIs (PRUNING_LEVEL, pruning_level, last_pruned) from schema + language bindings.
- Update package READMEs/docs to load the logo via jsDelivr instead of raw GitHub URLs.
Reviewed changes
Copilot reviewed 36 out of 36 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| schema/version_control.sql | Drops version_graph.last_pruned from the declared schema. |
| schema/migrations/3000001_to_3000002.sql | Adds migration for splitting external file blobs into a new table and bumps user_version. |
| schema/duc.sql | Bumps schema version to 3000002; removes duc_global_state.pruning_level; splits external file revision blob storage into new table. |
| packages/ducrs/src/types.rs | Updates Rust types for split external file blobs; removes pruning-related types/fields; adds ExternalFileLoaded and filesData. |
| packages/ducrs/src/parse.rs | Parses split blob storage (external_files + filesData), updates get_external_file to return metadata + blobs. |
| packages/ducrs/src/serialize.rs | Serializes revision metadata and writes blobs from filesData into the new blob table; removes pruning fields from writes. |
| packages/ducrs/src/api/version_control.rs | Removes pruning read/write paths and drops last_pruned from queries. |
| packages/ducrs/src/lib.rs | Adjusts tests for lazy parse + updated get_external_file return type. |
| packages/ducpy/src/ducpy/serialize.py | Splits external file metadata vs blob bytes into files + filesData for Rust serialization. |
| packages/ducpy/src/ducpy/classes/DataStateClass.py | Removes pruning fields; adds files_data on exported state; external revision no longer carries blob inline. |
| packages/ducpy/src/ducpy/builders/state_builders.py | Removes pruning builder methods; attaches blob bytes via a side-channel for serialization. |
| packages/ducpy/src/ducpy/enums.py | Removes PRUNING_LEVEL. |
| packages/ducpy/src/tests/src/*.py | Updates SQL tests to insert revision blobs into external_file_revision_data and removes pruning columns from inserts. |
| packages/ducjs/src/types/index.ts | Introduces filesData typing and splits revision metadata from blob data; removes pruning typing. |
| packages/ducjs/src/restore/restoreDataState.ts | Restores filesData and keeps backward-compat extraction from legacy inline data; removes pruning restoration. |
| packages/ducjs/src/lazy-files.ts | Updates lazy file loading to handle {...fileMeta, data} shape and provides ResolvedFileData. |
| packages/ducjs/src/parse.ts | Ensures lazy parse initializes filesData. |
| packages/ducjs/src/utils/state/index.ts | Removes default pruningLevel from global state defaults. |
| packages/ducjs/src/transform.ts | Removes pruning normalization. |
| packages/ducjs/src/enums.ts | Removes PRUNING_LEVEL. |
| packages/ducjs/src/utils/normalize.ts | Removes unused grid normalization helpers. |
| packages/ducjs/src/utils/constants.ts | Removes DEFAULT_GRID_STEP. |
| packages/ducpdf/src/duc2pdf/src/builder.rs | Threads active revision blob bytes separately into image/svg/pdf processing. |
| packages/ducpdf/src/duc2pdf/src/streaming/stream_resources.rs | Accepts optional revision blob bytes for resource processing. |
| packages/ducpdf/src/duc2pdf/src/utils/svg_to_pdf.rs | Updates SVG conversion entrypoint to accept revision bytes directly. |
| README.md + packages/*/README.md + packages/ducpy/docs/index.rst | Switches logo URL to jsDelivr CDN. |
Comments suppressed due to low confidence (1)
schema/migrations/3000001_to_3000002.sql:40
- This migration bumps user_version to 3000002, and the 3000002 schema files remove duc_global_state.pruning_level and version_graph.last_pruned. As written, the migration only changes the external file tables, so upgraded databases will still contain those removed columns and won’t match the declared 3000002 schema. Consider recreating those tables to drop the columns (or keep the columns in the schema if intentional).
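If the columns should indeed go away on upgrade, the portable rename/recreate/copy pattern works on any SQLite version (ALTER TABLE ... DROP COLUMN needs 3.35+). A minimal sketch using a hypothetical two-column stand-in for duc_global_state (the real table has more columns):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical stand-in for the v3000001 duc_global_state table.
con.execute("CREATE TABLE duc_global_state (id INTEGER PRIMARY KEY, pruning_level INTEGER)")
con.execute("INSERT INTO duc_global_state (id, pruning_level) VALUES (1, 2)")

# Portable recreate-and-copy pattern for dropping a column.
con.executescript("""
    ALTER TABLE duc_global_state RENAME TO _duc_global_state_old;
    CREATE TABLE duc_global_state (id INTEGER PRIMARY KEY);
    INSERT INTO duc_global_state (id) SELECT id FROM _duc_global_state_old;
    DROP TABLE _duc_global_state_old;
""")

# pruning_level is gone; existing rows are preserved.
cols = [row[1] for row in con.execute("PRAGMA table_info(duc_global_state)")]
```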
@@ -479,36 +480,34 @@ impl DucToPdfBuilder {

```rust
        Some(rev) => rev.mime_type.clone(),
        None => continue,
    };

    let rev_data = files_data
        .as_ref()
        .and_then(|d| d.get(&file.active_revision_id))
        .map(|b| b.as_ref());
```
external_files_data.clone() will deep-clone the entire revision-blob map (potentially very large) even though only the active revision’s bytes are needed per file. Consider avoiding the full clone (e.g., pull out only the needed blob per iteration into an owned Vec<u8>/ByteBuf, or restructure so you don’t have to hold a long-lived borrow of self while calling self.process_*).
```rust
// Write data blobs from the separate filesData map
if let Some(data_map) = files_data {
    for (rev_id, blob) in data_map {
        data_stmt.execute(params![rev_id, blob.as_ref()])?;
    }
}
```
filesData can contain revision IDs that don’t exist in external_file_revisions (e.g., stale entries). With the new FK from external_file_revision_data.revision_id → external_file_revisions.id, inserting such entries will cause serialization to fail. Consider filtering data_map to only known revision IDs (from files) or validating and returning a clearer error when an unknown revision ID is encountered.
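The suggested filtering can be sketched against the table shapes from the migration above; a Python/sqlite3 stand-in for the Rust write path (the filesData dict and the "stale" entry are illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # FK enforcement is off by default in SQLite
con.execute("CREATE TABLE external_file_revisions (id TEXT PRIMARY KEY) WITHOUT ROWID")
con.execute(
    "CREATE TABLE external_file_revision_data ("
    " revision_id TEXT PRIMARY KEY REFERENCES external_file_revisions(id) ON DELETE CASCADE,"
    " data BLOB NOT NULL) WITHOUT ROWID"
)
con.execute("INSERT INTO external_file_revisions (id) VALUES ('rev1')")

files_data = {"rev1": b"\x00", "stale": b"\x01"}  # 'stale' has no revision row
known = {row[0] for row in con.execute("SELECT id FROM external_file_revisions")}
for rev_id, blob in files_data.items():
    if rev_id not in known:
        continue  # skip stale entries instead of failing the whole serialization
    con.execute("INSERT INTO external_file_revision_data VALUES (?, ?)", (rev_id, blob))

count = con.execute("SELECT COUNT(*) FROM external_file_revision_data").fetchone()[0]
```

Without the `known` filter, the second insert would raise a FOREIGN KEY constraint error and abort serialization.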
```diff
  const revId = isValidString(r.id);
  const mimeType = isValidString(r.mimeType);
- const dataSource = r.data ?? (r as any).dataURL;
- const data = isValidUint8Array(dataSource);
- if (!revId || !mimeType || !data) continue;
+ if (!revId || !mimeType) continue;

  restoredRevisions[revKey] = {
    id: revId,
-   sizeBytes: isFiniteNumber(r.sizeBytes) ? (r.sizeBytes as number) : data.byteLength,
+   sizeBytes: isFiniteNumber(r.sizeBytes) ? (r.sizeBytes as number) : 0,
    checksum: isValidString(r.checksum) || undefined,
    sourceName: isValidString(r.sourceName) || undefined,
    mimeType,
    message: isValidString(r.message) || undefined,
    created: isFiniteNumber(r.created) ? (r.created as number) : Date.now(),
    lastRetrieved: isFiniteNumber(r.lastRetrieved) ? (r.lastRetrieved as number) : undefined,
-   data,
  };
```
restoreFiles() now falls back to sizeBytes: 0 when r.sizeBytes is missing, even if the revision still includes legacy inline data (which restoreFilesData() can extract). This makes the metadata inaccurate and can break any logic that relies on sizeBytes (e.g., comparisons/merging). Consider computing the fallback from the legacy inline blob length when present, or deriving it from filesData[revId] when available.
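The fallback chain being suggested can be sketched as a small helper; this is a hypothetical Python illustration of the logic, not the actual restoreFiles() TypeScript code:

```python
def restored_size_bytes(rev: dict, files_data: dict) -> int:
    """Derive a revision's sizeBytes, preferring real metadata over 0."""
    size = rev.get("sizeBytes")
    if isinstance(size, (int, float)):
        return int(size)                 # explicit metadata wins
    legacy = rev.get("data") or rev.get("dataURL")
    if legacy is not None:
        return len(legacy)               # legacy inline blob still present
    blob = files_data.get(rev.get("id"))
    if blob is not None:
        return len(blob)                 # derive from the split filesData map
    return 0                             # truly unknown; last resort
```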
@@ -298,20 +294,86 @@ export const restoreFiles = (importedFiles: unknown): DucExternalFiles => {

```diff
    revisions: {
      [revId]: {
        id: revId,
-       sizeBytes: data.byteLength,
+       sizeBytes: 0,
        mimeType,
        created,
        lastRetrieved: isFiniteNumber(legacyData.lastRetrieved)
          ? (legacyData.lastRetrieved as number)
          : undefined,
-       data,
      },
    },
  };
```
For legacy flat external_files entries, restoreFiles() no longer requires data/dataURL to be present and sets sizeBytes to 0. If callers pass legacy file objects without the blob, you’ll end up with an external file that can’t be rendered or re-serialized correctly. Consider either keeping the legacy requirement (skip entries without data) or ensuring the blob is moved into filesData and sizeBytes is derived accordingly.
```sql
CREATE TABLE external_file_revision_data (
    revision_id TEXT PRIMARY KEY REFERENCES external_file_revisions(id) ON DELETE CASCADE,
    data BLOB NOT NULL
) WITHOUT ROWID;

-- 2. Move existing data blobs into the new table
INSERT INTO external_file_revision_data (revision_id, data)
SELECT id, data FROM external_file_revisions;

-- 3. Recreate external_file_revisions without the data column
ALTER TABLE external_file_revisions RENAME TO _ext_revisions_old_v3000001;
```
external_file_revision_data is created with a FK to external_file_revisions, but the migration later renames external_file_revisions. In SQLite, ALTER TABLE ... RENAME TO rewrites foreign-key references, so this FK will end up pointing at _ext_revisions_old_v3000001 and then the referenced table is dropped. Reorder the migration to create external_file_revision_data after the new external_file_revisions table exists (or recreate the data table) so the FK/cascade points at the final table.
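The rename-rewrites-the-FK behavior is easy to demonstrate with Python's bundled SQLite (3.25+ with the default legacy_alter_table setting); the table definitions below are trimmed stand-ins for the ones in the migration:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE external_file_revisions (id TEXT PRIMARY KEY)")
con.execute(
    "CREATE TABLE external_file_revision_data ("
    " revision_id TEXT PRIMARY KEY REFERENCES external_file_revisions(id))"
)

# Renaming the parent rewrites the FK clause stored in the child's definition.
con.execute("ALTER TABLE external_file_revisions RENAME TO _ext_revisions_old_v3000001")

child_sql = con.execute(
    "SELECT sql FROM sqlite_master WHERE name = 'external_file_revision_data'"
).fetchone()[0]
# child_sql now references _ext_revisions_old_v3000001, not external_file_revisions
```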
```sql
ALTER TABLE external_file_revisions RENAME TO _ext_revisions_old_v3000001;

CREATE TABLE external_file_revisions (
    id TEXT PRIMARY KEY,
    file_id TEXT NOT NULL REFERENCES external_files(id) ON DELETE CASCADE,
    size_bytes INTEGER NOT NULL DEFAULT 0,
    checksum TEXT,
    source_name TEXT,
    mime_type TEXT NOT NULL,
    message TEXT,
    created INTEGER NOT NULL,
    last_retrieved INTEGER
) WITHOUT ROWID;

CREATE INDEX idx_external_file_revisions_file ON external_file_revisions(file_id);
```
After ALTER TABLE external_file_revisions RENAME ..., the existing idx_external_file_revisions_file index from v3000001 will still exist (now attached to the renamed table) and will keep its name. Creating a new index with the same name on the new table will fail with “index already exists”. Drop the old index after the rename (or create the new index under a different name) before attempting to create the replacement index.
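The name collision can be reproduced in a few lines of Python/sqlite3 (index names are schema-global in SQLite and survive a table rename):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE external_file_revisions (id TEXT PRIMARY KEY, file_id TEXT)")
con.execute("CREATE INDEX idx_external_file_revisions_file ON external_file_revisions(file_id)")

# Renaming the table does NOT rename its indexes; the old index keeps its name.
con.execute("ALTER TABLE external_file_revisions RENAME TO _ext_revisions_old_v3000001")
con.execute("CREATE TABLE external_file_revisions (id TEXT PRIMARY KEY, file_id TEXT)")

try:
    con.execute(
        "CREATE INDEX idx_external_file_revisions_file ON external_file_revisions(file_id)"
    )
    failed = False
except sqlite3.OperationalError:
    failed = True  # "index idx_external_file_revisions_file already exists"
```

Dropping the old index right after the rename (or dropping the old table first) avoids the error.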
🎉 This PR is included in version 3.2.0 🎉 The release is available on:
Your semantic-release bot 📦🚀

🎉 This PR is included in version 3.7.0 🎉 The release is available on:
Your semantic-release bot 📦🚀

🎉 This PR is included in version 3.6.0 🎉 The release is available on:
Your semantic-release bot 📦🚀
No description provided.