Skip to content

Jorgedanisc#225

Merged
jorgedanisc merged 3 commits intodevfrom
jorgedanisc
Mar 19, 2026
Merged

Jorgedanisc#225
jorgedanisc merged 3 commits intodevfrom
jorgedanisc

Conversation

@jorgedanisc
Copy link
Copy Markdown
Contributor

No description provided.

refactor(version-control): remove last_pruned column
fix(tests): update assertions for external files
@jorgedanisc jorgedanisc self-assigned this Mar 19, 2026
Copilot AI review requested due to automatic review settings March 19, 2026 23:23
@jorgedanisc jorgedanisc merged commit 7aa7b5e into dev Mar 19, 2026
1 check passed
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR bumps the DUC schema to 3000002 and refactors external file storage so revision metadata can be read/parsed without loading large binary blobs, while also removing the version-graph pruning fields/APIs.

Changes:

  • Split external file revision blobs into a dedicated external_file_revision_data table and a separate exported filesData map across Rust/JS/Python.
  • Remove pruning-related schema fields/types/APIs (PRUNING_LEVEL, pruning_level, last_pruned) from schema + language bindings.
  • Update package READMEs/docs to load the logo via jsDelivr instead of raw GitHub URLs.

Reviewed changes

Copilot reviewed 36 out of 36 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
schema/version_control.sql Drops version_graph.last_pruned from the declared schema.
schema/migrations/3000001_to_3000002.sql Adds migration for splitting external file blobs into a new table and bumps user_version.
schema/duc.sql Bumps schema version to 3000002; removes duc_global_state.pruning_level; splits external file revision blob storage into new table.
packages/ducrs/src/types.rs Updates Rust types for split external file blobs; removes pruning-related types/fields; adds ExternalFileLoaded and filesData.
packages/ducrs/src/parse.rs Parses split blob storage (external_files + filesData), updates get_external_file to return metadata + blobs.
packages/ducrs/src/serialize.rs Serializes revision metadata and writes blobs from filesData into the new blob table; removes pruning fields from writes.
packages/ducrs/src/api/version_control.rs Removes pruning read/write paths and drops last_pruned from queries.
packages/ducrs/src/lib.rs Adjusts tests for lazy parse + updated get_external_file return type.
packages/ducpy/src/ducpy/serialize.py Splits external file metadata vs blob bytes into files + filesData for Rust serialization.
packages/ducpy/src/ducpy/classes/DataStateClass.py Removes pruning fields; adds files_data on exported state; external revision no longer carries blob inline.
packages/ducpy/src/ducpy/builders/state_builders.py Removes pruning builder methods; attaches blob bytes via a side-channel for serialization.
packages/ducpy/src/ducpy/enums.py Removes PRUNING_LEVEL.
packages/ducpy/src/tests/src/*.py Updates SQL tests to insert revision blobs into external_file_revision_data and removes pruning columns from inserts.
packages/ducjs/src/types/index.ts Introduces filesData typing and splits revision metadata from blob data; removes pruning typing.
packages/ducjs/src/restore/restoreDataState.ts Restores filesData and keeps backward-compat extraction from legacy inline data; removes pruning restoration.
packages/ducjs/src/lazy-files.ts Updates lazy file loading to handle {...fileMeta, data} shape and provides ResolvedFileData.
packages/ducjs/src/parse.ts Ensures lazy parse initializes filesData.
packages/ducjs/src/utils/state/index.ts Removes default pruningLevel from global state defaults.
packages/ducjs/src/transform.ts Removes pruning normalization.
packages/ducjs/src/enums.ts Removes PRUNING_LEVEL.
packages/ducjs/src/utils/normalize.ts Removes unused grid normalization helpers.
packages/ducjs/src/utils/constants.ts Removes DEFAULT_GRID_STEP.
packages/ducpdf/src/duc2pdf/src/builder.rs Threads active revision blob bytes separately into image/svg/pdf processing.
packages/ducpdf/src/duc2pdf/src/streaming/stream_resources.rs Accepts optional revision blob bytes for resource processing.
packages/ducpdf/src/duc2pdf/src/utils/svg_to_pdf.rs Updates SVG conversion entrypoint to accept revision bytes directly.
README.md + packages/*/README.md + packages/ducpy/docs/index.rst Switches logo URL to jsDelivr CDN.
Comments suppressed due to low confidence (1)

schema/migrations/3000001_to_3000002.sql:40

  • This migration bumps user_version to 3000002, and the 3000002 schema files remove duc_global_state.pruning_level and version_graph.last_pruned. As written, the migration only changes the external file tables, so upgraded databases will still contain those removed columns and won’t match the declared 3000002 schema. Consider recreating those tables to drop the columns (or keep the columns in the schema if intentional).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 471 to +487
@@ -479,36 +480,34 @@ impl DucToPdfBuilder {
Some(rev) => rev.mime_type.clone(),
None => continue,
};

let rev_data = files_data
.as_ref()
.and_then(|d| d.get(&file.active_revision_id))
.map(|b| b.as_ref());
Copy link

Copilot AI Mar 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

external_files_data.clone() will deep-clone the entire revision-blob map (potentially very large) even though only the active revision’s bytes are needed per file. Consider avoiding the full clone (e.g., pull out only the needed blob per iteration into an owned Vec<u8>/ByteBuf, or restructure so you don’t have to hold a long-lived borrow of self while calling self.process_*).

Copilot uses AI. Check for mistakes.
Comment on lines +1163 to +1167
// Write data blobs from the separate filesData map
if let Some(data_map) = files_data {
for (rev_id, blob) in data_map {
data_stmt.execute(params![rev_id, blob.as_ref()])?;
}
Copy link

Copilot AI Mar 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

filesData can contain revision IDs that don’t exist in external_file_revisions (e.g., stale entries). With the new FK from external_file_revision_data.revision_idexternal_file_revisions.id, inserting such entries will cause serialization to fail. Consider filtering data_map to only known revision IDs (from files) or validating and returning a clearer error when an unknown revision ID is encountered.

Copilot uses AI. Check for mistakes.
Comment on lines 248 to 261
const revId = isValidString(r.id);
const mimeType = isValidString(r.mimeType);
const dataSource = r.data ?? (r as any).dataURL;
const data = isValidUint8Array(dataSource);
if (!revId || !mimeType || !data) continue;
if (!revId || !mimeType) continue;

restoredRevisions[revKey] = {
id: revId,
sizeBytes: isFiniteNumber(r.sizeBytes) ? (r.sizeBytes as number) : data.byteLength,
sizeBytes: isFiniteNumber(r.sizeBytes) ? (r.sizeBytes as number) : 0,
checksum: isValidString(r.checksum) || undefined,
sourceName: isValidString(r.sourceName) || undefined,
mimeType,
message: isValidString(r.message) || undefined,
created: isFiniteNumber(r.created) ? (r.created as number) : Date.now(),
lastRetrieved: isFiniteNumber(r.lastRetrieved) ? (r.lastRetrieved as number) : undefined,
data,
};
Copy link

Copilot AI Mar 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

restoreFiles() now falls back to sizeBytes: 0 when r.sizeBytes is missing, even if the revision still includes legacy inline data (which restoreFilesData() can extract). This makes the metadata inaccurate and can break any logic that relies on sizeBytes (e.g., comparisons/merging). Consider computing the fallback from the legacy inline blob length when present, or deriving it from filesData[revId] when available.

Copilot uses AI. Check for mistakes.
Comment on lines 276 to 305
@@ -298,20 +294,86 @@ export const restoreFiles = (importedFiles: unknown): DucExternalFiles => {
revisions: {
[revId]: {
id: revId,
sizeBytes: data.byteLength,
sizeBytes: 0,
mimeType,
created,
lastRetrieved: isFiniteNumber(legacyData.lastRetrieved)
? (legacyData.lastRetrieved as number)
: undefined,
data,
},
},
};
Copy link

Copilot AI Mar 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For legacy flat external_files entries, restoreFiles() no longer requires data/dataURL to be present and sets sizeBytes to 0. If callers pass legacy file objects without the blob, you’ll end up with an external file that can’t be rendered or re-serialized correctly. Consider either keeping the legacy requirement (skip entries without data) or ensuring the blob is moved into filesData and sizeBytes is derived accordingly.

Copilot uses AI. Check for mistakes.
Comment on lines +7 to +18
CREATE TABLE external_file_revision_data (
revision_id TEXT PRIMARY KEY REFERENCES external_file_revisions(id) ON DELETE CASCADE,
data BLOB NOT NULL
) WITHOUT ROWID;

-- 2. Move existing data blobs into the new table
INSERT INTO external_file_revision_data (revision_id, data)
SELECT id, data FROM external_file_revisions;

-- 3. Recreate external_file_revisions without the data column
ALTER TABLE external_file_revisions RENAME TO _ext_revisions_old_v3000001;

Copy link

Copilot AI Mar 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

external_file_revision_data is created with a FK to external_file_revisions, but the migration later renames external_file_revisions. In SQLite, ALTER TABLE ... RENAME TO rewrites foreign-key references, so this FK will end up pointing at _ext_revisions_old_v3000001 and then the referenced table is dropped. Reorder the migration to create external_file_revision_data after the new external_file_revisions table exists (or recreate the data table) so the FK/cascade points at the final table.

Copilot uses AI. Check for mistakes.
Comment on lines +17 to +32
ALTER TABLE external_file_revisions RENAME TO _ext_revisions_old_v3000001;

CREATE TABLE external_file_revisions (
id TEXT PRIMARY KEY,
file_id TEXT NOT NULL REFERENCES external_files(id) ON DELETE CASCADE,
size_bytes INTEGER NOT NULL DEFAULT 0,
checksum TEXT,
source_name TEXT,
mime_type TEXT NOT NULL,
message TEXT,
created INTEGER NOT NULL,
last_retrieved INTEGER
) WITHOUT ROWID;

CREATE INDEX idx_external_file_revisions_file ON external_file_revisions(file_id);

Copy link

Copilot AI Mar 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After ALTER TABLE external_file_revisions RENAME ..., the existing idx_external_file_revisions_file index from v3000001 will still exist (now attached to the renamed table) and will keep its name. Creating a new index with the same name on the new table will fail with “index already exists”. Drop the old index after the rename (or create the new index under a different name) before attempting to create the replacement index.

Copilot uses AI. Check for mistakes.
@github-actions
Copy link
Copy Markdown
Contributor

🎉 This PR is included in version 3.2.0 🎉

The release is available on:

Your semantic-release bot 📦🚀

@github-actions
Copy link
Copy Markdown
Contributor

🎉 This PR is included in version 3.2.0 🎉

The release is available on:

Your semantic-release bot 📦🚀

@github-actions
Copy link
Copy Markdown
Contributor

🎉 This PR is included in version 3.2.0 🎉

The release is available on:

Your semantic-release bot 📦🚀

@github-actions
Copy link
Copy Markdown
Contributor

🎉 This PR is included in version 3.2.0 🎉

The release is available on:

Your semantic-release bot 📦🚀

@github-actions
Copy link
Copy Markdown
Contributor

🎉 This PR is included in version 3.7.0 🎉

The release is available on:

Your semantic-release bot 📦🚀

@github-actions
Copy link
Copy Markdown
Contributor

🎉 This PR is included in version 3.6.0 🎉

The release is available on:

Your semantic-release bot 📦🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants