Skip to content

Dev#226

Merged
jorgedanisc merged 4 commits intomainfrom
dev
Mar 19, 2026
Merged

Dev#226
jorgedanisc merged 4 commits intomainfrom
dev

Conversation

@jorgedanisc
Copy link
Copy Markdown
Contributor

No description provided.

@jorgedanisc jorgedanisc self-assigned this Mar 19, 2026
Copilot AI review requested due to automatic review settings March 19, 2026 23:24
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Updates the DUC schema and supporting libraries to separate external-file revision blobs from revision metadata (enabling metadata-only reads/lazy loading), while also removing pruning-related fields and refreshing README asset links.

Changes:

  • Bump schema to 3000002 and split external_file_revisions.data into a new external_file_revision_data table.
  • Update Rust/JS/Python types, parsing, and serialization to use a separate filesData/external_files_data blob map keyed by revision id.
  • Remove pruning-level / last-pruned fields across schema + SDKs, and switch README images to jsDelivr CDN.

Reviewed changes

Copilot reviewed 36 out of 36 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
schema/version_control.sql Removes last_pruned column from version_graph definition.
schema/migrations/3000001_to_3000002.sql Migration adds external_file_revision_data and recreates external_file_revisions without blob column.
schema/duc.sql Bumps user_version to 3000002; updates external file tables; removes pruning_level from global state.
packages/ducsvg/README.md Switches logo URL to jsDelivr CDN.
packages/ducrs/src/types.rs Removes pruning types/fields; introduces revision metadata type + ExternalFileLoaded; adds filesData to exported state.
packages/ducrs/src/serialize.rs Writes external file metadata + separate blob table; removes pruning/global-state fields from SQL writes.
packages/ducrs/src/parse.rs Reads external file blobs into separate map; updates get_external_file to return metadata + blob map.
packages/ducrs/src/lib.rs Updates tests to account for external_files_data and get_external_file return type.
packages/ducrs/src/api/version_control.rs Removes last_pruned reads and pruning write APIs.
packages/ducrs/README.md Switches logo URL to jsDelivr CDN.
packages/ducpy/src/tests/src/test_pdf_image_elements.py Adjusts SQL test inserts to new external file blob table.
packages/ducpy/src/tests/src/test_create_image_element_with_external_file.py Adjusts SQL inserts/selects to join external_file_revision_data.
packages/ducpy/src/tests/src/test_a_duc_with_everything.py Updates global state insert + external file blob inserts for new schema.
packages/ducpy/src/tests/src/test_CSPMDS_local_and_global_states.py Removes pruning_level usage in global state SQL insert.
packages/ducpy/src/ducpy/serialize.py Emits files (meta) + filesData (blobs) payload split for Rust serializer.
packages/ducpy/src/ducpy/enums.py Removes PRUNING_LEVEL enum.
packages/ducpy/src/ducpy/classes/DataStateClass.py Removes pruning fields; adds files_data to exported state.
packages/ducpy/src/ducpy/builders/state_builders.py Removes pruning builder APIs; attaches _data_blobs for split blob serialization.
packages/ducpy/docs/index.rst Switches logo URL to jsDelivr CDN.
packages/ducpy/README.md Switches logo URL to jsDelivr CDN.
packages/ducpdf/src/duc2pdf/src/utils/svg_to_pdf.rs Updates SVG conversion helper to accept active revision bytes explicitly.
packages/ducpdf/src/duc2pdf/src/streaming/stream_resources.rs Updates resource processing to require separate revision bytes rather than revision-embedded blobs.
packages/ducpdf/src/duc2pdf/src/builder.rs Passes active revision bytes from external_files_data into resource processing.
packages/ducpdf/src/duc2pdf/README.md Switches logo URL to jsDelivr CDN.
packages/ducpdf/README.md Switches logo URL to jsDelivr CDN.
packages/ducjs/src/utils/state/index.ts Removes default pruningLevel from global state.
packages/ducjs/src/utils/normalize.ts Removes unused grid normalization helpers.
packages/ducjs/src/utils/constants.ts Removes unused DEFAULT_GRID_STEP.
packages/ducjs/src/types/index.ts Introduces filesData + revision-meta types; removes pruning types/fields.
packages/ducjs/src/transform.ts Stops normalizing pruningLevel for Rust.
packages/ducjs/src/restore/restoreDataState.ts Restores file metadata separately from file blobs via filesData; removes pruning restore.
packages/ducjs/src/parse.ts Initializes filesData in lazy parse path.
packages/ducjs/src/lazy-files.ts Returns ExternalFileLoaded from WASM; exposes active revision bytes via ResolvedFileData; adds export/merge for filesData.
packages/ducjs/src/enums.ts Removes PRUNING_LEVEL enum.
packages/ducjs/README.md Switches logo URL to jsDelivr CDN.
README.md Switches logo URL to jsDelivr CDN.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 471 to +487
@@ -479,36 +480,34 @@ impl DucToPdfBuilder {
Some(rev) => rev.mime_type.clone(),
None => continue,
};

let rev_data = files_data
.as_ref()
.and_then(|d| d.get(&file.active_revision_id))
.map(|b| b.as_ref());
Copy link

Copilot AI Mar 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

files_data is cloned from self.context.exported_data.external_files_data before the loop. Since this map can contain large blobs, the clone can double peak memory usage for big documents. Prefer borrowing (as_ref()) and looking up blobs by reference (or cloning only the single active blob you need per file).

Copilot uses AI. Check for mistakes.
Comment on lines 116 to +143
def _convert_external_files(
entries: Optional[Any],
) -> Optional[Dict[str, Any]]:
"""Convert DucExternalFile dict (or legacy list) to camelCase dicts for serialization."""
) -> Tuple[Optional[Dict[str, Any]], Optional[Dict[str, bytes]]]:
"""Convert DucExternalFile dict (or legacy list) to camelCase dicts for serialization.

Returns ``(files_meta, files_data)`` where *files_meta* contains revision
metadata (no blobs) and *files_data* maps revision-id → raw bytes for the
separate ``filesData`` key expected by the Rust serializer.
"""
if not entries:
return None
return None, None

data_blobs: Dict[str, bytes] = {}

def _extract_blobs(entry: Any) -> None:
blobs = getattr(entry, "_data_blobs", None)
if blobs:
data_blobs.update(blobs)

# New format: already a dict mapping id → DucExternalFile
if isinstance(entries, dict):
result: dict = {}
for key, value in entries.items():
_extract_blobs(value)
if is_dataclass(value):
value = asdict(value)
result[key] = deep_snake_to_camel(value) if isinstance(value, dict) else value
return result if result else None
return (result if result else None), (data_blobs if data_blobs else None)
Copy link

Copilot AI Mar 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_convert_external_files() only extracts binary blobs from a private _data_blobs attribute. If callers pass external files as plain dicts (or legacy objects) with inline revision data/dataURL, those bytes will be silently dropped because the Rust structs no longer include data in revision metadata. Consider also extracting inline revision data from dict inputs (and removing it from the metadata payload), or raising an explicit error when blobs are present but no filesData can be produced.

Copilot uses AI. Check for mistakes.
Comment on lines +1 to +5
-- Migration: 3000001 → 3000002
-- Splits binary data out of `external_file_revisions` into a dedicated
-- `external_file_revision_data` table so metadata can be read without
-- loading heavy blobs.

Copy link

Copilot AI Mar 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This migration bumps user_version to 3000002 but only modifies the external file tables. Since the base schema for 3000002 also removes duc_global_state.pruning_level and version_graph.last_pruned, upgrading from 3000001 will leave those columns in place, making the on-disk schema dependent on upgrade path. Consider updating this migration to also drop/recreate those tables/columns (or keep them in the base schema if intentional).

Copilot uses AI. Check for mistakes.
Comment on lines +1163 to +1167
// Write data blobs from the separate filesData map
if let Some(data_map) = files_data {
for (rev_id, blob) in data_map {
data_stmt.execute(params![rev_id, blob.as_ref()])?;
}
Copy link

Copilot AI Mar 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

files_data is written wholesale into external_file_revision_data without verifying that each blob corresponds to an existing revision (and FK checks are disabled for the whole export). This can produce orphaned data rows or revisions with no data rows, depending on the caller, yielding an internally inconsistent .duc. Consider (a) filtering files_data to only revision IDs present in files, and (b) erroring (or at least warning) when an active revision is missing a blob.

Copilot uses AI. Check for mistakes.
Comment on lines +343 to +353
for (const revKey in rawRevisions) {
if (!Object.prototype.hasOwnProperty.call(rawRevisions, revKey)) continue;
if (result[revKey]) continue; // filesData takes precedence
const rev = rawRevisions[revKey];
if (!rev || typeof rev !== "object") continue;
const r = rev as Record<string, unknown>;
const dataSource = r.data ?? (r as any).dataURL;
const data = isValidUint8Array(dataSource);
if (data) {
result[revKey] = data;
}
Copy link

Copilot AI Mar 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

restoreFilesData() uses revKey (the revisions map key) as the key into filesData, but the canonical key for blobs is the revision id (r.id). If the map key differs from r.id (or if legacy exports key by something else), the restored blobs won't be found by consumers that index by revision id (e.g., activeRevisionId). Use the validated revId/r.id for both the result[...] key and the precedence check.

Copilot uses AI. Check for mistakes.
@github-actions
Copy link
Copy Markdown
Contributor

📋 Release Preview

Package Current Next Tests
ducrs 3.1.1 3.2.0 ✅ Passed
ducjs 3.1.1 3.2.0 ✅ Passed
ducpy 3.1.1 3.2.0 ✅ Passed
ducpdf 3.6.1 3.7.0 ✅ Passed
ducsvg 3.5.1 3.6.0 ✅ Passed
duc2pdf 3.1.1 3.2.0 ✅ Passed

@jorgedanisc jorgedanisc merged commit 4eafe52 into main Mar 19, 2026
7 checks passed
@github-actions
Copy link
Copy Markdown
Contributor

🎉 This PR is included in version 3.2.0 🎉

The release is available on:

Your semantic-release bot 📦🚀

@github-actions
Copy link
Copy Markdown
Contributor

🎉 This PR is included in version 3.2.0 🎉

The release is available on:

Your semantic-release bot 📦🚀

@github-actions
Copy link
Copy Markdown
Contributor

🎉 This PR is included in version 3.2.0 🎉

The release is available on:

Your semantic-release bot 📦🚀

@github-actions
Copy link
Copy Markdown
Contributor

🎉 This PR is included in version 3.2.0 🎉

The release is available on:

Your semantic-release bot 📦🚀

@github-actions
Copy link
Copy Markdown
Contributor

🎉 This PR is included in version 3.7.0 🎉

The release is available on:

Your semantic-release bot 📦🚀

@github-actions
Copy link
Copy Markdown
Contributor

🎉 This PR is included in version 3.6.0 🎉

The release is available on:

Your semantic-release bot 📦🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants