Skip to content

ostree-ext: Match on diff_ids as well as layer digest on import#2081

Open
alexlarsson wants to merge 1 commit intobootc-dev:mainfrom
alexlarsson:pull-by-diff-id
Open

ostree-ext: Match on diff_ids as well as layer digest on import#2081
alexlarsson wants to merge 1 commit intobootc-dev:mainfrom
alexlarsson:pull-by-diff-id

Conversation

@alexlarsson
Copy link
Contributor

Layer digest can vary for a layer due to e.g. recompression, so at the start of an import we enumerate all cached images and make a diff_id to layer digest map. Then when pulling an image if the layer digest ref doesn't exist, we try to lookup in the map and use that if it exists.

We also create new refs for all such reused layer commits.

This fixes #2078

Assisted-by: Claude Code (Sonnet 4.5)

Layer digest can vary for a layer due to e.g. recompression, so at the
start of an import we enumerate all cached images and make a diff_id to
layer digest map. Then when pulling an image if the layer digest ref
doesn't exist, we try to lookup in the map and use that if it exists.

We also create new refs for all such reused layer commits.

This fixes bootc-dev#2078

Assisted-by: Claude Code (Sonnet 4.5)
Signed-off-by: Alexander Larsson <alexl@redhat.com>
@alexlarsson
Copy link
Contributor Author

This is an alternative to #2080

@github-actions github-actions bot added the area/ostree Issues related to ostree label Mar 19, 2026
@bootc-bot bootc-bot bot requested a review from gursewak1997 March 19, 2026 14:38
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

The pull request introduces a valuable layer deduplication mechanism by mapping "diff_id"s to blob digests. This enhancement allows for more efficient reuse of existing layers, even when their digests change due to factors like recompression. The integration of this logic into the ImageImporter struct and its associated methods, such as query_layer and ensure_ref_for_layer, is well-executed. This change significantly improves storage management and import efficiency.

Comment on lines +787 to +788
map.entry(diff_id_str)
.or_insert_with(|| layer_desc.digest().to_string());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The build_diffid_to_digest_map function uses or_insert_with to store only the first encountered blob_digest for a given diff_id. While this approach is valid for deduplication (as any valid digest for the content is sufficient), it might be beneficial to add a comment explaining this design choice. This clarifies that if multiple images have the same diff_id but different blob_digests (e.g., due to varying compression), only one of them will be used for future lookups. This is consistent with the PR description's goal of handling recompression.

Copy link
Collaborator

@cgwalters cgwalters left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this would be relatively easy to generate a test for in tests/it/main.rs which fetches content from OCI dirs where we can easily control this (1 layer gzip compressed, anotehr one zstd compressed) etc.

// Try to find a layer with the same diff_id but different blob digest
if let Some(existing_digest) = self.find_digest_by_diffid(manifest, config, layer) {
let existing_ref = ref_for_blob_digest(existing_digest)?;
if let Some(existing_commit) = self
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pretty deep nesting here, I think we could combinator chain this to reduce nesting with .or_else(|| find_digest_by_diffid()) or so - we'd probably need to Ok wrap the first case and also .transpose()?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/ostree Issues related to ostree

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ostree-container: Add diffid as metadata to refs

2 participants