Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

blobcache: avoid an unnecessary NewImage() #2500

Merged
merged 1 commit into from Jul 31, 2020

Conversation

nalind
Copy link
Member

@nalind nalind commented Jul 30, 2020

What type of PR is this?

/kind bug

What this PR does / why we need it:

Avoid calling NewImage() on the source reference when the ImageSource that we already have returns nil from its LayerInfosForCopy() method.

For our container-as-image references, that causes a re-extraction of uncached blobs, recomputing their digests, rebuilding the config blobs and manifests.

The image library's copy.Image() function then asks the source reference that we're wrapping for blobs that we listed in the rebuilt manifest, and if any of those values differ, we fail to read them.

This would have only affected builds that specified a blob cache.

How to verify it

Build an image with buildah bud with debugging enabled, using the hidden --blob-cache option. Observe that before this fix is applied, for each layer we commit, the layers list is computed twice, temporary directories are created twice, and the config blobs and manifests are generated twice. With the fix applied, these things should only happen once.

Which issue(s) this PR fixes:

None

Special notes for your reviewer:

Does this PR introduce a user-facing change?

None

@nalind
Copy link
Member Author

nalind commented Jul 30, 2020

I believe that when the contents of those new layers change between the two computations (for reasons I haven't pinned down yet), we get the errors seen in https://bugzilla.redhat.com/show_bug.cgi?id=1720730.

@rhatdan
Copy link
Member

rhatdan commented Jul 30, 2020

LGTM
Will we get an increase in speed do to this fix?

@nalind
Copy link
Member Author

nalind commented Jul 30, 2020

If you're using a blob cache (which is not used by default by the buildah or podman CLI tools), then yes, contents of layers that are added during the build will be extracted once instead of twice. OpenShift, for example, uses blob caches, so it will benefit.

Copy link
Member

@vrothberg vrothberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, good catch!

@vrothberg
Copy link
Member

I restarted the in_podman test. Upload failed.

@rhatdan
Copy link
Member

rhatdan commented Jul 31, 2020

bors r+

@bors
Copy link
Contributor

bors bot commented Jul 31, 2020

👎 Rejected by PR status

@nalind
Copy link
Member Author

nalind commented Jul 31, 2020

bors retry

bors bot added a commit that referenced this pull request Jul 31, 2020
2500: blobcache: avoid an unnecessary NewImage() r=rhatdan a=nalind

#### What type of PR is this?

/kind bug

#### What this PR does / why we need it:

Avoid calling `NewImage()` on the source reference when the `ImageSource` that we already have returns `nil` from its `LayerInfosForCopy()` method.

For our container-as-image references, that causes a re-extraction of uncached blobs, recomputing their digests, rebuilding the config blobs and manifests.

The image library's `copy.Image()` function then asks the source reference that we're wrapping for blobs that we listed in the rebuilt manifest, and if any of those values differ, we fail to read them.

This would have only affected builds that specified a blob cache.

#### How to verify it

Build an image with `buildah bud` with debugging enabled, using the hidden `--blob-cache` option.  Observe that before this fix is applied, for each layer we commit, the layers list is computed twice, temporary directories are created twice, and the config blobs and manifests are generated twice.  With the fix applied, these things should only happen once.

#### Which issue(s) this PR fixes:

None

#### Special notes for your reviewer:

#### Does this PR introduce a user-facing change?

```
None
```

Co-authored-by: Nalin Dahyabhai <nalin@redhat.com>
Avoid calling NewImage() on the source reference when the ImageSource
that we already have returns nil from its LayerInfosForCopy() method.

For our container-as-image references, that causes a re-extraction of
uncached blobs, recomputing their digests, rebuilding the config blobs
and manifests.

The image library's copy.Image() function then asks the source reference
that we're wrapping for blobs that we listed in the rebuilt manifest,
and if any of those values differ, we fail to read them.

This would have only affected builds that specified a blob cache.

Signed-off-by: Nalin Dahyabhai <nalin@redhat.com>
@@ -275,12 +275,11 @@ func (s *blobCacheSource) LayerInfosForCopy(ctx context.Context, instanceDigest
return nil, errors.Wrapf(err, "error getting layer infos for copying image %q through cache", transports.ImageName(s.reference))
}
if infos == nil {
image, err := s.reference.NewImage(ctx, &s.sys)
img, err := image.FromUnparsedImage(ctx, &s.sys, image.UnparsedInstance(s.source, nil))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This nil should be instanceDigest, right?

@nalind nalind force-pushed the blobcache-duplicate-mismatch branch from 35684f7 to 289a250 Compare July 31, 2020 14:08
@bors
Copy link
Contributor

bors bot commented Jul 31, 2020

Canceled.

@TomSweeneyRedHat
Copy link
Member

LGTM
bors got into a funky "Cancelled" state. Maybe you knocked it off the rails with your submit?

@TomSweeneyRedHat
Copy link
Member

bors retry

1 similar comment
@nalind
Copy link
Member Author

nalind commented Jul 31, 2020

bors retry

@bors
Copy link
Contributor

bors bot commented Jul 31, 2020

Already running a review

@bors
Copy link
Contributor

bors bot commented Jul 31, 2020

Build succeeded:

@bors bors bot merged commit 29f4d01 into containers:master Jul 31, 2020
@nalind nalind deleted the blobcache-duplicate-mismatch branch July 31, 2020 16:27
bors bot added a commit that referenced this pull request Aug 1, 2020
2501: [release-1.15] blobcache: avoid an unnecessary NewImage() r=rhatdan a=nalind

#### What type of PR is this?

/kind bug

#### What this PR does / why we need it:

Avoid calling `NewImage()` on the source reference when the `ImageSource` that we already have returns `nil` from its `LayerInfosForCopy()` method.

For our container-as-image references, that causes a re-extraction of uncached blobs, recomputing their digests, rebuilding the config blobs and manifests.

The image library's `copy.Image()` function then asks the source reference that we're wrapping for blobs that we listed in the rebuilt manifest, and if any of those values differ, we fail to read them.

This would have only affected builds that specified a blob cache.

#### How to verify it

Build an image with `buildah bud` with debugging enabled, using the hidden `--blob-cache` option.  Observe that before this fix is applied, for each layer we commit, the layers list is computed twice, temporary directories are created twice, and the config blobs and manifests are generated twice.  With the fix applied, these things should only happen once.

#### Which issue(s) this PR fixes:

None

#### Special notes for your reviewer:

Cherry-picked from #2500.

#### Does this PR introduce a user-facing change?

```
None
```

Co-authored-by: Nalin Dahyabhai <nalin@redhat.com>
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 1, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants