Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

copy: temporarily disable early commits #1205

Merged
merged 1 commit into from Apr 19, 2021

Conversation

vrothberg
Copy link
Member

@vrothberg vrothberg commented Apr 19, 2021

Temporarily disable committing blobs early to fix a bug related to
mistakenly committing empty layes [1]. The plan is to reeanble the
feature when a solid solution for passing down the layer information
is found.

[1] #1205 (comment)

vrothberg added a commit to vrothberg/buildah that referenced this pull request Apr 19, 2021
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
@vrothberg
Copy link
Member Author

Please don't merge before containers/buildah#3156 is green. I want to make sure it's passing the buildah bud tests which exercise the changed code a lot.

@vrothberg
Copy link
Member Author

@rhatdan @giuseppe @lsm5 @nalind PTAL

@rhatdan
Copy link
Member

rhatdan commented Apr 19, 2021

LGTM

Copy link
Collaborator

@mtrmac mtrmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NAK at this point. This might be the right fix but it isn’t obvious to me.

At a first glance I don’t think this is right at all; the current c/image/storage code is filtering out empty layers everywhere. At the very least, the rest of the transport needs to be adjusted to be consistent; see all the other references to Throwaway and EmptyLayer.


More importantly, there’s a history to the consistency check of buildah-expected vs. actual layer count; not physically storing empty layers has been an intentional decision the last time this inconsistency happened.

Compare https://github.com/containers/buildah/blob/c7ed4b60f4f298cf2c0ba56dfa1c0c566923109d/image.go#L111

Could you write down exactly how the current implementation fails (@rhatdan please file bugs in GitHub next time, so that there is something to refer to) and how the Buildah and c/image assumptions diverge, please?

@vrothberg
Copy link
Member Author

vrothberg commented Apr 19, 2021

Could you write down exactly how the current implementation fails (@rhatdan please file bugs in GitHub next time, so that there is something to refer to) and how the Buildah and c/image assumptions diverge, please?

I don't think it's specific to Buildah. The error can easily be reproduced with Podman as well.

podman (master) $ ./bin/podman image tree pause
Image ID: 350b164e7ae1
Tags:     [k8s.gcr.io/pause:latest]
Size:     247.1kB
Image Layers
├──  ID: 5f70bf18a086 Size: 1.024kB
├──  ID: 9663e2bcfcdc Size: 241.7kB
└──  ID: 053b98296fa9 Size: 1.024kB Top Layer of: [k8s.gcr.io/pause:latest]

The layers made it to the storage before commit 538c2f8. Could it be that 538c2f8#diff-5f5309b6d0cce8cb28452c98db55efdd9ccc3386ce6c86b5e54738498240b472R718 should not happen?

@mtrmac
Copy link
Collaborator

mtrmac commented Apr 19, 2021

Could it be that 538c2f8#diff-5f5309b6d0cce8cb28452c98db55efdd9ccc3386ce6c86b5e54738498240b472R718 should not happen?

Yes, that looks much closer to the cause: the original image has two GzippedEmptyLayerDigest layers:

$ skopeo inspect --raw docker://k8s.gcr.io/pause

    "layers": [
        {
            "digest": "sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4",
            "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
            "size": 32
        },
        {
            "digest": "sha256:4964c72cd0245a7f77da38425dc98b472b2699ba6c49d5a9221fb32b972bc06b",
            "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
            "size": 69631
        },
        {
            "digest": "sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4",
            "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
            "size": 32
        }

but they are not marked as EmptyLayer, and are present in DiffIDs:

$ skopeo inspect --config --raw docker://k8s.gcr.io/pause | python -mjson.tool

    "history": [
        {
            "created": "2013-06-13T14:03:50.821769-07:00"
        },
        {
            "created": "2014-07-19T07:02:32.221484321Z",
            "created_by": "/bin/sh"
        },
        {
            "created": "2014-07-19T07:02:32.267701596Z",
            "created_by": "/bin/sh"
        }
    ],

    "rootfs": {
        "diff_ids": [
            "sha256:5f70bf18a086007016e948b04aed3b82103a36bea41755b6cddfaf10ace3c6ef",
            "sha256:e16a89738269fec22db26ec6362823a9ec42d0163685d88ba03c4fb5d5e723f6",
            "sha256:5f70bf18a086007016e948b04aed3b82103a36bea41755b6cddfaf10ace3c6ef"
        ],
        "type": "layers"
    }

So ideally we should propagate manifest.Manifest.LayerInfos()[].EmptyLayer from the copy code all the way through to PutBlobWithOptions / TryReusingBlobWithOptions. Or maybe change the c/storage model so that EmptyLayers or layers with GzippedEmptyLayerDigest are not physically stored — but that would break the DiffID count check linked above on c/storage vs. c/image version mismatches.

@mtrmac
Copy link
Collaborator

mtrmac commented Apr 19, 2021

For the record, the original problem report:

$ buildah pull  k8s.gcr.io/pause

$ podman save --format oci-archive k8s.gcr.io/pause >/tmp/test.tar
Error: unable to save "k8s.gcr.io/pause": error copying image to the 
remote destination: error creating LayerInfosForCopy of image 
"350b164e7ae1dcddeffadd65c76226c9b6dc5553f5179153fb0e36b78f2a5e06": 
expected more than 2 physical layers to exist

fails with c/image 5.11.0

@mtrmac
Copy link
Collaborator

mtrmac commented Apr 19, 2021

Or maybe change the c/storage model so that EmptyLayers or layers with GzippedEmptyLayerDigest are not physically stored

…that wouldn’t work correctly for EmptyLayers with non-GzippedEmptyLayerDigest digest, because the same early commit code woudn’t know about the EmptyLayer flag. Such layers are probably very rare, but it just seems easier to do the fully correct thing and propagate EmptyLayer from the source. (This should be possible to do without much API breakage, because c/image/copy doesn’t use a generic types.Image implementation, but specifically image.FromUnparsedImage; so we can define a new non-interface type that provides the new data, and keep types.Image unchanged.)

@vrothberg
Copy link
Member Author

@mtrmac, to make sure I understood correctly. We should fetch the source manifest in copy/copy.go and pass the manifest.LayerInfo down to the ...WithOptions funcs?

@mtrmac
Copy link
Collaborator

mtrmac commented Apr 19, 2021

copy/copy.go is already reading the source manifest in order to obtain LayerInfos, to know what blobs to copy in the first place. And that, in turn, is already calling manifest.Manifest.LayerInfos, and possibly reading the config, which is the authoritative source for the EmptyLayer property of layers. The correct “empty layer” boolean for each layer already exists in memory.

So I was thinking, very roughly:

  • Extend image.genericManifest to provide a LayerInfosWithEmptyLayer or something like that (actually simplifies code from the current LayerInfos, which gets the data and drops it).
  • Have image.FromUnparsedImage (or a new function??) return a sourcedImage (now a public type??) that provides the LayerInfosWithEmptyLayer
  • Augment c/storage (using the existing ImageDestinationWithOptions) to get a similar LayerInfosForCopyWithEmptyLayer
  • Read this in copy.copyLayers
  • Given all that, the “empty layer” boolean would be available to copy.copyLayer, and could be passed down to the WithOptions function (either as a part of the LayerInfo or separately, I don’t know).

@mtrmac
Copy link
Collaborator

mtrmac commented Apr 19, 2021

possibly reading the config

To be more precise, actually empty layers are only allowed to exist as physical blobs in schema1 manifests, so we don’t read the config blob for this purpose.

@mtrmac
Copy link
Collaborator

mtrmac commented Apr 19, 2021

  • Augment c/storage (using the existing ImageDestinationWithOptions) to get a similar LayerInfosForCopyWithEmptyLayer

Oops, the caller can submit an arbitrary ImageSource implementation. Ugh. Maybe it would be fine to read the EmptyLayer boolean just from the sourcedImage and not let LayerInfosForCopy override it, I don’t know.

@vrothberg
Copy link
Member Author

I begin to think that reverting commit 538c2f8 would make sense as a quick fix (which we need). That buys us time to come up with a solid solution.

@rhatdan @mtrmac WDYT?

@mtrmac
Copy link
Collaborator

mtrmac commented Apr 19, 2021

If you intend to get this working fairly quickly, maybe not outright revert it, but just make it ineffective by using (options.LayerIndex == nil || true), so that we minimize other merge conflicts? I don’t have a strong opinion, I’m just noting this possibility.

Temporarily disable committing blobs early to fix a bug related to
mistakenly committing empty layes [1].  The plan is to reeanble the
feature when a solid solution for passing down the layer information
is found.

[1] containers#1205 (comment)

Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
@vrothberg vrothberg changed the title storage: commit empty non-base layers copy: temporarily disable early commits Apr 19, 2021
@vrothberg
Copy link
Member Author

If you intend to get this working fairly quickly, maybe not outright revert it, but just make it ineffective by using (options.LayerIndex == nil || true), so that we minimize other merge conflicts? I don’t have a strong opinion, I’m just noting this possibility.

Disabled it with a constant. That should buy us some time.

@mtrmac
Copy link
Collaborator

mtrmac commented Apr 19, 2021

ACK.

@rhatdan
Copy link
Member

rhatdan commented Apr 19, 2021

LGTM

@rhatdan rhatdan merged commit 6805236 into containers:master Apr 19, 2021
@vrothberg vrothberg deleted the fix-commit branch June 2, 2021 09:26
@vrothberg
Copy link
Member Author

FYI: I just got back to the issue. Working on libimage and polishing Podman v3.2 consumed most of my time since then but I have cycles now.

vrothberg added a commit to vrothberg/image that referenced this pull request Jun 2, 2021
The early committing of blobs has been temporarily disabled in containers#1205.
There was a regression in deciding whether an empty layer may be
committed to the storage or not [1].

Initially, the early committing considerred a layer to empty (and hence
not commit it to storage) when it matched the shasum of an empty GZIP
layer.  As it turned out, that is not a reliable indicator since some
images may still include these layers in the diff IDs [2].

Re-enable the early committing of blobs by passing down the emptiness
information down by querrying the image's manifest.

Signed-off-by: Valentin Rothberg <rothberg@redhat.com>

[1] containers#1205 (comment)
[2] containers#1205 (comment)
vrothberg added a commit to vrothberg/image that referenced this pull request Jun 2, 2021
The early committing of blobs has been temporarily disabled in containers#1205.
There was a regression in deciding whether an empty layer may be
committed to the storage or not [1].

Initially, the early committing considerred a layer to empty (and hence
not commit it to storage) when it matched the shasum of an empty GZIP
layer.  As it turned out, that is not a reliable indicator since some
images may still include these layers in the diff IDs [2].

Re-enable the early committing of blobs by passing down the emptiness
information down by querrying the image's manifest.

Signed-off-by: Valentin Rothberg <rothberg@redhat.com>

[1] containers#1205 (comment)
[2] containers#1205 (comment)

Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
vrothberg added a commit to vrothberg/image that referenced this pull request Jun 2, 2021
The early committing of blobs has been temporarily disabled in containers#1205.
There was a regression in deciding whether an empty layer may be
committed to the storage or not [1].

Initially, the early committing considerred a layer to empty (and hence
not commit it to storage) when it matched the shasum of an empty GZIP
layer.  As it turned out, that is not a reliable indicator since some
images may still include these layers in the diff IDs [2].

Re-enable the early committing of blobs by passing down the emptiness
information down by querrying the image's manifest.

Signed-off-by: Valentin Rothberg <rothberg@redhat.com>

[1] containers#1205 (comment)
[2] containers#1205 (comment)

Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
vrothberg added a commit to vrothberg/image that referenced this pull request Jun 3, 2021
The early committing of blobs has been temporarily disabled in containers#1205.
There was a regression in deciding whether an empty layer may be
committed to the storage or not [1].

Initially, the early committing considerred a layer to empty (and hence
not commit it to storage) when it matched the shasum of an empty GZIP
layer.  As it turned out, that is not a reliable indicator since some
images may still include these layers in the diff IDs [2].

Re-enable the early committing of blobs by passing down the emptiness
information down by querrying the image's manifest.

Signed-off-by: Valentin Rothberg <rothberg@redhat.com>

[1] containers#1205 (comment)
[2] containers#1205 (comment)

Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants