
"podman manifest add" is not concurrent safe #14667

Closed
Romain-Geissler-1A opened this issue Jun 20, 2022 · 4 comments · Fixed by containers/common#1090
Labels
kind/bug Categorizes issue or PR as related to a bug. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments.

Comments

Romain-Geissler-1A (Contributor) commented Jun 20, 2022

Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)

/kind bug

Description

The command "podman manifest add" is not concurrent safe. If you try to add different images to the same manifest in parallel, sometimes one image is "lost".

Steps to reproduce the issue:

Reproduced from inside a container quay.io/podman/upstream started with --privileged mode

  1. Create tags my.registry/my-namespace/my-image:aarch64 and my.registry/my-namespace/my-image:x86_64 from dummy images of two distinct architectures:
[root@5c9771d2427c /]# podman pull --arch aarch64 fedora
Resolved "fedora" as an alias (/etc/containers/registries.conf.d/000-shortnames.conf)
Trying to pull registry.fedoraproject.org/fedora:latest...
Getting image source signatures
Copying blob e0878b843331 done
Copying config 5beb352982 done
Writing manifest to image destination
Storing signatures
WARNING: image platform (linux/arm64) does not match the expected platform (linux/aarch64)
5beb352982d939db3b957c8fd16da9d24bbd69f1d796169c5ddb649b3a03cf91


[root@5c9771d2427c /]# podman pull --arch x86_64 fedora
Resolved "fedora" as an alias (/etc/containers/registries.conf.d/000-shortnames.conf)
Trying to pull registry.fedoraproject.org/fedora:latest...
Getting image source signatures
Copying blob 75f075168a24 done
Copying config 3a66698e60 done
Writing manifest to image destination
Storing signatures
WARNING: image platform (linux/amd64) does not match the expected platform (linux/x86_64)
3a66698e604003f7822a0c73e9da50e090fda9a99fe1f2e1e2e7fe796cc803d5


[root@5c9771d2427c /]# podman tag 5beb352982d939db3b957c8fd16da9d24bbd69f1d796169c5ddb649b3a03cf91 my.registry/my-namespace/my-image:aarch64
[root@5c9771d2427c /]# podman tag 3a66698e604003f7822a0c73e9da50e090fda9a99fe1f2e1e2e7fe796cc803d5 my.registry/my-namespace/my-image:x86_64
  2. In a for loop, create 100 manifests and add both images to each one:

[root@5c9771d2427c /]# for i in {1..100}; do podman manifest create my-manifest-$i; podman manifest add my-manifest-$i containers-storage:my.registry/my-namespace/my-image:aarch64 & podman manifest add my-manifest-$i containers-storage:my.registry/my-namespace/my-image:x86_64 & done

  3. Wait for the previous commands to finish.

  4. Now inspect the manifests: some contain two images, others just one:

[root@5c9771d2427c /]# for i in {1..100}; do podman manifest inspect my-manifest-$i | jq '.manifests | length'; done
2
1
1
1
2
2
2
2
2
2
2
...

Describe the results you received:

Manifests are created with a "random" number of images.

Describe the results you expected:

Each manifest should always contain 2 images.
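The lost update is a classic read-modify-write race: each `podman manifest add` invocation reads the current list from storage, appends its instance to a private copy, and writes the whole list back, so if two invocations read the same snapshot, the second write erases the first one's instance. A minimal, deterministic Go sketch of that interleaving (the types and names here are illustrative, not podman's actual code):

```go
package main

import "fmt"

// store stands in for the on-disk manifest list that every
// `podman manifest add` process reads and rewrites in full.
var store []string

// raceAdd replays the bad interleaving deterministically: two
// invocations read the same snapshot, append to private copies,
// and the second write-back clobbers the first.
func raceAdd() []string {
	store = nil
	snapA := append([]string(nil), store...) // invocation A reads
	snapB := append([]string(nil), store...) // invocation B reads
	snapA = append(snapA, "my-image:aarch64")
	snapB = append(snapB, "my-image:x86_64")
	store = snapA // A writes the whole list back
	store = snapB // B overwrites it: A's instance is lost
	return store
}

func main() {
	fmt.Println(len(raceAdd())) // prints 1, not the expected 2
}
```

Run concurrently, the same pattern only loses an image when the reads actually overlap, which is why some of the 100 manifests above end up with 2 entries and others with 1.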

Additional information you deem important (e.g. issue happens only occasionally):

Output of podman version:

[root@5c9771d2427c /]# podman version
Client:       Podman Engine
Version:      4.2.0-dev
API Version:  4.2.0-dev
Go Version:   go1.18.3
Built:        Thu Jan  1 00:00:00 1970
OS/Arch:      linux/amd64
@openshift-ci openshift-ci bot added kind/bug Categorizes issue or PR as related to a bug. kind/feature Categorizes issue or PR as related to a new feature. labels Jun 20, 2022
flouthoc (Collaborator) commented:
Hi @Romain-Geissler-1A, thanks for creating the issue.

I have not played with a patch yet, but I think the manifest list API needs a lock. Buildah has similar code for commit, where it gets the locker for the manifest list and holds it while adding an image to the list. I think we need to do the same here, but I am inclined to do this at the libimage end itself. @mtrmac @vrothberg any issues if we do the locking at the libimage end instead of the calling end (even though it would be a breaking patch)?

Caution: I have not tested this yet, but something like the following should work:

diff --git a/libpod/runtime.go b/libpod/runtime.go
index 6c8a99846..d7a0be55d 100644
--- a/libpod/runtime.go
+++ b/libpod/runtime.go
@@ -86,7 +86,6 @@ type Runtime struct {
        libimageRuntime        *libimage.Runtime
        libimageEventsShutdown chan bool
        lockManager            lock.Manager
-
        // Worker
        workerChannel chan func()
        workerGroup   sync.WaitGroup
@@ -1040,6 +1039,11 @@ func (r *Runtime) configureStore() error {
        return nil
 }
 
+// Libpod return store
+func (r *Runtime) GetStore() storage.Store {
+       return r.store
+}
+
 // LibimageRuntime ... to allow for a step-by-step migration to libimage.
 func (r *Runtime) LibimageRuntime() *libimage.Runtime {
        return r.libimageRuntime
diff --git a/pkg/domain/infra/abi/manifest.go b/pkg/domain/infra/abi/manifest.go
index 8b52c335c..30640a589 100644
--- a/pkg/domain/infra/abi/manifest.go
+++ b/pkg/domain/infra/abi/manifest.go
@@ -10,6 +10,7 @@ import (
        "github.com/containers/common/libimage"
        cp "github.com/containers/image/v5/copy"
        "github.com/containers/image/v5/manifest"
+       "github.com/containers/common/libimage/manifests"
        "github.com/containers/image/v5/pkg/shortnames"
        "github.com/containers/image/v5/transports"
        "github.com/containers/image/v5/transports/alltransports"
@@ -185,6 +186,12 @@ func (ir *ImageEngine) ManifestAdd(ctx context.Context, name string, images []st
        if err != nil {
                return "", err
        }
+       locker, err := manifests.LockerForImage(ir.Libpod.GetStore(), manifestList.ID())
+       if err != nil {
+               return "", err
+       }
+       locker.Lock()
+       defer locker.Unlock()
 
        addOptions := &libimage.ManifestListAddOptions{
                All:                   opts.All,

or (this one is what I would prefer, but it would require all callers to stop locking at a higher level, making it a breaking patch):

diff --git a/vendor/github.com/containers/common/libimage/manifest_list.go b/vendor/github.com/containers/common/libimage/manifest_list.go
index 4e8959004..db0a8b629 100644
--- a/vendor/github.com/containers/common/libimage/manifest_list.go
+++ b/vendor/github.com/containers/common/libimage/manifest_list.go
@@ -221,6 +221,12 @@ type ManifestListAddOptions struct {
 // Add adds one or more manifests to the manifest list and returns the digest
 // of the added instance.
 func (m *ManifestList) Add(ctx context.Context, name string, options *ManifestListAddOptions) (digest.Digest, error) {
+       locker, err := manifests.LockerForImage(m.image.runtime.store, m.ID())
+       if err != nil {
+               return "", err
+       }
+       locker.Lock()
+       defer locker.Unlock()
        if options == nil {
                options = &ManifestListAddOptions{}
        }

@mheon mheon removed the kind/feature Categorizes issue or PR as related to a new feature. label Jun 21, 2022
vrothberg (Member) commented:

@mtrmac @vrothberg any issues if we do the locking at the libimage end instead of the calling end (but it would be a breaking patch)?

Ideally libimage should take care of that. If all callers of an API have to lock, it smells like something's missing.

flouthoc added a commit to flouthoc/common that referenced this issue Jul 14, 2022
`podman manifest add` uses `ManifestList.Add(`, but as of now `Add(` does
not lock while adding instances to the list, causing races in which
storage is not reloaded and is overridden by another invocation of the
command.

The problem is solved in two steps:

* Add -> LockByInstance: Acquire an fs lock by instance ID so any other
  invocation waits until this invocation completes its write.
* Add -> LockByInstance -> reload: Reload instance digests from storage
  just after acquiring the lock, to make sure we are not overriding any
  just-written instance.

Reproducer: containers/podman#14667 (comment)

Closes: containers/podman#14667

[NO NEW TESTS NEEDED]
[NO TESTS NEEDED]
This needs integration tests, so the race is hard to verify in CI.

Signed-off-by: Aditya R <arajan@redhat.com>
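The second step in the commit message above is easy to miss: taking the lock is not enough if the process keeps working on a snapshot it read before the lock was acquired; it must reload the list after locking, then write back before unlocking. A small sketch of the corrected ordering (a `sync.Mutex` stands in for the cross-process file lock, purely for illustration; names are hypothetical):

```go
package main

import (
	"fmt"
	"sync"
)

var (
	mu    sync.Mutex // stands in for the per-list file lock
	store []string   // stands in for the on-disk manifest list
)

// addWithReload follows the fixed pattern: take the lock first,
// THEN reload the list from storage, so an instance written by a
// concurrent invocation is picked up instead of overwritten.
func addWithReload(image string) {
	mu.Lock()
	defer mu.Unlock()
	list := append([]string(nil), store...) // reload under the lock
	list = append(list, image)
	store = list // write back before releasing the lock
}

func main() {
	addWithReload("my-image:aarch64")
	addWithReload("my-image:x86_64")
	fmt.Println(len(store)) // 2: no instance is lost
}
```

In the real fix the reload step re-reads the instance digests from containers/storage while the file lock is held, achieving the same effect across processes.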
flouthoc (Collaborator) commented Jul 14, 2022

Created a PR and verified it manually over various sample sizes with the reproducer shared here: #14667 (comment)

[fl@fedora bin]$ ./verify.sh 
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2

flouthoc (Collaborator) commented:
@Romain-Geissler-1A The above PR in c/common closes this issue: containers/common#1090. Feel free to verify the patch :) Thanks!

@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 20, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 20, 2023