Panic: attempted to update last-writer in lockfile without the write lock #1324

Closed
tyler92 opened this issue Sep 7, 2022 · 5 comments · Fixed by #1351

Comments

tyler92 (Contributor) commented Sep 7, 2022

For some reason I had an incomplete layer (according to the log file), and the next Podman PlayKube command led to a panic in the storage module. A cropped part of the stack trace (see the full trace in the attachment):

Sep 07 06:24:54 host podman[468]: time="2022-09-07T06:24:54Z" level=warning msg="Found incomplete layer "b229cb7be9f40343b5938e9ea4c58cbe9245da1ebbda2bd0e1b4600b56036080", deleting it"
Sep 07 06:24:54 host podman[468]: time="2022-09-07T06:24:54Z" level=warning msg="Recovering from API service endpoint handler panic: attempted to update last-writer in lockfile without the write lock, goroutine 43 [running]:
github.com/containers/podman/v4/pkg/api/server.panicHandler.func1.1.1
	github.com/containers/podman/v4/pkg/api/server/handler_panic.go:22 +0x78
panic
	runtime/panic.go:1038 +0x23c
github.com/containers/storage/pkg/lockfile.(*lockfile).Touch
	github.com/containers/storage@v1.41.1-0.20220616120034-7df64288ef35/pkg/lockfile/lockfile_unix.go:247 +0x198
github.com/containers/storage.(*layerStore).Touch
	github.com/containers/storage@v1.41.1-0.20220616120034-7df64288ef35/layers.go:1897 +0x28
github.com/containers/storage.(*layerStore).saveLayers
	github.com/containers/storage@v1.41.1-0.20220616120034-7df64288ef35/layers.go:498 +0x1d4
github.com/containers/storage.(*layerStore).Load
	github.com/containers/storage@v1.41.1-0.20220616120034-7df64288ef35/layers.go:422 +0x534
github.com/containers/storage.(*store).newLayerStore
	github.com/containers/storage@v1.41.1-0.20220616120034-7df64288ef35/layers.go:559 +0x39c
github.com/containers/storage.(*store).LayerStore
	github.com/containers/storage@v1.41.1-0.20220616120034-7df64288ef35/store.go:913 +0x398
github.com/containers/storage.(*store).layersByMappedDigest
	github.com/containers/storage@v1.41.1-0.20220616120034-7df64288ef35/store.go:3030 +0x40
github.com/containers/storage.(*store).LayersByUncompressedDigest
	github.com/containers/storage@v1.41.1-0.20220616120034-7df64288ef35/store.go:3072 +0xf8
github.com/containers/image/v5/storage.(*storageImageSource).getBlobAndLayerID
	github.com/containers/image/v5@v5.21.2-0.20220617075545-929f14a56f5c/storage/storage_image.go:202 +0x70
github.com/containers/image/v5/storage.(*storageImageSource).GetBlob
	github.com/containers/image/v5@v5.21.2-0.20220617075545-929f14a56f5c/storage/storage_image.go:163 +0x98
github.com/containers/image/v5/internal/image.(*manifestOCI1).ConfigBlob
	github.com/containers/image/v5@v5.21.2-0.20220617075545-929f14a56f5c/internal/image/oci.go:66 +0x148
github.com/containers/image/v5/internal/image.(*manifestOCI1).OCIConfig
	github.com/containers/image/v5@v5.21.2-0.20220617075545-929f14a56f5c/internal/image/oci.go:92 +0xa8
github.com/containers/podman/v4/libpod.(*storageService).CreateContainerStorage
	github.com/containers/podman/v4/libpod/storage.go:92 +0x1e4
github.com/containers/podman/v4/libpod.(*Container).setupStorage
	github.com/containers/podman/v4/libpod/container_internal.go:506 +0x6e4
github.com/containers/podman/v4/libpod.(*Runtime).setupContainer
	github.com/containers/podman/v4/libpod/runtime_ctr.go:424 +0x1178
github.com/containers/podman/v4/libpod.(*Runtime).newContainer
	github.com/containers/podman/v4/libpod/runtime_ctr.go:236 +0x184
github.com/containers/podman/v4/libpod.(*Runtime).NewContainer
	github.com/containers/podman/v4/libpod/runtime_ctr.go:52 +0xf8
github.com/containers/podman/v4/pkg/specgen/generate.ExecuteCreate
	github.com/containers/podman/v4/pkg/specgen/generate/container_create.go:239 +0x5c
github.com/containers/podman/v4/pkg/specgen/generate.MakePod
	github.com/containers/podman/v4/pkg/specgen/generate/pod_create.go:66 +0x4c0
github.com/containers/podman/v4/pkg/domain/infra/abi.(*ContainerEngine).playKubePod
	github.com/containers/podman/v4/pkg/domain/infra/abi/play.go:464 +0x1820
github.com/containers/podman/v4/pkg/domain/infra/abi.(*ContainerEngine).PlayKube
	github.com/containers/podman/v4/pkg/domain/infra/abi/play.go:174 +0x12ec
github.com/containers/podman/v4/pkg/api/handlers/libpod.PlayKube

Full log with panic: podman-panic.log

It seems the condition at layers.go#L397 is wrong: the Locked function may return true even when the lock is owned by another thread, so the lock may be released at any time after Locked returns true. We should probably acquire the reader lock in newLayerStore.
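
For illustration, here is a minimal toy model of that race in Go; the lockfile type and its methods are invented for this sketch and are not the actual containers/storage implementation. The point is only that a Locked()-style query reports that somebody holds the lock, not that the caller does, and the real owner can release it right after the check.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// lockfile is a toy stand-in, not the real containers/storage lockfile.
type lockfile struct {
	mu     sync.Mutex
	locked atomic.Bool
}

func (l *lockfile) Lock()   { l.mu.Lock(); l.locked.Store(true) }
func (l *lockfile) Unlock() { l.locked.Store(false); l.mu.Unlock() }

// Locked reports whether *anyone* currently holds the lock,
// not whether the calling goroutine does.
func (l *lockfile) Locked() bool { return l.locked.Load() }

func main() {
	var lf lockfile

	// Goroutine A takes the lock and releases it a little later.
	go func() {
		lf.Lock()
		time.Sleep(10 * time.Millisecond)
		lf.Unlock()
	}()
	time.Sleep(time.Millisecond) // give A a chance to acquire the lock first

	// Main never took the lock, yet it can observe Locked() == true,
	// and A may release the lock at any point after this check.
	if lf.Locked() {
		fmt.Println("Locked() is true, but this goroutine does not own the lock")
	}
}
```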

rhatdan (Member) commented Sep 7, 2022

@nalind @giuseppe @vrothberg PTAL

nalind (Member) commented Sep 7, 2022

We need to take either a read lock or a write lock before calling the new layer store's Load() method in newLayerStore(). At that point Locked() will return the correct value, as it would if we called LoadLocked() instead.
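
A minimal sketch of that pattern, with an in-process sync.RWMutex standing in for the on-disk lockfile and with invented names (this is not the actual containers/storage code): the constructor acquires the lock itself around the initial load, so any later "is the lock held?" check refers to a lock this goroutine really owns.

```go
package main

import (
	"fmt"
	"sync"
)

// layerStore is a toy model of the real store; the real code guards its
// on-disk data with a lockfile rather than an in-process RWMutex.
type layerStore struct {
	lock   sync.RWMutex
	layers []string
}

// load reads the persisted state. Like the real Load(), it assumes the
// caller already holds at least the read lock.
func (s *layerStore) load() error {
	s.layers = []string{"layer-a", "layer-b"} // pretend this was parsed from layers.json
	return nil
}

// newLayerStore takes the lock around the initial load instead of calling
// load() unlocked and trusting a Locked()-style check later. A write lock
// is used here because the first load may need to clean up incomplete
// layers and rewrite the state.
func newLayerStore() (*layerStore, error) {
	s := &layerStore{}
	s.lock.Lock()
	defer s.lock.Unlock()
	if err := s.load(); err != nil {
		return nil, err
	}
	return s, nil
}

func main() {
	s, err := newLayerStore()
	if err != nil {
		panic(err)
	}
	fmt.Println(s.layers)
}
```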

tyler92 added a commit to tyler92/storage that referenced this issue Sep 16, 2022
There was a possibility to panic due to such behavior:
attempted to update last-writer in lockfile without the write lock

Fixes: containers#1324

Signed-off-by: Mikhail Khachayants <tyler92@inbox.ru>
tyler92 added a commit to tyler92/storage that referenced this issue Sep 16, 2022
There was a possibility to panic due to such behavior:
attempted to update last-writer in lockfile without the write lock

Fixes: containers#1324

Signed-off-by: Mikhail Khachayants <tyler92@inbox.ru>

tyler92 (Contributor, Author) commented Sep 16, 2022

I created a PR, but I'm not able to test it thoroughly enough.

tyler92 added a commit to tyler92/storage that referenced this issue Sep 21, 2022
There was a possibility to panic due to such behavior:
attempted to update last-writer in lockfile without the write lock

Fixes: containers#1324

Signed-off-by: Mikhail Khachayants <tyler92@inbox.ru>

mtrmac (Collaborator) commented Oct 14, 2022

I’m missing something about what has happened here. Looking at the backtrace, there are two goroutines waiting for graphLock:

github.com/containers/storage.(*store).LayerStore(0x240ae40)
	github.com/containers/storage@v1.41.1-0.20220616120034-7df64288ef35/store.go:890 +0x58

and that’s fine.

But there are also two goroutines holding graphLock:

github.com/containers/storage.(*store).newLayerStore(0x240ae40, {0x25a24b0, 0x26}, {0x25a2510, 0x2e}, {0x13d99bc, 0x2990000})
	github.com/containers/storage@v1.41.1-0.20220616120034-7df64288ef35/layers.go:559 +0x39c
github.com/containers/storage.(*store).LayerStore(0x240ae40)
	github.com/containers/storage@v1.41.1-0.20220616120034-7df64288ef35/store.go:913 +0x398

and

github.com/containers/storage.(*store).getGraphDriver(0x240ae40)
	github.com/containers/storage@v1.41.1-0.20220616120034-7df64288ef35/store.go:866 +0xbc
github.com/containers/storage.(*store).LayerStore(0x240ae40)
	github.com/containers/storage@v1.41.1-0.20220616120034-7df64288ef35/store.go:900 +0x178

Does anyone know what is going on here?

(I don’t fully understand the operation of graphLock, but for now I’m focusing on that one simple mystery.)

mtrmac (Collaborator) commented Oct 14, 2022

> We need to take either a read lock or a write lock before calling the new layer store's Load() method in newLayerStore(). At that point Locked() will return the correct value

But Locked() had to return true even without owning the lock, otherwise we couldn’t get to the saveLayers code path. AFAICS that’s another problem, #1388 .
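
One hypothetical way to reach that code path and still trip the panic, following the race tyler92 described above (the names are invented for this sketch, not the real lockfile internals): a Locked()-based guard lets the save path proceed even though the caller never acquired the write lock, and by the time Touch() runs the lock is no longer held at all.

```go
package main

import "sync"

// Toy reconstruction of the failure chain discussed in this thread; all
// names are invented and do not mirror the real containers/storage code.
// Running this program is expected to panic at the Touch() call.
type lockfile struct {
	mu     sync.Mutex
	locked bool
}

func (l *lockfile) Lock()        { l.mu.Lock(); l.locked = true }
func (l *lockfile) Unlock()      { l.locked = false; l.mu.Unlock() }
func (l *lockfile) Locked() bool { return l.locked }

// Touch records the last writer and refuses to run without the write lock.
func (l *lockfile) Touch() {
	if !l.Locked() {
		panic("attempted to update last-writer in lockfile without the write lock")
	}
	// ...record this process as the last writer...
}

func main() {
	var lf lockfile

	lf.Lock() // pretend some other goroutine or process holds the lock right now

	sawLocked := lf.Locked() // the guard check: true, but says nothing about ownership

	lf.Unlock() // the real owner releases the lock at some later point

	if sawLocked {
		lf.Touch() // panics: the earlier Locked() observation proved nothing
	}
}
```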

giuseppe pushed a commit to giuseppe/storage that referenced this issue Dec 22, 2022
There was a possibility to panic due to such behavior:
attempted to update last-writer in lockfile without the write lock

Fixes: containers#1324

Signed-off-by: Mikhail Khachayants <tyler92@inbox.ru>
(cherry picked from commit 2ad8a08b85dd11f2edc990513b12a05259ccec02)
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
giuseppe pushed a commit to giuseppe/storage that referenced this issue Dec 22, 2022
There was a possibility to panic due to such behavior:
attempted to update last-writer in lockfile without the write lock

Fixes: containers#1324

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2153883

Signed-off-by: Mikhail Khachayants <tyler92@inbox.ru>
(cherry picked from commit 2ad8a08b85dd11f2edc990513b12a05259ccec02)
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
giuseppe pushed a commit to giuseppe/storage that referenced this issue Dec 23, 2022
There was a possibility to panic due to such behavior:
attempted to update last-writer in lockfile without the write lock

Fixes: containers#1324

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2153883

Signed-off-by: Mikhail Khachayants <tyler92@inbox.ru>
(cherry picked from commit e35b061)
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>