
[BUG] killing soci-snapshotter-grpc while a container is running requires manual cleanup #275

Closed
sparr opened this issue Jan 3, 2023 · 3 comments
Labels
bug Something isn't working

Comments

sparr (Contributor) commented Jan 3, 2023

Describe the bug
When the soci-snapshotter-grpc process is killed while a container is running, some of its mounts and metadata are left in a broken state that requires manual cleanup.

Steps To Reproduce

  1. soci-snapshotter-grpc
  2. ctr run --snapshotter soci [...]
  3. killall soci-snapshotter-grpc
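
Expanded as a rough shell sketch (the image reference, container ID, and use of sudo are illustrative assumptions; the reporter's exact invocation is not recorded here):

```shell
# Hypothetical expansion of the three steps above. $IMAGE is a placeholder for
# an image reference that already has a SOCI index available.

# 1. Start the snapshotter (often run as a systemd unit in practice).
sudo soci-snapshotter-grpc &

# 2. Run a container through the soci snapshotter.
sudo ctr run --rm --snapshotter soci "$IMAGE" soci-test

# 3. In another shell, kill the snapshotter while the container is still running.
sudo killall soci-snapshotter-grpc

# The FUSE mounts backing the container rootfs are now orphaned; listing them
# shows the leftover state that later requires manual cleanup.
mount | grep -i fuse
```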

Expected behavior
All state (running processes, files, container and image metadata in the data store and registry, etc.) is left in a condition that allows the snapshotter and the container to be started again without manual intervention.

Additional context
The problem statement is non-specific because I did not take sufficient notes when I encountered this while troubleshooting a more pressing issue. Reproducing and investigating the exact failure will be a necessary first step in resolving it.

sparr added the bug label on Jan 3, 2023
rdpsin (Contributor) commented Jan 3, 2023

The problem is that the FUSE mounts' lifetimes are tied to the snapshotter process. If the snapshotter crashes, the FUSE mounts die with it, and we currently have no way to re-mount them. A couple of options:

  1. Separate out the FUSE implementation from the snapshotter, so that the mounts still exist even if the snapshotter crashes.

  2. Persist some kind of state on disk that will allow the snapshotter to reconstruct the FUSE mount whenever it comes back online.
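
Until one of those options exists, recovery after a crash is manual. A minimal cleanup sketch, assuming a default install layout (the snapshotter root path and the FUSE fstype shown are assumptions, not details taken from this issue):

```shell
# Hypothetical manual cleanup of mounts orphaned by the crashed snapshotter.

# Find FUSE mounts left behind; the fstype string is an assumption, so a
# broader filter on the snapshotter root also works.
findmnt -t fuse.rawBridge
mount | grep soci

# Lazily unmount each stale mount point. A plain umount often fails with
# "Transport endpoint is not connected" because the FUSE server is gone.
# The path below assumes the default root and is illustrative only.
sudo umount -l /var/lib/soci-snapshotter-grpc/snapshotter/snapshots/<id>/fs
```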

Kern-- (Contributor) commented Jan 3, 2023

Is this the same problem? #93

Kern-- (Contributor) commented Jul 20, 2023

I think the "solution" to this is the config option to ignore the broken data: https://github.com/awslabs/soci-snapshotter/blob/main/config/service.go#L79

Maybe we could also have the SOCI snapshotter call containerd to remove the broken snapshots? That seems a bit weird, though.
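
For context, the manual equivalent of that today is to remove the broken snapshots through ctr; a rough sketch, assuming the default containerd namespace (the snapshot key is a placeholder):

```shell
# Hypothetical manual removal of broken soci snapshots via containerd's ctr.

# List snapshots held by the soci snapshotter.
sudo ctr --namespace default snapshots --snapshotter soci list

# Remove a broken snapshot by key so future containers can start cleanly.
sudo ctr --namespace default snapshots --snapshotter soci remove <snapshot-key>
```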
