Checkpoint-restore-checkpoint-restore loses changes to rootfs #4606

Closed
t184256 opened this issue Dec 1, 2019 · 5 comments · Fixed by #4643
Labels
kind/bug Categorizes issue or PR as related to a bug. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments.

Comments

@t184256

t184256 commented Dec 1, 2019

Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)

/kind bug

Description

I believe that diffing the root filesystem for checkpointing picks the wrong base on consecutive checkpoints.

Steps to reproduce the issue:

set -ex
podman run --name test -dit alpine
podman exec -l sh -c 'touch SOME_FILE'
podman exec -l sh -c 'ls *_FILE'                  # SOME_FILE
podman container checkpoint -l -e podman.tar
podman rm test
podman container restore -i podman.tar -n clone
podman exec -l sh -c 'touch SECOND_FILE'
podman exec -l sh -c 'ls *_FILE'                  # SOME_FILE SECOND_FILE
podman container checkpoint -l -e podman.tar
podman rm clone
podman container restore -i podman.tar -n clone
podman exec -l sh -c 'touch THIRD_FILE'
podman exec -l sh -c 'ls *_FILE'                  # SOME_FILE THIRD_FILE (SECOND_FILE is missing)
podman container stop -t1 clone
podman rm clone

Describe the results you received:

Changes made to the rootfs between the first and second checkpoints are lost (SECOND_FILE is missing after the second restore).

Describe the results you expected:

All changes to the rootfs are preserved.

Additional information you deem important (e.g. issue happens only occasionally):

Sorry for not testing against the latest version, but updating breaks my setup, and a cursory glance at the files touched in #3443 suggests there have been no relevant changes since then.

Possibly interested person: @adrianreber
Possibly relevant starting place for investigation: GetDiffTarStream https://github.com/containers/libpod/blob/39c705e9405faa4d02b71165d05eec1e7bb44d93/libpod/diff.go#L66

Output of podman version:

Version:            1.6.2
RemoteAPI Version:  1
Go Version:         go1.12.10
OS/Arch:            linux/amd64

Output of podman info --debug:

  compiler: gc
  git commit: ""
  go version: go1.12.10
  podman version: 1.6.2
host:
  BuildahVersion: 1.11.3
  CgroupVersion: v1
  Conmon:
    package: conmon-2.0.2-1.fc30.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.0.2, commit: a89d21975ee86e84e0b0e1c0f887687582f4b0e3'
  Distribution:
    distribution: fedora
    version: "30"
  IDMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
  MemFree: 724291584
  MemTotal: 16424656896
  OCIRuntime:
    name: runc
    package: containerd.io-1.2.10-3.2.fc30.x86_64
    path: /usr/bin/runc
    version: |-
      runc version 1.0.0-rc8+dev
      commit: 3e425f80a8c931f88e6d94a8c831b9d5aa481657
      spec: 1.0.1-dev
  SwapFree: 8268279808
  SwapTotal: 8296329216
  arch: amd64
  cpus: 8
  eventlogger: journald
  hostname: asosedkin-t480s
  kernel: 5.3.12-200.fc30.x86_64
  os: linux
  rootless: true
  slirp4netns:
    Executable: /usr/bin/slirp4netns
    Package: slirp4netns-0.4.0-4.git19d199a.fc30.x86_64
    Version: |-
      slirp4netns version 0.4.0-beta.2
      commit: 19d199a6ca424fcf9516320a327cedad85cf4dfb
  uptime: 27h 31m 14.02s (Approximately 1.12 days)
registries:
  blocked: null
  insecure: null
  search:
  - docker.io
  - registry.fedoraproject.org
  - quay.io
  - registry.access.redhat.com
  - registry.centos.org
store:
  ConfigFile: /home/asosedki/.config/containers/storage.conf
  ContainerStore:
    number: 0
  GraphDriverName: overlay
  GraphOptions:
    overlay.mount_program:
      Executable: /usr/bin/fuse-overlayfs
      Package: fuse-overlayfs-0.6.5-2.fc30.x86_64
      Version: |-
        fusermount3 version: 3.6.2
        fuse-overlayfs: version 0.6.5
        FUSE library version 3.6.2
        using FUSE kernel interface version 7.29
  GraphRoot: /home/asosedki/.local/share/containers/storage
  GraphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "false"
  ImageStore:
    number: 0
  RunRoot: /run/user/1000
  VolumePath: /home/asosedki/.local/share/containers/storage/volumes

Package info (e.g. output of rpm -q podman or apt list podman):

podman-1.6.2-2.fc30.x86_64

Additional environment details (AWS, VirtualBox, physical, etc.):

A fairly regular Fedora 30 box.

@openshift-ci-robot openshift-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Dec 1, 2019
@adrianreber
Collaborator

Thanks for the bug report. I just tried it on RHEL 8 with the latest podman from git and I cannot reproduce it. I will also try it on Fedora 30 next.

Thinking about it, this should not be a problem. We always diff against the original layer, and on restore all rootfs changes are applied on top of that layer, which should give the correct result in subsequent checkpoints.

@adrianreber
Collaborator

I can reproduce the problem on Fedora 30. Strange.

@adrianreber
Copy link
Collaborator

I can see that the rootfs diff in the exported checkpoint archive has the wrong content: the complete file system is included instead of only the changes. Confirmed that this is broken. I will provide a fix.
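One way to check what a checkpoint export actually contains is to list the rootfs diff stored inside it. The following is a minimal sketch, assuming the file written by `podman container checkpoint -e` is a tar archive with a nested member named `rootfs-diff.tar`; that member name is an assumption about the archive layout and may differ between podman versions. The function name `list_rootfs_diff` is hypothetical.

```shell
# Hypothetical helper: list the entries of the rootfs diff inside a
# checkpoint export. ASSUMPTION: the export is a tar archive containing
# a nested tar named rootfs-diff.tar.
list_rootfs_diff() {
    # -O (--to-stdout) extracts the member to stdout; the second tar
    # reads that stream from stdin (-f -) and lists its entries.
    tar -xOf "$1" rootfs-diff.tar | tar -tf -
}
```

For example, running `list_rootfs_diff podman.tar` after the second checkpoint in the reproduction steps should list only the changed files such as SOME_FILE and SECOND_FILE, not the complete Alpine file system.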

@t184256
Author

t184256 commented Dec 2, 2019

Tried the same Fedora 30 box with podman-1.6.2-2.fc31 and runtime = "runc": same result.

@t184256
Author

t184256 commented Dec 19, 2019

I've tried podman 2:1.7.0-0.16.dev.gitc1a7911.fc32 and now it works as expected. Thanks!

@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 23, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 23, 2023

3 participants