podman-4.8.0: "Rolling back transaction to validate database" #20902

Closed
srcshelton opened this issue Dec 4, 2023 · 8 comments

Labels
kind/bug, locked - please file new issue/PR

Comments

@srcshelton
Contributor

srcshelton commented Dec 4, 2023

Issue Description

$ sudo podman --db-backend sqlite --transient-store=false ps -a
ERRO[0000] Rolling back transaction to validate database: sql: transaction has already been committed or rolled back 
Error: database static dir "/space/podman/static" does not match our static dir "/mnt/podman-storage/libpod": database configuration mismatch

(This isn't a one-off error - it occurs on any invocation with this combination of backend database and transient store settings)

I'm not sure where either /space/podman/static or /mnt/podman-storage/libpod is coming from: static_dir is not set in /etc/containers/containers.conf, and bolt.db and db.sql exist in /var/run/podman/.
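
The wording of the error suggests a consistency check between a path recorded in the database and the path the current configuration resolves to. A minimal sketch of that kind of check, assuming hypothetical names (`ConfigMismatchError`, `validate_paths`) that are not taken from podman's source:

```python
class ConfigMismatchError(Exception):
    """Raised when a path stored in the database disagrees with the runtime path."""


def validate_paths(stored: dict[str, str], runtime: dict[str, str]) -> None:
    """Compare each path recorded in the database with the runtime value.

    Both dicts map a setting name (for example "static dir") to a path.
    This is an illustration only, not podman's implementation.
    """
    for name, db_path in stored.items():
        ours = runtime.get(name)
        if ours is not None and db_path != ours:
            raise ConfigMismatchError(
                f'database {name} "{db_path}" does not match our {name} "{ours}": '
                "database configuration mismatch"
            )


if __name__ == "__main__":
    try:
        validate_paths(
            {"static dir": "/space/podman/static"},
            {"static dir": "/mnt/podman-storage/libpod"},
        )
    except ConfigMismatchError as err:
        print(f"Error: {err}")
```

Read this way, /space/podman/static would be the value recorded when the database was created and /mnt/podman-storage/libpod the value computed for the current invocation, though the thread does not confirm where either comes from.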

Invocations which do work correctly are:

sudo podman --db-backend sqlite --transient-store=true ps -a
sudo podman --db-backend bolt --transient-store=false ps -a
sudo podman --db-backend bolt --transient-store=true ps -a

static_dir declarations:

$ sudo grep -R static /etc/containers/
/etc/containers/libpod.conf.deprecated:#static_dir = "/var/lib/containers/storage/libpod"
/etc/containers/libpod.conf.deprecated:static_dir = "/space/podman/static"
/etc/containers/containers.conf:#static_dir = "/var/lib/containers/storage/libpod"
/etc/containers/containers.conf:#static_dir = "/space/podman/static"
/etc/containers/containers.conf.example:# static_dir = "/var/lib/containers/storage/libpod"

$ sudo grep -R 'var/run' /etc/containers/
/etc/containers/libpod.conf.deprecated:tmp_dir = "/var/run/libpod"
/etc/containers/containers.conf:remote_uri= "unix://var/run/podman/podman.sock"
/etc/containers/containers.conf:#tmp_dir = "/var/run/libpod"
/etc/containers/containers.conf:tmp_dir = "/var/run/podman"
/etc/containers/containers.conf.example:# tmp_dir = "/var/run/libpod"
/etc/containers/storage.conf:#runroot = "/var/run/containers/storage"
/etc/containers/storage.conf:runroot = "/var/run/podman"
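
Since the grep output above mixes commented and uncommented lines, it can help to filter it down to the settings that are actually in effect. A small sketch, assuming the input is in `grep -R` form (`<file>:<line>`); the helper name `active_settings` is made up for illustration:

```python
def active_settings(grep_lines: list[str], key: str) -> list[tuple[str, str]]:
    """Return (file, value) pairs for uncommented `key = value` lines.

    Each input line is expected in `grep -R` form: "<file>:<matched line>".
    """
    found = []
    for line in grep_lines:
        path, _, content = line.partition(":")
        content = content.strip()
        if content.startswith("#"):
            continue  # commented out, so not in effect
        name, sep, value = content.partition("=")
        if sep and name.strip() == key:
            found.append((path, value.strip().strip('"')))
    return found


if __name__ == "__main__":
    lines = [
        '/etc/containers/libpod.conf.deprecated:#static_dir = "/var/lib/containers/storage/libpod"',
        '/etc/containers/libpod.conf.deprecated:static_dir = "/space/podman/static"',
        '/etc/containers/containers.conf:#static_dir = "/var/lib/containers/storage/libpod"',
        '/etc/containers/containers.conf:#static_dir = "/space/podman/static"',
        '/etc/containers/containers.conf.example:# static_dir = "/var/lib/containers/storage/libpod"',
    ]
    for path, value in active_settings(lines, "static_dir"):
        print(f"{path}: static_dir = {value}")
```

On the static_dir lines above, the only uncommented entry is the one in libpod.conf.deprecated. Whether podman 4.8.0 still reads that file is not established in this thread.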

Steps to reproduce the issue

  1. Perform any podman operation with sqlite as the database backend and the transient store disabled (--transient-store=false).

Describe the results you received

The above error message - which was not present in podman-4.7.2.

(I briefly tested podman-4.8.0_rc1, so there's a possibility that something broken in the RC has persisted?)

This is immediately after a reboot, so directories such as /var/run will have been cleared just beforehand, and podman system prune --force runs as part of the system shutdown procedure. That suggests that whatever corruption has crept in either lives in a persistent directory which survives a system prune, or occurred shortly into the new session.

Describe the results you expected

Any database errors should be manually or automatically resolvable, or podman should at least be specific about what exact file is affected and how the user might resolve the situation.

podman info output

host:
  arch: amd64
  buildahVersion: 1.33.2
  cgroupControllers:
  - cpuset
  - cpu
  - io
  - memory
  - hugetlb
  - pids
  - rdma
  - misc
  cgroupManager: cgroupfs
  cgroupVersion: v2
  conmon:
    package: app-containers/conmon-2.1.8
    path: /usr/libexec/podman/conmon
    version: 'conmon version 2.1.8, commit: 00e08f4a9ca5420de733bf542b930ad58e1a7e7d'
  cpuUtilization:
    idlePercent: 97.42
    systemPercent: 0.88
    userPercent: 1.7
  cpus: 8
  databaseBackend: sqlite
  distribution:
    distribution: gentoo
    version: "2.14"
  eventLogger: file
  freeLocks: 2000
  hostname: dellr330
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 6.6.3-gentoo
  linkmode: dynamic
  logDriver: k8s-file
  memFree: 52513648640
  memTotal: 67331047424
  networkBackend: netavark
  networkBackendInfo:
    backend: netavark
    dns:
      package: app-containers/aardvark-dns-1.9.0
      path: /usr/libexec/podman/aardvark-dns
      version: aardvark-dns 1.9.0
    package: app-containers/netavark-1.9.0
    path: /usr/libexec/podman/netavark
    version: netavark 1.9.0
  ociRuntime:
    name: crun
    package: app-containers/crun-1.12
    path: /usr/bin/crun
    version: |-
      crun version 1.12
      commit: ce429cb2e277d001c2179df1ac66a470f00802ae
      rundir: /var/run/crun
      spec: 1.0.0
      +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +YAJL
  os: linux
  pasta:
    executable: ""
    package: ""
    version: ""
  remoteSocket:
    exists: false
    path: unix://var/run/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: false
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: false
  serviceIsRemote: false
  slirp4netns:
    executable: ""
    package: ""
    version: ""
  swapFree: 42949652480
  swapTotal: 42949652480
  uptime: 0h 59m 21.00s
  variant: ""
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  localhost:5000:
    Blocked: false
    Insecure: true
    Location: localhost:5000
    MirrorByDigestOnly: false
    Mirrors: null
    Prefix: localhost:5000
    PullFromMirror: ""
  search:
  - docker.io
  - docker.pkg.github.com
  - quay.io
  - public.ecr.aws
  - registry.fedoraproject.org
store:
  configFile: /etc/containers/storage.conf
  containerStore:
    number: 24
    paused: 0
    running: 23
    stopped: 1
  graphDriverName: btrfs
  graphOptions: {}
  graphRoot: /mnt/podman-storage
  graphRootAllocated: 2000397795328
  graphRootUsed: 80424771584
  graphStatus:
    Build Version: Btrfs v6.5.2
    Library Version: "102"
  imageCopyTmpDir: /var/tmp/.private/root
  imageStore:
    number: 113
  runRoot: /var/run/podman
  transientStore: true
  volumePath: /mnt/podman-storage/volumes
version:
  APIVersion: 4.8.0
  Built: 1701182904
  BuiltTime: Tue Nov 28 14:48:24 2023
  GitCommit: ""
  GoVersion: go1.21.4
  Os: linux
  OsArch: linux/amd64
  Version: 4.8.0


Podman in a container

No

Privileged Or Rootless

None

Upstream Latest Release

Yes

Additional environment details

No response

Additional information

No response
@srcshelton srcshelton added the kind/bug label Dec 4, 2023
@srcshelton
Contributor Author

N.B. podman is running as a service and /var/run/podman/podman.sock exists:

$ ls -l /var/run/podman/podman.sock
srw-rw---- 1 root podman 0 Dec  4 18:27 /var/run/podman/podman.sock=

… so I'm not sure why podman info always includes:

remoteSocket:
    exists: false
    path: unix://var/run/podman/podman.sock

… when this path does definitely exist?

@srcshelton
Contributor Author

(N.B.: --transient-store is used as a workaround for #19938, which I've not yet investigated backing out…)

@srcshelton
Contributor Author

Looks as if the static dir part at least is #20872

@edsantiago
Collaborator

The --db-backend option is undocumented, unsupported, and destructive. If you can afford to clobber your entire podman setup (it may already be beyond recovery), try:

# podman system reset

If that throws more db errors, you may have to podman --db-backend=<WHICHEVER ONE IT COMPLAINS ABOUT> system reset.

@afbjorklund
Contributor

> … so I'm not sure why podman info always includes:
>
> remoteSocket:
>     exists: false
>     path: unix://var/run/podman/podman.sock
>
> … when this path does definitely exist?

I think you want to use either of:

unix:/run/podman/podman.sock (podman-only syntax)

unix:///run/podman/podman.sock (docker-compatible)

Since otherwise, the path would be relative.
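
One way to see why the two-slash form misbehaves: under generic URI parsing, whatever follows `unix://` up to the next slash is the authority (host) component, so `var` does not end up in the path at all. A quick sketch using Python's `urllib.parse`; podman's own parsing is not shown in this thread and may differ in detail, and `socket_parts` is just an illustrative helper:

```python
from urllib.parse import urlsplit


def socket_parts(uri: str) -> tuple[str, str]:
    """Return (authority, path) as a generic URI parser sees them."""
    parts = urlsplit(uri)
    return parts.netloc, parts.path


if __name__ == "__main__":
    for uri in (
        "unix://var/run/podman/podman.sock",
        "unix:/run/podman/podman.sock",
        "unix:///run/podman/podman.sock",
    ):
        authority, path = socket_parts(uri)
        print(f"{uri}: authority={authority!r} path={path!r}")
```

Only the one-slash and three-slash forms leave the authority empty and keep the whole absolute path intact, which matches the two spellings suggested above.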

@Luap99
Member

Luap99 commented Dec 5, 2023

This is a duplicate of #20872, and the other error message was fixed in #20810.

@Luap99 Luap99 closed this as not planned Dec 5, 2023
@srcshelton
Contributor Author

> The --db-backend option is undocumented, unsupported, and destructive. If you can afford to clobber your entire podman setup (it may already be beyond recovery), try:
>
> # podman system reset
>
> If that throws more db errors, you may have to podman --db-backend=<WHICHEVER ONE IT COMPLAINS ABOUT> system reset.

N.B. This issue still occurs even if --db-backend is excluded entirely: since it's a database-related issue, I included this only for completeness.

Running sqlite3 recovery against each *.sql file, db.sql (1.1M) and db.sql-shm (32k) validate, but db.sql-wal has zero size. I assume that these files are all generated by podman(?), but the error doesn't tell me where the problem might lie 😔

I'd rather avoid a complete system reset, simply because restoring all of the images would be a very time-consuming task.
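
For reference, the kind of SQLite validation described above can also be scripted. A minimal sketch using Python's standard `sqlite3` module and `PRAGMA integrity_check`; the helper name `integrity_check` and the throwaway example.sql file are illustrative assumptions, and it is probably safest to point this at a copy of db.sql rather than the live file. Note that the -wal and -shm files are SQLite's write-ahead log and shared-memory index rather than standalone databases, so only the main file is checked here.

```python
import sqlite3
from pathlib import Path


def integrity_check(db_path: str) -> list[str]:
    """Open db_path read-only and return the rows of PRAGMA integrity_check.

    A healthy database yields the single row "ok".
    """
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        return [row[0] for row in conn.execute("PRAGMA integrity_check")]
    finally:
        conn.close()


if __name__ == "__main__":
    import tempfile

    # Build a throwaway database to check; the real file in this thread is db.sql.
    with tempfile.TemporaryDirectory() as tmp:
        path = str(Path(tmp) / "example.sql")
        conn = sqlite3.connect(path)
        conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY)")
        conn.commit()
        conn.close()
        print(integrity_check(path))
```

A clean result only says that the file is structurally sound; it would not detect a mismatch between paths stored in the database and the current configuration.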

@Luap99
Member

Luap99 commented Dec 5, 2023

Update to 4.8.1; it should have all the fixes. If you still have the problem with 4.8.1 then there is another cause, but I doubt it.

But as @edsantiago mentioned, do not use --db-backend outside of testing; it is undocumented for a reason.

@github-actions github-actions bot added the locked - please file new issue/PR label Mar 5, 2024
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 5, 2024