Podman machine won't start due to corrupted config file #16550

Closed
gravityFlower opened this issue Nov 18, 2022 · 15 comments · Fixed by #16681

Labels
kind/bug: Categorizes issue or PR as related to a bug.
locked - please file new issue/PR: Assist humans wanting to comment on an old issue or PR with locked comments.
machine
windows: issue/bug on Windows

Comments

@gravityFlower

Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)

/kind bug

Description

This problem concerns Podman on Windows. I have seen the same thing happen on other occasions with other programs on Windows: every character of the original (or changed) config file is replaced with the value '\x00'.

[image]

Steps to reproduce the issue:

unknown

Describe the results you received:

Due to the corrupted configuration file, the Podman WSL container won't start.

Describe the results you expected:

Detect that the config file is corrupted, check for existing wsl containers and ask for permission to recreate the config for the machine.
Or maybe create backups of the config file and load the last one.

Additional information you deem important (e.g. issue happens only occasionally):

Output of podman version:

[image]

Output of podman info:

Same as above.

Package info (e.g. output of rpm -q podman or apt list podman or brew info podman):

Installer used was podman-4.3.0-setup.exe

Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide? (https://github.com/containers/podman/blob/main/troubleshooting.md)

No

Additional environment details (AWS, VirtualBox, physical, etc.):

Edition	Windows 10 Pro
Version	22H2
OS build	19045.2251
Experience	Windows Feature Experience Pack 120.2212.4180.0

@openshift-ci openshift-ci bot added the kind/bug label Nov 18, 2022
@mheon
Member

mheon commented Nov 18, 2022

@n1hility PTAL

@Luap99 Luap99 added the machine and windows labels Nov 18, 2022
@Luap99
Member

Luap99 commented Nov 18, 2022

It would help if you could describe how you got into that situation.
I think it is much better to prevent this problem from happening than to recover a corrupt file, which is likely much harder to do.

@gravityFlower
Author

Unfortunately I can't.
Since there was no indication of exactly which config file was affected, I can't check whether it was the case I suspect. I also uninstalled this Podman version and installed the latest version.
My assumption is that the file contains all \0 because this has happened to me with other programs. These cases are extremely rare (I believe they happened to me 3-4 times in 6 years), but they do occur, and my assumption is that it has to do with the way Windows handles (over)writing (config) files (the affected files were always XML or JSON config files).
Maybe this should rather have been a feature request in the sense of "if the config file is corrupted, provide an option to try to fix it". I'm also OK with the issue being closed if you like.

@n1hility
Member

n1hility commented Nov 29, 2022

@gravityFlower Thanks for the report. The presence of NULLs in the file indicates an incomplete NTFS write. When NTFS allocates space for a new or overwritten file, it is a multi-step operation that first assigns the space and later writes the data. If the data is not written, reads on the file are zero-filled, leading to the NULLs you observe. While these steps are usually immediate, there is a slight delay from write caching/buffering, and a power failure or hard reboot between them would lead to the observed corruption.

We can improve this with a different approach which is atomic.
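
One common way to make a config write atomic is to write the new content to a temporary file in the same directory, flush it to disk, and then rename it over the old file. The Python sketch below illustrates that pattern only; it is not the change made in the Podman fix, and `write_config_atomically` is a hypothetical name. How well the final rename survives a power failure on NTFS is an assumption here, not something this thread confirms.

```python
import json
import os
import tempfile


def write_config_atomically(path, config):
    """Write config as JSON via a temp file plus rename (hypothetical helper)."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file next to the target so the rename stays on one volume.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(config, tmp, indent=1)
            tmp.flush()
            os.fsync(tmp.fileno())  # ask the OS to push the data to disk first
        os.replace(tmp_path, path)  # swap the complete file into place
    except BaseException:
        # Leave the old config untouched and clean up the partial temp file.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

With this pattern the old file is only replaced once the new content has been fully written, so an interrupted write should leave a stray temp file rather than a zero-filled config.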

@typedefstructer

I just got the same error. How do I get out of it?
Are there any commands I can run that will reset Podman or WSL?

I did a reinstall, but it didn't help.

@n1hility
Member

You can run podman machine rm to get rid of the state, then podman machine init to recreate it.

@niheaven

Hmm, sorry, but podman machine rm gives error too.

Error: invalid character '\x00' looking for beginning of value

So does podman machine list

Error: listing vms: invalid character '\x00' looking for beginning of value

@yacao

yacao commented Mar 17, 2023

> Hmm, sorry, but podman machine rm gives error too.
>
> Error: invalid character '\x00' looking for beginning of value
>
> So does podman machine list
>
> Error: listing vms: invalid character '\x00' looking for beginning of value

Same here on Windows 11; I can't resolve the issue this way.

@gravityFlower
Author

Look in the folder %USERPROFILE%\.config\containers\podman\machine\wsl (or C:\Users\<profile>\.config\containers\podman\machine\wsl). The affected file should be podman-machine-default.json, IIRC. Open it, and instead of JSON there should be a lot of \x00.
Delete either the podman folder or the containers folder one level up.
In the terminal, wsl --list --all should show you a podman-machine-default entry.
Execute wsl --unregister podman-machine-default.
Then try podman machine init again.
At least that is what I did, if I remember correctly.
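
To check which config file is affected before deleting anything, a small script can scan the folder for JSON files that are empty or contain only \x00 bytes. This is a minimal Python sketch, not part of Podman; `is_blanked` and `find_blanked_configs` are hypothetical names, and the folder is the one quoted above, so adjust it if your setup differs.

```python
from pathlib import Path


def is_blanked(path):
    """Return True if the file is empty or consists only of NUL bytes."""
    data = Path(path).read_bytes()
    return data.strip(b"\x00") == b""


def find_blanked_configs(directory):
    """List the *.json files in directory that are empty or all NUL bytes."""
    return sorted(p for p in Path(directory).glob("*.json") if is_blanked(p))


if __name__ == "__main__":
    # Folder taken from the comment above (assumes the default location).
    machine_dir = Path.home() / ".config" / "containers" / "podman" / "machine" / "wsl"
    if machine_dir.is_dir():
        for broken in find_blanked_configs(machine_dir):
            print("blanked config:", broken)
```

The script only reports files; it does not delete or repair anything.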

@yacao

yacao commented Mar 17, 2023

Thanks @gravityFlower! After the steps above I could init a new Podman machine. But there was an additional issue in my case: I couldn't connect to the machine when pulling an image. I deleted the related .ssh file and executed "podman system connection remove *", and everything works now.

@allanp35

allanp35 commented Mar 20, 2023

Thanks @gravityFlower! I tried something else. You can just recreate the podman-machine-default.json file and fill in the same parameters using this template:

{
"ConfigPath": "{Path_to_json_file}\\podman-machine-default.json",
"Created": "2023-02-21T17:41:57.9153656+01:00",
"ImageStream": "custom",
"ImagePath": "{Path_to_rootfs_file}\\rootfs.tar.xz",
"LastUp": "2023-03-15T18:14:19.4336118+01:00",
"Name": "podman-machine-default",
"Rootful": false,
"IdentityPath": "{Path_to_ssh_key}\\podman-machine-default",
"Port": {port},
"RemoteUsername": "user",
"Version": 3
}

Set "Rootful" to true or false depending on what you want. You can find the port with the command "podman system connection list".

Then you can start your machine again without creating a new one!
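
If you would rather not hand-edit the JSON (Windows paths need doubled backslashes, and a stray comment makes the file invalid), the template can be generated instead. The Python sketch below uses only the field names from the template above; `build_machine_config` is a hypothetical helper, and whether this field set matches other Podman versions is an assumption.

```python
import json


def build_machine_config(config_path, image_path, identity_path, port,
                         created, last_up, rootful=False):
    """Return the machine config as JSON text, using the template's fields."""
    config = {
        "ConfigPath": config_path,
        "Created": created,
        "ImageStream": "custom",
        "ImagePath": image_path,
        "LastUp": last_up,
        "Name": "podman-machine-default",
        "Rootful": rootful,
        "IdentityPath": identity_path,
        "Port": port,  # as reported by "podman system connection list"
        "RemoteUsername": "user",
        "Version": 3,
    }
    # json.dumps escapes the backslashes in Windows paths for us.
    return json.dumps(config, indent=1)
```

Pass the paths as ordinary strings and the port as an integer, then save the returned text as podman-machine-default.json.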

@themr0c

themr0c commented Apr 3, 2023

I had the same issue:
The previous day, on a Windows 10 virtual machine, I installed Podman Desktop v0.13.0. Podman Desktop then installed Podman v4.4.4 and created a new Podman machine. I could use Podman as expected; I pulled a couple of images and started a couple of containers, no more than that.

Today, the %USERPROFILE%\.config\containers\podman\machine\wsl\podman-machine-default.json exists but is empty.

@themr0c

themr0c commented Apr 4, 2023

Full cleanup procedure:

  1. Find the name of the broken file.

    > ls .\.config\containers\podman\machine\wsl\
  2. Delete the broken json file.

    > rm .\.config\containers\podman\machine\wsl\podman-machine-default.json
  3. List all WSL instances.

    > wsl --list --all
  4. Delete the Podman WSL instance.

    > wsl --unregister podman-machine-default
  5. Delete the remaining connection configuration.

    > podman.exe system connection remove --all

Now you can initialize a new Podman machine.
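
The cleanup steps above could be scripted roughly as follows. This is a hedged Python sketch with a dry-run default: `cleanup_plan` and `run_cleanup` are hypothetical names, the config path assumes the default location under the user profile, and the two commands are exactly the ones from steps 4 and 5.

```python
import subprocess
from pathlib import Path


def cleanup_plan(machine="podman-machine-default"):
    """Return the config file to delete and the commands from steps 4 and 5."""
    config = (Path.home() / ".config" / "containers" / "podman"
              / "machine" / "wsl" / f"{machine}.json")
    commands = [
        ["wsl", "--unregister", machine],
        ["podman.exe", "system", "connection", "remove", "--all"],
    ]
    return config, commands


def run_cleanup(machine="podman-machine-default", dry_run=True):
    """Print the plan, or carry it out when dry_run is False."""
    config, commands = cleanup_plan(machine)
    if dry_run:
        print("would delete", config)
    elif config.exists():
        config.unlink()
    for cmd in commands:
        if dry_run:
            print("would run", " ".join(cmd))
        else:
            subprocess.run(cmd, check=False)
```

Calling `run_cleanup()` only prints what would happen; pass `dry_run=False` once the printed plan looks right.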

I could reproduce the issue on a Windows 10 libvirt instance:

  1. Clean the VM
  2. Reinitialize a Podman machine (with Podman Desktop).
  3. Halt the hypervisor without first stopping or rebooting the Windows virtual machine.
  4. Restart the hypervisor.
  5. Start the Windows VM.
  6. The .\.config\containers\podman\machine\wsl\podman-machine-default.json file is empty.

@n1hility
Member

n1hility commented Apr 5, 2023

@themr0c Work on this is in progress in #18011.

@themr0c

themr0c commented Apr 6, 2023

@n1hility Thank you! That's exactly it. I could reproduce the issue again today after an unclean shutdown (pressing the power button on the laptop while the Windows VM was running).

@github-actions github-actions bot added the locked - please file new issue/PR label Aug 28, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 28, 2023

9 participants