Error: schema is corrupt #7064

Open
squalus opened this issue Sep 19, 2022 · 1 comment

squalus commented Sep 19, 2022

Describe the bug

I have seen multiple cases of "schema is corrupt" error messages in a production environment. This tends to happen on NixOS systems that have unexpected power cuts.

$ nix store verify --all
error: '/mnt/nix/var/nix/db/schema' is corrupt

In this case, it's an ext4 file system and the schema file is empty.

Steps To Reproduce

I have a minimal test case that simulates a power cut with NixOS tests and reproduces the problem: https://github.com/squalus/nix-durability-tests. It can be run on several different file systems.

nix -L build github:squalus/nix-durability-tests#corrupt-schema-tests.xfs

If the problem reproduces, this prints a "schema is corrupt" error message.

Expected behavior

The schema file should never be invalid, even if there's an unexpected power cut.

nix-env --version output
nix-env (Nix) 2.8.1

Additional context

Some possible causes:

  1. Errors from close(2) are ignored in nix::writeFile. (From man close: Failing to check the return value when closing a file may lead to silent loss of data.)
  2. fsync(2) is not run on the file after writing the contents. This means the data may not be fully flushed to disk.
  3. fsync(2) is not run on the parent directory after closing the file. This means the directory may have outdated contents. (This wouldn't cause an empty file, but it could cause a mismatch. I haven't yet observed this problem.)
  4. The file is not written atomically. It could instead be written with a temporary file and a call to rename(2), like in https://github.com/google/renameio.

Point 2 was addressed in this PR, but it was never merged: #1956

More background: https://thunk.org/tytso/blog/2009/03/15/dont-fear-the-fsync/
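
To make the four points above concrete, here is a minimal sketch of a write path that checks close, fsyncs the file, renames atomically, and fsyncs the parent directory. It is written in Python for brevity and is not the Nix implementation; `write_file_durably` is a hypothetical name, and the sketch assumes a POSIX system where `os.O_DIRECTORY` is available and the temporary file can live in the same directory as the target.

```python
import os
import tempfile


def write_file_durably(path: str, data: bytes) -> None:
    """Hypothetical helper: replace `path` with `data` so that a crash leaves
    either the old contents or the new contents, not a truncated file."""
    directory = os.path.dirname(os.path.abspath(path))

    # The temporary file is created in the target's directory so that the
    # later rename stays within one file system.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        try:
            view = memoryview(data)
            while view:
                # os.write may write fewer bytes than requested.
                written = os.write(fd, view)
                view = view[written:]
            # Point 2: flush the file contents to disk before renaming.
            os.fsync(fd)
        finally:
            # Point 1: os.close raises OSError if close(2) fails,
            # so the error is not silently dropped.
            os.close(fd)
        # Point 4: rename(2) atomically replaces the old file.
        os.replace(tmp_path, path)
    except BaseException:
        # Best-effort cleanup of the temporary file on any failure.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise

    # Point 3: flush the directory entry so the rename itself is durable.
    dir_fd = os.open(directory, os.O_RDONLY | os.O_DIRECTORY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Calling `write_file_durably("/tmp/example/schema", b"10\n")` (an illustrative path and payload) would replace the file in one step. Note that `mkstemp` creates the file with mode 0600, so a real implementation would likely need to set the intended permissions before the rename.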

@squalus squalus added the bug label Sep 19, 2022
squalus added a commit to squalus/nix that referenced this issue Sep 19, 2022
- call close explicitly in writeFile to prevent the close exception
  from being ignored
- fsync after writing schema file to flush data to disk
- fsync schema file parent to flush metadata to disk

NixOS#7064
squalus added a commit to squalus/nix that referenced this issue Sep 20, 2022
- call close explicitly in writeFile to prevent the close exception
  from being ignored
- fsync after writing schema file to flush data to disk
- fsync schema file parent to flush metadata to disk

NixOS#7064

squalus commented Sep 20, 2022

#7065 takes care of points 1-3, but I'll keep this issue open because an atomic file write (point 4) would still improve durability.

Minion3665 pushed a commit to Minion3665/nix that referenced this issue Feb 23, 2023
- call close explicitly in writeFile to prevent the close exception
  from being ignored
- fsync after writing schema file to flush data to disk
- fsync schema file parent to flush metadata to disk

NixOS#7064