Which commands are used when creating the wb cache for disk partitions? #122

Open
Augusto7743 opened this issue Aug 12, 2022 · 5 comments

@Augusto7743

Hello Petros Koutoupis. I hope all is well with you.
More users should know about RapidDisk.
I have been using it for more than a year and it works very well. Amazing tool.

The only problem is with writeback on a BTRFS disk, and it may not be related to RapidDisk itself.
When the cache is full, a flush to disk is done, but sometimes it is only a partial write.
That can damage a BTRFS partition, with the error "parent transid verify failed".

"Parent transid verify failed" is the result of a failed internal consistency check of the filesystem's metadata.

Unfortunately, it can rarely be fixed. The file system was damaged.

"On-disk metadata is committed every time a FLUSH or FUA bio is written. If no such requests are made then commits will occur every second. This means the cache behaves like a physical disk that has a volatile write cache. If power is lost you may lose some recent writes. The metadata should always be consistent in spite of any crash."

When RapidDisk creates the wb caches, which commands are used to create the dm-writeback for disk partitions?
If there are many commands, there is no need to list them in a reply. Are the commands documented?
I am only trying to understand whether some configuration can avoid the problem above, or whether it is a problem with using the dm-writecache module on a BTRFS partition, so that I can share any useful information afterwards.

Have a good day.

@pkoutoupis
Owner

@Augusto7743 This is a very tricky process and one of the main reasons why I was hesitant on implementing a writeback cache in RAM. I assume that you went through the notes by @matteotenca here: #86.

Can you please provide more details on the events or actions that led you to this corruption? Maybe also distro/version/kernel.

@Augusto7743
Author

Thanks very much for your reply.

I had posted in #86 about how to create a correct script to flush all writeback caches before OS shutdown.

I have used your amazing RapidDisk with wb on /root, /home and /opt ever since the wb feature was added to RapidDisk.
I created 3 BTRFS partitions for testing, so that I could report back to the project afterwards. I can restore these partitions at any moment.
Your software does not crash the OS. The OS runs much better with your software. I also want to use wt, but there is not much RAM available, so only wb is enabled. In the future the wt cache will be used too.

Adding writeback to RapidDisk is a blessing for average and beginner Linux users.

I looked for information in
https://btrfs.readthedocs.io/en/latest/
https://www.man7.org/linux/man-pages/man8/dmsetup.8.html
to understand whether there is any command or configuration to avoid metadata damage.

The problem happens when a partial flush to the block device is done.
The wb cache is only flushed completely when a command is used. When the cache is full, a partial flush is written to disk.
That partial write does not update the metadata.


[ BTRFS documentation ]
parent transid verify failed on 29360128 wanted 1486656 found 1486662
If the second two numbers (wanted 1486656 and found 1486662) are close together (within about 20 of each other), then mounting with
-o ro,usebackuproot

"Parent transid verify failed" is the result of a failed internal consistency check of the filesystem's metadata.
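
The "within about 20" rule quoted above can be checked mechanically. What follows is a minimal sketch, assuming the error line has the `wanted N found M` form shown in this thread. The function names `transid_gap` and `transid_close` are made up for illustration and are not part of btrfs-progs.

```shell
# transid_gap LINE: print the absolute difference between the "wanted" and
# "found" generation numbers of a "parent transid verify failed" line.
# Returns 1 if the line does not contain both numbers.
transid_gap() {
    tg_re='s/.*wanted \([0-9][0-9]*\) found \([0-9][0-9]*\).*'
    tg_wanted=$(printf '%s\n' "$1" | sed -n "$tg_re/\\1/p")
    tg_found=$(printf '%s\n' "$1" | sed -n "$tg_re/\\2/p")
    if [ -z "$tg_wanted" ] || [ -z "$tg_found" ]; then
        return 1
    fi
    tg_gap=$((tg_wanted - tg_found))
    if [ "$tg_gap" -lt 0 ]; then
        tg_gap=$((0 - tg_gap))
    fi
    echo "$tg_gap"
}

# transid_close LINE [THRESHOLD]: succeed if the gap is at most THRESHOLD.
# The default of 20 is taken from the documentation excerpt quoted above.
transid_close() {
    tc_gap=$(transid_gap "$1") || return 2
    [ "$tc_gap" -le "${2:-20}" ]
}
```

For the /dev/sda4 output in this comment (wanted 54343, found 53673) the gap is 670, far outside that threshold.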

I started the OS and it has been running for 1 hour and 20 minutes.
The command below only checks the fs:
sudo btrfs check --readonly --force --mode original --progress /dev/sda3
"No error found"
The cache for / is not full.
The cache is 128 MB.
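
The fill level of a cache can be estimated from `dmsetup status`. The sketch below assumes the dm-writecache status layout described in the kernel documentation, where the fields after the target name start with an error indicator, the total number of blocks and the number of free blocks. The function name `writecache_used_pct` is hypothetical, and newer kernels may append further fields, which are ignored here.

```shell
# writecache_used_pct: read one "dmsetup status" line on stdin and print the
# percentage of cache blocks in use. Returns 1 for anything that does not
# look like a writecache status line.
writecache_used_pct() {
    read -r wu_start wu_len wu_target wu_err wu_total wu_free wu_rest \
        || [ -n "$wu_start" ] || return 1
    if [ "$wu_target" != "writecache" ]; then
        echo "not a writecache status line" >&2
        return 1
    fi
    # Both counters must be plain non-negative integers.
    case "$wu_total" in ''|*[!0-9]*) return 1 ;; esac
    case "$wu_free" in ''|*[!0-9]*) return 1 ;; esac
    if [ "$wu_total" -eq 0 ]; then
        return 1
    fi
    echo $(( ( $wu_total - $wu_free ) * 100 / $wu_total ))
}

# Usage (needs root):
#   dmsetup status /dev/mapper/rc-wb_sda4 | writecache_used_pct
```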

sudo btrfs check --readonly --force --mode original --progress /dev/sda4
parent transid verify failed on 43925504 wanted 54343 found 53673
parent transid verify failed on 43925504 wanted 54343 found 53673
parent transid verify failed on 43925504 wanted 54343 found 53673
Ignoring transid failure
ERROR: could not setup extent tree
ERROR: cannot open file system

The cache was 100 % used (256 MB) and a small partial write was done to the BTRFS partition /home. No user data was being written to /home.
I need to run the command below before OS shutdown to avoid metadata damage, because the 2 values in the error message are not close together, so there is little chance of fixing the fs.

sudo /usr/sbin/dmsetup message /dev/mapper/rc-wb_sda4 0 flush
Now the check reports "No error found". The cache was flushed and the metadata updated.

sudo btrfs check --readonly --force --mode original --progress /dev/sda5
parent transid verify failed on 39436288 wanted 6800 found 6729
parent transid verify failed on 39436288 wanted 6800 found 6729
parent transid verify failed on 39436288 wanted 6800 found 6729
Ignoring transid failure
ERROR: could not setup extent tree
ERROR: cannot open file system

The cache was 100 % used (8 MB) and a small partial write was done to the BTRFS partition /opt.
The command below needs to be run before OS shutdown to avoid metadata damage.
sudo /usr/sbin/dmsetup message /dev/mapper/rc-wb_sda5 0 flush
Now "No error found".
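
Since the same `dmsetup message ... 0 flush` command is needed for each cache, it can be looped over every writeback mapping. This is a minimal sketch, assuming all RapidDisk writeback mappings appear under /dev/mapper with the `rc-wb_` prefix seen above. `flush_wb_caches`, `DMSETUP` and `DRY_RUN` are hypothetical names, and hooking the function into shutdown is the subject of #86, not covered here.

```shell
# flush_wb_caches [DIR]: send "flush" to every rc-wb_* mapping found in DIR
# (default /dev/mapper). With DRY_RUN=1 the commands are printed, not run.
# Returns 1 if any flush command failed.
flush_wb_caches() {
    fw_dir=${1:-/dev/mapper}
    fw_dmsetup=${DMSETUP:-/usr/sbin/dmsetup}
    fw_rc=0
    for fw_dev in "$fw_dir"/rc-wb_*; do
        # An unmatched glob stays literal, so skip names that do not exist.
        [ -e "$fw_dev" ] || continue
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "$fw_dmsetup message $fw_dev 0 flush"
        else
            "$fw_dmsetup" message "$fw_dev" 0 flush || fw_rc=1
        fi
    done
    return "$fw_rc"
}

# Usage (needs root): flush_wb_caches
# Preview only:       DRY_RUN=1 flush_wb_caches
```

Running `btrfs check --readonly` on each device afterwards, as done above, would confirm whether the flush left the metadata consistent.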

The wb cache for /opt is only 8 MB, for testing, because /opt sees few writes. Only a few files smaller than 1 MB are written to /opt.

The 3 fs are mounted in fstab using
acl,noautodefrag,barrier,commit=1,compress-force=zlib:9,datacow,datasum,max_inline=0,metadata_ratio=0,nospace_cache
I am also testing with noflushoncommit and notreelog.

If it is possible, when creating the wb caches, to configure an option so that the partial write done when the cache is full also flushes the metadata, the problem above could be avoided.

If the dm-writeback cache has options to update the metadata on a partial flush, it would be good to test them and see whether the problem happens again.
If metadata damage cannot be avoided even with such commands, I need to share the information above with the dm-cache author and the BTRFS team.

BTRFS does several complex tasks on disk, but unfortunately when the fs is damaged it is not possible to fix it with tools.
BTRFS changes the location of metadata.
The "parent transid verify failed" error does not happen only with dm-writecache. The Internet has several posts from users with the same problem.
There are some options for configuring metadata in a BTRFS fs, but they need to be set right after the fs is created. I will test them.

I only need to understand whether there is any command or configuration to optimize flushing metadata. I need to test before contacting the BTRFS devs.
BTRFS is stable with good features, but it needs care.

@pkoutoupis
Owner

OK. Based on what I am reading, I think this is more related to #62 and the ability to configure the dm-writeback parameters.
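
One way to see which parameters an existing mapping was created with is `dmsetup table`. The sketch below labels the fields of a writecache table line, assuming the constructor layout from the kernel's dm-writecache documentation (cache type, origin device, cache device, block size, option count, then the options). `describe_writecache` is a hypothetical helper name. That documentation lists `high_watermark` (default 50) and `low_watermark` (default 45) among the optional arguments, which would be consistent with the partial flush described above; whether RapidDisk sets them is not stated in this thread.

```shell
# describe_writecache: read one "dmsetup table" line on stdin and print the
# dm-writecache constructor fields as key=value pairs.
describe_writecache() {
    read -r dw_start dw_len dw_target dw_type dw_origin dw_cache dw_bs dw_nopt dw_opts \
        || [ -n "$dw_start" ] || return 1
    if [ "$dw_target" != "writecache" ]; then
        echo "not a writecache table line" >&2
        return 1
    fi
    echo "type=$dw_type"              # p = persistent memory, s = block device
    echo "origin=$dw_origin"          # device being cached
    echo "cache=$dw_cache"            # device holding the cache
    echo "block_size=$dw_bs"
    echo "options=${dw_opts:-none}"   # e.g. high_watermark / low_watermark
}

# Usage (needs root):
#   dmsetup table /dev/mapper/rc-wb_sda4 | describe_writecache
```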

@matteotenca
Collaborator

@Augusto7743 @pkoutoupis I suggest that Augusto try the approach I detailed here (link to the specific message):

#86 (comment)

This is the only way I found to avoid filesystem corruption on shutdown while using write-back mode on the main root volume.
If you can find some time to try this approach, we can gather some more info.

Regards

@Augusto7743
Author

Augusto7743 commented Sep 24, 2022

@matteotenca
Yes ... using dm-writeback on the root volume avoids problems. However, I need to figure out how to correctly configure writeback shutdown for multiple volumes.
I see that "file system pre" is not a systemd service.
The flush command needs to run before the "file system pre" target.

@pkoutoupis
I want to understand whether this issue needs to be reported to the BTRFS developers.
Have you tested any configuration of the dm-writeback parameters when creating the wb cache?
