Multiple iSCSI fileIO Block Devices on One Btrfs Disk with bees #263
Try using |
Hi @kakra and thanks for the quick response. Just so there's no confusion, you're suggesting I do |
Then I suggest creating a subvol for the images. Validate with |
While carrying out your instructions, I discovered that the image file itself already has the m attribute. Wouldn't this mean that compression is already disabled? If so, I can still try disabling iSCSI direct IO. |
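The subvolume-and-attributes setup suggested above can be sketched as follows. This is an illustrative, hedged outline, not the exact commands from the thread: paths are placeholders, the `m` attribute disables btrfs compression, and `C` (no CoW) only takes effect on files created after the flag is set.

```shell
# Create a dedicated subvolume for the iSCSI backing images (hypothetical path)
btrfs subvolume create /mnt/pool/images

# 'm' = no compression; new files inherit the directory's attributes
chattr +m /mnt/pool/images

# Validate: lsattr shows the flags on the directory and on existing files
lsattr -d /mnt/pool/images
lsattr /mnt/pool/images/*.img
```

These commands require root and a mounted btrfs filesystem; `lsattr` is the quick way to confirm whether the image file already carries the `m` flag, as discovered above.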
Okay, so your iSCSI tools already take care of that. Then disabling direct IO may fix it. Cached IO may work best if that is available, and it should be safe because CoW is still enabled on the files. Also, check whether autodefrag is enabled in btrfs and maybe try disabling it. I don't think the observed problems are bees' fault; it only uncovers a flaw in btrfs with concurrent access in the direct IO path. You should be able to observe it even with bees not running, given high IO load in the images. |
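With LIO/targetcli (which the reporter later confirms using), switching a fileio backstore from direct to buffered IO is done at backstore creation time. A hedged sketch, with the backstore name, path, and size as placeholders:

```shell
# Recreate the fileio backstore with buffered (cached) IO instead of direct IO.
# write_back=true makes LIO go through the page cache rather than O_DIRECT.
targetcli /backstores/fileio create name=client1 \
    file_or_dev=/mnt/pool/images/client1.img size=200G write_back=true
```

Note that write-back caching trades durability on power loss for performance; with CoW still enabled on the backing file, btrfs checksums remain intact either way.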
Thanks for the advice. I did make sure not to set the autodefrag mount option as per your documentation, but I've created the qcow2 file and am now testing it out as a Steam library. |
If you suffer performance problems, you should consider a kernel patch to put btrfs meta-data on a dedicated SSD, also maybe use bcache to cache IO reads and writes. I have some patches here for metadata-{preferred,only} partitions: kakra/linux#26 BTW: Meanwhile I converted to two NVMe disks for dedicated meta-data in a btrfs-raid1 setup and bcache in a mdraid1 setup. Remaining HDDs are still the same. |
So I'm currently installing Baldur's Gate 3. It hasn't failed yet, but I am getting lots of journal errors, as before. Improving performance may be ill-fated at this point, but I'll wait until the install finishes before coming to any conclusions. |
Errors from bees logged to journal? That's okay, I think. bees is very chatty about situations it didn't expect - which happens while writing to the image files. If you're no longer seeing complaints from the kernel, everything is fine. |
It's mostly crawl entries from bees, but I am getting a few BTRFS kernel errors. Although they may be a result of the old .img file still being present on the volume -- bees does seem to be detecting duplicate content between the .img and the .qcow2. |
Yeah, the checksums of the img files are borked... But bees creates its own checksums, so it still detects the duplicates and tries to read the files, which btrfs refuses to read - finally probably resulting in the qcow images becoming damaged, too. |
The errors will likely persist until the old .img is removed. bees will try to read every reachable block on the filesystem, so if some of them still have csum errors, then bees will eventually find (and complain about) all of them. bees will only try reading each block reference once, so it will skip over the extents with errors as it finds them. There can be multiple references to the same extent with a bad block, but there's a finite number of those, and bees should eventually run out of references to try.

I run bees on a large enough fleet that there's always some drive failing somewhere, so this error detection and recovery path is fairly well tested. bees might create new references to extents that contain errors; however, bees should not be creating any new errors. The kernel dedupe ioctl doesn't allow modification of data if one side of the dedupe isn't readable, and the inode-level locks during dedupe should prevent concurrent direct IO.

bees does add a lot of IO workload to the system, so it can make any existing data corruption bug worse, especially if the corruption is caused by a race condition (as direct IO with concurrent in-memory data modification is). Historically, several such kernel bugs have been found and fixed over the years. Direct IO is an exception because its behavior isn't considered a kernel bug for some reason. |
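The "bees creates its own checksums" mechanism described above can be illustrated with a toy sketch. This is not bees code, just a minimal model of the idea: hash fixed-size blocks of one file, then find blocks in a second file with matching hashes (bees works on btrfs extents via kernel ioctls, uses a compact persistent hash table, and a much cheaper hash than SHA-256).

```python
import hashlib

BLOCK = 4096  # model btrfs's 4K data block size

def block_hashes(path):
    """Map content hash -> first offset for each fixed-size block of a file."""
    table = {}
    with open(path, "rb") as f:
        offset = 0
        while True:
            block = f.read(BLOCK)
            if not block:
                break
            table.setdefault(hashlib.sha256(block).hexdigest(), offset)
            offset += len(block)
    return table

def duplicate_blocks(path_a, path_b):
    """Return (offset_in_a, offset_in_b) pairs for blocks of b already seen in a."""
    known = block_hashes(path_a)
    dupes = []
    with open(path_b, "rb") as f:
        offset = 0
        while True:
            block = f.read(BLOCK)
            if not block:
                break
            h = hashlib.sha256(block).hexdigest()
            if h in known:
                dupes.append((known[h], offset))
            offset += len(block)
    return dupes
```

This is why bees found duplicate content between the old .img and the new .qcow2 even though btrfs's own csums on the .img were broken: the duplicate detection uses bees' hashes, and only the subsequent read-back for dedupe hits the kernel csum errors.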
There is one exception to this: if a block goes bad after it is stored in the bees hash table (e.g. due to device-level data corruption), bees will keep hitting that bad block every time it reads a new block with a matching hash. That continues until the bad block is deleted (then the hash won't match, and bees will remove the hash table entry). This could be handled differently: e.g. bees could detect the read error and remove the hash table entry immediately, or bees could simply exit when any IO error is detected. Right now bees assumes all errors are temporary, and tries to continue after skipping the task that found the error.

That exception shouldn't happen in this case, because data csums corrupted by direct IO are bad starting at the time of their creation. Data blocks aren't stored in the page cache when using direct IO, so the csum failure can't be bypassed by reading the block from cache (which wouldn't verify the data against the csum). The combination of those means there's no way for bees to read the block at all, which prevents it from ever reaching the bees hash table. |
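The alternative policy mentioned above (evict a hash table entry as soon as its stored block proves unreadable, instead of retrying it on every future hash match) can be sketched with a toy model. This is hypothetical illustration code, not bees' actual data structures:

```python
import errno

class HashTable:
    """Toy model of a dedupe hash table: hash -> (path, offset) of a seen block."""

    def __init__(self):
        self.entries = {}

    def insert(self, block_hash, location):
        self.entries[block_hash] = location

    def lookup_and_verify(self, block_hash, read_block):
        """Return the stored block's data on a hash match, or None.

        Eager-eviction policy: if the stored block can no longer be read,
        drop the entry immediately so later matching hashes don't keep
        hitting the same bad block.
        """
        location = self.entries.get(block_hash)
        if location is None:
            return None
        try:
            return read_block(location)
        except OSError:
            del self.entries[block_hash]  # evict the stale entry right away
            return None
```

Under this policy a bad block is read at most once per entry; bees' current policy instead treats errors as temporary and retries on each new match until the bad block itself is deleted.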
Hello, and thanks for the info @Zygo. Example: This will take several hours to complete, though, so this may change. Again, thank you for your interest and patience! |
No good I'm afraid... |
@a-priestley The fact that you are getting errors is worrisome. The /dev/sda device where you see the errors, is that inside the initiator (iSCSI client) OS or on the host device where the iscsi target images are stored? What iscsi target software do you use (fileio sounds like LIO target?) and what settings do you use for exporting each target? |
hi @Forza-tng, I was using targetcli to set up my fileIO backstores. I chose it simply because it is currently in-kernel, and the Arch Wiki leans toward it. I have not tried tgtd. |
I also can't see how using qcow2 instead of raw should make any difference to data corruption, unless there's a bug in the fileio driver itself. Where are you seeing the errors, in the host's dmesg or in the clients'?

About tgtd: it's an iSCSI server written in user space and does not need any kernel modules. Personally, I've found this to be a more stable approach. At work I recently converted a rather large storage server from targetcli (LIO) to tgtd. That server uses btrfs too. A little while back I wrote a small wiki entry on setting up tgtd: https://wiki.tnonline.net/w/Blog/iSCSI_target_in_user-space#Configuration_example |
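For context, a minimal tgtd target definition lives in `/etc/tgt/targets.conf` and looks roughly like this. The IQN and image path below are placeholders, not values from the thread; see the linked wiki entry for a full configuration example:

```
# /etc/tgt/targets.conf -- hypothetical example
<target iqn.2024-01.net.example:games.client1>
    backing-store /mnt/pool/images/client1.img
</target>
```

Because tgtd runs entirely in user space, its IO path through the page cache avoids the kernel direct-IO interaction with btrfs discussed earlier in this thread.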
Well to be honest with you, I think Steam just doesn't like btrfs for reasons I don't fully understand. I did some more testing with it using just btrfs, freshly formatted and mounted as a games library on a completely separate disk -- no deduplication. The same issues were happening: Steam downloads the files, tries to verify them, a bunch of corruption errors show up, and the process fails with a "disk read error". It seems to correlate with reading very large files. I'm not sure what to make of it, but I do know that the Steam Deck ships on ext4. There's probably a good reason for that. |
This problem does not exist here (running completely on btrfs), and Steam has actually explicitly supported btrfs for 2+ years according to a dev I've chatted with: Proton uses reflink copies of the wine prefix to clone new prefixes per game to save space, and this will probably be ported to Steam Deck once btrfs supports case-folding. My library contains over 2 TB of downloaded games, and not even one game has failed to verify - not in the past, and not now; all files are pristine. bees is running on the library and finds a lot of duplicate extents, so it's also not just write-once.

"Correlate with reading very large files" more likely indicates a statistical observation: your hardware may introduce bit errors, or your storage software stack may introduce cache inconsistencies at the lower levels (like direct IO on btrfs), and either is simply more likely to become visible in large files. Did your test really use a native disk, or some software block device? Any pre-fail conditions in smartctl? Did you check the Steam logs to see which files at which location actually failed checks?

Also, "completely separate disk" may not mean much: Steam may download to a temp folder first, then move files over to the library, so verification may have failed early, on the temp folder. |
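The reflink cloning mentioned above is an ordinary `cp` feature. A small demo, with paths as placeholders; `--reflink=auto` shares extents on btrfs/XFS and silently falls back to a normal copy on filesystems without reflink support:

```shell
# Clone a small "prefix" file; on btrfs this shares extents (no data copied)
mkdir -p /tmp/prefix_demo
echo "wine prefix data" > /tmp/prefix_demo/template
cp --reflink=auto /tmp/prefix_demo/template /tmp/prefix_demo/clone
cat /tmp/prefix_demo/clone
```

A reflink clone is instantaneous and consumes no extra data space until either copy is modified, which is why it suits cloning a wine prefix per game.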
It turns out I have been having memory errors for the past while. One of my modules appears to be faulty. After taking it out, none of the errors I've been seeing happen any longer.
I'm currently working on a project with the goal of setting up a remote game streaming service with simultaneous client capabilities, with a focus on efficient use of storage available over network.
For network storage availability, I'm using iSCSI fileIO image backstores -- one per client.
For the streaming service, I'm using wolf, which is a containerized service that dynamically spins up headless streaming displays using gstreamer for clients to connect to using moonlight.
The problem I am trying to solve when using this service is related to storage use. Each client would need its own filesystem, which means that storage use for duplicate files on the underlying disk would be multiplied by the number of clients using those files. Basically, two clients with the same game installed will double the storage requirements for that game.
The general idea is to use bees to reduce the storage requirements for these duplicate files. So I've currently got a 2TB hybrid drive formatted as a partitionless btrfs, which I'm running bees on, with a 256MB DB_SIZE. Inside the volume I've created an iSCSI target on a sparse image file.
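For the bees side of this setup, the `beesd` wrapper reads a per-filesystem config file. A hedged sketch of what that file might look like; the UUID below is a placeholder, and the file is sourced as shell by `beesd`:

```shell
# /etc/bees/<uuid>.conf -- hypothetical example
UUID=01234567-89ab-cdef-0123-456789abcdef   # btrfs filesystem UUID (placeholder)
DB_SIZE=$((256*1024*1024))                  # 256 MiB hash table, as described
```

The hash table size bounds how much of the filesystem's unique data bees can track; 256 MiB is on the small side for 2 TB of data, so raising DB_SIZE may improve dedupe hit rates at the cost of RAM.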
For testing purposes, I'm connecting to the target on the same machine that I'm hosting from, and I've mounted the resulting block device as ext4, which I'm now testing out as a Steam library.
Unfortunately, at this point I'm running into some trouble. While many titles install and run with no issue, others, particularly larger ones, fail Steam's verification due to disk read errors. I can see these coming in as btrfs corruption errors in the journal, as well as beesd crawl errors such as:
I'm trying to ascertain whether or not this project is even feasible. Is it advisable to use bees in this way? If anyone thinks it could work, are there any pointers you can give on how I can tweak the service?
Thanks!