I've been successfully using bees on my backup server for a long time now. Recently, however, my data grew far enough to make the hash table size suboptimal. I attempted to increase it, and now it seems like I can't use bees anymore, as it is stuck deep in snapshots and seems keen to grow my metadata to uncomfortable amounts.
I turned the service off for now. I don't think any of this is bees's fault, as I brought it all upon myself. But I am wondering whether there's a way to recover (and whether I even should, as bees might simply be a bad fit for my usage scenario).
Background
Some years ago, I built myself a backup server with a single 8 TB drive (I am only mentioning the storage drives; the OS lives elsewhere and is not relevant, as it's not even on btrfs). I figured that, if I rsync my other machines to subvolumes on this server and then create read-only snapshots of those volumes, I'd make great use of btrfs's CoW features. To maximize CoW, I used the `--inplace --no-whole-file -M--no-whole-file` rsync flags, to make sure that, if a file is only partially changed, it will be only partially overwritten.
This seemed to work well enough. I wanted to optimize further, so I enabled compression (`compress-force=zstd:12`) and set up bees. (I also did other things, like using `bcache` with redundancy and a custom kernel with kakra/linux#36, but they don't seem relevant, so I'm not mentioning them here.) When setting up bees, I did some research and accounted for a couple of things:
- While bees can be used with however many snapshots one wants, the snapshots should be created after dedup. Because of this, I made sure to first do the backup rsync, then let bees run, and only then snapshot once bees is done.
- The optimal hash table size can be calculated as `unique_data_bytes / (128 * 1024) * 16` (the number of unique hashed 128 KiB extents times 16 bytes per entry). I figured that 8 TiB of unique data is a reasonable expectation for an 8 TB filesystem (that is expected to grow later on), so I went with a 1 GiB hash table.
- Some files (e.g. VM storage) are pointless to dedup (and, indeed, compress), and doing so would cause insane fragmentation, so I made sure to mark such files `nodatacow`.
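As a sanity check, the sizing rule above can be evaluated directly in the shell (the 8 TiB figure is my expectation, not a measurement):

```shell
# Hash table size = (unique data / 128 KiB extent size) * 16 bytes per entry.
unique_bytes=$((8 * 1024 ** 4))   # 8 TiB of unique data (my assumption)
extent_size=$((128 * 1024))       # bees hashes data in 128 KiB blocks
entry_size=16                     # bytes per hash table entry
hash_table_bytes=$((unique_bytes / extent_size * entry_size))
echo "$hash_table_bytes"          # 1073741824, i.e. exactly 1 GiB
```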
It all went fine and performed about as I expected. Then I grew the filesystem, adding two more 20 TB drives (for a total of 48 TB raw storage and 24 TB usable space in a raid1 configuration) and ran a full balance, as is recommended. Then my amount of data grew steadily as I made more backups, at some point reaching the current 10.5 TiB, and I wondered whether the hash table size was future-proof.
Failure of cognitive function
Despite 10.5 TiB being still perfectly fine and within tolerance for a 1 GiB hash table, I, being the smart and insightful professional that I am, considered the possibility that it will be too costly to grow the hash table once the data reaches, say, 40 TiB. So I decided to grow the hash table to 4 GiB preemptively, as I have the RAM to spare. I looked up how to do it, found issue #70 (and some others, which said basically the same things, I believe), and learned that I have to delete `beeshash.dat` and `beescrawl.dat` and change the DB size in the config.
Of course, I completely failed to understand or consider the implications. Deleting those files basically reset bees and made it crawl and dedup my whole storage from scratch — that is, 100+ snapshots of 3-4 TiB each. I had also completely missed (or forgotten, since I researched this so long ago) that bees will create its own metadata for every snapshot as it dedups.
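For reference, the destructive resize procedure boils down to something like this. This is only a sketch: the `BEESHOME` location and the service name are assumptions that depend on how bees was set up.

```shell
# Wipes ALL bees state and recreates the hash table at a new size; bees will
# then re-crawl the entire filesystem from scratch (which is what bit me).
resize_bees_hash() {
    local beeshome=$1 new_size=$2
    rm -f "$beeshome/beeshash.dat" "$beeshome/beescrawl.dat"
    truncate -s "$new_size" "$beeshome/beeshash.dat"  # beesd normally creates this from the configured DB size
}

# Usage (stop bees first, e.g. systemctl stop beesd@<fs-uuid>.service):
# resize_bees_hash /mnt/backup/.beeshome "$((4 * 1024 ** 3))"   # 4 GiB
```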
I got alarmed after my metadata tripled in size, did some better research, stopped bees, and here we are.
What next?
Accepting the damage (extra metadata) and moving on without bees is the most obvious default choice. My usage scenario probably makes bees less useful than it could be, anyway, as the in-place writes and incremental backups deduplicate things quite a lot as is.
But what if I wanted to keep using bees? I thought a bit and came up with a "replay step-by-step" plan:
1. Making use of LVM features, shrink the existing btrfs filesystem and create a new one beside it on the same set of drives, with the same settings.
2. Replicate the backup subvolume structure (1 subvol per machine, separate nodatacow directories for troublesome files, etc.).
3. For each backed-up machine, rsync the oldest history snapshot into the new filesystem. This should fully defragment the data and minimize metadata.
4. Run bees on the new filesystem. Wait for it to finish dedup.
5. Create snapshots of the deduped subvolumes, replicating the oldest snapshots within the new filesystem.
6. Delete these oldest snapshots from the old filesystem.
7. GOTO 3 until there are no snapshots left in the old filesystem, balancing and shrinking it on the way, as needed, to grow the new one.
8. Delete the old filesystem and replace it with the new one, fully deduped and with a proper new hash table size.
So, basically, I would replicate the steps I took to create each backup in the first place, one by one. This is obviously very involved and will take a very long time, but it seems like the best option I've got (aside from deleting the snapshots and starting anew from the current state).
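To make the loop concrete, here's a dry-run sketch of one iteration for one machine. It only prints the commands it would run; all paths and the subvolume layout are assumptions about my own setup:

```shell
# Replays the oldest remaining snapshot of one machine into the new filesystem.
# Dry run: echoes each command instead of executing it.
replay_oldest() {
    local old=$1 new=$2 machine=$3
    local snap
    snap=$(ls "$old/$machine/snapshots" | sort | head -n 1)  # oldest by name
    echo rsync -aHAX "$old/$machine/snapshots/$snap/" "$new/$machine/live/"
    echo "# ...run bees on $new and wait for it to finish dedup..."
    echo btrfs subvolume snapshot -r "$new/$machine/live" "$new/$machine/snapshots/$snap"
    echo btrfs subvolume delete "$old/$machine/snapshots/$snap"
}
```

Repeating this until the old filesystem is empty, with balance/shrink steps interleaved as needed.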
This can be made potentially faster and more efficient if I segregate the backups. As I don't really need most of this data in versioned history, I could shrink the snapshots quite a lot by storing the non-versioned data separately, in subvolumes without snapshots. Finally, I could even keep the non-versioned data on a separate btrfs filesystem, and run bees only there. Which brings me to...
What's the right way to handle a dynamic (adding/removing/replacing drives) filesystem, with snapshots, with bees?
Going back to #70 and stuff mentioned there (once again, there were other issues I looked through, but the info was more or less the same), it seems like:
- The only current way to resize the hash table is to nuke it. Along with it, one must nuke the crawl state, because otherwise dedup won't be efficient, as it won't be aware of any older data (at least that's how I understand it).
So, the only way to resize the hash table is to basically start bees from scratch, which should not be done if you have a lot (or even a few, really) of huge snapshots.
- Full balance (like one must do when adding/removing drives or replacing faulty ones) invalidates the hash table. Once again, one must drop the crawl state along with it to avoid the same issue (dedup being inefficient because it can't find the older data to reference).
So, a full balance for any reason also basically forces one to start bees from scratch.
Even if the hash table sizing issue can be circumvented (predicting the expected data size and picking a hash table size accordingly is reasonable, especially since there's no hard requirement to keep the optimal 128 KiB dedup extent size), the balance issue seems completely unavoidable with a multi-drive storage system: even if one never adds or removes drives, a drive will fail at some point, requiring a replacement and a full balance.
So does that mean bees shouldn't be used on multi-drive storage with many (and/or huge) snapshots? A balance will be required at some point, which will invalidate the hash table, which will in turn either cripple bees or require restarting from scratch (taking ages and exploding metadata due to snapshots).
Or is there a right way that I'm missing? Maybe this info is just stale and balancing is somehow handled without hash invalidation by now? Or maybe there's some way one can run a one-off "hash table fill" that will read through all extents (rather than following snapshots and transactions) and recover the hash table to a usable state in reasonable time?