RFC: bees startup and distro integration #244
Thank you for the write-up. I think it is actually much simpler than this, or might be to begin with. I completely agree it should NOT be like install-and-it-all-starts-deduping. This is wrong in my view; users should configure each bees instance separately. No, I don't think extra complications like an additional "bees-autorun"-type package are needed: it is trivial to run `systemctl enable bees@UUID` or somesuch, which should take care of everything. Here is how I think it should look.
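For illustration, with a hypothetical bees@.service template along these lines, the whole per-filesystem opt-in could be as small as:

```sh
# Find the UUID of the target btrfs filesystem (device path is an example).
UUID=$(blkid -o value -s UUID /dev/sdb1)

# Opt this one filesystem in; nothing else starts deduping.
systemctl enable --now "bees@$UUID"
```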
The plan is to implement the necessary options (--home, --root, defaulting to /run/bees/$UUID). Implement --configure to create .beeshome (or --home) and set its size (later, to notify a running daemon, or just to refuse to operate if one is running) — it is not necessary to enable the systemd service, this can be done explicitly. Implement reading /etc/bees/$UUID.conf for ROOT/HOME/VERBOSE/etc. (and writing a skeleton file at --configure). Choose some sane db size for --configure instead of the current 128MB. All this is trivial. Maybe BEESHOME should be named --db or --database. What I want to achieve is to have bees self-contained, by itself and for each instance: so it can be run either from the command line (looking for everything in /etc/bees/uuid.conf) or from a systemd or other unit, without a painful way to specify parameters in /etc/systemd/system/bees@UUID.conf.d/foo.conf :) Drop beesd, move bees to /usr/sbin/, and adjust the systemd unit file for real — currently it has lots of cruft and lacks many actually needed things.
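A skeleton /etc/bees/$UUID.conf under this scheme might look like the following; the variable names are suggestions derived from the ROOT/HOME/VERBOSE idea above, not an existing format:

```sh
# /etc/bees/ba11fee1-dead-beef-cafe-000000000000.conf (example UUID).
# Sourced by the wrapper/daemon; every entry is optional.
ROOT=/run/bees/ba11fee1-dead-beef-cafe-000000000000/mount  # root subvol mount
BEESHOME=$ROOT/.beeshome    # database location
DB_SIZE=$((512*1024*1024))  # hash table size, in bytes
VERBOSE=5                   # log level
```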
I wrote a PoC bees wrapper script which does most of the above, at http://www.corpit.ru/mjt/tmp/bees . It is supposed to be a temporary wrapper around the actual bees executable (the wrapper goes to /usr/sbin/bees, the actual executable to /usr/libexec/bees), until we're satisfied with the result and can move its complete functionality into bees itself, so the actual executable will be /usr/sbin/bees without any other wrappers.
So it should be possible to run this bees startup in place of the "new and improved" bees. Maybe this script can also create a skeleton conf file with the variables it understands.
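In outline, such a wrapper might look like this (a simplified sketch with hypothetical defaults; the real script at the URL above is more complete):

```sh
#!/bin/sh
# Temporary /usr/sbin/bees wrapper: read the per-filesystem conf,
# mount the target if needed, then exec the real binary.
set -e
UUID=$1; shift

ROOT=/run/bees/$UUID/mount
[ -f "/etc/bees/$UUID.conf" ] && . "/etc/bees/$UUID.conf"
: "${BEESHOME:=$ROOT/.beeshome}"

# Mount the root subvol if nothing has brought it to us yet.
if ! mountpoint -q "$ROOT"; then
    mkdir -p "$ROOT"
    mount -t btrfs -o subvolid=5 "UUID=$UUID" "$ROOT"
fi

export BEESHOME
exec /usr/libexec/bees "$@" "$ROOT"
```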
A few words about mounts in /run/bees/$UUID/mount.
I don't see a need for that. Similarly, I do not think bees should roll its own logic for namespaces etc. - just leave that to the operator, please (systemd isolation, firejail, containers etc.); doing this on your own is needlessly complicated and more often than not will break in spectacular ways on weird quirks that the mentioned dedicated tools have already worked around. In summary, just give bees a filesystem to work on and leave the isolation to the operator.
I am not sure how much of an issue that is in practice. Just tell your monitoring script how to find the new paths?
bees should support both, and the decision should be made by the operator. I think using mount namespaces is a sane default for the reference service file.
Here we obviously disagree. The problem with providing settings in the environment or on the command line — e.g. in a systemd unit file — is that it already forces an operator to create a config file (in the case of systemd, an ugly override under /etc/systemd/system/bees@UUID.service.d/). Why I kept /etc/bees/$uuid.conf and think this is a good idea: you can run things from the command line to test, and be sure bees receives the same configuration when run from the command line or as a system service. It should be especially welcome to Zygo himself, to be able to re-run stuff manually for testing. The same is true for any possible local debugging. Please note the config file is optional, unlike the current one. The important config parameters are the filesystem root and the database location and size. Imagine you forgot to specify one of them when re-running things manually: with the config file, the command line and the service can't diverge. Adding config file support is trivial (as shown in my script), it is entirely optional, it resembles the current practice, and it makes some things easier and more robust.
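Concretely (hypothetical invocations, assuming the wrapper above):

```sh
# As a service:
systemctl start bees@ba11fee1-dead-beef-cafe-000000000000

# By hand, for debugging: same binary, same /etc/bees/$UUID.conf, so the
# daemon sees exactly the configuration the service would have given it.
/usr/sbin/bees ba11fee1-dead-beef-cafe-000000000000
```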
This is also a tough one. For now, I haven't used any configuration or setup for the namespaces, just because I'm not sure yet how this will work.
The problem is backwards. Usually a monitoring tool will find all filesystems automatically, and will be surprised by a new filesystem appearing where it should not be.
I don't yet know which default is sane in this context (private or not). But this can be thought about later either way. Be it done at the systemd level (or at another system startup manager level), or implemented in bees internally, or any other way to do it — this can be done at the second stage. The script I proposed will work either way. For now though, I'd leave it without any fancy namespacing, to see how it all works out. Again, as a principle of least surprise if nothing more.
I was thinking this would defer some of the decision requirements, following the model of optional add-on packages: bees clearly doesn't need initramfs integration, and today doesn't need a separate startup package either, but that would be a logical next step at some point in the future. That would allow for policy selection through the package manager, or support for private vs public namespaces, or integration into someone else's mature sandboxing architecture. Upstream has to support all of these cases, but distros can make choices--one of which is to not use systemd at all.
Historically bees had very few configuration options and I've been able to squeeze the crawl state into a handful of backward-compatible integers, but that era is coming to an end. #205 (comment) has some of my thoughts from last year on the future of bees configuration. A year later, I'm not as enthusiastic about putting a .conf in $BEESHOME. To reach the high and low ends of the scalability spectrum, there are some 20 or 30 parameters to set, including the existing compile-time constants. A host might have SSD and HDD filesystems, with one .conf that has common rules for all, one that has rules for SSDs, one for HDDs; then the user can assemble specific .confs for each filesystem which refer to these (as sketched below). Other packages might conceivably want to throw in a .conf fragment of their own.
So we'd need a way to handle both cases. Case 2 also comes up if the bees defaults change, e.g. switching to a better scan mode when the user doesn't choose one explicitly. Currently we're stuck on one hash function because we haven't picked a way to coordinate with existing users to change to a better one.
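If the .conf files end up shell-sourceable (an assumption; no format is decided), the layering described above could be plain sourcing:

```sh
# /etc/bees/hdd.conf -- shared tuning for rotational media (invented knobs).
SCAN_MODE=extent
LOAD_TARGET=2

# /etc/bees/ba11fee1-dead-beef-cafe-000000000000.conf -- one filesystem's
# config pulls in a layer, then overrides what it needs.
. /etc/bees/hdd.conf
DB_SIZE=$((1024*1024*1024))
```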
bees does a lot here:
but bees could do more because there are two attacks left:
Both issues can be prevented at the expense of more complexity and runtime cost in bees, but both issues can also be prevented from outside by dropping bees into an empty namespace where it can only reach the target filesystem and its own working files.
This causes some new questions about where bees's status output lives and how tools outside the namespace find it.
I'm a fan of putting status in a directory (i.e. several status files under one path rather than a single file).
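The outside-in version of this (a namespace that reaches only the target filesystem) can be approximated today with util-linux. A sketch, with an example UUID; note this yields a private namespace, not an empty one, so the inherited-mounts caveat discussed later in the thread still applies:

```sh
UUID=ba11fee1-dead-beef-cafe-000000000000   # example
unshare --mount sh -c "
    # Stop mount events from propagating in either direction, then bring
    # in only the target filesystem and run bees on it.
    mount --make-rprivate /
    mkdir -p /run/bees/$UUID/mount
    mount -t btrfs -o subvolid=5 UUID=$UUID /run/bees/$UUID/mount
    exec bees /run/bees/$UUID/mount
"
```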
I haven't really looked at the namespace API to see how hard it would be for bees to sandbox itself. It doesn't seem that bad:
...end of list? bees uses file descriptors and btrfs ioctls for everything once it's running, so path access only matters at startup, before the sandbox goes up. If the user didn't give us the subvol root FD then it's a little harder:
...or something like that.
Please don't. If a sysadmin wants to isolate a program, they use any of the mentioned established tooling, and it works and behaves the same for just about any program. Having bees roll its own crap adds no value, much confusion, and even more incompatibilities and broken edge cases. It's a similar spirit to writing your own build system. What would this add over just using established methods, like with pretty much any other software?
External methods at best will still leave bees with access to unnecessary things while it's running, because they can't remove access to bees itself (or whatever the C++ runtime needs at startup, like shared libraries). The main reason bees doesn't simply sandbox itself at startup today is that we haven't figured out how to do it without crashing in libc. Internal sandboxing is a supplement to external sandboxing, not a replacement. You could start up bees in a namespace that contains only the bees binary and the target filesystem. The internal sandboxing would start from there, and drop access to the bees binary once it's running. External sandboxing can't remove the privileges that allow internal sandboxing to work, because they're also needed for the btrfs ioctls. If the btrfs ioctl privileges were lowered then maybe this would become a problem. There's no impact on the ability to umount the filesystem. It turns out that even if you lazy-umount the filesystem, bees will happily keep running on it, because it does everything through file descriptors. If the sandboxing fails, nothing else changes--bees can still work, it will just work with more access than it needs, which is roughly equivalent to what happens today. There is still the possibility of new external incompatibilities, like systemd flipping the default kernel mount subtree policy from private to shared. That kind of change is pretty disastrous--it broke our 10-year-old software even when it did use the established methods, because systemd unilaterally decided to ignore the established methods. Nothing we can do about systemd except run CI testing and adapt to the new broken. There is a bug in the plan I have above--the new namespace still has all its original mount points underneath, they're just no longer visible. That would kill the whole idea if there isn't an easy solution for it.
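The propagation flip, at least, is easy to check for on a given host:

```sh
# systemd makes / 'shared' by default; after mount --make-rprivate inside
# a namespace, the same column reads 'private' there.
findmnt -o TARGET,PROPAGATION /
```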
I have reconsidered and changed my opinion: I don't think there's any problem with config files as long as using them is not necessary for basic usage. Exposing the config options on the command line as well, so both stay equivalent, seems fine. I don't mind the proposed /etc/bees/$UUID.conf layout.
Or it can be left to the programs invoking bees.
Noting down some insights from #btrfs: the setup step probably shouldn't live inside the daemon binary itself.
Yes, this is definitely a valid point. So I guess we'll end up with a separate bees-setup script (it doesn't need to be a binary, a shell script will do just fine).
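A bees-setup sketch under the assumptions discussed so far. The sizing policy is a placeholder, since choosing the default is exactly the open question here, and the 128KiB rounding is an assumption about bees's hash-table granularity:

```sh
#!/bin/sh
# bees-setup (sketch): prepare .beeshome for one filesystem.
set -e
UUID=$1
MNT=/run/bees/$UUID/mount

mkdir -p "$MNT"
mountpoint -q "$MNT" || mount -t btrfs -o subvolid=5 "UUID=$UUID" "$MNT"

# Keep the database in a subvolume so snapshots of the filesystem
# don't drag copies of it along.
[ -d "$MNT/.beeshome" ] || btrfs subvolume create "$MNT/.beeshome"

# Example policy: 128MiB of hash table per TiB of filesystem, rounded
# down to a 128KiB multiple (assumption: bees wants a 128KiB multiple).
fs_bytes=$(df -B1 --output=size "$MNT" | tail -n1)
db_bytes=$(( fs_bytes / 8192 / 131072 * 131072 ))
truncate -s "$db_bytes" "$MNT/.beeshome/beeshash.dat"
chmod 700 "$MNT/.beeshome"
```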
A few notes on the script:
A few notes for me:
So we're actually back to the current beesd. I'm not sure we need two scripts (one for setup and another for startup); one can do it, because many functions in there are the same (finding the root filesystem, checking it is actually btrfs, etc). The only thing I dislike in the script I wrote in this context is the need to repeat all options for getopt. Maybe that can be avoided.
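One way to avoid the repetition: declare the long options once and derive both the getopt call and the usage text from the same string (sketch; option names are from the proposal above):

```sh
# Trailing ':' means the option takes an argument.
LONGOPTS="home:,root:,verbose:,configure"

usage() {
    echo "usage: bees $(echo "$LONGOPTS" | tr ',' '\n' \
        | sed 's/^/[--/; s/:$/ ARG/; s/$/]/' | tr '\n' ' ')UUID"
}

args=$(getopt -o '' --long "$LONGOPTS" -n bees -- "$@") || { usage >&2; exit 1; }
eval set -- "$args"
```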
Regarding sandboxing: would more systemd sandboxing be a middle ground compared to a full container? About starting bees for every fs automatically: I don't think that is a good option, since the fs might be IO-starved, e.g. starting bees on a spinning drive that has big chunks of data to go through.
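On the middle-ground question: a drop-in of stock systemd hardening directives might already go a long way, though the exact set needs testing against the ioctls and paths bees actually uses (sketch):

```sh
mkdir -p /etc/systemd/system/bees@.service.d
cat > /etc/systemd/system/bees@.service.d/harden.conf <<'EOF'
[Service]
# Read-only view of the OS; bees only needs to write under /run/bees.
ProtectSystem=strict
ReadWritePaths=/run/bees
PrivateNetwork=true
ProtectKernelTunables=true
RestrictSUIDSGID=true
EOF
systemctl daemon-reload
```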
This is collecting a few notes together on how bees should work, as opposed to how it currently does work.
How does it work now?
Currently there are some scripts which make key assumptions:
but there are problems:
How should it work?
Users should opt in to running bees
Merely installing the package on a general-use distro should not cause bees to run without user opt-in. Opt-in could be a question asked during the install, or using the distro's automation infrastructure to provide an answer, or deriving the answer from optional metapackage dependencies. e.g. the bees-autorun package might depend on the bees package, where bees-autorun contains a default startup script and bees contains the bees binary, and users opt into running bees by default at startup by installing bees-autorun.
If the distro is specialized (e.g. designed for NAS builds) then the opt-in could be implied by the choice to use the specialized distro.
Multiple btrfs is a thing
Each btrfs filesystem mounted on a host may have different constraints, where some are suitable for dedupe and some are not.
Each btrfs filesystem requires a separate bees instance, which multiplies resource usage (especially RAM cache size) accordingly.
A user with multiple filesystems might reasonably expect to choose which filesystems to run bees on, and which not.
Private namespaces are good
Ideally we fully sandbox bees so it can't access anything outside of the filesystem it's deduping (and also BEESHOME, because it might be on some other FS). This is relatively easy if we have the service mount the root subvol in a private namespace for bees to work on, and arguably bees should start doing that itself if we can figure out a way to make that happen without crashing in libc.
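With systemd, "mount the root subvol in a private namespace" can be one directive plus a pre-start mount. A sketch of the hypothetical bees@.service template, where %i is the filesystem UUID:

```sh
cat > /etc/systemd/system/bees@.service <<'EOF'
[Service]
# Private namespace: the root-subvol mount below is invisible to the
# rest of the system and disappears with the service.
PrivateMounts=yes
ExecStartPre=/usr/bin/mkdir -p /run/bees/%i/mount
ExecStartPre=/usr/bin/mount -t btrfs -o subvolid=5 UUID=%i /run/bees/%i/mount
ExecStart=/usr/sbin/bees /run/bees/%i/mount
EOF
```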
Public namespaces are also good
Ideally, if the user already has subvol=/ mounted somewhere (e.g. for backups or btdu), bees would be able to simply reuse that mount point.
Tracking mounts is hard, especially in the general case
Many users want to be able to mount and umount their filesystems[citation needed]. Ideally, we'd be able to start bees after the mounting and stop bees just before the umounting (though just after can also work in some cases).
One possible model is that we'd track all of the points where a filesystem is mounted, excluding the one bees is mounted on, across all namespaces including the private ones, and stop the service when bees holds the last remaining mount point of the filesystem. It's easy to tell when a filesystem is completely umounted, but hard to tell when exactly one mount point remains (or N mount points, if there are N tools like bees that users want to run at the same time, and they can't all share the mount point for some reason). This isn't a good model--only the kernel knows all the places where a btrfs filesystem is mounted, and bees can add delays on umount, especially for a "clean" (SIGTERM) bees shutdown where we might significantly delay umount for a slow removable disk.
Another model is to have the user bring the filesystem to bees. In this model, when users want bees to run on a filesystem, they must arrange for the filesystem to be mounted at /run/bees/UUID/mount. The bees service would then start and stop depending on whether that mount point was mounted, and normal inter-service dependencies can control the mount point (as sketched below).
Service dependencies could also work the other way, triggering the bees mount point and the bees instance when some other filesystem is mounted or umounted. This might be a model that distro maintainers can adapt to the quirks of their installer's filesystem layout, or advanced users might construct themselves. This suggests that we should stick to a common "run one bees instance" script and keep it loosely integrated with different bees instance generator scripts that can be swapped out for common use cases like "run everywhere by default" or "run on / and /home but not /var/lib/docker" or "only run on /var/lib/docker".
".bees must be careful to avoid changing btrfs mount options unintentionally
bees could host an upstream configuration tool which takes a btrfs mount point and sets up /etc/fstab or systemd mount units for it. Using a mount point (as opposed to the filesystem UUID) means the tool can copy the options from the user's existing mount point, which means a lower chance of getting them wrong (e.g. accidentally changing whether compression or autodefrag is enabled for the filesystem). On the other hand, now there are two copies of the mount options in the system to maintain.
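Copying the options from an existing mount point is nearly a one-liner with findmnt, modulo subvol handling:

```sh
# Reuse the options of the user's existing mount. Note: any subvol= in
# $opts must be stripped before adding subvolid=5 -- omitted here.
opts=$(findmnt -no OPTIONS /mnt/data)
uuid=$(findmnt -no UUID /mnt/data)
echo "UUID=$uuid /run/bees/$uuid/mount btrfs $opts,subvolid=5,noauto 0 0" >> /etc/fstab
```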
Users should have full control over the configuration after a default config is created
We can provide a tool which automatically generates service configuration, but once generated, the user should be able to edit or correct the output of the tool, or send a PR to improve the tool to make better configurations. This implies a compilation model rather than a framework model.
Users should choose the hash table size and RAM usage
bees's constant-size hash table is a central feature of the bees design--it uses the same amount of RAM no matter how large the filesystem gets. Different users will want different size-performance trade-offs (e.g. 1% for casual users on their laptops, 25% for dedicated storage boxes with huge RAID arrays) and therefore different hash table sizes.
We could adopt the PostgreSQL model, where the package installs a configuration with a minimal RAM cache size, a few dozen MB at most. Most users will increase it to something larger, typically 10-75% of RAM depending on how important DBMS performance is relative to other tasks in the host's workload, divided by the number of instances they intend to run.
We can also improve bees's efficiency to make better use of tiny hash table sizes, so that an affordable size like 16M could be "good enough" for casual users with a half-filled single 1TB SSD.
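For scale, the trade-off in numbers (illustrative only, not tuned recommendations):

```sh
# 1% of a 16GiB laptop:
echo $(( 16*1024*1024*1024 / 100 ))        # ~164MiB hash table
# 25% of a 256GiB storage box, split across 2 filesystems:
echo $(( 256*1024*1024*1024 / 4 / 2 ))     # 32GiB per instance
```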