bootmisc accidentally wiped non-/tmp settings #458
Comments
|
this goes against FHS recommendations, what OpenRC has been doing for decades, and what pretty much every other distro does. iirc, a good amount of the /tmp cleanup logic was inspired by logic Debian had. bootmisc already supports /tmp being mounted as tmpfs if the user wants that. add it to your /etc/fstab. OpenRC shouldn't be requiring either behavior. bootmisc already supports not wiping /tmp if the user doesn't want it. /etc/conf.d/bootmisc has a specific so i don't see us changing any defaults here. if you can see where there's a bug in the logic, we can fix it, but this is the first report i've heard, and this logic has been around for a long time. |
|
The |
|
For whatever reason (not clear; may have been a result of fs corruption, but the bootmisc script itself was not corrupted because I read it during recovery) the In any case, executing |
|
The function that cleans up temp directories is
|
|
I plan on spinning up a VM this evening and seeing if I can reproduce it. |
|
If it helps, the crash just prior to this was my attempt to run anbox. It failed to run and my system was left in some sort of weird state. I don't recall any specifics of it beyond that it seemed like rebooting would be a good idea... So it's possible it left something in /tmp that the script handled badly, though I can't think what. rm does not follow symlinks. |
|
if you saw the message
if you had a ton of files in if you had mounts inside of /tmp, then |
|
Since this was during boot I cannot think of any way there could have been mounts (bind or otherwise) inside of tmp. There would not be under normal runtime circumstances either. This could be mitigated with I think the most likely cause was actual filesystem corruption - perhaps |
|
i understand your preference. you're free to disable the behavior, either through existing knobs or by having /tmp be a tmpfs. but the fact is that clearing of /tmp is, and has been for decades, the standard in the *NIX world, and has been strongly recommended by FHS for decades (such that they would have mandated it if they felt FHS's scope included administration in general). changing this default behavior would be a strong disservice to users, the vast majority of which want this. so if we agree that the code as-is doesn't look like it would ascend into the / or otherwise start operating on paths not reachable via /tmp, then it sounds like there's nothing left to do or discuss here. |
|
In that case I'll just continue to follow up on the Alpine side about fixing this in a patch, and about this as yet another reason for moving off OpenRC. |
|
Hey @richfelker I would like to know more about why Alpine is looking into moving off of OpenRC; I would like to see if we can work on ways to fix your issues. |
|
@richfelker It looks like -xdev is an option for find, but it looks like -mount is the more portable option, so I would probably use that. @vapier Isn't it reasonable to assume that /tmp is on one file system? |
you make it sound like OpenRC is unique or weird in its behavior. as i clearly mentioned above, it is not, so any reasonable system you move to would have similar default behavior. unless you write something from scratch, in which case you're free to do whatever non-standard leaky stuff you prefer. it's not like there are that many choices out there anymore since systemd became the de facto Linux standard.
but as i noted, that only helps with find, not with rm, and the rm code path is the default and most common. |
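For reference, a minimal sketch of what a `find`-based wipe restricted to one filesystem could look like. The function name `clean_dir_xdev` is hypothetical and this is not the code in bootmisc; it only illustrates the `-xdev` primary discussed above, which stops `find` from descending into directories on a different device. As noted, it does nothing for the `rm` code path.

```shell
# clean_dir_xdev DIR
# Hypothetical helper: remove everything below DIR without crossing
# onto another filesystem. DIR itself is kept.
clean_dir_xdev() {
    dir=$1
    [ -d "$dir" ] || return 1
    (
        cd "$dir" || exit 1
        # -xdev:     do not descend into directories on another device
        # -depth:    visit children before their parent directory
        # ! -name .: skip the starting directory itself
        # find does not follow symlinks by default, so a symlink to a
        # directory is unlinked by rm rather than traversed.
        find . -xdev -depth ! -name . \
            \( -type d -exec rmdir {} \; -o ! -type d -exec rm -f {} \; \)
    )
}
```

A mount point inside the directory is left in place, because `rmdir` on it fails and nothing below it is visited.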
|
Why can't we just remove the |
|
if you read the code comments you'd see: i don't recall the exact scale needed before one starts to see a difference between the two, but i don't think it was "tons of files" |
|
A while ago, you noted that the logic in OpenRC originates from Debian (presumably by way of Gentoo baselayout-1). Looking at Debian, they never use Is this really something that needs performance optimization? |
|
yes, openrc is simply baselayout-2 rebranded and relicensed to BSD. using rm was a measurable difference, and people want their system to boot fast. Debian didn't really focus on that since everything in their init setup was slow. it's easy to only use numbers from the latest top of the line systems and hardware stacks and not care about older or embedded systems. I think systemd just uses tmpfs for /tmp, so they wouldn't need to clean (like we already do if you set your system up to use tmpfs). if Debian has added -xdev to their find, I'd be fine adding it to ours. an argument could be made in either direction, but feels mostly bike shedding. |
|
@vapier I thought at the time that -mount was more portable because of this wording in the find man page:
But, I'm fine with -xdev since that is in posix. |
|
and the reason we have the rm is because we had user reports that the /tmp cleaning step took a long time. I wonder if we can guide people towards the tmpfs method by default somehow |
|
I can make bootmisc complain on Linux systems if any directory in cleanup_tmp_dirs is not a mounted tmpfs; it would be a modification of the loop at line 207. |
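A rough sketch of how such a check might work, assuming the Linux `/proc/mounts` format (device, mount point, filesystem type, options). The function name `is_tmpfs_mount` and the optional second argument are hypothetical, added here so the check can be run against a sample file; this is not the proposed patch.

```shell
# is_tmpfs_mount DIR [MOUNTS_FILE]
# Hypothetical helper: succeed if DIR is a mount point whose filesystem
# type is tmpfs. MOUNTS_FILE defaults to /proc/mounts (Linux only).
is_tmpfs_mount() {
    dir=$1
    mounts=${2:-/proc/mounts}
    # Field 2 is the mount point, field 3 the filesystem type.
    # The last matching line wins, since later mounts shadow earlier ones.
    # Limitation: mount points containing spaces appear escaped (\040)
    # in /proc/mounts and will not match here.
    awk -v d="$dir" '$2 == d { t = $3 } END { exit !(t == "tmpfs") }' "$mounts"
}
```

A caller could loop over the configured directories and print a message for each one where the function fails.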
|
@kaniini If we do this, would you want the warning to be on the 0.44.x branch? |
|
@vapier Do you have any more specific info about why we need the |
|
Yes, we should have a deprecation warning if the behavior is planned to change. Backporting would be appreciated. |
|
@vapier What do you think about attempting the following instead of the |
|
not using tmpfs is not, and should not be, a problem. we should not be warning people or making it seem like there is a problem with their system. there are plenty of legitimate reasons why one would not use or want a tmpfs mount at /tmp. we should recommend that people use one if they can, but that's it. we don't install the fstab file, but we can update the Gentoo default in baselayout and the handbook to mention it. that should at least help with new installs. other distros are also free to adjust their own setups and defaults accordingly without openrc being involved in their decision tree. /tmp can be full of cruft from the previous boot, and exist on slow media (like MMC or spinning media). running find in the first place is slower, and running it twice is worse. we shouldn't be penalizing everyone in order to support corrupted systems. this is the wrong tradeoff. we don't even know if this was the reason for the reported breakage in the first place -- a corrupt or hard linked directory could make the rootfs appear under /tmp and not as a mount (i.e. xdev wouldn't help). a corrupt filesystem could lead to data loss even just by writing to it regardless of file removal. it's impossible to protect against data loss when the system is already screwed. this is the entire reason we run fsck early on, and we don't code defensively against every possible method of corruption in every filesystem related operation. so I'll reiterate: if you don't like clearing /tmp due to some philosophical reason, then we already offer multiple alternatives. |
Can you cite these reports? how long ago were they? could this be some legacy thing that isn't an issue any longer?
I see that guidance being two parts.
|
|
this code stretches back decades, and some devs have a history of not documenting things well when they make commits (this is a huge ongoing problem in Gentoo even today), so unfortunately i only have my recollection of events and the code comments left. but as i said, i don't think telling people to buy modern big hardware is the answer to getting a fast boot. "legacy" is just code for "not my current desktop". the power & resource constraints of desktops from 10 years ago aren't that far off from those of embedded systems today, or of containers/microservices. if you want to put something in bootmisc, you can update the something like: |
|
Hi, @vapier didn't write that original conversion, but you need to go deep into history to see where it changed, and why. https://gitweb.gentoo.org/proj/baselayout.git/commit/init.d?h=fb86ee4424c8918a2e006ab24cf3517eecf61c02 is where it switched from The crucial thing is that find's If we're willing to be NOT-POSIX, then absolutely, switching to Otherwise, we're back to the point of Wiping
|
|
If |
|
I was thinking about the overall best course of action here, and I think it's likely to just go BACK to Debian's solution, which has evolved to be tmpfiles-based and inherited from systemd:
Regarding Maybe detection of |
|
@robbat2 @richfelker @kaniini @vapier How does the pr look now? Is this ok? |
|
our usage of tmpfiles.d today is pretty light. that said, i'm totally fine with making it a hard requirement and going all in on it. it'd allow us to deprecate the @williamh your PR is buggy & inconsistent. maybe focus on tmpfiles.d and avoid the debate. |
|
@vapier If I can avoid tmpfiles.d I'd rather do that, systemd-tmpfiles isn't portable. |
|
@vapier can you please clarify your statement on how |
i'm talking about the tmpfiles.d specification, not a specific implementation of the runtime. systemd-tmpfiles itself should work on all Linux systems. it builds independently of systemd and can be installed in isolation. Gentoo has more than proved that fact. opentmpfiles is a POSIX shell script that can run anywhere (ignoring the problems it had that we could rectify, but are choosing not to and instead telling people to use systemd-tmpfiles). we could delete the vast majority of bootmisc by installing a single tmpfiles.d entry, and leave it to users to add more. then say OpenRC requires a tmpfiles.d implementation at runtime and we're done. if you're extremely concerned about providing a tmpfiles.d that runs under *BSD's, then you could always dust off opentmpfiles (and my partial conversion to a standalone C implementation). continuing to bend over backwards to maintain our own ad-hoc
i didn't say it was safer, i said being defensive in the face of filesystem corruption is a fool's errand, and thus they're equiv at that point
i think you're misunderstanding things here. if /tmp/foobar is a hard linked directory to / or /usr, then yes, i'm aware that Linux tends to ban them, but that's at the level of where it won't allow them to be created via syscalls. in many filesystems, it is possible to create hardlinks in the FS itself via debugfs or raw FS manipulation, and since we're talking about FS corruption, that is possible. in fact, CrOS recently had a persistence root exploit that involved creating a directory hardlink in an EXT4 FS via debugfs. |
|
@vapier If you feel they are in fact equivalent, then you should have no fundamental objection to using I agree that using If Gentoo is about choice, it SHOULD be possible to ship a As for @kaniini's other request about being able to mostly disable the functionality:
That historical list:
And the other to clarify: what files in
|
|
Alpine would prefer to avoid tmpfiles.d, as that opens the Pandora's box of importing systemd components, or writing our own implementation. We could write our own implementation, but that would leave us permanently chasing systemd. We chose to do so with pkg-config, because pkg-config was not in a developmental state where we had to worry about chasing it. |
please do not extrapolate what i've said in directions that i did not say or mean. i clearly stated that, in the context of a corrupt filesystem, these commands are effectively equivalent wrt safety. that has nothing to do with their speed or overhead costs in the non-corrupt case, which is to say, the 99.9999% of cases.
you already have a choice: pick whatever tmpfiles.d implementation you want, or write your own. much like you can pick your own POSIX compliant shell. or POSIX find implementation. or any other low level program. that's not the same as OpenRC providing multiple parallel implementations. it's a bit of a stretch to try and say "it's a matter of Gentoo choice" here.
i already documented how to do that. (*) currently the minimal set you can attain is what i already highlighted -- ICE-unix & X11-unix initialization.
moving individual entries out of the historical list and to their relevant packages sounds fine to me. but that's still orthogonal to the default blanket wipe (which we should retain), and to the fact that it's already possible to disable these things. |
|
It was pointed out to me that the -name switch was on the original find for a reason, so I put that back and added -xdev to all of the find calls. Also, if we make the directory, we don't need to wipe it since it will be empty. |
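The create-or-wipe decision described above could be sketched roughly as follows. The function name `ensure_tmp_dir` and the idea of passing the wipe command as an argument are assumptions for illustration, not the structure of the actual PR.

```shell
# ensure_tmp_dir DIR WIPE_CMD
# Hypothetical helper: wipe DIR with WIPE_CMD only if it already exists.
# A directory we create ourselves is empty, so there is nothing to wipe.
ensure_tmp_dir() {
    dir=$1
    wipe=$2
    if [ -d "$dir" ]; then
        "$wipe" "$dir"
    else
        # 1777: world-writable with the sticky bit, as is usual for /tmp
        mkdir -p "$dir" && chmod 1777 "$dir"
    fi
}
```

Keeping the wipe step as a separate command means the same wrapper works whether the wipe is done with `rm` or with `find -xdev`.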
|
@richfelker please comment if you can also. |
kaniini commented Oct 7, 2021
In Alpine, we had an incident where a user's box got rm -rf'd due to the bootmisc script: https://gitlab.alpinelinux.org/alpine/aports/-/issues/13070
It seems to me that we should just mount a tmpfs at /tmp, and not "wipe" anything there at all.