Implement OOM score adjustment #435
should send the ionice change as a sep PR so we can get that in independently (i know it's a sep commit, but GH doesn't make it easy to merge PR's partially) imo we should avoid short options unless there's a commonish meaning behind it. for example, |
Fine by me. I hate short options, especially in scripts where their meaning is never clear when I go back to look at the script I wrote even a week later. I only defined a short option because every other option had a short form. If you're happy to break that pattern, I'm happy to remove As for splitting into two PRs, how should I do that? The PR to add the Regarding merging, you're aware that you can merge PRs using the command line, right? GitHub will pick up the merge and reflect it properly on the web. I think if you merge only the first commit of this PR (which you could/should do as a FF, in my opinion), then GitHub will leave this PR open. |
Hi @whitslack, Yes, both prs should target master and one will have to be rebased after the other is merged. I do know that you can merge on the command line, but you get the full pr still when you do that. I agree also with @vapier that we should move away from short options. If I have my way I would deprecate short options where possible. |
@williamh: I'm confused. You don't have to merge from the tip of the PR branch.
What are you saying GitHub does in that case? Admittedly I've never done it, but my expectation is that the PR would be left open and would remain mergeable with no conflicts.
No, that's not necessary. You just leave the short option out of the |
83779b9 to 4a0ea5e
i'm not anti all short options, just using short options that are not common and/or obvious.
@vapier: I used Note that 0x1000 is going to make your switch table much sparser, which may not be as good for code size. |
def do not use values <127. it's confusing to read, will be even more confusing once we exhaust the limited non-printable ASCII space, and can be easy to introduce collisions if you're not careful (since they're all ints, i would actually look at codegen in -O2/-Os levels before deciding between 0x1000 & 0x80 base. i suspect it won't be that big of a deal, but i don't care enough at that point to bikeshed :p. |
etc/rc.conf
Outdated
@@ -120,6 +120,8 @@
 # Or the ionice level. The format is class[:data] , just like the
 # --ionice start-stop-daemon parameter.
 #SSD_IONICELEVEL="2:2"
+# Or the OOM score adjustment.
+#SSD_OOMSCOREADJ="-1000"
is -1000 a good default to leave here? it basically means "never allow OOM to kill this". if you want to keep that, it'd be worth a comment, and a ref to the appropriate man page for more details.
I've used -100 for my own critical services. Would you find that a more palatable suggestion?
i'm not saying -1000 is never a good idea, but i strongly doubt that it should be the suggested default that people get by uncommenting a single line. the number of services that you'd want using this on any given system would usually be extremely small (probably like <5).
plus, config files with commented out lines are supposed to represent the actual default rather than some other random value. so here it'd be better to use #SSD_OOMSCOREADJ="0" (which i think is the default behavior, but i don't have a way of easily checking), and then leave a comment pointing people to more extensive documentation which would cover valid ranges, as well as the underlying meaning & implications of these settings, as well as some recommended conventions.
this kind of thing is why i suggested sep PR's for sep issues ;).
this kind of thing is why i suggested sep PR's for sep issues ;).
This kind of thing is why I put the bugfix first and the controversial stuff subsequent, so the bugfix could be merged while we continue to hash out the details of the enhancement. ;)
I agree that the default should be the value in the commented-out example config option, but that's not the case for the other options currently. (The default nice level is not -19, and the default I/O nice is not 2:2.)
we can/should fix the other bad ones ;).
i think there's a few reasonable methods for config files:
1. the commented value is the default
2. the inline docs clearly mention that the value is changing behavior from the default
3. provide both so the default is clear, and can list one or two common alternatives that users would want without having to refer to extensive manual pages
since our configs have largely been doing (1), we should favor consistency here and just switch them all to that.
this kind of thing is why i suggested sep PR's for sep issues ;).
This kind of thing is why I put the bugfix first and the controversial stuff subsequent, so the bugfix could be merged while we continue to hash out the details of the enhancement. ;)
this is possible if doing manual git operations from the CLI, but afaik, not possible at all via the web interfaces.
i'm comfortable with manipulating/extracting git objects, but i don't think it's common, and even then, GH doesn't exactly make it an easy process. figuring out the exact remote & ref to manually fetch is tedious.
a sep PR makes it a lot easier to just smash the merge button. basically GH is tailored for flows where a single PR maps to a single "feature", and not tailored for unrelated things. ignoring the "here's a grab bag of fixes that everyone agrees on" type of thing.
my main desktop had hardware failures recently, so i've (temporarily) lost easy access to do any manual surgery from the CLI.
- the commented value is the default
since our configs have largely been doing (1), we should favor consistency here and just switch them all to that.
The slight gotcha here is that setting SSD_NICELEVEL="0" will not actually cause start-stop-daemon and supervise-daemon to set the priority of the daemon process to 0. Rather, no change in priority will be applied. The same goes for the new SSD_OOMSCOREADJ variable: setting it to 0 is equivalent to leaving it unspecified: no change to the value will be applied. Granted, the expectation is that start-stop-daemon will be exec'd in a process that already has priority 0 and OOM score adjustment 0, but that's not necessarily the case. I'm not sure that the commented-out suggestions for these variables should have values that are no-ops.
Perhaps I ought to choose an out-of-range value as the sentinel ("unset") value so that explicitly specifying 0 for these options will cause the priority/oom_score_adj value to be affirmatively set to 0. Then the commented-out suggestions in the example config file could all meaningfully have 0 values.
Perhaps I ought to choose an out-of-range value as the sentinel ("unset") value so that explicitly specifying 0 for these options will cause the priority/oom_score_adj value to be affirmatively set to 0. Then the commented-out suggestions in the example config file could all meaningfully have 0 values.
Okay. I'll change it.
Are you certain that |
I also don't care to bikeshed. I'm just supporting my argument that making a sparse switch is less code-efficient. I'll implement whatever you say. |
i think you're missing some aspects of how locales work in POSIX runtimes. nowhere in openrc do we call that said, when i added the so only use <0x80 values in getopts_long when it's actual printable characters. if we're opting the short option, use >=0x80. |
That may be true now. I'd argue it's better to avoid the fragility of assuming anything about running in any particular locale, even the POSIX one. You don't want to tie your hands for the future in case you ever want to implement something where you'd want to depend on the user's preferred locale.
Okay, I'll base the long-only option value at 0x80. |
making assumptions about POSIX locale behavior is the entire point of the POSIX locale. it's an exact minimal definition, and anything that deviates from it is horribly broken and deserves to blow up and is not our problem. it'd be like saying you can't make assumptions about ASCII codepoints. i'm not tied to the use of isprint in the usage code as i alluded earlier. switching to a hardcoded 0x80 test would be fine. which would really be replacing one set of assumptions with another (in this case, encoding ASCII codepoint assumptions at compile time about C compiler behavior). but i'll also note that writing locale portable code is a fun challenge by itself, and this particular aspect would hardly be the biggest issue. i'll wager that we'll never need to support locale in our tools directly, and even if we wanted to, doing anything other than UTF-8 would be a waste of time, and supporting non-ASCII CLI flags would be a terrible idea. |
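The convention settled on above (printable characters for real short options, values of 0x80 and up for long-only options) can be sketched with getopt_long as follows. This is only an illustration under stated assumptions: the names demo_opts, demo_longopts, demo_parse, and LONGOPT_OOM_SCORE_ADJ are hypothetical and are not taken from the OpenRC sources.

```c
#include <getopt.h>
#include <stdlib.h>

/* Hypothetical identifiers for illustration; not OpenRC's actual tables. */
enum {
	/* Long-only options start at 0x80, above the 7-bit ASCII range,
	 * so they cannot collide with any printable short option. */
	LONGOPT_BASE = 0x80,
	LONGOPT_OOM_SCORE_ADJ = LONGOPT_BASE
};

struct demo_opts {
	int nicelevel;
	int oom_score_adj;
	int errors;
};

static const struct option demo_longopts[] = {
	/* Has a short form, so its value is the printable character. */
	{ "nicelevel",     required_argument, NULL, 'N' },
	/* Long-only: its value is outside the printable range. */
	{ "oom-score-adj", required_argument, NULL, LONGOPT_OOM_SCORE_ADJ },
	{ NULL, 0, NULL, 0 }
};

static struct demo_opts demo_parse(int argc, char **argv)
{
	struct demo_opts o = { 0, 0, 0 };
	int c;

	optind = 1; /* start scanning at the first argument */
	opterr = 0; /* count errors ourselves instead of printing them */
	while ((c = getopt_long(argc, argv, "N:", demo_longopts, NULL)) != -1) {
		switch (c) {
		case 'N':
			o.nicelevel = atoi(optarg);
			break;
		case LONGOPT_OOM_SCORE_ADJ:
			o.oom_score_adj = atoi(optarg);
			break;
		default:
			o.errors++;
			break;
		}
	}
	return o;
}
```

Because the short-option string passed to getopt_long contains only "N:", the long-only option has no single-character spelling, and a usage printer could skip any entry whose value is 0x80 or greater.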
src/rc/start-stop-daemon.c
Outdated
@@ -790,6 +803,15 @@ int main(int argc, char **argv)
 		eerrorx("%s: ioprio_set %d %d: %s", applet,
 		    ionicec, ioniced, strerror(errno));

+	if (oom_score_adj != 0) {
+		fp = fopen("/proc/self/oom_score_adj", "w");
+		if (! fp)
omit the space after the !
Should I change the other occurrence(s) of this at the same time? It's not my style; I was mimicking what's already there.
sure, if you want to adjust that by itself, sounds fine
There are no semantic changes in this commit. Suggested-by: Mike Frysinger <vapier@gentoo.org> See: OpenRC#435 (review)
…_adj 0

Previously, specifying --nicelevel=0 or --oom-score-adj=0 was equivalent to not specifying the option. This meant that there was no way to set the nice level or OOM score adjustment of the launched daemon process affirmatively to 0. This commit changes the sentinel values, indicating an unspecified option, from 0 to an out-of-range value (INT_MIN) so that specifying an option value as 0 will actually cause the value 0 to be applied to the corresponding process knob.

Additionally, per a suggestion by Mike Frysinger, the suggested values for the SSD_NICELEVEL, SSD_IONICELEVEL, and SSD_OOMSCOREADJ variables in the example config file are now given as zeros, which are the kernel's default values of these process knobs for the init process at boot. Note that uncommenting any of these zero-valued suggestions will cause SSD/SD to set the corresponding process knob affirmatively to zero, whereas leaving the variable unset (and the equivalent command-line option unspecified) means SSD/SD will not change the corresponding process knob from its inherited value.

See: OpenRC#435 (comment)
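The sentinel change described in this commit message can be pictured with a small sketch. The names KNOB_UNSET, knob_is_set, and knob_resolve are hypothetical and exist only for illustration; the one detail taken from the message is that INT_MIN, which is out of range for both knobs, marks an option as unspecified.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical sentinel name; INT_MIN is outside the valid range of both
 * the nice level and oom_score_adj, so it can never be a real request. */
#define KNOB_UNSET INT_MIN

/* True when the user supplied a value explicitly, including 0. */
static bool knob_is_set(int requested)
{
	return requested != KNOB_UNSET;
}

/* Returns the value the process ends up with: the requested value when
 * one was given, otherwise whatever was inherited from the parent. */
static int knob_resolve(int requested, int inherited)
{
	return knob_is_set(requested) ? requested : inherited;
}
```

With a sentinel of 0, a request for 0 would be indistinguishable from no request at all, so a daemon started from a parent with a non-zero inherited value could never be reset to 0.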
@whitslack Please rebase this on current master and I'll take a look. |
This commit adds a new --oom-score-adj option to start-stop-daemon and supervise-daemon, as well as an equivalent SSD_OOMSCOREADJ environment variable. If either of these are specified (with the command-line option taking precedence), then the specified adjustment value is written to /proc/self/oom_score_adj after forking but prior to exec'ing the daemon (at the time when nice and ionice are applied).

Additionally, per a suggestion by Mike Frysinger, the suggested values for the SSD_NICELEVEL, SSD_IONICELEVEL, and SSD_OOMSCOREADJ variables in the example config file are now given as zeros, which are the kernel's default values of these process knobs for the init process at boot. Note that uncommenting any of these zero-valued suggestions will cause SSD/SD to set the corresponding process knob affirmatively to zero, whereas leaving the variable unset (and the equivalent command-line option unspecified) means SSD/SD will not change the corresponding process knob from its inherited value.

See: OpenRC#435 (comment)
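The write step named in this commit message can be sketched as a small helper. This is not the code from the patch: write_oom_score_adj is a hypothetical name, and the path is taken as a parameter purely so the sketch can be exercised against an ordinary file instead of /proc/self/oom_score_adj.

```c
#include <stdio.h>

/* Hypothetical helper: write an adjustment value to the given file.
 * Returns 0 on success and -1 on failure, leaving errno as set by the
 * call that failed. */
static int write_oom_score_adj(const char *path, int value)
{
	FILE *fp = fopen(path, "w");

	if (!fp)
		return -1;
	if (fprintf(fp, "%d\n", value) < 0) {
		fclose(fp);
		return -1;
	}
	/* stdio buffers the text, so the write may only reach the kernel
	 * at fclose(); its result has to be checked as well. */
	return fclose(fp) == 0 ? 0 : -1;
}
```

In a daemon launcher the call would sit in the child process between fork() and exec, with "/proc/self/oom_score_adj" as the path, so that the adjustment applies to the daemon and not to the launcher itself.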
1238680 to 4f43085
@williamh: Rebased (and squashed) as requested. |
@whitslack All of the code style adjustments in this pr make it hard to review. Are you willing to fix the pr to only contain the code for the oom score adjustment implementation? |
@williamh: 😅 That's how it was to start, but then @vapier asked for a code style adjustment. I put all the code style changes in their own commit with no other changes, so if you're looking to review only the OOM score changes, they're all in d779c5e. |
i peeled the style change out and merged it by itself. can you rebase please ? |
I just did the same with the error message fix; it is merged into master. |
@vapier Ok, that's what I thought, I'm working on that locally. |
This commit adds a new --oom-score-adj option to start-stop-daemon and supervise-daemon, as well as an equivalent SSD_OOMSCOREADJ environment variable. If either of these are specified (with the command-line option taking precedence), then the specified adjustment value is written to /proc/self/oom_score_adj after forking but prior to exec'ing the daemon (at the time when nice and ionice are applied).

Additionally, per a suggestion by Mike Frysinger, the suggested values for the SSD_NICELEVEL, SSD_IONICELEVEL, and SSD_OOMSCOREADJ variables in the example config file are now given as zeros, which are the kernel's default values of these process knobs for the init process at boot. Note that uncommenting any of these zero-valued suggestions will cause SSD/SD to set the corresponding process knob affirmatively to zero, whereas leaving the variable unset (and the equivalent command-line option unspecified) means SSD/SD will not change the corresponding process knob from its inherited value.

See: OpenRC#435 (comment)

code style: remove space after unary "not" operator

There are no semantic changes in this commit.

Suggested-by: Mike Frysinger <vapier@gentoo.org>
See: OpenRC#435 (review)
4f43085 to 83550a7
If you make that change, then you should also change |
to be clear, the nice settings don't have corresponding files whose names would be matched to. but i don't disagree with using |
This PR adds a new --oom-score-adj option to start-stop-daemon and supervise-daemon to complement the existing --nicelevel and --ionice options. Often it is advantageous to depress the OOM score of critical system services so that the kernel's OOM killer will prefer to kill more "disposable" processes in OOM scenarios. SSD and SD previously lacked a clean way to adjust the OOM scores of the daemons they start. With this patchset, now the oom_score_adj of launched daemons can be specified either by the new command-line option or by a new SSD_OOMSCOREADJ environment variable, which is akin to SSD_NICELEVEL and SSD_IONICELEVEL.

Note: While crafting this enhancement, I noticed that supervise-daemon was missing support for the SSD_IONICELEVEL environment variable, as though it had been overlooked when such support was added to start-stop-daemon, so I took the opportunity to fill in the missing implementation (in a separate commit, which precedes the commit that holds the real meat of this PR).
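As a rough illustration of how a value might be chosen between the command-line option and the SSD_OOMSCOREADJ environment variable, here is a minimal sketch. The function names and the OOM_ADJ_UNSET marker are hypothetical and not from the patch. The -1000 to 1000 bounds are the range the Linux kernel accepts for oom_score_adj, and the command line taking precedence follows the commit message in this PR.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical marker meaning "no adjustment requested". */
#define OOM_ADJ_UNSET INT_MIN

/* Parse a decimal string into *out, accepting only the kernel's
 * oom_score_adj range of -1000..1000. Returns 0 on success, -1 on error. */
static int parse_oom_score_adj(const char *s, int *out)
{
	char *end;
	long v;

	if (!s || !*s)
		return -1;
	errno = 0;
	v = strtol(s, &end, 10);
	if (errno != 0 || *end != '\0' || v < -1000 || v > 1000)
		return -1;
	*out = (int)v;
	return 0;
}

/* The command-line argument, when present, wins over the environment
 * value; when neither is present the result is OOM_ADJ_UNSET. */
static int resolve_oom_score_adj(const char *cli_arg, const char *env_val,
    int *out)
{
	const char *src = cli_arg ? cli_arg : env_val;

	if (!src) {
		*out = OOM_ADJ_UNSET;
		return 0;
	}
	return parse_oom_score_adj(src, out);
}
```

A caller would pass the option argument (or NULL) as cli_arg and the result of getenv("SSD_OOMSCOREADJ") as env_val, and would skip the write entirely when the result is OOM_ADJ_UNSET.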