New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
perf(90kernel-modules): use AWK instead of shell monster #2002
Conversation
~120ms -> ~2ms Downstream: https://bugs.debian.org/1017411
I continue to fail to see where you get the idea that this ends up in the initrd. |
I just realized as well, so I deleted my earlier comment as it is not relevant, sorry about that. I will try to take a closer look later, but this PR seems like a good optimization. |
Thanks for finding this. I'd use |
@nabijaczleweli, would you consider measuring grep as well to make a decision between grep and awk for this optimization ? |
I did; it's benchmark 2 in OP |
I'd prefer to keep the spec file, but it needs to be updated and actively tested (which it currently is not). |
Sorry. missed it. Thank you ! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Only the test if $srcmods/modules.dep exists is missing, do awk may fail if the file does not exist. Besides that it LGTM. |
That bit is covered by also uhoh i clicked a control by accident, my b |
I'm rather in favor of dropping awk as well....
I'll send PR with backports from Fedora (there's not too much difference). AFAIK we actually use it without any modifications in c9s (and therefore RHEL9): https://gitlab.com/redhat/centos-stream/rpms/dracut/-/blob/c9s/dracut.spec |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removing my earlier approval based on recent comments advocating using grep instead of awk
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removing my earlier approval based on recent comments advocating using grep instead of awk
Given that we have a Debian contributor who could probably not care less about how things are being package in Fedora/RHEL or (Open)Suse yet the reviewers here seemingly seem to be basing their review on how this contribution affects downstream packaging here's a kindly reminder to the reviewers... Which packaging solution and how downstream decides to package things does not dictate or decide how or whether we ack or nack PR from contributors or otherwise requires them to make changes to their PR's. Reviews should not be influenced how things are being packaged in the distribution the reviewer currently uses or how it affect his or hers maintenance downstream if he or she happens to be a downstream package maintainer for dracut. |
Wait so let me get this straight: everyone likes this approach except fedora maintainers who would rather spew pages here than fix their packages to include a dependency on literally any AWK implementation, that dracut already uses in base host-side code, and their inability to add |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My preference would be that for "basic dracut" the project would only introduce a new dependency for a well justified reason. "basic dracut" here means functionality that most dracut user use.
This opinion is based on dependency management and not on distribution packaging.
I understand dracut ("host side") already depends on awk, but in practice that seems to be for zfs only. This PR makes basic functionality depend on awk (technically this PR is for a dracut module and not core dracut, but in practice most dracut installations would execute this code).
The patch sent to Debian by @nabijaczleweli is with grep and not with awk - https://bugs.debian.org/cgi-bin/bugreport.cgi?att=1;bug=1017411;filename=150ms.patch;msg=10
I suspect most dracut users have gawk installed and based on the perf test "on gawk systems grep|cut is faster".
I did not mean to block this PR from landing. Not sure how to tell this to Github given that I approved it earlier.
I am going to re-approve it now, but I prefer not to land it and continue the discussion.
Yes, you may also notice that that patch is trivially broken. If it were correct, I'd obviously prefer Just grep, no surprises there. Here's popcon stats for mawk vs gawk installations in Debian and its derivatives (https://qa.debian.org/popcon-graph.php?packages=dracut+dracut-core+mawk+nawk+gawk&show_installed=on&want_percent=on&want_legend=on&want_ticks=on&from_date=&to_date=&hlght_date=&date_fmt=%25Y-%25m&beenhere=1; untick percent for first): Here we can see that, in a.d. 2022, the decided majority of users has mawk installed but not gawk installed, and this has been the trend since ~2007 (for those that have both, gawk takes precedence as awk by default (Priority: 10 for gawk and Priority: 5 for mawk), so that's the upper limit on gawk use – users are free to pick mawk and I know anecdotally that I and the users I know do; there's no alternatives data reported by popcon). Given the above, there appears to be 0 reason to fret about Pulling In An AWK, because all systems have one, and very little reason to expect Most Users to have awk=gawk instead of awk=mawk. |
I'm sorry, I disagree. Since distributions are the ones contributing; shipping; and using it, their feedback absolutely matters. Note that from my perspective in primarily RPM-based distributions, I was fine with it and approved it. @pvalena was expressing a preference but not blocking it, @LaszloGombos was fine with it as well. Of course the Debian contributor's opinion matters too, the point of us working together upstream is to resolve these kinds of things and get a solution we're all happy with. If nobody is happy with what we're doing upstream, then they're just going to fork it or use something else. For the adoption of dracut to continue, we need to work with our stakeholders to ensure their success too. |
You can disagree all you want but the reality is that there are individuals from within distribution that make contributions not distributions themselves which is what matters and the fact is we cannot be everything to everyone nor can we have everything we do be dictated and decided by downstream distro's, their packaging solution, politics or release cycles and we most certainly wont be partaking in any downstream packaging solution here upstream anymore. On top of that we are bound to progress at the rate that the upstream components that we rely on evolve not somekind of downstream distribution release cycles, enterprise or otherwise and the fact is there will always be competition, heck there seems to be new initramfs tool popping up every year in a new language and in a new manner and there is absolutely nothing wrong with that since that's a good thing. In the end of the day the consumers will always choose the tool that works best for them not what we try to be or we think works best for them, sometimes that tool is dracut and we get feedback and contribution flow here upstream as a result of that and sometimes some other project gets feedback and contribution flowing to it and there is absolutely nothing wrong with that we dont have to strive for world domination, there is enough space for multiple initramfs tools in the f/oss ecosystem. |
IIUC that patch lacks the
Yes, this is what I implicitly meant in #2002 (comment). Usually new dependencies to external packages are added for a good reason, and if the same functionality can be implemented using a current dependency, it sounds better, although the dependency is a package usually installed by default. |
Paraphrasing Debian#1017411:
In searching for low-hanging time fruit, I found that 90kernel-modules does
which after shedding the bashisms is
or
so
So on GAWK systems grep|cut is faster and on MAWK systems awk is faster.
Not that it's by much, so it's editor's choice, really.
Checklist