Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf(90kernel-modules): use AWK instead of shell monster #2002

Merged
merged 1 commit into from Oct 22, 2022

Conversation

nabijaczleweli
Copy link
Contributor

@nabijaczleweli nabijaczleweli commented Oct 13, 2022

Paraphrasing Debian#1017411:

In searching for low-hanging time fruit, I found that 90kernel-modules does

find_kernel_modules_external() {
    local a

    [[ -f "$srcmods/modules.dep" ]] || return 0

    while IFS=: read -r a _ || [[ $a ]]; do
        [[ $a =~ ^/ ]] && printf "%s\n" "$a"
    done < "$srcmods/modules.dep"
}

find_kernel_modules_external | instmods

which after shedding the bashisms is

grep ^/ "$srcmods/modules.dep" 2> /dev/null | cut -d: -f1 | instmods

or

awk -F: '/^\// {print $1}'  "$srcmods/modules.dep" 2> /dev/null | instmods

so

$ find_kernel_modules_external() {
>     local a
> 
>     [[ -f "$srcmods/modules.dep" ]] || return 0
> 
>     while IFS=: read -r a _ || [[ $a ]]; do
>         [[ $a =~ ^/ ]] && printf "%s\n" "$a"
>     done < "$srcmods/modules.dep"
> }
$ export -f find_kernel_modules_external
$ export srcmods=/lib/modules/5.10.0-17-amd64
$ hyperfine -S'bash --norc' 'find_kernel_modules_external' 'grep ^/ "$srcmods/modules.dep" 2> /dev/null | cut -d: -f1' 'mawk -F: '\''/^\// {print $1}'\''  "$srcmods/modules.dep" 2> /dev/null' 'gawk -F: '\''/^\// {print $1}'\''  "$srcmods/modules.dep" 2> /dev/null'
Benchmark 1: find_kernel_modules_external
  Time (mean ± σ):     121.2 ms ±   4.8 ms    [User: 92.9 ms, System: 28.3 ms]
  Range (min … max):   115.8 ms … 131.2 ms    22 runs

Benchmark 2: grep ^/ "$srcmods/modules.dep" 2> /dev/null | cut -d: -f1
  Time (mean ± σ):       4.0 ms ±   0.3 ms    [User: 3.7 ms, System: 2.0 ms]
  Range (min … max):     2.5 ms …   4.6 ms    648 runs

Benchmark 3: mawk -F: '/^\// {print $1}'  "$srcmods/modules.dep" 2> /dev/null
  Time (mean ± σ):       2.8 ms ±   0.5 ms    [User: 1.8 ms, System: 1.3 ms]
  Range (min … max):     1.3 ms …   3.4 ms    596 runs

Benchmark 4: gawk -F: '/^\// {print $1}'  "$srcmods/modules.dep" 2> /dev/null
  Time (mean ± σ):       6.0 ms ±   0.4 ms    [User: 4.4 ms, System: 2.0 ms]
  Range (min … max):     4.0 ms …   6.9 ms    333 runs

Summary
  'mawk -F: '/^\// {print $1}'  "$srcmods/modules.dep" 2> /dev/null' ran
    1.43 ± 0.29 times faster than 'grep ^/ "$srcmods/modules.dep" 2> /dev/null | cut -d: -f1'
    2.16 ± 0.44 times faster than 'gawk -F: '/^\// {print $1}'  "$srcmods/modules.dep" 2> /dev/null'
   43.53 ± 8.51 times faster than 'find_kernel_modules_external'

So on GAWK systems grep|cut is faster and on MAWK systems awk is faster.
Not that it's by much, so it's editor's choice, really.

Checklist

  • I have tested it locally – I've had this on my system since I reported the Debian bug
  • I have reviewed and updated any documentation if relevant
  • I am providing new code and test(s) for it

@github-actions github-actions bot added kernel-modules Issues related to the kernel-modules module modules Issue tracker for all modules labels Oct 13, 2022
@nabijaczleweli
Copy link
Contributor Author

I continue to fail to see where you get the idea that this ends up in the initrd.

@LaszloGombos
Copy link
Collaborator

LaszloGombos commented Oct 13, 2022

I continue to fail to see where you get the idea that this ends up in the initrd.

I just realized as well, so I deleted my earlier comment as it is not relevant, sorry about that. I will try to take a closer look later, but this PR seems like a good optimization.

@aafeijoo-suse
Copy link
Member

Thanks for finding this. I'd use grep to avoid having to add a require to awk in the spec file... but it seems upstream is going to drop the spec file sooner or later, and we already implicitly require awk since 9582f027

@LaszloGombos
Copy link
Collaborator

I'd use grep to avoid having to add a require to awk

@nabijaczleweli, would you consider measuring grep as well to make a decision between grep and awk for this optimization ?

@nabijaczleweli
Copy link
Contributor Author

nabijaczleweli commented Oct 17, 2022

I did; it's benchmark 2 in OP

@Conan-Kudo
Copy link
Member

Thanks for finding this. I'd use grep to avoid having to add a require to awk in the spec file... but it seems upstream is going to drop the spec file sooner or later, and we already implicitly require awk since 9582f027

I'd prefer to keep the spec file, but it needs to be updated and actively tested (which it currently is not).

@Conan-Kudo Conan-Kudo enabled auto-merge (rebase) October 17, 2022 21:31
@LaszloGombos
Copy link
Collaborator

I did; it's benchmark 2 in OP

Sorry. missed it. Thank you !

Copy link
Collaborator

@LaszloGombos LaszloGombos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Mrfai
Copy link
Contributor

Mrfai commented Oct 20, 2022

Only the test if $srcmods/modules.dep exists is missing, do awk may fail if the file does not exist. Besides that it LGTM.

@nabijaczleweli
Copy link
Contributor Author

That bit is covered by 2> /dev/nulling AWK

also uhoh i clicked a control by accident, my b

@pvalena
Copy link
Contributor

pvalena commented Oct 21, 2022

Thanks for finding this. I'd use grep to avoid having to add a require to awk in the spec file... but it seems upstream is going to drop the spec file sooner or later, and we already implicitly require awk since 9582f027

I'm rather in favor of dropping awk as well....

I'd prefer to keep the spec file, but it needs to be updated and actively tested (which it currently is not).

I'll send PR with backports from Fedora (there's not too much difference). AFAIK we actually use it without any modifications in c9s (and therefore RHEL9):

https://gitlab.com/redhat/centos-stream/rpms/dracut/-/blob/c9s/dracut.spec

Copy link
Collaborator

@LaszloGombos LaszloGombos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing my earlier approval based on recent comments advocating using grep instead of awk

Copy link
Collaborator

@LaszloGombos LaszloGombos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing my earlier approval based on recent comments advocating using grep instead of awk

@johannbg
Copy link
Collaborator

Given that we have a Debian contributor who could probably not care less about how things are being package in Fedora/RHEL or (Open)Suse yet the reviewers here seemingly seem to be basing their review on how this contribution affects downstream packaging here's a kindly reminder to the reviewers...

Which packaging solution and how downstream decides to package things does not dictate or decide how or whether we ack or nack PR from contributors or otherwise requires them to make changes to their PR's.

Reviews should not be influenced how things are being packaged in the distribution the reviewer currently uses or how it affect his or hers maintenance downstream if he or she happens to be a downstream package maintainer for dracut.

@nabijaczleweli
Copy link
Contributor Author

Wait so let me get this straight: everyone likes this approach except fedora maintainers who would rather spew pages here than fix their packages to include a dependency on literally any AWK implementation, that dracut already uses in base host-side code, and their inability to add Depends: awk or whatever the fedora spelling is means they vetoed this somehow? Does fedora even ship AWK that this is somehow a problem?

Copy link
Collaborator

@LaszloGombos LaszloGombos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My preference would be that for "basic dracut" the project would only introduce a new dependency for a well justified reason. "basic dracut" here means functionality that most dracut user use.

This opinion is based on dependency management and not on distribution packaging.

I understand dracut ("host side") already depends on awk, but in practice that seems to be for zfs only. This PR makes basic functionality depend on awk (technically this PR is for a dracut module and not core dracut, but in practice most dracut installations would execute this code).

The patch sent to Debian by @nabijaczleweli is with grep and not with awk - https://bugs.debian.org/cgi-bin/bugreport.cgi?att=1;bug=1017411;filename=150ms.patch;msg=10

I suspect most dracut users have gawk installed and based on the perf test "on gawk systems grep|cut is faster".

I did not mean to block this PR from landing. Not sure how to tell this to Github given that I approved it earlier.

I am going to re-approve it now, but I prefer not to land it and continue the discussion.

@nabijaczleweli
Copy link
Contributor Author

nabijaczleweli commented Oct 22, 2022

Yes, you may also notice that that patch is trivially broken. If it were correct, I'd obviously prefer Just grep, no surprises there.

Here's popcon stats for mawk vs gawk installations in Debian and its derivatives (https://qa.debian.org/popcon-graph.php?packages=dracut+dracut-core+mawk+nawk+gawk&show_installed=on&want_percent=on&want_legend=on&want_ticks=on&from_date=&to_date=&hlght_date=&date_fmt=%25Y-%25m&beenhere=1; untick percent for first):
image
image

Here we can see that, in a.d. 2022, the decided majority of users has mawk installed but not gawk installed, and this has been the trend since ~2007 (for those that have both, gawk takes precedence as awk by default (Priority: 10 for gawk and Priority: 5 for mawk), so that's the upper limit on gawk use – users are free to pick mawk and I know anecdotally that I and the users I know do; there's no alternatives data reported by popcon).

Given the above, there appears to be 0 reason to fret about Pulling In An AWK, because all systems have one, and very little reason to expect Most Users to have awk=gawk instead of awk=mawk.

@Conan-Kudo
Copy link
Member

Which packaging solution and how downstream decides to package things does not dictate or decide how or whether we ack or nack PR from contributors or otherwise requires them to make changes to their PR's.

I'm sorry, I disagree. Since distributions are the ones contributing; shipping; and using it, their feedback absolutely matters. Note that from my perspective in primarily RPM-based distributions, I was fine with it and approved it. @pvalena was expressing a preference but not blocking it, @LaszloGombos was fine with it as well.

Of course the Debian contributor's opinion matters too, the point of us working together upstream is to resolve these kinds of things and get a solution we're all happy with. If nobody is happy with what we're doing upstream, then they're just going to fork it or use something else. For the adoption of dracut to continue, we need to work with our stakeholders to ensure their success too.

@johannbg johannbg merged commit 77ac95d into dracutdevs:master Oct 22, 2022
@johannbg
Copy link
Collaborator

johannbg commented Oct 22, 2022

Which packaging solution and how downstream decides to package things does not dictate or decide how or whether we ack or nack PR from contributors or otherwise requires them to make changes to their PR's.

I'm sorry, I disagree. Since distributions are the ones contributing; shipping; and using it, their feedback absolutely matters.

You can disagree all you want but the reality is that there are individuals from within distribution that make contributions not distributions themselves which is what matters and the fact is we cannot be everything to everyone nor can we have everything we do be dictated and decided by downstream distro's, their packaging solution, politics or release cycles and we most certainly wont be partaking in any downstream packaging solution here upstream anymore.

On top of that we are bound to progress at the rate that the upstream components that we rely on evolve not somekind of downstream distribution release cycles, enterprise or otherwise and the fact is there will always be competition, heck there seems to be new initramfs tool popping up every year in a new language and in a new manner and there is absolutely nothing wrong with that since that's a good thing.

In the end of the day the consumers will always choose the tool that works best for them not what we try to be or we think works best for them, sometimes that tool is dracut and we get feedback and contribution flow here upstream as a result of that and sometimes some other project gets feedback and contribution flowing to it and there is absolutely nothing wrong with that we dont have to strive for world domination, there is enough space for multiple initramfs tools in the f/oss ecosystem.

@aafeijoo-suse
Copy link
Member

Yes, you may also notice that that patch is trivially broken. If it were correct, I'd obviously prefer Just grep, no surprises there.

IIUC that patch lacks the cut part.

My preference would be that for "basic dracut" the project would only introduce a new dependency for a well justified reason. "basic dracut" here means functionality that most dracut user use.

This opinion is based on dependency management and not on distribution packaging.

I understand dracut ("host side") already depends on awk, but in practice that seems to be for zfs only. This PR makes basic functionality depend on awk (technically this PR is for a dracut module and not core dracut, but in practice most dracut installations would execute this code).

Yes, this is what I implicitly meant in #2002 (comment). Usually new dependencies to external packages are added for a good reason, and if the same functionality can be implemented using a current dependency, it sounds better, although the dependency is a package usually installed by default.
I wouldn't have added awk as a requirement just for this change, but it's just my opinion anyway.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kernel-modules Issues related to the kernel-modules module modules Issue tracker for all modules
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants