Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

collection: Add code to parse kernel config files #960

Merged
merged 3 commits into from
May 31, 2023

Conversation

eiffel-fl
Copy link
Contributor

This code uses existing golang package and writes the corresponding value to .kconfig section [1].
It is mainly a translation from C of libbpf code.

Signed-off-by: Francis Laniel flaniel@linux.microsoft.com
[1] https://github.com/ti-mo/kconfig

Copy link
Collaborator

@lmb lmb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the long delay. I think the approach is fine, we should just copy in Timo's code and improve on it. collection.go is already pretty large, how about you put the new code into a package internal/kconfig or something like that?

We can probably get away with two (exported) functions:

func ParseKconfig(file string) (map[string]string, error)
func PutKconfigValue([]byte, btf.Type, value string) error

Have you thought about how you're going to test this? We need coverage of the parser and of programs referencing the various kinds of .kconfig (libbpf_tristate, regular bool, int, etc.).

collection.go Outdated Show resolved Hide resolved
collection.go Outdated

path := fmt.Sprintf("/boot/config-%s", kernelRelease)

ret := kconfig.New()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The kconfig.Kconfig abstraction seems overkill for us, I'd just port https://github.com/ti-mo/kconfig/blob/0708bf82969f7e34fdc1b0edc60a523a249280fe/parser.go#L11-L49 into internal.ParseKconfig(file string) (map[string]string, error) or similar. Some things to keep in mind:

  • The code in Timo's repo is unlicensed, so we need to get his OK to relicense to MIT. cc @ti-mo
  • I think we'll need to avoid regexp, since it's a fairly heavy dependency. You can hand write a parser using strings.Cut. Please make sure to add good test coverage here. I'm a bit worried that Kconfig is a fickle format.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Already gave my permission offline, but for the record: go ahead! 🙂

collection.go Outdated
triModule triState = 2
)

func addKconfigValueTri(data []byte, typ btf.Type, value string) error {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not take triState instead of string?

Suggested change
func addKconfigValueTri(data []byte, typ btf.Type, value string) error {
func addKconfigValueTri(data []byte, typ btf.Type, value triState) error {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the add*() takes a string as last argument, I think this is better to keep with this signature for the moment.

Copy link
Collaborator

@lmb lmb May 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only understood what you meant after I read through the code: a kconfig can be a tristate, a bool, an integer or a string. It's therefore easiest to represent it as a string most of the time.

collection.go Outdated Show resolved Hide resolved
collection.go Outdated
data[0] = byt
case *btf.Enum:
enum := typ.(*btf.Enum)
if enum.Name != "libbpf_tristate" {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have a link to where this is defined / how it ends up in the C?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, it is defined here:
https://github.com/libbpf/libbpf/blob/fbd60dbff51c870f5e80a17c4f2fd639eb80af90/src/bpf_helpers.h#L169
I translated the value to some const as there is no enum in golang.

collection.go Outdated Show resolved Hide resolved
collection.go Outdated Show resolved Hide resolved
collection.go Outdated Show resolved Hide resolved
collection.go Outdated Show resolved Hide resolved
collection.go Outdated Show resolved Hide resolved
@eiffel-fl
Copy link
Contributor Author

I addressed the majority of your comments, I will begin to write the tests for it by taking inspiration from libbpf ones:
https://github.com/torvalds/linux/blob/61d325dcbc05d8fef88110d35ef7776f3ac3f68b/tools/testing/selftests/bpf/prog_tests/core_extern.c#L1
https://github.com/torvalds/linux/blob/61d325dcbc05d8fef88110d35ef7776f3ac3f68b/tools/testing/selftests/bpf/progs/test_core_extern.c#L1

I am nonetheless wondering about how we should deal with this:
16934f3#diff-22046c426f0998f350e27ce487c2e9db214a0ae7b79bca7290144a193a0b4cc6R611-R618
https://github.com/torvalds/linux/blob/61d325dcbc05d8fef88110d35ef7776f3ac3f68b/tools/lib/bpf/libbpf.c#L7668-L7674

Should we add some initial support to __weak? Just to detect if something was actually marked as __weak to mimic libbpf behavior?

@eiffel-fl eiffel-fl force-pushed the francis/kconfig-parsing branch 4 times, most recently from 0ce8c28 to a1988c0 Compare April 26, 2023 16:08
@lmb
Copy link
Collaborator

lmb commented Apr 26, 2023

Just to detect if something was actually marked as __weak to mimic libbpf behavior?

As far as I remember we refuse all weak symbols so far (please double check), so just implementing strong sym support for now should be enough?

@eiffel-fl eiffel-fl force-pushed the francis/kconfig-parsing branch 2 times, most recently from 941934e to ac1991e Compare April 26, 2023 18:02
@eiffel-fl
Copy link
Contributor Author

eiffel-fl commented Apr 26, 2023

Just to detect if something was actually marked as __weak to mimic libbpf behavior?

As far as I remember we refuse all weak symbols so far (please double check), so just implementing strong sym support for now should be enough?

OK, for the moment let's return an error if something is unknown.

Regarding testing, so far I added unit tests but I would like to add integration ones (i.e. tests that creates eBPF program).
To do so, I would like to mimic libbpf but I would need to add a way to pass an "extra" .kconfig string to the program or map creation, like what is done with:
https://github.com/torvalds/linux/blob/5c7ecada25d2086aee607ff7deb69e77faa4aa92/tools/lib/bpf/libbpf.c#L622
https://github.com/torvalds/linux/blob/5c7ecada25d2086aee607ff7deb69e77faa4aa92/tools/testing/selftests/bpf/prog_tests/core_extern.c#L133

I think I can find a way to add, but I doubt it will be a proper way.
So, do you have any suggestion regarding this?
Basically, should it be given as a MapOptions, a ProgramOptions or something other?

@eiffel-fl eiffel-fl force-pushed the francis/kconfig-parsing branch 4 times, most recently from 6fd050f to 2dd68e2 Compare April 27, 2023 16:34
@eiffel-fl eiffel-fl marked this pull request as ready for review April 27, 2023 16:34
@eiffel-fl
Copy link
Contributor Author

eiffel-fl commented Apr 27, 2023

I mark this PR as ready for review as I think we are not so far from the truth.

I nonetheless would like to get your opinion on several point:

  1. For the purpose of the test, I added AdditionalKconfig to Map type, this is not very clean so if you any way to make it cleaner feel free to share and I will patch the actual.
  2. There is a problem with CONFIG_STR where I get the following error: load BTF: variable CONFIG_STR: getting alignment: can't calculate alignment of *btf.Array.
    Without it, the integration test passes fine.

I am also wondering if rather than creating the big map[string]string storing all the CONFIG_* if we should not rather store the CONFIG_ defined as extern somewhere, read the config file and while reading only adding the value to the map if we have a corresponding extern:
https://github.com/torvalds/linux/blob/32f7ad0fbe7521de2a5e8f79c33d46110247fd7c/tools/lib/bpf/libbpf.c#L1906-L1908
This avoids having a very big map for only two values.

Copy link
Collaborator

@lmb lmb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the important bits are here! Some thoughts:

  • Maybe I've mentioned this before, but I'm not a big fan of integration tests. They end up being cumbersome to maintain and usually don't cover a lot of the edge cases. For example, we currently don't test for min/max values of integers, all possible bools, tristates, etc. Doing that through the ELF reader is a big bore. Why not split the problem into two? First, add tests for the various putKconfigValue... in package kconfig. Make sure we handle out of bounds values correctly. Second, extend the existing kconfig ELF reader test to pull in a very common, stable kconfig like CONFIG_HZ. That gets rid of all of the AdditionalKconfig mess and also allows us to drop PatchKconfig.
  • Can you add a small benchmark to measure how expensive parsing kconfig is? We currently parse it even if the BPF doesn't end up requiring it. I think that's the correct choice as long as parsing is not super slow.

internal/kconfig/kconfig.go Show resolved Hide resolved
internal/kconfig/kconfig.go Show resolved Hide resolved
internal/kconfig/kconfig.go Outdated Show resolved Hide resolved
internal/kconfig/kconfig.go Outdated Show resolved Hide resolved
internal/kconfig/kconfig.go Outdated Show resolved Hide resolved
collection.go Outdated Show resolved Hide resolved
testdata/kconfig_config.c Outdated Show resolved Hide resolved
// Highly inspired from libbpf test:
// https://github.com/torvalds/linux/blob/6e98b09da931a00bf4e0477d0fa52748bf28fcce/tools/testing/selftests/bpf/progs/test_core_extern.c
// The modifications are the following:
// 1. Removed __weak as cilium/ebpf does not support it.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to make sure we don't introduce behaviour that is incompatbile with adding __weak later: what does libbpf do when it finds a symbol that is marked weak but doesn't exist?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

testdata/kconfig_config.c Outdated Show resolved Hide resolved
testdata/kconfig_config.c Outdated Show resolved Hide resolved
@lmb
Copy link
Collaborator

lmb commented May 2, 2023

2. I get the following error: load BTF: variable CONFIG_STR: getting alignment: can't calculate alignment of *btf.Array.

That's because alignof doesn't understand arrays:

ebpf/btf/types.go

Lines 655 to 662 in d8d898a

switch t := UnderlyingType(typ).(type) {
case *Enum:
n = int(t.size())
case *Int:
n = int(t.Size)
default:
return 0, fmt.Errorf("can't calculate alignment of %T", t)
}
This might be enough to fix things:

diff --git a/btf/types.go b/btf/types.go
index 3fb5499..38cc76a 100644
--- a/btf/types.go
+++ b/btf/types.go
@@ -657,6 +657,8 @@ func alignof(typ Type) (int, error) {
                n = int(t.size())
        case *Int:
                n = int(t.Size)
+       case *Array:
+               return alignof(t.Type)
        default:
                return 0, fmt.Errorf("can't calculate alignment of %T", t)
        }

@eiffel-fl eiffel-fl force-pushed the francis/kconfig-parsing branch 7 times, most recently from 9e6cde8 to ae82cb2 Compare May 5, 2023 13:58
Copy link
Contributor Author

@eiffel-fl eiffel-fl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your review!
I think we are almost ready even though I am wondering about reducing the memory occupancy by only having a map containing the CONFIG_* used in the eBPF code rather than all the kernel config.

map.go Outdated Show resolved Hide resolved
btf/types.go Show resolved Hide resolved
collection.go Outdated Show resolved Hide resolved
@ti-mo
Copy link
Collaborator

ti-mo commented May 5, 2023

We currently parse it even if the BPF doesn't end up requiring it. I think that's the correct choice as long as parsing is not super slow.

@lmb Unless it significantly increases code complexity, a user shouldn't need to pay for a feature they don't use. Is there a reason we can't defer parsing kconfig until a lookup is needed?

I am wondering about reducing the memory occupancy by only having a map containing the CONFIG_* used in the eBPF code rather than all the kernel config.

Sure, but then we might as well not cache the result of the parser at all; you're basically describing populating .ksyms and throwing away the result. I'd say leave this out of scope of this PR. We can experiment with replacing the map with a minimal trie implementation that only supports inserts and lookups. (can be done in 30-40 lines of Go) This may cause more total allocs, but should result in a lower steady-state of consumed memory. Benchmarking is the only way to really know. :) I'd make sure we don't parse kconfig if the caller doesn't need it in any case.

(Note: haven't read the patches yet, will give this a pass next week) Not sure if the same has been done here, but in conntracct, I stripped the CONFIG_ prefix of all keys on store and lookup to cut down on memory usage. 4 years later and there are still exclusively entries starting with CONFIG_ in my config.gz, so we could do the same here if we haven't already.

@eiffel-fl
Copy link
Contributor Author

We currently parse it even if the BPF doesn't end up requiring it. I think that's the correct choice as long as parsing is not super slow.

@lmb Unless it significantly increases code complexity, a user shouldn't need to pay for a feature they don't use. Is there a reason we can't defer parsing kconfig until a lookup is needed?

I am wondering about reducing the memory occupancy by only having a map containing the CONFIG_* used in the eBPF code rather than all the kernel config.

Sure, but then we might as well not cache the result of the parser at all; you're basically describing populating .ksyms and throwing away the result. I'd say leave this out of scope of this PR. We can experiment with replacing the map with a minimal trie implementation that only supports inserts and lookups. (can be done in 30-40 lines of Go) This may cause more total allocs, but should result in a lower steady-state of consumed memory. Benchmarking is the only way to really know. :) I'd make sure we don't parse kconfig if the caller doesn't need it in any case.

This makes sense, I just wanted to mimic libbpf behavior.
Nonetheless, not parsing if not needed should be doable, as it is a matter of searching for CONFIG_* in .kconfig then if at least one is found, we need to parse, otherwise we can avoid parsing.

(Note: haven't read the patches yet, will give this a pass next week) Not sure if the same has been done here, but in conntracct, I stripped the CONFIG_ prefix of all keys on store and lookup to cut down on memory usage. 4 years later and there are still exclusively entries starting with CONFIG_ in my config.gz, so we could do the same here if we haven't already.

This makes sense too, I will check how I can modify the code to add this!

internal/kconfig/kconfig.go Outdated Show resolved Hide resolved
internal/kconfig/kconfig.go Outdated Show resolved Hide resolved
@eiffel-fl eiffel-fl force-pushed the francis/kconfig-parsing branch 2 times, most recently from c39dae9 to 6e9c256 Compare May 22, 2023 12:53
Copy link
Contributor Author

@eiffel-fl eiffel-fl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I polished everything, thank you for the commits!
Nonetheless, the CI does not pass because the virtme-run kernel does not find a corresponding config under /boot and was not compiled with CONFIG_IKCONFIG*.

Should we enable the above CONFIG_IKCONFIG* for the tested kernels or just skip this test in the CI?

Copy link
Collaborator

@lmb lmb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, especially the tests are really thorough! Please take a look at my suggestion to only parse when necessary and let me know what you think.

I've rebuilt all of our CI kernels so that /proc/config.gz is now a thing.

collection.go Outdated Show resolved Hide resolved
elf_reader_test.go Outdated Show resolved Hide resolved
internal/kconfig/kconfig.go Show resolved Hide resolved
internal/kconfig/kconfig.go Outdated Show resolved Hide resolved
collection.go Outdated
return fmt.Errorf("cannot find a kconfig file: %w", err)
}

config, err := kconfig.Parse(scanner)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Timo is right, it would be nice if we only parsed the kconfig when required.
I've updated the benchmark you added to use a bigger file and looked at the results very briefly: it takes around 1ms on my machine.

What if you did the following?

  • in the default switch case collect the necessary config items into map[string][]byte or similar
  • after we've iterated the items once, parse kconfig if len(map[string][]byte) > 0 and fill in the values

That should be doable without too many code changes.

Coming back to the benchmark, most of the time is spent constructing the full map[string]string. That's fine for now, since it's nice and simple. But later on we could experiment using the map[string][]byte to only parse the lines we're actually interested in.

Copy link
Contributor Author

@eiffel-fl eiffel-fl May 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it would be nice if we only parsed the kconfig when required.

This is what is done libbpf and they even only really deal with value of interest (i.e. CONFIG_* which are both in eBPF code and kconfig):
https://github.com/torvalds/linux/blob/0d85b27b0cc6b5cf54567c5ad913a247a71583ce/tools/lib/bpf/libbpf.c#L1906-L1908
We should do the same with something like this: Parse(source io.ReaderAt, configsOfInterest []string) where the second argument is used a filter, so the return map key are value of configsOfInterest.
In this way, we avoid having a big map for only few CONFIG_* in eBPF code.

Note that, I suggested this previously in this PR but my comment should have pass below the radar.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, sorry if I didn't see that. I think we might also be able to cut down memory usage to a point where we can afford to keep a cached map around.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No problem at all, should we implement the Parse(source io.ReaderAt, configsOfInterest []string) in this PR or in a follow up one?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I implemented the filtering in the last two commits of this PR.
If you think this a good addition, I will squash them with the first two and polish the messages.

@eiffel-fl eiffel-fl force-pushed the francis/kconfig-parsing branch 2 times, most recently from 40d308a to b7d2444 Compare May 26, 2023 10:46
@eiffel-fl
Copy link
Contributor Author

The CI is passing, thank you for adding /proc/config.gz to the test kernel and for the latest commit.
By the way, should we squash it or is it better for it to remain separated?

Copy link
Collaborator

@lmb lmb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some nits. Can you please update the commit messages? The first commit message describes the old API and the second one doesn't really exist :) So far we've also not used a . at the end of commit titles, I think it makes sense to stick to that. Can you please remove the .? You can squash your (first) two commits if you'd like, then there is only one message to write. Maybe you could include that we don't support weak symbols at the moment, or other things which we don't support yet?

collection.go Outdated Show resolved Hide resolved
collection.go Outdated Show resolved Hide resolved
@eiffel-fl eiffel-fl force-pushed the francis/kconfig-parsing branch 2 times, most recently from 1d41aa1 to 2d917f7 Compare May 31, 2023 11:20
@eiffel-fl
Copy link
Contributor Author

You can squash your (first) two commits if you'd like, then there is only one message to write

I modified the commit messages so they describe better the actual code.
I prefer to not squash the commits, as it is easier to navigate having small commits.
But if you prefer to have one commit per PR, I would then squash them.

eiffel-fl and others added 3 commits May 31, 2023 17:56
This new package exports several functions:
1. Find() which tries to find a kconfig file on the host.
2. Parse(source io.ReaderAt) which parses the kconfig source given as
parameter and returns a map[string]string of its content.
3. PutValue(data []byte, typ btf.Type, value string) which translates the
value given as parameter depending of the typ and writes the corresponding to
data.

The whole code is mainly adapted from libbpf C code and from Timo Beckers golang
package [1].

Signed-off-by: Francis Laniel <flaniel@linux.microsoft.com>
[1]: https://github.com/ti-mo/kconfig
This permits supporting CONFIG_ in eBPF code.
For the moment, weak CONFIG_ symbols are not supported.

Signed-off-by: Francis Laniel <flaniel@linux.microsoft.com>
test.kconfig is really small and therefore won't give results that
match the real world. Use a recent fedora kconfig instead.

    goos: linux
    goarch: amd64
    pkg: github.com/cilium/ebpf/internal/kconfig
    cpu: 12th Gen Intel(R) Core(TM) i7-1260P
            │  base.txt   │
            │   sec/op    │
    Parse-16   1.964m ± 6%

            │   base.txt   │
            │     B/op     │
    Parse-16   1.542Mi ± 0%

            │  base.txt   │
            │  allocs/op  │
    Parse-16   9.717k ± 0%

Signed-off-by: Lorenz Bauer <lmb@isovalent.com>
Signed-off-by: Francis Laniel <flaniel@linux.microsoft.com>
@lmb lmb force-pushed the francis/kconfig-parsing branch from f88f04f to 9d9dc7b Compare May 31, 2023 15:57
@lmb
Copy link
Collaborator

lmb commented May 31, 2023

Thanks for tidying up! I've removed the two commits since I don't have the time to review them right now, sorry. Let's get this merged and then we can iterate on the performance.

@lmb lmb merged commit 0ab6862 into cilium:master May 31, 2023
@lmb lmb changed the title collection: Add code to parse kernel config files. collection: Add code to parse kernel config files May 31, 2023
@eiffel-fl eiffel-fl deleted the francis/kconfig-parsing branch May 31, 2023 16:05
@eiffel-fl
Copy link
Contributor Author

Thanks for tidying up! I've removed the two commits since I don't have the time to review them right now, sorry. Let's get this merged and then we can iterate on the performance.

Thank you for the review and the merge!
I will open a separate PR for the filtering and just review it when you want/can.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants