btf: make loading kernel BTF cheaper #1068

Closed · lmb opened this issue Jun 19, 2023 · 1 comment · Fixed by #1235

Assignees: lmb
Labels: enhancement (New feature or request)

lmb commented Jun 19, 2023

Loading vmlinux BTF is currently very expensive:

goos: linux
goarch: amd64
pkg: github.com/cilium/ebpf/btf
cpu: 12th Gen Intel(R) Core(TM) i7-1260P
                │      -       │
                │    sec/op    │
ParseVmlinux-16   50.47m ± ∞ ¹
¹ need >= 6 samples for confidence interval at level 0.95

                │       -       │
                │     B/op      │
ParseVmlinux-16   38.46Mi ± ∞ ¹
¹ need >= 6 samples for confidence interval at level 0.95

So ~50ms and 40MiB at current master, and that is after a bunch of optimization work already. For this reason we cache vmlinux BTF in a global variable and try to reuse the same parsed Spec as much as possible. Otherwise, features that require vmlinux BTF, like kconfig or ksyms, are unbearably slow. I've made several attempts at improving the situation.

Parsing is expensive, but the parsed representation is also pretty large. Go doesn't have great tools for measuring this, but an educated guess is that parsed vmlinux takes about 25MiB of RSS. Users have of course noticed this, and we've added an escape hatch to flush the cached BTF by calling linux.FlushCaches. Even that causes problems and confusion, however, as shown by #1063. Tetragon was also impacted and had to chase down copies of vmlinux.

I've come to the conclusion that we're using the wrong approach. Instead of parsing the whole BTF upfront, we should only "inflate" types that are needed by the user via Spec.TypeByID(), Spec.TypeByName() and so on. The BTF wire format is already optimized for size, so we should refer to it as much as possible. @ti-mo suggested this a long time ago, but for reasons I can't remember I wasn't a fan. I think it's now time to do this, but getting there is going to take several refactors.

Split writing from reading BTF

Spec currently serves a dual purpose. You can use it to query for types you are interested in (the "vmlinux types" use case) and you can build BTF from scratch using NewSpec() and Spec.Add() (the "build map and program BTF" use case). Combining both use cases in a single type complicates the implementation. Crucially, it makes the changes I want to make for the vmlinux use case really difficult.

So as a first step, reading and writing BTF will be split into two separate types. Spec is used for the read case, and a new btf.Writer, btf.Builder, ... is used for writing. I've already prototyped this code and it's overall a net benefit. I'll propose the change even if we decide against the rest of the changes.
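
To make this concrete, here is a rough sketch of what the writing side could look like. Builder, Add and Marshal are placeholder names for this sketch, not the final API:

// Builder collects types and serializes them into BTF wire format.
type Builder struct {
	types []Type
}

// Add appends a type and returns the TypeID it was assigned.
func (b *Builder) Add(typ Type) (TypeID, error) {
	b.types = append(b.types, typ)
	// IDs start at 1, since ID 0 is reserved for Void.
	return TypeID(len(b.types)), nil
}

// Marshal encodes all added types and their strings into wire-format BTF.
func (b *Builder) Marshal() ([]byte, error) {
	// Walk b.types, deduplicate strings, emit the header and type records.
	// Omitted in this sketch.
	return nil, nil
}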

Split reading from querying BTF

The constructor for Spec currently does a full parse of vmlinux and turns all BTF wire-format types into btf.Type. This is the really expensive part, since vmlinux contains around 130k types.

Instead, we will refactor the parsing logic so that we can pull Types out of wire format BTF one by one. This works by doing a single pass over the BTF and remembering at which offset each TypeID lives.

type Spec struct {
	// Reader pointing at the start of the BTF type section
	btf io.ReaderAt

	// To find the offset of a type we index into typeOffsets[ID-1]
	typeOffsets []uint32

	// other things like the string table
}

This change is purely internal and doesn't change exported API.
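
To illustrate how a lookup could work on top of this index, here is a sketch of a lazy TypeByID. The typeCache map and the inflateTypeAt helper are assumptions made for the sketch, not actual implementation details:

// TypeByID inflates a single type on demand instead of decoding all of
// vmlinux upfront.
func (s *Spec) TypeByID(id TypeID) (Type, error) {
	if typ, ok := s.typeCache[id]; ok {
		return typ, nil
	}

	if id == 0 {
		// ID 0 is always Void.
		return &Void{}, nil
	}

	if int(id) > len(s.typeOffsets) {
		return nil, fmt.Errorf("type ID %d: %w", id, ErrNotFound)
	}

	// Decode only the record at this offset, resolving referenced type IDs
	// lazily as well.
	typ, err := s.inflateTypeAt(int64(s.typeOffsets[id-1]))
	if err != nil {
		return nil, err
	}

	s.typeCache[id] = typ
	return typ, nil
}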

Memory cost of lazy decoding

Some napkin math to gauge how much memory we need in addition to the wire format BTF:

  • ~130k types in vmlinux
  • ~130k distinct strings, avg length 17
  • ~70k distinct names

Not accounting for slice, string and map header overhead, a reader will use 130000 * sizeof(uint32) + 70000 * (17 + sizeof(TypeID)) = 2451kb, so roughly 3MiB, which is 1/10th of what we currently pin.

Can we avoid copying wire-format BTF into memory?

On my machine vmlinux BTF is ~6MB:

$ ls -lh /sys/kernel/btf/vmlinux 
-r--r--r--. 1 root root 5.4M Jul  4 11:49 /sys/kernel/btf/vmlinux

On a buggy Ubuntu this goes up to 15MB or so. Copying this into heap memory would double the memory used by vmlinux BTF. It would be nice if we could instead only read in the bits we are interested in, and otherwise rely on the page cache for fast access. We can do this either via regular os.File.Read or by mmapping the file. It's not clear which solution is better, but both approaches have a huge drawback: we need to manage the lifecycle of the os.File or the mmap somehow.
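
A minimal sketch of the file-backed variant, assuming a hypothetical constructor that takes an io.ReaderAt and does the single indexing pass described above:

// loadLazyKernelSpec opens the kernel's BTF blob and hands the *os.File to
// the parser instead of copying the whole blob onto the heap. Only the pages
// we actually read are faulted in; repeated reads hit the page cache.
func loadLazyKernelSpec() (*Spec, *os.File, error) {
	f, err := os.Open("/sys/kernel/btf/vmlinux")
	if err != nil {
		return nil, nil, err
	}

	// newSpecFromReaderAt is a placeholder: it builds the typeOffsets index
	// and later reads individual types via f.ReadAt on demand.
	spec, err := newSpecFromReaderAt(f)
	if err != nil {
		f.Close()
		return nil, nil, err
	}

	// The caller now owns f and must keep it open for as long as spec is in
	// use, which is exactly the lifecycle problem discussed below.
	return spec, f, nil
}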

The easiest solution is to add Spec.Close() which forwards to the underlying reader. This is problematic since there is plenty of existing code which, upon updating the library, won't call Close(). We'll of course have a finalizer, but that only closes the fd when there is memory pressure. This will cause problems in cases where an application runs out of fds before triggering GC, for example if ulimits have been set very low.

For this reason I want to come back to an idea @ti-mo and I were discussing a while ago: replace btf.LoadKernelSpec with a type that allows sharing a single BTF between multiple callers in a safe way. Let's call it btf.Cache:

type Cache struct{}

// LoadCache parses the given file.
func LoadCache(*os.File) (*Cache, error)

// Acquire returns a new Spec.
//
// Callers must ensure that [CachedSpec.Release] is called on the returned value.
func (*Cache) Acquire() *CachedSpec

// Close waits until all [CachedSpec] have been released and then closes the
// underlying file descriptor.
func (*Cache) Close() error

type CachedSpec struct {
	*Spec
	cache *Cache
}

// Release the cached Spec.
//
// Must be called once the Spec is not required anymore.
func (*CachedSpec) Release()

We'd then replace LoadKernelSpec with a function which returns a CachedSpec, and rewrite FlushKernelSpec to call Cache.Close plus a bunch of atomics.
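
A rough sketch of that replacement, using a package-level variable and a mutex instead of atomics to keep the example simple:

var (
	kernelCacheMu sync.Mutex
	kernelCache   *Cache
)

// LoadKernelSpec hands out reference-counted views of a single shared Cache
// instead of a plain *Spec.
func LoadKernelSpec() (*CachedSpec, error) {
	kernelCacheMu.Lock()
	defer kernelCacheMu.Unlock()

	if kernelCache == nil {
		f, err := os.Open("/sys/kernel/btf/vmlinux")
		if err != nil {
			return nil, err
		}

		cache, err := LoadCache(f)
		if err != nil {
			f.Close()
			return nil, err
		}
		kernelCache = cache
	}

	return kernelCache.Acquire(), nil
}

// FlushKernelSpec drops the shared cache. Cache.Close blocks until all
// outstanding CachedSpecs have been released and then closes the fd.
func FlushKernelSpec() {
	kernelCacheMu.Lock()
	defer kernelCacheMu.Unlock()

	if kernelCache != nil {
		kernelCache.Close()
		kernelCache = nil
	}
}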

"Regular" BTF parsing from ELF would not use this mechanism. Instead we simply copy the whole into memory, as we do now. This means that map and program loading doesn't have to change.

lmb added the enhancement label on Jun 19, 2023
lmb self-assigned this on Jun 19, 2023

lmb commented Jul 6, 2023

Discussion with Timo:

  • Maybe introducing a Spec.Close isn't so bad after all. We could use this as an opportunity to rename Spec to Types or similar, and then tell users that the API is the same except that they need to call Close() as appropriate. Breaking the API helps to highlight the new semantics.

P.S. This isn't as easy after all, since CollectionSpec.Types would mean that CollectionSpec has to be closed.
