Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use instruction metadata to simplify linking and CO-RE relocation #606

Merged
merged 5 commits into from May 4, 2022

Conversation

lmb
Copy link
Collaborator

@lmb lmb commented Mar 28, 2022

btf: move CORERelocate to core.go

btf: add constructors for various *Info

Pull constructors for FuncInfo, LineInfo and CORERelocation out of
splitExtInfos. This makes the function smaller and makes removing it
later easier.

Instead of taking a *Spec in the constructors, take a []Type and
stringTable directly. This makes splitting ExtInfos out of *Spec
easier later on.

btf: track bpfCORERelo.InsnOff in instructions instead of bytes

This makes bpfCORERelo consistent with bpfFuncInfo and bpfLineInfo. It seems like
there is a latent bug here as well, since we used to subtract bpfFuncInfo.InsnOff
from bpfCORERelo.InsnOff, but for some reason the tests pass.

btf: store ExtInfos in instruction metadata

Use per-instruction metadata to store FuncInfo, LineInfo and CO-RE relocations.
As a result, simply concatenating two Instruction slices preserves the ext_info
without extra code to keep track of offsets. The simplest way to achieve is to
assign ext_infos per section, before we split into individual functions. This
also avoids having to split ext_infos in the first place.

Storing ext_infos in metadata in turn allows / requires removing code that relies
on stable offsets, the most notable being applying COREFixups. Instead of tracking
which offset a fixup should be applied to we change coreRelocate to instead return
results in the same order as CORERelocations are passed. In the caller we remember
which instruction originated the relocation and apply the fixup directly instead of
iterating all instructions once more.

Instead of storing ExtInfos in Spec we split it off and keep a reference to the
necessary types and string table around. This makes more sense since every BTF ELF
has a Spec, but ExtInfos are optional. It turns out that most code except the ELF
loader doesn't care about ExtInfos in the first place, so we might be able to
add an API that doesn't parse ExtInfos in the future.

Finally we can remove btf.Program since we don't need an intermediate type to hold
metadata anymore.

btf: remove Map

btf.Program is gone, so let's get rid of Map as well.

btf-metadata.md Outdated Show resolved Hide resolved
@lmb lmb marked this pull request as draft March 28, 2022 19:40
@lmb
Copy link
Collaborator Author

lmb commented Mar 28, 2022

Split off the WithSource + sparse metadata changes into #609

@lmb
Copy link
Collaborator Author

lmb commented Apr 1, 2022

I'll move program: remove references from ProgramSpec to a follow up PR once #609 is merged.

@lmb lmb force-pushed the more-metadata branch 2 times, most recently from 149ab47 to ef79de7 Compare April 20, 2022 16:54
@lmb lmb marked this pull request as ready for review April 20, 2022 16:56
@lmb lmb requested a review from ti-mo April 20, 2022 16:56
@lmb
Copy link
Collaborator Author

lmb commented Apr 20, 2022

This is now in a state where it can be reviewed! I've updated the PR description with the latest commit messages, please take a look.

I'm a little bit unhappy with ExtInfos.Assign, it seems like it belongs into the ELF loader. I'd prefer to convert bpfFuncInfo to FuncInfo, etc. at BTF load time and then expose something like the following:

type ExtInfos struct{
    []FuncInfo
    []LineInfo
    []CORERelocation
}

// Per-section ExtInfos
func LoadSpecFromReader(...) (*Spec ,map[string]ExtInfos, error)

The snag is that FuncInfo, etc. don't contain the InsnOff, since having an offset doesn't make sense when they are attached to an instruction. I can see two ways to fix this:

type ExtInfo struct {
    map[uint64]FuncInfo
    ...
}

By abusing the map we can avoid introducing a new type, but none of our code really wants to use a map. An ordered slice is much better. For that we'd need:

type FuncInfoWithOffset struct {
    FuncInfo
    Offset uint32
}

type ExtInfo struct {
    []FuncInfoWithOffset
    ...
}

This means a lot of additional types, which I don't particularly like. Maybe you have another idea?

@lmb lmb mentioned this pull request Apr 25, 2022
@lmb lmb added this to the Export `btf` milestone Apr 26, 2022
Copy link
Collaborator

@ti-mo ti-mo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, have a few first observations that might or might not apply, let me know what you think.

internal/btf/ext_info.go Outdated Show resolved Hide resolved
if relo.kind == reloTypeIDLocal {
// Filtering out reloTypeIDLocal here makes our lives a lot easier
// down the line, since it doesn't have a target at all.
if len(relo.accessor) > 1 || relo.accessor[0] != 0 {
return nil, fmt.Errorf("%s: unexpected accessor %v", relo.kind, relo.accessor)
}

result[uint64(relo.insnOff)] = COREFixup{
result[i] = COREFixup{
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder whether we can't use Type instead of TypeID here. What is the remaining significance of a type ID? If we know the IDs of the local and target types, and we have both specs available within this function, what's stopping us from putting inflated types here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is libbpf behaviour, since relo.kind == reloTypeIDLocal we need an ID.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also creates problems for #643 of course.

Copy link
Collaborator

@ti-mo ti-mo May 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is probably something that needs solving in order to get rid of IDs. (as a follow-up, to implement #643)

internal/btf/core.go Outdated Show resolved Hide resolved
internal/btf/ext_info.go Outdated Show resolved Hide resolved
internal/btf/core.go Outdated Show resolved Hide resolved
internal/btf/core.go Show resolved Hide resolved
func LoadSpecFromReader(rd io.ReaderAt) (*Spec, error) {
// Returns ErrNotFound if reading from an ELF which contains no BTF. ExtInfos
// may be nil.
func LoadSpecFromReader(rd io.ReaderAt) (*Spec, *ExtInfos, error) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've seen you refer to eliminating *ExtInfos here in the future in your commit message, but can we get away with doing that sooner rather than later? Could we put *ExtInfos (back?) into the btf.Spec instead of returning it as a separate value here? IMO LoadSpec* should return a single value.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to remove ExtInfos completely, but #606 (comment) is my best idea how to do it and I don't like it very much.

Embedding ExtInfos in Spec has it the wrong way round: when reading BTF we always have a Spec, but we don't always have ExtInfos. So modeling the relationship as ExtInfo{Spec} is more natural. For example, naked vmlinux BTF has no ext infos. Check the call sites of loadRawSpec, almost always we pass nil, nil and aren't interested in ExtInfos in the first place. With ExtInfos split out / removed a good amount of code can be refactored to be less deeply nested.

I was considering adding a separate constructor LoadExtInfos(io.Reader) and making LoadSpec skip ExtInfos. However I can't come up with a way to achieve that without (a) adding *elf.File to the btf API surface like LoadExtInfos(*elf.File) (and all the SafeELFFile baggage) or (b) parsing the ELF one more time. (We already parse the ELF header twice in the ELF loader due to this.)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, check this out: 640c4b0 Kinda terrible, but we could use it to have LoadExtInfosFromReader(io.ReaderAt) and LoadSpecFromReader(io.ReaderAt).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding the following:

I'd like to remove ExtInfos completely, but #606 (comment) is my best idea how to do it and I don't like it very much.

FuncInfo is wire format, the metadata would be a btf.Func. Func currently still holds a TypeID, so we can easily generate a FuncInfo based on the ID and the instruction's offset when marshaling.

Later on, when BTF marshaling lands, we can remove TypeID from btf.Func and generate BTF blobs that only contain the program's func and lineinfos.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding ExtInfos: could we somehow break out that review and discussion into a separate PR? Taking *elf.File is not the way to go for the reasons you mentioned, but for me primarily due to #550.

If we're reworking the btf/extinfo reader API, we should definitely take the opportunity to address this current limitation and come to a solution that works for module BTF as well. Actually parsing (and 'linking') module BTF might be a bigger chunk of work, but at least the API should allow for it to be added later if needed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding ExtInfos: could we somehow break out that review and discussion into a separate PR?

I was trying to find a way to not return ExtInfos from LoadSpec...() since you seemed opposed. If you can live with that solution I'm happy to shelve that whole discussion for now.

@lmb
Copy link
Collaborator Author

lmb commented Apr 27, 2022

As a follow up to #606 (comment), what if ExtInfos becomes this:

type ExtInfos struct{
    Relocations []*CORERelocations
    RelocationOffsets []uint32
    ...
}

func Foo(...) (map[string]*ExtInfos, error)

@lmb
Copy link
Collaborator Author

lmb commented Apr 29, 2022

Please take a look at da1770b:

  • I've gotten rid of ApplyRelocations and ExtInfos.Assign and MarshalExtInfos and moved them into linker.go instead. This is much closer to what we had before. This is nice because instruction metadata is now private to the ebpf package.
  • FuncInfo, LineInfo, CORERelocationInfo are now used to expose "metadata + offset" via ExtInfos. The real metadata is called Func (as you suggested), Line and CORERelocation.
  • I've split the constuctor for Spec in two, with one returning ExtInfos along with the Spec. The nice side effect is that we don't decode useless ExtInfo when loading BTF via handleCache.

Copy link
Collaborator

@ti-mo ti-mo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exciting! Left a few more comments. Should the latest commit be squashed?

internal/btf/btf.go Show resolved Hide resolved
if relo.kind == reloTypeIDLocal {
// Filtering out reloTypeIDLocal here makes our lives a lot easier
// down the line, since it doesn't have a target at all.
if len(relo.accessor) > 1 || relo.accessor[0] != 0 {
return nil, fmt.Errorf("%s: unexpected accessor %v", relo.kind, relo.accessor)
}

result[uint64(relo.insnOff)] = COREFixup{
result[i] = COREFixup{
Copy link
Collaborator

@ti-mo ti-mo May 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is probably something that needs solving in order to get rid of IDs. (as a follow-up, to implement #643)

linker.go Outdated
return nil, nil
// marshalExtInfos encodes function and line info embedded in insns into kernel
// wire format.
func marshalExtInfos(insns asm.Instructions) (funcInfos, lineInfos []byte, _ error) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this live in package btf, or is there an issue with a dependency on asm?

My hope was to get rid of FuncInfo and LineInfo, since all the information they carry can now be represented by asm.Instructions. e.g. *Func and the instruction's position in the slice as you've shown below.

Ultimately, we should end up with only Line and Func, and their respective wire formats, without the Func/LineInfo intermediates.

lmb added 3 commits May 4, 2022 10:33
Pull constructors for FuncInfo, LineInfo and CORERelocation out of
splitExtInfos. This makes the function smaller and makes removing it
later easier.

Instead of taking a *Spec in the constructors, take a []Type and
stringTable directly. This makes splitting ExtInfos out of *Spec
easier later on.
This makes bpfCORERelo consistent with bpfFuncInfo and bpfLineInfo. It seems like
there is a latent bug here as well, since we used to subtract bpfFuncInfo.InsnOff
from bpfCORERelo.InsnOff, but for some reason the tests pass.
@lmb lmb force-pushed the more-metadata branch 2 times, most recently from 04f12e3 to 4d44715 Compare May 4, 2022 11:15
Use per-instruction metadata to store func info, line info and CO-RE relocations.
As a result, simply concatenating two Instruction slices preserves the ext_info
without extra code to keep track of offsets. The simplest way to achieve is to
assign ext_infos per section, before we split into individual functions. This
also avoids having to split ext_infos in the first place.

Storing ext_infos in metadata in turn allows / requires removing code that relies
on stable offsets, the most notable being applying COREFixups. Instead of tracking
which offset a fixup should be applied to we change coreRelocate to instead return
results in the same order as CORERelocations are passed. In the caller we remember
which instruction originated the relocation and apply the fixup directly instead of
iterating all instructions once more.

Instead of storing ExtInfos in Spec we split it off into a separate exported type.
This makes more sense since every BTF ELF has a Spec, but ExtInfos are optional. It
turns out that only the ELF loader doesn't care about ExtInfos in the first place,
so we allow skipping ExtInfos altogether by introducing LoadSpecAndExtInfosFromReader.

Finally we can remove btf.Program since we don't need an intermediate type to hold
metadata anymore.

Fixes cilium#522
btf.Program is gone, so let's get rid of Map as well.
@lmb lmb merged commit 9dce119 into cilium:master May 4, 2022
@lmb lmb deleted the more-metadata branch May 4, 2022 11:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants