-
Notifications
You must be signed in to change notification settings - Fork 255
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use OwnedFd
for BTF's file descriptor
#662
Conversation
✅ Deploy Preview for aya-rs-docs ready!Built without sensitive environment variables
To edit notification comments on pull requests, go to your Netlify site configuration. |
3c24470
to
9f88ed3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 17 of 17 files at r1, 35 of 35 files at r2, all commit messages.
Reviewable status: all files reviewed, 5 unresolved discussions (waiting on @nrxus)
- capitalize please
- s/leaked/leak
- period please
-- commits
line 23 at r2:
can these be exposed via accessors rather than by being made public?
aya/src/maps/mod.rs
line 495 at r2 (raw file):
impl MapData { /// Creates a new map with the provided `name` pub fn create(&mut self, name: &str, btf_fd: Option<impl AsFd>) -> Result<RawFd, MapError> {
could we pass BorrowedFd<'_>? i'm wearing of generics introducing more monomorphization and slower compilation as a result
here and everywhere else
aya/src/maps/mod.rs
line 891 at r2 (raw file):
let mut map = new_map(); assert_matches!(map.create("foo", Option::<BorrowedFd>::None), Ok(42));
I think if you remove the generic you won't need all this type annotation
here and a few other places
aya/src/sys/mod.rs
line 20 at r2 (raw file):
use crate::generated::{bpf_attr, bpf_cmd, perf_event_attr}; pub(crate) type SysResult<T = c_long> = Result<T, (c_long, io::Error)>;
do we need the default?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 5 unresolved discussions (waiting on @tamird)
Previously, tamird (Tamir Duberstein) wrote…
- capitalize please
- s/leaked/leak
- period please
To reword this commit should I just rebase and force push on this branch? I am not sure if that might mess with reviewable or with your own tracking of what you've already seen or not seen but I am not sure how else to reword commits that doesn't involve a rebase + force push.
Previously, tamird (Tamir Duberstein) wrote…
can these be exposed via accessors rather than by being made public?
Unfortunately not. If it was an accessor like fn(&mut self) -> &mut HashMap
then the lifetime of the returned hashmap is aliasing the entire &mut self
as there is no way to do partial borrows in functions signatures. By exposing these fields I allow the compiler to be smart enough to detect that when I am mutably borrowing programs I am not borrowing neither btf file descriptor nor the maps so I can still borrow either of them (mutably or not) without issues. Borrowing of both the programs and the file descriptor is needed because loading of the programs needs to mutably borrow a program and (non mutably) borrow the btf file descriptor.
aya/src/maps/mod.rs
line 495 at r2 (raw file):
Previously, tamird (Tamir Duberstein) wrote…
could we pass BorrowedFd<'_>? i'm wearing of generics introducing more monomorphization and slower compilation as a result
here and everywhere else
They become awkward to use if I use Option<BorrowedFd<'_>>
because then usage of it when having a Option<OwnedFd>
becomes something like: my_owned_fd.as_ref().map(|f| f.as_fd())
. I could use something like Option<&OwnedFd>
but that is significantly less flexible because then we need to have a reference to an owned fd. In our case we do always have an OwnedFd
but I didn't know if that was too limiting going forwards.
aya/src/sys/mod.rs
line 20 at r2 (raw file):
Previously, tamird (Tamir Duberstein) wrote…
do we need the default?
We don't and I can remove the default if we don't want to deal with that but since so many functions already returned SysResult
without any generics specified I figured adding a default would touch the least amount of code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @nrxus)
Previously, nrxus (Andres) wrote…
To reword this commit should I just rebase and force push on this branch? I am not sure if that might mess with reviewable or with your own tracking of what you've already seen or not seen but I am not sure how else to reword commits that doesn't involve a rebase + force push.
Yes, force push is the way to go. You'll need to rebase to resolve some of the clippy failures, and looks like CI is failing because all the tests need updating.
Previously, nrxus (Andres) wrote…
Unfortunately not. If it was an accessor like
fn(&mut self) -> &mut HashMap
then the lifetime of the returned hashmap is aliasing the entire&mut self
as there is no way to do partial borrows in functions signatures. By exposing these fields I allow the compiler to be smart enough to detect that when I am mutably borrowing programs I am not borrowing neither btf file descriptor nor the maps so I can still borrow either of them (mutably or not) without issues. Borrowing of both the programs and the file descriptor is needed because loading of the programs needs to mutably borrow a program and (non mutably) borrow the btf file descriptor.
Makes sense. We still have accessors for maps and programs; should we remove or deprecate those?
aya/src/maps/mod.rs
line 495 at r2 (raw file):
Previously, nrxus (Andres) wrote…
They become awkward to use if I use
Option<BorrowedFd<'_>>
because then usage of it when having aOption<OwnedFd>
becomes something like:my_owned_fd.as_ref().map(|f| f.as_fd())
. I could use something likeOption<&OwnedFd>
but that is significantly less flexible because then we need to have a reference to an owned fd. In our case we do always have anOwnedFd
but I didn't know if that was too limiting going forwards.
Ok, let's leave this as is.
aya/src/sys/mod.rs
line 20 at r2 (raw file):
Previously, nrxus (Andres) wrote…
We don't and I can remove the default if we don't want to deal with that but since so many functions already returned
SysResult
without any generics specified I figured adding a default would touch the least amount of code.
Let's just add the generic in a parent commit (without a default, please)
9f88ed3
to
d593413
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 7 of 50 files reviewed, 4 unresolved discussions (waiting on @tamird)
Previously, tamird (Tamir Duberstein) wrote…
Yes, force push is the way to go. You'll need to rebase to resolve some of the clippy failures, and looks like CI is failing because all the tests need updating.
I believe I've fixed the build failures in the integrations tests (sorry I didn't catch those earlier, I usually just run cargo clippy --workspace
which wasn't enough in this case. However, the clippy failures seem bogus to me. It looks like they are happening mostly due to a lint that is not yet available in stable and the lint seems to be poorly implemented so far (hence why it's not in stable yet I am guessing): rust-lang/rust-clippy#5546 (comment). Since the clippy errors don't seem to be about this PR but rather code that was here before that's only now starting to alert I am hesitant on addressing them for a nightly lint that is still a WIP. For example it's alerting about using a mutable reference to mapdata where we only need a reference for functions that update the bpf map because it doesn't know that we are modifying something under the hood using the file descriptor. I tried to disregard the warning but still pass clippy by adding #[allow(...)]
but it didn't seem to do anything, perhaps because it's not a lint that is in stable yet I am not sure.
Previously, tamird (Tamir Duberstein) wrote…
Makes sense. We still have accessors for maps and programs; should we remove or deprecate those?
I'd be in favor of just removing them since using the fields already gives more flexibility but I leave that decision up to you all. Let me know what you'd like me to do.
I do think that having a method in the Bpf
object to load programs and have it pass the btf file descriptor is probably warranted since it's such a common use case. The problem though is that having a method like Bpf::load_program(&mut self, name: &str)
doesn't work for all program types because some of them require more information than just the name (i.e., lsm, BTF trace point, fentry, fexit, and extension). This could be left as a future improvement in another PR though, the focus right now is to fix the file descriptor leak.
aya/src/maps/mod.rs
line 891 at r2 (raw file):
Previously, tamird (Tamir Duberstein) wrote…
I think if you remove the generic you won't need all this type annotation
here and a few other places
Since we've decided to keep this generic I am marking this as done
aya/src/sys/mod.rs
line 20 at r2 (raw file):
Previously, tamird (Tamir Duberstein) wrote…
Let's just add the generic in a parent commit (without a default, please)
Done.
d593413
to
7456d2b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 4 of 50 files reviewed, 4 unresolved discussions (waiting on @tamird)
Previously, nrxus (Andres) wrote…
I believe I've fixed the build failures in the integrations tests (sorry I didn't catch those earlier, I usually just run
cargo clippy --workspace
which wasn't enough in this case. However, the clippy failures seem bogus to me. It looks like they are happening mostly due to a lint that is not yet available in stable and the lint seems to be poorly implemented so far (hence why it's not in stable yet I am guessing): rust-lang/rust-clippy#5546 (comment). Since the clippy errors don't seem to be about this PR but rather code that was here before that's only now starting to alert I am hesitant on addressing them for a nightly lint that is still a WIP. For example it's alerting about using a mutable reference to mapdata where we only need a reference for functions that update the bpf map because it doesn't know that we are modifying something under the hood using the file descriptor. I tried to disregard the warning but still pass clippy by adding#[allow(...)]
but it didn't seem to do anything, perhaps because it's not a lint that is in stable yet I am not sure.
And just after I sent the last message I see that you all fixed it in a separate PR. I don't quite agree with making some of those &mut
be &
just because clipper said so but since it's not really relevant to this PR I just rebased to the latest main and the nightly clippy seems happy (on my machine at least).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 38 of 38 files at r3, 7 of 7 files at r4, 40 of 40 files at r5, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @alessandrod and @nrxus)
Previously, nrxus (Andres) wrote…
I'd be in favor of just removing them since using the fields already gives more flexibility but I leave that decision up to you all. Let me know what you'd like me to do.
I do think that having a method in the
Bpf
object to load programs and have it pass the btf file descriptor is probably warranted since it's such a common use case. The problem though is that having a method likeBpf::load_program(&mut self, name: &str)
doesn't work for all program types because some of them require more information than just the name (i.e., lsm, BTF trace point, fentry, fexit, and extension). This could be left as a future improvement in another PR though, the focus right now is to fix the file descriptor leak.
I'm in favor of removing as well. I know @alessandrod cares about this sort of thing. @alessandrod can you please weigh in?
test/integration-test/src/tests/bpf_probe_read.rs
line 108 at r5 (raw file):
let mut bpf = Bpf::load(bytes).unwrap(); let prog: &mut UProbe = bpf.programs.get_mut(prog_name).unwrap().try_into().unwrap();
instead of these point accesses, could we write
let Bpf { programs, btf_fd, .. } = &mut bpf;
that makes it clear that these things are borrowed from the same structure. here and everywhere please.
7456d2b
to
11e993f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 7 of 50 files reviewed, 2 unresolved discussions (waiting on @alessandrod and @tamird)
test/integration-test/src/tests/bpf_probe_read.rs
line 108 at r5 (raw file):
Previously, tamird (Tamir Duberstein) wrote…
instead of these point accesses, could we write
let Bpf { programs, btf_fd, .. } = &mut bpf;
that makes it clear that these things are borrowed from the same structure. here and everywhere please.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't done a full review, just commenting on the load() change: I don't think this is an acceptable change API wise.
BTF is largely an implementation detail. As a user, in 99% of the cases, I don't even know that it exists. There's BTF from the kernel, there's BTF in object files produced by the toolchain, they're used by the kernel to do useful things, but as a user I never interact with it. I never create it, I never lookup things inside it, nothing.
With this change, you're forcing everyone to have to be aware of BTF, by having to shuttle this opaque btf fd thing around. Something that is an implementation detail, suddenly becomes a central thing in the API. Seems wrong to me.
WDYT about introducing a type that wraps |
@nrxus does that sound reasonable? If you like, I can pick this up. We're desperately in need of fixing the FD ownership problems that are all over the codebase. |
Ah my bad I thought your last question was for @alessandrod. I can try to implement that tonight and send it to you all for review. I also don't mind if you want to take over this PR, I also just want the file descriptors ownership fixed, it doesn't matter to me much by who 😅 |
Go for it! I'd rather cultivate more contributors :) |
I started down the road of making the intermediary struct (
Because of this we would have to have a wrapper struct for each individual program type essentially (since not all programs are loaded with the same parameters). Before I start down that rabbit hole I think it may be worth mentioning alternatives:
trait ProgramLoad {
type Data;
fn load(&mut self, btf_fd: Option<impl AsFd>,data: Self::Data) -> Result<(), ProgramError>;
}
impl Bpf {
/// Loads a program with default data - useful when the programs don't need any extra information (e.g., kprobes, tracepoints)
fn load_program<P: ProgramLoad>(&mut self, name: &str) -> Result<P, ProgramError>
where P::Data : Default {
self.load_program_with(name, Data::Default)
}
/// Loads a program that needs extra information (e.g., LSM)
fn load_program_with<P: ProgramLoad>(&mut self, name: &str, data: P::Data) -> Result<P, ProgramError> {
let program = self.programs.get_mut(name).ok_or(ProgramError::NotFound)?:
let program: &mut P = program.try_into()?;
program.load(self.btf_fd.as_ref(), data)
}
} Then each program could implement I am curious to hear what you all think or if you all have a different idea. |
Are you sure you need a separate type for each one? I think you just need a separate impl block for each program type. struct WithBtf<'a, T> {
program: &'a mut T,
btf: BorrowedFd<'a>,
}
impl WithBtf<'_, Xdp> {
fn load(...) {
// this is the existing load method on Xdp
}
}
I'd rather not change performance characteristics or introduce multiple ownership.
This refactoring seems orthogonal to what we're doing here. I like it, but it seems orthogonal. |
44f3c25
to
b2b52aa
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 43 of 43 files at r6, 7 of 7 files at r7, 40 of 40 files at r8, 43 of 43 files at r9, 8 of 8 files at r10, 7 of 42 files at r11, all commit messages.
Reviewable status: 17 of 52 files reviewed, 9 unresolved discussions (waiting on @nrxus)
-- commits
line 20 at r11:
do we want to rewrite this message?
README.md
line 85 at r11 (raw file):
// get the `ingress_filter` program compiled into `bpf.o`. let mut ingress: aya::WithBtfFd<CgroupSkb> = bpf.program_mut("ingress_filter")?.try_into()?;
is it possible to put CgroupSkb elsewhere in this expression so that we don't have to name WithBtfFd
?
here and everywhere
aya/src/bpf.rs
line 749 at r11 (raw file):
} /// A program with a BTF file descriptor
comments should be full sentences with punctuation please.
though I wonder if this should be #[doc(hidden)]?
aya/src/bpf.rs
line 760 at r11 (raw file):
fn deref(&self) -> &Self::Target { &*self.program
does self.program
not work?
aya/src/bpf.rs
line 770 at r11 (raw file):
} impl<'p, P, T> AsRef<T> for WithBtfFd<'p, P>
how come these (AsRef, AsMut) are needed? won't auto deref make downstream code work even without these impls?
aya/src/bpf.rs
line 791 at r11 (raw file):
/// The main entry point into the library, used to work with eBPF programs and maps. #[derive(Debug)] #[non_exhaustive]
fields are no longer public, so this can back out
aya/src/programs/cgroup_device.rs
line 63 at r11 (raw file):
impl CgroupDevice { /// Loads the program inside the kernel pub fn load(&mut self, btf_fd: Option<impl AsFd>) -> Result<(), ProgramError> {
do we want to remove this, or at least make it not-pub? i.e. move the implementation entirely to WithBtfFd
? here and everywhere.
aya/src/programs/mod.rs
line 890 at r11 (raw file):
fn try_from(program: $crate::WithBtfFd<'a, Program>) -> Result<$crate::WithBtfFd<'a, $ty>, ProgramError> { Ok($crate::bpf::WithBtfFd { program: program.program.try_into()?,
could you destructure WIthBtfFd everywhere?
aya/src/sys/bpf.rs
line 574 at r11 (raw file):
u.btf_log_size = log_buf.len() as u32; } let raw_fd = sys_bpf(bpf_cmd::BPF_BTF_LOAD, &attr)? as RawFd;
can you avoid the as
cast here? let's use .try_into (there's an example of this already in the file)
I think this is what we should do. The cost of this The
|
I agree that the WithBtf type should be invisible in user code. I think if we can make that work, this approach remains viable. |
But how? program and program_mut want you to specify the return type |
The only way I can think of ascribing only the fn program_mut<P>(&mut self, name: &str) -> Result<WithBtfFd<'_, P>, ProgramError> {
let program = self.programs.get_mut(name).ok_or(ProgramError::NotFound)?;
program.try_into()
}
/* HOW IT'D BE USED */
let program = bpf.program_mut::<Kprobe>("my_kprobe").unwrap();
program.load() But then we are requiring users to know statically what program type they are going to want when they ask for one whilst currently that is not a requirement. For example, in the way I've been using program_names.iter().try_for_each(|p| {
let Some(program) = bpf.program_mut(p).ok_or_else(|| Err(/*idk some error*/))?;
match program {
Program::KProbe(p) => p.load(),
Program::Tracepoint(p) => p.load(),
_ => return Err(/* unsupported program */)
}
}) If we changed it to where |
Agreed. Seems the best we can do is a shared FD then. Does it need to be Arc, or can it be Rc? |
As far as I can tell |
Makes sense |
b2b52aa
to
d221d13
Compare
This is just taking aya-rs#633 to its logical conclusion. Because `std::os::fd` was only introduced as a module in Rust v1.66.0 I have also updated the `Cargo.toml` of the `aya` package to reflect the true MSRV. Note that this commit is *not* the cause for this MSRV bump, that was done by a previous commit, this commit is just making it explicit in the `Cargo.toml`
d221d13
to
fcabe5f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 45 of 45 files at r12, 7 of 7 files at r13, 32 of 32 files at r14, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @alessandrod and @nrxus)
aya/src/sys/bpf.rs
line 574 at r11 (raw file):
Previously, tamird (Tamir Duberstein) wrote…
can you avoid the
as
cast here? let's use .try_into (there's an example of this already in the file)
Let's address this one please.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See clone comment, then it's ready to go!
fcabe5f
to
6fa7fdb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 49 of 52 files reviewed, 2 unresolved discussions (waiting on @alessandrod and @tamird)
aya/src/bpf.rs
line 550 at r14 (raw file):
Previously, alessandrod (Alessandro Decina) wrote…
Here and everywhere else, use Arc::clone(&fd) instead of fd.clone()
Done.
aya/src/sys/bpf.rs
line 574 at r11 (raw file):
Previously, tamird (Tamir Duberstein) wrote…
Let's address this one please.
Done.
Although I personally think it's a bit unnecessary, the error here is impossible unless there's a kernel/rust bug right, it's not really something the user could handle.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 3 files at r15, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @alessandrod and @nrxus)
aya/src/sys/bpf.rs
line 574 at r11 (raw file):
Previously, nrxus (Andres) wrote…
Done.
Although I personally think it's a bit unnecessary, the error here is impossible unless there's a kernel/rust bug right, it's not really something the user could handle.
Sure, feel free to panic if you think that's better - I'm just asking that we not cast the sign away.
aya/src/sys/bpf.rs
line 467 at r15 (raw file):
attr.__bindgen_anon_6.__bindgen_anon_1.prog_id = prog_id; // SAFETY: BPF_PROG_GET_FD_BY_ID returns a new file descriptor
can we please end comments with periods? here and 3 other places (let's match the style in the pre-existing comment please)
6fa7fdb
to
09fa11f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 51 of 52 files reviewed, 2 unresolved discussions (waiting on @alessandrod and @tamird)
aya/src/sys/bpf.rs
line 467 at r15 (raw file):
Previously, tamird (Tamir Duberstein) wrote…
can we please end comments with periods? here and 3 other places (let's match the style in the pre-existing comment please)
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r16, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @alessandrod)
Looks like the integration tests are failing, are we dropping a file descriptor somewhere? |
This fixes an existing file descriptor leak when there is BTF data in the loaded object. To avoid lifetime issues while having minimal impact to UX the `OwnedFd` returned from the BPF_BTF_LOAD syscall will be wrapped in an `Arc` and shared accross the programs and maps of the loaded BPF file.
09fa11f
to
ea96c29
Compare
Small goof in my refactor to extract the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r17, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @alessandrod)
Thanks @nrxus! |
This PR is meant as the first step towards using
OwnedFd
where possible instead ofRawFd
to be more foolproof and safer to use. This PR tackles the BTF file descriptor that is loaded when the object has a BTF section and fixes an existing file descriptor leak as that file descriptor was never being closed. Since we are moving to useOwnedFd
this close will now happen by default.The returned file descriptor will be wrapped in an
Arc
so that it can be shared across programs and maps of the loaded BPF file with no user-visible impact (no breaking changes).As an add-on I also changed the
Cargo/toml
to be explicit about the MSRV. This had been previously bumped by #633 by using thestd::os::fd
module which was only introduced in v1.66.0. This PR only really needs MSRV v1.63.0 to useOwnedFd
,Related to #612
This change is![Reviewable](https://camo.githubusercontent.com/23b05f5fb48215c989e92cc44cf6512512d083132bd3daf689867c8d9d386888/68747470733a2f2f72657669657761626c652e696f2f7265766965775f627574746f6e2e737667)