Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BPF ring buffer #318

Merged
merged 2 commits into from
Sep 8, 2021
Merged

BPF ring buffer #318

merged 2 commits into from
Sep 8, 2021

Conversation

mythi
Copy link
Contributor

@mythi mythi commented Jun 2, 2021

Fixes: #316

Copy link
Collaborator

@lmb lmb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding sharing code: seems like the epoll stuff is very similar. Maybe pull that into it's own abstraction in internal? It would:

  • Take a list of fds and event values to wait for
  • Have a function Wait() or similar that handles turning events into event values
  • Have a function Close() that handles interruption

For the ringbuf code: is there documentation somewhere? What are the BUSY and DISCARD bits used for? How do we account for dropped events from the buffer (I assume those are also a thing)?

ringbuf/ring.go Show resolved Hide resolved
ringbuf/ring_test.go Outdated Show resolved Hide resolved
ringbuf/reader.go Outdated Show resolved Hide resolved
ringbuf/reader.go Show resolved Hide resolved
@mythi
Copy link
Contributor Author

mythi commented Jun 7, 2021

For the ringbuf code: is there documentation somewhere? What are the BUSY and DISCARD bits used for?
How do we account for dropped events from the buffer (I assume those are also a thing)?

I added the kernel doc URL in the commit message. Also, I've found this blog post very useful: https://nakryiko.com/posts/bpf-ringbuf/ . The latter also talks about the dropped/lost samples case.

@lmb
Copy link
Collaborator

lmb commented Jun 8, 2021

That was helpful, thanks!

Each record has 8 byte header, which contains the length of reserved record, as well as two extra bits: busy bit to denote that record is still being worked on, and discard bit, which might be set at commit time if record is discarded. In the latter case, consumer is supposed to skip the record and move on to the next one.

This doesn't specify what should happen in case of the busy bit being set. Can user space actually encounter that? Currently the consumer returns an error (?), should it retry instead?

More random thoughts:

  • Keeping perf and ringbuf in separate packages makes sense
  • BPF is expected to share a buffer across CPUs. Having per-CPU buffers has to be done manually (?) Should the API here support reading from multiple ring buffers with one instance of Reader? (Probably not)
  • Related to the above: if we only need to wait for a single fd, do we need the epoll abstraction? (Probably yes since it might be easier to reuse what is there)
  • Since the ringbuf size is determined by the map max_elems, should the code derive the size from that instead of taking it as a parameter? What if max_elems is not a power of two or page size aligned?

@mythi
Copy link
Contributor Author

mythi commented Jun 8, 2021

Since the ringbuf size is determined by the map max_elems, should the code derive the size from that instead of
taking it as a parameter? What if max_elems is not a power of two or page size aligned?

AFAICS max_elems is pre-verified before NewReader() is instantiated with a *Map. We could have something like this for sanity checks:

+func TestRingbuf(t *testing.T) {
+       ringbufTests := []struct {
+               name        string
+               spec        *MapSpec
+               expectedErr bool
+       }{
+               {
+                       name: "RingBuf size OK",
+                       spec: &MapSpec{
+                               Type:       RingBuf,
+                               MaxEntries: 1024 * 4,
+                       },
+                       expectedErr: false,
+               },
+               {
+                       name: "RingBuf size smaller than page size",
+                       spec: &MapSpec{
+                               Type:       RingBuf,
+                               MaxEntries: 1024,
+                       },
+                       expectedErr: true,
+               },
+               {
+                       name: "RingBuf size larger than page size but not power of 2",
+                       spec: &MapSpec{
+                               Type:       RingBuf,
+                               MaxEntries: 1023 * 5,
+                       },
+                       expectedErr: true,
+               },
+       }
+
+       for _, tt := range ringbufTests {
+               t.Run(tt.name, func(t *testing.T) {
+                       m, err := NewMap(tt.spec)
+                       if tt.expectedErr && err == nil {
+                               t.Errorf("Test case '%q': expected error", tt.name)
+                       }
+                       if !tt.expectedErr && err != nil {
+                               t.Errorf("Test case '%q': expected success", tt.name)
+                       }
+                       defer m.Close()
+               })
+       }
+}

map_test.go Outdated Show resolved Hide resolved
@lmb
Copy link
Collaborator

lmb commented Jun 15, 2021

Since the ringbuf size is determined by the map max_elems, should the code derive the size from that instead of
taking it as a parameter? What if max_elems is not a power of two or page size aligned?

AFAICS max_elems is pre-verified before NewReader() is instantiated with a *Map. We could have something like this for sanity checks:

I personally wouldn't add tests for creating ringbuf maps with weird sizes: they just assert kernel behaviour which may change in the future (the uapi isn't as immutable as we like to think!). Instead I'd duplicate the power of two check to the start of NewReader. More robust and less code. Up to you.

@ti-mo
Copy link
Collaborator

ti-mo commented Jun 22, 2021

@mythi Would you be up for splitting up the two first commits into a separate PR? IMO that part is good to go, doesn't need to be blocked on the epoll stuff.

@mythi
Copy link
Contributor Author

mythi commented Jun 22, 2021

@ti-mo created #331. I dropped the selftest for ringbuf sizes for now.

@mythi
Copy link
Contributor Author

mythi commented Aug 13, 2021

FYI, I've improved things and rebased to master. Keeping it draft still until I finish all the unit tests.

Keeping perf and ringbuf in separate packages makes sense

+1

BPF is expected to share a buffer across CPUs. Having per-CPU buffers has to be done manually (?)

With BPF ringbuf, there's no per CPU buffers.

Should the API here support reading from multiple ring buffers with one instance of Reader? (Probably not)

Good question. It should be easy to add (by reusing the perCPU infra perf has) and libbpf seems to have support for it but in my latest version it's map per reader only.

Related to the above: if we only need to wait for a single fd, do we need the epoll abstraction? (Probably yes since it might be easier to reuse what is there)

In the latest version, I added pollTimeout as the Reader parameter to support busyloop reads and blocking read.

Since the ringbuf size is determined by the map max_elems, should the code derive the size from that instead of taking it as a parameter?

Yes, implemented.

What if max_elems is not a power of two or page size aligned?

Can we rely on the verifier to check the ringbuf map has the correct size?

@mythi mythi changed the title RFC: BPF ring buffer BPF ring buffer Aug 16, 2021
@mythi mythi marked this pull request as ready for review August 16, 2021 14:44
Copy link
Collaborator

@ti-mo ti-mo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your continued work on this!

I'd like to look at the tests in a future round, this is all I have for now. Would also like for this to be a little more polished before it goes in (or we might hold off until after 0.7), but looks solid overall. 👍

ringbuf/reader.go Outdated Show resolved Hide resolved
ringbuf/reader.go Outdated Show resolved Hide resolved
ringbuf/reader.go Outdated Show resolved Hide resolved
ringbuf/reader.go Outdated Show resolved Hide resolved
ringbuf/reader.go Outdated Show resolved Hide resolved
ringbuf/reader.go Outdated Show resolved Hide resolved
ringbuf/reader.go Outdated Show resolved Hide resolved
ringbuf/reader.go Outdated Show resolved Hide resolved
ringbuf/reader.go Outdated Show resolved Hide resolved
@mythi
Copy link
Contributor Author

mythi commented Aug 18, 2021

@ti-mo thanks for the great comments! I agree with them. This work is closely based on the perf package and some of the comments would be relevant there too but I'll leave that unchanged :-)

@ti-mo ti-mo mentioned this pull request Aug 19, 2021
Copy link
Collaborator

@lmb lmb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took another look as well. Have you considered #318 (review) ?
There is a lot of similar-but-not-quite the same code here, which is a great recipe for bugs. If possible I'd like to share code that actually is similar/

ringbuf/reader.go Outdated Show resolved Hide resolved
ringbuf/reader.go Show resolved Hide resolved
ringbuf/reader.go Outdated Show resolved Hide resolved
ringbuf/reader.go Outdated Show resolved Hide resolved
ringbuf/reader.go Outdated Show resolved Hide resolved
ringbuf/ring.go Outdated Show resolved Hide resolved
ringbuf/reader.go Outdated Show resolved Hide resolved
ringbuf/reader.go Outdated Show resolved Hide resolved
ringbuf/reader.go Outdated Show resolved Hide resolved
ringbuf/reader.go Outdated Show resolved Hide resolved
@mythi mythi force-pushed the ringbuf branch 2 times, most recently from 68a1776 to 1793dc5 Compare August 25, 2021 09:55
@mythi
Copy link
Contributor Author

mythi commented Aug 25, 2021

Have you considered #318 (review)?

@lmb Yes, but getting some of the code shared between ringbuf and perf did not look very straightforward at first glance so I wanted to focus on the functionality/API/tests first. I could take another look once we get the other pieces stabilized, Ok?

Copy link
Collaborator

@lmb lmb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some small comments, I think this is in a good shape. Doing the epoll changes later is ok.

ringbuf/reader.go Show resolved Hide resolved
ringbuf/reader.go Outdated Show resolved Hide resolved
ringbuf/reader.go Outdated Show resolved Hide resolved
ringbuf/ring.go Outdated Show resolved Hide resolved
ringbuf/ring_test.go Outdated Show resolved Hide resolved
@mythi mythi force-pushed the ringbuf branch 2 times, most recently from 5157592 to f9fe205 Compare August 27, 2021 16:32
@lmb lmb requested a review from ti-mo August 31, 2021 11:06
Copy link
Collaborator

@ti-mo ti-mo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some nits, and would like some clarification around the use of atomics in ringbuf.

internal/unix/types_other.go Outdated Show resolved Hide resolved
ringbuf/reader.go Outdated Show resolved Hide resolved
ringbuf/reader_test.go Outdated Show resolved Hide resolved
ringbuf/ring.go Outdated Show resolved Hide resolved
ringbuf/ring.go Show resolved Hide resolved
ringbuf/ring.go Outdated Show resolved Hide resolved
@lmb
Copy link
Collaborator

lmb commented Sep 7, 2021

@ti-mo I took another quick look, and all seems well. I'll merge this end of today if you want to do another pass.

Copy link
Collaborator

@ti-mo ti-mo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great, thank you!

ringbuf/ring.go Show resolved Hide resolved
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
Linux 5.8 added support for BPF ring buffer that addresses some
limitations with perf events when delivering variable size records
from BPF programs to user space [1].

Similar to and based on the 'perf' package, 'ringbuf' package implements
an io.Reader to read such records submitted by bpf_ringbuf_output()/
bpf_ringbuf_submit().

[1] https://www.kernel.org/doc/html/latest/bpf/ringbuf.html

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
@ti-mo ti-mo merged commit af50a38 into cilium:master Sep 8, 2021
@mythi mythi mentioned this pull request Nov 8, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BPF ring buffer
4 participants