doc: add initial BPF and XDP documentation #546

Merged
merged 1 commit into from
Apr 15, 2017
borkmann
Member

We're currently lacking a guide on this, therefore add a start of it
that we can further extend over time.

Rendered version from wip branch:

http://cilium.readthedocs.io/en/doc-bpf-wip/

Signed-off-by: Daniel Borkmann daniel@cilium.io

Contributor

@aalemayhu left a comment

Other than the minor nitpick this looks very good and was fun to read 🌮

generic and flexible enough that there are many kernel subsystems which use eBPF
apart from only networking. Nowadays, the Linux kernel runs eBPF only and loaded
cBPF bytecode is transparently translated into an eBPF representation in the
kernel before program execution.

Contributor

s/into an/into a/ ?

Member Author

I think it's probably fine as is, but I'll double check to make sure.

Member Author

(an eBPF representation is the correct form)

these packet load instructions are less relevant nowadays. ``BPF_LDX`` class
holds instructions for byte / half-word / word / double-word loads out of
memory. Memory in this context is generic can could be stack memory, map value
data, packet data, etc.

Contributor

s/can could/and could/

Member Author

Fixed
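
As a side note to the ``BPF_LDX`` hunk above, here is a minimal sketch of how the four load widths differ only in the size bits of the opcode. It is not part of this PR. The ``SK_*`` / ``sk_*`` names are hypothetical; the numeric values are assumed to mirror the kernel's ``BPF_LDX``, ``BPF_MEM``, ``BPF_B``, ``BPF_H``, ``BPF_W`` and ``BPF_DW`` macros and are redefined locally so the sketch builds without kernel headers.

```c
#include <assert.h>
#include <stdint.h>

/* Values mirror the kernel UAPI macros named in the comments; they are
 * redefined under hypothetical SK_* names so no kernel headers are needed. */
enum {
    SK_CLASS_LDX = 0x01, /* BPF_LDX */
    SK_MODE_MEM  = 0x60, /* BPF_MEM */
    SK_SIZE_W    = 0x00, /* BPF_W:  word, 4 bytes */
    SK_SIZE_H    = 0x08, /* BPF_H:  half-word, 2 bytes */
    SK_SIZE_B    = 0x10, /* BPF_B:  byte */
    SK_SIZE_DW   = 0x18, /* BPF_DW: double-word, 8 bytes */
};

/* Same field layout as the kernel's struct bpf_insn. */
struct sk_insn {
    uint8_t code;
    uint8_t dst_reg : 4;
    uint8_t src_reg : 4;
    int16_t off;
    int32_t imm;
};

/* Encode "dst = *(size *)(src + off)". */
static struct sk_insn sk_ldx_mem(uint8_t size, uint8_t dst, uint8_t src,
                                 int16_t off)
{
    assert(dst <= 10 && src <= 10); /* register numbers r0 - r10 */
    struct sk_insn insn = {
        .code    = (uint8_t)(SK_CLASS_LDX | SK_MODE_MEM | size),
        .dst_reg = dst,
        .src_reg = src,
        .off     = off,
        .imm     = 0,
    };
    return insn;
}

/* Number of bytes a load opcode reads, taken from its size bits. */
static unsigned int sk_load_width(uint8_t code)
{
    switch (code & 0x18) {
    case SK_SIZE_B: return 1;
    case SK_SIZE_H: return 2;
    case SK_SIZE_W: return 4;
    default:        return 8; /* SK_SIZE_DW */
    }
}
```

With these constants, byte, half-word, word and double-word loads encode to opcodes ``0x71``, ``0x69``, ``0x61`` and ``0x79``. The source register can point at stack memory, map value data or packet data alike, which is the sense in which the hunk calls memory "generic".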

included into the main programs. For example, Cilium makes heavy use of
this (see ``bpf/lib/``). However, this still allows for including header
files, for example, from the kernel or other libraries and resuse their
static inline functions or macros / definitions.

Contributor

s/resuse/reuse/

Member Author

Fixed
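
To illustrate the header-reuse point in the hunk above, here is a rough sketch, not taken from the PR or from ``bpf/lib/``. A ``static inline`` helper that would normally live in a shared header is reused by two separate entry points, and the compiler inlines a copy into each. The helper ``sk_fold_csum`` and the two ``sk_prog_*`` functions are hypothetical names chosen for illustration.

```c
#include <stdint.h>

#ifndef __always_inline
#define __always_inline inline __attribute__((always_inline))
#endif

/* Would normally sit in a shared header; shown inline to keep the sketch
 * self-contained. Folds a 32 bit partial sum into a 16 bit one's
 * complement checksum. */
static __always_inline uint16_t sk_fold_csum(uint32_t sum)
{
    sum = (sum & 0xffff) + (sum >> 16);
    sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Two independent "programs" reuse the same helper. Because it is forced
 * inline, each gets its own copy and no call instruction is emitted. */
uint16_t sk_prog_a(uint32_t partial)
{
    return sk_fold_csum(partial);
}

uint16_t sk_prog_b(uint32_t first, uint32_t second)
{
    return sk_fold_csum(first + second);
}
```

The same pattern applies to macros and definitions pulled in from kernel or library headers, as the hunk notes.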


The remaining section names are specific for eBPF program code, for example,
the below code has been modified to contain to two program sections, ``ingress``
and ``egress``. The toy example code demonstrates that both can share a map

Contributor

s/to contain to/to contain/ ?

Member Author

Fixed

value size. This works, because during execution, eBPF programs are guaranteed
to never get preempted by the kernel and therefore can use the single map entry
as a scratch buffer for temporariy data, for example, to extend beyond the stack
limitation. This also works accross tail calls, since it has the same guarantees

Contributor

s/temporariy/temporary/ ?

Member Author

Fixed

program attached. ``ip link | grep xdp`` can thus be used to find all interfaces
that have XDP running. Further introspection facilities will be provided through
the detailed view through ``ip -d link`` once the kernel API gains support for
dumping additional attributes.

Contributor

s/view through/view/ ? The extra through adds nothing extra IMO, but either way is fine.

Member Author

Wrote it as view with.

For removing the entire ``clsact`` qdisc from the netdevice, which implicitly also
removes all attached programs from the ``ingress`` and ``egress`` hooks, the
below command is provided:

Contributor

s/instace/instance/ ?

Member Author

Fixed

.. _bpf_guide:

*****************
BPF and XDP Guide

Member

Maybe rename this to "BPF and XDP Reference Guide" to make it a bit clearer that this is deep dive material for developers and not required for users.

Member Author

Fixed

types such as tc and XDP, and to aide developing Cilium's eBPF templates.

**This purely serves as a developer's guide and is explicitly not a requirement
for Cilium users to read, since Cilium abstracts eBPF internals away entirely.**

Member

Let's move this and make it the first paragraph. You can use:

.. note:: This purely serves as a ...

prefix so it will appear as a note widget so it stands out a bit more.

Member Author

Fixed

**This purely serves as a developer's guide and is explicitly not a requirement
for Cilium users to read, since Cilium abstracts eBPF internals away entirely.**

The older cBPF architecture is not covered by this document, since Cilium does

Member

I would drop the 'since Cilium' part

Member Author

Fixed


eBPF does not define itself by only providing its instruction set, but also by
offering further infrastructure around it such as maps that act as efficient
key / value stores, helper functions to interact with the kernel, tail calls for

Member

Suggested wording: "helper functions to interact with and leverage kernel functionality"

Member Author

Fixed


* In case of networking (e.g., tc and XDP), eBPF programs can be updated atomically
without having to restart the kernel, system services or containers, and without
traffic interruptions.

Member

Let's add that any program state can be maintained throughout updates via bpf maps.

Member Author

Fixed

calls with regard to user space applications.

* eBPF programs work in concert with the kernel, they make use of existing kernel
infrastructure and tooling (e.g., iproute2) as well as the safety guarantees that

Member

Would be great to add an (e.g. ...) with a couple of examples of used kernel infrastructure.

Member Author

Fixed

infrastructure and tooling (e.g., iproute2) as well as the safety guarantees that
the kernel provides. Unlike kernel modules, eBPF programs are verified through an
in-kernel verifier in order to ensure that they cannot crash the kernel, always
terminate, etc. XDP programs, for example, reuse the existing in-kernel drivers

Member

Maybe move XDP down until you have properly introduced it. The XDP concept is completely unknown to the reader at this point.

Member Author

Given it's kind of used throughout the doc (also in the more generic LLVM / iproute2 section), I tried to introduce XDP and tc terms at the very beginning now, but we could later on also still rearrange this a bit if we think a strict separation seems better.


The execution of an eBPF program inside the kernel is always event driven! For example,
a networking device that has an eBPF program attached on its ingress path will trigger
the execution of the program once a packet is received.

Member

Add a second example for kprobe attachment.

Member Author

Fixed

handing execution back to the kernel, the exit value is passed as a 32 bit value.

Registers ``r1`` - ``r5`` are scratch registers, meaning the eBPF program needs to
either spill them to the eBPF stack or move them to callee saved registers if these

Member

Add a sentence that explains spilling.

Member Author

Fixed
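
To make the spilling discussion above concrete, here is a minimal sketch, not part of this PR, of the two ways the hunk names for preserving a scratch register such as ``r1`` across a helper call: moving it into a callee saved register, or spilling it to the eBPF stack. The ``SK_*`` / ``sk_*`` names are hypothetical; the numeric values are assumed to mirror the kernel's ``BPF_ALU64``, ``BPF_MOV``, ``BPF_X``, ``BPF_STX``, ``BPF_MEM`` and ``BPF_DW`` macros. The sketch also assumes ``r6`` - ``r9`` are the callee saved registers, that ``r10`` is the frame pointer, and a 512 byte stack.

```c
#include <assert.h>
#include <stdint.h>

/* Values mirror the kernel UAPI macros named in the comments; they are
 * redefined under hypothetical SK_* names so no kernel headers are needed. */
enum {
    SK_CLASS_STX   = 0x03, /* BPF_STX */
    SK_CLASS_ALU64 = 0x07, /* BPF_ALU64 */
    SK_SRC_REG     = 0x08, /* BPF_X: source operand is a register */
    SK_SIZE_DW     = 0x18, /* BPF_DW */
    SK_MODE_MEM    = 0x60, /* BPF_MEM */
    SK_OP_MOV      = 0xb0, /* BPF_MOV */
    SK_REG_FP      = 10,   /* r10, the frame pointer */
    SK_STACK_SIZE  = 512,  /* assumed eBPF stack size in bytes */
};

/* Same field layout as the kernel's struct bpf_insn. */
struct sk_insn {
    uint8_t code;
    uint8_t dst_reg : 4;
    uint8_t src_reg : 4;
    int16_t off;
    int32_t imm;
};

/* Option 1: "callee_saved = scratch", e.g. r6 = r1. */
static struct sk_insn sk_save_in_callee_saved(uint8_t callee_saved,
                                              uint8_t scratch)
{
    assert(callee_saved >= 6 && callee_saved <= 9);
    assert(scratch >= 1 && scratch <= 5);
    struct sk_insn insn = {
        .code    = (uint8_t)(SK_CLASS_ALU64 | SK_OP_MOV | SK_SRC_REG),
        .dst_reg = callee_saved,
        .src_reg = scratch,
    };
    return insn;
}

/* Option 2: "*(u64 *)(r10 + off) = scratch", a spill to the stack.
 * The stack sits below the frame pointer, hence the negative offset. */
static struct sk_insn sk_spill_to_stack(uint8_t scratch, int16_t off)
{
    assert(scratch >= 1 && scratch <= 5);
    assert(off < 0 && off >= -SK_STACK_SIZE && off % 8 == 0);
    struct sk_insn insn = {
        .code    = (uint8_t)(SK_CLASS_STX | SK_MODE_MEM | SK_SIZE_DW),
        .dst_reg = SK_REG_FP,
        .src_reg = scratch,
        .off     = off,
    };
    return insn;
}
```

Under these assumptions, the register move encodes to opcode ``0xbf`` and the spill to ``0x7b``. After the helper call returns, the value is read back from the callee saved register or reloaded from the same stack slot.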

@borkmann borkmann force-pushed the doc-bpf-wip branch 4 times, most recently from bfe9836 to 44cd590 Compare April 14, 2017 23:35
Member

tgraf commented Apr 15, 2017

I'll provide more feedback directly as a PR but this is in great shape to go in already.

@borkmann borkmann merged commit e70ff85 into master Apr 15, 2017
@borkmann
Member Author

Perfect, thanks. I'll continue working on the remaining bits.

@borkmann borkmann deleted the doc-bpf-wip branch April 15, 2017 08:58