Commit ae28ed4

Merge tag 'bpf-next-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Pull bpf updates from Alexei Starovoitov:

 - Support pulling non-linear xdp data with bpf_xdp_pull_data() kfunc
   (Amery Hung)

   Applied as a stable branch in bpf-next and net-next trees.

 - Support reading skb metadata via bpf_dynptr (Jakub Sitnicki)

   Also a stable branch in bpf-next and net-next trees.

 - Enforce expected_attach_type for tailcall compatibility (Daniel
   Borkmann)

 - Replace path-sensitive with path-insensitive live stack analysis in
   the verifier (Eduard Zingerman)

   This is a significant change in the verification logic. More details,
   motivation, long term plans are in the cover letter/merge commit.

 - Support signed BPF programs (KP Singh)

   This is another major feature that took years to materialize.

   Algorithm details are in the cover letter/merge commit

 - Add support for may_goto instruction to s390 JIT (Ilya Leoshkevich)

 - Add support for may_goto instruction to arm64 JIT (Puranjay Mohan)

 - Fix USDT SIB argument handling in libbpf (Jiawei Zhao)

 - Allow uprobe-bpf program to change context registers (Jiri Olsa)

 - Support signed loads from BPF arena (Kumar Kartikeya Dwivedi and
   Puranjay Mohan)

 - Allow access to union arguments in tracing programs (Leon Hwang)

 - Optimize rcu_read_lock() + migrate_disable() combination where it's
   used in BPF subsystem (Menglong Dong)

 - Introduce bpf_task_work_schedule*() kfuncs to schedule deferred
   execution of BPF callback in the context of a specific task using the
   kernel’s task_work infrastructure (Mykyta Yatsenko)

 - Enforce RCU protection for KF_RCU_PROTECTED kfuncs (Kumar Kartikeya
   Dwivedi)

 - Add stress test for rqspinlock in NMI (Kumar Kartikeya Dwivedi)

 - Improve the precision of tnum multiplier verifier operation
   (Nandakumar Edamana)

 - Use tnums to improve is_branch_taken() logic (Paul Chaignon)

 - Add support for atomic operations in arena in riscv JIT (Pu Lehui)

 - Report arena faults to BPF error stream (Puranjay Mohan)

 - Search for tracefs at /sys/kernel/tracing first in bpftool (Quentin
   Monnet)

 - Add bpf_strcasecmp() kfunc (Rong Tao)

 - Support lookup_and_delete_elem command in BPF_MAP_STACK_TRACE (Tao
   Chen)

* tag 'bpf-next-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (197 commits)
  libbpf: Replace AF_ALG with open coded SHA-256
  selftests/bpf: Add stress test for rqspinlock in NMI
  selftests/bpf: Add test case for different expected_attach_type
  bpf: Enforce expected_attach_type for tailcall compatibility
  bpftool: Remove duplicate string.h header
  bpf: Remove duplicate crypto/sha2.h header
  libbpf: Fix error when st-prefix_ops and ops from differ btf
  selftests/bpf: Test changing packet data from kfunc
  selftests/bpf: Add stacktrace map lookup_and_delete_elem test case
  selftests/bpf: Refactor stacktrace_map case with skeleton
  bpf: Add lookup_and_delete_elem for BPF_MAP_STACK_TRACE
  selftests/bpf: Fix flaky bpf_cookie selftest
  selftests/bpf: Test changing packet data from global functions with a kfunc
  bpf: Emit struct bpf_xdp_sock type in vmlinux BTF
  selftests/bpf: Task_work selftest cleanup fixes
  MAINTAINERS: Delete inactive maintainers from AF_XDP
  bpf: Mark kfuncs as __noclone
  selftests/bpf: Add kprobe multi write ctx attach test
  selftests/bpf: Add kprobe write ctx attach test
  selftests/bpf: Add uprobe context ip register change test
  ...
2 parents 4b81e2e + 4ef77dd commit ae28ed4

254 files changed

+11831
-2795
lines changed

CREDITS

Lines changed: 6 additions & 0 deletions
@@ -3912,6 +3912,12 @@ S: C/ Federico Garcia Lorca 1 10-A
39123912
S: Sevilla 41005
39133913
S: Spain
39143914

3915+
N: Björn Töpel
3916+
E: bjorn@kernel.org
3917+
D: AF_XDP
3918+
S: Gothenburg
3919+
S: Sweden
3920+
39153921
N: Linus Torvalds
39163922
E: torvalds@linux-foundation.org
39173923
D: Original kernel hacker

Documentation/bpf/kfuncs.rst

Lines changed: 18 additions & 1 deletion
@@ -335,9 +335,26 @@ consider doing refcnt != 0 check, especially when returning a KF_ACQUIRE
335335
pointer. Note as well that a KF_ACQUIRE kfunc that is KF_RCU should very likely
336336
also be KF_RET_NULL.
337337

338+
2.4.8 KF_RCU_PROTECTED flag
339+
---------------------------
340+
341+
The KF_RCU_PROTECTED flag is used to indicate that the kfunc must be invoked in
342+
an RCU critical section. This is assumed by default in non-sleepable programs,
343+
and must be explicitly ensured by calling ``bpf_rcu_read_lock`` for sleepable
344+
ones.
345+
346+
If the kfunc returns a pointer value, this flag also enforces that the returned
347+
pointer is RCU protected, and can only be used while the RCU critical section is
348+
active.
349+
350+
The flag is distinct from the ``KF_RCU`` flag, which only ensures that its
351+
arguments are at least RCU protected pointers. This may transitively imply that
352+
RCU protection is ensured, but it does not work in cases of kfuncs which require
353+
RCU protection but do not take RCU protected arguments.
354+
338355
.. _KF_deprecated_flag:
339356

340-
2.4.8 KF_DEPRECATED flag
357+
2.4.9 KF_DEPRECATED flag
341358
------------------------
342359

343360
The KF_DEPRECATED flag is used for kfuncs which are scheduled to be

Documentation/bpf/verifier.rst

Lines changed: 0 additions & 264 deletions
@@ -347,270 +347,6 @@ However, only the value of register ``r1`` is important to successfully finish
347347
verification. The goal of the liveness tracking algorithm is to spot this fact
348348
and figure out that both states are actually equivalent.
349349

350-
Data structures
351-
~~~~~~~~~~~~~~~
352-
353-
Liveness is tracked using the following data structures::
354-
355-
enum bpf_reg_liveness {
356-
REG_LIVE_NONE = 0,
357-
REG_LIVE_READ32 = 0x1,
358-
REG_LIVE_READ64 = 0x2,
359-
REG_LIVE_READ = REG_LIVE_READ32 | REG_LIVE_READ64,
360-
REG_LIVE_WRITTEN = 0x4,
361-
REG_LIVE_DONE = 0x8,
362-
};
363-
364-
struct bpf_reg_state {
365-
...
366-
struct bpf_reg_state *parent;
367-
...
368-
enum bpf_reg_liveness live;
369-
...
370-
};
371-
372-
struct bpf_stack_state {
373-
struct bpf_reg_state spilled_ptr;
374-
...
375-
};
376-
377-
struct bpf_func_state {
378-
struct bpf_reg_state regs[MAX_BPF_REG];
379-
...
380-
struct bpf_stack_state *stack;
381-
}
382-
383-
struct bpf_verifier_state {
384-
struct bpf_func_state *frame[MAX_CALL_FRAMES];
385-
struct bpf_verifier_state *parent;
386-
...
387-
}
388-
389-
* ``REG_LIVE_NONE`` is an initial value assigned to ``->live`` fields upon new
390-
verifier state creation;
391-
392-
* ``REG_LIVE_WRITTEN`` means that the value of the register (or stack slot) is
393-
defined by some instruction verified between this verifier state's parent and
394-
verifier state itself;
395-
396-
* ``REG_LIVE_READ{32,64}`` means that the value of the register (or stack slot)
397-
is read by a some child state of this verifier state;
398-
399-
* ``REG_LIVE_DONE`` is a marker used by ``clean_verifier_state()`` to avoid
400-
processing same verifier state multiple times and for some sanity checks;
401-
402-
* ``->live`` field values are formed by combining ``enum bpf_reg_liveness``
403-
values using bitwise or.
404-
405-
Register parentage chains
406-
~~~~~~~~~~~~~~~~~~~~~~~~~
407-
408-
In order to propagate information between parent and child states, a *register
409-
parentage chain* is established. Each register or stack slot is linked to a
410-
corresponding register or stack slot in its parent state via a ``->parent``
411-
pointer. This link is established upon state creation in ``is_state_visited()``
412-
and might be modified by ``set_callee_state()`` called from
413-
``__check_func_call()``.
414-
415-
The rules for correspondence between registers / stack slots are as follows:
416-
417-
* For the current stack frame, registers and stack slots of the new state are
418-
linked to the registers and stack slots of the parent state with the same
419-
indices.
420-
421-
* For the outer stack frames, only callee saved registers (r6-r9) and stack
422-
slots are linked to the registers and stack slots of the parent state with the
423-
same indices.
424-
425-
* When function call is processed a new ``struct bpf_func_state`` instance is
426-
allocated, it encapsulates a new set of registers and stack slots. For this
427-
new frame, parent links for r6-r9 and stack slots are set to nil, parent links
428-
for r1-r5 are set to match caller r1-r5 parent links.
429-
430-
This could be illustrated by the following diagram (arrows stand for
431-
``->parent`` pointers)::
432-
433-
... ; Frame #0, some instructions
434-
--- checkpoint #0 ---
435-
1 : r6 = 42 ; Frame #0
436-
--- checkpoint #1 ---
437-
2 : call foo() ; Frame #0
438-
... ; Frame #1, instructions from foo()
439-
--- checkpoint #2 ---
440-
... ; Frame #1, instructions from foo()
441-
--- checkpoint #3 ---
442-
exit ; Frame #1, return from foo()
443-
3 : r1 = r6 ; Frame #0 <- current state
444-
445-
+-------------------------------+-------------------------------+
446-
| Frame #0 | Frame #1 |
447-
Checkpoint +-------------------------------+-------------------------------+
448-
#0 | r0 | r1-r5 | r6-r9 | fp-8 ... |
449-
+-------------------------------+
450-
^ ^ ^ ^
451-
| | | |
452-
Checkpoint +-------------------------------+
453-
#1 | r0 | r1-r5 | r6-r9 | fp-8 ... |
454-
+-------------------------------+
455-
^ ^ ^
456-
|_______|_______|_______________
457-
| | |
458-
nil nil | | | nil nil
459-
| | | | | | |
460-
Checkpoint +-------------------------------+-------------------------------+
461-
#2 | r0 | r1-r5 | r6-r9 | fp-8 ... | r0 | r1-r5 | r6-r9 | fp-8 ... |
462-
+-------------------------------+-------------------------------+
463-
^ ^ ^ ^ ^
464-
nil nil | | | | |
465-
| | | | | | |
466-
Checkpoint +-------------------------------+-------------------------------+
467-
#3 | r0 | r1-r5 | r6-r9 | fp-8 ... | r0 | r1-r5 | r6-r9 | fp-8 ... |
468-
+-------------------------------+-------------------------------+
469-
^ ^
470-
nil nil | |
471-
| | | |
472-
Current +-------------------------------+
473-
state | r0 | r1-r5 | r6-r9 | fp-8 ... |
474-
+-------------------------------+
475-
\
476-
r6 read mark is propagated via these links
477-
all the way up to checkpoint #1.
478-
The checkpoint #1 contains a write mark for r6
479-
because of instruction (1), thus read propagation
480-
does not reach checkpoint #0 (see section below).
481-
482-
Liveness marks tracking
483-
~~~~~~~~~~~~~~~~~~~~~~~
484-
485-
For each processed instruction, the verifier tracks read and written registers
486-
and stack slots. The main idea of the algorithm is that read marks propagate
487-
back along the state parentage chain until they hit a write mark, which 'screens
488-
off' earlier states from the read. The information about reads is propagated by
489-
function ``mark_reg_read()`` which could be summarized as follows::
490-
491-
mark_reg_read(struct bpf_reg_state *state, ...):
492-
parent = state->parent
493-
while parent:
494-
if state->live & REG_LIVE_WRITTEN:
495-
break
496-
if parent->live & REG_LIVE_READ64:
497-
break
498-
parent->live |= REG_LIVE_READ64
499-
state = parent
500-
parent = state->parent
501-
502-
Notes:
503-
504-
* The read marks are applied to the **parent** state while write marks are
505-
applied to the **current** state. The write mark on a register or stack slot
506-
means that it is updated by some instruction in the straight-line code leading
507-
from the parent state to the current state.
508-
509-
* Details about REG_LIVE_READ32 are omitted.
510-
511-
* Function ``propagate_liveness()`` (see section :ref:`read_marks_for_cache_hits`)
512-
might override the first parent link. Please refer to the comments in the
513-
``propagate_liveness()`` and ``mark_reg_read()`` source code for further
514-
details.
515-
516-
Because stack writes could have different sizes ``REG_LIVE_WRITTEN`` marks are
517-
applied conservatively: stack slots are marked as written only if write size
518-
corresponds to the size of the register, e.g. see function ``save_register_state()``.
519-
520-
Consider the following example::
521-
522-
0: (*u64)(r10 - 8) = 0 ; define 8 bytes of fp-8
523-
--- checkpoint #0 ---
524-
1: (*u32)(r10 - 8) = 1 ; redefine lower 4 bytes
525-
2: r1 = (*u32)(r10 - 8) ; read lower 4 bytes defined at (1)
526-
3: r2 = (*u32)(r10 - 4) ; read upper 4 bytes defined at (0)
527-
528-
As stated above, the write at (1) does not count as ``REG_LIVE_WRITTEN``. Should
529-
it be otherwise, the algorithm above wouldn't be able to propagate the read mark
530-
from (3) to checkpoint #0.
531-
532-
Once the ``BPF_EXIT`` instruction is reached ``update_branch_counts()`` is
533-
called to update the ``->branches`` counter for each verifier state in a chain
534-
of parent verifier states. When the ``->branches`` counter reaches zero the
535-
verifier state becomes a valid entry in a set of cached verifier states.
536-
537-
Each entry of the verifier states cache is post-processed by a function
538-
``clean_live_states()``. This function marks all registers and stack slots
539-
without ``REG_LIVE_READ{32,64}`` marks as ``NOT_INIT`` or ``STACK_INVALID``.
540-
Registers/stack slots marked in this way are ignored in function ``stacksafe()``
541-
called from ``states_equal()`` when a state cache entry is considered for
542-
equivalence with a current state.
543-
544-
Now it is possible to explain how the example from the beginning of the section
545-
works::
546-
547-
0: call bpf_get_prandom_u32()
548-
1: r1 = 0
549-
2: if r0 == 0 goto +1
550-
3: r0 = 1
551-
--- checkpoint[0] ---
552-
4: r0 = r1
553-
5: exit
554-
555-
* At instruction #2 branching point is reached and state ``{ r0 == 0, r1 == 0, pc == 4 }``
556-
is pushed to states processing queue (pc stands for program counter).
557-
558-
* At instruction #4:
559-
560-
* ``checkpoint[0]`` states cache entry is created: ``{ r0 == 1, r1 == 0, pc == 4 }``;
561-
* ``checkpoint[0].r0`` is marked as written;
562-
* ``checkpoint[0].r1`` is marked as read;
563-
564-
* At instruction #5 exit is reached and ``checkpoint[0]`` can now be processed
565-
by ``clean_live_states()``. After this processing ``checkpoint[0].r1`` has a
566-
read mark and all other registers and stack slots are marked as ``NOT_INIT``
567-
or ``STACK_INVALID``
568-
569-
* The state ``{ r0 == 0, r1 == 0, pc == 4 }`` is popped from the states queue
570-
and is compared against a cached state ``{ r1 == 0, pc == 4 }``, the states
571-
are considered equivalent.
572-
573-
.. _read_marks_for_cache_hits:
574-
575-
Read marks propagation for cache hits
576-
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
577-
578-
Another point is the handling of read marks when a previously verified state is
579-
found in the states cache. Upon cache hit verifier must behave in the same way
580-
as if the current state was verified to the program exit. This means that all
581-
read marks, present on registers and stack slots of the cached state, must be
582-
propagated over the parentage chain of the current state. Example below shows
583-
why this is important. Function ``propagate_liveness()`` handles this case.
584-
585-
Consider the following state parentage chain (S is a starting state, A-E are
586-
derived states, -> arrows show which state is derived from which)::
587-
588-
r1 read
589-
<------------- A[r1] == 0
590-
C[r1] == 0
591-
S ---> A ---> B ---> exit E[r1] == 1
592-
|
593-
` ---> C ---> D
594-
|
595-
` ---> E ^
596-
|___ suppose all these
597-
^ states are at insn #Y
598-
|
599-
suppose all these
600-
states are at insn #X
601-
602-
* Chain of states ``S -> A -> B -> exit`` is verified first.
603-
604-
* While ``B -> exit`` is verified, register ``r1`` is read and this read mark is
605-
propagated up to state ``A``.
606-
607-
* When chain of states ``C -> D`` is verified the state ``D`` turns out to be
608-
equivalent to state ``B``.
609-
610-
* The read mark for ``r1`` has to be propagated to state ``C``, otherwise state
611-
``C`` might get mistakenly marked as equivalent to state ``E`` even though
612-
values for register ``r1`` differ between ``C`` and ``E``.
613-
614350
Understanding eBPF verifier messages
615351
====================================
616352

MAINTAINERS

Lines changed: 0 additions & 2 deletions
@@ -27466,10 +27466,8 @@ F: tools/testing/selftests/bpf/*xdp*
2746627466
K: (?:\b|_)xdp(?:\b|_)
2746727467

2746827468
XDP SOCKETS (AF_XDP)
27469-
M: Björn Töpel <bjorn@kernel.org>
2747027469
M: Magnus Karlsson <magnus.karlsson@intel.com>
2747127470
M: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
27472-
R: Jonathan Lemon <jonathan.lemon@gmail.com>
2747327471
R: Stanislav Fomichev <sdf@fomichev.me>
2747427472
L: netdev@vger.kernel.org
2747527473
L: bpf@vger.kernel.org

arch/arm64/net/Makefile

Lines changed: 1 addition & 1 deletion
@@ -2,4 +2,4 @@
22
#
33
# ARM64 networking code
44
#
5-
obj-$(CONFIG_BPF_JIT) += bpf_jit_comp.o
5+
obj-$(CONFIG_BPF_JIT) += bpf_jit_comp.o bpf_timed_may_goto.o
