Commit efc1970

Author/Committer: Alexei Starovoitov
Merge branch 'Support storing struct task_struct objects as kptrs'
David Vernet says:

====================

Now that BPF supports adding new kernel functions with kfuncs, and storing kernel objects in maps with kptrs, we can add a set of kfuncs which allow struct task_struct objects to be stored in maps as referenced kptrs.

The possible use cases for doing this are plentiful. During tracing, for example, it would be useful to be able to collect some tasks that performed a certain operation, and then periodically summarize who they are, which cgroup they're in, how much CPU time they've utilized, etc. Doing this today would require storing the tasks' pids along with some relevant data to be exported to user space, and later associating the pids with tasks in other event handlers where the data is recorded. Another useful by-product is that it allows a program to pin a task in a BPF program, and by proxy therefore also e.g. pin its task local storage.

In order to support this, we need to expand KF_TRUSTED_ARGS to support receiving trusted, non-refcounted pointers. It currently only supports either PTR_TO_CTX pointers, or refcounted pointers. In terms of the implementation, this means that check_kfunc_args() must also check for the PTR_TRUSTED or MEM_ALLOC type modifiers when determining whether a trusted KF_ARG_PTR_TO_ALLOC_BTF_ID or KF_ARG_PTR_TO_BTF_ID pointer requires a refcount.

Note that PTR_UNTRUSTED is insufficient for this purpose, as it does not cover all of the possible types of potentially unsafe pointers. For example, a pointer obtained from walking a struct is not PTR_UNTRUSTED. To account for this, and to enable us to expand KF_TRUSTED_ARGS to include allow-listed arguments such as those passed by the kernel to tracepoints and struct_ops callbacks, this patch set also introduces a new PTR_TRUSTED type flag modifier which records whether a pointer was passed from the kernel in a trusted context.

Currently, both PTR_TRUSTED and MEM_ALLOC are used to imply that a pointer is trusted. Longer term, PTR_TRUSTED should be the sole source of truth for whether a pointer is trusted. This requires us to set PTR_TRUSTED when appropriate (e.g. when setting MEM_ALLOC), and unset it when appropriate (e.g. when setting PTR_UNTRUSTED). We don't do that in this patch set, as we need to do more cleanup before this can be done in a clear and well-defined manner.

In closing, this patch set:

1. Adds the new PTR_TRUSTED register type modifier flag, and updates the verifier and existing selftests accordingly. Also expands KF_TRUSTED_ARGS to include trusted pointers that were not obtained from walking structs.
2. Adds a new set of kfuncs that allow struct task_struct * objects to be used as kptrs.
3. Adds a new selftest suite to validate these new task kfuncs.

---

Changelog:

v8 -> v9:
- Moved the check for the release register back to where we check for !PTR_TO_BTF_ID || socket. Changed the verifier log message to reflect what is really being tested (the presence of unsafe modifiers) (Alexei)
- Fixed the verifier_test error tests to reflect the above changes
- Removed unneeded parens around bitwise operator checks (Alexei)
- Moved the updates to reg_type_str(), which allow multiple type modifiers to be present in the prefix string, to a separate patch (Alexei)
- Increased TYPE_STR_BUF_LEN to 128 to reflect the larger prefix size in reg_type_str().

v7 -> v8:
- Rebased onto Kumar's latest patch set, which adds a new MEM_ALLOC reg type modifier for bpf_obj_new() calls.
- Added comments to bpf_task_kptr_get() describing some of the subtle races we're protecting against (Alexei and John)
- Slightly reworked process_kf_arg_ptr_to_btf_id(), and added a new reg_has_unsafe_modifiers() function which validates that a register containing a kfunc release arg doesn't have unsafe modifiers. Note that this is slightly different from the check for KF_TRUSTED_ARGS. An alternative here would be to treat KF_RELEASE as implicitly requiring KF_TRUSTED_ARGS.
- Exported an inline bpf_type_has_unsafe_modifiers() function from bpf_verifier.h so that it can be used from bpf_tcp_ca.c. Eventually this function should likely be changed to bpf_type_is_trusted(), once PTR_TRUSTED becomes the real source of truth.

v6 -> v7:
- Removed the PTR_WALKED type modifier, and instead defined a new PTR_TRUSTED type modifier which is set on registers containing pointers passed from trusted contexts (i.e. as tracepoint or struct_ops callback args) (Alexei)
- Removed the new KF_OWNED_ARGS kfunc flag. This can be accomplished by defining a new type that wraps an existing type, such as with struct nf_conn___init (Alexei)
- Added a test_task_current_acquire_release testcase which verifies that we can acquire a task struct returned from bpf_get_current_task_btf().
- Made bpf_task_acquire() no longer return NULL, as it can no longer be called with a NULL task.
- Removed unnecessary is_test_kfunc_task() checks from failure testcases.

v5 -> v6:
- Added a new KF_OWNED_ARGS kfunc flag which may be used by kfuncs to express that they require trusted, refcounted args (Kumar)
- Renamed PTR_NESTED -> PTR_WALKED in the verifier (Kumar)
- Converted reg_type_str() prefixes to use snprintf() instead of strncpy() (Kumar)
- Added PTR_TO_BTF_ID | PTR_WALKED to missing struct btf_reg_type instances -- specifically btf_id_sock_common_types and percpu_btf_ptr_types.
- Added a missing PTR_TO_BTF_ID | PTR_WALKED switch case entry in check_func_arg_reg_off(), which is required when validating helper calls (Kumar)
- Updated reg_type_mismatch_ok() to check base types for the registers (i.e. to accommodate type modifiers). Additionally, added a lengthy comment that explains why this is being done (Kumar)
- Updated convert_ctx_accesses() to also issue probe reads for PTR_TO_BTF_ID | PTR_WALKED (Kumar)
- Updated selftests to expect the new prefix reg type strings.
- Renamed the task_kfunc_acquire_trusted_nested testcase to task_kfunc_acquire_trusted_walked, and fixed a comment (Kumar)
- Removed KF_TRUSTED_ARGS from bpf_task_release(), which already includes KF_RELEASE (Kumar)
- Added bpf-next to the patch subject lines (Kumar)

v4 -> v5:
- Fixed an improperly formatted patch title.

v3 -> v4:
- Removed an unnecessary check from my repository that I forgot to remove after debugging something.

v2 -> v3:
- Made bpf_task_acquire() check for NULL, and include KF_RET_NULL (Martin)
- Included a new PTR_NESTED register modifier type flag which specifies whether a pointer was obtained from walking a struct. Used this to expand the meaning of KF_TRUSTED_ARGS to include trusted pointers that were passed from the kernel (Kumar)
- Added more selftests to the task_kfunc selftest suite which verify that you cannot pass a walked pointer to bpf_task_acquire().
- Updated bpf_task_acquire() to also specify KF_TRUSTED_ARGS.

v1 -> v2:
- Renamed tracing_btf_ids to generic_kfunc_btf_ids, and added the new kfuncs to that list instead of making a separate btf id list (Alexei).
- Don't run the new selftest suite on s390x, which doesn't appear to support invoking kfuncs.
- Added a missing __diag_ignore block for -Wmissing-prototypes (lkp@intel.com).
- Fixed formatting on some of the SPDX-License-Identifier tags.
- Clarified the function header comment a bit on bpf_task_kptr_get().

====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
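
To make the use case described in the cover letter concrete, the following is a minimal sketch of a BPF program that stashes newly created tasks in a map as referenced kptrs using the kfuncs added by this series. The map layout (struct task_slot, the tasks map), the __kptr_ref annotation, and the program/section names are illustrative assumptions in the style of the accompanying selftests, not code taken from the patch set itself.

	#include <vmlinux.h>
	#include <bpf/bpf_helpers.h>
	#include <bpf/bpf_tracing.h>

	/* Kfuncs added by this series, declared as kernel symbols. */
	struct task_struct *bpf_task_acquire(struct task_struct *p) __ksym;
	void bpf_task_release(struct task_struct *p) __ksym;

	/* Hypothetical map value holding a referenced task kptr. */
	struct task_slot {
		struct task_struct __kptr_ref *task;
	};

	struct {
		__uint(type, BPF_MAP_TYPE_HASH);
		__uint(max_entries, 1024);
		__type(key, u32);
		__type(value, struct task_slot);
	} tasks SEC(".maps");

	SEC("tp_btf/task_newtask")
	int BPF_PROG(stash_new_task, struct task_struct *task, u64 clone_flags)
	{
		struct task_slot empty = {}, *slot;
		struct task_struct *acquired, *old;
		u32 pid = task->pid;

		if (bpf_map_update_elem(&tasks, &pid, &empty, BPF_NOEXIST))
			return 0;
		slot = bpf_map_lookup_elem(&tasks, &pid);
		if (!slot)
			return 0;

		/* "task" is a trusted tracepoint argument, so the
		 * KF_TRUSTED_ARGS kfunc bpf_task_acquire() accepts it.
		 */
		acquired = bpf_task_acquire(task);

		/* Move the reference into the map; release any task that was
		 * previously stored in this slot.
		 */
		old = bpf_kptr_xchg(&slot->task, acquired);
		if (old)
			bpf_task_release(old);

		return 0;
	}

	char LICENSE[] SEC("license") = "GPL";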
2 parents: ee748cd + fe14795

File tree: 16 files changed, +886 lines, -70 lines

Documentation/bpf/kfuncs.rst
Lines changed: 14 additions & 16 deletions

@@ -161,22 +161,20 @@ KF_ACQUIRE and KF_RET_NULL flags.
 --------------------------
 
 The KF_TRUSTED_ARGS flag is used for kfuncs taking pointer arguments. It
-indicates that the all pointer arguments will always have a guaranteed lifetime,
-and pointers to kernel objects are always passed to helpers in their unmodified
-form (as obtained from acquire kfuncs).
-
-It can be used to enforce that a pointer to a refcounted object acquired from a
-kfunc or BPF helper is passed as an argument to this kfunc without any
-modifications (e.g. pointer arithmetic) such that it is trusted and points to
-the original object.
-
-Meanwhile, it is also allowed pass pointers to normal memory to such kfuncs,
-but those can have a non-zero offset.
-
-This flag is often used for kfuncs that operate (change some property, perform
-some operation) on an object that was obtained using an acquire kfunc. Such
-kfuncs need an unchanged pointer to ensure the integrity of the operation being
-performed on the expected object.
+indicates that the all pointer arguments are valid, and that all pointers to
+BTF objects have been passed in their unmodified form (that is, at a zero
+offset, and without having been obtained from walking another pointer).
+
+There are two types of pointers to kernel objects which are considered "valid":
+
+1. Pointers which are passed as tracepoint or struct_ops callback arguments.
+2. Pointers which were returned from a KF_ACQUIRE or KF_KPTR_GET kfunc.
+
+Pointers to non-BTF objects (e.g. scalar pointers) may also be passed to
+KF_TRUSTED_ARGS kfuncs, and may have a non-zero offset.
+
+The definition of "valid" pointers is subject to change at any time, and has
+absolutely no ABI stability guarantees.
 
 2.4.6 KF_SLEEPABLE flag
 -----------------------

include/linux/bpf.h
Lines changed: 30 additions & 0 deletions

@@ -543,6 +543,35 @@ enum bpf_type_flag {
 	 */
 	MEM_ALLOC		= BIT(11 + BPF_BASE_TYPE_BITS),
 
+	/* PTR was passed from the kernel in a trusted context, and may be
+	 * passed to KF_TRUSTED_ARGS kfuncs or BPF helper functions.
+	 * Confusingly, this is _not_ the opposite of PTR_UNTRUSTED above.
+	 * PTR_UNTRUSTED refers to a kptr that was read directly from a map
+	 * without invoking bpf_kptr_xchg(). What we really need to know is
+	 * whether a pointer is safe to pass to a kfunc or BPF helper function.
+	 * While PTR_UNTRUSTED pointers are unsafe to pass to kfuncs and BPF
+	 * helpers, they do not cover all possible instances of unsafe
+	 * pointers. For example, a pointer that was obtained from walking a
+	 * struct will _not_ get the PTR_UNTRUSTED type modifier, despite the
+	 * fact that it may be NULL, invalid, etc. This is due to backwards
+	 * compatibility requirements, as this was the behavior that was first
+	 * introduced when kptrs were added. The behavior is now considered
+	 * deprecated, and PTR_UNTRUSTED will eventually be removed.
+	 *
+	 * PTR_TRUSTED, on the other hand, is a pointer that the kernel
+	 * guarantees to be valid and safe to pass to kfuncs and BPF helpers.
+	 * For example, pointers passed to tracepoint arguments are considered
+	 * PTR_TRUSTED, as are pointers that are passed to struct_ops
+	 * callbacks. As alluded to above, pointers that are obtained from
+	 * walking PTR_TRUSTED pointers are _not_ trusted. For example, if a
+	 * struct task_struct *task is PTR_TRUSTED, then accessing
+	 * task->last_wakee will lose the PTR_TRUSTED modifier when it's stored
+	 * in a BPF register. Similarly, pointers passed to certain programs
+	 * types such as kretprobes are not guaranteed to be valid, as they may
+	 * for example contain an object that was recently freed.
+	 */
+	PTR_TRUSTED		= BIT(12 + BPF_BASE_TYPE_BITS),
+
 	__BPF_TYPE_FLAG_MAX,
 	__BPF_TYPE_LAST_FLAG	= __BPF_TYPE_FLAG_MAX - 1,
 };
@@ -636,6 +665,7 @@ enum bpf_return_type {
 	RET_PTR_TO_RINGBUF_MEM_OR_NULL	= PTR_MAYBE_NULL | MEM_RINGBUF | RET_PTR_TO_MEM,
 	RET_PTR_TO_DYNPTR_MEM_OR_NULL	= PTR_MAYBE_NULL | RET_PTR_TO_MEM,
 	RET_PTR_TO_BTF_ID_OR_NULL	= PTR_MAYBE_NULL | RET_PTR_TO_BTF_ID,
+	RET_PTR_TO_BTF_ID_TRUSTED	= PTR_TRUSTED	 | RET_PTR_TO_BTF_ID,
 
 	/* This must be the last entry. Its purpose is to ensure the enum is
 	 * wide enough to hold the higher bits reserved for bpf_type_flag.

include/linux/bpf_verifier.h
Lines changed: 8 additions & 1 deletion

@@ -19,7 +19,7 @@
  */
 #define BPF_MAX_VAR_SIZ	(1 << 29)
 /* size of type_str_buf in bpf_verifier. */
-#define TYPE_STR_BUF_LEN 64
+#define TYPE_STR_BUF_LEN 128
 
 /* Liveness marks, used for registers and spilled-regs (in stack slots).
  * Read marks propagate upwards until they find a write mark; they record that
@@ -680,4 +680,11 @@ static inline bool bpf_prog_check_recur(const struct bpf_prog *prog)
 	}
 }
 
+#define BPF_REG_TRUSTED_MODIFIERS (MEM_ALLOC | PTR_TRUSTED)
+
+static inline bool bpf_type_has_unsafe_modifiers(u32 type)
+{
+	return type_flag(type) & ~BPF_REG_TRUSTED_MODIFIERS;
+}
+
 #endif /* _LINUX_BPF_VERIFIER_H */
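
A hedged sketch of how kernel code might use the new helper: given a verifier register state, treat a PTR_TO_BTF_ID register as trusted only if no modifiers other than MEM_ALLOC and PTR_TRUSTED are set. The wrapper function below is illustrative and not part of the patch set; struct bpf_reg_state, base_type() and type_flag() are existing definitions in bpf_verifier.h/bpf.h.

	#include <linux/bpf.h>
	#include <linux/bpf_verifier.h>

	/* Illustrative helper, not from the series: decide whether a register
	 * holding a PTR_TO_BTF_ID pointer is safe to pass where a trusted
	 * argument is required. bpf_type_has_unsafe_modifiers() returns any
	 * type flags other than MEM_ALLOC and PTR_TRUSTED, so a zero result
	 * means only "safe" modifiers are set.
	 */
	static bool reg_is_trusted_btf_ptr(const struct bpf_reg_state *reg)
	{
		return base_type(reg->type) == PTR_TO_BTF_ID &&
		       !bpf_type_has_unsafe_modifiers(reg->type);
	}
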

include/linux/btf.h
Lines changed: 41 additions & 24 deletions

@@ -19,36 +19,53 @@
 #define KF_RELEASE	(1 << 1) /* kfunc is a release function */
 #define KF_RET_NULL	(1 << 2) /* kfunc returns a pointer that may be NULL */
 #define KF_KPTR_GET	(1 << 3) /* kfunc returns reference to a kptr */
-/* Trusted arguments are those which are meant to be referenced arguments with
- * unchanged offset. It is used to enforce that pointers obtained from acquire
- * kfuncs remain unmodified when being passed to helpers taking trusted args.
+/* Trusted arguments are those which are guaranteed to be valid when passed to
+ * the kfunc. It is used to enforce that pointers obtained from either acquire
+ * kfuncs, or from the main kernel on a tracepoint or struct_ops callback
+ * invocation, remain unmodified when being passed to helpers taking trusted
+ * args.
  *
- * Consider
- *	struct foo {
- *		int data;
- *		struct foo *next;
- *	};
+ * Consider, for example, the following new task tracepoint:
  *
- *	struct bar {
- *		int data;
- *		struct foo f;
- *	};
+ *	SEC("tp_btf/task_newtask")
+ *	int BPF_PROG(new_task_tp, struct task_struct *task, u64 clone_flags)
+ *	{
+ *		...
+ *	}
  *
- *	struct foo *f = alloc_foo(); // Acquire kfunc
- *	struct bar *b = alloc_bar(); // Acquire kfunc
+ * And the following kfunc:
  *
- * If a kfunc set_foo_data() wants to operate only on the allocated object, it
- * will set the KF_TRUSTED_ARGS flag, which will prevent unsafe usage like:
+ *	BTF_ID_FLAGS(func, bpf_task_acquire, KF_ACQUIRE | KF_TRUSTED_ARGS)
  *
- *	set_foo_data(f, 42);	   // Allowed
- *	set_foo_data(f->next, 42); // Rejected, non-referenced pointer
- *	set_foo_data(&f->next, 42);// Rejected, referenced, but wrong type
- *	set_foo_data(&b->f, 42);   // Rejected, referenced, but bad offset
+ * All invocations to the kfunc must pass the unmodified, unwalked task:
  *
- * In the final case, usually for the purposes of type matching, it is deduced
- * by looking at the type of the member at the offset, but due to the
- * requirement of trusted argument, this deduction will be strict and not done
- * for this case.
+ *	bpf_task_acquire(task);		    // Allowed
+ *	bpf_task_acquire(task->last_wakee); // Rejected, walked task
+ *
+ * Programs may also pass referenced tasks directly to the kfunc:
+ *
+ *	struct task_struct *acquired;
+ *
+ *	acquired = bpf_task_acquire(task);	// Allowed, same as above
+ *	bpf_task_acquire(acquired);		// Allowed
+ *	bpf_task_acquire(task);			// Allowed
+ *	bpf_task_acquire(acquired->last_wakee);	// Rejected, walked task
+ *
+ * Programs may _not_, however, pass a task from an arbitrary fentry/fexit, or
+ * kprobe/kretprobe to the kfunc, as BPF cannot guarantee that all of these
+ * pointers are guaranteed to be safe. For example, the following BPF program
+ * would be rejected:
+ *
+ * SEC("kretprobe/free_task")
+ * int BPF_PROG(free_task_probe, struct task_struct *tsk)
+ * {
+ *	struct task_struct *acquired;
+ *
+ *	acquired = bpf_task_acquire(acquired); // Rejected, not a trusted pointer
+ *	bpf_task_release(acquired);
+ *
+ *	return 0;
+ * }
  */
 #define KF_TRUSTED_ARGS (1 << 4) /* kfunc only takes trusted pointer arguments */
 #define KF_SLEEPABLE	 (1 << 5) /* kfunc may sleep */

kernel/bpf/btf.c
Lines changed: 8 additions & 0 deletions

@@ -5799,6 +5799,11 @@ static u32 get_ctx_arg_idx(struct btf *btf, const struct btf_type *func_proto,
 	return nr_args + 1;
 }
 
+static bool prog_type_args_trusted(enum bpf_prog_type prog_type)
+{
+	return prog_type == BPF_PROG_TYPE_TRACING || prog_type == BPF_PROG_TYPE_STRUCT_OPS;
+}
+
 bool btf_ctx_access(int off, int size, enum bpf_access_type type,
 		    const struct bpf_prog *prog,
 		    struct bpf_insn_access_aux *info)
@@ -5942,6 +5947,9 @@ bool btf_ctx_access(int off, int size, enum bpf_access_type type,
 	}
 
 	info->reg_type = PTR_TO_BTF_ID;
+	if (prog_type_args_trusted(prog->type))
+		info->reg_type |= PTR_TRUSTED;
+
 	if (tgt_prog) {
 		enum bpf_prog_type tgt_type;

kernel/bpf/helpers.c
Lines changed: 75 additions & 3 deletions

@@ -1824,6 +1824,63 @@ struct bpf_list_node *bpf_list_pop_back(struct bpf_list_head *head)
 	return __bpf_list_del(head, true);
 }
 
+/**
+ * bpf_task_acquire - Acquire a reference to a task. A task acquired by this
+ * kfunc which is not stored in a map as a kptr, must be released by calling
+ * bpf_task_release().
+ * @p: The task on which a reference is being acquired.
+ */
+struct task_struct *bpf_task_acquire(struct task_struct *p)
+{
+	refcount_inc(&p->rcu_users);
+	return p;
+}
+
+/**
+ * bpf_task_kptr_get - Acquire a reference on a struct task_struct kptr. A task
+ * kptr acquired by this kfunc which is not subsequently stored in a map, must
+ * be released by calling bpf_task_release().
+ * @pp: A pointer to a task kptr on which a reference is being acquired.
+ */
+struct task_struct *bpf_task_kptr_get(struct task_struct **pp)
+{
+	struct task_struct *p;
+
+	rcu_read_lock();
+	p = READ_ONCE(*pp);
+
+	/* Another context could remove the task from the map and release it at
+	 * any time, including after we've done the lookup above. This is safe
+	 * because we're in an RCU read region, so the task is guaranteed to
+	 * remain valid until at least the rcu_read_unlock() below.
+	 */
+	if (p && !refcount_inc_not_zero(&p->rcu_users))
+		/* If the task had been removed from the map and freed as
+		 * described above, refcount_inc_not_zero() will return false.
+		 * The task will be freed at some point after the current RCU
+		 * gp has ended, so just return NULL to the user.
+		 */
+		p = NULL;
+	rcu_read_unlock();
+
+	return p;
+}
+
+/**
+ * bpf_task_release - Release the reference acquired on a struct task_struct *.
+ * If this kfunc is invoked in an RCU read region, the task_struct is
+ * guaranteed to not be freed until the current grace period has ended, even if
+ * its refcount drops to 0.
+ * @p: The task on which a reference is being released.
+ */
+void bpf_task_release(struct task_struct *p)
+{
+	if (!p)
+		return;
+
+	put_task_struct_rcu_user(p);
+}
+
 __diag_pop();
 
 BTF_SET8_START(generic_btf_ids)
@@ -1836,21 +1893,36 @@ BTF_ID_FLAGS(func, bpf_list_push_front)
 BTF_ID_FLAGS(func, bpf_list_push_back)
 BTF_ID_FLAGS(func, bpf_list_pop_front, KF_ACQUIRE | KF_RET_NULL)
 BTF_ID_FLAGS(func, bpf_list_pop_back, KF_ACQUIRE | KF_RET_NULL)
+BTF_ID_FLAGS(func, bpf_task_acquire, KF_ACQUIRE | KF_TRUSTED_ARGS)
+BTF_ID_FLAGS(func, bpf_task_kptr_get, KF_ACQUIRE | KF_KPTR_GET | KF_RET_NULL)
+BTF_ID_FLAGS(func, bpf_task_release, KF_RELEASE)
 BTF_SET8_END(generic_btf_ids)
 
 static const struct btf_kfunc_id_set generic_kfunc_set = {
 	.owner = THIS_MODULE,
 	.set   = &generic_btf_ids,
 };
 
+BTF_ID_LIST(generic_dtor_ids)
+BTF_ID(struct, task_struct)
+BTF_ID(func, bpf_task_release)
+
 static int __init kfunc_init(void)
 {
 	int ret;
+	const struct btf_id_dtor_kfunc generic_dtors[] = {
+		{
+			.btf_id = generic_dtor_ids[0],
+			.kfunc_btf_id = generic_dtor_ids[1]
+		},
+	};
 
 	ret = register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING, &generic_kfunc_set);
-	if (ret)
-		return ret;
-	return register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS, &generic_kfunc_set);
+	ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS, &generic_kfunc_set);
+	ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_STRUCT_OPS, &generic_kfunc_set);
+	return ret ?: register_btf_id_dtor_kfuncs(generic_dtors,
+						  ARRAY_SIZE(generic_dtors),
+						  THIS_MODULE);
 }
 
 late_initcall(kfunc_init);
