
Commit 156ed20

Byte-Lab authored and Alexei Starovoitov committed
bpf: Don't use rcu_users to refcount in task kfuncs
A series of prior patches added some kfuncs that allow struct task_struct * objects to be used as kptrs. These kfuncs leveraged the 'refcount_t rcu_users' field of the task for performing refcounting. This field was used instead of 'refcount_t usage', as we wanted to leverage the safety provided by RCU for ensuring a task's lifetime.

A struct task_struct is refcounted by two different refcount_t fields:

1. p->usage:     The "true" refcount field which tracks a task's lifetime. The task is freed as soon as this refcount drops to 0.

2. p->rcu_users: An "RCU users" refcount field which is statically initialized to 2, and is co-located in a union with a struct rcu_head field (p->rcu). p->rcu_users essentially encapsulates a single p->usage refcount, and when p->rcu_users goes to 0, an RCU callback is scheduled on the struct rcu_head which decrements the p->usage refcount.

Our logic was that by using p->rcu_users, we would be able to use RCU to safely issue refcount_inc_not_zero() on a task's rcu_users field to determine whether a task could still be acquired, or was exiting. Unfortunately, this does not work due to p->rcu_users and p->rcu sharing a union. When p->rcu_users goes to 0, an RCU callback is scheduled to drop a single p->usage refcount, and because the fields share a union, the refcount immediately becomes nonzero again after the callback is scheduled.

If we were to split the fields out of the union, this wouldn't be a problem. Doing so should also be rather non-controversial, as there are a number of places in struct task_struct that have padding which we could use to avoid growing the structure by splitting up the fields. For now, so as to fix the kfuncs to be correct, this patch instead updates bpf_task_acquire() and bpf_task_release() to use the p->usage field for refcounting via the get_task_struct() and put_task_struct() functions. Because we can no longer rely on RCU, the change also guts the bpf_task_acquire_not_zero() and bpf_task_kptr_get() functions pending a resolution on the above problem.

In addition, the patch fixes the kfunc and rcu_read_lock selftests to expect this new behavior.

Fixes: 9066030 ("bpf: Add kfuncs for storing struct task_struct * as a kptr")
Fixes: fca1aa7 ("bpf: Handle MEM_RCU type properly")
Reported-by: Matus Jokay <matus.jokay@stuba.sk>
Signed-off-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20221206210538.597606-1-void@manifault.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
1 parent 235d2ef commit 156ed20
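
For reference, the union overlap the commit message describes comes from the following kernel definitions. This is a simplified sketch, abridged from include/linux/sched.h and kernel/exit.c (surrounding fields and the exit-path bookkeeping are omitted), showing why the memory backing p->rcu_users becomes nonzero again as soon as the RCU callback is scheduled:

struct task_struct {
	/* ... */
	refcount_t usage;		/* "true" refcount; task freed when it hits 0 */
	/* ... */
	union {
		refcount_t rcu_users;	/* statically initialized to 2 */
		struct rcu_head rcu;	/* reused once rcu_users drops to 0 */
	};
	/* ... */
};

static void delayed_put_task_struct(struct rcu_head *rhp)
{
	struct task_struct *tsk = container_of(rhp, struct task_struct, rcu);

	/* ... perf/trace teardown elided ... */
	put_task_struct(tsk);	/* drop the p->usage ref that rcu_users encapsulated */
}

void put_task_struct_rcu_user(struct task_struct *task)
{
	if (refcount_dec_and_test(&task->rcu_users))
		/* call_rcu() immediately writes callback state into task->rcu,
		 * which overlays task->rcu_users in the union. From this point
		 * on, refcount_inc_not_zero(&task->rcu_users) can spuriously
		 * succeed on a task that is already on its way to being freed.
		 */
		call_rcu(&task->rcu, delayed_put_task_struct);
}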

3 files changed: +60 −30

kernel/bpf/helpers.c

Lines changed: 48 additions & 28 deletions
@@ -1833,8 +1833,7 @@ struct bpf_list_node *bpf_list_pop_back(struct bpf_list_head *head)
  */
 struct task_struct *bpf_task_acquire(struct task_struct *p)
 {
-	refcount_inc(&p->rcu_users);
-	return p;
+	return get_task_struct(p);
 }
 
 /**
@@ -1845,9 +1844,48 @@ struct task_struct *bpf_task_acquire(struct task_struct *p)
  */
 struct task_struct *bpf_task_acquire_not_zero(struct task_struct *p)
 {
-	if (!refcount_inc_not_zero(&p->rcu_users))
-		return NULL;
-	return p;
+	/* For the time being this function returns NULL, as it's not currently
+	 * possible to safely acquire a reference to a task with RCU protection
+	 * using get_task_struct() and put_task_struct(). This is due to the
+	 * slightly odd mechanics of p->rcu_users, and how task RCU protection
+	 * works.
+	 *
+	 * A struct task_struct is refcounted by two different refcount_t
+	 * fields:
+	 *
+	 * 1. p->usage:     The "true" refcount field which tracks a task's
+	 *                  lifetime. The task is freed as soon as this
+	 *                  refcount drops to 0.
+	 *
+	 * 2. p->rcu_users: An "RCU users" refcount field which is statically
+	 *                  initialized to 2, and is co-located in a union with
+	 *                  a struct rcu_head field (p->rcu). p->rcu_users
+	 *                  essentially encapsulates a single p->usage
+	 *                  refcount, and when p->rcu_users goes to 0, an RCU
+	 *                  callback is scheduled on the struct rcu_head which
+	 *                  decrements the p->usage refcount.
+	 *
+	 * There are two important implications to this task refcounting logic
+	 * described above. The first is that
+	 * refcount_inc_not_zero(&p->rcu_users) cannot be used anywhere, as
+	 * after the refcount goes to 0, the RCU callback being scheduled will
+	 * cause the memory backing the refcount to again be nonzero due to the
+	 * fields sharing a union. The other is that we can't rely on RCU to
+	 * guarantee that a task is valid in a BPF program. This is because a
+	 * task could have already transitioned to being in the TASK_DEAD
+	 * state, had its rcu_users refcount go to 0, and its rcu callback
+	 * invoked in which it drops its single p->usage reference. At this
+	 * point the task will be freed as soon as the last p->usage reference
+	 * goes to 0, without waiting for another RCU gp to elapse. The only
+	 * way that a BPF program can guarantee that a task is valid in this
+	 * scenario is to hold a p->usage refcount itself.
+	 *
+	 * Until we're able to resolve this issue, either by pulling
+	 * p->rcu_users and p->rcu out of the union, or by getting rid of
+	 * p->usage and just using p->rcu_users for refcounting, we'll just
+	 * return NULL here.
+	 */
+	return NULL;
 }
 
 /**
@@ -1858,41 +1896,23 @@ struct task_struct *bpf_task_acquire_not_zero(struct task_struct *p)
  */
 struct task_struct *bpf_task_kptr_get(struct task_struct **pp)
 {
-	struct task_struct *p;
-
-	rcu_read_lock();
-	p = READ_ONCE(*pp);
-
-	/* Another context could remove the task from the map and release it at
-	 * any time, including after we've done the lookup above. This is safe
-	 * because we're in an RCU read region, so the task is guaranteed to
-	 * remain valid until at least the rcu_read_unlock() below.
+	/* We must return NULL here until we have clarity on how to properly
+	 * leverage RCU for ensuring a task's lifetime. See the comment above
+	 * in bpf_task_acquire_not_zero() for more details.
 	 */
-	if (p && !refcount_inc_not_zero(&p->rcu_users))
-		/* If the task had been removed from the map and freed as
-		 * described above, refcount_inc_not_zero() will return false.
-		 * The task will be freed at some point after the current RCU
-		 * gp has ended, so just return NULL to the user.
-		 */
-		p = NULL;
-	rcu_read_unlock();
-
-	return p;
+	return NULL;
 }
 
 /**
  * bpf_task_release - Release the reference acquired on a struct task_struct *.
- * If this kfunc is invoked in an RCU read region, the task_struct is
- * guaranteed to not be freed until the current grace period has ended, even if
- * its refcount drops to 0.
  * @p: The task on which a reference is being released.
  */
 void bpf_task_release(struct task_struct *p)
 {
 	if (!p)
 		return;
 
-	put_task_struct_rcu_user(p);
+	put_task_struct(p);
 }
 
 #ifdef CONFIG_CGROUPS
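
After this change, the supported pattern for a BPF program is to take a plain p->usage reference with bpf_task_acquire() on a trusted task pointer and pair it with bpf_task_release(). A minimal sketch of such a program follows; the kfunc declarations and the tp_btf/task_newtask attach point mirror the in-tree selftests, while the program name itself is illustrative:

#include <vmlinux.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

char _license[] SEC("license") = "GPL";

/* Kfunc declarations, as the selftests declare them. */
struct task_struct *bpf_task_acquire(struct task_struct *p) __ksym;
void bpf_task_release(struct task_struct *p) __ksym;

SEC("tp_btf/task_newtask")
int BPF_PROG(on_newtask, struct task_struct *task, u64 clone_flags)
{
	struct task_struct *acquired;

	/* bpf_task_acquire() now takes a p->usage reference via
	 * get_task_struct(), so the task cannot be freed while the
	 * reference is held, with no reliance on RCU protection.
	 */
	acquired = bpf_task_acquire(task);

	/* ... use the acquired task ... */

	/* Every acquired reference must be released before exit. */
	bpf_task_release(acquired);
	return 0;
}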

tools/testing/selftests/bpf/progs/rcu_read_lock.c

Lines changed: 5 additions & 0 deletions
@@ -161,6 +161,11 @@ int task_acquire(void *ctx)
 	/* acquire a reference which can be used outside rcu read lock region */
 	gparent = bpf_task_acquire_not_zero(gparent);
 	if (!gparent)
+		/* Until we resolve the issues with using task->rcu_users, we
+		 * expect bpf_task_acquire_not_zero() to return a NULL task.
+		 * See the comment at the definition of
+		 * bpf_task_acquire_not_zero() for more details.
+		 */
 		goto out;
 
 	(void)bpf_task_storage_get(&map_a, gparent, 0, 0);

tools/testing/selftests/bpf/progs/task_kfunc_success.c

Lines changed: 7 additions & 2 deletions
@@ -123,12 +123,17 @@ int BPF_PROG(test_task_get_release, struct task_struct *task, u64 clone_flags)
 	}
 
 	kptr = bpf_task_kptr_get(&v->task);
-	if (!kptr) {
+	if (kptr) {
+		/* Until we resolve the issues with using task->rcu_users, we
+		 * expect bpf_task_kptr_get() to return a NULL task. See the
+		 * comment at the definition of bpf_task_acquire_not_zero() for
+		 * more details.
+		 */
+		bpf_task_release(kptr);
 		err = 3;
 		return 0;
 	}
 
-	bpf_task_release(kptr);
 
 	return 0;
 }
