perf/core: Share an event with multiple cgroups
As we can run many jobs (in containers) on a big machine, we want to
measure each job's performance during the run.  To do that, a
perf_event can be associated with a cgroup so that it measures only
that cgroup.

However, such cgroup events need to be opened separately, which causes
significant overhead in event multiplexing during context switches,
as well as resource consumption such as file descriptors and memory
footprint.

As a cgroup event is basically a cpu event, we can share a single cpu
event for multiple cgroups.  All we need is a separate counter (and
two timing variables) for each cgroup.  I added a hash table to map
from cgroup id to the attached cgroups.
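
To illustrate the lookup described above, here is a minimal userspace
sketch of a hash table keyed by cgroup id.  It is not the kernel code:
the node fields mirror struct perf_cgroup_node from this patch, but
cg_node, cg_table, cg_hash, cg_table_insert and cg_table_find are
hypothetical names, and the multiplicative hash is an assumption about
how ids could be spread over the buckets.

```c
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for a per-cgroup node (hypothetical type). */
struct cg_node {
	struct cg_node *next;	/* hash-chain link; the patch uses hlist_node */
	uint64_t id;		/* cgroup id used as the lookup key */
	uint64_t count;
	uint64_t time_enabled;
	uint64_t time_running;
};

/* Table with (1 << bits) buckets; bits must be between 1 and 31. */
struct cg_table {
	struct cg_node **buckets;
	unsigned int bits;
};

/* Multiplicative hash (assumption): keep the top 'bits' bits. */
static unsigned int cg_hash(uint64_t id, unsigned int bits)
{
	return (unsigned int)((id * 0x61C8864680B583EBull) >> (64 - bits));
}

/* Push a node onto the head of its bucket chain. */
static void cg_table_insert(struct cg_table *t, struct cg_node *n)
{
	unsigned int b = cg_hash(n->id, t->bits);

	n->next = t->buckets[b];
	t->buckets[b] = n;
}

/* Return the node for 'id', or NULL if that cgroup is not attached. */
static struct cg_node *cg_table_find(const struct cg_table *t, uint64_t id)
{
	struct cg_node *n;

	for (n = t->buckets[cg_hash(id, t->bits)]; n; n = n->next)
		if (n->id == id)
			return n;
	return NULL;
}
```

A lookup that returns NULL corresponds to a task whose cgroup was
never attached to the event, so there is no node to account against.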

With this change, the cpu event needs to calculate a delta of event
counter values when the cgroups of the current and the next task
differ.  It then attributes the delta to the current task's cgroup.
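
The delta accounting described above can be sketched in plain C as
follows.  This is a simplified userspace model, not the patch's
implementation: cg_counts, shared_event and attribute_delta are
hypothetical names, and passing NULL for a cgroup that is not attached
is an assumption made for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* One counter value plus the two timing values. */
struct cg_counts {
	uint64_t count;
	uint64_t time_enabled;
	uint64_t time_running;
};

/* Shared cpu event: remembers the totals seen at the last switch. */
struct shared_event {
	struct cg_counts snapshot;
};

/*
 * Called when the outgoing and incoming tasks belong to different
 * cgroups.  'now' holds the event's current totals; 'outgoing' is the
 * node of the task being switched out, or NULL if its cgroup is not
 * attached (assumption).
 */
static void attribute_delta(struct shared_event *ev,
			    const struct cg_counts *now,
			    struct cg_counts *outgoing)
{
	if (outgoing) {
		outgoing->count += now->count - ev->snapshot.count;
		outgoing->time_enabled +=
			now->time_enabled - ev->snapshot.time_enabled;
		outgoing->time_running +=
			now->time_running - ev->snapshot.time_running;
	}
	/* Always advance the snapshot so the next delta starts here. */
	ev->snapshot = *now;
}
```

Advancing the snapshot even when nothing is attributed keeps activity
from an unattached cgroup out of the next attached cgroup's total.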

This patch adds two new ioctl commands to perf_event for light-weight
cgroup event counting (i.e. perf stat).

 * PERF_EVENT_IOC_ATTACH_CGROUP - it takes a buffer consisting of a
     64-bit array to attach the given cgroups.  The first element is
     the number of cgroups in the buffer, and the rest is a list of
     cgroup ids for which cgroup info is added to the given event.

 * PERF_EVENT_IOC_READ_CGROUP - it takes a buffer consisting of a
     64-bit array to get the event counter values.  The first element
     is the size of the array in bytes, and the second element is the
     cgroup id to read.  The rest is used to save the counter value
     and timings.
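
The two buffer layouts above could be prepared from userspace roughly
as follows.  This is only a sketch: build_attach_buf, init_read_buf
and the READ_* indices are hypothetical helpers, and the order of the
three result slots (count, time enabled, time running) is an
assumption, since the description only says the rest holds the counter
value and timings.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * ATTACH layout: buf[0] = number of cgroups, buf[1..n] = cgroup ids.
 * Returns a malloc'ed buffer the caller must free, or NULL on failure.
 */
static uint64_t *build_attach_buf(const uint64_t *ids, uint64_t n)
{
	uint64_t *buf = malloc((n + 1) * sizeof(*buf));
	uint64_t i;

	if (!buf)
		return NULL;
	buf[0] = n;
	for (i = 0; i < n; i++)
		buf[i + 1] = ids[i];
	return buf;
}

/* READ layout; the order of the last three slots is assumed. */
enum {
	READ_SIZE,	/* size of the whole array in bytes */
	READ_ID,	/* cgroup id to read */
	READ_COUNT,	/* filled in by the kernel */
	READ_ENABLED,	/* filled in by the kernel */
	READ_RUNNING,	/* filled in by the kernel */
	READ_NR
};

/* Fill in the two input slots and clear the result slots. */
static void init_read_buf(uint64_t buf[READ_NR], uint64_t cgroup_id)
{
	buf[READ_SIZE] = READ_NR * sizeof(uint64_t);
	buf[READ_ID] = cgroup_id;
	buf[READ_COUNT] = 0;
	buf[READ_ENABLED] = 0;
	buf[READ_RUNNING] = 0;
}
```

On a kernel with this patch applied, the buffers would then be passed
as the third argument of ioctl() on the perf_event file descriptor,
with PERF_EVENT_IOC_ATTACH_CGROUP and PERF_EVENT_IOC_READ_CGROUP
respectively.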

This attaches all cgroups in a single syscall, and I deliberately
didn't add a DETACH command to keep the implementation simple.  The
attached cgroup nodes are deleted when the file descriptor of the
perf_event is closed.

Cc: Tejun Heo <tj@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
namhyung authored and intel-lab-lkp committed Apr 13, 2021
1 parent cface03 commit c604a61fb3cfd58be50992c8284b13e598312794
Showing 3 changed files with 474 additions and 27 deletions.
@@ -771,6 +771,19 @@ struct perf_event {

#ifdef CONFIG_CGROUP_PERF
struct perf_cgroup *cgrp; /* cgroup event is attach to */

/* to share an event for multiple cgroups */
struct hlist_head *cgrp_node_hash;
struct perf_cgroup_node *cgrp_node_entries;
int nr_cgrp_nodes;
int cgrp_node_hash_bits;

struct list_head cgrp_node_entry;

/* snapshot of previous reading (for perf_cgroup_node below) */
u64 cgrp_node_count;
u64 cgrp_node_time_enabled;
u64 cgrp_node_time_running;
#endif

#ifdef CONFIG_SECURITY
@@ -780,6 +793,13 @@ struct perf_event {
#endif /* CONFIG_PERF_EVENTS */
};

struct perf_cgroup_node {
struct hlist_node node;
u64 id;
u64 count;
u64 time_enabled;
u64 time_running;
} ____cacheline_aligned;

struct perf_event_groups {
struct rb_root tree;
@@ -843,6 +863,8 @@ struct perf_event_context {
int pin_count;
#ifdef CONFIG_CGROUP_PERF
int nr_cgroups; /* cgroup evts */
struct list_head cgrp_node_list;
struct list_head cgrp_ctx_entry;
#endif
void *task_ctx_data; /* pmu specific data */
struct rcu_head rcu_head;
@@ -479,6 +479,8 @@ struct perf_event_query_bpf {
#define PERF_EVENT_IOC_PAUSE_OUTPUT _IOW('$', 9, __u32)
#define PERF_EVENT_IOC_QUERY_BPF _IOWR('$', 10, struct perf_event_query_bpf *)
#define PERF_EVENT_IOC_MODIFY_ATTRIBUTES _IOW('$', 11, struct perf_event_attr *)
#define PERF_EVENT_IOC_ATTACH_CGROUP _IOW('$', 12, __u64 *)
#define PERF_EVENT_IOC_READ_CGROUP _IOWR('$', 13, __u64 *)

enum perf_event_ioc_flags {
PERF_IOC_FLAG_GROUP = 1U << 0,
