Skip to content

Commit

Permalink
mm: memcg/slab: obj_cgroup API
Browse files Browse the repository at this point in the history
Obj_cgroup API provides an ability to account sub-page sized kernel
objects, which potentially outlive the original memory cgroup.

The top-level API consists of the following functions:
  bool obj_cgroup_tryget(struct obj_cgroup *objcg);
  void obj_cgroup_get(struct obj_cgroup *objcg);
  void obj_cgroup_put(struct obj_cgroup *objcg);

  int obj_cgroup_charge(struct obj_cgroup *objcg, gfp_t gfp, size_t size);
  void obj_cgroup_uncharge(struct obj_cgroup *objcg, size_t size);

  struct mem_cgroup *obj_cgroup_memcg(struct obj_cgroup *objcg);
  struct obj_cgroup *get_obj_cgroup_from_current(void);

Object cgroup is basically a pointer to a memory cgroup with a per-cpu
reference counter.  It substitutes a memory cgroup in places where it's
necessary to charge a custom amount of bytes instead of pages.

All charged memory rounded down to pages is charged to the corresponding
memory cgroup using __memcg_kmem_charge().

It implements reparenting: on memcg offlining it's getting reattached to
the parent memory cgroup.  Each online memory cgroup has an associated
active object cgroup to handle new allocations and the list of all
attached object cgroups.  On offlining of a cgroup this list is reparented
and for each object cgroup in the list the memcg pointer is swapped to the
parent memory cgroup.  It prevents long-living objects from pinning the
original memory cgroup in the memory.

The implementation is based on byte-sized per-cpu stocks.  A sub-page
sized leftover is stored in an atomic field, which is a part of obj_cgroup
object.  So on cgroup offlining the leftover is automatically reparented.

memcg->objcg is rcu protected.  objcg->memcg is a raw pointer, which is
always pointing at a memory cgroup, but can be atomically swapped to the
parent memory cgroup.  So a user must ensure the lifetime of the
cgroup, e.g.  grab rcu_read_lock or css_set_lock.

Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Roman Gushchin <guro@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Link: http://lkml.kernel.org/r/20200623174037.3951353-7-guro@fb.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  • Loading branch information
rgushchin authored and torvalds committed Aug 7, 2020
1 parent 1a3e1f4 commit bf4f059
Show file tree
Hide file tree
Showing 2 changed files with 338 additions and 1 deletion.
51 changes: 51 additions & 0 deletions include/linux/memcontrol.h
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,7 @@
#include <linux/page-flags.h>

struct mem_cgroup;
struct obj_cgroup;
struct page;
struct mm_struct;
struct kmem_cache;
Expand Down Expand Up @@ -192,6 +193,22 @@ struct memcg_cgwb_frn {
struct wb_completion done; /* tracks in-flight foreign writebacks */
};

/*
* Bucket for arbitrarily byte-sized objects charged to a memory
* cgroup. The bucket can be reparented in one piece when the cgroup
* is destroyed, without having to round up the individual references
* of all live memory objects in the wild.
*/
struct obj_cgroup {
struct percpu_ref refcnt;
struct mem_cgroup *memcg;
atomic_t nr_charged_bytes;
union {
struct list_head list;
struct rcu_head rcu;
};
};

/*
* The memory controller data structure. The memory controller controls both
* page cache and RSS per cgroup. We would eventually like to provide
Expand Down Expand Up @@ -301,6 +318,8 @@ struct mem_cgroup {
int kmemcg_id;
enum memcg_kmem_state kmem_state;
struct list_head kmem_caches;
struct obj_cgroup __rcu *objcg;
struct list_head objcg_list; /* list of inherited objcgs */
#endif

#ifdef CONFIG_CGROUP_WRITEBACK
Expand Down Expand Up @@ -416,6 +435,33 @@ struct mem_cgroup *mem_cgroup_from_css(struct cgroup_subsys_state *css){
return css ? container_of(css, struct mem_cgroup, css) : NULL;
}

static inline bool obj_cgroup_tryget(struct obj_cgroup *objcg)
{
return percpu_ref_tryget(&objcg->refcnt);
}

static inline void obj_cgroup_get(struct obj_cgroup *objcg)
{
percpu_ref_get(&objcg->refcnt);
}

static inline void obj_cgroup_put(struct obj_cgroup *objcg)
{
percpu_ref_put(&objcg->refcnt);
}

/*
* After the initialization objcg->memcg is always pointing at
* a valid memcg, but can be atomically swapped to the parent memcg.
*
* The caller must ensure that the returned memcg won't be released:
* e.g. acquire the rcu_read_lock or css_set_lock.
*/
static inline struct mem_cgroup *obj_cgroup_memcg(struct obj_cgroup *objcg)
{
return READ_ONCE(objcg->memcg);
}

static inline void mem_cgroup_put(struct mem_cgroup *memcg)
{
if (memcg)
Expand Down Expand Up @@ -1368,6 +1414,11 @@ void __memcg_kmem_uncharge(struct mem_cgroup *memcg, unsigned int nr_pages);
int __memcg_kmem_charge_page(struct page *page, gfp_t gfp, int order);
void __memcg_kmem_uncharge_page(struct page *page, int order);

struct obj_cgroup *get_obj_cgroup_from_current(void);

int obj_cgroup_charge(struct obj_cgroup *objcg, gfp_t gfp, size_t size);
void obj_cgroup_uncharge(struct obj_cgroup *objcg, size_t size);

extern struct static_key_false memcg_kmem_enabled_key;
extern struct workqueue_struct *memcg_kmem_cache_wq;

Expand Down
Loading

0 comments on commit bf4f059

Please sign in to comment.