
Commit 00501b5

hnaz authored and torvalds committed
mm: memcontrol: rewrite charge API
These patches rework memcg charge lifetime to integrate more naturally
with the lifetime of user pages.  This drastically simplifies the code and
reduces charging and uncharging overhead.  The most expensive part of
charging and uncharging is the page_cgroup bit spinlock, which is removed
entirely after this series.

Here are the top-10 profile entries of a stress test that reads a 128G
sparse file on a freshly booted box, without even a dedicated cgroup
(i.e. executing in the root memcg).

Before:

    15.36%      cat  [kernel.kallsyms]  [k] copy_user_generic_string
    13.31%      cat  [kernel.kallsyms]  [k] memset
    11.48%      cat  [kernel.kallsyms]  [k] do_mpage_readpage
     4.23%      cat  [kernel.kallsyms]  [k] get_page_from_freelist
     2.38%      cat  [kernel.kallsyms]  [k] put_page
     2.32%      cat  [kernel.kallsyms]  [k] __mem_cgroup_commit_charge
     2.18%  kswapd0  [kernel.kallsyms]  [k] __mem_cgroup_uncharge_common
     1.92%  kswapd0  [kernel.kallsyms]  [k] shrink_page_list
     1.86%      cat  [kernel.kallsyms]  [k] __radix_tree_lookup
     1.62%      cat  [kernel.kallsyms]  [k] __pagevec_lru_add_fn

After:

    15.67%      cat  [kernel.kallsyms]  [k] copy_user_generic_string
    13.48%      cat  [kernel.kallsyms]  [k] memset
    11.42%      cat  [kernel.kallsyms]  [k] do_mpage_readpage
     3.98%      cat  [kernel.kallsyms]  [k] get_page_from_freelist
     2.46%      cat  [kernel.kallsyms]  [k] put_page
     2.13%  kswapd0  [kernel.kallsyms]  [k] shrink_page_list
     1.88%      cat  [kernel.kallsyms]  [k] __radix_tree_lookup
     1.67%      cat  [kernel.kallsyms]  [k] __pagevec_lru_add_fn
     1.39%  kswapd0  [kernel.kallsyms]  [k] free_pcppages_bulk
     1.30%      cat  [kernel.kallsyms]  [k] kfree

As you can see, the memcg footprint has shrunk quite a bit.

    text    data     bss     dec     hex filename
   37970    9892     400   48262    bc86 mm/memcontrol.o.old
   35239    9892     400   45531    b1db mm/memcontrol.o

This patch (of 4):

The memcg charge API charges pages before they are rmapped - i.e. have an
actual "type" - and so every callsite needs its own set of charge and
uncharge functions to know what type is being operated on.  Worse,
uncharge has to happen from a context that is still type-specific, rather
than at the end of the page's lifetime with exclusive access, and so
requires a lot of synchronization.

Rewrite the charge API to provide a generic set of try_charge(),
commit_charge() and cancel_charge() transaction operations, much like
what's currently done for swap-in:

    mem_cgroup_try_charge() attempts to reserve a charge, reclaiming
    pages from the memcg if necessary.

    mem_cgroup_commit_charge() commits the page to the charge once it
    has a valid page->mapping and PageAnon() reliably tells the type.

    mem_cgroup_cancel_charge() aborts the transaction.

This reduces the charge API and enables subsequent patches to drastically
simplify uncharging.

As pages need to be committed after rmap is established but before they
are added to the LRU, page_add_new_anon_rmap() must stop doing LRU
additions again.  Revive lru_cache_add_active_or_unevictable().

[hughd@google.com: fix shmem_unuse]
[hughd@google.com: Add comments on the private use of -EAGAIN]
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vladimir Davydov <vdavydov@parallels.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
1 parent 4449a51 commit 00501b5
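As a reading aid (not part of the commit), here is a minimal sketch of the
call sequence the new transaction API implies for a typical anonymous-fault
path, mirroring the uprobes and filemap hunks below. The identifiers page,
vma, mm and address stand in for whatever the real caller has at hand, and
pte_setup_failed is a placeholder for whatever failure check the caller
actually performs:

    struct mem_cgroup *memcg;
    int error;

    /* 1) Reserve the charge; this may reclaim from the memcg under pressure. */
    error = mem_cgroup_try_charge(page, mm, GFP_KERNEL, &memcg);
    if (error)
        return error;

    /* ... install the PTE; on failure, drop the reservation again ... */
    if (pte_setup_failed)
        goto cancel;

    /* 2) Establish rmap, then bind the reserved charge to the page. */
    page_add_new_anon_rmap(page, vma, address);
    mem_cgroup_commit_charge(page, memcg, false);

    /* 3) Only now put the page on the LRU (hence the revived helper). */
    lru_cache_add_active_or_unevictable(page, vma);
    return 0;

cancel:
    mem_cgroup_cancel_charge(page, memcg);
    return error;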

File tree

12 files changed: +338 additions, -395 deletions


Documentation/cgroups/memcg_test.txt

Lines changed: 5 additions & 27 deletions
@@ -24,24 +24,7 @@ Please note that implementation details can be changed.
 
 	a page/swp_entry may be charged (usage += PAGE_SIZE) at
 
-	mem_cgroup_charge_anon()
-	  Called at new page fault and Copy-On-Write.
-
-	mem_cgroup_try_charge_swapin()
-	  Called at do_swap_page() (page fault on swap entry) and swapoff.
-	  Followed by charge-commit-cancel protocol. (With swap accounting)
-	  At commit, a charge recorded in swap_cgroup is removed.
-
-	mem_cgroup_charge_file()
-	  Called at add_to_page_cache()
-
-	mem_cgroup_cache_charge_swapin()
-	  Called at shmem's swapin.
-
-	mem_cgroup_prepare_migration()
-	  Called before migration. "extra" charge is done and followed by
-	  charge-commit-cancel protocol.
-	  At commit, charge against oldpage or newpage will be committed.
+	mem_cgroup_try_charge()
 
 2. Uncharge
 	a page/swp_entry may be uncharged (usage -= PAGE_SIZE) by
@@ -69,19 +52,14 @@ Please note that implementation details can be changed.
 	to new page is committed. At failure, charge to old page is committed.
 
 3. charge-commit-cancel
-	In some case, we can't know this "charge" is valid or not at charging
-	(because of races).
-	To handle such case, there are charge-commit-cancel functions.
-		mem_cgroup_try_charge_XXX
-		mem_cgroup_commit_charge_XXX
-		mem_cgroup_cancel_charge_XXX
-	these are used in swap-in and migration.
+	Memcg pages are charged in two steps:
+		mem_cgroup_try_charge()
+		mem_cgroup_commit_charge() or mem_cgroup_cancel_charge()
 
 	At try_charge(), there are no flags to say "this page is charged".
 	at this point, usage += PAGE_SIZE.
 
-	At commit(), the function checks the page should be charged or not
-	and set flags or avoid charging.(usage -= PAGE_SIZE)
+	At commit(), the page is associated with the memcg.
 
 	At cancel(), simply usage -= PAGE_SIZE.
 

include/linux/memcontrol.h

Lines changed: 14 additions & 39 deletions
@@ -54,28 +54,11 @@ struct mem_cgroup_reclaim_cookie {
 };
 
 #ifdef CONFIG_MEMCG
-/*
- * All "charge" functions with gfp_mask should use GFP_KERNEL or
- * (gfp_mask & GFP_RECLAIM_MASK). In current implementatin, memcg doesn't
- * alloc memory but reclaims memory from all available zones. So, "where I want
- * memory from" bits of gfp_mask has no meaning. So any bits of that field is
- * available but adding a rule is better. charge functions' gfp_mask should
- * be set to GFP_KERNEL or gfp_mask & GFP_RECLAIM_MASK for avoiding ambiguous
- * codes.
- * (Of course, if memcg does memory allocation in future, GFP_KERNEL is sane.)
- */
-
-extern int mem_cgroup_charge_anon(struct page *page, struct mm_struct *mm,
-				gfp_t gfp_mask);
-/* for swap handling */
-extern int mem_cgroup_try_charge_swapin(struct mm_struct *mm,
-		struct page *page, gfp_t mask, struct mem_cgroup **memcgp);
-extern void mem_cgroup_commit_charge_swapin(struct page *page,
-					struct mem_cgroup *memcg);
-extern void mem_cgroup_cancel_charge_swapin(struct mem_cgroup *memcg);
-
-extern int mem_cgroup_charge_file(struct page *page, struct mm_struct *mm,
-				gfp_t gfp_mask);
+int mem_cgroup_try_charge(struct page *page, struct mm_struct *mm,
+			  gfp_t gfp_mask, struct mem_cgroup **memcgp);
+void mem_cgroup_commit_charge(struct page *page, struct mem_cgroup *memcg,
+			      bool lrucare);
+void mem_cgroup_cancel_charge(struct page *page, struct mem_cgroup *memcg);
 
 struct lruvec *mem_cgroup_zone_lruvec(struct zone *, struct mem_cgroup *);
 struct lruvec *mem_cgroup_page_lruvec(struct page *, struct zone *);
@@ -233,30 +216,22 @@ void mem_cgroup_print_bad_page(struct page *page);
 #else /* CONFIG_MEMCG */
 struct mem_cgroup;
 
-static inline int mem_cgroup_charge_anon(struct page *page,
-					struct mm_struct *mm, gfp_t gfp_mask)
-{
-	return 0;
-}
-
-static inline int mem_cgroup_charge_file(struct page *page,
-					struct mm_struct *mm, gfp_t gfp_mask)
-{
-	return 0;
-}
-
-static inline int mem_cgroup_try_charge_swapin(struct mm_struct *mm,
-		struct page *page, gfp_t gfp_mask, struct mem_cgroup **memcgp)
+static inline int mem_cgroup_try_charge(struct page *page, struct mm_struct *mm,
+					gfp_t gfp_mask,
+					struct mem_cgroup **memcgp)
 {
+	*memcgp = NULL;
 	return 0;
 }
 
-static inline void mem_cgroup_commit_charge_swapin(struct page *page,
-					struct mem_cgroup *memcg)
+static inline void mem_cgroup_commit_charge(struct page *page,
+					    struct mem_cgroup *memcg,
+					    bool lrucare)
 {
 }
 
-static inline void mem_cgroup_cancel_charge_swapin(struct mem_cgroup *memcg)
+static inline void mem_cgroup_cancel_charge(struct page *page,
+					    struct mem_cgroup *memcg)
 {
 }
 

include/linux/swap.h

Lines changed: 3 additions & 0 deletions
@@ -320,6 +320,9 @@ extern void swap_setup(void);
 
 extern void add_page_to_unevictable_list(struct page *page);
 
+extern void lru_cache_add_active_or_unevictable(struct page *page,
+						struct vm_area_struct *vma);
+
 /* linux/mm/vmscan.c */
 extern unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
 					gfp_t gfp_mask, nodemask_t *mask);
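The declaration above revives lru_cache_add_active_or_unevictable(), which
takes over the LRU addition that page_add_new_anon_rmap() stops doing. Its
body lives in mm/swap.c, which is not part of this excerpt; the following is
a hedged sketch of the expected behaviour rather than the exact code of this
commit: a freshly faulted page goes straight to the active LRU unless the
VMA is mlocked, in which case it is accounted as mlocked and placed on the
unevictable list.

    /* Sketch only; mm/swap.c is not shown in this diff. */
    void lru_cache_add_active_or_unevictable(struct page *page,
                                             struct vm_area_struct *vma)
    {
        VM_BUG_ON_PAGE(PageLRU(page), page);

        if (likely((vma->vm_flags & (VM_LOCKED | VM_SPECIAL)) != VM_LOCKED)) {
            /* Common case: new page goes straight to the active LRU. */
            SetPageActive(page);
            lru_cache_add(page);
            return;
        }

        /* mlocked VMA: account the page and keep it off the normal LRUs. */
        if (!TestSetPageMlocked(page)) {
            mod_zone_page_state(page_zone(page), NR_MLOCK,
                                hpage_nr_pages(page));
            count_vm_event(UNEVICTABLE_PGMLOCKED);
        }
        add_page_to_unevictable_list(page);
    }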

kernel/events/uprobes.c

Lines changed: 8 additions & 7 deletions
@@ -167,6 +167,11 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
 	/* For mmu_notifiers */
 	const unsigned long mmun_start = addr;
 	const unsigned long mmun_end   = addr + PAGE_SIZE;
+	struct mem_cgroup *memcg;
+
+	err = mem_cgroup_try_charge(kpage, vma->vm_mm, GFP_KERNEL, &memcg);
+	if (err)
+		return err;
 
 	/* For try_to_free_swap() and munlock_vma_page() below */
 	lock_page(page);
@@ -179,6 +184,8 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
 
 	get_page(kpage);
 	page_add_new_anon_rmap(kpage, vma, addr);
+	mem_cgroup_commit_charge(kpage, memcg, false);
+	lru_cache_add_active_or_unevictable(kpage, vma);
 
 	if (!PageAnon(page)) {
 		dec_mm_counter(mm, MM_FILEPAGES);
@@ -200,6 +207,7 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
 
 	err = 0;
  unlock:
+	mem_cgroup_cancel_charge(kpage, memcg);
 	mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);
 	unlock_page(page);
 	return err;
@@ -315,18 +323,11 @@ int uprobe_write_opcode(struct mm_struct *mm, unsigned long vaddr,
 	if (!new_page)
 		goto put_old;
 
-	if (mem_cgroup_charge_anon(new_page, mm, GFP_KERNEL))
-		goto put_new;
-
 	__SetPageUptodate(new_page);
 	copy_highpage(new_page, old_page);
 	copy_to_page(new_page, vaddr, &opcode, UPROBE_SWBP_INSN_SIZE);
 
 	ret = __replace_page(vma, vaddr, old_page, new_page);
-	if (ret)
-		mem_cgroup_uncharge_page(new_page);
-
-put_new:
 	page_cache_release(new_page);
 put_old:
 	put_page(old_page);

mm/filemap.c

Lines changed: 15 additions & 6 deletions
@@ -31,6 +31,7 @@
 #include <linux/security.h>
 #include <linux/cpuset.h>
 #include <linux/hardirq.h> /* for BUG_ON(!in_atomic()) only */
+#include <linux/hugetlb.h>
 #include <linux/memcontrol.h>
 #include <linux/cleancache.h>
 #include <linux/rmap.h>
@@ -548,19 +549,24 @@ static int __add_to_page_cache_locked(struct page *page,
 				      pgoff_t offset, gfp_t gfp_mask,
 				      void **shadowp)
 {
+	int huge = PageHuge(page);
+	struct mem_cgroup *memcg;
 	int error;
 
 	VM_BUG_ON_PAGE(!PageLocked(page), page);
 	VM_BUG_ON_PAGE(PageSwapBacked(page), page);
 
-	error = mem_cgroup_charge_file(page, current->mm,
-					gfp_mask & GFP_RECLAIM_MASK);
-	if (error)
-		return error;
+	if (!huge) {
+		error = mem_cgroup_try_charge(page, current->mm,
+					      gfp_mask, &memcg);
+		if (error)
+			return error;
+	}
 
 	error = radix_tree_maybe_preload(gfp_mask & ~__GFP_HIGHMEM);
 	if (error) {
-		mem_cgroup_uncharge_cache_page(page);
+		if (!huge)
+			mem_cgroup_cancel_charge(page, memcg);
 		return error;
 	}
 
@@ -575,13 +581,16 @@ static int __add_to_page_cache_locked(struct page *page,
 		goto err_insert;
 	__inc_zone_page_state(page, NR_FILE_PAGES);
 	spin_unlock_irq(&mapping->tree_lock);
+	if (!huge)
+		mem_cgroup_commit_charge(page, memcg, false);
 	trace_mm_filemap_add_to_page_cache(page);
 	return 0;
 err_insert:
 	page->mapping = NULL;
 	/* Leave page->index set: truncation relies upon it */
 	spin_unlock_irq(&mapping->tree_lock);
-	mem_cgroup_uncharge_cache_page(page);
+	if (!huge)
+		mem_cgroup_cancel_charge(page, memcg);
 	page_cache_release(page);
 	return error;
 }
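All commit_charge() calls in the hunks above pass lrucare == false because
the page is committed before it ever reaches the LRU. The commit message
models the API on the existing swap-in transaction, where the page may
already sit on the LRU as swapcache; in that case the commit is expected to
pass lrucare == true so the charge can be bound to a page that is already
visible to reclaim. A hedged sketch of such a swap-in style caller follows;
the real call site is in mm/memory.c, which is not part of this excerpt, and
swapcache, exclusive, ret and out_page are placeholders for the fault
handler's own state rather than names taken from this diff:

    struct mem_cgroup *memcg;

    if (mem_cgroup_try_charge(page, mm, GFP_KERNEL, &memcg)) {
        ret = VM_FAULT_OOM;
        goto out_page;
    }

    /* ... re-check the PTE under its lock, handle races ... */

    if (swapcache) {
        /* Page is already on the LRU as swapcache: commit with lrucare. */
        do_page_add_anon_rmap(page, vma, address, exclusive);
        mem_cgroup_commit_charge(page, memcg, true);
    } else {
        /* Fresh copy: rmap, commit, then LRU, as in the hunks above. */
        page_add_new_anon_rmap(page, vma, address);
        mem_cgroup_commit_charge(page, memcg, false);
        lru_cache_add_active_or_unevictable(page, vma);
    }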
