Commit 1bc542c

linuszeng authored and akpm00 committed
mm/vmscan: wake up flushers conditionally to avoid cgroup OOM
Commit 14aa8b2 ("mm/mglru: don't sync disk for each aging cycle") removed the opportunity to wake up flushers during MGLRU page reclamation, which increases the likelihood of triggering OOM when reclaim encounters many dirty pages. This leads to premature OOM if there are too many dirty pages in a cgroup:

  Killed
  dd invoked oom-killer: gfp_mask=0x101cca(GFP_HIGHUSER_MOVABLE|__GFP_WRITE), order=0, oom_score_adj=0
  Call Trace:
    <TASK>
    dump_stack_lvl+0x5f/0x80
    dump_stack+0x14/0x20
    dump_header+0x46/0x1b0
    oom_kill_process+0x104/0x220
    out_of_memory+0x112/0x5a0
    mem_cgroup_out_of_memory+0x13b/0x150
    try_charge_memcg+0x44f/0x5c0
    charge_memcg+0x34/0x50
    __mem_cgroup_charge+0x31/0x90
    filemap_add_folio+0x4b/0xf0
    __filemap_get_folio+0x1a4/0x5b0
    ? srso_return_thunk+0x5/0x5f
    ? __block_commit_write+0x82/0xb0
    ext4_da_write_begin+0xe5/0x270
    generic_perform_write+0x134/0x2b0
    ext4_buffered_write_iter+0x57/0xd0
    ext4_file_write_iter+0x76/0x7d0
    ? selinux_file_permission+0x119/0x150
    ? srso_return_thunk+0x5/0x5f
    ? srso_return_thunk+0x5/0x5f
    vfs_write+0x30c/0x440
    ksys_write+0x65/0xe0
    __x64_sys_write+0x1e/0x30
    x64_sys_call+0x11c2/0x1d50
    do_syscall_64+0x47/0x110
    entry_SYSCALL_64_after_hwframe+0x76/0x7e
  memory: usage 308224kB, limit 308224kB, failcnt 2589
  swap: usage 0kB, limit 9007199254740988kB, failcnt 0
  ...
  file_dirty 303247360
  file_writeback 0
  ...
  oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=test, mems_allowed=0,oom_memcg=/test,task_memcg=/test,task=dd,pid=4404,uid=0
  Memory cgroup out of memory: Killed process 4404 (dd) total-vm:10512kB, anon-rss:1152kB, file-rss:1824kB, shmem-rss:0kB, UID:0 pgtables:76kB oom_score_adj:0

The flusher wakeup was removed to reduce SSD wear, but if every folio seen at the tail of an LRU is dirty, not waking up the flusher can easily lead to thrashing. So wake it up when a memcg is about to OOM due to dirty caches.
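To make the failure mode concrete, here is a toy model in C. The `toy_folio` type and both helpers are hypothetical and only illustrate the reasoning above; they are not kernel code. If every folio at the LRU tail is dirty and nothing has started writeback, reclaim finds nothing it can free immediately, so the charge fails and the memcg OOMs. Once a flusher has written those folios back, they become reclaimable.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy folio: only the two flags that matter here. */
struct toy_folio {
	bool dirty;
	bool writeback;
};

/* Count folios reclaim could free right now: clean and not under writeback. */
size_t toy_reclaimable(const struct toy_folio *tail, size_t n)
{
	size_t freeable = 0;

	for (size_t i = 0; i < n; i++)
		if (!tail[i].dirty && !tail[i].writeback)
			freeable++;
	return freeable;
}

/*
 * Toy flusher: pretend writeback for every folio has started and
 * completed, leaving all of them clean.
 */
void toy_flush(struct toy_folio *tail, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		tail[i].dirty = false;
		tail[i].writeback = false;
	}
}
```

The toy flusher completes writeback instantly; in the kernel, writeback is asynchronous and folios pass through the writeback state before they become clean and freeable.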
I ran the kernel build test [1] on V6, with -j16 in a 1G memcg, on my local branch:

  Without the patch (10 runs):
    user   1449.394
    system 368.78 372.58 363.03 362.31 360.84 372.70 368.72 364.94 373.51 366.58 (avg 367.399)
    real   164.883

  With the V6 patch (10 runs):
    user   1447.525
    system 360.87 360.63 372.39 364.09 368.49 365.15 359.93 362.04 359.72 354.60 (avg 362.79)
    real   164.514

The results show about a 1% reduction in system time, which is most likely noise.

Link: https://lkml.kernel.org/r/20241026115714.1437435-1-jingxiangzeng.cas@gmail.com
Link: https://lore.kernel.org/all/CACePvbV4L-gRN9UKKuUnksfVJjOTq_5Sti2-e=pb_w51kucLKQ@mail.gmail.com/ [1]
Fixes: 14aa8b2 ("mm/mglru: don't sync disk for each aging cycle")
Suggested-by: Wei Xu <weixugc@google.com>
Signed-off-by: Zeng Jingxiang <linuszeng@tencent.com>
Signed-off-by: Kairui Song <kasong@tencent.com>
Reviewed-by: Wei Xu <weixugc@google.com>
Tested-by: Chris Li <chrisl@kernel.org>
Cc: T.J. Mercier <tjmercier@google.com>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
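The arithmetic behind the "about 1%" figure can be checked with a small C sketch (the helper names are illustrative). Using the ten system-time samples listed above, the means are 367.399 and 362.791, a reduction of roughly 1.25%. The run-to-run spread without the patch (360.84 to 373.51) is wider than the 4.6 gap between the means, which is consistent with reading the difference as noise.

```c
#include <stddef.h>

/* Arithmetic mean of n samples. */
double mean(const double *v, size_t n)
{
	double sum = 0.0;

	for (size_t i = 0; i < n; i++)
		sum += v[i];
	return sum / (double)n;
}

/* Relative reduction from `before` to `after`, in percent. */
double pct_reduction(double before, double after)
{
	return (before - after) / before * 100.0;
}
```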
1 parent 33d7f15 commit 1bc542c


1 file changed (+22, -3 lines)


mm/vmscan.c

Lines changed: 22 additions & 3 deletions
@@ -4284,6 +4284,7 @@ static bool sort_folio(struct lruvec *lruvec, struct folio *folio, struct scan_c
 		       int tier_idx)
 {
 	bool success;
+	bool dirty, writeback;
 	int gen = folio_lru_gen(folio);
 	int type = folio_is_file_lru(folio);
 	int zone = folio_zonenum(folio);
@@ -4329,9 +4330,17 @@ static bool sort_folio(struct lruvec *lruvec, struct folio *folio, struct scan_c
 		return true;
 	}
 
+	dirty = folio_test_dirty(folio);
+	writeback = folio_test_writeback(folio);
+	if (type == LRU_GEN_FILE && dirty) {
+		sc->nr.file_taken += delta;
+		if (!writeback)
+			sc->nr.unqueued_dirty += delta;
+	}
+
 	/* waiting for writeback */
-	if (folio_test_locked(folio) || folio_test_writeback(folio) ||
-	    (type == LRU_GEN_FILE && folio_test_dirty(folio))) {
+	if (folio_test_locked(folio) || writeback ||
+	    (type == LRU_GEN_FILE && dirty)) {
 		gen = folio_inc_gen(lruvec, folio, true);
 		list_move(&folio->lru, &lrugen->folios[gen][type][zone]);
 		return true;
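The accounting added to sort_folio() can be restated as a standalone C function. This is a sketch, not the kernel code: `struct reclaim_counts` is a hypothetical stand-in for the two `sc->nr` fields, the folio state is passed in as plain booleans, and `delta` stands for the variable sort_folio() declares outside this hunk, assumed here to be the folio's page count. The sketch also ignores the earlier checks in sort_folio() that can return before this point. A dirty file folio is counted in `file_taken`; it is also counted in `unqueued_dirty` only if writeback has not started. The return value mirrors the "waiting for writeback" branch, which moves the folio to the next generation instead of isolating it.

```c
#include <stdbool.h>

/* Hypothetical stand-in for sc->nr.file_taken and sc->nr.unqueued_dirty. */
struct reclaim_counts {
	unsigned long file_taken;
	unsigned long unqueued_dirty;
};

/*
 * Sketch of the accounting added to sort_folio(). Returns true when the
 * folio would be moved to the next generation ("waiting for writeback")
 * rather than isolated for eviction.
 */
bool sort_folio_account(struct reclaim_counts *nr, bool is_file,
			bool locked, bool dirty, bool writeback,
			unsigned long delta)
{
	if (is_file && dirty) {
		/* Dirty file folios count toward the file pages taken... */
		nr->file_taken += delta;
		/* ...and as unqueued dirty if writeback has not started. */
		if (!writeback)
			nr->unqueued_dirty += delta;
	}
	return locked || writeback || (is_file && dirty);
}
```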
@@ -4447,7 +4456,8 @@ static int scan_folios(struct lruvec *lruvec, struct scan_control *sc,
 	trace_mm_vmscan_lru_isolate(sc->reclaim_idx, sc->order, MAX_LRU_BATCH,
 				scanned, skipped, isolated,
 				type ? LRU_INACTIVE_FILE : LRU_INACTIVE_ANON);
-
+	if (type == LRU_GEN_FILE)
+		sc->nr.file_taken += isolated;
 	/*
 	 * There might not be eligible folios due to reclaim_idx. Check the
 	 * remaining to prevent livelock if it's not making progress.
@@ -4581,6 +4591,7 @@ static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swap
 		return scanned;
 retry:
 	reclaimed = shrink_folio_list(&list, pgdat, sc, &stat, false);
+	sc->nr.unqueued_dirty += stat.nr_unqueued_dirty;
 	sc->nr_reclaimed += reclaimed;
 	trace_mm_vmscan_lru_shrink_inactive(pgdat->node_id,
 			scanned, reclaimed, &stat, sc->priority,
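The scan_folios() and evict_folios() hunks feed the same counters from the isolation path: scan_folios() adds every isolated file folio to `file_taken`, and evict_folios() adds the `nr_unqueued_dirty` count that shrink_folio_list() reports. A hedged C sketch of that bookkeeping follows; the struct and function names are hypothetical, and the inputs stand in for `isolated` and `stat.nr_unqueued_dirty`.

```c
#include <stdbool.h>

/* Hypothetical stand-in for sc->nr.file_taken and sc->nr.unqueued_dirty. */
struct reclaim_counts {
	unsigned long file_taken;
	unsigned long unqueued_dirty;
};

/* scan_folios(): every isolated file folio joins the denominator. */
void account_isolated(struct reclaim_counts *nr, bool is_file,
		      unsigned long isolated)
{
	if (is_file)
		nr->file_taken += isolated;
}

/*
 * evict_folios(): isolated folios that shrink_folio_list() found dirty
 * but not yet under writeback join the numerator.
 */
void account_shrunk(struct reclaim_counts *nr,
		    unsigned long stat_nr_unqueued_dirty)
{
	nr->unqueued_dirty += stat_nr_unqueued_dirty;
}
```

Because isolated clean file folios raise `file_taken` without raising `unqueued_dirty`, a single freeable file folio in the pass is enough to keep the two counts unequal.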
@@ -4789,6 +4800,13 @@ static bool try_to_shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
 		cond_resched();
 	}
 
+	/*
+	 * If too many file cache in the coldest generation can't be evicted
+	 * due to being dirty, wake up the flusher.
+	 */
+	if (sc->nr.unqueued_dirty && sc->nr.unqueued_dirty == sc->nr.file_taken)
+		wakeup_flusher_threads(WB_REASON_VMSCAN);
+
 	/* whether this lruvec should be rotated */
 	return nr_to_scan < 0;
 }
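The wakeup condition in try_to_shrink_lruvec() reduces to a small predicate. The sketch below restates it in standalone C (the function name is hypothetical): flushers are woken only when at least one dirty, not-yet-queued file folio was seen and every file folio taken was of that kind.

```c
#include <stdbool.h>

/*
 * Restatement of the check added to try_to_shrink_lruvec(): wake the
 * flushers only if some file folios were dirty and not yet queued for
 * writeback, and all file folios taken were in that state.
 */
bool should_wake_flushers(unsigned long unqueued_dirty,
			  unsigned long file_taken)
{
	return unqueued_dirty != 0 && unqueued_dirty == file_taken;
}
```

The nonzero test matters: without it, a pass that took no file folios at all (0 == 0) would wake the flushers for nothing.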
@@ -5934,6 +5952,7 @@ static void shrink_node(pg_data_t *pgdat, struct scan_control *sc)
 	bool reclaimable = false;
 
 	if (lru_gen_enabled() && root_reclaim(sc)) {
+		memset(&sc->nr, 0, sizeof(sc->nr));
 		lru_gen_shrink_node(pgdat, sc);
 		return;
 	}
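The memset added to the MGLRU branch of shrink_node() clears `sc->nr` before each root-reclaim pass, so the equality test in try_to_shrink_lruvec() does not see counts left over from an earlier pass. The toy C sketch below (hypothetical names, with one call standing in for one pass) shows how stale counts could otherwise hide an all-dirty pass: a clean earlier pass leaves `file_taken` larger than `unqueued_dirty`, and the running totals never match again.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the two sc->nr counters. */
struct reclaim_counts {
	unsigned long file_taken;
	unsigned long unqueued_dirty;
};

/*
 * One toy reclaim pass: optionally reset the counters (as the added
 * memset does), accumulate this pass's numbers, then apply the wakeup
 * test from try_to_shrink_lruvec().
 */
bool toy_pass(struct reclaim_counts *nr, bool reset,
	      unsigned long taken, unsigned long unqueued_dirty)
{
	if (reset)
		memset(nr, 0, sizeof(*nr));
	nr->file_taken += taken;
	nr->unqueued_dirty += unqueued_dirty;
	return nr->unqueued_dirty != 0 &&
	       nr->unqueued_dirty == nr->file_taken;
}
```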
