Commit 6446a51

a-darwish authored and Peter Zijlstra committed
mm/swap: Do not abuse the seqcount_t latching API
Commit eef1a42 ("mm/swap.c: piggyback lru_add_drain_all() calls") implemented an optimization mechanism to exit the to-be-started LRU drain operation (name it A) if another drain operation *started and finished* while (A) was blocked on the LRU draining mutex.

This was done through a seqcount_t latch, which is an abuse of its semantics:

  1. seqcount_t latching should be used for the purpose of switching between two storage places with sequence protection to allow interruptible, preemptible, writer sections. The referenced optimization mechanism has absolutely nothing to do with that.

  2. The used raw_write_seqcount_latch() has two SMP write memory barriers to ensure one consistent storage place out of the two storage places available. A full memory barrier is required instead: to guarantee that the pagevec counter stores visible by the local CPU are visible to other CPUs -- before loading the current drain generation.

Beside the seqcount_t API abuse, the semantics of a latch sequence counter were force-fitted into the referenced optimization. What was meant is to track "generations" of LRU draining operations, where "global lru draining generation = x" implies that all generations 0 < n <= x are already *scheduled* for draining -- thus nothing needs to be done if the current generation number n <= x.

Remove the conceptually inappropriate seqcount_t latch usage. Manually implement the referenced optimization using a counter and SMP memory barriers.

Note, while at it, use the non-atomic variant of cpumask_set_cpu(), __cpumask_set_cpu(), due to the already existing mutex protection.

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/87y2pg9erj.fsf@vostro.fn.ogness.net
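Point 1 above is easiest to see against what latching is actually for. Below is a minimal user-space sketch of the two-storage-place latch idiom, with C11 atomics standing in for the kernel's barriers; all identifiers (struct latch, latch_write, latch_read) are illustrative, not kernel API:

```c
#include <stdatomic.h>

/* Illustrative model of seqcount_t latching: two copies of the data,
 * and a sequence counter whose low bit selects the copy that is safe
 * to read while the other one is being updated. */
struct latch {
        atomic_uint seq;        /* low bit selects the readable copy */
        int data[2];            /* the two storage places */
};

static void latch_write(struct latch *l, int val)
{
        /* Each increment flips seq's low bit, steering readers away
         * from the copy that is about to be modified. */
        atomic_fetch_add_explicit(&l->seq, 1, memory_order_release);
        l->data[0] = val;       /* seq odd: readers use data[1] */
        atomic_fetch_add_explicit(&l->seq, 1, memory_order_release);
        l->data[1] = val;       /* seq even: readers use data[0] */
}

static int latch_read(struct latch *l)
{
        unsigned int seq;
        int val;

        do {
                seq = atomic_load_explicit(&l->seq, memory_order_acquire);
                val = l->data[seq & 1];
                atomic_thread_fence(memory_order_acquire);
        } while (seq != atomic_load_explicit(&l->seq, memory_order_relaxed));

        return val;
}
```

The writer never blocks readers and readers never block the writer, which is the whole point of the latch. The piggyback optimization removed by this patch has no second storage place at all, which is why the latch API was a poor fit for it.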
1 parent 58faf20 commit 6446a51

File tree

1 file changed: +54, -11 lines


mm/swap.c

Lines changed: 54 additions & 11 deletions
@@ -763,10 +763,20 @@ static void lru_add_drain_per_cpu(struct work_struct *dummy)
  */
 void lru_add_drain_all(void)
 {
-	static seqcount_t seqcount = SEQCNT_ZERO(seqcount);
-	static DEFINE_MUTEX(lock);
+	/*
+	 * lru_drain_gen - Global pages generation number
+	 *
+	 * (A) Definition: global lru_drain_gen = x implies that all generations
+	 *     0 < n <= x are already *scheduled* for draining.
+	 *
+	 * This is an optimization for the highly-contended use case where a
+	 * user space workload keeps constantly generating a flow of pages for
+	 * each CPU.
+	 */
+	static unsigned int lru_drain_gen;
 	static struct cpumask has_work;
-	int cpu, seq;
+	static DEFINE_MUTEX(lock);
+	unsigned cpu, this_gen;
 
 	/*
 	 * Make sure nobody triggers this path before mm_percpu_wq is fully
@@ -775,21 +785,54 @@ void lru_add_drain_all(void)
 	if (WARN_ON(!mm_percpu_wq))
 		return;
 
-	seq = raw_read_seqcount_latch(&seqcount);
+	/*
+	 * Guarantee pagevec counter stores visible by this CPU are visible to
+	 * other CPUs before loading the current drain generation.
+	 */
+	smp_mb();
+
+	/*
+	 * (B) Locally cache global LRU draining generation number
+	 *
+	 * The read barrier ensures that the counter is loaded before the mutex
+	 * is taken. It pairs with smp_mb() inside the mutex critical section
+	 * at (D).
+	 */
+	this_gen = smp_load_acquire(&lru_drain_gen);
 
 	mutex_lock(&lock);
 
 	/*
-	 * Piggyback on drain started and finished while we waited for lock:
-	 * all pages pended at the time of our enter were drained from vectors.
+	 * (C) Exit the draining operation if a newer generation, from another
+	 * lru_add_drain_all(), was already scheduled for draining. Check (A).
 	 */
-	if (__read_seqcount_retry(&seqcount, seq))
+	if (unlikely(this_gen != lru_drain_gen))
 		goto done;
 
-	raw_write_seqcount_latch(&seqcount);
+	/*
+	 * (D) Increment global generation number
+	 *
+	 * Pairs with smp_load_acquire() at (B), outside of the critical
+	 * section. Use a full memory barrier to guarantee that the new global
+	 * drain generation number is stored before loading pagevec counters.
+	 *
+	 * This pairing must be done here, before the for_each_online_cpu loop
+	 * below which drains the page vectors.
+	 *
+	 * Let x, y, and z represent some system CPU numbers, where x < y < z.
+	 * Assume CPU #z is in the middle of the for_each_online_cpu loop
+	 * below and has already reached CPU #y's per-cpu data. CPU #x comes
+	 * along, adds some pages to its per-cpu vectors, then calls
+	 * lru_add_drain_all().
+	 *
+	 * If the paired barrier is done at any later step, e.g. after the
+	 * loop, CPU #x will just exit at (C) and miss flushing out all of its
+	 * added pages.
+	 */
+	WRITE_ONCE(lru_drain_gen, lru_drain_gen + 1);
+	smp_mb();
 
 	cpumask_clear(&has_work);
-
 	for_each_online_cpu(cpu) {
 		struct work_struct *work = &per_cpu(lru_add_drain_work, cpu);
 
@@ -801,7 +844,7 @@ void lru_add_drain_all(void)
 		    need_activate_page_drain(cpu)) {
 			INIT_WORK(work, lru_add_drain_per_cpu);
 			queue_work_on(cpu, mm_percpu_wq, work);
-			cpumask_set_cpu(cpu, &has_work);
+			__cpumask_set_cpu(cpu, &has_work);
 		}
 	}
 
@@ -816,7 +859,7 @@ void lru_add_drain_all(void)
 {
 	lru_add_drain();
 }
-#endif
+#endif /* CONFIG_SMP */
 
 /**
  * release_pages - batched put_page()
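The generation scheme the patch introduces can be modeled outside the kernel. A hedged sketch, with C11 atomics and a pthread mutex in place of smp_mb()/smp_load_acquire() and the LRU mutex; the names drain_gen, drain_all() and drains_performed are hypothetical:

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_uint drain_gen;                        /* global generation */
static pthread_mutex_t drain_lock = PTHREAD_MUTEX_INITIALIZER;
static int drains_performed;                         /* work actually done */

/* Returns true if this call performed a drain, false if it piggybacked
 * on a newer generation scheduled while it waited for the mutex. */
static bool drain_all(void)
{
        /* (B) snapshot the generation before taking the mutex */
        unsigned int this_gen =
                atomic_load_explicit(&drain_gen, memory_order_acquire);

        pthread_mutex_lock(&drain_lock);

        /* (C) a newer generation was scheduled while we slept on the
         * mutex: everything pended at our entry is already covered. */
        if (this_gen != atomic_load_explicit(&drain_gen,
                                             memory_order_relaxed)) {
                pthread_mutex_unlock(&drain_lock);
                return false;
        }

        /* (D) publish the new generation, then do the actual draining */
        atomic_fetch_add_explicit(&drain_gen, 1, memory_order_release);
        drains_performed++;           /* stands in for queueing the work */

        pthread_mutex_unlock(&drain_lock);
        return true;
}
```

A caller that snapshots this_gen and then sleeps on the mutex while another caller increments drain_gen returns false at (C): every page pended at its entry was already covered by the newer scheduled generation. Unlike the latch, this needs only one counter and ordinary full/acquire barriers.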
