
Commit 1da190f

davidhildenbrand authored and akpm00 committed
mm: Copy-on-Write (COW) reuse support for PTE-mapped THP
Currently, we never end up reusing PTE-mapped THPs after fork. This wasn't really a problem with PMD-sized THPs, because they would have to be PTE-mapped first, but it's becoming a problem with smaller THP sizes, which are effectively always PTE-mapped.

With our new "mapped exclusively" vs. "maybe mapped shared" logic for large folios, implementing CoW reuse for PTE-mapped THPs is straightforward: if the folio is mapped exclusively, make sure that all references are from these (our) mappings. Add some helpful comments to explain the details.

CONFIG_TRANSPARENT_HUGEPAGE selects CONFIG_MM_ID. If we spot an anon large folio without CONFIG_TRANSPARENT_HUGEPAGE in that code, something is seriously messed up.

There are plenty of things we can optimize in the future: for example, we could remember that the folio is fully exclusive so we could speed up the next fault further. Also, we could try "faulting around", turning surrounding PTEs that map the same folio writable. But especially the latter might increase COW latency, so it would need further investigation.

Link: https://lkml.kernel.org/r/20250303163014.1128035-14-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Andy Lutomirks^H^Hski <luto@kernel.org>
Cc: Borislav Betkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Lance Yang <ioworker0@gmail.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Matthew Wilcow (Oracle) <willy@infradead.org>
Cc: Michal Koutný <mkoutny@suse.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: tejun heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Zefan Li <lizefan.x@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
1 parent 6af8cb8 commit 1da190f
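
As an illustration of the scenario this commit targets, here is a minimal userspace sketch (not part of the commit): a private anonymous region is populated, write-protected by fork(), and written again by the parent after the child exits. With this change, the write fault can reuse the still-exclusive PTE-mapped large folio instead of copying. The 64 KiB size and the MADV_HUGEPAGE hint are assumptions; whether the region is actually backed by an mTHP depends on CONFIG_TRANSPARENT_HUGEPAGE and the per-size sysfs settings (e.g. /sys/kernel/mm/transparent_hugepage/hugepages-64kB/enabled), and reuse vs. copy is only visible through tracing, not to the program itself.

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

#define LEN (64UL * 1024)	/* assumed mTHP size; always PTE-mapped */

int main(void)
{
	/* Private anonymous mapping, hinted towards THP backing. */
	char *p = mmap(NULL, LEN, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED) {
		perror("mmap");
		return 1;
	}
	madvise(p, LEN, MADV_HUGEPAGE);
	memset(p, 0xaa, LEN);		/* populate; ideally one large anon folio */

	/* fork() write-protects the now-shared anon folio in both MMs. */
	pid_t pid = fork();
	if (pid == 0)
		_exit(0);		/* child exits without touching the memory */
	waitpid(pid, NULL, 0);

	/*
	 * The folio is exclusive to this MM again. Before this commit, the
	 * write fault below always copied; now wp_can_reuse_anon_folio() can
	 * detect exclusivity and simply make the PTE writable again.
	 */
	p[0] = 1;

	munmap(p, LEN);
	return 0;
}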

File tree

1 file changed, +75 -8 lines changed


mm/memory.c

Lines changed: 75 additions & 8 deletions
@@ -3727,19 +3727,86 @@ static vm_fault_t wp_page_shared(struct vm_fault *vmf, struct folio *folio)
 	return ret;
 }
 
-static bool wp_can_reuse_anon_folio(struct folio *folio,
-				    struct vm_area_struct *vma)
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+static bool __wp_can_reuse_large_anon_folio(struct folio *folio,
+		struct vm_area_struct *vma)
 {
+	bool exclusive = false;
+
+	/* Let's just free up a large folio if only a single page is mapped. */
+	if (folio_large_mapcount(folio) <= 1)
+		return false;
+
 	/*
-	 * We could currently only reuse a subpage of a large folio if no
-	 * other subpages of the large folios are still mapped. However,
-	 * let's just consistently not reuse subpages even if we could
-	 * reuse in that scenario, and give back a large folio a bit
-	 * sooner.
+	 * The assumption for anonymous folios is that each page can only get
+	 * mapped once into each MM. The only exception are KSM folios, which
+	 * are always small.
+	 *
+	 * Each taken mapcount must be paired with exactly one taken reference,
+	 * whereby the refcount must be incremented before the mapcount when
+	 * mapping a page, and the refcount must be decremented after the
+	 * mapcount when unmapping a page.
+	 *
+	 * If all folio references are from mappings, and all mappings are in
+	 * the page tables of this MM, then this folio is exclusive to this MM.
 	 */
-	if (folio_test_large(folio))
+	if (folio_test_large_maybe_mapped_shared(folio))
+		return false;
+
+	VM_WARN_ON_ONCE(folio_test_ksm(folio));
+	VM_WARN_ON_ONCE(folio_mapcount(folio) > folio_nr_pages(folio));
+	VM_WARN_ON_ONCE(folio_entire_mapcount(folio));
+
+	if (unlikely(folio_test_swapcache(folio))) {
+		/*
+		 * Note: freeing up the swapcache will fail if some PTEs are
+		 * still swap entries.
+		 */
+		if (!folio_trylock(folio))
+			return false;
+		folio_free_swap(folio);
+		folio_unlock(folio);
+	}
+
+	if (folio_large_mapcount(folio) != folio_ref_count(folio))
 		return false;
 
+	/* Stabilize the mapcount vs. refcount and recheck. */
+	folio_lock_large_mapcount(folio);
+	VM_WARN_ON_ONCE(folio_large_mapcount(folio) < folio_ref_count(folio));
+
+	if (folio_test_large_maybe_mapped_shared(folio))
+		goto unlock;
+	if (folio_large_mapcount(folio) != folio_ref_count(folio))
+		goto unlock;
+
+	VM_WARN_ON_ONCE(folio_mm_id(folio, 0) != vma->vm_mm->mm_id &&
+			folio_mm_id(folio, 1) != vma->vm_mm->mm_id);
+
+	/*
+	 * Do we need the folio lock? Likely not. If there would have been
+	 * references from page migration/swapout, we would have detected
+	 * an additional folio reference and never ended up here.
+	 */
+	exclusive = true;
+unlock:
+	folio_unlock_large_mapcount(folio);
+	return exclusive;
+}
+#else /* !CONFIG_TRANSPARENT_HUGEPAGE */
+static bool __wp_can_reuse_large_anon_folio(struct folio *folio,
+		struct vm_area_struct *vma)
+{
+	BUILD_BUG();
+}
+#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
+
+static bool wp_can_reuse_anon_folio(struct folio *folio,
+				    struct vm_area_struct *vma)
+{
+	if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) && folio_test_large(folio))
+		return __wp_can_reuse_large_anon_folio(folio, vma);
+
 	/*
 	 * We have to verify under folio lock: these early checks are
 	 * just an optimization to avoid locking the folio and freeing
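
The long comment in the new helper compresses the counting argument into a few lines. As a loose illustration only, here is a toy userspace model of that argument; struct toy_folio and its helpers are made-up names, not kernel code, and the model ignores locking and concurrency entirely. It shows why "large mapcount == refcount" can only hold when every reference comes from a mapping, so that, together with the MM-id ("not maybe mapped shared") check, the folio must be exclusive to the faulting MM.

#include <assert.h>
#include <stdio.h>

struct toy_folio {
	int refcount;	/* all references */
	int mapcount;	/* references that come from page table mappings */
};

/* Refcount is taken before the mapcount when mapping... */
static void map_page(struct toy_folio *f)   { f->refcount++; f->mapcount++; }
/* ...and dropped after the mapcount when unmapping. */
static void unmap_page(struct toy_folio *f) { f->mapcount--; f->refcount--; }
/* Non-mapping references (GUP, swapcache, migration) only touch the refcount. */
static void extra_ref(struct toy_folio *f)  { f->refcount++; }

static int can_reuse(const struct toy_folio *f)
{
	/* The mapcount can never exceed the refcount; equality means "mappings only". */
	assert(f->mapcount <= f->refcount);
	return f->mapcount > 1 && f->mapcount == f->refcount;
}

int main(void)
{
	struct toy_folio f = { 0, 0 };

	for (int i = 0; i < 4; i++)
		map_page(&f);	/* 4 pages of the folio PTE-mapped in one MM */
	printf("mapped only: reuse=%d\n", can_reuse(&f));	/* 1: exclusive */

	extra_ref(&f);		/* e.g. a transient GUP reference appears */
	printf("extra ref:   reuse=%d\n", can_reuse(&f));	/* 0: must copy */

	unmap_page(&f);
	return 0;
}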
