Matthew-Wilcox…
Commits on Apr 9, 2021
-
mm/filemap: Convert page wait queues to be folios
Reinforce that if we're waiting for a bit in a struct page, that's actually in the head page by changing the type from page to folio. Increases the size of cachefiles by two bytes, but the kernel core is unchanged in size. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Jeff Layton <jlayton@kernel.org>
-
mm/filemap: Convert wake_up_page_bit to wake_up_folio_bit
All callers have a folio, so use it directly. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Jeff Layton <jlayton@kernel.org>
-
mm/filemap: Convert wait_on_page_bit to wait_on_folio_bit
We must always wait on the folio, otherwise we won't be woken up. This commit shrinks the kernel by 691 bytes, mostly due to moving the page waitqueue lookup into wait_on_folio_bit_common(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Jeff Layton <jlayton@kernel.org>
-
mm/writeback: Add wait_for_stable_folio
Move wait_for_stable_page() into the folio compatibility file. wait_for_stable_folio() avoids a call to compound_head() and is 14 bytes smaller than wait_for_stable_page() was. The net text size grows by 24 bytes as a result of this patch. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Jeff Layton <jlayton@kernel.org>
-
mm/writeback: Add wait_on_folio_writeback
wait_on_page_writeback_killable() only has one caller, so convert it to call wait_on_folio_writeback_killable(). For the wait_on_page_writeback() callers, add a compatibility wrapper around wait_on_folio_writeback(). Turning PageWriteback() into FolioWriteback() eliminates a call to compound_head() which saves 8 bytes and 15 bytes in the two functions. That is more than offset by adding the wait_on_page_writeback compatibility wrapper for a net increase in text of 15 bytes. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Jeff Layton <jlayton@kernel.org>
-
mm/filemap: Add end_folio_writeback
Add an end_page_writeback() wrapper function for users that are not yet converted to folios. end_folio_writeback() is less than half the size of end_page_writeback() at just 105 bytes compared to 213 bytes, due to removing all the compound_head() calls. The 30 byte wrapper function makes this a net saving of 70 bytes. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Jeff Layton <jlayton@kernel.org>
-
mm/filemap: Add wait_on_folio_locked
Also add wait_on_folio_locked_killable(). Turn wait_on_page_locked() and wait_on_page_locked_killable() into wrappers. This eliminates a call to compound_head() from each call-site, reducing text size by 200 bytes for me. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Jeff Layton <jlayton@kernel.org>
-
mm/filemap: Add __lock_folio_or_retry
Convert __lock_page_or_retry() to __lock_folio_or_retry(). This actually saves 4 bytes in the only caller of lock_page_or_retry() (due to better register allocation) and saves the 20 byte cost of calling page_folio() in __lock_folio_or_retry() for a total saving of 24 bytes. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Jeff Layton <jlayton@kernel.org>
-
mm/filemap: Add __lock_folio_async
There aren't any actual callers of lock_page_async(), so remove it. Convert filemap_update_page() to call __lock_folio_async(). __lock_folio_async() is 21 bytes smaller than __lock_page_async(), but the real savings come from using a folio in filemap_update_page(), shrinking it from 514 bytes to 403 bytes, saving 111 bytes. The text shrinks by 132 bytes in total. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Jeff Layton <jlayton@kernel.org>
-
mm/filemap: Add lock_folio_killable
This is like lock_page_killable() but for use by callers who know they have a folio. Convert __lock_page_killable() to be __lock_folio_killable(). This saves one call to compound_head() per contended call to lock_page_killable(). __lock_folio_killable() is 20 bytes smaller than __lock_page_killable() was. lock_page_maybe_drop_mmap() shrinks by 68 bytes and __lock_page_or_retry() shrinks by 66 bytes. That's a total of 154 bytes of text saved. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Jeff Layton <jlayton@kernel.org>
-
This is like lock_page() but for use by callers who know they have a folio. Convert __lock_page() to be __lock_folio(). This saves one call to compound_head() per contended call to lock_page(). Saves 362 bytes of text; mostly from improved register allocation and inlining decisions. __lock_folio is 59 bytes while __lock_page was 79. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Jeff Layton <jlayton@kernel.org>
-
Convert unlock_page() to call unlock_folio(). By using a folio we avoid a call to compound_head(). This shortens the function from 39 bytes to 25 and removes 4 instructions on x86-64. Because we still have unlock_page(), it's a net increase of 24 bytes of text for the kernel as a whole, but any path that uses unlock_folio() will execute 4 fewer instructions. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Jeff Layton <jlayton@kernel.org>
-
mm/memcg: Add folio wrappers for various functions
Add new wrapper functions folio_memcg(), lock_folio_memcg(), unlock_folio_memcg(), mem_cgroup_folio_lruvec() and count_memcg_folio_event() Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Jeff Layton <jlayton@kernel.org>
-
This is the folio equivalent of page_mapcount(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Jeff Layton <jlayton@kernel.org>
-
mm/util: Add folio_mapping and folio_file_mapping
These are the folio equivalent of page_mapping() and page_file_mapping(). Add an out-of-line page_mapping() wrapper around folio_mapping() in order to prevent the page_folio() call from bloating every caller of page_mapping(). Adjust page_file_mapping() and page_mapping_file() to use folios internally. Rename __page_file_mapping() to swapcache_mapping() and change it to take a folio. This ends up saving 186 bytes of text overall. folio_mapping() is 45 bytes shorter than page_mapping() was, but the new page_mapping() wrapper is 30 bytes. The major reduction is a few bytes less in dozens of nfs functions (which call page_file_mapping()). Most of these appear to be a slight change in gcc's register allocation decisions, which allow: 48 8b 56 08 mov 0x8(%rsi),%rdx 48 8d 42 ff lea -0x1(%rdx),%rax 83 e2 01 and $0x1,%edx 48 0f 44 c6 cmove %rsi,%rax to become: 48 8b 46 08 mov 0x8(%rsi),%rax 48 8d 78 ff lea -0x1(%rax),%rdi a8 01 test $0x1,%al 48 0f 44 fe cmove %rsi,%rdi for a reduction of a single byte. Once the NFS client is converted to use folios, this entire sequence will disappear. Also add folio_mapping() documentation. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Jeff Layton <jlayton@kernel.org>
-
mm/filemap: Add folio_offset and folio_file_offset
These are just wrappers around their page counterpart. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Jeff Layton <jlayton@kernel.org>
-
mm/filemap: Add folio_next_index
This helper returns the page index of the next folio in the file (ie the end of this folio, plus one). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Jeff Layton <jlayton@kernel.org>
-
mm/filemap: Add folio_index, folio_file_page and folio_contains
folio_index() is the equivalent of page_index() for folios. folio_file_page() is the equivalent of find_subpage(). folio_contains() is the equivalent of thp_contains(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Jeff Layton <jlayton@kernel.org>
-
mm: Handle per-folio private data
Add folio_private() and set_folio_private() which mirror page_private() and set_page_private() -- ie folio private data is the same as page private data. The only difference is that these return a void * instead of an unsigned long, which matches the majority of users. Turn attach_page_private() into attach_folio_private() and reimplement attach_page_private() as a wrapper. No filesystem which uses page private data currently supports compound pages, so we're free to define the rules. attach_page_private() may only be called on a head page; if you want to add private data to a tail page, you can call set_page_private() directly (and shouldn't increment the page refcount! That should be done when adding private data to the head page / folio). This saves 597 bytes of text with the distro-derived config that I'm testing due to removing the calls to compound_head() in get_page() & put_page(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Jeff Layton <jlayton@kernel.org>
-
These new functions are the folio analogues of the PageFlags functions. If CONFIG_DEBUG_VM_PGFLAGS is enabled, we check the folio is not a tail page at every invocation. Note that this will also catch the PagePoisoned case as a poisoned page has every bit set, which would include PageTail. This saves 1727 bytes of text with the distro-derived config that I'm testing due to removing a double call to compound_head() in PageSwapCache(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Jeff Layton <jlayton@kernel.org>
-
If we know we have a folio, we can call get_folio() instead of get_page() and save the overhead of calling compound_head(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Zi Yan <ziy@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Jeff Layton <jlayton@kernel.org>
-
If we know we have a folio, we can call put_folio() instead of put_page() and save the overhead of calling compound_head(). Also skips the devmap checks. This commit looks like it should be a no-op, but actually saves 1312 bytes of text with the distro-derived config that I'm testing. Some functions grow a little while others shrink. I presume the compiler is making different inlining decisions. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Zi Yan <ziy@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Jeff Layton <jlayton@kernel.org>
-
mm: Add folio reference count functions
These functions mirror their page reference counterparts. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Jeff Layton <jlayton@kernel.org>
-
mm/debug: Add VM_BUG_ON_FOLIO and VM_WARN_ON_ONCE_FOLIO
These are the folio equivalents of VM_BUG_ON_PAGE and VM_WARN_ON_ONCE_PAGE. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Zi Yan <ziy@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Jeff Layton <jlayton@kernel.org>
-
mm/vmstat: Add functions to account folio statistics
Allow page counters to be more readily modified by callers which have a folio. Name these wrappers with 'stat' instead of 'state' as requested by Linus here: https://lore.kernel.org/linux-mm/CAHk-=wj847SudR-kt+46fT3+xFFgiwpgThvm7DJWGdi4cVrbnQ@mail.gmail.com/ Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Jeff Layton <jlayton@kernel.org>
-
mm: Add folio_pgdat and folio_zone
These are just convenience wrappers for callers with folios; pgdat and zone can be reached from tail pages as well as head pages. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Zi Yan <ziy@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Jeff Layton <jlayton@kernel.org>
-
A struct folio is a new abstraction to replace the venerable struct page. A function which takes a struct folio argument declares that it will operate on the entire (possibly compound) page, not just PAGE_SIZE bytes. In return, the caller guarantees that the pointer it is passing does not point to a tail page. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Jeff Layton <jlayton@kernel.org>
-
mm: Optimise nth_page for contiguous memmap
If the memmap is virtually contiguous (either because we're using a virtually mapped memmap or because we don't support a discontig memmap at all), then we can implement nth_page() by simple addition. Contrary to popular belief, the compiler is not able to optimise this itself for a vmemmap configuration. This reduces one example user (sg.c) by four instructions: struct page *page = nth_page(rsv_schp->pages[k], offset >> PAGE_SHIFT); before: 49 8b 45 70 mov 0x70(%r13),%rax 48 63 c9 movslq %ecx,%rcx 48 c1 eb 0c shr $0xc,%rbx 48 8b 04 c8 mov (%rax,%rcx,8),%rax 48 2b 05 00 00 00 00 sub 0x0(%rip),%rax R_X86_64_PC32 vmemmap_base-0x4 48 c1 f8 06 sar $0x6,%rax 48 01 d8 add %rbx,%rax 48 c1 e0 06 shl $0x6,%rax 48 03 05 00 00 00 00 add 0x0(%rip),%rax R_X86_64_PC32 vmemmap_base-0x4 after: 49 8b 45 70 mov 0x70(%r13),%rax 48 63 c9 movslq %ecx,%rcx 48 c1 eb 0c shr $0xc,%rbx 48 c1 e3 06 shl $0x6,%rbx 48 03 1c c8 add (%rax,%rcx,8),%rbx Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> -
Add linux-next specific files for 20210409
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
-
-
memfd_secret: use unsigned int rather than long as syscall flags type
Yuri Norov says: If parameter size is the same for native and compat ABIs, we may wire a syscall made by compat client to native handler. This is true for unsigned int, but not true for unsigned long or pointer. That's why I suggest using unsigned int and so avoid creating compat entry point. Use unsigned int as the type of the flags parameter in memfd_secret() system call. Link: https://lkml.kernel.org/r/20210331142345.27532-1-rppt@kernel.org Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Yury Norov <yury.norov@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
-
secretmem: test: add basic selftest for memfd_secret(2)
The test verifies that file descriptor created with memfd_secret does not allow read/write operations, that secret memory mappings respect RLIMIT_MEMLOCK and that remote accesses with process_vm_read() and ptrace() to the secret memory fail. Link: https://lkml.kernel.org/r/20210303162209.8609-10-rppt@kernel.org Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christopher Lameter <cl@linux.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: Elena Reshetova <elena.reshetova@intel.com> Cc: Hagen Paul Pfeifer <hagen@jauu.net> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Bottomley <jejb@linux.ibm.com> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Palmer Dabbelt <palmerdabbelt@google.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rick Edgecombe <rick.p.edgecombe@intel.com> Cc: Roman Gushchin <guro@fb.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tycho Andersen <tycho@tycho.ws> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
-
memfd_secret: use unsigned int rather than long as syscall flags type
Yuri Norov says: If parameter size is the same for native and compat ABIs, we may wire a syscall made by compat client to native handler. This is true for unsigned int, but not true for unsigned long or pointer. That's why I suggest using unsigned int and so avoid creating compat entry point. Use unsigned int as the type of the flags parameter in memfd_secret() system call. Link: https://lkml.kernel.org/r/20210331142345.27532-1-rppt@kernel.org Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Yury Norov <yury.norov@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
-
arch, mm: wire up memfd_secret system call where relevant
Wire up memfd_secret system call on architectures that define ARCH_HAS_SET_DIRECT_MAP, namely arm64, risc-v and x86. Link: https://lkml.kernel.org/r/20210303162209.8609-9-rppt@kernel.org Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Acked-by: Palmer Dabbelt <palmerdabbelt@google.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Christopher Lameter <cl@linux.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: Elena Reshetova <elena.reshetova@intel.com> Cc: Hagen Paul Pfeifer <hagen@jauu.net> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Bottomley <jejb@linux.ibm.com> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rick Edgecombe <rick.p.edgecombe@intel.com> Cc: Roman Gushchin <guro@fb.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tycho Andersen <tycho@tycho.ws> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
-
PM: hibernate: disable when there are active secretmem users
It is unsafe to allow saving of secretmem areas to the hibernation snapshot as they would be visible after the resume and this essentially will defeat the purpose of secret memory mappings. Prevent hibernation whenever there are active secret memory users. Link: https://lkml.kernel.org/r/20210303162209.8609-8-rppt@kernel.org Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christopher Lameter <cl@linux.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: Elena Reshetova <elena.reshetova@intel.com> Cc: Hagen Paul Pfeifer <hagen@jauu.net> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Bottomley <jejb@linux.ibm.com> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Palmer Dabbelt <palmerdabbelt@google.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rick Edgecombe <rick.p.edgecombe@intel.com> Cc: Roman Gushchin <guro@fb.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tycho Andersen <tycho@tycho.ws> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>