Yonghong-Song/…

Commits on Mar 19, 2021

  1. bpf: fix bpf_cgroup_storage_set() usage in test_run

    In bpf_test_run(), check the return value of bpf_cgroup_storage_set()
    and call bpf_cgroup_storage_unset() properly.
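    Below is a hedged userspace model of the pattern described above. Every set() result is checked, and every successful set() is paired with exactly one unset(). This is not kernel code: `storage_set()`, `storage_unset()`, `run_prog()` and `test_run()` are hypothetical stand-ins for bpf_cgroup_storage_set(), bpf_cgroup_storage_unset() and the real program invocation. The fixed capacity that makes set() fail is also an assumption.

```c
#include <errno.h>

/* Hypothetical capacity that makes storage_set() fail when exhausted. */
#define MAX_IN_USE 8

static int in_use; /* number of storage slots currently claimed */

/* Hypothetical stand-in for bpf_cgroup_storage_set(): may fail. */
static int storage_set(void)
{
	if (in_use >= MAX_IN_USE)
		return -EBUSY;
	in_use++;
	return 0;
}

/* Hypothetical stand-in for bpf_cgroup_storage_unset(). */
static void storage_unset(void)
{
	in_use--;
}

/* Hypothetical program body that returns a fixed verdict. */
static int run_prog(void)
{
	return 1;
}

/*
 * Run the program 'repeat' times. The set() result is checked on every
 * iteration, and unset() is called once per successful set(), so no
 * slot is leaked on either the success or the error path.
 */
static int test_run(int repeat, int *retval)
{
	for (int i = 0; i < repeat; i++) {
		int err = storage_set();

		if (err)
			return err;
		*retval = run_prog();
		storage_unset();
	}
	return 0;
}
```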
    
    Cc: Jiri Olsa <jolsa@kernel.org>
    Cc: Roman Gushchin <guro@fb.com>
    Signed-off-by: Yonghong Song <yhs@fb.com>
    yonghong-song authored and intel-lab-lkp committed Mar 19, 2021
  2. bpf: fix NULL pointer dereference in bpf_get_local_storage() helper

    Jiri Olsa reported a bug ([1]) in the kernel where the cgroup local
    storage pointer may be NULL in the bpf_get_local_storage() helper.
    This bug uncovered two issues:
      (1) kprobe or tracepoint programs incorrectly set cgroup local
          storage before the program runs, and
      (2) due to the change from preempt_disable to migrate_disable,
          preemption is possible and percpu storage might be overwritten
          by other tasks.
    
    Issue (1) is fixed in [2]. This patch addresses issue (2).
    The following shows how things can go wrong:
      task 1:   bpf_cgroup_storage_set() for percpu local storage
             preemption happens
      task 2:   bpf_cgroup_storage_set() for percpu local storage
             preemption happens
      task 1:   run bpf program
    
    Task 1 will effectively use the percpu local storage set by task 2,
    which will be either NULL or incorrect.
    
    Instead of a single common local storage per cpu, this patch fixes
    the issue by permitting 8 local storage slots per cpu, each
    identified by a task_struct pointer. This allows at most 8 nested
    preemptions between bpf_cgroup_storage_set() and
    bpf_cgroup_storage_unset(). A percpu local storage slot is released
    (by calling bpf_cgroup_storage_unset()) by the same task after the
    bpf program finishes running.
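    The slot scheme described above can be modelled in userspace C. The sketch assumes a single CPU and uses an opaque pointer in place of a task_struct. `struct slot`, `slot_set()`, `slot_get()` and `slot_unset()` are hypothetical names. Only the limit of 8 slots and the idea of keying each slot by its owning task come from the text.

```c
#include <errno.h>
#include <stddef.h>

#define NR_STORAGE_SLOTS 8 /* the text permits 8 nested users per cpu */

struct slot {
	const void *task; /* owner; NULL means the slot is free */
	void *storage;    /* cgroup local storage for this owner */
};

/* One CPU's worth of slots (the kernel version would be per-cpu). */
static struct slot slots[NR_STORAGE_SLOTS];

/* Claim a free slot for 'task'; fails if all slots are busy. */
static int slot_set(const void *task, void *storage)
{
	for (int i = 0; i < NR_STORAGE_SLOTS; i++) {
		if (slots[i].task == NULL) {
			slots[i].task = task;
			slots[i].storage = storage;
			return 0;
		}
	}
	return -EBUSY;
}

/* Look up the storage owned by 'task', not by whoever ran last. */
static void *slot_get(const void *task)
{
	for (int i = 0; i < NR_STORAGE_SLOTS; i++)
		if (slots[i].task == task)
			return slots[i].storage;
	return NULL;
}

/* Release the slot owned by 'task' once its program has finished. */
static void slot_unset(const void *task)
{
	for (int i = 0; i < NR_STORAGE_SLOTS; i++) {
		if (slots[i].task == task) {
			slots[i].task = NULL;
			slots[i].storage = NULL;
			return;
		}
	}
}
```

    Replaying the task 1 / task 2 interleaving against this model shows the point of the fix. Task 1 still finds its own storage after task 2 has claimed a second slot.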
    
    The patch is tested on top of [2] with the reproducer in [1].
    Without this patch, the kernel emits an error within 2-3 minutes.
    With this patch, no error has occurred after one hour.
    
     [1] https://lore.kernel.org/bpf/CAKH8qBuXCfUz=w8L+Fj74OaUpbosO29niYwTki7e3Ag044_aww@mail.gmail.com/T
     [2] https://lore.kernel.org/bpf/CAKH8qBuXCfUz=w8L+Fj74OaUpbosO29niYwTki7e3Ag044_aww@mail.gmail.com/T
    
    Cc: Jiri Olsa <jolsa@kernel.org>
    Cc: Roman Gushchin <guro@fb.com>
    Signed-off-by: Yonghong Song <yhs@fb.com>
    yonghong-song authored and intel-lab-lkp committed Mar 19, 2021

Commits on Mar 18, 2021

  1. Add linux-next specific files for 20210318

    Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
    sfrothwell committed Mar 18, 2021
  2. hack to make SPARC32 build

    Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
    sfrothwell committed Mar 18, 2021
  3. Merge branch 'akpm/master'

    sfrothwell committed Mar 18, 2021
  4. secretmem: test: add basic selftest for memfd_secret(2)

    The test verifies that a file descriptor created with memfd_secret does
    not allow read/write operations, that secret memory mappings respect
    RLIMIT_MEMLOCK, and that remote accesses to the secret memory with
    process_vm_readv() and ptrace() fail.
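    The following is a reduced sketch of the read/write part of such a test. It is not the selftest added by this commit. `check_file_io_blocked()` and the result enum are hypothetical. The sketch assumes C library headers new enough to define `__NR_memfd_secret` and uses the raw syscall() because no libc wrapper is assumed. The RLIMIT_MEMLOCK and remote-access checks are omitted.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <unistd.h>

enum secret_result { SECRET_SKIP, SECRET_PASS, SECRET_FAIL };

/*
 * Create a secretmem fd and check that plain read()/write() on it fail.
 * Returns SECRET_SKIP when memfd_secret is unavailable (headers too old,
 * syscall not wired up, blocked, or secretmem not enabled at boot).
 */
static enum secret_result check_file_io_blocked(void)
{
#ifdef __NR_memfd_secret
	char buf[64] = {0};
	int fd = (int)syscall(__NR_memfd_secret, 0);

	if (fd < 0)
		return SECRET_SKIP;

	int blocked = read(fd, buf, sizeof(buf)) < 0 &&
		      write(fd, buf, sizeof(buf)) < 0;
	close(fd);
	return blocked ? SECRET_PASS : SECRET_FAIL;
#else
	return SECRET_SKIP;
#endif
}
```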
    
    Link: https://lkml.kernel.org/r/20210303162209.8609-10-rppt@kernel.org
    Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
    Cc: Alexander Viro <viro@zeniv.linux.org.uk>
    Cc: Andy Lutomirski <luto@kernel.org>
    Cc: Arnd Bergmann <arnd@arndb.de>
    Cc: Borislav Petkov <bp@alien8.de>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: Christopher Lameter <cl@linux.com>
    Cc: Dan Williams <dan.j.williams@intel.com>
    Cc: Dave Hansen <dave.hansen@linux.intel.com>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: Elena Reshetova <elena.reshetova@intel.com>
    Cc: Hagen Paul Pfeifer <hagen@jauu.net>
    Cc: "H. Peter Anvin" <hpa@zytor.com>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: James Bottomley <jejb@linux.ibm.com>
    Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
    Cc: Mark Rutland <mark.rutland@arm.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Michael Kerrisk <mtk.manpages@gmail.com>
    Cc: Palmer Dabbelt <palmer@dabbelt.com>
    Cc: Palmer Dabbelt <palmerdabbelt@google.com>
    Cc: Paul Walmsley <paul.walmsley@sifive.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
    Cc: Roman Gushchin <guro@fb.com>
    Cc: Shakeel Butt <shakeelb@google.com>
    Cc: Shuah Khan <shuah@kernel.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Tycho Andersen <tycho@tycho.ws>
    Cc: Will Deacon <will@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
    rppt authored and sfrothwell committed Mar 18, 2021
  5. arch, mm: wire up memfd_secret system call where relevant

    Wire up memfd_secret system call on architectures that define
    ARCH_HAS_SET_DIRECT_MAP, namely arm64, risc-v and x86.
    
    Link: https://lkml.kernel.org/r/20210303162209.8609-9-rppt@kernel.org
    Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
    Acked-by: Palmer Dabbelt <palmerdabbelt@google.com>
    Acked-by: Arnd Bergmann <arnd@arndb.de>
    Acked-by: Catalin Marinas <catalin.marinas@arm.com>
    Cc: Alexander Viro <viro@zeniv.linux.org.uk>
    Cc: Andy Lutomirski <luto@kernel.org>
    Cc: Borislav Petkov <bp@alien8.de>
    Cc: Christopher Lameter <cl@linux.com>
    Cc: Dan Williams <dan.j.williams@intel.com>
    Cc: Dave Hansen <dave.hansen@linux.intel.com>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: Elena Reshetova <elena.reshetova@intel.com>
    Cc: Hagen Paul Pfeifer <hagen@jauu.net>
    Cc: "H. Peter Anvin" <hpa@zytor.com>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: James Bottomley <jejb@linux.ibm.com>
    Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
    Cc: Mark Rutland <mark.rutland@arm.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Michael Kerrisk <mtk.manpages@gmail.com>
    Cc: Palmer Dabbelt <palmer@dabbelt.com>
    Cc: Paul Walmsley <paul.walmsley@sifive.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
    Cc: Roman Gushchin <guro@fb.com>
    Cc: Shakeel Butt <shakeelb@google.com>
    Cc: Shuah Khan <shuah@kernel.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Tycho Andersen <tycho@tycho.ws>
    Cc: Will Deacon <will@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
    rppt authored and sfrothwell committed Mar 18, 2021
  6. PM: hibernate: disable when there are active secretmem users

    It is unsafe to allow secretmem areas to be saved to the hibernation
    snapshot, as they would be visible after resume, which would essentially
    defeat the purpose of secret memory mappings.
    
    Prevent hibernation whenever there are active secret memory users.
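    One way to implement this check is a global count of secretmem users that hibernation consults. The sketch below is a hedged userspace model. The text states only the policy, so the atomic counter and the names `secretmem_open()`, `secretmem_release()`, `secretmem_active()` and `hibernation_allowed()` are assumptions.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Number of live secretmem files (hypothetical model of kernel state). */
static atomic_int secretmem_users;

/* Called when a secretmem file descriptor is created. */
static void secretmem_open(void)
{
	atomic_fetch_add(&secretmem_users, 1);
}

/* Called when the last reference to a secretmem file is dropped. */
static void secretmem_release(void)
{
	atomic_fetch_sub(&secretmem_users, 1);
}

static bool secretmem_active(void)
{
	return atomic_load(&secretmem_users) > 0;
}

/* Hibernation is refused while any secret memory user exists. */
static bool hibernation_allowed(void)
{
	return !secretmem_active();
}
```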
    
    Link: https://lkml.kernel.org/r/20210303162209.8609-8-rppt@kernel.org
    Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
    Cc: Alexander Viro <viro@zeniv.linux.org.uk>
    Cc: Andy Lutomirski <luto@kernel.org>
    Cc: Arnd Bergmann <arnd@arndb.de>
    Cc: Borislav Petkov <bp@alien8.de>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: Christopher Lameter <cl@linux.com>
    Cc: Dan Williams <dan.j.williams@intel.com>
    Cc: Dave Hansen <dave.hansen@linux.intel.com>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: Elena Reshetova <elena.reshetova@intel.com>
    Cc: Hagen Paul Pfeifer <hagen@jauu.net>
    Cc: "H. Peter Anvin" <hpa@zytor.com>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: James Bottomley <jejb@linux.ibm.com>
    Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
    Cc: Mark Rutland <mark.rutland@arm.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Michael Kerrisk <mtk.manpages@gmail.com>
    Cc: Palmer Dabbelt <palmer@dabbelt.com>
    Cc: Palmer Dabbelt <palmerdabbelt@google.com>
    Cc: Paul Walmsley <paul.walmsley@sifive.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
    Cc: Roman Gushchin <guro@fb.com>
    Cc: Shakeel Butt <shakeelb@google.com>
    Cc: Shuah Khan <shuah@kernel.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Tycho Andersen <tycho@tycho.ws>
    Cc: Will Deacon <will@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
    rppt authored and sfrothwell committed Mar 18, 2021
  7. mm: introduce memfd_secret system call to create "secret" memory areas

    Introduce the "memfd_secret" system call with the ability to create memory
    areas visible only in the context of the owning process and mapped
    neither in other processes nor in the kernel page tables.
    
    The secretmem feature is off by default and the user must explicitly
    enable it at boot time.
    
    Once secretmem is enabled, the user will be able to create a file
    descriptor using the memfd_secret() system call.  The memory areas created
    by mmap() calls from this file descriptor will be unmapped from the kernel
    direct map and will be mapped only in the page tables of the processes
    that have access to the file descriptor.
    
    The file descriptor based memory has several advantages over the
    "traditional" mm interfaces, such as mlock(), mprotect(), madvise().  The
    file descriptor approach allows explicit and controlled sharing of the
    memory areas, and it allows the operations to be sealed.  Besides, file
    descriptor based memory paves the way for VMMs to remove the secret memory
    range from the userspace hypervisor process, for instance QEMU.  Andy
    Lutomirski says:
    
      "Getting fd-backed memory into a guest will take some possibly major
      work in the kernel, but getting vma-backed memory into a guest without
      mapping it in the host user address space seems much, much worse."
    
    memfd_secret() is made a dedicated system call rather than an extension to
    memfd_create() because its purpose is to allow the user to create more
    secure memory mappings rather than to simply allow file based access to
    the memory.  Nowadays the cost of a new system call is negligible, while it
    is much simpler for userspace to deal with clear-cut system calls than with
    a multiplexer or an overloaded syscall.  Moreover, the initial
    implementation of memfd_secret() is completely distinct from
    memfd_create(), so there is not much sense in overloading memfd_create()
    to begin with.  If there is ever a need for code sharing between these
    implementations, it can easily be achieved without adjusting user-visible
    APIs.
    
    The secret memory remains accessible in the process context using uaccess
    primitives, but it is not exposed to the kernel otherwise; secret memory
    areas are removed from the direct map and functions in the
    follow_page()/get_user_pages() family will refuse to return a page that
    belongs to the secret memory area.
    
    If a use case arises that requires exposing secretmem to the kernel, it
    will be an opt-in request in the system call flags, so that the user has
    to decide what data can be exposed to the kernel.
    
    Removing pages from the direct map may fragment it on architectures
    that use large pages to map the physical memory, which affects system
    performance.  However, the original Kconfig text for
    CONFIG_DIRECT_GBPAGES said that gigabyte pages in the direct map "...  can
    improve the kernel's performance a tiny bit ..." (commit 00d1c5e
    ("x86: add gbpages switches")) and the recent report [1] showed that "...
    although 1G mappings are a good default choice, there is no compelling
    evidence that it must be the only choice".  Hence, it is sufficient to
    have secretmem disabled by default with the ability of a system
    administrator to enable it at boot time.
    
    Pages in the secretmem regions are unevictable and unmovable to avoid
    accidental exposure of the sensitive data via swap or during page
    migration.
    
    Because the secretmem mappings are locked in memory, they cannot exceed
    RLIMIT_MEMLOCK.  Since these mappings are already locked independently
    of mlock(), an attempt to mlock()/munlock() a secretmem range will fail
    and mlockall()/munlockall() will ignore secretmem mappings.
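    The RLIMIT_MEMLOCK accounting can be sketched under a few stated assumptions. Pages are assumed to be 4 KiB. `memlock_fits()` is a hypothetical helper modelled on the way mlock() charges locked pages against the limit. The CAP_IPC_LOCK bypass is also an assumption: it is carried over from the mlock() path rather than taken from this text. getrlimit(RLIMIT_MEMLOCK) is the standard way for a process to read its own limit.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/resource.h>

#define PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/*
 * Would locking 'len' more bytes keep the process within its memlock
 * limit? 'locked_pages' is what is already locked, 'limit_bytes' is the
 * RLIMIT_MEMLOCK soft limit, and CAP_IPC_LOCK is assumed to bypass it.
 */
static bool memlock_fits(unsigned long locked_pages, size_t len,
			 rlim_t limit_bytes, bool cap_ipc_lock)
{
	if (cap_ipc_lock || limit_bytes == RLIM_INFINITY)
		return true;

	unsigned long want = locked_pages + (len >> PAGE_SHIFT);

	return want <= (limit_bytes >> PAGE_SHIFT);
}

/* Read the caller's current soft limit; returns 0 if getrlimit() fails. */
static rlim_t current_memlock_limit(void)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
		return 0;
	return rl.rlim_cur;
}
```

    With a 64 KiB limit (16 pages), a 2-page secret mapping fits when nothing else is locked. It is refused when 15 pages are already locked.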
    
    However, unlike mlock()ed memory, secretmem currently behaves more like
    long-term GUP: secretmem mappings are unmovable mappings directly consumed
    by user space.  With default limits, there is no excessive use of
    secretmem and it poses no real problem in combination with
    ZONE_MOVABLE/CMA, but in the future this should be addressed to allow
    balanced use of large amounts of secretmem along with ZONE_MOVABLE/CMA.
    
    A page that was a part of the secret memory area is cleared when it is
    freed to ensure the data is not exposed to the next user of that page.
    
    The following example demonstrates creation of a secret mapping (error
    handling is omitted):
    
    	fd = memfd_secret(0);
    	ftruncate(fd, MAP_SIZE);
    	ptr = mmap(NULL, MAP_SIZE, PROT_READ | PROT_WRITE,
    		   MAP_SHARED, fd, 0);
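    The same steps can be written with error handling, as a sketch. Unlike the snippet above, every step is checked. The sketch assumes headers that define `__NR_memfd_secret` and calls the raw syscall because no C library wrapper is assumed. The `MAP_SIZE` of one 4 KiB page is an illustrative value, and `map_secret()` is a hypothetical name.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

#define MAP_SIZE 4096 /* illustrative size: one 4 KiB page */

/*
 * Create a secret mapping, write to it, and tear it down.
 * Returns 0 on success, 1 if memfd_secret is unavailable, -1 on error.
 */
static int map_secret(void)
{
#ifdef __NR_memfd_secret
	int fd = (int)syscall(__NR_memfd_secret, 0);

	if (fd < 0)
		return 1; /* e.g. ENOSYS, or secretmem not enabled at boot */

	if (ftruncate(fd, MAP_SIZE) < 0) {
		close(fd);
		return -1;
	}

	char *ptr = mmap(NULL, MAP_SIZE, PROT_READ | PROT_WRITE,
			 MAP_SHARED, fd, 0);
	close(fd); /* the mapping holds its own reference to the file */
	if (ptr == MAP_FAILED)
		return -1;

	strcpy(ptr, "key material");
	int ok = strcmp(ptr, "key material") == 0;

	munmap(ptr, MAP_SIZE);
	return ok ? 0 : -1;
#else
	return 1;
#endif
}
```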
    
    [1] https://lore.kernel.org/linux-mm/213b4567-46ce-f116-9cdf-bbd0c884eb3c@linux.intel.com/
    Link: https://lkml.kernel.org/r/20210303162209.8609-7-rppt@kernel.org
    Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
    Acked-by: Hagen Paul Pfeifer <hagen@jauu.net>
    Cc: Alexander Viro <viro@zeniv.linux.org.uk>
    Cc: Andy Lutomirski <luto@kernel.org>
    Cc: Arnd Bergmann <arnd@arndb.de>
    Cc: Borislav Petkov <bp@alien8.de>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: Christopher Lameter <cl@linux.com>
    Cc: Dan Williams <dan.j.williams@intel.com>
    Cc: Dave Hansen <dave.hansen@linux.intel.com>
    Cc: Elena Reshetova <elena.reshetova@intel.com>
    Cc: "H. Peter Anvin" <hpa@zytor.com>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: James Bottomley <jejb@linux.ibm.com>
    Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Mark Rutland <mark.rutland@arm.com>
    Cc: Michael Kerrisk <mtk.manpages@gmail.com>
    Cc: Palmer Dabbelt <palmer@dabbelt.com>
    Cc: Palmer Dabbelt <palmerdabbelt@google.com>
    Cc: Paul Walmsley <paul.walmsley@sifive.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
    Cc: Roman Gushchin <guro@fb.com>
    Cc: Shakeel Butt <shakeelb@google.com>
    Cc: Shuah Khan <shuah@kernel.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Tycho Andersen <tycho@tycho.ws>
    Cc: Will Deacon <will@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
    rppt authored and sfrothwell committed Mar 18, 2021
  8. set_memory: allow querying whether set_direct_map_*() is actually enabled

    On arm64, set_direct_map_*() functions may return 0 without actually
    changing the linear map.  This behaviour can be controlled using kernel
    parameters, so we need a way to determine at runtime whether calls to
    set_direct_map_invalid_noflush() and set_direct_map_default_noflush() have
    any effect.
    
    Extend the set_memory API with a can_set_direct_map() function that allows
    checking whether calling set_direct_map_*() will actually change the page
    table, replace several occurrences of open-coded checks in arm64 with the
    new function, and provide a generic stub for architectures that always
    modify page tables upon calls to set_direct_map APIs.
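    Here is a hedged sketch of the query described above. The generic stub simply returns true. The arm64-style variant rests on an assumption not stated in this text: that set_direct_map_*() only takes effect when the linear map is built from base pages, as with rodata=full or debug_pagealloc. `struct arm64_params` and both function names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical knobs standing in for arm64 kernel parameters. */
struct arm64_params {
	bool rodata_full;     /* rodata=full */
	bool debug_pagealloc; /* debug_pagealloc=on */
};

/* Generic stub: the architecture always honours set_direct_map_*(). */
static bool generic_can_set_direct_map(void)
{
	return true;
}

/*
 * arm64-style check (assumption): changes to the linear map only take
 * effect when one of these options forces page-granular mappings.
 */
static bool arm64_can_set_direct_map(const struct arm64_params *p)
{
	return p->rodata_full || p->debug_pagealloc;
}
```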
    
    [arnd@arndb.de: arm64: kfence: fix header inclusion ]
    Link: https://lkml.kernel.org/r/20210303162209.8609-6-rppt@kernel.org
    Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
    Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
    Reviewed-by: David Hildenbrand <david@redhat.com>
    Cc: Alexander Viro <viro@zeniv.linux.org.uk>
    Cc: Andy Lutomirski <luto@kernel.org>
    Cc: Arnd Bergmann <arnd@arndb.de>
    Cc: Borislav Petkov <bp@alien8.de>
    Cc: Christopher Lameter <cl@linux.com>
    Cc: Dan Williams <dan.j.williams@intel.com>
    Cc: Dave Hansen <dave.hansen@linux.intel.com>
    Cc: Elena Reshetova <elena.reshetova@intel.com>
    Cc: Hagen Paul Pfeifer <hagen@jauu.net>
    Cc: "H. Peter Anvin" <hpa@zytor.com>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: James Bottomley <jejb@linux.ibm.com>
    Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
    Cc: Mark Rutland <mark.rutland@arm.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Michael Kerrisk <mtk.manpages@gmail.com>
    Cc: Palmer Dabbelt <palmer@dabbelt.com>
    Cc: Palmer Dabbelt <palmerdabbelt@google.com>
    Cc: Paul Walmsley <paul.walmsley@sifive.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
    Cc: Roman Gushchin <guro@fb.com>
    Cc: Shakeel Butt <shakeelb@google.com>
    Cc: Shuah Khan <shuah@kernel.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Tycho Andersen <tycho@tycho.ws>
    Cc: Will Deacon <will@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
    rppt authored and sfrothwell committed Mar 18, 2021
  9. set_memory: allow set_direct_map_*_noflush() for multiple pages

    The underlying implementations of set_direct_map_invalid_noflush() and
    set_direct_map_default_noflush() allow updating multiple contiguous pages
    at once.
    
    Add a numpages parameter to set_direct_map_*_noflush() to expose this
    ability through these APIs.
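    To illustrate the added parameter, here is a toy direct map in which one call updates a contiguous run of pages. `toy_set_invalid()`, `toy_set_default()` and the boolean page array are hypothetical. The real helpers take a struct page * and operate on page tables.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_PAGES 16 /* size of the toy direct map */

/* true = page is present in the toy direct map */
static bool direct_map[NR_PAGES];

static void toy_reset(void)
{
	for (size_t i = 0; i < NR_PAGES; i++)
		direct_map[i] = true;
}

/* Update 'numpages' contiguous pages starting at index 'first'. */
static int toy_update(size_t first, int numpages, bool present)
{
	if (numpages < 0 || first > NR_PAGES ||
	    (size_t)numpages > NR_PAGES - first)
		return -EINVAL;
	for (int i = 0; i < numpages; i++)
		direct_map[first + (size_t)i] = present;
	return 0;
}

/* Toy counterparts of the _invalid_ and _default_ helpers. */
static int toy_set_invalid(size_t first, int numpages)
{
	return toy_update(first, numpages, false);
}

static int toy_set_default(size_t first, int numpages)
{
	return toy_update(first, numpages, true);
}
```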
    
    Link: https://lkml.kernel.org/r/20210303162209.8609-5-rppt@kernel.org
    Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
    Acked-by: Catalin Marinas <catalin.marinas@arm.com>	[arm64]
    Cc: Alexander Viro <viro@zeniv.linux.org.uk>
    Cc: Andy Lutomirski <luto@kernel.org>
    Cc: Arnd Bergmann <arnd@arndb.de>
    Cc: Borislav Petkov <bp@alien8.de>
    Cc: Christopher Lameter <cl@linux.com>
    Cc: Dan Williams <dan.j.williams@intel.com>
    Cc: Dave Hansen <dave.hansen@linux.intel.com>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: Elena Reshetova <elena.reshetova@intel.com>
    Cc: Hagen Paul Pfeifer <hagen@jauu.net>
    Cc: "H. Peter Anvin" <hpa@zytor.com>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: James Bottomley <jejb@linux.ibm.com>
    Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
    Cc: Mark Rutland <mark.rutland@arm.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Michael Kerrisk <mtk.manpages@gmail.com>
    Cc: Palmer Dabbelt <palmer@dabbelt.com>
    Cc: Palmer Dabbelt <palmerdabbelt@google.com>
    Cc: Paul Walmsley <paul.walmsley@sifive.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
    Cc: Roman Gushchin <guro@fb.com>
    Cc: Shakeel Butt <shakeelb@google.com>
    Cc: Shuah Khan <shuah@kernel.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Tycho Andersen <tycho@tycho.ws>
    Cc: Will Deacon <will@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
    rppt authored and sfrothwell committed Mar 18, 2021
  10. riscv/Kconfig: make direct map manipulation options depend on MMU

    ARCH_HAS_SET_DIRECT_MAP and ARCH_HAS_SET_MEMORY configuration options have
    no meaning when CONFIG_MMU is disabled, and there is no point in enabling
    them for the nommu case.
    
    Add an explicit dependency on MMU for these options.
    
    Link: https://lkml.kernel.org/r/20210303162209.8609-4-rppt@kernel.org
    Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
    Reported-by: kernel test robot <lkp@intel.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
    rppt authored and sfrothwell committed Mar 18, 2021
  11. mmap: make mlock_future_check() global

    It will be used by the upcoming secret memory implementation.
    
    Link: https://lkml.kernel.org/r/20210303162209.8609-3-rppt@kernel.org
    Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
    Cc: Alexander Viro <viro@zeniv.linux.org.uk>
    Cc: Andy Lutomirski <luto@kernel.org>
    Cc: Arnd Bergmann <arnd@arndb.de>
    Cc: Borislav Petkov <bp@alien8.de>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: Christopher Lameter <cl@linux.com>
    Cc: Dan Williams <dan.j.williams@intel.com>
    Cc: Dave Hansen <dave.hansen@linux.intel.com>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: Elena Reshetova <elena.reshetova@intel.com>
    Cc: Hagen Paul Pfeifer <hagen@jauu.net>
    Cc: "H. Peter Anvin" <hpa@zytor.com>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: James Bottomley <jejb@linux.ibm.com>
    Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
    Cc: Mark Rutland <mark.rutland@arm.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Michael Kerrisk <mtk.manpages@gmail.com>
    Cc: Palmer Dabbelt <palmer@dabbelt.com>
    Cc: Palmer Dabbelt <palmerdabbelt@google.com>
    Cc: Paul Walmsley <paul.walmsley@sifive.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
    Cc: Roman Gushchin <guro@fb.com>
    Cc: Shakeel Butt <shakeelb@google.com>
    Cc: Shuah Khan <shuah@kernel.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Tycho Andersen <tycho@tycho.ws>
    Cc: Will Deacon <will@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
    rppt authored and sfrothwell committed Mar 18, 2021
  12. mm: add definition of PMD_PAGE_ORDER

    Patch series "mm: introduce memfd_secret system call to create "secret" memory areas", v18.
    
    This is an implementation of "secret" mappings backed by a file
    descriptor.
    
    The file descriptor backing secret memory mappings is created using a
    dedicated memfd_secret system call.  The desired protection mode for the
    memory is configured using the flags parameter of the system call.  The mmap()
    of the file descriptor created with memfd_secret() will create a "secret"
    memory mapping.  The pages in that mapping will be marked as not present
    in the direct map and will be present only in the page table of the owning
    mm.
    
    Although normally Linux userspace mappings are protected from other users,
    such secret mappings are useful for environments where a hostile tenant is
    trying to trick the kernel into giving them access to other tenants'
    mappings.
    
    Additionally, in the future the secret mappings may be used as a means to
    protect guest memory in a virtual machine host.
    
    For demonstration of secret memory usage we've created a userspace library
    
    https://git.kernel.org/pub/scm/linux/kernel/git/jejb/secret-memory-preloader.git
    
    that does two things: the first is to act as a preloader for openssl to
    redirect all the OPENSSL_malloc calls to secret memory, meaning any secret
    keys get automatically protected this way, and the second is to expose
    the API to the user who needs it.  We anticipate that a lot of the
    use cases would be like the openssl one: many toolkits that deal with
    secret keys already have special handling for the memory to try to give
    them greater protection, so this would simply be pluggable into the
    toolkits without any need for user application modification.
    
    Hiding secret memory mappings behind an anonymous file allows usage of the
    page cache for tracking pages allocated for the "secret" mappings as well
    as using address_space_operations for e.g.  page migration callbacks.
    
    The anonymous file may also be used implicitly, like hugetlb files, to
    implement mmap(MAP_SECRET) and use the secret memory areas with "native"
    mm ABIs in the future.
    
    Removing pages from the direct map may fragment it on architectures
    that use large pages to map the physical memory, which affects system
    performance.  However, the original Kconfig text for
    CONFIG_DIRECT_GBPAGES said that gigabyte pages in the direct map "...  can
    improve the kernel's performance a tiny bit ..." (commit 00d1c5e
    ("x86: add gbpages switches")) and the recent report [1] showed that "...
    although 1G mappings are a good default choice, there is no compelling
    evidence that it must be the only choice".  Hence, it is sufficient to
    have secretmem disabled by default with the ability of a system
    administrator to enable it at boot time.
    
    In addition, there is also a long term goal to improve management of the
    direct map.
    
    [1] https://lore.kernel.org/linux-mm/213b4567-46ce-f116-9cdf-bbd0c884eb3c@linux.intel.com/
    
    This patch (of 9):
    
    The definition of PMD_PAGE_ORDER, denoting the order (log2 of the number
    of base pages) of a second-level leaf page, is already used by DAX and may
    be handy in other cases as well.
    
    Several architectures already define PMD_ORDER as the size of the
    second-level page table, so to avoid conflicts with these definitions, use
    the PMD_PAGE_ORDER name and update DAX accordingly.
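    As a worked example, assume PMD_PAGE_ORDER is PMD_SHIFT minus PAGE_SHIFT, with x86-64 values and 4 KiB base pages (PAGE_SHIFT = 12, PMD_SHIFT = 21). A second-level leaf page then has order 9, which is 512 base pages or 2 MiB. The shift values below are those assumptions, not definitions taken from this patch.

```c
/* Assumed x86-64 values with 4 KiB base pages. */
#define PAGE_SHIFT 12
#define PMD_SHIFT  21

/* Order (log2 of the base-page count) of a PMD-level leaf page. */
#define PMD_PAGE_ORDER (PMD_SHIFT - PAGE_SHIFT)

/* Number of base pages covered by one PMD-level leaf page. */
static unsigned long pmd_pages(void)
{
	return 1UL << PMD_PAGE_ORDER;
}
```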
    
    Link: https://lkml.kernel.org/r/20210303162209.8609-1-rppt@kernel.org
    Link: https://lkml.kernel.org/r/20210303162209.8609-2-rppt@kernel.org
    Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
    Reviewed-by: David Hildenbrand <david@redhat.com>
    Cc: Alexander Viro <viro@zeniv.linux.org.uk>
    Cc: Andy Lutomirski <luto@kernel.org>
    Cc: Arnd Bergmann <arnd@arndb.de>
    Cc: Borislav Petkov <bp@alien8.de>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: Christopher Lameter <cl@linux.com>
    Cc: Dan Williams <dan.j.williams@intel.com>
    Cc: Dave Hansen <dave.hansen@linux.intel.com>
    Cc: Elena Reshetova <elena.reshetova@intel.com>
    Cc: "H. Peter Anvin" <hpa@zytor.com>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: James Bottomley <jejb@linux.ibm.com>
    Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Mark Rutland <mark.rutland@arm.com>
    Cc: Michael Kerrisk <mtk.manpages@gmail.com>
    Cc: Palmer Dabbelt <palmer@dabbelt.com>
    Cc: Paul Walmsley <paul.walmsley@sifive.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
    Cc: Roman Gushchin <guro@fb.com>
    Cc: Shakeel Butt <shakeelb@google.com>
    Cc: Shuah Khan <shuah@kernel.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Tycho Andersen <tycho@tycho.ws>
    Cc: Will Deacon <will@kernel.org>
    Cc: Hagen Paul Pfeifer <hagen@jauu.net>
    Cc: Palmer Dabbelt <palmerdabbelt@google.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
    rppt authored and sfrothwell committed Mar 18, 2021
  13. modules: add CONFIG_MODPROBE_PATH

    Allow the developer to specify the initial value of the modprobe_path[]
    string.  This can be used to set it to the empty string initially, thus
    effectively disabling request_module() during early boot until userspace
    writes a new value via the /proc/sys/kernel/modprobe interface.  [1]
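    The following is a minimal sketch of why an empty modprobe_path short-circuits module loading. `MODPROBE_PATH_DEFAULT`, `toy_request_module()` and `toy_set_modprobe_path()` are hypothetical. The early -ENOENT return mirrors the __request_module() behaviour noted in [1].

```c
#include <errno.h>
#include <string.h>

/*
 * Hypothetical stand-in for the value chosen via CONFIG_MODPROBE_PATH.
 * An empty string disables module loading until userspace writes a path.
 */
#define MODPROBE_PATH_DEFAULT ""

static char modprobe_path[256] = MODPROBE_PATH_DEFAULT;

/* Toy request_module(): bail out early when no helper path is set. */
static int toy_request_module(const char *name)
{
	(void)name;
	if (modprobe_path[0] == '\0')
		return -ENOENT; /* no usermode helper is spawned */
	return 0; /* a real implementation would run modprobe_path here */
}

/* Models userspace writing /proc/sys/kernel/modprobe later in boot. */
static void toy_set_modprobe_path(const char *path)
{
	strncpy(modprobe_path, path, sizeof(modprobe_path) - 1);
	modprobe_path[sizeof(modprobe_path) - 1] = '\0';
}
```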
    
    When building a custom kernel (often for an embedded target), it's normal
    to build everything into the kernel that is needed for booting, and indeed
    the initramfs often contains no modules at all, so every such
    request_module() done before userspace init has mounted the real rootfs is
    a waste of time.
    
    This is particularly useful when combined with the previous patch, which
    made the initramfs unpacking asynchronous - for that to work, it had to
    make any usermodehelper call wait for the unpacking to finish before
    attempting to invoke the userspace helper.  By eliminating all such
    (known-to-be-futile) calls of usermodehelper, the initramfs unpacking and
    the {device,late}_initcalls can proceed in parallel for much longer.
    
    For a relatively slow ppc board I'm working on, the two patches combined
    lead to 0.2s faster boot - but more importantly, the fact that the
    initramfs unpacking proceeds completely in the background while devices
    get probed means I get to handle the gpio watchdog in time without getting
    reset.
    
    [1] __request_module() already has an early -ENOENT return when
    modprobe_path is the empty string.
    
    Link: https://lkml.kernel.org/r/20210313212528.2956377-3-linux@rasmusvillemoes.dk
    Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
    Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Acked-by: Jessica Yu <jeyu@kernel.org>
    Acked-by: Luis Chamberlain <mcgrof@kernel.org>
    Cc: Borislav Petkov <bp@alien8.de>
    Cc: Jonathan Corbet <corbet@lwn.net>
    Cc: Nick Desaulniers <ndesaulniers@google.com>
    Cc: Takashi Iwai <tiwai@suse.de>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
    Villemoes authored and sfrothwell committed Mar 18, 2021
  14. init/initramfs.c: do unpacking asynchronously

    Patch series "background initramfs unpacking, and CONFIG_MODPROBE_PATH", v3.
    
    These two patches are independent, but better-together.
    
    The second is a rather trivial patch that simply allows the developer to
    change "/sbin/modprobe" to something else - e.g.  the empty string, so
    that all request_module() calls during early boot return -ENOENT early, without
    even spawning a usermode helper, needlessly synchronizing with the
    initramfs unpacking.
    
    The first patch delegates decompressing the initramfs to a worker thread,
    allowing do_initcalls() in main.c to proceed to the device_ and late_
    initcalls without waiting for that decompression (and populating of
    rootfs) to finish.  Obviously, some of those later calls may rely on the
    initramfs being available, so I've added synchronization points in the
    firmware loader and usermodehelper paths - there might be other places
    that would need this, but so far no one has been able to think of any
    places I have missed.
    
    There's not much to win if most of the functionality needed during boot is
    only available as modules.  But systems with a custom-made .config and
    initramfs can boot faster, partly due to utilizing more than one cpu
    earlier, partly by avoiding known-futile modprobe calls (which would still
    trigger synchronization with the initramfs unpacking, thus eliminating
    most of the first benefit).
    
    This patch (of 2):
    
    Most of the boot process doesn't actually need anything from the
    initramfs, until of course PID1 is to be executed.  So instead of doing
    the decompressing and populating of the initramfs synchronously in
    populate_rootfs() itself, push that off to a worker thread.
    
    This is primarily motivated by an embedded ppc target, where unpacking
    even the rather modest sized initramfs takes 0.6 seconds, which is long
    enough that the external watchdog becomes unhappy that it doesn't get
    attention soon enough.  By doing the initramfs decompression in a worker
    thread, we get to do the device_initcalls and hence start petting the
    watchdog much sooner.
    
    Normal desktops might benefit as well.  On my mostly stock Ubuntu kernel,
    my initramfs is a 26M xz-compressed blob, decompressing to around 126M.
    That takes almost two seconds:
    
    [    0.201454] Trying to unpack rootfs image as initramfs...
    [    1.976633] Freeing initrd memory: 29416K
    
    Before this patch, these lines occur consecutively in dmesg.  With this
    patch, the timestamps on these two lines are roughly the same as above, but
    with 172 lines in between, so more than one cpu has been kept busy doing
    work that would otherwise only happen after populate_rootfs()
    finished.
    
    Should one of the initcalls done after rootfs_initcall time (i.e., device_
    and late_ initcalls) need something from the initramfs (say, a kernel
    module or a firmware blob), it will simply wait for the initramfs
    unpacking to be done before proceeding, which should in theory make this
    completely safe.
    
    But if some driver pokes around in the filesystem directly and not via one
    of the official kernel interfaces (i.e.  request_firmware*(),
    call_usermodehelper*) that theory may not hold - also, I certainly might
    have missed a spot when sprinkling wait_for_initramfs().  So there is an
    escape hatch in the form of an initramfs_async= command line parameter.
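    
    As an illustration only, the scheme can be sketched as a userspace analog
    in C with POSIX threads: a worker thread does the slow unpacking and
    signals completion, and anything that needs the unpacked files calls a
    wait function first.  start_unpacking(), wait_for_unpacking() and
    do_unpack() are hypothetical names playing roughly the roles of
    populate_rootfs() and wait_for_initramfs(); the boolean argument stands
    in for the initramfs_async= escape hatch.  This is not the kernel code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace analog of asynchronous initramfs unpacking. */
static pthread_mutex_t unpack_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t unpack_cv = PTHREAD_COND_INITIALIZER;
static bool unpack_done;
static int files_unpacked; /* stands in for the populated rootfs */

/* The slow part (simulated): decompress and populate. */
static void do_unpack(void)
{
    files_unpacked = 42;
}

/* Publish completion and wake every waiter. */
static void mark_unpack_done(void)
{
    pthread_mutex_lock(&unpack_lock);
    unpack_done = true;
    pthread_cond_broadcast(&unpack_cv);
    pthread_mutex_unlock(&unpack_lock);
}

static void *unpack_worker(void *arg)
{
    (void)arg;
    do_unpack();
    mark_unpack_done();
    return NULL;
}

/* Roughly the role of populate_rootfs(): start the work and return at once. */
static void start_unpacking(bool async)
{
    pthread_t worker;

    if (async && pthread_create(&worker, NULL, unpack_worker, NULL) == 0) {
        pthread_detach(worker);
        return;
    }
    /* Escape hatch (or no thread available): unpack synchronously. */
    do_unpack();
    mark_unpack_done();
}

/* Roughly the role of wait_for_initramfs(): block until unpacking is done. */
static void wait_for_unpacking(void)
{
    pthread_mutex_lock(&unpack_lock);
    while (!unpack_done)
        pthread_cond_wait(&unpack_cv, &unpack_lock);
    pthread_mutex_unlock(&unpack_lock);
}
```

    Only code that actually touches the files (in the kernel, the firmware
    loader and usermodehelper paths) has to call the wait function, so
    everything else proceeds in parallel with the unpacking.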
    
    Link: https://lkml.kernel.org/r/20210313212528.2956377-1-linux@rasmusvillemoes.dk
    Link: https://lkml.kernel.org/r/20210313212528.2956377-2-linux@rasmusvillemoes.dk
    Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
    Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
    Cc: Jessica Yu <jeyu@kernel.org>
    Cc: Borislav Petkov <bp@alien8.de>
    Cc: Jonathan Corbet <corbet@lwn.net>
    Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Cc: Nick Desaulniers <ndesaulniers@google.com>
    Cc: Takashi Iwai <tiwai@suse.de>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
    Villemoes authored and sfrothwell committed Mar 18, 2021
  15. kernel/async.c: remove async_unregister_domain()

    No callers in the tree.
    
    Link: https://lkml.kernel.org/r/20210309151723.1907838-2-linux@rasmusvillemoes.dk
    Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
    Cc: Tejun Heo <tj@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
    Villemoes authored and sfrothwell committed Mar 18, 2021
  16. kernel/async.c: stop guarding pr_debug() statements

    It's currently nigh impossible to get these pr_debug()s to print
    something.  Being guarded by initcall_debug means one has to enable tons
    of other debug output during boot, and the system_state condition further
    means it's impossible to get them when loading modules later.
    
    Also, the compiler can't know that these global conditions do not change,
    so there are W=2 warnings:
    
    kernel/async.c:125:9: warning: `calltime' may be used uninitialized in this function [-Wmaybe-uninitialized]
    kernel/async.c:300:9: warning: `starttime' may be used uninitialized in this function [-Wmaybe-uninitialized]
    
    Make it possible, for a DYNAMIC_DEBUG kernel, to get these to print their
    messages by booting with appropriate 'dyndbg="file async.c +p"' command
    line argument.  For a non-DYNAMIC_DEBUG kernel, pr_debug() compiles to
    nothing.
    
    This does cost an unconditional ktime_get() for the starttime value,
    but the corresponding ktime_get() for the end time can be elided by
    factoring it into a function which only gets called if the printk()
    arguments end up being evaluated.
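    
    A minimal userspace sketch of that trade-off, assuming a hypothetical
    runtime flag debug_enabled in place of dynamic debug and a hypothetical
    my_pr_debug() macro that, like pr_debug() on a DYNAMIC_DEBUG kernel,
    evaluates its arguments only when the message is enabled.  The start time
    is always taken; the end-time read lives in the hypothetical
    usecs_since(), which only runs as a print argument.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical stand-in for a dynamic-debug callsite being switched on. */
static bool debug_enabled;
static int end_time_reads; /* how often the end time was actually read */

/* Like pr_debug(): arguments are evaluated only if the message is enabled. */
#define my_pr_debug(...)                  \
    do {                                  \
        if (debug_enabled)                \
            fprintf(stderr, __VA_ARGS__); \
    } while (0)

static long long now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* The end-time read, factored out so it only runs when printed. */
static long long usecs_since(long long start_ns)
{
    end_time_reads++;
    return (now_ns() - start_ns) / 1000;
}

static void run_one_entry(void)
{
    long long start = now_ns(); /* unconditional, as described above */

    /* ... the asynchronous work would run here ... */
    my_pr_debug("entry returned after %lld usecs\n", usecs_since(start));
}
```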
    
    Link: https://lkml.kernel.org/r/20210309151723.1907838-1-linux@rasmusvillemoes.dk
    Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
    Cc: Tejun Heo <tj@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
    Villemoes authored and sfrothwell committed Mar 18, 2021
  17. selftests: remove duplicate include

    'assert.h' included in 'sparsebit.c' is duplicated.
    It is also included at line 161.
    'string.h' included in 'mincore_selftest.c' is duplicated.
    It is also included at line 15.
    'sched.h' included in 'tlbie_test.c' is duplicated.
    It is also included at line 33.
    
    Link: https://lkml.kernel.org/r/20210316073336.426255-1-zhang.yunkai@zte.com.cn
    Signed-off-by: Zhang Yunkai <zhang.yunkai@zte.com.cn>
    Cc: Paolo Bonzini <pbonzini@redhat.com>
    Cc: Shuah Khan <shuah@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
    Zhang Yunkai authored and sfrothwell committed Mar 18, 2021
  18. scripts/gdb: add lx_current support for arm64

    arm64 uses SP_EL0 to save the current task_struct address.  While running
    in EL0, SP_EL0 is clobbered by userspace.  So if the upper bit is not 1
    (not TTBR1), the current address is invalid.  This patch checks the upper
    bit of SP_EL0: if it is 1, lx_current() on arm64 returns the dereferenced
    current task; otherwise, lx_current() tells users that they are running
    in userspace (EL0).
    
    While arm64 is running in EL0, it is actually pointless to print the
    current task, as kernel-space memory is not accessible in EL0.
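    
    The check itself is small.  The real change lives in the Python gdb
    scripts; the C sketch below only mirrors the logic as described, using
    hypothetical function names and treating bit 63 as the "upper bit" that
    selects TTBR1 (kernel) addresses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 63 set: a TTBR1 (kernel) address, so SP_EL0 holds the current task. */
static bool sp_el0_holds_task(uint64_t sp_el0)
{
    return (sp_el0 >> 63) != 0;
}

/* Hypothetical helper: the task_struct address, or 0 when running in EL0. */
static uint64_t current_task_addr(uint64_t sp_el0)
{
    return sp_el0_holds_task(sp_el0) ? sp_el0 : 0;
}
```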
    
    Link: https://lkml.kernel.org/r/20210314203444.15188-3-song.bao.hua@hisilicon.com
    Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
    Cc: Jan Kiszka <jan.kiszka@siemens.com>
    Cc: Jonathan Corbet <corbet@lwn.net>
    Cc: Kieran Bingham <kbingham@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
    Barry Song authored and sfrothwell committed Mar 18, 2021
  19. scripts/gdb: document lx_current is only supported by x86

    Patch series "scripts/gdb: clarify the platforms supporting lx_current and add arm64 support", v2.
    
    lx_current depends on the per_cpu current_task variable, which exists on
    x86 only, so it actually works on x86 only.  The 1st patch documents this
    clearly; the 2nd patch adds support for arm64.
    
    This patch (of 2):
    
    x86 is the only architecture which has per_cpu current_task:
    arch$ git grep current_task | grep -i per_cpu
    x86/include/asm/current.h:DECLARE_PER_CPU(struct task_struct *, current_task);
    x86/kernel/cpu/common.c:DEFINE_PER_CPU(struct task_struct *, current_task) ____cacheline_aligned =
    x86/kernel/cpu/common.c:EXPORT_PER_CPU_SYMBOL(current_task);
    x86/kernel/cpu/common.c:DEFINE_PER_CPU(struct task_struct *, current_task) = &init_task;
    x86/kernel/cpu/common.c:EXPORT_PER_CPU_SYMBOL(current_task);
    x86/kernel/smpboot.c:	per_cpu(current_task, cpu) = idle;
    
    On other architectures, lx_current() will lead to a python exception:
    (gdb) p $lx_current().pid
    Python Exception <class 'gdb.error'> No symbol "current_task" in current context.:
    Error occurred in Python: No symbol "current_task" in current context.
    
    To keep more people from struggling and wasting time on other
    architectures, document it.
    
    Link: https://lkml.kernel.org/r/20210314203444.15188-1-song.bao.hua@hisilicon.com
    Link: https://lkml.kernel.org/r/20210314203444.15188-2-song.bao.hua@hisilicon.com
    Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
    Cc: Jan Kiszka <jan.kiszka@siemens.com>
    Cc: Kieran Bingham <kbingham@kernel.org>
    Cc: Jonathan Corbet <corbet@lwn.net>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
    Barry Song authored and sfrothwell committed Mar 18, 2021
  20. gdb: lx-symbols: store the abspath()

    If we store the relative path, the user might later cd to a different
    directory, and that would break the automatic symbol resolving that
    happens when a module is loaded into the target kernel.  Fix this by
    storing the abspath() of each path given, just like we already do for the
    cwd (os.getcwd() is absolute.)
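    
    The same idea expressed in C, as a sketch only: the gdb helper itself is
    Python and uses os.path.abspath().  Here realpath(3) is a swapped-in
    technique that, unlike abspath(), also resolves symlinks and requires the
    path to exist; add_module_dir() and module_dirs are hypothetical names.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>

#define MAX_MODULE_DIRS 16

/* Hypothetical list of directories to search for module symbol files. */
static char *module_dirs[MAX_MODULE_DIRS];
static int n_module_dirs;

/* Store an absolute path, resolved against the cwd at the time of the call,
 * so a later chdir() cannot change what the entry refers to. */
static int add_module_dir(const char *path)
{
    char *abs;

    if (n_module_dirs == MAX_MODULE_DIRS)
        return -1;
    abs = realpath(path, NULL); /* malloc()ed by libc; caller owns it */
    if (abs == NULL)
        return -1;
    module_dirs[n_module_dirs++] = abs;
    return 0;
}
```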
    
    Link: https://lkml.kernel.org/r/20201217091747.bf4332cf2b35.I10ebbdb7e9b80ab1a5cddebf53d073be8232d656@changeid
    Signed-off-by: Johannes Berg <johannes.berg@intel.com>
    Reviewed-by: Jan Kiszka <jan.kiszka@siemens.com>
    Cc: Kieran Bingham <kbingham@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
    jmberg-intel authored and sfrothwell committed Mar 18, 2021
  21. aio: simplify read_events()

    Change wait_event_hrtimeout() to not call __wait_event_hrtimeout() if
    timeout == 0; this matches other _timeout() helpers in wait.h.
    
    This allows simplifying its only user, read_events(), which no longer
    needs to optimize the "until == 0" case by hand.
    
    Note: this patch doesn't use ___wait_cond_timeout because _hrtimeout()
    also differs in that it returns 0 on success and -ETIME on timeout.
    Perhaps we should change this to make it fully compatible with other
    helpers.
    
    Link: http://lkml.kernel.org/r/20190607175413.GA29187@redhat.com
    Signed-off-by: Oleg Nesterov <oleg@redhat.com>
    Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
    Cc: Benjamin LaHaise <bcrl@kvack.org>
    Cc: Arnd Bergmann <arnd@arndb.de>
    Cc: David Laight <David.Laight@ACULAB.COM>
    Cc: Deepa Dinamani <deepa.kernel@gmail.com>
    Cc: "Eric W. Biederman" <ebiederm@xmission.com>
    Cc: Eric Wong <e@80x24.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Al Viro <viro@zeniv.linux.org.uk>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
    utrace authored and sfrothwell committed Mar 18, 2021
  22. gcov: use kvmalloc()

    Using vmalloc() in gcov is really quite wasteful: many of the objects
    allocated are really small (e.g. I've seen 24 bytes).  Use kvmalloc() to
    automatically pick the better of kmalloc() or vmalloc() depending on the
    size.
    
    Link: https://lkml.kernel.org/r/20210315235453.799e7a9d627d.I741d0db096c6f312910f7f1bcdfde0fda20801a4@changeid
    Signed-off-by: Johannes Berg <johannes.berg@intel.com>
    Cc: Peter Oberparleiter <oberpar@linux.ibm.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
    jmberg-intel authored and sfrothwell committed Mar 18, 2021
  23. gcov: simplify buffer allocation

    Use just a single vmalloc() with struct_size() instead of a separate
    kmalloc() for the iter struct.
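    
    A userspace sketch of the single-allocation layout, assuming a
    hypothetical struct iter with a flexible array member.  iter_struct_size()
    is a hand-written stand-in for the kernel's struct_size() (header plus n
    elements, saturating on overflow), and malloc() stands in for vmalloc().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical iterator: header and data buffer share one allocation. */
struct iter {
    size_t size;   /* bytes in buffer[] */
    size_t pos;    /* current read position */
    char buffer[]; /* flexible array member */
};

#define ITER_ELEM_SIZE sizeof(((struct iter *)0)->buffer[0])

/* Stand-in for struct_size(): header plus n elements, SIZE_MAX on overflow. */
static size_t iter_struct_size(size_t n)
{
    if (n > (SIZE_MAX - sizeof(struct iter)) / ITER_ELEM_SIZE)
        return SIZE_MAX;
    return sizeof(struct iter) + n * ITER_ELEM_SIZE;
}

/* One allocation instead of a separate header allocation plus a buffer. */
static struct iter *iter_new(const char *data, size_t n)
{
    struct iter *it = malloc(iter_struct_size(n));

    if (it == NULL)
        return NULL;
    it->size = n;
    it->pos = 0;
    memcpy(it->buffer, data, n);
    return it;
}
```

    Saturating to SIZE_MAX on overflow makes an absurd request fail in the
    allocator instead of silently allocating a too-small buffer.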
    
    Link: https://lkml.kernel.org/r/20210315235453.b6de4a92096e.Iac40a5166589cefbff8449e466bd1b38ea7a17af@changeid
    Signed-off-by: Johannes Berg <johannes.berg@intel.com>
    Cc: Peter Oberparleiter <oberpar@linux.ibm.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
    jmberg-intel authored and sfrothwell committed Mar 18, 2021
  24. gcov: combine common code

    There's a lot of duplicated code between the gcc and clang
    implementations.  Move it over to fs.c to simplify the code; there's no
    reason to believe that, for small data like this, one would not just
    implement the simple convert_to_gcda() function.
    
    Link: https://lkml.kernel.org/r/20210315235453.e3fbb86e99a0.I08a3ee6dbe47ea3e8024956083f162884a958e40@changeid
    Signed-off-by: Johannes Berg <johannes.berg@intel.com>
    Cc: Peter Oberparleiter <oberpar@linux.ibm.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
    jmberg-intel authored and sfrothwell committed Mar 18, 2021
  25. gcov: clang: drop support for clang-10 and older

    LLVM changed the expected function signatures for llvm_gcda_start_file()
    and llvm_gcda_emit_function() in the clang-11 release.  Drop the older
    implementations and require folks to upgrade their compiler if they're
    interested in GCOV support.
    
    Link: https://reviews.llvm.org/rGcdd683b516d147925212724b09ec6fb792a40041
    Link: https://reviews.llvm.org/rG13a633b438b6500ecad9e4f936ebadf3411d0f44
    Link: https://lkml.kernel.org/r/20210312224132.3413602-3-ndesaulniers@google.com
    Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
    Suggested-by: Nathan Chancellor <nathan@kernel.org>
    Acked-by: Peter Oberparleiter <oberpar@linux.ibm.com>
    Reviewed-by: Nathan Chancellor <nathan@kernel.org>
    Cc: Fangrui Song <maskray@google.com>
    Cc: Prasad Sodagudi <psodagud@quicinc.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
    nickdesaulniers authored and sfrothwell committed Mar 18, 2021
  26. kernel: kexec_file: fix error return code of kexec_calculate_store_digests()

    When vzalloc() returns NULL for sha_regions, no error return code of
    kexec_calculate_store_digests() is assigned.  To fix this bug, assign
    -ENOMEM to ret in this case.
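    
    A minimal C sketch of the bug pattern and the fix, with hypothetical
    names (calculate_digests(), zalloc_fn, failing_zalloc()).  The allocator
    is passed in only so the failure path can be exercised from userspace,
    and calloc() stands in for vzalloc().

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical allocator hook so the failure path can be exercised. */
typedef void *(*zalloc_fn)(size_t nmemb, size_t size);

static void *failing_zalloc(size_t nmemb, size_t size)
{
    (void)nmemb;
    (void)size;
    return NULL;
}

/* The fix: set ret before jumping to the common exit path. */
static int calculate_digests(size_t nr_regions, zalloc_fn zalloc)
{
    int ret = 0;
    void *regions = zalloc(nr_regions, 32); /* 32: hypothetical digest size */

    if (regions == NULL) {
        ret = -ENOMEM; /* without this, the failure would return 0 */
        goto out;
    }
    /* ... hash each region and store the digests ... */
    free(regions);
out:
    return ret;
}
```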
    
    Link: https://lkml.kernel.org/r/20210309083904.24321-1-baijiaju1990@gmail.com
    Fixes: a43cac0 ("kexec: split kexec_file syscall code to kexec_file.c")
    Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
    Reported-by: TOTE Robot <oslab@tsinghua.edu.cn>
    Acked-by: Baoquan He <bhe@redhat.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
    XidianGeneral authored and sfrothwell committed Mar 18, 2021
  27. kexec: Add kexec reboot string

    The purpose is to notify the kernel module for fast reboot.
    
    Upstream a patch from the SONiC network operating system [1].
    
    [1]: sonic-net/sonic-linux-kernel#46
    
    Link: https://lkml.kernel.org/r/20210304124626.13927-1-pmenzel@molgen.mpg.de
    Signed-off-by: Joe LeVeque <jolevequ@microsoft.com>
    Signed-off-by: Paul Menzel <pmenzel@molgen.mpg.de>
    Acked-by: Baoquan He <bhe@redhat.com>
    Cc: Guohan Lu <lguohan@gmail.com>
    Cc: Joe LeVeque <jolevequ@microsoft.com>
    Cc: Paul Menzel <pmenzel@molgen.mpg.de>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
    jleveque authored and sfrothwell committed Mar 18, 2021
  28. kernel/crash_core: add crashkernel=auto for vmcore creation

    This adds a crashkernel=auto feature to configure the memory reserved for
    vmcore creation.  CONFIG_CRASH_AUTO_STR is intended to be set by different
    kernel distributions and for different archs according to their needs.
    
    Link: https://lkml.kernel.org/r/20210223174153.72802-1-saeed.mirzamohammadi@oracle.com
    Signed-off-by: Saeed Mirzamohammadi <saeed.mirzamohammadi@oracle.com>
    Signed-off-by: John Donnelly <john.p.donnelly@oracle.com>
    Tested-by: John Donnelly <john.p.donnelly@oracle.com>
    ed-by: Dave Young <dyoung@redhat.com>
    Cc: Baoquan He <bhe@redhat.com>
    Cc: Vivek Goyal <vgoyal@redhat.com>
    Cc: Jonathan Corbet <corbet@lwn.net>
    Cc: "Paul E. McKenney" <paulmck@kernel.org>
    Cc: Randy Dunlap <rdunlap@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    Cc: Mike Kravetz <mike.kravetz@oracle.com>
    Cc: "Guilherme G. Piccoli" <gpiccoli@canonical.com>
    Cc: Kees Cook <keescook@chromium.org>
    Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
    Cc: Ingo Molnar <mingo@kernel.org>
    Cc: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
    Cc: YiFei Zhu <yifeifz2@illinois.edu>
    Cc: Josh Poimboeuf <jpoimboe@redhat.com>
    Cc: Mike Rapoport <rppt@kernel.org>
    Cc: Masahiro Yamada <masahiroy@kernel.org>
    Cc: Sami Tolvanen <samitolvanen@google.com>
    Cc: Frederic Weisbecker <frederic@kernel.org>
    Cc: Christian Brauner <christian.brauner@ubuntu.com>
    Cc: Stephen Boyd <sboyd@kernel.org>
    Cc: Andrey Konovalov <andreyknvl@google.com>
    Cc: Colin Ian King <colin.king@canonical.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
    Saeed Mirzamohammadi authored and sfrothwell committed Mar 18, 2021
  29. kernel/fork.c: fix typos

    change 'ancestoral' to 'ancestral'
    change 'reuseable' to 'reusable'
    delete a grammatically superfluous 'do'
    
    Link: https://lkml.kernel.org/r/20210317082031.11692-1-caoxiaofeng@yulong.com
    Signed-off-by: Xiaofeng Cao <caoxiaofeng@yulong.com>
    Reviewed-by: Christian Brauner <christian.brauner@ubuntu.com>
    Cc: Jens Axboe <axboe@kernel.dk>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
    CaoXiaofengGH authored and sfrothwell committed Mar 18, 2021
  30. kernel/fork.c: simplify copy_mm()

    All this can happen without a single goto.
    
    Link: https://lkml.kernel.org/r/2072685.XptgVkyDqn@devpool47
    Signed-off-by: Rolf Eike Beer <eb@emlix.com>
    Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
    DerDakon authored and sfrothwell committed Mar 18, 2021
  31. do_wait: make PIDTYPE_PID case O(1) instead of O(n)

    Add a special case when waiting on a pid (via waitpid, waitid, wait4, etc.)
    to avoid doing an O(n) scan of children and tracees, and instead do an
    O(1) lookup.  This improves performance when waiting on a pid from a
    thread group with many children and/or tracees.
    
    Time to fork and then call waitpid on the child, from a task that already
    has N children [1]:
    
    N    | Before  | After
    -----|---------|------
    1    | 74 us   | 74 us
    20   | 72 us   | 75 us
    100  | 83 us   | 77 us
    500  | 99 us   | 74 us
    1000 | 179 us  | 75 us
    5000 | 804 us  | 79 us
    8000 | 1268 us | 78 us
    
    [1]: https://lkml.org/lkml/2021/3/12/1567
    
    This can make a substantial performance improvement for applications with
    a thread that has many children or tracees and frequently needs to wait on
    them.  Tools that use ptrace to intercept syscalls for a large number of
    processes are likely to fall into this category.  In particular this patch
    was developed while building a ptrace-based second generation of the
    Shadow emulator [2], for which it allows us to avoid quadratic scaling
    (without having to use a workaround that introduces a ~40% performance
    penalty) [3].  Other examples of tools that fall into this category which
    this patch may help include User Mode Linux [4] and DetTrace [5].
    
    [2]: https://shadow.github.io/
    [3]: shadow/shadow#1134 (comment)
    [4]: https://en.wikipedia.org/wiki/User-mode_Linux
    [5]: https://github.com/dettrace/dettrace
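    
    A heavily simplified C sketch of the lookup, assuming a hypothetical
    fixed-size pid table and a task model with just a real parent and a
    tracer; the kernel also has to handle thread groups and other wait
    options, which this ignores.  Instead of scanning every child and tracee
    of the waiter, it looks the pid up directly and then checks that the task
    actually belongs to the waiter.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PID 64

/* Hypothetical, heavily simplified task model. */
struct task {
    int pid;
    struct task *parent; /* real parent */
    struct task *tracer; /* ptrace tracer, or NULL */
};

static struct task *pid_table[MAX_PID]; /* pid -> task */

static void register_task(struct task *t)
{
    assert(t->pid > 0 && t->pid < MAX_PID);
    pid_table[t->pid] = t;
}

/* O(1) in the number of children: one table lookup plus an ownership check. */
static struct task *find_waitable(const struct task *waiter, int pid)
{
    struct task *t;

    if (pid <= 0 || pid >= MAX_PID)
        return NULL;
    t = pid_table[pid];
    /* Only a child or a tracee of the waiter may be waited on. */
    if (t != NULL && (t->parent == waiter || t->tracer == waiter))
        return t;
    return NULL;
}
```

    The cost of the lookup no longer grows with the number of children,
    which is the behaviour the timing table above reflects.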
    
    Link: https://lkml.kernel.org/r/20210314231544.9379-1-jnewsome@torproject.org
    Signed-off-by: James Newsome <jnewsome@torproject.org>
    Reviewed-by: Oleg Nesterov <oleg@redhat.com>
    Cc: "Eric W . Biederman" <ebiederm@xmission.com>
    Cc: Christian Brauner <christian@brauner.io>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
    sporksmith authored and sfrothwell committed Mar 18, 2021
  32. fs: fat: fix spelling typo of values

    vaules -> values
    
    Link: https://lkml.kernel.org/r/20210302034817.30384-1-dingsenjie@163.com
    Signed-off-by: dingsenjie <dingsenjie@yulong.com>
    Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
    dingsenjie authored and sfrothwell committed Mar 18, 2021