Commits on Jun 7, 2021

  1. printk: fix cpu lock ordering

    The cpu lock implementation uses a full memory barrier to take
    the lock, but no memory barriers when releasing the lock. This
    means that changes performed by a lock owner may not be seen by
    the next lock owner. This may have been "good enough" for use
    by dump_stack() as a serialization mechanism, but it is not
    enough to provide proper protection for a critical section.
    
    Correct this problem by using acquire/release memory barriers
    for lock/unlock, respectively.
    
    Note that it is not necessary for a cpu lock to disable
    interrupts. However, in upcoming work this cpu lock will be used
    for emergency tasks (for example, atomic consoles during kernel
    crashes) and any interruptions should be avoided if possible.
    
    Signed-off-by: John Ogness <john.ogness@linutronix.de>
    jogness authored and intel-lab-lkp committed Jun 7, 2021
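
The acquire/release pairing described in the "printk: fix cpu lock ordering" commit can be sketched in userspace with C11 atomics. This is only a minimal model, not the kernel implementation: the names `cpu_lock_owner`, `cpu_lock_nested`, `cpu_lock_try()`, `cpu_lock()` and `cpu_unlock()` are hypothetical, the CPU id is passed in by the caller, and interrupt disabling is not modelled at all.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model: -1 means unowned, otherwise the owning CPU id. */
static atomic_int cpu_lock_owner = -1;
/* Nesting depth; only ever touched by the current owner. */
static int cpu_lock_nested;

/* Try once to take the lock on behalf of @cpu. */
static bool cpu_lock_try(int cpu)
{
	int expected = -1;

	/*
	 * Acquire on success: pairs with the release store in
	 * cpu_unlock(), so the new owner sees the previous owner's
	 * critical-section stores.
	 */
	if (atomic_compare_exchange_strong_explicit(&cpu_lock_owner,
						    &expected, cpu,
						    memory_order_acquire,
						    memory_order_relaxed))
		return true;

	/* Reentrant case: this CPU already owns the lock. */
	if (expected == cpu) {
		cpu_lock_nested++;
		return true;
	}
	return false;
}

/* Spin until the lock is owned by @cpu. */
static void cpu_lock(int cpu)
{
	while (!cpu_lock_try(cpu))
		; /* busy-wait */
}

static void cpu_unlock(int cpu)
{
	assert(atomic_load_explicit(&cpu_lock_owner,
				    memory_order_relaxed) == cpu);

	/* Leaving a nested section: the outer section keeps the lock. */
	if (cpu_lock_nested) {
		cpu_lock_nested--;
		return;
	}

	/* Release: publish this owner's stores before handing over. */
	atomic_store_explicit(&cpu_lock_owner, -1, memory_order_release);
}
```

Without the release store in `cpu_unlock()`, nothing would order the owner's critical-section writes before the lock becomes visible as free, which is the gap the commit message describes.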
  2. dump_stack: move cpu lock to printk.c

    dump_stack() implements its own cpu-reentrant spinning lock to
    best-effort serialize stack traces in the printk log. However,
    there are other functions (such as show_regs()) that can also
    benefit from this serialization.
    
    Move the cpu-reentrant spinning lock (cpu lock) into new helper
    functions printk_cpu_lock_irqsave()/printk_cpu_unlock_irqrestore()
    so that it is available for others as well. For !CONFIG_SMP the
    cpu lock is a NOP.
    
    Note that having multiple cpu locks in the system can easily
    lead to deadlock. Code needing a cpu lock should use the
    printk cpu lock, since the printk cpu lock could be acquired
    from any code and any context.
    
    Signed-off-by: John Ogness <john.ogness@linutronix.de>
    jogness authored and intel-lab-lkp committed Jun 7, 2021
  3. Add linux-next specific files for 20210607

    Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
    sfrothwell committed Jun 7, 2021
  4. kdump: use vmlinux_build_id to simplify

    We can use the vmlinux_build_id array here now instead of open coding it.
    This mostly consolidates code.
    
    Link: https://lkml.kernel.org/r/20210511003845.2429846-14-swboyd@chromium.org
    Signed-off-by: Stephen Boyd <swboyd@chromium.org>
    Cc: Jiri Olsa <jolsa@kernel.org>
    Cc: Alexei Starovoitov <ast@kernel.org>
    Cc: Jessica Yu <jeyu@kernel.org>
    Cc: Evan Green <evgreen@chromium.org>
    Cc: Hsin-Yi Wang <hsinyi@chromium.org>
    Cc: Dave Young <dyoung@redhat.com>
    Cc: Baoquan He <bhe@redhat.com>
    Cc: Vivek Goyal <vgoyal@redhat.com>
    Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
    Cc: Borislav Petkov <bp@alien8.de>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Petr Mladek <pmladek@suse.com>
    Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
    Cc: Sasha Levin <sashal@kernel.org>
    Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
    Cc: Steven Rostedt <rostedt@goodmis.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Will Deacon <will@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
    bebarino authored and sfrothwell committed Jun 7, 2021
  5. buildid: fix kernel-doc notation

    Kernel doc should use "Return:" instead of "Returns" to properly reflect
    the return values.
    
    Link: https://lkml.kernel.org/r/20210511003845.2429846-13-swboyd@chromium.org
    Signed-off-by: Stephen Boyd <swboyd@chromium.org>
    Cc: Jiri Olsa <jolsa@kernel.org>
    Cc: Alexei Starovoitov <ast@kernel.org>
    Cc: Jessica Yu <jeyu@kernel.org>
    Cc: Evan Green <evgreen@chromium.org>
    Cc: Hsin-Yi Wang <hsinyi@chromium.org>
    Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
    Cc: Baoquan He <bhe@redhat.com>
    Cc: Borislav Petkov <bp@alien8.de>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: Dave Young <dyoung@redhat.com>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Petr Mladek <pmladek@suse.com>
    Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
    Cc: Sasha Levin <sashal@kernel.org>
    Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
    Cc: Steven Rostedt <rostedt@goodmis.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Vivek Goyal <vgoyal@redhat.com>
    Cc: Will Deacon <will@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
    bebarino authored and sfrothwell committed Jun 7, 2021
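
For reference, the kernel-doc convention named in the "buildid: fix kernel-doc notation" commit looks roughly like this. The function `hex_digit()` is a hypothetical stand-in and does not come from lib/buildid.c; only the comment layout, with a `Return:` section, is the point.

```c
#include <assert.h>

/**
 * hex_digit() - convert a 4-bit value to a lowercase hex digit
 * @v: value to convert, expected to be in the range 0..15
 *
 * Return: the ASCII hex digit for @v, or '?' if @v is out of range.
 */
static char hex_digit(unsigned int v)
{
	static const char digits[] = "0123456789abcdef";

	return v < 16 ? digits[v] : '?';
}
```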
  6. buildid: mark some arguments const

    These arguments are never modified, so they can be marked const to
    indicate that.
    
    Link: https://lkml.kernel.org/r/20210511003845.2429846-12-swboyd@chromium.org
    Signed-off-by: Stephen Boyd <swboyd@chromium.org>
    Cc: Jiri Olsa <jolsa@kernel.org>
    Cc: Alexei Starovoitov <ast@kernel.org>
    Cc: Jessica Yu <jeyu@kernel.org>
    Cc: Evan Green <evgreen@chromium.org>
    Cc: Hsin-Yi Wang <hsinyi@chromium.org>
    Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
    Cc: Baoquan He <bhe@redhat.com>
    Cc: Borislav Petkov <bp@alien8.de>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: Dave Young <dyoung@redhat.com>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Petr Mladek <pmladek@suse.com>
    Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
    Cc: Sasha Levin <sashal@kernel.org>
    Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
    Cc: Steven Rostedt <rostedt@goodmis.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Vivek Goyal <vgoyal@redhat.com>
    Cc: Will Deacon <will@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
    bebarino authored and sfrothwell committed Jun 7, 2021
  7. scripts/decode_stacktrace.sh: indicate 'auto' can be used for base path

    Add "auto" to the usage message so that it's a little clearer that you can
    pass "auto" as the second argument.  When passing "auto" the script tries
    to find the base path automatically instead of requiring it be passed on
    the commandline.  Also use [<variable>] to indicate the variable argument
    and that it is optional so that we can differentiate from the literal
    "auto" that should be passed.
    
    Link: https://lkml.kernel.org/r/20210511003845.2429846-11-swboyd@chromium.org
    Signed-off-by: Stephen Boyd <swboyd@chromium.org>
    Cc: Jiri Olsa <jolsa@kernel.org>
    Cc: Alexei Starovoitov <ast@kernel.org>
    Cc: Jessica Yu <jeyu@kernel.org>
    Cc: Evan Green <evgreen@chromium.org>
    Cc: Hsin-Yi Wang <hsinyi@chromium.org>
    Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
    Cc: Sasha Levin <sashal@kernel.org>
    Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
    Cc: Baoquan He <bhe@redhat.com>
    Cc: Borislav Petkov <bp@alien8.de>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: Dave Young <dyoung@redhat.com>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Petr Mladek <pmladek@suse.com>
    Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
    Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
    Cc: Steven Rostedt <rostedt@goodmis.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Vivek Goyal <vgoyal@redhat.com>
    Cc: Will Deacon <will@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
    bebarino authored and sfrothwell committed Jun 7, 2021
  8. scripts/decode_stacktrace.sh: silence stderr messages from addr2line/nm

    Sometimes if you're using tools that have linked things improperly or have
    new features/sections that older tools don't expect, you'll see warnings
    printed to stderr.  We don't really care about these warnings, so let's
    just silence these messages to clean up the output of this script.
    
    Link: https://lkml.kernel.org/r/20210511003845.2429846-10-swboyd@chromium.org
    Signed-off-by: Stephen Boyd <swboyd@chromium.org>
    Cc: Jiri Olsa <jolsa@kernel.org>
    Cc: Alexei Starovoitov <ast@kernel.org>
    Cc: Jessica Yu <jeyu@kernel.org>
    Cc: Evan Green <evgreen@chromium.org>
    Cc: Hsin-Yi Wang <hsinyi@chromium.org>
    Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
    Cc: Sasha Levin <sashal@kernel.org>
    Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
    Cc: Baoquan He <bhe@redhat.com>
    Cc: Borislav Petkov <bp@alien8.de>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: Dave Young <dyoung@redhat.com>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Petr Mladek <pmladek@suse.com>
    Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
    Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
    Cc: Steven Rostedt <rostedt@goodmis.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Vivek Goyal <vgoyal@redhat.com>
    Cc: Will Deacon <will@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
    bebarino authored and sfrothwell committed Jun 7, 2021
  9. scripts/decode_stacktrace.sh: support debuginfod

    Now that stacktraces contain the build ID information we can update this
    script to use debuginfod-find to locate the debuginfo for the vmlinux and
    modules automatically.  This can replace the existing code that requires
    specifying a path to vmlinux or tries to find the vmlinux and modules
    automatically by using the release number.  Work it into the script as a
    fallback option if the vmlinux isn't specified on the commandline.
    
    Link: https://lkml.kernel.org/r/20210511003845.2429846-9-swboyd@chromium.org
    Signed-off-by: Stephen Boyd <swboyd@chromium.org>
    Cc: Jiri Olsa <jolsa@kernel.org>
    Cc: Alexei Starovoitov <ast@kernel.org>
    Cc: Jessica Yu <jeyu@kernel.org>
    Cc: Evan Green <evgreen@chromium.org>
    Cc: Hsin-Yi Wang <hsinyi@chromium.org>
    Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
    Cc: Sasha Levin <sashal@kernel.org>
    Cc: Petr Mladek <pmladek@suse.com>
    Cc: Steven Rostedt <rostedt@goodmis.org>
    Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Baoquan He <bhe@redhat.com>
    Cc: Borislav Petkov <bp@alien8.de>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: Dave Young <dyoung@redhat.com>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
    Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Vivek Goyal <vgoyal@redhat.com>
    Cc: Will Deacon <will@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
    bebarino authored and sfrothwell committed Jun 7, 2021
  10. x86/dumpstack: use %pSb/%pBb for backtrace printing

    Let's use the new printk formats to print the stacktrace entries when
    printing a backtrace to the kernel logs.  This will include any module's
    build ID[1] in it so that offline/crash debugging can easily locate the
    debuginfo for a module via something like debuginfod[2].
    
    Link: https://lkml.kernel.org/r/20210511003845.2429846-8-swboyd@chromium.org
    Link: https://fedoraproject.org/wiki/Releases/FeatureBuildId [1]
    Link: https://sourceware.org/elfutils/Debuginfod.html [2]
    Signed-off-by: Stephen Boyd <swboyd@chromium.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Borislav Petkov <bp@alien8.de>
    Cc: Jiri Olsa <jolsa@kernel.org>
    Cc: Alexei Starovoitov <ast@kernel.org>
    Cc: Jessica Yu <jeyu@kernel.org>
    Cc: Evan Green <evgreen@chromium.org>
    Cc: Hsin-Yi Wang <hsinyi@chromium.org>
    Cc: Petr Mladek <pmladek@suse.com>
    Cc: Steven Rostedt <rostedt@goodmis.org>
    Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Baoquan He <bhe@redhat.com>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: Dave Young <dyoung@redhat.com>
    Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
    Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
    Cc: Sasha Levin <sashal@kernel.org>
    Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
    Cc: Vivek Goyal <vgoyal@redhat.com>
    Cc: Will Deacon <will@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
    bebarino authored and sfrothwell committed Jun 7, 2021
  11. arm64: stacktrace: use %pSb for backtrace printing

    Let's use the new printk format to print the stacktrace entry when
    printing a backtrace to the kernel logs. This will include any module's
    build ID[1] in it so that offline/crash debugging can easily locate the
    debuginfo for a module via something like debuginfod[2].
    
    Link: https://lkml.kernel.org/r/20210511003845.2429846-7-swboyd@chromium.org
    Link: https://fedoraproject.org/wiki/Releases/FeatureBuildId [1]
    Link: https://sourceware.org/elfutils/Debuginfod.html [2]
    Signed-off-by: Stephen Boyd <swboyd@chromium.org>
    Acked-by: Catalin Marinas <catalin.marinas@arm.com>
    Cc: Will Deacon <will@kernel.org>
    Cc: Jiri Olsa <jolsa@kernel.org>
    Cc: Alexei Starovoitov <ast@kernel.org>
    Cc: Jessica Yu <jeyu@kernel.org>
    Cc: Evan Green <evgreen@chromium.org>
    Cc: Hsin-Yi Wang <hsinyi@chromium.org>
    Cc: Petr Mladek <pmladek@suse.com>
    Cc: Steven Rostedt <rostedt@goodmis.org>
    Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Baoquan He <bhe@redhat.com>
    Cc: Borislav Petkov <bp@alien8.de>
    Cc: Dave Young <dyoung@redhat.com>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
    Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
    Cc: Sasha Levin <sashal@kernel.org>
    Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Vivek Goyal <vgoyal@redhat.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
    bebarino authored and sfrothwell committed Jun 7, 2021
  12. module: fix build error when CONFIG_SYSFS is disabled

    Fix a build error when CONFIG_SYSFS is disabled:
    kernel/module.c:2805:8: error: implicit declaration of function `sect_empty'; did you mean `desc_empty'? [-Werror=implicit-function-declaration]
     2805 |   if (!sect_empty(sechdr) && sechdr->sh_type == SHT_NOTE &&
    
    Link: https://lkml.kernel.org/r/20210525105049.34804-1-cuibixuan@huawei.com
    Fixes: 9ee6682 ("module: add printk formats to add module build ID to stacktraces")
    Reported-by: Hulk Robot <hulkci@huawei.com>
    Signed-off-by: Bixuan Cui <cuibixuan@huawei.com>
    Signed-off-by: Stephen Boyd <swboyd@chromium.org>
    Acked-by: Jessica Yu <jeyu@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
    Bixuan Cui authored and sfrothwell committed Jun 7, 2021
  13. module-add-printk-formats-to-add-module-build-id-to-stacktraces-fix-fix

    make kallsyms_lookup_buildid() static
    
    warning: no previous prototype for 'kallsyms_lookup_buildid' [-Wmissing-prototypes]
    
    Cc: Stephen Boyd <swboyd@chromium.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
    akpm00 authored and sfrothwell committed Jun 7, 2021
  14. buildid: fix build when CONFIG_MODULES is not set

    Omit the static_assert() when CONFIG_MODULES is not set/enabled.
    Fixes these build errors:
    
    ../kernel/kallsyms.c: In function `__sprint_symbol':
    ../include/linux/kernel.h:53:43: error: dereferencing pointer to incomplete type `struct module'
     #define typeof_member(T, m) typeof(((T*)0)->m)
                                               ^
    ../include/linux/build_bug.h:78:41: error: static assertion failed: "sizeof(typeof_member(struct module, build_id)) == 20"
     #define __static_assert(expr, msg, ...) _Static_assert(expr, msg)
                                             ^
    ../kernel/kallsyms.c:454:4: note: in expansion of macro `static_assert'
        static_assert(sizeof(typeof_member(struct module, build_id)) == 20);
        ^~~~~~~~~~~~~
    
    Link: https://lkml.kernel.org/r/20210513171510.20328-1-rdunlap@infradead.org
    Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
    rddunlap authored and sfrothwell committed Jun 7, 2021
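
The failure fixed by "buildid: fix build when CONFIG_MODULES is not set" can be reproduced in a small standalone sketch. Everything here is a stand-in chosen for illustration: `HAVE_MODULES` plays the role of CONFIG_MODULES, the `struct module` layout is invented, and `typeof_member()` is redefined locally using the GCC/Clang `__typeof__` extension. Removing the `#define HAVE_MODULES` line leaves `struct module` incomplete, and the guard then keeps the assertion from being compiled.

```c
#include <assert.h>
#include <stddef.h>

#define BUILD_ID_SIZE 20
#define typeof_member(T, m) __typeof__(((T *)0)->m)

/* Stand-in for CONFIG_MODULES=y; delete this line to model =n. */
#define HAVE_MODULES 1

#ifdef HAVE_MODULES
/* Invented layout, just enough to carry a build ID. */
struct module {
	char name[56];
	unsigned char build_id[BUILD_ID_SIZE];
};
#else
struct module; /* incomplete type: sizeof on a member cannot compile */
#endif

/* Length of the build ID that a symbol printer would emit. */
static size_t module_build_id_len(void)
{
#ifdef HAVE_MODULES
	/* Only evaluated when struct module is a complete type. */
	static_assert(sizeof(typeof_member(struct module, build_id)) ==
		      BUILD_ID_SIZE, "unexpected build_id size");
#endif
	return BUILD_ID_SIZE;
}
```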
  15. module-add-printk-formats-to-add-module-build-id-to-stacktraces-fix

    fix build with CONFIG_MODULES=n, tweak code layout
    
    Cc: Stephen Boyd <swboyd@chromium.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
    akpm00 authored and sfrothwell committed Jun 7, 2021
  16. module: add printk formats to add module build ID to stacktraces

    Let's make kernel stacktraces easier to identify by including the build
    ID[1] of a module if the stacktrace is printing a symbol from a module.
    This makes it simpler for developers to locate a kernel module's full
    debuginfo for a particular stacktrace.  Combined with
    scripts/decode_stacktrace.sh, a developer can download the matching
    debuginfo from a debuginfod[2] server and find the exact file and line
    number for the functions plus offsets in a stacktrace that match the
    module.  This is especially useful for pstore crash debugging where the
    kernel crashes are recorded in something like console-ramoops and the
    recovery kernel/modules are different or the debuginfo doesn't exist on
    the device due to space concerns (the debuginfo can be too large for space
    limited devices).
    
    Originally, I put this on the %pS format, but that was quickly rejected
    given that %pS is used in other places such as ftrace where build IDs
    aren't meaningful.  There were some discussions on the list to put every
    module build ID into the "Modules linked in:" section of the stacktrace
    message but that quickly becomes very hard to read once you have more than
    three or four modules linked in.  It also provides too much information
    when we don't expect each module to be traversed in a stacktrace.  Having
    the build ID for modules that aren't important just makes things messy.
    Splitting it to multiple lines for each module quickly explodes the number
    of lines printed in an oops too, possibly wrapping the warning off the
    console.  And finally, trying to stash away each module used in a
    callstack to provide the ID of each symbol printed is cumbersome and would
    require changes to each architecture to stash away modules and return
    their build IDs once unwinding has completed.
    
    Instead, we opt for the simpler approach of introducing new printk formats
    '%pS[R]b' for "pointer symbolic backtrace with module build ID" and '%pBb'
    for "pointer backtrace with module build ID" and then updating the few
    places in the architecture layer where the stacktrace is printed to use
    this new format.
    
    Before:
    
     Call trace:
      lkdtm_WARNING+0x28/0x30 [lkdtm]
      direct_entry+0x16c/0x1b4 [lkdtm]
      full_proxy_write+0x74/0xa4
      vfs_write+0xec/0x2e8
    
    After:
    
     Call trace:
      lkdtm_WARNING+0x28/0x30 [lkdtm 6c2215028606bda50de823490723dc4bc5bf46f9]
      direct_entry+0x16c/0x1b4 [lkdtm 6c2215028606bda50de823490723dc4bc5bf46f9]
      full_proxy_write+0x74/0xa4
      vfs_write+0xec/0x2e8
    
    Link: https://lkml.kernel.org/r/20210511003845.2429846-6-swboyd@chromium.org
    Link: https://fedoraproject.org/wiki/Releases/FeatureBuildId [1]
    Link: https://sourceware.org/elfutils/Debuginfod.html [2]
    Signed-off-by: Stephen Boyd <swboyd@chromium.org>
    Cc: Jiri Olsa <jolsa@kernel.org>
    Cc: Alexei Starovoitov <ast@kernel.org>
    Cc: Jessica Yu <jeyu@kernel.org>
    Cc: Evan Green <evgreen@chromium.org>
    Cc: Hsin-Yi Wang <hsinyi@chromium.org>
    Cc: Petr Mladek <pmladek@suse.com>
    Cc: Steven Rostedt <rostedt@goodmis.org>
    Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
    Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
    Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Baoquan He <bhe@redhat.com>
    Cc: Borislav Petkov <bp@alien8.de>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: Dave Young <dyoung@redhat.com>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
    Cc: Sasha Levin <sashal@kernel.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Vivek Goyal <vgoyal@redhat.com>
    Cc: Will Deacon <will@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
    bebarino authored and sfrothwell committed Jun 7, 2021
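
The "Before"/"After" output in the "module: add printk formats to add module build ID to stacktraces" commit can be imitated in userspace. The sketch below does not use the kernel's vsprintf code; `format_stack_entry()` and its parameters are hypothetical, and it only reproduces the textual layout `symbol+0xoff/0xsize [module buildid]`, assuming a 20-byte build ID.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define BUILD_ID_LEN 20

/*
 * Format one backtrace entry into @buf. @mod is NULL for core kernel
 * symbols; @build_id is NULL when no build ID should be printed.
 * Returns the string length, or -1 if @buf is too small.
 */
static int format_stack_entry(char *buf, size_t len, const char *sym,
			      unsigned long off, unsigned long size,
			      const char *mod, const unsigned char *build_id)
{
	size_t pos, i;
	int n;

	assert(buf != NULL && sym != NULL);

	n = snprintf(buf, len, "%s+0x%lx/0x%lx", sym, off, size);
	if (n < 0 || (size_t)n >= len)
		return -1;
	pos = (size_t)n;
	if (!mod)
		return (int)pos;

	n = snprintf(buf + pos, len - pos, " [%s", mod);
	if (n < 0 || (size_t)n >= len - pos)
		return -1;
	pos += (size_t)n;

	if (build_id) {
		/* Room for ' ', 40 hex digits, then "]" and the NUL. */
		if (len - pos < 1 + 2 * BUILD_ID_LEN + 2)
			return -1;
		buf[pos++] = ' ';
		for (i = 0; i < BUILD_ID_LEN; i++, pos += 2)
			snprintf(buf + pos, 3, "%02x", build_id[i]);
	}

	if (len - pos < 2)
		return -1;
	buf[pos++] = ']';
	buf[pos] = '\0';
	return (int)pos;
}
```

Passing a NULL `build_id` yields the "Before" form for module symbols, which loosely mirrors how plain `%pS` callers such as ftrace keep their existing output.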
  17. dump_stack: add vmlinux build ID to stack traces

    Add the running kernel's build ID[1] to the stacktrace information header.
    This makes it simpler for developers to locate the vmlinux with full
    debuginfo for a particular kernel stacktrace.  Combined with
    scripts/decode_stacktrace.sh, a developer can download the correct
    vmlinux from a debuginfod[2] server and find the exact file and line
    number for the functions plus offsets in a stacktrace.
    
    This is especially useful for pstore crash debugging where the kernel
    crashes are recorded in the pstore logs and the recovery kernel is
    different or the debuginfo doesn't exist on the device due to space
    concerns (the data can be large and a security concern).  The stacktrace
    can be analyzed after the crash by using the build ID to find the matching
    vmlinux and understand where in the function something went wrong.
    
    Example stacktrace from lkdtm:
    
     WARNING: CPU: 4 PID: 3255 at drivers/misc/lkdtm/bugs.c:83 lkdtm_WARNING+0x28/0x30 [lkdtm]
     Modules linked in: lkdtm rfcomm algif_hash algif_skcipher af_alg xt_cgroup uinput xt_MASQUERADE
     CPU: 4 PID: 3255 Comm: bash Not tainted 5.11 #3 aa23f7a1231c229de205662d5a9e0d4c580f19a1
     Hardware name: Google Lazor (rev3+) with KB Backlight (DT)
     pstate: 00400009 (nzcv daif +PAN -UAO -TCO BTYPE=--)
     pc : lkdtm_WARNING+0x28/0x30 [lkdtm]
    
    The hex string aa23f7a1231c229de205662d5a9e0d4c580f19a1 is the build ID,
    following the kernel version number. Put it all behind a config option,
    STACKTRACE_BUILD_ID, so that kernel developers can remove this
    information if they decide it is too much.
    
    Link: https://lkml.kernel.org/r/20210511003845.2429846-5-swboyd@chromium.org
    Link: https://fedoraproject.org/wiki/Releases/FeatureBuildId [1]
    Link: https://sourceware.org/elfutils/Debuginfod.html [2]
    Signed-off-by: Stephen Boyd <swboyd@chromium.org>
    Cc: Jiri Olsa <jolsa@kernel.org>
    Cc: Alexei Starovoitov <ast@kernel.org>
    Cc: Jessica Yu <jeyu@kernel.org>
    Cc: Evan Green <evgreen@chromium.org>
    Cc: Hsin-Yi Wang <hsinyi@chromium.org>
    Cc: Petr Mladek <pmladek@suse.com>
    Cc: Steven Rostedt <rostedt@goodmis.org>
    Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Baoquan He <bhe@redhat.com>
    Cc: Borislav Petkov <bp@alien8.de>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: Dave Young <dyoung@redhat.com>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
    Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
    Cc: Sasha Levin <sashal@kernel.org>
    Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Vivek Goyal <vgoyal@redhat.com>
    Cc: Will Deacon <will@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
    bebarino authored and sfrothwell committed Jun 7, 2021
  18. buildid-stash-away-kernels-build-id-on-init-fix

    fix implicit declaration of function 'init_vmlinux_build_id'
    
    Link: https://lkml.kernel.org/r/CAE-0n51UjTbay8N9FXAyE7_aR2+ePrQnKSRJ0gbmRsXtcLBVaw@mail.gmail.com
    Reported-by: kernel test robot <lkp@intel.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
    bebarino authored and sfrothwell committed Jun 7, 2021
  19. buildid: stash away kernels build ID on init

    Parse the kernel's build ID at initialization so that other code can print
    a hex format string representation of the running kernel's build ID.  This
    will be used in the kdump and dump_stack code so that developers can
    easily locate the vmlinux debug symbols for a crash/stacktrace.
    
    Link: https://lkml.kernel.org/r/20210511003845.2429846-4-swboyd@chromium.org
    Signed-off-by: Stephen Boyd <swboyd@chromium.org>
    Acked-by: Baoquan He <bhe@redhat.com>
    Cc: Jiri Olsa <jolsa@kernel.org>
    Cc: Alexei Starovoitov <ast@kernel.org>
    Cc: Jessica Yu <jeyu@kernel.org>
    Cc: Evan Green <evgreen@chromium.org>
    Cc: Hsin-Yi Wang <hsinyi@chromium.org>
    Cc: Dave Young <dyoung@redhat.com>
    Cc: Vivek Goyal <vgoyal@redhat.com>
    Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
    Cc: Borislav Petkov <bp@alien8.de>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Petr Mladek <pmladek@suse.com>
    Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
    Cc: Sasha Levin <sashal@kernel.org>
    Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
    Cc: Steven Rostedt <rostedt@goodmis.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Will Deacon <will@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
    bebarino authored and sfrothwell committed Jun 7, 2021
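
The idea in "buildid: stash away kernels build ID on init" is to capture the build ID once and let later code render it as hex. A rough userspace sketch follows; `stash_build_id()`, `build_id_hex()` and the static storage are hypothetical names for illustration, and where the bytes come from at init time is left to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BUILD_ID_SIZE 20

/* Raw build ID captured once at initialization. */
static unsigned char stashed_id[BUILD_ID_SIZE];
static size_t stashed_len;

/* Called once at init with the parsed build ID bytes. */
static void stash_build_id(const unsigned char *id, size_t len)
{
	assert(id != NULL);
	if (len > BUILD_ID_SIZE)
		len = BUILD_ID_SIZE;
	memcpy(stashed_id, id, len);
	stashed_len = len;
}

/* Render the stashed ID as a NUL-terminated lowercase hex string. */
static void build_id_hex(char out[2 * BUILD_ID_SIZE + 1])
{
	static const char digits[] = "0123456789abcdef";
	size_t i;

	for (i = 0; i < stashed_len; i++) {
		out[2 * i] = digits[stashed_id[i] >> 4];
		out[2 * i + 1] = digits[stashed_id[i] & 0x0f];
	}
	out[2 * stashed_len] = '\0';
}
```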
  20. buildid: add API to parse build ID out of buffer

    Add an API that can parse the build ID out of a buffer, instead of a vma,
    to support printing a kernel module's build ID for stack traces.
    
    Link: https://lkml.kernel.org/r/20210511003845.2429846-3-swboyd@chromium.org
    Signed-off-by: Stephen Boyd <swboyd@chromium.org>
    Cc: Jiri Olsa <jolsa@kernel.org>
    Cc: Alexei Starovoitov <ast@kernel.org>
    Cc: Jessica Yu <jeyu@kernel.org>
    Cc: Evan Green <evgreen@chromium.org>
    Cc: Hsin-Yi Wang <hsinyi@chromium.org>
    Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
    Cc: Baoquan He <bhe@redhat.com>
    Cc: Borislav Petkov <bp@alien8.de>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: Dave Young <dyoung@redhat.com>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Petr Mladek <pmladek@suse.com>
    Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
    Cc: Sasha Levin <sashal@kernel.org>
    Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
    Cc: Steven Rostedt <rostedt@goodmis.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Vivek Goyal <vgoyal@redhat.com>
    Cc: Will Deacon <will@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
    bebarino authored and sfrothwell committed Jun 7, 2021
  21. buildid: only consider GNU notes for build ID parsing

    Patch series "Add build ID to stacktraces", v6.
    
    This series adds the kernel's build ID[1] to the stacktrace header printed
    in oops messages, warnings, etc.  and the build ID for any module that
    appears in the stacktrace after the module name.  The goal is to make the
    stacktrace more self-contained and descriptive by including the relevant
    build IDs in the kernel logs when something goes wrong.  This can be used
    by post-processing tools like scripts/decode_stacktrace.sh and kernel
    developers to easily locate the debug info associated with a kernel crash
    and line up what line and file things started falling apart at.
    
    To show how this can be used I've included a patch to decode_stacktrace.sh
    that downloads the debuginfo from a debuginfod server.  This also includes
    some patches to make the buildid.c file use more const arguments and
    consolidate logic into buildid.c from kdump.  These are left to the end as
    they were mostly cleanup patches.
    
    Here's an example lkdtm stacktrace on arm64.
    
     WARNING: CPU: 4 PID: 3255 at drivers/misc/lkdtm/bugs.c:83 lkdtm_WARNING+0x28/0x30 [lkdtm]
     Modules linked in: lkdtm rfcomm algif_hash algif_skcipher af_alg xt_cgroup uinput xt_MASQUERADE
     CPU: 4 PID: 3255 Comm: bash Not tainted 5.11 #3 aa23f7a1231c229de205662d5a9e0d4c580f19a1
     Hardware name: Google Lazor (rev3+) with KB Backlight (DT)
     pstate: 00400009 (nzcv daif +PAN -UAO -TCO BTYPE=--)
     pc : lkdtm_WARNING+0x28/0x30 [lkdtm]
     lr : lkdtm_do_action+0x24/0x40 [lkdtm]
     sp : ffffffc0134fbca0
     x29: ffffffc0134fbca0 x28: ffffff92d53ba240
     x27: 0000000000000000 x26: 0000000000000000
     x25: 0000000000000000 x24: ffffffe3622352c0
     x23: 0000000000000020 x22: ffffffe362233366
     x21: ffffffe3622352e0 x20: ffffffc0134fbde0
     x19: 0000000000000008 x18: 0000000000000000
     x17: ffffff929b6536fc x16: 0000000000000000
     x15: 0000000000000000 x14: 0000000000000012
     x13: ffffffe380ed892c x12: ffffffe381d05068
     x11: 0000000000000000 x10: 0000000000000000
     x9 : 0000000000000001 x8 : ffffffe362237000
     x7 : aaaaaaaaaaaaaaaa x6 : 0000000000000000
     x5 : 0000000000000000 x4 : 0000000000000001
     x3 : 0000000000000008 x2 : ffffff93fef25a70
     x1 : ffffff93fef15788 x0 : ffffffe3622352e0
     Call trace:
      lkdtm_WARNING+0x28/0x30 [lkdtm ed5019fdf5e53be37cb1ba7899292d7e143b259e]
      direct_entry+0x16c/0x1b4 [lkdtm ed5019fdf5e53be37cb1ba7899292d7e143b259e]
      full_proxy_write+0x74/0xa4
      vfs_write+0xec/0x2e8
      ksys_write+0x84/0xf0
      __arm64_sys_write+0x24/0x30
      el0_svc_common+0xf4/0x1c0
      do_el0_svc_compat+0x28/0x3c
      el0_svc_compat+0x10/0x1c
      el0_sync_compat_handler+0xa8/0xcc
      el0_sync_compat+0x178/0x180
     ---[ end trace 3d95032303e59e68 ]---
    
    This patch (of 13):
    
    Some kernel elf files have various notes that also happen to have an elf
    note type of '3', which matches NT_GNU_BUILD_ID but the note name isn't
    "GNU".  For example, this note trips up the existing logic:
    
     Owner  Data size   Description
     Xen    0x00000008  Unknown note type: (0x00000003) description data: 00 00 00 ffffff80 ffffffff ffffffff ffffffff ffffffff
    
    Let's make sure that it is a GNU note when parsing the build ID so that we
    can use this function to parse a vmlinux's build ID too.
    
    Link: https://lkml.kernel.org/r/20210511003845.2429846-1-swboyd@chromium.org
    Link: https://lkml.kernel.org/r/20210511003845.2429846-2-swboyd@chromium.org
    Fixes: bd7525d ("bpf: Move stack_map_get_build_id into lib")
    Signed-off-by: Stephen Boyd <swboyd@chromium.org>
    Reported-by: Petr Mladek <pmladek@suse.com>
    Tested-by: Petr Mladek <pmladek@suse.com>
    Cc: Jiri Olsa <jolsa@kernel.org>
    Cc: Alexei Starovoitov <ast@kernel.org>
    Cc: Jessica Yu <jeyu@kernel.org>
    Cc: Evan Green <evgreen@chromium.org>
    Cc: Hsin-Yi Wang <hsinyi@chromium.org>
    Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
    Cc: Baoquan He <bhe@redhat.com>
    Cc: Borislav Petkov <bp@alien8.de>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: Dave Young <dyoung@redhat.com>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
    Cc: Sasha Levin <sashal@kernel.org>
    Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
    Cc: Steven Rostedt <rostedt@goodmis.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Vivek Goyal <vgoyal@redhat.com>
    Cc: Will Deacon <will@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
    bebarino authored and sfrothwell committed Jun 7, 2021
  22. mm: fix spelling mistakes in header files

    Fix some spelling mistakes in comments:
    successfull ==> successful
    potentialy ==> potentially
    alloced ==> allocated
    indicies ==> indices
    wont ==> won't
    resposible ==> responsible
    dirtyness ==> dirtiness
    droppped ==> dropped
    alread ==> already
    occured ==> occurred
    interupts ==> interrupts
    extention ==> extension
    slighly ==> slightly
    Dont't ==> Don't
    
    Link: https://lkml.kernel.org/r/20210531034849.9549-2-thunder.leizhen@huawei.com
    Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
    Cc: Jerome Glisse <jglisse@redhat.com>
    Cc: Mike Kravetz <mike.kravetz@oracle.com>
    Cc: Dennis Zhou <dennis@kernel.org>
    Cc: Tejun Heo <tj@kernel.org>
    Cc: Christoph Lameter <cl@linux.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
    Zhen Lei authored and sfrothwell committed Jun 7, 2021
  23. secretmem: test: add basic selftest for memfd_secret(2)

    The test verifies that a file descriptor created with memfd_secret() does
    not allow read/write operations, that secret memory mappings respect
    RLIMIT_MEMLOCK, and that remote accesses with process_vm_readv() and
    ptrace() to the secret memory fail.
    
    Link: https://lkml.kernel.org/r/20210518072034.31572-8-rppt@kernel.org
    Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
    Acked-by: James Bottomley <James.Bottomley@HansenPartnership.com>
    Cc: Alexander Viro <viro@zeniv.linux.org.uk>
    Cc: Andy Lutomirski <luto@kernel.org>
    Cc: Arnd Bergmann <arnd@arndb.de>
    Cc: Borislav Petkov <bp@alien8.de>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: Christopher Lameter <cl@linux.com>
    Cc: Dan Williams <dan.j.williams@intel.com>
    Cc: Dave Hansen <dave.hansen@linux.intel.com>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: Elena Reshetova <elena.reshetova@intel.com>
    Cc: Hagen Paul Pfeifer <hagen@jauu.net>
    Cc: "H. Peter Anvin" <hpa@zytor.com>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: James Bottomley <jejb@linux.ibm.com>
    Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
    Cc: Mark Rutland <mark.rutland@arm.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Michael Kerrisk <mtk.manpages@gmail.com>
    Cc: Palmer Dabbelt <palmer@dabbelt.com>
    Cc: Palmer Dabbelt <palmerdabbelt@google.com>
    Cc: Paul Walmsley <paul.walmsley@sifive.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
    Cc: Roman Gushchin <guro@fb.com>
    Cc: Shakeel Butt <shakeelb@google.com>
    Cc: Shuah Khan <shuah@kernel.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Tycho Andersen <tycho@tycho.ws>
    Cc: Will Deacon <will@kernel.org>
    Cc: kernel test robot <lkp@intel.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
    rppt authored and sfrothwell committed Jun 7, 2021
  24. arch, mm: wire up memfd_secret system call where relevant

    Wire up the memfd_secret system call on architectures that define
    ARCH_HAS_SET_DIRECT_MAP, namely arm64, risc-v and x86.
    
    Link: https://lkml.kernel.org/r/20210518072034.31572-7-rppt@kernel.org
    Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
    Acked-by: Palmer Dabbelt <palmerdabbelt@google.com>
    Acked-by: Arnd Bergmann <arnd@arndb.de>
    Acked-by: Catalin Marinas <catalin.marinas@arm.com>
    Acked-by: David Hildenbrand <david@redhat.com>
    Acked-by: James Bottomley <James.Bottomley@HansenPartnership.com>
    Cc: Alexander Viro <viro@zeniv.linux.org.uk>
    Cc: Andy Lutomirski <luto@kernel.org>
    Cc: Borislav Petkov <bp@alien8.de>
    Cc: Christopher Lameter <cl@linux.com>
    Cc: Dan Williams <dan.j.williams@intel.com>
    Cc: Dave Hansen <dave.hansen@linux.intel.com>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: Elena Reshetova <elena.reshetova@intel.com>
    Cc: Hagen Paul Pfeifer <hagen@jauu.net>
    Cc: "H. Peter Anvin" <hpa@zytor.com>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: James Bottomley <jejb@linux.ibm.com>
    Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
    Cc: Mark Rutland <mark.rutland@arm.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Michael Kerrisk <mtk.manpages@gmail.com>
    Cc: Palmer Dabbelt <palmer@dabbelt.com>
    Cc: Paul Walmsley <paul.walmsley@sifive.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
    Cc: Roman Gushchin <guro@fb.com>
    Cc: Shakeel Butt <shakeelb@google.com>
    Cc: Shuah Khan <shuah@kernel.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Tycho Andersen <tycho@tycho.ws>
    Cc: Will Deacon <will@kernel.org>
    Cc: kernel test robot <lkp@intel.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
    rppt authored and sfrothwell committed Jun 7, 2021
  25. PM: hibernate: disable when there are active secretmem users

    It is unsafe to save secretmem areas to the hibernation snapshot: they
    would be visible after resume, which would defeat the purpose of secret
    memory mappings.
    
    Prevent hibernation whenever there are active secret memory users.
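
    One way to picture this rule is a user counter that the hibernation
    check consults.  The sketch below is a userspace model using C11
    atomics; every toy_* name is hypothetical and it is not the kernel
    implementation.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Number of live secretmem users in this toy model. */
static atomic_int toy_secretmem_users;

/* Called when a secret memory file is created. */
static void toy_secretmem_open(void)
{
	atomic_fetch_add(&toy_secretmem_users, 1);
}

/* Called when the last reference to that file goes away. */
static void toy_secretmem_release(void)
{
	atomic_fetch_sub(&toy_secretmem_users, 1);
}

static bool toy_secretmem_active(void)
{
	return atomic_load(&toy_secretmem_users) > 0;
}

/* Hibernation is refused while any secret memory user exists. */
static bool toy_hibernation_available(void)
{
	return !toy_secretmem_active();
}
```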
    
    Link: https://lkml.kernel.org/r/20210518072034.31572-6-rppt@kernel.org
    Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
    Acked-by: David Hildenbrand <david@redhat.com>
    Acked-by: James Bottomley <James.Bottomley@HansenPartnership.com>
    Cc: Alexander Viro <viro@zeniv.linux.org.uk>
    Cc: Andy Lutomirski <luto@kernel.org>
    Cc: Arnd Bergmann <arnd@arndb.de>
    Cc: Borislav Petkov <bp@alien8.de>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: Christopher Lameter <cl@linux.com>
    Cc: Dan Williams <dan.j.williams@intel.com>
    Cc: Dave Hansen <dave.hansen@linux.intel.com>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: Elena Reshetova <elena.reshetova@intel.com>
    Cc: Hagen Paul Pfeifer <hagen@jauu.net>
    Cc: "H. Peter Anvin" <hpa@zytor.com>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: James Bottomley <jejb@linux.ibm.com>
    Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
    Cc: Mark Rutland <mark.rutland@arm.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Michael Kerrisk <mtk.manpages@gmail.com>
    Cc: Palmer Dabbelt <palmer@dabbelt.com>
    Cc: Palmer Dabbelt <palmerdabbelt@google.com>
    Cc: Paul Walmsley <paul.walmsley@sifive.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
    Cc: Roman Gushchin <guro@fb.com>
    Cc: Shakeel Butt <shakeelb@google.com>
    Cc: Shuah Khan <shuah@kernel.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Tycho Andersen <tycho@tycho.ws>
    Cc: Will Deacon <will@kernel.org>
    Cc: kernel test robot <lkp@intel.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
    rppt authored and sfrothwell committed Jun 7, 2021
  26. mm-introduce-memfd_secret-system-call-to-create-secret-memory-areas-fix

    suppress Kconfig whine
    
    Cc: Mike Rapoport <rppt@linux.ibm.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
    akpm00 authored and sfrothwell committed Jun 7, 2021
  27. mm: introduce memfd_secret system call to create "secret" memory areas

    Introduce the "memfd_secret" system call, which creates memory areas that
    are visible only in the context of the owning process and are mapped
    neither into other processes nor into the kernel page tables.
    
    The secretmem feature is off by default and the user must explicitly
    enable it at boot time.
    
    Once secretmem is enabled, the user will be able to create a file
    descriptor using the memfd_secret() system call.  The memory areas created
    by mmap() calls from this file descriptor will be unmapped from the kernel
    direct map and they will be only mapped in the page table of the processes
    that have access to the file descriptor.
    
    Secretmem is designed to provide the following protections:
    
    * Enhanced protection (in conjunction with all the other in-kernel
      attack prevention systems) against ROP attacks.  Secretmem makes
      "simple" ROP insufficient to perform exfiltration, which increases the
      required complexity of the attack.  Along with other protections like
      the kernel stack size limit and address space layout randomization, which
      make finding gadgets really hard, the absence of any in-kernel primitive
      for accessing secret memory means a one-gadget ROP attack can't work.
      Since the only way to access secret memory is to reconstruct the missing
      mapping entry, the attacker has to recover the physical page and insert
      a PTE pointing to it in the kernel and then retrieve the contents.  That
      takes at least three gadgets which is a level of difficulty beyond most
      standard attacks.
    
    * Prevent cross-process secret userspace memory exposures.  Once the
      secret memory is allocated, the user can't accidentally pass it into the
      kernel to be transmitted somewhere.  The secretmem pages cannot be
      accessed via the direct map and they are disallowed in GUP.
    
    * Harden against exploited kernel flaws.  In order to access secretmem,
      a kernel-side attack would need to either walk the page tables and
      create new ones, or spawn a new privileged userspace process to perform
      secrets exfiltration using ptrace.
    
    File descriptor based memory has several advantages over the
    "traditional" mm interfaces, such as mlock(), mprotect(), madvise().  The
    file descriptor approach allows explicit and controlled sharing of the
    memory areas and allows sealing of operations.  Besides, file descriptor
    based memory paves the way for VMMs to remove the secret memory range from
    the userspace hypervisor process, for instance QEMU.  Andy Lutomirski says:
    
      "Getting fd-backed memory into a guest will take some possibly major
      work in the kernel, but getting vma-backed memory into a guest without
      mapping it in the host user address space seems much, much worse."
    
    memfd_secret() is made a dedicated system call rather than an extension to
    memfd_create() because its purpose is to allow the user to create more
    secure memory mappings rather than to simply allow file based access to
    the memory.  Nowadays the cost of a new system call is negligible, while it
    is much simpler for userspace to deal with clear-cut system calls than with
    a multiplexer or an overloaded syscall.  Moreover, the initial
    implementation of memfd_secret() is completely distinct from
    memfd_create(), so there is not much sense in overloading memfd_create() to
    begin with.  If code sharing between these implementations is needed, it
    can be easily achieved without adjusting user-visible APIs.
    
    The secret memory remains accessible in the process context using uaccess
    primitives, but it is not exposed to the kernel otherwise; secret memory
    areas are removed from the direct map and functions in the
    follow_page()/get_user_page() family will refuse to return a page that
    belongs to the secret memory area.
    
    Once a use case requires exposing secretmem to the kernel, it will be an
    opt-in request in the system call flags, so that the user has to decide
    what data can be exposed to the kernel.
    
    Removing pages from the direct map may fragment it on architectures that
    use large pages to map physical memory, which affects system
    performance.  However, the original Kconfig text for
    CONFIG_DIRECT_GBPAGES said that gigabyte pages in the direct map "...  can
    improve the kernel's performance a tiny bit ..." (commit 00d1c5e
    ("x86: add gbpages switches")) and the recent report [1] showed that "...
    although 1G mappings are a good default choice, there is no compelling
    evidence that it must be the only choice".  Hence, it is sufficient to
    have secretmem disabled by default with the ability of a system
    administrator to enable it at boot time.
    
    Pages in the secretmem regions are unevictable and unmovable to avoid
    accidental exposure of the sensitive data via swap or during page
    migration.
    
    Since the secretmem mappings are locked in memory they cannot exceed
    RLIMIT_MEMLOCK.  Since these mappings are already locked independently
    from mlock(), an attempt to mlock()/munlock() a secretmem range will fail
    and mlockall()/munlockall() will ignore secretmem mappings.
    
    However, unlike mlock()ed memory, secretmem currently behaves more like
    long-term GUP: secretmem mappings are unmovable mappings directly consumed
    by user space.  With default limits, there is no excessive use of
    secretmem and it poses no real problem in combination with
    ZONE_MOVABLE/CMA, but in the future this should be addressed to allow
    balanced use of large amounts of secretmem along with ZONE_MOVABLE/CMA.
    
    A page that was a part of the secret memory area is cleared when it is
    freed to ensure the data is not exposed to the next user of that page.
    
    The following example demonstrates creation of a secret mapping (error
    handling is omitted):
    
    	fd = memfd_secret(0);
    	ftruncate(fd, MAP_SIZE);
    	ptr = mmap(NULL, MAP_SIZE, PROT_READ | PROT_WRITE,
    		   MAP_SHARED, fd, 0);
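
    The same sequence with the omitted error handling filled in might look
    like the sketch below.  It rests on assumptions: secret_map() is a
    hypothetical helper, the system call is invoked through syscall(2), and
    the fallback number 447 is assumed to match the running kernel when the
    libc headers do not define SYS_memfd_secret.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef SYS_memfd_secret
#define SYS_memfd_secret 447	/* assumption: check your kernel headers */
#endif

/*
 * Hypothetical helper: create a secret mapping of size bytes.
 * Returns the mapping and stores the descriptor in *fd_out, or returns
 * NULL with errno set.
 */
static void *secret_map(size_t size, int *fd_out)
{
	void *ptr;
	int fd, saved;

	fd = (int)syscall(SYS_memfd_secret, 0u);
	if (fd < 0)
		return NULL;	/* e.g. ENOSYS if secretmem is unavailable */

	if (ftruncate(fd, (off_t)size) < 0)
		goto err;

	/* The mapping counts against RLIMIT_MEMLOCK, so mmap() may fail. */
	ptr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (ptr == MAP_FAILED)
		goto err;

	*fd_out = fd;
	return ptr;

err:
	saved = errno;
	close(fd);
	errno = saved;
	return NULL;
}
```

    A caller would munmap() the region and close() the descriptor when the
    secret is no longer needed.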
    
    [1] https://lore.kernel.org/linux-mm/213b4567-46ce-f116-9cdf-bbd0c884eb3c@linux.intel.com/
    
    Link: https://lkml.kernel.org/r/20210518072034.31572-5-rppt@kernel.org
    Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
    Acked-by: Hagen Paul Pfeifer <hagen@jauu.net>
    Acked-by: James Bottomley <James.Bottomley@HansenPartnership.com>
    Cc: Alexander Viro <viro@zeniv.linux.org.uk>
    Cc: Andy Lutomirski <luto@kernel.org>
    Cc: Arnd Bergmann <arnd@arndb.de>
    Cc: Borislav Petkov <bp@alien8.de>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: Christopher Lameter <cl@linux.com>
    Cc: Dan Williams <dan.j.williams@intel.com>
    Cc: Dave Hansen <dave.hansen@linux.intel.com>
    Cc: Elena Reshetova <elena.reshetova@intel.com>
    Cc: "H. Peter Anvin" <hpa@zytor.com>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: James Bottomley <jejb@linux.ibm.com>
    Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Mark Rutland <mark.rutland@arm.com>
    Cc: Michael Kerrisk <mtk.manpages@gmail.com>
    Cc: Palmer Dabbelt <palmer@dabbelt.com>
    Cc: Palmer Dabbelt <palmerdabbelt@google.com>
    Cc: Paul Walmsley <paul.walmsley@sifive.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
    Cc: Roman Gushchin <guro@fb.com>
    Cc: Shakeel Butt <shakeelb@google.com>
    Cc: Shuah Khan <shuah@kernel.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Tycho Andersen <tycho@tycho.ws>
    Cc: Will Deacon <will@kernel.org>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: kernel test robot <lkp@intel.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
    rppt authored and sfrothwell committed Jun 7, 2021
  28. set_memory: allow querying whether set_direct_map_*() is actually enabled
    
    On arm64, set_direct_map_*() functions may return 0 without actually
    changing the linear map.  This behaviour can be controlled using kernel
    parameters, so we need a way to determine at runtime whether calls to
    set_direct_map_invalid_noflush() and set_direct_map_default_noflush() have
    any effect.
    
    Extend the set_memory API with a can_set_direct_map() function that
    reports whether calling set_direct_map_*() will actually change the page
    table.  Replace several open-coded checks in arm64 with the new function
    and provide a generic stub for architectures that always
    modify page tables upon calls to set_direct_map APIs.
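
    The problem and the query can be modelled in plain C as below.  This
    is a userspace sketch, not kernel code: all toy_* names are
    hypothetical, and the two flags only stand in for whatever kernel
    parameters control the behaviour on arm64.

```c
#include <stdbool.h>

/* Stand-ins for boot-time settings; the names are illustrative only. */
static bool toy_rodata_full;
static bool toy_debug_pagealloc;

/* Model of the query: will a direct map update really take effect? */
static bool toy_can_set_direct_map(void)
{
	return toy_rodata_full || toy_debug_pagealloc;
}

/* Model of a set_direct_map_*() call that returns 0 even as a no-op. */
static int toy_set_direct_map_invalid(bool *page_mapped)
{
	if (!toy_can_set_direct_map())
		return 0;	/* success reported, nothing changed */
	*page_mapped = false;
	return 0;
}

/* A caller that must really unmap asks first instead of trusting the 0. */
static int toy_hide_page(bool *page_mapped)
{
	if (!toy_can_set_direct_map())
		return -1;
	return toy_set_direct_map_invalid(page_mapped);
}
```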
    
    [arnd@arndb.de: arm64: kfence: fix header inclusion ]
    Link: https://lkml.kernel.org/r/20210518072034.31572-4-rppt@kernel.org
    Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
    Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
    Reviewed-by: David Hildenbrand <david@redhat.com>
    Acked-by: James Bottomley <James.Bottomley@HansenPartnership.com>
    Cc: Alexander Viro <viro@zeniv.linux.org.uk>
    Cc: Andy Lutomirski <luto@kernel.org>
    Cc: Arnd Bergmann <arnd@arndb.de>
    Cc: Borislav Petkov <bp@alien8.de>
    Cc: Christopher Lameter <cl@linux.com>
    Cc: Dan Williams <dan.j.williams@intel.com>
    Cc: Dave Hansen <dave.hansen@linux.intel.com>
    Cc: Elena Reshetova <elena.reshetova@intel.com>
    Cc: Hagen Paul Pfeifer <hagen@jauu.net>
    Cc: "H. Peter Anvin" <hpa@zytor.com>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: James Bottomley <jejb@linux.ibm.com>
    Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
    Cc: Mark Rutland <mark.rutland@arm.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Michael Kerrisk <mtk.manpages@gmail.com>
    Cc: Palmer Dabbelt <palmer@dabbelt.com>
    Cc: Palmer Dabbelt <palmerdabbelt@google.com>
    Cc: Paul Walmsley <paul.walmsley@sifive.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
    Cc: Roman Gushchin <guro@fb.com>
    Cc: Shakeel Butt <shakeelb@google.com>
    Cc: Shuah Khan <shuah@kernel.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Tycho Andersen <tycho@tycho.ws>
    Cc: Will Deacon <will@kernel.org>
    Cc: kernel test robot <lkp@intel.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
    rppt authored and sfrothwell committed Jun 7, 2021
  29. riscv/Kconfig: make direct map manipulation options depend on MMU

    ARCH_HAS_SET_DIRECT_MAP and ARCH_HAS_SET_MEMORY configuration options have
    no meaning when CONFIG_MMU is disabled and there is no point in enabling
    them for the nommu case.
    
    Add an explicit dependency on MMU for these options.
    
    Link: https://lkml.kernel.org/r/20210518072034.31572-3-rppt@kernel.org
    Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
    Reported-by: kernel test robot <lkp@intel.com>
    Reviewed-by: David Hildenbrand <david@redhat.com>
    Acked-by: James Bottomley <James.Bottomley@HansenPartnership.com>
    Cc: Alexander Viro <viro@zeniv.linux.org.uk>
    Cc: Andy Lutomirski <luto@kernel.org>
    Cc: Arnd Bergmann <arnd@arndb.de>
    Cc: Borislav Petkov <bp@alien8.de>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: Christopher Lameter <cl@linux.com>
    Cc: Dan Williams <dan.j.williams@intel.com>
    Cc: Dave Hansen <dave.hansen@linux.intel.com>
    Cc: Elena Reshetova <elena.reshetova@intel.com>
    Cc: Hagen Paul Pfeifer <hagen@jauu.net>
    Cc: "H. Peter Anvin" <hpa@zytor.com>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: James Bottomley <jejb@linux.ibm.com>
    Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
    Cc: Mark Rutland <mark.rutland@arm.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Michael Kerrisk <mtk.manpages@gmail.com>
    Cc: Palmer Dabbelt <palmer@dabbelt.com>
    Cc: Palmer Dabbelt <palmerdabbelt@google.com>
    Cc: Paul Walmsley <paul.walmsley@sifive.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
    Cc: Roman Gushchin <guro@fb.com>
    Cc: Shakeel Butt <shakeelb@google.com>
    Cc: Shuah Khan <shuah@kernel.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Tycho Andersen <tycho@tycho.ws>
    Cc: Will Deacon <will@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
    rppt authored and sfrothwell committed Jun 7, 2021
  30. mmap: make mlock_future_check() global

    Patch series "mm: introduce memfd_secret system call to create "secret" memory areas", v20.
    
    This is an implementation of "secret" mappings backed by a file
    descriptor.
    
    The file descriptor backing secret memory mappings is created using a
    dedicated memfd_secret system call.  The desired protection mode for the
    memory is configured using the flags parameter of the system call.  The mmap()
    of the file descriptor created with memfd_secret() will create a "secret"
    memory mapping.  The pages in that mapping will be marked as not present
    in the direct map and will be present only in the page table of the owning
    mm.
    
    Although normally Linux userspace mappings are protected from other users,
    such secret mappings are useful for environments where a hostile tenant is
    trying to trick the kernel into giving them access to other tenants'
    mappings.
    
    It's designed to provide the following protections:
    
    * Enhanced protection (in conjunction with all the other in-kernel
      attack prevention systems) against ROP attacks.  Secretmem makes
      "simple" ROP insufficient to perform exfiltration, which increases the
      required complexity of the attack.  Along with other protections like
      the kernel stack size limit and address space layout randomization, which
      make finding gadgets really hard, the absence of any in-kernel primitive
      for accessing secret memory means a one-gadget ROP attack can't work.
      Since the only way to access secret memory is to reconstruct the missing
      mapping entry, the attacker has to recover the physical page and insert
      a PTE pointing to it in the kernel and then retrieve the contents.  That
      takes at least three gadgets which is a level of difficulty beyond most
      standard attacks.
    
    * Prevent cross-process secret userspace memory exposures.  Once the
      secret memory is allocated, the user can't accidentally pass it into the
      kernel to be transmitted somewhere.  The secretmem pages cannot be
      accessed via the direct map and they are disallowed in GUP.
    
    * Harden against exploited kernel flaws.  In order to access secretmem,
      a kernel-side attack would need to either walk the page tables and
      create new ones, or spawn a new privileged userspace process to perform
      secrets exfiltration using ptrace.
    
    In the future the secret mappings may be used as a means to protect guest
    memory in a virtual machine host.
    
    For demonstration of secret memory usage we've created a userspace library
    
    https://git.kernel.org/pub/scm/linux/kernel/git/jejb/secret-memory-preloader.git
    
    that does two things.  First, it acts as a preloader for openssl,
    redirecting all OPENSSL_malloc calls to secret memory so that any secret
    keys are protected automatically.  Second, it exposes the API to users
    who need it.  We anticipate that a lot of the
    use cases would be like the openssl one: many toolkits that deal with
    secret keys already have special handling for the memory to try to give
    them greater protection, so this would simply be pluggable into the
    toolkits without any need for user application modification.
    
    Hiding secret memory mappings behind an anonymous file allows usage of the
    page cache for tracking pages allocated for the "secret" mappings as well
    as using address_space_operations for e.g.  page migration callbacks.
    
    The anonymous file may also be used implicitly, like hugetlb files, to
    implement mmap(MAP_SECRET) and use the secret memory areas with "native"
    mm ABIs in the future.
    
    Removing pages from the direct map may fragment it on architectures that
    use large pages to map physical memory, which affects system
    performance.  However, the original Kconfig text for
    CONFIG_DIRECT_GBPAGES said that gigabyte pages in the direct map "...  can
    improve the kernel's performance a tiny bit ..." (commit 00d1c5e
    ("x86: add gbpages switches")) and the recent report [1] showed that "...
    although 1G mappings are a good default choice, there is no compelling
    evidence that it must be the only choice".  Hence, it is sufficient to
    have secretmem disabled by default with the ability of a system
    administrator to enable it at boot time.
    
    In addition, there is also a long term goal to improve management of the
    direct map.
    
    [1] https://lore.kernel.org/linux-mm/213b4567-46ce-f116-9cdf-bbd0c884eb3c@linux.intel.com/
    
    This patch (of 7):
    
    It will be used by the upcoming secret memory implementation.
    
    Link: https://lkml.kernel.org/r/20210518072034.31572-1-rppt@kernel.org
    Link: https://lkml.kernel.org/r/20210518072034.31572-2-rppt@kernel.org
    Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
    Reviewed-by: David Hildenbrand <david@redhat.com>
    Acked-by: James Bottomley <James.Bottomley@HansenPartnership.com>
    Cc: Alexander Viro <viro@zeniv.linux.org.uk>
    Cc: Andy Lutomirski <luto@kernel.org>
    Cc: Arnd Bergmann <arnd@arndb.de>
    Cc: Borislav Petkov <bp@alien8.de>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: Christopher Lameter <cl@linux.com>
    Cc: Dan Williams <dan.j.williams@intel.com>
    Cc: Dave Hansen <dave.hansen@linux.intel.com>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: Elena Reshetova <elena.reshetova@intel.com>
    Cc: Hagen Paul Pfeifer <hagen@jauu.net>
    Cc: "H. Peter Anvin" <hpa@zytor.com>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: James Bottomley <jejb@linux.ibm.com>
    Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
    Cc: Mark Rutland <mark.rutland@arm.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Michael Kerrisk <mtk.manpages@gmail.com>
    Cc: Palmer Dabbelt <palmer@dabbelt.com>
    Cc: Palmer Dabbelt <palmerdabbelt@google.com>
    Cc: Paul Walmsley <paul.walmsley@sifive.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
    Cc: Roman Gushchin <guro@fb.com>
    Cc: Shakeel Butt <shakeelb@google.com>
    Cc: Shuah Khan <shuah@kernel.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Tycho Andersen <tycho@tycho.ws>
    Cc: Will Deacon <will@kernel.org>
    Cc: kernel test robot <lkp@intel.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
    rppt authored and sfrothwell committed Jun 7, 2021
  31. mm/slub: use stackdepot to save stack trace in objects-fix

    Paul reports [1] a lockdep splat: HARDIRQ-safe -> HARDIRQ-unsafe lock
    order detected.  The kernel test robot reports [2]
    BUG:sleeping_function_called_from_invalid_context_at_mm/page_alloc.c
    
    The stack trace might be saved from contexts where we can't block so GFP_KERNEL
    is unsafe. So use GFP_NOWAIT. Under memory pressure we might thus fail to save
    some new unique stack, but that should be extremely rare.
    
    [1] https://lore.kernel.org/linux-mm/20210515204622.GA2672367@paulmck-ThinkPad-P17-Gen-1/
    [2] https://lore.kernel.org/linux-mm/20210516144152.GA25903@xsang-OptiPlex-9020/
    
    Link: https://lkml.kernel.org/r/20210516195150.26740-1-vbabka@suse.cz
    Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
    Reported-by: Paul E. McKenney <paulmck@kernel.org>
    Tested-by: Paul E. McKenney <paulmck@kernel.org>
    Reported-by: kernel test robot <oliver.sang@intel.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
    tehcaster authored and sfrothwell committed Jun 7, 2021
  32. slub: STACKDEPOT: rename save_stack_trace()

    save_stack_trace() already exists, so change the one in
    CONFIG_STACKDEPOT to be save_stack_depot_trace().
    
    Fixes this build error:
    
    ../mm/slub.c:607:29: error: conflicting types for `save_stack_trace'
     static depot_stack_handle_t save_stack_trace(gfp_t flags)
                                 ^~~~~~~~~~~~~~~~
    In file included from ../include/linux/page_ext.h:6:0,
                     from ../include/linux/mm.h:25,
                     from ../mm/slub.c:13:
    ../include/linux/stacktrace.h:86:13: note: previous declaration of `save_stack_trace' was here
     extern void save_stack_trace(struct stack_trace *trace);
                 ^~~~~~~~~~~~~~~~
    
    from this patch in mmotm:
      Subject: mm/slub: use stackdepot to save stack trace in objects
    
    Link: https://lkml.kernel.org/r/20210513051920.29320-1-rdunlap@infradead.org
    Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
    Acked-by: Oliver Glitta <glittao@gmail.com>
    Cc: Stephen Rothwell <sfr@canb.auug.org.au>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
    rddunlap authored and sfrothwell committed Jun 7, 2021
  33. mm/slub: use stackdepot to save stack trace in objects

    Many stack traces are similar so there are many similar arrays.
    Stackdepot saves each unique stack only once.
    
    Replace field addrs in struct track with depot_stack_handle_t handle.  Use
    stackdepot to save stack trace.
    
    The benefits are smaller memory overhead and the possibility to aggregate
    per-cache statistics in the future using the stackdepot handle instead of
    matching stacks manually.
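
    The idea of storing each unique stack once and keeping only a handle
    per object can be sketched as a toy depot.  This is a simplified
    userspace model: the toy_* names are hypothetical, and a linear scan
    stands in for the lookup that a real implementation would do more
    efficiently.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_DEPOT_SLOTS 64	/* toy capacity */
#define TOY_TRACE_MAX   16	/* longest trace the toy depot stores */

typedef uint32_t toy_handle_t;	/* 0 means "no trace saved" */

struct toy_trace {
	unsigned int nr;
	uintptr_t addrs[TOY_TRACE_MAX];
};

static struct toy_trace depot[TOY_DEPOT_SLOTS];
static unsigned int depot_used;

/* Save a trace, returning the handle of an identical trace if one exists. */
static toy_handle_t toy_depot_save(const uintptr_t *addrs, unsigned int nr)
{
	unsigned int i;

	if (nr == 0 || nr > TOY_TRACE_MAX)
		return 0;

	for (i = 0; i < depot_used; i++) {
		if (depot[i].nr == nr &&
		    !memcmp(depot[i].addrs, addrs, nr * sizeof(*addrs)))
			return i + 1;	/* already stored: reuse it */
	}
	if (depot_used == TOY_DEPOT_SLOTS)
		return 0;	/* depot full: the trace is not recorded */

	depot[depot_used].nr = nr;
	memcpy(depot[depot_used].addrs, addrs, nr * sizeof(*addrs));
	return ++depot_used;
}

/* Per-object record: one small handle instead of an address array. */
struct toy_track {
	toy_handle_t handle;
};
```

    Two objects allocated from the same call path end up with equal
    handles, which is what would make aggregating statistics by handle
    straightforward.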
    
    Link: https://lkml.kernel.org/r/20210414163434.4376-1-glittao@gmail.com
    Signed-off-by: Oliver Glitta <glittao@gmail.com>
    Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
    Acked-by: David Rientjes <rientjes@google.com>
    Cc: Christoph Lameter <cl@linux.com>
    Cc: Pekka Enberg <penberg@kernel.org>
    Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
    Oliver Glitta authored and sfrothwell committed Jun 7, 2021