John-Wood/Fork…

Commits on May 22, 2021

  1. MAINTAINERS: Add a new entry for the Brute LSM

    In order to maintain the code for the Brute LSM add a new entry to the
    maintainers list.
    
    Signed-off-by: John Wood <john.wood@gmx.com>
    johwood authored and intel-lab-lkp committed May 22, 2021
  2. Documentation: Add documentation for the Brute LSM

    Add some info detailing what the Brute LSM is, its motivation, weak
    points of existing implementations, proposed solutions, enabling,
    disabling, configuration and self-tests.
    
    Signed-off-by: John Wood <john.wood@gmx.com>
    johwood authored and intel-lab-lkp committed May 22, 2021
  3. selftests/brute: Add tests for the Brute LSM

    Add tests to check the brute LSM functionality and cover fork/exec brute
    force attacks crossing the following privilege boundaries:
    
    1.- setuid process
    2.- privilege changes
    3.- network to local
    
    Also, as a first step, check that fork/exec brute force attacks that do
    not cross any of the privilege boundaries listed above do not trigger
    the detection and mitigation stage.
    
    Once a brute force attack is detected, the "test" executable is marked
    as "not allowed". To start a new test, use the "rmxattr" app to
    revert this state. This way, all the tests can be run using the same
    binary.
    
    Signed-off-by: John Wood <john.wood@gmx.com>
    johwood authored and intel-lab-lkp committed May 22, 2021
  4. security/brute: Mitigate a brute force attack

    When a brute force attack is detected all the offending tasks involved
    in the attack must be killed. In other words, it is necessary to kill
    all the tasks that are executing the same file that is running during
    the brute force attack.
    
    Also, to prevent the executable involved in the attack from being
    respawned by a supervisor, and thus prevent a brute force attack from
    being started again, test the "not_allowed" flag and avoid the file
    execution based on this.
    
    Signed-off-by: John Wood <john.wood@gmx.com>
    johwood authored and intel-lab-lkp committed May 22, 2021
  5. security/brute: Detect a brute force attack

    For a correct management of a fork brute force attack it is necessary to
    track all the information related to the application crashes. To do so,
    use the extended attributes (xattr) of the executable files and define a
    statistical data structure to hold all the necessary information shared
    by all the fork hierarchy processes. This info is the number of crashes,
    the last crash timestamp and the crash period's moving average.
    
    The same can be achieved using a pointer to the fork hierarchy
    statistical data held by the task_struct structure. But this has an
    important drawback: a brute force attack that happens through the execve
    system call loses the fault info, since these statistics are freed when
    the fork hierarchy disappears. That method therefore cannot handle this
    attack type, which extended attributes handle successfully.
    
    Also, to avoid false positives during the attack detection it is
    necessary to narrow the possible cases. So, only the following scenarios
    are taken into account:
    
    1.- Launching (fork()/exec()) a setuid/setgid process repeatedly until a
        desirable memory layout is obtained (e.g. Stack Clash).
    2.- Connecting to an exec()ing network daemon (e.g. xinetd) repeatedly
        until a desirable memory layout is obtained (e.g. what CTFs do for a
        simple network service).
    3.- Launching processes without exec() (e.g. Android Zygote) and
        exposing state to attack a sibling.
    4.- Connecting to a fork()ing network daemon (e.g. apache) repeatedly
        until the previously shared memory layout of all the other children
        is exposed (e.g. kind of related to HeartBleed).
    
    In each case, a privilege boundary has been crossed:
    
    Case 1: setuid/setgid process
    Case 2: network to local
    Case 3: privilege changes
    Case 4: network to local
    
    To mark that a privilege boundary has been crossed it is only necessary
    to create new stats for the executable file via the extended attribute,
    and only if it has no previous statistical data. This is done using four
    LSM hooks that cover the three privilege boundaries:
    
    setuid/setgid process --> bprm_creds_from_file hook (based on secureexec
                              flag).
    network to local -------> socket_accept hook (taking into account only
                              external connections).
    privilege changes ------> task_fix_setuid and task_fix_setgid hooks.
    
    To detect a brute force attack it is necessary that the executable file
    statistics be updated in every fatal crash and the most important data
    to update is the application crash period. To do so, use the new
    "task_fatal_signal" LSM hook added in a previous step.
    
    The application crash period must be a value that is not prone to change
    due to spurious data and follows the real crash period. So, to compute
    it, the exponential moving average (EMA) is used.
    
    Based on the updated statistics two different attacks can be handled. A
    slow brute force attack that is detected if the maximum number of faults
    per fork hierarchy is reached and a fast brute force attack that is
    detected if the application crash period falls below a certain
    threshold.
    
    Moreover, only the signals delivered by the kernel are taken into
    account with the exception of the SIGABRT signal since the latter is
    used by glibc for stack canary, malloc, etc failures, which may indicate
    that a mitigation has been triggered.
    
    Signed-off-by: John Wood <john.wood@gmx.com>
    johwood authored and intel-lab-lkp committed May 22, 2021
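
The signal filter described in the last paragraph of the commit above can be sketched in userspace C. This is only an illustration under stated assumptions: the function name counts_as_crash is hypothetical and is not taken from the patch, and the sketch assumes the usual convention that a positive si_code marks a kernel-generated signal while zero or negative values mark signals sent by a process.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: decide whether a fatal signal should be counted
 * as a crash for brute force detection purposes.
 */
bool counts_as_crash(const siginfo_t *info)
{
    assert(info != NULL);

    /* Signals generated by the kernel carry a positive si_code. */
    if (info->si_code > 0)
        return true;

    /*
     * SIGABRT is the exception: it is counted even when raised from
     * userspace, because glibc uses it to report stack canary and
     * malloc failures.
     */
    return info->si_signo == SIGABRT;
}
```

A SIGSEGV raised by a bad memory access is counted, a SIGSEGV sent with kill(2) is not, and a SIGABRT is counted in both cases.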
  6. security/brute: Define a LSM and add sysctl attributes

    Add a new Kconfig file to define a menu entry under "Security options"
    to enable the "Fork brute force attack detection and mitigation"
    feature.
    
    The detection of a brute force attack can be based on the number of
    faults per application and its crash rate.
    
    There are two types of brute force attacks that can be detected. The
    first one is a slow brute force attack that is detected if the maximum
    number of faults per fork hierarchy is reached. The second type is a
    fast brute force attack that is detected if the application crash period
    falls below a certain threshold.
    
    The application crash period must be a value that is not prone to change
    due to spurious data and follows the real crash period. So, to compute
    it, the exponential moving average (EMA) will be used.
    
    This kind of average defines a weight (between 0 and 1) for the new
    value to add and applies the remainder of the weight to the current
    average value. This way, some spurious data will not excessively modify
    the average and only if the new values are persistent, the moving
    average will tend towards them.
    
    Mathematically the application crash period's EMA can be expressed as
    follows:
    
    period_ema = period * weight + period_ema * (1 - weight)
    
    Moreover, it is important to note that a minimum number of faults is
    needed to guarantee a trend in the crash period when the EMA is used.
    
    So, based on all the previous information define a LSM with five sysctl
    attributes that will be used to fine tune the attack detection.
    
    ema_weight_numerator
    ema_weight_denominator
    max_faults
    min_faults
    crash_period_threshold
    
    This patch is a preliminary step toward fine-tuning the attack
    detection.
    
    Signed-off-by: John Wood <john.wood@gmx.com>
    johwood authored and intel-lab-lkp committed May 22, 2021
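
The EMA formula and the two detection rules from the commit above can be sketched with integer arithmetic, which is why the weight is split into a numerator and a denominator. This is a minimal sketch and not the patch's code: the struct and function names are hypothetical, the comparison operators (>= and <) are assumptions, and the numeric values in the usage note are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the five tunables named in the commit. */
struct brute_tunables {
    uint64_t ema_weight_numerator;
    uint64_t ema_weight_denominator;
    uint64_t max_faults;
    uint64_t min_faults;
    uint64_t crash_period_threshold; /* same time unit as the periods */
};

/*
 * Integer form of
 *   period_ema = period * weight + period_ema * (1 - weight)
 * with weight = numerator / denominator. Periods are assumed small
 * enough that the products do not overflow 64 bits.
 */
uint64_t ema_update(uint64_t period_ema, uint64_t period,
                    const struct brute_tunables *t)
{
    uint64_t num = t->ema_weight_numerator;
    uint64_t den = t->ema_weight_denominator;

    assert(den != 0 && num <= den);
    return (period * num + period_ema * (den - num)) / den;
}

/* Slow attack: the fault count alone reaches the ceiling. */
bool slow_attack(uint64_t faults, const struct brute_tunables *t)
{
    return faults >= t->max_faults;
}

/*
 * Fast attack: the averaged crash period drops below the threshold,
 * but only once enough faults were seen for the EMA to show a trend.
 */
bool fast_attack(uint64_t faults, uint64_t period_ema,
                 const struct brute_tunables *t)
{
    return faults >= t->min_faults &&
           period_ema < t->crash_period_threshold;
}
```

With an illustrative weight of 7/10, an average of 30000 and a new period of 10000 give (10000 * 7 + 30000 * 3) / 10 = 16000, so a single short period pulls the average down without replacing it.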
  7. security: Add LSM hook at the point where a task gets a fatal signal

    Add a security hook that allows an LSM to be notified when a task gets a
    fatal signal. This patch is a preliminary step toward computing the
    task crash period in the "brute" LSM (a Linux security module that
    detects and mitigates fork brute force attacks against vulnerable
    userspace processes).
    
    Signed-off-by: John Wood <john.wood@gmx.com>
    Reviewed-by: Kees Cook <keescook@chromium.org>
    johwood authored and intel-lab-lkp committed May 22, 2021

Commits on Apr 22, 2021

  1. Merge branch 'landlock_lsm_v34' into next-testing

    Added flags to landlock_create_ruleset(2) to allow for future
    changes to ruleset format.
    James Morris committed Apr 22, 2021
  2. landlock: Enable user space to infer supported features

    Add a new flag LANDLOCK_CREATE_RULESET_VERSION to
    landlock_create_ruleset(2).  This makes it possible to retrieve the
    Landlock ABI version, which is useful to efficiently follow a
    best-effort security approach.  Indeed, it would be a missed
    opportunity to abort the whole sandbox building because some features
    are unavailable, instead of protecting users as much as possible with
    the subset of features provided by the running kernel.
    
    This new flag enables user space to identify the minimum set of Landlock
    features supported by the running kernel without relying on a filesystem
    interface (e.g. /proc/version, which might be inaccessible) nor testing
    multiple syscall argument combinations (i.e. syscall bisection).  New
    Landlock features will be documented and tied to a minimum version
    number (greater than 1).  The current version will be incremented for
    each new kernel release supporting new Landlock features.  User space
    libraries can leverage this information to seamlessly restrict processes
    as much as possible while being compatible with newer APIs.
    
    This is a much lighter approach than the previous
    landlock_get_features(2): the complexity is pushed to user space
    libraries.  This flag meets similar needs as securityfs versions:
    selinux/policyvers, apparmor/features/*/version* and tomoyo/version.
    
    Supporting this flag now will be convenient for backward compatibility.
    
    Cc: Arnd Bergmann <arnd@arndb.de>
    Cc: James Morris <jmorris@namei.org>
    Cc: Jann Horn <jannh@google.com>
    Cc: Kees Cook <keescook@chromium.org>
    Cc: Serge E. Hallyn <serge@hallyn.com>
    Signed-off-by: Mickaël Salaün <mic@linux.microsoft.com>
    Link: https://lore.kernel.org/r/20210422154123.13086-14-mic@digikod.net
    Signed-off-by: James Morris <jamorris@linux.microsoft.com>
    l0kod authored and James Morris committed Apr 22, 2021
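
The version probe described in the commit above can be sketched as follows. The helper names landlock_abi_version and landlock_has_feature are hypothetical. The fallback definitions are assumptions for building against older userspace headers: 444 is the syscall number on most architectures, and (1U << 0) is the flag value in the upstream UAPI header. With this flag, the attribute pointer must be NULL and the size 0.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Fallbacks for older userspace headers (assumed values, see lead-in). */
#ifndef __NR_landlock_create_ruleset
#define __NR_landlock_create_ruleset 444
#endif
#ifndef LANDLOCK_CREATE_RULESET_VERSION
#define LANDLOCK_CREATE_RULESET_VERSION (1U << 0)
#endif

/*
 * Hypothetical helper: return the Landlock ABI version (>= 1), or 0
 * when the running kernel does not provide Landlock or the call is
 * otherwise refused.
 */
int landlock_abi_version(void)
{
    long abi = syscall(__NR_landlock_create_ruleset, NULL, (size_t)0,
                       LANDLOCK_CREATE_RULESET_VERSION);

    return abi < 0 ? 0 : (int)abi;
}

/*
 * Hypothetical helper for the best-effort approach: a feature tied to
 * a minimum ABI version is usable only if the kernel reports at least
 * that version.
 */
bool landlock_has_feature(int abi, int min_version)
{
    assert(min_version >= 1);
    return abi >= min_version;
}
```

A sandboxing library could call landlock_abi_version() once, then drop the access rights that the reported version does not support instead of aborting.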
  3. landlock: Add user and kernel documentation

    Add a first document describing userspace API: how to define and enforce
    a Landlock security policy.  This is explained with a simple example.
    The Landlock system calls are described with their expected behavior and
    current limitations.
    
    Another document is dedicated to kernel developers, describing guiding
    principles and some important kernel structures.
    
    This documentation can be built with the Sphinx framework.
    
    Cc: James Morris <jmorris@namei.org>
    Cc: Jann Horn <jannh@google.com>
    Cc: Serge E. Hallyn <serge@hallyn.com>
    Signed-off-by: Mickaël Salaün <mic@linux.microsoft.com>
    Reviewed-by: Vincent Dagonneau <vincent.dagonneau@ssi.gouv.fr>
    Reviewed-by: Kees Cook <keescook@chromium.org>
    Link: https://lore.kernel.org/r/20210422154123.13086-13-mic@digikod.net
    Signed-off-by: James Morris <jamorris@linux.microsoft.com>
    l0kod authored and James Morris committed Apr 22, 2021
  4. samples/landlock: Add a sandbox manager example

    Add a basic sandbox tool to launch a command which can only access a
    list of file hierarchies in a read-only or read-write way.
    
    Cc: James Morris <jmorris@namei.org>
    Cc: Serge E. Hallyn <serge@hallyn.com>
    Signed-off-by: Mickaël Salaün <mic@linux.microsoft.com>
    Reviewed-by: Jann Horn <jannh@google.com>
    Reviewed-by: Kees Cook <keescook@chromium.org>
    Link: https://lore.kernel.org/r/20210422154123.13086-12-mic@digikod.net
    Signed-off-by: James Morris <jamorris@linux.microsoft.com>
    l0kod authored and James Morris committed Apr 22, 2021
  5. selftests/landlock: Add user space tests

    Test all Landlock system calls, ptrace hooks semantic and filesystem
    access-control with multiple layouts.
    
    Test coverage for security/landlock/ is 93.6% of lines.  The code not
    covered only deals with internal kernel errors (e.g. memory allocation)
    and race conditions.
    
    Cc: James Morris <jmorris@namei.org>
    Cc: Jann Horn <jannh@google.com>
    Cc: Serge E. Hallyn <serge@hallyn.com>
    Cc: Shuah Khan <shuah@kernel.org>
    Signed-off-by: Mickaël Salaün <mic@linux.microsoft.com>
    Reviewed-by: Vincent Dagonneau <vincent.dagonneau@ssi.gouv.fr>
    Reviewed-by: Kees Cook <keescook@chromium.org>
    Link: https://lore.kernel.org/r/20210422154123.13086-11-mic@digikod.net
    Signed-off-by: James Morris <jamorris@linux.microsoft.com>
    l0kod authored and James Morris committed Apr 22, 2021
  6. landlock: Add syscall implementations

    These 3 system calls are designed to be used by unprivileged processes
    to sandbox themselves:
    * landlock_create_ruleset(2): Creates a ruleset and returns its file
      descriptor.
    * landlock_add_rule(2): Adds a rule (e.g. file hierarchy access) to a
      ruleset, identified by the dedicated file descriptor.
    * landlock_restrict_self(2): Enforces a ruleset on the calling thread
      and its future children (similar to seccomp).  This syscall has the
      same usage restrictions as seccomp(2): the caller must have the
      no_new_privs attribute set or have CAP_SYS_ADMIN in the current user
      namespace.
    
    All these syscalls have a "flags" argument (not currently used) to
    enable extensibility.
    
    Here are the motivations for these new syscalls:
    * A sandboxed process may not have access to file systems, including
      /dev, /sys or /proc, but it should still be able to add more
      restrictions to itself.
    * Neither prctl(2) nor seccomp(2) (which was used in a previous version)
      fit well with the current definition of a Landlock security policy.
    
    All passed structs (attributes) are checked at build time to ensure that
    they don't contain holes and that they are aligned the same way for each
    architecture.
    
    See the user and kernel documentation for more details (provided by a
    following commit):
    * Documentation/userspace-api/landlock.rst
    * Documentation/security/landlock.rst
    
    Cc: Arnd Bergmann <arnd@arndb.de>
    Cc: James Morris <jmorris@namei.org>
    Cc: Jann Horn <jannh@google.com>
    Cc: Kees Cook <keescook@chromium.org>
    Signed-off-by: Mickaël Salaün <mic@linux.microsoft.com>
    Acked-by: Serge Hallyn <serge@hallyn.com>
    Link: https://lore.kernel.org/r/20210422154123.13086-9-mic@digikod.net
    Signed-off-by: James Morris <jamorris@linux.microsoft.com>
    l0kod authored and James Morris committed Apr 22, 2021
  7. arch: Wire up Landlock syscalls

    Wire up the following system calls for all architectures:
    * landlock_create_ruleset(2)
    * landlock_add_rule(2)
    * landlock_restrict_self(2)
    
    Cc: Arnd Bergmann <arnd@arndb.de>
    Cc: James Morris <jmorris@namei.org>
    Cc: Jann Horn <jannh@google.com>
    Cc: Kees Cook <keescook@chromium.org>
    Cc: Serge E. Hallyn <serge@hallyn.com>
    Signed-off-by: Mickaël Salaün <mic@linux.microsoft.com>
    Link: https://lore.kernel.org/r/20210422154123.13086-10-mic@digikod.net
    Signed-off-by: James Morris <jamorris@linux.microsoft.com>
    l0kod authored and James Morris committed Apr 22, 2021
  8. fs,security: Add sb_delete hook

    The sb_delete security hook is called when shutting down a superblock,
    which may be useful to release kernel objects tied to the superblock's
    lifetime (e.g. inodes).
    
    This new hook is needed by Landlock to release (ephemerally) tagged
    struct inodes.  This comes from the unprivileged nature of Landlock
    described in the next commit.
    
    Cc: Al Viro <viro@zeniv.linux.org.uk>
    Cc: James Morris <jmorris@namei.org>
    Signed-off-by: Mickaël Salaün <mic@linux.microsoft.com>
    Reviewed-by: Jann Horn <jannh@google.com>
    Acked-by: Serge Hallyn <serge@hallyn.com>
    Reviewed-by: Kees Cook <keescook@chromium.org>
    Link: https://lore.kernel.org/r/20210422154123.13086-7-mic@digikod.net
    Signed-off-by: James Morris <jamorris@linux.microsoft.com>
    l0kod authored and James Morris committed Apr 22, 2021
  9. landlock: Support filesystem access-control

    Using Landlock objects and ruleset, it is possible to tag inodes
    according to a process's domain.  To enable an unprivileged process to
    express a file hierarchy, it first needs to open a directory (or a file)
    and pass this file descriptor to the kernel through
    landlock_add_rule(2).  When checking if a file access request is
    allowed, we walk from the requested dentry to the real root, following
    the different mount layers.  The accesses of each "tagged" inode are
    collected according to their rule layer level, and ANDed to create
    access to the requested file hierarchy.  This makes it possible to
    identify a lot of files without tagging every inode or modifying the
    filesystem, while still following the view and understanding the user
    has of the filesystem.
    
    Add a new ARCH_EPHEMERAL_INODES for UML because it currently does not
    keep the same struct inodes for the same inodes while these inodes are
    in use.
    
    This commit adds a minimal set of supported filesystem access-controls,
    which does not cover all file-related actions.  This is the
    result of multiple discussions to minimize the code of Landlock to ease
    review.  Thanks to the Landlock design, extending this access-control
    without breaking user space will not be a problem.  Moreover, seccomp
    filters can be used to restrict the use of syscall families which may
    not be currently handled by Landlock.
    
    Cc: Al Viro <viro@zeniv.linux.org.uk>
    Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
    Cc: James Morris <jmorris@namei.org>
    Cc: Jann Horn <jannh@google.com>
    Cc: Jeff Dike <jdike@addtoit.com>
    Cc: Kees Cook <keescook@chromium.org>
    Cc: Richard Weinberger <richard@nod.at>
    Cc: Serge E. Hallyn <serge@hallyn.com>
    Signed-off-by: Mickaël Salaün <mic@linux.microsoft.com>
    Link: https://lore.kernel.org/r/20210422154123.13086-8-mic@digikod.net
    Signed-off-by: James Morris <jamorris@linux.microsoft.com>
    l0kod authored and James Morris committed Apr 22, 2021
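
The walk-and-AND check described in the commit above can be modelled in a few lines. This is a deliberately simplified model and not the kernel implementation: the types, the MAX_LAYERS limit and the function name are hypothetical, the walk is given as a plain array from the requested file up to the root, and a layer is assumed to be satisfied when at least one tagged inode on the walk grants the whole request for that layer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_LAYERS 4 /* hypothetical limit, for the sketch only */

/* Hypothetical: access rights one tagged inode grants, per layer. */
struct tagged_inode {
    uint32_t allowed[MAX_LAYERS];
};

/*
 * Return true if every layer grants the requested access somewhere on
 * the walk. The per-layer results are ANDed: one unsatisfied layer is
 * enough to deny the request.
 */
bool access_allowed(const struct tagged_inode *walk, size_t depth,
                    size_t num_layers, uint32_t request)
{
    assert(num_layers <= MAX_LAYERS);

    for (size_t layer = 0; layer < num_layers; layer++) {
        bool layer_ok = false;

        for (size_t i = 0; i < depth; i++) {
            if ((walk[i].allowed[layer] & request) == request) {
                layer_ok = true;
                break;
            }
        }
        if (!layer_ok)
            return false;
    }
    return true;
}
```

Because each nested layer can only remove access, adding a layer in this model never makes a previously denied request succeed.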
  10. LSM: Infrastructure management of the superblock

    Move management of the superblock->sb_security blob out of the
    individual security modules and into the security infrastructure.
    Instead of allocating the blobs from within the modules, the modules
    tell the infrastructure how much space is required, and the space is
    allocated there.
    
    Cc: John Johansen <john.johansen@canonical.com>
    Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
    Signed-off-by: Mickaël Salaün <mic@linux.microsoft.com>
    Reviewed-by: Stephen Smalley <stephen.smalley.work@gmail.com>
    Acked-by: Serge Hallyn <serge@hallyn.com>
    Reviewed-by: Kees Cook <keescook@chromium.org>
    Link: https://lore.kernel.org/r/20210422154123.13086-6-mic@digikod.net
    Signed-off-by: James Morris <jamorris@linux.microsoft.com>
    cschaufler authored and James Morris committed Apr 22, 2021
  11. landlock: Add ptrace restrictions

    Using ptrace(2) and related debug features on a target process can lead
    to a privilege escalation.  Indeed, ptrace(2) can be used by an attacker
    to impersonate another task and to remain undetected while performing
    malicious activities.  Thanks to ptrace_may_access(), various parts of
    the kernel can check if a tracer is more privileged than a tracee.
    
    A landlocked process has fewer privileges than a non-landlocked process
    and must then be subject to additional restrictions when manipulating
    processes. To be allowed to use ptrace(2) and related syscalls on a
    target process, a landlocked process must have a subset of the target
    process's rules (i.e. the tracee must be in a sub-domain of the tracer).
    
    Cc: James Morris <jmorris@namei.org>
    Signed-off-by: Mickaël Salaün <mic@linux.microsoft.com>
    Reviewed-by: Jann Horn <jannh@google.com>
    Acked-by: Serge Hallyn <serge@hallyn.com>
    Reviewed-by: Kees Cook <keescook@chromium.org>
    Link: https://lore.kernel.org/r/20210422154123.13086-5-mic@digikod.net
    Signed-off-by: James Morris <jamorris@linux.microsoft.com>
    l0kod authored and James Morris committed Apr 22, 2021
  12. landlock: Set up the security framework and manage credentials

    A process's credentials point to a Landlock domain, which is implemented
    underneath with a ruleset.  In the following commits, this domain is
    used to check and enforce the ptrace and filesystem security policies.
    A domain is inherited from a parent to its child the same way a thread
    inherits a seccomp policy.
    
    Cc: James Morris <jmorris@namei.org>
    Signed-off-by: Mickaël Salaün <mic@linux.microsoft.com>
    Reviewed-by: Jann Horn <jannh@google.com>
    Acked-by: Serge Hallyn <serge@hallyn.com>
    Reviewed-by: Kees Cook <keescook@chromium.org>
    Link: https://lore.kernel.org/r/20210422154123.13086-4-mic@digikod.net
    Signed-off-by: James Morris <jamorris@linux.microsoft.com>
    l0kod authored and James Morris committed Apr 22, 2021
  13. landlock: Add ruleset and domain management

    A Landlock ruleset is mainly a red-black tree with Landlock rules as
    nodes.  This enables quick update and lookup to match a requested
    access, e.g. to a file.  A ruleset is usable through a dedicated file
    descriptor (cf. following commit implementing syscalls) which enables a
    process to create and populate a ruleset with new rules.
    
    A domain is a ruleset tied to a set of processes.  This group of rules
    defines the security policy enforced on these processes and their future
    children.  A domain can transition to a new domain which is the
    intersection of all its constraints and those of a ruleset provided by
    the current process.  This modification only impacts the current process.
    This means that a process can only gain more constraints (i.e. lose
    accesses) over time.
    
    Cc: James Morris <jmorris@namei.org>
    Signed-off-by: Mickaël Salaün <mic@linux.microsoft.com>
    Acked-by: Serge Hallyn <serge@hallyn.com>
    Reviewed-by: Kees Cook <keescook@chromium.org>
    Reviewed-by: Jann Horn <jannh@google.com>
    Link: https://lore.kernel.org/r/20210422154123.13086-3-mic@digikod.net
    Signed-off-by: James Morris <jamorris@linux.microsoft.com>
    l0kod authored and James Morris committed Apr 22, 2021
  14. landlock: Add object management

    A Landlock object makes it possible to identify a kernel object (e.g. an
    inode).  A Landlock rule is a set of access rights allowed on an object.
    Rules are grouped in rulesets that may be tied to a set of processes
    (i.e. subjects) to enforce a scoped access-control (i.e. a domain).
    
    Because Landlock's goal is to empower any process (especially
    unprivileged ones) to sandbox themselves, we cannot rely on a
    system-wide object identification such as file extended attributes.
    Indeed, we need innocuous, composable and modular access-controls.
    
    The main challenge with these constraints is to identify kernel objects
    while this identification is useful (i.e. when a security policy makes
    use of this object).  But this identification data should be freed once
    no policy is using it.  This ephemeral tagging should not and may not be
    written in the filesystem.  We then need to manage the lifetime of a
    rule according to the lifetime of its objects.  To avoid a global lock,
    this implementation makes use of RCU and counters to safely reference
    objects.
    
    A following commit uses this generic object management for inodes.
    
    Cc: James Morris <jmorris@namei.org>
    Signed-off-by: Mickaël Salaün <mic@linux.microsoft.com>
    Reviewed-by: Jann Horn <jannh@google.com>
    Acked-by: Serge Hallyn <serge@hallyn.com>
    Reviewed-by: Kees Cook <keescook@chromium.org>
    Link: https://lore.kernel.org/r/20210422154123.13086-2-mic@digikod.net
    Signed-off-by: James Morris <jamorris@linux.microsoft.com>
    l0kod authored and James Morris committed Apr 22, 2021

Commits on Apr 9, 2021

  1. Merge branch 'fixes-v5.12' into next-testing

    James Morris committed Apr 9, 2021

Commits on Mar 24, 2021

  1. security: commoncap: fix -Wstringop-overread warning

    gcc-11 introduces a harmless warning for cap_inode_getsecurity:
    
    security/commoncap.c: In function ‘cap_inode_getsecurity’:
    security/commoncap.c:440:33: error: ‘memcpy’ reading 16 bytes from a region of size 0 [-Werror=stringop-overread]
      440 |                                 memcpy(&nscap->data, &cap->data, sizeof(__le32) * 2 * VFS_CAP_U32);
          |                                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    
    The problem here is that tmpbuf is initialized to NULL, so gcc assumes
    it is not accessible unless it gets set by vfs_getxattr_alloc().  This is
    a legitimate warning as far as I can tell, but the code is correct since
    it correctly handles the error when that function fails.
    
    Add a separate NULL check to tell gcc about it as well.
    
    Signed-off-by: Arnd Bergmann <arnd@arndb.de>
    Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
    Signed-off-by: James Morris <jamorris@linux.microsoft.com>
    arndb authored and James Morris committed Mar 24, 2021

Commits on Mar 14, 2021

  1. Linux 5.12-rc3

    torvalds committed Mar 14, 2021
  2. prctl: fix PR_SET_MM_AUXV kernel stack leak

    Doing a
    
    	prctl(PR_SET_MM, PR_SET_MM_AUXV, addr, 1);
    
    will copy 1 byte from userspace to a (quite big) on-stack array
    and then stash everything to mm->saved_auxv.
    An AT_NULL terminator will be inserted at the very end.
    
    The /proc/*/auxv handler will find that AT_NULL terminator
    and copy the original stack contents to userspace.
    
    This devious scheme requires CAP_SYS_RESOURCE.
    
    Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Alexey Dobriyan authored and torvalds committed Mar 14, 2021
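
The leak pattern described in the commit above can be reproduced in a userspace model. This is a sketch of the general pattern only, not the kernel code: VEC_WORDS and save_vector are hypothetical names, and zero-initialising the staging buffer is shown as one common remedy for this class of bug.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define VEC_WORDS 8 /* hypothetical stand-in for the on-stack array size */

/*
 * Copy len caller-supplied bytes into a staging buffer, terminate it
 * at the very end, then save the whole buffer. The initialiser is the
 * important part: without it, every byte past len would hold stale
 * stack contents that the final copy would preserve.
 */
void save_vector(unsigned long saved[VEC_WORDS], const void *src, size_t len)
{
    unsigned long staging[VEC_WORDS] = {0};

    assert(len <= sizeof(staging));
    memcpy(staging, src, len);

    /* Terminator pair at the very end, in the spirit of AT_NULL. */
    staging[VEC_WORDS - 2] = 0;
    staging[VEC_WORDS - 1] = 0;

    memcpy(saved, staging, sizeof(staging));
}
```

With a 1-byte input, everything after the first byte of the saved vector is zero, so a later reader that scans up to the terminator sees no leftover data.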
  3. Merge tag 'irq-urgent-2021-03-14' of git://git.kernel.org/pub/scm/lin…

    …ux/kernel/git/tip/tip
    
    Pull irq fixes from Thomas Gleixner:
     "A set of irqchip updates:
    
       - Make the GENERIC_IRQ_MULTI_HANDLER configuration correct
    
       - Add a missing DT compatible string for the Ingenic driver
    
       - Remove the pointless debugfs_file pointer from struct irqdomain"
    
    * tag 'irq-urgent-2021-03-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
      irqchip/ingenic: Add support for the JZ4760
      dt-bindings/irq: Add compatible string for the JZ4760B
      irqchip: Do not blindly select CONFIG_GENERIC_IRQ_MULTI_HANDLER
      ARM: ep93xx: Select GENERIC_IRQ_MULTI_HANDLER directly
      irqdomain: Remove debugfs_file from struct irq_domain
    torvalds committed Mar 14, 2021
  4. Merge tag 'timers-urgent-2021-03-14' of git://git.kernel.org/pub/scm/…

    …linux/kernel/git/tip/tip
    
    Pull timer fix from Thomas Gleixner:
     "A single fix for hrtimers to prevent an interrupt storm caused by
      the lack of reevaluation of the timers which expire in softirq context
      under certain circumstances, e.g. when the clock was set"
    
    * tag 'timers-urgent-2021-03-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
      hrtimer: Update softirq_expires_next correctly after __hrtimer_get_next_event()
    torvalds committed Mar 14, 2021
  5. Merge tag 'sched-urgent-2021-03-14' of git://git.kernel.org/pub/scm/l…

    …inux/kernel/git/tip/tip
    
    Pull scheduler fixes from Thomas Gleixner:
     "A set of scheduler updates:
    
       - Prevent a NULL pointer dereference in the migration_stop_cpu()
         mechanism
    
       - Prevent self concurrency of affine_move_task()
    
       - Small fixes and cleanups related to task migration/affinity setting
    
       - Ensure that sync_runqueues_membarrier_state() is invoked on the
         current CPU when it is in the cpu mask"
    
    * tag 'sched-urgent-2021-03-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
      sched/membarrier: fix missing local execution of ipi_sync_rq_state()
      sched: Simplify set_affinity_pending refcounts
      sched: Fix affine_move_task() self-concurrency
      sched: Optimize migration_cpu_stop()
      sched: Collate affine_move_task() stoppers
      sched: Simplify migration_cpu_stop()
      sched: Fix migration_cpu_stop() requeueing
    torvalds committed Mar 14, 2021
  6. Merge tag 'objtool-urgent-2021-03-14' of git://git.kernel.org/pub/scm…

    …/linux/kernel/git/tip/tip
    
    Pull objtool fix from Thomas Gleixner:
     "A single objtool fix to handle the PUSHF/POPF validation correctly for
      the paravirt changes which modified arch_local_irq_restore not to use
      popf"
    
    * tag 'objtool-urgent-2021-03-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
      objtool,x86: Fix uaccess PUSHF/POPF validation
    torvalds committed Mar 14, 2021
  7. Merge tag 'locking-urgent-2021-03-14' of git://git.kernel.org/pub/scm…

    …/linux/kernel/git/tip/tip
    
    Pull locking fixes from Thomas Gleixner:
     "A couple of locking fixes:
    
       - A fix for the static_call mechanism so it handles unaligned
         addresses correctly.
    
       - Make u64_stats_init() a macro so every instance gets a separate
         lockdep key.
    
       - Make seqcount_latch_init() a macro as well to preserve the static
         variable which is used for the lockdep key"
    
    * tag 'locking-urgent-2021-03-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
      seqlock,lockdep: Fix seqcount_latch_init()
      u64_stats,lockdep: Fix u64_stats_init() vs lockdep
      static_call: Fix the module key fixup
    torvalds committed Mar 14, 2021
  8. Merge tag 'perf_urgent_for_v5.12-rc3' of git://git.kernel.org/pub/scm…

    …/linux/kernel/git/tip/tip
    
    Pull perf fixes from Borislav Petkov:
    
     - Make sure PMU internal buffers are flushed for per-CPU events too and
       properly handle PID/TID for large PEBS.
    
     - Handle the case properly when there's no PMU and therefore return an
       empty list of perf MSRs for VMX to switch instead of reading random
       garbage from the stack.
    
    * tag 'perf_urgent_for_v5.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
      x86/perf: Use RET0 as default for guest_get_msrs to handle "no PMU" case
      perf/x86/intel: Set PERF_ATTACH_SCHED_CB for large PEBS and LBR
      perf/core: Flush PMU internal buffers for per-CPU events
    torvalds committed Mar 14, 2021
  9. Merge tag 'efi-urgent-for-v5.12-rc2' of git://git.kernel.org/pub/scm/…

    …linux/kernel/git/tip/tip
    
    Pull EFI fix from Ard Biesheuvel via Borislav Petkov:
     "Fix an oversight in the handling of EFI_RT_PROPERTIES_TABLE, which was
      added in v5.10, but failed to take the SetVirtualAddressMap() RT service
      into account"
    
    * tag 'efi-urgent-for-v5.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
      efi: stub: omit SetVirtualAddressMap() if marked unsupported in RT_PROP table
    torvalds committed Mar 14, 2021
  10. Merge tag 'x86_urgent_for_v5.12_rc3' of git://git.kernel.org/pub/scm/…

    …linux/kernel/git/tip/tip
    
    Pull x86 fixes from Borislav Petkov:
    
     - A couple of SEV-ES fixes and robustifications: verify usermode stack
       pointer in NMI is not coming from the syscall gap, correctly track
       IRQ states in the #VC handler and access user insn bytes atomically
       in same handler as latter cannot sleep.
    
     - Balance 32-bit fast syscall exit path to do the proper work on exit
       and thus not confuse audit and ptrace frameworks.
    
     - Two fixes for the ORC unwinder going "off the rails" into KASAN
       redzones and when ORC data is missing.
    
    * tag 'x86_urgent_for_v5.12_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
      x86/sev-es: Use __copy_from_user_inatomic()
      x86/sev-es: Correctly track IRQ states in runtime #VC handler
      x86/sev-es: Check regs->sp is trusted before adjusting #VC IST stack
      x86/sev-es: Introduce ip_within_syscall_gap() helper
      x86/entry: Fix entry/exit mismatch on failed fast 32-bit syscalls
      x86/unwind/orc: Silence warnings caused by missing ORC data
      x86/unwind/orc: Disable KASAN checking in the ORC unwinder, part 2
    torvalds committed Mar 14, 2021
  11. Merge tag 'powerpc-5.12-3' of git://git.kernel.org/pub/scm/linux/kern…

    …el/git/powerpc/linux
    
    Pull powerpc fixes from Michael Ellerman:
     "Some more powerpc fixes for 5.12:
    
       - Fix wrong instruction encoding for lis in ppc_function_entry(),
         which could potentially lead to missed kprobes.
    
       - Fix SET_FULL_REGS on 32-bit and 64e, which prevented ptrace of
         non-volatile GPRs immediately after exec.
    
       - Clean up a missed SRR specifier in the recent interrupt rework.
    
       - Don't treat unrecoverable_exception() as an interrupt handler, it's
         called from other handlers so shouldn't do the interrupt entry/exit
         accounting itself.
    
       - Fix build errors caused by missing declarations for
         [en/dis]able_kernel_vsx().
    
      Thanks to Christophe Leroy, Daniel Axtens, Geert Uytterhoeven, Jiri
      Olsa, Naveen N. Rao, and Nicholas Piggin"
    
    * tag 'powerpc-5.12-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
      powerpc/traps: unrecoverable_exception() is not an interrupt handler
      powerpc: Fix missing declaration of [en/dis]able_kernel_vsx()
      powerpc/64s/exception: Clean up a missed SRR specifier
      powerpc: Fix inverted SET_FULL_REGS bitop
      powerpc/64s: Use symbolic macros for function entry encoding
      powerpc/64s: Fix instruction encoding for lis in ppc_function_entry()
    torvalds committed Mar 14, 2021
  12. Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

    Pull KVM fixes from Paolo Bonzini:
     "More fixes for ARM and x86"
    
    * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
      KVM: LAPIC: Advancing the timer expiration on guest initiated write
      KVM: x86/mmu: Skip !MMU-present SPTEs when removing SP in exclusive mode
      KVM: kvmclock: Fix vCPUs > 64 can't be online/hotpluged
      kvm: x86: annotate RCU pointers
      KVM: arm64: Fix exclusive limit for IPA size
      KVM: arm64: Reject VM creation when the default IPA size is unsupported
      KVM: arm64: Ensure I-cache isolation between vcpus of a same VM
      KVM: arm64: Don't use cbz/adr with external symbols
      KVM: arm64: Fix range alignment when walking page tables
      KVM: arm64: Workaround firmware wrongly advertising GICv2-on-v3 compatibility
      KVM: arm64: Rename __vgic_v3_get_ich_vtr_el2() to __vgic_v3_get_gic_config()
      KVM: arm64: Don't access PMSELR_EL0/PMUSERENR_EL0 when no PMU is available
      KVM: arm64: Turn kvm_arm_support_pmu_v3() into a static key
      KVM: arm64: Fix nVHE hyp panic host context restore
      KVM: arm64: Avoid corrupting vCPU context register in guest exit
      KVM: arm64: nvhe: Save the SPE context early
      kvm: x86: use NULL instead of using plain integer as pointer
      KVM: SVM: Connect 'npt' module param to KVM's internal 'npt_enabled'
      KVM: x86: Ensure deadline timer has truly expired before posting its IRQ
    torvalds committed Mar 14, 2021