Nick-Terrell/l…

Commits on Jun 25, 2017

  1. squashfs: Add zstd support

    Add zstd compression and decompression support to SquashFS. zstd is a
    great fit for SquashFS because it can compress at ratios approaching xz,
    while decompressing twice as fast as zlib. For SquashFS in particular,
    it can decompress as fast as lzo and lz4. It also has the flexibility
    to turn down the compression ratio for faster compression times.
    
    The compression benchmark is run on the file tree from the SquashFS archive
    found in ubuntu-16.10-desktop-amd64.iso [1]. It uses `mksquashfs` with the
    default block size (128 KB) and various compression algorithms/levels.
    xz and zstd are also benchmarked with 256 KB blocks. The decompression
    benchmark times how long it takes to `tar` the file tree into `/dev/null`.
    See the benchmark file in the upstream zstd source repository located under
    `contrib/linux-kernel/squashfs-benchmark.sh` [2] for details.
    
    I ran the benchmarks on an Ubuntu 14.04 VM with 2 cores and 4 GiB of RAM.
    The VM is running on a MacBook Pro with a 3.1 GHz Intel Core i7 processor,
    16 GB of RAM, and an SSD.
    
    | Method         | Ratio | Compression MB/s | Decompression MB/s |
    |----------------|-------|------------------|--------------------|
    | gzip           |  2.92 |               15 |                128 |
    | lzo            |  2.64 |              9.5 |                217 |
    | lz4            |  2.12 |               94 |                218 |
    | xz             |  3.43 |              5.5 |                 35 |
    | xz 256 KB      |  3.53 |              5.4 |                 40 |
    | zstd 1         |  2.71 |               96 |                210 |
    | zstd 5         |  2.93 |               69 |                198 |
    | zstd 10        |  3.01 |               41 |                225 |
    | zstd 15        |  3.13 |             11.4 |                224 |
    | zstd 16 256 KB |  3.24 |              8.1 |                210 |
    
    This patch was written by Sean Purcell <me@seanp.xyz>, but I will be
    taking over the submission process.
    
    [1] http://releases.ubuntu.com/16.10/
    [2] https://github.com/facebook/zstd/blob/dev/contrib/linux-kernel/squashfs-benchmark.sh
    
    zstd source repository: https://github.com/facebook/zstd
    
    Cc: Sean Purcell <me@seanp.xyz>
    Signed-off-by: Nick Terrell <terrelln@fb.com>
    terrelln authored and fengguang committed Jun 25, 2017
  2. btrfs: Add zstd support

    Add zstd compression and decompression support to BtrFS. zstd at its
    fastest level compresses almost as well as zlib, while offering much
    faster compression and decompression, approaching lzo speeds.
    
    I benchmarked btrfs with zstd compression against no compression, lzo
    compression, and zlib compression. I benchmarked two scenarios: copying
    a set of files to btrfs and then reading the files, and copying a tarball
    to btrfs, extracting it to btrfs, and then reading the extracted files.
    After every operation, I call `sync` and include the sync time.
    Between every pair of operations I unmount and remount the filesystem
    to avoid caching. The benchmark files can be found in the upstream
    zstd source repository under
    `contrib/linux-kernel/{btrfs-benchmark.sh,btrfs-extract-benchmark.sh}`
    [1] [2].
    
    I ran the benchmarks on an Ubuntu 14.04 VM with 2 cores and 4 GiB of RAM.
    The VM is running on a MacBook Pro with a 3.1 GHz Intel Core i7 processor,
    16 GB of RAM, and an SSD.
    
    The first compression benchmark is copying 10 copies of the unzipped
    Silesia corpus [3] into a BtrFS filesystem mounted with
    `-o compress-force=Method`. The decompression benchmark times how long
    it takes to `tar` all 10 copies into `/dev/null`. The compression ratio is
    measured by comparing the output of `df` and `du`. See the benchmark file
    [1] for details. I benchmarked multiple zstd compression levels, although
    the patch uses zstd level 1.
    
    | Method  | Ratio | Compression MB/s | Decompression MB/s  |
    |---------|-------|------------------|---------------------|
    | None    |  0.99 |              504 |                 686 |
    | lzo     |  1.66 |              398 |                 442 |
    | zlib    |  2.58 |               65 |                 241 |
    | zstd 1  |  2.57 |              260 |                 383 |
    | zstd 3  |  2.71 |              174 |                 408 |
    | zstd 6  |  2.87 |               70 |                 398 |
    | zstd 9  |  2.92 |               43 |                 406 |
    | zstd 12 |  2.93 |               21 |                 408 |
    | zstd 15 |  3.01 |               11 |                 354 |
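
    As a rough illustration of the ratio column, the comparison of `du` and
    `df` can be sketched in Python as below. This is not the benchmark script
    [1]; the helper names are hypothetical, and it assumes the ratio is the
    logical size of the files divided by the space the filesystem reports as
    used.

```python
import os
import shutil


def apparent_size(root: str) -> int:
    """Sum the logical sizes of regular files under root (similar in spirit to `du`)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def used_bytes(mount_point: str) -> int:
    """Space the filesystem reports as used (similar in spirit to `df`)."""
    return shutil.disk_usage(mount_point).used


def compression_ratio(logical_bytes: int, used: int) -> float:
    """Assumed definition: logical bytes stored per byte actually used."""
    return logical_bytes / used


# Example with made-up numbers: 2 MB of file data occupying 1 MB on disk.
print(compression_ratio(2_000_000, 1_000_000))
```

    Under this definition, filesystem metadata counts toward the used space,
    which would explain why the uncompressed case comes out slightly below 1.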
    
    The next benchmark first copies `linux-4.11.6.tar` [4] to btrfs. Then it
    measures the compression ratio, extracts the tar, and deletes the tar.
    Then it measures the compression ratio again, and `tar`s the extracted
    files into `/dev/null`. See the benchmark file [2] for details.
    
    | Method | Tar Ratio | Extract Ratio | Copy (s) | Extract (s)| Read (s) |
    |--------|-----------|---------------|----------|------------|----------|
    | None   |      0.97 |          0.78 |    0.981 |      5.501 |    8.807 |
    | lzo    |      2.06 |          1.38 |    1.631 |      8.458 |    8.585 |
    | zlib   |      3.40 |          1.86 |    7.750 |     21.544 |   11.744 |
    | zstd 1 |      3.57 |          1.85 |    2.579 |     11.479 |    9.389 |
    
    [1] https://github.com/facebook/zstd/blob/dev/contrib/linux-kernel/btrfs-benchmark.sh
    [2] https://github.com/facebook/zstd/blob/dev/contrib/linux-kernel/btrfs-extract-benchmark.sh
    [3] http://sun.aei.polsl.pl/~sdeor/index.php?page=silesia
    [4] https://cdn.kernel.org/pub/linux/kernel/v4.x/linux-4.11.6.tar.xz
    
    zstd source repository: https://github.com/facebook/zstd
    
    Signed-off-by: Nick Terrell <terrelln@fb.com>
    terrelln authored and fengguang committed Jun 25, 2017
  3. lib: Add zstd modules

    Add zstd compression and decompression kernel modules.
    zstd offers a wide variety of compression speed and quality trade-offs.
    It can compress at speeds approaching lz4, and quality approaching lzma.
    zstd decompresses at speeds more than twice as fast as zlib, and
    decompression speed remains roughly the same across all compression levels.
    
    The code was ported from the upstream zstd source repository. The
    `linux/zstd.h` header was modified to match Linux kernel style.
    The cross-platform and allocation code was stripped out. Instead zstd
    requires the caller to pass a preallocated workspace. The source files
    were clang-formatted [1] to match the Linux kernel style as much as
    possible. Otherwise, the code was unmodified. We would like to avoid
    as much further manual modification to the source code as possible, so it
    will be easier to keep the kernel zstd up to date.
    
    I benchmarked zstd compression as a special character device. I ran zstd
    and zlib compression at several levels, as well as performing no
    compression, which measures the time spent copying the data to kernel space.
    Data is passed to the compressor 4096 B at a time. The benchmark file is
    located in the upstream zstd source repository under
    `contrib/linux-kernel/zstd_compress_test.c` [2].
    
    I ran the benchmarks on an Ubuntu 14.04 VM with 2 cores and 4 GiB of RAM.
    The VM is running on a MacBook Pro with a 3.1 GHz Intel Core i7 processor,
    16 GB of RAM, and an SSD. I benchmarked using `silesia.tar` [3], which is
    211,988,480 B large. Run the following commands for the benchmark:
    
        sudo modprobe zstd_compress_test
        sudo mknod zstd_compress_test c 245 0
        sudo cp silesia.tar zstd_compress_test
    
    The time is reported by the time of the userland `cp`.
    The MB/s is computed with
    
        211,988,480 B / time(method)
    
    which includes the time to copy from userland.
    The Adjusted MB/s is computed with
    
        211,988,480 B / (time(method) - time(none)).
    
    The memory reported is the amount of memory the compressor requests.
    
    | Method   | Size (B)  | Time (s) | Ratio | MB/s    | Adj MB/s | Mem (MB) |
    |----------|-----------|----------|-------|---------|----------|----------|
    | none     | 211988480 |    0.100 |     1 | 2119.88 |        - |        - |
    | zstd -1  |  73645762 |    1.044 | 2.878 |  203.05 |   224.56 |     1.23 |
    | zstd -3  |  66988878 |    1.761 | 3.165 |  120.38 |   127.63 |     2.47 |
    | zstd -5  |  65001259 |    2.563 | 3.261 |   82.71 |    86.07 |     2.86 |
    | zstd -10 |  60165346 |   13.242 | 3.523 |   16.01 |    16.13 |    13.22 |
    | zstd -15 |  58009756 |   47.601 | 3.654 |    4.45 |     4.46 |    21.61 |
    | zstd -19 |  54014593 |  102.835 | 3.925 |    2.06 |     2.06 |    60.15 |
    | zlib -1  |  77260026 |    2.895 | 2.744 |   73.23 |    75.85 |     0.27 |
    | zlib -3  |  72972206 |    4.116 | 2.905 |   51.50 |    52.79 |     0.27 |
    | zlib -6  |  68190360 |    9.633 | 3.109 |   22.01 |    22.24 |     0.27 |
    | zlib -9  |  67613382 |   22.554 | 3.135 |    9.40 |     9.44 |     0.27 |
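
    As a cross-check, the Ratio, MB/s, and Adj MB/s columns can be recomputed
    from the size of `silesia.tar` (211,988,480 B) and the measured times. The
    Python sketch below is illustrative only and not part of the patch; the
    function names are hypothetical, and it assumes MB means 10^6 bytes.

```python
# Values copied from the compression table in this commit message.
INPUT_BYTES = 211_988_480   # size of silesia.tar
BASELINE_SECONDS = 0.100    # the "none" row: copy to kernel space only


def ratio(compressed_bytes: int) -> float:
    """Compression ratio: input size over compressed size."""
    return INPUT_BYTES / compressed_bytes


def mb_per_s(seconds: float) -> float:
    """Throughput including the time to copy from userland."""
    return INPUT_BYTES / seconds / 1e6


def adjusted_mb_per_s(seconds: float) -> float:
    """Throughput with the copy-only baseline time subtracted."""
    return INPUT_BYTES / (seconds - BASELINE_SECONDS) / 1e6


# The "zstd -1" row: 73,645,762 B produced in 1.044 s.
print(f"{ratio(73_645_762):.3f} {mb_per_s(1.044):.2f} {adjusted_mb_per_s(1.044):.2f}")
```

    The adjustment matters most for the fastest methods, where the copy time
    is a noticeable share of the total.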
    
    I benchmarked zstd decompression using the same method on the same machine.
    The benchmark file is located in the upstream zstd repo under
    `contrib/linux-kernel/zstd_decompress_test.c` [4]. The memory reported is
    the amount of memory required to decompress data compressed with the given
    compression level. If you know the maximum size of your input, you can
    reduce the memory usage of decompression irrespective of the compression
    level.
    
    | Method   | Time (s) | MB/s    | Adjusted MB/s | Memory (MB) |
    |----------|----------|---------|---------------|-------------|
    | none     |    0.025 | 8479.54 |             - |           - |
    | zstd -1  |    0.358 |  592.15 |        636.60 |        0.84 |
    | zstd -3  |    0.396 |  535.32 |        571.40 |        1.46 |
    | zstd -5  |    0.396 |  535.32 |        571.40 |        1.46 |
    | zstd -10 |    0.374 |  566.81 |        607.42 |        2.51 |
    | zstd -15 |    0.379 |  559.34 |        598.84 |        4.61 |
    | zstd -19 |    0.412 |  514.54 |        547.77 |        8.80 |
    | zlib -1  |    0.940 |  225.52 |        231.68 |        0.04 |
    | zlib -3  |    0.883 |  240.08 |        247.07 |        0.04 |
    | zlib -6  |    0.844 |  251.17 |        258.84 |        0.04 |
    | zlib -9  |    0.837 |  253.27 |        261.07 |        0.04 |
    
    Tested in userland using the test-suite in the zstd repo under
    `contrib/linux-kernel/test/UserlandTest.cpp` [5] by mocking the kernel
    functions. Fuzz tested using libfuzzer [6] with the fuzz harnesses under
    `contrib/linux-kernel/test/{RoundTripCrash.c,DecompressCrash.c}` [7] [8]
    with ASAN, UBSAN, and MSAN. Additionally, it was tested while testing the
    BtrFS and SquashFS patches coming next.
    
    [1] https://clang.llvm.org/docs/ClangFormat.html
    [2] https://github.com/facebook/zstd/blob/dev/contrib/linux-kernel/zstd_compress_test.c
    [3] http://sun.aei.polsl.pl/~sdeor/index.php?page=silesia
    [4] https://github.com/facebook/zstd/blob/dev/contrib/linux-kernel/zstd_decompress_test.c
    [5] https://github.com/facebook/zstd/blob/dev/contrib/linux-kernel/test/UserlandTest.cpp
    [6] http://llvm.org/docs/LibFuzzer.html
    [7] https://github.com/facebook/zstd/blob/dev/contrib/linux-kernel/test/RoundTripCrash.c
    [8] https://github.com/facebook/zstd/blob/dev/contrib/linux-kernel/test/DecompressCrash.c
    
    zstd source repository: https://github.com/facebook/zstd
    
    Signed-off-by: Nick Terrell <terrelln@fb.com>
    terrelln authored and fengguang committed Jun 25, 2017
  4. lib: Add xxhash module

    Adds xxhash kernel module with xxh32 and xxh64 hashes. xxhash is an
    extremely fast non-cryptographic hash algorithm for checksumming.
    The zstd compression and decompression modules added in the next patch
    require xxhash. I extracted it out from zstd since it is useful on its
    own. I copied the code from the upstream XXHash source repository and
    translated it into kernel style. I ran benchmarks and tests in the kernel
    and tests in userland.
    
    I benchmarked xxhash as a special character device. I ran in four modes,
    no-op, xxh32, xxh64, and crc32. The no-op mode simply copies the data to
    kernel space and ignores it. The xxh32, xxh64, and crc32 modes compute
    hashes on the copied data. I also ran it with four different buffer sizes.
    The benchmark file is located in the upstream zstd source repository under
    `contrib/linux-kernel/xxhash_test.c` [1].
    
    I ran the benchmarks on an Ubuntu 14.04 VM with 2 cores and 4 GiB of RAM.
    The VM is running on a MacBook Pro with a 3.1 GHz Intel Core i7 processor,
    16 GB of RAM, and an SSD. I benchmarked using the file `filesystem.squashfs`
    from `ubuntu-16.10-desktop-amd64.iso`, which is 1,536,217,088 B large.
    Run the following commands for the benchmark:
    
        modprobe xxhash_test
        mknod xxhash_test c 245 0
        time cp filesystem.squashfs xxhash_test
    
    The time is reported by the time of the userland `cp`.
    The GB/s is computed with
    
        1,536,217,088 B / time(buffer size, hash)
    
    which includes the time to copy from userland.
    The Adjusted GB/s is computed with
    
        1,536,217,088 B / (time(buffer size, hash) - time(buffer size, none)).
    
    | Buffer Size (B) | Hash  | Time (s) | GB/s | Adjusted GB/s |
    |-----------------|-------|----------|------|---------------|
    |            1024 | none  |    0.408 | 3.77 |             - |
    |            1024 | xxh32 |    0.649 | 2.37 |          6.37 |
    |            1024 | xxh64 |    0.542 | 2.83 |         11.46 |
    |            1024 | crc32 |    1.290 | 1.19 |          1.74 |
    |            4096 | none  |    0.380 | 4.04 |             - |
    |            4096 | xxh32 |    0.645 | 2.38 |          5.79 |
    |            4096 | xxh64 |    0.500 | 3.07 |         12.80 |
    |            4096 | crc32 |    1.168 | 1.32 |          1.95 |
    |            8192 | none  |    0.351 | 4.38 |             - |
    |            8192 | xxh32 |    0.614 | 2.50 |          5.84 |
    |            8192 | xxh64 |    0.464 | 3.31 |         13.60 |
    |            8192 | crc32 |    1.163 | 1.32 |          1.89 |
    |           16384 | none  |    0.346 | 4.43 |             - |
    |           16384 | xxh32 |    0.590 | 2.60 |          6.30 |
    |           16384 | xxh64 |    0.466 | 3.30 |         12.80 |
    |           16384 | crc32 |    1.183 | 1.30 |          1.84 |
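
    For illustration, the GB/s and Adjusted GB/s columns for one buffer size
    can be recomputed from the file size and the `cp` times. The Python sketch
    below is not part of the patch; the names are hypothetical, and it assumes
    GB means 10^9 bytes.

```python
INPUT_BYTES = 1_536_217_088  # size of filesystem.squashfs

# Times in seconds for the 1024 B buffer size rows of the table.
ROWS_1024 = {"none": 0.408, "xxh32": 0.649, "xxh64": 0.542, "crc32": 1.290}


def gb_per_s(seconds: float) -> float:
    """Throughput including the copy from userland."""
    return INPUT_BYTES / seconds / 1e9


def adjusted_gb_per_s(seconds: float, noop_seconds: float) -> float:
    """Throughput after subtracting the no-op (copy only) time."""
    return INPUT_BYTES / (seconds - noop_seconds) / 1e9


noop = ROWS_1024["none"]
for name, seconds in ROWS_1024.items():
    if name == "none":
        continue
    print(f"{name}: {gb_per_s(seconds):.2f} GB/s, "
          f"{adjusted_gb_per_s(seconds, noop):.2f} GB/s adjusted")
```

    Because the hash time is a small difference between two measured times,
    the adjusted figures are sensitive to timing noise in either run.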
    
    Tested in userland using the test-suite in the zstd repo under
    `contrib/linux-kernel/test/XXHashUserlandTest.cpp` [2] by mocking the
    kernel functions. A line in each branch of every function in `xxhash.c`
    was commented out to ensure that the test-suite fails. Additionally
    tested while testing zstd and with SMHasher [3].
    
    [1] https://phabricator.intern.facebook.com/P57526246
    [2] https://github.com/facebook/zstd/blob/dev/contrib/linux-kernel/test/XXHashUserlandTest.cpp
    [3] https://github.com/aappleby/smhasher
    
    zstd source repository: https://github.com/facebook/zstd
    XXHash source repository: https://github.com/cyan4973/xxhash
    
    Signed-off-by: Nick Terrell <terrelln@fb.com>
    terrelln authored and fengguang committed Jun 25, 2017

Commits on Jun 19, 2017

  1. Linux 4.12-rc6

    torvalds committed Jun 19, 2017
  2. mm: larger stack guard gap, between vmas

    Stack guard page is a useful feature to reduce a risk of stack smashing
    into a different mapping. We have been using a single page gap which
    is sufficient to prevent having stack adjacent to a different mapping.
    But this seems to be insufficient in the light of the stack usage in
    userspace. E.g. glibc uses as large as 64kB alloca() in many commonly
    used functions. Others use constructs like gid_t buffer[NGROUPS_MAX]
    which is 256kB or stack strings with MAX_ARG_STRLEN.
    
    This will become especially dangerous for suid binaries and the default
    no limit for the stack size limit because those applications can be
    tricked to consume a large portion of the stack and a single glibc call
    could jump over the guard page. These attacks are not theoretical,
    unfortunately.
    
    Make those attacks less probable by increasing the stack guard gap
    to 1MB (on systems with 4k pages; but make it depend on the page size
    because systems with larger base pages might cap stack allocations in
    the PAGE_SIZE units) which should cover larger alloca() and VLA stack
    allocations. It is obviously not a full fix because the problem is
    somehow inherent, but it should reduce attack space a lot.
    
    One could argue that the gap size should be configurable from userspace,
    but that can be done later when somebody finds that the new 1MB is wrong
    for some special case applications.  For now, add a kernel command line
    option (stack_guard_gap) to specify the stack gap size (in page units).
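
    To make the page-size dependence concrete, the gap in bytes is the number
    of gap pages times the page size. The Python sketch below is illustrative
    only; the names are hypothetical, and the 256-page default is inferred
    from the 1MB figure for 4k pages rather than quoted from the patch.

```python
# Assumed default: 256 pages, which is 1 MiB with 4 KiB pages.
DEFAULT_GAP_PAGES = 256


def stack_guard_gap_bytes(page_size: int, gap_pages: int = DEFAULT_GAP_PAGES) -> int:
    """Size of the stack guard gap in bytes for a given page size."""
    return gap_pages * page_size


# The same page count yields a larger gap on systems with larger base pages.
for page_size in (4096, 16384, 65536):
    print(page_size, stack_guard_gap_bytes(page_size) // 1024, "KiB")
```

    A value passed through the stack_guard_gap option would take the place of
    `gap_pages` in this model.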
    
    Implementation wise, first delete all the old code for stack guard page:
    because although we could get away with accounting one extra page in a
    stack vma, accounting a larger gap can break userspace - case in point,
    a program run with "ulimit -S -v 20000" failed when the 1MB gap was
    counted for RLIMIT_AS; similar problems could come with RLIMIT_MLOCK
    and strict non-overcommit mode.
    
    Instead of keeping gap inside the stack vma, maintain the stack guard
    gap as a gap between vmas: using vm_start_gap() in place of vm_start
    (or vm_end_gap() in place of vm_end if VM_GROWSUP) in just those few
    places which need to respect the gap - mainly arch_get_unmapped_area(),
    and the vma tree's subtree_gap support for that.
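
    The idea of a gap between vmas can be modelled in a few lines of Python.
    This is a simplified toy model, not the kernel code: the `Vma` class and
    `fits_between()` are hypothetical, and the helpers only mirror the roles
    of vm_start_gap() and vm_end_gap() as described above.

```python
from dataclasses import dataclass
from typing import Optional

PAGE_SIZE = 4096                   # assumed 4 KiB base pages
STACK_GUARD_GAP = 256 * PAGE_SIZE  # 1 MiB gap under that assumption


@dataclass
class Vma:
    """Toy stand-in for a VMA covering [vm_start, vm_end)."""
    vm_start: int
    vm_end: int
    grows_down: bool = False  # models VM_GROWSDOWN
    grows_up: bool = False    # models VM_GROWSUP


def vm_start_gap(vma: Vma, gap: int = STACK_GUARD_GAP) -> int:
    """Start of the vma, extended downwards by the gap for a downward-growing stack."""
    if vma.grows_down:
        return max(0, vma.vm_start - gap)
    return vma.vm_start


def vm_end_gap(vma: Vma, gap: int = STACK_GUARD_GAP) -> int:
    """End of the vma, extended upwards by the gap for an upward-growing stack."""
    if vma.grows_up:
        return vma.vm_end + gap
    return vma.vm_end


def fits_between(addr: int, length: int,
                 prev: Optional[Vma], nxt: Optional[Vma]) -> bool:
    """Check whether [addr, addr + length) respects the gaps of its neighbours."""
    if prev is not None and addr < vm_end_gap(prev):
        return False
    if nxt is not None and addr + length > vm_start_gap(nxt):
        return False
    return True


stack = Vma(vm_start=0x7FFF_0000_0000, vm_end=0x7FFF_0002_0000, grows_down=True)
# A page placed directly below the stack falls inside the gap.
print(fits_between(stack.vm_start - PAGE_SIZE, PAGE_SIZE, None, stack))
# The same page placed just below the gap is acceptable.
print(fits_between(stack.vm_start - STACK_GUARD_GAP - PAGE_SIZE, PAGE_SIZE, None, stack))
```

    Keeping the gap outside the stack vma means the vma's own size, and hence
    accounting such as RLIMIT_AS, is unaffected by the gap.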
    
    Original-patch-by: Oleg Nesterov <oleg@redhat.com>
    Original-patch-by: Michal Hocko <mhocko@suse.com>
    Signed-off-by: Hugh Dickins <hughd@google.com>
    Acked-by: Michal Hocko <mhocko@suse.com>
    Tested-by: Helge Deller <deller@gmx.de> # parisc
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Hugh Dickins authored and torvalds committed Jun 19, 2017
  3. Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel…

    …/git/arm/arm-soc
    
    Pull ARM SoC fixes from Olof Johansson:
     "Stream of fixes has slowed down, only a few this week:
    
       - Some DT fixes for Allwinner platforms, and addition of a clock to
         the R_CCU clock controller that had been missed.
    
       - A couple of small DT fixes for am335x-sl50"
    
    * tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
      arm64: allwinner: a64: Add PLL_PERIPH0 clock to the R_CCU
      ARM: sunxi: h3-h5: Add PLL_PERIPH0 clock to the R_CCU
      ARM: dts: am335x-sl50: Fix cannot claim requested pins for spi0
      ARM: dts: am335x-sl50: Fix card detect pin for mmc1
      arm64: allwinner: h5: Remove syslink to shared DTSI
      ARM: sunxi: h3/h5: fix the compatible of R_CCU
    torvalds committed Jun 19, 2017
  4. Merge tag 'sunxi-fixes-for-4.12' of https://git.kernel.org/pub/scm/li…

    …nux/kernel/git/sunxi/linux into fixes
    
    Allwinner fixes for 4.12
    
    A few fixes around the PRCM support that got in 4.12 with a wrong
    compatible, and a missing clock in the binding.
    
    * tag 'sunxi-fixes-for-4.12' of https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux:
      arm64: allwinner: a64: Add PLL_PERIPH0 clock to the R_CCU
      ARM: sunxi: h3-h5: Add PLL_PERIPH0 clock to the R_CCU
      arm64: allwinner: h5: Remove syslink to shared DTSI
      ARM: sunxi: h3/h5: fix the compatible of R_CCU
    
    Signed-off-by: Olof Johansson <olof@lixom.net>
    olofj committed Jun 19, 2017
  5. Merge tag 'omap-for-v4.12/fixes-sl50' of git://git.kernel.org/pub/scm…

    …/linux/kernel/git/tmlind/linux-omap into fixes
    
    Two fixes for am335x-sl50 to fix a boot time error
    for claiming SPI pins, and to fix a SDIO card detect
    pin for production version of the device.
    
    * tag 'omap-for-v4.12/fixes-sl50' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
      ARM: dts: am335x-sl50: Fix cannot claim requested pins for spi0
      ARM: dts: am335x-sl50: Fix card detect pin for mmc1
    
    Signed-off-by: Olof Johansson <olof@lixom.net>
    olofj committed Jun 19, 2017
  6. Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/gi…

    …t/mst/vhost
    
    Pull virtio bugfix from Michael Tsirkin:
     "It turns out balloon does not handle IOMMUs correctly. We should fix
      that at some point, for now let's just disable this configuration"
    
    * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
      virtio_balloon: disable VIOMMU support
    torvalds committed Jun 19, 2017
  7. Merge branch 'i2c/for-current-fixed' of git://git.kernel.org/pub/scm/…

    …linux/kernel/git/wsa/linux
    
    Pull i2c fixes from Wolfram Sang:
     "Two driver bugfixes"
    
    * 'i2c/for-current-fixed' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
      i2c: ismt: fix wrong device address when unmap the data buffer
      i2c: rcar: use correct length when unmapping DMA
    torvalds committed Jun 19, 2017
  8. Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upst…

    …ream-linus
    
    Pull MIPS fixes from Ralf Baechle:
    
     - Three highmem fixes:
        + Fixed mapping initialization
        + Adjust the pkmap location
        + Ensure we use at most one page for PTEs
    
     - Fix makefile dependencies for .its targets to depend on vmlinux
    
     - Fix reversed condition in BNEZC and JIALC software branch emulation
    
     - Only flush initialized flush_insn_slot to avoid NULL pointer
       dereference
    
     - perf: Remove incorrect odd/even counter handling for I6400
    
     - ftrace: Fix init functions tracing
    
    * 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
      MIPS: .its targets depend on vmlinux
      MIPS: Fix bnezc/jialc return address calculation
      MIPS: kprobes: flush_insn_slot should flush only if probe initialised
      MIPS: ftrace: fix init functions tracing
      MIPS: mm: adjust PKMAP location
      MIPS: highmem: ensure that we don't use more than one page for PTEs
      MIPS: mm: fixed mappings: correct initialisation
      MIPS: perf: Remove incorrect odd/even counter handling for I6400
    torvalds committed Jun 19, 2017

Commits on Jun 18, 2017

  1. virtio_balloon: disable VIOMMU support

    virtio balloon bypasses the DMA API entirely so does not support the
    VIOMMU right now.  It's not clear we need that support, for now let's
    just make sure we don't pretend to support it.
    
    Cc: stable@vger.kernel.org
    Cc: Wei Wang <wei.w.wang@intel.com>
    Fixes: 1a93769 ("virtio: new feature to detect IOMMU device quirk")
    Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    Acked-by: Jason Wang <jasowang@redhat.com>
    mstsirkin committed Jun 18, 2017
  2. Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/l…

    …inux/kernel/git/tip/tip
    
    Pull x86 fixes from Thomas Gleixner:
     "Two fixlets for x86:
    
       - Handle WARN_ONs proper with the new UD based WARN implementation
    
       - Disable 1G mappings when 2M mappings are disabled by kmemleak or
         debug_pagealloc. Otherwise 1G mappings might still be used,
         confusing the debug mechanisms"
    
    * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
      x86/mm: Disable 1GB direct mappings when disabling 2MB mappings
      x86/debug: Handle early WARN_ONs proper
    torvalds committed Jun 18, 2017
  3. Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/sc…

    …m/linux/kernel/git/tip/tip
    
    Pull timer fixes from Thomas Gleixner:
     "Three fixlets for timers:
    
       - Two hot-fixes for the alarmtimer based posix timers, which prevent
         a nasty DOS by self rescheduling timers. The proper cleanup of that
         mess is queued for 4.13
    
       - Make a function static"
    
    * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
      tick/broadcast: Make tick_broadcast_setup_oneshot() static
      alarmtimer: Rate limit periodic intervals
      alarmtimer: Prevent overflow of relative timers
    torvalds committed Jun 18, 2017
  4. Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm…

    …/linux/kernel/git/tip/tip
    
    Pull scheduler fixes from Thomas Gleixner:
     "Two small fixes for the scheduler core:
    
       - Use the proper switch_mm() variant in idle_task_exit() because that
         code is not called with interrupts disabled.
    
       - Fix a confusing typo in a printk"
    
    * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
      sched/core: Idle_task_exit() shouldn't use switch_mm_irqs_off()
      sched/fair: Fix typo in printk message
    torvalds committed Jun 18, 2017
  5. Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/…

    …linux/kernel/git/tip/tip
    
    Pull perf fixes from Thomas Gleixner:
     "Three fixes for the perf user space side:
    
       - Fix the probing of precise_ip level, which got broken recently for
         x86.
    
       - Unbreak the ARCH=x86_64 build
    
       - Report module before trying to unwind into the module code, which
         avoids broken stack frames displayed"
    
    * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
      perf unwind: Report module before querying isactivation in dwfl unwind
      perf tools: Fix build with ARCH=x86_64
      perf evsel: Fix probing of precise_ip level for default cycles event
    torvalds committed Jun 18, 2017
  6. Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/l…

    …inux/kernel/git/tip/tip
    
    Pull irq fix from Thomas Gleixner:
     "Add a missing resource release to an error path"
    
    * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
      genirq: Release resources in __setup_irq() error path
    torvalds committed Jun 18, 2017
  7. Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/…

    …linux/kernel/git/tip/tip
    
    Pull objtool fix from Thomas Gleixner:
     "A single fix which adds fortify_panic to the list of no return
      functions"
    
    * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
      objtool: Add fortify_panic as __noreturn function
    torvalds committed Jun 18, 2017

Commits on Jun 17, 2017

  1. Merge tag 'led_fixes_for_4.12-rc6' of git://git.kernel.org/pub/scm/li…

    …nux/kernel/git/j.anaszewski/linux-leds
    
    Pull LED fixes from Jacek Anaszewski:
     "Two LED fixes:
    
       - fix signal source assignment for leds-bcm6328
    
       - revert patch that intended to fix LED behavior on suspend but it
         had a side effect preventing suspend at all due to uevent being
         sent on trigger removal"
    
    * tag 'led_fixes_for_4.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds:
      Revert "leds: handle suspend/resume in heartbeat trigger"
      leds: bcm6328: fix signal source assignment for leds 4 to 7
    torvalds committed Jun 17, 2017
  2. Merge tag 'usb-4.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel…

    …/git/gregkh/usb
    
    Pull USB fixes from Greg KH:
     "Here are some small gadget and xhci USB fixes for 4.12-rc6.
    
      Nothing major, but one of the gadget patches does fix a reported oops,
      and the xhci ones resolve reported problems. All have been in
      linux-next with no reported issues"
    
    * tag 'usb-4.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
      USB: gadgetfs, dummy-hcd, net2280: fix locking for callbacks
      usb: xhci: ASMedia ASM1042A chipset need shorts TX quirk
      usb: xhci: Fix USB 3.1 supported protocol parsing
      USB: gadget: fix GPF in gadgetfs
      usb: gadget: composite: make sure to reactivate function on unbind
    torvalds committed Jun 17, 2017
  3. Merge tag 'staging-4.12-rc6' of git://git.kernel.org/pub/scm/linux/ke…

    …rnel/git/gregkh/staging
    
    Pull staging and IIO fixes from Greg KH:
     "Here are some small staging and IIO driver fixes for 4.12-rc6.
    
      Nothing huge, just a few small driver fixes for reported issues. All
      have been in linux-next with no reported issues"
    
    * tag 'staging-4.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
      Staging: rtl8723bs: fix an error code in isFileReadable()
      iio: buffer-dmaengine: Add missing header buffer_impl.h
      iio: buffer-dma: Add missing header buffer_impl.h
      iio: adc: meson-saradc: fix potential crash in meson_sar_adc_clear_fifo
      iio: adc: mxs-lradc: Fix return value check in mxs_lradc_adc_probe()
      iio: imu: inv_mpu6050: add accel lpf setting for chip >= MPU6500
      staging: iio: ad7152: Fix deadlock in ad7152_write_raw_samp_freq()
    torvalds committed Jun 17, 2017
  4. Merge tag 'ceph-for-4.12-rc6' of git://github.com/ceph/ceph-client

    Pull ceph fixes from Ilya Dryomov:
     "A fix for an old ceph ->fh_to_* bug from Luis and two timestamp fixups
      from Zheng, prompted by the ongoing y2038 work"
    
    * tag 'ceph-for-4.12-rc6' of git://github.com/ceph/ceph-client:
      ceph: unify inode i_ctime update
      ceph: use current_kernel_time() to get request time stamp
      ceph: check i_nlink while converting a file handle to dentry
    torvalds committed Jun 17, 2017
  5. Merge tag 'xfs-4.12-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/x…

    …fs-linux
    
    Pull xfs fix from Darrick Wong:
     "One more bugfix for you for 4.12-rc6 to fix something that came up in
      an earlier rc:
    
       - Fix some bogus ASSERT failures on CONFIG_SMP=n and CONFIG_XFS_DEBUG=y"
    
    * tag 'xfs-4.12-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
      xfs: fix spurious spin_is_locked() assert failures on non-smp kernels
    torvalds committed Jun 17, 2017
  6. Merge branch 'ufs-fixes' of git://git.kernel.org/pub/scm/linux/kernel…

    …/git/viro/vfs
    
    Pull ufs fixes from Al Viro:
     "Fix assorted ufs bugs: a couple of deadlocks, fs corruption in
      truncate(), oopsen on tail unpacking and truncate when racing with
      vmscan, mild fs corruption (free blocks stats summary buggered, *BSD
      fsck would complain and fix), several instances of broken logics
      around reserved blocks (starting with "check almost never triggers
      when it should" and then there are issues with sufficiently large
      UFS2)"
    
    [ Note: ufs hasn't gotten any loving in a long time, because nobody
      really seems to use it. These ufs fixes are triggered by people
      actually caring now, not some sudden influx of new bugs.  - Linus ]
    
    * 'ufs-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
      ufs_truncate_blocks(): fix the case when size is in the last direct block
      ufs: more deadlock prevention on tail unpacking
      ufs: avoid grabbing ->truncate_mutex if possible
      ufs_get_locked_page(): make sure we have buffer_heads
      ufs: fix s_size/s_dsize users
      ufs: fix reserved blocks check
      ufs: make ufs_freespace() return signed
      ufs: fix logics in "ufs: make fsck -f happy"
    torvalds committed Jun 17, 2017
  7. Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel…

    …/git/viro/vfs
    
    Pull vfs fixes from Al Viro:
     "A couple of fixes; a leak in mntns_install() caught by Andrei (this
      cycle regression) + d_invalidate() softlockup fix - that had been
      reported by a bunch of people lately, but the problem is pretty old"
    
    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
      fs: don't forget to put old mntns in mntns_install
      Hang/soft lockup in d_invalidate with simultaneous calls
    torvalds committed Jun 17, 2017

Commits on Jun 16, 2017

  1. Merge tag 'pci-v4.12-fixes-2' of git://git.kernel.org/pub/scm/linux/k…

    …ernel/git/helgaas/pci
    
    Pull PCI fixes from Bjorn Helgaas:
    
     - fix another PCI_ENDPOINT build error (merged for v4.12)
    
     - fix error codes added to config accessors for v4.12
    
    * tag 'pci-v4.12-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
      PCI: endpoint: Select CRC32 to fix test build error
      PCI: Make error code types consistent in pci_{read,write}_config_*
    torvalds committed Jun 16, 2017
  2. Merge tag 'fbdev-v4.12-rc6' of git://github.com/bzolnier/linux

    Pull fbdev fixes from Bartlomiej Zolnierkiewicz:
    
     - fix udlfb driver to stop spamming logs (Mike Gerow)
    
     - add missing endianness conversions in smscufx & udlfb drivers (Johan
       Hovold)
    
     - fix few gcc warnings/errors (Arnd Bergmann)
    
    * tag 'fbdev-v4.12-rc6' of git://github.com/bzolnier/linux:
      video: fbdev: udlfb: drop log level for blanking
      video: fbdev: via: remove possibly unused variables
      video: fbdev: add missing USB-descriptor endianness conversions
      video: fbdev: avoid int-in-bool-context warning
    torvalds committed Jun 16, 2017
  3. Merge branch 'akpm' (patches from Andrew)

    Merge misc fixes from Andrew Morton:
     "5 fixes"
    
    * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
      mm: correct the comment when reclaimed pages exceed the scanned pages
      userfaultfd: shmem: handle coredumping in handle_userfault()
      mm: numa: avoid waiting on freed migrated pages
      swap: cond_resched in swap_cgroup_prepare()
      mm/memory-failure.c: use compound_head() flags for huge pages
    torvalds committed Jun 16, 2017
  4. mm: correct the comment when reclaimed pages exceed the scanned pages

    Commit e1587a4 ("mm: vmpressure: fix sending wrong events on
    underflow") declared that reclaimed pages exceed the scanned pages due
    to the thp reclaim.
    
    That is incorrect because a THP will be split into normal pages and
    looped over again, which increments the scanned pages count.
    
    [akpm@linux-foundation.org: tweak comment text]
    Link: http://lkml.kernel.org/r/1496824266-25235-1-git-send-email-zhongjiang@huawei.com
    Signed-off-by: zhongjiang <zhongjiang@huawei.com>
    Acked-by: Minchan Kim <minchan@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    xiongzhongjiang authored and torvalds committed Jun 16, 2017
  5. userfaultfd: shmem: handle coredumping in handle_userfault()

    Anon and hugetlbfs handle FOLL_DUMP set by get_dump_page() internally to
    __get_user_pages().
    
    shmem, by contrast, has no special FOLL_DUMP handling there, so
    handle_mm_fault() is invoked without mmap_sem and ends up calling
    handle_userfault(), which does not expect to be invoked without
    mmap_sem held.
    
    This makes handle_userfault() fail immediately if invoked through
    shmem_vm_ops->fault during coredumping and solves the problem.
    
    The side effect is a BUG_ON with no lock held, triggered by the
    coredumping process, which exits.  Only 4.11 is affected; before 4.11,
    anon memory holes are skipped in __get_user_pages by checking FOLL_DUMP
    explicitly against empty pagetables (mm/gup.c:no_page_table()).
    
    It's zero cost as we already had a check for current->flags to prevent
    futex from triggering userfaults during exit (PF_EXITING).
    
    Link: http://lkml.kernel.org/r/20170615214838.27429-1-aarcange@redhat.com
    Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
    Reported-by: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
    Cc: <stable@vger.kernel.org>	[4.11+]
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    aagit authored and torvalds committed Jun 16, 2017
  6. mm: numa: avoid waiting on freed migrated pages

    In do_huge_pmd_numa_page(), we attempt to handle a migrating thp pmd by
    waiting until the pmd is unlocked before we return and retry.  However,
    we can race with migrate_misplaced_transhuge_page():
    
        // do_huge_pmd_numa_page                // migrate_misplaced_transhuge_page()
        // Holds 0 refs on page                 // Holds 2 refs on page
    
        vmf->ptl = pmd_lock(vma->vm_mm, vmf->pmd);
        /* ... */
        if (pmd_trans_migrating(*vmf->pmd)) {
                page = pmd_page(*vmf->pmd);
                spin_unlock(vmf->ptl);
                                                ptl = pmd_lock(mm, pmd);
                                                if (page_count(page) != 2) {
                                                        /* roll back */
                                                }
                                                /* ... */
                                                mlock_migrate_page(new_page, page);
                                                /* ... */
                                                spin_unlock(ptl);
                                                put_page(page);
                                                put_page(page); // page freed here
                wait_on_page_locked(page);
                goto out;
        }
    
    This can result in the freed page having its waiters flag set
    unexpectedly, which trips the PAGE_FLAGS_CHECK_AT_PREP checks in the
    page alloc/free functions.  This has been observed on arm64 KVM guests.
    
    We can avoid this by having do_huge_pmd_numa_page() take a reference on
    the page before dropping the pmd lock, mirroring what we do in
    __migration_entry_wait().
    
    When we hit the race, migrate_misplaced_transhuge_page() will see the
    reference and abort the migration, as it may do today in other cases.
    
    Fixes: b891663 ("mm: Prevent parallel splits during THP migration")
    Link: http://lkml.kernel.org/r/1497349722-6731-2-git-send-email-will.deacon@arm.com
    Signed-off-by: Mark Rutland <mark.rutland@arm.com>
    Signed-off-by: Will Deacon <will.deacon@arm.com>
    Acked-by: Steve Capper <steve.capper@arm.com>
    Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
    Acked-by: Vlastimil Babka <vbabka@suse.cz>
    Cc: Mel Gorman <mgorman@suse.de>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Mark Rutland authored and torvalds committed Jun 16, 2017
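
The fix described in the commit above (take a reference before dropping the pmd lock, so the migration side sees an unexpected count and rolls back) can be sketched as a small, single-threaded userspace model. This is only an illustration under stated assumptions: `struct model_page` and every `model_*` function are hypothetical stand-ins, not kernel APIs, and the interleaving is replayed sequentially rather than with real locks or threads.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a page: a reference count and a "freed" marker. */
struct model_page {
    int refcount;
    bool freed;
};

/* Take a reference unless the count has already dropped to zero. */
static bool model_get_page_unless_zero(struct model_page *p)
{
    if (p->refcount == 0)
        return false;
    p->refcount++;
    return true;
}

/* Drop a reference; the page counts as freed when the last one goes. */
static void model_put_page(struct model_page *p)
{
    assert(p->refcount > 0);
    if (--p->refcount == 0)
        p->freed = true;
}

/*
 * Migration side: it holds two references and proceeds only if it sees
 * exactly those two. Any extra reference makes it roll back.
 */
static bool model_try_migrate(struct model_page *p)
{
    if (p->refcount != 2)
        return false;           /* roll back */
    model_put_page(p);
    model_put_page(p);          /* page freed here */
    return true;
}

/*
 * Fault side, replaying the interleaving from the commit message.
 * Returns true if the page is still live at the point where the fault
 * path would wait on it.
 */
static bool model_fault_sees_live_page(struct model_page *p, bool pin_first)
{
    bool pinned = false;
    bool live;

    /* With the fix: pin the page while the pmd lock would still be held. */
    if (pin_first) {
        pinned = model_get_page_unless_zero(p);
        if (!pinned)
            return true;        /* page already gone: never touch it */
    }

    /* The pmd lock is dropped here and migration runs in the gap. */
    (void)model_try_migrate(p);

    /* This is where the fault path would wait on the page lock. */
    live = !p->freed;

    if (pinned)
        model_put_page(p);
    return live;
}
```

Starting from a count of 2, the unpinned path lets the modelled migration free the page before the wait, while the pinned path raises the count to 3, so the modelled migration rolls back and the page stays live.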
  7. swap: cond_resched in swap_cgroup_prepare()

    I saw need_resched() warnings when swapping on a large swapfile (TBs)
    because continuously allocating many pages in swap_cgroup_prepare() took
    too long.
    
    We already cond_resched when freeing pages in swap_cgroup_swapoff().  Do
    the same for the page allocation.
    
    Link: http://lkml.kernel.org/r/20170604200109.17606-1-yuzhao@google.com
    Signed-off-by: Yu Zhao <yuzhao@google.com>
    Acked-by: Michal Hocko <mhocko@suse.com>
    Acked-by: Vladimir Davydov <vdavydov.dev@gmail.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Yu Zhao authored and torvalds committed Jun 16, 2017
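
As a rough userspace analogy for the change described in the commit above, the sketch below allocates a long table of page-sized buffers and yields the CPU at regular intervals inside the loop. It is not the kernel code: `sched_yield()` stands in for `cond_resched()`, and `MODEL_BATCH`, `MODEL_PAGE_SIZE` and the `model_*` functions are assumptions made for illustration only.

```c
#include <assert.h>
#include <sched.h>
#include <stdlib.h>

#define MODEL_PAGE_SIZE 4096u   /* assumed buffer size, for illustration */
#define MODEL_BATCH     32u     /* assumed yield interval, for illustration */

/* Free every buffer in the table, then the table itself. */
static void model_free_table(void **pages, size_t n)
{
    if (!pages)
        return;
    for (size_t i = 0; i < n; i++)
        free(pages[i]);         /* free(NULL) is a no-op */
    free(pages);
}

/*
 * Allocate n zeroed buffers. Every MODEL_BATCH iterations the loop
 * offers the CPU to other runnable tasks, so one very long allocation
 * run does not monopolise it. *yields reports how often that happened.
 */
static void **model_prepare_table(size_t n, size_t *yields)
{
    void **pages;

    assert(yields != NULL);
    *yields = 0;

    pages = calloc(n, sizeof(*pages));
    if (!pages)
        return NULL;

    for (size_t idx = 0; idx < n; idx++) {
        pages[idx] = calloc(1, MODEL_PAGE_SIZE);
        if (!pages[idx]) {
            model_free_table(pages, n);
            return NULL;
        }
        if (idx % MODEL_BATCH == 0) {
            sched_yield();      /* userspace stand-in for cond_resched() */
            (*yields)++;
        }
    }
    return pages;
}
```

Yielding once per batch rather than on every iteration keeps the overhead small while still bounding how long the loop runs without a scheduling point.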
  8. mm/memory-failure.c: use compound_head() flags for huge pages

    memory_failure() chooses a recovery action function based on the page
    flags.  For huge pages it uses the tail page flags which don't have
    anything interesting set, resulting in:
    
    > Memory failure: 0x9be3b4: Unknown page state
    > Memory failure: 0x9be3b4: recovery action for unknown page: Failed
    
    Instead, save a copy of the head page's flags if this is a huge page.
    This means that if there are no relevant flags for this tail page, we
    use the head page's flags instead.  This results in the me_huge_page()
    recovery action being called:
    
    > Memory failure: 0x9b7969: recovery action for huge page: Delayed
    
    For hugepages that have not yet been allocated, this allows the hugepage
    to be dequeued.
    
    Fixes: 524fca1 ("HWPOISON: fix misjudgement of page_action() for errors on mlocked pages")
    Link: http://lkml.kernel.org/r/20170524130204.21845-1-james.morse@arm.com
    Signed-off-by: James Morse <james.morse@arm.com>
    Tested-by: Punit Agrawal <punit.agrawal@arm.com>
    Acked-by: Punit Agrawal <punit.agrawal@arm.com>
    Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    James Morse authored and torvalds committed Jun 16, 2017
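
The flag fallback described in the commit above can be sketched as a small userspace model: look up a recovery action from the page's own flags first and, if nothing matches, retry with the saved copy of the head page's flags. The flag bits, the state table and all `model_*` names below are hypothetical and chosen for illustration; they do not reproduce the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bits; not the kernel's values. */
#define MODEL_PG_HUGE_HEAD (1UL << 0)
#define MODEL_PG_LRU       (1UL << 1)

struct model_page {
    unsigned long flags;
    const struct model_page *head;  /* points to itself for a non-tail page */
};

struct model_state {
    unsigned long mask;             /* 0 marks the catch-all entry */
    const char *name;
};

static const struct model_state model_states[] = {
    { MODEL_PG_HUGE_HEAD, "huge page" },
    { MODEL_PG_LRU,       "LRU page" },
    { 0,                  "unknown page" },
};

/* Return the first state whose mask matches, else the catch-all entry. */
static const struct model_state *model_lookup(unsigned long flags)
{
    const struct model_state *s = model_states;

    while (s->mask && !(flags & s->mask))
        s++;
    return s;
}

/*
 * Try the page's own flags first. If only the catch-all matched, fall
 * back to the saved copy of the head page's flags.
 */
static const char *model_identify(const struct model_page *p)
{
    unsigned long saved_head_flags;
    const struct model_state *s;

    assert(p != NULL && p->head != NULL);
    saved_head_flags = p->head->flags;

    s = model_lookup(p->flags);
    if (!s->mask)
        s = model_lookup(saved_head_flags);
    return s->name;
}
```

In this model a tail page with no flags of its own is reported as "huge page" once its head carries `MODEL_PG_HUGE_HEAD`, instead of falling through to "unknown page".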
  9. Merge tag 'powerpc-4.12-6' of git://git.kernel.org/pub/scm/linux/kern…

    …el/git/powerpc/linux
    
    Pull powerpc fixes from Michael Ellerman:
     "Three small fixes for recently merged code:
    
       - remove a spurious WARN_ON when a PCI device has no of_node; it's
         allowed in some circumstances for there to be no of_node.
    
       - fix the offset for store EOI MMIOs in the XIVE interrupt
         controller.
    
       - fix non-const WARN_ONs which were becoming BUGs due to them losing
         BUGFLAG_WARNING in a recent cleanup patch.
    
      Thanks to: Alexey Kardashevskiy, Alistair Popple, Benjamin
      Herrenschmidt"
    
    * tag 'powerpc-4.12-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
      powerpc/debug: Add missing warn flag to WARN_ON's non-builtin path
      powerpc/xive: Fix offset for store EOI MMIOs
      powerpc/npu-dma: Remove spurious WARN_ON when a PCI device has no of_node
    torvalds committed Jun 16, 2017