Nick-Terrell/A…

Commits on Jul 23, 2017

  1. squashfs: Add zstd support

    Add zstd compression and decompression support to SquashFS. zstd is a
    great fit for SquashFS because it can compress at ratios approaching xz,
    while decompressing twice as fast as zlib. For SquashFS in particular,
    it can decompress as fast as lzo and lz4. It also has the flexibility
    to turn down the compression ratio for faster compression times.
    
    The compression benchmark is run on the file tree from the SquashFS archive
    found in ubuntu-16.10-desktop-amd64.iso [1]. It uses `mksquashfs` with the
    default block size (128 KB) and various compression algorithms/levels.
    xz and zstd are also benchmarked with 256 KB blocks. The decompression
    benchmark times how long it takes to `tar` the file tree into `/dev/null`.
    See the benchmark file in the upstream zstd source repository located under
    `contrib/linux-kernel/squashfs-benchmark.sh` [2] for details.
    
    I ran the benchmarks on an Ubuntu 14.04 VM with 2 cores and 4 GiB of RAM.
    The VM is running on a MacBook Pro with a 3.1 GHz Intel Core i7 processor,
    16 GB of RAM, and an SSD.
    
    | Method         | Ratio | Compression MB/s | Decompression MB/s |
    |----------------|-------|------------------|--------------------|
    | gzip           |  2.92 |               15 |                128 |
    | lzo            |  2.64 |              9.5 |                217 |
    | lz4            |  2.12 |               94 |                218 |
    | xz             |  3.43 |              5.5 |                 35 |
    | xz 256 KB      |  3.53 |              5.4 |                 40 |
    | zstd 1         |  2.71 |               96 |                210 |
    | zstd 5         |  2.93 |               69 |                198 |
    | zstd 10        |  3.01 |               41 |                225 |
    | zstd 15        |  3.13 |             11.4 |                224 |
    | zstd 16 256 KB |  3.24 |              8.1 |                210 |
    
    This patch was written by Sean Purcell <me@seanp.xyz>, but I will be
    taking over the submission process.
    
    [1] http://releases.ubuntu.com/16.10/
    [2] https://github.com/facebook/zstd/blob/dev/contrib/linux-kernel/squashfs-benchmark.sh
    
    zstd source repository: https://github.com/facebook/zstd
    
    Cc: Sean Purcell <me@seanp.xyz>
    Signed-off-by: Nick Terrell <terrelln@fb.com>
    terrelln authored and fengguang committed Jul 23, 2017
  2. btrfs: Add zstd support

    Add zstd compression and decompression support to BtrFS. zstd at its
    fastest level compresses almost as well as zlib, while offering much
    faster compression and decompression, approaching lzo speeds.
    
    I benchmarked btrfs with zstd compression against no compression, lzo
    compression, and zlib compression. I benchmarked two scenarios: copying
    a set of files to btrfs and then reading the files; and copying a tarball
    to btrfs, extracting it to btrfs, and then reading the extracted files.
    After every operation, I call `sync` and include the sync time.
    Between every pair of operations I unmount and remount the filesystem
    to avoid caching. The benchmark files can be found in the upstream
    zstd source repository under
    `contrib/linux-kernel/{btrfs-benchmark.sh,btrfs-extract-benchmark.sh}`
    [1] [2].
    
    I ran the benchmarks on an Ubuntu 14.04 VM with 2 cores and 4 GiB of RAM.
    The VM is running on a MacBook Pro with a 3.1 GHz Intel Core i7 processor,
    16 GB of RAM, and an SSD.
    
    The first compression benchmark is copying 10 copies of the unzipped
    Silesia corpus [3] into a BtrFS filesystem mounted with
    `-o compress-force=Method`. The decompression benchmark times how long
    it takes to `tar` all 10 copies into `/dev/null`. The compression ratio is
    measured by comparing the output of `df` and `du`. See the benchmark file
    [1] for details. I benchmarked multiple zstd compression levels, although
    the patch uses zstd level 1.
    
    | Method  | Ratio | Compression MB/s | Decompression MB/s  |
    |---------|-------|------------------|---------------------|
    | None    |  0.99 |              504 |                 686 |
    | lzo     |  1.66 |              398 |                 442 |
    | zlib    |  2.58 |               65 |                 241 |
    | zstd 1  |  2.57 |              260 |                 383 |
    | zstd 3  |  2.71 |              174 |                 408 |
    | zstd 6  |  2.87 |               70 |                 398 |
    | zstd 9  |  2.92 |               43 |                 406 |
    | zstd 12 |  2.93 |               21 |                 408 |
    | zstd 15 |  3.01 |               11 |                 354 |
    
    The next benchmark first copies `linux-4.11.6.tar` [4] to btrfs. Then it
    measures the compression ratio, extracts the tar, and deletes the tar.
    Then it measures the compression ratio again, and `tar`s the extracted
    files into `/dev/null`. See the benchmark file [2] for details.
    
    | Method | Tar Ratio | Extract Ratio | Copy (s) | Extract (s)| Read (s) |
    |--------|-----------|---------------|----------|------------|----------|
    | None   |      0.97 |          0.78 |    0.981 |      5.501 |    8.807 |
    | lzo    |      2.06 |          1.38 |    1.631 |      8.458 |    8.585 |
    | zlib   |      3.40 |          1.86 |    7.750 |     21.544 |   11.744 |
    | zstd 1 |      3.57 |          1.85 |    2.579 |     11.479 |    9.389 |
    
    [1] https://github.com/facebook/zstd/blob/dev/contrib/linux-kernel/btrfs-benchmark.sh
    [2] https://github.com/facebook/zstd/blob/dev/contrib/linux-kernel/btrfs-extract-benchmark.sh
    [3] http://sun.aei.polsl.pl/~sdeor/index.php?page=silesia
    [4] https://cdn.kernel.org/pub/linux/kernel/v4.x/linux-4.11.6.tar.xz
    
    zstd source repository: https://github.com/facebook/zstd
    
    Signed-off-by: Nick Terrell <terrelln@fb.com>
    terrelln authored and fengguang committed Jul 23, 2017
  3. lib: Add zstd modules

    Add zstd compression and decompression kernel modules.
    zstd offers a wide variety of compression speed and quality trade-offs.
    It can compress at speeds approaching lz4, and quality approaching lzma.
    zstd decompresses at speeds more than twice as fast as zlib, and
    decompression speed remains roughly the same across all compression levels.
    
    The code was ported from the upstream zstd source repository. The
    `linux/zstd.h` header was modified to match linux kernel style.
    The cross-platform and allocation code was stripped out. Instead zstd
    requires the caller to pass a preallocated workspace. The source files
    were clang-formatted [1] to match the Linux Kernel style as much as
    possible. Otherwise, the code was unmodified. We would like to avoid
    as much further manual modification to the source code as possible, so it
    will be easier to keep the kernel zstd up to date.
    
    I benchmarked zstd compression as a special character device. I ran zstd
    and zlib compression at several levels, as well as performing no
    compression, which measures the time spent copying the data to kernel space.
    Data is passed to the compressor 4096 B at a time. The benchmark file is
    located in the upstream zstd source repository under
    `contrib/linux-kernel/zstd_compress_test.c` [2].
    
    I ran the benchmarks on an Ubuntu 14.04 VM with 2 cores and 4 GiB of RAM.
    The VM is running on a MacBook Pro with a 3.1 GHz Intel Core i7 processor,
    16 GB of RAM, and an SSD. I benchmarked using `silesia.tar` [3], which is
    211,988,480 B large. Run the following commands for the benchmark:
    
        sudo modprobe zstd_compress_test
        sudo mknod zstd_compress_test c 245 0
        sudo cp silesia.tar zstd_compress_test
    
    The time is reported by the time of the userland `cp`.
    The MB/s is computed with
    
        211,988,480 B / time(method)
    
    which includes the time to copy from userland.
    The Adjusted MB/s is computed with
    
        211,988,480 B / (time(method) - time(none)).
    
    The memory reported is the amount of memory the compressor requests.
    
    | Method   | Size (B)  | Time (s) | Ratio | MB/s    | Adj MB/s | Mem (MB) |
    |----------|-----------|----------|-------|---------|----------|----------|
    | none     | 211988480 |    0.100 |     1 | 2119.88 |        - |        - |
    | zstd -1  |  73645762 |    1.044 | 2.878 |  203.05 |   224.56 |     1.23 |
    | zstd -3  |  66988878 |    1.761 | 3.165 |  120.38 |   127.63 |     2.47 |
    | zstd -5  |  65001259 |    2.563 | 3.261 |   82.71 |    86.07 |     2.86 |
    | zstd -10 |  60165346 |   13.242 | 3.523 |   16.01 |    16.13 |    13.22 |
    | zstd -15 |  58009756 |   47.601 | 3.654 |    4.45 |     4.46 |    21.61 |
    | zstd -19 |  54014593 |  102.835 | 3.925 |    2.06 |     2.06 |    60.15 |
    | zlib -1  |  77260026 |    2.895 | 2.744 |   73.23 |    75.85 |     0.27 |
    | zlib -3  |  72972206 |    4.116 | 2.905 |   51.50 |    52.79 |     0.27 |
    | zlib -6  |  68190360 |    9.633 | 3.109 |   22.01 |    22.24 |     0.27 |
    | zlib -9  |  67613382 |   22.554 | 3.135 |    9.40 |     9.44 |     0.27 |
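    
    As a check on the table above, the MB/s and Adj MB/s columns can be
    recomputed from the input size (211,988,480 B) and the Time (s) column.
    The Python below is a minimal sketch, not part of the benchmark: the
    helper names are illustrative, and 1 MB is assumed to mean 10^6 B.

```python
# Recompute throughput figures from the benchmark table.
# Helper names are illustrative; they do not come from the benchmark source.
INPUT_SIZE_B = 211_988_480  # size of silesia.tar, as stated in the text
BASELINE_S = 0.100          # time of the "none" row (copy to kernel space only)


def mb_per_s(time_s):
    """Throughput including the userland-to-kernel copy (assumes 1 MB = 10**6 B)."""
    return INPUT_SIZE_B / time_s / 1e6


def adjusted_mb_per_s(time_s, baseline_s=BASELINE_S):
    """Throughput with the copy-only baseline time subtracted."""
    return INPUT_SIZE_B / (time_s - baseline_s) / 1e6


# zstd -1 took 1.044 s in the table.
print(f"{mb_per_s(1.044):.2f} {adjusted_mb_per_s(1.044):.2f}")  # prints: 203.05 224.56
```

    The adjusted figure isolates the compressor by removing the time that
    the "none" run spends only on copying data into kernel space.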
    
    I benchmarked zstd decompression using the same method on the same machine.
    The benchmark file is located in the upstream zstd repo under
    `contrib/linux-kernel/zstd_decompress_test.c` [4]. The memory reported is
    the amount of memory required to decompress data compressed with the given
    compression level. If you know the maximum size of your input, you can
    reduce the memory usage of decompression irrespective of the compression
    level.
    
    | Method   | Time (s) | MB/s    | Adjusted MB/s | Memory (MB) |
    |----------|----------|---------|---------------|-------------|
    | none     |    0.025 | 8479.54 |             - |           - |
    | zstd -1  |    0.358 |  592.15 |        636.60 |        0.84 |
    | zstd -3  |    0.396 |  535.32 |        571.40 |        1.46 |
    | zstd -5  |    0.396 |  535.32 |        571.40 |        1.46 |
    | zstd -10 |    0.374 |  566.81 |        607.42 |        2.51 |
    | zstd -15 |    0.379 |  559.34 |        598.84 |        4.61 |
    | zstd -19 |    0.412 |  514.54 |        547.77 |        8.80 |
    | zlib -1  |    0.940 |  225.52 |        231.68 |        0.04 |
    | zlib -3  |    0.883 |  240.08 |        247.07 |        0.04 |
    | zlib -6  |    0.844 |  251.17 |        258.84 |        0.04 |
    | zlib -9  |    0.837 |  253.27 |        261.07 |        0.04 |
    
    Tested in userland using the test-suite in the zstd repo under
    `contrib/linux-kernel/test/UserlandTest.cpp` [5] by mocking the kernel
    functions. Fuzz tested using libfuzzer [6] with the fuzz harnesses under
    `contrib/linux-kernel/test/{RoundTripCrash.c,DecompressCrash.c}` [7] [8]
    with ASAN, UBSAN, and MSAN. Additionally, it was tested while testing the
    BtrFS and SquashFS patches coming next.
    
    [1] https://clang.llvm.org/docs/ClangFormat.html
    [2] https://github.com/facebook/zstd/blob/dev/contrib/linux-kernel/zstd_compress_test.c
    [3] http://sun.aei.polsl.pl/~sdeor/index.php?page=silesia
    [4] https://github.com/facebook/zstd/blob/dev/contrib/linux-kernel/zstd_decompress_test.c
    [5] https://github.com/facebook/zstd/blob/dev/contrib/linux-kernel/test/UserlandTest.cpp
    [6] http://llvm.org/docs/LibFuzzer.html
    [7] https://github.com/facebook/zstd/blob/dev/contrib/linux-kernel/test/RoundTripCrash.c
    [8] https://github.com/facebook/zstd/blob/dev/contrib/linux-kernel/test/DecompressCrash.c
    
    zstd source repository: https://github.com/facebook/zstd
    
    Signed-off-by: Nick Terrell <terrelln@fb.com>
    terrelln authored and fengguang committed Jul 23, 2017
  4. lib: Add xxhash module

    Adds xxhash kernel module with xxh32 and xxh64 hashes. xxhash is an
    extremely fast non-cryptographic hash algorithm for checksumming.
    The zstd compression and decompression modules added in the next patch
    require xxhash. I extracted it out from zstd since it is useful on its
    own. I copied the code from the upstream XXHash source repository and
    translated it into kernel style. I ran benchmarks and tests in the kernel
    and tests in userland.
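    
    For readers unfamiliar with xxhash, the 32-bit variant is small enough
    to sketch in full. The Python below follows the published XXH32
    algorithm as a reference illustration only; it is not the kernel code
    added by this patch, and the function names are chosen here for
    illustration.

```python
import struct

# XXH32 constants from the published algorithm description.
PRIME32_1 = 0x9E3779B1
PRIME32_2 = 0x85EBCA77
PRIME32_3 = 0xC2B2AE3D
PRIME32_4 = 0x27D4EB2F
PRIME32_5 = 0x165667B1
MASK32 = 0xFFFFFFFF


def rotl32(x, r):
    """Rotate a 32-bit value left by r bits."""
    return ((x << r) | (x >> (32 - r))) & MASK32


def xxh32_round(acc, lane):
    """Mix one little-endian 32-bit lane into an accumulator."""
    acc = (acc + lane * PRIME32_2) & MASK32
    return (rotl32(acc, 13) * PRIME32_1) & MASK32


def xxh32(data, seed=0):
    """Illustrative pure-Python XXH32 of a bytes object."""
    n = len(data)
    pos = 0
    if n >= 16:
        # Four accumulators consume the input in 16-byte stripes.
        v1 = (seed + PRIME32_1 + PRIME32_2) & MASK32
        v2 = (seed + PRIME32_2) & MASK32
        v3 = seed & MASK32
        v4 = (seed - PRIME32_1) & MASK32
        while pos + 16 <= n:
            l1, l2, l3, l4 = struct.unpack_from("<4I", data, pos)
            v1 = xxh32_round(v1, l1)
            v2 = xxh32_round(v2, l2)
            v3 = xxh32_round(v3, l3)
            v4 = xxh32_round(v4, l4)
            pos += 16
        h = (rotl32(v1, 1) + rotl32(v2, 7) + rotl32(v3, 12) + rotl32(v4, 18)) & MASK32
    else:
        h = (seed + PRIME32_5) & MASK32
    h = (h + n) & MASK32
    # Remaining 4-byte words, then remaining single bytes.
    while pos + 4 <= n:
        (lane,) = struct.unpack_from("<I", data, pos)
        h = (rotl32((h + lane * PRIME32_3) & MASK32, 17) * PRIME32_4) & MASK32
        pos += 4
    while pos < n:
        h = (rotl32((h + data[pos] * PRIME32_5) & MASK32, 11) * PRIME32_1) & MASK32
        pos += 1
    # Final avalanche spreads the remaining entropy across all bits.
    h ^= h >> 15
    h = (h * PRIME32_2) & MASK32
    h ^= h >> 13
    h = (h * PRIME32_3) & MASK32
    h ^= h >> 16
    return h
```

    The speed comes from the stripe loop: each 16-byte stripe costs only a
    handful of multiplies, adds, and rotates per accumulator.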
    
    I benchmarked xxhash as a special character device. I ran in four modes,
    no-op, xxh32, xxh64, and crc32. The no-op mode simply copies the data to
    kernel space and ignores it. The xxh32, xxh64, and crc32 modes compute
    hashes on the copied data. I also ran it with four different buffer sizes.
    The benchmark file is located in the upstream zstd source repository under
    `contrib/linux-kernel/xxhash_test.c` [1].
    
    I ran the benchmarks on an Ubuntu 14.04 VM with 2 cores and 4 GiB of RAM.
    The VM is running on a MacBook Pro with a 3.1 GHz Intel Core i7 processor,
    16 GB of RAM, and an SSD. I benchmarked using the file `filesystem.squashfs`
    from `ubuntu-16.10-desktop-amd64.iso`, which is 1,536,217,088 B large.
    Run the following commands for the benchmark:
    
        modprobe xxhash_test
        mknod xxhash_test c 245 0
        time cp filesystem.squashfs xxhash_test
    
    The time is reported by the time of the userland `cp`.
    The GB/s is computed with
    
        1,536,217,088 B / time(buffer size, hash)
    
    which includes the time to copy from userland.
    The Adjusted GB/s is computed with
    
        1,536,217,088 B / (time(buffer size, hash) - time(buffer size, none)).
    
    | Buffer Size (B) | Hash  | Time (s) | GB/s | Adjusted GB/s |
    |-----------------|-------|----------|------|---------------|
    |            1024 | none  |    0.408 | 3.77 |             - |
    |            1024 | xxh32 |    0.649 | 2.37 |          6.37 |
    |            1024 | xxh64 |    0.542 | 2.83 |         11.46 |
    |            1024 | crc32 |    1.290 | 1.19 |          1.74 |
    |            4096 | none  |    0.380 | 4.04 |             - |
    |            4096 | xxh32 |    0.645 | 2.38 |          5.79 |
    |            4096 | xxh64 |    0.500 | 3.07 |         12.80 |
    |            4096 | crc32 |    1.168 | 1.32 |          1.95 |
    |            8192 | none  |    0.351 | 4.38 |             - |
    |            8192 | xxh32 |    0.614 | 2.50 |          5.84 |
    |            8192 | xxh64 |    0.464 | 3.31 |         13.60 |
    |            8192 | crc32 |    1.163 | 1.32 |          1.89 |
    |           16384 | none  |    0.346 | 4.43 |             - |
    |           16384 | xxh32 |    0.590 | 2.60 |          6.30 |
    |           16384 | xxh64 |    0.466 | 3.30 |         12.80 |
    |           16384 | crc32 |    1.183 | 1.30 |          1.84 |
    
    Tested in userland using the test-suite in the zstd repo under
    `contrib/linux-kernel/test/XXHashUserlandTest.cpp` [2] by mocking the
    kernel functions. A line in each branch of every function in `xxhash.c`
    was commented out to ensure that the test-suite fails. Additionally
    tested while testing zstd and with SMHasher [3].
    
    [1] https://github.com/facebook/zstd/blob/dev/contrib/linux-kernel/xxhash_test.c
    [2] https://github.com/facebook/zstd/blob/dev/contrib/linux-kernel/test/XXHashUserlandTest.cpp
    [3] https://github.com/aappleby/smhasher
    
    zstd source repository: https://github.com/facebook/zstd
    XXHash source repository: https://github.com/cyan4973/xxhash
    
    Signed-off-by: Nick Terrell <terrelln@fb.com>
    terrelln authored and fengguang committed Jul 23, 2017

Commits on Jul 20, 2017

  1. x86: mark kprobe templates as character arrays, not single characters

    They really are, and the "take the address of a single character" makes
    the string fortification code unhappy (it believes that you can now only
    access one byte, rather than a byte range, and then raises errors for
    the memory copies going on in there).
    
    We could now remove a few 'addressof' operators (since arrays naturally
    degrade to pointers), but this is the minimal patch that just changes
    the C prototypes of those template arrays (the templates themselves are
    defined in inline asm).
    
    Reported-by: kernel test robot <xiaolong.ye@intel.com>
    Acked-and-tested-by: Masami Hiramatsu <mhiramat@kernel.org>
    Cc: Daniel Micay <danielmicay@gmail.com>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    torvalds committed Jul 20, 2017
  2. Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel…

    …/git/jack/linux-fs
    
    Pull misc filesystem fixes from Jan Kara:
     "Several ACL related fixes for ext2, reiserfs, and hfsplus.
    
      And also one minor isofs cleanup"
    
    * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
      hfsplus: Don't clear SGID when inheriting ACLs
      isofs: Fix off-by-one in 'session' mount option parsing
      reiserfs: preserve i_mode if __reiserfs_set_acl() fails
      ext2: preserve i_mode if ext2_set_acl() fails
      ext2: Don't clear SGID when inheriting ACLs
      reiserfs: Don't clear SGID when inheriting ACLs
    torvalds committed Jul 20, 2017
  3. Merge tag 'for-f2fs-v4.13-rc2' of git://git.kernel.org/pub/scm/linux/…

    …kernel/git/jaegeuk/f2fs
    
    Pull f2fs fixes from Jaegeuk Kim:
     "We've filed some bug fixes:
    
       - missing f2fs case in terms of stale SGID bit, introduced by Jan
    
       - build error for seq_file.h
    
       - avoid cpu lockup
    
       - wrong inode_unlock in error case"
    
    * tag 'for-f2fs-v4.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs:
      f2fs: avoid cpu lockup
      f2fs: include seq_file.h for sysfs.c
      f2fs: Don't clear SGID when inheriting ACLs
      f2fs: remove extra inode_unlock() in error path
    torvalds committed Jul 20, 2017
  4. Merge branch 'stable-4.13' of git://git.infradead.org/users/pcmoore/a…

    …udit
    
    Pull audit fix from Paul Moore:
     "A small audit fix, just a single line, to plug a memory leak in some
      audit error handling code"
    
    * 'stable-4.13' of git://git.infradead.org/users/pcmoore/audit:
      audit: fix memleak in auditd_send_unicast_skb.
    torvalds committed Jul 20, 2017
  5. Merge tag 'libnvdimm-fixes-4.13-rc2' of git://git.kernel.org/pub/scm/…

    …linux/kernel/git/nvdimm/nvdimm
    
    Pull libnvdimm fixes from Dan Williams:
     "A handful of small fixes for 4.13-rc2. Three of these fixes are tagged
      for -stable. They have all appeared in at least one -next release with
      no reported issues
    
       - Fix handling of media errors that span a sector
    
       - Fix support of multiple namespaces in a libnvdimm region being in
         device-dax mode
    
       - Clean up the machine check notifier properly when the nfit driver
         fails to register
    
       - Address a static analysis (smatch) report in device-dax"
    
    * tag 'libnvdimm-fixes-4.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
      device-dax: fix sysfs duplicate warnings
      MAINTAINERS: list drivers/acpi/nfit/ files for libnvdimm sub-system
      acpi/nfit: Fix memory corruption/Unregister mce decoder on failure
      device-dax: fix 'passing zero to ERR_PTR()' warning
      libnvdimm: fix badblock range handling of ARS range
    torvalds committed Jul 20, 2017
  6. Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel…

    …/git/jikos/hid
    
    Pull HID fixes from Jiri Kosina:
    
     - HID multitouch 4.12 regression fix from Dmitry Torokhov
    
     - error handling fix for HID++ driver from Gustavo A. R. Silva
    
    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
      HID: hid-logitech-hidpp: add NULL check on devm_kmemdup() return value
      HID: multitouch: do not blindly set EV_KEY or EV_ABS bits
    torvalds committed Jul 20, 2017
  7. HID: hid-logitech-hidpp: add NULL check on devm_kmemdup() return value

    Check return value from call to devm_kmemdup() in order to prevent a NULL
    pointer dereference.
    
    Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
    Reviewed-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
    Signed-off-by: Jiri Kosina <jkosina@suse.cz>
    GustavoARSilva authored and Jiri Kosina committed Jul 20, 2017

Commits on Jul 19, 2017

  1. llist: clang: introduce member_address_is_nonnull()

    Currently llist_for_each_entry() and llist_for_each_entry_safe() iterate
    until &pos->member != NULL.  But when building the kernel with Clang,
    the compiler assumes &pos->member cannot be NULL if the member's offset
    is greater than 0 (which would be equivalent to the object being
    non-contiguous in memory).  Therefore the loop condition is always true,
    and the loops become infinite.
    
    To work around this, introduce the member_address_is_nonnull() macro,
    which casts the object pointer to uintptr_t, thus letting the member
    pointer be NULL.
    
    Signed-off-by: Alexander Potapenko <glider@google.com>
    Tested-by: Sodagudi Prasad <psodagud@codeaurora.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    ramosian-glider authored and torvalds committed Jul 19, 2017
  2. Merge tag 'gcc-plugins-v4.13-rc2' of git://git.kernel.org/pub/scm/lin…

    …ux/kernel/git/kees/linux
    
    Pull structure randomization updates from Kees Cook:
     "Now that IPC and other changes have landed, enable manual markings for
      randstruct plugin, including the task_struct.
    
      This is the rest of what was staged in -next for the gcc-plugins, and
      comes in three patches, largest first:
    
       - mark "easy" structs with __randomize_layout
    
       - mark task_struct with an optional anonymous struct to isolate the
         __randomize_layout section
    
       - mark structs to opt _out_ of automated marking (which will come
         later)
    
      And, FWIW, this continues to pass allmodconfig (normal and patched to
      enable gcc-plugins) builds of x86_64, i386, arm64, arm, powerpc, and
      s390 for me"
    
    * tag 'gcc-plugins-v4.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
      randstruct: opt-out externally exposed function pointer structs
      task_struct: Allow randomized layout
      randstruct: Mark various structs for randomization
    torvalds committed Jul 19, 2017
  3. Merge tag 'ceph-for-4.13-rc2' of git://github.com/ceph/ceph-client

    Pull ceph fixes from Ilya Dryomov:
     "A number of small fixes for -rc1 Luminous changes plus a readdir race
      fix, marked for stable"
    
    * tag 'ceph-for-4.13-rc2' of git://github.com/ceph/ceph-client:
      libceph: potential NULL dereference in ceph_msg_data_create()
      ceph: fix race in concurrent readdir
      libceph: don't call encode_request_finish() on MOSDBackoff messages
      libceph: use alloc_pg_mapping() in __decode_pg_upmap_items()
      libceph: set -EINVAL in one place in crush_decode()
      libceph: NULL deref on osdmap_apply_incremental() error path
      libceph: fix old style declaration warnings
    torvalds committed Jul 19, 2017
  4. audit: fix memleak in auditd_send_unicast_skb.

    Found by a kmemleak report: auditd_send_unicast_skb() did not free the
    skb if rcu_dereference(auditd_conn) returned NULL.
    
    unreferenced object 0xffff88082568ce00 (size 256):
    comm "auditd", pid 1119, jiffies 4294708499
    backtrace:
    [<ffffffff8176166a>] kmemleak_alloc+0x4a/0xa0
    [<ffffffff8121820c>] kmem_cache_alloc_node+0xcc/0x210
    [<ffffffff8161b99d>] __alloc_skb+0x5d/0x290
    [<ffffffff8113c614>] audit_make_reply+0x54/0xd0
    [<ffffffff8113dfa7>] audit_receive_msg+0x967/0xd70
    ----------------
    (gdb) list *audit_receive_msg+0x967
    0xffffffff8113dff7 is in audit_receive_msg (kernel/audit.c:1133).
    1132    skb = audit_make_reply(0, AUDIT_REPLACE, 0,
                                    0, &pvnr, sizeof(pvnr));
    ---------------
    [<ffffffff8113e402>] audit_receive+0x52/0xa0
    [<ffffffff8166c561>] netlink_unicast+0x181/0x240
    [<ffffffff8166c8e2>] netlink_sendmsg+0x2c2/0x3b0
    [<ffffffff816112e8>] sock_sendmsg+0x38/0x50
    [<ffffffff816117a2>] SYSC_sendto+0x102/0x190
    [<ffffffff81612f4e>] SyS_sendto+0xe/0x10
    [<ffffffff8176d337>] entry_SYSCALL_64_fastpath+0x1a/0xa5
    [<ffffffffffffffff>] 0xffffffffffffffff
    
    Signed-off-by: Shu Wang <shuwang@redhat.com>
    Signed-off-by: Paul Moore <paul@paul-moore.com>
    Shu Wang authored and pcmoore committed Jul 19, 2017
  5. device-dax: fix sysfs duplicate warnings

    Fix warnings of the form...
    
         WARNING: CPU: 10 PID: 4983 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x62/0x80
         sysfs: cannot create duplicate filename '/class/dax/dax12.0'
         Call Trace:
          dump_stack+0x63/0x86
          __warn+0xcb/0xf0
          warn_slowpath_fmt+0x5a/0x80
          ? kernfs_path_from_node+0x4f/0x60
          sysfs_warn_dup+0x62/0x80
          sysfs_do_create_link_sd.isra.2+0x97/0xb0
          sysfs_create_link+0x25/0x40
          device_add+0x266/0x630
          devm_create_dax_dev+0x2cf/0x340 [dax]
          dax_pmem_probe+0x1f5/0x26e [dax_pmem]
          nvdimm_bus_probe+0x71/0x120
    
    ...by reusing the namespace id for the device-dax instance name.
    
    Now that we have decided that there will never be more than one
    device-dax instance per libnvdimm-namespace parent device [1], we can
    directly reuse the namespace ids. There are some possible follow-on
    cleanups, but those are saved for a later patch to simplify the -stable
    backport.
    
    [1]: https://lists.01.org/pipermail/linux-nvdimm/2016-December/008266.html
    
    Fixes: 98a29c3 ("libnvdimm, namespace: allow creation of multiple pmem...")
    Cc: Jeff Moyer <jmoyer@redhat.com>
    Cc: <stable@vger.kernel.org>
    Reported-by: Dariusz Dokupil <dariusz.dokupil@intel.com>
    Signed-off-by: Dan Williams <dan.j.williams@intel.com>
    djbw committed Jul 19, 2017

Commits on Jul 18, 2017

  1. Merge tag 'md/4.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/…

    …git/shli/md
    
    Pull MD fixes from Shaohua Li:
    
     - raid5-ppl fix by Artur. This one is introduced in this release cycle.
    
     - raid5 reshape fix by Xiao. This is an old bug and will be added to
       stable.
    
     - bitmap fix by Guoqing.
    
    * tag 'md/4.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md:
      raid5-ppl: use BIOSET_NEED_BVECS when creating bioset
      Raid5 should update rdev->sectors after reshape
      md/bitmap: don't read page from device with Bitmap_sync
    torvalds committed Jul 18, 2017
  2. Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/gi…

    …t/dledford/rdma
    
    Pull rdma fixes from Doug Ledford:
     "First set of -rc fixes for 4.13 cycle:
    
       - misc iSER fixes
    
       - namespace fixups
    
       - fix the fact that IPoIB didn't use the proper API for noio mem allocs
    
       - rxe driver fixes
    
       - hns_roce fixes
    
       - misc core fixes
    
       - misc IPoIB fixes"
    
    * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (27 commits)
      IB/core: Allow QP state transition from reset to error
      IB/hns: Fix for checkpatch.pl comment style warnings
      IB/hns: Fix the bug with modifying the MAC address without removing the driver
      IB/hns: Fix the bug with rdma operation
      IB/hns: Fix the bug with wild pointer when destroy rc qp
      IB/hns: Fix the bug of polling cq failed for loopback Qps
      IB/rxe: Set dma_mask and coherent_dma_mask
      IB/rxe: Fix kernel panic from skb destructor
      IB/ipoib: Let lower driver handle get_stats64 call
      IB/core: Add ordered workqueue for RoCE GID management
      IB/mlx5: Clean mr_cache debugfs in case of failure
      IB/core: Remove NOIO QP create flag
      {net, IB}/mlx4: Remove gfp flags argument
      IB/{rdmavt, qib, hfi1}: Remove gfp flags argument
      IB/IPoIB: Convert IPoIB to memalloc_noio_* calls
      IB/IPoIB: Forward MTU change to driver below
      IB: Convert msleep below 20ms to usleep_range
      IB/uverbs: Make use of ib_modify_qp variant to avoid resolving DMAC
      IB/core: Introduce modify QP operation with udata
      IB/core: Don't resolve IP address to the loopback device
      ...
    torvalds committed Jul 18, 2017
  3. Merge tag 'nfsd-4.13-1' of git://linux-nfs.org/~bfields/linux

    Pull nfsd fix from Bruce Fields:
     "One fix for a problem introduced in the most recent merge window and
      found by Dave Jones and KASAN"
    
    * tag 'nfsd-4.13-1' of git://linux-nfs.org/~bfields/linux:
      nfsd: Fix a memory scribble in the callback channel
    torvalds committed Jul 18, 2017
  4. hfsplus: Don't clear SGID when inheriting ACLs

    When a new directory 'DIR1' is created in a directory 'DIR0' with the SGID
    bit set, 'DIR1' is expected to have the SGID bit set (and its owning group
    equal to the owning group of 'DIR0'). However, when 'DIR0' also has some
    default ACLs that 'DIR1' inherits, setting these ACLs will clear the SGID
    bit on 'DIR1' if the user is not a member of the owning group.
    
    Fix the problem by creating __hfsplus_set_posix_acl() function that does
    not call posix_acl_update_mode() and use it when inheriting ACLs. That
    prevents SGID bit clearing and the mode has been properly set by
    posix_acl_create() anyway.
    
    Fixes: 0739310
    CC: stable@vger.kernel.org
    Signed-off-by: Jan Kara <jack@suse.cz>
    jankara committed Jul 18, 2017
  5. isofs: Fix off-by-one in 'session' mount option parsing

    According to the ECMA-130 standard, the maximum valid track number is 99.
    Since the 'session' mount option starts indexing at 0 (and we add 1 to the
    passed number), we should refuse the value 99. Also the condition in
    isofs_get_last_session() unnecessarily repeats the check - remove it.
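    
    The bound described above can be sketched as follows. This is a Python
    model of the check, not the kernel code; `parse_session` and
    `MAX_TRACK` are hypothetical names used only for illustration.

```python
MAX_TRACK = 99  # maximum valid track number per ECMA-130, as stated in the text


def parse_session(n):
    """Map a 0-based 'session=' value to a 1-based session number.

    Hypothetical helper: rejects values whose 1-based number would exceed
    MAX_TRACK, so 99 itself is refused while 98 is still accepted.
    """
    if n < 0 or n >= MAX_TRACK:
        raise ValueError(f"session value out of range: {n}")
    return n + 1


print(parse_session(98))  # prints: 99
```

    A check of the form `n > 99` would let 99 through and yield session
    100, which is the off-by-one this commit closes.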
    
    Reported-by: David Howells <dhowells@redhat.com>
    Signed-off-by: Jan Kara <jack@suse.cz>
    jankara committed Jul 18, 2017
  6. reiserfs: preserve i_mode if __reiserfs_set_acl() fails

    When changing a file's acl mask, reiserfs_set_acl() will first set the
    group bits of i_mode to the value of the mask, and only then set the
    actual extended attribute representing the new acl.
    
    If the second part fails (due to lack of space, for example) and the
    file had no acl attribute to begin with, the system will from now on
    assume that the mask permission bits are actual group permission bits,
    potentially granting access to the wrong users.
    
    Prevent this by only changing the inode mode after the acl has been set.
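    
    The ordering fix can be modelled in a few lines. The Python below is a
    simplified sketch under stated assumptions: `Inode`, `store_acl_xattr`,
    and `set_acl` are hypothetical stand-ins, not the reiserfs functions,
    and the ACL mask is reduced to three permission bits.

```python
class Inode:
    """Minimal stand-in for an inode: permission bits plus an optional ACL."""

    def __init__(self, mode):
        self.mode = mode
        self.acl = None


def store_acl_xattr(inode, acl, fail=False):
    """Stand-in for writing the ACL extended attribute; may fail (e.g. no space)."""
    if fail:
        raise OSError("no space left on device")
    inode.acl = acl


def set_acl(inode, acl, mask_bits, fail=False):
    """Store the ACL first, then commit the new mode only on success."""
    # Compute the new mode in a local variable; the inode is not touched yet.
    new_mode = (inode.mode & ~0o070) | ((mask_bits & 0o7) << 3)
    store_acl_xattr(inode, acl, fail)  # may raise; inode.mode is still the old value
    inode.mode = new_mode              # reached only after the attribute is stored


inode = Inode(0o640)
try:
    set_acl(inode, "mask::rw-", 0o6, fail=True)
except OSError:
    pass
print(oct(inode.mode))  # prints: 0o640
```

    Because the mode is written last, a failed attribute update leaves the
    group bits exactly as they were.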
    
    Signed-off-by: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com>
    Signed-off-by: Jan Kara <jack@suse.cz>
    eafer authored and jankara committed Jul 18, 2017
  7. ext2: preserve i_mode if ext2_set_acl() fails

    When changing a file's acl mask, ext2_set_acl() will first set the group
    bits of i_mode to the value of the mask, and only then set the actual
    extended attribute representing the new acl.
    
    If the second part fails (due to lack of space, for example) and the file
    had no acl attribute to begin with, the system will from now on assume
    that the mask permission bits are actual group permission bits, potentially
    granting access to the wrong users.
    
    Prevent this by only changing the inode mode after the acl has been set.
    
    [JK: Rebased on top of "ext2: Don't clear SGID when inheriting ACLs"]
    Signed-off-by: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com>
    Signed-off-by: Jan Kara <jack@suse.cz>
    eafer authored and jankara committed Jul 18, 2017
  8. f2fs: avoid cpu lockup

    Before retrying to flush data or dentry pages, we need to release the CPU
    to keep the watchdog from firing.
    
    Reviewed-by: Chao Yu <yuchao0@huawei.com>
    Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    Jaegeuk Kim committed Jul 18, 2017
  9. f2fs: include seq_file.h for sysfs.c

    This patch includes seq_file.h to avoid a compile error.
    
    Signed-off-by: Eric Biggers <ebiggers@google.com>
    Reviewed-by: Chao Yu <yuchao0@huawei.com>
    Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    Jaegeuk Kim committed Jul 18, 2017
  10. IB/core: Allow QP state transition from reset to error

    Playing with an IPoIB interface can cause the warning message
    "ib0: Failed to modify QP to ERROR state" to be logged.
    This happens when the QP is in IB_QPS_RESET state and the stack
    is trying to transition it to IB_QPS_ERR state in ipoib_ib_dev_stop().
    
    According to the IB spec, Table 91 - "QP State Transition Properties"
    it looks like the transition from reset to error is valid:
    
    Transition: Any State to Error
    Required Attributes: None
    Optional Attributes: None allowed
    Actions: Queue processing is stopped. Work Requests pending or in
    process are completed in error, when possible.
    
    This patch allows the transition and quiets the message.
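
The rule quoted from the spec can be sketched as a small validity check. This is a simplified userspace illustration, not the kernel's state table: the `toy_` names are hypothetical, attribute masks are omitted, and the ordinary-transition table is only an illustrative subset.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy state names; they mirror the IB state names but are not kernel enums. */
enum toy_qp_state {
    TOY_RESET, TOY_INIT, TOY_RTR, TOY_RTS, TOY_SQD, TOY_SQE, TOY_ERR,
    TOY_NSTATES
};

/* Illustrative subset of ordinary transitions (attribute checks omitted). */
static const bool toy_table[TOY_NSTATES][TOY_NSTATES] = {
    [TOY_RESET][TOY_INIT] = true,
    [TOY_INIT][TOY_RTR]   = true,
    [TOY_RTR][TOY_RTS]    = true,
    [TOY_RTS][TOY_SQD]    = true,
    [TOY_SQD][TOY_RTS]    = true,
};

/* "Any State to Error" is always allowed, including from reset. */
static bool toy_transition_ok(enum toy_qp_state cur, enum toy_qp_state next)
{
    assert(cur < TOY_NSTATES && next < TOY_NSTATES);
    if (next == TOY_ERR)
        return true;
    return toy_table[cur][next];
}
```

Under this sketch, `toy_transition_ok(TOY_RESET, TOY_ERR)` returns true, which is the case the warning message used to reject.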
    
    Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
    Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com>
    Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
    Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
    Signed-off-by: Doug Ledford <dledford@redhat.com>
    tstruk authored and dledford committed Jul 18, 2017
  11. IB/hns: Fix for checkpatch.pl comment style warnings

    This patch corrects the comment style warnings caught by the
    checkpatch.pl script.
    
    Signed-off-by: Lijun Ou <oulijun@huawei.com>
    Signed-off-by: Doug Ledford <dledford@redhat.com>
    oulijun authored and dledford committed Jul 18, 2017
  12. IB/hns: Fix the bug with modifying the MAC address without removing the driver
    
    When the MAC address is modified through the hns_roce_mac function, the
    reserved QP is released and created again. It is not necessary to use
    spin_lock_bh and spin_unlock_bh in handle_en_event; keeping them causes an
    error. This patch fixes it.
    
    Signed-off-by: Lijun Ou <oulijun@huawei.com>
    Signed-off-by: Doug Ledford <dledford@redhat.com>
    oulijun authored and dledford committed Jul 18, 2017
  13. IB/hns: Fix the bug with rdma operation

    When the opcode of a work request is RDMA read or write, rdma_wr
    should be used to get remote_addr and rkey. This patch fixes it.
    
    Signed-off-by: Lijun Ou <oulijun@huawei.com>
    Signed-off-by: Doug Ledford <dledford@redhat.com>
    oulijun authored and dledford committed Jul 18, 2017
  14. IB/hns: Fix the bug with wild pointer when destroy rc qp

    When an RC QP is destroyed, hr_qp is used after it has been freed. This
    patch fixes it.
    
    Signed-off-by: Lijun Ou <oulijun@huawei.com>
    Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
    Signed-off-by: Doug Ledford <dledford@redhat.com>
    oulijun authored and dledford committed Jul 18, 2017
  15. IB/hns: Fix the bug of polling cq failed for loopback Qps

    On the hip06 SoC, the RoCE driver creates 8 reserved loopback QPs to
    ensure zero WQEs when freeing an MR. However, if the number of enabled
    phy ports is less than 6, polling the CQE fails with 8 reserved
    loopback QPs.
    
    To solve this problem, the number of loopback QPs is adjusted based
    on the number of enabled phy ports.
    
    Signed-off-by: Shaobo Xu <xushaobo2@huawei.com>
    Signed-off-by: Lijun Ou <oulijun@huawei.com>
    Signed-off-by: Doug Ledford <dledford@redhat.com>
    oulijun authored and dledford committed Jul 18, 2017
  16. IB/rxe: Set dma_mask and coherent_dma_mask

    RXE coupled with a dummy device causes the kernel panic attached
    below. The panic happens when ib_register_device tries to set dma_mask
    by accessing a NULLed parent device.
    
    RXE does not actually use DMA, so we can set the dma_mask
    to the architecture value.
    
    [16240.199689] RIP: 0010:ib_register_device+0x468/0x5a0 [ib_core]
    [16240.205289] RSP: 0018:ffffc9000220fc10 EFLAGS: 00010246
    [16240.209909] RAX: 0000000000000024 RBX: ffff880220d1a2a8 RCX: 0000000000000000
    [16240.212244] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000009
    [16240.214385] RBP: ffffc9000220fcb0 R08: 0000000000000000 R09: 000000000000023f
    [16240.254465] R10: 0000000000000007 R11: 0000000000000000 R12: 0000000000000000
    [16240.259467] R13: 0000000000000000 R14: 0000000000000000 R15: ffff880220d1a2a8
    [16240.263314] FS:  00007fd8ecca0740(0000) GS:ffff8802364c0000(0000) knlGS:0000000000000000
    [16240.267292] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [16240.273503] CR2: 0000000000000218 CR3: 00000002253ba000 CR4: 00000000000006e0
    [16240.277066] Call Trace:
    [16240.281836]  ? __kmalloc+0x26f/0x280
    [16240.286596]  rxe_register_device+0x297/0x300 [rdma_rxe]
    [16240.291377]  rxe_add+0x535/0x5b0 [rdma_rxe]
    [16240.297586]  rxe_net_add+0x3e/0xc0 [rdma_rxe]
    [16240.302375]  rxe_param_set_add+0x65/0x144 [rdma_rxe]
    [16240.307769]  param_attr_store+0x68/0xd0
    [16240.311640]  module_attr_store+0x1d/0x30
    [16240.316421]  sysfs_kf_write+0x3a/0x50
    [16240.317802]  kernfs_fop_write+0xff/0x180
    [16240.322989]  __vfs_write+0x37/0x140
    [16240.328164]  ? handle_mm_fault+0xce/0x240
    [16240.333340]  vfs_write+0xb2/0x1b0
    [16240.335013]  SyS_write+0x55/0xc0
    [16240.340632]  entry_SYSCALL_64_fastpath+0x1a/0xa9
    
    Fixes: 8700e3e ("Soft RoCE driver")
    Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com>
    Reviewed-by: Moni Shoua <monis@mellanox.com>
    Signed-off-by: Leon Romanovsky <leon@kernel.org>
    Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
    Signed-off-by: Doug Ledford <dledford@redhat.com>
    yonatanco authored and dledford committed Jul 18, 2017
  17. IB/rxe: Fix kernel panic from skb destructor

    Between the time rxe_send finishes and the skb destructor is
    called, the QP's ref count might drop to 0, allowing the QP to be
    destroyed. This leads to a kernel panic when the destructor
    dereferences the QP.
    
    Incrementing the QP ref count in rxe_send and decrementing it in the
    skb destructor prevents this crash.
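
The reference-counting pattern described above can be sketched in userspace as follows. The `toy_` types and helpers are hypothetical stand-ins, not the rxe or skb APIs; the sketch only shows that a reference taken at send time keeps the QP alive until the destructor drops it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical QP with a plain integer reference count. */
struct toy_qp {
    int refcount;
    int destroyed;       /* set to 1 when the last reference is dropped */
};

/* Hypothetical packet buffer that remembers its owning QP. */
struct toy_skb {
    struct toy_qp *qp;
};

static void toy_qp_get(struct toy_qp *qp)
{
    qp->refcount++;
}

static void toy_qp_put(struct toy_qp *qp)
{
    assert(qp->refcount > 0);
    if (--qp->refcount == 0)
        qp->destroyed = 1;
}

/* Send path: take a reference on behalf of the in-flight buffer. */
static void toy_send(struct toy_qp *qp, struct toy_skb *skb)
{
    toy_qp_get(qp);
    skb->qp = qp;
}

/* Destructor: the QP is still valid here; drop the buffer's reference. */
static void toy_skb_destructor(struct toy_skb *skb)
{
    toy_qp_put(skb->qp);
    skb->qp = NULL;
}
```

If the owner drops its own reference while the buffer is still in flight, the QP survives until `toy_skb_destructor` runs.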
    
    BUG: unable to handle kernel NULL pointer dereference at 000000000000072c
    IP: [<ffffffffa05df765>] rxe_skb_tx_dtor+0x15/0x50 [rdma_rxe]
    PGD 0 [16240.211178]
    Oops: 0002 [#1] SMP
    CPU: 3 PID: 0 Comm: swapper/3 Tainted: G           OE   4.9.0-mlnx #1
    Hardware name: Red Hat KVM, BIOS Bochs 01/01/2011
    task: ffff88042d6b1480 task.stack: ffffc90001904000
    RIP: 0010:[<ffffffffa05df765>]  [<ffffffffa05df765>] rxe_skb_tx_dtor+0x15/0x50 [rdma_rxe]
    RSP: 0018:ffff88043fcc3df0  EFLAGS: 00010246
    RAX: 0000000000000000 RBX: ffff880429684700 RCX: ffff88042d248200
    RDX: 00000000ffffffff RSI: 00000000fffffe01 RDI: ffff880429684700
    RBP: ffff88043fcc3e00 R08: ffff88043fcda240 R09: 00000000ff2d1de6
    R10: 0000000000000000 R11: 00000000f49cf6fe R12: ffff880429684700
    R13: ffffffff81893f96 R14: ffffffff817d66f0 R15: ffff880427f74200
    FS:  0000000000000000(0000) GS:ffff88043fcc0000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 000000000000072c CR3: 000000041d3df000 CR4: 00000000000006e0
    Stack:
     ffffffff817b29cf ffff880429684700 ffff88043fcc3e18 ffffffff817b42c2
     ffff880429684700 ffff88043fcc3e40 ffffffff817b4332 ffff880429684700
     ffff880427f74238 ffff880427f74228 ffff88043fcc3e58 ffffffff81893f96
    Call Trace:
     <IRQ> [16240.336345]  [<ffffffff817b29cf>] ? skb_release_head_state+0x4f/0xb0
     [<ffffffff817b42c2>] skb_release_all+0x12/0x30
     [<ffffffff817b4332>] kfree_skb+0x32/0x90
     [<ffffffff81893f96>] ndisc_error_report+0x36/0x40
     [<ffffffff817d4de1>] neigh_invalidate+0x81/0xf0
     [<ffffffff817d68f7>] neigh_timer_handler+0x207/0x2b0
     [<ffffffff81109295>] call_timer_fn+0x35/0x120
     [<ffffffff81109db7>] run_timer_softirq+0x1d7/0x460
     [<ffffffff8106155e>] ? kvm_sched_clock_read+0x1e/0x30
     [<ffffffff810366b9>] ? sched_clock+0x9/0x10
     [<ffffffff810cfed2>] ? sched_clock_cpu+0x72/0xa0
     [<ffffffff818dd537>] __do_softirq+0xd7/0x289
     [<ffffffff810a6c95>] irq_exit+0xb5/0xc0
     [<ffffffff818dd372>] smp_apic_timer_interrupt+0x42/0x50
     [<ffffffff818dc682>] apic_timer_interrupt+0x82/0x90
     <EOI> [16240.395776]  [<ffffffff818da156>] ? native_safe_halt+0x6/0x10
     [<ffffffff818d9e6e>] default_idle+0x1e/0xd0
     [<ffffffff8103797f>] arch_cpu_idle+0xf/0x20
     [<ffffffff818da2c5>] default_idle_call+0x35/0x40
     [<ffffffff810e3eb5>] cpu_startup_entry+0x185/0x210
     [<ffffffff81050433>] start_secondary+0x103/0x130
    RIP  [<ffffffffa05df765>] rxe_skb_tx_dtor+0x15/0x50 [rdma_rxe]
    
    Fixes: 8700e3e ("Soft RoCE driver")
    Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com>
    Reviewed-by: Moni Shoua <monis@mellanox.com>
    Signed-off-by: Leon Romanovsky <leon@kernel.org>
    Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
    Signed-off-by: Doug Ledford <dledford@redhat.com>
    yonatanco authored and dledford committed Jul 18, 2017
  18. IB/ipoib: Let lower driver handle get_stats64 call

    The driver checks whether the lower level driver supports get_stats. If
    so, it calls it to get the updated statistics; otherwise it takes them
    from the current netdevice stats object.
    
    Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
    Reviewed-by: Alex Vesker <valex@mellanox.com>
    Signed-off-by: Leon Romanovsky <leon@kernel.org>
    Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
    Signed-off-by: Doug Ledford <dledford@redhat.com>
    Erez Shitrit authored and dledford committed Jul 18, 2017
  19. IB/core: Add ordered workqueue for RoCE GID management

    Currently the RoCE GID management uses the ib_wq to add and delete GIDs
    according to the netdev events.
    
    The ib_wq isn't an ordered workqueue, so two work elements can be executed
    concurrently, which results in unexpected behavior and inconsistent
    GID cache content.
    
    Example:
    ifconfig eth1 11.11.11.11/16 up
    
    This command will invoke the following netdev events in the following order:
    1. NETDEV_UP
    2. NETDEV_DOWN
    3. NETDEV_UP
    
    If (2) and (3) are executed concurrently or in reverse order, instead of
    having a new GID with the 11.11.11.11 IP, we end up without any new GIDs.
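
A toy replay of the three events shows why ordering matters. This is a userspace sketch under the assumption that the GID cache can be modeled as a single present/absent flag; `toy_event` and `toy_replay` are hypothetical names, and no workqueue API is involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical event codes standing in for the netdev events. */
enum toy_event { TOY_NETDEV_UP, TOY_NETDEV_DOWN };

/* Replay events one at a time; returns whether the GID is present afterwards. */
static bool toy_replay(const enum toy_event *events, size_t count)
{
    bool gid_present = false;
    size_t i;

    assert(events != NULL || count == 0);
    for (i = 0; i < count; i++)
        gid_present = (events[i] == TOY_NETDEV_UP);  /* UP adds, DOWN deletes */
    return gid_present;
}
```

Replaying UP, DOWN, UP leaves the GID present, while UP, UP, DOWN (events 2 and 3 swapped) leaves it absent, which is the inconsistency an ordered queue avoids.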
    
    Signed-off-by: Majd Dibbiny <majd@mellanox.com>
    Signed-off-by: Leon Romanovsky <leon@kernel.org>
    Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
    Signed-off-by: Doug Ledford <dledford@redhat.com>
    majdmellanox authored and dledford committed Jul 18, 2017