Nick-Terrell/A…
Commits on Jul 23, 2017
-
Add zstd compression and decompression support to SquashFS. zstd is a great fit for SquashFS because it can compress at ratios approaching xz, while decompressing twice as fast as zlib. For SquashFS in particular, it can decompress as fast as lzo and lz4. It also has the flexibility to turn down the compression ratio for faster compression times.

The compression benchmark is run on the file tree from the SquashFS archive found in ubuntu-16.10-desktop-amd64.iso [1]. It uses `mksquashfs` with the default block size (128 KB) and various compression algorithms/levels. xz and zstd are also benchmarked with 256 KB blocks. The decompression benchmark times how long it takes to `tar` the file tree into `/dev/null`. See the benchmark file in the upstream zstd source repository located under `contrib/linux-kernel/squashfs-benchmark.sh` [2] for details.

I ran the benchmarks on an Ubuntu 14.04 VM with 2 cores and 4 GiB of RAM. The VM is running on a MacBook Pro with a 3.1 GHz Intel Core i7 processor, 16 GB of RAM, and an SSD.

| Method         | Ratio | Compression MB/s | Decompression MB/s |
|----------------|-------|------------------|--------------------|
| gzip           | 2.92  | 15               | 128                |
| lzo            | 2.64  | 9.5              | 217                |
| lz4            | 2.12  | 94               | 218                |
| xz             | 3.43  | 5.5              | 35                 |
| xz 256 KB      | 3.53  | 5.4              | 40                 |
| zstd 1         | 2.71  | 96               | 210                |
| zstd 5         | 2.93  | 69               | 198                |
| zstd 10        | 3.01  | 41               | 225                |
| zstd 15        | 3.13  | 11.4             | 224                |
| zstd 16 256 KB | 3.24  | 8.1              | 210                |

This patch was written by Sean Purcell <me@seanp.xyz>, but I will be taking over the submission process.

[1] http://releases.ubuntu.com/16.10/
[2] https://github.com/facebook/zstd/blob/dev/contrib/linux-kernel/squashfs-benchmark.sh

zstd source repository: https://github.com/facebook/zstd

Cc: Sean Purcell <me@seanp.xyz>
Signed-off-by: Nick Terrell <terrelln@fb.com>
-
Add zstd compression and decompression support to BtrFS. zstd at its fastest level compresses almost as well as zlib, while offering much faster compression and decompression, approaching lzo speeds.

I benchmarked btrfs with zstd compression against no compression, lzo compression, and zlib compression. I benchmarked two scenarios: copying a set of files to btrfs and then reading the files; and copying a tarball to btrfs, extracting it to btrfs, and then reading the extracted files. After every operation, I call `sync` and include the sync time. Between every pair of operations I unmount and remount the filesystem to avoid caching. The benchmark files can be found in the upstream zstd source repository under `contrib/linux-kernel/{btrfs-benchmark.sh,btrfs-extract-benchmark.sh}` [1] [2].

I ran the benchmarks on an Ubuntu 14.04 VM with 2 cores and 4 GiB of RAM. The VM is running on a MacBook Pro with a 3.1 GHz Intel Core i7 processor, 16 GB of RAM, and an SSD.

The first compression benchmark is copying 10 copies of the unzipped Silesia corpus [3] into a BtrFS filesystem mounted with `-o compress-force=Method`. The decompression benchmark times how long it takes to `tar` all 10 copies into `/dev/null`. The compression ratio is measured by comparing the output of `df` and `du`. See the benchmark file [1] for details. I benchmarked multiple zstd compression levels, although the patch uses zstd level 1.

| Method  | Ratio | Compression MB/s | Decompression MB/s |
|---------|-------|------------------|--------------------|
| None    | 0.99  | 504              | 686                |
| lzo     | 1.66  | 398              | 442                |
| zlib    | 2.58  | 65               | 241                |
| zstd 1  | 2.57  | 260              | 383                |
| zstd 3  | 2.71  | 174              | 408                |
| zstd 6  | 2.87  | 70               | 398                |
| zstd 9  | 2.92  | 43               | 406                |
| zstd 12 | 2.93  | 21               | 408                |
| zstd 15 | 3.01  | 11               | 354                |

The next benchmark first copies `linux-4.11.6.tar` [4] to btrfs. Then it measures the compression ratio, extracts the tar, and deletes the tar. Then it measures the compression ratio again, and `tar`s the extracted files into `/dev/null`. See the benchmark file [2] for details.

| Method | Tar Ratio | Extract Ratio | Copy (s) | Extract (s) | Read (s) |
|--------|-----------|---------------|----------|-------------|----------|
| None   | 0.97      | 0.78          | 0.981    | 5.501       | 8.807    |
| lzo    | 2.06      | 1.38          | 1.631    | 8.458       | 8.585    |
| zlib   | 3.40      | 1.86          | 7.750    | 21.544      | 11.744   |
| zstd 1 | 3.57      | 1.85          | 2.579    | 11.479      | 9.389    |

[1] https://github.com/facebook/zstd/blob/dev/contrib/linux-kernel/btrfs-benchmark.sh
[2] https://github.com/facebook/zstd/blob/dev/contrib/linux-kernel/btrfs-extract-benchmark.sh
[3] http://sun.aei.polsl.pl/~sdeor/index.php?page=silesia
[4] https://cdn.kernel.org/pub/linux/kernel/v4.x/linux-4.11.6.tar.xz

zstd source repository: https://github.com/facebook/zstd

Signed-off-by: Nick Terrell <terrelln@fb.com>
-
Add zstd compression and decompression kernel modules. zstd offers a wide variety of compression speed and quality trade-offs. It can compress at speeds approaching lz4, and quality approaching lzma. zstd decompresses at speeds more than twice as fast as zlib, and decompression speed remains roughly the same across all compression levels.

The code was ported from the upstream zstd source repository. The `linux/zstd.h` header was modified to match Linux kernel style. The cross-platform and allocation code was stripped out. Instead zstd requires the caller to pass a preallocated workspace. The source files were clang-formatted [1] to match the Linux kernel style as much as possible. Otherwise, the code was unmodified. We would like to avoid as much further manual modification to the source code as possible, so it will be easier to keep the kernel zstd up to date.

I benchmarked zstd compression as a special character device. I ran zstd and zlib compression at several levels, as well as performing no compression, which measures the time spent copying the data to kernel space. Data is passed to the compressor 4096 B at a time. The benchmark file is located in the upstream zstd source repository under `contrib/linux-kernel/zstd_compress_test.c` [2].

I ran the benchmarks on an Ubuntu 14.04 VM with 2 cores and 4 GiB of RAM. The VM is running on a MacBook Pro with a 3.1 GHz Intel Core i7 processor, 16 GB of RAM, and an SSD. I benchmarked using `silesia.tar` [3], which is 211,988,480 B large. Run the following commands for the benchmark:

    sudo modprobe zstd_compress_test
    sudo mknod zstd_compress_test c 245 0
    sudo cp silesia.tar zstd_compress_test

The time is reported by the time of the userland `cp`. The MB/s is computed with 211,988,480 B / time(method), which includes the time to copy from userland. The Adjusted MB/s is computed with 211,988,480 B / (time(method) - time(none)). The memory reported is the amount of memory the compressor requests.

| Method   | Size (B)  | Time (s) | Ratio | MB/s    | Adjusted MB/s | Memory (MB) |
|----------|-----------|----------|-------|---------|---------------|-------------|
| none     | 211988480 | 0.100    | 1     | 2119.88 | -             | -           |
| zstd -1  | 73645762  | 1.044    | 2.878 | 203.05  | 224.56        | 1.23        |
| zstd -3  | 66988878  | 1.761    | 3.165 | 120.38  | 127.63        | 2.47        |
| zstd -5  | 65001259  | 2.563    | 3.261 | 82.71   | 86.07         | 2.86        |
| zstd -10 | 60165346  | 13.242   | 3.523 | 16.01   | 16.13         | 13.22       |
| zstd -15 | 58009756  | 47.601   | 3.654 | 4.45    | 4.46          | 21.61       |
| zstd -19 | 54014593  | 102.835  | 3.925 | 2.06    | 2.06          | 60.15       |
| zlib -1  | 77260026  | 2.895    | 2.744 | 73.23   | 75.85         | 0.27        |
| zlib -3  | 72972206  | 4.116    | 2.905 | 51.50   | 52.79         | 0.27        |
| zlib -6  | 68190360  | 9.633    | 3.109 | 22.01   | 22.24         | 0.27        |
| zlib -9  | 67613382  | 22.554   | 3.135 | 9.40    | 9.44          | 0.27        |

I benchmarked zstd decompression using the same method on the same machine. The benchmark file is located in the upstream zstd repo under `contrib/linux-kernel/zstd_decompress_test.c` [4]. The memory reported is the amount of memory required to decompress data compressed with the given compression level. If you know the maximum size of your input, you can reduce the memory usage of decompression irrespective of the compression level.

| Method   | Time (s) | MB/s    | Adjusted MB/s | Memory (MB) |
|----------|----------|---------|---------------|-------------|
| none     | 0.025    | 8479.54 | -             | -           |
| zstd -1  | 0.358    | 592.15  | 636.60        | 0.84        |
| zstd -3  | 0.396    | 535.32  | 571.40        | 1.46        |
| zstd -5  | 0.396    | 535.32  | 571.40        | 1.46        |
| zstd -10 | 0.374    | 566.81  | 607.42        | 2.51        |
| zstd -15 | 0.379    | 559.34  | 598.84        | 4.61        |
| zstd -19 | 0.412    | 514.54  | 547.77        | 8.80        |
| zlib -1  | 0.940    | 225.52  | 231.68        | 0.04        |
| zlib -3  | 0.883    | 240.08  | 247.07        | 0.04        |
| zlib -6  | 0.844    | 251.17  | 258.84        | 0.04        |
| zlib -9  | 0.837    | 253.27  | 287.64        | 0.04        |

Tested in userland using the test-suite in the zstd repo under `contrib/linux-kernel/test/UserlandTest.cpp` [5] by mocking the kernel functions. Fuzz tested using libfuzzer [6] with the fuzz harnesses under `contrib/linux-kernel/test/{RoundTripCrash.c,DecompressCrash.c}` [7] [8] with ASAN, UBSAN, and MSAN. Additionally, it was tested while testing the BtrFS and SquashFS patches coming next.

[1] https://clang.llvm.org/docs/ClangFormat.html
[2] https://github.com/facebook/zstd/blob/dev/contrib/linux-kernel/zstd_compress_test.c
[3] http://sun.aei.polsl.pl/~sdeor/index.php?page=silesia
[4] https://github.com/facebook/zstd/blob/dev/contrib/linux-kernel/zstd_decompress_test.c
[5] https://github.com/facebook/zstd/blob/dev/contrib/linux-kernel/test/UserlandTest.cpp
[6] http://llvm.org/docs/LibFuzzer.html
[7] https://github.com/facebook/zstd/blob/dev/contrib/linux-kernel/test/RoundTripCrash.c
[8] https://github.com/facebook/zstd/blob/dev/contrib/linux-kernel/test/DecompressCrash.c

zstd source repository: https://github.com/facebook/zstd

Signed-off-by: Nick Terrell <terrelln@fb.com>
-
Adds xxhash kernel module with xxh32 and xxh64 hashes. xxhash is an extremely fast non-cryptographic hash algorithm for checksumming. The zstd compression and decompression modules added in the next patch require xxhash. I extracted it out from zstd since it is useful on its own. I copied the code from the upstream XXHash source repository and translated it into kernel style. I ran benchmarks and tests in the kernel and tests in userland.

I benchmarked xxhash as a special character device. I ran in four modes: no-op, xxh32, xxh64, and crc32. The no-op mode simply copies the data to kernel space and ignores it. The xxh32, xxh64, and crc32 modes compute hashes on the copied data. I also ran it with four different buffer sizes. The benchmark file is located in the upstream zstd source repository under `contrib/linux-kernel/xxhash_test.c` [1].

I ran the benchmarks on an Ubuntu 14.04 VM with 2 cores and 4 GiB of RAM. The VM is running on a MacBook Pro with a 3.1 GHz Intel Core i7 processor, 16 GB of RAM, and an SSD. I benchmarked using the file `filesystem.squashfs` from `ubuntu-16.10-desktop-amd64.iso`, which is 1,536,217,088 B large. Run the following commands for the benchmark:

    modprobe xxhash_test
    mknod xxhash_test c 245 0
    time cp filesystem.squashfs xxhash_test

The time is reported by the time of the userland `cp`. The GB/s is computed with 1,536,217,088 B / time(buffer size, hash), which includes the time to copy from userland. The Adjusted GB/s is computed with 1,536,217,088 B / (time(buffer size, hash) - time(buffer size, none)).

| Buffer Size (B) | Hash  | Time (s) | GB/s | Adjusted GB/s |
|-----------------|-------|----------|------|---------------|
| 1024            | none  | 0.408    | 3.77 | -             |
| 1024            | xxh32 | 0.649    | 2.37 | 6.37          |
| 1024            | xxh64 | 0.542    | 2.83 | 11.46         |
| 1024            | crc32 | 1.290    | 1.19 | 1.74          |
| 4096            | none  | 0.380    | 4.04 | -             |
| 4096            | xxh32 | 0.645    | 2.38 | 5.79          |
| 4096            | xxh64 | 0.500    | 3.07 | 12.80         |
| 4096            | crc32 | 1.168    | 1.32 | 1.95          |
| 8192            | none  | 0.351    | 4.38 | -             |
| 8192            | xxh32 | 0.614    | 2.50 | 5.84          |
| 8192            | xxh64 | 0.464    | 3.31 | 13.60         |
| 8192            | crc32 | 1.163    | 1.32 | 1.89          |
| 16384           | none  | 0.346    | 4.43 | -             |
| 16384           | xxh32 | 0.590    | 2.60 | 6.30          |
| 16384           | xxh64 | 0.466    | 3.30 | 12.80         |
| 16384           | crc32 | 1.183    | 1.30 | 1.84          |

Tested in userland using the test-suite in the zstd repo under `contrib/linux-kernel/test/XXHashUserlandTest.cpp` [2] by mocking the kernel functions. A line in each branch of every function in `xxhash.c` was commented out to ensure that the test-suite fails. Additionally tested while testing zstd and with SMHasher [3].

[1] https://phabricator.intern.facebook.com/P57526246
[2] https://github.com/facebook/zstd/blob/dev/contrib/linux-kernel/test/XXHashUserlandTest.cpp
[3] https://github.com/aappleby/smhasher

zstd source repository: https://github.com/facebook/zstd
XXHash source repository: https://github.com/cyan4973/xxhash

Signed-off-by: Nick Terrell <terrelln@fb.com>
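The two throughput columns in the xxhash table follow from simple arithmetic on the measured times. The sketch below is only an illustration of that arithmetic: it assumes GB means 10^9 bytes (which is what reproduces the table values), and the helper names `raw_gbps` and `adjusted_gbps` are hypothetical, not part of the benchmark code.

```c
#include <assert.h>

/* Size of filesystem.squashfs as stated in the commit message, in bytes. */
#define INPUT_BYTES 1536217088.0

/* Raw throughput: includes the cost of copying the data from userland. */
static double raw_gbps(double hash_seconds)
{
	assert(hash_seconds > 0.0);
	return INPUT_BYTES / hash_seconds / 1e9;
}

/*
 * Adjusted throughput: subtracts the time of the no-op run with the same
 * buffer size, leaving only the time attributed to hashing.
 */
static double adjusted_gbps(double hash_seconds, double noop_seconds)
{
	assert(hash_seconds > noop_seconds);
	return INPUT_BYTES / (hash_seconds - noop_seconds) / 1e9;
}
```

For the 1024 B xxh32 row, `raw_gbps(0.649)` comes out at roughly 2.37 and `adjusted_gbps(0.649, 0.408)` at roughly 6.37, in line with the table.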
Commits on Jul 20, 2017
-
x86: mark kprobe templates as character arrays, not single characters
They really are, and the "take the address of a single character" makes the string fortification code unhappy (it believes that you can now only access one byte, rather than a byte range, and then raises errors for the memory copies going on in there). We could now remove a few 'addressof' operators (since arrays naturally degrade to pointers), but this is the minimal patch that just changes the C prototypes of those template arrays (the templates themselves are defined in inline asm).
Reported-by: kernel test robot <xiaolong.ye@intel.com>
Acked-and-tested-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Daniel Micay <danielmicay@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
torvalds committed Jul 20, 2017
-
Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull misc filesystem fixes from Jan Kara: "Several ACL related fixes for ext2, reiserfs, and hfsplus. And also one minor isofs cleanup"

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  hfsplus: Don't clear SGID when inheriting ACLs
  isofs: Fix off-by-one in 'session' mount option parsing
  reiserfs: preserve i_mode if __reiserfs_set_acl() fails
  ext2: preserve i_mode if ext2_set_acl() fails
  ext2: Don't clear SGID when inheriting ACLs
  reiserfs: Don't clear SGID when inheriting ACLs
torvalds committed Jul 20, 2017
-
Merge tag 'for-f2fs-v4.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs fixes from Jaegeuk Kim: "We've filed some bug fixes:
  - missing f2fs case in terms of stale SGID bit, introduced by Jan
  - build error for seq_file.h
  - avoid cpu lockup
  - wrong inode_unlock in error case"

* tag 'for-f2fs-v4.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs:
  f2fs: avoid cpu lockup
  f2fs: include seq_file.h for sysfs.c
  f2fs: Don't clear SGID when inheriting ACLs
  f2fs: remove extra inode_unlock() in error path
torvalds committed Jul 20, 2017
-
Merge branch 'stable-4.13' of git://git.infradead.org/users/pcmoore/audit
Pull audit fix from Paul Moore: "A small audit fix, just a single line, to plug a memory leak in some audit error handling code"

* 'stable-4.13' of git://git.infradead.org/users/pcmoore/audit:
  audit: fix memleak in auditd_send_unicast_skb.
torvalds committed Jul 20, 2017
-
Merge tag 'libnvdimm-fixes-4.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm fixes from Dan Williams: "A handful of small fixes for 4.13-rc2. Three of these fixes are tagged for -stable. They have all appeared in at least one -next release with no reported issues
  - Fix handling of media errors that span a sector
  - Fix support of multiple namespaces in a libnvdimm region being in device-dax mode
  - Clean up the machine check notifier properly when the nfit driver fails to register
  - Address a static analysis (smatch) report in device-dax"

* tag 'libnvdimm-fixes-4.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  device-dax: fix sysfs duplicate warnings
  MAINTAINERS: list drivers/acpi/nfit/ files for libnvdimm sub-system
  acpi/nfit: Fix memory corruption/Unregister mce decoder on failure
  device-dax: fix 'passing zero to ERR_PTR()' warning
  libnvdimm: fix badblock range handling of ARS range
torvalds committed Jul 20, 2017
-
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid
Pull HID fixes from Jiri Kosina:
  - HID multitouch 4.12 regression fix from Dmitry Torokhov
  - error handling fix for HID++ driver from Gustavo A. R. Silva

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
  HID: hid-logitech-hidpp: add NULL check on devm_kmemdup() return value
  HID: multitouch: do not blindly set EV_KEY or EV_ABS bits
torvalds committed Jul 20, 2017
-
HID: hid-logitech-hidpp: add NULL check on devm_kmemdup() return value
Check return value from call to devm_kmemdup() in order to prevent a NULL pointer dereference. Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com> Reviewed-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Commits on Jul 19, 2017
-
llist: clang: introduce member_address_is_nonnull()
Currently llist_for_each_entry() and llist_for_each_entry_safe() iterate while &pos->member != NULL. But when building the kernel with Clang, the compiler assumes &pos->member cannot be NULL if the member's offset is greater than 0 (which would be equivalent to the object being non-contiguous in memory). Therefore the loop condition is always true, and the loops become infinite. To work around this, introduce the member_address_is_nonnull() macro, which casts the object pointer to uintptr_t, thus allowing the member pointer to be NULL.
Signed-off-by: Alexander Potapenko <glider@google.com>
Tested-by: Sodagudi Prasad <psodagud@codeaurora.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
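A userland sketch of the idea may help. The commit message does not quote the macro body, so the definition below is an assumption reconstructed from the description (cast the object pointer to `uintptr_t` and compare the member's address as an integer). `struct item`, `entry_of`, `sum_items`, and `llist_examples` are hypothetical names used only for illustration. The sketch relies on the GNU `__typeof__` extension and on implementation-defined integer/pointer conversions, as GCC and Clang provide them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct llist_node {
	struct llist_node *next;
};

/* Hypothetical payload; the list node is deliberately not the first member. */
struct item {
	int value;
	struct llist_node node;
};

/* Recover the containing object via integer arithmetic on the address. */
#define entry_of(ptr, type, member) \
	((type *)((uintptr_t)(ptr) - offsetof(type, member)))

/*
 * Compare the member's address as an integer, so the compiler cannot
 * assume that &pos->member is never NULL.
 */
#define member_address_is_nonnull(ptr, member) \
	((uintptr_t)(ptr) + offsetof(__typeof__(*(ptr)), member) != 0)

/* Sum every value on a NULL-terminated list. */
static int sum_items(struct llist_node *first)
{
	struct item *pos;
	int sum = 0;

	for (pos = entry_of(first, struct item, node);
	     member_address_is_nonnull(pos, node);
	     pos = entry_of(pos->node.next, struct item, node))
		sum += pos->value;
	return sum;
}

/* Usage: a three-element list and an empty list both terminate. */
static void llist_examples(void)
{
	struct item c = { 3, { NULL } };
	struct item b = { 2, { &c.node } };
	struct item a = { 1, { &b.node } };

	assert(sum_items(&a.node) == 6);
	assert(sum_items(NULL) == 0);
}
```

When the list ends, `pos` holds an address equal to "NULL minus the member offset", so adding the offset back as an integer yields zero and the loop stops without the compiler being able to reason about pointer validity.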
-
Merge tag 'gcc-plugins-v4.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull structure randomization updates from Kees Cook: "Now that IPC and other changes have landed, enable manual markings for randstruct plugin, including the task_struct. This is the rest of what was staged in -next for the gcc-plugins, and comes in three patches, largest first:
  - mark "easy" structs with __randomize_layout
  - mark task_struct with an optional anonymous struct to isolate the __randomize_layout section
  - mark structs to opt _out_ of automated marking (which will come later)
And, FWIW, this continues to pass allmodconfig (normal and patched to enable gcc-plugins) builds of x86_64, i386, arm64, arm, powerpc, and s390 for me"

* tag 'gcc-plugins-v4.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  randstruct: opt-out externally exposed function pointer structs
  task_struct: Allow randomized layout
  randstruct: Mark various structs for randomization
torvalds committed Jul 19, 2017
-
Merge tag 'ceph-for-4.13-rc2' of git://github.com/ceph/ceph-client
Pull ceph fixes from Ilya Dryomov: "A number of small fixes for -rc1 Luminous changes plus a readdir race fix, marked for stable"

* tag 'ceph-for-4.13-rc2' of git://github.com/ceph/ceph-client:
  libceph: potential NULL dereference in ceph_msg_data_create()
  ceph: fix race in concurrent readdir
  libceph: don't call encode_request_finish() on MOSDBackoff messages
  libceph: use alloc_pg_mapping() in __decode_pg_upmap_items()
  libceph: set -EINVAL in one place in crush_decode()
  libceph: NULL deref on osdmap_apply_incremental() error path
  libceph: fix old style declaration warnings
torvalds committed Jul 19, 2017
-
audit: fix memleak in auditd_send_unicast_skb.
Found by a kmemleak report: auditd_send_unicast_skb() did not free the skb if rcu_dereference(auditd_conn) returned NULL.

unreferenced object 0xffff88082568ce00 (size 256):
comm "auditd", pid 1119, jiffies 4294708499
backtrace:
[<ffffffff8176166a>] kmemleak_alloc+0x4a/0xa0
[<ffffffff8121820c>] kmem_cache_alloc_node+0xcc/0x210
[<ffffffff8161b99d>] __alloc_skb+0x5d/0x290
[<ffffffff8113c614>] audit_make_reply+0x54/0xd0
[<ffffffff8113dfa7>] audit_receive_msg+0x967/0xd70
----------------
(gdb) list *audit_receive_msg+0x967
0xffffffff8113dff7 is in audit_receive_msg (kernel/audit.c:1133).
1132 skb = audit_make_reply(0, AUDIT_REPLACE, 0, 0, &pvnr, sizeof(pvnr));
---------------
[<ffffffff8113e402>] audit_receive+0x52/0xa0
[<ffffffff8166c561>] netlink_unicast+0x181/0x240
[<ffffffff8166c8e2>] netlink_sendmsg+0x2c2/0x3b0
[<ffffffff816112e8>] sock_sendmsg+0x38/0x50
[<ffffffff816117a2>] SYSC_sendto+0x102/0x190
[<ffffffff81612f4e>] SyS_sendto+0xe/0x10
[<ffffffff8176d337>] entry_SYSCALL_64_fastpath+0x1a/0xa5
[<ffffffffffffffff>] 0xffffffffffffffff

Signed-off-by: Shu Wang <shuwang@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
-
device-dax: fix sysfs duplicate warnings
Fix warnings of the form...

WARNING: CPU: 10 PID: 4983 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x62/0x80
sysfs: cannot create duplicate filename '/class/dax/dax12.0'
Call Trace:
 dump_stack+0x63/0x86
 __warn+0xcb/0xf0
 warn_slowpath_fmt+0x5a/0x80
 ? kernfs_path_from_node+0x4f/0x60
 sysfs_warn_dup+0x62/0x80
 sysfs_do_create_link_sd.isra.2+0x97/0xb0
 sysfs_create_link+0x25/0x40
 device_add+0x266/0x630
 devm_create_dax_dev+0x2cf/0x340 [dax]
 dax_pmem_probe+0x1f5/0x26e [dax_pmem]
 nvdimm_bus_probe+0x71/0x120

...by reusing the namespace id for the device-dax instance name. Now that we have decided that there will never be more than one device-dax instance per libnvdimm-namespace parent device [1], we can directly reuse the namespace ids. There are some possible follow-on cleanups, but those are saved for a later patch to simplify the -stable backport.

[1]: https://lists.01.org/pipermail/linux-nvdimm/2016-December/008266.html

Fixes: 98a29c3 ("libnvdimm, namespace: allow creation of multiple pmem...")
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: <stable@vger.kernel.org>
Reported-by: Dariusz Dokupil <dariusz.dokupil@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
djbw committed Jul 19, 2017
Commits on Jul 18, 2017
-
Merge tag 'md/4.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md
Pull MD fixes from Shaohua Li:
  - raid5-ppl fix by Artur. This one is introduced in this release cycle.
  - raid5 reshape fix by Xiao. This is an old bug and will be added to stable.
  - bitmap fix by Guoqing.

* tag 'md/4.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md:
  raid5-ppl: use BIOSET_NEED_BVECS when creating bioset
  Raid5 should update rdev->sectors after reshape
  md/bitmap: don't read page from device with Bitmap_sync
torvalds committed Jul 18, 2017
-
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma
Pull rdma fixes from Doug Ledford: "First set of -rc fixes for 4.13 cycle:
  - misc iSER fixes
  - namespace fixups
  - fix the fact that IPoIB didn't use the proper API for noio mem allocs
  - rxe driver fixes
  - hns_roce fixes
  - misc core fixes
  - misc IPoIB fixes"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (27 commits)
  IB/core: Allow QP state transition from reset to error
  IB/hns: Fix for checkpatch.pl comment style warnings
  IB/hns: Fix the bug with modifying the MAC address without removing the driver
  IB/hns: Fix the bug with rdma operation
  IB/hns: Fix the bug with wild pointer when destroy rc qp
  IB/hns: Fix the bug of polling cq failed for loopback Qps
  IB/rxe: Set dma_mask and coherent_dma_mask
  IB/rxe: Fix kernel panic from skb destructor
  IB/ipoib: Let lower driver handle get_stats64 call
  IB/core: Add ordered workqueue for RoCE GID management
  IB/mlx5: Clean mr_cache debugfs in case of failure
  IB/core: Remove NOIO QP create flag
  {net, IB}/mlx4: Remove gfp flags argument
  IB/{rdmavt, qib, hfi1}: Remove gfp flags argument
  IB/IPoIB: Convert IPoIB to memalloc_noio_* calls
  IB/IPoIB: Forward MTU change to driver below
  IB: Convert msleep below 20ms to usleep_range
  IB/uverbs: Make use of ib_modify_qp variant to avoid resolving DMAC
  IB/core: Introduce modify QP operation with udata
  IB/core: Don't resolve IP address to the loopback device
  ...
torvalds committed Jul 18, 2017
-
Merge tag 'nfsd-4.13-1' of git://linux-nfs.org/~bfields/linux
Pull nfsd fix from Bruce Fields: "One fix for a problem introduced in the most recent merge window and found by Dave Jones and KASAN"

* tag 'nfsd-4.13-1' of git://linux-nfs.org/~bfields/linux:
  nfsd: Fix a memory scribble in the callback channel
torvalds committed Jul 18, 2017
-
hfsplus: Don't clear SGID when inheriting ACLs
When a new directory 'DIR1' is created in a directory 'DIR0' with the SGID bit set, DIR1 is expected to have the SGID bit set (and owning group equal to the owning group of 'DIR0'). However, when 'DIR0' also has some default ACLs that 'DIR1' inherits, setting these ACLs will result in the SGID bit on 'DIR1' being cleared if the user is not a member of the owning group. Fix the problem by creating a __hfsplus_set_posix_acl() function that does not call posix_acl_update_mode() and using it when inheriting ACLs. That prevents SGID bit clearing, and the mode has been properly set by posix_acl_create() anyway.
Fixes: 0739310
CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
jankara committed Jul 18, 2017
-
isofs: Fix off-by-one in 'session' mount option parsing
According to the ECMA-130 standard, the maximum valid track number is 99. Since the 'session' mount option starts indexing at 0 (and we add 1 to the passed number), we should refuse the value 99. Also, the condition in isofs_get_last_session() unnecessarily repeats the check, so remove it.
Reported-by: David Howells <dhowells@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
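The bound can be sketched in a few lines. This is not the isofs parsing code; `session_to_track` and `session_examples` are hypothetical helpers that only model the arithmetic described in the message (zero-based option, one-based track, maximum track 99).

```c
#include <assert.h>

/*
 * Hypothetical helper: validate a zero-based 'session' mount option and
 * return the one-based track number, or -1 if the value is out of range.
 */
static int session_to_track(unsigned int session)
{
	/* The track is session + 1 and must not exceed 99, so 98 is the largest valid option. */
	if (session > 98)
		return -1;
	return (int)session + 1;
}

/* Usage: boundary values on either side of the limit. */
static void session_examples(void)
{
	assert(session_to_track(0) == 1);
	assert(session_to_track(98) == 99);
	assert(session_to_track(99) == -1);
}
```

A check written as `session > 99` would accept 99 and produce track 100, which is the off-by-one the commit title refers to.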
jankara committed Jul 18, 2017
-
reiserfs: preserve i_mode if __reiserfs_set_acl() fails
When changing a file's acl mask, reiserfs_set_acl() will first set the group bits of i_mode to the value of the mask, and only then set the actual extended attribute representing the new acl. If the second part fails (due to lack of space, for example) and the file had no acl attribute to begin with, the system will from now on assume that the mask permission bits are actual group permission bits, potentially granting access to the wrong users. Prevent this by only changing the inode mode after the acl has been set. Signed-off-by: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
-
ext2: preserve i_mode if ext2_set_acl() fails
When changing a file's acl mask, ext2_set_acl() will first set the group bits of i_mode to the value of the mask, and only then set the actual extended attribute representing the new acl. If the second part fails (due to lack of space, for example) and the file had no acl attribute to begin with, the system will from now on assume that the mask permission bits are actual group permission bits, potentially granting access to the wrong users. Prevent this by only changing the inode mode after the acl has been set. [JK: Rebased on top of "ext2: Don't clear SGID when inheriting ACLs"] Signed-off-by: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
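The ordering fix can be illustrated with a small userland model. Everything here is hypothetical (`struct inode_sketch`, `store_acl_xattr`, `set_acl_mask`); it is a sketch of the "compute first, commit only on success" pattern the message describes, not the ext2 or reiserfs code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for an inode; only the permission bits matter here. */
struct inode_sketch {
	unsigned int mode;
};

/* Hypothetical xattr write that can be forced to fail, e.g. for lack of space. */
static int store_acl_xattr(int simulate_no_space)
{
	return simulate_no_space ? -ENOSPC : 0;
}

/*
 * Set a new ACL mask. The group bits of the mode are computed first but
 * only committed once the xattr write has succeeded.
 */
static int set_acl_mask(struct inode_sketch *inode, unsigned int mask,
			int simulate_no_space)
{
	unsigned int new_mode;
	int err;

	assert(inode != NULL);
	new_mode = (inode->mode & ~070u) | ((mask & 07u) << 3);

	err = store_acl_xattr(simulate_no_space);
	if (err)
		return err;	/* inode->mode is left untouched */

	inode->mode = new_mode;
	return 0;
}
```

With the old ordering, a failed write would leave the mask value sitting in the group bits of the mode; here a failure returns early and the mode keeps its previous value.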
-
f2fs: avoid cpu lockup
Before retrying to flush data or dentry pages, we need to release cpu in order to prevent watchdog.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim committed Jul 18, 2017
-
f2fs: include seq_file.h for sysfs.c
This patch includes seq_file.h to avoid compile error. Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim committed Jul 18, 2017
-
IB/core: Allow QP state transition from reset to error
Playing with IP-O-IB interface can trigger a warning message: "ib0: Failed to modify QP to ERROR state" to be logged. This happens when the QP is in IB_QPS_RESET state and the stack is trying to transition it to IB_QPS_ERR state in ipoib_ib_dev_stop(). According to the IB spec, Table 91 - "QP State Transition Properties", it looks like the transition from reset to error is valid:

Transition: Any State to Error
Required Attributes: None
Optional Attributes: None allowed
Actions: Queue processing is stopped. Work Requests pending or in process are completed in error, when possible.

This patch allows the transition and quiets the message.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
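A reduced model of a transition check shows where the rule fits. This is a sketch under stated assumptions, not the kernel's transition table: the enum, `qp_transition_ok`, and the simplified "one step forward" rule are hypothetical, and only the "Any State to Error" rule is taken from the message above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced set of QP states for illustration. */
enum qp_state { QPS_RESET, QPS_INIT, QPS_RTR, QPS_RTS, QPS_ERR, QPS_COUNT };

/*
 * Simplified transition check. Only the rule discussed in the commit is
 * modelled faithfully: any state may move to the error state. The other
 * rules are a rough sketch, not the full table from the IB spec.
 */
static bool qp_transition_ok(enum qp_state cur, enum qp_state next)
{
	assert(cur < QPS_COUNT && next < QPS_COUNT);

	if (next == QPS_ERR)
		return true;	/* Any State to Error, including reset */
	if (next == QPS_RESET)
		return true;	/* assumed here: any state may be reset */
	/* Forward progress one step at a time: RESET -> INIT -> RTR -> RTS. */
	return cur != QPS_ERR && (int)next == (int)cur + 1;
}
```

In this model `qp_transition_ok(QPS_RESET, QPS_ERR)` is true, which is the case the patch stops rejecting, while a jump such as reset to RTS is still refused.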
-
IB/hns: Fix for checkpatch.pl comment style warnings
This patch corrects the comment style warnings caught by the checkpatch.pl script.
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
-
IB/hns: Fix the bug with modifying the MAC address without removing the driver
When the MAC address is modified through the hns_roce_mac function, the reserved QP is released and created again. It is not necessary to use spin_lock_bh and spin_unlock_bh in handle_en_event; otherwise, an error will occur. This patch mainly fixes it.
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
-
IB/hns: Fix the bug with rdma operation
When the opcode of a work request is RDMA read or write, the driver should use rdma_wr to get remote_addr and rkey. This patch fixes it.
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
-
IB/hns: Fix the bug with wild pointer when destroy rc qp
When an RC QP is destroyed, hr_qp is used after it has been freed. This patch fixes it.
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
-
IB/hns: Fix the bug of polling cq failed for loopback Qps
In the hip06 SoC, the RoCE driver creates 8 reserved loopback QPs to ensure zero wqe when freeing an MR. However, if the number of enabled phy ports is less than 6, polling cqe with 8 reserved loopback QPs will fail. To solve this problem, the number of loopback QPs is adjusted based on the number of enabled phy ports.
Signed-off-by: Shaobo Xu <xushaobo2@huawei.com>
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
-
IB/rxe: Set dma_mask and coherent_dma_mask
Coupling RXE with a dummy device causes the kernel panic shown below. The panic happens when ib_register_device tries to set dma_mask by accessing a NULL parent device. RXE does not actually use DMA, so we can set the dma_mask to the architecture value.

[16240.199689] RIP: 0010:ib_register_device+0x468/0x5a0 [ib_core]
[16240.205289] RSP: 0018:ffffc9000220fc10 EFLAGS: 00010246
[16240.209909] RAX: 0000000000000024 RBX: ffff880220d1a2a8 RCX: 0000000000000000
[16240.212244] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000009
[16240.214385] RBP: ffffc9000220fcb0 R08: 0000000000000000 R09: 000000000000023f
[16240.254465] R10: 0000000000000007 R11: 0000000000000000 R12: 0000000000000000
[16240.259467] R13: 0000000000000000 R14: 0000000000000000 R15: ffff880220d1a2a8
[16240.263314] FS: 00007fd8ecca0740(0000) GS:ffff8802364c0000(0000) knlGS:0000000000000000
[16240.267292] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[16240.273503] CR2: 0000000000000218 CR3: 00000002253ba000 CR4: 00000000000006e0
[16240.277066] Call Trace:
[16240.281836] ? __kmalloc+0x26f/0x280
[16240.286596] rxe_register_device+0x297/0x300 [rdma_rxe]
[16240.291377] rxe_add+0x535/0x5b0 [rdma_rxe]
[16240.297586] rxe_net_add+0x3e/0xc0 [rdma_rxe]
[16240.302375] rxe_param_set_add+0x65/0x144 [rdma_rxe]
[16240.307769] param_attr_store+0x68/0xd0
[16240.311640] module_attr_store+0x1d/0x30
[16240.316421] sysfs_kf_write+0x3a/0x50
[16240.317802] kernfs_fop_write+0xff/0x180
[16240.322989] __vfs_write+0x37/0x140
[16240.328164] ? handle_mm_fault+0xce/0x240
[16240.333340] vfs_write+0xb2/0x1b0
[16240.335013] SyS_write+0x55/0xc0
[16240.340632] entry_SYSCALL_64_fastpath+0x1a/0xa9

Fixes: 8700e3e ("Soft RoCE driver")
Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com>
Reviewed-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
-
IB/rxe: Fix kernel panic from skb destructor
Between the time rxe_send finishes and the skb destructor is called, the QP's ref count might drop to 0, possibly leading to QP destruction. This leads to a kernel panic when the destructor dereferences the QP. Incrementing the QP ref count in rxe_send and decrementing it in the skb destructor prevents this crash.

BUG: unable to handle kernel NULL pointer dereference at 000000000000072c
IP: [<ffffffffa05df765>] rxe_skb_tx_dtor+0x15/0x50 [rdma_rxe]
PGD 0
[16240.211178] Oops: 0002 [#1] SMP
CPU: 3 PID: 0 Comm: swapper/3 Tainted: G OE 4.9.0-mlnx #1
Hardware name: Red Hat KVM, BIOS Bochs 01/01/2011
task: ffff88042d6b1480 task.stack: ffffc90001904000
RIP: 0010:[<ffffffffa05df765>] [<ffffffffa05df765>] rxe_skb_tx_dtor+0x15/0x50 [rdma_rxe]
RSP: 0018:ffff88043fcc3df0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff880429684700 RCX: ffff88042d248200
RDX: 00000000ffffffff RSI: 00000000fffffe01 RDI: ffff880429684700
RBP: ffff88043fcc3e00 R08: ffff88043fcda240 R09: 00000000ff2d1de6
R10: 0000000000000000 R11: 00000000f49cf6fe R12: ffff880429684700
R13: ffffffff81893f96 R14: ffffffff817d66f0 R15: ffff880427f74200
FS: 0000000000000000(0000) GS:ffff88043fcc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000000000072c CR3: 000000041d3df000 CR4: 00000000000006e0
Stack:
ffffffff817b29cf ffff880429684700 ffff88043fcc3e18 ffffffff817b42c2
ffff880429684700 ffff88043fcc3e40 ffffffff817b4332 ffff880429684700
ffff880427f74238 ffff880427f74228 ffff88043fcc3e58 ffffffff81893f96
Call Trace:
<IRQ>
[16240.336345] [<ffffffff817b29cf>] ? skb_release_head_state+0x4f/0xb0
[<ffffffff817b42c2>] skb_release_all+0x12/0x30
[<ffffffff817b4332>] kfree_skb+0x32/0x90
[<ffffffff81893f96>] ndisc_error_report+0x36/0x40
[<ffffffff817d4de1>] neigh_invalidate+0x81/0xf0
[<ffffffff817d68f7>] neigh_timer_handler+0x207/0x2b0
[<ffffffff81109295>] call_timer_fn+0x35/0x120
[<ffffffff81109db7>] run_timer_softirq+0x1d7/0x460
[<ffffffff8106155e>] ? kvm_sched_clock_read+0x1e/0x30
[<ffffffff810366b9>] ? sched_clock+0x9/0x10
[<ffffffff810cfed2>] ? sched_clock_cpu+0x72/0xa0
[<ffffffff818dd537>] __do_softirq+0xd7/0x289
[<ffffffff810a6c95>] irq_exit+0xb5/0xc0
[<ffffffff818dd372>] smp_apic_timer_interrupt+0x42/0x50
[<ffffffff818dc682>] apic_timer_interrupt+0x82/0x90
<EOI>
[16240.395776] [<ffffffff818da156>] ? native_safe_halt+0x6/0x10
[<ffffffff818d9e6e>] default_idle+0x1e/0xd0
[<ffffffff8103797f>] arch_cpu_idle+0xf/0x20
[<ffffffff818da2c5>] default_idle_call+0x35/0x40
[<ffffffff810e3eb5>] cpu_startup_entry+0x185/0x210
[<ffffffff81050433>] start_secondary+0x103/0x130
RIP [<ffffffffa05df765>] rxe_skb_tx_dtor+0x15/0x50 [rdma_rxe]

Fixes: 8700e3e ("Soft RoCE driver")
Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com>
Reviewed-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
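
The reference-counting idea in this commit can be illustrated with a small userspace sketch. It is not the rxe code: `toy_qp`, `toy_skb`, and every `toy_*` function are hypothetical stand-ins, and the real driver uses its own QP and skb structures. The sketch only shows why taking a reference on the send path keeps the object alive until the destructor has run.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a QP: a counter plus a "destroyed" flag. */
struct toy_qp {
    int refcount;    /* number of current holders */
    bool destroyed;  /* set when the last reference is dropped */
};

/* Hypothetical stand-in for an skb that remembers its owning QP. */
struct toy_skb {
    struct toy_qp *qp;
    void (*destructor)(struct toy_skb *skb); /* runs when the packet is freed */
};

void toy_qp_get(struct toy_qp *qp)
{
    assert(!qp->destroyed);
    qp->refcount++;
}

void toy_qp_put(struct toy_qp *qp)
{
    assert(qp->refcount > 0);
    if (--qp->refcount == 0)
        qp->destroyed = true; /* last holder gone: object may be torn down */
}

/* Destructor: safe to touch the QP because the send path holds a reference. */
void toy_skb_dtor(struct toy_skb *skb)
{
    assert(!skb->qp->destroyed);
    toy_qp_put(skb->qp);
}

/* Send path: take a reference that the destructor will release later. */
void toy_send(struct toy_qp *qp, struct toy_skb *skb)
{
    toy_qp_get(qp);
    skb->qp = qp;
    skb->destructor = toy_skb_dtor;
}

/* Freeing the packet invokes its destructor, if one is set. */
void toy_free_skb(struct toy_skb *skb)
{
    if (skb->destructor)
        skb->destructor(skb);
}
```

In this model, even if the QP's owner drops its own reference right after `toy_send` returns, the count stays above zero until `toy_free_skb` runs the destructor.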
-
IB/ipoib: Let lower driver handle get_stats64 call
The driver checks whether the lower-level driver supports get_stats64 and, if so, calls it to get the updated statistics; otherwise it takes them from the current netdevice stats object.

Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Reviewed-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
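
The delegate-or-fall-back pattern described here can be sketched in plain userspace C. All names below (`toy_dev`, `toy_ops`, `toy_stats`, `toy_get_stats64`, `toy_lower_get_stats64`) are hypothetical and do not correspond to the kernel's netdevice structures; the sketch only assumes the lower layer exposes an optional callback.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical counters; the real netdevice stats have many more fields. */
struct toy_stats {
    uint64_t rx_packets;
    uint64_t tx_packets;
};

struct toy_dev;

/* Hypothetical ops table of the lower-level driver; the callback is optional. */
struct toy_ops {
    void (*get_stats64)(struct toy_dev *dev, struct toy_stats *out);
};

struct toy_dev {
    const struct toy_ops *lower_ops; /* may be NULL */
    struct toy_stats stats;          /* the device's own stats object */
};

/* Prefer the lower driver's callback; otherwise copy the local counters. */
void toy_get_stats64(struct toy_dev *dev, struct toy_stats *out)
{
    assert(dev != NULL && out != NULL);
    if (dev->lower_ops && dev->lower_ops->get_stats64) {
        dev->lower_ops->get_stats64(dev, out);
        return;
    }
    *out = dev->stats;
}

/* Example lower-level callback returning fixed, made-up numbers. */
void toy_lower_get_stats64(struct toy_dev *dev, struct toy_stats *out)
{
    (void)dev;
    out->rx_packets = 100;
    out->tx_packets = 200;
}
```

A device whose `lower_ops` is NULL, or whose ops table leaves `get_stats64` unset, simply reports its own `stats` field.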
-
IB/core: Add ordered workqueue for RoCE GID management
Currently the RoCE GID management uses the ib_wq to add and delete GIDs according to the netdev events. The ib_wq isn't an ordered workqueue, so two work elements can be executed concurrently, which results in unexpected behavior and inconsistent GID cache content.

Example:
ifconfig eth1 11.11.11.11/16 up

This command invokes the following netdev events in the following order:
1. NETDEV_UP
2. NETDEV_DOWN
3. NETDEV_UP

If (2) and (3) are executed concurrently or in reverse order, instead of having a new GID with the 11.11.11.11 IP, we end up without any new GIDs.

Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
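
Why ordering matters can be shown with a toy model of the GID cache. This is a userspace sketch under simplifying assumptions: the cache is reduced to a single "GID present" flag, an UP event adds the GID and a DOWN event removes it, and the names `toy_event` and `toy_gid_present_after` are hypothetical. In the kernel, an ordered workqueue (for example one created with `alloc_ordered_workqueue()`) runs at most one work item at a time in queueing order, which is the property the model relies on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical event type mirroring the two netdev events in the example. */
enum toy_event { TOY_NETDEV_UP, TOY_NETDEV_DOWN };

/*
 * Apply the events strictly in array order, as a single ordered worker
 * would, and report whether the GID is present at the end.
 */
bool toy_gid_present_after(const enum toy_event *events, size_t n)
{
    bool present = false;
    for (size_t i = 0; i < n; i++)
        present = (events[i] == TOY_NETDEV_UP); /* UP adds, DOWN deletes */
    return present;
}
```

Feeding the sequence UP, DOWN, UP leaves the GID present, while the reordered sequence UP, UP, DOWN leaves the cache without it, which matches the failure described in the commit message.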