Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

-Wparentheses-equality in fs/ocfs2/dlm/dlmthread.c #40

Closed
nickdesaulniers opened this issue Sep 11, 2018 · 13 comments
Closed

-Wparentheses-equality in fs/ocfs2/dlm/dlmthread.c #40

nickdesaulniers opened this issue Sep 11, 2018 · 13 comments
Assignees
Labels
-Wparentheses-equality [BUG] linux A bug that should be fixed in the mainline kernel. [FIXED][LINUX] 4.20 This bug was fixed in Linux 4.20

Comments

@nickdesaulniers
Copy link
Member

  CC      fs/ocfs2/dlm/dlmthread.o
fs/ocfs2/dlm/dlmthread.c:534:18: warning: equality comparison with extraneous
      parentheses [-Wparentheses-equality]
        if ((res->owner == dlm->node_num)) {
             ~~~~~~~~~~~^~~~~~~~~~~~~~~~
fs/ocfs2/dlm/dlmthread.c:534:18: note: remove extraneous parentheses around the
      comparison to silence this warning
        if ((res->owner == dlm->node_num)) {
            ~           ^               ~
fs/ocfs2/dlm/dlmthread.c:534:18: note: use '=' to turn this equality comparison into
      an assignment
        if ((res->owner == dlm->node_num)) {
                        ^~
                        =

Probably requires CONFIG_OCFS2_FS=y (there's like 6 OCFS2 configs).

@nickdesaulniers nickdesaulniers added help wanted good first issue Good for newcomers [BUG] linux A bug that should be fixed in the mainline kernel. [ARCH] arm64 This bug impacts ARCH=arm64 low priority This bug is not critical and not a priority labels Sep 11, 2018
@nickdesaulniers nickdesaulniers added this to the ARCH=arm64 allyesconfig milestone Sep 11, 2018
@nathanchance
Copy link
Member

CONFIG_OCFS2_FS + CONFIG_OCFS2_FS_O2CB

I'll send a patch for this one.

nickdesaulniers pushed a commit that referenced this issue Sep 11, 2018
In 4.19-rc1, Eugeniy reported weird boot and IO errors on ARC HSDK

| INFO: task syslogd:77 blocked for more than 10 seconds.
|       Not tainted 4.19.0-rc1-00007-gf213acea4e88 #40
| "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
| message.
| syslogd         D    0    77     76 0x00000000
|
| Stack Trace:
|  __switch_to+0x0/0xac
|  __schedule+0x1b2/0x730
|  io_schedule+0x5c/0xc0
|  __lock_page+0x98/0xdc
|  find_lock_entry+0x38/0x100
|  shmem_getpage_gfp.isra.3+0x82/0xbfc
|  shmem_fault+0x46/0x138
|  handle_mm_fault+0x5bc/0x924
|  do_page_fault+0x100/0x2b8
|  ret_from_exception+0x0/0x8

He bisected to 84c6591 ("locking/atomics,
asm-generic/bitops/lock.h: Rewrite using atomic_fetch_*()")

This commit however only unmasked the real issue introduced by commit
4aef66c ("locking/atomic, arch/arc: Fix build") which missed the
retry-if-scond-failed branch in atomic_fetch_##op() macros.

The bisected commit started using atomic_fetch_##op() macros for building
the rest of atomics.

Fixes: 4aef66c ("locking/atomic, arch/arc: Fix build")
Reported-by: Eugeniy Paltsev <paltsev@synopsys.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
[vgupta: wrote changelog]
@nickdesaulniers
Copy link
Member Author

There's another easy one in:

  CC      drivers/ata/pata_atiixp.o
drivers/ata/pata_atiixp.c:282:19: warning: equality comparison with extraneous
      parentheses [-Wparentheses-equality]
        if((pdev->device == PCI_DEVICE_ID_ATI_IXP600_IDE))
            ~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/ata/pata_atiixp.c:282:19: note: remove extraneous parentheses around the
      comparison to silence this warning
        if((pdev->device == PCI_DEVICE_ID_ATI_IXP600_IDE))
           ~             ^                              ~
drivers/ata/pata_atiixp.c:282:19: note: use '=' to turn this equality comparison into
      an assignment
        if((pdev->device == PCI_DEVICE_ID_ATI_IXP600_IDE))
                         ^~
                         =

(should be a separate patch, and probably GH issue). Bust out your old parallel ATA hard drives! 🎆

@nickdesaulniers nickdesaulniers removed [ARCH] arm64 This bug impacts ARCH=arm64 labels Sep 11, 2018
@nickdesaulniers
Copy link
Member Author

nickdesaulniers commented Sep 11, 2018

  CC      drivers/block/rsxx/cregs.o
drivers/block/rsxx/cregs.c:279:15: warning: equality comparison with extraneous
      parentheses [-Wparentheses-equality]
        if ((cmd->op == CREG_OP_READ)) {
             ~~~~~~~~^~~~~~~~~~~~~~~
drivers/block/rsxx/cregs.c:279:15: note: remove extraneous parentheses around the
      comparison to silence this warning
        if ((cmd->op == CREG_OP_READ)) {
            ~        ^              ~
drivers/block/rsxx/cregs.c:279:15: note: use '=' to turn this equality comparison into
      an assignment
        if ((cmd->op == CREG_OP_READ)) {
                     ^~
                     =

CONFIG_BLK_DEV_RSXX=y

@nathanchance
Copy link
Member

@nickdesaulniers do you want a Reported-by: tag?

@nickdesaulniers
Copy link
Member Author

Yeah man, include me in the screenshot. 🤳

@nathanchance
Copy link
Member

@nathanchance
Copy link
Member

Patch sent for the second warning: https://lore.kernel.org/lkml/20180911214338.1109-1-natechancellor@gmail.com/

@nathanchance
Copy link
Member

@eiais eiais added the [PATCH] Exists There is a patch that fixes this issue label Sep 13, 2018
@nathanchance nathanchance added [PATCH] Submitted A patch has been submitted for review [PATCH] Accepted A submitted patch has been accepted upstream and removed [PATCH] Exists There is a patch that fixes this issue labels Sep 21, 2018
@nathanchance
Copy link
Member

The second and third patches have been accepted and are currently in -next:

https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=ce42c1768152277658d48196f144aa406a4ea52a

https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=798ef9e70110d0245797526c930beec7fded7b15

I'll resend the first patch on Monday (looks like I need to send it to Andrew Morton based on git history).

@nathanchance
Copy link
Member

@tpimh tpimh removed the good first issue Good for newcomers label Sep 25, 2018
@nathanchance
Copy link
Member

Patch has been accepted for a while but it just showed up in -next a few days ago: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=90bb74ee01489f89a2bcf3a191510e26bf1e3a64

@nathanchance nathanchance removed the [PATCH] Submitted A patch has been submitted for review label Oct 5, 2018
@nathanchance
Copy link
Member

@nathanchance nathanchance added [FIXED][LINUX] 4.20 This bug was fixed in Linux 4.20 and removed [PATCH] Accepted A submitted patch has been accepted upstream labels Oct 27, 2018
@tpimh tpimh removed the low priority This bug is not critical and not a priority label Oct 27, 2018
nathanchance pushed a commit that referenced this issue Nov 9, 2018
Increase kasan instrumented kernel stack size from 32k to 64k. Other
architectures seems to get away with just doubling kernel stack size under
kasan, but on s390 this appears to be not enough due to bigger frame size.
The particular pain point is kasan inlined checks (CONFIG_KASAN_INLINE
vs CONFIG_KASAN_OUTLINE). With inlined checks one particular case hitting
stack overflow is fs sync on xfs filesystem:

 #0 [9a0681e8]  704 bytes  check_usage at 34b1fc
 #1 [9a0684a8]  432 bytes  check_usage at 34c710
 #2 [9a068658]  1048 bytes  validate_chain at 35044a
 #3 [9a068a70]  312 bytes  __lock_acquire at 3559fe
 #4 [9a068ba8]  440 bytes  lock_acquire at 3576ee
 #5 [9a068d60]  104 bytes  _raw_spin_lock at 21b44e0
 #6 [9a068dc8]  1992 bytes  enqueue_entity at 2dbf72
 #7 [9a069590]  1496 bytes  enqueue_task_fair at 2df5f0
 #8 [9a069b68]  64 bytes  ttwu_do_activate at 28f438
 #9 [9a069ba8]  552 bytes  try_to_wake_up at 298c4c
 #10 [9a069dd0]  168 bytes  wake_up_worker at 23f97c
 #11 [9a069e78]  200 bytes  insert_work at 23fc2e
 #12 [9a069f40]  648 bytes  __queue_work at 2487c0
 #13 [9a06a1c8]  200 bytes  __queue_delayed_work at 24db28
 #14 [9a06a290]  248 bytes  mod_delayed_work_on at 24de84
 #15 [9a06a388]  24 bytes  kblockd_mod_delayed_work_on at 153e2a0
 #16 [9a06a3a0]  288 bytes  __blk_mq_delay_run_hw_queue at 158168c
 #17 [9a06a4c0]  192 bytes  blk_mq_run_hw_queue at 1581a3c
 #18 [9a06a580]  184 bytes  blk_mq_sched_insert_requests at 15a2192
 #19 [9a06a638]  1024 bytes  blk_mq_flush_plug_list at 1590f3a
 #20 [9a06aa38]  704 bytes  blk_flush_plug_list at 1555028
 #21 [9a06acf8]  320 bytes  schedule at 219e476
 #22 [9a06ae38]  760 bytes  schedule_timeout at 21b0aac
 #23 [9a06b130]  408 bytes  wait_for_common at 21a1706
 #24 [9a06b2c8]  360 bytes  xfs_buf_iowait at fa1540
 #25 [9a06b430]  256 bytes  __xfs_buf_submit at fadae6
 #26 [9a06b530]  264 bytes  xfs_buf_read_map at fae3f6
 #27 [9a06b638]  656 bytes  xfs_trans_read_buf_map at 10ac9a8
 #28 [9a06b8c8]  304 bytes  xfs_btree_kill_root at e72426
 #29 [9a06b9f8]  288 bytes  xfs_btree_lookup_get_block at e7bc5e
 #30 [9a06bb18]  624 bytes  xfs_btree_lookup at e7e1a6
 #31 [9a06bd88]  2664 bytes  xfs_alloc_ag_vextent_near at dfa070
 #32 [9a06c7f0]  144 bytes  xfs_alloc_ag_vextent at dff3ca
 #33 [9a06c880]  1128 bytes  xfs_alloc_vextent at e05fce
 #34 [9a06cce8]  584 bytes  xfs_bmap_btalloc at e58342
 #35 [9a06cf30]  1336 bytes  xfs_bmapi_write at e618de
 #36 [9a06d468]  776 bytes  xfs_iomap_write_allocate at ff678e
 #37 [9a06d770]  720 bytes  xfs_map_blocks at f82af8
 #38 [9a06da40]  928 bytes  xfs_writepage_map at f83cd6
 #39 [9a06dde0]  320 bytes  xfs_do_writepage at f85872
 #40 [9a06df20]  1320 bytes  write_cache_pages at 73dfe8
 #41 [9a06e448]  208 bytes  xfs_vm_writepages at f7f892
 #42 [9a06e518]  88 bytes  do_writepages at 73fe6a
 #43 [9a06e570]  872 bytes  __writeback_single_inode at a20cb6
 #44 [9a06e8d8]  664 bytes  writeback_sb_inodes at a23be2
 #45 [9a06eb70]  296 bytes  __writeback_inodes_wb at a242e0
 #46 [9a06ec98]  928 bytes  wb_writeback at a2500e
 #47 [9a06f038]  848 bytes  wb_do_writeback at a260ae
 #48 [9a06f388]  536 bytes  wb_workfn at a28228
 #49 [9a06f5a0]  1088 bytes  process_one_work at 24a234
 #50 [9a06f9e0]  1120 bytes  worker_thread at 24ba26
 #51 [9a06fe40]  104 bytes  kthread at 26545a
 #52 [9a06fea8]             kernel_thread_starter at 21b6b62

To be able to increase the stack size to 64k reuse LLILL instruction
in __switch_to function to load 64k - STACK_FRAME_OVERHEAD - __PT_SIZE
(65192) value as unsigned.

Reported-by: Benjamin Block <bblock@linux.ibm.com>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
nathanchance pushed a commit that referenced this issue May 31, 2019
…text

stub_probe() and stub_disconnect() call functions which could call
sleeping function in invalid context whil holding busid_lock.

Fix the problem by refining the lock holds to short critical sections
to change the busid_priv fields. This fix restructures the code to
limit the lock holds in stub_probe() and stub_disconnect().

stub_probe():

[15217.927028] BUG: sleeping function called from invalid context at mm/slab.h:418
[15217.927038] in_atomic(): 1, irqs_disabled(): 0, pid: 29087, name: usbip
[15217.927044] 5 locks held by usbip/29087:
[15217.927047]  #0: 0000000091647f28 (sb_writers#6){....}, at: vfs_write+0x191/0x1c0
[15217.927062]  #1: 000000008f9ba75b (&of->mutex){....}, at: kernfs_fop_write+0xf7/0x1b0
[15217.927072]  #2: 00000000872e5b4b (&dev->mutex){....}, at: __device_driver_lock+0x3b/0x50
[15217.927082]  #3: 00000000e74ececc (&dev->mutex){....}, at: __device_driver_lock+0x46/0x50
[15217.927090]  #4: 00000000b20abbe0 (&(&busid_table[i].busid_lock)->rlock){....}, at: get_busid_priv+0x48/0x60 [usbip_host]
[15217.927103] CPU: 3 PID: 29087 Comm: usbip Tainted: G        W         5.1.0-rc6+ #40
[15217.927106] Hardware name: Dell Inc. OptiPlex 790/0HY9JP, BIOS A18 09/24/2013
[15217.927109] Call Trace:
[15217.927118]  dump_stack+0x63/0x85
[15217.927127]  ___might_sleep+0xff/0x120
[15217.927133]  __might_sleep+0x4a/0x80
[15217.927143]  kmem_cache_alloc_trace+0x1aa/0x210
[15217.927156]  stub_probe+0xe8/0x440 [usbip_host]
[15217.927171]  usb_probe_device+0x34/0x70

stub_disconnect():

[15279.182478] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:908
[15279.182487] in_atomic(): 1, irqs_disabled(): 0, pid: 29114, name: usbip
[15279.182492] 5 locks held by usbip/29114:
[15279.182494]  #0: 0000000091647f28 (sb_writers#6){....}, at: vfs_write+0x191/0x1c0
[15279.182506]  #1: 00000000702cf0f3 (&of->mutex){....}, at: kernfs_fop_write+0xf7/0x1b0
[15279.182514]  #2: 00000000872e5b4b (&dev->mutex){....}, at: __device_driver_lock+0x3b/0x50
[15279.182522]  #3: 00000000e74ececc (&dev->mutex){....}, at: __device_driver_lock+0x46/0x50
[15279.182529]  #4: 00000000b20abbe0 (&(&busid_table[i].busid_lock)->rlock){....}, at: get_busid_priv+0x48/0x60 [usbip_host]
[15279.182541] CPU: 0 PID: 29114 Comm: usbip Tainted: G        W         5.1.0-rc6+ #40
[15279.182543] Hardware name: Dell Inc. OptiPlex 790/0HY9JP, BIOS A18 09/24/2013
[15279.182546] Call Trace:
[15279.182554]  dump_stack+0x63/0x85
[15279.182561]  ___might_sleep+0xff/0x120
[15279.182566]  __might_sleep+0x4a/0x80
[15279.182574]  __mutex_lock+0x55/0x950
[15279.182582]  ? get_busid_priv+0x48/0x60 [usbip_host]
[15279.182587]  ? reacquire_held_locks+0xec/0x1a0
[15279.182591]  ? get_busid_priv+0x48/0x60 [usbip_host]
[15279.182597]  ? find_held_lock+0x94/0xa0
[15279.182609]  mutex_lock_nested+0x1b/0x20
[15279.182614]  ? mutex_lock_nested+0x1b/0x20
[15279.182618]  kernfs_remove_by_name_ns+0x2a/0x90
[15279.182625]  sysfs_remove_file_ns+0x15/0x20
[15279.182629]  device_remove_file+0x19/0x20
[15279.182634]  stub_disconnect+0x6d/0x180 [usbip_host]
[15279.182643]  usb_unbind_device+0x27/0x60

Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
nathanchance pushed a commit that referenced this issue Jan 29, 2020
The test_progs send_signal() is amended to test
bpf_send_signal_thread() as well.

   $ ./test_progs -n 40
   #40/1 send_signal_tracepoint:OK
   #40/2 send_signal_perf:OK
   #40/3 send_signal_nmi:OK
   #40/4 send_signal_tracepoint_thread:OK
   #40/5 send_signal_perf_thread:OK
   #40/6 send_signal_nmi_thread:OK
   #40 send_signal:OK
   Summary: 1/6 PASSED, 0 SKIPPED, 0 FAILED

Also took this opportunity to rewrite the send_signal test
using skeleton framework and array mmap to make code
simpler and more readable.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200115035003.602425-1-yhs@fb.com
nathanchance pushed a commit that referenced this issue Jun 4, 2020
The mmu mapping logic uses a different logic depending on the
RAM size: if it is lower than 2GB, it uses kmem_cache_zalloc(),
but if memory is bigger than that, it uses its own way to
allocate memory.

Yet, when freeing, it uses kmem_cache_free() for any cases.

On recent Kernels, slab tracks the memory allocated on it,
with causes those warnings:

 virt_to_cache: Object is not a Slab page!
 WARNING: CPU: 0 PID: 758 at mm/slab.h:475 cache_from_obj+0xab/0xf0
 Modules linked in: snd_soc_sst_cht_bsw_rt5645(E) mei_hdcp(E) gpio_keys(E) intel_rapl_msr(E) intel_powerclamp(E) coretemp(E) kvm_intel(E) kvm(E) irqbypass(E) crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E) atomisp_ov2680(CE) intel_cstate(E) asus_nb_wmi(E) wdat_wdt(E) pcspkr(E) ath10k_pci(E) ath10k_core(E) intel_chtdc_ti_pwrbtn(E) ath(E) mac80211(E) btusb(E) joydev(E) btrtl(E) btbcm(E) btintel(E) bluetooth(E) libarc4(E) ecdh_generic(E) cfg80211(E) ecc(E) hid_sensor_gyro_3d(E) hid_sensor_accel_3d(E) hid_sensor_trigger(E) hid_sensor_iio_common(E) industrialio_triggered_buffer(E) kfifo_buf(E) industrialio(E) atomisp(CE) videobuf_vmalloc(E) videobuf_core(E) videodev(E) mc(E) snd_soc_rt5645(E) snd_soc_rl6231(E) snd_intel_sst_acpi(E) snd_intel_sst_core(E) snd_soc_sst_atom_hifi2_platform(E) snd_soc_acpi_intel_match(E) intel_hid(E) spi_pxa2xx_platform(E) snd_soc_acpi(E) snd_soc_core(E) snd_compress(E) dw_dmac(E) intel_xhci_usb_role_switch(E) int3406_thermal(E)
  snd_hdmi_lpe_audio(E) int3403_thermal(E) int3400_thermal(E) acpi_thermal_rel(E) snd_seq(E) intel_int0002_vgpio(E) soc_button_array(E) snd_seq_device(E) acpi_pad(E) snd_pcm(E) snd_timer(E) snd(E) soundcore(E) lpc_ich(E) mei_txe(E) mei(E) processor_thermal_device(E) intel_soc_dts_iosf(E) intel_rapl_common(E) int340x_thermal_zone(E) ip_tables(E) hid_sensor_hub(E) intel_ishtp_loader(E) intel_ishtp_hid(E) mmc_block(E) hid_multitouch(E) crc32c_intel(E) i915(E) i2c_algo_bit(E) drm_kms_helper(E) hid_asus(E) asus_wmi(E) sparse_keymap(E) rfkill(E) drm(E) intel_ish_ipc(E) intel_ishtp(E) wmi(E) video(E) i2c_hid(E) sdhci_acpi(E) sdhci(E) mmc_core(E) pwm_lpss_platform(E) pwm_lpss(E) fuse(E)
 CPU: 0 PID: 758 Comm: v4l_id Tainted: G         C  E     5.7.0-rc2+ #40
 Hardware name: ASUSTeK COMPUTER INC. T101HA/T101HA, BIOS T101HA.306 04/23/2019
 RIP: 0010:cache_from_obj+0xab/0xf0
 Code: c3 31 c0 80 3d 1c 38 72 01 00 75 f0 48 c7 c6 20 12 06 b5 48 c7 c7 10 f3 37 b5 48 89 04 24 c6 05 01 38 72 01 01 e8 2c 99 e0 ff <0f> 0b 48 8b 04 24 eb ca 48 8b 57 58 48 8b 48 58 48 c7 c6 30 12 06
 RSP: 0018:ffffb0a4c07cfb10 EFLAGS: 00010282
 RAX: 0000000000000029 RBX: 0000000000000048 RCX: 0000000000000000
 RDX: ffffa004fbca5b80 RSI: ffffa004fbc19cc8 RDI: ffffa004fbc19cc8
 RBP: 0000000000c49000 R08: 00000000000004f7 R09: 0000000000000001
 R10: 0000000000aaaaaa R11: ffffffffb50e0600 R12: ffffffffc0be0a00
 R13: ffffa003f2448000 R14: 0000000000c49000 R15: ffffa003f2448000
 FS:  00007f9060c9cb80(0000) GS:ffffa004fbc00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000559fc55b8000 CR3: 0000000165b02000 CR4: 00000000001006f0
 Call Trace:
  kmem_cache_free+0x19/0x180
  mmu_l2_unmap+0xd1/0x100 [atomisp]
  mmu_unmap+0xd0/0xf0 [atomisp]
  hmm_bo_unbind+0x62/0xb0 [atomisp]
  hmm_free+0x44/0x60 [atomisp]
  ia_css_spctrl_unload_fw+0x30/0x50 [atomisp]
  ia_css_uninit+0x3a/0x90 [atomisp]
  atomisp_open+0x50b/0x5c0 [atomisp]
  v4l2_open+0x85/0xf0 [videodev]
  chrdev_open+0xdd/0x210
  ? cdev_device_add+0xc0/0xc0
  do_dentry_open+0x13a/0x380
  path_openat+0xa9a/0xfe0
  do_filp_open+0x75/0x100
  ? __check_object_size+0x12e/0x13c
  ? __alloc_fd+0x44/0x150
  do_sys_openat2+0x8a/0x130
  __x64_sys_openat+0x46/0x70
  do_syscall_64+0x5b/0xf0
  entry_SYSCALL_64_after_hwframe+0x44/0xa9

Solve it by calling free_page() directly

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
nathanchance pushed a commit that referenced this issue Aug 16, 2020
…card()

If create a loop device with a backing NVMe SSD, current loop device
driver doesn't correctly set its  queue's limits.discard_granularity and
leaves it as 0. If a discard request at LBA 0 on this loop device, in
__blkdev_issue_discard() the calculated req_sects will be 0, and a zero
length discard request will trigger a BUG() panic in generic block layer
code at block/blk-mq.c:563.

[  955.565006][   C39] ------------[ cut here ]------------
[  955.559660][   C39] invalid opcode: 0000 [#1] SMP NOPTI
[  955.622171][   C39] CPU: 39 PID: 248 Comm: ksoftirqd/39 Tainted: G            E     5.8.0-default+ #40
[  955.622171][   C39] Hardware name: Lenovo ThinkSystem SR650 -[7X05CTO1WW]-/-[7X05CTO1WW]-, BIOS -[IVE160M-2.70]- 07/17/2020
[  955.622175][   C39] RIP: 0010:blk_mq_end_request+0x107/0x110
[  955.622177][   C39] Code: 48 8b 03 e9 59 ff ff ff 48 89 df 5b 5d 41 5c e9 9f ed ff ff 48 8b 35 98 3c f4 00 48 83 c7 10 48 83 c6 19 e8 cb 56 c9 ff eb cb <0f> 0b 0f 1f 80 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 56 41 54
[  955.622179][   C39] RSP: 0018:ffffb1288701fe28 EFLAGS: 00010202
[  955.749277][   C39] RAX: 0000000000000001 RBX: ffff956fffba5080 RCX: 0000000000004003
[  955.749278][   C39] RDX: 0000000000000003 RSI: 0000000000000000 RDI: 0000000000000000
[  955.749279][   C39] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[  955.749279][   C39] R10: ffffb1288701fd28 R11: 0000000000000001 R12: ffffffffa8e05160
[  955.749280][   C39] R13: 0000000000000004 R14: 0000000000000004 R15: ffffffffa7ad3a1e
[  955.749281][   C39] FS:  0000000000000000(0000) GS:ffff95bfbda00000(0000) knlGS:0000000000000000
[  955.749282][   C39] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  955.749282][   C39] CR2: 00007f6f0ef766a8 CR3: 0000005a37012002 CR4: 00000000007606e0
[  955.749283][   C39] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  955.749284][   C39] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  955.749284][   C39] PKRU: 55555554
[  955.749285][   C39] Call Trace:
[  955.749290][   C39]  blk_done_softirq+0x99/0xc0
[  957.550669][   C39]  __do_softirq+0xd3/0x45f
[  957.550677][   C39]  ? smpboot_thread_fn+0x2f/0x1e0
[  957.550679][   C39]  ? smpboot_thread_fn+0x74/0x1e0
[  957.550680][   C39]  ? smpboot_thread_fn+0x14e/0x1e0
[  957.550684][   C39]  run_ksoftirqd+0x30/0x60
[  957.550687][   C39]  smpboot_thread_fn+0x149/0x1e0
[  957.886225][   C39]  ? sort_range+0x20/0x20
[  957.886226][   C39]  kthread+0x137/0x160
[  957.886228][   C39]  ? kthread_park+0x90/0x90
[  957.886231][   C39]  ret_from_fork+0x22/0x30
[  959.117120][   C39] ---[ end trace 3dacdac97e2ed164 ]---

This is the procedure to reproduce the panic,
  # modprobe scsi_debug delay=0 dev_size_mb=2048 max_queue=1
  # losetup -f /dev/nvme0n1 --direct-io=on
  # blkdiscard /dev/loop0 -o 0 -l 0x200

This patch fixes the issue by checking q->limits.discard_granularity in
__blkdev_issue_discard() before composing the discard bio. If the value
is 0, then prints a warning oops information and returns -EOPNOTSUPP to
the caller to indicate that this buggy device driver doesn't support
discard request.

Fixes: 9b15d10 ("block: improve discard bio alignment in __blkdev_issue_discard()")
Fixes: c52abf5 ("loop: Better discard support for block devices")
Reported-and-suggested-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Coly Li <colyli@suse.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Enzo Matsumiya <ematsumiya@suse.com>
Cc: Evan Green <evgreen@chromium.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Xiao Ni <xni@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
-Wparentheses-equality [BUG] linux A bug that should be fixed in the mainline kernel. [FIXED][LINUX] 4.20 This bug was fixed in Linux 4.20
Projects
None yet
Development

No branches or pull requests

4 participants