Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

xen/drm: race condition while freeing GEM objects #1

Closed
andr2000 opened this issue Mar 17, 2017 · 0 comments
Closed

xen/drm: race condition while freeing GEM objects #1

andr2000 opened this issue Mar 17, 2017 · 0 comments

Comments

@andr2000
Copy link
Owner

[ 25.157177] [drm:xdrv_shbuf_free_by_dumb_cookie] ERROR Freeing shared buffer dumb_cookie ffff8000294dc900

[ 25.158712] CPU: 2 PID: 1143 Comm: pvr_defer_free Tainted: G O 4.9.0-domu+ xen-troops#76
[ 25.158713] Hardware name: XENVM-4.8 (DT)
[ 25.158715] Call trace:
[ 25.158720] [] dump_backtrace+0x0/0x1a8
[ 25.158723] [] show_stack+0x14/0x20
[ 25.158726] [] dump_stack+0x94/0xb8
[ 25.158827] [] OSReleasePhysPageAddr+0x34/0x1b0 [pvrsrvkm]
[ 25.158910] [] PMRFinalizeWrap+0x28/0x40 [pvrsrvkm]
[ 25.158984] [] _UnrefAndMaybeDestroy+0x44/0x80 [pvrsrvkm]
[ 25.159057] [] PMRUnrefPMR+0xc/0x18 [pvrsrvkm]
[ 25.159128] [] FreeHandleDataWrapper+0xb8/0x1a0 [pvrsrvkm]
[ 25.159197] [] HandleIterFuncWrapper+0x14/0x20 [pvrsrvkm]
[ 25.159201] [] idr_for_each+0xa4/0x108
[ 25.159271] [] IterateOverHandles+0x18/0x28 [pvrsrvkm]
[ 25.159342] [] PVRSRVFreeHandleBase+0x7c/0x110 [pvrsrvkm]
[ 25.159414] [] ConnectionDataDestroy+0x80/0x140 [pvrsrvkm]
[ 25.159485] [] _CleanupThreadPurgeConnectionData+0x28/0x60 [pvrsrvkm]
[ 25.159558] [] CleanupThread+0xb4/0x158 [pvrsrvkm]
[ 25.159625] [] OSThreadRun+0x24/0x68 [pvrsrvkm]
[ 25.159629] [] kthread+0xd0/0xe8
[ 25.159631] [] ret_from_fork+0x10/0x50
[ 25.159699] ----------- put_page N= 469

[ 25.178675] CPU: 3 PID: 1229 Comm: kmscube Tainted: G O 4.9.0-domu+ xen-troops#76
[ 25.178697] Hardware name: XENVM-4.8 (DT)
[ 25.178706] Call trace:
[ 25.178718] [] dump_backtrace+0x0/0x1a8
[ 25.178731] [] show_stack+0x14/0x20
[ 25.178745] [] dump_stack+0x94/0xb8
[ 25.178760] [] xendrm_free_object+0x108/0x1a0
[ 25.178776] [] drm_gem_object_free+0x1c/0x58
[ 25.178790] [] drm_gem_object_unreference_unlocked+0x54/0x130
[ 25.178811] [] drm_gem_dmabuf_release+0x1c/0x30
[ 25.178827] [] dma_buf_release+0x5c/0x170
[ 25.178840] [] __fput+0x90/0x1d0
[ 25.178852] [] ____fput+0xc/0x18
[ 25.178867] [] task_work_run+0xbc/0xd8
[ 25.178881] [] do_exit+0x2d0/0x978
[ 25.178894] [] do_group_exit+0x38/0xa0
[ 25.178907] [] get_signal+0x20c/0x510
[ 25.178920] [] do_signal+0x74/0x550
[ 25.178932] [] do_notify_resume+0x90/0xb0
[ 25.178944] [] work_pending+0x8/0x10

andr2000 pushed a commit that referenced this issue May 30, 2017
This patch fixes the problem that occasionally released resources
before the end of interrupt processing.

[   58.156412] Unable to handle kernel NULL pointer dereference at virtual
 address 00000000
[   58.166155] pgd = ffff8006f78b0000
[   58.169822] [00000000] *pgd=000000073773b003 , *pud=0000000737c70003 ,
 *pmd=0000000000000000
[   58.179738]
[   58.181548] Internal error: Oops: 96000046 [#1] PREEMPT SMP
[   58.187484] Modules linked in:
[   58.190919] CPU: 0 PID: 2898 Comm: dma_ioctl Not tainted 4.9.0-00002-g501
07f2-dirty xen-troops#147
[   58.199438] Hardware name: Renesas Salvator-X board based on r8a7795 (DT)
[   58.206680] task: ffff8006f77b1900 task.stack: ffff8006f7bd4000
[   58.213090] PC is at rcar_dmac_chan_prep_sg+0xa4/0x3f0
[   58.218725] LR is at rcar_dmac_chan_prep_sg+0x6c/0x3f0

Signed-off-by: Hiroyuki Yokoyama <hiroyuki.yokoyama.vx@renesas.com>
andr2000 pushed a commit that referenced this issue May 30, 2017
This supports USB2 PHY channel #1 on H3ULCB board with R8A7795 ES1.x SoC

Signed-off-by: Vladimir Barinov <vladimir.barinov@cogentembedded.com>
Signed-off-by: Takeshi Kihara <takeshi.kihara.df@renesas.com>
andr2000 pushed a commit that referenced this issue Aug 7, 2017
Since commit a729526 ("ASoC: sgtl5000: add avc support") the
following kernel crash happens after running a 'reboot' command:

ALSA: Storing mixer settings...
[   20.031604] Unable to handle kernel paging request at virtual address fffffffe
[   20.039268] pgd = de2a0000
[   20.041999] [fffffffe] *pgd=8fffd861, *pte=00000000, *ppte=00000000
[   20.048387] Internal error: Oops: 80000007 [#1] SMP ARM

The function that takes a kcontrol parameter and returns the
codec that registered the control is snd_soc_kcontrol_codec(),
so use the correct function to fix the problem.

Fixes: a729526 ("ASoC: sgtl5000: add avc support")
Signed-off-by: Fabio Estevam <fabio.estevam@nxp.com>
Tested-by: Richard Leitner <richard.leitner@skidata.com>
Reviewed-by: Richard Leitner <richard.leitner@skidata.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
andr2000 pushed a commit that referenced this issue Nov 2, 2017
…dline

If ipv6 has been disabled from cmdline since kernel started, it makes
no sense to allow users to create any ip6 tunnel. Otherwise, it could
some potential problem.

Jianlin found a kernel crash caused by this in ip6_gre when he set
ipv6.disable=1 in grub:

[  209.588865] Unable to handle kernel paging request for data at address 0x00000080
[  209.588872] Faulting instruction address: 0xc000000000a3aa6c
[  209.588879] Oops: Kernel access of bad area, sig: 11 [#1]
[  209.589062] NIP [c000000000a3aa6c] fib_rules_lookup+0x4c/0x260
[  209.589071] LR [c000000000b9ad90] fib6_rule_lookup+0x50/0xb0
[  209.589076] Call Trace:
[  209.589097] fib6_rule_lookup+0x50/0xb0
[  209.589106] rt6_lookup+0xc4/0x110
[  209.589116] ip6gre_tnl_link_config+0x214/0x2f0 [ip6_gre]
[  209.589125] ip6gre_newlink+0x138/0x3a0 [ip6_gre]
[  209.589134] rtnl_newlink+0x798/0xb80
[  209.589142] rtnetlink_rcv_msg+0xec/0x390
[  209.589151] netlink_rcv_skb+0x138/0x150
[  209.589159] rtnetlink_rcv+0x48/0x70
[  209.589169] netlink_unicast+0x538/0x640
[  209.589175] netlink_sendmsg+0x40c/0x480
[  209.589184] ___sys_sendmsg+0x384/0x4e0
[  209.589194] SyS_sendmsg+0xd4/0x140
[  209.589201] SyS_socketcall+0x3e0/0x4f0
[  209.589209] system_call+0x38/0xe0

This patch is to return -EOPNOTSUPP in ip6_tunnel_init if ipv6 has been
disabled from cmdline.

Reported-by: Jianlin Shi <jishi@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
andr2000 pushed a commit that referenced this issue Nov 2, 2017
The "Release:" field of the spec file is determined based on the
.version file.

However, the .version file is not copied to the source tar file.
So, when we build the kernel from the source package, the UTS_VERSION
always indicates #1.  This does not match with "rpm -q".

The kernel UTS_VERSION and "rpm -q" do not agree for binrpm-pkg, either.
Please note the kernel has already been built before the spec file is
created.  Currently, mkspec invokes mkversion.  This script returns an
incremented version.  So, the "Release:" field of the spec file is
greater than the version in the kernel by one.

For the source package build (where .version file is missing), we can
give KBUILD_BUILD_VERSION=%{release} to the build command.

For the binary package build, we can simply read out the .version file
because it contains the version number that was used for building the
kernel image.

We can remove scripts/mkversion because scripts/package/Makefile need
not touch the .version file.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
andr2000 pushed a commit that referenced this issue Nov 2, 2017
…LPAR

Commit 215ee76 ("powerpc: pseries: remove dlpar_attach_node
dependency on full path") reworked dlpar_attach_node() to no longer
look up the parent node "/cpus", but instead to have the parent node
passed by the caller in the function parameter list.

As a result dlpar_attach_node() is no longer responsible for freeing
the reference to the parent node. However, commit 215ee76 failed
to remove the of_node_put(parent) call in dlpar_attach_node(), or to
take into account that the reference to the parent in the caller
dlpar_cpu_add() needs to be held until after dlpar_attach_node()
returns.

As a result doing repeated cpu add/remove dlpar operations will
eventually result in the following error:

  OF: ERROR: Bad of_node_put() on /cpus
  CPU: 0 PID: 10896 Comm: drmgr Not tainted 4.13.0-autotest #1
  Call Trace:
   dump_stack+0x15c/0x1f8 (unreliable)
   of_node_release+0x1a4/0x1c0
   kobject_put+0x1a8/0x310
   kobject_del+0xbc/0xf0
   __of_detach_node_sysfs+0x144/0x210
   of_detach_node+0xf0/0x180
   dlpar_detach_node+0xc4/0x120
   dlpar_cpu_remove+0x280/0x560
   dlpar_cpu_release+0xbc/0x1b0
   arch_cpu_release+0x6c/0xb0
   cpu_release_store+0xa0/0x100
   dev_attr_store+0x68/0xa0
   sysfs_kf_write+0xa8/0xf0
   kernfs_fop_write+0x2cc/0x400
   __vfs_write+0x5c/0x340
   vfs_write+0x1a8/0x3d0
   SyS_write+0xa8/0x1a0
   system_call+0x58/0x6c

Fix the issue by removing the of_node_put(parent) call from
dlpar_attach_node(), and ensuring that the reference to the parent
node is properly held and released by the caller dlpar_cpu_add().

Fixes: 215ee76 ("powerpc: pseries: remove dlpar_attach_node dependency on full path")
Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
Reported-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
[mpe: Add a comment in the code and frob the change log slightly]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
andr2000 pushed a commit that referenced this issue Nov 2, 2017
…() returns

The cgroup_taskset structure within the larger cgroup_mgctx structure
is supposed to be used once and then discarded. That is not really the
case in the hotplug code path:

cpuset_hotplug_workfn()
 - cgroup_transfer_tasks()
   - cgroup_migrate()
     - cgroup_migrate_add_task()
     - cgroup_migrate_execute()

In this case, the cgroup_migrate() function is called multiple time
with the same cgroup_mgctx structure to transfer the tasks from
one cgroup to another one-by-one. The second time cgroup_migrate()
is called, the cgroup_taskset will be in an incorrect state and so
may cause the system to panic. For example,

  [  150.888410] Faulting instruction address: 0xc0000000001db648
  [  150.888414] Oops: Kernel access of bad area, sig: 11 [#1]
  [  150.888417] SMP NR_CPUS=2048
  [  150.888417] NUMA
  [  150.888419] pSeries
    :
  [  150.888545] NIP [c0000000001db648] cpuset_can_attach+0x58/0x1b0
  [  150.888548] LR [c0000000001db638] cpuset_can_attach+0x48/0x1b0
  [  150.888551] Call Trace:
  [  150.888554] [c0000005f65cb940] [c0000000001db638] cpuset_can_attach+0x48/0x1b 0 (unreliable)
  [  150.888559] [c0000005f65cb9a0] [c0000000001cff04] cgroup_migrate_execute+0xc4/0x4b0
  [  150.888563] [c0000005f65cba20] [c0000000001d7d14] cgroup_transfer_tasks+0x1d4/0x370
  [  150.888568] [c0000005f65cbb70] [c0000000001ddcb0] cpuset_hotplug_workfn+0x710/0x8f0
  [  150.888572] [c0000005f65cbc80] [c00000000012032c] process_one_work+0x1ac/0x4d0
  [  150.888576] [c0000005f65cbd20] [c0000000001206f8] worker_thread+0xa8/0x5b0
  [  150.888580] [c0000005f65cbdc0] [c0000000001293f8] kthread+0x168/0x1b0
  [  150.888584] [c0000005f65cbe30] [c00000000000b368] ret_from_kernel_thread+0x5c/0x74

To allow reuse of the cgroup_mgctx structure, some fields in that
structure are now re-initialized at the end of cgroup_migrate_execute()
function call so that the structure can be reused again in a later
iteration without causing problem.

This bug was introduced in the commit e595cd7 ("group: track
migration context in cgroup_mgctx") in 4.11. This commit moves the
cgroup_taskset initialization out of cgroup_migrate(). The commit
10467270fb3 ("cgroup: don't call migration methods if there are no
tasks to migrate") helped, but did not completely resolve the problem.

Fixes: e595cd7 ("group: track migration context in cgroup_mgctx")
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: stable@vger.kernel.org # v4.11+
andr2000 pushed a commit that referenced this issue Nov 2, 2017
Build with the latest patches resulted in panic:
11384.486289] BUG: unable to handle kernel NULL pointer dereference at
         (null)
[11384.486293] IP:           (null)
[11384.486295] PGD 0
[11384.486295] P4D 0
[11384.486296]
[11384.486299] Oops: 0010 [#1] SMP
......... snip ......
[11384.486401] CPU: 0 PID: 968 Comm: kworker/0:1H Tainted: G        W  O
    4.13.0-a-stream-20170825 #1
[11384.486402] Hardware name: Intel Corporation S2600WT2R/S2600WT2R,
BIOS SE5C610.86B.01.01.0014.121820151719 12/18/2015
[11384.486418] Workqueue: ib-comp-wq ib_cq_poll_work [ib_core]
[11384.486419] task: ffff880850579680 task.stack: ffffc90007fec000
[11384.486420] RIP: 0010:          (null)
[11384.486420] RSP: 0018:ffffc90007fef970 EFLAGS: 00010206
[11384.486421] RAX: ffff88084cfe8000 RBX: ffff88084dce4000 RCX:
ffffc90007fef978
[11384.486422] RDX: 0000000000000000 RSI: 0000000000000001 RDI:
ffff88084cfe8000
[11384.486422] RBP: ffffc90007fefab0 R08: 0000000000000000 R09:
ffff88084dce4080
[11384.486423] R10: ffffffffa02d7f60 R11: 0000000000000000 R12:
ffff88105af65a00
[11384.486423] R13: ffff88084dce4000 R14: 000000000000c000 R15:
000000000000c000
[11384.486424] FS:  0000000000000000(0000) GS:ffff88085f400000(0000)
knlGS:0000000000000000
[11384.486425] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[11384.486425] CR2: 0000000000000000 CR3: 0000000001c09000 CR4:
00000000001406f0
[11384.486426] Call Trace:
[11384.486431]  ? is_valid_mcast_lid.isra.21+0xfb/0x110 [ib_core]
[11384.486436]  ib_attach_mcast+0x6f/0xa0 [ib_core]
[11384.486441]  ipoib_mcast_attach+0x81/0x190 [ib_ipoib]
[11384.486443]  ipoib_mcast_join_complete+0x354/0xb40 [ib_ipoib]
[11384.486448]  mcast_work_handler+0x330/0x6c0 [ib_core]
[11384.486452]  join_handler+0x101/0x220 [ib_core]
[11384.486455]  ib_sa_mcmember_rec_callback+0x54/0x80 [ib_core]
[11384.486459]  recv_handler+0x3a/0x60 [ib_core]
[11384.486462]  ib_mad_recv_done+0x423/0x9b0 [ib_core]
[11384.486466]  __ib_process_cq+0x5d/0xb0 [ib_core]
[11384.486469]  ib_cq_poll_work+0x20/0x60 [ib_core]
[11384.486472]  process_one_work+0x149/0x360
[11384.486474]  worker_thread+0x4d/0x3c0
[11384.486487]  kthread+0x109/0x140
[11384.486488]  ? rescuer_thread+0x380/0x380
[11384.486489]  ? kthread_park+0x60/0x60
[11384.486490]  ? kthread_park+0x60/0x60
[11384.486493]  ret_from_fork+0x25/0x30
[11384.486493] Code:  Bad RIP value.
[11384.486493] Code:  Bad RIP value.
[11384.486496] RIP:           (null) RSP: ffffc90007fef970
[11384.486497] CR2: 0000000000000000
[11384.486531] ---[ end trace b1acec6fb4ff6e75 ]---
[11384.532133] Kernel panic - not syncing: Fatal exception
[11384.536541] Kernel Offset: disabled
[11384.969491] ---[ end Kernel panic - not syncing: Fatal exception
[11384.976875] sched: Unexpected reschedule of offline CPU#1!
[11384.983646] ------------[ cut here ]------------

Rdma device driver may not have implemented (*get_link_layer)()
so it can not be called directly. Should use appropriate helper function.

Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Fixes: 5236333 ("IB/core: Fix the validations of a multicast LID in attach or detach operations")
Cc: stable@kernel.org # 4.13
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Alex Estrin <alex.estrin@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
andr2000 pushed a commit that referenced this issue Nov 2, 2017
Netdev event notifier registration/de-registration is not
synchronized with a lock and there is a possibility of a
duplicate registration of notifier before the unregister
completes.

Register netdev event notifiers during module init and
de-register them at module exit.

This avoids the need to tie the registration to first netdev
client interface open and de-registration to last client
interface close and the synchronization to achieve it.

This also fixes a crash due to duplicate registration.

BUG: unable to handle kernel paging request at ffffffffa0d60388
IP: [<ffffffff8160f75d>] notifier_call_chain+0x3d/0x70
PGD 190d067 PUD 190e063 PMD 76c840067 PTE 0
Oops: 0000 [#1] SMP
Modules linked in: i40e(OF-) fuse btrfs zlib_deflate raid6_pq xor vfat msdos
[..]
e1000e vxlan ip_tunnel ptp pps_core i2c_core video [last unloaded: i40iw]
CPU: 1 PID: 27101 Comm: modprobe Tainted: GF       W  O--------------   3.10.0-229.el7.x86_64 #1
Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./Q87M-D2H, BIOS F7 01/17/2014
task: ffff88076e8a96c0 ti: ffff8806959c8000 task.ti: ffff8806959c8000
RIP: 0010:[<ffffffff8160f75d>]  [<ffffffff8160f75d>] notifier_call_chain+0x3d/0x70
RSP: 0018:ffff8806959cbb38  EFLAGS: 00010282
RAX: ffffffffa0d60380 RBX: 00000000fffffffd RCX: 0000000000000000
0708] RDX: 0000000000000000 RSI: ffff88081227a000 RDI: 0000000000000002
RBP: ffff8806959cbb60 R08: 0000000000000246 R09: 000000000000700c
R10: ffff88080e16ea40 R11: 00000000000ae8df R12: ffffffffa0d60380
R13: 0000000000000002 R14: ffff88076e738800 R15: 0000000000000000
FS:  00007f604ef4a740(0000) GS:ffff88083e240000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffa0d60388 CR3: 0000000753cd2000 CR4: 00000000001407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Stack:
ffffffff819e73a0 0000000000000000 0000000000000002 ffff88076e738800
00000000ffffffff ffff8806959cbba0 ffffffff8109d61d 0000000000000000
0000000000000000 ffff88076e738800 0000000000000000 ffff88076e738800
Call Trace:
[<ffffffff8109d61d>] __blocking_notifier_call_chain+0x4d/0x70
[<ffffffff8109d656>] blocking_notifier_call_chain+0x16/0x20
[<ffffffff8156b9e4>] __inet_del_ifa+0x154/0x2b0
[<ffffffff8156d102>] inetdev_event+0x182/0x530
[<ffffffff8160f76c>] notifier_call_chain+0x4c/0x70
[<ffffffff8109d446>] raw_notifier_call_chain+0x16/0x20
[<ffffffff814f71fd>] call_netdevice_notifiers+0x2d/0x60
[<ffffffff814f8845>] rollback_registered_many+0x105/0x220
[<ffffffff814f89a0>] rollback_registered+0x40/0x70
[<ffffffff814f9c88>] unregister_netdevice_queue+0x48/0x80
[<ffffffff814f9cdc>] unregister_netdev+0x1c/0x30
[<ffffffffa0067139>] i40e_vsi_release+0x2a9/0x2b0 [i40e]
[<ffffffffa00674e8>] i40e_remove+0x128/0x2b0 [i40e]
[<ffffffff813092db>] pci_device_remove+0x3b/0xb0
[<ffffffff813d26ef>] __device_release_driver+0x7f/0xf0
[<ffffffff813d3068>] driver_detach+0xb8/0xc0
[<ffffffff813d22db>] bus_remove_driver+0x9b/0x120
[<ffffffff813d36dc>] driver_unregister+0x2c/0x50
[<ffffffff81307d4c>] pci_unregister_driver+0x2c/0x90
[<ffffffffa008f9d0>] i40e_exit_module+0x10/0x23 [i40e]
[<ffffffff810dad0b>] SyS_delete_module+0x16b/0x2d0
[<ffffffff81013b0c>] ? do_notify_resume+0x9c/0xb0
[<ffffffff81613da9>] system_call_fastpath+0x16/0x1b
Code: e5 41 57 4d 89 c7 41 56 49 89 d6 41 55 49 89 f5 41 54 53 89 cb
75 14 eb 3d 0f 1f 44 00 00 83 eb 01 74 25 4d 85 e4 74 20 4c 89 e0 <4c>
8b 60 08 4c 89 f2 4c 89 ee 48 89 c7 ff 10 4d 85 ff 74 04 41
RIP  [<ffffffff8160f75d>] notifier_call_chain+0x3d/0x70

Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
andr2000 pushed a commit that referenced this issue Nov 2, 2017
Xiaolong reported a suspicious rcu_dereference_check in the device
unregister notifier callback. Since we do not dereference the
rx_handler_data, it's ok to just check for the value of the pointer.
Note that this section is already protected by rtnl_lock.

[  101.364846] WARNING: suspicious RCU usage
[  101.365654] 4.13.0-rc6-01701-gceed73a #1 Not tainted
[  101.370873] -----------------------------
[  101.372472] drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c:57 suspicious rcu_dereference_check() usage!
[  101.374427]
[  101.374427] other info that might help us debug this:
[  101.374427]
[  101.387491]
[  101.387491] rcu_scheduler_active = 2, debug_locks = 1
[  101.389368] 1 lock held by trinity-main/2809:
[  101.390736]  #0:  (rtnl_mutex){+.+.+.}, at: [<8146085b>] rtnl_lock+0xf/0x11
[  101.395482]
[  101.395482] stack backtrace:
[  101.396948] CPU: 0 PID: 2809 Comm: trinity-main Not tainted 4.13.0-rc6-01701-gceed73a #1
[  101.398857] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-20161025_171302-gandalf 04/01/2014
[  101.401079] Call Trace:
[  101.401656]  dump_stack+0xa1/0xeb
[  101.402871]  lockdep_rcu_suspicious+0xc7/0xd0
[  101.403665]  rmnet_is_real_dev_registered+0x40/0x4e
[  101.405199]  rmnet_config_notify_cb+0x2c/0x142
[  101.406344]  ? wireless_nlevent_flush+0x47/0x71
[  101.407385]  notifier_call_chain+0x2d/0x47
[  101.408645]  raw_notifier_call_chain+0xc/0xe
[  101.409882]  call_netdevice_notifiers_info+0x41/0x49
[  101.411402]  call_netdevice_notifiers+0xc/0xe
[  101.412713]  rollback_registered_many+0x268/0x36e
[  101.413702]  rollback_registered+0x39/0x56
[  101.414965]  unregister_netdevice_queue+0x79/0x88
[  101.415908]  unregister_netdev+0x16/0x1d

Fixes: ceed73a ("drivers: net: ethernet: qualcomm: rmnet: Initial implementation")
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Reported-by: kernel test robot <xiaolong.ye@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
andr2000 pushed a commit that referenced this issue Nov 2, 2017
The driver calls spi_get_drvdata() in its ->remove hook even though it
has never called spi_set_drvdata().  Stack trace for posterity:

Unable to handle kernel NULL pointer dereference at virtual address 00000220
Internal error: Oops: 5 [#1] SMP ARM
[<8072f564>] (mutex_lock) from [<7f1400d0>] (iio_device_unregister+0x24/0x7c [industrialio])
[<7f1400d0>] (iio_device_unregister [industrialio]) from [<7f15e020>] (mcp320x_remove+0x20/0x30 [mcp320x])
[<7f15e020>] (mcp320x_remove [mcp320x]) from [<8055a8cc>] (spi_drv_remove+0x2c/0x44)
[<8055a8cc>] (spi_drv_remove) from [<805087bc>] (__device_release_driver+0x98/0x134)
[<805087bc>] (__device_release_driver) from [<80509180>] (driver_detach+0xdc/0xe0)
[<80509180>] (driver_detach) from [<8050823c>] (bus_remove_driver+0x5c/0xb0)
[<8050823c>] (bus_remove_driver) from [<80509ab0>] (driver_unregister+0x38/0x58)
[<80509ab0>] (driver_unregister) from [<7f15e69c>] (mcp320x_driver_exit+0x14/0x1c [mcp320x])
[<7f15e69c>] (mcp320x_driver_exit [mcp320x]) from [<801a78d0>] (SyS_delete_module+0x184/0x1d0)
[<801a78d0>] (SyS_delete_module) from [<80108100>] (ret_fast_syscall+0x0/0x1c)

Fixes: f5ce4a7 ("iio: adc: add driver for MCP3204/08 12-bit ADC")
Cc: Oskar Andero <oskar.andero@gmail.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
andr2000 pushed a commit that referenced this issue Nov 2, 2017
Because keyctl_read_key() looks up the key with no permissions
requested, it may find a negatively instantiated key.  If the key is
also possessed, we went ahead and called ->read() on the key.  But the
key payload will actually contain the ->reject_error rather than the
normal payload.  Thus, the kernel oopses trying to read the
user_key_payload from memory address (int)-ENOKEY = 0x00000000ffffff82.

Fortunately the payload data is stored inline, so it shouldn't be
possible to abuse this as an arbitrary memory read primitive...

Reproducer:
    keyctl new_session
    keyctl request2 user desc '' @s
    keyctl read $(keyctl show | awk '/user: desc/ {print $1}')

It causes a crash like the following:
     BUG: unable to handle kernel paging request at 00000000ffffff92
     IP: user_read+0x33/0xa0
     PGD 36a54067 P4D 36a54067 PUD 0
     Oops: 0000 [#1] SMP
     CPU: 0 PID: 211 Comm: keyctl Not tainted 4.14.0-rc1 torvalds#337
     Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-20170228_101828-anatol 04/01/2014
     task: ffff90aa3b74c3c0 task.stack: ffff9878c0478000
     RIP: 0010:user_read+0x33/0xa0
     RSP: 0018:ffff9878c047bee8 EFLAGS: 00010246
     RAX: 0000000000000001 RBX: ffff90aa3d7da340 RCX: 0000000000000017
     RDX: 0000000000000000 RSI: 00000000ffffff82 RDI: ffff90aa3d7da340
     RBP: ffff9878c047bf00 R08: 00000024f95da94f R09: 0000000000000000
     R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
     R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
     FS:  00007f58ece69740(0000) GS:ffff90aa3e200000(0000) knlGS:0000000000000000
     CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
     CR2: 00000000ffffff92 CR3: 0000000036adc001 CR4: 00000000003606f0
     Call Trace:
      keyctl_read_key+0xac/0xe0
      SyS_keyctl+0x99/0x120
      entry_SYSCALL_64_fastpath+0x1f/0xbe
     RIP: 0033:0x7f58ec787bb9
     RSP: 002b:00007ffc8d401678 EFLAGS: 00000206 ORIG_RAX: 00000000000000fa
     RAX: ffffffffffffffda RBX: 00007ffc8d402800 RCX: 00007f58ec787bb9
     RDX: 0000000000000000 RSI: 00000000174a63ac RDI: 000000000000000b
     RBP: 0000000000000004 R08: 00007ffc8d402809 R09: 0000000000000020
     R10: 0000000000000000 R11: 0000000000000206 R12: 00007ffc8d402800
     R13: 00007ffc8d4016e0 R14: 0000000000000000 R15: 0000000000000000
     Code: e5 41 55 49 89 f5 41 54 49 89 d4 53 48 89 fb e8 a4 b4 ad ff 85 c0 74 09 80 3d b9 4c 96 00 00 74 43 48 8b b3 20 01 00 00 4d 85 ed <0f> b7 5e 10 74 29 4d 85 e4 74 24 4c 39 e3 4c 89 e2 4c 89 ef 48
     RIP: user_read+0x33/0xa0 RSP: ffff9878c047bee8
     CR2: 00000000ffffff92

Fixes: 61ea0c0 ("KEYS: Skip key state checks when checking for possession")
Cc: <stable@vger.kernel.org>	[v3.13+]
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
andr2000 pushed a commit that referenced this issue Nov 2, 2017
…rse nlmsg properly

ChunYu found a kernel crash by syzkaller:

[  651.617875] kasan: CONFIG_KASAN_INLINE enabled
[  651.618217] kasan: GPF could be caused by NULL-ptr deref or user memory access
[  651.618731] general protection fault: 0000 [#1] SMP KASAN
[  651.621543] CPU: 1 PID: 9539 Comm: scsi Not tainted 4.11.0.cov xen-troops#32
[  651.621938] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[  651.622309] task: ffff880117780000 task.stack: ffff8800a3188000
[  651.622762] RIP: 0010:skb_release_data+0x26c/0x590
[...]
[  651.627260] Call Trace:
[  651.629156]  skb_release_all+0x4f/0x60
[  651.629450]  consume_skb+0x1a5/0x600
[  651.630705]  netlink_unicast+0x505/0x720
[  651.632345]  netlink_sendmsg+0xab2/0xe70
[  651.633704]  sock_sendmsg+0xcf/0x110
[  651.633942]  ___sys_sendmsg+0x833/0x980
[  651.637117]  __sys_sendmsg+0xf3/0x240
[  651.638820]  SyS_sendmsg+0x32/0x50
[  651.639048]  entry_SYSCALL_64_fastpath+0x1f/0xc2

It's caused by skb_shared_info at the end of sk_buff was overwritten by
ISCSI_KEVENT_IF_ERROR when parsing nlmsg info from skb in iscsi_if_rx.

During the loop if skb->len == nlh->nlmsg_len and both are sizeof(*nlh),
ev = nlmsg_data(nlh) will acutally get skb_shinfo(SKB) instead and set a
new value to skb_shinfo(SKB)->nr_frags by ev->type.

This patch is to fix it by checking nlh->nlmsg_len properly there to
avoid over accessing sk_buff.

Reported-by: ChunYu Wang <chunwang@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Chris Leech <cleech@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
andr2000 pushed a commit that referenced this issue Nov 2, 2017
Guillaume Nault says:

====================
l2tp: fix some races in session deletion

L2TP provides several interfaces for deleting sessions. Using two of
them concurrently can lead to use-after-free bugs.

Patch #2 uses a flag to prevent double removal of L2TP sessions.
Patch #1 fixes a bug found in the way. Fixing this bug is also
necessary for patch #2 to handle all cases.

This issue is similar to the tunnel deletion bug being worked on by
Sabrina: https://patchwork.ozlabs.org/patch/814173/
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
andr2000 pushed a commit that referenced this issue Nov 2, 2017
ti-cpufreq and cpufreq-dt-platdev drivers are registering platform-device
with same name "cpufreq-dt" using platform_device_register_*() routines.
This is leading to build warnings appended below.

Providing hardware information to OPP framework along with the platform-
device creation should be done by ti-cpufreq driver before cpufreq-dt
driver comes into place.

This patch add's TI am33xx, am43 and dra7 platforms (which use opp-v2
property) to the blacklist of devices in cpufreq-dt-platform driver to
avoid creating platform-device twice and remove build warnings.

[    2.370167] ------------[ cut here ]------------
[    2.375087] WARNING: CPU: 0 PID: 1 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x58/0x78
[    2.383112] sysfs: cannot create duplicate filename '/devices/platform/cpufreq-dt'
[    2.391219] Modules linked in:
[    2.394506] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.13.0-next-20170912 #1
[    2.402006] Hardware name: Generic AM33XX (Flattened Device Tree)
[    2.408437] [<c0110a28>] (unwind_backtrace) from [<c010ca84>] (show_stack+0x10/0x14)
[    2.416568] [<c010ca84>] (show_stack) from [<c0827d64>] (dump_stack+0xac/0xe0)
[    2.424165] [<c0827d64>] (dump_stack) from [<c0137470>] (__warn+0xd8/0x104)
[    2.431488] [<c0137470>] (__warn) from [<c01374d0>] (warn_slowpath_fmt+0x34/0x44)
[    2.439351] [<c01374d0>] (warn_slowpath_fmt) from [<c03459d0>] (sysfs_warn_dup+0x58/0x78)
[    2.447938] [<c03459d0>] (sysfs_warn_dup) from [<c0345ab8>] (sysfs_create_dir_ns+0x80/0x98)
[    2.456719] [<c0345ab8>] (sysfs_create_dir_ns) from [<c082c554>] (kobject_add_internal+0x9c/0x2d4)
[    2.466124] [<c082c554>] (kobject_add_internal) from [<c082c7d8>] (kobject_add+0x4c/0x9c)
[    2.474712] [<c082c7d8>] (kobject_add) from [<c05803e4>] (device_add+0xcc/0x57c)
[    2.482489] [<c05803e4>] (device_add) from [<c0584b74>] (platform_device_add+0x100/0x220)
[    2.491085] [<c0584b74>] (platform_device_add) from [<c05855a8>] (platform_device_register_full+0xf4/0x118)
[    2.501305] [<c05855a8>] (platform_device_register_full) from [<c067023c>] (ti_cpufreq_init+0x150/0x22c)
[    2.511253] [<c067023c>] (ti_cpufreq_init) from [<c0101df4>] (do_one_initcall+0x3c/0x170)
[    2.519838] [<c0101df4>] (do_one_initcall) from [<c0c00eb4>] (kernel_init_freeable+0x1fc/0x2c4)
[    2.528974] [<c0c00eb4>] (kernel_init_freeable) from [<c083bcac>] (kernel_init+0x8/0x110)
[    2.537565] [<c083bcac>] (kernel_init) from [<c0107d18>] (ret_from_fork+0x14/0x3c)
[    2.545981] ---[ end trace 2fc00e213c13ab20 ]---
[    2.551051] ------------[ cut here ]------------
[    2.555931] WARNING: CPU: 0 PID: 1 at lib/kobject.c:240 kobject_add_internal+0x254/0x2d4
[    2.564578] kobject_add_internal failed for cpufreq-dt with -EEXIST, don't try to register
things with the same name in the same directory.
[    2.577977] Modules linked in:
[    2.581261] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G        W       4.13.0-next-20170912 #1
[    2.590013] Hardware name: Generic AM33XX (Flattened Device Tree)
[    2.596437] [<c0110a28>] (unwind_backtrace) from [<c010ca84>] (show_stack+0x10/0x14)
[    2.604573] [<c010ca84>] (show_stack) from [<c0827d64>] (dump_stack+0xac/0xe0)
[    2.612172] [<c0827d64>] (dump_stack) from [<c0137470>] (__warn+0xd8/0x104)
[    2.619494] [<c0137470>] (__warn) from [<c01374d0>] (warn_slowpath_fmt+0x34/0x44)
[    2.627362] [<c01374d0>] (warn_slowpath_fmt) from [<c082c70c>] (kobject_add_internal+0x254/0x2d4)
[    2.636666] [<c082c70c>] (kobject_add_internal) from [<c082c7d8>] (kobject_add+0x4c/0x9c)
[    2.645255] [<c082c7d8>] (kobject_add) from [<c05803e4>] (device_add+0xcc/0x57c)
[    2.653027] [<c05803e4>] (device_add) from [<c0584b74>] (platform_device_add+0x100/0x220)
[    2.661615] [<c0584b74>] (platform_device_add) from [<c05855a8>] (platform_device_register_full+0xf4/0x118)
[    2.671833] [<c05855a8>] (platform_device_register_full) from [<c067023c>] (ti_cpufreq_init+0x150/0x22c)
[    2.681779] [<c067023c>] (ti_cpufreq_init) from [<c0101df4>] (do_one_initcall+0x3c/0x170)
[    2.690377] [<c0101df4>] (do_one_initcall) from [<c0c00eb4>] (kernel_init_freeable+0x1fc/0x2c4)
[    2.699510] [<c0c00eb4>] (kernel_init_freeable) from [<c083bcac>] (kernel_init+0x8/0x110)
[    2.708106] [<c083bcac>] (kernel_init) from [<c0107d18>] (ret_from_fork+0x14/0x3c)
[    2.716217] ---[ end trace 2fc00e213c13ab21 ]---

Fixes: edeec42 (cpufreq: dt-cpufreq: platdev Automatically create device with OPP v2)
Signed-off-by: Suniel Mahesh <sunil.m@techveda.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
andr2000 pushed a commit that referenced this issue Nov 2, 2017
The ARM short descriptor has already limited the physical address
to 32bit after the commit <76557391433c> ("iommu/io-pgtable: Sanitise
map/unmap addresses"). But in MediaTek 4GB mode, the physical address
is from 0x1_0000_0000 to 0x1_ffff_ffff. this will cause:

WARNING: CPU: 4 PID: 3900 at
xxx/drivers/iommu/io-pgtable-arm-v7s.c:482 arm_v7s_map+0x40/0xf8
Modules linked in:

CPU: 4 PID: 3900 Comm: weston Tainted: G S      W       4.9.44 #1
Hardware name: MediaTek MT2712m1v1 board (DT)
task: ffffffc0eaa5b280 task.stack: ffffffc0e9858000
PC is at arm_v7s_map+0x40/0xf8
LR is at mtk_iommu_map+0x64/0x90
pc : [<ffffff80085b09e8>] lr : [<ffffff80085b29fc>] pstate: 000001c5
sp : ffffffc0e985b920
x29: ffffffc0e985b920 x28: 0000000127d00000
x27: 0000000000100000 x26: ffffff8008f9e000
x25: 0000000000000003 x24: 0000000000100000
x23: 0000000127d00000 x22: 00000000ff800000
x21: ffffffc0f7ec8ce0 x20: 0000000000000003
x19: 0000000000000003 x18: 0000000000000002
x17: 0000007f7e5d72c0 x16: ffffff80082b0f08
x15: 0000000000000001 x14: 000000000000003f
x13: 0000000000000000 x12: 0000000000000028
x11: 0088000000000000 x10: 0000000000000000
x9 : ffffff80092fa000 x8 : ffffffc0e9858000
x7 : ffffff80085b29d8 x6 : 0000000000000000
x5 : ffffff80085b09a8 x4 : 0000000000000003
x3 : 0000000000100000 x2 : 0000000127d00000
x1 : 00000000ff800000 x0 : 0000000000000001
...
Call trace:
[<ffffff80085b09e8>] arm_v7s_map+0x40/0xf8
[<ffffff80085b29fc>] mtk_iommu_map+0x64/0x90
[<ffffff80085ab5f8>] iommu_map+0x100/0x3a0
[<ffffff80085ab99c>] default_iommu_map_sg+0x104/0x168
[<ffffff80085aead8>] iommu_dma_alloc+0x238/0x3f8
[<ffffff8008098b30>] __iommu_alloc_attrs+0xa8/0x260
[<ffffff80085f364c>] mtk_drm_gem_create+0xac/0x180
[<ffffff80085f3894>] mtk_drm_gem_dumb_create+0x54/0xc8
[<ffffff80085d576c>] drm_mode_create_dumb_ioctl+0xa4/0xd8
[<ffffff80085cb2a0>] drm_ioctl+0x1c0/0x490

In order to satify this, Limit the physical address to 32bit.

Signed-off-by: Yong Wu <yong.wu@mediatek.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
andr2000 pushed a commit that referenced this issue Nov 2, 2017
…ping

When bootup a PVM guest with large memory(Ex.240GB), XEN provided initial
mapping overlaps with kernel module virtual space. When mapping in this space
is cleared by xen_cleanhighmap(), in certain case there could be an 2MB mapping
left. This is due to XEN initialize 4MB aligned mapping but xen_cleanhighmap()
finish at 2MB boundary.

When module loading is just on top of the 2MB space, got below warning:

WARNING: at mm/vmalloc.c:106 vmap_pte_range+0x14e/0x190()
Call Trace:
 [<ffffffff81117083>] warn_alloc_failed+0xf3/0x160
 [<ffffffff81146022>] __vmalloc_area_node+0x182/0x1c0
 [<ffffffff810ac91e>] ? module_alloc_update_bounds+0x1e/0x80
 [<ffffffff81145df7>] __vmalloc_node_range+0xa7/0x110
 [<ffffffff810ac91e>] ? module_alloc_update_bounds+0x1e/0x80
 [<ffffffff8103ca54>] module_alloc+0x64/0x70
 [<ffffffff810ac91e>] ? module_alloc_update_bounds+0x1e/0x80
 [<ffffffff810ac91e>] module_alloc_update_bounds+0x1e/0x80
 [<ffffffff810ac9a7>] move_module+0x27/0x150
 [<ffffffff810aefa0>] layout_and_allocate+0x120/0x1b0
 [<ffffffff810af0a8>] load_module+0x78/0x640
 [<ffffffff811ff90b>] ? security_file_permission+0x8b/0x90
 [<ffffffff810af6d2>] sys_init_module+0x62/0x1e0
 [<ffffffff815154c2>] system_call_fastpath+0x16/0x1b

Then the mapping of 2MB is cleared, finally oops when the page in that space is
accessed.

BUG: unable to handle kernel paging request at ffff880022600000
IP: [<ffffffff81260877>] clear_page_c_e+0x7/0x10
PGD 1788067 PUD 178c067 PMD 22434067 PTE 0
Oops: 0002 [#1] SMP
Call Trace:
 [<ffffffff81116ef7>] ? prep_new_page+0x127/0x1c0
 [<ffffffff81117d42>] get_page_from_freelist+0x1e2/0x550
 [<ffffffff81133010>] ? ii_iovec_copy_to_user+0x90/0x140
 [<ffffffff81119c9d>] __alloc_pages_nodemask+0x12d/0x230
 [<ffffffff81155516>] alloc_pages_vma+0xc6/0x1a0
 [<ffffffff81006ffd>] ? pte_mfn_to_pfn+0x7d/0x100
 [<ffffffff81134cfb>] do_anonymous_page+0x16b/0x350
 [<ffffffff81139c34>] handle_pte_fault+0x1e4/0x200
 [<ffffffff8100712e>] ? xen_pmd_val+0xe/0x10
 [<ffffffff810052c9>] ? __raw_callee_save_xen_pmd_val+0x11/0x1e
 [<ffffffff81139dab>] handle_mm_fault+0x15b/0x270
 [<ffffffff81510c10>] do_page_fault+0x140/0x470
 [<ffffffff8150d7d5>] page_fault+0x25/0x30

Call xen_cleanhighmap() with 4MB aligned for page tables mapping to fix it.
The unnecessory call of xen_cleanhighmap() in DEBUG mode is also removed.

-v2: add comment about XEN alignment from Juergen.

References: https://lists.xen.org/archives/html/xen-devel/2012-07/msg01562.html
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Reviewed-by: Juergen Gross <jgross@suse.com>

[boris: added 'xen/mmu' tag to commit subject]
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
andr2000 pushed a commit that referenced this issue Nov 2, 2017
We currently route pte translation faults via do_page_fault, which elides
the address check against TASK_SIZE before invoking the mm fault handling
code. However, this can cause issues with the path walking code in
conjunction with our word-at-a-time implementation because
load_unaligned_zeropad can end up faulting in kernel space if it reads
across a page boundary and runs into a page fault (e.g. by attempting to
read from a guard region).

In the case of such a fault, load_unaligned_zeropad has registered a
fixup to shift the valid data and pad with zeroes, however the abort is
reported as a level 3 translation fault and we dispatch it straight to
do_page_fault, despite it being a kernel address. This results in calling
a sleeping function from atomic context:

  BUG: sleeping function called from invalid context at arch/arm64/mm/fault.c:313
  in_atomic(): 0, irqs_disabled(): 0, pid: 10290
  Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
  [...]
  [<ffffff8e016cd0cc>] ___might_sleep+0x134/0x144
  [<ffffff8e016cd158>] __might_sleep+0x7c/0x8c
  [<ffffff8e016977f0>] do_page_fault+0x140/0x330
  [<ffffff8e01681328>] do_mem_abort+0x54/0xb0
  Exception stack(0xfffffffb20247a70 to 0xfffffffb20247ba0)
  [...]
  [<ffffff8e016844fc>] el1_da+0x18/0x78
  [<ffffff8e017f399c>] path_parentat+0x44/0x88
  [<ffffff8e017f4c9c>] filename_parentat+0x5c/0xd8
  [<ffffff8e017f5044>] filename_create+0x4c/0x128
  [<ffffff8e017f59e4>] SyS_mkdirat+0x50/0xc8
  [<ffffff8e01684e30>] el0_svc_naked+0x24/0x28
  Code: 36380080 d5384100 f9400800 9402566d (d4210000)
  ---[ end trace 2d01889f2bca9b9f ]---

Fix this by dispatching all translation faults to do_translation_faults,
which avoids invoking the page fault logic for faults on kernel addresses.

Cc: <stable@vger.kernel.org>
Reported-by: Ankit Jain <ankijain@codeaurora.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
andr2000 pushed a commit that referenced this issue Nov 2, 2017
The following lockdep splat has been noticed during LTP testing

  ======================================================
  WARNING: possible circular locking dependency detected
  4.13.0-rc3-next-20170807 xen-troops#12 Not tainted
  ------------------------------------------------------
  a.out/4771 is trying to acquire lock:
   (cpu_hotplug_lock.rw_sem){++++++}, at: [<ffffffff812b4668>] drain_all_stock.part.35+0x18/0x140

  but task is already holding lock:
   (&mm->mmap_sem){++++++}, at: [<ffffffff8106eb35>] __do_page_fault+0x175/0x530

  which lock already depends on the new lock.

  the existing dependency chain (in reverse order) is:

  -> #3 (&mm->mmap_sem){++++++}:
         lock_acquire+0xc9/0x230
         __might_fault+0x70/0xa0
         _copy_to_user+0x23/0x70
         filldir+0xa7/0x110
         xfs_dir2_sf_getdents.isra.10+0x20c/0x2c0 [xfs]
         xfs_readdir+0x1fa/0x2c0 [xfs]
         xfs_file_readdir+0x30/0x40 [xfs]
         iterate_dir+0x17a/0x1a0
         SyS_getdents+0xb0/0x160
         entry_SYSCALL_64_fastpath+0x1f/0xbe

  -> #2 (&type->i_mutex_dir_key#3){++++++}:
         lock_acquire+0xc9/0x230
         down_read+0x51/0xb0
         lookup_slow+0xde/0x210
         walk_component+0x160/0x250
         link_path_walk+0x1a6/0x610
         path_openat+0xe4/0xd50
         do_filp_open+0x91/0x100
         file_open_name+0xf5/0x130
         filp_open+0x33/0x50
         kernel_read_file_from_path+0x39/0x80
         _request_firmware+0x39f/0x880
         request_firmware_direct+0x37/0x50
         request_microcode_fw+0x64/0xe0
         reload_store+0xf7/0x180
         dev_attr_store+0x18/0x30
         sysfs_kf_write+0x44/0x60
         kernfs_fop_write+0x113/0x1a0
         __vfs_write+0x37/0x170
         vfs_write+0xc7/0x1c0
         SyS_write+0x58/0xc0
         do_syscall_64+0x6c/0x1f0
         return_from_SYSCALL_64+0x0/0x7a

  -> #1 (microcode_mutex){+.+.+.}:
         lock_acquire+0xc9/0x230
         __mutex_lock+0x88/0x960
         mutex_lock_nested+0x1b/0x20
         microcode_init+0xbb/0x208
         do_one_initcall+0x51/0x1a9
         kernel_init_freeable+0x208/0x2a7
         kernel_init+0xe/0x104
         ret_from_fork+0x2a/0x40

  -> #0 (cpu_hotplug_lock.rw_sem){++++++}:
         __lock_acquire+0x153c/0x1550
         lock_acquire+0xc9/0x230
         cpus_read_lock+0x4b/0x90
         drain_all_stock.part.35+0x18/0x140
         try_charge+0x3ab/0x6e0
         mem_cgroup_try_charge+0x7f/0x2c0
         shmem_getpage_gfp+0x25f/0x1050
         shmem_fault+0x96/0x200
         __do_fault+0x1e/0xa0
         __handle_mm_fault+0x9c3/0xe00
         handle_mm_fault+0x16e/0x380
         __do_page_fault+0x24a/0x530
         do_page_fault+0x30/0x80
         page_fault+0x28/0x30

  other info that might help us debug this:

  Chain exists of:
    cpu_hotplug_lock.rw_sem --> &type->i_mutex_dir_key#3 --> &mm->mmap_sem

   Possible unsafe locking scenario:

         CPU0                    CPU1
         ----                    ----
    lock(&mm->mmap_sem);
                                 lock(&type->i_mutex_dir_key#3);
                                 lock(&mm->mmap_sem);
    lock(cpu_hotplug_lock.rw_sem);

   *** DEADLOCK ***

  2 locks held by a.out/4771:
   #0:  (&mm->mmap_sem){++++++}, at: [<ffffffff8106eb35>] __do_page_fault+0x175/0x530
   #1:  (percpu_charge_mutex){+.+...}, at: [<ffffffff812b4c97>] try_charge+0x397/0x6e0

The problem is very similar to the one fixed by commit a459eeb
("mm, page_alloc: do not depend on cpu hotplug locks inside the
allocator").  We are taking hotplug locks while we can be sitting on top
of basically arbitrary locks.  This just calls for problems.

We can get rid of {get,put}_online_cpus, fortunately.  We do not have to
be worried about races with memory hotplug because drain_local_stock,
which is called from both the WQ draining and the memory hotplug
contexts, is always operating on the local cpu stock with IRQs disabled.

The only thing to be careful about is that the target memcg doesn't
vanish while we are still in drain_all_stock so take a reference on it.

Link: http://lkml.kernel.org/r/20170913090023.28322-1-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Artem Savkov <asavkov@redhat.com>
Tested-by: Artem Savkov <asavkov@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
andr2000 pushed a commit that referenced this issue Nov 2, 2017
printk_ratelimit() invokes ___ratelimit() which may invoke a normal
printk() (pr_warn() in this particular case) to warn about suppressed
output.  Given that printk_ratelimit() may be called from anywhere, that
pr_warn() is dangerous - it may end up deadlocking the system.  Fix
___ratelimit() by using deferred printk().

Sasha reported the following lockdep error:

 : Unregister pv shared memory for cpu 8
 : select_fallback_rq: 3 callbacks suppressed
 : process 8583 (trinity-c78) no longer affine to cpu8
 :
 : ======================================================
 : WARNING: possible circular locking dependency detected
 : 4.14.0-rc2-next-20170927+ torvalds#252 Not tainted
 : ------------------------------------------------------
 : migration/8/62 is trying to acquire lock:
 : (&port_lock_key){-.-.}, at: serial8250_console_write()
 :
 : but task is already holding lock:
 : (&rq->lock){-.-.}, at: sched_cpu_dying()
 :
 : which lock already depends on the new lock.
 :
 :
 : the existing dependency chain (in reverse order) is:
 :
 : -> #3 (&rq->lock){-.-.}:
 : __lock_acquire()
 : lock_acquire()
 : _raw_spin_lock()
 : task_fork_fair()
 : sched_fork()
 : copy_process.part.31()
 : _do_fork()
 : kernel_thread()
 : rest_init()
 : start_kernel()
 : x86_64_start_reservations()
 : x86_64_start_kernel()
 : verify_cpu()
 :
 : -> #2 (&p->pi_lock){-.-.}:
 : __lock_acquire()
 : lock_acquire()
 : _raw_spin_lock_irqsave()
 : try_to_wake_up()
 : default_wake_function()
 : woken_wake_function()
 : __wake_up_common()
 : __wake_up_common_lock()
 : __wake_up()
 : tty_wakeup()
 : tty_port_default_wakeup()
 : tty_port_tty_wakeup()
 : uart_write_wakeup()
 : serial8250_tx_chars()
 : serial8250_handle_irq.part.25()
 : serial8250_default_handle_irq()
 : serial8250_interrupt()
 : __handle_irq_event_percpu()
 : handle_irq_event_percpu()
 : handle_irq_event()
 : handle_level_irq()
 : handle_irq()
 : do_IRQ()
 : ret_from_intr()
 : native_safe_halt()
 : default_idle()
 : arch_cpu_idle()
 : default_idle_call()
 : do_idle()
 : cpu_startup_entry()
 : rest_init()
 : start_kernel()
 : x86_64_start_reservations()
 : x86_64_start_kernel()
 : verify_cpu()
 :
 : -> #1 (&tty->write_wait){-.-.}:
 : __lock_acquire()
 : lock_acquire()
 : _raw_spin_lock_irqsave()
 : __wake_up_common_lock()
 : __wake_up()
 : tty_wakeup()
 : tty_port_default_wakeup()
 : tty_port_tty_wakeup()
 : uart_write_wakeup()
 : serial8250_tx_chars()
 : serial8250_handle_irq.part.25()
 : serial8250_default_handle_irq()
 : serial8250_interrupt()
 : __handle_irq_event_percpu()
 : handle_irq_event_percpu()
 : handle_irq_event()
 : handle_level_irq()
 : handle_irq()
 : do_IRQ()
 : ret_from_intr()
 : native_safe_halt()
 : default_idle()
 : arch_cpu_idle()
 : default_idle_call()
 : do_idle()
 : cpu_startup_entry()
 : rest_init()
 : start_kernel()
 : x86_64_start_reservations()
 : x86_64_start_kernel()
 : verify_cpu()
 :
 : -> #0 (&port_lock_key){-.-.}:
 : check_prev_add()
 : __lock_acquire()
 : lock_acquire()
 : _raw_spin_lock_irqsave()
 : serial8250_console_write()
 : univ8250_console_write()
 : console_unlock()
 : vprintk_emit()
 : vprintk_default()
 : vprintk_func()
 : printk()
 : ___ratelimit()
 : __printk_ratelimit()
 : select_fallback_rq()
 : sched_cpu_dying()
 : cpuhp_invoke_callback()
 : take_cpu_down()
 : multi_cpu_stop()
 : cpu_stopper_thread()
 : smpboot_thread_fn()
 : kthread()
 : ret_from_fork()
 :
 : other info that might help us debug this:
 :
 : Chain exists of:
 :   &port_lock_key --> &p->pi_lock --> &rq->lock
 :
 :  Possible unsafe locking scenario:
 :
 :        CPU0                    CPU1
 :        ----                    ----
 :   lock(&rq->lock);
 :                                lock(&p->pi_lock);
 :                                lock(&rq->lock);
 :   lock(&port_lock_key);
 :
 :  *** DEADLOCK ***
 :
 : 4 locks held by migration/8/62:
 : #0: (&p->pi_lock){-.-.}, at: sched_cpu_dying()
 : #1: (&rq->lock){-.-.}, at: sched_cpu_dying()
 : #2: (printk_ratelimit_state.lock){....}, at: ___ratelimit()
 : #3: (console_lock){+.+.}, at: vprintk_emit()
 :
 : stack backtrace:
 : CPU: 8 PID: 62 Comm: migration/8 Not tainted 4.14.0-rc2-next-20170927+ torvalds#252
 : Call Trace:
 : dump_stack()
 : print_circular_bug()
 : check_prev_add()
 : ? add_lock_to_list.isra.26()
 : ? check_usage()
 : ? kvm_clock_read()
 : ? kvm_sched_clock_read()
 : ? sched_clock()
 : ? check_preemption_disabled()
 : __lock_acquire()
 : ? __lock_acquire()
 : ? add_lock_to_list.isra.26()
 : ? debug_check_no_locks_freed()
 : ? memcpy()
 : lock_acquire()
 : ? serial8250_console_write()
 : _raw_spin_lock_irqsave()
 : ? serial8250_console_write()
 : serial8250_console_write()
 : ? serial8250_start_tx()
 : ? lock_acquire()
 : ? memcpy()
 : univ8250_console_write()
 : console_unlock()
 : ? __down_trylock_console_sem()
 : vprintk_emit()
 : vprintk_default()
 : vprintk_func()
 : printk()
 : ? show_regs_print_info()
 : ? lock_acquire()
 : ___ratelimit()
 : __printk_ratelimit()
 : select_fallback_rq()
 : sched_cpu_dying()
 : ? sched_cpu_starting()
 : ? rcutree_dying_cpu()
 : ? sched_cpu_starting()
 : cpuhp_invoke_callback()
 : ? cpu_disable_common()
 : take_cpu_down()
 : ? trace_hardirqs_off_caller()
 : ? cpuhp_invoke_callback()
 : multi_cpu_stop()
 : ? __this_cpu_preempt_check()
 : ? cpu_stop_queue_work()
 : cpu_stopper_thread()
 : ? cpu_stop_create()
 : smpboot_thread_fn()
 : ? sort_range()
 : ? schedule()
 : ? __kthread_parkme()
 : kthread()
 : ? sort_range()
 : ? kthread_create_on_node()
 : ret_from_fork()
 : process 9121 (trinity-c78) no longer affine to cpu8
 : smpboot: CPU 8 is now offline

Link: http://lkml.kernel.org/r/20170928120405.18273-1-sergey.senozhatsky@gmail.com
Fixes: 6b1d174 ("ratelimit: extend to print suppressed messages on release")
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reported-by: Sasha Levin <levinsasha928@gmail.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Borislav Petkov <bp@suse.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
andr2000 pushed a commit that referenced this issue Nov 2, 2017
flush_tlb_kernel_range() may call smp_call_function_many() which expects
interrupts to be enabled. This results in a traceback.

WARNING: CPU: 0 PID: 1 at kernel/smp.c:416 smp_call_function_many+0xcc/0x2fc
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.14.0-rc1-00009-g0666f56 #1
task: cf830000 task.stack: cf82e000
NIP:  c00a93c8 LR: c00a9634 CTR: 00000001
REGS: cf82fde0 TRAP: 0700   Not tainted  (4.14.0-rc1-00009-g0666f56)
MSR:  00021000 <CE,ME>  CR: 24000082  XER: 00000000

GPR00: c00a9634 cf82fe90 cf830000 c050ad3c c0015a54 00000000 00000001 00000001
GPR08: 00000001 00000000 00000000 cf82e000 24000084 00000000 c0003150 00000000
GPR16: 00000000 00000000 00000000 00000000 00000000 00000001 00000000 c0510000
GPR24: 00000000 c0015a54 00000000 c050ad3c c051823c c050ad3c 00000025 00000000
NIP [c00a93c8] smp_call_function_many+0xcc/0x2fc
LR [c00a9634] smp_call_function+0x3c/0x50
Call Trace:
[cf82fe90] [00000010] 0x10 (unreliable)
[cf82fed0] [c00a9634] smp_call_function+0x3c/0x50
[cf82fee0] [c0015d2c] flush_tlb_kernel_range+0x20/0x38
[cf82fef0] [c001524c] mark_initmem_nx+0x154/0x16c
[cf82ff20] [c001484c] free_initmem+0x20/0x4c
[cf82ff30] [c000316c] kernel_init+0x1c/0x108
[cf82ff40] [c000f3a8] ret_from_kernel_thread+0x5c/0x64
Instruction dump:
7c0803a6 7d808120 38210040 4e800020 3d20c052 812981a0 2f890000 40beffac
3d20c051 8929ac64 2f890000 40beff9c <0fe00000> 4bffff94 7fc3f378 7f64db78

Fixes: 3184cc4 ("powerpc/mm: Fix kernel RAM protection after freeing ...")
Fixes: e611939 ("powerpc/mm: Ensure change_page_attr() doesn't ...")
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
andr2000 pushed a commit that referenced this issue Nov 2, 2017
While line6_probe() may kick off URB for a control MIDI endpoint, the
function doesn't clean up it properly at its error path.  This results
in a leftover URB action that is eventually triggered later and causes
an Oops like:
  general protection fault: 0000 [#1] PREEMPT SMP KASAN
  CPU: 1 PID: 0 Comm: swapper/1 Not tainted
  RIP: 0010:usb_fill_bulk_urb ./include/linux/usb.h:1619
  RIP: 0010:line6_start_listen+0x3fe/0x9e0 sound/usb/line6/driver.c:76
  Call Trace:
   <IRQ>
   line6_data_received+0x1f7/0x470 sound/usb/line6/driver.c:326
   __usb_hcd_giveback_urb+0x2e0/0x650 drivers/usb/core/hcd.c:1779
   usb_hcd_giveback_urb+0x337/0x420 drivers/usb/core/hcd.c:1845
   dummy_timer+0xba9/0x39f0 drivers/usb/gadget/udc/dummy_hcd.c:1965
   call_timer_fn+0x2a2/0x940 kernel/time/timer.c:1281
   ....

Since the whole clean-up procedure is done in line6_disconnect()
callback, we can simply call it in the error path instead of
open-coding the whole again.  It'll fix such an issue automagically.

The bug was spotted by syzkaller.

Fixes: eedd0e9 ("ALSA: line6: Don't forget to call driver's destructor at error path")
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
andr2000 pushed a commit that referenced this issue Dec 22, 2017
Add missing init_list_head for the registered buffer list.
Absence of the init could lead to a unhandled kernel paging
request as below, when streamon/streamoff are called in row.

[338046.571321] Unable to handle kernel paging request at virtual address fffffffffffffe00
[338046.574849] pgd = ffff800034820000
[338046.582381] [fffffffffffffe00] *pgd=00000000b60f5003[338046.582545]
, *pud=00000000b1f31003
, *pmd=0000000000000000[338046.592082]
[338046.597754] Internal error: Oops: 96000004 [#1] PREEMPT SMP
[338046.601671] Modules linked in: venus_enc venus_dec venus_core
usb_f_ecm g_ether usb_f_rndis u_ether libcomposite ipt_MASQUERADE
nf_nat_masquerade_ipv4 arc4 wcn36xx mac80211 btqcomsmd btqca iptable_nat
nf_co]
[338046.662408] CPU: 0 PID: 5433 Comm: irq/160-venus Tainted: G        W
4.9.39+ torvalds#232
[338046.668024] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC
(DT)
[338046.675268] task: ffff80003541cb00 task.stack: ffff800026e20000
[338046.682097] PC is at venus_helper_release_buf_ref+0x28/0x88
[venus_core]
[338046.688282] LR is at vdec_event_notify+0xe8/0x150 [venus_dec]
[338046.695029] pc : [<ffff000000af6c48>] lr : [<ffff000000a6fc60>]
pstate: a0000145
[338046.701256] sp : ffff800026e23bc0
[338046.708494] x29: ffff800026e23bc0 x28: 0000000000000000
[338046.718853] x27: ffff000000afd4f8 x26: ffff800031faa700
[338046.729253] x25: ffff000000afd790 x24: ffff800031faa618
[338046.739664] x23: ffff800003e18138 x22: ffff800002fc9810
[338046.750109] x21: ffff800026e23c28 x20: 0000000000000001
[338046.760592] x19: ffff80002a13b800 x18: 0000000000000010
[338046.771099] x17: 0000ffffa3d01600 x16: ffff000008100428
[338046.781654] x15: 0000000000000006 x14: ffff000089045ba7
[338046.792250] x13: ffff000009045bb6 x12: 00000000004f37c8
[338046.802894] x11: 0000000000267211 x10: 0000000000000000
[338046.813574] x9 : 0000000000032000 x8 : 00000000dc400000
[338046.824274] x7 : 0000000000000000 x6 : ffff800031faa728
[338046.835005] x5 : ffff80002a13b850 x4 : 0000000000000000
[338046.845793] x3 : fffffffffffffdf8 x2 : 0000000000000000
[338046.856602] x1 : 0000000000000003 x0 : ffff80002a13b800

Signed-off-by: Stanimir Varbanov <stanimir.varbanov@linaro.org>
Signed-off-by: Hans Verkuil <hansverk@cisco.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
andr2000 pushed a commit that referenced this issue Dec 22, 2017
An out of bounds error was detected on an ARM64 target with
Android based kernel 4.9. This occurs while trying to
restore mark on a skb from an inet request socket.

BUG: KASAN: slab-out-of-bounds in socket_match.isra.2+0xc8/0x1f0 net/netfilter/xt_socket.c:248
Read of size 4 at addr ffffffc06a8d824c by task syz-fuzzer/1532
CPU: 7 PID: 1532 Comm: syz-fuzzer Tainted: G        W  O    4.9.41+ #1
Call trace:
[<ffffff900808d2f8>] dump_backtrace+0x0/0x440 arch/arm64/kernel/traps.c:76
[<ffffff900808d760>] show_stack+0x28/0x38 arch/arm64/kernel/traps.c:226
[<ffffff90085f7dc8>] __dump_stack lib/dump_stack.c:15 [inline]
[<ffffff90085f7dc8>] dump_stack+0xe4/0x134 lib/dump_stack.c:51
[<ffffff900830f358>] print_address_description+0x68/0x258 mm/kasan/report.c:248
[<ffffff900830f770>] kasan_report_error mm/kasan/report.c:347 [inline]
[<ffffff900830f770>] kasan_report.part.2+0x228/0x2f0 mm/kasan/report.c:371
[<ffffff900830fdec>] kasan_report+0x5c/0x70 mm/kasan/report.c:372
[<ffffff900830de98>] check_memory_region_inline mm/kasan/kasan.c:308 [inline]
[<ffffff900830de98>] __asan_load4+0x88/0xa0 mm/kasan/kasan.c:740
[<ffffff90097498f8>] socket_match.isra.2+0xc8/0x1f0 net/netfilter/xt_socket.c:248
[<ffffff9009749a5c>] socket_mt4_v1_v2_v3+0x3c/0x48 net/netfilter/xt_socket.c:272
[<ffffff90097f7e4c>] ipt_do_table+0x54c/0xad8 net/ipv4/netfilter/ip_tables.c:311
[<ffffff90097fcf14>] iptable_mangle_hook+0x6c/0x220 net/ipv4/netfilter/iptable_mangle.c:90
...
Allocated by task 1532:
 save_stack_trace_tsk+0x0/0x2a0 arch/arm64/kernel/stacktrace.c:131
 save_stack_trace+0x28/0x38 arch/arm64/kernel/stacktrace.c:215
 save_stack mm/kasan/kasan.c:495 [inline]
 set_track mm/kasan/kasan.c:507 [inline]
 kasan_kmalloc+0xd8/0x188 mm/kasan/kasan.c:599
 kasan_slab_alloc+0x14/0x20 mm/kasan/kasan.c:537
 slab_post_alloc_hook mm/slab.h:417 [inline]
 slab_alloc_node mm/slub.c:2728 [inline]
 slab_alloc mm/slub.c:2736 [inline]
 kmem_cache_alloc+0x14c/0x2e8 mm/slub.c:2741
 reqsk_alloc include/net/request_sock.h:87 [inline]
 inet_reqsk_alloc+0x4c/0x238 net/ipv4/tcp_input.c:6236
 tcp_conn_request+0x2b0/0xea8 net/ipv4/tcp_input.c:6341
 tcp_v4_conn_request+0xe0/0x100 net/ipv4/tcp_ipv4.c:1256
 tcp_rcv_state_process+0x384/0x18a8 net/ipv4/tcp_input.c:5926
 tcp_v4_do_rcv+0x2f0/0x3e0 net/ipv4/tcp_ipv4.c:1430
 tcp_v4_rcv+0x1278/0x1350 net/ipv4/tcp_ipv4.c:1709
 ip_local_deliver_finish+0x174/0x3e0 net/ipv4/ip_input.c:216

v1->v2: Change socket_mt6_v1_v2_v3() as well as mentioned by Eric
v2->v3: Put the correct fixes tag

Fixes: 01555e7 ("netfilter: xt_socket: add XT_SOCKET_RESTORESKMARK flag")
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
andr2000 pushed a commit that referenced this issue Dec 22, 2017
Check for a NULL dsaddr in filelayout_free_lseg() before calling
nfs4_fl_put_deviceid().  This fixes the following oops:

[ 1967.645207] BUG: unable to handle kernel NULL pointer dereference at 0000000000000030
[ 1967.646010] IP: [<ffffffffc06d6aea>] nfs4_put_deviceid_node+0xa/0x90 [nfsv4]
[ 1967.646010] PGD c08bc067 PUD 915d3067 PMD 0
[ 1967.753036] Oops: 0000 [#1] SMP
[ 1967.753036] Modules linked in: nfs_layout_nfsv41_files ext4 mbcache jbd2 loop rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache amd64_edac_mod ipmi_ssif edac_mce_amd edac_core kvm_amd sg kvm ipmi_si ipmi_devintf irqbypass pcspkr k8temp ipmi_msghandler i2c_piix4 shpchp nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic crct10dif_common amdkfd amd_iommu_v2 radeon i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops mptsas ttm scsi_transport_sas mptscsih drm mptbase serio_raw i2c_core bnx2 dm_mirror dm_region_hash dm_log dm_mod
[ 1967.790031] CPU: 2 PID: 1370 Comm: ls Not tainted 3.10.0-709.el7.test.bz1463784.x86_64 #1
[ 1967.790031] Hardware name: IBM BladeCenter LS21 -[7971AC1]-/Server Blade, BIOS -[BAE155AUS-1.10]- 06/03/2009
[ 1967.790031] task: ffff8800c42a3f40 ti: ffff8800c4064000 task.ti: ffff8800c4064000
[ 1967.790031] RIP: 0010:[<ffffffffc06d6aea>]  [<ffffffffc06d6aea>] nfs4_put_deviceid_node+0xa/0x90 [nfsv4]
[ 1967.790031] RSP: 0000:ffff8800c4067978  EFLAGS: 00010246
[ 1967.790031] RAX: ffffffffc062f000 RBX: ffff8801d468a540 RCX: dead000000000200
[ 1967.790031] RDX: ffff8800c40679f8 RSI: ffff8800c4067a0c RDI: 0000000000000000
[ 1967.790031] RBP: ffff8800c4067980 R08: ffff8801d468a540 R09: 0000000000000000
[ 1967.790031] R10: 0000000000000000 R11: ffffffffffffffff R12: ffff8801d468a540
[ 1967.790031] R13: ffff8800c40679f8 R14: ffff8801d5645300 R15: ffff880126f15ff0
[ 1967.790031] FS:  00007f11053c9800(0000) GS:ffff88012bd00000(0000) knlGS:0000000000000000
[ 1967.790031] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 1967.790031] CR2: 0000000000000030 CR3: 0000000094b55000 CR4: 00000000000007e0
[ 1967.790031] Stack:
[ 1967.790031]  ffff8801d468a540 ffff8800c4067990 ffffffffc062d2fe ffff8800c40679b0
[ 1967.790031]  ffffffffc062b5b4 ffff8800c40679f8 ffff8801d468a540 ffff8800c40679d8
[ 1967.790031]  ffffffffc06d39af ffff8800c40679f8 ffff880126f16078 0000000000000001
[ 1967.790031] Call Trace:
[ 1967.790031]  [<ffffffffc062d2fe>] nfs4_fl_put_deviceid+0xe/0x10 [nfs_layout_nfsv41_files]
[ 1967.790031]  [<ffffffffc062b5b4>] filelayout_free_lseg+0x24/0x90 [nfs_layout_nfsv41_files]
[ 1967.790031]  [<ffffffffc06d39af>] pnfs_free_lseg_list+0x5f/0x80 [nfsv4]
[ 1967.790031]  [<ffffffffc06d5a67>] _pnfs_return_layout+0x157/0x270 [nfsv4]
[ 1967.790031]  [<ffffffffc06c17dd>] nfs4_evict_inode+0x4d/0x70 [nfsv4]
[ 1967.790031]  [<ffffffff8121de19>] evict+0xa9/0x180
[ 1967.790031]  [<ffffffff8121e729>] iput+0xf9/0x190
[ 1967.790031]  [<ffffffffc0652cea>] nfs_dentry_iput+0x3a/0x50 [nfs]
[ 1967.790031]  [<ffffffff8121ab4f>] shrink_dentry_list+0x20f/0x490
[ 1967.790031]  [<ffffffff8121b018>] d_invalidate+0xd8/0x150
[ 1967.790031]  [<ffffffffc065446b>] nfs_readdir_page_filler+0x40b/0x600 [nfs]
[ 1967.790031]  [<ffffffffc0654bbd>] nfs_readdir_xdr_to_array+0x20d/0x3b0 [nfs]
[ 1967.790031]  [<ffffffff811f3482>] ? __mem_cgroup_commit_charge+0xe2/0x2f0
[ 1967.790031]  [<ffffffff81183208>] ? __add_to_page_cache_locked+0x48/0x170
[ 1967.790031]  [<ffffffffc0654d60>] ? nfs_readdir_xdr_to_array+0x3b0/0x3b0 [nfs]
[ 1967.790031]  [<ffffffffc0654d82>] nfs_readdir_filler+0x22/0x90 [nfs]
[ 1967.790031]  [<ffffffff8118351f>] do_read_cache_page+0x7f/0x190
[ 1967.790031]  [<ffffffff81215d30>] ? fillonedir+0xe0/0xe0
[ 1967.790031]  [<ffffffff8118366c>] read_cache_page+0x1c/0x30
[ 1967.790031]  [<ffffffffc0654f9b>] nfs_readdir+0x1ab/0x6b0 [nfs]
[ 1967.790031]  [<ffffffffc06bd1c0>] ? nfs4_xdr_dec_layoutget+0x270/0x270 [nfsv4]
[ 1967.790031]  [<ffffffff81215d30>] ? fillonedir+0xe0/0xe0
[ 1967.790031]  [<ffffffff81215c20>] vfs_readdir+0xb0/0xe0
[ 1967.790031]  [<ffffffff81216045>] SyS_getdents+0x95/0x120
[ 1967.790031]  [<ffffffff816b9449>] system_call_fastpath+0x16/0x1b
[ 1967.790031] Code: 90 31 d2 48 89 d0 5d c3 85 f6 74 f5 8d 4e 01 89 f0 f0 0f b1 0f 39 f0 74 e2 89 c6 eb eb 0f 1f 40 00 66 66 66 66 90 55 48 89 e5 53 <48> 8b 47 30 48 89 fb a8 04 74 3b 8b 57 60 83 fa 02 74 19 8d 4a
[ 1967.790031] RIP  [<ffffffffc06d6aea>] nfs4_put_deviceid_node+0xa/0x90 [nfsv4]
[ 1967.790031]  RSP <ffff8800c4067978>
[ 1967.790031] CR2: 0000000000000030

Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Fixes: 1ebf980 ("NFS/filelayout: Fix racy setting of fl->dsaddr...")
Cc: stable@vger.kernel.org # v4.13+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
andr2000 pushed a commit that referenced this issue Dec 22, 2017
ppp_release() tries to ensure that netdevices are unregistered before
decrementing the unit refcount and running ppp_destroy_interface().

This is all fine as long as the the device is unregistered by
ppp_release(): the unregister_netdevice() call, followed by
rtnl_unlock(), guarantee that the unregistration process completes
before rtnl_unlock() returns.

However, the device may be unregistered by other means (like
ppp_nl_dellink()). If this happens right before ppp_release() calling
rtnl_lock(), then ppp_release() has to wait for the concurrent
unregistration code to release the lock.
But rtnl_unlock() releases the lock before completing the device
unregistration process. This allows ppp_release() to proceed and
eventually call ppp_destroy_interface() before the unregistration
process completes. Calling free_netdev() on this partially unregistered
device will BUG():

 ------------[ cut here ]------------
 kernel BUG at net/core/dev.c:8141!
 invalid opcode: 0000 [#1] SMP

 CPU: 1 PID: 1557 Comm: pppd Not tainted 4.14.0-rc2+ xen-troops#4
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1.fc26 04/01/2014

 Call Trace:
  ppp_destroy_interface+0xd8/0xe0 [ppp_generic]
  ppp_disconnect_channel+0xda/0x110 [ppp_generic]
  ppp_unregister_channel+0x5e/0x110 [ppp_generic]
  pppox_unbind_sock+0x23/0x30 [pppox]
  pppoe_connect+0x130/0x440 [pppoe]
  SYSC_connect+0x98/0x110
  ? do_fcntl+0x2c0/0x5d0
  SyS_connect+0xe/0x10
  entry_SYSCALL_64_fastpath+0x1a/0xa5

 RIP: free_netdev+0x107/0x110 RSP: ffffc28a40573d88
 ---[ end trace ed294ff0cc40eeff ]---

We could set the ->needs_free_netdev flag on PPP devices and move the
ppp_destroy_interface() logic in the ->priv_destructor() callback. But
that'd be quite intrusive as we'd first need to unlink from the other
channels and units that depend on the device (the ones that used the
PPPIOCCONNECT and PPPIOCATTACH ioctls).

Instead, we can just let the netdevice hold a reference on its
ppp_file. This reference is dropped in ->priv_destructor(), at the very
end of the unregistration process, so that neither ppp_release() nor
ppp_disconnect_channel() can call ppp_destroy_interface() in the interim.

Reported-by: Beniamino Galvani <bgalvani@redhat.com>
Fixes: 8cb775b ("ppp: fix device unregistration upon netns deletion")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
andr2000 pushed a commit that referenced this issue Dec 22, 2017
Since commit:

  1fd7e41 ("perf/core: Remove perf_cpu_context::unique_pmu")

... when a PMU is unregistered then its associated ->pmu_cpu_context is
unconditionally freed. Whilst this is fine for dynamically allocated
context types (i.e. those registered using perf_invalid_context), this
causes a problem for sharing of static contexts such as
perf_{sw,hw}_context, which are used by multiple built-in PMUs and
effectively have a global lifetime.

Whilst testing the ARM SPE driver, which must use perf_sw_context to
support per-task AUX tracing, unregistering the driver as a result of a
module unload resulted in:

 Unable to handle kernel NULL pointer dereference at virtual address 00000038
 Internal error: Oops: 96000004 [#1] PREEMPT SMP
 Modules linked in: [last unloaded: arm_spe_pmu]
 PC is at ctx_resched+0x38/0xe8
 LR is at perf_event_exec+0x20c/0x278
 [...]
 ctx_resched+0x38/0xe8
 perf_event_exec+0x20c/0x278
 setup_new_exec+0x88/0x118
 load_elf_binary+0x26c/0x109c
 search_binary_handler+0x90/0x298
 do_execveat_common.isra.14+0x540/0x618
 SyS_execve+0x38/0x48

since the software context has been freed and the ctx.pmu->pmu_disable_count
field has been set to NULL.

This patch fixes the problem by avoiding the freeing of static PMU contexts
altogether. Whilst the sharing of dynamic contexts is questionable, this
actually requires the caller to share their context pointer explicitly
and so the burden is on them to manage the object lifetime.

Reported-by: Kim Phillips <kim.phillips@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 1fd7e41 ("perf/core: Remove perf_cpu_context::unique_pmu")
Link: http://lkml.kernel.org/r/1507040450-7730-1-git-send-email-will.deacon@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
andr2000 pushed a commit that referenced this issue Dec 22, 2017
Tetsuo Handa and Fengguang Wu reported a panic in the unwinder:

  BUG: unable to handle kernel NULL pointer dereference at 000001f2
  IP: update_stack_state+0xd4/0x340
  *pde = 00000000

  Oops: 0000 [#1] PREEMPT SMP
  CPU: 0 PID: 18728 Comm: 01-cpu-hotplug Not tainted 4.13.0-rc4-00170-gb09be67 torvalds#592
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-20161025_171302-gandalf 04/01/2014
  task: bb0b53c0 task.stack: bb3ac000
  EIP: update_stack_state+0xd4/0x340
  EFLAGS: 00010002 CPU: 0
  EAX: 0000a570 EBX: bb3adccb ECX: 0000f401 EDX: 0000a570
  ESI: 00000001 EDI: 000001ba EBP: bb3adc6b ESP: bb3adc3f
   DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
  CR0: 80050033 CR2: 000001f2 CR3: 0b3a7000 CR4: 00140690
  DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
  DR6: fffe0ff0 DR7: 00000400
  Call Trace:
   ? unwind_next_frame+0xea/0x400
   ? __unwind_start+0xf5/0x180
   ? __save_stack_trace+0x81/0x160
   ? save_stack_trace+0x20/0x30
   ? __lock_acquire+0xfa5/0x12f0
   ? lock_acquire+0x1c2/0x230
   ? tick_periodic+0x3a/0xf0
   ? _raw_spin_lock+0x42/0x50
   ? tick_periodic+0x3a/0xf0
   ? tick_periodic+0x3a/0xf0
   ? debug_smp_processor_id+0x12/0x20
   ? tick_handle_periodic+0x23/0xc0
   ? local_apic_timer_interrupt+0x63/0x70
   ? smp_trace_apic_timer_interrupt+0x235/0x6a0
   ? trace_apic_timer_interrupt+0x37/0x3c
   ? strrchr+0x23/0x50
  Code: 0f 95 c1 89 c7 89 45 e4 0f b6 c1 89 c6 89 45 dc 8b 04 85 98 cb 74 bc 88 4d e3 89 45 f0 83 c0 01 84 c9 89 04 b5 98 cb 74 bc 74 3b <8b> 47 38 8b 57 34 c6 43 1d 01 25 00 00 02 00 83 e2 03 09 d0 83
  EIP: update_stack_state+0xd4/0x340 SS:ESP: 0068:bb3adc3f
  CR2: 00000000000001f2
  ---[ end trace 0d147fd4aba8ff50 ]---
  Kernel panic - not syncing: Fatal exception in interrupt

On x86-32, after decoding a frame pointer to get a regs address,
regs_size() dereferences the regs pointer when it checks regs->cs to see
if the regs are user mode.  This is dangerous because it's possible that
what looks like a decoded frame pointer is actually a corrupt value, and
we don't want the unwinder to make things worse.

Instead of calling regs_size() on an unsafe pointer, just assume they're
kernel regs to start with.  Later, once it's safe to access the regs, we
can do the user mode check and corresponding safety check for the
remaining two regs.

Reported-and-tested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reported-and-tested-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: LKP <lkp@01.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 5ed8d8b ("x86/unwind: Move common code into update_stack_state()")
Link: http://lkml.kernel.org/r/7f95b9a6993dec7674b3f3ab3dcd3294f7b9644d.1507597785.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
andr2000 pushed a commit that referenced this issue Dec 22, 2017
The dummy-hcd driver calls the gadget driver's disconnect callback
under the wrong conditions.  It should invoke the callback when Vbus
power is turned off, but instead it does so when the D+ pullup is
turned off.

This can cause a deadlock in the composite core when a gadget driver
is unregistered:

[   88.361471] ============================================
[   88.362014] WARNING: possible recursive locking detected
[   88.362580] 4.14.0-rc2+ xen-troops#9 Not tainted
[   88.363010] --------------------------------------------
[   88.363561] v4l_id/526 is trying to acquire lock:
[   88.364062]  (&(&cdev->lock)->rlock){....}, at: [<ffffffffa0547e03>] composite_disconnect+0x43/0x100 [libcomposite]
[   88.365051]
[   88.365051] but task is already holding lock:
[   88.365826]  (&(&cdev->lock)->rlock){....}, at: [<ffffffffa0547b09>] usb_function_deactivate+0x29/0x80 [libcomposite]
[   88.366858]
[   88.366858] other info that might help us debug this:
[   88.368301]  Possible unsafe locking scenario:
[   88.368301]
[   88.369304]        CPU0
[   88.369701]        ----
[   88.370101]   lock(&(&cdev->lock)->rlock);
[   88.370623]   lock(&(&cdev->lock)->rlock);
[   88.371145]
[   88.371145]  *** DEADLOCK ***
[   88.371145]
[   88.372211]  May be due to missing lock nesting notation
[   88.372211]
[   88.373191] 2 locks held by v4l_id/526:
[   88.373715]  #0:  (&(&cdev->lock)->rlock){....}, at: [<ffffffffa0547b09>] usb_function_deactivate+0x29/0x80 [libcomposite]
[   88.374814]  #1:  (&(&dum_hcd->dum->lock)->rlock){....}, at: [<ffffffffa05bd48d>] dummy_pullup+0x7d/0xf0 [dummy_hcd]
[   88.376289]
[   88.376289] stack backtrace:
[   88.377726] CPU: 0 PID: 526 Comm: v4l_id Not tainted 4.14.0-rc2+ xen-troops#9
[   88.378557] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
[   88.379504] Call Trace:
[   88.380019]  dump_stack+0x86/0xc7
[   88.380605]  __lock_acquire+0x841/0x1120
[   88.381252]  lock_acquire+0xd5/0x1c0
[   88.381865]  ? composite_disconnect+0x43/0x100 [libcomposite]
[   88.382668]  _raw_spin_lock_irqsave+0x40/0x54
[   88.383357]  ? composite_disconnect+0x43/0x100 [libcomposite]
[   88.384290]  composite_disconnect+0x43/0x100 [libcomposite]
[   88.385490]  set_link_state+0x2d4/0x3c0 [dummy_hcd]
[   88.386436]  dummy_pullup+0xa7/0xf0 [dummy_hcd]
[   88.387195]  usb_gadget_disconnect+0xd8/0x160 [udc_core]
[   88.387990]  usb_gadget_deactivate+0xd3/0x160 [udc_core]
[   88.388793]  usb_function_deactivate+0x64/0x80 [libcomposite]
[   88.389628]  uvc_function_disconnect+0x1e/0x40 [usb_f_uvc]

This patch changes the code to test the port-power status bit rather
than the port-connect status bit when deciding whether to isue the
callback.

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-by: David Tulloh <david@tulloh.id.au>
CC: <stable@vger.kernel.org>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
andr2000 pushed a commit that referenced this issue Dec 22, 2017
Stack trace output during a stress test:
 [    4.310049] Freeing initrd memory: 22592K
[    4.310646] rtas_flash: no firmware flash support
[    4.313341] cpuhp/64: page allocation failure: order:0, mode:0x14480c0(GFP_KERNEL|__GFP_ZERO|__GFP_THISNODE), nodemask=(null)
[    4.313465] cpuhp/64 cpuset=/ mems_allowed=0
[    4.313521] CPU: 64 PID: 392 Comm: cpuhp/64 Not tainted 4.11.0-39.el7a.ppc64le #1
[    4.313588] Call Trace:
[    4.313622] [c000000f1fb1b8e0] [c000000000c09388] dump_stack+0xb0/0xf0 (unreliable)
[    4.313694] [c000000f1fb1b920] [c00000000030ef6c] warn_alloc+0x12c/0x1c0
[    4.313753] [c000000f1fb1b9c0] [c00000000030ff68] __alloc_pages_nodemask+0xea8/0x1000
[    4.313823] [c000000f1fb1bbb0] [c000000000113a8c] core_imc_mem_init+0xbc/0x1c0
[    4.313892] [c000000f1fb1bc00] [c000000000113cdc] ppc_core_imc_cpu_online+0x14c/0x170
[    4.313962] [c000000f1fb1bc90] [c000000000125758] cpuhp_invoke_callback+0x198/0x5d0
[    4.314031] [c000000f1fb1bd00] [c00000000012782c] cpuhp_thread_fun+0x8c/0x3d0
[    4.314101] [c000000f1fb1bd60] [c0000000001678d0] smpboot_thread_fn+0x290/0x2a0
[    4.314169] [c000000f1fb1bdc0] [c00000000015ee78] kthread+0x168/0x1b0
[    4.314229] [c000000f1fb1be30] [c00000000000b368] ret_from_kernel_thread+0x5c/0x74
[    4.314313] Mem-Info:
[    4.314356] active_anon:0 inactive_anon:0 isolated_anon:0

core_imc_mem_init() at system boot use alloc_pages_node() to get memory
and alloc_pages_node() throws this stack dump when tried to allocate
memory from a node which has no memory behind it. Add a ___GFP_NOWARN
flag in allocation request as a fix.

Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Reported-by: Venkat R.B <venkatb3@in.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
andr2000 pushed a commit that referenced this issue Apr 4, 2019
the following script:

 # tc qdisc add dev crash0 clsact
 # tc filter add dev crash0 egress matchall \
 > action police rate 3mbit burst 250k pass index 90
 # tc actions replace action police \
 > rate 3mbit burst 250k goto chain 42 index 90 cookie c1a0c1a0
 # tc actions show action police rate 3mbit burst

had the following output:

 Error: Failed to init TC action chain.
 We have an error talking to the kernel
 total acts 1

         action order 0:  police 0x5a rate 3Mbit burst 250Kb mtu 2Kb  action goto chain 42 overhead 0b
         ref 2 bind 1
         cookie c1a0c1a0

Then, when crash0 starts transmitting more than 3Mbit/s, the following
kernel crash is observed:

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
 #PF error: [normal kernel read fault]
 PGD 800000007a779067 P4D 800000007a779067 PUD 2ad96067 PMD 0
 Oops: 0000 [#1] SMP PTI
 CPU: 3 PID: 5032 Comm: netperf Not tainted 5.0.0-rc4.gotochain_crash+ torvalds#533
 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
 RIP: 0010:tcf_action_exec+0xb8/0x100
 Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
 RSP: 0018:ffffb0e04064fa60 EFLAGS: 00010246
 RAX: 000000002000002a RBX: ffff93bb3322cce0 RCX: 0000000000000005
 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff93bb3322cce0
 RBP: ffffb0e04064fb00 R08: 0000000000000022 R09: 0000000000000000
 R10: 0000000000000000 R11: 0000000000000001 R12: ffff93bb3beed300
 R13: ffff93bb3beed308 R14: 0000000000000001 R15: ffff93bb3b64d000
 FS:  00007f0bc6be5740(0000) GS:ffff93bb3db80000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000000 CR3: 00000000746a8001 CR4: 00000000001606e0
 Call Trace:
  tcf_classify+0x58/0x120
  __dev_queue_xmit+0x40a/0x890
  ? ipt_do_table+0x31c/0x420 [ip_tables]
  ? ip_finish_output2+0x16f/0x430
  ip_finish_output2+0x16f/0x430
  ? ip_output+0x69/0xe0
  ip_output+0x69/0xe0
  ? ip_forward_options+0x1a0/0x1a0
  __tcp_transmit_skb+0x563/0xa40
  tcp_write_xmit+0x243/0xfa0
  __tcp_push_pending_frames+0x32/0xf0
  tcp_sendmsg_locked+0x404/0xd30
  tcp_sendmsg+0x27/0x40
  sock_sendmsg+0x36/0x40
  __sys_sendto+0x10e/0x140
  ? __sys_connect+0x87/0xf0
  ? syscall_trace_enter+0x1df/0x2e0
  ? __audit_syscall_exit+0x216/0x260
  __x64_sys_sendto+0x24/0x30
  do_syscall_64+0x5b/0x180
  entry_SYSCALL_64_after_hwframe+0x44/0xa9
 RIP: 0033:0x7f0bc5ffbafd
 Code: 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 8b 05 ae c4 2c 00 85 c0 75 2d 45 31 c9 45 31 c0 4c 63 d1 48 63 ff b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 01 c3 48 8b 15 63 63 2c 00 f7 d8 64 89 02 48
 RSP: 002b:00007fffef94b7f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
 RAX: ffffffffffffffda RBX: 0000000000004000 RCX: 00007f0bc5ffbafd
 RDX: 0000000000004000 RSI: 00000000017e5420 RDI: 0000000000000004
 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000004
 R13: 00000000017e51d0 R14: 0000000000000010 R15: 0000000000000006
 Modules linked in: act_police veth ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 snd_hda_codec_generic mbcache crct10dif_pclmul jbd2 crc32_pclmul ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device snd_pcm aesni_intel crypto_simd cryptd glue_helper snd_timer snd joydev pcspkr virtio_balloon soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm virtio_blk virtio_net virtio_console net_failover failover crc32c_intel ata_piix libata serio_raw virtio_pci virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod
 CR2: 0000000000000000

Validating the control action within tcf_police_init() proved to fix the
above issue. A TDC selftest is added to verify the correct behavior.

Fixes: db50514 ("net: sched: add termination action to allow goto chain")
Fixes: 97763dc ("net_sched: reject unknown tcfa_action values")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
andr2000 pushed a commit that referenced this issue Apr 4, 2019
the following script:

 # tc qdisc add dev crash0 clsact
 # tc filter add dev crash0 egress matchall \
 > action sample rate 1024 group 4 pass index 90
 # tc actions replace action sample \
 > rate 1024 group 4 goto chain 42 index 90 cookie c1a0c1a0
 # tc actions show action sample

had the following output:

 Error: Failed to init TC action chain.
 We have an error talking to the kernel
 total acts 1

         action order 0: sample rate 1/1024 group 4 goto chain 42
          index 90 ref 2 bind 1
         cookie c1a0c1a0

Then, the first packet transmitted by crash0 made the kernel crash:

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
 #PF error: [normal kernel read fault]
 PGD 8000000079966067 P4D 8000000079966067 PUD 7987b067 PMD 0
 Oops: 0000 [#1] SMP PTI
 CPU: 0 PID: 5 Comm: kworker/0:0 Not tainted 5.0.0-rc4.gotochain_crash+ torvalds#536
 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
 Workqueue: ipv6_addrconf addrconf_dad_work
 RIP: 0010:tcf_action_exec+0xb8/0x100
 Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
 RSP: 0018:ffffbee60033fad0 EFLAGS: 00010246
 RAX: 000000002000002a RBX: ffff99d7ae6e3b00 RCX: 00000000e555df9b
 RDX: 0000000000000000 RSI: 00000000b0352718 RDI: ffff99d7fda1fcf0
 RBP: ffffbee60033fb70 R08: 0000000070731ab1 R09: 0000000000000400
 R10: 0000000000000000 R11: ffff99d7ac733838 R12: ffff99d7f3c2be00
 R13: ffff99d7f3c2be08 R14: 0000000000000001 R15: ffff99d7f3c2b600
 FS:  0000000000000000(0000) GS:ffff99d7fda00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000000 CR3: 00000000797de006 CR4: 00000000001606f0
 Call Trace:
  tcf_classify+0x58/0x120
  __dev_queue_xmit+0x40a/0x890
  ? ndisc_next_option+0x50/0x50
  ? ___neigh_create+0x4d5/0x680
  ? ip6_finish_output2+0x1b5/0x590
  ip6_finish_output2+0x1b5/0x590
  ? ip6_output+0x68/0x110
  ip6_output+0x68/0x110
  ? nf_hook.constprop.28+0x79/0xc0
  ndisc_send_skb+0x248/0x2e0
  ndisc_send_ns+0xf8/0x200
  ? addrconf_dad_work+0x389/0x4b0
  addrconf_dad_work+0x389/0x4b0
  ? __switch_to_asm+0x34/0x70
  ? process_one_work+0x195/0x380
  ? addrconf_dad_completed+0x370/0x370
  process_one_work+0x195/0x380
  worker_thread+0x30/0x390
  ? process_one_work+0x380/0x380
  kthread+0x113/0x130
  ? kthread_park+0x90/0x90
  ret_from_fork+0x35/0x40
 Modules linked in: act_sample psample veth ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 crct10dif_pclmul crc32_pclmul ghash_clmulni_intel mbcache jbd2 snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device aesni_intel crypto_simd snd_pcm cryptd glue_helper snd_timer joydev snd pcspkr virtio_balloon i2c_piix4 soundcore nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper syscopyarea sysfillrect virtio_net sysimgblt fb_sys_fops net_failover ttm failover virtio_blk virtio_console drm ata_piix serio_raw crc32c_intel libata virtio_pci virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod
 CR2: 0000000000000000

Validating the control action within tcf_sample_init() proved to fix the
above issue. A TDC selftest is added to verify the correct behavior.

Fixes: db50514 ("net: sched: add termination action to allow goto chain")
Fixes: 97763dc ("net_sched: reject unknown tcfa_action values")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
andr2000 pushed a commit that referenced this issue Apr 4, 2019
the following script:

 # tc qdisc add dev crash0 clsact
 # tc filter add dev crash0 egress matchall \
 > action simple sdata hello pass index 90
 # tc actions replace action simple \
 > sdata world goto chain 42 index 90 cookie c1a0c1a0
 # tc action show action simple

had the following output:

 Error: Failed to init TC action chain.
 We have an error talking to the kernel
 total acts 1

         action order 0: Simple <world>
          index 90 ref 2 bind 1
         cookie c1a0c1a0

Then, the first packet transmitted by crash0 made the kernel crash:

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
 #PF error: [normal kernel read fault]
 PGD 800000006a6fb067 P4D 800000006a6fb067 PUD 6aed6067 PMD 0
 Oops: 0000 [#1] SMP PTI
 CPU: 2 PID: 3241 Comm: kworker/2:0 Not tainted 5.0.0-rc4.gotochain_crash+ torvalds#536
 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
 Workqueue: ipv6_addrconf addrconf_dad_work
 RIP: 0010:tcf_action_exec+0xb8/0x100
 Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
 RSP: 0018:ffffbe6781763ad0 EFLAGS: 00010246
 RAX: 000000002000002a RBX: ffff9e59bdb80e00 RCX: 0000000000000000
 RDX: 0000000000000000 RSI: ffff9e59b4716738 RDI: ffff9e59ab12d140
 RBP: ffffbe6781763b70 R08: 0000000000000234 R09: 0000000000aaaaaa
 R10: 0000000000000000 R11: ffff9e59b247cd50 R12: ffff9e59b112f100
 R13: ffff9e59b112f108 R14: 0000000000000001 R15: ffff9e59ab12d0c0
 FS:  0000000000000000(0000) GS:ffff9e59b4700000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000000 CR3: 000000006af92004 CR4: 00000000001606e0
 Call Trace:
  tcf_classify+0x58/0x120
  __dev_queue_xmit+0x40a/0x890
  ? ndisc_next_option+0x50/0x50
  ? ___neigh_create+0x4d5/0x680
  ? ip6_finish_output2+0x1b5/0x590
  ip6_finish_output2+0x1b5/0x590
  ? ip6_output+0x68/0x110
  ip6_output+0x68/0x110
  ? nf_hook.constprop.28+0x79/0xc0
  ndisc_send_skb+0x248/0x2e0
  ndisc_send_ns+0xf8/0x200
  ? addrconf_dad_work+0x389/0x4b0
  addrconf_dad_work+0x389/0x4b0
  ? __switch_to_asm+0x34/0x70
  ? process_one_work+0x195/0x380
  ? addrconf_dad_completed+0x370/0x370
  process_one_work+0x195/0x380
  worker_thread+0x30/0x390
  ? process_one_work+0x380/0x380
  kthread+0x113/0x130
  ? kthread_park+0x90/0x90
  ret_from_fork+0x35/0x40
 Modules linked in: act_simple veth ip6table_filter ip6_tables iptable_filter binfmt_misc crct10dif_pclmul crc32_pclmul ghash_clmulni_intel ext4 snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hwdep mbcache snd_hda_core jbd2 snd_seq snd_seq_device snd_pcm aesni_intel crypto_simd cryptd snd_timer glue_helper snd joydev virtio_balloon pcspkr soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops virtio_net ttm net_failover virtio_console virtio_blk failover drm crc32c_intel serio_raw floppy ata_piix libata virtio_pci virtio_ring virtio dm_mirror dm_region_hash dm_log dm_mod
 CR2: 0000000000000000

Validating the control action within tcf_simple_init() proved to fix the
above issue. A TDC selftest is added to verify the correct behavior.

Fixes: db50514 ("net: sched: add termination action to allow goto chain")
Fixes: 97763dc ("net_sched: reject unknown tcfa_action values")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
andr2000 pushed a commit that referenced this issue Apr 4, 2019
the following script:

 # tc qdisc add dev crash0 clsact
 # tc filter add dev crash0 egress matchall \
 > action skbedit ptype host pass index 90
 # tc actions replace action skbedit \
 > ptype host goto chain 42 index 90 cookie c1a0c1a0
 # tc actions show action skbedit

had the following output:

 Error: Failed to init TC action chain.
 We have an error talking to the kernel
 total acts 1

         action order 0: skbedit  ptype host goto chain 42
          index 90 ref 2 bind 1
         cookie c1a0c1a0

Then, the first packet transmitted by crash0 made the kernel crash:

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
 #PF error: [normal kernel read fault]
 PGD 0 P4D 0
 Oops: 0000 [#1] SMP PTI
 CPU: 3 PID: 3467 Comm: kworker/3:3 Not tainted 5.0.0-rc4.gotochain_crash+ torvalds#536
 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
 Workqueue: ipv6_addrconf addrconf_dad_work
 RIP: 0010:tcf_action_exec+0xb8/0x100
 Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
 RSP: 0018:ffffb50a81e1fad0 EFLAGS: 00010246
 RAX: 000000002000002a RBX: ffff9aa47ba4ea00 RCX: 0000000000000001
 RDX: 0000000000000000 RSI: ffff9aa469eeb3c0 RDI: ffff9aa47ba4ea00
 RBP: ffffb50a81e1fb70 R08: 0000000000000000 R09: 0000000000000000
 R10: 0000000000000000 R11: ffff9aa47bce0638 R12: ffff9aa4793b0c00
 R13: ffff9aa4793b0c08 R14: 0000000000000001 R15: ffff9aa469eeb3c0
 FS:  0000000000000000(0000) GS:ffff9aa474780000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000000 CR3: 000000007360e005 CR4: 00000000001606e0
 Call Trace:
  tcf_classify+0x58/0x120
  __dev_queue_xmit+0x40a/0x890
  ? ndisc_next_option+0x50/0x50
  ? ___neigh_create+0x4d5/0x680
  ? ip6_finish_output2+0x1b5/0x590
  ip6_finish_output2+0x1b5/0x590
  ? ip6_output+0x68/0x110
  ip6_output+0x68/0x110
  ? nf_hook.constprop.28+0x79/0xc0
  ndisc_send_skb+0x248/0x2e0
  ndisc_send_ns+0xf8/0x200
  ? addrconf_dad_work+0x389/0x4b0
  addrconf_dad_work+0x389/0x4b0
  ? __switch_to_asm+0x34/0x70
  ? process_one_work+0x195/0x380
  ? addrconf_dad_completed+0x370/0x370
  process_one_work+0x195/0x380
  worker_thread+0x30/0x390
  ? process_one_work+0x380/0x380
  kthread+0x113/0x130
  ? kthread_park+0x90/0x90
  ret_from_fork+0x35/0x40
 Modules linked in: act_skbedit veth ip6table_filter ip6_tables iptable_filter binfmt_misc crct10dif_pclmul crc32_pclmul ghash_clmulni_intel ext4 snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hwdep mbcache snd_hda_core jbd2 snd_seq snd_seq_device snd_pcm aesni_intel crypto_simd cryptd snd_timer glue_helper snd joydev soundcore pcspkr virtio_balloon i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm virtio_net net_failover drm failover virtio_blk virtio_console ata_piix virtio_pci crc32c_intel serio_raw libata virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod
 CR2: 0000000000000000

Validating the control action within tcf_skbedit_init() proved to fix the
above issue. A TDC selftest is added to verify the correct behavior.

Fixes: db50514 ("net: sched: add termination action to allow goto chain")
Fixes: 97763dc ("net_sched: reject unknown tcfa_action values")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
andr2000 pushed a commit that referenced this issue Apr 4, 2019
the following script:

 # tc qdisc add dev crash0 clsact
 # tc filter add dev crash0 egress matchall \
 > action skbmod set smac 00:c1:a0:c1:a0:00 pass index 90
 # tc actions replace action skbmod \
 > set smac 00:c1:a0:c1:a0:00 goto chain 42 index 90 cookie c1a0c1a0
 # tc actions show action skbmod

had the following output:

 src MAC address <00:c1:a0:c1:a0:00>
 src MAC address <00:c1:a0:c1:a0:00>
 Error: Failed to init TC action chain.
 We have an error talking to the kernel
 total acts 1

         action order 0: skbmod goto chain 42 set smac 00:c1:a0:c1:a0:00
          index 90 ref 2 bind 1
         cookie c1a0c1a0

Then, the first packet transmitted by crash0 made the kernel crash:

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
 #PF error: [normal kernel read fault]
 PGD 800000002d5c7067 P4D 800000002d5c7067 PUD 77e16067 PMD 0
 Oops: 0000 [#1] SMP PTI
 CPU: 3 PID: 0 Comm: swapper/3 Not tainted 5.0.0-rc4.gotochain_crash+ torvalds#536
 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
 RIP: 0010:tcf_action_exec+0xb8/0x100
 Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
 RSP: 0018:ffff8987ffd83be0 EFLAGS: 00010246
 RAX: 000000002000002a RBX: ffff8987aeb68800 RCX: ffff8987fa263640
 RDX: 0000000000000000 RSI: ffff8987f51c8802 RDI: 00000000000000a0
 RBP: ffff8987ffd83c80 R08: ffff8987f939bac8 R09: 0000000000000000
 R10: 0000000000000000 R11: 0000000000000000 R12: ffff8987f5c77d00
 R13: ffff8987f5c77d08 R14: 0000000000000001 R15: ffff8987f0c29f00
 FS:  0000000000000000(0000) GS:ffff8987ffd80000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000000 CR3: 000000007832c004 CR4: 00000000001606e0
 Call Trace:
  <IRQ>
  tcf_classify+0x58/0x120
  __dev_queue_xmit+0x40a/0x890
  ? ip6_finish_output2+0x369/0x590
  ip6_finish_output2+0x369/0x590
  ? ip6_output+0x68/0x110
  ip6_output+0x68/0x110
  ? nf_hook.constprop.35+0x79/0xc0
  mld_sendpack+0x16f/0x220
  mld_ifc_timer_expire+0x195/0x2c0
  ? igmp6_timer_handler+0x70/0x70
  call_timer_fn+0x2b/0x130
  run_timer_softirq+0x3e8/0x440
  ? tick_sched_timer+0x37/0x70
  __do_softirq+0xe3/0x2f5
  irq_exit+0xf0/0x100
  smp_apic_timer_interrupt+0x6c/0x130
  apic_timer_interrupt+0xf/0x20
  </IRQ>
 RIP: 0010:native_safe_halt+0x2/0x10
 Code: 56 ff ff ff 7f f3 c3 65 48 8b 04 25 00 5c 01 00 f0 80 48 02 20 48 8b 00 a8 08 74 8b eb c1 90 90 90 90 90 90 90 90 90 90 fb f4 <c3> 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 f4 c3 90 90 90 90 90 90
 RSP: 0018:ffffa2a1c038feb8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
 RAX: ffffffffa94184f0 RBX: 0000000000000003 RCX: 0000000000000001
 RDX: 0000000000000001 RSI: 0000000000000087 RDI: 0000000000000003
 RBP: 0000000000000003 R08: 001123cfc2ba71ac R09: 0000000000000000
 R10: 0000000000000000 R11: 00000000000f4240 R12: 0000000000000000
 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
  ? __sched_text_end+0x1/0x1
  default_idle+0x1c/0x140
  do_idle+0x1c4/0x280
  cpu_startup_entry+0x19/0x20
  start_secondary+0x1a7/0x200
  secondary_startup_64+0xa4/0xb0
 Modules linked in: act_skbmod veth ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 crct10dif_pclmul crc32_pclmul ghash_clmulni_intel mbcache jbd2 snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device aesni_intel crypto_simd cryptd glue_helper snd_pcm joydev pcspkr virtio_balloon snd_timer snd i2c_piix4 soundcore nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper syscopyarea sysfillrect virtio_net sysimgblt fb_sys_fops net_failover virtio_console ttm virtio_blk failover drm crc32c_intel serio_raw ata_piix virtio_pci libata virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod
 CR2: 0000000000000000

Validating the control action within tcf_skbmod_init() proved to fix the
above issue. A TDC selftest is added to verify the correct behavior.

Fixes: db50514 ("net: sched: add termination action to allow goto chain")
Fixes: 97763dc ("net_sched: reject unknown tcfa_action values")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
andr2000 pushed a commit that referenced this issue Apr 4, 2019
the following script:

 # tc qdisc add dev crash0 clsact
 # tc filter add dev crash0 egress matchall \
 > action tunnel_key set src_ip 10.10.10.1 dst_ip 20.20.2 dst_port 3128 \
 > nocsum id 1 pass index 90
 # tc actions replace action tunnel_key \
 > set src_ip 10.10.10.1 dst_ip 20.20.2 dst_port 3128 nocsum id 1 \
 > goto chain 42 index 90 cookie c1a0c1a0
 # tc actions show action tunnel_key

had the following output:

 Error: Failed to init TC action chain.
 We have an error talking to the kernel
 total acts 1

         action order 0: tunnel_key  set
         src_ip 10.10.10.1
         dst_ip 20.20.2.0
         key_id 1
         dst_port 3128
         nocsum goto chain 42
          index 90 ref 2 bind 1
         cookie c1a0c1a0

then, the first packet transmitted by crash0 made the kernel crash:

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
 #PF error: [normal kernel read fault]
 PGD 800000002aba4067 P4D 800000002aba4067 PUD 795f9067 PMD 0
 Oops: 0000 [#1] SMP PTI
 CPU: 3 PID: 0 Comm: swapper/3 Not tainted 5.0.0-rc4.gotochain_crash+ torvalds#536
 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
 RIP: 0010:tcf_action_exec+0xb8/0x100
 Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
 RSP: 0018:ffff9346bdb83be0 EFLAGS: 00010246
 RAX: 000000002000002a RBX: ffff9346bb795c00 RCX: 0000000000000002
 RDX: 0000000000000000 RSI: ffff93466c881700 RDI: 0000000000000246
 RBP: ffff9346bdb83c80 R08: ffff9346b3e1e0c8 R09: 0000000000000000
 R10: 0000000000000000 R11: 0000000000000000 R12: ffff9346b978f000
 R13: ffff9346b978f008 R14: 0000000000000001 R15: ffff93466dceeb40
 FS:  0000000000000000(0000) GS:ffff9346bdb80000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000000 CR3: 000000007a6c2002 CR4: 00000000001606e0
 Call Trace:
  <IRQ>
  tcf_classify+0x58/0x120
  __dev_queue_xmit+0x40a/0x890
  ? ip6_finish_output2+0x369/0x590
  ip6_finish_output2+0x369/0x590
  ? ip6_output+0x68/0x110
  ip6_output+0x68/0x110
  ? nf_hook.constprop.35+0x79/0xc0
  mld_sendpack+0x16f/0x220
  mld_ifc_timer_expire+0x195/0x2c0
  ? igmp6_timer_handler+0x70/0x70
  call_timer_fn+0x2b/0x130
  run_timer_softirq+0x3e8/0x440
  ? tick_sched_timer+0x37/0x70
  __do_softirq+0xe3/0x2f5
  irq_exit+0xf0/0x100
  smp_apic_timer_interrupt+0x6c/0x130
  apic_timer_interrupt+0xf/0x20
  </IRQ>
 RIP: 0010:native_safe_halt+0x2/0x10
 Code: 55 ff ff ff 7f f3 c3 65 48 8b 04 25 00 5c 01 00 f0 80 48 02 20 48 8b 00 a8 08 74 8b eb c1 90 90 90 90 90 90 90 90 90 90 fb f4 <c3> 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 f4 c3 90 90 90 90 90 90
 RSP: 0018:ffffa48a8038feb8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
 RAX: ffffffffaa8184f0 RBX: 0000000000000003 RCX: 0000000000000000
 RDX: 0000000000000001 RSI: 0000000000000087 RDI: 0000000000000003
 RBP: 0000000000000003 R08: 0011251c6fcfac49 R09: ffff9346b995be00
 R10: ffffa48a805e7ce8 R11: 00000000024c38dd R12: 0000000000000000
 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
  ? __sched_text_end+0x1/0x1
  default_idle+0x1c/0x140
  do_idle+0x1c4/0x280
  cpu_startup_entry+0x19/0x20
  start_secondary+0x1a7/0x200
  secondary_startup_64+0xa4/0xb0
 Modules linked in: act_tunnel_key veth ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 crct10dif_pclmul crc32_pclmul snd_hda_codec_generic ghash_clmulni_intel mbcache snd_hda_intel jbd2 snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device snd_pcm aesni_intel crypto_simd cryptd glue_helper joydev snd_timer snd pcspkr virtio_balloon soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper syscopyarea sysfillrect virtio_net sysimgblt fb_sys_fops ttm net_failover virtio_console virtio_blk failover drm serio_raw crc32c_intel ata_piix virtio_pci floppy virtio_ring libata virtio dm_mirror dm_region_hash dm_log dm_mod
 CR2: 0000000000000000

Validating the control action within tcf_tunnel_key_init() proved to fix
the above issue. A TDC selftest is added to verify the correct behavior.

Fixes: db50514 ("net: sched: add termination action to allow goto chain")
Fixes: 97763dc ("net_sched: reject unknown tcfa_action values")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
andr2000 pushed a commit that referenced this issue Apr 4, 2019
the following script:

 # tc qdisc add dev crash0 clsact
 # tc filter add dev crash0 egress matchall \
 > action vlan pop pass index 90
 # tc actions replace action vlan \
 > pop goto chain 42 index 90 cookie c1a0c1a0
 # tc actions show action vlan

had the following output:

 Error: Failed to init TC action chain.
 We have an error talking to the kernel
 total acts 1

         action order 0: vlan  pop goto chain 42
          index 90 ref 2 bind 1
         cookie c1a0c1a0

Then, the first packet transmitted by crash0 made the kernel crash:

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
 #PF error: [normal kernel read fault]
 PGD 800000007974f067 P4D 800000007974f067 PUD 79638067 PMD 0
 Oops: 0000 [#1] SMP PTI
 CPU: 3 PID: 0 Comm: swapper/3 Not tainted 5.0.0-rc4.gotochain_crash+ torvalds#536
 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
 RIP: 0010:tcf_action_exec+0xb8/0x100
 Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
 RSP: 0018:ffff982dfdb83be0 EFLAGS: 00010246
 RAX: 000000002000002a RBX: ffff982dfc55db00 RCX: 0000000000000000
 RDX: 0000000000000000 RSI: ffff982df97099c0 RDI: ffff982dfc55db00
 RBP: ffff982dfdb83c80 R08: ffff982df983fec8 R09: 0000000000000000
 R10: 0000000000000000 R11: 0000000000000000 R12: ffff982df5aacd00
 R13: ffff982df5aacd08 R14: 0000000000000001 R15: ffff982df97099c0
 FS:  0000000000000000(0000) GS:ffff982dfdb80000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000000 CR3: 00000000796d0005 CR4: 00000000001606e0
 Call Trace:
  <IRQ>
  tcf_classify+0x58/0x120
  __dev_queue_xmit+0x40a/0x890
  ? ip6_finish_output2+0x369/0x590
  ip6_finish_output2+0x369/0x590
  ? ip6_output+0x68/0x110
  ip6_output+0x68/0x110
  ? nf_hook.constprop.35+0x79/0xc0
  mld_sendpack+0x16f/0x220
  mld_ifc_timer_expire+0x195/0x2c0
  ? igmp6_timer_handler+0x70/0x70
  call_timer_fn+0x2b/0x130
  run_timer_softirq+0x3e8/0x440
  ? enqueue_hrtimer+0x39/0x90
  __do_softirq+0xe3/0x2f5
  irq_exit+0xf0/0x100
  smp_apic_timer_interrupt+0x6c/0x130
  apic_timer_interrupt+0xf/0x20
  </IRQ>
 RIP: 0010:native_safe_halt+0x2/0x10
 Code: 7b ff ff ff 7f f3 c3 65 48 8b 04 25 00 5c 01 00 f0 80 48 02 20 48 8b 00 a8 08 74 8b eb c1 90 90 90 90 90 90 90 90 90 90 fb f4 <c3> 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 f4 c3 90 90 90 90 90 90
 RSP: 0018:ffffa4714038feb8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
 RAX: ffffffff840184f0 RBX: 0000000000000003 RCX: 0000000000000000
 RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000001e57d3f387
 RBP: 0000000000000003 R08: 001125d9ca39e1eb R09: 0000000000000000
 R10: 000000000000027d R11: 000000000009f400 R12: 0000000000000000
 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
  ? __sched_text_end+0x1/0x1
  default_idle+0x1c/0x140
  do_idle+0x1c4/0x280
  cpu_startup_entry+0x19/0x20
  start_secondary+0x1a7/0x200
  secondary_startup_64+0xa4/0xb0
 Modules linked in: act_vlan veth ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 snd_hda_codec_generic mbcache crct10dif_pclmul jbd2 snd_hda_intel crc32_pclmul snd_hda_codec ghash_clmulni_intel snd_hwdep snd_hda_core snd_seq snd_seq_device snd_pcm aesni_intel crypto_simd cryptd glue_helper joydev snd_timer virtio_balloon snd pcspkr soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper syscopyarea sysfillrect sysimgblt virtio_net fb_sys_fops virtio_blk ttm net_failover virtio_console failover ata_piix drm libata crc32c_intel virtio_pci serio_raw virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod
 CR2: 0000000000000000

Validating the control action within tcf_vlan_init() proved to fix the
above issue. A TDC selftest is added to verify the correct behavior.

Fixes: db50514 ("net: sched: add termination action to allow goto chain")
Fixes: 97763dc ("net_sched: reject unknown tcfa_action values")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
andr2000 pushed a commit that referenced this issue Apr 4, 2019
A new mirred action is created by the tcf_mirred_init function. This
contains a list head struct which is inserted into a global list on
successful creation of a new action. However, after a creation, it is
still possible to error out and call the tcf_idr_release function. This,
in turn, calls the act_mirr cleanup function via __tcf_idr_release and
__tcf_action_put. This cleanup function tries to delete the list entry
which is as yet uninitialised, leading to a NULL pointer exception.

Fix this by initialising the list entry on creation of a new action.

Bug report:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
PGD 8000000840c73067 P4D 8000000840c73067 PUD 858dcc067 PMD 0
Oops: 0002 [#1] SMP PTI
CPU: 32 PID: 5636 Comm: handler194 Tainted: G           OE     5.0.0+ torvalds#186
Hardware name: Dell Inc. PowerEdge R730/0599V5, BIOS 1.3.6 06/03/2015
RIP: 0010:tcf_mirred_release+0x42/0xa7 [act_mirred]
Code: f0 90 39 c0 e8 52 04 57 c8 48 c7 c7 b8 80 39 c0 e8 94 fa d4 c7 48 8b 93 d0 00 00 00 48 8b 83 d8 00 00 00 48 c7 c7 f0 90 39 c0 <48> 89 42 08 48 89 10 48 b8 00 01 00 00 00 00 ad de 48 89 83 d0 00
RSP: 0018:ffffac4aa059f688 EFLAGS: 00010282
RAX: 0000000000000000 RBX: ffff9dcd1b214d00 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff9dcd1fa165f8 RDI: ffffffffc03990f0
RBP: ffff9dccf9c7af80 R08: 0000000000000a3b R09: 0000000000000000
R10: ffff9dccfa11f420 R11: 0000000000000000 R12: 0000000000000001
R13: ffff9dcd16b433c0 R14: ffff9dcd1b214d80 R15: 0000000000000000
FS:  00007f441bfff700(0000) GS:ffff9dcd1fa00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000008 CR3: 0000000839e64004 CR4: 00000000001606e0
Call Trace:
tcf_action_cleanup+0x59/0xca
__tcf_action_put+0x54/0x6b
__tcf_idr_release.cold.33+0x9/0x12
tcf_mirred_init.cold.20+0x22e/0x3b0 [act_mirred]
tcf_action_init_1+0x3d0/0x4c0
tcf_action_init+0x9c/0x130
tcf_exts_validate+0xab/0xc0
fl_change+0x1ca/0x982 [cls_flower]
tc_new_tfilter+0x647/0x8d0
? load_balance+0x14b/0x9e0
rtnetlink_rcv_msg+0xe3/0x370
? __switch_to_asm+0x40/0x70
? __switch_to_asm+0x34/0x70
? _cond_resched+0x15/0x30
? __kmalloc_node_track_caller+0x1d4/0x2b0
? rtnl_calcit.isra.31+0xf0/0xf0
netlink_rcv_skb+0x49/0x110
netlink_unicast+0x16f/0x210
netlink_sendmsg+0x1df/0x390
sock_sendmsg+0x36/0x40
___sys_sendmsg+0x27b/0x2c0
? futex_wake+0x80/0x140
? do_futex+0x2b9/0xac0
? ep_scan_ready_list.constprop.22+0x1f2/0x210
? ep_poll+0x7a/0x430
__sys_sendmsg+0x47/0x80
do_syscall_64+0x55/0x100
entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: 4e23281 ("net: sched: act_mirred: remove dependency on rtnl lock")
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
andr2000 pushed a commit that referenced this issue Apr 4, 2019
Chandan reported that fstests' generic/026 test hit a crash:

  BUG: Unable to handle kernel data access at 0xc00000062ac40000
  Faulting instruction address: 0xc000000000092240
  Oops: Kernel access of bad area, sig: 11 [#1]
  LE SMP NR_CPUS=2048 DEBUG_PAGEALLOC NUMA pSeries
  CPU: 0 PID: 27828 Comm: chacl Not tainted 5.0.0-rc2-next-20190115-00001-g6de6dba64dda #1
  NIP:  c000000000092240 LR: c00000000066a55c CTR: 0000000000000000
  REGS: c00000062c0c3430 TRAP: 0300   Not tainted  (5.0.0-rc2-next-20190115-00001-g6de6dba64dda)
  MSR:  8000000002009033 <SF,VEC,EE,ME,IR,DR,RI,LE>  CR: 44000842  XER: 20000000
  CFAR: 00007fff7f3108ac DAR: c00000062ac40000 DSISR: 40000000 IRQMASK: 0
  GPR00: 0000000000000000 c00000062c0c36c0 c0000000017f4c00 c00000000121a660
  GPR04: c00000062ac3fff9 0000000000000004 0000000000000020 00000000275b19c4
  GPR08: 000000000000000c 46494c4500000000 5347495f41434c5f c0000000026073a0
  GPR12: 0000000000000000 c0000000027a0000 0000000000000000 0000000000000000
  GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR20: c00000062ea70020 c00000062c0c38d0 0000000000000002 0000000000000002
  GPR24: c00000062ac3ffe8 00000000275b19c4 0000000000000001 c00000062ac30000
  GPR28: c00000062c0c38d0 c00000062ac30050 c00000062ac30058 0000000000000000
  NIP memcmp+0x120/0x690
  LR  xfs_attr3_leaf_lookup_int+0x53c/0x5b0
  Call Trace:
    xfs_attr3_leaf_lookup_int+0x78/0x5b0 (unreliable)
    xfs_da3_node_lookup_int+0x32c/0x5a0
    xfs_attr_node_addname+0x170/0x6b0
    xfs_attr_set+0x2ac/0x340
    __xfs_set_acl+0xf0/0x230
    xfs_set_acl+0xd0/0x160
    set_posix_acl+0xc0/0x130
    posix_acl_xattr_set+0x68/0x110
    __vfs_setxattr+0xa4/0x110
    __vfs_setxattr_noperm+0xac/0x240
    vfs_setxattr+0x128/0x130
    setxattr+0x248/0x600
    path_setxattr+0x108/0x120
    sys_setxattr+0x28/0x40
    system_call+0x5c/0x70
  Instruction dump:
  7d201c28 7d402428 7c295040 38630008 38840008 408201f0 4200ffe8 2c050000
  4182ff6c 20c50008 54c61838 7d201c28 <7d402428> 7d293436 7d4a3436 7c295040

The instruction dump decodes as:
  subfic  r6,r5,8
  rlwinm  r6,r6,3,0,28
  ldbrx   r9,0,r3
  ldbrx   r10,0,r4      <-

Which shows us doing an 8 byte load from c00000062ac3fff9, which
crosses the page boundary at c00000062ac40000 and faults.

It's not OK for memcmp to read past the end of the source or
destination buffers if that would cross a page boundary, because we
don't know that the next page is mapped.

As pointed out by Segher, we can read past the end of the source or
destination as long as we don't cross a 4K boundary, because that's
our minimum page size on all platforms.

The bug is in the code at the .Lcmp_rest_lt8bytes label. When we get
there we know that s1 is 8-byte aligned and we have at least 1 byte to
read, so a single 8-byte load won't read past the end of s1 and cross
a page boundary.

But we have to be more careful with s2. So check if it's within 8
bytes of a 4K boundary and if so go to the byte-by-byte loop.

Fixes: 2d9ee32 ("powerpc/64: Align bytes before fall back to .Lshort in powerpc64 memcmp()")
Cc: stable@vger.kernel.org # v4.19+
Reported-by: Chandan Rajendra <chandan@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org>
Tested-by: Chandan Rajendra <chandan@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
andr2000 pushed a commit that referenced this issue Apr 4, 2019
Inside sbitmap_queue_clear(), once the clear bit is set, it will be
visiable to allocation path immediately. Meantime READ/WRITE on old
associated instance(such as request in case of blk-mq) may be
out-of-order with the setting clear bit, so race with re-allocation
may be triggered.

Adds one memory barrier for ordering READ/WRITE of the freed associated
instance with setting clear bit for avoiding race with re-allocation.

The following kernel oops triggerd by block/006 on aarch64 may be fixed:

[  142.330954] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000330
[  142.338794] Mem abort info:
[  142.341554]   ESR = 0x96000005
[  142.344632]   Exception class = DABT (current EL), IL = 32 bits
[  142.350500]   SET = 0, FnV = 0
[  142.353544]   EA = 0, S1PTW = 0
[  142.356678] Data abort info:
[  142.359528]   ISV = 0, ISS = 0x00000005
[  142.363343]   CM = 0, WnR = 0
[  142.366305] user pgtable: 64k pages, 48-bit VAs, pgdp = 000000002a3c51c0
[  142.372983] [0000000000000330] pgd=0000000000000000, pud=0000000000000000
[  142.379777] Internal error: Oops: 96000005 [#1] SMP
[  142.384613] Modules linked in: null_blk ib_isert iscsi_target_mod ib_srpt target_core_mod ib_srp scsi_transport_srp vfat fat rpcrdma sunrpc rdma_ucm ib_iser rdma_cm iw_cm libiscsi ib_umad scsi_transport_iscsi ib_ipoib ib_cm mlx5_ib ib_uverbs ib_core sbsa_gwdt crct10dif_ce ghash_ce ipmi_ssif sha2_ce ipmi_devintf sha256_arm64 sg sha1_ce ipmi_msghandler ip_tables xfs libcrc32c mlx5_core sdhci_acpi mlxfw ahci_platform at803x sdhci libahci_platform qcom_emac mmc_core hdma hdma_mgmt i2c_dev [last unloaded: null_blk]
[  142.429753] CPU: 7 PID: 1983 Comm: fio Not tainted 5.0.0.cki #2
[  142.449458] pstate: 00400005 (nzcv daif +PAN -UAO)
[  142.454239] pc : __blk_mq_free_request+0x4c/0xa8
[  142.458830] lr : blk_mq_free_request+0xec/0x118
[  142.463344] sp : ffff00003360f6a0
[  142.466646] x29: ffff00003360f6a0 x28: ffff000010e70000
[  142.471941] x27: ffff801729a50048 x26: 0000000000010000
[  142.477232] x25: ffff00003360f954 x24: ffff7bdfff021440
[  142.482529] x23: 0000000000000000 x22: 00000000ffffffff
[  142.487830] x21: ffff801729810000 x20: 0000000000000000
[  142.493123] x19: ffff801729a50000 x18: 0000000000000000
[  142.498413] x17: 0000000000000000 x16: 0000000000000001
[  142.503709] x15: 00000000000000ff x14: ffff7fe000000000
[  142.509003] x13: ffff8017dcde09a0 x12: 0000000000000000
[  142.514308] x11: 0000000000000001 x10: 0000000000000008
[  142.519597] x9 : ffff8017dcde09a0 x8 : 0000000000002000
[  142.524889] x7 : ffff8017dcde0a00 x6 : 000000015388f9be
[  142.530187] x5 : 0000000000000001 x4 : 0000000000000000
[  142.535478] x3 : 0000000000000000 x2 : 0000000000000000
[  142.540777] x1 : 0000000000000001 x0 : ffff00001041b194
[  142.546071] Process fio (pid: 1983, stack limit = 0x000000006460a0ea)
[  142.552500] Call trace:
[  142.554926]  __blk_mq_free_request+0x4c/0xa8
[  142.559181]  blk_mq_free_request+0xec/0x118
[  142.563352]  blk_mq_end_request+0xfc/0x120
[  142.567444]  end_cmd+0x3c/0xa8 [null_blk]
[  142.571434]  null_complete_rq+0x20/0x30 [null_blk]
[  142.576194]  blk_mq_complete_request+0x108/0x148
[  142.580797]  null_handle_cmd+0x1d4/0x718 [null_blk]
[  142.585662]  null_queue_rq+0x60/0xa8 [null_blk]
[  142.590171]  blk_mq_try_issue_directly+0x148/0x280
[  142.594949]  blk_mq_try_issue_list_directly+0x9c/0x108
[  142.600064]  blk_mq_sched_insert_requests+0xb0/0xd0
[  142.604926]  blk_mq_flush_plug_list+0x16c/0x2a0
[  142.609441]  blk_flush_plug_list+0xec/0x118
[  142.613608]  blk_finish_plug+0x3c/0x4c
[  142.617348]  blkdev_direct_IO+0x3b4/0x428
[  142.621336]  generic_file_read_iter+0x84/0x180
[  142.625761]  blkdev_read_iter+0x50/0x78
[  142.629579]  aio_read.isra.6+0xf8/0x190
[  142.633409]  __io_submit_one.isra.8+0x148/0x738
[  142.637912]  io_submit_one.isra.9+0x88/0xb8
[  142.642078]  __arm64_sys_io_submit+0xe0/0x238
[  142.646428]  el0_svc_handler+0xa0/0x128
[  142.650238]  el0_svc+0x8/0xc
[  142.653104] Code: b9402a63 f9000a7f 3100047f 540000a0 (f9419a81)
[  142.659202] ---[ end trace 467586bc175eb09d ]---

Fixes: ea86ea2 ("sbitmap: ammortize cost of clearing bits")
Reported-and-bisected_and_tested-by: Yi Zhang <yi.zhang@redhat.com>
Cc: Yi Zhang <yi.zhang@redhat.com>
Cc: "jianchao.wang" <jianchao.w.wang@oracle.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
andr2000 pushed a commit that referenced this issue Apr 4, 2019
Function __hw_perf_event_init() used a CPU variable without
ensuring CPU preemption has been disabled. This caused the
following warning in the kernel log:

  [ 7.277085] BUG: using smp_processor_id() in preemptible
                 [00000000] code: cf-csdiag/1892
  [ 7.277111] caller is cf_diag_event_init+0x13a/0x338
  [ 7.277122] CPU: 10 PID: 1892 Comm: cf-csdiag Not tainted
                 5.0.0-20190318.rc0.git0.9e1a11e0f602.300.fc29.s390x+debug #1
  [ 7.277131] Hardware name: IBM 2964 NC9 712 (LPAR)
  [ 7.277139] Call Trace:
  [ 7.277150] ([<000000000011385a>] show_stack+0x82/0xd0)
  [ 7.277161]  [<0000000000b7a71a>] dump_stack+0x92/0xd0
  [ 7.277174]  [<00000000007b7e9c>] check_preemption_disabled+0xe4/0x100
  [ 7.277183]  [<00000000001228aa>] cf_diag_event_init+0x13a/0x338
  [ 7.277195]  [<00000000002cf3aa>] perf_try_init_event+0x72/0xf0
  [ 7.277204]  [<00000000002d0bba>] perf_event_alloc+0x6fa/0xce0
  [ 7.277214]  [<00000000002dc4a8>] __s390x_sys_perf_event_open+0x398/0xd50
  [ 7.277224]  [<0000000000b9e8f0>] system_call+0xdc/0x2d8
  [ 7.277233] 2 locks held by cf-csdiag/1892:
  [ 7.277241]  #0: 00000000976f5510 (&sig->cred_guard_mutex){+.+.},
                  at: __s390x_sys_perf_event_open+0xd2e/0xd50
  [ 7.277257]  #1: 00000000363b11bd (&pmus_srcu){....},
                  at: perf_event_alloc+0x52e/0xce0

The variable is now accessed in proper context. Use
get_cpu_var()/put_cpu_var() pair to disable
preemption during access.
As the hardware authorization settings apply to all CPUs, it
does not matter which CPU is used to check the authorization setting.

Remove the event->count assignment. It is not needed as function
perf_event_alloc() allocates memory for the event with kzalloc() and
thus count is already set to zero.

Fixes: fe5908b ("s390/cpum_cf_diag: Add support for s390 counter facility diagnostic trace")

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
andr2000 pushed a commit that referenced this issue Apr 4, 2019
Syzkaller reports:

kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] SMP KASAN PTI
CPU: 1 PID: 5373 Comm: syz-executor.0 Not tainted 5.0.0-rc8+ #3
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
RIP: 0010:put_links+0x101/0x440 fs/proc/proc_sysctl.c:1599
Code: 00 0f 85 3a 03 00 00 48 8b 43 38 48 89 44 24 20 48 83 c0 38 48 89 c2 48 89 44 24 28 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <80> 3c 02 00 0f 85 fe 02 00 00 48 8b 74 24 20 48 c7 c7 60 2a 9d 91
RSP: 0018:ffff8881d828f238 EFLAGS: 00010202
RAX: dffffc0000000000 RBX: ffff8881e01b1140 RCX: ffffffff8ee98267
RDX: 0000000000000007 RSI: ffffc90001479000 RDI: ffff8881e01b1178
RBP: dffffc0000000000 R08: ffffed103ee27259 R09: ffffed103ee27259
R10: 0000000000000001 R11: ffffed103ee27258 R12: fffffffffffffff4
R13: 0000000000000006 R14: ffff8881f59838c0 R15: dffffc0000000000
FS:  00007f072254f700(0000) GS:ffff8881f7100000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fff8b286668 CR3: 00000001f0542002 CR4: 00000000007606e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
 drop_sysctl_table+0x152/0x9f0 fs/proc/proc_sysctl.c:1629
 get_subdir fs/proc/proc_sysctl.c:1022 [inline]
 __register_sysctl_table+0xd65/0x1090 fs/proc/proc_sysctl.c:1335
 br_netfilter_init+0xbc/0x1000 [br_netfilter]
 do_one_initcall+0xfa/0x5ca init/main.c:887
 do_init_module+0x204/0x5f6 kernel/module.c:3460
 load_module+0x66b2/0x8570 kernel/module.c:3808
 __do_sys_finit_module+0x238/0x2a0 kernel/module.c:3902
 do_syscall_64+0x147/0x600 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x462e99
Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f072254ec58 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000000000073bf00 RCX: 0000000000462e99
RDX: 0000000000000000 RSI: 0000000020000280 RDI: 0000000000000003
RBP: 00007f072254ec70 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f072254f6bc
R13: 00000000004bcefa R14: 00000000006f6fb0 R15: 0000000000000004
Modules linked in: br_netfilter(+) dvb_usb_dibusb_mc_common dib3000mc dibx000_common dvb_usb_dibusb_common dvb_usb_dw2102 dvb_usb classmate_laptop palmas_regulator cn videobuf2_v4l2 v4l2_common snd_soc_bd28623 mptbase snd_usb_usx2y snd_usbmidi_lib snd_rawmidi wmi libnvdimm lockd sunrpc grace rc_kworld_pc150u rc_core rtc_da9063 sha1_ssse3 i2c_cros_ec_tunnel adxl34x_spi adxl34x nfnetlink lib80211 i5500_temp dvb_as102 dvb_core videobuf2_common videodev media videobuf2_vmalloc videobuf2_memops udc_core lnbp22 leds_lp3952 hid_roccat_ryos s1d13xxxfb mtd vport_geneve openvswitch nf_conncount nf_nat_ipv6 nsh geneve udp_tunnel ip6_udp_tunnel snd_soc_mt6351 sis_agp phylink snd_soc_adau1761_spi snd_soc_adau1761 snd_soc_adau17x1 snd_soc_core snd_pcm_dmaengine ac97_bus snd_compress snd_soc_adau_utils snd_soc_sigmadsp_regmap snd_soc_sigmadsp raid_class hid_roccat_konepure hid_roccat_common hid_roccat c2port_duramar2150 core mdio_bcm_unimac iptable_security iptable_raw iptable_mangle
 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter bpfilter ip6_vti ip_vti ip_gre ipip sit tunnel4 ip_tunnel hsr veth netdevsim devlink vxcan batman_adv cfg80211 rfkill chnl_net caif nlmon dummy team bonding vcan bridge stp llc ip6_gre gre ip6_tunnel tunnel6 tun crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel joydev mousedev ide_pci_generic piix aesni_intel aes_x86_64 ide_core crypto_simd atkbd cryptd glue_helper serio_raw ata_generic pata_acpi i2c_piix4 floppy sch_fq_codel ip_tables x_tables ipv6 [last unloaded: lm73]
Dumping ftrace buffer:
   (ftrace buffer empty)
---[ end trace 770020de38961fd0 ]---

A new dir entry can be created in get_subdir and its 'header->parent' is
set to NULL.  Only after insert_header success, it will be set to 'dir',
otherwise 'header->parent' is set to NULL and drop_sysctl_table is called.
However in err handling path of get_subdir, drop_sysctl_table also be
called on 'new->header' regardless its value of parent pointer.  Then
put_links is called, which triggers NULL-ptr deref when access member of
header->parent.

In fact we have multiple error paths which call drop_sysctl_table() there,
upon failure on insert_links() we also call drop_sysctl_table().And even
in the successful case on __register_sysctl_table() we still always call
drop_sysctl_table().This patch fix it.

Link: http://lkml.kernel.org/r/20190314085527.13244-1-yuehaibing@huawei.com
Fixes: 0e47c99 ("sysctl: Replace root_list with links between sysctl_table_sets")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reported-by: Hulk Robot <hulkci@huawei.com>
Acked-by: Luis Chamberlain <mcgrof@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: <stable@vger.kernel.org>    [3.4+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
andr2000 pushed a commit that referenced this issue Apr 5, 2019
Syzkaller report this:

kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] SMP KASAN PTI
CPU: 0 PID: 4378 Comm: syz-executor.0 Tainted: G         C        5.0.0+ xen-troops#5
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
RIP: 0010:__lock_acquire+0x95b/0x3200 kernel/locking/lockdep.c:3573
Code: 00 0f 85 28 1e 00 00 48 81 c4 08 01 00 00 5b 5d 41 5c 41 5d 41 5e 41 5f c3 4c 89 ea 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <80> 3c 02 00 0f 85 cc 24 00 00 49 81 7d 00 e0 de 03 a6 41 bc 00 00
RSP: 0018:ffff8881e3c07a40 EFLAGS: 00010002
RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000010 RSI: 0000000000000000 RDI: 0000000000000080
RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000
R10: ffff8881e3c07d98 R11: ffff8881c7f21f80 R12: 0000000000000001
R13: 0000000000000080 R14: 0000000000000000 R15: 0000000000000001
FS:  00007fce2252e700(0000) GS:ffff8881f2400000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fffc7eb0228 CR3: 00000001e5bea002 CR4: 00000000007606f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
 lock_acquire+0xff/0x2c0 kernel/locking/lockdep.c:4211
 __mutex_lock_common kernel/locking/mutex.c:925 [inline]
 __mutex_lock+0xdf/0x1050 kernel/locking/mutex.c:1072
 drain_workqueue+0x24/0x3f0 kernel/workqueue.c:2934
 destroy_workqueue+0x23/0x630 kernel/workqueue.c:4319
 __do_sys_delete_module kernel/module.c:1018 [inline]
 __se_sys_delete_module kernel/module.c:961 [inline]
 __x64_sys_delete_module+0x30c/0x480 kernel/module.c:961
 do_syscall_64+0x9f/0x450 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x462e99
Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fce2252dc58 EFLAGS: 00000246 ORIG_RAX: 00000000000000b0
RAX: ffffffffffffffda RBX: 000000000073bf00 RCX: 0000000000462e99
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000020000140
RBP: 0000000000000002 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007fce2252e6bc
R13: 00000000004bcca9 R14: 00000000006f6b48 R15: 00000000ffffffff

If alloc_workqueue fails, it should return -ENOMEM, otherwise may
trigger this NULL pointer dereference while unloading drivers.

Reported-by: Hulk Robot <hulkci@huawei.com>
Fixes: 0a38c17 ("fm10k: Remove create_workqueue")
Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
andr2000 pushed a commit that referenced this issue Apr 5, 2019
Andrii Nakryiko says:

====================
This patch set fixes bug in btf_dedup_is_equiv() check mishandling equivalence
comparison between VOID kind in candidate type graph versus anonymous non-VOID
kind in canonical type graph.

Patch #1 fixes bug, by comparing candidate and canonical kinds for equality,
before proceeding to kind-specific checks.
Patch #2 adds a test case testing this specific scenario.
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
andr2000 pushed a commit that referenced this issue Apr 5, 2019
__netif_receive_skb_list_ptype() leaves skb->next poisoned before passing
it to pt_prev->func handler, what may produce (in certain cases, e.g. DSA
setup) crashes like:

[ 88.606777] CPU 0 Unable to handle kernel paging request at virtual address 0000000e, epc == 80687078, ra == 8052cc7c
[ 88.618666] Oops[#1]:
[ 88.621196] CPU: 0 PID: 0 Comm: swapper Not tainted 5.1.0-rc2-dlink-00206-g4192a172-dirty #1473
[ 88.630885] $ 0 : 00000000 10000400 00000002 864d7850
[ 88.636709] $ 4 : 87c0ddf0 864d7800 87c0ddf0 00000000
[ 88.642526] $ 8 : 00000000 49600000 00000001 00000001
[ 88.648342] $12 : 00000000 c288617b dadbee27 25d17c41
[ 88.654159] $16 : 87c0ddf0 85cff080 80790000 fffffffd
[ 88.659975] $20 : 80797b20 ffffffff 00000001 864d7800
[ 88.665793] $24 : 00000000 8011e658
[ 88.671609] $28 : 80790000 87c0dbc0 87cabf00 8052cc7c
[ 88.677427] Hi : 00000003
[ 88.680622] Lo : 7b5b4220
[ 88.683840] epc : 80687078 vlan_dev_hard_start_xmit+0x1c/0x1a0
[ 88.690532] ra : 8052cc7c dev_hard_start_xmit+0xac/0x188
[ 88.696734] Status: 10000404	IEp
[ 88.700422] Cause : 50000008 (ExcCode 02)
[ 88.704874] BadVA : 0000000e
[ 88.708069] PrId : 0001a120 (MIPS interAptiv (multi))
[ 88.713005] Modules linked in:
[ 88.716407] Process swapper (pid: 0, threadinfo=(ptrval), task=(ptrval), tls=00000000)
[ 88.725219] Stack : 85f61c28 00000000 0000000e 80780000 87c0ddf0 85cff080 80790000 8052cc7c
[ 88.734529] 87cabf00 00000000 00000001 85f5fb40 807b0000 864d7850 87cabf00 807d0000
[ 88.743839] 864d7800 8655f600 00000000 85cff080 87c1c000 0000006a 00000000 8052d96c
[ 88.753149] 807a0000 8057adb8 87c0dcc8 87c0dc50 85cfff08 00000558 87cabf00 85f58c50
[ 88.762460] 00000002 85f58c00 864d7800 80543308 fffffff4 00000001 85f58c00 864d7800
[ 88.771770] ...
[ 88.774483] Call Trace:
[ 88.777199] [<80687078>] vlan_dev_hard_start_xmit+0x1c/0x1a0
[ 88.783504] [<8052cc7c>] dev_hard_start_xmit+0xac/0x188
[ 88.789326] [<8052d96c>] __dev_queue_xmit+0x6e8/0x7d4
[ 88.794955] [<805a8640>] ip_finish_output2+0x238/0x4d0
[ 88.800677] [<805ab6a0>] ip_output+0xc8/0x140
[ 88.805526] [<805a68f4>] ip_forward+0x364/0x560
[ 88.810567] [<805a4ff8>] ip_rcv+0x48/0xe4
[ 88.815030] [<80528d44>] __netif_receive_skb_one_core+0x44/0x58
[ 88.821635] [<8067f220>] dsa_switch_rcv+0x108/0x1ac
[ 88.827067] [<80528f80>] __netif_receive_skb_list_core+0x228/0x26c
[ 88.833951] [<8052ed84>] netif_receive_skb_list+0x1d4/0x394
[ 88.840160] [<80355a88>] lunar_rx_poll+0x38c/0x828
[ 88.845496] [<8052fa78>] net_rx_action+0x14c/0x3cc
[ 88.850835] [<806ad300>] __do_softirq+0x178/0x338
[ 88.856077] [<8012a2d4>] irq_exit+0xbc/0x100
[ 88.860846] [<802f8b70>] plat_irq_dispatch+0xc0/0x144
[ 88.866477] [<80105974>] handle_int+0x14c/0x158
[ 88.871516] [<806acfb0>] r4k_wait+0x30/0x40
[ 88.876462] Code: afb10014 8c8200a0 00803025 <9443000c> 94a20468 00000000 10620042 00a08025 9605046a
[ 88.887332]
[ 88.888982] ---[ end trace eb863d007da11cf1 ]---
[ 88.894122] Kernel panic - not syncing: Fatal exception in interrupt
[ 88.901202] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---

Fix this by pulling skb off the sublist and zeroing skb->next pointer
before calling ptype callback.

Fixes: 88eb194 ("net: core: propagate SKB lists through packet_type lookup")
Reviewed-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: Alexander Lobakin <alobakin@dlink.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
andr2000 pushed a commit that referenced this issue Apr 5, 2019
The FIXMAP area overlaps with VMALLOC area in Linux-5.1-rc1 hence we get
below warning in Linux RISC-V 32bit kernel. This warning does not show-up
in Linux RISC-V 64bit kernel due to large VMALLOC area.

WARNING: CPU: 0 PID: 22 at mm/vmalloc.c:150 vmap_page_range_noflush+0x134/0x15c
Modules linked in:
CPU: 0 PID: 22 Comm: kworker/0:1 Not tainted 5.1.0-rc1-00005-gebc2f658040e #1
Workqueue: events pcpu_balance_workfn
Call Trace:
[<c002b950>] walk_stackframe+0x0/0xa0
[<c002baac>] show_stack+0x28/0x32
[<c0587354>] dump_stack+0x62/0x7e
[<c002fdee>] __warn+0x98/0xce
[<c002fe52>] warn_slowpath_null+0x2e/0x3c
[<c00e71ce>] vmap_page_range_noflush+0x134/0x15c
[<c00e7886>] map_kernel_range_noflush+0xc/0x14
[<c00d54b8>] pcpu_populate_chunk+0x19e/0x236
[<c00d610e>] pcpu_balance_workfn+0x448/0x464
[<c00408d6>] process_one_work+0x16c/0x2ea
[<c0040b46>] worker_thread+0xf2/0x3b2
[<c004519a>] kthread+0xce/0xdc
[<c002a974>] ret_from_exception+0x0/0xc

This patch fixes above warning by placing FIXMAP area below VMALLOC area.

Fixes: f2c17aa ("RISC-V: Implement compile-time fixed mappings")
Signed-off-by: Anup Patel <anup.patel@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
andr2000 pushed a commit that referenced this issue Apr 5, 2019
If drm_modeset_lock() reports a deadlock it sets the ctx->contexted
field and insists that the caller calls drm_modeset_backoff() or else it
generates a WARN on cleanup.

<4> [1601.870376] WARNING: CPU: 3 PID: 8445 at drivers/gpu/drm/drm_modeset_lock.c:228 drm_modeset_drop_locks+0x35/0x40
<4> [1601.870395] Modules linked in: vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic x86_pkg_temp_thermal i915 coretemp crct10dif_pclmul
<6> [1601.870403] Console: switching
<4> [1601.870403]  snd_hda_intel
<4> [1601.870406] to colour frame buffer device 320x90
<4> [1601.870406]  crc32_pclmul snd_hda_codec snd_hwdep ghash_clmulni_intel e1000e snd_hda_core cdc_ether ptp usbnet mii pps_core snd_pcm i2c_i801 mei_me mei prime_numbers
<4> [1601.870422] CPU: 3 PID: 8445 Comm: cat Tainted: G     U            5.0.0-rc7-CI-CI_DRM_5650+ #1
<4> [1601.870424] Hardware name: Intel Corporation Ice Lake Client Platform/IceLake U DDR4 SODIMM PD RVP TLC, BIOS ICLSFWR1.R00.2402.AD3.1810170014 10/17/2018
<4> [1601.870427] RIP: 0010:drm_modeset_drop_locks+0x35/0x40
<4> [1601.870430] Code: 29 48 8b 43 60 48 8d 6b 60 48 39 c5 74 19 48 8b 43 60 48 8d b8 70 ff ff ff e8 87 ff ff ff 48 8b 43 60 48 39 c5 75 e7 5b 5d c3 <0f> 0b eb d3 0f 1f 80 00 00 00 00 41 56 41 55 41 54 55 53 48 8b 6f
<4> [1601.870432] RSP: 0018:ffffc90000d67ce8 EFLAGS: 00010282
<4> [1601.870435] RAX: 00000000ffffffdd RBX: ffffc90000d67d00 RCX: 5dbbe23d00000000
<4> [1601.870437] RDX: 0000000000000000 RSI: 0000000093e6194a RDI: ffffc90000d67d00
<4> [1601.870439] RBP: ffff88849e62e678 R08: 0000000003b7329a R09: 0000000000000001
<4> [1601.870441] R10: 0000000000000000 R11: 0000000000000000 R12: ffff888492100410
<4> [1601.870442] R13: ffff88849ea50958 R14: ffff8884a67eb028 R15: ffff8884a67eb028
<4> [1601.870445] FS:  00007fa7a27745c0(0000) GS:ffff8884aff80000(0000) knlGS:0000000000000000
<4> [1601.870447] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4> [1601.870449] CR2: 000055af07e66000 CR3: 00000004a8cc2006 CR4: 0000000000760ee0
<4> [1601.870451] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4> [1601.870453] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
<4> [1601.870454] PKRU: 55555554
<4> [1601.870456] Call Trace:
<4> [1601.870505]  i915_dsc_fec_support_show+0x91/0x190 [i915]
<4> [1601.870522]  seq_read+0xdb/0x3c0
<4> [1601.870531]  full_proxy_read+0x51/0x80
<4> [1601.870538]  __vfs_read+0x31/0x190
<4> [1601.870546]  ? __se_sys_newfstat+0x3c/0x60
<4> [1601.870552]  vfs_read+0x9e/0x150
<4> [1601.870557]  ksys_read+0x50/0xc0
<4> [1601.870564]  do_syscall_64+0x55/0x190
<4> [1601.870569]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4> [1601.870572] RIP: 0033:0x7fa7a226d081
<4> [1601.870574] Code: fe ff ff 48 8d 3d 67 9c 0a 00 48 83 ec 08 e8 a6 4c 02 00 66 0f 1f 44 00 00 48 8d 05 81 08 2e 00 8b 00 85 c0 75 13 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 57 f3 c3 0f 1f 44 00 00 41 54 55 49 89 d4 53
<4> [1601.870576] RSP: 002b:00007ffcc05140c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
<4> [1601.870579] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007fa7a226d081
<4> [1601.870581] RDX: 0000000000020000 RSI: 000055af07e63000 RDI: 0000000000000007
<4> [1601.870583] RBP: 0000000000020000 R08: 000000000000007b R09: 0000000000000000
<4> [1601.870585] R10: 000055af07e60010 R11: 0000000000000246 R12: 000055af07e63000
<4> [1601.870587] R13: 0000000000000007 R14: 000055af07e634bf R15: 0000000000020000

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109745
Fixes: e845f09 ("drm/i915/dsc: Add Per connector debugfs node for DSC support/enable")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Ville Syrjala <ville.syrjala@linux.intel.com>
Cc: Anusha Srivatsa <anusha.srivatsa@intel.com>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Manasi Navare <manasi.d.navare@intel.com>
Reviewed-by: Manasi Navare <manasi.d.navare@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190329165152.29259-1-chris@chris-wilson.co.uk
(cherry picked from commit ee6df56)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
andr2000 pushed a commit that referenced this issue Apr 5, 2019
When kcm is loaded while many processes try to create a KCM socket, a
crash occurs:
 BUG: unable to handle kernel NULL pointer dereference at 000000000000000e
 IP: mutex_lock+0x27/0x40 kernel/locking/mutex.c:240
 PGD 8000000016ef2067 P4D 8000000016ef2067 PUD 3d6e9067 PMD 0
 Oops: 0002 [#1] SMP KASAN PTI
 CPU: 0 PID: 7005 Comm: syz-executor.5 Not tainted 4.12.14-396-default #1 SLE15-SP1 (unreleased)
 RIP: 0010:mutex_lock+0x27/0x40 kernel/locking/mutex.c:240
 RSP: 0018:ffff88000d487a00 EFLAGS: 00010246
 RAX: 0000000000000000 RBX: 000000000000000e RCX: 1ffff100082b0719
 ...
 CR2: 000000000000000e CR3: 000000004b1bc003 CR4: 0000000000060ef0
 Call Trace:
  kcm_create+0x600/0xbf0 [kcm]
  __sock_create+0x324/0x750 net/socket.c:1272
 ...

This is due to race between sock_create and unfinished
register_pernet_device. kcm_create tries to do "net_generic(net,
kcm_net_id)". but kcm_net_id is not initialized yet.

So switch the order of the two to close the race.

This can be reproduced with mutiple processes doing socket(PF_KCM, ...)
and one process doing module removal.

Fixes: ab7ac4e ("kcm: Kernel Connection Multiplexor module")
Reviewed-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
andr2000 pushed a commit that referenced this issue Apr 5, 2019
When reboot the system again and again, may cause a memory
overwrite.

[   15.638922] systemd[1]: Reached target Swap.
[   15.667561] tun: Universal TUN/TAP device driver, 1.6
[   15.676756] Bridge firewalling registered
[   17.344135] Unable to handle kernel paging request at virtual address 0000000200000040
[   17.352179] Mem abort info:
[   17.355007]   ESR = 0x96000004
[   17.358105]   Exception class = DABT (current EL), IL = 32 bits
[   17.364112]   SET = 0, FnV = 0
[   17.367209]   EA = 0, S1PTW = 0
[   17.370393] Data abort info:
[   17.373315]   ISV = 0, ISS = 0x00000004
[   17.377206]   CM = 0, WnR = 0
[   17.380214] user pgtable: 4k pages, 48-bit VAs, pgdp = (____ptrval____)
[   17.386926] [0000000200000040] pgd=0000000000000000
[   17.391878] Internal error: Oops: 96000004 [#1] SMP
[   17.396824] CPU: 23 PID: 95 Comm: kworker/u130:0 Tainted: G            E     4.19.25-1.2.78.aarch64 #1
[   17.414175] Hardware name: Huawei TaiShan 2280 /BC11SPCD, BIOS 1.54 08/16/2018
[   17.425615] Workqueue: events_unbound async_run_entry_fn
[   17.435151] pstate: 00000005 (nzcv daif -PAN -UAO)
[   17.444139] pc : __mutex_lock.isra.1+0x74/0x540
[   17.453002] lr : __mutex_lock.isra.1+0x3c/0x540
[   17.461701] sp : ffff000100d9bb60
[   17.469146] x29: ffff000100d9bb60 x28: 0000000000000000
[   17.478547] x27: 0000000000000000 x26: ffff802fb8945000
[   17.488063] x25: 0000000000000000 x24: ffff802fa32081a8
[   17.497381] x23: 0000000000000002 x22: ffff801fa2b15220
[   17.506701] x21: ffff000009809000 x20: ffff802fa23a0888
[   17.515980] x19: ffff801fa2b15220 x18: 0000000000000000
[   17.525272] x17: 0000000200000000 x16: 0000000200000000
[   17.534511] x15: 0000000000000000 x14: 0000000000000000
[   17.543652] x13: ffff000008d95db8 x12: 000000000000000d
[   17.552780] x11: ffff000008d95d90 x10: 0000000000000b00
[   17.561819] x9 : ffff000100d9bb90 x8 : ffff802fb89d6560
[   17.570829] x7 : 0000000000000004 x6 : 00000004a1801d05
[   17.579839] x5 : 0000000000000000 x4 : 0000000000000000
[   17.588852] x3 : ffff802fb89d5a00 x2 : 0000000000000000
[   17.597734] x1 : 0000000200000000 x0 : 0000000200000000
[   17.606631] Process kworker/u130:0 (pid: 95, stack limit = 0x(____ptrval____))
[   17.617438] Call trace:
[   17.623349]  __mutex_lock.isra.1+0x74/0x540
[   17.630927]  __mutex_lock_slowpath+0x24/0x30
[   17.638602]  mutex_lock+0x50/0x60
[   17.645295]  drain_workqueue+0x34/0x198
[   17.652623]  __sas_drain_work+0x7c/0x168
[   17.659903]  sas_drain_work+0x60/0x68
[   17.666947]  hisi_sas_scan_finished+0x30/0x40 [hisi_sas_main]
[   17.676129]  do_scsi_scan_host+0x70/0xb0
[   17.683534]  do_scan_async+0x20/0x228
[   17.690586]  async_run_entry_fn+0x4c/0x1d0
[   17.697997]  process_one_work+0x1b4/0x3f8
[   17.705296]  worker_thread+0x54/0x470

Every time the call trace is not the same, but the overwrite address
is always the same:
Unable to handle kernel paging request at virtual address 0000000200000040

The root cause is, when write the reg XGMAC_MAC_TX_LF_RF_CONTROL_REG,
didn't use the io_base offset.

Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
andr2000 pushed a commit that referenced this issue Apr 5, 2019
When a bpf program is uploaded, the driver computes the number of
xdp tx queues resulting in the allocation of additional qsets.
Starting from commit '2ecbe4f4a027 ("net: thunderx: replace global
nicvf_rx_mode_wq work queue for all VFs to private for each of them")'
the driver runs link state polling for each VF resulting in the
following NULL pointer dereference:

[   56.169256] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000020
[   56.178032] Mem abort info:
[   56.180834]   ESR = 0x96000005
[   56.183877]   Exception class = DABT (current EL), IL = 32 bits
[   56.189792]   SET = 0, FnV = 0
[   56.192834]   EA = 0, S1PTW = 0
[   56.195963] Data abort info:
[   56.198831]   ISV = 0, ISS = 0x00000005
[   56.202662]   CM = 0, WnR = 0
[   56.205619] user pgtable: 64k pages, 48-bit VAs, pgdp = 0000000021f0c7a0
[   56.212315] [0000000000000020] pgd=0000000000000000, pud=0000000000000000
[   56.219094] Internal error: Oops: 96000005 [#1] SMP
[   56.260459] CPU: 39 PID: 2034 Comm: ip Not tainted 5.1.0-rc3+ #3
[   56.266452] Hardware name: GIGABYTE R120-T33/MT30-GS1, BIOS T49 02/02/2018
[   56.273315] pstate: 80000005 (Nzcv daif -PAN -UAO)
[   56.278098] pc : __ll_sc___cmpxchg_case_acq_64+0x4/0x20
[   56.283312] lr : mutex_lock+0x2c/0x50
[   56.286962] sp : ffff0000219af1b0
[   56.290264] x29: ffff0000219af1b0 x28: ffff800f64de49a0
[   56.295565] x27: 0000000000000000 x26: 0000000000000015
[   56.300865] x25: 0000000000000000 x24: 0000000000000000
[   56.306165] x23: 0000000000000000 x22: ffff000011117000
[   56.311465] x21: ffff800f64dfc080 x20: 0000000000000020
[   56.316766] x19: 0000000000000020 x18: 0000000000000001
[   56.322066] x17: 0000000000000000 x16: ffff800f2e077080
[   56.327367] x15: 0000000000000004 x14: 0000000000000000
[   56.332667] x13: ffff000010964438 x12: 0000000000000002
[   56.337967] x11: 0000000000000000 x10: 0000000000000c70
[   56.343268] x9 : ffff0000219af120 x8 : ffff800f2e077d50
[   56.348568] x7 : 0000000000000027 x6 : 000000062a9d6a84
[   56.353869] x5 : 0000000000000000 x4 : ffff800f2e077480
[   56.359169] x3 : 0000000000000008 x2 : ffff800f2e077080
[   56.364469] x1 : 0000000000000000 x0 : 0000000000000020
[   56.369770] Process ip (pid: 2034, stack limit = 0x00000000c862da3a)
[   56.376110] Call trace:
[   56.378546]  __ll_sc___cmpxchg_case_acq_64+0x4/0x20
[   56.383414]  drain_workqueue+0x34/0x198
[   56.387247]  nicvf_open+0x48/0x9e8 [nicvf]
[   56.391334]  nicvf_open+0x898/0x9e8 [nicvf]
[   56.395507]  nicvf_xdp+0x1bc/0x238 [nicvf]
[   56.399595]  dev_xdp_install+0x68/0x90
[   56.403333]  dev_change_xdp_fd+0xc8/0x240
[   56.407333]  do_setlink+0x8e0/0xbe8
[   56.410810]  __rtnl_newlink+0x5b8/0x6d8
[   56.414634]  rtnl_newlink+0x54/0x80
[   56.418112]  rtnetlink_rcv_msg+0x22c/0x2f8
[   56.422199]  netlink_rcv_skb+0x60/0x120
[   56.426023]  rtnetlink_rcv+0x28/0x38
[   56.429587]  netlink_unicast+0x1c8/0x258
[   56.433498]  netlink_sendmsg+0x1b4/0x350
[   56.437410]  sock_sendmsg+0x4c/0x68
[   56.440887]  ___sys_sendmsg+0x240/0x280
[   56.444711]  __sys_sendmsg+0x68/0xb0
[   56.448275]  __arm64_sys_sendmsg+0x2c/0x38
[   56.452361]  el0_svc_handler+0x9c/0x128
[   56.456186]  el0_svc+0x8/0xc
[   56.459056] Code: 35ffff91 2a1003e0 d65f03c0 f9800011 (c85ffc10)
[   56.465166] ---[ end trace 4a57fdc27b0a572c ]---
[   56.469772] Kernel panic - not syncing: Fatal exception

Fix it by checking nicvf_rx_mode_wq pointer in nicvf_open and nicvf_stop

Fixes: 2ecbe4f ("net: thunderx: replace global nicvf_rx_mode_wq work queue for all VFs to private for each of them")
Fixes: 2c632ad ("net: thunderx: move link state polling function to VF")
Reported-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Tested-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
andr2000 pushed a commit that referenced this issue Apr 5, 2019
the control path of 'sample' action does not validate the value of 'rate'
provided by the user, but then it uses it as divisor in the traffic path.
Validate it in tcf_sample_init(), and return -EINVAL with a proper extack
message in case that value is zero, to fix a splat with the script below:

 # tc f a dev test0 egress matchall action sample rate 0 group 1 index 2
 # tc -s a s action sample
 total acts 1

         action order 0: sample rate 1/0 group 1 pipe
          index 2 ref 1 bind 1 installed 19 sec used 19 sec
         Action statistics:
         Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
         backlog 0b 0p requeues 0
 # ping 192.0.2.1 -I test0 -c1 -q

 divide error: 0000 [#1] SMP PTI
 CPU: 1 PID: 6192 Comm: ping Not tainted 5.1.0-rc2.diag2+ torvalds#591
 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
 RIP: 0010:tcf_sample_act+0x9e/0x1e0 [act_sample]
 Code: 6a f1 85 c0 74 0d 80 3d 83 1a 00 00 00 0f 84 9c 00 00 00 4d 85 e4 0f 84 85 00 00 00 e8 9b d7 9c f1 44 8b 8b e0 00 00 00 31 d2 <41> f7 f1 85 d2 75 70 f6 85 83 00 00 00 10 48 8b 45 10 8b 88 08 01
 RSP: 0018:ffffae320190ba30 EFLAGS: 00010246
 RAX: 00000000b0677d21 RBX: ffff8af1ed9ec000 RCX: 0000000059a9fe49
 RDX: 0000000000000000 RSI: 000000000c7e33b7 RDI: ffff8af23daa0af0
 RBP: ffff8af1ee11b200 R08: 0000000074fcaf7e R09: 0000000000000000
 R10: 0000000000000050 R11: ffffffffb3088680 R12: ffff8af232307f80
 R13: 0000000000000003 R14: ffff8af1ed9ec000 R15: 0000000000000000
 FS:  00007fe9c6d2f740(0000) GS:ffff8af23da80000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00007fff6772f000 CR3: 00000000746a2004 CR4: 00000000001606e0
 Call Trace:
  tcf_action_exec+0x7c/0x1c0
  tcf_classify+0x57/0x160
  __dev_queue_xmit+0x3dc/0xd10
  ip_finish_output2+0x257/0x6d0
  ip_output+0x75/0x280
  ip_send_skb+0x15/0x40
  raw_sendmsg+0xae3/0x1410
  sock_sendmsg+0x36/0x40
  __sys_sendto+0x10e/0x140
  __x64_sys_sendto+0x24/0x30
  do_syscall_64+0x60/0x210
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
  [...]
  Kernel panic - not syncing: Fatal exception in interrupt

Add a TDC selftest to document that 'rate' is now being validated.

Reported-by: Matteo Croce <mcroce@redhat.com>
Fixes: 5c5670f ("net/sched: Introduce sample tc action")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Yotam Gigi <yotam.gi@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
andr2000 pushed a commit that referenced this issue Apr 5, 2019
Fix device initialization completion handling for vNIC adapters.
Initialize the completion structure on probe and reinitialize when needed.
This also fixes a race condition during kdump where the driver can attempt
to access the completion struct before it is initialized:

Unable to handle kernel paging request for data at address 0x00000000
Faulting instruction address: 0xc0000000081acbe0
Oops: Kernel access of bad area, sig: 11 [#1]
LE SMP NR_CPUS=2048 NUMA pSeries
Modules linked in: ibmvnic(+) ibmveth sunrpc overlay squashfs loop
CPU: 19 PID: 301 Comm: systemd-udevd Not tainted 4.18.0-64.el8.ppc64le #1
NIP:  c0000000081acbe0 LR: c0000000081ad964 CTR: c0000000081ad900
REGS: c000000027f3f990 TRAP: 0300   Not tainted  (4.18.0-64.el8.ppc64le)
MSR:  800000010280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]> CR: 28228288  XER: 00000006
CFAR: c000000008008934 DAR: 0000000000000000 DSISR: 40000000 IRQMASK: 1
GPR00: c0000000081ad964 c000000027f3fc10 c0000000095b5800 c0000000221b4e58
GPR04: 0000000000000003 0000000000000001 000049a086918581 00000000000000d4
GPR08: 0000000000000007 0000000000000000 ffffffffffffffe8 d0000000014dde28
GPR12: c0000000081ad900 c000000009a00c00 0000000000000001 0000000000000100
GPR16: 0000000000000038 0000000000000007 c0000000095e2230 0000000000000006
GPR20: 0000000000400140 0000000000000001 c00000000910c880 0000000000000000
GPR24: 0000000000000000 0000000000000006 0000000000000000 0000000000000003
GPR28: 0000000000000001 0000000000000001 c0000000221b4e60 c0000000221b4e58
NIP [c0000000081acbe0] __wake_up_locked+0x50/0x100
LR [c0000000081ad964] complete+0x64/0xa0
Call Trace:
[c000000027f3fc10] [c000000027f3fc60] 0xc000000027f3fc60 (unreliable)
[c000000027f3fc60] [c0000000081ad964] complete+0x64/0xa0
[c000000027f3fca0] [d0000000014dad58] ibmvnic_handle_crq+0xce0/0x1160 [ibmvnic]
[c000000027f3fd50] [d0000000014db270] ibmvnic_tasklet+0x98/0x130 [ibmvnic]
[c000000027f3fda0] [c00000000813f334] tasklet_action_common.isra.3+0xc4/0x1a0
[c000000027f3fe00] [c000000008cd13f4] __do_softirq+0x164/0x400
[c000000027f3fef0] [c00000000813ed64] irq_exit+0x184/0x1c0
[c000000027f3ff20] [c0000000080188e8] __do_irq+0xb8/0x210
[c000000027f3ff90] [c00000000802d0a4] call_do_irq+0x14/0x24
[c000000026a5b010] [c000000008018adc] do_IRQ+0x9c/0x130
[c000000026a5b060] [c000000008008ce4] hardware_interrupt_common+0x114/0x120

Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
andr2000 pushed a commit that referenced this issue Aug 13, 2020
…pping_late()

Commit

  84e6ffb ("arm: add support for folded p4d page tables")

updated create_mapping_late() to take folded P4Ds into account when
creating mappings, but inverted the p4d_alloc() failure test, resulting
in no mapping to be created at all.

When the EFI rtc driver subsequently tries to invoke the EFI GetTime()
service, the memory regions covering the EFI data structures are missing
from the page tables, resulting in a crash like

  Unable to handle kernel paging request at virtual address 5ae0cf28
  pgd = (ptrval)
  [5ae0cf28] *pgd=80000040205003, *pmd=00000000
  Internal error: Oops: 207 [#1] SMP THUMB2
  Modules linked in:
  CPU: 0 PID: 7 Comm: kworker/u32:0 Not tainted 5.7.0+ xen-troops#92
  Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
  Workqueue: efi_rts_wq efi_call_rts
  PC is at efi_call_rts+0x94/0x294
  LR is at efi_call_rts+0x83/0x294
  pc : [<c0b4f098>]    lr : [<c0b4f087>]    psr: 30000033
  sp : e6219ef0  ip : 00000000  fp : ffffe000
  r10: 00000000  r9 : 00000000  r8 : 30000013
  r7 : e6201dd0  r6 : e6201ddc  r5 : 00000000  r4 : c181f264
  r3 : 5ae0cf10  r2 : 00000001  r1 : e6201dd0  r0 : e6201ddc
  Flags: nzCV  IRQs on  FIQs on  Mode SVC_32  ISA Thumb  Segment none
  Control: 70c5383d  Table: 661cc840  DAC: 00000001
  Process kworker/u32:0 (pid: 7, stack limit = 0x(ptrval))
  ...
  [<c0b4f098>] (efi_call_rts) from [<c0448219>] (process_one_work+0x16d/0x3d8)
  [<c0448219>] (process_one_work) from [<c0448581>] (worker_thread+0xfd/0x408)
  [<c0448581>] (worker_thread) from [<c044ca7b>] (kthread+0x103/0x104)
  ...

Fixes: 84e6ffb ("arm: add support for folded p4d page tables")
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
andr2000 pushed a commit that referenced this issue Aug 13, 2020
mlx5e_rep_is_lag_netdev is used as first check as part of netdev events
handler for bond device of non-uplink representors, this handler can get
any netdevice under the same network namespace of mlx5e netdevice. Current
code treats the netdev as mlx5e netdev and only later on verifies this,
hence causes the following Kasan trace:
[15402.744990] ==================================================================
[15402.746942] BUG: KASAN: slab-out-of-bounds in mlx5e_rep_is_lag_netdev+0xcb/0xf0 [mlx5_core]
[15402.749009] Read of size 8 at addr ffff880391f3f6b0 by task ovs-vswitchd/5347

[15402.752065] CPU: 7 PID: 5347 Comm: ovs-vswitchd Kdump: loaded Tainted: G    B      O     --------- -t - 4.18.0-g3dcc204d291d-dirty #1
[15402.755349] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[15402.757600] Call Trace:
[15402.758968]  dump_stack+0x71/0xab
[15402.760427]  print_address_description+0x6a/0x270
[15402.761969]  kasan_report+0x179/0x2d0
[15402.763445]  ? mlx5e_rep_is_lag_netdev+0xcb/0xf0 [mlx5_core]
[15402.765121]  mlx5e_rep_is_lag_netdev+0xcb/0xf0 [mlx5_core]
[15402.766782]  mlx5e_rep_esw_bond_netevent+0x129/0x620 [mlx5_core]

Fix by deferring the violating access to be post the netdev verify check.

Fixes: 7e51891 ("net/mlx5e: Use netdev events to set/del egress acl forward-to-vport rule")
Signed-off-by: Raed Salem <raeds@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Vu Pham <vuhuong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
andr2000 pushed a commit that referenced this issue Aug 13, 2020
After the cited commit, function 'mlx5_eswitch_set_vport_vlan' started
to acquire esw->state_lock.
However, esw is not defined for VF devices, hence attempting to set vf
VLANID on a VF dev will cause a kernel panic.

Fix it by moving up the (redundant) esw validation from function
'__mlx5_eswitch_set_vport_vlan' since the rest of the callers now have
and use a valid esw.

For example with vf device eth4:
 # ip link set dev eth4 vf 0 vlan 0

Trace of the panic:
 [  411.409842] BUG: unable to handle page fault for address: 00000000000011b8
 [  411.449745] #PF: supervisor read access in kernel mode
 [  411.452348] #PF: error_code(0x0000) - not-present page
 [  411.454938] PGD 80000004189c9067 P4D 80000004189c9067 PUD 41899a067 PMD 0
 [  411.458382] Oops: 0000 [#1] SMP PTI
 [  411.460268] CPU: 4 PID: 5711 Comm: ip Not tainted 5.8.0-rc4_for_upstream_min_debug_2020_07_08_22_04 #1
 [  411.462447] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
 [  411.464158] RIP: 0010:__mutex_lock+0x4e/0x940
 [  411.464928] Code: fd 41 54 49 89 f4 41 52 53 89 d3 48 83 ec 70 44 8b 1d ee 03 b0 01 65 48 8b 04 25 28 00 00 00 48 89 45 c8 31 c0 45 85 db 75 0a <48> 3b 7f 60 0f 85 7e 05 00 00 49 8d 45 68 41 56 41 b8 01 00 00 00
 [  411.467678] RSP: 0018:ffff88841fcd74b0 EFLAGS: 00010246
 [  411.468562] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
 [  411.469715] RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000001158
 [  411.470812] RBP: ffff88841fcd7550 R08: ffffffffa00fa1ce R09: 0000000000000000
 [  411.471835] R10: ffff88841fcd7570 R11: 0000000000000000 R12: 0000000000000002
 [  411.472862] R13: 0000000000001158 R14: ffffffffa00fa1ce R15: 0000000000000000
 [  411.474004] FS:  00007faee7ca6b80(0000) GS:ffff88846fc00000(0000) knlGS:0000000000000000
 [  411.475237] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 [  411.476129] CR2: 00000000000011b8 CR3: 000000041909c006 CR4: 0000000000360ea0
 [  411.477260] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 [  411.478340] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 [  411.479332] Call Trace:
 [  411.479760]  ? __nla_validate_parse.part.6+0x57/0x8f0
 [  411.482825]  ? mlx5_eswitch_set_vport_vlan+0x3e/0xa0 [mlx5_core]
 [  411.483804]  mlx5_eswitch_set_vport_vlan+0x3e/0xa0 [mlx5_core]
 [  411.484733]  mlx5e_set_vf_vlan+0x41/0x50 [mlx5_core]
 [  411.485545]  do_setlink+0x613/0x1000
 [  411.486165]  __rtnl_newlink+0x53d/0x8c0
 [  411.486791]  ? mark_held_locks+0x49/0x70
 [  411.487429]  ? __lock_acquire+0x8fe/0x1eb0
 [  411.488085]  ? rcu_read_lock_sched_held+0x52/0x60
 [  411.488998]  ? kmem_cache_alloc_trace+0x16d/0x2d0
 [  411.489759]  rtnl_newlink+0x47/0x70
 [  411.490357]  rtnetlink_rcv_msg+0x24e/0x450
 [  411.490978]  ? netlink_deliver_tap+0x92/0x3d0
 [  411.491631]  ? validate_linkmsg+0x330/0x330
 [  411.492262]  netlink_rcv_skb+0x47/0x110
 [  411.492852]  netlink_unicast+0x1ac/0x270
 [  411.493551]  netlink_sendmsg+0x336/0x450
 [  411.494209]  sock_sendmsg+0x30/0x40
 [  411.494779]  ____sys_sendmsg+0x1dd/0x1f0
 [  411.495378]  ? copy_msghdr_from_user+0x5c/0x90
 [  411.496082]  ___sys_sendmsg+0x87/0xd0
 [  411.496683]  ? lock_acquire+0xb9/0x3a0
 [  411.497322]  ? lru_cache_add+0x5/0x170
 [  411.497944]  ? find_held_lock+0x2d/0x90
 [  411.498568]  ? handle_mm_fault+0xe46/0x18c0
 [  411.499205]  ? __sys_sendmsg+0x51/0x90
 [  411.499784]  __sys_sendmsg+0x51/0x90
 [  411.500341]  do_syscall_64+0x59/0x2e0
 [  411.500938]  ? asm_exc_page_fault+0x8/0x30
 [  411.501609]  ? rcu_read_lock_sched_held+0x52/0x60
 [  411.502350]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
 [  411.503093] RIP: 0033:0x7faee73b85a7
 [  411.503654] Code: Bad RIP value.

Fixes: 0e18134 ("net/mlx5e: Eswitch, use state_lock to synchronize vlan change")
Signed-off-by: Alaa Hleihel <alaa@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
andr2000 pushed a commit that referenced this issue Aug 13, 2020
It appears that not disabling a PCI device on .shutdown may lead to
a Hardware Error with particular (perhaps buggy) BIOS versions:

    mlx4_en: eth0: Close port called
    mlx4_en 0000:04:00.0: removed PHC
    reboot: Restarting system
    {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
    {1}[Hardware Error]: event severity: fatal
    {1}[Hardware Error]:  Error 0, type: fatal
    {1}[Hardware Error]:   section_type: PCIe error
    {1}[Hardware Error]:   port_type: 4, root port
    {1}[Hardware Error]:   version: 1.16
    {1}[Hardware Error]:   command: 0x4010, status: 0x0143
    {1}[Hardware Error]:   device_id: 0000:00:02.2
    {1}[Hardware Error]:   slot: 0
    {1}[Hardware Error]:   secondary_bus: 0x04
    {1}[Hardware Error]:   vendor_id: 0x8086, device_id: 0x2f06
    {1}[Hardware Error]:   class_code: 000604
    {1}[Hardware Error]:   bridge: secondary_status: 0x2000, control: 0x0003
    {1}[Hardware Error]:   aer_uncor_status: 0x00100000, aer_uncor_mask: 0x00000000
    {1}[Hardware Error]:   aer_uncor_severity: 0x00062030
    {1}[Hardware Error]:   TLP Header: 40000018 040000ff 791f4080 00000000
[hw error repeats]
    Kernel panic - not syncing: Fatal hardware error!
    CPU: 0 PID: 2189 Comm: reboot Kdump: loaded Not tainted 5.6.x-blabla #1
    Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 05/05/2017

Fix the mlx4 driver.

This is a very similar problem to what had been fixed in:
commit 0d98ba8 ("scsi: hpsa: disable device during shutdown")
to address https://bugzilla.kernel.org/show_bug.cgi?id=199779.

Fixes: 2ba5fbd ("net/mlx4_core: Handle AER flow properly")
Reported-by: Jake Lawrence <lawja@fb.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
andr2000 pushed a commit that referenced this issue Aug 13, 2020
Ido Schimmel says:

====================
mlxsw fixes

This patch set contains various fixes for mlxsw.

Patches #1-#2 fix two trap related issues introduced in previous cycle.

Patches #3-xen-troops#5 fix rare use-after-frees discovered by syzkaller. After
over a week of fuzzing with the fixes, the bugs did not reproduce.

Patch xen-troops#6 from Amit fixes an issue in the ethtool selftest that was
recently discovered after running the test on a new platform that
supports only 1Gbps and 10Gbps speeds.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
andr2000 pushed a commit that referenced this issue Aug 13, 2020
In DCT and RSS RAW QP creation flows, the QP mutex wasn't initialized and
the magic field inside lock was missing. This caused to the following
kernel warning for kernels build with CONFIG_DEBUG_MUTEXES.

 DEBUG_LOCKS_WARN_ON(lock->magic != lock)
 WARNING: CPU: 3 PID: 16261 at kernel/locking/mutex.c:938 __mutex_lock+0x60e/0x940
 Modules linked in: bonding nf_tables ipip tunnel4 geneve ip6_udp_tunnel udp_tunnel ip6_gre ip6_tunnel tunnel6 ip_gre gre ip_tunnel mlx5_ib mlx5_core mlxfw ptp pps_core rdma_ucm ib_uverbs ib_ipoib ib_umad openvswitch nsh xt_MASQUERADE nf_conntrack_netlink nfnetlink iptable_nat xt_addrtype xt_conntrack nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter overlay ib_srp scsi_transport_srp rpcrdma ib_iser libiscsi scsi_transport_iscsi rdma_cm iw_cm ib_cm ib_core [last unloaded: mlxfw]
 CPU: 3 PID: 16261 Comm: ib_send_bw Not tainted 5.8.0-rc4_for_upstream_min_debug_2020_07_08_22_04 #1
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
 RIP: 0010:__mutex_lock+0x60e/0x940
 Code: c0 0f 84 6d fa ff ff 44 8b 15 4e 9d ba 00 45 85 d2 0f 85 5d fa ff ff 48 c7 c6 f2 de 2b 82 48 c7 c7 f1 8a 2b 82 e8 d2 4d 72 ff <0f> 0b 4c 8b 4d 88 e9 3f fa ff ff f6 c2 04 0f 84 37 fe ff ff 48 89
 RSP: 0018:ffff88810bb8b870 EFLAGS: 00010286
 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
 RDX: ffff88829f1dd880 RSI: 0000000000000000 RDI: ffffffff81192afa
 RBP: ffff88810bb8b910 R08: 0000000000000000 R09: 0000000000000028
 R10: 0000000000000000 R11: 0000000000003f85 R12: 0000000000000002
 R13: ffff88827d8d3ce0 R14: ffffffffa059f615 R15: ffff8882a4d02610
 FS:  00007f3f6988e740(0000) GS:ffff8882f5b80000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000556556158000 CR3: 000000010a63c005 CR4: 0000000000360ea0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 Call Trace:
  ? cmd_exec+0x947/0xe60 [mlx5_core]
  ? __mutex_lock+0x76/0x940
  ? mlx5_ib_qp_set_counter+0x25/0xa0 [mlx5_ib]
  mlx5_ib_qp_set_counter+0x25/0xa0 [mlx5_ib]
  mlx5_ib_counter_bind_qp+0x9b/0xe0 [mlx5_ib]
  __rdma_counter_bind_qp+0x6b/0xa0 [ib_core]
  rdma_counter_bind_qp_auto+0x363/0x520 [ib_core]
  _ib_modify_qp+0x316/0x580 [ib_core]
  ib_modify_qp_with_udata+0x19/0x30 [ib_core]
  modify_qp+0x4c4/0x600 [ib_uverbs]
  ib_uverbs_ex_modify_qp+0x87/0xe0 [ib_uverbs]
  ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0x129/0x1c0 [ib_uverbs]
  ib_uverbs_cmd_verbs.isra.5+0x5d5/0x11f0 [ib_uverbs]
  ? ib_uverbs_handler_UVERBS_METHOD_QUERY_CONTEXT+0x120/0x120 [ib_uverbs]
  ? lock_acquire+0xb9/0x3a0
  ? ib_uverbs_ioctl+0xd0/0x210 [ib_uverbs]
  ? ib_uverbs_ioctl+0x175/0x210 [ib_uverbs]
  ib_uverbs_ioctl+0x14b/0x210 [ib_uverbs]
  ? ib_uverbs_ioctl+0xd0/0x210 [ib_uverbs]
  ksys_ioctl+0x234/0x7d0
  ? exc_page_fault+0x202/0x640
  ? do_syscall_64+0x1f/0x2e0
  __x64_sys_ioctl+0x16/0x20
  do_syscall_64+0x59/0x2e0
  ? asm_exc_page_fault+0x8/0x30
  ? rcu_read_lock_sched_held+0x52/0x60
  entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: b4aaa1f ("IB/mlx5: Handle type IB_QPT_DRIVER when creating a QP")
Link: https://lore.kernel.org/r/20200730082719.1582397-2-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
andr2000 pushed a commit that referenced this issue Aug 13, 2020
Fix kernel oops observed when an ext adv data is larger than 31 bytes.

This can be reproduced by setting up an advertiser with advertisement
larger than 31 bytes.  The issue is not sensitive to the advertisement
content.  In particular, this was reproduced with an advertisement of
229 bytes filled with 'A'.  See stack trace below.

This is fixed by not catching ext_adv as legacy adv are only cached to
be able to concatenate a scanable adv with its scan response before
sending it up through mgmt.

With ext_adv, this is no longer necessary.

  general protection fault: 0000 [#1] SMP PTI
  CPU: 6 PID: 205 Comm: kworker/u17:0 Not tainted 5.4.0-37-generic xen-troops#41-Ubuntu
  Hardware name: Dell Inc. XPS 15 7590/0CF6RR, BIOS 1.7.0 05/11/2020
  Workqueue: hci0 hci_rx_work [bluetooth]
  RIP: 0010:hci_bdaddr_list_lookup+0x1e/0x40 [bluetooth]
  Code: ff ff e9 26 ff ff ff 0f 1f 44 00 00 0f 1f 44 00 00 55 48 8b 07 48 89 e5 48 39 c7 75 0a eb 24 48 8b 00 48 39 f8 74 1c 44 8b 06 <44> 39 40 10 75 ef 44 0f b7 4e 04 66 44 39 48 14 75 e3 38 50 16 75
  RSP: 0018:ffffbc6a40493c70 EFLAGS: 00010286
  RAX: 4141414141414141 RBX: 000000000000001b RCX: 0000000000000000
  RDX: 0000000000000000 RSI: ffff9903e76c100f RDI: ffff9904289d4b28
  RBP: ffffbc6a40493c70 R08: 0000000093570362 R09: 0000000000000000
  R10: 0000000000000000 R11: ffff9904344eae38 R12: ffff9904289d4000
  R13: 0000000000000000 R14: 00000000ffffffa3 R15: ffff9903e76c100f
  FS: 0000000000000000(0000) GS:ffff990434580000(0000) knlGS:0000000000000000
  CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007feed125a000 CR3: 00000001b860a003 CR4: 00000000003606e0
  Call Trace:
    process_adv_report+0x12e/0x560 [bluetooth]
    hci_le_meta_evt+0x7b2/0xba0 [bluetooth]
    hci_event_packet+0x1c29/0x2a90 [bluetooth]
    hci_rx_work+0x19b/0x360 [bluetooth]
    process_one_work+0x1eb/0x3b0
    worker_thread+0x4d/0x400
    kthread+0x104/0x140

Fixes: c215e93 ("Bluetooth: Process extended ADV report event")
Reported-by: Andy Nguyen <theflow@google.com>
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Reported-by: Balakrishna Godavarthi <bgodavar@codeaurora.org>
Signed-off-by: Alain Michaud <alainm@chromium.org>
Tested-by: Sonny Sasaka <sonnysasaka@chromium.org>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
andr2000 pushed a commit that referenced this issue Aug 13, 2020
I compiled with AddressSanitizer and I had these memory leaks while I
was using the tep_parse_format function:

    Direct leak of 28 byte(s) in 4 object(s) allocated from:
        #0 0x7fb07db49ffe in __interceptor_realloc (/lib/x86_64-linux-gnu/libasan.so.5+0x10dffe)
        #1 0x7fb07a724228 in extend_token /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:985
        #2 0x7fb07a724c21 in __read_token /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1140
        #3 0x7fb07a724f78 in read_token /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1206
        xen-troops#4 0x7fb07a725191 in __read_expect_type /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1291
        xen-troops#5 0x7fb07a7251df in read_expect_type /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1299
        xen-troops#6 0x7fb07a72e6c8 in process_dynamic_array_len /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:2849
        xen-troops#7 0x7fb07a7304b8 in process_function /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:3161
        xen-troops#8 0x7fb07a730900 in process_arg_token /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:3207
        xen-troops#9 0x7fb07a727c0b in process_arg /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1786
        xen-troops#10 0x7fb07a731080 in event_read_print_args /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:3285
        xen-troops#11 0x7fb07a731722 in event_read_print /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:3369
        xen-troops#12 0x7fb07a740054 in __tep_parse_format /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:6335
        xen-troops#13 0x7fb07a74047a in __parse_event /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:6389
        xen-troops#14 0x7fb07a740536 in tep_parse_format /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:6431
        xen-troops#15 0x7fb07a785acf in parse_event ../../../src/fs-src/fs.c:251
        xen-troops#16 0x7fb07a785ccd in parse_systems ../../../src/fs-src/fs.c:284
        xen-troops#17 0x7fb07a786fb3 in read_metadata ../../../src/fs-src/fs.c:593
        xen-troops#18 0x7fb07a78760e in ftrace_fs_source_init ../../../src/fs-src/fs.c:727
        xen-troops#19 0x7fb07d90c19c in add_component_with_init_method_data ../../../../src/lib/graph/graph.c:1048
        xen-troops#20 0x7fb07d90c87b in add_source_component_with_initialize_method_data ../../../../src/lib/graph/graph.c:1127
        xen-troops#21 0x7fb07d90c92a in bt_graph_add_source_component ../../../../src/lib/graph/graph.c:1152
        xen-troops#22 0x55db11aa632e in cmd_run_ctx_create_components_from_config_components ../../../src/cli/babeltrace2.c:2252
        xen-troops#23 0x55db11aa6fda in cmd_run_ctx_create_components ../../../src/cli/babeltrace2.c:2347
        xen-troops#24 0x55db11aa780c in cmd_run ../../../src/cli/babeltrace2.c:2461
        xen-troops#25 0x55db11aa8a7d in main ../../../src/cli/babeltrace2.c:2673
        xen-troops#26 0x7fb07d5460b2 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x270b2)

The token variable in the process_dynamic_array_len function is
allocated in the read_expect_type function, but is not freed before
calling the read_token function.

Free the token variable before calling read_token in order to plug the
leak.

Signed-off-by: Philippe Duplessis-Guindon <pduplessis@efficios.com>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Link: https://lore.kernel.org/linux-trace-devel/20200730150236.5392-1-pduplessis@efficios.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant