Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error on building kernel #12

Closed
Xephi opened this issue Apr 2, 2015 · 0 comments
Closed

Error on building kernel #12

Xephi opened this issue Apr 2, 2015 · 0 comments

Comments

@Xephi
Copy link

Xephi commented Apr 2, 2015

Hi ! I've got that kind of error while trying to compile for cancro :/ :

arch/arm/mach-msm/smd_init_dt.c:24:25: fatal error: smd_private.h: No such file or directory
compilation terminated.
scripts/Makefile.build:307: recipe for target 'arch/arm/mach-msm/smd_init_dt.o' failed
make[1]: *** [arch/arm/mach-msm/smd_init_dt.o] Error 1
Makefile:953: recipe for target 'arch/arm/mach-msm' failed
make: *** [arch/arm/mach-msm] Error 2

but the file exist >.<

Fixed in this :
#13

@Xephi Xephi closed this as completed Apr 7, 2015
msfkonsole pushed a commit to msfkonsole/android_kernel_xiaomi_dior that referenced this issue Mar 13, 2016
If extent cache is disable, we will encounter oops when triggering direct
IO as below:

BUG: unable to handle kernel NULL pointer dereference at 0000000c
IP: [<f0b9c61e>] f2fs_drop_largest_extent+0xe/0x30 [f2fs]
*pdpt = 000000002bb9a001 *pde = 0000000000000000
Oops: 0000 [MiCode#1] SMP
Modules linked in: f2fs(O) fuse bnep rfcomm bluetooth nfsd dm_crypt nfs_acl auth_rpcgss oid_registry nfs binfmt_misc fscache lockd
sunrpc grace snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq snd_timer
snd_seq_device snd soundcore joydev psmouse hid_generic i2c_piix4 serio_raw ppdev mac_hid parport_pc lp parport ext4 jbd2 mbcache
usbhid hid e1000
CPU: 3 PID: 3608 Comm: dd Tainted: G           O    4.2.0-rc4 MiCode#12
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
task: ef161600 ti: ebd5e000 task.ti: ebd5e000
EIP: 0060:[<f0b9c61e>] EFLAGS: 00010202 CPU: 3
EIP is at f2fs_drop_largest_extent+0xe/0x30 [f2fs]
EAX: 00000000 EBX: ddebc000 ECX: 00000000 EDX: 00000000
ESI: ebd5fdf8 EDI: 00000000 EBP: ebd5fd58 ESP: ebd5fd58
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
CR0: 80050033 CR2: 0000000c CR3: 2c24ee40 CR4: 000006f0
Stack:
 ebd5fda4 f0b8c005 00000000 00000001 00000000 f0b8c430 c816cd68 ddebc000
 ddebc088 00001000 00000555 00000555 ffffffff c160bb00 00055501 00000000
 00000000 00000100 00000000 ebd5fe20 f0b8c430 00000046 ef161600 00001000
Call Trace:
 [<f0b8c005>] __allocate_data_block+0x1a5/0x260 [f2fs]
 [<f0b8c430>] ? f2fs_direct_IO+0x370/0x440 [f2fs]
 [<c160bb00>] ? down_read+0x30/0x50
 [<f0b8c430>] f2fs_direct_IO+0x370/0x440 [f2fs]
 [<c113e115>] generic_file_direct_write+0xa5/0x260
 [<c10b53f8>] ? current_fs_time+0x18/0x50
 [<c113e38b>] __generic_file_write_iter+0xbb/0x210
 [<c113e50f>] ? generic_file_write_iter+0x2f/0x320
 [<c113e63c>] generic_file_write_iter+0x15c/0x320
 [<f0b77f29>] f2fs_file_write_iter+0x39/0x80 [f2fs]
 [<c11984d9>] __vfs_write+0xa9/0xe0
 [<c1199227>] vfs_write+0x97/0x180
 [<c119955b>] SyS_write+0x5b/0xd0
 [<c160dcd0>] sysenter_do_call+0x12/0x12
Code: 10 8b 50 1c 89 53 14 eb ca 8d 74 26 00 85 f6 74 86 eb a6 0f 0b 90 8d b4 26 00 00 00 00 55 89 e5 3e 8d 74 26 00 8b 80 d4 02 00
00 <8b> 48 0c 39 d1 77 0e 03 48 14 39 ca 73 07 c7 40 14 00 00 00 00
EIP: [<f0b9c61e>] f2fs_drop_largest_extent+0xe/0x30 [f2fs] SS:ESP 0068:ebd5fd58
CR2: 000000000000000c
---[ end trace a38c07026a1afffd ]---

This is because when extent cache is disable, extent_tree pointer in struct
f2fs_inode_info should be NULL, but in f2fs_drop_largest_extent we access
this NULL pointer directly without checking state of extent cache, then,
the oops occurs. Let's fix it by checking state of extent cache before
accessing.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
bgcngm pushed a commit to Mi5Devs/android_kernel_xiaomi_msm8996 that referenced this issue Oct 8, 2016
[ Upstream commit ecf5fc6 ]

Nikolay has reported a hang when a memcg reclaim got stuck with the
following backtrace:

PID: 18308  TASK: ffff883d7c9b0a30  CPU: 1   COMMAND: "rsync"
  #0 __schedule at ffffffff815ab152
  #1 schedule at ffffffff815ab76e
  #2 schedule_timeout at ffffffff815ae5e5
  MiCode#3 io_schedule_timeout at ffffffff815aad6a
  MiCode#4 bit_wait_io at ffffffff815abfc6
  MiCode#5 __wait_on_bit at ffffffff815abda5
  MiCode#6 wait_on_page_bit at ffffffff8111fd4f
  MiCode#7 shrink_page_list at ffffffff81135445
  MiCode#8 shrink_inactive_list at ffffffff81135845
  MiCode#9 shrink_lruvec at ffffffff81135ead
 MiCode#10 shrink_zone at ffffffff811360c3
 MiCode#11 shrink_zones at ffffffff81136eff
 MiCode#12 do_try_to_free_pages at ffffffff8113712f
 MiCode#13 try_to_free_mem_cgroup_pages at ffffffff811372be
 MiCode#14 try_charge at ffffffff81189423
 MiCode#15 mem_cgroup_try_charge at ffffffff8118c6f5
 MiCode#16 __add_to_page_cache_locked at ffffffff8112137d
 MiCode#17 add_to_page_cache_lru at ffffffff81121618
 MiCode#18 pagecache_get_page at ffffffff8112170b
 MiCode#19 grow_dev_page at ffffffff811c8297
 MiCode#20 __getblk_slow at ffffffff811c91d6
 MiCode#21 __getblk_gfp at ffffffff811c92c1
 MiCode#22 ext4_ext_grow_indepth at ffffffff8124565c
 MiCode#23 ext4_ext_create_new_leaf at ffffffff81246ca8
 MiCode#24 ext4_ext_insert_extent at ffffffff81246f09
 MiCode#25 ext4_ext_map_blocks at ffffffff8124a848
 MiCode#26 ext4_map_blocks at ffffffff8121a5b7
 MiCode#27 mpage_map_one_extent at ffffffff8121b1fa
 MiCode#28 mpage_map_and_submit_extent at ffffffff8121f07b
 MiCode#29 ext4_writepages at ffffffff8121f6d5
 MiCode#30 do_writepages at ffffffff8112c490
 MiCode#31 __filemap_fdatawrite_range at ffffffff81120199
 MiCode#32 filemap_flush at ffffffff8112041c
 MiCode#33 ext4_alloc_da_blocks at ffffffff81219da1
 MiCode#34 ext4_rename at ffffffff81229b91
 MiCode#35 ext4_rename2 at ffffffff81229e32
 MiCode#36 vfs_rename at ffffffff811a08a5
 MiCode#37 SYSC_renameat2 at ffffffff811a3ffc
 MiCode#38 sys_renameat2 at ffffffff811a408e
 MiCode#39 sys_rename at ffffffff8119e51e
 MiCode#40 system_call_fastpath at ffffffff815afa89

Dave Chinner has properly pointed out that this is a deadlock in the
reclaim code because ext4 doesn't submit pages which are marked by
PG_writeback right away.

The heuristic was introduced by commit e62e384 ("memcg: prevent OOM
with too many dirty pages") and it was applied only when may_enter_fs
was specified.  The code has been changed by c3b94f4 ("memcg:
further prevent OOM with too many dirty pages") which has removed the
__GFP_FS restriction with a reasoning that we do not get into the fs
code.  But this is not sufficient apparently because the fs doesn't
necessarily submit pages marked PG_writeback for IO right away.

ext4_bio_write_page calls io_submit_add_bh but that doesn't necessarily
submit the bio.  Instead it tries to map more pages into the bio and
mpage_map_one_extent might trigger memcg charge which might end up
waiting on a page which is marked PG_writeback but hasn't been submitted
yet so we would end up waiting for something that never finishes.

Fix this issue by replacing __GFP_IO by may_enter_fs check (for case 2)
before we go to wait on the writeback.  The page fault path, which is
the only path that triggers memcg oom killer since 3.12, shouldn't
require GFP_NOFS and so we shouldn't reintroduce the premature OOM
killer issue which was originally addressed by the heuristic.

As per David Chinner the xfs is doing similar thing since 2.6.15 already
so ext4 is not the only affected filesystem.  Moreover he notes:

: For example: IO completion might require unwritten extent conversion
: which executes filesystem transactions and GFP_NOFS allocations. The
: writeback flag on the pages can not be cleared until unwritten
: extent conversion completes. Hence memory reclaim cannot wait on
: page writeback to complete in GFP_NOFS context because it is not
: safe to do so, memcg reclaim or otherwise.

Cc: stable@vger.kernel.org # 3.9+
[tytso@mit.edu: corrected the control flow]
Fixes: c3b94f4 ("memcg: further prevent OOM with too many dirty pages")
Reported-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
AndropaX pushed a commit to AndropaX/android_kernel_xiaomi_msm8992 that referenced this issue Jul 10, 2017
commit 45caeaa upstream.

As Eric Dumazet pointed out this also needs to be fixed in IPv6.
v2: Contains the IPv6 tcp/Ipv6 dccp patches as well.

We have seen a few incidents lately where a dst_enty has been freed
with a dangling TCP socket reference (sk->sk_dst_cache) pointing to that
dst_entry. If the conditions/timings are right a crash then ensues when the
freed dst_entry is referenced later on. A Common crashing back trace is:

 MiCode#8 [] page_fault at ffffffff8163e648
    [exception RIP: __tcp_ack_snd_check+74]
.
.
 MiCode#9 [] tcp_rcv_established at ffffffff81580b64
MiCode#10 [] tcp_v4_do_rcv at ffffffff8158b54a
MiCode#11 [] tcp_v4_rcv at ffffffff8158cd02
MiCode#12 [] ip_local_deliver_finish at ffffffff815668f4
MiCode#13 [] ip_local_deliver at ffffffff81566bd9
MiCode#14 [] ip_rcv_finish at ffffffff8156656d
MiCode#15 [] ip_rcv at ffffffff81566f06
MiCode#16 [] __netif_receive_skb_core at ffffffff8152b3a2
MiCode#17 [] __netif_receive_skb at ffffffff8152b608
MiCode#18 [] netif_receive_skb at ffffffff8152b690
MiCode#19 [] vmxnet3_rq_rx_complete at ffffffffa015eeaf [vmxnet3]
MiCode#20 [] vmxnet3_poll_rx_only at ffffffffa015f32a [vmxnet3]
MiCode#21 [] net_rx_action at ffffffff8152bac2
MiCode#22 [] __do_softirq at ffffffff81084b4f
MiCode#23 [] call_softirq at ffffffff8164845c
MiCode#24 [] do_softirq at ffffffff81016fc5
MiCode#25 [] irq_exit at ffffffff81084ee5
MiCode#26 [] do_IRQ at ffffffff81648ff8

Of course it may happen with other NIC drivers as well.

It's found the freed dst_entry here:

 224 static bool tcp_in_quickack_mode(struct sock *sk)�
 225 {�
 226 �       const struct inet_connection_sock *icsk = inet_csk(sk);�
 227 �       const struct dst_entry *dst = __sk_dst_get(sk);�
 228 �
 229 �       return (dst && dst_metric(dst, RTAX_QUICKACK)) ||�
 230 �       �       (icsk->icsk_ack.quick && !icsk->icsk_ack.pingpong);�
 231 }�

But there are other backtraces attributed to the same freed dst_entry in
netfilter code as well.

All the vmcores showed 2 significant clues:

- Remote hosts behind the default gateway had always been redirected to a
different gateway. A rtable/dst_entry will be added for that host. Making
more dst_entrys with lower reference counts. Making this more probable.

- All vmcores showed a postitive LockDroppedIcmps value, e.g:

LockDroppedIcmps                  267

A closer look at the tcp_v4_err() handler revealed that do_redirect() will run
regardless of whether user space has the socket locked. This can result in a
race condition where the same dst_entry cached in sk->sk_dst_entry can be
decremented twice for the same socket via:

do_redirect()->__sk_dst_check()-> dst_release().

Which leads to the dst_entry being prematurely freed with another socket
pointing to it via sk->sk_dst_cache and a subsequent crash.

To fix this skip do_redirect() if usespace has the socket locked. Instead let
the redirect take place later when user space does not have the socket
locked.

The dccp/IPv6 code is very similar in this respect, so fixing it there too.

As Eric Garver pointed out the following commit now invalidates routes. Which
can set the dst->obsolete flag so that ipv4_dst_check() returns null and
triggers the dst_release().

Fixes: ceb3320 ("ipv4: Kill routes during PMTU/redirect updates.")
Cc: Eric Garver <egarver@redhat.com>
Cc: Hannes Sowa <hsowa@redhat.com>
Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
mihirshah006 pushed a commit to mihirshah006/Xiaomi_Kernel_OpenSource that referenced this issue Jul 25, 2017
commit ecf5fc6 upstream.

Nikolay has reported a hang when a memcg reclaim got stuck with the
following backtrace:

PID: 18308  TASK: ffff883d7c9b0a30  CPU: 1   COMMAND: "rsync"
  #0 __schedule at ffffffff815ab152
  premaca#1 schedule at ffffffff815ab76e
  premaca#2 schedule_timeout at ffffffff815ae5e5
  MiCode#3 io_schedule_timeout at ffffffff815aad6a
  MiCode#4 bit_wait_io at ffffffff815abfc6
  MiCode#5 __wait_on_bit at ffffffff815abda5
  MiCode#6 wait_on_page_bit at ffffffff8111fd4f
  MiCode#7 shrink_page_list at ffffffff81135445
  MiCode#8 shrink_inactive_list at ffffffff81135845
  MiCode#9 shrink_lruvec at ffffffff81135ead
 MiCode#10 shrink_zone at ffffffff811360c3
 MiCode#11 shrink_zones at ffffffff81136eff
 MiCode#12 do_try_to_free_pages at ffffffff8113712f
 MiCode#13 try_to_free_mem_cgroup_pages at ffffffff811372be
 MiCode#14 try_charge at ffffffff81189423
 MiCode#15 mem_cgroup_try_charge at ffffffff8118c6f5
 MiCode#16 __add_to_page_cache_locked at ffffffff8112137d
 MiCode#17 add_to_page_cache_lru at ffffffff81121618
 MiCode#18 pagecache_get_page at ffffffff8112170b
 MiCode#19 grow_dev_page at ffffffff811c8297
 MiCode#20 __getblk_slow at ffffffff811c91d6
 MiCode#21 __getblk_gfp at ffffffff811c92c1
 MiCode#22 ext4_ext_grow_indepth at ffffffff8124565c
 MiCode#23 ext4_ext_create_new_leaf at ffffffff81246ca8
 MiCode#24 ext4_ext_insert_extent at ffffffff81246f09
 MiCode#25 ext4_ext_map_blocks at ffffffff8124a848
 MiCode#26 ext4_map_blocks at ffffffff8121a5b7
 MiCode#27 mpage_map_one_extent at ffffffff8121b1fa
 MiCode#28 mpage_map_and_submit_extent at ffffffff8121f07b
 MiCode#29 ext4_writepages at ffffffff8121f6d5
 MiCode#30 do_writepages at ffffffff8112c490
 MiCode#31 __filemap_fdatawrite_range at ffffffff81120199
 MiCode#32 filemap_flush at ffffffff8112041c
 MiCode#33 ext4_alloc_da_blocks at ffffffff81219da1
 MiCode#34 ext4_rename at ffffffff81229b91
 MiCode#35 ext4_rename2 at ffffffff81229e32
 MiCode#36 vfs_rename at ffffffff811a08a5
 MiCode#37 SYSC_renameat2 at ffffffff811a3ffc
 MiCode#38 sys_renameat2 at ffffffff811a408e
 MiCode#39 sys_rename at ffffffff8119e51e
 MiCode#40 system_call_fastpath at ffffffff815afa89

Dave Chinner has properly pointed out that this is a deadlock in the
reclaim code because ext4 doesn't submit pages which are marked by
PG_writeback right away.

The heuristic was introduced by commit e62e384 ("memcg: prevent OOM
with too many dirty pages") and it was applied only when may_enter_fs
was specified.  The code has been changed by c3b94f4 ("memcg:
further prevent OOM with too many dirty pages") which has removed the
__GFP_FS restriction with a reasoning that we do not get into the fs
code.  But this is not sufficient apparently because the fs doesn't
necessarily submit pages marked PG_writeback for IO right away.

ext4_bio_write_page calls io_submit_add_bh but that doesn't necessarily
submit the bio.  Instead it tries to map more pages into the bio and
mpage_map_one_extent might trigger memcg charge which might end up
waiting on a page which is marked PG_writeback but hasn't been submitted
yet so we would end up waiting for something that never finishes.

Fix this issue by replacing __GFP_IO by may_enter_fs check (for case 2)
before we go to wait on the writeback.  The page fault path, which is
the only path that triggers memcg oom killer since 3.12, shouldn't
require GFP_NOFS and so we shouldn't reintroduce the premature OOM
killer issue which was originally addressed by the heuristic.

As per David Chinner the xfs is doing similar thing since 2.6.15 already
so ext4 is not the only affected filesystem.  Moreover he notes:

: For example: IO completion might require unwritten extent conversion
: which executes filesystem transactions and GFP_NOFS allocations. The
: writeback flag on the pages can not be cleared until unwritten
: extent conversion completes. Hence memory reclaim cannot wait on
: page writeback to complete in GFP_NOFS context because it is not
: safe to do so, memcg reclaim or otherwise.

[tytso@mit.edu: corrected the control flow]
Fixes: c3b94f4 ("memcg: further prevent OOM with too many dirty pages")
Reported-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[@MSF-Jarvis: Fix conflicts from "mm: vmscan: stall page reclaim after a list of pages have been processed" ]

Change-Id: I09aa7c565388b4b323034d5c71a463f4fb175462
mihirshah006 pushed a commit to mihirshah006/Xiaomi_Kernel_OpenSource that referenced this issue Jul 25, 2017
commit 45caeaa upstream.

As Eric Dumazet pointed out this also needs to be fixed in IPv6.
v2: Contains the IPv6 tcp/Ipv6 dccp patches as well.

We have seen a few incidents lately where a dst_enty has been freed
with a dangling TCP socket reference (sk->sk_dst_cache) pointing to that
dst_entry. If the conditions/timings are right a crash then ensues when the
freed dst_entry is referenced later on. A Common crashing back trace is:

 MiCode#8 [] page_fault at ffffffff8163e648
    [exception RIP: __tcp_ack_snd_check+74]
.
.
 MiCode#9 [] tcp_rcv_established at ffffffff81580b64
MiCode#10 [] tcp_v4_do_rcv at ffffffff8158b54a
MiCode#11 [] tcp_v4_rcv at ffffffff8158cd02
MiCode#12 [] ip_local_deliver_finish at ffffffff815668f4
MiCode#13 [] ip_local_deliver at ffffffff81566bd9
MiCode#14 [] ip_rcv_finish at ffffffff8156656d
MiCode#15 [] ip_rcv at ffffffff81566f06
MiCode#16 [] __netif_receive_skb_core at ffffffff8152b3a2
MiCode#17 [] __netif_receive_skb at ffffffff8152b608
MiCode#18 [] netif_receive_skb at ffffffff8152b690
MiCode#19 [] vmxnet3_rq_rx_complete at ffffffffa015eeaf [vmxnet3]
MiCode#20 [] vmxnet3_poll_rx_only at ffffffffa015f32a [vmxnet3]
MiCode#21 [] net_rx_action at ffffffff8152bac2
MiCode#22 [] __do_softirq at ffffffff81084b4f
MiCode#23 [] call_softirq at ffffffff8164845c
MiCode#24 [] do_softirq at ffffffff81016fc5
MiCode#25 [] irq_exit at ffffffff81084ee5
MiCode#26 [] do_IRQ at ffffffff81648ff8

Of course it may happen with other NIC drivers as well.

It's found the freed dst_entry here:

 224 static bool tcp_in_quickack_mode(struct sock *sk)�
 225 {�
 226 �       const struct inet_connection_sock *icsk = inet_csk(sk);�
 227 �       const struct dst_entry *dst = __sk_dst_get(sk);�
 228 �
 229 �       return (dst && dst_metric(dst, RTAX_QUICKACK)) ||�
 230 �       �       (icsk->icsk_ack.quick && !icsk->icsk_ack.pingpong);�
 231 }�

But there are other backtraces attributed to the same freed dst_entry in
netfilter code as well.

All the vmcores showed 2 significant clues:

- Remote hosts behind the default gateway had always been redirected to a
different gateway. A rtable/dst_entry will be added for that host. Making
more dst_entrys with lower reference counts. Making this more probable.

- All vmcores showed a postitive LockDroppedIcmps value, e.g:

LockDroppedIcmps                  267

A closer look at the tcp_v4_err() handler revealed that do_redirect() will run
regardless of whether user space has the socket locked. This can result in a
race condition where the same dst_entry cached in sk->sk_dst_entry can be
decremented twice for the same socket via:

do_redirect()->__sk_dst_check()-> dst_release().

Which leads to the dst_entry being prematurely freed with another socket
pointing to it via sk->sk_dst_cache and a subsequent crash.

To fix this skip do_redirect() if usespace has the socket locked. Instead let
the redirect take place later when user space does not have the socket
locked.

The dccp/IPv6 code is very similar in this respect, so fixing it there too.

As Eric Garver pointed out the following commit now invalidates routes. Which
can set the dst->obsolete flag so that ipv4_dst_check() returns null and
triggers the dst_release().

Fixes: ceb3320 ("ipv4: Kill routes during PMTU/redirect updates.")
Cc: Eric Garver <egarver@redhat.com>
Cc: Hannes Sowa <hsowa@redhat.com>
Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
maxprzemo pushed a commit to maxprzemo/Xiaomi_Kernel_OpenSource that referenced this issue Feb 18, 2018
commit 45caeaa upstream.

As Eric Dumazet pointed out this also needs to be fixed in IPv6.
v2: Contains the IPv6 tcp/Ipv6 dccp patches as well.

We have seen a few incidents lately where a dst_enty has been freed
with a dangling TCP socket reference (sk->sk_dst_cache) pointing to that
dst_entry. If the conditions/timings are right a crash then ensues when the
freed dst_entry is referenced later on. A Common crashing back trace is:

 MiCode#8 [] page_fault at ffffffff8163e648
    [exception RIP: __tcp_ack_snd_check+74]
.
.
 MiCode#9 [] tcp_rcv_established at ffffffff81580b64
MiCode#10 [] tcp_v4_do_rcv at ffffffff8158b54a
MiCode#11 [] tcp_v4_rcv at ffffffff8158cd02
MiCode#12 [] ip_local_deliver_finish at ffffffff815668f4
MiCode#13 [] ip_local_deliver at ffffffff81566bd9
MiCode#14 [] ip_rcv_finish at ffffffff8156656d
MiCode#15 [] ip_rcv at ffffffff81566f06
MiCode#16 [] __netif_receive_skb_core at ffffffff8152b3a2
MiCode#17 [] __netif_receive_skb at ffffffff8152b608
MiCode#18 [] netif_receive_skb at ffffffff8152b690
MiCode#19 [] vmxnet3_rq_rx_complete at ffffffffa015eeaf [vmxnet3]
MiCode#20 [] vmxnet3_poll_rx_only at ffffffffa015f32a [vmxnet3]
MiCode#21 [] net_rx_action at ffffffff8152bac2
MiCode#22 [] __do_softirq at ffffffff81084b4f
MiCode#23 [] call_softirq at ffffffff8164845c
MiCode#24 [] do_softirq at ffffffff81016fc5
MiCode#25 [] irq_exit at ffffffff81084ee5
MiCode#26 [] do_IRQ at ffffffff81648ff8

Of course it may happen with other NIC drivers as well.

It's found the freed dst_entry here:

 224 static bool tcp_in_quickack_mode(struct sock *sk)↩
 225 {↩
 226 ▹       const struct inet_connection_sock *icsk = inet_csk(sk);↩
 227 ▹       const struct dst_entry *dst = __sk_dst_get(sk);↩
 228 ↩
 229 ▹       return (dst && dst_metric(dst, RTAX_QUICKACK)) ||↩
 230 ▹       ▹       (icsk->icsk_ack.quick && !icsk->icsk_ack.pingpong);↩
 231 }↩

But there are other backtraces attributed to the same freed dst_entry in
netfilter code as well.

All the vmcores showed 2 significant clues:

- Remote hosts behind the default gateway had always been redirected to a
different gateway. A rtable/dst_entry will be added for that host. Making
more dst_entrys with lower reference counts. Making this more probable.

- All vmcores showed a postitive LockDroppedIcmps value, e.g:

LockDroppedIcmps                  267

A closer look at the tcp_v4_err() handler revealed that do_redirect() will run
regardless of whether user space has the socket locked. This can result in a
race condition where the same dst_entry cached in sk->sk_dst_entry can be
decremented twice for the same socket via:

do_redirect()->__sk_dst_check()-> dst_release().

Which leads to the dst_entry being prematurely freed with another socket
pointing to it via sk->sk_dst_cache and a subsequent crash.

To fix this skip do_redirect() if usespace has the socket locked. Instead let
the redirect take place later when user space does not have the socket
locked.

The dccp/IPv6 code is very similar in this respect, so fixing it there too.

As Eric Garver pointed out the following commit now invalidates routes. Which
can set the dst->obsolete flag so that ipv4_dst_check() returns null and
triggers the dst_release().

Fixes: ceb3320 ("ipv4: Kill routes during PMTU/redirect updates.")
Cc: Eric Garver <egarver@redhat.com>
Cc: Hannes Sowa <hsowa@redhat.com>
Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
maxprzemo pushed a commit to maxprzemo/Xiaomi_Kernel_OpenSource that referenced this issue Feb 18, 2018
commit 4dfce57 upstream.

There have been several reports over the years of NULL pointer
dereferences in xfs_trans_log_inode during xfs_fsr processes,
when the process is doing an fput and tearing down extents
on the temporary inode, something like:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
PID: 29439  TASK: ffff880550584fa0  CPU: 6   COMMAND: "xfs_fsr"
    [exception RIP: xfs_trans_log_inode+0x10]
 MiCode#9 [ffff8800a57bbbe0] xfs_bunmapi at ffffffffa037398e [xfs]
MiCode#10 [ffff8800a57bbce8] xfs_itruncate_extents at ffffffffa0391b29 [xfs]
MiCode#11 [ffff8800a57bbd88] xfs_inactive_truncate at ffffffffa0391d0c [xfs]
MiCode#12 [ffff8800a57bbdb8] xfs_inactive at ffffffffa0392508 [xfs]
MiCode#13 [ffff8800a57bbdd8] xfs_fs_evict_inode at ffffffffa035907e [xfs]
MiCode#14 [ffff8800a57bbe00] evict at ffffffff811e1b67
MiCode#15 [ffff8800a57bbe28] iput at ffffffff811e23a5
MiCode#16 [ffff8800a57bbe58] dentry_kill at ffffffff811dcfc8
MiCode#17 [ffff8800a57bbe88] dput at ffffffff811dd06c
MiCode#18 [ffff8800a57bbea8] __fput at ffffffff811c823b
MiCode#19 [ffff8800a57bbef0] ____fput at ffffffff811c846e
MiCode#20 [ffff8800a57bbf00] task_work_run at ffffffff81093b27
MiCode#21 [ffff8800a57bbf30] do_notify_resume at ffffffff81013b0c
MiCode#22 [ffff8800a57bbf50] int_signal at ffffffff8161405d

As it turns out, this is because the i_itemp pointer, along
with the d_ops pointer, has been overwritten with zeros
when we tear down the extents during truncate.  When the in-core
inode fork on the temporary inode used by xfs_fsr was originally
set up during the extent swap, we mistakenly looked at di_nextents
to determine whether all extents fit inline, but this misses extents
generated by speculative preallocation; we should be using if_bytes
instead.

This mistake corrupts the in-memory inode, and code in
xfs_iext_remove_inline eventually gets bad inputs, causing
it to memmove and memset incorrect ranges; this became apparent
because the two values in ifp->if_u2.if_inline_ext[1] contained
what should have been in d_ops and i_itemp; they were memmoved due
to incorrect array indexing and then the original locations
were zeroed with memset, again due to an array overrun.

Fix this by properly using i_df.if_bytes to determine the number
of extents, not di_nextents.

Thanks to dchinner for looking at this with me and spotting the
root cause.

[nborisov: backported to 4.4]

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
maxprzemo pushed a commit to maxprzemo/Xiaomi_Kernel_OpenSource that referenced this issue Feb 18, 2018
commit ba80aa9 upstream.

This patch closes a long standing race in configfs between
the creation of a new symlink in create_link(), while the
symlink target's config_item is being concurrently removed
via configfs_rmdir().

This can happen because the symlink target's reference
is obtained by config_item_get() in create_link() before
the CONFIGFS_USET_DROPPING bit set by configfs_detach_prep()
during configfs_rmdir() shutdown is actually checked..

This originally manifested itself on ppc64 on v4.8.y under
heavy load using ibmvscsi target ports with Novalink API:

[ 7877.289863] rpadlpar_io: slot U8247.22L.212A91A-V1-C8 added
[ 7879.893760] ------------[ cut here ]------------
[ 7879.893768] WARNING: CPU: 15 PID: 17585 at ./include/linux/kref.h:46 config_item_get+0x7c/0x90 [configfs]
[ 7879.893811] CPU: 15 PID: 17585 Comm: targetcli Tainted: G           O 4.8.17-customv2.22 MiCode#12
[ 7879.893812] task: c00000018a0d3400 task.stack: c0000001f3b40000
[ 7879.893813] NIP: d000000002c664ec LR: d000000002c60980 CTR: c000000000b70870
[ 7879.893814] REGS: c0000001f3b43810 TRAP: 0700   Tainted: G O     (4.8.17-customv2.22)
[ 7879.893815] MSR: 8000000000029033 <SF,EE,ME,IR,DR,RI,LE>  CR: 28222242  XER: 00000000
[ 7879.893820] CFAR: d000000002c664bc SOFTE: 1
                GPR00: d000000002c60980 c0000001f3b43a90 d000000002c70908 c0000000fbc06820
                GPR04: c0000001ef1bd900 0000000000000004 0000000000000001 0000000000000000
                GPR08: 0000000000000000 0000000000000001 d000000002c69560 d000000002c66d80
                GPR12: c000000000b70870 c00000000e798700 c0000001f3b43ca0 c0000001d4949d40
                GPR16: c00000014637e1c0 0000000000000000 0000000000000000 c0000000f2392940
                GPR20: c0000001f3b43b98 0000000000000041 0000000000600000 0000000000000000
                GPR24: fffffffffffff000 0000000000000000 d000000002c60be0 c0000001f1dac490
                GPR28: 0000000000000004 0000000000000000 c0000001ef1bd900 c0000000f2392940
[ 7879.893839] NIP [d000000002c664ec] config_item_get+0x7c/0x90 [configfs]
[ 7879.893841] LR [d000000002c60980] check_perm+0x80/0x2e0 [configfs]
[ 7879.893842] Call Trace:
[ 7879.893844] [c0000001f3b43ac0] [d000000002c60980] check_perm+0x80/0x2e0 [configfs]
[ 7879.893847] [c0000001f3b43b10] [c000000000329770] do_dentry_open+0x2c0/0x460
[ 7879.893849] [c0000001f3b43b70] [c000000000344480] path_openat+0x210/0x1490
[ 7879.893851] [c0000001f3b43c80] [c00000000034708c] do_filp_open+0xfc/0x170
[ 7879.893853] [c0000001f3b43db0] [c00000000032b5bc] do_sys_open+0x1cc/0x390
[ 7879.893856] [c0000001f3b43e30] [c000000000009584] system_call+0x38/0xec
[ 7879.893856] Instruction dump:
[ 7879.893858] 409d0014 38210030 e8010010 7c0803a6 4e800020 3d220000 e94981e0 892a0000
[ 7879.893861] 2f890000 409effe0 39200001 992a0000 <0fe00000> 4bffffd0 60000000 60000000
[ 7879.893866] ---[ end trace 14078f0b3b5ad0aa ]---

To close this race, go ahead and obtain the symlink's target
config_item reference only after the existing CONFIGFS_USET_DROPPING
check succeeds.

This way, if configfs_rmdir() wins create_link() will return -ENONET,
and if create_link() wins configfs_rmdir() will return -EBUSY.

Reported-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
Tested-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Goayandi pushed a commit to Goayandi/android_kernel_xiaomi_cappu that referenced this issue Apr 3, 2018
commit 45caeaa upstream.

As Eric Dumazet pointed out this also needs to be fixed in IPv6.
v2: Contains the IPv6 tcp/Ipv6 dccp patches as well.

We have seen a few incidents lately where a dst_enty has been freed
with a dangling TCP socket reference (sk->sk_dst_cache) pointing to that
dst_entry. If the conditions/timings are right a crash then ensues when the
freed dst_entry is referenced later on. A Common crashing back trace is:

 MiCode#8 [] page_fault at ffffffff8163e648
    [exception RIP: __tcp_ack_snd_check+74]
.
.
 MiCode#9 [] tcp_rcv_established at ffffffff81580b64
MiCode#10 [] tcp_v4_do_rcv at ffffffff8158b54a
MiCode#11 [] tcp_v4_rcv at ffffffff8158cd02
MiCode#12 [] ip_local_deliver_finish at ffffffff815668f4
MiCode#13 [] ip_local_deliver at ffffffff81566bd9
MiCode#14 [] ip_rcv_finish at ffffffff8156656d
MiCode#15 [] ip_rcv at ffffffff81566f06
MiCode#16 [] __netif_receive_skb_core at ffffffff8152b3a2
MiCode#17 [] __netif_receive_skb at ffffffff8152b608
MiCode#18 [] netif_receive_skb at ffffffff8152b690
MiCode#19 [] vmxnet3_rq_rx_complete at ffffffffa015eeaf [vmxnet3]
MiCode#20 [] vmxnet3_poll_rx_only at ffffffffa015f32a [vmxnet3]
MiCode#21 [] net_rx_action at ffffffff8152bac2
MiCode#22 [] __do_softirq at ffffffff81084b4f
MiCode#23 [] call_softirq at ffffffff8164845c
MiCode#24 [] do_softirq at ffffffff81016fc5
MiCode#25 [] irq_exit at ffffffff81084ee5
MiCode#26 [] do_IRQ at ffffffff81648ff8

Of course it may happen with other NIC drivers as well.

It's found the freed dst_entry here:

 224 static bool tcp_in_quickack_mode(struct sock *sk)↩
 225 {↩
 226 ▹       const struct inet_connection_sock *icsk = inet_csk(sk);↩
 227 ▹       const struct dst_entry *dst = __sk_dst_get(sk);↩
 228 ↩
 229 ▹       return (dst && dst_metric(dst, RTAX_QUICKACK)) ||↩
 230 ▹       ▹       (icsk->icsk_ack.quick && !icsk->icsk_ack.pingpong);↩
 231 }↩

But there are other backtraces attributed to the same freed dst_entry in
netfilter code as well.

All the vmcores showed 2 significant clues:

- Remote hosts behind the default gateway had always been redirected to a
different gateway. A rtable/dst_entry will be added for that host. Making
more dst_entrys with lower reference counts. Making this more probable.

- All vmcores showed a postitive LockDroppedIcmps value, e.g:

LockDroppedIcmps                  267

A closer look at the tcp_v4_err() handler revealed that do_redirect() will run
regardless of whether user space has the socket locked. This can result in a
race condition where the same dst_entry cached in sk->sk_dst_entry can be
decremented twice for the same socket via:

do_redirect()->__sk_dst_check()-> dst_release().

Which leads to the dst_entry being prematurely freed with another socket
pointing to it via sk->sk_dst_cache and a subsequent crash.

To fix this skip do_redirect() if usespace has the socket locked. Instead let
the redirect take place later when user space does not have the socket
locked.

The dccp/IPv6 code is very similar in this respect, so fixing it there too.

As Eric Garver pointed out the following commit now invalidates routes. Which
can set the dst->obsolete flag so that ipv4_dst_check() returns null and
triggers the dst_release().

Fixes: ceb3320 ("ipv4: Kill routes during PMTU/redirect updates.")
Cc: Eric Garver <egarver@redhat.com>
Cc: Hannes Sowa <hsowa@redhat.com>
Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Goayandi pushed a commit to Goayandi/android_kernel_xiaomi_cappu that referenced this issue Apr 3, 2018
commit 4dfce57 upstream.

There have been several reports over the years of NULL pointer
dereferences in xfs_trans_log_inode during xfs_fsr processes,
when the process is doing an fput and tearing down extents
on the temporary inode, something like:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
PID: 29439  TASK: ffff880550584fa0  CPU: 6   COMMAND: "xfs_fsr"
    [exception RIP: xfs_trans_log_inode+0x10]
 MiCode#9 [ffff8800a57bbbe0] xfs_bunmapi at ffffffffa037398e [xfs]
MiCode#10 [ffff8800a57bbce8] xfs_itruncate_extents at ffffffffa0391b29 [xfs]
MiCode#11 [ffff8800a57bbd88] xfs_inactive_truncate at ffffffffa0391d0c [xfs]
MiCode#12 [ffff8800a57bbdb8] xfs_inactive at ffffffffa0392508 [xfs]
MiCode#13 [ffff8800a57bbdd8] xfs_fs_evict_inode at ffffffffa035907e [xfs]
MiCode#14 [ffff8800a57bbe00] evict at ffffffff811e1b67
MiCode#15 [ffff8800a57bbe28] iput at ffffffff811e23a5
MiCode#16 [ffff8800a57bbe58] dentry_kill at ffffffff811dcfc8
MiCode#17 [ffff8800a57bbe88] dput at ffffffff811dd06c
MiCode#18 [ffff8800a57bbea8] __fput at ffffffff811c823b
MiCode#19 [ffff8800a57bbef0] ____fput at ffffffff811c846e
MiCode#20 [ffff8800a57bbf00] task_work_run at ffffffff81093b27
MiCode#21 [ffff8800a57bbf30] do_notify_resume at ffffffff81013b0c
MiCode#22 [ffff8800a57bbf50] int_signal at ffffffff8161405d

As it turns out, this is because the i_itemp pointer, along
with the d_ops pointer, has been overwritten with zeros
when we tear down the extents during truncate.  When the in-core
inode fork on the temporary inode used by xfs_fsr was originally
set up during the extent swap, we mistakenly looked at di_nextents
to determine whether all extents fit inline, but this misses extents
generated by speculative preallocation; we should be using if_bytes
instead.

This mistake corrupts the in-memory inode, and code in
xfs_iext_remove_inline eventually gets bad inputs, causing
it to memmove and memset incorrect ranges; this became apparent
because the two values in ifp->if_u2.if_inline_ext[1] contained
what should have been in d_ops and i_itemp; they were memmoved due
to incorrect array indexing and then the original locations
were zeroed with memset, again due to an array overrun.

Fix this by properly using i_df.if_bytes to determine the number
of extents, not di_nextents.

Thanks to dchinner for looking at this with me and spotting the
root cause.

[nborisov: backported to 4.4]

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Goayandi pushed a commit to Goayandi/android_kernel_xiaomi_cappu that referenced this issue Apr 3, 2018
commit 45caeaa upstream.

As Eric Dumazet pointed out this also needs to be fixed in IPv6.
v2: Contains the IPv6 tcp/Ipv6 dccp patches as well.

We have seen a few incidents lately where a dst_enty has been freed
with a dangling TCP socket reference (sk->sk_dst_cache) pointing to that
dst_entry. If the conditions/timings are right a crash then ensues when the
freed dst_entry is referenced later on. A Common crashing back trace is:

 MiCode#8 [] page_fault at ffffffff8163e648
    [exception RIP: __tcp_ack_snd_check+74]
.
.
 MiCode#9 [] tcp_rcv_established at ffffffff81580b64
MiCode#10 [] tcp_v4_do_rcv at ffffffff8158b54a
MiCode#11 [] tcp_v4_rcv at ffffffff8158cd02
MiCode#12 [] ip_local_deliver_finish at ffffffff815668f4
MiCode#13 [] ip_local_deliver at ffffffff81566bd9
MiCode#14 [] ip_rcv_finish at ffffffff8156656d
MiCode#15 [] ip_rcv at ffffffff81566f06
MiCode#16 [] __netif_receive_skb_core at ffffffff8152b3a2
MiCode#17 [] __netif_receive_skb at ffffffff8152b608
MiCode#18 [] netif_receive_skb at ffffffff8152b690
MiCode#19 [] vmxnet3_rq_rx_complete at ffffffffa015eeaf [vmxnet3]
MiCode#20 [] vmxnet3_poll_rx_only at ffffffffa015f32a [vmxnet3]
MiCode#21 [] net_rx_action at ffffffff8152bac2
MiCode#22 [] __do_softirq at ffffffff81084b4f
MiCode#23 [] call_softirq at ffffffff8164845c
MiCode#24 [] do_softirq at ffffffff81016fc5
MiCode#25 [] irq_exit at ffffffff81084ee5
MiCode#26 [] do_IRQ at ffffffff81648ff8

Of course it may happen with other NIC drivers as well.

It's found the freed dst_entry here:

 224 static bool tcp_in_quickack_mode(struct sock *sk)↩
 225 {↩
 226 ▹       const struct inet_connection_sock *icsk = inet_csk(sk);↩
 227 ▹       const struct dst_entry *dst = __sk_dst_get(sk);↩
 228 ↩
 229 ▹       return (dst && dst_metric(dst, RTAX_QUICKACK)) ||↩
 230 ▹       ▹       (icsk->icsk_ack.quick && !icsk->icsk_ack.pingpong);↩
 231 }↩

But there are other backtraces attributed to the same freed dst_entry in
netfilter code as well.

All the vmcores showed 2 significant clues:

- Remote hosts behind the default gateway had always been redirected to a
different gateway. A rtable/dst_entry will be added for that host. Making
more dst_entrys with lower reference counts. Making this more probable.

- All vmcores showed a postitive LockDroppedIcmps value, e.g:

LockDroppedIcmps                  267

A closer look at the tcp_v4_err() handler revealed that do_redirect() will run
regardless of whether user space has the socket locked. This can result in a
race condition where the same dst_entry cached in sk->sk_dst_entry can be
decremented twice for the same socket via:

do_redirect()->__sk_dst_check()-> dst_release().

Which leads to the dst_entry being prematurely freed with another socket
pointing to it via sk->sk_dst_cache and a subsequent crash.

To fix this skip do_redirect() if usespace has the socket locked. Instead let
the redirect take place later when user space does not have the socket
locked.

The dccp/IPv6 code is very similar in this respect, so fixing it there too.

As Eric Garver pointed out the following commit now invalidates routes. Which
can set the dst->obsolete flag so that ipv4_dst_check() returns null and
triggers the dst_release().

Fixes: ceb3320 ("ipv4: Kill routes during PMTU/redirect updates.")
Cc: Eric Garver <egarver@redhat.com>
Cc: Hannes Sowa <hsowa@redhat.com>
Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Goayandi pushed a commit to Goayandi/android_kernel_xiaomi_cappu that referenced this issue Apr 3, 2018
commit 4dfce57 upstream.

There have been several reports over the years of NULL pointer
dereferences in xfs_trans_log_inode during xfs_fsr processes,
when the process is doing an fput and tearing down extents
on the temporary inode, something like:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
PID: 29439  TASK: ffff880550584fa0  CPU: 6   COMMAND: "xfs_fsr"
    [exception RIP: xfs_trans_log_inode+0x10]
 MiCode#9 [ffff8800a57bbbe0] xfs_bunmapi at ffffffffa037398e [xfs]
MiCode#10 [ffff8800a57bbce8] xfs_itruncate_extents at ffffffffa0391b29 [xfs]
MiCode#11 [ffff8800a57bbd88] xfs_inactive_truncate at ffffffffa0391d0c [xfs]
MiCode#12 [ffff8800a57bbdb8] xfs_inactive at ffffffffa0392508 [xfs]
MiCode#13 [ffff8800a57bbdd8] xfs_fs_evict_inode at ffffffffa035907e [xfs]
MiCode#14 [ffff8800a57bbe00] evict at ffffffff811e1b67
MiCode#15 [ffff8800a57bbe28] iput at ffffffff811e23a5
MiCode#16 [ffff8800a57bbe58] dentry_kill at ffffffff811dcfc8
MiCode#17 [ffff8800a57bbe88] dput at ffffffff811dd06c
MiCode#18 [ffff8800a57bbea8] __fput at ffffffff811c823b
MiCode#19 [ffff8800a57bbef0] ____fput at ffffffff811c846e
MiCode#20 [ffff8800a57bbf00] task_work_run at ffffffff81093b27
MiCode#21 [ffff8800a57bbf30] do_notify_resume at ffffffff81013b0c
MiCode#22 [ffff8800a57bbf50] int_signal at ffffffff8161405d

As it turns out, this is because the i_itemp pointer, along
with the d_ops pointer, has been overwritten with zeros
when we tear down the extents during truncate.  When the in-core
inode fork on the temporary inode used by xfs_fsr was originally
set up during the extent swap, we mistakenly looked at di_nextents
to determine whether all extents fit inline, but this misses extents
generated by speculative preallocation; we should be using if_bytes
instead.

This mistake corrupts the in-memory inode, and code in
xfs_iext_remove_inline eventually gets bad inputs, causing
it to memmove and memset incorrect ranges; this became apparent
because the two values in ifp->if_u2.if_inline_ext[1] contained
what should have been in d_ops and i_itemp; they were memmoved due
to incorrect array indexing and then the original locations
were zeroed with memset, again due to an array overrun.

Fix this by properly using i_df.if_bytes to determine the number
of extents, not di_nextents.

Thanks to dchinner for looking at this with me and spotting the
root cause.

[nborisov: backported to 4.4]

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Goayandi pushed a commit to Goayandi/android_kernel_xiaomi_cappu that referenced this issue Apr 4, 2018
commit ba80aa9 upstream.

This patch closes a long standing race in configfs between
the creation of a new symlink in create_link(), while the
symlink target's config_item is being concurrently removed
via configfs_rmdir().

This can happen because the symlink target's reference
is obtained by config_item_get() in create_link() before
the CONFIGFS_USET_DROPPING bit set by configfs_detach_prep()
during configfs_rmdir() shutdown is actually checked..

This originally manifested itself on ppc64 on v4.8.y under
heavy load using ibmvscsi target ports with Novalink API:

[ 7877.289863] rpadlpar_io: slot U8247.22L.212A91A-V1-C8 added
[ 7879.893760] ------------[ cut here ]------------
[ 7879.893768] WARNING: CPU: 15 PID: 17585 at ./include/linux/kref.h:46 config_item_get+0x7c/0x90 [configfs]
[ 7879.893811] CPU: 15 PID: 17585 Comm: targetcli Tainted: G           O 4.8.17-customv2.22 MiCode#12
[ 7879.893812] task: c00000018a0d3400 task.stack: c0000001f3b40000
[ 7879.893813] NIP: d000000002c664ec LR: d000000002c60980 CTR: c000000000b70870
[ 7879.893814] REGS: c0000001f3b43810 TRAP: 0700   Tainted: G O     (4.8.17-customv2.22)
[ 7879.893815] MSR: 8000000000029033 <SF,EE,ME,IR,DR,RI,LE>  CR: 28222242  XER: 00000000
[ 7879.893820] CFAR: d000000002c664bc SOFTE: 1
                GPR00: d000000002c60980 c0000001f3b43a90 d000000002c70908 c0000000fbc06820
                GPR04: c0000001ef1bd900 0000000000000004 0000000000000001 0000000000000000
                GPR08: 0000000000000000 0000000000000001 d000000002c69560 d000000002c66d80
                GPR12: c000000000b70870 c00000000e798700 c0000001f3b43ca0 c0000001d4949d40
                GPR16: c00000014637e1c0 0000000000000000 0000000000000000 c0000000f2392940
                GPR20: c0000001f3b43b98 0000000000000041 0000000000600000 0000000000000000
                GPR24: fffffffffffff000 0000000000000000 d000000002c60be0 c0000001f1dac490
                GPR28: 0000000000000004 0000000000000000 c0000001ef1bd900 c0000000f2392940
[ 7879.893839] NIP [d000000002c664ec] config_item_get+0x7c/0x90 [configfs]
[ 7879.893841] LR [d000000002c60980] check_perm+0x80/0x2e0 [configfs]
[ 7879.893842] Call Trace:
[ 7879.893844] [c0000001f3b43ac0] [d000000002c60980] check_perm+0x80/0x2e0 [configfs]
[ 7879.893847] [c0000001f3b43b10] [c000000000329770] do_dentry_open+0x2c0/0x460
[ 7879.893849] [c0000001f3b43b70] [c000000000344480] path_openat+0x210/0x1490
[ 7879.893851] [c0000001f3b43c80] [c00000000034708c] do_filp_open+0xfc/0x170
[ 7879.893853] [c0000001f3b43db0] [c00000000032b5bc] do_sys_open+0x1cc/0x390
[ 7879.893856] [c0000001f3b43e30] [c000000000009584] system_call+0x38/0xec
[ 7879.893856] Instruction dump:
[ 7879.893858] 409d0014 38210030 e8010010 7c0803a6 4e800020 3d220000 e94981e0 892a0000
[ 7879.893861] 2f890000 409effe0 39200001 992a0000 <0fe00000> 4bffffd0 60000000 60000000
[ 7879.893866] ---[ end trace 14078f0b3b5ad0aa ]---

To close this race, go ahead and obtain the symlink's target
config_item reference only after the existing CONFIGFS_USET_DROPPING
check succeeds.

This way, if configfs_rmdir() wins create_link() will return -ENONET,
and if create_link() wins configfs_rmdir() will return -EBUSY.

Reported-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
Tested-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Goayandi pushed a commit to Goayandi/android_kernel_xiaomi_cappu that referenced this issue Apr 13, 2018
commit 45caeaa upstream.

As Eric Dumazet pointed out this also needs to be fixed in IPv6.
v2: Contains the IPv6 tcp/Ipv6 dccp patches as well.

We have seen a few incidents lately where a dst_enty has been freed
with a dangling TCP socket reference (sk->sk_dst_cache) pointing to that
dst_entry. If the conditions/timings are right a crash then ensues when the
freed dst_entry is referenced later on. A Common crashing back trace is:

 MiCode#8 [] page_fault at ffffffff8163e648
    [exception RIP: __tcp_ack_snd_check+74]
.
.
 MiCode#9 [] tcp_rcv_established at ffffffff81580b64
MiCode#10 [] tcp_v4_do_rcv at ffffffff8158b54a
MiCode#11 [] tcp_v4_rcv at ffffffff8158cd02
MiCode#12 [] ip_local_deliver_finish at ffffffff815668f4
MiCode#13 [] ip_local_deliver at ffffffff81566bd9
MiCode#14 [] ip_rcv_finish at ffffffff8156656d
MiCode#15 [] ip_rcv at ffffffff81566f06
MiCode#16 [] __netif_receive_skb_core at ffffffff8152b3a2
MiCode#17 [] __netif_receive_skb at ffffffff8152b608
MiCode#18 [] netif_receive_skb at ffffffff8152b690
MiCode#19 [] vmxnet3_rq_rx_complete at ffffffffa015eeaf [vmxnet3]
MiCode#20 [] vmxnet3_poll_rx_only at ffffffffa015f32a [vmxnet3]
MiCode#21 [] net_rx_action at ffffffff8152bac2
MiCode#22 [] __do_softirq at ffffffff81084b4f
MiCode#23 [] call_softirq at ffffffff8164845c
MiCode#24 [] do_softirq at ffffffff81016fc5
MiCode#25 [] irq_exit at ffffffff81084ee5
MiCode#26 [] do_IRQ at ffffffff81648ff8

Of course it may happen with other NIC drivers as well.

It's found the freed dst_entry here:

 224 static bool tcp_in_quickack_mode(struct sock *sk)↩
 225 {↩
 226 ▹       const struct inet_connection_sock *icsk = inet_csk(sk);↩
 227 ▹       const struct dst_entry *dst = __sk_dst_get(sk);↩
 228 ↩
 229 ▹       return (dst && dst_metric(dst, RTAX_QUICKACK)) ||↩
 230 ▹       ▹       (icsk->icsk_ack.quick && !icsk->icsk_ack.pingpong);↩
 231 }↩

But there are other backtraces attributed to the same freed dst_entry in
netfilter code as well.

All the vmcores showed 2 significant clues:

- Remote hosts behind the default gateway had always been redirected to a
different gateway. A rtable/dst_entry will be added for that host. Making
more dst_entrys with lower reference counts. Making this more probable.

- All vmcores showed a postitive LockDroppedIcmps value, e.g:

LockDroppedIcmps                  267

A closer look at the tcp_v4_err() handler revealed that do_redirect() will run
regardless of whether user space has the socket locked. This can result in a
race condition where the same dst_entry cached in sk->sk_dst_entry can be
decremented twice for the same socket via:

do_redirect()->__sk_dst_check()-> dst_release().

Which leads to the dst_entry being prematurely freed with another socket
pointing to it via sk->sk_dst_cache and a subsequent crash.

To fix this skip do_redirect() if usespace has the socket locked. Instead let
the redirect take place later when user space does not have the socket
locked.

The dccp/IPv6 code is very similar in this respect, so fixing it there too.

As Eric Garver pointed out the following commit now invalidates routes. Which
can set the dst->obsolete flag so that ipv4_dst_check() returns null and
triggers the dst_release().

Fixes: ceb3320 ("ipv4: Kill routes during PMTU/redirect updates.")
Cc: Eric Garver <egarver@redhat.com>
Cc: Hannes Sowa <hsowa@redhat.com>
Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Goayandi pushed a commit to Goayandi/android_kernel_xiaomi_cappu that referenced this issue Apr 13, 2018
commit 4dfce57 upstream.

There have been several reports over the years of NULL pointer
dereferences in xfs_trans_log_inode during xfs_fsr processes,
when the process is doing an fput and tearing down extents
on the temporary inode, something like:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
PID: 29439  TASK: ffff880550584fa0  CPU: 6   COMMAND: "xfs_fsr"
    [exception RIP: xfs_trans_log_inode+0x10]
 MiCode#9 [ffff8800a57bbbe0] xfs_bunmapi at ffffffffa037398e [xfs]
MiCode#10 [ffff8800a57bbce8] xfs_itruncate_extents at ffffffffa0391b29 [xfs]
MiCode#11 [ffff8800a57bbd88] xfs_inactive_truncate at ffffffffa0391d0c [xfs]
MiCode#12 [ffff8800a57bbdb8] xfs_inactive at ffffffffa0392508 [xfs]
MiCode#13 [ffff8800a57bbdd8] xfs_fs_evict_inode at ffffffffa035907e [xfs]
MiCode#14 [ffff8800a57bbe00] evict at ffffffff811e1b67
MiCode#15 [ffff8800a57bbe28] iput at ffffffff811e23a5
MiCode#16 [ffff8800a57bbe58] dentry_kill at ffffffff811dcfc8
MiCode#17 [ffff8800a57bbe88] dput at ffffffff811dd06c
MiCode#18 [ffff8800a57bbea8] __fput at ffffffff811c823b
MiCode#19 [ffff8800a57bbef0] ____fput at ffffffff811c846e
MiCode#20 [ffff8800a57bbf00] task_work_run at ffffffff81093b27
MiCode#21 [ffff8800a57bbf30] do_notify_resume at ffffffff81013b0c
MiCode#22 [ffff8800a57bbf50] int_signal at ffffffff8161405d

As it turns out, this is because the i_itemp pointer, along
with the d_ops pointer, has been overwritten with zeros
when we tear down the extents during truncate.  When the in-core
inode fork on the temporary inode used by xfs_fsr was originally
set up during the extent swap, we mistakenly looked at di_nextents
to determine whether all extents fit inline, but this misses extents
generated by speculative preallocation; we should be using if_bytes
instead.

This mistake corrupts the in-memory inode, and code in
xfs_iext_remove_inline eventually gets bad inputs, causing
it to memmove and memset incorrect ranges; this became apparent
because the two values in ifp->if_u2.if_inline_ext[1] contained
what should have been in d_ops and i_itemp; they were memmoved due
to incorrect array indexing and then the original locations
were zeroed with memset, again due to an array overrun.

Fix this by properly using i_df.if_bytes to determine the number
of extents, not di_nextents.

Thanks to dchinner for looking at this with me and spotting the
root cause.

[nborisov: backported to 4.4]

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Goayandi pushed a commit to Goayandi/android_kernel_xiaomi_cappu that referenced this issue Apr 14, 2018
commit ba80aa9 upstream.

This patch closes a long standing race in configfs between
the creation of a new symlink in create_link(), while the
symlink target's config_item is being concurrently removed
via configfs_rmdir().

This can happen because the symlink target's reference
is obtained by config_item_get() in create_link() before
the CONFIGFS_USET_DROPPING bit set by configfs_detach_prep()
during configfs_rmdir() shutdown is actually checked..

This originally manifested itself on ppc64 on v4.8.y under
heavy load using ibmvscsi target ports with Novalink API:

[ 7877.289863] rpadlpar_io: slot U8247.22L.212A91A-V1-C8 added
[ 7879.893760] ------------[ cut here ]------------
[ 7879.893768] WARNING: CPU: 15 PID: 17585 at ./include/linux/kref.h:46 config_item_get+0x7c/0x90 [configfs]
[ 7879.893811] CPU: 15 PID: 17585 Comm: targetcli Tainted: G           O 4.8.17-customv2.22 MiCode#12
[ 7879.893812] task: c00000018a0d3400 task.stack: c0000001f3b40000
[ 7879.893813] NIP: d000000002c664ec LR: d000000002c60980 CTR: c000000000b70870
[ 7879.893814] REGS: c0000001f3b43810 TRAP: 0700   Tainted: G O     (4.8.17-customv2.22)
[ 7879.893815] MSR: 8000000000029033 <SF,EE,ME,IR,DR,RI,LE>  CR: 28222242  XER: 00000000
[ 7879.893820] CFAR: d000000002c664bc SOFTE: 1
                GPR00: d000000002c60980 c0000001f3b43a90 d000000002c70908 c0000000fbc06820
                GPR04: c0000001ef1bd900 0000000000000004 0000000000000001 0000000000000000
                GPR08: 0000000000000000 0000000000000001 d000000002c69560 d000000002c66d80
                GPR12: c000000000b70870 c00000000e798700 c0000001f3b43ca0 c0000001d4949d40
                GPR16: c00000014637e1c0 0000000000000000 0000000000000000 c0000000f2392940
                GPR20: c0000001f3b43b98 0000000000000041 0000000000600000 0000000000000000
                GPR24: fffffffffffff000 0000000000000000 d000000002c60be0 c0000001f1dac490
                GPR28: 0000000000000004 0000000000000000 c0000001ef1bd900 c0000000f2392940
[ 7879.893839] NIP [d000000002c664ec] config_item_get+0x7c/0x90 [configfs]
[ 7879.893841] LR [d000000002c60980] check_perm+0x80/0x2e0 [configfs]
[ 7879.893842] Call Trace:
[ 7879.893844] [c0000001f3b43ac0] [d000000002c60980] check_perm+0x80/0x2e0 [configfs]
[ 7879.893847] [c0000001f3b43b10] [c000000000329770] do_dentry_open+0x2c0/0x460
[ 7879.893849] [c0000001f3b43b70] [c000000000344480] path_openat+0x210/0x1490
[ 7879.893851] [c0000001f3b43c80] [c00000000034708c] do_filp_open+0xfc/0x170
[ 7879.893853] [c0000001f3b43db0] [c00000000032b5bc] do_sys_open+0x1cc/0x390
[ 7879.893856] [c0000001f3b43e30] [c000000000009584] system_call+0x38/0xec
[ 7879.893856] Instruction dump:
[ 7879.893858] 409d0014 38210030 e8010010 7c0803a6 4e800020 3d220000 e94981e0 892a0000
[ 7879.893861] 2f890000 409effe0 39200001 992a0000 <0fe00000> 4bffffd0 60000000 60000000
[ 7879.893866] ---[ end trace 14078f0b3b5ad0aa ]---

To close this race, go ahead and obtain the symlink's target
config_item reference only after the existing CONFIGFS_USET_DROPPING
check succeeds.

This way, if configfs_rmdir() wins create_link() will return -ENONET,
and if create_link() wins configfs_rmdir() will return -EBUSY.

Reported-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
Tested-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Goayandi pushed a commit to Goayandi/android_kernel_xiaomi_cappu that referenced this issue Aug 12, 2018
[ Upstream commit 2bbea6e ]

when mounting an ISO filesystem sometimes (very rarely)
the system hangs because of a race condition between two tasks.

PID: 6766   TASK: ffff88007b2a6dd0  CPU: 0   COMMAND: "mount"
 #0 [ffff880078447ae0] __schedule at ffffffff8168d605
 MiCode#1 [ffff880078447b48] schedule_preempt_disabled at ffffffff8168ed49
 MiCode#2 [ffff880078447b58] __mutex_lock_slowpath at ffffffff8168c995
 MiCode#3 [ffff880078447bb8] mutex_lock at ffffffff8168bdef
 MiCode#4 [ffff880078447bd0] sr_block_ioctl at ffffffffa00b6818 [sr_mod]
 MiCode#5 [ffff880078447c10] blkdev_ioctl at ffffffff812fea50
 MiCode#6 [ffff880078447c70] ioctl_by_bdev at ffffffff8123a8b3
 MiCode#7 [ffff880078447c90] isofs_fill_super at ffffffffa04fb1e1 [isofs]
 MiCode#8 [ffff880078447da8] mount_bdev at ffffffff81202570
 MiCode#9 [ffff880078447e18] isofs_mount at ffffffffa04f9828 [isofs]
MiCode#10 [ffff880078447e28] mount_fs at ffffffff81202d09
MiCode#11 [ffff880078447e70] vfs_kern_mount at ffffffff8121ea8f
MiCode#12 [ffff880078447ea8] do_mount at ffffffff81220fee
MiCode#13 [ffff880078447f28] sys_mount at ffffffff812218d6
MiCode#14 [ffff880078447f80] system_call_fastpath at ffffffff81698c49
    RIP: 00007fd9ea914e9a  RSP: 00007ffd5d9bf648  RFLAGS: 00010246
    RAX: 00000000000000a5  RBX: ffffffff81698c49  RCX: 0000000000000010
    RDX: 00007fd9ec2bc210  RSI: 00007fd9ec2bc290  RDI: 00007fd9ec2bcf30
    RBP: 0000000000000000   R8: 0000000000000000   R9: 0000000000000010
    R10: 00000000c0ed0001  R11: 0000000000000206  R12: 00007fd9ec2bc040
    R13: 00007fd9eb6b2380  R14: 00007fd9ec2bc210  R15: 00007fd9ec2bcf30
    ORIG_RAX: 00000000000000a5  CS: 0033  SS: 002b

This task was trying to mount the cdrom.  It allocated and configured a
super_block struct and owned the write-lock for the super_block->s_umount
rwsem. While exclusively owning the s_umount lock, it called
sr_block_ioctl and waited to acquire the global sr_mutex lock.

PID: 6785   TASK: ffff880078720fb0  CPU: 0   COMMAND: "systemd-udevd"
 #0 [ffff880078417898] __schedule at ffffffff8168d605
 MiCode#1 [ffff880078417900] schedule at ffffffff8168dc59
 MiCode#2 [ffff880078417910] rwsem_down_read_failed at ffffffff8168f605
 MiCode#3 [ffff880078417980] call_rwsem_down_read_failed at ffffffff81328838
 MiCode#4 [ffff8800784179d0] down_read at ffffffff8168cde0
 MiCode#5 [ffff8800784179e8] get_super at ffffffff81201cc7
 MiCode#6 [ffff880078417a10] __invalidate_device at ffffffff8123a8de
 MiCode#7 [ffff880078417a40] flush_disk at ffffffff8123a94b
 MiCode#8 [ffff880078417a88] check_disk_change at ffffffff8123ab50
 MiCode#9 [ffff880078417ab0] cdrom_open at ffffffffa00a29e1 [cdrom]
MiCode#10 [ffff880078417b68] sr_block_open at ffffffffa00b6f9b [sr_mod]
MiCode#11 [ffff880078417b98] __blkdev_get at ffffffff8123ba86
MiCode#12 [ffff880078417bf0] blkdev_get at ffffffff8123bd65
MiCode#13 [ffff880078417c78] blkdev_open at ffffffff8123bf9b
MiCode#14 [ffff880078417c90] do_dentry_open at ffffffff811fc7f7
MiCode#15 [ffff880078417cd8] vfs_open at ffffffff811fc9cf
MiCode#16 [ffff880078417d00] do_last at ffffffff8120d53d
MiCode#17 [ffff880078417db0] path_openat at ffffffff8120e6b2
MiCode#18 [ffff880078417e48] do_filp_open at ffffffff8121082b
MiCode#19 [ffff880078417f18] do_sys_open at ffffffff811fdd33
MiCode#20 [ffff880078417f70] sys_open at ffffffff811fde4e
MiCode#21 [ffff880078417f80] system_call_fastpath at ffffffff81698c49
    RIP: 00007f29438b0c20  RSP: 00007ffc76624b78  RFLAGS: 00010246
    RAX: 0000000000000002  RBX: ffffffff81698c49  RCX: 0000000000000000
    RDX: 00007f2944a5fa70  RSI: 00000000000a0800  RDI: 00007f2944a5fa70
    RBP: 00007f2944a5f540   R8: 0000000000000000   R9: 0000000000000020
    R10: 00007f2943614c40  R11: 0000000000000246  R12: ffffffff811fde4e
    R13: ffff880078417f78  R14: 000000000000000c  R15: 00007f2944a4b010
    ORIG_RAX: 0000000000000002  CS: 0033  SS: 002b

This task tried to open the cdrom device, the sr_block_open function
acquired the global sr_mutex lock. The call to check_disk_change()
then saw an event flag indicating a possible media change and tried
to flush any cached data for the device.
As part of the flush, it tried to acquire the super_block->s_umount
lock associated with the cdrom device.
This was the same super_block as created and locked by the previous task.

The first task acquires the s_umount lock and then the sr_mutex_lock;
the second task acquires the sr_mutex_lock and then the s_umount lock.

This patch fixes the issue by moving check_disk_change() out of
cdrom_open() and let the caller take care of it.

Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Ahmed-Hady pushed a commit to Ahmed-Hady/android_xiaomi_daisy that referenced this issue Nov 18, 2018
[ Upstream commit 2bbea6e ]

when mounting an ISO filesystem sometimes (very rarely)
the system hangs because of a race condition between two tasks.

PID: 6766   TASK: ffff88007b2a6dd0  CPU: 0   COMMAND: "mount"
 #0 [ffff880078447ae0] __schedule at ffffffff8168d605
 #1 [ffff880078447b48] schedule_preempt_disabled at ffffffff8168ed49
 MiCode#2 [ffff880078447b58] __mutex_lock_slowpath at ffffffff8168c995
 MiCode#3 [ffff880078447bb8] mutex_lock at ffffffff8168bdef
 MiCode#4 [ffff880078447bd0] sr_block_ioctl at ffffffffa00b6818 [sr_mod]
 MiCode#5 [ffff880078447c10] blkdev_ioctl at ffffffff812fea50
 MiCode#6 [ffff880078447c70] ioctl_by_bdev at ffffffff8123a8b3
 MiCode#7 [ffff880078447c90] isofs_fill_super at ffffffffa04fb1e1 [isofs]
 MiCode#8 [ffff880078447da8] mount_bdev at ffffffff81202570
 MiCode#9 [ffff880078447e18] isofs_mount at ffffffffa04f9828 [isofs]
MiCode#10 [ffff880078447e28] mount_fs at ffffffff81202d09
MiCode#11 [ffff880078447e70] vfs_kern_mount at ffffffff8121ea8f
MiCode#12 [ffff880078447ea8] do_mount at ffffffff81220fee
MiCode#13 [ffff880078447f28] sys_mount at ffffffff812218d6
MiCode#14 [ffff880078447f80] system_call_fastpath at ffffffff81698c49
    RIP: 00007fd9ea914e9a  RSP: 00007ffd5d9bf648  RFLAGS: 00010246
    RAX: 00000000000000a5  RBX: ffffffff81698c49  RCX: 0000000000000010
    RDX: 00007fd9ec2bc210  RSI: 00007fd9ec2bc290  RDI: 00007fd9ec2bcf30
    RBP: 0000000000000000   R8: 0000000000000000   R9: 0000000000000010
    R10: 00000000c0ed0001  R11: 0000000000000206  R12: 00007fd9ec2bc040
    R13: 00007fd9eb6b2380  R14: 00007fd9ec2bc210  R15: 00007fd9ec2bcf30
    ORIG_RAX: 00000000000000a5  CS: 0033  SS: 002b

This task was trying to mount the cdrom.  It allocated and configured a
super_block struct and owned the write-lock for the super_block->s_umount
rwsem. While exclusively owning the s_umount lock, it called
sr_block_ioctl and waited to acquire the global sr_mutex lock.

PID: 6785   TASK: ffff880078720fb0  CPU: 0   COMMAND: "systemd-udevd"
 #0 [ffff880078417898] __schedule at ffffffff8168d605
 #1 [ffff880078417900] schedule at ffffffff8168dc59
 MiCode#2 [ffff880078417910] rwsem_down_read_failed at ffffffff8168f605
 MiCode#3 [ffff880078417980] call_rwsem_down_read_failed at ffffffff81328838
 MiCode#4 [ffff8800784179d0] down_read at ffffffff8168cde0
 MiCode#5 [ffff8800784179e8] get_super at ffffffff81201cc7
 MiCode#6 [ffff880078417a10] __invalidate_device at ffffffff8123a8de
 MiCode#7 [ffff880078417a40] flush_disk at ffffffff8123a94b
 MiCode#8 [ffff880078417a88] check_disk_change at ffffffff8123ab50
 MiCode#9 [ffff880078417ab0] cdrom_open at ffffffffa00a29e1 [cdrom]
MiCode#10 [ffff880078417b68] sr_block_open at ffffffffa00b6f9b [sr_mod]
MiCode#11 [ffff880078417b98] __blkdev_get at ffffffff8123ba86
MiCode#12 [ffff880078417bf0] blkdev_get at ffffffff8123bd65
MiCode#13 [ffff880078417c78] blkdev_open at ffffffff8123bf9b
MiCode#14 [ffff880078417c90] do_dentry_open at ffffffff811fc7f7
MiCode#15 [ffff880078417cd8] vfs_open at ffffffff811fc9cf
MiCode#16 [ffff880078417d00] do_last at ffffffff8120d53d
MiCode#17 [ffff880078417db0] path_openat at ffffffff8120e6b2
MiCode#18 [ffff880078417e48] do_filp_open at ffffffff8121082b
MiCode#19 [ffff880078417f18] do_sys_open at ffffffff811fdd33
MiCode#20 [ffff880078417f70] sys_open at ffffffff811fde4e
MiCode#21 [ffff880078417f80] system_call_fastpath at ffffffff81698c49
    RIP: 00007f29438b0c20  RSP: 00007ffc76624b78  RFLAGS: 00010246
    RAX: 0000000000000002  RBX: ffffffff81698c49  RCX: 0000000000000000
    RDX: 00007f2944a5fa70  RSI: 00000000000a0800  RDI: 00007f2944a5fa70
    RBP: 00007f2944a5f540   R8: 0000000000000000   R9: 0000000000000020
    R10: 00007f2943614c40  R11: 0000000000000246  R12: ffffffff811fde4e
    R13: ffff880078417f78  R14: 000000000000000c  R15: 00007f2944a4b010
    ORIG_RAX: 0000000000000002  CS: 0033  SS: 002b

This task tried to open the cdrom device, the sr_block_open function
acquired the global sr_mutex lock. The call to check_disk_change()
then saw an event flag indicating a possible media change and tried
to flush any cached data for the device.
As part of the flush, it tried to acquire the super_block->s_umount
lock associated with the cdrom device.
This was the same super_block as created and locked by the previous task.

The first task acquires the s_umount lock and then the sr_mutex_lock;
the second task acquires the sr_mutex_lock and then the s_umount lock.

This patch fixes the issue by moving check_disk_change() out of
cdrom_open() and let the caller take care of it.

Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Ahmed-Hady pushed a commit to Ahmed-Hady/android_xiaomi_daisy that referenced this issue Dec 19, 2018
commit a007734 upstream.

The bus master was not removed after unloading the module
or unbinding the driver. That lead to oopses like this

[  127.842987] Unable to handle kernel paging request at virtual address bf01d04c
[  127.850646] pgd = 70e3cd9a
[  127.853698] [bf01d04c] *pgd=8f908811, *pte=00000000, *ppte=00000000
[  127.860412] Internal error: Oops: 80000007 [#1] PREEMPT SMP ARM
[  127.866668] Modules linked in: bq27xxx_battery overlay [last unloaded: omap_hdq]
[  127.874542] CPU: 0 PID: 1022 Comm: w1_bus_master1 Not tainted 4.19.0-rc4-00001-g2d51da718324 MiCode#12
[  127.883819] Hardware name: Generic OMAP36xx (Flattened Device Tree)
[  127.890441] PC is at 0xbf01d04c
[  127.893798] LR is at w1_search_process_cb+0x4c/0xfc
[  127.898956] pc : [<bf01d04c>]    lr : [<c05f9580>]    psr: a0070013
[  127.905609] sp : cf885f48  ip : bf01d04c  fp : ddf1e11c
[  127.911132] r10: cf8fe040  r9 : c05f8d00  r8 : cf8fe040
[  127.916656] r7 : 000000f0  r6 : cf8fe02c  r5 : cf8fe000  r4 : cf8fe01c
[  127.923553] r3 : c05f8d00  r2 : 000000f0  r1 : cf8fe000  r0 : dde1ef10
[  127.930450] Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
[  127.938018] Control: 10c5387d  Table: 8f8f0019  DAC: 00000051
[  127.944091] Process w1_bus_master1 (pid: 1022, stack limit = 0x9135699f)
[  127.951171] Stack: (0xcf885f48 to 0xcf886000)
[  127.955810] 5f40:                   cf8fe000 00000000 cf884000 cf8fe090 000003e8 c05f8d00
[  127.964477] 5f60: dde5fc34 c05f9700 ddf1e100 ddf1e540 cf884000 cf8fe000 c05f9694 00000000
[  127.973114] 5f80: dde5fc34 c01499a4 00000000 ddf1e540 c0149874 00000000 00000000 00000000
[  127.981781] 5fa0: 00000000 00000000 00000000 c01010e8 00000000 00000000 00000000 00000000
[  127.990447] 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[  127.999114] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
[  128.007781] [<c05f9580>] (w1_search_process_cb) from [<c05f9700>] (w1_process+0x6c/0x118)
[  128.016479] [<c05f9700>] (w1_process) from [<c01499a4>] (kthread+0x130/0x148)
[  128.024047] [<c01499a4>] (kthread) from [<c01010e8>] (ret_from_fork+0x14/0x2c)
[  128.031677] Exception stack(0xcf885fb0 to 0xcf885ff8)
[  128.037017] 5fa0:                                     00000000 00000000 00000000 00000000
[  128.045684] 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[  128.054351] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[  128.061340] Code: bad PC value
[  128.064697] ---[ end trace af066e33c0e14119 ]---

Cc: <stable@vger.kernel.org>
Signed-off-by: Andreas Kemnade <andreas@kemnade.info>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Ahmed-Hady pushed a commit to Ahmed-Hady/android_xiaomi_daisy that referenced this issue Dec 19, 2018
[ Upstream commit c5a94f4 ]

It was observed that a process blocked indefintely in
__fscache_read_or_alloc_page(), waiting for FSCACHE_COOKIE_LOOKING_UP
to be cleared via fscache_wait_for_deferred_lookup().

At this time, ->backing_objects was empty, which would normaly prevent
__fscache_read_or_alloc_page() from getting to the point of waiting.
This implies that ->backing_objects was cleared *after*
__fscache_read_or_alloc_page was was entered.

When an object is "killed" and then "dropped",
FSCACHE_COOKIE_LOOKING_UP is cleared in fscache_lookup_failure(), then
KILL_OBJECT and DROP_OBJECT are "called" and only in DROP_OBJECT is
->backing_objects cleared.  This leaves a window where
something else can set FSCACHE_COOKIE_LOOKING_UP and
__fscache_read_or_alloc_page() can start waiting, before
->backing_objects is cleared

There is some uncertainty in this analysis, but it seems to be fit the
observations.  Adding the wake in this patch will be handled correctly
by __fscache_read_or_alloc_page(), as it checks if ->backing_objects
is empty again, after waiting.

Customer which reported the hang, also report that the hang cannot be
reproduced with this fix.

The backtrace for the blocked process looked like:

PID: 29360  TASK: ffff881ff2ac0f80  CPU: 3   COMMAND: "zsh"
 #0 [ffff881ff43efbf8] schedule at ffffffff815e56f1
 #1 [ffff881ff43efc58] bit_wait at ffffffff815e64ed
 MiCode#2 [ffff881ff43efc68] __wait_on_bit at ffffffff815e61b8
 MiCode#3 [ffff881ff43efca0] out_of_line_wait_on_bit at ffffffff815e625e
 MiCode#4 [ffff881ff43efd08] fscache_wait_for_deferred_lookup at ffffffffa04f2e8f [fscache]
 MiCode#5 [ffff881ff43efd18] __fscache_read_or_alloc_page at ffffffffa04f2ffe [fscache]
 MiCode#6 [ffff881ff43efd58] __nfs_readpage_from_fscache at ffffffffa0679668 [nfs]
 MiCode#7 [ffff881ff43efd78] nfs_readpage at ffffffffa067092b [nfs]
 MiCode#8 [ffff881ff43efda0] generic_file_read_iter at ffffffff81187a73
 MiCode#9 [ffff881ff43efe50] nfs_file_read at ffffffffa066544b [nfs]
MiCode#10 [ffff881ff43efe70] __vfs_read at ffffffff811fc756
MiCode#11 [ffff881ff43efee8] vfs_read at ffffffff811fccfa
MiCode#12 [ffff881ff43eff18] sys_read at ffffffff811fda62
MiCode#13 [ffff881ff43eff50] entry_SYSCALL_64_fastpath at ffffffff815e986e

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Azizb750 pushed a commit to Azizb750/android_kernel_xiaomi_hermes-1 that referenced this issue Jan 16, 2019
When we try to open a file with O_TMPFILE flag, we will trigger a bug.
The root cause is that in ext4_orphan_add() we check ->i_nlink == 0 and
this check always fails because we set ->i_nlink = 1 in
inode_init_always().  We can use the following program to trigger it:

int main(int argc, char *argv[])
{
	int fd;

	fd = open(argv[1], O_TMPFILE, 0666);
	if (fd < 0) {
		perror("open ");
		return -1;
	}
	close(fd);
	return 0;
}

The oops message looks like this:

kernel BUG at fs/ext4/namei.c:2572!
invalid opcode: 0000 [MiCode#1] PREEMPT SMP DEBUG_PAGEALLOC
Modules linked in: dlci bridge stp hidp cmtp kernelcapi l2tp_ppp l2tp_netlink l2tp_core sctp libcrc32c rfcomm tun fuse nfnetli
nk can_raw ipt_ULOG can_bcm x25 scsi_transport_iscsi ipx p8023 p8022 appletalk phonet psnap vmw_vsock_vmci_transport af_key vmw_vmci rose vsock atm can netrom ax25 af_rxrpc ir
da pppoe pppox ppp_generic slhc bluetooth nfc rfkill rds caif_socket caif crc_ccitt af_802154 llc2 llc snd_hda_codec_realtek snd_hda_intel snd_hda_codec serio_raw snd_pcm pcsp
kr edac_core snd_page_alloc snd_timer snd soundcore r8169 mii sr_mod cdrom pata_atiixp radeon backlight drm_kms_helper ttm
CPU: 1 PID: 1812571 Comm: trinity-child2 Not tainted 3.11.0-rc1+ MiCode#12
Hardware name: Gigabyte Technology Co., Ltd. GA-MA78GM-S2H/GA-MA78GM-S2H, BIOS F12a 04/23/2010
task: ffff88007dfe69a0 ti: ffff88010f7b6000 task.ti: ffff88010f7b6000
RIP: 0010:[<ffffffff8125ce69>]  [<ffffffff8125ce69>] ext4_orphan_add+0x299/0x2b0
RSP: 0018:ffff88010f7b7cf8  EFLAGS: 00010202
RAX: 0000000000000000 RBX: ffff8800966d3020 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff88007dfe70b8 RDI: 0000000000000001
RBP: ffff88010f7b7d40 R08: ffff880126a3c4e0 R09: ffff88010f7b7ca0
R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801271fd668
R13: ffff8800966d2f78 R14: ffff88011d7089f0 R15: ffff88007dfe69a0
FS:  00007f70441a3740(0000) GS:ffff88012a800000(0000) knlGS:00000000f77c96c0
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000002834000 CR3: 0000000107964000 CR4: 00000000000007e0
DR0: 0000000000780000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
Stack:
 0000000000002000 00000020810b6dde 0000000000000000 ffff88011d46db00
 ffff8800966d3020 ffff88011d7089f0 ffff88009c7f4c10 ffff88010f7b7f2c
 ffff88007dfe69a0 ffff88010f7b7da8 ffffffff8125cfac ffff880100000004
Call Trace:
 [<ffffffff8125cfac>] ext4_tmpfile+0x12c/0x180
 [<ffffffff811cba78>] path_openat+0x238/0x700
 [<ffffffff8100afc4>] ? native_sched_clock+0x24/0x80
 [<ffffffff811cc647>] do_filp_open+0x47/0xa0
 [<ffffffff811db73f>] ? __alloc_fd+0xaf/0x200
 [<ffffffff811ba2e4>] do_sys_open+0x124/0x210
 [<ffffffff81010725>] ? syscall_trace_enter+0x25/0x290
 [<ffffffff811ba3ee>] SyS_open+0x1e/0x20
 [<ffffffff816ca8d4>] tracesys+0xdd/0xe2
 [<ffffffff81001001>] ? start_thread_common.constprop.6+0x1/0xa0
Code: 04 00 00 00 89 04 24 31 c0 e8 c4 77 04 00 e9 43 fe ff ff 66 25 00 d0 66 3d 00 80 0f 84 0e fe ff ff 83 7b 48 00 0f 84 04 fe ff ff <0f> 0b 49 8b 8c 24 50 07 00 00 e9 88 fe ff ff 0f 1f 84 00 00 00

Here we couldn't call clear_nlink() directly because in d_tmpfile() we
will call inode_dec_link_count() to decrease ->i_nlink.  So this commit
tries to call d_tmpfile() before ext4_orphan_add() to fix this problem.

Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Tested-by: Darrick J. Wong <darrick.wong@oracle.com>
Tested-by: Dave Jones <davej@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Ahmed-Hady pushed a commit to Ahmed-Hady/android_xiaomi_daisy that referenced this issue May 10, 2019
[ Upstream commit d982b33 ]

  =================================================================
  ==20875==ERROR: LeakSanitizer: detected memory leaks

  Direct leak of 1160 byte(s) in 1 object(s) allocated from:
      #0 0x7f1b6fc84138 in calloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xee138)
      #1 0x55bd50005599 in zalloc util/util.h:23
      MiCode#2 0x55bd500068f5 in perf_evsel__newtp_idx util/evsel.c:327
      MiCode#3 0x55bd4ff810fc in perf_evsel__newtp /home/work/linux/tools/perf/util/evsel.h:216
      MiCode#4 0x55bd4ff81608 in test__perf_evsel__tp_sched_test tests/evsel-tp-sched.c:69
      MiCode#5 0x55bd4ff528e6 in run_test tests/builtin-test.c:358
      MiCode#6 0x55bd4ff52baf in test_and_print tests/builtin-test.c:388
      MiCode#7 0x55bd4ff543fe in __cmd_test tests/builtin-test.c:583
      MiCode#8 0x55bd4ff5572f in cmd_test tests/builtin-test.c:722
      MiCode#9 0x55bd4ffc4087 in run_builtin /home/changbin/work/linux/tools/perf/perf.c:302
      MiCode#10 0x55bd4ffc45c6 in handle_internal_command /home/changbin/work/linux/tools/perf/perf.c:354
      MiCode#11 0x55bd4ffc49ca in run_argv /home/changbin/work/linux/tools/perf/perf.c:398
      MiCode#12 0x55bd4ffc5138 in main /home/changbin/work/linux/tools/perf/perf.c:520
      MiCode#13 0x7f1b6e34809a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a)

  Indirect leak of 19 byte(s) in 1 object(s) allocated from:
      #0 0x7f1b6fc83f30 in __interceptor_malloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xedf30)
      #1 0x7f1b6e3ac30f in vasprintf (/lib/x86_64-linux-gnu/libc.so.6+0x8830f)

Signed-off-by: Changbin Du <changbin.du@gmail.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Fixes: 6a6cd11 ("perf test: Add test for the sched tracepoint format fields")
Link: http://lkml.kernel.org/r/20190316080556.3075-17-changbin.du@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Ahmed-Hady pushed a commit to Ahmed-Hady/android_xiaomi_daisy that referenced this issue Jun 11, 2019
[ Upstream commit 5712f33 ]

The spinlock in the raw3270_view structure is used by con3270, tty3270
and fs3270 in different ways. For con3270 the lock can be acquired in
irq context, for tty3270 and fs3270 the highest context is bh.

Lockdep sees the view->lock as a single class and if the 3270 driver
is used for the console the following message is generated:

WARNING: inconsistent lock state
5.1.0-rc3-05157-g5c168033979d MiCode#12 Not tainted
--------------------------------
inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
swapper/0/1 [HC0[0]:SC1[1]:HE1:SE0] takes:
(____ptrval____) (&(&view->lock)->rlock){?.-.}, at: tty3270_update+0x7c/0x330

Introduce a lockdep subclass for the view lock to distinguish bh from
irq locks.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

Signed-off-by: Sasha Levin <sashal@kernel.org>
pix106 pushed a commit to pix106/android_kernel_xiaomi that referenced this issue Jun 15, 2019
[ Upstream commit 42dfa45 ]

Using gcc's ASan, Changbin reports:

  =================================================================
  ==7494==ERROR: LeakSanitizer: detected memory leaks

  Direct leak of 48 byte(s) in 1 object(s) allocated from:
      #0 0x7f0333a89138 in calloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xee138)
      MiCode#1 0x5625e5330a5e in zalloc util/util.h:23
      MiCode#2 0x5625e5330a9b in perf_counts__new util/counts.c:10
      MiCode#3 0x5625e5330ca0 in perf_evsel__alloc_counts util/counts.c:47
      MiCode#4 0x5625e520d8e5 in __perf_evsel__read_on_cpu util/evsel.c:1505
      MiCode#5 0x5625e517a985 in perf_evsel__read_on_cpu /home/work/linux/tools/perf/util/evsel.h:347
      MiCode#6 0x5625e517ad1a in test__openat_syscall_event tests/openat-syscall.c:47
      MiCode#7 0x5625e51528e6 in run_test tests/builtin-test.c:358
      MiCode#8 0x5625e5152baf in test_and_print tests/builtin-test.c:388
      MiCode#9 0x5625e51543fe in __cmd_test tests/builtin-test.c:583
      MiCode#10 0x5625e515572f in cmd_test tests/builtin-test.c:722
      MiCode#11 0x5625e51c3fb8 in run_builtin /home/changbin/work/linux/tools/perf/perf.c:302
      MiCode#12 0x5625e51c44f7 in handle_internal_command /home/changbin/work/linux/tools/perf/perf.c:354
      MiCode#13 0x5625e51c48fb in run_argv /home/changbin/work/linux/tools/perf/perf.c:398
      MiCode#14 0x5625e51c5069 in main /home/changbin/work/linux/tools/perf/perf.c:520
      MiCode#15 0x7f033214d09a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a)

  Indirect leak of 72 byte(s) in 1 object(s) allocated from:
      #0 0x7f0333a89138 in calloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xee138)
      MiCode#1 0x5625e532560d in zalloc util/util.h:23
      MiCode#2 0x5625e532566b in xyarray__new util/xyarray.c:10
      MiCode#3 0x5625e5330aba in perf_counts__new util/counts.c:15
      MiCode#4 0x5625e5330ca0 in perf_evsel__alloc_counts util/counts.c:47
      MiCode#5 0x5625e520d8e5 in __perf_evsel__read_on_cpu util/evsel.c:1505
      MiCode#6 0x5625e517a985 in perf_evsel__read_on_cpu /home/work/linux/tools/perf/util/evsel.h:347
      MiCode#7 0x5625e517ad1a in test__openat_syscall_event tests/openat-syscall.c:47
      MiCode#8 0x5625e51528e6 in run_test tests/builtin-test.c:358
      MiCode#9 0x5625e5152baf in test_and_print tests/builtin-test.c:388
      MiCode#10 0x5625e51543fe in __cmd_test tests/builtin-test.c:583
      MiCode#11 0x5625e515572f in cmd_test tests/builtin-test.c:722
      MiCode#12 0x5625e51c3fb8 in run_builtin /home/changbin/work/linux/tools/perf/perf.c:302
      MiCode#13 0x5625e51c44f7 in handle_internal_command /home/changbin/work/linux/tools/perf/perf.c:354
      MiCode#14 0x5625e51c48fb in run_argv /home/changbin/work/linux/tools/perf/perf.c:398
      MiCode#15 0x5625e51c5069 in main /home/changbin/work/linux/tools/perf/perf.c:520
      MiCode#16 0x7f033214d09a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a)

His patch took care of evsel->prev_raw_counts, but the above backtraces
are about evsel->counts, so fix that instead.

Reported-by: Changbin Du <changbin.du@gmail.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Link: https://lkml.kernel.org/n/tip-hd1x13g59f0nuhe4anxhsmfp@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
pix106 pushed a commit to pix106/android_kernel_xiaomi that referenced this issue Jun 15, 2019
…_event_on_all_cpus test

[ Upstream commit 93faa52 ]

  =================================================================
  ==7497==ERROR: LeakSanitizer: detected memory leaks

  Direct leak of 40 byte(s) in 1 object(s) allocated from:
      #0 0x7f0333a88f30 in __interceptor_malloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xedf30)
      MiCode#1 0x5625e5326213 in cpu_map__trim_new util/cpumap.c:45
      MiCode#2 0x5625e5326703 in cpu_map__read util/cpumap.c:103
      MiCode#3 0x5625e53267ef in cpu_map__read_all_cpu_map util/cpumap.c:120
      MiCode#4 0x5625e5326915 in cpu_map__new util/cpumap.c:135
      MiCode#5 0x5625e517b355 in test__openat_syscall_event_on_all_cpus tests/openat-syscall-all-cpus.c:36
      MiCode#6 0x5625e51528e6 in run_test tests/builtin-test.c:358
      MiCode#7 0x5625e5152baf in test_and_print tests/builtin-test.c:388
      MiCode#8 0x5625e51543fe in __cmd_test tests/builtin-test.c:583
      MiCode#9 0x5625e515572f in cmd_test tests/builtin-test.c:722
      MiCode#10 0x5625e51c3fb8 in run_builtin /home/changbin/work/linux/tools/perf/perf.c:302
      MiCode#11 0x5625e51c44f7 in handle_internal_command /home/changbin/work/linux/tools/perf/perf.c:354
      MiCode#12 0x5625e51c48fb in run_argv /home/changbin/work/linux/tools/perf/perf.c:398
      MiCode#13 0x5625e51c5069 in main /home/changbin/work/linux/tools/perf/perf.c:520
      MiCode#14 0x7f033214d09a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a)

Signed-off-by: Changbin Du <changbin.du@gmail.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Fixes: f30a79b ("perf tools: Add reference counting for cpu_map object")
Link: http://lkml.kernel.org/r/20190316080556.3075-15-changbin.du@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
pix106 pushed a commit to pix106/android_kernel_xiaomi that referenced this issue Jun 15, 2019
[ Upstream commit d982b33 ]

  =================================================================
  ==20875==ERROR: LeakSanitizer: detected memory leaks

  Direct leak of 1160 byte(s) in 1 object(s) allocated from:
      #0 0x7f1b6fc84138 in calloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xee138)
      MiCode#1 0x55bd50005599 in zalloc util/util.h:23
      MiCode#2 0x55bd500068f5 in perf_evsel__newtp_idx util/evsel.c:327
      MiCode#3 0x55bd4ff810fc in perf_evsel__newtp /home/work/linux/tools/perf/util/evsel.h:216
      MiCode#4 0x55bd4ff81608 in test__perf_evsel__tp_sched_test tests/evsel-tp-sched.c:69
      MiCode#5 0x55bd4ff528e6 in run_test tests/builtin-test.c:358
      MiCode#6 0x55bd4ff52baf in test_and_print tests/builtin-test.c:388
      MiCode#7 0x55bd4ff543fe in __cmd_test tests/builtin-test.c:583
      MiCode#8 0x55bd4ff5572f in cmd_test tests/builtin-test.c:722
      MiCode#9 0x55bd4ffc4087 in run_builtin /home/changbin/work/linux/tools/perf/perf.c:302
      MiCode#10 0x55bd4ffc45c6 in handle_internal_command /home/changbin/work/linux/tools/perf/perf.c:354
      MiCode#11 0x55bd4ffc49ca in run_argv /home/changbin/work/linux/tools/perf/perf.c:398
      MiCode#12 0x55bd4ffc5138 in main /home/changbin/work/linux/tools/perf/perf.c:520
      MiCode#13 0x7f1b6e34809a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a)

  Indirect leak of 19 byte(s) in 1 object(s) allocated from:
      #0 0x7f1b6fc83f30 in __interceptor_malloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xedf30)
      MiCode#1 0x7f1b6e3ac30f in vasprintf (/lib/x86_64-linux-gnu/libc.so.6+0x8830f)

Signed-off-by: Changbin Du <changbin.du@gmail.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Fixes: 6a6cd11 ("perf test: Add test for the sched tracepoint format fields")
Link: http://lkml.kernel.org/r/20190316080556.3075-17-changbin.du@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
pix106 pushed a commit to pix106/android_kernel_xiaomi that referenced this issue Jun 15, 2019
[ Upstream commit 5712f33 ]

The spinlock in the raw3270_view structure is used by con3270, tty3270
and fs3270 in different ways. For con3270 the lock can be acquired in
irq context, for tty3270 and fs3270 the highest context is bh.

Lockdep sees the view->lock as a single class and if the 3270 driver
is used for the console the following message is generated:

WARNING: inconsistent lock state
5.1.0-rc3-05157-g5c168033979d MiCode#12 Not tainted
--------------------------------
inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
swapper/0/1 [HC0[0]:SC1[1]:HE1:SE0] takes:
(____ptrval____) (&(&view->lock)->rlock){?.-.}, at: tty3270_update+0x7c/0x330

Introduce a lockdep subclass for the view lock to distinguish bh from
irq locks.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

Signed-off-by: Sasha Levin <sashal@kernel.org>
acervenky pushed a commit to acervenky/quax_raphael-q that referenced this issue Dec 7, 2019
[ Upstream commit 5712f33 ]

The spinlock in the raw3270_view structure is used by con3270, tty3270
and fs3270 in different ways. For con3270 the lock can be acquired in
irq context, for tty3270 and fs3270 the highest context is bh.

Lockdep sees the view->lock as a single class and if the 3270 driver
is used for the console the following message is generated:

WARNING: inconsistent lock state
5.1.0-rc3-05157-g5c168033979d MiCode#12 Not tainted
--------------------------------
inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
swapper/0/1 [HC0[0]:SC1[1]:HE1:SE0] takes:
(____ptrval____) (&(&view->lock)->rlock){?.-.}, at: tty3270_update+0x7c/0x330

Introduce a lockdep subclass for the view lock to distinguish bh from
irq locks.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

Signed-off-by: Sasha Levin <sashal@kernel.org>
acervenky pushed a commit to acervenky/quax_raphael-q that referenced this issue Dec 7, 2019
commit d0a255e upstream.

A deadlock with this stacktrace was observed.

The loop thread does a GFP_KERNEL allocation, it calls into dm-bufio
shrinker and the shrinker depends on I/O completion in the dm-bufio
subsystem.

In order to fix the deadlock (and other similar ones), we set the flag
PF_MEMALLOC_NOIO at loop thread entry.

PID: 474    TASK: ffff8813e11f4600  CPU: 10  COMMAND: "kswapd0"
   #0 [ffff8813dedfb938] __schedule at ffffffff8173f405
   MiCode#1 [ffff8813dedfb990] schedule at ffffffff8173fa27
   MiCode#2 [ffff8813dedfb9b0] schedule_timeout at ffffffff81742fec
   MiCode#3 [ffff8813dedfba60] io_schedule_timeout at ffffffff8173f186
   MiCode#4 [ffff8813dedfbaa0] bit_wait_io at ffffffff8174034f
   MiCode#5 [ffff8813dedfbac0] __wait_on_bit at ffffffff8173fec8
   MiCode#6 [ffff8813dedfbb10] out_of_line_wait_on_bit at ffffffff8173ff81
   MiCode#7 [ffff8813dedfbb90] __make_buffer_clean at ffffffffa038736f [dm_bufio]
   MiCode#8 [ffff8813dedfbbb0] __try_evict_buffer at ffffffffa0387bb8 [dm_bufio]
   MiCode#9 [ffff8813dedfbbd0] dm_bufio_shrink_scan at ffffffffa0387cc3 [dm_bufio]
  MiCode#10 [ffff8813dedfbc40] shrink_slab at ffffffff811a87ce
  MiCode#11 [ffff8813dedfbd30] shrink_zone at ffffffff811ad778
  MiCode#12 [ffff8813dedfbdc0] kswapd at ffffffff811ae92f
  MiCode#13 [ffff8813dedfbec0] kthread at ffffffff810a8428
  MiCode#14 [ffff8813dedfbf50] ret_from_fork at ffffffff81745242

  PID: 14127  TASK: ffff881455749c00  CPU: 11  COMMAND: "loop1"
   #0 [ffff88272f5af228] __schedule at ffffffff8173f405
   MiCode#1 [ffff88272f5af280] schedule at ffffffff8173fa27
   MiCode#2 [ffff88272f5af2a0] schedule_preempt_disabled at ffffffff8173fd5e
   MiCode#3 [ffff88272f5af2b0] __mutex_lock_slowpath at ffffffff81741fb5
   MiCode#4 [ffff88272f5af330] mutex_lock at ffffffff81742133
   MiCode#5 [ffff88272f5af350] dm_bufio_shrink_count at ffffffffa03865f9 [dm_bufio]
   MiCode#6 [ffff88272f5af380] shrink_slab at ffffffff811a86bd
   MiCode#7 [ffff88272f5af470] shrink_zone at ffffffff811ad778
   MiCode#8 [ffff88272f5af500] do_try_to_free_pages at ffffffff811adb34
   MiCode#9 [ffff88272f5af590] try_to_free_pages at ffffffff811adef8
  MiCode#10 [ffff88272f5af610] __alloc_pages_nodemask at ffffffff811a09c3
  MiCode#11 [ffff88272f5af710] alloc_pages_current at ffffffff811e8b71
  MiCode#12 [ffff88272f5af760] new_slab at ffffffff811f4523
  MiCode#13 [ffff88272f5af7b0] __slab_alloc at ffffffff8173a1b5
  MiCode#14 [ffff88272f5af880] kmem_cache_alloc at ffffffff811f484b
  MiCode#15 [ffff88272f5af8d0] do_blockdev_direct_IO at ffffffff812535b3
  MiCode#16 [ffff88272f5afb00] __blockdev_direct_IO at ffffffff81255dc3
  MiCode#17 [ffff88272f5afb30] xfs_vm_direct_IO at ffffffffa01fe3fc [xfs]
  MiCode#18 [ffff88272f5afb90] generic_file_read_iter at ffffffff81198994
  MiCode#19 [ffff88272f5afc50] __dta_xfs_file_read_iter_2398 at ffffffffa020c970 [xfs]
  MiCode#20 [ffff88272f5afcc0] lo_rw_aio at ffffffffa0377042 [loop]
  MiCode#21 [ffff88272f5afd70] loop_queue_work at ffffffffa0377c3b [loop]
  MiCode#22 [ffff88272f5afe60] kthread_worker_fn at ffffffff810a8a0c
  MiCode#23 [ffff88272f5afec0] kthread at ffffffff810a8428
  MiCode#24 [ffff88272f5aff50] ret_from_fork at ffffffff81745242

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
acervenky pushed a commit to acervenky/quax_raphael-q that referenced this issue Dec 7, 2019
commit cf3591e upstream.

Revert the commit bd293d0. The proper
fix has been made available with commit d0a255e ("loop: set
PF_MEMALLOC_NOIO for the worker thread").

Note that the fix offered by commit bd293d0 doesn't really prevent
the deadlock from occuring - if we look at the stacktrace reported by
Junxiao Bi, we see that it hangs in bit_wait_io and not on the mutex -
i.e. it has already successfully taken the mutex. Changing the mutex
from mutex_lock to mutex_trylock won't help with deadlocks that happen
afterwards.

PID: 474    TASK: ffff8813e11f4600  CPU: 10  COMMAND: "kswapd0"
   #0 [ffff8813dedfb938] __schedule at ffffffff8173f405
   MiCode#1 [ffff8813dedfb990] schedule at ffffffff8173fa27
   MiCode#2 [ffff8813dedfb9b0] schedule_timeout at ffffffff81742fec
   MiCode#3 [ffff8813dedfba60] io_schedule_timeout at ffffffff8173f186
   MiCode#4 [ffff8813dedfbaa0] bit_wait_io at ffffffff8174034f
   MiCode#5 [ffff8813dedfbac0] __wait_on_bit at ffffffff8173fec8
   MiCode#6 [ffff8813dedfbb10] out_of_line_wait_on_bit at ffffffff8173ff81
   MiCode#7 [ffff8813dedfbb90] __make_buffer_clean at ffffffffa038736f [dm_bufio]
   MiCode#8 [ffff8813dedfbbb0] __try_evict_buffer at ffffffffa0387bb8 [dm_bufio]
   MiCode#9 [ffff8813dedfbbd0] dm_bufio_shrink_scan at ffffffffa0387cc3 [dm_bufio]
  MiCode#10 [ffff8813dedfbc40] shrink_slab at ffffffff811a87ce
  MiCode#11 [ffff8813dedfbd30] shrink_zone at ffffffff811ad778
  MiCode#12 [ffff8813dedfbdc0] kswapd at ffffffff811ae92f
  MiCode#13 [ffff8813dedfbec0] kthread at ffffffff810a8428
  MiCode#14 [ffff8813dedfbf50] ret_from_fork at ffffffff81745242

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
Fixes: bd293d0 ("dm bufio: fix deadlock with loop device")
Depends-on: d0a255e ("loop: set PF_MEMALLOC_NOIO for the worker thread")
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Inkypen79 pushed a commit to Inkypen79/kernel_xiaomi_andromeda_old that referenced this issue Jan 15, 2020
[ Upstream commit c5a94f4 ]

It was observed that a process blocked indefintely in
__fscache_read_or_alloc_page(), waiting for FSCACHE_COOKIE_LOOKING_UP
to be cleared via fscache_wait_for_deferred_lookup().

At this time, ->backing_objects was empty, which would normaly prevent
__fscache_read_or_alloc_page() from getting to the point of waiting.
This implies that ->backing_objects was cleared *after*
__fscache_read_or_alloc_page was was entered.

When an object is "killed" and then "dropped",
FSCACHE_COOKIE_LOOKING_UP is cleared in fscache_lookup_failure(), then
KILL_OBJECT and DROP_OBJECT are "called" and only in DROP_OBJECT is
->backing_objects cleared.  This leaves a window where
something else can set FSCACHE_COOKIE_LOOKING_UP and
__fscache_read_or_alloc_page() can start waiting, before
->backing_objects is cleared

There is some uncertainty in this analysis, but it seems to be fit the
observations.  Adding the wake in this patch will be handled correctly
by __fscache_read_or_alloc_page(), as it checks if ->backing_objects
is empty again, after waiting.

Customer which reported the hang, also report that the hang cannot be
reproduced with this fix.

The backtrace for the blocked process looked like:

PID: 29360  TASK: ffff881ff2ac0f80  CPU: 3   COMMAND: "zsh"
 #0 [ffff881ff43efbf8] schedule at ffffffff815e56f1
 #1 [ffff881ff43efc58] bit_wait at ffffffff815e64ed
 MiCode#2 [ffff881ff43efc68] __wait_on_bit at ffffffff815e61b8
 MiCode#3 [ffff881ff43efca0] out_of_line_wait_on_bit at ffffffff815e625e
 MiCode#4 [ffff881ff43efd08] fscache_wait_for_deferred_lookup at ffffffffa04f2e8f [fscache]
 MiCode#5 [ffff881ff43efd18] __fscache_read_or_alloc_page at ffffffffa04f2ffe [fscache]
 MiCode#6 [ffff881ff43efd58] __nfs_readpage_from_fscache at ffffffffa0679668 [nfs]
 MiCode#7 [ffff881ff43efd78] nfs_readpage at ffffffffa067092b [nfs]
 MiCode#8 [ffff881ff43efda0] generic_file_read_iter at ffffffff81187a73
 MiCode#9 [ffff881ff43efe50] nfs_file_read at ffffffffa066544b [nfs]
MiCode#10 [ffff881ff43efe70] __vfs_read at ffffffff811fc756
MiCode#11 [ffff881ff43efee8] vfs_read at ffffffff811fccfa
MiCode#12 [ffff881ff43eff18] sys_read at ffffffff811fda62
MiCode#13 [ffff881ff43eff50] entry_SYSCALL_64_fastpath at ffffffff815e986e

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
pix106 pushed a commit to pix106/android_kernel_xiaomi that referenced this issue Aug 1, 2020
[ Upstream commit 46ef5b8 ]

KASAN report null-ptr-deref error when register_netdev() failed:

KASAN: null-ptr-deref in range [0x00000000000003c0-0x00000000000003c7]
CPU: 2 PID: 422 Comm: ip Not tainted 5.8.0-rc4+ MiCode#12
Call Trace:
 ip6gre_init_net+0x4ab/0x580
 ? ip6gre_tunnel_uninit+0x3f0/0x3f0
 ops_init+0xa8/0x3c0
 setup_net+0x2de/0x7e0
 ? rcu_read_lock_bh_held+0xb0/0xb0
 ? ops_init+0x3c0/0x3c0
 ? kasan_unpoison_shadow+0x33/0x40
 ? __kasan_kmalloc.constprop.0+0xc2/0xd0
 copy_net_ns+0x27d/0x530
 create_new_namespaces+0x382/0xa30
 unshare_nsproxy_namespaces+0xa1/0x1d0
 ksys_unshare+0x39c/0x780
 ? walk_process_tree+0x2a0/0x2a0
 ? trace_hardirqs_on+0x4a/0x1b0
 ? _raw_spin_unlock_irq+0x1f/0x30
 ? syscall_trace_enter+0x1a7/0x330
 ? do_syscall_64+0x1c/0xa0
 __x64_sys_unshare+0x2d/0x40
 do_syscall_64+0x56/0xa0
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

ip6gre_tunnel_uninit() has set 'ign->fb_tunnel_dev' to NULL, later
access to ign->fb_tunnel_dev cause null-ptr-deref. Fix it by saving
'ign->fb_tunnel_dev' to local variable ndev.

Fixes: dafabb6 ("ip6_gre: fix use-after-free in ip6gre_tunnel_lookup()")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
pix106 pushed a commit to pix106/android_kernel_xiaomi that referenced this issue Aug 24, 2020
[ Upstream commit e24c644 ]

I compiled with AddressSanitizer and I had these memory leaks while I
was using the tep_parse_format function:

    Direct leak of 28 byte(s) in 4 object(s) allocated from:
        #0 0x7fb07db49ffe in __interceptor_realloc (/lib/x86_64-linux-gnu/libasan.so.5+0x10dffe)
        MiCode#1 0x7fb07a724228 in extend_token /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:985
        MiCode#2 0x7fb07a724c21 in __read_token /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1140
        MiCode#3 0x7fb07a724f78 in read_token /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1206
        MiCode#4 0x7fb07a725191 in __read_expect_type /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1291
        MiCode#5 0x7fb07a7251df in read_expect_type /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1299
        MiCode#6 0x7fb07a72e6c8 in process_dynamic_array_len /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:2849
        MiCode#7 0x7fb07a7304b8 in process_function /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:3161
        MiCode#8 0x7fb07a730900 in process_arg_token /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:3207
        MiCode#9 0x7fb07a727c0b in process_arg /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1786
        MiCode#10 0x7fb07a731080 in event_read_print_args /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:3285
        MiCode#11 0x7fb07a731722 in event_read_print /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:3369
        MiCode#12 0x7fb07a740054 in __tep_parse_format /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:6335
        MiCode#13 0x7fb07a74047a in __parse_event /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:6389
        MiCode#14 0x7fb07a740536 in tep_parse_format /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:6431
        MiCode#15 0x7fb07a785acf in parse_event ../../../src/fs-src/fs.c:251
        MiCode#16 0x7fb07a785ccd in parse_systems ../../../src/fs-src/fs.c:284
        MiCode#17 0x7fb07a786fb3 in read_metadata ../../../src/fs-src/fs.c:593
        MiCode#18 0x7fb07a78760e in ftrace_fs_source_init ../../../src/fs-src/fs.c:727
        MiCode#19 0x7fb07d90c19c in add_component_with_init_method_data ../../../../src/lib/graph/graph.c:1048
        MiCode#20 0x7fb07d90c87b in add_source_component_with_initialize_method_data ../../../../src/lib/graph/graph.c:1127
        MiCode#21 0x7fb07d90c92a in bt_graph_add_source_component ../../../../src/lib/graph/graph.c:1152
        MiCode#22 0x55db11aa632e in cmd_run_ctx_create_components_from_config_components ../../../src/cli/babeltrace2.c:2252
        MiCode#23 0x55db11aa6fda in cmd_run_ctx_create_components ../../../src/cli/babeltrace2.c:2347
        MiCode#24 0x55db11aa780c in cmd_run ../../../src/cli/babeltrace2.c:2461
        MiCode#25 0x55db11aa8a7d in main ../../../src/cli/babeltrace2.c:2673
        MiCode#26 0x7fb07d5460b2 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x270b2)

The token variable in the process_dynamic_array_len function is
allocated in the read_expect_type function, but is not freed before
calling the read_token function.

Free the token variable before calling read_token in order to plug the
leak.

Signed-off-by: Philippe Duplessis-Guindon <pduplessis@efficios.com>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Link: https://lore.kernel.org/linux-trace-devel/20200730150236.5392-1-pduplessis@efficios.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
pix106 pushed a commit to pix106/android_kernel_xiaomi that referenced this issue Oct 1, 2020
[ Upstream commit d26383d ]

The following leaks were detected by ASAN:

  Indirect leak of 360 byte(s) in 9 object(s) allocated from:
    #0 0x7fecc305180e in calloc (/lib/x86_64-linux-gnu/libasan.so.5+0x10780e)
    MiCode#1 0x560578f6dce5 in perf_pmu__new_format util/pmu.c:1333
    MiCode#2 0x560578f752fc in perf_pmu_parse util/pmu.y:59
    MiCode#3 0x560578f6a8b7 in perf_pmu__format_parse util/pmu.c:73
    MiCode#4 0x560578e07045 in test__pmu tests/pmu.c:155
    MiCode#5 0x560578de109b in run_test tests/builtin-test.c:410
    MiCode#6 0x560578de109b in test_and_print tests/builtin-test.c:440
    MiCode#7 0x560578de401a in __cmd_test tests/builtin-test.c:661
    MiCode#8 0x560578de401a in cmd_test tests/builtin-test.c:807
    MiCode#9 0x560578e49354 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:312
    MiCode#10 0x560578ce71a8 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:364
    MiCode#11 0x560578ce71a8 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:408
    MiCode#12 0x560578ce71a8 in main /home/namhyung/project/linux/tools/perf/perf.c:538
    MiCode#13 0x7fecc2b7acc9 in __libc_start_main ../csu/libc-start.c:308

Fixes: cff7f95 ("perf tests: Move pmu tests into separate object")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20200915031819.386559-12-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
guozhiye pushed a commit to guozhiye/sirius_Kernel that referenced this issue Nov 30, 2020
While I play with lockdep, I encountered below warning message.

The patch[1] removed [get|put]_online_cpus in rebuild_sched_domains_locked
but introduced cpu_hotplug_mutex_held check. I assume the author wanted to
rely on cpu_hotplug.lock through get_onlline_cpus to protect *some race*.
However, cpu_hotplug_mutex is already released before get_online_cpus
returning so it seems to be bug.
Thus, this patch removes the pointless assertion.

Please check this with the owner what he want to *do* to protect *something*.

[    5.564220] WARNING: CPU: 7 PID: 592 at /usr/local/google/home/minchan/nvme/kernel/crosshatch-kernel/private/msm-google/kernel/cpu.c:252 cpu_hotplug_mutex_held+0x30/0x3c
[    5.564227] Modules linked in:
[    5.564240] CPU: 7 PID: 592 Comm: init Tainted: G        W       4.9.140-ge3e51f4d9b5b-dirty_audio-g3dce958 MiCode#12
[    5.564246] Hardware name: Google Inc. MSM sdm845 C1 Proto v1.0 (DT)
[    5.564253] task: fffffff967a99b00 task.stack: fffffff9665a0000
[    5.564260] PC is at cpu_hotplug_mutex_held+0x30/0x3c
[    5.564267] LR is at cpu_hotplug_mutex_held+0x2c/0x3c
[    5.564273] pc : [<ffffff98366ade64>] lr : [<ffffff98366ade60>] pstate: 60400145
[    5.564279] sp : fffffff9665a3c80
[    5.564285] x29: fffffff9665a3c80 x28: ffffff5d80a853a0
..
..
[    5.565118] [<ffffff98366ade64>] cpu_hotplug_mutex_held+0x30/0x3c
[    5.565126] [<ffffff9836784e60>] rebuild_sched_domains_unlocked+0x20/0x4dc
[    5.565133] [<ffffff9836785fc8>] cpuset_write_resmask+0x918/0xcb4
[    5.565140] [<ffffff9836778578>] cgroup_file_write+0x44/0x144
[    5.565148] [<ffffff98368e998c>] kernfs_fop_write+0xcc/0x1d8
[    5.565155] [<ffffff9836856aec>] vfs_write+0xb8/0x1d4
[    5.565161] [<ffffff9836857eb0>] SyS_write+0x50/0xb0
[    5.565168] [<ffffff9836683a00>] el0_svc_naked+0x34/0x38

[1] e9e7b75,  cgroup/cpuset: remove circular dependency deadlock
Bug: 122098329
Change-Id: I00a68c403e261545a07bc13343ef4e7e2ed218b0
Signed-off-by: Minchan Kim <minchan@google.com>
Signed-off-by: Adam W. Willis <return.of.octobot@gmail.com>
guozhiye pushed a commit to guozhiye/sirius_Kernel that referenced this issue Dec 2, 2020
While I play with lockdep, I encountered below warning message.

The patch[1] removed [get|put]_online_cpus in rebuild_sched_domains_locked
but introduced cpu_hotplug_mutex_held check. I assume the author wanted to
rely on cpu_hotplug.lock through get_onlline_cpus to protect *some race*.
However, cpu_hotplug_mutex is already released before get_online_cpus
returning so it seems to be bug.
Thus, this patch removes the pointless assertion.

Please check this with the owner what he want to *do* to protect *something*.

[    5.564220] WARNING: CPU: 7 PID: 592 at /usr/local/google/home/minchan/nvme/kernel/crosshatch-kernel/private/msm-google/kernel/cpu.c:252 cpu_hotplug_mutex_held+0x30/0x3c
[    5.564227] Modules linked in:
[    5.564240] CPU: 7 PID: 592 Comm: init Tainted: G        W       4.9.140-ge3e51f4d9b5b-dirty_audio-g3dce958 MiCode#12
[    5.564246] Hardware name: Google Inc. MSM sdm845 C1 Proto v1.0 (DT)
[    5.564253] task: fffffff967a99b00 task.stack: fffffff9665a0000
[    5.564260] PC is at cpu_hotplug_mutex_held+0x30/0x3c
[    5.564267] LR is at cpu_hotplug_mutex_held+0x2c/0x3c
[    5.564273] pc : [<ffffff98366ade64>] lr : [<ffffff98366ade60>] pstate: 60400145
[    5.564279] sp : fffffff9665a3c80
[    5.564285] x29: fffffff9665a3c80 x28: ffffff5d80a853a0
..
..
[    5.565118] [<ffffff98366ade64>] cpu_hotplug_mutex_held+0x30/0x3c
[    5.565126] [<ffffff9836784e60>] rebuild_sched_domains_unlocked+0x20/0x4dc
[    5.565133] [<ffffff9836785fc8>] cpuset_write_resmask+0x918/0xcb4
[    5.565140] [<ffffff9836778578>] cgroup_file_write+0x44/0x144
[    5.565148] [<ffffff98368e998c>] kernfs_fop_write+0xcc/0x1d8
[    5.565155] [<ffffff9836856aec>] vfs_write+0xb8/0x1d4
[    5.565161] [<ffffff9836857eb0>] SyS_write+0x50/0xb0
[    5.565168] [<ffffff9836683a00>] el0_svc_naked+0x34/0x38

[1] e9e7b75,  cgroup/cpuset: remove circular dependency deadlock
Bug: 122098329
Change-Id: I00a68c403e261545a07bc13343ef4e7e2ed218b0
Signed-off-by: Minchan Kim <minchan@google.com>
Signed-off-by: Adam W. Willis <return.of.octobot@gmail.com>
mi-code pushed a commit that referenced this issue Mar 3, 2021
https://bugzilla.kernel.org/show_bug.cgi?id=208565

PID: 257    TASK: ecdd0000  CPU: 0   COMMAND: "init"
  #0 [<c0b420ec>] (__schedule) from [<c0b423c8>]
  #1 [<c0b423c8>] (schedule) from [<c0b459d4>]
  #2 [<c0b459d4>] (rwsem_down_read_failed) from [<c0b44fa0>]
  #3 [<c0b44fa0>] (down_read) from [<c044233c>]
  #4 [<c044233c>] (f2fs_truncate_blocks) from [<c0442890>]
  #5 [<c0442890>] (f2fs_truncate) from [<c044d408>]
  #6 [<c044d408>] (f2fs_evict_inode) from [<c030be18>]
  #7 [<c030be18>] (evict) from [<c030a558>]
  #8 [<c030a558>] (iput) from [<c047c600>]
  #9 [<c047c600>] (f2fs_sync_node_pages) from [<c0465414>]
 #10 [<c0465414>] (f2fs_write_checkpoint) from [<c04575f4>]
 #11 [<c04575f4>] (f2fs_sync_fs) from [<c0441918>]
 #12 [<c0441918>] (f2fs_do_sync_file) from [<c0441098>]
 #13 [<c0441098>] (f2fs_sync_file) from [<c0323fa0>]
 #14 [<c0323fa0>] (vfs_fsync_range) from [<c0324294>]
 #15 [<c0324294>] (do_fsync) from [<c0324014>]
 #16 [<c0324014>] (sys_fsync) from [<c0108bc0>]

This can be caused by flush_dirty_inode() in f2fs_sync_node_pages() where
iput() requires f2fs_lock_op() again resulting in livelock.

Change-Id: I5d7ef35a21cdb074e7bf5288371f579bfc0eb19d
Reported-by: Zhiguo Niu <Zhiguo.Niu@unisoc.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Git-commit: b0f3b87
Git-repo: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/
Signed-off-by: Sayali Lokhande <sayalil@codeaurora.org>
hectorvax pushed a commit to hectorvax/Xiaomi_Kernel_OpenSource that referenced this issue Jan 3, 2023
[ Upstream commit 46ef5b8 ]

KASAN report null-ptr-deref error when register_netdev() failed:

KASAN: null-ptr-deref in range [0x00000000000003c0-0x00000000000003c7]
CPU: 2 PID: 422 Comm: ip Not tainted 5.8.0-rc4+ MiCode#12
Call Trace:
 ip6gre_init_net+0x4ab/0x580
 ? ip6gre_tunnel_uninit+0x3f0/0x3f0
 ops_init+0xa8/0x3c0
 setup_net+0x2de/0x7e0
 ? rcu_read_lock_bh_held+0xb0/0xb0
 ? ops_init+0x3c0/0x3c0
 ? kasan_unpoison_shadow+0x33/0x40
 ? __kasan_kmalloc.constprop.0+0xc2/0xd0
 copy_net_ns+0x27d/0x530
 create_new_namespaces+0x382/0xa30
 unshare_nsproxy_namespaces+0xa1/0x1d0
 ksys_unshare+0x39c/0x780
 ? walk_process_tree+0x2a0/0x2a0
 ? trace_hardirqs_on+0x4a/0x1b0
 ? _raw_spin_unlock_irq+0x1f/0x30
 ? syscall_trace_enter+0x1a7/0x330
 ? do_syscall_64+0x1c/0xa0
 __x64_sys_unshare+0x2d/0x40
 do_syscall_64+0x56/0xa0
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

ip6gre_tunnel_uninit() has set 'ign->fb_tunnel_dev' to NULL, later
access to ign->fb_tunnel_dev cause null-ptr-deref. Fix it by saving
'ign->fb_tunnel_dev' to local variable ndev.

Fixes: dafabb6 ("ip6_gre: fix use-after-free in ip6gre_tunnel_lookup()")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
hectorvax pushed a commit to hectorvax/Xiaomi_Kernel_OpenSource that referenced this issue Jan 3, 2023
[ Upstream commit e24c644 ]

I compiled with AddressSanitizer and I had these memory leaks while I
was using the tep_parse_format function:

    Direct leak of 28 byte(s) in 4 object(s) allocated from:
        #0 0x7fb07db49ffe in __interceptor_realloc (/lib/x86_64-linux-gnu/libasan.so.5+0x10dffe)
        MiCode#1 0x7fb07a724228 in extend_token /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:985
        MiCode#2 0x7fb07a724c21 in __read_token /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1140
        MiCode#3 0x7fb07a724f78 in read_token /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1206
        MiCode#4 0x7fb07a725191 in __read_expect_type /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1291
        MiCode#5 0x7fb07a7251df in read_expect_type /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1299
        MiCode#6 0x7fb07a72e6c8 in process_dynamic_array_len /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:2849
        MiCode#7 0x7fb07a7304b8 in process_function /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:3161
        MiCode#8 0x7fb07a730900 in process_arg_token /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:3207
        MiCode#9 0x7fb07a727c0b in process_arg /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1786
        MiCode#10 0x7fb07a731080 in event_read_print_args /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:3285
        MiCode#11 0x7fb07a731722 in event_read_print /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:3369
        MiCode#12 0x7fb07a740054 in __tep_parse_format /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:6335
        MiCode#13 0x7fb07a74047a in __parse_event /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:6389
        MiCode#14 0x7fb07a740536 in tep_parse_format /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:6431
        MiCode#15 0x7fb07a785acf in parse_event ../../../src/fs-src/fs.c:251
        MiCode#16 0x7fb07a785ccd in parse_systems ../../../src/fs-src/fs.c:284
        MiCode#17 0x7fb07a786fb3 in read_metadata ../../../src/fs-src/fs.c:593
        MiCode#18 0x7fb07a78760e in ftrace_fs_source_init ../../../src/fs-src/fs.c:727
        MiCode#19 0x7fb07d90c19c in add_component_with_init_method_data ../../../../src/lib/graph/graph.c:1048
        MiCode#20 0x7fb07d90c87b in add_source_component_with_initialize_method_data ../../../../src/lib/graph/graph.c:1127
        MiCode#21 0x7fb07d90c92a in bt_graph_add_source_component ../../../../src/lib/graph/graph.c:1152
        MiCode#22 0x55db11aa632e in cmd_run_ctx_create_components_from_config_components ../../../src/cli/babeltrace2.c:2252
        MiCode#23 0x55db11aa6fda in cmd_run_ctx_create_components ../../../src/cli/babeltrace2.c:2347
        MiCode#24 0x55db11aa780c in cmd_run ../../../src/cli/babeltrace2.c:2461
        MiCode#25 0x55db11aa8a7d in main ../../../src/cli/babeltrace2.c:2673
        MiCode#26 0x7fb07d5460b2 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x270b2)

The token variable in the process_dynamic_array_len function is
allocated in the read_expect_type function, but is not freed before
calling the read_token function.

Free the token variable before calling read_token in order to plug the
leak.

Signed-off-by: Philippe Duplessis-Guindon <pduplessis@efficios.com>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Link: https://lore.kernel.org/linux-trace-devel/20200730150236.5392-1-pduplessis@efficios.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
hectorvax pushed a commit to hectorvax/Xiaomi_Kernel_OpenSource that referenced this issue Jan 3, 2023
[ Upstream commit d26383d ]

The following leaks were detected by ASAN:

  Indirect leak of 360 byte(s) in 9 object(s) allocated from:
    #0 0x7fecc305180e in calloc (/lib/x86_64-linux-gnu/libasan.so.5+0x10780e)
    MiCode#1 0x560578f6dce5 in perf_pmu__new_format util/pmu.c:1333
    MiCode#2 0x560578f752fc in perf_pmu_parse util/pmu.y:59
    MiCode#3 0x560578f6a8b7 in perf_pmu__format_parse util/pmu.c:73
    MiCode#4 0x560578e07045 in test__pmu tests/pmu.c:155
    MiCode#5 0x560578de109b in run_test tests/builtin-test.c:410
    MiCode#6 0x560578de109b in test_and_print tests/builtin-test.c:440
    MiCode#7 0x560578de401a in __cmd_test tests/builtin-test.c:661
    MiCode#8 0x560578de401a in cmd_test tests/builtin-test.c:807
    MiCode#9 0x560578e49354 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:312
    MiCode#10 0x560578ce71a8 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:364
    MiCode#11 0x560578ce71a8 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:408
    MiCode#12 0x560578ce71a8 in main /home/namhyung/project/linux/tools/perf/perf.c:538
    MiCode#13 0x7fecc2b7acc9 in __libc_start_main ../csu/libc-start.c:308

Fixes: cff7f95 ("perf tests: Move pmu tests into separate object")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20200915031819.386559-12-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
roniwae pushed a commit to roniwae/xiaomi_kernel that referenced this issue Dec 28, 2023
[ Upstream commit 5712f33 ]

The spinlock in the raw3270_view structure is used by con3270, tty3270
and fs3270 in different ways. For con3270 the lock can be acquired in
irq context, for tty3270 and fs3270 the highest context is bh.

Lockdep sees the view->lock as a single class and if the 3270 driver
is used for the console the following message is generated:

WARNING: inconsistent lock state
5.1.0-rc3-05157-g5c168033979d MiCode#12 Not tainted
--------------------------------
inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
swapper/0/1 [HC0[0]:SC1[1]:HE1:SE0] takes:
(____ptrval____) (&(&view->lock)->rlock){?.-.}, at: tty3270_update+0x7c/0x330

Introduce a lockdep subclass for the view lock to distinguish bh from
irq locks.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

Signed-off-by: Sasha Levin <sashal@kernel.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant