Error on building kernel #12

Xephi · 2015-04-02T17:49:27Z

Hi! I get this error while trying to compile for cancro :/

arch/arm/mach-msm/smd_init_dt.c:24:25: fatal error: smd_private.h: No such file or directory
compilation terminated.
scripts/Makefile.build:307: recipe for target 'arch/arm/mach-msm/smd_init_dt.o' failed
make[1]: *** [arch/arm/mach-msm/smd_init_dt.o] Error 1
Makefile:953: recipe for target 'arch/arm/mach-msm' failed
make: *** [arch/arm/mach-msm] Error 2

But the file exists >.<

Fixed in #13.

If extent cache is disabled, we will encounter an oops when triggering direct IO, as below:

    BUG: unable to handle kernel NULL pointer dereference at 0000000c
    IP: [<f0b9c61e>] f2fs_drop_largest_extent+0xe/0x30 [f2fs]
    *pdpt = 000000002bb9a001 *pde = 0000000000000000
    Oops: 0000 [#1] SMP
    Modules linked in: f2fs(O) fuse bnep rfcomm bluetooth nfsd dm_crypt nfs_acl auth_rpcgss oid_registry nfs binfmt_misc fscache lockd sunrpc grace snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq snd_timer snd_seq_device snd soundcore joydev psmouse hid_generic i2c_piix4 serio_raw ppdev mac_hid parport_pc lp parport ext4 jbd2 mbcache usbhid hid e1000
    CPU: 3 PID: 3608 Comm: dd Tainted: G O 4.2.0-rc4 #12
    Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
    task: ef161600 ti: ebd5e000 task.ti: ebd5e000
    EIP: 0060:[<f0b9c61e>] EFLAGS: 00010202 CPU: 3
    EIP is at f2fs_drop_largest_extent+0xe/0x30 [f2fs]
    EAX: 00000000 EBX: ddebc000 ECX: 00000000 EDX: 00000000
    ESI: ebd5fdf8 EDI: 00000000 EBP: ebd5fd58 ESP: ebd5fd58
    DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
    CR0: 80050033 CR2: 0000000c CR3: 2c24ee40 CR4: 000006f0
    Stack:
     ebd5fda4 f0b8c005 00000000 00000001 00000000 f0b8c430 c816cd68 ddebc000
     ddebc088 00001000 00000555 00000555 ffffffff c160bb00 00055501 00000000
     00000000 00000100 00000000 ebd5fe20 f0b8c430 00000046 ef161600 00001000
    Call Trace:
     [<f0b8c005>] __allocate_data_block+0x1a5/0x260 [f2fs]
     [<f0b8c430>] ? f2fs_direct_IO+0x370/0x440 [f2fs]
     [<c160bb00>] ? down_read+0x30/0x50
     [<f0b8c430>] f2fs_direct_IO+0x370/0x440 [f2fs]
     [<c113e115>] generic_file_direct_write+0xa5/0x260
     [<c10b53f8>] ? current_fs_time+0x18/0x50
     [<c113e38b>] __generic_file_write_iter+0xbb/0x210
     [<c113e50f>] ? generic_file_write_iter+0x2f/0x320
     [<c113e63c>] generic_file_write_iter+0x15c/0x320
     [<f0b77f29>] f2fs_file_write_iter+0x39/0x80 [f2fs]
     [<c11984d9>] __vfs_write+0xa9/0xe0
     [<c1199227>] vfs_write+0x97/0x180
     [<c119955b>] SyS_write+0x5b/0xd0
     [<c160dcd0>] sysenter_do_call+0x12/0x12
    Code: 10 8b 50 1c 89 53 14 eb ca 8d 74 26 00 85 f6 74 86 eb a6 0f 0b 90 8d b4 26 00 00 00 00 55 89 e5 3e 8d 74 26 00 8b 80 d4 02 00 00 <8b> 48 0c 39 d1 77 0e 03 48 14 39 ca 73 07 c7 40 14 00 00 00 00
    EIP: [<f0b9c61e>] f2fs_drop_largest_extent+0xe/0x30 [f2fs] SS:ESP 0068:ebd5fd58
    CR2: 000000000000000c
    ---[ end trace a38c07026a1afffd ]---

This is because when extent cache is disabled, the extent_tree pointer in struct f2fs_inode_info is NULL, but f2fs_drop_largest_extent accesses this pointer directly without checking the state of the extent cache, so the oops occurs.

Let's fix it by checking the state of the extent cache before accessing the pointer.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
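The guard described in the f2fs commit above can be sketched in plain userspace C. This is only an illustration of the "check before dereference" idea, not the actual kernel patch: the struct layouts, the `extent_cache_enabled` field, and the function name `drop_largest_extent` are hypothetical stand-ins for the real f2fs definitions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the f2fs structures. */
struct extent_info {
    unsigned int fofs; /* first file offset covered by the extent */
    unsigned int len;  /* number of blocks covered */
};

struct extent_tree {
    struct extent_info largest;
};

struct inode_info {
    bool extent_cache_enabled;       /* models the extent cache state */
    struct extent_tree *extent_tree; /* NULL when the cache is disabled */
};

/* Returns true if the cached largest extent covered fofs and was dropped. */
static bool drop_largest_extent(struct inode_info *fi, unsigned int fofs)
{
    /* The guard: bail out before touching extent_tree at all. */
    if (!fi->extent_cache_enabled || fi->extent_tree == NULL)
        return false;

    struct extent_info *largest = &fi->extent_tree->largest;

    if (fofs < largest->fofs || fofs >= largest->fofs + largest->len)
        return false;

    largest->len = 0; /* invalidate the cached extent */
    return true;
}
```

Calling `drop_largest_extent` on an inode whose cache is disabled simply returns `false` instead of dereferencing a NULL `extent_tree`, which is the failure mode shown in the oops.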

[ Upstream commit ecf5fc6 ]

Nikolay has reported a hang when a memcg reclaim got stuck with the following backtrace:

    PID: 18308 TASK: ffff883d7c9b0a30 CPU: 1 COMMAND: "rsync"
     #0 __schedule at ffffffff815ab152
     #1 schedule at ffffffff815ab76e
     #2 schedule_timeout at ffffffff815ae5e5
     #3 io_schedule_timeout at ffffffff815aad6a
     #4 bit_wait_io at ffffffff815abfc6
     #5 __wait_on_bit at ffffffff815abda5
     #6 wait_on_page_bit at ffffffff8111fd4f
     #7 shrink_page_list at ffffffff81135445
     #8 shrink_inactive_list at ffffffff81135845
     #9 shrink_lruvec at ffffffff81135ead
    #10 shrink_zone at ffffffff811360c3
    #11 shrink_zones at ffffffff81136eff
    #12 do_try_to_free_pages at ffffffff8113712f
    #13 try_to_free_mem_cgroup_pages at ffffffff811372be
    #14 try_charge at ffffffff81189423
    #15 mem_cgroup_try_charge at ffffffff8118c6f5
    #16 __add_to_page_cache_locked at ffffffff8112137d
    #17 add_to_page_cache_lru at ffffffff81121618
    #18 pagecache_get_page at ffffffff8112170b
    #19 grow_dev_page at ffffffff811c8297
    #20 __getblk_slow at ffffffff811c91d6
    #21 __getblk_gfp at ffffffff811c92c1
    #22 ext4_ext_grow_indepth at ffffffff8124565c
    #23 ext4_ext_create_new_leaf at ffffffff81246ca8
    #24 ext4_ext_insert_extent at ffffffff81246f09
    #25 ext4_ext_map_blocks at ffffffff8124a848
    #26 ext4_map_blocks at ffffffff8121a5b7
    #27 mpage_map_one_extent at ffffffff8121b1fa
    #28 mpage_map_and_submit_extent at ffffffff8121f07b
    #29 ext4_writepages at ffffffff8121f6d5
    #30 do_writepages at ffffffff8112c490
    #31 __filemap_fdatawrite_range at ffffffff81120199
    #32 filemap_flush at ffffffff8112041c
    #33 ext4_alloc_da_blocks at ffffffff81219da1
    #34 ext4_rename at ffffffff81229b91
    #35 ext4_rename2 at ffffffff81229e32
    #36 vfs_rename at ffffffff811a08a5
    #37 SYSC_renameat2 at ffffffff811a3ffc
    #38 sys_renameat2 at ffffffff811a408e
    #39 sys_rename at ffffffff8119e51e
    #40 system_call_fastpath at ffffffff815afa89

Dave Chinner has properly pointed out that this is a deadlock in the reclaim code because ext4 doesn't submit pages which are marked by PG_writeback right away.

The heuristic was introduced by commit e62e384 ("memcg: prevent OOM with too many dirty pages") and it was applied only when may_enter_fs was specified. The code has been changed by c3b94f4 ("memcg: further prevent OOM with too many dirty pages") which has removed the __GFP_FS restriction with a reasoning that we do not get into the fs code. But this is not sufficient apparently because the fs doesn't necessarily submit pages marked PG_writeback for IO right away.

ext4_bio_write_page calls io_submit_add_bh but that doesn't necessarily submit the bio. Instead it tries to map more pages into the bio and mpage_map_one_extent might trigger memcg charge which might end up waiting on a page which is marked PG_writeback but hasn't been submitted yet so we would end up waiting for something that never finishes.

Fix this issue by replacing __GFP_IO by may_enter_fs check (for case 2) before we go to wait on the writeback. The page fault path, which is the only path that triggers memcg oom killer since 3.12, shouldn't require GFP_NOFS and so we shouldn't reintroduce the premature OOM killer issue which was originally addressed by the heuristic.

As per David Chinner the xfs is doing similar thing since 2.6.15 already so ext4 is not the only affected filesystem. Moreover he notes:

: For example: IO completion might require unwritten extent conversion
: which executes filesystem transactions and GFP_NOFS allocations. The
: writeback flag on the pages can not be cleared until unwritten
: extent conversion completes. Hence memory reclaim cannot wait on
: page writeback to complete in GFP_NOFS context because it is not
: safe to do so, memcg reclaim or otherwise.

Cc: stable@vger.kernel.org # 3.9+
[tytso@mit.edu: corrected the control flow]
Fixes: c3b94f4 ("memcg: further prevent OOM with too many dirty pages")
Reported-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
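To make the change in the memcg commit above concrete, here is a small userspace sketch comparing the two conditions for "may reclaim wait on a page under writeback". It is a simplified model, not the vmscan code: the flag values `SK_GFP_IO` and `SK_GFP_FS` and all function names are hypothetical, and `may_enter_fs` is reduced to a plain filesystem-flag test (the real check is assumed to involve more than this).

```c
#include <stdbool.h>

/* Hypothetical flag bits for this sketch; not the kernel's real values. */
#define SK_GFP_IO 0x1u
#define SK_GFP_FS 0x2u

/* Simplified model: the allocation context allows calling into the fs. */
static bool may_enter_fs(unsigned int gfp_mask)
{
    return (gfp_mask & SK_GFP_FS) != 0;
}

/* Condition before the fix: waiting was allowed whenever IO was allowed. */
static bool may_wait_on_writeback_old(unsigned int gfp_mask)
{
    return (gfp_mask & SK_GFP_IO) != 0;
}

/* Condition after the fix: waiting requires permission to enter the fs. */
static bool may_wait_on_writeback_new(unsigned int gfp_mask)
{
    return may_enter_fs(gfp_mask);
}
```

In a GFP_NOFS-like context (IO allowed, filesystem entry not allowed) the old condition lets reclaim wait on the page, which is the situation in the backtrace, while the new condition does not.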

commit 45caeaa upstream.

As Eric Dumazet pointed out this also needs to be fixed in IPv6.
v2: Contains the IPv6 tcp/IPv6 dccp patches as well.

We have seen a few incidents lately where a dst_entry has been freed with a dangling TCP socket reference (sk->sk_dst_cache) pointing to that dst_entry. If the conditions/timings are right a crash then ensues when the freed dst_entry is referenced later on. A common crashing back trace is:

     #8 [] page_fault at ffffffff8163e648
        [exception RIP: __tcp_ack_snd_check+74]
    .
    .
     #9 [] tcp_rcv_established at ffffffff81580b64
    #10 [] tcp_v4_do_rcv at ffffffff8158b54a
    #11 [] tcp_v4_rcv at ffffffff8158cd02
    #12 [] ip_local_deliver_finish at ffffffff815668f4
    #13 [] ip_local_deliver at ffffffff81566bd9
    #14 [] ip_rcv_finish at ffffffff8156656d
    #15 [] ip_rcv at ffffffff81566f06
    #16 [] __netif_receive_skb_core at ffffffff8152b3a2
    #17 [] __netif_receive_skb at ffffffff8152b608
    #18 [] netif_receive_skb at ffffffff8152b690
    #19 [] vmxnet3_rq_rx_complete at ffffffffa015eeaf [vmxnet3]
    #20 [] vmxnet3_poll_rx_only at ffffffffa015f32a [vmxnet3]
    #21 [] net_rx_action at ffffffff8152bac2
    #22 [] __do_softirq at ffffffff81084b4f
    #23 [] call_softirq at ffffffff8164845c
    #24 [] do_softirq at ffffffff81016fc5
    #25 [] irq_exit at ffffffff81084ee5
    #26 [] do_IRQ at ffffffff81648ff8

Of course it may happen with other NIC drivers as well.

The freed dst_entry was found here:

    static bool tcp_in_quickack_mode(struct sock *sk)
    {
            const struct inet_connection_sock *icsk = inet_csk(sk);
            const struct dst_entry *dst = __sk_dst_get(sk);

            return (dst && dst_metric(dst, RTAX_QUICKACK)) ||
                    (icsk->icsk_ack.quick && !icsk->icsk_ack.pingpong);
    }

But there are other backtraces attributed to the same freed dst_entry in netfilter code as well.

All the vmcores showed 2 significant clues:

- Remote hosts behind the default gateway had always been redirected to a different gateway. A rtable/dst_entry will be added for that host. This creates more dst_entrys with lower reference counts, making this more probable.

- All vmcores showed a positive LockDroppedIcmps value, e.g.:

    LockDroppedIcmps 267

A closer look at the tcp_v4_err() handler revealed that do_redirect() will run regardless of whether user space has the socket locked. This can result in a race condition where the same dst_entry cached in sk->sk_dst_cache can be decremented twice for the same socket via:

    do_redirect()->__sk_dst_check()->dst_release()

This leads to the dst_entry being prematurely freed with another socket pointing to it via sk->sk_dst_cache, and a subsequent crash.

To fix this, skip do_redirect() if user space has the socket locked. Instead let the redirect take place later when user space does not have the socket locked.

The dccp/IPv6 code is very similar in this respect, so fixing it there too.

As Eric Garver pointed out, the following commit now invalidates routes, which can set the dst->obsolete flag so that ipv4_dst_check() returns null and triggers the dst_release().

Fixes: ceb3320 ("ipv4: Kill routes during PMTU/redirect updates.")
Cc: Eric Garver <egarver@redhat.com>
Cc: Hannes Sowa <hsowa@redhat.com>
Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
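The rule from the commit above, "skip the redirect while user space holds the socket lock", can be modelled in a few lines of userspace C. This is a sketch under stated assumptions rather than the networking code itself: `struct sock_model`, its fields, and both function names are hypothetical, and the counter only stands in for the route update that the real do_redirect() performs.

```c
#include <stdbool.h>

/* Hypothetical model of the socket state relevant to the race. */
struct sock_model {
    bool owned_by_user;    /* user space currently holds the socket lock */
    int redirects_applied; /* redirects that updated the cached route */
    int redirects_skipped; /* redirects ignored because the lock was held */
};

/* Stand-in for the route update done on an ICMP redirect. */
static void do_redirect_model(struct sock_model *sk)
{
    sk->redirects_applied++;
}

/* Error-handler path: apply the redirect only if the socket is not locked. */
static bool handle_icmp_redirect(struct sock_model *sk)
{
    if (sk->owned_by_user) {
        sk->redirects_skipped++;
        return false; /* leave the cached dst alone for now */
    }
    do_redirect_model(sk);
    return true;
}
```

With the lock held the handler leaves the cached route untouched, so the lock holder and the softirq path cannot both release the same entry.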

@MSF-Jarvis

commit ecf5fc6 upstream.

Nikolay has reported a hang when a memcg reclaim got stuck with the following backtrace:

    PID: 18308 TASK: ffff883d7c9b0a30 CPU: 1 COMMAND: "rsync"
     #0 __schedule at ffffffff815ab152
     #1 schedule at ffffffff815ab76e
     #2 schedule_timeout at ffffffff815ae5e5
     #3 io_schedule_timeout at ffffffff815aad6a
     #4 bit_wait_io at ffffffff815abfc6
     #5 __wait_on_bit at ffffffff815abda5
     #6 wait_on_page_bit at ffffffff8111fd4f
     #7 shrink_page_list at ffffffff81135445
     #8 shrink_inactive_list at ffffffff81135845
     #9 shrink_lruvec at ffffffff81135ead
    #10 shrink_zone at ffffffff811360c3
    #11 shrink_zones at ffffffff81136eff
    #12 do_try_to_free_pages at ffffffff8113712f
    #13 try_to_free_mem_cgroup_pages at ffffffff811372be
    #14 try_charge at ffffffff81189423
    #15 mem_cgroup_try_charge at ffffffff8118c6f5
    #16 __add_to_page_cache_locked at ffffffff8112137d
    #17 add_to_page_cache_lru at ffffffff81121618
    #18 pagecache_get_page at ffffffff8112170b
    #19 grow_dev_page at ffffffff811c8297
    #20 __getblk_slow at ffffffff811c91d6
    #21 __getblk_gfp at ffffffff811c92c1
    #22 ext4_ext_grow_indepth at ffffffff8124565c
    #23 ext4_ext_create_new_leaf at ffffffff81246ca8
    #24 ext4_ext_insert_extent at ffffffff81246f09
    #25 ext4_ext_map_blocks at ffffffff8124a848
    #26 ext4_map_blocks at ffffffff8121a5b7
    #27 mpage_map_one_extent at ffffffff8121b1fa
    #28 mpage_map_and_submit_extent at ffffffff8121f07b
    #29 ext4_writepages at ffffffff8121f6d5
    #30 do_writepages at ffffffff8112c490
    #31 __filemap_fdatawrite_range at ffffffff81120199
    #32 filemap_flush at ffffffff8112041c
    #33 ext4_alloc_da_blocks at ffffffff81219da1
    #34 ext4_rename at ffffffff81229b91
    #35 ext4_rename2 at ffffffff81229e32
    #36 vfs_rename at ffffffff811a08a5
    #37 SYSC_renameat2 at ffffffff811a3ffc
    #38 sys_renameat2 at ffffffff811a408e
    #39 sys_rename at ffffffff8119e51e
    #40 system_call_fastpath at ffffffff815afa89

Dave Chinner has properly pointed out that this is a deadlock in the reclaim code because ext4 doesn't submit pages which are marked by PG_writeback right away.

The heuristic was introduced by commit e62e384 ("memcg: prevent OOM with too many dirty pages") and it was applied only when may_enter_fs was specified. The code has been changed by c3b94f4 ("memcg: further prevent OOM with too many dirty pages") which has removed the __GFP_FS restriction with a reasoning that we do not get into the fs code. But this is not sufficient apparently because the fs doesn't necessarily submit pages marked PG_writeback for IO right away.

ext4_bio_write_page calls io_submit_add_bh but that doesn't necessarily submit the bio. Instead it tries to map more pages into the bio and mpage_map_one_extent might trigger memcg charge which might end up waiting on a page which is marked PG_writeback but hasn't been submitted yet so we would end up waiting for something that never finishes.

Fix this issue by replacing __GFP_IO by may_enter_fs check (for case 2) before we go to wait on the writeback. The page fault path, which is the only path that triggers memcg oom killer since 3.12, shouldn't require GFP_NOFS and so we shouldn't reintroduce the premature OOM killer issue which was originally addressed by the heuristic.

As per David Chinner the xfs is doing similar thing since 2.6.15 already so ext4 is not the only affected filesystem. Moreover he notes:

: For example: IO completion might require unwritten extent conversion
: which executes filesystem transactions and GFP_NOFS allocations. The
: writeback flag on the pages can not be cleared until unwritten
: extent conversion completes. Hence memory reclaim cannot wait on
: page writeback to complete in GFP_NOFS context because it is not
: safe to do so, memcg reclaim or otherwise.

[tytso@mit.edu: corrected the control flow]
Fixes: c3b94f4 ("memcg: further prevent OOM with too many dirty pages")
Reported-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[@MSF-Jarvis: Fix conflicts from "mm: vmscan: stall page reclaim after a list of pages have been processed"]
Change-Id: I09aa7c565388b4b323034d5c71a463f4fb175462

commit 45caeaa upstream.

As Eric Dumazet pointed out this also needs to be fixed in IPv6.
v2: Contains the IPv6 tcp/IPv6 dccp patches as well.

We have seen a few incidents lately where a dst_entry has been freed with a dangling TCP socket reference (sk->sk_dst_cache) pointing to that dst_entry. If the conditions/timings are right a crash then ensues when the freed dst_entry is referenced later on. A common crashing back trace is:

     #8 [] page_fault at ffffffff8163e648
        [exception RIP: __tcp_ack_snd_check+74]
    .
    .
     #9 [] tcp_rcv_established at ffffffff81580b64
    #10 [] tcp_v4_do_rcv at ffffffff8158b54a
    #11 [] tcp_v4_rcv at ffffffff8158cd02
    #12 [] ip_local_deliver_finish at ffffffff815668f4
    #13 [] ip_local_deliver at ffffffff81566bd9
    #14 [] ip_rcv_finish at ffffffff8156656d
    #15 [] ip_rcv at ffffffff81566f06
    #16 [] __netif_receive_skb_core at ffffffff8152b3a2
    #17 [] __netif_receive_skb at ffffffff8152b608
    #18 [] netif_receive_skb at ffffffff8152b690
    #19 [] vmxnet3_rq_rx_complete at ffffffffa015eeaf [vmxnet3]
    #20 [] vmxnet3_poll_rx_only at ffffffffa015f32a [vmxnet3]
    #21 [] net_rx_action at ffffffff8152bac2
    #22 [] __do_softirq at ffffffff81084b4f
    #23 [] call_softirq at ffffffff8164845c
    #24 [] do_softirq at ffffffff81016fc5
    #25 [] irq_exit at ffffffff81084ee5
    #26 [] do_IRQ at ffffffff81648ff8

Of course it may happen with other NIC drivers as well.

The freed dst_entry was found here:

    static bool tcp_in_quickack_mode(struct sock *sk)
    {
            const struct inet_connection_sock *icsk = inet_csk(sk);
            const struct dst_entry *dst = __sk_dst_get(sk);

            return (dst && dst_metric(dst, RTAX_QUICKACK)) ||
                    (icsk->icsk_ack.quick && !icsk->icsk_ack.pingpong);
    }

But there are other backtraces attributed to the same freed dst_entry in netfilter code as well.

All the vmcores showed 2 significant clues:

- Remote hosts behind the default gateway had always been redirected to a different gateway. A rtable/dst_entry will be added for that host. This creates more dst_entrys with lower reference counts, making this more probable.

- All vmcores showed a positive LockDroppedIcmps value, e.g.:

    LockDroppedIcmps 267

A closer look at the tcp_v4_err() handler revealed that do_redirect() will run regardless of whether user space has the socket locked. This can result in a race condition where the same dst_entry cached in sk->sk_dst_cache can be decremented twice for the same socket via:

    do_redirect()->__sk_dst_check()->dst_release()

This leads to the dst_entry being prematurely freed with another socket pointing to it via sk->sk_dst_cache, and a subsequent crash.

To fix this, skip do_redirect() if user space has the socket locked. Instead let the redirect take place later when user space does not have the socket locked.

The dccp/IPv6 code is very similar in this respect, so fixing it there too.

As Eric Garver pointed out, the following commit now invalidates routes, which can set the dst->obsolete flag so that ipv4_dst_check() returns null and triggers the dst_release().

Fixes: ceb3320 ("ipv4: Kill routes during PMTU/redirect updates.")
Cc: Eric Garver <egarver@redhat.com>
Cc: Hannes Sowa <hsowa@redhat.com>
Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 4dfce57 upstream.

There have been several reports over the years of NULL pointer dereferences in xfs_trans_log_inode during xfs_fsr processes, when the process is doing an fput and tearing down extents on the temporary inode, something like:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
    PID: 29439 TASK: ffff880550584fa0 CPU: 6 COMMAND: "xfs_fsr"
        [exception RIP: xfs_trans_log_inode+0x10]
     #9 [ffff8800a57bbbe0] xfs_bunmapi at ffffffffa037398e [xfs]
    #10 [ffff8800a57bbce8] xfs_itruncate_extents at ffffffffa0391b29 [xfs]
    #11 [ffff8800a57bbd88] xfs_inactive_truncate at ffffffffa0391d0c [xfs]
    #12 [ffff8800a57bbdb8] xfs_inactive at ffffffffa0392508 [xfs]
    #13 [ffff8800a57bbdd8] xfs_fs_evict_inode at ffffffffa035907e [xfs]
    #14 [ffff8800a57bbe00] evict at ffffffff811e1b67
    #15 [ffff8800a57bbe28] iput at ffffffff811e23a5
    #16 [ffff8800a57bbe58] dentry_kill at ffffffff811dcfc8
    #17 [ffff8800a57bbe88] dput at ffffffff811dd06c
    #18 [ffff8800a57bbea8] __fput at ffffffff811c823b
    #19 [ffff8800a57bbef0] ____fput at ffffffff811c846e
    #20 [ffff8800a57bbf00] task_work_run at ffffffff81093b27
    #21 [ffff8800a57bbf30] do_notify_resume at ffffffff81013b0c
    #22 [ffff8800a57bbf50] int_signal at ffffffff8161405d

As it turns out, this is because the i_itemp pointer, along with the d_ops pointer, has been overwritten with zeros when we tear down the extents during truncate. When the in-core inode fork on the temporary inode used by xfs_fsr was originally set up during the extent swap, we mistakenly looked at di_nextents to determine whether all extents fit inline, but this misses extents generated by speculative preallocation; we should be using if_bytes instead.

This mistake corrupts the in-memory inode, and code in xfs_iext_remove_inline eventually gets bad inputs, causing it to memmove and memset incorrect ranges; this became apparent because the two values in ifp->if_u2.if_inline_ext[1] contained what should have been in d_ops and i_itemp; they were memmoved due to incorrect array indexing and then the original locations were zeroed with memset, again due to an array overrun.

Fix this by properly using i_df.if_bytes to determine the number of extents, not di_nextents.

Thanks to dchinner for looking at this with me and spotting the root cause.

[nborisov: backported to 4.4]
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
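The xfs fix above boils down to deriving the extent count from the size of the in-core fork rather than from the on-disk counter. A minimal userspace sketch of that calculation follows. It rests on assumptions: `struct bmbt_rec_model` (two 64-bit words per record), the `INLINE_EXTS_MODEL` limit of 2, and both function names are hypothetical and only mirror the shape of what the commit message describes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-core extent record: two 64-bit words per extent. */
struct bmbt_rec_model {
    uint64_t l0;
    uint64_t l1;
};

/* Assumed number of extent records that fit in the inline array. */
#define INLINE_EXTS_MODEL 2

/* Count extents from the in-core fork size (the if_bytes idea). */
static size_t incore_extent_count(size_t if_bytes)
{
    return if_bytes / sizeof(struct bmbt_rec_model);
}

/* Decide inline vs. indirect storage from the in-core count only. */
static bool extents_fit_inline(size_t if_bytes)
{
    return incore_extent_count(if_bytes) <= INLINE_EXTS_MODEL;
}
```

If an on-disk counter says 2 extents but speculative preallocation has grown the in-core fork to 3 records (48 bytes here), `extents_fit_inline(48)` is false, so the inline array is never indexed past its end.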

commit ba80aa9 upstream.

This patch closes a long standing race in configfs between the creation of a new symlink in create_link(), while the symlink target's config_item is being concurrently removed via configfs_rmdir().

This can happen because the symlink target's reference is obtained by config_item_get() in create_link() before the CONFIGFS_USET_DROPPING bit set by configfs_detach_prep() during configfs_rmdir() shutdown is actually checked.

This originally manifested itself on ppc64 on v4.8.y under heavy load using ibmvscsi target ports with Novalink API:

    [ 7877.289863] rpadlpar_io: slot U8247.22L.212A91A-V1-C8 added
    [ 7879.893760] ------------[ cut here ]------------
    [ 7879.893768] WARNING: CPU: 15 PID: 17585 at ./include/linux/kref.h:46 config_item_get+0x7c/0x90 [configfs]
    [ 7879.893811] CPU: 15 PID: 17585 Comm: targetcli Tainted: G O 4.8.17-customv2.22 #12
    [ 7879.893812] task: c00000018a0d3400 task.stack: c0000001f3b40000
    [ 7879.893813] NIP: d000000002c664ec LR: d000000002c60980 CTR: c000000000b70870
    [ 7879.893814] REGS: c0000001f3b43810 TRAP: 0700 Tainted: G O (4.8.17-customv2.22)
    [ 7879.893815] MSR: 8000000000029033 <SF,EE,ME,IR,DR,RI,LE> CR: 28222242 XER: 00000000
    [ 7879.893820] CFAR: d000000002c664bc SOFTE: 1
    GPR00: d000000002c60980 c0000001f3b43a90 d000000002c70908 c0000000fbc06820
    GPR04: c0000001ef1bd900 0000000000000004 0000000000000001 0000000000000000
    GPR08: 0000000000000000 0000000000000001 d000000002c69560 d000000002c66d80
    GPR12: c000000000b70870 c00000000e798700 c0000001f3b43ca0 c0000001d4949d40
    GPR16: c00000014637e1c0 0000000000000000 0000000000000000 c0000000f2392940
    GPR20: c0000001f3b43b98 0000000000000041 0000000000600000 0000000000000000
    GPR24: fffffffffffff000 0000000000000000 d000000002c60be0 c0000001f1dac490
    GPR28: 0000000000000004 0000000000000000 c0000001ef1bd900 c0000000f2392940
    [ 7879.893839] NIP [d000000002c664ec] config_item_get+0x7c/0x90 [configfs]
    [ 7879.893841] LR [d000000002c60980] check_perm+0x80/0x2e0 [configfs]
    [ 7879.893842] Call Trace:
    [ 7879.893844] [c0000001f3b43ac0] [d000000002c60980] check_perm+0x80/0x2e0 [configfs]
    [ 7879.893847] [c0000001f3b43b10] [c000000000329770] do_dentry_open+0x2c0/0x460
    [ 7879.893849] [c0000001f3b43b70] [c000000000344480] path_openat+0x210/0x1490
    [ 7879.893851] [c0000001f3b43c80] [c00000000034708c] do_filp_open+0xfc/0x170
    [ 7879.893853] [c0000001f3b43db0] [c00000000032b5bc] do_sys_open+0x1cc/0x390
    [ 7879.893856] [c0000001f3b43e30] [c000000000009584] system_call+0x38/0xec
    [ 7879.893856] Instruction dump:
    [ 7879.893858] 409d0014 38210030 e8010010 7c0803a6 4e800020 3d220000 e94981e0 892a0000
    [ 7879.893861] 2f890000 409effe0 39200001 992a0000 <0fe00000> 4bffffd0 60000000 60000000
    [ 7879.893866] ---[ end trace 14078f0b3b5ad0aa ]---

To close this race, go ahead and obtain the symlink's target config_item reference only after the existing CONFIGFS_USET_DROPPING check succeeds.

This way, if configfs_rmdir() wins create_link() will return -ENONET, and if create_link() wins configfs_rmdir() will return -EBUSY.

Reported-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
Tested-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
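The ordering that the configfs commit above describes (check the "dropping" state first, take the reference second) can be illustrated with a single-threaded userspace model. Everything here is an assumption made for illustration: `struct item_model`, its fields, and both function names are hypothetical, locking is left out entirely, and the sketch returns `-ENOENT` on the losing create path, which is a guess at what the message's "-ENONET" refers to.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of a config item that symlinks can point at. */
struct item_model {
    int refcount;  /* references held on the item */
    int links;     /* symlinks currently pointing at the item */
    bool dropping; /* set once removal of the item has started */
};

/* Create a symlink: check the dropping state BEFORE taking a reference. */
static int create_link_model(struct item_model *target)
{
    if (target->dropping)
        return -ENOENT; /* removal won the race; never touch refcount */

    target->refcount++;
    target->links++;
    return 0;
}

/* Remove the item: refuse while symlinks still point at it. */
static int rmdir_model(struct item_model *item)
{
    if (item->links > 0)
        return -EBUSY; /* link creation won the race */

    item->dropping = true;
    return 0;
}
```

Whichever call runs first determines the outcome for the other, and in neither order is a reference taken on an item that is already being torn down.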