Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CONFIG_IPV6_PRIVACY #8

Closed
charlie2alpha opened this issue Jun 2, 2013 · 2 comments
Closed

CONFIG_IPV6_PRIVACY #8

charlie2alpha opened this issue Jun 2, 2013 · 2 comments

Comments

@charlie2alpha
Copy link

Having CONFIG_IPV6_PRIVACY=y causes problems with the Google Messaging service. The service mistakenly uses a temporary IPv6 address to establish a link to the Google servers, using it for push notifications and sync. The temporary address however becomes invalid after a very short period of time, causing the connection to fail. The service tries to connect again, and again, in an endless loop.

Ideally, this should be fixed in the Google code, to use a normal IPv6 outbound address, but this is closed source Google code.

As a workaround I propose that CONFIG_IPV6_PRIVACY is set to "N" (CONFIG_IPV6_PRIVACY=n) so we can avoid such issues, this will force the system to always use the normal IPv6 address and not a temporary address.

@charlie2alpha
Copy link
Author

The build that @Kayant made to test the RIL issues has the 4e6c6fe commit included, which disables the IPv6 temporary addresses. Push notifications work again correctly with IPv6,

Interesting thing to note, my Nexus 7 tablet has CONFIG_IPV6_PRIVACY=y, but has no problem with the notifications. We cannot know though what Google has done with the code there, it's safer to keep it disabled for the DEFY.

Since this is now working, I am closing the issue.

@Quarx2k
Copy link
Owner

Quarx2k commented Jun 5, 2013

Nexus7 use 3.x kernel, updated Netfilter. That's why no problems.

Blechd0se pushed a commit that referenced this issue Dec 3, 2013
…condition

commit 26c1917 upstream.

When holding the mmap_sem for reading, pmd_offset_map_lock should only
run on a pmd_t that has been read atomically from the pmdp pointer,
otherwise we may read only half of it leading to this crash.

PID: 11679  TASK: f06e8000  CPU: 3   COMMAND: "do_race_2_panic"
 #0 [f06a9dd8] crash_kexec at c049b5ec
 #1 [f06a9e2c] oops_end at c083d1c2
 #2 [f06a9e40] no_context at c0433ded
 #3 [f06a9e64] bad_area_nosemaphore at c043401a
 #4 [f06a9e6c] __do_page_fault at c0434493
 #5 [f06a9eec] do_page_fault at c083eb45
 #6 [f06a9f04] error_code (via page_fault) at c083c5d5
    EAX: 01fb470c EBX: fff35000 ECX: 00000003 EDX: 00000100 EBP:
    00000000
    DS:  007b     ESI: 9e201000 ES:  007b     EDI: 01fb4700 GS:  00e0
    CS:  0060     EIP: c083bc14 ERR: ffffffff EFLAGS: 00010246
 #7 [f06a9f38] _spin_lock at c083bc14
 #8 [f06a9f44] sys_mincore at c0507b7d
 #9 [f06a9fb0] system_call at c083becd
                         start           len
    EAX: ffffffda  EBX: 9e200000  ECX: 00001000  EDX: 6228537f
    DS:  007b      ESI: 00000000  ES:  007b      EDI: 003d0f00
    SS:  007b      ESP: 62285354  EBP: 62285388  GS:  0033
    CS:  0073      EIP: 00291416  ERR: 000000da  EFLAGS: 00000286

This should be a longstanding bug affecting x86 32bit PAE without THP.
Only archs with 64bit large pmd_t and 32bit unsigned long should be
affected.

With THP enabled the barrier() in pmd_none_or_trans_huge_or_clear_bad()
would partly hide the bug when the pmd transition from none to stable,
by forcing a re-read of the *pmd in pmd_offset_map_lock, but when THP is
enabled a new set of problem arises by the fact could then transition
freely in any of the none, pmd_trans_huge or pmd_trans_stable states.
So making the barrier in pmd_none_or_trans_huge_or_clear_bad()
unconditional isn't good idea and it would be a flakey solution.

This should be fully fixed by introducing a pmd_read_atomic that reads
the pmd in order with THP disabled, or by reading the pmd atomically
with cmpxchg8b with THP enabled.

Luckily this new race condition only triggers in the places that must
already be covered by pmd_none_or_trans_huge_or_clear_bad() so the fix
is localized there but this bug is not related to THP.

NOTE: this can trigger on x86 32bit systems with PAE enabled with more
than 4G of ram, otherwise the high part of the pmd will never risk to be
truncated because it would be zero at all times, in turn so hiding the
SMP race.

This bug was discovered and fully debugged by Ulrich, quote:

----
[..]
pmd_none_or_trans_huge_or_clear_bad() loads the content of edx and
eax.

    496 static inline int pmd_none_or_trans_huge_or_clear_bad(pmd_t
    *pmd)
    497 {
    498         /* depend on compiler for an atomic pmd read */
    499         pmd_t pmdval = *pmd;

                                // edi = pmd pointer
0xc0507a74 <sys_mincore+548>:   mov    0x8(%esp),%edi
...
                                // edx = PTE page table high address
0xc0507a84 <sys_mincore+564>:   mov    0x4(%edi),%edx
...
                                // eax = PTE page table low address
0xc0507a8e <sys_mincore+574>:   mov    (%edi),%eax

[..]

Please note that the PMD is not read atomically. These are two "mov"
instructions where the high order bits of the PMD entry are fetched
first. Hence, the above machine code is prone to the following race.

-  The PMD entry {high|low} is 0x0000000000000000.
   The "mov" at 0xc0507a84 loads 0x00000000 into edx.

-  A page fault (on another CPU) sneaks in between the two "mov"
   instructions and instantiates the PMD.

-  The PMD entry {high|low} is now 0x00000003fda38067.
   The "mov" at 0xc0507a8e loads 0xfda38067 into eax.
----

Reported-by: Ulrich Obergfell <uobergfe@redhat.com>
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Larry Woodman <lwoodman@redhat.com>
Cc: Petr Matousek <pmatouse@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Blechd0se pushed a commit that referenced this issue Dec 3, 2013
commit 3cf003c upstream.

[The async read code was broadened to include uncached reads in 3.5, so
the mainline patch did not apply directly. This patch is just a backport
to account for that change.]

Jian found that when he ran fsx on a 32 bit arch with a large wsize the
process and one of the bdi writeback kthreads would sometimes deadlock
with a stack trace like this:

crash> bt
PID: 2789   TASK: f02edaa0  CPU: 3   COMMAND: "fsx"
 #0 [eed63cbc] schedule at c083c5b3
 #1 [eed63d80] kmap_high at c0500ec8
 #2 [eed63db0] cifs_async_writev at f7fabcd7 [cifs]
 #3 [eed63df0] cifs_writepages at f7fb7f5c [cifs]
 #4 [eed63e50] do_writepages at c04f3e32
 #5 [eed63e54] __filemap_fdatawrite_range at c04e152a
 #6 [eed63ea4] filemap_fdatawrite at c04e1b3e
 #7 [eed63eb4] cifs_file_aio_write at f7fa111a [cifs]
 #8 [eed63ecc] do_sync_write at c052d202
 #9 [eed63f74] vfs_write at c052d4ee
#10 [eed63f94] sys_write at c052df4c
#11 [eed63fb0] ia32_sysenter_target at c0409a98
    EAX: 00000004  EBX: 00000003  ECX: abd73b73  EDX: 012a65c6
    DS:  007b      ESI: 012a65c6  ES:  007b      EDI: 00000000
    SS:  007b      ESP: bf8db178  EBP: bf8db1f8  GS:  0033
    CS:  0073      EIP: 40000424  ERR: 00000004  EFLAGS: 00000246

Each task would kmap part of its address array before getting stuck, but
not enough to actually issue the write.

This patch fixes this by serializing the marshal_iov operations for
async reads and writes. The idea here is to ensure that cifs
aggressively tries to populate a request before attempting to fulfill
another one. As soon as all of the pages are kmapped for a request, then
we can unlock and allow another one to proceed.

There's no need to do this serialization on non-CONFIG_HIGHMEM arches
however, so optimize all of this out when CONFIG_HIGHMEM isn't set.

Reported-by: Jian Li <jiali@redhat.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Blechd0se pushed a commit that referenced this issue Dec 3, 2013
…d reasons

commit 5cf02d0 upstream.

We've had some reports of a deadlock where rpciod ends up with a stack
trace like this:

    PID: 2507   TASK: ffff88103691ab40  CPU: 14  COMMAND: "rpciod/14"
     #0 [ffff8810343bf2f0] schedule at ffffffff814dabd9
     #1 [ffff8810343bf3b8] nfs_wait_bit_killable at ffffffffa038fc04 [nfs]
     #2 [ffff8810343bf3c8] __wait_on_bit at ffffffff814dbc2f
     #3 [ffff8810343bf418] out_of_line_wait_on_bit at ffffffff814dbcd8
     #4 [ffff8810343bf488] nfs_commit_inode at ffffffffa039e0c1 [nfs]
     #5 [ffff8810343bf4f8] nfs_release_page at ffffffffa038bef6 [nfs]
     #6 [ffff8810343bf528] try_to_release_page at ffffffff8110c670
     #7 [ffff8810343bf538] shrink_page_list.clone.0 at ffffffff81126271
     #8 [ffff8810343bf668] shrink_inactive_list at ffffffff81126638
     #9 [ffff8810343bf818] shrink_zone at ffffffff8112788f
    #10 [ffff8810343bf8c8] do_try_to_free_pages at ffffffff81127b1e
    #11 [ffff8810343bf958] try_to_free_pages at ffffffff8112812f
    #12 [ffff8810343bfa08] __alloc_pages_nodemask at ffffffff8111fdad
    #13 [ffff8810343bfb28] kmem_getpages at ffffffff81159942
    #14 [ffff8810343bfb58] fallback_alloc at ffffffff8115a55a
    #15 [ffff8810343bfbd8] ____cache_alloc_node at ffffffff8115a2d9
    #16 [ffff8810343bfc38] kmem_cache_alloc at ffffffff8115b09b
    #17 [ffff8810343bfc78] sk_prot_alloc at ffffffff81411808
    #18 [ffff8810343bfcb8] sk_alloc at ffffffff8141197c
    #19 [ffff8810343bfce8] inet_create at ffffffff81483ba6
    #20 [ffff8810343bfd38] __sock_create at ffffffff8140b4a7
    #21 [ffff8810343bfd98] xs_create_sock at ffffffffa01f649b [sunrpc]
    #22 [ffff8810343bfdd8] xs_tcp_setup_socket at ffffffffa01f6965 [sunrpc]
    #23 [ffff8810343bfe38] worker_thread at ffffffff810887d0
    #24 [ffff8810343bfee8] kthread at ffffffff8108dd96
    #25 [ffff8810343bff48] kernel_thread at ffffffff8100c1ca

rpciod is trying to allocate memory for a new socket to talk to the
server. The VM ends up calling ->releasepage to get more memory, and it
tries to do a blocking commit. That commit can't succeed however without
a connected socket, so we deadlock.

Fix this by setting PF_FSTRANS on the workqueue task prior to doing the
socket allocation, and having nfs_release_page check for that flag when
deciding whether to do a commit call. Also, set PF_FSTRANS
unconditionally in rpc_async_schedule since that function can also do
allocations sometimes.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Blechd0se pushed a commit that referenced this issue Dec 3, 2013
commit bea6832 upstream.

On architectures where cputime_t is 64 bit type, is possible to trigger
divide by zero on do_div(temp, (__force u32) total) line, if total is a
non zero number but has lower 32 bit's zeroed. Removing casting is not
a good solution since some do_div() implementations do cast to u32
internally.

This problem can be triggered in practice on very long lived processes:

  PID: 2331   TASK: ffff880472814b00  CPU: 2   COMMAND: "oraagent.bin"
   #0 [ffff880472a51b70] machine_kexec at ffffffff8103214b
   #1 [ffff880472a51bd0] crash_kexec at ffffffff810b91c2
   #2 [ffff880472a51ca0] oops_end at ffffffff814f0b00
   #3 [ffff880472a51cd0] die at ffffffff8100f26b
   #4 [ffff880472a51d00] do_trap at ffffffff814f03f4
   #5 [ffff880472a51d60] do_divide_error at ffffffff8100cfff
   #6 [ffff880472a51e00] divide_error at ffffffff8100be7b
      [exception RIP: thread_group_times+0x56]
      RIP: ffffffff81056a16  RSP: ffff880472a51eb8  RFLAGS: 00010046
      RAX: bc3572c9fe12d194  RBX: ffff880874150800  RCX: 0000000110266fad
      RDX: 0000000000000000  RSI: ffff880472a51eb8  RDI: 001038ae7d9633dc
      RBP: ffff880472a51ef8   R8: 00000000b10a3a64   R9: ffff880874150800
      R10: 00007fcba27ab680  R11: 0000000000000202  R12: ffff880472a51f08
      R13: ffff880472a51f10  R14: 0000000000000000  R15: 0000000000000007
      ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
   #7 [ffff880472a51f00] do_sys_times at ffffffff8108845d
   #8 [ffff880472a51f40] sys_times at ffffffff81088524
   #9 [ffff880472a51f80] system_call_fastpath at ffffffff8100b0f2
      RIP: 0000003808caac3a  RSP: 00007fcba27ab6d8  RFLAGS: 00000202
      RAX: 0000000000000064  RBX: ffffffff8100b0f2  RCX: 0000000000000000
      RDX: 00007fcba27ab6e0  RSI: 000000000076d58e  RDI: 00007fcba27ab6e0
      RBP: 00007fcba27ab700   R8: 0000000000000020   R9: 000000000000091b
      R10: 00007fcba27ab680  R11: 0000000000000202  R12: 00007fff9ca41940
      R13: 0000000000000000  R14: 00007fcba27ac9c0  R15: 00007fff9ca41940
      ORIG_RAX: 0000000000000064  CS: 0033  SS: 002b

Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20120808092714.GA3580@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Blechd0se pushed a commit that referenced this issue Dec 3, 2013
commit 9b469a6 upstream.

Add 6 new devices and one modified device, based on
information from laptop vendor Windows drivers.

Sony provides a driver with two new devices using
a Gobi 2k+ layout (1199:68a5 and 1199:68a9).  The
Sony driver also adds a non-standard QMI/net
interface to the already supported 1199:9011
Gobi device. We do not know whether this is an
alternate interface number or an additional
interface which might be present, but that doesn't
really matter.

Lenovo provides a driver supporting 4 new devices:
 - MC7770 (1199:901b) with standard Gobi 2k+ layout
 - MC7700 (0f3d:68a2) with layout similar to MC7710
 - MC7750 (114f:68a2) with layout similar to MC7710
 - EM7700 (1199:901c) with layout similar to MC7710

Note regaring the three devices similar to MC7710:

The Windows drivers only support interface #8 on these
devices.  The MC7710 can support QMI/net functions on
interface #19 and #20 as well, and this driver is
verified to work on interface #19 (a firmware bug is
suspected to prevent #20 from working).

We do not enable these additional interfaces until they
either show up in a Windows driver or are verified to
work in some other way.  Therefore limiting the new
devices to interface #8 for now.

[bmork: backported to 3.4: use driver whitelisting]
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Blechd0se pushed a commit that referenced this issue Dec 3, 2013
commit 412d32e upstream.

A rescue thread exiting TASK_INTERRUPTIBLE can lead to a task scheduling
off, never to be seen again.  In the case where this occurred, an exiting
thread hit reiserfs homebrew conditional resched while holding a mutex,
bringing the box to its knees.

PID: 18105  TASK: ffff8807fd412180  CPU: 5   COMMAND: "kdmflush"
 #0 [ffff8808157e7670] schedule at ffffffff8143f489
 #1 [ffff8808157e77b8] reiserfs_get_block at ffffffffa038ab2d [reiserfs]
 #2 [ffff8808157e79a8] __block_write_begin at ffffffff8117fb14
 #3 [ffff8808157e7a98] reiserfs_write_begin at ffffffffa0388695 [reiserfs]
 #4 [ffff8808157e7ad8] generic_perform_write at ffffffff810ee9e2
 #5 [ffff8808157e7b58] generic_file_buffered_write at ffffffff810eeb41
 #6 [ffff8808157e7ba8] __generic_file_aio_write at ffffffff810f1a3a
 #7 [ffff8808157e7c58] generic_file_aio_write at ffffffff810f1c88
 #8 [ffff8808157e7cc8] do_sync_write at ffffffff8114f850
 #9 [ffff8808157e7dd8] do_acct_process at ffffffff810a268f
    [exception RIP: kernel_thread_helper]
    RIP: ffffffff8144a5c0  RSP: ffff8808157e7f58  RFLAGS: 00000202
    RAX: 0000000000000000  RBX: 0000000000000000  RCX: 0000000000000000
    RDX: 0000000000000000  RSI: ffffffff8107af60  RDI: ffff8803ee491d18
    RBP: 0000000000000000   R8: 0000000000000000   R9: 0000000000000000
    R10: 0000000000000000  R11: 0000000000000000  R12: 0000000000000000
    R13: 0000000000000000  R14: 0000000000000000  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018

Signed-off-by: Mike Galbraith <mgalbraith@suse.de>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Quarx2k pushed a commit that referenced this issue May 12, 2014
The warning below triggers on AMD MCM packages because physical package
IDs on the cores of a _physical_ socket are the same. I.e., this field
says which CPUs belong to the same physical package.

However, the same two CPUs belong to two different internal, i.e.
"logical" nodes in the same physical socket which is reflected in the
CPU-to-node map on x86 with NUMA.

Which makes this check wrong on the above topologies so circumvent it.

[    0.444413] Booting Node   0, Processors  #1 #2 #3 #4 #5 Ok.
[    0.461388] ------------[ cut here ]------------
[    0.465997] WARNING: at arch/x86/kernel/smpboot.c:310 topology_sane.clone.1+0x6e/0x81()
[    0.473960] Hardware name: Dinar
[    0.477170] sched: CPU #6's mc-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency.
[    0.486860] Booting Node   1, Processors  #6
[    0.491104] Modules linked in:
[    0.494141] Pid: 0, comm: swapper/6 Not tainted 3.4.0+ #1
[    0.499510] Call Trace:
[    0.501946]  [<ffffffff8144bf92>] ? topology_sane.clone.1+0x6e/0x81
[    0.508185]  [<ffffffff8102f1fc>] warn_slowpath_common+0x85/0x9d
[    0.514163]  [<ffffffff8102f2b7>] warn_slowpath_fmt+0x46/0x48
[    0.519881]  [<ffffffff8144bf92>] topology_sane.clone.1+0x6e/0x81
[    0.525943]  [<ffffffff8144c234>] set_cpu_sibling_map+0x251/0x371
[    0.532004]  [<ffffffff8144c4ee>] start_secondary+0x19a/0x218
[    0.537729] ---[ end trace 4eaa2a86a8e2da22 ]---
[    0.628197]  #7 #8 #9 #10 #11 Ok.
[    0.807108] Booting Node   3, Processors  #12 #13 #14 #15 #16 #17 Ok.
[    0.897587] Booting Node   2, Processors  #18 #19 #20 #21 #22 #23 Ok.
[    0.917443] Brought up 24 CPUs

We ran a topology sanity check test we have here on it and
it all looks ok... hopefully :).

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20120529135442.GE29157@aftab.osrc.amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Quarx2k pushed a commit that referenced this issue May 12, 2014
Jian found that when he ran fsx on a 32 bit arch with a large wsize the
process and one of the bdi writeback kthreads would sometimes deadlock
with a stack trace like this:

crash> bt
PID: 2789   TASK: f02edaa0  CPU: 3   COMMAND: "fsx"
 #0 [eed63cbc] schedule at c083c5b3
 #1 [eed63d80] kmap_high at c0500ec8
 #2 [eed63db0] cifs_async_writev at f7fabcd7 [cifs]
 #3 [eed63df0] cifs_writepages at f7fb7f5c [cifs]
 #4 [eed63e50] do_writepages at c04f3e32
 #5 [eed63e54] __filemap_fdatawrite_range at c04e152a
 #6 [eed63ea4] filemap_fdatawrite at c04e1b3e
 #7 [eed63eb4] cifs_file_aio_write at f7fa111a [cifs]
 #8 [eed63ecc] do_sync_write at c052d202
 #9 [eed63f74] vfs_write at c052d4ee
#10 [eed63f94] sys_write at c052df4c
#11 [eed63fb0] ia32_sysenter_target at c0409a98
    EAX: 00000004  EBX: 00000003  ECX: abd73b73  EDX: 012a65c6
    DS:  007b      ESI: 012a65c6  ES:  007b      EDI: 00000000
    SS:  007b      ESP: bf8db178  EBP: bf8db1f8  GS:  0033
    CS:  0073      EIP: 40000424  ERR: 00000004  EFLAGS: 00000246

Each task would kmap part of its address array before getting stuck, but
not enough to actually issue the write.

This patch fixes this by serializing the marshal_iov operations for
async reads and writes. The idea here is to ensure that cifs
aggressively tries to populate a request before attempting to fulfill
another one. As soon as all of the pages are kmapped for a request, then
we can unlock and allow another one to proceed.

There's no need to do this serialization on non-CONFIG_HIGHMEM arches
however, so optimize all of this out when CONFIG_HIGHMEM isn't set.

Cc: <stable@vger.kernel.org>
Reported-by: Jian Li <jiali@redhat.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Quarx2k pushed a commit that referenced this issue May 12, 2014
…d reasons

We've had some reports of a deadlock where rpciod ends up with a stack
trace like this:

    PID: 2507   TASK: ffff88103691ab40  CPU: 14  COMMAND: "rpciod/14"
     #0 [ffff8810343bf2f0] schedule at ffffffff814dabd9
     #1 [ffff8810343bf3b8] nfs_wait_bit_killable at ffffffffa038fc04 [nfs]
     #2 [ffff8810343bf3c8] __wait_on_bit at ffffffff814dbc2f
     #3 [ffff8810343bf418] out_of_line_wait_on_bit at ffffffff814dbcd8
     #4 [ffff8810343bf488] nfs_commit_inode at ffffffffa039e0c1 [nfs]
     #5 [ffff8810343bf4f8] nfs_release_page at ffffffffa038bef6 [nfs]
     #6 [ffff8810343bf528] try_to_release_page at ffffffff8110c670
     #7 [ffff8810343bf538] shrink_page_list.clone.0 at ffffffff81126271
     #8 [ffff8810343bf668] shrink_inactive_list at ffffffff81126638
     #9 [ffff8810343bf818] shrink_zone at ffffffff8112788f
    #10 [ffff8810343bf8c8] do_try_to_free_pages at ffffffff81127b1e
    #11 [ffff8810343bf958] try_to_free_pages at ffffffff8112812f
    #12 [ffff8810343bfa08] __alloc_pages_nodemask at ffffffff8111fdad
    #13 [ffff8810343bfb28] kmem_getpages at ffffffff81159942
    #14 [ffff8810343bfb58] fallback_alloc at ffffffff8115a55a
    #15 [ffff8810343bfbd8] ____cache_alloc_node at ffffffff8115a2d9
    #16 [ffff8810343bfc38] kmem_cache_alloc at ffffffff8115b09b
    #17 [ffff8810343bfc78] sk_prot_alloc at ffffffff81411808
    #18 [ffff8810343bfcb8] sk_alloc at ffffffff8141197c
    #19 [ffff8810343bfce8] inet_create at ffffffff81483ba6
    #20 [ffff8810343bfd38] __sock_create at ffffffff8140b4a7
    #21 [ffff8810343bfd98] xs_create_sock at ffffffffa01f649b [sunrpc]
    #22 [ffff8810343bfdd8] xs_tcp_setup_socket at ffffffffa01f6965 [sunrpc]
    #23 [ffff8810343bfe38] worker_thread at ffffffff810887d0
    #24 [ffff8810343bfee8] kthread at ffffffff8108dd96
    #25 [ffff8810343bff48] kernel_thread at ffffffff8100c1ca

rpciod is trying to allocate memory for a new socket to talk to the
server. The VM ends up calling ->releasepage to get more memory, and it
tries to do a blocking commit. That commit can't succeed however without
a connected socket, so we deadlock.

Fix this by setting PF_FSTRANS on the workqueue task prior to doing the
socket allocation, and having nfs_release_page check for that flag when
deciding whether to do a commit call. Also, set PF_FSTRANS
unconditionally in rpc_async_schedule since that function can also do
allocations sometimes.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
Quarx2k pushed a commit that referenced this issue May 12, 2014
On architectures where cputime_t is 64 bit type, is possible to trigger
divide by zero on do_div(temp, (__force u32) total) line, if total is a
non zero number but has lower 32 bit's zeroed. Removing casting is not
a good solution since some do_div() implementations do cast to u32
internally.

This problem can be triggered in practice on very long lived processes:

  PID: 2331   TASK: ffff880472814b00  CPU: 2   COMMAND: "oraagent.bin"
   #0 [ffff880472a51b70] machine_kexec at ffffffff8103214b
   #1 [ffff880472a51bd0] crash_kexec at ffffffff810b91c2
   #2 [ffff880472a51ca0] oops_end at ffffffff814f0b00
   #3 [ffff880472a51cd0] die at ffffffff8100f26b
   #4 [ffff880472a51d00] do_trap at ffffffff814f03f4
   #5 [ffff880472a51d60] do_divide_error at ffffffff8100cfff
   #6 [ffff880472a51e00] divide_error at ffffffff8100be7b
      [exception RIP: thread_group_times+0x56]
      RIP: ffffffff81056a16  RSP: ffff880472a51eb8  RFLAGS: 00010046
      RAX: bc3572c9fe12d194  RBX: ffff880874150800  RCX: 0000000110266fad
      RDX: 0000000000000000  RSI: ffff880472a51eb8  RDI: 001038ae7d9633dc
      RBP: ffff880472a51ef8   R8: 00000000b10a3a64   R9: ffff880874150800
      R10: 00007fcba27ab680  R11: 0000000000000202  R12: ffff880472a51f08
      R13: ffff880472a51f10  R14: 0000000000000000  R15: 0000000000000007
      ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
   #7 [ffff880472a51f00] do_sys_times at ffffffff8108845d
   #8 [ffff880472a51f40] sys_times at ffffffff81088524
   #9 [ffff880472a51f80] system_call_fastpath at ffffffff8100b0f2
      RIP: 0000003808caac3a  RSP: 00007fcba27ab6d8  RFLAGS: 00000202
      RAX: 0000000000000064  RBX: ffffffff8100b0f2  RCX: 0000000000000000
      RDX: 00007fcba27ab6e0  RSI: 000000000076d58e  RDI: 00007fcba27ab6e0
      RBP: 00007fcba27ab700   R8: 0000000000000020   R9: 000000000000091b
      R10: 00007fcba27ab680  R11: 0000000000000202  R12: 00007fff9ca41940
      R13: 0000000000000000  R14: 00007fcba27ac9c0  R15: 00007fff9ca41940
      ORIG_RAX: 0000000000000064  CS: 0033  SS: 002b

Cc: stable@vger.kernel.org
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20120808092714.GA3580@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Quarx2k pushed a commit that referenced this issue May 12, 2014
Add 6 new devices and one modified device, based on
information from laptop vendor Windows drivers.

Sony provides a driver with two new devices using
a Gobi 2k+ layout (1199:68a5 and 1199:68a9).  The
Sony driver also adds a non-standard QMI/net
interface to the already supported 1199:9011
Gobi device. We do not know whether this is an
alternate interface number or an additional
interface which might be present, but that doesn't
really matter.

Lenovo provides a driver supporting 4 new devices:
 - MC7770 (1199:901b) with standard Gobi 2k+ layout
 - MC7700 (0f3d:68a2) with layout similar to MC7710
 - MC7750 (114f:68a2) with layout similar to MC7710
 - EM7700 (1199:901c) with layout similar to MC7710

Note regaring the three devices similar to MC7710:

The Windows drivers only support interface #8 on these
devices.  The MC7710 can support QMI/net functions on
interface #19 and #20 as well, and this driver is
verified to work on interface #19 (a firmware bug is
suspected to prevent #20 from working).

We do not enable these additional interfaces until they
either show up in a Windows driver or are verified to
work in some other way.  Therefore limiting the new
devices to interface #8 for now.

Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
Quarx2k pushed a commit that referenced this issue May 12, 2014
The bfi instruction is not available on ARMv6, so instead use an and/orr
sequence in the contextidr_notifier. This gets rid of the assembler
error:

  Assembler messages:
  Error: selected processor does not support ARM mode `bfi r3,r2,#0,#8'

Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Quarx2k pushed a commit that referenced this issue May 12, 2014
This moves ARM over to the asm-generic/unaligned.h header. This has the
benefit of better code generated especially for ARMv7 on gcc 4.7+
compilers.

As Arnd Bergmann, points out: The asm-generic version uses the "struct"
version for native-endian unaligned access and the "byteshift" version
for the opposite endianess. The current ARM version however uses the
"byteshift" implementation for both.

Thanks to Nicolas Pitre for the excellent analysis:

Test case:

int foo (int *x) { return get_unaligned(x); }
long long bar (long long *x) { return get_unaligned(x); }

With the current ARM version:

foo:
	ldrb	r3, [r0, #2]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 2B], MEM[(const u8 *)x_1(D) + 2B]
	ldrb	r1, [r0, #1]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 1B], MEM[(const u8 *)x_1(D) + 1B]
	ldrb	r2, [r0, #0]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D)], MEM[(const u8 *)x_1(D)]
	mov	r3, r3, asl #16	@ tmp154, MEM[(const u8 *)x_1(D) + 2B],
	ldrb	r0, [r0, #3]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 3B], MEM[(const u8 *)x_1(D) + 3B]
	orr	r3, r3, r1, asl #8	@, tmp155, tmp154, MEM[(const u8 *)x_1(D) + 1B],
	orr	r3, r3, r2	@ tmp157, tmp155, MEM[(const u8 *)x_1(D)]
	orr	r0, r3, r0, asl #24	@,, tmp157, MEM[(const u8 *)x_1(D) + 3B],
	bx	lr	@

bar:
	stmfd	sp!, {r4, r5, r6, r7}	@,
	mov	r2, #0	@ tmp184,
	ldrb	r5, [r0, #6]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 6B], MEM[(const u8 *)x_1(D) + 6B]
	ldrb	r4, [r0, #5]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 5B], MEM[(const u8 *)x_1(D) + 5B]
	ldrb	ip, [r0, #2]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 2B], MEM[(const u8 *)x_1(D) + 2B]
	ldrb	r1, [r0, #4]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 4B], MEM[(const u8 *)x_1(D) + 4B]
	mov	r5, r5, asl #16	@ tmp175, MEM[(const u8 *)x_1(D) + 6B],
	ldrb	r7, [r0, #1]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 1B], MEM[(const u8 *)x_1(D) + 1B]
	orr	r5, r5, r4, asl #8	@, tmp176, tmp175, MEM[(const u8 *)x_1(D) + 5B],
	ldrb	r6, [r0, #7]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 7B], MEM[(const u8 *)x_1(D) + 7B]
	orr	r5, r5, r1	@ tmp178, tmp176, MEM[(const u8 *)x_1(D) + 4B]
	ldrb	r4, [r0, #0]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D)], MEM[(const u8 *)x_1(D)]
	mov	ip, ip, asl #16	@ tmp188, MEM[(const u8 *)x_1(D) + 2B],
	ldrb	r1, [r0, #3]	@ zero_extendqisi2	@ MEM[(const u8 *)x_1(D) + 3B], MEM[(const u8 *)x_1(D) + 3B]
	orr	ip, ip, r7, asl #8	@, tmp189, tmp188, MEM[(const u8 *)x_1(D) + 1B],
	orr	r3, r5, r6, asl #24	@,, tmp178, MEM[(const u8 *)x_1(D) + 7B],
	orr	ip, ip, r4	@ tmp191, tmp189, MEM[(const u8 *)x_1(D)]
	orr	ip, ip, r1, asl #24	@, tmp194, tmp191, MEM[(const u8 *)x_1(D) + 3B],
	mov	r1, r3	@,
	orr	r0, r2, ip	@ tmp171, tmp184, tmp194
	ldmfd	sp!, {r4, r5, r6, r7}
	bx	lr

In both cases the code is slightly suboptimal.  One may wonder why
wasting r2 with the constant 0 in the second case for example.  And all
the mov's could be folded in subsequent orr's, etc.

Now with the asm-generic version:

foo:
	ldr	r0, [r0, #0]	@ unaligned	@,* x
	bx	lr	@

bar:
	mov	r3, r0	@ x, x
	ldr	r0, [r0, #0]	@ unaligned	@,* x
	ldr	r1, [r3, #4]	@ unaligned	@,
	bx	lr	@

This is way better of course, but only because this was compiled for
ARMv7. In this case the compiler knows that the hardware can do
unaligned word access.  This isn't that obvious for foo(), but if we
remove the get_unaligned() from bar as follows:

long long bar (long long *x) {return *x; }

then the resulting code is:

bar:
	ldmia	r0, {r0, r1}	@ x,,
	bx	lr	@

So this proves that the presumed aligned vs unaligned cases does have
influence on the instructions the compiler may use and that the above
unaligned code results are not just an accident.

Still... this isn't fully conclusive without at least looking at the
resulting assembly fron a pre ARMv6 compilation.  Let's see with an
ARMv5 target:

foo:
	ldrb	r3, [r0, #0]	@ zero_extendqisi2	@ tmp139,* x
	ldrb	r1, [r0, #1]	@ zero_extendqisi2	@ tmp140,
	ldrb	r2, [r0, #2]	@ zero_extendqisi2	@ tmp143,
	ldrb	r0, [r0, #3]	@ zero_extendqisi2	@ tmp146,
	orr	r3, r3, r1, asl #8	@, tmp142, tmp139, tmp140,
	orr	r3, r3, r2, asl #16	@, tmp145, tmp142, tmp143,
	orr	r0, r3, r0, asl #24	@,, tmp145, tmp146,
	bx	lr	@

bar:
	stmfd	sp!, {r4, r5, r6, r7}	@,
	ldrb	r2, [r0, #0]	@ zero_extendqisi2	@ tmp139,* x
	ldrb	r7, [r0, #1]	@ zero_extendqisi2	@ tmp140,
	ldrb	r3, [r0, #4]	@ zero_extendqisi2	@ tmp149,
	ldrb	r6, [r0, #5]	@ zero_extendqisi2	@ tmp150,
	ldrb	r5, [r0, #2]	@ zero_extendqisi2	@ tmp143,
	ldrb	r4, [r0, #6]	@ zero_extendqisi2	@ tmp153,
	ldrb	r1, [r0, #7]	@ zero_extendqisi2	@ tmp156,
	ldrb	ip, [r0, #3]	@ zero_extendqisi2	@ tmp146,
	orr	r2, r2, r7, asl #8	@, tmp142, tmp139, tmp140,
	orr	r3, r3, r6, asl #8	@, tmp152, tmp149, tmp150,
	orr	r2, r2, r5, asl #16	@, tmp145, tmp142, tmp143,
	orr	r3, r3, r4, asl #16	@, tmp155, tmp152, tmp153,
	orr	r0, r2, ip, asl #24	@,, tmp145, tmp146,
	orr	r1, r3, r1, asl #24	@,, tmp155, tmp156,
	ldmfd	sp!, {r4, r5, r6, r7}
	bx	lr

Compared to the initial results, this is really nicely optimized and I
couldn't do much better if I were to hand code it myself.

Signed-off-by: Rob Herring <rob.herring@calxeda.com>
Reviewed-by: Nicolas Pitre <nico@linaro.org>
Tested-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Quarx2k pushed a commit that referenced this issue May 12, 2014
Cancel work of the xfs_sync_worker before teardown of the log in
xfs_unmountfs.  This prevents occasional crashes on unmount like so:

PID: 21602  TASK: ee9df060  CPU: 0   COMMAND: "kworker/0:3"
 #0 [c5377d28] crash_kexec at c0292c94
 #1 [c5377d80] oops_end at c07090c2
 #2 [c5377d98] no_context at c06f614e
 #3 [c5377dbc] __bad_area_nosemaphore at c06f6281
 #4 [c5377df4] bad_area_nosemaphore at c06f629b
 #5 [c5377e00] do_page_fault at c070b0cb
 #6 [c5377e7c] error_code (via page_fault) at c070892c
    EAX: f300c6a8  EBX: f300c6a8  ECX: 000000c0  EDX: 000000c0  EBP: c5377ed0
    DS:  007b      ESI: 00000000  ES:  007b      EDI: 00000001  GS:  ffffad20
    CS:  0060      EIP: c0481ad0  ERR: ffffffff  EFLAGS: 00010246
 #7 [c5377eb0] atomic64_read_cx8 at c0481ad0
 #8 [c5377ebc] xlog_assign_tail_lsn_locked at f7cc7c6e [xfs]
 #9 [c5377ed4] xfs_trans_ail_delete_bulk at f7ccd520 [xfs]
#10 [c5377f0c] xfs_buf_iodone at f7ccb602 [xfs]
#11 [c5377f24] xfs_buf_do_callbacks at f7cca524 [xfs]
#12 [c5377f30] xfs_buf_iodone_callbacks at f7cca5da [xfs]
#13 [c5377f4c] xfs_buf_iodone_work at f7c718d0 [xfs]
#14 [c5377f58] process_one_work at c024ee4c
#15 [c5377f98] worker_thread at c024f43d
#16 [c5377fbc] kthread at c025326b
#17 [c5377fe8] kernel_thread_helper at c070e834

PID: 26653  TASK: e79143b0  CPU: 3   COMMAND: "umount"
 #0 [cde0fda0] __schedule at c0706595
 #1 [cde0fe28] schedule at c0706b89
 #2 [cde0fe30] schedule_timeout at c0705600
 #3 [cde0fe94] __down_common at c0706098
 #4 [cde0fec8] __down at c0706122
 #5 [cde0fed0] down at c025936f
 #6 [cde0fee0] xfs_buf_lock at f7c7131d [xfs]
 #7 [cde0ff00] xfs_freesb at f7cc2236 [xfs]
 #8 [cde0ff10] xfs_fs_put_super at f7c80f21 [xfs]
 #9 [cde0ff1c] generic_shutdown_super at c0333d7a
#10 [cde0ff38] kill_block_super at c0333e0f
#11 [cde0ff48] deactivate_locked_super at c0334218
#12 [cde0ff58] deactivate_super at c033495d
#13 [cde0ff68] mntput_no_expire at c034bc13
#14 [cde0ff7c] sys_umount at c034cc69
#15 [cde0ffa0] sys_oldumount at c034ccd4
#16 [cde0ffb0] system_call at c0707e66

commit 11159a0 added this to xfs_log_unmount and needs to be cleaned up
at a later date.

Signed-off-by: Ben Myers <bpm@sgi.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Quarx2k pushed a commit that referenced this issue May 12, 2014
Cancel work of the xfs_sync_worker before teardown of the log in
xfs_unmountfs.  This prevents occasional crashes on unmount like so:

PID: 21602  TASK: ee9df060  CPU: 0   COMMAND: "kworker/0:3"
 #0 [c5377d28] crash_kexec at c0292c94
 #1 [c5377d80] oops_end at c07090c2
 #2 [c5377d98] no_context at c06f614e
 #3 [c5377dbc] __bad_area_nosemaphore at c06f6281
 #4 [c5377df4] bad_area_nosemaphore at c06f629b
 #5 [c5377e00] do_page_fault at c070b0cb
 #6 [c5377e7c] error_code (via page_fault) at c070892c
    EAX: f300c6a8  EBX: f300c6a8  ECX: 000000c0  EDX: 000000c0  EBP: c5377ed0
    DS:  007b      ESI: 00000000  ES:  007b      EDI: 00000001  GS:  ffffad20
    CS:  0060      EIP: c0481ad0  ERR: ffffffff  EFLAGS: 00010246
 #7 [c5377eb0] atomic64_read_cx8 at c0481ad0
 #8 [c5377ebc] xlog_assign_tail_lsn_locked at f7cc7c6e [xfs]
 #9 [c5377ed4] xfs_trans_ail_delete_bulk at f7ccd520 [xfs]
#10 [c5377f0c] xfs_buf_iodone at f7ccb602 [xfs]
#11 [c5377f24] xfs_buf_do_callbacks at f7cca524 [xfs]
#12 [c5377f30] xfs_buf_iodone_callbacks at f7cca5da [xfs]
#13 [c5377f4c] xfs_buf_iodone_work at f7c718d0 [xfs]
#14 [c5377f58] process_one_work at c024ee4c
#15 [c5377f98] worker_thread at c024f43d
#16 [c5377fbc] kthread at c025326b
#17 [c5377fe8] kernel_thread_helper at c070e834

PID: 26653  TASK: e79143b0  CPU: 3   COMMAND: "umount"
 #0 [cde0fda0] __schedule at c0706595
 #1 [cde0fe28] schedule at c0706b89
 #2 [cde0fe30] schedule_timeout at c0705600
 #3 [cde0fe94] __down_common at c0706098
 #4 [cde0fec8] __down at c0706122
 #5 [cde0fed0] down at c025936f
 #6 [cde0fee0] xfs_buf_lock at f7c7131d [xfs]
 #7 [cde0ff00] xfs_freesb at f7cc2236 [xfs]
 #8 [cde0ff10] xfs_fs_put_super at f7c80f21 [xfs]
 #9 [cde0ff1c] generic_shutdown_super at c0333d7a
#10 [cde0ff38] kill_block_super at c0333e0f
#11 [cde0ff48] deactivate_locked_super at c0334218
#12 [cde0ff58] deactivate_super at c033495d
#13 [cde0ff68] mntput_no_expire at c034bc13
#14 [cde0ff7c] sys_umount at c034cc69
#15 [cde0ffa0] sys_oldumount at c034ccd4
#16 [cde0ffb0] system_call at c0707e66

commit 11159a0 added this to xfs_log_unmount and needs to be cleaned up
at a later date.

Signed-off-by: Ben Myers <bpm@sgi.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Quarx2k pushed a commit that referenced this issue May 12, 2014
A rescue thread exiting TASK_INTERRUPTIBLE can lead to a task scheduling
off, never to be seen again.  In the case where this occurred, an exiting
thread hit reiserfs homebrew conditional resched while holding a mutex,
bringing the box to its knees.

PID: 18105  TASK: ffff8807fd412180  CPU: 5   COMMAND: "kdmflush"
 #0 [ffff8808157e7670] schedule at ffffffff8143f489
 #1 [ffff8808157e77b8] reiserfs_get_block at ffffffffa038ab2d [reiserfs]
 #2 [ffff8808157e79a8] __block_write_begin at ffffffff8117fb14
 #3 [ffff8808157e7a98] reiserfs_write_begin at ffffffffa0388695 [reiserfs]
 #4 [ffff8808157e7ad8] generic_perform_write at ffffffff810ee9e2
 #5 [ffff8808157e7b58] generic_file_buffered_write at ffffffff810eeb41
 #6 [ffff8808157e7ba8] __generic_file_aio_write at ffffffff810f1a3a
 #7 [ffff8808157e7c58] generic_file_aio_write at ffffffff810f1c88
 #8 [ffff8808157e7cc8] do_sync_write at ffffffff8114f850
 #9 [ffff8808157e7dd8] do_acct_process at ffffffff810a268f
    [exception RIP: kernel_thread_helper]
    RIP: ffffffff8144a5c0  RSP: ffff8808157e7f58  RFLAGS: 00000202
    RAX: 0000000000000000  RBX: 0000000000000000  RCX: 0000000000000000
    RDX: 0000000000000000  RSI: ffffffff8107af60  RDI: ffff8803ee491d18
    RBP: 0000000000000000   R8: 0000000000000000   R9: 0000000000000000
    R10: 0000000000000000  R11: 0000000000000000  R12: 0000000000000000
    R13: 0000000000000000  R14: 0000000000000000  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018

Signed-off-by: Mike Galbraith <mgalbraith@suse.de>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: stable@vger.kernel.org
Quarx2k pushed a commit that referenced this issue May 12, 2014
Platforms export their SMP callbacks by populating arc_smp_ops.
The population itself needs to be done pretty early, from init_early
callback.

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Quarx2k pushed a commit that referenced this issue May 12, 2014
…ernel/git/vgupta/arc

Pull new ARC architecture from Vineet Gupta:
 "Initial ARC Linux port with some fixes on top for 3.9-rc1:

  I would like to introduce the Linux port to ARC Processors (from
  Synopsys) for 3.9-rc1.  The patch-set has been discussed on the public
  lists since Nov and has received a fair bit of review, specially from
  Arnd, tglx, Al and other subsystem maintainers for DeviceTree, kgdb...

  The arch bits are in arch/arc, some asm-generic changes (acked by
  Arnd), a minor change to PARISC (acked by Helge).

  The series is a touch bigger for a new port for 2 main reasons:

   1. It enables a basic kernel in first sub-series and adds
      ptrace/kgdb/.. later

   2. Some of the fallout of review (DeviceTree support, multi-platform-
      image support) were added on top of orig series, primarily to
      record the revision history.

  This updated pull request additionally contains

   - fixes due to our GNU tools catching up with the new syscall/ptrace
     ABI

   - some (minor) cross-arch Kconfig updates."

* tag 'arc-v3.9-rc1-late' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc: (82 commits)
  ARC: split elf.h into uapi and export it for userspace
  ARC: Fixup the current ABI version
  ARC: gdbserver using regset interface possibly broken
  ARC: Kconfig cleanup tracking cross-arch Kconfig pruning in merge window
  ARC: make a copy of flat DT
  ARC: [plat-arcfpga] DT arc-uart bindings change: "baud" => "current-speed"
  ARC: Ensure CONFIG_VIRT_TO_BUS is not enabled
  ARC: Fix pt_orig_r8 access
  ARC: [3.9] Fallout of hlist iterator update
  ARC: 64bit RTSC timestamp hardware issue
  ARC: Don't fiddle with non-existent caches
  ARC: Add self to MAINTAINERS
  ARC: Provide a default serial.h for uart drivers needing BASE_BAUD
  ARC: [plat-arcfpga] defconfig for fully loaded ARC Linux
  ARC: [Review] Multi-platform image #8: platform registers SMP callbacks
  ARC: [Review] Multi-platform image #7: SMP common code to use callbacks
  ARC: [Review] Multi-platform image #6: cpu-to-dma-addr optional
  ARC: [Review] Multi-platform image #5: NR_IRQS defined by ARC core
  ARC: [Review] Multi-platform image #4: Isolate platform headers
  ARC: [Review] Multi-platform image #3: switch to board callback
  ...
Quarx2k pushed a commit that referenced this issue May 12, 2014
The following script will produce a kernel oops:

    sudo ip netns add v
    sudo ip netns exec v ip ad add 127.0.0.1/8 dev lo
    sudo ip netns exec v ip link set lo up
    sudo ip netns exec v ip ro add 224.0.0.0/4 dev lo
    sudo ip netns exec v ip li add vxlan0 type vxlan id 42 group 239.1.1.1 dev lo
    sudo ip netns exec v ip link set vxlan0 up
    sudo ip netns del v

where inspect by gdb:

    Program received signal SIGSEGV, Segmentation fault.
    [Switching to Thread 107]
    0xffffffffa0289e33 in ?? ()
    (gdb) bt
    #0  vxlan_leave_group (dev=0xffff88001bafa000) at drivers/net/vxlan.c:533
    #1  vxlan_stop (dev=0xffff88001bafa000) at drivers/net/vxlan.c:1087
    #2  0xffffffff812cc498 in __dev_close_many (head=head@entry=0xffff88001f2e7dc8) at net/core/dev.c:1299
    #3  0xffffffff812cd920 in dev_close_many (head=head@entry=0xffff88001f2e7dc8) at net/core/dev.c:1335
    #4  0xffffffff812cef31 in rollback_registered_many (head=head@entry=0xffff88001f2e7dc8) at net/core/dev.c:4851
    #5  0xffffffff812cf040 in unregister_netdevice_many (head=head@entry=0xffff88001f2e7dc8) at net/core/dev.c:5752
    #6  0xffffffff812cf1ba in default_device_exit_batch (net_list=0xffff88001f2e7e18) at net/core/dev.c:6170
    #7  0xffffffff812cab27 in cleanup_net (work=<optimized out>) at net/core/net_namespace.c:302
    #8  0xffffffff810540ef in process_one_work (worker=0xffff88001ba9ed40, work=0xffffffff8167d020) at kernel/workqueue.c:2157
    #9  0xffffffff810549d0 in worker_thread (__worker=__worker@entry=0xffff88001ba9ed40) at kernel/workqueue.c:2276
    #10 0xffffffff8105870c in kthread (_create=0xffff88001f2e5d68) at kernel/kthread.c:168
    #11 <signal handler called>
    #12 0x0000000000000000 in ?? ()
    #13 0x0000000000000000 in ?? ()
    (gdb) fr 0
    #0  vxlan_leave_group (dev=0xffff88001bafa000) at drivers/net/vxlan.c:533
    533		struct sock *sk = vn->sock->sk;
    (gdb) l
    528	static int vxlan_leave_group(struct net_device *dev)
    529	{
    530		struct vxlan_dev *vxlan = netdev_priv(dev);
    531		struct vxlan_net *vn = net_generic(dev_net(dev), vxlan_net_id);
    532		int err = 0;
    533		struct sock *sk = vn->sock->sk;
    534		struct ip_mreqn mreq = {
    535			.imr_multiaddr.s_addr	= vxlan->gaddr,
    536			.imr_ifindex		= vxlan->link,
    537		};
    (gdb) p vn->sock
    $4 = (struct socket *) 0x0

The kernel calls `vxlan_exit_net` when deleting the netns before shutting down
vxlan interfaces. Later the removal of all vxlan interfaces, where `vn->sock`
is already gone causes the oops. so we should manually shutdown all interfaces
before deleting `vn->sock` as the patch does.

Signed-off-by: Zang MingJie <zealot0630@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Quarx2k pushed a commit that referenced this issue May 12, 2014
We can deadlock (s_active and fcoe_config_mutex) if a
port is being destroyed at the same time one is being created.

[ 4200.503113] ======================================================
[ 4200.503114] [ INFO: possible circular locking dependency detected ]
[ 4200.503116] 3.8.0-rc5+ #8 Not tainted
[ 4200.503117] -------------------------------------------------------
[ 4200.503118] kworker/3:2/2492 is trying to acquire lock:
[ 4200.503119]  (s_active#292){++++.+}, at: [<ffffffff8122d20b>] sysfs_addrm_finish+0x3b/0x70
[ 4200.503127]
but task is already holding lock:
[ 4200.503128]  (fcoe_config_mutex){+.+.+.}, at: [<ffffffffa02f3338>] fcoe_destroy_work+0xe8/0x120 [fcoe]
[ 4200.503133]
which lock already depends on the new lock.

[ 4200.503135]
the existing dependency chain (in reverse order) is:
[ 4200.503136]
-> #1 (fcoe_config_mutex){+.+.+.}:
[ 4200.503139]        [<ffffffff810c7711>] lock_acquire+0xa1/0x140
[ 4200.503143]        [<ffffffff816ca7be>] mutex_lock_nested+0x6e/0x360
[ 4200.503146]        [<ffffffffa02f11bd>] fcoe_enable+0x1d/0xb0 [fcoe]
[ 4200.503148]        [<ffffffffa02f127d>] fcoe_ctlr_enabled+0x2d/0x50 [fcoe]
[ 4200.503151]        [<ffffffffa02ffbe8>] store_ctlr_enabled+0x38/0x90 [libfcoe]
[ 4200.503154]        [<ffffffff81424878>] dev_attr_store+0x18/0x30
[ 4200.503157]        [<ffffffff8122b750>] sysfs_write_file+0xe0/0x150
[ 4200.503160]        [<ffffffff811b334c>] vfs_write+0xac/0x180
[ 4200.503162]        [<ffffffff811b3692>] sys_write+0x52/0xa0
[ 4200.503164]        [<ffffffff816d7159>] system_call_fastpath+0x16/0x1b
[ 4200.503167]
-> #0 (s_active#292){++++.+}:
[ 4200.503170]        [<ffffffff810c680f>] __lock_acquire+0x135f/0x1c90
[ 4200.503172]        [<ffffffff810c7711>] lock_acquire+0xa1/0x140
[ 4200.503174]        [<ffffffff8122c626>] sysfs_deactivate+0x116/0x160
[ 4200.503176]        [<ffffffff8122d20b>] sysfs_addrm_finish+0x3b/0x70
[ 4200.503178]        [<ffffffff8122b2eb>] sysfs_hash_and_remove+0x5b/0xb0
[ 4200.503180]        [<ffffffff8122f3d1>] sysfs_remove_group+0x61/0x100
[ 4200.503183]        [<ffffffff814251eb>] device_remove_groups+0x3b/0x60
[ 4200.503185]        [<ffffffff81425534>] device_remove_attrs+0x44/0x80
[ 4200.503187]        [<ffffffff81425e97>] device_del+0x127/0x1c0
[ 4200.503189]        [<ffffffff81425f52>] device_unregister+0x22/0x60
[ 4200.503191]        [<ffffffffa0300970>] fcoe_ctlr_device_delete+0xe0/0xf0 [libfcoe]
[ 4200.503194]        [<ffffffffa02f1b5c>] fcoe_interface_cleanup+0x6c/0xa0 [fcoe]
[ 4200.503196]        [<ffffffffa02f3355>] fcoe_destroy_work+0x105/0x120 [fcoe]
[ 4200.503198]        [<ffffffff8107ee91>] process_one_work+0x1a1/0x580
[ 4200.503203]        [<ffffffff81080c6e>] worker_thread+0x15e/0x440
[ 4200.503205]        [<ffffffff8108715a>] kthread+0xea/0xf0
[ 4200.503207]        [<ffffffff816d70ac>] ret_from_fork+0x7c/0xb0

[ 4200.503209]
other info that might help us debug this:

[ 4200.503211]  Possible unsafe locking scenario:

[ 4200.503212]        CPU0                    CPU1
[ 4200.503213]        ----                    ----
[ 4200.503214]   lock(fcoe_config_mutex);
[ 4200.503215]                                lock(s_active#292);
[ 4200.503218]                                lock(fcoe_config_mutex);
[ 4200.503219]   lock(s_active#292);
[ 4200.503221]
 *** DEADLOCK ***

[ 4200.503223] 3 locks held by kworker/3:2/2492:
[ 4200.503224]  #0:  (fcoe){.+.+.+}, at: [<ffffffff8107ee2b>] process_one_work+0x13b/0x580
[ 4200.503228]  #1:  ((&port->destroy_work)){+.+.+.}, at: [<ffffffff8107ee2b>] process_one_work+0x13b/0x580
[ 4200.503232]  #2:  (fcoe_config_mutex){+.+.+.}, at: [<ffffffffa02f3338>] fcoe_destroy_work+0xe8/0x120 [fcoe]
[ 4200.503236]
stack backtrace:
[ 4200.503238] Pid: 2492, comm: kworker/3:2 Not tainted 3.8.0-rc5+ #8
[ 4200.503240] Call Trace:
[ 4200.503243]  [<ffffffff816c2f09>] print_circular_bug+0x1fb/0x20c
[ 4200.503246]  [<ffffffff810c680f>] __lock_acquire+0x135f/0x1c90
[ 4200.503248]  [<ffffffff810c463a>] ? debug_check_no_locks_freed+0x9a/0x180
[ 4200.503250]  [<ffffffff810c7711>] lock_acquire+0xa1/0x140
[ 4200.503253]  [<ffffffff8122d20b>] ? sysfs_addrm_finish+0x3b/0x70
[ 4200.503255]  [<ffffffff8122c626>] sysfs_deactivate+0x116/0x160
[ 4200.503258]  [<ffffffff8122d20b>] ? sysfs_addrm_finish+0x3b/0x70
[ 4200.503260]  [<ffffffff8122d20b>] sysfs_addrm_finish+0x3b/0x70
[ 4200.503262]  [<ffffffff8122b2eb>] sysfs_hash_and_remove+0x5b/0xb0
[ 4200.503265]  [<ffffffff8122f3d1>] sysfs_remove_group+0x61/0x100
[ 4200.503273]  [<ffffffff814251eb>] device_remove_groups+0x3b/0x60
[ 4200.503275]  [<ffffffff81425534>] device_remove_attrs+0x44/0x80
[ 4200.503277]  [<ffffffff81425e97>] device_del+0x127/0x1c0
[ 4200.503279]  [<ffffffff81425f52>] device_unregister+0x22/0x60
[ 4200.503282]  [<ffffffffa0300970>] fcoe_ctlr_device_delete+0xe0/0xf0 [libfcoe]
[ 4200.503285]  [<ffffffffa02f1b5c>] fcoe_interface_cleanup+0x6c/0xa0 [fcoe]
[ 4200.503287]  [<ffffffffa02f3355>] fcoe_destroy_work+0x105/0x120 [fcoe]
[ 4200.503290]  [<ffffffff8107ee91>] process_one_work+0x1a1/0x580
[ 4200.503292]  [<ffffffff8107ee2b>] ? process_one_work+0x13b/0x580
[ 4200.503295]  [<ffffffffa02f3250>] ? fcoe_if_destroy+0x230/0x230 [fcoe]
[ 4200.503297]  [<ffffffff81080c6e>] worker_thread+0x15e/0x440
[ 4200.503299]  [<ffffffff81080b10>] ? busy_worker_rebind_fn+0x100/0x100
[ 4200.503301]  [<ffffffff8108715a>] kthread+0xea/0xf0
[ 4200.503304]  [<ffffffff81087070>] ? kthread_create_on_node+0x160/0x160
[ 4200.503306]  [<ffffffff816d70ac>] ret_from_fork+0x7c/0xb0
[ 4200.503308]  [<ffffffff81087070>] ? kthread_create_on_node+0x160/0x160

Signed-off-by: Robert Love <robert.w.love@intel.com>
Tested-by: Jack Morgan <jack.morgan@intel.com>
Quarx2k pushed a commit that referenced this issue May 12, 2014
Dropping the session's reference count after the client's means we leave
a window where the session's se_client pointer is NULL.  An xpt_user
callback that encounters such a session may then crash:

[  303.956011] BUG: unable to handle kernel NULL pointer dereference at 0000000000000318
[  303.959061] IP: [<ffffffff81481a8e>] _raw_spin_lock+0x1e/0x40
[  303.959061] PGD 37811067 PUD 3d498067 PMD 0
[  303.959061] Oops: 0002 [#8] PREEMPT SMP
[  303.959061] Modules linked in: md5 nfsd auth_rpcgss nfs_acl snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_page_alloc microcode psmouse snd_timer serio_raw pcspkr evdev snd soundcore i2c_piix4 i2c_core intel_agp intel_gtt processor button nfs lockd sunrpc fscache ata_generic pata_acpi ata_piix uhci_hcd libata btrfs usbcore usb_common crc32c scsi_mod libcrc32c zlib_deflate floppy virtio_balloon virtio_net virtio_pci virtio_blk virtio_ring virtio
[  303.959061] CPU 0
[  303.959061] Pid: 264, comm: nfsd Tainted: G      D      3.8.0-ARCH+ #156 Bochs Bochs
[  303.959061] RIP: 0010:[<ffffffff81481a8e>]  [<ffffffff81481a8e>] _raw_spin_lock+0x1e/0x40
[  303.959061] RSP: 0018:ffff880037877dd8  EFLAGS: 00010202
[  303.959061] RAX: 0000000000000100 RBX: ffff880037a2b698 RCX: ffff88003d879278
[  303.959061] RDX: ffff88003d879278 RSI: dead000000100100 RDI: 0000000000000318
[  303.959061] RBP: ffff880037877dd8 R08: ffff88003c5a0f00 R09: 0000000000000002
[  303.959061] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
[  303.959061] R13: 0000000000000318 R14: ffff880037a2b680 R15: ffff88003c1cbe00
[  303.959061] FS:  0000000000000000(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000
[  303.959061] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  303.959061] CR2: 0000000000000318 CR3: 000000003d49c000 CR4: 00000000000006f0
[  303.959061] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  303.959061] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  303.959061] Process nfsd (pid: 264, threadinfo ffff880037876000, task ffff88003c1fd0a0)
[  303.959061] Stack:
[  303.959061]  ffff880037877e08 ffffffffa03772ec ffff88003d879000 ffff88003d879278
[  303.959061]  ffff88003d879080 0000000000000000 ffff880037877e38 ffffffffa0222a1f
[  303.959061]  0000000000107ac0 ffff88003c22e000 ffff88003d879000 ffff88003c1cbe00
[  303.959061] Call Trace:
[  303.959061]  [<ffffffffa03772ec>] nfsd4_conn_lost+0x3c/0xa0 [nfsd]
[  303.959061]  [<ffffffffa0222a1f>] svc_delete_xprt+0x10f/0x180 [sunrpc]
[  303.959061]  [<ffffffffa0223d96>] svc_recv+0xe6/0x580 [sunrpc]
[  303.959061]  [<ffffffffa03587c5>] nfsd+0xb5/0x140 [nfsd]
[  303.959061]  [<ffffffffa0358710>] ? nfsd_destroy+0x90/0x90 [nfsd]
[  303.959061]  [<ffffffff8107ae00>] kthread+0xc0/0xd0
[  303.959061]  [<ffffffff81010000>] ? perf_trace_xen_mmu_set_pte_at+0x50/0x100
[  303.959061]  [<ffffffff8107ad40>] ? kthread_freezable_should_stop+0x70/0x70
[  303.959061]  [<ffffffff814898ec>] ret_from_fork+0x7c/0xb0
[  303.959061]  [<ffffffff8107ad40>] ? kthread_freezable_should_stop+0x70/0x70
[  303.959061] Code: ff ff 5d c3 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 55 65 48 8b 04 25 f0 c6 00 00 48 89 e5 83 80 44 e0 ff ff 01 b8 00 01 00 00 <3e> 66 0f c1 07 0f b6 d4 38 c2 74 0f 66 0f 1f 44 00 00 f3 90 0f
[  303.959061] RIP  [<ffffffff81481a8e>] _raw_spin_lock+0x1e/0x40
[  303.959061]  RSP <ffff880037877dd8>
[  303.959061] CR2: 0000000000000318
[  304.001218] ---[ end trace 2d809cd4a7931f5a ]---
[  304.001903] note: nfsd[264] exited with preempt_count 2

Reported-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Quarx2k pushed a commit that referenced this issue May 12, 2014
i2c: suppress lockdep warning on delete_device

Since commit 846f997 the following lockdep
warning is thrown in case i2c device is removed (via delete_device sysfs
attribute) which contains subdevices (e.g. i2c multiplexer):

=============================================
[ INFO: possible recursive locking detected ]
3.8.7-0-sampleversion-fct #8 Tainted: G           O
---------------------------------------------
bash/3743 is trying to acquire lock:
  (s_active#110){++++.+}, at: [<ffffffff802b3048>] sysfs_hash_and_remove+0x58/0xc8

but task is already holding lock:
  (s_active#110){++++.+}, at: [<ffffffff802b3cb8>] sysfs_write_file+0xc8/0x208

other info that might help us debug this:
  Possible unsafe locking scenario:

        CPU0
        ----
   lock(s_active#110);
   lock(s_active#110);

  *** DEADLOCK ***

  May be due to missing lock nesting notation

4 locks held by bash/3743:
  #0:  (&buffer->mutex){+.+.+.}, at: [<ffffffff802b3c3c>] sysfs_write_file+0x4c/0x208
  #1:  (s_active#110){++++.+}, at: [<ffffffff802b3cb8>] sysfs_write_file+0xc8/0x208
  #2:  (&adap->userspace_clients_lock/1){+.+.+.}, at: [<ffffffff80454a18>] i2c_sysfs_delete_device+0x90/0x238
  #3:  (&__lockdep_no_validate__){......}, at: [<ffffffff803dcc24>] device_release_driver+0x24/0x48

stack backtrace:
Call Trace:
[<ffffffff80575cc8>] dump_stack+0x8/0x34
[<ffffffff801b50fc>] __lock_acquire+0x161c/0x2110
[<ffffffff801b5c3c>] lock_acquire+0x4c/0x70
[<ffffffff802b60cc>] sysfs_addrm_finish+0x19c/0x1e0
[<ffffffff802b3048>] sysfs_hash_and_remove+0x58/0xc8
[<ffffffff802b7d8c>] sysfs_remove_group+0x64/0x148
[<ffffffff803d990c>] device_remove_attrs+0x9c/0x1a8
[<ffffffff803d9b1c>] device_del+0x104/0x1d8
[<ffffffff803d9c18>] device_unregister+0x28/0x70
[<ffffffff8045505c>] i2c_del_adapter+0x1cc/0x328
[<ffffffff8045802c>] i2c_del_mux_adapter+0x14/0x38
[<ffffffffc025c108>] pca954x_remove+0x90/0xe0 [pca954x]
[<ffffffff804542f8>] i2c_device_remove+0x80/0xe8
[<ffffffff803dca9c>] __device_release_driver+0x74/0xf8
[<ffffffff803dcc2c>] device_release_driver+0x2c/0x48
[<ffffffff803dbc14>] bus_remove_device+0x13c/0x1d8
[<ffffffff803d9b24>] device_del+0x10c/0x1d8
[<ffffffff803d9c18>] device_unregister+0x28/0x70
[<ffffffff80454b08>] i2c_sysfs_delete_device+0x180/0x238
[<ffffffff802b3cd4>] sysfs_write_file+0xe4/0x208
[<ffffffff8023ddc4>] vfs_write+0xbc/0x160
[<ffffffff8023df6c>] SyS_write+0x54/0xd8
[<ffffffff8013d424>] handle_sys64+0x44/0x64

The problem is already known for USB and PCI subsystems. The reason is that
delete_device attribute is defined statically in i2c-core.c and used for all
devices in i2c subsystem.

Discussion of original USB problem:
http://lkml.indiana.edu/hypermail/linux/kernel/1204.3/01160.html

Commit 356c05d introduced new macro to suppress
lockdep warnings for this special case and included workaround for USB code.

LKML discussion of the workaround:
http://lkml.indiana.edu/hypermail/linux/kernel/1205.1/03634.html

As i2c case is in principle the same, the same workaround could be used here.

Signed-off-by: Alexander Sverdlin <alexander.sverdlin@nsn.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Quarx2k pushed a commit that referenced this issue May 12, 2014
Daniel Petre reported crashes in icmp_dst_unreach() with following call
graph:

#3 [ffff88003fc03938] __stack_chk_fail at ffffffff81037f77
#4 [ffff88003fc03948] icmp_send at ffffffff814d5fec
#5 [ffff88003fc03ae8] ipv4_link_failure at ffffffff814a1795
#6 [ffff88003fc03af8] ipgre_tunnel_xmit at ffffffff814e7965
#7 [ffff88003fc03b78] dev_hard_start_xmit at ffffffff8146e032
#8 [ffff88003fc03bc8] sch_direct_xmit at ffffffff81487d66
#9 [ffff88003fc03c08] __qdisc_run at ffffffff81487efd
#10 [ffff88003fc03c48] dev_queue_xmit at ffffffff8146e5a7
#11 [ffff88003fc03c88] ip_finish_output at ffffffff814ab596

Daniel found a similar problem mentioned in
 http://lkml.indiana.edu/hypermail/linux/kernel/1007.0/00961.html

And indeed this is the root cause : skb->cb[] contains data fooling IP
stack.

We must clear IPCB in ip_tunnel_xmit() sooner in case dst_link_failure()
is called. Or else skb->cb[] might contain garbage from GSO segmentation
layer.

A similar fix was tested on linux-3.9, but gre code was refactored in
linux-3.10. I'll send patches for stable kernels as well.

Many thanks to Daniel for providing reports, patches and testing !

Reported-by: Daniel Petre <daniel.petre@rcs-rds.ro>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Quarx2k pushed a commit that referenced this issue May 17, 2014
commit ea3768b upstream.

We used to keep the port's char device structs and the /sys entries
around till the last reference to the port was dropped.  This is
actually unnecessary, and resulted in buggy behaviour:

1. Open port in guest
2. Hot-unplug port
3. Hot-plug a port with the same 'name' property as the unplugged one

This resulted in hot-plug being unsuccessful, as a port with the same
name already exists (even though it was unplugged).

This behaviour resulted in a warning message like this one:

-------------------8<---------------------------------------
WARNING: at fs/sysfs/dir.c:512 sysfs_add_one+0xc9/0x130() (Not tainted)
Hardware name: KVM
sysfs: cannot create duplicate filename
'/devices/pci0000:00/0000:00:04.0/virtio0/virtio-ports/vport0p1'

Call Trace:
 [<ffffffff8106b607>] ? warn_slowpath_common+0x87/0xc0
 [<ffffffff8106b6f6>] ? warn_slowpath_fmt+0x46/0x50
 [<ffffffff811f2319>] ? sysfs_add_one+0xc9/0x130
 [<ffffffff811f23e8>] ? create_dir+0x68/0xb0
 [<ffffffff811f2469>] ? sysfs_create_dir+0x39/0x50
 [<ffffffff81273129>] ? kobject_add_internal+0xb9/0x260
 [<ffffffff812733d8>] ? kobject_add_varg+0x38/0x60
 [<ffffffff812734b4>] ? kobject_add+0x44/0x70
 [<ffffffff81349de4>] ? get_device_parent+0xf4/0x1d0
 [<ffffffff8134b389>] ? device_add+0xc9/0x650

-------------------8<---------------------------------------

Instead of relying on guest applications to release all references to
the ports, we should go ahead and unregister the port from all the core
layers.  Any open/read calls on the port will then just return errors,
and an unplug/plug operation on the host will succeed as expected.

This also caused buggy behaviour in case of the device removal (not just
a port): when the device was removed (which means all ports on that
device are removed automatically as well), the ports with active
users would clean up only when the last references were dropped -- and
it would be too late then to be referencing char device pointers,
resulting in oopses:

-------------------8<---------------------------------------
PID: 6162   TASK: ffff8801147ad500  CPU: 0   COMMAND: "cat"
 #0 [ffff88011b9d5a90] machine_kexec at ffffffff8103232b
 #1 [ffff88011b9d5af0] crash_kexec at ffffffff810b9322
 #2 [ffff88011b9d5bc0] oops_end at ffffffff814f4a50
 #3 [ffff88011b9d5bf0] die at ffffffff8100f26b
 #4 [ffff88011b9d5c20] do_general_protection at ffffffff814f45e2
 #5 [ffff88011b9d5c50] general_protection at ffffffff814f3db5
    [exception RIP: strlen+2]
    RIP: ffffffff81272ae2  RSP: ffff88011b9d5d00  RFLAGS: 00010246
    RAX: 0000000000000000  RBX: ffff880118901c18  RCX: 0000000000000000
    RDX: ffff88011799982c  RSI: 00000000000000d0  RDI: 3a303030302f3030
    RBP: ffff88011b9d5d38   R8: 0000000000000006   R9: ffffffffa0134500
    R10: 0000000000001000  R11: 0000000000001000  R12: ffff880117a1cc10
    R13: 00000000000000d0  R14: 0000000000000017  R15: ffffffff81aff700
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #6 [ffff88011b9d5d00] kobject_get_path at ffffffff8126dc5d
 #7 [ffff88011b9d5d40] kobject_uevent_env at ffffffff8126e551
 #8 [ffff88011b9d5dd0] kobject_uevent at ffffffff8126e9eb
 #9 [ffff88011b9d5de0] device_del at ffffffff813440c7

-------------------8<---------------------------------------

So clean up when we have all the context, and all that's left to do when
the references to the port have dropped is to free up the port struct
itself.

Reported-by: chayang <chayang@redhat.com>
Reported-by: YOGANANTH SUBRAMANIAN <anantyog@in.ibm.com>
Reported-by: FuXiangChun <xfu@redhat.com>
Reported-by: Qunfang Zhang <qzhang@redhat.com>
Reported-by: Sibiao Luo <sluo@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Quarx2k pushed a commit that referenced this issue May 29, 2014
…rnel/git/dhowells/linux-fs

Pull FS-Cache updates from David Howells:
 "This contains a number of fixes for various FS-Cache issues plus some
  cleanups.  The commits are, in order:

   1) Provide a system wait_on_atomic_t() and wake_up_atomic_t() sharing
      the bit-wait table (enhancement for #8).

   2) Don't put spin_lock() in a while-condition as spin_lock() may have
      a do {} while(0) wrapper (cleanup).

   3) Symbolically name i_mutex lock classes rather than using numbers
      in CacheFiles (cleanup).

   4) Don't sleep in page release if __GFP_FS is not set (deadlock vs
      ext4).

   5) Uninline fscache_object_init() (cleanup for #7).

   6) Wrap checks on object state (cleanup for #7).

   7) Simplify the object state machine by separating work states from
      wait states.

   8) Simplify cookie retention by objects (NULL pointer deref fix).

   9) Remove unused list_to_page() macro (cleanup).

  10) Make the remaining-pages counter in the retrieval op atomic
      (assertion failure fix).

  11) Don't use spin_is_locked() in assertions (assertion failure fix)"

* tag 'fscache-20130702' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
  FS-Cache: Don't use spin_is_locked() in assertions
  FS-Cache: The retrieval remaining-pages counter needs to be atomic_t
  cachefiles: remove unused macro list_to_page()
  FS-Cache: Simplify cookie retention for fscache_objects, fixing oops
  FS-Cache: Fix object state machine to have separate work and wait states
  FS-Cache: Wrap checks on object state
  FS-Cache: Uninline fscache_object_init()
  FS-Cache: Don't sleep in page release if __GFP_FS is not set
  CacheFiles: name i_mutex lock class explicitly
  fs/fscache: remove spin_lock() from the condition in while()
  Add wait_on_atomic_t() and wake_up_atomic_t()
Quarx2k pushed a commit that referenced this issue May 29, 2014
Several people reported the warning: "kernel BUG at kernel/timer.c:729!"
and the stack trace is:

	#7 [ffff880214d25c10] mod_timer+501 at ffffffff8106d905
	#8 [ffff880214d25c50] br_multicast_del_pg.isra.20+261 at ffffffffa0731d25 [bridge]
	#9 [ffff880214d25c80] br_multicast_disable_port+88 at ffffffffa0732948 [bridge]
	#10 [ffff880214d25cb0] br_stp_disable_port+154 at ffffffffa072bcca [bridge]
	#11 [ffff880214d25ce8] br_device_event+520 at ffffffffa072a4e8 [bridge]
	#12 [ffff880214d25d18] notifier_call_chain+76 at ffffffff8164aafc
	#13 [ffff880214d25d50] raw_notifier_call_chain+22 at ffffffff810858f6
	#14 [ffff880214d25d60] call_netdevice_notifiers+45 at ffffffff81536aad
	#15 [ffff880214d25d80] dev_close_many+183 at ffffffff81536d17
	#16 [ffff880214d25dc0] rollback_registered_many+168 at ffffffff81537f68
	#17 [ffff880214d25de8] rollback_registered+49 at ffffffff81538101
	#18 [ffff880214d25e10] unregister_netdevice_queue+72 at ffffffff815390d8
	#19 [ffff880214d25e30] __tun_detach+272 at ffffffffa074c2f0 [tun]
	#20 [ffff880214d25e88] tun_chr_close+45 at ffffffffa074c4bd [tun]
	#21 [ffff880214d25ea8] __fput+225 at ffffffff8119b1f1
	#22 [ffff880214d25ef0] ____fput+14 at ffffffff8119b3fe
	#23 [ffff880214d25f00] task_work_run+159 at ffffffff8107cf7f
	#24 [ffff880214d25f30] do_notify_resume+97 at ffffffff810139e1
	#25 [ffff880214d25f50] int_signal+18 at ffffffff8164f292

this is due to I forgot to check if mp->timer is armed in
br_multicast_del_pg(). This bug is introduced by
commit 9f00b2e (bridge: only expire the mdb entry
when query is received).

Same for __br_mdb_del().

Tested-by: poma <pomidorabelisima@gmail.com>
Reported-by: LiYonghua <809674045@qq.com>
Reported-by: Robert Hancock <hancockrwd@gmail.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Quarx2k pushed a commit that referenced this issue May 29, 2014
…ifier

In fd4363f ("x86: Introduce int3 (breakpoint)-based
instruction patching"), the mechanism that was introduced for
notifying alternatives code from int3 exception handler that and
exception occured was die_notifier.

This is however problematic, as early code might be using jump
labels even before the notifier registration has been performed,
which will then lead to an oops due to unhandled exception. One
of such occurences has been encountered by Fengguang:

 int3: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
 Modules linked in:
 CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.11.0-rc1-01429-g04bf576 #8
 task: ffff88000da1b040 ti: ffff88000da1c000 task.ti: ffff88000da1c000
 RIP: 0010:[<ffffffff811098cc>]  [<ffffffff811098cc>] ttwu_do_wakeup+0x28/0x225
 RSP: 0000:ffff88000dd03f10  EFLAGS: 00000006
 RAX: 0000000000000000 RBX: ffff88000dd12940 RCX: ffffffff81769c40
 RDX: 0000000000000002 RSI: 0000000000000000 RDI: 0000000000000001
 RBP: ffff88000dd03f28 R08: ffffffff8176a8c0 R09: 0000000000000002
 R10: ffffffff810ff484 R11: ffff88000dd129e8 R12: ffff88000dbc90c0
 R13: ffff88000dbc90c0 R14: ffff88000da1dfd8 R15: ffff88000da1dfd8
 FS:  0000000000000000(0000) GS:ffff88000dd00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
 CR2: 00000000ffffffff CR3: 0000000001c88000 CR4: 00000000000006e0
 Stack:
  ffff88000dd12940 ffff88000dbc90c0 ffff88000da1dfd8 ffff88000dd03f48
  ffffffff81109e2b ffff88000dd12940 0000000000000000 ffff88000dd03f68
  ffffffff81109e9e 0000000000000000 0000000000012940 ffff88000dd03f98
 Call Trace:
  <IRQ>
  [<ffffffff81109e2b>] ttwu_do_activate.constprop.56+0x6d/0x79
  [<ffffffff81109e9e>] sched_ttwu_pending+0x67/0x84
  [<ffffffff8110c845>] scheduler_ipi+0x15a/0x2b0
  [<ffffffff8104dfb4>] smp_reschedule_interrupt+0x38/0x41
  [<ffffffff8173bf5d>] reschedule_interrupt+0x6d/0x80
  <EOI>
  [<ffffffff810ff484>] ? __atomic_notifier_call_chain+0x5/0xc1
  [<ffffffff8105cc30>] ? native_safe_halt+0xd/0x16
  [<ffffffff81015f10>] default_idle+0x147/0x282
  [<ffffffff81017026>] arch_cpu_idle+0x3d/0x5d
  [<ffffffff81127d6a>] cpu_idle_loop+0x46d/0x5db
  [<ffffffff81127f5c>] cpu_startup_entry+0x84/0x84
  [<ffffffff8104f4f8>] start_secondary+0x3c8/0x3d5
  [...]

Fix this by directly calling poke_int3_handler() from the int3
exception handler (analogically to what ftrace has been doing
already), instead of relying on notifier, registration of which
might not have yet been finalized by the time of the first trap.

Reported-and-tested-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/alpine.LNX.2.00.1307231007490.14024@pobox.suse.cz
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Quarx2k pushed a commit that referenced this issue May 29, 2014
We used to keep the port's char device structs and the /sys entries
around till the last reference to the port was dropped.  This is
actually unnecessary, and resulted in buggy behaviour:

1. Open port in guest
2. Hot-unplug port
3. Hot-plug a port with the same 'name' property as the unplugged one

This resulted in hot-plug being unsuccessful, as a port with the same
name already exists (even though it was unplugged).

This behaviour resulted in a warning message like this one:

-------------------8<---------------------------------------
WARNING: at fs/sysfs/dir.c:512 sysfs_add_one+0xc9/0x130() (Not tainted)
Hardware name: KVM
sysfs: cannot create duplicate filename
'/devices/pci0000:00/0000:00:04.0/virtio0/virtio-ports/vport0p1'

Call Trace:
 [<ffffffff8106b607>] ? warn_slowpath_common+0x87/0xc0
 [<ffffffff8106b6f6>] ? warn_slowpath_fmt+0x46/0x50
 [<ffffffff811f2319>] ? sysfs_add_one+0xc9/0x130
 [<ffffffff811f23e8>] ? create_dir+0x68/0xb0
 [<ffffffff811f2469>] ? sysfs_create_dir+0x39/0x50
 [<ffffffff81273129>] ? kobject_add_internal+0xb9/0x260
 [<ffffffff812733d8>] ? kobject_add_varg+0x38/0x60
 [<ffffffff812734b4>] ? kobject_add+0x44/0x70
 [<ffffffff81349de4>] ? get_device_parent+0xf4/0x1d0
 [<ffffffff8134b389>] ? device_add+0xc9/0x650

-------------------8<---------------------------------------

Instead of relying on guest applications to release all references to
the ports, we should go ahead and unregister the port from all the core
layers.  Any open/read calls on the port will then just return errors,
and an unplug/plug operation on the host will succeed as expected.

This also caused buggy behaviour in case of the device removal (not just
a port): when the device was removed (which means all ports on that
device are removed automatically as well), the ports with active
users would clean up only when the last references were dropped -- and
it would be too late then to be referencing char device pointers,
resulting in oopses:

-------------------8<---------------------------------------
PID: 6162   TASK: ffff8801147ad500  CPU: 0   COMMAND: "cat"
 #0 [ffff88011b9d5a90] machine_kexec at ffffffff8103232b
 #1 [ffff88011b9d5af0] crash_kexec at ffffffff810b9322
 #2 [ffff88011b9d5bc0] oops_end at ffffffff814f4a50
 #3 [ffff88011b9d5bf0] die at ffffffff8100f26b
 #4 [ffff88011b9d5c20] do_general_protection at ffffffff814f45e2
 #5 [ffff88011b9d5c50] general_protection at ffffffff814f3db5
    [exception RIP: strlen+2]
    RIP: ffffffff81272ae2  RSP: ffff88011b9d5d00  RFLAGS: 00010246
    RAX: 0000000000000000  RBX: ffff880118901c18  RCX: 0000000000000000
    RDX: ffff88011799982c  RSI: 00000000000000d0  RDI: 3a303030302f3030
    RBP: ffff88011b9d5d38   R8: 0000000000000006   R9: ffffffffa0134500
    R10: 0000000000001000  R11: 0000000000001000  R12: ffff880117a1cc10
    R13: 00000000000000d0  R14: 0000000000000017  R15: ffffffff81aff700
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #6 [ffff88011b9d5d00] kobject_get_path at ffffffff8126dc5d
 #7 [ffff88011b9d5d40] kobject_uevent_env at ffffffff8126e551
 #8 [ffff88011b9d5dd0] kobject_uevent at ffffffff8126e9eb
 #9 [ffff88011b9d5de0] device_del at ffffffff813440c7

-------------------8<---------------------------------------

So clean up when we have all the context, and all that's left to do when
the references to the port have dropped is to free up the port struct
itself.

CC: <stable@vger.kernel.org>
Reported-by: chayang <chayang@redhat.com>
Reported-by: YOGANANTH SUBRAMANIAN <anantyog@in.ibm.com>
Reported-by: FuXiangChun <xfu@redhat.com>
Reported-by: Qunfang Zhang <qzhang@redhat.com>
Reported-by: Sibiao Luo <sluo@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Quarx2k pushed a commit that referenced this issue May 29, 2014
In several places, this snippet is used when removing neigh entries:

	list_del(&neigh->list);
	ipoib_neigh_free(neigh);

The list_del() removes neigh from the associated struct ipoib_path, while
ipoib_neigh_free() removes neigh from the device's neigh entry lookup
table.  Both of these operations are protected by the priv->lock
spinlock.  The table however is also protected via RCU, and so naturally
the lock is not held when doing reads.

This leads to a race condition, in which a thread may successfully look
up a neigh entry that has already been deleted from neigh->list.  Since
the previous deletion will have marked the entry with poison, a second
list_del() on the object will cause a panic:

  #5 [ffff8802338c3c70] general_protection at ffffffff815108c5
     [exception RIP: list_del+16]
     RIP: ffffffff81289020  RSP: ffff8802338c3d20  RFLAGS: 00010082
     RAX: dead000000200200  RBX: ffff880433e60c88  RCX: 0000000000009e6c
     RDX: 0000000000000246  RSI: ffff8806012ca298  RDI: ffff880433e60c88
     RBP: ffff8802338c3d30   R8: ffff8806012ca2e8   R9: 00000000ffffffff
     R10: 0000000000000001  R11: 0000000000000000  R12: ffff8804346b2020
     R13: ffff88032a3e7540  R14: ffff8804346b26e0  R15: 0000000000000246
     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0000
  #6 [ffff8802338c3d38] ipoib_cm_tx_handler at ffffffffa066fe0a [ib_ipoib]
  #7 [ffff8802338c3d98] cm_process_work at ffffffffa05149a7 [ib_cm]
  #8 [ffff8802338c3de8] cm_work_handler at ffffffffa05161aa [ib_cm]
  #9 [ffff8802338c3e38] worker_thread at ffffffff81090e10
 #10 [ffff8802338c3ee8] kthread at ffffffff81096c66
 #11 [ffff8802338c3f48] kernel_thread at ffffffff8100c0ca

We move the list_del() into ipoib_neigh_free(), so that deletion happens
only once, after the entry has been successfully removed from the lookup
table.  This same behavior is already used in ipoib_del_neighs_by_gid()
and __ipoib_reap_neigh().

Signed-off-by: Jim Foraker <foraker1@llnl.gov>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Jack Wang <jinpu.wang@profitbricks.com>
Reviewed-by: Shlomo Pongratz <shlomop@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Quarx2k pushed a commit that referenced this issue May 29, 2014
When 2 devices are registered at the same time, in Part 2 both will get
the same minor number, making Part6 crash, because it cannot create a
create a device with a duplicated minor:
[    7.157648] ------------[ cut here ]------------
[    7.157666] WARNING: at fs/sysfs/dir.c:530 sysfs_add_one+0xbd/0xe0()
[    7.157669] sysfs: cannot create duplicate filename '/dev/char/81:1'
[    7.157672] Modules linked in: qtec_xform(+) qt5023_video(+) videobuf2_vmalloc videobuf2_dma_sg videobuf2_memops videobuf2_core gpio_xilinx(+) qtec_white qtec_cmosis(+) qtec_pcie qt5023
[    7.157694] CPU: 0 PID: 120 Comm: systemd-udevd Not tainted 3.10.0-qtec-standard #8
[    7.157698] Hardware name: QTechnology QT5022/QT5022, BIOS PM_2.1.0.309 X64 05/23/2013
[    7.157702]  0000000000000009 ffff8801788358e8 ffffffff8176c487 ffff880178835928
[    7.157707]  ffffffff8106f6f0 ffff880175759770 00000000ffffffef ffff880175759930
[    7.157712]  ffff8801788359e8 ffff880178bff000 ffff880175759930 ffff880178835988
[    7.157718] Call Trace:
[    7.157728]  [<ffffffff8176c487>] dump_stack+0x19/0x1b
[    7.157735]  [<ffffffff8106f6f0>] warn_slowpath_common+0x70/0xa0
[    7.157740]  [<ffffffff8106f7d6>] warn_slowpath_fmt+0x46/0x50
[    7.157746]  [<ffffffff81324b35>] ? strlcat+0x65/0x90
[    7.157750]  [<ffffffff811d516d>] sysfs_add_one+0xbd/0xe0
[    7.157755]  [<ffffffff811d5c6b>] sysfs_do_create_link_sd+0xdb/0x200
[    7.157760]  [<ffffffff811d5db1>] sysfs_create_link+0x21/0x40
[    7.157765]  [<ffffffff813e277b>] device_add+0x21b/0x6d0
[    7.157772]  [<ffffffff813f2985>] ? pm_runtime_init+0xe5/0xf0
[    7.157776]  [<ffffffff813e2c4e>] device_register+0x1e/0x30
[    7.157782]  [<ffffffff8153e8c3>] __video_register_device+0x313/0x610
[    7.157791]  [<ffffffffa00957c5>] qtec_xform_probe+0x465/0x7a4 [qtec_xform]
[    7.157797]  [<ffffffff813e7b13>] platform_drv_probe+0x43/0x80
[    7.157802]  [<ffffffff813e531a>] ? driver_sysfs_add+0x7a/0xb0
[    7.157807]  [<ffffffff813e584b>] driver_probe_device+0x8b/0x3a0
[    7.157812]  [<ffffffff813e5c0b>] __driver_attach+0xab/0xb0
[    7.157816]  [<ffffffff813e5b60>] ? driver_probe_device+0x3a0/0x3a0
[    7.157820]  [<ffffffff813e37fd>] bus_for_each_dev+0x5d/0xa0
[    7.157825]  [<ffffffff813e529e>] driver_attach+0x1e/0x20
[    7.157829]  [<ffffffff813e4d3e>] bus_add_driver+0x10e/0x280
[    7.157833]  [<ffffffffa0099000>] ? 0xffffffffa0098fff
[    7.157837]  [<ffffffffa0099000>] ? 0xffffffffa0098fff
[    7.157842]  [<ffffffff813e6317>] driver_register+0x77/0x170
[    7.157848]  [<ffffffff81151bcc>] ? __vunmap+0x9c/0x110
[    7.157852]  [<ffffffffa0099000>] ? 0xffffffffa0098fff
[    7.157857]  [<ffffffff813e7236>] platform_driver_register+0x46/0x50
[    7.157863]  [<ffffffffa0099010>] qtec_xform_plat_driver_init+0x10/0x12 [qtec_xform]
[    7.157869]  [<ffffffff810002ea>] do_one_initcall+0xea/0x1a0
[    7.157875]  [<ffffffff810cf1f1>] load_module+0x1a91/0x2630
[    7.157880]  [<ffffffff8133bee0>] ? ddebug_proc_show+0xe0/0xe0
[    7.157887]  [<ffffffff817760f2>] ? page_fault+0x22/0x30
[    7.157892]  [<ffffffff810cfe7a>] SyS_init_module+0xea/0x140
[    7.157898]  [<ffffffff8177e5b9>] tracesys+0xd0/0xd5
[    7.157902] ---[ end trace 660cc3a65a4bf01b ]---
[    7.157939] __video_register_device: device_register failed

Signed-off-by: Ricardo Ribalda Delgado <ricardo.ribalda@gmail.com>
Acked-by: Sakari Ailus <sakari.ailus@iki.fi>
Acked-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
Quarx2k pushed a commit that referenced this issue May 29, 2014
When booting secondary CPUs, announce_cpu() is called to show which cpu has
been brought up. For example:

[    0.402751] smpboot: Booting Node   0, Processors  #1 #2 #3 #4 #5 OK
[    0.525667] smpboot: Booting Node   1, Processors  #6 #7 #8 #9 #10 #11 OK
[    0.755592] smpboot: Booting Node   0, Processors  #12 #13 #14 #15 #16 #17 OK
[    0.890495] smpboot: Booting Node   1, Processors  #18 #19 #20 #21 #22 #23

But the last "OK" is lost, because 'nr_cpu_ids-1' represents the maximum
possible cpu id. It should use the maximum present cpu id in case not all
CPUs booted up.

Signed-off-by: Libin <huawei.libin@huawei.com>
Cc: <guohanjun@huawei.com>
Cc: <wangyijing@huawei.com>
Cc: <fenghua.yu@intel.com>
Cc: <paul.gortmaker@windriver.com>
Link: http://lkml.kernel.org/r/1378378676-18276-1-git-send-email-huawei.libin@huawei.com
[ tweaked the changelog, removed unnecessary line break, tweaked the format to align the fields vertically. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Quarx2k pushed a commit that referenced this issue May 29, 2014
Enable only those interrupts that we can handle & acknowledge in the
interrupt handler.

At least on EdgeRouter Lite, the hardware may occasionally interrupt with
some error condition when the physical link status changes frequently.
Since the interrupt condition is not acked properly, this leads to the
following warning and the IRQ gets disabled completely:

[   41.324700] eth0: Link down
[   44.324721] eth0: 1000 Mbps Full duplex, port  0, queue  0
[   44.885590] irq 117: nobody cared (try booting with the "irqpoll" option)
[   44.892397] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.11.0-rc5-edge-los.git-27d042f-dirty-00950-gaa42f2d-dirty #8
[   44.902825] Stack : ffffffff815c0000 0000000000000004 0000000000000003 0000000000000000
	  ffffffff81fd0000 ffffffff815c0000 0000000000000004 ffffffff8118530c
	  ffffffff815c0000 ffffffff811858d8 0000000000000000 0000000000000000
	  ffffffff81fd0000 ffffffff81fc0000 ffffffff8152f3a0 ffffffff815b7bf7
	  ffffffff81fc6688 ffffffff815b8060 0000000000000000 0000000000000000
	  0000000000000000 ffffffff815346c8 ffffffff815346b0 ffffffff814a6a18
	  ffffffff8158b848 ffffffff81145614 ffffffff81593800 ffffffff81187174
	  ffffffff815b7d00 ffffffff8158b760 0000000000000000 ffffffff814a9184
	  0000000000000000 0000000000000000 0000000000000000 0000000000000000
	  0000000000000000 ffffffff811203b8 0000000000000000 0000000000000000
	  ...
[   44.968408] Call Trace:
[   44.970873] [<ffffffff811203b8>] show_stack+0x68/0x80
[   44.975937] [<ffffffff814a9184>] dump_stack+0x78/0xb8
[   44.980999] [<ffffffff811aac54>] __report_bad_irq+0x44/0x108
[   44.986662] [<ffffffff811ab238>] note_interrupt+0x248/0x2a0
[   44.992240] [<ffffffff811a85e4>] handle_irq_event_percpu+0x144/0x200
[   44.998598] [<ffffffff811a86f4>] handle_irq_event+0x54/0x90
[   45.004176] [<ffffffff811ab908>] handle_level_irq+0xd0/0x148
[   45.009839] [<ffffffff811a7b04>] generic_handle_irq+0x34/0x50
[   45.015589] [<ffffffff8111dae8>] do_IRQ+0x18/0x30
[   45.020301] [<ffffffff8110486c>] plat_irq_dispatch+0x74/0xb8
[   45.025958]
[   45.027451] handlers:
[   45.029731] [<ffffffff813fca10>] cvm_oct_rgmii_rml_interrupt
[   45.035397] Disabling IRQ #117
[   45.038742] Port 0 receive error code 13, packet dropped
[   46.324719] eth0: Link down
[   48.324733] eth0: 1000 Mbps Full duplex, port  0, queue  0

Reported-by: "Jason A. Donenfeld" <Jason@zx2c4.com>
Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Acked-by: David Daney <david.daney@cavium.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: devel@driverdev.osuosl.org
Cc: linux-mips@linux-mips.org
Cc: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: richard@nod.at
Patchwork: https://patchwork.linux-mips.org/patch/5810/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Quarx2k pushed a commit that referenced this issue May 29, 2014
Enable only those interrupts that we can handle & acknowledge in the
interrupt handler.

At least on EdgeRouter Lite, the hardware may occasionally interrupt with
some error condition when the physical link status changes frequently.
Since the interrupt condition is not acked properly, this leads to the
following warning and the IRQ gets disabled completely:

[   41.324700] eth0: Link down
[   44.324721] eth0: 1000 Mbps Full duplex, port  0, queue  0
[   44.885590] irq 117: nobody cared (try booting with the "irqpoll" option)
[   44.892397] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.11.0-rc5-edge-los.git-27d042f-dirty-00950-gaa42f2d-dirty #8
[   44.902825] Stack : ffffffff815c0000 0000000000000004 0000000000000003 0000000000000000
	  ffffffff81fd0000 ffffffff815c0000 0000000000000004 ffffffff8118530c
	  ffffffff815c0000 ffffffff811858d8 0000000000000000 0000000000000000
	  ffffffff81fd0000 ffffffff81fc0000 ffffffff8152f3a0 ffffffff815b7bf7
	  ffffffff81fc6688 ffffffff815b8060 0000000000000000 0000000000000000
	  0000000000000000 ffffffff815346c8 ffffffff815346b0 ffffffff814a6a18
	  ffffffff8158b848 ffffffff81145614 ffffffff81593800 ffffffff81187174
	  ffffffff815b7d00 ffffffff8158b760 0000000000000000 ffffffff814a9184
	  0000000000000000 0000000000000000 0000000000000000 0000000000000000
	  0000000000000000 ffffffff811203b8 0000000000000000 0000000000000000
	  ...
[   44.968408] Call Trace:
[   44.970873] [<ffffffff811203b8>] show_stack+0x68/0x80
[   44.975937] [<ffffffff814a9184>] dump_stack+0x78/0xb8
[   44.980999] [<ffffffff811aac54>] __report_bad_irq+0x44/0x108
[   44.986662] [<ffffffff811ab238>] note_interrupt+0x248/0x2a0
[   44.992240] [<ffffffff811a85e4>] handle_irq_event_percpu+0x144/0x200
[   44.998598] [<ffffffff811a86f4>] handle_irq_event+0x54/0x90
[   45.004176] [<ffffffff811ab908>] handle_level_irq+0xd0/0x148
[   45.009839] [<ffffffff811a7b04>] generic_handle_irq+0x34/0x50
[   45.015589] [<ffffffff8111dae8>] do_IRQ+0x18/0x30
[   45.020301] [<ffffffff8110486c>] plat_irq_dispatch+0x74/0xb8
[   45.025958]
[   45.027451] handlers:
[   45.029731] [<ffffffff813fca10>] cvm_oct_rgmii_rml_interrupt
[   45.035397] Disabling IRQ #117
[   45.038742] Port 0 receive error code 13, packet dropped
[   46.324719] eth0: Link down
[   48.324733] eth0: 1000 Mbps Full duplex, port  0, queue  0

Reported-by: "Jason A. Donenfeld" <Jason@zx2c4.com>
Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Acked-by: David Daney <david.daney@cavium.com>
Acked-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Quarx2k pushed a commit that referenced this issue May 29, 2014
When parsing lines from objdump a line containing source code starting
with a numeric label is mistaken for a line of disassembly starting with
a memory address.

Current validation fails to recognise that the "memory address" is out
of range and calculates an invalid offset which later causes this
segfault:

Program received signal SIGSEGV, Segmentation fault.
0x0000000000457315 in disasm__calc_percent (notes=0xc98970, evidx=0, offset=143705, end=2127526177, path=0x7fffffffbf50)
    at util/annotate.c:631
631				hits += h->addr[offset++];
(gdb) bt
 #0  0x0000000000457315 in disasm__calc_percent (notes=0xc98970, evidx=0, offset=143705, end=2127526177, path=0x7fffffffbf50)
    at util/annotate.c:631
 #1  0x00000000004d65e3 in annotate_browser__calc_percent (browser=0x7fffffffd130, evsel=0xa01da0) at ui/browsers/annotate.c:364
 #2  0x00000000004d7433 in annotate_browser__run (browser=0x7fffffffd130, evsel=0xa01da0, hbt=0x0) at ui/browsers/annotate.c:672
 #3  0x00000000004d80c9 in symbol__tui_annotate (sym=0xc989a0, map=0xa02660, evsel=0xa01da0, hbt=0x0) at ui/browsers/annotate.c:962
 #4  0x00000000004d7aa0 in hist_entry__tui_annotate (he=0xdf73f0, evsel=0xa01da0, hbt=0x0) at ui/browsers/annotate.c:823
 #5  0x00000000004dd648 in perf_evsel__hists_browse (evsel=0xa01da0, nr_events=1, helpline=
    0x58b768 "For a higher level overview, try: perf report --sort comm,dso", ev_name=0xa02cd0 "cycles", left_exits=false, hbt=
    0x0, min_pcnt=0, env=0xa011e0) at ui/browsers/hists.c:1659
 #6  0x00000000004de372 in perf_evlist__tui_browse_hists (evlist=0xa01520, help=
    0x58b768 "For a higher level overview, try: perf report --sort comm,dso", hbt=0x0, min_pcnt=0, env=0xa011e0)
    at ui/browsers/hists.c:1950
 #7  0x000000000042cf6b in __cmd_report (rep=0x7fffffffd6c0) at builtin-report.c:581
 #8  0x000000000042e25d in cmd_report (argc=0, argv=0x7fffffffe4b0, prefix=0x0) at builtin-report.c:965
 #9  0x000000000041a0e1 in run_builtin (p=0x801548, argc=1, argv=0x7fffffffe4b0) at perf.c:319
 #10 0x000000000041a319 in handle_internal_command (argc=1, argv=0x7fffffffe4b0) at perf.c:376
 #11 0x000000000041a465 in run_argv (argcp=0x7fffffffe38c, argv=0x7fffffffe380) at perf.c:420
 #12 0x000000000041a707 in main (argc=1, argv=0x7fffffffe4b0) at perf.c:521

After the fix is applied the symbol can be annotated showing the
problematic line "1:      rep"

copy_user_generic_string  /usr/lib/debug/lib/modules/3.9.10-100.fc17.x86_64/vmlinux
             */
            ENTRY(copy_user_generic_string)
                    CFI_STARTPROC
                    ASM_STAC
                    andl %edx,%edx
              and    %edx,%edx
                    jz 4f
              je     37
                    cmpl $8,%edx
              cmp    $0x8,%edx
                    jb 2f           /* less than 8 bytes, go to byte copy loop */
              jb     33
                    ALIGN_DESTINATION
              mov    %edi,%ecx
              and    $0x7,%ecx
              je     28
              sub    $0x8,%ecx
              neg    %ecx
              sub    %ecx,%edx
        1a:   mov    (%rsi),%al
              mov    %al,(%rdi)
              inc    %rsi
              inc    %rdi
              dec    %ecx
              jne    1a
                    movl %edx,%ecx
        28:   mov    %edx,%ecx
                    shrl $3,%ecx
              shr    $0x3,%ecx
                    andl $7,%edx
              and    $0x7,%edx
            1:      rep
100.00        rep    movsq %ds:(%rsi),%es:(%rdi)
                    movsq
            2:      movl %edx,%ecx
        33:   mov    %edx,%ecx
            3:      rep
              rep    movsb %ds:(%rsi),%es:(%rdi)
                    movsb
            4:      xorl %eax,%eax
        37:   xor    %eax,%eax
              data32 xchg %ax,%ax
                    ASM_CLAC
                    ret
              retq

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1379009721-27667-1-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Quarx2k pushed a commit that referenced this issue May 29, 2014
…CIe cards

The band selection and PE control code for the
RT3593 chipsets only handles USB based devices
currently. Due to this limitation RT3593 based
PCIe cards are not working correctly.

On PCIe cards band selection is controlled via
GPIO #8 which is identical to the USB devices.
The LNA PE control is slightly different, all
LNA PEs are controlled by GPIO #4.

Update the code to configure the GPIO_CTRL register
correctly on PCIe devices.

Cc: Steven Liu <steven.liu@mediatek.com>
Cc: JasonYS Cheng <jasonys.cheng@mediatek.com>
Signed-off-by: Gabor Juhos <juhosg@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Quarx2k pushed a commit that referenced this issue May 29, 2014
As the new x86 CPU bootup printout format code maintainer, I am
taking immediate action to improve and clean (and thus indulge
my OCD) the reporting of the cores when coming up online.

Fix padding to a right-hand alignment, cleanup code and bind
reporting width to the max number of supported CPUs on the
system, like this:

 [    0.074509] smpboot: Booting Node   0, Processors:      #1  #2  #3  #4  #5  #6  #7 OK
 [    0.644008] smpboot: Booting Node   1, Processors:  #8  #9 #10 #11 #12 #13 #14 #15 OK
 [    1.245006] smpboot: Booting Node   2, Processors: #16 #17 #18 #19 #20 #21 #22 #23 OK
 [    1.864005] smpboot: Booting Node   3, Processors: #24 #25 #26 #27 #28 #29 #30 #31 OK
 [    2.489005] smpboot: Booting Node   4, Processors: #32 #33 #34 #35 #36 #37 #38 #39 OK
 [    3.093005] smpboot: Booting Node   5, Processors: #40 #41 #42 #43 #44 #45 #46 #47 OK
 [    3.698005] smpboot: Booting Node   6, Processors: #48 #49 #50 #51 #52 #53 #54 #55 OK
 [    4.304005] smpboot: Booting Node   7, Processors: #56 #57 #58 #59 #60 #61 #62 #63 OK
 [    4.961413] Brought up 64 CPUs

and this:

 [    0.072367] smpboot: Booting Node   0, Processors:    #1 #2 #3 #4 #5 #6 #7 OK
 [    0.686329] Brought up 8 CPUs

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Libin <huawei.libin@huawei.com>
Cc: wangyijing@huawei.com
Cc: fenghua.yu@intel.com
Cc: guohanjun@huawei.com
Cc: paul.gortmaker@windriver.com
Link: http://lkml.kernel.org/r/20130927143554.GF4422@pd.tnic
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Quarx2k pushed a commit that referenced this issue May 29, 2014
Turn it into (for example):

[    0.073380] x86: Booting SMP configuration:
[    0.074005] .... node   #0, CPUs:          #1   #2   #3   #4   #5   #6   #7
[    0.603005] .... node   #1, CPUs:     #8   #9  #10  #11  #12  #13  #14  #15
[    1.200005] .... node   #2, CPUs:    #16  #17  #18  #19  #20  #21  #22  #23
[    1.796005] .... node   #3, CPUs:    #24  #25  #26  #27  #28  #29  #30  #31
[    2.393005] .... node   #4, CPUs:    #32  #33  #34  #35  #36  #37  #38  #39
[    2.996005] .... node   #5, CPUs:    #40  #41  #42  #43  #44  #45  #46  #47
[    3.600005] .... node   #6, CPUs:    #48  #49  #50  #51  #52  #53  #54  #55
[    4.202005] .... node   #7, CPUs:    #56  #57  #58  #59  #60  #61  #62  #63
[    4.811005] .... node   #8, CPUs:    #64  #65  #66  #67  #68  #69  #70  #71
[    5.421006] .... node   #9, CPUs:    #72  #73  #74  #75  #76  #77  #78  #79
[    6.032005] .... node  #10, CPUs:    #80  #81  #82  #83  #84  #85  #86  #87
[    6.648006] .... node  #11, CPUs:    #88  #89  #90  #91  #92  #93  #94  #95
[    7.262005] .... node  #12, CPUs:    #96  #97  #98  #99 #100 #101 #102 #103
[    7.865005] .... node  #13, CPUs:   #104 #105 #106 #107 #108 #109 #110 #111
[    8.466005] .... node  #14, CPUs:   #112 #113 #114 #115 #116 #117 #118 #119
[    9.073006] .... node  #15, CPUs:   #120 #121 #122 #123 #124 #125 #126 #127
[    9.679901] x86: Booted up 16 nodes, 128 CPUs

and drop useless elements.

Change num_digits() to hpa's division-avoiding, cell-phone-typed
version which he went at great lengths and pains to submit on a
Saturday evening.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: huawei.libin@huawei.com
Cc: wangyijing@huawei.com
Cc: fenghua.yu@intel.com
Cc: guohanjun@huawei.com
Cc: paul.gortmaker@windriver.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20130930095624.GB16383@pd.tnic
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Quarx2k pushed a commit that referenced this issue May 29, 2014
The crash[1] is found by xfstests/generic/208 with "-o compress",
it's not reproduced everytime, but it does panic.

The bug is quite interesting, it's actually introduced by a recent commit
(573aeca,
Btrfs: actually limit the size of delalloc range).

Btrfs implements delay allocation, so during writeback, we
(1) get a page A and lock it
(2) search the state tree for delalloc bytes and lock all pages within the range
(3) process the delalloc range, including find disk space and create
    ordered extent and so on.
(4) submit the page A.

It runs well in normal cases, but if we're in a racy case, eg.
buffered compressed writes and aio-dio writes,
sometimes we may fail to lock all pages in the 'delalloc' range,
in which case, we need to fall back to search the state tree again with
a smaller range limit(max_bytes = PAGE_CACHE_SIZE - offset).

The mentioned commit has a side effect, that is, in the fallback case,
we can find delalloc bytes before the index of the page we already have locked,
so we're in the case of (delalloc_end <= *start) and return with (found > 0).

This ends with not locking delalloc pages but making ->writepage still
process them, and the crash happens.

This fixes it by just thinking that we find nothing and returning to caller
as the caller knows how to deal with it properly.

[1]:
------------[ cut here ]------------
kernel BUG at mm/page-writeback.c:2170!
[...]
CPU: 2 PID: 11755 Comm: btrfs-delalloc- Tainted: G           O 3.11.0+ #8
[...]
RIP: 0010:[<ffffffff810f5093>]  [<ffffffff810f5093>] clear_page_dirty_for_io+0x1e/0x83
[...]
[ 4934.248731] Stack:
[ 4934.248731]  ffff8801477e5dc8 ffffea00049b9f00 ffff8801869f9ce8 ffffffffa02b841a
[ 4934.248731]  0000000000000000 0000000000000000 0000000000000fff 0000000000000620
[ 4934.248731]  ffff88018db59c78 ffffea0005da8d40 ffffffffa02ff860 00000001810016c0
[ 4934.248731] Call Trace:
[ 4934.248731]  [<ffffffffa02b841a>] extent_range_clear_dirty_for_io+0xcf/0xf5 [btrfs]
[ 4934.248731]  [<ffffffffa02a8889>] compress_file_range+0x1dc/0x4cb [btrfs]
[ 4934.248731]  [<ffffffff8104f7af>] ? detach_if_pending+0x22/0x4b
[ 4934.248731]  [<ffffffffa02a8bad>] async_cow_start+0x35/0x53 [btrfs]
[ 4934.248731]  [<ffffffffa02c694b>] worker_loop+0x14b/0x48c [btrfs]
[ 4934.248731]  [<ffffffffa02c6800>] ? btrfs_queue_worker+0x25c/0x25c [btrfs]
[ 4934.248731]  [<ffffffff810608f5>] kthread+0x8d/0x95
[ 4934.248731]  [<ffffffff81060868>] ? kthread_freezable_should_stop+0x43/0x43
[ 4934.248731]  [<ffffffff814fe09c>] ret_from_fork+0x7c/0xb0
[ 4934.248731]  [<ffffffff81060868>] ? kthread_freezable_should_stop+0x43/0x43
[ 4934.248731] Code: ff 85 c0 0f 94 c0 0f b6 c0 59 5b 5d c3 0f 1f 44 00 00 55 48 89 e5 41 54 53 48 89 fb e8 2c de 00 00 49 89 c4 48 8b 03 a8 01 75 02 <0f> 0b 4d 85 e4 74 52 49 8b 84 24 80 00 00 00 f6 40 20 01 75 44
[ 4934.248731] RIP  [<ffffffff810f5093>] clear_page_dirty_for_io+0x1e/0x83
[ 4934.248731]  RSP <ffff8801869f9c48>
[ 4934.280307] ---[ end trace 36f06d3f8750236a ]---

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Quarx2k pushed a commit that referenced this issue May 29, 2014
Andrey reported the following report:

ERROR: AddressSanitizer: heap-buffer-overflow on address ffff8800359c99f3
ffff8800359c99f3 is located 0 bytes to the right of 243-byte region [ffff8800359c9900, ffff8800359c99f3)
Accessed by thread T13003:
  #0 ffffffff810dd2da (asan_report_error+0x32a/0x440)
  #1 ffffffff810dc6b0 (asan_check_region+0x30/0x40)
  #2 ffffffff810dd4d3 (__tsan_write1+0x13/0x20)
  #3 ffffffff811cd19e (ftrace_regex_release+0x1be/0x260)
  #4 ffffffff812a1065 (__fput+0x155/0x360)
  #5 ffffffff812a12de (____fput+0x1e/0x30)
  #6 ffffffff8111708d (task_work_run+0x10d/0x140)
  #7 ffffffff810ea043 (do_exit+0x433/0x11f0)
  #8 ffffffff810eaee4 (do_group_exit+0x84/0x130)
  #9 ffffffff810eafb1 (SyS_exit_group+0x21/0x30)
  #10 ffffffff81928782 (system_call_fastpath+0x16/0x1b)

Allocated by thread T5167:
  #0 ffffffff810dc778 (asan_slab_alloc+0x48/0xc0)
  #1 ffffffff8128337c (__kmalloc+0xbc/0x500)
  #2 ffffffff811d9d54 (trace_parser_get_init+0x34/0x90)
  #3 ffffffff811cd7b3 (ftrace_regex_open+0x83/0x2e0)
  #4 ffffffff811cda7d (ftrace_filter_open+0x2d/0x40)
  #5 ffffffff8129b4ff (do_dentry_open+0x32f/0x430)
  #6 ffffffff8129b668 (finish_open+0x68/0xa0)
  #7 ffffffff812b66ac (do_last+0xb8c/0x1710)
  #8 ffffffff812b7350 (path_openat+0x120/0xb50)
  #9 ffffffff812b8884 (do_filp_open+0x54/0xb0)
  #10 ffffffff8129d36c (do_sys_open+0x1ac/0x2c0)
  #11 ffffffff8129d4b7 (SyS_open+0x37/0x50)
  #12 ffffffff81928782 (system_call_fastpath+0x16/0x1b)

Shadow bytes around the buggy address:
  ffff8800359c9700: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  ffff8800359c9780: fd fd fd fd fd fd fd fd fa fa fa fa fa fa fa fa
  ffff8800359c9800: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  ffff8800359c9880: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  ffff8800359c9900: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>ffff8800359c9980: 00 00 00 00 00 00 00 00 00 00 00 00 00 00[03]fb
  ffff8800359c9a00: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  ffff8800359c9a80: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  ffff8800359c9b00: fa fa fa fa fa fa fa fa 00 00 00 00 00 00 00 00
  ffff8800359c9b80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  ffff8800359c9c00: 00 00 00 00 00 00 00 00 fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap redzone:          fa
  Heap kmalloc redzone:  fb
  Freed heap region:     fd
  Shadow gap:            fe

The out-of-bounds access happens on 'parser->buffer[parser->idx] = 0;'

Although the crash happened in ftrace_regex_open() the real bug
occurred in trace_get_user() where there's an incrementation to
parser->idx without a check against the size. The way it is triggered
is if userspace sends in 128 characters (EVENT_BUF_SIZE + 1), the loop
that reads the last character stores it and then breaks out because
there is no more characters. Then the last character is read to determine
what to do next, and the index is incremented without checking size.

Then the caller of trace_get_user() usually nulls out the last character
with a zero, but since the index is equal to the size, it writes a nul
character after the allocated space, which can corrupt memory.

Luckily, only root user has write access to this file.

Link: http://lkml.kernel.org/r/20131009222323.04fd1a0d@gandalf.local.home

Reported-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Quarx2k pushed a commit that referenced this issue May 29, 2014
…ux/kernel/git/tip/tip

Pull x86 boot changes from Ingo Molnar:
 "Two changes that prettify and compactify the SMP bootup output from:

     smpboot: Booting Node   0, Processors  #1 #2 #3 OK
     smpboot: Booting Node   1, Processors  #4 #5 #6 #7 OK
     smpboot: Booting Node   2, Processors  #8 #9 #10 #11 OK
     smpboot: Booting Node   3, Processors  #12 #13 #14 #15 OK
     Brought up 16 CPUs

  to something like:

     x86: Booting SMP configuration:
     .... node  #0, CPUs:        #1  #2  #3
     .... node  #1, CPUs:    #4  #5  #6  #7
     .... node  #2, CPUs:    #8  #9 #10 #11
     .... node  #3, CPUs:   #12 #13 #14 #15
     x86: Booted up 4 nodes, 16 CPUs"

* 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/boot: Further compress CPUs bootup message
  x86: Improve the printout of the SMP bootup CPU table
Quarx2k pushed a commit that referenced this issue May 29, 2014
Lockdep reported that there is a potential for deadlock because
vcpu->arch.tbacct_lock is not irq-safe, and is sometimes taken inside
the rq_lock (run-queue lock) in the scheduler, which is taken within
interrupts.  The lockdep splat looks like:

======================================================
[ INFO: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected ]
3.12.0-rc5-kvm+ #8 Not tainted
------------------------------------------------------
qemu-system-ppc/4803 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
(&(&vcpu->arch.tbacct_lock)->rlock){+.+...}, at: [<c0000000000947ac>] .kvmppc_core_vcpu_put_hv+0x2c/0xa0

and this task is already holding:
(&rq->lock){-.-.-.}, at: [<c000000000ac16c0>] .__schedule+0x180/0xaa0
which would create a new lock dependency:
(&rq->lock){-.-.-.} -> (&(&vcpu->arch.tbacct_lock)->rlock){+.+...}

but this new dependency connects a HARDIRQ-irq-safe lock:
(&rq->lock){-.-.-.}
... which became HARDIRQ-irq-safe at:
 [<c00000000013797c>] .lock_acquire+0xbc/0x190
 [<c000000000ac3c74>] ._raw_spin_lock+0x34/0x60
 [<c0000000000f8564>] .scheduler_tick+0x54/0x180
 [<c0000000000c2610>] .update_process_times+0x70/0xa0
 [<c00000000012cdfc>] .tick_periodic+0x3c/0xe0
 [<c00000000012cec8>] .tick_handle_periodic+0x28/0xb0
 [<c00000000001ef40>] .timer_interrupt+0x120/0x2e0
 [<c000000000002868>] decrementer_common+0x168/0x180
 [<c0000000001c7ca4>] .get_page_from_freelist+0x924/0xc10
 [<c0000000001c8e00>] .__alloc_pages_nodemask+0x200/0xba0
 [<c0000000001c9eb8>] .alloc_pages_exact_nid+0x68/0x110
 [<c000000000f4c3ec>] .page_cgroup_init+0x1e0/0x270
 [<c000000000f24480>] .start_kernel+0x3e0/0x4e4
 [<c000000000009d30>] .start_here_common+0x20/0x70

to a HARDIRQ-irq-unsafe lock:
(&(&vcpu->arch.tbacct_lock)->rlock){+.+...}
... which became HARDIRQ-irq-unsafe at:
...  [<c00000000013797c>] .lock_acquire+0xbc/0x190
 [<c000000000ac3c74>] ._raw_spin_lock+0x34/0x60
 [<c0000000000946ac>] .kvmppc_core_vcpu_load_hv+0x2c/0x100
 [<c00000000008394c>] .kvmppc_core_vcpu_load+0x2c/0x40
 [<c000000000081000>] .kvm_arch_vcpu_load+0x10/0x30
 [<c00000000007afd4>] .vcpu_load+0x64/0xd0
 [<c00000000007b0f8>] .kvm_vcpu_ioctl+0x68/0x730
 [<c00000000025530c>] .do_vfs_ioctl+0x4dc/0x7a0
 [<c000000000255694>] .SyS_ioctl+0xc4/0xe0
 [<c000000000009ee4>] syscall_exit+0x0/0x98

Some users have reported this deadlock occurring in practice, though
the reports have been primarily on 3.10.x-based kernels.

This fixes the problem by making tbacct_lock be irq-safe.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Quarx2k pushed a commit that referenced this issue May 29, 2014
the 'soc' node in the common .dtsi for MPC5121 has an '#interrupt-cells'
property although this node is not an interrupt controller

remove this erroneously placed property because starting with v3.13-rc1
lookup and resolution of 'interrupts' specs for peripherals gets misled,
emits 'no irq domain found' WARN() messages and breaks the boot process

  irq: no irq domain found for /soc@80000000 !
  ------------[ cut here ]------------
  WARNING: at drivers/of/platform.c:171
  Modules linked in:
  CPU: 0 PID: 1 Comm: swapper Tainted: G        W    3.13.0-rc1-00001-g8a66234 #8
  task: df823bb0 ti: df834000 task.ti: df834000
  NIP: c02b5190 LR: c02b5180 CTR: c01cf4e0
  REGS: df835c50 TRAP: 0700   Tainted: G        W     (3.13.0-rc1-00001-g8a66234)
  MSR: 00029032 <EE,ME,IR,DR,RI>  CR: 229a9d42  XER: 20000000

  GPR00: c02b5180 df835d00 df823bb0 00000000 00000000 df835b18 ffffffff 00000308
  GPR08: c0479cc0 c0480000 c0479cc0 00000308 00000308 00000000 c00040fc 00000000
  GPR16: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 df850880
  GPR24: df84d670 00000000 00000001 df8561a0 dffffccc df85089c 00000020 00000001
  NIP [c02b5190] of_device_alloc+0xf4/0x1a0
  LR [c02b5180] of_device_alloc+0xe4/0x1a0
  Call Trace:
  [df835d00] [c02b5180] of_device_alloc+0xe4/0x1a0 (unreliable)
  [df835d50] [c02b5278] of_platform_device_create_pdata+0x3c/0xc8
  [df835d70] [c02b53fc] of_platform_bus_create+0xf8/0x170
  [df835dc0] [c02b5448] of_platform_bus_create+0x144/0x170
  [df835e10] [c02b55a8] of_platform_bus_probe+0x98/0xe8
  [df835e30] [c0437508] mpc512x_init+0x28/0x1c4
  [df835e70] [c0435de8] ppc_init+0x4c/0x60
  [df835e80] [c0003b28] do_one_initcall+0x150/0x1a4
  [df835ef0] [c0432048] kernel_init_freeable+0x114/0x1c0
  [df835f30] [c0004114] kernel_init+0x18/0x124
  [df835f40] [c000e910] ret_from_kernel_thread+0x5c/0x64
  Instruction dump:
  409effd4 57c9103a 57de2834 7c89f050 7f83e378 7c972214 7f45d378 48001f55
  7c63d278 7c630034 5463d97e 687a0001 <0f1a0000> 2f990000 387b0010 939b0098
  ---[ end trace 2257f10e5a20cbdd ]---

  ...
  irq: no irq domain found for /soc@80000000 !
  fsl-diu-fb 80002100.display: could not get DIU IRQ
  fsl-diu-fb: probe of 80002100.display failed with error -22
  irq: no irq domain found for /soc@80000000 !
  mpc512x_dma 80014000.dma: Error mapping IRQ!
  mpc512x_dma: probe of 80014000.dma failed with error -22
  ...
  irq: no irq domain found for /soc@80000000 !
  fs_enet: probe of 80002800.ethernet failed with error -22
  ...
  irq: no irq domain found for /soc@80000000 !
  mpc5121-rtc 80000a00.rtc: mpc5121_rtc_probe: could not request irq: 0
  mpc5121-rtc: probe of 80000a00.rtc failed with error -22
  ...

Cc: Anatolij Gustschin <agust@denx.de>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: devicetree@vger.kernel.org
Signed-off-by: Gerhard Sittig <gsi@denx.de>
Signed-off-by: Anatolij Gustschin <agust@denx.de>
Quarx2k pushed a commit that referenced this issue May 29, 2014
Check DMA mapping return values in function ioat_dma_self_test() to get
rid of following warning message.

  ------------[ cut here ]------------
  WARNING: CPU: 0 PID: 1203 at lib/dma-debug.c:937 check_unmap+0x4c0/0x9a0()
  ioatdma 0000:00:04.0: DMA-API: device driver failed to check map error[device address=0x000000085191b000] [size=2000 bytes] [mapped as single]
  Modules linked in: ioatdma(+) mac_hid wmi acpi_pad lp parport hidd_generic usbhid hid ixgbe isci dca libsas ahci ptp libahci scsi_transport_sas meegaraid_sas pps_core mdio
  CPU: 0 PID: 1203 Comm: systemd-udevd Not tainted 3.13.0-rc4+ #8
  Hardware name: Intel Corporation BRICKLAND/BRICKLAND, BIOS BRIVTIIN1.86B.0044.L09.1311181644 11/18/2013
  Call Trace:
    dump_stack+0x4d/0x66
    warn_slowpath_common+0x7d/0xa0
    warn_slowpath_fmt+0x4c/0x50
    check_unmap+0x4c0/0x9a0
    debug_dma_unmap_page+0x81/0x90
    ioat_dma_self_test+0x3d2/0x680 [ioatdma]
    ioat3_dma_self_test+0x12/0x30 [ioatdma]
    ioat_probe+0xf4/0x110 [ioatdma]
    ioat3_dma_probe+0x268/0x410 [ioatdma]
    ioat_pci_probe+0x122/0x1b0 [ioatdma]
    local_pci_probe+0x45/0xa0
    pci_device_probe+0xd9/0x130
    driver_probe_device+0x171/0x490
    __driver_attach+0x93/0xa0
    bus_for_each_dev+0x6b/0xb0
    driver_attach+0x1e/0x20
    bus_add_driver+0x1f8/0x2b0
    driver_register+0x81/0x110
    __pci_register_driver+0x60/0x70
    ioat_init_module+0x89/0x1000 [ioatdma]
    do_one_initcall+0xe2/0x250
    load_module+0x2313/0x2a00
    SyS_init_module+0xd9/0x130
    system_call_fastpath+0x1a/0x1f
  ---[ end trace 990c591681d27c31 ]---
  Mapped at:
    debug_dma_map_page+0xbe/0x180
    ioat_dma_self_test+0x1ab/0x680 [ioatdma]
    ioat3_dma_self_test+0x12/0x30 [ioatdma]
    ioat_probe+0xf4/0x110 [ioatdma]
    ioat3_dma_probe+0x268/0x410 [ioatdma]

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Vinod Koul <vinod.koul@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Quarx2k pushed a commit that referenced this issue May 29, 2014
If recovery failed ath10k returned 0 (success) and
mac80211 continued to call other driver callbacks.
This caused null dereference.

This is how the failure looked like:

ath10k: ctl_resp never came in (-110)
ath10k: failed to connect to HTC: -110
ath10k: could not init core (-110)
BUG: unable to handle kernel NULL pointer dereference at           (null)
IP: [<ffffffffa0b355c1>] ath10k_ce_send+0x1d/0x15d [ath10k_pci]
PGD 0
Oops: 0000 [#1] PREEMPT SMP
Modules linked in: ath10k_pci ath10k_core ath5k ath9k ath9k_common ath9k_hw ath mac80211 cfg80211 nf_nat_ipv4 ]
CPU: 1 PID: 36 Comm: kworker/1:1 Tainted: G        WC   3.13.0-rc8-wl-ath+ #8
Hardware name: To be filled by O.E.M. To be filled by O.E.M./HURONRIVER, BIOS 4.6.5 05/02/2012
Workqueue: events ieee80211_restart_work [mac80211]
task: ffff880215b521c0 ti: ffff880215e18000 task.ti: ffff880215e18000
RIP: 0010:[<ffffffffa0b355c1>]  [<ffffffffa0b355c1>] ath10k_ce_send+0x1d/0x15d [ath10k_pci]
RSP: 0018:ffff880215e19af8  EFLAGS: 00010292
RAX: ffff880215e19b10 RBX: 0000000000000000 RCX: 0000000000000018
RDX: 00000000d9ccf800 RSI: ffff8800c965ad00 RDI: 0000000000000000
RBP: ffff880215e19b58 R08: 0000000000000002 R09: 0000000000000000
R10: ffffffff812e1a23 R11: 0000000000000292 R12: 0000000000000018
R13: 0000000000000000 R14: 0000000000000002 R15: ffff88021562d700
FS:  0000000000000000(0000) GS:ffff88021fa80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000001a0d000 CR4: 00000000000407e0
Stack:
 d9ccf8000d47df40 ffffffffa0b367a0 ffff880215e19b10 0000000000000010
 ffff880215e19b68 ffff880215e19b28 0000000000000018 ffff8800c965ad00
 0000000000000018 0000000000000000 0000000000000002 ffff88021562d700
Call Trace:
 [<ffffffffa0b3251d>] ath10k_pci_hif_send_head+0xa7/0xcb [ath10k_pci]
 [<ffffffffa0b16cbe>] ath10k_htc_send+0x23d/0x2d0 [ath10k_core]
 [<ffffffffa0b1a169>] ath10k_wmi_cmd_send_nowait+0x5d/0x85 [ath10k_core]
 [<ffffffffa0b1aaef>] ath10k_wmi_cmd_send+0x62/0x115 [ath10k_core]
 [<ffffffff814e8abd>] ? __netdev_alloc_skb+0x4b/0x9b
 [<ffffffffa0b1c438>] ath10k_wmi_vdev_set_param+0x91/0xa3 [ath10k_core]
 [<ffffffffa0b0e0d5>] ath10k_mac_set_rts+0x3e/0x40 [ath10k_core]
 [<ffffffffa0b0e1d0>] ath10k_set_frag_threshold+0x5e/0x9c [ath10k_core]
 [<ffffffffa09d60eb>] ieee80211_reconfig+0x12a/0x7b3 [mac80211]
 [<ffffffff815a8069>] ? mutex_unlock+0x9/0xb
 [<ffffffffa09b3a40>] ieee80211_restart_work+0x5e/0x68 [mac80211]
 [<ffffffff810c01d0>] process_one_work+0x1d7/0x2fc
 [<ffffffff810c0166>] ? process_one_work+0x16d/0x2fc
 [<ffffffff810c06c8>] worker_thread+0x12e/0x1fb
 [<ffffffff810c059a>] ? rescuer_thread+0x27b/0x27b
 [<ffffffff810c5aee>] kthread+0xb5/0xbd
 [<ffffffff815a9220>] ? _raw_spin_unlock_irq+0x28/0x42
 [<ffffffff810c5a39>] ? __kthread_parkme+0x5c/0x5c
 [<ffffffff815ae04c>] ret_from_fork+0x7c/0xb0
 [<ffffffff810c5a39>] ? __kthread_parkme+0x5c/0x5c
Code: df ff d0 48 83 c4 18 5b 41 5c 41 5d 5d c3 55 48 89 e5 41 57 41 56 45 89 c6 41 55 41 54 41 89 cc 53 48 89
RIP  [<ffffffffa0b355c1>] ath10k_ce_send+0x1d/0x15d [ath10k_pci]
 RSP <ffff880215e19af8>
CR2: 0000000000000000

Reported-By: Ben Greear <greearb@candelatech.com>
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Quarx2k pushed a commit that referenced this issue May 29, 2014
…king

Daniel Borkmann reported a VM_BUG_ON assertion failing:

  ------------[ cut here ]------------
  kernel BUG at mm/mlock.c:528!
  invalid opcode: 0000 [#1] SMP
  Modules linked in: ccm arc4 iwldvm [...]
   video
  CPU: 3 PID: 2266 Comm: netsniff-ng Not tainted 3.14.0-rc2+ #8
  Hardware name: LENOVO 2429BP3/2429BP3, BIOS G4ET37WW (1.12 ) 05/29/2012
  task: ffff8801f87f9820 ti: ffff88002cb44000 task.ti: ffff88002cb44000
  RIP: 0010:[<ffffffff81171ad0>]  [<ffffffff81171ad0>] munlock_vma_pages_range+0x2e0/0x2f0
  Call Trace:
    do_munmap+0x18f/0x3b0
    vm_munmap+0x41/0x60
    SyS_munmap+0x22/0x30
    system_call_fastpath+0x1a/0x1f
  RIP   munlock_vma_pages_range+0x2e0/0x2f0
  ---[ end trace a0088dcf07ae10f2 ]---

because munlock_vma_pages_range() thinks it's unexpectedly in the middle
of a THP page.  This can be reproduced with default config since 3.11
kernels.  A reproducer can be found in the kernel's selftest directory
for networking by running ./psock_tpacket.

The problem is that an order=2 compound page (allocated by
alloc_one_pg_vec_page() is part of the munlocked VM_MIXEDMAP vma (mapped
by packet_mmap()) and mistaken for a THP page and assumed to be order=9.

The checks for THP in munlock came with commit ff6a6da ("mm:
accelerate munlock() treatment of THP pages"), i.e.  since 3.9, but did
not trigger a bug.  It just makes munlock_vma_pages_range() skip such
compound pages until the next 512-pages-aligned page, when it encounters
a head page.  This is however not a problem for vma's where mlocking has
no effect anyway, but it can distort the accounting.

Since commit 7225522 ("mm: munlock: batch non-THP page isolation
and munlock+putback using pagevec") this can trigger a VM_BUG_ON in
PageTransHuge() check.

This patch fixes the issue by adding VM_MIXEDMAP flag to VM_SPECIAL, a
list of flags that make vma's non-mlockable and non-mergeable.  The
reasoning is that VM_MIXEDMAP vma's are similar to VM_PFNMAP, which is
already on the VM_SPECIAL list, and both are intended for non-LRU pages
where mlocking makes no sense anyway.  Related Lkml discussion can be
found in [2].

 [1] tools/testing/selftests/net/psock_tpacket
 [2] https://lkml.org/lkml/2014/1/10/427

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Reported-by: Daniel Borkmann <dborkman@redhat.com>
Tested-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Cc: John David Anglin <dave.anglin@bell.net>
Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Carsten Otte <cotte@de.ibm.com>
Cc: Jared Hulbert <jaredeh@gmail.com>
Tested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: <stable@vger.kernel.org> [3.11.x+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Quarx2k pushed a commit that referenced this issue May 29, 2014
vmxnet3's netpoll driver is incorrectly coded.  It directly calls
vmxnet3_do_poll, which is the driver internal napi poll routine.  As the netpoll
controller method doesn't block real napi polls in any way, there is a potential
for race conditions in which the netpoll controller method and the napi poll
method run concurrently.  The result is data corruption causing panics such as this
one recently observed:
PID: 1371   TASK: ffff88023762caa0  CPU: 1   COMMAND: "rs:main Q:Reg"
 #0 [ffff88023abd5780] machine_kexec at ffffffff81038f3b
 #1 [ffff88023abd57e0] crash_kexec at ffffffff810c5d92
 #2 [ffff88023abd58b0] oops_end at ffffffff8152b570
 #3 [ffff88023abd58e0] die at ffffffff81010e0b
 #4 [ffff88023abd5910] do_trap at ffffffff8152add4
 #5 [ffff88023abd5970] do_invalid_op at ffffffff8100cf95
 #6 [ffff88023abd5a10] invalid_op at ffffffff8100bf9b
    [exception RIP: vmxnet3_rq_rx_complete+1968]
    RIP: ffffffffa00f1e80  RSP: ffff88023abd5ac8  RFLAGS: 00010086
    RAX: 0000000000000000  RBX: ffff88023b5dcee0  RCX: 00000000000000c0
    RDX: 0000000000000000  RSI: 00000000000005f2  RDI: ffff88023b5dcee0
    RBP: ffff88023abd5b48   R8: 0000000000000000   R9: ffff88023a3b6048
    R10: 0000000000000000  R11: 0000000000000002  R12: ffff8802398d4cd8
    R13: ffff88023af35140  R14: ffff88023b60c890  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #7 [ffff88023abd5b50] vmxnet3_do_poll at ffffffffa00f204a [vmxnet3]
 #8 [ffff88023abd5b80] vmxnet3_netpoll at ffffffffa00f209c [vmxnet3]
 #9 [ffff88023abd5ba0] netpoll_poll_dev at ffffffff81472bb7

The fix is to do as other drivers do, and have the poll controller call the top
half interrupt handler, which schedules a napi poll properly to recieve frames

Tested by myself, successfully.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: Shreyas Bhatewara <sbhatewara@vmware.com>
CC: "VMware, Inc." <pv-drivers@vmware.com>
CC: "David S. Miller" <davem@davemloft.net>
CC: stable@vger.kernel.org
Reviewed-by: Shreyas N Bhatewara <sbhatewara@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Quarx2k pushed a commit that referenced this issue Jun 25, 2014
Currently, the 32-bit and 64-bit atomic operations on ARM do not
include memory constraints in the inline assembly blocks. In the
case of barrier-less operations [for example, atomic_add], this
means that the compiler may constant fold values which have actually
been modified by a call to an atomic operation.

This issue can be observed in the atomic64_test routine in
<kernel root>/lib/atomic64_test.c:

00000000 <test_atomic64>:
   0:	e1a0c00d 	mov	ip, sp
   4:	e92dd830 	push	{r4, r5, fp, ip, lr, pc}
   8:	e24cb004 	sub	fp, ip, #4
   c:	e24dd008 	sub	sp, sp, #8
  10:	e24b3014 	sub	r3, fp, #20
  14:	e30d000d 	movw	r0, #53261	; 0xd00d
  18:	e3011337 	movw	r1, #4919	; 0x1337
  1c:	e34c0001 	movt	r0, #49153	; 0xc001
  20:	e34a1aa3 	movt	r1, #43683	; 0xaaa3
  24:	e16300f8 	strd	r0, [r3, #-8]!
  28:	e30c0afe 	movw	r0, #51966	; 0xcafe
  2c:	e30b1eef 	movw	r1, #48879	; 0xbeef
  30:	e34d0eaf 	movt	r0, #57007	; 0xdeaf
  34:	e34d1ead 	movt	r1, #57005	; 0xdead
  38:	e1b34f9f 	ldrexd	r4, [r3]
  3c:	e1a34f90 	strexd	r4, r0, [r3]
  40:	e3340000 	teq	r4, #0
  44:	1afffffb 	bne	38 <test_atomic64+0x38>
  48:	e59f0004 	ldr	r0, [pc, #4]	; 54 <test_atomic64+0x54>
  4c:	e3a0101e 	mov	r1, #30
  50:	ebfffffe 	bl	0 <__bug>
  54:	00000000 	.word	0x00000000

The atomic64_set (0x38-0x44) writes to the atomic64_t, but the
compiler doesn't see this, assumes the test condition is always
false and generates an unconditional branch to __bug. The rest of the
test is optimised away.

This patch adds suitable memory constraints to the atomic operations on ARM
to ensure that the compiler is informed of the correct data hazards. We have
to use the "Qo" constraints to avoid hitting the GCC anomaly described at
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44492 , where the compiler
makes assumptions about the writeback in the addressing mode used by the
inline assembly. These constraints forbid the use of auto{inc,dec} addressing
modes, so it doesn't matter if we don't use the operand exactly once.

Cc: stable@kernel.org
Reviewed-by: Nicolas Pitre <nicolas.pitre@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants