Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Solve the thread completion/destruction problem without cost on every context switch #10

Open
nyh opened this issue Jun 23, 2013 · 5 comments

Comments

@nyh
Copy link
Contributor

nyh commented Jun 23, 2013

In commit 96ee424 I solved a long-standing bug we had, where a terminating thread, in complete(), would wake the thread join()ing it (_joiner), but in rare cases, this joiner thread would delete the completing thread's stack quickly, before complete() returned, causing a crash.

My solution in this commit included an extra variable we need to check after every context switch - to see if the thread we just left terminated, and we need to wake its joiner (on the new thread's stack). This new if() on every context switch - even when no thread ever terminates - is a waste - not a big one, but still a waste. It would be nice if we had a solution without such an extra if().

In a long email brainstorming with Guy we came up with two alternative solutions. Both are more complex and require more coding, but we should consider one of them in the future:

Solution 1 - migration (my favorite, but Avi and Guy are not enthusiastic about it)

The idea behind this solution is that join() can only run in parallel to complete() if they are running on two different CPUs. If we know that both are on the same CPU, and we disable preemption, then join() will not run (and destroy the terminating thread) until the next schedule() switches the threads. To get them both on the same CPU, all we need to do is to migrate the current thread to the joiner's CPU.

So with this solution, complete() will call unref() directly (no need to set a flag instead), and unref will look like this:

void thread::unref()
{
    if (_ref_counter.fetch_add(-1) == 1) {
        if (_joiner) {
            if (thread::current() == this) {
                // Waking the _joiner thread might cause it to quickly delete
                // this thread, including the stack we're running on, so we
                // need to make sure that _joiner can't run while doing this.
                // We ensure this by migrating to the same CPU as _joiner, and
                // then preempt_disable is enough to ensure _joiner is not
                // running (note we assume here migration of a thread out of
                // this CPU only happens in load_balancer() and this doesn't
                // run when preemption is disabled).
                preempt_disable(); // already disabled in complete(), but doesn't hurt
                while (_joiner->tcpu() != tcpu()) {
                    migrate(_joiner->tcpu());
                }
                _joiner->wake_with([&] { _status.store(status::terminated); });
                // at this point, _joiner cannot have run yet (it's on the
                // same CPU as us and preemption is disabled), so our thread
                // and its stack still exists.
                schedule();
            }

            _joiner->wake_with([&] { _status.store(status::terminated); });
        } else {
            _status.store(status::terminated);
        }
    }
}

The remaining challenge is, of course, to implement migrate(). We can't trivially migrate the running thread from itself, because we'll have the potential migrate() running on the original CPU in parallel with itself on the target CPU. Instead, we need to add to load_balancer() support to send the load_balancer on the current CPU the command to move this thread to a different CPU. The current CPU can then yield() hoping load_balancer gets to run, but perhaps it will be more efficient to set a new status ("migrating") so that the thread being migrated doesn't run again until actually migrated ("waiting" may not good enough because a spurious wake can cause it to stop waiting).

While adding migrate() is a complication, I think it will be a generally somewhat-useful feature. Avi also needed it once, to allow a thread to pin itself to a specific CPU.

Solution 2 - temporary stack (Guy's favorite)

This solution also has complete() calling unref() directly, but when unref wants to set the thread's status and wake the joiner, it can't do it on the thread's stack, as we know, so the idea of this solution is to switch to a temporary stack. Each CPU will have a small stack (even one page may be enough) used just in unref() when it decides to delete the thread, it switches to this stack (via an asm() instruction).

There's a complication here, that it's not enough to call wake() on the new stack. wake() also checks the current thread's state, checks preempt_counter (a thread-local variable) and even needs the current thread just to get the current CPU. All of this cannot work when the thread structure, its stack and its TLS, are all deleted in parallel with this code.
So we need a new version of wake() which can take all this information (current CPU, etc.) as parameters. unref() will need to save these things in registers (and force the compiler to leave them in registers), change the stack, and then call wake with these parameters.

This was referenced Jun 23, 2013
@nyh
Copy link
Contributor Author

nyh commented Jun 24, 2013

One downside with solution 2 proposed above is that it adds more assembly code, which we'll need to port in the future to different architectures.

To avoid this, we can use actual threads instead of bare stacks - have a per-cpu thread with a small stack, terminator_thread(), similar to how we have a per-cpu load-balancer and idle thread.

So this proposal becomes:

Solution 2a:

complete() will:

  1. preempt_disable() - so the terminator_thread doesn't run in the middle of this part
  2. set thread's state to "terminating" (as we do now)
  3. add to this thread to a PERCPU variable termination_list (can also use a struct cpu field). No locking is needed. [but note: don't use an STL list, an malloc can cause context switch so we need to control where happens]
  4. wake this CPU's terminator_thread
  5. call schedule()

Each terminator thread will:

  1. wait_until there's something in this cpu's termination_list
  2. preempt_disable(), to prevent competition with more complete()s
  3. for each thread in termination_list, assert() its status is "terminating", and unref() it.
  4. preempt_enable()

nyh added a commit that referenced this issue Jun 24, 2013
This patch solves issue #11, where join() hangs were seen in tst-mutex, with
threads remaining in "terminating" state.

The problem was that complete() assumed that _cpu->terminating_thread was
nullptr, so it could overwrite it with the current thread. This is usually
true - when we switch to any thread, if terminating_thread!=0 it is
handled. The problem is that when we switch to a *new* thread (in
sched::init(), call switch_to_first()) this code does not run, and if
this thread quickly terminates, _cpu->terminating_thread gets overwritten
instead of being handled.

The simplest workaround, in this patch is simply to handle (i.e., call
unref()) the previous _cpu->terminating_thread before overwriting it.

The downside with this approach is that the termination of a thread
may be delayed by the run time of the first time-slice of a new thread.

But we anyway plan to eventually replace this termination mechanism
(see issue #10), so I think this solution is fine.
@loudinthecloud
Copy link
Contributor

Solution 2a is better than unrefing from within schedule() but it adds X more threads, they don't consume much (do they?) but we'll have X idle threads X load balancer and X terminators. If 2a is also an option, maybe we can unify all those threads to a single scheduler_helper thread or something.

Besides that, adding a set_stack() function for each arch is not a big problem in my opinion, we are doing it anyway in switch_to so we can reuse it.

@nyh
Copy link
Contributor Author

nyh commented Jun 24, 2013

On Mon, Jun 24, 2013 at 4:48 PM, loudinthecloud notifications@github.comwrote:

Solution 2a is better than unrefing from within schedule() but it adds X
more threads, they don't consume much (do they?) but we'll have X idle
threads X load balancer and X terminators.

Waiting threads do not consume any resources beyond their memory use -
mostly the stack.
The scheduler doesn't even consider them, because they're not on the
runqueue of any CPU.

One thing that is "ugly" about all these extra threads is that they
"consume" a thread number, and visible on "osv info threads". To make it
less ugly, perhaps we should consider giving these "internal" threads
negative numbers, and have "osv info threads" not show them.

If 2a is also an option, maybe we can unify all those threads to a single

scheduler_helper thread or something.

A good idea. Since idle_thread, load_balancer, and terminator_thread, are
not meant to run concurrently (they run on the same CPU, and do no
waiting), there is no reason why they need to be different threads, with
different stacks. We can save a bit of memory if we use the same stack for
all three, and just have one helper thread that checks all three conditions.

Besides that, adding a set_stack() function for each arch is not a big
problem in my opinion, we are doing it anyway in switch_to so we can reuse
it.

I'm not sure that all it takes to change the stack in the middle of C code
without a disaster is a set_stack() function (I'm not even sure what it
means to change the stack in a function, maybe it needs to be a macro...).
I'm not saying it can't work, but since we already have the same mechanism
for normal context-switching, maybe it's better to just use exactly the
same mechanism, instead of duplicating it.


Reply to this email directly or view it on GitHubhttps://github.com//issues/10#issuecomment-19907628
.

Nadav Har'El
nyh@cloudius-systems.com

@loudinthecloud
Copy link
Contributor

On 06/24/2013 05:27 PM, nyh wrote:

I'm not sure that all it takes to change the stack in the middle of C
code
without a disaster is a set_stack() function (I'm not even sure what it
means to change the stack in a function, maybe it needs to be a
macro...).
I'm not saying it can't work, but since we already have the same
mechanism
for normal context-switching, maybe it's better to just use exactly the
same mechanism, instead of duplicating it.

It's possible that set_stack() will be used both by switch_to() and
complete() and that's what I was suggesting.

Obviously, set_stack() will not work if it's not inlined...

Besides that, I support option 2a as well especially if we plan to unify
idle_thread, load_balance and "the terminator".

@avikivity
Copy link
Member

On Mon, Jun 24, 2013 at 5:56 PM, loudinthecloud notifications@github.comwrote:

On 06/24/2013 05:27 PM, nyh wrote:

I'm not sure that all it takes to change the stack in the middle of C
code
without a disaster is a set_stack() function (I'm not even sure what it
means to change the stack in a function, maybe it needs to be a
macro...).
I'm not saying it can't work, but since we already have the same
mechanism
for normal context-switching, maybe it's better to just use exactly the
same mechanism, instead of duplicating it.

It's possible that set_stack() will be used both by switch_to() and
complete() and that's what I was suggesting.

Obviously, set_stack() will not work if it's not inlined...

Besides that, I support option 2a as well especially if we plan to unify
idle_thread, load_balance and "the terminator".

Inlining is not sufficient, as the compiler may/will spill variables on
the stack.

penberg pushed a commit that referenced this issue Aug 15, 2013
Building OSv with debug memory allocator enabled:

  $ make -j mode=debug conf-preempt=0 conf-debug_memory=1

Causes the guest to enter a busy loop right after JVM starts up:

  $ ./scripts/run.py -d

  [...]

  OpenJDK 64-Bit Server VM warning: Can't detect initial thread stack location - find_vma failed

GDB explains:

  #0  0x00000000003b5c54 in
boost::intrusive::rbtree_impl<boost::intrusive::setopt<boost::intrusive::detail::member_hook_traits<memory::page_range,
boost::intrusive::set_member_hook<boost::intrusive::none,
boost::intrusive::none, boost::intrusive::none, boost::intrusive::none>,
&memory::page_range::member_hook>, memory::addr_cmp, unsigned long, true>
>::private_erase (this=0x1d2f8c8 <memory::free_page_ranges+8>, b=..., e=...,
n=@0x3b40e9: 6179885759521391432) at
../../external/misc.bin/usr/include/boost/intrusive/rbtree.hpp:1417
  #1  0x00000000003b552e in
boost::intrusive::rbtree_impl<boost::intrusive::setopt<boost::intrusive::detail::member_hook_traits<memory::page_range,
boost::intrusive::set_member_hook<boost::intrusive::none,
boost::intrusive::none, boost::intrusive::none, boost::intrusive::none>,
&memory::page_range::member_hook>, memory::addr_cmp, unsigned long, true>
>::erase<memory::page_range, memory::addr_cmp>(memory::page_range const&,
memory::addr_cmp,
boost::intrusive::detail::enable_if_c<!boost::intrusive::detail::is_convertible<memory::addr_cmp,
boost::intrusive::tree_iterator<boost::intrusive::rbtree_impl<boost::intrusive::setopt<boost::intrusive::detail::member_hook_traits<memory::page_range,
boost::intrusive::set_member_hook<boost::intrusive::none,
boost::intrusive::none, boost::intrusive::none, boost::intrusive::none>,
&memory::page_range::member_hook>, memory::addr_cmp, unsigned long, true> >,
true> >::value, void>::type*) (this=0x1d2f8c0 <memory::free_page_ranges>,
key=..., comp=...) at
../../external/misc.bin/usr/include/boost/intrusive/rbtree.hpp:878
  #2  0x00000000003b4c4e in
boost::intrusive::rbtree_impl<boost::intrusive::setopt<boost::intrusive::detail::member_hook_traits<memory::page_range,
boost::intrusive::set_member_hook<boost::intrusive::none,
boost::intrusive::none, boost::intrusive::none, boost::intrusive::none>,
&memory::page_range::member_hook>, memory::addr_cmp, unsigned long, true>
>::erase (this=0x1d2f8c0 <memory::free_page_ranges>, value=...) at
../../external/misc.bin/usr/include/boost/intrusive/rbtree.hpp:856
  #3  0x00000000003b4145 in
boost::intrusive::set_impl<boost::intrusive::setopt<boost::intrusive::detail::member_hook_traits<memory::page_range,
boost::intrusive::set_member_hook<boost::intrusive::none,
boost::intrusive::none, boost::intrusive::none, boost::intrusive::none>,
&memory::page_range::member_hook>, memory::addr_cmp, unsigned long, true>
>::erase (this=0x1d2f8c0 <memory::free_page_ranges>, value=...) at
../../external/misc.bin/usr/include/boost/intrusive/set.hpp:601
  #4  0x00000000003b0130 in memory::refill_page_buffer () at ../../core/mempool.cc:487
  #5  0x00000000003b05f8 in memory::untracked_alloc_page () at ../../core/mempool.cc:569
  #6  0x00000000003b0631 in memory::alloc_page () at ../../core/mempool.cc:577
  #7  0x0000000000367a7c in mmu::populate::small_page (this=0x2000001fd460, ptep=..., offset=0) at ../../core/mmu.cc:456
  #8  0x0000000000365b00 in mmu::page_range_operation::operate_page
(this=0x2000001fd460, huge=false, addr=0xffffe0004ec9b000, offset=0) at
../../core/mmu.cc:438
  #9  0x0000000000365790 in mmu::page_range_operation::operate
(this=0x2000001fd460, start=0xffffe0004ec9b000, size=4096) at
../../core/mmu.cc:387
  #10 0x0000000000366148 in mmu::vpopulate (addr=0xffffe0004ec9b000, size=4096) at ../../core/mmu.cc:657
  #11 0x00000000003b0d8d in dbg::malloc (size=16) at ../../core/mempool.cc:818
  #12 0x00000000003b0f32 in malloc (size=16) at ../../core/mempool.cc:854

Fix the problem by checking if free_page_ranges is empty in
refill_page_buffer(). This fixes the busy loop issue and shows what's
really happening:

  OpenJDK 64-Bit Server VM warning: Can't detect initial thread stack location - find_vma failed
  alloc_page(): out of memory
  Aborted
penberg pushed a commit that referenced this issue Aug 16, 2013
Starting up Cassandra with debug memory allocator GPFs as follows:

  Breakpoint 1, abort () at ../../runtime.cc:85
  85	{
  (gdb) bt
  #0  abort () at ../../runtime.cc:85
  #1  0x0000000000375812 in osv::generate_signal (siginfo=..., ef=ef@entry=0xffffc0003ffe3008) at ../../libc/signal.cc:40
  #2  0x000000000037587c in osv::handle_segmentation_fault (addr=addr@entry=18446708889768681440, ef=ef@entry=0xffffc0003ffe3008)
      at ../../libc/signal.cc:55
  #3  0x00000000002fba02 in page_fault (ef=0xffffc0003ffe3008) at ../../core/mmu.cc:876
  #4  <signal handler called>
  #5  dbg::realloc (v=v@entry=0xffffe00019b3e000, size=size@entry=16) at ../../core/mempool.cc:846
  #6  0x000000000032654c in realloc (obj=0xffffe00019b3e000, size=16) at ../../core/mempool.cc:870
  #7  0x0000100000627743 in ?? ()
  #8  0x00002000001fe770 in ?? ()
  #9  0x00002000001fe780 in ?? ()
  #10 0x00002000001fe710 in ?? ()
  #11 0x00002000001fe700 in ?? ()
  #12 0xffffe000170e8000 in ?? ()
  #13 0x0000000200000001 in ?? ()
  #14 0x0000000000000020 in ?? ()
  #15 0x00002000001ffe70 in ?? ()
  #16 0xffffe000170e0004 in ?? ()
  #17 0x000000000036f361 in strcpy (dest=0x100001087420 "", src=<optimized out>) at ../../libc/string/strcpy.c:8
  #18 0x0000100000629b53 in ?? ()
  #19 0xffffe00019b22000 in ?? ()
  #20 0x0000000000000001 in ?? ()
  #21 0x0000000000000000 in ?? ()

The problem was introduced in commit 1ea5672 ("memory: let the debug
allocator mimic the standard allocator more closely") which forgot
to convert realloc() to use 'pad_before'.
penberg pushed a commit that referenced this issue Aug 26, 2013
If a crashed OSv guest is restarted, ZFS mount causes a GPF in early
startup:

  VFS: mounting zfs at /usr
  zfs: mounting osv/usr from device /dev/vblk1
  Aborted

GDB backtrace points finger at zfs_rmnode():

  #0  processor::halt_no_interrupts () at ../../arch/x64/processor.hh:212
  #1  0x00000000003e7f2a in osv::halt () at ../../core/power.cc:20
  #2  0x000000000021cdd4 in abort (msg=0x636df0 "Aborted\n") at ../../runtime.cc:95
  #3  0x000000000021cda2 in abort () at ../../runtime.cc:86
  #4  0x000000000044c149 in osv::generate_signal (siginfo=..., ef=0xffffc0003ffe7008) at ../../libc/signal.cc:44
  #5  0x000000000044c220 in osv::handle_segmentation_fault (addr=72, ef=0xffffc0003ffe7008) at ../../libc/signal.cc:55
  #6  0x0000000000366df3 in page_fault (ef=0xffffc0003ffe7008) at ../../core/mmu.cc:876
  #7  <signal handler called>
  #8  0x0000000000345eaa in zfs_rmnode (zp=0xffffc0003d1de400)
      at ../../bsd/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_dir.c:611
  #9  0x000000000035650c in zfs_zinactive (zp=0xffffc0003d1de400)
      at ../../bsd/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c:1355
  #10 0x0000000000345be1 in zfs_unlinked_drain (zfsvfs=0xffffc0003ddfe000)
      at ../../bsd/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_dir.c:523
  #11 0x000000000034f45c in zfsvfs_setup (zfsvfs=0xffffc0003ddfe000, mounting=true)
      at ../../bsd/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c:881
  #12 0x000000000034f7a4 in zfs_domount (vfsp=0xffffc0003de02000, osname=0x6b14cb "osv/usr")
      at ../../bsd/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c:1016
  #13 0x000000000034f98c in zfs_mount (mp=0xffffc0003de02000, dev=0x6b14d7 "/dev/vblk1", flags=0, data=0x6b14cb)
      at ../../bsd/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c:1415
  #14 0x0000000000406852 in sys_mount (dev=0x6b14d7 "/dev/vblk1", dir=0x6b14a3 "/usr", fsname=0x6b14d3 "zfs", flags=0, data=0x6b14cb)
      at ../../fs/vfs/vfs_mount.c:171
  #15 0x00000000003eff97 in mount_usr () at ../../fs/vfs/main.cc:1415
  #16 0x0000000000203a89 in do_main_thread (_args=0xffffc0003fe9ced0) at ../../loader.cc:215
  #17 0x0000000000448575 in pthread_private::pthread::pthread(void* (*)(void*), void*, sigset_t, pthread_private::thread_attr const*)::{lambda()#1}::operator()() const () at ../../libc/pthread.cc:59
  #18 0x00000000004499d3 in std::_Function_handler<void(), pthread_private::pthread::pthread(void* (*)(void*), void*, sigset_t, const pthread_private::thread_attr*)::__lambda0>::_M_invoke(const std::_Any_data &) (__functor=...)
      at ../../external/gcc.bin/usr/include/c++/4.8.1/functional:2071
  #19 0x000000000037e602 in std::function<void ()>::operator()() const (this=0xffffc0003e170038)
      at ../../external/gcc.bin/usr/include/c++/4.8.1/functional:2468
  #20 0x00000000003bae3e in sched::thread::main (this=0xffffc0003e170010) at ../../core/sched.cc:581
  #21 0x00000000003b8c92 in sched::thread_main_c (t=0xffffc0003e170010) at ../../arch/x64/arch-switch.hh:133
  #22 0x0000000000399c8e in thread_main () at ../../arch/x64/entry.S:101

The problem is that ZFS tries to check if the znode is an attribute
directory and trips over zp->z_vnode being NULL.  However, as explained
in commit b7ee91e ("zfs: port vop_lookup"), we don't even support
extended attributes so drop the check completely for OSv.
penberg pushed a commit that referenced this issue Aug 26, 2013
If a crashed OSv guest is restarted, ZFS mount causes a GPF in early
startup:

  VFS: mounting zfs at /usr
  zfs: mounting osv/usr from device /dev/vblk1
  Aborted

GDB backtrace points finger at zfs_rmnode():

  #0  processor::halt_no_interrupts () at ../../arch/x64/processor.hh:212
  #1  0x00000000003e7f2a in osv::halt () at ../../core/power.cc:20
  #2  0x000000000021cdd4 in abort (msg=0x636df0 "Aborted\n") at ../../runtime.cc:95
  #3  0x000000000021cda2 in abort () at ../../runtime.cc:86
  #4  0x000000000044c149 in osv::generate_signal (siginfo=..., ef=0xffffc0003ffe7008) at ../../libc/signal.cc:44
  #5  0x000000000044c220 in osv::handle_segmentation_fault (addr=72, ef=0xffffc0003ffe7008) at ../../libc/signal.cc:55
  #6  0x0000000000366df3 in page_fault (ef=0xffffc0003ffe7008) at ../../core/mmu.cc:876
  #7  <signal handler called>
  #8  0x0000000000345eaa in zfs_rmnode (zp=0xffffc0003d1de400)
      at ../../bsd/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_dir.c:611
  #9  0x000000000035650c in zfs_zinactive (zp=0xffffc0003d1de400)
      at ../../bsd/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c:1355
  #10 0x0000000000345be1 in zfs_unlinked_drain (zfsvfs=0xffffc0003ddfe000)
      at ../../bsd/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_dir.c:523
  #11 0x000000000034f45c in zfsvfs_setup (zfsvfs=0xffffc0003ddfe000, mounting=true)
      at ../../bsd/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c:881
  #12 0x000000000034f7a4 in zfs_domount (vfsp=0xffffc0003de02000, osname=0x6b14cb "osv/usr")
      at ../../bsd/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c:1016
  #13 0x000000000034f98c in zfs_mount (mp=0xffffc0003de02000, dev=0x6b14d7 "/dev/vblk1", flags=0, data=0x6b14cb)
      at ../../bsd/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c:1415
  #14 0x0000000000406852 in sys_mount (dev=0x6b14d7 "/dev/vblk1", dir=0x6b14a3 "/usr", fsname=0x6b14d3 "zfs", flags=0, data=0x6b14cb)
      at ../../fs/vfs/vfs_mount.c:171
  #15 0x00000000003eff97 in mount_usr () at ../../fs/vfs/main.cc:1415
  #16 0x0000000000203a89 in do_main_thread (_args=0xffffc0003fe9ced0) at ../../loader.cc:215
  #17 0x0000000000448575 in pthread_private::pthread::pthread(void* (*)(void*), void*, sigset_t, pthread_private::thread_attr const*)::{lambda()#1}::operator()() const () at ../../libc/pthread.cc:59
  #18 0x00000000004499d3 in std::_Function_handler<void(), pthread_private::pthread::pthread(void* (*)(void*), void*, sigset_t, const pthread_private::thread_attr*)::__lambda0>::_M_invoke(const std::_Any_data &) (__functor=...)
      at ../../external/gcc.bin/usr/include/c++/4.8.1/functional:2071
  #19 0x000000000037e602 in std::function<void ()>::operator()() const (this=0xffffc0003e170038)
      at ../../external/gcc.bin/usr/include/c++/4.8.1/functional:2468
  #20 0x00000000003bae3e in sched::thread::main (this=0xffffc0003e170010) at ../../core/sched.cc:581
  #21 0x00000000003b8c92 in sched::thread_main_c (t=0xffffc0003e170010) at ../../arch/x64/arch-switch.hh:133
  #22 0x0000000000399c8e in thread_main () at ../../arch/x64/entry.S:101

The problem is that ZFS tries to check if the znode is an attribute
directory and trips over zp->z_vnode being NULL.  However, as explained
in commit b7ee91e ("zfs: port vop_lookup"), we don't even support
extended attributes so drop the check completely for OSv.
penberg pushed a commit that referenced this issue Aug 26, 2013
Fix mincore() to deal with unmapped addresses like msync() does.

This fixes a SIGSEGV in libunwind's access_mem() when leak detector is
enabled:

   (gdb) bt
  #0  page_fault (ef=0xffffc0003ffe7008) at ../../core/mmu.cc:871
  #1  <signal handler called>
  #2  ContiguousSpace::block_start_const (this=<optimized out>, p=0x77d2f3968)
      at /usr/src/debug/java-1.7.0-openjdk-1.7.0.25-2.3.12.3.fc19.x86_64/openjdk/hotspot/src/share/vm/oops/oop.inline.hpp:411
  #3  0x00001000008ae16c in GenerationBlockStartClosure::do_space (this=0x2000001f9100, s=<optimized out>)
      at /usr/src/debug/java-1.7.0-openjdk-1.7.0.25-2.3.12.3.fc19.x86_64/openjdk/hotspot/src/share/vm/memory/generation.cpp:242
  #4  0x00001000007f097c in DefNewGeneration::space_iterate (this=0xffffc0003fb68c00, blk=0x2000001f9100, usedOnly=<optimized out>)
      at /usr/src/debug/java-1.7.0-openjdk-1.7.0.25-2.3.12.3.fc19.x86_64/openjdk/hotspot/src/share/vm/memory/defNewGeneration.cpp:480
  #5  0x00001000008aca0e in Generation::block_start (this=<optimized out>, p=<optimized out>)
      at /usr/src/debug/java-1.7.0-openjdk-1.7.0.25-2.3.12.3.fc19.x86_64/openjdk/hotspot/src/share/vm/memory/generation.cpp:251
  #6  0x0000100000b06d2f in os::print_location (st=st@entry=0x2000001f9560, x=32165017960, verbose=verbose@entry=false)
      at /usr/src/debug/java-1.7.0-openjdk-1.7.0.25-2.3.12.3.fc19.x86_64/openjdk/hotspot/src/share/vm/runtime/os.cpp:868
  #7  0x0000100000b11b5b in os::print_register_info (st=0x2000001f9560, context=0x2000001f9740)
      at /usr/src/debug/java-1.7.0-openjdk-1.7.0.25-2.3.12.3.fc19.x86_64/openjdk/hotspot/src/os_cpu/linux_x86/vm/os_linux_x86.cpp:839
  #8  0x0000100000c6cde8 in VMError::report (this=0x2000001f9610, st=st@entry=0x2000001f9560)
      at /usr/src/debug/java-1.7.0-openjdk-1.7.0.25-2.3.12.3.fc19.x86_64/openjdk/hotspot/src/share/vm/utilities/vmError.cpp:551
  #9  0x0000100000c6da3b in VMError::report_and_die (this=this@entry=0x2000001f9610)
      at /usr/src/debug/java-1.7.0-openjdk-1.7.0.25-2.3.12.3.fc19.x86_64/openjdk/hotspot/src/share/vm/utilities/vmError.cpp:984
  #10 0x0000100000b1109f in JVM_handle_linux_signal (sig=11, info=0x2000001f9bb8, ucVoid=0x2000001f9740,
      abort_if_unrecognized=<optimized out>)
      at /usr/src/debug/java-1.7.0-openjdk-1.7.0.25-2.3.12.3.fc19.x86_64/openjdk/hotspot/src/os_cpu/linux_x86/vm/os_linux_x86.cpp:528
  #11 0x000000000039f242 in call_signal_handler (frame=0x2000001f9b10) at ../../arch/x64/signal.cc:69
  #12 <signal handler called>
  #13 0x000000000057d721 in access_mem ()
  #14 0x000000000057cb1d in dwarf_get ()
  #15 0x000000000057ce51 in _ULx86_64_step ()
  #16 0x00000000004315fd in backtrace (buffer=0x1ff9d80 <memory::alloc_tracker::remember(void*, int)::bt>, size=20)
      at ../../libc/misc/backtrace.cc:16
  #17 0x00000000003b8d99 in memory::alloc_tracker::remember (this=0x1777ae0 <memory::tracker>, addr=0xffffc0004508de00, size=54)
      at ../../core/alloctracker.cc:59
  #18 0x00000000003b0504 in memory::tracker_remember (addr=0xffffc0004508de00, size=54) at ../../core/mempool.cc:43
  #19 0x00000000003b2152 in std_malloc (size=54) at ../../core/mempool.cc:723
  #20 0x00000000003b259c in malloc (size=54) at ../../core/mempool.cc:856
  #21 0x0000100001615e4c in JNU_GetStringPlatformChars (env=env@entry=0xffffc0003a4dc1d8, jstr=jstr@entry=0xffffc0004591b800,
      isCopy=isCopy@entry=0x0) at ../../../src/share/native/common/jni_util.c:801
  #22 0x000010000161ada6 in Java_java_io_UnixFileSystem_getBooleanAttributes0 (env=0xffffc0003a4dc1d8, this=<optimized out>,
      file=<optimized out>) at ../../../src/solaris/native/java/io/UnixFileSystem_md.c:111
  #23 0x000020000021ed8e in ?? ()
  #24 0x00002000001faa58 in ?? ()
  #25 0x00002000001faac0 in ?? ()
  #26 0x00002000001faa50 in ?? ()
  #27 0x0000000000000000 in ?? ()

Spotted by Avi Kivity.
penberg pushed a commit that referenced this issue Sep 3, 2013
If "-javaagent" option is enabled, the JVM page faults:

  (gdb) bt
  #0  0x000000000033d432 in halt_no_interrupts () at ../../arch/x64/processor.hh:232
  #1  osv::halt () at ../../core/power.cc:20
  #2  0x0000000000214d82 in abort (msg=msg@entry=0x57ea90 "page fault outside application") at ../../runtime.cc:98
  #3  0x00000000002fd4d8 in page_fault (ef=0xffffc0003ffe6008) at ../../core/mmu.cc:943
  #4  <signal handler called>
  #5  0x0000000000373169 in __stpncpy (d=d@entry=0x2000001fe7f0 "dlsym: symbol JVM_begin_signal_setting not found", s=0x0, n=1023)
      at ../../libc/string/stpncpy.c:19
  #6  0x0000000000373ad1 in strncpy (d=0x2000001fe7f0 "dlsym: symbol JVM_begin_signal_setting not found", s=<optimized out>,
      n=<optimized out>) at ../../libc/string/strncpy.c:7
  #7  0x0000100000b0ab67 in os::dll_load (filename=filename@entry=0x2000001febf0 "libinstrument.so",
      ebuf=ebuf@entry=0x2000001fe7f0 "dlsym: symbol JVM_begin_signal_setting not found", ebuflen=ebuflen@entry=1024)
      at /usr/src/debug/java-1.7.0-openjdk-1.7.0.25-2.3.12.3.fc19.x86_64/openjdk/hotspot/src/os/linux/vm/os_linux.cpp:1841
  #8  0x0000100000c247eb in lookup_on_load (agent=agent@entry=0xffffc0003976ccc0,
      on_load_symbols=on_load_symbols@entry=0x2000001ffd40, num_symbol_entries=1)
      at /usr/src/debug/java-1.7.0-openjdk-1.7.0.25-2.3.12.3.fc19.x86_64/openjdk/hotspot/src/share/vm/runtime/thread.cpp:3585
  #9  0x0000100000c2a64f in lookup_agent_on_load (agent=0xffffc0003976ccc0)
      at /usr/src/debug/java-1.7.0-openjdk-1.7.0.25-2.3.12.3.fc19.x86_64/openjdk/hotspot/src/share/vm/runtime/thread.cpp:3617
  #10 create_vm_init_agents ()
      at /usr/src/debug/java-1.7.0-openjdk-1.7.0.25-2.3.12.3.fc19.x86_64/openjdk/hotspot/src/share/vm/runtime/thread.cpp:3656
  #11 Threads::create_vm (args=<optimized out>, canTryAgain=canTryAgain@entry=0x2000001ffdb0)
      at /usr/src/debug/java-1.7.0-openjdk-1.7.0.25-2.3.12.3.fc19.x86_64/openjdk/hotspot/src/share/vm/runtime/thread.cpp:3177
  #12 0x000010000094d7b0 in JNI_CreateJavaVM (vm=0x2000001ffe58, penv=0x2000001ffe60, args=<optimized out>)
      at /usr/src/debug/java-1.7.0-openjdk-1.7.0.25-2.3.12.3.fc19.x86_64/openjdk/hotspot/src/share/vm/prims/jni.cpp:5127
  #13 0x0000100000007b3b in main (argc=<optimized out>, argv=0xffffc0003fff2008) at ../../java/java.cc:73
  #14 0x0000000000208ac8 in run_main (prog=<optimized out>, args=args@entry=0xffffc0003fe9bfa0) at ../../loader.cc:196
  #15 0x0000000000208c68 in do_main_thread (_args=0xffffc0003fe9bfa0) at ../../loader.cc:217
  #16 0x0000000000376d36 in operator() (__closure=0xffffc0003b5e2a00) at ../../libc/pthread.cc:59
  #17 std::_Function_handler<void(), pthread_private::pthread::pthread(void* (*)(void*), void*, sigset_t, const pthread_private::thread_attr*)::__lambda0>::_M_invoke(const std::_Any_data &) (__functor=...) at ../../external/gcc.bin/usr/include/c++/4.8.1/functional:2071
  #18 0x000000000032ed6b in main (this=0xffffc0003a807010) at ../../core/sched.cc:536
  #19 sched::thread_main_c (t=0xffffc0003a807010) at ../../arch/x64/arch-switch.hh:133
  #20 0x000000000031873e in thread_main () at ../../arch/x64/entry.S:101
  Backtrace stopped: frame did not save the PC

This is caused by the JVM expecting dlerror() to return an error string
if dlopen() fails. Fix that up.
penberg pushed a commit that referenced this issue Dec 23, 2013
I saw this Abort:

35.159 Mb/s
50.230 Mb/s
46.648 Mb/s
68.850 Mb/s
Wrote 613.418 MB in 10.00 s
Aborted

The backtrace says:

  (gdb) bt
  #0  0x000000000035bb82 in halt_no_interrupts () at
  /home/asias/src/cloudius-systems/osv/arch/x64/processor.hh:241
  #1  osv::halt () at
  /home/asias/src/cloudius-systems/osv/core/power.cc:28
  #2  0x0000000000218142 in abort (msg=msg@entry=0x55197f "Aborted\n") at
  /home/asias/src/cloudius-systems/osv/runtime.cc:89
  #3  0x000000000021816e in abort () at
  /home/asias/src/cloudius-systems/osv/runtime.cc:79
  #4  0x000000000039eaa2 in osv::generate_signal (siginfo=...,
  ef=0xffffc0003eb56008) at
  /home/asias/src/cloudius-systems/osv/libc/signal.cc:58
  #5  0x000000000039eb0c in osv::handle_segmentation_fault
  (addr=<optimized out>, ef=<optimized out>) at
  /home/asias/src/cloudius-systems/osv/libc/signal.cc:73
  #6  0x000000000030b45c in mmu::vm_sigsegv
  (addr=addr@entry=17592186060800, ef=ef@entry=0xffffc0003eb56008) at
  /home/asias/src/cloudius-systems/osv/core/mmu.cc:763
  #7  0x000000000030b54b in mmu::vm_fault (addr=<optimized out>,
  addr@entry=17592186061840, ef=ef@entry=0xffffc0003eb56008)
      at /home/asias/src/cloudius-systems/osv/core/mmu.cc:773
  #8  0x000000000032bff5 in page_fault (ef=0xffffc0003eb56008) at
  /home/asias/src/cloudius-systems/osv/arch/x64/mmu.cc:35
  #9  <signal handler called>
  #10 0x0000100000004410 in ?? ()
  #11 0x000000000031e5fd in virtio::blk::req_done
  (this=0xffffc0003eddb800) at
  /home/asias/src/cloudius-systems/osv/drivers/virtio-blk.

Wait until all the bio are done to fix this use after free.

This patch also make the test to measure completed writes instead of
submitted writes.

Reviewed-by: Tomasz Grabiec <tgrabiec@gmail.com>
Signed-off-by: Asias He <asias@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
penberg pushed a commit that referenced this issue Jul 17, 2014
This reverts commit 902535a.

Raphael reports the following breakage in tst-poll.so:

  OSv
  ---

    TEST tst-nway-merger.so        OK  (0.782 s)
    TEST tst-openat.so             OK  (0.731 s)
    TEST tst-pipe.so               OK  (6.840 s)
    TEST tst-poll.so              OSv v0.10-319-g1bdd4a2
  eth0: 192.168.122.15
  Running 3 test cases...
  Assertion failed: tp->t_flags & 0x1000000
  (/data/osv/bsd/sys/netinet/tcp_timer.cc: tcp_timer_tso_flush: 438)

  [backtrace]
  0x0000000000223e98 <__assert_fail+24>
  0x0000000000279de4 <???+2596324>
  0x00000000003dbd9d <async::timer_task::fire(async::percpu_timer_task&)+125>
  0x00000000003dcb13 <async::async_worker::run()+1587>
  0x00000000003be11a <thread_main_c+26>
  0x0000000000372525 <???+3613989>

  GDB
  ---

  (gdb) bt
  #0  0x00000000003d2a72 in cli_hlt () at /data/osv/arch/x64/processor.hh:242
  #1  halt_no_interrupts () at /data/osv/arch/x64/arch.hh:48
  #2  osv::halt () at /data/osv/core/power.cc:34
  #3  0x0000000000223df0 in abort (fmt=fmt@entry=0x5c9980 "Assertion
  failed: %s (%s: %s: %d)\n") at /data/osv/runtime.cc:139
  #4  0x0000000000223e99 in __assert_fail (expr=<optimized out>,
  file=<optimized out>, line=<optimized out>, func=<optimized out>)
      at /data/osv/runtime.cc:145
  #5  0x0000000000279de5 in tcp_timer_tso_flush (timer=...,
  tp=0xffffa0003db51800) at /data/osv/bsd/sys/netinet/tcp_timer.cc:438
  #6  0x00000000003dbd9e in async::timer_task::fire
  (this=this@entry=0xffffa0003d6b9a10, task=...) at
  /data/osv/core/async.cc:362
  #7  0x00000000003dcb14 in fire (task=..., this=0xffff80003fa84030) at
  /data/osv/core/async.cc:227
  #8  async::async_worker::run (this=0xffff80003fa84030) at
  /data/osv/core/async.cc:175
  #9  0x00000000003be11b in main (this=0xffff80003fa84730) at
  /data/osv/core/sched.cc:915
  #10 sched::thread_main_c (t=0xffff80003fa84730) at
  /data/osv/arch/x64/arch-switch.hh:137
  #11 0x0000000000372526 in thread_main () at /data/osv/arch/x64/entry.S:113
  (gdb) print tp->t_flags
  No symbol "tp" in current context.
  (gdb) frame 5
  #5  0x0000000000279de5 in tcp_timer_tso_flush (timer=...,
  tp=0xffffa0003db51800) at /data/osv/bsd/sys/netinet/tcp_timer.cc:438
  438     KASSERT(tp->t_flags & TF_TSO, "tcp_timer_tso_flush: TSO disabled");
  (gdb) print tp->t_flags
  $1 = 992

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
asias added a commit to asias/osv that referenced this issue Aug 28, 2014
   $ sudo scripts/run.py -c1 -nv -m4g  -e "/tests/tst-epoll.so"
   ...

   PASS: epoll_wait
   PASS: write
   PASS: epoll_wait
   PASS: epoll_ctl MOD
   PASS: epoll_wait
   PASS: epoll_ctl ADD
   PASS: errno == EEXIST
   Assertion failed: i != f_epolls->end() (/home/asias/src/cloudius-systems/osv/fs/vfs/kern_descrip.cc: epoll_del: 249)

   [backtrace]
   0x0000000000223648 <__assert_fail+24>
   0x000000000041743a <file::epoll_del(epoll_ptr)+186>
   0x00000000003f7682 <epoll_ctl+786>
   0x0000100000c032db <???+12595931>
   0x000000000040510a <osv::application::run_main(std::string, int, char**)+266>
   0x0000000000405847 <osv::application::run_main()+551>
   0x000000000020bbd1 <osv::application::main()+97>
   0x0000000000406698 <???+4220568>
   0x0000000000433745 <???+4405061>
   0x00000000003e38aa <thread_main_c+26>
   0x0000000000389c75 <???+3710069>

   (gdb) bt
   #0  0x00000000003f8e92 in cli_hlt () at /home/asias/src/cloudius-systems/osv/arch/x64/processor.hh:242
   cloudius-systems#1  halt_no_interrupts () at /home/asias/src/cloudius-systems/osv/arch/x64/arch.hh:49
   cloudius-systems#2  osv::halt () at /home/asias/src/cloudius-systems/osv/core/power.cc:34
   cloudius-systems#3  0x000000000022360f in abort (fmt=fmt@entry=0x603b60 "Assertion failed: %s (%s: %s: %d)\n") at /home/asias/src/cloudius-systems/osv/runtime.cc:150
   cloudius-systems#4  0x0000000000223649 in __assert_fail (expr=<optimized out>, file=<optimized out>, line=<optimized out>, func=<optimized out>)
       at /home/asias/src/cloudius-systems/osv/runtime.cc:156
   cloudius-systems#5  0x000000000041743b in file::epoll_del (this=<optimized out>, ep=...) at /home/asias/src/cloudius-systems/osv/fs/vfs/kern_descrip.cc:249
   cloudius-systems#6  0x00000000003f7683 in del (key=..., this=0xffff800001746040) at /home/asias/src/cloudius-systems/osv/core/epoll.cc:96
   cloudius-systems#7  epoll_ctl (epfd=<optimized out>, op=<optimized out>, fd=<optimized out>, event=0x2000002f9d50) at /home/asias/src/cloudius-systems/osv/core/epoll.cc:283
   cloudius-systems#8  0x0000100000c032dc in test_epolloneshot () at /home/asias/src/cloudius-systems/osv/tests/tst-epoll.cc:78
   cloudius-systems#9  main (ac=<optimized out>, av=<optimized out>) at /home/asias/src/cloudius-systems/osv/tests/tst-epoll.cc:231
   cloudius-systems#10 0x000000000040510b in osv::application::run_main (this=this@entry=0xffffa00001561f18, path="/tests/tst-epoll.so", argc=argc@entry=1,
       argv=argv@entry=0xffffa00002994760) at /home/asias/src/cloudius-systems/osv/core/app.cc:182
   cloudius-systems#11 0x0000000000405848 in osv::application::run_main (this=this@entry=0xffffa00001561f18) at /home/asias/src/cloudius-systems/osv/core/app.cc:197
   cloudius-systems#12 0x000000000020bbd2 in osv::application::main (this=0xffffa00001561f18) at /home/asias/src/cloudius-systems/osv/core/app.cc:144
   cloudius-systems#13 0x0000000000406699 in operator() (__closure=0x0, app=<optimized out>) at /home/asias/src/cloudius-systems/osv/core/app.cc:104
   cloudius-systems#14 osv::application::__lambda3::_FUN (app=<optimized out>) at /home/asias/src/cloudius-systems/osv/core/app.cc:106
   cloudius-systems#15 0x0000000000433746 in operator() (__closure=0xffffa00003320600) at /home/asias/src/cloudius-systems/osv/libc/pthread.cc:97
   cloudius-systems#16 std::_Function_handler<void(), pthread_private::pthread::pthread(void* (*)(void*), void*, sigset_t, const pthread_private::thread_attr*)::__lambda7>::_M_invoke(const std::_Any_data &) (__functor=...) at /home/asias/src/cloudius-systems/osv/external/x64/gcc.bin/usr/include/c++/4.8.2/functional:2071
   cloudius-systems#17 0x00000000003e38ab in main (this=0xffff80000354d050) at /home/asias/src/cloudius-systems/osv/core/sched.cc:940
   cloudius-systems#18 sched::thread_main_c (t=0xffff80000354d050) at /home/asias/src/cloudius-systems/osv/arch/x64/arch-switch.hh:137
   cloudius-systems#19 0x0000000000389c76 in thread_main () at /home/asias/src/cloudius-systems/osv/arch/x64/entry.S:113
asias added a commit to asias/osv that referenced this issue Aug 28, 2014
After:
    Revert "Revert "epoll: fix up lock ordering issues""
    This reverts commit 408a982.

I saw the following without this patch:

   $ sudo scripts/run.py -c1 -nv -m4g  -e "/tests/tst-epoll.so"
   ...

   PASS: epoll_wait
   PASS: write
   PASS: epoll_wait
   PASS: epoll_ctl MOD
   PASS: epoll_wait
   PASS: epoll_ctl ADD
   PASS: errno == EEXIST
   Assertion failed: i != f_epolls->end() (/home/asias/src/cloudius-systems/osv/fs/vfs/kern_descrip.cc: epoll_del: 249)

   [backtrace]
   0x0000000000223648 <__assert_fail+24>
   0x000000000041743a <file::epoll_del(epoll_ptr)+186>
   0x00000000003f7682 <epoll_ctl+786>
   0x0000100000c032db <???+12595931>
   0x000000000040510a <osv::application::run_main(std::string, int, char**)+266>
   0x0000000000405847 <osv::application::run_main()+551>
   0x000000000020bbd1 <osv::application::main()+97>
   0x0000000000406698 <???+4220568>
   0x0000000000433745 <???+4405061>
   0x00000000003e38aa <thread_main_c+26>
   0x0000000000389c75 <???+3710069>

   (gdb) bt
   #0  0x00000000003f8e92 in cli_hlt () at /home/asias/src/cloudius-systems/osv/arch/x64/processor.hh:242
   cloudius-systems#1  halt_no_interrupts () at /home/asias/src/cloudius-systems/osv/arch/x64/arch.hh:49
   cloudius-systems#2  osv::halt () at /home/asias/src/cloudius-systems/osv/core/power.cc:34
   cloudius-systems#3  0x000000000022360f in abort (fmt=fmt@entry=0x603b60 "Assertion failed: %s (%s: %s: %d)\n") at /home/asias/src/cloudius-systems/osv/runtime.cc:150
   cloudius-systems#4  0x0000000000223649 in __assert_fail (expr=<optimized out>, file=<optimized out>, line=<optimized out>, func=<optimized out>)
       at /home/asias/src/cloudius-systems/osv/runtime.cc:156
   cloudius-systems#5  0x000000000041743b in file::epoll_del (this=<optimized out>, ep=...) at /home/asias/src/cloudius-systems/osv/fs/vfs/kern_descrip.cc:249
   cloudius-systems#6  0x00000000003f7683 in del (key=..., this=0xffff800001746040) at /home/asias/src/cloudius-systems/osv/core/epoll.cc:96
   cloudius-systems#7  epoll_ctl (epfd=<optimized out>, op=<optimized out>, fd=<optimized out>, event=0x2000002f9d50) at /home/asias/src/cloudius-systems/osv/core/epoll.cc:283
   cloudius-systems#8  0x0000100000c032dc in test_epolloneshot () at /home/asias/src/cloudius-systems/osv/tests/tst-epoll.cc:78
   cloudius-systems#9  main (ac=<optimized out>, av=<optimized out>) at /home/asias/src/cloudius-systems/osv/tests/tst-epoll.cc:231
   cloudius-systems#10 0x000000000040510b in osv::application::run_main (this=this@entry=0xffffa00001561f18, path="/tests/tst-epoll.so", argc=argc@entry=1,
       argv=argv@entry=0xffffa00002994760) at /home/asias/src/cloudius-systems/osv/core/app.cc:182
   cloudius-systems#11 0x0000000000405848 in osv::application::run_main (this=this@entry=0xffffa00001561f18) at /home/asias/src/cloudius-systems/osv/core/app.cc:197
   cloudius-systems#12 0x000000000020bbd2 in osv::application::main (this=0xffffa00001561f18) at /home/asias/src/cloudius-systems/osv/core/app.cc:144
   cloudius-systems#13 0x0000000000406699 in operator() (__closure=0x0, app=<optimized out>) at /home/asias/src/cloudius-systems/osv/core/app.cc:104
   cloudius-systems#14 osv::application::__lambda3::_FUN (app=<optimized out>) at /home/asias/src/cloudius-systems/osv/core/app.cc:106
   cloudius-systems#15 0x0000000000433746 in operator() (__closure=0xffffa00003320600) at /home/asias/src/cloudius-systems/osv/libc/pthread.cc:97
   cloudius-systems#16 std::_Function_handler<void(), pthread_private::pthread::pthread(void* (*)(void*), void*, sigset_t, const pthread_private::thread_attr*)::__lambda7>::_M_invoke(const std::_Any_data &) (__functor=...) at /home/asias/src/cloudius-systems/osv/external/x64/gcc.bin/usr/include/c++/4.8.2/functional:2071
   cloudius-systems#17 0x00000000003e38ab in main (this=0xffff80000354d050) at /home/asias/src/cloudius-systems/osv/core/sched.cc:940
   cloudius-systems#18 sched::thread_main_c (t=0xffff80000354d050) at /home/asias/src/cloudius-systems/osv/arch/x64/arch-switch.hh:137
   cloudius-systems#19 0x0000000000389c76 in thread_main () at /home/asias/src/cloudius-systems/osv/arch/x64/entry.S:113
asias added a commit to asias/osv that referenced this issue Aug 28, 2014
The problem is that when epoll_wait removes the key from _file map, it
does not remove the key from _epoll map, so when epoll_ctrl(DEL) tries
to remove _file_map again, we hit the assert.

This fixes the following.

After:
    Revert "Revert "epoll: fix up lock ordering issues""

I saw:

   $ sudo scripts/run.py -c1 -nv -m4g  -e "/tests/tst-epoll.so"
   ...

   PASS: epoll_wait
   PASS: write
   PASS: epoll_wait
   PASS: epoll_ctl MOD
   PASS: epoll_wait
   PASS: epoll_ctl ADD
   PASS: errno == EEXIST
   Assertion failed: i != f_epolls->end() (/home/asias/src/cloudius-systems/osv/fs/vfs/kern_descrip.cc: epoll_del: 249)

   [backtrace]
   0x0000000000223648 <__assert_fail+24>
   0x000000000041743a <file::epoll_del(epoll_ptr)+186>
   0x00000000003f7682 <epoll_ctl+786>
   0x0000100000c032db <???+12595931>
   0x000000000040510a <osv::application::run_main(std::string, int, char**)+266>
   0x0000000000405847 <osv::application::run_main()+551>
   0x000000000020bbd1 <osv::application::main()+97>
   0x0000000000406698 <???+4220568>
   0x0000000000433745 <???+4405061>
   0x00000000003e38aa <thread_main_c+26>
   0x0000000000389c75 <???+3710069>

   (gdb) bt
   #0  0x00000000003f8e92 in cli_hlt () at /home/asias/src/cloudius-systems/osv/arch/x64/processor.hh:242
   cloudius-systems#1  halt_no_interrupts () at /home/asias/src/cloudius-systems/osv/arch/x64/arch.hh:49
   cloudius-systems#2  osv::halt () at /home/asias/src/cloudius-systems/osv/core/power.cc:34
   cloudius-systems#3  0x000000000022360f in abort (fmt=fmt@entry=0x603b60 "Assertion failed: %s (%s: %s: %d)\n") at /home/asias/src/cloudius-systems/osv/runtime.cc:150
   cloudius-systems#4  0x0000000000223649 in __assert_fail (expr=<optimized out>, file=<optimized out>, line=<optimized out>, func=<optimized out>)
       at /home/asias/src/cloudius-systems/osv/runtime.cc:156
   cloudius-systems#5  0x000000000041743b in file::epoll_del (this=<optimized out>, ep=...) at /home/asias/src/cloudius-systems/osv/fs/vfs/kern_descrip.cc:249
   cloudius-systems#6  0x00000000003f7683 in del (key=..., this=0xffff800001746040) at /home/asias/src/cloudius-systems/osv/core/epoll.cc:96
   cloudius-systems#7  epoll_ctl (epfd=<optimized out>, op=<optimized out>, fd=<optimized out>, event=0x2000002f9d50) at /home/asias/src/cloudius-systems/osv/core/epoll.cc:283
   cloudius-systems#8  0x0000100000c032dc in test_epolloneshot () at /home/asias/src/cloudius-systems/osv/tests/tst-epoll.cc:78
   cloudius-systems#9  main (ac=<optimized out>, av=<optimized out>) at /home/asias/src/cloudius-systems/osv/tests/tst-epoll.cc:231
   cloudius-systems#10 0x000000000040510b in osv::application::run_main (this=this@entry=0xffffa00001561f18, path="/tests/tst-epoll.so", argc=argc@entry=1,
       argv=argv@entry=0xffffa00002994760) at /home/asias/src/cloudius-systems/osv/core/app.cc:182
   cloudius-systems#11 0x0000000000405848 in osv::application::run_main (this=this@entry=0xffffa00001561f18) at /home/asias/src/cloudius-systems/osv/core/app.cc:197
   cloudius-systems#12 0x000000000020bbd2 in osv::application::main (this=0xffffa00001561f18) at /home/asias/src/cloudius-systems/osv/core/app.cc:144
   cloudius-systems#13 0x0000000000406699 in operator() (__closure=0x0, app=<optimized out>) at /home/asias/src/cloudius-systems/osv/core/app.cc:104
   cloudius-systems#14 osv::application::__lambda3::_FUN (app=<optimized out>) at /home/asias/src/cloudius-systems/osv/core/app.cc:106
   cloudius-systems#15 0x0000000000433746 in operator() (__closure=0xffffa00003320600) at /home/asias/src/cloudius-systems/osv/libc/pthread.cc:97
   cloudius-systems#16 std::_Function_handler<void(), pthread_private::pthread::pthread(void* (*)(void*), void*, sigset_t, const pthread_private::thread_attr*)::__lambda7>::_M_invoke(const std::_Any_data &) (__functor=...) at /home/asias/src/cloudius-systems/osv/external/x64/gcc.bin/usr/include/c++/4.8.2/functional:2071
   cloudius-systems#17 0x00000000003e38ab in main (this=0xffff80000354d050) at /home/asias/src/cloudius-systems/osv/core/sched.cc:940
   cloudius-systems#18 sched::thread_main_c (t=0xffff80000354d050) at /home/asias/src/cloudius-systems/osv/arch/x64/arch-switch.hh:137
   cloudius-systems#19 0x0000000000389c76 in thread_main () at /home/asias/src/cloudius-systems/osv/arch/x64/entry.S:113

Signed-off-by: Asias He <asias@cloudius-systems.com>
avikivity pushed a commit that referenced this issue Aug 28, 2014
The problem is that when epoll_wait removes the key from _file map, it
does not remove the key from _epoll map, so when epoll_ctrl(DEL) tries
to remove _file_map again, we hit the assert.

This fixes the following.

After:
    Revert "Revert "epoll: fix up lock ordering issues""

I saw:

   $ sudo scripts/run.py -c1 -nv -m4g  -e "/tests/tst-epoll.so"
   ...

   PASS: epoll_wait
   PASS: write
   PASS: epoll_wait
   PASS: epoll_ctl MOD
   PASS: epoll_wait
   PASS: epoll_ctl ADD
   PASS: errno == EEXIST
   Assertion failed: i != f_epolls->end() (/home/asias/src/cloudius-systems/osv/fs/vfs/kern_descrip.cc: epoll_del: 249)

   [backtrace]
   0x0000000000223648 <__assert_fail+24>
   0x000000000041743a <file::epoll_del(epoll_ptr)+186>
   0x00000000003f7682 <epoll_ctl+786>
   0x0000100000c032db <???+12595931>
   0x000000000040510a <osv::application::run_main(std::string, int, char**)+266>
   0x0000000000405847 <osv::application::run_main()+551>
   0x000000000020bbd1 <osv::application::main()+97>
   0x0000000000406698 <???+4220568>
   0x0000000000433745 <???+4405061>
   0x00000000003e38aa <thread_main_c+26>
   0x0000000000389c75 <???+3710069>

   (gdb) bt
   #0  0x00000000003f8e92 in cli_hlt () at /home/asias/src/cloudius-systems/osv/arch/x64/processor.hh:242
   #1  halt_no_interrupts () at /home/asias/src/cloudius-systems/osv/arch/x64/arch.hh:49
   #2  osv::halt () at /home/asias/src/cloudius-systems/osv/core/power.cc:34
   #3  0x000000000022360f in abort (fmt=fmt@entry=0x603b60 "Assertion failed: %s (%s: %s: %d)\n") at /home/asias/src/cloudius-systems/osv/runtime.cc:150
   #4  0x0000000000223649 in __assert_fail (expr=<optimized out>, file=<optimized out>, line=<optimized out>, func=<optimized out>)
       at /home/asias/src/cloudius-systems/osv/runtime.cc:156
   #5  0x000000000041743b in file::epoll_del (this=<optimized out>, ep=...) at /home/asias/src/cloudius-systems/osv/fs/vfs/kern_descrip.cc:249
   #6  0x00000000003f7683 in del (key=..., this=0xffff800001746040) at /home/asias/src/cloudius-systems/osv/core/epoll.cc:96
   #7  epoll_ctl (epfd=<optimized out>, op=<optimized out>, fd=<optimized out>, event=0x2000002f9d50) at /home/asias/src/cloudius-systems/osv/core/epoll.cc:283
   #8  0x0000100000c032dc in test_epolloneshot () at /home/asias/src/cloudius-systems/osv/tests/tst-epoll.cc:78
   #9  main (ac=<optimized out>, av=<optimized out>) at /home/asias/src/cloudius-systems/osv/tests/tst-epoll.cc:231
   #10 0x000000000040510b in osv::application::run_main (this=this@entry=0xffffa00001561f18, path="/tests/tst-epoll.so", argc=argc@entry=1,
       argv=argv@entry=0xffffa00002994760) at /home/asias/src/cloudius-systems/osv/core/app.cc:182
   #11 0x0000000000405848 in osv::application::run_main (this=this@entry=0xffffa00001561f18) at /home/asias/src/cloudius-systems/osv/core/app.cc:197
   #12 0x000000000020bbd2 in osv::application::main (this=0xffffa00001561f18) at /home/asias/src/cloudius-systems/osv/core/app.cc:144
   #13 0x0000000000406699 in operator() (__closure=0x0, app=<optimized out>) at /home/asias/src/cloudius-systems/osv/core/app.cc:104
   #14 osv::application::__lambda3::_FUN (app=<optimized out>) at /home/asias/src/cloudius-systems/osv/core/app.cc:106
   #15 0x0000000000433746 in operator() (__closure=0xffffa00003320600) at /home/asias/src/cloudius-systems/osv/libc/pthread.cc:97
   #16 std::_Function_handler<void(), pthread_private::pthread::pthread(void* (*)(void*), void*, sigset_t, const pthread_private::thread_attr*)::__lambda7>::_M_invoke(const std::_Any_data &) (__functor=...) at /home/asias/src/cloudius-systems/osv/external/x64/gcc.bin/usr/include/c++/4.8.2/functional:2071
   #17 0x00000000003e38ab in main (this=0xffff80000354d050) at /home/asias/src/cloudius-systems/osv/core/sched.cc:940
   #18 sched::thread_main_c (t=0xffff80000354d050) at /home/asias/src/cloudius-systems/osv/arch/x64/arch-switch.hh:137
   #19 0x0000000000389c76 in thread_main () at /home/asias/src/cloudius-systems/osv/arch/x64/entry.S:113

Signed-off-by: Asias He <asias@cloudius-systems.com>
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
nyh pushed a commit that referenced this issue Dec 9, 2019
New versions of boost like 1.69.0 have more complicated
dependendencies and cause issues with some of the unit tests.

```
tst-rcu-list.so
  libboost_unit_test_framework.so.1.69.0
    libboost_timer.so.1.69.0
      libboost_chrono.so.1.69.0
        libboost_system.so.1.69.0
        libgcc_s.so.1
      libboost_system.so.1.69.0
      libgcc_s.so.1
    libboost_system.so.1.69.0
    libgcc_s.so.1
  libboost_filesystem.so.1.69.0
    libboost_system.so.1.69.0
    libgcc_s.so.1
  libgcc_s.so.1
```

More specifically unit tests like tst-rcu-list and tst-rcu-hashtable
crash when executing FINI functions while unloading boost libraries
with a stack trace like this:

```
#0  0x00000000403a3cf2 in processor::cli_hlt () at arch/x64/processor.hh:247
#1  arch::halt_no_interrupts () at arch/x64/arch.hh:48
#2  osv::halt () at arch/x64/power.cc:26
#3  0x0000000040239dae in abort (fmt=fmt@entry=0x4064155f "Aborted\n") at runtime.cc:132
#4  0x00000000402028ab in abort () at runtime.cc:98
#5  0x000000004021981e in osv::generate_signal (siginfo=..., ef=0xffff80000123c068) at libc/signal.cc:124
#6  0x0000000040463f7f in osv::handle_mmap_fault (addr=<optimized out>, sig=<optimized out>, ef=<optimized out>) at libc/signal.cc:139
#7  0x000000004033da62 in mmu::vm_fault (addr=17592187015168, addr@entry=17592187015648, ef=ef@entry=0xffff80000123c068) at core/mmu.cc:1337
#8  0x000000004039dc30 in page_fault (ef=0xffff80000123c068) at arch/x64/mmu.cc:42
#9  <signal handler called>
#10 0x00001000000ed1e0 in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::~basic_string() ()
#11 0x000000004023a357 in __cxxabiv1::__cxa_finalize (dso=<optimized out>) at runtime.cc:183
#12 0x000010000041f5f7 in __do_global_dtors_aux ()
#13 0x00002000001008f0 in ?? ()
#14 0x0000000040352f34 in elf::object::run_fini_funcs (this=0xffffa000015d0a00) at core/elf.cc:1105
```

The order in which FINI functions are executed currently follows
the order in which its objects are unloaded. It is not exactly clear why,
but there seems to be some dependency between child
objects in the DT_NEEDED list of paricular object. For example,
if in tst-rcu-list.so the libboost_unit_test_framework.so.1.69.0
is unloaded first and libboost_filesystem.so.1.69.0 unloaded later,
the std string destructor called by libboost_filesystem.so.1.69.0 FINI
function causes page fault shown above.

The commit #4d24b90aa38d90bdcf5ebecf7a5badeca3b657eb from 6 years ago
made process of unloading of objects symmetrical to the loading one
but not quite exactly which seems to be a culprit.

This patch refines the `elf::unload_needed()` method by changing the order
in which the dependent object references are released.

First we release references held by the `_used_by_resolve_plt_got` set
mostly because they are added after DT_NEEDED is processed and `_needed` populated and
we want this to be symmetrical.

Secondly we release references from the `_needed` vector
by iterating and removing elements starting at the end. This
also makes the unloading behavior symmetrical to the order
in which the objects from DT_NEEDED were loaded in first place.

Concluding with this patch all the unit tests now
pass when linked the boost 1.69.0 from Fedora 30 host.

Signed-off-by: Waldemar Kozaczuk <jwkozaczuk@gmail.com>
Message-Id: <20191208185437.21976-1-jwkozaczuk@gmail.com>
nyh pushed a commit that referenced this issue Mar 29, 2021
…ssembly

This patch adds necessary CFI (Call Frame Information) directives
to the assembly code in entry.S and macros.S to make stack traces
for page faults and interrupt handler look correct. Otherwise
debugging scenarios that involve those becomes very painful.

Before the patch:
-----------------
#0  mmu::file_vma::fault (this=0xffffa00040c9b280, addr=17592186171392, ef=0x200000100110) at core/mmu.cc:1708
#1  0x00000000401da05c in mmu::vm_fault (addr=17592186171392, ef=0x200000100110) at core/mmu.cc:1354
#2  0x000000004020ba9c in page_fault (ef=0x200000100110) at arch/aarch64/mmu.cc:50
#3  0x000000004020b82c in handle_mem_abort () at arch/aarch64/entry.S:161
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

After the patch:
----------------
#0  mmu::file_vma::fault (this=0xffffa00040c9b280, addr=17592186171392, ef=0x200000100110) at core/mmu.cc:1708
#1  0x00000000401da05c in mmu::vm_fault (addr=17592186171392, ef=0x200000100110) at core/mmu.cc:1354
#2  0x000000004020ba9c in page_fault (ef=0x200000100110) at arch/aarch64/mmu.cc:50
#3  <signal handler called>
#4  elf::object::_dynamic_tag (tag=29, this=0xffffa00040dcf800) at core/elf.cc:649
#5  elf::object::dynamic_exists (tag=29, this=0xffffa00040dcf800) at core/elf.cc:641
#6  elf::object::load_needed (this=0xffffa00040dcf800, loaded_objects=std::vector of length 1, capacity 1 = {...}) at core/elf.cc:1068
#7  0x00000000401f1bd4 in elf::program::load_object (this=this@entry=0xffffa00040897ba0, name="/libvdso.so", extra_path=std::vector of length 0, capacity 0,
    loaded_objects=std::vector of length 1, capacity 1 = {...}) at build/downloaded_packages/aarch64/gcc/install/usr/include/c++/10/bits/shared_ptr_base.h:1321
#8  0x00000000401f24fc in elf::program::get_library (this=0xffffa00040897ba0, name="libvdso.so", extra_path=std::vector of length 0, capacity 0, delay_init=false)
    at core/elf.cc:1459
#9  0x0000000040314ea0 in osv::application::prepare_argv (this=this@entry=0xffffa00040dcf610, program=program@entry=0xffffa00040897ba0) at core/app.cc:361
#10 0x0000000040315694 in osv::application::application(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, bool, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::function<void ()>) (this=0xffffa00040dcf610, command="/tests/tst-hello.so", args=..., new_program=false, env=0x0, main_function_name="main", post_main=...)
    at core/app.cc:182
...

Before the patch:
-----------------
#0  virtio::blk::ack_irq (this=0xffffa00040a91c00) at drivers/virtio-blk.cc:103
#1  operator() (__closure=0xffffa00040950e40) at drivers/virtio-blk.cc:143
#2  std::__invoke_impl<bool, virtio::blk::blk(virtio::virtio_device&)::<lambda(pci::device&)>::<lambda()>&> (__f=...)
    at build/downloaded_packages/aarch64/gcc/install/usr/include/c++/10/bits/invoke.h:60
#3  std::__invoke_r<bool, virtio::blk::blk(virtio::virtio_device&)::<lambda(pci::device&)>::<lambda()>&> (__fn=...)
    at build/downloaded_packages/aarch64/gcc/install/usr/include/c++/10/bits/invoke.h:141
#4  std::_Function_handler<bool(), virtio::blk::blk(virtio::virtio_device&)::<lambda(pci::device&)>::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...)
    at build/downloaded_packages/aarch64/gcc/install/usr/include/c++/10/bits/std_function.h:291
#5  0x000000004020c784 in std::function<bool ()>::operator()() const (this=<optimized out>)
    at build/downloaded_packages/aarch64/gcc/install/usr/include/c++/10/bits/std_function.h:622
#6  interrupt_table::invoke_interrupt (this=this@entry=0x40780968 <idt>, id=id@entry=36) at arch/aarch64/exceptions.cc:135
#7  0x000000004020c95c in interrupt (frame=<optimized out>) at arch/aarch64/exceptions.cc:167
#8  0x000000004020b930 in entry_irq () at arch/aarch64/entry.S:175
#9  0x000000004020b930 in entry_irq () at arch/aarch64/entry.S:175
#10 0x000000004020b930 in entry_irq () at arch/aarch64/entry.S:175
#11 0x000000004020b930 in entry_irq () at arch/aarch64/entry.S:175
#12 0x000000004020b930 in entry_irq () at arch/aarch64/entry.S:175
#13 0x000000004020b930 in entry_irq () at arch/aarch64/entry.S:175
... (infinite number of frames)

After the patch:
----------------
#0  virtio::blk::ack_irq (this=0xffffa00040a91c00) at drivers/virtio-blk.cc:103
#1  operator() (__closure=0xffffa00040950e40) at drivers/virtio-blk.cc:143
#2  std::__invoke_impl<bool, virtio::blk::blk(virtio::virtio_device&)::<lambda(pci::device&)>::<lambda()>&> (__f=...)
    at build/downloaded_packages/aarch64/gcc/install/usr/include/c++/10/bits/invoke.h:60
#3  std::__invoke_r<bool, virtio::blk::blk(virtio::virtio_device&)::<lambda(pci::device&)>::<lambda()>&> (__fn=...)
    at build/downloaded_packages/aarch64/gcc/install/usr/include/c++/10/bits/invoke.h:141
#4  std::_Function_handler<bool(), virtio::blk::blk(virtio::virtio_device&)::<lambda(pci::device&)>::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...)
    at build/downloaded_packages/aarch64/gcc/install/usr/include/c++/10/bits/std_function.h:291
#5  0x000000004020c784 in std::function<bool ()>::operator()() const (this=<optimized out>)
    at build/downloaded_packages/aarch64/gcc/install/usr/include/c++/10/bits/std_function.h:622
#6  interrupt_table::invoke_interrupt (this=this@entry=0x40780968 <idt>, id=id@entry=36) at arch/aarch64/exceptions.cc:135
#7  0x000000004020c95c in interrupt (frame=<optimized out>) at arch/aarch64/exceptions.cc:167
#8  <signal handler called>
#9  sched::cpu::do_idle (this=this@entry=0xffff8000407fe040) at core/sched.cc:454
#10 0x00000000402e6b78 in sched::cpu::idle (this=0xffff8000407fe040) at core/sched.cc:472
#11 0x00000000402e6ba0 in operator() (__closure=<optimized out>, __closure=<optimized out>) at core/sched.cc:165
#12 std::__invoke_impl<void, sched::cpu::init_idle_thread()::<lambda()>&> (__f=...)
    at build/downloaded_packages/aarch64/gcc/install/usr/include/c++/10/bits/invoke.h:60
#13 std::__invoke_r<void, sched::cpu::init_idle_thread()::<lambda()>&> (__fn=...) at build/downloaded_packages/aarch64/gcc/install/usr/include/c++/10/bits/invoke.h:153
#14 std::_Function_handler<void(), sched::cpu::init_idle_thread()::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...)
    at build/downloaded_packages/aarch64/gcc/install/usr/include/c++/10/bits/std_function.h:291
#15 0x00000000402e72b4 in sched::thread::main (this=0xffffa000408eea48) at core/sched.cc:1267
#16 sched::thread_main_c (t=0xffffa000408eea48) at arch/aarch64/arch-switch.hh:162
#17 0x000000004020b788 in thread_main () at arch/aarch64/entry.S:114

Refs #1128

Signed-off-by: Waldemar Kozaczuk <jwkozaczuk@gmail.com>
Message-Id: <20210329214843.31992-1-jwkozaczuk@gmail.com>
nyh pushed a commit that referenced this issue May 11, 2022
The manifest skeleton files like usr.manifest.skel have an entry
for libgcc_s_dir looking like this:

/usr/lib/libgcc_s.so.1: %(libgcc_s_dir)s/libgcc_s.so.1

This actually gets expanded quite late during the build process
by upload_manifest.py. The unfortunate consequence of this is that
loader.py used during debugging which reads usr.manifest does not have
any logic to load libgcc_s.so.1. And this makes stack traces
look useless in those cases that involve libgcc_s.so.1.

So this patch slightly changes the scripts/build and scripts/module.py
to expand %(libgcc_s_dir) when writing to build/release/usr.manifest.
As a result of this the stack trace of the crash I have been working
on looks much more reasonable:

(gdb) bt
'#0  0x00000000403047c2 in processor::cli_hlt () at arch/x64/processor.hh:247
#1  arch::halt_no_interrupts () at arch/x64/arch.hh:48
#2  osv::halt () at arch/x64/power.cc:29
#3  0x0000000040239504 in abort (fmt=fmt@entry=0x405b1e93 "Aborted\n") at runtime.cc:142
#4  0x0000000040202e80 in abort () at runtime.cc:106
#5  0x000010000002b6b6 in ?? ()
#6  0x000010000003f5cb in _Unwind_Resume ()
#7  0x0000100000062daa in ?? ()
#8  0x0000100000075b5c in boost::execution_monitor::vexecute(boost::function<void ()> const&) ()
#9  0x000010000007f0a9 in boost::unit_test::framework::init(bool (*)(), int, char**) ()
#10 0x000010000009254d in boost::unit_test::unit_test_main(bool (*)(), int, char**) ()
#11 0x000000004039d021 in osv::application::run_main (this=0xffffa00000bd8c10) at core/app.cc:416
#12 0x000000004039d22d in operator() (app=<optimized out>, __closure=0x0) at core/app.cc:236
#13 _FUN () at core/app.cc:238
#14 0x00000000403d089a in operator() (__closure=0xffffa00000d57800) at libc/pthread.cc:116
#15 std::__invoke_impl<void, pthread_private::pthread::pthread(void* (*)(void*), void*, sigset_t, const pthread_private::thread_attr*)::<lambda()>&> (__f=...) at /usr/include/c++/11/bits/invoke.h:61
#16 std::__invoke_r<void, pthread_private::pthread::pthread(void* (*)(void*), void*, sigset_t, const pthread_private::thread_attr*)::<lambda()>&> (__fn=...) at /usr/include/c++/11/bits/invoke.h:154
#17 std::_Function_handler<void(), pthread_private::pthread::pthread(void* (*)(void*), void*, sigset_t, const pthread_private::thread_attr*)::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...) at /usr/include/c++/11/bits/std_function.h:290
#18 0x000000004036b5ae in sched::thread::main (this=0xffff800000f6a040) at core/sched.cc:1267
#19 sched::thread_main_c (t=0xffff800000f6a040) at arch/x64/arch-switch.hh:325
#20 0x00000000402fda43 in thread_main () at arch/x64/entry.S:116

Signed-off-by: Waldemar Kozaczuk <jwkozaczuk@gmail.com>
Message-Id: <20220511015843.33294-1-jwkozaczuk@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants