Psi 4.0b5 with ictce-5.5.0 and MPI #434

Closed
wpoely86 wants to merge 8 commits from the psi4b5-ictce-mpi branch

Conversation

wpoely86
Member

I've built it on raichu and everything seems to work. The test suite takes a long time but finished successfully.

@boegel
Member

boegel commented Sep 13, 2013

Hold on, now I'm confused... So, nothing has changed compared to the stuff that was included in #433 w.r.t. the mpi easyconfig for PSI?
Why is the -mt (which was working) not included anymore (since you closed #433)?

@wpoely86 wpoely86 closed this Sep 13, 2013
@wpoely86 wpoely86 reopened this Sep 13, 2013
@wpoely86
Member Author

Oops, I pushed the wrong branch. This one should be correct.

@boegel, yeah, nothing has changed except that the -mt is split off into a separate branch. That pull request is coming soon ;-)

The test suite has run on raichu and every test finished successfully.

@boegel
Member

boegel commented Sep 16, 2013

Note: on hold until the issue with hanging MPI tests is sorted out (see discussion in #433).

A variable is changed without proper mutex locking.
@wpoely86
Member Author

@boegel I've debugged one case where the test mpn-hb hung and added a patch. But I'm unable to reproduce the hangs consistently. Can you rerun the test suite with the latest eb?

A variable is changed without proper mutex locking.
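
For readers unfamiliar with this class of bug, here is a minimal Python sketch of the general pattern the commit message refers to: a shared counter that a waiter checks through a condition variable must only be changed while the associated lock is held, otherwise the waiter can miss the update and block forever. The `WorkQueue` class and its method names are hypothetical and only illustrate the idea; this is not the PSI4 thread pool code or the actual patch.

```python
import threading


class WorkQueue:
    """Toy work queue; all names here are illustrative, not PSI4's."""

    def __init__(self):
        # A Condition bundles a mutex with a condition variable.
        self._cond = threading.Condition()
        # Shared state: only read or written while holding self._cond.
        self._pending = 0

    def add(self):
        with self._cond:
            self._pending += 1

    def done(self):
        # Change the counter and notify under the same lock, so a thread
        # blocked in close() cannot miss the final update.
        with self._cond:
            self._pending -= 1
            if self._pending == 0:
                self._cond.notify_all()

    def close(self, timeout=None):
        # wait_for() re-checks the predicate each time it wakes up and
        # returns the predicate's last value.
        with self._cond:
            return self._cond.wait_for(lambda: self._pending == 0, timeout)


def run_demo(n_tasks=100, n_workers=4):
    """Queue n_tasks items, let the workers finish them, wait for the drain."""
    queue = WorkQueue()
    for _ in range(n_tasks):
        queue.add()

    def worker(count):
        for _ in range(count):
            queue.done()

    threads = [threading.Thread(target=worker, args=(n_tasks // n_workers,))
               for _ in range(n_workers)]
    for t in threads:
        t.start()
    drained = queue.close(timeout=10)
    for t in threads:
        t.join()
    return drained
```

If `done()` decremented the counter outside the `with` block, the update could land between the waiter's predicate check and its wait, which is one way a thread ends up stuck in a condition wait.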
@wpoely86
Member Author

OK, now with the patch in the commit.

Feel free to squash the last 2 commits into one when you merge.

@wpoely86
Member Author

OK, I've tried again and I'm still unable to reproduce the hang. As far as I can tell, it doesn't hang anymore...

 I've added two env vars to the configure script: one for the install
 directory for the source and one for the install directory of the
 objects.
@boegel
Member

boegel commented Oct 2, 2013

@wpoely86: The new-plugin patch you added is basically a rewrite of the configure script... Can you describe what's going on there? Why is the diff so big, what changed exactly, ...?

@wpoely86
Member Author

wpoely86 commented Oct 2, 2013

@boegel The diff is so big because the configure script is regenerated. The real difference is in configure.ac, include/psiconfig.h.in and src/bin/psi4/create_new_plugin.cc (the bottom part of the patch). You can ignore the actual configure script, as it is autogenerated from configure.ac.

I haven't pushed this patch upstream. It's on my todo list. The thread pool and the MPI patch are already merged upstream.

@boegel
Member

boegel commented Oct 2, 2013

@wpoely86: OK, thank you for clarifying that. Please do the following to make that clearer:

  • add comments at the top of the patch file describing what the patch does (summarized), and mention it's so big because of the regenerated configure script
  • move all the non-configure parts of the patch to the top of the file, to highlight the actual changes
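
The reordering requested above could be scripted roughly as follows. This is a sketch under assumptions, not the procedure used for this PR: it assumes every per-file section of the patch starts with a line beginning with `diff `, and the helper name `reorder_patch` is made up for illustration. Lines before the first section (for example a descriptive comment header) are kept at the top.

```python
def reorder_patch(patch_text, generated=("configure",)):
    """Move per-file sections for generated files to the end of a patch.

    Assumption: each per-file section starts with a line beginning with
    "diff "; anything before the first such line is kept as the header.
    """
    header, sections = [], []
    for line in patch_text.splitlines(keepends=True):
        if line.startswith("diff "):
            sections.append([line])      # start of a new per-file section
        elif sections:
            sections[-1].append(line)    # body of the current section
        else:
            header.append(line)          # leading comments before any diff

    def is_generated(section):
        # The last token on the "diff ..." line is the new file's path;
        # compare only its base name against the generated file names.
        path = section[0].split()[-1]
        return path.rsplit("/", 1)[-1] in generated

    handwritten = [s for s in sections if not is_generated(s)]
    regenerated = [s for s in sections if is_generated(s)]
    ordered = handwritten + regenerated
    return "".join(header) + "".join("".join(s) for s in ordered)
```

Because the match is on the exact base name, a section for `configure.ac` stays with the hand-written changes while the section for the regenerated `configure` script moves to the end.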

@wpoely86
Member Author

wpoely86 commented Oct 2, 2013

Done.

@boegel
Member

boegel commented Oct 2, 2013

@wpoely86: Excellent, thanks. I'm (re)testing the PSI builds on my end, and I'll merge this (and #439) in when they succeed.

@wpoely86
Member Author

wpoely86 commented Oct 2, 2013

@boegel if it hangs, please don't kill it. Attach gdb to it and give a backtrace of all threads. (Or give me access if that's possible).
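
A backtrace of all threads can be collected without an interactive session. The sketch below wraps a gdb invocation in Python; `-p`, `-batch` and `-ex` are standard gdb command-line options and `thread apply all bt` is the gdb command that prints a backtrace for every thread (plain `bt` only covers the currently selected thread). The function names are illustrative, and the script assumes gdb is on the PATH and that the caller is allowed to attach to the target process.

```python
import subprocess


def gdb_backtrace_command(pid):
    """Build a gdb invocation that dumps a backtrace for every thread of pid."""
    return [
        "gdb",
        "-p", str(pid),                 # attach to the running process
        "-batch",                       # run the commands below, then exit
        "-ex", "thread apply all bt",   # backtrace of all threads
    ]


def dump_backtraces(pid):
    """Run gdb against pid and return whatever it printed on stdout."""
    # Assumption: gdb is installed and attaching to pid is permitted.
    result = subprocess.run(gdb_backtrace_command(pid),
                            capture_output=True, text=True, check=False)
    return result.stdout
```

When gdb exits after attaching this way it detaches from the process rather than killing it, so the hung process stays available for further inspection.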

@boegel
Member

boegel commented Oct 2, 2013

@wpoely86: Ok, I will.

@boegel
Member

boegel commented Oct 2, 2013

@wpoely86: The PSI-4.0b5-ictce-5.5.0.eb build is hanging; here's the GDB stack trace after attaching to the psi4 process:

(gdb) bt
#0  0x00002aaaab27343c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x0000000000f237ca in psi::detci::tpool_queue_close(psi::detci::tpool*, int) ()
#2  0x0000000000f1e13f in psi::detci::s3_block_v(psi::detci::stringwr*, psi::detci::stringwr*, double**, double**, double*, int, int, int, int, int, int, int, int, double**, double*, double*, double*, int*, int*) ()
#3  0x0000000000ef8fa7 in psi::detci::sigma_block(psi::detci::stringwr**, psi::detci::stringwr**, double**, double**, double*, double*, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int) ()
#4  0x0000000000ef9620 in psi::detci::sigma_b(psi::detci::stringwr**, psi::detci::stringwr**, psi::detci::CIvect&, psi::detci::CIvect&, double*, double*, int, int) ()
#5  0x0000000000ef7f12 in psi::detci::sigma(psi::detci::stringwr**, psi::detci::stringwr**, psi::detci::CIvect&, psi::detci::CIvect&, double*, double*, int, int) ()
#6  0x0000000000f03855 in psi::detci::mpn_generator(psi::detci::CIvect&, psi::detci::stringwr**, psi::detci::stringwr**) ()
#7  0x0000000000ed4905 in psi::detci::mpn(psi::detci::stringwr**, psi::detci::stringwr**) ()
#8  0x0000000000ecfa69 in psi::detci::detci(psi::Options&) ()
#9  0x000000000072b3be in py_psi_detci() ()
#10 0x000000000074be6e in boost::python::objects::caller_py_function_impl<boost::python::detail::caller<double (*)(), boost::python::default_call_policies, boost::mpl::vector1<double> > >::operator()(_object*, _object*) ()
#11 0x00002aaaacf16c9f in boost::python::objects::function::call(_object*, _object*) const ()
   from /user/scratch/gent/vsc400/vsc40023/easybuild_REGTEST/SL6/sandybridge/software/Boost/1.53.0-ictce-5.5.0-Python-2.7.5/lib/libboost_python.so.1.53.0
#12 0x00002aaaacf169c5 in boost::detail::function::void_function_ref_invoker0<boost::python::objects::(anonymous namespace)::bind_return, void>::invoke(boost::detail::function::function_buffer&) () from /user/scratch/gent/vsc400/vsc40023/easybuild_REGTEST/SL6/sandybridge/software/Boost/1.53.0-ictce-5.5.0-Python-2.7.5/lib/libboost_python.so.1.53.0
#13 0x00002aaaacf2150f in boost::python::detail::exception_handler::operator()(boost::function0<void> const&) const ()
   from /user/scratch/gent/vsc400/vsc40023/easybuild_REGTEST/SL6/sandybridge/software/Boost/1.53.0-ictce-5.5.0-Python-2.7.5/lib/libboost_python.so.1.53.0
#14 0x0000000000740588 in boost::detail::function::function_obj_invoker2<boost::_bi::bind_t<bool, boost::python::detail::translate_exception<psi::PsiException, void (*)(psi::PsiException const&)>, boost::_bi::list3<boost::arg<1>, boost::arg<2>, boost::_bi::value<void (*)(psi::PsiException const&)> > >, bool, boost::python::detail::exception_handler const&, boost::function0<void> const&>::invoke(boost::detail::function::function_buffer&, boost::python::detail::exception_handler const&, boost::function0<void> const&) ()
#15 0x00002aaaacf218ca in boost::python::handle_exception_impl(boost::function0<void>) ()
   from /user/scratch/gent/vsc400/vsc40023/easybuild_REGTEST/SL6/sandybridge/software/Boost/1.53.0-ictce-5.5.0-Python-2.7.5/lib/libboost_python.so.1.53.0
#16 0x00002aaaacf18567 in function_call ()
   from /user/scratch/gent/vsc400/vsc40023/easybuild_REGTEST/SL6/sandybridge/software/Boost/1.53.0-ictce-5.5.0-Python-2.7.5/lib/libboost_python.so.1.53.0
#17 0x00002aaaae031acb in PyObject_Call (func=0x43edc6c, arg=0x80, kw=0x191) at Objects/abstract.c:2529
#18 0x00002aaaae110354 in update_keyword_args (pp_stack=0x43edc6c, oparg=128) at Python/ceval.c:4239
#19 do_call (pp_stack=0x43edc6c, oparg=128) at Python/ceval.c:4211
#20 call_function (pp_stack=0x43edc6c, oparg=128) at Python/ceval.c:4044
#21 0x00002aaaae10871f in PyEval_EvalFrameEx (f=0x43edc6c, throwflag=128) at Python/ceval.c:2666
#22 0x00002aaaae10ec1e in PyEval_EvalCodeEx (co=0x43edc6c, globals=0x80, locals=0x191, args=0xffffffffffffffff, argcount=0, kws=0xc8, kwcount=1, defs=0x0, defcount=0, closure=0x0)
    at Python/ceval.c:3253
#23 0x00002aaaae06e570 in function_call (func=0x43edc6c, arg=0x80, kw=0x191) at Objects/funcobject.c:526
#24 0x00002aaaae031acb in PyObject_Call (func=0x43edc6c, arg=0x80, kw=0x191) at Objects/abstract.c:2529
#25 0x00002aaaae10f9a7 in ext_do_call (func=0x43edc6c, pp_stack=0x80, flags=401, na=-1, nk=0) at Python/ceval.c:4334
#26 0x00002aaaae1082a6 in PyEval_EvalFrameEx (f=0x43edc6c, throwflag=128) at Python/ceval.c:2705
#27 0x00002aaaae10ec1e in PyEval_EvalCodeEx (co=0x43edc6c, globals=0x80, locals=0x191, args=0xffffffffffffffff, argcount=0, kws=0xc8, kwcount=0, defs=0x0, defcount=0, closure=0x0)
    at Python/ceval.c:3253
#28 0x00002aaaae110668 in fast_function (pp_stack=0x43edc6c, oparg=128) at Python/ceval.c:4117
#29 call_function (pp_stack=0x43edc6c, oparg=128) at Python/ceval.c:4042
#30 0x00002aaaae10871f in PyEval_EvalFrameEx (f=0x43edc6c, throwflag=128) at Python/ceval.c:2666
#31 0x00002aaaae10ec1e in PyEval_EvalCodeEx (co=0x43edc6c, globals=0x80, locals=0x191, args=0xffffffffffffffff, argcount=0, kws=0xc8, kwcount=0, defs=0x0, defcount=0, closure=0x0)
    at Python/ceval.c:3253
#32 0x00002aaaae10e489 in PyEval_EvalCode (co=0x43edc6c, globals=0x80, locals=0x191) at Python/ceval.c:667
#33 0x00002aaaae14a643 in run_mod (str=0x43edc6c "\221\001", start=128, globals=0x191, locals=0xffffffffffffffff, flags=0x0) at Python/pythonrun.c:1365
#34 PyRun_StringFlags (str=0x43edc6c "\221\001", start=128, globals=0x191, locals=0xffffffffffffffff, flags=0x0) at Python/pythonrun.c:1328
#35 0x00002aaaacf26092 in boost::python::exec(boost::python::str, boost::python::api::object, boost::python::api::object) ()
   from /user/scratch/gent/vsc400/vsc40023/easybuild_REGTEST/SL6/sandybridge/software/Boost/1.53.0-ictce-5.5.0-Python-2.7.5/lib/libboost_python.so.1.53.0
#36 0x0000000000736337 in psi::Python::run(_IO_FILE*) ()
#37 0x00000000007037b2 in main ()

Avoid collision with changes in PSI-4.0b4-mpi.patch that are already in
PSI-4.0b5-mpi-memcpy.patch.
@boegel
Member

boegel commented Oct 2, 2013

I'm still running into hanging builds with PSI 4.0b5, so I'm forced to postpone this until after EB v1.8, sorry.

@boegel
Member

boegel commented Oct 2, 2013

Please merge this into #443.

@boegel boegel mentioned this pull request Oct 2, 2013
Hang should be fixed now.
@wpoely86
Member Author

wpoely86 commented Oct 3, 2013

😆
I've got the bug this time! The last commit should fix everything!

@boegel
Member

boegel commented Oct 4, 2013

This PR is closed, but no new PSI PR to rule them all open yet?

@boegel boegel reopened this Oct 4, 2013
@wpoely86 wpoely86 mentioned this pull request Oct 6, 2013
@wpoely86
Member Author

wpoely86 commented Oct 6, 2013

See #457

@wpoely86 wpoely86 closed this Oct 6, 2013
@wpoely86 wpoely86 deleted the psi4b5-ictce-mpi branch November 11, 2013 19:59