
exatn::getService failure #7

Closed
DmitryLyakh opened this issue Jul 30, 2019 · 12 comments
Labels
invalid This doesn't seem right


@DmitryLyakh
Member

DmitryLyakh commented Jul 30, 2019

Branch devel. After gluing together exatn::numerics and exatn::runtime, it looks like the CppMicroServices-based exatn::getService does not work (it does not discover the tensor runtime services). Three tests fail with the same error.

@DmitryLyakh DmitryLyakh added the invalid This doesn't seem right label Jul 30, 2019
@amccaskey
Member

I observed a failure in TAProlParserTester; it was missing exatn::initialize(). Can you post the other errors you saw, using ctest --output-on-failure?

@DmitryLyakh
Member Author

Test project /home/dima/src/exatn/build
Start 1: ServiceRegistryTester
1/5 Test #1: ServiceRegistryTester ............ Passed 0.00 sec
Start 2: NumericsTester
2/5 Test #2: NumericsTester ...................***Exception: SegFault 0.11 sec
[==========] Running 2 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 2 tests from NumericsTester
[ RUN ] NumericsTester.checkSimple
2 1 13 0
{1:5,0:13}
2 61 32
{61,32}
{1:4}
[ OK ] NumericsTester.checkSimple (0 ms)
[ RUN ] NumericsTester.checkNumServer
Exatn not initialized before use. Please execute exatn::Initialize() before using API.
Could not find service with name eager-dag-executor. Perhaps the service is not Identifiable.
Invalid exatn Service. Could not find eager-dag-executor in Service Registry.
Exatn not initialized before use. Please execute exatn::Initialize() before using API.
Could not find service with name talsh-node-executor. Perhaps the service is not Identifiable.
Invalid exatn Service. Could not find talsh-node-executor in Service Registry.

Start 3: TAProLInterpreterTester

3/5 Test #3: TAProLInterpreterTester ..........***Exception: SegFault 0.10 sec
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from TAProLInterpreterTester
[ RUN ] TAProLInterpreterTester.checkSimple
Exatn not initialized before use. Please execute exatn::Initialize() before using API.

Start 4: DirectedBoostGraphTester

4/5 Test #4: DirectedBoostGraphTester ......... Passed 0.00 sec
Start 5: TensorRuntimeTester
5/5 Test #5: TensorRuntimeTester ..............***Exception: SegFault 0.10 sec
Exatn not initialized before use. Please execute exatn::Initialize() before using API.
Could not find service with name eager-dag-executor. Perhaps the service is not Identifiable.
Invalid exatn Service. Could not find eager-dag-executor in Service Registry.
Exatn not initialized before use. Please execute exatn::Initialize() before using API.
Could not find service with name talsh-node-executor. Perhaps the service is not Identifiable.
Invalid exatn Service. Could not find talsh-node-executor in Service Registry.

40% tests passed, 3 tests failed out of 5

Total Test time (real) = 0.32 sec

The following tests FAILED:
2 - NumericsTester (SEGFAULT)
3 - TAProLInterpreterTester (SEGFAULT)
5 - TensorRuntimeTester (SEGFAULT)
Errors while running CTest

@amccaskey
Member

Oh cool, these are easy to fix. You just need to update the failing testers' main() functions to call exatn::initialize() first.

@DmitryLyakh
Member Author

I actually did that in NumericsTester, but it still fails for some reason...

@DmitryLyakh
Member Author

Oh, you meant I should put exatn::initialize() into main() instead of the test itself?

@amccaskey
Member

Well, either would do if there is only one TEST(). Did you run make install?

@DmitryLyakh
Member Author

NumericsTester has two tests and I put exatn::initialize() and exatn::finalize() in the second one since it was failing. So, I guess I should just move this to main() instead. Will try.

@DmitryLyakh
Member Author

Do you think getService not being able to find registered exatn::runtime services was because of this as well?

@amccaskey
Member

Yes. If you don't call initialize(), the ServiceRegistry is never instantiated and initialized. ServiceRegistry::initialize() searches the ~/.exatn/plugins directory for shared libraries, loads them, and calls *Activator::start() on each one. Without this step, there are no services available to get.

@DmitryLyakh
Member Author

Nope, that does not help either; same thing:
Exatn not initialized before use. Please execute exatn::Initialize() before using API.
Could not find service with name eager-dag-executor. Perhaps the service is not Identifiable.
Invalid exatn Service. Could not find eager-dag-executor in Service Registry.
Exatn not initialized before use. Please execute exatn::Initialize() before using API.
Could not find service with name talsh-node-executor. Perhaps the service is not Identifiable.
Invalid exatn Service. Could not find talsh-node-executor in Service Registry.

@DmitryLyakh
Member Author

Also note that it only complains about exatn runtime services: EagerGraphExecutor and TalshNodeExecutor.

@amccaskey
Member

This is interesting. I was able to reproduce it. It happens because the TensorRuntime::TensorRuntime(..) constructor obtains references to the executor services before the exatn::initialize() call has finished. exatn::initialize() gets called, which loads the bundles, which invokes MPIRPCActivator.start(), which constructs MPIServer, which calls the DriverServer super constructor, which instantiates a TAProlInterpreter, which creates the NumServer, which creates the TensorRuntime, which calls getService()... all before initialize() finishes. Phew...

The simple fix is to move the TAProL parser construction out of the DriverServer constructor's initializer list and into the MPIServer.start() method.

amccaskey pushed a commit that referenced this issue Jul 31, 2019
Signed-off-by: Alex McCaskey <mccaskeyaj@ornl.gov>