Segmentation fault when launching with POSTPONE:<color> #25

Open
uahic opened this issue Jul 19, 2016 · 4 comments

Comments

@uahic

uahic commented Jul 19, 2016

Use case: I am trying to port the PyNN MUSIC branch to PyNN 0.8.1 and NEST 2.10. The PyNN part was no problem, but a segmentation fault occurs within MUSIC. This happens independently of PyNN-MUSIC.

The error can be reproduced by launching MUSIC with _MUSIC_CONFIG=POSTPONE:0, either with 'python <scriptname.py>' or with 'mpiexec -np 1 <scriptname.py>' (launching MUSIC as a single process).
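For anyone trying to reproduce this, the launch described above could be assembled roughly as follows. This is only a sketch: `build_postpone_launch` is a hypothetical helper, the variable name is copied from the text of this issue (passed as a parameter in case the real name differs), and putting the Python interpreter in front of the script under `mpiexec` is an assumption.

```python
import os
import sys


def build_postpone_launch(script, color=0, nprocs=None, var_name="_MUSIC_CONFIG"):
    """Return (env, argv) for launching `script` with a postponed MUSIC setup.

    Hypothetical helper: `var_name` is taken from the issue text and is an
    assumption, not a confirmed MUSIC constant.
    """
    # Copy the current environment and add the postpone marker with the color.
    env = dict(os.environ)
    env[var_name] = f"POSTPONE:{color}"

    # Plain interpreter launch, optionally wrapped in mpiexec.
    argv = [sys.executable, script]
    if nprocs is not None:
        argv = ["mpiexec", "-np", str(nprocs)] + argv
    return env, argv
```

The pair can then be handed to `subprocess.run(argv, env=env)`; nothing is launched by the helper itself.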

@mdjurfeldt
From what I can observe, the old Python config API sets POSTPONE:<color>, but the configuration parser actually expects an ApplicationMap section within the ENV. Maybe I have missed a step, but in the current state it looks like either music-config/config.py must provide a full application map in the ENV (i.e. I need to change the way PyNN-Music/multisim.py assembles this), or the MUSIC C++ code must be adapted.

Another question: what does the runtime actually do when postpone is true? It calls maybePostponedSetup, but I don't see where the updated environment variables are actually parsed. Maybe I am missing something?

```cpp
void
Setup::maybePostponedSetup ()
{
  if (postponeSetup_)
    {
      delete config_;
      config_ = new Configuration ();
      fullInit ();
    }
}

void
Setup::fullInit ()
{
  errorChecks ();
  if (!config ("timebase", &timebase_))
    timebase_ = MUSIC_DEFAULT_TIMEBASE; // default timebase
  string binary;
  config_->lookup ("binary", &binary);
  string args;
  config_->lookup ("args", &args);
  argv_ = parseArgs (binary, args, &argc_);
  temporalNegotiator_ = new TemporalNegotiator (this);
}
```
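To make the distinction I am describing concrete, here is a rough Python model of a parser that tells a postponed value apart from a full configuration. It is not the MUSIC implementation: `ParsedConfig` and `parse_config_env` are hypothetical names, and the format of the full configuration string is not shown here, so it is kept as an opaque payload.

```python
from dataclasses import dataclass
from typing import Optional

POSTPONE_PREFIX = "POSTPONE:"


@dataclass
class ParsedConfig:
    postponed: bool                 # True if setup should be deferred
    color: Optional[int] = None     # application color from POSTPONE:<color>
    payload: Optional[str] = None   # unparsed full configuration (opaque here)


def parse_config_env(value):
    """Classify the raw environment value (hypothetical model only)."""
    if value is None:
        # Variable not set: not launched under MUSIC at all.
        return None
    if value.startswith(POSTPONE_PREFIX):
        # int() raises ValueError if the color is missing or not a number.
        color = int(value[len(POSTPONE_PREFIX):])
        return ParsedConfig(postponed=True, color=color)
    # Anything else is treated as a full configuration to be parsed later.
    return ParsedConfig(postponed=False, payload=value)
```

In this model the crash would correspond to the postponed branch being absent, so that a `POSTPONE:<color>` value falls through to the code path that expects a full application map.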

@uahic
Author

uahic commented Sep 15, 2016

I adjusted the Configuration class so that POSTPONE is now handled properly.

However, I would like to know how to start MPI such that MUSIC does not treat all of the MPI nodes as members of the same simulation group, so that two NEST simulation groups can communicate with each other.

@mdjurfeldt
Contributor

Sorry for not returning to the POSTPONE issue for such a long time. I'm not sure what happened to this code, which was correct before. If you think that your fix is the right one, you are of course free to submit a pull request. In either case, I need to look at this and will do so ASAP.

Could you please clarify what you mean by your comment about simulation groups? Each MUSIC-aware application gets its own intracommunicator, associated with its own MPI process group, to be used for its internal communication. What happens above this depends on the communication algorithm selected. For pairwise communication, intercommunicators are created for use by MUSIC ports (these are not available through the MUSIC API). For collective communication, a communicator covering all MPI processes is instead used internally.

Given this, you can have two instances of NEST running as separate MUSIC-aware applications, with their own MPI process groups, and these can communicate with each other through MUSIC ports. This sounds similar to what you request, but probably isn't. What is it that you request?

@uahic
Author

uahic commented Sep 16, 2016

This answer is really what I was looking for. Here is my current understanding (feel free to correct me where I am wrong):

Using MUSIC via music binary (mpiexec .... music

  • MPI launches new processes with music binary
  • MUSIC binary parses the config file and assembles/exports the information to a string
  • Binary as specified in the config is launched
  • MUSIC splits intra-communicators from the COMM_WORLD communicator with respect to its application-color
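The last step can be pictured with a small pure-Python model of a color-based split. It needs no MPI and only mimics the grouping rule of a split with equal keys (ranks sharing a color form one group, ordered by their world rank). That MUSIC performs the split exactly this way is my assumption, and both function names are hypothetical.

```python
from collections import defaultdict


def split_by_color(colors):
    """Group world ranks by color.

    `colors[world_rank]` is the application color of that rank. Returns a
    dict mapping each color to its world ranks in local-rank order.
    """
    groups = defaultdict(list)
    for world_rank, color in enumerate(colors):
        groups[color].append(world_rank)
    return dict(groups)


def local_rank(colors, world_rank):
    """Rank of `world_rank` inside its own application group."""
    group = split_by_color(colors)[colors[world_rank]]
    return group.index(world_rank)
```

For example, with five processes where the first two belong to application 0 and the rest to application 1, `split_by_color([0, 0, 1, 1, 1])` puts world ranks 0 and 1 in one group and 2, 3 and 4 in the other, and world rank 3 becomes local rank 1 of its application.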

If I launch two different NEST simulations that are coupled via MUSIC (this is what I meant, imprecisely speaking, by groups), it all works fine.

When launching mpiexec python <pynn_music_script.py>, MUSIC no longer complains after the small bugfix, but NEST does. In more detail: it says that the random number generators are not in sync. All of the MPI ranks seem to be in the same group with respect to NEST. I have not looked deeply into NEST's MPI management, but I think it is simply using COMM_WORLD. Presumably intracommunicators are (actually must be) used when MUSIC is enabled, but that does not seem to work the way I am doing it right now. I probably need to dig through some NEST code to get a better understanding, unless you already know the answer?

Back to the pull-request story:
Does POSTPONE work for you? If yes, which MUSIC and PyNN versions are you using? I fixed it for the most recent MUSIC version and merged the music subfolder from the old PyNN into the 0.8.1 version of the repo (so I might need to ship that as well if you want to test it).

@mdjurfeldt
Contributor

Your understanding is correct.

Can you provide a simple test case demonstrating your Python problem so that I can reproduce it on my machine? I will then debug it.

I will get back to you regarding POSTPONE.
