Segmentation fault on pyslurm.node() #52

Closed
ajmazurie opened this issue Dec 9, 2015 · 11 comments

Comments

@ajmazurie

Here is an example PySlurm session that triggers the error:

Python 2.7.9 (default, Feb 23 2015, 14:53:24) 
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyslurm
>>> pyslurm.version()
'14.11.5'
>>> pyslurm.slurm_api_version()
(14, 11, 6)
>>> pyslurm.node()
Segmentation fault

Best,
Aurélien

@gingergeeks
Member

Yep, this has been raised by Jonathon Anderson on the Slurm mailing list and I'm currently working on it. How many nodes are in your cluster? I ask so that I can recreate the issue.

@ajmazurie
Author

I'll have to check, but we're above 40 nodes at this point.
Aurélien

@gingergeeks
Member

Thanks. Some time ago I changed the code so that I could track and manage changes in the data (various classes) returned by the Slurm API, so that looped monitoring programs could be used. Clearly I have a memory issue here, but it was not showing up on my small test VM.

@gingergeeks
Member

Aurélien,
I have traced a possible cause to a node that is defined in the config but does not actually exist. The Slurm API returns a record for it, but with no data content. A quick check for a node name of NULL in the record was all that was necessary. Does this match your configuration?

Mark
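
As a rough illustration of the check described above, the sketch below skips records whose name is missing before building a dict keyed by node name. The record layout (`name` and `state` keys) and the `build_node_dict` helper are hypothetical stand-ins, not PySlurm's actual internals; in the real wrapper the equivalent test would presumably be against a NULL C string.

```python
def build_node_dict(records):
    """Return {node_name: record}, skipping records that have no name.

    `records` is a hypothetical list of dicts standing in for the node
    records returned by the Slurm API; a name of None models a NULL name.
    """
    nodes = {}
    for record in records:
        name = record.get("name")
        if name is None:
            # Placeholder record for a configured node with no data: skip it
            continue
        nodes[name] = record
    return nodes


records = [
    {"name": "node01", "state": "IDLE"},
    {"name": None, "state": None},  # configured, but no real data behind it
    {"name": "node02", "state": "ALLOCATED"},
]
nodes = build_node_dict(records)
print(sorted(nodes))
```

Skipping the empty record, rather than dereferencing its name, is the whole of the guard; everything else here is scaffolding for the example.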

@ajmazurie
Author

Mark,
I'll have to get back to you on this. Are you asking whether one of the nodes has the name 'NULL'?

Aurélien

@gingergeeks
Member

Aurélien,
It was a statement that we have traced the problem to a bug in PySlurm: it does not handle a node record with an empty nodename entry. I think this happens where nodes have been entered in the Slurm config but do not actually exist in the hosts file or DNS. My question was whether it is possible that you have nodes in your config that do not actually exist?

Mark
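
For anyone unsure whether their config lists nodes that do not exist, one way to look for them is sketched below. It assumes you already have the configured node names as a plain list (how you extract them from the Slurm config is up to you), and the `unresolvable_nodes` helper is hypothetical, not part of PySlurm. It only reports names that fail a hosts-file/DNS lookup.

```python
import socket


def unresolvable_nodes(node_names, resolve=socket.gethostbyname):
    """Return the names in `node_names` that `resolve` cannot look up."""
    missing = []
    for name in node_names:
        try:
            resolve(name)
        except socket.error:
            # socket.gaierror (lookup failure) is a subclass of socket.error
            missing.append(name)
    return missing


def fake_resolve(name):
    # Hypothetical stand-in for DNS so the example runs offline:
    # only node01 is "known"
    if name != "node01":
        raise socket.gaierror("unknown host: %s" % name)
    return "10.0.0.1"


print(unresolvable_nodes(["node01", "node99"], resolve=fake_resolve))
```

Dropping the `resolve=fake_resolve` argument makes the helper use the system resolver, which consults the hosts file and DNS in whatever order the machine is configured for.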

@gingergeeks
Member

Aurélien,
If you pull the latest 14.11.5, the patch discussed previously has been committed. Please let me know if this now works, and close the ticket if you are happy.

Mark

@ajmazurie
Author

Maybe... I do not have a complete understanding of how the cluster is managed, unfortunately.
Aurélien

@gingergeeks
Member

Aurélien,
This should now be resolved with the latest pyslurm-14.11.5 commits. Please test and close the ticket if it is fixed.

Mark

@gingergeeks
Member

Aurélien,
Have you had a chance to test and confirm that it is now resolved?

Mark

@gingergeeks
Member

Closing this as resolved
