Scalable cluster administration Python framework — Manage node sets, node groups and execute commands on cluster nodes in parallel.
Latest commit b121270 Sep 14, 2016 @martinetd martinetd EngineClient: skip closing eof on non-existing streams
We close stdin when clush's stdin closes as well as when remote workers
exit.

Fixes this kind of trace:
$ echo | clush -w foobar[0-11] -d echo
...
WARNING:ClusterShell.Worker.EngineClient:<ClusterShell.Worker.Ssh.SshClient instance at 0x28eaab8>: [Errno 32] Broken pipe
...
foobar0: ssh: Could not resolve hostname foobar0: Name or service not known
...
clush: foobar0: exited with exit code 255
Error in sys.excepthook:
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ClusterShell/CLI/Clush.py", line 751, in clush_excepthook
    raise exp
KeyError: 'stdin'

Original exception was:
Traceback (most recent call last):
  File "/usr/bin/clush", line 10, in <module>
    main()
  File "/usr/lib/python2.7/site-packages/ClusterShell/CLI/Clush.py", line 1069, in main
    options.remote != 'no')
  File "/usr/lib/python2.7/site-packages/ClusterShell/CLI/Clush.py", line 667, in run_command
    task.resume()
  File "/usr/lib/python2.7/site-packages/ClusterShell/Task.py", line 799, in resume
    self._resume()
  File "/usr/lib/python2.7/site-packages/ClusterShell/Task.py", line 762, in _resume
    self._run(self.timeout)
  File "/usr/lib/python2.7/site-packages/ClusterShell/Task.py", line 407, in _run
    self._engine.run(timeout)
  File "/usr/lib/python2.7/site-packages/ClusterShell/Engine/Engine.py", line 700, in run
    self.runloop(timeout)
  File "/usr/lib/python2.7/site-packages/ClusterShell/Engine/EPoll.py", line 165, in runloop
    client._handle_read(sname)
  File "/usr/lib/python2.7/site-packages/ClusterShell/Worker/EngineClient.py", line 520, in _handle_read
    self.eh.ev_msg(self, pmsg.get())
  File "/usr/lib/python2.7/site-packages/ClusterShell/CLI/Clush.py", line 85, in ev_msg
    self.master_worker.set_write_eof()
  File "/usr/lib/python2.7/site-packages/ClusterShell/Worker/Exec.py", line 365, in set_write_eof
    client._set_write_eof(sname or self.SNAME_STDIN)
  File "/usr/lib/python2.7/site-packages/ClusterShell/Worker/EngineClient.py", line 415, in _set_write_eof
    wfile = self.streams[sname]
KeyError: 'stdin'

Closes #320.

Change-Id: I877e036548dd95abda4494e420ab6c9bc5138bff
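
The trace above ends in a dictionary lookup on a stream name that is not registered. The guard described in the commit title can be illustrated with a minimal sketch. This is not the actual patch: the class `StreamHolder`, its attributes and its return values are hypothetical names chosen only for this example.

```python
class StreamHolder(object):
    """Hypothetical stand-in for a client object holding named streams."""

    def __init__(self, streams):
        # Map of stream name -> file-like object (illustrative only).
        self.streams = dict(streams)
        self.eof_requested = set()

    def set_write_eof(self, sname):
        # Skip when the stream does not exist, instead of letting
        # self.streams[sname] raise KeyError as in the trace.
        if sname not in self.streams:
            return False
        self.eof_requested.add(sname)
        return True


# A client whose 'stdin' stream was never created (or is already gone).
client = StreamHolder({"stdout": object()})
closed = client.set_write_eof("stdin")  # no exception is raised
```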

README.md

ClusterShell 1.7 Python Library and Tools

ClusterShell is an event-driven open source Python library, designed to run local or distant commands in parallel on server farms or on large Linux clusters. It will take care of common issues encountered on HPC clusters, such as operating on groups of nodes, running distributed commands using optimized execution algorithms, as well as gathering results and merging identical outputs, or retrieving return codes. ClusterShell takes advantage of existing remote shell facilities already installed on your systems, like SSH.
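
To make "merging identical outputs" concrete, here is a small standalone sketch of the idea in plain Python. It does not use or reproduce the library's implementation; the function name `gather_identical` and the sample data are assumptions made for illustration.

```python
from collections import defaultdict


def gather_identical(outputs):
    """Group node names that produced the same output.

    `outputs` maps a node name to the text it returned.
    Returns a sorted list of (sorted node names, output) pairs.
    """
    groups = defaultdict(list)
    for node, out in outputs.items():
        groups[out].append(node)
    return sorted((sorted(nodes), out) for out, nodes in groups.items())


# Hypothetical per-node results of a kernel version query.
sample = {
    "linux4": "2.6.32-71.el6.x86_64",
    "linux5": "2.6.32-71.el6.x86_64",
    "linux32": "2.6.40.6-0.fc15.x86_64",
}
merged = gather_identical(sample)
```

Each distinct output appears once, together with the nodes that produced it, which keeps reports short when most nodes of a cluster answer identically.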

ClusterShell's primary goal is to improve the administration of high-performance clusters by providing a lightweight but scalable Python API for developers. It also provides clush, clubak and nodeset, three convenient command-line tools that allow traditional shell scripts to benefit from some of the library features.

Requirements (v1.7)

  • GNU/Linux, *BSD, Mac OS X
  • OpenSSH (ssh/scp) or rsh
  • Python 2.x (x >= 4)
  • PyYAML (optional)

License

ClusterShell is distributed under the CeCILL-C license, a French transposition of the GNU LGPL, and is fully LGPL-compatible (see Licence_CeCILL-C_V1-en.txt).

Documentation

Online documentation is available here:

http://clustershell.readthedocs.org/

The Sphinx documentation source is available under the doc/sphinx directory. Type 'make' to see all available formats (building the documentation requires Sphinx and sphinx_rtd_theme). For example, to generate the HTML docs, type:

make html BUILDDIR=/dest/path

For local library API documentation, just type:

$ pydoc ClusterShell

The following man pages are also provided:

clush(1), clubak(1), nodeset(1), clush.conf(5), groups.conf(5)

Test Suite

Regression testing scripts are available in the 'tests' directory:

$ cd tests
$ nosetests -sv <Test.py>
$ nosetests -sv --all-modules

For the "remote" tests to run as expected, 'ssh localhost' and 'ssh $HOSTNAME' must work without printing any warnings. $HOSTNAME should be neither 127.0.0.1 nor ::1. Some tests also use the 'bc' command.
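
The loopback condition can be checked before running the suite. The sketch below uses only the Python 3 standard library and is not part of the test suite. The helper names are hypothetical, and it assumes the requirement means that the hostname must not resolve to a loopback address.

```python
import ipaddress
import socket


def is_loopback(address):
    """Return True if the textual IP address is a loopback address."""
    # Drop an IPv6 zone suffix such as '%eth0' before parsing.
    return ipaddress.ip_address(address.split("%")[0]).is_loopback


def hostname_is_loopback(hostname=None):
    """Return True if any address the hostname resolves to is loopback."""
    hostname = hostname or socket.gethostname()
    infos = socket.getaddrinfo(hostname, None)
    return any(is_loopback(info[4][0]) for info in infos)
```

Calling `hostname_is_loopback()` with no argument checks the local host name; a True result suggests the "remote" tests may not behave as expected.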

ClusterShell interactively

>>> from ClusterShell.Task import task_self
>>> from ClusterShell.NodeSet import NodeSet
>>> task = task_self()
>>> task.run("/bin/uname -r", nodes="linux[4-6,32-39]")
<ClusterShell.Worker.Ssh.WorkerSsh object at 0x20a5e90>
>>> for buf, key in task.iter_buffers():
...     print NodeSet.fromlist(key), buf
... 
linux[32-39] 2.6.40.6-0.fc15.x86_64

linux[4-6] 2.6.32-71.el6.x86_64
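
The bracketed pattern passed as `nodes` in the session above is a folded node set. As a rough illustration of what such a pattern stands for, the following standalone sketch expands a single-bracket pattern in plain Python. It is not the library's NodeSet parser: the function name `expand` is an assumption, and zero-padded or multi-bracket patterns are not handled.

```python
import re

# One optional prefix, one bracketed list of ranges, one optional suffix.
_PATTERN = re.compile(r"^([^\[\]]*)\[([0-9,\-]+)\]([^\[\]]*)$")


def expand(pattern):
    """Expand e.g. 'linux[4-6,32-39]' into a list of node names."""
    match = _PATTERN.match(pattern)
    if match is None:
        # No bracket expression: treat the pattern as a single node name.
        return [pattern]
    prefix, ranges, suffix = match.groups()
    names = []
    for part in ranges.split(","):
        low, _, high = part.partition("-")
        high = high or low  # a lone index such as '7' is a range of one
        for index in range(int(low), int(high) + 1):
            names.append("%s%d%s" % (prefix, index, suffix))
    return names


nodes = expand("linux[4-6,32-39]")
```

Here `nodes` holds the eleven names from linux4 to linux6 and from linux32 to linux39, in that order.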

Links

Web site:

http://clustershell.sourceforge.net
or http://cea-hpc.github.com/clustershell/

Online documentation:

http://clustershell.readthedocs.org/

GitHub source repository:

https://github.com/cea-hpc/clustershell

GitHub Wiki:

https://github.com/cea-hpc/clustershell/wiki

GitHub issue tracking system:

https://github.com/cea-hpc/clustershell/issues

Sourceforge.net project page:

http://sourceforge.net/projects/clustershell

Python Package Index (PyPI) link:

http://pypi.python.org/pypi/ClusterShell

ClusterShell was born along with Shine, a scalable Lustre FS admin tool:

http://lustre-shine.sourceforge.net

Core developers/reviewers

  • Stephane Thiell
  • Aurelien Degremont
  • Henri Doreau
  • Dominique Martinet

CEA/DAM 2010, 2011, 2012, 2013, 2014, 2015 - http://www-hpc.cea.fr