Commit

Merge pull request #686 from christian-monch/add-shell-examples
Add shell-examples
mih committed May 16, 2024
2 parents 5dea188 + c61ecdb commit 37246b1
Showing 4 changed files with 227 additions and 40 deletions.
2 changes: 1 addition & 1 deletion CONTRIBUTING.md
@@ -52,7 +52,7 @@ There is no limit to the number of files. Contributors should strive for files w

Within a sub-package, code should generally use relative imports. The corresponding tests should also import the tested code via relative imports.

Code users should be able to import the most relevant functionality from the sub-package's `__init__.py`. Only items importable from the sub-package's top-level are considered to be part of its "public" API.
Code users should be able to import the most relevant functionality from the sub-package's `__init__.py`. Only items importable from the sub-package's top-level are considered to be part of its "public" API. If a sub-module is imported in the sub-package's `__init__.py`, consider adding `__all__` to the sub-module to restrict wildcard imports from the sub-module, and to document what is considered to be part of the "public" API.
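
The effect of `__all__` on wildcard imports can be illustrated with a minimal sketch. The module name `demo_submodule` and its functions are hypothetical and exist only for this illustration; the module is built in memory so that the snippet is self-contained.

```python
import sys
import types

# Source of a hypothetical sub-module that declares its public API.
source = '''
__all__ = ['upload']

def upload():
    return 'uploaded'

def download():
    return 'downloaded'
'''

# Register the throwaway module so that it can be imported by name.
module = types.ModuleType('demo_submodule')
exec(source, module.__dict__)
sys.modules['demo_submodule'] = module

# A wildcard import only picks up the names listed in `__all__`.
namespace = {}
exec('from demo_submodule import *', namespace)
exported = sorted(name for name in namespace if not name.startswith('__'))
print(exported)  # ['upload']

# Names outside `__all__` remain importable explicitly.
from demo_submodule import download
print(download())  # downloaded
```

Note that `__all__` restricts only wildcard imports and serves as documentation; it does not prevent explicit imports of other names.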

Sub-packages should be as self-contained as possible. Individual components in `datalad-next` should strive to be easily migratable to the DataLad core package. This means that any organization principles like *all-exceptions-go-into-a-single-location-in-datalad-next* do not apply. For example, each sub-package should define its exceptions separately from others. When functionality is shared between sub-packages, absolute imports should be made.

10 changes: 9 additions & 1 deletion datalad_next/shell/__init__.py
@@ -99,7 +99,7 @@
- :class:`FixedLengthResponseGeneratorPowerShell`
When :func:`shell` is executed it will use a
When :func:`datalad_next.shell.shell` is executed it will use a
:class:`VariableLengthResponseClass` to skip the login message of the shell.
This is done by executing a *zero command* (a command that will possibly
generate some output, and successfully return) in the shell. The zero command is
@@ -123,6 +123,14 @@ class identified by ``zero_command_rg_class`` will be used by default to create
per-call basis by providing a different response generator class in the
``response_generator``-parameter of :meth:`ShellCommandExecutor.__call__`.
Examples
--------
See the documentation of :func:`datalad_next.shell.shell` for examples of how to
use the shell-function and different response generator classes.
API overview
------------
.. currentmodule:: datalad_next.shell
.. autosummary::
9 changes: 9 additions & 0 deletions datalad_next/shell/operations/posix.py
@@ -20,6 +20,15 @@
from datalad_next.consts import COPY_BUFSIZE


__all__ = [
'DownloadResponseGenerator',
'DownloadResponseGeneratorPosix',
'upload',
'download',
'delete',
]


lgr = logging.getLogger("datalad.ext.next.shell.operations")


246 changes: 208 additions & 38 deletions datalad_next/shell/shell.py
@@ -81,7 +81,39 @@ def shell(shell_cmd: list[str],
via the returned instance of :class:`ShellCommandExecutor` are executed in
the same shell instance.
Simple example that invokes a single command::
Parameters
----------
shell_cmd : list[str]
The command to execute the shell. It should be a list of strings that
is given to :func:`iter_subproc` as `args`-parameter. For example:
``['ssh', '-p', '2222', 'localhost']``.
chunk_size : int, optional
The size of the chunks that are read from the shell's ``stdout`` and
``stderr``. This also defines the size of stored ``stderr``-content.
zero_command_rg_class : type[VariableLengthResponseGenerator], optional, default: 'VariableLengthResponseGeneratorPosix'
Shell uses an instance of the specified response generator class to
execute the *zero command* ("zero command" is the command used to skip
the login messages of the shell). This class will also be used as the
default response generator for all further commands executed in the
:class:`ShellCommandExecutor`-instance that is returned by
:func:`shell`. Currently, the following concrete subclasses of
:class:`VariableLengthResponseGenerator` exist:
- :class:`VariableLengthResponseGeneratorPosix`: compatible with
POSIX-compliant shells, e.g. ``sh`` or ``bash``.
- :class:`VariableLengthResponseGeneratorPowerShell`: compatible
with PowerShell.
Yields
------
:class:`ShellCommandExecutor`
Examples
--------
**Example 1:** a simple example that invokes a single command, prints its
output and its return code::
>>> from datalad_next.shell import shell
>>> with shell(['ssh', 'localhost']) as ssh:
@@ -92,10 +124,10 @@ def shell(shell_cmd: list[str],
b'-rw-r--r-- 1 root root 2773 Nov 14 10:05 /etc/passwd\\n'
0
Example that invokes two commands, the second of which exits with a non-zero
return code. The error output is retrieved from ``result.stderr``, which
contains all ``stderr`` data that was written since the last command was
executed::
**Example 2:** this example invokes two commands, the second of which exits
with a non-zero return code. The error output is retrieved from
``result.stderr``, which contains all ``stderr`` data that was written
since the last command was executed::
>>> from datalad_next.shell import shell
>>> with shell(['ssh', 'localhost']) as ssh:
@@ -110,10 +142,11 @@ def shell(shell_cmd: list[str],
2
b"Pseudo-terminal will not be allocated because stdin is not a terminal.\\r\\nls: cannot access '/no-such-file': No such file or directory\\n"
The following example demonstrates how to use the ``check``-parameter to
raise a :class:`CommandError`-exception if the return code of the command is
not zero. This delegates error handling to the calling code and help to keep
the code clean::
**Example 3:** demonstrates how to use the
``check``-parameter to raise a :class:`CommandError`-exception if the
return code of the command is
not zero. This delegates error handling to the calling code and helps to
keep the code clean::
>>> from datalad_next.shell import shell
>>> with shell(['ssh', 'localhost']) as ssh:
@@ -129,7 +162,7 @@ def shell(shell_cmd: list[str],
raise CommandError(
datalad.runner.exception.CommandError: CommandError: 'ls /no-such-file' failed with exitcode 2 [err: 'cannot access '/no-such-file': No such file or directory']
Manual checking of the return code::
**Example 4:** an example for manual checking of the return code::
>>> from datalad_next.shell import shell
>>> def file_exists(file_name):
@@ -138,19 +171,21 @@ def shell(shell_cmd: list[str],
... return result.returncode == 0
>>> print(file_exists('/etc/passwd'))
True
... print(file_exists('/no-such-file'))
>>> print(file_exists('/no-such-file'))
False
An example for result content checking::
**Example 5:** an example for result content checking::
>>> from datalad_next.shell import shell
>>> with shell(['ssh', 'localhost']) as ssh:
... result = ssh('grep root /etc/passwd', check=True).stdout
... if len(result.splitlines()) != 1:
... raise ValueError('Expected exactly one line')
For long running commands a generator-based result fetching can be used.
To use generator-based output the command has to be executed with the method
**Example 6:** how to work with generator-based results.
For long-running commands, generator-based result fetching
can be used. To use generator-based output, the command has to be executed
with the method
:meth:`ShellCommandExecutor.start`. This method returns a generator that
provides command output as soon as it is available::
@@ -171,33 +206,168 @@ def shell(shell_cmd: list[str],
(The exact output of the above example might differ, depending on the
length of the first two entries in the ``/etc/passwd``-file.)
Parameters
----------
shell_cmd : list[str]
The command to execute the shell. It should be a list of strings that
is given to :func:`iter_subproc` as `args`-parameter. For example:
``['ssh', '-p', '2222', 'localhost']``.
chunk_size : int, optional
The size of the chunks that are read from the shell's ``stdout`` and
``stderr``. This also defines the size of stored ``stderr``-content.
zero_command_rg_class : type[VariableLengthResponseGenerator], optional, default: 'VariableLengthResponseGeneratorPosix'
Shell uses an instance of the specified response generator class to
execute the *zero command* ("zero command" is the command used to skip
the login messages of the shell). This class will also be used as the
default response generator for all further commands executed in the
:class:`ShellCommandExecutor`-instance that is returned by
:func:`shell`. Currently, the following concrete subclasses of
:class:`VariableLengthResponseGenerator` exist:
**Example 7:** how to use the ``stdin``-parameter to feed data to a command
that is executed in the persistent shell.
The methods :meth:`ShellCommandExecutor.__call__` and
:meth:`ShellCommandExecutor.start` accept an iterable in the
``stdin``-argument. The content of this iterable will be sent to ``stdin``
of the executed command::
- :class:`VariableLengthResponseGeneratorPosix`: compatible with
POSIX-compliant shells, e.g. ``sh`` or ``bash``.
>>> from datalad_next.shell import shell
>>> with shell(['ssh', 'localhost']) as ssh:
... result = ssh(b'head -c 4', stdin=(b'ab', b'c', b'd'))
... print(result.stdout)
b'abcd'
**Example 8:** how to work with commands that consume ``stdin`` completely.
In the previous example, the command
``head -c 4`` was used to consume data from ``stdin``. This command
terminates after
reading exactly 4 bytes from ``stdin``. If ``cat`` was used
instead of ``head -c 4``, the command would have
continued to run until its ``stdin`` was closed. The ``stdin`` of the
command that is executed in the persistent shell can be closed by calling
:meth:`ssh.close`. But, in order to be able to call :meth:`ssh.close`,
any process that consumes ``stdin`` completely should be executed by
calling the :meth:`ssh.start`-method.
The reason for this is that :meth:`ssh.start` will return immediately, which
makes it possible to call the :meth:`ssh.close`-method, as shown in the following
code (:meth:`ssh.__call__` would have waited for ``cat`` to terminate, but
because :meth:`ssh.close` is not called, ``cat`` would never terminate)::
- :class:`VariableLengthResponseGeneratorPowerShell`: compatible
with PowerShell.
>>> from datalad_next.shell import shell
>>> with shell(['ssh', 'localhost']) as ssh:
... result_generator = ssh.start(b'cat', stdin=(b'12', b'34', b'56'))
... ssh.close()
... print(tuple(result_generator))
(b'123456',)
Note that
the ``ssh``-object cannot be used for further command execution after
:meth:`ssh.close` was called. Further command execution requires spinning up
a new persistent shell-object. To prevent this overhead, it is advised to
limit the number of bytes that a shell-command consumes, either by their
number, e.g. by using ``head -c``, or by some other means, e.g.
by interpreting the content or using a command like ``timeout``.
**Example 9:** upload a file to the persistent shell. The command
``head -c`` can be used to implement the upload of a file to a remote shell.
The basic idea
is to determine the number of bytes that will be uploaded and create a
command in the remote shell that will consume exactly this amount of bytes.
The following code implements this idea (without file-name escaping and
error handling)::
>>> import os
>>> import time
>>> from datalad_next.shell import shell
>>> def upload(ssh, file_name, remote_file_name):
... size = os.stat(file_name).st_size
... f = open(file_name, 'rb')
... return ssh(f'head -c {size} > {remote_file_name}', stdin=iter(f.read, b''))
...
>>> with shell(['ssh', 'localhost']) as ssh:
... upload(ssh, '/etc/passwd', '/tmp/uploaded-1')
Note: in this example, ``f`` is not explicitly closed, it is only
closed when the program exits. The reason for
this is that the shell uses threads internally for stdin-feeding, and there
is no simple way to determine whether the thread that reads ``f`` has yet
read an EOF and exited. If ``f`` is closed before the thread exits, and the
thread tries to read from ``f``, a ``ValueError`` will be raised. The
function :func:`datalad_next.shell.operations.posix.upload` contains a solution
for this problem that has slightly more code. For the sake of simplicity,
this solution was not implemented in the example above.
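
The upload example passes ``iter(f.read, b'')`` as ``stdin``, where ``f.read()`` without a size argument reads the whole file in a single call. A minimal sketch of a chunked variant is shown below, using only the standard library. The helper name ``iter_chunks`` and the chunk size are assumptions made for this illustration; they are not part of ``datalad_next``. Because this sketch is single-threaded, consuming the iterable inside the ``with``-block is safe here, unlike the threaded stdin-feeding described above.

```python
import functools
import os
import tempfile


def iter_chunks(file_obj, chunk_size=65536):
    """Return an iterator that yields `chunk_size`-sized reads until EOF."""
    # iter(callable, sentinel) calls `callable` until it returns `sentinel`.
    return iter(functools.partial(file_obj.read, chunk_size), b'')


# Create a small temporary file to read from.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b'x' * 10)
    path = tmp.name

with open(path, 'rb') as f:
    chunks = list(iter_chunks(f, chunk_size=4))
os.unlink(path)

print([len(chunk) for chunk in chunks])  # [4, 4, 2]
```

Such an iterable could presumably be passed in the ``stdin``-argument in place of ``iter(f.read, b'')``, which would bound the memory used per read for large files.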
**Example 10:** download a file. This example
uses a fixed-length response generator
to download a file from a remote shell. The basic idea is to determine the
number of bytes that will be downloaded and create a fixed-length response
generator that reads exactly this number of bytes. The fixed length response
generator is then passed to :meth:`ssh.start` in the keyword-argument
``response_generator``. This instructs :meth:`ssh.start` to use the response
generator to interpret the output of this command invocation (the example
code has no file-name escaping or error handling)::
>>> from datalad_next.shell import shell
>>> from datalad_next.shell.response_generators import FixedLengthResponseGeneratorPosix
>>> def download(ssh, remote_file_name, local_file_name):
... size = ssh(f'stat -c %s {remote_file_name}').stdout
... with open(local_file_name, 'wb') as f:
... response_generator = FixedLengthResponseGeneratorPosix(ssh.stdout, int(size))
... results = ssh.start(f'cat {remote_file_name}', response_generator=response_generator)
... for chunk in results:
... f.write(chunk)
...
>>> with shell(['ssh', 'localhost']) as ssh:
... download(ssh, '/etc/passwd', '/tmp/downloaded-1')
...
Note that :meth:`ssh.start` is used to start the download. This makes it possible to
process downloaded data as soon as it is available.
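
The idea behind a fixed-length response, reading exactly a known number of bytes and leaving everything after them untouched in the stream, can be sketched with the standard library alone. This is not the implementation of :class:`FixedLengthResponseGeneratorPosix`; the function ``read_exactly`` and the in-memory stream are hypothetical stand-ins used for illustration only.

```python
import io


def read_exactly(stream, size, chunk_size=4):
    """Yield chunks from `stream` until exactly `size` bytes have been read."""
    remaining = size
    while remaining > 0:
        # Never request more than what is still missing.
        chunk = stream.read(min(chunk_size, remaining))
        if not chunk:
            raise EOFError(f'stream ended with {remaining} bytes missing')
        remaining -= len(chunk)
        yield chunk


# The stream contains the file content, followed by unrelated output.
stream = io.BytesIO(b'file-content' + b'NEXT-RESPONSE')
payload = b''.join(read_exactly(stream, len(b'file-content')))
rest = stream.read()

print(payload)  # b'file-content'
print(rest)     # b'NEXT-RESPONSE'
```

Because the reader stops after the announced number of bytes, the output of subsequent commands stays in the stream and is not mistaken for file content.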
**Example 11:**
This example implements interaction with a *Python* interpreter (which
can be local or remote). Interaction in the context of this example means
executing a
line of Python code, returning the result, i.e. the output on ``stdout``,
and detecting whether an exception was raised or not. To this end
a Python-specific variable-length response generator is created by
subclassing the
generic class :class:`VariableLengthResponseGenerator`. The new response
generator implements the method :meth:`get_final_command`, which takes a
python statement and returns a ``try``-``except``-block that executes the
python statement, prints the end-marker and a return code (which is ``0`` if
the statement was executed successfully, and ``1`` if an exception was
raised)::
>>> from datalad_next.shell import shell
>>> from datalad_next.shell.response_generators import VariableLengthResponseGenerator
>>> class PythonResponseGenerator(VariableLengthResponseGenerator):
... def get_final_command(self, command: bytes) -> bytes:
... return f'''try:
... {command.decode()}
... print('{self.end_marker.decode()}')
... print(0)
... except:
... print('{self.end_marker.decode()}')
... print(1)
... '''.encode()
... @property
... def zero_command(self) -> bytes:
... return b'True'
...
>>> with shell(['python', '-u', '-i']) as py:
... print(py('1 + 1'))
... print(py('1 / 0'))
...
ExecutionResult(stdout=b'2\\n', stderr=b'>>> ... ... ... ... ... ... ... ... ', returncode=0)
ExecutionResult(stdout=b'', stderr=b'... ... ... ... ... ... ... ... Traceback (most recent call last):\\n File "<stdin>", line 2, in <module>\\nZeroDivisionError: division by zero', returncode=1)
The python response generator could be extended to deliver exception
information in an extended ``ExecutionResult``. This can be achieved by
*pickling* (see the ``pickle``-module) a caught exception to a byte-string,
printing this byte-string after the return-code line, and printing another
end-marker. The :meth:`send`-method of the response generator must then
be overridden to unpickle the exception information and store it in an
extended ``ExecutionResult`` (or raise it in the shell-context, if that is
preferred).
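
The wrapping technique used by ``get_final_command`` in Example 11 can be tried out without a shell. The following sketch wraps a statement in a ``try``-``except``-block, executes it in the current process, and splits the captured output at the end marker. The marker value and the helpers ``wrap_statement`` and ``run_wrapped`` are assumptions for this illustration; they are not part of ``datalad_next``.

```python
import contextlib
import io

# Hypothetical end marker; a real response generator provides its own.
END_MARKER = '----hypothetical-end-marker----'


def wrap_statement(statement: str, end_marker: str = END_MARKER) -> str:
    """Wrap `statement` so that an end marker and a return code are printed."""
    return (
        'try:\n'
        f'    {statement}\n'
        f'    print({end_marker!r})\n'
        '    print(0)\n'
        'except Exception:\n'
        f'    print({end_marker!r})\n'
        '    print(1)\n'
    )


def run_wrapped(statement: str) -> tuple[str, int]:
    """Execute the wrapped statement, return its output and return code."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(wrap_statement(statement), {})
    # Everything before the marker is output, the line after it is the code.
    output, _, code = buffer.getvalue().partition(END_MARKER + '\n')
    return output, int(code)


print(run_wrapped('print(1 + 1)'))  # ('2\n', 0)
print(run_wrapped('1 / 0'))         # ('', 1)
```

Unlike an interactive interpreter, ``exec`` does not echo the value of a bare expression, which is why the first statement calls ``print`` explicitly.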
**Example 12:** this example shows how to use the shell context handler
in situations where a ``with``-statement is not suitable, e.g. if a shell
object should be used in multiple, independently called functions. In this
case the context manager
can be manually entered and exited. The following code generates a global
``ShellCommandExecutor``-instance in the ``ssh``-variable::
>>> from datalad_next.shell import shell
>>> context_manager = shell(['ssh', 'localhost'])
>>> ssh = context_manager.__enter__()
>>> print(ssh(b'ls /etc/passwd').stdout)
b'/etc/passwd\\n'
>>> context_manager.__exit__(None, None, None)
False
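
As an alternative to calling ``__enter__`` and ``__exit__`` directly, the standard library class ``contextlib.ExitStack`` can manage the lifetime of a context manager outside of a ``with``-statement. The sketch below uses a hypothetical stand-in ``fake_shell`` instead of :func:`shell`, so that it runs without an SSH server; whether this pattern fits a given use of :func:`shell` is an assumption that should be verified.

```python
import contextlib

events = []


@contextlib.contextmanager
def fake_shell(shell_cmd):
    """Hypothetical stand-in for a persistent shell context manager."""
    events.append('start')
    try:
        # The yielded callable plays the role of the command executor.
        yield lambda command: f'ran {command}'
    finally:
        events.append('stop')


# Enter the context now, and exit it later via the stack.
stack = contextlib.ExitStack()
executor = stack.enter_context(fake_shell(['ssh', 'localhost']))

result = executor('ls /etc/passwd')

# Closing the stack exits all contexts that were entered through it.
stack.close()

print(result)  # ran ls /etc/passwd
print(events)  # ['start', 'stop']
```

The stack object can be stored globally or passed between independently called functions, and ``stack.close()`` takes over the role of the manual ``__exit__``-call.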
Yields
------
:class:`ShellCommandExecutor`
"""

def train(queue: Queue):