
Commit

update docs.
correctly points to the new lucidoc mode
nsheff committed Mar 21, 2019
1 parent 0cb85d8 commit e91d04b
Showing 4 changed files with 15 additions and 11 deletions.
12 changes: 8 additions & 4 deletions docs/advanced-run-method.md
@@ -20,13 +20,17 @@ If you provide a `target` file, then `pypiper` will first check to see if that t

Since Pypiper runs all your commands from within python (using the `subprocess` python module), it's nice to be aware of the two types of processes that `subprocess` allows: **direct processes** and **shell processes**.

By default, Pypiper will try to guess what kind of process you want, so for most pipelines, it's probably not necessary to understand the details in this section. However, how you write your commands has some implications for memory tracking, and advanced pipeline authors may want to control the process types that Pypiper uses, so this section covers how these subprocesses work.
By default, Pypiper will guess which to use based on your command, so for most pipelines, you don't need to worry about it. However, how you write your commands has some implications for memory tracking, and advanced pipeline authors may want to control the process types that Pypiper uses, so this section covers how these subprocesses work.

**Direct process**: A direct process is one that Python executes directly, from within Python. Python retains control over the process completely. Wherever possible, you should use a direct subprocess -- this has the advantage of enabling Python to monitor the memory use of the subprocess, because Python retains control over it. This is the preferable way of running subprocesses in Python. The disadvantage of direct subprocesses is that you may not use shell-specific operators in a direct subprocess. For instance, if you use an asterisk (`*`) for wildcard expansion, a greater-than sign (`>`) for output redirection, or a pipe (`|`) to link processes -- these are operators understood by a shell like Bash, and thus cannot be run as direct subprocesses in Python.
**Direct process**: A direct process is one that Python executes directly, from within Python. Python retains complete control over the process. Wherever possible, you should use a direct subprocess, because it enables Python to monitor the memory use of the subprocess. This is the preferable way of running subprocesses in Python. The disadvantage of direct subprocesses is that you may not use shell-specific operators in them. For instance, an asterisk (`*`) for wildcard expansion, a greater-than sign (`>`) for output redirection, or a pipe (`|`) to link processes are operators understood by a shell like Bash, and thus cannot be used in a direct subprocess in Python.
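As a minimal illustration of that limitation, the sketch below uses Python's standard `subprocess` module directly (not Pypiper's own API). It launches the Python interpreter as a direct process and asks it to print the arguments it received; the child interpreter is only a stand-in for any command-line tool, and the file names are hypothetical.

```python
import subprocess
import sys

# A direct process: the command is a list of arguments handed straight to the
# operating system, with no shell in between. The child here is the Python
# interpreter itself, asked to print back the arguments it was given.
argv = [sys.executable, "-c", "import sys; print(sys.argv[1:])", "*.txt", ">", "out.txt"]
result = subprocess.run(argv, capture_output=True, text=True, check=True)

# No wildcard expansion or redirection happened: the "shell operators"
# arrived in the child as ordinary, literal arguments.
print(result.stdout.strip())  # ['*.txt', '>', 'out.txt']
```

Because no shell is involved, `*.txt` is not expanded and no `out.txt` file is written; the operators reach the child as plain strings.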

**Shell process**: In a shell process, Python first spawns a shell, and then runs the command in that shell. The spawned shell is then controlled by Python, but the processes started by the shell are not. This allows you to use shell operators (`*`, `|`, `>`), but at the cost of the ability to monitor the memory high-water mark, because Python does not have direct control over subprocesses run inside a subshell.
**Shell process**: In a shell process, Python first spawns a shell, and then runs the command in that shell. The spawned shell is then controlled by Python, but the processes started by the shell are not. This allows you to use shell operators (`*`, `|`, `>`), but at the cost of the ability to monitor memory for each command independently, because Python does not have direct control over subprocesses run inside a subshell.
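For contrast, here is a sketch of a shell process using only the standard library, assuming a POSIX system where `shell=True` starts `/bin/sh`. The file names are hypothetical placeholders created in a throwaway directory for the demonstration; this is plain `subprocess`, not Pypiper's `run`.

```python
import subprocess
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as workdir:
    # Two placeholder input files for the wildcard to match.
    for name in ("a.txt", "b.txt"):
        Path(workdir, name).touch()
    # With shell=True the string is handed to /bin/sh, which expands *.txt
    # and performs the > redirection itself. Python only starts and waits on
    # the shell process; whatever the shell runs is out of its direct view.
    subprocess.run("echo *.txt > listing", shell=True, cwd=workdir, check=True)
    listing = Path(workdir, "listing").read_text()

print(listing, end="")  # a.txt b.txt
```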

You **must** use a shell process if you are using shell operators in your command. You can force Pypiper to use one or the other by specifying `shell=True` or `shell=False` to the `run` function. By default Pypiper will try to guess: if your command contains any of the shell process characters (`*`, `|`, or `>`), it will be run in a shell. Otherwise, it will be run as a direct subprocess. So for most purposes, you do not need to worry about this at all, but you may want to write your commands to minimize shell operators if you are interested in the memory monitoring features of Pypiper.
### How pypiper handles shell subprocesses

Pypiper includes two nice provisions that help us deal with shell processes. First, pypiper divides commands with pipes (`|`) and executes them as *direct processes*. This enables you to pass a piped shell command, but still get the benefit of a direct process. Unless the command is run in a shell, pypiper monitors each process in the pipe individually for its return value and memory use, and reports this information in the pipeline log. Nice! Second, pypiper uses the `psutil` module to monitor the memory of *all child processes*. That means that when you use a shell process, we *do* monitor the memory use of that process (and any other processes it spawns), which gives us more accurate memory monitoring.
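The pipe-splitting idea can be sketched with plain `subprocess.Popen` calls chained stdout-to-stdin. `run_piped` is a hypothetical helper written for this illustration, not Pypiper's actual implementation: it naively splits on every `|` (so it assumes no quoted pipe characters) and assumes a POSIX system with `echo`, `tr`, `false`, and `cat` on the `PATH`. Memory tracking is left out; the point is only that each stage becomes its own direct process with its own return code.

```python
import shlex
import subprocess


def run_piped(cmd):
    """Hypothetical helper: run each '|'-separated stage as a direct process.

    Returns (stdout of the final stage as bytes, list of per-stage return codes).
    """
    stages = [shlex.split(part) for part in cmd.split("|")]
    procs = []
    upstream = None
    for argv in stages:
        proc = subprocess.Popen(argv, stdin=upstream, stdout=subprocess.PIPE)
        if upstream is not None:
            # Close the parent's copy so the upstream stage sees SIGPIPE
            # if the downstream stage exits early.
            upstream.close()
        procs.append(proc)
        upstream = proc.stdout
    # Read the last stage's output, then collect every stage's exit status.
    output, _ = procs[-1].communicate()
    for proc in procs[:-1]:
        proc.wait()
    return output, [proc.returncode for proc in procs]


if __name__ == "__main__":
    print(run_piped("echo hello | tr a-z A-Z"))  # (b'HELLO\n', [0, 0])
```

With `false | cat`, a plain shell pipe would report only the exit status of `cat` (0) unless `pipefail` is set; running each stage separately keeps the failing stage's status of 1 visible.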


vreuter Mar 21, 2019


🏆


You can tell Pypiper directly how to run a command by passing `shell=True` or `shell=False` to the `run` function, but usually you shouldn't have to. By default, Pypiper guesses the process type from the command itself: if the command contains any of the shell process characters (`*` or `>`), it will be run in a shell. If it contains a pipe (`|`), it will be split and run as direct, piped subprocesses. Anything else will be run as a direct subprocess.
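The guessing rule in the paragraph above can be summarized as a small function. `guess_process_type` and its return labels are hypothetical names for illustration only, not part of Pypiper's API, and checking `*`/`>` before `|` is one reading of the text for commands that contain both.

```python
SHELL_OPERATORS = ("*", ">")


def guess_process_type(cmd: str) -> str:
    """Classify a command string as the docs describe (illustrative only)."""
    # Wildcards and redirects only make sense to a shell.
    if any(op in cmd for op in SHELL_OPERATORS):
        return "shell"
    # Pipes can be split into separate direct processes linked together.
    if "|" in cmd:
        return "piped direct"
    # Everything else runs as a single direct process.
    return "direct"


for example in ("cat *.fastq", "zcat reads.fq.gz | wc -l", "samtools index aln.bam"):
    print(f"{example!r:32} -> {guess_process_type(example)}")
```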


vreuter Mar 21, 2019


This reads a little oddly... "force Pypiper" to do what? "will try to guess" what? I think it clears up a bit if the entire paragraph is read, but it could be clearer with something like "You can directly tell pypiper how to run a command..."


## The `nofail` argument

9 changes: 4 additions & 5 deletions docs/autodoc_build/pypiper.md
@@ -39,12 +39,11 @@ Prints the command, and then executes it, then prints the memory use and return

Uses python's subprocess.Popen() to execute the given command. The shell argument is simply
passed along to Popen(). You should use shell=False (default) where possible, because this enables memory
profiling. You should use shell=True if you require shell functions like redirects (>) or stars (*), but this
will prevent the script from monitoring memory use. The pipes (|) will be used to split the command into
subprocesses run within python, so the memory profiling is possible.
profiling. You should use shell=True if you require shell functions like redirects (>) or pipes (|), but this
will prevent the script from monitoring memory use.
cmd can also be a series (a dict object) of multiple commands, which will be run in succession.
```python
def callprint(self, cmd, shell=None, lock_file=None, nofail=False, container=None):
def callprint(self, cmd, shell, lock_file=None, nofail=False, container=None):
```

**Parameters:**
@@ -318,7 +317,7 @@ it will first create a lock file to prevent other calls to run
is being created. It also records the memory of the process and
provides some logging output.
```python
def run(self, cmd, target=None, lock_name=None, shell=None, nofail=False, clean=False, follow=None, container=None):
def run(self, cmd, target=None, lock_name=None, shell=None, nofail=False, errmsg=None, clean=False, follow=None, container=None):
```

**Parameters:**
File renamed without changes.
5 changes: 3 additions & 2 deletions mkdocs.yml
@@ -19,10 +19,11 @@ nav:
- Cleaning up intermediate files: clean.md
- Best practices: best-practices.md
- Toolkits:
- "NGSTk: the NGS toolkit": ngstk.md
- "NGSTk: the NGS toolkit": ngstk_intro.md
- Reference:
- Catalog of pipeline outputs: outputs.md
- API: autodoc_build/pypiper.md
- Pypiper API: autodoc_build/pypiper.md
- NGSTk API: autodoc_build/ngstk.md
- FAQ: faq.md
- Support: support.md
- Contributing: contributing.md
