update docs.

correctly points to the new ludicoc mode
databio · Mar 21, 2019 · e91d04b · vreuter · Mar 21, 2019 · vreuter
1 parent 0cb85d8
commit e91d04b

Showing 4 changed files with 15 additions and 11 deletions.
diff --git a/docs/advanced-run-method.md b/docs/advanced-run-method.md
@@ -20,13 +20,17 @@ If you provide a `target` file, then `pypiper` will first check to see if that t
 
 Since Pypiper runs all your commands from within python (using the `subprocess` python module), it's nice to be aware of the two types of processes that `subprocess` allows: **direct processes** and **shell processes**.
 
-By default, Pypiper will try to guess what kind of process you want, so for most pipelines, it's probably not necessary to understand the details in this section. However, how you write your commands has some implications for memory tracking, and advanced pipeline authors may want to control the process types that Pypiper uses, so this section covers how these subprocesses work.
+By default, Pypiper will guess which to use based on your command, so for most pipelines, you don't need to worry about it. However, how you write your commands has some implications for memory tracking, and advanced pipeline authors may want to control the process types that Pypiper uses, so this section covers how these subprocesses work.
 
-**Direct process**: A direct process is one that Python executes directly, from within python. Python retains control over the process completely. Wherever possible, you should use a direct subprocess -- this has the advantage of enabling Python to monitor the memory use of the subprocess, because Python retains control over it. This the preferable way of running subprocesses in Python. The disadvantage of direct subprocesses is that you may not use shell-specific operators in a direct subprocess. For instance, if you use an asterisk (`*`) for wildcard expansion, or a bracket (`>`) for output redirection, or a pipe (`|`) to link processes -- these are commands understood by a shell like Bash, and thus, cannot be run as direct subprocesses in Python.
+**Direct process**: A direct process is one that Python executes directly, from within Python. Python retains complete control over the process. Wherever possible, you should use a direct subprocess because it enables Python to monitor the memory use of the subprocess. This is the preferable way of running subprocesses in Python. The disadvantage of direct subprocesses is that you may not use shell-specific operators in them. For instance, if you use an asterisk (`*`) for wildcard expansion, an angle bracket (`>`) for output redirection, or a pipe (`|`) to link processes, these are operators understood by a shell like Bash, and thus cannot be run as direct subprocesses in Python.
 
-**Shell process**: In a shell process, Python first spawns a shell, and then runs the command in that shell. The spawned shell is then controlled by Python, but processes done by the shell are not. This allows you to use shell operators (`*`, `|`, `>`), but at the cost of the ability to monitor memory high water mark, because Python does not have direct control over subprocesses run inside a subshell. 
+**Shell process**: In a shell process, Python first spawns a shell, and then runs the command in that shell. The spawned shell is then controlled by Python, but processes done by the shell are not. This allows you to use shell operators (`*`, `|`, `>`), but at the cost of the ability to monitor memory for each command independently, because Python does not have direct control over subprocesses run inside a subshell. 
 
-You **must** use a shell process if you are using shell operators in your command.  You can force Pypiper to use one or the other by specifying `shell=True` or `shell=False` to the `run` function. By default Pypiper will try to guess: if your command contains any of the shell process characters (`*`, `|`, or `>`), it will be run in a shell. Otherwise, it will be run as a direct subprocess. So for most purposes, you do not need to worry about this at all, but you may want to write your commands to minimize shell operators if you are interested in the memory monitoring features of Pypiper.
+### How pypiper handles shell subprocesses
+
+Pypiper includes two provisions that help us deal with shell processes. First, pypiper splits commands at pipes (`|`) and executes the pieces as *direct processes*. This enables you to pass a piped shell command, but still get the benefit of a direct process. Unlike running the command in a shell directly, pypiper monitors each process in the pipe individually for return value and memory use, and reports this information in the pipeline log. Nice! Second, pypiper uses the `psutil` module to monitor memory of *all child processes*. That means when you use a shell process, we *do* monitor the memory use of that process (and any other processes it spawns), which gives us more accurate memory monitoring.
+
+You can force Pypiper to use one or the other by specifying `shell=True` or `shell=False` to the `run` function, but really, you shouldn't have to. By default Pypiper will try to guess: if your command contains any of the shell process characters (`*` or `>`), it will be run in a shell. If it contains a pipe (`|`), it will be split and run as direct, piped subprocesses. Anything else will be run as a direct subprocess.
 
 ## The `nofail` argument
 

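The guessing rule and the pipe-splitting behaviour described in the new text above can be sketched with the standard library alone. This is a minimal illustration, not Pypiper's actual implementation: `guess_process_type` and `run_piped_direct` are hypothetical helper names, the precedence of `*`/`>` over `|` when both appear is an assumption, and the naive split on `|` ignores quoting.

```python
import shlex
import subprocess


def guess_process_type(cmd):
    """Classify a command string (hypothetical helper, not a Pypiper function)."""
    # Assumption: shell-only characters take precedence over pipes.
    if "*" in cmd or ">" in cmd:
        return "shell"
    if "|" in cmd:
        return "piped-direct"
    return "direct"


def run_piped_direct(cmd):
    """Split cmd at pipes and run each piece as a direct process.

    Each piece's stdout feeds the next piece's stdin, so no shell is needed.
    Returns the final stdout (bytes) and the return code of every piece.
    """
    procs = []
    prev_stdout = None
    for piece in cmd.split("|"):  # naive split: does not respect quoting
        proc = subprocess.Popen(
            shlex.split(piece), stdin=prev_stdout, stdout=subprocess.PIPE
        )
        if prev_stdout is not None:
            # Close the parent's copy so the upstream process sees a
            # broken pipe if the downstream process exits early.
            prev_stdout.close()
        prev_stdout = proc.stdout
        procs.append(proc)
    output, _ = procs[-1].communicate()
    returncodes = [p.wait() for p in procs]
    return output, returncodes
```

Because every piece is its own `Popen` object, the caller holds a handle (and a return code) for each step of the pipe, which is what makes per-process monitoring possible.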
diff --git a/docs/autodoc_build/pypiper.md b/docs/autodoc_build/pypiper.md
@@ -39,12 +39,11 @@ Prints the command, and then executes it, then prints the memory use and return
 
 Uses python's subprocess.Popen() to execute the given command. The shell argument is simply
 passed along to Popen(). You should use shell=False (default) where possible, because this enables memory
-profiling. You should use shell=True if you require shell functions like redirects (>) or stars (*), but this
-will prevent the script from monitoring memory use. The pipes (|) will be used to split the command into
-subprocesses run within python, so the memory profiling is possible.
+profiling. You should use shell=True if you require shell functions like redirects (>) or pipes (|), but this
+will prevent the script from monitoring memory use.
 cmd can also be a series (a dict object) of multiple commands, which will be run in succession.
 ```python
-def callprint(self, cmd, shell=None, lock_file=None, nofail=False, container=None):
+def callprint(self, cmd, shell, lock_file=None, nofail=False, container=None):
 ```
 
 **Parameters:**
@@ -318,7 +317,7 @@ it will first create a lock file to prevent other calls to run
 is being created. It also records the memory of the process and
 provides some logging output.
 ```python
-def run(self, cmd, target=None, lock_name=None, shell=None, nofail=False, clean=False, follow=None, container=None):
+def run(self, cmd, target=None, lock_name=None, shell=None, nofail=False, errmsg=None, clean=False, follow=None, container=None):
 ```
 
 **Parameters:**

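Since the docstring above says the `shell` argument is passed along to `subprocess.Popen()`, the practical difference can be sketched with plain `subprocess` calls. This is an illustrative sketch assuming a POSIX system with `echo` and `tr` available; `run_direct` and `run_in_shell` are hypothetical helpers, not part of Pypiper.

```python
import subprocess


def run_direct(args):
    """Run an argument list as a direct process (shell=False) and return stdout."""
    return subprocess.run(args, capture_output=True, text=True).stdout


def run_in_shell(cmd):
    """Run a command string through the shell (shell=True) and return stdout."""
    return subprocess.run(cmd, shell=True, capture_output=True, text=True).stdout


# Without a shell, ">" is just another argument handed to echo;
# no redirection happens and no file is created.
direct_output = run_direct(["echo", "hi", ">", "out.txt"])

# With a shell, the pipe is interpreted and tr receives echo's output.
shell_output = run_in_shell("echo hi | tr a-z A-Z")
```

In the direct case the operator is passed through literally, which is why commands that rely on redirects or wildcards need `shell=True`.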
diff --git a/docs/ngstk.md b/docs/ngstk_intro.md
rename from docs/ngstk.md
rename to docs/ngstk_intro.md
diff --git a/mkdocs.yml b/mkdocs.yml
@@ -19,10 +19,11 @@ nav:
     - Cleaning up intermediate files: clean.md
     - Best practices: best-practices.md
   - Toolkits:
-    - "NGSTk: the NGS toolkit": ngstk.md    
+    - "NGSTk: the NGS toolkit": ngstk_intro.md    
   - Reference:
     - Catalog of pipeline outputs: outputs.md
-    - API: autodoc_build/pypiper.md
+    - Pypiper API: autodoc_build/pypiper.md
+    - NGSTk API: autodoc_build/ngstk.md
     - FAQ: faq.md
     - Support: support.md
     - Contributing: contributing.md