# Subprocess and OS (WINDOWS)

### Obligatory wall of text
One of the most common, yet complicated, tasks that most programming languages need to do is spawning new processes. This could be as simple as seeing what files are present in the current working directory (`ls`) or as complicated as creating a program workflow that *pipes* output from one program into another program's input. <br/><br/>
Many such tasks are easily taken care of through Python libraries and modules (`import`) that *wrap* the programs in Python code, effectively creating Application Programming Interfaces (APIs). <br/><br/>
However, there are many use cases that require the user to make calls to the terminal from ***within*** a Python program.

### Operating System Conundrum

First, we need to address the following issue. As many in this class have found out, Python can be installed on most operating systems, but doing the same thing in one operating system (Unix) may not always yield the same results in another (Windows).<br/><br/>
The first step toward making a program "OS-agnostic" is the `os` module.

In [1]:
import os

Certain commands are the same between most environments:<br/>
For instance, getting the current working directory:<br/>
```
pwd
```
\*Note: In Python, this is accomplished by:

In [2]:
os.getcwd()

'C:\\Users\\MDS\\Dropbox\\BIOINF575W18\\B575W18'

That isn't always the case though. The `os` module wraps OS-specific operations into a set of standardized commands. <br/>
For instance, the Linux end-of-line (EOL) character is a `\n`, but `\r\n` in Windows. In Python, we can just use the following:

In [3]:
os.linesep

'\r\n'

`os.linesep` holds the EOL sequence of the platform Python is currently running on, which is why the value above is the Windows one.
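As a small illustration of where that EOL shows up in practice, the sketch below writes a single `\n` to a file opened in text mode and then reads the raw bytes back. The file name `eol_demo.txt` is a hypothetical placeholder, and the example assumes the default `newline` behaviour of `open()`, which translates `\n` to the platform's `os.linesep` on write.

```python
import os
import tempfile

# Write one "\n" in text mode; with the default newline handling,
# Python translates it to os.linesep when the data reaches the disk.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "eol_demo.txt")  # hypothetical file name
    with open(path, "w") as handle:
        handle.write("line one\n")
    # Read back in binary mode to see the untranslated bytes.
    with open(path, "rb") as handle:
        raw = handle.read()

print(repr(os.linesep))
print(raw)
```

In other words, code that writes text files can usually stick to `\n` and let Python apply the platform convention.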

Another example: in a Linux environment, one uses the following command to list the contents of a given directory:
```
ls -alh 
```

In Windows, the equivalent is as follows:
```
dir
```

Python lets users run a single command, regardless of the OS:

In [4]:
os.listdir()

['.ipynb_checkpoints',
 'blast.py',
 'Comps_and_generators.ipynb',
 'Git_tutorial.ipynb',
 'Git_tutorial.pptx',
 'README.md',
 'scripts',
 'Subprocess.ipynb']

However, the biggest issue for creating an OS-agnostic program is ***paths***: <br/>
Windows: `C:\Users\MDS\Documents\`<br/>
Linux: `/mnt/c/Users/MDS/Documents/`<br/><br/>
Enter Python:

In [5]:
# working out of C as my working directory
os.path.join('.','Users','MDS','Documents')

'.\\Users\\MDS\\Documents'
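The same idea can be sketched without hard-coding any separator. The snippet below builds the path from its parts with `os.path.join` and, as an alternative that is not used elsewhere in this notebook, with `pathlib.Path` from the standard library. The directory names are just the ones from the example above.

```python
import os
from pathlib import Path

parts = ["Users", "MDS", "Documents"]

# os.path.join inserts whatever separator the current OS uses.
joined = os.path.join(*parts)
print(joined)

# pathlib offers an object-oriented way to build the same path.
same_path = Path(*parts)
print(same_path)
```

On Windows both lines print the path with backslashes, and on Linux with forward slashes, without any change to the code.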

## ***NOTE***: From here on out, this notebook will *only* work on **Linux**

# Subprocess

If you Google how to run shell commands without specifying Python 3.x, you will likely get an answer that includes `popen`, `popen2`, or `popen3`. These were once the most common ways to *open* a new *p*rocess. Since Python 3.5, the recommended interface is a single function called `run`, available through the `subprocess` module.

In [None]:
import subprocess

The first things we will look at are trivial examples that demonstrate capturing just the *output* (stdout) of a program.

In [None]:
# check_output returns a bytestring by default, so I set encoding to convert it to strings.
print(subprocess.check_output(['ls', '-ahl', os.getcwd()], encoding='utf_8'))

However, while the `check_output` function is still in the `subprocess` module, it can easily be converted into a more specific and/or flexible `run` function signature.

In [None]:
sub = subprocess.run(['ls', '-ahl', os.getcwd()],
                     encoding='utf_8', stdout=subprocess.PIPE, check=True)

In [None]:
# Let's see what sub can do (list its public attributes and methods)
[i for i in dir(sub) if not i.startswith('_')]

In [None]:
print(sub.stdout)

The main utility of `check_output` was to capture the output (stdout) of a program. With `run`, passing the `stdout=subprocess.PIPE` argument captures the output just as easily, along with the return code. A return code signifies the program's exit status: 0 for success, anything else for failure.

In [None]:
sub.returncode

With our `run` code above, our program ran to completion, exiting with status 0. The next example shows a different status.

In [None]:
print(subprocess.run('exit 1', shell = True))

However, if the `check=True` argument is used, it will raise a `CalledProcessError` if your program exits with anything different than 0. This is helpful for detecting a pipeline failure, and exiting or correcting before attempting to continue computation.

In [None]:
print(subprocess.run('exit 1', shell = True, check = True))
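One way to react to such a failure, rather than letting the exception stop the notebook, is to catch it. The sketch below is only an illustration: it uses `sys.executable` with a one-line script as a stand-in for a failing program, so that it does not depend on a particular shell, and the variable names are hypothetical.

```python
import subprocess
import sys

# Stand-in for a failing program: a Python child that exits with status 1.
failing_command = [sys.executable, "-c", "import sys; sys.exit(1)"]

try:
    subprocess.run(failing_command, check=True)
    exit_status = 0
except subprocess.CalledProcessError as err:
    # The exception carries the exit status of the child process.
    exit_status = err.returncode
    print(f"Command failed with exit status {exit_status}")
```

In a pipeline, the `except` branch is where one could clean up intermediate files or stop before running the next step.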

## Syntax

Hopefully, you have picked up that I seemingly used two different syntaxes when using `run`:<br/>
1. A list of arguments: `subprocess.run(['ls', '-ahl', os.getcwd()], ...)` 
2. A string and `shell`: `subprocess.run('exit 1', shell = True, ...)`

The preferred way of using `run` is the first. This is mainly for security (it prevents shell injection attacks), but it also lets the module take care of any required escaping and quoting of arguments, for a pseudo-OS-agnostic approach. That said, some programs only work on one OS, so in those cases the choice between the two forms often comes down to habit.
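To make the security point concrete, here is a minimal sketch of what the list form does with an argument that contains shell syntax. The "untrusted" string and the small child script are made up for illustration, and `sys.executable` is used as the program so the example does not rely on `ls` or `echo` being available.

```python
import subprocess
import sys

# An argument that would run a second command if a shell interpreted it.
untrusted = "hello; echo injected"

# The child simply prints the first argument it receives.
child_code = "import sys; print(sys.argv[1])"

result = subprocess.run([sys.executable, "-c", child_code, untrusted],
                        stdout=subprocess.PIPE, encoding="utf-8", check=True)
received = result.stdout.strip()
print(received)
```

Because no shell is involved, the `;` is never interpreted: the child receives the whole string as a single argument.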

There are some guidelines though:
1. Sequence (list) of arguments is generally preferred
2. A str is appropriate if the user is just calling a program with no arguments
3. The user should use a str to pass the command and its arguments if `shell` is `True`<br/>

Your next question should be, "What is `shell`?"

`shell` is just your terminal/command prompt. This is the environment in which you call `ls` or `dir`. It is also where users can define variables. More importantly, this is where your *environment variables* are set...like `PATH`.<br/><br/>
By using `shell = True`, the user can use shell-based environment variable expansion from within a Python program.

In [None]:
print(subprocess.run('echo $PATH', shell = True, 
                     stdout=subprocess.PIPE, encoding='utf_8')
      .stdout.split(':')[:5])

For the most part, you shouldn't need `shell`, simply because Python has modules in the standard library that cover most shell commands. For example, `mkdir` can be done with `os.mkdir()`, and `$PATH` can be read from `os.environ['PATH']` (note that `sys.path` is something different: Python's module search path).
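As a shell-free counterpart to the `echo $PATH` cell above, the sketch below reads the same environment variable directly from Python. It assumes only the standard `os.environ` mapping and `os.pathsep`, which is `:` on Linux and `;` on Windows. The variable name `path_entries` is arbitrary.

```python
import os

# os.environ exposes the environment variables of the current process;
# os.pathsep is the separator between PATH entries on this OS.
path_entries = os.environ.get("PATH", "").split(os.pathsep)
print(path_entries[:5])
```

Splitting on `os.pathsep` rather than a hard-coded `':'` keeps the snippet working on both Linux and Windows.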

# Blocking vs Non-blocking

The last topic of this lecture is "blocking". This is computer science lingo/jargon for whether or not a program ***waits*** until something is complete before moving on. Think of this like a really bad website that takes forever to load because it is waiting until it has rendered all its images first, versus the website that sets the formatting and text while it works on the images.

For the purposes of instruction, here are a few notes:
1. `subprocess.run()` is blocking (it waits until the process is complete)
2. `subprocess.Popen()` is non-blocking (it will run the command, then move on)
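The difference can be observed directly. The sketch below is only illustrative: it starts a child that sleeps for one second (a Python one-liner run through `sys.executable`, chosen so that no `sleep` program is required) and checks on it with `poll()`, which returns `None` while the process is still running, before finally blocking on `wait()`.

```python
import subprocess
import sys
import time

# Stand-in for a slow program: a child that sleeps for one second.
sleeper = [sys.executable, "-c", "import time; time.sleep(1)"]

start = time.perf_counter()
proc = subprocess.Popen(sleeper)       # returns right away
status_while_running = proc.poll()     # None: the child has not finished yet
print(f"Popen returned after {time.perf_counter() - start:.2f} s")

final_status = proc.wait()             # blocks until the child exits
total = time.perf_counter() - start
print(f"Child exited with status {final_status} after {total:.2f} s")
```

Replacing `Popen` with `subprocess.run(sleeper)` would make the very first call take the full second, since `run()` waits for the child before returning.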

As said before, ***most*** use cases can be taken care of through the use of `run()`. However, `Popen()` allows the user a more flexible control of the subprocess call. `run()` is just a *wrapped* version of `Popen()` that simplifies use. However, `Popen()` can be used *almost* exactly the same way (albeit with more optional parameters).

An example use case for `Popen()` is if the user has some intermediate data that needs to get processed, but the output of that data doesn't necessarily affect the rest of the pipeline.

In [None]:
with subprocess.Popen(['ls', '-ahl', os.getcwd()],
                     encoding='utf_8', stdout=subprocess.PIPE) as proc:
    print(proc.stdout.read())

Furthermore, `Popen` can create background processes. As such, `Popen` has a lot more functionality than `run`:

In [None]:
sub_popen = subprocess.Popen(['ls', '-ahl', os.getcwd()],
                     encoding='utf_8', stdout=subprocess.PIPE)
# Print the public attributes and methods of the Popen object
for j in dir(sub_popen):
    if not j.startswith('_'):
        print(j)
sub_popen.kill()
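One thing this extra flexibility allows is the *piping* mentioned at the start of this notebook: feeding one program's output into another program's input. The sketch below follows the general pattern of connecting one `Popen` object's `stdout` to another's `stdin`. The two child programs are made-up Python one-liners run through `sys.executable`, standing in for real command-line tools.

```python
import subprocess
import sys

# Hypothetical stand-ins for two command-line tools:
# the producer prints three words, the consumer sorts the lines it reads.
producer_code = "print('banana'); print('apple'); print('cherry')"
consumer_code = "import sys; print(''.join(sorted(sys.stdin)), end='')"

producer = subprocess.Popen([sys.executable, "-c", producer_code],
                            stdout=subprocess.PIPE)
consumer = subprocess.Popen([sys.executable, "-c", consumer_code],
                            stdin=producer.stdout,
                            stdout=subprocess.PIPE, encoding="utf-8")

# Close our copy of the pipe so only the consumer holds the read end.
producer.stdout.close()

piped_output, _ = consumer.communicate()  # wait for the consumer and read its output
producer.wait()
print(piped_output)
```

Both children run at the same time, and the data flows between them without being stored in the parent Python program.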

# Quick BLAST Workshop