### Subprocesses

One of the biggest strengths of Python is that it can be used as a *glue* language. <br>
It can 'glue' together a series of programs into a flexible and highly extensible pipeline.

### Why subprocesses
One of the most common, yet complicated, tasks that most programming languages need to do is creating new processes. <br>
This could be as simple as seeing what files are present in the current working directory (`ls`) or as complicated as creating a program workflow that *pipes* output from one program into another program's input. <br/><br/>
Many such tasks are easily taken care of through the use of Python libraries and modules (`import`) that *wrap* the programs into Python code, effectively creating Application Programming Interfaces (APIs). <br/><br/>
However, there are many use cases that require the user to make calls to the terminal from ***within*** a Python program.

#### Operating System Conundrum

First, we need to address the following issue. As many in this class have found out, Python can be installed on most operating systems, but doing the same thing on one operating system (Unix) may not always yield the same results on another (Windows).<br/><br/>
The very first step to making a program **"OS-agnostic"** is to use the `os` module.

In [None]:
import os

https://docs.python.org/3/library/os.html

In [None]:
#dir(os)

In [None]:
for elem in dir(os):
    if "error" in elem:
        print(elem)

In [None]:
# The name of the operating system dependent module imported.
# The following names have currently been registered: 'posix', 'nt', 'java'
# 'posix' - Portable Operating System Interface, an IEEE standard designed to facilitate application portability
# 'nt' - (Windows) New Technology, the Windows NT family of operating systems
os.name

In [None]:
# Returns information identifying the current operating system (Unix only; not available on Windows).
# The return value is an object with five attributes:
# - sysname - operating system name
# - nodename - name of machine on network (implementation-defined)
# - release - operating system release
# - version - operating system version
# - machine - hardware identifier

os.uname()
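
Because `os.uname()` exists only on Unix-like systems, an OS-agnostic program needs another source for the same information. The sketch below uses `platform.uname()` from the standard library, which returns similar fields on every platform. The variable name `info` is just for illustration.

```python
import platform

# platform.uname() works on Windows as well as on Unix-like systems
info = platform.uname()

print(info.system)    # Operating system name, e.g. 'Linux', 'Darwin', or 'Windows'
print(info.node)      # Name of the machine on the network
print(info.release)   # Operating system release
print(info.machine)   # Hardware identifier, e.g. 'x86_64'
```

The values printed depend entirely on the machine running the cell.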

In [None]:
import sys

# https://docs.python.org/3/library/sys.html
# This string contains a platform identifier that can be used to append platform-specific components
# to sys.path, for instance.
    
sys.platform
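
One way to put `os.name` and `sys.platform` to work is to branch on them when a platform-specific command cannot be avoided. The following is a minimal sketch; the helper `describe_platform` and the variable `list_command` are hypothetical names made up for this example.

```python
import os
import sys


def describe_platform():
    """Return a short, human-readable label for the current platform."""
    if sys.platform.startswith("win"):
        return "Windows"
    if sys.platform == "darwin":
        return "macOS"
    if sys.platform.startswith("linux"):
        return "Linux"
    return f"other ({sys.platform})"


# os.name is coarser than sys.platform: 'nt' on Windows, 'posix' on Linux and macOS
list_command = ["cmd", "/c", "dir"] if os.name == "nt" else ["ls", "-alh"]

print(describe_platform(), os.name, list_command)
```

Branching like this is a last resort. Whenever the standard library offers a portable function, prefer it over choosing between OS-specific commands.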

In [None]:
# A list of strings that specifies the search path for modules. 

sys.path

In [None]:
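# A mapping object representing the process environment (variable name -> value).
# Indexing raises KeyError if the variable is not set; HOME is typically unset on Windows.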

os.environ['HOME']

In [None]:
os.environ

In [None]:
# Return the value of the environment variable key if it exists,
# or default if it doesn’t. key, default and the result are str.

os.getenv("HOME")

In [None]:
os.getenv("PATH")
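
Environment variables are not guaranteed to exist, so it helps to supply a fallback. The sketch below shows two ways to do that. The variable name `MY_COURSE_EDITOR` is made up for the example and is assumed to be unset. `USERPROFILE` is the variable Windows normally uses for the home directory.

```python
import os

# Try HOME first, then USERPROFILE, then let Python work out the home directory
home = os.environ.get("HOME") or os.environ.get("USERPROFILE") or os.path.expanduser("~")

# os.getenv takes the default as its second argument
editor = os.getenv("MY_COURSE_EDITOR", "nano")   # MY_COURSE_EDITOR is a hypothetical name

# PATH is a single string; os.pathsep is ':' on Unix and ';' on Windows
path_dirs = os.getenv("PATH", "").split(os.pathsep)

print(home)
print(editor)
print(path_dirs[:3])
```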

In [None]:
# Returns the list of directories that will be searched for a named executable,
# similar to a shell, when launching a process.
# env, when specified, should be an environment variable dictionary to lookup the PATH in. 
# By default, when env is None, environ is used.

os.get_exec_path()

The `os` module wraps OS-specific operations into a set of standardized commands. <br>
For instance, the end-of-line (EOL) sequence is `\n` on Linux, but `\r\n` on Windows. <br>
In Python, we can just use the following:

In [None]:
# EOL - for the current (detected) environment

'''
The string used to separate (or, rather, terminate) lines on the current platform. 
This may be a single character, such as '\n' for POSIX, or multiple characters, 
for example, '\r\n' for Windows. 
Do not use os.linesep as a line terminator when writing files opened in text mode (the default); 
use a single '\n' instead, on all platforms.
'''

os.linesep
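
The advice about text mode can be checked directly. This sketch writes a file in text mode using plain `'\n'`, then reads it back in binary mode to see what was stored on disk. The file name `eol_demo.txt` is arbitrary, and a temporary directory is used so nothing is left behind.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "eol_demo.txt")

    # Text mode: write '\n' and let Python translate it to os.linesep on disk
    with open(path, "w") as handle:
        handle.write("first\nsecond\n")

    # Binary mode shows the bytes that were actually written
    with open(path, "rb") as handle:
        raw = handle.read()

    # Text mode translates the platform line ending back to '\n'
    with open(path) as handle:
        text = handle.read()

print(repr(raw))
print(repr(text))
```

On Windows the raw bytes contain `\r\n`, while on Linux and macOS they contain `\n`. The text read back is the same everywhere.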

Another example: in a Linux environment, the following command lists the contents of a given directory:
```
ls -alh 
```

In Windows, the equivalent is as follows:
```
dir
```

Python lets users run a single command, regardless of the OS:

In [None]:
# List directory contents

os.listdir("demoCM")
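
If the `demoCM` directory is not available, the same idea can be tried on a throwaway directory. The sketch below builds a small directory with made-up file names, lists it, and keeps only the regular files.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Create a small, known layout: one sub-directory and two files
    os.mkdir(os.path.join(tmp, "subdir"))
    for name in ("b.txt", "a.txt"):
        with open(os.path.join(tmp, name), "w") as handle:
            handle.write("demo\n")

    # os.listdir returns names in arbitrary order, so sort them for stable output
    entries = sorted(os.listdir(tmp))

    # Keep only regular files (drop directories)
    files = [e for e in entries if os.path.isfile(os.path.join(tmp, e))]

print(entries)
print(files)
```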

However, the biggest issue for creating an OS-agnostic program is ***paths***. <br/>
Windows: `"C:\\Users\\MDS\\Documents"`<br/>
Linux: `/mnt/c/Users/MDS/Documents/`<br/><br/>
Enter Python:

In [None]:
# path joining from pwd
pwd = os.getcwd()
os.path.join(pwd,"test.py")
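
To see what `os.path.join` is doing, it can help to build a path from its components and compare it with the platform separator `os.sep`. This sketch also shows `pathlib.Path`, an object-oriented alternative from the standard library. The path components are taken from the example paths above and do not need to exist on disk.

```python
import os
from pathlib import Path

# os.path.join inserts the separator that is correct for the current OS
joined = os.path.join("Users", "MDS", "Documents", "test.py")
print(joined)

# pathlib builds the same path with the / operator
as_path = Path("Users") / "MDS" / "Documents" / "test.py"
print(as_path)

print(os.path.basename(joined))   # Final component of the path
print(as_path.suffix)             # File extension, including the dot
```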

### `subprocess`

If you Google how to run shell commands from Python without specifying Python 3.x, you will likely get an answer that includes `popen`, `popen2`, or `popen3`. These were once the most common ways to *open* a new *p*rocess. The `subprocess` module replaced these older functions, and since Python 3.5 its `run` function is the recommended way to invoke a subprocess for most use cases.

In [None]:
# Import and alias
import subprocess as sp

#### `check_output`

In [None]:
# check_output returns bytes by default; passing encoding decodes the output to str.
# The command is given as a list: [command, command-line arguments]

sp.check_output(["echo","test"],encoding='utf_8')

In [None]:
sp.check_output([os.path.join(pwd,"test.py"),"[1,2,3]"],encoding='utf_8')

The examples above are trivial demonstrations of capturing just the *output* (stdout) of a program.

However, while the `check_output` function is still in the `subprocess` module, it can easily be converted into a more specific and/or flexible `run` function call.
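
The relationship between the two functions can be sketched side by side. To keep the example independent of `echo`, the child process here is the Python interpreter itself (`sys.executable`), which is a substitution made for portability rather than something the examples above use.

```python
import subprocess as sp
import sys

# A child process that prints one line; sys.executable is the running interpreter
command = [sys.executable, "-c", "print('test')"]

# check_output returns the captured stdout directly
from_check_output = sp.check_output(command, encoding="utf_8")

# The same result with run: capture stdout, raise on failure, then read .stdout
from_run = sp.run(command, stdout=sp.PIPE, check=True, encoding="utf_8").stdout

print(repr(from_check_output))
print(from_check_output == from_run)
```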

#### `run`

In [None]:
sub = sp.run(
    [
        'echo',             # The command we want to run
        'test'              # Arguments for the command
    ],
    encoding='utf_8',       # Decode the output from bytes to str
    stdout=sp.PIPE,         # Where to send the output
    check=True              # Whether to raise an error if the process fails
)  
sub

In [None]:
dir(sub)

In [None]:
print(sub.stdout)

The main utility of `check_output` was to capture the output (stdout) of a program. <br>
By using the `stdout=subprocess.PIPE` argument with `run`, the output can be captured just as easily, along with the return code. <br>
A return code signifies the program's exit status: 0 for success, any other value for failure.

In [None]:
sub.returncode

With our `run` code above, our program ran to completion, exiting with status 0. The next example shows a different status.

In [None]:
sp.run(
        'exit 1',      # Command & arguments
        shell = True   # Run from the shell
        )


However, if the `check=True` argument is used, it will raise a `CalledProcessError` if your program exits with anything other than 0. This is helpful for detecting a pipeline failure, and exiting or correcting before attempting to continue computation.

In [None]:
sp.run(
        'exit 1',      # Command & arguments
        shell = True,  # Run from the shell
        check = True   # Check exit status
    )
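
In a pipeline, the `CalledProcessError` is usually caught so the program can report the failure or recover. The following is a minimal sketch of that pattern. The child process is the Python interpreter exiting with status 3, chosen only so the example runs the same way on every OS.

```python
import subprocess as sp
import sys

# A child process that exits with status 3
command = [sys.executable, "-c", "import sys; sys.exit(3)"]

try:
    sp.run(command, check=True)
    exit_code = 0
except sp.CalledProcessError as err:
    # The exception carries the command and its exit status
    exit_code = err.returncode
    print(f"Command failed with exit code {err.returncode}")
```

From here the program could retry, skip the step, or stop the pipeline, depending on what the failed step was responsible for.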

In [None]:
sub = sp.run(
        'exit 1',      # Command & arguments
        shell = True,  # Run from the shell
        # check = True   # Check exit status
    )
if sub.returncode != 0:
    print(f"Exit code {sub.returncode}. Expected 0 when there is no error.")

#### Syntax

Syntax when using `run`:<br/>
1. A list of arguments: `subprocess.run(['echo', 'test', ...], ...)` 
2. A string and `shell`: `subprocess.run('exit 1', shell = True, ...)`

The preferred way of using `run` is the first way. <br>
This preference is mainly for security reasons (it prevents shell injection attacks). <br>
It also lets the module take care of any required escaping and quoting of arguments, for a more OS-agnostic approach. 
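
The security point can be illustrated with a small sketch. Here `user_input` stands in for text from an untrusted source, and the child process is a one-line Python program that prints its first argument. Both are made up for the example.

```python
import subprocess as sp
import sys

# Pretend this string came from an untrusted user
user_input = "hello; echo injected"

# Child program that prints its first argument exactly as it received it
child = [sys.executable, "-c", "import sys; print(sys.argv[1])"]

# With a list, the whole string is passed as ONE argument; no shell interprets the ';'
result = sp.run(child + [user_input], stdout=sp.PIPE, encoding="utf_8", check=True)

print(result.stdout)
```

Had the same text been pasted into a command string and run with `shell = True`, the shell would have treated `;` as a command separator and run `echo injected` as a second command. If a shell string is unavoidable on a POSIX shell, `shlex.quote()` can be used to escape such input.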

There are some guidelines though:
1. A sequence (list) of arguments is generally preferred
2. A str is appropriate if the user is just calling a program with no arguments
3. The user should use a str to pass arguments if `shell` is `True`

Your next question should be, "What is `shell`?"

The shell is the command interpreter behind your terminal/command prompt. It is the environment in which you call `ls`/`dir`. It is also where users can define variables. More importantly, this is where your *environment variables* are set...like `PATH`.<br/><br/>
By using `shell = True`, the user can now use shell-based environment variable expansion from within a Python program.

In [None]:
sp.run(
        'echo $PATH',            # Command
        shell = True,            # Use the shell
        stdout=sp.PIPE,          # Where to send it
        encoding='utf_8'         # Convert from bytes to string
    )      # Look at the output


In [None]:
p1 = sp.run(
        'sleep 5',               # Command
        shell = True,            # Use the shell
        stdout=sp.PIPE,          # Where to send it
        encoding='utf_8'         # Convert from bytes to string
    )
print(p1)
p2 = sp.run(
        'echo done',             # Command
        shell = True,            # Use the shell
        stdout=sp.PIPE,          # Where to send it
        encoding='utf_8'         # Convert from bytes to string
    )
print(p2)

For the most part, you shouldn't need to use `shell`, simply because Python has modules in the standard library that can do most of what shell commands do. For example, `mkdir` can be done with `os.mkdir()`, and `$PATH` can be retrieved using `os.getenv("PATH")` or `os.get_exec_path()`, as shown above. 
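
A few more shell commands and their standard-library counterparts are sketched below. The directory and file names are invented for the example, and everything happens inside a temporary directory.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Like `mkdir -p results/run1`
    new_dir = os.path.join(tmp, "results", "run1")
    os.makedirs(new_dir, exist_ok=True)
    created = os.path.isdir(new_dir)

    # Like `echo demo > a.txt`
    src = os.path.join(new_dir, "a.txt")
    with open(src, "w") as handle:
        handle.write("demo\n")

    # Like `cp a.txt b.txt`
    shutil.copy(src, os.path.join(new_dir, "b.txt"))

    # Like `ls`
    listing = sorted(os.listdir(new_dir))

# Like `echo $PATH`: expand a variable without starting a shell
expanded = os.path.expandvars("$PATH")

# Like `which python`: returns None if the program is not found on PATH
python_location = shutil.which("python")

print(created, listing)
print(python_location)
```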

#### Blocking vs Non-blocking

The last topic of this lecture is "blocking". This is computer science lingo/jargon for whether or not a program ***waits*** until something is complete before moving on. Think of this like a really bad website that takes forever to load because it is waiting until it has rendered all its images first, versus the website that sets the formatting and text while it works on the images.

1. `subprocess.run()` is blocking (it waits until the process is complete)
2. `subprocess.Popen()` is non-blocking (it starts the command, then moves on without waiting)

***Most*** use cases can be handled through the use of `run()`.<br> 
`run()` is just a *wrapped* version of `Popen()` that simplifies use. <br>
However, `Popen()` allows the user a more flexible control of the subprocess call. <br>
`Popen()` can be used in a similar way to `run()` (with more optional parameters).

An example use case for `Popen()` is if the user has some intermediate data that needs to get processed, but the output of that data doesn't necessarily affect the rest of the pipeline.

#### `Popen`

In [None]:
p1 = sp.Popen(
        'sleep 5',               # Command
        shell = True,            # Use the shell
        stdout=sp.PIPE,          # Where to send it
        encoding='utf_8'         # Convert from bytes to string
    )
print(p1)
p2 = sp.Popen(
        'echo done',             # Command
        shell = True,            # Use the shell
        stdout=sp.PIPE,          # Where to send it
        encoding='utf_8'         # Convert from bytes to string
    )
print(p2)
print("processes started")   # Reached immediately: Popen does not wait

print(p1.stdout.read())      # read() blocks until p1 finishes (about 5 seconds)
print(p2.stdout.read())
print("processes completed")
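
The difference between launching and waiting can be measured. This sketch times how long `Popen()` takes to return, checks on the child with `poll()`, and then blocks with `wait()`. The child is a Python one-liner that sleeps for one second, used here instead of `sleep` so the example is not tied to one OS.

```python
import subprocess as sp
import sys
import time

# Child that sleeps for one second
command = [sys.executable, "-c", "import time; time.sleep(1)"]

start = time.monotonic()
proc = sp.Popen(command)
launch_time = time.monotonic() - start    # Popen returned without waiting

still_running = proc.poll() is None       # poll() returns None while the child runs

proc.wait()                               # Block here until the child exits
total_time = time.monotonic() - start

print(f"launched in {launch_time:.2f}s, still running: {still_running}")
print(f"finished after {total_time:.2f}s with exit code {proc.returncode}")
```

Any work placed between `Popen()` and `wait()` runs while the child is still busy, which is the point of a non-blocking call.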



In [None]:
# Use context manager to handle process while it is running,
# and gracefully close it
with sp.Popen(
    [
        'echo',         # Command
        'here we are'       # Command line arguments
    ],
    encoding='utf_8', # Convert from byte to string
    stdout=sp.PIPE    # Where to send it
) as proc:            # Enclose and alias the context manager
    print(
        proc.stdout.read() # Look at the output
    )

In [None]:
for elem in dir(proc):
    if not elem.startswith('_'):
        print(elem)

#### ***NOTE***: From here on out, there might be different commands used for **Linux** / **MacOS** or **Windows**

Add the following text to a new file `test_pipe.txt` 
```
testing
a
subprocess
pipe
```

In [None]:
# Another way to add the text to the file
# test_pipe.txt is used to demonstrate piping cat into sort
!echo testing > test_pipe.txt
!echo a >> test_pipe.txt
!echo subprocess >> test_pipe.txt
!echo pipe >> test_pipe.txt


In [None]:
# start the first process - cat - reading the file content

# Linux / macOS
p1 = sp.Popen(['cat','test_pipe.txt'], stdout=sp.PIPE, encoding='utf_8')

# Windows: `type` is a cmd.exe built-in, so it must be run through the shell
# p1 = sp.Popen('type test_pipe.txt', shell=True, stdout=sp.PIPE, encoding='utf_8')

print(p1.stdout.read())

In [None]:
# add the second process and connect the pipe: 
# for p2 we use stdin=p1.stdout

# Linux / macOS
p1 = sp.Popen(['cat','test_pipe.txt'], stdout=sp.PIPE, encoding='utf_8')

# Windows: `type` is a cmd.exe built-in, so it must be run through the shell
# p1 = sp.Popen('type test_pipe.txt', shell=True, stdout=sp.PIPE, encoding='utf_8')


p2 = sp.Popen(['sort'], stdin=p1.stdout, stdout=sp.PIPE, encoding='utf_8')
p1.stdout.close()  # Allow p1 to receive a SIGPIPE if p2 exits
output = p2.communicate()[0]
print(output)
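
When the text to be piped already lives in a Python string, the first process is not needed at all: `run()` accepts an `input` argument that is fed to the child's stdin. The sketch below assumes that situation. To stay OS-agnostic it replaces `sort` with a Python one-liner that sorts the lines it reads, so the ordering does not depend on the platform's `sort` command.

```python
import subprocess as sp
import sys

# Same content as test_pipe.txt
text = "testing\na\nsubprocess\npipe\n"

# Child that sorts the lines it receives on stdin (a portable stand-in for `sort`)
sorter = [sys.executable, "-c", "import sys; sys.stdout.writelines(sorted(sys.stdin))"]

# input= sends the string to the child's stdin; stdout=PIPE captures the result
result = sp.run(sorter, input=text, stdout=sp.PIPE, encoding="utf_8", check=True)

print(result.stdout)
```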


`Popen` can create background processes: like running a command in the background from a shell, it does not block. <br>
`Popen` has a lot more functionality than `run`.

In [None]:
sub_popen = sp.Popen(
    [
        'echo',          # Command
        'test',        # Command line arguments
    ],
    encoding='utf_8',  # Convert from byte to string
    stdout=sp.PIPE     # Where to send it
)
for j in dir(sub_popen):
    if not j.startswith('_'):
        print(j)


In [None]:
sub_popen.kill()       # Forcibly stop the process
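
A common reason to call `kill()` is a process that runs longer than expected. The following sketch gives a child half a second to finish and stops it otherwise. The 30-second sleep and the 0.5-second timeout are arbitrary values chosen for the demonstration.

```python
import subprocess as sp
import sys

# Child that would run for 30 seconds if left alone
proc = sp.Popen([sys.executable, "-c", "import time; time.sleep(30)"])

try:
    proc.wait(timeout=0.5)     # Give the child half a second to finish
    timed_out = False
except sp.TimeoutExpired:
    timed_out = True
    proc.kill()                # Stop the child
    proc.wait()                # Collect its exit status so returncode is set

print(timed_out, proc.returncode)
```

The return code of a killed process is platform dependent. On POSIX systems it is the negative of the signal number.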

Example creating child process.<br>
https://pymotw.com/3/subprocess/

A collection of `Popen` examples: <br>
https://www.programcreek.com/python/example/50/subprocess.Popen

#### Exercise 
Write a bash script that takes a file as an argument and prints the lines that contain the letter p.




#### Exercise -  only if you have R installed
Write an R script that takes the file `test_R.txt` as an argument and returns the sum of the matrix from the file.

```
rN	val1	val2
r1	1	2
r2	3	4
```