<img style="float: right;" src="http://www2.le.ac.uk/liscb1.jpg">
# System Commands

Sometimes you need Python to run one or more external programs from within your script. You may be reading in data, reformatting it, and then passing it to another piece of software before reading the results back into Python for analysis.

We've already seen how to parse files in notebook 2 (`../2-analysing-data.ipynb`), so in this notebook we'll look specifically at how to run external programs/system commands. The key library for doing this is `subprocess`.

In [1]:
import subprocess

## Call
There are a number of different functions within the `subprocess` library. The first one we'll look at is `call`, which simply runs the given command.

In [2]:
subprocess.call('df -h', shell=True)

0

The only output that is returned from running the command is `0`, an exit code that means this system process ran without errors (any value other than zero means there was an error). However, if you look at the terminal running jupyter you should see the output from the `df -h` command. It ran successfully but the `call` function does not return the command output to python.

There are actually two ways to pass a command to `subprocess`. The first, shown above, is to enter the whole command as a single string and set `shell=True`. The other is to pass a list of strings, each containing one part of the command. In this case, we leave `shell` set to its default of `False`.

*Note that setting shell to True is considered a security hazard if combined with untrusted input.*

In [3]:
subprocess.call(['df', '-h'])

0
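To make the security note concrete, here is a minimal sketch of what can go wrong when a string containing untrusted input is handed to the shell. The value of `user_input` is hypothetical, and the example assumes a Unix-like system where the `true` command exists (it ignores its arguments and exits with code 0). The exit code reveals whether the extra command smuggled in after the `;` was executed.

```python
import shlex
import subprocess

# Hypothetical untrusted input, e.g. a "file name" typed in by a user.
user_input = 'report.txt; exit 3'

# String + shell=True: the shell treats ';' as a command separator,
# so the injected `exit 3` is executed and becomes the exit code.
unsafe_code = subprocess.call('true ' + user_input, shell=True)

# List form: the whole string is passed to `true` as one plain argument.
safe_code = subprocess.call(['true', user_input])

# If a shell string is unavoidable, shlex.quote escapes the input first.
quoted_code = subprocess.call('true ' + shlex.quote(user_input), shell=True)

print(unsafe_code, safe_code, quoted_code)
```

The list form is usually the simpler choice, since no quoting is needed at all. For a trusted command string, `shlex.split('df -h')` gives the equivalent list `['df', '-h']`.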

The `call` function will wait until the command has completed before returning.

In [4]:
sleeptime = 4
subprocess.call(['sleep', str(sleeptime)])  # sleeptime is converted to a string
print("Complete")

Complete


Therefore the `call` function is useful if you need to run a simple command whose output does not need to be analysed. For example, running a program to index a file: the index is necessary for the following steps but you don't need to read it in, it just needs to exist.
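Even when the output is not needed, it is usually worth checking that the command succeeded before moving on. The sketch below assumes a Unix-like system and uses the standard `false` command, which always exits with a non-zero code, as a stand-in for a program that fails.

```python
import subprocess

# `false` does nothing and always reports failure via its exit code.
exit_code = subprocess.call(['false'])

if exit_code != 0:
    print("Command failed with exit code", exit_code)
else:
    print("Command succeeded, safe to continue")
```

The `subprocess` library also provides `check_call`, which behaves like `call` but raises a `CalledProcessError` when the exit code is not zero.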

## Check_output
Another function in the `subprocess` library is `check_output`. It runs in much the same way as `call` but, instead of returning an exit code, it returns the command output. If the command exits with a non-zero code, `check_output` raises a `CalledProcessError`.

In [5]:
subprocess.check_output(['ls', '-l'])

b'total 391\n-rw-r--r-- 1 trf5 it_staff 13645 Jul 17 13:43 Bioinformatics.ipynb\n-rw-r--r-- 1 trf5 it_staff  8149 Jul 31 09:08 Comprehensions.ipynb\n-rw-r--r-- 1 trf5 it_staff 11258 Jul 17 13:35 Minimization.ipynb\n-rw-r--r-- 1 trf5 it_staff  9666 Jul 17 11:08 Objects.ipynb\n-rw-r--r-- 1 trf5 it_staff  9263 Jul 31 13:06 Parallel_Processes.ipynb\ndrwxr-xr-x 3 trf5 it_staff  4096 Jul 31 09:08 Solutions\n-rw-r--r-- 1 trf5 it_staff 12258 Jul 31 15:38 System_Commands.ipynb\n'

Note that what has been returned is a bytes object, as shown by the `b` at the start. To convert this object into a string, we need to call its `.decode()` method.

In [6]:
contents = subprocess.check_output(['ls', '-l'])
print(contents.decode())

total 391
-rw-r--r-- 1 trf5 it_staff 13645 Jul 17 13:43 Bioinformatics.ipynb
-rw-r--r-- 1 trf5 it_staff  8149 Jul 31 09:08 Comprehensions.ipynb
-rw-r--r-- 1 trf5 it_staff 11258 Jul 17 13:35 Minimization.ipynb
-rw-r--r-- 1 trf5 it_staff  9666 Jul 17 11:08 Objects.ipynb
-rw-r--r-- 1 trf5 it_staff  9263 Jul 31 13:06 Parallel_Processes.ipynb
drwxr-xr-x 3 trf5 it_staff  4096 Jul 31 09:08 Solutions
-rw-r--r-- 1 trf5 it_staff 12258 Jul 31 15:38 System_Commands.ipynb
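Once the output is a string it can be handled like any other text, for example split into lines. As a sketch, assuming Python 3.7 or later, the `text=True` argument asks `subprocess` to do the decoding itself; `echo` is used here only to keep the output predictable.

```python
import subprocess

# Manual route: capture bytes, then decode with an explicit encoding.
as_bytes = subprocess.check_output(['echo', 'hello'])
decoded = as_bytes.decode('utf-8')

# text=True (Python 3.7+) returns a str directly.
as_text = subprocess.check_output(['echo', 'hello'], text=True)

# A decoded string can be split into a list of lines.
lines = decoded.splitlines()

print(repr(decoded))
print(repr(as_text))
print(lines)
```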



As before, `check_output` will wait until the command has finished before returning any results.

In [7]:
res = subprocess.check_output('sleep 5; ls', shell=True)
res

b'Bioinformatics.ipynb\nComprehensions.ipynb\nMinimization.ipynb\nObjects.ipynb\nParallel_Processes.ipynb\nSolutions\nSystem_Commands.ipynb\n'
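One difference from `call` is what happens when the command fails: `check_output` raises a `subprocess.CalledProcessError` rather than returning the exit code. A minimal sketch of catching it, again assuming a Unix-like system with the always-failing `false` command:

```python
import subprocess

failed_code = None

try:
    # `false` exits with a non-zero code, so check_output raises.
    subprocess.check_output(['false'])
except subprocess.CalledProcessError as exc:
    # The exception records the command and its exit code.
    failed_code = exc.returncode
    print("Command", exc.cmd, "failed with exit code", failed_code)
```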

## Popen

Both `call` and `check_output` are convenience wrappers around `Popen`. Using `Popen` directly gives you the full range of options and a lot more control over inputs and outputs.

The first thing we'll look at is how to read the command output.

In [8]:
p = subprocess.Popen(['echo', 'hello world'], stdout=subprocess.PIPE)
p.communicate()

(b'hello world\n', None)

What we've done above is to pass the special value `subprocess.PIPE` to the `stdout` (standard output) argument. The stdout from the command is therefore *piped* back to Python, where it can be read through the object `p`. To retrieve it, we use the `.communicate()` method, which returns a tuple of the stdout and stderr. Here stderr is `None` because we did not ask for it to be piped.

In [9]:
p = subprocess.Popen(['ls'], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
out, err = p.communicate()
print("Output: %s" % out)
print("Error: %s" % err)

Output: b'Bioinformatics.ipynb\nComprehensions.ipynb\nMinimization.ipynb\nObjects.ipynb\nParallel_Processes.ipynb\nSolutions\nSystem_Commands.ipynb\n'
Error: b''
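Piping stderr becomes more interesting when the command fails. The sketch below runs `ls` on a hypothetical path that is assumed not to exist, then inspects the captured error message and the `returncode` attribute, which is set once `.communicate()` has returned.

```python
import subprocess

# Hypothetical path, assumed not to exist on this machine.
missing = '/no/such/directory-for-this-example'

p = subprocess.Popen(['ls', missing],
                     stdout=subprocess.PIPE, stderr=subprocess.PIPE)
out, err = p.communicate()

print("Return code: %d" % p.returncode)
print("Output: %s" % out.decode())
print("Error: %s" % err.decode())
```

The exact wording of the error message and the value of the non-zero return code depend on the system's version of `ls`.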


We can also use `.communicate()` to pass in an input value as stdin. This also has to be a bytes object.

In [10]:
p = subprocess.Popen(['cat'], stdin=subprocess.PIPE, stdout=subprocess.PIPE)
p.communicate(input=b'Hello!')

(b'Hello!', None)
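In practice the input often starts life as an ordinary Python string, so it needs encoding on the way in and decoding on the way out. A minimal sketch, assuming the Unix `sort` command is available:

```python
import subprocess

text = 'pear\napple\n'  # an ordinary Python string

p = subprocess.Popen(['sort'], stdin=subprocess.PIPE, stdout=subprocess.PIPE)

# .encode() converts str -> bytes for stdin.
out, _ = p.communicate(input=text.encode())

# .decode() converts the bytes from stdout back to str.
sorted_lines = out.decode().splitlines()
print(sorted_lines)
```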

Note that `Popen` starts the command as soon as the object is created and returns immediately, without waiting for the command to finish. It is the `.communicate()` method that waits for the command to complete.

In [11]:
p = subprocess.Popen(['sleep','5'], stdout=subprocess.PIPE)
print("I've started the command")
p.communicate()
print("Now the command has finished")

I've started the command
Now the command has finished
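A `Popen` object also has `poll()` and `wait()` methods, which make it possible to see whether the child process is still running. The following sketch, assuming a Unix-like system with `sleep`, checks the state of the process straight after creating it and again after waiting.

```python
import subprocess

p = subprocess.Popen(['sleep', '1'])

# poll() returns None while the process is still running.
still_running = p.poll() is None
print("Running straight after Popen:", still_running)

# wait() blocks until the process exits and returns its exit code.
exit_code = p.wait()
print("Exit code after wait:", exit_code)
```

This makes it possible to start a long-running program, carry on with other work in Python, and only wait for the result when it is needed.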


Finally, we can use these pipes to move data from one system command to another, creating a python pipeline of external programs.

In [12]:
p1 = subprocess.Popen(['echo', 'Hello World'], stdout=subprocess.PIPE)
echoout, echoerr = p1.communicate()

p2 = subprocess.Popen(['sed','s/World/There/'], stdin=subprocess.PIPE, stdout=subprocess.PIPE)

p2.communicate(input=echoout)

(b'Hello There\n', None)
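The approach above holds the intermediate output in Python between the two commands. An alternative, sketched here following the pattern in the `subprocess` documentation for replacing a shell pipeline, is to connect the stdout of the first process directly to the stdin of the second so the data never passes through Python.

```python
import subprocess

p1 = subprocess.Popen(['echo', 'Hello World'], stdout=subprocess.PIPE)

# Hand p1's stdout to p2 as its stdin.
p2 = subprocess.Popen(['sed', 's/World/There/'],
                      stdin=p1.stdout, stdout=subprocess.PIPE)

# Close our copy of the pipe so p1 is notified if p2 exits early.
p1.stdout.close()

out, _ = p2.communicate()
p1.wait()  # collect p1's exit code as well
print(out)
```

This is closer to what `echo 'Hello World' | sed 's/World/There/'` does in a shell, and it avoids holding large intermediate results in memory.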

### Exercise

Use subprocess to pipe the output from `ls` to `grep`, searching for the file names that contain an underscore. 

**BONUS**

Add a third `Popen` object to pipe these results to `sed` and replace the '.ipynb' extension with '.txt'.

**BONUS**

Add error catching after each process by checking what the exit code (return code) is. Only continue if it returns 0.

In [13]:
p1 = subprocess.Popen(['ls'], stdout=subprocess.PIPE)
lsout, lserr = p1.communicate()

if p1.returncode:
    raise Exception("ls failed!")

p2 = subprocess.Popen(['grep', '_'], stdin=subprocess.PIPE, stdout=subprocess.PIPE)
grepout, greperr = p2.communicate(input=lsout)

if p2.returncode:
    raise Exception("grep failed!")

# Escape the dot and anchor to the line end so only the extension is replaced
p3 = subprocess.Popen(['sed', r's/\.ipynb$/.txt/'], stdin=subprocess.PIPE, stdout=subprocess.PIPE)
sedout, sederr = p3.communicate(input=grepout)

if p3.returncode:
    raise Exception("sed failed!")

print(sedout.decode())

Parallel_Processes.txt
System_Commands.txt
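The run, communicate, check pattern is repeated for every step, so it can be folded into a small function. The helper `run_step` below is a hypothetical name introduced for illustration, and the input is a fixed sample of file names taken from the listing earlier in this notebook so that the result is predictable.

```python
import subprocess

def run_step(args, input_bytes=None):
    """Run one command, feed it input_bytes, and return its stdout.

    Hypothetical helper: raises RuntimeError on a non-zero exit code.
    """
    p = subprocess.Popen(args, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    out, _ = p.communicate(input=input_bytes)
    if p.returncode != 0:
        raise RuntimeError("%s failed with exit code %d" % (args[0], p.returncode))
    return out

# Sample input standing in for the output of `ls`.
names = b'Parallel_Processes.ipynb\nObjects.ipynb\nSystem_Commands.ipynb\n'

matched = run_step(['grep', '_'], names)
renamed = run_step(['sed', r's/\.ipynb$/.txt/'], matched)
print(renamed.decode())
```

One thing to keep in mind with this kind of check is that `grep` exits with code 1 when nothing matches, so an empty result is reported as a failure even though the command itself worked.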

