# Using Python for bioinformatics, and with bioinformatics packages

## Python versus Bash for building toolchains/pipelines

### The Bash approach

- In Bash (or another shell), it is straightforward to build toolchains by piping together programs found in your `$PATH`

In [18]:
# This environment runs on a Python interpreter, so it doesn't directly run shell commands.
# But since it uses an IPython interpreter, prefixing with "!" allows shell commands to be used, e.g.:
!echo "Hi"

# Toolchains in Bash
# Bash/Shell has the pipe character "|", which allows toolchains to be built up. Programs in $PATH can be directly brought in, e.g.:
!echo -n "1234" | wc -m   # Count the characters printed by echo; -n suppresses the trailing newline so the count is accurate

# Side note: shell output can be captured in a Python variable, although it does not arrive as a plain built-in type
var = !echo "1234" # IPython syntax mixing Python and shell
print(type(var)) # Not a built-in Python type, but an IPython SList (one element per output line)
str_var = str(var[0]) # Takes the first element of the SList (1234) as a Python string
print(type(str_var)) # Evaluates as a Python string
print(str_var[0]) # Indexing works as expected for a string

Hi
4
<class 'IPython.utils.text.SList'>
<class 'str'>
1


### The Python equivalent

- In Python, instead of pipes we might use the `subprocess` module
- This allows external commands to be run, but it can be pretty wordy compared to a simple `echo -n "1234" | wc -m`
- So for some pipeline development it may make sense to pick Shell rather than Python, and bring in Python when you need more advanced manipulations
- Keep in mind that the workflow manager Nextflow lets you combine the two: each process can start with a shebang that sets the interpreter (Shell by default)
- Alternatively, you can use a Shell process and call Python as an external programme from there, pointing it at a script file
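As a rough illustration of that last point, the sketch below shows the kind of small Python script a Shell process could call. The file name `count_chars.py` and the function `count_characters` are hypothetical, chosen only for this example; the shebang on the first line is what lets the file be run directly once it is executable.

```python
#!/usr/bin/env python3
"""count_chars.py (hypothetical file name): print the character count of each file given."""
import sys


def count_characters(path):
    """Return the number of characters in the text file at `path`."""
    with open(path, encoding="utf-8") as handle:
        return len(handle.read())


if __name__ == "__main__":
    # Process every file path passed on the command line (nothing happens if none are given)
    for path in sys.argv[1:]:
        print(path, count_characters(path))
```

From a Shell process this could then be called as `python count_chars.py some_file.txt`, where `some_file.txt` stands in for whatever input the pipeline provides.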

In [21]:
# The subprocess module
 
import subprocess

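# Starts the first command immediately, with its stdout connected to a pipe
process1 = subprocess.Popen(['echo', '-n', '1234'], stdout=subprocess.PIPE)

# Starts the second command, which takes its input from a pipe and writes its output to another pipe
process2 = subprocess.Popen(['wc', '-c'], stdin=subprocess.PIPE, stdout=subprocess.PIPE)

# Passes the data between the two processes and waits for each to finish
output1, _ = process1.communicate() # Reads all of process1's stdout as bytes; the second value (stderr) is None here and is ignored
output2, _ = process2.communicate(input=output1) # Sends those bytes to process2's stdin and collects its stdout

# Final output
out = output2.decode().strip() # Decodes the bytes to a str and strips surrounding whitespace
print(out)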
print(out)


4
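The cell above copies the output of the first command through Python before handing it to the second. A closer analogue of a shell pipe is to connect the two processes directly. This is a minimal sketch of that approach; the function name `count_bytes` is made up for illustration, and it assumes `echo` and `wc` are available in `$PATH`.

```python
import subprocess


def count_bytes(text):
    """Count the bytes in `text` by piping `echo -n` straight into `wc -c`."""
    # Start the producer; its stdout is a pipe that can be handed to the consumer
    producer = subprocess.Popen(["echo", "-n", text], stdout=subprocess.PIPE)
    # Start the consumer, reading directly from the producer's stdout
    consumer = subprocess.Popen(["wc", "-c"], stdin=producer.stdout, stdout=subprocess.PIPE)
    # Close our copy of the pipe so the producer is notified if the consumer exits early
    producer.stdout.close()
    output, _ = consumer.communicate()
    producer.wait()
    return int(output.decode().strip())


print(count_bytes("1234"))
```

Here the data flows from `echo` to `wc` without passing through a Python variable, which matters more when the intermediate output is large.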


### A Python alternative

- The third-party `sh` package, which exposes external programs as Python functions

## Bioinformatics packages for Python 

Some software packages are built for direct Python integration (e.g., pybedtools, a Python wrapper around BEDTools).


## Biopython

- Biopython is a set of freely available Python tools for computational biology and bioinformatics

- The Conda environment contains a Biopython install (the command used was: `conda install -c conda-forge biopython`)
