# Using Python for bioinformatics, and with bioinformatics packages

## Python versus Bash for building toolchains/pipelines

### The Bash approach

- In Bash (or another shell) it's straightforward to build toolchains by piping together programs found in your `$PATH`

In [23]:
# This environment runs on a Python interpreter, so it doesn't directly run shell commands.
# But since it uses an IPython interpreter, prefixing with "!" allows shell commands to be used, e.g.:
!echo "Hi"

# Toolchains in Bash
# Bash/Shell has the pipe character "|", which allows toolchains to be built up. Programs in $PATH can be directly brought in, e.g.:
!echo -n "1234" | wc -m   # Count the characters echoed. -n stops echo appending a trailing newline, so the count is exactly 4

# Side note: shell output can be captured into a Python variable, though not as a built-in Python type
var = !echo "1234" # Python mixed with Shell
print(type(var)) # Not a built-in Python type but an IPython SList, holding one item per line of output
str_var = str(var[0]) # Takes the first element of the SList ("1234") as a Python string
print(type(str_var)) # Evaluates as a Python string
print(str_var[0]) # Indexing works as for any string: prints the first character, "1"

Hi
4
<class 'IPython.utils.text.SList'>
<class 'str'>
1


### The Python equivalent of pipes

- In Python, the closest equivalent of a pipe is the `subprocess` module from the standard library.
- It runs external commands, but is wordy compared to a simple `echo -n "1234" | wc -m`.
- So for some pipeline development it makes sense to pick Shell over Python, and bring in Python only when you need more advanced manipulations.
- Keep in mind that the workflow manager Nextflow lets you combine them: each process script can start with a shebang that sets the interpreter (without one, it defaults to Shell).
- Alternatively, a Nextflow process can keep a Shell script block and call Python from it as an external programme, pointing at a script file: `python scripts/function_x.py`

In [24]:
# The subprocess module
 
import subprocess

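# Command 1: start echo, with its stdout captured in a pipe
process1 = subprocess.Popen(['echo', '-n', '1234'], stdout=subprocess.PIPE)

# Command 2: start wc, taking its stdin from a pipe and sending its stdout to a pipe
process2 = subprocess.Popen(['wc', '-c'], stdin=subprocess.PIPE, stdout=subprocess.PIPE)

# Popen has already started both programs; communicate() exchanges data with them and waits for them to finish
output1, _ = process1.communicate() # Returns (stdout, stderr); stdout is the bytes b'1234', stderr is None as it was not piped
output2, _ = process2.communicate(input=output1) # Writes echo's output to wc's stdin, closes it, then reads wc's stdout

# Final output
out = output2.decode().strip() # Decode the bytes into a str and strip the trailing newline (and any padding)
print(out)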

4
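
The cell above passes the data through Python between the two programs. As a rough sketch of two alternatives (the function names `count_chars` and `count_chars_from_python` are made up for illustration, and a Unix-like system with `echo` and `wc` in `$PATH` is assumed), the first function connects the two processes directly, much like a shell pipe, and the second skips `echo` altogether by handing a Python string to `wc` with `subprocess.run`:

```python
import subprocess


def count_chars(text: str) -> int:
    """Mimic `echo -n text | wc -c` by connecting two processes directly."""
    # Start echo with its stdout attached to a pipe
    echo = subprocess.Popen(["echo", "-n", text], stdout=subprocess.PIPE)
    # Start wc reading straight from echo's stdout, as the shell's "|" would
    wc = subprocess.Popen(["wc", "-c"], stdin=echo.stdout, stdout=subprocess.PIPE)
    # Close our copy of the pipe so echo is notified if wc exits early
    echo.stdout.close()
    out, _ = wc.communicate()  # Read wc's output and wait for it to finish
    echo.wait()                # Reap the echo process as well
    return int(out.decode().strip())


def count_chars_from_python(text: str) -> int:
    """Feed a Python string to `wc -c` without going through echo."""
    result = subprocess.run(
        ["wc", "-c"],
        input=text,           # Sent to wc's stdin
        capture_output=True,  # Collect stdout and stderr
        text=True,            # Work with str instead of bytes
        check=True,           # Raise CalledProcessError on a non-zero exit code
    )
    return int(result.stdout.strip())


print(count_chars("1234"))
print(count_chars_from_python("1234"))
```

Both are still wordier than the shell one-liner, but the second form is handy when the input already lives in a Python variable.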


### A Python alternative to subprocess

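- The third-party `sh` package exposes programs in your `$PATH` as Python functions, which makes calling them less wordy than with `subprocess`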

## Bioinformatics packages for Python

### Ports of popular packages for direct Python integration (e.g., Pybedtools)

- You can install Linux programmes and call them from Python with `sh` or `subprocess`
- Or, if a Python port or wrapper is available, it can be used directly; we will use Pybedtools (a Python wrapper around BEDTools) as an example
- Pybedtools is in the conda environment already, installed using: `conda install -c bioconda pybedtools`
- Personally, I prefer to use the original versions built for Linux and Shell

In [25]:
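from pybedtools import BedTool
import pybedtools  # The module itself must be imported too, for pybedtools.example_bedtool()

# Load two example BED files that ship with pybedtools
a = pybedtools.example_bedtool('a.bed')
b = pybedtools.example_bedtool('b.bed')

# Print the first lines of each file
a.head()
b.head()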

Traceback (most recent call last):
  File "/home/inkfish/programs/miniconda3/envs/python-data-science/lib/python3.11/site-packages/IPython/core/interactiveshell.py", line 3508, in run_code
  File "/tmp/ipykernel_47061/4240041548.py", line 1, in <module>
    from pybedtools import BedTool
ModuleNotFoundError: No module named 'pybedtools'


### Biopython

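- Biopython is a collection of Python tools for computational biology and bioinformatics, including parsers for common sequence file formats such as FASTA and GenBank
- The Conda environment contains a Biopython install (the command used was: `conda install -c conda-forge biopython`)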
