# Programming Languages for Bioinformatics

The most common programming languages for bioinformatics are:

 * Bash - fast and efficient file manipulation
 * Python - clear multi-purpose language, excellent math support, very easy to learn, increasingly popular for analysis of biological sequences, *BioPython*
 * Perl - scripting language, strong in string manipulation, sometimes confusing for beginners, *traditionally* used in analysis of biological sequences, *BioPerl*
 * C++ - required for computationally expensive tasks, more difficult to learn
 * R - statistical programming language, also often used for solving specific bioinformatics problems with existing packages
 * Java - platform independence, more difficult to learn than scripting languages, but for some users easier to learn than C++

In my opinion, Python has advanced to a true multi-purpose language. If you are able to master Python, you will be capable of analyzing biological sequence data, performing statistical analysis, visualizing data, using machine learning very efficiently, and more. In this class, we will therefore focus on learning how to use Python3, and how to apply it to problems in bioinformatics.

(Be aware that there are major differences between Python3 and older Python standards! See e.g. https://sebastianraschka.com/Articles/2014_python_2_3_key_diff.html)
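
Two of the best-known differences can be tried out directly. The following is a minimal sketch and not a complete list; the variable names are chosen for illustration only.

```python
# Division: in Python3, "/" always performs true division.
quotient = 7 / 2          # 3.5 in Python3 (Python2 returned 3 for integers)
floor_quotient = 7 // 2   # explicit floor division gives 3

# print is a function in Python3 and therefore needs parentheses.
print("7 / 2 =", quotient)
print("7 // 2 =", floor_quotient)

# Text and bytes are separate types in Python3.
text = "ACGT"                 # str holds Unicode text
raw = text.encode("ascii")    # bytes holds raw data
print(type(text).__name__, type(raw).__name__)
```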

# Installation 

For now, skip this section if you are working with JupyterHub. It only provides the information needed to work on an Ubuntu system. (Be aware that even though Python is platform independent, there are system-specific differences. I will not give support for Windows installation. We are also not able to test and correct your code if you developed it on Windows. Please use Unix for development if working outside of JupyterHub.)

On Ubuntu, install python3, pip, and numpy with:

```bash
sudo apt-get update
sudo apt-get upgrade
sudo apt-get install python3 python3-pip python3-numpy
```

(The first two lines will bring your system up to date prior to installation.)

For using biopython, install via pip:

```bash
pip3 install biopython
```

If installation of biopython via pip fails because your system cannot access online repositories for package download (we observed this when Ubuntu is run as a virtual machine within Windows, and Windows runs a firewall), do the following:


 1) Download the package sources from https://biopython.org/wiki/Download; the file could be called `biopython-1.72.tar.gz`.
 
 2) Transfer the downloaded file to a directory on your Ubuntu system (the command below assumes `/tmp/transferred_packages`).
 
 3) Install from the local file with pip:
    `pip3 install --no-index --find-links="/tmp/transferred_packages" biopython`


For using doit, install via pip:

```bash
pip3 install doit
```
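
To check whether the installation worked, you can ask Python whether it is able to locate the packages. This is a minimal sketch: `is_installed` is a helper name made up for this example, and the import names are assumed to be `numpy`, `Bio` (the import name of biopython), and `doit`.

```python
import importlib.util


def is_installed(module_name):
    """Return True if the import system can locate `module_name`."""
    return importlib.util.find_spec(module_name) is not None


# Report the status of each package installed above.
for name in ("numpy", "Bio", "doit"):
    status = "found" if is_installed(name) else "MISSING"
    print(f"{name}: {status}")
```

If a package is reported as missing, repeat the corresponding installation step.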

# Modes for using Python

## JupyterNotebook

There are different ways to use Python. In the current course, we will in the beginning mostly use JupyterNotebooks, which provide an excellent interface that combines documentation with interactive code that you can modify to your liking to test how Python works. You are already familiar with this mode from the Bash introduction.

## Interactive Usage

In your terminal, call python3:

![python_bash.jpg](python_bash.jpg)

This mode might be helpful for fast testing of syntax. It is usually not suitable for solving complex bioinformatics tasks in a reproducible way. However, if you use interactive mode, I recommend that you type your commands in a text file (e.g. name it make.doc) and copy/paste them into the Python command prompt, instead of typing there directly. This enables you to store and document your code in the most primitive way.

## Python Scripts

If you want to replicate your tasks and later distribute the code as standalone software, organize your code in *scripts*. We strongly recommend that your Python scripts have the following properties:

 * the first line (the header, also called *shebang*) tells your system which interpreter to use: on JupyterHub, that's the explicit path `#!/opt/conda/bin/python3`; on most other Unix systems, use the directive `#!/usr/bin/env python3`, which tells your system to use the version of python3 that is found first in your `PATH` (this line is not required for *module files* associated with your main script)
 * file ending should be `.py`
 * file contains code in python3
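
A script with these properties could look like the following minimal sketch. It is an illustration under the assumption of a generic Unix system, not the literal contents of any file in this course; on JupyterHub, the first line would be `#!/opt/conda/bin/python3` instead.

```python
#!/usr/bin/env python3
"""Minimal example script: prints a greeting."""


def main():
    # The only task of this script is to print one line.
    print("Hello world")


if __name__ == "__main__":
    # Run main() only when the file is executed as a script,
    # not when it is imported as a module.
    main()
```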
 
Sidenote: remember, the `PATH` is where Unix systems look for executables, e.g. binaries or executable Python scripts! It's a collection of directories. You can manipulate that collection of directories if you want to!

You find a Python script `hello_world.py` in the directory of this JupyterNotebook. Double-click the file and inspect its contents. The functionality of the script is trivial. It will print the string "Hello world". Next, we will make this script executable and execute it. First, we do this from within the JupyterNotebook:

In [7]:
%%script bash
cd ~/
cd Python_introduction
ls -l hello_world.py
chmod u+x hello_world.py
ls -l hello_world.py
echo "Next, we will execute the script, output will appear below:"
./hello_world.py

-rwxr--r-- 1 38458 users 176 Apr 27 13:35 hello_world.py
-rwxr--r-- 1 38458 users 176 Apr 27 13:35 hello_world.py
Next, we will execute the script, output will appear below:
Hello world


Of course, such scripts can also be executed from the terminal, instead of a JupyterNotebook:

![execute_in_terminal.jpg](execute_in_terminal.jpg)

It's probably much more common to execute such scripts in the terminal than to call scripts from a JupyterNotebook.

Take note of the dot and the slash (`./`) in front of the script name. Compare this to the file name shown by e.g. `ls -l`: the dot and the slash are not part of the file name! We put them in front of the script name in order to say: "execute the script that is located at `./hello_world.py`", where `./` is the current directory, i.e. the directory we are currently residing in. In other words, we called the script by a relative path specification.
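
How a relative path such as `./hello_world.py` is resolved against the current directory can be inspected from Python. This is a small sketch; the file does not need to exist for the path arithmetic to work.

```python
import os

relative = "./hello_world.py"

# abspath joins the relative path with the current working directory
# and removes the redundant "./" component.
absolute = os.path.abspath(relative)

print("current directory:", os.getcwd())
print("resolved path:    ", absolute)
```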

If `hello_world.py` is executable, and if the location of `hello_world.py` is in your `PATH`, then you can call it with:

`hello_world.py`
                                                                     
(no dot, no slash)

The output will be identical since we would call the same script.

### Executability

If the script is not executable, this happens:

![permission_denied.jpg](permission_denied.jpg)

You can check the executability status of a script:

In [11]:
%%script bash
cd ~/Python_introduction
chmod u-x hello_world.py
ls -l hello_world.py

-rw-r--r-- 1 38458 users 176 Apr 27 13:35 hello_world.py


In the above example, the file is not executable for its owner: the `x` is missing in the first permission triplet, which applies to the user (shown here by the numeric ID `38458`). If it were executable for the user, it would look like this:






In [13]:
%%script bash
cd ~/Python_introduction
chmod u+x hello_world.py
ls -l hello_world.py

-rwxr--r-- 1 38458 users 176 Apr 27 13:35 hello_world.py


You have two options to call the script after identifying the above problem (script is not executable):

 1. Call the script with the executable interpreter (that can be found in your `PATH`): `python3 hello_world.py`
 2. Make the script executable. Afterwards, you can call it without naming the interpreter because the interpreter is specified in the first line of the script: `chmod u+x hello_world.py; hello_world.py` (Here, I assume that the script is in your `PATH`.)
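
The same check and fix can also be done from Python. The following sketch mirrors `ls -l` and `chmod u+x` on a throwaway file; the helper names `is_user_executable` and `add_user_execute` are made up for this example, and the temporary file is only a stand-in for `hello_world.py`.

```python
import os
import stat
import tempfile


def is_user_executable(path):
    """Return True if the owner's execute bit (the `x` in `rwx`) is set."""
    return bool(os.stat(path).st_mode & stat.S_IXUSR)


def add_user_execute(path):
    """Python counterpart of `chmod u+x path`."""
    mode = stat.S_IMODE(os.stat(path).st_mode)  # permission bits only
    os.chmod(path, mode | stat.S_IXUSR)


# Demonstrate on a temporary stand-in file.
with tempfile.TemporaryDirectory() as tmp:
    script = os.path.join(tmp, "hello_world.py")
    with open(script, "w") as handle:
        handle.write('#!/usr/bin/env python3\nprint("Hello world")\n')
    before = is_user_executable(script)  # new files are not executable
    add_user_execute(script)
    after = is_user_executable(script)
    print("executable before:", before)
    print("executable after: ", after)
```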

### Location

If the script is not in your `PATH`, this happens:

![command_not_found.jpg](command_not_found.jpg)

You have three options:

 1. Call the script with an interpreter that is in your `PATH`: `python3 hello_world.py`
 2. Call the script with explicit path, examples:
    * If the script is in your current directory (check with `pwd` where you are, or check with `ls` whether the script is where you are), call the script with a leading `./` to tell the system that you mean the script in this particular directory: `./hello_world.py`
    * Call the script with absolute or relative path, e.g. `~/Python_introduction/hello_world.py`
 3. Extend your `PATH` variable. The `PATH` variable contains all locations where the system looks for executable files that are called without explicit path. Check which locations are already in your `PATH` with:
 

 

In [19]:
%%script bash
echo $PATH

/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
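
The `PATH` is nothing more than a string of directories separated by colons, which becomes obvious when you inspect it from Python. In this sketch, `path_directories` is a helper name made up for this example; `shutil.which` is the standard-library function that looks up a command in the directories of the `PATH`.

```python
import os
import shutil


def path_directories(path_value):
    """Split a PATH-style string into its list of directories."""
    return [d for d in path_value.split(os.pathsep) if d]


# Print one directory per line, in the order in which they are searched.
for directory in path_directories(os.environ.get("PATH", "")):
    print(directory)

# Full path of the first python3 found in PATH, or None if there is none.
print(shutil.which("python3"))
```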


Typically, local folders in your home directory are not contained in `PATH` by default, but you can extend the contents of the `PATH` variable. You have the option to do this for a single terminal session, or for all future terminal sessions. For demonstration, we assume that your script is located in the directory `~/Python_introduction`:

  * Modify `PATH` for a single terminal session:
    1. Type: `PATH=~/Python_introduction:$PATH` into your terminal. **Be careful not to add any spaces, and do not forget to add `:$PATH` at the end.** (If you make mistakes here, the current terminal window will no longer be able to locate any of the usual bash commands. If that happens: close the terminal, open a new one, and try again.)
    2. Test your path modification: `echo $PATH` should now return the original directories in `$PATH` and, in addition, the new directory at the very beginning.

Demonstration of the process in JupyterNotebook instead of terminal:

  
 

In [23]:
%%script bash
echo "Show original \$PATH:"
echo $PATH
echo "Modifying \$PATH."
PATH=~/Python_introduction:$PATH
echo "Show extended new \$PATH:"
echo $PATH

Show original $PATH:
/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
Modifying $PATH.
Show extended new $PATH:
/home/jovyan/Python_introduction:/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin


  * Modify `PATH` for all (future) terminal sessions (**Warning: always test the path modification in a terminal on the target machine prior to permanently modifying the path!**):
    1. Open the bash configuration file `~/.profile` in a text editor, e.g. vi (vi is an old-school editor that requires very few resources but is a bit odd to use for beginners):
       * Type `vi ~/.profile` into your terminal.
       * Press the lowercase letter `a` on your keyboard; this takes you into editing mode.
       * Use the arrow-down key to navigate to the bottom of your file (if you do this for the first time, the `~/.profile` will be empty and you don't have to go anywhere, just stay in the current line).
       * At the bottom of the file, add your `PATH` modification: `PATH=~/Python_introduction:$PATH`
       * Press escape key (this takes you out of editing mode)
       * Type the letters `:wq` (this saves the file and quits vi).
    2. In any **new** terminal window, the modified `PATH` will now automatically be loaded. To enable the modification in your current (**old**) terminal session, type: `source ~/.profile`. 
    3. Test your path modification: `echo $PATH` should now include your prepended directory.
    
Note: On systems other than JupyterHub, it's often more common to store path extensions in a file called `~/.bashrc`. There, you simply use `~/.bashrc` instead of `~/.profile`.
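
Why prepending a directory to the `PATH` makes a script callable by name can be sketched in Python. The example creates an executable stand-in script in a temporary directory (a hypothetical replacement for `~/Python_introduction`) and looks it up with `shutil.which`, once with the original search path and once with the extended one. Nothing is changed in your actual environment.

```python
import os
import shutil
import stat
import tempfile

with tempfile.TemporaryDirectory() as script_dir:
    # Create an executable stand-in for hello_world.py.
    script = os.path.join(script_dir, "hello_world.py")
    with open(script, "w") as handle:
        handle.write('#!/usr/bin/env python3\nprint("Hello world")\n')
    os.chmod(script, stat.S_IRWXU)  # read, write, execute for the user

    # Equivalent of PATH=<script_dir>:$PATH, kept in a local variable.
    original_path = os.environ.get("PATH", "")
    extended_path = script_dir + os.pathsep + original_path

    # Directories are searched from left to right; the first match wins.
    found_before = shutil.which("hello_world.py", path=original_path)
    found_after = shutil.which("hello_world.py", path=extended_path)
    print("lookup with original PATH:", found_before)
    print("lookup with extended PATH:", found_after)
```

With the original search path, the lookup typically returns `None`, which corresponds to the "command not found" message of the shell.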

Please make sure that you understand the concept of the `$PATH` variable. It is an essential concept for using third-party software on Unix systems.
