## Scripps Research CBB - Code Topics

# Basic GNU Parallel and Python Multiprocessing

__Shang-Fu Chen (Shaun)  
PhD student @ Torkamani lab  
Scripps Research Translational Institute (SRTI)__

# Outline

1. How can I go home earlier?
1. What's GNU Parallel?
1. In what situation should I parallelize my jobs?
1. (How do I run multiprocessing in a Python script?)

# Why?

<center><img src='img/et-home-phone.jpg' style="width: 550px;"></center>

# How can I go home earlier?
- Sequential jobs
- Interactive parallel (`nohup`, `screen`)
- UNIX-based parallel (GNU `parallel`)
- Python-based parallel (python `multiprocessing`)

# For Loop vs Parallel

# GNU Parallel
- A: If you have 32 different jobs to run on 4 CPUs, a straightforward way to parallelize is to assign 8 jobs to each CPU up front:
- B: GNU Parallel instead spawns a new process whenever one finishes, keeping the CPUs active and thus saving time:

Normal Parallel | GNU Parallel
:-:|:-: 
![fig_1](img/fig_1.png) | ![fig_2](img/fig_2.png)


- Resource: [Tool: Gnu Parallel - Parallelize Serial Command Line Programs Without Changing Them](https://www.biostars.org/p/63816/)
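
To see why strategy B helps, the sketch below simulates both schedules in Python. The job durations (job k takes k time units) are made up for illustration, and `static_makespan` and `dynamic_makespan` are hypothetical helper names. It models the scheduling idea only, not GNU Parallel's actual implementation.

```python
import heapq


def static_makespan(durations, workers):
    """Strategy A: split the jobs into equal contiguous chunks, one per worker."""
    chunk = -(-len(durations) // workers)  # ceiling division
    chunks = [durations[k:k + chunk] for k in range(0, len(durations), chunk)]
    # The run is over only when the slowest worker finishes its chunk.
    return max(sum(c) for c in chunks)


def dynamic_makespan(durations, workers):
    """Strategy B: give the next job to whichever worker becomes free first."""
    free_at = [0] * workers  # time at which each worker is next idle
    heapq.heapify(free_at)
    for duration in durations:
        start = heapq.heappop(free_at)  # earliest idle worker
        heapq.heappush(free_at, start + duration)
    return max(free_at)


# Hypothetical workload: 32 jobs, job k takes k time units.
durations = list(range(1, 33))
print("static :", static_makespan(durations, 4))
print("dynamic:", dynamic_makespan(durations, 4))
```

With these made-up durations, the static split finishes after 228 time units and the dynamic schedule after 144, because no CPU sits idle while another works through a chunk of long jobs.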

## Example_1 - single command line

In [1]:
%%bash

# Unbalanced workloads: managing the load balance is very important.
# Typical example: chr1-22. Different chromosomes finish at different times, so compute resources sit idle waiting for the largest chunk.
# Here we simulate that case with sleep commands of different durations.

start=`date +%s`
for i in {1..4}; do 
    sleep ${i}
done
end=`date +%s`
echo Total execution time: $((end-start)) sec

Total execution time: 10 sec


#### Parallelization 
```parallel [command] {1} ::: [list]```

# Outline
- GNU screen for running job in background (credit: Roger Tu)
- GNU parallel for running job in parallel
- (Optional) Python package “multiprocessing”

```parallel [command] {1} ::: [list_1]```

In [23]:
%%bash

start=`date +%s`
parallel sleep {1} ::: {1..4}
end=`date +%s`
echo Total execution time: $((end-start)) sec

Total execution time: 5 sec
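
A rough Python counterpart of this `sleep` experiment can be written with the standard-library `multiprocessing.Pool`. This is a minimal sketch: `nap`, `run_sequential`, and `run_parallel` are hypothetical names, and the durations are shortened to fractions of a second so the demo finishes quickly.

```python
import time
from multiprocessing import Pool


def nap(seconds):
    """Stand-in for a real job: sleep, then return the requested duration."""
    time.sleep(seconds)
    return seconds


def run_sequential(durations):
    """Run the jobs one after another, like the for loop."""
    return [nap(seconds) for seconds in durations]


def run_parallel(durations, workers=4):
    """Run the jobs in a pool of worker processes."""
    with Pool(processes=workers) as pool:
        return pool.map(nap, durations)


if __name__ == "__main__":
    durations = [0.1, 0.2, 0.3, 0.4]  # illustrative values only
    for runner in (run_sequential, run_parallel):
        start = time.perf_counter()
        runner(durations)
        print(f"{runner.__name__}: {time.perf_counter() - start:.2f} sec")
```

The sequential run should take roughly the sum of the durations, while the pooled run should take roughly the longest single duration plus the cost of starting the worker processes.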


```parallel [command] {1} {2} ::: [list_1] ::: [list_2]```

In [20]:
%%bash

parallel echo {1} + {2} ::: {1..2} ::: {10..11}

1 + 10
1 + 11
2 + 10
2 + 11
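
The output above shows that two `:::` input sources are combined into every possible pair. In Python the same set of combinations can be produced with `itertools.product`. This is a sketch for intuition only; `combine` is a hypothetical helper name.

```python
from itertools import product


def combine(*sources):
    """Return every combination that takes one value from each input source."""
    return list(product(*sources))


# Same inputs as the shell example: {1..2} and {10..11}.
pairs = combine(range(1, 3), range(10, 12))
for i, j in pairs:
    print(f"{i} + {j}")
```

As in the shell output, the last source varies fastest.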


# Example_2 - multiple command lines

In [10]:
%%bash

for i in {1..8}; do
    echo command line ${i}
    sleep 0.5
done

command line 1
command line 2
command line 3
command line 4
command line 5
command line 6
command line 7
command line 8


#### Parallelization 
```parallel "[command_1] {1}; [command_2]" ::: [list]```

In [11]:
%%bash

parallel "echo command line {1}; sleep 0.5" ::: {1..8}

command line 1
command line 2
command line 3
command line 4
command line 5
command line 6
command line 7
command line 8


# Example_3 - bash script

In [13]:
%%bash

for i in {1..8}; do echo "echo command line ${i}; sleep 0.5"; done > example_3.sh

cat example_3.sh

echo command line 1; sleep 0.5
echo command line 2; sleep 0.5
echo command line 3; sleep 0.5
echo command line 4; sleep 0.5
echo command line 5; sleep 0.5
echo command line 6; sleep 0.5
echo command line 7; sleep 0.5
echo command line 8; sleep 0.5


In [14]:
%%bash

time bash example_3.sh

command line 1
command line 2
command line 3
command line 4
command line 5
command line 6
command line 7
command line 8



real	0m4.051s
user	0m0.007s
sys	0m0.013s


### Parallelization 

```parallel < ${bash_script}```

In [15]:
%%bash

time parallel < example_3.sh

command line 1
command line 2
command line 3
command line 4
command line 5
command line 6
command line 7
command line 8



real	0m0.750s
user	0m0.146s
sys	0m0.149s
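
The same "one command per line" idea can be sketched in Python. The version below assumes that every non-empty line of the file is an independent shell command. `read_commands`, `run_command`, and `run_commands` are hypothetical names. It uses `multiprocessing.pool.ThreadPool` rather than `Pool`, because each command already runs in its own process through `subprocess`, so the Python side only has to wait.

```python
import subprocess
from multiprocessing.pool import ThreadPool


def read_commands(path):
    """Read one shell command per non-empty line of a file."""
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


def run_command(command):
    """Run a single command line in a shell and return its standard output."""
    done = subprocess.run(
        command, shell=True, capture_output=True, text=True, check=True
    )
    return done.stdout


def run_commands(commands, workers=4):
    """Run up to `workers` commands at a time; results keep the input order."""
    with ThreadPool(processes=workers) as pool:
        return pool.map(run_command, commands)


if __name__ == "__main__":
    # Illustrative commands, mirroring the lines written to the script above.
    demo = [f"echo command line {i}; sleep 0.5" for i in range(1, 9)]
    print("".join(run_commands(demo, workers=8)), end="")
```

To run an existing file, pass it through the reader first, for example `run_commands(read_commands("example_3.sh"))`.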


# Example_4 - function

In [37]:
%%bash

FUN() {
    echo $1 + $2
    sleep 0.1
}

start=`date +%s`
for i in {1..2}; do
    for j in {10..12}; do
        FUN ${i} ${j}
    done
done
end=`date +%s`
echo Total execution time: $((end-start)) sec

1 + 10
1 + 11
1 + 12
2 + 10
2 + 11
2 + 12
Total execution time: 0 sec


### Parallelization 

In [18]:
%%bash

FUN() {
    echo $1 + $2
    sleep 0.1
}
# export FUN to the environment so the shells started by parallel can see it
# -f marks the name as a function rather than a variable
export -f FUN

start=`date +%s`
parallel FUN {1} {2} ::: {1..2} ::: {10..12}
end=`date +%s`
echo Total execution time: $((end-start)) sec

1 + 10
1 + 11
1 + 12
2 + 10
2 + 11
2 + 12
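
In Python, a function with several arguments can be distributed with `Pool.starmap`, which unpacks each tuple into the function's parameters. Loosely like `export -f` in the shell, the worker function has to be visible to the worker processes, which in practice means defining it at the top level of the module. This is a minimal sketch; `fun`, `all_pairs`, and `run_all` are hypothetical names.

```python
from multiprocessing import Pool


def fun(i, j):
    """Python counterpart of the shell function FUN: format 'i + j'."""
    return f"{i} + {j}"


def all_pairs(first, second):
    """Every (i, j) combination, with the second list varying fastest."""
    return [(i, j) for i in first for j in second]


def run_all(first, second, workers=4):
    """Call fun(i, j) for every pair in a pool of worker processes."""
    with Pool(processes=workers) as pool:
        return pool.starmap(fun, all_pairs(first, second))


if __name__ == "__main__":
    for line in run_all(range(1, 3), range(10, 13)):
        print(line)
```

`starmap` returns its results in input order, so the printed lines follow the same order as the nested for loop.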


# Example_5 - while loop

In [35]:
%%bash

for i in {1..4}; do echo $i; done > file.txt

while read line; do
    echo ${line}
done < file.txt

1
2
3
4


### Parallelization 

In [36]:
%%bash

parallel echo "{1}" :::: file.txt

1
2
3
4


# Python Multiprocessing

<center><img src="img/intro_mgr.png" style="width: 1400px;"></center>