# Deep Learning Experiments with Task Spooler
> A Unix Spooler with GPU support

- toc: false
- branch: master
- badges: true
- comments: false
- categories: [spooler, task manager, deep learning]


## Introduction

Task Spooler was originally developed by Lluis Batlle i Rossell but is no longer maintained.
The branch introduced here is a [fork](https://github.com/justanhduc/task-spooler) of the original program with more features including GPU support.

## Installation

First, clone Task Spooler from GitHub.
Optionally, you can choose a different version by checking out another tag.
In this tutorial, I will use the latest version on `master`.

In [1]:
%%capture
!git clone https://github.com/justanhduc/task-spooler

Next, you need to create a `CUDA_HOME` environment variable to point to the CUDA root directory.
Then, you can execute the given install script.

In [2]:
!cd task-spooler/ && CUDA_HOME=/usr/local/cuda ./reinstall

rm -f *.o ts
cc -pedantic -ansi -Wall -g -O0 -std=c11 -D_XOPEN_SOURCE=500 -D__STRICT_ANSI__ -c main.c
cc -pedantic -ansi -Wall -g -O0 -std=c11 -D_XOPEN_SOURCE=500 -D__STRICT_ANSI__ -c server.c
cc -pedantic -ansi -Wall -g -O0 -std=c11 -D_XOPEN_SOURCE=500 -D__STRICT_ANSI__ -c server_start.c
cc -pedantic -ansi -Wall -g -O0 -std=c11 -D_XOPEN_SOURCE=500 -D__STRICT_ANSI__ -c client.c
cc -pedantic -ansi -Wall -g -O0 -std=c11 -D_XOPEN_SOURCE=500 -D__STRICT_ANSI__ -c msgdump.c
cc -pedantic -ansi -Wall -g -O0 -std=c11 -D_XOPEN_SOURCE=500 -D__STRICT_ANSI__ -c jobs.c
cc -pedantic -ansi -Wall -g -O0 -std=c11 -D_XOPEN_SOURCE=500 -D__STRICT_ANSI__ -c execute.c
cc -pedantic -ansi -Wall -g -O0 -std=c11 -D_XOPEN_SOURCE=500 -D__STRICT_ANSI__ -c msg.c
cc -pedantic -ansi -Wall -g -O0 -std=c11 -D_XOPEN_SOURCE=500 -D__STRICT_ANSI__ -c mail.c
cc -pedantic -ansi -Wall -g -O0 -std=c11 -D_XOPEN_SOURCE=500 -D__STRICT_ANSI__ -c error.c
cc -pedantic -ansi -Wall -g -O0 -std=c11 -D_XOPEN_SOURCE=500 -D__STRICT_ANSI__ 

## Basics of Task Spooler
### First look
Executing `ts` without arguments prints the job list, as shown below.
In the listing, `ID` refers to the job ID.
There are four main types of `State`: `finished` indicates that a job has completed,
`queued` that a CPU job is waiting to be executed, `allocating` that a GPU job is queued,
and `running` that the job is currently being executed.
When a job is executed, its `stdout` stream is redirected to the file shown in the `Output` column.
These log files are never deleted automatically, even after the job list is cleared.
`E-Level` shows the exit code of the process.
`Time` indicates the running time of a job.
The command being run is shown in the `Command` column.
The numbers inside the square brackets next to `Command` give the number of currently running
jobs and the maximum number of jobs (slots) that can run in parallel.
For example, `[run=0/1]` in the listing below means that no job is running and that at most 
one job can run at a time.
The maximum slot number can be adjusted manually.


In [3]:
!ts

ID   State      Output               E-Level  Time   GPUs  Command [run=0/1]
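
If you want to read the slot usage from a script, the `[run=0/1]` field of the header can be picked out with a regular expression. This is only a minimal Python sketch: `parse_run_slots` is a hypothetical helper, not part of Task Spooler, and it assumes the header keeps the format shown above.

```python
import re
from typing import Tuple


def parse_run_slots(header: str) -> Tuple[int, int]:
    """Return (running_jobs, max_slots) parsed from a `ts` header line."""
    match = re.search(r"\[run=(\d+)/(\d+)\]", header)
    if match is None:
        raise ValueError("no [run=R/S] field found in header")
    return int(match.group(1)), int(match.group(2))


header = "ID   State      Output               E-Level  Time   GPUs  Command [run=0/1]"
running, slots = parse_run_slots(header)
print(running, slots)  # 0 1
```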


### Queuing your first job

Jobs can be added by simply prepending `ts` to your command.
For example, to make the system sleep for 10 seconds using Task Spooler, execute

In [4]:
!ts sleep 10
!ts
!sleep 10  # let's check ts again after 10 seconds
!ts

0
ID   State      Output               E-Level  Time   GPUs  Command [run=1/1]
0    running    /tmp/ts-out.j0MGwO                   0     sleep 10
ID   State      Output               E-Level  Time   GPUs  Command [run=0/1]
0    finished   /tmp/ts-out.j0MGwO   0        10.00s 0     sleep 10
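
As the output above shows, `ts` prints the ID of the newly queued job. A script can capture that ID for later use. The sketch below is a hypothetical Python wrapper (the name `queue_job` and the injectable `runner` parameter are my own, not part of Task Spooler); it assumes `ts` is on the `PATH` and prints nothing but the job ID when queuing.

```python
import subprocess
from typing import Callable, List


def queue_job(command: List[str], runner: Callable = subprocess.run) -> int:
    """Queue `command` with `ts` and return the job ID that `ts` prints.

    `runner` defaults to subprocess.run; it is a parameter only so that
    the function can be exercised without a real `ts` installation.
    """
    result = runner(["ts", *command], capture_output=True, text=True, check=True)
    return int(result.stdout.strip())
```

On a machine with Task Spooler installed, `queue_job(["sleep", "10"])` should return the same integer that `ts sleep 10` prints in the terminal.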


The first listing shows that the job with ID `0` is currently running.
After 10 seconds, the second listing shows that the job has finished with an `E-Level` of `0`.

To enable running more jobs in parallel, you can increase the maximum slot number by
using a `-S` flag followed by the desired number.
For instance,

In [5]:
!ts -S 4
!ts

ID   State      Output               E-Level  Time   GPUs  Command [run=0/4]
0    finished   /tmp/ts-out.j0MGwO   0        10.00s 0     sleep 10


The command above allows you to run 4 jobs at the same time.
You can verify this by typing `ts`: the last number in the square brackets should change to `4`.
Let's try queuing 5 jobs at once, this time with longer sleep times (up to `100` seconds)
so that the jobs don't end too fast.
You should see something like this:


In [6]:
!ts sleep 100
!ts sleep 20
!ts sleep 30
!ts sleep 40
!ts sleep 10
!ts

1
2
3
4
5
ID   State      Output               E-Level  Time   GPUs  Command [run=4/4]
1    running    /tmp/ts-out.xDq00e                   0     sleep 100
2    running    /tmp/ts-out.HUzUai                   0     sleep 20
3    running    /tmp/ts-out.sYcGno                   0     sleep 30
4    running    /tmp/ts-out.ArV4nv                   0     sleep 40
5    queued     (file)                               0     sleep 10
0    finished   /tmp/ts-out.j0MGwO   0        10.00s 0     sleep 10
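
To check from a script how many jobs are in each state, the first two columns of the listing are enough. The following is a minimal Python sketch under the assumption that every job row starts with a numeric ID followed by the state, as in the output above; `count_states` is a hypothetical helper.

```python
from collections import Counter


def count_states(listing: str) -> Counter:
    """Count jobs per state in the text printed by `ts`."""
    counts = Counter()
    for line in listing.splitlines():
        fields = line.split()
        # Job rows start with a numeric ID; the header row starts with "ID".
        if len(fields) >= 2 and fields[0].isdigit():
            counts[fields[1]] += 1
    return counts


listing = """\
ID   State      Output               E-Level  Time   GPUs  Command [run=4/4]
1    running    /tmp/ts-out.xDq00e                   0     sleep 100
2    running    /tmp/ts-out.HUzUai                   0     sleep 20
3    running    /tmp/ts-out.sYcGno                   0     sleep 30
4    running    /tmp/ts-out.ArV4nv                   0     sleep 40
5    queued     (file)                               0     sleep 10
0    finished   /tmp/ts-out.j0MGwO   0        10.00s 0     sleep 10
"""
counts = count_states(listing)
print(counts["running"], counts["queued"], counts["finished"])  # 4 1 1
```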


### Viewing command outputs

As mentioned above, the `stdout` of the command is redirected to a file specified in the 
`Output` column. 
To see the output manually, you can simply open that file.
But of course Task Spooler offers more than that. It lets you read the output in two ways,
via the flags `-t` and `-c`.

`-c`, which stands for `cat`, allows you to see all the output from the beginning to the end.
`-t`, which means `tail`, displays only the last 10 lines of the output.
Let's try them out.
First, we need to run something that produces a lot of text, like `ls`, `df` or `du`.
The choice is yours.
For me, I ran `ts ls /usr/bin`. The job ID of the command in my case is `0`, so to view 
the whole output, I used `ts -c 0`. It displayed a long list of executable files.
When I typed `ts -t 0`, it showed only the last 10 lines.
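
Since the output is an ordinary file, you can also read it yourself. The sketch below mimics what `-t` shows by returning the last 10 lines of a file in Python. It only illustrates the idea and says nothing about how `ts` implements the flag; `tail_lines` is a hypothetical helper.

```python
from collections import deque
from typing import List


def tail_lines(path: str, n: int = 10) -> List[str]:
    """Return the last `n` lines of the file at `path`, without newlines."""
    with open(path, "r", errors="replace") as handle:
        # A deque with maxlen keeps only the most recent `n` lines in memory.
        return [line.rstrip("\n") for line in deque(handle, maxlen=n)]
```

Pass it the path from the `Output` column of your own job; the file name is generated by `ts` and differs on every run.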


In [7]:
#collapse-output
!ts -K  # reset Task Spooler. it will be introduced later
!ts ls /usr/bin
!ts -t 0

0
yes
zdump
zip
zipcloak
zipdetails
zipgrep
zipinfo
zipnote
zipsplit
zrun


In [8]:
#collapse-output

!ts -c 0

[
2to3-2.7
7z
7za
7zr
acyclic
add-apt-repository
addpart
addr2line
apropos
apt
apt-add-repository
apt-cache
apt-cdrom
apt-config
apt-extracttemplates
apt-ftparchive
apt-get
apt-key
apt-mark
apt-sortpkgs
ar
arch
as
asan_symbolize
asan_symbolize-6.0
assistant
awk
b2
b2sum
base32
base64
basename
bashbug
bcomps
bcp
bjam
bootctl
browse
bsd-from
bsd-write
busctl
c++
c89
c89-gcc
c99
c99-gcc
cal
calendar
captoinfo
catchsegv
catman
cautious-launcher
cc
ccomps
c++filt
chage
chattr
chcon
chfn
chronic
chrt
chsh
circo
cksum
clang
clang++
clang-6.0
clang++-6.0
clang-cpp-6.0
clear
clear_console
clinfo
cluster
cmake
cmp
col
colcrt
colrm
column
combine
comm
compose
corelist
cpack
cpan
cpan5.26-x86_64-linux-gnu
cpp
cpp-7
c_rehash
csplit
ctest
curl
curl-config
cut
cvt
dbus-cleanup-sockets
dbus-daemon
dbus-monitor
dbus-run-session
dbus-send
dbus-update-activation-environment
dbus-uuidgen
debconf
debconf-apt-progress
debconf-communicate
debconf-copydb
debconf-escape
debconf-set-selections
debconf-show
deb-

### Miscellaneous

There are many other flags for managing your tasks.
First of all, to see all the available options, use the `-h` flag.
Among these, the ones you will probably use most are `-r`, `-C`, `-k`, `-T` and `-K`.
To remove a job that is not running (one with `finished`, `queued` or `allocating` status), 
use `-r`, optionally followed by a job ID.
For example, `ts -r` removes the last added job if it is not running yet, and
`ts -r 10` removes the job with ID `10`.
If the job is successfully removed, it disappears from the job list.




In [18]:
!ts -K
!ts -S 2  # let's run 2 tasks at a time
!ts sleep 100
!ts sleep 100
!ts sleep 100
!ts

0
1
2
ID   State      Output               E-Level  Time   GPUs  Command [run=2/2]
0    running    /tmp/ts-out.gClvpl                   0     sleep 100
1    running    /tmp/ts-out.rW9nIv                   0     sleep 100
2    queued     (file)                               0     sleep 100


In [19]:
!ts -r 2  # remove job 2
!ts

ID   State      Output               E-Level  Time   GPUs  Command [run=2/2]
0    running    /tmp/ts-out.gClvpl                   0     sleep 100
1    running    /tmp/ts-out.rW9nIv                   0     sleep 100


To kill a running job, use `ts -k <jobid>`.

In [20]:
!ts -k 0  # let's kill job 0
!ts

ID   State      Output               E-Level  Time   GPUs  Command [run=1/2]
1    running    /tmp/ts-out.rW9nIv                   0     sleep 100
0    finished   /tmp/ts-out.gClvpl   -1        8.07s 0     sleep 100


In [21]:
# Queue some more jobs here
!ts -S 5
!ts sleep 100
!ts sleep 100
!ts sleep 100
!ts

3
4
5
ID   State      Output               E-Level  Time   GPUs  Command [run=4/5]
1    running    /tmp/ts-out.rW9nIv                   0     sleep 100
3    running    /tmp/ts-out.BeUKip                   0     sleep 100
4    running    /tmp/ts-out.uFu50z                   0     sleep 100
5    running    /tmp/ts-out.o0hd1F                   0     sleep 100
0    finished   /tmp/ts-out.gClvpl   -1        8.07s 0     sleep 100


To kill all running jobs, use `ts -T`.

In [22]:
!ts -T  # terminates all running jobs
!ts

ID   State      Output               E-Level  Time   GPUs  Command [run=0/5]
0    finished   /tmp/ts-out.gClvpl   -1        8.07s 0     sleep 100
1    finished   /tmp/ts-out.rW9nIv   -1       22.42s 0     sleep 100
5    finished   /tmp/ts-out.o0hd1F   -1        8.84s 0     sleep 100
3    finished   /tmp/ts-out.BeUKip   -1        9.06s 0     sleep 100
4    finished   /tmp/ts-out.uFu50z   -1        8.95s 0     sleep 100


To clear all the `finished` jobs from the list, use `-C` without an argument.

In [23]:
!ts sleep 100
!ts -C  # clear job list
!ts

6
ID   State      Output               E-Level  Time   GPUs  Command [run=1/5]
6    running    /tmp/ts-out.bOY0Sx                   0     sleep 100


Finally, `ts -K` will kill the Task Spooler process.

In [24]:
!ts -K  # let's kill Task Spooler
!ts  # running ts again restarts it

ID   State      Output               E-Level  Time   GPUs  Command [run=0/1]


There are some useful flags for scheduling tasks as well.
You may want to execute a task only after certain jobs finish.
In this case you can use one of three flags.
`-d` with no argument makes the new task depend on the last added job.
`-D` followed by a comma-separated list of job IDs makes the new task wait for all the listed jobs.
`-W` followed by a comma-separated list of job IDs also waits for all the listed jobs, but the
new task runs only if all of them finish with exit code `0`; otherwise it is skipped.
For example, 

In [None]:
!ts -S 10
# let's queue 3 jobs first
!ts sleep 100
!ts sleep 100
!ts sleep 200
!ts

0
1
2
ID   State      Output               E-Level  Time   GPUs  Command [run=3/10]
0    running    /tmp/ts-out.1wh18P                   0     sleep 100
1    running    /tmp/ts-out.aqr1P0                   0     sleep 100
2    running    /tmp/ts-out.SLCGX7                   0     sleep 200


In [None]:
!ts -d sleep 10  # does not care about exit code
!ts -D 0,1,3 sleep 10  # runs after jobs 0, 1 and 3
!ts -W 0,2,3 sleep 10  # to run this job, jobs 0, 2 and 3 need to finish well
!ts

3
4
5
ID   State      Output               E-Level  Time   GPUs  Command [run=3/10]
0    running    /tmp/ts-out.1wh18P                   0     sleep 100
1    running    /tmp/ts-out.aqr1P0                   0     sleep 100
2    running    /tmp/ts-out.SLCGX7                   0     sleep 200
3    queued     (file)                               0     [2]&& sleep 10
4    queued     (file)                               0     [0,1,3]&& sleep 10
5    queued     (file)                               0     [0,2,3]&& sleep 10


In [None]:
!ts -k 2
!ts

ID   State      Output               E-Level  Time   GPUs  Command [run=3/10]
0    running    /tmp/ts-out.1wh18P                   0     sleep 100
1    running    /tmp/ts-out.aqr1P0                   0     sleep 100
3    running    /tmp/ts-out.suaN1K                   0     [2]&& sleep 10
4    queued     (file)                               0     [0,1,3]&& sleep 10
5    queued     (file)                               0     [0,2,3]&& sleep 10
2    finished   /tmp/ts-out.SLCGX7   -1       10.35s 0     sleep 200


In [None]:
!sleep 100  # let's wait for jobs 0 and 1 to finish
!ts  # you will see that the job queued with `-W` will be skipped

ID   State      Output               E-Level  Time   GPUs  Command [run=0/10]
2    finished   /tmp/ts-out.SLCGX7   -1       10.35s 0     sleep 200
3    finished   /tmp/ts-out.suaN1K   0        10.00s 0     [2]&& sleep 10
0    finished   /tmp/ts-out.1wh18P   0         1.67m 0     sleep 100
5    skipped    (no output)                          0     [0,2,3]&& sleep 10
1    finished   /tmp/ts-out.aqr1P0   0         1.67m 0     sleep 100
4    finished   /tmp/ts-out.yV8vfT   0        10.00s 0     [0,1,3]&& sleep 10
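
The difference between `-D` and `-W` seen in this run can be summarised in a few lines of Python. This is a simplified model of the behaviour shown above, not Task Spooler's actual implementation; `dependent_outcome` is a hypothetical name, and the exit codes are the ones from the listing.

```python
from typing import Dict, Iterable


def dependent_outcome(deps: Iterable[int],
                      exit_codes: Dict[int, int],
                      require_success: bool) -> str:
    """Decide what happens to a dependent job once all `deps` have finished.

    require_success=False models `-d`/`-D` (exit codes are ignored);
    require_success=True models `-W` (every dependency must exit with 0).
    """
    if require_success and any(exit_codes[d] != 0 for d in deps):
        return "skipped"
    return "run"


# Exit codes from the run above: job 2 was killed and ended with -1.
exit_codes = {0: 0, 1: 0, 2: -1, 3: 0}
print(dependent_outcome([0, 1, 3], exit_codes, require_success=False))  # run
print(dependent_outcome([0, 2, 3], exit_codes, require_success=True))   # skipped
```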


To distinguish tasks, you can also label them using the `-L` flag.

In [None]:
!ts -L foo sleep 10

6


In [None]:
!ts

ID   State      Output               E-Level  Time   GPUs  Command [run=0/10]
2    finished   /tmp/ts-out.SLCGX7   -1       10.35s 0     sleep 200
3    finished   /tmp/ts-out.suaN1K   0        10.00s 0     [2]&& sleep 10
0    finished   /tmp/ts-out.1wh18P   0         1.67m 0     sleep 100
5    skipped    (no output)                          0     [0,2,7303014]&& sleep 10
1    finished   /tmp/ts-out.aqr1P0   0         1.67m 0     sleep 100
4    finished   /tmp/ts-out.yV8vfT   0        10.00s 0     [0,1,3]&& sleep 10
6    finished   /tmp/ts-out.EO9Qct   0        10.00s 0     [foo]sleep 10


## GPU support

The `GPUs` column shows the number of GPUs that the task requires.

For CPU tasks, the number of parallel tasks is capped only by the 
number of slots.
For a GPU task, it is further restricted by the number of available GPUs.
In other words, a GPU task can run only when there are both enough slots and enough free GPUs.
The availability of a GPU is determined by the free memory of that GPU.
If more than 90% of the memory is available, the GPU is deemed free; otherwise it is considered busy.
If there are more free GPUs than required, the GPUs are chosen automatically and at random.
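
The selection rule can be sketched in Python as follows. This is an illustration under stated assumptions, not the code Task Spooler runs: the memory figures are made up, and `free_gpus` and `pick_gpus` are hypothetical helpers that take the per-GPU memory as plain numbers instead of querying the driver.

```python
import random
from typing import List, Optional, Sequence, Tuple

FREE_FRACTION = 0.9  # the 90% threshold quoted in the text


def free_gpus(memory: Sequence[Tuple[int, int]]) -> List[int]:
    """Return indices of GPUs whose free memory exceeds 90% of the total.

    `memory` holds one (free_mib, total_mib) pair per GPU.
    """
    return [i for i, (free, total) in enumerate(memory)
            if free > FREE_FRACTION * total]


def pick_gpus(memory: Sequence[Tuple[int, int]],
              required: int,
              rng=random) -> Optional[List[int]]:
    """Pick `required` free GPUs at random, or None if too few are free."""
    candidates = free_gpus(memory)
    if len(candidates) < required:
        return None  # the job would have to keep waiting
    return sorted(rng.sample(candidates, required))


# Made-up example: GPU 1 is busy, GPUs 0 and 2 are (almost) empty.
memory = [(11500, 12000), (3000, 12000), (12000, 12000)]
print(free_gpus(memory))     # [0, 2]
print(pick_gpus(memory, 3))  # None
```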

There is one thing to note here. Because the availability of a GPU is determined by its
memory usage, and it may take time for your task to allocate GPU memory, two tasks started 
at the same time may end up on the same device and eventually crash with an
out-of-memory error.
Therefore, in Task Spooler, I deliberately delay each subsequent GPU task for a short time 
(30 seconds by default) after a GPU task has started.
This is ugly, but it does the job.
You can change this delay via the flag `--set_gpu_wait` followed by the number of seconds.
That's why, when you queue several jobs at once, the tasks after the first one may 
take a long time to start.
Also, sometimes you may see the job status change to `running` while the task has not actually
started yet and there is no output file. This is normal. Just keep waiting... It will be 
executed soon (or sometimes not very soon, but it will run anyway)!

Now, to tell Task Spooler that your job requires GPUs, use `-G` followed by the number of 
required GPUs. Task Spooler will allocate the GPU(s) for the job and make your job see
only the allocated GPU(s), so your task won't interfere with the others.
As a toy example, let's sleep with 1 GPU. In your terminal, execute


In [None]:
!ts -K
!ts -G 1 sleep 1
!ts

0
ID   State      Output               E-Level  Time   GPUs  Command [run=1/1]
0    running    /tmp/ts-out.N6RDHT                   1     sleep 1


In [None]:
!ts -G 100 sleep 1
!ts

1
ID   State      Output               E-Level  Time   GPUs  Command [run=0/1]
1    allocating (file)                               100   sleep 1
0    finished   /tmp/ts-out.N6RDHT   0         1.00s 1     sleep 1


In the output above, I requested 100 GPUs even though the server has only 1, and hence the task has
to be queued (in this case, forever).
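
To check from inside a job which GPU(s) it was given, you can inspect its environment. The sketch below assumes that the restriction is applied through the standard `CUDA_VISIBLE_DEVICES` environment variable; this post does not state the mechanism, so treat that as an assumption. `visible_gpus` is a hypothetical helper.

```python
import os
from typing import List, Mapping


def visible_gpus(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the device IDs listed in CUDA_VISIBLE_DEVICES (empty list if unset)."""
    value = env.get("CUDA_VISIBLE_DEVICES", "")
    return [item.strip() for item in value.split(",") if item.strip()]


# Example with an explicit environment instead of the real one.
print(visible_gpus({"CUDA_VISIBLE_DEVICES": "0,2"}))  # ['0', '2']
```

Calling `visible_gpus()` with no argument at the top of a training script prints nothing by itself, but logging its result is a cheap way to confirm which devices a job can see.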

## Deep learning with Task Spooler


Let's train a Convolutional Neural Network (CNN) on MNIST.
For this example, I will use the official [PyTorch MNIST example](https://github.com/pytorch/examples/blob/master/mnist/main.py).
To enable the code to use multiple GPUs, you will have to manually add 

```
model = nn.DataParallel(model)
```
after line 124 (`optimizer = optim.Adadelta(model.parameters(), lr=args.lr)`).
You can download the script by executing the cell below.

In [25]:
%%capture
!wget https://open-source-codes.s3.amazonaws.com/mnist.py

To train the CNN with Task Spooler using 1 GPU, execute the script as usual in the terminal 
but with `ts -G 1` before `python`. The full command is

In [26]:
!ts -K
!ts -G 1 python mnist.py
!ts

0
ID   State      Output               E-Level  Time   GPUs  Command [run=1/1]
0    running    /tmp/ts-out.uCfJPL                   1     python mnist.py


Note that without the `-G` flag, the job will run on CPU instead.
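
A typical next step is to queue several experiments at once, for example one per learning rate. The Python sketch below only builds the command lines and does not run them. It rests on a few assumptions: that `mnist.py` accepts an `--lr` option (the script reads `args.lr`), that `-G` and `-L` can be combined in one call, and that `ts` passes everything after the command name through untouched. `sweep_commands` is a hypothetical helper.

```python
from typing import List, Sequence


def sweep_commands(learning_rates: Sequence[float], gpus: int = 1) -> List[List[str]]:
    """Build one `ts` command line per learning rate."""
    commands = []
    for lr in learning_rates:
        commands.append([
            "ts", "-G", str(gpus),  # request GPUs for the job
            "-L", f"lr{lr}",        # label the job so it is easy to spot
            "python", "mnist.py", "--lr", str(lr),
        ])
    return commands


for argv in sweep_commands([0.5, 1.0]):
    print(" ".join(argv))
```

Each list can be handed to `subprocess.run` as is; with one GPU the jobs simply wait their turn in the queue.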

To see the output, use the `-c` or `-t` flag.
You should see the training in real-time. You can use `ctrl+c` to stop getting stdout anytime without actually canceling the experiment.

In [28]:
#collapse-output

!ts -t 0

0it [00:00, ?it/s]  0%|          | 0/9912422 [00:00<?, ?it/s]  0%|          | 16384/9912422 [00:00<01:18, 125345.15it/s]  1%|          | 98304/9912422 [00:00<01:00, 163252.10it/s]  4%|4         | 434176/9912422 [00:00<00:42, 225480.04it/s] 18%|#7        | 1753088/9912422 [00:00<00:25, 318272.54it/s] 54%|#####3    | 5349376/9912422 [00:00<00:10, 452888.99it/s] 78%|#######7  | 7716864/9912422 [00:01<00:03, 639356.13it/s]9920512it [00:01, 8147219.65it/s]                            
0it [00:00, ?it/s]  0%|          | 0/28881 [00:00<?, ?it/s]32768it [00:00, 104284.59it/s]           
0it [00:00, ?it/s]  0%|          | 0/1648877 [00:00<?, ?it/s]  1%|          | 16384/1648877 [00:00<00:11, 146567.16it/s]  6%|5         | 98304/1648877 [00:00<00:08, 185590.01it/s] 26%|##6       | 434176/1648877 [00:00<00:04, 255112.69it/s]1654784it [00:00, 2162067.57it/s]                           
0it [00:00, ?it/s]  0%|          | 0/4542 [00:00<?, ?it/s]8192it [00:00, 38263.37it/s]        

In [29]:
!ts

ID   State      Output               E-Level  Time   GPUs  Command [run=1/1]
0    running    /tmp/ts-out.uCfJPL                   1     python mnist.py


Unfortunately, there is only 1 GPU available in Colab, so I can't demonstrate training with multiple GPUs. You will have to trust me that it works!

That's it, folks. I hope this little app can boost your productivity and that you will enjoy
using it not only for your experiments but also for your daily tasks.
If you have any questions or want to contribute, feel free to create an issue
or make a PR on the [GitHub page](https://github.com/justanhduc/task-spooler).

## About me

I am Duc Nguyen from Vietnam.
Currently, I am a PhD candidate at Yonsei University, Korea.
For more information about me,
you guys can visit [my website](https://justanhduc.github.io/) or contact me at
adnguyen@yonsei.ac.kr.