Skip to content


Subversion checkout URL

You can clone with
Download ZIP
Distributed cron and job scheduler
Python C++ C Shell Java CMake Other
Failed to load latest commit information.
conf config param rename: ipv6 -> ipv6_only
db empty db folder
debian version up: 0.15
doc config param rename: ipv6 -> ipv6_only
jobs debianization: initial commit
node PEP8 code style fix
src config param rename: ipv6 -> ipv6_only
.gitignore remove masterdb
CMakeLists.txt remove masterdb
LICENSE.txt Apache 2.0 license added
README.rst version up: 0.15
groups master: 'hosts' -> 'groups'
hosts_group1 hosts groups
master.cfg config param rename: ipv6 -> ipv6_only
prun prun: use multi-line print for help command
prun-bf.spec version up: 0.15
worker.cfg config param rename: ipv6 -> ipv6_only



PRUN provides control over batch jobs and distributed computing resources. It consists of a single master daemon and worker daemons installed across the computer cluster. Master provides a queueing mechanism, scheduling, priority scheme, failover and planning of cron jobs. Worker is responsible for running of batch jobs and monitoring of their execution. Workers are fully controlled by a master node. Prun can be used on Unix-like operating systems, including Linux and *BSD.

Where to find complete Prun documentation?

This README is just a fast "quick start" document. You can find more detailed documentation at doc/README


Build requirements:

  • cmake 2.6 (or higher)
  • GCC 4.6 (or higher) or Clang
  • boost 1.46 (or higher)

Additional requirements:

  • python 2.6/3.x (admin tool written in Python)

Building debian packages:

> git clone
> cd prun
> debuild -sa -j8
> ls ../prun*.deb

Building RPMs:

> git clone
> mv prun prun-0.15  # add '-version' postfix
> mkdir -p rpmbuild/SOURCES
> tar cjf rpmbuild/SOURCES/prun-0.15.tar.bz2 prun-0.15
> rpmbuild -ba prun-0.15/prun-bf.spec
> ls rpmbuild/RPMS/*/*.rpm

Building runtime from sources:

> git clone
> cd prun
> cmake -DRelease=true .  # skipping Release var disables compiler optimizations
> make -j8

Optional external dependencies

Master node can serialize its jobs state to external key-value database. After master's node restart, master restores all previously submitted and not yet executed jobs at node startup. There are multiple bindings to different databases: LevelDB, Cassandra, Elliptics.

Path to shared library (binding) and its config is set in master.cfg.


Running Worker in terminal:

> cd prun
> ./pworker

Use 'pworker --help' to display command line options.

Running Master in terminal:

> cd prun
> ./pmaster

Use 'pmaster --help' to display command line options.

Submitting 'Hello, World!' job:

> ./prun -c "run test.job"


From debian package:

> dpkg -i prun-worker_0.15_amd64.deb

From RPM:

> rpm -ivh prun-worker-0.15-1.el7.centos.1.x86_64.rpm

Also it is possible to install prun worker and master using installation scripts (like 'make install'):

> cd prun                    # cd to the directory containing prun
> utils/    # install Master
> utils/    # install Worker

The script will ask you a few questions and will setup everything you need to run Master/Worker properly as a background daemon that will start again on system reboots.

You'll be able to stop and start Master/Worker using the script named /etc/init.d/pmaster and /etc/init.d/pworker, accordingly.

Job submission

Here's a simple example of external sort task. Let's assume that we have network-shared directory named 'data', which is read/write available from any node in a cluster. So we need parallelize sorting of a big text file.

One solution is to sort a separate small pieces of a file and then merge them into one big sorted file. Small pieces of a file are called chunks. We can submit a job that sorts chunks from Master to Workers. Here's a simple shell script (see jobs/example/ that does it properly:

echo "Sorting chunk process started"
echo "taskId="$taskId", numTasks="$numTasks", jobId="$jobId


fileSize=`stat --printf="%s" $filename`
partSize=`expr $fileSize / $numTasks`

dd if=$filename bs=$partSize skip=$taskId count=1 | sort --buffer-size=$partSize"b" > $outFile

exit $errCode

For submitting a chunk sorting job, we should describe it in a .job file (see jobs/sort_chunk.job), that is written in JSON format (see doc/README for more detailed description):

    "script" : "example/",
    "language" : "shell",
    "send_script" : true,
    "priority" : 4,
    "job_timeout" : 1800,
    "queue_timeout" : 300,
    "task_timeout" : 300,
    "max_failed_nodes" : 10,
    "num_execution" : 16,
    "max_cluster_instances" : -1,
    "max_worker_instances" : 1,
    "exclusive" : false,
    "no_reschedule" : false

In a few words this job should be executed 16 times, using exactly one CPU of a Worker node and should be done within 1800 seconds. It means that if we have 16 Worker nodes (computers/CPUs), each worker node will sort one of sixteen chunks of the input big file. Even if we have only one worker, chunk sorting job will be executed sixteen times.

After sorting chunks, this chunks could be merged together in one big output file. Here's a simple shell script (see jobs/example/ which does it properly:

echo "Chunk merging process started"
echo "taskId="$taskId", numTasks="$numTasks", jobId="$jobId

chunks=`ls -d data/*[0-9]`

sort --buffer-size=33% -T "data" -m $chunks > $outFile

exit $errCode

And merge job description (see jobs/sort_merge.job):

    "script" : "example/",
    "language" : "shell",
    "send_script" : true,
    "priority" : 4,
    "job_timeout" : 1800,
    "queue_timeout" : 1800,
    "task_timeout" : 1800,
    "max_failed_nodes" : 10,
    "num_execution" : 1,
    "max_cluster_instances" : -1,
    "max_worker_instances" : 1,
    "exclusive" : false,
    "no_reschedule" : false

We want to run merging job strictly after completion of all chunk sorting jobs. It is possible to describe job dependencies in a directed acyclic graph. Prun takes that job dependencies from the .meta file. Here's a simple job dependency between two jobs (see jobs/external_sort.meta):

    "graph" : [["sort_chunk.job", "sort_merge.job"]]

Ok, we are almost done. We are having everything that is needed for sorting the big file: running Workers across cluster nodes, one running Master process, jobs and job descriptions, shared directory containing the input file (data/input.txt). Lets submit job using command-line tool:

> cd prun                          # cd to the directory containing prun
> ./prun master_hostname           # run admin tool, connect to Master host
> run external_sort.meta           # submit a meta job

Cron job submission

Next example is dumping mysql database at muliple hosts simultaneously. Dumping planned at 3 a.m. at Sunday every week.

Firstly, we should create shell script, which does database dumping:

mysqldump -uroot -pQWERTY -A > /home/nobody/dump/all-databases.sql

And save this script to master's jobs directory, e.g. at /home/nobody/jobs/

Then we should create job description file of our cron job:

  "script" : "/home/nobody/jobs/",
  "language" : "shell",
  "send_script" : true,
  "priority" : 8,
  "job_timeout" : 3600,
  "queue_timeout" : 60,
  "task_timeout" : 3600,
  "max_failed_nodes" : -1,
  "num_execution" : -1,
  "max_cluster_instances" : -1,
  "max_worker_instances" : 1,
  "exclusive" : true,
  "no_reschedule" : true,
  "max_exec_at_worker" : 1,
  "exec_unit_type" : "host",
  "cron" : "* 3 * * 0",
  "name" : "weekly_dump"

And save it to master's jobs directory, e.g. at /home/nobody/jobs/dump.job This job will be started once at every available host at 3 a.m. every Sunday after submitting it to the master (here master is running at localhost):

./prun -c "run /home/nobody/jobs/dump.job" localhost

Stopping of any jobs by it's exclusive name performed by 'stop' command:

./prun -c "stop weekly_dump"


The contents of this repository are made available to the public under the terms of the Apache License, Version 2.0. For more information see LICENSE.txt

Something went wrong with that request. Please try again.