
DistComp: a distributed computation platform

DistComp is a simple tool that uses data parallelism to speed up computation. It is designed to be easy to use; I have been using it on CloudLab with up to 100 nodes to speed up my experiments.

[DistComp architecture diagram]

Architecture

DistComp uses a manager-worker architecture. The manager node handles management, e.g., submitting new tasks and monitoring task and worker status; the worker nodes perform the computation. The current design decouples task submission from task execution: the manager submits tasks to a Redis queue, and the worker nodes poll the queue and execute the tasks.
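
To make the flow concrete, here is a minimal sketch of a worker's polling loop, assuming redis-py on the workers and a Redis list named todo; the actual key names and logic in redisWorker.py may differ.

import subprocess
import redis

# Connect to the Redis instance on the manager node. The host name and
# the queue key names below are illustrative, not DistComp's actual ones.
r = redis.Redis(host="manager-host", port=6379, decode_responses=True)

while True:
    # Block until a task is available, then atomically move it to an
    # in-progress list so it can be recovered if this worker dies.
    task = r.brpoplpush("todo", "in_progress", timeout=0)
    task_type, priority, min_dram, min_cpu, params = task.split(":", 4)
    if task_type == "shell":
        ok = subprocess.run(params, shell=True).returncode == 0
        r.lrem("in_progress", 1, task)
        r.lpush("finished" if ok else "failed", task)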

Features

  • DistComp maximizes resource usage: it runs as many tasks as possible on each node. If tasks' memory usage grows over time and a worker is about to run out of memory, the most recently started task is returned to the to-do queue (see the sketch after this list).
  • On-demand task submission: new tasks can be submitted anytime.
  • Fault tolerance: restarted workers can fetch their previous tasks and continue, and when a worker fails, its tasks can be moved back to the to-do queue.
  • Different types of tasks: DistComp supports bash and Python tasks.

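The memory-based backoff described in the first bullet could look roughly like this; the threshold, queue key names, and bookkeeping here are assumptions, not DistComp's actual implementation.

import psutil
import redis

r = redis.Redis(host="manager-host", port=6379, decode_responses=True)

# Tasks this worker has started, most recent last. Illustrative
# bookkeeping: each entry is a (task_string, subprocess.Popen) pair.
running = []

def relieve_memory_pressure(threshold_pct=90):
    # If the node is close to running out of memory, kill the most
    # recently started task and push it back onto the to-do queue.
    if psutil.virtual_memory().percent >= threshold_pct and running:
        task, proc = running.pop()
        proc.kill()
        r.lrem("in_progress", 1, task)
        r.lpush("todo", task)
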
How to use

0. Prepare worker nodes

You need to prepare the worker nodes to run the tasks, e.g., install dependencies. Besides the task's own dependencies, the worker and manager nodes need the redis and psutil Python packages:

pip install redis psutil

The manager node also needs parallel-ssh to launch jobs on the workers.

If your task requires data, the data must be accessible on the worker nodes. You can consider using a cluster file system, e.g., MooseFS, to aggregate disk capacity from all workers and provide high-bandwidth access to data.

1. Setup Redis on the manager node

bash ./redis.sh
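
Once the script finishes, you can sanity-check that Redis is reachable, e.g., from a worker node; a quick check with redis-py (the host name below is a placeholder for your manager node's address):

import redis

# Replace "manager-host" with your manager node's address.
r = redis.Redis(host="manager-host", port=6379)
print(r.ping())  # prints True if Redis is up and reachable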

2. Create the tasks to be run

Create a file containing all the tasks. Each line in the file specifies one task in the following format:

# task type:priority:min_dram:min_cpu:task_params
# echo 'shell:4:2:2:./cachesim PARAM1 PARAM2' >> task

# submit the tasks to Redis
python3 redisManager.py --task 'initRedis&loadTask' --taskfile task
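
For large parameter sweeps, you can also generate the task file from a script instead of echoing each line; a small sketch following the format above (the cachesim parameter values are placeholders):

# Generate one shell task per parameter combination, using the
# task_type:priority:min_dram:min_cpu:task_params format above.
with open("task", "w") as f:
    for param1 in ["lru", "fifo"]:      # placeholder parameter values
        for param2 in [1024, 4096]:
            f.write(f"shell:4:2:2:./cachesim {param1} {param2}\n")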

3. Start the worker nodes

We use the parallel-ssh tool to start the worker script on the worker nodes. The list of workers (IP addresses or hostnames) is stored in the host file.

parallel-ssh -h host -i -t 0 '''
    cd /PATH/TO/DistComp;
    screen -S worker -L -Logfile workerScreen/$(hostname) -dm python3 redisWorker.py
'''

4. Monitor the progress

# check the task status
python3 redisManager.py --task checkTask --finished false --todo false --in_progress false --failed false

# check the worker status
python3 redisManager.py --task checkWorker

# monitor the task progress
watch "python3 redisManager.py --task 'checkTask&checkWorker' --finished false --print_result false --in_progress false"
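
For scripted monitoring, you can also query the queue lengths directly with redis-py; the key names below are guesses at DistComp's internal naming and may need adjusting to match redisManager.py.

import redis

r = redis.Redis(host="manager-host", port=6379, decode_responses=True)

# Queue key names are assumptions; adjust to match redisManager.py.
for queue in ("todo", "in_progress", "finished", "failed"):
    print(f"{queue}: {r.llen(queue)} tasks")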
