gpu-experiment-scheduler

Lightweight single-node multi-GPU experiment scheduler for Python ML research. Zero dependencies (stdlib only), copy-paste ready.

Schedules a name * seed task matrix across multiple GPUs on one machine using LPT ordering (heavy tasks first) and load-aware dispatch (idle GPU with the smallest active workload gets the next task).

When to use

Situation	Tool
1 machine, 2-8 GPUs, need to run N*M experiments	lab-orchestrator
SLURM cluster available	Hydra + Submitit
Hyperparameter search with Bayesian optimisation	Ray Tune / Optuna
Shared multi-user workstation with job queuing	gflow
Just need experiment tracking	MLflow / W&B

Install

# From source (editable), run from the repo root:
pip install -e .

# Or just copy lab_orchestrator/ into your project - zero dependencies.

Quick start

1. Programmatic (recommended for research repos)

Tip: weight is estimated GPU-hours per task. For a first run use weight=1.0 for everything - LPT still helps. Replace with actual wall-clock hours from logs once you have them.

from lab_orchestrator import Task, run_schedule, detect_gpus

tasks = [
    Task(weight=2.0, name="big", seed=42,            # weight ~= GPU-hours; use 1.0 if unknown
         cmd=["python", "train.py", "--config=big", "--seed=42"]),
    Task(weight=2.0, name="big", seed=43,
         cmd=["python", "train.py", "--config=big", "--seed=43"]),
    Task(weight=0.5, name="small", seed=42,
         cmd=["python", "train.py", "--config=small", "--seed=42"]),
]

run_schedule(detect_gpus(), tasks, workers_per_gpu=2)

2. With task matrix builder

from lab_orchestrator import build_task_matrix, run_schedule, detect_gpus

weights = {"big": 2.0, "small": 0.5}

def make_cmd(name, seed):
    return ["python", "train.py", f"--config={name}", f"--seed={seed}"]

tasks = build_task_matrix(
    names=["big", "small"],
    seeds=[42, 43, 44],
    weights=weights,
    cmd_factory=make_cmd,
)

run_schedule(detect_gpus(), tasks)
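
To make the shape of the task matrix concrete, here is a minimal standalone sketch of what a builder like this might do. It is not the library's implementation: `SketchTask` and `sketch_task_matrix` are hypothetical names, and the fallback weight of 1.0 for names missing from `weights` is an assumption.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Dict, List, Sequence


@dataclass
class SketchTask:
    """Hypothetical stand-in for a task record (not the library's Task)."""
    name: str
    seed: int
    weight: float
    cmd: List[str]


def sketch_task_matrix(
    names: Sequence[str],
    seeds: Sequence[int],
    weights: Dict[str, float],
    cmd_factory: Callable[[str, int], List[str]],
) -> List[SketchTask]:
    """Build one task per (name, seed) pair; unknown names get weight 1.0 (assumed)."""
    return [
        SketchTask(name, seed, weights.get(name, 1.0), cmd_factory(name, seed))
        for name, seed in product(names, seeds)
    ]


def make_cmd(name, seed):
    return ["python", "train.py", f"--config={name}", f"--seed={seed}"]


matrix = sketch_task_matrix(
    ["big", "small"], [42, 43, 44], {"big": 2.0, "small": 0.5}, make_cmd
)
print(len(matrix))  # 6 tasks: 2 names * 3 seeds
```

Every name is paired with every seed, so the matrix grows as names * seeds; the per-name weight is simply copied onto each seed of that name.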

3. Config file (YAML)

# experiments.yaml
names: [memorization, masked_instability, temporal_limit]
seeds: [42, 43, 44, 45, 46]
weights:
  memorization: 6.5
  masked_instability: 1.7
  temporal_limit: 7.3
cmd_template: "python main.py {name} --seed {seed} --epochs 50"

Replace main.py with your training script. See examples/sweep/ for a working config that pairs with a training script.

python -m lab_orchestrator experiments.yaml --gpus 0,1,2,3
python -m lab_orchestrator experiments.yaml --dry-run
python -m lab_orchestrator experiments.yaml --resume
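
As an illustration of how a template such as cmd_template can become an argument list, the sketch below uses `str.format` and `shlex.split` from the standard library. Whether lab_orchestrator expands templates exactly this way is an assumption, and `expand_template` is a hypothetical helper.

```python
import shlex


def expand_template(template: str, name: str, seed: int) -> list:
    """Fill {name} and {seed}, then split into argv using POSIX shell rules."""
    return shlex.split(template.format(name=name, seed=seed))


argv = expand_template(
    "python main.py {name} --seed {seed} --epochs 50", "memorization", 42
)
print(argv)
# ['python', 'main.py', 'memorization', '--seed', '42', '--epochs', '50']
```

Splitting into a list up front means the command can be passed to `subprocess` without `shell=True`.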

4. tmux mode

Useful when you want to attach/detach from running experiments or monitor individual tasks in real time (one tmux window per task).

Generate a tmux script instead of running programmatically:

python -m lab_orchestrator experiments.yaml \
    --tmux --tmux-session my_exp \
    --venv "source .venv/bin/activate" \
    --cwd /path/to/experiment \
    > run_tmux.sh

bash run_tmux.sh
tmux attach -t my_exp

Examples

examples/
├── sklearn_digits/            # minimal working example (no GPU needed)
│   ├── train.py               # fits one model for one seed
│   └── launch.py              # schedules 3 models * 5 seeds
├── sweep/                     # GPU sweep template with YAML config
│   ├── train.py               # training script (the "worker")
│   ├── sweep.py               # programmatic launcher
│   └── experiments.yaml       # same sweep as a YAML config for the CLI
├── experiment_template.py     # (advanced) launcher + training in one file
└── tmux_example.py            # generates a tmux bash script

Try the sklearn example right away (no GPU needed). --gpus 0 creates one subprocess pool - no real GPU required, sklearn ignores CUDA_VISIBLE_DEVICES:

pip install -e .                               # install lab-orchestrator
cd examples/sklearn_digits
python launch.py --gpus 0 --dry-run            # preview the 15-task schedule
python launch.py --gpus 0                      # run all 3 models * 5 seeds
python launch.py --gpus 0 --resume             # skip already-finished seeds
python train.py --name svm --seed 42           # run a single model/seed
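
The --resume flag skips work that has already finished. A minimal sketch of such a check follows, assuming each finished run leaves a seed_<seed>.json file in an output directory. The Features section mentions seed_*.json; the directory layout and the `pending_seeds` helper here are hypothetical.

```python
import tempfile
from pathlib import Path


def pending_seeds(out_dir, seeds):
    """Return the seeds that have no seed_<seed>.json file in out_dir yet."""
    out_dir = Path(out_dir)
    return [s for s in seeds if not (out_dir / f"seed_{s}.json").exists()]


# Demo in a throwaway directory: pretend seeds 42 and 43 already finished.
with tempfile.TemporaryDirectory() as tmp:
    for done in (42, 43):
        (Path(tmp) / f"seed_{done}.json").write_text("{}")
    todo = pending_seeds(tmp, [42, 43, 44, 45, 46])

print(todo)  # [44, 45, 46]
```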

Features

LPT scheduling - heavy tasks start first, short ones fill gaps
Load-aware dispatch - each new task goes to the least-loaded GPU
Per-GPU worker pools - overlap CPU-bound work with GPU kernels (default: 1, safe for any model; raise to 2 when your model leaves GPU memory headroom and preprocessing is CPU-bound)
Resume - skip tasks whose seed_*.json already exists
Dry-run - preview the schedule without executing
Progress tracking - ETA with [done/total pct%] in every log line
Subprocess isolation - each task is a separate process with its own CUDA_VISIBLE_DEVICES
tmux generation - alternative to programmatic dispatch
Zero dependencies - stdlib multiprocessing + subprocess only
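
Subprocess isolation can be pictured with the standard library alone. The sketch below launches a child process with its own CUDA_VISIBLE_DEVICES; `run_on_gpu` is a hypothetical helper for illustration, not the library's worker code.

```python
import os
import subprocess
import sys


def run_on_gpu(cmd, gpu_id):
    """Run cmd in a child process whose environment exposes a single GPU."""
    env = os.environ.copy()
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_id)  # CUDA programs see it as device 0
    return subprocess.run(cmd, env=env, capture_output=True, text=True)


# Demo child: it only reports which GPU id it was handed (no GPU needed).
child = [sys.executable, "-c", "import os; print(os.environ['CUDA_VISIBLE_DEVICES'])"]
result = run_on_gpu(child, gpu_id=1)
print(result.stdout.strip())  # 1
```

Because the variable is set per child, the parent's environment is untouched and two tasks on different GPUs cannot interfere through it.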

Architecture

┌──────────────────────────────────────────────┐
│  Main process (dispatcher)                   │
│                                              │
│  task_pool --LPT--->  least-loaded GPU queue │
│                                              │
│  result_queue <- READY / START / DONE msgs   │
├──────────┬──────────┬──────────┬─────────────┤
│ GPU 0    │ GPU 1    │ GPU 2    │ GPU 3       │
│ worker_0 │ worker_0 │ worker_0 │ worker_0    │
│ worker_1 │ worker_1 │ worker_1 │ worker_1    │
│ (queue)  │ (queue)  │ (queue)  │ (queue)     │
└──────────┴──────────┴──────────┴─────────────┘

Workers pull from their GPU-specific queue and run tasks as subprocesses. The dispatcher monitors the result queue, tracks per-GPU active weight, and assigns the next task to the GPU with the smallest total in-flight weight.
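
The dispatch rule can be sketched offline as classic greedy LPT: sort tasks by weight, heaviest first, and hand each one to the GPU with the smallest accumulated weight. This is a simplified static model (one worker per GPU, weights treated as exact durations), not the dispatcher's actual code; `lpt_assign` is a hypothetical name.

```python
import heapq


def lpt_assign(weights, gpu_ids):
    """Greedy LPT: heaviest task first, each to the currently least-loaded GPU."""
    heap = [(0.0, gpu) for gpu in gpu_ids]  # (accumulated weight, gpu id)
    heapq.heapify(heap)
    plan = {gpu: [] for gpu in gpu_ids}
    for task, weight in sorted(weights.items(), key=lambda kv: kv[1], reverse=True):
        load, gpu = heapq.heappop(heap)     # least-loaded GPU so far
        plan[gpu].append(task)
        heapq.heappush(heap, (load + weight, gpu))
    makespan = max(load for load, _ in heap)
    return plan, makespan


tasks = {"big/42": 2.0, "big/43": 2.0, "small/42": 0.5}
plan, makespan = lpt_assign(tasks, gpu_ids=[0, 1])
print(plan)      # {0: ['big/42', 'small/42'], 1: ['big/43']}
print(makespan)  # 2.5
```

Placing the two heavy tasks first lets the short one fill the gap afterwards, which is why even rough weights tend to help.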

Weight calibration

Task weights represent estimated GPU-hours under exclusive, single-worker access to a GPU. If your timings come from runs where N workers shared each GPU, contention inflates them; recover the exclusive time via:

$$T_{\text{exclusive}} = \frac{T_{\text{observed}}}{N^{\alpha}}$$

where α is an empirical contention exponent, roughly 0.4 to 0.5 for typical DL workloads (memory-bandwidth bound).
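
As a worked example of the formula, the sketch below converts an observed runtime into an exclusive-access weight. The helper name `exclusive_hours` and the default α = 0.5 are assumptions chosen for illustration; the 3-hour figure is made up.

```python
def exclusive_hours(observed_hours, n_workers, alpha=0.5):
    """T_exclusive = T_observed / N**alpha."""
    return observed_hours / (n_workers ** alpha)


# A task that took 3.0 h while two workers shared the GPU:
weight = exclusive_hours(3.0, n_workers=2, alpha=0.5)
print(round(weight, 2))  # 2.12
```

With N = 1 the formula leaves the observed time unchanged, so logs from single-worker runs can be used as weights directly.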

License

MIT License (see LICENSE).

Also

Contributions are very welcome! Open an issue or a PR if you have suggestions or want to add features.
