A collection of workflow task graphs for benchmarking and evaluating task scheduling algorithms in heterogeneous computing environments.
Each top-level directory holds one scientific workflow application. Inside it, each subdirectory holds one graph size and is named `<N>Nodes`, where N is the number of nodes (tasks) in the workflow graph.
| Workflow | Description | Available Sizes (nodes) |
|---|---|---|
| FFT | Fast Fourier Transform workflows | 40, 96, 224, 512, 1152 |
| GE | Gene Expression analysis workflows | 9, 35, 135, 527, 2079 |
| Genome | Genome sequencing workflows | 50 – 1000 (in increments of 50–100) |
| LA | Linear Algebra workflows | 11, 37, 137, 529, 2081 |
| LIGO | Laser Interferometer Gravitational-Wave Observatory workflows | 50 – 1000 (in increments of 50–100) |
| Montage | Astronomical image mosaicking workflows | 50 – 1000 (in increments of 50–100) |
```text
Dataset/
├── FFT/
│   ├── 40Nodes/
│   ├── 96Nodes/
│   ├── 224Nodes/
│   ├── 512Nodes/
│   └── 1152Nodes/
├── GE/
│   ├── 9Nodes/
│   ├── 35Nodes/
│   ├── 135Nodes/
│   ├── 527Nodes/
│   └── 2079Nodes/
├── Genome/
│   ├── 50Nodes/
│   ├── 100Nodes/
│   ├── ...
│   └── 1000Nodes/
├── LA/
│   ├── 11Nodes/
│   ├── 37Nodes/
│   ├── 137Nodes/
│   ├── 529Nodes/
│   └── 2081Nodes/
├── LIGO/
│   ├── 50Nodes/
│   ├── 100Nodes/
│   ├── ...
│   └── 1000Nodes/
└── Montage/
    ├── 50Nodes/
    ├── 100Nodes/
    ├── ...
    └── 1000Nodes/
```
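The size directories in the tree above follow the `<N>Nodes` naming convention, so you can discover the available workflows and sizes programmatically. The following is a minimal sketch that assumes only this convention. `index_dataset` is a hypothetical helper, and it does not read the graph files themselves because their format is not described here.

```python
import re
from pathlib import Path

# Hypothetical helper: assumes size directories are named "<N>Nodes".
SIZE_DIR = re.compile(r"^(\d+)Nodes$")


def index_dataset(root):
    """Map each workflow directory name to its sorted list of node counts."""
    index = {}
    for workflow_dir in sorted(Path(root).iterdir()):
        if not workflow_dir.is_dir():
            continue  # skip loose files such as a README
        sizes = []
        for size_dir in workflow_dir.iterdir():
            match = SIZE_DIR.match(size_dir.name)
            # Ignore anything that does not follow the "<N>Nodes" convention.
            if size_dir.is_dir() and match:
                sizes.append(int(match.group(1)))
        if sizes:
            index[workflow_dir.name] = sorted(sizes)
    return index
```

For example, `index_dataset("Dataset")` should return a mapping such as `{"FFT": [40, 96, 224, 512, 1152], ...}`. The node counts come back in ascending order, which makes them easy to use for size sweeps.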
These datasets can be used to evaluate scheduling algorithms on metrics such as:
- Makespan: total execution time of the workflow
- Resource utilization: CPU/memory efficiency across cloud instances
- Cost: monetary cost of cloud resource allocation
- Scalability: algorithm performance as workflow size increases
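The first three metrics can be computed once a scheduler has produced a schedule. The sketch below makes several assumptions. The `Assignment` record (task, processor, start and finish in seconds) is hypothetical. Utilization is measured by time only (busy time divided by processor count times makespan), not by CPU or memory. The cost model bills each used processor at an assumed hourly price, from its first task start to its last task finish.

```python
from dataclasses import dataclass


@dataclass
class Assignment:
    """Hypothetical schedule entry: one task placed on one processor."""
    task: str
    processor: str
    start: float   # seconds
    finish: float  # seconds


def makespan(schedule):
    """Time from the earliest task start to the latest task finish."""
    return max(a.finish for a in schedule) - min(a.start for a in schedule)


def utilization(schedule, num_processors):
    """Total busy time divided by (processor count x makespan); time-based only."""
    busy = sum(a.finish - a.start for a in schedule)
    return busy / (num_processors * makespan(schedule))


def cost(schedule, hourly_price):
    """Assumed pricing: each used processor is billed pro rata per second
    from its first task start to its last task finish."""
    total = 0.0
    for proc, price in hourly_price.items():
        tasks = [a for a in schedule if a.processor == proc]
        if tasks:
            lease = max(a.finish for a in tasks) - min(a.start for a in tasks)
            total += price * lease / 3600.0
    return total
```

Here is a worked example. Take a schedule with `t1` on `vm1` over [0, 10], `t2` on `vm2` over [0, 20] and `t3` on `vm1` over [20, 30]. The makespan is 30 s and the busy time is 40 s, so utilization on 2 processors is 40 / 60 ≈ 0.67.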
The range of node counts within each workflow supports scalability testing, from small graphs (9–50 nodes) up to large-scale graphs (1000–2081 nodes).
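One way to summarize scalability across these sizes is to fit the slope of log(runtime) against log(node count). A slope near 1 suggests roughly linear growth, and a slope near 2 suggests roughly quadratic growth. This is a minimal sketch of that least-squares fit. `scaling_exponent` is a hypothetical helper, and the runtimes you feed it would come from your own measurements.

```python
import math


def scaling_exponent(sizes, runtimes):
    """Least-squares slope of log(runtime) versus log(size)."""
    if len(sizes) != len(runtimes) or len(sizes) < 2:
        raise ValueError("need at least two (size, runtime) pairs")
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in runtimes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    if var == 0:
        raise ValueError("sizes must not all be equal")
    return cov / var
```

For instance, with the GE sizes `[9, 35, 135, 527, 2079]` and runtimes that grow exactly as the square of the size, the function returns 2 (up to floating-point error).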