Skip to content

assuncaomarcos/daggen

Repository files navigation

daggen

A Python port of the daggen tool proposed by Suter & Hunold.

This library generates random, synthetic task graphs for simulation. You can easily convert the generated task graphs into directed acyclic graphs (DAGs) in tools such as NetworkX or igraph, or export them to files. The library helps evaluate scheduling algorithms using various application configurations.

Here is an example of how to generate a DAG of 15 tasks, accepting the default parameters, and import it into igraph:

import daggen as dg
import igraph as ig

dag = dg.DAG(seed=42, num_tasks=15)
tasks, edges = dag.task_n_edge_dicts()
igraph_dag = ig.Graph.DictList(vertices=tasks, edges=edges, directed=True)

You can alternatively convert the task graph into a NetworkX' DiGraph:

import daggen as dg
import networkx as nx

dag = dag.DAG(seed=46, num_tasks=15)
tasks, edges = dag.task_n_edge_tuples()
net_dag = nx.DiGraph()
net_dag.add_nodes_from(tasks)
net_dag.add_edges_from(edges)

Task and edge attributes

Each generated task has two attributes:

  • computation: a sequential cost (in Flops).
  • alpha: the alpha parameter of Amdahl's Law, used to encode the overhead of parallelizing tasks in parallel task graphs.

Each edge represents a communication from a parent task to a child task and has a data attribute representing the amount of data transferred from parent to child.

DAG parameters

You can configure the characteristics of the generated DAGs by parametrizing the DAG() function. Following the C implementation, one can set the following parameters:

  • `seed`: used to seed the random number generator and ensure reproducibility.
  • `num_tasks`: Number of computation nodes (application tasks) in the DAG.
  • `min_data`: Minimum amount of data in bytes a task processes.
  • `max_data`: Maximum amount of data in bytes a task processes.
  • `min_alpha`: Minimum value for the extra parameter (e.g., Amdahl's law parameter).
  • `max_alpha`: Minimum value for the extra parameter.
  • `fat`: Width of the DAG, the maximum number of tasks executed concurrently. A small value results in a thin DAG. (e.g., chain) with low task parallelism, while a large value creates a fat DAG (e.g., fork-join). with a high degree of parallelism.
  • `density`: Determines the dependencies between tasks of two consecutive DAG levels.
  • `regular`: Regularity of the task distribution between the different levels of the DAG.
  • `ccr`: Communication to computation ratio. It encodes the complexity of the computation of a task depending on the size n of the dataset it processes. The encoding is as follows:
    • 1 : a * n (a is a constant picked randomly between 26 and 29).
    • 2 : a * n log n
    • 3 : n3/2
    • 0 : Random choice among the three complexities.
  • `jump_size`: Maximum number of levels spanned by inter-task communications, which enables DAGs with execution paths of multiple lengths.

Examples on Google Colab

A couple examples are available on this Colab notebook:

Open in Colab