Dagger Julius: Fix Dask JIT -> AOT #18

Merged
merged 1 commit on Jul 15, 2022
news/2022/07/dagger_julius_benchmark.md (2 changes: 1 addition & 1 deletion)
@@ -56,7 +56,7 @@ Anyway, that's enough background. Let's scrutinize this benchmark a bit more, be

Something else that we need to understand about this benchmark is that the four graph schedulers included are not all alike. One of the most important ways to compare schedulers is "visibility": can the scheduler see the entire DAG before it starts executing, or does it only get bits and pieces as execution goes along? This matters because seeing the full DAG makes it easy to perform optimizations on it, such as collapsing repetitive sub-graphs into a `for`-loop (basically undoing the "unrolling" of the graph so that the language's compiler can better optimize the whole sub-graph). On the other hand, constructing the whole graph incurs memory overhead, because the entire graph needs to exist in memory at some point in time; in certain cases, it can be prohibitively expensive just to construct the graph (let alone to actually execute it).
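
To make the "unrolling" point a bit more concrete, here is a minimal sketch (using Dagger's `delayed` API purely for illustration; the `sin`/`+` workload and the element count are invented) of how a mapped computation becomes one graph node per element, which a scheduler with whole-DAG visibility could in principle collapse back into a loop:

```julia
using Dagger

# Fully "unrolled" graph: one node per element plus a final reduction node.
# A scheduler that sees the whole DAG up front holds all 101 nodes in memory,
# but could in principle fuse the repetitive part back into a single loop.
parts = [Dagger.delayed(sin)(i) for i in 1:100]
total = Dagger.delayed(+)(parts...)

# The "re-rolled" equivalent that the language's compiler optimizes well:
# the same arithmetic as one tight loop, with no per-element graph node.
total_rerolled = sum(sin, 1:100)
```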

From what I understand, the JGE requires visibility into the whole DAG before execution begins; the same is also true for TensorFlow. Dask, instead, only sees parts of the DAG, as it is being built. I will term these two modes "ahead-of-time" (AOT) and "just-in-time" (JIT), respectively (these are often also referred to as "static" and "dynamic", respectively). So what does Dagger do? Well, in the benchmark, Dagger is being used in JIT mode, although it also supports an AOT mode. JIT mode (using `@spawn` and `fetch`) is recommended for most users, as it is often easier to use, and doesn't require knowledge of the full graph before execution can begin. However, AOT mode (using `delayed` and `compute`) has the benefit of being very efficient at consuming a fully constructed DAG, and can use less memory at runtime for reasons I won't get into here.
From what I understand, the JGE requires visibility into the whole DAG before execution begins; the same is also true for Dask and TensorFlow. However, it's also possible to design a scheduler that instead only sees parts of the DAG, as it is being built. I will term these two modes "ahead-of-time" (AOT) and "just-in-time" (JIT), respectively (these are often also referred to as "static" and "dynamic", respectively). So what does Dagger do? Well, in the benchmark, Dagger is being used in JIT mode, although it also supports an AOT mode. JIT mode (using `@spawn` and `fetch`) is recommended for most users, as it is often easier to use, doesn't store the full graph in memory all at once, and doesn't require knowledge of the full graph before execution can begin. However, AOT mode (using `delayed` and `compute`) has the benefit of being very efficient at consuming a fully constructed DAG, and can use less memory at runtime for reasons I won't get into here.
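
As a rough illustration of the difference, here is a minimal sketch of the two modes as exposed by Dagger's API (the toy `1 + 2` / `* 3` workload is invented, and the exact calls may differ slightly between Dagger versions):

```julia
using Dagger

# JIT ("dynamic") mode: each task is handed to the scheduler as soon as it is
# created, so the full DAG is never materialized in one place.
a = Dagger.@spawn 1 + 2
b = Dagger.@spawn a * 3
fetch(b)                      # waits for the result (9)

# AOT ("static") mode: `delayed` only records the graph; nothing executes until
# the fully constructed DAG is handed to the scheduler with `compute`.
x = Dagger.delayed(+)(1, 2)
y = Dagger.delayed(*)(x, 3)
collect(Dagger.compute(y))    # runs the whole graph and returns 9
```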

#### Comparison: What graph-building modes are supported?
