# 03 - DAG Execution: 5x Faster Workflows

We've solved context explosion. Now let's solve **latency**.

The problem: MCP tools execute sequentially, even when they don't depend on each other.

The solution: Build a **Directed Acyclic Graph (DAG)** and parallelize independent branches.

## Learning Objectives

After this notebook, you will:

- [ ] Understand what a DAG is and why it matters
- [ ] See how dependencies determine execution order
- [ ] Measure the speedup from parallel execution (about 1.4x for the two-branch example here; 3-5x requires workflows with more independent branches)

---

## What is a DAG?

A **Directed Acyclic Graph** is a structure where:

- **Directed**: Edges have direction (A → B means A must finish before B)
- **Acyclic**: No cycles (you can't go A → B → C → A)

```
Example: Research & Report Workflow

     ┌─── fetch_github ──┐
     │                   │
read_config              ├─── combine_results ─── create_report
     │                   │
     └─── search_slack ──┘

• read_config has no dependencies (starts first)
• fetch_github and search_slack both depend on read_config
• fetch_github and search_slack are INDEPENDENT (can run in parallel!)
• combine_results depends on BOTH
• create_report depends on combine_results
```
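
The dependency rules in the diagram can be written as plain data and checked for the "acyclic" property. The following is a minimal sketch, not the project's `DAGBuilder`: the `TaskNode` shape and the `hasCycle` helper are hypothetical names introduced for illustration, and the check is a standard depth-first search with three visit states.

```typescript
// Hypothetical minimal task shape: only the fields needed for dependency checks.
interface TaskNode {
  id: string;
  dependsOn: string[];
}

// Returns true if following dependsOn edges leads back to a task
// that is still on the current search path (i.e. the graph has a cycle).
function hasCycle(tasks: TaskNode[]): boolean {
  const deps: Record<string, string[]> = {};
  tasks.forEach((t) => {
    deps[t.id] = t.dependsOn;
  });

  // undefined = unvisited, 1 = on the current DFS path, 2 = fully explored
  const state: Record<string, number> = {};

  const visit = (id: string): boolean => {
    if (state[id] === 1) return true; // back edge: cycle found
    if (state[id] === 2) return false; // already explored, no cycle through here
    state[id] = 1;
    for (const dep of deps[id] ?? []) {
      if (visit(dep)) return true;
    }
    state[id] = 2;
    return false;
  };

  return tasks.some((t) => visit(t.id));
}

// The Research & Report workflow from the diagram above.
const workflow: TaskNode[] = [
  { id: "read_config", dependsOn: [] },
  { id: "fetch_github", dependsOn: ["read_config"] },
  { id: "search_slack", dependsOn: ["read_config"] },
  { id: "combine_results", dependsOn: ["fetch_github", "search_slack"] },
  { id: "create_report", dependsOn: ["combine_results"] },
];

// A → B → C → A, the forbidden shape.
const cyclic: TaskNode[] = [
  { id: "A", dependsOn: ["C"] },
  { id: "B", dependsOn: ["A"] },
  { id: "C", dependsOn: ["B"] },
];

console.log(hasCycle(workflow)); // false
console.log(hasCycle(cyclic)); // true
```

A cycle matters for execution because no task in it can ever become ready: each one waits on another member of the loop.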

## Building a DAG

Let's create a workflow and visualize its structure:

In [1]:
// Import the DAG builder
const { DAGBuilder } = await import("../../src/dag/builder.ts");

// Define a workflow with dependencies
const tasks = [
  {
    id: "read_config",
    tool: "filesystem:read_file",
    args: { path: "./config.json" },
    dependsOn: [],
  },
  {
    id: "fetch_github",
    tool: "github:list_issues",
    args: { repo: "$read_config.repo" },
    dependsOn: ["read_config"],
  },
  {
    id: "search_slack",
    tool: "slack:search_messages",
    args: { query: "$read_config.project" },
    dependsOn: ["read_config"],
  },
  {
    id: "combine_results",
    tool: "utils:merge_data",
    args: { github: "$fetch_github", slack: "$search_slack" },
    dependsOn: ["fetch_github", "search_slack"],
  },
  {
    id: "create_report",
    tool: "notion:create_page",
    args: { content: "$combine_results" },
    dependsOn: ["combine_results"],
  },
];

// Build the DAG
const dag = DAGBuilder.fromTasks(tasks);

console.log("DAG Structure:");
console.log(dag.toMermaid());


## Understanding Layers

The DAG executor groups tasks into **layers**:

- All tasks in a layer can run in **parallel**
- Layers execute **sequentially** (layer 2 waits for layer 1)
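
One way to derive such layers is to repeatedly collect every task whose dependencies have all finished in earlier layers. The sketch below illustrates that idea under stated assumptions: `LayerTask` and `computeLayersSketch` are hypothetical names, and this is not necessarily how `dag.computeLayers()` is implemented.

```typescript
// Hypothetical task shape; mirrors the id/dependsOn fields used in this notebook.
interface LayerTask {
  id: string;
  dependsOn: string[];
}

// Groups task ids into layers: a task joins the first layer
// in which all of its dependencies are already done.
function computeLayersSketch(tasks: LayerTask[]): string[][] {
  const done: Record<string, boolean> = {};
  let remaining = tasks.slice();
  const result: string[][] = [];

  while (remaining.length > 0) {
    // Ready = every dependency finished in an earlier layer.
    const ready = remaining.filter((t) =>
      t.dependsOn.every((d) => done[d] === true)
    );
    if (ready.length === 0) {
      throw new Error("Cycle or missing dependency: no task is ready to run");
    }
    ready.forEach((t) => {
      done[t.id] = true;
    });
    result.push(ready.map((t) => t.id));
    remaining = remaining.filter((t) => done[t.id] !== true);
  }
  return result;
}

const exampleLayers = computeLayersSketch([
  { id: "read_config", dependsOn: [] },
  { id: "fetch_github", dependsOn: ["read_config"] },
  { id: "search_slack", dependsOn: ["read_config"] },
  { id: "combine_results", dependsOn: ["fetch_github", "search_slack"] },
  { id: "create_report", dependsOn: ["combine_results"] },
]);

console.log(JSON.stringify(exampleLayers));
// [["read_config"],["fetch_github","search_slack"],["combine_results"],["create_report"]]
```

Only the second layer holds more than one task, so it is the only place where this workflow can gain time from parallelism.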

In [None]:
// Compute execution layers
const layers = dag.computeLayers();

console.log("Execution Layers:\n" + "=".repeat(50));
layers.forEach((layer, i) => {
  const parallel = layer.length > 1 ? " [PARALLEL]" : "";
  console.log(`\nLayer ${i + 1}${parallel}:`);
  layer.forEach((taskId) => {
    const task = tasks.find((t) => t.id === taskId);
    console.log(`  • ${taskId} (${task?.tool})`);
  });
});

## Parallel vs Sequential Execution

Let's measure the difference:

In [None]:
// Simulated task durations (ms)
const taskDurations: Record<string, number> = {
  "read_config": 150,
  "fetch_github": 800,
  "search_slack": 600,
  "combine_results": 100,
  "create_report": 400,
};

// Sequential execution time
const sequentialTime = Object.values(taskDurations).reduce((a, b) => a + b, 0);

// Parallel execution time (sum of max duration per layer)
let parallelTime = 0;
for (const layer of layers) {
  const maxInLayer = Math.max(...layer.map((id) => taskDurations[id]));
  parallelTime += maxInLayer;
}

console.log("Execution Time Comparison\n" + "=".repeat(50));
console.log();
console.log("Sequential (traditional MCP):");
for (const [task, duration] of Object.entries(taskDurations)) {
  console.log(`  ${task.padEnd(20)} ${duration}ms`);
}
console.log(`  ${"".padEnd(20)} ─────`);
console.log(`  ${"TOTAL".padEnd(20)} ${sequentialTime}ms`);

console.log();
console.log("Parallel (with DAG):");
layers.forEach((layer, i) => {
  const durations = layer.map((id) => taskDurations[id]);
  const maxDuration = Math.max(...durations);
  const layerStr = layer.join(" + ");
  console.log(`  Layer ${i + 1}: ${layerStr.padEnd(35)} max=${maxDuration}ms`);
});
console.log(`  ${"".padEnd(43)} ─────`);
console.log(`  ${"TOTAL".padEnd(43)} ${parallelTime}ms`);

console.log();
const speedup = (sequentialTime / parallelTime).toFixed(1);
const saved = sequentialTime - parallelTime;
console.log(`üöÄ Speedup: ${speedup}x (saved ${saved}ms)`);

## Real Execution with ParallelExecutor

Let's run an actual parallel workflow:

In [None]:
// Import the parallel executor
const { ParallelExecutor } = await import("../../src/dag/executor.ts");

// Create mock task functions (simulate network latency)
const mockTasks = {
  read_config: async () => {
    await new Promise((r) => setTimeout(r, 150));
    return { repo: "casys/mcp-gateway", project: "casys" };
  },
  fetch_github: async (config: any) => {
    await new Promise((r) => setTimeout(r, 800));
    return [{ id: 1, title: "Bug fix" }, { id: 2, title: "Feature request" }];
  },
  search_slack: async (config: any) => {
    await new Promise((r) => setTimeout(r, 600));
    return [{ channel: "#dev", message: "Discussed feature X" }];
  },
  combine_results: async (github: any, slack: any) => {
    await new Promise((r) => setTimeout(r, 100));
    return { issues: github, discussions: slack };
  },
  create_report: async (data: any) => {
    await new Promise((r) => setTimeout(r, 400));
    return { pageId: "abc123", url: "https://notion.so/report" };
  },
};

// Execute with timing
console.log("Executing DAG with ParallelExecutor...\n");

const executor = new ParallelExecutor(dag);
const startTime = Date.now();

// Stream events as tasks complete
executor.on("taskStart", (taskId: string) => {
  console.log(`  ▶ Started: ${taskId}`);
});

executor.on("taskComplete", (taskId: string, result: any) => {
  const elapsed = Date.now() - startTime;
  console.log(`  ✓ Completed: ${taskId} (${elapsed}ms)`);
});

const results = await executor.execute(mockTasks);
const totalTime = Date.now() - startTime;

console.log();
console.log(`Total execution time: ${totalTime}ms`);
console.log(`Expected sequential:  ${sequentialTime}ms`);
console.log(`Actual speedup:       ${(sequentialTime / totalTime).toFixed(1)}x`);
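
If the executor module is not available in your environment, the core idea can still be tried with nothing but `Promise.all`. The following is a self-contained sketch, not the `ParallelExecutor` API: `TaskFn`, `runLayers`, and the `onComplete` callback are hypothetical names, and the delays are the simulated durations above divided by ten so the demo finishes quickly.

```typescript
// Hypothetical types for this sketch; each task receives all results produced so far.
type TaskFn = (inputs: Record<string, unknown>) => Promise<unknown>;

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Runs layers one after another; tasks inside a layer run concurrently via Promise.all.
async function runLayers(
  layerPlan: string[][],
  fns: Record<string, TaskFn>,
  onComplete: (taskId: string) => void,
): Promise<Record<string, unknown>> {
  const results: Record<string, unknown> = {};
  for (const layer of layerPlan) {
    await Promise.all(
      layer.map(async (taskId) => {
        const fn = fns[taskId];
        if (fn === undefined) throw new Error(`No function for task ${taskId}`);
        results[taskId] = await fn(results);
        onComplete(taskId);
      }),
    );
  }
  return results;
}

const demoLayers = [
  ["read_config"],
  ["fetch_github", "search_slack"],
  ["combine_results"],
  ["create_report"],
];

const demoFns: Record<string, TaskFn> = {
  read_config: async () => { await sleep(15); return { repo: "casys/mcp-gateway" }; },
  fetch_github: async () => { await sleep(80); return ["issue 1", "issue 2"]; },
  search_slack: async () => { await sleep(60); return ["message 1"]; },
  combine_results: async (r) => {
    await sleep(10);
    return { github: r["fetch_github"], slack: r["search_slack"] };
  },
  create_report: async () => { await sleep(40); return { pageId: "abc123" }; },
};

const completionOrder: string[] = [];
const demoRun = runLayers(demoLayers, demoFns, (id) => completionOrder.push(id));

demoRun.then(() => console.log(completionOrder.join(" -> ")));
```

Because `search_slack` has the shorter delay, it should report completion before `fetch_github` even though both start at the same moment; `combine_results` only starts once both are done.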

## SSE Streaming

Results stream to the client as they complete via Server-Sent Events:

```
Timeline:

0ms      150ms    750ms    950ms    1050ms   1450ms
│        │        │        │        │        │
▼        ▼        ▼        ▼        ▼        ▼
[config] [slack]  [github] [combine][report] [done]
         ↑        ↑
         └────────┴── Running in PARALLEL

The client sees:
  event: task_complete
  data: {"task": "read_config", "result": {...}}
  
  event: task_complete
  data: {"task": "search_slack", "result": [...]}
  
  ... (as they finish, not all at once)
```
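
On the wire, each Server-Sent Events message is a few `field: value` lines followed by a blank line. The sketch below shows how a `task_complete` message like the ones above could be serialized. `formatSseEvent` is a hypothetical helper, not part of the gateway, and it assumes the payload is JSON-serializable.

```typescript
// Formats one SSE message: an "event:" line, a "data:" line, then a blank line.
// JSON.stringify without indentation produces a single line, so one data line is enough.
function formatSseEvent(eventName: string, payload: unknown): string {
  const json = JSON.stringify(payload);
  return `event: ${eventName}\ndata: ${json}\n\n`;
}

const sseMessage = formatSseEvent("task_complete", {
  task: "read_config",
  result: { repo: "casys/mcp-gateway" },
});

console.log(sseMessage);
// event: task_complete
// data: {"task":"read_config","result":{"repo":"casys/mcp-gateway"}}
```

In a browser, a client would typically subscribe with `EventSource` and call `addEventListener("task_complete", ...)` to receive each message as soon as the server writes it.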

---

## Quick Check

Before moving on:

1. **What makes tasks parallelizable?**
   - They don't depend on each other's outputs

2. **What is a layer in DAG execution?**
   - A group of tasks that can run simultaneously

3. **Why use SSE streaming?**
   - To show results as they complete, not wait for all to finish

---

**Next:** [04-sandbox-security.ipynb](./04-sandbox-security.ipynb) - Execute code safely with
resource limits