# Data-Juicer-Agents v0.1: Recipe iteration example

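This notebook demonstrates a multi-round workflow:

1. Generate the initial plan
2. Execute it (dry-run or run)
3. Retrieve context from the run
4. Revise the plan using `--base-plan` + `--from-run-id`
5. Execute again and inspect the aggregated trace
6. Optional: run an offline evaluation with `evaluate`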

## 0. Environment setup

- Run this notebook from the repository root.
- `djx` must already be installed (`uv pip install -e .`).
- If you use LLM planning, set `DASHSCOPE_API_KEY` first.
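
The prerequisites above can be checked before running any cell. The following is a minimal sketch, assuming `djx` is resolved through `PATH`; `check_prerequisites` is a hypothetical helper written for this notebook, not part of Data-Juicer-Agents.

```python
import os
import shutil


def check_prerequisites(env=None):
    """Return a list of human-readable problems; an empty list means all good."""
    env = os.environ if env is None else env
    problems = []
    # shutil.which returns None when the executable is not found on the given PATH.
    if shutil.which("djx", path=env.get("PATH")) is None:
        problems.append("djx not found on PATH (try: uv pip install -e .)")
    # The key is only needed when planning with an LLM.
    if not env.get("DASHSCOPE_API_KEY"):
        problems.append("DASHSCOPE_API_KEY is not set (required for LLM planning)")
    return problems


for problem in check_prerequisites():
    print("warning:", problem)
```

The helper only reports problems and never raises, so a missing API key does not block a run that skips LLM planning.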

In [None]:
import json
import os
import pathlib
import subprocess

ROOT = pathlib.Path.cwd()
print("cwd:", ROOT)

# Optional: set the API key (preferably through your own environment-variable management)
# os.environ["DASHSCOPE_API_KEY"] = "<your-key>"

## 1. Generate the initial plan

In [None]:
cmd = [
    "djx", "plan", "clean rag corpus for retrieval",
    "--dataset", "data/demo-dataset.jsonl",
    "--export", "tmp.jsonl",
    "--output", "tmp.yaml",
]
print("$", " ".join(cmd))
subprocess.run(cmd, check=True)

print("\nGenerated plan:\n")
print(pathlib.Path("tmp.yaml").read_text(encoding="utf-8"))

## 2. Execute the plan and read the run_id

In [None]:
apply_cmd = ["djx", "apply", "--plan", "tmp.yaml", "--yes", "--dry-run"]
print("$", " ".join(apply_cmd))
subprocess.run(apply_cmd, check=True)

runs_file = pathlib.Path(".djx/runs.jsonl")
last_run = json.loads(runs_file.read_text(encoding="utf-8").strip().splitlines()[-1])
run_id = last_run["run_id"]
plan_id = last_run["plan_id"]
print("run_id:", run_id)
print("plan_id:", plan_id)
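
The cell above takes the last line of `.djx/runs.jsonl` as the run that was just executed, which raises an `IndexError` if the file is empty. A slightly more defensive reader might look like the sketch below; `read_last_run` is a hypothetical helper, and it still assumes that the newest run is the last record in the file.

```python
import json
import pathlib


def read_last_run(runs_path):
    """Return the last non-empty JSON record in a JSONL file."""
    lines = pathlib.Path(runs_path).read_text(encoding="utf-8").splitlines()
    # Ignore blank lines so a trailing newline or stray gap does not matter.
    records = [line for line in lines if line.strip()]
    if not records:
        raise ValueError(f"no run records found in {runs_path}")
    return json.loads(records[-1])
```

Usage would be `last_run = read_last_run(".djx/runs.jsonl")`, followed by the same `run_id` and `plan_id` lookups as above.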

## 3. Inspect the trace (single run + aggregate)

In [None]:
subprocess.run(["djx", "trace", run_id], check=True)
subprocess.run(["djx", "trace", "--stats", "--plan-id", plan_id], check=True)
subprocess.run(["djx", "trace", "--plan-id", plan_id, "--limit", "10"], check=True)

## 4. Revise the plan from the base plan + run context

Note: this step does not introduce a separate feedback parameter; the new intent is used directly.

In [None]:
rev_cmd = [
    "djx", "plan", "tighten the deduplication strategy based on the previous run's results",
    "--base-plan", "tmp.yaml",
    "--from-run-id", run_id,
    "--output", "tmp-v2.yaml",
]
print("$", " ".join(rev_cmd))
subprocess.run(rev_cmd, check=True)

print("\nRevised plan:\n")
print(pathlib.Path("tmp-v2.yaml").read_text(encoding="utf-8"))
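
Printing the revised plan in full makes it hard to see what the revision changed. As a small sketch, the standard-library `difflib` module can show only the differences between `tmp.yaml` and `tmp-v2.yaml`; `plan_diff` is a hypothetical helper, and it compares the files as plain text without interpreting the YAML.

```python
import difflib
import pathlib


def plan_diff(old_text, new_text, old_name="tmp.yaml", new_name="tmp-v2.yaml"):
    """Return a unified diff between two plan texts as a single string."""
    diff = difflib.unified_diff(
        old_text.splitlines(keepends=True),
        new_text.splitlines(keepends=True),
        fromfile=old_name,
        tofile=new_name,
    )
    return "".join(diff)


old_path, new_path = pathlib.Path("tmp.yaml"), pathlib.Path("tmp-v2.yaml")
# Only print a diff when both plans exist in the working directory.
if old_path.exists() and new_path.exists():
    print(plan_diff(old_path.read_text(encoding="utf-8"),
                    new_path.read_text(encoding="utf-8")))
```

An empty result means the two plan files are textually identical.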

## 5. Execute the revised plan and observe the plan lineage

In [None]:
subprocess.run(["djx", "apply", "--plan", "tmp-v2.yaml", "--yes", "--dry-run"], check=True)

runs_text = pathlib.Path(".djx/runs.jsonl").read_text(encoding="utf-8")
runs = [json.loads(line) for line in runs_text.strip().splitlines()]
new_run = runs[-1]
new_run_id = new_run["run_id"]
new_plan_id = new_run["plan_id"]
print("new_run_id:", new_run_id)
print("new_plan_id:", new_plan_id)

subprocess.run(["djx", "trace", "--stats", "--plan-id", new_plan_id], check=True)
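
To see how runs map onto plans without going through `djx trace`, the run log itself can be grouped by plan. This is a sketch only: `runs_by_plan` is a hypothetical helper, and it assumes every record in `.djx/runs.jsonl` carries the `run_id` and `plan_id` fields read in the cells above.

```python
import collections
import json
import pathlib


def runs_by_plan(jsonl_text):
    """Map each plan_id to the run_ids recorded for it, in file order."""
    grouped = collections.defaultdict(list)
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        grouped[record["plan_id"]].append(record["run_id"])
    return dict(grouped)


runs_file = pathlib.Path(".djx/runs.jsonl")
# Only read the log when it exists in the working directory.
if runs_file.exists():
    grouped = runs_by_plan(runs_file.read_text(encoding="utf-8"))
    for plan, run_ids in grouped.items():
        print(plan, "->", run_ids)
```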

## 6. Optional: offline evaluation

Planning-only evaluation:

In [None]:
eval_cmd = [
    "djx", "evaluate",
    "--cases", "eval_cases/v0.1_baseline.jsonl",
    "--execute", "none",
    "--no-llm",
]
print("$", " ".join(eval_cmd))
subprocess.run(eval_cmd, check=True)

## 7. Cleanup (optional)

Delete the temporary files as needed: `tmp.yaml`, `tmp-v2.yaml`, `tmp.jsonl`.
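
The cleanup can also be scripted. The sketch below uses a hypothetical `remove_temp_files` helper that deletes only the files that exist, since `tmp.jsonl` may be absent after a dry-run.

```python
import pathlib


def remove_temp_files(names, root="."):
    """Delete the named files under root; return the names actually removed."""
    removed = []
    for name in names:
        path = pathlib.Path(root) / name
        # Skip anything that is missing or is not a regular file.
        if path.is_file():
            path.unlink()
            removed.append(name)
    return removed


print("removed:", remove_temp_files(["tmp.yaml", "tmp-v2.yaml", "tmp.jsonl"]))
```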