6chHenry/DataRecipe

Data Recipe for Causal Data Generation

This project constructs causal graphs from object attributes and affordances, then generates task, emergency, and recovery text from those graphs. It supports automatic object discovery, multiple LLM-assisted causal graph construction strategies, and optional visualization.


1. Quick Start

Environment Setup

conda create -n data-recipe python=3.11
conda activate data-recipe
pip install -r requirements.txt

Common dependencies include:

  • requests
  • networkx
  • matplotlib (optional, for visualization)

2. End-to-End Agent (Recommended)

The end-to-end agent automatically performs:

Object discovery → causal graph construction → text generation

This is the recommended workflow if you do not want to manually specify objects.

2.1 Configure LLM API (Qwen / DashScope)

If you use a cloud LLM, configure the following environment variables:

macOS / Linux (bash)

export QWEN_API_KEY="<your-key>"
export QWEN_API_BASE="https://dashscope.aliyuncs.com/compatible-mode/v1"  # optional
export QWEN_MODEL="qwen2.5-7b-instruct"                                   # optional

Windows (PowerShell)

$env:QWEN_API_KEY="<your-key>"
$env:QWEN_API_BASE="https://dashscope.aliyuncs.com/compatible-mode/v1"    # optional
$env:QWEN_MODEL="qwen2.5-7b-instruct"                                     # optional

Important

  • run_agent.sh and pipeline.py read API credentials only from environment variables.
  • Do NOT hardcode API keys in scripts or source code.
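The environment-only rule above can be illustrated with a small Python sketch. The helper name and error message are hypothetical; only the variable names and defaults come from this README:

```python
import os

def load_qwen_config():
    """Read Qwen/DashScope credentials from the environment only.

    Illustrative helper: keys are never hardcoded in scripts or source,
    so a missing QWEN_API_KEY is treated as a hard error.
    """
    api_key = os.environ.get("QWEN_API_KEY")
    if not api_key:
        raise RuntimeError("QWEN_API_KEY is not set; export it before running.")
    return {
        "api_key": api_key,
        "api_base": os.environ.get(
            "QWEN_API_BASE",
            "https://dashscope.aliyuncs.com/compatible-mode/v1",
        ),
        "model": os.environ.get("QWEN_MODEL", "qwen2.5-7b-instruct"),
    }
```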

2.2 OpenAI-Compatible API Mapping (Automatic)

If USE_API=true is enabled, the system automatically maps available credentials to a unified interface:

Unified variable → source priority (first match wins):

  • CAUSAL_LLM_API_KEY: OPENAI_API_KEY / AZURE_OPENAI_API_KEY / CAUSAL_LLM_API_KEY / QWEN_API_KEY / DASHSCOPE_API_KEY
  • CAUSAL_LLM_API_BASE: CAUSAL_LLM_API_BASE / QWEN_API_BASE / default DashScope endpoint
  • CAUSAL_LLM_MODEL: CAUSAL_LLM_MODEL / QWEN_MODEL / qwen-max-2025-01-25

➡️ Setting only QWEN_API_KEY is sufficient. The pipeline falls back to local or non-API modes only if no valid key is found.
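A minimal sketch of the first-match-wins mapping, using the priority chains from the table above (the helper names are hypothetical, not the pipeline's actual internals):

```python
import os

# Priority chains as listed in the mapping table above.
KEY_SOURCES = ["OPENAI_API_KEY", "AZURE_OPENAI_API_KEY", "CAUSAL_LLM_API_KEY",
               "QWEN_API_KEY", "DASHSCOPE_API_KEY"]
BASE_SOURCES = ["CAUSAL_LLM_API_BASE", "QWEN_API_BASE"]
MODEL_SOURCES = ["CAUSAL_LLM_MODEL", "QWEN_MODEL"]

def first_set(names, env, default=None):
    """Return the value of the first non-empty variable in `names`."""
    for name in names:
        value = env.get(name)
        if value:
            return value
    return default

def unified_llm_env(env=os.environ):
    """Map whatever credentials are available to the unified variables."""
    return {
        "CAUSAL_LLM_API_KEY": first_set(KEY_SOURCES, env),
        "CAUSAL_LLM_API_BASE": first_set(
            BASE_SOURCES, env,
            default="https://dashscope.aliyuncs.com/compatible-mode/v1"),
        "CAUSAL_LLM_MODEL": first_set(
            MODEL_SOURCES, env, default="qwen-max-2025-01-25"),
    }
```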


2.3 Configure run_agent.sh

Key variables (auto-discovery enabled by default):

  • Object Discovery

    • AUTO_DISCOVER=true
    • OBJECT_PROMPT="your task or scenario description"
    • DISCOVER_NUM_OBJECTS
    • DISCOVER_TEMPERATURE
    • DISCOVER_TOP_P
  • Optional Manual Inputs

    • OBJECTS_TEXT_FILE — one object hint per line
    • OBJECTS_JSON — fully structured object definitions
  • Causal Graph Strategy

    • USE_QWEN_API_GRAPH — one-shot graph generation
    • MICRO_FIRST — micro-graph incremental stitching
    • USE_API — use OpenAI-compatible API interface

2.4 Run the Agent

bash run_agent.sh

The pipeline performs:

  1. Automatic object discovery
  2. Quality filtering and deduplication
  3. Causal graph construction (Qwen one-shot / micro-graph stitching / legacy mode)
  4. Task case generation

Output (default):

output/task_case.json

3. Visualization

You can visualize the generated causal graph as a DOT or PNG file:

python visualize_logic_graph.py \
  --task_json output/task_case.json \
  --dot output/logic_graph.dot \
  --png output/logic_graph.png

4. Batch Run Multiple Scenarios (New)

Use batch_run.py to process a scenarios file (JSONL or JSON array) and generate a TaskCase JSON per scenario. Filenames are auto-generated from the scenario/task name for clarity.

Scenario file format (recommended: JSONL, one scenario per line)

{"task_id":"T001","task_name":"Kitchen prep","object_prompt":"Prepare a sandwich and drink","discover_num_objects":4,"use_qwen_api_graph":true,"micro_first":true,"use_api":true}
{"task_id":"T002","task_name":"Office cleaning","object_prompt":"Clean a small office space","discover_num_objects":3,"micro_first":true,"use_api":true}

Also supported: JSON array [{...},{...}].

Field hints (all optional unless noted):

  • task_id / task_name: used in TaskCase metadata and output filename slug.
  • object_prompt: description for LLM discovery (if objects/objects_text are absent).
  • discover_num_objects, discover_temperature, discover_top_p: sampling for object discovery.
  • use_qwen_api_graph, micro_first, use_api: graph strategy and API toggle (Qwen keys auto-mapped to OpenAI-compatible envs).
  • objects: structured objects list (category/name/attributes/affordances/logic_graph) to bypass LLM discovery.
  • objects_text: plain text list; each entry becomes an object category/name.

Run example

python batch_run.py \
  --scenarios_file scenarios.jsonl \
  --output_dir output/batch \
  --use_api \
  --micro_first

Output: one JSON per scenario, named slug(task_name)_taskId.json (e.g., kitchen_prep_T001.json) in output/batch (override with --output_dir).
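The slug rule can be approximated as "lowercase, runs of non-alphanumerics collapsed to underscores"; this reproduces the kitchen_prep_T001.json example above, though the exact rule in batch_run.py may differ:

```python
import re

def output_filename(task_name, task_id):
    """Illustrative slug: lowercase, non-alphanumeric runs -> underscore."""
    slug = re.sub(r"[^a-z0-9]+", "_", task_name.lower()).strip("_")
    return f"{slug}_{task_id}.json"
```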


5. One-Shot Causal Graph via Qwen API

You can directly let Qwen generate the entire causal graph in a single API call, then run downstream text generation.

Required Environment Variables

  • QWEN_API_KEY or DASHSCOPE_API_KEY
  • QWEN_API_BASE (optional)
  • QWEN_MODEL (optional)

Run

python pipeline.py \
  --use_qwen_api_graph \
  --qwen_temperature 0.4 \
  --qwen_top_p 0.9 \
  --task_id T001 \
  --task_name DemoTask

Notes

  • Existing edges defined in each CausalObject are preserved.
  • New edges are merged and filtered to known nodes only.
  • If the API call fails or returns an empty graph, the system automatically falls back to other enabled strategies.
  • Optional post-verification by Qwen can add cross-object edges for improved global consistency.
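The merge-and-filter rule in the first two notes can be sketched as follows (the function is hypothetical, but the policy matches the notes: existing edges are kept, and new edges are admitted only between known nodes):

```python
def merge_edges(existing, proposed, known_nodes):
    """Keep existing edges, then add proposed edges whose endpoints are known.

    `existing` and `proposed` are lists of (src, dst) pairs; duplicates
    are dropped so existing edges are never overwritten.
    """
    merged = list(existing)
    seen = set(existing)
    for src, dst in proposed:
        if src in known_nodes and dst in known_nodes and (src, dst) not in seen:
            merged.append((src, dst))
            seen.add((src, dst))
    return merged
```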

6. Micro-Graph Stitching (Multi-Step LLM Assistance)

This mode builds a global DAG by stitching together small, high-confidence causal subgraphs.

python pipeline.py \
  --micro_first \
  --micro_group_size 3 \
  --micro_groups_per_object 6 \
  --micro_min_confidence 0.6

  • Each micro-graph contains 2–3 nodes
  • Micro-graphs are cached at database/micro_graphs.json

7. Legacy Full Causal Discovery

To use the original full discovery algorithm:

  • Do not enable --micro_first
  • Do not enable --use_qwen_api_graph

python pipeline.py --use_api

This mode is driven by either a local LLM or an API backend, depending on configuration.


8. Output Summary

  • task_case.json: final task/emergency/recovery representation
  • logic_graph.dot: causal graph in DOT format
  • logic_graph.png: rendered causal graph
