Tree-AutoResearch

An AutoResearch framework for autonomous ML/DL tree-based iterative research with Git worktree support.

License: MIT

Overview

Tree-AutoResearch is a framework that enables AI agents to conduct autonomous ML/DL research. It manages experiment iterations in a tree structure where each node represents a research step with commands, results, and reproducible git state.

Inspired by karpathy/autoresearch — the pioneering project that showed how AI agents can autonomously experiment, iterate, and improve models overnight. Tree-AutoResearch extends this vision with a tree-based approach for more systematic exploration.

What's Different from karpathy/autoresearch?

Feature       karpathy/autoresearch          Tree-AutoResearch
Structure     Linear (one branch at a time)  Tree (parent-child relationships)
Exploration   Sequential                     Parallel branches via Git worktrees

Key Innovation: Instead of a single linear progression, Tree-AutoResearch explores multiple research directions simultaneously. Failed branches can be pruned while promising ones are expanded — just like a real research process.

Who Runs the Experiments?

The framework supports two modes depending on project complexity:

Simple Projects: Agent Runs Everything

For fast, local experiments (small models, quick training, single GPU):

Agent can run experiments directly:
  • cd .worktrees/<node_id>
  • python train.py
  • Collect results
  • Record and continue

The AutoResearch loop runs autonomously without user intervention.
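The run-and-collect step above can be sketched in Python (a minimal illustration, not the framework's actual code; the `metric=<value>` output convention is an assumption for the sketch, real projects would parse whatever train.py actually logs):

```python
import subprocess
import sys

def run_experiment(argv, cwd="."):
    """Run a training command in a worktree and parse its final 'metric=' line.

    The 'metric=<value>' stdout convention is an assumption for this sketch.
    """
    proc = subprocess.run(argv, cwd=cwd, capture_output=True, text=True)
    proc.check_returncode()
    for line in reversed(proc.stdout.splitlines()):
        if line.startswith("metric="):
            return float(line.split("=", 1)[1])
    raise ValueError("no metric line found in output")

# Stand-in for 'python train.py': a command that just prints a metric
value = run_experiment([sys.executable, "-c", "print('metric=0.92')"])
```

In the real loop the parsed value would then be recorded on the node via `update_node()`.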

Complicated Projects: User-in-the-Loop

For large models, distributed training, expensive compute:

┌─────────────────────────────────────────────────────────────────────────────────┐
│  HUMAN-IN-THE-LOOP WORKFLOW                                                      │
├─────────────────────────────────────────────────────────────────────────────────┤
│                                                                                 │
│  User: Manages the compute resources                                            │
│    • Submits training jobs to cluster                                           │
│    • Monitors GPU usage                                                         │
│    • Handles distributed training coordination                                  │
│    • Collects results from logs                                                 │
│                                                                                 │
│  Agent: Proposes and records                                                     │
│    • Analyzes tree state                                                        │
│    • Proposes next experiments                                                  │
│    • Creates nodes and worktrees                                                │
│    • WAITS for user to provide results                                          │
│    • Records results and makes decisions                                        │
│                                                                                 │
│  Workflow:                                                                      │
│    1. Agent proposes experiment → User runs it                                  │
│    2. User provides results → Agent records and decides next step              │
│    3. Agent proposes next experiment → Repeat                                   │
│                                                                                 │
└─────────────────────────────────────────────────────────────────────────────────┘

For complicated projects, the agent should wait for the user to provide results. The loop pauses at the "EXECUTE" step until the user returns with results.

Security: Protect Your Validation Logic

Important: Design your validation/test cases carefully and do NOT give the code agent write access to these files.

┌─────────────────────────────────────────────────────────────────────────────────┐
│  SECURITY BEST PRACTICES                                                        │
├─────────────────────────────────────────────────────────────────────────────────┤
│                                                                                 │
│  ✓ DO:                                                                          │
│    • Keep validation scripts in a separate, protected directory                 │
│    • Use read-only permissions for test/validation data                         │
│    • Version control your validation logic                                      │
│    • Have the agent call validation functions, not modify them                  │
│                                                                                 │
│  ✗ DON'T:                                                                       │
│    • Let the agent modify evaluation metrics                                    │
│    • Give write access to ground truth data                                     │
│    • Allow the agent to change how results are calculated                       │
│                                                                                 │
│  Example structure:                                                             │
│    project/                                                                     │
│      train.py           ← Agent can modify                                      │
│      config.yaml        ← Agent can modify                                      │
│      validation/        ← READ-ONLY (protected)                                 │
│        evaluate.py      ← Agent can CALL, not modify                            │
│        test_data/       ← Agent can READ, not write                             │
│                                                                                 │
└─────────────────────────────────────────────────────────────────────────────────┘

This prevents accidental (or intentional) "gaming" of the validation metric by modifying how results are calculated.
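One way to enforce the read-only rule at the filesystem level is to strip write permission from the validation directory before starting the loop. A stdlib sketch (roughly equivalent to `chmod -R a-w validation/`; the `protect` helper name is ours, not part of the framework):

```python
import os
import stat

WRITE_BITS = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH

def protect(path):
    """Remove write permission from a directory tree, deepest entries first,
    so the agent can still read and call validation code but not modify it."""
    for dirpath, dirnames, filenames in os.walk(path, topdown=False):
        for name in filenames + dirnames:
            p = os.path.join(dirpath, name)
            os.chmod(p, os.stat(p).st_mode & ~WRITE_BITS)
    os.chmod(path, os.stat(path).st_mode & ~WRITE_BITS)
```

Note this only guards against accidents within the same user account; for a stronger guarantee, keep the validation files owned by a different user or in a separate repository.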

Installation

Project Structure

tree-autoresearch/
├── README.md
├── tree-autoresearch/           # The skill module
│   ├── SKILL.md                 # Agent instructions
│   ├── Tree-AutoResearch        # Bash wrapper
│   └── Tree_AutoResearch.py     # Python implementation
└── ITERATION_TREE.yaml          # Generated when you start

For Claude Code

# Clone the repository
git clone https://github.com/dongdongunique/Tree-AutoResearch.git
cd Tree-AutoResearch

# The skill is already at the root level - just point your agent to SKILL.md
# Or copy to your project:
cp -r tree-autoresearch/ your-project/

For OpenCode

# Clone the repository
git clone https://github.com/dongdongunique/Tree-AutoResearch.git
cd Tree-AutoResearch

# Copy to your project
cp -r tree-autoresearch/ your-project/

Dependencies

pip install pyyaml

Tree-Based AutoResearch

This framework treats research as a tree traversal problem:

                    Root (baseline)
                         │
         ┌───────────────┼───────────────┐
         │               │               │
    Learning Rate    Architecture    Data Augmentation
      Branch          Branch           Branch
    (lr=0.01)      (ResNet→ViT)    (weak→strong)
         │               │               │
    ┌────┴────┐     (pruned)      ┌────┴────┐
    │         │                   │         │
(lr=0.001) (lr=0.1)           (flip)   (mixup)
    │                         (good)    (best)
    │
(lr=0.0001) BEST

Key Principles:

  • Autonomous: System proposes next experiments based on tree state and results
  • Systematic: Tree structure ensures comprehensive coverage of search space
  • Adaptive: Prune bad branches, expand promising ones
  • Reproducible: Git worktrees + commits = exact code state for each iteration
  • Multi-objective: Optimize for accuracy, latency, model size simultaneously
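The tree above can be modeled with a toy in-memory structure to show how a best path is selected while pruned branches are skipped (node names and values here are made up; the real framework persists nodes to ITERATION_TREE.yaml):

```python
# Illustrative iteration tree: each node records its parent, primary
# metric value, and status, mirroring the diagram above.
nodes = {
    "root":      {"parent": None,       "value": 0.85, "status": "completed"},
    "lr_0.01":   {"parent": "root",     "value": 0.88, "status": "completed"},
    "lr_0.001":  {"parent": "root",     "value": 0.90, "status": "completed"},
    "vit":       {"parent": "root",     "value": 0.80, "status": "pruned"},
    "lr_0.0001": {"parent": "lr_0.001", "value": 0.92, "status": "completed"},
}

def best_path(nodes, direction="maximize"):
    """Return the root-to-best-node path, ignoring pruned branches."""
    live = {k: v for k, v in nodes.items() if v["status"] != "pruned"}
    pick = max if direction == "maximize" else min
    best = pick(live, key=lambda k: live[k]["value"])
    path = [best]
    while nodes[path[-1]]["parent"] is not None:
        path.append(nodes[path[-1]]["parent"])
    return list(reversed(path))
```

With the values above, `best_path(nodes)` walks back from the highest-scoring live node to the root, which is what the `best-path` CLI command reports.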

The AutoResearch Loop (NEVER STOP)

Once the experiment loop has begun, the agent should continue indefinitely until manually stopped. The human might be asleep and expects autonomous research.

LOOP FOREVER:

1. ANALYZE: Look at current tree state
   - Which nodes are completed?
   - Which branches show promise?
   - Which branches should be pruned?

2. PROPOSE: Generate next experiment(s)
   - If improved → create variations
   - If degraded → prune and try alternative
   - If stuck → explore orthogonal dimensions

3. CREATE: Add new node(s) to the tree
   - add_node(parent, desc, cmd, worktree=True)

4. EXECUTE: Run experiment in worktree
   - For simple experiments: agent can run directly
   - For complicated projects: WAIT for user to provide results
   - cd .worktrees/<node_id>
   - git commit changes
   - Run training command
   - Collect results

5. RECORD: Update node with results
   - validate_results() → check format
   - update_node() → save results
   - sync_commit() → record git state

6. DECIDE: Prune or expand based on results
   - If better: expand this branch
   - If worse: prune, try sibling

REPEAT FOREVER
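The DECIDE step reduces to a comparison against the parent node. A sketch of that logic (the `min_delta` tolerance is an assumption of this sketch, not a framework flag):

```python
def decide(parent_value, child_value, direction="maximize", min_delta=0.0):
    """DECIDE step: expand a branch that improved on its parent,
    prune one that degraded or stayed flat."""
    improved = (child_value - parent_value if direction == "maximize"
                else parent_value - child_value)
    return "expand" if improved > min_delta else "prune"
```

For example, a child accuracy of 0.90 against a parent's 0.85 expands the branch, while a latency of 12.0 ms against a parent's 15.3 ms also expands it under `direction="minimize"`.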

CLI Commands

Command                 Purpose
list                    List all nodes
get --id <id>           Get node details
add                     Record new experiment
validate --results      Validate results format
update --id --results   Record results
sync --id               Record git commit
worktree --id           Get worktree info + command
visualize               Show tree
compare --id --id2      Compare two nodes
best-path               Show best performing path
prune --id --reason     Mark as pruned
delete --id             Delete node

Results Format

Single Objective

primary_metric: "accuracy"
primary_value: 0.92
direction: "maximize"  # or "minimize"

Multi-Objective

objectives:
  - metric: "accuracy"
    value: 0.92
    direction: "maximize"
  - metric: "latency_ms"
    value: 15.3
    direction: "minimize"
primary_index: 0  # which objective is primary
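A format check over these two shapes can be sketched as follows (this is an illustration, not the framework's actual validate_results implementation):

```python
def check_results(results):
    """Accept either the single-objective or multi-objective results shape
    shown above; raise AssertionError on a malformed dict."""
    if "objectives" in results:
        objs = results["objectives"]
        assert objs, "objectives list must be non-empty"
        for o in objs:
            assert {"metric", "value", "direction"} <= o.keys()
            assert o["direction"] in ("maximize", "minimize")
        # primary_index must point at one of the listed objectives
        assert 0 <= results.get("primary_index", 0) < len(objs)
    else:
        assert {"primary_metric", "primary_value", "direction"} <= results.keys()
        assert results["direction"] in ("maximize", "minimize")
    return True
```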

Node Structure

node_id: "iter1_baseline"
parent_id: null
description: "Baseline experiment"
command: "python train.py"
working_dir: "."
worktree:
  path: ".worktrees/iter1_baseline"
  branch: "iter/iter1_baseline"
  commit: "a1b2c3d4"  # Exact code state
status: "completed"
results:
  primary_metric: "accuracy"
  primary_value: 0.92
  direction: "maximize"
children: []

sync_commit: Reproducibility Through Git

Every node records the exact git commit that produced its results:

# After making changes in worktree
cd .worktrees/iter1_baseline
git add -A && git commit -m "tune learning rate"

# Record the commit
Tree-AutoResearch ITERATION_TREE.yaml sync --id iter1_baseline
# Output: Synced: iter1_baseline -> a1b2c3d

# Later: reproduce this exact experiment
git checkout a1b2c3d

Use Cases

  • Hyperparameter Search: Systematically explore learning rates, batch sizes, architectures
  • Model Architecture: Compare ResNet vs ViT vs ConvNeXt variants
  • Data Augmentation: Test different augmentation strategies
  • Training Recipes: Compare optimizers, schedulers, regularization
  • Multi-objective Optimization: Balance accuracy vs latency vs model size

Requirements

  • Python 3.6+
  • PyYAML (pip install pyyaml)
  • Git (for worktree features)

Acknowledgments

This project builds on karpathy/autoresearch, the vision of AI agents autonomously conducting ML research overnight. Tree-AutoResearch extends that vision with a tree-based structure for more systematic and comprehensive research exploration.

License

MIT License - see LICENSE for details.
