Tree-AutoResearch

An AutoResearch framework for autonomous ML/DL tree-based iterative research with Git worktree support.

License: MIT

Overview

Tree-AutoResearch is a framework that enables AI agents to conduct autonomous ML/DL research. It manages experiment iterations in a tree structure where each node represents a research step with commands, results, and reproducible git state.

Inspired by karpathy/autoresearch — the pioneering project that showed how AI agents can autonomously experiment, iterate, and improve models overnight. Tree-AutoResearch extends this vision with a tree-based approach for more systematic exploration.

What's Different from karpathy/autoresearch?

Feature       karpathy/autoresearch          Tree-AutoResearch
Structure     Linear (one branch at a time)  Tree (parent-child relationships)
Exploration   Sequential                     Parallel branches via Git worktrees

Key Innovation: Instead of a single linear progression, Tree-AutoResearch explores multiple research directions simultaneously. Failed branches can be pruned while promising ones are expanded — just like a real research process.

Who Runs the Experiments?

The framework supports two modes depending on project complexity:

Simple Projects: Agent Runs Everything

For fast, local experiments (small models, quick training, single GPU):

Agent can run experiments directly:
  • cd .worktrees/<node_id>
  • python train.py
  • Collect results
  • Record and continue

The AutoResearch loop runs autonomously without user intervention.
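The run-and-collect step above can be sketched in Python (a minimal illustration, not the framework's actual code; the `metric=<value>` output convention is an assumption for the sketch, real projects would parse whatever train.py actually logs):

```python
import subprocess
import sys

def run_experiment(argv, cwd="."):
    """Run a training command in a worktree and parse its final 'metric=' line.

    The 'metric=<value>' stdout convention is an assumption for this sketch.
    """
    proc = subprocess.run(argv, cwd=cwd, capture_output=True, text=True)
    proc.check_returncode()
    for line in reversed(proc.stdout.splitlines()):
        if line.startswith("metric="):
            return float(line.split("=", 1)[1])
    raise ValueError("no metric line found in output")

# Stand-in for 'python train.py': a command that just prints a metric
value = run_experiment([sys.executable, "-c", "print('metric=0.92')"])
```

In the real loop the parsed value would then be recorded on the node via `update_node()`.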

Complicated Projects: User-in-the-Loop

For large models, distributed training, expensive compute:

┌─────────────────────────────────────────────────────────────────────────────────┐
│  HUMAN-IN-THE-LOOP WORKFLOW                                                      │
├─────────────────────────────────────────────────────────────────────────────────┤
│                                                                                 │
│  User: Manages the compute resources                                            │
│    • Submits training jobs to cluster                                           │
│    • Monitors GPU usage                                                         │
│    • Handles distributed training coordination                                  │
│    • Collects results from logs                                                 │
│                                                                                 │
│  Agent: Proposes and records                                                     │
│    • Analyzes tree state                                                        │
│    • Proposes next experiments                                                  │
│    • Creates nodes and worktrees                                                │
│    • WAITS for user to provide results                                          │
│    • Records results and makes decisions                                        │
│                                                                                 │
│  Workflow:                                                                      │
│    1. Agent proposes experiment → User runs it                                  │
│    2. User provides results → Agent records and decides next step              │
│    3. Agent proposes next experiment → Repeat                                   │
│                                                                                 │
└─────────────────────────────────────────────────────────────────────────────────┘

For complicated projects, the agent should wait for the user to provide results. The loop pauses at the "EXECUTE" step until the user returns with results.

Security: Protect Your Validation Logic

Important: Design your validation/test cases carefully and do NOT give the code agent write access to these files.

┌─────────────────────────────────────────────────────────────────────────────────┐
│  SECURITY BEST PRACTICES                                                        │
├─────────────────────────────────────────────────────────────────────────────────┤
│                                                                                 │
│  ✓ DO:                                                                          │
│    • Keep validation scripts in a separate, protected directory                 │
│    • Use read-only permissions for test/validation data                         │
│    • Version control your validation logic                                      │
│    • Have the agent call validation functions, not modify them                  │
│                                                                                 │
│  ✗ DON'T:                                                                       │
│    • Let the agent modify evaluation metrics                                    │
│    • Give write access to ground truth data                                     │
│    • Allow the agent to change how results are calculated                       │
│                                                                                 │
│  Example structure:                                                             │
│    project/                                                                     │
│      train.py           ← Agent can modify                                      │
│      config.yaml        ← Agent can modify                                      │
│      validation/        ← READ-ONLY (protected)                                 │
│        evaluate.py      ← Agent can CALL, not modify                            │
│        test_data/       ← Agent can READ, not write                             │
│                                                                                 │
└─────────────────────────────────────────────────────────────────────────────────┘

This prevents accidental (or intentional) "gaming" of the validation metric by modifying how results are calculated.
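One way to enforce the read-only rule at the filesystem level is to strip write permission from the validation directory before starting the loop. A stdlib sketch (roughly equivalent to `chmod -R a-w validation/`; the `protect` helper name is ours, not part of the framework):

```python
import os
import stat

WRITE_BITS = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH

def protect(path):
    """Remove write permission from a directory tree, deepest entries first,
    so the agent can still read and call validation code but not modify it."""
    for dirpath, dirnames, filenames in os.walk(path, topdown=False):
        for name in filenames + dirnames:
            p = os.path.join(dirpath, name)
            os.chmod(p, os.stat(p).st_mode & ~WRITE_BITS)
    os.chmod(path, os.stat(path).st_mode & ~WRITE_BITS)
```

Note this only guards against accidents within the same user account; for a stronger guarantee, keep the validation files owned by a different user or in a separate repository.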

Installation

Project Structure

tree-autoresearch/
├── README.md
├── tree-autoresearch/           # The skill module
│   ├── SKILL.md                 # Agent instructions
│   ├── Tree-AutoResearch        # Bash wrapper
│   └── Tree_AutoResearch.py     # Python implementation
└── ITERATION_TREE.yaml          # Generated when you start

For Claude Code

# Clone the repository
git clone https://github.com/dongdongunique/Tree-AutoResearch.git
cd Tree-AutoResearch

# The skill is already at the root level - just point your agent to SKILL.md
# Or copy to your project:
cp -r tree-autoresearch/ your-project/

For OpenCode

# Clone the repository
git clone https://github.com/dongdongunique/Tree-AutoResearch.git
cd Tree-AutoResearch

# Copy to your project
cp -r tree-autoresearch/ your-project/

Dependencies

pip install pyyaml

Tree-Based AutoResearch

This framework treats research as a tree traversal problem:

                    Root (baseline)
                         │
         ┌───────────────┼───────────────┐
         │               │               │
    Learning Rate    Architecture    Data Augmentation
      Branch          Branch           Branch
    (lr=0.01)      (ResNet→ViT)    (weak→strong)
         │               │               │
    ┌────┴────┐     (pruned)      ┌────┴────┐
    │         │                   │         │
(lr=0.001) (lr=0.1)           (flip)   (mixup)
    │                         (good)    (best)
    │
(lr=0.0001) BEST

Key Principles:

  • Autonomous: System proposes next experiments based on tree state and results
  • Systematic: Tree structure ensures comprehensive coverage of search space
  • Adaptive: Prune bad branches, expand promising ones
  • Reproducible: Git worktrees + commits = exact code state for each iteration
  • Multi-objective: Optimize for accuracy, latency, model size simultaneously
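The tree above can be modeled with a toy in-memory structure to show how a best path is selected while pruned branches are skipped (node names and values here are made up; the real framework persists nodes to ITERATION_TREE.yaml):

```python
# Illustrative iteration tree: each node records its parent, primary
# metric value, and status, mirroring the diagram above.
nodes = {
    "root":      {"parent": None,       "value": 0.85, "status": "completed"},
    "lr_0.01":   {"parent": "root",     "value": 0.88, "status": "completed"},
    "lr_0.001":  {"parent": "root",     "value": 0.90, "status": "completed"},
    "vit":       {"parent": "root",     "value": 0.80, "status": "pruned"},
    "lr_0.0001": {"parent": "lr_0.001", "value": 0.92, "status": "completed"},
}

def best_path(nodes, direction="maximize"):
    """Return the root-to-best-node path, ignoring pruned branches."""
    live = {k: v for k, v in nodes.items() if v["status"] != "pruned"}
    pick = max if direction == "maximize" else min
    best = pick(live, key=lambda k: live[k]["value"])
    path = [best]
    while nodes[path[-1]]["parent"] is not None:
        path.append(nodes[path[-1]]["parent"])
    return list(reversed(path))
```

With the values above, `best_path(nodes)` walks back from the highest-scoring live node to the root, which is what the `best-path` CLI command reports.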

The AutoResearch Loop (NEVER STOP)

Once the experiment loop has begun, the agent should continue indefinitely until manually stopped. The human might be asleep and expects autonomous research.

LOOP FOREVER:

1. ANALYZE: Look at current tree state
   - Which nodes are completed?
   - Which branches show promise?
   - Which branches should be pruned?

2. PROPOSE: Generate next experiment(s)
   - If improved → create variations
   - If degraded → prune and try alternative
   - If stuck → explore orthogonal dimensions

3. CREATE: Add new node(s) to the tree
   - add_node(parent, desc, cmd, worktree=True)

4. EXECUTE: Run experiment in worktree
   - For simple experiments: agent can run directly
   - For complicated projects: WAIT for user to provide results
   - cd .worktrees/<node_id>
   - git commit changes
   - Run training command
   - Collect results

5. RECORD: Update node with results
   - validate_results() → check format
   - update_node() → save results
   - sync_commit() → record git state

6. DECIDE: Prune or expand based on results
   - If better: expand this branch
   - If worse: prune, try sibling

REPEAT FOREVER
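The DECIDE step reduces to a comparison against the parent node. A sketch of that logic (the `min_delta` tolerance is an assumption of this sketch, not a framework flag):

```python
def decide(parent_value, child_value, direction="maximize", min_delta=0.0):
    """DECIDE step: expand a branch that improved on its parent,
    prune one that degraded or stayed flat."""
    improved = (child_value - parent_value if direction == "maximize"
                else parent_value - child_value)
    return "expand" if improved > min_delta else "prune"
```

For example, a child accuracy of 0.90 against a parent's 0.85 expands the branch, while a latency of 12.0 ms against a parent's 15.3 ms also expands it under `direction="minimize"`.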

CLI Commands

Command                 Purpose
list                    List all nodes
get --id <id>           Get node details
add                     Record new experiment
validate --results      Validate results format
update --id --results   Record results
sync --id               Record git commit
worktree --id           Get worktree info + command
visualize               Show tree
compare --id --id2      Compare two nodes
best-path               Show best performing path
prune --id --reason     Mark as pruned
delete --id             Delete node

Results Format

Single Objective

primary_metric: "accuracy"
primary_value: 0.92
direction: "maximize"  # or "minimize"

Multi-Objective

objectives:
  - metric: "accuracy"
    value: 0.92
    direction: "maximize"
  - metric: "latency_ms"
    value: 15.3
    direction: "minimize"
primary_index: 0  # which objective is primary
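A format check over these two shapes can be sketched as follows (this is an illustration, not the framework's actual validate_results implementation):

```python
def check_results(results):
    """Accept either the single-objective or multi-objective results shape
    shown above; raise AssertionError on a malformed dict."""
    if "objectives" in results:
        objs = results["objectives"]
        assert objs, "objectives list must be non-empty"
        for o in objs:
            assert {"metric", "value", "direction"} <= o.keys()
            assert o["direction"] in ("maximize", "minimize")
        # primary_index must point at one of the listed objectives
        assert 0 <= results.get("primary_index", 0) < len(objs)
    else:
        assert {"primary_metric", "primary_value", "direction"} <= results.keys()
        assert results["direction"] in ("maximize", "minimize")
    return True
```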

Node Structure

node_id: "iter1_baseline"
parent_id: null
description: "Baseline experiment"
command: "python train.py"
working_dir: "."
worktree:
  path: ".worktrees/iter1_baseline"
  branch: "iter/iter1_baseline"
  commit: "a1b2c3d4"  # Exact code state
status: "completed"
results:
  primary_metric: "accuracy"
  primary_value: 0.92
  direction: "maximize"
children: []

sync_commit: Reproducibility Through Git

Every node records the exact git commit that produced its results:

# After making changes in worktree
cd .worktrees/iter1_baseline
git add -A && git commit -m "tune learning rate"

# Record the commit
Tree-AutoResearch ITERATION_TREE.yaml sync --id iter1_baseline
# Output: Synced: iter1_baseline -> a1b2c3d

# Later: reproduce this exact experiment
git checkout a1b2c3d

Use Cases

  • Hyperparameter Search: Systematically explore learning rates, batch sizes, architectures
  • Model Architecture: Compare ResNet vs ViT vs ConvNeXt variants
  • Data Augmentation: Test different augmentation strategies
  • Training Recipes: Compare optimizers, schedulers, regularization
  • Multi-objective Optimization: Balance accuracy vs latency vs model size

Requirements

  • Python 3.6+
  • PyYAML (pip install pyyaml)
  • Git (for worktree features)

Acknowledgments

This project builds on karpathy/autoresearch, the vision of AI agents autonomously conducting ML research overnight. Tree-AutoResearch extends that vision with a tree-based structure for more systematic and comprehensive research exploration.

License

MIT License - see LICENSE for details.
