An AutoResearch framework for autonomous ML/DL tree-based iterative research with Git worktree support.
Tree-AutoResearch is a framework that enables AI agents to conduct autonomous ML/DL research. It manages experiment iterations in a tree structure where each node represents a research step with commands, results, and reproducible git state.
Inspired by karpathy/autoresearch — the pioneering project that showed how AI agents can autonomously experiment, iterate, and improve models overnight. Tree-AutoResearch extends this vision with a tree-based approach for more systematic exploration.
| Feature | karpathy/autoresearch | Tree-AutoResearch |
|---|---|---|
| Structure | Linear (one branch at a time) | Tree (parent-child relationships) |
| Exploration | Sequential | Parallel branches via Git worktrees |
Key Innovation: Instead of a single linear progression, Tree-AutoResearch explores multiple research directions simultaneously. Failed branches can be pruned while promising ones are expanded — just like a real research process.
The framework supports two modes depending on project complexity:
For fast, local experiments (small models, quick training, single GPU):
The agent can run experiments directly:
• cd .worktrees/<node_id>
• python train.py
• Collect results
• Record and continue
The AutoResearch loop runs autonomously without user intervention.
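In this mode, the "collect results" step can be as simple as scraping the metric out of the training log. A minimal sketch; the `accuracy=…` log format is an assumption, so adapt the regex to whatever your own train.py prints:

```python
import re

def parse_final_accuracy(log_text: str) -> float:
    """Return the last accuracy value reported in a training log.

    Assumes lines like 'epoch 3: accuracy=0.92'; the format is
    hypothetical, so match the pattern to your own train.py output.
    """
    matches = re.findall(r"accuracy=([0-9.]+)", log_text)
    if not matches:
        raise ValueError("no accuracy found in log")
    return float(matches[-1])

log = "epoch 1: accuracy=0.71\nepoch 2: accuracy=0.85\nepoch 3: accuracy=0.92\n"
print(parse_final_accuracy(log))  # 0.92
```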
For large models, distributed training, or expensive compute:
┌─────────────────────────────────────────────────────────────────────────────────┐
│ HUMAN-IN-THE-LOOP WORKFLOW │
├─────────────────────────────────────────────────────────────────────────────────┤
│ │
│ User: Manages the compute resources │
│ • Submits training jobs to cluster │
│ • Monitors GPU usage │
│ • Handles distributed training coordination │
│ • Collects results from logs │
│ │
│ Agent: Proposes and records │
│ • Analyzes tree state │
│ • Proposes next experiments │
│ • Creates nodes and worktrees │
│ • WAITS for user to provide results │
│ • Records results and makes decisions │
│ │
│ Workflow: │
│ 1. Agent proposes experiment → User runs it │
│ 2. User provides results → Agent records and decides next step │
│ 3. Agent proposes next experiment → Repeat │
│ │
└─────────────────────────────────────────────────────────────────────────────────┘
For complex projects, the agent should wait for the user to provide results: the loop pauses at the "EXECUTE" step until the user returns with them.
Important: Design your validation/test cases carefully and do NOT give the code agent write access to these files.
┌─────────────────────────────────────────────────────────────────────────────────┐
│ SECURITY BEST PRACTICES │
├─────────────────────────────────────────────────────────────────────────────────┤
│ │
│ ✓ DO: │
│ • Keep validation scripts in a separate, protected directory │
│ • Use read-only permissions for test/validation data │
│ • Version control your validation logic │
│ • Have the agent call validation functions, not modify them │
│ │
│ ✗ DON'T: │
│ • Let the agent modify evaluation metrics │
│ • Give write access to ground truth data │
│ • Allow the agent to change how results are calculated │
│ │
│ Example structure: │
│ project/ │
│ train.py ← Agent can modify │
│ config.yaml ← Agent can modify │
│ validation/ ← READ-ONLY (protected) │
│ evaluate.py ← Agent can CALL, not modify │
│ test_data/ ← Agent can READ, not write │
│ │
└─────────────────────────────────────────────────────────────────────────────────┘
This prevents accidental (or intentional) "gaming" of the validation metric by modifying how results are calculated.
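One way to enforce the read-only rule above is to strip the write bits before the agent starts. A sketch in Python (the directory names follow the example structure above; note that plain file permissions do not restrain a process running as root):

```python
import pathlib
import stat

def make_read_only(root: str) -> None:
    """Remove write permission from root and everything under it."""
    base = pathlib.Path(root)
    for path in [base, *base.rglob("*")]:
        mode = path.stat().st_mode
        path.chmod(mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))

# Protect the validation directory before launching the agent
pathlib.Path("validation/test_data").mkdir(parents=True, exist_ok=True)
make_read_only("validation")
```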
tree-autoresearch/
├── README.md
├── tree-autoresearch/ # The skill module
│ ├── SKILL.md # Agent instructions
│ ├── Tree-AutoResearch # Bash wrapper
│ └── Tree_AutoResearch.py # Python implementation
└── ITERATION_TREE.yaml # Generated when you start
# Clone the repository
git clone https://github.com/dongdongunique/Tree-AutoResearch.git
cd Tree-AutoResearch
# The skill is already at the root level - just point your agent to SKILL.md
# Or copy to your project:
cp -r tree-autoresearch/ your-project/

# Install the only Python dependency
pip install pyyaml

This framework treats research as a tree traversal problem:
Root (baseline)
│
┌───────────────┼───────────────┐
│ │ │
Learning Rate Architecture Data Augmentation
Branch Branch Branch
(lr=0.01) (ResNet→ViT) (weak→strong)
│ │ │
┌────┴────┐ (pruned) ┌────┴────┐
│ │ │ │
(lr=0.001) (lr=0.1) (flip) (mixup)
│ (good) (best)
│
(lr=0.0001) BEST
Key Principles:
- Autonomous: System proposes next experiments based on tree state and results
- Systematic: Tree structure ensures comprehensive coverage of search space
- Adaptive: Prune bad branches, expand promising ones
- Reproducible: Git worktrees + commits = exact code state for each iteration
- Multi-objective: Optimize for accuracy, latency, model size simultaneously
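Because the whole tree lives in ITERATION_TREE.yaml, it can be inspected with plain PyYAML. A sketch that picks the best completed node; the top-level `nodes` list is an assumption about the file layout, and the two nodes are made up for illustration:

```python
import yaml

TREE = """
nodes:                      # container key is an assumption, not the real schema
  - node_id: "iter1_baseline"
    status: "completed"
    results: {primary_metric: "accuracy", primary_value: 0.92, direction: "maximize"}
  - node_id: "iter2_lr_0.001"
    status: "completed"
    results: {primary_metric: "accuracy", primary_value: 0.95, direction: "maximize"}
"""

def best_node(doc: dict) -> dict:
    """Return the completed node with the best primary metric."""
    done = [n for n in doc["nodes"] if n["status"] == "completed"]
    sign = 1 if done[0]["results"]["direction"] == "maximize" else -1
    return max(done, key=lambda n: sign * n["results"]["primary_value"])

doc = yaml.safe_load(TREE)
print(best_node(doc)["node_id"])  # iter2_lr_0.001
```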
Once the experiment loop has begun, the agent should continue indefinitely until manually stopped. The human might be asleep and expects autonomous research.
LOOP FOREVER:
1. ANALYZE: Look at current tree state
- Which nodes are completed?
- Which branches show promise?
- Which branches should be pruned?
2. PROPOSE: Generate next experiment(s)
- If improved → create variations
- If degraded → prune and try alternative
- If stuck → explore orthogonal dimensions
3. CREATE: Add new node(s) to the tree
- add_node(parent, desc, cmd, worktree=True)
4. EXECUTE: Run experiment in worktree
- For simple experiments: agent can run directly
- For complicated projects: WAIT for user to provide results
- cd .worktrees/<node_id>
- git commit changes
- Run training command
- Collect results
5. RECORD: Update node with results
- validate_results() → check format
- update_node() → save results
- sync_commit() → record git state
6. DECIDE: Prune or expand based on results
- If better: expand this branch
- If worse: prune, try sibling
REPEAT FOREVER
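The DECIDE step boils down to comparing a child against its parent while respecting the metric's direction. A deliberately simple illustration; real policies may add thresholds or multiple objectives:

```python
def decide(parent_value: float, child_value: float, direction: str = "maximize") -> str:
    """Step 6 of the loop: expand on improvement, prune otherwise."""
    if direction == "maximize":
        improved = child_value > parent_value
    else:
        improved = child_value < parent_value
    return "expand" if improved else "prune"

print(decide(0.90, 0.92))              # expand: accuracy went up
print(decide(15.3, 18.0, "minimize"))  # prune: latency got worse
```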
| Command | Purpose |
|---|---|
| `list` | List all nodes |
| `get --id <id>` | Get node details |
| `add` | Record a new experiment |
| `validate --results` | Validate results format |
| `update --id --results` | Record results |
| `sync --id` | Record git commit |
| `worktree --id` | Get worktree info + command |
| `visualize` | Show tree |
| `compare --id --id2` | Compare two nodes |
| `best-path` | Show best-performing path |
| `prune --id --reason` | Mark as pruned |
| `delete --id` | Delete node |
Single-objective results:

primary_metric: "accuracy"
primary_value: 0.92
direction: "maximize"    # or "minimize"

Multi-objective results:

objectives:
  - metric: "accuracy"
    value: 0.92
    direction: "maximize"
  - metric: "latency_ms"
    value: 15.3
    direction: "minimize"
primary_index: 0         # which objective is primary

Node format:

node_id: "iter1_baseline"
parent_id: null
description: "Baseline experiment"
command: "python train.py"
working_dir: "."
worktree:
  path: ".worktrees/iter1_baseline"
  branch: "iter/iter1_baseline"
commit: "a1b2c3d4"       # Exact code state
status: "completed"
results:
  primary_metric: "accuracy"
  primary_value: 0.92
  direction: "maximize"
children: []

Every node records the exact git commit that produced its results:
# After making changes in worktree
cd .worktrees/iter1_baseline
git add -A && git commit -m "tune learning rate"
# Record the commit
Tree-AutoResearch ITERATION_TREE.yaml sync --id iter1_baseline
# Output: Synced: iter1_baseline -> a1b2c3d
# Later: reproduce this exact experiment
git checkout a1b2c3d

- Hyperparameter Search: Systematically explore learning rates, batch sizes, architectures
- Model Architecture: Compare ResNet vs ViT vs ConvNeXt variants
- Data Augmentation: Test different augmentation strategies
- Training Recipes: Compare optimizers, schedulers, regularization
- Multi-objective Optimization: Balance accuracy vs latency vs model size
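In the multi-objective case, "better" usually means Pareto dominance rather than a single number. A sketch using the `objectives` result format shown earlier; it assumes both results list their objectives in the same order:

```python
def dominates(a: list, b: list) -> bool:
    """True if result a is at least as good as b on every objective
    and strictly better on at least one (Pareto dominance).

    Each objective is a dict with "value" and "direction" keys,
    matching the multi-objective results format above.
    """
    strictly_better = False
    for oa, ob in zip(a, b):
        sign = 1 if oa["direction"] == "maximize" else -1
        if sign * oa["value"] < sign * ob["value"]:
            return False
        if sign * oa["value"] > sign * ob["value"]:
            strictly_better = True
    return strictly_better

small_fast = [{"metric": "accuracy", "value": 0.91, "direction": "maximize"},
              {"metric": "latency_ms", "value": 12.0, "direction": "minimize"}]
big_slow   = [{"metric": "accuracy", "value": 0.91, "direction": "maximize"},
              {"metric": "latency_ms", "value": 15.3, "direction": "minimize"}]
print(dominates(small_fast, big_slow))  # True: same accuracy, lower latency
```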
- Python 3.6+
- PyYAML (pip install pyyaml)
- Git (for worktree features)
This project builds on karpathy/autoresearch and its vision of AI agents autonomously conducting ML research overnight. Tree-AutoResearch extends that vision with a tree-based structure for more systematic and comprehensive research exploration.
MIT License - see LICENSE for details.