14 changes: 13 additions & 1 deletion README.md
@@ -343,7 +343,7 @@ Check this log file for connection issues, tool execution errors, and other diag

| Approach | Slug | Description |
| ------------------------------------ | ------------------ | ---------------------------------------------------------------------------------------------- |
| Cerebras Planning and Optimization | `cepo` | Combines Best of N, Chain-of-Thought, Self-Reflection, Self-Improvement, and various prompting techniques |
| CoT with Reflection | `cot_reflection` | Implements chain-of-thought reasoning with \<thinking\>, \<reflection\> and \<output\> sections |
| PlanSearch | `plansearch` | Implements a search algorithm over candidate plans for solving a problem in natural language |
| ReRead | `re2` | Implements rereading to improve reasoning by processing queries twice |
@@ -359,6 +359,7 @@ Check this log file for connection issues, tool execution errors, and other diag
| CoT Decoding | N/A for proxy | Implements chain-of-thought decoding to elicit reasoning without explicit prompting |
| Entropy Decoding | N/A for proxy | Implements adaptive sampling based on the uncertainty of tokens during generation |
| Thinkdeeper | N/A for proxy | Implements the `reasoning_effort` param from OpenAI for reasoning models like DeepSeek R1 |
| AutoThink | N/A for proxy | Combines query complexity classification with steering vectors to enhance reasoning |

## Implemented plugins

@@ -467,6 +468,16 @@ Authorization: Bearer your_secret_api_key

## SOTA results on benchmarks with optillm

### AutoThink on GPQA-Diamond & MMLU-Pro (May 2025)

| **Model** | **GPQA-Diamond** Accuracy (%) | **GPQA-Diamond** Avg. Tokens | **MMLU-Pro** Accuracy (%) | **MMLU-Pro** Avg. Tokens |
|-------------------------------|--------------|-------------|--------------|-------------|
| DeepSeek-R1-Distill-Qwen-1.5B | 21.72 | 7868.26 | 25.58 | 2842.75 |
| with Fixed Budget | 28.47 | 3570.00 | 26.18 | 1815.67 |
| **with AutoThink** | **31.06** | **3520.52** | **26.38** | **1792.50** |


### LongCePO on LongBench v2 (Apr 2025)

| Model¹ | Context window | Short samples (up to 32K words) | Medium samples (32–128K words) |
@@ -551,6 +562,7 @@ called patchflows. We saw huge performance gains across all the supported patchflows.
![Results showing optillm mixture of agents approach used with patchflows](https://raw.githubusercontent.com/codelion/optillm/main/moa-patchwork-results.png)

## References
- [AutoThink: efficient inference for reasoning LLMs](https://dx.doi.org/10.2139/ssrn.5253327) - [Implementation](optillm/autothink)
- [CePO: Empowering Llama with Reasoning using Test-Time Compute](https://cerebras.ai/blog/cepo) - [Implementation](optillm/cepo)
- [LongCePO: Empowering LLMs to efficiently leverage infinite context](https://cerebras.ai/blog/longcepo) - [Implementation](optillm/plugins/longcepo)
- [Chain of Code: Reasoning with a Language Model-Augmented Code Emulator](https://arxiv.org/abs/2312.04474) - [Inspired the implementation of coc plugin](optillm/plugins/coc_plugin.py)
2 changes: 1 addition & 1 deletion optillm/__init__.py
@@ -2,7 +2,7 @@
import os

# Version information
__version__ = "0.1.11"
__version__ = "0.1.12"

# Get the path to the root optillm.py
spec = util.spec_from_file_location(
110 changes: 110 additions & 0 deletions optillm/autothink/README.md
@@ -0,0 +1,110 @@
# AutoThink

AutoThink is an adaptive thinking approach for Large Language Models that combines query complexity classification with steering vector guidance to enhance model reasoning capabilities.

## Overview

AutoThink combines several advanced techniques to optimize the thinking process of LLMs:

1. **Query Complexity Classification**: Uses an adaptive classifier to determine if a query requires HIGH or LOW complexity reasoning
2. **Token Budget Allocation**: Dynamically allocates thinking tokens based on query complexity
3. **Steering Vector Guidance**: Applies activation-based steering vectors to guide the model's reasoning process
4. **Controlled Thinking Process**: Manages explicit thinking phases with start and end tokens

## How It Works

### 1. Query Classification

AutoThink uses the `adaptive-classifier/llm-router` [model](https://huggingface.co/adaptive-classifier/llm-router) to classify incoming queries:

- **HIGH**: Complex queries requiring deep reasoning, multi-step calculations, or thorough exploration
- **LOW**: Simpler queries requiring less extensive reasoning
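
The routing step can be reproduced with the `adaptive-classifier` library; a minimal sketch, assuming its `from_pretrained`/`predict` interface (the query and scores are illustrative):

```python
# Route a query to HIGH or LOW complexity with the adaptive classifier.
from adaptive_classifier import AdaptiveClassifier

classifier = AdaptiveClassifier.from_pretrained("adaptive-classifier/llm-router")

query = "Prove that the sum of two odd integers is even."
predictions = classifier.predict(query)  # e.g. [("HIGH", 0.87), ("LOW", 0.13)]
label, confidence = predictions[0]
print(f"Complexity: {label} ({confidence:.2f})")
```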

### 2. Token Budget

Based on the classification, AutoThink allocates different token budgets for the thinking phase:

- **HIGH**: 70-90% of max tokens allocated for thinking
- **LOW**: 20-40% of max tokens allocated for thinking
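
A budget picker along these lines is enough to illustrate the idea; the fractions mirror the ranges above, while the concrete token bounds come from the configuration shown later (the function name and clamping here are a sketch, not the actual implementation):

```python
import random

def allocate_thinking_budget(label: str, max_tokens: int,
                             min_tokens: int = 256) -> int:
    """Sketch: pick a thinking-token budget for a complexity label.

    The real implementation clamps the result to the configured
    per-class min/max token bounds.
    """
    low, high = (0.7, 0.9) if label == "HIGH" else (0.2, 0.4)
    return max(min_tokens, int(max_tokens * random.uniform(low, high)))

print(allocate_thinking_budget("HIGH", 4096))  # e.g. 3270
print(allocate_thinking_budget("LOW", 4096))   # e.g. 1150
```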

### 3. Steering Vectors

AutoThink uses pre-extracted steering vectors from [datasets](https://huggingface.co/datasets?other=pts) like `codelion/Qwen3-0.6B-pts-steering-vectors`. These vectors represent different reasoning patterns:

- **Depth and thoroughness**: Encourages detailed, step-by-step reasoning
- **Numerical accuracy**: Promotes precise calculations and verification
- **Self-correction**: Facilitates error detection and correction
- **Exploration**: Supports considering multiple approaches
- **Organization**: Improves logical structure in responses

During inference, the model's internal activations are modified based on these vectors to enhance specific reasoning capabilities.
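
A minimal sketch of how such activation steering can be applied with a PyTorch forward hook; the dataset field name, split, and module path are assumptions, not the actual implementation:

```python
import torch
from datasets import load_dataset

def make_steering_hook(vector: torch.Tensor, strength: float):
    """Sketch of a forward hook that nudges hidden states along a vector.

    Assumes a decoder layer whose output is a tuple with hidden states
    of shape (batch, seq, hidden) in the first position.
    """
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        steered = hidden + strength * vector.to(hidden.device, hidden.dtype)
        return (steered, *output[1:]) if isinstance(output, tuple) else steered
    return hook

# The vectors ship as a Hugging Face dataset; the field name here is an
# assumption about its schema.
ds = load_dataset("codelion/Qwen3-0.6B-pts-steering-vectors", split="train")
vector = torch.tensor(ds[0]["steering_vector"])

# Register on the configured target layer (LLaMA/Qwen-style module path
# assumed; adjust for the architecture), generate, then clean up:
# handle = model.model.layers[19].register_forward_hook(make_steering_hook(vector, 2.5))
# ... generate ...
# handle.remove()
```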

### 4. Controlled Thinking Process

The generation process includes:
1. A thinking phase marked by `<think>` and `</think>` tokens
2. Automatic adjustment of thinking time based on query complexity
3. Dynamic application of steering vectors
4. Graceful transition to the final response
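
A simplified greedy-decoding sketch of this loop, assuming `model`, `tokenizer`, `prompt_ids`, and `thinking_budget` are already defined; the real processor also applies the steering hooks while this phase runs:

```python
import torch

# Token strings and the forced </think> are illustrative.
think_start_ids = tokenizer.encode("<think>", add_special_tokens=False)
think_end_ids = tokenizer.encode("</think>", add_special_tokens=False)

input_ids = torch.cat(
    [prompt_ids, torch.tensor([think_start_ids], device=prompt_ids.device)], dim=-1
)
with torch.no_grad():
    for _ in range(thinking_budget):
        next_id = model(input_ids).logits[0, -1].argmax()
        input_ids = torch.cat([input_ids, next_id.view(1, 1)], dim=-1)
        if next_id.item() in think_end_ids:
            break  # model closed its thinking phase within budget
    else:
        # Budget exhausted: force the end-of-thinking marker so the
        # model transitions to the final response.
        input_ids = torch.cat(
            [input_ids, torch.tensor([think_end_ids], device=input_ids.device)], dim=-1
        )
```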

## Configuration

AutoThink can be configured with:

```python
{
"model_name": "your-model-name",
"classifier_model": "adaptive-classifier/llm-router",
"steering_dataset": "codelion/Qwen3-0.6B-pts-steering-vectors",
"target_layer": 19, # Layer to apply steering vectors
"high_complexity_min_tokens": 1024,
"high_complexity_max_tokens": 4096,
"low_complexity_min_tokens": 256,
"low_complexity_max_tokens": 1024,
"pattern_strengths": {
"depth_and_thoroughness": 2.5, # Steering strength for different patterns
"numerical_accuracy": 2.0,
"self_correction": 3.0,
"exploration": 2.0,
"organization": 1.5
}
}
```

## Usage

```python
from optillm.autothink import autothink_decode

response = autothink_decode(
model,
tokenizer,
messages,
{
"steering_dataset": "codelion/Qwen3-0.6B-pts-steering-vectors",
"target_layer": 19
}
)
```
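
For a self-contained run, the model and tokenizer can be loaded with Transformers first; the model choice here is illustrative:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "How many primes are there below 100?"}]
```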

## Benefits

- **Adaptive Resource Usage**: Models think more on complex problems and less on simple ones
- **Enhanced Reasoning**: Steering vectors guide the model toward better reasoning patterns
- **Efficiency**: Better performance without increasing model size
- **Customizability**: Can be tailored for different domains using domain-specific steering vector datasets


## Citation

If you use this approach in your research, please cite:

```bibtex
@article{autothink,
title={AutoThink: efficient inference for reasoning LLMs},
author={Sharma, Asankhaya},
journal={SSRN Artificial Intelligence eJournal},
year={2025},
url = {https://dx.doi.org/10.2139/ssrn.5253327}
}
```
7 changes: 7 additions & 0 deletions optillm/autothink/__init__.py
@@ -0,0 +1,7 @@
"""
AutoThink - Adaptive thinking approach for LLMs with query complexity classification and steering vectors.
"""

from .autothink import autothink_decode, AutoThinkProcessor

__all__ = ["autothink_decode", "AutoThinkProcessor"]
91 changes: 91 additions & 0 deletions optillm/autothink/autothink.py
@@ -0,0 +1,91 @@
"""
AutoThink main implementation.

This module provides the main implementation of AutoThink, combining
query complexity classification with steering vectors to enhance reasoning.
"""

import logging
from typing import Dict, List, Any, Optional
from transformers import PreTrainedModel, PreTrainedTokenizer

from .processor import AutoThinkProcessor as InternalProcessor

logger = logging.getLogger(__name__)

class AutoThinkProcessor:
"""
Main AutoThink processor class for external use.
Wraps the internal processor implementation.
"""

    def __init__(self, model: PreTrainedModel, tokenizer: PreTrainedTokenizer, config: Optional[Dict[str, Any]] = None):
"""
Initialize the AutoThink processor.

Args:
model: Language model
tokenizer: Model tokenizer
config: Configuration dictionary
"""
self.config = config or {}
self.processor = None
self.model = model
self.tokenizer = tokenizer

def __call__(self, messages: List[Dict[str, str]]) -> str:
"""Process messages with AutoThink's controlled thinking."""
return self.process(messages)

def process(self, messages: List[Dict[str, str]]) -> str:
"""Process messages with AutoThink's controlled thinking.

Args:
messages: List of message dictionaries

Returns:
Generated response
"""
# Create processor on first use to allow for model loading
if self.processor is None:
self.processor = self._create_processor()

return self.processor.process(messages)

def _create_processor(self):
"""Create the internal processor instance."""
return InternalProcessor(self.config, self.tokenizer, self.model)

def autothink_decode(
model: PreTrainedModel,
tokenizer: PreTrainedTokenizer,
messages: List[Dict[str, str]],
request_config: Optional[Dict[str, Any]] = None
) -> str:
"""
Main plugin execution function with AutoThink's controlled thinking process.

Args:
model: Language model
tokenizer: Model tokenizer
messages: List of message dictionaries
request_config: Optional configuration dictionary

Returns:
Generated response with thinking process
"""
logger.info("Starting AutoThink processing")

# Create config dictionary
config = {}
if request_config:
config.update(request_config)

try:
processor = AutoThinkProcessor(model, tokenizer, config)
response = processor.process(messages)
return response

except Exception as e:
logger.error(f"Error in AutoThink processing: {str(e)}")
raise