14 changes: 13 additions & 1 deletion README.md
@@ -343,7 +343,7 @@ Check this log file for connection issues, tool execution errors, and other diag

| Approach | Slug | Description |
| ------------------------------------ | ------------------ | ---------------------------------------------------------------------------------------------- |
| Cerebras Planning and Optimization | `cepo` | Combines Best of N, Chain-of-Thought, Self-Reflection, Self-Improvement, and various prompting techniques |
| CoT with Reflection | `cot_reflection` | Implements chain-of-thought reasoning with \<thinking\>, \<reflection\> and \<output\> sections |
| PlanSearch | `plansearch` | Implements a search algorithm over candidate plans for solving a problem in natural language |
| ReRead | `re2` | Implements rereading to improve reasoning by processing queries twice |
@@ -359,6 +359,7 @@ Check this log file for connection issues, tool execution errors, and other diag
| CoT Decoding | N/A for proxy | Implements chain-of-thought decoding to elicit reasoning without explicit prompting |
| Entropy Decoding | N/A for proxy | Implements adaptive sampling based on the uncertainty of tokens during generation |
| Thinkdeeper | N/A for proxy | Implements the `reasoning_effort` param from OpenAI for reasoning models like DeepSeek R1 |
| AutoThink | N/A for proxy | Combines query complexity classification with steering vectors to enhance reasoning |

## Implemented plugins

@@ -467,6 +468,16 @@ Authorization: Bearer your_secret_api_key

## SOTA results on benchmarks with optillm

### AutoThink on GPQA-Diamond & MMLU-Pro (May 2025)

| **Model** | **GPQA-Diamond** Accuracy (%) | **GPQA-Diamond** Avg. Tokens | **MMLU-Pro** Accuracy (%) | **MMLU-Pro** Avg. Tokens |
|-------------------------------|--------------|-------------|--------------|-------------|
| DeepSeek-R1-Distill-Qwen-1.5B | 21.72 | 7868.26 | 25.58 | 2842.75 |
| with Fixed Budget | 28.47 | 3570.00 | 26.18 | 1815.67 |
| **with AutoThink** | **31.06** | **3520.52** | **26.38** | **1792.50** |


### LongCePO on LongBench v2 (Apr 2025)

| Model¹ | Context window | Short samples (up to 32K words) | Medium samples (32–128K words) |
@@ -551,6 +562,7 @@ called patchflows. We saw huge performance gains across all the supported patchflows.
![Results showing optillm mixture of agents approach used with patchflows](https://raw.githubusercontent.com/codelion/optillm/main/moa-patchwork-results.png)

## References
- [AutoThink: efficient inference for reasoning LLMs](https://dx.doi.org/10.2139/ssrn.5253327) - [Implementation](optillm/autothink)
- [CePO: Empowering Llama with Reasoning using Test-Time Compute](https://cerebras.ai/blog/cepo) - [Implementation](optillm/cepo)
- [LongCePO: Empowering LLMs to efficiently leverage infinite context](https://cerebras.ai/blog/longcepo) - [Implementation](optillm/plugins/longcepo)
- [Chain of Code: Reasoning with a Language Model-Augmented Code Emulator](https://arxiv.org/abs/2312.04474) - [Inspired the implementation of coc plugin](optillm/plugins/coc_plugin.py)
2 changes: 1 addition & 1 deletion optillm/__init__.py
@@ -2,7 +2,7 @@
import os

# Version information
__version__ = "0.1.11"
__version__ = "0.1.12"

# Get the path to the root optillm.py
spec = util.spec_from_file_location(
110 changes: 110 additions & 0 deletions optillm/autothink/README.md
@@ -0,0 +1,110 @@
# AutoThink

AutoThink is an adaptive thinking approach for Large Language Models that combines query complexity classification with steering vector guidance to enhance model reasoning capabilities.

## Overview

AutoThink combines several advanced techniques to optimize the thinking process of LLMs:

1. **Query Complexity Classification**: Uses an adaptive classifier to determine if a query requires HIGH or LOW complexity reasoning
2. **Token Budget Allocation**: Dynamically allocates thinking tokens based on query complexity
3. **Steering Vector Guidance**: Applies activation-based steering vectors to guide the model's reasoning process
4. **Controlled Thinking Process**: Manages explicit thinking phases with start and end tokens

## How It Works

### 1. Query Classification

AutoThink uses the `adaptive-classifier/llm-router` [model](https://huggingface.co/adaptive-classifier/llm-router) to classify incoming queries:

- **HIGH**: Complex queries requiring deep reasoning, multi-step calculations, or thorough exploration
- **LOW**: Simpler queries requiring less extensive reasoning
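
The routing step can be reproduced with the `adaptive-classifier` library; a minimal sketch, assuming its `from_pretrained`/`predict` interface (the query and scores are illustrative):

```python
# Route a query to HIGH or LOW complexity with the adaptive classifier.
from adaptive_classifier import AdaptiveClassifier

classifier = AdaptiveClassifier.from_pretrained("adaptive-classifier/llm-router")

query = "Prove that the sum of two odd integers is even."
predictions = classifier.predict(query)  # e.g. [("HIGH", 0.87), ("LOW", 0.13)]
label, confidence = predictions[0]
print(f"Complexity: {label} ({confidence:.2f})")
```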

### 2. Token Budget

Based on the classification, AutoThink allocates different token budgets for the thinking phase:

- **HIGH**: 70-90% of max tokens allocated for thinking
- **LOW**: 20-40% of max tokens allocated for thinking
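
A budget picker along these lines is enough to illustrate the idea; the fractions mirror the ranges above, while the concrete token bounds come from the configuration shown later (the function name and clamping here are a sketch, not the actual implementation):

```python
import random

def allocate_thinking_budget(label: str, max_tokens: int,
                             min_tokens: int = 256) -> int:
    """Sketch: pick a thinking-token budget for a complexity label.

    The real implementation clamps the result to the configured
    per-class min/max token bounds.
    """
    low, high = (0.7, 0.9) if label == "HIGH" else (0.2, 0.4)
    return max(min_tokens, int(max_tokens * random.uniform(low, high)))

print(allocate_thinking_budget("HIGH", 4096))  # e.g. 3270
print(allocate_thinking_budget("LOW", 4096))   # e.g. 1150
```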

### 3. Steering Vectors

AutoThink uses pre-extracted steering vectors from [datasets](https://huggingface.co/datasets?other=pts) like `codelion/Qwen3-0.6B-pts-steering-vectors`. These vectors represent different reasoning patterns:

- **Depth and thoroughness**: Encourages detailed, step-by-step reasoning
- **Numerical accuracy**: Promotes precise calculations and verification
- **Self-correction**: Facilitates error detection and correction
- **Exploration**: Supports considering multiple approaches
- **Organization**: Improves logical structure in responses

During inference, the model's internal activations are modified based on these vectors to enhance specific reasoning capabilities.
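
A minimal sketch of how such activation steering can be applied with a PyTorch forward hook; the dataset field name, split, and module path are assumptions, not the actual implementation:

```python
import torch
from datasets import load_dataset

def make_steering_hook(vector: torch.Tensor, strength: float):
    """Sketch of a forward hook that nudges hidden states along a vector.

    Assumes a decoder layer whose output is a tuple with hidden states
    of shape (batch, seq, hidden) in the first position.
    """
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        steered = hidden + strength * vector.to(hidden.device, hidden.dtype)
        return (steered, *output[1:]) if isinstance(output, tuple) else steered
    return hook

# The vectors ship as a Hugging Face dataset; the field name here is an
# assumption about its schema.
ds = load_dataset("codelion/Qwen3-0.6B-pts-steering-vectors", split="train")
vector = torch.tensor(ds[0]["steering_vector"])

# Register on the configured target layer (LLaMA/Qwen-style module path
# assumed; adjust for the architecture), generate, then clean up:
# handle = model.model.layers[19].register_forward_hook(make_steering_hook(vector, 2.5))
# ... generate ...
# handle.remove()
```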

### 4. Controlled Thinking Process

The generation process includes:
1. A thinking phase marked by `<think>` and `</think>` tokens
2. Automatic adjustment of thinking time based on query complexity
3. Dynamic application of steering vectors
4. Graceful transition to the final response
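
A simplified greedy-decoding sketch of this loop, assuming `model`, `tokenizer`, `prompt_ids`, and `thinking_budget` are already defined; the real processor also applies the steering hooks while this phase runs:

```python
import torch

# Token strings and the forced </think> are illustrative.
think_start_ids = tokenizer.encode("<think>", add_special_tokens=False)
think_end_ids = tokenizer.encode("</think>", add_special_tokens=False)

input_ids = torch.cat(
    [prompt_ids, torch.tensor([think_start_ids], device=prompt_ids.device)], dim=-1
)
with torch.no_grad():
    for _ in range(thinking_budget):
        next_id = model(input_ids).logits[0, -1].argmax()
        input_ids = torch.cat([input_ids, next_id.view(1, 1)], dim=-1)
        if next_id.item() in think_end_ids:
            break  # model closed its thinking phase within budget
    else:
        # Budget exhausted: force the end-of-thinking marker so the
        # model transitions to the final response.
        input_ids = torch.cat(
            [input_ids, torch.tensor([think_end_ids], device=input_ids.device)], dim=-1
        )
```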

## Configuration

AutoThink can be configured with:

```python
{
"model_name": "your-model-name",
"classifier_model": "adaptive-classifier/llm-router",
"steering_dataset": "codelion/Qwen3-0.6B-pts-steering-vectors",
"target_layer": 19, # Layer to apply steering vectors
"high_complexity_min_tokens": 1024,
"high_complexity_max_tokens": 4096,
"low_complexity_min_tokens": 256,
"low_complexity_max_tokens": 1024,
"pattern_strengths": {
"depth_and_thoroughness": 2.5, # Steering strength for different patterns
"numerical_accuracy": 2.0,
"self_correction": 3.0,
"exploration": 2.0,
"organization": 1.5
}
}
```

## Usage

```python
from optillm.autothink import autothink_decode

response = autothink_decode(
model,
tokenizer,
messages,
{
"steering_dataset": "codelion/Qwen3-0.6B-pts-steering-vectors",
"target_layer": 19
}
)
```
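
For a self-contained run, the model and tokenizer can be loaded with Transformers first; the model choice here is illustrative:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "How many primes are there below 100?"}]
```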

## Benefits

- **Adaptive Resource Usage**: Models think more on complex problems and less on simple ones
- **Enhanced Reasoning**: Steering vectors guide the model toward better reasoning patterns
- **Efficiency**: Better performance without increasing model size
- **Customizability**: Can be tailored for different domains using domain-specific steering vector datasets


## Citation

If you use this approach in your research, please cite:

```bibtex
@article{autothink,
title={AutoThink: efficient inference for reasoning LLMs},
author={Sharma, Asankhaya},
journal={SSRN Artificial Intelligence eJournal},
year={2025},
url = {https://dx.doi.org/10.2139/ssrn.5253327}
}
```
7 changes: 7 additions & 0 deletions optillm/autothink/__init__.py
@@ -0,0 +1,7 @@
"""
AutoThink - Adaptive thinking approach for LLMs with query complexity classification and steering vectors.
"""

from .autothink import autothink_decode, AutoThinkProcessor

__all__ = ["autothink_decode", "AutoThinkProcessor"]
91 changes: 91 additions & 0 deletions optillm/autothink/autothink.py
@@ -0,0 +1,91 @@
"""
AutoThink main implementation.

This module provides the main implementation of AutoThink, combining
query complexity classification with steering vectors to enhance reasoning.
"""

import logging
from typing import Dict, List, Any, Optional
from transformers import PreTrainedModel, PreTrainedTokenizer

from .processor import AutoThinkProcessor as InternalProcessor

logger = logging.getLogger(__name__)

class AutoThinkProcessor:
"""
Main AutoThink processor class for external use.
Wraps the internal processor implementation.
"""

    def __init__(self, model: PreTrainedModel, tokenizer: PreTrainedTokenizer, config: Optional[Dict[str, Any]] = None):
"""
Initialize the AutoThink processor.

Args:
model: Language model
tokenizer: Model tokenizer
config: Configuration dictionary
"""
self.config = config or {}
self.processor = None
self.model = model
self.tokenizer = tokenizer

def __call__(self, messages: List[Dict[str, str]]) -> str:
"""Process messages with AutoThink's controlled thinking."""
return self.process(messages)

def process(self, messages: List[Dict[str, str]]) -> str:
"""Process messages with AutoThink's controlled thinking.

Args:
messages: List of message dictionaries

Returns:
Generated response
"""
# Create processor on first use to allow for model loading
if self.processor is None:
self.processor = self._create_processor()

return self.processor.process(messages)

def _create_processor(self):
"""Create the internal processor instance."""
return InternalProcessor(self.config, self.tokenizer, self.model)

def autothink_decode(
model: PreTrainedModel,
tokenizer: PreTrainedTokenizer,
messages: List[Dict[str, str]],
request_config: Optional[Dict[str, Any]] = None
) -> str:
"""
Main plugin execution function with AutoThink's controlled thinking process.

Args:
model: Language model
tokenizer: Model tokenizer
messages: List of message dictionaries
request_config: Optional configuration dictionary

Returns:
Generated response with thinking process
"""
logger.info("Starting AutoThink processing")

# Create config dictionary
config = {}
if request_config:
config.update(request_config)

try:
processor = AutoThinkProcessor(model, tokenizer, config)
response = processor.process(messages)
return response

except Exception as e:
logger.error(f"Error in AutoThink processing: {str(e)}")
raise