Yufan Zhuang1,2, Liyuan Liu1, Dinghuai Zhang1, Chandan Singh1, Yelong Shen1, Jingbo Shang2, Jianfeng Gao1
1Microsoft Research 2UC San Diego
We explore how Knowledge Flow Prompting, which iteratively updates a knowledge list between rollouts, overcomes the context limit of LLMs in test-time scaling. This iterative refinement mirrors human deliberation: the model progressively distills insights from its attempts into a refined knowledge list, which in turn empowers subsequent rollouts. Knowledge Flow enables both gpt-oss-120b and Qwen3-235B-A22B-Thinking to achieve 100% accuracy on AIME25 without any training, tools, or external feedback.
For detailed insights into the methodology and results, please refer to our blog post.
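At its core, each iteration samples a rollout, distills insights from it, and folds them into a persistent knowledge list that is prepended to the next prompt. The sketch below is a minimal illustration of that loop, not the exact implementation in the scripts; `generate` and `extract_insights` are hypothetical helpers.

```python
# Minimal sketch of the Knowledge Flow loop (hypothetical helpers, not the repo's API).
def knowledge_flow(question, generate, extract_insights, reflex_size=64):
    knowledge = []   # persistent list of distilled insights, carried across rollouts
    solutions = []
    for _ in range(reflex_size):
        # Prepend the accumulated knowledge so each rollout starts from prior lessons.
        prompt = "\n".join(
            ["Known insights:"] + [f"- {k}" for k in knowledge] + ["Question:", question]
        )
        solution = generate(prompt)                              # one full reasoning rollout
        solutions.append(solution)
        knowledge.extend(extract_insights(solution, knowledge))  # refine the list
    return solutions, knowledge
```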
- CUDA >= 12.8
- VLLM == 0.10.2
Option A: Minimal Installation (recommended for quickstarts)
pip install -r requirements.txt
This installs the core dependencies.
Option B: Full Installation (includes all dependency versions)
pip install -r requirements_full.txt
This includes additional packages for extended functionality, including CUDA libraries, evaluation tools, and more.
The core Knowledge Flow implementation is available in two variants:
GPT-OSS variant:
- Main Script: scripts/vllm_kflow_oss.py
- Bash Script: bash_scripts/gpt_oss.sh
- Model: openai/gpt-oss-120b
Qwen variant:
- Main Script: scripts/vllm_kflow_qwen.py
- Bash Script: bash_scripts/qwen3.sh
- Model: Qwen/Qwen3-235B-A22B-Thinking-2507
Both main scripts support the following arguments:
--model_name # Hugging Face model identifier
--max_new_tokens # Maximum generation length (e.g., 131072 for GPT-OSS, 262144 for Qwen)
--temperature # Sampling temperature (default: 0.6)
--reflex_size # Number of iterations (default: 64)
--split # Dataset split: train/test/validation
--output_postfix # Custom postfix for output files
We provide several ablation variants to study the impact of different components:
These scripts test the effect of regenerating knowledge at each step:
Key Difference: Uses NLTK sentence tokenization to segment and regenerate knowledge descriptions more systematically.
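As a rough illustration of that difference, the regeneration variant can split the running knowledge text into sentences with NLTK and rebuild the list at each step rather than only appending to it. The snippet below is a hedged sketch of the segmentation step only, not the exact code in the ablation scripts.

```python
import nltk
from nltk.tokenize import sent_tokenize

nltk.download("punkt", quiet=True)  # sentence tokenizer model used by sent_tokenize

def segment_knowledge(knowledge_text):
    # Split accumulated knowledge into individual sentences so each one can be
    # kept, rewritten, or dropped when the list is regenerated at the next step.
    return [s.strip() for s in sent_tokenize(knowledge_text) if s.strip()]
```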
These scripts explore using positive reinforcement (correct solutions) instead of mistake tracking:
Key Difference: Builds a knowledge base from successful solutions rather than learning from mistakes.
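To make the contrast with mistake tracking concrete, here is a hedged sketch of the positive variant: only insights from attempts judged successful are added to the knowledge base (`is_successful` is a hypothetical placeholder for whatever criterion the scripts use).

```python
# Hedged sketch: reinforce what worked instead of recording mistakes.
def update_knowledge_positive(knowledge, solution, insight, is_successful):
    if is_successful(solution):      # hypothetical success criterion
        knowledge.append(insight)    # keep lessons from successful solutions
    return knowledge                 # failed attempts contribute nothing
```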
These scripts implement a baseline that doesn't maintain long-term knowledge:
Key Difference: Tests Markovian Thinking, where each iteration depends only on the immediately previous step, without accumulated knowledge.
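Concretely, this baseline conditions each rollout only on the preceding attempt rather than on a growing knowledge list; a minimal sketch, assuming a hypothetical generate helper:

```python
# Hedged sketch of the Markovian baseline: no long-term knowledge,
# each step sees only the immediately previous attempt.
def markovian_rollouts(question, generate, reflex_size=64):
    previous = ""
    solutions = []
    for _ in range(reflex_size):
        prompt = f"Previous attempt:\n{previous}\n\nQuestion:\n{question}"
        previous = generate(prompt)   # overwrite rather than accumulate
        solutions.append(previous)
    return solutions
```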
cd bash_scripts
bash gpt_oss.sh
Or run directly with Python:
model_name=openai/gpt-oss-120b
BACKEND=TRITON_ATTN_VLLM_V1
export HF_HUB_ENABLE_HF_TRANSFER=1
export VLLM_FLASHINFER_ALLREDUCE_FUSION_THRESHOLDS_MB='{"2":32,"4":32,"8":8}'
export VLLM_USE_FLASHINFER_MOE_MXFP4_MXFP8=1
VLLM_ATTENTION_BACKEND=$BACKEND TOKENIZERS_PARALLELISM=false PYTHONPATH="../":"$PYTHONPATH" python3 ../scripts/vllm_kflow_oss.py \
--model_name "$model_name" \
--max_new_tokens 131072 \
--temperature 0.6 \
--reflex_size 64
Results are saved in ./results/vllm_{max_new_tokens}_{output_postfix}/{model_name}/:
- predictions_{step}.json - Predictions at each step; includes questions, generated solutions, extracted answers, gold answers, and correctness
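For example, a small script along these lines can report per-step accuracy from those files; the output directory and the "correct" field name are assumptions based on the description above and may differ from the actual JSON layout.

```python
import glob
import json

# Hedged sketch: aggregate accuracy across Knowledge Flow steps.
# The results directory and the "correct" field are assumed, not taken from the repo.
for path in sorted(glob.glob("./results/vllm_131072_demo/openai/gpt-oss-120b/predictions_*.json")):
    with open(path) as f:
        records = json.load(f)
    accuracy = sum(bool(r["correct"]) for r in records) / len(records)
    print(f"{path}: accuracy = {accuracy:.3f}")
```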
We used the following settings on B200 GPUs in our experiments; adjust them for your compute platform.
For GPT-OSS:
export VLLM_ATTENTION_BACKEND=TRITON_ATTN_VLLM_V1
export HF_HUB_ENABLE_HF_TRANSFER=1
export VLLM_FLASHINFER_ALLREDUCE_FUSION_THRESHOLDS_MB='{"2":32,"4":32,"8":8}'
export VLLM_USE_FLASHINFER_MOE_MXFP4_MXFP8=1

For Qwen:
export VLLM_ATTENTION_BACKEND=FLASH_ATTN
export HF_HUB_ENABLE_HF_TRANSFER=1

If you find this work helpful, please cite us:
@misc{zhuang2025knowledgeflow,
title = {Knowledge Flow: Scaling Reasoning Beyond the Context Limit},
url = {https://yufanzhuang.notion.site/knowledge-flow},
author = {Zhuang, Yufan and Liu, Liyuan and Zhang, Dinghuai and Singh, Chandan and Shen, Yelong and Shang, Jingbo and Gao, Jianfeng},
journal = {Notion Blog},
year = {2025},
month = Oct,
}

We release our code under Apache 2.0; see the LICENSE file for details.