Apodex

Model | GitHub | Website | Tech Report

Apodex: A deep research agent optimized for research and prediction. It achieves 88.2 on the challenging BrowseComp benchmark. See Quick Start.


📰 News & Updates

  • [2026-03-11] 🎉🎉🎉 Introducing Apodex, including Apodex-0.7-mini and Apodex-0.7. Apodex-0.7-mini achieves 72.3 on BrowseComp-ZH, setting a new SOTA among open-source models while using only 30B parameters. Our proprietary agent Apodex-0.7-H achieves leading performance on BrowseComp and BrowseComp-ZH among open-source and commercial models.

πŸ“ Introduction

Apodex-0.7

Our new Apodex family represents a significant leap in building reliable agents for long-chain tasks. Engineered with an enhanced post-training pipeline, the Apodex-0.7 family achieves SOTA performance on deep research tasks among open-source models.

Key Features

  • 🚀 Apodex-0.7 supports a 256K context window, long-horizon reasoning, and deep multi-step analysis.
  • 🔧 Handles up to 300 tool interactions per task, now with more accurate stepwise reasoning and decision-making.
  • 📦 Released at 30B and 235B parameter scales, accompanied by a comprehensive suite of tools and workflows to flexibly support diverse research settings and compute budgets.
  • Our proprietary agent, Apodex-0.7-H, provides promising evidence for long-chain verifiable reasoning: reasoning processes that are step-verifiable and globally verifiable, improving the performance of complex agentic workflows.
| Model Name | Parameters | Max Context | Max Tool Calls | HF Link |
|---|---|---|---|---|
| Apodex-0.7-mini | 30B | 256K | 300 | 🤗 link |
| Apodex-0.7 | 235B | 256K | 300 | 🤗 link |

Apodex-0.7 demonstrates strong general-research performance across a broad range of benchmarks, achieving 74.0%, 75.3%, 82.7%, and 42.9% on BrowseComp, BrowseComp-ZH, GAIA-Val-165, and HLE-Text, respectively. Apodex-0.7 achieves SOTA performance on BrowseComp-ZH.

✨ Key Features

🤖 Apodex-Optimized Framework

  • 🔓 Fully Open-Source Agent Framework: Complete transparency with an open framework and open agents
  • 🔗 Tool Integration: Seamless integration with external tools and APIs
  • 📝 Trace Collection: Comprehensive logging and analysis of agent interactions, with elapsed time and estimated completion time displayed in minutes. Ready for SFT and DPO
  • 📊 Benchmark Evaluation: Extensive testing across multiple benchmark datasets

📊 Comprehensive Benchmark Suite

📋 Click to expand benchmark list
  • GAIA Validation: A benchmark for General AI Assistants. (paper)
  • GAIA-Text-103: A subset of GAIA Validation for text-only tasks. (paper)
  • HLE: Humanity's Last Exam. (paper)
  • HLE-Text-2158: A subset of HLE for text-only tasks. (paper)
  • HLE-Text-500: A subset of HLE for text-only tasks, created by WebThinker. (paper)
  • BrowseComp-EN: Web browsing and comprehension tasks. (paper)
  • BrowseComp-ZH: A Chinese version of BrowseComp. (paper)
  • WebWalkerQA: Web navigation and question answering. (paper)
  • Frames: Factuality, Retrieval, And reasoning MEasurement Set. (paper)
  • XBench-DeepSearch: A benchmark for deep research agents. (website)
  • FutureX: A live benchmark designed for predicting unknown future events. (website)
  • SEAL-0: A benchmark for evaluating LLMs on conflicting-evidence web questions. (paper)
  • AIME2025: American Invitational Mathematics Examination 2025. (website)
  • DeepSearchQA: Google's Deep Search Question Answering benchmark. (paper)

🚀 Quick Start

For optimal usage, we recommend using Apodex with this tool-enabled agent framework and thinking mode enabled.

Prerequisites

  • 🐍 Python 3.10+
  • 📦 uv package manager (Installation guide)
  • 🔑 Required API keys (see configuration section below)

Installation

# Clone the repository
git clone https://github.com/ApodexAI/Apodex
cd Apodex

# Setup environment
cd apps/miroflow-agent
uv sync

# Configure API keys
cp .env.example .env
# Edit .env with your API keys (SERPER_API_KEY, JINA_API_KEY, E2B_API_KEY, etc.)

πŸ“ Environment Variables: See Tool Configuration section for required API keys.

Tool Configuration

Minimal Configuration for Apodex-0.7.

| Server | Description | Tools Provided | Required Environment Variables |
|---|---|---|---|
| tool-python | Execution environment and file management (E2B sandbox) | create_sandbox, run_command, run_python_code, upload_file_from_local_to_sandbox, download_file_from_sandbox_to_local, download_file_from_internet_to_sandbox | E2B_API_KEY |
| search_and_scrape_webpage | Google search via Serper API | google_search | SERPER_API_KEY, SERPER_BASE_URL |
| jina_scrape_llm_summary | Web scraping with LLM-based information extraction | scrape_and_extract_info | JINA_API_KEY, JINA_BASE_URL, SUMMARY_LLM_BASE_URL, SUMMARY_LLM_MODEL_NAME, SUMMARY_LLM_API_KEY |

Minimal .env configuration example:

SERPER_API_KEY=your_serper_key
SERPER_BASE_URL="https://google.serper.dev"
JINA_API_KEY=your_jina_key
JINA_BASE_URL="https://r.jina.ai"
E2B_API_KEY=your_e2b_key

# Required for jina_scrape_llm_summary
# Note: Summary LLM can be a small model (e.g., Qwen3-14B or GPT-5-Nano)
# The choice has minimal impact on performance; use what's most convenient
SUMMARY_LLM_BASE_URL="https://your_summary_llm_base_url/v1/chat/completions"
SUMMARY_LLM_MODEL_NAME=your_llm_model_name  # e.g., "Qwen/Qwen3-14B" or "gpt-5-nano"
SUMMARY_LLM_API_KEY=your_llm_api_key  # Optional, depends on LLM provider

# Required for benchmark evaluation (LLM-as-a-Judge)
OPENAI_API_KEY=your_openai_key  # Required for running benchmark evaluations
OPENAI_BASE_URL="https://api.openai.com/v1"  # Optional, defaults to OpenAI's API

💡 Why this is minimal: These three MCP servers cover the core capabilities needed for research tasks: web search, content extraction, and code execution. All other servers are optional enhancements.

🤖 Summary LLM: The SUMMARY_LLM can be a small model like Qwen3-14B or GPT-5-Nano. The choice has minimal impact on overall performance; use whichever is most convenient for your setup.
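
For illustration, here is a minimal sketch of how a summary LLM behind an OpenAI-compatible endpoint could be called with the SUMMARY_LLM_* variables. This is a hypothetical example, not the actual jina_scrape_llm_summary implementation; the summarize function name and prompt wording are our own.

# Hypothetical sketch only -- not the actual jina_scrape_llm_summary flow.
import os
import requests

def summarize(scraped_text: str, query: str) -> str:
    """Ask the configured summary LLM to extract query-relevant information."""
    resp = requests.post(
        os.environ["SUMMARY_LLM_BASE_URL"],  # already points at .../v1/chat/completions
        headers={"Authorization": f"Bearer {os.environ.get('SUMMARY_LLM_API_KEY', '')}"},
        json={
            "model": os.environ["SUMMARY_LLM_MODEL_NAME"],
            "messages": [{
                "role": "user",
                "content": f"Extract the information relevant to: {query}\n\n{scraped_text}",
            }],
        },
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]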

📊 For Benchmark Evaluation: If you plan to run benchmark evaluations, you also need OPENAI_API_KEY (and optionally OPENAI_BASE_URL) for the LLM-as-a-Judge functionality used in evaluation scripts.

πŸ–ΌοΈ For GAIA Multimodal Tasks: GAIA-Val-165 includes tasks with image/audio/video files. Since Apodex is a text-only LLM, GPT-4o is used to pre-process these files into text descriptions. The same OPENAI_API_KEY is used for both this preprocessing and LLM-as-a-Judge.

📖 For more details: See the MiroFlow Tools README for complete documentation of all available tools.

🔧 Click to expand additional available tools

The following optional tools are available but were not used in Apodex-0.7 evaluation:

| Server Name | Type | Description |
|---|---|---|
| tool-vqa | Commercial | Vision processing using Claude |
| tool-vqa-os | Open-Source | Vision processing (open-source alternative) |
| tool-transcribe | Commercial | Audio transcription using OpenAI |
| tool-transcribe-os | Open-Source | Audio transcription using Whisper |
| tool-reasoning | Commercial | Reasoning engine using Claude |
| tool-reasoning-os | Open-Source | Reasoning engine (open-source alternative) |
| tool-reading | Open-Source | Document reading using MarkItDown |
| tool-google-search | Commercial | Web search using Google + scraping |
| tool-sogou-search | Commercial | Web search using Sogou (Chinese) |

📖 Local Deployment: For instructions on deploying open-source tools (tool-vqa-os, tool-transcribe-os, tool-reasoning-os) locally, see the Local Tool Deployment Guide.

See the MiroFlow Tools README for complete documentation of all available tools.

Pre-configured Agent Settings

The apps/miroflow-agent/conf/agent/ directory contains several pre-configured agent settings. Each configuration uses different tools and requires corresponding environment variables in your .env file.

💡 Recommended: For Apodex-0.7, use apodex_0.7_keep5_max200 (with context management, recommended for most tasks) or apodex_0.7_keep5_max300 (only for BrowseComp and BrowseComp-ZH).

| Configuration | Description | Max Turns | Context Retention | Required Environment Variables | Recommended For |
|---|---|---|---|---|---|
| apodex_0.7_keep5_max200 ⭐ | Single-agent with context management | 200 | Keep 5 most recent | SERPER_API_KEY, SERPER_BASE_URL, JINA_API_KEY, JINA_BASE_URL, E2B_API_KEY, SUMMARY_LLM_BASE_URL, SUMMARY_LLM_MODEL_NAME, SUMMARY_LLM_API_KEY | Apodex-0.7 (most tasks) |
| apodex_0.7_keep5_max300 ⭐ | Single-agent with context management | 300 | Keep 5 most recent | Same as above | Apodex-0.7 (BrowseComp & BrowseComp-ZH) |

💡 Note: All environment variables are listed in apps/miroflow-agent/.env.example. Copy it to .env and fill in the values for the tools you plan to use.

Creating Custom Tool Configurations

🔧 Click to expand custom tool configuration guide

You can create your own YAML configuration file to freely combine MCP servers. Here's how:

  1. Create a new YAML file in apps/miroflow-agent/conf/agent/:
# conf/agent/my_custom_config.yaml
defaults:
  - default
  - _self_

main_agent:
  tools:
    - tool-python                    # Execution environment
    - search_and_scrape_webpage      # Google search
    - jina_scrape_llm_summary        # Web scraping with LLM
    - tool-vqa                       # Vision processing (optional)
    - tool-transcribe                # Audio processing (optional)
    - tool-reasoning                 # Reasoning engine (optional)
    - tool-reading                   # Document reading (optional)
  max_turns: 300  # Maximum number of turns

sub_agents:
  agent-browsing:  # Optional sub-agent
    tools:
      - tool-google-search
      - tool-vqa
      - tool-reading
      - tool-python
    max_turns: 50

keep_tool_result: -1  # Context retention budget: -1 keeps all tool results, or specify K to keep only the K most recent tool responses

💡 Context Retention Strategy: The keep_tool_result parameter implements a recency-based context retention strategy. In the standard ReAct paradigm, all tool outputs are retained in the message history, which can lead to inefficient context utilization. Empirically, we observe that the agent's subsequent actions depend primarily on recent observations rather than distant ones. This strategy retains only the most recent K tool responses (where K is the keep_tool_result value) while preserving the complete sequence of thoughts and actions.

Benefits:

  • ✅ Preserves the reasoning and action trace
  • ✅ Focuses the agent's attention on the most contextually relevant observations
  • ✅ Frees additional context space for extended reasoning and deeper tool-use trajectories
  • ✅ Does not lead to performance degradation while allowing more context space for interactive scaling

Usage: Set keep_tool_result: -1 to keep all tool results, or specify a positive integer K (e.g., keep_tool_result: 5) to keep only the K most recent tool responses.
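
To make the mechanics concrete, here is a minimal, hypothetical sketch of recency-based pruning over an OpenAI-style message list. It is illustrative only (the function name and placeholder text are our own), not the MiroFlow implementation:

# Hypothetical sketch of recency-based context retention; not the MiroFlow implementation.
def prune_tool_results(messages: list[dict], keep_tool_result: int) -> list[dict]:
    """Keep all thoughts and actions, but retain only the K most recent tool results."""
    if keep_tool_result < 0:  # -1 keeps every tool result
        return messages
    tool_indices = [i for i, m in enumerate(messages) if m["role"] == "tool"]
    dropped = set(tool_indices[:-keep_tool_result] if keep_tool_result else tool_indices)
    return [
        # Older observations are replaced with a stub so the turn structure is preserved.
        {**m, "content": "[tool result truncated]"} if i in dropped else m
        for i, m in enumerate(messages)
    ]

With keep_tool_result: 5, the agent still sees its complete chain of thoughts and actions, but only the five most recent observations in full.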

  2. Use your custom configuration when running evaluations:
cd apps/miroflow-agent
uv run main.py llm=qwen-3 agent=my_custom_config llm.base_url=https://your_base_url/v1
  3. Configure environment variables in .env based on the tools you use.

    All available environment variables are listed in apps/miroflow-agent/.env.example. Copy it to .env and configure the variables according to your chosen configuration:

    cd apps/miroflow-agent
    cp .env.example .env
    # Edit .env with your actual API keys

    For other configurations, refer to the Pre-configured Agent Settings table above to see which environment variables are required.

🔑 Click to expand optional API keys
# API for LLM-as-a-Judge (for benchmark testing, required for benchmark evaluation)
OPENAI_API_KEY=your_openai_key
OPENAI_BASE_URL="https://api.openai.com/v1"  # Optional, defaults to OpenAI's API

# API for Open-Source Audio Transcription Tool (for benchmark testing, optional)
WHISPER_MODEL_NAME="openai/whisper-large-v3-turbo"
WHISPER_API_KEY=your_whisper_key
WHISPER_BASE_URL="https://your_whisper_base_url/v1"

# API for Open-Source VQA Tool (for benchmark testing, optional)
VISION_MODEL_NAME="Qwen/Qwen2.5-VL-72B-Instruct"
VISION_API_KEY=your_vision_key
VISION_BASE_URL="https://your_vision_base_url/v1/chat/completions"

# API for Open-Source Reasoning Tool (for benchmark testing, optional)
REASONING_MODEL_NAME="Qwen/Qwen3-235B-A22B-Thinking-2507"
REASONING_API_KEY=your_reasoning_key
REASONING_BASE_URL="https://your_reasoning_base_url/v1/chat/completions"

# API for Claude 3.7 Sonnet as the backend for commercial tools (optional)
ANTHROPIC_API_KEY=your_anthropic_key

# API for Sogou Search (optional)
TENCENTCLOUD_SECRET_ID=your_tencent_cloud_secret_id
TENCENTCLOUD_SECRET_KEY=your_tencent_cloud_secret_key

# API for Summary LLM (can use small models like Qwen3-14B or GPT-5-Nano)
SUMMARY_LLM_BASE_URL="https://your_summary_llm_base_url/v1/chat/completions"
SUMMARY_LLM_MODEL_NAME=your_summary_llm_model_name  # e.g., "Qwen/Qwen3-14B" or "gpt-5-nano"
SUMMARY_LLM_API_KEY=your_summary_llm_api_key

Serve the Apodex Agent

Option 1 (Recommended): Serve with SGLang or vLLM

Use SGLang to serve Apodex models at port 61002:

NUM_GPUS=4
PORT=61002

# Download the model from Hugging Face
AGENT_PATH=apodex-ai/Apodex-0.7-mini

python3 -m sglang.launch_server \
    --model-path $AGENT_PATH \
    --tp $NUM_GPUS \
    --dp 1 \
    --host 0.0.0.0 \
    --port $PORT \
    --trust-remote-code

πŸ“ Server URL: This will start a server at http://0.0.0.0:$PORT. Use this as your server base URL (e.g., http://0.0.0.0:61002/v1).

Option 2: Quantized Lightweight Options

We also provide comprehensive guidance for serving Apodex agents using CPU-optimized and GPU-accelerated quantization techniques, along with detailed analysis and guidelines for deployment with llama.cpp, Ollama, SGLang, and other inference frameworks.

📖 Complete Guide: See the Deployment Documentation for detailed deployment instructions.

Run Your First Task

After setting up the environment and starting your server, run main.py to test with the default question: "What is the title of today's arxiv paper in computer science?"

cd apps/miroflow-agent

# Using Apodex (requires your own server)
uv run python main.py llm=qwen-3 agent=apodex_0.7_keep5_max200 llm.base_url=http://localhost:61002/v1

# Or using Claude (requires ANTHROPIC_API_KEY in .env)
uv run python main.py llm=claude-3-7 agent=single_agent_keep5

# Or using GPT-5 (requires OPENAI_API_KEY in .env)
uv run python main.py llm=gpt-5 agent=single_agent_keep5

To customize your question, edit main.py line 32:

task_description = "Your custom question here"

The agent will search the web, execute code if needed, and provide an answer with sources.

📖 More details: See apps/miroflow-agent/README.md for available configurations and troubleshooting.

📊 Benchmark Evaluation

This section is for researchers who want to reproduce our benchmark results or evaluate on standard benchmarks.

Run Benchmark Evaluation

Note: For Apodex-0.7, use apodex_0.7_keep5_max200 or apodex_0.7_keep5_max300 (both with context management).

Available Parameters:

You can customize the evaluation by setting the following environment variables before running the script:

| Parameter | Default | Description |
|---|---|---|
| LLM_MODEL | "Apodex" | Agent name identifier |
| BASE_URL | "https://your-api.com/v1" | Base URL of your server |
| NUM_RUNS | Varies by benchmark | Number of evaluation runs (3 for most benchmarks, 8 for GAIA/XBench/FutureX/SEAL-0, 32 for AIME2025) |
| LLM_PROVIDER | "qwen" | LLM provider (e.g., qwen, openai, anthropic) |
| AGENT_SET | "apodex_0.7_keep5_max200" | Agent configuration (apodex_0.7_keep5_max200 or apodex_0.7_keep5_max300) |
| MAX_CONTEXT_LENGTH | 262144 | Maximum context length (256K) |
| MAX_CONCURRENT | 10 | Maximum concurrent tasks |
| PASS_AT_K | 1 | Pass@K evaluation metric |
| TEMPERATURE | 1.0 | Sampling temperature |
| API_KEY | "xxx" | API key for the server |

Example Usage:

# Navigate to the miroflow-agent directory first
cd apps/miroflow-agent

# Basic usage (recommended)
NUM_RUNS=8 LLM_MODEL="Apodex-0.7-mini" BASE_URL="https://your-api.com/v1" bash scripts/run_evaluate_multiple_runs_gaia-validation-text-103.sh

# Customize number of runs and agent configuration (with context management)
LLM_MODEL="Apodex-0.7-mini" \
BASE_URL="https://your-api.com/v1" \
NUM_RUNS=8 \
AGENT_SET="apodex_0.7_keep5_max200" \
bash scripts/run_evaluate_multiple_runs_gaia-validation-text-103.sh

📋 Click to expand all benchmark commands

⚠️ Important for Apodex-0.7: To reproduce our reported results, you must set the correct AGENT_SET:

  • BrowseComp & BrowseComp-ZH: Use AGENT_SET="apodex_0.7_keep5_max300"
  • All other benchmarks: Use AGENT_SET="apodex_0.7_keep5_max200"
# Navigate to the miroflow-agent directory first
cd apps/miroflow-agent

# HLE
NUM_RUNS=3 LLM_MODEL="xxx" BASE_URL="xxx" AGENT_SET="apodex_0.7_keep5_max200" bash scripts/run_evaluate_multiple_runs_hle.sh

# HLE-Text-2158
NUM_RUNS=3 LLM_MODEL="xxx" BASE_URL="xxx" AGENT_SET="apodex_0.7_keep5_max200" bash scripts/run_evaluate_multiple_runs_hle-text-2158.sh

# HLE-Text-500
NUM_RUNS=3 LLM_MODEL="xxx" BASE_URL="xxx" AGENT_SET="apodex_0.7_keep5_max200" bash scripts/run_evaluate_multiple_runs_hle-text-500.sh

# GAIA-Text-103
NUM_RUNS=8 LLM_MODEL="xxx" BASE_URL="xxx" AGENT_SET="apodex_0.7_keep5_max200" bash scripts/run_evaluate_multiple_runs_gaia-validation-text-103.sh

# GAIA-Validation (GAIA-Val-165)
NUM_RUNS=8 LLM_MODEL="xxx" BASE_URL="xxx" AGENT_SET="apodex_0.7_keep5_max200" bash scripts/run_evaluate_multiple_runs_gaia-validation.sh

# BrowseComp-EN (⚠️ use max300)
NUM_RUNS=3 LLM_MODEL="xxx" BASE_URL="xxx" AGENT_SET="apodex_0.7_keep5_max300" bash scripts/run_evaluate_multiple_runs_browsecomp.sh

# BrowseComp-ZH (⚠️ use max300)
NUM_RUNS=3 LLM_MODEL="xxx" BASE_URL="xxx" AGENT_SET="apodex_0.7_keep5_max300" bash scripts/run_evaluate_multiple_runs_browsecomp_zh.sh

# WebWalkerQA
NUM_RUNS=3 LLM_MODEL="xxx" BASE_URL="xxx" AGENT_SET="apodex_0.7_keep5_max200" bash scripts/run_evaluate_multiple_runs_webwalkerqa.sh

# XBench-DeepSearch
NUM_RUNS=8 LLM_MODEL="xxx" BASE_URL="xxx" AGENT_SET="apodex_0.7_keep5_max200" bash scripts/run_evaluate_multiple_runs_xbench_deepsearch.sh

# FRAMES
NUM_RUNS=3 LLM_MODEL="xxx" BASE_URL="xxx" AGENT_SET="apodex_0.7_keep5_max200" bash scripts/run_evaluate_multiple_runs_frames.sh

# SEAL-0
NUM_RUNS=8 LLM_MODEL="xxx" BASE_URL="xxx" AGENT_SET="apodex_0.7_keep5_max200" bash scripts/run_evaluate_multiple_runs_seal-0.sh

# FutureX
NUM_RUNS=8 LLM_MODEL="xxx" BASE_URL="xxx" AGENT_SET="apodex_0.7_keep5_max200" bash scripts/run_evaluate_multiple_runs_futurex.sh

# AIME2025
NUM_RUNS=32 LLM_MODEL="xxx" BASE_URL="xxx" AGENT_SET="apodex_0.7_keep5_max200" bash scripts/run_evaluate_multiple_runs_aime2025.sh

# DeepSearchQA
NUM_RUNS=3 LLM_MODEL="xxx" BASE_URL="xxx" AGENT_SET="apodex_0.7_keep5_max200" bash scripts/run_evaluate_multiple_runs_deepsearchqa.sh

Monitor Evaluation Progress

📊 Click to expand progress monitoring commands
# Navigate to the miroflow-agent directory first
cd apps/miroflow-agent

# For HLE
python benchmarks/check_progress/check_progress_hle.py /path/to/evaluation/logs

# For HLE-Text-2158
python benchmarks/check_progress/check_progress_hle-text-2158.py /path/to/evaluation/logs

# For HLE-Text-500
python benchmarks/check_progress/check_progress_hle-text-500.py /path/to/evaluation/logs

# For BrowseComp-EN
python benchmarks/check_progress/check_progress_browsecomp.py /path/to/evaluation/logs

# For BrowseComp-ZH
python benchmarks/check_progress/check_progress_browsecomp_zh.py /path/to/evaluation/logs

# For GAIA-Validation
python benchmarks/check_progress/check_progress_gaia-validation.py /path/to/evaluation/logs

# For GAIA-Text-103
python benchmarks/check_progress/check_progress_gaia-validation-text-103.py /path/to/evaluation/logs

# For WebWalkerQA
python benchmarks/check_progress/check_progress_webwalkerqa.py /path/to/evaluation/logs

# For Frames
python benchmarks/check_progress/check_progress_frames.py /path/to/evaluation/logs

# For XBench-DeepSearch
python benchmarks/check_progress/check_progress_xbench_deepsearch.py /path/to/evaluation/logs

# For SEAL-0
python benchmarks/check_progress/check_progress_seal-0.py /path/to/evaluation/logs

# For AIME2025
python benchmarks/check_progress/check_progress_aime2025.py /path/to/evaluation/logs

# For DeepSearchQA
python benchmarks/check_progress/check_progress_deepsearchqa.py /path/to/evaluation/logs

🔬 Trace Collection

📋 Click to expand trace collection commands
cd apps/collect-trace

# Collect Traces for SFT
bash scripts/collect_trace_claude37.sh
bash scripts/collect_trace_gpt5.sh

# Collect Traces for DPO
bash scripts/collect_trace_qwen3.sh

❓ FAQ & Troubleshooting

Common Issues

🔧 Click to expand troubleshooting guide

Q: Which version should I use?

A: We recommend Apodex-0.7 ⭐ with the minimal configuration:

  • Apodex-0.7 ⭐: Latest version with 256K context and world-leading performance. Use one of these configs (with context management):
    • apodex_0.7_keep5_max200 (up to 200 turns, recommended for most tasks)
    • apodex_0.7_keep5_max300 (up to 300 turns, only used for BrowseComp and BrowseComp-ZH)

Q: How do I get API keys?

A: You need these keys for minimal setup:

  • SERPER_API_KEY: Get from Serper.dev (Google search API)
  • JINA_API_KEY: Get from Jina.ai (Web scraping)
  • E2B_API_KEY: Get from E2B.dev (Code execution sandbox)
  • SUMMARY_LLM_API_KEY: Your LLM API credentials (for content summarization). Can be a small model like Qwen3-14B or GPT-5-Nano; the choice has minimal impact on performance.
  • OPENAI_API_KEY: Get from OpenAI (Required for benchmark evaluation, used for LLM-as-a-Judge)
  • OPENAI_BASE_URL: Optional, defaults to https://api.openai.com/v1. Can be changed to use OpenAI-compatible APIs.

Q: Agent server connection errors

A: Common issues (a quick scripted connectivity check follows this list):

  • Check base URL format: Should end with /v1 (e.g., https://your-api.com/v1)
  • Verify API key: Ensure API_KEY is set correctly in environment or script
  • Check server status: Make sure your server is running and accessible
  • Network issues: Verify firewall/network settings allow connections
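
A minimal, hypothetical check that exercises the first two items, assuming the BASE_URL and API_KEY environment variables used by the evaluation scripts:

# Hypothetical connectivity check for an OpenAI-compatible server.
import os
import requests

base_url = os.environ.get("BASE_URL", "http://localhost:61002/v1").rstrip("/")
assert base_url.endswith("/v1"), "base URL should end with /v1"

resp = requests.get(
    f"{base_url}/models",
    headers={"Authorization": f"Bearer {os.environ.get('API_KEY', 'EMPTY')}"},
    timeout=10,
)
# HTTP 200 with a model list means the server is reachable and the key is accepted.
print(resp.status_code, resp.text[:200])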

Q: Evaluation script fails to run

A: Troubleshooting steps:

  1. Check working directory: Make sure you're in apps/miroflow-agent directory
  2. Verify environment: Run uv sync to ensure dependencies are installed
  3. Check .env file: Ensure all required environment variables are set
  4. Review logs: Check logs/ directory for detailed error messages
  5. Verify data path: Ensure benchmark data is downloaded and in correct location

Q: uv sync fails with memory allocation error on WSL

A: WSL2 imposes a memory cap that can cause uv sync to fail when building heavy packages (e.g. transformers, pillow). Two options:

  1. Increase WSL2 memory limit (recommended): Create or edit %UserProfile%\.wslconfig on your Windows host, then restart WSL (wsl --shutdown):

    [wsl2]
    memory=8GB
  2. Limit parallel package builds (no restart required): Set the UV_CONCURRENT_BUILDS environment variable before running uv sync:

    UV_CONCURRENT_BUILDS=1 uv sync

Q: Out of memory errors

A: Solutions:

  • Reduce context length: Set MAX_CONTEXT_LENGTH to a smaller value (e.g., 131072 for 128K)
  • Use context management with fewer turns:
    • For Apodex-0.7: Use apodex_0.7_keep5_max200 or apodex_0.7_keep5_max300 (with context management)
  • Reduce concurrent tasks: Set MAX_CONCURRENT to a smaller number (e.g., 5)
  • Use smaller models:
    • For Apodex-0.7: Try 30B instead of 235B
    • For v1.0: Try 8B or 30B instead of 72B

Q: Tool execution errors

A: Common fixes:

  • E2B errors: Verify E2B_API_KEY is valid and account has credits
  • Serper errors: Check SERPER_API_KEY and rate limits
  • Jina errors: Verify JINA_API_KEY and JINA_BASE_URL are correct
  • LLM summarization errors: Check SUMMARY_LLM_* variables and agent availability

Q: How to monitor long-running evaluations?

A: Use the progress monitoring scripts:

cd apps/miroflow-agent
python benchmarks/check_progress/check_progress_<benchmark_name>.py /path/to/logs

The scripts show completion status, elapsed time, and estimated remaining time.

📄 License

This project is licensed under the Apache 2.0 License - see the LICENSE file for details.

🙏 Acknowledgments

We extend our sincere gratitude to:

  • πŸ† Benchmark Contributors for the comprehensive evaluation datasets
  • 🌍 Open Source Community for the tools and libraries that make this possible
  • πŸ‘₯ All Contributors who have helped make Apodex better
