SparkEngineAI/QuantClaw-plugin

QuantClaw: Precision Where It Matters for OpenClaw

Chinese documentation

OpenClaw Plugin | Blog | Paper (arXiv) | Routing tiers | MIT License

[Figure: QuantClaw overview]

QuantClaw is a plug-and-play OpenClaw plugin that routes requests to quantized models by task type. It classifies each incoming request, maps it to a precision tier (4bit, 8bit, or 16bit), and routes the request to the matching model target, so you can balance quality, latency, and cost without asking users to choose a precision manually.

🔍 About QuantClaw

QuantClaw is built from quantization studies on OpenClaw workloads rather than from intuition alone. We evaluate quantized and high-precision models across 24 task types, 104 tasks, and 6 models, at scales from 9B to 744B parameters.

Results on Claw-Eval (release v0.0.0):

| Model | Params (B) | BF16 / FP8 | NVFP4 |
|---|---|---|---|
| GLM-4.7-Flash | 30 | 0.6370 | 0.6034 |
| GLM-5 | 744 | 0.7130 | 0.7229 |
| MiniMax-M2.5 | 229 | 0.6760 | 0.6823 |
| Qwen3.5-9B | 9 | 0.4267 | 0.4107 |
| Qwen3.5-35B-A3B | 35 | 0.6686 | 0.6549 |
| Qwen3.5-397B-A17B | 397 | 0.7048 | 0.6937 |

  • High-sensitivity tasks such as coding, safety, and complex workflows benefit from higher precision.
  • Low-sensitivity tasks such as research, multimodal understanding, comprehension, knowledge lookup, office QA, and data analysis can often run well on lower precision.
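
To make the table easier to read, the gap between the NVFP4 and BF16 / FP8 scores can be computed per model. The sketch below only restates the numbers from the table above; the names `SCORES` and `nvfp4_delta` are illustrative and are not part of QuantClaw.

```python
# Claw-Eval scores copied from the table above: (BF16 / FP8, NVFP4).
SCORES = {
    "GLM-4.7-Flash": (0.6370, 0.6034),
    "GLM-5": (0.7130, 0.7229),
    "MiniMax-M2.5": (0.6760, 0.6823),
    "Qwen3.5-9B": (0.4267, 0.4107),
    "Qwen3.5-35B-A3B": (0.6686, 0.6549),
    "Qwen3.5-397B-A17B": (0.7048, 0.6937),
}


def nvfp4_delta(scores):
    """Return NVFP4 minus BF16 / FP8 score per model, rounded to 4 places."""
    return {name: round(low - high, 4) for name, (high, low) in scores.items()}


DELTAS = nvfp4_delta(SCORES)

if __name__ == "__main__":
    for name, delta in DELTAS.items():
        print(f"{name:<20} {delta:+.4f}")
```

On these numbers the gap ranges from -0.0336 (GLM-4.7-Flash) to +0.0099 (GLM-5).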

[Figure: sensitivity chart]

✨ Key Features

  • Automatic Adaptation: Rules first, then a judge model for requests.
  • Intelligent Routing: Map each query to 4bit, 8bit, or 16bit targets.
  • Full Customizability: Tune task types, patterns, targets, pricing, and backends.
  • Built-in Observability: Track routing, tokens, cost, sessions, and live config changes.

🚀 Quick Start

Install

# Prerequisite: OpenClaw is already installed.

# Install from Clawhub (recommended)
openclaw plugins install clawhub:@sparkengineai/quantclaw

# If OpenClaw is running from a source checkout and the CLI is not on PATH:
cd /path/to/openclaw
node openclaw.mjs plugins install @sparkengineai/quantclaw

# Or install from source
git clone https://github.com/SparkEngineAI/QuantClaw-plugin.git ./quantclaw
openclaw plugins install ./quantclaw

# If the OpenClaw CLI is not on PATH:
cd /path/to/openclaw
node openclaw.mjs plugins install /path/to/quantclaw

Create or bootstrap the runtime config

QuantClaw reads its runtime config from:

~/.openclaw/quantclaw.json

If the file does not exist, starting OpenClaw with the plugin enabled will generate a default quantclaw.json. If you are working from this repository directly, you can also start from the provided example:

cp config.example.json ~/.openclaw/quantclaw.json

Edit the detector chain and targets

{
  "quant": {
    "enabled": true,
    "detectors": ["ruleDetector", "loadModelDetector"],
    "judge": {
      "endpoint": "http://127.0.0.1:8000",
      "model": "BAAI/bge-m3",
      "providerType": "openai-compatible",
      "apiKey": "",
      "cacheTtlMs": 300000
    }
  }
}
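
The `detectors` array is ordered, and `judge.cacheTtlMs` suggests that judge results are cached for a limited time. The following Python sketch illustrates one way such a chain could behave. It is not the plugin's implementation: it assumes each detector returns a task-type id or nothing, that the first detector to answer wins, and that the cache is keyed on the request text. `TtlCache`, `cached`, and `run_chain` are hypothetical names.

```python
import time
from typing import Callable, Optional

Detector = Callable[[str], Optional[str]]  # returns a task-type id or None


class TtlCache:
    """Tiny in-memory cache whose entries expire after ttl_ms milliseconds."""

    def __init__(self, ttl_ms: int, clock: Callable[[], float] = time.monotonic):
        self.ttl_s = ttl_ms / 1000.0
        self.clock = clock
        self._items: dict[str, tuple[float, str]] = {}

    def get(self, key: str) -> Optional[str]:
        hit = self._items.get(key)
        if hit is None or self.clock() - hit[0] > self.ttl_s:
            self._items.pop(key, None)  # drop missing or expired entries
            return None
        return hit[1]

    def put(self, key: str, value: str) -> None:
        self._items[key] = (self.clock(), value)


def cached(detector: Detector, cache: TtlCache) -> Detector:
    """Wrap a (possibly slow) detector so repeated prompts skip the call."""
    def wrapper(prompt: str) -> Optional[str]:
        found = cache.get(prompt)
        if found is None:
            found = detector(prompt)
            if found is not None:
                cache.put(prompt, found)
        return found
    return wrapper


def run_chain(prompt: str, detectors: list[Detector], default: str) -> str:
    """Try detectors in order; the first one that returns an id wins."""
    for detect in detectors:
        task_type = detect(prompt)
        if task_type is not None:
            return task_type
    return default
```

With `cacheTtlMs` set to 300000 as in the example config, a repeated prompt in this sketch would reach the judge again only after five minutes.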

Start OpenClaw and open the dashboard

http://127.0.0.1:18789/plugins/quantclaw/stats

⚙️ Configuration Notes

The runtime schema supports:

  • ordered detectors: ruleDetector, loadModelDetector
  • per-task-type id, description, precision, keywords, and patterns
  • per-tier model targets with independent provider, model, endpoint, api key, and pricing
  • model-level pricing overrides for cost reporting
  • hot reload when ~/.openclaw/quantclaw.json changes
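
Hot reload can be pictured as polling the config file's modification time and re-reading it when it changes. The sketch below is an assumption about the general technique, not a description of how QuantClaw watches the file; `ConfigWatcher` is a hypothetical name.

```python
import json
import os
from typing import Optional


class ConfigWatcher:
    """Reload a JSON config whenever the file's modification time changes."""

    def __init__(self, path: str):
        self.path = os.path.expanduser(path)
        self._mtime_ns: Optional[int] = None
        self.config: dict = {}

    def poll(self) -> bool:
        """Return True if the file changed and was reloaded successfully."""
        try:
            mtime_ns = os.stat(self.path).st_mtime_ns
        except FileNotFoundError:
            return False
        if mtime_ns == self._mtime_ns:
            return False
        try:
            with open(self.path, encoding="utf-8") as fh:
                new_config = json.load(fh)
        except json.JSONDecodeError:
            # Keep the last good config while the file is mid-edit.
            return False
        self._mtime_ns = mtime_ns
        self.config = new_config
        return True
```

A caller would create `ConfigWatcher("~/.openclaw/quantclaw.json")` and call `poll()` periodically. Invalid JSON leaves the previous config in place, which is a design choice of this sketch.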

Example taskTypes config:

{
  "taskTypes": [
    {
      "id": "coding",
      "precision": "16bit",
      "description": "code review, bug analysis, implementation, debugging, kernels, async behavior, web development",
      "keywords": ["code", "debug", "bug", "Python", "CUDA", "编程", "代码"],
      "patterns": [
        "fix the bug in this repository",
        "(?=.*(?:refactor|重构))(?=.*(?:typescript|ts|node)).*"
      ]
    }
  ],
  "defaultTaskType": "standard"
}
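
To show how `keywords` and `patterns` might be evaluated, here is a minimal Python sketch of a rule-based matcher. The matching semantics are assumptions, not documented behavior: keywords are treated as case-insensitive substrings, patterns as case-insensitive regular expressions, and the first matching task type wins. `match_task_type` and `classify` are hypothetical names.

```python
import re
from typing import Optional

# Trimmed copy of the taskTypes example above.
CONFIG = {
    "taskTypes": [
        {
            "id": "coding",
            "precision": "16bit",
            "keywords": ["code", "debug", "bug", "Python", "CUDA", "编程", "代码"],
            "patterns": [
                "fix the bug in this repository",
                "(?=.*(?:refactor|重构))(?=.*(?:typescript|ts|node)).*",
            ],
        }
    ],
    "defaultTaskType": "standard",
}


def match_task_type(prompt: str, config: dict) -> Optional[dict]:
    """Return the first task type whose keyword or pattern matches, else None."""
    lowered = prompt.lower()
    for task in config["taskTypes"]:
        if any(kw.lower() in lowered for kw in task.get("keywords", [])):
            return task
        if any(re.search(p, prompt, re.IGNORECASE) for p in task.get("patterns", [])):
            return task
    return None


def classify(prompt: str, config: dict) -> str:
    """Resolve a prompt to a task-type id, falling back to defaultTaskType."""
    task = match_task_type(prompt, config)
    return task["id"] if task else config["defaultTaskType"]
```

The second pattern uses two lookaheads, so it matches only when a refactoring term and a TypeScript/Node term both appear, in any order. Under the substring assumption, short keywords such as `code` also match inside longer words like `decode`, which is worth keeping in mind when choosing keywords.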

Example targets config:

{
  "targets": {
    "4bit": {
      "provider": "quantclaw-4bit",
      "model": "glm-4.7-flash-int4-autoround",
      "endpoint": "https://api.example.com/v1",
      "apiKey": "${QC_4BIT_API_KEY}",
      "displayName": "4-bit Target",
      "pricing": {
        "inputPer1M": 0.051,
        "outputPer1M": 0.34
      }
    },
    "16bit": {
      "provider": "quantclaw-16bit",
      "model": "glm-4.7-flash",
      "endpoint": "https://api.openai.com/v1",
      "apiKey": "${QC_16BIT_API_KEY}",
      "displayName": "16-bit Target",
      "pricing": {
        "inputPer1M": 0.06,
        "outputPer1M": 0.4
      }
    }
  }
}

Example modelPricing overrides:

{
  "modelPricing": {
    "glm-4.7-flash": {
      "inputPer1M": 0.06,
      "outputPer1M": 0.4
    },
    "glm-4.7-flash-int4-autoround": {
      "inputPer1M": 0.051,
      "outputPer1M": 0.34
    }
  }
}

Target-level pricing is used first for that precision tier. If it is absent, QuantClaw falls back to modelPricing for cost reporting.
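
The fallback order can be sketched in Python as follows. The key names come from the examples above; the per-million-token interpretation is inferred from the `inputPer1M` / `outputPer1M` names, and the function names and the zero-cost default for unknown pricing are assumptions made for illustration.

```python
from typing import Optional

CONFIG = {
    "targets": {
        "4bit": {
            "model": "glm-4.7-flash-int4-autoround",
            "pricing": {"inputPer1M": 0.051, "outputPer1M": 0.34},
        },
        # No target-level pricing here, so modelPricing is consulted.
        "16bit": {"model": "glm-4.7-flash"},
    },
    "modelPricing": {
        "glm-4.7-flash": {"inputPer1M": 0.06, "outputPer1M": 0.4},
    },
}


def resolve_pricing(config: dict, tier: str) -> Optional[dict]:
    """Prefer target-level pricing; otherwise look the model up in modelPricing."""
    target = config["targets"][tier]
    if target.get("pricing"):
        return target["pricing"]
    return config.get("modelPricing", {}).get(target["model"])


def estimate_cost(config: dict, tier: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request from token counts and per-1M prices."""
    pricing = resolve_pricing(config, tier)
    if pricing is None:
        return 0.0  # assumption: unknown pricing is reported as zero cost
    return (input_tokens * pricing["inputPer1M"]
            + output_tokens * pricing["outputPer1M"]) / 1_000_000
```

For example, `estimate_cost(CONFIG, "16bit", 2_000_000, 500_000)` resolves pricing through `modelPricing` and comes to about 0.32 in whatever currency the prices are expressed in.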

🧠 loadModelDetector Backends

loadModelDetector supports either a local embedding-based router exposed through an OpenAI-compatible API or a regular OpenAI-compatible LLM judge.

Build a local embedding router index:

python router/embedding_task_router.py --model-name BAAI/bge-m3 --device cuda --config-path ~/.openclaw/quantclaw.json --output-dir ./embedding_router_index-bge-m3 build --print-summary

Serve that router as an OpenAI-compatible endpoint:

python router/embedding_task_router_server.py --model-name BAAI/bge-m3 --device cuda --output-dir ./embedding_router_index-bge-m3 --port 8012

If your machine does not have a GPU, change --device cuda to --device cpu.

If you do not want to run the local embedding router, you can point quant.judge.endpoint at any OpenAI-compatible LLM endpoint instead.
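
As a rough picture of what an embedding-based router does, the sketch below picks the task type whose reference vector is most similar to the query vector. It is not the code in `router/embedding_task_router.py`: the three-dimensional vectors stand in for real BAAI/bge-m3 embeddings, and the `min_score` threshold, the `route` function, and the nearest-vector strategy are all assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Toy vectors standing in for embeddings of each task type's description.
INDEX = {
    "coding": [0.9, 0.1, 0.0],
    "research": [0.1, 0.9, 0.1],
}


def route(query_vec, index, default="standard", min_score=0.5):
    """Return the most similar task-type id, or default if nothing is close."""
    best_id, best_score = default, min_score
    for task_id, vec in index.items():
        score = cosine(query_vec, vec)
        if score > best_score:
            best_id, best_score = task_id, score
    return best_id
```

A query vector close to the `coding` entry, such as `[0.8, 0.2, 0.0]`, routes to `coding`, while a vector that resembles neither entry falls back to the default task type.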

🙏 Acknowledgements

We especially acknowledge:

👥 Core Contributors

Manyi Zhang, Ji-Fu Li*, Zhongao Sun, Xiaohao Liu, Zhenhua Dong, Xianzhi Yu, Haoli Bai (Project Lead), Xiaobo Xia

Follow SparkEngineAI on WeChat, where we share recent progress in AI infrastructure and aim to help readers learn and find inspiration.


📖 Citation

If QuantClaw helps your research, engineering work, or benchmark studies, please cite:

@article{zhang2026quantclaw,
  title={QuantClaw: Precision Where It Matters for OpenClaw},
  author={Zhang, Manyi and Li, Ji-Fu and Sun, Zhongao and Liu, Xiaohao and Dong, Zhenhua and Yu, Xianzhi and Bai, Haoli and Xia, Xiaobo},
  journal={arXiv preprint arXiv:2604.22577},
  year={2026}
}
