QuantClaw: Precision Where It Matters for OpenClaw

QuantClaw is a plug-and-play OpenClaw plugin that picks a quantization level for each request based on its task type. It classifies each incoming request, maps it to a precision tier (4bit, 8bit, or 16bit), and routes the request to the matching model target, so you can balance quality, latency, and cost without asking users to choose a precision manually.

🔍 About QuantClaw

QuantClaw is built from quantization studies on OpenClaw workloads rather than from intuition alone. We evaluate quantized and high-precision models across 24 task types, 104 tasks, and 6 models, with model sizes ranging from 9B to 744B parameters.

Results on Claw-Eval (release v0.0.0):

Model	Params (B)	BF16 / FP8	NVFP4
GLM-4.7-Flash	30	0.6370	0.6034
GLM-5	744	0.7130	0.7229
MiniMax-M2.5	229	0.6760	0.6823
Qwen3.5-9B	9	0.4267	0.4107
Qwen3.5-35B-A3B	35	0.6686	0.6549
Qwen3.5-397B-A17B	397	0.7048	0.6937
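
To make the gap between the two columns easier to read, the relative change from the BF16 / FP8 score to the NVFP4 score can be computed per model. The sketch below only does arithmetic on the numbers in the table; the helper name `relative_change` is ours, not part of QuantClaw.

```python
# Scores copied from the Claw-Eval table above: (BF16 / FP8 baseline, NVFP4).
SCORES = {
    "GLM-4.7-Flash": (0.6370, 0.6034),
    "GLM-5": (0.7130, 0.7229),
    "MiniMax-M2.5": (0.6760, 0.6823),
    "Qwen3.5-9B": (0.4267, 0.4107),
    "Qwen3.5-35B-A3B": (0.6686, 0.6549),
    "Qwen3.5-397B-A17B": (0.7048, 0.6937),
}


def relative_change(baseline: float, quantized: float) -> float:
    """Relative score change of the quantized model, in percent."""
    return (quantized - baseline) / baseline * 100.0


for name, (baseline, nvfp4) in SCORES.items():
    print(f"{name:<20} {relative_change(baseline, nvfp4):+.2f}%")
```

In this table, GLM-5 and MiniMax-M2.5 score slightly higher under NVFP4, while the other four models lose some accuracy.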

High-sensitivity tasks such as coding, safety, and complex workflows benefit from higher precision.
Low-sensitivity tasks such as research, multimodal understanding, comprehension, knowledge lookup, office QA, and data analysis can often run well on lower precision.

✨ Key Features

Automatic Adaptation: Rules first, then a judge model for requests.
Intelligent Routing: Map each query to 4bit, 8bit, or 16bit targets.
Full Customizability: Tune task types, patterns, targets, pricing, and backends.
Built-in Observability: Track routing, tokens, cost, sessions, and live config changes.

🚀 Quick Start

Install

# Prerequisite: OpenClaw is already installed.

# Install from Clawhub (recommended)
openclaw plugins install clawhub:@sparkengineai/quantclaw

# If OpenClaw is running from a source checkout and the CLI is not on PATH:
cd /path/to/openclaw
node openclaw.mjs plugins install @sparkengineai/quantclaw

# Or install from source
git clone https://github.com/SparkEngineAI/QuantClaw-plugin.git ./quantclaw
openclaw plugins install ./quantclaw

# If the OpenClaw CLI is not on PATH:
cd /path/to/openclaw
node openclaw.mjs plugins install /path/to/quantclaw

Create or bootstrap the runtime config

QuantClaw reads its runtime config from:

~/.openclaw/quantclaw.json

If the file does not exist, starting OpenClaw with the plugin enabled will generate a default quantclaw.json. If you are working from this repository directly, you can also start from the provided example:

cp config.example.json ~/.openclaw/quantclaw.json

Edit the detector chain and targets

{
  "quant": {
    "enabled": true,
    "detectors": ["ruleDetector", "loadModelDetector"],
    "judge": {
      "endpoint": "http://127.0.0.1:8000",
      "model": "BAAI/bge-m3",
      "providerType": "openai-compatible",
      "apiKey": "",
      "cacheTtlMs": 300000
    }
  }
}
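
The `detectors` array is ordered, which suggests a "first verdict wins" chain: cheap rules run first, and the judge is consulted only when the rules do not decide. The sketch below illustrates that idea only. The two detector functions are hypothetical stand-ins, and the real plugin may combine verdicts differently.

```python
from typing import Callable, Optional

# A detector takes the request text and returns a task-type id, or None if undecided.
Detector = Callable[[str], Optional[str]]


def rule_detector(text: str) -> Optional[str]:
    # Hypothetical stand-in for ruleDetector: a single hard-coded keyword rule.
    return "coding" if "debug" in text.lower() else None


def judge_detector(text: str) -> Optional[str]:
    # Hypothetical stand-in for loadModelDetector; a real one would query
    # the service configured under quant.judge.endpoint.
    return "standard"


def run_chain(text: str, detectors: list[Detector], default: str = "standard") -> str:
    """Return the first non-None verdict, trying detectors in configured order."""
    for detect in detectors:
        verdict = detect(text)
        if verdict is not None:
            return verdict
    return default


chain = [rule_detector, judge_detector]
print(run_chain("please debug this script", chain))  # decided by the rule
print(run_chain("summarize this article", chain))    # falls through to the judge
```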

Start OpenClaw and open the dashboard

http://127.0.0.1:18789/plugins/quantclaw/stats

⚙️ Configuration Notes

The runtime schema supports:

ordered detectors: ruleDetector, loadModelDetector
per-task-type id, description, precision, keywords, and patterns
per-tier model targets with independent provider, model, endpoint, api key, and pricing
model-level pricing overrides for cost reporting
hot reload when ~/.openclaw/quantclaw.json changes
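
This README does not describe how hot reload is implemented. One common approach is to poll the file's modification time and re-read the JSON when it changes. The sketch below shows that approach under that assumption; `ConfigWatcher` is a hypothetical name, not a QuantClaw class.

```python
import json
import os
from typing import Optional


class ConfigWatcher:
    """Re-read a JSON config file whenever its modification time changes."""

    def __init__(self, path: str) -> None:
        self.path = path
        self._mtime_ns: Optional[int] = None  # mtime seen at the last reload
        self.config: dict = {}

    def poll(self) -> bool:
        """Reload if the file changed since the last poll; return True on reload."""
        mtime_ns = os.stat(self.path).st_mtime_ns
        if mtime_ns == self._mtime_ns:
            return False
        with open(self.path, encoding="utf-8") as fh:
            self.config = json.load(fh)
        self._mtime_ns = mtime_ns
        return True
```

A caller would construct `ConfigWatcher(os.path.expanduser("~/.openclaw/quantclaw.json"))` and call `poll()` on a timer or before each request.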

Example taskTypes config:

{
  "taskTypes": [
    {
      "id": "coding",
      "precision": "16bit",
      "description": "code review, bug analysis, implementation, debugging, kernels, async behavior, web development",
      "keywords": ["code", "debug", "bug", "Python", "CUDA", "编程", "代码"],
      "patterns": [
        "fix the bug in this repository",
        "(?=.*(?:refactor|重构))(?=.*(?:typescript|ts|node)).*"
      ]
    }
  ],
  "defaultTaskType": "standard"
}
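
To show how `keywords`, `patterns`, and `defaultTaskType` could interact, here is a minimal sketch of a rule matcher. It assumes case-insensitive substring matching for keywords and regular-expression search for patterns; the plugin's actual matching semantics may differ, and `match_task_type` and `classify` are illustrative names.

```python
import re
from typing import Optional

# A trimmed copy of the taskTypes example above.
TASK_TYPES = [
    {
        "id": "coding",
        "precision": "16bit",
        "keywords": ["code", "debug", "bug", "Python", "CUDA"],
        "patterns": [
            "fix the bug in this repository",
            "(?=.*(?:refactor|重构))(?=.*(?:typescript|ts|node)).*",
        ],
    }
]


def match_task_type(text: str, task_types: list[dict]) -> Optional[dict]:
    """Return the first task type whose keyword or pattern matches, else None."""
    lowered = text.lower()
    for task in task_types:
        # Assumption: a keyword matches as a case-insensitive substring.
        if any(kw.lower() in lowered for kw in task.get("keywords", [])):
            return task
        # Assumption: a pattern is a regular expression searched case-insensitively.
        if any(re.search(p, text, re.IGNORECASE) for p in task.get("patterns", [])):
            return task
    return None


def classify(text: str, task_types: list[dict], default_task_type: str = "standard") -> str:
    """Return the matched task-type id, or the default when nothing matches."""
    match = match_task_type(text, task_types)
    return match["id"] if match else default_task_type


print(classify("Refactor the Node service", TASK_TYPES))
print(classify("What is the capital of France?", TASK_TYPES))
```

The second pattern uses two lookaheads, so it matches only when both a refactoring term and a TypeScript/Node term appear, in any order. As written, its `ts` alternative also matches inside longer words such as `tests`; `\bts\b` would be stricter if that matters for your workload.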

Example targets config:

{
  "targets": {
    "4bit": {
      "provider": "quantclaw-4bit",
      "model": "glm-4.7-flash-int4-autoround",
      "endpoint": "https://api.example.com/v1",
      "apiKey": "${QC_4BIT_API_KEY}",
      "displayName": "4-bit Target",
      "pricing": {
        "inputPer1M": 0.051,
        "outputPer1M": 0.34
      }
    },
    "16bit": {
      "provider": "quantclaw-16bit",
      "model": "glm-4.7-flash",
      "endpoint": "https://api.openai.com/v1",
      "apiKey": "${QC_16BIT_API_KEY}",
      "displayName": "16-bit Target",
      "pricing": {
        "inputPer1M": 0.06,
        "outputPer1M": 0.4
      }
    }
  }
}
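
The `${QC_4BIT_API_KEY}` and `${QC_16BIT_API_KEY}` values look like environment-variable placeholders. This README does not state how QuantClaw resolves them, so the following is only a sketch of one plausible approach; `expand_env` and `resolve_target` are hypothetical helpers.

```python
import os
import re

# Matches ${NAME}, where NAME is a conventional environment-variable identifier.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(value: str, env=os.environ) -> str:
    """Replace each ${NAME} with env[NAME]; raise KeyError if NAME is missing."""

    def _sub(match: re.Match) -> str:
        name = match.group(1)
        if name not in env:
            raise KeyError(f"environment variable {name} is not set")
        return env[name]

    return _PLACEHOLDER.sub(_sub, value)


def resolve_target(targets: dict, precision: str, env=os.environ) -> dict:
    """Return a copy of the target for a precision tier with its apiKey expanded."""
    target = dict(targets[precision])  # shallow copy; the input is left untouched
    target["apiKey"] = expand_env(target.get("apiKey", ""), env)
    return target
```

Failing loudly on a missing variable, rather than silently sending an empty key, tends to make misconfiguration easier to spot.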

Example modelPricing overrides:

{
  "modelPricing": {
    "glm-4.7-flash": {
      "inputPer1M": 0.06,
      "outputPer1M": 0.4
    },
    "glm-4.7-flash-int4-autoround": {
      "inputPer1M": 0.051,
      "outputPer1M": 0.34
    }
  }
}

Target-level pricing is used first for that precision tier. If it is absent, QuantClaw falls back to modelPricing for cost reporting.
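
The fallback order and the per-million-token arithmetic can be sketched as follows. The field names come from the examples above; the function names and the token counts are illustrative assumptions, and the currency is whatever unit your config prices use.

```python
from typing import Optional


def pick_pricing(target: dict, model_pricing: dict) -> Optional[dict]:
    """Prefer the target's own pricing; otherwise look the model up in modelPricing."""
    return target.get("pricing") or model_pricing.get(target.get("model"))


def estimate_cost(input_tokens: int, output_tokens: int, pricing: dict) -> float:
    """Cost of one request, with prices quoted per one million tokens."""
    return (
        input_tokens * pricing["inputPer1M"]
        + output_tokens * pricing["outputPer1M"]
    ) / 1_000_000


# A target without its own pricing block falls back to modelPricing.
target = {"model": "glm-4.7-flash"}
model_pricing = {"glm-4.7-flash": {"inputPer1M": 0.06, "outputPer1M": 0.4}}

pricing = pick_pricing(target, model_pricing)
cost = estimate_cost(200_000, 50_000, pricing)
print(f"{cost:.4f}")  # 200k input at 0.06/1M plus 50k output at 0.4/1M
```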

🧠 `loadModelDetector` Backends

loadModelDetector supports either a local embedding-based router exposed through an OpenAI-compatible API or a regular OpenAI-compatible LLM judge.

Build a local embedding router index:

python router/embedding_task_router.py --model-name BAAI/bge-m3 --device cuda --config-path ~/.openclaw/quantclaw.json --output-dir ./embedding_router_index-bge-m3 build --print-summary

Serve that router as an OpenAI-compatible endpoint:

python router/embedding_task_router_server.py --model-name BAAI/bge-m3 --device cuda --output-dir ./embedding_router_index-bge-m3 --port 8012

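If your machine does not have a GPU, change --device cuda to --device cpu. Make sure quant.judge.endpoint points at the address the server actually listens on: the command above uses port 8012, whereas the earlier config example uses port 8000.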

If you do not want to run the local embedding router, you can point quant.judge.endpoint at any OpenAI-compatible LLM endpoint instead.
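
For intuition about the embedding-based option, the sketch below shows the general technique of nearest-neighbor routing by cosine similarity. It is not the code in router/embedding_task_router.py, whose index format is not described here. The toy vectors stand in for real embeddings such as those produced by BAAI/bge-m3.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def route(query_vec: list[float], index: dict[str, list[float]]) -> tuple[str, float]:
    """Return (task_type_id, similarity) for the closest reference vector."""
    best = max(index, key=lambda tid: cosine(query_vec, index[tid]))
    return best, cosine(query_vec, index[best])


# Toy 3-dimensional "embeddings"; a real index would hold model-produced vectors.
INDEX = {"coding": [0.9, 0.1, 0.0], "research": [0.1, 0.9, 0.2]}

task_id, score = route([0.8, 0.2, 0.1], INDEX)
print(task_id, round(score, 3))
```

A production router would typically also apply a similarity threshold and fall back to the default task type when no reference vector is close enough.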

🙏 Acknowledgements

We especially acknowledge:

👥 Core Contributors

Manyi Zhang, Ji-Fu Li*, Zhongao Sun, Xiaohao Liu, Zhenhua Dong, Xianzhi Yu, Haoli Bai (Project Lead), Xiaobo Xia

Follow SparkEngineAI on WeChat, where we share cutting-edge progress in AI infrastructure and hope to help everyone learn and draw inspiration.

📖 Citation

If QuantClaw helps your research, engineering work, or benchmark studies, please cite:

@article{zhang2026quantclaw,
  title={QuantClaw: Precision Where It Matters for OpenClaw},
  author={Zhang, Manyi and Li, Ji-Fu and Sun, Zhongao and Liu, Xiaohao and Dong, Zhenhua and Yu, Xianzhi and Bai, Haoli and Xia, Xiaobo},
  journal={arXiv preprint arXiv:2604.22577},
  year={2026}
}
