v1.1.0 — Node Model Evaluation & Routing

Latest

Martin-DLC released this 28 Jun 16:56

77d3f8a

Highlights

12 Development Cases
16 Node Benchmark Cases
48-call DeepSeek V4 Pilot
Node Model Routing Matrix
Runtime Primary/Fallback routing
Routing audit metadata
DEV-01 and DEV-05 live retests

Pilot Results

48 completed
43 passed
5 failed
0 request errors
CNY 0.893536 estimated cost

What Shipped

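Established 12 formal Development Cases and 16 Node Benchmark Cases
Completed a formal node-level DeepSeek V4 Pilot of 48 calls
Produced and froze the Node Model Routing Matrix
Wired the Routing Matrix into the Architecture C runtime
Added model-routing audit fields to the Runtime
Ran fixed-routing live retests on DEV-01 and DEV-05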

What Changed

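Added Benchmark contracts, datasets, Runner, and Live Client
Added cost and budget guardrails
Added the node routing matrix
Architecture C supports an explicit --model-routing option
Unevaluated nodes continue to use the default model
Fallback handles technical errors only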
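
The routing behavior listed above (default model for unevaluated nodes, fallback on technical errors only) can be sketched roughly as follows. This is a minimal illustration, not the project's actual runtime code: `TechnicalError`, `QualityError`, `Route`, and `call_with_routing` are hypothetical names introduced here, and the `invoke` callable stands in for whatever client performs the model call.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


class TechnicalError(Exception):
    """Hypothetical: a transport-level failure (timeout, HTTP 5xx, ...)."""


class QualityError(Exception):
    """Hypothetical: a schema or business-quality failure in the output."""


@dataclass(frozen=True)
class Route:
    primary: str
    fallback: str


def call_with_routing(
    node: str,
    routes: Dict[str, Route],
    default_model: str,
    invoke: Callable[[str], str],
) -> Tuple[str, str]:
    """Return (model_used, output) for one node call."""
    route = routes.get(node)
    if route is None:
        # Unevaluated node: stay on the default model, no fallback.
        return default_model, invoke(default_model)
    try:
        return route.primary, invoke(route.primary)
    except TechnicalError:
        # Only technical errors reach the fallback model.
        # QualityError is deliberately not caught, so it propagates.
        return route.fallback, invoke(route.fallback)
```

Keeping `QualityError` outside the `except` clause is what prevents a schema or business-quality failure from being silently retried on the fallback model.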

Frozen Pilot Facts

Planned: 48
Completed: 48
Passed: 43
Failed: 5
Request Errors: 0
Unknown Cost Runs: 0
Estimated Cost (CNY): 0.893536

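The 43/48 figure here is the node-level Pilot pass count:

It is not the Architecture C end-to-end accuracy
It is not a production-grade success rate
Each node has only 4 Pilot cases
Only 3 configurations from a single Provider (DeepSeek) were compared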
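
The frozen numbers can be cross-checked with a few lines of arithmetic. The sketch below assumes the Pilot covers 4 nodes (the four nodes listed under Routing Matrix Outcome), which is one reading consistent with 16 Node Benchmark Cases and 48 planned calls. The per-call average cost is derived here and is not a figure reported by the Pilot itself.

```python
NODES = 4            # assumption: the four nodes in the Routing Matrix
CASES_PER_NODE = 4   # from the notes: 4 Pilot cases per node
CONFIGS = 3          # from the notes: 3 DeepSeek configurations

planned_calls = NODES * CASES_PER_NODE * CONFIGS
pass_rate = 43 / 48               # node-level pass count, not end-to-end accuracy
avg_cost_cny = 0.893536 / 48      # derived average estimated cost per call

print(planned_calls)              # 48
print(f"{pass_rate:.2%}")         # 89.58%
print(f"{avg_cost_cny:.6f}")      # 0.018615
```

With only 4 cases per node and no repeated sampling, the 89.58% figure is a small-sample count and should not be read as a stable rate.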

Routing Matrix Outcome

fact_extraction
- Primary: ds-v4-flash-non-thinking
- Fallback: ds-v4-pro-thinking-high
underlying_pain
- Primary: ds-v4-flash-non-thinking
- Fallback: ds-v4-pro-non-thinking
information_gap
- Primary: ds-v4-flash-non-thinking
- Fallback: ds-v4-pro-non-thinking
solution_recommendation
- Primary: ds-v4-flash-non-thinking
- Fallback: ds-v4-pro-thinking-high

Live Retest Summary

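DEV-01, historical single-model run: 7 LLM calls, ultimately failed at information_gap
DEV-01, routed run: 9 LLM calls, ultimately failed at solution_recommendation
DEV-05, historical single-model run: 9 LLM calls, ultimately failed at solution_recommendation
DEV-05, routed run: 10 LLM calls, ultimately failed at risk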

Live Validation

DEV-01 advanced from information_gap to solution_recommendation
DEV-05 advanced from solution_recommendation to risk
Neither run produced a Final Report
Neither run triggered a Fallback

Both routed retests:

Used the 4 Primaries specified by the Routing Matrix
Kept unevaluated nodes on the default model
Did not trigger a technical Fallback
Did not produce a Live Final Report

What v1.1 Proves

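The node model Benchmark, Routing Matrix, and Runtime routing pipeline now work end to end
Model routing is driven by Evaluation results, not by brand or Tier
Runtime auditing reliably records routing choices and fallback boundaries
Schema and business-quality failures are not misrouted into technical Fallback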
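
To make the audit claim concrete, here is one possible shape for a per-call routing audit record. These notes do not list the actual audit fields, so every field name below (`node`, `model_used`, `route_source`, `fallback_used`, `fallback_reason`) and the `audit_line` helper are hypothetical.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class RoutingAudit:
    """Hypothetical audit record for a single node call."""
    node: str
    model_used: str
    route_source: str                      # "routing_matrix" or "default"
    fallback_used: bool
    fallback_reason: Optional[str] = None  # set only for technical errors


def audit_line(record: RoutingAudit) -> str:
    """Serialize one record as a single JSON line with stable key order."""
    return json.dumps(asdict(record), sort_keys=True)


# A routed call that stayed on its Primary, as in both live retests.
line = audit_line(
    RoutingAudit(
        node="information_gap",
        model_used="ds-v4-flash-non-thinking",
        route_source="routing_matrix",
        fallback_used=False,
    )
)
```

Recording `route_source` separately from `fallback_used` lets a reviewer distinguish a node that was never evaluated from one that was evaluated and did not need its fallback.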

What v1.1 Does Not Prove

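It does not prove that switching between heterogeneous models yields a stable quality improvement
It does not prove that a production-grade success rate has been reached
It does not prove a stable ability to produce Live Final Reports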

Known Limits

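Single Provider
4 cases per node
No repeated sampling
The current Primaries and Architecture C's original default model are all DeepSeek V4 Flash in non-thinking mode
Architecture C still has token, latency, and node-contract stability issues
The Routing Matrix applies only to the current Prompt, Schema, model, and data versions
All results still require Human Review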

Next

Enterprise Knowledge Base
RAG Retrieval Evaluation
Citation and solution Grounding
Skills and MCP to follow

Docs

docs/15_Node_Model_Routing_Matrix_V1.md
docs/16_Architecture_C_Model_Routing_V1.md
docs/17_Architecture_C_Model_Routing_Live_Comparison_V1.md
