refactor(provider): unify generation retries into a first-packet timeout and a stream idle timeout #462
Generated with [codeagent](https://github.com/qbox/codeagent) Co-authored-by: phantom5099 <245659304+phantom5099@users.noreply.github.com>
Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
```go
// validateOptionalNonNegativeGenerateControl validates an optional integer generate-control
// field, rejecting negative input that the runtime would otherwise silently swallow.
func validateOptionalNonNegativeGenerateControl(field string, value int) error {
	if value < 0 {
```
generate_max_retries currently only rejects negative values, so very large positive values are accepted and later used directly by the retry loop (for attempt := 0; attempt <= maxRetries; attempt++). This can create extremely long-running runs (effectively a local DoS via misconfiguration). Consider enforcing a sane upper bound during config validation (and documenting that bound).
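Following the review suggestion above, here is a minimal sketch of a bounded check. The function name `validateGenerateMaxRetries` and the cap `maxGenerateRetriesLimit = 10` are hypothetical placeholders, not values taken from this PR; the point is only that the attempt loop becomes finite by construction once config validation enforces an upper bound.

```go
package main

import (
	"fmt"
)

// maxGenerateRetriesLimit is a hypothetical cap chosen for illustration only;
// the real bound (and whether to document it) is a project decision.
const maxGenerateRetriesLimit = 10

// validateGenerateMaxRetries (hypothetical) keeps the existing "reject negatives"
// rule and additionally rejects values above the cap, so a misconfigured value
// cannot drive `for attempt := 0; attempt <= maxRetries; attempt++` for an
// unbounded amount of time.
func validateGenerateMaxRetries(value int) error {
	if value < 0 {
		return fmt.Errorf("generate_max_retries must be >= 0, got %d", value)
	}
	if value > maxGenerateRetriesLimit {
		return fmt.Errorf("generate_max_retries must be <= %d, got %d", maxGenerateRetriesLimit, value)
	}
	return nil
}

func main() {
	for _, v := range []int{-1, 0, 3, maxGenerateRetriesLimit, 1000000} {
		fmt.Printf("%d -> %v\n", v, validateGenerateMaxRetries(v))
	}
}
```

Rejecting at load time, rather than clamping silently at runtime, keeps the behavior consistent with the existing negative-value check, which also fails loudly.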
Only explicitly supported fields are accepted; unknown fields fail immediately, with no automatic migration of the old format.
Before: `provider.yaml` supports only flat fields: `name/driver/base_url/api_key_env/model_source/chat_endpoint_path/discovery_endpoint_path/models`.
After: `provider.yaml` supports only flat fields: `name/driver/base_url/api_key_env/model_source/chat_endpoint_path/discovery_endpoint_path/generate_max_retries/generate_start_timeout_sec/generate_idle_timeout_sec/models`.
The supported provider.yaml field list is now edited but still omits chat_api_mode, while code accepts it (customProviderFile.ChatAPIMode). This line should include chat_api_mode to keep docs aligned with actual accepted fields.
/code Fix the compile error:
`ok neo-code/internal/provider/anthropic 0.008s coverage: 65.1% of statements`
`neo-code/internal/provider/gemini [neo-code/internal/provider/gemini.test]`
`Error: internal/provider/gemini/provider_test.go:604:6: undefined: isRetryableGenerateError`
Result: the fix requested in the triggering comment has been made and pushed to the current PR branch.
Outstanding items: none.
Codecov Report ❌ Patch coverage is
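@xgopilot Critically review the whole PR: analyze every potential risk, find every bug, make sure no change anywhere is a regression, and check whether the design is sound. The changes must not be fragmented, and there must be no compromise-style compatibility or fallbacks for old code. Do not output positive praise.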
Review conclusions (issues only)
Scope notes
@xgopilot Fix only the second and third items; for the second, changing it to `< 0` is enough.
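Completion status
As requested, only items 2 and 3 were fixed, and the changes were pushed to the current PR branch (
Changes in this round
Verification passed:
Unaddressed items
As requested, items 1 and 4 were not addressed.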
…ftovers
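The conclusions are partly off, and the implementation is inconsistent.
Inconsistencies
Scope drift (not core semantics, but outside the task boundary of #461)
Additional notes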
@xgopilot Improve test coverage
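Result
Additional tests were written as requested ("improve test coverage") and pushed to the current PR branch.
Coverage improvements
The new tests cover previously weak branches:
Verification passed:
Push details
Not done: business logic was not changed in this round; only test coverage was added.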
…verage
test(provider): improve coverage for generate attempt helpers
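Conclusion first
1) Item-by-item breakdown of the new config options (including what each one is for)
Key code evidence:
Additional semantic details:
Note one discrepancy between the implementation and the docs: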
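2) Did it actually solve the two problems?
A. Unstable recovery from failures before the first packet
Verdict: yes. The core problem is solved, through structural consolidation rather than a patch. Basis:
B. Long streams being killed by mistake
Verdict: yes. The main false-kill paths have been isolated and addressed. Basis: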
Remaining edge cases (not the main answer to your two questions, but worth watching)
Scope of this run
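Background
The retry and timeout semantics of the current generation pipeline are inconsistent across providers:
This leads to unstable recovery from failures before the first packet, long streams that are easily killed by mistake, and providers behaving differently on the same class of error.
Changes in this PR
Unify provider-level generation-attempt semantics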
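text delta, tool call start, tool call delta.
`message_done` now resolves to the completed state, so a finished attempt no longer falls into the start timeout / idle timeout branches.
Add provider-level generation control config
`generate_max_retries`
`generate_start_timeout_sec`
`generate_idle_timeout_sec`
Consolidate the OpenAI-compatible internal boundaries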
Tighten config validation and custom provider behavior
Other fixes
Testing and verification
Confirmed passing:
`go test ./internal/provider/... ./internal/config/... ./internal/gateway/launcher/...`
Known issues
Risks and follow-ups