Skip to content

[feat][evaluation] experiment retry mode#434

Merged
lsy357 merged 34 commits into
mainfrom
feat/evalexpt
Feb 27, 2026
Merged

[feat][evaluation] experiment retry mode#434
lsy357 merged 34 commits into
mainfrom
feat/evalexpt

Conversation

@lsy357
Copy link
Copy Markdown
Collaborator

@lsy357 lsy357 commented Feb 26, 2026

What type of PR is this?

feat

Check the PR title

  • This PR title match the format: [<type>][<scope>] <description>. For example: [fix][backend] flaky fix
  • The description of this PR title is user-oriented and clear enough for others to understand.
  • Add documentation if the current PR requires user awareness at the usage level.
  • This PR is written in English. PRs not in English will not be reviewed.

(Optional) Translate the PR title into Chinese

(Optional) More detailed description for this PR(en: English/zh: Chinese)

en:
zh(optional):

(Optional) Which issue(s) this PR fixes

@lsy357 lsy357 changed the title [feat][evlauation] experiment retry mode [feat][evaluation] experiment retry mode Feb 26, 2026
@codecov
Copy link
Copy Markdown

codecov Bot commented Feb 27, 2026

Codecov Report

❌ Patch coverage is 74.22840% with 167 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
...luation/domain/service/expt_run_item_event_impl.go 5.12% 35 Missing and 2 partials ⚠️
...ion/domain/service/expt_run_scheduler_mode_impl.go 93.59% 14 Missing and 7 partials ⚠️
backend/infra/lock/lock.go 64.15% 19 Missing ⚠️
...ation/domain/service/expt_manage_execution_impl.go 81.52% 8 Missing and 9 partials ⚠️
...aluation/infra/repo/experiment/expt_item_result.go 0.00% 16 Missing ⚠️
...d/modules/evaluation/application/experiment_app.go 53.12% 11 Missing and 4 partials ⚠️
...valuation/application/convertor/experiment/expt.go 31.57% 9 Missing and 4 partials ⚠️
backend/modules/evaluation/domain/entity/expt.go 60.00% 4 Missing and 4 partials ⚠️
...es/evaluation/domain/service/expt_run_item_impl.go 42.85% 7 Missing and 1 partial ⚠️
...s/evaluation/infra/repo/experiment/expt_run_log.go 55.55% 2 Missing and 2 partials ⚠️
... and 4 more

❌ Your patch status has failed because the patch coverage (74.22%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage.

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #434      +/-   ##
==========================================
+ Coverage   72.88%   72.89%   +0.01%     
==========================================
  Files         624      624              
  Lines       62323    62897     +574     
==========================================
+ Hits        45421    45846     +425     
- Misses      13723    13846     +123     
- Partials     3179     3205      +26     
Flag Coverage Δ
unittests 72.89% <74.22%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
backend/modules/evaluation/domain/entity/event.go 63.63% <100.00%> (+3.63%) ⬆️
...ckend/modules/evaluation/domain/entity/expt_run.go 64.64% <ø> (ø)
.../modules/evaluation/domain/entity/expt_template.go 98.01% <ø> (ø)
backend/modules/evaluation/domain/entity/param.go 64.28% <ø> (ø)
...ules/evaluation/domain/service/expt_manage_impl.go 72.15% <100.00%> (+0.11%) ⬆️
...aluation/domain/service/expt_run_item_turn_impl.go 80.06% <50.00%> (ø)
...on/domain/service/expt_run_scheduler_event_impl.go 75.21% <75.00%> (+0.31%) ⬆️
.../application/convertor/experiment/expt_template.go 91.60% <62.50%> (-0.41%) ⬇️
...nd/modules/evaluation/domain/entity/expt_result.go 48.81% <25.00%> (-0.78%) ⬇️
...s/evaluation/infra/repo/experiment/expt_run_log.go 76.66% <55.55%> (-12.99%) ⬇️
... and 9 more

... and 3 files with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 34b145e...3b56e59. Read the comment docs.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

tpfz
tpfz previously approved these changes Feb 27, 2026
tpfz
tpfz previously approved these changes Feb 27, 2026
@lsy357 lsy357 merged commit caae357 into main Feb 27, 2026
11 checks passed
@lsy357 lsy357 deleted the feat/evalexpt branch February 27, 2026 07:53
Copy link
Copy Markdown
Collaborator

@CozeLoop CozeLoop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this PR introduces experiment-level retry modes (retry all / retry selected items) and the necessary plumbing across IDL, application, domain, scheduler, and infra layers. I focused on correctness, concurrency, error handling, and IDL integration.

Key findings:

  • Found two IDL annotation typos (api.boy instead of api.body) that will prevent item_retry_num from being bound at the HTTP/OpenAPI layer.
  • Identified a small error-handling issue where a nil error is wrapped, which can be misleading when debugging.
  • Suggested tightening invariants around run-log item IDs to avoid accidental duplicates in a single retry request.

The rest of the changes (locking primitives, scheduler modes, retry caps, repo/DAO extensions, and error codes) look conceptually sound and consistent with the existing evaluation pipeline. Please see inline comments for details and concrete fixes.

41: optional bool enable_weighted_score (api.body = 'enable_weighted_score', go.tag='json:"enable_weighted_score"')
42: optional map<i64, double> evaluator_score_weights (api.body = 'evaluator_score_weights', go.tag='json:"evaluator_score_weights"')
43: optional i64 expt_template_id (api.body='expt_template_id',api.js_conv='true', go.tag='json:"expt_template_id"')
45: optional i32 item_retry_num (api.boy = 'item_retry_num')
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Must Fix] Thrift annotation typo prevents HTTP binding for item_retry_num.

Problem: This field uses api.boy instead of api.body. Hertz/AGW generators only recognize api.body, so item_retry_num from the request body will not be bound into CreateExperimentRequest/SubmitExperimentRequest. The feature will appear to be ignored at the API layer.

Recommendation: Change api.boy = 'item_retry_num' to api.body = 'item_retry_num' for all item_retry_num fields in this Thrift file.

Reference: See Hertz Thrift annotations in your own IDL guidelines and CloudWeGo docs for api.body usage (similar to other fields in this struct).

20: optional i32 item_concur_num (api.body = 'item_concur_num')
22: optional common.RuntimeParam target_runtime_param (api.body = 'target_runtime_param')

45: optional i32 item_retry_num (api.boy = 'item_retry_num')
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Must Fix] OpenAPI Thrift annotation typo for item_retry_num.

Problem: This field uses api.boy instead of api.body. As a result, the OpenAPI layer will not bind item_retry_num from the HTTP request body into SubmitExperimentOApiRequest, so clients cannot configure retry times via OpenAPI.

Recommendation: Replace api.boy = 'item_retry_num' with api.body = 'item_retry_num' here to align with other fields and the Hertz binding expectations.

Reference: CloudWeGo/Hertz Thrift annotations for HTTP binding (same pattern you use for item_concur_num and other body fields).

Comment on lines +1010 to +1016
rl, err = e.runLogRepo.Get(ctx, exptID, runID)
if err != nil {
return 0, false, err
}

if rl == nil {
return 0, false, errorx.Wrapf(err, "target runlog %v not found, expt_id: %v", runID, exptID)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Should Fix] Wrapping a nil error here loses context and is misleading.

Problem: In this branch, err is guaranteed to be nil (you already returned on non-nil errors from runLogRepo.Get above). Calling errorx.Wrapf(err, ...) with a nil cause produces an error whose underlying cause is effectively empty and can confuse error inspection/log analysis.

Recommendation: Use a constructor that does not expect a non-nil cause, for example:

if rl == nil {
    return 0, false, errorx.New("target runlog %v not found, expt_id: %v", runID, exptID)
}

or an equivalent errorx.NewByCode + extra message if you want to reuse a typed code.

Reference: Go Code Review Comments – avoid wrapping nil errors; keep error chains meaningful for errors.Is/As and log inspection.

Comment on lines +84 to +102
func (e *ExptRunLog) AppendItemIDs(itemIDs []int64) error {
if e == nil {
return errorx.New("ExptRunLog AppendItemIDs must init first")
}
exists := make(map[int64]bool)
for _, chunk := range e.ItemIds {
for _, itemID := range chunk.ItemIDs {
exists[itemID] = true
}
}
rlItems := ExptRunLogItems{CreateAt: gptr.Of(time.Now().Unix())}
for _, itemID := range itemIDs {
if exists[itemID] {
return errorx.NewByCode(errno.EvalItemAlreadyRetryingCode, errorx.WithExtraMsg(fmt.Sprintf("existed item_id: %v", itemID)))
} else {
rlItems.ItemIDs = append(rlItems.ItemIDs, itemID)
}
}
e.ItemIds = append(e.ItemIds, rlItems)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] Guard against duplicate item IDs within the same retry request.

Problem: exists is built only from the already-persisted ItemIds chunks. If itemIDs passed into AppendItemIDs contains duplicates in the same call (e.g. [1, 1, 2] when ItemIds is currently empty), both entries will be appended because the exists map is never updated for the new IDs. This can lead to redundant scheduling work and makes the run log harder to reason about.

Recommendation: Track newly added IDs in the same exists map so the method enforces uniqueness across both historical and in-request items, for example:

for _, itemID := range itemIDs {
    if exists[itemID] {
        return errorx.NewByCode(errno.EvalItemAlreadyRetryingCode,
            errorx.WithExtraMsg(fmt.Sprintf("existed item_id: %v", itemID)))
    }
    exists[itemID] = true
    rlItems.ItemIDs = append(rlItems.ItemIDs, itemID)
}

This keeps the invariant "each itemID appears at most once per run log" regardless of client input.

Reference: Go Code Review Comments – keep invariants explicit and avoid silent duplication of identifiers.

Comment thread backend/go.sum
github.com/google/renameio v0.1.0/go.mod h1:KWCgfxg9yswjAJkECMjeO8J8rahYeXnNhOm40UhjYkI=
github.com/google/s2a-go v0.1.8 h1:zZDs9gcbt9ZPLV0ndSyQk6Kacx2g/X+SKYovpnz3SMM=
github.com/google/s2a-go v0.1.8/go.mod h1:6iNWHTpQ+nfNRN5E00MSdfDwVesa8hhS32PhPO8deJA=
github.com/google/subcommands v1.2.0 h1:vWQspBTo2nEqTUFita5/KeEWlUL8kQObDFbub/EN9oE=
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[No Risk] No issues found with this dependency change.

The added checksum for github.com/google/subcommands v1.2.0 is consistent with the existing go.mod entry and does not introduce any obvious risks for the backend. Version choice and scope of usage look reasonable for CLI-style helpers.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[No Risk] DAO extensions for batch queries and raw run-log scanning look fine.

MGetItemResults/MGetItemTurnResults and the RawFilter branch in ScanItemRunLogs use appropriate query builders, respect the existing index (uk_expt_run_item_turn), and keep pagination semantics intact. Error wrapping is consistent with the rest of the DAO layer.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[No Risk] New error codes for retry flows are well scoped.

EvalItemAlreadyRetrying and ExperimentIsCompleting are clearly named, have stable, non-overlapping codes, and their messages match how they are used in the domain layer (e.g., guarding duplicate retry items and runs in completing state). They are marked as not affecting stability, which aligns with their semantics.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[No Risk] Domain IDL changes for item_retry_num are consistent with the model.

Introducing item_retry_num on Experiment and ExptFieldMapping fits naturally into the existing evaluation configuration and is wired through the convertors and templates. Field IDs avoid collisions and preserve backward compatibility.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[No Risk] Extending ExptRunMode with explicit retry modes improves clarity.

Adding EvaluationModeUnknown, EvaluationModeRetryAll, and EvaluationModeRetryItems makes the run-mode enum more expressive and is consumed consistently across the scheduler and application layers. No issues spotted with numeric values or compatibility.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[No Risk] Deleting this temporary file_name artifact is safe.

This looks like a stray CSV header/sample file that does not belong in the domain service package. Removing it reduces noise in the repo and has no impact on runtime behavior.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants