feat: add job event timeline diagnostics #61

Merged: Calvin1989 merged 1 commit into main from feature/v1.8.0-alpha.1-job-event-timeline on Jun 1, 2026

Conversation

@Calvin1989 (Owner)

Summary

This PR implements v1.8.0-alpha.1: Job event timeline and runtime diagnostics.

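Building on the runtime / report / artifact regression hardening completed in v1.7.0, this PR adds a lightweight event timeline to each job. Users can follow a job's lifecycle from creation, start, round progress, and artifact writes through to completion, failure, or cancellation, and on failure see a short failure reason / traceback summary.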
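For readers who want a concrete picture of the lifecycle described above, here is a minimal sketch of how events might be appended to a job. Only the event type names come from this PR; the `JobEvent` and `Job` classes, the `record_event` helper, and the `type` / `timestamp` / `detail` field names are assumptions for illustration and may not match the actual code.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any

# Event type names as listed in this PR.
EVENT_TYPES = (
    "created", "started", "round_progress",
    "artifact_written", "finished", "failed", "cancelled",
)


@dataclass
class JobEvent:
    # Field names here are hypothetical, not taken from the PR.
    type: str
    timestamp: str
    detail: dict[str, Any] = field(default_factory=dict)


@dataclass
class Job:
    job_id: str
    events: list[JobEvent] = field(default_factory=list)


def record_event(job: Job, event_type: str, **detail: Any) -> JobEvent:
    """Append one lifecycle event to the job's timeline (hypothetical helper)."""
    if event_type not in EVENT_TYPES:
        raise ValueError(f"unknown event type: {event_type}")
    event = JobEvent(
        type=event_type,
        timestamp=datetime.now(timezone.utc).isoformat(),
        detail=detail,
    )
    job.events.append(event)
    return event


# A successful run, in the order the summary describes.
job = Job(job_id="demo")
record_event(job, "created")
record_event(job, "started")
record_event(job, "round_progress", round=1, total_rounds=3)
record_event(job, "artifact_written", name="report.json")
record_event(job, "finished")
```

Keeping the timeline as an append-only list on the job keeps it lightweight and lets it travel with whatever already stores the job.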

Changes

  • Add an events field to each job.
  • Add lifecycle events: created, started, round_progress, artifact_written, finished, failed, cancelled.
  • GET /jobs and GET /status/{job_id} return events.
  • events are persisted with the job and can be restored after a backend restart.
  • Add an event timeline section to the frontend job detail panel.
  • Failed jobs show a failure reason / traceback summary.
  • Keep Chinese / English bilingual support.
  • Update regression tests and api_smoke_test.
  • Update CHANGELOG.md and docs/roadmap.md.
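The failure reason / traceback summary from the list above could be derived along these lines. This is a sketch under assumptions, not the PR's actual code: `summarize_failure`, the `failure_reason` and `traceback_summary` keys, and the three-frame limit are all hypothetical; only the standard-library `traceback` calls are real.

```python
import traceback


def summarize_failure(exc: BaseException, max_frames: int = 3) -> dict:
    """Reduce an exception to a short reason plus its innermost frames.

    Hypothetical helper: the key names and frame limit are illustrative.
    """
    # Keep only the last few frames so the summary stays short.
    frames = traceback.extract_tb(exc.__traceback__)[-max_frames:]
    return {
        "failure_reason": f"{type(exc).__name__}: {exc}",
        "traceback_summary": [
            f"{frame.name} ({frame.filename}:{frame.lineno})" for frame in frames
        ],
    }


def run_round() -> None:
    # Stand-in for a training round that fails.
    raise RuntimeError("out of memory")


try:
    run_round()
except RuntimeError as exc:
    summary = summarize_failure(exc)
```

A summary like this could be stored in the detail of a failed event, which keeps API responses and the job detail panel small compared with shipping the full traceback.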

Verification

Completed locally:

  • python -m ruff check . (passed)
  • python quick_test.py (passed)
  • python -m pytest (21 passed)
  • cd web; npm run build (passed)
  • python api_smoke_test.py (passed)
  • python api_smoke_test.py --wait-finished (24 events, passed)
  • docker compose restart + --check-recovery (24 events persisted, passed)
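The restart check above amounts to a persistence round trip: events written before the restart should come back unchanged afterwards. The sketch below illustrates that idea with a plain JSON file; the `save_jobs` / `load_jobs` helpers, the `jobs.json` file name, and the record layout are assumptions, and the project's real storage and `--check-recovery` logic may differ.

```python
import json
import tempfile
from pathlib import Path


def save_jobs(path: Path, jobs: dict) -> None:
    """Write all jobs, including their events, to a JSON file (assumed store)."""
    path.write_text(json.dumps(jobs, ensure_ascii=False, indent=2), encoding="utf-8")


def load_jobs(path: Path) -> dict:
    """Reload jobs after a restart; a missing file means no jobs yet."""
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))


# Hypothetical job record with a short timeline.
jobs = {
    "demo": {
        "status": "finished",
        "events": [{"type": "created"}, {"type": "started"}, {"type": "finished"}],
    }
}

with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp) / "jobs.json"
    save_jobs(store, jobs)
    # Stands in for the backend reading its state after a restart.
    restored = load_jobs(store)
    # A store that was never written should yield an empty job set.
    missing = load_jobs(Path(tmp) / "absent.json")
```

Comparing the event count before and after the reload is the same kind of assertion the recovery check reports.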

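Notes

  • No new dependencies.
  • No changes to the core training algorithm.
  • Existing report/artifact URLs are unchanged.
  • Existing API fields are not broken.
  • No tag is created in this PR.
  • After merging, the v1.8.0-alpha.1 tag will be created from the merge commit on main.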

@Calvin1989 Calvin1989 merged commit 059e2ff into main Jun 1, 2026
1 check passed
@Calvin1989 Calvin1989 deleted the feature/v1.8.0-alpha.1-job-event-timeline branch June 1, 2026 14:58