Releases: dongkoony/DevOps-Incident-Triage-Model

release-2026.06-rag-preview

01 Jun 04:50
513c710

Pre-release

Channel

Preview

Summary

This release introduces the first RAG preview layer for the DevOps Incident Triage Model.

The project remains classifier-first. The existing Transformer-based incident classifier is unchanged; this release adds a preview retrieval layer that returns evidence from domain-specific runbook documents.

LLM-assisted remediation generation is not included in this release. It is planned for a later incident-assist beta release.

Included

  • POST /retrieve FastAPI endpoint
  • Domain-aware runbook retrieval flow
  • Local TF-IDF sparse retrieval index for preview use
  • Evidence-based retrieval response schema
  • Section-level citations from runbook documents
  • Retrieval metadata including index type, embedding model placeholder, and latency
  • Prometheus-style retrieval metrics:
    • ditri_retrieval_requests_total
    • ditri_retrieval_latency_seconds
  • RAG roadmap documentation
  • RAG evaluation plan
  • Runbook placeholder document structure
  • Release evidence document for release-2026.06-rag-preview
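
The domain-aware TF-IDF retrieval flow listed above could look roughly like the following. This is a minimal standard-library sketch, not the project's implementation: the Section and TfidfIndex names, the runbook paths, and the response field names are all assumptions chosen to mirror the bullets (section-level citations, index type, embedding model placeholder, latency).

```python
import math
import re
import time
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Section:
    # Hypothetical runbook section record; field names are illustrative only.
    domain: str
    doc: str
    heading: str
    text: str


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def normalize(vec: dict[str, float]) -> dict[str, float]:
    # Scale to unit length so the dot product below is a cosine similarity.
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


class TfidfIndex:
    """Tiny in-memory TF-IDF index over runbook sections (assumed design)."""

    def __init__(self, sections: list[Section]) -> None:
        self.sections = sections
        counts = [Counter(tokenize(s.text)) for s in sections]
        doc_freq = Counter(term for c in counts for term in c)
        n = len(sections)
        # Smoothed IDF: terms found in every section keep a positive weight.
        self.idf = {t: math.log((1 + n) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
        self.vectors = [normalize({t: tf * self.idf[t] for t, tf in c.items()}) for c in counts]

    def retrieve(self, query: str, domain: str, top_k: int = 3) -> dict:
        started = time.perf_counter()
        q_counts = Counter(t for t in tokenize(query) if t in self.idf)
        q_vec = normalize({t: tf * self.idf[t] for t, tf in q_counts.items()})
        scored = []
        for section, vec in zip(self.sections, self.vectors):
            if section.domain != domain:  # domain-aware filter
                continue
            score = sum(w * vec.get(t, 0.0) for t, w in q_vec.items())
            if score > 0:
                scored.append((score, section))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return {
            "evidence": [
                {"citation": f"{s.doc}#{s.heading}", "score": round(score, 4), "text": s.text}
                for score, s in scored[:top_k]
            ],
            "metadata": {
                "index_type": "tfidf-sparse",
                "embedding_model": None,  # placeholder: no dense embeddings in this sketch
                "latency_seconds": time.perf_counter() - started,
            },
        }


# Hypothetical runbook sections used only for this example.
SECTIONS = [
    Section("kubernetes", "runbooks/kubernetes.md", "CrashLoopBackOff",
            "Pod restarts repeatedly in CrashLoopBackOff; inspect container logs and recent deploys."),
    Section("kubernetes", "runbooks/kubernetes.md", "ImagePullBackOff",
            "Image pull fails; verify registry credentials and image tag."),
    Section("database", "runbooks/database.md", "Connection pool exhausted",
            "Connection pool exhausted; check max connections and slow queries."),
]

index = TfidfIndex(SECTIONS)
result = index.retrieve("pod stuck in crashloopbackoff after deploy", domain="kubernetes")
print(result["evidence"][0]["citation"])
```

Filtering by domain before scoring keeps evidence from unrelated runbooks out of the response, and sections with a zero score are dropped rather than returned as weak evidence.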

Validation

GitHub Actions on main: Success

Local validation:

uv run --extra dev --extra api ruff check .
uv run --extra dev --extra api pytest -q

Result:

ruff check: passed
pytest: 47 passed, 10 skipped

FastAPI smoke validation:

GET /health     -> 200
POST /retrieve  -> 200
GET /metrics    -> 200

Known Limitations

  • This is a preview release, not a production RAG backend.
  • Retrieval currently uses a local TF-IDF sparse index, not a production Vector DB.
  • /assist and LLM-generated remediation guidance are not implemented yet.
  • Runbooks are portfolio-grade placeholders, not real production incident records.
  • The current dataset remains synthetic.
  • Retrieval quality has not yet been formally evaluated; the planned evaluation covers retrieval hit rate, groundedness, citation coverage, and hallucination checks.
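
Two of the retrieval metrics named above can be sketched as follows. The definitions are assumptions, since the release notes do not define them: hit rate at k is taken as the share of queries with at least one relevant citation in the top k results, and citation coverage as the mean share of expected citations found in the top k. Groundedness and hallucination checks depend on generated text and are not covered. The citation strings are hypothetical.

```python
def hit_rate_at_k(retrieved: list[list[str]], gold: list[set[str]], k: int) -> float:
    """Share of queries with at least one gold citation in the top k results."""
    if not retrieved:
        return 0.0
    hits = sum(1 for got, want in zip(retrieved, gold) if want & set(got[:k]))
    return hits / len(retrieved)


def citation_coverage(retrieved: list[list[str]], gold: list[set[str]], k: int) -> float:
    """Mean share of gold citations that appear in the top k results."""
    per_query = [
        len(want & set(got[:k])) / len(want)
        for got, want in zip(retrieved, gold)
        if want  # skip queries with no expected citations
    ]
    return sum(per_query) / len(per_query) if per_query else 0.0


# Hypothetical ranked citations per query, and the citations a reviewer expects.
retrieved = [
    ["k8s#CrashLoopBackOff", "k8s#ImagePullBackOff"],
    ["db#Pool", "db#Replication"],
    ["net#DNS"],
]
gold = [
    {"k8s#CrashLoopBackOff"},
    {"db#Replication", "db#Failover"},
    {"net#TLS"},
]

print(hit_rate_at_k(retrieved, gold, k=2))
print(citation_coverage(retrieved, gold, k=2))
```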

Next Release Direction

The next planned release is:

release-2026.07-incident-assist-beta

Planned focus:

  • Classifier + RAG integration
  • POST /assist API design and implementation
  • LLM-generated remediation guidance
  • Root cause candidate generation
  • Evidence citations from retrieved runbooks

release-2026.05-classifier-core

19 May 05:43
faf853b

Channel

Stable

Summary

This release establishes release-2026.05-classifier-core as the stable classifier-core baseline for the DevOps Incident Triage Model.

The current release keeps the project classifier-first. RAG and LLM-assisted incident response are documented as future roadmap work, not implemented in this release.

Included

  • Transformer-based DevOps incident classification baseline
  • FastAPI inference service
  • Batch prediction and async batch job support
  • Evaluation and demo showcase workflow
  • CI smoke checks for lint/test, showcase, and API health
  • Product-style Release Train documentation
  • Future RAG roadmap and runbook placeholder structure
  • Release evidence document for classifier-core

Validation

  • GitHub Actions on main: Success
  • lint-and-test: passed
  • showcase-smoke: passed
  • api-smoke: passed

Local validation:

uv run --extra dev ruff check .
uv run --extra dev --extra api pytest -q

Result:

ruff: All checks passed
pytest: 40 passed, 10 skipped

Data And Model Notes

  • The current starter dataset is synthetic.
  • This release should be treated as a reproducible portfolio-grade classifier baseline.
  • It does not claim real production incident generalization.
  • Model schema and training parameters are not changed by this release.

Known Limitations

  • Docker build validation was not completed locally because the Docker daemon was unavailable.
  • RAG retrieval, /retrieve, /assist, Vector DB integration, and LLM-generated remediation are planned future work.
  • Hugging Face publishing is intentionally deferred until the model card and publish artifact are release-train aligned.

Next Steps

  • Update docs/model_card.md from legacy version metadata to release-2026.05-classifier-core.
  • Confirm the intended Hugging Face model artifact.
  • Publish to Hugging Face after the publish gate is met.

v0.3.0

08 Apr 05:58

What's Changed

  • chore: back-merge main into develop (v0.2.0) by @dongkoony in #8
  • feat(api): add /predict/batch endpoint with review summary by @dongkoony in #9
  • feat(api): add request tracing and prometheus metrics by @dongkoony in #10
  • feat(mlops): add model benchmark matrix automation by @dongkoony in #13

Full Changelog: v0.2.0...v0.3.0

v0.2.0

20 Mar 08:04
c16a60e

DevOps Incident Triage Model v0.2.0

Included

  • Raw incident ingestion scaffold (ditri-ingest-raw)
  • Confidence-threshold based human-review gating
  • Evaluation threshold sweep output (reports/threshold_metrics.json)
  • Package/API version bump to 0.2.0
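
Confidence-threshold gating and the threshold sweep could work along these lines. This is a sketch under assumptions: the function names and the output fields are hypothetical, and the actual layout of reports/threshold_metrics.json is not shown in these notes.

```python
def needs_human_review(confidence: float, threshold: float) -> bool:
    """Route a prediction to human review when the model is not confident enough."""
    return confidence < threshold


def sweep_thresholds(predictions: list[tuple[float, bool]], thresholds: list[float]) -> list[dict]:
    """For each threshold, report how much goes to review and how accurate the rest is.

    predictions holds (confidence, is_correct) pairs from an evaluation run.
    """
    if not predictions:
        raise ValueError("predictions must not be empty")
    rows = []
    for threshold in thresholds:
        accepted = [ok for conf, ok in predictions if not needs_human_review(conf, threshold)]
        rows.append({
            "threshold": threshold,
            "review_rate": 1 - len(accepted) / len(predictions),
            # None when every prediction is sent to review at this threshold.
            "auto_accuracy": sum(accepted) / len(accepted) if accepted else None,
        })
    return rows


# Hypothetical evaluation results: (confidence, prediction was correct).
predictions = [(0.95, True), (0.80, True), (0.60, False), (0.40, False)]
rows = sweep_thresholds(predictions, thresholds=[0.5, 0.7])
for row in rows:
    print(row)
```

Raising the threshold sends more predictions to review and usually raises the accuracy of what is auto-accepted, which is the trade-off a sweep report makes visible.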

Validation

  • UV_CACHE_DIR=/tmp/uv-cache uv run ruff check .
  • UV_CACHE_DIR=/tmp/uv-cache uv run pytest -q

v0.1.0

20 Mar 07:27
2c848b8

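First public release

  • Hugging Face-based DevOps incident triage project scaffold
  • Training, evaluation, inference, API (FastAPI), Docker, and CI pipelines
  • Branch strategy, PR template, and release/evidence documents included

Note: The current sample data is synthetic starter data.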