Releases · dongkoony/DevOps-Incident-Triage-Model

01 Jun 04:50

dongkoony

release-2026.06-rag-preview

513c710

release-2026.06-rag-preview Pre-release

Channel

Preview

Summary

This release introduces the first RAG preview layer for the DevOps Incident Triage Model.

The project remains classifier-first. The existing Transformer-based incident classifier is kept intact, while this release adds a preview retrieval layer that can retrieve evidence from domain-specific runbook documents.

LLM-assisted remediation generation is not included yet; it is planned for a later incident-assist beta release.

Included

POST /retrieve FastAPI endpoint
Domain-aware runbook retrieval flow
Local TF-IDF sparse retrieval index for preview use
Evidence-based retrieval response schema
Section-level citations from runbook documents
Retrieval metadata including index type, embedding model placeholder, and latency
Prometheus-style retrieval metrics:
- ditri_retrieval_requests_total
- ditri_retrieval_latency_seconds
RAG roadmap documentation
RAG evaluation plan
Runbook placeholder document structure
Release evidence document for release-2026.06-rag-preview
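The list above describes retrieval only at the feature level. The following is a minimal sketch of domain-filtered TF-IDF retrieval that returns section-level citations and retrieval metadata. It assumes runbooks are already split into sections. `Section`, `TfidfIndex`, the response field names, and the sample runbook paths are all hypothetical; they are not the project's actual `/retrieve` schema. A real build would more likely rely on an existing TF-IDF implementation, such as scikit-learn's `TfidfVectorizer`, than on hand-rolled weighting.

```python
from __future__ import annotations

import math
import re
import time
from collections import Counter
from dataclasses import dataclass

TOKEN_RE = re.compile(r"[a-z0-9]+")


@dataclass(frozen=True)
class Section:
    """One runbook section; these field names are hypothetical."""
    domain: str
    doc: str
    heading: str
    text: str


def tokenize(text: str) -> list[str]:
    # Lowercase and split on anything that is not a letter or digit.
    return TOKEN_RE.findall(text.lower())


class TfidfIndex:
    """In-memory TF-IDF index over runbook sections (illustrative only)."""

    def __init__(self, sections: list[Section]) -> None:
        self.sections = sections
        counts = [Counter(tokenize(f"{s.heading} {s.text}")) for s in sections]
        n = len(counts)
        df = Counter(term for c in counts for term in c)
        # Smoothed IDF: terms that appear in fewer sections get larger weights.
        self.idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}
        self.vectors = [self._weigh(c) for c in counts]

    def _weigh(self, counts: Counter) -> dict[str, float]:
        # Term frequency times IDF, L2-normalised so a dot product equals cosine similarity.
        # Terms never seen at index time get weight 0 and cannot match anything.
        vec = {t: tf * self.idf.get(t, 0.0) for t, tf in counts.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        return {t: w / norm for t, w in vec.items()} if norm else {}

    def retrieve(self, query: str, domain: str | None = None, top_k: int = 3) -> dict:
        start = time.perf_counter()
        q = self._weigh(Counter(tokenize(query)))
        scored = []
        for section, vec in zip(self.sections, self.vectors):
            if domain is not None and section.domain != domain:
                continue  # domain-aware: only search the requested domain's runbooks
            score = sum(w * vec.get(t, 0.0) for t, w in q.items())
            if score > 0:
                scored.append((score, section))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        evidence = [
            {"citation": f"{s.doc}#{s.heading}", "score": round(score, 4), "text": s.text}
            for score, s in scored[:top_k]
        ]
        return {
            "evidence": evidence,
            "metadata": {
                "index_type": "tfidf-sparse",
                "embedding_model": None,  # placeholder: no dense embeddings in this sketch
                "latency_ms": (time.perf_counter() - start) * 1000,
            },
        }


sections = [
    Section("kubernetes", "runbooks/kubernetes.md", "CrashLoopBackOff",
            "Check pod logs and recent image changes for crashloop restarts."),
    Section("kubernetes", "runbooks/kubernetes.md", "ImagePullBackOff",
            "Verify registry credentials and the image tag."),
    Section("database", "runbooks/database.md", "Connection pool exhaustion",
            "Inspect connection pool size and slow queries."),
]
index = TfidfIndex(sections)
result = index.retrieve("pod crashloop restarts after deploy", domain="kubernetes")
print(result["evidence"][0]["citation"])
```

Because the domain filter runs before scoring, a query can never cite a runbook from the wrong domain, even when that runbook shares vocabulary with the query.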

Validation

GitHub Actions on main: Success

Local validation:

uv run --extra dev --extra api ruff check .
uv run --extra dev --extra api pytest -q

Result:

ruff check: passed
pytest: 47 passed, 10 skipped

FastAPI smoke validation:

GET /health     -> 200
POST /retrieve  -> 200
GET /metrics    -> 200
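The `GET /metrics` smoke check confirms only that the endpoint returns 200. The sketch below shows roughly what the two retrieval metrics from the Included list look like in the Prometheus text exposition format, using only the standard library. The `status` label and the bucket boundaries are assumptions. A real service would normally use the official `prometheus_client` package (`Counter`, `Histogram`) rather than render the text by hand.

```python
from collections import defaultdict

# Hypothetical bucket boundaries in seconds; the release does not document the real ones.
DEFAULT_BUCKETS = (0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0)


class RetrievalMetrics:
    """Tiny counter + histogram rendered in Prometheus text exposition format."""

    def __init__(self, buckets: tuple = DEFAULT_BUCKETS) -> None:
        self.buckets = tuple(sorted(buckets))
        self.requests = defaultdict(int)  # keyed by a hypothetical "status" label
        self.bucket_counts = [0] * len(self.buckets)
        self.latency_sum = 0.0
        self.latency_count = 0

    def observe(self, status: str, latency_seconds: float) -> None:
        self.requests[status] += 1
        self.latency_sum += latency_seconds
        self.latency_count += 1
        # Prometheus buckets are cumulative: an observation counts toward every
        # bucket whose upper bound ("le") is >= the observed value.
        for i, upper in enumerate(self.buckets):
            if latency_seconds <= upper:
                self.bucket_counts[i] += 1

    def render(self) -> str:
        lines = [
            "# HELP ditri_retrieval_requests_total Total number of /retrieve requests.",
            "# TYPE ditri_retrieval_requests_total counter",
        ]
        for status, count in sorted(self.requests.items()):
            lines.append(f'ditri_retrieval_requests_total{{status="{status}"}} {count}')
        lines += [
            "# HELP ditri_retrieval_latency_seconds Latency of /retrieve requests in seconds.",
            "# TYPE ditri_retrieval_latency_seconds histogram",
        ]
        for upper, count in zip(self.buckets, self.bucket_counts):
            lines.append(f'ditri_retrieval_latency_seconds_bucket{{le="{upper}"}} {count}')
        # The +Inf bucket always equals the total number of observations.
        lines.append(f'ditri_retrieval_latency_seconds_bucket{{le="+Inf"}} {self.latency_count}')
        lines.append(f"ditri_retrieval_latency_seconds_sum {self.latency_sum}")
        lines.append(f"ditri_retrieval_latency_seconds_count {self.latency_count}")
        return "\n".join(lines) + "\n"


metrics = RetrievalMetrics()
for status, latency in [("200", 0.004), ("200", 0.03), ("500", 0.7)]:
    metrics.observe(status, latency)
print(metrics.render())
```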

Known Limitations

This is a preview release, not a production RAG backend.
Retrieval currently uses a local TF-IDF sparse index, not a production Vector DB.
/assist and LLM-generated remediation guidance are not implemented yet.
Runbooks are portfolio-grade placeholders, not real production incident records.
The current dataset remains synthetic.
Retrieval quality still needs formal evaluation with retrieval hit rate, groundedness, citation coverage, and hallucination checks.
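Two of the planned retrieval checks can be computed without an LLM. The sketch below covers hit rate at k and one possible reading of citation coverage, defined here (as an assumption) as the share of returned evidence items whose citation resolves to a known runbook section. Groundedness and hallucination checks need generated text and are not covered. The function names and sample data are hypothetical.

```python
from collections.abc import Mapping, Sequence


def hit_rate_at_k(retrieved: Sequence[Sequence[str]], relevant: Sequence[set], k: int) -> float:
    """Fraction of queries with at least one relevant section in the top-k results."""
    if len(retrieved) != len(relevant):
        raise ValueError("retrieved and relevant must be aligned per query")
    if not retrieved:
        return 0.0
    hits = sum(1 for ranked, gold in zip(retrieved, relevant) if gold & set(ranked[:k]))
    return hits / len(retrieved)


def citation_coverage(responses: Sequence[Sequence[Mapping[str, str]]], known_sections: set) -> float:
    """Fraction of returned evidence items whose citation points at a real runbook section."""
    items = [item for evidence in responses for item in evidence]
    if not items:
        return 0.0
    valid = sum(1 for item in items if item.get("citation") in known_sections)
    return valid / len(items)


# Hypothetical labelled queries: ranked citations per query and the gold sections for each.
retrieved = [
    ["kubernetes.md#CrashLoopBackOff", "kubernetes.md#ImagePullBackOff"],
    ["database.md#Replication lag"],
    ["network.md#DNS resolution"],
]
relevant = [
    {"kubernetes.md#CrashLoopBackOff"},
    {"database.md#Connection pool exhaustion"},
    {"network.md#DNS resolution"},
]
print(hit_rate_at_k(retrieved, relevant, k=1))  # 2 of the 3 queries hit

responses = [
    [{"citation": "kubernetes.md#CrashLoopBackOff"}, {"citation": "kubernetes.md#Missing"}],
    [{"citation": "database.md#Replication lag"}],
]
known = {"kubernetes.md#CrashLoopBackOff", "database.md#Replication lag"}
print(citation_coverage(responses, known))  # 2 of the 3 citations resolve
```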

Next Release Direction

The next planned release is:

release-2026.07-incident-assist-beta

Planned focus:

Classifier + RAG integration
POST /assist API design and implementation
LLM-generated remediation guidance
Root cause candidate generation
Evidence citations from retrieved runbooks

19 May 05:43

dongkoony

release-2026.05-classifier-core

faf853b

release-2026.05-classifier-core Latest

Channel

Stable

Summary

This release establishes release-2026.05-classifier-core as the stable classifier-core baseline for the DevOps Incident Triage Model.

This release keeps the project classifier-first. RAG and LLM-assisted incident response are documented as future roadmap work, not implemented in this release.

Included

Transformer-based DevOps incident classification baseline
FastAPI inference service
Batch prediction and async batch job support
Evaluation and demo showcase workflow
CI smoke checks for lint/test, showcase, and API health
Product-style Release Train documentation
Future RAG roadmap and runbook placeholder structure
Release evidence document for classifier-core
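These notes do not describe how async batch jobs work. The sketch below shows a common submit-then-poll pattern for such a feature, assuming an in-memory job registry. `BatchJob`, `BatchJobRunner`, `classify`, and the status values are all hypothetical. `classify` is a keyword stub that stands in for the Transformer model.

```python
import asyncio
import uuid
from dataclasses import dataclass, field


@dataclass
class BatchJob:
    job_id: str
    texts: list
    status: str = "queued"  # queued -> running -> succeeded | failed
    results: list = field(default_factory=list)


def classify(text: str) -> dict:
    # Hypothetical stand-in for the Transformer classifier: a trivial keyword rule.
    lowered = text.lower()
    label = "database" if "postgres" in lowered or "mysql" in lowered else "other"
    return {"text": text, "label": label}


class BatchJobRunner:
    """In-memory async job registry; a real service would persist job state."""

    def __init__(self) -> None:
        self.jobs = {}
        self._tasks = set()

    def submit(self, texts: list) -> str:
        # Must be called from inside a running event loop (e.g. an async API handler).
        job = BatchJob(job_id=uuid.uuid4().hex, texts=list(texts))
        self.jobs[job.job_id] = job
        task = asyncio.create_task(self._run(job))
        self._tasks.add(task)  # keep a reference so the task is not garbage-collected
        task.add_done_callback(self._tasks.discard)
        return job.job_id

    async def _run(self, job: BatchJob) -> None:
        job.status = "running"
        try:
            # Run blocking inference in a worker thread so the event loop stays responsive.
            job.results = await asyncio.to_thread(lambda: [classify(t) for t in job.texts])
            job.status = "succeeded"
        except Exception:
            job.status = "failed"


async def main() -> BatchJob:
    runner = BatchJobRunner()
    job_id = runner.submit(["Postgres connection refused", "Pod OOMKilled on node-3"])
    # Poll the way a client would poll a job-status endpoint.
    while runner.jobs[job_id].status in ("queued", "running"):
        await asyncio.sleep(0.01)
    return runner.jobs[job_id]


job = asyncio.run(main())
print(job.status, [r["label"] for r in job.results])
```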

Validation

GitHub Actions on main: Success
lint-and-test: passed
showcase-smoke: passed
api-smoke: passed

Local validation:

uv run --extra dev ruff check .
uv run --extra dev --extra api pytest -q

Result:

ruff: All checks passed
pytest: 40 passed, 10 skipped

Data And Model Notes

The current starter dataset is synthetic.
This release should be treated as a reproducible portfolio-grade classifier baseline.
It does not claim to generalize to real production incidents.
This release does not change the model schema or training parameters.

Known Limitations

Docker build validation was not completed locally because the Docker daemon was unavailable.
RAG retrieval, /retrieve, /assist, Vector DB integration, and LLM-generated remediation are planned future work.
Hugging Face publishing is intentionally deferred until the model card and publish artifact are aligned with the release train.

Next Steps

Update docs/model_card.md from legacy version metadata to release-2026.05-classifier-core.
Confirm the intended Hugging Face model artifact.
Publish to Hugging Face after the publish gate is met.

08 Apr 05:58

dongkoony

v0.3.0

ec01936

v0.3.0

What's Changed

chore: back-merge main into develop (v0.2.0) by @dongkoony in #8
feat(api): add /predict/batch endpoint with review summary by @dongkoony in #9
feat(api): add request tracing and prometheus metrics by @dongkoony in #10
feat(mlops): add model benchmark matrix automation by @dongkoony in #13

Full Changelog: v0.2.0...v0.3.0

Contributors

dongkoony

20 Mar 08:04

dongkoony

v0.2.0

c16a60e

v0.2.0

DevOps Incident Triage Model v0.2.0

Included

Raw incident ingestion scaffold (ditri-ingest-raw)
Confidence-threshold based human-review gating
Evaluation threshold sweep output (reports/threshold_metrics.json)
Package/API version bump to 0.2.0
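The sketch below illustrates confidence-threshold gating and a threshold sweep. It assumes that gating means routing any prediction below a confidence threshold to human review. The function names, the strict `<` comparison, and the report fields are all hypothetical; they do not describe the actual schema of `reports/threshold_metrics.json`.

```python
import json


def needs_review(confidence: float, threshold: float) -> bool:
    # Assumption: predictions strictly below the threshold go to a human reviewer.
    return confidence < threshold


def threshold_sweep(predictions: list, thresholds: list) -> list:
    """predictions holds (predicted_label, true_label, confidence) triples."""
    total = len(predictions)
    rows = []
    for t in thresholds:
        auto = [(pred, gold) for pred, gold, conf in predictions if not needs_review(conf, t)]
        correct = sum(1 for pred, gold in auto if pred == gold)
        rows.append({
            "threshold": t,
            # Share of incidents auto-triaged without human review.
            "auto_coverage": len(auto) / total if total else 0.0,
            # Accuracy on the auto-triaged subset only (None if nothing passes the gate).
            "auto_accuracy": correct / len(auto) if auto else None,
            "review_rate": (total - len(auto)) / total if total else 0.0,
        })
    return rows


predictions = [
    ("network", "network", 0.95),
    ("database", "database", 0.88),
    ("database", "network", 0.55),
    ("kubernetes", "kubernetes", 0.72),
]
report = threshold_sweep(predictions, [0.5, 0.7, 0.9])
print(json.dumps(report, indent=2))
```

In this toy data, raising the threshold from 0.5 to 0.7 sends the single misclassified low-confidence prediction to review. Auto-triage accuracy rises from 0.75 to 1.0, while coverage drops to 0.75. A threshold sweep exposes exactly this coverage-versus-accuracy tradeoff.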

Validation

UV_CACHE_DIR=/tmp/uv-cache uv run ruff check .
UV_CACHE_DIR=/tmp/uv-cache uv run pytest -q

20 Mar 07:27

dongkoony

v0.1.0

2c848b8

v0.1.0

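First public release

Hugging Face-based DevOps incident triage project scaffold
Training, evaluation, inference, API/FastAPI, Docker, and CI pipelines
Includes branching strategy, PR template, and release/evidence documents

Note: the current sample data is synthetic starter data.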
