Epic: Part 7 — AI Evaluations
Integration with the AI team's evals platform via Langfuse instrumentation. Business stakeholders define eval criteria in plain language; the platform runs automated LLM judge evaluations against live Glow CI outputs.
Key capabilities:
- Langfuse tracing on all Glow CI LLM calls (RAG retrieval + Gemini synthesis)
- Evals platform integration for automated LLM judge evaluation
- Stakeholder-defined criteria (hallucination, citation coverage, relevance)
- Continuous automated eval runs with results visible to business teams
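The Langfuse tracing capability above follows a common pattern: wrap each LLM call (RAG retrieval, Gemini synthesis) in a decorator that records inputs, outputs, and latency as trace spans. A minimal sketch of that pattern, using a hypothetical in-memory `observe` decorator as a stand-in for the Langfuse SDK (function names and the `TRACES` store are illustrative, not the real API):

```python
import functools
import time
import uuid

# In-memory trace store (stand-in for the Langfuse backend).
TRACES = []

def observe(name):
    """Hypothetical stand-in for an SDK tracing decorator: records each
    call's inputs, output, and latency as a named span."""
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            start = time.time()
            result = fn(*args, **kwargs)
            TRACES.append({
                "id": str(uuid.uuid4()),
                "name": name,
                "input": {"args": args, "kwargs": kwargs},
                "output": result,
                "latency_s": time.time() - start,
            })
            return result
        return inner
    return wrap

@observe("rag_retrieval")
def retrieve(query):
    # Placeholder for the real RAG retrieval step.
    return ["doc-1", "doc-2"]

@observe("gemini_synthesis")
def synthesize(query, docs):
    # Placeholder for the real Gemini synthesis step.
    return f"Answer to {query!r} citing {len(docs)} sources"

docs = retrieve("refund policy")
answer = synthesize("refund policy", docs)
# Both spans are now recorded and could be shipped to an evals platform,
# where LLM-judge criteria (hallucination, citation coverage, relevance)
# run against the captured input/output pairs.
```

In the real integration, the recorded spans would be exported to Langfuse rather than held in memory, giving the evals platform live Glow CI outputs to judge against.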
Stories
| # | Story | Role | Sprint |
|---|-------|------|--------|
| 7.2 | Instrument Glow CI with Langfuse + connect to evals platform | Engineer | Sprint 5 |
📄 PRD: Part 7 — Glow CI PRD