Evaluation: Decouple from Langfuse

**Describe the current behavior**

Text evaluation depends on Langfuse for LLM-as-a-judge scoring. Each evaluator must be configured manually in the Langfuse UI because the Langfuse SDK doesn't expose an API for setting up evaluators. This is a recurring manual step for every new organization. Langfuse was originally chosen because Kaapi lacked batch evaluation runs and a UI to surface results.

**Describe the enhancement you'd like**
Decouple the evaluation pipeline from Langfuse:
- Run LLM-as-a-judge scoring natively inside Kaapi (judge prompts, model calls, scoring schema managed in our DB/codebase)
- Persist evaluator definitions and results in Kaapi so new evaluators can be created via API/UI without touching Langfuse
- Retain Langfuse only for tracing/observability, not as a hard dependency for evaluation execution
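
As a rough illustration of the first two points, here is a minimal sketch of what a native judge run could look like. Everything in it is an assumption made for illustration: `EvaluatorDefinition`, `EvaluationResult`, `run_evaluator`, and the JSON reply shape (`score`, `reasoning`) are hypothetical names, not existing Kaapi code. The model call is passed in as a plain callable, so the sketch does not assume any particular LLM provider or SDK.

```python
import json
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class EvaluatorDefinition:
    """Hypothetical evaluator record; in Kaapi this could be a DB row."""
    name: str
    # Template with {question} and {answer} placeholders (str.format syntax,
    # so literal braces in the prompt would need to be doubled).
    judge_prompt: str
    min_score: float = 0.0
    max_score: float = 1.0


@dataclass(frozen=True)
class EvaluationResult:
    """Hypothetical result record to persist alongside the evaluation run."""
    evaluator: str
    score: float
    reasoning: str


# Any callable that takes a prompt and returns the judge model's raw text reply.
JudgeFn = Callable[[str], str]


def run_evaluator(
    definition: EvaluatorDefinition,
    question: str,
    answer: str,
    judge: JudgeFn,
) -> EvaluationResult:
    """Render the judge prompt, call the judge, and validate its score."""
    prompt = definition.judge_prompt.format(question=question, answer=answer)
    payload = json.loads(judge(prompt))  # assumed reply: {"score": ..., "reasoning": ...}
    score = float(payload["score"])
    if not definition.min_score <= score <= definition.max_score:
        raise ValueError(f"score {score} outside the allowed range for {definition.name}")
    return EvaluationResult(definition.name, score, str(payload.get("reasoning", "")))


# Example definition; the prompt wording is illustrative only.
RELEVANCE = EvaluatorDefinition(
    name="relevance",
    judge_prompt=(
        "Rate from 0 to 1 how well the answer addresses the question.\n"
        "Question: {question}\nAnswer: {answer}\n"
        "Reply with a JSON object containing the keys score and reasoning."
    ),
)


def fake_judge(prompt: str) -> str:
    """Stand-in for a real model call, used here so the sketch runs offline."""
    return json.dumps({"score": 0.8, "reasoning": "Mostly addresses the question."})


result = run_evaluator(RELEVANCE, "What is Kaapi?", "An AI platform.", fake_judge)
```

Injecting the judge as a callable keeps the scoring logic testable without network access. Tracing to Langfuse could then wrap the `judge` call without the evaluation path depending on it.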

**Why is this enhancement needed?**
- Kaapi now has both batch operations and a results UI, removing the original justification for the Langfuse dependency
- Removes the manual Langfuse setup currently required for each evaluator
- Makes evaluations runnable end-to-end via Kaapi APIs
- Speeds up iteration on new judge prompts and scoring criteria

**Additional context**
Langfuse remains valuable for tracing and observability. This change removes it only from the evaluation execution path.
