Evaluation: Decouple from Langfuse #864

@AkhileshNegi

Describe the current behavior

Text evaluation depends on Langfuse for LLM-as-a-judge scoring. Each evaluator must be configured manually in the Langfuse UI because the Langfuse SDK doesn't expose an API for setting up evaluators, which makes this a recurring manual step for every new organization. Langfuse was originally chosen because Kaapi lacked batch evaluation runs and a UI to surface results.

Describe the enhancement you'd like
Decouple the evaluation pipeline from Langfuse:

  • Run LLM-as-a-judge scoring natively inside Kaapi (judge prompts, model calls, scoring schema managed in our DB/codebase)
  • Persist evaluator definitions and results in Kaapi so new evaluators can be created via API/UI without touching Langfuse
  • Retain Langfuse only for tracing/observability, not as a hard dependency for evaluation execution
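
To make the three points above concrete, here is a minimal Python sketch of what a native judge run could look like. It is not Kaapi's actual code: `EvaluatorDefinition`, `EvaluationResult`, `run_evaluator`, and the JSON reply shape (`score`, `reasoning`) are all hypothetical names chosen for illustration. The model call and the tracing hook are passed in as plain callables, so the sketch assumes nothing about any provider or about the Langfuse SDK, and a tracing failure never blocks the evaluation.

```python
import json
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class EvaluatorDefinition:
    """Hypothetical evaluator record, as it might be stored in Kaapi's DB."""
    name: str
    judge_prompt: str  # template with {question} and {answer} placeholders
    min_score: float = 0.0
    max_score: float = 1.0


@dataclass(frozen=True)
class EvaluationResult:
    """Hypothetical result record persisted alongside the evaluator."""
    evaluator: str
    score: float
    reasoning: str


# Assumption: the judge is any callable that takes a prompt and returns the
# model's raw text reply. The tracer is any callable that accepts a result.
JudgeFn = Callable[[str], str]
TraceFn = Callable[[EvaluationResult], None]


def run_evaluator(
    definition: EvaluatorDefinition,
    question: str,
    answer: str,
    judge: JudgeFn,
    trace: Optional[TraceFn] = None,
) -> EvaluationResult:
    """Score one answer with an LLM judge and return a validated result."""
    prompt = definition.judge_prompt.format(question=question, answer=answer)
    payload = json.loads(judge(prompt))  # assumed reply: JSON object

    score = float(payload["score"])
    if not definition.min_score <= score <= definition.max_score:
        raise ValueError(f"score {score} outside allowed range")

    result = EvaluationResult(
        evaluator=definition.name,
        score=score,
        reasoning=str(payload.get("reasoning", "")),
    )

    # Tracing is optional and best-effort: an observability failure must
    # not fail the evaluation itself.
    if trace is not None:
        try:
            trace(result)
        except Exception:
            pass
    return result


# Usage with a stub judge standing in for a real model call.
def fake_judge(prompt: str) -> str:
    return '{"score": 0.8, "reasoning": "Mostly correct."}'


relevance = EvaluatorDefinition(
    name="relevance",
    judge_prompt=(
        "Question: {question}\nAnswer: {answer}\n"
        "Reply with a JSON object holding 'score' (0 to 1) and 'reasoning'."
    ),
)
result = run_evaluator(relevance, "What is Kaapi?", "A platform.", fake_judge)
```

Keeping the judge and the tracer as injected callables is one way to get the decoupling described here: the evaluation path can run and be tested without Langfuse, while a tracing callback can still be attached where observability is wanted.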

Why is this enhancement needed?

  • Kaapi now has both batch operations and a results UI, removing the original justification for the Langfuse dependency
  • No manual Langfuse setup per evaluator
  • Evaluations runnable end-to-end via Kaapi APIs
  • Faster iteration on new judge prompts and scoring criteria

Additional context
Langfuse remains valuable for tracing and observability; this change removes it only from the evaluation execution path.
