LLM-as-a-judge available models #4123
Replies: 1 comment
Hi @comeRodriguez! I'm Dosu and I'm helping the agenta team. You're right: the LLM-as-a-judge evaluator currently uses a hardcoded list of 5 models (gpt-3.5-turbo, gpt-4o, and three Claude variants) defined in the backend settings template [1]. Custom LiteLLM providers configured in the Model Hub aren't available to the evaluator, even though they work in the playground.

The good news is that the maintainers are already aware of this gap and are tracking it as AGE-3707, "LLM-as-a-judge: use same model list as prompts". The plan is to replace the hardcoded model list with the same dynamic model registry the playground uses, which would let you use your custom LiteLLM providers for evaluation too. That work is currently blocked by the evaluator playground migration (AGE-3656), which involves rethinking how evaluator schemas work, so it's on the roadmap but has no immediate timeline yet [2].
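To make the difference between a hardcoded list and a shared registry concrete, here is a minimal Python sketch. Everything in it is hypothetical: `HARDCODED_JUDGE_MODELS`, `ProviderEntry`, and `judge_model_choices` are illustrative names, not agenta's actual code or API. The "provider/model" string format is also an assumption.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for the hardcoded evaluator list described above.
# Only the two OpenAI names from the reply are listed; the three Claude
# variants are not named in the source, so they are left out here.
HARDCODED_JUDGE_MODELS = ["gpt-3.5-turbo", "gpt-4o"]


@dataclass
class ProviderEntry:
    """Hypothetical record for a provider configured in the Model Hub."""
    name: str
    models: list[str] = field(default_factory=list)


def judge_model_choices(registry: list[ProviderEntry]) -> list[str]:
    """Build the judge dropdown from the same registry the playground uses.

    Returns "provider/model" strings (an assumed naming convention) in
    registry order, with duplicate entries removed.
    """
    seen: set[str] = set()
    choices: list[str] = []
    for provider in registry:
        for model in provider.models:
            key = f"{provider.name}/{model}"
            if key not in seen:
                seen.add(key)
                choices.append(key)
    return choices


# Usage: a custom LiteLLM provider shows up alongside the built-in one.
registry = [
    ProviderEntry("openai", ["gpt-4o"]),
    ProviderEntry("my-litellm", ["my-custom-model"]),  # hypothetical provider
]
print(judge_model_choices(registry))
```

With a registry-driven list, any provider added in the Model Hub automatically becomes selectable for the judge, instead of requiring an edit to a fixed list.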
Hello!
I have a question about the available-models dropdown for LLM-as-a-judge in the Create Evaluator UI.
Today it's only possible to choose from a closed list of models from the OpenAI and Anthropic providers.
I managed to create a custom LiteLLM provider for the playground, but it seems I can't use it for LLM-as-a-judge in the UI. Is this by design, or could it become a feature in a future release?