chat model evaluation #1870

jordane95 · 2024-05-22T07:02:29Z

There are now automatic ways to evaluate the chat performance of aligned language models, such as MT-Bench. It replaces human scoring along several dimensions with scores from another strong language model, such as GPT-4, acting as a proxy judge. Is there a plan to add this type of eval?
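For illustration, the judge-as-proxy idea can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not MT-Bench's actual code: the prompt wording, the double-bracket rating convention, the function names (`build_judge_prompt`, `parse_rating`, `judge_answer`), and the `fake_judge` stub are all hypothetical. A real setup would call a strong model such as GPT-4 in place of the stub.

```python
import re
from typing import Callable, Optional

# Hypothetical judge prompt; the wording is an assumption for illustration.
JUDGE_TEMPLATE = (
    "Please act as an impartial judge and rate the assistant's answer "
    "to the user question below on a scale of 1 to 10.\n"
    "Finish with the rating in the form [[rating]].\n\n"
    "[Question]\n{question}\n\n[Answer]\n{answer}\n"
)


def build_judge_prompt(question: str, answer: str) -> str:
    """Fill the judge template with one question/answer pair."""
    return JUDGE_TEMPLATE.format(question=question, answer=answer)


def parse_rating(judgment: str) -> Optional[float]:
    """Extract the first [[number]] from the judge's reply, or None if absent."""
    match = re.search(r"\[\[(\d+(?:\.\d+)?)\]\]", judgment)
    return float(match.group(1)) if match else None


def judge_answer(
    question: str, answer: str, judge: Callable[[str], str]
) -> Optional[float]:
    """Ask the judge model (any prompt -> text callable) to score one answer."""
    return parse_rating(judge(build_judge_prompt(question, answer)))


def fake_judge(prompt: str) -> str:
    """Stand-in for a real judge model call; always returns a fixed reply."""
    return "The answer is correct and concise. Rating: [[8]]"


score = judge_answer("What is 2 + 2?", "4", fake_judge)
print(score)
```

Passing the judge in as a plain callable keeps the scoring logic independent of any particular model API, so the same harness could wrap whichever judge model is available.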
