There are now automatic ways to evaluate the chat performance of aligned language models, such as MT-Bench. These replace human scoring along different dimensions with a strong language model, such as GPT-4, acting as a proxy judge. Is there a plan to add this type of eval?
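To make the request concrete, here is a minimal sketch of the model-as-judge idea, not a proposal for how this repository should implement it. Everything in it is an assumption for illustration: the prompt wording, the `[[score]]` rating format (loosely modeled on MT-Bench-style judge prompts), and the names `build_judge_prompt`, `parse_rating`, and `judge_answer`. The judge is passed in as a plain callable so that no particular model API is assumed.

```python
import re
from typing import Callable, Optional

# Hypothetical judge prompt; the wording and the [[score]] convention are
# assumptions for illustration, not taken from any specific benchmark.
JUDGE_TEMPLATE = (
    "You are an impartial judge. Rate the assistant's answer to the user "
    "question on a scale of 1 to 10 for helpfulness, relevance, and accuracy.\n"
    "End with the rating in the form: Rating: [[score]]\n\n"
    "[Question]\n{question}\n\n[Answer]\n{answer}\n"
)


def build_judge_prompt(question: str, answer: str) -> str:
    """Fill the judge template with one question/answer pair."""
    return JUDGE_TEMPLATE.format(question=question, answer=answer)


def parse_rating(judgment: str) -> Optional[float]:
    """Extract the first numeric [[score]] from the judge's reply, if any."""
    match = re.search(r"\[\[(\d+(?:\.\d+)?)\]\]", judgment)
    return float(match.group(1)) if match else None


def judge_answer(
    question: str, answer: str, judge: Callable[[str], str]
) -> Optional[float]:
    """Score one answer by sending the filled prompt to `judge`.

    `judge` is any function mapping a prompt string to the judge model's
    text reply (for example, a thin wrapper around a strong LLM).
    """
    return parse_rating(judge(build_judge_prompt(question, answer)))


def fake_judge(prompt: str) -> str:
    """Stand-in judge with a canned reply, used only to exercise the code."""
    return "The answer is correct but brief. Rating: [[7]]"


if __name__ == "__main__":
    print(judge_answer("What is 2 + 2?", "4", fake_judge))
```

In a real eval, `fake_judge` would be replaced by a call to the strong model, and the per-question scores would be averaged across the benchmark's prompts.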