Evaluation results are always 0 and differ from the Leaderboard #69
Comments
Actually, I am facing the same issue, but with all the tasks. Earlier (a week ago) everything was fine, but now I am getting 0 results for the same model, which previously had positive scores on all tasks.
@lynneChan @harshraj172 Could you provide …
I cannot run a Docker container in my environment, so I can only run the code on branch v0.1. This is the relevant part of the results in …
@lynneChan Hi, the interactions seem to be proceeding normally. This might be a FastChat issue. Maybe you could try gpt-3.5-turbo to see if the issue persists.
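If it helps, the sanity check could be as simple as pointing the evaluation at an OpenAI-backed agent config instead; the module path and parameter names below are illustrative assumptions, not the repo's verified schema:

```yaml
# Hypothetical entry for the gpt-3.5-turbo sanity check; the module path and
# parameter names are assumptions, not AgentBench's actual schema.
module: src.agents.OpenAIChatAgent   # assumed module path
parameters:
  model: gpt-3.5-turbo
  temperature: 0
```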
I want to evaluate vicuna_7b_v1.5 on the webshop task. According to `configs/agents/fastchat_client.yaml`, the agent config is set as follows. The vicuna_7b_v1.5 model is deployed with a FastChat controller and model_worker.
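For reference, a FastChat-backed agent entry in that file might look roughly like the sketch below; the module path and parameter names (`controller_address`, `max_new_tokens`, and so on) are assumptions about the schema, not copied from the repo:

```yaml
# Illustrative sketch only; the real schema depends on the AgentBench version.
# model_name must match the name the FastChat model_worker registers with the
# controller, otherwise requests cannot be routed to the vicuna worker.
module: src.agents.FastChatAgent        # assumed module path
parameters:
  model_name: vicuna-7b-v1.5
  controller_address: http://localhost:21001   # FastChat's default controller port
  temperature: 0
  max_new_tokens: 512
```

The controller/worker deployment itself would be the standard FastChat setup, along these lines (model path and ports are illustrative):

```bash
# Standard FastChat serving commands; run each in its own shell.
python3 -m fastchat.serve.controller --host 0.0.0.0 --port 21001
python3 -m fastchat.serve.model_worker \
    --model-path lmsys/vicuna-7b-v1.5 \
    --controller-address http://localhost:21001 \
    --worker-address http://localhost:21002 \
    --host 0.0.0.0 --port 21002
```

A mismatch between the model name in the agent config and the name the worker registers can make every request fail silently, which would show up as exactly this kind of all-zero reward.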
The evaluation command is:
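(Roughly along these lines; the `eval.py` entry point and flag names are assumptions for the v0.1 branch, not verified against the repo, so check the README for the real invocation:)

```bash
# Assumed v0.1-style invocation -- verify the flags against the actual CLI.
python eval.py \
    --task configs/tasks/webshop.yaml \
    --agent configs/agents/fastchat_client.yaml
```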
After the run finishes, I get the following results:

The reward is always 0 and differs from the leaderboard. What went wrong? Could anyone help?