Open
Description
I found that metrics such as EM and F1 can be negative during the initial evaluation.
This happens because metric calculation shares the same function as reward calculation. However, the reward is set to -1 when there is a format error, so we can also get a score of -1 in evaluation. I think this should be improved. In verl/trainer/ppo/ray_trainer.py, in _validate(), we should clamp the reward tensors at zero:
reward_tensor = torch.clamp(reward_tensor, min=0.0)
em_reward_tensor = torch.clamp(em_reward_tensor, min=0.0)
llm_reward_tensor = torch.clamp(llm_reward_tensor, min=0.0)
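To illustrate the effect, here is a minimal dependency-free sketch of why the averaged metric goes negative and what clamping changes. It uses plain Python lists rather than tensors; `clamp_min`, `FORMAT_ERROR_PENALTY`, and the sample scores are hypothetical names and values made up for illustration, not code from the repository.

```python
from statistics import mean

# Assumed penalty value, taken from the description above (-1 on format error).
FORMAT_ERROR_PENALTY = -1.0


def clamp_min(scores, floor=0.0):
    """Plain-Python analogue of torch.clamp(t, min=floor) for a flat list."""
    return [max(s, floor) for s in scores]


# Hypothetical per-sample EM scores: one correct, one wrong, three format errors.
em_scores = [1.0, 0.0] + [FORMAT_ERROR_PENALTY] * 3

# Without clamping, the format penalty leaks into the reported metric.
raw_em = mean(em_scores)

# With clamping, a format error counts as a plain miss (score 0).
clamped_em = mean(clamp_min(em_scores))

print(f"raw EM: {raw_em:.2f}, clamped EM: {clamped_em:.2f}")
```

With these made-up scores the raw mean is negative while the clamped mean stays in [0, 1]. Clamping only changes what is reported in validation; the training reward can keep its -1 penalty.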