I am trying to apply RLHF to a text classification task. Take emotion classification as the example, with the classifier acting as the policy model. The pretrained model outputs a class number between 1 and 10. The reward model would be trained on a dataset labelled with the correct class numbers (assuming such a dataset is available). Finally, I want to optimize the policy model against the reward model using PPO.
Can this be done with this library? If so, please help by illustrating the steps.
Thanks
Here are two suggestions: 1. Text classification may not require complex RL algorithms. 2. If using RLHF, consider changing the output to label text instead of label number.
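To illustrate the second suggestion, here is a minimal sketch of mapping class numbers to label text and scoring a generated label against the gold class. The label names, the prompt template, and the helper names (`build_prompt`, `label_to_text`, `reward`) are all assumptions made for illustration, since the issue only says the classes range from 1 to 10. The reward is a simple exact-match rule, not a learned reward model and not part of any library's API.

```python
from typing import Dict

# Hypothetical label names: the issue only states that classes range from 1 to 10.
EMOTION_LABELS: Dict[int, str] = {
    1: "joy", 2: "sadness", 3: "anger", 4: "fear", 5: "surprise",
    6: "disgust", 7: "love", 8: "shame", 9: "guilt", 10: "neutral",
}


def build_prompt(text: str) -> str:
    """Wrap the input text in a prompt so a generative policy answers with a label word."""
    options = ", ".join(EMOTION_LABELS.values())
    return f"Text: {text}\nEmotion (one of: {options}):"


def label_to_text(class_id: int) -> str:
    """Convert a class number (1-10) into its label text."""
    return EMOTION_LABELS[class_id]


def reward(generated: str, gold_class_id: int) -> float:
    """Return 1.0 if the generated label matches the gold label exactly, else 0.0."""
    predicted = generated.strip().lower()
    return 1.0 if predicted == label_to_text(gold_class_id) else 0.0
```

A scalar like this, computed per sample, is the kind of value a PPO loop would consume as the reward. When gold labels are available, a rule of this kind may be enough without training a separate reward model, which is also why plain supervised fine-tuning is often the simpler choice for classification.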