It seems that the current code only implements the reward loss; what about the pretrain loss mentioned in the paper? Are the reward loss and pretrain loss trained sequentially?
#24 · Closed · lkh329 opened this issue on Jun 13, 2023 · 3 comments
Because ReFL on its own is simple to use and already achieves decent results, the code provided here implements only ReFL, without the pre-training data, in order to present the core of the ReFL algorithm in a simple way. If you wish, you can add the pre-training data yourself. The grad scale of 1e-3 and 100 training steps were likewise chosen for simplicity.
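If you do add the pre-training data yourself, one straightforward option is to combine the two objectives in a single joint loss, scaling the reward term down as this repo does. The sketch below is only illustrative — `combined_loss`, `grad_scale`, and `pretrain_weight` are hypothetical names, not this repository's API, and in a real training loop the arguments would be tensors rather than floats:

```python
# Hypothetical sketch: mixing the ReFL reward loss with a standard
# pre-training (denoising) loss in one optimizer step. All names here
# are illustrative assumptions, not part of this repo.

def combined_loss(reward_loss, pretrain_loss,
                  grad_scale=1e-3, pretrain_weight=1.0):
    # Scale the reward loss down (this repo uses grad scale 1e-3) so it
    # does not overwhelm the pre-training objective, then sum the two
    # terms so a single backward pass updates on both.
    return grad_scale * reward_loss + pretrain_weight * pretrain_loss
```

An alternative to this joint sum is alternating batches of reward-loss and pre-training-loss updates; either way avoids training the two losses in fully separate sequential phases.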