
It seems that the current code only implements the reward loss; what about the pretrain loss mentioned in the paper? Are the reward loss and pretrain loss trained sequentially? #24

Closed
lkh329 opened this issue Jun 13, 2023 · 3 comments

Comments


lkh329 commented Jun 13, 2023

No description provided.

@WuJie1010

Very nice work! I have the same question. I'm just confused about why the grad scale is 1e-3 and why training runs for only 100 steps. Thanks a lot.

xujz18 (Member) commented Jun 19, 2023

Because using ReFL alone is simpler and already achieves decent results, the code provided here is plain ReFL, without the pre-training data, in order to present the core of the ReFL algorithm in a simple way. If you wish, you can add the pre-training data yourself. The grad scale of 1e-3 and the 100 training steps were likewise chosen for simplicity.
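For reference, a rough sketch (not the released code) of how the paper's combined objective could be wired on top of the ReFL script: the reward loss and the standard denoising (pre-training) loss are computed in the same step and summed, rather than trained sequentially. All names here (`unet`, `vae`, `scheduler`, `reward_model`, the batch keys, the chain length, and the `PRETRAIN_W` weight) are placeholder assumptions for the corresponding pieces in `train_refl.py`, and the reward-to-loss mapping is only assumed to match the released script.

```python
import torch
import torch.nn.functional as F

GRAD_SCALE = 1e-3   # reward-loss scale, matching the released script's setting
PRETRAIN_W = 1.0    # weight of the pre-training loss (assumed, not from the repo)

def training_step(unet, vae, scheduler, reward_model, refl_batch, pretrain_batch):
    """One combined step: ReFL reward loss + standard denoising loss."""
    # ---- ReFL branch: run the sampling chain, back-prop only through the last step ----
    scheduler.set_timesteps(40)                             # chain length (assumed)
    latents = torch.randn_like(refl_batch["latents"])
    with torch.no_grad():                                   # no grad for early steps
        for t in scheduler.timesteps[:-1]:
            eps = unet(latents, t, refl_batch["prompt_embeds"]).sample
            latents = scheduler.step(eps, t, latents).prev_sample
    t_last = scheduler.timesteps[-1]
    eps = unet(latents, t_last, refl_batch["prompt_embeds"]).sample
    latents = scheduler.step(eps, t_last, latents).prev_sample
    images = vae.decode(latents / vae.config.scaling_factor).sample
    rewards = reward_model(refl_batch["prompts"], images)   # hypothetical interface
    reward_loss = F.relu(-rewards + 2).mean()               # hinge-style map (assumed)

    # ---- pre-training branch: ordinary noise-prediction loss on pre-training data ----
    noise = torch.randn_like(pretrain_batch["latents"])
    ts = torch.randint(0, scheduler.config.num_train_timesteps,
                       (noise.shape[0],), device=noise.device)
    noisy = scheduler.add_noise(pretrain_batch["latents"], noise, ts)
    pred = unet(noisy, ts, pretrain_batch["prompt_embeds"]).sample
    pretrain_loss = F.mse_loss(pred.float(), noise.float())

    # the paper optimizes both jointly (a weighted sum), not sequentially
    return GRAD_SCALE * reward_loss + PRETRAIN_W * pretrain_loss
```

Back-propagating through only the final denoising step keeps memory bounded; the paper's ReFL actually truncates at a randomly chosen late step, which this sketch simplifies.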

lkh329 (Author) commented Jun 19, 2023 via email
