
Derive RewardModel from PreTrainedModel #2158

Merged · 4 commits · Mar 22, 2023
Conversation

andreaskoepf (Collaborator)

A first, simple way to derive our RewardModel from PreTrainedModel to simplify loading. It still requires a full download of the base model, which could probably be avoided in the future.
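For illustration, a minimal sketch of the pattern (not this PR's actual code): a subclass of `PreTrainedModel` with its own `PretrainedConfig` gets `save_pretrained`/`from_pretrained` for free, which is what simplifies loading. `RewardConfig` and the tiny linear "backbone" below are hypothetical stand-ins.

```python
# Sketch, assuming hypothetical RewardConfig/RewardModel names: deriving a
# reward model from PreTrainedModel so save_pretrained/from_pretrained work.
import tempfile

import torch
from transformers import PretrainedConfig, PreTrainedModel


class RewardConfig(PretrainedConfig):
    model_type = "reward_model"  # hypothetical model_type key

    def __init__(self, hidden_size=16, **kwargs):
        super().__init__(**kwargs)
        self.hidden_size = hidden_size


class RewardModel(PreTrainedModel):
    config_class = RewardConfig

    def __init__(self, config):
        super().__init__(config)
        # In the real model this would be a full transformer backbone;
        # a single linear layer keeps the sketch self-contained.
        self.backbone = torch.nn.Linear(config.hidden_size, config.hidden_size)
        self.value_head = torch.nn.Linear(config.hidden_size, 1)

    def forward(self, hidden_states):
        # Score only the last position of each sequence.
        return self.value_head(self.backbone(hidden_states))[:, -1, :]


model = RewardModel(RewardConfig())
scores = model(torch.randn(2, 4, 16))  # (batch=2, seq=4, hidden=16)

# Deriving from PreTrainedModel is what makes this roundtrip work:
with tempfile.TemporaryDirectory() as d:
    model.save_pretrained(d)
    reloaded = RewardModel.from_pretrained(d)
```

The config travels with the weights in the saved directory, so a later `from_pretrained` rebuilds the same architecture without extra bookkeeping.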

@sanagno (Collaborator) left a comment

LGTM

@dvruette (Collaborator) commented Mar 21, 2023

Looks good! Does this work with AutoModel? If so, how exactly? I would expect it to work with AutoModelForSequenceClassification.from_pretrained.

Also, why not inherit from GPTNeoXModel? Could save us the trouble of loading the base model. We could basically just copy GPTNeoXForCausalLM and add pooling between hidden states and head.

@theblackcat102 (Collaborator) left a comment

LGTM

@andreaskoepf (Collaborator, Author)

> Looks good! Does this work with AutoModel? If so, how exactly? I would expect it to work with AutoModelForSequenceClassification.from_pretrained.

This PR currently does not support loading via AutoModel; it requires explicit use of the RewardModel class. If you have time, you could look into what is necessary to register it for AutoModel loading.
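A sketch of what such a registration could look like, using transformers' `AutoConfig.register`/`AutoModel.register` API. The `RewardConfig`/`RewardModel` classes here are minimal hypothetical stand-ins, not the classes from this repo.

```python
# Sketch: registering a custom class so AutoModel can resolve it.
import torch
from transformers import AutoConfig, AutoModel, PretrainedConfig, PreTrainedModel


class RewardConfig(PretrainedConfig):
    model_type = "reward_model"  # hypothetical model_type key

    def __init__(self, hidden_size=16, **kwargs):
        super().__init__(**kwargs)
        self.hidden_size = hidden_size


class RewardModel(PreTrainedModel):
    config_class = RewardConfig

    def __init__(self, config):
        super().__init__(config)
        self.value_head = torch.nn.Linear(config.hidden_size, 1)

    def forward(self, hidden_states):
        return self.value_head(hidden_states)[:, -1, :]


# Map the model_type string to the config class, and the config class to the
# model class; AutoModel can then dispatch on "reward_model" in config.json.
AutoConfig.register("reward_model", RewardConfig)
AutoModel.register(RewardConfig, RewardModel)

model = AutoModel.from_config(RewardConfig())
```

With the registration in place, `AutoModel.from_pretrained` on a saved checkpoint would dispatch to `RewardModel` via the `model_type` stored in its `config.json`.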

> Also, why not inherit from GPTNeoXModel? Could save us the trouble of loading the base model. We could basically just copy GPTNeoXForCausalLM and add pooling between hidden states and head.

It isn't derived from a single architecture so that different types of base models can be used.
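A sketch of why that flexibility matters: the same reward head can sit on top of any backbone that returns `last_hidden_state`. The `build_reward_model`/`score` helpers and the tiny config values are made up for illustration; `from_config` builds an untrained model, so nothing is downloaded.

```python
# Sketch: an architecture-agnostic reward wrapper over any AutoModel backbone.
import torch
from transformers import AutoConfig, AutoModel


def build_reward_model(model_type, **config_kwargs):
    # Build an untrained backbone from config only -- no weight download.
    backbone = AutoModel.from_config(AutoConfig.for_model(model_type, **config_kwargs))
    head = torch.nn.Linear(backbone.config.hidden_size, 1)
    return backbone, head


def score(backbone, head, input_ids):
    hidden = backbone(input_ids=input_ids).last_hidden_state
    return head(hidden)[:, -1, :]  # reward taken at the last token position


# Swapping "gpt_neox" for another model_type reuses the exact same wrapper.
backbone, head = build_reward_model(
    "gpt_neox",
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=2,
    num_attention_heads=4,
    intermediate_size=64,
)
input_ids = torch.randint(0, 100, (2, 8))
rewards = score(backbone, head, input_ids)
```

Inheriting from GPTNeoXModel would hard-wire the wrapper to that one architecture; keeping the backbone behind AutoModel leaves the choice of base model open.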

@andreaskoepf andreaskoepf merged commit 99b42bc into main Mar 22, 2023
1 check passed
@andreaskoepf andreaskoepf deleted the rm_pre_trained_model branch March 22, 2023 00:58