Feature/remove reward instructor #2289
Conversation
❌ pre-commit failed.
In terms of checking whether we need functionality from …
@CloseChoice since you are actively developing in the ML parts of Open-Assistant, I would like to invite you to join the OA ML team on Discord. Please ping me (andreaskoepf).
Changes look good. Not sure we actually still need the old RM configs, but it's probably fine to keep them around for a little longer.
thx!
@theblackcat102 Is all relevant RM code already part of the new trainer_rm so that the old reward/instructor can be deleted? At least the old rank datasets were not ported yet. What about the rankgen loss, etc.?
Do we have a dataset with which I can check whether that works correctly? I don't have access to the private OA data, but I guess any other dataset suitable for RM should suffice, even if I have to manipulate it a bit.
@andreaskoepf rankgen didn't have good results to back it up anyway. I think it's fine if we just move on.
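(For context on what would replace the rankgen loss: the common pairwise reward-model objective is a log-sigmoid margin over chosen/rejected scores. A minimal sketch, assuming the model produces a scalar score per reply; this is not necessarily the repo's actual implementation:)

```python
import torch
import torch.nn.functional as F

def pairwise_rm_loss(chosen_scores: torch.Tensor, rejected_scores: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry style pairwise loss: -log(sigmoid(r_chosen - r_rejected))."""
    return -F.logsigmoid(chosen_scores - rejected_scores).mean()
```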
@CloseChoice I saw you added RM configs for deberta; however, the existing code doesn't work for deberta due to the limited choices in TOKENIZER_CONFIGS under utils.py. Have you tried running a webgpt example with these new RM configs?
@theblackcat102 I tried, but so far I failed. Currently we only support … I just added the configs from …
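(For reference, a hedged sketch of what extending such a tokenizer-config mapping for deberta might look like; `TokenizerConfig`, its fields, and the special tokens here are assumptions, not verified against the actual `model_training/utils.py`:)

```python
from dataclasses import dataclass, field

# Hypothetical sketch only: the real TOKENIZER_CONFIGS in model_training/utils.py
# may use different fields and keys.
@dataclass
class TokenizerConfig:
    special_tokens: dict = field(default_factory=dict)

TOKENIZER_CONFIGS = {
    # assumed new entry for deberta; the pad token name is a guess
    "deberta-v3": TokenizerConfig(special_tokens={"pad_token": "[PAD]"}),
}
```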
❌ pre-commit failed.
I ran `python trainer_rm.py --configs defaults_rm debug_rm` with this branch and ran into the following error:

```
File "xxx/Open-Assistant/model/.venv/lib/python3.10/site-packages/torch/cuda/amp/grad_scaler.py", line 210, in _unscale_grads_
    raise ValueError("Attempting to unscale FP16 gradients.")
ValueError: Attempting to unscale FP16 gradients.
```

I'm not the first to have this problem, but any ideas how to fix it?
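A frequent cause of this ValueError is keeping the model's parameters in fp16 (e.g. via `model.half()`) while also using `GradScaler`, whose `unscale_` expects fp32 gradients; autocast is supposed to handle the fp16 compute instead. A minimal sketch of the usual pattern in generic PyTorch, not the actual trainer_rm setup:

```python
import torch
import torch.nn as nn

# Keep parameters in fp32 (do NOT call model.half()); GradScaler.unscale_
# raises "Attempting to unscale FP16 gradients." when grads are fp16.
model = nn.Linear(16, 1).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler()

x = torch.randn(8, 16, device="cuda")
optimizer.zero_grad()
with torch.autocast(device_type="cuda", dtype=torch.float16):
    loss = model(x).mean()        # forward runs in fp16 where safe
scaler.scale(loss).backward()
scaler.unscale_(optimizer)        # ok here: parameters and grads are fp32
scaler.step(optimizer)
scaler.update()
```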
❌ pre-commit failed.
I removed the configs where I ran into problems with the tokenizer config.
@CloseChoice the problem with webgpt, hf_summary and …
@theblackcat102 I still get the errors on main.
@theblackcat102 @andreaskoepf @dvruette Any reasons why we should not merge this?
@shahules786 is this a bug?

```diff
-for data in dataset:
+for item in dataset:
```
yes, that is a bug I fixed on the fly
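(To illustrate the failure mode: renaming a loop variable without updating the body silently reads a stale outer variable rather than raising an error. A toy sketch, not the OA code itself:)

```python
dataset = ["a", "b", "c"]
item = "stale"

# Bug: the loop binds `data`, but the body still reads `item`,
# so every iteration sees the stale outer value instead of the element.
for data in dataset:
    print(item)   # prints "stale" three times

# Fixed: loop variable and body agree.
for item in dataset:
    print(item)   # prints "a", "b", "c"
```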
LGTM
closes #2049
I updated `model_training/utils.py` with all the functionality I could find diverging from `reward/instructor`.

I still have a few questions:

- Do we still need functionality from `reward/instructor`, e.g. in `experimental_dataset.py` and `cls_dataset.py`, and also the `webgpt_return_format` function in utils?
- Should we use `webgpt` and `hf_summary` as reward model training data? Currently only the `oasst_export` data is defined as training data for the RM model.
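(For illustration of what RM training data generally needs, regardless of which sources are enabled: one prompt with replies ordered from best to worst, from which a trainer derives (chosen, rejected) pairs. A hypothetical record; field names are made up and not the `oasst_export`/`webgpt` schema:)

```python
# Hypothetical ranking record for RM training.
example = {
    "prompt": "Explain what a reward model does.",
    "replies_ranked_best_to_worst": [
        "It scores candidate answers so RLHF can prefer better ones.",
        "It is a kind of model.",
    ],
}

# A trainer would turn the ranking into (chosen, rejected) pairs:
replies = example["replies_ranked_best_to_worst"]
pairs = [
    (better, worse)
    for i, better in enumerate(replies)
    for worse in replies[i + 1:]
]
print(pairs)
```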