
Feature/remove reward instructor #2289

Merged

Conversation

@CloseChoice (Collaborator) commented Apr 1, 2023

closes #2049

I updated model_training/utils.py with all the functionality I could find that diverges from reward/instructor.

I still have a few questions:

  • do we need all of the dataset processing functionality from reward/instructor, e.g. experimental_dataset.py, cls_dataset.py, and also the webgpt_return_format function in utils?
  • will we use webgpt and hfsummary as reward model training data? Currently only the oasst_export data is defined as training data for the RM model (a sketch of this gating is below).
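
For reference, the gating mentioned in the second question comes down to an allow-list check. A minimal sketch of the idea (the helper name check_rm_dataset is illustrative, not code from this branch):

    # Illustrative sketch of the RM_DATASETS allow-list; not the merged code.
    RM_DATASETS = ("oasst_export",)  # webgpt / hfsummary would be added here

    def check_rm_dataset(name: str) -> None:
        """Raise if `name` is not configured as RM training data."""
        if name not in RM_DATASETS:
            raise ValueError(f"{name!r} is not in RM_DATASETS; valid: {RM_DATASETS}")

    check_rm_dataset("oasst_export")  # passes
    # check_rm_dataset("webgpt")     # would raise until WebGPT is wired in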

@github-actions (bot) commented Apr 1, 2023

pre-commit failed.
Please run pre-commit run --all-files locally and commit the changes.
Find more information in the repository's CONTRIBUTING.md

@andreaskoepf added the ml label Apr 1, 2023
@olliestanley (Collaborator)

In terms of checking whether we need functionality from reward/instructor, I wouldn't worry too much. If any of it gets deleted and turns out to be needed later, it's still in the Git history, so it's easy enough to go back and grab! I'm also not sure we need to add all those configs to the YAML, as most of them are probably outdated now, but I'll leave that up to the ML guys to review.
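
If something from reward/instructor is needed after deletion, recovering it from history takes two commands (the path here is an example):

    # find the commit that deleted the file
    git log --diff-filter=D --oneline -- model/reward/instructor/utils.py
    # restore the file from that commit's parent
    git checkout <deleting-commit>^ -- model/reward/instructor/utils.py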

@CloseChoice marked this pull request as ready for review April 2, 2023 21:14
@andreaskoepf (Collaborator)

@CloseChoice since you are actively developing in the ML parts of Open-Assistant, I would like to invite you to join the OA ML team on Discord. Please ping me (andreaskoepf).

@dvruette (Collaborator) left a comment

Changes look good. Not sure we actually still need the old RM configs, but it's probably fine to keep them around for a little longer.

thx!

@andreaskoepf (Collaborator)

@theblackcat102 Is all relevant RM code already part of the new trainer_rm, so that the old reward/instructor can be deleted? At least the old rank datasets have not been ported yet. What about the rankgen loss, etc.?

@CloseChoice (Collaborator, Author)

Do we have a dataset with which I can check whether that works correctly? I don't have access to the private OA data, but I guess any other dataset suitable for RM training should suffice, even if I have to manipulate it a bit.

@theblackcat102 (Collaborator)

@andreaskoepf rankgen didn't have good results to back it up anyway. I think it's fine if we just move on.
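
For reference, the usual alternative is a plain pairwise log-sigmoid objective over ranked reward scores; a generic sketch of that standard loss (not this repo's exact implementation):

    # Generic pairwise ranking loss over reward-model scores; a sketch of the
    # standard objective, not this repo's exact implementation.
    import torch

    def pairwise_ranking_loss(scores: torch.Tensor) -> torch.Tensor:
        """scores: outputs for answers ordered best-to-worst, shape (k,)."""
        diffs = scores.unsqueeze(1) - scores.unsqueeze(0)  # diffs[i, j] = s_i - s_j
        i, j = torch.triu_indices(len(scores), len(scores), offset=1)
        # For i < j, answer i outranks answer j, so push s_i above s_j.
        return -torch.nn.functional.logsigmoid(diffs[i, j]).mean()

    print(pairwise_ranking_loss(torch.tensor([2.0, 1.0, -0.5])))  # small loss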

@theblackcat102 (Collaborator)

@CloseChoice I saw you added RM configs for deberta; however, the existing code doesn't work for deberta due to the limited choices in TOKENIZER_CONFIGS under utils.py. Have you tried running a webgpt example with these new RM configs?
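
For readers not familiar with it: TOKENIZER_CONFIGS maps model-family substrings to tokenizer settings, and a model name that matches no entry fails the lookup. Roughly this pattern (the entries below are illustrative, not the actual contents of utils.py):

    # Illustrative sketch of the TOKENIZER_CONFIGS lookup pattern; the real
    # entries in model_training/utils.py may differ.
    from dataclasses import dataclass, field

    @dataclass
    class TokenizerConfig:
        special_tokens: dict = field(default_factory=dict)

    TOKENIZER_CONFIGS = {
        "galactica": TokenizerConfig(special_tokens={"pad_token": "<pad>"}),
        "gpt-neox": TokenizerConfig(special_tokens={"pad_token": "<|padding|>"}),
        # no "deberta" entry -> deberta-based RM configs fail the lookup below
    }

    def match_tokenizer_name(model_name: str) -> TokenizerConfig:
        for substring, config in TOKENIZER_CONFIGS.items():
            if substring in model_name.lower():
                return config
        raise ValueError(f"Cannot find tokenizer config for model {model_name!r}")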

@CloseChoice (Collaborator, Author) commented Apr 4, 2023

@theblackcat102 I tried, but so far I have failed. Currently we only support oasst_export as an RM dataset (as given by RM_DATASETS), and I don't see any reward-modelling functionality in the WebGPT dataset (nor under rank_datasets). Should WebGPT just work out of the box if I add it to RM_DATASETS?

I just copied the configs over from reward/instructor as they were.
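
For what it's worth, WebGPT comparisons are pairwise preference data, so wiring it in would need an adapter that yields ranked answers. A sketch assuming the openai/webgpt_comparisons schema on the Hugging Face Hub (not existing code in this branch):

    # Sketch of a pairwise-ranking dataset for RM training, assuming the
    # openai/webgpt_comparisons schema; not existing code in this branch.
    from datasets import load_dataset
    from torch.utils.data import Dataset

    class WebGPTComparisons(Dataset):
        def __init__(self, split: str = "train"):
            self.rows = load_dataset("openai/webgpt_comparisons", split=split)

        def __len__(self):
            return len(self.rows)

        def __getitem__(self, idx):
            row = self.rows[idx]
            prompt = row["question"]["full_text"]
            # Ranking losses usually expect the preferred answer first.
            if row["score_0"] >= row["score_1"]:
                return prompt, [row["answer_0"], row["answer_1"]]
            return prompt, [row["answer_1"], row["answer_0"]]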

@github-actions (bot) commented Apr 4, 2023

pre-commit failed.
Please run pre-commit run --all-files locally and commit the changes.
Find more information in the repository's CONTRIBUTING.md

@CloseChoice (Collaborator, Author)

I fixed the HFSummary dataset and ran

python trainer_rm.py --configs defaults_rm debug_rm

with this branch and ran into the following error:

  File "xxx/Open-Assistant/model/.venv/lib/python3.10/site-packages/torch/cuda/amp/grad_scaler.py", line 210, in _unscale_grads_
    raise ValueError("Attempting to unscale FP16 gradients.")
ValueError: Attempting to unscale FP16 gradients.

I'm not the first to have this problem, but does anyone have ideas on how to fix it?
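
For context on the error itself (general PyTorch AMP behaviour, not specific to this branch): GradScaler.unscale_ refuses FP16 gradients, which usually means the model parameters themselves are float16, e.g. the model was loaded with torch_dtype=torch.float16 while fp16 training is also enabled. The usual fix is to keep master weights in FP32 and let autocast handle the half-precision casts:

    # Minimal AMP sketch: GradScaler needs FP32 master weights, so the model
    # must not be loaded in float16 when training with it. Toy data below.
    import torch
    from torch.cuda.amp import GradScaler, autocast

    model = torch.nn.Linear(16, 1).cuda()   # parameters stay FP32
    optimizer = torch.optim.AdamW(model.parameters())
    scaler = GradScaler()
    batches = [(torch.randn(8, 16, device="cuda"),
                torch.randn(8, 1, device="cuda")) for _ in range(4)]

    for batch, target in batches:
        optimizer.zero_grad()
        with autocast():                    # forward math runs in FP16 where safe
            loss = torch.nn.functional.mse_loss(model(batch), target)
        scaler.scale(loss).backward()
        scaler.unscale_(optimizer)          # the call that raises on FP16 grads
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
        scaler.step(optimizer)
        scaler.update()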

@github-actions (bot) commented Apr 4, 2023

pre-commit failed.
Please run pre-commit run --all-files locally and commit the changes.
Find more information in the repository's CONTRIBUTING.md

@CloseChoice (Collaborator, Author)

I removed the configs where I ran into problems with the tokenizer config.

@theblackcat102 (Collaborator)

@CloseChoice the problems with webgpt, hf_summary, and ValueError: Attempting to unscale FP16 gradients. are all fixed here: Zzzzz

@CloseChoice (Collaborator, Author)

@theblackcat102 I still get the errors on main.

@CloseChoice (Collaborator, Author)

@theblackcat102 @andreaskoepf @dvruette Any reasons why we should not merge this?

Collaborator (inline review comment)

@shahules786 is this a bug?

        for data in dataset:
            for item in dataset:

@CloseChoice (Collaborator, Author)

Yes, that is a bug I fixed on the fly.
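
For anyone reading along, the presumable fix (my reading of the snippet, not necessarily the exact committed change) is for the inner loop to walk the current element instead of the whole dataset:

        for data in dataset:
            for item in data:  # iterate the current element, not `dataset` again
                ...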

@theblackcat102 (Collaborator) left a comment

LGTM

@theblackcat102 merged commit 1813bcc into LAION-AI:main Apr 14, 2023
1 check passed
Linked issue: Refactor RM code (#2049)