Added RM dataset training split and add support for WebGPT and other for RLHF #1793

theblackcat102 · 2023-02-22T02:55:29Z

Added the script I use to split the RM dataset: tools/sample_rm_data.py

    python -m tools.sample_rm_data 2023-02-12_oasst_prod.jsonl

This generates three splits: rm_train.jsonl, rm_val.jsonl, and rm_test.jsonl.
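For readers without access to the script, here is a minimal sketch of how such a three-way split could be produced. It is not the code in tools/sample_rm_data.py: the function names, the 80/10/10 ratios, the fixed seed, and the record-level shuffle are all assumptions made for illustration.

```python
import json
import random
from pathlib import Path


def split_records(records, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle records and cut them into train/val/test lists.

    The fractions and seed are illustrative defaults, not values from the PR.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * val_frac)
    n_test = int(len(shuffled) * test_frac)
    return {
        "rm_val": shuffled[:n_val],
        "rm_test": shuffled[n_val:n_val + n_test],
        "rm_train": shuffled[n_val + n_test:],
    }


def write_splits(input_path, output_dir="."):
    """Read a JSONL export and write one JSONL file per split."""
    with open(input_path, encoding="utf-8") as f:
        records = [json.loads(line) for line in f if line.strip()]
    for name, rows in split_records(records).items():
        out_path = Path(output_dir) / f"{name}.jsonl"
        with open(out_path, "w", encoding="utf-8") as out:
            for row in rows:
                out.write(json.dumps(row, ensure_ascii=False) + "\n")
```

Fixing the seed keeps the splits reproducible across runs. If related records (for example, replies to the same prompt) must stay in the same split, the shuffle would need to operate on groups rather than on individual records.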

Added webgpt and private_tuning as RLHF datasets:

    - oa_private:
        data_path: .cache
        split: rl
        val_split: 0.0
        fraction: 1
        file: 2023-02-12_oasst_prod.jsonl
    - webgpt:
        val_split: 0.0
        fraction: 1
    - private_tuning:
        val_split: 0.0
        fraction: 1

Make sure oa_v3_fixed_plus_safety.jsonl, which private_tuning uses, is placed under .cache.

github-actions · 2023-02-22T02:57:23Z

❌ pre-commit failed.
Please run pre-commit run --all-files locally and commit the changes.
Find more information in the repository's CONTRIBUTING.md

theblackcat102 · 2023-02-22T03:28:31Z

Some new todos:

We might have to sample only the English prompts: Spanish makes up around 29% of the data, and the current Pythia model might perform quite badly in Spanish.

en: 57.2%
es: 28.97%
...

Add a reweighting step to ensure the initial mean reward score starts at zero.

Snippet I wrote previously.
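The snippet referred to above is not reproduced on this page. As a stand-in, here is a minimal sketch of one way to start from a zero mean reward: estimate the mean reward on an initial batch and subtract it from later scores. The function names and the use of a constant offset are assumptions, not the author's code.

```python
from statistics import fmean


def reward_offset(initial_rewards):
    """Mean reward over an initial batch, used as a constant offset."""
    return fmean(initial_rewards)


def center_rewards(rewards, offset):
    """Shift rewards so the initial batch would have zero mean."""
    return [r - offset for r in rewards]


# Example with made-up scores (not real reward model outputs).
initial = [1.5, 2.0, 2.5, 3.0]
offset = reward_offset(initial)
centered = center_rewards(initial, offset)
```

A constant shift leaves the ranking of responses unchanged and only moves where the scores sit relative to zero.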

sanagno

Great, thanks!

theblackcat102 added 3 commits February 22, 2023 01:25

[fix] lint problem

d1e9b8b

[fix] tidy unused dataset and migrate webgpt and instruction dataset to support RL

dc8d267

[feature] confirm webgpt and private_tuning working in RLHF

7fb1ccc

theblackcat102 requested a review from sanagno as a code owner February 22, 2023 02:55

theblackcat102 added the ml label Feb 22, 2023

[fix] Pylint fix

b611469

theblackcat102 requested review from yk and andreaskoepf as code owners February 22, 2023 03:02

theblackcat102 mentioned this pull request Feb 22, 2023

Incorrect url in SFT prompt_dialogue dataset #1769

Closed

sanagno approved these changes Feb 22, 2023

andreaskoepf approved these changes Feb 23, 2023

andreaskoepf merged commit 58680f5 into main Feb 23, 2023

andreaskoepf deleted the rm-datasets branch February 23, 2023 08:43
