Add Reward Model training #1246

Draft
wants to merge 13 commits into base: main

Commits on Jun 21, 2024

  1. a950f8b
  2. add EOT at end of a chat

    dmahan93 committed Jun 21, 2024
    e360e24
  3. - add different packing impl (Unpacked, packing until overflow)

    - fix labels to also have valid/test implementations
    - fix label masking in _get_batch to also include anything from get_ltor_masks_and_position_ids
    dmahan93 committed Jun 21, 2024
    9ee4a8f
  4. update README.md

    dmahan93 committed Jun 21, 2024
    0678573

Commits on Jun 24, 2024

  1. 15e3059

Commits on Jun 25, 2024

  1. - Add metrics to forward step to add DPO specific metrics that are useful (accuracy, etc)

    - Add reference model setup for DPO
    - Add pairwise dataset for positive/negative pairs
    - Add DPO loss
    dmahan93 committed Jun 25, 2024
    2d20d86
  2. c045006
  3. eed3643
  4. 0392080
  5. - add precompute logprobs...

    dmahan93 committed Jun 25, 2024
    361f459
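
For readers unfamiliar with the terms in the Jun 25 commits (DPO loss, reference model, accuracy metric), here is a minimal sketch of the standard per-pair DPO objective. It is not the code from this PR: the function names, the argument names, and the `beta` default are all assumptions chosen for illustration, and it works on plain floats rather than tensors.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_pair_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed default, not taken from the PR
):
    """Return (loss, is_correct) for one positive/negative pair.

    Each *_logp argument is the summed log-probability of a full response
    under the policy or the frozen reference model.
    """
    # Implicit rewards: how far the policy has moved from the reference.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    # DPO loss: negative log-sigmoid of the reward margin.
    loss = -log_sigmoid(chosen_reward - rejected_reward)
    # "Accuracy" here means the chosen response gets the higher implicit reward.
    is_correct = chosen_reward > rejected_reward
    return loss, is_correct
```

Because the reference log-probabilities do not depend on the policy being trained, they can be computed once ahead of time, which is presumably what the "precompute logprobs" commit refers to.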

Commits on Jun 26, 2024

  1. 7398e07
  2. - update readme for DPO...

    dmahan93 committed Jun 26, 2024
    51af714

Commits on Jun 28, 2024

  1. - Add RM training

    dmahan93 committed Jun 28, 2024
    06c851e
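
The "Add RM training" commit does not say which objective it uses. As background only, the sketch below shows the common pairwise (Bradley-Terry) reward-model loss over a batch of scalar scores. Treat it as an assumption about what RM training typically looks like, not as this PR's implementation; `pairwise_rm_loss` and its arguments are hypothetical names.

```python
import math
from typing import Sequence, Tuple


def pairwise_rm_loss(
    chosen_scores: Sequence[float],
    rejected_scores: Sequence[float],
) -> Tuple[float, float]:
    """Return (mean loss, accuracy) for paired reward-model scores.

    chosen_scores[i] and rejected_scores[i] are the scalar rewards the
    model assigns to the preferred and dispreferred response of pair i.
    """
    if not chosen_scores or len(chosen_scores) != len(rejected_scores):
        raise ValueError("need two non-empty score lists of equal length")

    total_loss = 0.0
    num_correct = 0
    for chosen, rejected in zip(chosen_scores, rejected_scores):
        margin = chosen - rejected
        # -log(sigmoid(margin)) written as a stable softplus(-margin).
        total_loss += max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
        # A pair counts as correct when the preferred response scores higher.
        num_correct += margin > 0

    n = len(chosen_scores)
    return total_loss / n, num_correct / n
```

Unlike the DPO objective, this form needs no reference model: the scores come straight from a scalar head on the model being trained.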