Online DPO Support #1816
Open
Labels
community-request, enhancement (New feature or request), waiting-on-maintainers (Waiting on maintainers to respond)
Hey Folks,
Does NeMo RL support Online DPO (https://arxiv.org/abs/2402.04792)?
Since NeMo RL already has environments, I was wondering whether it would be possible to generate preference pairs online from the outcomes in the environment and fine-tune on them.
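
To make the request concrete, here is a rough sketch of what I have in mind. It is plain Python and does not use any NeMo RL API: `sample_fn`, `reward_fn`, and `build_preference_pair` are hypothetical names, with `reward_fn` standing in for whatever score an environment returns. The loss is the standard DPO objective applied to the freshly built pair.

```python
import math
from typing import Callable, Optional, Tuple


def build_preference_pair(
    prompt: str,
    sample_fn: Callable[[str], str],          # hypothetical: draws one completion from the current policy
    reward_fn: Callable[[str, str], float],   # hypothetical: environment score for (prompt, completion)
    num_samples: int = 4,
) -> Optional[Tuple[str, str]]:
    """Sample completions online and return (chosen, rejected), or None on a tie."""
    completions = [sample_fn(prompt) for _ in range(num_samples)]
    rewards = [reward_fn(prompt, c) for c in completions]
    best = max(range(num_samples), key=lambda i: rewards[i])
    worst = min(range(num_samples), key=lambda i: rewards[i])
    if rewards[best] == rewards[worst]:
        return None  # no preference signal when every sample scores the same
    return completions[best], completions[worst]


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,
) -> float:
    """DPO loss for one pair: -log(sigmoid(beta * (chosen log-ratio - rejected log-ratio)))."""
    margin = beta * (
        (policy_chosen_logp - ref_chosen_logp)
        - (policy_rejected_logp - ref_rejected_logp)
    )
    # Numerically stable form of -log(sigmoid(margin)).
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

In a real training loop the log-probabilities would come from the policy and a frozen reference model, and the pairs would be regenerated from the current policy at every step rather than loaded from a fixed dataset.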