Following the HALOs reference implementation, this branch integrates the KTO algorithm. It supports both balanced (vanilla) and unbalanced (non-vanilla) batches of positive and negative samples. The vanilla version requires each batch to contain the same number of positive and negative samples; the non-vanilla version has no such requirement.
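For readers unfamiliar with KTO, here is a minimal pure-Python sketch of the per-batch loss as described in the KTO paper, together with a check for whether a batch is balanced. It is not the code in this branch: the function names (`kto_loss`, `is_balanced`), the scalar `kl` estimate, and the default `beta` and weights are all assumptions made for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def is_balanced(labels):
    """True when a batch has equal numbers of positive and negative samples."""
    pos = sum(1 for is_pos in labels if is_pos)
    return pos == len(labels) - pos


def kto_loss(policy_logps, ref_logps, labels, kl=0.0,
             beta=0.1, w_pos=1.0, w_neg=1.0):
    """Mean KTO loss over a batch (hypothetical helper, not this branch's API).

    policy_logps / ref_logps: summed log-probs of each completion under the
    policy and the reference model. labels: True for positive (desirable)
    samples. kl: batch-level KL estimate, clamped at zero.
    """
    kl = max(kl, 0.0)
    losses = []
    for p, r, is_pos in zip(policy_logps, ref_logps, labels):
        reward = p - r  # implicit reward: log-ratio of policy to reference
        if is_pos:
            losses.append(w_pos * (1.0 - sigmoid(beta * (reward - kl))))
        else:
            losses.append(w_neg * (1.0 - sigmoid(beta * (kl - reward))))
    return sum(losses) / len(losses)


# The loss itself accepts any mix of labels; only the vanilla setting
# would additionally require is_balanced(labels) to hold.
labels = [True, True, False]
print(is_balanced(labels), kto_loss([-4.0, -6.0, -5.0], [-5.0, -5.0, -5.0], labels))
```

Under this sketch the vanilla and non-vanilla versions would differ only in how batches are assembled, not in the loss formula; whether the branch also adjusts the positive and negative weights in the unbalanced case is not stated here.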
A lightweight dataset was used to validate the algorithm by comparing DPO, vanilla KTO, non-vanilla KTO, and the baseline. The dataset and the results are as follows:
* baseline model is "OpenLLMAI/Llama-2-7b-sft-model-ocra-500k"