**To** run this, press "*Runtime*" and press "*Run all*" on a **free** Tesla T4 Google Colab instance!
<div class="align-center">
  <a href="https://github.com/unslothai/unsloth"><img src="https://github.com/unslothai/unsloth/raw/main/images/unsloth%20new%20logo.png" width="115"></a>
  <a href="https://discord.gg/u54VK8m8tk"><img src="https://github.com/unslothai/unsloth/raw/main/images/Discord button.png" width="145"></a>
  <a href="https://ko-fi.com/unsloth"><img src="https://github.com/unslothai/unsloth/raw/main/images/Kofi button.png" width="145"></a> Join Discord if you need help + ⭐ <i>Star us on <a href="https://github.com/unslothai/unsloth">GitHub</a> </i> ⭐
</div>

To install Unsloth on your own computer, follow the installation instructions on our GitHub page [here](https://github.com/unslothai/unsloth#installation-instructions---conda).

You will learn how to do [data prep](#Data), how to [train](#Train), how to [run the model](#Inference), and [how to save it](#Save) (e.g. for llama.cpp).

[NEW] ORPO support is finally here thanks to [oKatanaaa](https://github.com/oKatanaaa) and [AT&Dev](https://huggingface.co/AtAndDev)

ORPO merges the SFT and DPO steps into one. Previously you had to run SFT first and then DPO; ORPO needs only a single training stage.

In [1]:
# %%capture
# !pip install unsloth
# # Also get the latest nightly Unsloth!
# !pip uninstall unsloth -y && pip install --upgrade --no-cache-dir --no-deps git+https://github.com/unslothai/unsloth.git

* We support Llama, Mistral, CodeLlama, TinyLlama, Vicuna, Open Hermes etc
* And Yi, Qwen ([llamafied](https://huggingface.co/models?sort=trending&search=qwen+llama)), Deepseek, all Llama, Mistral derived archs.
* We support 16bit LoRA or 4bit QLoRA. Both 2x faster.
* `max_seq_length` can be set to anything, since we do automatic RoPE Scaling via [kaiokendev's](https://kaiokendev.github.io/til) method.
* [**NEW**] With [PR 26037](https://github.com/huggingface/transformers/pull/26037), we support downloading 4bit models **4x faster**! [Our repo](https://huggingface.co/unsloth) has Llama, Mistral 4bit models.

In [2]:
from unsloth import FastLanguageModel
import torch
max_seq_length = 128 # Choose any! We auto support RoPE Scaling internally!
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = False # Use 4bit quantization to reduce memory usage. Can be False.

# 4bit pre quantized models we support for 4x faster downloading + no OOMs.
fourbit_models = [
    "unsloth/mistral-7b-bnb-4bit",
    "unsloth/mistral-7b-instruct-v0.2-bnb-4bit",
    "unsloth/llama-2-7b-bnb-4bit",
    "unsloth/gemma-7b-bnb-4bit",
    "unsloth/gemma-7b-it-bnb-4bit", # Instruct version of Gemma 7b
    "unsloth/gemma-2b-bnb-4bit",
    "unsloth/gemma-2b-it-bnb-4bit", # Instruct version of Gemma 2b
    "unsloth/llama-3-8b-bnb-4bit", # [NEW] 15 Trillion token Llama-3
] # More models at https://huggingface.co/unsloth

basemodel, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Llama-3.2-1B-Instruct",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    # token = "hf_...", # use one if using gated models like meta-llama/Llama-2-7b-hf
)

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.




🦥 Unsloth Zoo will now patch everything to make training faster!
==((====))==  Unsloth 2024.12.2: Fast Llama patching. Transformers:4.46.3.
   \\   /|    GPU: NVIDIA GeForce RTX 4060 Laptop GPU. Max memory: 7.996 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.4.0+cu121. CUDA: 8.9. CUDA Toolkit: 12.1. Triton: 3.0.0
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.27.post2. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


We now add LoRA adapters so we only need to update 1 to 10% of all parameters!
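
As a rough check on how small the trainable slice is, the sketch below counts the LoRA parameters for the seven target modules used later in this notebook. The layer shapes are assumptions about Llama-3.2-1B (hidden size 2048, MLP size 8192, key/value width 512, 16 layers); they are not stated anywhere in this notebook.

```python
# Assumed Llama-3.2-1B shapes (not taken from this notebook).
HIDDEN = 2048          # hidden_size
INTERMEDIATE = 8192    # MLP intermediate_size
KV_DIM = 512           # 8 key/value heads * head_dim 64
NUM_LAYERS = 16

# (in_features, out_features) of each projection that receives a LoRA adapter.
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

def lora_trainable_params(r: int) -> int:
    """LoRA adds A (r x in) and B (out x r) per target module, with no bias."""
    per_layer = sum(r * (n_in + n_out) for n_in, n_out in TARGET_SHAPES.values())
    return per_layer * NUM_LAYERS

print(lora_trainable_params(8))
```

Under these assumed shapes, `r = 8` gives 5,636,096, which is the number of trainable parameters the Unsloth banner reports in the training log of this notebook. Doubling `r` doubles the count.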

<a name="Data"></a>
### Data Prep
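ORPO needs a preference-style dataset. The description here is based on the [recipe-research](https://huggingface.co/datasets/reciperesearch/dolphin-sft-v0.1-preference) dataset, while the cell below actually loads `argilla/Capybara-Preferences`.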

You need at least 3 columns:
* Instruction
* Accepted
* Rejected

For example:
* Instruction: "What is 2+2?"
* Accepted: "The answer is 4"
* Rejected: "The answer is 5"
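
A minimal sketch of that three-column shape, using the toy example above. The helper `to_preference_record` and its validation rules are hypothetical. It only renames the columns to the `prompt` / `chosen` / `rejected` keys that the formatting cell in this notebook writes.

```python
# Hypothetical toy row; the column names follow the list above.
rows = [
    {
        "instruction": "What is 2+2?",
        "accepted": "The answer is 4",
        "rejected": "The answer is 5",
    },
]

REQUIRED = ("instruction", "accepted", "rejected")

def to_preference_record(row: dict) -> dict:
    """Check the three required columns and rename them to prompt/chosen/rejected."""
    missing = [key for key in REQUIRED if key not in row]
    if missing:
        raise KeyError(f"missing columns: {missing}")
    if row["accepted"] == row["rejected"]:
        raise ValueError("accepted and rejected responses must differ")
    return {
        "prompt": row["instruction"],
        "chosen": row["accepted"],
        "rejected": row["rejected"],
    }

records = [to_preference_record(row) for row in rows]
print(records[0])
```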

The goal of ORPO is to penalize the "rejected" samples and increase the likelihood of the "accepted" samples. [recipe-research](https://huggingface.co/datasets/reciperesearch/dolphin-sft-v0.1-preference) essentially used Mistral to generate the "rejected" responses and GPT-4 to generate the "accepted" responses.
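
To make that goal concrete, here is a small numeric sketch of an ORPO-style objective: the usual SFT loss on the accepted answer plus a `beta`-weighted odds-ratio penalty. It assumes the log-probabilities are length-normalized averages with probability strictly below 1. The function names are made up for illustration and this is not TRL's implementation.

```python
import math

def log_odds(logp: float) -> float:
    """log(p / (1 - p)) computed from log p, for 0 < p < 1."""
    return logp - math.log1p(-math.exp(logp))

def orpo_loss(nll_chosen: float, logp_chosen: float, logp_rejected: float, beta: float) -> float:
    """SFT loss on the accepted answer plus a beta-weighted odds-ratio penalty."""
    ratio = log_odds(logp_chosen) - log_odds(logp_rejected)
    # -log(sigmoid(ratio)) written as log(1 + exp(-ratio))
    odds_ratio_term = math.log1p(math.exp(-ratio))
    return nll_chosen + beta * odds_ratio_term

# The penalty shrinks when the accepted answer is more likely than the rejected one.
print(orpo_loss(1.0, logp_chosen=-0.4, logp_rejected=-0.9, beta=0.1))
print(orpo_loss(1.0, logp_chosen=-0.9, logp_rejected=-0.4, beta=0.1))
```

When both answers are equally likely the penalty is `beta * log(2)`, so the odds-ratio term only starts to pay off once the model separates the two responses.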

In [3]:
def format_chat_template(row):
    row["prompt"] = row["chosen"][0]["content"]
    row["chosen"] = tokenizer.apply_chat_template(row["chosen"], tokenize=False)
    row["rejected"] = tokenizer.apply_chat_template(row["rejected"], tokenize=False)
    return row

from datasets import load_dataset
dataset = load_dataset("argilla/Capybara-Preferences")['train']
dataset = dataset.map(format_chat_template,)
split = dataset.train_test_split(test_size=0.01)
train_dataset = split['train']
eval_dataset = split['test']

Let's print out some examples to see what the dataset should look like.

In [4]:
# Enable reward modelling stats
from unsloth import PatchDPOTrainer
PatchDPOTrainer()
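
The reward statistics show up in the training log as the `rewards / chosen`, `rewards / rejected` and `rewards / margins` columns. The logged numbers in this notebook are consistent with each reward being `beta` times the corresponding `logps` column, and the margin being their difference. The sketch below checks that assumed relation against one logged row; `reward_stats` is a hypothetical helper, not part of Unsloth or TRL.

```python
def reward_stats(logps_chosen: float, logps_rejected: float, beta: float) -> dict:
    """Assumed relation: reward = beta * log-probability column from the log."""
    chosen = beta * logps_chosen
    rejected = beta * logps_rejected
    return {
        "rewards/chosen": chosen,
        "rewards/rejected": rejected,
        "rewards/margins": chosen - rejected,
    }

# Values copied from the step-50 row of the first run in the training log (beta = 0.001).
stats = reward_stats(logps_chosen=-0.854604, logps_rejected=-0.876872, beta=0.001)
print({name: round(value, 6) for name, value in stats.items()})
```

Rounded to six decimals this reproduces the logged -0.000855, -0.000877 and 2.2e-05 for that row, which also explains why the reward columns are ten times larger in the runs with `beta = 0.01`.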

<a name="Train"></a>
### Train the model
Now let's use Hugging Face TRL's `ORPOTrainer`! More docs here: [TRL ORPO docs](https://huggingface.co/docs/trl/main/en/orpo_trainer). For a quick demo you can cap training with `max_steps=60`; the cell below instead sets `num_train_epochs=1` for a full pass, leaves `max_steps` commented out, and stops each run early once the validation loss plateaus. We also support TRL's `DPOTrainer`!
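
The training cell in this section sweeps a hyperparameter grid with `itertools.product`. Before launching it, it helps to know how many runs that implies and in which order they execute. The sketch below repeats the same grid; `iter_param_dicts` is a hypothetical helper name.

```python
from itertools import product

# Same grid as the training cell in this section.
param_grid = {
    "r": [8, 16],
    "lora_alpha": [8, 16, 32],
    "learning_rate": [1e-4, 2e-4, 3e-4],
    "weight_decay": [0.01, 0.02, 0.05],
    "beta": [0.001, 0.01],
}

def iter_param_dicts(grid: dict):
    """Yield one {name: value} dict per point of the Cartesian product."""
    keys = list(grid)
    for values in product(*(grid[key] for key in keys)):
        yield dict(zip(keys, values))

combos = list(iter_param_dicts(param_grid))
print(len(combos))   # 2 * 3 * 3 * 3 * 2 = 108 runs
print(combos[0])     # first run: r=8, lora_alpha=8, lr=1e-4, weight_decay=0.01, beta=0.001
```

The last key varies fastest, so consecutive runs first alternate `beta`, then step through `weight_decay`, and so on. At 108 full runs the sweep is expensive, which is one reason early stopping matters here.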

In [5]:
from itertools import product
from trl import ORPOConfig, ORPOTrainer
from unsloth import is_bfloat16_supported
import json
from transformers import TrainerCallback

class LoggingCallback(TrainerCallback):
    def __init__(self, log_file):
        self.log_file = log_file

    def on_log(self, args, state, control, logs=None, **kwargs):
        if logs is not None:
            with open(self.log_file, 'a') as f:
                f.write(f"Step {state.global_step}: {logs}\n")


class EarlyStoppingCallback(TrainerCallback):
    def __init__(self, patience: int, min_delta: float = 0.0, log_file = None):
        """
        Early stopping callback to stop training when validation loss does not improve.
        
        Args:
            patience (int): Number of consecutive evaluations without improvement to tolerate.
            min_delta (float): Minimum decrease in eval loss that counts as an improvement.
            log_file (str, optional): If set, each eval loss is appended to this file.
        """
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float('inf')
        self.num_bad_epochs = 0
        self.log_file = log_file

    def on_evaluate(self, args, state, control, **kwargs):
        """
        This method is called during evaluation.
        """
        eval_loss = kwargs['metrics'].get('eval_loss', None)
        
        if eval_loss is None:
            return
        
        # Log eval_loss if log_file is specified
        if self.log_file:
            with open(self.log_file, 'a') as f:
                f.write(f"Step {state.global_step}, Eval Loss: {eval_loss}\n")
        
        # Check if eval_loss improved
        if eval_loss < self.best_loss - self.min_delta:
            self.best_loss = eval_loss
            self.num_bad_epochs = 0
        else:
            self.num_bad_epochs += 1
        
        # Stop training if patience is exceeded
        if self.num_bad_epochs >= self.patience:
            print(f"Early stopping triggered. No improvement for {self.patience} evaluations.")
            control.should_training_stop = True


param_grid = {
    "r": [8, 16],
    "lora_alpha": [8, 16, 32],
    'learning_rate': [1e-4, 2e-4, 3e-4],  # learning rate
    'weight_decay': [0.01, 0.02, 0.05],   # weight decay
    'beta': [0.001, 0.01],      # reward (ORPO beta)
}


def grid_search(param_grid, basemodel, train_dataset, eval_dataset, tokenizer, max_seq_length):
    best_model = None
    best_eval_loss = float('inf')
    best_params = None
    
    param_combinations = product(*param_grid.values())
    
    for params in param_combinations:
        log_file = f'Llama-3.2-1B-Instruct_evaluation_results/3000+es/{params}_training_logs.txt'
        print(f"Training with params: {params}")
        
        param_dict = dict(zip(param_grid.keys(), params))
        
        
        model = FastLanguageModel.get_peft_model(
            basemodel,
            r = param_dict['r'], # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
            target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                            "gate_proj", "up_proj", "down_proj",],
            lora_alpha = param_dict['lora_alpha'],
            lora_dropout = 0, # Supports any, but = 0 is optimized
            bias = "none",    # Supports any, but = "none" is optimized
            # [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
            use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
            random_state = 3407,
            use_rslora = False,  # We support rank stabilized LoRA
            loftq_config = None, # And LoftQ
        )
        
        trainer_args = ORPOConfig(
            warmup_steps=5,
            learning_rate=param_dict['learning_rate'],
            weight_decay=param_dict['weight_decay'],
            seed=3407,
            max_length=max_seq_length,
            max_prompt_length=max_seq_length//2,
            max_completion_length=max_seq_length//2,
            per_device_train_batch_size=2,
            gradient_accumulation_steps=4,
            beta=param_dict['beta'],
            logging_steps=50,
            optim="adamw_8bit",
            lr_scheduler_type="linear",
            num_train_epochs=1,
            # max_steps = 3000,
            fp16=not is_bfloat16_supported(),
            bf16=is_bfloat16_supported(),
            load_best_model_at_end=True,
            metric_for_best_model="eval_loss",
            evaluation_strategy="steps",  # or "epoch"
            eval_steps=50,  # evaluate every 50 steps
            save_steps=50,  # save a checkpoint every 50 steps
            save_total_limit=3,  # keep at most 3 checkpoints
            output_dir="outputs",
            report_to="none",
            push_to_hub=False,
        )
        early_stopping_callback = EarlyStoppingCallback(patience=3, min_delta=0.01, log_file=log_file)
        orpo_trainer = ORPOTrainer(
            model=model,
            train_dataset=train_dataset,
            eval_dataset=eval_dataset,
            tokenizer=tokenizer,
            args=trainer_args,
            callbacks=[LoggingCallback(log_file), early_stopping_callback],
        )
        
        orpo_trainer.train()
        trainer_stats_eval = orpo_trainer.evaluate()

        with open(f"Llama-3.2-1B-Instruct_evaluation_results/3000+es/{params}.json", "w") as f:
            json.dump(trainer_stats_eval, f, indent=4)
        
        eval_loss = trainer_stats_eval.get("eval_loss")
        print(f"Eval loss: {eval_loss}")
        
        if eval_loss < best_eval_loss:
            best_eval_loss = eval_loss
            best_model = orpo_trainer.model
            best_params = param_dict

        torch.cuda.empty_cache()
            
    print(f"Best params: {best_params}")
    print(f"Best eval loss: {best_eval_loss}")
    
    return best_model, best_params

best_model, best_params = grid_search(param_grid, basemodel, train_dataset, eval_dataset, tokenizer, max_seq_length)


Training with params: (8, 8, 0.0001, 0.01, 0.001)


Unsloth 2024.12.2 patched 16 layers with 16 QKV layers, 16 O layers and 16 MLP layers.



==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 5,636,096
Could not estimate the number of tokens of the input, floating-point operations will not be computed


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,2.3297,1.617166,-0.000855,-0.000877,0.11875,2.2e-05,-0.876872,-0.854604,0.572115,0.563781
100,1.5453,1.428514,-0.000621,-0.000641,0.1125,2e-05,-0.641027,-0.620785,-0.129988,-0.1382
150,1.4358,1.311073,-0.000488,-0.000508,0.11875,2.1e-05,-0.508273,-0.487632,-0.098988,-0.105722
200,1.3033,1.230115,-0.000402,-0.000422,0.1125,2e-05,-0.421636,-0.402096,0.213057,0.206967
250,1.281,1.206235,-0.000396,-0.000415,0.1125,1.8e-05,-0.414603,-0.396269,0.301443,0.295725
300,1.2069,1.188942,-0.000384,-0.000404,0.11875,2e-05,-0.403708,-0.384123,0.31447,0.308798
350,1.1997,1.176946,-0.000378,-0.000398,0.11875,2e-05,-0.397628,-0.377683,0.275338,0.270831
400,1.2084,1.171275,-0.000378,-0.000397,0.1125,1.9e-05,-0.397195,-0.378175,0.361004,0.354495
450,1.1909,1.165707,-0.000373,-0.000393,0.11875,2e-05,-0.392512,-0.372827,0.40439,0.401346
500,1.2387,1.157572,-0.000369,-0.000387,0.11875,1.9e-05,-0.387305,-0.36855,0.440445,0.437593


Early stopping triggered. No improvement for 3 evaluations.


Early stopping triggered. No improvement for 3 evaluations.
Eval loss: 1.1444458961486816
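
The patience / `min_delta` rule used by `EarlyStoppingCallback` can be replayed offline on a list of validation losses. The sketch below is a standalone re-implementation of that rule (the function name `first_stop_index` is hypothetical), applied to the ten validation losses shown in the first run's table.

```python
def first_stop_index(eval_losses, patience, min_delta=0.0):
    """Return the index of the evaluation at which training would stop, or None."""
    best = float("inf")
    bad_evals = 0
    for index, loss in enumerate(eval_losses):
        if loss < best - min_delta:
            # Improvement by more than min_delta: remember it and reset the counter.
            best, bad_evals = loss, 0
        else:
            bad_evals += 1
        if bad_evals >= patience:
            return index
    return None

# Validation losses from the first run's table (steps 50 to 500).
logged = [1.617166, 1.428514, 1.311073, 1.230115, 1.206235,
          1.188942, 1.176946, 1.171275, 1.165707, 1.157572]
print(first_stop_index(logged, patience=3, min_delta=0.01))   # None

# Hypothetical plateau: three evaluations in a row improve by less than 0.01.
print(first_stop_index([1.20, 1.195, 1.193, 1.192], patience=3, min_delta=0.01))   # 3
```

With `patience=3` and `min_delta=0.01`, the ten logged rows alone do not trigger a stop, which suggests the stop came from later evaluations that the table does not show. This fits the final eval loss being lower than the step-500 value.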
Training with params: (8, 8, 0.0001, 0.01, 0.01)





==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 5,636,096


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,2.336,1.622597,-0.008537,-0.00876,0.125,0.000223,-0.876031,-0.853708,0.568646,0.560269
100,1.5513,1.434819,-0.006213,-0.006417,0.11875,0.000205,-0.641728,-0.621255,-0.118092,-0.126682
150,1.4416,1.315018,-0.004843,-0.005049,0.11875,0.000206,-0.50489,-0.484257,-0.081851,-0.088844
200,1.31,1.236484,-0.004019,-0.004216,0.11875,0.000197,-0.421639,-0.401911,0.238341,0.23245
250,1.2891,1.213212,-0.00396,-0.004143,0.1125,0.000183,-0.414304,-0.395966,0.307033,0.301158
300,1.2142,1.195054,-0.003843,-0.004038,0.11875,0.000195,-0.403826,-0.384332,0.320616,0.315541
350,1.2068,1.182415,-0.003774,-0.003973,0.11875,0.000199,-0.39734,-0.377439,0.279037,0.273955
400,1.2154,1.177692,-0.00378,-0.003969,0.1125,0.000189,-0.396929,-0.37799,0.361892,0.356232
450,1.1977,1.171717,-0.00373,-0.003924,0.11875,0.000194,-0.392419,-0.373014,0.410178,0.407043
500,1.2453,1.16383,-0.003685,-0.003872,0.11875,0.000187,-0.387209,-0.368463,0.438634,0.435319


Early stopping triggered. No improvement for 3 evaluations.


Early stopping triggered. No improvement for 3 evaluations.
Eval loss: 1.1505024433135986
Training with params: (8, 8, 0.0001, 0.02, 0.001)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 5,636,096


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,2.3298,1.617434,-0.000855,-0.000877,0.125,2.2e-05,-0.876997,-0.854735,0.572449,0.564093
100,1.5453,1.428558,-0.000621,-0.000641,0.11875,2e-05,-0.641391,-0.621199,-0.129546,-0.137857
150,1.4359,1.309544,-0.000486,-0.000506,0.11875,2.1e-05,-0.506394,-0.48573,-0.104701,-0.111631
200,1.3029,1.229906,-0.000402,-0.000422,0.1125,2e-05,-0.421512,-0.401905,0.217979,0.212157
250,1.2805,1.205445,-0.000395,-0.000414,0.1125,1.8e-05,-0.413849,-0.39539,0.272899,0.267057
300,1.2069,1.188369,-0.000384,-0.000403,0.11875,1.9e-05,-0.403245,-0.383801,0.323131,0.317571
350,1.1994,1.176451,-0.000377,-0.000397,0.11875,2e-05,-0.397132,-0.377003,0.262493,0.257689
400,1.2083,1.171273,-0.000378,-0.000397,0.1125,1.9e-05,-0.396978,-0.377833,0.349794,0.34268
450,1.1908,1.165474,-0.000373,-0.000392,0.11875,2e-05,-0.392319,-0.372563,0.396255,0.392803
500,1.2389,1.157544,-0.000368,-0.000387,0.11875,1.9e-05,-0.387218,-0.368244,0.422267,0.418822


Early stopping triggered. No improvement for 3 evaluations.


Early stopping triggered. No improvement for 3 evaluations.
Eval loss: 1.144140601158142
Training with params: (8, 8, 0.0001, 0.02, 0.01)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 5,636,096


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,2.3359,1.622109,-0.00853,-0.008752,0.11875,0.000223,-0.875216,-0.852961,0.562683,0.554172
100,1.5503,1.433454,-0.006205,-0.006407,0.11875,0.000202,-0.640746,-0.620534,-0.127933,-0.136023
150,1.441,1.314609,-0.004841,-0.005047,0.11875,0.000206,-0.504693,-0.484054,-0.089361,-0.096332
200,1.3087,1.236076,-0.004018,-0.004214,0.11875,0.000195,-0.421371,-0.401829,0.223469,0.2177
250,1.2876,1.212435,-0.003959,-0.004143,0.1125,0.000184,-0.414323,-0.395915,0.308888,0.303858
300,1.2141,1.194938,-0.00384,-0.004034,0.11875,0.000195,-0.40344,-0.383959,0.306801,0.301165
350,1.2062,1.182929,-0.003774,-0.003973,0.11875,0.000199,-0.397255,-0.377363,0.272596,0.267543
400,1.2151,1.177519,-0.003779,-0.003969,0.1125,0.00019,-0.396883,-0.377929,0.359698,0.353117
450,1.1972,1.171628,-0.003725,-0.003923,0.11875,0.000198,-0.392333,-0.372517,0.3975,0.39399
500,1.2452,1.16375,-0.003684,-0.003873,0.11875,0.000189,-0.387285,-0.368382,0.430459,0.426767


Early stopping triggered. No improvement for 3 evaluations.


Early stopping triggered. No improvement for 3 evaluations.
Eval loss: 1.1506993770599365
Training with params: (8, 8, 0.0001, 0.05, 0.001)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 5,636,096


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,2.3302,1.617421,-0.000855,-0.000877,0.11875,2.2e-05,-0.876825,-0.854502,0.574266,0.565868
100,1.5455,1.428114,-0.000621,-0.000641,0.1125,2e-05,-0.641258,-0.620933,-0.128175,-0.136681
150,1.4357,1.309664,-0.000486,-0.000506,0.11875,2.1e-05,-0.50649,-0.485708,-0.099071,-0.105786
200,1.303,1.23007,-0.000402,-0.000422,0.1125,2e-05,-0.421563,-0.401905,0.216183,0.210444
250,1.2809,1.206194,-0.000396,-0.000414,0.1125,1.8e-05,-0.414152,-0.39579,0.302173,0.296378
300,1.2074,1.188849,-0.000384,-0.000403,0.11875,1.9e-05,-0.403301,-0.383856,0.30592,0.300047
350,1.1995,1.176802,-0.000377,-0.000397,0.11875,2e-05,-0.397353,-0.377418,0.268928,0.264134
400,1.2087,1.171356,-0.000378,-0.000397,0.1125,1.9e-05,-0.396883,-0.377854,0.351807,0.345636
450,1.1911,1.16558,-0.000373,-0.000392,0.11875,2e-05,-0.392406,-0.372805,0.401747,0.398303
500,1.2391,1.157736,-0.000368,-0.000387,0.11875,1.9e-05,-0.387332,-0.36835,0.429843,0.425828


Early stopping triggered. No improvement for 3 evaluations.


Early stopping triggered. No improvement for 3 evaluations.
Eval loss: 1.1443215608596802
Training with params: (8, 8, 0.0001, 0.05, 0.01)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 5,636,096


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,2.3359,1.622498,-0.008539,-0.008763,0.11875,0.000224,-0.87633,-0.853881,0.569772,0.561313
100,1.5512,1.434717,-0.006213,-0.006416,0.1125,0.000203,-0.641582,-0.621273,-0.117293,-0.125816
150,1.4422,1.315942,-0.00486,-0.005069,0.11875,0.000208,-0.506853,-0.486041,-0.083822,-0.090699
200,1.3104,1.236935,-0.004019,-0.004217,0.11875,0.000198,-0.42167,-0.401913,0.217764,0.212011
250,1.2877,1.213114,-0.003961,-0.004144,0.1125,0.000183,-0.414427,-0.396135,0.303636,0.297881
300,1.214,1.195384,-0.003841,-0.004035,0.11875,0.000195,-0.403527,-0.384076,0.316528,0.310977
350,1.2064,1.182915,-0.003774,-0.003972,0.11875,0.000198,-0.397223,-0.377446,0.285102,0.280091
400,1.2151,1.177203,-0.003778,-0.003967,0.1125,0.000189,-0.396725,-0.37784,0.368178,0.363127
450,1.1974,1.171597,-0.003726,-0.003923,0.11875,0.000197,-0.392325,-0.372608,0.407215,0.404266
500,1.2456,1.163667,-0.003684,-0.003872,0.11875,0.000188,-0.387228,-0.368442,0.446772,0.443768


Early stopping triggered. No improvement for 3 evaluations.


Early stopping triggered. No improvement for 3 evaluations.
Eval loss: 1.1504287719726562
Training with params: (8, 8, 0.0002, 0.01, 0.001)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 5,636,096


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,2.0739,1.454647,-0.000645,-0.000666,0.11875,2.1e-05,-0.666169,-0.645142,-0.078454,-0.087268
100,1.4231,1.266181,-0.000405,-0.000425,0.1125,2e-05,-0.424859,-0.404922,0.040702,0.034759
150,1.2835,1.211203,-0.000395,-0.000414,0.11875,1.9e-05,-0.414225,-0.394777,0.371082,0.367467
200,1.2383,1.189942,-0.000386,-0.000405,0.10625,1.8e-05,-0.40462,-0.386472,0.530174,0.52898
250,1.235,1.170809,-0.000378,-0.000396,0.1125,1.8e-05,-0.396272,-0.378305,0.540481,0.539134
300,1.178,1.160139,-0.000371,-0.000389,0.11875,1.8e-05,-0.388883,-0.370781,0.502716,0.501194
350,1.1729,1.153827,-0.000368,-0.000388,0.11875,1.9e-05,-0.387808,-0.36833,0.534563,0.53255
400,1.1836,1.149463,-0.000368,-0.000387,0.11875,1.8e-05,-0.386504,-0.368014,0.503183,0.499004
450,1.169,1.145632,-0.000364,-0.000382,0.11875,1.8e-05,-0.382386,-0.364037,0.516483,0.51596
500,1.2186,1.13838,-0.00036,-0.000377,0.11875,1.8e-05,-0.377206,-0.359685,0.556019,0.554065


Early stopping triggered. No improvement for 3 evaluations.


Early stopping triggered. No improvement for 3 evaluations.
Eval loss: 1.1318371295928955
Training with params: (8, 8, 0.0002, 0.01, 0.01)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 5,636,096


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,2.0801,1.46117,-0.006452,-0.006661,0.11875,0.000208,-0.666073,-0.645235,-0.081469,-0.090007
100,1.4289,1.270569,-0.004047,-0.004247,0.11875,0.0002,-0.424694,-0.404706,0.042952,0.037083
150,1.2895,1.217317,-0.003949,-0.004143,0.11875,0.000195,-0.41433,-0.394859,0.365976,0.362369
200,1.2447,1.195938,-0.003866,-0.004047,0.10625,0.000181,-0.404679,-0.386622,0.515013,0.513928
250,1.2416,1.177572,-0.003789,-0.003967,0.1125,0.000178,-0.39672,-0.378894,0.538332,0.536988
300,1.1842,1.166684,-0.003712,-0.003896,0.11875,0.000183,-0.389592,-0.371245,0.496358,0.495028
350,1.1794,1.160308,-0.003688,-0.003882,0.11875,0.000194,-0.388169,-0.368781,0.533315,0.531609
400,1.1903,1.156181,-0.00369,-0.003873,0.11875,0.000184,-0.38734,-0.368963,0.49158,0.487111
450,1.1752,1.152103,-0.003644,-0.003829,0.11875,0.000184,-0.382865,-0.364426,0.508199,0.506967
500,1.2246,1.144393,-0.003597,-0.003771,0.11875,0.000174,-0.377061,-0.359655,0.550085,0.547072


Early stopping triggered. No improvement for 3 evaluations.


Early stopping triggered. No improvement for 3 evaluations.
Eval loss: 1.1377404928207397
Training with params: (8, 8, 0.0002, 0.02, 0.001)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 5,636,096


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,2.0737,1.454974,-0.000646,-0.000667,0.11875,2.1e-05,-0.666913,-0.646237,-0.066902,-0.07523
100,1.4217,1.261777,-0.000404,-0.000424,0.11875,2e-05,-0.424216,-0.404478,0.046252,0.040588
150,1.2828,1.211262,-0.000395,-0.000414,0.11875,2e-05,-0.414276,-0.394684,0.361302,0.357493
200,1.2384,1.189773,-0.000386,-0.000405,0.10625,1.8e-05,-0.404588,-0.386406,0.50251,0.50118
250,1.2348,1.17086,-0.000378,-0.000396,0.1125,1.8e-05,-0.396187,-0.378335,0.527262,0.525879
300,1.1784,1.160053,-0.000371,-0.000389,0.11875,1.8e-05,-0.389217,-0.371075,0.481047,0.479538
350,1.1729,1.153489,-0.000369,-0.000388,0.11875,1.9e-05,-0.388158,-0.368674,0.509019,0.506442
400,1.1843,1.14989,-0.000369,-0.000387,0.11875,1.9e-05,-0.387276,-0.368729,0.485941,0.480999
450,1.1692,1.146274,-0.000365,-0.000383,0.11875,1.9e-05,-0.383196,-0.364683,0.511482,0.510456
500,1.2182,1.138539,-0.00036,-0.000377,0.11875,1.8e-05,-0.377295,-0.359634,0.554406,0.551795


Early stopping triggered. No improvement for 3 evaluations.


Early stopping triggered. No improvement for 3 evaluations.
Eval loss: 1.1316723823547363
Training with params: (8, 8, 0.0002, 0.02, 0.01)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 5,636,096


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,2.0799,1.460945,-0.006456,-0.006664,0.1125,0.000208,-0.666391,-0.645552,-0.080608,-0.088826
100,1.4286,1.27167,-0.00405,-0.004248,0.11875,0.000198,-0.42483,-0.404987,0.045415,0.039505
150,1.2895,1.21786,-0.003949,-0.004145,0.11875,0.000196,-0.414483,-0.394913,0.356635,0.352794
200,1.2446,1.195968,-0.003866,-0.004049,0.10625,0.000183,-0.404876,-0.38661,0.505306,0.503903
250,1.2417,1.177043,-0.003786,-0.003964,0.1125,0.000178,-0.396407,-0.378597,0.527188,0.525251
300,1.1843,1.166394,-0.003714,-0.003896,0.11875,0.000183,-0.389624,-0.37135,0.491689,0.490212
350,1.1794,1.159996,-0.003687,-0.003882,0.11875,0.000195,-0.38818,-0.368722,0.516519,0.513997
400,1.19,1.155821,-0.003687,-0.003871,0.11875,0.000184,-0.387122,-0.368709,0.486161,0.48142
450,1.1753,1.152326,-0.003647,-0.003832,0.11875,0.000185,-0.383175,-0.364688,0.512532,0.511755
500,1.225,1.144328,-0.003596,-0.003771,0.11875,0.000175,-0.377091,-0.359634,0.549051,0.54666


Early stopping triggered. No improvement for 3 evaluations.


Early stopping triggered. No improvement for 3 evaluations.
Eval loss: 1.1380060911178589
Training with params: (8, 8, 0.0002, 0.05, 0.001)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 5,636,096


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,2.0742,1.45499,-0.000646,-0.000667,0.11875,2.1e-05,-0.666562,-0.645653,-0.077956,-0.086627
100,1.4221,1.262077,-0.000405,-0.000425,0.1125,2e-05,-0.424582,-0.404753,0.045566,0.040153
150,1.2831,1.21129,-0.000395,-0.000415,0.11875,2e-05,-0.414513,-0.394956,0.373491,0.369692
200,1.2406,1.188133,-0.000385,-0.000404,0.10625,1.8e-05,-0.403562,-0.385259,0.487261,0.48647
250,1.2353,1.170933,-0.000379,-0.000397,0.1125,1.8e-05,-0.396603,-0.37862,0.534907,0.533289
300,1.1791,1.16018,-0.000371,-0.000389,0.11875,1.8e-05,-0.389205,-0.371027,0.485233,0.484055
350,1.1728,1.153798,-0.000368,-0.000388,0.11875,2e-05,-0.388016,-0.368494,0.511427,0.508908
400,1.1845,1.150108,-0.000369,-0.000387,0.11875,1.9e-05,-0.387134,-0.368622,0.483781,0.478842
450,1.1693,1.146622,-0.000364,-0.000383,0.11875,1.9e-05,-0.382862,-0.364264,0.501257,0.500194
500,1.2184,1.138534,-0.00036,-0.000378,0.11875,1.8e-05,-0.377517,-0.359668,0.541975,0.539509


Early stopping triggered. No improvement for 3 evaluations.


Early stopping triggered. No improvement for 3 evaluations.
Eval loss: 1.1317851543426514
Training with params: (8, 8, 0.0002, 0.05, 0.01)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 5,636,096


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,2.0799,1.461055,-0.00645,-0.006658,0.11875,0.000208,-0.665803,-0.645021,-0.08383,-0.092265
100,1.4294,1.272745,-0.004051,-0.00425,0.11875,0.000199,-0.425017,-0.405136,0.041514,0.035916
150,1.2894,1.217787,-0.003953,-0.004147,0.11875,0.000195,-0.414728,-0.395271,0.364914,0.361268
200,1.2448,1.196031,-0.003868,-0.004051,0.10625,0.000182,-0.405087,-0.386837,0.498847,0.497575
250,1.2418,1.176857,-0.003786,-0.003964,0.1125,0.000179,-0.396436,-0.378577,0.528009,0.526452
300,1.1844,1.166306,-0.003708,-0.003891,0.11875,0.000183,-0.389134,-0.370792,0.490709,0.488971
350,1.1795,1.159958,-0.003687,-0.003881,0.11875,0.000194,-0.388107,-0.368673,0.521754,0.519348
400,1.1899,1.156021,-0.00369,-0.003875,0.11875,0.000184,-0.387485,-0.369036,0.489609,0.484913
450,1.1752,1.152209,-0.003645,-0.00383,0.11875,0.000186,-0.383034,-0.364483,0.503141,0.501731
500,1.2246,1.144456,-0.003597,-0.003772,0.11875,0.000175,-0.377204,-0.359672,0.54456,0.541875


Early stopping triggered. No improvement for 3 evaluations.


Early stopping triggered. No improvement for 3 evaluations.
Eval loss: 1.1378742456436157
Training with params: (8, 8, 0.0003, 0.01, 0.001)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 5,636,096


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,1.9537,1.410451,-0.000602,-0.000622,0.11875,2e-05,-0.622232,-0.601735,-0.09626,-0.103783
100,1.3432,1.226737,-0.000389,-0.000409,0.1125,2e-05,-0.408662,-0.388902,0.199237,0.195474
150,1.2601,1.19196,-0.000388,-0.000408,0.11875,1.9e-05,-0.407602,-0.388342,0.576075,0.576972
200,1.2211,1.173573,-0.000375,-0.000393,0.10625,1.8e-05,-0.392942,-0.375303,0.59104,0.594793
250,1.2481,1.235556,-0.000383,-0.0004,0.11875,1.7e-05,-0.400462,-0.383092,0.58816,0.592625
300,1.201,1.148131,-0.000365,-0.000382,0.1125,1.8e-05,-0.382141,-0.364584,0.497984,0.499297
350,1.1745,1.146518,-0.000366,-0.000385,0.11875,1.9e-05,-0.385471,-0.366101,0.580721,0.582842
400,1.1862,1.140596,-0.000364,-0.000383,0.11875,1.9e-05,-0.382782,-0.363872,0.508129,0.506305
450,1.1724,1.137278,-0.000359,-0.000377,0.1125,1.8e-05,-0.376834,-0.359007,0.474299,0.477072
500,1.2137,1.12914,-0.000355,-0.000372,0.1125,1.7e-05,-0.371966,-0.354541,0.475319,0.472886


Early stopping triggered. No improvement for 3 evaluations.
Eval loss: 1.1179479360580444
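
In the run that just finished, the last entry of the parameter tuple is 0.001, and the `rewards / *` columns are about one thousandth of the matching `logps / *` columns. That is what one would expect if the last entry is the ORPO `beta` and the logged rewards are `beta * logps`. The field names of the tuple are not shown in this output, so treating the last entry as `beta` is an assumption. A small check using rows copied from the table:

```python
# Rows copied from the table of the run with params (8, 8, 0.0003, 0.01, 0.001):
# (step, logps_rejected, logps_chosen, rewards_chosen, rewards_rejected, rewards_margin)
rows = [
    (50, -0.622232, -0.601735, -0.000602, -0.000622, 2e-05),
    (100, -0.408662, -0.388902, -0.000389, -0.000409, 2e-05),
    (500, -0.371966, -0.354541, -0.000355, -0.000372, 1.7e-05),
]

beta = 0.001  # assumption: the last entry of the params tuple is beta
TOL = 1e-6    # the log rounds values to six decimals


def consistent_with_beta(row, beta, tol=TOL):
    """Return True if the logged rewards equal beta * logps within rounding."""
    _, lp_rej, lp_cho, rw_cho, rw_rej, margin = row
    return (
        abs(beta * lp_cho - rw_cho) <= tol
        and abs(beta * lp_rej - rw_rej) <= tol
        # The margin is the chosen reward minus the rejected reward.
        and abs(beta * (lp_cho - lp_rej) - margin) <= tol
    )


all_rows_consistent = all(consistent_with_beta(r, beta) for r in rows)
print(all_rows_consistent)
```

The same relation explains why runs whose last parameter is 0.01 show rewards and margins about ten times larger while their `logps` columns stay in the same range.
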
Training with params: (8, 8, 0.0003, 0.01, 0.01)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 5,636,096


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,1.9599,1.416276,-0.00601,-0.006214,0.11875,0.000204,-0.621395,-0.600951,-0.102249,-0.109186
100,1.3465,1.231511,-0.003878,-0.004072,0.11875,0.000194,-0.407223,-0.387835,0.243654,0.240368
150,1.2654,1.19779,-0.003885,-0.004076,0.11875,0.000192,-0.407615,-0.388454,0.574312,0.574279
200,1.2261,1.178852,-0.003743,-0.003919,0.10625,0.000176,-0.391892,-0.374296,0.609171,0.613512
250,1.2279,1.163099,-0.003686,-0.003858,0.11875,0.000172,-0.385825,-0.368637,0.608061,0.61143
300,1.1755,1.153099,-0.003644,-0.003822,0.1125,0.000178,-0.382163,-0.364373,0.488369,0.491851
350,1.1696,1.149645,-0.003647,-0.003837,0.11875,0.00019,-0.383701,-0.364667,0.630152,0.632367
400,1.1819,1.145746,-0.00365,-0.003831,0.1125,0.000181,-0.383085,-0.365026,0.55308,0.553126
450,1.1678,1.1426,-0.003593,-0.003766,0.1125,0.000173,-0.376611,-0.359342,0.499157,0.500735
500,1.2152,1.134487,-0.003533,-0.003703,0.11875,0.000171,-0.370325,-0.353251,0.472724,0.472218


Early stopping triggered. No improvement for 3 evaluations.
Eval loss: 1.1288748979568481
Training with params: (8, 8, 0.0003, 0.02, 0.001)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 5,636,096


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,1.9537,1.410486,-0.000601,-0.000622,0.11875,2e-05,-0.621621,-0.601212,-0.093426,-0.100355
100,1.3425,1.225709,-0.000389,-0.000409,0.11875,2e-05,-0.409012,-0.389247,0.211985,0.208098
150,1.2591,1.191764,-0.000389,-0.000408,0.11875,1.9e-05,-0.408063,-0.388861,0.563564,0.564158
200,1.2199,1.172272,-0.000374,-0.000392,0.1125,1.7e-05,-0.391672,-0.374179,0.583942,0.58945
250,1.2208,1.156771,-0.00037,-0.000387,0.11875,1.7e-05,-0.386921,-0.369794,0.60781,0.610118
300,1.1695,1.1464,-0.000364,-0.000381,0.1125,1.8e-05,-0.38141,-0.363618,0.474146,0.475491
350,1.1629,1.143982,-0.000365,-0.000384,0.11875,1.9e-05,-0.383834,-0.36504,0.633213,0.634839
400,1.1749,1.139636,-0.000364,-0.000382,0.11875,1.8e-05,-0.382246,-0.364095,0.570135,0.570034
450,1.1617,1.136494,-0.000359,-0.000377,0.1125,1.8e-05,-0.376949,-0.359256,0.50276,0.504564


Early stopping triggered. No improvement for 3 evaluations.
Eval loss: 1.1364939212799072
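
The callback that prints "Early stopping triggered" is not shown in this output. In the table above the validation loss falls at every evaluation, so a plain "loss did not decrease" rule would not have stopped this run at step 450. One reading that is consistent with the numbers is a patience of 3 combined with a minimum-improvement threshold of about 0.01. Both the threshold value and the function below are assumptions made for illustration:

```python
def stop_step(evals, patience=3, min_delta=0.01):
    """Return the step at which a patience-based stopper would halt, or None.

    An evaluation counts as an improvement only if it beats the best
    loss seen so far by more than `min_delta` (assumed value).
    """
    best, bad_evals = None, 0
    for step, loss in evals:
        if best is None or best - loss > min_delta:
            best, bad_evals = loss, 0      # real improvement: reset the counter
        else:
            bad_evals += 1                 # too small a gain: count it
            if bad_evals >= patience:
                return step
    return None


# (step, validation loss) copied from the run with params (8, 8, 0.0003, 0.02, 0.001).
evals = [
    (50, 1.410486), (100, 1.225709), (150, 1.191764),
    (200, 1.172272), (250, 1.156771), (300, 1.1464),
    (350, 1.143982), (400, 1.139636), (450, 1.136494),
]

print(stop_step(evals))                 # threshold of 0.01
print(stop_step(evals, min_delta=0.0))  # any decrease counts as improvement
```

Under the assumed threshold the sketch stops at step 450, which matches the last row of this run's table. With a threshold of zero it never stops on these values.
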
Training with params: (8, 8, 0.0003, 0.02, 0.01)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 5,636,096


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,1.9602,1.416868,-0.006024,-0.00623,0.11875,0.000206,-0.622987,-0.602434,-0.074444,-0.081702
100,1.3469,1.231491,-0.003887,-0.004083,0.11875,0.000196,-0.408295,-0.388662,0.255823,0.252403
150,1.2645,1.198092,-0.003894,-0.004087,0.11875,0.000192,-0.408673,-0.389425,0.591246,0.591242
200,1.2264,1.178467,-0.00374,-0.003915,0.1125,0.000176,-0.391519,-0.373955,0.576273,0.580056
250,1.2267,1.162689,-0.003687,-0.00386,0.11875,0.000173,-0.386025,-0.368727,0.601699,0.604181
300,1.1751,1.153258,-0.003643,-0.00382,0.1125,0.000177,-0.382013,-0.36434,0.492087,0.49495
350,1.1688,1.150595,-0.003647,-0.003836,0.11875,0.000189,-0.383628,-0.364702,0.610069,0.612079
400,1.181,1.14587,-0.003644,-0.003825,0.11875,0.000181,-0.382542,-0.364433,0.573753,0.573442
450,1.1675,1.142511,-0.003593,-0.003768,0.1125,0.000175,-0.376826,-0.359291,0.493873,0.496023
500,1.2151,1.134404,-0.003533,-0.003704,0.1125,0.00017,-0.370374,-0.353337,0.460668,0.45977


Early stopping triggered. No improvement for 3 evaluations.
Eval loss: 1.1287119388580322
Training with params: (8, 8, 0.0003, 0.05, 0.001)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 5,636,096


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,1.9532,1.410509,-0.000602,-0.000623,0.11875,2e-05,-0.622669,-0.602195,-0.073398,-0.080772
100,1.3396,1.224375,-0.000389,-0.000408,0.11875,1.9e-05,-0.407897,-0.388615,0.289966,0.286492
150,1.2581,1.191461,-0.000388,-0.000407,0.11875,1.9e-05,-0.407466,-0.388316,0.586973,0.585084
200,1.2195,1.172094,-0.000373,-0.000391,0.1125,1.8e-05,-0.391001,-0.37344,0.601895,0.605085
250,1.2206,1.155742,-0.000368,-0.000385,0.11875,1.7e-05,-0.385259,-0.36789,0.614456,0.616455
300,1.1692,1.146397,-0.000364,-0.000381,0.1125,1.8e-05,-0.381292,-0.363657,0.511999,0.515131
350,1.1628,1.14251,-0.000364,-0.000382,0.11875,1.9e-05,-0.382284,-0.363614,0.643107,0.64535
400,1.1744,1.139262,-0.000364,-0.000382,0.1125,1.8e-05,-0.382371,-0.364009,0.577645,0.577362
450,1.1618,1.13603,-0.000359,-0.000376,0.1125,1.8e-05,-0.376413,-0.358699,0.516131,0.519039
500,1.2084,1.128382,-0.000353,-0.00037,0.1125,1.7e-05,-0.370359,-0.353016,0.477522,0.47741


Early stopping triggered. No improvement for 3 evaluations.
Eval loss: 1.1223984956741333
Training with params: (8, 8, 0.0003, 0.05, 0.01)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 5,636,096


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,1.9601,1.416329,-0.006012,-0.006217,0.11875,0.000205,-0.621723,-0.601189,-0.090244,-0.097431
100,1.3469,1.231401,-0.003888,-0.004084,0.11875,0.000196,-0.408401,-0.388813,0.242049,0.238078
150,1.2647,1.197769,-0.003884,-0.004077,0.11875,0.000193,-0.407671,-0.388364,0.573421,0.572894
200,1.226,1.177895,-0.003731,-0.003908,0.10625,0.000177,-0.39078,-0.373056,0.565852,0.569298
250,1.2271,1.163085,-0.003693,-0.003866,0.11875,0.000173,-0.386635,-0.369324,0.591827,0.593433
300,1.1755,1.152684,-0.003641,-0.00382,0.11875,0.000178,-0.381966,-0.364148,0.489241,0.492346
350,1.1693,1.14997,-0.003644,-0.003833,0.11875,0.000189,-0.383269,-0.36438,0.627889,0.629181
400,1.1815,1.145528,-0.003642,-0.003826,0.11875,0.000184,-0.382589,-0.364168,0.561914,0.560998
450,1.1679,1.143156,-0.003599,-0.003777,0.11875,0.000178,-0.377694,-0.359855,0.489997,0.491786


Early stopping triggered. No improvement for 3 evaluations.
Eval loss: 1.1431561708450317
Training with params: (8, 16, 0.0001, 0.01, 0.001)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 5,636,096


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,2.1456,1.492019,-0.000657,-0.000679,0.11875,2.2e-05,-0.679137,-0.657345,-0.385825,-0.395972
100,1.4755,1.344769,-0.00053,-0.00055,0.11875,2e-05,-0.549965,-0.53014,-0.084508,-0.092721
150,1.3147,1.227187,-0.000398,-0.000418,0.11875,2e-05,-0.418102,-0.397933,0.161277,0.155967
200,1.2577,1.202909,-0.000394,-0.000413,0.1125,1.9e-05,-0.412953,-0.394387,0.436275,0.433523
250,1.2505,1.182027,-0.000383,-0.000401,0.1125,1.8e-05,-0.401371,-0.383447,0.411021,0.408955
300,1.1891,1.170733,-0.000376,-0.000395,0.11875,1.9e-05,-0.394523,-0.375796,0.436332,0.434415
350,1.1828,1.162544,-0.000371,-0.000391,0.11875,1.9e-05,-0.390759,-0.371306,0.396014,0.393277
400,1.194,1.158791,-0.000372,-0.000391,0.11875,1.9e-05,-0.390974,-0.372028,0.428472,0.424367
450,1.1775,1.153548,-0.000367,-0.000387,0.11875,1.9e-05,-0.386635,-0.3674,0.462193,0.462004
500,1.2266,1.14581,-0.000363,-0.000382,0.11875,1.8e-05,-0.381698,-0.363452,0.492075,0.491518


Early stopping triggered. No improvement for 3 evaluations.
Eval loss: 1.1380729675292969
Training with params: (8, 16, 0.0001, 0.01, 0.01)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 5,636,096


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,2.1517,1.49841,-0.006577,-0.006795,0.11875,0.000217,-0.679455,-0.657722,-0.384746,-0.394804
100,1.482,1.351666,-0.00531,-0.005509,0.11875,0.000199,-0.55087,-0.53097,-0.088375,-0.096642
150,1.321,1.233738,-0.00398,-0.004182,0.11875,0.000203,-0.418246,-0.397976,0.157356,0.151952
200,1.264,1.209366,-0.003945,-0.00413,0.1125,0.000185,-0.41302,-0.394508,0.437724,0.435323
250,1.2567,1.188187,-0.003835,-0.004014,0.1125,0.00018,-0.401441,-0.383461,0.407673,0.405657
300,1.1955,1.176627,-0.003759,-0.003949,0.11875,0.000189,-0.394873,-0.375935,0.432764,0.431301
350,1.1892,1.168816,-0.003715,-0.00391,0.11875,0.000195,-0.391039,-0.37152,0.394796,0.392427
400,1.2003,1.164986,-0.003721,-0.00391,0.11875,0.000189,-0.390974,-0.37206,0.429181,0.42558
450,1.1837,1.159683,-0.003676,-0.00387,0.11875,0.000193,-0.386972,-0.367632,0.4646,0.464645
500,1.2329,1.152385,-0.003635,-0.003817,0.11875,0.000182,-0.381716,-0.363475,0.496056,0.495713


Early stopping triggered. No improvement for 3 evaluations.
Eval loss: 1.1444644927978516
Training with params: (8, 16, 0.0001, 0.02, 0.001)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 5,636,096


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,2.1456,1.492287,-0.000658,-0.00068,0.1125,2.2e-05,-0.679554,-0.657904,-0.383521,-0.393738
100,1.4758,1.345693,-0.000531,-0.000551,0.1125,2e-05,-0.551067,-0.531205,-0.083272,-0.091328
150,1.315,1.227317,-0.000398,-0.000418,0.11875,2e-05,-0.418003,-0.397765,0.1577,0.152317
200,1.2577,1.203084,-0.000394,-0.000413,0.1125,1.9e-05,-0.412946,-0.394445,0.439059,0.436597
250,1.2504,1.181972,-0.000383,-0.000401,0.1125,1.8e-05,-0.401386,-0.383326,0.40967,0.406938
300,1.189,1.170686,-0.000376,-0.000395,0.11875,1.9e-05,-0.394747,-0.375944,0.439456,0.437313
350,1.1828,1.162383,-0.000371,-0.000391,0.11875,2e-05,-0.390901,-0.3714,0.396056,0.393249
400,1.1938,1.158313,-0.000372,-0.000391,0.11875,1.9e-05,-0.390708,-0.371685,0.432252,0.427903
450,1.1775,1.15349,-0.000367,-0.000387,0.11875,1.9e-05,-0.386712,-0.367374,0.469179,0.46905
500,1.2267,1.145865,-0.000363,-0.000382,0.11875,1.8e-05,-0.381873,-0.363458,0.494992,0.494454


Early stopping triggered. No improvement for 3 evaluations.
Eval loss: 1.1385037899017334
Training with params: (8, 16, 0.0001, 0.02, 0.01)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 5,636,096


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,2.1519,1.498381,-0.006579,-0.006797,0.11875,0.000218,-0.67971,-0.65794,-0.38233,-0.392522
100,1.4818,1.351411,-0.005304,-0.005502,0.11875,0.000198,-0.550189,-0.530388,-0.084931,-0.093218
150,1.321,1.233465,-0.003976,-0.004179,0.11875,0.000203,-0.417898,-0.397624,0.161057,0.156018
200,1.2638,1.20943,-0.003946,-0.00413,0.1125,0.000184,-0.412966,-0.394593,0.438323,0.43582
250,1.2568,1.188254,-0.003835,-0.004015,0.1125,0.00018,-0.401484,-0.383472,0.406483,0.40451
300,1.1956,1.176593,-0.00376,-0.003948,0.11875,0.000188,-0.394773,-0.375989,0.439369,0.437266
350,1.1891,1.168475,-0.003712,-0.003906,0.11875,0.000194,-0.39064,-0.371245,0.397341,0.394781
400,1.2002,1.164653,-0.003719,-0.003909,0.11875,0.00019,-0.390859,-0.371905,0.427821,0.423613
450,1.1835,1.159934,-0.003678,-0.003873,0.11875,0.000195,-0.387299,-0.3678,0.468179,0.468198
500,1.2329,1.151754,-0.003631,-0.003815,0.11875,0.000184,-0.38154,-0.363142,0.496514,0.495844




Early stopping triggered. No improvement for 3 evaluations.
Eval loss: 1.144431233406067
Training with params: (8, 16, 0.0001, 0.05, 0.001)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 5,636,096


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,2.1457,1.492149,-0.000657,-0.000679,0.11875,2.2e-05,-0.679023,-0.65728,-0.387251,-0.397144
100,1.4756,1.344905,-0.00053,-0.00055,0.11875,2e-05,-0.55004,-0.530218,-0.089305,-0.097736
150,1.3147,1.227423,-0.000398,-0.000418,0.11875,2e-05,-0.41788,-0.397651,0.157652,0.152221
200,1.2577,1.203213,-0.000394,-0.000413,0.1125,1.9e-05,-0.412929,-0.394338,0.432415,0.430043
250,1.2504,1.182245,-0.000383,-0.000401,0.1125,1.8e-05,-0.401439,-0.383494,0.409547,0.406858
300,1.1895,1.171043,-0.000376,-0.000395,0.11875,1.9e-05,-0.394791,-0.37596,0.439957,0.437754
350,1.1828,1.162574,-0.000371,-0.000391,0.11875,1.9e-05,-0.39077,-0.371272,0.393386,0.390571
400,1.1939,1.158565,-0.000372,-0.000391,0.11875,1.9e-05,-0.391195,-0.372125,0.428989,0.425185
450,1.1775,1.15387,-0.000368,-0.000387,0.11875,1.9e-05,-0.386791,-0.367517,0.467565,0.467559
500,1.2264,1.146382,-0.000363,-0.000382,0.11875,1.8e-05,-0.381853,-0.363491,0.497489,0.496941


Early stopping triggered. No improvement for 3 evaluations.
Eval loss: 1.1383883953094482
Training with params: (8, 16, 0.0001, 0.05, 0.01)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 5,636,096


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,2.1519,1.498666,-0.006579,-0.006796,0.1125,0.000217,-0.679612,-0.657914,-0.385821,-0.395793
100,1.4819,1.351527,-0.005308,-0.005507,0.11875,0.000198,-0.550684,-0.530834,-0.087787,-0.095909
150,1.3213,1.233664,-0.003977,-0.004178,0.11875,0.000201,-0.417772,-0.397722,0.158121,0.152676
200,1.264,1.209364,-0.003944,-0.00413,0.1125,0.000186,-0.412983,-0.39442,0.432396,0.429956
250,1.2569,1.18848,-0.003837,-0.004015,0.1125,0.000179,-0.40155,-0.383699,0.402318,0.399728
300,1.1956,1.177051,-0.003761,-0.00395,0.11875,0.000189,-0.394997,-0.376063,0.434019,0.432227
350,1.1892,1.168551,-0.003714,-0.003907,0.11875,0.000193,-0.390741,-0.37142,0.396027,0.393462
400,1.2005,1.164829,-0.003721,-0.00391,0.11875,0.000189,-0.390971,-0.37206,0.428126,0.424782
450,1.1836,1.159836,-0.003676,-0.00387,0.11875,0.000193,-0.386995,-0.367646,0.46599,0.466171
500,1.2327,1.152635,-0.003636,-0.003819,0.11875,0.000183,-0.381926,-0.36364,0.497075,0.496714


Early stopping triggered. No improvement for 3 evaluations.
Eval loss: 1.1447209119796753
Training with params: (8, 16, 0.0002, 0.01, 0.001)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 5,636,096


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,1.9391,1.406538,-0.000597,-0.000617,0.11875,2.1e-05,-0.617297,-0.596778,-0.071216,-0.077936
100,1.3335,1.22125,-0.000389,-0.000408,0.11875,2e-05,-0.408272,-0.388593,0.224308,0.220923
150,1.2576,1.189801,-0.000387,-0.000407,0.11875,1.9e-05,-0.406632,-0.387333,0.511258,0.508993
200,1.2192,1.170668,-0.000373,-0.00039,0.1125,1.8e-05,-0.390303,-0.372578,0.539259,0.541293
250,1.2203,1.156439,-0.000368,-0.000386,0.11875,1.7e-05,-0.385544,-0.368235,0.586899,0.589407
300,1.1681,1.145462,-0.000362,-0.00038,0.1125,1.8e-05,-0.379953,-0.362138,0.489192,0.491915
350,1.1619,1.141603,-0.000363,-0.000381,0.11875,1.9e-05,-0.381358,-0.362837,0.61191,0.613616
400,1.1736,1.138907,-0.000363,-0.000381,0.11875,1.8e-05,-0.381023,-0.363008,0.572625,0.572831
450,1.1611,1.134007,-0.000358,-0.000375,0.11875,1.7e-05,-0.375036,-0.357572,0.529209,0.531578
500,1.2078,1.127341,-0.000352,-0.000369,0.11875,1.7e-05,-0.369408,-0.352477,0.512627,0.511635


Early stopping triggered. No improvement for 3 evaluations.
Eval loss: 1.1156840324401855
Training with params: (8, 16, 0.0002, 0.01, 0.01)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 5,636,096


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,1.945,1.412536,-0.005969,-0.006175,0.11875,0.000206,-0.617479,-0.596907,-0.066224,-0.072702
100,1.3399,1.227466,-0.003889,-0.004087,0.11875,0.000198,-0.4087,-0.388926,0.219158,0.2157
150,1.2637,1.19583,-0.003875,-0.004068,0.11875,0.000193,-0.406798,-0.387528,0.516141,0.513593
200,1.225,1.176932,-0.003727,-0.003903,0.1125,0.000176,-0.390267,-0.372654,0.541924,0.543798
250,1.2263,1.162208,-0.003687,-0.003859,0.1125,0.000173,-0.385942,-0.368682,0.574981,0.577633
300,1.1743,1.151937,-0.00363,-0.003809,0.11875,0.000179,-0.380862,-0.362958,0.482036,0.484776
350,1.1683,1.148009,-0.003633,-0.003819,0.11875,0.000186,-0.381856,-0.363252,0.607908,0.610174
400,1.1798,1.143935,-0.003624,-0.003804,0.11875,0.00018,-0.380383,-0.362374,0.558211,0.558125
450,1.1671,1.140387,-0.003571,-0.003746,0.11875,0.000175,-0.374588,-0.357085,0.513027,0.51521
500,1.2142,1.13274,-0.003513,-0.003683,0.1125,0.00017,-0.368344,-0.351326,0.502712,0.502389


Early stopping triggered. No improvement for 3 evaluations.
Eval loss: 1.121551513671875
Training with params: (8, 16, 0.0002, 0.02, 0.001)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 5,636,096


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,1.939,1.406257,-0.000597,-0.000617,0.11875,2.1e-05,-0.617303,-0.596708,-0.075102,-0.082478
100,1.3334,1.220891,-0.000389,-0.000408,0.11875,2e-05,-0.408254,-0.388633,0.227301,0.223582
150,1.2575,1.189613,-0.000388,-0.000407,0.11875,1.9e-05,-0.406842,-0.387509,0.517611,0.51529
200,1.219,1.17061,-0.000373,-0.000391,0.1125,1.8e-05,-0.39083,-0.37306,0.546571,0.548788
250,1.2201,1.156876,-0.000369,-0.000386,0.1125,1.7e-05,-0.386174,-0.368985,0.577258,0.580201
300,1.1681,1.146064,-0.000363,-0.000381,0.11875,1.8e-05,-0.380862,-0.362968,0.475618,0.477821
350,1.1617,1.142835,-0.000364,-0.000382,0.11875,1.9e-05,-0.382354,-0.363726,0.582279,0.584492
400,1.1739,1.137894,-0.000363,-0.00038,0.11875,1.8e-05,-0.380499,-0.362693,0.547935,0.547736
450,1.1609,1.135212,-0.000358,-0.000375,0.1125,1.7e-05,-0.374905,-0.357759,0.508822,0.510773
500,1.2082,1.127447,-0.000352,-0.000369,0.1125,1.7e-05,-0.369176,-0.352318,0.494777,0.494329




Early stopping triggered. No improvement for 3 evaluations.
Eval loss: 1.1157292127609253
Training with params: (8, 16, 0.0002, 0.02, 0.01)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 5,636,096


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,1.945,1.412142,-0.005962,-0.006167,0.11875,0.000205,-0.616665,-0.596182,-0.080835,-0.087658
100,1.3396,1.227082,-0.003882,-0.004078,0.11875,0.000196,-0.407806,-0.388195,0.230801,0.227027
150,1.2636,1.19562,-0.00387,-0.004064,0.11875,0.000195,-0.406436,-0.386971,0.512236,0.510026
200,1.2253,1.175518,-0.003727,-0.003905,0.10625,0.000178,-0.390532,-0.372683,0.528773,0.530518
250,1.2264,1.162507,-0.003688,-0.00386,0.1125,0.000172,-0.385996,-0.368845,0.542705,0.544889
300,1.1744,1.150895,-0.003628,-0.003808,0.1125,0.000179,-0.380772,-0.362824,0.467236,0.469368
350,1.1683,1.148214,-0.00363,-0.003816,0.11875,0.000186,-0.381584,-0.362952,0.592071,0.594158
400,1.1804,1.144415,-0.003624,-0.003804,0.11875,0.000181,-0.380437,-0.362365,0.534439,0.53413
450,1.1667,1.141313,-0.003574,-0.00375,0.1125,0.000175,-0.374981,-0.357443,0.492425,0.494393


Early stopping triggered. No improvement for 3 evaluations.
Eval loss: 1.1413127183914185
Training with params: (8, 16, 0.0002, 0.05, 0.001)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 5,636,096


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,1.9387,1.406274,-0.000597,-0.000618,0.11875,2.1e-05,-0.617631,-0.597074,-0.074716,-0.081791
100,1.3337,1.22082,-0.000389,-0.000408,0.11875,2e-05,-0.408224,-0.388579,0.218989,0.21492
150,1.2575,1.190011,-0.000388,-0.000407,0.11875,1.9e-05,-0.407283,-0.387964,0.515469,0.512685
200,1.2191,1.170618,-0.000373,-0.000391,0.1125,1.8e-05,-0.390909,-0.373186,0.534181,0.535852
250,1.2201,1.156979,-0.000369,-0.000386,0.1125,1.7e-05,-0.385901,-0.368781,0.545147,0.546712
300,1.1683,1.145395,-0.000363,-0.000381,0.11875,1.8e-05,-0.380554,-0.362623,0.462255,0.464919
350,1.1619,1.142183,-0.000363,-0.000382,0.11875,1.9e-05,-0.381779,-0.363056,0.594,0.596082
400,1.1737,1.13827,-0.000363,-0.000381,0.11875,1.8e-05,-0.38103,-0.362808,0.544298,0.543644
450,1.1608,1.135875,-0.000359,-0.000376,0.1125,1.7e-05,-0.375754,-0.358522,0.497891,0.499701


Early stopping triggered. No improvement for 3 evaluations.
Eval loss: 1.1358752250671387
Training with params: (8, 16, 0.0002, 0.05, 0.01)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 5,636,096


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,1.9448,1.412629,-0.005967,-0.006172,0.11875,0.000204,-0.617181,-0.596733,-0.083542,-0.09034
100,1.3394,1.22745,-0.003889,-0.004085,0.11875,0.000196,-0.40847,-0.388873,0.225026,0.221504
150,1.2637,1.19668,-0.00388,-0.004072,0.11875,0.000192,-0.407249,-0.388013,0.517935,0.515762
200,1.2254,1.176953,-0.003728,-0.003905,0.1125,0.000177,-0.390505,-0.372832,0.544027,0.546216
250,1.2265,1.163437,-0.003689,-0.003861,0.1125,0.000172,-0.38609,-0.368884,0.563039,0.565041
300,1.1743,1.151408,-0.003623,-0.003802,0.1125,0.000179,-0.380158,-0.362281,0.472537,0.475762
350,1.1684,1.149059,-0.003639,-0.003825,0.11875,0.000186,-0.382493,-0.363884,0.583918,0.585821
400,1.1799,1.14431,-0.003629,-0.003809,0.11875,0.000181,-0.38092,-0.362862,0.546206,0.545575
450,1.167,1.142301,-0.003584,-0.003756,0.11875,0.000172,-0.375648,-0.358437,0.523425,0.525825


Early stopping triggered. No improvement for 3 evaluations.
Eval loss: 1.142301082611084
Training with params: (8, 16, 0.0003, 0.01, 0.001)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 5,636,096


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,1.8466,1.32192,-0.000498,-0.00052,0.11875,2.1e-05,-0.519511,-0.498151,-0.171129,-0.1787
100,1.2856,1.20624,-0.000387,-0.000405,0.11875,1.8e-05,-0.405355,-0.387293,0.453518,0.452299
150,1.2439,1.177764,-0.000383,-0.000403,0.11875,1.9e-05,-0.40268,-0.383258,0.613471,0.614684
200,1.2112,1.160944,-0.000366,-0.000383,0.1125,1.7e-05,-0.382745,-0.365616,0.562764,0.568147
250,1.2142,1.146852,-0.000361,-0.000377,0.1125,1.6e-05,-0.377424,-0.361102,0.566032,0.569348
300,1.1633,1.139481,-0.000357,-0.000374,0.11875,1.7e-05,-0.373901,-0.356921,0.416035,0.416348
350,1.1568,1.137884,-0.000361,-0.000379,0.11875,1.8e-05,-0.378761,-0.361063,0.631309,0.632132
400,1.1694,1.13491,-0.00036,-0.000378,0.11875,1.8e-05,-0.378165,-0.360269,0.558225,0.557239
450,1.157,1.128593,-0.000353,-0.00037,0.11875,1.7e-05,-0.369749,-0.352743,0.506384,0.506121
500,1.2022,1.120797,-0.000345,-0.000362,0.11875,1.7e-05,-0.362469,-0.345399,0.449222,0.446153


Early stopping triggered. No improvement for 3 evaluations.
Eval loss: 1.1169404983520508
Training with params: (8, 16, 0.0003, 0.01, 0.01)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 5,636,096


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,1.8528,1.331435,-0.005039,-0.005253,0.11875,0.000214,-0.525319,-0.503881,-0.141184,-0.149379
100,1.2922,1.210943,-0.003867,-0.00405,0.11875,0.000183,-0.405013,-0.386745,0.449517,0.447964
150,1.25,1.186845,-0.003844,-0.00404,0.11875,0.000196,-0.403973,-0.384398,0.593071,0.593924
200,1.2177,1.168854,-0.003665,-0.003836,0.10625,0.000171,-0.383627,-0.366495,0.559705,0.564801
250,1.219,1.153691,-0.003624,-0.003789,0.11875,0.000166,-0.37894,-0.362357,0.569952,0.576162
300,1.1691,1.146695,-0.003582,-0.003752,0.11875,0.00017,-0.375198,-0.358171,0.424947,0.426605
350,1.1626,1.145149,-0.003611,-0.00379,0.11875,0.00018,-0.379012,-0.361051,0.647316,0.649534
400,1.1755,1.140873,-0.003601,-0.003775,0.11875,0.000174,-0.377483,-0.360121,0.570904,0.571525
450,1.1634,1.135093,-0.003521,-0.003692,0.11875,0.000171,-0.369212,-0.352122,0.499934,0.500633
500,1.2081,1.129021,-0.003476,-0.003646,0.11875,0.00017,-0.364566,-0.347605,0.468615,0.465435


Early stopping triggered. No improvement for 3 evaluations.
Eval loss: 1.1245132684707642
Training with params: (8, 16, 0.0003, 0.02, 0.001)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 5,636,096


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,1.8465,1.32006,-0.000496,-0.000518,0.11875,2.1e-05,-0.517592,-0.496167,-0.16975,-0.177414
100,1.2856,1.205366,-0.000387,-0.000405,0.11875,1.8e-05,-0.404569,-0.386551,0.478525,0.478194
150,1.2432,1.179254,-0.000385,-0.000404,0.11875,1.9e-05,-0.403859,-0.384579,0.638886,0.639678
200,1.2111,1.161502,-0.000367,-0.000384,0.10625,1.7e-05,-0.383599,-0.366522,0.568154,0.57369
250,1.2131,1.14758,-0.000362,-0.000378,0.1125,1.6e-05,-0.378286,-0.362013,0.601729,0.608394
300,1.164,1.140363,-0.000357,-0.000374,0.11875,1.7e-05,-0.374241,-0.357237,0.434687,0.436472
350,1.1563,1.138483,-0.000362,-0.00038,0.11875,1.8e-05,-0.379869,-0.362217,0.642303,0.644003
400,1.1693,1.134019,-0.000359,-0.000377,0.11875,1.8e-05,-0.377001,-0.359178,0.575737,0.576591
450,1.1573,1.129518,-0.000353,-0.00037,0.11875,1.7e-05,-0.370061,-0.353226,0.525308,0.526333
500,1.2026,1.121521,-0.000346,-0.000363,0.11875,1.7e-05,-0.363062,-0.346136,0.473487,0.471235


Early stopping triggered. No improvement for 3 evaluations.
Eval loss: 1.116916537284851
Training with params: (8, 16, 0.0003, 0.02, 0.01)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 5,636,096


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,1.8538,1.32652,-0.004945,-0.005156,0.11875,0.000211,-0.515615,-0.494477,-0.18128,-0.188594
100,1.2927,1.210892,-0.003864,-0.004042,0.1125,0.000178,-0.404203,-0.386443,0.498459,0.498028
150,1.2494,1.1848,-0.003839,-0.004035,0.11875,0.000196,-0.403493,-0.383901,0.669745,0.670798
200,1.2229,1.167943,-0.003665,-0.003839,0.10625,0.000174,-0.38392,-0.36652,0.578988,0.58456
250,1.2185,1.153362,-0.003625,-0.003788,0.1125,0.000162,-0.378759,-0.362513,0.648995,0.653547
300,1.1689,1.146339,-0.003584,-0.003756,0.11875,0.000172,-0.375596,-0.358441,0.468979,0.470209
350,1.1631,1.143608,-0.003602,-0.003778,0.11875,0.000177,-0.377836,-0.360167,0.673941,0.676476
400,1.1753,1.140944,-0.003608,-0.00378,0.11875,0.000172,-0.378032,-0.360789,0.6125,0.612331
450,1.1627,1.136735,-0.003534,-0.003699,0.1125,0.000165,-0.369886,-0.353363,0.533175,0.535227
500,1.2083,1.128659,-0.00347,-0.003637,0.11875,0.000167,-0.363724,-0.347019,0.482487,0.481335


Early stopping triggered. No improvement for 3 evaluations.
Eval loss: 1.12431800365448
Training with params: (8, 16, 0.0003, 0.05, 0.001)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 5,636,096


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,1.8473,1.320524,-0.000497,-0.000519,0.11875,2.2e-05,-0.518701,-0.497196,-0.173512,-0.181876
100,1.2859,1.204388,-0.000387,-0.000406,0.11875,1.8e-05,-0.405535,-0.387484,0.476693,0.475848
150,1.2424,1.179862,-0.000384,-0.000404,0.11875,1.9e-05,-0.403655,-0.384304,0.613116,0.613751
200,1.2107,1.162624,-0.000365,-0.000382,0.1125,1.7e-05,-0.382438,-0.365297,0.533426,0.536717
250,1.241,1.147276,-0.000361,-0.000378,0.11875,1.6e-05,-0.377858,-0.36144,0.647677,0.652067
300,1.1695,1.141024,-0.000359,-0.000376,0.11875,1.7e-05,-0.376056,-0.358859,0.483536,0.485999
350,1.1603,1.137485,-0.000359,-0.000377,0.11875,1.8e-05,-0.377223,-0.359392,0.643641,0.643539
400,1.1693,1.133769,-0.000358,-0.000376,0.11875,1.8e-05,-0.376082,-0.358189,0.573724,0.57247
450,1.1579,1.13006,-0.000353,-0.00037,0.11875,1.7e-05,-0.369917,-0.353367,0.507345,0.507154
500,1.2025,1.12219,-0.000347,-0.000364,0.11875,1.7e-05,-0.364016,-0.347215,0.47223,0.469966


Early stopping triggered. No improvement for 3 evaluations.
Eval loss: 1.1173858642578125
Training with params: (8, 16, 0.0003, 0.05, 0.01)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 5,636,096


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,1.8535,1.334038,-0.005068,-0.005283,0.11875,0.000216,-0.528322,-0.506766,-0.170095,-0.177958
100,1.292,1.211028,-0.003866,-0.004048,0.1125,0.000181,-0.404785,-0.386638,0.461021,0.459012
150,1.2496,1.184821,-0.003842,-0.004038,0.11875,0.000196,-0.403834,-0.384218,0.629399,0.630655
200,1.2172,1.167593,-0.003661,-0.003834,0.10625,0.000173,-0.383395,-0.366133,0.555497,0.56075
250,1.2187,1.152712,-0.003625,-0.003787,0.1125,0.000162,-0.378738,-0.362524,0.594804,0.601577
300,1.1695,1.146349,-0.003587,-0.003758,0.11875,0.000171,-0.375752,-0.358699,0.436426,0.439058
350,1.1627,1.144026,-0.003602,-0.003777,0.11875,0.000175,-0.377701,-0.360178,0.656716,0.658578
400,1.1752,1.140148,-0.003589,-0.003764,0.11875,0.000175,-0.376379,-0.358875,0.594785,0.595507
450,1.1625,1.135916,-0.003534,-0.003701,0.1125,0.000167,-0.370096,-0.353356,0.538729,0.54085
500,1.2086,1.127754,-0.003464,-0.003635,0.11875,0.00017,-0.363462,-0.346444,0.482394,0.481661


Early stopping triggered. No improvement for 3 evaluations.
Eval loss: 1.1229242086410522
Training with params: (8, 32, 0.0001, 0.01, 0.001)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 5,636,096


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,1.9892,1.437023,-0.000642,-0.000662,0.11875,2.1e-05,-0.662415,-0.641551,0.061351,0.053549
100,1.3737,1.237976,-0.000396,-0.000415,0.11875,2e-05,-0.415478,-0.395511,0.046723,0.040531
150,1.2721,1.203589,-0.000392,-0.000411,0.11875,2e-05,-0.411424,-0.391557,0.4103,0.408505
200,1.231,1.182876,-0.000382,-0.0004,0.10625,1.8e-05,-0.400388,-0.382462,0.532312,0.53266
250,1.2291,1.165783,-0.000375,-0.000393,0.1125,1.8e-05,-0.392887,-0.375025,0.508316,0.507649
300,1.1755,1.155443,-0.000367,-0.000386,0.11875,1.8e-05,-0.38562,-0.367396,0.473238,0.471742
350,1.1691,1.15024,-0.000366,-0.000386,0.11875,1.9e-05,-0.385538,-0.36617,0.508223,0.506026
400,1.1799,1.145492,-0.000366,-0.000385,0.11875,1.9e-05,-0.384592,-0.365975,0.53227,0.528766
450,1.1656,1.140967,-0.000361,-0.000379,0.11875,1.9e-05,-0.379175,-0.360663,0.543074,0.543827
500,1.2141,1.134141,-0.000356,-0.000374,0.11875,1.8e-05,-0.374015,-0.356334,0.558074,0.556642


Early stopping triggered. No improvement for 3 evaluations.
Eval loss: 1.1225171089172363
Training with params: (8, 32, 0.0001, 0.01, 0.01)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 5,636,096


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,1.9956,1.443514,-0.006417,-0.006624,0.11875,0.000208,-0.662444,-0.641656,0.060899,0.053358
100,1.3825,1.245278,-0.003984,-0.004179,0.1125,0.000195,-0.417942,-0.398408,0.078831,0.073102
150,1.2795,1.209479,-0.00392,-0.004118,0.11875,0.000198,-0.411813,-0.392035,0.401862,0.399245
200,1.2369,1.189645,-0.003832,-0.004008,0.1125,0.000176,-0.400798,-0.38317,0.510858,0.509567
250,1.2352,1.171715,-0.003751,-0.003926,0.1125,0.000175,-0.392567,-0.375098,0.519703,0.518508
300,1.1797,1.16158,-0.003677,-0.00386,0.11875,0.000183,-0.385958,-0.367672,0.479312,0.478081
350,1.1743,1.156057,-0.003662,-0.003854,0.11875,0.000193,-0.385449,-0.366151,0.526609,0.523867
400,1.1852,1.151468,-0.003661,-0.003845,0.11875,0.000184,-0.384477,-0.366126,0.558838,0.555928
450,1.1709,1.14771,-0.003614,-0.003798,0.11875,0.000184,-0.379767,-0.361379,0.565539,0.566495
500,1.2199,1.139991,-0.003562,-0.003738,0.1125,0.000176,-0.373821,-0.356231,0.58965,0.589032


Early stopping triggered. No improvement for 3 evaluations.
Eval loss: 1.1334236860275269
Training with params: (8, 32, 0.0001, 0.02, 0.001)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 5,636,096


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,1.9894,1.437546,-0.000642,-0.000663,0.11875,2.1e-05,-0.662676,-0.64191,0.057611,0.049829
100,1.3743,1.23861,-0.000396,-0.000416,0.11875,2e-05,-0.415643,-0.395804,0.041989,0.035731
150,1.2725,1.203768,-0.000392,-0.000412,0.11875,2e-05,-0.411691,-0.39199,0.391698,0.389497
200,1.2305,1.183613,-0.000383,-0.000401,0.10625,1.8e-05,-0.401028,-0.383312,0.513059,0.512649
250,1.2289,1.165704,-0.000375,-0.000393,0.1125,1.8e-05,-0.392885,-0.375253,0.507787,0.507149
300,1.1753,1.155452,-0.000367,-0.000386,0.11875,1.8e-05,-0.385626,-0.367395,0.477123,0.475775
350,1.1692,1.150336,-0.000366,-0.000386,0.11875,1.9e-05,-0.385637,-0.366249,0.508322,0.505991
400,1.18,1.146311,-0.000367,-0.000386,0.11875,1.9e-05,-0.385716,-0.367006,0.540225,0.536768
450,1.1655,1.141079,-0.000361,-0.000379,0.11875,1.9e-05,-0.379225,-0.360593,0.539136,0.539942
500,1.2138,1.134755,-0.000357,-0.000374,0.11875,1.8e-05,-0.374411,-0.356704,0.55869,0.557782


Early stopping triggered. No improvement for 3 evaluations.
Eval loss: 1.1226872205734253
Training with params: (8, 32, 0.0001, 0.02, 0.01)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 5,636,096


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,1.9956,1.443439,-0.006415,-0.006624,0.11875,0.000209,-0.662398,-0.641542,0.061315,0.053654
100,1.3805,1.24467,-0.00396,-0.004158,0.11875,0.000198,-0.415833,-0.396002,0.037615,0.03143
150,1.2786,1.209527,-0.003921,-0.004119,0.11875,0.000198,-0.411888,-0.392083,0.388606,0.386121
200,1.2361,1.188725,-0.003816,-0.003995,0.10625,0.000179,-0.399468,-0.381608,0.534398,0.533232
250,1.2351,1.171793,-0.003753,-0.003929,0.1125,0.000176,-0.392901,-0.375291,0.52004,0.519073
300,1.1798,1.160946,-0.00367,-0.003854,0.11875,0.000183,-0.38537,-0.367033,0.485601,0.483941
350,1.1746,1.154992,-0.003654,-0.003843,0.11875,0.000189,-0.384314,-0.365387,0.517997,0.51536
400,1.1853,1.1513,-0.003657,-0.003841,0.11875,0.000184,-0.384107,-0.365671,0.544374,0.541045
450,1.1708,1.14726,-0.003611,-0.003796,0.11875,0.000186,-0.379643,-0.361062,0.552198,0.552635
500,1.2204,1.140295,-0.003565,-0.003741,0.11875,0.000176,-0.374052,-0.356483,0.576648,0.576059


Early stopping triggered. No improvement for 3 evaluations.
Eval loss: 1.1284137964248657
Training with params: (8, 32, 0.0001, 0.05, 0.001)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 5,636,096


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,1.9894,1.437096,-0.000641,-0.000662,0.11875,2.1e-05,-0.662306,-0.641393,0.059825,0.052382
100,1.3768,1.239106,-0.000397,-0.000417,0.1125,2e-05,-0.417086,-0.396982,0.040664,0.034092
150,1.2722,1.203257,-0.000392,-0.000411,0.11875,2e-05,-0.411493,-0.391711,0.393927,0.392005
200,1.23,1.182973,-0.000382,-0.0004,0.1125,1.8e-05,-0.400098,-0.382351,0.53331,0.53281
250,1.2289,1.166201,-0.000376,-0.000393,0.1125,1.7e-05,-0.393154,-0.375656,0.505101,0.504306
300,1.1738,1.15546,-0.000367,-0.000385,0.11875,1.8e-05,-0.385224,-0.367162,0.490479,0.488697
350,1.1684,1.149432,-0.000366,-0.000385,0.11875,1.9e-05,-0.38479,-0.365685,0.515708,0.513602
400,1.1789,1.145194,-0.000366,-0.000385,0.11875,1.9e-05,-0.384737,-0.366072,0.535109,0.531657
450,1.1643,1.141207,-0.000361,-0.000379,0.11875,1.8e-05,-0.379258,-0.360792,0.542769,0.542369
500,1.214,1.13394,-0.000356,-0.000374,0.1125,1.8e-05,-0.373919,-0.356281,0.575821,0.574575


Early stopping triggered. No improvement for 3 evaluations.
Eval loss: 1.1269630193710327
Training with params: (8, 32, 0.0001, 0.05, 0.01)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 5,636,096


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,1.9956,1.443248,-0.006416,-0.006624,0.11875,0.000208,-0.662438,-0.641595,0.055477,0.047793
100,1.3808,1.244417,-0.003958,-0.004155,0.11875,0.000197,-0.415497,-0.395768,0.042856,0.036578
150,1.2785,1.209433,-0.003919,-0.004116,0.11875,0.000197,-0.411649,-0.391934,0.39725,0.394987
200,1.2362,1.188818,-0.003822,-0.004,0.1125,0.000178,-0.400004,-0.382174,0.53374,0.532565
250,1.2352,1.170272,-0.003741,-0.003919,0.1125,0.000177,-0.39187,-0.374148,0.501336,0.500175
300,1.18,1.160449,-0.003666,-0.003849,0.11875,0.000183,-0.384854,-0.3666,0.476823,0.475403
350,1.1743,1.154732,-0.003645,-0.003836,0.11875,0.000191,-0.383635,-0.364537,0.515225,0.512946
400,1.185,1.151001,-0.003647,-0.003833,0.11875,0.000186,-0.383328,-0.364739,0.536444,0.532847
450,1.1708,1.147397,-0.003612,-0.003795,0.11875,0.000183,-0.379488,-0.361206,0.543964,0.544171
500,1.2201,1.140643,-0.003566,-0.003741,0.11875,0.000175,-0.374099,-0.356603,0.562679,0.562162


Early stopping triggered. No improvement for 3 evaluations.
Eval loss: 1.1332669258117676
Training with params: (8, 32, 0.0002, 0.01, 0.001)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 5,636,096


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,1.8274,1.276849,-0.000431,-0.000453,0.11875,2.2e-05,-0.452562,-0.430818,-0.120351,-0.127723
100,1.2828,1.205765,-0.000385,-0.000402,0.11875,1.7e-05,-0.402336,-0.385033,0.452878,0.451418
150,1.2412,1.1777,-0.000383,-0.000402,0.11875,1.9e-05,-0.402062,-0.38292,0.654456,0.654121
200,1.21,1.160071,-0.000365,-0.000382,0.1125,1.7e-05,-0.381621,-0.364686,0.578632,0.582769
250,1.2137,1.145494,-0.000359,-0.000376,0.1125,1.6e-05,-0.375653,-0.359463,0.613789,0.616351
300,1.1626,1.138784,-0.000357,-0.000375,0.11875,1.7e-05,-0.37457,-0.357169,0.481813,0.483118
350,1.1561,1.136853,-0.000359,-0.000378,0.11875,1.8e-05,-0.377658,-0.359349,0.685012,0.686395
400,1.1678,1.13366,-0.000359,-0.000377,0.11875,1.8e-05,-0.376978,-0.35929,0.577188,0.575438
450,1.156,1.127613,-0.000351,-0.000369,0.11875,1.7e-05,-0.368607,-0.351318,0.583915,0.585263
500,1.2013,1.120402,-0.000346,-0.000362,0.11875,1.7e-05,-0.362128,-0.34559,0.576888,0.576081


Early stopping triggered. No improvement for 3 evaluations.
Eval loss: 1.1154929399490356
Training with params: (8, 32, 0.0002, 0.01, 0.01)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 5,636,096


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,1.8338,1.282993,-0.004304,-0.00452,0.11875,0.000217,-0.452045,-0.430365,-0.123117,-0.130569
100,1.2896,1.211171,-0.003851,-0.004025,0.11875,0.000174,-0.402462,-0.385085,0.466434,0.464763
150,1.2473,1.184253,-0.00384,-0.004031,0.11875,0.00019,-0.403057,-0.384044,0.681968,0.681905
200,1.2162,1.167577,-0.003647,-0.003816,0.1125,0.000169,-0.381562,-0.364661,0.589582,0.594352
250,1.2176,1.151506,-0.003601,-0.003767,0.1125,0.000166,-0.376691,-0.360112,0.603887,0.608288
300,1.1683,1.145435,-0.00358,-0.003753,0.11875,0.000173,-0.375278,-0.357951,0.496719,0.498921
350,1.166,1.141697,-0.003584,-0.003765,0.11875,0.000181,-0.37648,-0.358351,0.6924,0.69471
400,1.174,1.139315,-0.003601,-0.003777,0.11875,0.000176,-0.377688,-0.360088,0.619917,0.620551
450,1.1619,1.134875,-0.003535,-0.003709,0.11875,0.000173,-0.370869,-0.353531,0.584812,0.585923
500,1.2077,1.126641,-0.003467,-0.003638,0.11875,0.000171,-0.363819,-0.346698,0.58037,0.579268


Early stopping triggered. No improvement for 3 evaluations.
Eval loss: 1.1221457719802856
Training with params: (8, 32, 0.0002, 0.02, 0.001)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 5,636,096


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,1.8275,1.276337,-0.000431,-0.000452,0.11875,2.2e-05,-0.452151,-0.43063,-0.124239,-0.131347
100,1.282,1.205087,-0.000383,-0.000401,0.11875,1.8e-05,-0.400662,-0.383038,0.496159,0.494948
150,1.2423,1.177217,-0.000383,-0.000402,0.11875,1.9e-05,-0.402463,-0.383227,0.655415,0.654087
200,1.2104,1.160058,-0.000365,-0.000382,0.1125,1.7e-05,-0.382031,-0.365073,0.580028,0.583152
250,1.2115,1.14564,-0.000359,-0.000376,0.1125,1.6e-05,-0.375553,-0.359306,0.611551,0.613764
300,1.1625,1.138621,-0.000357,-0.000374,0.11875,1.7e-05,-0.374178,-0.357006,0.47246,0.473849
350,1.1556,1.136102,-0.000359,-0.000377,0.11875,1.8e-05,-0.376805,-0.358861,0.686417,0.688264
400,1.1675,1.133371,-0.000359,-0.000377,0.11875,1.8e-05,-0.377041,-0.359325,0.59636,0.595621
450,1.1556,1.127119,-0.000352,-0.00037,0.11875,1.8e-05,-0.369819,-0.35226,0.583541,0.584926
500,1.2009,1.120348,-0.000345,-0.000363,0.11875,1.7e-05,-0.362542,-0.345458,0.525709,0.523546


Early stopping triggered. No improvement for 3 evaluations.
Eval loss: 1.115854024887085
Training with params: (8, 32, 0.0002, 0.02, 0.01)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 5,636,096


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,1.833,1.283426,-0.004317,-0.004535,0.11875,0.000219,-0.453508,-0.431657,-0.132812,-0.140305
100,1.2867,1.210475,-0.003851,-0.004031,0.11875,0.00018,-0.403092,-0.38514,0.510988,0.510171
150,1.2473,1.184557,-0.003837,-0.00403,0.11875,0.000193,-0.402978,-0.383727,0.647745,0.647907
200,1.216,1.1667,-0.003657,-0.003827,0.1125,0.00017,-0.382692,-0.365702,0.560359,0.565254
250,1.2175,1.151915,-0.003605,-0.003769,0.1125,0.000165,-0.376922,-0.360472,0.606342,0.611392
300,1.1677,1.145325,-0.003575,-0.003749,0.11875,0.000175,-0.374919,-0.357459,0.481048,0.484902
350,1.1615,1.141851,-0.003593,-0.003777,0.11875,0.000184,-0.37772,-0.35933,0.677882,0.680451
400,1.1736,1.139232,-0.003598,-0.003778,0.11875,0.00018,-0.377818,-0.35981,0.611099,0.61113
450,1.1617,1.134458,-0.003527,-0.0037,0.11875,0.000173,-0.369995,-0.352676,0.560438,0.561202
500,1.2068,1.126725,-0.003466,-0.003635,0.11875,0.000169,-0.363478,-0.346606,0.533432,0.532526


Early stopping triggered. No improvement for 3 evaluations.
Eval loss: 1.1209979057312012
Training with params: (8, 32, 0.0002, 0.05, 0.001)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 5,636,096


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,1.8268,1.27699,-0.000431,-0.000452,0.11875,2.1e-05,-0.452376,-0.430964,-0.13102,-0.138942
100,1.2802,1.206109,-0.000385,-0.000402,0.11875,1.8e-05,-0.402366,-0.384614,0.497087,0.497341
150,1.2413,1.177679,-0.000383,-0.000403,0.11875,1.9e-05,-0.40257,-0.383496,0.667045,0.667694
200,1.2099,1.159686,-0.000365,-0.000382,0.1125,1.7e-05,-0.381621,-0.364566,0.571162,0.577081
250,1.2113,1.144904,-0.00036,-0.000376,0.11875,1.6e-05,-0.376306,-0.35983,0.615751,0.618179
300,1.1616,1.1389,-0.000359,-0.000376,0.11875,1.7e-05,-0.375722,-0.358582,0.498394,0.500632
350,1.1553,1.135144,-0.000358,-0.000376,0.11875,1.8e-05,-0.376342,-0.358366,0.69822,0.700628
400,1.1681,1.133455,-0.000361,-0.000379,0.11875,1.8e-05,-0.378597,-0.360783,0.63179,0.631963
450,1.1555,1.127708,-0.000353,-0.00037,0.1125,1.7e-05,-0.36979,-0.352804,0.582829,0.584904
500,1.2,1.119731,-0.000345,-0.000362,0.11875,1.7e-05,-0.362278,-0.345417,0.545226,0.544259


Early stopping triggered. No improvement for 3 evaluations.
Eval loss: 1.1153979301452637
Training with params: (8, 32, 0.0002, 0.05, 0.01)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 5,636,096


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,1.8332,1.289145,-0.004407,-0.004624,0.11875,0.000217,-0.462404,-0.440742,-0.106106,-0.114013
100,1.2885,1.21113,-0.003854,-0.004029,0.1125,0.000174,-0.402887,-0.385442,0.501849,0.502481
150,1.2487,1.182989,-0.003827,-0.004017,0.11875,0.00019,-0.401652,-0.382674,0.632926,0.631234
200,1.2423,1.166162,-0.003668,-0.003837,0.1125,0.000169,-0.383699,-0.36682,0.527806,0.527693
250,1.2264,1.152707,-0.003593,-0.003759,0.1125,0.000166,-0.375903,-0.359288,0.558319,0.558103
300,1.1747,1.146222,-0.003578,-0.003752,0.11875,0.000174,-0.375196,-0.357763,0.479072,0.478551
350,1.1657,1.142302,-0.003586,-0.003757,0.11875,0.000171,-0.375667,-0.358569,0.637432,0.635654
400,1.1745,1.139217,-0.003597,-0.003771,0.11875,0.000174,-0.377124,-0.359695,0.554029,0.551319
450,1.1663,1.134018,-0.003518,-0.003691,0.11875,0.000173,-0.369107,-0.351843,0.527472,0.526602
500,1.208,1.126257,-0.003461,-0.003629,0.11875,0.000168,-0.362936,-0.34613,0.532416,0.528568


Early stopping triggered. No improvement for 3 evaluations.
Eval loss: 1.1220871210098267
Training with params: (8, 32, 0.0003, 0.01, 0.001)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 5,636,096


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,1.7469,1.269555,-0.000407,-0.000428,0.11875,2.1e-05,-0.42788,-0.406842,0.127407,0.123739
100,1.2651,1.19364,-0.00038,-0.000397,0.1125,1.6e-05,-0.396512,-0.380166,0.658846,0.660334
150,1.2349,1.176512,-0.000379,-0.000398,0.11875,1.9e-05,-0.39798,-0.378989,0.809394,0.813232
200,1.2095,1.15745,-0.00036,-0.000378,0.11875,1.7e-05,-0.377909,-0.360499,0.630883,0.635534
250,1.2143,1.1433,-0.000356,-0.000372,0.11875,1.6e-05,-0.371614,-0.355729,0.670652,0.678714
300,1.1638,1.137361,-0.000353,-0.000371,0.11875,1.8e-05,-0.370909,-0.353003,0.59942,0.601919
350,1.1586,1.135418,-0.000355,-0.000373,0.11875,1.8e-05,-0.373058,-0.355213,0.75244,0.753613
400,1.1694,1.134996,-0.000356,-0.000375,0.11875,1.9e-05,-0.374881,-0.356193,0.607446,0.608069


Early stopping triggered. No improvement for 3 evaluations.
Eval loss: 1.1349955797195435
Training with params: (8, 32, 0.0003, 0.01, 0.01)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 5,636,096


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,1.7538,1.276189,-0.004066,-0.004276,0.11875,0.00021,-0.427608,-0.40661,0.127694,0.124207
100,1.2697,1.201735,-0.003811,-0.003977,0.1125,0.000166,-0.397742,-0.381134,0.647316,0.649204
150,1.2413,1.18047,-0.003781,-0.003971,0.11875,0.00019,-0.397114,-0.378106,0.83516,0.840174
200,1.2147,1.165367,-0.003624,-0.0038,0.11875,0.000176,-0.380002,-0.362394,0.622177,0.630002
250,1.218,1.149527,-0.003566,-0.003725,0.1125,0.000159,-0.372528,-0.356611,0.715144,0.722903
300,1.1765,1.142762,-0.003526,-0.003703,0.11875,0.000177,-0.370348,-0.352611,0.549586,0.550985
350,1.1647,1.142163,-0.003574,-0.00375,0.11875,0.000176,-0.374963,-0.357378,0.785527,0.787099
400,1.1752,1.141182,-0.003574,-0.003757,0.11875,0.000184,-0.375711,-0.357359,0.551695,0.552251


Early stopping triggered. No improvement for 3 evaluations.
Eval loss: 1.1411818265914917
Training with params: (8, 32, 0.0003, 0.02, 0.001)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 5,636,096


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,1.7478,1.274285,-0.000409,-0.000431,0.11875,2.1e-05,-0.430604,-0.409391,0.165667,0.161112
100,1.2731,1.194666,-0.000382,-0.000399,0.1125,1.7e-05,-0.398811,-0.382023,0.578406,0.580543
150,1.2331,1.175352,-0.000381,-0.000399,0.11875,1.7e-05,-0.398899,-0.381464,0.82577,0.830696
200,1.2084,1.158803,-0.000361,-0.000378,0.11875,1.7e-05,-0.378137,-0.361439,0.580188,0.588962
250,1.2582,1.193912,-0.000368,-0.000386,0.11875,1.8e-05,-0.386049,-0.367986,0.713953,0.720342
300,1.189,1.142847,-0.000356,-0.000374,0.11875,1.8e-05,-0.374193,-0.355774,0.767201,0.774144
350,1.1705,1.142784,-0.000359,-0.000377,0.11875,1.8e-05,-0.377,-0.358926,0.898069,0.902503
400,1.1753,1.144318,-0.000358,-0.000375,0.11875,1.8e-05,-0.375405,-0.357569,0.660803,0.663666
450,1.1679,1.139444,-0.000352,-0.00037,0.11875,1.8e-05,-0.36983,-0.352149,0.753783,0.756043


Early stopping triggered. No improvement for 3 evaluations.
Eval loss: 1.1394436359405518
Training with params: (8, 32, 0.0003, 0.02, 0.01)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 5,636,096


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,1.7534,1.276206,-0.004068,-0.00428,0.11875,0.000212,-0.428029,-0.40684,0.118764,0.115568
100,1.2695,1.199674,-0.003807,-0.003969,0.1125,0.000162,-0.396946,-0.380726,0.627913,0.629996
150,1.2404,1.180686,-0.003787,-0.003973,0.11875,0.000187,-0.39733,-0.378661,0.76312,0.768921
200,1.2145,1.16564,-0.00363,-0.003806,0.11875,0.000176,-0.380615,-0.362975,0.599071,0.606387
250,1.218,1.149965,-0.003561,-0.003719,0.1125,0.000158,-0.371875,-0.356107,0.630802,0.636745
300,1.1699,1.142266,-0.003534,-0.003712,0.11875,0.000177,-0.371169,-0.353449,0.60479,0.606798
350,1.1643,1.142761,-0.003568,-0.003736,0.11875,0.000168,-0.373626,-0.356847,0.788182,0.790295
400,1.1751,1.141769,-0.003561,-0.003736,0.11875,0.000175,-0.373595,-0.356088,0.600147,0.602438


Early stopping triggered. No improvement for 3 evaluations.
Eval loss: 1.1417689323425293
Training with params: (8, 32, 0.0003, 0.05, 0.001)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 5,636,096


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,1.7474,1.269256,-0.000407,-0.000428,0.11875,2.1e-05,-0.427987,-0.406837,0.115766,0.112412
100,1.2635,1.194782,-0.00038,-0.000397,0.1125,1.6e-05,-0.396805,-0.380455,0.607275,0.607202
150,1.2328,1.177104,-0.000382,-0.0004,0.11875,1.8e-05,-0.400156,-0.382358,0.82452,0.828937
200,1.2082,1.158291,-0.000361,-0.000378,0.11875,1.7e-05,-0.378201,-0.361248,0.617509,0.626189
250,1.2119,1.142867,-0.000356,-0.000372,0.11875,1.7e-05,-0.372467,-0.355537,0.635714,0.641938
300,1.1631,1.134395,-0.000354,-0.000372,0.11875,1.8e-05,-0.371733,-0.353584,0.627737,0.631993
350,1.1586,1.135253,-0.000357,-0.000374,0.11875,1.7e-05,-0.374102,-0.356833,0.81505,0.820063
400,1.1692,1.134632,-0.000356,-0.000374,0.11875,1.8e-05,-0.374429,-0.356477,0.594807,0.598343


Early stopping triggered. No improvement for 3 evaluations.


Eval loss: 1.134394645690918
Training with params: (8, 32, 0.0003, 0.05, 0.01)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 5,636,096


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,1.7538,1.276029,-0.00408,-0.004292,0.11875,0.000212,-0.42922,-0.408026,0.115273,0.111441
100,1.2699,1.201671,-0.003827,-0.003989,0.1125,0.000163,-0.398918,-0.382661,0.599457,0.600005
150,1.2411,1.181676,-0.003825,-0.004007,0.11875,0.000182,-0.400711,-0.382496,0.88146,0.887036
200,1.2142,1.165305,-0.003624,-0.003796,0.11875,0.000171,-0.379574,-0.362437,0.656385,0.665037
250,1.2182,1.14756,-0.003548,-0.003705,0.1125,0.000157,-0.370501,-0.354796,0.688911,0.696411
300,1.1694,1.142596,-0.003546,-0.003724,0.11875,0.000178,-0.372376,-0.354603,0.64124,0.64458
350,1.165,1.139977,-0.00356,-0.003735,0.11875,0.000176,-0.373537,-0.355959,0.77382,0.777814
400,1.1746,1.141747,-0.003563,-0.003749,0.11875,0.000186,-0.374895,-0.3563,0.597062,0.600232


Early stopping triggered. No improvement for 3 evaluations.


Eval loss: 1.1399770975112915
Training with params: (16, 8, 0.0001, 0.01, 0.001)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,2.341,1.616609,-0.000853,-0.000876,0.11875,2.2e-05,-0.875527,-0.853093,0.568388,0.560353
100,1.5438,1.425882,-0.000618,-0.000638,0.11875,2e-05,-0.638475,-0.618052,-0.085068,-0.093079
150,1.4234,1.272427,-0.000428,-0.000449,0.11875,2.1e-05,-0.448766,-0.42784,0.008608,0.002004
200,1.298,1.230989,-0.000403,-0.000422,0.1125,1.9e-05,-0.421806,-0.40281,0.311487,0.306345
250,1.2827,1.208158,-0.000396,-0.000414,0.1125,1.8e-05,-0.413729,-0.395807,0.353597,0.34919
300,1.2101,1.191111,-0.000385,-0.000404,0.11875,1.9e-05,-0.403978,-0.38471,0.355844,0.35138
350,1.2026,1.17897,-0.000378,-0.000398,0.11875,1.9e-05,-0.39769,-0.378284,0.364473,0.36072
400,1.2109,1.172258,-0.000378,-0.000396,0.1125,1.9e-05,-0.396362,-0.377842,0.444954,0.441015
450,1.1927,1.166402,-0.000373,-0.000393,0.11875,1.9e-05,-0.392883,-0.373464,0.470824,0.470183
500,1.2401,1.158866,-0.000369,-0.000388,0.11875,1.9e-05,-0.387928,-0.369104,0.488193,0.486299


Early stopping triggered. No improvement for 3 evaluations.


Eval loss: 1.1447620391845703
Training with params: (16, 8, 0.0001, 0.01, 0.01)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,2.3471,1.622613,-0.00853,-0.008755,0.125,0.000225,-0.87548,-0.852977,0.566656,0.558904
100,1.55,1.432529,-0.006182,-0.006384,0.11875,0.000202,-0.638381,-0.618219,-0.082966,-0.091227
150,1.4305,1.279878,-0.004297,-0.004506,0.11875,0.000208,-0.450584,-0.42974,0.011052,0.003973
200,1.3047,1.237079,-0.004028,-0.004217,0.1125,0.000189,-0.421698,-0.402808,0.314281,0.30874
250,1.2887,1.214609,-0.003958,-0.004135,0.1125,0.000177,-0.413482,-0.395816,0.358616,0.354374
300,1.217,1.197581,-0.003855,-0.004046,0.11875,0.00019,-0.404565,-0.385549,0.351831,0.347855
350,1.2093,1.18543,-0.003786,-0.00398,0.11875,0.000193,-0.397961,-0.378626,0.359735,0.356271
400,1.2173,1.179072,-0.003781,-0.003964,0.1125,0.000183,-0.396361,-0.378105,0.44434,0.440194
450,1.199,1.172488,-0.003734,-0.003927,0.11875,0.000193,-0.392704,-0.373418,0.470825,0.469961
500,1.2466,1.165185,-0.003691,-0.003879,0.11875,0.000187,-0.387861,-0.369126,0.496367,0.494828


Early stopping triggered. No improvement for 3 evaluations.


Eval loss: 1.151390790939331
Training with params: (16, 8, 0.0001, 0.02, 0.001)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,2.3406,1.61647,-0.000853,-0.000876,0.11875,2.2e-05,-0.875637,-0.853208,0.569423,0.561708
100,1.5439,1.426377,-0.000618,-0.000638,0.11875,2e-05,-0.638077,-0.617819,-0.079789,-0.08737
150,1.4248,1.274048,-0.00043,-0.000451,0.11875,2.1e-05,-0.451333,-0.430321,0.024799,0.01825
200,1.2986,1.230601,-0.000402,-0.000421,0.1125,1.9e-05,-0.421093,-0.402251,0.318723,0.314101
250,1.283,1.208256,-0.000396,-0.000414,0.1125,1.8e-05,-0.413594,-0.39553,0.357276,0.352725
300,1.2109,1.19098,-0.000385,-0.000404,0.11875,1.9e-05,-0.403976,-0.384768,0.374276,0.369675
350,1.2024,1.178591,-0.000378,-0.000397,0.11875,2e-05,-0.397399,-0.377846,0.381117,0.377775
400,1.2111,1.172994,-0.000378,-0.000397,0.1125,1.8e-05,-0.396957,-0.378485,0.468111,0.464332
450,1.193,1.166509,-0.000373,-0.000393,0.11875,2e-05,-0.392855,-0.373228,0.488336,0.487138
500,1.2402,1.158778,-0.000369,-0.000388,0.11875,1.9e-05,-0.38806,-0.369166,0.504258,0.502778


Early stopping triggered. No improvement for 3 evaluations.


Eval loss: 1.1449179649353027
Training with params: (16, 8, 0.0001, 0.02, 0.01)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,2.3468,1.622131,-0.008528,-0.008751,0.11875,0.000223,-0.875098,-0.852773,0.573326,0.565242
100,1.5499,1.432781,-0.006183,-0.006386,0.11875,0.000203,-0.638587,-0.618252,-0.084539,-0.092812
150,1.4304,1.279997,-0.004304,-0.004514,0.11875,0.000211,-0.451416,-0.430352,0.008189,0.001404
200,1.3049,1.237461,-0.004032,-0.004222,0.1125,0.00019,-0.422234,-0.403197,0.317791,0.312673
250,1.2891,1.214342,-0.003958,-0.004136,0.1125,0.000178,-0.413613,-0.395763,0.35825,0.353734
300,1.2166,1.196859,-0.003848,-0.004039,0.11875,0.000191,-0.403911,-0.384797,0.355633,0.351272
350,1.2092,1.184986,-0.003779,-0.003972,0.11875,0.000193,-0.397184,-0.377929,0.33876,0.335389
400,1.2171,1.178464,-0.003779,-0.003964,0.1125,0.000185,-0.396419,-0.377912,0.427503,0.423654
450,1.199,1.172486,-0.003731,-0.003925,0.11875,0.000194,-0.392454,-0.373085,0.46262,0.461951
500,1.2464,1.164933,-0.003691,-0.00388,0.11875,0.000189,-0.388007,-0.369106,0.487054,0.485804


Early stopping triggered. No improvement for 3 evaluations.


Eval loss: 1.1511945724487305
Training with params: (16, 8, 0.0001, 0.05, 0.001)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,2.3411,1.616467,-0.000853,-0.000875,0.11875,2.2e-05,-0.875322,-0.852935,0.569385,0.561317
100,1.5442,1.426439,-0.000618,-0.000638,0.11875,2e-05,-0.63831,-0.618139,-0.080729,-0.088924
150,1.424,1.273748,-0.00043,-0.000451,0.11875,2.1e-05,-0.450791,-0.429725,0.000765,-0.005966
200,1.2988,1.231381,-0.000403,-0.000422,0.1125,1.9e-05,-0.422105,-0.403253,0.299765,0.294327
250,1.2809,1.20835,-0.000396,-0.000414,0.1125,1.8e-05,-0.414318,-0.396423,0.396787,0.392941
300,1.2104,1.190997,-0.000385,-0.000404,0.11875,1.9e-05,-0.404172,-0.385246,0.354259,0.349648
350,1.202,1.179227,-0.000379,-0.000398,0.11875,1.9e-05,-0.398016,-0.378576,0.363963,0.360485
400,1.2109,1.172578,-0.000378,-0.000397,0.1125,1.8e-05,-0.39654,-0.378154,0.44606,0.441645
450,1.1926,1.166526,-0.000374,-0.000393,0.11875,1.9e-05,-0.392998,-0.373511,0.463631,0.462021
500,1.2401,1.158966,-0.000369,-0.000388,0.11875,1.9e-05,-0.387956,-0.369051,0.491282,0.489229


Early stopping triggered. No improvement for 3 evaluations.


Eval loss: 1.1453089714050293
Training with params: (16, 8, 0.0001, 0.05, 0.01)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,2.347,1.622044,-0.008527,-0.008752,0.125,0.000224,-0.875181,-0.852737,0.568862,0.560908
100,1.55,1.432255,-0.006181,-0.006383,0.11875,0.000202,-0.638346,-0.618098,-0.083115,-0.091132
150,1.4301,1.27974,-0.004295,-0.004506,0.11875,0.000211,-0.45063,-0.429498,0.005692,-0.000839
200,1.3047,1.237271,-0.004028,-0.004218,0.1125,0.00019,-0.421766,-0.402813,0.307344,0.30186
250,1.2892,1.214618,-0.003959,-0.004138,0.1125,0.000179,-0.413806,-0.395934,0.367109,0.362723
300,1.2161,1.197121,-0.003849,-0.004041,0.11875,0.000192,-0.404127,-0.384911,0.362621,0.358526
350,1.209,1.184853,-0.003779,-0.003974,0.11875,0.000195,-0.397409,-0.377903,0.370439,0.367058
400,1.2178,1.178652,-0.003775,-0.003959,0.1125,0.000184,-0.395927,-0.377479,0.448917,0.445167
450,1.1991,1.172528,-0.003733,-0.003928,0.11875,0.000194,-0.392777,-0.373344,0.474437,0.473553
500,1.2466,1.164873,-0.003689,-0.003878,0.11875,0.000189,-0.387795,-0.368866,0.486306,0.485091


Early stopping triggered. No improvement for 3 evaluations.


Eval loss: 1.1512222290039062
Training with params: (16, 8, 0.0002, 0.01, 0.001)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,2.0803,1.45435,-0.000644,-0.000665,0.11875,2.1e-05,-0.664915,-0.64387,-0.038596,-0.04764
100,1.4133,1.252222,-0.000406,-0.000426,0.1125,2e-05,-0.425816,-0.405928,0.163028,0.156655
150,1.2843,1.214854,-0.000396,-0.000416,0.11875,1.9e-05,-0.41561,-0.396395,0.40797,0.403573
200,1.2406,1.192615,-0.000388,-0.000405,0.10625,1.8e-05,-0.405437,-0.387845,0.50189,0.50112
250,1.2375,1.173338,-0.000379,-0.000396,0.1125,1.7e-05,-0.396077,-0.37862,0.573546,0.572608
300,1.1817,1.162448,-0.000372,-0.00039,0.1125,1.8e-05,-0.389557,-0.37154,0.490322,0.490014
350,1.1741,1.154189,-0.000369,-0.000389,0.11875,2e-05,-0.3888,-0.369107,0.569741,0.568159
400,1.1858,1.149127,-0.000367,-0.000385,0.11875,1.8e-05,-0.385038,-0.366792,0.490814,0.487082
450,1.17,1.146218,-0.000364,-0.000383,0.11875,1.9e-05,-0.383013,-0.364395,0.526283,0.525594
500,1.2184,1.138049,-0.000359,-0.000377,0.11875,1.8e-05,-0.377106,-0.359463,0.544848,0.542817


Early stopping triggered. No improvement for 3 evaluations.


Eval loss: 1.1312202215194702
Training with params: (16, 8, 0.0002, 0.01, 0.01)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,2.0872,1.460803,-0.00645,-0.006659,0.11875,0.00021,-0.665943,-0.644956,-0.030269,-0.039421
100,1.419,1.257933,-0.004065,-0.004265,0.1125,0.0002,-0.426535,-0.406539,0.172264,0.165994
150,1.2902,1.219426,-0.003946,-0.00414,0.11875,0.000194,-0.413974,-0.394566,0.412163,0.408234
200,1.246,1.198094,-0.003871,-0.004049,0.10625,0.000177,-0.404874,-0.387133,0.490746,0.489994
250,1.2429,1.178915,-0.003783,-0.003958,0.1125,0.000175,-0.395783,-0.378318,0.567585,0.566358
300,1.1875,1.168436,-0.003715,-0.003896,0.1125,0.000181,-0.389611,-0.371476,0.497346,0.497818
350,1.1806,1.161489,-0.003692,-0.003886,0.11875,0.000193,-0.388565,-0.36925,0.590884,0.590758
400,1.1929,1.155913,-0.003667,-0.003847,0.11875,0.00018,-0.384731,-0.366719,0.505776,0.502567
450,1.1766,1.153607,-0.003647,-0.003833,0.11875,0.000186,-0.383324,-0.364678,0.515685,0.515904
500,1.225,1.145582,-0.003603,-0.003779,0.11875,0.000176,-0.377907,-0.360316,0.541132,0.539843


Early stopping triggered. No improvement for 3 evaluations.


Eval loss: 1.1387066841125488
Training with params: (16, 8, 0.0002, 0.02, 0.001)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,2.0809,1.455,-0.000643,-0.000664,0.11875,2.1e-05,-0.6637,-0.642531,-0.057273,-0.066636
100,1.4141,1.252288,-0.000406,-0.000426,0.1125,2e-05,-0.426038,-0.40582,0.160608,0.15431
150,1.2841,1.214192,-0.000395,-0.000415,0.11875,1.9e-05,-0.414677,-0.3952,0.394336,0.389929
200,1.2417,1.191882,-0.000387,-0.000405,0.1125,1.8e-05,-0.405,-0.387357,0.502233,0.501894
250,1.2365,1.171877,-0.000378,-0.000395,0.10625,1.7e-05,-0.395424,-0.378024,0.58483,0.584676
300,1.1815,1.162174,-0.000372,-0.00039,0.1125,1.8e-05,-0.389681,-0.37158,0.507455,0.507232
350,1.1745,1.15529,-0.000369,-0.000389,0.11875,2e-05,-0.388812,-0.369072,0.606432,0.606552
400,1.1867,1.149877,-0.000367,-0.000385,0.11875,1.8e-05,-0.385229,-0.366836,0.518683,0.515144
450,1.1702,1.147315,-0.000364,-0.000383,0.11875,1.9e-05,-0.382998,-0.36446,0.538422,0.539054
500,1.2185,1.138896,-0.00036,-0.000377,0.11875,1.8e-05,-0.377311,-0.35964,0.557639,0.55679


Early stopping triggered. No improvement for 3 evaluations.


Eval loss: 1.1319576501846313
Training with params: (16, 8, 0.0002, 0.02, 0.01)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,2.0874,1.460456,-0.006437,-0.006649,0.11875,0.000211,-0.664878,-0.643747,-0.041989,-0.051019
100,1.4203,1.258749,-0.004057,-0.004258,0.1125,0.000201,-0.425817,-0.40571,0.163048,0.156844
150,1.2903,1.220295,-0.003954,-0.00415,0.11875,0.000196,-0.414985,-0.395399,0.391584,0.386959
200,1.2475,1.197637,-0.003876,-0.004053,0.10625,0.000177,-0.405266,-0.387594,0.495631,0.494473
250,1.2428,1.176717,-0.003778,-0.003951,0.1125,0.000173,-0.39508,-0.377783,0.589714,0.589551
300,1.1867,1.166326,-0.003709,-0.00389,0.1125,0.000181,-0.389038,-0.370941,0.520111,0.520526
350,1.1803,1.159969,-0.003687,-0.003881,0.11875,0.000194,-0.388069,-0.368711,0.611388,0.610943
400,1.1918,1.155675,-0.003675,-0.003856,0.11875,0.000181,-0.385591,-0.367472,0.53265,0.529932
450,1.1758,1.153084,-0.003649,-0.003835,0.11875,0.000187,-0.383547,-0.364883,0.556275,0.555943
500,1.225,1.144099,-0.003597,-0.003774,0.11875,0.000177,-0.377386,-0.359658,0.585168,0.583579


Early stopping triggered. No improvement for 3 evaluations.


Eval loss: 1.1370360851287842
Training with params: (16, 8, 0.0002, 0.05, 0.001)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,2.0809,1.454245,-0.000645,-0.000666,0.11875,2.1e-05,-0.665822,-0.644745,-0.030656,-0.039449
100,1.4122,1.251546,-0.000406,-0.000426,0.1125,2e-05,-0.426074,-0.406126,0.166788,0.160473
150,1.2845,1.213924,-0.000395,-0.000414,0.11875,1.9e-05,-0.41444,-0.394971,0.398986,0.394811
200,1.2409,1.192115,-0.000387,-0.000404,0.10625,1.8e-05,-0.404315,-0.386734,0.506225,0.505615
250,1.2368,1.17206,-0.000378,-0.000395,0.1125,1.7e-05,-0.395449,-0.377993,0.585058,0.585068
300,1.1814,1.162084,-0.000372,-0.00039,0.11875,1.8e-05,-0.389814,-0.371599,0.491838,0.492081
350,1.1744,1.155381,-0.000369,-0.000389,0.11875,2e-05,-0.388947,-0.36923,0.575238,0.573956
400,1.1869,1.150124,-0.000367,-0.000385,0.11875,1.8e-05,-0.385119,-0.366909,0.503204,0.500047
450,1.1702,1.147099,-0.000364,-0.000382,0.11875,1.8e-05,-0.382329,-0.364002,0.542359,0.542903
500,1.2186,1.138905,-0.00036,-0.000377,0.11875,1.8e-05,-0.377282,-0.359582,0.559974,0.559365


Early stopping triggered. No improvement for 3 evaluations.


Eval loss: 1.132932186126709
Training with params: (16, 8, 0.0002, 0.05, 0.01)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,2.0872,1.460681,-0.006432,-0.006643,0.11875,0.000211,-0.66432,-0.643212,-0.049888,-0.05917
100,1.42,1.258545,-0.00406,-0.00426,0.1125,0.0002,-0.426009,-0.405978,0.164354,0.158406
150,1.2904,1.221268,-0.003956,-0.00415,0.11875,0.000194,-0.415015,-0.395643,0.392094,0.387454
200,1.2477,1.198108,-0.003875,-0.004051,0.10625,0.000176,-0.405098,-0.387525,0.509612,0.509537
250,1.2433,1.17805,-0.003782,-0.003955,0.1125,0.000173,-0.395497,-0.378188,0.591361,0.591181
300,1.1874,1.166715,-0.003714,-0.003895,0.1125,0.000181,-0.389509,-0.371408,0.51259,0.51225
350,1.1806,1.159764,-0.003688,-0.00388,0.11875,0.000193,-0.38803,-0.368768,0.604431,0.603385
400,1.1922,1.155215,-0.003674,-0.003854,0.11875,0.00018,-0.385423,-0.367406,0.545921,0.543444
450,1.1767,1.152232,-0.003648,-0.003834,0.11875,0.000186,-0.383411,-0.364805,0.577871,0.577555
500,1.2252,1.143964,-0.003599,-0.003775,0.11875,0.000176,-0.377549,-0.359907,0.574652,0.573035


Early stopping triggered. No improvement for 3 evaluations.


Eval loss: 1.1373647451400757
Training with params: (16, 8, 0.0003, 0.01, 0.001)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,1.9576,1.405251,-0.000594,-0.000614,0.11875,2e-05,-0.613956,-0.593597,-0.067192,-0.074678
100,1.3311,1.225794,-0.00039,-0.000409,0.1125,1.9e-05,-0.409398,-0.389973,0.344357,0.340156
150,1.2611,1.192457,-0.00039,-0.000409,0.11875,1.9e-05,-0.408774,-0.389555,0.537466,0.534859
200,1.2201,1.173675,-0.000375,-0.000392,0.10625,1.7e-05,-0.392382,-0.375027,0.616264,0.615798
250,1.2193,1.156277,-0.000368,-0.000385,0.10625,1.7e-05,-0.385166,-0.368397,0.704871,0.706573
300,1.1689,1.14553,-0.000364,-0.000382,0.1125,1.7e-05,-0.381552,-0.364261,0.537384,0.538746
350,1.1624,1.140905,-0.000363,-0.000381,0.11875,1.9e-05,-0.381306,-0.362673,0.70944,0.708946
400,1.1742,1.13755,-0.000363,-0.000381,0.1125,1.8e-05,-0.381267,-0.363133,0.624588,0.623895
450,1.1604,1.134871,-0.00036,-0.000377,0.1125,1.7e-05,-0.376624,-0.359578,0.591264,0.590261
500,1.2081,1.126485,-0.000352,-0.00037,0.11875,1.8e-05,-0.36968,-0.35208,0.624366,0.622108


Early stopping triggered. No improvement for 3 evaluations.


Eval loss: 1.1158934831619263
Training with params: (16, 8, 0.0003, 0.01, 0.01)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,1.9641,1.411268,-0.00594,-0.006144,0.11875,0.000205,-0.614424,-0.593962,-0.061725,-0.069278
100,1.3389,1.233654,-0.003907,-0.004105,0.11875,0.000197,-0.410488,-0.390744,0.324647,0.320329
150,1.2685,1.198552,-0.003875,-0.004068,0.11875,0.000193,-0.4068,-0.387478,0.531228,0.528981
200,1.2276,1.17958,-0.003753,-0.003926,0.10625,0.000173,-0.392627,-0.375296,0.563126,0.561926
250,1.226,1.162046,-0.003681,-0.00385,0.10625,0.000169,-0.385034,-0.368084,0.692941,0.693952
300,1.1769,1.151493,-0.003635,-0.003809,0.1125,0.000174,-0.380879,-0.363526,0.462719,0.462811
350,1.1693,1.147148,-0.003624,-0.003811,0.11875,0.000188,-0.381139,-0.362351,0.682985,0.682613
400,1.1809,1.144909,-0.003635,-0.003817,0.1125,0.000182,-0.38165,-0.363481,0.606934,0.606368
450,1.1673,1.140914,-0.003591,-0.003766,0.1125,0.000175,-0.376599,-0.359056,0.580795,0.57914
500,1.2155,1.132975,-0.003522,-0.003695,0.11875,0.000172,-0.369475,-0.352243,0.57927,0.577014


Early stopping triggered. No improvement for 3 evaluations.


Eval loss: 1.122226595878601
Training with params: (16, 8, 0.0003, 0.02, 0.001)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,1.9574,1.404983,-0.000594,-0.000614,0.11875,2e-05,-0.614013,-0.593702,-0.056648,-0.063877
100,1.3319,1.226542,-0.00039,-0.00041,0.1125,1.9e-05,-0.409671,-0.390292,0.351671,0.348436
150,1.2618,1.192155,-0.000388,-0.000407,0.11875,1.9e-05,-0.407077,-0.387907,0.533792,0.531221
200,1.2206,1.176416,-0.000377,-0.000394,0.10625,1.7e-05,-0.394357,-0.377019,0.632734,0.632864
250,1.2207,1.155174,-0.000367,-0.000384,0.10625,1.7e-05,-0.384131,-0.367327,0.701358,0.702263
300,1.1689,1.146052,-0.000365,-0.000382,0.1125,1.7e-05,-0.38202,-0.364586,0.539976,0.541513
350,1.1625,1.141273,-0.000363,-0.000382,0.11875,1.9e-05,-0.38171,-0.363076,0.723729,0.723808
400,1.1745,1.138079,-0.000363,-0.000381,0.1125,1.8e-05,-0.381463,-0.363495,0.642506,0.641395
450,1.1607,1.133905,-0.000359,-0.000376,0.11875,1.7e-05,-0.376151,-0.358683,0.613276,0.612251
500,1.2083,1.126026,-0.000352,-0.00037,0.10625,1.7e-05,-0.369606,-0.352145,0.616293,0.613435


Early stopping triggered. No improvement for 3 evaluations.


Eval loss: 1.1208535432815552
Training with params: (16, 8, 0.0003, 0.02, 0.01)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,1.964,1.41088,-0.005937,-0.006141,0.11875,0.000204,-0.614076,-0.593654,-0.06772,-0.075125
100,1.3386,1.233883,-0.003919,-0.004117,0.1125,0.000197,-0.41165,-0.391927,0.313301,0.308883
150,1.2683,1.199715,-0.003894,-0.004087,0.11875,0.000192,-0.408682,-0.389439,0.54912,0.546738
200,1.2272,1.181745,-0.003765,-0.003939,0.10625,0.000174,-0.393858,-0.376461,0.619001,0.61838
250,1.2262,1.162463,-0.003673,-0.00384,0.10625,0.000167,-0.383981,-0.367315,0.7123,0.716008
300,1.1756,1.152984,-0.003652,-0.003827,0.1125,0.000175,-0.3827,-0.365227,0.499828,0.498861
350,1.1695,1.148764,-0.003641,-0.003824,0.11875,0.000183,-0.382403,-0.364077,0.711747,0.710139
400,1.1806,1.145053,-0.003641,-0.003821,0.1125,0.00018,-0.382071,-0.364063,0.61586,0.613794
450,1.1673,1.142439,-0.003605,-0.003782,0.11875,0.000177,-0.378173,-0.360517,0.590155,0.587943
500,1.215,1.133778,-0.003527,-0.003701,0.11875,0.000174,-0.370115,-0.352711,0.602484,0.599573


Early stopping triggered. No improvement for 3 evaluations.


Eval loss: 1.1284594535827637
Training with params: (16, 8, 0.0003, 0.05, 0.001)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,1.9577,1.40532,-0.000594,-0.000615,0.11875,2e-05,-0.614787,-0.594392,-0.048122,-0.055702
100,1.3331,1.225551,-0.000391,-0.00041,0.1125,2e-05,-0.410486,-0.390918,0.332391,0.328431
150,1.262,1.192719,-0.00039,-0.000409,0.11875,1.9e-05,-0.408989,-0.389977,0.559954,0.557881
200,1.2206,1.175042,-0.000376,-0.000394,0.10625,1.7e-05,-0.39363,-0.376284,0.627709,0.628965
250,1.2317,1.155842,-0.000367,-0.000384,0.1125,1.7e-05,-0.383975,-0.367246,0.695487,0.695087
300,1.1697,1.146923,-0.000365,-0.000383,0.1125,1.7e-05,-0.38256,-0.365433,0.511769,0.51128
350,1.1634,1.141227,-0.000362,-0.000381,0.11875,1.8e-05,-0.380683,-0.362475,0.679054,0.677199
400,1.1745,1.137031,-0.000362,-0.00038,0.1125,1.8e-05,-0.380366,-0.362149,0.581431,0.577708
450,1.1614,1.134996,-0.00036,-0.000378,0.11875,1.8e-05,-0.377963,-0.360275,0.572504,0.570102
500,1.2093,1.126215,-0.000352,-0.00037,0.11875,1.8e-05,-0.369639,-0.35198,0.580019,0.577229


Early stopping triggered. No improvement for 3 evaluations.


Eval loss: 1.121372938156128
Training with params: (16, 8, 0.0003, 0.05, 0.01)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,1.9641,1.411008,-0.005933,-0.006137,0.11875,0.000204,-0.613708,-0.593348,-0.067577,-0.074966
100,1.3373,1.231562,-0.003904,-0.004098,0.1125,0.000194,-0.409806,-0.390435,0.344802,0.340805
150,1.2666,1.198773,-0.003903,-0.004095,0.11875,0.000192,-0.409541,-0.390293,0.549616,0.547371
200,1.2311,1.182336,-0.003751,-0.003921,0.10625,0.00017,-0.392101,-0.375121,0.567637,0.567643
250,1.2262,1.162728,-0.003686,-0.003852,0.1125,0.000166,-0.385179,-0.368581,0.66833,0.667562
300,1.1754,1.153148,-0.003647,-0.003819,0.1125,0.000173,-0.381946,-0.364658,0.551093,0.552228
350,1.1693,1.146921,-0.003627,-0.003814,0.11875,0.000187,-0.381431,-0.362691,0.739006,0.739112
400,1.1809,1.144642,-0.003635,-0.003813,0.1125,0.000178,-0.38128,-0.363517,0.676705,0.675842
450,1.1677,1.140494,-0.003593,-0.003765,0.11875,0.000172,-0.376474,-0.359279,0.607327,0.607179
500,1.2151,1.132241,-0.00352,-0.003693,0.11875,0.000173,-0.369299,-0.351994,0.609769,0.607985


Early stopping triggered. No improvement for 3 evaluations.


Eval loss: 1.1274555921554565
Training with params: (16, 16, 0.0001, 0.01, 0.001)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,2.1546,1.492983,-0.000659,-0.000681,0.11875,2.2e-05,-0.681252,-0.659352,-0.309896,-0.319294
100,1.4738,1.330751,-0.000508,-0.000528,0.11875,2e-05,-0.527682,-0.507513,-0.03535,-0.042874
150,1.3093,1.228641,-0.000398,-0.000419,0.11875,2e-05,-0.418531,-0.398345,0.20678,0.20138
200,1.2594,1.204612,-0.000394,-0.000412,0.1125,1.8e-05,-0.411641,-0.393572,0.472816,0.468877
250,1.2537,1.183921,-0.000384,-0.000402,0.1125,1.7e-05,-0.401528,-0.384106,0.465045,0.460527
300,1.19,1.171176,-0.000376,-0.000395,0.11875,1.9e-05,-0.394543,-0.375941,0.49932,0.495997
350,1.1844,1.162104,-0.000371,-0.000391,0.11875,1.9e-05,-0.390575,-0.371381,0.510839,0.506856
400,1.1945,1.157773,-0.000371,-0.000389,0.11875,1.8e-05,-0.389168,-0.37087,0.534251,0.529586
450,1.1782,1.152623,-0.000367,-0.000386,0.11875,1.9e-05,-0.38625,-0.367436,0.579925,0.577563
500,1.2268,1.146206,-0.000364,-0.000382,0.11875,1.8e-05,-0.382234,-0.36411,0.59195,0.588437


Early stopping triggered. No improvement for 3 evaluations.
Eval loss: 1.1386103630065918
Training with params: (16, 16, 0.0001, 0.01, 0.01)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,2.1611,1.499275,-0.00659,-0.006809,0.11875,0.000219,-0.680942,-0.658994,-0.316624,-0.32604
100,1.4804,1.339041,-0.005094,-0.005295,0.11875,0.000201,-0.529523,-0.50941,-0.035495,-0.043181
150,1.3161,1.234803,-0.00398,-0.00418,0.11875,0.0002,-0.418049,-0.39801,0.213094,0.207439
200,1.2658,1.210544,-0.003947,-0.004128,0.1125,0.000181,-0.412783,-0.394715,0.471043,0.467823
250,1.2576,1.19002,-0.003841,-0.004014,0.1125,0.000173,-0.401385,-0.384129,0.502565,0.498871
300,1.1962,1.176387,-0.003761,-0.003947,0.11875,0.000185,-0.394654,-0.376113,0.525752,0.523439
350,1.1897,1.16807,-0.003712,-0.003902,0.11875,0.00019,-0.390198,-0.371182,0.560278,0.55743
400,1.2003,1.163758,-0.003703,-0.003885,0.11875,0.000183,-0.388546,-0.370296,0.546304,0.543044
450,1.1837,1.159502,-0.003677,-0.003866,0.11875,0.000189,-0.386554,-0.367677,0.5851,0.58434
500,1.2337,1.152209,-0.003639,-0.003819,0.11875,0.000179,-0.381874,-0.363943,0.596208,0.594189


Early stopping triggered. No improvement for 3 evaluations.
Eval loss: 1.144408941268921
Training with params: (16, 16, 0.0001, 0.02, 0.001)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,2.1547,1.492812,-0.000659,-0.000681,0.11875,2.2e-05,-0.680651,-0.658805,-0.313232,-0.322678
100,1.4738,1.331449,-0.000509,-0.000529,0.11875,2e-05,-0.52912,-0.509198,-0.036472,-0.043693
150,1.3099,1.228531,-0.000398,-0.000418,0.11875,2e-05,-0.41833,-0.398107,0.210828,0.205657
200,1.2596,1.204876,-0.000395,-0.000413,0.1125,1.8e-05,-0.412752,-0.394722,0.457343,0.454416
250,1.2514,1.183414,-0.000384,-0.000401,0.1125,1.7e-05,-0.401357,-0.383922,0.48476,0.481259
300,1.1898,1.170656,-0.000376,-0.000395,0.11875,1.8e-05,-0.394841,-0.376416,0.515076,0.512534
350,1.1834,1.162248,-0.000371,-0.00039,0.11875,1.9e-05,-0.390245,-0.371246,0.553538,0.550949
400,1.1941,1.157946,-0.000371,-0.000389,0.11875,1.8e-05,-0.389201,-0.370854,0.547265,0.54414
450,1.1775,1.153121,-0.000367,-0.000386,0.11875,1.9e-05,-0.385991,-0.367342,0.580537,0.579955
500,1.2269,1.145885,-0.000364,-0.000382,0.11875,1.8e-05,-0.381637,-0.363556,0.595316,0.593414


Early stopping triggered. No improvement for 3 evaluations.
Eval loss: 1.1374891996383667
Training with params: (16, 16, 0.0001, 0.02, 0.01)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,2.1616,1.499266,-0.006597,-0.006814,0.11875,0.000217,-0.681431,-0.659692,-0.301802,-0.311283
100,1.4818,1.341558,-0.005137,-0.005339,0.11875,0.000201,-0.533857,-0.513719,-0.035916,-0.042756
150,1.3178,1.235209,-0.003983,-0.004183,0.11875,0.0002,-0.418305,-0.398349,0.205444,0.200014
200,1.266,1.21119,-0.003955,-0.004136,0.1125,0.000181,-0.413604,-0.395507,0.47871,0.475302
250,1.2582,1.189923,-0.003844,-0.004016,0.1125,0.000172,-0.40157,-0.384354,0.489285,0.485262
300,1.196,1.177088,-0.003764,-0.003948,0.11875,0.000184,-0.39479,-0.376428,0.514321,0.511564
350,1.19,1.168355,-0.003712,-0.003901,0.11875,0.000189,-0.390132,-0.371243,0.550809,0.547753
400,1.2005,1.164026,-0.003708,-0.003888,0.11875,0.00018,-0.388831,-0.370783,0.553488,0.550806
450,1.1839,1.159492,-0.003678,-0.003865,0.11875,0.000187,-0.386461,-0.367789,0.589599,0.588869
500,1.2333,1.152015,-0.003637,-0.003815,0.11875,0.000178,-0.381455,-0.36365,0.605035,0.603166


Early stopping triggered. No improvement for 3 evaluations.
Eval loss: 1.1438453197479248
Training with params: (16, 16, 0.0001, 0.05, 0.001)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,2.1548,1.492947,-0.00066,-0.000682,0.11875,2.2e-05,-0.682094,-0.660197,-0.30021,-0.310039
100,1.4769,1.338316,-0.000517,-0.000537,0.11875,2e-05,-0.536614,-0.516506,-0.029585,-0.036886
150,1.3125,1.229075,-0.000399,-0.000419,0.11875,2e-05,-0.41895,-0.398933,0.194134,0.188967
200,1.2599,1.205419,-0.000395,-0.000413,0.1125,1.8e-05,-0.412957,-0.39491,0.473925,0.470328
250,1.2543,1.18441,-0.000385,-0.000402,0.1125,1.7e-05,-0.401969,-0.384579,0.45178,0.448039
300,1.1901,1.171132,-0.000376,-0.000395,0.11875,1.9e-05,-0.395024,-0.376386,0.487681,0.484662
350,1.1852,1.162486,-0.000372,-0.000391,0.11875,1.9e-05,-0.390905,-0.37163,0.500625,0.496235
400,1.195,1.157742,-0.000371,-0.000389,0.11875,1.8e-05,-0.389219,-0.370901,0.5222,0.517155
450,1.1788,1.152862,-0.000367,-0.000386,0.11875,1.9e-05,-0.386398,-0.367392,0.563703,0.561435
500,1.2272,1.145991,-0.000364,-0.000382,0.11875,1.8e-05,-0.38195,-0.363688,0.586833,0.583672


Early stopping triggered. No improvement for 3 evaluations.
Eval loss: 1.138069748878479
Training with params: (16, 16, 0.0001, 0.05, 0.01)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,2.1615,1.498909,-0.006587,-0.006805,0.11875,0.000218,-0.680459,-0.658693,-0.315081,-0.324479
100,1.4805,1.338776,-0.005098,-0.0053,0.11875,0.000201,-0.529951,-0.509823,-0.034806,-0.04209
150,1.3169,1.234881,-0.003984,-0.004184,0.11875,0.0002,-0.418432,-0.39841,0.207447,0.201734
200,1.2663,1.211086,-0.003946,-0.004125,0.1125,0.000179,-0.412492,-0.394611,0.472356,0.468429
250,1.2604,1.189988,-0.003842,-0.004017,0.1125,0.000174,-0.401663,-0.384228,0.465106,0.460572
300,1.1962,1.177122,-0.003766,-0.003951,0.11875,0.000185,-0.39511,-0.376598,0.512613,0.509407
350,1.1901,1.168143,-0.003712,-0.003903,0.11875,0.00019,-0.390271,-0.371248,0.542902,0.539474
400,1.2007,1.164037,-0.003708,-0.003889,0.1125,0.000181,-0.388902,-0.370777,0.529731,0.525936
450,1.184,1.159405,-0.003676,-0.003865,0.11875,0.000189,-0.386549,-0.367602,0.571833,0.570156
500,1.2335,1.152466,-0.003642,-0.003821,0.11875,0.000179,-0.382091,-0.364235,0.595647,0.591942


Early stopping triggered. No improvement for 3 evaluations.
Eval loss: 1.1447515487670898
Training with params: (16, 16, 0.0002, 0.01, 0.001)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,1.9439,1.404279,-0.000596,-0.000616,0.11875,2e-05,-0.615943,-0.595536,0.034494,0.027082
100,1.3311,1.224728,-0.00039,-0.00041,0.1125,1.9e-05,-0.409598,-0.390188,0.361708,0.356677
150,1.2606,1.19224,-0.000386,-0.000406,0.11875,1.9e-05,-0.405816,-0.386441,0.524857,0.521564
200,1.2202,1.171872,-0.000374,-0.000391,0.10625,1.7e-05,-0.391099,-0.374071,0.538818,0.536663
250,1.2187,1.154715,-0.000367,-0.000384,0.10625,1.7e-05,-0.384088,-0.367211,0.66519,0.663825
300,1.1694,1.146139,-0.000364,-0.000381,0.1125,1.8e-05,-0.381258,-0.363522,0.525862,0.525067
350,1.1622,1.141791,-0.000364,-0.000384,0.11875,2e-05,-0.384267,-0.364432,0.724274,0.723629
400,1.1734,1.137538,-0.000362,-0.00038,0.11875,1.8e-05,-0.380202,-0.361861,0.633997,0.630896
450,1.1594,1.13348,-0.000358,-0.000376,0.11875,1.7e-05,-0.375882,-0.358395,0.578325,0.576981
500,1.2076,1.126013,-0.000352,-0.000369,0.1125,1.8e-05,-0.369489,-0.351987,0.603693,0.600712


Early stopping triggered. No improvement for 3 evaluations.
Eval loss: 1.1203526258468628
Training with params: (16, 16, 0.0002, 0.01, 0.01)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,1.9498,1.408222,-0.005925,-0.00613,0.11875,0.000204,-0.612961,-0.592516,-0.003752,-0.011737
100,1.3346,1.229231,-0.003905,-0.004096,0.11875,0.000191,-0.409588,-0.390469,0.358906,0.3544
150,1.2658,1.19854,-0.003892,-0.004081,0.11875,0.000189,-0.408108,-0.389189,0.540647,0.537043
200,1.2256,1.177834,-0.003734,-0.003906,0.10625,0.000172,-0.390581,-0.373411,0.586017,0.584484
250,1.2243,1.161008,-0.003673,-0.003842,0.1125,0.000168,-0.384188,-0.367339,0.704301,0.705249
300,1.1744,1.151578,-0.003632,-0.003809,0.1125,0.000176,-0.380888,-0.363249,0.558139,0.559506
350,1.1677,1.146743,-0.003624,-0.003815,0.11875,0.00019,-0.381459,-0.362424,0.740997,0.740909
400,1.1787,1.143184,-0.003617,-0.003793,0.11875,0.000176,-0.379306,-0.361705,0.672784,0.671828
450,1.1648,1.140301,-0.003592,-0.003763,0.11875,0.000171,-0.37631,-0.359224,0.625029,0.62481
500,1.2135,1.132484,-0.003524,-0.003696,0.1125,0.000172,-0.369572,-0.352376,0.629921,0.628978


Early stopping triggered. No improvement for 3 evaluations.
Eval loss: 1.125895380973816
Training with params: (16, 16, 0.0002, 0.02, 0.001)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,1.9437,1.402169,-0.000591,-0.000611,0.11875,2e-05,-0.611472,-0.591164,-0.025327,-0.032687
100,1.3277,1.223024,-0.00039,-0.000409,0.1125,1.9e-05,-0.408823,-0.389818,0.359243,0.35475
150,1.2592,1.190757,-0.000387,-0.000406,0.11875,1.9e-05,-0.406005,-0.387128,0.542035,0.538855
200,1.2188,1.171189,-0.000373,-0.00039,0.10625,1.7e-05,-0.389887,-0.372729,0.597799,0.595537
250,1.2181,1.15367,-0.000366,-0.000383,0.10625,1.7e-05,-0.382713,-0.36612,0.701476,0.703176
300,1.1684,1.145122,-0.000363,-0.00038,0.1125,1.7e-05,-0.379969,-0.362542,0.527623,0.527348
350,1.1621,1.140405,-0.000362,-0.00038,0.11875,1.9e-05,-0.380312,-0.361635,0.723486,0.722887
400,1.1726,1.136716,-0.000361,-0.000379,0.1125,1.8e-05,-0.379068,-0.361374,0.638068,0.637466
450,1.1599,1.132574,-0.000357,-0.000375,0.1125,1.7e-05,-0.374759,-0.357444,0.599427,0.59852
500,1.2071,1.127179,-0.000352,-0.00037,0.1125,1.7e-05,-0.369765,-0.352362,0.612427,0.61019


Early stopping triggered. No improvement for 3 evaluations.
Eval loss: 1.1201258897781372
Training with params: (16, 16, 0.0002, 0.02, 0.01)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,1.9499,1.410405,-0.005955,-0.00616,0.11875,0.000205,-0.615969,-0.595496,0.030842,0.023365
100,1.3377,1.230974,-0.003903,-0.004098,0.1125,0.000195,-0.40975,-0.390294,0.353361,0.347968
150,1.267,1.198413,-0.003867,-0.004061,0.11875,0.000193,-0.406064,-0.386744,0.533706,0.530819
200,1.2269,1.17804,-0.003742,-0.003913,0.10625,0.000171,-0.391265,-0.374155,0.53175,0.530127
250,1.2247,1.16113,-0.003664,-0.003832,0.10625,0.000168,-0.383186,-0.366357,0.658143,0.659073
300,1.1764,1.152455,-0.003636,-0.003815,0.1125,0.000179,-0.381539,-0.363641,0.498422,0.494723
350,1.1686,1.146922,-0.003627,-0.003823,0.11875,0.000197,-0.382331,-0.362659,0.721895,0.721123
400,1.1802,1.142827,-0.00361,-0.003788,0.1125,0.000178,-0.378828,-0.361034,0.651329,0.64821
450,1.1655,1.139684,-0.003592,-0.003767,0.11875,0.000175,-0.376694,-0.359156,0.615404,0.61414
500,1.214,1.131865,-0.003524,-0.003699,0.11875,0.000175,-0.369908,-0.352412,0.624447,0.622303


Early stopping triggered. No improvement for 3 evaluations.
Eval loss: 1.126202940940857
Training with params: (16, 16, 0.0002, 0.05, 0.001)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,1.9439,1.404899,-0.000597,-0.000618,0.11875,2e-05,-0.617829,-0.597443,0.040833,0.033302
100,1.3319,1.224295,-0.00039,-0.000409,0.1125,1.9e-05,-0.409251,-0.389834,0.353716,0.348086
150,1.2605,1.193334,-0.000387,-0.000406,0.11875,1.9e-05,-0.406466,-0.387262,0.534009,0.531666
200,1.2214,1.171403,-0.000373,-0.00039,0.10625,1.7e-05,-0.389905,-0.372716,0.506234,0.505581
250,1.2193,1.155012,-0.000367,-0.000384,0.10625,1.7e-05,-0.383894,-0.36703,0.673682,0.673301
300,1.1697,1.146344,-0.000364,-0.000381,0.1125,1.8e-05,-0.381355,-0.363778,0.516457,0.516282
350,1.1621,1.142227,-0.000364,-0.000384,0.11875,2e-05,-0.383865,-0.364352,0.72783,0.727729
400,1.174,1.137509,-0.000362,-0.00038,0.11875,1.8e-05,-0.37964,-0.362041,0.651402,0.648004
450,1.1599,1.134633,-0.000359,-0.000377,0.11875,1.8e-05,-0.376922,-0.359334,0.613013,0.611722
500,1.2085,1.126446,-0.000353,-0.00037,0.11875,1.7e-05,-0.370014,-0.352602,0.619001,0.617384


Early stopping triggered. No improvement for 3 evaluations.
Eval loss: 1.1205928325653076
Training with params: (16, 16, 0.0002, 0.05, 0.01)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,1.9503,1.410506,-0.005959,-0.006163,0.11875,0.000204,-0.616324,-0.595942,0.024548,0.017467
100,1.3373,1.23031,-0.003896,-0.00409,0.1125,0.000194,-0.409011,-0.38957,0.349496,0.344094
150,1.2667,1.1982,-0.003866,-0.00406,0.11875,0.000195,-0.40603,-0.386574,0.525566,0.522787
200,1.2269,1.177405,-0.003731,-0.003904,0.10625,0.000173,-0.390433,-0.373148,0.540758,0.539291
250,1.2248,1.161739,-0.003678,-0.003846,0.10625,0.000168,-0.384596,-0.367842,0.677471,0.675905
300,1.1755,1.151464,-0.003627,-0.003803,0.1125,0.000176,-0.380332,-0.362698,0.518588,0.518095
350,1.1687,1.148215,-0.00364,-0.003835,0.11875,0.000196,-0.383542,-0.36399,0.698586,0.697778
400,1.1798,1.144233,-0.003627,-0.00381,0.1125,0.000183,-0.381022,-0.362711,0.634485,0.632058
450,1.1653,1.140342,-0.003587,-0.003764,0.11875,0.000177,-0.3764,-0.358711,0.579897,0.578425
500,1.2144,1.132977,-0.003529,-0.003702,0.1125,0.000173,-0.370203,-0.352859,0.60674,0.603824


Early stopping triggered. No improvement for 3 evaluations.
Eval loss: 1.1216886043548584
Training with params: (16, 16, 0.0003, 0.01, 0.001)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,1.8469,1.283739,-0.000438,-0.00046,0.11875,2.1e-05,-0.45984,-0.438399,-0.081709,-0.08874
100,1.2829,1.205732,-0.00039,-0.000407,0.10625,1.8e-05,-0.407066,-0.389507,0.524222,0.523187
150,1.243,1.178077,-0.000385,-0.000404,0.11875,1.9e-05,-0.403677,-0.384805,0.651468,0.648499
200,1.2092,1.160165,-0.000366,-0.000382,0.10625,1.6e-05,-0.381624,-0.365506,0.598225,0.597191
250,1.2096,1.143588,-0.00036,-0.000376,0.10625,1.6e-05,-0.37617,-0.360162,0.787162,0.789502
300,1.1614,1.13685,-0.000357,-0.000374,0.1125,1.7e-05,-0.374173,-0.357157,0.602196,0.60262
350,1.1549,1.131973,-0.000357,-0.000376,0.11875,1.9e-05,-0.3755,-0.356799,0.804925,0.803631
400,1.1666,1.130162,-0.000358,-0.000375,0.1125,1.7e-05,-0.375349,-0.357853,0.716869,0.713494
450,1.1541,1.127058,-0.000354,-0.000371,0.1125,1.7e-05,-0.371127,-0.354322,0.629512,0.626286
500,1.2009,1.11836,-0.000346,-0.000362,0.1125,1.6e-05,-0.362113,-0.345725,0.612294,0.608417


Early stopping triggered. No improvement for 3 evaluations.
Eval loss: 1.1140716075897217
Training with params: (16, 16, 0.0003, 0.01, 0.01)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,1.8531,1.289755,-0.004377,-0.004592,0.11875,0.000215,-0.459198,-0.437709,-0.078516,-0.08576
100,1.289,1.211912,-0.003886,-0.004063,0.10625,0.000177,-0.406338,-0.388617,0.523942,0.523108
150,1.2492,1.184619,-0.003846,-0.004035,0.11875,0.000189,-0.403519,-0.384621,0.653983,0.650636
200,1.215,1.166723,-0.003652,-0.003815,0.10625,0.000163,-0.381515,-0.365213,0.617367,0.617145
250,1.2162,1.149453,-0.003592,-0.003754,0.10625,0.000162,-0.375363,-0.359185,0.73746,0.73786
300,1.1677,1.143099,-0.003571,-0.003742,0.1125,0.00017,-0.374194,-0.35715,0.585452,0.583685
350,1.1611,1.13928,-0.003572,-0.003761,0.11875,0.000189,-0.376064,-0.357191,0.836346,0.834276
400,1.1736,1.136772,-0.003581,-0.003754,0.1125,0.000174,-0.375438,-0.358074,0.721676,0.718461
450,1.1604,1.131917,-0.003536,-0.003708,0.11875,0.000172,-0.370799,-0.353639,0.657843,0.65375
500,1.2068,1.124209,-0.003457,-0.003621,0.1125,0.000164,-0.362103,-0.345732,0.641486,0.636882


Early stopping triggered. No improvement for 3 evaluations.
Eval loss: 1.1204299926757812
Training with params: (16, 16, 0.0003, 0.02, 0.001)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,1.8473,1.284681,-0.000439,-0.00046,0.11875,2.1e-05,-0.460481,-0.439,-0.074989,-0.081811
100,1.2828,1.206106,-0.000389,-0.000407,0.1125,1.8e-05,-0.407083,-0.389458,0.517555,0.516993
150,1.2426,1.178589,-0.000385,-0.000404,0.11875,1.9e-05,-0.403984,-0.38496,0.664428,0.663136
200,1.2088,1.160041,-0.000366,-0.000382,0.10625,1.6e-05,-0.382186,-0.366023,0.625151,0.626167
250,1.2096,1.144533,-0.000361,-0.000377,0.10625,1.6e-05,-0.377176,-0.361105,0.80082,0.804377
300,1.1617,1.137497,-0.000357,-0.000374,0.1125,1.7e-05,-0.37444,-0.357136,0.596161,0.598083
350,1.1551,1.133745,-0.000357,-0.000376,0.11875,1.9e-05,-0.376178,-0.357317,0.841905,0.842137
400,1.167,1.131076,-0.000358,-0.000375,0.1125,1.7e-05,-0.375067,-0.358247,0.718738,0.716223
450,1.1538,1.127075,-0.000354,-0.000371,0.11875,1.7e-05,-0.370723,-0.353848,0.634912,0.632161
500,1.2009,1.118756,-0.000346,-0.000362,0.10625,1.6e-05,-0.362326,-0.346159,0.655448,0.651971


Early stopping triggered. No improvement for 3 evaluations.
Eval loss: 1.1146167516708374
Training with params: (16, 16, 0.0003, 0.02, 0.01)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,1.8531,1.289806,-0.004376,-0.004591,0.11875,0.000215,-0.459078,-0.437553,-0.086344,-0.093612
100,1.2889,1.21197,-0.003891,-0.004067,0.1125,0.000176,-0.406687,-0.389101,0.530253,0.530105
150,1.2492,1.185207,-0.003861,-0.00405,0.11875,0.000189,-0.405039,-0.386143,0.669291,0.666247
200,1.215,1.166779,-0.003655,-0.003817,0.10625,0.000162,-0.38173,-0.365536,0.592694,0.591214
250,1.2157,1.149632,-0.003609,-0.003774,0.10625,0.000165,-0.377395,-0.360873,0.779901,0.78322
300,1.1672,1.143103,-0.003579,-0.00375,0.1125,0.000172,-0.375025,-0.357854,0.615336,0.61697
350,1.1612,1.139314,-0.003573,-0.003763,0.11875,0.00019,-0.376262,-0.357264,0.859228,0.859902
400,1.1726,1.137065,-0.003585,-0.003756,0.1125,0.000171,-0.375597,-0.358517,0.720745,0.72048
450,1.1599,1.133124,-0.003547,-0.003723,0.11875,0.000176,-0.372296,-0.354733,0.63699,0.634843
500,1.2073,1.12427,-0.003459,-0.003619,0.10625,0.00016,-0.361902,-0.345928,0.614303,0.612331


Early stopping triggered. No improvement for 3 evaluations.
Eval loss: 1.1208423376083374
Training with params: (16, 16, 0.0003, 0.05, 0.001)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,1.8471,1.284272,-0.000439,-0.00046,0.11875,2.2e-05,-0.460413,-0.43889,-0.087133,-0.094379
100,1.2827,1.205751,-0.000388,-0.000406,0.10625,1.8e-05,-0.405997,-0.388415,0.524485,0.523868
150,1.2432,1.178606,-0.000385,-0.000404,0.11875,1.9e-05,-0.403643,-0.384701,0.677323,0.674348
200,1.2091,1.160953,-0.000366,-0.000382,0.1,1.6e-05,-0.381955,-0.365794,0.588578,0.587422
250,1.2096,1.144341,-0.00036,-0.000376,0.10625,1.6e-05,-0.376239,-0.359894,0.777787,0.779611
300,1.1618,1.136241,-0.000356,-0.000373,0.1125,1.7e-05,-0.372685,-0.355686,0.575875,0.575255
350,1.155,1.132715,-0.000357,-0.000376,0.11875,1.9e-05,-0.37626,-0.357229,0.844531,0.843314
400,1.1671,1.130832,-0.000358,-0.000375,0.1125,1.7e-05,-0.375191,-0.358144,0.722081,0.720398
450,1.1545,1.127646,-0.000353,-0.000371,0.11875,1.8e-05,-0.370855,-0.353355,0.608686,0.604721
500,1.2014,1.117573,-0.000345,-0.000362,0.1125,1.6e-05,-0.361622,-0.345152,0.596815,0.592454


Early stopping triggered. No improvement for 3 evaluations.
Eval loss: 1.1142489910125732
Training with params: (16, 16, 0.0003, 0.05, 0.01)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,1.8533,1.291291,-0.004395,-0.004609,0.11875,0.000215,-0.460948,-0.439455,-0.085636,-0.092691
100,1.289,1.212308,-0.003879,-0.004055,0.1125,0.000176,-0.405504,-0.387916,0.520982,0.517285
150,1.2491,1.184811,-0.003847,-0.004037,0.11875,0.00019,-0.403716,-0.38471,0.670609,0.667159
200,1.2155,1.166784,-0.003657,-0.003819,0.1,0.000162,-0.381935,-0.365691,0.606587,0.608779
250,1.2164,1.150971,-0.003605,-0.00377,0.10625,0.000165,-0.377001,-0.36048,0.718182,0.719067
300,1.1683,1.142352,-0.003561,-0.003732,0.1125,0.000171,-0.37321,-0.356141,0.569254,0.569042
350,1.1607,1.139664,-0.003571,-0.00376,0.11875,0.000189,-0.376011,-0.357107,0.813233,0.812213
400,1.1735,1.137471,-0.003589,-0.003759,0.1125,0.00017,-0.375885,-0.358892,0.71064,0.70768
450,1.1605,1.133194,-0.003548,-0.003717,0.11875,0.000168,-0.371658,-0.354813,0.629438,0.624985
500,1.2073,1.124949,-0.003462,-0.003624,0.1125,0.000162,-0.362368,-0.346186,0.596479,0.591677


Early stopping triggered. No improvement for 3 evaluations.
Eval loss: 1.1205017566680908
Training with params: (16, 32, 0.0001, 0.01, 0.001)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,1.9961,1.437374,-0.000641,-0.000661,0.11875,2.1e-05,-0.661308,-0.640556,0.103136,0.094838
100,1.3739,1.238494,-0.000397,-0.000417,0.1125,2e-05,-0.416873,-0.397151,0.253864,0.247074
150,1.273,1.206947,-0.000392,-0.000412,0.11875,1.9e-05,-0.411546,-0.39216,0.506591,0.504837
200,1.2327,1.183218,-0.000381,-0.000399,0.10625,1.7e-05,-0.398549,-0.381499,0.581964,0.581711
250,1.2292,1.164542,-0.000375,-0.000392,0.1125,1.7e-05,-0.392137,-0.375199,0.658968,0.657761
300,1.1749,1.154957,-0.000367,-0.000386,0.1125,1.8e-05,-0.385523,-0.367432,0.602921,0.600705
350,1.1684,1.148805,-0.000366,-0.000386,0.11875,2e-05,-0.385797,-0.366283,0.677426,0.674335
400,1.1807,1.143058,-0.000363,-0.000382,0.11875,1.8e-05,-0.381525,-0.363353,0.629807,0.624731
450,1.1647,1.139803,-0.000361,-0.000379,0.11875,1.8e-05,-0.379063,-0.361027,0.624481,0.621524
500,1.2138,1.132612,-0.000357,-0.000374,0.1125,1.8e-05,-0.374455,-0.356722,0.655279,0.650891


Early stopping triggered. No improvement for 3 evaluations.
Eval loss: 1.1267799139022827
Training with params: (16, 32, 0.0001, 0.01, 0.01)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,2.0023,1.443597,-0.006404,-0.00661,0.11875,0.000206,-0.661016,-0.640401,0.114171,0.106135
100,1.3795,1.244724,-0.003965,-0.004163,0.1125,0.000198,-0.416258,-0.396482,0.249686,0.242776
150,1.2787,1.21278,-0.003925,-0.004117,0.11875,0.000192,-0.411676,-0.392488,0.502043,0.49968
200,1.2373,1.18979,-0.003812,-0.003985,0.10625,0.000173,-0.398549,-0.381236,0.591517,0.591934
250,1.2352,1.171738,-0.00375,-0.003921,0.1125,0.000171,-0.3921,-0.375013,0.639197,0.637891
300,1.1812,1.16076,-0.003673,-0.003855,0.1125,0.000182,-0.385547,-0.367318,0.580694,0.579292
350,1.1749,1.154557,-0.003658,-0.003852,0.11875,0.000194,-0.385209,-0.365784,0.664521,0.66101
400,1.1871,1.148953,-0.003629,-0.003811,0.11875,0.000182,-0.381098,-0.36289,0.624246,0.618714
450,1.1705,1.145701,-0.003607,-0.00379,0.11875,0.000183,-0.378999,-0.360681,0.629063,0.626626
500,1.2199,1.139002,-0.003568,-0.003745,0.11875,0.000177,-0.374468,-0.356757,0.645653,0.641929


Early stopping triggered. No improvement for 3 evaluations.
Eval loss: 1.1271048784255981
Training with params: (16, 32, 0.0001, 0.02, 0.001)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,1.9968,1.437689,-0.000641,-0.000661,0.11875,2.1e-05,-0.661363,-0.640601,0.115408,0.106145
100,1.3755,1.239873,-0.000398,-0.000418,0.1125,2e-05,-0.41754,-0.397629,0.248587,0.241581
150,1.273,1.206483,-0.000392,-0.000411,0.11875,1.9e-05,-0.410987,-0.391691,0.532391,0.531353
200,1.2308,1.183881,-0.000382,-0.000399,0.10625,1.7e-05,-0.399274,-0.381934,0.600607,0.600761
250,1.2288,1.16457,-0.000374,-0.000392,0.1125,1.7e-05,-0.391726,-0.374446,0.647093,0.645632
300,1.1747,1.154863,-0.000368,-0.000386,0.1125,1.8e-05,-0.385862,-0.367706,0.60688,0.60561
350,1.1683,1.148683,-0.000366,-0.000385,0.11875,1.9e-05,-0.385029,-0.365671,0.678771,0.675815
400,1.1804,1.143226,-0.000363,-0.000382,0.11875,1.8e-05,-0.381571,-0.363207,0.625833,0.621433
450,1.1644,1.140262,-0.000361,-0.000379,0.1125,1.8e-05,-0.37939,-0.361183,0.623793,0.623506
500,1.2134,1.133928,-0.000357,-0.000375,0.1125,1.8e-05,-0.374968,-0.357198,0.672185,0.671441


Early stopping triggered. No improvement for 3 evaluations.


Early stopping triggered. No improvement for 3 evaluations.
Eval loss: 1.1259394884109497
Training with params: (16, 32, 0.0001, 0.02, 0.01)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,2.0024,1.445449,-0.006426,-0.006634,0.11875,0.000207,-0.663355,-0.642627,0.115746,0.106959
100,1.3824,1.245479,-0.003975,-0.004175,0.1125,0.0002,-0.417521,-0.397541,0.247119,0.240249
150,1.2789,1.213588,-0.003924,-0.00412,0.11875,0.000196,-0.412014,-0.392441,0.506713,0.504505
200,1.2381,1.189871,-0.00382,-0.003993,0.10625,0.000173,-0.39934,-0.382024,0.591559,0.590946
250,1.2357,1.170482,-0.00375,-0.003922,0.1125,0.000172,-0.392231,-0.375037,0.643441,0.642347
300,1.181,1.16107,-0.003676,-0.003859,0.1125,0.000183,-0.385939,-0.36759,0.608259,0.60694
350,1.1749,1.154946,-0.003663,-0.003858,0.1125,0.000195,-0.385823,-0.366274,0.683277,0.680058
400,1.1868,1.149712,-0.003636,-0.003819,0.11875,0.000183,-0.381912,-0.363574,0.636482,0.631139
450,1.1707,1.146731,-0.003618,-0.003799,0.11875,0.000181,-0.379944,-0.361795,0.632879,0.630546
500,1.2205,1.13914,-0.003571,-0.003749,0.11875,0.000178,-0.374933,-0.357149,0.675347,0.672312


Early stopping triggered. No improvement for 3 evaluations.


Early stopping triggered. No improvement for 3 evaluations.
Eval loss: 1.1328364610671997
Training with params: (16, 32, 0.0001, 0.05, 0.001)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,1.9963,1.438241,-0.000641,-0.000662,0.11875,2.1e-05,-0.661989,-0.641239,0.115108,0.106937
100,1.3737,1.238995,-0.000397,-0.000417,0.1125,2e-05,-0.416749,-0.396844,0.253184,0.246234
150,1.2724,1.206626,-0.000392,-0.000411,0.11875,1.9e-05,-0.411271,-0.392083,0.495861,0.493266
200,1.2309,1.183463,-0.000381,-0.000398,0.10625,1.7e-05,-0.39782,-0.380517,0.589688,0.590045
250,1.229,1.165611,-0.000375,-0.000392,0.1125,1.7e-05,-0.391911,-0.374906,0.651942,0.651134
300,1.1753,1.156035,-0.000368,-0.000386,0.1125,1.8e-05,-0.386234,-0.367987,0.592151,0.591637
350,1.1688,1.151101,-0.000367,-0.000387,0.11875,2e-05,-0.386672,-0.367002,0.670306,0.66885
400,1.1812,1.14507,-0.000364,-0.000382,0.11875,1.8e-05,-0.381948,-0.363605,0.59776,0.593328
450,1.1648,1.14125,-0.000361,-0.000379,0.11875,1.8e-05,-0.379199,-0.361025,0.611652,0.611148
500,1.214,1.133934,-0.000356,-0.000374,0.11875,1.8e-05,-0.374028,-0.356356,0.64232,0.640414


Early stopping triggered. No improvement for 3 evaluations.


Early stopping triggered. No improvement for 3 evaluations.
Eval loss: 1.1279441118240356
Training with params: (16, 32, 0.0001, 0.05, 0.01)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,2.0026,1.445002,-0.006424,-0.006632,0.11875,0.000208,-0.663189,-0.642436,0.120953,0.112507
100,1.3824,1.244913,-0.003972,-0.004173,0.1125,0.0002,-0.417269,-0.397245,0.249044,0.242037
150,1.279,1.213217,-0.003925,-0.00412,0.11875,0.000195,-0.412005,-0.392478,0.497169,0.494891
200,1.2379,1.190074,-0.003818,-0.003992,0.10625,0.000174,-0.399213,-0.381803,0.570926,0.570941
250,1.2358,1.170869,-0.003746,-0.003918,0.1125,0.000172,-0.391776,-0.374563,0.639081,0.638739
300,1.1812,1.16148,-0.003679,-0.003861,0.1125,0.000182,-0.386104,-0.367925,0.594931,0.594096
350,1.1747,1.155196,-0.003664,-0.003857,0.11875,0.000194,-0.385724,-0.366352,0.675416,0.672927
400,1.1869,1.150013,-0.003637,-0.003819,0.11875,0.000182,-0.381941,-0.363739,0.624863,0.620553
450,1.1712,1.146411,-0.003613,-0.003796,0.11875,0.000183,-0.379612,-0.3613,0.621166,0.619155
500,1.2204,1.139566,-0.003572,-0.00375,0.1125,0.000178,-0.375001,-0.357175,0.655181,0.651654


Early stopping triggered. No improvement for 3 evaluations.




Early stopping triggered. No improvement for 3 evaluations.
Eval loss: 1.1330552101135254
Training with params: (16, 32, 0.0002, 0.01, 0.001)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,1.8285,1.27115,-0.000419,-0.000441,0.11875,2.1e-05,-0.440567,-0.419158,0.002653,-0.004405
100,1.2801,1.204134,-0.000386,-0.000403,0.1125,1.8e-05,-0.403372,-0.385846,0.588155,0.58591
150,1.2425,1.17798,-0.000386,-0.000405,0.11875,1.9e-05,-0.404772,-0.386093,0.726142,0.724114
200,1.2066,1.159207,-0.000365,-0.000381,0.10625,1.6e-05,-0.380623,-0.364561,0.602356,0.601016
250,1.2078,1.141846,-0.000359,-0.000375,0.10625,1.6e-05,-0.375169,-0.359112,0.780225,0.783277
300,1.1611,1.135222,-0.000358,-0.000374,0.1125,1.7e-05,-0.374229,-0.357538,0.626713,0.62747
350,1.1541,1.13201,-0.000357,-0.000376,0.11875,1.9e-05,-0.375681,-0.35673,0.809235,0.807797
400,1.1645,1.128763,-0.000357,-0.000374,0.1125,1.7e-05,-0.374454,-0.357175,0.697171,0.693992
450,1.1514,1.125944,-0.000354,-0.000371,0.11875,1.7e-05,-0.370912,-0.354263,0.628171,0.624942
500,1.1999,1.116757,-0.000345,-0.000362,0.1125,1.7e-05,-0.362335,-0.345339,0.649356,0.645279


Early stopping triggered. No improvement for 3 evaluations.


Early stopping triggered. No improvement for 3 evaluations.
Eval loss: 1.1119813919067383
Training with params: (16, 32, 0.0002, 0.01, 0.01)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,1.8348,1.277544,-0.004195,-0.004409,0.11875,0.000214,-0.440932,-0.41955,0.011839,0.005088
100,1.2858,1.208896,-0.003855,-0.004032,0.10625,0.000177,-0.403236,-0.385499,0.589102,0.586755
150,1.2468,1.18284,-0.003842,-0.004027,0.11875,0.000185,-0.402745,-0.384244,0.720263,0.717732
200,1.2131,1.165027,-0.003648,-0.003807,0.10625,0.000159,-0.380699,-0.364821,0.628832,0.628179
250,1.2131,1.147783,-0.003603,-0.003764,0.10625,0.000161,-0.376385,-0.360314,0.791723,0.793916
300,1.166,1.141387,-0.003582,-0.003751,0.1125,0.000168,-0.375053,-0.358246,0.613635,0.613732
350,1.16,1.136918,-0.00356,-0.003746,0.1125,0.000186,-0.374623,-0.356048,0.779807,0.777611
400,1.1706,1.134444,-0.003566,-0.003736,0.1125,0.00017,-0.373609,-0.356617,0.660433,0.658097
450,1.1578,1.131313,-0.00354,-0.003708,0.1125,0.000168,-0.370833,-0.354012,0.610459,0.607944
500,1.2059,1.123426,-0.003458,-0.003623,0.1125,0.000164,-0.362292,-0.345843,0.638867,0.635099


Early stopping triggered. No improvement for 3 evaluations.


Early stopping triggered. No improvement for 3 evaluations.
Eval loss: 1.1183849573135376
Training with params: (16, 32, 0.0002, 0.02, 0.001)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,1.8286,1.27106,-0.000419,-0.000441,0.11875,2.1e-05,-0.440653,-0.419349,0.002289,-0.004765
100,1.2795,1.203359,-0.000386,-0.000403,0.10625,1.8e-05,-0.403401,-0.385582,0.577863,0.575347
150,1.2403,1.177211,-0.000384,-0.000403,0.11875,1.9e-05,-0.40296,-0.384394,0.691073,0.688186
200,1.2068,1.157891,-0.000365,-0.000381,0.10625,1.6e-05,-0.380677,-0.364878,0.637377,0.63769
250,1.2069,1.141506,-0.00036,-0.000376,0.10625,1.6e-05,-0.376318,-0.360198,0.721275,0.721794
300,1.1601,1.134598,-0.000358,-0.000374,0.1125,1.7e-05,-0.374142,-0.357515,0.571191,0.570587
350,1.1535,1.132104,-0.000356,-0.000375,0.11875,1.9e-05,-0.375174,-0.356427,0.755165,0.754179
400,1.1645,1.129022,-0.000357,-0.000375,0.1125,1.7e-05,-0.374633,-0.357465,0.639214,0.636464
450,1.1518,1.12544,-0.000354,-0.000371,0.11875,1.7e-05,-0.371018,-0.354378,0.61051,0.607437
500,1.199,1.116759,-0.000346,-0.000363,0.11875,1.7e-05,-0.362507,-0.345984,0.626967,0.623454


Early stopping triggered. No improvement for 3 evaluations.


Early stopping triggered. No improvement for 3 evaluations.
Eval loss: 1.1121203899383545
Training with params: (16, 32, 0.0002, 0.02, 0.01)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,1.8352,1.277529,-0.004198,-0.004412,0.11875,0.000214,-0.441169,-0.41977,0.007077,0.000238
100,1.2861,1.21005,-0.003859,-0.004036,0.10625,0.000177,-0.403616,-0.385879,0.595961,0.593568
150,1.247,1.183293,-0.003848,-0.004033,0.11875,0.000185,-0.403279,-0.384817,0.69671,0.69411
200,1.2133,1.164807,-0.003646,-0.003805,0.10625,0.000158,-0.380455,-0.364606,0.620993,0.620784
250,1.2133,1.14649,-0.003587,-0.003745,0.10625,0.000158,-0.374549,-0.358741,0.730322,0.729446
300,1.166,1.13999,-0.00357,-0.003738,0.1125,0.000168,-0.373775,-0.35698,0.599366,0.599325
350,1.1599,1.137236,-0.003566,-0.003753,0.11875,0.000188,-0.375345,-0.356583,0.79214,0.790949
400,1.1714,1.133794,-0.003564,-0.003735,0.1125,0.000171,-0.373534,-0.356391,0.668242,0.664501
450,1.158,1.130491,-0.00354,-0.003707,0.1125,0.000167,-0.370726,-0.353999,0.623818,0.621391
500,1.2061,1.12196,-0.003452,-0.003624,0.1125,0.000172,-0.362356,-0.345177,0.647865,0.644124


Early stopping triggered. No improvement for 3 evaluations.


Early stopping triggered. No improvement for 3 evaluations.
Eval loss: 1.1172550916671753
Training with params: (16, 32, 0.0002, 0.05, 0.001)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,1.8284,1.270908,-0.000419,-0.00044,0.11875,2.1e-05,-0.440274,-0.41882,0.009984,0.003221
100,1.2796,1.203571,-0.000386,-0.000403,0.10625,1.8e-05,-0.403329,-0.385673,0.578685,0.575877
150,1.2406,1.176714,-0.000384,-0.000402,0.11875,1.9e-05,-0.402195,-0.383626,0.677844,0.675908
200,1.2072,1.158228,-0.000364,-0.00038,0.1,1.6e-05,-0.380452,-0.364464,0.645689,0.64606
250,1.2071,1.141744,-0.00036,-0.000376,0.10625,1.6e-05,-0.375999,-0.359987,0.75008,0.749287
300,1.16,1.134892,-0.000358,-0.000375,0.1125,1.7e-05,-0.374757,-0.35795,0.607657,0.606907
350,1.1536,1.132159,-0.000357,-0.000376,0.11875,1.9e-05,-0.375665,-0.356881,0.789179,0.788146
400,1.1651,1.12782,-0.000357,-0.000374,0.1125,1.7e-05,-0.374052,-0.356594,0.676604,0.672353
450,1.1521,1.124949,-0.000354,-0.000371,0.1125,1.7e-05,-0.370922,-0.354012,0.614824,0.6116
500,1.1991,1.116567,-0.000346,-0.000363,0.1125,1.7e-05,-0.362713,-0.345576,0.641666,0.638066


Early stopping triggered. No improvement for 3 evaluations.


Early stopping triggered. No improvement for 3 evaluations.
Eval loss: 1.1119515895843506
Training with params: (16, 32, 0.0002, 0.05, 0.01)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,1.8354,1.277684,-0.004202,-0.004416,0.11875,0.000214,-0.441561,-0.420182,0.003865,-0.002925
100,1.286,1.209502,-0.003855,-0.004033,0.10625,0.000178,-0.403307,-0.385526,0.588046,0.585489
150,1.2467,1.183942,-0.003853,-0.004039,0.11875,0.000186,-0.403891,-0.385299,0.705992,0.703555
200,1.2131,1.164884,-0.003651,-0.00381,0.10625,0.000159,-0.381044,-0.365103,0.634142,0.633084
250,1.2131,1.14727,-0.003598,-0.003758,0.10625,0.00016,-0.375823,-0.359786,0.774365,0.774782
300,1.1669,1.142286,-0.003597,-0.003766,0.1125,0.000169,-0.376551,-0.359679,0.618856,0.61816
350,1.1604,1.137978,-0.003572,-0.00376,0.11875,0.000188,-0.376006,-0.357242,0.776993,0.775524
400,1.1711,1.134618,-0.003574,-0.003748,0.1125,0.000174,-0.374791,-0.357374,0.666023,0.663522
450,1.1578,1.131163,-0.003547,-0.003716,0.11875,0.000169,-0.371607,-0.354715,0.627608,0.624881
500,1.2059,1.123142,-0.003459,-0.003626,0.1125,0.000167,-0.362602,-0.345942,0.636167,0.632038


Early stopping triggered. No improvement for 3 evaluations.


Early stopping triggered. No improvement for 3 evaluations.
Eval loss: 1.1182057857513428
Training with params: (16, 32, 0.0003, 0.01, 0.001)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,1.749,1.266749,-0.00041,-0.000431,0.11875,2.1e-05,-0.430689,-0.409619,0.220429,0.21442
100,1.2611,1.19203,-0.000383,-0.000399,0.10625,1.6e-05,-0.398885,-0.382642,0.620917,0.619421
150,1.2312,1.1741,-0.000385,-0.000403,0.11875,1.8e-05,-0.403038,-0.385315,0.836532,0.839076
200,1.2076,1.155874,-0.000362,-0.000378,0.10625,1.5e-05,-0.377851,-0.362355,0.62134,0.624804
250,1.2065,1.139127,-0.000358,-0.000374,0.10625,1.6e-05,-0.373778,-0.358211,0.734568,0.735332
300,1.1593,1.133752,-0.000356,-0.000373,0.1125,1.7e-05,-0.372684,-0.355852,0.699137,0.698716
350,1.1564,1.129922,-0.000356,-0.000374,0.11875,1.8e-05,-0.374156,-0.356201,0.936111,0.93462
400,1.1648,1.131283,-0.000357,-0.000375,0.1125,1.8e-05,-0.375283,-0.356984,0.815813,0.81596


Early stopping triggered. No improvement for 3 evaluations.


Early stopping triggered. No improvement for 3 evaluations.
Eval loss: 1.129921555519104
Training with params: (16, 32, 0.0003, 0.01, 0.01)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,1.7546,1.269323,-0.004065,-0.004279,0.11875,0.000214,-0.427858,-0.406473,0.228342,0.222179
100,1.2668,1.198888,-0.003831,-0.003994,0.1125,0.000164,-0.399428,-0.383066,0.611394,0.610922
150,1.2375,1.180966,-0.003856,-0.004031,0.11875,0.000175,-0.403097,-0.385573,0.866672,0.867623
200,1.21,1.159826,-0.003624,-0.003776,0.10625,0.000152,-0.377647,-0.362401,0.702247,0.704039
250,1.2126,1.1432,-0.003556,-0.003722,0.1125,0.000166,-0.372173,-0.355575,0.815023,0.817445
300,1.1655,1.138839,-0.003545,-0.003715,0.1125,0.00017,-0.3715,-0.354494,0.679748,0.682758
350,1.1621,1.133781,-0.003553,-0.003732,0.11875,0.00018,-0.373216,-0.355255,0.938739,0.94129
400,1.171,1.133964,-0.003556,-0.003734,0.1125,0.000178,-0.373436,-0.355594,0.820017,0.822025


Early stopping triggered. No improvement for 3 evaluations.


Early stopping triggered. No improvement for 3 evaluations.
Eval loss: 1.1337809562683105
Training with params: (16, 32, 0.0003, 0.02, 0.001)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,1.7484,1.265901,-0.00041,-0.000431,0.11875,2.1e-05,-0.430833,-0.409621,0.224665,0.218545
100,1.2612,1.191856,-0.000383,-0.000399,0.1125,1.6e-05,-0.398969,-0.382504,0.625556,0.623871
150,1.231,1.17397,-0.000385,-0.000403,0.11875,1.8e-05,-0.402506,-0.384852,0.840622,0.841744
200,1.2029,1.153943,-0.000361,-0.000377,0.1,1.5e-05,-0.376639,-0.361374,0.738579,0.741226
250,1.2059,1.138941,-0.000357,-0.000373,0.10625,1.6e-05,-0.373245,-0.35695,0.769153,0.770561
300,1.1594,1.131662,-0.000354,-0.000371,0.1125,1.7e-05,-0.371109,-0.354276,0.650426,0.653823
350,1.1561,1.128765,-0.000356,-0.000374,0.11875,1.9e-05,-0.374464,-0.355815,0.929229,0.932401
400,1.1646,1.129466,-0.000356,-0.000375,0.1125,1.9e-05,-0.37467,-0.355901,0.760876,0.761513
450,1.153,1.12383,-0.000351,-0.000368,0.1125,1.8e-05,-0.368463,-0.350859,0.780276,0.779472
500,1.1994,1.116978,-0.000343,-0.00036,0.10625,1.7e-05,-0.36032,-0.343398,0.64628,0.645919


Early stopping triggered. No improvement for 3 evaluations.


Early stopping triggered. No improvement for 3 evaluations.
Eval loss: 1.1120051145553589
Training with params: (16, 32, 0.0003, 0.02, 0.01)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,1.7546,1.271404,-0.004096,-0.004307,0.1125,0.000211,-0.430664,-0.409587,0.229199,0.222715
100,1.2675,1.19901,-0.003838,-0.004001,0.1125,0.000163,-0.400133,-0.383788,0.623313,0.622353
150,1.2372,1.182802,-0.003872,-0.004051,0.11875,0.000178,-0.405069,-0.387232,0.843906,0.844616
200,1.21,1.162364,-0.003633,-0.003785,0.10625,0.000152,-0.378467,-0.363256,0.711118,0.713205
250,1.2126,1.14468,-0.003568,-0.003725,0.10625,0.000157,-0.37253,-0.356816,0.844184,0.845607
300,1.1664,1.138479,-0.003556,-0.003725,0.1125,0.000169,-0.372486,-0.355629,0.729584,0.733995
350,1.1618,1.135063,-0.003555,-0.003736,0.11875,0.000181,-0.373646,-0.355534,0.973796,0.975844
400,1.1703,1.138469,-0.003581,-0.003762,0.11875,0.00018,-0.37618,-0.358135,0.796956,0.796922


Early stopping triggered. No improvement for 3 evaluations.


Early stopping triggered. No improvement for 3 evaluations.
Eval loss: 1.1350634098052979
Training with params: (16, 32, 0.0003, 0.05, 0.001)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,1.7501,1.324893,-0.000435,-0.000457,0.1125,2.2e-05,-0.457037,-0.435498,0.209543,0.20236
100,1.2798,1.190476,-0.000382,-0.000399,0.10625,1.7e-05,-0.398562,-0.38176,0.785874,0.785348
150,1.2345,1.173017,-0.000383,-0.000402,0.11875,1.9e-05,-0.401717,-0.382979,0.891871,0.894226
200,1.2061,1.153437,-0.000361,-0.000377,0.10625,1.5e-05,-0.376595,-0.361233,0.712573,0.716876
250,1.2078,1.140122,-0.000358,-0.000373,0.10625,1.5e-05,-0.373462,-0.358213,0.766107,0.766411
300,1.1589,1.132308,-0.000353,-0.00037,0.1125,1.7e-05,-0.369739,-0.352676,0.679801,0.682221
350,1.1552,1.129128,-0.000355,-0.000375,0.11875,2e-05,-0.374911,-0.355285,0.886715,0.88535
400,1.1647,1.129104,-0.000356,-0.000376,0.1125,2e-05,-0.375958,-0.356431,0.740366,0.737915
450,1.1525,1.124779,-0.00035,-0.000368,0.1125,1.8e-05,-0.368054,-0.349786,0.747389,0.745288
500,1.2002,1.116663,-0.000342,-0.00036,0.1,1.7e-05,-0.3597,-0.342464,0.695551,0.695031


Early stopping triggered. No improvement for 3 evaluations.


Early stopping triggered. No improvement for 3 evaluations.
Eval loss: 1.1120831966400146
Training with params: (16, 32, 0.0003, 0.05, 0.01)


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 15,249 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,906
 "-____-"     Number of trainable parameters = 11,272,192


Step,Training Loss,Validation Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
50,1.7552,1.273172,-0.004104,-0.004314,0.11875,0.00021,-0.43136,-0.410351,0.214086,0.208285
100,1.2674,1.197093,-0.003828,-0.00399,0.10625,0.000162,-0.398969,-0.382781,0.651558,0.650482
150,1.2372,1.179531,-0.00385,-0.004019,0.11875,0.000169,-0.40193,-0.385001,0.862206,0.864785
200,1.21,1.162001,-0.003624,-0.003773,0.1,0.000149,-0.377314,-0.362448,0.680743,0.683019
250,1.2137,1.144829,-0.003581,-0.00374,0.10625,0.000159,-0.373988,-0.358084,0.829225,0.829951
300,1.1651,1.137881,-0.003541,-0.003713,0.1125,0.000172,-0.371341,-0.354112,0.766705,0.771194
350,1.162,1.134471,-0.003545,-0.003725,0.11875,0.000179,-0.372453,-0.354521,0.96896,0.973029
400,1.1711,1.135997,-0.003556,-0.003742,0.1125,0.000186,-0.374182,-0.355596,0.825778,0.827171
450,1.1593,1.130558,-0.003497,-0.00367,0.11875,0.000173,-0.366976,-0.349679,0.814732,0.815243
500,1.2064,1.123103,-0.003438,-0.003611,0.1125,0.000173,-0.36108,-0.343816,0.699475,0.700499


Early stopping triggered. No improvement for 3 evaluations.


Early stopping triggered. No improvement for 3 evaluations.
Eval loss: 1.119097352027893
Best params: {'r': 16, 'lora_alpha': 32, 'learning_rate': 0.0002, 'weight_decay': 0.05, 'beta': 0.001}
Best eval loss: 1.1119515895843506
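
The selection above can be reproduced from the `Eval loss` lines in the log. This is a minimal sketch, assuming the logged tuples follow the order `(r, lora_alpha, learning_rate, weight_decay, beta)` implied by the `Best params` dictionary. The names `PARAM_NAMES` and `results` are illustrative, and only the runs whose parameter line is visible in this log are included.

```python
# Final eval losses copied from the log above, keyed by the logged parameter tuple.
# Assumed tuple order: (r, lora_alpha, learning_rate, weight_decay, beta).
PARAM_NAMES = ("r", "lora_alpha", "learning_rate", "weight_decay", "beta")

results = {
    (16, 32, 0.0001, 0.02, 0.001): 1.1259394884109497,
    (16, 32, 0.0001, 0.02, 0.01): 1.1328364610671997,
    (16, 32, 0.0001, 0.05, 0.001): 1.1279441118240356,
    (16, 32, 0.0001, 0.05, 0.01): 1.1330552101135254,
    (16, 32, 0.0002, 0.01, 0.001): 1.1119813919067383,
    (16, 32, 0.0002, 0.01, 0.01): 1.1183849573135376,
    (16, 32, 0.0002, 0.02, 0.001): 1.1121203899383545,
    (16, 32, 0.0002, 0.02, 0.01): 1.1172550916671753,
    (16, 32, 0.0002, 0.05, 0.001): 1.1119515895843506,
    (16, 32, 0.0002, 0.05, 0.01): 1.1182057857513428,
    (16, 32, 0.0003, 0.01, 0.001): 1.129921555519104,
    (16, 32, 0.0003, 0.01, 0.01): 1.1337809562683105,
    (16, 32, 0.0003, 0.02, 0.001): 1.1120051145553589,
    (16, 32, 0.0003, 0.02, 0.01): 1.1350634098052979,
    (16, 32, 0.0003, 0.05, 0.001): 1.1120831966400146,
    (16, 32, 0.0003, 0.05, 0.01): 1.119097352027893,
}

# Pick the configuration with the lowest final eval loss.
best_tuple, best_loss = min(results.items(), key=lambda item: item[1])
best_params = dict(zip(PARAM_NAMES, best_tuple))

print("Best params:", best_params)
print("Best eval loss:", best_loss)

# Rank all runs so near-ties are easy to spot.
for params, loss in sorted(results.items(), key=lambda item: item[1])[:5]:
    print(f"{loss:.7f}  {dict(zip(PARAM_NAMES, params))}")
```

Several configurations finish within about 0.0002 of the best eval loss, so the ranking among the top few may not be meaningful without repeated runs.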


In [6]:
# from trl import ORPOConfig, ORPOTrainer
# from unsloth import is_bfloat16_supported

# orpo_trainer = ORPOTrainer(
#     model = model,
#     train_dataset = train_dataset,
#     eval_dataset= eval_dataset,
#     tokenizer = tokenizer,
#     args = ORPOConfig(
#         warmup_steps = 5,
#         learning_rate = 2e-4,
#         weight_decay = 0.01,
#         seed = 3407,
#         max_length = max_seq_length,
#         max_prompt_length = max_seq_length//2,
#         max_completion_length = max_seq_length//2,
#         per_device_train_batch_size = 2,
#         gradient_accumulation_steps = 4,
#         beta = 0.1,
#         logging_steps = 100,
#         optim = "adamw_8bit",
#         lr_scheduler_type = "linear",
#         num_train_epochs = 1,
#         # max_steps = 30, # Change to num_train_epochs = 1 for full training runs
#         fp16 = not is_bfloat16_supported(),
#         bf16 = is_bfloat16_supported(),
#         output_dir = "outputs",
#         report_to = "none", # Use this for WandB etc
#         push_to_hub=False,
#     ),
# )
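
The `Total batch size = 8 | Total steps = 1,906` line in the training banner follows from these batch settings. This is a rough sketch of the arithmetic, assuming batches per epoch are rounded up and optimizer steps per epoch are rounded down; that formula is an assumption that happens to reproduce the logged value, not something stated in this notebook.

```python
import math

num_examples = 15_249              # "Num examples" in the training banner
per_device_train_batch_size = 2    # from the config above
gradient_accumulation_steps = 4    # from the config above
num_gpus = 1                       # "Num GPUs = 1" in the banner
num_train_epochs = 1

# Examples consumed per optimizer update.
total_batch_size = per_device_train_batch_size * gradient_accumulation_steps * num_gpus

# Assumed formula: round batches up, then round optimizer steps down.
batches_per_epoch = math.ceil(num_examples / (per_device_train_batch_size * num_gpus))
steps_per_epoch = max(batches_per_epoch // gradient_accumulation_steps, 1)
total_steps = steps_per_epoch * num_train_epochs

print(f"Total batch size = {total_batch_size} | Total steps = {total_steps:,}")
```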

In [7]:
# # orpo_trainer.train(resume_from_checkpoint="outputs/checkpoint-6685")
# orpo_trainer.train()

In [8]:
# trainer_stats_eval = orpo_trainer.evaluate()

# for key, value in trainer_stats_eval.items():
#     print(f"{key}: {value}")

# import json
# with open("Llama-3.2-1B-Instruct_evaluation_results.json", "w") as f:
#     json.dump(trainer_stats_eval, f, indent=4)
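
In the grid-search tables above, the `rewards / *` columns look like the `logps / *` columns scaled by `beta`, which would explain why they shrink tenfold when `beta` goes from 0.01 to 0.001. The sketch below checks that reading against two logged rows. It is an observation about the numbers in this log, not a claim about the trainer's internals, and `implied_rewards` is a hypothetical helper.

```python
# Rows copied from the step-50 lines of two runs in the log above:
# (beta, logps/chosen, logps/rejected, rewards/chosen, rewards/rejected, rewards/margins)
rows = [
    (0.01, -0.642627, -0.663355, -0.006426, -0.006634, 0.000207),
    (0.001, -0.640601, -0.661363, -0.000641, -0.000661, 2.1e-05),
]


def implied_rewards(beta, logp_chosen, logp_rejected):
    """Return (chosen, rejected, margin), assuming reward = beta * log-prob."""
    chosen = beta * logp_chosen
    rejected = beta * logp_rejected
    return chosen, rejected, chosen - rejected


# Compare the implied values with the logged ones (logged to 6 decimals).
max_abs_error = 0.0
for beta, lp_chosen, lp_rejected, r_chosen, r_rejected, margin in rows:
    implied = implied_rewards(beta, lp_chosen, lp_rejected)
    for got, logged in zip(implied, (r_chosen, r_rejected, margin)):
        max_abs_error = max(max_abs_error, abs(got - logged))

print(f"largest mismatch: {max_abs_error:.1e}")
```

If this reading is right, the reward columns carry no information beyond the log-probabilities when comparing runs with different `beta`; the `logps` columns are the ones to compare directly.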

<a name="Inference"></a>
### Inference
Let's run the model! You can change the instruction and input - leave the output blank!

In [9]:
# alpaca_prompt = Copied from above
FastLanguageModel.for_inference(model) # Enable native 2x faster inference
inputs = tokenizer(
[
    alpaca_prompt.format(
        "Continue the Fibonacci sequence.", # instruction
        "1, 1, 2, 3, 5, 8", # input
        "", # output - leave this blank for generation!
    )
], return_tensors = "pt").to("cuda")

outputs = model.generate(**inputs, max_new_tokens = 64, use_cache = True)
tokenizer.batch_decode(outputs)

NameError: name 'model' is not defined

You can also use a `TextStreamer` for continuous inference, so you can see the generation token by token instead of waiting for the whole output!

In [None]:
# alpaca_prompt = Copied from above
FastLanguageModel.for_inference(model) # Enable native 2x faster inference
inputs = tokenizer(
[
    alpaca_prompt.format(
        "Continue the Fibonacci sequence.", # instruction
        "1, 1, 2, 3, 5, 8", # input
        "", # output - leave this blank for generation!
    )
], return_tensors = "pt").to("cuda")

from transformers import TextStreamer
text_streamer = TextStreamer(tokenizer)
_ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 128)

<a name="Save"></a>
### Saving, loading finetuned models
To save the final model as LoRA adapters, use either Hugging Face's `push_to_hub` for an online save or `save_pretrained` for a local save.

**[NOTE]** This ONLY saves the LoRA adapters, and not the full model. To save to 16bit or GGUF, scroll down!
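
To get a feel for how small the adapters are, the `Number of trainable parameters = 11,272,192` figure from the training banner can be reconstructed by hand. This sketch assumes Llama-3.2-1B-style dimensions and LoRA applied to the seven attention and MLP projections with rank 16. Neither the dimensions nor the target-module list is stated in this section; they are assumptions chosen because they reproduce the logged count.

```python
# Assumed model dimensions (Llama-3.2-1B style); not stated in this section.
hidden_size = 2048
intermediate_size = 8192
num_layers = 16
kv_size = 512   # assumed num_key_value_heads * head_dim = 8 * 64
r = 16          # LoRA rank from the grid search

# (in_features, out_features) of each assumed target projection in one layer.
target_shapes = {
    "q_proj": (hidden_size, hidden_size),
    "k_proj": (hidden_size, kv_size),
    "v_proj": (hidden_size, kv_size),
    "o_proj": (hidden_size, hidden_size),
    "gate_proj": (hidden_size, intermediate_size),
    "up_proj": (hidden_size, intermediate_size),
    "down_proj": (intermediate_size, hidden_size),
}


def lora_params(in_features, out_features, rank):
    """LoRA adds A (rank x in) and B (out x rank): rank * (in + out) parameters."""
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o, r) for i, o in target_shapes.values())
total = per_layer * num_layers

print(f"{total:,} adapter parameters")
print(f"~{total * 2 / 1e6:.1f} MB at 16-bit, ~{total * 4 / 1e6:.1f} MB at 32-bit")
```

Under these assumptions the adapter checkpoint is tens of megabytes, far smaller than a merged model.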

In [None]:
# model.save_pretrained("orpo_model") # Local saving
# tokenizer.save_pretrained("orpo_model")
import os
hf_token = os.environ["HF_TOKEN"] # Read the Hugging Face token from the environment; never hardcode it
model.push_to_hub("AMomozZz/orpo_model1", token = hf_token) # Online saving
tokenizer.push_to_hub("AMomozZz/orpo_model1", token = hf_token) # Online saving

To load the LoRA adapters we just saved for inference, run the cell below. It is already enabled with `if True:`; change `True` to `False` to skip reloading:

In [None]:
if True:
    from unsloth import FastLanguageModel
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name = "AMomozZz/orpo_model", # YOUR MODEL YOU USED FOR TRAINING
        max_seq_length = max_seq_length,
        dtype = dtype,
        load_in_4bit = load_in_4bit,
    )
    FastLanguageModel.for_inference(model) # Enable native 2x faster inference

# alpaca_prompt = You MUST copy from above!

inputs = tokenizer(
[
    alpaca_prompt.format(
        "What is a famous tall tower in Paris?", # instruction
        "", # input
        "", # output - leave this blank for generation!
    )
], return_tensors = "pt").to("cuda")

outputs = model.generate(**inputs, max_new_tokens = 64, use_cache = True)
tokenizer.batch_decode(outputs)

You can also use Hugging Face's `AutoPeftModelForCausalLM`. Only use this if you do not have `unsloth` installed. It can be hopelessly slow, since `4bit` model downloading is not supported, and Unsloth's **inference is 2x faster**.

In [None]:
if False:
    # NOT recommended - use Unsloth if possible
    from peft import AutoPeftModelForCausalLM
    from transformers import AutoTokenizer
    model = AutoPeftModelForCausalLM.from_pretrained(
        "lora_model", # YOUR MODEL YOU USED FOR TRAINING
        load_in_4bit = load_in_4bit,
    )
    tokenizer = AutoTokenizer.from_pretrained("lora_model")

### Saving to float16 for vLLM

We also support saving to `float16` directly. Select `merged_16bit` for float16 or `merged_4bit` for int4. We also allow `lora` adapters as a fallback. Use `push_to_hub_merged` to upload to your Hugging Face account! You can go to https://huggingface.co/settings/tokens for your personal tokens.

In [None]:
# Merge to 16bit
if False: model.save_pretrained_merged("model", tokenizer, save_method = "merged_16bit",)
if False: model.push_to_hub_merged("hf/model", tokenizer, save_method = "merged_16bit", token = "")

# Merge to 4bit
if False: model.save_pretrained_merged("model", tokenizer, save_method = "merged_4bit",)
if False: model.push_to_hub_merged("hf/model", tokenizer, save_method = "merged_4bit", token = "")

# Just LoRA adapters
if False: model.save_pretrained_merged("model", tokenizer, save_method = "lora",)
if False: model.push_to_hub_merged("hf/model", tokenizer, save_method = "lora", token = "")

### GGUF / llama.cpp Conversion
Saving to `GGUF` / `llama.cpp` is now supported natively! We clone `llama.cpp` and save to `q8_0` by default. Other methods such as `q4_k_m` are also allowed. Use `save_pretrained_gguf` for local saving and `push_to_hub_gguf` for uploading to HF.

Some supported quant methods (full list on our [Wiki page](https://github.com/unslothai/unsloth/wiki#gguf-quantization-options)):
* `q8_0` - Fast conversion. High resource use, but generally acceptable.
* `q4_k_m` - Recommended. Uses Q6_K for half of the attention.wv and feed_forward.w2 tensors, else Q4_K.
* `q5_k_m` - Recommended. Uses Q6_K for half of the attention.wv and feed_forward.w2 tensors, else Q5_K.

In [None]:
# Save to 8bit Q8_0
if False: model.save_pretrained_gguf("model", tokenizer,)
if False: model.push_to_hub_gguf("hf/model", tokenizer, token = "")

# Save to 16bit GGUF
if False: model.save_pretrained_gguf("model", tokenizer, quantization_method = "f16")
if False: model.push_to_hub_gguf("hf/model", tokenizer, quantization_method = "f16", token = "")

# Save to q4_k_m GGUF
if False: model.save_pretrained_gguf("model", tokenizer, quantization_method = "q4_k_m")
if False: model.push_to_hub_gguf("hf/model", tokenizer, quantization_method = "q4_k_m", token = "")

Now, use the `model-unsloth.gguf` file or the `model-unsloth-Q4_K_M.gguf` file in `llama.cpp` or in a UI-based system like `GPT4All`. You can install GPT4All [here](https://gpt4all.io/index.html).
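
If you are unsure where the exported file ended up, a quick search of the working directory can help. The snippet below is a small sketch using only the Python standard library. `find_gguf_files` is a hypothetical helper, and searching recursively from the current directory is an assumption about where the files are written.

```python
from pathlib import Path

def find_gguf_files(root="."):
    """Hypothetical helper: list every .gguf file under root, searched recursively."""
    return sorted(Path(root).rglob("*.gguf"))

# Print each file found with its size in MiB.
for path in find_gguf_files("."):
    print(path, f"{path.stat().st_size / 2**20:.1f} MiB")
```

The printed path is the one to pass to `llama.cpp` or to import into GPT4All.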

And we're done! If you have any questions about Unsloth, find a bug, need help, want to join projects, or want to keep up with the latest LLM news, feel free to join our [Discord](https://discord.gg/u54VK8m8tk)!

Some other links:
1. Zephyr DPO 2x faster [free Colab](https://colab.research.google.com/drive/15vttTpzzVXv_tJwEk-hIcQ0S9FcEWvwP?usp=sharing)
2. Llama 7b 2x faster [free Colab](https://colab.research.google.com/drive/1lBzz5KeZJKXjvivbYvmGarix9Ao6Wxe5?usp=sharing)
3. TinyLlama 4x faster full Alpaca 52K in 1 hour [free Colab](https://colab.research.google.com/drive/1AZghoNBQaMDgWJpi4RbffGM1h6raLUj9?usp=sharing)
4. CodeLlama 34b 2x faster [A100 on Colab](https://colab.research.google.com/drive/1y7A0AxE3y8gdj4AVkl2aZX47Xu3P1wJT?usp=sharing)
5. Mistral 7b [free Kaggle version](https://www.kaggle.com/code/danielhanchen/kaggle-mistral-7b-unsloth-notebook)
6. We also did a [blog](https://huggingface.co/blog/unsloth-trl) with 🤗 HuggingFace, and we're in the TRL [docs](https://huggingface.co/docs/trl/main/en/sft_trainer#accelerate-fine-tuning-2x-using-unsloth)!
7. `ChatML` for ShareGPT datasets, [conversational notebook](https://colab.research.google.com/drive/1Aau3lgPzeZKQ-98h69CCu1UJcvIBLmy2?usp=sharing)
8. Text completions like novel writing [notebook](https://colab.research.google.com/drive/1ef-tab5bhkvWmBOObepl1WgJvfvSzn5Q?usp=sharing)

<div class="align-center">
  <a href="https://github.com/unslothai/unsloth"><img src="https://github.com/unslothai/unsloth/raw/main/images/unsloth%20new%20logo.png" width="115"></a>
  <a href="https://discord.gg/u54VK8m8tk"><img src="https://github.com/unslothai/unsloth/raw/main/images/Discord.png" width="145"></a>
  <a href="https://ko-fi.com/unsloth"><img src="https://github.com/unslothai/unsloth/raw/main/images/Kofi button.png" width="145"></a> Support our work if you can! Thanks!
</div>