In [1]:
from unsloth import FastLanguageModel
import torch

max_seq_length = 4096 # Any value works; Unsloth handles RoPE scaling internally.
dtype = None # None auto-detects: float16 on Tesla T4/V100, bfloat16 on Ampere and newer.
load_in_4bit = True # 4-bit quantization to reduce memory usage; can be False.

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Qwen/Qwen2.5-1.5B-Instruct",
    max_seq_length=max_seq_length,
    dtype=dtype,
    load_in_4bit=load_in_4bit,
)

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
==((====))==  Unsloth 2024.10.7: Fast Qwen2 patching. Transformers = 4.45.2.
   \\   /|    GPU: NVIDIA GeForce RTX 3090. Max memory: 23.668 GB. Platform = Linux.
O^O/ \_/ \    Pytorch: 2.4.0+cu121. CUDA = 8.6. CUDA Toolkit = 12.1.
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.27.post2. FA2 = True]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth


Unsloth: We fixed a gradient accumulation bug, but it seems like you don't have the latest transformers version!
Please update transformers, TRL and unsloth via:
`pip install --upgrade --no-cache-dir --no-deps unsloth transformers git+https://github.com/huggingface/trl.git`


In [2]:
model = FastLanguageModel.get_peft_model(
    model,
    r=64, # LoRA rank; any value > 0 works. Suggested: 8, 16, 32, 64, 128
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=64,
    lora_dropout=0, # Currently only supports dropout=0
    bias="none",    # Currently only supports bias="none"
    # [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
    use_gradient_checkpointing="unsloth", # True or "unsloth" for very long context
    random_state=3407,
    use_rslora=False,  # We support rank stabilized LoRA
    loftq_config=None, # And LoftQ
)

Unsloth 2024.10.7 patched 28 layers with 0 QKV layers, 28 O layers and 28 MLP layers.
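
The adapter size follows directly from the rank and the list of target modules. As a rough check, the sketch below counts LoRA parameters as `r * (in_features + out_features)` for each adapted projection. The layer dimensions are assumptions taken from the published Qwen2.5-1.5B configuration (hidden size 1536, MLP size 8960, 2 key/value heads of dimension 128, 28 layers); none of them is printed in this notebook.

```python
# Rough LoRA parameter count for the get_peft_model call above.
# ASSUMPTION: the dimensions below come from the published Qwen2.5-1.5B
# config; they are not printed anywhere in this notebook.
R = 64             # LoRA rank, as passed to get_peft_model
HIDDEN = 1536      # hidden size (assumed)
KV_DIM = 2 * 128   # 2 key/value heads x head dimension 128 (assumed)
MLP = 8960         # MLP intermediate size (assumed)
NUM_LAYERS = 28    # matches the "patched 28 layers" message

# (in_features, out_features) for every projection listed in target_modules
PROJECTIONS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, MLP),
    "up_proj": (HIDDEN, MLP),
    "down_proj": (MLP, HIDDEN),
}


def lora_params(in_features, out_features, r=R):
    """Parameters in one adapter: A is (r x in_features), B is (out_features x r)."""
    return r * (in_features + out_features)


per_layer = sum(lora_params(i, o) for i, o in PROJECTIONS.values())
total = per_layer * NUM_LAYERS
print(f"per layer: {per_layer:,}  total: {total:,}")
```

Under these assumptions the total comes to 73,859,072, the same figure the trainer banner reports as the number of trainable parameters.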


In [17]:
prompt_template = """<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
{}<|im_end|>
<|im_start|>assistant
{}<|im_end|>"""

EOS_TOKEN = tokenizer.eos_token # Must add EOS_TOKEN

def format_prompt(examples):
    prompts   = examples["prompt"]
    responses = examples["chosen"]
    texts = []
    for prompt, response in zip(prompts, responses):
        # Must add EOS_TOKEN, otherwise your generation will go on forever!
        text = prompt_template.format(prompt, response) + EOS_TOKEN
        texts.append(text)
    return {"text": texts}

from datasets import load_dataset
dataset = load_dataset("OpenEndedLM/OpenEndPreference-v0.1")["train"]
dataset = dataset.map(format_prompt, batched=True)

Map:   0%|          | 0/100 [00:00<?, ? examples/s]
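
With `batched=True`, `map` passes `format_prompt` a dict of columns (each a list), not a single row. The sketch below runs the same logic on a hand-made batch so the input and output shapes are easy to see. The sample rows are invented, and the `eos_token` default is a stand-in: the real value comes from `tokenizer.eos_token`, which is assumed here to be `<|im_end|>` for this tokenizer.

```python
# Stand-alone version of the formatting step, runnable without a tokenizer.
# ASSUMPTION: eos_token is "<|im_end|>"; in the notebook it is read from
# tokenizer.eos_token. The toy rows below are invented for illustration.
PROMPT_TEMPLATE = """<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
{}<|im_end|>
<|im_start|>assistant
{}<|im_end|>"""


def format_batch(examples, eos_token="<|im_end|>"):
    """Turn a batch of prompt/chosen columns into one 'text' column."""
    texts = [
        PROMPT_TEMPLATE.format(prompt, response) + eos_token
        for prompt, response in zip(examples["prompt"], examples["chosen"])
    ]
    return {"text": texts}


toy_batch = {
    "prompt": ["What is 2 + 2?", "Name a primary color."],
    "chosen": ["4", "Red"],
}
formatted = format_batch(toy_batch)
print(len(formatted["text"]))   # one formatted string per input row
print(formatted["text"][0])
```

If the assumption about the EOS token holds, each training example ends with two consecutive `<|im_end|>` tokens, because the template already closes the assistant turn before `EOS_TOKEN` is appended. Printing the tail of one example, for instance `dataset["text"][0][-40:]`, is a quick way to check.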

In [18]:
print(dataset["text"][0])

<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
What is the significance of the 13 stripes on the American flag?<|im_end|>
<|im_start|>assistant
The 13 stripes on the American flag represent the original 13 colonies that declared their independence from Great Britain and became the first states in the United States. These colonies were:

1. Delaware
2. Pennsylvania
3. New Jersey
4. Georgia
5. Connecticut
6. Massachusetts
7. Maryland
8. South Carolina
9. New Hampshire
10. Virginia
11. New York
12. North Carolina
13. Rhode Island

The red and white stripes symbolize the struggle and unity of these colonies against British rule, with each stripe representing one of the original states. The alternating pattern of red and white is a traditional design for flags, but in this context, it also serves as a visual reminder of the historical union and the principles upon which the United States was founded.

The flag's design has evolved over time to reflect changes wit

In [22]:
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    dataset_num_proc = 2,
    packing = False, # Setting this to True can make training 5x faster for short sequences.
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,

        num_train_epochs = 3, # Three full passes over the dataset.
        # max_steps = 60, # Alternative: cap the run at a fixed number of steps.

        learning_rate = 2e-4,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
        report_to = "none", # Disables logging integrations; set to "wandb" etc. to enable one
    ),
)

Map (num_proc=2):   0%|          | 0/100 [00:00<?, ? examples/s]
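
The length of the run follows from the dataset size and the batch settings. A back-of-the-envelope version is sketched below. It assumes the trainer floors the number of optimizer updates per epoch; that is an assumption about Trainer internals, though it is consistent with the total batch size of 8 and the 36 total steps reported when training starts.

```python
import math

# Values taken from the cells above
num_examples = 100
per_device_batch = 2
grad_accum = 4
num_gpus = 1
epochs = 3

# Examples consumed per optimizer update
effective_batch = per_device_batch * grad_accum * num_gpus

# Micro-batches the dataloader yields in one epoch
batches_per_epoch = math.ceil(num_examples / (per_device_batch * num_gpus))

# ASSUMPTION: updates per epoch are floored, so leftover micro-batches
# at the end of an epoch do not add an extra optimizer step.
updates_per_epoch = max(batches_per_epoch // grad_accum, 1)
total_steps = updates_per_epoch * epochs

# Fraction of an epoch covered by one optimizer update
epoch_per_step = grad_accum / batches_per_epoch

print(effective_batch, batches_per_epoch, updates_per_epoch, total_steps, epoch_per_step)
```

The last value, 0.08, is also the increment of the `epoch` field between consecutive log lines, since `logging_steps = 1`.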

In [23]:
trainer_stats = trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 100 | Num Epochs = 3
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 36
 "-____-"     Number of trainable parameters = 73,859,072


**** Unsloth: Please use our fixed gradient_accumulation_steps by updating transformers, TRL and Unsloth!
`pip install --upgrade --no-cache-dir --no-deps unsloth transformers git+https://github.com/huggingface/trl.git`


  0%|          | 0/36 [00:00<?, ?it/s]

{'loss': 0.5018, 'grad_norm': 0.36850401759147644, 'learning_rate': 4e-05, 'epoch': 0.08}
{'loss': 0.5265, 'grad_norm': 0.3502880930900574, 'learning_rate': 8e-05, 'epoch': 0.16}
{'loss': 0.4692, 'grad_norm': 0.37345170974731445, 'learning_rate': 0.00012, 'epoch': 0.24}
{'loss': 0.5662, 'grad_norm': 0.43245166540145874, 'learning_rate': 0.00016, 'epoch': 0.32}
{'loss': 0.442, 'grad_norm': 0.3521885573863983, 'learning_rate': 0.0002, 'epoch': 0.4}
{'loss': 0.4482, 'grad_norm': 0.43603280186653137, 'learning_rate': 0.00019354838709677422, 'epoch': 0.48}
{'loss': 0.5398, 'grad_norm': 0.5475645065307617, 'learning_rate': 0.0001870967741935484, 'epoch': 0.56}
{'loss': 0.5601, 'grad_norm': 0.5059583783149719, 'learning_rate': 0.00018064516129032257, 'epoch': 0.64}
{'loss': 0.5702, 'grad_norm': 0.5280370116233826, 'learning_rate': 0.00017419354838709678, 'epoch': 0.72}
{'loss': 0.4482, 'grad_norm': 0.402022123336792, 'learning_rate': 0.00016774193548387098, 'epoch': 0.8}
{'loss': 0.5843, 'gra
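
The `learning_rate` values in the log follow a linear warmup over 5 steps and then a linear decay to zero at step 36. The sketch below reproduces that shape from the settings in `TrainingArguments`. `linear_schedule_lr` is a hypothetical helper written for this illustration, not a function from Transformers, and the formula is an approximation of how a linear schedule with warmup is usually defined.

```python
def linear_schedule_lr(step, base_lr=2e-4, warmup_steps=5, total_steps=36):
    """Learning rate after `step` optimizer updates (illustrative sketch)."""
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 up to base_lr
        return base_lr * step / warmup_steps
    # Decay: fall linearly from base_lr at the end of warmup to 0 at total_steps
    remaining = max(0.0, (total_steps - step) / (total_steps - warmup_steps))
    return base_lr * remaining


# Print the first ten values for comparison with the training log
for step in range(1, 11):
    print(step, linear_schedule_lr(step))
```

Steps 1 through 5 give 4e-05, 8e-05, 0.00012, 0.00016 and 0.0002, and step 6 gives `2e-4 * 30 / 31`, which agrees with the logged 0.00019354838709677422 up to floating-point rounding.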

In [24]:
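# prompt_template is defined in an earlier cell
FastLanguageModel.for_inference(model) # Enable native 2x faster inference
inputs = tokenizer(
    [
        prompt_template.format(
            "Whats a triangle", # prompt
            "", # response - leave this blank for generation!
        )
    ],
    return_tensors="pt",
).to("cuda")
# Note: the template still appends <|im_end|> after the empty response,
# which is why it shows up right after the assistant header in the output.

outputs = model.generate(**inputs, max_new_tokens=512, use_cache=True)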
print(tokenizer.batch_decode(outputs)[0])

<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
Whats a triangle<|im_end|>
<|im_start|>assistant
<|im_end|>A triangle is a polygon with three sides and three vertices. It's one of the basic shapes in geometry and has several important properties:

1. **Classification**: Triangles can be classified based on their angles (acute, right, obtuse) or by their side lengths (equilateral, isosceles, scalene).

2. **Properties**:
   - All interior angles sum up to 180 degrees.
   - Each angle is less than 180 degrees.
   - The sum of any two sides must be greater than the third side (triangle inequality theorem).
   
3. **Types**:
   - Equilateral Triangle: All three sides are equal, and all angles are 60 degrees.
   - Isosceles Triangle: Two sides are equal, and the angles opposite these sides are also equal.
   - Scalene Triangle: No sides are equal, and therefore no angles are equal.

4. **Geometric Shapes**: A triangle is fundamental in various fields such as mathe
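
In the output above, `<|im_end|>` appears immediately after the assistant header, because formatting the training template with an empty response still emits the closing token. One possible way around this, sketched with plain string handling, is to cut the template at the response placeholder so the prompt stops exactly where the assistant should start writing. `build_generation_prompt` is a hypothetical helper, not part of Unsloth or Transformers.

```python
# Same template as in the notebook, repeated so the sketch is self-contained
PROMPT_TEMPLATE = """<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
{}<|im_end|>
<|im_start|>assistant
{}<|im_end|>"""


def build_generation_prompt(user_message, template=PROMPT_TEMPLATE):
    """Fill in the user turn and stop right after the assistant header."""
    # rpartition splits at the last "{}" (the response slot); the head keeps
    # everything before it and drops the slot plus the trailing <|im_end|>.
    head, _slot, _tail = template.rpartition("{}")
    return head.format(user_message)


prompt = build_generation_prompt("Whats a triangle")
print(prompt)
```

The resulting string could be passed to `tokenizer(...)` in place of the `prompt_template.format(..., "")` call. Transformers tokenizers also offer `apply_chat_template(..., add_generation_prompt=True)` for the same purpose, which avoids maintaining the template by hand.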

In [25]:
model.save_pretrained_merged("output")

Unsloth: You're not saving a tokenizer as well?
You can do it separately via `tokenizer.save_pretrained(...)`


Unsloth: Merging 4bit and LoRA weights to 16bit...
Unsloth: Will use up to 11.8 out of 31.11 RAM for saving.


100%|██████████| 28/28 [00:00<00:00, 183.38it/s]


Unsloth: Saving model... This might take 5 minutes for Llama-7b...





Done.
