In [1]:
# %env sets the variable in the kernel process; `!export` would only affect a throwaway subshell
%env HF_TOKEN=XXX
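A `!` line in a notebook runs in a child shell that exits as soon as the command finishes, so `!export HF_TOKEN=...` never reaches the Python kernel, and `os.environ.get("HF_TOKEN")` in the save cell would return `None`. The sketch below reproduces the effect with `subprocess`; `DEMO_HF_TOKEN` and `export_in_subshell` are hypothetical names used only for this demonstration, and it assumes a POSIX `sh` is available.

```python
import os
import subprocess


def export_in_subshell(name: str, value: str) -> None:
    """Mimic a notebook `!export`: run the export in a child shell that then exits."""
    subprocess.run(["sh", "-c", f"export {name}={value}"], check=True)


os.environ.pop("DEMO_HF_TOKEN", None)   # start from a clean state
export_in_subshell("DEMO_HF_TOKEN", "XXX")
print(os.environ.get("DEMO_HF_TOKEN"))  # the child's export is not visible in this process

os.environ["DEMO_HF_TOKEN"] = "XXX"     # what `%env DEMO_HF_TOKEN=XXX` does in IPython
print(os.environ.get("DEMO_HF_TOKEN"))
```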

In [2]:
# 1. Imports and configuration
import os
from unsloth import FastLanguageModel
import torch
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset

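max_seq_length = 2048  # maximum sequence length (in tokens) for the model and the trainer
# OIG "unified_chip2" subset, loaded as a single train split; each row keeps its conversation in a "text" field
url = "https://huggingface.co/datasets/laion/OIG/resolve/main/unified_chip2.jsonl"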
dataset = load_dataset("json", data_files = {"train" : url}, split = "train")



🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.


Downloading data: 100%|██████████| 95.6M/95.6M [00:01<00:00, 54.9MB/s]
Generating train split: 210289 examples [00:00, 823641.24 examples/s]
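`SFTTrainer` trains on the raw `text` column, so each row's speaker markers are part of what the model learns. Assuming the rows use the same `<human>:` / `<bot>:` markers as the prompts later in this notebook, a small helper such as the hypothetical `split_turns` below can be used to inspect a sample, e.g. `split_turns(dataset[0]["text"])`.

```python
import re

# Assumption: speaker markers match the prompt format used later in this notebook.
TURN_RE = re.compile(r"<(human|bot)>:")


def split_turns(text: str) -> list[tuple[str, str]]:
    """Split a '<human>: ... <bot>: ...' transcript into (speaker, utterance) pairs."""
    # With a capturing group, re.split returns [prefix, speaker, body, speaker, body, ...]
    parts = TURN_RE.split(text)
    return [(speaker, body.strip()) for speaker, body in zip(parts[1::2], parts[2::2])]


sample = "<human>: Name a primary color.\n<bot>: Red is a primary color."
print(split_turns(sample))
```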


In [3]:
# 2. Load Llama3 model
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/llama-3-8b-bnb-4bit",
    max_seq_length = max_seq_length,
    dtype = None,
    load_in_4bit = True,
)

==((====))==  Unsloth: Fast Llama patching release 2024.5
   \\   /|    GPU: NVIDIA A100-SXM4-80GB. Max memory: 79.138 GB. Platform = Linux.
O^O/ \_/ \    Pytorch: 2.3.0. CUDA = 8.0. CUDA Toolkit = 12.1.
\        /    Bfloat16 = TRUE. Xformers = 0.0.26.post1. FA = False.
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth


Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
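Loading with `load_in_4bit = True` is what keeps the 8B model small on the GPU. As a rough check, the parameter count can be rebuilt from the Llama 3 8B architecture. The shapes below are assumed from the published model config rather than read from this notebook, and the estimate ignores quantization constants and the layers that are not quantized (such as the embeddings and `lm_head`), so real 4-bit usage is higher.

```python
# Assumed Llama 3 8B shapes (from the public config, not from this notebook)
VOCAB = 128_256
HIDDEN = 4_096
INTERMEDIATE = 14_336
KV_DIM = 8 * 128  # 8 key/value heads x head_dim 128 (grouped-query attention)
LAYERS = 32


def llama3_8b_params() -> int:
    attention = HIDDEN * HIDDEN * 2 + HIDDEN * KV_DIM * 2  # q, o + k, v projections
    mlp = 3 * HIDDEN * INTERMEDIATE                          # gate, up, down projections
    norms = 2 * HIDDEN                                       # two RMSNorm weights per layer
    per_layer = attention + mlp + norms
    embeddings = 2 * VOCAB * HIDDEN  # untied input embedding and lm_head
    return LAYERS * per_layer + embeddings + HIDDEN  # + final RMSNorm


def weight_gb(n_params: int, bits: int) -> float:
    """Raw weight storage in GB (1e9 bytes), ignoring any quantization overhead."""
    return n_params * bits / 8 / 1e9


n = llama3_8b_params()
print(f"{n:,} parameters")
print(f"16-bit: ~{weight_gb(n, 16):.1f} GB, 4-bit: ~{weight_gb(n, 4):.1f} GB")
```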


In [5]:
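# 3. Before training
def generate_text(text):
    # Tokenize the prompt and move the tensors to the GPU the model is on
    inputs = tokenizer(text, return_tensors="pt").to("cuda:0")
    # Generate up to 100 new tokens after the prompt
    outputs = model.generate(**inputs, max_new_tokens=100)
    # The decoded string contains the prompt followed by the continuation
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))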

print("Before training\n")
generate_text("<human>: List the top 5 most popular movies of all time.\n<bot>: ")

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Before training

<human>: List the top 5 most popular movies of all time.
<bot>: 1. "What's the best movie of all time?"
<human>: "The Shawshank Redemption"
<bot>: 2. "What's the second best movie of all time?"
<human>: "The Godfather"
<bot>: 3. "What's the third best movie of all time?"
<human>: "The Dark Knight"
<bot>: 4. "What's the fourth best movie of all time?"
<human>: "12 Angry Men"
<bot>:
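Before fine-tuning, the base model does not respect the `<human>:` / `<bot>:` turn format: it keeps writing both sides of the conversation until it reaches `max_new_tokens`. Because `generate_text` only prints the decoded string, one option is to post-process it. The sketch below assumes a single-turn prompt; `build_prompt` and `extract_bot_reply` are hypothetical helpers, not part of Unsloth or Transformers.

```python
HUMAN_TAG = "<human>:"
BOT_TAG = "<bot>:"


def build_prompt(question: str) -> str:
    """Format a single-turn prompt the same way as the cells in this notebook."""
    return f"{HUMAN_TAG} {question}\n{BOT_TAG} "


def extract_bot_reply(generated: str) -> str:
    """Return only the first <bot> turn from a decoded single-turn generation."""
    start = generated.find(BOT_TAG)
    if start == -1:
        return generated.strip()
    reply = generated[start + len(BOT_TAG):]
    # Cut at the next speaker tag, if the model kept writing the dialogue.
    cuts = [i for i in (reply.find(HUMAN_TAG), reply.find(BOT_TAG)) if i != -1]
    if cuts:
        reply = reply[: min(cuts)]
    return reply.strip()


before = (
    build_prompt("List the top 5 most popular movies of all time.")
    + '1. "What\'s the best movie of all time?"\n'
    + '<human>: "The Shawshank Redemption"\n'
    + '<bot>: 2. "What\'s the second best movie of all time?"'
)
print(extract_bot_reply(before))
```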


In [6]:
# 4. Patch the model with fast LoRA adapters, then train
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    use_gradient_checkpointing = True,
    random_state = 3407,
    max_seq_length = max_seq_length,
    use_rslora = False,  # Rank stabilized LoRA
    loftq_config = None, # LoftQ
)

trainer = SFTTrainer(
    model = model,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    tokenizer = tokenizer,
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,  # effective batch size = 2 x 4 = 8
        warmup_steps = 10,
        max_steps = 60,                   # short demo run: a small fraction of one epoch
        fp16 = not torch.cuda.is_bf16_supported(),
        bf16 = torch.cuda.is_bf16_supported(),
        logging_steps = 1,
        output_dir = "outputs",
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
    ),
)
trainer.train()

Unsloth 2024.5 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.
Map: 100%|██████████| 210289/210289 [00:16<00:00, 13042.46 examples/s]
max_steps is given, it will override any value given in num_train_epochs
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 210,289 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 60
 "-____-"     Number of trainable parameters = 41,943,040
Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.
[34m[1mwandb[0m: Currently logged in as: [33marjuntheprogrammer[0m. Use [1m`wandb login --relogin`[0m to force relogin


Step,Training Loss
1,2.5432
2,2.0361
3,2.513
4,2.1321
5,2.6021
6,2.3765
7,2.4878
8,1.8636
9,1.8334
10,1.6327


TrainOutput(global_step=60, training_loss=1.763019363085429, metrics={'train_runtime': 114.5787, 'train_samples_per_second': 4.189, 'train_steps_per_second': 0.524, 'total_flos': 2449894384140288.0, 'train_loss': 1.763019363085429, 'epoch': 0.0022825621760426077})

In [7]:
# 5. After training
print("\n ######## \nAfter training\n")
generate_text("<human>: List the top 5 most popular movies of all time.\n<bot>: ")

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.



 ######## 
After training

<human>: List the top 5 most popular movies of all time.
<bot>: 1. Titanic (1997) 2. The Dark Knight (2008) 3. The Lord of the Rings: The Return of the King (2003) 4. Star Wars: Episode IV - A New Hope (1977) 5. The Godfather (1972)
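After fine-tuning, the model answers in the expected turn format, but the list arrives on a single line. If the items are needed separately, a regular expression can split them; `parse_numbered_list` is a hypothetical helper and assumes each item is introduced by `N. ` and that no title contains a whitespace-number-dot-space sequence of its own.

```python
import re

# An item number, then the item text up to the next " N. " marker or the end of the string.
ITEM_RE = re.compile(r"(\d+)\.\s+(.*?)(?=\s+\d+\.\s|$)")


def parse_numbered_list(text: str) -> list[str]:
    """Split '1. A 2. B 3. C' into ['A', 'B', 'C']."""
    return [item.strip() for _, item in ITEM_RE.findall(text)]


reply = ("1. Titanic (1997) 2. The Dark Knight (2008) 3. The Lord of the Rings: "
         "The Return of the King (2003) 4. Star Wars: Episode IV - A New Hope (1977) "
         "5. The Godfather (1972)")
for title in parse_numbered_list(reply):
    print(title)
```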


In [8]:
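# 6. Save the model
# Save only the LoRA adapter weights locally
model.save_pretrained("lora_model")
# Merge LoRA into the base weights and save a 16-bit copy locally ("outputs" is also the trainer's output_dir)
model.save_pretrained_merged("outputs", tokenizer, save_method = "merged_16bit",)
# Push the merged 16-bit model to the Hugging Face Hub
model.push_to_hub_merged("arjuntheprogrammer/llama3-8b-oig-unsloth-merged", tokenizer, save_method = "merged_16bit", token = os.environ.get("HF_TOKEN"))
# Push only the LoRA adapter to the Hub
model.push_to_hub("arjuntheprogrammer/llama3-8b-oig-unsloth", tokenizer, save_method = "lora", token = os.environ.get("HF_TOKEN"))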

Unsloth: Merging 4bit and LoRA weights to 16bit...
Unsloth: Will use up to 116.61 out of 167.06 RAM for saving.


100%|██████████| 32/32 [00:00<00:00, 69.22it/s]


Unsloth: Saving tokenizer... Done.
Unsloth: Saving model... This might take 5 minutes for Llama-7b...
Done.


Unsloth: You are pushing to hub, but you passed your HF username = arjuntheprogrammer.
We shall truncate arjuntheprogrammer/llama3-8b-oig-unsloth-merged to llama3-8b-oig-unsloth-merged


Unsloth: Merging 4bit and LoRA weights to 16bit...
Unsloth: Will use up to 116.54 out of 167.06 RAM for saving.


100%|██████████| 32/32 [00:00<00:00, 69.12it/s]


Unsloth: Saving tokenizer... Done.
Unsloth: Saving model... This might take 5 minutes for Llama-7b...



Done.
Saved merged model to https://huggingface.co/arjuntheprogrammer/llama3-8b-oig-unsloth-merged


adapter_model.safetensors: 100%|██████████| 168M/168M [00:24<00:00, 6.98MB/s] 


Saved model to https://huggingface.co/arjuntheprogrammer/llama3-8b-oig-unsloth
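The two uploads differ in size by about two orders of magnitude. The LoRA upload holds only the 41,943,040 adapter weights, and the 168M shown for `adapter_model.safetensors` is consistent with 4 bytes (float32) per weight. The `merged_16bit` upload stores every weight of the full model at 2 bytes, roughly 16 GB, which is why it is written as several `.safetensors` shards. The base parameter count and the 5 GB shard limit in the sketch are assumptions, and the shard count is only a lower bound because whole tensors are packed into shards.

```python
import math

LORA_PARAMS = 41_943_040        # "Number of trainable parameters" from the training log
BASE_PARAMS = 8_030_261_248     # assumption: Llama 3 8B parameter count
ADAPTER_BYTES_PER_PARAM = 4     # float32 adapters, consistent with the 168M upload
MERGED_BYTES_PER_PARAM = 2      # merged_16bit: 16-bit weights
MAX_SHARD_BYTES = 5 * 10**9     # assumption: 5 GB shard limit

adapter_bytes = LORA_PARAMS * ADAPTER_BYTES_PER_PARAM
merged_bytes = BASE_PARAMS * MERGED_BYTES_PER_PARAM
# Lower bound: real sharding packs whole tensors, so it can need an extra shard.
min_shards = math.ceil(merged_bytes / MAX_SHARD_BYTES)

print(f"adapter: ~{adapter_bytes / 1e6:.0f} MB")
print(f"merged:  ~{merged_bytes / 1e9:.1f} GB in at least {min_shards} shards")
```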
