<a href="https://colab.research.google.com/github/ChrisM-Git/AI/blob/main/LORAtuning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Trelis Livestream

Live most weeks at 5 pm Irish time on [YouTube](https://youtube.com/@TrelisResearch) and [X](https://twitter.com/@TrelisResearch).

Find other fine-tuning/inference resources at [Trelis.com](https://Trelis.com/About).

[RunPod Affiliate Link](https://runpod.io?ref=jmfkcdio) (supports the channel).


# LoRA Fine-tuning Parameter Selection
Full fine-tuning:
- Choosing learning rate.

LoRA Settings:
- [visual examples.](https://docs.google.com/presentation/d/18npsbcrkCi3kY41ejxquEqR6TjLPFk9l6nwSavaXN1w/edit?usp=sharing)
- What are LoRA modules?
- LoRA alpha and LoRA r (rank).
- rank stabilized LoRA.
- choosing alpha.
- choosing r based on the application
- making lm_head and embed_tokens trainable.

Batch size:
- Pros and cons of increasing/decreasing.
- Gradient Accumulation.

Dataset sizes and epochs:
- How the required dataset size depends on the training type.
- Number of epochs.

Learning rate scheduler:
- constant vs cosine vs annealing.

Training examples:
- [wandb](https://wandb.ai/trelis/function-calling-v3b?nw=nwuserronankmcgovern)


### Key Links:
- [ADVANCED Fine-tuning Repo](https://trelis.com/ADVANCED-fine-tuning/)
- [One-click RunPod templates](https://github.com/TrelisResearch/one-click-llms) - quick note on last week and how to run inference with two images.
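
As a rough illustration of the "constant vs cosine" item in the outline above, the sketch below computes the learning rate each schedule would give at a few steps. The function names and `total_steps = 100` are hypothetical and chosen only for illustration; warmup and other annealing variants are left out, and this is not how the training libraries implement their schedulers internally.

```python
import math

def constant_lr(step: int, total_steps: int, base_lr: float) -> float:
    """Constant schedule: the same learning rate at every step."""
    return base_lr

def cosine_lr(step: int, total_steps: int, base_lr: float) -> float:
    """Cosine schedule: decays smoothly from base_lr at step 0 to 0 at total_steps."""
    progress = step / total_steps
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

base_lr = 1e-4     # same value the notebook later passes as learning_rate
total_steps = 100  # hypothetical run length, for illustration only

for step in (0, 50, 100):
    print(step, constant_lr(step, total_steps, base_lr), cosine_lr(step, total_steps, base_lr))
```

With a constant schedule every step trains at the full rate, while the cosine schedule has already halved the rate by the midpoint of the run.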

In [None]:
!pip install accelerate transformers hf_transfer -q
# !pip install flash-attn -q


In [None]:
import os
# Setting the environment variable to enable fast downloads with the hf_transfer rust library.
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"

In [None]:
import transformers
import torch
from transformers import AutoModelForCausalLM

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    cache_dir='',
    device_map='auto',
    # attn_implementation="flash_attention_2",
    torch_dtype=torch.bfloat16)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


In [None]:
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id, cache_dir='')

In [None]:
print(model)

LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(32000, 2048)
    (layers): ModuleList(
      (0-21): 22 x LlamaDecoderLayer(
        (self_attn): LlamaSdpaAttention(
          (q_proj): Linear(in_features=2048, out_features=2048, bias=False)
          (k_proj): Linear(in_features=2048, out_features=256, bias=False)
          (v_proj): Linear(in_features=2048, out_features=256, bias=False)
          (o_proj): Linear(in_features=2048, out_features=2048, bias=False)
          (rotary_emb): LlamaRotaryEmbedding()
        )
        (mlp): LlamaMLP(
          (gate_proj): Linear(in_features=2048, out_features=5632, bias=False)
          (up_proj): Linear(in_features=2048, out_features=5632, bias=False)
          (down_proj): Linear(in_features=5632, out_features=2048, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): LlamaRMSNorm()
        (post_attention_layernorm): LlamaRMSNorm()
      )
    )
    (norm): LlamaRMSNorm()
  )
  (lm_head): Line

In [None]:
!pip install peft -q


In [None]:
from peft import LoraConfig

lora_config = LoraConfig(
    target_modules=['gate_proj','k_proj'],
    lora_dropout=0.1,  # helps prevent overfitting; not supported by unsloth
    bias="none",
    task_type="CAUSAL_LM",
    modules_to_save=['lm_head','embed_tokens'],
    r=128, # rank
    lora_alpha=32,
    use_rslora=True,
)

In [None]:
# ## Learning rates

# base_learning_rate = 1e-4

# # Base rule of thumb: the adapter update is scaled by alpha / r
# adapter_learning_rate = base_learning_rate * alpha / r  # equals base_learning_rate * 2 when alpha = 2 * r

# # Rank-stabilised LoRA: the adapter update is scaled by alpha / sqrt(r)
# adapter_learning_rate = base_learning_rate * alpha / sqrt(r)

# # Setting alpha
# alpha ~ sqrt(d), where d is the rough size of the original matrix (not the LoRA rank r)

In [None]:
alpha = 2000**0.5
print(alpha)

44.721359549995796
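
To make the rules of thumb above concrete, here is a minimal sketch that plugs in the `r=128` and `lora_alpha=32` values from `lora_config`. The helper name `lora_scale` is hypothetical, and "effective adapter learning rate" follows the notebook's own rule of thumb (base rate times the LoRA scaling factor) rather than any quantity the libraries report.

```python
import math

def lora_scale(alpha: float, r: int, use_rslora: bool = False) -> float:
    """Scaling applied to the LoRA update: alpha / r, or alpha / sqrt(r) with rank-stabilised LoRA."""
    return alpha / math.sqrt(r) if use_rslora else alpha / r

r, alpha = 128, 32            # values from lora_config above
base_learning_rate = 1e-4     # value from the rule-of-thumb comments

standard = lora_scale(alpha, r)                     # 32 / 128
stabilised = lora_scale(alpha, r, use_rslora=True)  # 32 / sqrt(128)

print(f"standard scale:   {standard:.4f} -> effective lr {base_learning_rate * standard:.2e}")
print(f"rslora scale:     {stabilised:.4f} -> effective lr {base_learning_rate * stabilised:.2e}")
```

At this rank the standard scaling shrinks the update to a quarter of the base rate, while the rank-stabilised form keeps it close to three times the base rate, which is one reason `use_rslora=True` matters more as `r` grows.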


In [None]:
from peft import get_peft_model
model = get_peft_model(model, lora_config)

model.print_trainable_parameters()

trainable params: 159,186,944 || all params: 1,259,235,328 || trainable%: 12.6416
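
The trainable-parameter count above can be reproduced by hand from the layer shapes in the `print(model)` output and the settings in `lora_config`. The sketch below assumes each LoRA adapter adds `r * (in_features + out_features)` weights and that `modules_to_save` makes full trainable copies of `lm_head` and `embed_tokens`; the variable and function names are illustrative only.

```python
# Shapes read from the print(model) output above
hidden, intermediate, kv_dim = 2048, 5632, 256
vocab, num_layers = 32000, 22
r = 128  # LoRA rank from lora_config

def lora_params(in_features: int, out_features: int, rank: int) -> int:
    """A LoRA adapter holds A (rank x in_features) and B (out_features x rank)."""
    return rank * (in_features + out_features)

gate_proj = lora_params(hidden, intermediate, r)  # adapter on each gate_proj
k_proj = lora_params(hidden, kv_dim, r)           # adapter on each k_proj
adapter_total = num_layers * (gate_proj + k_proj)

# lm_head and embed_tokens are each vocab x hidden and are trained in full
saved_modules = 2 * vocab * hidden

trainable = adapter_total + saved_modules
print(f"adapters: {adapter_total:,}  saved modules: {saved_modules:,}  total: {trainable:,}")
```

Most of the roughly 159M trainable parameters come from the two full-size modules in `modules_to_save`, not from the adapters themselves.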


In [None]:
!pip install git+https://github.com/huggingface/trl.git -q -U



In [None]:
!pip install datasets -q

In [None]:
from datasets import load_dataset

dataset_name = "Trelis/openassistant-llama-style"

train_dataset = load_dataset(dataset_name, split="train[:100]")
eval_dataset = load_dataset(dataset_name, split="test[:10]")

print(train_dataset)
print(eval_dataset)

Dataset({
    features: ['text'],
    num_rows: 100
})
Dataset({
    features: ['text'],
    num_rows: 10
})


In [None]:
# docs are here: https://huggingface.co/docs/trl/en/sft_trainer
from trl import SFTTrainer, SFTConfig

trainer = SFTTrainer(
    model=model,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    tokenizer=tokenizer,
    args=SFTConfig(
        # max_steps=1,
        max_seq_length=2048,
        dataset_text_field="text",
        num_train_epochs=1,
        per_device_train_batch_size=8,
        per_device_eval_batch_size=8,
        gradient_accumulation_steps=4,
        do_eval=True,
        lr_scheduler_type="constant",
        learning_rate=1e-4,
        eval_strategy="steps",
        eval_steps=0.2,
        warmup_ratio=0.05,
        output_dir='./',
        gradient_checkpointing=True,
        gradient_checkpointing_kwargs={"use_reentrant": True},  # TRL will soon require this to be set explicitly (True is faster but more complicated and bug-prone)
        logging_steps=1,
    ),
    # peft_config=lora_config,  # no need to pass this in if the PEFT config has already been applied
)

trainer.train()

`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`.


Step,Training Loss,Validation Loss
1,1.8235,1.518413
2,1.635,1.484386
3,1.6174,1.461405


TrainOutput(global_step=3, training_loss=1.6919945081075032, metrics={'train_runtime': 337.3955, 'train_samples_per_second': 0.296, 'train_steps_per_second': 0.009, 'total_flos': 598431981010944.0, 'train_loss': 1.6919945081075032, 'epoch': 0.9230769230769231})
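
The `global_step=3` and `epoch` of about 0.923 in the output above follow from the batch size and gradient accumulation settings. The sketch below is a back-of-the-envelope check, assuming a single device and assuming that leftover micro-batches which do not fill a complete accumulation window are not counted as a step (which is consistent with this run's output, though other library versions may behave differently).

```python
import math

num_examples = 100            # train[:100]
per_device_batch_size = 8     # per_device_train_batch_size
grad_accum_steps = 4          # gradient_accumulation_steps
num_devices = 1               # assumption: a single GPU

# Examples contributing to each optimizer update
effective_batch_size = per_device_batch_size * grad_accum_steps * num_devices

# Micro-batches in one pass over the data (the last one is partial)
batches_per_epoch = math.ceil(num_examples / (per_device_batch_size * num_devices))

# Optimizer updates per epoch, counting only complete accumulation windows
optimizer_steps = max(batches_per_epoch // grad_accum_steps, 1)

# Fraction of the epoch actually consumed by those updates
epoch_fraction = optimizer_steps * grad_accum_steps / batches_per_epoch

print(effective_batch_size, batches_per_epoch, optimizer_steps, epoch_fraction)
```

Gradient accumulation gives an effective batch of 32 examples while only holding 8 in memory at a time, at the cost of fewer optimizer updates per epoch.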