<a href="https://colab.research.google.com/github/wandb/edu/blob/main/llm-training-course/colab/finetuning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>
<!--- @wandbcode{llmtraining-colab} -->

# Training a 3B Llama on an instruction dataset with Weights & Biases, HuggingFace, LoRA and Quantization

Tested on a Google Colab V100 GPU. Check out the [W&B HuggingFace documentation](https://docs.wandb.ai/guides/integrations/huggingface) for more details.

In [1]:
!python -m pip install -U wandb transformers trl datasets "protobuf==3.20.3" evaluate peft bitsandbytes accelerate sentencepiece -qqq

[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m2.2/2.2 MB[0m [31m9.1 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m8.5/8.5 MB[0m [31m39.9 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m155.3/155.3 kB[0m [31m17.0 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m510.5/510.5 kB[0m [31m31.7 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m84.1/84.1 kB[0m [31m9.4 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m190.9/190.9 kB[0m [31m19.4 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m105.0/105.0 MB[0m [31m4.7 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m280.0/280.0 kB[0m [31m24.2 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━

In [2]:
!wget https://github.com/wandb/edu/raw/main/llm-training-course/colab/utils.py

--2024-03-03 16:42:55--  https://github.com/wandb/edu/raw/main/llm-training-course/colab/utils.py
Resolving github.com (github.com)... 140.82.112.3
Connecting to github.com (github.com)|140.82.112.3|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://raw.githubusercontent.com/wandb/edu/main/llm-training-course/colab/utils.py [following]
--2024-03-03 16:42:55--  https://raw.githubusercontent.com/wandb/edu/main/llm-training-course/colab/utils.py
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 8155 (8.0K) [text/plain]
Saving to: ‘utils.py’


2024-03-03 16:42:56 (68.3 MB/s) - ‘utils.py’ saved [8155/8155]



Let's grab the Alpaca-GPT4 dataset (the Alpaca instructions with responses generated by GPT-4), which is stored as a W&B Artifact with a train/test split:

In [3]:
import wandb
wandb.init(project="alpaca_ft",  # W&B project that groups these runs
           job_type="train",
           tags=["hf_sft_lora", "3b"])  # tags for filtering and grouping runs in the workspace
# Download the pre-split (train/test) Alpaca-GPT4 dataset artifact
artifact = wandb.use_artifact('capecape/alpaca_ft/alpaca_gpt4_splitted:v4', type='dataset')
artifact_dir = artifact.download()


[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /root/.netrc


[34m[1mwandb[0m: \ 1 of 2 files downloaded...[34m[1mwandb[0m:   2 of 2 files downloaded.  


In [4]:
from datasets import load_dataset
alpaca_ds = load_dataset("json", data_dir=artifact_dir)


In [5]:
# Subsample the training and test datasets to keep this demo quick; you may want to switch to the full dataset in your experiments
alpaca_ds['train'] = alpaca_ds['train'].select(range(512))
alpaca_ds['test'] = alpaca_ds['test'].select(range(10))

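Next, we define the Alpaca prompt templates that turn each row into a single training string. Rows with an empty `input` field use the shorter template without the `### Input:` section.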

In [6]:
def prompt_no_input(row):
    return ("Below is an instruction that describes a task. "
            "Write a response that appropriately completes the request.\n\n"
            "### Instruction:\n{instruction}\n\n### Response:\n{output}").format_map(row)

def prompt_input(row):
    return ("Below is an instruction that describes a task, paired with an input that provides further context. "
            "Write a response that appropriately completes the request.\n\n"
            "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n{output}").format_map(row)

def create_prompt(row):
    return prompt_no_input(row) if row["input"] == "" else prompt_input(row)

In [7]:
train_dataset = alpaca_ds["train"]
eval_dataset = alpaca_ds["test"]

In [8]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

In [9]:
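# model_id = 'openlm-research/open_llama_3b_v2'  # 3B OpenLLaMA v2 model
# model_id = 'meta-llama/Llama-2-70b-chat-hf'    # too large for a single V100, even in 4-bit
model_id = 'meta-llama/Llama-2-7b-chat-hf'       # gated on the Hub: requires an approved access request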

Let's define our configurations for LoRA, quantization and model training so that everything fits on our GPU.

In [11]:
from peft import LoraConfig, get_peft_model

peft_config = LoraConfig(
    r=64,  # rank of the LoRA update matrices
    lora_alpha=16,  # scaling factor: the LoRA update is scaled by lora_alpha / r
    lora_dropout=0.1,  # dropout applied to the input of the LoRA layers
    bias="none",  # do not train any bias parameters
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections that get LoRA adapters
)
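To get a feel for these settings, the sketch below does the LoRA arithmetic by hand. Each adapted `d_out × d_in` linear layer gains two trainable matrices, A (`r × d_in`) and B (`d_out × r`). The model dimensions used here (hidden size 4096, 32 decoder layers, all four attention projections 4096 × 4096) are assumptions taken from Llama-2-7B's published configuration; adjust them for a different model.

```python
def lora_params(d_in, d_out, r):
    # A is r x d_in, B is d_out x r
    return r * d_in + d_out * r

r, lora_alpha = 64, 16
scaling = lora_alpha / r  # multiplier applied to the B @ A update

# Assumed Llama-2-7B shapes: hidden size 4096, 32 decoder layers,
# q/k/v/o projections all 4096 x 4096 (the 7B model has no grouped-query attention)
hidden, n_layers = 4096, 32
target_modules = ["q_proj", "k_proj", "v_proj", "o_proj"]

per_layer = sum(lora_params(hidden, hidden, r) for _ in target_modules)
total = per_layer * n_layers
print(f"scaling = {scaling}")
print(f"trainable LoRA params = {total:,}")
```

That is roughly 1% of the ~6.7B base weights, which stay frozen during training.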

In [12]:
from transformers import BitsAndBytesConfig

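bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # store the frozen base weights in 4 bits
    bnb_4bit_quant_type="nf4",             # 4-bit NormalFloat quantization
    bnb_4bit_use_double_quant=True,        # also quantize the quantization constants
    bnb_4bit_compute_dtype=torch.float16   # dequantize to fp16 for the matmuls
)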
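A back-of-the-envelope estimate shows why 4-bit loading matters on a single V100. The sketch counts only the weights and assumes ~6.74B parameters (Llama-2-7B); it ignores quantization constants, layers that stay unquantized, activations and optimizer state, so treat the numbers as rough lower bounds.

```python
def weight_memory_gb(n_params, bits_per_param):
    # bytes = params * bits / 8, reported in GB (1e9 bytes)
    return n_params * bits_per_param / 8 / 1e9

n_params = 6.74e9  # assumed parameter count of Llama-2-7B
for name, bits in [("fp32", 32), ("fp16", 16), ("nf4", 4)]:
    print(f"{name:>5}: {weight_memory_gb(n_params, bits):5.1f} GB")
```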

In [13]:
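model_kwargs = dict(
    device_map={"" : 0},            # place the whole model on GPU 0
    trust_remote_code=True,
    # low_cpu_mem_usage=True,
    torch_dtype=torch.float16,      # V100s have no native bf16 support, so use fp16
    # use_flash_attention_2=True,   # FlashAttention-2 needs an Ampere or newer GPU
    use_cache=False,                # the KV cache is not used with gradient checkpointing
    quantization_config=bnb_config,
)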

In [14]:
from transformers import TrainingArguments
from trl import SFTTrainer

In [23]:
batch_size = 2
gradient_accumulation_steps = 16
num_train_epochs = 10
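With gradient accumulation, the optimizer steps once every `gradient_accumulation_steps` micro-batches, so the effective batch size is the product below (a single GPU is assumed). Because the trainer below uses `packing=True` with `max_seq_length=1024`, every sequence is a full 1024-token chunk, which also lets us estimate the tokens processed per optimizer step.

```python
batch_size = 2
gradient_accumulation_steps = 16
num_gpus = 1           # assumption: a single Colab GPU
max_seq_length = 1024  # value passed to SFTTrainer below

effective_batch = batch_size * gradient_accumulation_steps * num_gpus
tokens_per_step = effective_batch * max_seq_length
print(f"effective batch size: {effective_batch}")
print(f"tokens per optimizer step: {tokens_per_step:,}")
```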

We'll add a `report_to="wandb"` flag here to get the benefits of [W&B HuggingFace integration](https://docs.wandb.ai/guides/integrations/huggingface).

In [24]:
output_dir = "./output/"
training_args = TrainingArguments(
    num_train_epochs=num_train_epochs,
    output_dir=output_dir,
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    fp16=True,
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    gradient_accumulation_steps=gradient_accumulation_steps,
    gradient_checkpointing=True,
    gradient_checkpointing_kwargs=dict(use_reentrant=False),
    evaluation_strategy="epoch",  # renamed to eval_strategy in newer transformers releases
    logging_strategy="steps",
    logging_steps=1,
    save_strategy="epoch",
    report_to="wandb",
)
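`lr_scheduler_type="cosine"` with `warmup_ratio=0.1` ramps the learning rate linearly from 0 to `learning_rate` over the first 10% of optimizer steps, then decays it along a half cosine to 0. The sketch below reproduces that shape in plain Python; it mirrors, as we understand it, the formula behind transformers' `get_cosine_schedule_with_warmup`, and the total of 40 optimizer steps is an arbitrary assumption for illustration.

```python
import math

def cosine_with_warmup(step, total_steps, warmup_steps, peak_lr):
    if step < warmup_steps:
        # Linear warmup from 0 up to peak_lr
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    # Half cosine from peak_lr down to 0
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

total_steps = 40                              # hypothetical number of optimizer steps
warmup_steps = math.ceil(0.1 * total_steps)   # 10% warmup
for step in (0, 2, 4, 22, 40):
    print(step, f"{cosine_with_warmup(step, total_steps, warmup_steps, 2e-4):.2e}")
```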

In [17]:
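# Llama 2 checkpoints are gated on the Hub: log in with a token from an account that has been granted access
from huggingface_hub import notebook_login
notebook_login()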


In [25]:
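from utils import LLMSampleCB

# Note: newer TRL releases move options such as packing and max_seq_length
# onto an SFTConfig passed as `args` instead of SFTTrainer keyword arguments
trainer = SFTTrainer(
    model=model_id,                  # SFTTrainer loads the model from this Hub ID
    model_init_kwargs=model_kwargs,  # ...using the 4-bit quantization settings above
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    packing=True,                    # concatenate examples into fixed-length chunks
    max_seq_length=1024,             # length of each packed chunk, in tokens
    args=training_args,
    formatting_func=create_prompt,   # turn each row into a prompt string
    peft_config=peft_config,         # wrap the model with the LoRA adapters
)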

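With `packing=True`, examples are not padded one by one. Roughly, the tokenized examples are concatenated, separated by an EOS token, and the stream is cut into fixed chunks of `max_seq_length` tokens, so no compute is spent on padding. The sketch below illustrates the idea on toy token IDs; it is a simplification, not TRL's actual implementation (which, for instance, buffers examples and may treat the leftover tail differently).

```python
def pack(token_seqs, seq_len, eos_id):
    # Concatenate all examples into one stream, appending EOS after each one
    stream = []
    for seq in token_seqs:
        stream.extend(seq)
        stream.append(eos_id)
    # Cut into full chunks of seq_len tokens; the incomplete tail is dropped here
    n_full = len(stream) // seq_len
    return [stream[i * seq_len:(i + 1) * seq_len] for i in range(n_full)]

examples = [[11, 12, 13], [21, 22], [31, 32, 33, 34]]  # toy token IDs
print(pack(examples, seq_len=4, eos_id=0))
```

In this simple scheme a chunk can span the end of one instruction and the start of the next, and the number of training sequences per epoch depends on the total token count rather than on the 512 original rows.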

In [26]:
# Build evaluation prompts with the answer removed, so the model has to generate it
def create_prompt_no_answer(row):
    row["output"] = ""
    return {"text": create_prompt(row)}

test_dataset = eval_dataset.map(create_prompt_no_answer)
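Blanking `output` leaves each test prompt ending right after `### Response:\n`, so whatever the model generates next is its answer. The sketch below shows this with the no-input template and a made-up row. `build_prompt` and `extract_response` are hypothetical helpers (not part of `utils.py`); `extract_response` assumes the decoded text starts with the exact prompt, which is a simplification since tokenizer round trips can alter whitespace.

```python
# Same no-input template as prompt_no_input above
TEMPLATE = ("Below is an instruction that describes a task. "
            "Write a response that appropriately completes the request.\n\n"
            "### Instruction:\n{instruction}\n\n### Response:\n{output}")

def build_prompt(row, with_answer=True):
    # Copy the row instead of mutating it, then optionally blank the answer
    row = dict(row)
    if not with_answer:
        row["output"] = ""
    return TEMPLATE.format_map(row)

def extract_response(decoded, prompt):
    # Hypothetical helper: decoded text = prompt + generated continuation
    if not decoded.startswith(prompt):
        raise ValueError("decoded text does not start with the prompt")
    return decoded[len(prompt):].strip()

row = {"instruction": "Name a primary color.", "input": "", "output": "Red."}  # made-up row
train_text = build_prompt(row)
test_prompt = build_prompt(row, with_answer=False)
print(repr(test_prompt[-15:]))

# Pretend the model generated " Blue." after the prompt
print(extract_response(test_prompt + " Blue.", test_prompt))
```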

We'll add a custom W&B callback to the trainer that samples generations from the model and logs them to the W&B dashboard. Review the [W&B HuggingFace documentation](https://docs.wandb.ai/guides/integrations/huggingface) for the most up-to-date best practices.

In [27]:
wandb_callback = LLMSampleCB(trainer, test_dataset, num_samples=10, max_new_tokens=256)

In [28]:
trainer.add_callback(wandb_callback)

It's time to train!

In [None]:
trainer.train()
wandb.finish()

[34m[1mwandb[0m: Currently logged in as: [33mbalabala76[0m. Use [1m`wandb login --relogin`[0m to force relogin


| Epoch | Training Loss | Validation Loss |
|------:|--------------:|----------------:|
| 0 | 1.4055 | 1.280125 |
| 1 | 1.1598 | 1.130723 |
| 2 | 1.1175 | 1.113722 |
| 3 | 1.0876 | 1.063155 |


Checkpoint destination directory ./output/checkpoint-3 already exists and is non-empty. Saving will proceed but saved results may be invalid.
wandb: Adding directory to artifact (./output/checkpoint-3)... Done. 4.3s

wandb: Adding directory to artifact (./output/checkpoint-7)... Done. 8.3s

wandb: Adding directory to artifact (./output/checkpoint-10)... Done. 3.5s

wandb: Adding directory to artifact (./output/checkpoint-14)... Done. 3.7s


Check out the sample generations in your W&B workspace. We trained on a very small sample of the dataset, so they likely won't be good. Try to improve this result: train on a larger dataset and experiment with different hyperparameters and settings. Then share a W&B report with your results. Good luck!