This notebook is taken directly from https://github.com/tcapelle/llm_recipes/tree/main

# From Llama to Alpaca: Finetuning an LLM with Weights & Biases
In this notebook you will learn how to finetune a pretrained Llama model on an instruction dataset. We will use an updated version of the Alpaca dataset whose responses were generated with GPT-4 instead of davinci-003 (GPT-3), which gives an even better instruction dataset. More details are on the [official repo page](https://github.com/Instruction-Tuning-with-GPT-4/GPT-4-LLM#how-good-is-the-data).

> This notebook requires an A100/A10 GPU with at least 24GB of memory. You could scale the parameters down and run it on a T4, but training would take a very long time.

This notebook has a companion project and [report](wandb.me/alpaca).

In [1]:
!pip install wandb transformers trl datasets "protobuf==3.20.3" evaluate

## With Hugging Face TRL

Let's grab the Alpaca (GPT-4 curated instructions and outputs) dataset:

In [2]:
# !wget https://raw.githubusercontent.com/Instruction-Tuning-with-GPT-4/GPT-4-LLM/main/data/alpaca_gpt4_data.json

In [3]:
import json

dataset_file = "alpaca_gpt4_data.json"

with open(dataset_file, "r") as f:
    alpaca = json.load(f)

In [4]:
import wandb
wandb.init(project="alpaca_ft",  # the W&B project this run is logged to
           tags=["hf_sft"])      # tags used to group and filter runs
artifact = wandb.use_artifact('capecape/alpaca_ft/alpaca_gpt4_splitted:latest', type='dataset')
artifact_dir = artifact.download()

[34m[1mwandb[0m: Currently logged in as: [33mnelectric[0m ([33mneelectric[0m). Use [1m`wandb login --relogin`[0m to force relogin


[34m[1mwandb[0m: wandb version 0.16.2 is available!  To upgrade, please run:
[34m[1mwandb[0m:  $ pip install wandb --upgrade


[34m[1mwandb[0m: Tracking run with wandb version 0.16.1


[34m[1mwandb[0m: Run data is saved locally in [35m[1m/home/service/BioLlama/utilities/finetuning/wandb/run-20240117_002102-30bl1xzs[0m
[34m[1mwandb[0m: Run [1m`wandb offline`[0m to turn off syncing.


[34m[1mwandb[0m: Syncing run [33mfrosty-waterfall-40[0m


[34m[1mwandb[0m: ⭐️ View project at [34m[4mhttps://wandb.ai/neelectric/alpaca_ft[0m


[34m[1mwandb[0m: 🚀 View run at [34m[4mhttps://wandb.ai/neelectric/alpaca_ft/runs/30bl1xzs[0m


[34m[1mwandb[0m: \ 1 of 2 files downloaded...

[34m[1mwandb[0m:   2 of 2 files downloaded.  


In [5]:
print(artifact_dir)
from datasets import load_dataset
alpaca_ds = load_dataset("json", data_dir=artifact_dir)



/home/service/BioLlama/utilities/finetuning/artifacts/alpaca_gpt4_splitted:v8


In [6]:
alpaca_ds

DatasetDict({
    train: Dataset({
        features: ['instruction', 'input', 'output'],
        num_rows: 51002
    })
    test: Dataset({
        features: ['instruction', 'input', 'output'],
        num_rows: 1000
    })
})

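The dataset can also be logged as a W&B table so we can inspect it in the workspace. The cells below define the Alpaca prompt templates used to turn each row into a single training string.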

In [7]:
def prompt_no_input(row):
    return ("Below is an instruction that describes a task. "
            "Write a response that appropriately completes the request.\n\n"
            "### Instruction:\n{instruction}\n\n### Response:\n{output}").format_map(row)

In [8]:
def prompt_input(row):
    return ("Below is an instruction that describes a task, paired with an input that provides further context. "
            "Write a response that appropriately completes the request.\n\n"
            "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n{output}").format_map(row)

In [9]:
def create_prompt(row):
    return prompt_no_input(row) if row["input"] == "" else prompt_input(row)
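
To see what the two templates produce, here is a minimal, self-contained sketch. The three functions are copied from the cells above so the snippet runs on its own; the two sample rows are hypothetical and made up for illustration, they are not taken from the dataset.

```python
# Alpaca-style templates, copied from the cells above so this snippet is standalone.
def prompt_no_input(row):
    return ("Below is an instruction that describes a task. "
            "Write a response that appropriately completes the request.\n\n"
            "### Instruction:\n{instruction}\n\n### Response:\n{output}").format_map(row)

def prompt_input(row):
    return ("Below is an instruction that describes a task, paired with an input that provides further context. "
            "Write a response that appropriately completes the request.\n\n"
            "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n{output}").format_map(row)

def create_prompt(row):
    # Rows with an empty "input" field use the shorter template.
    return prompt_no_input(row) if row["input"] == "" else prompt_input(row)

# Hypothetical rows (illustration only, not from alpaca_gpt4_data.json).
row_without_input = {"instruction": "Name three primary colors.",
                     "input": "",
                     "output": "Red, yellow, and blue."}
row_with_input = {"instruction": "Translate to French.",
                  "input": "Good morning",
                  "output": "Bonjour"}

print(create_prompt(row_without_input))
print("-" * 40)
print(create_prompt(row_with_input))
```

Only the second prompt contains an `### Input:` section; both end with the reference answer directly after `### Response:`.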

In [10]:
train_dataset = alpaca_ds["train"]
eval_dataset = alpaca_ds["test"]

In [11]:
import torch
# from cti.transformers.transformers.src.transformers.models.auto import AutoModelForCausalLM, AutoTokenizer
from transformers import AutoModelForCausalLM, AutoTokenizer

In [12]:
model_id = 'meta-llama/Llama-2-7b-hf'

In [13]:
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    trust_remote_code=True,
    low_cpu_mem_usage=True,
    torch_dtype=torch.bfloat16,
)

Loading checkpoint shards: 100%|██████████| 2/2 [00:02<00:00,  1.17s/it]




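Training the full model is expensive; if your GPU can fit full-model training, you can skip this part. Llama-2-7B has 32 transformer blocks. The cell below freezes everything and then re-enables gradients for the LM head and for the single block at index `n_freeze` (15), which is consistent with the 333.46M trainable parameters reported further down. To train the last 8 blocks instead, set `n_freeze = 24` and slice `model.model.layers[n_freeze:]`.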

In [14]:
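n_freeze = 15  # index of the one decoder block that stays trainable

# freeze every parameter (disable gradients) ...
for param in model.parameters(): param.requires_grad = False
# ... then re-enable gradients for the LM head and for block `n_freeze` only
for param in model.lm_head.parameters(): param.requires_grad = True
for param in model.model.layers[n_freeze].parameters(): param.requires_grad = True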

In [15]:
# Keep the embeddings frozen for a small memory saving (already covered by the loop above)
model.model.embed_tokens.weight.requires_grad_(False);

In [16]:
def param_count(m):
    params = sum([p.numel() for p in m.parameters()])/1_000_000
    trainable_params = sum([p.numel() for p in m.parameters() if p.requires_grad])/1_000_000
    print(f"Total params: {params:.2f}M, Trainable: {trainable_params:.2f}M")
    return params, trainable_params

params, trainable_params = param_count(model)

Total params: 6738.42M, Trainable: 333.46M
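
The two numbers printed above can be reproduced by hand, which is a useful sanity check on which parameters are actually trainable. The sketch below assumes the usual Llama-2-7B configuration (hidden size 4096, MLP size 11008, vocabulary 32000, 32 blocks, no bias terms, untied LM head); these values are not printed anywhere in this notebook, so treat them as assumptions.

```python
# Assumed Llama-2-7B configuration (not shown in the notebook itself).
hidden_size = 4096
intermediate_size = 11008
vocab_size = 32000
num_layers = 32

attention = 4 * hidden_size * hidden_size    # q, k, v and o projections, no biases
mlp = 3 * hidden_size * intermediate_size    # gate, up and down projections
norms = 2 * hidden_size                      # two RMSNorm weight vectors per block
block = attention + mlp + norms

embed_tokens = vocab_size * hidden_size
lm_head = vocab_size * hidden_size           # separate from the input embeddings
final_norm = hidden_size

total = embed_tokens + num_layers * block + final_norm + lm_head
one_block_plus_head = block + lm_head
last_eight_plus_head = 8 * block + lm_head

print(f"Total params: {total / 1e6:.2f}M")
print(f"One block + lm_head: {one_block_plus_head / 1e6:.2f}M")
print(f"Last 8 blocks + lm_head: {last_eight_plus_head / 1e6:.2f}M")
```

Under these assumptions the total matches the 6738.42M reported by `param_count`, and one block plus the LM head matches the 333.46M trainable figure. Unfreezing the last 8 blocks would give roughly 1750M trainable parameters instead.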


In [17]:
import torch
print(torch.__version__)

1.13.1+cu117


In [18]:
# !pip uninstall transformers -y
!pip install transformers
!pip install -i https://pip.repos.neuron.amazonaws.com transformers-neuronx


Looking in indexes: https://pip.repos.neuron.amazonaws.com


Collecting transformers-neuronx
  Using cached https://pip.repos.neuron.amazonaws.com/transformers-neuronx/transformers_neuronx-0.9.474-py3-none-any.whl (194 kB)




ERROR: Cannot install accelerate==0.26.1 and transformers-neuronx because these package versions have conflicting dependencies.
The conflict is caused by:
    torch-neuronx 1.13.1.1.13.0 depends on torch-xla==1.13.1+torchneurond
    torch-neuronx 1.13.1.1.12.1 depends on torch-xla==1.13.1+torchneuronc
    torch-neuronx 1.13.1.1.12.0 depends on torch-xla==1.13.1+torchneuronc
    torch-neuronx 1.13.1.1.11.0 depends on torch-xla==1.13.1+torchneuronb
    torch-neuronx 1.13.1.1.10.1 depends on torch-xla==1.13.1+torchneurona
    torch-neuronx 1.13.1.1.10.0 depends on torch-xla==1.13.1+torchneurona
    torch-neuronx 1.13.1.1.9.1 depends on torch-xla==1.13.1+torchneuron8
    torch-neuronx 1.13.1.1.9.0 depends on torch-xla==1.13.1+torchneuron8
    torch-neuronx 1.13.1.1.8.0 depends on torch-xla==1.13.1+torchneuron7
    torch-neuronx 1.13.1.1.7.0 depends on torch-xla==1.13.1+torchneuron6
    torch-neuronx 1.13.0.1.6.1 depends on torch-xla==1.13.0+torchneuron5
   

In [19]:
# from transformers import TrainingArguments
from trl import SFTTrainer

In [20]:
batch_size = 32

total_num_steps = 11_210 // batch_size
print(total_num_steps)
print("changing total batch size down to 100 to save time")

350
changing total batch size down to 100 to save time


In [21]:
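# The original run imported TrainingArguments from a local transformers checkout
# (cti.transformers.transformers.src.transformers); the installed package exposes the same class.
from transformers import TrainingArguments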

In [22]:
# !pip uninstall torch_xla -y
# !pip install -i https://pip.repos.neuron.amazonaws.com torch-xla

In [23]:
output_dir = "/home/service/BioLlama/utilities/finetuning/finetuned_models/"
training_args = TrainingArguments(
    output_dir=output_dir,
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size//4,
    bf16=True,
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    warmup_steps=total_num_steps // 10,
    # num_train_epochs=1,
    max_steps=total_num_steps,
    gradient_accumulation_steps=1,
    gradient_checkpointing=True,
    evaluation_strategy="steps",
    eval_steps=total_num_steps // 3,
    # logging strategies
    logging_dir=f"{output_dir}/logs",
    logging_strategy="steps",
    logging_steps=1,
    save_strategy="epoch",  # save a checkpoint at the end of each epoch
)
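
Several of the arguments above are derived from `total_num_steps`, so it helps to work the numbers out once. The sketch below recomputes them and evaluates a linear-warmup plus cosine-decay curve at a few steps. The `lr_at` helper is hypothetical and only approximates the shape that `lr_scheduler_type="cosine"` with `warmup_steps` is expected to produce; it is not the scheduler code from transformers.

```python
import math

batch_size = 32
total_num_steps = 11_210 // batch_size   # 350, as printed earlier
warmup_steps = total_num_steps // 10     # 35
eval_steps = total_num_steps // 3        # 116
peak_lr = 2e-4

def lr_at(step):
    """Linear warmup to peak_lr, then half-cosine decay to zero at total_num_steps."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_num_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# Steps at which an "every eval_steps" evaluation would fire.
eval_points = list(range(eval_steps, total_num_steps + 1, eval_steps))
print("warmup_steps:", warmup_steps)
print("eval_steps:", eval_steps, "->", eval_points)

for step in (0, warmup_steps, total_num_steps // 2, total_num_steps):
    print(f"step {step:3d}: lr = {lr_at(step):.2e}")
```

With these values the learning rate peaks at 2e-4 after 35 steps and evaluation runs three times, at steps 116, 232 and 348.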

In [24]:
# from utils import LLMSampleCB, token_accuracy

trainer = SFTTrainer(
    model,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    packing=True,
    max_seq_length=1024,
    args=training_args,
    formatting_func=create_prompt,
    # compute_metrics=token_accuracy,
)
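
With `packing=True`, short examples are concatenated and cut into blocks of `max_seq_length` tokens instead of being padded one by one. The following is a simplified sketch of that idea using toy token ids; the `pack` helper is hypothetical, and TRL's real implementation may differ in details such as buffering and how the last partial block is handled.

```python
from itertools import chain

def pack(tokenized_examples, eos_id, seq_length):
    """Join examples into one token stream (EOS after each) and cut fixed-size blocks."""
    stream = list(chain.from_iterable(ex + [eos_id] for ex in tokenized_examples))
    n_blocks = len(stream) // seq_length   # the incomplete tail is dropped in this sketch
    return [stream[i * seq_length:(i + 1) * seq_length] for i in range(n_blocks)]

# Toy token ids made up for illustration; 0 plays the role of the EOS token.
examples = [[1, 2, 3], [4, 5], [6, 7, 8, 9], [10]]
blocks = pack(examples, eos_id=0, seq_length=4)

for block in blocks:
    print(block)
```

Every block has exactly `seq_length` tokens, so no compute is spent on padding. The trade-off is that one block can contain the end of one example and the start of the next.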



In [25]:
# remove the answers so the prompt ends right after "### Response:"
def create_prompt_no_answer(row):
    row["output"] = ""
    return {"text": create_prompt(row)}

test_dataset = eval_dataset.map(create_prompt_no_answer)

In [26]:
# wandb_callback = LLMSampleCB(trainer, test_dataset, num_samples=10, max_new_tokens=256)

In [27]:
# trainer.add_callback(wandb_callback)

In [28]:
trainer.train()
wandb.finish()

You're using a LlamaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...




Step,Training Loss,Validation Loss


In [None]:
import os
print(os.path.abspath(output_dir))

In [None]:
trainer.save_model(output_dir)
#print contents of output_dir
!ls -l $output_dir
#print full path of output_dir
# !pwd $output_dir