# **Llama 2 Fine-Tuning**

Reading Source: https://www.datacamp.com/tutorial/fine-tuning-llama-2

In [1]:
%%capture
%pip install accelerate peft bitsandbytes transformers trl

In [2]:
import os
import torch
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
    pipeline,
    logging,
)
from peft import LoraConfig
from trl import SFTTrainer

In [3]:
# Model from Hugging Face hub
base_model = "NousResearch/Llama-2-7b-chat-hf"

# New instruction dataset
guanaco_dataset = "mlabonne/guanaco-llama2-1k"

# Fine-tuned model
new_model = "llama-2-7b-chat-guanaco"

## Loading dataset, model, and tokenizer

In [4]:
dataset = load_dataset(guanaco_dataset, split="train")

In [5]:
# Dataset parquet downloaded and prepared to /root/.cache/huggingface/datasets/parquet/mlabonne--guanaco-llama2-1k-f1f1134768f90029/0.0.0/0b6d5799bb726b24ad7fc7be720c170d8e497f575d02d47537de9a5bac074901. Subsequent calls will reuse this data.
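
Each row of guanaco-llama2-1k keeps its whole example in a single text field. That field appears to follow the Llama 2 chat template, and the inference cells later in this notebook wrap prompts in the same [INST] tags. The sketch below builds that format by hand. The helper names are hypothetical, the training-row layout is an assumption, and it only covers single-turn prompts without a <<SYS>> system block.

```python
# Minimal sketch of the Llama 2 single-turn chat format.
# Assumption: training rows look like "<s>[INST] question [/INST] answer </s>".
# Both helpers are hypothetical, not part of any library.


def build_llama2_prompt(user_message: str) -> str:
    """Wrap a single user turn in Llama 2 [INST] tags, as in the inference cells."""
    return f"<s>[INST] {user_message.strip()} [/INST]"


def build_llama2_example(user_message: str, answer: str) -> str:
    """Assumed training-row layout: prompt, answer, then the </s> end marker."""
    return f"{build_llama2_prompt(user_message)} {answer.strip()} </s>"


if __name__ == "__main__":
    print(build_llama2_prompt("Who is Leonardo Da Vinci?"))
    # <s>[INST] Who is Leonardo Da Vinci? [/INST]
    print(build_llama2_example("What is 2 + 2?", "4"))
    # <s>[INST] What is 2 + 2? [/INST] 4 </s>
```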

## 4-bit quantization configuration

In [6]:
compute_dtype = torch.float16  # dtype used for matmuls on the dequantized weights

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=compute_dtype,
    bnb_4bit_use_double_quant=False,
)
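
To see why 4-bit loading matters, the sketch below estimates weight memory for a model of roughly 7 billion parameters at several precisions. With bnb_4bit_compute_dtype set to float16, the weights are stored in 4-bit NF4 and dequantized to float16 on the fly for matrix multiplications. The estimate assumes every weight is stored at the listed bit width and ignores activations, the LoRA adapter, optimizer state, layers kept in higher precision, and per-block quantization constants, so treat the numbers as lower bounds.

```python
# Rough weight-only memory estimate at different precisions.
# Assumption: all parameters use the listed bit width; real 4-bit loading keeps
# some layers in higher precision and stores quantization constants, so actual
# usage is higher.

NUM_PARAMS = 7_000_000_000  # rounded; Llama 2 7B has roughly 6.7B parameters

BITS_PER_PARAM = {"fp32": 32, "fp16": 16, "int8": 8, "nf4": 4}


def weight_memory_gb(num_params: int, bits: int) -> float:
    """Memory for the weights alone, in decimal gigabytes (1e9 bytes)."""
    return num_params * bits / 8 / 1e9


if __name__ == "__main__":
    for name, bits in BITS_PER_PARAM.items():
        print(f"{name:>5}: {weight_memory_gb(NUM_PARAMS, bits):5.1f} GB")
```

Under these assumptions the weights shrink from about 14 GB in fp16 to about 3.5 GB in NF4, which is what lets a 7B model fit on a single consumer or Colab-class GPU.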

## Loading Llama 2 model

In [7]:
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=quant_config,
    device_map={"": 0}
)
model.config.use_cache = False
model.config.pretraining_tp = 1

## Loading tokenizer

In [8]:
tokenizer = AutoTokenizer.from_pretrained(base_model, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"
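
The cell above reuses the EOS token as the padding token and pads on the right. The hypothetical helper below mimics that on plain token-id lists and builds the matching attention mask. Llama 2's EOS id is 2 and its BOS id is 1; the other ids are made up for illustration.

```python
# Sketch of right padding with EOS reused as the pad token.
# pad_right is a hypothetical helper; token ids other than BOS=1 and EOS=2
# are illustrative only.
from typing import List, Tuple

EOS_ID = 2  # Llama 2: <s> = 1, </s> = 2


def pad_right(
    batch: List[List[int]], pad_id: int = EOS_ID
) -> Tuple[List[List[int]], List[List[int]]]:
    """Pad each sequence on the right to the longest length in the batch.

    Returns (input_ids, attention_mask). The mask is 1 for real tokens and
    0 for padding, so attention ignores the padded positions.
    """
    max_len = max(len(seq) for seq in batch)
    input_ids, attention_mask = [], []
    for seq in batch:
        n_pad = max_len - len(seq)
        input_ids.append(seq + [pad_id] * n_pad)
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return input_ids, attention_mask


if __name__ == "__main__":
    ids, mask = pad_right([[1, 10, 11, 2], [1, 12]])
    print(ids)   # [[1, 10, 11, 2], [1, 12, 2, 2]]
    print(mask)  # [[1, 1, 1, 1], [1, 1, 0, 0]]
```

One caveat worth checking in your own setup: because pad and EOS share an id, a data collator that masks every pad-token id out of the loss also masks the real EOS tokens, which can weaken the model's sense of when to stop.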

## PEFT (Parameter-Efficient Fine-Tuning) parameters

In [9]:
peft_params = LoraConfig(
    lora_alpha=16,
    lora_dropout=0.1,
    r=64,
    bias="none",
    task_type="CAUSAL_LM",
)
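
LoRA keeps each targeted weight matrix W frozen and learns a low-rank update B·A scaled by lora_alpha / r, so with lora_alpha=16 and r=64 the scale is 0.25. The pure-Python sketch below shows that forward pass and counts the trainable parameters. The layer shapes (hidden size 4096, 32 decoder layers) are those of Llama 2 7B; treating q_proj and v_proj as the targeted modules is an assumption about PEFT's default for Llama when target_modules is not set, as in the config above. lora_dropout is omitted for brevity.

```python
# LoRA sketch: h = W x + (alpha / r) * B (A x), with W (d_out x d_in) frozen,
# A (r x d_in) and B (d_out x r) trainable. Plain lists keep it dependency-free.
from typing import List

Matrix = List[List[float]]


def matvec(m: Matrix, x: List[float]) -> List[float]:
    """Matrix-vector product."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


def lora_forward(W: Matrix, A: Matrix, B: Matrix, x: List[float],
                 alpha: float, r: int) -> List[float]:
    """Frozen base projection plus the scaled low-rank update (no dropout)."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]


def lora_param_count(d_out: int, d_in: int, r: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in linear layer."""
    return r * (d_in + d_out)


if __name__ == "__main__":
    r, alpha = 64, 16
    print("scaling:", alpha / r)  # 0.25
    # Assumption: LoRA on q_proj and v_proj (4096 x 4096 each) in 32 layers.
    per_layer = 2 * lora_param_count(4096, 4096, r)
    print("trainable LoRA params:", 32 * per_layer)  # 33554432
```

Since PEFT initializes B to zeros by default, the adapted layer starts out identical to the base layer. Under the assumptions above, about 33.6M parameters are trained, well under 1% of the model.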

## Training parameters

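Hyperparameters:

output_dir: Directory where model predictions and checkpoints are stored.

num_train_epochs: Number of training epochs (one here).

fp16/bf16: Whether to use fp16 or bf16 mixed-precision training (both disabled here).

per_device_train_batch_size: Batch size per GPU for training.

per_device_eval_batch_size: Batch size per GPU for evaluation (not set below, so the default is used).

gradient_accumulation_steps: Number of forward/backward passes whose gradients are accumulated before each optimizer update. The effective batch size is per_device_train_batch_size × gradient_accumulation_steps × number of GPUs.

gradient_checkpointing: Recomputes activations during the backward pass instead of storing them, trading extra compute for lower memory (not set below, so it stays disabled).

max_grad_norm: Maximum gradient norm used for gradient clipping.

learning_rate: Initial learning rate.

weight_decay: Weight decay, applied to all parameters except biases and LayerNorm weights.

optim: Optimizer. paged_adamw_32bit is a paged AdamW variant from bitsandbytes that can page optimizer state out to CPU memory when GPU memory runs short.

lr_scheduler_type: Learning-rate schedule. "constant" keeps the rate fixed; in Hugging Face transformers this schedule ignores warmup, whereas "constant_with_warmup" applies it.

max_steps: Total number of training steps; -1 means the count is derived from num_train_epochs.

warmup_ratio: Fraction of the total training steps used for a linear learning-rate warmup.

group_by_length: Batches together sequences of similar length, which reduces padding and can speed up training.

save_steps: Save a checkpoint every 25 update steps.

logging_steps: Log metrics every 25 update steps.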

In [13]:
training_params = TrainingArguments(
    output_dir="./results",
    num_train_epochs=1,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=1,
    optim="paged_adamw_32bit",
    save_steps=25,
    logging_steps=25,
    learning_rate=2e-4,
    weight_decay=0.001,
    fp16=False,
    bf16=False,
    max_grad_norm=0.3,
    max_steps=-1,
    warmup_ratio=0.03,
    group_by_length=True,
    lr_scheduler_type="constant",
    report_to="tensorboard"
)
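
The sketch below works out what these arguments imply for the 1,000-example dataset on a single GPU (an assumption consistent with device_map={"": 0}). With max_steps=-1, the step count comes from the epoch count. The helper is hypothetical and simplifies the Trainer's exact rounding, which only matters when the dataset size is not a multiple of the effective batch size.

```python
# Back-of-the-envelope schedule for the TrainingArguments above.
# training_schedule is a hypothetical helper, not a transformers API.
import math


def training_schedule(num_examples: int, per_device_batch_size: int,
                      grad_accum_steps: int, num_devices: int,
                      num_epochs: int, warmup_ratio: float,
                      log_every: int) -> dict:
    # Examples consumed per optimizer update.
    effective_batch = per_device_batch_size * grad_accum_steps * num_devices
    # With max_steps=-1 the step budget comes from the epoch count.
    total_steps = math.ceil(num_examples / effective_batch) * num_epochs
    # Linear warmup length; only warmup-aware schedulers use it.
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    # Loss rows to expect (also checkpoints here, since save_steps == logging_steps).
    log_events = total_steps // log_every
    return {
        "effective_batch_size": effective_batch,
        "total_steps": total_steps,
        "warmup_steps": warmup_steps,
        "log_events": log_events,
    }


if __name__ == "__main__":
    print(training_schedule(1000, 1, 1, 1, 1, 0.03, 25))
```

With these settings every example is its own optimizer step, giving 1,000 steps in total, which matches the global_step reported after training.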

In [14]:
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=peft_params,
    dataset_text_field="text",
    max_seq_length=None,
    tokenizer=tokenizer,
    args=training_params,
    packing=False,
)


Deprecated positional argument(s) used in SFTTrainer, please use the SFTConfig to set these arguments instead.


Map:   0%|          | 0/1000 [00:00<?, ? examples/s]

In [16]:
trainer.train()

| Step | Training Loss |
|-----:|--------------:|
| 25 | 1.4252 |
| 50 | 1.5941 |
| 75 | 1.2517 |
| 100 | 1.5128 |
| 125 | 1.2275 |
| 150 | 1.4759 |
| 175 | 1.073 |
| 200 | 1.4942 |
| 225 | 1.1821 |
| 250 | 1.5537 |


TrainOutput(global_step=1000, training_loss=1.308550672531128, metrics={'train_runtime': 2419.589, 'train_samples_per_second': 0.413, 'train_steps_per_second': 0.413, 'total_flos': 1.679542884421632e+16, 'train_loss': 1.308550672531128, 'epoch': 1.0})
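
A quick consistency check on this TrainOutput: with one example per step, steps per second and samples per second coincide, and logging_steps=25 over 1,000 steps should produce 40 loss rows. The loss table above stops at step 250, so it appears to show only part of the training log. The numbers below are copied from the output; nothing else is assumed.

```python
# Consistency check on the reported TrainOutput metrics.
train_runtime_s = 2419.589  # from TrainOutput metrics
global_step = 1000
logging_steps = 25

seconds_per_step = train_runtime_s / global_step
steps_per_second = global_step / train_runtime_s
expected_log_rows = global_step // logging_steps

print(f"{seconds_per_step:.2f} s/step")
print(f"{steps_per_second:.3f} steps/s")  # compare with train_steps_per_second
print(expected_log_rows, "loss rows expected")
```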

In [18]:
# Saves only the LoRA adapter weights and config, not the full base model
trainer.model.save_pretrained(new_model)
# Save the tokenizer directly; Trainer.tokenizer is deprecated
tokenizer.save_pretrained(new_model)

('llama-2-7b-chat-guanaco/tokenizer_config.json',
 'llama-2-7b-chat-guanaco/special_tokens_map.json',
 'llama-2-7b-chat-guanaco/tokenizer.model',
 'llama-2-7b-chat-guanaco/added_tokens.json',
 'llama-2-7b-chat-guanaco/tokenizer.json')

## Evaluation

In [21]:
from tensorboard import notebook
log_dir = "results/runs"
notebook.start("--logdir {} --port 4000".format(log_dir))

Reusing TensorBoard on port 4000 (pid 162), started 0:00:40 ago. (Use '!kill 162' to kill it.)

In [22]:
logging.set_verbosity(logging.CRITICAL)

prompt = "Who is Leonardo Da Vinci?"
pipe = pipeline(task="text-generation", model=model, tokenizer=tokenizer, max_length=200)
result = pipe(f"<s>[INST] {prompt} [/INST]")
print(result[0]['generated_text'])

<s>[INST] Who is Leonardo Da Vinci? [/INST] Leonardo da Vinci (1452-1519) was a renowned Italian polymath, artist, inventor, engineer, and scientist. He is widely regarded as one of the most influential figures of the Renaissance and is known for his contributions to various fields, including painting, sculpture, anatomy, mathematics, engineering, and architecture.

Da Vinci was born in Vinci, Italy, and was trained in art by his father, a local artist. He later moved to Florence, where he became a member of the Medici court and became known for his work as a painter, sculptor, and engineer. His most famous works include the Mona Lisa and The Last Supper.

Da Vinci was also a prolific inventor and engineer, and his designs for machines and machines have been studied and used to this day. He
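
Two details explain how the answers in this section look. The pipeline returns the prompt together with the completion, and max_length=200 caps prompt plus generated tokens, which is why the answers stop mid-sentence; passing max_new_tokens instead limits only the generated part, and return_full_text=False drops the prompt from the output. The hypothetical helper below strips the prompt from the returned string itself.

```python
# Hypothetical helper: keep only the model's answer from a Llama 2 chat
# generation that still contains the "[INST] ... [/INST]" prompt.


def extract_answer(generated_text: str, tag: str = "[/INST]") -> str:
    """Return the text after the first occurrence of tag, or the whole
    (stripped) string if the tag is missing."""
    _, sep, answer = generated_text.partition(tag)
    return answer.strip() if sep else generated_text.strip()


if __name__ == "__main__":
    sample = "<s>[INST] Who is Leonardo Da Vinci? [/INST] Leonardo da Vinci was an Italian polymath."
    print(extract_answer(sample))  # Leonardo da Vinci was an Italian polymath.
```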


In [23]:
prompt = "What is Datacamp Career track?"
result = pipe(f"<s>[INST] {prompt} [/INST]")
print(result[0]['generated_text'])

<s>[INST] What is Datacamp Career track? [/INST] DataCamp Career Track is a program designed to help individuals gain the skills and knowledge necessary to succeed in a career in data science. The program includes a range of courses and resources, including interactive coding exercises, video lectures, and hands-on projects. Additionally, participants have access to a community of learners and mentors, as well as a range of career support services. The program is designed to be flexible and adaptable to the needs of each individual, allowing learners to set their own pace and focus on the areas of data science that are most relevant to their career goals.

The DataCamp Career Track is designed to help individuals gain the skills and knowledge necessary to succeed in a career in data science, including data analysis, data visualization, machine learning, and data engineering. The program is open to individuals of all skill levels and backgrounds, and learners can choose to
