<a href="https://colab.research.google.com/github/ccrader/python_projects/blob/main/Fine_Tuning_Code_Along_Tuning_Llama_Model.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
!pip install -q -U transformers datasets accelerate peft trl bitsandbytes wandb

In [2]:
from google.colab import userdata

hf_token = userdata.get('huggingface')

In [3]:
import os
import torch
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
    pipeline,
)
from peft import LoraConfig, PeftModel, prepare_model_for_kbit_training
from trl import SFTTrainer

In [4]:
# Fine-tune with LoRA: only the small adapter weights added to the model are trained,
# while the base model's weights stay frozen.

base_model = 'NousResearch/Llama-2-7b-hf'
new_model = 'llama-2-7b-miniplatypus'

dataset = load_dataset('ccrader/mini-platypus', split='train')
tokenizer = AutoTokenizer.from_pretrained(base_model, use_fast=True)
tokenizer.pad_token = tokenizer.eos_token  # reuse the end-of-sequence token for padding
tokenizer.padding_side = 'right'

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


In [5]:
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=['up_proj', 'down_proj', 'gate_proj', 'k_proj', 'q_proj', 'v_proj', 'o_proj']
)

# Load the base model with 4-bit quantization
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=bnb_config,
    device_map={"": 0}
)

model = prepare_model_for_kbit_training(model)

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]
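
To make "training only added weights" concrete, the sketch below estimates how many parameters the LoRA config above adds. It is a rough calculation, not output from this notebook: the layer shapes (hidden size 4096, MLP size 11008, 32 decoder layers) are assumptions taken from the published Llama-2-7B architecture, and the helper name `lora_params` is hypothetical.

```python
# Rough count of the trainable LoRA parameters for the config above.
# ASSUMPTION: shapes follow the published Llama-2-7B architecture
# (hidden size 4096, MLP size 11008, 32 decoder layers).
HIDDEN, MLP, LAYERS = 4096, 11008, 32
RANK = 16  # r in LoraConfig

# (in_features, out_features) of each targeted projection in one decoder layer
shapes = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, HIDDEN),
    "v_proj": (HIDDEN, HIDDEN),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, MLP),
    "up_proj": (HIDDEN, MLP),
    "down_proj": (MLP, HIDDEN),
}


def lora_params(in_features, out_features, rank):
    """LoRA adds two matrices per layer: A (rank x in) and B (out x rank)."""
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o, RANK) for i, o in shapes.values())
total = per_layer * LAYERS
print(f"{total:,} trainable LoRA parameters")  # 39,976,960
```

Under these assumptions the adapters come to roughly 40 million parameters, well under 1% of the 7B base model. Once the model has been wrapped by PEFT (for example `trainer.model` after the trainer is built), `print_trainable_parameters()` should report the actual figure.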

In [6]:
training_arguments = TrainingArguments(
    output_dir='./results',  # forward slash: '.\results' would embed a '\r' escape
    num_train_epochs=4,
    per_device_train_batch_size=10,
    gradient_accumulation_steps=1,
    evaluation_strategy='steps',
    eval_steps=1000,
    logging_steps=1,
    optim='paged_adamw_8bit',
    learning_rate=2e-4,
    lr_scheduler_type='linear',
    warmup_steps=10,
    report_to='wandb',
    max_steps=20,  # remove for real training; caps the run at 20 steps to save time
)

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    eval_dataset=dataset,
    peft_config=peft_config,
    dataset_text_field='instruction',
    max_seq_length=512,
    tokenizer=tokenizer,
    args=training_arguments
)

# Train the model
trainer.train()

# Save only the LoRA adapter weights (not the full base model)
trainer.model.save_pretrained(new_model)


Deprecated positional argument(s) used in SFTTrainer, please use the SFTConfig to set these arguments instead.
max_steps is given, it will override any value given in num_train_epochs
[34m[1mwandb[0m: Using wandb-core as the SDK backend. Please refer to https://wandb.me/wandb-core for more information.
[34m[1mwandb[0m: Currently logged in as: [33mcrader_28[0m ([33mcrader_28-n-a[0m). Use [1m`wandb login --relogin`[0m to force relogin


`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`.
  return fn(*args, **kwargs)
  with torch.enable_grad(), device_autocast_ctx, torch.cpu.amp.autocast(**ctx.cpu_autocast_kwargs):  # type: ignore[attr-defined]


Step,Training Loss,Validation Loss
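
The training arguments above combine `max_steps=20`, `warmup_steps=10`, and a linear scheduler, so half of this short run is warmup. The sketch below works out how many examples the run consumes and what the learning rate looks like at each step. It is an illustration under the assumption that the scheduler is a linear warmup followed by linear decay to zero; `linear_lr` is a hypothetical helper, not a Transformers function.

```python
# Values copied from the TrainingArguments cell above
BATCH_SIZE, GRAD_ACCUM, MAX_STEPS = 10, 1, 20
BASE_LR, WARMUP_STEPS = 2e-4, 10


def linear_lr(step, base_lr=BASE_LR, warmup_steps=WARMUP_STEPS, total_steps=MAX_STEPS):
    """Learning rate at optimizer step `step`, assuming linear warmup
    from 0 to base_lr followed by linear decay back to 0."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return base_lr * remaining / (total_steps - warmup_steps)


# Each optimizer step consumes batch_size * gradient_accumulation_steps examples
examples_seen = BATCH_SIZE * GRAD_ACCUM * MAX_STEPS
print("examples seen:", examples_seen)

for step in (0, 5, 10, 15, 20):
    print(step, linear_lr(step))
```

With these settings the run sees 200 examples in total, and because `eval_steps=1000` is larger than `max_steps=20`, no evaluation step is reached during the shortened run.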


In [7]:
prompt = 'What is a large language model?'
instruction = f'### Instruction:\n{prompt}\n\n### Response:\n'
pipe = pipeline('text-generation', model=model, tokenizer=tokenizer, max_length=128)
result = pipe(instruction)
print(result[0]['generated_text'][len(instruction):])

Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
  return fn(*args, **kwargs)
Starting from v4.46, the `logits` model output will have the same type as the model (except at train time, where it will always be FP32)



A large language model is a neural network that has been trained on a large corpus of text data, typically millions or billions of words. These models are capable of generating human-like text, responding to questions, and even generating creative works such as poetry and stories. Large language models are often used for tasks such as language translation, text summarization, and sentiment analysis.

### Response:

A large language model is a type of artificial intelligence model that has been trained on a large amount of
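
The output above keeps going after the first answer and starts a second `### Response:` section until `max_length` cuts it off. One way to tidy this up is to trim the completion at the next section marker. The helpers below (`build_prompt`, `extract_response`) are hypothetical names introduced for illustration, and the sketch assumes the `### Instruction:` / `### Response:` template used in the cell above.

```python
def build_prompt(user_prompt):
    """Wrap a question in the instruction template used in this notebook."""
    return f"### Instruction:\n{user_prompt}\n\n### Response:\n"


def extract_response(generated_text, prompt):
    """Return only the first answer from the generated text."""
    # Drop the echoed prompt if the pipeline returned it
    if generated_text.startswith(prompt):
        completion = generated_text[len(prompt):]
    else:
        completion = generated_text
    # Cut at the first section header the model starts to generate on its own
    for marker in ("\n### Response:", "\n### Instruction:"):
        idx = completion.find(marker)
        if idx != -1:
            completion = completion[:idx]
    return completion.strip()


prompt = build_prompt("What is a large language model?")
raw = prompt + "A large language model is a neural network.\n\n### Response:\n\nA large language model is"
print(extract_response(raw, prompt))
```

In the notebook this would be applied as `extract_response(result[0]['generated_text'], instruction)` in place of the manual slice.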


In [8]:
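import gc

# Drop references to the quantized training objects so their memory can be
# reclaimed before the base model is reloaded in fp16
del model
del pipe
del trainer
gc.collect()
gc.collect()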

0

In [9]:
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    low_cpu_mem_usage=True,
    return_dict=True,
    torch_dtype=torch.float16,
    device_map={"": 0},
)
model = PeftModel.from_pretrained(model, new_model)
model = model.merge_and_unload()

# Reload tokenizer to save it
tokenizer = AutoTokenizer.from_pretrained(base_model, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]
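
The `merge_and_unload()` call above folds the adapter into the base weights so the result can be used without PEFT. The toy sketch below illustrates the arithmetic, assuming the standard LoRA update `W + (lora_alpha / r) * B @ A`. The 2x2 matrices and the rank-1 adapter are made-up values for readability; only the scaling factor `32 / 16` comes from the config in this notebook.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add_scaled(w, delta, scale):
    """Element-wise w + scale * delta."""
    return [[wi + scale * di for wi, di in zip(wr, dr)] for wr, dr in zip(w, delta)]


SCALING = 32 / 16  # lora_alpha / r from the LoraConfig above

# Toy values (hypothetical): W is out x in, A is rank x in, B is out x rank
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 2.0]]
B = [[0.5], [-1.0]]
x = [[3.0], [4.0]]  # input as a column vector

# Adapter kept separate: base output plus the scaled low-rank path
zero = [[0.0], [0.0]]
low_rank_out = matmul(B, matmul(A, x))
adapter_out = add_scaled(matmul(W, x), low_rank_out, SCALING)

# Adapter merged: fold the scaled B @ A into W once, then do a single matmul
W_merged = add_scaled(W, matmul(B, A), SCALING)
merged_out = matmul(W_merged, x)

print(adapter_out, merged_out)  # both [[14.0], [-18.0]]
```

Both paths give the same output, which is why the merged model can be saved and pushed as an ordinary checkpoint.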

In [10]:
# Push the merged model and tokenizer to the Hugging Face Hub

model.push_to_hub(new_model, use_temp_dir=False, token=hf_token)
tokenizer.push_to_hub(new_model, use_temp_dir=False, token=hf_token)

Upload 3 LFS files:   0%|          | 0/3 [00:00<?, ?it/s]

model-00003-of-00003.safetensors:   0%|          | 0.00/3.59G [00:00<?, ?B/s]

model-00001-of-00003.safetensors:   0%|          | 0.00/4.94G [00:00<?, ?B/s]

model-00002-of-00003.safetensors:   0%|          | 0.00/4.95G [00:00<?, ?B/s]

README.md:   0%|          | 0.00/5.17k [00:00<?, ?B/s]

tokenizer.model:   0%|          | 0.00/500k [00:00<?, ?B/s]

CommitInfo(commit_url='https://huggingface.co/ccrader/llama-2-7b-miniplatypus/commit/f4443490333bd2afbd05761817b02671c4cdcb0f', commit_message='Upload tokenizer', commit_description='', oid='f4443490333bd2afbd05761817b02671c4cdcb0f', pr_url=None, pr_revision=None, pr_num=None)