<a href="https://colab.research.google.com/github/EdBerg21/AI-Professional-Prompts/blob/main/3excellentADAPTERSFORLLAMA209py_of_uhuuuullama2Finetune_opt_bnb_peft.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Fine-tune large models using 🤗 `peft` adapters, `transformers` & `bitsandbytes`

In this tutorial we show how to fine-tune large language models with the `peft` library, using `bitsandbytes` to load the model in 8-bit.
Fine-tuning relies on Low-Rank Adaptation (LoRA): instead of updating the entire model, you train only a small set of adapter weights and load them into the frozen base model.
After fine-tuning, you can also share your adapters on the 🤗 Hub and load them back easily. Let's get started!
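
To make the adapter idea concrete, here is a minimal pure-Python sketch of a single linear map with a low-rank update added to it. It is not the `peft` implementation: the helper names `matvec` and `lora_forward` and the toy dimensions are hypothetical, chosen only for illustration. It assumes the usual LoRA formulation, in which the frozen weight `W` is left untouched and the output becomes `W x + (alpha / r) * B (A x)`, with `B` initialized to zeros so that training starts from the base model's behavior.

```python
import random


def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def lora_forward(W, A, B, x, alpha, r):
    """Frozen base output plus the scaled low-rank update B @ (A @ x)."""
    base = matvec(W, x)               # W is frozen: never updated in training
    update = matvec(B, matvec(A, x))  # A is (r x d_in), B is (d_out x r)
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]


random.seed(0)
d_in, d_out, r, alpha = 8, 8, 2, 4    # toy sizes, for illustration only

W = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
A = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]
B = [[0.0] * r for _ in range(d_out)]  # zero init: the adapter starts as a no-op
x = [random.gauss(0, 1) for _ in range(d_in)]

y_base = matvec(W, x)
y_lora = lora_forward(W, A, B, x, alpha, r)

full_params = d_in * d_out        # what full fine-tuning would update
lora_params = r * (d_in + d_out)  # what LoRA updates instead

print("same output at init:", y_lora == y_base)
print("full params:", full_params, "| LoRA params:", lora_params)
```

Only `A` and `B` would receive gradients. The saving grows with the layer size, because the adapter scales with `r * (d_in + d_out)` rather than `d_in * d_out`.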

### Install requirements

First, run the cells below to install the requirements:

In [56]:
!pip install -q bitsandbytes datasets accelerate loralib
!pip install -q git+https://github.com/huggingface/transformers.git@main git+https://github.com/huggingface/peft.git



### Model loading

Here we load the `Llama-2-7b-chat-hf` model. Its weights in half precision (float16) take about 13GB on the Hub; loaded in 8-bit, they need around 7GB of memory instead.
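
The two figures follow from simple arithmetic on bytes per parameter. The sketch below assumes a rounded parameter count of 6.7 billion and counts only the weights; activations, optimizer state and quantization overhead are ignored. The helper name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(num_params, bytes_per_param):
    """Approximate memory needed for the weights alone, in gigabytes (1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


NUM_PARAMS = 6.7e9  # assumption: rounded parameter count of a ~7B model

fp16_gb = weight_memory_gb(NUM_PARAMS, 2)  # float16 stores 2 bytes per parameter
int8_gb = weight_memory_gb(NUM_PARAMS, 1)  # 8-bit stores 1 byte per parameter

print(f"float16: ~{fp16_gb:.1f} GB")
print(f"int8:    ~{int8_gb:.1f} GB")
```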

In [2]:
# Supply the token through an environment variable instead of hard-coding it in the notebook.
!huggingface-cli login --token $HF_TOKEN

Token will not been saved to git credential helper. Pass `add_to_git_credential=True` if you want to set the git credential as well.
Token is valid (permission: write).
Your token has been saved to /root/.cache/huggingface/token
Login successful


In [4]:
import os
os.environ["CUDA_VISIBLE_DEVICES"]="0"
import torch
import torch.nn as nn
import bitsandbytes as bnb
from transformers import AutoTokenizer, AutoConfig, AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
     "meta-llama/Llama-2-7b-chat-hf",
     load_in_8bit=True,
     device_map='auto',
)

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")


### Post-processing on the model

Next, we apply some post-processing to the 8-bit model to enable training: we freeze all layers and cast the layer-norm parameters to `float32` for stability. We also cast the output of the last layer to `float32` for the same reason.

In [None]:
for param in model.parameters():
  param.requires_grad = False  # freeze the model - train adapters later
  if param.ndim == 1:
    # cast the small parameters (e.g. layernorm) to fp32 for stability
    param.data = param.data.to(torch.float32)

model.gradient_checkpointing_enable()  # reduce number of stored activations
model.enable_input_require_grads()

class CastOutputToFloat(nn.Sequential):
  def forward(self, x): return super().forward(x).to(torch.float32)
model.lm_head = CastOutputToFloat(model.lm_head)

### Apply LoRA

Here comes the magic of `peft`! We wrap the model in a `PeftModel` and specify low-rank adapters (LoRA) through the `get_peft_model` utility function from `peft`.

In [53]:
def print_trainable_parameters(model):
    """
    Prints the number of trainable parameters in the model.
    """
    trainable_params = 0
    all_param = 0
    for _, param in model.named_parameters():
        all_param += param.numel()
        if param.requires_grad:
            trainable_params += param.numel()
    print(
        f"trainable params: {trainable_params} || all params: {all_param} || trainable%: {100 * trainable_params / all_param}"
    )

In [51]:
from peft import LoraConfig, get_peft_model

config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)

model = get_peft_model(model, config)
print_trainable_parameters(model)

trainable params: 8388608 || all params: 6746804224 || trainable%: 0.12433454005023165
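
The trainable-parameter count printed above can be reproduced by hand. The sketch below assumes Llama-2-7B's published architecture (hidden size 4096, 32 decoder layers, square `q_proj` and `v_proj` matrices); the helper name `lora_param_count` is hypothetical. Each adapted module gains an `r x d_in` matrix and a `d_out x r` matrix.

```python
def lora_param_count(d_in, d_out, r):
    """Parameters added by one LoRA adapter: A is (r x d_in), B is (d_out x r)."""
    return r * d_in + d_out * r


HIDDEN_SIZE = 4096     # assumption: Llama-2-7B hidden size
NUM_LAYERS = 32        # assumption: Llama-2-7B decoder layer count
RANK = 16              # r in the LoraConfig above
TARGETS_PER_LAYER = 2  # q_proj and v_proj

per_module = lora_param_count(HIDDEN_SIZE, HIDDEN_SIZE, RANK)
trainable = per_module * TARGETS_PER_LAYER * NUM_LAYERS

ALL_PARAMS = 6_746_804_224  # "all params" value from the output above
trainable_pct = 100 * trainable / ALL_PARAMS

print("per adapted module:", per_module)
print("trainable params:  ", trainable)
print("trainable %:       ", trainable_pct)
```

Under these assumptions the result matches the 8388608 trainable parameters reported by `print_trainable_parameters`.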


### Training

In [None]:
torch.cuda.empty_cache()

In [None]:
torch.cuda.memory_summary()



In [50]:
import torch
print(torch.rand(1, device="cuda"))

tensor([0.4765], device='cuda:0')


In [None]:
!nvidia-smi

Sun Jan 14 11:53:29 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  Tesla T4                       Off | 00000000:00:04.0 Off |                    0 |
| N/A   73C    P0              31W /  70W |  15097MiB / 15360MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                    

In [None]:
import gc

gc.collect()

torch.cuda.empty_cache()

In [None]:
import torch
foo = torch.tensor([1,2,3])
foo = foo.to('cuda')

In [45]:
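import os
# Only takes effect if set before PyTorch initializes its CUDA allocator
# (safest: set it before importing torch).
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:64"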

In [None]:
!watch nvidia-smi

In [41]:
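import os
# The runtime shown by nvidia-smi above exposes a single GPU (index 0), so indices 2 and 3
# would hide it. This variable only takes effect if set before CUDA is initialized.
os.environ['CUDA_VISIBLE_DEVICES']='0'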

In [None]:
!ps -elf | grep python

4 Z root          58       7  0  80   0 -     0 -      09:42 ?        00:00:05 [python3] <defunct>
4 S root          59       7  0  80   0 - 16611 do_epo 09:42 ?        00:00:00 python3 /usr/local/bi
4 S root         105       7  0  80   0 - 91279 do_epo 09:42 ?        00:00:12 /usr/bin/python3 /usr
4 S root         256     105 46  80   0 - 7251399 do_sel 09:43 ?      01:07:45 /usr/bin/python3 -m c
1 S root         301       1  0  80   0 - 135331 futex_ 09:43 ?       00:00:19 /usr/bin/python3 /usr
0 S root       37143   37141  0  80   0 -  1621 pipe_r 12:08 ?        00:00:00 grep python


In [None]:
!kill -9 256

In [None]:
import transformers
from datasets import load_dataset

# Llama has no pad token by default; reuse EOS so the collator can pad batches.
tokenizer.pad_token = tokenizer.eos_token

data = load_dataset("domenicrosati/TruthfulQA")
# Tokenize the 'Question' column; the collator below builds the causal-LM labels.
data = data.map(lambda samples: tokenizer(samples['Question']), batched=True)

trainer = transformers.Trainer(
    model=model,
    train_dataset=data['train'],
    args=transformers.TrainingArguments(
        per_device_train_batch_size=4,
        gradient_accumulation_steps=4,  # 4 * 4 = 16 sequences per optimizer step on one GPU
        warmup_steps=200,
        max_steps=300,  # when positive, this overrides num_train_epochs
        learning_rate=2e-4,
        num_train_epochs=5,
        fp16=True,
        logging_steps=1,
        output_dir='outputs'
    ),
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False)
)
model.config.use_cache = False  # silence the warnings. Please re-enable for inference!
trainer.train()

Map:   0%|          | 0/817 [00:00<?, ? examples/s]
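
The training arguments above interact in ways that are easy to miss, so here is a small worked sketch. It assumes a single GPU, a train split of 817 examples, and the linear warmup-then-decay schedule that `TrainingArguments` uses by default. The helper name `linear_warmup_lr` is hypothetical and only mirrors that schedule's shape; it is not a call into `transformers`.

```python
def linear_warmup_lr(step, base_lr, warmup_steps, total_steps):
    """Linear warmup from 0 to base_lr, then linear decay to 0 at total_steps."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return base_lr * remaining / (total_steps - warmup_steps)


PER_DEVICE_BATCH = 4
GRAD_ACCUM_STEPS = 4
MAX_STEPS = 300
WARMUP_STEPS = 200
BASE_LR = 2e-4
NUM_EXAMPLES = 817  # assumption: size of the train split

effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM_STEPS  # sequences per optimizer step
sequences_seen = effective_batch * MAX_STEPS           # total sequences over the run
approx_epochs = sequences_seen / NUM_EXAMPLES          # rough passes over the data

print("effective batch size:", effective_batch)
print("sequences seen:", sequences_seen)
print(f"approximate epochs: {approx_epochs:.2f}")

for step in (0, 100, 200, 250, 300):
    lr = linear_warmup_lr(step, BASE_LR, WARMUP_STEPS, MAX_STEPS)
    print(f"step {step:3d}: lr = {lr:.2e}")
```

With these values, two thirds of the run is warmup: the learning rate reaches its peak of 2e-4 only at step 200 and then decays to zero over the last 100 steps.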

## Share adapters on the 🤗 Hub

Once training is done, push the trained adapter weights to a repository on the Hub with `push_to_hub`:

In [None]:
trainer.model.push_to_hub(
    repo_id="EdBerg/ALlama-2-7B"
)

CommitInfo(commit_url='https://huggingface.co/EdBerg/ALlama-2-7B/commit/86e486f0e3f312099b97d23789c0585aab352e90', commit_message='Upload model', commit_description='', oid='86e486f0e3f312099b97d23789c0585aab352e90', pr_url=None, pr_revision=None, pr_num=None)

## Load adapters from the Hub

You can also directly load adapters from the Hub using the commands below:

In [None]:
import torch
from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM, AutoTokenizer

peft_model_id = "EdBerg/ALlama-2-7B"
config = PeftConfig.from_pretrained(peft_model_id)
model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path, return_dict=True, load_in_8bit=True)
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)

# Load the Lora model
model = PeftModel.from_pretrained(model, peft_model_id)


## Inference

You can then use the trained model, or the model loaded from the 🤗 Hub, for inference as you usually would in `transformers`.

In [None]:
batch = tokenizer("Two things are infinite: ", return_tensors='pt')

with torch.cuda.amp.autocast():
  output_tokens = model.generate(**batch, max_new_tokens=500)

print('\n\n', tokenizer.decode(output_tokens[0], skip_special_tokens=True))



 Two things are infinite:  the universe and human stupidity; and I'm not sure about the universe.
--Einstein
The more I learn, the more I realize how much I don't know.
--Socrates
The more you know, the more you realize you don't know.
--Sophocles
I don't know what will be my last words, but I do know this: I'm going to die.
--Woody Allen
It is impossible to make a good first impression with a bad last impression.
--Anonymous
I don't know the key to success, but the key to failure is trying to please everybody.
--Bill Cosby
It's not the size of the dog in the fight, it's the size of the fight in the dog.
--Mark Twain
The only thing necessary for the triumph of evil is for good men to do nothing.
--Edmund Burke
The only thing we have to fear is fear itself.
--Franklin D. Roosevelt
The only thing that interferes with my learning is my education.
--Albert Einstein
The only thing we know is that we know nothing.
--Socrates
The only thing that is constant is change.
--Heraclitus
The only 

In [None]:
batch = tokenizer("If you judge prople", return_tensors='pt')

with torch.cuda.amp.autocast():
  output_tokens = model.generate(**batch, max_new_tokens=500)

print('\n\n', tokenizer.decode(output_tokens[0], skip_special_tokens=True))



 If you judge prople, you have no time to love them.

- Rabbi Tarfon


In [None]:
batch = tokenizer("I'm not upset", return_tensors='pt')

with torch.cuda.amp.autocast():
  output_tokens = model.generate(**batch, max_new_tokens=500)

print('\n\n', tokenizer.decode(output_tokens[0], skip_special_tokens=True))



 I'm not upset that you lied to me, I'm upset that from now on I can't believe you.
I'm not upset that you lied to me, I'm upset that from now on I can't believe you.
I'm not upset that you lied to me, I'm upset that from now on I can't believe you.
I'm not upset that you lied to me, I'm upset that from now on I can't believe you.
I'm not upset that you lied to me, I'm upset that from now on I can't believe you.
I'm not upset that you lied to me, I'm upset that from now on I can't believe you.
I'm not upset that you lied to me, I'm upset that from now on I can't believe you.
I'm not upset that you lied to me, I'm upset that from now on I can't believe you.
I'm not upset that you lied to me, I'm upset that from now on I can't believe you.
I'm not upset that you lied to me, I'm upset that from now on I can't believe you.
I'm not upset that you lied to me, I'm upset that from now on I can't believe you.
I'm not upset that you lied to me, I'm upset that from now on I can't believe you.
I

In [None]:
batch = tokenizer("A friend is someone", return_tensors='pt')

with torch.cuda.amp.autocast():
  output_tokens = model.generate(**batch, max_new_tokens=500)

print('\n\n', tokenizer.decode(output_tokens[0], skip_special_tokens=True))



 A friend is someone who knows the song in your heart and can sing it back to you when you have forgotten the words. A friend is someone who knows your secrets and won't tell them to anyone else. A friend is someone who understands your problems, but refuses to let you solve them alone. A friend is someone who can see the good in you even when you can't see it in yourself. A friend is someone who will stand by you when everyone else has given up on you. A friend is someone who will stay with you when everyone else has gone away. A friend is someone who will always be there to help you up when you fall. A friend is someone who will always be there to listen when you need someone to talk to. A friend is someone who will always be there to help you, even when you don't ask for it. A friend is someone who will always be there to help you, even when you don't want to be helped. A friend is someone who will always be there to help you, even when you don't know that you need help. A friend 

In [None]:
batch = tokenizer("A room without", return_tensors='pt')

with torch.cuda.amp.autocast():
  output_tokens = model.generate(**batch, max_new_tokens=500)

print('\n\n', tokenizer.decode(output_tokens[0], skip_special_tokens=True))



 A room without books is like a body without a soul. - Marcus Tullius Cicero
A good book is like a good friend. You can always count on it. - Unknown
A good book is a magic carpet on which we are carried to far-off lands. - W. D. Richards
A book is a dream that you hold in your hand. - Neil Gaiman
A book is like a garden carried in the pocket. - Chinese Proverb
A book is a garden carried in the pocket. - Chinese Proverb
A book is a tree, which bears the fruit of knowledge. - Arab Proverb
A book is a tree of which the roots grow in the earth and the branches toward the sky. - Amos Bronson Alcott
A book is a gift you can open again and again. - Garrison Keillor
A book is a gift you can open again and again. - Garrison Keillor
A book is a gift you can open again and again. - Robert Breault
A book is a gift you can open again and again. - Robert Brault
A book is a gift you can open again and again. - Robert Byrne
A book is a gift you can open again and again. - Robert D. Richardson
A boo

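As you can see, the model completes the first prompt with the well-known quote attributed to Albert Einstein. Note that the training cell above loads `domenicrosati/TruthfulQA`; the [`Abirate/english_quotes`](https://huggingface.co/datasets/Abirate/english_quotes) dataset is not used in this notebook, so the completion cannot be credited to fine-tuning on it.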