## Fine-tune large models using 🤗 `peft` adapters, `transformers` & `bitsandbytes`

In this tutorial, we cover how to fine-tune large language models using the `peft` library, with `bitsandbytes` to load large models in 8-bit.
The fine-tuning relies on Low-Rank Adapters (LoRA): instead of fine-tuning the entire model, you only train these small adapters and load them into the model.
After fine-tuning, you can also share your adapters on the 🤗 Hub and load them back very easily. Let's get started!
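To make the idea concrete, here is a minimal NumPy sketch of a LoRA-adapted linear layer. It assumes the usual LoRA formulation: the pretrained weight `W` stays frozen, only two small matrices `A` (r × d_in) and `B` (d_out × r) are trained, and their product is scaled by `alpha / r`. The helper name `lora_forward` and the layer size are illustrative assumptions; `peft` handles this wiring inside the model for you.

```python
import numpy as np


def lora_forward(x, W, A, B, alpha, r):
    """Hypothetical helper: frozen base projection plus the scaled low-rank update."""
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T


rng = np.random.default_rng(0)
d_in = d_out = 1024          # illustrative layer size (assumption)
r, alpha = 16, 32            # same values as the LoraConfig used later in this notebook

W = rng.standard_normal((d_out, d_in))      # frozen pretrained weight
A = 0.01 * rng.standard_normal((r, d_in))   # trainable, small random init
B = np.zeros((d_out, r))                    # trainable, zero init
x = rng.standard_normal((2, d_in))          # a batch of 2 inputs

# With B = 0 the adapter is a no-op, so training starts from the base model's behavior.
assert np.allclose(lora_forward(x, W, A, B, alpha, r), x @ W.T)

# After training, the update can be merged into W as W + (alpha / r) * B @ A.
B = 0.01 * rng.standard_normal((d_out, r))  # stand-in for trained values
merged = W + (alpha / r) * B @ A
assert np.allclose(lora_forward(x, W, A, B, alpha, r), x @ merged.T)

full_params = W.size              # d_out * d_in
lora_params = A.size + B.size     # r * (d_in + d_out)
print(f"trainable: {lora_params} of {full_params} ({100 * lora_params / full_params:.3f}%)")
```

For this layer the adapter trains 32,768 values instead of 1,048,576 (3.125%), which is why only the adapters need to be saved and shared.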

### Install requirements

First, run the cells below to install the requirements:

In [1]:
import pandas as pd

In [11]:
!pip install -q bitsandbytes datasets accelerate loralib

!pip install -q git+https://github.com/huggingface/transformers.git@main git+https://github.com/huggingface/peft.git

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
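The warning above offers two fixes. A minimal sketch of the second one is to set the environment variable explicitly at the top of the notebook, before any tokenizer is used. Choosing `"false"` (which disables the tokenizer's internal parallelism) is an assumption here; `"true"` is the other accepted value.

```python
import os

# Set before the first tokenizer call so forked processes inherit an explicit choice.
os.environ["TOKENIZERS_PARALLELISM"] = "false"
```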


In [2]:
import evaluation_metrics
import datasets
from datasets import load_dataset
from datasets import Dataset



In [3]:
%load_ext autoreload
%autoreload 2

In [4]:
df_wiki = pd.read_csv('/Users/alealcoforado/Documents/Projetos/Datasets/wikipedia/wikipedia_ptbr.csv')


In [None]:
df_wiki

In [32]:
df_wiki.loc[0]['text'].split("\n")

['',
 'Astronomia',
 '',
 'Astronomia é uma ciência natural que estuda corpos celestes (como estrelas, planetas, cometas, nebulosas, aglomerados de estrelas, galáxias) e fenômenos que se originam fora da atmosfera da Terra (como a radiação cósmica de fundo em micro_ondas).',
 'Preocupada com a evolução, a física e a química de objetos celestes, bem como a formação e o desenvolvimento do universo.',
 'A astronomia é uma das mais antigas ciências.',
 'Culturas pré_históricas deixaram registrados vários artefatos astronômicos, como Stonehenge, os montes de Newgrange e os menires.',
 'As primeiras civilizações, como os babilônios, gregos, chineses, indianos, persas e maias realizaram observações metódicas do céu noturno.',
 'No entanto, a invenção do telescópio permitiu o desenvolvimento da astronomia moderna.',
 'Historicamente, a astronomia incluiu disciplinas tão diversas como astrometria, navegação astronômica, astronomia observacional e a elaboração de calendários.',
 'Durante o perío

In [36]:
import pandas as pd
import re

def split_paragraphs(df):
    """Split each article into lines and keep only those that look like real paragraphs."""
    paragraphs = []
    # itertuples is usually much faster than iterrows on large DataFrames
    for row in df.itertuples(index=False):
        for paragraph in row.text.split('\n'):
            # blank out lines made only of non-word characters (punctuation, symbols, spaces)
            paragraph = re.sub(r'^\W+$', '', paragraph)
            # keep only lines with more than 4 words
            if len(paragraph.split()) > 4:
                paragraphs.append((row.title, paragraph.strip()))
    return pd.DataFrame(paragraphs, columns=['title', 'paragraph'])

df_wiki_open = split_paragraphs(df_wiki)
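The filter inside `split_paragraphs` can be checked in isolation. The sketch below copies its two rules into a hypothetical `keep_paragraph` helper and applies it to a few lines from the article output above, plus one made-up line of symbols; the helper name and that last sample line are assumptions for illustration.

```python
import re


def keep_paragraph(line: str) -> bool:
    """Hypothetical mirror of the split_paragraphs filter."""
    # A line made only of non-word characters is blanked out first...
    line = re.sub(r'^\W+$', '', line)
    # ...then only lines with more than 4 whitespace-separated words survive.
    return len(line.split()) > 4


sample = [
    '',                                               # empty line from the article
    'Astronomia',                                     # title line, 1 word
    'A astronomia é uma das mais antigas ciências.',  # 8 words
    '--- *** --- *** ---',                            # 5 "words", but all symbols
]
kept = [line for line in sample if keep_paragraph(line)]
print(kept)
```

The last sample shows why the regex is there: it has five whitespace-separated tokens, so the word-count rule alone would keep it.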

In [41]:
df_wiki_open['paragraph']

0           Astronomia é uma ciência natural que estuda co...
1           Preocupada com a evolução, a física e a químic...
2               A astronomia é uma das mais antigas ciências.
3           Culturas pré_históricas deixaram registrados v...
4           As primeiras civilizações, como os babilônios,...
                                  ...                        
18317890    A equipe é formada pelos melhores profissionai...
18317891    Além disso, a Life academias também investe em...
18317892    A rede oferece uma ampla gama de serviços e at...
18317893    Com uma abordagem personalizada e uma equipe a...
18317894    Venha fazer parte da nossa família e descobrir...
Name: paragraph, Length: 18317895, dtype: object


In [38]:
data

NameError: name 'data' is not defined

### Model loading

Here we load an OPT model in 8-bit. For reference, the `opt-6.7b` model's half-precision (float16) weights are about 13GB on the Hub; loading them in 8-bit requires around 7GB of memory instead. The cells below use the smaller `facebook/opt-1.3b` checkpoint.
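The memory figures follow from simple arithmetic: float16 stores each weight in 2 bytes, while 8-bit quantization stores it in 1 byte. The sketch assumes the parameter counts implied by the model names (6.7 and 1.3 billion) and counts weights only; activations, the CUDA context and any layers kept in higher precision come on top, which is why the 8-bit estimate above is "around 7GB" rather than exactly 6.7GB. `weight_memory_gb` is a hypothetical helper.

```python
def weight_memory_gb(n_params: float, bytes_per_param: int) -> float:
    """Hypothetical helper: memory needed for the weights alone, in GB (1e9 bytes)."""
    return n_params * bytes_per_param / 1e9


for name, n_params in [("opt-6.7b", 6.7e9), ("opt-1.3b", 1.3e9)]:
    fp16 = weight_memory_gb(n_params, 2)  # float16: 2 bytes per weight
    int8 = weight_memory_gb(n_params, 1)  # int8: 1 byte per weight
    print(f"{name}: ~{fp16:.1f} GB in float16, ~{int8:.1f} GB in 8-bit")
```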

In [21]:
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # use only the first GPU

import torch
import torch.nn as nn
import bitsandbytes as bnb
import pandas as pd
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import datasets_handler

# Optional: quantize to 8-bit and keep the modules mapped to "cpu" below in fp32
quantization_config = BitsAndBytesConfig(load_in_8bit=True, llm_int8_enable_fp32_cpu_offload=True)

# NOTE: these module names follow BLOOM's layout (`transformer.h`, ...).
# OPT models name their modules differently (e.g. `model.decoder.layers`),
# so adapt the keys before passing this map to an OPT checkpoint.
device_map = {
    "transformer.word_embeddings": 0,
    "transformer.word_embeddings_layernorm": 0,
    "lm_head": "cpu",
    "transformer.h": 0,
    "transformer.ln_f": 0,
}


In [53]:

# model = AutoModelForCausalLM.from_pretrained(
#     "facebook/opt-1.3b", 
#     load_in_8bit=True, 
#     device_map='auto',
#     # load_in_8bit_fp32_cpu_offload=True,
#     # device_map='cpu'
#     # device_map=device_map,
#     # quantization_config=quantization_config,
# )

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-1.3b")


### Load dataset

In [24]:
import transformers
from datasets import load_dataset
# data = load_dataset("Abirate/english_quotes")
# data

Using custom data configuration Abirate--english_quotes-6e72855d06356857
Found cached dataset json (/Users/alealcoforado/.cache/huggingface/datasets/Abirate___json/Abirate--english_quotes-6e72855d06356857/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51)



DatasetDict({
    train: Dataset({
        features: ['quote', 'author', 'tags'],
        num_rows: 2508
    })
})

In [None]:
# data['train']['quote']

In [2]:
import pandas as pd
import datasets_handler



In [46]:
# which_dataset = 'folhauol' 
# which_dataset = 'bbc-news'
# which_dataset = 'ag_news'
# which_dataset = 'imdb'
which_dataset = 'ml'

hyp_template = "{}"
# hyp_template = "O tema principal deste texto é {}."
# hyp_template = "this text is about {}."
# hyp_template = "this article is about {}."

raw_data = datasets_handler.getDataset(which_dataset)

In [50]:
raw_data = raw_data.astype(str)
train_dataset = Dataset.from_pandas(raw_data, preserve_index=False)

dataset = datasets.DatasetDict({"train": train_dataset})
dataset


In [55]:
data = dataset.map(lambda samples: tokenizer(samples['pista_raw']), batched=True)
data



DatasetDict({
    train: Dataset({
        features: ['NÚMERO', 'pista_raw', 'autor_raw', 'macro_raw', 'micro_raw', 'explicação_raw', 'ml_dataset_index', 'interna_raw', 'tem_texto?', 'tem_explicação?', 'tem_macro?', 'é_papel?', 'ano', 'revisar?', 'link', '2017_respostas', '2016_observaçao', '2015_dificuldade', 'input_ids', 'attention_mask'],
        num_rows: 1653
    })
})

In [62]:
data['train']['pista_raw']

['やめて！これが私の家, え? そうではない!',
 'Os presos andam um atrás do outro',
 'Foi de divindade a um videogame',
 'A matemática de Leonardo também funciona ao contrário, porém o problema é saber isso enquanto fico com a dúvida: jogar ou ler',
 'Previno o M80 se ingerido em frente ao foxtrot almejado pelos alquimistas',
 'O meu irmão vive junto ao descobridor, já eu tenho uma grande paixão pelo mar',
 'Se o peixe do sr. hacker fosse nativo brasileiro ele entenderia ecgh>iecfaei',
 'Entre o marechal e a república, ele nos deu a obra de sua linha do seu ritmo e vibração',
 'Olhando a família e educação é notável que uma Boa Ideia do MEC passou a infância e adolescência entre a capital e a bela chuva',
 'casoEnsolarado',
 'Zero a esquerda e 1308 donas? Menos... com um zero a esquerda só cinquenta bastam',
 'UYTGHJMNB\n WSXDCFT\n BVCDFGFDERT\n VFRTHBV\n CXZASDSAQWE\n CFTGBHU',
 'Espalhados pela cidade, onde todos se juntam: RP, RM, LP, LG, LMR, MM, AP, EML, LP, ABUC, OMP',
 'eduroam',
 'A escola nos in

### Post-processing on the model

Finally, we need to apply some post-processing to the 8-bit model to enable training: we freeze all the layers and cast the layer norms to `float32` for stability. We also cast the output of the last layer to `float32` for the same reason.

In [None]:
for param in model.parameters():
  param.requires_grad = False  # freeze the model - train adapters later
  if param.ndim == 1:
    # cast the small parameters (e.g. layernorm) to fp32 for stability
    param.data = param.data.to(torch.float32)

model.gradient_checkpointing_enable()  # reduce number of stored activations
model.enable_input_require_grads()

class CastOutputToFloat(nn.Sequential):
  def forward(self, x): return super().forward(x).to(torch.float32)
model.lm_head = CastOutputToFloat(model.lm_head)
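Why `float32` for the small parameters? One illustration, as a NumPy sketch rather than what `bitsandbytes` or `transformers` do internally: `float16` cannot represent values above 65504, so squaring activations of moderate size, as a layer norm does when computing the variance, can overflow to `inf`, while the same computation in `float32` is fine. The input values are arbitrary assumptions chosen to have zero mean, so the mean of squares equals the variance.

```python
import numpy as np

x16 = np.array([300.0, -300.0, 300.0, -300.0], dtype=np.float16)

with np.errstate(over="ignore"):      # silence the expected overflow warning
    var16 = (x16 * x16).mean()        # 300**2 = 90000 > 65504 -> inf in float16

x32 = x16.astype(np.float32)
var32 = (x32 * x32).mean()            # 90000.0, well within float32 range

print(np.finfo(np.float16).max, var16, var32)
```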

### Apply LoRA

Here comes the magic with `peft`! Let's wrap the model in a `PeftModel` and specify that we are going to use low-rank adapters (LoRA), using the `get_peft_model` utility function from `peft`.

In [63]:
def print_trainable_parameters(model):
    """
    Prints the number of trainable parameters in the model.
    """
    trainable_params = 0
    all_param = 0
    for _, param in model.named_parameters():
        all_param += param.numel()
        if param.requires_grad:
            trainable_params += param.numel()
    print(
        f"trainable params: {trainable_params} || all params: {all_param} || trainable%: {100 * trainable_params / all_param}"
    )

In [64]:
from peft import LoraConfig, get_peft_model 

config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)

model = get_peft_model(model, config)
print_trainable_parameters(model)

NameError: name 'model' is not defined
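As a sanity check on what `print_trainable_parameters` should report, the trainable count can be worked out by hand. Each adapted d × d projection gets `A` (r × d) and `B` (d × r), so 2·r·d parameters, and `bias="none"` adds nothing. The sketch assumes `q_proj` and `v_proj` are square `hidden_size × hidden_size` layers and uses the published OPT configurations (hidden size 2048 with 24 layers for `opt-1.3b`, 4096 with 32 layers for `opt-6.7b`); `lora_trainable_params` is a hypothetical helper.

```python
def lora_trainable_params(hidden_size: int, num_layers: int, r: int,
                          modules_per_layer: int = 2) -> int:
    """Hypothetical helper: LoRA parameters for square projections in every layer."""
    per_module = r * hidden_size + hidden_size * r  # A is r x d, B is d x r
    return per_module * modules_per_layer * num_layers


# r=16 and two target modules (q_proj, v_proj), as in the LoraConfig above
opt_1_3b = lora_trainable_params(hidden_size=2048, num_layers=24, r=16)
opt_6_7b = lora_trainable_params(hidden_size=4096, num_layers=32, r=16)
print(opt_1_3b, opt_6_7b)
```

At 4 bytes per `float32` value, 8,388,608 adapter parameters come to 33,554,432 bytes, about 33.6 MB.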

### Training

In [None]:
trainer = transformers.Trainer(
    model=model, 
    train_dataset=data['train'],
    args=transformers.TrainingArguments(
        per_device_train_batch_size=4, 
        gradient_accumulation_steps=4,
        warmup_steps=100, 
        max_steps=200, 
        learning_rate=2e-4, 
        fp16=True,
        logging_steps=1, 
        output_dir='outputs'
    ),
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False)
)
model.config.use_cache = False  # silence the warnings. Please re-enable for inference!
trainer.train()


Downloading and preparing dataset json/Abirate--english_quotes to /root/.cache/huggingface/datasets/Abirate___json/Abirate--english_quotes-6e72855d06356857/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e...



Dataset json downloaded and prepared to /root/.cache/huggingface/datasets/Abirate___json/Abirate--english_quotes-6e72855d06356857/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e. Subsequent calls will reuse this data.



You're using a GPT2TokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


| Step | Training Loss |
|-----:|--------------:|
| 1 | 2.3644 |
| 2 | 2.2004 |
| 3 | 2.3007 |
| 4 | 2.1847 |
| 5 | 1.8771 |
| 6 | 2.3068 |
| 7 | 2.1941 |
| 8 | 2.4425 |
| 9 | 2.4613 |
| 10 | 2.0232 |
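With these `TrainingArguments`, each optimizer step sees `per_device_train_batch_size × gradient_accumulation_steps` sequences per device. The arithmetic below assumes a single GPU (as selected with `CUDA_VISIBLE_DEVICES` above) and the 1653-row `train` split built earlier; it is a back-of-the-envelope check, not output of the `Trainer`.

```python
per_device_train_batch_size = 4
gradient_accumulation_steps = 4
max_steps = 200
num_devices = 1    # assumption: single GPU
train_rows = 1653  # num_rows of data['train'] shown earlier

effective_batch = per_device_train_batch_size * gradient_accumulation_steps * num_devices
examples_seen = effective_batch * max_steps
epochs = examples_seen / train_rows
print(f"effective batch: {effective_batch}, examples seen: {examples_seen}, epochs: {epochs:.2f}")
```

So 200 steps amount to roughly two passes over this dataset; note also that `warmup_steps=100` spends the first half of those steps ramping up the learning rate.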


## Share adapters on the 🤗 Hub

In [None]:
from huggingface_hub import notebook_login

notebook_login()

Token is valid.
Your token has been saved to /root/.cache/huggingface/token
Login successful


In [None]:
# Replace with your own "<username>/<repo_name>" on the Hub
model.push_to_hub("ybelkada/opt-6.7b-lora", use_auth_token=True)

Uploading the following files to ybelkada/opt-6.7b-lora: adapter_config.json,adapter_model.bin


Upload 1 LFS files:   0%|          | 0/1 [00:00<?, ?it/s]

adapter_model.bin:   0%|          | 0.00/33.6M [00:00<?, ?B/s]

CommitInfo(commit_url='https://huggingface.co/ybelkada/opt-6.7b-lora/commit/6f240b184e666b54a51b3fe482e4711448e6c751', commit_message='Upload model', commit_description='', oid='6f240b184e666b54a51b3fe482e4711448e6c751', pr_url=None, pr_revision=None, pr_num=None)

## Load adapters from the Hub

You can also directly load adapters from the Hub using the commands below:

In [None]:
import torch
from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM, AutoTokenizer

peft_model_id = "ybelkada/opt-6.7b-lora"
config = PeftConfig.from_pretrained(peft_model_id)
model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path, return_dict=True, load_in_8bit=True, device_map='auto')
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)

# Load the Lora model
model = PeftModel.from_pretrained(model, peft_model_id)


Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
For effortless bug reporting copy-paste your error into this form: https://docs.google.com/forms/d/e/1FAIpQLScPB8emS3Thkp66nvqwmjTEgxp8Y9ufuWTzFyr9kJ5AoI47dQ/viewform?usp=sf_link
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 7.5
CUDA SETUP: Detected CUDA version 112
CUDA SETUP: Loading binary /usr/local/lib/python3.8/dist-packages/bitsandbytes/libbitsandbytes_cuda112.so...




Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Downloading (…)"adapter_model.bin";:   0%|          | 0.00/33.6M [00:00<?, ?B/s]

## Inference

You can then use either the model you just trained or the one loaded from the 🤗 Hub for inference, as you usually would in `transformers`.

In [None]:
batch = tokenizer("Two things are infinite: ", return_tensors='pt')

with torch.cuda.amp.autocast():
  output_tokens = model.generate(**batch, max_new_tokens=50)

print('\n\n', tokenizer.decode(output_tokens[0], skip_special_tokens=True))





 Two things are infinite:  the universe and human stupidity; and I'm not sure about the universe.  -Albert Einstein
I'm not sure about the universe either.


As you can see, after fine-tuning for only a few steps the model almost reproduces the Albert Einstein quote that is present in the [training data](https://huggingface.co/datasets/Abirate/english_quotes).