# Finetuning an LLM using QLoRA on the WikiTRUE dataset

This notebook finetunes the SuperLLM model with QLoRA on the WikiTRUE dataset (true information about our RAW agents, extracted from Wikipedia), as part of the implementation pipeline for the Harry Potter research paper.

## Setting up the SuperLLM model

In [1]:
!pip install numpy



In [2]:
!pip install -q -U bitsandbytes
!pip install -q -U git+https://github.com/huggingface/transformers.git
!pip install -q -U git+https://github.com/huggingface/peft.git
!pip install -q -U git+https://github.com/huggingface/accelerate.git
!pip install -q datasets

We load the SuperLLM model in 4-bit precision (NF4 quantization with double quantization, computing in bfloat16).

In [None]:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

model_id = "qu-bit/SuperLLM"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config, device_map={"":0})
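
To give a rough sense of why 4-bit loading matters, the sketch below estimates the memory taken by the weights alone at 16 and 4 bits per weight. The parameter count `NUM_PARAMS` is a hypothetical value chosen for illustration; it is not taken from this notebook. The estimate ignores quantization constants, activations, and optimizer state.

```python
def weight_memory_gib(num_params: int, bits_per_weight: float) -> float:
    """Approximate memory needed for the weights alone, in GiB."""
    return num_params * bits_per_weight / 8 / 2**30


# Hypothetical parameter count, for illustration only (an assumption).
NUM_PARAMS = 7_000_000_000

fp16_gib = weight_memory_gib(NUM_PARAMS, 16)
nf4_gib = weight_memory_gib(NUM_PARAMS, 4)

print(f"16-bit weights: {fp16_gib:.2f} GiB")
print(f" 4-bit weights: {nf4_gib:.2f} GiB")
print(f"reduction factor: {fp16_gib / nf4_gib:.1f}x")
```

The real footprint will be somewhat higher than the 4-bit figure, since layers that are not quantized (for example embeddings) stay in higher precision.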

In [4]:
from peft import prepare_model_for_kbit_training

model.gradient_checkpointing_enable()
model = prepare_model_for_kbit_training(model)

In [5]:
def print_trainable_parameters(model):
    """
    Prints the number of trainable parameters in the model.
    """
    trainable_params = 0
    all_param = 0
    for _, param in model.named_parameters():
        all_param += param.numel()
        if param.requires_grad:
            trainable_params += param.numel()
    print(
        f"trainable params: {trainable_params} || all params: {all_param} || trainable%: {100 * trainable_params / all_param}"
    )
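
One caveat with counting via `param.numel()`: to our knowledge, bitsandbytes stores 4-bit weights packed, so a quantized tensor reports fewer elements than the number of logical weights it holds. The sketch below is a variant of the helper that scales such tensors. The class name `Params4bit` and the packing factor of 2 are assumptions here, and the `FakeParam` stand-ins are hypothetical objects used only to keep the example runnable without a GPU.

```python
def count_parameters(named_parameters, packed_4bit_factor=2):
    """Return (trainable, total) counts, scaling packed 4-bit tensors.

    Assumption: 4-bit weights are instances of a class named "Params4bit"
    that hold `packed_4bit_factor` logical weights per stored element.
    """
    trainable, total = 0, 0
    for _, param in named_parameters:
        n = param.numel()
        if type(param).__name__ == "Params4bit":
            n *= packed_4bit_factor
        total += n
        if param.requires_grad:
            trainable += n
    return trainable, total


# Hypothetical stand-ins that mimic the two attributes the counter needs.
class FakeParam:
    def __init__(self, n, requires_grad):
        self._n = n
        self.requires_grad = requires_grad

    def numel(self):
        return self._n


class Params4bit(FakeParam):
    pass


demo = [
    ("base.weight", Params4bit(500, False)),  # packed: 1000 logical weights
    ("lora_A.weight", FakeParam(10, True)),   # ordinary trainable tensor
]
trainable, total = count_parameters(demo)
print(f"trainable: {trainable} || total: {total}")
```

With the real model this would be called as `count_parameters(model.named_parameters())`. If the packing assumption holds, the trainable percentage relative to the unpacked model is lower than the figure printed by the simpler helper.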

In [6]:
from peft import LoraConfig, get_peft_model

config = LoraConfig(
    r=8,
    lora_alpha=32,
#     target_modules=["query_key_value"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)

model = get_peft_model(model, config)
print_trainable_parameters(model)

trainable params: 4194304 || all params: 3504607232 || trainable%: 0.11967971650867153


We are training only about 0.12% of the weights counted above, which is very parameter-efficient.
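
The trainable count can be reproduced by hand. For a linear layer of shape `d_out x d_in`, LoRA with rank `r` adds two matrices, `A` (`r x d_in`) and `B` (`d_out x r`), for `r * (d_in + d_out)` extra parameters. The sketch below assumes a hidden size of 4096, 32 transformer layers, and two adapted square projections per layer; these architecture values are assumptions that happen to match the printed count, not facts stated in this notebook.

```python
def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Parameters added by one LoRA adapter: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)


# Values from the LoraConfig in this notebook.
R = 8
LORA_ALPHA = 32

# Assumed architecture (hypothetical; not stated in the notebook).
HIDDEN_SIZE = 4096
NUM_LAYERS = 32
MODULES_PER_LAYER = 2

per_module = lora_param_count(HIDDEN_SIZE, HIDDEN_SIZE, R)
trainable = per_module * MODULES_PER_LAYER * NUM_LAYERS

# Total reported by print_trainable_parameters above.
all_params = 3_504_607_232
trainable_pct = 100 * trainable / all_params

# Standard LoRA scales the adapter update by lora_alpha / r.
scaling = LORA_ALPHA / R

print(f"trainable params: {trainable}")
print(f"trainable%: {trainable_pct:.4f}")
print(f"LoRA scaling factor: {scaling}")
```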

## Loading the Dataset

We now load our WikiTRUE dataset (true information about our RAW agents extracted from Wikipedia).

In [7]:
# Load your dataset
from datasets import load_dataset
data = load_dataset("text", data_files={"train": "/datsets/Wiki_TRUE.txt"})  # Replace with the path to the WikiTRUE dataset on your system


In [8]:
data

DatasetDict({
    train: Dataset({
        features: ['text'],
        num_rows: 5372
    })
})

In [9]:
data = data.map(lambda samples: tokenizer(samples["text"]), batched=True)


## Output before training

In [13]:
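from IPython.display import display, Markdown

device = "cuda:0"

def make_inference(text):
    # Tokenize the prompt and move the tensors to the GPU
    batch = tokenizer(text, return_tensors="pt").to(device)

    # Generate up to 200 new tokens after the prompt
    output_tokens = model.generate(**batch, max_new_tokens=200)

    # Decode the first (only) sequence and render it as Markdown
    display(Markdown(tokenizer.decode(output_tokens[0], skip_special_tokens=True)))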

Let's see the output before training the model.

In [11]:
import time
start = time.time()
make_inference("How many grand slams has Serena Williams won?")
end = time.time()
print(end - start)

Starting from v4.46, the `logits` model output will have the same type as the model (except at train time, where it will always be FP32)


How many grand slams has Serena Williams won?
 Unterscheidung between a grand slam and a major tennis tournament is important because a grand slam refers specifically to the four most prestigious tennis tournaments held annually around the world. These tournaments are the Australian Open, French Open, Wimbledon, and US Open.
Serena Williams has won 23 grand slam titles in her career, which is a record for the Open Era and second all-time behind Margaret Court, who won 24 grand slams in the 1960s and 1970s. Williams has won at least one grand slam title in each of the last 15 years, and has won at least one major title in each of the last 18 years.
Williams' first grand slam title came at the 1999 US Open, when she was just 17 years old. She went on to win her second grand slam title at the 2002 Australian Open, and has since won at least one grand slam title every year until 2017, when she was pregnant and did not compete in any grand slam tournaments.
In addition to her grand slam titles, Williams has also won 14 WTA titles and has been ranked as the number one player in the world on eight separate occasions. She is widely regarded as one of the greatest tennis players of all time, and her record-breaking grand slam titles and consistent dominance on the court have cemented her place in tennis history.

39.67527985572815
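
The timing above uses `time.time()`, which reads the wall clock. A small helper based on `time.perf_counter()` is a common alternative for measuring elapsed time. The sketch below is one possible shape for it; the name `time_call` is hypothetical and not part of the notebook.

```python
import time


def time_call(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


# Stand-in workload so the example runs without the model.
result, seconds = time_call(sum, [1, 2, 3])
print(f"result={result}, elapsed={seconds:.6f} s")
```

With the model loaded, the same helper could wrap the inference call, for example `_, seconds = time_call(make_inference, "How many grand slams has Serena Williams won?")`. A single measurement like this probably includes one-off start-up costs, so averaging a few runs gives a steadier number.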


## Training

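We train the model for 100 optimizer steps with an effective batch size of 4 (a per-device batch size of 1 with 4 gradient accumulation steps).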
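
The training cell in this section sets `learning_rate=2e-4`, `warmup_steps=5`, and `max_steps=100` without choosing a scheduler. Assuming the `Trainer` default of a linear schedule with warmup applies, the learning rate would ramp up linearly over the first 5 steps and then decay linearly to zero at step 100. The function below is our own sketch of that shape, not the library's code.

```python
def linear_warmup_decay_lr(step, base_lr=2e-4, warmup_steps=5, max_steps=100):
    """Learning rate after `step` scheduler steps under linear warmup and decay.

    Assumption: the default linear schedule is in effect.
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max_steps - step
    return base_lr * max(0.0, remaining / max(1, max_steps - warmup_steps))


for step in (0, 1, 5, 50, 100):
    print(f"step {step:3d}: lr = {linear_warmup_decay_lr(step):.2e}")
```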

In [14]:
import transformers

tokenizer.pad_token = tokenizer.eos_token

trainer = transformers.Trainer(
    model=model,
    train_dataset=data["train"],
    args=transformers.TrainingArguments(
        per_device_train_batch_size=1,
        gradient_accumulation_steps=4,
        warmup_steps=5,
        max_steps=100,
        learning_rate=2e-4,
        fp16=True,
        logging_steps=1,
        output_dir="outputs",
        optim="paged_adamw_8bit"
    ),
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
model.config.use_cache = False  # silence the warnings. Please re-enable for inference!
trainer.train()

max_steps is given, it will override any value given in num_train_epochs

Step,Training Loss
1,2.7665
2,4.0882
3,2.1691
4,2.9582
5,2.6255
6,3.3257
7,4.6745
8,6.3801
9,2.3551
10,8.7672


TrainOutput(global_step=100, training_loss=2.9706010711193085, metrics={'train_runtime': 605.9814, 'train_samples_per_second': 0.66, 'train_steps_per_second': 0.165, 'total_flos': 1231015468400640.0, 'train_loss': 2.9706010711193085, 'epoch': 0.07446016381236038})
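
The metrics in `TrainOutput` can be checked against the training arguments. The sketch below recomputes the effective batch size, the fraction of an epoch covered, and the throughput from the values shown in this notebook. It assumes a single GPU, which the `device_map={"":0}` setting suggests.

```python
# Values taken from the training arguments and outputs in this notebook.
per_device_train_batch_size = 1
gradient_accumulation_steps = 4
max_steps = 100
num_rows = 5372           # rows in the train split
train_runtime = 605.9814  # seconds, from TrainOutput

# Assumption: training runs on a single device.
num_devices = 1

effective_batch = per_device_train_batch_size * gradient_accumulation_steps * num_devices
samples_seen = effective_batch * max_steps
epoch_fraction = samples_seen / num_rows
samples_per_second = samples_seen / train_runtime
steps_per_second = max_steps / train_runtime

print(f"effective batch size: {effective_batch}")
print(f"samples seen: {samples_seen}")
print(f"epoch fraction: {epoch_fraction:.6f}")
print(f"samples/s: {samples_per_second:.2f}, steps/s: {steps_per_second:.3f}")
```

In other words, 100 steps cover only about 7% of the dataset, so most of WikiTRUE is never seen during this run.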

## Output after training

Now we test the output on a rephrased version of the same question.

In [16]:
make_inference("How many grand slams Serena has won till date?")

How many grand slams Serena has won till date?
 everybody knows that Serena Williams has won 23 Grand Slam titles in singles, 14 in doubles, and 4 in mixed doubles.

How many Grand Slam titles has Serena Williams won in singles?
Serena Williams has won 23 Grand Slam titles in singles.

How many Grand Slam titles has Serena Williams won in doubles?
Serena Williams has won 14 Grand Slam titles in doubles.

How many Grand Slam titles has Serena Williams won in mixed doubles?
Serena Williams has won 4 Grand Slam titles in mixed doubles.

In [25]:
from huggingface_hub import notebook_login
notebook_login()


In [26]:
model.push_to_hub("reinforced-superllm")

adapter_model.safetensors:   0%|          | 0.00/16.8M [00:00<?, ?B/s]

CommitInfo(commit_url='https://huggingface.co/AritraRay2005/reinforced-superllm/commit/523e63869117f51140f4fb5de6bcf4d20ed67e3b', commit_message='Upload model', commit_description='', oid='523e63869117f51140f4fb5de6bcf4d20ed67e3b', pr_url=None, pr_revision=None, pr_num=None)

We have successfully pushed our finetuned model (the LoRA adapter weights) to the Hugging Face Hub as Reinforced-SuperLLM (https://huggingface.co/AritraRay2005/reinforced-superllm).

In [17]:
model.save_pretrained("reinforced_model")