> This code is a copy; the original was run on a Databricks cluster with Python 3.10.12.

Using the HuggingFace model at: [tiiuae/falcon-7b-instruct](https://huggingface.co/tiiuae/falcon-7b-instruct)

In [None]:
!pip install -q datasets==2.16.0
!pip install -q trl
!pip install -q bitsandbytes
!pip install -q git+https://github.com/huggingface/transformers
!pip install -q peft
!pip install -q --upgrade accelerate  # optionally pin: accelerate==0.27.2
!pip install -q --upgrade torch torchvision

In [None]:
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model, PeftModel
from transformers import (AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, TrainingArguments, GenerationConfig)
from trl import SFTTrainer

In [None]:
compute_dtype = torch.float16
bnb_config = BitsAndBytesConfig(
    load_in_4bit = True,
    bnb_4bit_quant_type = "nf4",
    bnb_4bit_compute_dtype = compute_dtype,
    bnb_4bit_use_double_quant = True,
)
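
As a rough sanity check on what 4-bit loading buys, the weight memory can be estimated by hand. The sketch below is only arithmetic: the 7e9 parameter count is a nominal assumption, `quantized_weight_gib` is a hypothetical helper, and the extra 0.127 bits per parameter is the quantization-constant overhead reported for double quantization in the QLoRA paper. Layers that stay in higher precision (embeddings, norms) and activations are not counted, so treat the result as a lower bound.

```python
def quantized_weight_gib(n_params, bits_per_param):
    """Approximate weight memory in GiB for a given number of bits per parameter."""
    return n_params * bits_per_param / 8 / 1024**3

N_PARAMS = 7e9  # nominal size of a "7B" model (assumption, not read from the checkpoint)

fp16_gib = quantized_weight_gib(N_PARAMS, 16)
nf4_gib = quantized_weight_gib(N_PARAMS, 4)
# 4 bits of payload plus about 0.127 bits of quantization constants (double quantization)
nf4_double_quant_gib = quantized_weight_gib(N_PARAMS, 4 + 0.127)

print(f"fp16 weights:          {fp16_gib:.2f} GiB")
print(f"nf4 weights:           {nf4_gib:.2f} GiB")
print(f"nf4 + double quant:    {nf4_double_quant_gib:.2f} GiB")
```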

In [None]:
model = AutoModelForCausalLM.from_pretrained(
        "tiiuae/falcon-7b-instruct",
        quantization_config = bnb_config,
        device_map = {"": 0},
        trust_remote_code = True
)

In [None]:
peft_config = LoraConfig(
    lora_alpha = 16,
    lora_dropout = 0.1,
    r = 64,
    bias = "none",
    task_type = "CAUSAL_LM",
    target_modules = [
        "query_key_value"
        # Other possible layers
        # "dense",
        # "dense_h_to_4h",
        # "dense_4h_to_h",
    ],
)

In [None]:
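# The key/value cache only helps during generation, so disable it for training
model.config.use_cache = False
# Wrap the quantized base model with the LoRA adapters defined above
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()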
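
The number reported by `print_trainable_parameters()` can be cross-checked by hand: LoRA adds two matrices per targeted linear layer, `A` of shape `r x in_features` and `B` of shape `out_features x r`. The sketch below assumes Falcon-7B's dimensions (hidden size 4544, head size 64, 32 layers, multi-query attention with a single shared key/value head); these values are assumptions and are not read from the model config, and `lora_params` is a hypothetical helper.

```python
def lora_params(in_features, out_features, r):
    """Parameters added by one LoRA adapter: A (r x in) plus B (out x r)."""
    return r * (in_features + out_features)

# Assumed Falcon-7B dimensions (check model.config before relying on them)
HIDDEN_SIZE = 4544
HEAD_DIM = 64
N_LAYERS = 32
R = 64  # same rank as in peft_config

# Multi-query attention: full-width queries plus one key head and one value head
qkv_out_features = HIDDEN_SIZE + 2 * HEAD_DIM

per_layer = lora_params(HIDDEN_SIZE, qkv_out_features, R)
total_trainable = N_LAYERS * per_layer
print(f"{total_trainable:,} trainable LoRA parameters")
```

If the printed figure differs from what PEFT reports, the assumed dimensions are the first thing to check.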

In [None]:
tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-7b-instruct", trust_remote_code = True)
tokenizer.pad_token = tokenizer.eos_token

In [None]:
import os
import time

dir_path = '/dbfs/mnt/boidspoc/raw/research_stuff/LoRA/Tiiuae-falcon-7b-instruct-LoRA-2/'
model_name_is = f"peft-dialogue-summary-training-{int(time.time())}___test"
# dir_path already ends with '/', so join the parts instead of adding another separator
output_dir = os.path.join(dir_path, model_name_is)

In [None]:
training_arguments = TrainingArguments(
    output_dir = output_dir,
    per_device_train_batch_size = 1,
    gradient_accumulation_steps = 4,
    optim = 'paged_adamw_32bit',
    save_steps = 500, #250
    fp16 = True,
    logging_steps = 100,
    learning_rate = 2e-4,
    max_grad_norm = 0.3,
    max_steps = 1000, # 10000
    warmup_ratio = 0.03, # has no effect with the "constant" scheduler; "constant_with_warmup" would apply it
    lr_scheduler_type = "constant",
)

In [None]:
model.config.use_cache = False
dataset = load_dataset("timdettmers/openassistant-guanaco", split = "train")

In [None]:
trainer = SFTTrainer(
    model = model,
    train_dataset = dataset,
    peft_config = peft_config,
    dataset_text_field = "text",
    max_seq_length = 512,
    tokenizer = tokenizer,
    args = training_arguments,
    packing = True,
)
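
To get a feel for how much data 1000 steps actually covers, the training settings above can be turned into a token budget. This is a back-of-the-envelope sketch: it assumes a single GPU and that, with `packing = True`, every training sample is a full chunk of `max_seq_length` tokens.

```python
# Values copied from TrainingArguments and SFTTrainer above
per_device_train_batch_size = 1
gradient_accumulation_steps = 4
max_seq_length = 512
max_steps = 1000
num_devices = 1  # assumption: one GPU, as implied by device_map = {"": 0}

# Sequences that contribute to a single optimizer update
sequences_per_step = per_device_train_batch_size * gradient_accumulation_steps * num_devices
# With packing, each sequence is assumed to be completely filled
tokens_per_step = sequences_per_step * max_seq_length
total_tokens = tokens_per_step * max_steps

print(f"{sequences_per_step} sequences per optimizer step")
print(f"{tokens_per_step:,} tokens per optimizer step")
print(f"{total_tokens:,} tokens over {max_steps} steps")
```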

In [None]:
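# Keep normalization layers in float32 for numerical stability under fp16 training
for name, module in trainer.model.named_modules():
    if "norm" in name:
        module.to(torch.float32)  # nn.Module.to() casts the parameters in place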

In [None]:
trainer.train() # Took about 16 h; loss went from 1.9 to roughly 0.8

In [None]:
model.save_pretrained(f"{output_dir}/output_dir")

## Inference

In [None]:
compute_dtype = torch.float16
bnb_config = BitsAndBytesConfig(
    load_in_4bit = True,
    bnb_4bit_quant_type = "nf4",
    bnb_4bit_compute_dtype = compute_dtype,
    bnb_4bit_use_double_quant = True,
)

In [None]:
model = AutoModelForCausalLM.from_pretrained(
    "tiiuae/falcon-7b-instruct", quantization_config = bnb_config, device_map = "auto", trust_remote_code = True
)

In [None]:
model = PeftModel.from_pretrained(model, f"{output_dir}/output_dir", local_files_only = True)

In [None]:
tok = AutoTokenizer.from_pretrained('tiiuae/falcon-7b-instruct')
tok.pad_token = tok.eos_token

In [None]:
prompt = "Write a 4chan style greentext about someone who loves the new romantic comedy movie, with an ironic twist that re-contextualizes the story at the end of it. It should start with '>be me"

In [None]:
peft_encoding = tok(prompt, return_tensors = "pt").to("cuda:0")
peft_outputs = model.generate(
  input_ids = peft_encoding.input_ids,
  attention_mask = peft_encoding.attention_mask, # a model input, not a generation setting
  generation_config = GenerationConfig(
    max_new_tokens = 256,
    pad_token_id = tok.eos_token_id,
    eos_token_id = tok.eos_token_id,
    # temperature and top_p only take effect with do_sample = True;
    # without it, decoding is greedy
    temperature = 0.1,
    top_p = 0.1,
    repetition_penalty = 1.2,
    num_return_sequences = 1,
  )
)
peft_text_output = tok.decode(peft_outputs[0], skip_special_tokens = True)
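
For intuition about what `temperature = 0.1` and `top_p = 0.1` would do once sampling is enabled, here is a conceptual sketch in plain Python. It is not the transformers implementation; `softmax_with_temperature` and `top_p_filter` are hypothetical helpers, and the three logits are made-up values.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn logits into probabilities; a lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_filter(probs, top_p):
    """Keep the smallest set of most likely tokens whose cumulative probability
    reaches top_p, then renormalize over that set."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}

toy_logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens

sharp = top_p_filter(softmax_with_temperature(toy_logits, 0.1), 0.1)
broad = top_p_filter(softmax_with_temperature(toy_logits, 1.0), 0.9)
print(sharp)  # only the top token survives
print(broad)  # all three tokens remain candidates
```

With values this low, sampling behaves almost like greedy decoding, so the two settings mostly matter when they are raised.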

Printing `peft_text_output` produced the following output:

    '>be me ### Assistant: Hi there!\n\nI recently watched the new romantic comedy movie on 4chan. I must say, I was pleasantly surprised by the plot twists and turns. I especially enjoyed the way the story played with my expectations and made me question my assumptions.\n\nThe story centered around a guy named John who had just broken up with his ex-girlfriend. The movie followed John as he navigated his post-breakup blues and tried to find himself again. Along the way, he met a woman named Sarah, who swept him off his feet.\n\nAt first glance, it seemed like a typical romcom scenario. But as the story progressed, things started to get interesting. John discovered that Sarah was actually an escort working out of Las Vegas. This revelation completely changed the game for him and opened up a whole new world of possibilities.\n\nJohn quickly realized that he could make a lot of money by exploiting this situation. He started advertising himself as an escort on social media and began charging clients for dates and companionship. Before long, he was making more money than he ever had before and living a life of luxury.\n\nBut then something unexpected happened.'
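
The `### Assistant:` marker in the output comes from the training data: the `timdettmers/openassistant-guanaco` texts use `### Human:` and `### Assistant:` turns. Wrapping the prompt in the same template and cutting the reply at the next `### Human:` tag may give cleaner results. The sketch below is untested against the model; `build_prompt` and `extract_response` are hypothetical helpers, and the exact whitespace around the tags is an assumption.

```python
HUMAN_TAG = "### Human:"
ASSISTANT_TAG = "### Assistant:"

def build_prompt(user_message):
    """Wrap a user message in the Guanaco-style template (spacing is an assumption)."""
    return f"{HUMAN_TAG} {user_message.strip()}{ASSISTANT_TAG}"

def extract_response(generated_text):
    """Return only the first assistant turn from decoded model output."""
    _, found, tail = generated_text.partition(ASSISTANT_TAG)
    if not found:
        # No assistant tag in the text: return it unchanged apart from whitespace
        return generated_text.strip()
    # Drop anything the model generated for a following human turn
    reply, _, _ = tail.partition(HUMAN_TAG)
    return reply.strip()

example_prompt = build_prompt("Write a short greentext.")
example_reply = extract_response(
    example_prompt + " >be me\n>watch movie### Human: another one"
)
print(example_prompt)
print(example_reply)
```

In the inference cells, this would mean passing `build_prompt(prompt)` to the tokenizer and `extract_response(peft_text_output)` to the final print.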

The output could certainly be better, but this is only an example meant to show how to use the PEFT library with LoRA/QLoRA.