This notebook goes over a qlora example with a baby LLM (and my baby GPU) to get a feel for CUDA quantization

In [1]:
# custom utility functions
from src.torch_utils import gpu_summary, clear_gpu
from src.quantize import quantize, unquantize

In [2]:
import torch
import numpy as np
import gc
import pandas as pd
from transformers import pipeline
from transformers import AutoModelForSequenceClassification, AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from datasets import load_dataset, load_dataset_builder

model_name = 'bigscience/bloomz-560m'

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

foundation_model = AutoModelForCausalLM.from_pretrained(
    model_name, 
    config=bnb_config,
    device_map = {"": 0}
    )

tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

  from .autonotebook import tqdm as notebook_tqdm


In [8]:
input = tokenizer("How many wheels does a car have?", return_tensors='pt').to('cuda')
tokenized_output = foundation_model.generate(
    input_ids=input['input_ids'],
    attention_mask=input['attention_mask'],
    max_new_tokens=100,
    early_stopping=False,
    eos_token_id=tokenizer.eos_token_id,
    )
output = tokenizer.decode(tokenized_output[0], skip_special_tokens=True)
output

'How many wheels does a car have? four'

In [9]:
from datasets import load_dataset
dataset = "fka/awesome-chatgpt-prompts"

#Create the Dataset to create prompts.
data = load_dataset(dataset)

data = data.map(lambda samples: tokenizer(samples["prompt"]), batched=True)
train_sample = data["train"].select(range(50))

del data
train_sample = train_sample.remove_columns('act')

display(train_sample)

Dataset({
    features: ['prompt', 'input_ids', 'attention_mask'],
    num_rows: 50
})

In [10]:
import peft
from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=16, #As bigger the R bigger the parameters to train.
    lora_alpha=16, # a scaling factor that adjusts the magnitude of the weight matrix. It seems that as higher more weight have the new training.
    target_modules=["query_key_value"],
    lora_dropout=0.05, #Helps to avoid Overfitting.
    bias="none", # this specifies if the bias parameter should be trained.
    task_type="CAUSAL_LM"
)

output_dir = '/home/smckean/Produced/peft_poc_output'

In [15]:
from trl import SFTTrainer
from transformers import TrainingArguments # , Trainer

training_args = TrainingArguments(
    output_dir=output_dir,
    num_train_epochs=10,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    warmup_steps=500,
    weight_decay=0.01,
    logging_dir='./logs',
)


# Initialize the SFTTrainer
trainer = SFTTrainer(
    model=foundation_model,
    args=training_args,
    train_dataset=train_sample,
    tokenizer=tokenizer,
    peft_config=lora_config,
    dataset_text_field="prompt",
)



In [16]:
trainer.train()

                                                 


{'loss': 2.9028, 'grad_norm': 3.9721927642822266, 'learning_rate': 0.0, 'epoch': 10.0}


                                                 
100%|██████████| 500/500 [01:21<00:00,  6.13it/s]

{'train_runtime': 81.5281, 'train_samples_per_second': 6.133, 'train_steps_per_second': 6.133, 'train_loss': 2.90278466796875, 'epoch': 10.0}





TrainOutput(global_step=500, training_loss=2.90278466796875, metrics={'train_runtime': 81.5281, 'train_samples_per_second': 6.133, 'train_steps_per_second': 6.133, 'total_flos': 84255525519360.0, 'train_loss': 2.90278466796875, 'epoch': 10.0})

In [22]:
import os
peft_model_path = os.path.join(output_dir, f"lora_model")
trainer.model.save_pretrained(peft_model_path)



In [23]:
from peft import AutoPeftModelForCausalLM, PeftConfig
bnb_config2 = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

loaded_model = AutoPeftModelForCausalLM.from_pretrained(
    peft_model_path,
    #torch_dtype=torch.bfloat16,
    is_trainable=False,
    #load_in_4bit=True,
    quantization_config=bnb_config2,
    device_map = 'cuda')

  0%|          | 0/12 [07:13<?, ?it/s]


In [29]:
import textwrap

In [36]:
input = tokenizer("I want you to act as a motivational coach. ", return_tensors='pt').to('cuda')
tokenized_output = foundation_model.generate(
    input_ids=input['input_ids'],
    attention_mask=input['attention_mask'],
    max_new_tokens=1000,
    early_stopping=False,
    repetition_penalty=3.1,
    eos_token_id=tokenizer.eos_token_id,
    )
output = tokenizer.decode(tokenized_output[0], skip_special_tokens=True)
wrapper = textwrap.TextWrapper(width=80)  # replace 80 with your desired line width
wrapped_output = wrapper.fill(output)

print(wrapped_output)

I want you to act as a motivational coach.  I will provide some information
about myself and what i can do in order for my clients achieve their goals. My
first step is giving them instructions on how they should approach the
challenges of life, such as:  "I need help figuring out if this job could be
better suited than other jobs available at work; it would also make me feel more
confident when applying..." "My client needs advice regarding improving his
current financial situation so that he may increase income while still being
able affording luxury cars ... "Your goal here are two things - both achievable
but not mutually exclusive: 1) To motivate your clientele through effective
communication skills 2)...


In [34]:
input = tokenizer("I want you to act as a motivational coach. ", return_tensors='pt').to('cuda')
tokenized_output = loaded_model.generate(
    input_ids=input['input_ids'],
    attention_mask=input['attention_mask'],
    max_new_tokens=1000,
    early_stopping=False,
    repetition_penalty=3.1,
    eos_token_id=tokenizer.eos_token_id,
    )
output = tokenizer.decode(tokenized_output[0], skip_special_tokens=True)
wrapper = textwrap.TextWrapper(width=80)  # replace 80 with your desired line width
wrapped_output = wrapper.fill(output)

print(wrapped_output)

I want you to act as a motivational coach.  I will provide some information
about your subject matter and ask the reader how they can improve their
performance in order that it becomes easier for them, "I" or "the other person
who is doing this task at me"". The purpose of my suggestion should be "to help
people feel more confident themselves by improving existing habits which are
causing problems such as: stress levels anxiety depression etc., so these issues
may not seem like too big an issue but instead become something less than
"outrageous"; however difficult those changes might appear.   My first request
would be: “I need someone else with experience helping individuals overcome
challenges related to: health conditions/mental illnesses / personality
disorders”  “Personal development needs improvement”; “The effects on others’
well-being have been studied; therefore there has also recently emerged interest
from academics regarding possible solutions ... ” "My goal here could invol