### Goal:
#### Use LoRA to adapt only certain parts of the model (the attention layers) to save memory and speed up training.
#### - Use the `lora_alpha` and `lora_dropout` parameters to scale the updates and reduce overfitting.
#### - Fine-tune with a smaller dataset and less computational power than full model fine-tuning requires.


In [2]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import get_peft_model, LoraConfig



#### Scenario: Customizing an LLM for tech customer support to answer software-related questions effectively


In [None]:
# Load a pre-trained language model and its tokenizer
model_name = "gpt2"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development


In [4]:
# Set up the LoRA configuration
lora_config = LoraConfig(
    r=8,  # Rank of the low-rank decomposition
    lora_alpha=32,  # Scaling factor; the LoRA update is scaled by lora_alpha / r
    lora_dropout=0.1,  # Dropout applied on the LoRA path for regularization
    target_modules=["attn.c_attn"],  # Adapt only the attention projection in each block
    fan_in_fan_out=True,  # GPT-2's Conv1D stores weights as (fan_in, fan_out)
)
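
To make the roles of `r` and `lora_alpha` more concrete, here is a minimal plain-Python sketch of the usual LoRA formulation, in which the frozen output `W x` is combined with a low-rank update `B (A x)` scaled by `lora_alpha / r`. The helper names `matvec` and `lora_forward` and the tiny matrices are made up for illustration, and dropout is left out; this is not how PEFT implements the layer internally.

```python
# Plain-Python sketch of a LoRA-adapted linear map:
#   y = W x + (lora_alpha / r) * B (A x)
# The tiny matrices below are made-up illustration values, not GPT-2 weights.

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]

def lora_forward(W, A, B, x, lora_alpha, r):
    """Return the frozen base output plus the scaled low-rank update."""
    scaling = lora_alpha / r
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + scaling * u for b, u in zip(base, update)]

# Values from the configuration above: 32 / 8 gives a scaling of 4.0.
r, lora_alpha = 8, 32
scaling = lora_alpha / r

# Toy example: input size 2, output size 2, rank 1 (hypothetical numbers).
W = [[1.0, 0.0], [0.0, 1.0]]   # frozen base weight
A = [[0.5, 0.5]]               # shape: rank x in_features
B_zero = [[0.0], [0.0]]        # shape: out_features x rank, all zeros
x = [2.0, 4.0]

# With B at zero the adapter contributes nothing, so the output equals W x.
y_initial = lora_forward(W, A, B_zero, x, lora_alpha=4, r=1)

# Once B is non-zero, the update is added with the lora_alpha / r scaling.
y_updated = lora_forward(W, A, [[1.0], [0.0]], x, lora_alpha=4, r=1)
```

With PEFT's default initialization the `B` matrix starts at zero, so the wrapped model should initially behave like the base model; raising `lora_alpha` relative to `r` amplifies whatever the adapter learns afterwards.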

In [5]:
# Wrap the base model so that only the LoRA adapter weights are trainable
model = get_peft_model(model, lora_config)
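
As a rough check on the memory-saving claim, the sketch below counts the adapter weights by hand. It assumes the shapes shown in the model printout later in this notebook (`c_attn` maps 768 to 2304 features in each of 12 blocks) and compares them with the size of the `c_attn` weight matrices alone, not the whole model.

```python
# Back-of-the-envelope count of trainable LoRA parameters for this setup.
# Shapes are taken from the printed model: c_attn maps 768 -> 2304 in 12 blocks.
in_features, out_features = 768, 2304
num_blocks = 12
r = 8

# lora_A has shape (r, in_features); lora_B has shape (out_features, r).
lora_per_layer = r * in_features + out_features * r
lora_total = lora_per_layer * num_blocks

# Weight matrix of c_attn itself (bias excluded), for comparison.
full_per_layer = in_features * out_features
full_total = full_per_layer * num_blocks

print(f"LoRA trainable parameters: {lora_total:,}")
print(f"c_attn weights if trained directly: {full_total:,}")
print(f"Ratio: {lora_total / full_total:.2%}")
```

On the real wrapped model, `model.print_trainable_parameters()` reports the trainable and total parameter counts, which is a quick way to confirm the hand calculation.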



In [6]:
# Preparing the model for training with LoRA
# Using a small dataset for fine-tuning
train_data = [
    {"input": "How do I resolve software crashes on Windows 10?", "output": "Try updating your drivers and checking for system updates."},
    {"input": "What should I do if an application stops responding?", "output": "You can use Task Manager to force close the app and restart it."},
]

In [11]:
# Tokenize input and output data
train_encodings = []
for item in train_data:
    input_ids = tokenizer(item['input'], return_tensors='pt').input_ids
    output_ids = tokenizer(item['output'], return_tensors='pt').input_ids

    # The model expects labels with the same sequence length as input_ids
    # (dimension 1 is the token count, not the batch size). As a shortcut for
    # this demo, keep only pairs whose token counts happen to match.
    # Caveat: every pair with differing lengths is discarded.
    if input_ids.size(1) == output_ids.size(1):
        train_encodings.append((input_ids, output_ids))
    else:
        print(f"Skipping training pair due to size mismatch: input size {input_ids.size(1)}, output size {output_ids.size(1)}")

Skipping training pair due to size mismatch: input size 10, output size 14
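
The skipped pair shows the cost of requiring equal lengths. A common alternative for causal language models is to concatenate prompt and response into one sequence and mask the prompt positions in the labels with `-100`, which the loss ignores. The sketch below illustrates that idea on plain lists of token ids; `build_example`, `IGNORE_INDEX`, and the ids are hypothetical and are not part of the original notebook.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss skips by default

def build_example(prompt_ids, response_ids, eos_id):
    """Concatenate prompt and response ids and mask the prompt in the labels."""
    input_ids = list(prompt_ids) + list(response_ids) + [eos_id]
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(response_ids) + [eos_id]
    return input_ids, labels

# Made-up token ids standing in for tokenizer output; the lengths (3 and 5)
# differ on purpose to show that no pair needs to be skipped.
prompt_ids = [11, 12, 13]
response_ids = [21, 22, 23, 24, 25]
input_ids, labels = build_example(prompt_ids, response_ids, eos_id=0)

print(input_ids)  # prompt tokens, then response tokens, then EOS
print(labels)     # prompt positions masked, response and EOS kept
```

With the notebook's tokenizer, the ids would come from `tokenizer(item['input']).input_ids` and `tokenizer.eos_token_id`, and each list would be wrapped as `torch.tensor([ids])` to add the batch dimension before being passed to the model as `input_ids` and `labels`.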


In [8]:
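# Create the optimizer for the training loop; after wrapping, only the LoRA
# weights require gradients, so the frozen base weights are never updated
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)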

In [9]:
model.train()


PeftModel(
  (base_model): LoraModel(
    (model): GPT2LMHeadModel(
      (transformer): GPT2Model(
        (wte): Embedding(50257, 768)
        (wpe): Embedding(1024, 768)
        (drop): Dropout(p=0.1, inplace=False)
        (h): ModuleList(
          (0-11): 12 x GPT2Block(
            (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
            (attn): GPT2SdpaAttention(
              (c_attn): lora.Linear(
                (base_layer): Conv1D(nf=2304, nx=768)
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.1, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=768, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=2304, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (l

In [12]:
for epoch in range(3):  # Running for three epochs for testing
    for input_ids, output_ids in train_encodings:
        outputs = model(input_ids=input_ids, labels=output_ids)
        loss = outputs.loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        print(f"Epoch {epoch + 1}, Loss: {loss.item()}")

Epoch 1, Loss: 8.916942596435547
Epoch 2, Loss: 8.262154579162598
Epoch 3, Loss: 8.393136024475098


In [13]:
model.eval()



In [15]:
with torch.no_grad():
    input_prompt = "How can I fix slow performance on my laptop?"
    input_ids = tokenizer(input_prompt, return_tensors='pt').input_ids
    output = model.generate(input_ids, max_length=80)
    response = tokenizer.decode(output[0], skip_special_tokens=True)

print("Generated response:", response)

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:None for open-end generation.


Generated response: How can I fix slow performance on my laptop?

The best way to fix slow performance on your laptop is to use a USB-C port. This port is located on the back of the laptop. It is located on the back of the laptop and is connected to the USB-C port.

If you have a USB-C port, you can use a USB-C port
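
The warning printed before the response says that no attention mask or pad token id was supplied. One way to address it is sketched below, assuming the `model` and `tokenizer` objects created earlier: the tokenizer's attention mask is passed through, and the EOS token is reused as the pad token, since GPT-2 does not define one. The helper name `generate_reply` is hypothetical.

```python
def generate_reply(model, tokenizer, prompt, max_new_tokens=60):
    """Generate a reply, passing the attention mask and a pad token id explicitly."""
    encoded = tokenizer(prompt, return_tensors="pt")
    output = model.generate(
        input_ids=encoded["input_ids"],
        attention_mask=encoded["attention_mask"],
        pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no pad token, so reuse EOS
        max_new_tokens=max_new_tokens,        # limits generated tokens, not total length
    )
    return tokenizer.decode(output[0], skip_special_tokens=True)
```

It could be called as `generate_reply(model, tokenizer, "How can I fix slow performance on my laptop?")`. This only addresses the warning; the repetitive text in the response above is more likely a consequence of greedy decoding and the very small training set.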
