### Summary of LoRA Fine-Tuning Workflow

- Loaded a pretrained causal language model (`bloomz-560m`) and its tokenizer.
- Prepared the prompts dataset (`fka/awesome-chatgpt-prompts`) and tokenized the input prompts.
- Configured LoRA (Low-Rank Adaptation) to fine-tune only a small subset of parameters efficiently.
- Wrapped the base model with LoRA adapters using the `peft` library.
- Set up training using Hugging Face `Trainer` with efficient training arguments.
- Fine-tuned the LoRA model on a subset of prompts.
- Saved the fine-tuned LoRA adapters separately from the base model.
- Loaded the saved LoRA adapters onto the base model for inference.
- Generated motivational coaching style text using the LoRA-adapted model.



# Load the PEFT and Datasets Libraries.

The PEFT library contains the Hugging Face implementation of different fine-tuning techniques, such as LoRA.

The Datasets library gives us access to a huge number of datasets.

In [4]:
# !pip install -q peft==0.8.2
# !pip install -q datasets==2.16.1

From the transformers library we import the classes needed to load the model and the tokenizer.

Then we can load the tokenizer and the model.

BLOOM is one of the smallest capable model families that can be fine-tuned with the PEFT library using LoRA. You can use any of the models in the BLOOM family; I encourage you to try at least two of them and compare the results.

I'm using the smallest one to spend less time training and to avoid memory problems in Colab.

In [1]:
'''This imports two classes from the transformers library by Hugging Face:
AutoModelForCausalLM: A generic class to load causal language models (used for text generation).
AutoTokenizer: Automatically loads the appropriate tokenizer for the chosen model. Tokenizers are responsible for converting text into tokens that the model can understand.'''
from transformers import AutoModelForCausalLM, AutoTokenizer
'''Sets the model_name variable to "bigscience/bloomz-560m", which is a pre-trained language model from the BigScience project.
It has 560 million parameters. The second line is commented out, but if used, it would switch to a larger version of the model (1.1 billion parameters).'''
model_name = "bigscience/bloomz-560m"
#model_name="bigscience/bloom-1b1"
'''Downloads and loads the tokenizer associated with the given model name from Hugging Face’s model hub.'''
tokenizer = AutoTokenizer.from_pretrained(model_name)
'''Downloads and loads the pre-trained causal language model specified by model_name.'''
foundation_model = AutoModelForCausalLM.from_pretrained(model_name)



## Inference with the pre-trained model.
I'm going to test the pre-trained model before fine-tuning, so we can see what changes afterwards.

In [2]:
#This function returns the token ids generated by the given model for the given inputs.
'''model: The language model (e.g. foundation_model from earlier).
inputs: A dictionary (from the tokenizer) containing at least input_ids and attention_mask.
max_new_tokens: Optional argument to set how many new tokens to generate. Default is 100.'''
def get_outputs(model, inputs, max_new_tokens=100):
    '''Calls the .generate() method from the Hugging Face transformers library, which generates text from the model.
    input_ids: The tokenized input text (e.g. prompt).
    attention_mask: Tells the model which tokens to pay attention to (1) and which to ignore (0).
    max_new_tokens: The maximum number of new tokens to generate in addition to the input.
    repetition_penalty=1.5: Helps reduce repetition in the generated text. A value >1 discourages the model from repeating the same text.
    early_stopping=True: Only affects beam search (num_beams > 1); with the default greedy decoding used here it has no effect.
    eos_token_id: Tells the model which token marks "end of sequence" so generation stops when it is produced (uses the global tokenizer loaded earlier).'''
    outputs = model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_new_tokens=max_new_tokens,
        repetition_penalty=1.5, #Avoid repetition.
        early_stopping=True, #Only has an effect with beam search.
        eos_token_id=tokenizer.eos_token_id
    )
    return outputs
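
To make `repetition_penalty=1.5` concrete, here is a minimal pure-Python sketch of the rule as I understand it: the score of every token already present in the sequence is divided by the penalty when it is positive and multiplied by it when it is negative, so the token becomes less likely either way. The function name `apply_repetition_penalty` and the toy scores are illustrative assumptions, not part of the library.

```python
def apply_repetition_penalty(scores, previous_token_ids, penalty=1.5):
    """Return a copy of `scores` with already-seen tokens made less likely.

    scores: list of raw logits, one per vocabulary entry.
    previous_token_ids: token ids already present in the sequence.
    """
    penalized = list(scores)
    for token_id in set(previous_token_ids):
        score = penalized[token_id]
        # Positive logits shrink, negative logits move further below zero.
        penalized[token_id] = score / penalty if score > 0 else score * penalty
    return penalized


toy_scores = [3.0, -1.0, 0.5, 2.0]  # hypothetical logits for a 4-token vocabulary
print(apply_repetition_penalty(toy_scores, previous_token_ids=[0, 1]))
```

Tokens 0 and 1 were already generated, so only their scores change; tokens 2 and 3 keep their original logits.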

The dataset used for the fine-tuning contains prompts to be used with Large Language Models.

I'm going to ask the pre-trained model to act like a motivational coach.

In [3]:
#Inference original model
'''Tokenizes the input string using the previously loaded tokenizer.
return_tensors="pt" means the tokenizer returns PyTorch tensors, which are required for model input.'''
input_sentences = tokenizer("I want you to act as a motivational coach. ", return_tensors="pt")
'''Calls your previously defined get_outputs() function.
Uses the foundation_model to generate a response of up to 50 new tokens.
The output is a tensor containing the full token sequence (original prompt + generated continuation).'''
foundational_outputs_sentence = get_outputs(foundation_model, input_sentences, max_new_tokens=50)
'''Decodes the output tokens back into human-readable text.
batch_decode() is used instead of decode() because the model returns a batch (even if it's just one item).
skip_special_tokens=True removes any special tokens like <pad> or <eos> that might be present.'''
print(tokenizer.batch_decode(foundational_outputs_sentence, skip_special_tokens=True))



["I want you to act as a motivational coach.  Don't be afraid of being challenged."]


I'm not sure whether the answer is correct, but it is certainly not a prompt. We need to train the model if we want it to act like a prompt engineer.
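
The decoded text repeats the prompt because `generate` returns the prompt tokens followed by the continuation. If only the new text is wanted, the prompt tokens can be sliced off before decoding. A minimal sketch with plain lists and made-up token ids; `strip_prompt` is a hypothetical helper, not something the notebook defines.

```python
def strip_prompt(output_ids, prompt_length):
    """Drop the first `prompt_length` tokens from every sequence in the batch."""
    return [sequence[prompt_length:] for sequence in output_ids]


# Hypothetical token ids: a 3-token prompt followed by 2 generated tokens.
prompt_ids = [11, 22, 33]
generated = [[11, 22, 33, 44, 55]]
print(strip_prompt(generated, len(prompt_ids)))
```

With the real PyTorch tensors, the equivalent slice for a single prompt would be `outputs[:, inputs["input_ids"].shape[1]:]`.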

# Preparing the Dataset.
The Dataset used is:

https://huggingface.co/datasets/fka/awesome-chatgpt-prompts

In [4]:
'''This imports the load_dataset function from the Hugging Face datasets library.
datasets is used to easily load, manipulate, and preprocess large datasets for NLP.'''
from datasets import load_dataset
'''This is a Hugging Face-hosted dataset named "fka/awesome-chatgpt-prompts" — a community-curated collection of interesting ChatGPT prompt examples'''
dataset = "fka/awesome-chatgpt-prompts"

#Create the Dataset to create prompts.
'''Downloads and loads the dataset.

The resulting data object is a dictionary-like structure with splits, typically including:
data["train"] — the main portion of the dataset.
Each example in this dataset typically contains fields like:
act: Describes the kind of task or role (e.g., "motivational coach").
prompt: The actual prompt to give to the language model.'''
data = load_dataset(dataset)
'''Applies the tokenizer to the "prompt" field across the dataset.
batched=True ensures the tokenizer processes batches of samples (faster and more efficient).
Adds new columns to each dataset example: input_ids, attention_mask, etc.'''
data = data.map(lambda samples: tokenizer(samples["prompt"]), batched=True)
'''Selects the first 50 examples from the train split.
Useful for: reducing computation during testing or prototyping, and quick sanity checks.'''
train_sample = data["train"].select(range(50))
'''Removes the act column from each example.
That column describes the persona or role (e.g., "motivational coach") but may not be needed if you're only working with raw prompts.'''
train_sample = train_sample.remove_columns('act')
'''Displays the processed train_sample dataset in a notebook interface (e.g., Jupyter or Google Colab).
Shows the dataset summary (column names and number of rows), which should now include: prompt, input_ids, attention_mask'''
display(train_sample)

Generating train split: 100%|████████████████████████████████████████████████| 203/203 [00:00<00:00, 6266.28 examples/s]
Map: 100%|███████████████████████████████████████████████████████████████████| 203/203 [00:00<00:00, 5322.29 examples/s]


Dataset({
    features: ['prompt', 'input_ids', 'attention_mask'],
    num_rows: 50
})
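
With `batched=True`, the function passed to `map` receives a dictionary of lists (one list per column) rather than a single example, which is why `samples["prompt"]` can be handed to the tokenizer directly. The sketch below imitates that behaviour in plain Python; `toy_tokenizer` and `batched_map` are hypothetical stand-ins, not the Datasets implementation.

```python
def toy_tokenizer(texts):
    """Hypothetical stand-in for a tokenizer: one id (the word length) per word."""
    input_ids = [[len(word) for word in text.split()] for text in texts]
    return {
        "input_ids": input_ids,
        "attention_mask": [[1] * len(ids) for ids in input_ids],
    }


def batched_map(columns, function, batch_size=2):
    """Apply `function` to slices of the columns and add the columns it returns."""
    num_rows = len(next(iter(columns.values())))
    new_columns = {}
    for start in range(0, num_rows, batch_size):
        # Each batch is a dict of lists, just like `samples` in the lambda above.
        batch = {name: values[start:start + batch_size] for name, values in columns.items()}
        for name, values in function(batch).items():
            new_columns.setdefault(name, []).extend(values)
    return {**columns, **new_columns}


columns = {"act": ["Coach", "Poet", "Chef"],
           "prompt": ["act as a coach", "write a poem", "cook"]}
mapped = batched_map(columns, lambda samples: toy_tokenizer(samples["prompt"]))
print(sorted(mapped))
print(mapped["input_ids"])
```

The original columns are kept and the tokenizer's outputs are added next to them, which matches the `features` list shown in the output above once `act` has been removed.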

In [5]:
print(train_sample[:1])

{'prompt': ['Imagine you are an experienced Ethereum developer tasked with creating a smart contract for a blockchain messenger. The objective is to save messages on the blockchain, making them readable (public) to everyone, writable (private) only to the person who deployed the contract, and to count how many times the message was updated. Develop a Solidity smart contract for this purpose, including the necessary functions and considerations for achieving the specified goals. Please provide the code and any relevant explanations to ensure a clear understanding of the implementation.'], 'input_ids': [[186402, 1152, 1306, 660, 72560, 28857, 167625, 84544, 20165, 376, 1002, 26168, 267, 30479, 17477, 613, 267, 120755, 238776, 17, 1387, 47881, 632, 427, 14565, 29866, 664, 368, 120755, 15, 16997, 4054, 136044, 375, 4859, 12, 427, 39839, 15, 9697, 1242, 375, 13614, 12, 3804, 427, 368, 2298, 5268, 109891, 368, 17477, 15, 530, 427, 11210, 4143, 7112, 11866, 368, 11011, 1620, 36320, 17, 21265,

# Fine-Tuning.
First, we need to create a LoRA configuration.


In [6]:
# TARGET_MODULES
# https://github.com/huggingface/peft/blob/39ef2546d5d9b8f5f8a7016ec10657887a867041/src/peft/utils/other.py#L220
'''Imports the PEFT (Parameter-Efficient Fine-Tuning) library, which integrates with Hugging Face Transformers.
LoraConfig: Used to define the LoRA configuration. get_peft_model: Wraps a pretrained model with LoRA. PeftModel: The resulting model class after LoRA adaptation.'''
import peft
from peft import LoraConfig, get_peft_model, PeftModel
'''| Parameter                            | Description                                                                                                                            |
| ------------------------------------ | -------------------------------------------------------------------------------------------------------------------------------------- |
| `r=4`                                | **Rank** of the low-rank decomposition. Higher = more trainable parameters.                                                            |
| `lora_alpha=1`                       | **Scaling factor**: the LoRA update is multiplied by `lora_alpha / r` (here 1/4), so a larger value strengthens the adapter.           |
| `target_modules=["query_key_value"]` | Specifies which submodules in the model to inject LoRA layers into. `"query_key_value"` is often used in transformer attention layers. |
| `lora_dropout=0.05`                  | Dropout applied **within LoRA layers** during training. Helps prevent overfitting.                                                     |
| `bias="lora_only"`                   | Only trains bias parameters that are **directly related to LoRA layers** (not all biases).                                             |
| `task_type="CAUSAL_LM"`              | Specifies the task: **Causal Language Modeling** (like GPT-style next-token prediction).                                               |
 '''
lora_config = LoraConfig(
    r=4, # The larger r is, the more parameters there are to train.
    lora_alpha=1, # Scaling factor: the LoRA update is multiplied by lora_alpha / r.
    target_modules=["query_key_value"], # The list of target modules is available at the URL above.
    lora_dropout=0.05, # Helps to avoid overfitting.
    bias="lora_only", # Specifies which bias parameters are trained.
    task_type="CAUSAL_LM"
)

The most important parameter is **r**: it defines how many parameters will be trained. The larger the value, the more parameters are trained: training gets heavier, but the model is able to learn more complicated relations between input and output.

You can find a list of the available **target_modules** in the [PEFT source code](https://github.com/huggingface/peft/blob/39ef2546d5d9b8f5f8a7016ec10657887a867041/src/peft/utils/other.py#L220).

**lora_dropout** works like ordinary dropout and is used to avoid overfitting.

**bias**: I hesitated between *none* and *lora_only*. For text classification the most common value is *none*; for chat or question answering, *all* or *lora_only*.

**task_type** indicates the task the model is being trained for. In this case, text generation.
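
To see what `r` and `lora_alpha` do, here is a minimal pure-Python sketch of a LoRA-adapted linear layer: the frozen weight `W` is left untouched and a low-rank product `B A`, scaled by `lora_alpha / r`, is added to its output. The tiny matrices and helper names are illustrative assumptions; dropout and bias are left out.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, r, lora_alpha):
    """Compute W x + (lora_alpha / r) * B (A x).

    W: frozen weight, shape (out, in).
    A: trainable down-projection, shape (r, in).
    B: trainable up-projection, shape (out, r).
    """
    scaling = lora_alpha / r
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + scaling * u for b, u in zip(base, update)]


W = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0]]  # 2 outputs, 3 inputs (toy sizes)
A = [[1.0, 1.0, 1.0]]                    # rank r = 1
x = [1.0, 2.0, 3.0]

# B starts at zero, so before training the adapted layer matches the base layer.
print(lora_forward(W, A, [[0.0], [0.0]], x, r=1, lora_alpha=1))
# Once B has been trained away from zero, the output shifts.
print(lora_forward(W, A, [[0.5], [-1.0]], x, r=1, lora_alpha=1))
```

Only `A` and `B` are trained, which is `r * (in + out)` values per adapted layer instead of `in * out`.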

### Create the PEFT model.



In [7]:
'''This wraps your original foundation_model (e.g., BLOOMZ-560M) with LoRA adapters using the lora_config you defined earlier.
The result is a new model (peft_model) where: Most of the original model’s parameters are frozen (not trainable).
Only small LoRA modules inside query_key_value layers (and optionally some bias terms) are trainable.'''
peft_model = get_peft_model(foundation_model, lora_config)
peft_model.print_trainable_parameters()

trainable params: 466,944 || all params: 559,607,808 || trainable%: 0.0834


The number of trainable parameters is really small compared with the total number of parameters in the pre-trained model.
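
The reported count can be reproduced by hand. The sketch below assumes the `bloomz-560m` architecture has a hidden size of 1024 and 24 transformer blocks, each with one fused `query_key_value` projection of output size `3 * hidden_size`; those figures are my assumptions and do not appear in this notebook. `lora_trainable_params` is a hypothetical helper.

```python
def lora_trainable_params(hidden_size, num_layers, r, train_lora_bias=True):
    """Count trainable parameters when LoRA targets the fused query_key_value layer."""
    qkv_out = 3 * hidden_size                  # query, key and value share one Linear
    lora_a = r * hidden_size                   # down-projection A: (r, hidden_size)
    lora_b = qkv_out * r                       # up-projection B: (3 * hidden_size, r)
    bias = qkv_out if train_lora_bias else 0   # bias="lora_only" also trains this bias
    return num_layers * (lora_a + lora_b + bias)


# Assumed architecture: hidden size 1024, 24 transformer blocks, r=4 as configured.
trainable = lora_trainable_params(hidden_size=1024, num_layers=24, r=4)
total = 559_607_808                            # "all params" from the output above
print(f"{trainable:,} trainable ({100 * trainable / total:.4f}%)")
```

Under these assumptions the LoRA matrices account for 393,216 parameters and the `lora_only` biases for the remaining 73,728.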

In [8]:
#Build the path of the directory that will contain the model.
import os
working_dir = './'

output_directory = os.path.join(working_dir, "peft_lab_outputs")

In TrainingArguments we set the number of epochs to train, the output directory and the learning rate.

In [9]:
#Creating the TrainingArgs
'''Brings in the Trainer API from the Hugging Face Transformers library, which simplifies model training.
TrainingArguments is used to configure the training process (output paths, learning rate, batch size, etc.).'''
import transformers
from transformers import TrainingArguments, Trainer
'''| Parameter                     | Description                                                                                                                                   |
| ----------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------- |
| `output_dir=output_directory` | Directory to save model checkpoints and logs. Make sure `output_directory` is defined beforehand (e.g., as a string like `"./lora_outputs"`). |
| `auto_find_batch_size=True`   | Starts from the configured batch size and retries with a smaller one when training runs out of memory.                                        |
| `learning_rate=3e-2`          | This is **0.03**, which is relatively high — but that’s okay for LoRA, since you're only training a small number of parameters.               |
| `num_train_epochs=2`          | The number of times to iterate over the training dataset.                                                                                     |
| `use_cpu=True`                | Forces training on CPU instead of GPU. This is fine for prototyping or small models, but **slow** for larger ones.                            |
 '''
training_args = TrainingArguments(
    output_dir=output_directory,
    auto_find_batch_size=True, # Retry with a smaller batch size if training runs out of memory.
    learning_rate= 3e-2, # Higher learning rate than full fine-tuning.
    num_train_epochs=2,
    #use_cpu=True
)
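
The idea behind `auto_find_batch_size` is a retry loop: try the configured batch size and, if the run hits an out-of-memory error, try again with a smaller one. The sketch below halves the size on each failure. The exact reduction rule depends on the installed library versions, so treat the halving, the function names and the simulated `MemoryError` as assumptions rather than the real implementation.

```python
def retry_with_smaller_batch(train_fn, starting_batch_size=8):
    """Call `train_fn(batch_size)` with ever smaller batch sizes until it succeeds."""
    batch_size = starting_batch_size
    while batch_size > 0:
        try:
            return batch_size, train_fn(batch_size)
        except MemoryError:
            batch_size //= 2  # assumed reduction rule: halve and retry
    raise RuntimeError("No batch size fits in memory.")


def fake_training_run(batch_size):
    """Hypothetical training run that only fits batches of at most 3 samples."""
    if batch_size > 3:
        raise MemoryError("simulated out-of-memory error")
    return f"trained with batch size {batch_size}"


print(retry_with_smaller_batch(fake_training_run))
```

Here the attempts with 8 and 4 samples fail and the run succeeds with 2; the search never tries anything larger than the starting size.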

Now we can train the model.
To train the model we need:


* The PEFT model.
* The training arguments (`training_args`).
* The dataset.
* A data collator, which gets the dataset ready to be processed in batches.
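
To illustrate the last item, here is a plain-Python sketch of what a causal-LM collator does with a batch: pad every sequence to the longest one, build the attention mask, and copy `input_ids` into `labels` with the padding positions replaced by -100 so the loss ignores them. `collate_for_causal_lm`, the pad id 3 and left padding are assumptions made for the example; the real collator works on tensors.

```python
IGNORE_INDEX = -100  # label value that the cross-entropy loss skips


def collate_for_causal_lm(batch_input_ids, pad_token_id, padding_side="left"):
    """Pad a batch to its longest sequence and build causal-LM labels."""
    max_len = max(len(ids) for ids in batch_input_ids)
    input_ids, attention_mask = [], []
    for ids in batch_input_ids:
        n_pad = max_len - len(ids)
        if padding_side == "left":
            input_ids.append([pad_token_id] * n_pad + ids)
            attention_mask.append([0] * n_pad + [1] * len(ids))
        else:
            input_ids.append(ids + [pad_token_id] * n_pad)
            attention_mask.append([1] * len(ids) + [0] * n_pad)
    # Labels are a copy of the inputs; pad tokens are masked out of the loss.
    labels = [[IGNORE_INDEX if t == pad_token_id else t for t in row] for row in input_ids]
    return {"input_ids": input_ids, "attention_mask": attention_mask, "labels": labels}


batch = collate_for_causal_lm([[5, 6, 7], [8]], pad_token_id=3)
for name, rows in batch.items():
    print(name, rows)
```

The labels are not shifted here: for causal language modeling the model itself shifts them by one position when it computes the loss.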





In [10]:
#This cell may take up to 15 minutes to execute.
'''| Parameter                    | Purpose                                                                                                                                                                                                                                                                                          |
| ---------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `model=peft_model`           | Uses the **LoRA-wrapped model** (from `get_peft_model`) for training.                                                                                                                                                                                                                            |
| `args=training_args`         | Passes in your previously defined `TrainingArguments`.                                                                                                                                                                                                                                           |
| `train_dataset=train_sample` | The training dataset — in your case, the **first 50 tokenized prompts** from the ChatGPT prompt dataset.                                                                                                                                                                                         |
| `data_collator=...`          | This dynamically pads the inputs in a batch so they are of equal length, which is required by PyTorch. <br> You're using `DataCollatorForLanguageModeling` with `mlm=False`, meaning you're training with **causal language modeling** (like GPT), **not masked language modeling** (like BERT). |
 '''
trainer = Trainer(
    model=peft_model,
    args=training_args,
    train_dataset=train_sample,
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False)
)
trainer.train()



TrainOutput(global_step=14, training_loss=2.642571040562221, metrics={'train_runtime': 199.9828, 'train_samples_per_second': 0.5, 'train_steps_per_second': 0.07, 'total_flos': 21871163621376.0, 'train_loss': 2.642571040562221, 'epoch': 2.0})
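
The figures in `TrainOutput` can be checked with a little arithmetic. The sketch assumes a single device and a per-device batch size of 8, which I believe is the `TrainingArguments` default and which was not reduced during this run; `total_optimizer_steps` is a hypothetical helper.

```python
import math


def total_optimizer_steps(num_samples, batch_size, num_epochs):
    """Number of optimizer updates when the dataset is iterated `num_epochs` times."""
    batches_per_epoch = math.ceil(num_samples / batch_size)
    return batches_per_epoch * num_epochs


# 50 samples and 2 epochs come from the cells above; batch size 8 is assumed.
steps = total_optimizer_steps(num_samples=50, batch_size=8, num_epochs=2)
runtime_seconds = 199.9828  # train_runtime reported above
print(steps)
print(round(50 * 2 / runtime_seconds, 2), round(steps / runtime_seconds, 2))
```

Seven batches per epoch over two epochs give the 14 steps reported as `global_step`, and dividing by the runtime recovers the samples-per-second and steps-per-second values.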

In [14]:
#Save the model.
'''Creates a path to save the model by combining output_directory (where you store outputs/checkpoints) with a subfolder named "lora_model".'''
peft_model_path = os.path.join(output_directory, "lora_model")
'''Saves the LoRA adapter weights (not the full base model) to the specified directory.
This includes: the adapter config (adapter_config.json) and the adapter weights (adapter_model.safetensors in recent PEFT releases, adapter_model.bin in older ones).'''
trainer.model.save_pretrained(peft_model_path)

In [15]:
#Load the Model.
'''| Parameter            | Description                                                                                                                                                                                     |
| -------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `foundation_model`   | This is your original base model (e.g., `bloomz-560m`). It's being re-wrapped with the LoRA adapter weights.                                                                                    |
| `peft_model_path`    | The path to the folder where your LoRA adapter was saved (contains `adapter_config.json` and the adapter weights file).                                                                         |
| `is_trainable=False` | Indicates that the model is **loaded in inference mode**, so you’re not planning to fine-tune further. LoRA weights will be frozen. <br> ✅ Useful when you're doing generation/evaluation only. |
 '''
loaded_model = PeftModel.from_pretrained(foundation_model,
                                        peft_model_path,
                                        is_trainable=False)

## Inference with the fine-tuned model.

In [16]:
'''Converts your prompt string into token IDs and attention masks in PyTorch tensor format. This will be the input for the model.'''
input_sentences = tokenizer("I want you to act as a motivational coach. ", return_tensors="pt")
'''Calls your earlier-defined get_outputs function. Uses the LoRA-adapted model (loaded_model) to generate up to 50 new tokens.
Generates continuation text based on the prompt.'''
foundational_outputs_sentence = get_outputs(loaded_model, input_sentences, max_new_tokens=50)
'''Decodes the generated tokens back into text. skip_special_tokens=True removes tokens like <eos>, <pad>, etc.
Prints the final output as a list of strings (usually one string for a single input).'''
print(tokenizer.batch_decode(foundational_outputs_sentence, skip_special_tokens=True))

['I want you to act as a motivational coach.  I will provide some information about yourself and your goals, such as:  "I need help improving my confidence level." "I\'ve been looking for ways of increasing motivation in order not only achieving these objectives but also being more confident myself by doing something that makes']
