# AI Motivational Quote Generator

This notebook fine-tunes the TinyLlama model using PEFT (LoRA) to generate motivational quotes based on a topic.



### Install Dependencies

In [1]:
# Installing required libraries for training
!pip install -q transformers datasets accelerate peft trl bitsandbytes
print("Dependencies installed.")

Dependencies installed.


### Import Libraries

In [14]:
import torch
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    pipeline,
    EarlyStoppingCallback
)
from peft import LoraConfig
from trl import SFTTrainer, SFTConfig
import time
from huggingface_hub import notebook_login

print("Imports successful.")

Imports successful.


### Load Dataset and Preprocessing

In [3]:
print("Loading and formatting the dataset...")
dataset = load_dataset("Abirate/english_quotes", split="train")

# This function 'explodes' the dataset: one quote with 3 tags
# becomes 3 separate training examples. This enriches the data.
def format_and_explode_tags(batch):
    new_texts = []

    # Iterate through each quote and its corresponding tag list
    for quote, tags in zip(batch['quote'], batch['tags']):
        if not quote or not tags:
            continue

        # Create a new training example for EACH tag
        for tag in tags:
            if tag:
                formatted_string = f"Keyword: {tag}\nQuote: {quote} - Unknown"
                new_texts.append(formatted_string)

    return {"text": new_texts}

# Use batched=True for efficient processing.
# This allows the map function to return a different number of rows
# than it received.
processed_dataset = dataset.map(
    format_and_explode_tags,
    batched=True,
    remove_columns=dataset.column_names
)

# Filter out any potential None entries
processed_dataset = processed_dataset.filter(lambda x: x['text'] is not None)

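# Hold out 10% of the exploded examples for evaluation.
# Caveat: the split happens after exploding, so the same quote can land in
# both splits under different keywords, which makes eval loss optimistic.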
split_data = processed_dataset.train_test_split(test_size=0.1)
train_dataset = split_data['train']
eval_dataset = split_data['test']

print(f"Original dataset size: {len(dataset)}")
print(f"Exploded dataset size: {len(processed_dataset)}")
print(f"Training examples: {len(train_dataset)}")
print(f"Evaluation examples: {len(eval_dataset)}")
print("\n--- Data sample after exploding tags: ---")
for i in range(5):
    print(processed_dataset[i]['text'])
print("---------------------------------------")
print("Dataset preprocessing complete.")

Loading and formatting the dataset...


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Original dataset size: 2508
Exploded dataset size: 8011
Training examples: 7209
Evaluation examples: 802

--- Data sample after exploding tags: ---
Keyword: be-yourself
Quote: “Be yourself; everyone else is already taken.” - Unknown
Keyword: gilbert-perreira
Quote: “Be yourself; everyone else is already taken.” - Unknown
Keyword: honesty
Quote: “Be yourself; everyone else is already taken.” - Unknown
Keyword: inspirational
Quote: “Be yourself; everyone else is already taken.” - Unknown
Keyword: misattributed-oscar-wilde
Quote: “Be yourself; everyone else is already taken.” - Unknown
---------------------------------------
Dataset preprocessing complete.
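
The sample above shows one quote repeated under five keywords. Because the notebook splits after exploding, copies of the same quote can end up in both the training and evaluation sets. A minimal sketch of one alternative, splitting at the quote level first and exploding afterwards, is shown below. It uses plain Python and toy rows (hypothetical data, not the real dataset); the helper names `explode` and `split_then_explode` are illustrative only.

```python
import random

def explode(rows):
    """Return one 'Keyword/Quote' training string per non-empty tag."""
    return [
        f"Keyword: {tag}\nQuote: {quote} - Unknown"
        for quote, tags in rows
        for tag in tags
        if tag
    ]

def split_then_explode(rows, test_size=0.1, seed=42):
    """Split whole quotes into train/eval first, then explode each side."""
    # Drop rows that would produce no examples (empty quote or no usable tag).
    rows = [(q, tags) for q, tags in rows if q and any(tags)]
    random.Random(seed).shuffle(rows)
    n_test = max(1, int(len(rows) * test_size))
    return explode(rows[n_test:]), explode(rows[:n_test])

# Toy rows standing in for (quote, tags) pairs; hypothetical data.
toy_rows = [
    ("Quote A", ["life", "hope"]),
    ("Quote B", ["love"]),
    ("Quote C", ["life", "", "humor"]),
    ("Quote D", []),
    ("Quote E", ["success"]),
]

train_texts, eval_texts = split_then_explode(toy_rows, test_size=0.25)

def quote_of(text):
    """Recover the quote part of a formatted example."""
    return text.split("\nQuote: ", 1)[1]

train_quotes = {quote_of(t) for t in train_texts}
eval_quotes = {quote_of(t) for t in eval_texts}
print("Shared quotes between splits:", train_quotes & eval_quotes)
```

With this ordering the evaluation loss measures behavior on quotes the model has not seen during training, at the cost of a slightly less even split by example count.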


### Configure 4-bit Quantization (QLoRA)

In [4]:
# Setting up 4-bit quantization (QLoRA) config
# This is what allows us to load and train the model on a T4 GPU
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)
print("BitsAndBytesConfig created.")

BitsAndBytesConfig created.
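
To see why 4-bit loading matters on a T4, a rough back-of-the-envelope estimate of weight memory helps. The sketch below is an approximation under stated assumptions: a nominal 1.1B parameters (taken from the model name), an NF4 block size of 64 with one 32-bit scale per block, and double quantization that stores those scales in 8 bits with one extra 32-bit constant per 256 blocks. It ignores layers that stay in 16-bit (embeddings, norms), activations, and optimizer state, so real usage will be higher.

```python
def weight_memory_gb(n_params, bits_per_param):
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9

N_PARAMS = 1.1e9   # nominal size, assumed from the model name
BLOCK = 64         # assumed NF4 block size

# Per-parameter overhead of the quantization constants, in bits.
plain_overhead = 32 / BLOCK                        # one fp32 scale per block
double_overhead = 8 / BLOCK + 32 / (BLOCK * 256)   # 8-bit scales + second-level fp32 constant

fp16_gb = weight_memory_gb(N_PARAMS, 16)
nf4_gb = weight_memory_gb(N_PARAMS, 4 + plain_overhead)
nf4_dq_gb = weight_memory_gb(N_PARAMS, 4 + double_overhead)

print(f"fp16 weights:              {fp16_gb:.2f} GB")
print(f"NF4 weights:               {nf4_gb:.2f} GB")
print(f"NF4 + double quantization: {nf4_dq_gb:.2f} GB")
```

Under these assumptions the quantized weights take roughly a quarter of the 16-bit footprint, and `bnb_4bit_use_double_quant=True` trims a little more.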


### Load TinyLlama Model and Tokenizer

In [5]:
model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
print(f"Loading base model and tokenizer: {model_name}...")

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto", # Use accelerate to auto-map to GPU
    trust_remote_code=True,
)
model.config.use_cache = False

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

print("Model and tokenizer loaded successfully.")

Loading base model and tokenizer: TinyLlama/TinyLlama-1.1B-Chat-v1.0...


Model and tokenizer loaded successfully.


### Test Base Model (Before Fine-Tuning)

In [7]:
print("\n--- Benchmarking BASE Model (Before Fine-Tuning) ---")

prompts = {
    "life": "Keyword: life\nQuote:",
    "inspiration": "Keyword: inspiration\nQuote:",
    "friendship": "Keyword: friendship\nQuote:",
    "happiness": "Keyword: happiness\nQuote:",
    "love": "Keyword: love\nQuote:",
    "success": "Keyword: success\nQuote:",
    "husband and wife": "Keyword: husband and wife\nQuote:",
}

# This pipeline uses the 4-bit model on the GPU
base_pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    dtype=torch.float16,
    device_map="auto"
)

print("Running baseline benchmark...")
start_time = time.time()
results = base_pipe(
    list(prompts.values()),
    max_new_tokens=80,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    eos_token_id=tokenizer.eos_token_id
)
end_time = time.time()
base_model_time = end_time - start_time

print(f"\nBaseline generation for {len(prompts)} prompts finished in {base_model_time:.2f}s.")
print("\n--- Baseline Model Outputs: ---")
for keyword, result in zip(prompts.keys(), results):
    print(f"\nKeyword: {keyword}")
    print(f"Output: {result[0]['generated_text']}")

print("\nNote: The base model doesn't understand our prompt format and gives generic chat replies.")

Device set to use cuda:0



--- Benchmarking BASE Model (Before Fine-Tuning) ---
Running baseline benchmark...

Baseline generation for 7 prompts finished in 51.35s.

--- Baseline Model Outputs: ---

Keyword: life
Output: Keyword: life
Quote: "In every life, a rainbow is born. A rainbow is a promise. A promise that one day the rain will clear, the clouds will part and the sun will shine."
--Jennifer L. Holm
Tagged: Jennifer L. Holm, Rainbow, Promise, Sun, Universe, Life, Promise
Book, Literature, Quote

Keyword: inspiration
Output: Keyword: inspiration
Quote: "The more I practice, the more I learn."
Tagline: "Learn to love the journey"
Keyword: growth, transformation, self-discovery

Brand Persona:

1. Adele - a successful musician who has achieved her dreams despite facing adversity
2. Emma - a writer who struggles with self-d

Keyword: friendship
Output: Keyword: friendship
Quote: "I have the greatest friends in the world, and they are the ones who love me the most. And that's what I want for myself, too. The 

### Configure PEFT (LoRA)

In [8]:
# Setting up LoRA (PEFT) parameters
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"], # Adapt only the attention query and value projections
    bias="none",
    task_type="CAUSAL_LM",
)
print("\nLoRA config created.")


LoRA config created.
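
The config above determines how many parameters are actually trained. A LoRA adapter on a linear layer of shape `d_in x d_out` adds two matrices with `r * (d_in + d_out)` parameters in total. The sketch below does that arithmetic for `q_proj` and `v_proj`. The architecture numbers (hidden size 2048, 22 layers, 32 attention heads, 4 key/value heads) are assumptions based on the published TinyLlama configuration; check them against `model.config` before relying on the result.

```python
# Assumed TinyLlama-1.1B architecture values (verify with model.config).
HIDDEN_SIZE = 2048
NUM_LAYERS = 22
NUM_HEADS = 32
NUM_KV_HEADS = 4

# Values from the LoraConfig above.
R = 16
LORA_ALPHA = 32

head_dim = HIDDEN_SIZE // NUM_HEADS

# (input features, output features) of each targeted projection.
target_shapes = {
    "q_proj": (HIDDEN_SIZE, NUM_HEADS * head_dim),
    "v_proj": (HIDDEN_SIZE, NUM_KV_HEADS * head_dim),  # smaller due to grouped-query attention
}

def lora_param_count(d_in, d_out, r):
    """Parameters in the A (r x d_in) and B (d_out x r) matrices."""
    return r * (d_in + d_out)

per_layer = sum(lora_param_count(d_in, d_out, R) for d_in, d_out in target_shapes.values())
total_trainable = per_layer * NUM_LAYERS
scaling = LORA_ALPHA / R  # factor applied to the low-rank update

print(f"Trainable LoRA parameters: {total_trainable:,}")
print(f"Adapter size in float32:   {total_trainable * 4 / 1e6:.2f} MB")
print(f"LoRA scaling (alpha / r):  {scaling}")
```

Under these assumptions the adapter holds about 2.25 million parameters, a fraction of a percent of the base model.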


### Configure Training Arguments

In [11]:
training_args = SFTConfig(
    output_dir="./results",          # Checkpoint directory
    num_train_epochs=3,              # Upper bound; early stopping can end training sooner
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    optim="paged_adamw_8bit",
    learning_rate=2e-4,
    weight_decay=0.001,
    fp16=True,  # Use 16-bit precision (mixed-precision) for GPU
    max_grad_norm=0.3,
    max_steps=-1,
    warmup_ratio=0.03,
    eval_strategy="steps",
    eval_steps=25,
    load_best_model_at_end=True,
    metric_for_best_model = "eval_loss",
    greater_is_better = False,
    logging_steps=25,                # Log training progress every 25 steps
    save_strategy="steps",
    dataset_text_field="text",
    max_length=512,
    report_to="none",
)
print("SFTConfig (training args) set.")

SFTConfig (training args) set.
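
It can help to translate these arguments into concrete step counts. The sketch below is plain arithmetic, assuming a single GPU and the 7,209 training examples reported earlier. The trainer's own rounding may differ by a step, so treat the numbers as estimates.

```python
import math

# Values from the notebook.
n_train_examples = 7209
per_device_train_batch_size = 4
gradient_accumulation_steps = 4
num_train_epochs = 3
warmup_ratio = 0.03
eval_steps = 25

n_gpus = 1  # assumption: a single T4

# Examples consumed per optimizer step.
effective_batch = per_device_train_batch_size * gradient_accumulation_steps * n_gpus

steps_per_epoch = math.ceil(n_train_examples / effective_batch)
total_steps = steps_per_epoch * num_train_epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)
n_evaluations = total_steps // eval_steps

print(f"Effective batch size: {effective_batch}")
print(f"Steps per epoch:      ~{steps_per_epoch}")
print(f"Total steps:          ~{total_steps}")
print(f"Warmup steps:         ~{warmup_steps}")
print(f"Evaluations (max):    ~{n_evaluations}")
```

An evaluation every 25 steps therefore covers about 400 training examples between checks.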


### Initialize the SFTTrainer

In [15]:
# Initializing the SFTTrainer
trainer = SFTTrainer(
    model=model,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    peft_config=peft_config,
    processing_class=tokenizer,
    args=training_args,
)

early_stopping_callback = EarlyStoppingCallback(
    early_stopping_patience = 3,
    early_stopping_threshold = 0.0,
)
trainer.add_callback(early_stopping_callback)

print("SFTTrainer initialized.")

SFTTrainer initialized.
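
The early-stopping settings above (`early_stopping_patience = 3`, `early_stopping_threshold = 0.0`) mean: stop once three consecutive evaluations fail to improve on the best validation loss. The function below is a simplified sketch of that rule for intuition, not the library's implementation; the name `early_stop_index` and the plateau values are hypothetical.

```python
def early_stop_index(eval_losses, patience=3, threshold=0.0):
    """Return the index of the evaluation that triggers a stop, or None."""
    best = None
    evals_without_improvement = 0
    for i, loss in enumerate(eval_losses):
        improved = best is None or (loss < best and abs(loss - best) > threshold)
        if improved:
            best = loss
            evals_without_improvement = 0
        else:
            evals_without_improvement += 1
        if evals_without_improvement >= patience:
            return i
    return None

# Hypothetical run that plateaus: three evaluations in a row fail to beat 1.70.
plateau = [2.00, 1.80, 1.70, 1.71, 1.72, 1.70, 1.69]
print("Plateau run stops at evaluation index:", early_stop_index(plateau))

# A run whose loss improves at every evaluation never triggers the rule.
improving = [2.39, 2.01, 1.86, 1.83, 1.81]
print("Improving run stops at:", early_stop_index(improving))
```

Note that matching the best loss exactly does not count as an improvement when the threshold is 0.0, which is why the plateau run stops at its sixth evaluation.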




### Start Training

In [16]:
print("\nStarting LoRA fine-tuning...")
trainer.train()
print("Fine-tuning complete.")

The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. Updated tokens: {'pad_token_id': 2}.



Starting LoRA fine-tuning...




| Step | Training Loss | Validation Loss | Entropy | Num Tokens | Mean Token Accuracy |
|---|---|---|---|---|---|
| 25 | 2.6029 | 2.391708 | 2.447729 | 23171.0 | 0.540013 |
| 50 | 2.1255 | 2.008281 | 2.092705 | 46399.0 | 0.576352 |
| 75 | 1.9314 | 1.861735 | 1.874428 | 72577.0 | 0.59976 |
| 100 | 1.7695 | 1.834758 | 1.88342 | 96128.0 | 0.604127 |
| 125 | 1.8236 | 1.810721 | 1.819964 | 119693.0 | 0.608024 |
| 150 | 1.8045 | 1.798737 | 1.850623 | 143122.0 | 0.607969 |
| 175 | 1.7189 | 1.779508 | 1.786521 | 166369.0 | 0.6128 |
| 200 | 1.7522 | 1.767906 | 1.81468 | 190103.0 | 0.614323 |
| 225 | 1.6764 | 1.753691 | 1.736063 | 213124.0 | 0.617863 |
| 250 | 1.6526 | 1.742755 | 1.784353 | 237015.0 | 0.620127 |




Fine-tuning complete.
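
One way to read the validation loss is to convert it to perplexity. The sketch below assumes the reported loss is the mean per-token cross-entropy in nats, in which case perplexity is simply `exp(loss)`. The two loss values are copied from the first and last logged evaluations (steps 25 and 250).

```python
import math

def perplexity(mean_cross_entropy):
    """Perplexity for a mean per-token cross-entropy given in nats."""
    return math.exp(mean_cross_entropy)

# Validation losses copied from the training log.
loss_step_25 = 2.391708
loss_step_250 = 1.742755

ppl_start = perplexity(loss_step_25)
ppl_end = perplexity(loss_step_250)

print(f"Perplexity at step 25:  {ppl_start:.2f}")
print(f"Perplexity at step 250: {ppl_end:.2f}")
print(f"Relative reduction:     {1 - ppl_end / ppl_start:.0%}")
```

Lower perplexity means the model is less surprised by the held-out examples in the `Keyword: ... Quote: ...` format.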


### Push the Adapters to Hugging Face Hub

In [17]:
# This will save an API token to your Colab environment
print("Logging in to Hugging Face Hub...")
notebook_login()

Logging in to Hugging Face Hub...


In [18]:
adapter_repo_name = "bkqz/tinyllama-quotes-adapters"

print(f"Pushing LoRA adapters to: {adapter_repo_name}...")
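# Trainer.push_to_hub() takes a commit message as its first argument, not a
# repo name, so push the adapter and tokenizer to the target repo explicitly.
trainer.model.push_to_hub(adapter_repo_name)
tokenizer.push_to_hub(adapter_repo_name)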
print("Adapters successfully saved to the Hub.")

Pushing LoRA adapters to: bkqz/tinyllama-quotes-adapters...


Adapters successfully saved to the Hub.


### Test Fine-Tuned Model (After Fine-Tuning)





In [25]:
print("\n--- Benchmarking FINE-TUNED Model (After Training) ---")

print("Casting model to float16...")
trainer.model.to(torch.float16)
print("Model cast complete.")

# Create a new pipeline with our trained model (base + adapters)
finetuned_pipe = pipeline(
    "text-generation",
    model=trainer.model, # This now includes the LoRA adapters
    tokenizer=tokenizer,
    dtype=torch.float16,
    device_map="auto"
)

print("Running fine-tuned benchmark...")
start_time = time.time()
results_ft = finetuned_pipe(
    list(prompts.values()), # Use the same prompts
    max_new_tokens=80,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    eos_token_id=tokenizer.eos_token_id
)
end_time = time.time()
finetuned_model_time = end_time - start_time

print(f"\nFine-tuned generation for {len(prompts)} prompts finished in {finetuned_model_time:.2f}s.")
print("\n--- Fine-Tuned Model Outputs: ---")
for keyword, result in zip(prompts.keys(), results_ft):
    print(f"\nKeyword: {keyword}")
    print(f"Output: {result[0]['generated_text']}")

Device set to use cuda:0



--- Benchmarking FINE-TUNED Model (After Training) ---
Casting model to float16...
Model cast complete.
Running fine-tuned benchmark...

Fine-tuned generation for 7 prompts finished in 21.06s.

--- Fine-Tuned Model Outputs: ---

Keyword: life
Output: Keyword: life
Quote: “Don't wait for the perfect time to start living your best life. Begin your journey right now.” - Unknown

Keyword: inspiration
Output: Keyword: inspiration
Quote: “Be grateful for what you already have while you pursue your goals. If you arenâ€™t grateful for what you already have, what makes you think you would be happy with more.” - Unknown

Keyword: friendship
Output: Keyword: friendship
Quote: “You don't understand me until you understand your friends.” - Unknown

Keyword: happiness
Output: Keyword: happiness
Quote: “Life is too short to be unhappy.” - Unknown

Keyword: love
Output: Keyword: love
Quote: “Whenever you find yourself on the side of the majority, it is time to modify your opinion.” - Unknown

Keyword

### 📊 Final Comparison & Insights

#### 1. Output Quality & Task Adherence

The primary objective was to specialize the base model for a new task. The results confirm this was successful.

* **Baseline Model:** Failed to adhere to the required `Keyword: ... Quote:` format. Outputs were unstructured, included extraneous metadata (tags, authors), and were often irrelevant to the prompt's intent.
* **Fine-Tuned Model:** Consistently adhered to the `Keyword: ... Quote: ... - Unknown` structure. Outputs were clean and mostly on-topic, though not always: the `love` prompt produced a quote about siding with the majority, which has nothing to do with love.

**Quality Conclusion:** The LoRA fine-tuning successfully overrode the base model's default, general-purpose behavior and taught it the specific new output format.
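
Format adherence can also be checked mechanically instead of by eye. The sketch below is one possible check, assuming "adheres" means the whole output is exactly a `Keyword:` line followed by a single `Quote: ... - Unknown` line. The helper names are illustrative, and the two sample strings are taken from the outputs shown in this notebook.

```python
import re

# One keyword line, then one quote line ending in " - Unknown".
FORMAT_RE = re.compile(
    r"Keyword: (?P<keyword>[^\n]+)\nQuote: (?P<quote>[^\n]+) - Unknown\s*"
)

def follows_format(text):
    """True if the entire generated text matches the training format."""
    return FORMAT_RE.fullmatch(text) is not None

def adherence_rate(outputs):
    """Fraction of outputs that match the training format."""
    return sum(follows_format(t) for t in outputs) / len(outputs)

finetuned_sample = "Keyword: happiness\nQuote: “Life is too short to be unhappy.” - Unknown"
baseline_sample = (
    'Keyword: inspiration\nQuote: "The more I practice, the more I learn."\n'
    'Tagline: "Learn to love the journey"'
)

print("Fine-tuned sample follows format:", follows_format(finetuned_sample))
print("Baseline sample follows format:  ", follows_format(baseline_sample))
print("Adherence rate over both:        ", adherence_rate([finetuned_sample, baseline_sample]))
```

In the notebook, the same function could be applied to each `result[0]['generated_text']` from both pipelines to report an adherence rate per model. This checks structure only; whether a quote is relevant to its keyword still needs human judgment.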

#### 2. Inference Performance (T4 GPU)

A large and unexpected drop in wall-clock generation time was observed for the same 7 prompts.

* **Baseline Model (GPU):** 51.35s
* **Fine-Tuned Model (GPU):** 21.06s

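**Performance Conclusion:** The fine-tuned model finished the same 7 prompts **2.4x faster** in wall-clock time (51.35s / 21.06s ≈ 2.44).

The likely cause is shorter outputs, not cheaper computation. Per-token decoding cost is essentially the same for both models (the LoRA adapters add a small amount of work, if anything), so total time scales with the number of tokens generated. The base model often rambles until it reaches the `max_new_tokens=80` limit, whereas the fine-tuned model appears to end the sequence right after `- Unknown`. The baseline was also timed first, so its figure may include one-time GPU warm-up. Both timings are single runs with sampling enabled, so treat the ratio as indicative rather than exact.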
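
One way to test this explanation is to normalize each timing by the number of tokens generated. The sketch below shows the calculation. The two timings come from the benchmark above, but the per-prompt token counts are HYPOTHETICAL placeholders, since the notebook does not record them; replace them with measured counts before drawing conclusions.

```python
def tokens_per_second(new_tokens_per_prompt, elapsed_seconds):
    """Decoding throughput: newly generated tokens divided by wall-clock time."""
    return sum(new_tokens_per_prompt) / elapsed_seconds

# Measured wall-clock times from the benchmark cells.
base_seconds = 51.35
finetuned_seconds = 21.06

# HYPOTHETICAL token counts, for illustration only.
base_new_tokens = [80] * 7        # assumes every baseline output hit the 80-token cap
finetuned_new_tokens = [30] * 7   # assumes short, early-terminated outputs

base_tps = tokens_per_second(base_new_tokens, base_seconds)
finetuned_tps = tokens_per_second(finetuned_new_tokens, finetuned_seconds)

print(f"Baseline:   {base_tps:.1f} tokens/s")
print(f"Fine-tuned: {finetuned_tps:.1f} tokens/s")
```

If the measured tokens-per-second figures come out similar for both models, the wall-clock difference is explained by output length alone. In the notebook, the number of new tokens per prompt can be approximated by tokenizing the generated text and the prompt with `tokenizer(...)` and taking the difference in `input_ids` length.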

**Project Summary:** The fine-tuning was successful. The model learned to follow the new format consistently in our tests, and the 7-prompt benchmark finished in about 59% less wall-clock time (2.4x faster).