## QuestGen-LLM: Fine-Tuning

This notebook covers the fine-tuning of several pre-trained _large language models_ (LLMs) on the prepared ["quest"](../data/quests_train.json) dataset. Each model is fine-tuned with LoRA adapters while its base weights stay frozen, then validated on the held-out validation split so that the evaluation results can be compared. The LLMs used are listed in the following table with their respective parameter counts.

| S. No. | Large Language Model             | Parameters | Developed By | Notes                                                 |
| :----: | :------------------------------- | :--------: | :----------: | :---------------------------------------------------- |
|   1.   | GPT-2[^1]                        |    124M    |    OpenAI    | Base model from the GPT-2 family                      |
|   2.   | GPT-2 Medium[^2]                 |    355M    |    OpenAI    | Larger variant with improved language modeling        |
|   3.   | GPT-2 Large[^3]                  |    774M    |    OpenAI    | Capable of generating more coherent longer text       |
|   4.   | Llama-3.2-1B-Instruct[^4] †      |     1B     |     Meta     | Instruction-tuned model for question-answering        |
|   5.   | TinyLlama-1.1B-Chat-v1.0[^5] \*† |    1.1B    |  TinyLlama   | Lightweight chat-tuned model for constrained hardware |

> Fine-tuning uses _supervised fine-tuning_\* (SFT) and _reinforcement learning from human feedback_† (RLHF).

<!-- References -->

[^1]: https://huggingface.co/openai-community/gpt2
[^2]: https://huggingface.co/openai-community/gpt2-medium
[^3]: https://huggingface.co/openai-community/gpt2-large
[^4]: https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct
[^5]: https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0


In [1]:
from __future__ import annotations

import json
import os
import shutil
import time
from dataclasses import dataclass, field
from os import PathLike
from pathlib import Path
from typing import Any, Final

import torch
from datasets import Dataset, DatasetDict, load_dataset
from huggingface_hub import login
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import (
    AutoTokenizer,
    AutoModelForCausalLM,
    BitsAndBytesConfig,
    DataCollatorForLanguageModeling,
    EarlyStoppingCallback,
    PreTrainedModel,
    PreTrainedTokenizerFast,
    PreTrainedTokenizer,
    Trainer,
    TrainerCallback,
    TrainingArguments,
    set_seed,
)
from transformers.tokenization_utils_base import BatchEncoding

from utils.dirpath import get_cache_dirpath, get_target_dirpath

In [2]:
# Get the HF access token from the environment (None if the variable is unset)
HF_ACCESS_TOKEN: Final[str | None] = os.getenv("HUGGINGFACE_HUB_TOKEN")

# Log in to the Hugging Face Hub; the token is cached locally for later calls
login(token=HF_ACCESS_TOKEN)
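
The login cell above passes whatever `os.getenv` returns, which is `None` when the variable is unset. A minimal sketch of a guard that fails early with a readable message is shown below; `require_env_token` is a hypothetical helper, not part of this notebook or of `huggingface_hub`.

```python
import os


def require_env_token(name: str = "HUGGINGFACE_HUB_TOKEN") -> str:
    """Return the token stored in the environment variable `name`.

    Raises a RuntimeError when the variable is missing or blank, instead of
    silently handing `None` to the login call.
    """
    token = os.getenv(name, "").strip()
    if not token:
        raise RuntimeError(f"Environment variable {name!r} is not set.")
    return token
```

With such a helper, the token constant could be assigned from `require_env_token()` so that a missing token surfaces before any gated model download is attempted.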

In [3]:
# Map for the model identifiers: (model_key -> model_id)
MODEL_IDENTIFIERS: Final[dict[str, str]] = {
    "gpt2": "openai-community/gpt2",
    "gpt2-medium": "openai-community/gpt2-medium",
    "gpt2-large": "openai-community/gpt2-large",
    "llama-3.2-1b-instruct": "meta-llama/Llama-3.2-1B-Instruct",
    "tinyllama-1.1b-chat": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
}

In [None]:
data_dir: Path = get_target_dirpath("data")

# Load the quest dataset
quest_set: DatasetDict = load_dataset(
    "text",
    data_files={
        "train": str(data_dir / "quests_train.txt"),
        "val": str(data_dir / "quests_val.txt"),
        "test": str(data_dir / "quests_test.txt"),
    },
    cache_dir=str(data_dir / ".cache"),
)
quest_set

DatasetDict({
    train: Dataset({
        features: ['text'],
        num_rows: 19954
    })
    val: Dataset({
        features: ['text'],
        num_rows: 2486
    })
})
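
The `text` loader creates one dataset row per line of the input file, so a single prompt (instruction, input fields, response) is spread over many rows. If whole prompts are wanted as training examples, the rows could be regrouped first. The sketch below assumes that every record starts with a line equal to `### Instruction:`; `group_lines_into_records` is a hypothetical helper and the sample rows are made up for illustration.

```python
from __future__ import annotations

from typing import Iterable


def group_lines_into_records(
    lines: Iterable[str], start_marker: str = "### Instruction:"
) -> list[str]:
    """Join consecutive lines into one string per record.

    A new record begins at every line equal to `start_marker`; trailing
    blank lines of a record are stripped.
    """
    records: list[list[str]] = []
    for line in lines:
        if line == start_marker:
            records.append([line])
        elif records:  # ignore anything before the first marker
            records[-1].append(line)
    return ["\n".join(rec).rstrip("\n") for rec in records]


# Illustrative rows only (not taken from the quest files)
rows = [
    "### Instruction:", "Generate a quest.", "",
    "### Input:", "Quest Name: A", "",
    "### Response:", "Text A", "",
    "### Instruction:", "Generate a quest.", "",
    "### Input:", "Quest Name: B", "",
    "### Response:", "Text B",
]
records = group_lines_into_records(rows)
print(len(records))  # 2
```

Grouped records are much longer than single lines, so a truncation length chosen for line-level rows would have to be revisited.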

In [5]:
quest_set["train"][:21]

{'text': ['### Instruction:',
  'Generate a video game quest description based on the following structured information.',
  '',
  '### Input:',
  'Quest Name: Perilous Passage',
  'Objective: save the Mana Queen',
  'First Tasks: go through the gate to the Forsaken Vaults',
  'First Task Locations: Forsaken Vaults - a perilous dungeon',
  'Quest Giver: NONE - NONE (location: NONE)',
  'Reward: NONE -  (amount: 1)',
  'Characters: Mana Queen - a good female spirit (location: Forsaken Vaults)',
  'Tools: NONE',
  'Locations: NONE',
  'Items: NONE',
  'Enemies: NONE',
  'Groups: NONE',
  'Title: Torchlight II',
  'Motivation: NONE',
  '',
  '### Response:',
  "The Mana Queen has come and gone Through this gate, she journeyed on. Follow her and pay the cost. Hasten forth, or she'll be lost."]}

In [6]:
quest_set["val"][23:44]

{'text': ['Generate a video game quest description based on the following structured information.',
  '',
  '### Input:',
  'Quest Name: A Child in the Lighthouse',
  "Objective: save Ardrouine's little son from worgs",
  'First Tasks: go to the abandoned lighthouse',
  'First Task Locations:  - abandoned lighthouse to the northwest',
  'Quest Giver: NONE - NONE (location: NONE)',
  'Reward:  - coins (amount: 60)',
  'Characters: NONE',
  'Tools: NONE',
  'Locations: NONE',
  'Items: NONE',
  'Enemies: NONE',
  'Groups: NONE',
  "Title: Baldur's Gate",
  'Motivation: NONE',
  '',
  '### Response:',
  "Please help me, I am just poor Ardrouine! I don't know where else to turn. My little boy was playing in that abandoned lighthouse to the northwest when a pack of worgs surrounded it. Please just turn them back, and I can coax him down. There's not much time! I can pay you 60 coins: this money is all my husband brought back from market this past week. My son's life is worth this and so muc

In [7]:
# Map for the target modules: (model_key -> target_modules)
TARGET_MODULES: Final[dict[str, list[str]]] = {
    "gpt2": ["c_attn", "c_proj", "c_fc"],
    "gpt2-medium": ["c_attn", "c_proj", "c_fc"],
    "gpt2-large": ["c_attn", "c_proj", "c_fc"],
    "llama-3.2-1b-instruct": [
        "q_proj",
        "k_proj",
        "v_proj",
        "o_proj",
        "gate_proj",
        "up_proj",
        "down_proj",
    ],
    "tinyllama-1.1b-chat": [
        "q_proj",
        "k_proj",
        "v_proj",
        "o_proj",
        "gate_proj",
        "up_proj",
        "down_proj",
    ],
}
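
For a linear layer with `n_in` inputs and `n_out` outputs, a rank-`r` LoRA adapter adds `r * (n_in + n_out)` trainable parameters. The sketch below works this out for the GPT-2 base model, assuming its standard geometry (hidden size 768, 12 blocks) and the rank of 8 that this notebook's LoRA configuration defaults to. `lora_param_count` is a hypothetical helper.

```python
from __future__ import annotations


def lora_param_count(shapes: list[tuple[int, int]], rank: int) -> int:
    """Count LoRA parameters for linear layers given as (in, out) shapes.

    Each adapted layer gains two matrices: A (rank x in) and B (out x rank).
    """
    return sum(rank * (n_in + n_out) for n_in, n_out in shapes)


# Assumed GPT-2 base geometry: hidden size 768, 12 transformer blocks
d_model, n_blocks, rank = 768, 12, 8

# Layers matched by ["c_attn", "c_proj", "c_fc"] in one block; "c_proj"
# matches both the attention output and the MLP output projection
block_shapes = [
    (d_model, 3 * d_model),  # attn.c_attn
    (d_model, d_model),      # attn.c_proj
    (d_model, 4 * d_model),  # mlp.c_fc
    (4 * d_model, d_model),  # mlp.c_proj
]

trainable = n_blocks * lora_param_count(block_shapes, rank)
print(f"{trainable:,}")  # 1,179,648
```

The result matches the `trainable params` value that the notebook prints when the GPT-2 base model is loaded.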

In [8]:
# Set the constants for model tuning here
MAX_LENGTH: Final[int] = 32
BATCH_SIZE: Final[int] = 1
N_EPOCHS: Final[int] = 1
SEED: Final[int] = 42
LR_RATE: Final[float] = 5e-6
LOGGING_STEPS: Final[int] = 10

In [9]:
@dataclass
class QuestGenLLM:
    tokenizer: PreTrainedTokenizer | PreTrainedTokenizerFast
    model: PreTrainedModel
    model_key: str  # Alias for the model, e.g., "gpt2"
    model_id: str  # Hugging Face model name, e.g., "openai-community/gpt2"
    fp16_available: bool  # Mixed precision
    device: str = field(init=False)
    dtype: str = field(init=False)

    def __post_init__(self):
        # Automatically determine the device used by the model
        self.device = str(getattr(self.model, "device", "N/A"))

        # Automatically determine the dtype used by the model
        self.dtype = str(getattr(self.model, "dtype", "N/A")).replace("torch.", "")

    @classmethod
    def from_pretrained(
        cls,
        model_key: str,
        model_id: str,
        cache_dir: Path = get_cache_dirpath("models"),
        seed: int = SEED,
        use_cpu: bool = False,
    ) -> QuestGenLLM:
        def apply_lora_adapter(
            model: PreTrainedModel,
            r: int = 8,
            alpha: int = 16,
            dropout: float = 0.1,
            task_type: str = "CAUSAL_LM",
        ) -> PreTrainedModel:
            # Prepare model for k-bit training
            model = prepare_model_for_kbit_training(model)

            # Define the LoRA config
            lora_config: LoraConfig = LoraConfig(
                r=r,
                lora_alpha=alpha,
                lora_dropout=dropout,
                target_modules=TARGET_MODULES[model_key],
                bias="none",
                task_type=task_type,
            )

            try:
                # Apply LoRA adapters to the model
                model = get_peft_model(model, lora_config)
            except Exception as e:
                print(f"[LoRAINFO] Adapter failed to apply: {e}")
                raise

            # Display information about the model parameters
            trainable_params: int = sum(
                p.numel() for p in model.parameters() if p.requires_grad
            )
            all_params: int = sum(p.numel() for p in model.parameters())
            trainable_percent: float = 100 * trainable_params / all_params
            print(
                "[LoRAINFO] trainable params: {:,} || all params: {:,} || trainable%: {:.4f}".format(
                    trainable_params, all_params, trainable_percent
                )
            )

            return model

        print(f"[DOWNLOAD] {model_key} ({model_id})")
        start_time: float = time.time()

        # Clear PyTorch's CUDA memory cache
        torch.cuda.empty_cache()

        # Set the random seed for reproducibility
        set_seed(seed)

        # Determine if mixed precision is available (compute capability >= 7.0)
        fp16_available: bool = (
            torch.cuda.is_available()
            and torch.cuda.get_device_capability(0) >= (7, 0)
        )

        # Download the tokenizer using the model id
        tokenizer: PreTrainedTokenizerFast = AutoTokenizer.from_pretrained(
            model_id,
            cache_dir=(cache_dir / model_key),
            use_fast=True,
            token=HF_ACCESS_TOKEN,
            trust_remote_code=True,
        )

        model: PreTrainedModel
        if fp16_available and not use_cpu:
            # Set the bitsandbytes configuration for quantization
            bnb_config: BitsAndBytesConfig = BitsAndBytesConfig(
                load_in_4bit=True,
                bnb_4bit_quant_type="nf4",
                bnb_4bit_use_double_quant=True,
                bnb_4bit_compute_dtype=torch.float16,
                # llm_int8_enable_fp32_cpu_offload=True,
            )

            # Download the model using the model id (for GPU)
            model = AutoModelForCausalLM.from_pretrained(
                model_id,
                torch_dtype=torch.float16,
                quantization_config=bnb_config,
                cache_dir=(cache_dir / model_key),
                token=HF_ACCESS_TOKEN,
                trust_remote_code=True,
                low_cpu_mem_usage=True,
            )
            model.to("cuda")
        else:
            # Download the model using the model id (for CPU)
            model = AutoModelForCausalLM.from_pretrained(
                model_id,
                torch_dtype=torch.float32,
                cache_dir=(cache_dir / model_key),
                token=HF_ACCESS_TOKEN,
                trust_remote_code=True,
                low_cpu_mem_usage=True,
            )
            model.to("cpu")

        # Apply the LoRA adapters to the model
        model = apply_lora_adapter(model)

        end_time: float = time.time()
        elapsed: float = end_time - start_time
        print(f'[COMPLETE] "{model_key}" ready in {elapsed:.2f}s.\n')

        return cls(tokenizer, model, model_key, model_id, fp16_available)

    def tokenize_and_train(
        self,
        dataset: DatasetDict,
        max_length: int = MAX_LENGTH,
        learning_rate: float = LR_RATE,
        batch_size: int = BATCH_SIZE,
        epochs: int = N_EPOCHS,
        seed: int = SEED,
        logging_steps: int = LOGGING_STEPS,
        output_dir: Path = get_target_dirpath("out"),
        logging_dir: Path = get_target_dirpath("logs"),
        gradient_checkpointing: bool = True,
        load_best_model_at_end: bool = True,
        callbacks: list[TrainerCallback] = [
            EarlyStoppingCallback(early_stopping_patience=2)
        ],
        activate_fp16: bool = False,
        activate_eval: bool = True,
        activate_save: bool = True,
        activate_logs: bool = False,
        activate_tensorboard: bool = False,
        activate_callbacks: bool = True,
    ) -> Trainer:
        # Ensure the training and validation sets
        if not all(split in dataset for split in ["train", "val"]):
            raise ValueError("DatasetDict must contain both 'train' and 'val' splits.")

        # Ensure the output and logging directories
        os.makedirs(output_dir, exist_ok=True)
        os.makedirs(logging_dir, exist_ok=True)

        start_time: float
        end_time: float
        elapsed: float

        # Set the random seed for reproducibility
        set_seed(seed)

        # Set the padding token for the tokenizer
        if self.tokenizer.pad_token is None:
            self.tokenizer.pad_token = self.tokenizer.eos_token
        self.tokenizer.padding_side = "right"

        # Tokenize every split (truncate to `max_length`, pad to the longest row per batch)
        print(f"[TOKENIZE] {self.model_key} ({self.model_id})")
        start_time = time.time()
        tokenized_data: DatasetDict = dataset.map(
            QuestGenLLM.tokenize_dataset,
            batched=True,
            remove_columns=["text"],
            fn_kwargs={"tokenizer": self.tokenizer, "max_length": max_length},
        )
        end_time = time.time()
        elapsed = end_time - start_time
        print(f"[COMPLETE] Elapsed: {elapsed:.2f}s\n")

        # Set the model padding token (from the tokenizer)
        self.model.config.pad_token_id = self.tokenizer.pad_token_id

        # Turn off `use_cache` if `gradient_checkpointing` is on
        self.model.config.use_cache = not gradient_checkpointing

        # Set up the training configurations
        training_args: TrainingArguments = TrainingArguments(
            output_dir=(output_dir / self.model_key),
            learning_rate=learning_rate,
            per_device_train_batch_size=batch_size,
            per_device_eval_batch_size=batch_size,
            num_train_epochs=epochs,
            log_level=("info" if activate_logs else "error"),
            logging_steps=logging_steps,
            eval_strategy=("epoch" if activate_eval else "no"),
            save_strategy=("epoch" if activate_save else "no"),
            logging_dir=(logging_dir / self.model_key),
            save_total_limit=2,
            eval_accumulation_steps=2,
            gradient_accumulation_steps=2,
            gradient_checkpointing=gradient_checkpointing,
            fp16=(self.fp16_available and activate_fp16),
            load_best_model_at_end=load_best_model_at_end,
            metric_for_best_model="eval_loss",
            seed=seed,
            report_to=("tensorboard" if activate_tensorboard else "none"),
            label_names=["labels"],
        )

        # Set up the data collator for the model
        data_collator: DataCollatorForLanguageModeling = (
            DataCollatorForLanguageModeling(tokenizer=self.tokenizer, mlm=False)
        )

        # Prepare and run the trainer
        trainer: Trainer = Trainer(
            model=self.model,
            args=training_args,
            data_collator=data_collator,
            train_dataset=tokenized_data["train"],
            eval_dataset=(tokenized_data["val"] if activate_eval else None),
            callbacks=(callbacks if activate_callbacks else []),
        )

        print(f"[FINETUNE] {self.model_key} ({self.model_id})")
        start_time = time.time()
        trainer.train()
        end_time = time.time()
        elapsed = end_time - start_time
        print(f"[COMPLETE] Elapsed: {elapsed:.2f}s\n")

        # Save the model and tokenizer for later use
        if activate_save:
            trainer.save_model()
            self.tokenizer.save_pretrained(save_directory=training_args.output_dir)

        return trainer

    @staticmethod
    def tokenize_dataset(
        examples: dict[str, list[str]],
        tokenizer: PreTrainedTokenizer | PreTrainedTokenizerFast,
        max_length: int = MAX_LENGTH,
    ) -> dict[str, list[list[int]]]:
        encodings: BatchEncoding = tokenizer(
            examples["text"],
            padding="longest",
            truncation=True,
            max_length=max_length,
            return_tensors="pt",
        )

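        # With `return_tensors="pt"` the encodings hold tensors, not lists
        input_ids: torch.Tensor = encodings["input_ids"]
        attention_mask: torch.Tensor = encodings["attention_mask"]

        # Ignore padded positions in the loss; masking by attention mask keeps
        # real EOS tokens when the pad token is reused from EOS
        labels: torch.Tensor = input_ids.clone()
        labels[attention_mask == 0] = -100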

        return {
            "input_ids": input_ids.tolist(),
            "attention_mask": attention_mask.tolist(),
            "labels": labels.tolist(),
        }

    def to_dict(self) -> dict[str, Any]:
        return {
            "model_key": self.model_key,
            "model_id": self.model_id,
            "device": self.device,
            "dtype": self.dtype,
            "vocab_size": getattr(self.tokenizer, "vocab_size", "unknown"),
            "max_length": getattr(self.tokenizer, "model_max_length", "unknown"),
            "model_type": getattr(
                getattr(self.model, "config", None), "model_type", "unknown"
            ),
            "num_parameters": self.model.num_parameters()
            if hasattr(self.model, "num_parameters")
            else "N/A",
            "fp16_available": self.fp16_available,
        }

    def clear_cache(self, cache_dir: Path = get_cache_dirpath("models")) -> None:
        def remove_dir(dir_path: PathLike) -> None:
            if os.path.exists(dir_path):
                shutil.rmtree(dir_path)
                print(f"Cache directory '{dir_path}' removed.")
            else:
                print(f"No cache directory found at '{dir_path}'.")

        remove_dir(cache_dir / self.model_key)

    def print_model_information(self) -> None:
        print(json.dumps(self.to_dict(), indent=2))

    def __str__(self) -> str:
        return f"{self.model_key} ({self.model_id})"
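
The class sets the pad token to the EOS token when the tokenizer has none, which is the case for GPT-2. Masking labels by pad id then also hides any genuine EOS token from the loss, whereas masking by attention mask hides only padding. The sketch below illustrates the difference with plain lists; it assumes an explicit EOS was appended to the text, and the two text token ids are arbitrary placeholders.

```python
from __future__ import annotations

IGNORE_INDEX = -100  # label value skipped by the loss


def mask_by_pad_id(input_ids: list[int], pad_id: int) -> list[int]:
    """Replace every token equal to `pad_id` with IGNORE_INDEX."""
    return [IGNORE_INDEX if tok == pad_id else tok for tok in input_ids]


def mask_by_attention(input_ids: list[int], attention_mask: list[int]) -> list[int]:
    """Replace only positions whose attention mask is 0 with IGNORE_INDEX."""
    return [
        tok if keep == 1 else IGNORE_INDEX
        for tok, keep in zip(input_ids, attention_mask)
    ]


EOS = 50256  # GPT-2's <|endoftext|> id, reused here as the pad id
ids = [101, 202, EOS, EOS, EOS]  # two text tokens, one real EOS, two pads
mask = [1, 1, 1, 0, 0]

print(mask_by_pad_id(ids, EOS))      # [101, 202, -100, -100, -100]
print(mask_by_attention(ids, mask))  # [101, 202, 50256, -100, -100]
```

Keeping the real EOS in the labels lets the model learn where a response ends, which matters once generation is evaluated.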

In [10]:
# Download the GPT-2 Base model
gpt2_base_model: QuestGenLLM = QuestGenLLM.from_pretrained(
    model_key="gpt2", model_id=MODEL_IDENTIFIERS["gpt2"]
)
gpt2_base_model

[DOWNLOAD] gpt2 (openai-community/gpt2)
[LoRAINFO] trainable params: 1,179,648 || all params: 83,152,128 || trainable%: 1.4187
[COMPLETE] "gpt2" ready in 10.91s.



QuestGenLLM(tokenizer=GPT2TokenizerFast(name_or_path='openai-community/gpt2', vocab_size=50257, model_max_length=1024, is_fast=True, padding_side='right', truncation_side='right', special_tokens={'bos_token': '<|endoftext|>', 'eos_token': '<|endoftext|>', 'unk_token': '<|endoftext|>'}, clean_up_tokenization_spaces=False, added_tokens_decoder={
	50256: AddedToken("<|endoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=True, special=True),
}
), model=PeftModelForCausalLM(
  (base_model): LoraModel(
    (model): GPT2LMHeadModel(
      (transformer): GPT2Model(
        (wte): Embedding(50257, 768)
        (wpe): Embedding(1024, 768)
        (drop): Dropout(p=0.1, inplace=False)
        (h): ModuleList(
          (0-11): 12 x GPT2Block(
            (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
            (attn): GPT2Attention(
              (c_attn): lora.Linear4bit(
                (base_layer): Linear4bit(in_features=768, out_features=2304, bias=True)
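
The reported `all params` figure (83,152,128) is lower than the 124M usually quoted for GPT-2. One explanation that is consistent with the printed numbers is that 4-bit quantized weights are stored two values per byte, so each quantized matrix contributes half of its logical element count. The sketch below reproduces the figures under that assumption, using the standard GPT-2 geometry and tied input/output embeddings counted once; `gpt2_reported_params` is a hypothetical helper.

```python
def gpt2_reported_params(
    d_model: int, n_blocks: int, rank: int = 8, vocab: int = 50257, n_pos: int = 1024
) -> tuple[int, int, int]:
    """Return (full-precision count, LoRA count, count reported after 4-bit loading)."""
    # Per block: c_attn (d x 3d), attn c_proj (d x d), c_fc (d x 4d), mlp c_proj (4d x d)
    linear_weights = 12 * d_model * d_model
    linear_biases = 9 * d_model  # 3d + d + 4d + d
    layer_norms = 4 * d_model    # two LayerNorms, weight and bias each

    full = (
        (vocab + n_pos) * d_model  # token and position embeddings
        + n_blocks * (linear_weights + linear_biases + layer_norms)
        + 2 * d_model              # final LayerNorm
    )
    # LoRA on all four linear maps: rank * (in + out) summed over the block
    lora = n_blocks * rank * 16 * d_model
    # Assumption: only the block linear weights are packed, two values per element
    packed = full - n_blocks * linear_weights // 2
    return full, lora, packed + lora


print(gpt2_reported_params(768, 12))  # (124439808, 1179648, 83152128)
```

The same function also reproduces the totals printed for GPT-2 Medium (`d_model=1024`, 24 blocks) and GPT-2 Large (`d_model=1280`, 36 blocks).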
 

In [11]:
# Build and train the GPT-2 Base model with the quest data
gpt2_base_trainer: Trainer = gpt2_base_model.tokenize_and_train(quest_set)
gpt2_base_trainer

[TOKENIZE] gpt2 (openai-community/gpt2)


[COMPLETE] Elapsed: 0.36s

[FINETUNE] gpt2 (openai-community/gpt2)


| Epoch | Training Loss | Validation Loss |
| :---: | :-----------: | :-------------: |
|   1   |    2.2405     |                 |


[COMPLETE] Elapsed: 2446.42s



<transformers.trainer.Trainer at 0x7fe3306b1a90>

In [12]:
# Download the GPT-2 Medium model
gpt2_medium_model: QuestGenLLM = QuestGenLLM.from_pretrained(
    model_key="gpt2-medium", model_id=MODEL_IDENTIFIERS["gpt2-medium"]
)
gpt2_medium_model

[DOWNLOAD] gpt2-medium (openai-community/gpt2-medium)
[LoRAINFO] trainable params: 3,145,728 || all params: 206,973,952 || trainable%: 1.5199
[COMPLETE] "gpt2-medium" ready in 32.62s.



QuestGenLLM(tokenizer=GPT2TokenizerFast(name_or_path='openai-community/gpt2-medium', vocab_size=50257, model_max_length=1024, is_fast=True, padding_side='right', truncation_side='right', special_tokens={'bos_token': '<|endoftext|>', 'eos_token': '<|endoftext|>', 'unk_token': '<|endoftext|>'}, clean_up_tokenization_spaces=False, added_tokens_decoder={
	50256: AddedToken("<|endoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=True, special=True),
}
), model=PeftModelForCausalLM(
  (base_model): LoraModel(
    (model): GPT2LMHeadModel(
      (transformer): GPT2Model(
        (wte): Embedding(50257, 1024)
        (wpe): Embedding(1024, 1024)
        (drop): Dropout(p=0.1, inplace=False)
        (h): ModuleList(
          (0-23): 24 x GPT2Block(
            (ln_1): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
            (attn): GPT2Attention(
              (c_attn): lora.Linear4bit(
                (base_layer): Linear4bit(in_features=1024, out_features=3072, b

In [13]:
# Build and train the GPT-2 Medium model with the quest data
gpt2_medium_trainer: Trainer = gpt2_medium_model.tokenize_and_train(quest_set)
gpt2_medium_trainer

[TOKENIZE] gpt2-medium (openai-community/gpt2-medium)


[COMPLETE] Elapsed: 1.59s

[FINETUNE] gpt2-medium (openai-community/gpt2-medium)
{'loss': 4.7756, 'grad_norm': 1.684780240058899, 'learning_rate': 4.994988473489026e-06, 'epoch': 0.0010023053021950487}
{'loss': 4.7976, 'grad_norm': 4.057989597320557, 'learning_rate': 4.98997694697805e-06, 'epoch': 0.0020046106043900974}
{'loss': 4.8453, 'grad_norm': 2.278386354446411, 'learning_rate': 4.984965420467074e-06, 'epoch': 0.0030069159065851457}
{'loss': 4.8711, 'grad_norm': 1.7902874946594238, 'learning_rate': 4.9799538939560996e-06, 'epoch': 0.004009221208780195}
{'loss': 5.1045, 'grad_norm': 2.8318560123443604, 'learning_rate': 4.974942367445124e-06, 'epoch': 0.005011526510975243}
{'loss': 4.4405, 'grad_norm': 2.982595443725586, 'learning_rate': 4.969930840934149e-06, 'epoch': 0.006013831813170291}
{'loss': 4.8737, 'grad_norm': 1.2856559753417969, 'learning_rate': 4.9649193144231735e-06, 'epoch': 0.00701613711536534}
{'loss': 4.6095, 'grad_norm': 1.283393144607544, 'learning_rate': 4.95990

<transformers.trainer.Trainer at 0x7fe30dda9d10>
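
The logged `learning_rate` and `epoch` values are consistent with the `Trainer` default of a linear decay schedule without warmup. With 19,954 training rows, a batch size of 1 and 2 gradient accumulation steps, one epoch has 9,977 optimizer steps. The sketch below recomputes the first logged entry (step 10); `linear_decay_lr` is a hypothetical helper that mirrors the usual linear schedule formula.

```python
def linear_decay_lr(
    step: int, total_steps: int, base_lr: float, warmup_steps: int = 0
) -> float:
    """Linear warmup to `base_lr`, then linear decay to zero at `total_steps`."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


rows, batch_size, grad_accum = 19954, 1, 2
total_steps = rows // (batch_size * grad_accum)  # optimizer steps in one epoch

step = 10  # first logging step (LOGGING_STEPS = 10)
lr = linear_decay_lr(step, total_steps, base_lr=5e-6)
epoch_fraction = step / total_steps

print(total_steps)              # 9977
print(f"{lr:.6e}")              # 4.994988e-06
print(f"{epoch_fraction:.7f}")  # 0.0010023
```

At this rate the learning rate starts at 5e-6 and only falls from there, which may partly explain why the logged loss moves slowly over the first steps.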

In [14]:
# Download the GPT-2 Large model
gpt2_large_model: QuestGenLLM = QuestGenLLM.from_pretrained(
    model_key="gpt2-large", model_id=MODEL_IDENTIFIERS["gpt2-large"]
)
gpt2_large_model

[DOWNLOAD] gpt2-large (openai-community/gpt2-large)
[LoRAINFO] trainable params: 5,898,240 || all params: 426,033,920 || trainable%: 1.3845
[COMPLETE] "gpt2-large" ready in 62.60s.



QuestGenLLM(tokenizer=GPT2TokenizerFast(name_or_path='openai-community/gpt2-large', vocab_size=50257, model_max_length=1024, is_fast=True, padding_side='right', truncation_side='right', special_tokens={'bos_token': '<|endoftext|>', 'eos_token': '<|endoftext|>', 'unk_token': '<|endoftext|>'}, clean_up_tokenization_spaces=False, added_tokens_decoder={
	50256: AddedToken("<|endoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=True, special=True),
}
), model=PeftModelForCausalLM(
  (base_model): LoraModel(
    (model): GPT2LMHeadModel(
      (transformer): GPT2Model(
        (wte): Embedding(50257, 1280)
        (wpe): Embedding(1024, 1280)
        (drop): Dropout(p=0.1, inplace=False)
        (h): ModuleList(
          (0-35): 36 x GPT2Block(
            (ln_1): LayerNorm((1280,), eps=1e-05, elementwise_affine=True)
            (attn): GPT2Attention(
              (c_attn): lora.Linear4bit(
                (base_layer): Linear4bit(in_features=1280, out_features=3840, bi

In [15]:
# Build and train the GPT-2 Large model with the quest data
gpt2_large_trainer: Trainer = gpt2_large_model.tokenize_and_train(quest_set)
gpt2_large_trainer

[TOKENIZE] gpt2-large (openai-community/gpt2-large)


[COMPLETE] Elapsed: 1.42s

[FINETUNE] gpt2-large (openai-community/gpt2-large)
{'loss': 4.2989, 'grad_norm': 1.9841364622116089, 'learning_rate': 4.994988473489026e-06, 'epoch': 0.0010023053021950487}
{'loss': 4.3669, 'grad_norm': 3.23563814163208, 'learning_rate': 4.98997694697805e-06, 'epoch': 0.0020046106043900974}
{'loss': 4.5519, 'grad_norm': 4.506359100341797, 'learning_rate': 4.984965420467074e-06, 'epoch': 0.0030069159065851457}
{'loss': 4.5961, 'grad_norm': 2.799888849258423, 'learning_rate': 4.9799538939560996e-06, 'epoch': 0.004009221208780195}
{'loss': 4.4529, 'grad_norm': 3.559256076812744, 'learning_rate': 4.974942367445124e-06, 'epoch': 0.005011526510975243}
{'loss': 4.1272, 'grad_norm': 5.287730693817139, 'learning_rate': 4.969930840934149e-06, 'epoch': 0.006013831813170291}
{'loss': 4.5184, 'grad_norm': 2.398099660873413, 'learning_rate': 4.9649193144231735e-06, 'epoch': 0.00701613711536534}
{'loss': 4.2055, 'grad_norm': 2.139139413833618, 'learning_rate': 4.9599077879

<transformers.trainer.Trainer at 0x7fe30c4825d0>

In [16]:
# Download the Llama 3.2 model
llama32_model: QuestGenLLM = QuestGenLLM.from_pretrained(
    model_key="llama-3.2-1b-instruct",
    model_id=MODEL_IDENTIFIERS["llama-3.2-1b-instruct"],
)
llama32_model

[DOWNLOAD] llama-3.2-1b-instruct (meta-llama/Llama-3.2-1B-Instruct)
[LoRAINFO] trainable params: 5,636,096 || all params: 754,911,232 || trainable%: 0.7466
[COMPLETE] "llama-3.2-1b-instruct" ready in 52.69s.



QuestGenLLM(tokenizer=PreTrainedTokenizerFast(name_or_path='meta-llama/Llama-3.2-1B-Instruct', vocab_size=128000, model_max_length=131072, is_fast=True, padding_side='right', truncation_side='right', special_tokens={'bos_token': '<|begin_of_text|>', 'eos_token': '<|eot_id|>'}, clean_up_tokenization_spaces=True, added_tokens_decoder={
	128000: AddedToken("<|begin_of_text|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	128001: AddedToken("<|end_of_text|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	128002: AddedToken("<|reserved_special_token_0|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	128003: AddedToken("<|reserved_special_token_1|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	128004: AddedToken("<|finetune_right_pad_id|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	128005: AddedToken("<|reserved

In [17]:
# Build and train the Llama 3.2 model with the quest data
llama32_trainer: Trainer = llama32_model.tokenize_and_train(quest_set)
llama32_trainer

[TOKENIZE] llama-3.2-1b-instruct (meta-llama/Llama-3.2-1B-Instruct)


[COMPLETE] Elapsed: 1.85s

[FINETUNE] llama-3.2-1b-instruct (meta-llama/Llama-3.2-1B-Instruct)
{'loss': 5.5898, 'grad_norm': 8.2940673828125, 'learning_rate': 4.994988473489026e-06, 'epoch': 0.0010023053021950487}
{'loss': 5.0065, 'grad_norm': 7.0037760734558105, 'learning_rate': 4.98997694697805e-06, 'epoch': 0.0020046106043900974}
{'loss': 5.1688, 'grad_norm': 10.214489936828613, 'learning_rate': 4.984965420467074e-06, 'epoch': 0.0030069159065851457}
{'loss': 5.8673, 'grad_norm': 5.771705627441406, 'learning_rate': 4.9799538939560996e-06, 'epoch': 0.004009221208780195}
{'loss': 5.4176, 'grad_norm': 7.813205242156982, 'learning_rate': 4.974942367445124e-06, 'epoch': 0.005011526510975243}
{'loss': 5.3445, 'grad_norm': 6.220595359802246, 'learning_rate': 4.969930840934149e-06, 'epoch': 0.006013831813170291}
{'loss': 5.3704, 'grad_norm': 9.442450523376465, 'learning_rate': 4.9649193144231735e-06, 'epoch': 0.00701613711536534}
{'loss': 4.9348, 'grad_norm': 7.940437316894531, 'learning_rat

{'eval_loss': nan, 'eval_runtime': 357.4155, 'eval_samples_per_second': 6.955, 'eval_steps_per_second': 6.955, 'epoch': 1.0}
{'train_runtime': 9916.5914, 'train_samples_per_second': 2.012, 'train_steps_per_second': 1.006, 'train_loss': 2.152799887397646, 'epoch': 1.0}
[COMPLETE] Elapsed: 9916.80s



<transformers.trainer.Trainer at 0x7fe30c482ad0>
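
The notebook does not explain the `eval_loss` of `nan`. One hypothesis worth checking is that some rows carry no usable labels: the dataset contains empty lines, and with a batch size of 1 such a row forms a batch in which every label is ignored. A mean over zero tokens is undefined, and a single undefined batch makes the overall average undefined as well. The sketch below shows the arithmetic with plain Python; it has not been verified against this run, and `mean_token_loss` is a hypothetical helper.

```python
from __future__ import annotations

import math

IGNORE_INDEX = -100  # label value that the loss skips


def mean_token_loss(token_losses: list[float], labels: list[int]) -> float:
    """Average the per-token losses over positions whose label is kept."""
    kept = [
        loss for loss, label in zip(token_losses, labels) if label != IGNORE_INDEX
    ]
    if not kept:
        # A mean over zero elements is 0/0; tensor libraries return NaN here
        return float("nan")
    return sum(kept) / len(kept)


# Batch 1: a normal row. Batch 2: a row made only of padding (e.g. an empty line)
batch_losses = [
    mean_token_loss([2.0, 4.0, 3.0], [11, 12, 13]),
    mean_token_loss([0.0, 0.0, 0.0], [IGNORE_INDEX] * 3),
]
eval_loss = sum(batch_losses) / len(batch_losses)

print(batch_losses[0], math.isnan(eval_loss))  # 3.0 True
```

If this turns out to be the cause, dropping empty rows before tokenization would be one way to obtain a finite validation loss.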

In [18]:
# Download the TinyLlama model
tinyllama_model: QuestGenLLM = QuestGenLLM.from_pretrained(
    model_key="tinyllama-1.1b-chat", model_id=MODEL_IDENTIFIERS["tinyllama-1.1b-chat"]
)
tinyllama_model

[DOWNLOAD] tinyllama-1.1b-chat (TinyLlama/TinyLlama-1.1B-Chat-v1.0)
[LoRAINFO] trainable params: 6,307,840 || all params: 621,914,112 || trainable%: 1.0143
[COMPLETE] "tinyllama-1.1b-chat" ready in 46.91s.



QuestGenLLM(tokenizer=LlamaTokenizerFast(name_or_path='TinyLlama/TinyLlama-1.1B-Chat-v1.0', vocab_size=32000, model_max_length=2048, is_fast=True, padding_side='right', truncation_side='right', special_tokens={'bos_token': '<s>', 'eos_token': '</s>', 'unk_token': '<unk>', 'pad_token': '</s>'}, clean_up_tokenization_spaces=False, added_tokens_decoder={
	0: AddedToken("<unk>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	1: AddedToken("<s>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	2: AddedToken("</s>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
}
), model=PeftModelForCausalLM(
  (base_model): LoraModel(
    (model): LlamaForCausalLM(
      (model): LlamaModel(
        (embed_tokens): Embedding(32000, 2048)
        (layers): ModuleList(
          (0-21): 22 x LlamaDecoderLayer(
            (self_attn): LlamaAttention(
              (q_proj): lora.Linear4bit(
                (b

In [19]:
# Build and train the TinyLlama model with the quest data
tinyllama_trainer: Trainer = tinyllama_model.tokenize_and_train(quest_set)
tinyllama_trainer

[TOKENIZE] tinyllama-1.1b-chat (TinyLlama/TinyLlama-1.1B-Chat-v1.0)


[COMPLETE] Elapsed: 1.59s

[FINETUNE] tinyllama-1.1b-chat (TinyLlama/TinyLlama-1.1B-Chat-v1.0)
{'loss': 4.6219, 'grad_norm': 5.794121742248535, 'learning_rate': 4.994988473489026e-06, 'epoch': 0.0010023053021950487}
{'loss': 4.3511, 'grad_norm': 8.858926773071289, 'learning_rate': 4.98997694697805e-06, 'epoch': 0.0020046106043900974}
{'loss': 4.4122, 'grad_norm': 8.283576965332031, 'learning_rate': 4.984965420467074e-06, 'epoch': 0.0030069159065851457}
{'loss': 4.7192, 'grad_norm': 5.620419979095459, 'learning_rate': 4.9799538939560996e-06, 'epoch': 0.004009221208780195}
{'loss': 4.6064, 'grad_norm': 7.443430423736572, 'learning_rate': 4.974942367445124e-06, 'epoch': 0.005011526510975243}
{'loss': 4.2972, 'grad_norm': 6.061771392822266, 'learning_rate': 4.969930840934149e-06, 'epoch': 0.006013831813170291}
{'loss': 4.413, 'grad_norm': 6.0641398429870605, 'learning_rate': 4.9649193144231735e-06, 'epoch': 0.00701613711536534}
{'loss': 4.259, 'grad_norm': 6.6521453857421875, 'learning_rat

{'eval_loss': nan, 'eval_runtime': 910.8783, 'eval_samples_per_second': 2.729, 'eval_steps_per_second': 2.729, 'epoch': 1.0}
{'train_runtime': 24446.6671, 'train_samples_per_second': 0.816, 'train_steps_per_second': 0.408, 'train_loss': 1.614406400062857, 'epoch': 1.0}
[COMPLETE] Elapsed: 24446.97s



<transformers.trainer.Trainer at 0x7fe2dff7a5d0>

In [None]:
tinyllama_model.model.eval()
with torch.no_grad():
    inputs = tinyllama_model.tokenizer(
        "I am thou, thou art i", return_tensors="pt", padding=True
    ).to(tinyllama_model.model.device)
    inputs["labels"] = inputs["input_ids"].clone()
    outputs = tinyllama_model.model(**inputs)
    print(outputs.loss)

tensor(5.4469, device='cuda:0')
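
The loss printed above is a mean negative log-likelihood per token (in nats), so it can be converted to a perplexity by exponentiation. A minimal sketch, with `perplexity` as a hypothetical helper:

```python
import math


def perplexity(mean_nll: float) -> float:
    """Convert a mean per-token negative log-likelihood (nats) to perplexity."""
    return math.exp(mean_nll)


ppl = perplexity(5.4469)
print(f"{ppl:.1f}")
```

A loss of 5.4469 corresponds to a perplexity of roughly 232 for this single short prompt, which is too small a sample to compare models on.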
