# Fine-tuning Mistral-7B-Instruct with QLoRA on YouTube Transcripts

This notebook performs QLoRA fine-tuning on `mistralai/Mistral-7B-Instruct-v0.3` using a dataset derived from YouTube video transcripts.

**Steps:**
1. Installs necessary libraries.
2. Sets up Hugging Face Hub authentication.
3. Loads and prepares the dataset (`train.jsonl`).
4. Configures the QLoRA parameters and loads the base model in 4-bit.
5. Sets up the `SFTTrainer` from the TRL library.
6. Runs the fine-tuning process.
7. Saves the trained LoRA adapter locally.
8. (Optional) Pushes the adapter to the Hugging Face Hub.
9. (Optional) Performs basic evaluation (Perplexity, ROUGE-L).
10. (Optional) Zips the adapter directory for download.

## 1. Setup & Installs

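Install the required libraries. The first cell mounts Google Drive, where the dataset and adapter checkpoints are stored. Most packages are pinned to specific versions; `bitsandbytes` is left unpinned here, but it must be a build compatible with the CUDA version of Colab's GPU runtime (usually a T4 or A100).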
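
Because some packages in the install command are unpinned, it can help to record which versions were actually resolved. The following is a minimal sketch using only the standard library; the helper name `installed_versions` is made up for illustration and is not part of any of the libraries above.

```python
from importlib import metadata

def installed_versions(packages):
    """Return {package: version string, or None if it is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions

# Package names taken from the pip command in this notebook.
for name, version in installed_versions(
    ["transformers", "datasets", "accelerate", "peft", "trl", "torch", "bitsandbytes"]
).items():
    print(f"{name:15s} {version or 'NOT INSTALLED'}")
```

Run it after the install cell and keep the printed versions alongside the saved adapter so a later run can be reproduced.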

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [None]:
# 1. Install core libraries
!pip install -q transformers==4.38.2 datasets==2.18.0 accelerate==0.27.2 peft==0.9.0 trl==0.7.11 torch torchvision torchaudio sentencepiece py7zr ninja huggingface_hub evaluate rouge_score pyyaml triton==3.2.0 bitsandbytes


## 2. Hugging Face Hub Authentication

Log in to Hugging Face Hub to save the adapter and potentially download gated models. You'll need a User Access Token with `write` permissions.

Get your token here: https://huggingface.co/settings/tokens

In [None]:
from huggingface_hub import login, notebook_login
# Use notebook_login() for interactive login in Colab/Jupyter
# or login("YOUR_HF_TOKEN") if running in a script
notebook_login()


## 3. Load and Prepare Dataset

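Upload your `train.jsonl` file (generated by `data_gen.py`), along with the `test.jsonl` split that the loading cell below expects, to your Colab session. You can do this using the file browser in the left panel, or by placing both files in the Drive folder used below.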

Alternatively, if you've pushed it to a Hugging Face dataset repository, you can load it directly from there.
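
Before loading, it may be worth checking that every line of the JSONL files is a JSON object with the `instruction`, `input`, and `output` string fields that the formatting function reads. The sketch below is one way to do that with the standard library; `find_bad_records` is a hypothetical helper, and the sample lines are invented for illustration.

```python
import json

REQUIRED_KEYS = ("instruction", "input", "output")  # fields read by format_instruction

def find_bad_records(lines):
    """Return (line_number, problem) pairs for lines that are not usable records."""
    problems = []
    for line_number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((line_number, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(record, dict):
            problems.append((line_number, "not a JSON object"))
            continue
        missing = [key for key in REQUIRED_KEYS if not isinstance(record.get(key), str)]
        if missing:
            problems.append((line_number, f"missing or non-string fields: {missing}"))
    return problems

# Tiny in-memory example (made-up records).
sample_lines = [
    '{"instruction": "Summarize.", "input": "some transcript", "output": "a summary"}',
    '{"instruction": "No output field", "input": "text"}',
    'not json',
]
print(find_bad_records(sample_lines))
```

For a real file, pass an open file handle instead of the list, for example `find_bad_records(open("train.jsonl", encoding="utf-8"))`.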

In [None]:
cd /content/drive/MyDrive/mistral_finetuning

/content/drive/MyDrive/mistral_finetuning


In [None]:
import os
import torch
from datasets import load_dataset, DatasetDict # Import DatasetDict
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
    pipeline,
    logging,
)
from peft import LoraConfig, PeftModel, get_peft_model, prepare_model_for_kbit_training
from trl import SFTTrainer
import evaluate # For ROUGE score
import numpy as np

# --- Configuration ---
# Model and Tokenizer
base_model_name = "mistralai/Mistral-7B-Instruct-v0.3"

# Dataset paths (ensure these files are uploaded to Colab)
train_dataset_path = "train.jsonl"
test_dataset_path = "test.jsonl" # Path to the test split

# Option 2: Load from Hugging Face Hub (replace with your repo ID if you pushed the dataset)
# dataset_hub_id = "your_username/your_dataset_repo_name"
# dataset_files = {"train": "train.jsonl", "test": "test.jsonl"}

# QLoRA config
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4", # Recommended
    bnb_4bit_compute_dtype=torch.bfloat16, # Use bfloat16 for faster training
    bnb_4bit_use_double_quant=True, # Recommended
)

# LoRA config
peft_config = LoraConfig(
    r=8,                 # LoRA attention dimension (rank)
    lora_alpha=16,       # Alpha parameter for scaling
    lora_dropout=0.1,   # Dropout probability for LoRA layers
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=[ # Find target modules using script below or common sense
        "q_proj",
        "k_proj",
        "v_proj",
        "o_proj",
        # "gate_proj", # Optional
        # "up_proj",   # Optional
        # "down_proj", # Optional
    ],
)

# Training arguments
output_dir = "./mistral-qlora-adapter_run4" # Local directory to save adapter
per_device_train_batch_size = 2
gradient_accumulation_steps = 8
# num_train_epochs = 1.0 # Can use epochs or max_steps
max_steps = 175 # Adjust based on dataset size and desired training time (~200-400 recommended)
learning_rate = 1e-4
optim = "paged_adamw_32bit" # Recommended optimizer for QLoRA
logging_steps = 10
save_steps = 25 # Save checkpoints periodically
max_grad_norm = 0.3
warmup_ratio = 0.03
lr_scheduler_type = "constant" # Or "cosine", "linear". Note: "constant" ignores warmup_ratio; use "constant_with_warmup" to apply it
evaluation_strategy = "steps" # Evaluate during training using the test set
eval_steps = 25             # Evaluate every N steps
# report_to="tensorboard" # Or wandb

# SFT Trainer specific
max_seq_length = MAX_CHUNK_TOKENS = 512 # Defined in data_gen.py, ensure consistency
packing = False # Set to True if you want to pack sequences, requires more memory

# Hugging Face Hub repo ID (optional)
hf_hub_repo_id = "your_username/mistral-7b-instruct-youtube-qlora" # CHANGE THIS to your HF username/repo name

# --- Load Dataset ---
train_dataset = None
eval_dataset = None

try:
    # Check if local files exist
    if os.path.exists(train_dataset_path) and os.path.exists(test_dataset_path):
        print(f"Loading dataset from local files: {train_dataset_path}, {test_dataset_path}")
        # Load both files into a DatasetDict
        dataset = load_dataset('json', data_files={'train': train_dataset_path, 'test': test_dataset_path})
        train_dataset = dataset['train']
        eval_dataset = dataset['test'] # Use the 'test' split for evaluation
        print(f"Datasets loaded: Train size={len(train_dataset)}, Eval size={len(eval_dataset)}")
    # elif dataset_hub_id: # Option to load from Hub
    #     print(f"Local files not found. Attempting to load from Hub: {dataset_hub_id}")
    #     dataset = load_dataset(dataset_hub_id, data_files=dataset_files)
    #     train_dataset = dataset['train']
    #     eval_dataset = dataset['test']
    #     print(f"Datasets loaded from Hub: Train size={len(train_dataset)}, Eval size={len(eval_dataset)}")
    else:
        missing_files = []
        if not os.path.exists(train_dataset_path): missing_files.append(train_dataset_path)
        if not os.path.exists(test_dataset_path): missing_files.append(test_dataset_path)
        raise FileNotFoundError(f"Dataset file(s) not found. Please upload: {', '.join(missing_files)}")

except Exception as e:
    print(f"Error loading dataset: {e}")
    # Stop execution if datasets aren't loaded
    # exit()

# Ensure evaluation strategy is set correctly if eval_dataset exists
if eval_dataset is None:
    evaluation_strategy = "no"
    eval_steps = None
    print("Warning: No evaluation dataset loaded. Disabling evaluation during training.")
else:
    # Keep evaluation_strategy and eval_steps as defined earlier
    print("Evaluation dataset loaded. Evaluation during training is enabled.")


# --- Format dataset for SFTTrainer ---
# Mistral Instruct format:
# <s>[INST] Instruction [/INST] Answer </s>
# We need a function that takes a sample and returns a formatted string.

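def format_instruction(sample):
    # Uses the 'instruction', 'input', and 'output' fields from train.jsonl/test.jsonl
    # 'input' contains the original transcript chunk
    # 'output' contains the LLM-generated answer
    # With packing=False, SFTTrainer (trl 0.7.x) calls this on a batch (a dict of lists),
    # so accept either a single sample or a batch and always return one string per row.
    instructions, contexts, responses = sample['instruction'], sample['input'], sample['output']
    if isinstance(instructions, str): # Single sample: wrap each field in a list
        instructions, contexts, responses = [instructions], [contexts], [responses]

    texts = []
    for instruction, context, response in zip(instructions, contexts, responses):
        # Combine instruction and context for the prompt
        prompt = f"{instruction}\n---\n{context}\n---" # Separators help delineate

        # Format according to Mistral Instruct template
        texts.append(f"<s>[INST] {prompt} [/INST] {response} </s>")
    return texts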

print("Dataset formatting function defined.")
# Example of formatted text:
if train_dataset and len(train_dataset) > 0:
    print("\nExample formatted training sample:")
    print(format_instruction(train_dataset[0]))
else:
    print("Train dataset is empty or not loaded, cannot show example.")

Loading dataset from local files: train.jsonl, test.jsonl
Datasets loaded: Train size=2900, Eval size=724
Evaluation dataset loaded. Evaluation during training is enabled.
Dataset formatting function defined.

Example formatted training sample:
["<s>[INST] According to the segment, what are the reasons and benefits of doing a mini cut in natural bodybuilding?\n---\nprobably super hungry um for the next six weeks you don't necessarily get more hungry or at least that's what I would infer based on the hormonal results from this study and it sort of does actually match my own personal experience with dieting now it is worth noting that in the study that I just referenced um the subjects started their cut very lean which is normal in natural bodybuilding uh and I think that their average was 99.6% body fat so it sort of Still Remains to be seen whether someone starting at say 12 to 15% would experience those same changes in Gin and leptin uh as someone who was starting leaner and then you 

## 4. Load Model and Tokenizer with QLoRA Config

Load the base model (`Mistral-7B-Instruct-v0.3`) with 4-bit quantization using the `BitsAndBytesConfig`. We also load the corresponding tokenizer.
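
As a back-of-the-envelope illustration of why 4-bit loading matters on a Colab GPU, the sketch below estimates the memory needed for the weights alone at different precisions. The parameter count (about 7.25 billion) is an approximation, and the estimate ignores quantization constants, layers kept in higher precision, activations, and optimizer state, so treat the result as a rough lower bound rather than a measurement.

```python
def weight_memory_gib(num_params, bits_per_param):
    """Approximate memory needed to store the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 2**30

APPROX_PARAMS = 7.25e9  # assumption: rough parameter count for a 7B Mistral model

for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label}: ~{weight_memory_gib(APPROX_PARAMS, bits):.1f} GiB")
```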

In [None]:
# Load Tokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token # Set pad token
tokenizer.padding_side = "right" # Fix weird overflow issue with fp16 training
print("Tokenizer loaded.")

# Load Model with QLoRA config
print(f"Loading base model: {base_model_name} with 4-bit quantization...")
model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    quantization_config=bnb_config,
    device_map="auto", # Automatically map layers to GPU
    trust_remote_code=True, # Necessary for some models
    # torch_dtype=torch.bfloat16, # dtype is set in bnb_config
)
print("Base model loaded.")

# --- Sanity Check: Find LoRA Target Modules ---
# Uncomment the following lines to see all linear layer names
# This helps verify the `target_modules` in LoraConfig
# print("\nModel Architecture:")
# print(model)
# print("\nFinding potential LoRA target modules (Linear layers):")
# linear_layers = set()
# for name, module in model.named_modules():
#     if isinstance(module, torch.nn.Linear):
#          #Focus on layers typically targeted by LoRA in transformers
#          if any(layer_name in name for layer_name in ['q_proj', 'k_proj', 'v_proj', 'o_proj', 'gate_proj', 'up_proj', 'down_proj']):
#              # Get the last part of the name (e.g., 'q_proj')
#              layer_name = name.split('.')[-1]
#              linear_layers.add(layer_name)
# print(f"Found linear layer names: {linear_layers}")
# print(f"Using target modules: {peft_config.target_modules}")
# print("Ensure these match the typical layers for Mistral architecture.")

# --- Prepare model for k-bit training ---
# Cast layer norms and head to fp32 for stability
# model = prepare_model_for_kbit_training(model) # TRL's SFTTrainer handles this

# --- Create PEFT Model ---
# Note: SFTTrainer can also handle PEFT model creation if peft_config is passed
# Creating it explicitly here for clarity
# print("\nApplying LoRA adapter to the base model...")
# model = get_peft_model(model, peft_config)
# print("LoRA adapter applied.")
# model.print_trainable_parameters()

# Configure cache usage (optional, but recommended)
model.config.use_cache = False # Important for training stability with gradient checkpointing
# model.config.pretraining_tp = 1 # If you face tensor parallelism issues
print("Model prepared for training.")

Tokenizer loaded.
Loading base model: mistralai/Mistral-7B-Instruct-v0.3 with 4-bit quantization...


Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]

Base model loaded.
Model prepared for training.


## 5. Configure SFTTrainer

We use the `SFTTrainer` from the TRL library, which simplifies the process of supervised fine-tuning for instruction-following tasks.
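
One thing worth checking before training: depending on the TRL version and the `packing` setting, `SFTTrainer` may call the formatting function on a batch (a dict mapping column names to lists) rather than on a single row. The sketch below is a library-free sanity check that a formatter returns exactly one string per row. `toy_formatter` and `check_formatter` are hypothetical names used for illustration, not part of TRL.

```python
def toy_formatter(batch):
    """Format a batch (dict of equal-length lists) into one string per row."""
    return [
        f"[INST] {instruction}\n---\n{context}\n--- [/INST] {response}"
        for instruction, context, response in zip(
            batch["instruction"], batch["input"], batch["output"]
        )
    ]

def check_formatter(formatter, batch):
    """Raise if the formatter does not return exactly one string per row."""
    num_rows = len(batch["instruction"])
    texts = formatter(batch)
    if not isinstance(texts, list) or not all(isinstance(t, str) for t in texts):
        raise TypeError("formatter must return a list of strings")
    if len(texts) != num_rows:
        raise ValueError(f"expected {num_rows} strings, got {len(texts)}")
    return texts

# Made-up batch in the dict-of-lists shape.
batch = {
    "instruction": ["Question 1", "Question 2"],
    "input": ["chunk 1", "chunk 2"],
    "output": ["answer 1", "answer 2"],
}
print(len(check_formatter(toy_formatter, batch)))
```

With a real `datasets.Dataset`, slicing such as `train_dataset[:4]` yields a batch in this dict-of-lists shape, so the same check can be pointed at the actual formatting function.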

In [None]:
training_arguments = TrainingArguments(
        output_dir=output_dir,
        per_device_train_batch_size=per_device_train_batch_size,
        gradient_accumulation_steps=gradient_accumulation_steps,
        optim=optim,
        save_steps=save_steps,
        logging_steps=logging_steps,
        learning_rate=learning_rate,
        # num_train_epochs=num_train_epochs,
        max_steps=max_steps,
        fp16=False, # Fallback for GPUs without bf16 support (e.g., T4); keep off when bf16 is on
        bf16=True, # Requires an Ampere or newer GPU (e.g., A100); set to False on a T4
        max_grad_norm=max_grad_norm,
        warmup_ratio=warmup_ratio,
        group_by_length=True, # Speeds up training by grouping similar length sequences
        lr_scheduler_type=lr_scheduler_type,
        # Evaluation settings (only if eval_dataset is provided)
        evaluation_strategy=evaluation_strategy, # Use evaluation_strategy for transformers 4.38.2
        eval_steps=eval_steps,
        report_to="none",
        # Pushing to Hub options
        # push_to_hub=True, # Set to True to push model/adapter during training
        # hub_model_id=hf_hub_repo_id, # Repository name on Hugging Face Hub
        # hub_strategy="checkpoint", # Push on every save
        # hub_token=os.getenv("HF_TOKEN") # Use token stored in environment or login()
    )

print("Training Arguments configured.")

trainer = SFTTrainer(
    model=model,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset, # Pass evaluation dataset here
    peft_config=peft_config, # Pass PEFT config here
    # dataset_text_field="text", # Use if you pre-formatted into a 'text' column
    formatting_func=format_instruction, # Pass the formatting function
    tokenizer=tokenizer,
    args=training_arguments,
    packing=packing,
)

print("SFTTrainer initialized.")
# TRL automatically handles prepare_model_for_kbit_training when peft_config is passed
# model.print_trainable_parameters()

# Verify bf16 setting based on GPU availability
if torch.cuda.is_bf16_supported():
    print("\nBF16 is supported. Training will use BF16.")
    if not training_arguments.bf16:
      print("Warning: BF16 supported but not enabled in TrainingArguments. Enabling it.")
      training_arguments.bf16 = True
      training_arguments.fp16 = False # Ensure fp16 is off if bf16 is on
else:
    print("\nBF16 is NOT supported. Ensure compute_dtype in BitsAndBytesConfig is appropriate (e.g., float16) and bf16=False in TrainingArguments.")
    if training_arguments.bf16:
        print("Warning: BF16 is not supported, but bf16=True in TrainingArguments. Setting bf16=False and fp16=True.")
        training_arguments.bf16 = False
        training_arguments.fp16 = True # Fallback to fp16 if bf16 not available

# Re-initialize trainer if arguments changed (e.g., bf16 status)
# This might not be strictly necessary as args are references, but safer
trainer.args = training_arguments
print("Trainer arguments updated based on hardware support.")

Training Arguments configured.
SFTTrainer initialized.

BF16 is supported. Training will use BF16.
Trainer arguments updated based on hardware support.


## 6. Start Fine-tuning

Launch the training process. This will take some time depending on the dataset size, `max_steps`, and the Colab GPU assigned (a T4 is slower than an A100). Aim for under 2 hours on a T4.

In [None]:
print("Starting fine-tuning...")
train_result = trainer.train()
print("Fine-tuning finished.")

# --- Log Training Metrics ---
metrics = train_result.metrics
metrics["train_samples"] = len(train_dataset)
trainer.log_metrics("train", metrics)
trainer.save_metrics("train", metrics)
print("Training metrics saved.")

Starting fine-tuning...


Step,Training Loss,Validation Loss
25,0.322,2.660602
50,0.0116,3.871646
75,0.0074,4.02056
100,0.0053,3.682944
125,0.0047,3.59101
150,0.0036,3.584726
175,0.0032,3.703533



Invalid credentials in Authorization header - silently ignoring the lookup for the file config.json in mistralai/Mistral-7B-Instruct-v0.3.


Fine-tuning finished.
***** train metrics *****
  epoch                    =      175.0
  total_flos               = 21390874GF
  train_loss               =     0.0593
  train_runtime            = 0:03:28.11
  train_samples            =       2900
  train_samples_per_second =     13.454
  train_steps_per_second   =      0.841
Training metrics saved.
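
The logged metrics can be cross-checked with simple arithmetic. Assuming a single GPU, each optimizer step consumes `per_device_train_batch_size * gradient_accumulation_steps` samples, so the number of epochs covered by `max_steps` follows directly. The sketch below uses the values from the configuration cell; `estimated_epochs` is a hypothetical helper.

```python
def estimated_epochs(num_samples, per_device_batch_size, grad_accum_steps, max_steps, num_devices=1):
    """Estimate how many passes over the data max_steps optimizer steps cover."""
    samples_per_step = per_device_batch_size * grad_accum_steps * num_devices
    return max_steps * samples_per_step / num_samples

# Values from the configuration cell: 2900 samples, batch size 2, accumulation 8, 175 steps.
print(f"{estimated_epochs(2900, 2, 8, 175):.2f} epochs")
```

With 2900 training samples this comes to just under one epoch, whereas the log above reports `epoch = 175.0`. A gap that large suggests the trainer saw far fewer samples than `len(train_dataset)`, which would also be consistent with the training loss collapsing while the validation loss rises. Inspecting `len(trainer.train_dataset)` after the trainer is built is one way to confirm.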


## 7. Save Adapter Locally

Save the trained QLoRA adapter weights to the specified output directory.
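
For a rough idea of how small the saved adapter is, the number of LoRA parameters can be computed from the rank and the shapes of the targeted projections: each adapted `Linear(in, out)` layer adds `r * (in + out)` parameters. The layer shapes below are assumptions based on the published Mistral-7B configuration (hidden size 4096, 8 key/value heads of dimension 128, 32 layers); verify them against `model.config` before relying on the numbers.

```python
def lora_param_count(rank, layer_shapes, num_layers):
    """Count LoRA parameters: each adapted layer adds A (rank x in) and B (out x rank)."""
    per_block = sum(
        rank * (in_features + out_features)
        for in_features, out_features in layer_shapes.values()
    )
    return per_block * num_layers

# Assumed (in_features, out_features) for each targeted projection.
MISTRAL_7B_ATTENTION_SHAPES = {
    "q_proj": (4096, 4096),
    "k_proj": (4096, 1024),
    "v_proj": (4096, 1024),
    "o_proj": (4096, 4096),
}

params = lora_param_count(rank=8, layer_shapes=MISTRAL_7B_ATTENTION_SHAPES, num_layers=32)
print(f"{params:,} LoRA parameters")
print(f"~{params * 2 / 1e6:.1f} MB at 2 bytes per parameter, ~{params * 4 / 1e6:.1f} MB at 4 bytes")
```

Uncommenting `gate_proj`, `up_proj`, or `down_proj` in `target_modules` would add further entries to the shape dictionary and increase the count accordingly.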

In [None]:
print(f"Saving LoRA adapter to {output_dir}...")
trainer.save_model(output_dir) # Saves the adapter config and weights
print(f"Adapter saved locally to {output_dir}")

# Optional: Save the tokenizer as well (good practice)
tokenizer.save_pretrained(output_dir)
print(f"Tokenizer saved locally to {output_dir}")

Saving LoRA adapter to ./mistral-qlora-adapter_run4...
Adapter saved locally to ./mistral-qlora-adapter_run4
Tokenizer saved locally to ./mistral-qlora-adapter_run4



Invalid credentials in Authorization header - silently ignoring the lookup for the file config.json in mistralai/Mistral-7B-Instruct-v0.3.


## 8. (Optional) Push Adapter to Hugging Face Hub

Push the trained adapter and tokenizer to your Hugging Face Hub repository for easy sharing and loading later.

In [None]:
# Make sure hf_hub_repo_id is set correctly
push_to_hub = True # Set to False if you don't want to push

if push_to_hub:
    print(f"Pushing adapter and tokenizer to Hugging Face Hub repo: {hf_hub_repo_id}...")
    try:
        # Push the adapter (trainer saves adapter to output_dir)
        # token=True uses the token stored by login(); use_auth_token is the deprecated spelling
        trainer.model.push_to_hub(hf_hub_repo_id, token=True)

        # Push the tokenizer
        tokenizer.push_to_hub(hf_hub_repo_id, token=True)

        print("Successfully pushed to Hub.")
    except Exception as e:
        print(f"Error pushing to Hub: {e}")
else:
    print("Skipping push to Hugging Face Hub.")

Pushing adapter and tokenizer to Hugging Face Hub repo: EricBlv/mistral-7b-instruct-youtube-qlora...




adapter_model.safetensors:   0%|          | 0.00/13.7M [00:00<?, ?B/s]

No files have been modified since last commit. Skipping to prevent empty commit.


Successfully pushed to Hub.


## 9. (Optional) Evaluation

Perform evaluation on the held-out test set (if created) to calculate Perplexity and ROUGE scores.
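
Perplexity here is the exponential of the mean per-token cross-entropy loss (in nats), which is what `trainer.evaluate()` reports as `eval_loss`. A minimal sketch of that relationship, using made-up per-token losses:

```python
import math

def perplexity_from_token_losses(token_losses):
    """Perplexity = exp(mean negative log-likelihood per token, in nats)."""
    if not token_losses:
        raise ValueError("need at least one token loss")
    return math.exp(sum(token_losses) / len(token_losses))

# Hypothetical per-token losses, for illustration only.
example_losses = [2.0, 3.0, 4.0]
print(perplexity_from_token_losses(example_losses))  # equals math.exp(3.0)
```

A loss of 0 gives a perplexity of 1, and a uniform guess over a vocabulary of size V gives a perplexity of V, so lower is better.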

In [None]:
import math
import torch # Ensure torch is imported if not already
import evaluate # Ensure evaluate is imported

# Check if eval_dataset was loaded successfully earlier
if 'eval_dataset' in locals() and eval_dataset: # More robust check
    print("\nStarting evaluation on the test set...")

    # --- Perplexity ---
    # The trainer.evaluate() function uses the eval_dataset passed during init
    try:
        # Ensure metrics dictionary exists from training results
        if 'metrics' not in locals():
             metrics = {} # Initialize if running eval separately

        print("Running trainer.evaluate() for perplexity...")
        eval_metrics = trainer.evaluate() # This runs evaluation on eval_dataset
        perplexity = math.exp(eval_metrics["eval_loss"])
        print(f"Evaluation Loss (on test set): {eval_metrics['eval_loss']:.4f}")
        print(f"Perplexity (on test set): {perplexity:.4f}")
        # Save eval metrics
        metrics["eval_perplexity"] = perplexity
        # Ensure trainer object exists before logging/saving
        if 'trainer' in locals():
            trainer.log_metrics("eval", eval_metrics)
            trainer.save_metrics("eval", eval_metrics)
            print("Evaluation metrics saved.")
        else:
            print("Warning: Trainer object not found, skipping metric logging/saving.")
    except Exception as e:
        print(f"Could not calculate perplexity during evaluation: {e}")

    # --- ROUGE Score (more involved for generative tasks) ---
    # Requires generating predictions and comparing them to references.
    print("\nAttempting to calculate ROUGE score...")
    rouge_scorer = None
    try:
        print("Loading ROUGE scorer using evaluate.load('rouge')...")
        rouge_scorer = evaluate.load('rouge')
        print("ROUGE scorer loaded successfully.")
    except Exception as e:
        # Print the full exception clearly
        print("\n!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!")
        print(f"!!! FAILED TO LOAD ROUGE SCORER !!!")
        print(f"!!! Error Type: {type(e).__name__}")
        print(f"!!! Error Details: {e}")
        print("!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!\n")
        print("Skipping ROUGE score calculation.")
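        # rouge_scorer remains None

    # Compare against None explicitly: the loaded scorer object can evaluate as falsy
    # (it defines __len__), which would wrongly skip the ROUGE computation
    if rouge_scorer is not None: # Proceed only if scorer loaded
        print("ROUGE scorer check passed.") # Added print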
        # Ensure the model and tokenizer are available
        if 'model' not in locals() or 'tokenizer' not in locals():
             print("Error: Model or tokenizer not found. Cannot generate predictions for ROUGE.")
        else:
            print("Model and tokenizer found for ROUGE generation.") # Added print
            # Ensure the model is in evaluation mode and on the correct device
            model.eval() # Ensure evaluation mode
            if torch.cuda.is_available():
                device = torch.device("cuda")
            else:
                device = torch.device("cpu")
            print(f"Using device: {device} for ROUGE generation.") # Added print
            # Ensure model is on device (trainer usually handles this, but safe check)
            try:
                model.to(device)
            except Exception as e:
                print(f"Warning: Could not move model to device {device}: {e}")


            all_preds = []
            all_labels = []

            # Prepare inputs and get references (outputs) from the eval_dataset (test.jsonl)
            eval_batch_size = 4 # Adjust based on GPU memory
            print(f"Generating predictions for {len(eval_dataset)} test samples...")
            for i in range(0, len(eval_dataset), eval_batch_size):
                print(f"\nProcessing batch starting at index {i}...") # Added print for loop entry
                # Get batch indices
                indices = range(i, min(i + eval_batch_size, len(eval_dataset)))
                # Select dictionary items for the batch using indices
                # This ensures we get dictionaries even if direct iteration yields strings
                try:
                    batch_samples_dicts = [eval_dataset[idx] for idx in indices]
                    print(f"  Successfully accessed batch samples for indices {indices}.") # Added print
                except Exception as e:
                     print(f"  Error accessing eval_dataset batch at index {i}: {e}. Skipping batch.")
                     continue # Skip this batch if dataset access fails

                # Extract the prompt part (instruction + input) for generation
                prompts = []
                labels = []
                try:
                    for sample_idx, sample in enumerate(batch_samples_dicts): # Iterate over the list of dictionaries
                        instruction = sample['instruction'] # Should work now
                        context = sample['input']
                        response = sample['output'] # This is the reference
                        prompt_text = f"<s>[INST] {instruction}\n---\n{context}\n--- [/INST]" # Match training format
                        prompts.append(prompt_text)
                        labels.append(response) # The reference LLM-generated answer
                    print(f"  Successfully processed prompts and labels for batch.") # Added print
                except KeyError as e:
                     print(f"  Error: Missing key {e} in eval_dataset sample: {sample}. Skipping batch.")
                     continue # Skip batch if data format is wrong
                except Exception as e:
                     print(f"  Unexpected error processing batch sample {sample}: {e}. Skipping batch.")
                     continue

                # Tokenize prompts
                # Batched generation with a decoder-only model needs left padding,
                # so that new tokens directly follow each prompt (training used right padding)
                try:
                    tokenizer.padding_side = "left"
                    inputs = tokenizer(prompts, return_tensors="pt", padding=True, truncation=True, max_length=max_seq_length).to(device)
                    print(f"  Successfully tokenized prompts for batch.") # Added print
                except Exception as e:
                    print(f"  Error tokenizing prompts for batch {i}: {e}. Skipping batch.")
                    continue

                # Generate predictions
                # Use the final model (potentially PEFT model)
                current_model_for_generation = getattr(trainer, 'model', model) # Use trainer's model if available
                try:
                    print(f"  Generating predictions for batch...") # Added print
                    with torch.no_grad():
                        # Adjust generation parameters as needed
                        outputs = current_model_for_generation.generate(
                            **inputs,
                            max_new_tokens=max_seq_length, # Allow generating up to max length
                            eos_token_id=tokenizer.eos_token_id,
                            pad_token_id=tokenizer.pad_token_id, # Explicitly set pad_token_id
                            do_sample=False, # Use greedy decoding for reproducible evaluation
                            num_beams=1 # for greedy
                        )
                    print(f"  Successfully generated predictions for batch.") # Added print
                except Exception as e:
                     print(f"  Error during model.generate for batch {i}: {e}. Skipping batch.")
                     continue

                # Decode generated sequences
                # Important: Decode *only the generated part*, not the prompt
                preds_decoded = []
                try:
                    print(f"  Decoding predictions for batch...") # Added print
                    # generate() returns each padded prompt followed by its new tokens,
                    # so the generated part starts at the padded prompt length for every row
                    input_token_len = inputs["input_ids"].shape[1]
                    for idx, output_tokens in enumerate(outputs):
                        generated_tokens = output_tokens[input_token_len:]
                        pred = tokenizer.decode(generated_tokens, skip_special_tokens=True)
                        preds_decoded.append(pred.strip())
                    print(f"  Successfully decoded predictions for batch.") # Added print
                except Exception as e:
                    print(f"  Error decoding predictions for batch {i}: {e}. Skipping batch.")
                    continue # Skip batch if decoding fails

                all_preds.extend(preds_decoded)
                all_labels.extend(labels)

                # Removed progress print from here to avoid clutter, main check is loop entry

            # Compute ROUGE after processing all batches
            print("\nFinished processing all batches for ROUGE.") # Added print
            if len(all_preds) == len(all_labels) and len(all_preds) > 0: # Ensure we have pairs to compare
                try:
                    print(f"Computing ROUGE score for {len(all_preds)} generated predictions...")
                    rouge_results = rouge_scorer.compute(
                        predictions=all_preds,
                        references=all_labels
                    )
                    print("\nROUGE Scores (on test set):")
                    print(rouge_results)

                    # Add ROUGE-L to metrics dictionary
                    if 'rougeL' in rouge_results:
                        metrics["eval_rougeL"] = rouge_results['rougeL']
                        # Log and save updated metrics if trainer exists
                        if 'trainer' in locals():
                            trainer.log_metrics("eval", {"rougeL": rouge_results['rougeL']})
                            trainer.save_metrics("eval", metrics) # Save combined eval metrics
                            print("Evaluation metrics including ROUGE saved.")
                        else:
                             print("Warning: Trainer object not found, skipping ROUGE metric logging/saving.")

                except Exception as e:
                    print(f"Could not compute or save ROUGE scores: {e}")
                    # Print examples even if scoring fails
                    print("\nExample Prediction:", all_preds[0] if all_preds else "N/A")
                    print("Example Reference:", all_labels[0] if all_labels else "N/A")
            else:
                 print("\nWarning: No valid prediction/label pairs generated. Cannot compute ROUGE score.")
                 if len(all_preds) != len(all_labels):
                      print(f"Mismatch in length: Predictions={len(all_preds)}, Labels={len(all_labels)}")

    else: # rouge_scorer is None
         print("Skipping ROUGE score calculation because scorer failed to load.")


else:
    print("\nNo evaluation dataset loaded ('eval_dataset' not found or is None). Skipping evaluation step.")

print("\n--- Training and Evaluation Complete ---")

# Final check for output directory and optional Hub push info
if 'output_dir' in locals():
    print(f"Adapter saved in: {output_dir}")
if 'push_to_hub' in locals() and push_to_hub and 'hf_hub_repo_id' in locals() and hf_hub_repo_id:
    print(f"Adapter pushed to: https://huggingface.co/{hf_hub_repo_id}")

# Clean up memory (important in Colab)
# Consider uncommenting these if you face memory issues later
# print("\nAttempting memory cleanup...")
# try:
#     del model
#     del trainer
#     import gc
#     gc.collect()
#     torch.cuda.empty_cache()
#     print("Memory cleanup attempted.")
# except NameError:
#     print("Model or trainer not defined, skipping specific cleanup.")
# except Exception as e:
#      print(f"Error during memory cleanup: {e}")


Starting evaluation on the test set...
Running trainer.evaluate() for perplexity...


Evaluation Loss (on test set): 3.7035
Perplexity (on test set): 40.5905
***** eval metrics *****
  epoch                   =      175.0
  eval_loss               =     3.7035
  eval_runtime            = 0:00:00.25
  eval_samples_per_second =      3.933
  eval_steps_per_second   =      3.933
Evaluation metrics saved.

Attempting to calculate ROUGE score...
Loading ROUGE scorer using evaluate.load('rouge')...




ROUGE scorer loaded successfully.
Skipping ROUGE score calculation because scorer failed to load.

--- Training and Evaluation Complete ---
Adapter saved in: ./mistral-qlora-adapter_run4
Adapter pushed to: https://huggingface.co/EricBlv/mistral-7b-instruct-youtube-qlora
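
The perplexity printed in the log above follows directly from the evaluation loss: assuming `eval_loss` is the mean per-token cross-entropy in nats (the usual convention for causal language models in `transformers`), perplexity is its exponential. A minimal sketch, where `perplexity_from_loss` is a hypothetical helper name and not part of the notebook:

```python
import math


def perplexity_from_loss(mean_cross_entropy):
    """Convert a mean per-token cross-entropy (in nats) into perplexity."""
    return math.exp(mean_cross_entropy)


# 3.7035 is the eval_loss value reported in the log above.
eval_loss = 3.7035
print(f"Perplexity: {perplexity_from_loss(eval_loss):.2f}")
```

The result differs from the logged 40.5905 only in the later decimals, because the log prints the loss rounded to four decimal places.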


## 10. Download Adapter

If you want to download the adapter directly from Colab, you can zip the output directory.

In [None]:
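import os
import shutil

# normpath drops any trailing slash so basename never returns an empty string
adapter_zip_name = os.path.basename(os.path.normpath(output_dir))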
# Check if the directory exists before zipping
if os.path.isdir(output_dir):
    print(f"Zipping adapter directory: {output_dir} -> {adapter_zip_name}.zip")
    shutil.make_archive(adapter_zip_name, 'zip', output_dir)
    print(f"Adapter zipped to {adapter_zip_name}.zip")
    # You can now download this zip file from the Colab file browser
else:
    print(f"Output directory {output_dir} not found. Cannot create zip file.")

Zipping adapter directory: ./mistral-qlora-adapter_run3 -> mistral-qlora-adapter_run3.zip
Adapter zipped to mistral-qlora-adapter_run3.zip
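
`shutil.make_archive` zips everything under the output directory, including any intermediate `checkpoint-*` folders the trainer wrote there. If only the final adapter and tokenizer files are needed, one option is to skip those folders. The sketch below uses only the standard library; `zip_without_checkpoints` and the `checkpoint-` prefix are assumptions for illustration, not something defined elsewhere in this notebook.

```python
import os
import zipfile


def zip_without_checkpoints(src_dir, zip_path, skip_prefix="checkpoint-"):
    """Zip src_dir into zip_path, skipping top-level folders named skip_prefix*.

    Returns the list of archive member names that were written.
    Note: zip_path should live outside src_dir, otherwise the archive
    would try to include itself.
    """
    src_dir = os.path.normpath(src_dir)
    written = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, dirs, files in os.walk(src_dir):
            if root == src_dir:
                # Prune in place so os.walk never descends into checkpoints.
                dirs[:] = [d for d in dirs if not d.startswith(skip_prefix)]
            for name in sorted(files):
                full_path = os.path.join(root, name)
                arcname = os.path.relpath(full_path, src_dir)
                zf.write(full_path, arcname)
                written.append(arcname)
    return written


# Hypothetical usage with the notebook's output_dir variable:
# zip_without_checkpoints(output_dir, "adapter_only.zip")
```

The resulting archive is typically much smaller, since each checkpoint folder can hold its own copy of the adapter plus optimizer state.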


In [None]:
ls

[0m[01;34mbitsandbytes[0m/                   [01;34mmistral-qlora-adapter_run4[0m/
[01;34mmistral-qlora-adapter[0m/          mistral_qlora_youtube.ipynb
[01;34mmistral-qlora-adapter_run2[0m/     test.jsonl
[01;34mmistral-qlora-adapter_run3[0m/     train.jsonl
mistral-qlora-adapter_run3.zip


In [None]:
cd mistral-qlora-adapter_run3

/content/drive/MyDrive/mistral_finetuning/mistral-qlora-adapter_run3


In [None]:
ls

adapter_config.json        checkpoint-25/           tokenizer_config.json
adapter_model.safetensors  checkpoint-50/           tokenizer.json
all_results.json           checkpoint-75/           tokenizer.model
checkpoint-100/            eval_results.json        training_args.bin
checkpoint-125/            README.md                train_results.json
checkpoint-150/            special_tokens_map.json


In [None]:
cd ..

/content/drive/MyDrive/mistral_finetuning


In [None]:
ls

[0m[01;34mbitsandbytes[0m/                   [01;34mmistral-qlora-adapter_run4[0m/
[01;34mmistral-qlora-adapter[0m/          mistral_qlora_youtube.ipynb
[01;34mmistral-qlora-adapter_run2[0m/     test.jsonl
[01;34mmistral-qlora-adapter_run3[0m/     train.jsonl
mistral-qlora-adapter_run3.zip


In [None]:
# merge_adapter.py
import os

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# --- Configuration ---
base_model_name = "mistralai/Mistral-7B-Instruct-v0.3"
# *** CONFIRM this is the adapter path from your BEST run ***
# Use the absolute path as determined before
adapter_path = "/content/drive/MyDrive/mistral_finetuning/mistral-qlora-adapter_run3"
# *** This directory will be created in Colab ***
merged_model_path = "./merged_mistral_adapter"

# --- Ensure paths exist ---
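if not os.path.isdir(adapter_path):
    # Raise instead of calling exit(): this stops the cell with a clear error
    # in a notebook and still ends a plain script with a non-zero status.
    raise FileNotFoundError(
        f"Adapter path not found: {adapter_path}. "
        "Please ensure the adapter files are in the correct directory."
    )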

print(f"Loading CLEAN base model for merging: {base_model_name}")
# Load base model WITHOUT quantization for merging
# device_map='auto' should still work on Colab GPU
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    torch_dtype=torch.bfloat16,  # Half the memory of fp32; on GPUs without native bf16 (e.g. T4), torch.float16 is an alternative
    # quantization_config=None, # Ensure no quantization config is passed
    device_map="auto",
    trust_remote_code=True,
)
print("Clean base model loaded.")

print(f"Loading adapter: {adapter_path}")
# Load the LoRA adapter onto the CLEAN base model
model = PeftModel.from_pretrained(base_model, adapter_path)
print("Adapter loaded onto clean base model.")

print("Merging adapter...")
# Merge the adapter weights into the base model
model = model.merge_and_unload()
print("Merge complete.")

print(f"Saving merged model to: {merged_model_path}")
# Ensure target directory exists
os.makedirs(merged_model_path, exist_ok=True)

# Save the merged model. A NotImplementedError here typically points to
# weights that were left on the meta device and never materialized.
try:
    model.save_pretrained(merged_model_path)
except NotImplementedError as e:
    print(f"ERROR during save_pretrained: {e}")
    print("This might indicate the merge didn't fully resolve meta tensors.")
    print("Consider saving state_dict manually or further debugging.")
    raise
except Exception as e:
    print(f"An unexpected error occurred during save_pretrained: {e}")
    raise


print("Loading tokenizer...")
# Load and save the tokenizer associated with the base model
tokenizer = AutoTokenizer.from_pretrained(base_model_name, trust_remote_code=True)
tokenizer.save_pretrained(merged_model_path)

print("Merged model and tokenizer saved successfully.")

# Optional: Clean up memory if needed in Colab
import gc
del model
del base_model
gc.collect()
if torch.cuda.is_available():
    torch.cuda.empty_cache()
print("Memory cleanup attempted.")

Loading CLEAN base model for merging: mistralai/Mistral-7B-Instruct-v0.3


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]

Clean base model loaded.
Loading adapter: /content/drive/MyDrive/mistral_finetuning/mistral-qlora-adapter_run3
Adapter loaded onto clean base model.
Merging adapter...
Merge complete.
Saving merged model to: ./merged_mistral_adapter


You set `add_prefix_space`. The tokenizer needs to be converted from the slow tokenizers


Loading tokenizer...
Merged model and tokenizer saved successfully.
Memory cleanup attempted.
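
Conceptually, `merge_and_unload()` folds each low-rank update into its base weight, so the merged model no longer needs the adapter at inference time. For a standard LoRA layer with scaling `lora_alpha / r`, the merged weight is `W + (lora_alpha / r) * B @ A`. The toy sketch below checks this identity on tiny hand-written matrices in plain Python; the helper names and the numbers are illustrative assumptions and do not reflect PEFT's internal code.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def add(a, b):
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def scale(a, s):
    """Multiply every entry of a matrix by the scalar s."""
    return [[x * s for x in row] for row in a]


def merge_lora(W, A, B, lora_alpha, r):
    """Return the merged weight W + (lora_alpha / r) * B @ A."""
    return add(W, scale(matmul(B, A), lora_alpha / r))


# Toy layer: 3 inputs, 2 outputs, LoRA rank r = 1 (all values are made up).
W = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]  # base weight, shape (2, 3)
A = [[0.5, -0.5, 1.0]]                   # LoRA down-projection, shape (r, 3)
B = [[2.0], [-1.0]]                      # LoRA up-projection, shape (2, r)
x = [[1.0], [2.0], [3.0]]                # input as a column vector
lora_alpha, r = 2, 1

# Adapter kept separate: base output plus the scaled low-rank path.
unmerged = add(matmul(W, x), scale(matmul(B, matmul(A, x)), lora_alpha / r))

# Adapter folded into the weight: a single matrix multiply.
merged = matmul(merge_lora(W, A, B, lora_alpha, r), x)

print(unmerged, merged)
```

Both paths give the same output here, which is why the merged checkpoint can be loaded as an ordinary `AutoModelForCausalLM` without `peft`. In reduced precision such as bf16, the two can differ by small rounding errors.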


In [None]:
!ls -lh ./merged_mistral_adapter/

total 14G
-rw------- 1 root root  653 Apr 20 23:39 config.json
-rw------- 1 root root  111 Apr 20 23:39 generation_config.json
-rw------- 1 root root 4.7G Apr 20 23:39 model-00001-of-00003.safetensors
-rw------- 1 root root 4.7G Apr 20 23:39 model-00002-of-00003.safetensors
-rw------- 1 root root 4.3G Apr 20 23:40 model-00003-of-00003.safetensors
-rw------- 1 root root  24K Apr 20 23:40 model.safetensors.index.json
-rw------- 1 root root  414 Apr 20 23:40 special_tokens_map.json
-rw------- 1 root root 138K Apr 20 23:40 tokenizer_config.json
-rw------- 1 root root 1.9M Apr 20 23:40 tokenizer.json
-rw------- 1 root root 574K Apr 20 23:40 tokenizer.model
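
The `total 14G` above is in line with a back-of-the-envelope estimate: a bf16 checkpoint stores two bytes per parameter. The sketch below assumes roughly 7.25 billion parameters for the model (an approximate figure, not taken from this notebook), and `estimate_checkpoint_gib` is a hypothetical helper name.

```python
def estimate_checkpoint_gib(num_params, bytes_per_param=2):
    """Estimate on-disk weight size in GiB (the 1024-based unit ls -h reports)."""
    return num_params * bytes_per_param / 1024**3


approx_params = 7.25e9  # assumption: approximate parameter count

print(f"bf16/fp16: {estimate_checkpoint_gib(approx_params):.1f} GiB")
print(f"fp32:      {estimate_checkpoint_gib(approx_params, bytes_per_param=4):.1f} GiB")
```

Under these assumptions the half-precision estimate lands close to the combined size of the three `.safetensors` shards listed above, while an fp32 save would need about twice the disk space.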
