# Fine-tuning the Base Model for Conversation

- **Authors:** Riyaadh Gani and Damilola Ogunleye
- **Project:** Food Recognition & Recipe LLM  
- **Purpose:** Fine-tuning pipeline that adds conversational abilities to our base model

---

## Overview

This notebook fine-tunes GPT-2 Medium on the processed OASST1 dataset with:
- LoRA for parameter-efficient training
- Multi-turn conversation support
- RAG-ready architecture (context field preserved)
 
**Requirements:**
- Processed [OASST1 dataset](./data/processed/oasst1_multiturn_en)
- GPU with at least 8GB VRAM (or use Colab/Kaggle)
- `finetune_llm/config/training_config.yaml` with the training hyperparameters

### Setup and Installation

In [None]:
%pip install -q --upgrade pip

%pip install "transformers==4.40.2" \
             "peft==0.11.1" \
             "accelerate==0.30.1" \
             "datasets==2.19.1" \
             bitsandbytes \
             scipy \
             wandb \
             pyyaml \
             trl \
             safetensors \
             sentencepiece

In [17]:
import os
os.environ["TRANSFORMERS_NO_TF"] = "1"

# Clone your repo into /content if not already present
REPO_URL = "https://github.com/Gani332/DeepLearningLLM.git"
REPO_PATH = "/content/DeepLearningLLM"

# Clone repo only if it doesn't exist already
if not os.path.exists(REPO_PATH):
    !git clone {REPO_URL} {REPO_PATH}
else:
    print("Repo already exists, pulling latest changes...")
    %cd {REPO_PATH}
    !git pull

# Change working directory to your repo
os.chdir(REPO_PATH)

# Show current working directory and its contents
print("Working directory:", os.getcwd())
print("Contents:", os.listdir("."))


Repo already exists, pulling latest changes...
/content/DeepLearningLLM
Already up to date.
Working directory: /content/DeepLearningLLM
Contents: ['.git', 'llm_data_preprocessing.ipynb', 'prepare_recipe_data_for_gpt2.py', 'wandb', 'finetune_llm', 'models', '.DS_Store', '.gitignore', 'lstm_model_training.ipynb', 'supportDocs']


In [18]:
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    TrainingArguments,
    Trainer,
    DataCollatorForLanguageModeling,
    set_seed
)
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training, TaskType
from datasets import load_from_disk
import numpy as np
from pathlib import Path
import wandb
import os

usingGoogleDrive = False  # Set to True to read the processed data from Google Drive (Colab only)
set_seed(42)

In [19]:
if usingGoogleDrive:
    from google.colab import drive
    drive.mount('/content/drive')

    print("✓ Google Drive mounted at /content/drive")

    # Base project directory in Google Drive
    PROJECT_DIR = Path("/content/drive/MyDrive/cooking-assistant-project")

    # Verify directories exist
    required_dirs = [
        PROJECT_DIR / "data" / "processed" / "oasst1_multiturn_en" / "train",
        PROJECT_DIR / "data" / "processed" / "oasst1_multiturn_en" / "val",
    ]

    print("Checking for processed data...")
    all_exist = True
    for path in required_dirs:
        exists = path.exists()
        all_exist = all_exist and exists
        print(f"  {'✓' if exists else '✗'} {path}")

    if not all_exist:
        print("\n⚠️ ERROR: Processed data not found!")
        print("Please run the OASST1 processing notebook first.")
        raise FileNotFoundError("Processed data not found. Run OASST1 notebook first.")

    # Change working directory
    os.chdir(PROJECT_DIR)
    print(f"\n✓ Working directory: {os.getcwd()}")

In [20]:
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.2f} GB")
else:
    print("  ⚠️ NO GPU DETECTED!")
    print("  You MUST enable GPU for training:")
    print("     Runtime → Change runtime type → Hardware accelerator → GPU → Save")
    raise RuntimeError("GPU required for training. Please enable it in Runtime settings.")


PyTorch version: 2.8.0+cu126
CUDA available: True
GPU: Tesla T4
GPU Memory: 15.83 GB


### Download the Model
This only needs to be run once: the tokenizer and weights are saved under `./models/base/gpt2-medium` for reuse.

In [6]:
from transformers import AutoTokenizer, AutoModelForCausalLM

# Selected pretrained model
model_name = "gpt2-medium"

print(f"Downloading {model_name}...")

# loads the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_name) # converts text to tokens
model = AutoModelForCausalLM.from_pretrained(model_name) # loads the pre-trained language model

# Save locally for reuse
tokenizer.save_pretrained(f"./models/base/{model_name}")
model.save_pretrained(f"./models/base/{model_name}")
print("Model downloaded successfully!")

Downloading gpt2-medium...


Error while fetching `HF_TOKEN` secret value from your vault: 'Requesting secret HF_TOKEN timed out. Secrets can only be fetched when running from the Colab UI.'.
You are not authenticated with the Hugging Face Hub in this notebook.
If the error persists, please let us know by opening an issue on GitHub (https://github.com/huggingface/huggingface_hub/issues/new).


Model downloaded successfully!


### Load the Dataset

In [7]:
print("Contents:", os.listdir("/content/datasets/datasets"))

Contents: ['OASST1', 'Cleaned']


In [None]:
# Get the configuration parameters from the YAML file
import yaml

config_path = Path('finetune_llm/config/training_config.yaml')

# Check if the file exists before trying to open it
if not config_path.exists():
    raise FileNotFoundError(f"The file {config_path} was not found. Please create it before running this notebook.")
with open(config_path, 'r') as file:
    # safe_load() only builds plain Python types, so the YAML cannot instantiate arbitrary objects
    config = yaml.safe_load(file)
    
    # The YAML file content is now a standard Python dictionary
    print(f"Loaded data type: {type(config)}")
    
    # Confirm the data paths were read correctly
    training_data_path = config['data']['train_data']
    validation_data_path = config['data']['val_data']
    
    print(f" Path to train data: {training_data_path}")
    print(f" Path to val data: {validation_data_path}")

Loaded data type: <class 'dict'>
 Path to train data: /content/datasets/datasets/OASST1/processed/oasst1_multiturn_en/train
 Path to val data: /content/datasets/datasets/OASST1/processed/oasst1_multiturn_en/val
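Later cells index straight into `config`, so a missing key only surfaces as a `KeyError` partway through the run. A small up-front check can catch this. The sketch below collects the keys this notebook reads; `REQUIRED_KEYS` and `missing_config_keys` are hypothetical helpers for illustration, not part of the project code, and the real YAML may contain more keys than these.

```python
# Keys read by later cells in this notebook, grouped by YAML section (hypothetical helper)
REQUIRED_KEYS = {
    "data": ["train_data", "val_data"],
    "model": ["base_model", "max_length"],
    "lora": ["use_lora", "lora_r", "lora_alpha", "lora_dropout", "target_modules"],
    "training": [
        "output_dir", "num_train_epochs", "per_device_train_batch_size",
        "per_device_eval_batch_size", "gradient_accumulation_steps", "learning_rate",
        "weight_decay", "warmup_steps", "optim", "lr_scheduler_type", "fp16",
        "logging_steps", "eval_steps", "save_steps", "save_total_limit",
        "use_wandb", "wandb_project",
    ],
}


def missing_config_keys(config, required=REQUIRED_KEYS):
    """Return dotted names (e.g. 'training.fp16') of required keys absent from config."""
    missing = []
    for section, keys in required.items():
        section_cfg = config.get(section) or {}  # a missing or empty section counts as {}
        missing.extend(f"{section}.{key}" for key in keys if key not in section_cfg)
    return missing
```

Calling `missing_config_keys(config)` right after `yaml.safe_load` and raising if the list is non-empty would stop the run before any model is loaded. `wandb_project` is only read when `use_wandb` is true, so listing it as required is a simplification.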


In [9]:
train_path = Path(config["data"]["train_data"])
val_path = Path(config["data"]["val_data"])

# Check that both datasets exist
for split_path in (train_path, val_path):
    if not split_path.exists():
        raise FileNotFoundError(
            f"Dataset not found at {split_path}.\n"
            f"Please run the OASST1 processing notebook first!"
        )

# Load datasets
train_dataset = load_from_disk(str(train_path))
val_dataset = load_from_disk(str(val_path))

print(f"✓ Train dataset: {len(train_dataset):,} examples")
print(f"✓ Validation dataset: {len(val_dataset):,} examples")

# Show a sample
print("\nSample training example:")
print("="*70)
print(train_dataset[0]['text'][:500] + "...")
print("="*70)

✓ Train dataset: 5,139 examples
✓ Validation dataset: 572 examples

Sample training example:
The following is a conversation between a user and a helpful cooking assistant.

Previous conversation:
User: Let's play a game of chess. I'll start:
1. d4

User: Let's play a game of chess. I'll start:
1. d4
Assistant: d5...
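Generation later rebuilds this layout by hand, so it helps to pin it down once. The helper below reproduces the structure visible in the sample: the header line, an optional `Previous conversation:` block, then the `User:`/`Assistant:` turn. It is a sketch inferred from this printout; the actual formatting lives in the OASST1 processing notebook and may differ in details such as spacing. `HEADER` and `format_conversation` are hypothetical names.

```python
HEADER = "The following is a conversation between a user and a helpful cooking assistant."


def format_conversation(user_msg, assistant_msg=None, history=()):
    """Build a training example or inference prompt in the layout shown in the sample.

    history: earlier turns as "User: ..." / "Assistant: ..." strings, oldest first.
    assistant_msg: the target reply for training; leave as None to get an inference prompt.
    """
    lines = [HEADER, ""]
    if history:
        lines += ["Previous conversation:", *history, ""]
    lines.append(f"User: {user_msg}")
    # Inference prompts end with a bare "Assistant:" so the model continues from there
    lines.append("Assistant:" if assistant_msg is None else f"Assistant: {assistant_msg}")
    return "\n".join(lines)
```

Building both training text and inference prompts from one helper keeps them in sync. For example, `generate_response` called with `context=""` produces extra blank lines before `User:`, which this helper does not.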


### Load the Model and Tokenizer

In [10]:
print(f"Loading {config['model']['base_model']}...")

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(config['model']['base_model'])

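# GPT-2 doesn't have a pad token, so we set it to eos_token
tokenizer.pad_token = tokenizer.eos_token
# Pad on the left for generation. Note: the Trainer does not pass position_ids during training,
# so GPT-2 numbers positions from 0 including the pad tokens, and left-padded text starts at a later position.
tokenizer.padding_side = "left"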

print(f"✓ Tokenizer loaded")
print(f"  Vocab size: {len(tokenizer):,}")
print(f"  Special tokens: {tokenizer.special_tokens_map}")

Loading ./models/base/gpt2-medium...
✓ Tokenizer loaded
  Vocab size: 50,257
  Special tokens: {'bos_token': '<|endoftext|>', 'eos_token': '<|endoftext|>', 'unk_token': '<|endoftext|>', 'pad_token': '<|endoftext|>'}


In [11]:
print(f"\nLoading model...")

model = AutoModelForCausalLM.from_pretrained(
    config['model']['base_model'],
    torch_dtype=torch.float16 if config['training']['fp16'] else torch.float32,
    device_map="auto",
)

# Get model size
model_size = sum(p.numel() for p in model.parameters())
print(f"✓ Model loaded")
print(f"  Total parameters: {model_size:,} ({model_size/1e6:.1f}M)")
print(f"  Trainable parameters: {sum(p.numel() for p in model.parameters() if p.requires_grad):,}")


Loading model...
✓ Model loaded
  Total parameters: 354,823,168 (354.8M)
  Trainable parameters: 354,823,168
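A quick back-of-the-envelope check on the 8 GB VRAM requirement: holding the weights alone costs parameters × bytes per parameter. This rough estimate ignores activations, gradients, optimizer state and CUDA overhead; with LoRA, gradients and AdamW state are only kept for the small adapter, so weights and activations are likely to dominate. `weight_memory_gb` is a hypothetical helper.

```python
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def weight_memory_gb(n_params, dtype="float16"):
    """Memory needed just to hold the weights, in GB (1e9 bytes, as in the GPU check above)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


n_params = 354_823_168  # total parameters reported by the cell above
print(f"fp16 weights: {weight_memory_gb(n_params, 'float16'):.2f} GB")  # 0.71 GB
print(f"fp32 weights: {weight_memory_gb(n_params, 'float32'):.2f} GB")  # 1.42 GB
```

If the weights end up in fp32 (for example after an upcast), the fp32 figure is the one that applies.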


### Configure LoRA
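LoRA keeps each pretrained weight matrix W frozen and learns a low-rank update B·A, scaled by `lora_alpha / lora_r` (two of the YAML keys read in the next cell). The NumPy sketch below illustrates that maths on small made-up shapes; it is not PEFT's implementation, which also applies `lora_dropout` to the input, and GPT-2's `Conv1D` layers store their weights transposed as (in, out). It checks two properties: with B initialised to zero the adapted layer starts out identical to the pretrained one, and a trained adapter can be merged into W without changing outputs.

```python
import numpy as np


def lora_forward(x, W, A, B, alpha, r):
    """y = W x + (alpha / r) * B (A x); W is frozen, only A and B are trained."""
    return W @ x + (alpha / r) * (B @ (A @ x))


def merge_lora(W, A, B, alpha, r):
    """Fold the low-rank update into one dense weight: W' = W + (alpha / r) * B A."""
    return W + (alpha / r) * (B @ A)


rng = np.random.default_rng(0)
d_in, d_out, r, alpha = 64, 64, 4, 8        # illustrative sizes, not the notebook's config
W = rng.standard_normal((d_out, d_in))      # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01   # down-projection, random init
B = np.zeros((d_out, r))                    # up-projection starts at zero
x = rng.standard_normal(d_in)

# With B = 0 the adapted layer behaves exactly like the pretrained one
assert np.allclose(lora_forward(x, W, A, B, alpha, r), W @ x)

# After (pretend) training, the adapter can be merged without changing outputs
B = rng.standard_normal((d_out, r)) * 0.01
assert np.allclose(lora_forward(x, W, A, B, alpha, r), merge_lora(W, A, B, alpha, r) @ x)
print(f"trainable: {A.size + B.size} vs frozen: {W.size}")
```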

In [12]:
if config['lora']['use_lora']:
    print("Configuring LoRA...")
    
    # LoRA configuration
    peft_config = LoraConfig(
        task_type=TaskType.CAUSAL_LM,
        r=config['lora']['lora_r'],
        lora_alpha=config['lora']['lora_alpha'],
        lora_dropout=config['lora']['lora_dropout'],
        target_modules=config['lora']['target_modules'],
        bias="none",
    )
    
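    # The model is not quantized here, so this call mainly freezes the base weights and upcasts
    # fp16 parameters to fp32, which mixed-precision training with fp16=True expects for trainable weights
    model = prepare_model_for_kbit_training(model)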
    
    # Add LoRA adapters
    model = get_peft_model(model, peft_config)
    
    # Print trainable parameters
    model.print_trainable_parameters()
    
    trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total_params = sum(p.numel() for p in model.parameters())
    
    print(f"\n✓ LoRA configured")
    print(f"  Trainable parameters: {trainable_params:,} ({trainable_params/1e6:.2f}M)")
    print(f"  Total parameters: {total_params:,} ({total_params/1e6:.1f}M)")
    print(f"  Trainable %: {100 * trainable_params / total_params:.2f}%")
else:
    print("Training full model (no LoRA)")

Configuring LoRA...




trainable params: 4,325,376 || all params: 359,148,544 || trainable%: 1.2043

✓ LoRA configured
  Trainable parameters: 4,325,376 (4.33M)
  Total parameters: 359,148,544 (359.1M)
  Trainable %: 1.20%
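Where the trainable count comes from: for each wrapped projection with input size d_in and output size d_out, LoRA adds A (r × d_in) and B (d_out × r), i.e. r·(d_in + d_out) parameters, and with `bias="none"` nothing else is added, so the reported total is simply the base count plus the adapter count. The sketch uses GPT-2 Medium's shapes with a hypothetical r = 8 on `c_attn` only; the run's actual `lora_r` and `target_modules` come from training_config.yaml, which is not shown here, and determine the 4,325,376 reported above.

```python
def lora_params(d_in, d_out, r):
    """Parameters LoRA adds to one d_in -> d_out projection: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)


# GPT-2 Medium: 24 transformer blocks, hidden size 1024; the fused QKV projection c_attn maps 1024 -> 3072
N_LAYERS, HIDDEN = 24, 1024
r = 8  # hypothetical rank, for illustration only
c_attn_only = N_LAYERS * lora_params(HIDDEN, 3 * HIDDEN, r)
print(f"LoRA on c_attn only with r={r}: {c_attn_only:,} trainable parameters")  # 786,432

# Sanity check on the figures printed above: base parameters + adapter parameters = total
base_params, adapter_params = 354_823_168, 4_325_376
assert base_params + adapter_params == 359_148_544
```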


### Tokenise the Dataset

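Convert each example's `text` field into token IDs, truncate and pad every sequence to `max_length` (from the config), and build `labels` with the padding positions set to -100 so they do not contribute to the loss.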

In [13]:
print("Tokenizing datasets...")

def tokenize_function(examples):
    """
    Tokenize the text and prepare for causal language modeling.
    """
    # Tokenize
    tokenized = tokenizer(
        examples['text'],
        truncation=True,
        max_length=config['model']['max_length'],
        padding="max_length",
        return_tensors=None,
    )
    
    # For causal LM, labels are the input_ids themselves (the model shifts them by one internally).
    # Padding positions (attention_mask == 0) are set to -100 so the loss ignores them.
    tokenized["labels"] = [
        [tok if m == 1 else -100 for tok, m in zip(ids, mask)]
        for ids, mask in zip(tokenized["input_ids"], tokenized["attention_mask"])
    ]
    
    return tokenized

# Tokenize datasets
print("  Tokenizing train dataset...")
tokenized_train = train_dataset.map(
    tokenize_function,
    batched=True,
    remove_columns=train_dataset.column_names,
    desc="Tokenizing train"
)

print("  Tokenizing validation dataset...")
tokenized_val = val_dataset.map(
    tokenize_function,
    batched=True,
    remove_columns=val_dataset.column_names,
    desc="Tokenizing val"
)

print(f"✓ Tokenization complete")
print(f"  Train: {len(tokenized_train):,} examples")
print(f"  Val: {len(tokenized_val):,} examples")

# Show tokenized example (remember padding is on the left with <|endoftext|>)
print("\nSample tokenized example:")
sample = tokenized_train[0]
print(f"  Input IDs shape: {len(sample['input_ids'])}")
print(f"  Attention mask shape: {len(sample['attention_mask'])}")
print(f"  Sample decoded: {tokenizer.decode(sample['input_ids'][:100])}...")

Tokenizing datasets...
  Tokenizing train dataset...
  Tokenizing validation dataset...
✓ Tokenization complete
  Train: 5,139 examples
  Val: 572 examples

Sample tokenized example:
  Input IDs shape: 512
  Attention mask shape: 512
  Sample decoded: <|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|
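
The label rule in `tokenize_function` can be checked on a toy left-padded sequence. The plain-Python sketch below (arbitrary token IDs, hypothetical helper names) copies `input_ids` into labels, masks padded positions with -100, and then lists which positions are actually scored once labels are shifted by one, which is what GPT-2's language-modelling head does internally before computing cross-entropy.

```python
IGNORE_INDEX = -100  # targets with this value are skipped by PyTorch's cross-entropy loss


def make_labels(input_ids, attention_mask):
    """Copy input_ids into labels, replacing padded positions (attention_mask == 0) with IGNORE_INDEX."""
    return [tok if m == 1 else IGNORE_INDEX for tok, m in zip(input_ids, attention_mask)]


def scored_pairs(labels):
    """(position, target) pairs that contribute to the loss after the one-token shift."""
    return [(t, labels[t + 1]) for t in range(len(labels) - 1) if labels[t + 1] != IGNORE_INDEX]


# Toy left-padded example with arbitrary ids; 50256 is GPT-2's <|endoftext|>, used here as pad
input_ids = [50256, 50256, 50256, 101, 102, 103]
attention_mask = [0, 0, 0, 1, 1, 1]
labels = make_labels(input_ids, attention_mask)
print(labels)                # [-100, -100, -100, 101, 102, 103]
print(scored_pairs(labels))  # [(2, 101), (3, 102), (4, 103)]
```

The first real token (101) is scored from the last pad position, so with left padding the model is asked to predict the start of each example from a position that carries no real context.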

### Set up the Training Args

In [14]:
output_dir = Path(config['training']['output_dir'])
output_dir.mkdir(parents=True, exist_ok=True)

print(f"Setting up training arguments...")

training_args = TrainingArguments(
    # Output
    output_dir=str(output_dir),
    
    # Training
    num_train_epochs=config['training']['num_train_epochs'],
    per_device_train_batch_size=config['training']['per_device_train_batch_size'],
    per_device_eval_batch_size=config['training']['per_device_eval_batch_size'],
    gradient_accumulation_steps=config['training']['gradient_accumulation_steps'],
    
    # Optimization
    learning_rate=config['training']['learning_rate'],
    weight_decay=config['training']['weight_decay'],
    warmup_steps=config['training']['warmup_steps'],
    optim=config['training']['optim'],
    lr_scheduler_type=config['training']['lr_scheduler_type'],

    # Precision
    fp16=config['training']['fp16'],
    
    # Logging
    logging_dir=str(output_dir / "logs"),
    logging_steps=config['training']['logging_steps'],

    # Evaluation
    evaluation_strategy="steps",
    eval_steps=config['training']['eval_steps'],

    # Saving
    save_strategy="steps",
    save_steps=config['training']['save_steps'],
    save_total_limit=config['training']['save_total_limit'],
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    
    # Other
    report_to="wandb" if config['training']['use_wandb'] else "none",
    seed=42,
    dataloader_num_workers=4,
    remove_unused_columns=False,
)

print(f"✓ Training arguments configured")
print(f"  Effective batch size: {config['training']['per_device_train_batch_size'] * config['training']['gradient_accumulation_steps']}")
print(f"  Total training steps: ~{len(tokenized_train) // (config['training']['per_device_train_batch_size'] * config['training']['gradient_accumulation_steps']) * config['training']['num_train_epochs']}")

Setting up training arguments...
✓ Training arguments configured
  Effective batch size: 16
  Total training steps: ~963
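The ~963 above is the notebook's floor-division estimate. As far as we understand the Trainer in the pinned Transformers version, on a single GPU it takes the dataloader length (the last partial batch is kept), integer-divides it by `gradient_accumulation_steps`, and multiplies by the number of epochs. The sketch below mirrors that rule; `estimate_optimizer_steps` is a hypothetical helper, and splitting the effective batch size of 16 into 4 × 4 is an assumption, since the YAML values are not shown.

```python
import math


def estimate_optimizer_steps(n_examples, per_device_batch_size, grad_accum_steps, epochs):
    """Approximate optimizer-step count for a single-device Trainer run."""
    batches_per_epoch = math.ceil(n_examples / per_device_batch_size)  # last partial batch is kept
    updates_per_epoch = max(batches_per_epoch // grad_accum_steps, 1)
    return math.ceil(epochs * updates_per_epoch)


# Hypothetical split of the effective batch size 16 into 4 x 4
print(estimate_optimizer_steps(5_139, per_device_batch_size=4, grad_accum_steps=4, epochs=3))  # 963
```

For this dataset both routes give 963; they can differ by a step per epoch when the leftover batches do not divide evenly.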


### Initialise the Trainer
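The next cell passes the tokenized data through `DataCollatorForLanguageModeling(mlm=False)`. With `mlm=False` that collator builds its own labels: it copies `input_ids` and replaces every `pad_token_id` with -100. Because this notebook sets `pad_token = eos_token`, any genuine `<|endoftext|>` token is masked too. The plain-Python sketch below mimics that rule on a toy batch (hypothetical helper, arbitrary token IDs); it is not the library's code.

```python
PAD_ID = 50256  # GPT-2's <|endoftext|>, which this notebook also uses as pad_token


def collator_labels(batch_input_ids, pad_token_id=PAD_ID):
    """Mimic DataCollatorForLanguageModeling(mlm=False): copy input_ids, set pad ids to -100."""
    return [[-100 if tok == pad_token_id else tok for tok in row] for row in batch_input_ids]


# Arbitrary ids; the final 50256 stands for a genuine end-of-text token, not padding
batch = [[PAD_ID, PAD_ID, 101, 102, PAD_ID]]
print(collator_labels(batch))  # [[-100, -100, 101, 102, -100]]
```

If the labels prepared in `tokenize_function` should be used as-is, `transformers.default_data_collator` simply stacks the already padded `input_ids`, `attention_mask` and `labels` into tensors.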

In [15]:
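# Data collator for language modeling.
# Note: with mlm=False it recomputes labels from input_ids (pad_token_id -> -100),
# overriding the labels built in tokenize_function.
data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer,
    mlm=False,  # We're doing causal LM, not masked LM
)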

# Initialize trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_train,
    eval_dataset=tokenized_val,
    data_collator=data_collator,
)

print("✓ Trainer initialized")

✓ Trainer initialized




### Train the Model

In [16]:
if config['training']['use_wandb']:
    wandb.init(
        project=config['training']['wandb_project'],
        config=config,
        name=f"{config['model']['base_model']}-{config['training']['num_train_epochs']}epochs"
    )

print("\n" + "="*70)
print("STARTING TRAINING")
print("="*70)
print(f"Model: {config['model']['base_model']}")
print(f"Epochs: {config['training']['num_train_epochs']}")
print(f"Training examples: {len(tokenized_train):,}")
print(f"Validation examples: {len(tokenized_val):,}")
print("="*70 + "\n")

# Train and pray
train_result = trainer.train()

print("\n" + "="*70)
print("TRAINING COMPLETE!")
print("="*70)
print(f"Training time: {train_result.metrics['train_runtime']:.2f} seconds")
print(f"Training samples/second: {train_result.metrics['train_samples_per_second']:.2f}")
# Note: 'train_loss' is the mean training loss over the whole run, not the loss at the last step
print(f"Final train loss: {train_result.metrics['train_loss']:.4f}")
print("="*70)

[34m[1mwandb[0m: Logging into wandb.ai. (Learn how to deploy a W&B server locally: https://wandb.me/wandb-server)
[34m[1mwandb[0m: You can find your API key in your browser here: https://wandb.ai/authorize?ref=models
[34m[1mwandb[0m: No netrc file found, creating one.
[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /root/.netrc
[34m[1mwandb[0m: Currently logged in as: [33mriyaadh-gani2[0m ([33mriyaadh-gani2-ucl[0m) to [32mhttps://api.wandb.ai[0m. Use [1m`wandb login --relogin`[0m to force relogin



STARTING TRAINING
Model: ./models/base/gpt2-medium
Epochs: 3
Training examples: 5,139
Validation examples: 572





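| Step | Training Loss | Validation Loss |
|-----:|--------------:|----------------:|
| 100 | 3.5362 | 2.318838 |
| 200 | 1.9489 | 1.834234 |
| 300 | 1.8926 | 1.784452 |
| 400 | 1.8304 | 1.762046 |
| 500 | 1.7871 | 1.750593 |
| 600 | 1.7882 | 1.739409 |
| 700 | 1.776 | 1.736444 |
| 800 | 1.8011 | 1.732749 |
| 900 | 1.7488 | 1.732258 |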





TRAINING COMPLETE!
Training time: 2893.83 seconds
Training samples/second: 5.33
Final train loss: 2.1963


### Evaluate the Model
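Perplexity is the exponential of the mean per-token negative log-likelihood over the targets that are not ignored (-100). The sketch below shows the calculation on toy numbers with a hypothetical `masked_perplexity` helper. As far as we can tell, Trainer's `eval_loss` is an average of per-batch mean losses, so `exp(eval_loss)` approximates, rather than exactly equals, the token-level perplexity computed this way.

```python
import math


def masked_perplexity(token_losses, labels, ignore_index=-100):
    """exp of the mean negative log-likelihood over targets that are not ignore_index."""
    kept = [loss for loss, y in zip(token_losses, labels) if y != ignore_index]
    return math.exp(sum(kept) / len(kept))


# Toy per-token losses: the first target is padding (-100), so its large loss is ignored
token_losses = [9.9, 1.0, 2.0]
labels = [-100, 5, 7]
print(round(masked_perplexity(token_losses, labels), 3))  # exp(1.5) = 4.482
```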

In [17]:
print("\nEvaluating model on validation set...")

eval_results = trainer.evaluate()

print("\n" + "="*70)
print("EVALUATION RESULTS")
print("="*70)
for key, value in eval_results.items():
    print(f"  {key}: {value:.4f}")
print("="*70)

# Perplexity = exp(mean per-token cross-entropy); padded targets (-100) are excluded from the loss
perplexity = np.exp(eval_results['eval_loss'])
print(f"\n Perplexity: {perplexity:.2f}")
print("   (Lower is better; only comparable across runs that use the same data and tokenizer)")


Evaluating model on validation set...





EVALUATION RESULTS
  eval_loss: 1.7323
  eval_runtime: 38.8117
  eval_samples_per_second: 14.7380
  eval_steps_per_second: 3.6840
  epoch: 2.9977

 Perplexity: 5.65
   (Lower is better; only comparable across runs that use the same data and tokenizer)


### Save the Fine-tuned Model

In [18]:
print("\nSaving final model...")

final_model_path = output_dir / "final"
final_model_path.mkdir(exist_ok=True)

# Save model and tokenizer (with LoRA enabled, save_model writes only the adapter weights and adapter config)
trainer.save_model(str(final_model_path))
tokenizer.save_pretrained(str(final_model_path))

print(f"✓ Model saved to: {final_model_path}")

# Save the training config as JSON next to the model for reproducibility
import json
config_path = final_model_path / "training_config.json"
with open(config_path, 'w') as f:
    json.dump(config, f, indent=2)
print(f"✓ Config saved to: {config_path}")


Saving final model...
✓ Model saved to: models/gpt2-conversational-v1/final
✓ Config saved to: models/gpt2-conversational-v1/final/training_config.json




### Test the Model
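The cells below reuse the in-memory `model`. In a fresh session the saved run has to be reloaded: with LoRA enabled, `trainer.save_model` writes the adapter (its config and weights) rather than a full copy of GPT-2 Medium, so the base model is loaded first and the adapter applied on top. A minimal sketch, assuming the paths printed above and a hypothetical `load_finetuned_model` helper; the imports sit inside the function so that defining it does not require PEFT to be installed.

```python
def load_finetuned_model(base_model_path, adapter_path, merge=True):
    """Load the base model, apply the saved LoRA adapter, and optionally merge it into the weights."""
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(adapter_path)  # the tokenizer was saved next to the adapter
    base = AutoModelForCausalLM.from_pretrained(base_model_path)
    model = PeftModel.from_pretrained(base, adapter_path)
    if merge:
        # Fold the LoRA update into the base weights so inference runs without adapter layers
        model = model.merge_and_unload()
    model.eval()
    return model, tokenizer
```

Usage, with the paths from this run: `model, tokenizer = load_finetuned_model("./models/base/gpt2-medium", "models/gpt2-conversational-v1/final")`.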

In [19]:
print("\nTesting the fine-tuned model...")

# Move model to GPU if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
model.eval()

def generate_response(user_input, context="", max_new_tokens=256, temperature=0.7):
    """
    Generate a response using the fine-tuned model.
    
    Args:
        user_input: User's question/message
        context: Conversation history or retrieved information
        max_new_tokens: Maximum number of new tokens to generate
        temperature: Sampling temperature (higher = more varied output)
    """
    # Format prompt (matches training format)
    prompt = f"""The following is a conversation between a user and a helpful cooking assistant.

{context}

User: {user_input}
Assistant:"""
    
    # Tokenize
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    
    # Generate
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            temperature=temperature,
            top_p=0.9,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id,
            eos_token_id=tokenizer.eos_token_id,
        )
    
    # Decode only the newly generated tokens (everything after the prompt)
    prompt_length = inputs["input_ids"].shape[1]
    response = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True)
    
    # Cut off any follow-up turns the model invents after its own reply
    for marker in ("User:", "Assistant:"):
        response = response.split(marker)[0]
    
    return response.strip()


Testing the fine-tuned model...
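
`generate_response` samples with `temperature=0.7` and `top_p=0.9`. The NumPy sketch below shows what these two settings do to one next-token distribution: temperature divides the logits before the softmax (values below 1 sharpen it), and nucleus (top-p) filtering keeps the smallest set of most-likely tokens whose probabilities sum to at least `top_p`, then renormalises. It is a simplified re-implementation for illustration with a hypothetical `temperature_top_p` helper; `model.generate` applies its own logits processors (such as `TemperatureLogitsWarper` and `TopPLogitsWarper`), which differ in details.

```python
import numpy as np


def temperature_top_p(logits, temperature=0.7, top_p=0.9):
    """Next-token distribution after temperature scaling and nucleus (top-p) filtering."""
    scaled = np.asarray(logits, dtype=float) / temperature
    probs = np.exp(scaled - scaled.max())  # numerically stable softmax
    probs /= probs.sum()

    order = np.argsort(probs)[::-1]        # token ids, most likely first
    cumulative = np.cumsum(probs[order])
    # Smallest prefix whose total probability reaches top_p (the top token is always kept)
    cutoff = int(np.searchsorted(cumulative, top_p)) + 1
    keep = order[:cutoff]

    filtered = np.zeros_like(probs)
    filtered[keep] = probs[keep]
    return filtered / filtered.sum()


rng = np.random.default_rng(0)
dist = temperature_top_p([2.0, 1.0, 0.0, -1.0], temperature=1.0, top_p=0.9)
next_token = rng.choice(len(dist), p=dist)  # sample, as with do_sample=True
print(dist.round(3), next_token)
```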


In [20]:
# Test with sample queries

test_queries = [
    "How do I make scrambled eggs?",
    "What's the difference between baking and roasting?",
    "Can you explain what sautéing means?",
    "My cake collapsed in the middle. What went wrong?",
    "How do I know when pasta is done?",
    "I have eggs, tomatoes, and spinach. What can I cook for breakfast?"
]

print("\n" + "="*70)
print("TEST RESPONSES")
print("="*70)

for i, query in enumerate(test_queries, 1):
    print(f"\n🔸 Query {i}: {query}")
    print("-" * 70)
    response = generate_response(query)
    print(f"Response: {response}")
    print()


TEST RESPONSES

🔸 Query 1: How do I make scrambled eggs?
----------------------------------------------------------------------
Response: There are several ways to make scrambled eggs. One of the most common


🔸 Query 2: What's the difference between baking and roasting?
----------------------------------------------------------------------
Response: The differences between baking and roasting can be confusing, but it can be very important to understand the difference between the two techniques to make sure you are getting the best results.

Baking:

Baking involves heating a mixture of ingredients, such as flour, sugar, and salt, and then slowly adding water and baking the mixture at a high temperature. The resulting product is a cake or cookies that is moist and soft, but not dense or dense.

Roasting:

Roasting involves the addition of heat and moisture to a pan, creating a dense, soft, and chewy product that is ready to eat.

The two methods are closely related, but they hav

### Test Multi-Turn Conversations

In [21]:
print("\n" + "="*70)
print("MULTI-TURN CONVERSATION TEST")
print("="*70)

# Simulate a multi-turn conversation
conversation_history = []

turn_1_input = "How do I make risotto?"
print(f"\n👤 User: {turn_1_input}")

# First turn (no history)
response_1 = generate_response(turn_1_input, context="")
print(f"🤖 Assistant: {response_1}")

# Add to history
conversation_history.append(f"User: {turn_1_input}")
conversation_history.append(f"Assistant: {response_1}")

# Second turn (with history)
turn_2_input = "What type of rice should I use?"
context = "Previous conversation:\n" + "\n".join(conversation_history)

print(f"\n👤 User: {turn_2_input}")
response_2 = generate_response(turn_2_input, context=context)
print(f"🤖 Assistant: {response_2}")

conversation_history.append(f"User: {turn_2_input}")
conversation_history.append(f"Assistant: {response_2}")

# Third turn
turn_3_input = "How long does it take?"
context = "Previous conversation:\n" + "\n".join(conversation_history)

print(f"\n👤 User: {turn_3_input}")
response_3 = generate_response(turn_3_input, context=context)
print(f"🤖 Assistant: {response_3}")

print("\n" + "="*70)


MULTI-TURN CONVERSATION TEST

👤 User: How do I make risotto?
🤖 Assistant: To make risotto, you can use a heavy-bottomed pot, a heavy-bottomed saucepan, and a heavy-bottomed griddle. To make risotto, you can use a heavy-bottomed pot, a heavy-bottomed saucepan, and a heavy-bottomed griddle.

Step 1: Preheat the oven to 375°F.
Step 2: In a medium saucepan, combine the flour, salt, and pepper. Stir in the olive oil and cook, stirring frequently, until the mixture is smooth and thickened, about 5 minutes.
Step 3: Add the mushrooms, stir to combine, and cook, stirring occasionally, until the mushrooms are tender, about 5 minutes.
Step 4: Add the remaining ingredients, stirring to combine, and cook, stirring occasionally, until the mixture is thickened, about 5 minutes.
Step 5: Add the rice, stir to combine, and cook, stirring occasionally, until the rice is cooked through, about 5 minutes.
Step 6: Add the spinach, stir to combine, and cook, stirring occasionally, until the spinach is
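
Each turn in the multi-turn test appends to `conversation_history`, while training examples were truncated to `max_length` tokens (512 here). Because `tokenizer(prompt)` in `generate_response` does not truncate, a long history eventually exceeds what the model saw in training, and eventually GPT-2's 1,024-position limit. One option, sketched below with a hypothetical `build_context` helper, is to keep only the most recent turns that fit a token budget.

```python
def build_context(history, count_tokens, max_tokens):
    """Keep the most recent history lines that fit within max_tokens.

    history: list of "User: ..." / "Assistant: ..." strings, oldest first
    count_tokens: callable returning the token count of one string
    """
    kept, used = [], 0
    for line in reversed(history):  # walk from the newest turn backwards
        cost = count_tokens(line)
        if used + cost > max_tokens:
            break
        kept.append(line)
        used += cost
    if not kept:
        return ""
    return "Previous conversation:\n" + "\n".join(reversed(kept))
```

In the notebook, `count_tokens` could be `lambda s: len(tokenizer(s)["input_ids"])`, and the budget should leave room for the header, the new question and `max_new_tokens`. Newline separators are not counted, so the budget is approximate.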

## Summary

In [22]:
print("\n" + "="*70)
print("TRAINING SUMMARY")
print("="*70)
print(f"\n✅ Successfully fine-tuned {config['model']['base_model']}")
print(f"✅ Final validation perplexity: {perplexity:.2f}")
print(f"✅ Model saved to: {final_model_path}")

print(f"\n📁 Files:")
print(f"  Model weights: {final_model_path}")
print(f"  Tokenizer: {final_model_path}")
print(f"  Training config: {config_path}")
print(f"  Checkpoints: {output_dir}")


TRAINING SUMMARY

✅ Successfully fine-tuned ./models/base/gpt2-medium
✅ Final validation perplexity: 5.65
✅ Model saved to: models/gpt2-conversational-v1/final

📁 Files:
  Model weights: models/gpt2-conversational-v1/final
  Tokenizer: models/gpt2-conversational-v1/final
  Training config: models/gpt2-conversational-v1/final/training_config.json
  Checkpoints: models/gpt2-conversational-v1
