# Complete VN Training Pipeline - Unified Character 🤖❤️

**Purpose:** Self-contained notebook for fine-tuning LLaMA 3.1 on visual novel (VN) dialogue from the dating simulator Doki Doki Literature Club, using a **unified character model**

**What this notebook does:**
1. 📚 Loads merged VN JSONL data (all 4 characters combined: Monika, Sayori, Natsuki, Yuri → 1 unified character)
2. 📝 Uses unified persona with affection tracking and emotion guidance
3. 🎯 Fine-tunes LLaMA 3.1 with LoRA on the merged dataset (~439 examples vs 90-128 per character; the no-emotion run recorded below loads 200)
4. ✅ Tests generation with optimized parameters

**Why Merged Characters:**
- **Better data volume**: ~439 examples total, inside the 250-500 range treated here as minimum viable, vs 90-128 per character
- **Quality over variety**: Single well-trained character > 4 poorly-trained characters
- **Improved coherence**: More robust training with larger dataset
- **Trade-off**: Lost distinct personalities (Monika/Sayori/Natsuki/Yuri) for better response quality

**Key Features:**
- Unified general persona combining elements from all 4 characters
- Affection tracking (0-100 scale)
- Emotion-based guidance for appropriate responses
- Multi-turn conversation support
- Optimized generation (max_new_tokens=50, sized to cover the training average of 14.6 words per response)
- Optional character_name parameter for cosmetic purposes

**Data Source:** Run `notebooks/VN/01c_merge_characters__VN.ipynb` first to create merged dataset

---

## 1. Setup and Configuration

In [1]:
!pip3 install torch
!pip3 install pandas
!pip3 install numpy
!pip3 install tqdm
!pip3 install matplotlib
!pip3 install seaborn
!pip3 install transformers
!pip3 install datasets
!pip3 install accelerate
!pip3 install bitsandbytes
!pip3 install tensorboard
!pip3 install pyyaml
!pip3 install peft
!pip3 install --upgrade ipywidgets traitlets ipykernel tqdm

In [2]:
# Check environment
import sys
from pathlib import Path

# Add parent to path
if Path.cwd().name == 'VN':
    sys.path.insert(0, str(Path.cwd().parent.parent))
    print("✓ Running from VN directory")
else:
    print(f"⚠️  Current directory: {Path.cwd()}")
    print("Please run from notebooks/VN/ directory")

⚠️  Current directory: /common/home/projectgrps/CS425/CS425G3/CS425-Dating-Simulator/notebooks/VN_no_emotion
Please run from notebooks/VN/ directory


In [3]:
# Core imports
import torch
import json
import pandas as pd
import numpy as np
import random
import re
from datetime import datetime
from tqdm.notebook import tqdm

# Transformers and PEFT
from transformers import (
    AutoTokenizer,
    AutoModelForCausalLM,
    TrainingArguments,
    Trainer,
    DataCollatorForLanguageModeling,
    EarlyStoppingCallback
)
from peft import (
    LoraConfig,
    get_peft_model,
    prepare_model_for_kbit_training,
    TaskType
)
from datasets import Dataset, DatasetDict

# Visualization
import matplotlib.pyplot as plt
%matplotlib inline

print("✓ All imports successful")

✓ All imports successful


In [4]:
# GPU Configuration
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Device: {device}")

if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.2f} GB")
    print(f"CUDA Version: {torch.version.cuda}")
    
    # Clear cache
    torch.cuda.empty_cache()
else:
    print("⚠️  No GPU detected - training will be VERY slow")

Device: cuda
GPU: NVIDIA A40
Memory: 47.71 GB
CUDA Version: 12.8


## 2. Training Configuration

**⚠️ CUSTOMIZE THESE SETTINGS:**

In [5]:
# ==================== CONFIGURATION (NO EMOTION VERSION) ====================

# Model settings
MODEL_NAME = "meta-llama/Llama-3.1-8B-Instruct"

# Data paths - Use MERGED character data (all 4 combined into one) - NO EMOTION VERSION
VN_DATA_DIR = "../../data/processed/VN_no_emotion"
VN_CHARACTERS = ['Merged']  # Single unified character
OUTPUT_DIR = "../../checkpoints/dating_sim_vn_merged_no_emotion"

# Training hyperparameters
CONFIG = {
    # Data
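    # NOTE: the formatted conversations are estimated below at ~419 tokens on average,
    # and the system prompt plus a one-word user turn already measures 166 tokens in the
    # EOS diagnostic, so 128 truncates most of each example. Raise this if memory allows.
    'max_length': 128,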
    'train_split': 0.9,
    
    # Training
    'num_epochs': 12,
    'batch_size': 2,
    'gradient_accumulation_steps': 4,
    'learning_rate': 2e-4,
    'warmup_steps': 100,
    'weight_decay': 0.01,
    
    # LoRA parameters
    'lora_r': 8,
    'lora_alpha': 16,
    'lora_dropout': 0.05,
    'lora_target_modules': ['q_proj', 'v_proj', 'k_proj', 'o_proj'],
    
    # Memory optimization
    'gradient_checkpointing': True,
    'fp16': True,
    'bf16': False,
    
    # Logging
    'logging_steps': 10,
    'eval_steps': 30,
    'save_steps': 30,
    'save_total_limit': 3,
}

print("Configuration (NO EMOTION VERSION):")
print(f"  Model: {MODEL_NAME}")
print(f"  VN Data Dir: {VN_DATA_DIR}")
print(f"  Character Mode: {', '.join(VN_CHARACTERS)} (merged from Monika, Sayori, Natsuki, Yuri)")
print(f"  Output: {OUTPUT_DIR}")
print(f"  Epochs: {CONFIG['num_epochs']}")
print(f"  Effective batch size: {CONFIG['batch_size'] * CONFIG['gradient_accumulation_steps']}")

Configuration (NO EMOTION VERSION):
  Model: meta-llama/Llama-3.1-8B-Instruct
  VN Data Dir: ../../data/processed/VN_no_emotion
  Character Mode: Merged (merged from Monika, Sayori, Natsuki, Yuri)
  Output: ../../checkpoints/dating_sim_vn_merged_no_emotion
  Epochs: 12
  Effective batch size: 8


---
## 3. Load Raw Cleaned Data

In [6]:
# Load VN merged data (single unified character) - NO EMOTION VERSION
print("Loading merged VN data (NO EMOTION)...")

file_path = f"{VN_DATA_DIR}/vn_training_data_merged_no_emotion.jsonl"
print(f"  Loading from: {file_path}")

with open(file_path, 'r', encoding='utf-8') as f:
    all_data = [json.loads(line) for line in f]

print(f"✓ Total loaded: {len(all_data)} training examples (NO EMOTION)")

# Convert to DataFrame for easier manipulation
df = pd.DataFrame(all_data)

print(f"\nColumns: {list(df.columns)}")
print(f"Dataset shape: {df.shape}")

# Display sample
print("\nSample data (first example's messages):")
if len(df) > 0:
    sample_messages = df.iloc[0]['messages']
    for msg in sample_messages[:2]:  # Show first 2 messages
        print(f"  {msg['role']}: {msg['content'][:100]}...")
    print(f"  ... ({len(sample_messages)} total messages in this example)")

Loading merged VN data (NO EMOTION)...
  Loading from: ../../data/processed/VN_no_emotion/vn_training_data_merged_no_emotion.jsonl
✓ Total loaded: 200 training examples (NO EMOTION)

Columns: ['messages']
Dataset shape: (200, 1)

Sample data (first example's messages):
  system: You are a member of the Literature Club - friendly, thoughtful, and passionate about literature and ...
  user: Don't make promises you can't keep! Fine... I'll stop by for a cupcake, okay? I told you, don't call...
  ... (6 total messages in this example)


In [7]:
# Data statistics
print("="*80)
print("Merged Character Dataset Statistics")
print("="*80)

# Extract affection from system prompt
def extract_affection(messages):
    """Extract the affection level (0-100) from the system prompt, or None if absent."""
    system_msg = messages[0]['content'] if messages and messages[0]['role'] == 'system' else ""
    # `re` is already imported with the core imports above
    match = re.search(r'Current affection: (\d+)/100', system_msg)
    return int(match.group(1)) if match else None

df['affection'] = df['messages'].apply(extract_affection)
affection_stats = df['affection'].describe()

print(f"\nAffection Distribution:")
print(f"  Mean:         {affection_stats['mean']:.1f}/100")
print(f"  Median (50%): {affection_stats['50%']:.1f}/100")
print(f"  Min:          {affection_stats['min']:.0f}/100")
print(f"  Max:          {affection_stats['max']:.0f}/100")

print("\n" + "="*80)
print("Multi-turn Conversation Statistics")
print("="*80)

# Count messages per conversation (the system prompt is included, so dialogue turns are fewer)
df['num_turns'] = df['messages'].apply(len)
turn_stats = df['num_turns'].describe()

print(f"  Mean turns:   {turn_stats['mean']:.1f}")
print(f"  Median turns: {turn_stats['50%']:.1f}")
print(f"  Min turns:    {turn_stats['min']:.0f}")
print(f"  Max turns:    {turn_stats['max']:.0f}")

print("\n" + "="*80)
print(f"✓ Loaded unified character with {len(df)} training examples")
print("  (Merged from: Monika, Sayori, Natsuki, Yuri)")
print("="*80)

Merged Character Dataset Statistics

Affection Distribution:
  Mean:         50.6/100
  Median (50%): 55.0/100
  Min:          20/100
  Max:          79/100

Multi-turn Conversation Statistics
  Mean turns:   8.3
  Median turns: 8.0
  Min turns:    3
  Max turns:    16

✓ Loaded unified character with 200 training examples
  (Merged from: Monika, Sayori, Natsuki, Yuri)
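
The affection statistics above depend on parsing one line of the system prompt. As a minimal sketch of that step in isolation: the pattern is the one used by `extract_affection`, while `parse_affection` and both sample prompts are hypothetical and not taken from the dataset.

```python
import re

# Pattern taken from the notebook's extract_affection helper.
AFFECTION_PATTERN = re.compile(r"Current affection: (\d+)/100")

def parse_affection(system_prompt):
    """Return the affection level as an int, or None if the line is absent."""
    match = AFFECTION_PATTERN.search(system_prompt)
    return int(match.group(1)) if match else None

# Hypothetical prompts for illustration only.
sample_prompt = "You are a member of the Literature Club.\n\nCurrent affection: 25/100"
print(parse_affection(sample_prompt))
print(parse_affection("No affection line here"))
```

Rows where the pattern is missing come back as `None`, which pandas stores as `NaN` and `describe()` skips, so a quick `df['affection'].isna().sum()` would show how many examples the statistics silently exclude.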


---
## 4. Load Tokenizer for Formatting

In [8]:
# Load LLaMA 3.1 tokenizer
print("Loading LLaMA 3.1 tokenizer for data formatting...")
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

print(f"✓ Tokenizer loaded: {tokenizer.__class__.__name__}")
print(f"  Special tokens: {tokenizer.special_tokens_map}")
print(f"  EOS token: {tokenizer.eos_token} (ID: {tokenizer.eos_token_id})")

Loading LLaMA 3.1 tokenizer for data formatting...
✓ Tokenizer loaded: PreTrainedTokenizerFast
  Special tokens: {'bos_token': '<|begin_of_text|>', 'eos_token': '<|eot_id|>'}
  EOS token: <|eot_id|> (ID: 128009)


---
## 5. Format Data with LLaMA 3.1 Instruction Template

**Note:** VN data is already pre-formatted with character personas, affection tracking, and emotion guidance in the system prompts. We just need to apply the chat template.

---
## 6. Apply Chat Template to Pre-formatted Messages

VN data already contains complete conversations with system/user/assistant messages.

In [9]:
def format_vn_conversation(messages):
    """
    Apply LLaMA 3.1 chat template to pre-formatted VN messages.
    
    VN data already has:
    - System prompt with character description
    - Affection tracking (e.g., "Current affection: 25/100")
    - Emotion guidance (e.g., "The user is happy! Match their enthusiasm")
    - Multi-turn user/assistant dialogue
    
    We just apply the chat template to format for LLaMA 3.1.
    """
    # Apply LLaMA 3.1 chat template
    formatted = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=False  # Don't add generation prompt for training
    )
    
    return formatted

print("✓ Formatting function defined")
print("\nThis function simply applies the LLaMA 3.1 chat template to")
print("pre-formatted VN conversations (no persona building or scenario generation needed)")

✓ Formatting function defined

This function simply applies the LLaMA 3.1 chat template to
pre-formatted VN conversations (no persona building or scenario generation needed)


In [10]:
# Test formatting with a sample
print("="*80)
print("Sample Formatted Conversation (LLaMA 3.1 Format)")
print("="*80)

sample_messages = df.iloc[0]['messages']
sample_formatted = format_vn_conversation(sample_messages)

# Show first 600 chars of formatted output
print(sample_formatted[:600] + "..." if len(sample_formatted) > 600 else sample_formatted)
print("\n" + "="*80)
print(f"Full length: {len(sample_formatted)} characters")

Sample Formatted Conversation (LLaMA 3.1 Format)
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Cutting Knowledge Date: December 2023
Today Date: 26 Jul 2024

You are a member of the Literature Club - friendly, thoughtful, and passionate about literature and writing.

You have a nuanced personality that adapts to the situation and your mood. You can be:
- Confident and philosophical when discussing ideas
- Warm and caring when someone needs support
- Direct and passionate about your interests
- Shy and introspective in new situations

You value meaningful connections, enjoy deep conversations, and appreciate both classic litera...

Full length: 1312 characters
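
To make the template output above easier to read, here is a simplified, hand-written approximation of the LLaMA 3 chat layout. It is a sketch, not the real template: `render_llama3_chat` is a hypothetical helper, and it leaves out the "Cutting Knowledge Date" preamble that the actual template injects into the system turn.

```python
def render_llama3_chat(messages, add_generation_prompt=False):
    """Approximate the LLaMA 3 chat layout for a list of {role, content} dicts."""
    parts = ["<|begin_of_text|>"]
    for msg in messages:
        parts.append(
            f"<|start_header_id|>{msg['role']}<|end_header_id|>\n\n"
            f"{msg['content'].strip()}<|eot_id|>"
        )
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)

# Hypothetical two-message conversation.
demo = [
    {"role": "system", "content": "You are a member of the Literature Club."},
    {"role": "user", "content": "What are you reading?"},
]
training_text = render_llama3_chat(demo)
inference_prompt = render_llama3_chat(demo, add_generation_prompt=True)
print(inference_prompt)
```

The difference between the two calls mirrors the notebook's usage: training text ends on a closed turn (`<|eot_id|>`), while the inference prompt ends with an open assistant header.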


In [11]:
# Apply formatting to all conversations
print("Applying LLaMA 3.1 chat template to all VN conversations...")
df['text'] = df['messages'].apply(format_vn_conversation)
print(f"✓ Formatted {len(df)} conversations")

# Statistics
df['text_length'] = df['text'].apply(len)
print(f"\nFormatted text length statistics:")
print(f"  Mean: {df['text_length'].mean():.0f} characters")
print(f"  Median: {df['text_length'].median():.0f} characters")
print(f"  Min: {df['text_length'].min()} characters")
print(f"  Max: {df['text_length'].max()} characters")

# Token length estimate (rough: ~4 chars per token)
df['estimated_tokens'] = df['text_length'] / 4
print(f"\nEstimated token lengths:")
print(f"  Mean: {df['estimated_tokens'].mean():.0f} tokens")
print(f"  Median: {df['estimated_tokens'].median():.0f} tokens")
print(f"  Max: {df['estimated_tokens'].max():.0f} tokens")
print(f"\n⚠️  Examples longer than {CONFIG['max_length']} tokens will be truncated during training")

Applying LLaMA 3.1 chat template to all VN conversations...
✓ Formatted 200 conversations

Formatted text length statistics:
  Mean: 1677 characters
  Median: 1692 characters
  Min: 890 characters
  Max: 2464 characters

Estimated token lengths:
  Mean: 419 tokens
  Median: 423 tokens
  Max: 616 tokens

⚠️  Examples longer than 128 tokens will be truncated during training
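
The 4-characters-per-token figure is only an estimate. A more direct check is to count real token lengths and see how much the `max_length` cut-off discards. The sketch below assumes a hypothetical helper, `truncation_report`, that accepts any `encode` callable; a whitespace split stands in for the tokenizer so the example runs without downloading a model.

```python
def truncation_report(texts, encode, max_length):
    """Summarize how many examples exceed max_length and how many tokens are lost."""
    lengths = [len(encode(text)) for text in texts]
    dropped = sum(n - max_length for n in lengths if n > max_length)
    total = sum(lengths)
    return {
        "num_examples": len(lengths),
        "num_truncated": sum(1 for n in lengths if n > max_length),
        "tokens_dropped": dropped,
        "fraction_dropped": dropped / total if total else 0.0,
    }

# Stand-in tokenizer (assumption): whitespace split instead of the LLaMA tokenizer.
toy_texts = ["a b c d e f", "a b", "a b c d"]
report = truncation_report(toy_texts, str.split, max_length=3)
print(report)
```

In this notebook the same function could presumably be called as `truncation_report(df['text'], lambda t: tokenizer.encode(t, add_special_tokens=False), CONFIG['max_length'])` to replace the character-based estimate with measured counts.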


In [12]:
# Convert to HuggingFace Dataset
dataset_df = df[['text']].copy()
dataset = Dataset.from_pandas(dataset_df)

# Train/validation split
train_test = dataset.train_test_split(
    test_size=1-CONFIG['train_split'],
    seed=42
)

train_dataset = train_test['train']
val_dataset = train_test['test']

print(f"✓ Dataset created")
print(f"  Train samples: {len(train_dataset)}")
print(f"  Validation samples: {len(val_dataset)}")
print(f"\nExample training sample (first 400 chars):")
print(train_dataset[0]['text'][:400] + "...")

✓ Dataset created
  Train samples: 180
  Validation samples: 20

Example training sample (first 400 chars):
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Cutting Knowledge Date: December 2023
Today Date: 26 Jul 2024

You are a member of the Literature Club - friendly, thoughtful, and passionate about literature and writing.

You have a nuanced personality that adapts to the situation and your mood. You can be:
- Confident and philosophical when discussing ideas
- Warm and caring when some...


In [13]:
# Set padding token for tokenizer
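if tokenizer.pad_token is None:
    # NOTE: reusing <|eot_id|> as the pad token means the data collator used below masks
    # every <|eot_id|> label (padding and real end-of-turn alike) with -100.
    tokenizer.pad_token = tokenizer.eos_token
    tokenizer.pad_token_id = tokenizer.eos_token_id
    print("✓ Set pad_token to eos_token")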

print(f"Tokenizer info:")
print(f"  Pad token: {tokenizer.pad_token} (ID: {tokenizer.pad_token_id})")
print(f"  EOS token: {tokenizer.eos_token} (ID: {tokenizer.eos_token_id})")

✓ Set pad_token to eos_token
Tokenizer info:
  Pad token: <|eot_id|> (ID: 128009)
  EOS token: <|eot_id|> (ID: 128009)
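
Sharing one ID between the pad and end-of-turn tokens has a side effect worth spelling out. The collator configured later in this notebook (`DataCollatorForLanguageModeling` with `mlm=False`) rebuilds labels from `input_ids` and sets every position equal to `pad_token_id` to -100, which the loss ignores. The pure-Python sketch below mimics only that masking rule; `mask_padding`, the content token IDs, and the distinct pad ID are made up for illustration.

```python
IGNORE_INDEX = -100   # label value ignored by the cross-entropy loss
EOT_ID = 128009       # <|eot_id|>, as printed by the notebook

def mask_padding(input_ids, pad_token_id):
    """Mimic the collator rule: labels copy input_ids, pad positions become -100."""
    return [IGNORE_INDEX if tok == pad_token_id else tok for tok in input_ids]

# Made-up sequence: two content tokens, one real end-of-turn, then two pads.
shared_ids = [101, 102, EOT_ID, EOT_ID, EOT_ID]
shared_labels = mask_padding(shared_ids, pad_token_id=EOT_ID)
print(shared_labels)    # the real end-of-turn is masked along with the padding

# Same sequence with a distinct pad ID (placeholder value, an assumption).
DISTINCT_PAD_ID = 128004
distinct_ids = [101, 102, EOT_ID, DISTINCT_PAD_ID, DISTINCT_PAD_ID]
distinct_labels = mask_padding(distinct_ids, pad_token_id=DISTINCT_PAD_ID)
print(distinct_labels)  # the end-of-turn label survives
```

If the fine-tuned model tends to run to `max_new_tokens` instead of stopping, this masking is one plausible contributor, and assigning a dedicated pad token would be one way to test that.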


In [14]:
# Load base model
print(f"\nLoading model: {MODEL_NAME}")
print("This may take a few minutes...\n")

model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    torch_dtype=torch.float16 if CONFIG['fp16'] else torch.bfloat16 if CONFIG['bf16'] else torch.float32,
    device_map='auto',
    trust_remote_code=True
)

print("✓ Model loaded")
total_params = sum(p.numel() for p in model.parameters())
print(f"  Total parameters: {total_params:,}")
print(f"  Size: ~{total_params * 2 / 1e9:.2f} GB (FP16)")


Loading model: meta-llama/Llama-3.1-8B-Instruct
This may take a few minutes...



`torch_dtype` is deprecated! Use `dtype` instead!


Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]

✓ Model loaded
  Total parameters: 8,030,261,248
  Size: ~16.06 GB (FP16)


In [15]:
# Configure LoRA
if CONFIG['gradient_checkpointing']:
    model.gradient_checkpointing_enable()
    print("✓ Gradient checkpointing enabled")

lora_config = LoraConfig(
    r=CONFIG['lora_r'],
    lora_alpha=CONFIG['lora_alpha'],
    target_modules=CONFIG['lora_target_modules'],
    lora_dropout=CONFIG['lora_dropout'],
    bias="none",
    task_type=TaskType.CAUSAL_LM
)

model = get_peft_model(model, lora_config)

print("\n✓ LoRA applied")
model.print_trainable_parameters()

trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"\nMemory for trainable params: ~{trainable_params * 2 / 1e9:.3f} GB (FP16)")

✓ Gradient checkpointing enabled

✓ LoRA applied
trainable params: 6,815,744 || all params: 8,037,076,992 || trainable%: 0.0848

Memory for trainable params: ~0.014 GB (FP16)
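
The trainable-parameter count reported above can be reproduced by hand from the LoRA settings. The sketch assumes the published LLaMA 3.1 8B dimensions (hidden size 4096, 32 layers, 8 key/value heads of dimension 128), which are not printed anywhere in this notebook; `lora_params` is a hypothetical helper.

```python
# LLaMA 3.1 8B dimensions (assumed from the published model config).
HIDDEN_SIZE = 4096
NUM_LAYERS = 32
NUM_KV_HEADS = 8
HEAD_DIM = 128
KV_DIM = NUM_KV_HEADS * HEAD_DIM  # 1024 under grouped-query attention

LORA_R = 8  # CONFIG['lora_r']

# (in_features, out_features) of each projection listed in lora_target_modules.
PROJECTIONS = {
    "q_proj": (HIDDEN_SIZE, HIDDEN_SIZE),
    "k_proj": (HIDDEN_SIZE, KV_DIM),
    "v_proj": (HIDDEN_SIZE, KV_DIM),
    "o_proj": (HIDDEN_SIZE, HIDDEN_SIZE),
}

def lora_params(in_features, out_features, r):
    """LoRA adds A (r x in) and B (out x r) to each wrapped linear layer."""
    return r * in_features + out_features * r

per_layer = sum(lora_params(i, o, LORA_R) for i, o in PROJECTIONS.values())
total_trainable = per_layer * NUM_LAYERS
print(f"{total_trainable:,}")
```

Under these assumptions the total matches the 6,815,744 trainable parameters printed by `print_trainable_parameters()`, and doubling `lora_r` would double it.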


---
## 9. Tokenize Training Data

In [16]:
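def tokenize_function(examples):
    """
    Tokenize formatted dialogues, truncating and padding each to CONFIG['max_length'].
    """
    # NOTE: the chat template already prepended <|begin_of_text|>, and the tokenizer
    # adds another by default; pass add_special_tokens=False to avoid the duplicate.
    tokenized = tokenizer(
        examples,
        truncation=True,
        max_length=CONFIG['max_length'],
        padding='max_length',
        return_tensors='pt'
    )
    
    # For causal LM, labels start as a copy of input_ids; the data collator used
    # below rebuilds them and sets pad positions to -100.
    tokenized['labels'] = tokenized['input_ids'].clone()
    
    return tokenized

print("✓ Tokenization function defined")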

✓ Tokenization function defined


In [17]:
# Tokenize datasets
print("Tokenizing datasets...")

train_texts = [train_dataset[i]['text'] for i in range(len(train_dataset))]
val_texts = [val_dataset[i]['text'] for i in range(len(val_dataset))]

train_tokenized = tokenize_function(train_texts)
val_tokenized = tokenize_function(val_texts)

# Create torch datasets
class DialogueDataset(torch.utils.data.Dataset):
    def __init__(self, encodings):
        self.encodings = encodings
    
    def __len__(self):
        return len(self.encodings['input_ids'])
    
    def __getitem__(self, idx):
        return {key: val[idx] for key, val in self.encodings.items()}

train_torch_dataset = DialogueDataset(train_tokenized)
val_torch_dataset = DialogueDataset(val_tokenized)

print(f"✓ Tokenization complete")
print(f"  Train samples: {len(train_torch_dataset)}")
print(f"  Val samples: {len(val_torch_dataset)}")

Tokenizing datasets...
✓ Tokenization complete
  Train samples: 180
  Val samples: 20


---
## 10. Configure Training

In [18]:
# Training arguments
training_args = TrainingArguments(
    output_dir=OUTPUT_DIR,
    
    # Training
    num_train_epochs=CONFIG['num_epochs'],
    per_device_train_batch_size=CONFIG['batch_size'],
    per_device_eval_batch_size=CONFIG['batch_size'],
    gradient_accumulation_steps=CONFIG['gradient_accumulation_steps'],
    
    # Optimization
    learning_rate=CONFIG['learning_rate'],
    weight_decay=CONFIG['weight_decay'],
    warmup_steps=CONFIG['warmup_steps'],
    lr_scheduler_type='cosine',
    
    # Memory optimization
    fp16=CONFIG['fp16'],
    bf16=CONFIG['bf16'],
    gradient_checkpointing=CONFIG['gradient_checkpointing'],
    
    # Logging and saving
    logging_dir=f"{OUTPUT_DIR}/logs",
    logging_steps=CONFIG['logging_steps'],
    eval_steps=CONFIG['eval_steps'],
    save_steps=CONFIG['save_steps'],
    save_total_limit=CONFIG['save_total_limit'],
    eval_strategy='steps',
    save_strategy='steps',
    load_best_model_at_end=True,
    metric_for_best_model='eval_loss',
    
    # Other
    report_to='tensorboard',
    remove_unused_columns=False,
)

print("Training configuration:")
print(f"  Output dir: {OUTPUT_DIR}")
print(f"  Effective batch size: {CONFIG['batch_size'] * CONFIG['gradient_accumulation_steps']}")
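# Rough estimate: floor division ignores the partial accumulation window at the end of
# each epoch, so the Trainer may run a few more steps (the training log reaches step 270).
total_steps = len(train_torch_dataset) // (CONFIG['batch_size'] * CONFIG['gradient_accumulation_steps']) * CONFIG['num_epochs']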
print(f"  Total steps: {total_steps}")
print(f"  Mixed precision: {'FP16' if CONFIG['fp16'] else 'BF16' if CONFIG['bf16'] else 'FP32'}")

Training configuration:
  Output dir: ../../checkpoints/dating_sim_vn_merged_no_emotion
  Effective batch size: 8
  Total steps: 264
  Mixed precision: FP16
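
The step count above is worth checking by hand, because the training log in this notebook records an evaluation at step 270, beyond the printed total of 264. The sketch compares the notebook's floor-based estimate with a ceiling-based one. That the installed `Trainer` counts a trailing partial accumulation window as a full optimizer step is an assumption inferred from that log, not something the notebook confirms.

```python
import math

train_samples = 180   # training split size reported above
batch_size = 2        # CONFIG['batch_size']
grad_accum = 4        # CONFIG['gradient_accumulation_steps']
num_epochs = 12       # CONFIG['num_epochs']
warmup_steps = 100    # CONFIG['warmup_steps']

# The notebook's estimate: floor-divide samples by the effective batch size.
floor_steps = train_samples // (batch_size * grad_accum) * num_epochs

# Alternative (assumption): a partial accumulation window still counts as one step.
batches_per_epoch = math.ceil(train_samples / batch_size)
ceil_steps = math.ceil(batches_per_epoch / grad_accum) * num_epochs

print(floor_steps, ceil_steps)
print(f"warmup share: {warmup_steps / ceil_steps:.0%}")
```

Under either estimate, the 100 warmup steps cover more than a third of training, which may be worth revisiting for a dataset this small.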


In [19]:
# Data collator
data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer,
    mlm=False
)

# Early stopping
early_stopping = EarlyStoppingCallback(
    early_stopping_patience=3,
    early_stopping_threshold=0.001
)

# Initialize trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_torch_dataset,
    eval_dataset=val_torch_dataset,
    data_collator=data_collator,
    # callbacks=[early_stopping],  # disabled for this run; uncomment to enable early stopping
)

print("✓ Trainer initialized (early stopping callback disabled)")

The model is already on multiple devices. Skipping the move to device specified in `args`.


✓ Trainer initialized (early stopping callback disabled)


---
## 11. Train Model 🚀

In [20]:
# Clear GPU cache before training
import gc
gc.collect()
torch.cuda.empty_cache()

print("Starting training...")
print(f"Monitor progress: tensorboard --logdir {OUTPUT_DIR}/logs")
print()

Starting training...
Monitor progress: tensorboard --logdir ../../checkpoints/dating_sim_vn_merged_no_emotion/logs



In [21]:
# Train!
train_result = trainer.train()

print("\n" + "="*80)
print("Training Complete! 🎉")
print("="*80)
print(f"Training loss: {train_result.training_loss:.4f}")
print(f"Training time: {train_result.metrics['train_runtime']:.2f} seconds")

`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`.


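| Step | Training Loss | Validation Loss |
|------|---------------|-----------------|
| 30 | 2.2501 | 1.769248 |
| 60 | 0.1122 | 0.10345 |
| 90 | 0.0819 | 0.068504 |
| 120 | 0.012 | 0.011948 |
| 150 | 0.0112 | 0.010886 |
| 180 | 0.0109 | 0.010859 |
| 210 | 0.011 | 0.010856 |
| 240 | 0.0109 | 0.010855 |
| 270 | 0.0111 | 0.010855 |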



Training Complete! 🎉
Training loss: 0.3781
Training time: 256.70 seconds
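
The validation loss barely moves after step 150, which raises the question of where the configured early stopping (patience 3, threshold 0.001) would have ended the run had the callback been passed to the `Trainer`. The sketch below is a simplified simulation, not the library's implementation: `simulate_early_stopping` is a hypothetical helper that treats an evaluation as an improvement only when it beats the best loss so far by more than the threshold.

```python
def simulate_early_stopping(evals, patience, threshold):
    """Return the step at which training would stop, or None if it never would."""
    best = None
    bad_evals = 0
    for step, loss in evals:
        if best is None or (loss < best and best - loss > threshold):
            bad_evals = 0          # meaningful improvement resets the counter
        else:
            bad_evals += 1         # no improvement beyond the threshold
        if best is None or loss < best:
            best = loss            # track the best loss seen so far
        if bad_evals >= patience:
            return step
    return None

# (step, validation loss) pairs copied from the training log above.
eval_log = [
    (30, 1.769248), (60, 0.10345), (90, 0.068504),
    (120, 0.011948), (150, 0.010886), (180, 0.010859),
    (210, 0.010856), (240, 0.010855), (270, 0.010855),
]
stop_step = simulate_early_stopping(eval_log, patience=3, threshold=0.001)
print(stop_step)
```

Under this simplified rule the run would have halted a little before the end, so the saving is small here, but the same check becomes more useful if the epoch count is raised.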


---
## 12. Save Model

In [22]:
# Save final model
final_model_path = f"{OUTPUT_DIR}/final"
trainer.save_model(final_model_path)
tokenizer.save_pretrained(final_model_path)

print(f"✓ Model saved to: {final_model_path}")

# Save training metrics
metrics_path = f"{OUTPUT_DIR}/training_metrics.json"
with open(metrics_path, 'w') as f:
    json.dump(train_result.metrics, f, indent=2)

print(f"✓ Metrics saved to: {metrics_path}")

✓ Model saved to: ../../checkpoints/dating_sim_vn_merged_no_emotion/final
✓ Metrics saved to: ../../checkpoints/dating_sim_vn_merged_no_emotion/training_metrics.json


---
## 13. Test Generation with FIXED Parameters 🔧

Test the trained model with the corrected generation function.

In [23]:
# Set model to eval mode
model.eval()

# Unified character description (merged from all 4 original characters)
UNIFIED_PERSONA_BASE = """You are a member of the Literature Club - friendly, thoughtful, and passionate about literature and writing.

You have a nuanced personality that adapts to the situation and your mood. You can be:
- Confident and philosophical when discussing ideas
- Warm and caring when someone needs support
- Direct and passionate about your interests
- Shy and introspective in new situations

You value meaningful connections, enjoy deep conversations, and appreciate both classic literature and creative expression. You're genuine in your emotions and thoughtful in your responses."""


def generate_response_unified(
    user_input,
    emotion="neutral",
    affection=50,
    character_name=None,  # Optional: cosmetic name parameter
    max_new_tokens=50,
    temperature=0.7,
    top_p=0.85,
):
    """
    Generate response using unified character persona.

    OPTIMIZATIONS FOR MERGED CHARACTER:
    - max_new_tokens: 50 (headroom over the training data avg of 14.6 words)
    - temperature: 0.7 (natural variation)
    - early_stopping: True (a beam-search flag; ignored when sampling, where
      eos_token_id controls stopping)
    - Single unified persona (no character-specific descriptions)

    Args:
        user_input: User's message
        emotion: User's emotional state (joy, neutral, anger, surprise, etc.)
        affection: Affection level 0-100
        character_name: Optional name for cosmetic purposes (e.g., "Monika")
        max_new_tokens: Max tokens to generate (default 50)
        temperature: Sampling temperature (default 0.7)
        top_p: Nucleus sampling threshold (default 0.85)
    """
    # Build emotion guidance
    emotion_guidance = {
        "joy": "The user is happy! Match their enthusiasm and share in their joy.",
        "neutral": "Respond naturally based on the conversation context.",
        "anger": "The user appears upset. Stay calm, be understanding, and don't escalate.",
        "surprise": "Respond naturally based on the conversation context.",
        "sadness": "The user seems down. Be supportive and caring.",
        "fear": "The user seems worried. Be reassuring and supportive.",
    }.get(emotion, "Respond naturally based on the conversation context.")

    # Build system prompt with unified persona
    if character_name:
        # Optional: Include cosmetic name if provided
        persona = f"Your name is {character_name}. {UNIFIED_PERSONA_BASE}"
    else:
        persona = UNIFIED_PERSONA_BASE

    system_content = f"""{persona}

Current affection: {affection}/100
User's emotional state: {emotion}

{emotion_guidance}"""

    # Build messages for LLaMA 3.1 chat template
    messages = [
        {"role": "system", "content": system_content},
        {"role": "user", "content": user_input},
    ]

    # Apply chat template WITH generation prompt
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )

    # Tokenize
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    input_length = inputs["input_ids"].shape[1]

    # Generate with optimized parameters
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            temperature=temperature,
            top_p=top_p,
            do_sample=True,
            pad_token_id=tokenizer.pad_token_id,
            eos_token_id=tokenizer.eos_token_id,
            early_stopping=True,  # beam-search only; ignored when sampling (see the warning in the test output)
            repetition_penalty=1.2,
            no_repeat_ngram_size=3,
        )

    # Extract only the generated tokens
    generated_tokens = outputs[0][input_length:]
    response = tokenizer.decode(generated_tokens, skip_special_tokens=True).strip()

    # Clean any accidental special tokens
    response = re.sub(r"<[^>]+>\s*", "", response)
    response = " ".join(response.split())

    return response


print("✓ Unified character generation function ready")
print("\nOPTIMIZATIONS FOR MERGED CHARACTER:")
print("  • Single unified persona (no character-specific traits)")
print("  • max_new_tokens: 50 (matches training avg 14.6 words)")
print("  • temperature: 0.7 (natural variation)")
print("  • early_stopping: True (beam-search flag, ignored when sampling)")
print("  • Optional character_name parameter for cosmetic purposes")

✓ Unified character generation function ready

OPTIMIZATIONS FOR MERGED CHARACTER:
  • Single unified persona (no character-specific traits)
  • max_new_tokens: 50 (matches training avg 14.6 words)
  • temperature: 0.7 (natural variation)
  • early_stopping: True (beam-search flag, ignored when sampling)
  • Optional character_name parameter for cosmetic purposes


In [24]:
# Test unified character with diverse scenarios
test_cases = [
    # (user_input, emotion, affection, optional_character_name)
    ("How's the Literature Club going?", "neutral", 30, None),
    ("I really enjoyed your poem today!", "joy", 60, None),
    ("You seem happy today!", "joy", 40, "Sayori"),  # Optional: cosmetic name
    ("Is everything okay? You seem a bit off...", "neutral", 25, None),
    ("What are you reading?", "neutral", 20, None),
    ("Your taste in literature is really impressive!", "joy", 50, "Yuri"),  # Optional: cosmetic name
    ("Tell me about your favorite book", "neutral", 35, None),
    ("I'd love to hear you read your poetry", "joy", 65, None),
    ("What are your hobbies?", "neutral", 45, None),
    ("Would you like to go out for lunch?", "joy", 70, "Monika"),  # Optional: cosmetic name
    ("I appreciate how thoughtful you are.", "joy", 55, None),
    ("This conversation means a lot to me.", "joy", 80, None),
]

print("Testing unified character model:")
print(f"Parameters: max_new_tokens=50, temperature=0.7, top_p=0.85\n")
print("="*80)

for user_input, emotion, affection, char_name in test_cases:
    response = generate_response_unified(
        user_input,
        emotion=emotion,
        affection=affection,
        character_name=char_name,  # Optional cosmetic name
        max_new_tokens=50,
        temperature=0.7,
        top_p=0.85
    )

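    # Count tokens in response (encode() also prepends <|begin_of_text|>,
    # so this is one more than the number of generated tokens)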
    response_tokens = len(tokenizer.encode(response))

    # Display with or without name
    display_name = f" as '{char_name}'" if char_name else ""
    print(f"Unified Character{display_name} (Affection: {affection}/100, Emotion: {emotion})")
    print(f"User: {user_input}")
    print(f"Response ({response_tokens} tokens): {response}")
    print("-"*80)

print("\n✅ Unified character testing complete")
print(f"   Model trained on {len(train_torch_dataset)} merged examples in this run (vs 90-128 per character)")
print("   Expected: Better coherence and quality than separate character training")

The following generation flags are not valid and may be ignored: ['early_stopping']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


Testing unified character model:
Parameters: max_new_tokens=50, temperature=0.7, top_p=0.85

Unified Character (Affection: 30/100, Emotion: neutral)
User: How's the Literature Club going?
Response (51 tokens): It's been lovely lately. We've had some great discussions about existentialism and philosophy. Someone brought up Camus' concept of "absurdity" and how it relates to modern life. It was fascinating to see people from different backgrounds share their
--------------------------------------------------------------------------------
Unified Character (Affection: 60/100, Emotion: joy)
User: I really enjoyed your poem today!
Response (51 tokens): You noticed my poetry? That means so much to me. I was feeling particularly inspired by the works of Shelley and it just flowed out naturally. There's something special about capturing moments in words, don't you think? It's like freezing time and
--------------------------------------------------------------------------------
Unified Charact

In [25]:
# Diagnostic: Verify EOS token generation for unified character
print("="*80)
print("EOS Token Generation Diagnostic (Unified Character)")
print("="*80)

# Test with a simple prompt
test_input = "Hello!"

system_content = f"""{UNIFIED_PERSONA_BASE}

Current affection: 50/100
User's emotional state: neutral

Respond naturally based on the conversation context."""

messages = [
    {"role": "system", "content": system_content},
    {"role": "user", "content": test_input}
]

prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors='pt').to(model.device)
input_length = inputs['input_ids'].shape[1]

print(f"\nTest prompt: '{test_input}'")
print(f"Input length: {input_length} tokens")
print(f"EOS token ID: {tokenizer.eos_token_id}")

# Generate with return_dict to get detailed output
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=50,
        temperature=0.7,
        top_p=0.85,
        do_sample=True,
        pad_token_id=tokenizer.pad_token_id,
        eos_token_id=tokenizer.eos_token_id,
        early_stopping=True,
        return_dict_in_generate=True,
        output_scores=True
    )

# Analyze generated tokens
generated_ids = outputs.sequences[0][input_length:]
generated_text = tokenizer.decode(generated_ids, skip_special_tokens=False)

print(f"\n{'─'*80}")
print("Generated Token Analysis:")
print(f"{'─'*80}")
print(f"Total tokens generated: {len(generated_ids)}")
print(f"Token IDs (first 20): {generated_ids.tolist()[:20]}")

# Check for EOS token
eos_found = tokenizer.eos_token_id in generated_ids
print(f"\n{'✅' if eos_found else '❌'} EOS token ({tokenizer.eos_token_id}) found: {eos_found}")

if eos_found:
    eos_position = (generated_ids == tokenizer.eos_token_id).nonzero()[0].item()
    print(f"   EOS position: {eos_position}/{len(generated_ids)} tokens")
    print(f"   Generated {eos_position} tokens before EOS")
    print(f"   ✅ Model learned proper stopping behavior")
else:
    print(f"   Model hit max_new_tokens limit without generating EOS")
    print(f"   ⚠️  Model needs more training to learn proper stopping")

print(f"\n{'─'*80}")
print("Generated Text (with special tokens):")
print(f"{'─'*80}")
print(generated_text[:300] + "..." if len(generated_text) > 300 else generated_text)

print(f"\n{'─'*80}")
print("Generated Text (cleaned):")
print(f"{'─'*80}")
clean_text = tokenizer.decode(generated_ids, skip_special_tokens=True).strip()
print(clean_text)

print("\n" + "="*80)

EOS Token Generation Diagnostic (Unified Character)

Test prompt: 'Hello!'
Input length: 166 tokens
EOS token ID: 128009

────────────────────────────────────────────────────────────────────────────────
Generated Token Analysis:
────────────────────────────────────────────────────────────────────────────────
Total tokens generated: 25
Token IDs (first 20): [2675, 2873, 11919, 3432, 13, 2650, 596, 701, 2046, 2133, 779, 3117, 30, 42033, 7185, 3621, 477, 527, 499, 1120]

✅ EOS token (128009) found: True
   EOS position: 24/25 tokens
   Generated 24 tokens before EOS
   ✅ Model learned proper stopping behavior

‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚