## The Code Implementation: FixedMetacognitiveReuseTrainer

### The provided Python code implements a sophisticated training system that teaches LLaMA-3.2-3B to recognize and apply metacognitive patterns.

## Install necessary Libraries 

In [1]:
!pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu113
!pip install transformers datasets accelerate peft bitsandbytes

!pip install wandb matplotlib tqdm

Looking in indexes: https://download.pytorch.org/whl/cu113
INFO: pip is looking at multiple versions of torch to determine which version is compatible with other requirements. This could take a while.
[31mERROR: Could not find a version that satisfies the requirement nvidia-cuda-nvrtc-cu12==12.4.127; platform_system == "Linux" and platform_machine == "x86_64" (from torch) (from versions: 12.9.41, 12.9.86)[0m[31m
[0m[31mERROR: No matching distribution found for nvidia-cuda-nvrtc-cu12==12.4.127; platform_system == "Linux" and platform_machine == "x86_64"[0m[31m
Collecting bitsandbytes
  Downloading bitsandbytes-0.47.0-py3-none-manylinux_2_24_x86_64.whl.metadata (11 kB)
Collecting fsspec<=2025.3.0,>=2023.1.0 (from fsspec[http]<=2025.3.0,>=2023.1.0->datasets)
  Downloading fsspec-2025.3.0-py3-none-any.whl.metadata (11 kB)
Collecting nvidia-cuda-nvrtc-cu12==12.4.127 (from torch>=2.0.0->accelerate)
  Downloading nvidia_cuda_nvrtc_cu12-12.4.127-py3-none-manylinux2014_x86_64.wh

## Implemented Code 

In [2]:
import torch
from transformers import (
    AutoTokenizer, 
    AutoModelForCausalLM, 
    TrainingArguments, 
    Trainer,
    DataCollatorForLanguageModeling
)
from datasets import Dataset
import json
from typing import Dict, List
import time
import os
from huggingface_hub import login

try:
    login(token="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")
except Exception as e:
    print(f"Error during HF login: {e}")

class FixedMetacognitiveReuseTrainer:
    def __init__(self, model_name="meta-llama/Llama-3.2-3B-Instruct"):
        self.model_name = model_name
        self.tokenizer = None
        self.model = None
        self.reuse_patterns = {}
        
    def setup_model(self):
        """Initialize model with P100-compatible settings"""
        print("Loading model and tokenizer...")
        
        # Load tokenizer
        self.tokenizer = AutoTokenizer.from_pretrained(self.model_name)
        if self.tokenizer.pad_token is None:
            self.tokenizer.pad_token = self.tokenizer.eos_token
        
        # Load model with memory-efficient settings
        self.model = AutoModelForCausalLM.from_pretrained(
            self.model_name,
            torch_dtype=torch.float16,
            device_map="auto",
            low_cpu_mem_usage=True,
            trust_remote_code=True
        )
        
        print(f"Model loaded on {self.model.device}")
        
    def extract_reuse_patterns(self, reasoning_data: List[Dict]) -> Dict:
        """FIXED: Efficient pattern extraction"""
        if len(reasoning_data) < 3:
            print("Not enough data for pattern extraction, using default patterns")
            return {
                "think step by step": 1,
                "analyze the problem": 1,
                "break it down": 1
            }
            
        patterns = {}
        
        # Quick pattern extraction without infinite loops
        for item in reasoning_data:
            reasoning = item.get("reasoning", "")
            
            # Extract unique sentences as patterns
            sentences = [s.strip() for s in reasoning.split('.') if len(s.strip()) > 10]
            for sentence in sentences[:3]:  # Limit to first 3 sentences
                if sentence and len(sentence) > 15:
                    if sentence not in patterns:
                        patterns[sentence] = 1
                    else:
                        patterns[sentence] += 1
        
        # Only keep patterns that appear multiple times, or use all if small dataset
        if len(reasoning_data) < 5:
            print(f"Using all {len(patterns)} patterns (small dataset)")
            return patterns
        else:
            frequent_patterns = {k: v for k, v in patterns.items() if v > 1}
            print(f"Found {len(frequent_patterns)} frequent patterns")
            return frequent_patterns
    
    def prepare_training_data(self, dataset, reuse_patterns: Dict):
        """FIXED: Efficient data preparation"""
        
        def format_with_metacognition(example):
            reasoning = example.get("reasoning", "")
            question = example.get("question", "")
            solution = example.get("solution", "")
            
            # SIMPLIFIED: Just mention pattern reuse without complex matching
            if reuse_patterns:
                pattern_preview = list(reuse_patterns.keys())[:2]  # First 2 patterns
                metacognitive_prompt = f"Consider patterns like: {'; '.join(pattern_preview)}. "
            else:
                metacognitive_prompt = "Think step by step. "
            
            # Format the training example
            formatted_text = f"""<|start_header_id|>user<|end_header_id|>

{question}

{metacognitive_prompt}<|eot_id|>
<|start_header_id|>assistant<|end_header_id|>

Reasoning: {reasoning}

Solution: {solution}<|eot_id|>"""
            
            return {"text": formatted_text}
        
        return dataset.map(format_with_metacognition, load_from_cache_file=False)
    
    def train(self, training_data, output_dir="/kaggle/working/llama3-metacognitive-fast"):
        """FIXED: Proper training function with small dataset handling"""
        
        print(f"Starting training with {len(training_data)} examples...")
        start_time = time.time()
        
        # Create output directory
        os.makedirs(output_dir, exist_ok=True)
        
        # QUICK pattern extraction (time-limited)
        print("Extracting patterns...")
        pattern_start = time.time()
        self.reuse_patterns = self.extract_reuse_patterns(training_data)
        print(f"Pattern extraction took: {time.time() - pattern_start:.2f}s")
        
        # Prepare dataset
        dataset = Dataset.from_list(training_data)
        
        print("Preparing training data...")
        data_start = time.time()
        processed_dataset = self.prepare_training_data(dataset, self.reuse_patterns)
        print(f"Data preparation took: {time.time() - data_start:.2f}s")
        
        # Tokenize dataset
        def tokenize_function(examples):
            return self.tokenizer(
                examples["text"], 
                truncation=True, 
                padding=False,
                max_length=512,  # REDUCED further for small datasets
                return_tensors=None
            )
        
        print("Tokenizing dataset...")
        tokenize_start = time.time()
        tokenized_dataset = processed_dataset.map(
            tokenize_function, 
            batched=True,
            remove_columns=processed_dataset.column_names,
            load_from_cache_file=False
        )
        print(f"Tokenization took: {time.time() - tokenize_start:.2f}s")
        
        # ADAPTIVE Training arguments based on dataset size
        dataset_size = len(training_data)
        
        if dataset_size < 5:
            # TINY DATASET SETTINGS - Very fast training
            training_args = TrainingArguments(
                output_dir=output_dir,
                overwrite_output_dir=True,
                num_train_epochs=1,  # Only 1 epoch for tiny datasets
                per_device_train_batch_size=1,
                gradient_accumulation_steps=1,
                max_steps=10,  # Limit to 10 steps maximum
                warmup_steps=2,
                learning_rate=1e-5,
                fp16=True,
                logging_steps=1,
                save_steps=5,
                save_total_limit=1,
                prediction_loss_only=True,
                remove_unused_columns=False,
                dataloader_pin_memory=False,
                report_to=None,  # Disable wandb for quick tests
            )
        else:
            # NORMAL DATASET SETTINGS
            training_args = TrainingArguments(
                output_dir=output_dir,
                overwrite_output_dir=True,
                num_train_epochs=2,
                per_device_train_batch_size=1,
                gradient_accumulation_steps=4,
                warmup_steps=10,
                learning_rate=2e-5,
                fp16=True,
                logging_steps=5,
                save_steps=100,
                save_total_limit=1,
                prediction_loss_only=True,
                remove_unused_columns=False,
                dataloader_pin_memory=False,
            )
        
        # Data collator
        data_collator = DataCollatorForLanguageModeling(
            tokenizer=self.tokenizer,
            mlm=False,
        )
        
        # Initialize trainer
        trainer = Trainer(
            model=self.model,
            args=training_args,
            data_collator=data_collator,
            train_dataset=tokenized_dataset,
        )
        
        # Start training with progress tracking
        print("Starting training...")
        training_start = time.time()
        
        try:
            # Train with explicit step tracking
            trainer.train()
            
            # Save model
            trainer.save_model()
            self.tokenizer.save_pretrained(output_dir)
            
            # Save reuse patterns
            with open(f"{output_dir}/reuse_patterns.json", "w") as f:
                json.dump(self.reuse_patterns, f, indent=2)
            
            print("✅ Training completed successfully!")
            
        except Exception as e:
            print(f"❌ Training error: {e}")
            # Try to save anyway if possible
            try:
                trainer.save_model()
                self.tokenizer.save_pretrained(output_dir)
                print("✅ Model saved despite error")
            except:
                print("❌ Could not save model")
        
        total_time = time.time() - start_time
        print(f"⏱️ Total execution time: {total_time/60:.2f} minutes")
        
        return total_time

def create_sample_data():
    """Create better sample training data"""
    sample_data = [
        {
            "question": "If a train travels 60 mph for 2 hours, how far does it go?",
            "reasoning": "To find distance, multiply speed by time. This is a basic distance calculation using the formula distance = speed × time.",
            "solution": "60 mph × 2 hours = 120 miles"
        },
        {
            "question": "A car travels 45 mph for 3 hours. What distance is covered?",
            "reasoning": "Using the distance formula: distance = speed × time. Multiply the speed by the time traveled.",
            "solution": "45 mph × 3 hours = 135 miles"
        },
        {
            "question": "Solve for x: 2x + 5 = 15",
            "reasoning": "First, isolate the variable by subtracting 5 from both sides. Then divide by the coefficient.",
            "solution": "2x = 10, x = 5"
        },
        {
            "question": "What is 20% of 150?",
            "reasoning": "To find percentage, multiply the number by the percentage divided by 100.",
            "solution": "150 × 0.20 = 30"
        }
    ]
    return sample_data

def check_output_files(output_dir):
    """Check what files were created"""
    print(f"\n📁 Checking output directory: {output_dir}")
    if os.path.exists(output_dir):
        files = os.listdir(output_dir)
        if files:
            print("✅ Output files found:")
            for file in files:
                file_path = os.path.join(output_dir, file)
                size = os.path.getsize(file_path) / (1024*1024) if os.path.isfile(file_path) else 0
                print(f"   📄 {file} ({size:.1f} MB)")
        else:
            print("❌ Output directory exists but is empty")
    else:
        print("❌ Output directory does not exist")

# SIMPLIFIED Main execution - skip quick test, go directly to training
if __name__ == "__main__":
    print("🚀 Starting direct training (skip quick test)...")
    
    # Create trainer and setup model
    trainer = FixedMetacognitiveReuseTrainer()
    trainer.setup_model()
    
    # Use better sample data (4 examples instead of 3)
    training_data = create_sample_data()
    print(f"📊 Using {len(training_data)} training examples")
    
    # Train with timeout protection
    import signal
    def timeout_handler(signum, frame):
        raise TimeoutError("Training taking too long - stopping")
    
    # Set timeout for 10 minutes (600 seconds)
    signal.signal(signal.SIGALRM, timeout_handler)
    signal.alarm(600)
    
    try:
        total_time = trainer.train(training_data)
        signal.alarm(0)  # Cancel timeout
        print(f"🎉 Training finished in {total_time/60:.2f} minutes")
        
        # Check output files
        check_output_files("/kaggle/working/llama3-metacognitive-fast")
        
    except TimeoutError:
        print("⏰ Training stopped due to timeout (10 minutes)")
        signal.alarm(0)
    except Exception as e:
        print(f"💥 Training failed: {e}")
        signal.alarm(0)

2025-09-24 20:36:35.727124: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1758746195.942109      19 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1758746196.002838      19 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered


🚀 Starting direct training (skip quick test)...
Loading model and tokenizer...


tokenizer_config.json:   0%|          | 0.00/54.5k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/9.09M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/296 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/878 [00:00<?, ?B/s]

model.safetensors.index.json:   0%|          | 0.00/20.9k [00:00<?, ?B/s]

Fetching 2 files:   0%|          | 0/2 [00:00<?, ?it/s]

model-00002-of-00002.safetensors:   0%|          | 0.00/1.46G [00:00<?, ?B/s]

model-00001-of-00002.safetensors:   0%|          | 0.00/4.97G [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/189 [00:00<?, ?B/s]

Model loaded on cuda:0
📊 Using 4 training examples
Starting training with 4 examples...
Extracting patterns...
Using all 7 patterns (small dataset)
Pattern extraction took: 0.00s
Preparing training data...


Map:   0%|          | 0/4 [00:00<?, ? examples/s]

Data preparation took: 0.03s
Tokenizing dataset...


Map:   0%|          | 0/4 [00:00<?, ? examples/s]

Tokenization took: 22.14s




Starting training...
❌ Training error: api_key not configured (no-tty). call wandb.login(key=[your_api_key])
✅ Model saved despite error
⏱️ Total execution time: 0.73 minutes
🎉 Training finished in 0.73 minutes

📁 Checking output directory: /kaggle/working/llama3-metacognitive-fast
✅ Output files found:
   📄 config.json (0.0 MB)
   📄 training_args.bin (0.0 MB)
   📄 special_tokens_map.json (0.0 MB)
   📄 model-00001-of-00002.safetensors (4735.8 MB)
   📄 tokenizer.json (16.4 MB)
   📄 chat_template.jinja (0.0 MB)
   📄 model-00002-of-00002.safetensors (1392.1 MB)
   📄 generation_config.json (0.0 MB)
   📄 runs (0.0 MB)
   📄 tokenizer_config.json (0.0 MB)
   📄 model.safetensors.index.json (0.0 MB)


## Check What Your Model Learned

In [3]:
import os
import json
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

def load_model_memory_efficient():
    """Load model with memory optimizations for P100"""
    
    model_path = "/kaggle/working/llama3-metacognitive-fast"
    
    # Memory-efficient loading
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    
    model = AutoModelForCausalLM.from_pretrained(
        model_path,
        torch_dtype=torch.float16,
        device_map="auto",
        low_cpu_mem_usage=True
    )
    
    # Enable memory optimizations
    model.eval()  # Set to evaluation mode
    torch.cuda.empty_cache()  # Clear cache
    
    return tokenizer, model

def generate_with_memory_management(question, tokenizer, model):
    """Generate response with memory management"""
    
    prompt = f"""<|start_header_id|>user<|end_header_id|>

{question}

Think step by step.<|eot_id|>
<|start_header_id|>assistant<|end_header_id|>

Reasoning:"""
    
    # Clear cache before generation
    torch.cuda.empty_cache()
    
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_length=300,  # Shorter to save memory
            temperature=0.7,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id,
            repetition_penalty=1.1
        )
    
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    
    # Clear cache after generation
    torch.cuda.empty_cache()
    
    return response

def analyze_training_results():
    """Analyze what the model learned"""
    
    model_path = "/kaggle/working/llama3-metacognitive-fast"
    
    # Check if reuse patterns were saved
    patterns_file = f"{model_path}/reuse_patterns.json"
    if os.path.exists(patterns_file):
        with open(patterns_file, 'r') as f:
            patterns = json.load(f)
        print("📊 Reuse patterns learned:")
        for pattern, count in list(patterns.items())[:5]:  # Show first 5
            print(f"  - {pattern} (count: {count})")
    else:
        print("❌ Reuse patterns file not found")
    
    # Check training configuration
    args_file = f"{model_path}/training_args.bin"
    if os.path.exists(args_file):
        print("✅ Training configuration saved")
    else:
        print("❌ Training args file not found")
    
    # Test with similar but different questions
    test_cases = [
        ("If a plane flies 500 mph for 2 hours, how far?", "distance calculation"),
        ("Solve: 3x + 7 = 22", "algebraic equation"), 
        ("Find 30% of 250", "percentage calculation"),
        ("A boat travels 40 knots for 5 hours. Distance?", "new distance scenario"),
        ("What is 12% of 350?", "new percentage problem")
    ]
    
    print("\n🧪 Testing model on new variations:")
    print("=" * 60)
    
    tokenizer, model = load_model_memory_efficient()
    
    for question, expected_type in test_cases:
        print(f"\n🔍 Testing: {expected_type}")
        print(f"Question: {question}")
        response = generate_with_memory_management(question, tokenizer, model)
        
        # Extract just the reasoning part for cleaner display
        if "Reasoning:" in response:
            reasoning_part = response.split("Reasoning:")[1].split("Answer:")[0] if "Answer:" in response else response.split("Reasoning:")[1]
            print(f"Reasoning: {reasoning_part.strip()}")
        else:
            print(f"Full response: {response}")
        
        print("-" * 40)

# Run the analysis
analyze_training_results()

❌ Reuse patterns file not found
✅ Training configuration saved

🧪 Testing model on new variations:


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]


🔍 Testing: distance calculation
Question: If a plane flies 500 mph for 2 hours, how far?
Reasoning: To find the distance traveled by the plane, we need to multiply the speed (in miles per hour) by the time.

Step 1: Convert the speed from miles per hour to just numbers.
Speed = 500 mph
Time = 2 hours

Step 2: Multiply the two numbers together.
Distance = Speed × Time
= 500 × 2
= 1000
----------------------------------------

🔍 Testing: algebraic equation
Question: Solve: 3x + 7 = 22
Reasoning: To isolate the variable (x), I will subtract 7 from both sides of the equation.

3x + 7 - 7 = 22 - 7
This simplifies to:
3x = 15

Next, I'll divide both sides of the equation by 3 to solve for x:

(3x) / 3 = 15 / 3
This simplifies to:
x = 5
----------------------------------------

🔍 Testing: percentage calculation
Question: Find 30% of 250
Reasoning: To find 30% of a number, multiply the number by 0.3.

Step 1: Convert the percentage to a decimal.
30% = 0.3

Step 2: Multiply 250 by 0.3.
250 * 0