# 🧠 MCP Memory Auto-Trigger Training on Google Colab A100

This notebook fine-tunes `distilbert-base-uncased` into a three-class memory auto-trigger model (SAVE_MEMORY, SEARCH_MEMORY, NO_ACTION) using the **47,516-example ULTIMATE dataset**, of which **68% is real data**.

## 🎯 **ULTIMATE DATASET**
**Dataset ID**: `PiGrieco/mcp-memory-auto-trigger-ultimate`

**Requirements:**
- Google Colab Pro/Pro+ with A100 GPU  
- Hugging Face token with write access (needed only to upload the trained model; the dataset itself is public)
- ~3-4 hours training time

## 📊 **Dataset Composition (47,516 examples):**
- **BANKING77**: 13,083 examples (27.5%) - Real financial data
- **CLINC150**: 19,222 examples (40.5%) - Real intent classification
- **Synthetic Original**: 5,255 examples (11.1%) - Advanced generation
- **Synthetic Advanced**: 9,956 examples (21.0%) - English-optimized
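
The percentages in this list follow directly from the raw counts. A quick re-derivation in plain Python, using only the numbers above and counting BANKING77 and CLINC150 as the real-data sources, as this notebook does:

```python
# Per-source example counts, copied from the composition list
composition = {
    "BANKING77": 13_083,
    "CLINC150": 19_222,
    "Synthetic Original": 5_255,
    "Synthetic Advanced": 9_956,
}
REAL_SOURCES = {"BANKING77", "CLINC150"}

total = sum(composition.values())
real = sum(n for name, n in composition.items() if name in REAL_SOURCES)

# Print each source's share of the full dataset
for name, n in composition.items():
    print(f"{name}: {n:,} ({n / total:.1%})")
print(f"Total: {total:,} | real share: {real / total:.1%}")
```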

## 🌟 **WORLD-CLASS Quality:**
- ✅ **68% Real Data** (exceptional quality!)
- ✅ **100% Unique** (zero duplicates)
- ✅ **100% English** (consistent language)
- ✅ **Balanced Classes** (optimal distribution)

## 📈 **Expected Performance:**
- **Accuracy**: >**90%** (world-class!)
- **F1-Score**: >**88%**
- **Training Time**: 3-4 hours on A100
- **Production Ready**: Immediate deployment

**Ready for WORLD-CLASS results!** 🌟


In [None]:
# 🚀 Install required packages for WORLD-CLASS training
!pip install datasets transformers torch accelerate evaluate scikit-learn huggingface_hub wandb

# Import libraries
import torch
import pandas as pd
import numpy as np
from datasets import load_dataset, Dataset, DatasetDict
from transformers import (
    AutoTokenizer, 
    AutoModelForSequenceClassification,
    TrainingArguments, 
    Trainer,
    DataCollatorWithPadding
)
from sklearn.metrics import accuracy_score, precision_recall_fscore_support, confusion_matrix
import evaluate
from huggingface_hub import login
import wandb
import warnings
warnings.filterwarnings('ignore')

print("🚀 Libraries imported successfully!")
print(f"⚡ PyTorch version: {torch.__version__}")
print(f"🔥 CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"🎯 GPU: {torch.cuda.get_device_name(0)}")
    print(f"💾 GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f}GB")
    print("✅ Ready for WORLD-CLASS training!")
else:
    print("⚠️ No GPU detected - training will be slower")


In [None]:
# 📂 Load ULTIMATE Dataset from Hugging Face Hub
print("📂 Loading ULTIMATE dataset...")

# The dataset is already public, no token needed for loading
dataset = load_dataset("PiGrieco/mcp-memory-auto-trigger-ultimate")

print("✅ Dataset loaded successfully!")
print(f"📊 Dataset splits: {list(dataset.keys())}")

# Show dataset info
for split_name, split_data in dataset.items():
    print(f"  📋 {split_name}: {len(split_data):,} examples")

# Analyze the dataset
from collections import Counter

train_data = dataset['train']
label_names = {0: "SAVE_MEMORY", 1: "SEARCH_MEMORY", 2: "NO_ACTION"}
first = train_data[0]
print(f"\n🔍 Dataset Analysis:")
print(f"  📝 Sample text: \"{first['text']}\"")
# Fall back to the local name map if the split has no 'label_name' column
print(f"  🎯 Label: {first['label']} ({first.get('label_name', label_names.get(first['label'], 'unknown'))})")
print(f"  📚 Source: {first.get('source', 'unknown')}")

# Check label distribution (reading the column directly is much faster than iterating rows)
label_counts = Counter(train_data['label'])

print(f"\n📊 Label Distribution:")
for label, count in sorted(label_counts.items()):
    label_name = label_names.get(label, f"UNKNOWN_{label}")
    percentage = (count / len(train_data)) * 100
    print(f"  {label_name}: {count:,} examples ({percentage:.1f}%)")

print(f"\n🌟 ULTIMATE dataset ready for WORLD-CLASS training!")

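If the label distribution printed above turns out to be skewed, one common option (not used anywhere in this notebook) is to weight the loss by inverse class frequency. A minimal sketch of the `n_samples / (n_classes * count)` heuristic, the same formula scikit-learn's `compute_class_weight(class_weight="balanced")` uses; the helper name and the toy labels are hypothetical:

```python
from collections import Counter

def balanced_class_weights(labels, num_classes=3):
    """weight_c = n_samples / (num_classes * count_c), the 'balanced' heuristic.

    Classes that never appear get weight 0.0 to avoid division by zero.
    """
    counts = Counter(labels)
    n = len(labels)
    return [n / (num_classes * counts[c]) if counts[c] else 0.0 for c in range(num_classes)]

# Toy labels (not the real dataset): class 0 is over-represented
toy = [0] * 6 + [1] * 3 + [2] * 3
weights = balanced_class_weights(toy)
print([round(w, 3) for w in weights])  # → [0.667, 1.333, 1.333]
```

Actually applying such weights would require a custom loss (for example by overriding `Trainer.compute_loss`), which is an extension beyond this pipeline.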

In [None]:
# 🔐 Login to Hugging Face Hub for model upload
from huggingface_hub import login
import os

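# Use your token for uploading the trained model.
# Note: `!export` runs in a separate subshell and does NOT set the variable for this Python process.
# Set it with the IPython magic `%env HUGGINGFACE_TOKEN=your_token_here`
# (or `os.environ["HUGGINGFACE_TOKEN"] = "your_token_here"`), then re-run this cell.
HF_TOKEN = os.getenv("HUGGINGFACE_TOKEN", "your_hf_token_here")
if HF_TOKEN == "your_hf_token_here":
    print("⚠️ Please set your HuggingFace token, then re-run this cell:")
    print("%env HUGGINGFACE_TOKEN=your_actual_token")
    print("Or: os.environ['HUGGINGFACE_TOKEN'] = 'your_actual_token'")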
else:
    login(token=HF_TOKEN)
    print("✅ Logged in to Hugging Face Hub!")
    print("🔄 Ready to upload trained model after training...")

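A token may be stored under more than one environment variable name. A minimal lookup sketch, assuming you want to accept either `HUGGINGFACE_TOKEN` (read by the cell above) or `HF_TOKEN` (which `huggingface_hub` also picks up on its own); the helper name `resolve_hf_token` is hypothetical and the cell above does not call it:

```python
import os

def resolve_hf_token(env=None):
    """Return the first non-empty token found, else None.

    Checks HUGGINGFACE_TOKEN first, then HF_TOKEN. The notebook's
    placeholder string is treated as "not set".
    """
    env = os.environ if env is None else env
    for name in ("HUGGINGFACE_TOKEN", "HF_TOKEN"):
        value = env.get(name, "").strip()
        if value and value != "your_hf_token_here":
            return value
    return None

print(resolve_hf_token({"HF_TOKEN": "hf_example"}))  # → hf_example
```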

In [None]:
# 🤖 Load Model and Tokenizer for Training
print("🤖 Loading DistilBERT model and tokenizer...")

# Model configuration
MODEL_NAME = "distilbert-base-uncased"
NUM_LABELS = 3  # SAVE_MEMORY, SEARCH_MEMORY, NO_ACTION

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

# Load model for sequence classification
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME,
    num_labels=NUM_LABELS,
    id2label={0: "SAVE_MEMORY", 1: "SEARCH_MEMORY", 2: "NO_ACTION"},
    label2id={"SAVE_MEMORY": 0, "SEARCH_MEMORY": 1, "NO_ACTION": 2}
)

print(f"✅ Model loaded: {MODEL_NAME}")
print(f"🎯 Number of labels: {NUM_LABELS}")
print(f"⚙️ Model parameters: {model.num_parameters():,}")

# Move the model to the GPU if available (Trainer would also do this automatically)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)
print(f"🔥 Model moved to: {device}")

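The class order is spelled out in several places (`label_names`, `id2label`/`label2id`, `class_names`, the model card). A sketch of deriving both mappings from one list so they cannot drift apart; `CLASS_NAMES` is a hypothetical name and this snippet is not wired into the cells above:

```python
# Single source of truth for class order (must match the dataset's integer labels)
CLASS_NAMES = ["SAVE_MEMORY", "SEARCH_MEMORY", "NO_ACTION"]

id2label = dict(enumerate(CLASS_NAMES))
label2id = {name: idx for idx, name in id2label.items()}

print(id2label)  # → {0: 'SAVE_MEMORY', 1: 'SEARCH_MEMORY', 2: 'NO_ACTION'}
print(label2id)  # → {'SAVE_MEMORY': 0, 'SEARCH_MEMORY': 1, 'NO_ACTION': 2}
```

These dictionaries could then be passed as `num_labels=len(CLASS_NAMES), id2label=id2label, label2id=label2id` to `from_pretrained`.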

In [None]:
# 🔄 Tokenize Dataset for Training
print("🔄 Tokenizing dataset...")

def tokenize_function(examples):
    """Tokenize text for BERT-style models.

    No padding here: DataCollatorWithPadding pads each batch dynamically,
    so pre-padding would only waste memory and compute.
    """
    return tokenizer(
        examples['text'],
        truncation=True,
        max_length=512,
    )

# Tokenize all splits
print("  📝 Tokenizing train set...")
tokenized_train = dataset['train'].map(tokenize_function, batched=True)

print("  📝 Tokenizing validation set...")
tokenized_val = dataset['validation'].map(tokenize_function, batched=True)

print("  📝 Tokenizing test set...")
tokenized_test = dataset['test'].map(tokenize_function, batched=True)

# Set format for PyTorch
tokenized_train.set_format(type='torch', columns=['input_ids', 'attention_mask', 'label'])
tokenized_val.set_format(type='torch', columns=['input_ids', 'attention_mask', 'label'])
tokenized_test.set_format(type='torch', columns=['input_ids', 'attention_mask', 'label'])

print("✅ Dataset tokenization complete!")
print(f"  📊 Train: {len(tokenized_train):,} examples")
print(f"  📊 Validation: {len(tokenized_val):,} examples") 
print(f"  📊 Test: {len(tokenized_test):,} examples")

# Data collator for dynamic padding
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)

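Dynamic padding means each batch is padded only to its own longest sequence rather than to `max_length=512`. The idea in plain Python, as a simplified illustration (the real `DataCollatorWithPadding` also handles labels, padding side and tensor conversion); the function name is hypothetical, the sequences are toy values, and pad id 0 matches `[PAD]` in the uncased BERT vocabulary DistilBERT uses:

```python
def pad_batch(batch, pad_token_id=0):
    """Pad every sequence in `batch` to the length of the longest one.

    Returns (input_ids, attention_mask); the mask is 1 for real tokens
    and 0 for padding positions.
    """
    max_len = max(len(ids) for ids in batch)
    input_ids = [ids + [pad_token_id] * (max_len - len(ids)) for ids in batch]
    attention_mask = [[1] * len(ids) + [0] * (max_len - len(ids)) for ids in batch]
    return input_ids, attention_mask

# Toy token-id sequences of different lengths
ids, mask = pad_batch([[101, 7592, 102], [101, 102]])
print(ids)   # → [[101, 7592, 102], [101, 102, 0]]
print(mask)  # → [[1, 1, 1], [1, 1, 0]]
```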

In [None]:
# 📊 Setup Evaluation Metrics
print("📊 Setting up evaluation metrics...")

# Load evaluation metrics
accuracy_metric = evaluate.load("accuracy")
f1_metric = evaluate.load("f1")

def compute_metrics(eval_pred):
    """Compute metrics for evaluation"""
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    
    # Calculate metrics
    accuracy = accuracy_metric.compute(predictions=predictions, references=labels)
    f1_macro = f1_metric.compute(predictions=predictions, references=labels, average='macro')
    f1_weighted = f1_metric.compute(predictions=predictions, references=labels, average='weighted')
    
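    # Per-class metrics: fixing `labels` keeps the arrays at length 3 even if a class
    # is missing from the evaluated split, and zero_division=0 avoids undefined-metric warnings
    precision, recall, f1_per_class, support = precision_recall_fscore_support(
        labels, predictions, labels=[0, 1, 2], average=None, zero_division=0
    )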
    
    # Class names for detailed metrics
    class_names = ["SAVE_MEMORY", "SEARCH_MEMORY", "NO_ACTION"]
    
    metrics = {
        'accuracy': accuracy['accuracy'],
        'f1_macro': f1_macro['f1'],
        'f1_weighted': f1_weighted['f1'],
    }
    
    # Add per-class metrics
    for i, class_name in enumerate(class_names):
        metrics[f'f1_{class_name}'] = f1_per_class[i]
        metrics[f'precision_{class_name}'] = precision[i]
        metrics[f'recall_{class_name}'] = recall[i]
    
    return metrics

print("✅ Evaluation metrics configured!")
print("  🎯 Metrics: Accuracy, F1 (macro/weighted), Per-class Precision/Recall")

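`f1_macro`, the metric used to pick the best checkpoint, averages per-class F1 with equal weight, while `f1_weighted` weights each class by its support, so a weak minority class pulls macro F1 down more. A worked sketch with made-up per-class scores (not results from this model); the helper name is hypothetical:

```python
def average_f1(f1_per_class, support):
    """Return (macro, weighted) F1 from per-class scores and class supports."""
    macro = sum(f1_per_class) / len(f1_per_class)
    total = sum(support)
    weighted = sum(f * s for f, s in zip(f1_per_class, support)) / total
    return macro, weighted

# Hypothetical scores: the third class is both weak and rare
macro, weighted = average_f1([0.90, 0.80, 0.60], [100, 100, 20])
print(f"macro={macro:.4f} weighted={weighted:.4f}")  # → macro=0.7667 weighted=0.8273
```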

In [None]:
# 🏋️ Training Configuration
print("🏋️ Setting up training configuration...")

# Training arguments optimized for A100
training_args = TrainingArguments(
    output_dir='./mcp-memory-auto-trigger-results',
    num_train_epochs=3,                    # 3 epochs for high-quality dataset
    per_device_train_batch_size=32,        # Large batch size for A100
    per_device_eval_batch_size=64,         # Larger batch for evaluation
    warmup_steps=500,                      # Warmup steps
    weight_decay=0.01,                     # Regularization
    logging_dir='./logs',
    logging_steps=100,                     # Log every 100 steps
    eval_strategy="steps",                 # Evaluate during training (older releases call this `evaluation_strategy`)
    eval_steps=500,                        # Evaluate every 500 steps
    save_strategy="steps",                 # Save checkpoints
    save_steps=1000,                       # Save every 1000 steps (must be a multiple of eval_steps)
    load_best_model_at_end=True,           # Load best checkpoint when training ends
    metric_for_best_model="f1_macro",      # Use F1 macro to pick the best checkpoint
    greater_is_better=True,                # Higher F1 is better
    report_to="none",                      # The string "none" disables wandb and other loggers (None does not)
    push_to_hub=False,                     # We'll push manually later
    dataloader_num_workers=2,              # Parallel data loading
    gradient_accumulation_steps=1,         # No gradient accumulation needed with large batches
    learning_rate=2e-5,                    # Standard learning rate for DistilBERT
    adam_epsilon=1e-8,                     # Adam optimizer epsilon
    max_grad_norm=1.0,                     # Gradient clipping
    seed=42,                               # Reproducibility
    fp16=torch.cuda.is_available(),        # Mixed precision on GPU only (fp16=True raises an error without a GPU)
)

print("✅ Training configuration set!")
print(f"  🎯 Epochs: {training_args.num_train_epochs}")
print(f"  📦 Batch size: {training_args.per_device_train_batch_size}")
print(f"  📈 Learning rate: {training_args.learning_rate}")
print(f"  ⚡ Mixed precision: {training_args.fp16}")
print(f"  💾 Output dir: {training_args.output_dir}")

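With `eval_steps=500`, `save_steps=1000` and `warmup_steps=500`, the number of optimizer steps determines how many evaluations and checkpoints are scheduled and what fraction of training is warmup. A back-of-the-envelope sketch; the 38,000-example train split is a placeholder assumption (the real size is printed by the tokenization cell), and the helper name is hypothetical:

```python
import math

def schedule_stats(num_examples, batch_size=32, grad_accum=1, num_devices=1,
                   epochs=3, eval_steps=500, save_steps=1000, warmup_steps=500):
    """Approximate optimizer-step arithmetic for a Trainer-style run.

    Exact rounding can differ slightly between transformers versions.
    """
    batches_per_epoch = math.ceil(num_examples / (batch_size * num_devices))
    updates_per_epoch = max(batches_per_epoch // grad_accum, 1)
    total_steps = updates_per_epoch * epochs
    return {
        "steps_per_epoch": updates_per_epoch,
        "total_steps": total_steps,
        "scheduled_evaluations": total_steps // eval_steps,
        "scheduled_checkpoints": total_steps // save_steps,
        "warmup_fraction": warmup_steps / total_steps,
    }

# Hypothetical train-split size
stats = schedule_stats(38_000)
for key, value in stats.items():
    print(f"{key}: {value:.3f}" if isinstance(value, float) else f"{key}: {value}")
```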

In [None]:
# 🚀 Initialize Trainer and Start Training
print("🚀 Initializing trainer...")

# Create trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_train,
    eval_dataset=tokenized_val,
    processing_class=tokenizer,  # Older transformers releases name this argument `tokenizer=`
    data_collator=data_collator,
    compute_metrics=compute_metrics,
)

print("✅ Trainer initialized!")
print("🎯 Ready to start training...")

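# Print training info (approximate optimizer steps on a single GPU: ceil per epoch, not floor)
import math
steps_per_epoch = math.ceil(
    len(tokenized_train)
    / (training_args.per_device_train_batch_size * training_args.gradient_accumulation_steps)
)
total_train_steps = steps_per_epoch * int(training_args.num_train_epochs)
print(f"📊 Total training steps: {total_train_steps:,}")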
print(f"⏱️ Estimated training time: ~3-4 hours on A100")

print("\n" + "="*50)
print("🔥 STARTING TRAINING!")
print("="*50)

# Start training
trainer.train()

print("\n" + "="*50)
print("🎉 TRAINING COMPLETED!")
print("="*50)

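`Trainer` uses a linear learning-rate schedule by default (`lr_scheduler_type="linear"`): the rate ramps from 0 to `learning_rate` over `warmup_steps`, then decays linearly to 0 at the final step. A sketch of that curve with this notebook's `learning_rate=2e-5` and `warmup_steps=500`; `total_steps=3564` is a hypothetical run length and the function name is illustrative:

```python
def linear_warmup_lr(step, base_lr=2e-5, warmup_steps=500, total_steps=3564):
    """Learning rate at optimizer step `step`: linear warmup, then linear decay.

    Same shape as transformers' get_linear_schedule_with_warmup:
    0 -> base_lr over warmup_steps, then base_lr -> 0 at total_steps.
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

# Sample the schedule at a few points
for s in (0, 250, 500, 2032, 3564):
    print(s, f"{linear_warmup_lr(s):.2e}")
```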

In [None]:
# 📊 Final Evaluation on Test Set
print("📊 Evaluating on test set...")

# Evaluate on test set
test_results = trainer.evaluate(tokenized_test)

print("✅ Test evaluation completed!")
print("\n🎯 **FINAL RESULTS:**")

# Print key metrics
print(f"  📈 Accuracy: {test_results['eval_accuracy']:.4f} ({test_results['eval_accuracy']*100:.2f}%)")
print(f"  🎯 F1 Macro: {test_results['eval_f1_macro']:.4f}")
print(f"  ⚖️ F1 Weighted: {test_results['eval_f1_weighted']:.4f}")

print(f"\n📋 **PER-CLASS RESULTS:**")
class_names = ["SAVE_MEMORY", "SEARCH_MEMORY", "NO_ACTION"]
for class_name in class_names:
    f1 = test_results[f'eval_f1_{class_name}']
    precision = test_results[f'eval_precision_{class_name}']
    recall = test_results[f'eval_recall_{class_name}']
    print(f"  {class_name}:")
    print(f"    F1: {f1:.4f} | Precision: {precision:.4f} | Recall: {recall:.4f}")

# Generate predictions for confusion matrix
predictions = trainer.predict(tokenized_test)
y_pred = np.argmax(predictions.predictions, axis=1)
y_true = predictions.label_ids

# Confusion matrix
from sklearn.metrics import confusion_matrix, classification_report
import matplotlib.pyplot as plt
import seaborn as sns

# Plot confusion matrix
plt.figure(figsize=(8, 6))
cm = confusion_matrix(y_true, y_pred, labels=[0, 1, 2])  # Always 3x3, matching the tick labels
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', 
            xticklabels=class_names, yticklabels=class_names)
plt.title('Confusion Matrix - MCP Memory Auto-Trigger')
plt.ylabel('True Label')
plt.xlabel('Predicted Label')
plt.show()

# Detailed classification report
print(f"\n📄 **DETAILED CLASSIFICATION REPORT:**")
print(classification_report(y_true, y_pred, labels=[0, 1, 2], target_names=class_names, zero_division=0))

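The heatmap and the classification report are two views of the same counts: with rows as true labels and columns as predictions (the `sklearn.metrics.confusion_matrix` convention), reading down a column gives precision and reading across a row gives recall. A sketch with a made-up matrix, not results from this model; the helper name is hypothetical:

```python
def per_class_precision_recall(cm):
    """Per-class (precision, recall) from a square confusion matrix.

    Rows are true labels, columns are predicted labels.
    Empty rows or columns yield 0.0 instead of dividing by zero.
    """
    n = len(cm)
    results = []
    for c in range(n):
        tp = cm[c][c]
        predicted_c = sum(cm[r][c] for r in range(n))  # column sum
        actual_c = sum(cm[c])                          # row sum
        precision = tp / predicted_c if predicted_c else 0.0
        recall = tp / actual_c if actual_c else 0.0
        results.append((precision, recall))
    return results

# Hypothetical 3x3 matrix in SAVE_MEMORY, SEARCH_MEMORY, NO_ACTION order
cm = [[8, 1, 1],
      [2, 6, 2],
      [0, 1, 9]]
names = ["SAVE_MEMORY", "SEARCH_MEMORY", "NO_ACTION"]
for name, (p, r) in zip(names, per_class_precision_recall(cm)):
    print(f"{name}: precision={p:.3f} recall={r:.3f}")
```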

In [None]:
# 📤 Upload Trained Model to Hugging Face Hub
print("📤 Uploading trained model to Hugging Face Hub...")

# Define model repository name
MODEL_REPO_NAME = "mcp-memory-auto-trigger-model"
MODEL_REPO_ID = f"PiGrieco/{MODEL_REPO_NAME}"

print(f"🎯 Uploading to: {MODEL_REPO_ID}")

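# Push the (best) model and tokenizer to the Hub.
# Trainer.push_to_hub() has no `repo_id`/`private` arguments (it targets TrainingArguments.hub_model_id),
# so push the model and tokenizer directly with their own push_to_hub methods.
trainer.model.push_to_hub(
    MODEL_REPO_ID,
    commit_message="🚀 Trained MCP Memory Auto-Trigger model on 47K world-class examples",
    private=False  # Make it public
)
tokenizer.push_to_hub(MODEL_REPO_ID, commit_message="Add tokenizer")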

print("✅ Model uploaded successfully!")

# Create model card content
model_card_content = f"""
# MCP Memory Auto-Trigger Model

## Model Description

This model was trained to automatically decide when to save information to memory, search existing memory, or take no action based on user conversations. It's designed for intelligent memory management in AI assistants.

## Training Data

- **Dataset**: PiGrieco/mcp-memory-auto-trigger-ultimate  
- **Total Examples**: 47,516
- **Real Data**: 68% (BANKING77, CLINC150)
- **Synthetic Data**: 32% (high-quality generated)
- **Language**: English

## Performance

- **Accuracy**: {test_results['eval_accuracy']:.4f} ({test_results['eval_accuracy']*100:.2f}%)
- **F1 Macro**: {test_results['eval_f1_macro']:.4f}
- **F1 Weighted**: {test_results['eval_f1_weighted']:.4f}

## Classes

- **SAVE_MEMORY** (0): Save important information to memory
- **SEARCH_MEMORY** (1): Search for existing information in memory  
- **NO_ACTION** (2): Normal conversation requiring no memory action

## Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("PiGrieco/{MODEL_REPO_NAME}")
model = AutoModelForSequenceClassification.from_pretrained("PiGrieco/{MODEL_REPO_NAME}")

# Example usage
text = "I need to remember this configuration setting for later"
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=512)

with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
    predicted_class = torch.argmax(predictions, dim=-1).item()

class_names = ["SAVE_MEMORY", "SEARCH_MEMORY", "NO_ACTION"]
print(f"Predicted action: {{class_names[predicted_class]}}")
print(f"Confidence: {{predictions[0][predicted_class]:.4f}}")
```

## Training Details

- **Base Model**: distilbert-base-uncased
- **Training Framework**: Hugging Face Transformers
- **Hardware**: Google Colab A100 GPU
- **Training Time**: ~3-4 hours
- **Epochs**: 3
- **Batch Size**: 32
- **Learning Rate**: 2e-5

## Intended Use

This model is designed for production use in MCP Memory Server systems to intelligently trigger memory operations based on conversational context.
"""

# Upload the model card as README.md
from huggingface_hub import HfApi
api = HfApi()

api.upload_file(
    path_or_fileobj=model_card_content.encode(),
    path_in_repo="README.md",
    repo_id=MODEL_REPO_ID,
    repo_type="model",
    token=HF_TOKEN
)

print("✅ Model card uploaded!")

print(f"\n🎉 **DEPLOYMENT COMPLETE!**")
print(f"🔗 **Model URL**: https://huggingface.co/{MODEL_REPO_ID}")
print(f"📊 **Performance**: {test_results['eval_accuracy']*100:.2f}% accuracy")
print(f"🚀 **Ready for production use!**")

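The model card's usage example turns logits into a label and a confidence with `softmax` followed by `argmax`. The same computation without PyTorch, as a sketch; the function name and logits are made up. Because softmax is monotonic, the argmax of the logits and the argmax of the probabilities are the same class; softmax only adds the confidence value.

```python
import math

CLASS_NAMES = ["SAVE_MEMORY", "SEARCH_MEMORY", "NO_ACTION"]

def decode_logits(logits):
    """Numerically stable softmax over one row of logits, then argmax.

    Returns (predicted_class_name, confidence), the same quantities the
    model card's usage snippet prints.
    """
    m = max(logits)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASS_NAMES[best], probs[best]

# Made-up logits for illustration
label, confidence = decode_logits([2.0, 0.5, -1.0])
print(label, round(confidence, 4))
```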