# Sentiment Probe Full Pipeline - Google Colab
## 🆕 Regression-Based Continuous Sentiment Scoring

Complete end-to-end pipeline for **regression-based sentiment probes** on Gemma 3 4B.

### What makes this different?
**Traditional approach**: Binary classification (0 or 1) with sigmoid probabilities  
**Our approach**: **Linear regression** producing continuous sentiment scores (-∞ to +∞)

### Why Regression?
✅ **Smoother predictions** - No sigmoid compression  
✅ **Natural intensity** - Magnitude reflects sentiment strength  
✅ **Better granularity** - Detects subtle shifts in sentiment  
✅ **Unbounded scores** - Can capture extreme emotions  
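
To make the "no sigmoid compression" point concrete, here is a minimal sketch that passes a few raw scores through a sigmoid. The raw values are hypothetical and chosen only for illustration; they are not outputs of this pipeline.

```python
import math

def sigmoid(x: float) -> float:
    """Logistic function, as used by a binary classification head."""
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical raw (pre-sigmoid) scores, for illustration only.
raw_scores = [0.5, 3.0, 6.0]

for raw in raw_scores:
    print(f"raw={raw:+.1f}  sigmoid={sigmoid(raw):.4f}")

# Gap between the two strongest examples, before and after squashing.
raw_gap = raw_scores[2] - raw_scores[1]
prob_gap = sigmoid(raw_scores[2]) - sigmoid(raw_scores[1])
print(f"raw gap={raw_gap:.1f}, probability gap={prob_gap:.4f}")
```

The two strongest inputs differ by 3 in raw score but by less than 0.05 once squashed, which is the loss of resolution a regression probe avoids by reporting the raw score directly.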

### Pipeline Steps:
1. ✅ Setup and clone repository
2. 📝 Generate sentiment data (700 positive + 700 negative)
3. 🚀 Capture activations from ALL layers (1-34) in batches
4. 🎯 **Train regression-based sentiment probes** (MSE loss, continuous outputs)
5. 📊 Visualize regression performance across layers
6. 💾 Download trained models

### Features:
- **OOM Prevention**: Process layers in batches of 10 to avoid memory issues
- **Progress Tracking**: Clear ETA and progress bars
- **Auto-backup**: Save to Google Drive after each step
- **Resume Capability**: Resume from any batch after an interruption by editing the `layer_batches` list
- **Continuous Scoring**: Get sentiment intensity scores, not just classifications

### Requirements:
- Google Colab with GPU (T4 or better)
- Runtime: ~4-6 hours total
- Hugging Face token for Gemma access

### Example Outputs:
```
Text: "I'm absolutely thrilled about this opportunity!"
Score: +2.8  (Strong positive)

Text: "Feeling a bit uncertain about the decision."
Score: -0.4  (Slightly negative)

Text: "This is the worst experience I've ever had."
Score: -3.2  (Very strong negative)
```

## 1️⃣ Check GPU and Setup

In [None]:
# Check GPU
!nvidia-smi

import torch
print("\n" + "="*60)
print("GPU INFORMATION")
print("="*60)
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"CUDA version: {torch.version.cuda}")
    print(f"GPU device: {torch.cuda.get_device_name(0)}")
    print(f"GPU memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.2f} GB")
else:
    print("⚠️  WARNING: No GPU detected!")
print("="*60)

## 2️⃣ Clone Repository and Install Dependencies

In [None]:
import os
import sys

# Clone repository
repo_url = "https://github.com/ChuloIva/brije.git"
repo_name = "brije"

if not os.path.exists(repo_name):
    print("📥 Cloning repository...")
    !git clone {repo_url}
    print("✅ Repository cloned")
else:
    print("✅ Repository exists")
    !cd {repo_name} && git pull

os.chdir(repo_name)
print(f"\n📁 Working directory: {os.getcwd()}")

In [None]:
# Install dependencies
print("📦 Installing dependencies...\n")
!pip install -q torch transformers h5py scikit-learn tqdm matplotlib seaborn pandas

# Install nnsight
nnsight_dir = "third_party/nnsight"
nnsight_repo = "https://github.com/ndif-team/nnsight"

print("\n📦 Setting up nnsight...")
if not os.path.exists(nnsight_dir) or not os.listdir(nnsight_dir):
    os.makedirs("third_party", exist_ok=True)
    !git clone {nnsight_repo} {nnsight_dir}
    print("   ✅ nnsight cloned")
else:
    print("   ✅ nnsight exists")

!pip install -q -e {nnsight_dir}
print("\n✅ All dependencies installed!")

## 3️⃣ Mount Google Drive

In [None]:
from google.colab import drive
drive.mount('/content/drive')

# Create output directories
drive_output_dir = '/content/drive/MyDrive/brije_sentiment_outputs'
os.makedirs(drive_output_dir, exist_ok=True)
os.makedirs(f"{drive_output_dir}/data", exist_ok=True)
os.makedirs(f"{drive_output_dir}/activations", exist_ok=True)
os.makedirs(f"{drive_output_dir}/probes", exist_ok=True)

print(f"✅ Google Drive mounted")
print(f"   Outputs will be saved to: {drive_output_dir}")

## 4️⃣ Login to Hugging Face

In [None]:
from huggingface_hub import notebook_login
notebook_login()

## 5️⃣ Generate Sentiment Data (700 Positive + 700 Negative)

Examples can be generated locally with Ollama or, as in this notebook, with Gemma on Colab.
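
The generation loop in this step discards failed or very short outputs, so the final counts can fall below 700 per class. The sketch below shows one way to check class balance in a saved JSONL file. `count_labels` is a hypothetical helper, not part of the repository; it only assumes each line is a JSON object with a `sentiment` field, as in `SentimentExample`. The demo rows are made up.

```python
import json
import os
import tempfile
from collections import Counter

def count_labels(jsonl_path: str) -> Counter:
    """Count examples per sentiment label in a JSONL file (hypothetical helper)."""
    counts = Counter()
    with open(jsonl_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # ignore blank lines
                counts[json.loads(line)["sentiment"]] += 1
    return counts

# Made-up rows in the same shape as SentimentExample, for demonstration only.
demo_rows = [
    {"text": "I finally got the job I wanted.", "sentiment": "positive", "emotion": "joy"},
    {"text": "My friend remembered my birthday.", "sentiment": "positive", "emotion": "gratitude"},
    {"text": "I missed the deadline again.", "sentiment": "negative", "emotion": "frustration"},
]

with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.jsonl")
    with open(demo_path, "w", encoding="utf-8") as f:
        for row in demo_rows:
            f.write(json.dumps(row) + "\n")
    demo_counts = count_labels(demo_path)

print(dict(demo_counts))
```

Running the same helper on the combined file after generation shows whether the two classes are still roughly balanced before any probes are trained.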

In [None]:
import json
import random
from dataclasses import dataclass, asdict
from tqdm import tqdm

@dataclass
class SentimentExample:
    text: str
    sentiment: str  # "positive" or "negative"
    emotion: str

# Sentiment definitions
SENTIMENTS = {
    "positive": {
        "emotions": [
            "joy", "gratitude", "hope", "excitement", "love", 
            "pride", "contentment", "inspiration", "relief", "satisfaction"
        ]
    },
    "negative": {
        "emotions": [
            "sadness", "anger", "fear", "disgust", "shame",
            "anxiety", "frustration", "disappointment", "guilt", "loneliness"
        ]
    }
}

CONTEXTS = [
    "relationships", "work", "family", "friends", "health",
    "achievements", "hobbies", "learning", "challenges", "life changes"
]

print("✅ Sentiment definitions loaded")

In [None]:
# Load Gemma for data generation
from transformers import AutoTokenizer, AutoModelForCausalLM

print("Loading Gemma for data generation...")
model_name = "google/gemma-3-4b-it"
tokenizer = AutoTokenizer.from_pretrained(model_name)
gen_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype="auto"
)
print("✅ Model loaded")

def generate_sentiment_example(sentiment: str, emotion: str, context: str) -> str:
    """Generate one sentiment example using Gemma"""
    prompt = f"""Generate a brief first-person example expressing {sentiment} sentiment, specifically {emotion}, in the context of {context}.

Requirements:
- 2-3 sentences
- First person (I, my, me)
- Show genuine {emotion}, don't just state it
- Natural and authentic

Example only:"""
    
    inputs = tokenizer(prompt, return_tensors="pt").to(gen_model.device)
    outputs = gen_model.generate(
        **inputs,
        max_new_tokens=100,
        temperature=0.9,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id
    )
    
    text = tokenizer.decode(outputs[0][inputs['input_ids'].shape[1]:], skip_special_tokens=True)
    return text.strip()

print("✅ Generation function ready")

In [None]:
# Generate sentiment data
print("="*70)
print("GENERATING SENTIMENT DATA")
print("="*70)

examples_per_sentiment = 700
sentiment_data = {"positive": [], "negative": []}

for sentiment in ["positive", "negative"]:
    print(f"\nGenerating {sentiment} examples...")
    emotions = SENTIMENTS[sentiment]["emotions"]
    
    for i in tqdm(range(examples_per_sentiment), desc=sentiment.capitalize()):
        emotion = random.choice(emotions)
        context = random.choice(CONTEXTS)
        
        try:
            text = generate_sentiment_example(sentiment, emotion, context)
            if len(text) > 20:  # Valid example
                sentiment_data[sentiment].append(
                    SentimentExample(text=text, sentiment=sentiment, emotion=emotion)
                )
        except Exception as e:
            print(f"\nError: {e}")
            continue
        
        # Cleanup every 50 examples
        if i % 50 == 0:
            torch.cuda.empty_cache()

print(f"\n✅ Generated {len(sentiment_data['positive'])} positive and {len(sentiment_data['negative'])} negative examples")

# Clean up generation model
import gc

del gen_model
gc.collect()              # drop lingering Python references before clearing the CUDA cache
torch.cuda.empty_cache()
print("✅ Freed generation model memory")

In [None]:
# Save sentiment data
os.makedirs('third_party/datagen/generated_data', exist_ok=True)

# Save separate files
for sentiment in ['positive', 'negative']:
    filename = f'third_party/datagen/generated_data/{sentiment}_sentiment_{len(sentiment_data[sentiment])}.jsonl'
    with open(filename, 'w') as f:
        for ex in sentiment_data[sentiment]:
            f.write(json.dumps(asdict(ex)) + '\n')
    print(f"✅ Saved {filename}")

# Save combined (the name says 1400, but the file holds fewer examples if any generations were skipped above)
combined_file = 'third_party/datagen/generated_data/sentiment_combined_1400.jsonl'
with open(combined_file, 'w') as f:
    for sentiment in ['positive', 'negative']:
        for ex in sentiment_data[sentiment]:
            f.write(json.dumps(asdict(ex)) + '\n')

print(f"\n✅ Saved combined file: {combined_file}")

# Backup to Drive
!cp third_party/datagen/generated_data/*.jsonl {drive_output_dir}/data/
print("✅ Backed up to Google Drive")

## 6️⃣ Capture Activations from ALL Layers (1-34)

Process in batches of 10 layers to avoid OOM:
- Batch 1: Layers 1-10
- Batch 2: Layers 11-20
- Batch 3: Layers 21-30
- Batch 4: Layers 31-34

**Note**: You can skip already captured layers (21-30) by modifying the batches list.
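
The batching and skipping described above can be sketched as follows. `make_layer_batches` and `drop_captured` are hypothetical helpers, not part of the repository. The file name pattern `layer_{N}_activations.h5` is the one this notebook looks for in the training step, and is assumed to match what the capture script writes.

```python
import os
import tempfile

def make_layer_batches(first: int, last: int, size: int) -> list[list[int]]:
    """Split the inclusive layer range [first, last] into consecutive batches."""
    layers = list(range(first, last + 1))
    return [layers[i:i + size] for i in range(0, len(layers), size)]

def drop_captured(batches: list[list[int]], activation_dir: str) -> list[list[int]]:
    """Keep only layers with no activation file yet; drop batches that become empty."""
    pending = []
    for batch in batches:
        missing = [
            layer for layer in batch
            if not os.path.exists(
                os.path.join(activation_dir, f"layer_{layer}_activations.h5")
            )
        ]
        if missing:
            pending.append(missing)
    return pending

batches = make_layer_batches(1, 34, 10)

# Demo: pretend layers 21-30 were captured in an earlier run.
with tempfile.TemporaryDirectory() as demo_dir:
    for layer in range(21, 31):
        open(os.path.join(demo_dir, f"layer_{layer}_activations.h5"), "w").close()
    remaining = drop_captured(batches, demo_dir)

for batch in remaining:
    print(f"Layers {batch[0]}-{batch[-1]} ({len(batch)} layers)")
```

Filtering by files on disk rather than by batch index means a partially completed batch resumes at the first missing layer instead of being recaptured from the start.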

In [None]:
import time

# Configuration
CAPTURE_CONFIG = {
    'model': 'google/gemma-3-4b-it',
    'dataset': combined_file,
    'device': 'auto',
    'batch_size': 1000,
    
    # Define layer batches (10 layers each to avoid OOM)
    'layer_batches': [
        list(range(1, 11)),   # Batch 1: Layers 1-10
        list(range(11, 21)),  # Batch 2: Layers 11-20
        list(range(21, 31)),  # Batch 3: Layers 21-30 (already done, can skip)
        list(range(31, 35))   # Batch 4: Layers 31-34
    ]
}

# Remove batch 3 if you already have layers 21-30 from cognitive actions
# CAPTURE_CONFIG['layer_batches'] = [b for i, b in enumerate(CAPTURE_CONFIG['layer_batches']) if i != 2]

print("Capture configuration:")
print(f"  Model: {CAPTURE_CONFIG['model']}")
print(f"  Dataset: {CAPTURE_CONFIG['dataset']}")
print(f"  Total batches: {len(CAPTURE_CONFIG['layer_batches'])}")
print(f"  Total layers: {sum(len(b) for b in CAPTURE_CONFIG['layer_batches'])}")
print("\nBatches:")
for i, batch in enumerate(CAPTURE_CONFIG['layer_batches'], 1):
    print(f"  Batch {i}: Layers {batch[0]}-{batch[-1]} ({len(batch)} layers)")

In [None]:
# Run activation capture for each batch
print("\n" + "="*70)
print("STARTING ACTIVATION CAPTURE")
print("="*70)

total_start = time.time()

for batch_idx, layer_batch in enumerate(CAPTURE_CONFIG['layer_batches'], 1):
    batch_start = time.time()
    
    print(f"\n{'='*70}")
    print(f"BATCH {batch_idx}/{len(CAPTURE_CONFIG['layer_batches'])}: Layers {layer_batch[0]}-{layer_batch[-1]}")
    print(f"{'='*70}")
    
    # Build command
    cmd = [
        'python', 'src/probes/capture_activations.py',
        '--dataset', CAPTURE_CONFIG['dataset'],
        '--output-dir', 'data/activations/sentiment',
        '--model', CAPTURE_CONFIG['model'],
        '--layers', *[str(l) for l in layer_batch],
        '--device', CAPTURE_CONFIG['device'],
        '--format', 'hdf5',
        '--single-pass',  # Optimized mode
        '--batch-size', str(CAPTURE_CONFIG['batch_size'])
    ]
    
    # Run capture
    !{' '.join(cmd)}
    
    batch_elapsed = time.time() - batch_start
    print(f"\n✅ Batch {batch_idx} complete in {batch_elapsed/60:.1f} minutes")
    
    # Backup to Google Drive
    print("\n📥 Backing up to Google Drive...")
    !cp -r data/activations/sentiment/* {drive_output_dir}/activations/
    print("✅ Backup complete")
    
    # Cleanup between batches
    torch.cuda.empty_cache()
    print("🧹 Cleared GPU memory\n")

total_elapsed = time.time() - total_start

print(f"\n{'='*70}")
print("✅ ALL ACTIVATION CAPTURE COMPLETE")
print(f"{'='*70}")
print(f"Total time: {total_elapsed/60:.1f} minutes ({total_elapsed/3600:.2f} hours)")
print(f"Layers captured: {sum(len(b) for b in CAPTURE_CONFIG['layer_batches'])}")

## 7️⃣ Train Regression-Based Sentiment Probes

Train **linear regression probes** for each layer to predict continuous sentiment scores.
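
As a rough illustration of what a linear regression probe does, the sketch below fits `score = w·x + b` with full-batch gradient descent on MSE, using targets of -1 and +1. It is a toy in pure Python on synthetic 2-D points, not the code in `src/probes/sentiment_regression_probe.py`; the data, learning rate, and epoch count are all assumptions made for the example.

```python
import random

def train_linear_probe(features, targets, lr=0.05, epochs=200):
    """Fit score = w·x + b by full-batch gradient descent on mean squared error."""
    dim = len(features[0])
    n = len(features)
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(features, targets):
            err = sum(wi * xi for wi, xi in zip(w, x)) + b - y
            for j in range(dim):
                grad_w[j] += 2.0 * err * x[j] / n  # d(MSE)/d(w_j)
            grad_b += 2.0 * err / n                # d(MSE)/d(b)
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(w, b, x):
    """Unbounded linear score; no sigmoid is applied."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

# Synthetic stand-in for activations: the first coordinate carries the label.
rng = random.Random(0)
features, targets = [], []
for _ in range(100):
    label = rng.choice([-1.0, 1.0])
    features.append([label + rng.gauss(0, 0.5), rng.gauss(0, 0.5)])
    targets.append(label)

w, b = train_linear_probe(features, targets)
accuracy = sum(
    (predict(w, b, x) > 0) == (y > 0) for x, y in zip(features, targets)
) / len(targets)

# A point far along the informative direction scores beyond the +1 target.
extreme_score = predict(w, b, [3.0, 0.0])
print(f"accuracy={accuracy:.2f}, extreme_score={extreme_score:.2f}")
```

Because the output is linear in the input, inputs that lie further along the learned direction than the training data receive scores beyond ±1, which is where the larger magnitudes come from.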

### Why Regression Instead of Classification?

**Classification (0 or 1)**:
- Binary output: positive=1, negative=0
- Sharp boundary, no nuance
- Probabilities from sigmoid (0.0-1.0)

**Regression (continuous scores)**:
- Continuous output: unbounded scores (roughly -3 to +3 in the examples here)
- Smooth transitions, captures intensity
- Natural interpretation: negative scores = negative sentiment, positive scores = positive sentiment
- Better for detecting subtle sentiment shifts

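### Score Interpretation:
- **Strong negative**: below -1.5
- **Mild negative**: -1.5 to -0.5
- **Neutral**: -0.5 to +0.5
- **Mild positive**: +0.5 to +1.5
- **Strong positive**: +1.5 and above

These bands are heuristic. The probes are trained on targets of -1 and +1 only, so scores beyond ±1 are extrapolations by the linear probe.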

In [None]:
# Training configuration for REGRESSION probes
TRAIN_CONFIG = {
    'batch_size': 32,
    'epochs': 50,
    'learning_rate': 0.0005,
    'weight_decay': 0.001,
    'early_stopping_patience': 10,
    'use_scheduler': True,
    'device': 'auto'
}

print("Regression Training Configuration:")
print("="*60)
for key, value in TRAIN_CONFIG.items():
    print(f"  {key:25s}: {value}")
print("="*60)
print("\n💡 Using MSE loss (Mean Squared Error) for continuous prediction")
print("💡 Targets: negative=-1, positive=+1 (will extrapolate beyond)")
print("💡 Output: Unbounded continuous scores (smoother than sigmoid)")

In [None]:
# Get list of captured layers
import glob

# Sort numerically by layer index; a plain string sort would place layer_10 before layer_2
activation_files = glob.glob('data/activations/sentiment/layer_*_activations.h5')
captured_layers = sorted(int(f.split('layer_')[1].split('_')[0]) for f in activation_files)

print(f"Found {len(captured_layers)} captured layers")
if captured_layers:
    print(f"  Layers: {captured_layers[:5]}...{captured_layers[-5:]}" if len(captured_layers) > 10 else f"  Layers: {captured_layers}")

print("\n" + "="*70)
print("🚀 TRAINING REGRESSION-BASED SENTIMENT PROBES")
print("="*70)
print("Using: src/probes/sentiment_regression_probe.py")
print("Output: Continuous sentiment scores (-∞ to +∞)")
print("="*70 + "\n")

train_start = time.time()
layer_results = []

for layer_idx in captured_layers:
    layer_start = time.time()
    
    print(f"\n{'='*70}")
    print(f"Training Layer {layer_idx} ({captured_layers.index(layer_idx) + 1}/{len(captured_layers)})")
    print(f"{'='*70}")
    
    activation_file = f"data/activations/sentiment/layer_{layer_idx}_activations.h5"
    output_dir = f"data/probes_regression/sentiment/layer_{layer_idx}"
    
    if not os.path.exists(activation_file):
        print(f"⚠️  Skipping - activation file not found")
        continue
    
    # Build training command for REGRESSION probe
    cmd = [
        'python', 'src/probes/sentiment_regression_probe.py',
        '--activations', activation_file,
        '--output-dir', output_dir,
        '--batch-size', str(TRAIN_CONFIG['batch_size']),
        '--epochs', str(TRAIN_CONFIG['epochs']),
        '--lr', str(TRAIN_CONFIG['learning_rate']),
        '--weight-decay', str(TRAIN_CONFIG['weight_decay']),
        '--early-stopping-patience', str(TRAIN_CONFIG['early_stopping_patience']),
        '--device', TRAIN_CONFIG['device']
    ]
    
    if not TRAIN_CONFIG['use_scheduler']:
        cmd.append('--no-scheduler')
    
    # Run regression training
    !{' '.join(cmd)}
    
    layer_elapsed = time.time() - layer_start
    
    # Load regression metrics
    metrics_file = f"{output_dir}/metrics.json"
    if os.path.exists(metrics_file):
        with open(metrics_file, 'r') as f:
            metrics = json.load(f)
        
        layer_results.append({
            'layer': layer_idx,
            'mse': metrics['mse'],
            'mae': metrics['mae'],
            'r2': metrics['r2'],
            'accuracy': metrics['accuracy'],  # Binary accuracy at threshold 0
            'score_range': (metrics['min_prediction'], metrics['max_prediction']),
            'time_minutes': layer_elapsed / 60
        })
        
        print(f"\n✅ Layer {layer_idx} complete in {layer_elapsed/60:.1f} minutes")
        print(f"   MSE: {metrics['mse']:.4f}, MAE: {metrics['mae']:.4f}, "
              f"R²: {metrics['r2']:.4f}, Accuracy: {metrics['accuracy']:.4f}")
        print(f"   Score range: [{metrics['min_prediction']:.2f}, {metrics['max_prediction']:.2f}]")
    
    # Backup to Drive
    !mkdir -p {drive_output_dir}/probes_regression/
    !cp -r {output_dir} {drive_output_dir}/probes_regression/

train_elapsed = time.time() - train_start

print(f"\n{'='*70}")
print("✅ ALL REGRESSION TRAINING COMPLETE")
print(f"{'='*70}")
print(f"Total time: {train_elapsed/60:.1f} minutes ({train_elapsed/3600:.2f} hours)")
print(f"Trained {len(layer_results)} layers with continuous sentiment scoring")
print(f"{'='*70}")

## 8️⃣ Visualize Performance Across Layers
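
The panels in this step plot R², MAE, binary accuracy at threshold 0, and prediction range, all read from each layer's `metrics.json`. The training script's exact formulas are not shown in this notebook, so the sketch below uses the standard definitions. `regression_metrics` is a hypothetical helper and the numbers are made up.

```python
def regression_metrics(predictions, targets):
    """Standard regression metrics plus sign-based accuracy (threshold 0)."""
    n = len(targets)
    ss_res = sum((p - t) ** 2 for p, t in zip(predictions, targets))
    mean_t = sum(targets) / n
    ss_tot = sum((t - mean_t) ** 2 for t in targets)
    return {
        "mse": ss_res / n,
        "mae": sum(abs(p - t) for p, t in zip(predictions, targets)) / n,
        "r2": 1.0 - ss_res / ss_tot,
        # A prediction counts as correct when its sign matches the target's sign.
        "accuracy": sum((p > 0) == (t > 0) for p, t in zip(predictions, targets)) / n,
        "min_prediction": min(predictions),
        "max_prediction": max(predictions),
    }

# Made-up predictions against -1/+1 targets, for illustration only.
demo_predictions = [1.5, 0.5, -0.5, -1.5]
demo_targets = [1.0, 1.0, -1.0, -1.0]

demo_metrics = regression_metrics(demo_predictions, demo_targets)
for name, value in demo_metrics.items():
    print(f"{name:15s}: {value:.4f}")
```

In this toy case every prediction has the correct sign, so accuracy is perfect even though R² is not. This is why the plots track both kinds of metric: accuracy only checks polarity, while MSE, MAE, and R² also penalize scores that land far from the ±1 targets.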

In [None]:
import matplotlib.pyplot as plt
import numpy as np

if layer_results:
    fig, axes = plt.subplots(2, 2, figsize=(16, 10))
    
    layers = [r['layer'] for r in layer_results]
    r2_scores = [r['r2'] for r in layer_results]
    mae_scores = [r['mae'] for r in layer_results]
    accuracy_scores = [r['accuracy'] for r in layer_results]
    score_ranges = [r['score_range'][1] - r['score_range'][0] for r in layer_results]
    
    # Plot 1: R² Score (coefficient of determination)
    axes[0, 0].plot(layers, r2_scores, 'b-o', linewidth=2, markersize=6)
    axes[0, 0].axhline(y=np.mean(r2_scores), color='r', linestyle='--', alpha=0.5, label='Mean')
    axes[0, 0].set_xlabel('Layer', fontsize=12)
    axes[0, 0].set_ylabel('R² Score', fontsize=12)
    axes[0, 0].set_title('Regression Quality Across Layers (R²)', fontsize=14, fontweight='bold')
    axes[0, 0].grid(True, alpha=0.3)
    axes[0, 0].legend()
    
    # Mark best layer
    best_idx = np.argmax(r2_scores)
    axes[0, 0].annotate(f'Best: {layers[best_idx]}\nR²={r2_scores[best_idx]:.4f}',
                       xy=(layers[best_idx], r2_scores[best_idx]),
                       xytext=(10, -20), textcoords='offset points',
                       bbox=dict(boxstyle='round,pad=0.5', facecolor='yellow', alpha=0.7),
                       arrowprops=dict(arrowstyle='->', connectionstyle='arc3,rad=0'))
    
    # Plot 2: MAE (Mean Absolute Error)
    axes[0, 1].plot(layers, mae_scores, 'g-s', linewidth=2, markersize=6)
    axes[0, 1].axhline(y=np.mean(mae_scores), color='r', linestyle='--', alpha=0.5, label='Mean')
    axes[0, 1].set_xlabel('Layer', fontsize=12)
    axes[0, 1].set_ylabel('Mean Absolute Error', fontsize=12)
    axes[0, 1].set_title('Prediction Error Across Layers (MAE)', fontsize=14, fontweight='bold')
    axes[0, 1].grid(True, alpha=0.3)
    axes[0, 1].legend()
    axes[0, 1].invert_yaxis()  # Lower is better
    
    # Plot 3: Classification Accuracy (at threshold 0)
    axes[1, 0].plot(layers, accuracy_scores, 'purple', marker='D', linewidth=2, markersize=6)
    axes[1, 0].axhline(y=np.mean(accuracy_scores), color='r', linestyle='--', alpha=0.5, label='Mean')
    axes[1, 0].set_xlabel('Layer', fontsize=12)
    axes[1, 0].set_ylabel('Binary Accuracy', fontsize=12)
    axes[1, 0].set_title('Classification Accuracy (threshold=0)', fontsize=14, fontweight='bold')
    axes[1, 0].grid(True, alpha=0.3)
    axes[1, 0].legend()
    axes[1, 0].set_ylim([0.5, 1.0])
    
    # Plot 4: Score Range (dynamic range)
    axes[1, 1].plot(layers, score_ranges, 'orange', marker='^', linewidth=2, markersize=6)
    axes[1, 1].axhline(y=np.mean(score_ranges), color='r', linestyle='--', alpha=0.5, label='Mean')
    axes[1, 1].set_xlabel('Layer', fontsize=12)
    axes[1, 1].set_ylabel('Score Range', fontsize=12)
    axes[1, 1].set_title('Dynamic Range of Predictions', fontsize=14, fontweight='bold')
    axes[1, 1].grid(True, alpha=0.3)
    axes[1, 1].legend()
    
    plt.tight_layout()
    plt.savefig('sentiment_regression_layer_comparison.png', dpi=150, bbox_inches='tight')
    plt.show()
    
    # Save to Drive
    !cp sentiment_regression_layer_comparison.png {drive_output_dir}/
    
    print("\n📊 Regression Performance Summary:")
    print("="*70)
    print(f"  Best layer (R²): {layers[best_idx]} (R²={r2_scores[best_idx]:.4f})")
    print(f"  Mean R²: {np.mean(r2_scores):.4f} ± {np.std(r2_scores):.4f}")
    print(f"  Mean MAE: {np.mean(mae_scores):.4f} ± {np.std(mae_scores):.4f}")
    print(f"  Mean Accuracy: {np.mean(accuracy_scores):.4f} ± {np.std(accuracy_scores):.4f}")
    print(f"  Mean Score Range: {np.mean(score_ranges):.2f} ± {np.std(score_ranges):.2f}")
    print("="*70)
    
    # Show example score ranges
    print("\n💡 Typical Sentiment Score Ranges by Layer:")
    for i, layer_idx in enumerate(layers[:5]):  # Show first 5
        min_score, max_score = layer_results[i]['score_range']
        print(f"  Layer {layer_idx:2d}: [{min_score:+.2f}, {max_score:+.2f}]")
    if len(layers) > 5:
        print(f"  ... ({len(layers)-5} more layers)")
        
else:
    print("⚠️  No results to visualize")

## 9️⃣ Download Trained Models

In [None]:
from google.colab import files

# Find best performing layer (by R² score)
if layer_results:
    best_result = max(layer_results, key=lambda x: x['r2'])
    best_layer = best_result['layer']
    
    print(f"Creating download package for best layer ({best_layer})...")
    print(f"  R²: {best_result['r2']:.4f}")
    print(f"  MAE: {best_result['mae']:.4f}")
    
    # Create zip of best layer
    best_layer_zip = f'sentiment_regression_probes_layer_{best_layer}.zip'
    !cd data/probes_regression/sentiment && zip -r ../../../{best_layer_zip} layer_{best_layer}/ -q
    
    print(f"\n📥 Downloading {best_layer_zip}...")
    files.download(best_layer_zip)
    
    print("\n✅ Download complete!")
    print(f"\nPackage contains:")
    print(f"  • Layer {best_layer} regression sentiment probe")
    print(f"  • Outputs continuous scores (not 0-1 probabilities!)")
    print(f"  • Performance metrics (MSE, MAE, R²)")
    print(f"  • Training history")
    
    print("\n💡 Usage example:")
    print(f"```python")
    print(f"# Load the probe")
    print(f"probe, metadata = load_probe('sentiment_regression_probe.pth')")
    print(f"")
    print(f"# Get continuous sentiment score")
    print(f"score = probe.predict(activations)  # Unbounded; typically about -2.5 to +2.5")
    print(f"")
    print(f"# Interpret:")
    print(f"# Negative scores = negative sentiment")
    print(f"# Positive scores = positive sentiment")
    print(f"# Magnitude = intensity")
    print(f"```")
    
    print("\n💡 To download all layers, uncomment and run:")
    print("  # !cd data/probes_regression && zip -r ../../sentiment_regression_all_layers.zip sentiment/ -q")
    print("  # files.download('sentiment_regression_all_layers.zip')")
else:
    print("⚠️  No trained models to download")

## 🎉 Pipeline Complete!

### What was accomplished:
1. ✅ Generated 1,400 sentiment examples (700 positive + 700 negative)
2. ✅ Captured activations from all 34 Gemma layers
3. ✅ **Trained REGRESSION-BASED sentiment probes for each layer**
4. ✅ Visualized regression performance across layers
5. ✅ Backed up everything to Google Drive

### 🆕 Key Difference: Regression vs Classification

**Traditional Binary Classification**:
- Output: 0 or 1 (negative or positive)
- Probabilities: 0.0-1.0 via sigmoid
- Sharp boundary, no intensity information

**Our Regression Approach** ⭐:
- Output: Continuous scores (-∞ to +∞)
- Natural interpretation: sign indicates polarity, magnitude indicates intensity
- Smooth transitions, captures subtle sentiment shifts
- Examples:
  - `-2.5` = Very negative
  - `-0.3` = Slightly negative
  - `+0.2` = Slightly positive
  - `+2.8` = Very positive

### Files saved to Google Drive:
- `{drive_output_dir}/data/` - Generated sentiment data
- `{drive_output_dir}/activations/` - Layer activations
- `{drive_output_dir}/probes_regression/` - **Regression-based probes**
- `{drive_output_dir}/sentiment_regression_layer_comparison.png` - Performance visualization

### How to use the regression probes:

```python
# Load probe
from src.probes.probe_models import load_probe
probe, metadata = load_probe('sentiment_regression_probe.pth')

# Get activation from text (best_layer is the layer index selected in step 9)
from src.probes.capture_activations import ActivationCapture
text = "I'm absolutely thrilled about this opportunity!"
capture = ActivationCapture('google/gemma-3-4b-it', layers_to_capture=[best_layer])
activation = capture.capture_single_example(text, best_layer)

# Predict continuous sentiment score
# predict() returns a 1x1 tensor such as tensor([[-1.85]]) or tensor([[+2.31]]);
# .item() converts it to a float so the format specs below work
score = probe.predict(activation.unsqueeze(0)).item()

# Interpret the score:
if score < -1.5:
    print(f"Strong negative sentiment: {score:.2f}")
elif score < -0.5:
    print(f"Mild negative sentiment: {score:.2f}")
elif score < 0.5:
    print(f"Neutral sentiment: {score:.2f}")
elif score < 1.5:
    print(f"Mild positive sentiment: {score:.2f}")
else:
    print(f"Strong positive sentiment: {score:.2f}")
```

### Next steps:
1. Download trained regression probes for local use
2. Integrate with existing cognitive action probes
3. Use continuous scores for more nuanced sentiment analysis
4. Create visualizations showing sentiment intensity over time
5. Experiment with different score thresholds for your application

### Advantages of regression-based probes:
✅ Smoother predictions (no sigmoid compression)  
✅ Natural intensity interpretation  
✅ Better for detecting subtle sentiment shifts  
✅ Can detect extreme sentiments (scores beyond ±1)  
✅ More suitable for continuous sentiment tracking