# Wine_Hybrid_Iterative: Iterative Truth-Biased Training

This notebook implements an iterative approach to improve hypothesis selection precision:

**Iteration 1: Unbiased Signal Extraction (same as Wine_Hybrid Phase 1)**
- Train on ALL hypotheses equally (no selection)
- Use Adaptive Context Selection to score hypotheses
- Top 30% has ~68% precision (vs 33% random baseline)

**Iteration 2: Biased Training**
- Train a NEW model on:
  - Top 30% highest-scoring samples from Iteration 1
  - Partial data with upweighting (~25% of effective training)
- This creates a "truth-biased" model (trained on >70% correct data)

**Iteration 3: Apply Biased Model to Remaining Data**
- Keep Iteration 2 model frozen
- Compute gradients AND losses on remaining 70% of data
- Theory: Truth-biased model should produce better separation signals

**Key Insight**: Once the model is biased toward correct hypotheses, loss becomes a useful signal for distinguishing correct vs incorrect hypotheses on unseen data.

In [1]:
import torch
import torch.nn as nn
import numpy as np
import pandas as pd
from tqdm import tqdm
import sys
sys.path.insert(0, '../')
sys.path.insert(0, '../GGH')

from GGH.data_ops import DataOperator
from GGH.selection_algorithms import AlgoModulators, compute_individual_grads_nothread
from GGH.models import initialize_model, load_model
from GGH.train_val_loop import TrainValidationManager
from GGH.inspector import Inspector, visualize_train_val_error, selection_histograms
from GGH.custom_optimizer import CustomAdam
from sklearn.metrics import r2_score
from torch.autograd import grad
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')

def set_to_deterministic(rand_state):
    import random
    random.seed(rand_state)
    np.random.seed(rand_state)
    torch.manual_seed(rand_state)
    torch.set_num_threads(1)
    torch.use_deterministic_algorithms(True)

print("Imports successful!")

Imports successful!


In [2]:
# Data configuration
data_path = '../data/wine/red_wine.csv'
results_path = "../saved_results/Red Wine Hybrid Iterative"
inpt_vars = ['volatile acidity', 'total sulfur dioxide', 'citric acid'] 
target_vars = ['quality']
miss_vars = ['alcohol']

# Hypothesis values (3-class)
hypothesis = [[9.4, 10.5, 12.0]]

# Model parameters
hidden_size = 32
output_size = len(target_vars)
hyp_per_sample = len(hypothesis[0])
batch_size = 100 * hyp_per_sample

# Training parameters
partial_perc = 0.025  # 2.5% complete data
rand_state = 0
lr = 0.001

# Iteration 1 parameters
iter1_epochs = 60
iter1_analysis_epochs = 5  # Track last 5 epochs

# Iteration 2 parameters
iter2_epochs = 30  # Same training duration
top_percentile = 30  # Use top 30% from Iteration 1
partial_target_ratio = 0.25  # Partial should be ~25% of effective training

# Iteration 3 parameters
iter3_analysis_epochs = 5  # Track last 5 epochs for remaining data

# Create directories
import os
os.makedirs(results_path, exist_ok=True)
for folder in ['iteration1', 'iteration2', 'iteration3']:
    os.makedirs(f'{results_path}/{folder}', exist_ok=True)

print(f"Results will be saved to: {results_path}")
print(f"Iteration 1: {iter1_epochs} epochs (track last {iter1_analysis_epochs})")
print(f"Iteration 2: {iter2_epochs} epochs on top {top_percentile}% + weighted partial")
print(f"Iteration 3: Score remaining {100-top_percentile}% with biased model")
print(f"Hypothesis values: {hypothesis[0]}")

Results will be saved to: ../saved_results/Red Wine Hybrid Iterative
Iteration 1: 60 epochs (track last 5)
Iteration 2: 30 epochs on top 30% + weighted partial
Iteration 3: Score remaining 70% with biased model
Hypothesis values: [9.4, 10.5, 12.0]


## Model Definitions

In [3]:
class HypothesisAmplifyingModel(nn.Module):
    """
    Neural network that amplifies the impact of hypothesis feature on gradients.
    
    Architecture:
    - Shared features (non-hypothesis): small embedding
    - Hypothesis feature: separate, larger embedding path
    - Concatenate and process through final layers
    """
    def __init__(self, n_shared_features, n_hypothesis_features=1, 
                 shared_hidden=16, hypothesis_hidden=32, final_hidden=32, output_size=1):
        super().__init__()
        
        # Shared features path (smaller)
        self.shared_path = nn.Sequential(
            nn.Linear(n_shared_features, shared_hidden),
            nn.ReLU(),
        )
        
        # Hypothesis feature path (larger - amplifies its importance)
        self.hypothesis_path = nn.Sequential(
            nn.Linear(n_hypothesis_features, hypothesis_hidden),
            nn.ReLU(),
            nn.Linear(hypothesis_hidden, hypothesis_hidden),
            nn.ReLU(),
        )
        
        # Combined path
        combined_size = shared_hidden + hypothesis_hidden
        self.final_path = nn.Sequential(
            nn.Linear(combined_size, final_hidden),
            nn.ReLU(),
            nn.Linear(final_hidden, output_size)
        )
        
        self.n_shared = n_shared_features
        
    def forward(self, x):
        # Split input: shared features vs hypothesis feature
        shared_features = x[:, :self.n_shared]
        hypothesis_feature = x[:, self.n_shared:]
        
        # Process separately
        shared_emb = self.shared_path(shared_features)
        hypothesis_emb = self.hypothesis_path(hypothesis_feature)
        
        # Combine and predict
        combined = torch.cat([shared_emb, hypothesis_emb], dim=1)
        return self.final_path(combined)


class StandardModel(nn.Module):
    """Standard MLP for comparison."""
    def __init__(self, input_size, hidden_size=32, output_size=1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_size, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, output_size)
        )
    
    def forward(self, x):
        return self.net(x)

print("Models defined.")

Models defined.


## Training Classes

In [4]:
class UnbiasedTrainer:
    """
    Train on ALL hypotheses equally (no selection).
    Track per-hypothesis losses and gradients in the last N epochs.
    Used for Iteration 1.
    """
    def __init__(self, DO, model, lr=0.001, device='cpu'):
        self.DO = DO
        self.model = model
        self.device = device
        self.optimizer = torch.optim.Adam(model.parameters(), lr=lr)
        self.criterion = nn.MSELoss(reduction='none')
        self.hyp_per_sample = DO.num_hyp_comb
        
        # Tracking data
        self.loss_history = {}  # global_id -> list of losses per epoch
        self.gradient_history = {}  # global_id -> list of gradient vectors
        
    def train_epoch(self, dataloader, epoch, track_data=False):
        """Train one epoch on ALL hypotheses equally."""
        self.model.train()
        total_loss = 0
        num_batches = 0
        
        for batch_idx, (inputs, targets, global_ids) in enumerate(dataloader):
            inputs = inputs.to(self.device)
            targets = targets.to(self.device)
            
            # Standard forward pass on ALL hypotheses
            predictions = self.model(inputs)
            
            # Compute loss (mean over all hypotheses - no selection)
            individual_losses = self.criterion(predictions, targets).mean(dim=1)
            batch_loss = individual_losses.mean()
            
            # Track per-hypothesis data if in analysis window
            if track_data:
                self._track_hypothesis_data(inputs, targets, global_ids, individual_losses)
            
            # Standard backprop on ALL hypotheses
            self.optimizer.zero_grad()
            batch_loss.backward()
            self.optimizer.step()
            
            total_loss += batch_loss.item()
            num_batches += 1
        
        return total_loss / num_batches
    
    def _track_hypothesis_data(self, inputs, targets, global_ids, losses):
        """Track loss and gradient for each hypothesis in the batch."""
        self.model.eval()
        
        for i in range(len(inputs)):
            gid = global_ids[i].item()
            
            # Track loss
            if gid not in self.loss_history:
                self.loss_history[gid] = []
            self.loss_history[gid].append(losses[i].item())
            
            # Compute and track gradient for this hypothesis
            inp = inputs[i:i+1].clone().requires_grad_(True)
            pred = self.model(inp)
            loss = nn.MSELoss()(pred, targets[i:i+1])
            
            # Get gradient w.r.t. last layer weights
            params = list(self.model.parameters())
            grad_param = grad(loss, params[-2], retain_graph=False)[0]
            grad_vec = grad_param.flatten().detach().cpu().numpy()
            
            if gid not in self.gradient_history:
                self.gradient_history[gid] = []
            self.gradient_history[gid].append(grad_vec)
        
        self.model.train()
    
    def get_hypothesis_analysis(self):
        """Compile analysis results for each hypothesis."""
        analysis = {}
        
        for gid in self.loss_history:
            analysis[gid] = {
                'avg_loss': np.mean(self.loss_history[gid]),
                'loss_std': np.std(self.loss_history[gid]),
                'loss_trajectory': self.loss_history[gid],
                'avg_gradient': np.mean(self.gradient_history[gid], axis=0) if gid in self.gradient_history else None,
                'gradient_magnitude': np.mean([np.linalg.norm(g) for g in self.gradient_history.get(gid, [])]),
            }
        
        return analysis

print("UnbiasedTrainer defined.")

UnbiasedTrainer defined.


In [5]:
class BiasedTrainer:
    """
    Train on selected hypotheses + weighted partial data.
    Used for Iteration 2.
    """
    def __init__(self, DO, model, selected_gids, partial_gids, partial_weight, lr=0.001, device='cpu'):
        self.DO = DO
        self.model = model
        self.device = device
        self.optimizer = torch.optim.Adam(model.parameters(), lr=lr)
        self.criterion = nn.MSELoss(reduction='none')
        self.hyp_per_sample = DO.num_hyp_comb
        
        self.selected_gids = set(selected_gids)  # Top N% from Iteration 1
        self.partial_gids = set(partial_gids)    # Partial data (known correct)
        self.partial_weight = partial_weight
        
        # Tracking data for analysis
        self.loss_history = {}
        self.gradient_history = {}
        
    def train_epoch(self, dataloader, epoch, track_data=False):
        """Train one epoch on selected + partial data."""
        self.model.train()
        total_loss = 0
        total_weight = 0
        
        for batch_idx, (inputs, targets, global_ids) in enumerate(dataloader):
            inputs = inputs.to(self.device)
            targets = targets.to(self.device)
            
            # Compute individual losses
            predictions = self.model(inputs)
            individual_losses = self.criterion(predictions, targets).mean(dim=1)
            
            # Apply weights: selected gets weight 1, partial gets partial_weight
            weights = torch.zeros(len(inputs), device=self.device)
            included_indices = []
            
            for i, gid in enumerate(global_ids):
                gid = gid.item()
                if gid in self.partial_gids:
                    weights[i] = self.partial_weight
                    included_indices.append(i)
                elif gid in self.selected_gids:
                    weights[i] = 1.0
                    included_indices.append(i)
            
            if len(included_indices) == 0:
                continue
            
            # Weighted loss
            weighted_loss = (individual_losses * weights).sum() / weights.sum()
            
            # Track data if requested
            if track_data:
                self._track_hypothesis_data(inputs, targets, global_ids, individual_losses)
            
            # Backprop
            self.optimizer.zero_grad()
            weighted_loss.backward()
            self.optimizer.step()
            
            total_loss += weighted_loss.item() * weights.sum().item()
            total_weight += weights.sum().item()
        
        return total_loss / total_weight if total_weight > 0 else 0
    
    def _track_hypothesis_data(self, inputs, targets, global_ids, losses):
        """Track loss and gradient for each hypothesis in the batch."""
        self.model.eval()
        
        for i in range(len(inputs)):
            gid = global_ids[i].item()
            
            # Track loss
            if gid not in self.loss_history:
                self.loss_history[gid] = []
            self.loss_history[gid].append(losses[i].item())
            
            # Compute and track gradient
            inp = inputs[i:i+1].clone().requires_grad_(True)
            pred = self.model(inp)
            loss = nn.MSELoss()(pred, targets[i:i+1])
            
            params = list(self.model.parameters())
            grad_param = grad(loss, params[-2], retain_graph=False)[0]
            grad_vec = grad_param.flatten().detach().cpu().numpy()
            
            if gid not in self.gradient_history:
                self.gradient_history[gid] = []
            self.gradient_history[gid].append(grad_vec)
        
        self.model.train()
    
    def get_hypothesis_analysis(self):
        """Compile analysis results."""
        analysis = {}
        for gid in self.loss_history:
            analysis[gid] = {
                'avg_loss': np.mean(self.loss_history[gid]),
                'loss_std': np.std(self.loss_history[gid]),
                'loss_trajectory': self.loss_history[gid],
                'avg_gradient': np.mean(self.gradient_history[gid], axis=0) if gid in self.gradient_history else None,
                'gradient_magnitude': np.mean([np.linalg.norm(g) for g in self.gradient_history.get(gid, [])]),
            }
        return analysis

print("BiasedTrainer defined.")

BiasedTrainer defined.


In [6]:
class RemainingDataScorer:
    """
    Score remaining data (not used in Iteration 2) using a biased model.
    Computes both loss and gradient signals.
    Used for Iteration 3.
    """
    def __init__(self, DO, model, remaining_sample_indices, device='cpu'):
        self.DO = DO
        self.model = model
        self.device = device
        self.hyp_per_sample = DO.num_hyp_comb
        self.remaining_sample_indices = set(remaining_sample_indices)
        
        # Storage for scores
        self.loss_scores = {}  # gid -> avg_loss
        self.gradient_history = {}  # gid -> list of gradients
        
    def compute_scores(self, dataloader, n_passes=5):
        """
        Compute loss and gradient scores for remaining data.
        Run multiple passes to get stable gradient estimates.
        """
        self.model.eval()
        criterion = nn.MSELoss(reduction='none')
        
        for pass_idx in tqdm(range(n_passes), desc="Scoring passes"):
            for inputs, targets, global_ids in dataloader:
                inputs = inputs.to(self.device)
                targets = targets.to(self.device)
                
                for i in range(len(inputs)):
                    gid = global_ids[i].item()
                    sample_idx = gid // self.hyp_per_sample
                    
                    # Only score remaining samples
                    if sample_idx not in self.remaining_sample_indices:
                        continue
                    
                    # Compute loss
                    inp = inputs[i:i+1].clone().requires_grad_(True)
                    pred = self.model(inp)
                    loss = nn.MSELoss()(pred, targets[i:i+1])
                    
                    # Store loss
                    if gid not in self.loss_scores:
                        self.loss_scores[gid] = []
                    self.loss_scores[gid].append(loss.item())
                    
                    # Compute gradient
                    params = list(self.model.parameters())
                    grad_param = grad(loss, params[-2], retain_graph=False)[0]
                    grad_vec = grad_param.flatten().detach().cpu().numpy()
                    
                    if gid not in self.gradient_history:
                        self.gradient_history[gid] = []
                    self.gradient_history[gid].append(grad_vec)
        
        print(f"Scored {len(self.loss_scores)} hypotheses from {len(self.remaining_sample_indices)} samples")
    
    def get_analysis(self):
        """Get analysis for scored hypotheses."""
        analysis = {}
        for gid in self.loss_scores:
            analysis[gid] = {
                'avg_loss': np.mean(self.loss_scores[gid]),
                'loss_std': np.std(self.loss_scores[gid]),
                'avg_gradient': np.mean(self.gradient_history[gid], axis=0) if gid in self.gradient_history else None,
                'gradient_magnitude': np.mean([np.linalg.norm(g) for g in self.gradient_history.get(gid, [])]),
            }
        return analysis

print("RemainingDataScorer defined.")

RemainingDataScorer defined.


In [7]:
class HypothesisDataset(torch.utils.data.Dataset):
    """Dataset that includes global IDs for tracking."""
    def __init__(self, DO):
        # Input features = inpt_vars + hypothesis column
        input_cols = DO.inpt_vars + [f'{DO.miss_vars[0]}_hypothesis']
        self.inputs = torch.tensor(
            DO.df_train_hypothesis[input_cols].values,
            dtype=torch.float32
        )
        self.targets = torch.tensor(
            DO.df_train_hypothesis[DO.target_vars].values, 
            dtype=torch.float32
        )
        self.global_ids = torch.arange(len(self.inputs))
        self.input_cols = input_cols
        
    def __len__(self):
        return len(self.inputs)
    
    def __getitem__(self, idx):
        return self.inputs[idx], self.targets[idx], self.global_ids[idx]

print("HypothesisDataset defined.")

HypothesisDataset defined.


## Adaptive Context Selection Utilities

In [8]:
def compute_anchor_data(trainer, DO):
    """
    Compute gradient-only anchors AND enriched anchors for each class.
    Also computes anchor_similarity to decide which method to use per class.
    """
    analysis = trainer.get_hypothesis_analysis()
    hyp_per_sample = DO.num_hyp_comb
    input_cols = DO.inpt_vars
    
    # Get partial data
    partial_correct_gids = set(DO.df_train_hypothesis[
        (DO.df_train_hypothesis['partial_full_info'] == 1) & 
        (DO.df_train_hypothesis['correct_hypothesis'] == True)
    ].index.tolist())
    blacklisted_gids = set(DO.df_train_hypothesis[
        (DO.df_train_hypothesis['partial_full_info'] == 1) & 
        (DO.df_train_hypothesis['correct_hypothesis'] == False)
    ].index.tolist())
    partial_sample_indices = set(gid // hyp_per_sample for gid in partial_correct_gids)
    
    # Compute all anchors per class
    anchor_correct_grad = {}
    anchor_incorrect_grad = {}
    anchor_correct_enriched = {}
    anchor_incorrect_enriched = {}
    anchor_similarity_grad = {}
    anchor_similarity_enriched = {}
    use_enriched = {}
    
    # For normalization: collect all gradients to get scale
    all_grads = [analysis[gid]['avg_gradient'] for gid in analysis 
                 if analysis[gid]['avg_gradient'] is not None]
    grad_scale = float(np.mean([np.linalg.norm(g) for g in all_grads])) if all_grads else 1.0
    
    # Store normalization params per class
    feature_norm_params = {}
    
    for class_id in range(hyp_per_sample):
        class_correct_gids = [gid for gid in partial_correct_gids 
                              if DO.df_train_hypothesis.iloc[gid]['hyp_class_id'] == class_id]
        class_incorrect_gids = [gid for gid in blacklisted_gids 
                                if DO.df_train_hypothesis.iloc[gid]['hyp_class_id'] == class_id]
        
        # Collect gradients and features for correct
        correct_grads = []
        correct_features = []
        for gid in class_correct_gids:
            if gid in analysis and analysis[gid]['avg_gradient'] is not None:
                correct_grads.append(analysis[gid]['avg_gradient'])
                feat = DO.df_train_hypothesis.loc[gid, input_cols].values.astype(np.float64)
                correct_features.append(feat)
        
        # Collect gradients and features for incorrect
        incorrect_grads = []
        incorrect_features = []
        for gid in class_incorrect_gids:
            if gid in analysis and analysis[gid]['avg_gradient'] is not None:
                incorrect_grads.append(analysis[gid]['avg_gradient'])
                feat = DO.df_train_hypothesis.loc[gid, input_cols].values.astype(np.float64)
                incorrect_features.append(feat)
        
        if not correct_grads or not incorrect_grads:
            continue
            
        # Gradient-only anchors
        anchor_correct_grad[class_id] = np.mean(correct_grads, axis=0)
        anchor_incorrect_grad[class_id] = np.mean(incorrect_grads, axis=0)
        
        # Compute gradient-only anchor similarity
        sim_grad = float(np.dot(anchor_correct_grad[class_id], anchor_incorrect_grad[class_id]) / (
            np.linalg.norm(anchor_correct_grad[class_id]) * np.linalg.norm(anchor_incorrect_grad[class_id]) + 1e-8))
        anchor_similarity_grad[class_id] = sim_grad
        
        # Decide: use enriched if gradient anchor_similarity > 0
        use_enriched[class_id] = sim_grad > 0
        
        # Enriched anchors (gradient + normalized features)
        correct_grads = np.array(correct_grads, dtype=np.float64)
        incorrect_grads = np.array(incorrect_grads, dtype=np.float64)
        correct_features = np.array(correct_features, dtype=np.float64)
        incorrect_features = np.array(incorrect_features, dtype=np.float64)
        
        # Normalize features to gradient scale
        all_features = np.vstack([correct_features, incorrect_features])
        feat_mean = np.mean(all_features, axis=0)
        feat_std = np.std(all_features, axis=0) + 1e-8
        
        feature_norm_params[class_id] = {'mean': feat_mean, 'std': feat_std, 'scale': grad_scale}
        
        correct_features_norm = (correct_features - feat_mean) / feat_std * grad_scale
        incorrect_features_norm = (incorrect_features - feat_mean) / feat_std * grad_scale
        
        # Enriched = gradient + normalized features
        correct_enriched = np.hstack([correct_grads, correct_features_norm])
        incorrect_enriched = np.hstack([incorrect_grads, incorrect_features_norm])
        
        anchor_correct_enriched[class_id] = np.mean(correct_enriched, axis=0)
        anchor_incorrect_enriched[class_id] = np.mean(incorrect_enriched, axis=0)
        
        # Compute enriched anchor similarity
        sim_enriched = float(np.dot(anchor_correct_enriched[class_id], anchor_incorrect_enriched[class_id]) / (
            np.linalg.norm(anchor_correct_enriched[class_id]) * np.linalg.norm(anchor_incorrect_enriched[class_id]) + 1e-8))
        anchor_similarity_enriched[class_id] = sim_enriched
    
    return {
        'anchor_correct_grad': anchor_correct_grad,
        'anchor_incorrect_grad': anchor_incorrect_grad,
        'anchor_correct_enriched': anchor_correct_enriched,
        'anchor_incorrect_enriched': anchor_incorrect_enriched,
        'anchor_similarity_grad': anchor_similarity_grad,
        'anchor_similarity_enriched': anchor_similarity_enriched,
        'use_enriched': use_enriched,
        'grad_scale': grad_scale,
        'feature_norm_params': feature_norm_params,
        'partial_correct_gids': partial_correct_gids,
        'blacklisted_gids': blacklisted_gids,
        'partial_sample_indices': partial_sample_indices,
        'input_cols': input_cols
    }


def compute_adaptive_score(gradient, features, class_id, anchor_data):
    """
    Compute score using adaptive method:
    - Gradient-only for classes with good gradient separation (anchor_sim < 0)
    - Enriched (gradient + features) for classes with poor gradient separation (anchor_sim > 0)
    """
    use_enriched = anchor_data['use_enriched'].get(class_id, False)
    
    if use_enriched:
        # Use enriched vectors
        norm_params = anchor_data['feature_norm_params'].get(class_id)
        if norm_params:
            features_norm = (features - norm_params['mean']) / norm_params['std'] * norm_params['scale']
        else:
            features_norm = features
        enriched = np.concatenate([gradient, features_norm])
        
        anchor_c = anchor_data['anchor_correct_enriched'].get(class_id)
        anchor_i = anchor_data['anchor_incorrect_enriched'].get(class_id)
    else:
        # Use gradient-only
        enriched = gradient
        anchor_c = anchor_data['anchor_correct_grad'].get(class_id)
        anchor_i = anchor_data['anchor_incorrect_grad'].get(class_id)
    
    if anchor_c is None:
        return 0.0
    
    sim_c = float(np.dot(enriched, anchor_c) / (np.linalg.norm(enriched) * np.linalg.norm(anchor_c) + 1e-8))
    
    if anchor_i is not None:
        sim_i = float(np.dot(enriched, anchor_i) / (np.linalg.norm(enriched) * np.linalg.norm(anchor_i) + 1e-8))
    else:
        sim_i = 0.0
    
    return sim_c - sim_i


def print_adaptive_method_summary(anchor_data, hyp_per_sample):
    """Print summary of adaptive method selection per class."""
    print("Per-class method selection:")
    for class_id in range(hyp_per_sample):
        use_enr = anchor_data['use_enriched'].get(class_id, False)
        sim_grad = anchor_data['anchor_similarity_grad'].get(class_id, None)
        sim_enr = anchor_data['anchor_similarity_enriched'].get(class_id, None)
        
        if use_enr:
            print(f"  Class {class_id}: grad_sim={sim_grad:+.3f} (poor) -> ENRICHED (enriched_sim={sim_enr:+.3f})")
        else:
            print(f"  Class {class_id}: grad_sim={sim_grad:+.3f} (good) -> GRADIENT-ONLY")

print("Adaptive context utilities loaded.")

Adaptive context utilities loaded.


## Combined Loss + Gradient Scoring (for Iteration 3)

In [9]:
def compute_combined_score(loss, gradient, features, class_id, anchor_data, loss_weight=0.5):
    """
    Combine loss and gradient signals for scoring.
    
    For a truth-biased model:
    - Lower loss = more likely correct (aligned with truth)
    - Gradient similarity to correct anchor = more likely correct
    
    Final score = (1 - loss_weight) * gradient_score + loss_weight * (-normalized_loss)
    Higher score = more likely correct
    """
    # Gradient score (same as adaptive)
    grad_score = compute_adaptive_score(gradient, features, class_id, anchor_data)
    
    # Loss score: lower loss = higher score
    # We'll normalize this later when we have all losses
    loss_score = -loss  # Negative because lower loss is better
    
    return {
        'grad_score': grad_score,
        'loss_score': loss_score,
        'raw_loss': loss
    }


def normalize_and_combine_scores(all_scores, loss_weight=0.5):
    """
    Normalize loss scores per class and combine with gradient scores.
    
    Returns combined scores where higher = more likely correct.
    """
    # Group by class
    class_losses = {}
    for sample_idx, (gid, scores) in all_scores.items():
        class_id = scores['class_id']
        if class_id not in class_losses:
            class_losses[class_id] = []
        class_losses[class_id].append(scores['raw_loss'])
    
    # Compute per-class mean and std for loss normalization
    class_stats = {}
    for class_id, losses in class_losses.items():
        class_stats[class_id] = {
            'mean': np.mean(losses),
            'std': np.std(losses) + 1e-8
        }
    
    # Normalize and combine
    combined_scores = {}
    for sample_idx, (gid, scores) in all_scores.items():
        class_id = scores['class_id']
        stats = class_stats[class_id]
        
        # Z-score normalize loss (then negate: lower loss = higher score)
        normalized_loss_score = -(scores['raw_loss'] - stats['mean']) / stats['std']
        
        # Combine: weighted average of gradient and loss scores
        combined = (1 - loss_weight) * scores['grad_score'] + loss_weight * normalized_loss_score
        
        combined_scores[sample_idx] = {
            'gid': gid,
            'combined_score': combined,
            'grad_score': scores['grad_score'],
            'loss_score': normalized_loss_score,
            'raw_loss': scores['raw_loss'],
            'class_id': class_id,
            'is_correct': scores['is_correct']
        }
    
    return combined_scores

print("Combined scoring utilities loaded.")

Combined scoring utilities loaded.


## Analysis Utilities

In [10]:
def analyze_threshold_precision(all_selections, title="Precision Analysis", verbose=True):
    """
    Analyze precision at different thresholds.
    
    all_selections: list of (score, is_correct, sample_idx) tuples, sorted by score descending
    """
    if not all_selections:
        print("No selections to analyze")
        return None, None
    
    # Compute precision at different percentiles
    results = []
    percentiles = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
    
    for pct in percentiles:
        n_include = max(1, int(len(all_selections) * pct / 100))
        top_selections = all_selections[:n_include]
        n_correct = sum(1 for _, is_correct, _ in top_selections if is_correct)
        precision = n_correct / n_include
        results.append({
            'percentile': pct,
            'n_samples': n_include,
            'n_correct': n_correct,
            'precision': precision
        })
    
    # Compute precision in score bins
    scores = [s[0] for s in all_selections]
    min_score, max_score = min(scores), max(scores)
    n_bins = 10
    bin_results = []
    
    for i in range(n_bins):
        bin_low = min_score + (max_score - min_score) * i / n_bins
        bin_high = min_score + (max_score - min_score) * (i + 1) / n_bins
        bin_selections = [(s, c) for s, c, _ in all_selections if bin_low <= s < bin_high]
        if bin_selections:
            bin_correct = sum(1 for _, c in bin_selections if c)
            bin_results.append({
                'bin': f'{bin_low:.2f}-{bin_high:.2f}',
                'n_samples': len(bin_selections),
                'precision': bin_correct / len(bin_selections)
            })
    
    if verbose:
        print("=" * 70)
        print(title)
        print("=" * 70)
        
        print("\nPrecision by Top Percentile (highest scores first):")
        print("-" * 50)
        for r in results:
            print(f"Top {r['percentile']:>3}%: {r['n_samples']:>4} samples, precision={r['precision']*100:.1f}%")
        
        if bin_results:
            print("\nPrecision by Score Bin:")
            print("-" * 50)
            for r in bin_results:
                print(f"Score {r['bin']}: {r['n_samples']:>4} samples, precision={r['precision']*100:.1f}%")
    
    return results, bin_results


def select_hypotheses_adaptive(trainer, DO, anchor_data=None):
    """
    Select best hypothesis per sample using adaptive context.
    Returns list of (score, is_correct, sample_idx) sorted by score descending.
    """
    if anchor_data is None:
        anchor_data = compute_anchor_data(trainer, DO)
    
    analysis = trainer.get_hypothesis_analysis()
    hyp_per_sample = DO.num_hyp_comb
    n_samples = len(DO.df_train_hypothesis) // hyp_per_sample
    input_cols = anchor_data['input_cols']
    
    partial_sample_indices = anchor_data['partial_sample_indices']
    blacklisted_gids = anchor_data['blacklisted_gids']
    
    all_selections = []
    
    for sample_idx in range(n_samples):
        if sample_idx in partial_sample_indices:
            continue
        
        start = sample_idx * hyp_per_sample
        best_score = -np.inf
        best_is_correct = False
        best_gid = None
        
        for hyp_idx in range(hyp_per_sample):
            gid = start + hyp_idx
            if gid in blacklisted_gids:
                continue
            if gid not in analysis or analysis[gid]['avg_gradient'] is None:
                continue
            
            gradient = analysis[gid]['avg_gradient']
            class_id = DO.df_train_hypothesis.iloc[gid]['hyp_class_id']
            features = DO.df_train_hypothesis.loc[gid, input_cols].values.astype(np.float64)
            
            score = compute_adaptive_score(gradient, features, class_id, anchor_data)
            
            if score > best_score:
                best_score = score
                best_is_correct = DO.df_train_hypothesis.iloc[gid]['correct_hypothesis']
                best_gid = gid
        
        if best_score > -np.inf:
            all_selections.append((best_score, best_is_correct, sample_idx, best_gid))
    
    # Sort by score descending
    all_selections.sort(key=lambda x: x[0], reverse=True)
    
    return all_selections, anchor_data

print("Analysis utilities loaded.")

Analysis utilities loaded.


---
# ITERATION 1: Unbiased Training

Train on ALL hypotheses equally. No selection = no feedback loop bias.
Use Adaptive Context Selection to score hypotheses.

In [11]:
# Initialize data
set_to_deterministic(rand_state)

DO = DataOperator(data_path, inpt_vars, target_vars, miss_vars, hypothesis,
                  partial_perc, rand_state, device='cpu')
DO.problem_type = 'regression'

print(f"Lack partial coverage: {DO.lack_partial_coverage}")
print(f"Number of training hypotheses: {len(DO.df_train_hypothesis)}")
print(f"Hypotheses per sample: {DO.num_hyp_comb}")
print(f"Number of samples: {len(DO.df_train_hypothesis) // DO.num_hyp_comb}")

# Count partial data
partial_correct_gids = DO.df_train_hypothesis[
    (DO.df_train_hypothesis['partial_full_info'] == 1) & 
    (DO.df_train_hypothesis['correct_hypothesis'] == True)
].index.tolist()
print(f"Partial data samples: {len(partial_correct_gids)}")

Lack partial coverage: False
Number of training hypotheses: 3453
Hypotheses per sample: 3
Number of samples: 1151
Partial data samples: 28


In [12]:
if not DO.lack_partial_coverage:
    # Create dataloader
    dataset = HypothesisDataset(DO)
    dataloader = torch.utils.data.DataLoader(dataset, batch_size=batch_size, shuffle=True)
    
    # Check input structure
    input_size = dataset.inputs.shape[1]
    n_shared_features = len(inpt_vars)
    n_hypothesis_features = 1
    
    print(f"\nInput structure:")
    print(f"  Total input size: {input_size}")
    print(f"  Shared features: {n_shared_features}")
    print(f"  Hypothesis feature: {n_hypothesis_features}")
    
    # Create model
    model_iter1 = HypothesisAmplifyingModel(
        n_shared_features=n_shared_features,
        n_hypothesis_features=n_hypothesis_features,
        shared_hidden=16,
        hypothesis_hidden=32,
        final_hidden=32,
        output_size=output_size
    )
    
    print(f"\nModel created: HypothesisAmplifyingModel")


Input structure:
  Total input size: 4
  Shared features: 3
  Hypothesis feature: 1

Model created: HypothesisAmplifyingModel


In [13]:
# Train Iteration 1
if not DO.lack_partial_coverage:
    trainer_iter1 = UnbiasedTrainer(DO, model_iter1, lr=lr)
    
    print("=" * 70)
    print("ITERATION 1: Unbiased Training")
    print("=" * 70)
    print(f"Training on ALL hypotheses equally for {iter1_epochs} epochs")
    print(f"Tracking gradients in last {iter1_analysis_epochs} epochs")
    
    iter1_losses = []
    
    for epoch in tqdm(range(iter1_epochs)):
        # Track data in last N epochs
        track = epoch >= (iter1_epochs - iter1_analysis_epochs)
        
        loss = trainer_iter1.train_epoch(dataloader, epoch, track_data=track)
        iter1_losses.append(loss)
        
        if (epoch + 1) % 20 == 0:
            status = "(tracking)" if track else ""
            print(f"Epoch {epoch+1}/{iter1_epochs}: Loss = {loss:.4f} {status}")
    
    print(f"\nIteration 1 complete. Final loss: {iter1_losses[-1]:.4f}")
    print(f"Tracked {len(trainer_iter1.loss_history)} hypotheses")

ITERATION 1: Unbiased Training
Training on ALL hypotheses equally for 60 epochs
Tracking gradients in last 5 epochs


 40%|████      | 24/60 [00:00<00:01, 29.90it/s]

Epoch 20/60: Loss = 0.0226 


 72%|███████▏  | 43/60 [00:01<00:00, 25.69it/s]

Epoch 40/60: Loss = 0.0214 


100%|██████████| 60/60 [00:07<00:00,  8.34it/s]

Epoch 60/60: Loss = 0.0214 (tracking)

Iteration 1 complete. Final loss: 0.0214
Tracked 3453 hypotheses





In [14]:
# Analyze Iteration 1 results
if not DO.lack_partial_coverage:
    # Get selections with adaptive context
    all_selections_iter1, anchor_data_iter1 = select_hypotheses_adaptive(trainer_iter1, DO)
    
    print("\n" + "=" * 70)
    print("ITERATION 1: Selection Analysis (Adaptive Context)")
    print("=" * 70)
    
    print_adaptive_method_summary(anchor_data_iter1, hyp_per_sample)
    
    # Convert to format for analysis
    selections_for_analysis = [(s[0], s[1], s[2]) for s in all_selections_iter1]
    results_iter1, _ = analyze_threshold_precision(
        selections_for_analysis, 
        title="ITERATION 1: Precision by Threshold"
    )
    
    # Store precision for comparison
    iter1_precision = {r['percentile']: r['precision'] for r in results_iter1}


ITERATION 1: Selection Analysis (Adaptive Context)
Per-class method selection:
  Class 0: grad_sim=-0.985 (good) -> GRADIENT-ONLY
  Class 1: grad_sim=+0.956 (poor) -> ENRICHED (enriched_sim=-0.396)
  Class 2: grad_sim=+0.825 (poor) -> ENRICHED (enriched_sim=-0.304)
ITERATION 1: Precision by Threshold

Precision by Top Percentile (highest scores first):
--------------------------------------------------
Top  10%:  112 samples, precision=58.9%
Top  20%:  224 samples, precision=67.4%
Top  30%:  336 samples, precision=67.6%
Top  40%:  449 samples, precision=64.8%
Top  50%:  561 samples, precision=59.9%
Top  60%:  673 samples, precision=54.2%
Top  70%:  786 samples, precision=52.0%
Top  80%:  898 samples, precision=51.8%
Top  90%: 1010 samples, precision=50.5%
Top 100%: 1123 samples, precision=48.4%

Precision by Score Bin:
--------------------------------------------------
Score -0.85--0.56:    8 samples, precision=37.5%
Score -0.56--0.28:   24 samples, precision=20.8%
Score -0.28-0.00:  

In [15]:
# Select top N% for Iteration 2
if not DO.lack_partial_coverage:
    n_total = len(all_selections_iter1)
    n_top = max(1, int(n_total * top_percentile / 100))
    
    top_selections = all_selections_iter1[:n_top]
    top_sample_indices = set(s[2] for s in top_selections)
    top_gids = set(s[3] for s in top_selections)
    
    # Remaining samples (not in top N%)
    remaining_sample_indices = set(s[2] for s in all_selections_iter1[n_top:])
    
    # Count correct in top selection
    n_correct_top = sum(1 for s in top_selections if s[1])
    precision_top = n_correct_top / n_top
    
    print(f"\n" + "=" * 70)
    print(f"ITERATION 1: Top {top_percentile}% Selection")
    print("=" * 70)
    print(f"Selected {n_top} samples (top {top_percentile}%)")
    print(f"Correct: {n_correct_top} ({precision_top*100:.1f}% precision)")
    print(f"Remaining samples: {len(remaining_sample_indices)}")


ITERATION 1: Top 30% Selection
Selected 336 samples (top 30%)
Correct: 227 (67.6% precision)
Remaining samples: 787


---
# ITERATION 2: Biased Training

Train a NEW model on:
- Top 30% from Iteration 1 (high precision selections)
- Partial data with upweighting (~25% of effective training)

This creates a "truth-biased" model.

In [16]:
if not DO.lack_partial_coverage:
    # Get partial data GIDs (correct hypotheses from partial data)
    partial_gids = set(anchor_data_iter1['partial_correct_gids'])
    n_partial = len(partial_gids)
    n_selected = len(top_gids)
    
    # Calculate weight for partial data
    # Target: partial should be ~25% of effective training
    # partial_weight * n_partial / (partial_weight * n_partial + n_selected) = 0.25
    # Solving: partial_weight = 0.25 * n_selected / (0.75 * n_partial)
    partial_weight = (partial_target_ratio * n_selected) / ((1 - partial_target_ratio) * n_partial)
    partial_weight = max(1.0, partial_weight)  # At least weight 1
    
    effective_partial = n_partial * partial_weight
    effective_total = effective_partial + n_selected
    actual_partial_ratio = effective_partial / effective_total
    
    print("=" * 70)
    print("ITERATION 2: Biased Training Setup")
    print("=" * 70)
    print(f"Training data:")
    print(f"  Selected (top {top_percentile}%): {n_selected} samples")
    print(f"  Partial data: {n_partial} samples")
    print(f"  Partial weight: {partial_weight:.2f}x")
    print(f"  Effective partial: {effective_partial:.1f} ({actual_partial_ratio*100:.1f}% of training)")
    print(f"  Effective total: {effective_total:.1f}")

ITERATION 2: Biased Training Setup
Training data:
  Selected (top 30%): 336 samples
  Partial data: 28 samples
  Partial weight: 4.00x
  Effective partial: 112.0 (25.0% of training)
  Effective total: 448.0


In [17]:
if not DO.lack_partial_coverage:
    # Create new model for Iteration 2
    set_to_deterministic(rand_state + 1)  # Different seed for variety
    
    model_iter2 = HypothesisAmplifyingModel(
        n_shared_features=n_shared_features,
        n_hypothesis_features=n_hypothesis_features,
        shared_hidden=16,
        hypothesis_hidden=32,
        final_hidden=32,
        output_size=output_size
    )
    
    # Create biased trainer
    trainer_iter2 = BiasedTrainer(
        DO, model_iter2, 
        selected_gids=top_gids,
        partial_gids=partial_gids,
        partial_weight=partial_weight,
        lr=lr
    )
    
    print("\n" + "=" * 70)
    print("ITERATION 2: Training Biased Model")
    print("=" * 70)
    
    iter2_losses = []
    
    for epoch in tqdm(range(iter2_epochs)):
        loss = trainer_iter2.train_epoch(dataloader, epoch, track_data=False)
        iter2_losses.append(loss)
        
        if (epoch + 1) % 20 == 0:
            print(f"Epoch {epoch+1}/{iter2_epochs}: Loss = {loss:.4f}")
    
    print(f"\nIteration 2 complete. Final loss: {iter2_losses[-1]:.4f}")


ITERATION 2: Training Biased Model


 80%|████████  | 24/30 [00:00<00:00, 24.93it/s]

Epoch 20/30: Loss = 0.0077


100%|██████████| 30/30 [00:01<00:00, 24.81it/s]


Iteration 2 complete. Final loss: 0.0071





---
# ITERATION 3: Score Remaining Data with Biased Model

Use the truth-biased model from Iteration 2 to score the remaining 70% of data.
Compute BOTH loss and gradient signals, then combine them.

**Key insight**: Since the model is biased toward truth:
- Correct hypotheses should have LOWER loss
- Incorrect hypotheses should have HIGHER loss
- This makes loss a useful discriminative signal

In [18]:
if not DO.lack_partial_coverage:
    print("=" * 70)
    print("ITERATION 3: Scoring Remaining Data")
    print("=" * 70)
    print(f"Scoring {len(remaining_sample_indices)} remaining samples with biased model")
    
    # Create scorer
    scorer = RemainingDataScorer(DO, model_iter2, remaining_sample_indices)
    
    # Score remaining data
    scorer.compute_scores(dataloader, n_passes=iter3_analysis_epochs)
    
    # Get analysis
    analysis_iter3 = scorer.get_analysis()

ITERATION 3: Scoring Remaining Data
Scoring 787 remaining samples with biased model


Scoring passes: 100%|██████████| 5/5 [00:03<00:00,  1.41it/s]


Scored 2361 hypotheses from 787 samples


In [19]:
if not DO.lack_partial_coverage:
    # Compute new anchors using the biased model's view of partial data
    # We need to score partial data with the biased model too
    
    print("\nComputing anchors from biased model on partial data...")
    
    # Score partial data with biased model
    partial_sample_indices = anchor_data_iter1['partial_sample_indices']
    partial_scorer = RemainingDataScorer(DO, model_iter2, partial_sample_indices)
    partial_scorer.compute_scores(dataloader, n_passes=iter3_analysis_epochs)
    
    # Build anchor data from partial scores
    partial_analysis = partial_scorer.get_analysis()
    
    # Create anchor data structure similar to compute_anchor_data
    anchor_data_iter3 = {
        'anchor_correct_grad': {},
        'anchor_incorrect_grad': {},
        'anchor_similarity_grad': {},
        'use_enriched': {},  # For now, use gradient-only for simplicity
        'input_cols': inpt_vars,
        'partial_correct_gids': anchor_data_iter1['partial_correct_gids'],
        'blacklisted_gids': anchor_data_iter1['blacklisted_gids'],
        'partial_sample_indices': partial_sample_indices,
    }
    
    # Compute anchors per class
    for class_id in range(hyp_per_sample):
        correct_grads = []
        incorrect_grads = []
        
        for gid in anchor_data_iter1['partial_correct_gids']:
            if gid in partial_analysis and DO.df_train_hypothesis.iloc[gid]['hyp_class_id'] == class_id:
                if partial_analysis[gid]['avg_gradient'] is not None:
                    correct_grads.append(partial_analysis[gid]['avg_gradient'])
        
        for gid in anchor_data_iter1['blacklisted_gids']:
            if gid in partial_analysis and DO.df_train_hypothesis.iloc[gid]['hyp_class_id'] == class_id:
                if partial_analysis[gid]['avg_gradient'] is not None:
                    incorrect_grads.append(partial_analysis[gid]['avg_gradient'])
        
        if correct_grads and incorrect_grads:
            anchor_data_iter3['anchor_correct_grad'][class_id] = np.mean(correct_grads, axis=0)
            anchor_data_iter3['anchor_incorrect_grad'][class_id] = np.mean(incorrect_grads, axis=0)
            
            # Compute similarity
            sim = float(np.dot(
                anchor_data_iter3['anchor_correct_grad'][class_id],
                anchor_data_iter3['anchor_incorrect_grad'][class_id]
            ) / (
                np.linalg.norm(anchor_data_iter3['anchor_correct_grad'][class_id]) * 
                np.linalg.norm(anchor_data_iter3['anchor_incorrect_grad'][class_id]) + 1e-8
            ))
            anchor_data_iter3['anchor_similarity_grad'][class_id] = sim
            anchor_data_iter3['use_enriched'][class_id] = False  # Gradient-only for now
    
    print("\nBiased model anchor similarities:")
    for class_id in range(hyp_per_sample):
        sim = anchor_data_iter3['anchor_similarity_grad'].get(class_id, None)
        if sim is not None:
            print(f"  Class {class_id}: grad_sim = {sim:+.3f}")


Computing anchors from biased model on partial data...


Scoring passes: 100%|██████████| 5/5 [00:00<00:00, 18.61it/s]

Scored 84 hypotheses from 28 samples

Biased model anchor similarities:
  Class 0: grad_sim = +1.000
  Class 1: grad_sim = +0.996
  Class 2: grad_sim = -0.968





In [20]:
if not DO.lack_partial_coverage:
    print("\n" + "=" * 70)
    print("ITERATION 3: Combined Loss + Gradient Scoring")
    print("=" * 70)
    
    # Collect scores for each sample in remaining data
    all_scores_iter3 = {}  # sample_idx -> (best_gid, scores_dict)
    
    for sample_idx in remaining_sample_indices:
        start = sample_idx * hyp_per_sample
        best_combined = -np.inf
        best_gid = None
        best_scores = None
        
        for hyp_idx in range(hyp_per_sample):
            gid = start + hyp_idx
            if gid not in analysis_iter3:
                continue
            if analysis_iter3[gid]['avg_gradient'] is None:
                continue
            
            gradient = analysis_iter3[gid]['avg_gradient']
            loss = analysis_iter3[gid]['avg_loss']
            class_id = DO.df_train_hypothesis.iloc[gid]['hyp_class_id']
            features = DO.df_train_hypothesis.loc[gid, inpt_vars].values.astype(np.float64)
            is_correct = DO.df_train_hypothesis.iloc[gid]['correct_hypothesis']
            
            # Compute gradient score
            grad_score = compute_adaptive_score(gradient, features, class_id, anchor_data_iter3)
            
            scores = {
                'grad_score': grad_score,
                'raw_loss': loss,
                'class_id': class_id,
                'is_correct': is_correct
            }
            
            # Store for later normalization
            if best_gid is None or grad_score > best_combined:
                best_combined = grad_score
                best_gid = gid
                best_scores = scores
        
        if best_gid is not None:
            all_scores_iter3[sample_idx] = (best_gid, best_scores)
    
    print(f"Collected scores for {len(all_scores_iter3)} samples")


ITERATION 3: Combined Loss + Gradient Scoring
Collected scores for 787 samples


In [21]:
if not DO.lack_partial_coverage:
    # Normalize and combine scores
    loss_weight = 0.5  # Equal weight to loss and gradient
    
    combined_scores = normalize_and_combine_scores(all_scores_iter3, loss_weight=loss_weight)
    
    # Create selection list sorted by combined score
    all_selections_iter3 = [
        (scores['combined_score'], scores['is_correct'], sample_idx)
        for sample_idx, scores in combined_scores.items()
    ]
    all_selections_iter3.sort(key=lambda x: x[0], reverse=True)
    
    # Analyze precision
    results_iter3_combined, _ = analyze_threshold_precision(
        all_selections_iter3,
        title=f"ITERATION 3: Combined Score Precision (loss_weight={loss_weight})"
    )
    
    iter3_precision_combined = {r['percentile']: r['precision'] for r in results_iter3_combined}

ITERATION 3: Combined Score Precision (loss_weight=0.5)

Precision by Top Percentile (highest scores first):
--------------------------------------------------
Top  10%:   78 samples, precision=57.7%
Top  20%:  157 samples, precision=56.7%
Top  30%:  236 samples, precision=56.8%
Top  40%:  314 samples, precision=53.5%
Top  50%:  393 samples, precision=51.1%
Top  60%:  472 samples, precision=47.9%
Top  70%:  550 samples, precision=46.2%
Top  80%:  629 samples, precision=44.5%
Top  90%:  708 samples, precision=42.5%
Top 100%:  787 samples, precision=41.7%

Precision by Score Bin:
--------------------------------------------------
Score -3.95--3.42:    3 samples, precision=0.0%
Score -3.42--2.89:    2 samples, precision=0.0%
Score -2.89--2.35:    1 samples, precision=100.0%
Score -2.35--1.82:    1 samples, precision=0.0%
Score -1.82--1.29:    1 samples, precision=0.0%
Score -1.29--0.76:    2 samples, precision=100.0%
Score -0.76--0.22:  199 samples, precision=31.7%
Score -0.22-0.31:  359 

In [22]:
if not DO.lack_partial_coverage:
    # Also analyze gradient-only and loss-only for comparison
    
    # Gradient-only
    all_selections_iter3_grad = [
        (scores['grad_score'], scores['is_correct'], sample_idx)
        for sample_idx, scores in combined_scores.items()
    ]
    all_selections_iter3_grad.sort(key=lambda x: x[0], reverse=True)
    
    results_iter3_grad, _ = analyze_threshold_precision(
        all_selections_iter3_grad,
        title="ITERATION 3: Gradient-Only Precision (biased model)"
    )
    iter3_precision_grad = {r['percentile']: r['precision'] for r in results_iter3_grad}
    
    # Loss-only
    all_selections_iter3_loss = [
        (scores['loss_score'], scores['is_correct'], sample_idx)
        for sample_idx, scores in combined_scores.items()
    ]
    all_selections_iter3_loss.sort(key=lambda x: x[0], reverse=True)
    
    results_iter3_loss, _ = analyze_threshold_precision(
        all_selections_iter3_loss,
        title="ITERATION 3: Loss-Only Precision (biased model)"
    )
    iter3_precision_loss = {r['percentile']: r['precision'] for r in results_iter3_loss}

ITERATION 3: Gradient-Only Precision (biased model)

Precision by Top Percentile (highest scores first):
--------------------------------------------------
Top  10%:   78 samples, precision=59.0%
Top  20%:  157 samples, precision=60.5%
Top  30%:  236 samples, precision=52.1%
Top  40%:  314 samples, precision=49.4%
Top  50%:  393 samples, precision=46.6%
Top  60%:  472 samples, precision=45.6%
Top  70%:  550 samples, precision=43.8%
Top  80%:  629 samples, precision=43.2%
Top  90%:  708 samples, precision=42.5%
Top 100%:  787 samples, precision=41.7%

Precision by Score Bin:
--------------------------------------------------
Score -0.00-0.20:  644 samples, precision=37.3%
Score 1.78-1.98:  142 samples, precision=61.3%
ITERATION 3: Loss-Only Precision (biased model)

Precision by Top Percentile (highest scores first):
--------------------------------------------------
Top  10%:   78 samples, precision=52.6%
Top  20%:  157 samples, precision=51.6%
Top  30%:  236 samples, precision=50.0%
T

---
# Comparison: Iteration 1 vs Iteration 3

In [23]:
if not DO.lack_partial_coverage:
    print("=" * 70)
    print("COMPARISON: Precision Improvement")
    print("=" * 70)
    print("\nNote: Iteration 3 scores the REMAINING 70% (not in top 30% of Iter 1)")
    print("These are the 'harder' samples that Iteration 1 was less confident about.")
    print()
    
    # For fair comparison, we need Iteration 1's precision on the SAME remaining samples
    # Get Iteration 1 scores for remaining samples only
    iter1_remaining = [s for s in all_selections_iter1[n_top:]]  # Already sorted, these are the remaining
    iter1_remaining_for_analysis = [(s[0], s[1], s[2]) for s in iter1_remaining]
    
    results_iter1_remaining, _ = analyze_threshold_precision(
        iter1_remaining_for_analysis,
        title="ITERATION 1: Precision on Remaining 70% (for comparison)",
        verbose=True
    )
    iter1_remaining_precision = {r['percentile']: r['precision'] for r in results_iter1_remaining}
    
    # Print comparison table
    print("\n" + "=" * 70)
    print("PRECISION COMPARISON TABLE (on remaining 70% samples)")
    print("=" * 70)
    print(f"{'Percentile':<12} {'Iter1 (unbiased)':<18} {'Iter3 Grad':<15} {'Iter3 Loss':<15} {'Iter3 Combined':<15}")
    print("-" * 75)
    
    for pct in [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]:
        p1 = iter1_remaining_precision.get(pct, 0) * 100
        p3g = iter3_precision_grad.get(pct, 0) * 100
        p3l = iter3_precision_loss.get(pct, 0) * 100
        p3c = iter3_precision_combined.get(pct, 0) * 100
        
        # Highlight improvement
        best = max(p3g, p3l, p3c)
        improvement = best - p1
        marker = " *" if improvement > 2 else ""
        
        print(f"Top {pct:>3}%      {p1:>6.1f}%           {p3g:>6.1f}%        {p3l:>6.1f}%        {p3c:>6.1f}%{marker}")
    
    print("\n* = >2% improvement over Iteration 1")

COMPARISON: Precision Improvement

Note: Iteration 3 scores the REMAINING 70% (not in top 30% of Iter 1)
These are the 'harder' samples that Iteration 1 was less confident about.

ITERATION 1: Precision on Remaining 70% (for comparison)

Precision by Top Percentile (highest scores first):
--------------------------------------------------
Top  10%:   78 samples, precision=64.1%
Top  20%:  157 samples, precision=55.4%
Top  30%:  236 samples, precision=47.5%
Top  40%:  314 samples, precision=41.4%
Top  50%:  393 samples, precision=39.9%
Top  60%:  472 samples, precision=41.1%
Top  70%:  550 samples, precision=41.8%
Top  80%:  629 samples, precision=42.9%
Top  90%:  708 samples, precision=41.7%
Top 100%:  787 samples, precision=40.2%

Precision by Score Bin:
--------------------------------------------------
Score -0.85--0.56:    8 samples, precision=37.5%
Score -0.56--0.28:   24 samples, precision=20.8%
Score -0.28-0.00:   35 samples, precision=25.7%
Score 0.00-0.28:   40 samples, precis

In [24]:
# Agreement analysis: both methods select same hypothesis
if not DO.lack_partial_coverage:
    # Build lookup for Iter1: sample_idx -> selected gid
    iter1_gid_lookup = {s[2]: s[3] for s in iter1_remaining}  # sample_idx -> gid
    
    # Build lookup for Iter3: sample_idx -> selected gid
    iter3_gid_lookup = {sample_idx: scores['gid'] for sample_idx, scores in combined_scores.items()}
    
    # Find samples where both methods agree
    agreed_samples = []
    disagreed_samples = []
    
    for sample_idx in remaining_sample_indices:
        iter1_gid = iter1_gid_lookup.get(sample_idx)
        iter3_gid = iter3_gid_lookup.get(sample_idx)
        
        if iter1_gid is not None and iter3_gid is not None:
            if iter1_gid == iter3_gid:
                # Both methods selected same hypothesis
                is_correct = DO.df_train_hypothesis.iloc[iter1_gid]['correct_hypothesis']
                agreed_samples.append((sample_idx, iter1_gid, is_correct))
            else:
                # Methods disagree
                iter1_correct = DO.df_train_hypothesis.iloc[iter1_gid]['correct_hypothesis']
                iter3_correct = DO.df_train_hypothesis.iloc[iter3_gid]['correct_hypothesis']
                disagreed_samples.append((sample_idx, iter1_gid, iter3_gid, iter1_correct, iter3_correct))
    
    # Calculate precision
    n_agreed = len(agreed_samples)
    n_agreed_correct = sum(1 for s in agreed_samples if s[2])
    agreed_precision = n_agreed_correct / n_agreed * 100 if n_agreed > 0 else 0
    
    n_disagreed = len(disagreed_samples)
    n_iter1_correct_disagree = sum(1 for s in disagreed_samples if s[3])
    n_iter3_correct_disagree = sum(1 for s in disagreed_samples if s[4])
    
    print("=" * 70)
    print("AGREEMENT ANALYSIS: When Iter1 AND Iter3 select same hypothesis")
    print("=" * 70)
    print(f"\nTotal remaining samples: {len(remaining_sample_indices)}")
    print(f"\nAGREED (same hypothesis):")
    print(f"  Count: {n_agreed} ({n_agreed/len(remaining_sample_indices)*100:.1f}%)")
    print(f"  Correct: {n_agreed_correct}")
    print(f"  PRECISION: {agreed_precision:.1f}%")
    print(f"\nDISAGREED (different hypothesis):")
    print(f"  Count: {n_disagreed} ({n_disagreed/len(remaining_sample_indices)*100:.1f}%)")
    print(f"  Iter1 correct: {n_iter1_correct_disagree} ({n_iter1_correct_disagree/n_disagreed*100:.1f}%)")
    print(f"  Iter3 correct: {n_iter3_correct_disagree} ({n_iter3_correct_disagree/n_disagreed*100:.1f}%)")

AGREEMENT ANALYSIS: When Iter1 AND Iter3 select same hypothesis

Total remaining samples: 787

AGREED (same hypothesis):
  Count: 294 (37.4%)
  Correct: 144
  PRECISION: 49.0%

DISAGREED (different hypothesis):
  Count: 493 (62.6%)
  Iter1 correct: 172 (34.9%)
  Iter3 correct: 184 (37.3%)


In [25]:
# Confidence-ranked agreement analysis
if not DO.lack_partial_coverage and n_agreed > 0:
    # For agreed samples, compute combined confidence score
    # Need to get Iter1 scores for the agreed samples
    iter1_score_lookup = {s[2]: s[0] for s in iter1_remaining}  # sample_idx -> score

    # Compute combined score for each agreed sample
    agreed_with_scores = []
    for sample_idx, gid, is_correct in agreed_samples:
        iter1_score = iter1_score_lookup.get(sample_idx, 0)
        iter3_score = combined_scores[sample_idx]['combined_score']
        combined_conf = iter1_score + iter3_score
        agreed_with_scores.append((combined_conf, is_correct, sample_idx))

    # Sort by combined confidence (descending)
    agreed_with_scores.sort(key=lambda x: x[0], reverse=True)

    # Analyze precision at different percentiles
    print("=" * 70)
    print("CONFIDENCE-RANKED AGREEMENT: High confidence agreement = better precision?")
    print("=" * 70)
    print(f"\nTotal agreed samples: {n_agreed}")
    print(f"Overall agreement precision: {agreed_precision:.1f}%")
    print(f"\nPrecision by confidence percentile (among agreed samples):")
    print("-" * 50)

    for pct in [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]:
        n_include = max(1, int(len(agreed_with_scores) * pct / 100))
        top = agreed_with_scores[:n_include]
        n_correct = sum(1 for s in top if s[1])
        prec = n_correct / n_include * 100
        print(f"Top {pct:>3}% confidence: {n_include:>4} samples, precision={prec:.1f}%")

CONFIDENCE-RANKED AGREEMENT: High confidence agreement = better precision?

Total agreed samples: 294
Overall agreement precision: 49.0%

Precision by confidence percentile (among agreed samples):
--------------------------------------------------
Top  10% confidence:   29 samples, precision=51.7%
Top  20% confidence:   58 samples, precision=53.4%
Top  30% confidence:   88 samples, precision=53.4%
Top  40% confidence:  117 samples, precision=59.0%
Top  50% confidence:  147 samples, precision=55.8%
Top  60% confidence:  176 samples, precision=53.4%
Top  70% confidence:  205 samples, precision=49.8%
Top  80% confidence:  235 samples, precision=46.8%
Top  90% confidence:  264 samples, precision=47.7%
Top 100% confidence:  294 samples, precision=49.0%


In [26]:
# =============================================================================
# PRUNING APPROACH: Use biased model to remove likely-wrong samples from Iter1's top 30%
# =============================================================================
if not DO.lack_partial_coverage:
    print("=" * 70)
    print("PRUNING: Score Iter1's top 30% with biased model, remove lowest confidence")
    print("=" * 70)
    
    # Score Iter1's top 30% samples with the biased model
    print(f"\nScoring Iter1's top {top_percentile}% ({n_top} samples) with biased model...")
    
    # Create scorer for top samples
    top_scorer = RemainingDataScorer(DO, model_iter2, top_sample_indices)
    top_scorer.compute_scores(dataloader, n_passes=iter3_analysis_epochs)
    top_analysis = top_scorer.get_analysis()
    
    # Also need to score partial data with biased model to get anchors (reuse from before)
    # anchor_data_iter3 was already computed
    
    # Compute combined scores for each top sample
    top_scores = []
    for sample_idx in top_sample_indices:
        start = sample_idx * hyp_per_sample
        
        # Find the gid that Iter1 selected for this sample
        iter1_selected_gid = None
        for s in top_selections:
            if s[2] == sample_idx:
                iter1_selected_gid = s[3]
                break
        
        if iter1_selected_gid is None or iter1_selected_gid not in top_analysis:
            continue
        
        gid = iter1_selected_gid
        if top_analysis[gid]['avg_gradient'] is None:
            continue
            
        gradient = top_analysis[gid]['avg_gradient']
        loss = top_analysis[gid]['avg_loss']
        class_id = DO.df_train_hypothesis.iloc[gid]['hyp_class_id']
        features = DO.df_train_hypothesis.loc[gid, inpt_vars].values.astype(np.float64)
        is_correct = DO.df_train_hypothesis.iloc[gid]['correct_hypothesis']
        
        # Compute gradient score using biased model anchors
        grad_score = compute_adaptive_score(gradient, features, class_id, anchor_data_iter3)
        
        # Use negative loss as score (lower loss = higher score)
        loss_score = -loss
        
        # Combined score (can tune weights)
        combined = 0.5 * grad_score + 0.5 * loss_score
        
        top_scores.append({
            'sample_idx': sample_idx,
            'gid': gid,
            'is_correct': is_correct,
            'grad_score': grad_score,
            'loss': loss,
            'combined_score': combined
        })
    
    print(f"Scored {len(top_scores)} samples")
    
    # Sort by combined score (ascending - lowest scores are most likely wrong)
    top_scores_sorted = sorted(top_scores, key=lambda x: x['combined_score'])
    
    # Analyze precision after removing bottom N%
    print(f"\nOriginal top {top_percentile}%: {n_top} samples, {precision_top*100:.1f}% precision")
    print(f"\nPrecision after REMOVING lowest-confidence samples:")
    print("-" * 60)
    print(f"{'Remove':<10} {'Remaining':<12} {'Correct':<10} {'Precision':<12} {'Change':<10}")
    print("-" * 60)
    
    for remove_pct in [0, 10, 20, 30, 40, 50]:
        n_remove = int(len(top_scores_sorted) * remove_pct / 100)
        remaining = top_scores_sorted[n_remove:]  # Remove lowest scores
        
        n_remaining = len(remaining)
        n_correct = sum(1 for s in remaining if s['is_correct'])
        prec = n_correct / n_remaining * 100 if n_remaining > 0 else 0
        change = prec - precision_top * 100
        
        marker = " **" if change > 5 else ""
        print(f"{remove_pct:>3}%       {n_remaining:<12} {n_correct:<10} {prec:>6.1f}%      {change:>+6.1f}pp{marker}")
    
    print("\n** = >5pp improvement over original")

PRUNING: Score Iter1's top 30% with biased model, remove lowest confidence

Scoring Iter1's top 30% (336 samples) with biased model...


Scoring passes: 100%|██████████| 5/5 [00:01<00:00,  3.07it/s]

Scored 1008 hypotheses from 336 samples
Scored 336 samples

Original top 30%: 336 samples, 67.6% precision

Precision after REMOVING lowest-confidence samples:
------------------------------------------------------------
Remove     Remaining    Correct    Precision    Change    
------------------------------------------------------------
  0%       336          227          67.6%        +0.0pp
 10%       303          210          69.3%        +1.7pp
 20%       269          194          72.1%        +4.6pp
 30%       236          173          73.3%        +5.7pp **
 40%       202          148          73.3%        +5.7pp **
 50%       168          125          74.4%        +6.8pp **

** = >5pp improvement over original





In [None]:
if not DO.lack_partial_coverage:
    # Plot comparison
    fig, axes = plt.subplots(1, 2, figsize=(14, 5))
    
    percentiles = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
    
    # Plot 1: Precision comparison
    ax1 = axes[0]
    ax1.plot(percentiles, [iter1_remaining_precision.get(p, 0)*100 for p in percentiles], 
             'b-o', label='Iter1 (unbiased)', linewidth=2)
    ax1.plot(percentiles, [iter3_precision_grad.get(p, 0)*100 for p in percentiles], 
             'g--s', label='Iter3 Gradient', linewidth=2)
    ax1.plot(percentiles, [iter3_precision_loss.get(p, 0)*100 for p in percentiles], 
             'r--^', label='Iter3 Loss', linewidth=2)
    ax1.plot(percentiles, [iter3_precision_combined.get(p, 0)*100 for p in percentiles], 
             'm-*', label='Iter3 Combined', linewidth=2, markersize=10)
    ax1.axhline(y=100/hyp_per_sample, color='gray', linestyle=':', label='Random baseline')
    ax1.set_xlabel('Top Percentile')
    ax1.set_ylabel('Precision (%)')
    ax1.set_title('Precision on Remaining 70% Samples')
    ax1.legend()
    ax1.grid(True, alpha=0.3)
    
    # Plot 2: Improvement over Iteration 1
    ax2 = axes[1]
    improvement_grad = [iter3_precision_grad.get(p, 0)*100 - iter1_remaining_precision.get(p, 0)*100 for p in percentiles]
    improvement_loss = [iter3_precision_loss.get(p, 0)*100 - iter1_remaining_precision.get(p, 0)*100 for p in percentiles]
    improvement_combined = [iter3_precision_combined.get(p, 0)*100 - iter1_remaining_precision.get(p, 0)*100 for p in percentiles]
    
    ax2.bar([p-2 for p in percentiles], improvement_grad, width=2, label='Gradient', alpha=0.7)
    ax2.bar([p for p in percentiles], improvement_loss, width=2, label='Loss', alpha=0.7)
    ax2.bar([p+2 for p in percentiles], improvement_combined, width=2, label='Combined', alpha=0.7)
    ax2.axhline(y=0, color='black', linestyle='-', linewidth=0.5)
    ax2.set_xlabel('Top Percentile')
    ax2.set_ylabel('Precision Improvement (pp)')
    ax2.set_title('Improvement over Iteration 1')
    ax2.legend()
    ax2.grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.savefig(f'{results_path}/precision_comparison.png', dpi=150, bbox_inches='tight')
    plt.show()
    
    print(f"\nPlot saved to {results_path}/precision_comparison.png")

---
# Summary

In [None]:
if not DO.lack_partial_coverage:
    print("=" * 70)
    print("ITERATIVE APPROACH SUMMARY")
    print("=" * 70)
    
    print(f"\nIteration 1: Unbiased Training")
    print(f"  - Trained on ALL hypotheses for {iter1_epochs} epochs")
    print(f"  - Top {top_percentile}% precision: {iter1_precision.get(top_percentile, 0)*100:.1f}%")
    print(f"  - Overall precision: {iter1_precision.get(100, 0)*100:.1f}%")
    
    print(f"\nIteration 2: Biased Training")
    print(f"  - Trained on top {top_percentile}% ({n_top} samples) + partial ({n_partial} samples)")
    print(f"  - Partial weight: {partial_weight:.2f}x")
    print(f"  - Training set was ~{precision_top*100:.1f}% correct (top selections) + 100% correct (partial)")
    
    print(f"\nIteration 3: Score Remaining Data")
    print(f"  - Scored {len(remaining_sample_indices)} remaining samples with biased model")
    print(f"  - Best method at top 30%:")
    
    p1_30 = iter1_remaining_precision.get(30, 0) * 100
    p3g_30 = iter3_precision_grad.get(30, 0) * 100
    p3l_30 = iter3_precision_loss.get(30, 0) * 100
    p3c_30 = iter3_precision_combined.get(30, 0) * 100
    
    print(f"    Iter1 unbiased: {p1_30:.1f}%")
    print(f"    Iter3 gradient: {p3g_30:.1f}% ({p3g_30 - p1_30:+.1f}pp)")
    print(f"    Iter3 loss:     {p3l_30:.1f}% ({p3l_30 - p1_30:+.1f}pp)")
    print(f"    Iter3 combined: {p3c_30:.1f}% ({p3c_30 - p1_30:+.1f}pp)")
    
    best_improvement = max(p3g_30, p3l_30, p3c_30) - p1_30
    if best_improvement > 0:
        print(f"\n  --> IMPROVEMENT: +{best_improvement:.1f} percentage points at top 30%")
    else:
        print(f"\n  --> No improvement at top 30% (need to investigate)")

## Next Steps

If Iteration 3 shows improvement:
1. **Continue iterating**: Add high-confidence Iter3 selections to training, retrain, repeat
2. **Tune loss_weight**: Try different weightings of loss vs gradient
3. **Final GGH**: Use the improved selections to feed into full GGH training and measure R2

If Iteration 3 shows no improvement:
1. **Investigate**: Check if biased model is truly biased (loss distribution on partial data)
2. **Try different percentiles**: Maybe top 20% or top 40% works better
3. **Alternative scoring**: Try other combinations of signals