# IOI Example Submission - DAS Training for Attention Heads

This notebook demonstrates how to train DAS (Distributed Alignment Search) on attention heads for the IOI (Indirect Object Identification) task using a Gemma model.

## Overview
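1. Load IOI datasets and set up the model
2. Filter the datasets to examples the model answers correctly
3. Load linear parameters for the causal model
4. Train DAS featurizers on selected attention heads
5. Save trained models in submission format
6. Reload the saved featurizers and run inference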

In [None]:
import sys
from pathlib import Path
sys.path.append(str(Path().resolve()))

from tasks.IOI_task.ioi_task import get_causal_model, get_counterfactual_datasets, get_token_positions
from CausalAbstraction.experiments.attention_head_experiment import PatchAttentionHeads
from CausalAbstraction.experiments.filter_experiment import FilterExperiment
from CausalAbstraction.experiments.aggregate_experiments import attention_head_baselines
from baselines.ioi_baselines.ioi_utils import (
    log_diff, clear_memory, checker, filter_checker, custom_loss, 
    ioi_loss_and_metric_fn, setup_pipeline
)
import torch
import gc
import json
import os
import numpy as np
from sklearn.linear_model import LinearRegression

# Clear memory before starting
gc.collect()
torch.cuda.empty_cache()

# Set device
device = "cuda:1" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")

# Model configuration
model_name = "gemma"  # Will use google/gemma-2-2b
heads_list = [(7, 6), (8, 1)]

print(f"Model: {model_name}")
print(f"Heads to train: {heads_list}")

nnsight is not detected. Please install via 'pip install nnsight' for nnsight backend.
Using device: cuda:1
Model: gemma
Heads to train: [(7, 6), (8, 1)]


## Step 1: Load Data and Setup Model

In [2]:
# Get counterfactual datasets with placeholder causal model
# We'll update the causal model with learned parameters later
causal_model = get_causal_model({"bias": 0.0, "token_coeff": 0.0, "position_coeff": 0.0})
counterfactual_datasets = get_counterfactual_datasets(hf=True, size=1000, load_private_data=True)

print("Available datasets:", counterfactual_datasets.keys())

# Get a sample to display
sample_dataset = next(iter(counterfactual_datasets.values()))
if len(sample_dataset) > 0:
    sample = sample_dataset[0]
    print("\nSample input:")
    print(f"  Raw input: {sample['input']['raw_input']}")
    print(f"  Name A: {sample['input']['name_A']}")
    print(f"  Name B: {sample['input']['name_B']}")
    print(f"  Name C: {sample['input']['name_C']}")

Using the latest cached version of the dataset since mib-bench/ioi_private_test couldn't be found on the Hugging Face Hub
Found the latest cached dataset configuration 'default' at /home/atticus/.cache/huggingface/datasets/mib-bench___ioi_private_test/default/0.0.0/3513057605230348398b751d59dfeb581c115922 (last modified on Fri May 30 21:55:12 2025).


Available datasets: dict_keys(['s1_io_flip_train', 's2_io_flip_train', 's1_ioi_flip_s2_ioi_flip_train', 's1_io_flip_test', 's2_io_flip_test', 's1_ioi_flip_s2_ioi_flip_test', 's1_io_flip_testprivate', 's2_io_flip_testprivate', 's1_ioi_flip_s2_ioi_flip_testprivate', 'same_train', 'same_test', 'same_testprivate'])

Sample input:
  Raw input: As Carl and Maria left the consulate, Carl gave a fridge to
  Name A: Carl
  Name B: Maria
  Name C: Carl
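
The sample above follows the usual IOI pattern: two names are introduced, one of them is repeated as the subject of the second clause, and the expected completion is the other name. A minimal sketch of that rule is shown here. The helper `indirect_object` is hypothetical and is not part of `ioi_task`; it only assumes that `name_C` always repeats either `name_A` or `name_B`.

```python
def indirect_object(name_a: str, name_b: str, name_c: str) -> str:
    """Return the name that is NOT repeated as the second-clause subject."""
    if name_c == name_a:
        return name_b
    if name_c == name_b:
        return name_a
    raise ValueError("name_C must repeat either name_A or name_B")


# Fields copied from the sample printed above.
sample_names = {"name_A": "Carl", "name_B": "Maria", "name_C": "Carl"}
answer = indirect_object(
    sample_names["name_A"], sample_names["name_B"], sample_names["name_C"]
)
print(answer)  # Maria
```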


In [3]:
# Set up pipeline
pipeline, default_batch_size = setup_pipeline(model_name, device, eval_batch_size=None)
batch_size = 128  # You can adjust this based on your GPU memory
eval_batch_size = 1024

print(f"Pipeline device: {pipeline.model.device}")
print(f"Model: {pipeline.model.__class__.__name__}")
print(f"Hidden size: {pipeline.model.config.hidden_size}")
print(f"Number of layers: {pipeline.get_num_layers()}")

# Test model on a sample
if len(sample_dataset) > 0:
    sample = sample_dataset[0]
    print("\nTesting model on sample:")
    print(f"INPUT: {sample['input']['raw_input']}")
    expected = causal_model.run_forward(sample['input'])['raw_output']
    print(f"EXPECTED OUTPUT: {expected}")
    print(f"MODEL PREDICTION: {pipeline.dump(pipeline.generate(sample['input']['raw_input']))}")

Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]

You have set `use_cache` to `False`, but cache_implementation is set to hybrid. cache_implementation will have no effect.


Pipeline device: cuda:1
Model: Gemma2ForCausalLM
Hidden size: 2304
Number of layers: 26

Testing model on sample:
INPUT: As Carl and Maria left the consulate, Carl gave a fridge to
EXPECTED OUTPUT: Maria
MODEL PREDICTION:  Maria


## Step 2: Filter Datasets Based on Model Performance

In [4]:
# Filter the datasets
print("Filtering datasets based on model performance...")
exp = FilterExperiment(pipeline, causal_model, filter_checker)
filtered_datasets = exp.filter(counterfactual_datasets, verbose=True, batch_size=eval_batch_size)

# Get token positions
token_positions = get_token_positions(pipeline, causal_model)
print(f"\nToken positions: {[pos.id for pos in token_positions]}")

Filtering datasets based on model performance...


Filtering s1_io_flip_train: 100%|██████████| 1/1 [00:05<00:00,  5.25s/it]


Dataset 's1_io_flip_train': kept 866/1000 examples (86.6%)


Filtering s2_io_flip_train: 100%|██████████| 1/1 [00:05<00:00,  5.13s/it]


Dataset 's2_io_flip_train': kept 855/1000 examples (85.5%)


Filtering s1_ioi_flip_s2_ioi_flip_train: 100%|██████████| 1/1 [00:05<00:00,  5.14s/it]


Dataset 's1_ioi_flip_s2_ioi_flip_train': kept 864/1000 examples (86.4%)


Filtering s1_io_flip_test: 100%|██████████| 1/1 [00:05<00:00,  5.09s/it]


Dataset 's1_io_flip_test': kept 775/1000 examples (77.5%)


Filtering s2_io_flip_test: 100%|██████████| 1/1 [00:05<00:00,  5.11s/it]


Dataset 's2_io_flip_test': kept 765/1000 examples (76.5%)


Filtering s1_ioi_flip_s2_ioi_flip_test: 100%|██████████| 1/1 [00:05<00:00,  5.11s/it]


Dataset 's1_ioi_flip_s2_ioi_flip_test': kept 772/1000 examples (77.2%)


Filtering s1_io_flip_testprivate: 100%|██████████| 1/1 [00:05<00:00,  5.14s/it]


Dataset 's1_io_flip_testprivate': kept 880/1000 examples (88.0%)


Filtering s2_io_flip_testprivate: 100%|██████████| 1/1 [00:05<00:00,  5.14s/it]


Dataset 's2_io_flip_testprivate': kept 876/1000 examples (87.6%)


Filtering s1_ioi_flip_s2_ioi_flip_testprivate: 100%|██████████| 1/1 [00:05<00:00,  5.23s/it]


Dataset 's1_ioi_flip_s2_ioi_flip_testprivate': kept 880/1000 examples (88.0%)


Filtering same_train: 100%|██████████| 1/1 [00:05<00:00,  5.13s/it]


Dataset 'same_train': kept 906/1000 examples (90.6%)


Filtering same_test: 100%|██████████| 1/1 [00:05<00:00,  5.19s/it]


Dataset 'same_test': kept 824/1000 examples (82.4%)


Filtering same_testprivate: 100%|██████████| 1/1 [00:05<00:00,  5.14s/it]

Dataset 'same_testprivate': kept 914/1000 examples (91.4%)

Total filtering results:
Original examples: 12000
Kept examples: 10177
Overall keep rate: 84.8%

Token positions: ['all']
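
Conceptually, the filtering step keeps only the examples on which the language model's prediction agrees with the causal model's expected output. The sketch below illustrates that idea with toy stand-ins. `filter_dataset`, the substring match, and the toy data are all assumptions made for illustration; the actual `FilterExperiment` and `filter_checker` logic may differ.

```python
from typing import Callable, List, Tuple


def filter_dataset(
    examples: List[dict],
    predict: Callable[[str], str],
    expected: Callable[[dict], str],
) -> Tuple[List[dict], float]:
    """Keep examples whose prediction contains the expected answer."""
    kept = [ex for ex in examples if expected(ex) in predict(ex["raw_input"])]
    keep_rate = len(kept) / len(examples) if examples else 0.0
    return kept, keep_rate


# Toy stand-ins for the language model and the causal model (hypothetical).
toy_examples = [
    {"raw_input": "A and B met. A gave it to", "answer": "B"},
    {"raw_input": "C and D met. D gave it to", "answer": "C"},
    {"raw_input": "E and F met. E gave it to", "answer": "F"},
]
toy_predictions = {
    toy_examples[0]["raw_input"]: " B",
    toy_examples[1]["raw_input"]: " D",  # deliberately wrong
    toy_examples[2]["raw_input"]: " F",
}

kept, keep_rate = filter_dataset(
    toy_examples,
    predict=lambda text: toy_predictions[text],
    expected=lambda ex: ex["answer"],
)
print(f"kept {len(kept)}/{len(toy_examples)} examples ({keep_rate:.1%})")
```

The same arithmetic applies to the totals reported above: 10177 kept out of 12000 gives the 84.8% overall keep rate.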





## Step 3: Load Linear Parameters

IOI requires linear parameters (`bias`, `token_coeff`, `position_coeff`) for the causal mechanism. This notebook does not fit them; it loads previously learned values from `baselines/ioi_linear_params.json`.
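
For intuition about what the three parameters represent, here is a minimal sketch of fitting them by ordinary least squares on synthetic data. It assumes, purely for illustration, that the target is a logit difference modeled as `bias + token_coeff * token_signal + position_coeff * position_signal` with signals encoded as ±1. That encoding, the toy "true" values, and the use of `numpy.linalg.lstsq` are assumptions, not the procedure that produced the stored coefficients.

```python
import numpy as np

# Hypothetical ground-truth values used only to generate synthetic data.
true_params = {"bias": 0.5, "token_coeff": 1.0, "position_coeff": 2.0}

# Cover all four sign combinations so the design matrix has full rank.
token_signal = np.tile([-1.0, -1.0, 1.0, 1.0], 25)
position_signal = np.tile([-1.0, 1.0, -1.0, 1.0], 25)

# Noise-free synthetic logit differences under the assumed linear model.
logit_diff = (
    true_params["bias"]
    + true_params["token_coeff"] * token_signal
    + true_params["position_coeff"] * position_signal
)

# Design matrix: a column of ones for the bias, then the two signals.
X = np.column_stack([np.ones_like(token_signal), token_signal, position_signal])
solution, *_ = np.linalg.lstsq(X, logit_diff, rcond=None)

fitted_params = dict(
    zip(["bias", "token_coeff", "position_coeff"], (float(v) for v in solution))
)
print(fitted_params)
```

Because the synthetic data is noise-free, the fit recovers the generating values up to floating-point error; with real model outputs the coefficients would only be estimates.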

In [5]:
# Load linear parameters from external file
linear_params_file = "baselines/ioi_linear_params.json"

print(f"Loading linear parameters from: {linear_params_file}")
try:
    with open(linear_params_file, 'r') as f:
        all_coeffs = json.load(f)
except (OSError, json.JSONDecodeError) as e:
    # Covers a missing or unreadable file as well as malformed JSON
    raise ValueError(f"Failed to load linear_params from {linear_params_file}: {e}") from e

# Find the coefficients for this model
if model_name in all_coeffs:
    coeffs = all_coeffs[model_name]
elif "default" in all_coeffs:
    coeffs = all_coeffs["default"]
else:
    # Use the first available coefficients
    coeffs = next(iter(all_coeffs.values()))

# Validate required keys
required_keys = ['bias', 'token_coeff', 'position_coeff']
for key in required_keys:
    if key not in coeffs:
        raise ValueError(f"Missing required key '{key}' in linear_coeffs for model {model_name}")

intercept = coeffs['bias']
token_coef = coeffs['token_coeff']
position_coef = coeffs['position_coeff']

print(f"Using coefficients for {model_name}:")
print(f"  bias: {intercept}")
print(f"  token_coeff: {token_coef}")
print(f"  position_coeff: {position_coef}")

# Store parameters
linear_params = {
    "bias": float(intercept),
    "token_coeff": float(token_coef),
    "position_coeff": float(position_coef)
}

Loading linear parameters from: baselines/ioi_linear_params.json
Using coefficients for gemma:
  bias: 0.04835902899503708
  token_coeff: 0.767971899360421
  position_coeff: 2.004627879709005


## Step 4: Update Causal Model with Learned Parameters

In [6]:
# Update the causal model with learned parameters
causal_model = get_causal_model(linear_params)
print("Causal model updated with learned parameters")

# Clear memory before training
clear_memory()

Causal model updated with learned parameters


## Step 5: Prepare Training and Test Data

In [10]:
# Setup counterfactual names for IOI
counterfactuals = ["s1_io_flip", "s2_io_flip", "s1_ioi_flip_s2_ioi_flip"]
train_data = {}
test_data = {}

for counterfactual in counterfactuals:
    # Skip any "same" datasets; only the counterfactual datasets are used here
    if "same" in counterfactual:
        continue
    if counterfactual + "_train" in filtered_datasets:
        train_data[counterfactual + "_train"] = filtered_datasets[counterfactual + "_train"]
    if counterfactual + "_test" in filtered_datasets:
        test_data[counterfactual + "_test"] = filtered_datasets[counterfactual + "_test"]
    if counterfactual + "_testprivate" in filtered_datasets:
        test_data[counterfactual + "_testprivate"] = filtered_datasets[counterfactual + "_testprivate"]

print("Train datasets:", list(train_data.keys()))
print("Test datasets:", list(test_data.keys()))

Train datasets: ['s1_io_flip_train', 's2_io_flip_train', 's1_ioi_flip_s2_ioi_flip_train']
Test datasets: ['s1_io_flip_test', 's1_io_flip_testprivate', 's2_io_flip_test', 's2_io_flip_testprivate', 's1_ioi_flip_s2_ioi_flip_test', 's1_ioi_flip_s2_ioi_flip_testprivate']


## Step 6: Configure Experiment Settings

In [11]:
# Setup experiment configuration for DAS
config = {
    "evaluation_batch_size": eval_batch_size,
    "batch_size": batch_size, 
    "training_epoch": 2,  # Number of training epochs
    "check_raw": True,
    "n_features": 32,  # Feature dimension for DAS
    "regularization_coefficient": 0.0, 
    "output_scores": True, 
    "shuffle": True, 
    "temperature_schedule": (1.0, 0.01),  # Temperature annealing for training
    "init_lr": 1.0,
    "loss_and_metric_fn": lambda pipeline, intervenable_model, batch, model_units_list: 
        ioi_loss_and_metric_fn(pipeline, intervenable_model, batch, model_units_list),
}

# Setup directories for saving results
results_dir = "ioi_submission_results"
model_dir = "ioi_submission"

os.makedirs(results_dir, exist_ok=True)
os.makedirs(model_dir, exist_ok=True)

print(f"Results will be saved to: {results_dir}")
print(f"Models will be saved to: {model_dir}")

Results will be saved to: ioi_submission_results
Models will be saved to: ioi_submission


## Step 7: Train DAS on Output Position Variable

In [12]:
# Train DAS for output_position variable
print("\nTraining DAS for output_position variable...")
print("="*60)

# Workaround for an in-place operation error during training:
# put the model in training mode and disable the KV cache
pipeline.model.train()
if hasattr(pipeline.model.config, 'use_cache'):
    pipeline.model.config.use_cache = False

target_variable = "output_position"
position_model_dir = os.path.join(model_dir, f"ioi_task_{pipeline.model.__class__.__name__}_{target_variable}")

attention_head_baselines(
    pipeline=pipeline, 
    task=causal_model, 
    token_positions=token_positions, 
    train_data=train_data, 
    test_data=test_data, 
    config=config, 
    target_variables=[target_variable], 
    checker=lambda logits, params: checker(logits, params, pipeline), 
    verbose=True, 
    results_dir=results_dir,
    model_dir=position_model_dir,
    heads_list=heads_list,
    skip=["full_vector", "DBM+SVD", "DBM+PCA", "DBM", "DBM+SAE"]  # Only run DAS
)

print(f"\nDAS training completed for {target_variable}")
print(f"Models saved to: {position_model_dir}")

# Clear memory
clear_memory()


Training DAS for output_position variable...
Running DAS method...


Epoch: 0: 100%|██████████| 21/21 [00:22<00:00,  1.09s/it, loss=155, mse=157, rmse=12.5]
Epoch: 1: 100%|██████████| 21/21 [00:23<00:00,  1.10s/it, loss=151, mse=151, rmse=12.3]
Epoch: 100%|██████████| 2/2 [00:46<00:00, 23.03s/it]


Running interventions for s1_io_flip_test with model units [[AtomicModelUnit(id='AttentionHead(Layer:7,Head:6,Token:all)'), AtomicModelUnit(id='AttentionHead(Layer:8,Head:1,Token:all)')]]


Processing batches: 100%|██████████| 1/1 [00:05<00:00,  5.54s/it]


Running interventions for s1_io_flip_testprivate with model units [[AtomicModelUnit(id='AttentionHead(Layer:7,Head:6,Token:all)'), AtomicModelUnit(id='AttentionHead(Layer:8,Head:1,Token:all)')]]


Processing batches: 100%|██████████| 1/1 [00:06<00:00,  6.45s/it]


Running interventions for s2_io_flip_test with model units [[AtomicModelUnit(id='AttentionHead(Layer:7,Head:6,Token:all)'), AtomicModelUnit(id='AttentionHead(Layer:8,Head:1,Token:all)')]]


Processing batches: 100%|██████████| 1/1 [00:05<00:00,  5.47s/it]


Running interventions for s2_io_flip_testprivate with model units [[AtomicModelUnit(id='AttentionHead(Layer:7,Head:6,Token:all)'), AtomicModelUnit(id='AttentionHead(Layer:8,Head:1,Token:all)')]]


Processing batches: 100%|██████████| 1/1 [00:06<00:00,  6.33s/it]


Running interventions for s1_ioi_flip_s2_ioi_flip_test with model units [[AtomicModelUnit(id='AttentionHead(Layer:7,Head:6,Token:all)'), AtomicModelUnit(id='AttentionHead(Layer:8,Head:1,Token:all)')]]


Processing batches: 100%|██████████| 1/1 [00:05<00:00,  5.50s/it]


Running interventions for s1_ioi_flip_s2_ioi_flip_testprivate with model units [[AtomicModelUnit(id='AttentionHead(Layer:7,Head:6,Token:all)'), AtomicModelUnit(id='AttentionHead(Layer:8,Head:1,Token:all)')]]


Processing batches: 100%|██████████| 1/1 [00:06<00:00,  6.43s/it]



DAS training completed for output_position
Models saved to: ioi_submission/ioi_task_Gemma2ForCausalLM_output_position
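
The progress bars in the training output report `mse` and `rmse`, and the two are related by a square root (for example, an `mse` of 157 corresponds to an `rmse` of about 12.5). The sketch below shows one plausible way such metrics could be computed for IOI: take the logit difference between the indirect-object token and the subject token, then compare it with a target value. The helpers `logit_diff` and `mse_and_rmse`, the token ids, and the targets are hypothetical; the actual `ioi_loss_and_metric_fn` may compute its loss differently.

```python
import numpy as np


def logit_diff(logits: np.ndarray, io_token_id: int, s_token_id: int) -> float:
    """Logit of the indirect-object token minus logit of the subject token."""
    return float(logits[io_token_id] - logits[s_token_id])


def mse_and_rmse(predicted, target):
    """Mean squared error and its square root over a batch of scalars."""
    predicted = np.asarray(predicted, dtype=float)
    target = np.asarray(target, dtype=float)
    mse = float(np.mean((predicted - target) ** 2))
    return mse, float(np.sqrt(mse))


# Toy vocabulary of four tokens; ids 1 and 2 stand in for the IO and S names.
toy_logits = np.array([0.1, 3.0, 1.0, -0.5])
diff = logit_diff(toy_logits, io_token_id=1, s_token_id=2)  # 3.0 - 1.0

# Two toy predictions compared with two toy targets.
mse, rmse = mse_and_rmse([diff, -1.0], [1.0, -3.0])
print(diff, mse, rmse)
```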


## Step 8: Train DAS on Output Token Variable

In [13]:
# Train DAS for output_token variable
print("\nTraining DAS for output_token variable...")
print("="*60)

target_variable = "output_token"
token_model_dir = os.path.join(model_dir, f"ioi_task_{pipeline.model.__class__.__name__}_{target_variable}")

config["n_features"] = 32

attention_head_baselines(
    pipeline=pipeline, 
    task=causal_model, 
    token_positions=token_positions, 
    train_data=train_data, 
    test_data=test_data, 
    config=config, 
    target_variables=[target_variable], 
    checker=lambda logits, params: checker(logits, params, pipeline), 
    verbose=True, 
    results_dir=results_dir,
    model_dir=token_model_dir,
    heads_list=heads_list,
    skip=["full_vector", "DBM+SVD", "DBM+PCA", "DBM", "DBM+SAE"]  # Only run DAS
)

print(f"\nDAS training completed for {target_variable}")
print(f"Models saved to: {token_model_dir}")

# Clear memory
clear_memory()


Training DAS for output_token variable...
Running DAS method...


Epoch: 0: 100%|██████████| 21/21 [00:23<00:00,  1.11s/it, loss=116, mse=119, rmse=10.9]
Epoch: 1: 100%|██████████| 21/21 [00:23<00:00,  1.11s/it, loss=113, mse=113, rmse=10.6]
Epoch: 100%|██████████| 2/2 [00:46<00:00, 23.35s/it]


Running interventions for s1_io_flip_test with model units [[AtomicModelUnit(id='AttentionHead(Layer:7,Head:6,Token:all)'), AtomicModelUnit(id='AttentionHead(Layer:8,Head:1,Token:all)')]]


Processing batches: 100%|██████████| 1/1 [00:05<00:00,  5.56s/it]


Running interventions for s1_io_flip_testprivate with model units [[AtomicModelUnit(id='AttentionHead(Layer:7,Head:6,Token:all)'), AtomicModelUnit(id='AttentionHead(Layer:8,Head:1,Token:all)')]]


Processing batches: 100%|██████████| 1/1 [00:06<00:00,  6.29s/it]


Running interventions for s2_io_flip_test with model units [[AtomicModelUnit(id='AttentionHead(Layer:7,Head:6,Token:all)'), AtomicModelUnit(id='AttentionHead(Layer:8,Head:1,Token:all)')]]


Processing batches: 100%|██████████| 1/1 [00:05<00:00,  5.56s/it]


Running interventions for s2_io_flip_testprivate with model units [[AtomicModelUnit(id='AttentionHead(Layer:7,Head:6,Token:all)'), AtomicModelUnit(id='AttentionHead(Layer:8,Head:1,Token:all)')]]


Processing batches: 100%|██████████| 1/1 [00:06<00:00,  6.35s/it]


Running interventions for s1_ioi_flip_s2_ioi_flip_test with model units [[AtomicModelUnit(id='AttentionHead(Layer:7,Head:6,Token:all)'), AtomicModelUnit(id='AttentionHead(Layer:8,Head:1,Token:all)')]]


Processing batches: 100%|██████████| 1/1 [00:05<00:00,  5.51s/it]


Running interventions for s1_ioi_flip_s2_ioi_flip_testprivate with model units [[AtomicModelUnit(id='AttentionHead(Layer:7,Head:6,Token:all)'), AtomicModelUnit(id='AttentionHead(Layer:8,Head:1,Token:all)')]]


Processing batches: 100%|██████████| 1/1 [00:06<00:00,  6.28s/it]



DAS training completed for output_token
Models saved to: ioi_submission/ioi_task_Gemma2ForCausalLM_output_token


## Step 9: Save Linear Parameters for Submission

In [14]:
# Save the linear parameters used by the causal model
linear_params_file = os.path.join(model_dir, "ioi_linear_params.json")

params_to_save = {
    model_name: linear_params,
    "model_class": pipeline.model.__class__.__name__
}

with open(linear_params_file, 'w') as f:
    json.dump(params_to_save, f, indent=2)

print(f"Linear parameters saved to: {linear_params_file}")
print(json.dumps(params_to_save, indent=2))

Linear parameters saved to: ioi_submission/ioi_linear_params.json
{
  "gemma": {
    "bias": 0.04835902899503708,
    "token_coeff": 0.767971899360421,
    "position_coeff": 2.004627879709005
  },
  "model_class": "Gemma2ForCausalLM"
}


## Step 10: Verify Saved Models

Let's verify that the models were saved correctly by listing the saved files.
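
Beyond listing files, a stricter check can assert that every expected file exists. The listing in this step shows three files per attention head (`_featurizer`, `_inverse_featurizer`, `_indices`), so the sketch below looks for exactly those suffixes. `missing_featurizer_files` is a hypothetical helper, and the demo runs against a temporary directory rather than the real submission folder.

```python
import tempfile
from pathlib import Path

# Suffixes taken from the saved-model listing in this step.
EXPECTED_SUFFIXES = ("_featurizer", "_inverse_featurizer", "_indices")


def missing_featurizer_files(method_dir: Path, unit_ids):
    """Return the expected file names that are absent from method_dir."""
    missing = []
    for unit_id in unit_ids:
        for suffix in EXPECTED_SUFFIXES:
            name = f"{unit_id}{suffix}"
            if not (method_dir / name).exists():
                missing.append(name)
    return missing


unit_ids = [
    "AttentionHead(Layer:7,Head:6,Token:all)",
    "AttentionHead(Layer:8,Head:1,Token:all)",
]

# Demo: create only two of the three files per head, then report the gap.
with tempfile.TemporaryDirectory() as tmp:
    method_dir = Path(tmp)
    for unit_id in unit_ids:
        for suffix in EXPECTED_SUFFIXES[:2]:  # omit "_indices" on purpose
            (method_dir / f"{unit_id}{suffix}").touch()
    missing = missing_featurizer_files(method_dir, unit_ids)

print(missing)
```

For a real check, `method_dir` would point at one of the `DAS_..._output_position` or `DAS_..._output_token` directories, and an empty result would mean nothing is missing.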

In [15]:
# List saved model files
print("\nSaved model structure:")
print(f"\n{model_dir}/")

for root, dirs, files in os.walk(model_dir):
    level = root.replace(model_dir, '').count(os.sep)
    indent = ' ' * 2 * level
    print(f"{indent}{os.path.basename(root)}/")
    subindent = ' ' * 2 * (level + 1)
    for file in sorted(files)[:10]:  # Show first 10 files
        print(f"{subindent}{file}")
    if len(files) > 10:
        print(f"{subindent}... and {len(files) - 10} more files")

print("\n✓ Submission ready!")
print(f"\nYour submission folder '{model_dir}' contains:")
print("- Trained DAS featurizers for both output_position and output_token")
print("- Linear parameters used for the causal model")
print("- All necessary files for evaluation")


Saved model structure:

ioi_submission/
ioi_submission/
  ioi_linear_params.json
  ioi_task_Gemma2ForCausalLM_output_position/
    DAS_Gemma2ForCausalLM_output_position/
      AttentionHead(Layer:7,Head:6,Token:all)_featurizer
      AttentionHead(Layer:7,Head:6,Token:all)_indices
      AttentionHead(Layer:7,Head:6,Token:all)_inverse_featurizer
      AttentionHead(Layer:8,Head:1,Token:all)_featurizer
      AttentionHead(Layer:8,Head:1,Token:all)_indices
      AttentionHead(Layer:8,Head:1,Token:all)_inverse_featurizer
  ioi_task_Gemma2ForCausalLM_output_token/
    DAS_Gemma2ForCausalLM_output_token/
      AttentionHead(Layer:7,Head:6,Token:all)_featurizer
      AttentionHead(Layer:7,Head:6,Token:all)_indices
      AttentionHead(Layer:7,Head:6,Token:all)_inverse_featurizer
      AttentionHead(Layer:8,Head:1,Token:all)_featurizer
      AttentionHead(Layer:8,Head:1,Token:all)_indices
      AttentionHead(Layer:8,Head:1,Token:all)_inverse_featurizer

✓ Submission ready!

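Your submission folder 'ioi_submission' contains:
- Trained DAS featurizers for both output_position and output_token
- Linear parameters used for the causal model
- All necessary files for evaluation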

## Step 11: Load Trained Models and Run Inference

This section demonstrates how to load previously trained featurizers and use them for inference on test data. This is useful for:

1. **Testing trained models**: Verify that saved models work correctly
2. **Running interventions**: Use the trained featurizers to perform causal interventions on attention heads
3. **Evaluation**: Test model performance on held-out test data

The process involves:
- Loading the trained DAS featurizers from disk
- Running interventions on test datasets for both output_position and output_token variables
- Collecting results for analysis

This is exactly what the evaluation system will do with your submitted models.
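
As a rough illustration of what a loaded featurizer does during an intervention, the NumPy sketch below rotates a base and a source activation with an orthogonal matrix, swaps the first `n_features` rotated coordinates, and rotates back. It is not the `CausalAbstraction` implementation: `random_orthogonal` and `interchange` are hypothetical helpers, the rotation is random rather than trained, and the dimensions are toy values.

```python
import numpy as np


def random_orthogonal(dim: int, seed: int = 0) -> np.ndarray:
    """Random orthogonal matrix from a QR decomposition (stand-in for a trained rotation)."""
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.standard_normal((dim, dim)))
    return q


def interchange(base, source, rotation, n_features):
    """Replace the first n_features rotated coordinates of base with those of source."""
    base_feats = base @ rotation      # featurize: move into the rotated basis
    source_feats = source @ rotation
    mixed = base_feats.copy()
    mixed[:n_features] = source_feats[:n_features]
    return mixed @ rotation.T         # inverse featurize: rotation is orthogonal


dim, n_features = 8, 2  # toy sizes; the notebook config uses n_features=32
rotation = random_orthogonal(dim)
rng = np.random.default_rng(1)
base = rng.standard_normal(dim)
source = rng.standard_normal(dim)

patched = interchange(base, source, rotation, n_features)

# In the rotated basis, the swapped coordinates come from the source run
# and the remaining coordinates are unchanged from the base run.
patched_feats = patched @ rotation
print(np.allclose(patched_feats[:n_features], (source @ rotation)[:n_features]))
print(np.allclose(patched_feats[n_features:], (base @ rotation)[n_features:]))
```

Patching an activation with itself leaves it unchanged, which is a useful sanity check for any featurizer and inverse featurizer pair.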

In [16]:
# Load saved models and run inference
# This demonstrates how to load previously trained featurizers and run interventions

from CausalAbstraction.experiments.attention_head_experiment import PatchAttentionHeads

print("Loading trained models and running inference...")
print("="*60)

# Directory for saving inference results
inference_results_dir = results_dir + "_loaded"
if not os.path.exists(inference_results_dir):
    os.makedirs(inference_results_dir)

# Test both target variables
target_variables_to_test = ["output_position", "output_token"]

for target_variable in target_variables_to_test:
    print(f"\nTesting DAS method for {target_variable}...")
    
    config["n_features"] = 32
    
    # Create experiment with same configuration
    config["method_name"] = "DAS"
    experiment = PatchAttentionHeads(
        pipeline=pipeline,
        causal_model=causal_model,
        layer_head_list=heads_list,
        token_positions=token_positions,
        checker=lambda logits, params: checker(logits, params, pipeline),
        config=config
    )
    
    # Load the trained featurizers
    method_model_dir = os.path.join(
        model_dir, 
        f"ioi_task_{pipeline.model.__class__.__name__}_{target_variable}",
        f"DAS_{pipeline.model.__class__.__name__}_{target_variable}"
    )
    
    print(f"Loading featurizers from: {method_model_dir}")
    experiment.load_featurizers(method_model_dir)
    
    # Run interventions on test data
    raw_results = experiment.perform_interventions(
        test_data, 
        verbose=True, 
        target_variables_list=[[target_variable]], 
        save_dir=inference_results_dir
    )
    
    # Clean up
    del experiment, raw_results
    clear_memory()

print("\n" + "="*60)
print("Inference completed!")
print(f"Results saved to: {inference_results_dir}")

Loading trained models and running inference...

Testing DAS method for output_position...
Loading featurizers from: ioi_submission/ioi_task_Gemma2ForCausalLM_output_position/DAS_Gemma2ForCausalLM_output_position
Running interventions for s1_io_flip_test with model units [[AtomicModelUnit(id='AttentionHead(Layer:7,Head:6,Token:all)'), AtomicModelUnit(id='AttentionHead(Layer:8,Head:1,Token:all)')]]


Processing batches: 100%|██████████| 1/1 [00:05<00:00,  5.56s/it]


Running interventions for s1_io_flip_testprivate with model units [[AtomicModelUnit(id='AttentionHead(Layer:7,Head:6,Token:all)'), AtomicModelUnit(id='AttentionHead(Layer:8,Head:1,Token:all)')]]


Processing batches: 100%|██████████| 1/1 [00:06<00:00,  6.32s/it]


Running interventions for s2_io_flip_test with model units [[AtomicModelUnit(id='AttentionHead(Layer:7,Head:6,Token:all)'), AtomicModelUnit(id='AttentionHead(Layer:8,Head:1,Token:all)')]]


Processing batches: 100%|██████████| 1/1 [00:05<00:00,  5.53s/it]


Running interventions for s2_io_flip_testprivate with model units [[AtomicModelUnit(id='AttentionHead(Layer:7,Head:6,Token:all)'), AtomicModelUnit(id='AttentionHead(Layer:8,Head:1,Token:all)')]]


Processing batches: 100%|██████████| 1/1 [00:06<00:00,  6.23s/it]


Running interventions for s1_ioi_flip_s2_ioi_flip_test with model units [[AtomicModelUnit(id='AttentionHead(Layer:7,Head:6,Token:all)'), AtomicModelUnit(id='AttentionHead(Layer:8,Head:1,Token:all)')]]


Processing batches: 100%|██████████| 1/1 [00:05<00:00,  5.54s/it]


Running interventions for s1_ioi_flip_s2_ioi_flip_testprivate with model units [[AtomicModelUnit(id='AttentionHead(Layer:7,Head:6,Token:all)'), AtomicModelUnit(id='AttentionHead(Layer:8,Head:1,Token:all)')]]


Processing batches: 100%|██████████| 1/1 [00:06<00:00,  6.31s/it]



Testing DAS method for output_token...
Loading featurizers from: ioi_submission/ioi_task_Gemma2ForCausalLM_output_token/DAS_Gemma2ForCausalLM_output_token
Running interventions for s1_io_flip_test with model units [[AtomicModelUnit(id='AttentionHead(Layer:7,Head:6,Token:all)'), AtomicModelUnit(id='AttentionHead(Layer:8,Head:1,Token:all)')]]


Processing batches: 100%|██████████| 1/1 [00:05<00:00,  5.67s/it]


Running interventions for s1_io_flip_testprivate with model units [[AtomicModelUnit(id='AttentionHead(Layer:7,Head:6,Token:all)'), AtomicModelUnit(id='AttentionHead(Layer:8,Head:1,Token:all)')]]


Processing batches: 100%|██████████| 1/1 [00:06<00:00,  6.27s/it]


Running interventions for s2_io_flip_test with model units [[AtomicModelUnit(id='AttentionHead(Layer:7,Head:6,Token:all)'), AtomicModelUnit(id='AttentionHead(Layer:8,Head:1,Token:all)')]]


Processing batches: 100%|██████████| 1/1 [00:05<00:00,  5.52s/it]


Running interventions for s2_io_flip_testprivate with model units [[AtomicModelUnit(id='AttentionHead(Layer:7,Head:6,Token:all)'), AtomicModelUnit(id='AttentionHead(Layer:8,Head:1,Token:all)')]]


Processing batches: 100%|██████████| 1/1 [00:06<00:00,  6.24s/it]


Running interventions for s1_ioi_flip_s2_ioi_flip_test with model units [[AtomicModelUnit(id='AttentionHead(Layer:7,Head:6,Token:all)'), AtomicModelUnit(id='AttentionHead(Layer:8,Head:1,Token:all)')]]


Processing batches: 100%|██████████| 1/1 [00:05<00:00,  5.53s/it]


Running interventions for s1_ioi_flip_s2_ioi_flip_testprivate with model units [[AtomicModelUnit(id='AttentionHead(Layer:7,Head:6,Token:all)'), AtomicModelUnit(id='AttentionHead(Layer:8,Head:1,Token:all)')]]


Processing batches: 100%|██████████| 1/1 [00:06<00:00,  6.32s/it]



Inference completed!
Results saved to: ioi_submission_results_loaded
