In [1]:
import os
os.chdir('/home/smallyan/eval_agent')
print(f"Working directory: {os.getcwd()}")

Working directory: /home/smallyan/eval_agent


# Circuit Analysis Code Evaluation

This notebook evaluates the code implementation under `/net/scratch2/smallyan/erasing-llm_eval` for circuit analysis.

## Setup and Initial Exploration

In [2]:
# First, let's explore the repository structure
REPO_PATH = "/net/scratch2/smallyan/erasing-llm_eval"

# List all files in the repository
for root, dirs, files in os.walk(REPO_PATH):
    # Skip hidden directories (this already covers .git) and common non-essential directories
    dirs[:] = [d for d in dirs if not d.startswith('.') and d not in ('__pycache__', 'venv')]
    level = root.replace(REPO_PATH, '').count(os.sep)
    indent = ' ' * 2 * level
    print(f'{indent}{os.path.basename(root)}/')
    subindent = ' ' * 2 * (level + 1)
    for file in files:
        print(f'{subindent}{file}')

erasing-llm_eval/
  documentation.pdf
  .gitignore
  __init__.py
  CodeWalkthrough.md
  requirements.txt
  plan.md
  trainscripts/
    erase.py
    prepare_consistency_data.py
    __init__.py
  utils/
    metrics.py
    __init__.py
    lora.py
  data/
    wmdp-keywords.json
    harrypotter/
      hp-questions-dual.json
      hp-questions.json
    wmdp/
      bio-questions.json
      chem-questions.json
      cyber-questions.json
  notebooks/
    inference.ipynb
  images/
    method.png
  evaluation/
    generalization_eval_summary.json
    self_matching.ipynb
    generalization_eval.ipynb
    consistency_evaluation.json
    replications/
      evaluation_replication.md
      self_replication_evaluation.json
      documentation_replication.md
      training_losses.png
      replication.ipynb
      elm_model/
        adapter_config.json
        adapter_model.safetensors
        README.md
    replication_eval/
      documentation_eval_summary.json
      documentation_evaluation_summary.md

## Project Overview

Based on `plan.md` and `CodeWalkthrough.md`:

**Project Goal**: Erasure of Language Memory (ELM), a method for erasing conceptual knowledge from language models. It combines:
1. Introspective classification with expert/novice context prompts
2. Three loss terms: $L_{\text{erase}}$, $L_{\text{retain}}$, and $L_{\text{fluency}}$
3. Low-rank adapters (LoRA) applied to early model layers

**Core Analysis Files to Evaluate**:
1. `trainscripts/erase.py` - Main training script
2. `trainscripts/prepare_consistency_data.py` - Pre-generating consistency data
3. `utils/metrics.py` - Metrics utilities
4. `utils/lora.py` - LoRA utilities
5. `notebooks/inference.ipynb` - Testing notebook
6. `evaluation/` - Evaluation notebooks

## Files to Evaluate

Based on `CodeWalkthrough.md` and `plan.md`, the core analysis files are:

### Python Modules (trainscripts/)
1. `trainscripts/erase.py` - Main ELM training script
2. `trainscripts/prepare_consistency_data.py` - Data preparation for consistency training

### Utility Modules (utils/)
3. `utils/metrics.py` - Evaluation metrics (WMDP, MMLU, HP accuracy)
4. `utils/lora.py` - LoRA network implementation

### Notebooks
5. `notebooks/inference.ipynb` - Model inference and testing
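
Before evaluating each file, it can help to confirm that all five core files are actually present in the repository. The following is a minimal sketch; the helper name `find_missing_files` and the constant `CORE_FILES` are hypothetical and are not part of the evaluated repository.

```python
import os

# Core files named in CodeWalkthrough.md and plan.md, relative to the repo root.
CORE_FILES = [
    "trainscripts/erase.py",
    "trainscripts/prepare_consistency_data.py",
    "utils/metrics.py",
    "utils/lora.py",
    "notebooks/inference.ipynb",
]


def find_missing_files(repo_path, relative_paths):
    """Return the relative paths that do not exist as regular files under repo_path."""
    return [
        rel for rel in relative_paths
        if not os.path.isfile(os.path.join(repo_path, rel))
    ]


if __name__ == "__main__":
    # Same repository path as used earlier in this notebook.
    REPO_PATH = "/net/scratch2/smallyan/erasing-llm_eval"
    missing = find_missing_files(REPO_PATH, CORE_FILES)
    print("All core files present" if not missing else f"Missing: {missing}")
```

An empty result means every listed file was found; any returned path should be resolved before the evaluation continues.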

### Key Functions and Classes to Evaluate in `erase.py`
- `get_edit_vector()` - Computes the ELM edit vector
- `ELMLogits` class - Custom logits processor
- `generate()` - Text generation with ELM guidance
- `prepare_prompts()` - Dataset preparation
- `train_elm()` - Main training function
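
One way to check that the names listed above exist in `erase.py` without importing the script (which could trigger model loading) is to parse it with Python's standard `ast` module. This is a sketch under an assumption: it only finds definitions at module top level, so a function nested inside another function or class would be reported as missing. The helper names `top_level_definitions` and `missing_definitions` are hypothetical.

```python
import ast

# Names listed above as the key pieces of trainscripts/erase.py.
EXPECTED_NAMES = {
    "get_edit_vector",
    "ELMLogits",
    "generate",
    "prepare_prompts",
    "train_elm",
}


def top_level_definitions(source):
    """Return the names of functions and classes defined at module top level."""
    tree = ast.parse(source)
    return {
        node.name
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
    }


def missing_definitions(source, expected=EXPECTED_NAMES):
    """Return the expected names not defined at top level, sorted alphabetically."""
    return sorted(set(expected) - top_level_definitions(source))


if __name__ == "__main__":
    # Tiny stand-in source; in the notebook, pass the text of erase.py instead.
    demo = "class ELMLogits:\n    pass\n\ndef generate():\n    pass\n"
    print(missing_definitions(demo))
    # prints ['get_edit_vector', 'prepare_prompts', 'train_elm']
```

To apply it to the real script, read the text of `trainscripts/erase.py` under `REPO_PATH` and pass it as `source`. Because the file is only parsed and never executed, no GPU or model weights are needed for this check.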

In [3]:
# Check GPU availability
import torch
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"Device: {torch.cuda.get_device_name(0)}")
    print(f"Device count: {torch.cuda.device_count()}")
    
# Set device
device = 'cuda:0' if torch.cuda.is_available() else 'cpu'
print(f"Using device: {device}")

CUDA available: True
Device: NVIDIA A100 80GB PCIe
Device count: 1
Using device: cuda:0
