# Circuit Analysis Code Evaluation

This notebook evaluates the code implementing circuit analysis (Vector Arithmetic in Concept and Token Subspaces) in the repository at `/net/scratch2/smallyan/arithmetic_eval`.

## Evaluation Criteria
1. **Runnable (Y/N)** - Block executes without error
2. **Correct-Implementation (Y/N)** - Logic implements described computation correctly
3. **Redundant (Y/N)** - Block duplicates another block's computation
4. **Irrelevant (Y/N)** - Block does not contribute to project goal

## Project Goal
Show that concept and token induction heads can identify subspaces of Llama-2-7b activations with coherent semantic and surface-level structure, enabling more accurate parallelogram arithmetic (e.g., Athens – Greece + China = Beijing) than using raw hidden states.
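For readers unfamiliar with parallelogram arithmetic, the idea can be sketched with toy vectors. Everything below is illustrative: the 3-dimensional embeddings, the `parallelogram_neighbor` helper, and the extra distractor word are invented for this example and are not taken from the repository or from Llama-2-7b activations.

```python
import math

# Toy 3-d embeddings, invented purely for illustration (not real model activations).
EMBEDDINGS = {
    "Athens":  [1.0, 0.0, 1.0],
    "Greece":  [1.0, 0.0, 0.0],
    "Beijing": [0.0, 1.0, 1.0],
    "China":   [0.0, 1.0, 0.0],
    "Tokyo":   [0.5, 0.5, 1.0],  # distractor
}

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def parallelogram_neighbor(a, b, c, embeddings):
    """Return the word nearest to emb[a] - emb[b] + emb[c], excluding the query words."""
    target = [x - y + z for x, y, z in zip(embeddings[a], embeddings[b], embeddings[c])]
    candidates = {w: v for w, v in embeddings.items() if w not in (a, b, c)}
    return max(candidates, key=lambda w: cosine(target, candidates[w]))

prediction = parallelogram_neighbor("Athens", "Greece", "China", EMBEDDINGS)
print(prediction)
```

The project's claim is that the same nearest-neighbor test succeeds more often when the vectors are first mapped into a head-derived subspace than when raw hidden states are used.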


## Block-Level Evaluation Table

| File | Block/Function | Runnable | Correct-Impl | Redundant | Irrelevant | Error Note |
|------|----------------|----------|--------------|-----------|------------|------------|
| parallelograms.py | logit_lens | N | N | N | N | Function uses nnsight model.lm_head and model.mode... |
| parallelograms.py | print_logit_lens | Y | Y | N | N |  |
| parallelograms.py | proj_onto_ov | Y | Y | N | N |  |
| parallelograms.py | get_ov_sum | Y | Y | N | N |  |
| parallelograms.py | get_neighbors | Y | Y | N | N |  |
| parallelograms.py | get_parallelogram_scores | N | N | N | N | Calls logit_lens which fails outside trace context... |
| parallelograms.py | all_dot_products | Y | Y | N | N |  |
| parallelograms.py | calculate_save_scores | N | N | N | N | Calls get_parallelogram_scores->logit_lens which f... |
| parallelograms.py | main | N | N | N | N | Calls calculate_save_scores which fails. |
| all_parallelograms.py | loop_for_task | N | N | N | N | Calls calculate_save_scores which fails. |
| all_parallelograms.py | main | N | N | N | N | Calls loop_for_task which fails. |
| parallelogram_ranks.py | get_optimal_layers | Y | Y | N | N |  |
| parallelogram_ranks.py | run_rank_scan | N | N | N | N | Calls calculate_save_scores which fails. |
| parallelogram_ranks.py | main | N | N | N | N | Calls run_rank_scan which fails. |
| parallelogram_analysis.ipynb | cell_1_imports | Y | Y | N | N |  |
| parallelogram_analysis.ipynb | cell_2_get_number_neighbors | Y | Y | N | N |  |
| parallelogram_analysis.ipynb | cell_3_nn_acc_word2vec | Y | Y | N | N |  |
| parallelogram_analysis.ipynb | cell_4_get_number_neighbors_fv | Y | Y | N | N |  |
| parallelogram_analysis.ipynb | cell_5_nn_acc_fv | Y | Y | N | N |  |
| parallelogram_analysis.ipynb | cell_6_single_plot | Y | Y | N | N |  |
| parallelogram_analysis.ipynb | cell_7_rank_results | Y | Y | N | N |  |
| parallelogram_analysis.ipynb | cell_8_plot_task_ranks | Y | Y | N | N |  |


## Detailed Error Notes

### Critical Issue: logit_lens Function Bug

The `logit_lens` function in `parallelograms.py` has a fundamental implementation error:

```python
def logit_lens(concept_vec, model):
    with torch.no_grad():
        return model.lm_head(model.model.norm(concept_vec.cuda())).softmax(dim=-1).detach().cpu()
```

**Problem**: The function calls `model.lm_head` and `model.model.norm`, which are nnsight proxy objects. These proxies only work inside a `model.trace()` context; outside it, the call fails with:
```
AttributeError: 'NoneType' object has no attribute 'module_proxy'
```

**Fix**: The function should use `model._model.lm_head` and `model._model.model.norm` to access the underlying PyTorch model:
```python
def logit_lens(concept_vec, model):
    with torch.no_grad():
        return model._model.lm_head(model._model.model.norm(concept_vec.cuda())).softmax(dim=-1).detach().cpu()
```
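To make the computation behind `logit_lens` concrete, here is a minimal pure-Python sketch of the same three steps: final normalization, unembedding, softmax. It assumes an RMSNorm-style final norm (as used by Llama models) and uses tiny invented weights; `rms_norm`, `softmax`, and `toy_logit_lens` are hypothetical helper names, not functions from the repository or from nnsight.

```python
import math

def rms_norm(x, weight, eps=1e-6):
    """RMSNorm: rescale x by its root-mean-square, then apply a per-dimension gain."""
    mean_sq = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    return [w * v * scale for w, v in zip(weight, x)]

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def toy_logit_lens(hidden, norm_weight, unembed):
    """Map a hidden vector to a probability distribution over a toy vocabulary."""
    normed = rms_norm(hidden, norm_weight)
    # One logit per vocabulary row: dot product with the normalized hidden state.
    logits = [sum(w * h for w, h in zip(row, normed)) for row in unembed]
    return softmax(logits)

# Invented toy values: hidden size 3, vocabulary size 4.
hidden = [2.0, 0.0, 0.0]
norm_weight = [1.0, 1.0, 1.0]
unembed = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [-1.0, 0.0, 0.0],
]
probs = toy_logit_lens(hidden, norm_weight, unembed)
print(probs.index(max(probs)))
```

In the real function the norm and unembedding are the model's own modules, so the only change the fix makes is which object those modules are fetched from.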

### Propagation of the Bug

This bug propagates through the call chain:
1. `logit_lens` (fails)
2. `get_parallelogram_scores` (calls logit_lens → fails)
3. `calculate_save_scores` (calls get_parallelogram_scores → fails)
4. `parallelograms.py:main` (calls calculate_save_scores → fails)
5. `all_parallelograms.py:loop_for_task` (calls calculate_save_scores → fails)
6. `all_parallelograms.py:main` (calls loop_for_task → fails)
7. `parallelogram_ranks.py:run_rank_scan` (calls calculate_save_scores → fails)
8. `parallelogram_ranks.py:main` (calls run_rank_scan → fails)
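The call chain above can be checked mechanically. The sketch below encodes only the caller-to-callee edges listed in this report (the real call graph may contain more) and walks them backwards from `logit_lens`; the `CALLS` dictionary and `affected_blocks` helper are illustrative names, not part of the repository.

```python
from collections import deque

# Caller -> callees, restricted to the edges named in this report.
CALLS = {
    "parallelograms.py:get_parallelogram_scores": ["parallelograms.py:logit_lens"],
    "parallelograms.py:calculate_save_scores": ["parallelograms.py:get_parallelogram_scores"],
    "parallelograms.py:main": ["parallelograms.py:calculate_save_scores"],
    "all_parallelograms.py:loop_for_task": ["parallelograms.py:calculate_save_scores"],
    "all_parallelograms.py:main": ["all_parallelograms.py:loop_for_task"],
    "parallelogram_ranks.py:run_rank_scan": ["parallelograms.py:calculate_save_scores"],
    "parallelogram_ranks.py:main": ["parallelogram_ranks.py:run_rank_scan"],
}

def affected_blocks(root, calls):
    """Return root plus every block that reaches it through the call graph."""
    # Invert the graph so we can walk from a callee to its callers.
    callers = {}
    for caller, callees in calls.items():
        for callee in callees:
            callers.setdefault(callee, []).append(caller)

    seen = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for caller in callers.get(node, []):
            if caller not in seen:
                seen.add(caller)
                queue.append(caller)
    return seen

failing = affected_blocks("parallelograms.py:logit_lens", CALLS)
print(len(failing))
```

The traversal reaches exactly the eight blocks marked `Runnable = N` in the evaluation table.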

### Working Functions

The following functions work correctly:
- `print_logit_lens` - Utility function for display
- `proj_onto_ov` - Projects words onto OV matrices
- `get_ov_sum` - Builds summed OV matrix from top-k heads
- `get_neighbors` - Gets representations for neighbor words
- `all_dot_products` - Calculates dot products for all pairs
- `get_optimal_layers` - Finds optimal layer for each task
- All notebook visualization cells - They read from cached results
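As a rough illustration of what "summed OV matrix from top-k heads" and "projects onto OV matrices" could mean, the sketch below picks the k highest-scoring heads, sums their matrices, and applies the sum to a vector. This is an assumption-laden toy: the head scores, the 2x2 matrices, the matrix orientation, and the helper names `top_k_heads`, `sum_matrices`, and `project` are all invented here, and the repository's `get_ov_sum` and `proj_onto_ov` may differ in every one of these details.

```python
def top_k_heads(scores, k):
    """Return the k (layer, head) keys with the highest score."""
    return sorted(scores, key=scores.get, reverse=True)[:k]

def sum_matrices(matrices):
    """Elementwise sum of equally shaped matrices given as nested lists."""
    rows, cols = len(matrices[0]), len(matrices[0][0])
    return [[sum(m[i][j] for m in matrices) for j in range(cols)] for i in range(rows)]

def project(matrix, vec):
    """Apply the matrix to a vector (matrix @ vec)."""
    return [sum(a * b for a, b in zip(row, vec)) for row in matrix]

# Invented per-head scores and OV matrices, keyed by (layer, head).
head_scores = {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.5}
ov_matrices = {
    (0, 0): [[1.0, 0.0], [0.0, 0.0]],
    (0, 1): [[5.0, 5.0], [5.0, 5.0]],
    (1, 0): [[0.0, 0.0], [0.0, 1.0]],
}

selected = top_k_heads(head_scores, k=2)
ov_sum = sum_matrices([ov_matrices[h] for h in selected])
projected = project(ov_sum, [3.0, 4.0])
print(selected, projected)
```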


## Quantitative Metrics

| Metric | Value | Description |
|--------|-------|-------------|
| **Runnable%** | 63.6% | Percentage of blocks that execute without error |
| **Output-Matches-Expectation%** | 63.6% | Percentage of blocks that run and produce the expected output |
| **Incorrect%** | 36.4% | Percentage of blocks with implementation errors |
| **Redundant%** | 0.0% | Percentage of redundant blocks |
| **Irrelevant%** | 0.0% | Percentage of irrelevant blocks |
| **Correction-Rate%** | 0.0% | Percentage of failing blocks that were corrected |

### Summary Statistics
- **Total Blocks Evaluated**: 22
- **Runnable Blocks**: 14 / 22
- **Failing Blocks**: 8 / 22
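The percentages in the metrics table follow from these counts. The report does not show the exact formulas used, so the sketch below assumes each metric is a simple ratio over the 22 blocks, and that Correction-Rate is corrected blocks over failing blocks; the `percentage` helper is a hypothetical name.

```python
def percentage(count, total):
    """Ratio as a percentage rounded to one decimal place."""
    return round(100.0 * count / total, 1)

total_blocks = 22
runnable = 14
failing = total_blocks - runnable  # 8
corrected = 0                      # no failing block was corrected in this evaluation

metrics = {
    "Runnable%": percentage(runnable, total_blocks),
    "Incorrect%": percentage(failing, total_blocks),
    "Correction-Rate%": percentage(corrected, failing),
}
print(metrics)
```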


## Binary Checklist Summary

| Checklist Item | Condition | PASS/FAIL |
|----------------|-----------|-----------|
| **C1: All core analysis code is runnable** | No block has Runnable = N | **FAIL** |
| **C2: All implementations are correct** | No block has Correct-Implementation = N | **FAIL** |
| **C3: No redundant code** | No block has Redundant = Y | **PASS** |
| **C4: No irrelevant code** | No block has Irrelevant = Y | **PASS** |
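The four conditions reduce to simple predicates over the per-block flags. The sketch below reconstructs the flag counts from the evaluation table (14 blocks passing on every column, 8 failing on Runnable and Correct-Impl); the `Block` dataclass and `checklist` function are illustrative and not part of the evaluation code.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Block:
    runnable: bool
    correct: bool
    redundant: bool
    irrelevant: bool

# Flag counts taken from the evaluation table: 14 clean blocks, 8 failing blocks.
blocks = [Block(True, True, False, False)] * 14 + [Block(False, False, False, False)] * 8

def checklist(blocks):
    """Evaluate C1-C4: each item passes only if no block violates its condition."""
    return {
        "C1": all(b.runnable for b in blocks),
        "C2": all(b.correct for b in blocks),
        "C3": not any(b.redundant for b in blocks),
        "C4": not any(b.irrelevant for b in blocks),
    }

results = {item: "PASS" if ok else "FAIL" for item, ok in checklist(blocks).items()}
print(results)
```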

### Rationale

**C1 (All Runnable): FAIL**
8 blocks have Runnable=N. The `logit_lens` function calls nnsight module proxies outside a trace context, and the failure propagates to `get_parallelogram_scores`, `calculate_save_scores`, `loop_for_task`, `run_rank_scan`, and the `main` functions of all three scripts.

**C2 (All Correct): FAIL**
8 blocks have Correct-Implementation=N. These are the same blocks that fail C1: `logit_lens` itself, plus the seven blocks that cannot produce output because they depend on it.

**C3 (No Redundant): PASS**
No blocks were identified as duplicating another block's computation.

**C4 (No Irrelevant): PASS**
All blocks contribute to the project goal of parallelogram arithmetic analysis.


## Final Summary

### Overall Assessment

The code repository for "Vector Arithmetic in Concept and Token Subspaces" has a **critical bug** in the `logit_lens` function that prevents 8 of the 22 evaluated blocks (36.4%) from running. However:

1. **The core methodology is sound**: The `get_ov_sum`, `proj_onto_ov`, `get_neighbors`, and visualization functions work correctly.

2. **Cached results exist**: The repository contains pre-computed results in the `cache/` directory, indicating the code worked at some point (possibly with a different version of nnsight or a different model loading approach).

3. **The visualization notebook works**: The `parallelogram_analysis.ipynb` notebook can successfully read cached results and generate all figures.

### Key Findings

| Category | Count | Percentage |
|----------|-------|------------|
| Working Blocks | 14 | 63.6% |
| Failing Blocks | 8 | 36.4% |
| Redundant Blocks | 0 | 0.0% |
| Irrelevant Blocks | 0 | 0.0% |

### Recommended Fix

To fix the `logit_lens` function, change:
```python
return model.lm_head(model.model.norm(concept_vec.cuda())).softmax(dim=-1).detach().cpu()
```
to:
```python
return model._model.lm_head(model._model.model.norm(concept_vec.cuda())).softmax(dim=-1).detach().cpu()
```

Because the other seven failures all trace back to `logit_lens`, this single-line fix should resolve all 8 failing blocks.

### Files Generated

1. **Evaluation Notebook**: `/net/scratch2/smallyan/arithmetic_eval/evaluation/code_critic_evaluation.ipynb`
2. **JSON Summary**: `/net/scratch2/smallyan/arithmetic_eval/evaluation/code_critic_summary.json`
