# Find Clean Single-Token Words

**Goal:** Find semantically meaningful words that tokenize to exactly one token (no BPE splitting) for neighborhood analysis.

**Why:** We want to examine the local neighborhood around semantically meaningful tokens to test:
1. **Radial hypothesis:** Do related words lie along the same ray at different radii?
2. **Causal vs cosine:** Which tokens fall inside the causal hypersphere versus the angular (cosine) cone around a given token?
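
To make the radial hypothesis concrete: two tokens lie "along the same ray" when their vectors point in nearly the same direction but have different lengths. The sketch below is only an illustration under assumptions. It uses the plain Euclidean norm as the radius (the causal metric in this series may define radius differently), toy vectors instead of real embeddings, and hypothetical helper names `radius_and_direction` and `on_same_ray`.

```python
import math
from typing import List, Sequence, Tuple


def radius_and_direction(v: Sequence[float]) -> Tuple[float, List[float]]:
    """Split a vector into its Euclidean norm (radius) and unit direction."""
    radius = math.sqrt(sum(x * x for x in v))
    if radius == 0.0:
        raise ValueError("the zero vector has no direction")
    return radius, [x / radius for x in v]


def on_same_ray(u: Sequence[float], v: Sequence[float],
                min_cosine: float = 0.999) -> bool:
    """True if u and v point in (almost) the same direction.

    The 0.999 default is an assumed threshold for illustration.
    """
    _, du = radius_and_direction(u)
    _, dv = radius_and_direction(v)
    cosine = sum(a * b for a, b in zip(du, dv))
    return cosine > min_cosine


# Toy vectors (not real embeddings): same direction, different radii.
u = [1.0, 2.0, 2.0]   # radius 3.0
v = [2.0, 4.0, 4.0]   # radius 6.0
print(on_same_ray(u, v), radius_and_direction(u)[0], radius_and_direction(v)[0])
```

If the hypothesis holds, related tokens would pass the direction check while their radii differ.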

**Method:**
- Test various candidate words across semantic categories
- Find which ones map to single tokens
- Report token IDs and check if they're in our 32k sample

**Next step:** Pick the most interesting token(s) and analyze their neighborhoods in 07.57.

## Configuration

In [1]:
# Model
MODEL_NAME = "Qwen/Qwen3-4B-Instruct-2507"

# Our token sample (to check if candidates are available)
TOKEN_INDICES_PATH = '../data/vectors/distances_causal_32000.pt'

print("Configuration:")
print(f"  Model: {MODEL_NAME}")
print(f"  Token sample: {TOKEN_INDICES_PATH}")

Configuration:
  Model: Qwen/Qwen3-4B-Instruct-2507
  Token sample: ../data/vectors/distances_causal_32000.pt


## Setup

In [2]:
from transformers import AutoTokenizer
import torch

print("✓ Imports complete")

✓ Imports complete


## Load Tokenizer

In [3]:
print(f"Loading tokenizer from {MODEL_NAME}...")
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

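print("\n✓ Tokenizer loaded")
# Note: vocab_size counts only the base vocabulary; len(tokenizer) also includes added special tokens
print(f"  Vocab size: {tokenizer.vocab_size:,}")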

Loading tokenizer from Qwen/Qwen3-4B-Instruct-2507...

✓ Tokenizer loaded
  Vocab size: 151,643


## Load Token Sample Indices

In [4]:
print(f"Loading token indices from {TOKEN_INDICES_PATH}...")
data = torch.load(TOKEN_INDICES_PATH, weights_only=False)
token_indices = data['token_indices'].numpy()

print("\n✓ Loaded token sample")
print(f"  N tokens in sample: {len(token_indices):,}")
print(f"  Range: [{token_indices.min()}, {token_indices.max()}]")

# Convert to a set of plain Python ints for fast membership checking
token_set = set(token_indices.tolist())
print("  Converted to set for fast lookup")

Loading token indices from ../data/vectors/distances_causal_32000.pt...

✓ Loaded token sample
  N tokens in sample: 32,000
  Range: [5, 151930]
  Converted to set for fast lookup


## Test Candidate Words

We'll test words across various semantic categories to find clean single-token words.
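
One caveat for the test below: byte-level BPE tokenizers such as Qwen's generally assign different token IDs to a bare word (`'cat'`) and to the same word preceded by a space (`' cat'`), and the space-prefixed form is the one that usually appears mid-sentence. The loop below encodes only the bare form. A minimal sketch of checking both variants follows; `single_token_variants` and its `encode` parameter are hypothetical names introduced here, not part of the notebook.

```python
from typing import Callable, Dict, List, Optional


def single_token_variants(
    word: str, encode: Callable[[str], List[int]]
) -> Dict[str, Optional[int]]:
    """Return the token ID for each variant that encodes to exactly one token.

    Variants that split into several tokens (or none) map to None.
    """
    variants = {"bare": word, "space_prefixed": " " + word}
    result: Dict[str, Optional[int]] = {}
    for name, text in variants.items():
        ids = encode(text)
        result[name] = ids[0] if len(ids) == 1 else None
    return result


# With the tokenizer loaded in this notebook, the call would look like:
#   single_token_variants(
#       "cat", lambda s: tokenizer.encode(s, add_special_tokens=False)
#   )
```

Passing the encoder as a callable keeps the helper independent of any particular tokenizer, so it can be tried on a stub before running it against the real model.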

In [5]:
# Candidate words organized by category
test_words = {
    'Animals': ['cat', 'dog', 'bird', 'fish', 'lion', 'tiger', 'elephant', 'whale'],
    'Royalty': ['king', 'queen', 'monarch', 'emperor', 'prince', 'princess', 'duke', 'lord'],
    'Intelligence': ['intelligence', 'wisdom', 'knowledge', 'smart', 'clever', 'genius', 'brilliant'],
    'Emotions': ['love', 'hate', 'fear', 'joy', 'anger', 'sadness', 'happiness', 'surprise'],
    'Colors': ['red', 'blue', 'green', 'yellow', 'purple', 'orange', 'black', 'white'],
    'Numbers': ['one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'nine', 'ten'],
    'Size': ['big', 'small', 'large', 'tiny', 'huge', 'enormous', 'microscopic'],
    'Science': ['science', 'physics', 'chemistry', 'biology', 'mathematics', 'astronomy'],
    'Arts': ['art', 'music', 'literature', 'poetry', 'painting', 'sculpture'],
    'Abstract': ['beauty', 'truth', 'freedom', 'justice', 'peace', 'war', 'time', 'space']
}

print("Testing tokenization for candidate words...\n")
print("=" * 80)

clean_tokens = []  # Will store (category, word, token_id, in_sample)

for category, words in test_words.items():
    print(f"\n{category}:")
    print("-" * 40)
    
    for word in words:
        tokens = tokenizer.encode(word, add_special_tokens=False)
        
        if len(tokens) == 1:
            token_id = tokens[0]
            token_text = tokenizer.decode([token_id])
            in_sample = token_id in token_set
            
            status = "✓ IN SAMPLE" if in_sample else "✗ not in sample"
            print(f"  ✓ '{word}' → token {token_id}: '{token_text}' [{status}]")
            
            clean_tokens.append((category, word, token_id, in_sample))
        else:
            print(f"  ✗ '{word}' → {len(tokens)} tokens: {tokens}")

print("\n" + "=" * 80)

Testing tokenization for candidate words...


Animals:
----------------------------------------
  ✓ 'cat' → token 4616: 'cat' [✗ not in sample]
  ✓ 'dog' → token 18457: 'dog' [✗ not in sample]
  ✓ 'bird' → token 22592: 'bird' [✓ IN SAMPLE]
  ✓ 'fish' → token 18170: 'fish' [✗ not in sample]
  ✓ 'lion' → token 78151: 'lion' [✗ not in sample]
  ✗ 'tiger' → 2 tokens: [83, 7272]
  ✗ 'elephant' → 2 tokens: [10068, 26924]
  ✗ 'whale' → 2 tokens: [1312, 1574]

Royalty:
----------------------------------------
  ✓ 'king' → token 10566: 'king' [✗ not in sample]
  ✓ 'queen' → token 93114: 'queen' [✗ not in sample]
  ✗ 'monarch' → 2 tokens: [1645, 1113]
  ✗ 'emperor' → 2 tokens: [336, 25819]
  ✗ 'prince' → 2 tokens: [649, 1701]
  ✗ 'princess' → 2 tokens: [649, 19570]
  ✗ 'duke' → 2 tokens: [1054, 440]
  ✓ 'lord' → token 25598: 'lord' [✗ not in sample]

Intelligence:
----------------------------------------
  ✓ 'intelligence' → token 92275: 'intelligence' [✗ not in sample]
  ✗ 'wisdom' → 2 tokens: 

## Summary of Clean Tokens in Sample

These are the single-token words that are also in our 32k sample (available for neighborhood analysis).

In [6]:
# Filter to only tokens in our sample
available_tokens = [(cat, word, tid) for cat, word, tid, in_sample in clean_tokens if in_sample]

print(f"\nClean single-token words IN our 32k sample:")
print("=" * 80)

if available_tokens:
    # Group by category
    by_category = {}
    for cat, word, tid in available_tokens:
        if cat not in by_category:
            by_category[cat] = []
        by_category[cat].append((word, tid))
    
    for category, tokens in by_category.items():
        print(f"\n{category}:")
        for word, tid in tokens:
            print(f"  • '{word}' (token {tid})")
    
    print("\n" + "=" * 80)
    print(f"Total: {len(available_tokens)} clean tokens available for analysis")
    
else:
    print("\n⚠️  No clean single-token words found in our sample!")
    print("    We may need to expand our search or use a different approach.")


Clean single-token words IN our 32k sample:

Animals:
  • 'bird' (token 22592)

Colors:
  • 'black' (token 11453)
  • 'white' (token 5782)

Numbers:
  • 'four' (token 34024)

Size:
  • 'big' (token 16154)

Science:
  • 'physics' (token 66765)
  • 'chemistry' (token 51655)

Abstract:
  • 'truth' (token 58577)
  • 'war' (token 11455)
  • 'time' (token 1678)
  • 'space' (token 8746)

Total: 11 clean tokens available for analysis


## Recommendations for Neighborhood Analysis

Based on the results above, which tokens would be most interesting to examine?

**Criteria:**
1. **Semantic richness:** Words with clear semantic relationships (e.g., 'king' → 'queen', 'monarch')
2. **Conceptual clarity:** Abstract concepts vs concrete objects
3. **Category diversity:** Test different semantic domains

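**Linear representation hypothesis test:**
- If 'king' and 'queen' lie along the same ray → supports hypothesis ('monarch' splits into 2 tokens, so it cannot be tested directly)
- If 'cat', 'dog', 'lion' cluster in causal space → interesting structure
- If abstract concepts ('love', 'wisdom') show radial stratification → deep finding (note that 'wisdom' splits into 2 tokens)

Note that 'king', 'queen', 'cat', 'dog', and 'lion' are single tokens but fall outside the current 32k sample, so testing them requires a sample that includes their token IDs.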

In [7]:
# Also check some additional candidates: function words, acronyms, and other common words
additional_candidates = [
    'the', 'and', 'is', 'to', 'of',  # Common words (likely single tokens)
    'AI', 'DNA', 'USA', 'CEO',  # Acronyms
    'hello', 'goodbye', 'yes', 'no',  # Simple words
    'water', 'fire', 'earth', 'air',  # Elements
    'sun', 'moon', 'star', 'planet',  # Astronomy
    'good', 'bad', 'evil', 'pure',  # Morality
]

print("\nAdditional candidates:")
print("=" * 80)

for word in additional_candidates:
    tokens = tokenizer.encode(word, add_special_tokens=False)
    
    if len(tokens) == 1:
        token_id = tokens[0]
        token_text = tokenizer.decode([token_id])
        in_sample = token_id in token_set
        
        status = "✓ IN SAMPLE" if in_sample else "✗ not in sample"
        print(f"  ✓ '{word}' → token {token_id}: '{token_text}' [{status}]")
        
        if in_sample:
            available_tokens.append(("Additional", word, token_id))
    else:
        print(f"  ✗ '{word}' → {len(tokens)} tokens")

print(f"\n{'=' * 80}")
print(f"\nTotal available tokens: {len(available_tokens)}")


Additional candidates:
  ✓ 'the' → token 1782: 'the' [✓ IN SAMPLE]
  ✓ 'and' → token 437: 'and' [✗ not in sample]
  ✓ 'is' → token 285: 'is' [✗ not in sample]
  ✓ 'to' → token 983: 'to' [✓ IN SAMPLE]
  ✓ 'of' → token 1055: 'of' [✗ not in sample]
  ✓ 'AI' → token 15469: 'AI' [✗ not in sample]
  ✓ 'DNA' → token 55320: 'DNA' [✗ not in sample]
  ✓ 'USA' → token 24347: 'USA' [✗ not in sample]
  ✓ 'CEO' → token 78496: 'CEO' [✗ not in sample]
  ✓ 'hello' → token 14990: 'hello' [✗ not in sample]
  ✗ 'goodbye' → 2 tokens
  ✓ 'yes' → token 9693: 'yes' [✗ not in sample]
  ✓ 'no' → token 2152: 'no' [✗ not in sample]
  ✓ 'water' → token 12987: 'water' [✗ not in sample]
  ✓ 'fire' → token 10796: 'fire' [✓ IN SAMPLE]
  ✓ 'earth' → token 27541: 'earth' [✗ not in sample]
  ✓ 'air' → token 1310: 'air' [✗ not in sample]
  ✓ 'sun' → token 39519: 'sun' [✗ not in sample]
  ✓ 'moon' → token 67269: 'moon' [✗ not in sample]
  ✓ 'star' → token 11870: 'star' [✗ not in sample]
  ✓ 'planet' → token 50074: 'planet

## Next Steps

**For 07.57:** Pick 2-3 interesting tokens from the available list and:

1. **Causal neighborhood:** Find all tokens within radius R (e.g., 5 logometers)
2. **Cosine neighborhood:** Find tokens with cosine similarity > 0.999
3. **Compare:** Do they overlap? Are cosine-similar tokens at different radii?
4. **Decode:** What do the neighbors actually say?
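
The comparison in steps 1-3 can be sketched as plain set operations. This is an illustration under assumptions, not the analysis planned for 07.57: `compare_neighborhoods` is a hypothetical helper, `distances` stands in for precomputed causal distances from the query token (the layout of the real distance file is not shown here), and the toy vectors are not real embeddings. It is written in plain Python for clarity; with real tensors the same logic would be vectorized.

```python
import math
from typing import Dict, Sequence, Set


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def compare_neighborhoods(
    query: int,
    vectors: Dict[int, Sequence[float]],   # token_id -> vector
    distances: Dict[int, float],           # token_id -> distance from query (assumed precomputed)
    radius: float,
    min_cosine: float = 0.999,
) -> Dict[str, Set[int]]:
    """Compare the distance ball and the cosine cone around one token."""
    ball = {t for t, d in distances.items() if t != query and d <= radius}
    cone = {
        t for t, v in vectors.items()
        if t != query and cosine_similarity(vectors[query], v) > min_cosine
    }
    return {
        "both": ball & cone,
        "ball_only": ball - cone,
        "cone_only": cone - ball,
    }


# Toy data: token 1 shares token 0's direction but sits at twice the radius.
vectors = {0: [1.0, 0.0], 1: [2.0, 0.0], 2: [1.0, 0.01], 3: [0.0, 1.0]}
distances = {0: 0.0, 1: 1.0, 2: 0.01, 3: 1.414}
print(compare_neighborhoods(0, vectors, distances, radius=0.5))
```

Tokens that land in `cone_only` are the interesting case for the radial hypothesis: same direction as the query, but too far away to fall inside the ball.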

**Test the linear representation hypothesis:**
- Are semantically related words along the same ray?
- Does radial position correlate with frequency, specificity, or abstraction?
- Do different semantic categories show different geometric patterns?