# Spin‑Prime Encoding Demo 🌌🔢

This Colab‑ready notebook maps **word vectors** to **prime‑number keys** using a toy *spinor* twist encoding: each vector is paired with its negation, and its norm is quantized to a prime.

**Pipeline**
1. Load small GloVe vectors (50‑D)
2. Select a handful of nouns & verbs from **WordNet**
3. *Twist‑encode* each vector as the pair (v, −v), a toy stand‑in for the SU(2) double cover
4. Quantize the vector norm → next **prime** strictly above it (indivisible magnitude key)
5. Visualize the original vectors (PCA‑2D) with prime labels
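
Step 4 is implemented in this notebook with sympy's `nextprime`, which returns the smallest prime strictly greater than its argument, so the key is never below the quantized norm. If a true nearest prime is preferred, one possible variant is sketched below. `nearest_prime` is a hypothetical helper, not part of the notebook, and breaking ties toward the smaller prime is an assumption.

```python
from sympy import isprime, nextprime, prevprime

def nearest_prime(n: int) -> int:
    """Return the prime closest to n (hypothetical helper; ties go to the smaller prime)."""
    if n <= 2:
        return 2  # 2 is the smallest prime
    if isprime(n):
        return n
    lower = int(prevprime(n))  # largest prime below n
    upper = int(nextprime(n))  # smallest prime above n
    return lower if n - lower <= upper - n else upper

print(nearest_prime(10), nearest_prime(14))  # → 11 13
```

Swapping this in for `nextprime` would change the keys whenever the quantized norm sits closer to the prime below it.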


In [None]:
# Install required libraries (lightweight)
!pip -q install gensim nltk sympy scikit-learn

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import gensim.downloader as api
import nltk
from sympy import nextprime
from sklearn.decomposition import PCA
nltk.download('wordnet')
from nltk.corpus import wordnet as wn

## Helper functions

In [None]:
def twist_encode(vec: np.ndarray):
    """Return simulated spinor double‑cover (v, −v)."""
    return vec, -vec

def encode_magnitude_to_prime(mag: float, scale: int = 10_000) -> int:
    """Quantize magnitude and map to the next prime strictly above it."""
    scaled = max(2, int(round(mag * scale)))
    return int(nextprime(scaled))

def spin_prime_encode(vec: np.ndarray):
    spin_pos, spin_neg = twist_encode(vec)
    prime_code = encode_magnitude_to_prime(np.linalg.norm(vec))
    return spin_pos, spin_neg, prime_code
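
One consequence of the helpers above is worth seeing concretely: the prime key depends only on the norm, and `v` and `-v` have the same norm, so both halves of the twist pair receive the same prime. The sketch below restates the quantize-then-`nextprime` rule in a hypothetical `prime_key` function so that it runs on its own, using a toy 2‑D vector rather than a GloVe vector.

```python
import numpy as np
from sympy import isprime, nextprime

def prime_key(vec: np.ndarray, scale: int = 10_000) -> int:
    """Same quantize-then-nextprime rule as encode_magnitude_to_prime."""
    scaled = max(2, int(round(float(np.linalg.norm(vec)) * scale)))
    return int(nextprime(scaled))

v = np.array([3.0, 4.0])  # toy vector with norm exactly 5.0
key_pos, key_neg = prime_key(v), prime_key(-v)

print(key_pos, key_neg)   # the two keys are identical
print(isprime(key_pos))   # and the key is a prime above 50_000
```

In other words, the prime labels the magnitude only; the sign information lives entirely in the `(v, -v)` pair.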

## Load word vectors & sample WordNet terms

In [None]:
# Load a small 50‑dimensional GloVe model (~70 MB, quick)
model = api.load('glove-wiki-gigaword-50')

# Hand-picked nouns and verbs that are also WordNet lemmas (hard-coded, not queried from WordNet)
sample_lemmas = ['cat', 'dog', 'car', 'vehicle', 'run', 'walk', 'music', 'art', 'computer', 'science']

vecs = []
words = []
for w in sample_lemmas:
    if w in model:
        words.append(w)
        vecs.append(model[w])

vecs = np.stack(vecs)
print(f"Loaded {len(vecs)} word vectors.")
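
The loop above silently skips any lemma missing from the model, and `np.stack` raises an error if nothing was found. A slightly more defensive version might report what was dropped. The sketch below is an assumption-laden illustration: `collect_vectors` is a hypothetical helper, and a plain dict stands in for the GloVe model (both support `in` and `[]` lookups).

```python
import numpy as np

def collect_vectors(lemmas, lookup):
    """Split lemmas into (found words, stacked vectors, missing words)."""
    found, missing = [], []
    for w in lemmas:
        (found if w in lookup else missing).append(w)
    if not found:
        raise ValueError("none of the requested lemmas are in the vocabulary")
    return found, np.stack([lookup[w] for w in found]), missing

# Toy stand-in for the GloVe model: a dict of 3-D vectors (illustration only)
toy_model = {"cat": np.ones(3), "dog": np.zeros(3)}
found, mat, missing = collect_vectors(["cat", "dog", "zzyzx"], toy_model)
print(found, mat.shape, missing)
```

Keeping `found` and the stacked matrix in the same order preserves the word-to-row alignment that the later cells rely on.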

## Spin‑Prime encode each vector

In [None]:
records = []
for word, vec in zip(words, vecs):
    spin_pos, spin_neg, prime_code = spin_prime_encode(vec)
    records.append({
        'word': word,
        'prime': prime_code,
        'norm': np.linalg.norm(vec),
        'spin_pos_head': spin_pos[:5],  # preview first 5 dims
    })

df = pd.DataFrame(records)
df
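
Because `nextprime` maps every integer in a prime gap to the same prime, two different norms can share one key. The sketch below shows how such collisions could be detected. The norms are made-up values, not GloVe norms, and `prime_for_norm` is a hypothetical restatement of the notebook's quantization rule.

```python
from collections import defaultdict
from sympy import nextprime

def prime_for_norm(norm: float, scale: int = 10_000) -> int:
    """Quantize-then-nextprime, as in encode_magnitude_to_prime."""
    return int(nextprime(max(2, int(round(norm * scale)))))

# Hypothetical norms chosen to fall in the same prime gap
toy_norms = {"alpha": 1.0000, "beta": 1.0001, "gamma": 2.5}

by_prime = defaultdict(list)
for word, norm in toy_norms.items():
    by_prime[prime_for_norm(norm)].append(word)

collisions = {p: ws for p, ws in by_prime.items() if len(ws) > 1}
print(collisions)  # → {10007: ['alpha', 'beta']}
```

Raising `scale` separates the quantized integers further, but it cannot remove collisions entirely, since prime gaps also grow with magnitude.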

## Visualize in 2‑D PCA with Prime Labels

In [None]:
pca = PCA(n_components=2)
coords = pca.fit_transform(vecs)

fig, ax = plt.subplots(figsize=(6,6))
for (x, y), word, prime in zip(coords, words, df['prime']):
    ax.scatter(x, y, s=60)
    ax.text(x+0.02, y+0.02, f"{word}\n{prime}", fontsize=9)

ax.set_xlabel('PCA‑1')
ax.set_ylabel('PCA‑2')
ax.set_title('Word Vectors → Spin‑Prime Encoding')
ax.grid(True)
plt.show()
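
When reading the plot, note that the prime labels come from the norms of the original 50‑D vectors, while PCA centers the data and keeps only two directions, so neighbors in the plot need not have similar primes. A quick way to gauge how much the projection discards is `explained_variance_ratio_`. The sketch below uses synthetic random vectors as a stand-in for the GloVe matrix, so the printed percentage is illustrative only.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
toy_vecs = rng.normal(size=(10, 50))  # synthetic stand-in for the 10 GloVe vectors

pca = PCA(n_components=2)
coords = pca.fit_transform(toy_vecs)

# Fraction of total variance kept by the two plotted components
retained = float(pca.explained_variance_ratio_.sum())
print(f"coords shape: {coords.shape}, variance retained: {retained:.1%}")
```

Running the same two lines on the notebook's `pca` object after the fit above would give the figure for the real vectors.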

## Next Steps
* Try larger models (e.g. the 300‑D `word2vec-google-news-300` from gensim's downloader)
* Explore **Gaussian primes** or multi‑prime tuples for richer encodings
* Investigate whether spin consistency helps analogy tasks
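
As a starting point for the Gaussian-prime idea, the standard primality criterion for a Gaussian integer a + bi can be sketched as follows. `is_gaussian_prime` here is a hypothetical helper written for illustration. How a word vector would be mapped to a pair (a, b) is left open, since the notebook does not define it.

```python
from sympy import isprime

def is_gaussian_prime(a: int, b: int) -> bool:
    """Test whether a + b*i is a Gaussian prime (standard criterion)."""
    if a != 0 and b != 0:
        # Off-axis: prime exactly when the norm a^2 + b^2 is an ordinary prime
        return isprime(a * a + b * b)
    # On an axis: the nonzero part must be a prime congruent to 3 mod 4
    n = abs(a) + abs(b)
    return isprime(n) and n % 4 == 3

print([is_gaussian_prime(*z) for z in [(1, 1), (3, 0), (5, 0), (2, 1)]])
# → [True, True, False, True]
```

Note that 5 is not a Gaussian prime because it factors as (2 + i)(2 − i), which is why an ordinary prime key does not automatically carry over to the complex setting.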
