# Tutorial 7: The Probing Theorem
## The Yoneda Lemma

---

### The Crown Jewel

*To the research assistant:*

*In Year 892, Tessery Vold made her most profound claim. She called it the Probing Theorem:*

> *"Tell me every creature that passes through the stakdur's territory, and I will tell you what a stakdur is—without ever seeing one."*

*This statement scandalized the Capital's natural philosophers. How can you know what something IS from how other things interact with it? Surely the stakdur has intrinsic properties—teeth, claws, behaviors—that define it independently of its relationships?*

*Marden Krell objected: "This is circular reasoning. To know what passes through the stakdur's territory, you must first know what a stakdur is and where its territory lies."*

*Vold's response was devastating: "I observe passages. I do not first define objects. The passages exist. From them, the objects emerge. The circularity you see is only in your ontology—not in mine."*

*The Probing Theorem is what modern category theorists call the Yoneda Lemma. It is often called the most important result in category theory. Your task: understand why.*

—*Archive Review Committee, Year 934*

---

## What You Will Learn

In this tutorial, you will learn to:

1. State the **Yoneda Lemma** and understand its components
2. See why objects are "determined" by their probing patterns
3. Understand the **Yoneda embedding** and why it's fully faithful
4. Connect the Yoneda Lemma to embeddings and representations in ML
5. Implement Yoneda-style reasoning with passage data

By the end, you will understand:
- Why Nat(Hom(-, X), F) ≅ F(X)
- The philosophical implications of "objects are relationships"
- How word embeddings secretly use Yoneda-style reasoning

---

## The Yoneda Lemma Statement

For any category C, object X in C, and functor F: C^op → Set:

**Nat(Hom(-, X), F) ≅ F(X)**

In words:
- The set of natural transformations from the probing functor Hom(-, X) to any functor F
- Is isomorphic to the set F(X)

### What This Means

1. **Objects can be recovered from functors**: X ↦ Hom(-, X) is injective (up to isomorphism)
2. **Natural transformations are "probing"**: Giving a natural transformation η: Hom(-, X) ⇒ F is the same as picking an element of F(X)
3. **Representable functors are special**: Hom(-, X) "represents" the object X

### Vold's Interpretation

> *"The Probing Theorem says: if you know all the coherent ways to shift from the stakdur's probing pattern to any other pattern, you know everything about the stakdur that can be known categorically. The object is not the thing—it is the web of coherent relationships."*

---

## Part 1: Setting Up

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import networkx as nx
from collections import defaultdict
from itertools import product

# Set style
plt.style.use('seaborn-v0_8-whitegrid')
sns.set_palette('deep')

print("Libraries loaded. Ready to explore the Probing Theorem.")

In [None]:
# Load the passage diagram data
BASE_URL = "https://raw.githubusercontent.com/buildLittleWorlds/densworld-datasets/main/data/"

passages = pd.read_csv(BASE_URL + "passage_diagrams.csv")
classifications = pd.read_csv(BASE_URL + "archive_classifications.csv")

print(f"Passages loaded: {len(passages)} morphisms")
print(f"Classifications loaded: {len(classifications)} records")

# Get all objects
all_objects = sorted(set(passages['source_object']) | set(passages['target_object']))
print(f"Objects in passage category: {len(all_objects)}")

---

## Part 2: The Yoneda Embedding

The **Yoneda embedding** is the functor:

y: C → [C^op, Set]

That sends each object X to the functor Hom(-, X).

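The Yoneda Lemma implies that this embedding is **fully faithful**:
- Distinct morphisms X → Y induce distinct natural transformations Hom(-, X) ⇒ Hom(-, Y) (faithful)
- Every natural transformation Hom(-, X) ⇒ Hom(-, Y) comes from a morphism X → Y (full)

Consequently, non-isomorphic objects have non-isomorphic probing functors.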
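
Full faithfulness can be checked by brute force on a small example. The sketch below uses a hypothetical toy category rather than the passage data: the divisors of 12 ordered by divisibility, where Hom(a, b) has exactly one element when a divides b and is empty otherwise. The names `hom_size` and `nat_trans_count` are illustrative assumptions.

```python
# Toy category (an assumption for illustration): divisors of 12 under divisibility.
objects = [1, 2, 3, 4, 6, 12]

def hom_size(a, b):
    """|Hom(a, b)| in the divisibility poset: 1 if a divides b, else 0."""
    return 1 if b % a == 0 else 0

def nat_trans_count(x, y):
    """
    Number of natural transformations Hom(-, x) => Hom(-, y).
    Every hom-set here has at most one element, so a transformation exists
    iff no nonempty Hom(z, x) has to map into an empty Hom(z, y). Naturality
    then holds automatically, and the transformation is unique.
    """
    exists = all(hom_size(z, y) >= hom_size(z, x) for z in objects)
    return 1 if exists else 0

# Full faithfulness: Nat(Hom(-, x), Hom(-, y)) has the same size as Hom(x, y).
fully_faithful = all(
    nat_trans_count(x, y) == hom_size(x, y)
    for x in objects
    for y in objects
)
print("Fully faithful on the toy poset:", fully_faithful)
```

In a poset the check is simple because hom-sets are so small; a general category needs the naturality squares verified explicitly.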

In [None]:
def hom_to(target, passages_df):
    """
    Compute Hom(-, target): morphisms into target.
    Returns a dict: source → list of {'passage_id', 'morphism_type'} dicts.
    """
    incoming = passages_df[passages_df['target_object'] == target]
    result = defaultdict(list)
    for _, row in incoming.iterrows():
        result[row['source_object']].append({
            'passage_id': row['passage_id'],
            'morphism_type': row['morphism_type']
        })
    return dict(result)

# Compute Hom(-, X) for each object X
yoneda_embedding = {obj: hom_to(obj, passages) for obj in all_objects}

print("Yoneda Embedding: Each object X ↦ Hom(-, X)")
print("=" * 50)

In [None]:
# Display some examples
example_objects = ['stakdur_territory', 'reed_marsh', 'boundary_zone']

for obj in example_objects:
    hom = yoneda_embedding[obj]
    print(f"\nHom(-, {obj}):")
    if hom:
        for source, morphisms in hom.items():
            for m in morphisms:
                print(f"    {source} → {obj} ({m['morphism_type']})")
    else:
        print("    (no morphisms target this object)")

The Yoneda embedding captures each object as a "pattern of incoming morphisms." This is the essence of Vold's Probing Theorem.

---

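## Part 3: Injectivity on Objects (Different Objects, Different Functors)

A consequence of full faithfulness is that the Yoneda embedding is injective on objects up to isomorphism: if X ≇ Y, then Hom(-, X) ≇ Hom(-, Y).

Let's check this on the passage data by comparing probing patterns. The signature used below, a set of (source, morphism type) pairs, is a coarse proxy for the full functor.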

In [None]:
def probing_signature(obj, passages_df):
    """
    Create a signature for Hom(-, obj) that can be compared for equality.
    Returns a frozenset of (source, morphism_type) pairs.
    """
    incoming = passages_df[passages_df['target_object'] == obj]
    return frozenset((row['source_object'], row['morphism_type']) 
                      for _, row in incoming.iterrows())

# Compute signatures for all objects
signatures = {obj: probing_signature(obj, passages) for obj in all_objects}

# Check for collisions (different objects with same probing pattern)
sig_to_objects = defaultdict(list)
for obj, sig in signatures.items():
    sig_to_objects[sig].append(obj)

print("Checking that probing patterns separate objects:")
print("=" * 50)

collisions = {sig: objs for sig, objs in sig_to_objects.items() if len(objs) > 1}

if collisions:
    print(f"\nFound {len(collisions)} signature collisions:")
    for sig, objs in collisions.items():
        print(f"  Objects with same probing pattern: {objs}")
else:
    print("\nNo collisions found: every object has a unique probing pattern.")
    print("The probing patterns separate the objects of this category.")

In [None]:
# Some objects may have empty probing patterns (nothing targets them)
empty_pattern_objects = [obj for obj, sig in signatures.items() if len(sig) == 0]

print(f"\nObjects with empty probing pattern: {len(empty_pattern_objects)}")
if empty_pattern_objects:
    print("These are source-like objects: nothing points to them.")
    print(f"Examples: {empty_pattern_objects[:5]}")

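Objects with empty probing patterns are *not* distinguished by this signature: they all map to the same empty set, so they collide with one another even when their *outgoing* morphisms differ. This is a gap in the data, not in the theorem.

In a fuller category with identity morphisms for every object, Hom(X, X) always contains id_X, so no probing pattern would be truly empty.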
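
One way to close this gap is to add the missing identities before computing signatures. The sketch below is a minimal illustration on a hypothetical three-row table that reuses the column names from `passage_diagrams.csv`. The helper names `add_identities` and `signature`, the `id_` prefix for passage IDs, and the morphism types `migration` and `drift` are assumptions, not part of the dataset.

```python
# Hypothetical stand-in for the passage table: plain dict rows with the same
# column names as the real data.
toy_rows = [
    {"passage_id": "p1", "source_object": "reed_marsh",
     "target_object": "boundary_zone", "morphism_type": "migration"},
    {"passage_id": "p2", "source_object": "reed_marsh",
     "target_object": "open_water", "morphism_type": "migration"},
    {"passage_id": "p3", "source_object": "open_water",
     "target_object": "boundary_zone", "morphism_type": "drift"},
]

def add_identities(rows):
    """Return rows plus one identity row for every object that lacks one."""
    objs = sorted({r["source_object"] for r in rows} | {r["target_object"] for r in rows})
    have = {
        r["source_object"] for r in rows
        if r["source_object"] == r["target_object"] and r["morphism_type"] == "identity"
    }
    extra = [
        {"passage_id": f"id_{o}", "source_object": o,
         "target_object": o, "morphism_type": "identity"}
        for o in objs if o not in have
    ]
    return rows + extra

def signature(obj, rows):
    """Probing signature: the set of (source, morphism_type) pairs targeting obj."""
    return frozenset(
        (r["source_object"], r["morphism_type"])
        for r in rows if r["target_object"] == obj
    )

completed = add_identities(toy_rows)

# reed_marsh has no incoming passages in the toy table, so its signature starts empty.
print(signature("reed_marsh", toy_rows))
print(signature("reed_marsh", completed))
```

After completion, every signature contains at least the object's own identity pair, so two objects can no longer collide merely because nothing points to them.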

---

## Part 4: The Yoneda Lemma in Action

The Yoneda Lemma says:

**Nat(Hom(-, X), F) ≅ F(X)**

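Let's unpack this with a concrete example. We'll take:
- X = stakdur_territory
- F = the "outgoing morphisms" functor: F(Y) is the set of all morphisms out of Y, over all targets

F(Y) = ⊔_Z Hom(Y, Z), so |F(Y)| is the out-degree of Y. The code below stores only this size. F is contravariant: a morphism f: Y′ → Y sends g ∈ F(Y) to g ∘ f ∈ F(Y′).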

In [None]:
# Define a simple functor F: C^op → Set
# F(Y) = number of morphisms FROM Y (out-degree)

def out_degree_functor(obj, passages_df):
    """F(obj) = count of outgoing morphisms from obj."""
    return len(passages_df[passages_df['source_object'] == obj])

# Compute F for all objects
F = {obj: out_degree_functor(obj, passages) for obj in all_objects}

print("Functor F: object ↦ out-degree")
print("=" * 40)
for obj, count in sorted(F.items(), key=lambda x: -x[1])[:10]:
    print(f"  F({obj}) = {count}")

In [None]:
# The Yoneda Lemma says:
# Nat(Hom(-, X), F) ≅ F(X)

# For X = stakdur_territory:
X = 'stakdur_territory'

print(f"For X = {X}:")
print(f"  |F(X)| = {F[X]}")
print("\n  Yoneda says: the natural transformations Hom(-, X) ⇒ F")
print(f"  are in bijection with the {F[X]} element(s) of F(X)")

In [None]:
# What does a natural transformation η: Hom(-, X) ⇒ F look like?
# For each object Y, η_Y: Hom(Y, X) → F(Y)
# This maps morphisms Y → X to elements of F(Y) = out-degree of Y

print("Components of a natural transformation η: Hom(-, X) ⇒ F")
print("=" * 60)

hom_X = yoneda_embedding[X]

for Y in hom_X.keys():
    hom_YX = hom_X[Y]  # morphisms from Y to X
    F_Y = F[Y]  # out-degree of Y
    print(f"\n  Y = {Y}")
    print(f"    Hom({Y}, {X}) has {len(hom_YX)} morphism(s)")
    print(f"    F({Y}) = {F_Y}")
    print(f"    η_Y maps {len(hom_YX)} morphism(s) into the set of size {F_Y}")

The Yoneda Lemma is saying: instead of specifying all these η_Y components (which must satisfy naturality), you can just pick one element of F(X).

This is a remarkable compression: a whole family of coherent data, one component for every object of the category, reduces to a single choice.
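
The count can be verified by brute force on a category small enough to enumerate. The sketch below assumes a hypothetical toy category with two objects and a single non-identity arrow f: A → B, plus a made-up presheaf F. It lists every family of components, keeps those that satisfy naturality, and compares the count with |F(X)|.

```python
from itertools import product

# Toy category (an assumption for illustration): objects A and B,
# identities, and a single non-identity arrow f: A -> B.
objects = ["A", "B"]
arrows = {"id_A": ("A", "A"), "id_B": ("B", "B"), "f": ("A", "B")}

def compose(g, u):
    """g ∘ u (first u, then g). Here one of the two is always an identity."""
    return g if u.startswith("id_") else u

def hom(Y, X):
    """Hom(Y, X) as a list of arrow names."""
    return [a for a, (src, tgt) in arrows.items() if (src, tgt) == (Y, X)]

# A presheaf F: C^op -> Set. F sends f: A -> B to a function F(B) -> F(A).
F_obj = {"A": ["p", "q", "r"], "B": [0, 1]}
F_f = {0: "p", 1: "q"}

def F_apply(u, x):
    """Apply F(u) to x; identities act trivially."""
    return x if u.startswith("id_") else F_f[x]

def natural_transformations(X):
    """Enumerate all natural transformations Hom(-, X) => F by brute force."""
    # All candidate components eta_Y: Hom(Y, X) -> F(Y), one list per object Y.
    candidates = []
    for Y in objects:
        dom = hom(Y, X)
        candidates.append(
            [dict(zip(dom, vals)) for vals in product(F_obj[Y], repeat=len(dom))]
        )
    found = []
    for family in product(*candidates):
        eta = dict(zip(objects, family))
        # Naturality: for u: src -> tgt and g in Hom(tgt, X),
        # eta_src(g ∘ u) must equal F(u)(eta_tgt(g)).
        natural = all(
            eta[src][compose(g, u)] == F_apply(u, eta[tgt][g])
            for u, (src, tgt) in arrows.items()
            for g in hom(tgt, X)
        )
        if natural:
            found.append(eta)
    return found

for X in objects:
    print(X, len(natural_transformations(X)), len(F_obj[X]))
```

For X = B there are six candidate families but only two survive the naturality check, matching the two elements of F(B).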

---

## Part 5: The Bijection Explicitly

Let's construct the bijection from the Yoneda Lemma:

**Given** a natural transformation η: Hom(-, X) ⇒ F
**Get** an element of F(X) by: η_X(id_X)

**Given** an element x ∈ F(X)
**Get** a natural transformation by: η_Y(f) = F(f)(x) for each morphism f: Y → X
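
The second formula is not a free choice: naturality forces it. A short derivation, writing x = η_X(id_X) and applying the naturality square for f: Y → X to the element id_X:

$$
\begin{aligned}
\eta_Y(f) &= \eta_Y(\mathrm{id}_X \circ f) \\
          &= \eta_Y\big(\mathrm{Hom}(f, X)(\mathrm{id}_X)\big) \\
          &= F(f)\big(\eta_X(\mathrm{id}_X)\big) && \text{(naturality of } \eta \text{ at } f\text{)} \\
          &= F(f)(x).
\end{aligned}
$$

So every component of η is determined by the single element x. Conversely, any x ∈ F(X) defines a natural η this way, because F preserves composition.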

In [None]:
# Direction 1: Natural transformation → Element of F(X)

# The key is: evaluate at the identity morphism id_X: X → X
# η_X(id_X) gives an element of F(X)

X = 'stakdur_territory'

# Check if there's an identity morphism for X in the data
identity_X = passages[
    (passages['source_object'] == X) & 
    (passages['target_object'] == X) &
    (passages['morphism_type'] == 'identity')
]

if len(identity_X) > 0:
    print(f"Identity morphism for {X} exists: {identity_X['passage_id'].values[0]}")
    print(f"\nTo get an element of F(X) = {F[X]}, evaluate η at this identity.")
    print("Since F(X) = out-degree, we're picking one of the outgoing morphisms.")
else:
    print(f"No explicit identity morphism for {X} in data.")
    print("(In a proper category, we assume it exists.)")

In [None]:
# Direction 2: Element of F(X) → Natural transformation

# Pick an element of F(X) = out-degree of X
# For simplicity, let's say we pick "the first outgoing morphism"

outgoing_from_X = passages[passages['source_object'] == X]
print(f"Outgoing morphisms from {X}:")
print(outgoing_from_X[['passage_id', 'target_object', 'morphism_type']])

print(f"\nF({X}) = {len(outgoing_from_X)} (the out-degree)")
print("\nChoosing element x = 'first outgoing morphism' ∈ F(X)")
print("This determines a natural transformation η: Hom(-, X) ⇒ F")

The Yoneda bijection is surprisingly simple:
- **Forward**: Evaluate η at the identity
- **Backward**: Extend by functoriality
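
Both directions can be written out on a category small enough to inspect by hand. The sketch below assumes a hypothetical toy category with one arrow f: A → B plus identities and a made-up presheaf F. The function names `to_transformation` and `to_element` are illustrative, not from the dataset or any library.

```python
# Toy setup (an assumption for illustration): one arrow f: A -> B plus identities.
objects = ["A", "B"]
arrows = {"id_A": ("A", "A"), "id_B": ("B", "B"), "f": ("A", "B")}

# Presheaf F: sets F(A), F(B) and the function F(f): F(B) -> F(A).
F_obj = {"A": ["p", "q", "r"], "B": [0, 1]}
F_f = {0: "p", 1: "q"}

def F_apply(u, x):
    """Apply F(u) to x; identities act trivially."""
    return x if u.startswith("id_") else F_f[x]

def hom(Y, X):
    """Hom(Y, X) as a list of arrow names."""
    return [a for a, st in arrows.items() if st == (Y, X)]

def to_transformation(x, X):
    """Backward: extend x in F(X) to eta with eta_Y(g) = F(g)(x) for g: Y -> X."""
    return {Y: {g: F_apply(g, x) for g in hom(Y, X)} for Y in objects}

def to_element(eta, X):
    """Forward: evaluate the X-component of eta at the identity on X."""
    return eta[X][f"id_{X}"]

X = "B"
for x in F_obj[X]:
    eta = to_transformation(x, X)
    # Round trip: evaluating at the identity recovers the element we started from.
    print(x, eta, to_element(eta, X))
```

The round trip returning the original element is one half of the bijection; the other half is the fact that naturality leaves no freedom in the remaining components.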

---

## Part 6: Objects as Representable Functors

The Yoneda Lemma has a corollary: the functor Hom(-, X) **represents** the object X.

If Hom(-, X) ≅ Hom(-, Y) as functors, then X ≅ Y as objects.

This is Vold's claim: **the probing pattern determines the object**.
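
A sketch of why the corollary follows from full faithfulness, writing y for the Yoneda embedding and α for the given isomorphism:

$$
\begin{aligned}
&\alpha : y(X) \xrightarrow{\ \cong\ } y(Y), \qquad \alpha = y(f), \quad \alpha^{-1} = y(g)
  && \text{(fullness, for some } f : X \to Y,\ g : Y \to X\text{)} \\
&y(g \circ f) = y(g) \circ y(f) = \alpha^{-1} \circ \alpha = \mathrm{id}_{y(X)} = y(\mathrm{id}_X)
  && \text{(functoriality)} \\
&g \circ f = \mathrm{id}_X, \quad \text{and symmetrically } f \circ g = \mathrm{id}_Y
  && \text{(faithfulness)}
\end{aligned}
$$

Hence f is an isomorphism X ≅ Y.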

In [None]:
# Compare probing patterns of different objects

def compare_probing_patterns(obj1, obj2, passages_df):
    """Compare Hom(-, obj1) and Hom(-, obj2)."""
    sig1 = probing_signature(obj1, passages_df)
    sig2 = probing_signature(obj2, passages_df)
    
    common = sig1 & sig2
    only_1 = sig1 - sig2
    only_2 = sig2 - sig1
    
    return {
        'common': common,
        'only_obj1': only_1,
        'only_obj2': only_2,
        'similarity': len(common) / max(len(sig1 | sig2), 1)
    }

# Compare some objects
pairs = [
    ('stakdur_territory', 'reed_marsh'),
    ('grimslew_pool', 'open_water'),
    ('boundary_zone', 'capital_outskirts')
]

print("Comparing probing patterns:")
print("=" * 50)

for obj1, obj2 in pairs:
    result = compare_probing_patterns(obj1, obj2, passages)
    print(f"\n{obj1} vs {obj2}:")
    print(f"  Similarity: {result['similarity']:.2%}")
    print(f"  Common sources: {len(result['common'])}")
    print(f"  Unique to {obj1}: {len(result['only_obj1'])}")
    print(f"  Unique to {obj2}: {len(result['only_obj2'])}")

In [None]:
# Visualize similarity between all objects based on probing patterns

# Compute pairwise similarities
n = len(all_objects)
similarity_matrix = np.zeros((n, n))

for i, obj1 in enumerate(all_objects):
    for j, obj2 in enumerate(all_objects):
        if i == j:
            similarity_matrix[i, j] = 1.0
        elif i < j:
            result = compare_probing_patterns(obj1, obj2, passages)
            similarity_matrix[i, j] = result['similarity']
            similarity_matrix[j, i] = result['similarity']

print(f"Computed {n}x{n} similarity matrix based on probing patterns")

In [None]:
# Show most similar pairs (excluding identity)
similar_pairs = []
for i in range(n):
    for j in range(i+1, n):
        if similarity_matrix[i, j] > 0:
            similar_pairs.append((all_objects[i], all_objects[j], similarity_matrix[i, j]))

similar_pairs.sort(key=lambda x: -x[2])

print("Most similar object pairs (by probing pattern):")
print("=" * 50)
for obj1, obj2, sim in similar_pairs[:10]:
    print(f"  {obj1} ↔ {obj2}: {sim:.2%}")

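Objects with similar probing patterns are "similar" in a categorical sense. Strictly, the Yoneda Lemma covers only the extreme case: isomorphic probing functors mean isomorphic objects. The graded Jaccard similarity used here is a heuristic extension of that idea, not a consequence of the theorem.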

---

## Part 7: The ML Connection — Word Embeddings

The Yoneda perspective has a close analogue in word embeddings:

- **Words** are objects
- **Co-occurrence** is morphism-like ("this word appears near that word")
- **Embeddings** are like Hom(-, X): they capture how other words relate to X

John Firth's famous quote:
> "You shall know a word by the company it keeps."

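This is the Yoneda perspective applied to language. The match is an analogy, not a theorem: co-occurrence counts are not literally morphisms in a category.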

In [None]:
# Demonstrate Yoneda-style embeddings
# We'll embed each object as a vector based on what points to it

# Create a feature for each possible (source, morphism_type) pair
all_morphism_pairs = set()
for _, row in passages.iterrows():
    all_morphism_pairs.add((row['source_object'], row['morphism_type']))

feature_list = sorted(all_morphism_pairs)
feature_to_idx = {f: i for i, f in enumerate(feature_list)}

print(f"Feature space dimension: {len(feature_list)}")
print(f"Each object will be embedded as a {len(feature_list)}-dimensional vector")

In [None]:
# Build Yoneda embeddings
yoneda_vectors = np.zeros((len(all_objects), len(feature_list)))

for i, obj in enumerate(all_objects):
    sig = probing_signature(obj, passages)
    for pair in sig:
        if pair in feature_to_idx:
            yoneda_vectors[i, feature_to_idx[pair]] = 1

print(f"Yoneda embedding matrix: {yoneda_vectors.shape}")
print(f"Sparsity: {1 - np.count_nonzero(yoneda_vectors) / yoneda_vectors.size:.2%}")

In [None]:
# Compute cosine similarity between Yoneda embeddings
from sklearn.metrics.pairwise import cosine_similarity

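# scikit-learn's cosine_similarity leaves all-zero rows as zeros (similarity 0),
# so no epsilon is needed; adding one would make every empty pattern look identical.
yoneda_sim = cosine_similarity(yoneda_vectors)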

print("Cosine similarity based on Yoneda embeddings:")
print("=" * 50)

In [None]:
# Find most similar pairs
yoneda_pairs = []
for i in range(len(all_objects)):
    for j in range(i+1, len(all_objects)):
        if yoneda_sim[i, j] > 0.01:  # Skip near-zero
            yoneda_pairs.append((all_objects[i], all_objects[j], yoneda_sim[i, j]))

yoneda_pairs.sort(key=lambda x: -x[2])

print("Most similar objects (Yoneda embedding):")
for obj1, obj2, sim in yoneda_pairs[:10]:
    print(f"  {obj1} ↔ {obj2}: {sim:.3f}")

This mirrors how count-based word embeddings work:
1. Represent each word by its context (what words appear near it)
2. Similar contexts → similar embeddings
3. The embedding "knows" the word by the company it keeps

The Yoneda Lemma supplies the guiding intuition: an object is characterized by how everything else relates to it.
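
The three steps above can be sketched on a toy corpus. This is a minimal count-based illustration under stated assumptions: the corpus is made up, each sentence is treated as one context window, and the helper names `context_vector` and `cosine` are hypothetical. Trained embeddings such as word2vec learn dense vectors instead of using raw counts.

```python
import math
from collections import Counter

# Hypothetical three-sentence corpus; each sentence is one context window.
corpus = [
    "the cat drinks milk",
    "the dog drinks water",
    "the bank holds money",
]

def context_vector(word):
    """Count the words that co-occur with `word` in the same sentence."""
    counts = Counter()
    for sentence in corpus:
        tokens = sentence.split()
        if word in tokens:
            counts.update(t for t in tokens if t != word)
    return counts

def cosine(u, v):
    """Cosine similarity of two sparse count vectors; 0.0 if either is empty."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

cat, dog, bank = (context_vector(w) for w in ("cat", "dog", "bank"))
print(f"cat vs dog:  {cosine(cat, dog):.3f}")
print(f"cat vs bank: {cosine(cat, bank):.3f}")
```

"cat" and "dog" share two of their three context words, while "cat" and "bank" share only "the", so the first pair scores higher. The binary probing vectors in this tutorial play the same role as these context counts.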

---

## Part 8: Dimensionality Reduction of Yoneda Embeddings

In [None]:
# Use PCA to reduce Yoneda embeddings to 2D for visualization
from sklearn.decomposition import PCA

# Only include objects with non-zero embeddings
non_zero_mask = yoneda_vectors.sum(axis=1) > 0
active_objects = [obj for obj, mask in zip(all_objects, non_zero_mask) if mask]
active_vectors = yoneda_vectors[non_zero_mask]

if len(active_objects) >= 2:
    pca = PCA(n_components=2)
    reduced = pca.fit_transform(active_vectors)
    
    print(f"Reduced {len(active_objects)} objects to 2D")
    print(f"Variance explained: {pca.explained_variance_ratio_.sum():.2%}")
else:
    print("Not enough non-zero embeddings for PCA")
    reduced = None

In [None]:
# Visualize Yoneda embeddings in 2D
if reduced is not None and len(reduced) > 0:
    fig, ax = plt.subplots(figsize=(14, 10))
    
    ax.scatter(reduced[:, 0], reduced[:, 1], s=100, alpha=0.6, c='steelblue')
    
    for i, obj in enumerate(active_objects):
        ax.annotate(obj, (reduced[i, 0], reduced[i, 1]), fontsize=8, alpha=0.8)
    
    ax.set_xlabel('PC1')
    ax.set_ylabel('PC2')
    ax.set_title('Yoneda Embeddings (PCA)\nObjects as "How They\'re Probed"', fontsize=12)
    plt.tight_layout()
    plt.show()
else:
    print("Visualization not available")

Objects that cluster together have similar probing patterns. This is the Yoneda perspective: similarity IS similarity of probing patterns.

---

## Part 9: Philosophical Implications

The Yoneda Lemma has profound philosophical implications:

### 1. Anti-Essentialism
Objects don't have "intrinsic essences." They are what they do—how they relate to everything else.

### 2. Holism
You can't understand an object in isolation. Its identity depends on the entire network of relationships.

### 3. Structuralism
Structure is more fundamental than substance. The "what" is determined by the "how."

### Vold's Claim
> *"Marden Krell asked: what is the stakdur before we observe its passages? I say: that question has no answer. Before passages, there is no stakdur. The stakdur emerges from the pattern of passages. It does not precede them."*

In [None]:
# Final demonstration: reconstructing object identity from probing

def identify_from_probing(signature, signatures_dict):
    """Given a probing signature, find the object."""
    matches = [obj for obj, sig in signatures_dict.items() if sig == signature]
    return matches

# Test: can we identify stakdur_territory from its probing pattern alone?
test_obj = 'stakdur_territory'
test_sig = signatures[test_obj]

identified = identify_from_probing(test_sig, signatures)

print("The Probing Theorem in action:")
print("=" * 50)
print(f"\nGiven probing signature: {sorted(test_sig)[:3]}...")
print(f"Identified object(s): {identified}")
print(f"\nVold: 'I told you what the stakdur is—without ever seeing one.'")

---

## Exercises

### Exercise 1: Dual Yoneda

The Yoneda embedding uses Hom(-, X) (incoming morphisms). Build the dual using Hom(X, -) (outgoing morphisms). Are objects still distinguishable?

In [None]:
# Your code here
# Hint: Filter passages by source_object instead of target_object

### Exercise 2: Morphism Types as Features

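The embedding in Part 7 already uses (source, morphism type) pairs as features. Rebuild it using source objects alone and compare the two versions: how much do morphism types improve object distinguishability?

In [None]:
# Your code here
# Hint: Use source_object alone as the feature, then count signature collisions for each version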

### Exercise 3: Lifecycle Objects

Focus on lifecycle-related objects (egg, juvenile, adult stages). What do their Yoneda embeddings reveal about lifecycle structure?

In [None]:
# Your code here
# Hint: Filter to lifecycle_passage morphisms

### Exercise 4: The Identity Element

The Yoneda bijection uses the identity morphism. For objects with explicit identities in the data, verify that the bijection works.

In [None]:
# Your code here
# Hint: Find identity morphisms in the data

---

## Discussion Questions

1. Krell objected that Vold's approach is circular: to know passages, you must first know objects. Vold countered that passages can be observed directly. Who is right? Is observation itself theory-laden?

2. The Yoneda Lemma suggests that objects "are" their relationships. Does this mean objects are social constructs? What does it mean for scientific realism?

3. Word embeddings based on co-occurrence have been remarkably successful. Is this because language is fundamentally categorical, or is it a useful approximation?

---

## Summary

In this tutorial, you learned:

| Concept | What You Learned |
|---------|------------------|
| Yoneda Lemma | Nat(Hom(-, X), F) ≅ F(X) |
| Yoneda embedding | X ↦ Hom(-, X) is fully faithful |
| Objects as patterns | X is determined by how it's probed |
| The bijection | Evaluate at identity ↔ Extend by functoriality |
| ML connection | Word embeddings use Yoneda-style reasoning |

| Skill | Code Pattern |
|-------|--------------|
| Compute Hom(-, X) | Filter passages by target_object |
| Build Yoneda embedding | Binary feature vector of incoming morphisms |
| Compare objects | Cosine similarity of embeddings |
| Identify from signature | Match probing patterns |

---

## Next Tutorial

In **Tutorial 8: The Language of Dens**, you will learn how language itself forms a category:

- Words and phrases as objects
- Substring containment as morphisms
- Language categories and their structure
- Connection to distributional semantics

> *"The linguists say: 'You know a word by the company it keeps.' But I say: you know a word by the passages it admits. What phrases contain it? What sentences extend it? The word IS the pattern of its containments."*
> — Tessery Vold, "The Language of Dens," Year 899

---

## Credits

**Source Material:** Tai-Danae Bradley, "Category Theory and Language Models" (Cartesian Cafe)

**Densworld Integration:** The Relational Foundations course applies categorical concepts through the framework of Tessery Vold.

**Learn more:** [buildLittleWorlds](https://github.com/buildLittleWorlds)

---

> *"Tell me every creature that passes through the stakdur's territory, and I will tell you what a stakdur is—without ever seeing one. This is not mysticism. This is mathematics. The object IS the pattern of passages to it. The Probing Theorem is the foundation of all relational knowledge."*
> — Tessery Vold, "The Probing Theorem," Year 892