# Tutorial 5: The Probing Pattern
## The Hom Functor and Representable Functors

---

### The Probing Theorem

*To the research assistant:*

*In Year 892, Tessery Vold proposed what she called the "Probing Theorem"—a claim so radical that it sparked immediate controversy.*

*Vold asserted: "Tell me every creature that passes through the stakdur's territory, and I will tell you what a stakdur is—without ever seeing one."*

*This claim inverts the traditional approach to classification. Instead of defining an object by its intrinsic properties (size, color, behavior), Vold proposed defining it by its relationships to all other objects. An object IS the pattern of how other objects interact with it.*

*Marden Krell objected: "This is circular! To know what passes through the stakdur's territory, we must first know what a stakdur is." But Vold countered: "The passages exist independently. We observe them. From these observations, the stakdur emerges."*

*Your task: Test Vold's Probing Pattern. Using the passage data, determine whether objects can be characterized by the morphisms pointing to them. This is the foundation of what modern category theorists call the "Yoneda perspective."*

—*Archive Review Committee, Year 934*

---

## What You Will Learn

In this tutorial, you will learn to:

1. Understand the **Hom functor** Hom(-, X): what morphisms point to X?
2. Recognize **contravariance**: how the source varies "backwards"
3. Construct "probing profiles" for objects
4. Understand **representable functors** as a categorical perspective
5. See the foundation of the Yoneda Lemma

By the end, you will understand:
- Why "an object is determined by how it's probed"
- The deep connection between morphisms and identity
- How this relates to distributional semantics in NLP

---

## The Hom Functor

For any locally small category C (one in which the morphisms between any two objects form a set) and any object X in C, we can define a functor:

**Hom(-, X): C^op → Set**

This functor:
- Takes an object A and returns the set Hom(A, X) = all morphisms from A to X
- Takes a morphism f: A → B and returns the function Hom(B, X) → Hom(A, X) that sends g to g ∘ f

Wait: the morphism goes A → B, but the induced function goes Hom(B, X) → Hom(A, X)? This is **contravariance**.

### Why Contravariance?

If we have:
- f: A → B (a morphism in C)
- g: B → X (an element of Hom(B, X))

Then we can form g ∘ f: A → X, which is in Hom(A, X).

So f "pulls back" morphisms from B to morphisms from A. The direction reverses.
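The pullback can be sketched in plain Python by modelling morphisms as ordinary functions. This is an assumption made only for illustration, since the passage data has no function bodies. The helpers `compose` and `precompose` are hypothetical names, not part of the tutorial's code.

```python
def compose(g, f):
    """Return g ∘ f: apply f first, then g."""
    return lambda a: g(f(a))

def precompose(f):
    """Return f*: Hom(B, X) -> Hom(A, X), sending g to g ∘ f."""
    return lambda g: compose(g, f)

# Toy morphisms (hypothetical): A = int, B = str, X = int
f = lambda n: str(n)         # f: A -> B
g = lambda s: len(s)         # g: B -> X

pulled_back = precompose(f)(g)    # g ∘ f: A -> X
print(pulled_back(12345))         # number of digits in 12345

# Contravariant functoriality reverses the order: (h ∘ f)* = f* ∘ h*
h = lambda s: s + "!"             # h: B -> B
left = precompose(compose(h, f))(g)
right = precompose(f)(precompose(h)(g))
print(left(7) == right(7))
```

Both sides of the last comparison compute g ∘ h ∘ f, which is why precomposition respects composition only after swapping the order.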

---

### Vold's Interpretation

> *"To probe an object X, we ask: what passes TO it? From the reed marsh, what goes to the stakdur territory? From the open water, what goes to the stakdur territory? The collection of all such passages—the complete answer to 'what targets X?'—tells us what X is."*

This is the Hom functor Hom(-, X) in action.

---

## Part 1: Loading the Passage Data

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import networkx as nx
from collections import defaultdict

# Set style
plt.style.use('seaborn-v0_8-whitegrid')
sns.set_palette('deep')

print("Libraries loaded. Ready to probe objects.")

In [None]:
# Load the Passage Diagram data
BASE_URL = "https://raw.githubusercontent.com/buildLittleWorlds/densworld-datasets/main/data/"

passages = pd.read_csv(BASE_URL + "passage_diagrams.csv")

print(f"Passages loaded: {len(passages)} morphisms")

# Also load classifications for the pre-order example
classifications = pd.read_csv(BASE_URL + "archive_classifications.csv")
print(f"Classifications loaded: {len(classifications)} records")

---

## Part 2: Building the Hom Functor

For a fixed object X, Hom(-, X) collects all morphisms targeting X.

In [None]:
def hom_to(target, passages_df):
    """
    Compute Hom(-, target): all morphisms pointing TO target.
    Returns a dictionary: source_object → list of passage_ids
    """
    morphisms_to_target = passages_df[passages_df['target_object'] == target]
    
    result = defaultdict(list)
    for _, row in morphisms_to_target.iterrows():
        result[row['source_object']].append(row['passage_id'])
    
    return dict(result)

# Example: Hom(-, stakdur_territory)
stakdur_probes = hom_to('stakdur_territory', passages)

print("Hom(-, stakdur_territory):")
print("What morphisms point TO stakdur_territory?")
print("=" * 50)
for source, pids in stakdur_probes.items():
    print(f"  From {source}: {pids}")

In [None]:
# Hom(-, reed_marsh)
marsh_probes = hom_to('reed_marsh', passages)

print("\nHom(-, reed_marsh):")
print("What morphisms point TO reed_marsh?")
print("=" * 50)
for source, pids in marsh_probes.items():
    print(f"  From {source}: {pids}")

The stakdur_territory receives morphisms from itself (identity) and from nothing else in this sample.

The reed_marsh receives morphisms from stakdur_territory (the stakdur hunts there) and from itself (identity).

These "incoming morphism profiles" characterize the objects.
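Because the real output depends on the downloaded CSV, a toy run may help show the shape of the result. The sketch below uses plain tuples instead of a DataFrame; the rows and the `T-` ids are hypothetical and are not taken from passage_diagrams.csv.

```python
from collections import defaultdict

# Hypothetical rows: (passage_id, source_object, target_object)
toy_passages = [
    ("T-1", "stakdur_territory", "stakdur_territory"),  # identity
    ("T-2", "reed_marsh", "reed_marsh"),                # identity
    ("T-3", "stakdur_territory", "reed_marsh"),
    ("T-4", "stakdur_territory", "reed_marsh"),         # a parallel morphism
]

def toy_hom_to(target, rows):
    """Group the ids of all morphisms into `target` by their source."""
    result = defaultdict(list)
    for pid, src, tgt in rows:
        if tgt == target:
            result[src].append(pid)
    return dict(result)

print(toy_hom_to("reed_marsh", toy_passages))
print(toy_hom_to("stakdur_territory", toy_passages))
```

Each key of the returned dictionary is one probing object A, and its list is the set Hom(A, target) written out by id.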

---

## Part 3: The Probing Profile

Let's build a complete "probing profile" for each object: what comes IN and what goes OUT.

In [None]:
def probing_profile(obj, passages_df):
    """
    Build a complete probing profile for an object.
    Returns incoming morphisms (Hom(-, obj)) and outgoing morphisms (Hom(obj, -)).
    """
    incoming = passages_df[passages_df['target_object'] == obj]
    outgoing = passages_df[passages_df['source_object'] == obj]
    
    return {
        'incoming': list(incoming['passage_id']),
        'outgoing': list(outgoing['passage_id']),
        'incoming_sources': list(incoming['source_object'].unique()),
        'outgoing_targets': list(outgoing['target_object'].unique()),
        'in_degree': len(incoming),
        'out_degree': len(outgoing)
    }

# Build profiles for all objects
all_objects = set(passages['source_object']) | set(passages['target_object'])

profiles = {obj: probing_profile(obj, passages) for obj in all_objects}

print(f"Built probing profiles for {len(profiles)} objects")

In [None]:
# Display some profiles
print("Probing Profile: stakdur_territory")
print("=" * 50)
p = profiles['stakdur_territory']
print(f"  Incoming morphisms: {p['incoming']}")
print(f"  From sources: {p['incoming_sources']}")
print(f"  Outgoing morphisms: {p['outgoing']}")
print(f"  To targets: {p['outgoing_targets']}")
print(f"  In-degree: {p['in_degree']}, Out-degree: {p['out_degree']}")

In [None]:
# Compare profiles of different objects
print("\nComparing Probing Profiles:")
print("=" * 60)

for obj in ['stakdur_territory', 'reed_marsh', 'grimslew_pool', 'boundary_zone']:
    if obj in profiles:
        p = profiles[obj]
        print(f"\n{obj}:")
        print(f"  In: {p['in_degree']} morphisms from {len(p['incoming_sources'])} sources")
        print(f"  Out: {p['out_degree']} morphisms to {len(p['outgoing_targets'])} targets")

Each object has its own probing profile, although degree counts alone need not tell every pair of objects apart. The boundary_zone, for instance, has high connectivity: many things pass through it. This is its "categorical identity."

---

## Part 4: Visualizing the Probing Pattern

In [None]:
# Create a DataFrame of profiles for analysis
profile_df = pd.DataFrame([
    {'object': obj, **prof} for obj, prof in profiles.items()
])

# Sort by total degree
profile_df['total_degree'] = profile_df['in_degree'] + profile_df['out_degree']
profile_df = profile_df.sort_values('total_degree', ascending=False)

print("Objects by connectivity:")
print(profile_df[['object', 'in_degree', 'out_degree', 'total_degree']].head(15))

In [None]:
# Visualize in-degree vs out-degree
fig, ax = plt.subplots(figsize=(12, 10))

# Scatter plot
sizes = profile_df['total_degree'] * 100 + 50
scatter = ax.scatter(profile_df['out_degree'], profile_df['in_degree'], 
                      s=sizes, alpha=0.6, c='steelblue')

# Label high-connectivity objects
for _, row in profile_df[profile_df['total_degree'] >= 3].iterrows():
    ax.annotate(row['object'], (row['out_degree'], row['in_degree']),
                fontsize=8, alpha=0.8, xytext=(5, 5), textcoords='offset points')

# Diagonal line
max_val = max(profile_df['in_degree'].max(), profile_df['out_degree'].max())
ax.plot([0, max_val], [0, max_val], 'k--', alpha=0.3, label='Equal in/out')

ax.set_xlabel('Out-degree: Hom(X, -) size')
ax.set_ylabel('In-degree: Hom(-, X) size')
ax.set_title('Probing Profiles: Objects Characterized by Their Morphisms\n(Size = total connectivity)')
ax.legend()
plt.tight_layout()
plt.show()

Objects cluster into patterns:
- **High in-degree, low out-degree**: "sinks" that things flow into
- **Low in-degree, high out-degree**: "sources" that things flow from
- **Balanced**: transition zones, hubs

This is categorical structure revealed through probing.
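The three patterns can be turned into a rough labelling rule. The sketch below is one possible rule, not a standard definition: the `ratio` threshold is an arbitrary choice, and the object names and degree values are hypothetical.

```python
def classify_by_degree(in_degree, out_degree, ratio=2.0):
    """Label an object 'sink', 'source', or 'balanced' from its degrees.

    The ratio threshold is an illustrative assumption, not a fixed convention.
    """
    if in_degree > ratio * out_degree:
        return "sink"
    if out_degree > ratio * in_degree:
        return "source"
    return "balanced"

# Hypothetical (in_degree, out_degree) pairs
toy_degrees = {
    "deep_pool": (5, 1),
    "spring_head": (1, 4),
    "crossing": (3, 3),
}

labels = {name: classify_by_degree(i, o) for name, (i, o) in toy_degrees.items()}
print(labels)
```

On the notebook's data, the same function could be applied to the `in_degree` and `out_degree` columns of `profile_df`.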

---

## Part 5: Contravariance in Action

The Hom functor is **contravariant**. Let's see what this means concretely.

In [None]:
# Consider a morphism f: deep_dens → boundary_zone
# And fix target X = capital_outskirts

# Hom(boundary_zone, capital_outskirts) contains PD-009
# (boundary_zone → capital_outskirts)

# The contravariant action of f on this morphism:
# Given g: boundary_zone → capital_outskirts
# We get g ∘ f: deep_dens → capital_outskirts

print("Contravariance Example:")
print("=" * 60)
print("\nMorphism f: deep_dens → boundary_zone (PD-008)")
print("Fix target X = capital_outskirts")
print("\nHom(boundary_zone, capital_outskirts):")
print("  Contains: g = boundary_zone → capital_outskirts (PD-009)")
print("\nContravariant action of f:")
print("  Hom(boundary_zone, X) → Hom(deep_dens, X)")
print("  g ↦ g ∘ f")
print("  (boundary_zone → capital_outskirts) ↦ (deep_dens → capital_outskirts)")
print("\nThe direction REVERSES: f goes A→B, but induced map goes Hom(B,X)→Hom(A,X)")

In [None]:
# Build a visualization of contravariance
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Left: The original morphism f: A → B
ax1 = axes[0]
G1 = nx.DiGraph()
G1.add_edge('deep_dens', 'boundary_zone', label='f')
G1.add_edge('boundary_zone', 'capital_outskirts', label='g')

pos1 = {'deep_dens': (0, 0), 'boundary_zone': (1, 0), 'capital_outskirts': (2, 0)}

nx.draw_networkx_nodes(G1, pos1, node_size=2000, node_color='lightblue', ax=ax1)
nx.draw_networkx_labels(G1, pos1, font_size=9, ax=ax1)
nx.draw_networkx_edges(G1, pos1, edge_color='gray', arrows=True, arrowsize=20, ax=ax1)
nx.draw_networkx_edge_labels(G1, pos1, edge_labels={('deep_dens', 'boundary_zone'): 'f',
                                                     ('boundary_zone', 'capital_outskirts'): 'g'},
                              ax=ax1)
ax1.set_title('Original Category: Morphisms f and g')
ax1.axis('off')

# Right: The induced map on Hom sets (reversed)
ax2 = axes[1]
G2 = nx.DiGraph()
G2.add_edge('Hom(boundary_zone, X)', 'Hom(deep_dens, X)', label='f*')

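# Place Hom(boundary_zone, X) on the right so the f* arrow is drawn right-to-left,
# matching the figure title and lining up under the objects in the left panel
pos2 = {'Hom(deep_dens, X)': (0, 0), 'Hom(boundary_zone, X)': (1.5, 0)}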

nx.draw_networkx_nodes(G2, pos2, node_size=3500, node_color='lightgreen', ax=ax2)
nx.draw_networkx_labels(G2, pos2, font_size=8, ax=ax2)
nx.draw_networkx_edges(G2, pos2, edge_color='darkgreen', arrows=True, arrowsize=20, ax=ax2)
nx.draw_networkx_edge_labels(G2, pos2, edge_labels={('Hom(boundary_zone, X)', 'Hom(deep_dens, X)'): 'f*'},
                              ax=ax2)
ax2.set_title('Hom Functor: Direction REVERSES')
ax2.axis('off')

plt.suptitle('Contravariance: f goes right, f* goes left', fontsize=12)
plt.tight_layout()
plt.show()

This reversal is why Hom(-, X) is called a **contravariant** functor. It maps from C^op (the opposite category) to Set.
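One way to make the induced map concrete is to treat a composite as a path: a tuple of passage ids in travel order. This is a sketch under the assumption that every composable pair of passages has a composite, which the dataset may not record as its own row. The ids PD-008 and PD-009 and their endpoints are the ones quoted in the contravariance example above; `hom_paths` and `precompose_path` are hypothetical helpers.

```python
# Edges of a toy graph: passage_id -> (source, target)
edges = {
    "PD-008": ("deep_dens", "boundary_zone"),
    "PD-009": ("boundary_zone", "capital_outskirts"),
}

def hom_paths(source, target, edges):
    """All single-edge paths from source to target, each as a tuple of ids."""
    return [(pid,) for pid, (s, t) in edges.items() if (s, t) == (source, target)]

def precompose_path(f_id, edges):
    """Return f*: send a path g that starts where f ends to the path 'f then g'."""
    _, f_target = edges[f_id]

    def f_star(g_path):
        g_source = edges[g_path[0]][0]
        if g_source != f_target:
            raise ValueError("g must start where f ends")
        return (f_id,) + g_path

    return f_star

X = "capital_outskirts"
f_star = precompose_path("PD-008", edges)        # f: deep_dens -> boundary_zone
hom_B_X = hom_paths("boundary_zone", X, edges)   # Hom(boundary_zone, X)
hom_A_X = [f_star(g) for g in hom_B_X]           # image inside Hom(deep_dens, X)

print(hom_B_X)
print(hom_A_X)
```

The function `f_star` consumes elements of Hom(boundary_zone, X) and produces elements of Hom(deep_dens, X), which is the reversed direction in executable form.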

---

## Part 6: Objects Determined by Probing

Vold's key insight: if two objects X and Y have naturally isomorphic Hom functors (Hom(A, X) and Hom(A, Y) match for every probing object A, compatibly with composition), they are "the same" categorically, that is, isomorphic.

In [None]:
def hom_set_signature(target, passages_df):
    """
    Create a signature of Hom(-, target) as a frozenset of (source, count) pairs.
    This captures the "probing pattern" of the target.
    """
    incoming = passages_df[passages_df['target_object'] == target]
    counts = incoming['source_object'].value_counts().to_dict()
    return frozenset(counts.items())

# Compute signatures for all objects
signatures = {obj: hom_set_signature(obj, passages) for obj in all_objects}

# Find objects with identical signatures (probing-equivalent)
sig_groups = defaultdict(list)
for obj, sig in signatures.items():
    sig_groups[sig].append(obj)

print("Objects grouped by incoming morphism pattern:")
print("=" * 50)
for sig, objs in sig_groups.items():
    if len(objs) > 1:
        print(f"\nGroup with signature {dict(sig)}:")
        for obj in objs:
            print(f"  - {obj}")

In [None]:
# How many unique probing patterns exist?
print(f"\nTotal objects: {len(all_objects)}")
print(f"Unique incoming patterns: {len(sig_groups)}")

# Distribution of group sizes
group_sizes = [len(objs) for objs in sig_groups.values()]
print(f"\nGroup size distribution:")
print(f"  Singletons (unique pattern): {group_sizes.count(1)}")
print(f"  Pairs: {group_sizes.count(2)}")
print(f"  Larger groups: {sum(1 for s in group_sizes if s > 2)}")

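If most groups are singletons, most objects have unique probing patterns: they are distinguishable by how other objects interact with them.

Objects in the same group have the same "incoming connectivity structure." This count-based signature is a coarse proxy for the Hom functor: it records how many morphisms arrive from each source but not how they compose, so a shared signature suggests, rather than proves, that two objects are categorically equivalent.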

---

## Part 7: The Yoneda Perspective

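The Yoneda Lemma formalizes Vold's intuition. The lemma itself states that for any functor F: C^op → Set, the natural transformations from Hom(-, X) to F correspond one-to-one with the elements of F(X). Its best-known consequence is:

> An object X is completely determined (up to isomorphism) by the functor Hom(-, X).

In other words: if you know all morphisms into X, and how they compose with the other morphisms of C, you know X up to isomorphism.

This is profound. It says that objects don't have intrinsic identity: they ARE their relationships.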

In [None]:
# Demonstrate the Yoneda perspective with a pre-order example
# In a pre-order, Hom(A, X) is either empty or a singleton

# Build the creature taxonomy pre-order
creature_taxonomy = classifications[classifications['region_system'] == 'capital_taxonomy']

# Build the order relation: child ≤ parent
order_pairs = []
for _, row in creature_taxonomy.iterrows():
    if row['child_category'] != row['parent_category']:
        order_pairs.append((row['child_category'], row['parent_category']))

# Compute transitive closure
G_order = nx.DiGraph(order_pairs)
closure = nx.transitive_closure(G_order)

print("Pre-order Probing (Taxonomy):")
print("=" * 50)
print("\nFor each object X, Hom(-, X) = {A : A ≤ X}")
print("(All categories that are 'below' X in the hierarchy)")

# Compute Hom(-, X) for key objects
for target in ['creatures', 'predators', 'apex_predators']:
    below = [node for node in closure.nodes() if closure.has_edge(node, target) or node == target]
    print(f"\nHom(-, {target}): {below}")

In the pre-order:
- `Hom(-, creatures)` contains everything (creatures is the top)
- `Hom(-, apex_predators)` contains only apex_predators itself

The "size" of Hom(-, X) tells us where X sits in the hierarchy. This is probing in action.
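In a pre-order the Yoneda perspective can be checked by hand: Hom(-, X) is the down-set of X, and two objects have the same down-set exactly when each is below the other. The sketch below verifies this on a hypothetical four-element pre-order; the names are illustrative and are not taken from archive_classifications.csv.

```python
from itertools import product

# Hypothetical pre-order: a pair (a, b) means a <= b.
# The relation below is already reflexive and transitive.
elements = ["beast", "hunter", "stalker", "predator"]
leq = {(e, e) for e in elements} | {
    ("hunter", "stalker"), ("stalker", "hunter"),    # two names, mutually below
    ("hunter", "predator"), ("stalker", "predator"),
    ("hunter", "beast"), ("stalker", "beast"), ("predator", "beast"),
}

def down_set(x):
    """Hom(-, x) in a pre-order: every a with a <= x."""
    return frozenset(a for a in elements if (a, x) in leq)

# Same probing profile <=> isomorphic (each below the other)
for x, y in product(elements, repeat=2):
    same_probes = down_set(x) == down_set(y)
    isomorphic = (x, y) in leq and (y, x) in leq
    assert same_probes == isomorphic

print(sorted(down_set("hunter")))
print(sorted(down_set("beast")))
```

Here `hunter` and `stalker` are distinct names with identical down-sets, so probing cannot separate them; it identifies objects only up to isomorphism.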

---

## Part 8: The ML Connection

The probing pattern appears throughout machine learning:

### 1. Distributional Semantics

"You shall know a word by the company it keeps." (J.R. Firth)

A word's meaning is determined by what words appear near it—its "probing profile."
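A minimal sketch of such a profile counts the words that appear within a fixed window of each word. The two-sentence corpus and the window size of one are hypothetical choices made for illustration; `context_counts` is not part of the tutorial's code.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus
corpus = [
    "the stakdur hunts in the marsh",
    "the heron hunts in the marsh",
]

def context_counts(sentences, window=1):
    """Map each word to a Counter of the words seen within `window` positions."""
    word_profiles = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for i, word in enumerate(words):
            lo, hi = max(0, i - window), min(len(words), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    word_profiles[word][words[j]] += 1
    return word_profiles

word_profiles = context_counts(corpus)
print(word_profiles["stakdur"])
print(word_profiles["stakdur"] == word_profiles["heron"])
```

In this corpus `stakdur` and `heron` keep exactly the same company, so a purely distributional view cannot tell them apart.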

### 2. Attention Mechanisms

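In transformers, each token's representation is updated as a weighted combination of the tokens it attends to. The attention weights act as a soft probing profile, though the analogy to categorical probing is loose rather than formal.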

### 3. Graph Neural Networks

A node's representation is updated based on messages from its neighbors. The "neighborhood" is the probing set.
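One round of this update can be sketched with mean aggregation over incoming neighbors. This is a deliberately stripped-down illustration: real GNN layers add learned weights and nonlinearities, and the edges and feature values below are hypothetical.

```python
# Hypothetical directed edges (source, target) and scalar node features
toy_edges = [("marsh", "pool"), ("dens", "pool"), ("pool", "marsh")]
features = {"marsh": 1.0, "dens": 3.0, "pool": 5.0}

def message_passing_step(edge_list, feats):
    """One round of mean aggregation over incoming neighbors.

    A node with no incoming edges keeps its current feature.
    """
    incoming = {node: [] for node in feats}
    for src, tgt in edge_list:
        incoming[tgt].append(feats[src])
    return {
        node: (sum(msgs) / len(msgs) if msgs else feats[node])
        for node, msgs in incoming.items()
    }

updated = message_passing_step(toy_edges, features)
print(updated)
```

The list `incoming[node]` plays the role of the probing set: it holds everything that arrives at the node along a morphism.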

In [None]:
# Demonstrate: Building an embedding from probing profiles
# Each object gets a vector based on what connects to it

# Create a feature matrix: rows = target objects, columns = possible sources
objects = list(all_objects)
index_of = {obj: i for i, obj in enumerate(objects)}  # O(1) lookup instead of list.index

# Build adjacency-based features
n = len(objects)
feature_matrix = np.zeros((n, n))

for i, target in enumerate(objects):
    incoming = passages[passages['target_object'] == target]
    for _, row in incoming.iterrows():
        # Count one incoming morphism from this source
        feature_matrix[i, index_of[row['source_object']]] += 1

print(f"Probing-based feature matrix: {feature_matrix.shape}")
print(f"Each row is an object's 'incoming morphism profile'")

In [None]:
# Compute similarity based on probing profiles
from sklearn.metrics.pairwise import cosine_similarity

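# Normalize each row to sum to 1 (cosine similarity ignores row scale,
# so this keeps rows comparable by eye without changing the similarities)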
row_sums = feature_matrix.sum(axis=1, keepdims=True)
row_sums[row_sums == 0] = 1  # Avoid division by zero
normalized = feature_matrix / row_sums

# Compute cosine similarity
similarity = cosine_similarity(normalized)

# Find most similar pairs (excluding self-similarity)
np.fill_diagonal(similarity, 0)

# Get top pairs
pairs = []
for i in range(n):
    for j in range(i+1, n):
        if similarity[i, j] > 0:
            pairs.append((objects[i], objects[j], similarity[i, j]))

pairs.sort(key=lambda x: x[2], reverse=True)

print("Most similar objects (by probing profile):")
print("=" * 50)
for obj1, obj2, sim in pairs[:10]:
    print(f"  {obj1} ↔ {obj2}: {sim:.3f}")

Objects with similar probing profiles cluster together. Word embeddings rest on the same idea: words that appear in similar contexts get similar vectors.
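For readers who want to see what the similarity score measures, here is a dependency-free sketch of cosine similarity on hypothetical incoming-count rows. The helper `cosine` is written out for illustration and is not the scikit-learn implementation.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical incoming-count rows for three objects
row_a = [2, 0, 1]
row_b = [4, 0, 2]   # same sources as row_a, twice the counts
row_c = [0, 3, 0]   # no sources shared with row_a

print(round(cosine(row_a, row_b), 6))
print(cosine(row_a, row_c))

# Dividing a row by its sum only rescales it, so the cosine is unchanged
scaled_a = [x / sum(row_a) for x in row_a]
print(math.isclose(cosine(scaled_a, row_b), cosine(row_a, row_b)))
```

Two objects score high when they receive morphisms from the same sources in the same proportions, regardless of how many morphisms each receives in total.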

---

## Part 9: Representable Functors

A functor F: C^op → Set is **representable** if it is naturally isomorphic to Hom(-, X) for some object X.

This X is called the **representing object**.

The Yoneda Lemma implies that the representing object is unique up to isomorphism.

In [None]:
# Demonstrate: Can we recover the representing object from the functor?

def is_representable(functor_values, candidates, passages_df):
    """
    Given a functor F (as a dict source → set of morphism ids),
    check if F = Hom(-, X) for some candidate X.
    """
    # Convert the functor's values to sets once, outside the loop
    func_sets = {src: set(pids) for src, pids in functor_values.items()}
    for X in candidates:
        # Hom(-, X) in the same comparable format
        hom_sets = {src: set(pids) for src, pids in hom_to(X, passages_df).items()}
        if hom_sets == func_sets:
            return X
    return None

# Test: Given the functor Hom(-, reed_marsh), can we recover reed_marsh?
target_functor = hom_to('reed_marsh', passages)

recovered = is_representable(target_functor, all_objects, passages)
print(f"Given functor F = Hom(-, reed_marsh), recovered representing object: {recovered}")

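In this finite dataset, the representing object can be recovered by comparing the functor's values against Hom(-, X) for each candidate X. The check compares values on objects only, so it illustrates rather than proves the Yoneda principle that Hom(-, X) characterizes X up to isomorphism.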
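A less circular illustration uses the divisibility pre-order on positive integers, where Hom(A, X) has one element if A divides X and is empty otherwise. The functor F(A) = Hom(A, 12) × Hom(A, 18) is non-empty exactly when A divides both numbers, and its representing object turns out to be their greatest common divisor. This example is chosen here for illustration and is not part of the Densworld data; restricting the search to 1..36 is an assumption that keeps it finite.

```python
def hom(a, x):
    """Hom(a, x) in the divisibility pre-order: one arrow if a divides x."""
    return {(a, x)} if x % a == 0 else set()

def F(a):
    """F(a) = Hom(a, 12) x Hom(a, 18), as a set of pairs of arrows."""
    return {(p, q) for p in hom(a, 12) for q in hom(a, 18)}

universe = range(1, 37)

def represents(x):
    """True if F(a) and Hom(a, x) have the same size for every a in the universe.

    In a pre-order every Hom set has at most one element, so matching sizes
    is enough: naturality holds automatically.
    """
    return all(len(F(a)) == len(hom(a, x)) for a in universe)

representing = [x for x in universe if represents(x)]
print(representing)
```

Only one candidate survives because divisibility on positive integers is antisymmetric, so "unique up to isomorphism" becomes plain uniqueness.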

---

## Exercises

### Exercise 1: Out-Probing

We've focused on Hom(-, X) (incoming morphisms). Build the dual analysis for Hom(X, -) (outgoing morphisms). What objects have the largest outgoing profiles?

In [None]:
# Your code here
# Hint: Filter passages by source_object instead of target_object

### Exercise 2: Morphism-Type Probing

Build probing profiles that consider morphism TYPE, not just source. Does the type of incoming morphisms help distinguish objects?

In [None]:
# Your code here
# Hint: Include morphism_type in the signature

### Exercise 3: Lifecycle Objects

Focus on lifecycle_passage morphisms. Build probing profiles for lifecycle stages. What do the profiles reveal about the lifecycle structure?

In [None]:
# Your code here
# Hint: Filter to lifecycle_passage, then build profiles

### Exercise 4: Dimensionality Reduction

Use PCA or t-SNE to visualize the probing-based feature matrix in 2D. Do objects cluster by type or region?

In [None]:
# Your code here
# Hint: Use sklearn.decomposition.PCA or sklearn.manifold.TSNE

---

## Discussion Questions

1. Vold claims that an object IS its relationships. But Marden Krell objected that to have relationships, the object must first exist. How do you interpret this philosophical tension?

2. In NLP, "you know a word by its context" has been remarkably successful (Word2Vec, BERT, etc.). Is this because language is fundamentally categorical, or is it an engineering trick?

3. If objects are determined by how they're probed, what happens in a category with very few morphisms? Can objects still be distinguished?

---

## Summary

In this tutorial, you learned:

| Concept | What You Learned |
|---------|------------------|
| Hom functor | Hom(-, X) collects all morphisms targeting X |
| Contravariance | Morphisms induce maps in the opposite direction |
| Probing profile | An object characterized by what points to it |
| Representable functor | F ≅ Hom(-, X) for some X |
| Yoneda perspective | Objects are determined by how they're probed |

| Skill | Code Pattern |
|-------|--------------|
| Compute Hom(-, X) | `passages[passages['target_object'] == X]` |
| Build probing signatures | Count incoming morphisms by source |
| Find similar objects | Cosine similarity on probing features |
| Visualize probing patterns | Scatter plot of in/out degree |

---

## Next Tutorial

In **Tutorial 6: Coherent Shifts**, you will learn about **natural transformations**—morphisms between functors:

- How to transform one functor into another coherently
- The "prism diagram" and commutativity requirements
- Natural transformations as higher-level morphisms
- The currency exchange analogy

> *"A coherent shift does not merely translate each passage independently. It translates in a way that respects composition. If two passages compose to a third, their translations must compose to the translation of the third."*
> — Tessery Vold, "On the Coherence of Shifts," Year 897

---

## Credits

**Source Material:** Tai-Danae Bradley, "Category Theory and Language Models" (Cartesian Cafe)

**Densworld Integration:** The Relational Foundations course applies categorical concepts through the framework of Tessery Vold.

**Learn more:** [buildLittleWorlds](https://github.com/buildLittleWorlds)

---

> *"Tell me every creature that passes through the stakdur's territory, and I will tell you what a stakdur is—without ever seeing one. This is not mysticism. This is mathematics. The object IS the pattern of passages to it."*
> — Tessery Vold, "The Probing Theorem," Year 892