# Tutorial 3: Composition and Identity
## The Fundamental Laws of Categories

---

### From the Debates of Year 894

*To the research assistant:*

*In Year 894, Marden Krell challenged Vold's framework with what he believed was a fatal objection: "Your passages compose, you say. But composition requires an ordering—which comes first? And if a creature remains in its territory, you call this an 'identity passage.' But how is remaining the same as passing?"*

*Vold's response was characteristically precise: "Composition is associative. It matters not which pair you compose first—the result is the same. As for identity, it is the passage that changes nothing. Compose any passage with identity, and you have the original passage unchanged. This is not philosophy. This is structure."*

*Your task: Verify Vold's claims using the passage data. Does composition behave associatively? Do identity morphisms satisfy the identity laws? The Archive has documented extensive composition chains—test whether Vold's mathematical claims hold.*

*If Vold is correct, her passage diagrams are not mere illustrations. They are categories.*

—*Archive Review Committee, Year 934*

---

## What You Will Learn

In this tutorial, you will learn to:

1. Understand the **composition operation** in categories
2. Verify the **associativity axiom**: (f ∘ g) ∘ h = f ∘ (g ∘ h)
3. Understand the **identity axiom**: f ∘ id = f = id ∘ f
4. Trace composition chains in the passage data
5. Recognize when composition laws fail (and what that means)

By the end, you will understand:
- Why these two axioms are sufficient to define a category
- How to verify categorical structure in data
- The conceptual meaning of composition and identity

---

## The Axioms of a Category

A **category** C consists of:
1. A collection of **objects**
2. For each pair of objects A, B, a collection of **morphisms** Hom(A, B)
3. For each object A, an **identity morphism** id_A: A → A
4. For each pair of morphisms f: A → B and g: B → C, a **composite** g ∘ f: A → C

Subject to two axioms:

### Axiom 1: Identity

For any morphism f: A → B:
- **Left identity**: id_B ∘ f = f
- **Right identity**: f ∘ id_A = f

Composing with identity leaves the morphism unchanged.

### Axiom 2: Associativity

For any morphisms f: A → B, g: B → C, h: C → D:
- **(h ∘ g) ∘ f = h ∘ (g ∘ f)**

It doesn't matter how you parenthesize—the result is the same.
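
The definition above can be sketched in a few lines of Python. This is only an illustration and not part of the Archive dataset: the `Morphism` class, the `identity` and `compose` helpers, and the three example passages are hypothetical names introduced here. Each morphism records its source, its target, and the tuple of steps it was built from, so that equality of morphisms means more than equality of endpoints.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Morphism:
    """A typed arrow: source object, target object, and the steps it is made of."""
    source: str
    target: str
    steps: tuple = ()   # an identity has no steps

def identity(obj):
    """Identity morphism id_obj: obj -> obj."""
    return Morphism(obj, obj, ())

def compose(g, f):
    """Return g ∘ f (first f, then g); defined only when f's target is g's source."""
    if f.target != g.source:
        raise ValueError(f"cannot compose: {f.target} != {g.source}")
    return Morphism(f.source, g.target, f.steps + g.steps)

# Hypothetical passages, named after Vold's example
f = Morphism("den", "boundary", ("den_to_boundary",))
g = Morphism("boundary", "outskirts", ("boundary_to_outskirts",))
h = Morphism("outskirts", "capital", ("outskirts_to_capital",))

# Axiom 1: identity on either side leaves f unchanged
assert compose(identity("boundary"), f) == f
assert compose(f, identity("den")) == f

# Axiom 2: both groupings give the same morphism
assert compose(compose(h, g), f) == compose(h, compose(g, f))
print(compose(h, compose(g, f)))
```

Representing a morphism by its path of steps is one possible design choice; it makes both axioms follow from the fact that tuple concatenation is associative and has the empty tuple as its neutral element.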

---

### Vold's Interpretation

> *"When a creature passes from den to boundary, then from boundary to outskirts, the composite passage—den to outskirts—is well-defined. The boundary is forgotten; only the passage remains. And if you add another leg, outskirts to capital, it matters not whether you first compose den-to-boundary-to-outskirts, then add the capital leg, or first compose boundary-to-outskirts-to-capital, then prepend the den leg. The creature's full journey is the same."*

This is associativity in nature.

---

## Part 1: Loading the Passage Data

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import networkx as nx

# Set style
plt.style.use('seaborn-v0_8-whitegrid')
sns.set_palette('deep')

print("Libraries loaded. Ready to verify categorical axioms.")

In [None]:
# Load the Passage Diagram data
BASE_URL = "https://raw.githubusercontent.com/buildLittleWorlds/densworld-datasets/main/data/"

passages = pd.read_csv(BASE_URL + "passage_diagrams.csv")

print(f"Passages loaded: {len(passages)} morphisms")
print(f"\nColumns: {list(passages.columns)}")

---

## Part 2: Identity Morphisms

Let's first examine the identity morphisms in the data.

In [None]:
# Find all identity morphisms
identities = passages[passages['morphism_type'] == 'identity']

print(f"Identity morphisms found: {len(identities)}")
print("\nIdentity passages:")
print(identities[['passage_id', 'source_object', 'target_object', 'notes']])

In [None]:
# Verify: all identity morphisms have source == target
identity_valid = (identities['source_object'] == identities['target_object']).all()
print(f"All identities have source == target: {identity_valid}")

### Understanding Identity

Vold recorded three identity morphisms:
- **stakdur_territory → stakdur_territory**: The stakdur remains in its territory
- **reed_marsh → reed_marsh**: The reed-spinner habitat persists
- **grimslew_pool → grimslew_pool**: The grimslew's resting pool endures

These represent *persistence*—the passage of remaining unchanged.

In category theory, identity is essential because it allows composition to have a neutral element. Just as 0 + n = n for addition, id ∘ f = f for morphisms.

---

## Part 3: Composition Chains

The dataset explicitly records composition using the `composition_with` column. If a row g lists the ID of a row f in `composition_with`, then f is the first step and g is the second, so the pair stands for the composite g ∘ f.

In [None]:
# Find all morphisms that are part of composition chains
has_composition = passages[passages['composition_with'].notna()]

print(f"Morphisms with explicit composition: {len(has_composition)}")
print("\nComposition references:")
print(has_composition[['passage_id', 'source_object', 'target_object', 'composition_with', 'notes']].head(15))

In [None]:
# Build a helper to look up passages by ID
passage_lookup = passages.set_index('passage_id').to_dict('index')

def get_passage(pid):
    """Get passage details by ID."""
    if pid in passage_lookup:
        p = passage_lookup[pid]
        return f"{p['source_object']} → {p['target_object']}"
    return None

# Test
print(f"PD-008: {get_passage('PD-008')}")
print(f"PD-009: {get_passage('PD-009')}")

### Tracing a Two-Step Composition

Let's trace the composition: deep_dens → boundary_zone → capital_outskirts

In [None]:
# First morphism: PD-008
f = passage_lookup['PD-008']
print("First morphism (f):")
print(f"  PD-008: {f['source_object']} → {f['target_object']}")
print(f"  Notes: {f['notes']}")

print()

# Second morphism: PD-009 (composes with PD-008)
g = passage_lookup['PD-009']
print("Second morphism (g):")
print(f"  PD-009: {g['source_object']} → {g['target_object']}")
print(f"  Composes with: {g['composition_with']}")
print(f"  Notes: {g['notes']}")

print()
print("Composite (g ∘ f):")
print(f"  {f['source_object']} → {g['target_object']}")
print("  (The intermediate boundary_zone is 'forgotten')")

### Verifying Composition Validity

For composition to be valid, the target of the first morphism must equal the source of the second:
- f: A → B
- g: B → C
- g ∘ f: A → C

The "B" must match.

In [None]:
# Verify all compositions are valid (targets match sources)
def verify_composition(row):
    """Check if a composition is valid."""
    if pd.isna(row['composition_with']):
        return None  # No composition
    
    first_id = row['composition_with']
    if first_id not in passage_lookup:
        return 'missing_reference'
    
    first = passage_lookup[first_id]
    # For g ∘ f: target of f must equal source of g
    if first['target_object'] == row['source_object']:
        return 'valid'
    else:
        return 'invalid'

composition_validity = has_composition.apply(verify_composition, axis=1)
print("Composition validity check:")
print(composition_validity.value_counts())

In [None]:
# Show examples of valid compositions
print("Examples of valid compositions:")
print("=" * 70)

count = 0
for _, row in has_composition.iterrows():
    first_id = row['composition_with']
    if first_id in passage_lookup:
        first = passage_lookup[first_id]
        if first['target_object'] == row['source_object']:
            print(f"\n{first_id}: {first['source_object']} → {first['target_object']}")
            print(f"{row['passage_id']}: {row['source_object']} → {row['target_object']}")
            print(f"Composite: {first['source_object']} → {row['target_object']}")
            count += 1
            if count >= 5:
                break

---

## Part 4: Testing the Identity Axiom

The identity axiom states:
- **Left identity**: id_B ∘ f = f
- **Right identity**: f ∘ id_A = f

Let's test this with our data.

In [None]:
# Find a morphism and the relevant identity morphisms
# Example: PD-001 is stakdur_territory → reed_marsh

f = passage_lookup['PD-001']
print("Testing morphism f:")
print(f"  PD-001: {f['source_object']} → {f['target_object']}")

# Identity at source: id_{stakdur_territory}
id_source = passage_lookup.get('PD-005')
if id_source:
    print(f"\nIdentity at source (id_A):")
    print(f"  PD-005: {id_source['source_object']} → {id_source['target_object']}")

In [None]:
# Test right identity: f ∘ id_A = f
# Composing: first id_{stakdur_territory}, then f (stakdur_territory → reed_marsh)

print("Right Identity Test: f ∘ id_A = f")
print("=" * 50)
print(f"\nid_A: {id_source['source_object']} → {id_source['target_object']}")
print(f"f:    {f['source_object']} → {f['target_object']}")

# Check: target of id_A matches source of f?
if id_source['target_object'] == f['source_object']:
    print(f"\nComposition valid: targets/sources align")
    composite_source = id_source['source_object']
    composite_target = f['target_object']
    print(f"f ∘ id_A: {composite_source} → {composite_target}")
    print(f"f:        {f['source_object']} → {f['target_object']}")
    
    if composite_source == f['source_object'] and composite_target == f['target_object']:
        print("\n✓ Right identity holds: f ∘ id_A = f")

### Conceptual Interpretation

Why does composing with identity give back the original morphism?

Consider: A stakdur first "remains" in its territory (identity), then passes to the reed marsh. The total journey is simply: territory → reed marsh. The "remaining" added nothing to the passage.

This is why identity is called a "neutral element"—it doesn't change anything when composed.
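
Part 4 tests only the right identity law. The left law, id_B ∘ f = f, can be checked the same way. The sketch below uses a small hand-made stand-in for `passage_lookup` so that it runs on its own; the key `ID-REED` for the reed_marsh identity is a placeholder, not a real row ID from the dataset, and `compose_endpoints` is a hypothetical helper.

```python
# Hand-made stand-in for passage_lookup (hypothetical rows, not loaded from the CSV)
lookup = {
    "PD-001": {"source_object": "stakdur_territory", "target_object": "reed_marsh"},
    "ID-REED": {"source_object": "reed_marsh", "target_object": "reed_marsh"},  # placeholder ID
}

def compose_endpoints(second, first):
    """Endpoints of second ∘ first, or None if the passages do not chain."""
    if first["target_object"] != second["source_object"]:
        return None
    return (first["source_object"], second["target_object"])

f = lookup["PD-001"]
id_B = lookup["ID-REED"]

composite = compose_endpoints(id_B, f)                 # id_B ∘ f
original = (f["source_object"], f["target_object"])

print(f"id_B ∘ f: {composite}")
print(f"f:        {original}")
print("Left identity holds on endpoints:", composite == original)
```

Note that the identity used on the left lives at the target of f (reed_marsh), while the one used on the right lives at its source (stakdur_territory); swapping them makes the composition undefined.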

---

## Part 5: Testing Associativity

The associativity axiom requires three composable morphisms:
- f: A → B
- g: B → C
- h: C → D

Then: (h ∘ g) ∘ f = h ∘ (g ∘ f)

Let's find a three-step composition chain in the data.

In [None]:
# Find morphisms that form a 3-step chain
# Look for PD-X that composes with PD-Y, which composes with PD-Z

def find_chains():
    """Find all 3-step composition chains."""
    chains = []
    for _, row3 in has_composition.iterrows():
        # row3 composes with some earlier morphism
        row2_id = row3['composition_with']
        if row2_id not in passage_lookup:
            continue
        row2 = passage_lookup[row2_id]
        
        # Does row2 also compose with something?
        if pd.notna(row2.get('composition_with')):
            row1_id = row2['composition_with']
            if row1_id in passage_lookup:
                row1 = passage_lookup[row1_id]
                chains.append({
                    'f_id': row1_id,
                    'f': f"{row1['source_object']} → {row1['target_object']}",
                    'g_id': row2_id,
                    'g': f"{row2['source_object']} → {row2['target_object']}",
                    'h_id': row3['passage_id'],
                    'h': f"{row3['source_object']} → {row3['target_object']}",
                    'f_source': row1['source_object'],
                    'h_target': row3['target_object']
                })
    return chains

chains = find_chains()
print(f"Found {len(chains)} three-step composition chains")

In [None]:
# Display the chains
for i, chain in enumerate(chains[:8]):
    print(f"\nChain {i+1}:")
    print(f"  f ({chain['f_id']}): {chain['f']}")
    print(f"  g ({chain['g_id']}): {chain['g']}")
    print(f"  h ({chain['h_id']}): {chain['h']}")
    print(f"  Full composite: {chain['f_source']} → {chain['h_target']}")

### Testing Associativity on a Specific Chain

Let's take the stakdur lifecycle chain and verify associativity.

In [None]:
# Stakdur lifecycle: egg_chamber → nursery_zone → hunting_grounds → breeding_territory
# This is a 3-step chain (4 objects, 3 morphisms)

f = passage_lookup['PD-023']  # egg_chamber → nursery_zone
g = passage_lookup['PD-024']  # nursery_zone → hunting_grounds
h = passage_lookup['PD-025']  # hunting_grounds → breeding_territory

print("Stakdur Lifecycle Chain:")
print("=" * 60)
print(f"f (PD-023): {f['source_object']} → {f['target_object']}")
print(f"g (PD-024): {g['source_object']} → {g['target_object']}")
print(f"h (PD-025): {h['source_object']} → {h['target_object']}")

In [None]:
# Method 1: (h ∘ g) ∘ f
# First compose h and g, then compose with f

print("Method 1: (h ∘ g) ∘ f")
print("-" * 40)

# Step 1: h ∘ g
# g: nursery_zone → hunting_grounds
# h: hunting_grounds → breeding_territory
hg_source = g['source_object']  # nursery_zone
hg_target = h['target_object']  # breeding_territory
print(f"Step 1: h ∘ g = {hg_source} → {hg_target}")

# Step 2: (h ∘ g) ∘ f
# f: egg_chamber → nursery_zone
# h ∘ g: nursery_zone → breeding_territory
final_source = f['source_object']  # egg_chamber
final_target = hg_target  # breeding_territory
print(f"Step 2: (h ∘ g) ∘ f = {final_source} → {final_target}")

In [None]:
# Method 2: h ∘ (g ∘ f)
# First compose g and f, then compose with h

print("Method 2: h ∘ (g ∘ f)")
print("-" * 40)

# Step 1: g ∘ f
# f: egg_chamber → nursery_zone
# g: nursery_zone → hunting_grounds
gf_source = f['source_object']  # egg_chamber
gf_target = g['target_object']  # hunting_grounds
print(f"Step 1: g ∘ f = {gf_source} → {gf_target}")

# Step 2: h ∘ (g ∘ f)
# g ∘ f: egg_chamber → hunting_grounds
# h: hunting_grounds → breeding_territory
final_source_2 = gf_source  # egg_chamber
final_target_2 = h['target_object']  # breeding_territory
print(f"Step 2: h ∘ (g ∘ f) = {final_source_2} → {final_target_2}")

In [None]:
# Verify associativity
print("\nAssociativity Verification:")
print("=" * 40)
print(f"(h ∘ g) ∘ f = {final_source} → {final_target}")
print(f"h ∘ (g ∘ f) = {final_source_2} → {final_target_2}")

if final_source == final_source_2 and final_target == final_target_2:
    print("\n✓ ASSOCIATIVITY HOLDS")
    print("  The grouping of the compositions doesn't matter.")
else:
    print("\n✗ ASSOCIATIVITY FAILS")

### Why Associativity Matters

Associativity means we can write h ∘ g ∘ f without ambiguity. We don't need to specify whether we mean (h ∘ g) ∘ f or h ∘ (g ∘ f); they're the same.

For Vold, this captures a deep truth: *the intermediate stages can be forgotten*. A stakdur that grows from egg to adult takes a specific path (egg → nursery → hunting → breeding), but the composite journey—egg to breeding—is the same regardless of how we mentally group the stages.
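
The check above compares only sources and targets, and those agree for either grouping by construction. Associativity becomes a real constraint once morphisms carry more than endpoints. The sketch below is hypothetical throughout: each passage is given a made-up `cost`, and two invented composition rules are compared, one that adds costs and one that averages them.

```python
# Each hypothetical passage is (source, target, cost); the cost values are made up.
f = ("egg_chamber", "nursery_zone", 2.0)
g = ("nursery_zone", "hunting_grounds", 4.0)
h = ("hunting_grounds", "breeding_territory", 8.0)

def compose_sum(second, first):
    """second ∘ first, adding costs (addition is associative)."""
    assert first[1] == second[0], "target of first must equal source of second"
    return (first[0], second[1], first[2] + second[2])

def compose_avg(second, first):
    """second ∘ first, averaging costs (averaging is NOT associative)."""
    assert first[1] == second[0], "target of first must equal source of second"
    return (first[0], second[1], (first[2] + second[2]) / 2)

for compose in (compose_sum, compose_avg):
    left = compose(compose(h, g), f)    # (h ∘ g) ∘ f
    right = compose(h, compose(g, f))   # h ∘ (g ∘ f)
    print(compose.__name__, left, right, "associative here:", left == right)
```

With the averaging rule the endpoints still match but the costs differ (4.0 versus 5.5), so that rule would not define a category. This is what a failed composition law looks like in data.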

---

## Part 6: Visualizing Composition

Let's build a visual representation of composition chains.

In [None]:
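# Build a graph of all passages
# MultiDiGraph keeps parallel passages between the same pair of objects as
# separate edges, so the edge count below matches the number of morphisms
G = nx.MultiDiGraph()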

# Add all passages as edges
for _, row in passages.iterrows():
    G.add_edge(
        row['source_object'], 
        row['target_object'],
        id=row['passage_id'],
        type=row['morphism_type'],
        composes_with=row['composition_with'] if pd.notna(row['composition_with']) else None
    )

print(f"Graph built: {G.number_of_nodes()} objects, {G.number_of_edges()} morphisms")

In [None]:
# Focus on the lifecycle passages to visualize composition
lifecycle = passages[passages['morphism_type'] == 'lifecycle_passage']

G_life = nx.DiGraph()
for _, row in lifecycle.iterrows():
    G_life.add_edge(row['source_object'], row['target_object'], id=row['passage_id'])

fig, ax = plt.subplots(figsize=(14, 8))

# Use a spring layout with a fixed seed so the picture is reproducible
pos = nx.spring_layout(G_life, k=2, iterations=100, seed=42)

# Draw
nx.draw_networkx_nodes(G_life, pos, node_size=2000, node_color='lightgreen', ax=ax)
nx.draw_networkx_labels(G_life, pos, font_size=8, ax=ax)
nx.draw_networkx_edges(G_life, pos, edge_color='darkgreen', arrows=True,
                        arrowsize=20, connectionstyle='arc3,rad=0.1', ax=ax)

# Add edge labels
edge_labels = nx.get_edge_attributes(G_life, 'id')
nx.draw_networkx_edge_labels(G_life, pos, edge_labels, font_size=7, ax=ax)

ax.set_title('Lifecycle Passages: Composition Chains\n(Each arrow composes with the previous)', fontsize=12)
ax.axis('off')
plt.tight_layout()
plt.show()

In [None]:
# Visualize the scientific method chain as a composition diagram
scientific = passages[passages['morphism_type'] == 'scientific_passage']

print("Scientific Method as Composition:")
for _, row in scientific.iterrows():
    comp = f"(composes with {row['composition_with']})" if pd.notna(row['composition_with']) else ""
    print(f"  {row['passage_id']}: {row['source_object']} → {row['target_object']} {comp}")

The scientific method forms a composition chain:

```
observation_made → hypothesis_formed → prediction_tested → theory_established
```

The composite passage: observation_made → theory_established

This is science viewed categorically—a chain of morphisms that composes into a single "passage" from observation to theory.
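
A chain of any length can be collapsed into its composite by folding pairwise composition over the list. A minimal sketch, assuming each passage is reduced to a (source, target) pair; the `compose` helper is hypothetical, and the pairs are typed in by hand from the chain above rather than read from the dataset.

```python
from functools import reduce

# The chain above, as (source, target) pairs in the order they are travelled
chain = [
    ("observation_made", "hypothesis_formed"),
    ("hypothesis_formed", "prediction_tested"),
    ("prediction_tested", "theory_established"),
]

def compose(first, second):
    """Follow `first`, then `second`; raise if the endpoints do not meet."""
    if first[1] != second[0]:
        raise ValueError(f"{first[1]} does not match {second[0]}")
    return (first[0], second[1])

composite = reduce(compose, chain)
print(composite)  # ('observation_made', 'theory_established')
```

`reduce` groups from the left, but associativity is what guarantees that any other grouping would give the same composite.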

---

## Part 7: Composition in Different Domains

Let's examine how composition manifests across different morphism types.

In [None]:
# Group compositions by morphism type
composition_by_type = has_composition.groupby('morphism_type').size().sort_values(ascending=False)

print("Composition chains by domain:")
print(composition_by_type)

In [None]:
# Visualize
fig, ax = plt.subplots(figsize=(10, 6))

composition_by_type.plot(kind='barh', ax=ax, color='steelblue', alpha=0.7)
ax.set_xlabel('Number of Composition References')
ax.set_ylabel('Morphism Type')
ax.set_title('Which Domains Have the Most Composition?')
plt.tight_layout()
plt.show()

Interesting pattern: **creature_passage** and **lifecycle_passage** have the most compositions. This reflects the natural world's sequential structure—creatures move through territories in chains, organisms develop through stages.

**Temporal passages** (circadian rhythms) and **scientific passages** also compose heavily—daily cycles and the scientific method are inherently sequential.

---

## Part 8: The ML Connection

Composition and identity are fundamental to neural network architecture:

### Neural Networks as Categories

| Categorical Concept | Neural Network Analog |
|---------------------|----------------------|
| Objects | Vector spaces (layer dimensions) |
| Morphisms | Linear maps (weight matrices) |
| Composition | Matrix multiplication |
| Identity | Identity matrix |

### Associativity in Backpropagation

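The chain rule expresses the derivative of a composite as a product of local derivatives:

d/dx (f ∘ g ∘ h) = (f' ∘ g ∘ h) · (g' ∘ h) · h'

Because this product is associative, the factors can be grouped in any way. Backpropagation uses that freedom to accumulate gradients layer by layer, starting from the output.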
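
The formula can be checked numerically. In the sketch below the three scalar functions (`sin`, squaring, and `exp`) are chosen only for illustration and have nothing to do with the Archive data. The chain-rule product is grouped both ways and compared against a central finite difference.

```python
import math

# Illustrative scalar functions and their derivatives
h, dh = math.sin, math.cos                      # innermost
g, dg = (lambda u: u * u), (lambda u: 2 * u)    # middle
f, df = math.exp, math.exp                      # outermost

def composite(x):
    """(f ∘ g ∘ h)(x)"""
    return f(g(h(x)))

x = 0.7

# Local derivatives, each evaluated where the chain rule says to evaluate it
a = df(g(h(x)))   # f' ∘ g ∘ h
b = dg(h(x))      # g' ∘ h
c = dh(x)         # h'

forward_grouping = a * (b * c)    # accumulate from the input side
backward_grouping = (a * b) * c   # accumulate from the output side (backprop order)

# Central finite difference as an independent check
eps = 1e-6
numeric = (composite(x + eps) - composite(x - eps)) / (2 * eps)

print(forward_grouping, backward_grouping, numeric)
```

For scalars the two groupings are the same arithmetic up to rounding; with Jacobian matrices the results still agree, but the cost of each grouping can differ considerably.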

In [None]:
# Demonstrate: Matrix composition is associative
np.random.seed(42)

# Three "layer" matrices
A = np.random.randn(3, 4)  # Layer 1: R^4 → R^3
B = np.random.randn(2, 3)  # Layer 2: R^3 → R^2
C = np.random.randn(5, 2)  # Layer 3: R^2 → R^5

# Method 1: (C @ B) @ A
method1 = (C @ B) @ A

# Method 2: C @ (B @ A)
method2 = C @ (B @ A)

print("Matrix Composition Associativity:")
print(f"(C @ B) @ A shape: {method1.shape}")
print(f"C @ (B @ A) shape: {method2.shape}")
print(f"\nAre they equal? {np.allclose(method1, method2)}")

In [None]:
# Demonstrate: Identity matrices leave morphisms unchanged
I_left = np.eye(3)   # Identity at codomain of A
I_right = np.eye(4)  # Identity at domain of A

print("Identity Axiom for Matrices:")
print(f"A shape: {A.shape}")
print(f"\nLeft identity: I @ A = A?  {np.allclose(I_left @ A, A)}")
print(f"Right identity: A @ I = A? {np.allclose(A @ I_right, A)}")

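This is part of why layered networks are tractable: linear layers compose associatively, and composing with an identity matrix changes nothing. Skip connections are related but not the same thing. A residual block computes x + F(x), which adds an identity path alongside the layer rather than composing with it.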
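
The distinction between composing with an identity matrix, which leaves a linear map unchanged, and adding a residual (skip) path x + F(x), which changes the computed function, can be made concrete. A small NumPy sketch, where the weight matrix `W` and input `x` are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 4))   # hypothetical square layer: R^4 -> R^4
x = rng.standard_normal(4)
I = np.eye(4)

plain = W @ x                 # the layer on its own
with_identity = W @ (I @ x)   # identity composed in: same result
residual = x + W @ x          # skip connection: equals (I + W) @ x

print("identity composed in changes nothing:", np.allclose(plain, with_identity))
print("residual block equals the plain layer:", np.allclose(plain, residual))
print("residual block equals (I + W) @ x:   ", np.allclose(residual, (I + W) @ x))
```

The skip path needs matching input and output dimensions, which is why `W` is square here.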

---

## Exercises

### Exercise 1: Find All Three-Step Chains

Write a function to find ALL three-step composition chains in the data. How many exist? What types of morphisms appear most often in chains?

In [None]:
# Your code here
# Hint: Use the find_chains() function and analyze the results

### Exercise 2: Identity Coverage

Which objects have explicit identity morphisms? Which don't? Create a list of objects that "should" have identity morphisms based on appearing as sources/targets in other passages.

In [None]:
# Your code here
# Hint: Compare set of identity sources with all objects in passages

### Exercise 3: Cyclic Compositions

Some compositions form cycles (like the tidal movements). Find all passages where composing a morphism with its "inverse" returns to the starting point. Are these true categorical inverses?

In [None]:
# Your code here
# Hint: Look for reversible passages and check if f ∘ g = identity

### Exercise 4: Longest Composition Chain

Find the longest composition chain in the dataset. What domain does it come from?

In [None]:
# Your code here
# Hint: Recursively follow composition_with references

---

## Discussion Questions

1. Vold claimed that composition allows intermediate stages to be "forgotten." But in some domains (like lifecycle passages), the intermediate stages seem essential. How do you reconcile "forgetting" with the fact that the stakdur must pass through each life stage?

2. Identity morphisms represent "remaining in place." But for time-dependent systems, can something truly "remain" the same? Is the stakdur in its territory at time t the same as at time t+1?

3. In machine learning, skip connections (ResNets) add identity-like paths. Loosely speaking, this places an identity morphism alongside the learned map rather than composing with it. Why does this help training?

---

## Summary

In this tutorial, you learned:

| Concept | What You Learned |
|---------|------------------|
| Identity axiom | f ∘ id = f = id ∘ f — identity is the neutral element |
| Associativity axiom | (h ∘ g) ∘ f = h ∘ (g ∘ f) — parentheses don't matter |
| Composition chains | Multi-step morphisms can be composed into single passages |
| Verification | How to test categorical axioms on real data |
| ML connection | Neural network layers are morphisms; composition is matrix multiplication |

| Skill | Code Pattern |
|-------|--------------|
| Build lookup | `df.set_index('id').to_dict('index')` |
| Trace compositions | Follow `composition_with` references |
| Verify axioms | Check source/target matching |
| Visualize chains | `nx.DiGraph()` with edge labels |

---

## Next Tutorial

In **Tutorial 4: The Preservation Principle**, you will learn about **functors**—structure-preserving maps between categories:

- How the Archive translates classifications between regional systems
- What it means to "preserve structure"
- Functors as the morphisms of the category of categories
- The ML connection to equivariant neural networks

> *"A preservation map carries passages to passages, compositions to compositions, identities to identities. It is not merely a translation of names—it is a translation of structure."*
> — Tessery Vold, "On the Coherence of Shifts," Year 897

---

## Credits

**Source Material:** Tai-Danae Bradley, "Category Theory and Language Models" (Cartesian Cafe)

**Densworld Integration:** The Relational Foundations course applies categorical concepts through the framework of Tessery Vold.

**Learn more:** [buildLittleWorlds](https://github.com/buildLittleWorlds)

---

> *"Composition is associative. It matters not which pair you compose first—the result is the same. As for identity, it is the passage that changes nothing. Compose any passage with identity, and you have the original passage unchanged. This is not philosophy. This is structure."*
> — Tessery Vold, responding to Marden Krell's objections, Year 894