# Tutorial 7: Dray's Synthesis and Legacy

**Course 3: Document Functors (Lorren Dray)**

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/buildLittleWorlds/category-theory-document-functors/blob/main/notebooks/07_drays_synthesis_and_legacy.ipynb)

---

## Overview

In Year 958, Dray published her definitive work: *The Document Functor Discipline*. This tutorial synthesizes her complete framework and traces her influence on subsequent scholars.

### Learning Goals

1. Review the complete document functor framework
2. See how it connects to Vance's weighted passages
3. Trace the lineage to Strand's Probing Lemma
4. Understand the connection to modern transformer architecture

---

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Load datasets
BASE_URL = "https://raw.githubusercontent.com/buildLittleWorlds/densworld-datasets/main/data/"

documents = pd.read_csv(BASE_URL + "document_functor_examples.csv")
archive_structure = pd.read_csv(BASE_URL + "archive_category_structure.csv")
embeddings = pd.read_csv(BASE_URL + "embedding_correspondences.csv")
correspondence = pd.read_csv(BASE_URL + "dray_correspondence.csv")

## Part 1: The Complete Framework

Dray's document functor theory can be summarized in four principles:

### Principle 1: The Archive as Category
Access methods are objects; document flows are morphisms.
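
To make this concrete, here is a minimal sketch of a tiny archive category. It assumes a *thin* category (at most one flow between any two access methods), so a morphism can be written as a `(source, target)` pair. The three access-method names and the two generating flows are hypothetical choices for illustration, not taken from the datasets.

```python
from itertools import product

# Hypothetical access methods (objects) and generating document flows (morphisms).
objects = ["subject_catalog", "author_index", "date_registry"]
generators = {
    ("subject_catalog", "author_index"),
    ("author_index", "date_registry"),
}

def close_under_composition(objects, generators):
    """Add identity flows and all composites of the generating flows."""
    morphisms = set(generators) | {(x, x) for x in objects}
    changed = True
    while changed:
        changed = False
        # Iterate over a snapshot so the set can grow inside the loop.
        for (a, b), (c, d) in product(list(morphisms), repeat=2):
            if b == c and (a, d) not in morphisms:
                morphisms.add((a, d))
                changed = True
    return morphisms

archive = close_under_composition(objects, generators)
print(sorted(archive))
```

The composite flow from `subject_catalog` to `date_registry` appears automatically, which is the closure property a category requires.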

### Principle 2: Documents as Presheaves
A document D: Archive^op → Set assigns a set of observations to each access method and a restriction map to each document flow.
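
The contravariance can be checked by hand on a small example. The sketch below assumes a thin three-object archive, represents observations as tuples, and takes each restriction map to be "drop the last component". All names and values are hypothetical; identity maps are left implicit.

```python
# D on objects: the observations a document offers through each access method.
D_obj = {
    "subject_catalog": {("tides",), ("salt",)},
    "author_index": {("tides", "dray"), ("salt", "vance")},
    "date_registry": {("tides", "dray", 920), ("salt", "vance", 940), ("salt", "vance", 952)},
}

# D on morphisms: a flow source -> target yields a map D(target) -> D(source).
D_mor = {
    ("subject_catalog", "author_index"): {o: o[:1] for o in D_obj["author_index"]},
    ("author_index", "date_registry"): {o: o[:2] for o in D_obj["date_registry"]},
    ("subject_catalog", "date_registry"): {o: o[:1] for o in D_obj["date_registry"]},
}

def is_presheaf(D_obj, D_mor):
    """Check that the restriction maps are well-typed and compose contravariantly."""
    # Each restriction map must be a total function D(target) -> D(source).
    for (src, tgt), m in D_mor.items():
        if set(m) != D_obj[tgt] or not set(m.values()) <= D_obj[src]:
            return False
    # Contravariance: D(g . f) = D(f) . D(g) whenever f: a -> b and g: b -> c.
    for (a, b), Df in D_mor.items():
        for (b2, c), Dg in D_mor.items():
            if b == b2:
                composite = D_mor[(a, c)]
                if any(composite[x] != Df[Dg[x]] for x in D_obj[c]):
                    return False
    return True

print(is_presheaf(D_obj, D_mor))  # True
```

Note the direction: the flow runs from `subject_catalog` to `author_index`, but observations are pulled back from `author_index` to `subject_catalog`.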

### Principle 3: The Representable Perspective
Hom-functors Hom(A, -) capture single-viewpoint observations.
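
As a rough illustration, the sketch below computes Hom(A, -) for a thin archive category, where each hom-set is either empty or a single flow. The object names and the list of flows are hypothetical.

```python
objects = ["subject_catalog", "author_index", "date_registry"]

# All flows of a hypothetical thin archive, identities included, as (source, target) pairs.
morphisms = {
    ("subject_catalog", "subject_catalog"),
    ("author_index", "author_index"),
    ("date_registry", "date_registry"),
    ("subject_catalog", "author_index"),
    ("author_index", "date_registry"),
    ("subject_catalog", "date_registry"),
}

def hom(a, b):
    """Hom(a, b): the set of flows from a to b."""
    return {(s, t) for (s, t) in morphisms if s == a and t == b}

def representable(a):
    """Hom(a, -) on objects: assign to each object b the set Hom(a, b)."""
    return {b: hom(a, b) for b in objects}

def push_forward(a, g):
    """Hom(a, g) for g: b -> c, sending each f: a -> b to the composite g . f: a -> c."""
    b, c = g
    return {f: (a, c) for f in hom(a, b)}

view = representable("author_index")
for target, flows in view.items():
    print(target, flows)
```

From `author_index` there is nothing to see at `subject_catalog` (an empty hom-set), which is the sense in which a Hom-functor records a single viewpoint.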

### Principle 4: Embeddings as Functor Values
Numerical embeddings are functor values — each dimension is a probe response.
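
A minimal sketch of this idea: treat a document as a function from probes to numbers, and read off the embedding by evaluating it at each probe in a fixed order. The probe names and response values are hypothetical, and defaulting unanswered probes to 0.0 is an assumption made here for simplicity.

```python
# Hypothetical probes, in the fixed order that defines the embedding dimensions.
probes = ["topic_tides", "topic_salt", "author_dray"]

def make_document(responses):
    """Return F: probe -> number, with 0.0 for probes the document does not answer."""
    return lambda probe: responses.get(probe, 0.0)

F = make_document({"topic_tides": 0.9, "author_dray": 1.0})

# Embedding[i] = F(probe_i)
embedding = [F(p) for p in probes]
print(embedding)  # [0.9, 0.0, 1.0]
```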

In [None]:
# Create a visual summary of the framework
fig, ax = plt.subplots(figsize=(14, 10))

# Four quadrants for four principles
ax.set_xlim(0, 10)
ax.set_ylim(0, 10)

# Principle 1: Archive as Category (top-left)
ax.add_patch(plt.Rectangle((0.2, 5.2), 4.6, 4.6, fill=True, facecolor='#E8F4F8', edgecolor='navy', lw=2))
ax.text(2.5, 9.5, 'Principle 1:\nThe Archive as Category', ha='center', fontsize=11, fontweight='bold')
ax.text(2.5, 7.8, 'Objects: Access Methods\n(subject catalog, author index,\ndate registry, etc.)', ha='center', fontsize=9)
ax.text(2.5, 6.2, 'Morphisms: Document Flows\n(topic→author, date→location)', ha='center', fontsize=9)

# Principle 2: Documents as Presheaves (top-right)
ax.add_patch(plt.Rectangle((5.2, 5.2), 4.6, 4.6, fill=True, facecolor='#FFF4E8', edgecolor='darkorange', lw=2))
ax.text(7.5, 9.5, 'Principle 2:\nDocuments as Presheaves', ha='center', fontsize=11, fontweight='bold')
ax.text(7.5, 7.8, 'D: Archive^op → Set', ha='center', fontsize=10, style='italic')
ax.text(7.5, 6.5, 'Each document assigns\nobservation sets to\naccess methods', ha='center', fontsize=9)

# Principle 3: Representable Perspective (bottom-left)
ax.add_patch(plt.Rectangle((0.2, 0.2), 4.6, 4.6, fill=True, facecolor='#E8F8E8', edgecolor='darkgreen', lw=2))
ax.text(2.5, 4.5, 'Principle 3:\nThe Representable Perspective', ha='center', fontsize=11, fontweight='bold')
ax.text(2.5, 3.0, 'Hom(A, -) captures the\nview from access method A', ha='center', fontsize=9)
ax.text(2.5, 1.5, 'Objects are determined\nby how others probe them', ha='center', fontsize=9)

# Principle 4: Embeddings as Values (bottom-right)
ax.add_patch(plt.Rectangle((5.2, 0.2), 4.6, 4.6, fill=True, facecolor='#F8E8F8', edgecolor='purple', lw=2))
ax.text(7.5, 4.5, 'Principle 4:\nEmbeddings as Functor Values', ha='center', fontsize=11, fontweight='bold')
ax.text(7.5, 3.0, 'Embedding[i] = F(probe_i)', ha='center', fontsize=10, style='italic')
ax.text(7.5, 1.5, 'Each dimension is a\nnumerical probe response', ha='center', fontsize=9)

# Connecting arrows
ax.annotate('', xy=(5.2, 7.5), xytext=(4.8, 7.5),
            arrowprops=dict(arrowstyle='->', color='gray', lw=2))
ax.annotate('', xy=(2.5, 5.2), xytext=(2.5, 4.8),
            arrowprops=dict(arrowstyle='->', color='gray', lw=2))
ax.annotate('', xy=(7.5, 5.2), xytext=(7.5, 4.8),
            arrowprops=dict(arrowstyle='->', color='gray', lw=2))
ax.annotate('', xy=(5.2, 2.5), xytext=(4.8, 2.5),
            arrowprops=dict(arrowstyle='->', color='gray', lw=2))

ax.axis('off')
# Matplotlib does not render markdown, so the book title is written without asterisks
ax.set_title("Dray's Document Functor Framework\nThe Document Functor Discipline (Year 958)", fontsize=14, fontweight='bold')

plt.tight_layout()
plt.show()

## Part 2: Key Publications Timeline

In [None]:
# Dray's publication timeline
publications = [
    (920, 'Notes on Classification Failure', 'DRY-001', 'First documentation of multiple classification'),
    (928, 'On the Categorical Structure of Archives', 'DRY-002', 'Archive-as-category framework'),
    (932, 'On Functorial Preservation in Archives', 'DRY-003', 'Response to Keth objection'),
    (938, 'The Representable Perspective', 'DRY-004', 'Hom-functors and observation'),
    (945, 'Documents as Numerical Functors', 'DRY-005', 'Bridge to embeddings'),
    (952, 'On the Unity of Archive Structure', 'DRY-006', 'Synthesis with Vance'),
    (958, 'The Document Functor Discipline', 'DRY-007', 'Definitive work'),
]

fig, ax = plt.subplots(figsize=(14, 6))

years = [p[0] for p in publications]
titles = [p[1] for p in publications]
ids = [p[2] for p in publications]

# Plot timeline
ax.scatter(years, [0]*len(years), s=200, c='steelblue', zorder=5)
ax.axhline(y=0, color='gray', linestyle='-', linewidth=2)

# Add labels alternating above and below
for i, (year, title, doc_id, desc) in enumerate(publications):
    y_offset = 0.5 if i % 2 == 0 else -0.6
    ax.annotate(f'{title}\n({year})\n[{doc_id}]', 
                xy=(year, 0), xytext=(year, y_offset),
                ha='center', va='bottom' if y_offset > 0 else 'top',
                fontsize=8,
                arrowprops=dict(arrowstyle='-', color='gray', lw=0.5))

ax.set_xlim(915, 965)
ax.set_ylim(-1.5, 1.5)
ax.set_xlabel('Year', fontsize=11)
ax.set_yticks([])
ax.set_title("Lorren Dray's Publication Timeline (920-958)", fontsize=12, fontweight='bold')
ax.grid(True, axis='x', alpha=0.3)

plt.tight_layout()
plt.show()

## Part 3: Connection to Vance's Weighted Passages

Dray's debate with Merrit Vance in Year 940 unified their approaches:

| Dray's Framework | Vance's Framework | Unified View |
|------------------|-------------------|---------------|
| Presheaves (Set-valued) | Enriched categories (weight-valued) | Weighted presheaves |
| Observations as sets | Observations as numbers | Numerical observations |
| Hom(A, -) | Attention weights | Weighted probing |
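
One way to picture the move from sets to weights is to collapse each observation set to a single number. The sketch below uses an assumed rule (each access method's share of all observations); this rule and the sample observations are illustrative choices, not the enrichment Vance or Dray actually used.

```python
# Set-valued view: each access method sees a set of observations (hypothetical data).
observations = {
    "subject_catalog": {"tides", "salt"},
    "author_index": {"dray"},
    "date_registry": set(),
}

def to_weights(observations):
    """Collapse each observation set to a weight in [0, 1].

    Assumed rule: weight = share of all observations seen through that method.
    """
    total = sum(len(obs) for obs in observations.values())
    if total == 0:
        return {method: 0.0 for method in observations}
    return {method: len(obs) / total for method, obs in observations.items()}

weights = to_weights(observations)
for method, w in weights.items():
    print(f"{method}: {w:.2f}")
```

The weights sum to 1, so they can be read in the same way as a row of attention weights.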

In [None]:
# Find the synthesis letters
synthesis_letters = correspondence[correspondence['key_concepts'].str.contains('synthesis|enrichment', case=False, na=False)]

print("The Dray-Vance Synthesis:\n")
for _, letter in synthesis_letters.iterrows():
    print(f"Date: {letter['date']}")
    print(f"Subject: {letter['subject']}")
    print(f"\n\"{letter['excerpt']}\"\n")
    print("-" * 60 + "\n")

## Part 4: Legacy — The Path to Strand's Probing Lemma

Dray's work directly influenced Pelleth Strand, who would formalize the **Probing Lemma** (Yoneda Lemma):

> "If documents are functors, then the probing lemma I am developing says: a functor is completely determined by how representables probe it."
> — Pelleth Strand, letter to Dray (Year 942)
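
Strand's claim can be checked by brute force on a small case. The sketch below assumes a three-object chain 0 → 1 → 2 as the archive and a hypothetical presheaf F on it. Because F is contravariant, the probes are the contravariant representables Hom(-, A), as in the standard statement of the Yoneda Lemma. It enumerates every natural transformation Hom(-, A) ⇒ F and confirms there is exactly one per element of F(A).

```python
from itertools import product

objects = [0, 1, 2]  # a chain: a morphism x -> y exists iff x <= y

# Hypothetical presheaf: F[x] is a set, and step[(x, x + 1)] maps F(x + 1) -> F(x).
F = {0: {"a"}, 1: {"a1", "a2"}, 2: {"a1x", "a2x", "a2y"}}
step = {
    (0, 1): {"a1": "a", "a2": "a"},
    (1, 2): {"a1x": "a1", "a2x": "a2", "a2y": "a2"},
}

def restrict(x, y, element):
    """Apply F(x -> y): F(y) -> F(x) by composing one-step restrictions."""
    for k in range(y, x, -1):
        element = step[(k - 1, k)][element]
    return element

def natural_transformations(a):
    """Enumerate all natural transformations Hom(-, a) => F."""
    below = [x for x in objects if x <= a]  # objects where Hom(x, a) is non-empty
    found = []
    for choice in product(*(sorted(F[x]) for x in below)):
        alpha = dict(zip(below, choice))  # alpha[x] = image of the unique flow x -> a
        natural = all(
            alpha[x] == restrict(x, y, alpha[y])
            for x in below for y in below if x <= y
        )
        if natural:
            found.append(alpha)
    return found

for a in objects:
    print(a, len(natural_transformations(a)), len(F[a]))
```

Each natural transformation is pinned down by the single element `alpha[a]` of F(A), which is the content of the lemma in this small setting.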

In [None]:
# Find the Dray-Strand correspondence
strand_letters = correspondence[
    (correspondence['sender'] == 'pelleth_strand') | 
    (correspondence['recipient'] == 'pelleth_strand')
]

print("Dray-Strand Correspondence (The Path to the Probing Lemma):\n")
for _, letter in strand_letters.iterrows():
    print(f"Date: {letter['date']}")
    print(f"From: {letter['sender']} → To: {letter['recipient']}")
    print(f"Subject: {letter['subject']}")
    print(f"\n\"{letter['excerpt']}\"\n")
    print("-" * 60 + "\n")

## Part 5: Connection to Modern Transformers

Dray's framework maps directly onto transformer architecture:

| Dray's Theory | Transformer Component |
|---------------|----------------------|
| Archive category | Token vocabulary / context |
| Document functor | Token embedding |
| Access method | Embedding dimension |
| Functor value | Embedding coordinate |
| Representable Hom(A,-) | Attention head |
| Presheaf composition | Layer stacking |
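
To illustrate the "attention head" row, here is a small sketch of scaled dot-product attention weights: one query embedding probes several key embeddings, and a softmax turns the scores into weights. The vectors are hypothetical, and a real attention head would also apply learned query, key, and value projections, which are omitted here.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(query, keys):
    """Scaled dot-product scores of one query against each key, normalized by softmax."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    return softmax(scores)

# Hypothetical 3-dimensional embeddings: one probing document and three probed ones.
query = [1.0, 0.0, 1.0]
keys = [
    [1.0, 0.0, 1.0],  # same direction as the query
    [0.0, 1.0, 0.0],  # orthogonal to the query
    [0.5, 0.5, 0.5],  # partial overlap
]

weights = attention_weights(query, keys)
print([round(w, 3) for w in weights])
```

The key most aligned with the query receives the largest weight and the orthogonal key the smallest, which is the "weighted probing" reading of Hom(A, -).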

In [None]:
# Simulate transformer-style embedding
def transformer_embedding(documents_df, embeddings_df):
    """
    Create a simplified transformer-style embedding matrix.
    Rows = documents, Columns = probes (embedding dimensions).

    Note: documents_df is accepted but not used; the matrix is built
    from embeddings_df alone.
    """
    # Get all unique documents (order of appearance) and probes (sorted)
    unique_docs = embeddings_df['document_id'].unique()
    unique_probes = sorted(embeddings_df['probe_name'].unique())

    # Map each probe name to its column index once, instead of searching the list per row
    probe_index = {probe: j for j, probe in enumerate(unique_probes)}

    # Create embedding matrix; missing (document, probe) pairs stay at 0.0
    n_docs = len(unique_docs)
    n_dims = len(unique_probes)
    embedding_matrix = np.zeros((n_docs, n_dims))

    # Fill in each document's row from its probe responses
    for i, doc_id in enumerate(unique_docs):
        doc_emb = embeddings_df[embeddings_df['document_id'] == doc_id]
        for _, row in doc_emb.iterrows():
            j = probe_index[row['probe_name']]
            embedding_matrix[i, j] = row['numerical_value']

    return embedding_matrix, list(unique_docs), unique_probes

E, doc_ids, probes = transformer_embedding(documents, embeddings)

print(f"Embedding Matrix Shape: {E.shape}")
print(f"  Rows (documents): {len(doc_ids)}")
print(f"  Columns (dimensions): {len(probes)}")
print()
print("This has the same layout as the output of a transformer embedding layer:")
print("  - Each row is a document's embedding vector")
print("  - Each column is a learned dimension (probe)")
print("  - Values are the document's 'response' to each probe")

In [None]:
# Visualize the embedding matrix
fig, ax = plt.subplots(figsize=(14, 8))

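# Get short labels: truncate long titles, fall back to the ID if no title is found
title_by_id = documents.drop_duplicates('document_id').set_index('document_id')['document_title']

def short_label(doc_id, max_len=25):
    if doc_id not in title_by_id.index:
        return doc_id
    title = str(title_by_id.loc[doc_id])
    # Only append '...' when the title was actually cut
    return title[:max_len] + '...' if len(title) > max_len else title

doc_labels = [short_label(d) for d in doc_ids]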
probe_labels = [p.replace('topic_', '').replace('author_', 'auth:') for p in probes]

sns.heatmap(E, annot=True, fmt='.2f', cmap='viridis',
            xticklabels=probe_labels, yticklabels=doc_labels,
            ax=ax, cbar_kws={'label': 'Embedding Value'})

ax.set_xlabel('Embedding Dimension (Probe)', fontsize=11)
ax.set_ylabel('Document', fontsize=11)
ax.set_title('Document Embedding Matrix\n(Same Layout as a Transformer Embedding Matrix)', fontsize=12, fontweight='bold')

plt.xticks(rotation=45, ha='right')
plt.yticks(rotation=0)
plt.tight_layout()
plt.show()

## Part 6: Dray's Legacy

Lorren Dray died in Year 968, aged 72. Her epitaph reads:

> **"She showed us what documents truly are."**

Her work established:
- The categorical foundation of archival science
- The presheaf interpretation of documents
- The connection between functors and embeddings
- The bridge to Strand's Yoneda-based Probing Lemma

In [None]:
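# Display Dray's later reflections (letters dated Year 958 or 960)
# astype(str) guards against the date column being parsed as numbers or containing NaN
final_letters = correspondence[correspondence['date'].astype(str).str.startswith(('958', '960'))]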

print("Dray's Later Reflections:\n")
for _, letter in final_letters.iterrows():
    print(f"Date: {letter['date']}")
    print(f"Subject: {letter['subject']}")
    print(f"\n\"{letter['excerpt']}\"\n")
    print("-" * 60 + "\n")

## Course Summary

In this course, we have learned:

### Technical Concepts
1. **The Archive as Category**: Access methods are objects, document flows are morphisms
2. **Functors to Set**: Maps that assign sets to objects and functions to morphisms, preserving identities and composition
3. **Presheaves**: Contravariant functors that capture how observations "pull back"
4. **Representable Functors**: Hom(A, -) captures the perspective from A
5. **Embeddings as Values**: Each dimension is a probe response

### Historical Timeline
- Year 895: Dray born
- Year 918: Multiple classification problem observed
- Year 925: Archive-as-category insight
- Year 926: Document functor theory proposed
- Year 935: Representable perspective discovered
- Year 942: Embedding-as-functor-values insight
- Year 958: Definitive work published
- Year 968: Dray dies

### Connection to Modern ML
Dray's document functors closely parallel what modern transformers compute:
- Each embedding dimension is a learned probe
- Document embeddings are functor values
- Attention weights relate to the Hom-functor perspective

---

## Next in the Series

**Course 4: Natural Transformations (Gellen Tross)**

If documents are functors, what are the relationships between documents? Gellen Tross will show us that **natural transformations** capture coherent shifts between representations.

---

*Part of the [Category Theory & LLMs Series](https://github.com/buildLittleWorlds)*