# Tutorial 4: Documents as Presheaves

**Course 3: Document Functors (Lorren Dray)**

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/buildLittleWorlds/category-theory-document-functors/blob/main/notebooks/04_documents_as_presheaves.ipynb)

---

## Overview

In this tutorial, we encounter a crucial subtlety: documents are not just functors but **presheaves**, that is, functors from the *opposite category*. This reversal captures how observations "pull back" along access shifts.

### Learning Goals

1. Understand the opposite category C^op
2. See why documents are presheaves (contravariant functors)
3. Learn how observations pull back along morphisms
4. Build intuition for the presheaf structure

---

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Load datasets
BASE_URL = "https://raw.githubusercontent.com/buildLittleWorlds/densworld-datasets/main/data/"

documents = pd.read_csv(BASE_URL + "document_functor_examples.csv")
archive_structure = pd.read_csv(BASE_URL + "archive_category_structure.csv")
correspondence = pd.read_csv(BASE_URL + "dray_correspondence.csv")

## Part 1: The Opposite Category

For any category C, the **opposite category** C^op has:
- The same objects as C
- Morphisms reversed: each f: A → B in C gives a morphism f^op: B → A in C^op (often still written f)

### Why Does This Matter?

Consider the morphism `topic_to_author` in the Archive:
- In Archive: subject_catalog → author_index (given topics, find authors)
- In Archive^op: author_index → subject_catalog (reversed direction)

A presheaf is a functor from C^op to Set. Because its domain is C^op, each morphism f: A → B in C induces a function that runs in the *opposite* direction, from the set assigned to B to the set assigned to A.
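A category with finitely many named morphisms can be sketched as a dictionary mapping each morphism name to its (source, target) pair; the opposite category then just swaps every pair. This is a minimal sketch under that assumption: the `Category` alias and the `opposite` helper are hypothetical (not part of the course code), and identities and composition are left out for brevity.

```python
from typing import Dict, Tuple

# Hypothetical representation: morphism name -> (source object, target object).
# Identities and composition are omitted to keep the sketch small.
Category = Dict[str, Tuple[str, str]]

def opposite(category: Category) -> Category:
    """Return C^op: same objects and morphism names, with every arrow reversed."""
    return {name: (target, source) for name, (source, target) in category.items()}

archive: Category = {"topic_to_author": ("subject_catalog", "author_index")}
archive_op = opposite(archive)

print(archive["topic_to_author"])     # ('subject_catalog', 'author_index')
print(archive_op["topic_to_author"])  # ('author_index', 'subject_catalog')

# Reversing twice gives back the original category: (C^op)^op = C.
assert opposite(archive_op) == archive
```

Only the direction of each arrow changes; the objects and the morphism names stay the same, which matches the two panels of the figure below.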

In [None]:
# Visualize the opposite category
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Original category
ax1 = axes[0]
ax1.scatter([0.2, 0.8], [0.5, 0.5], s=600, c='lightblue', edgecolor='navy', zorder=5)
ax1.annotate('A', (0.2, 0.5), ha='center', va='center', fontsize=14, fontweight='bold')
ax1.annotate('B', (0.8, 0.5), ha='center', va='center', fontsize=14, fontweight='bold')
ax1.annotate('', xy=(0.7, 0.5), xytext=(0.3, 0.5),
            arrowprops=dict(arrowstyle='->', color='blue', lw=3))
ax1.annotate('f: A → B', (0.5, 0.6), ha='center', fontsize=12, color='blue')
ax1.set_xlim(0, 1)
ax1.set_ylim(0.2, 0.8)
ax1.axis('off')
ax1.set_title('Category C', fontsize=14, fontweight='bold')

# Opposite category
ax2 = axes[1]
ax2.scatter([0.2, 0.8], [0.5, 0.5], s=600, c='lightblue', edgecolor='navy', zorder=5)
ax2.annotate('A', (0.2, 0.5), ha='center', va='center', fontsize=14, fontweight='bold')
ax2.annotate('B', (0.8, 0.5), ha='center', va='center', fontsize=14, fontweight='bold')
ax2.annotate('', xy=(0.3, 0.5), xytext=(0.7, 0.5),
            arrowprops=dict(arrowstyle='->', color='red', lw=3))
ax2.annotate('f: B → A', (0.5, 0.6), ha='center', fontsize=12, color='red')
ax2.set_xlim(0, 1)
ax2.set_ylim(0.2, 0.8)
ax2.axis('off')
ax2.set_title('Opposite Category C^op', fontsize=14, fontweight='bold')

plt.suptitle('Same objects, reversed morphisms', fontsize=12)
plt.tight_layout()
plt.show()

## Part 2: Presheaves

A **presheaf** on C is a functor F: C^op → Set.

For a presheaf:
- F assigns a set F(A) to each object A
- For each morphism f: A → B in C, F assigns a function F(f): F(B) → F(A)
  - Note the reversal! f goes A → B, but F(f) goes F(B) → F(A)

This is called a **contravariant** functor.
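Being a functor out of C^op, a presheaf must also preserve identities and composition, and the reversal shows up as a swap in the order of composition. For composable morphisms f: X → Y and g: Y → Z in C (X, Y, Z are generic objects chosen here for illustration), the standard laws are:

```latex
F(\mathrm{id}_X) = \mathrm{id}_{F(X)}
\qquad
F(g \circ f) = F(f) \circ F(g) \colon F(Z) \to F(X)
```

A covariant functor instead satisfies F(g ∘ f) = F(g) ∘ F(f); the order of the two factors is the only difference.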

### Why "Presheaf"?

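The name comes from sheaf theory in topology. There, a presheaf assigns data (for example, the continuous functions) to each open set, and whenever an open set U is contained in an open set V, data on V restricts to data on U. The inclusion U ⊆ V points from smaller to larger, while restriction goes from larger to smaller: that reversal is exactly the contravariance.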
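The topological picture can be made concrete on a finite space. This sketch assumes a toy three-point space X = {1, 2, 3} whose open sets form the chain ∅ ⊆ {1} ⊆ {1, 2} ⊆ X, and takes F(U) to be all functions from U to {0, 1}, encoded as dicts. The helper names `sections` and `restrict` are illustrative, not from the course code.

```python
from itertools import product

# Toy space X = {1, 2, 3} with a chain of open sets (a valid topology).
OPEN_SETS = [frozenset(), frozenset({1}), frozenset({1, 2}), frozenset({1, 2, 3})]

def sections(open_set):
    """F(U): all functions U -> {0, 1}, each encoded as a dict."""
    points = sorted(open_set)
    return [dict(zip(points, values)) for values in product([0, 1], repeat=len(points))]

def restrict(section, smaller):
    """Restriction F(V) -> F(U) for U ⊆ V: keep only the points of U."""
    return {point: value for point, value in section.items() if point in smaller}

big, middle, small = OPEN_SETS[3], OPEN_SETS[2], OPEN_SETS[1]

# The inclusion small ⊆ big points "up", but restriction carries data "down".
s = {1: 1, 2: 0, 3: 1}
print(restrict(s, small))  # {1: 1}

# Functoriality: restricting in two steps equals restricting directly.
for sec in sections(big):
    assert restrict(restrict(sec, middle), small) == restrict(sec, small)
```

The final loop checks the composition law from the presheaf definition on every section over the whole space.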

In [None]:
# Find Dray's explanation of contravariance
preservation_letter = correspondence[correspondence['subject'].str.contains('Preservation', case=False, na=False)]
if len(preservation_letter) > 0:
    letter = preservation_letter.iloc[0]
    print("From Dray's correspondence on functorial preservation:\n")
    print(f"Date: {letter['date']}")
    print(f"Subject: {letter['subject']}")
    print(f"\n\"{letter['excerpt']}\"")
else:
    # Report explicitly instead of printing nothing when no letter matches
    print("No letter with 'Preservation' in its subject was found.")

## Part 3: Documents as Presheaves

Why are documents presheaves rather than regular functors?

Consider the morphism `topic_to_author`: subject_catalog → author_index

This morphism says: "Given a topic, find the authors who wrote about it."

For a document D:
- D(subject_catalog) = {set of topics in this document}
- D(author_index) = {set of authors of this document}

The function D(topic_to_author) should go from D(author_index) to D(subject_catalog):
- Given an author, what topics did they write about *in this document*?

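This is a restriction, often described as "pulling back" along the morphism: the function runs against the direction of `topic_to_author`, restricting from the author side to the specific topic observations recorded in this document. (This informal use of "pullback" is distinct from the pullback construction, or fiber product, in category theory.)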
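To see what D(topic_to_author) could look like concretely, here is a minimal sketch assuming a hypothetical attribution table that records which author contributed which topic in a single document. The entries, including `hypothetical_coauthor`, are illustrative and not drawn from the course datasets. Because an author may cover several topics, the sketch returns a set of topics per author; strictly, that makes the map land in subsets of D(subject_catalog).

```python
from typing import Dict, FrozenSet, Set, Tuple

# Hypothetical (author, topic) attributions for a single document.
ATTRIBUTIONS: Set[Tuple[str, str]] = {
    ("kell", "boundaries"),
    ("kell", "surveys"),
    ("hypothetical_coauthor", "SW-sector"),
}

# The two observation sets D(author_index) and D(subject_catalog).
authors = {author for author, _ in ATTRIBUTIONS}
topics = {topic for _, topic in ATTRIBUTIONS}

def restrict_to_author(author: str) -> FrozenSet[str]:
    """D(topic_to_author) sketch: author -> topics they covered in this document."""
    return frozenset(topic for a, topic in ATTRIBUTIONS if a == author)

# The induced map runs from D(author_index) back to (subsets of) D(subject_catalog).
pullback: Dict[str, FrozenSet[str]] = {a: restrict_to_author(a) for a in authors}
print(sorted(pullback["kell"]))  # ['boundaries', 'surveys']

# Every author's topics lie inside the document's topic set.
assert all(ts <= topics for ts in pullback.values())
```

The dictionary `pullback` is keyed by authors and valued in topics, so it points from the author side back to the subject side, opposite to `topic_to_author`.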

In [None]:
class DocumentPresheaf:
    """
    A document represented as a presheaf F: Archive^op → Set.
    
    For a morphism f: A → B in Archive,
    F(f): F(B) → F(A) pulls observations back.
    """
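    def __init__(self, doc_id, doc_df):
        self.doc_id = doc_id
        self.observations = {}
        
        for _, row in doc_df.iterrows():
            access_method = row['access_method']
            obs_value = row['observation_value']
            if pd.isna(obs_value):
                continue  # skip rows with no recorded observation
            # Split comma-separated values into individual observations
            obs_set = {s.strip() for s in str(obs_value).split(',')}
            # Union rather than overwrite if an access method appears in several rows
            self.observations.setdefault(access_method, set()).update(obs_set)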
    
    def on_object(self, access_method):
        """F(A): the set of observations for access method A."""
        return self.observations.get(access_method, set())
    
    def on_morphism(self, source, target):
        """
        F(f): F(target) → F(source) for morphism f: source → target.
        
        This is the pullback/restriction function.
        For a document, this "restricts" observations.
        """
        # This sketch only reports the domain and codomain of F(f);
        # a full implementation would also build the restriction function itself.
        source_obs = self.on_object(source)
        target_obs = self.on_object(target)
        
        return {
            'domain': target_obs,
            'codomain': source_obs,
            'direction': f'{target} → {source} (contravariant)'
        }

# Create a presheaf
doc_001_data = documents[documents['document_id'] == 'DOC-001']
P = DocumentPresheaf('DOC-001', doc_001_data)

print("Document as Presheaf:")
print(f"  P(subject_catalog) = {P.on_object('subject_catalog')}")
print(f"  P(author_index) = {P.on_object('author_index')}")
print()
print("On morphism topic_to_author: subject_catalog → author_index:")
result = P.on_morphism('subject_catalog', 'author_index')
print(f"  P(topic_to_author): {result['direction']}")
print(f"  Maps {result['domain']} to {result['codomain']}")

## Part 4: The Pullback Intuition

The contravariance of presheaves captures an important intuition:

**When we refine our query, observations become more specific.**

Consider:
- General query: "What topics are in this document?"
- Refined query: "What topics did Torvun Kell write about in this document?"

The morphism `topic_to_author` refines the query. The presheaf "pulls back" observations along this refinement.

In [None]:
# Visualize the pullback
fig, ax = plt.subplots(figsize=(12, 6))

# Archive category (top)
ax.text(0.1, 0.85, 'Archive Category', fontsize=12, fontweight='bold')
ax.scatter([0.15, 0.4], [0.7, 0.7], s=400, c='lightblue', edgecolor='navy', zorder=5)
ax.annotate('subject\ncatalog', (0.15, 0.7), ha='center', va='center', fontsize=8)
ax.annotate('author\nindex', (0.4, 0.7), ha='center', va='center', fontsize=8)
ax.annotate('', xy=(0.35, 0.7), xytext=(0.2, 0.7),
            arrowprops=dict(arrowstyle='->', color='blue', lw=2))
ax.annotate('f', (0.275, 0.75), ha='center', fontsize=10, color='blue')

# Presheaf arrow (contravariant)
ax.annotate('', xy=(0.5, 0.4), xytext=(0.5, 0.6),
            arrowprops=dict(arrowstyle='->', color='green', lw=3))
ax.annotate('P\n(presheaf)', (0.55, 0.5), ha='left', fontsize=10, color='green', fontweight='bold')

# Set category (bottom)
ax.text(0.1, 0.35, 'Set Category', fontsize=12, fontweight='bold')

# P(subject_catalog)
ax.add_patch(plt.Circle((0.15, 0.2), 0.1, fill=True, facecolor='lightyellow', edgecolor='orange', lw=2))
ax.annotate('P(subject)', (0.15, 0.08), ha='center', fontsize=8)
ax.text(0.15, 0.2, '{boundaries,\nsurveys,\nSW-sector}', ha='center', va='center', fontsize=6)

# P(author_index)
ax.add_patch(plt.Circle((0.4, 0.2), 0.06, fill=True, facecolor='lightyellow', edgecolor='orange', lw=2))
ax.annotate('P(author)', (0.4, 0.08), ha='center', fontsize=8)
ax.text(0.4, 0.2, '{kell}', ha='center', va='center', fontsize=7)

# P(f) goes BACKWARDS
ax.annotate('', xy=(0.2, 0.2), xytext=(0.32, 0.2),
            arrowprops=dict(arrowstyle='->', color='red', lw=2))
ax.annotate('P(f)', (0.26, 0.12), ha='center', fontsize=10, color='red')
ax.annotate('(contravariant!)', (0.26, 0.05), ha='center', fontsize=8, color='red', style='italic')

# Explanation
ax.text(0.6, 0.5, 'f: subject → author\n\nP(f): P(author) → P(subject)\n\nThe morphism direction reverses!\nThis is why P is a presheaf.',
        fontsize=10, verticalalignment='center', bbox=dict(boxstyle='round', facecolor='white', edgecolor='gray'))

ax.set_xlim(0, 1)
ax.set_ylim(0, 1)
ax.axis('off')
ax.set_title('Documents as Presheaves: Contravariant Functors', fontsize=14)

plt.tight_layout()
plt.show()

## Part 5: Why Contravariance?

Dray's insight was that archive operations naturally have this contravariant structure:

> "When we shift from a general access method to a more specific one, our observations become restricted. A document seen through the author lens reveals only what that author contributed. The restriction function goes backward — from the specific to the general."
> — Lorren Dray

In [None]:
# Compare covariant vs contravariant intuition
print("Covariant (Regular) Functor:")
print("  Morphism f: A → B induces F(f): F(A) → F(B)")
print("  Direction preserved")
print()
print("Contravariant (Presheaf) Functor:")
print("  Morphism f: A → B induces P(f): P(B) → P(A)")
print("  Direction reversed")
print()
print("For Archives:")
print("  topic_to_author: subject_catalog → author_index")
print("  This morphism 'refines' or 'specializes' the query.")
print("  The presheaf P 'restricts' observations back along this refinement.")
print()
print("  P(topic_to_author): P(author_index) → P(subject_catalog)")
print("  'Given what we know about the author, restrict to topics.'")
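A standard example outside the archive shows both behaviors side by side: for a function f: A → B, taking direct images sends subsets of A to subsets of B (covariant), while taking preimages sends subsets of B back to subsets of A (contravariant). The sketch below uses made-up toy functions `f` and `g` to check the composition law for each; it illustrates the general pattern rather than Dray's archive construction.

```python
from typing import Callable, Iterable, Set

def image(func: Callable, subset: Iterable) -> Set:
    """Covariant direction: a subset of the domain maps to a subset of the codomain."""
    return {func(x) for x in subset}

def preimage(func: Callable, domain: Iterable, subset: Set) -> Set:
    """Contravariant direction: a subset of the codomain pulls back to a subset of the domain."""
    return {x for x in domain if func(x) in subset}

# Toy functions (illustrative only): f: A -> B and g: B -> {"even", "odd"}
A = {0, 1, 2, 3}
B = {0, 1}

def f(x: int) -> int:
    return x % 2

def g(y: int) -> str:
    return "even" if y == 0 else "odd"

def g_after_f(x: int) -> str:
    return g(f(x))

# Covariant law: image along (g ∘ f) = image along g, after image along f
S = {1, 2, 3}
assert image(g_after_f, S) == image(g, image(f, S))

# Contravariant law: preimage along (g ∘ f) = preimage along f, after preimage along g
T = {"odd"}
assert preimage(g_after_f, A, T) == preimage(f, A, preimage(g, B, T))
print(preimage(g_after_f, A, T))  # {1, 3}
```

Note that `preimage` needs the domain passed in explicitly, because pulling back means searching the source side for elements that land in the given subset.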

## Summary

In this tutorial, we've learned:

1. **Opposite category** C^op has the same objects but reversed morphisms
2. **Presheaves** are functors from C^op to Set (contravariant functors)
3. **Documents are presheaves**: Morphisms in Archive induce pullback functions on observation sets
4. **Contravariance captures restriction**: Refining a query restricts observations

### Key Notation

A document D is a presheaf D: Archive^op → Set

For morphism f: A → B in Archive:
- D(f): D(B) → D(A) (note the reversal)

### Next Tutorial

In Tutorial 5, we'll connect functor values to numerical embeddings — each embedding dimension corresponds to a "probe" that extracts a numerical value from the document.

---

*Part of the [Category Theory & LLMs Series](https://github.com/buildLittleWorlds)*