# Tutorial 2: The Archive as Category

**Course 3: Document Functors (Lorren Dray)**

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/buildLittleWorlds/category-theory-document-functors/blob/main/notebooks/02_archive_as_category.ipynb)

---

## Overview

In Year 925, Dray made a crucial discovery: **the Archive itself has categorical structure**. Access methods are objects, and the flows between them are morphisms.

This tutorial develops the Archive as a category, preparing us to understand documents as functors.

### Learning Goals

1. Understand access methods as objects in a category
2. See document flows as morphisms between access methods
3. Verify composition and identity laws
4. Build intuition for the categorical structure underlying archives

---

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import networkx as nx

# Load datasets
BASE_URL = "https://raw.githubusercontent.com/buildLittleWorlds/densworld-datasets/main/data/"

archive_structure = pd.read_csv(BASE_URL + "archive_category_structure.csv")
print(f"Loaded {len(archive_structure)} archive structure elements")
archive_structure.head(10)

## Part 1: Objects — The Access Methods

In Dray's categorical view of the Archive, the **objects** are access methods — different ways of querying the archive.

> "The Archive is not a warehouse but a living category. Its objects are the methods by which we access knowledge."
> — Lorren Dray

In [None]:
# Extract objects (access methods)
objects = archive_structure[archive_structure['object_type'] == 'access_method']

print("Objects in the Archive Category (Access Methods):\n")
for _, obj in objects.iterrows():
    print(f"  {obj['object_name']}: {obj['notes']}")

In [None]:
# Visualize objects
fig, ax = plt.subplots(figsize=(10, 6))

# Create a simple layout for access methods
n_objects = len(objects)
angles = np.linspace(0, 2*np.pi, n_objects, endpoint=False)
radius = 2

x = radius * np.cos(angles)
y = radius * np.sin(angles)

# Plot objects as nodes
ax.scatter(x, y, s=500, c='lightblue', edgecolor='navy', linewidth=2, zorder=5)

# Add labels
for i, (_, obj) in enumerate(objects.iterrows()):
    name = obj['object_name'].replace('_', '\n')
    ax.annotate(name, (x[i], y[i]), ha='center', va='center', fontsize=8, fontweight='bold')

ax.set_xlim(-4, 4)
ax.set_ylim(-3, 3)
ax.set_aspect('equal')
ax.axis('off')
ax.set_title('Objects in the Archive Category\n(Access Methods)', fontsize=14)

plt.tight_layout()
plt.show()

## Part 2: Morphisms — Document Flows Between Methods

The **morphisms** in the Archive category are the flows between access methods. For example:

- `topic_to_author`: Given a topic, find who wrote about it
- `author_to_topic`: Given an author, find their topics
- `topic_to_date`: Given a topic, find when it was studied

These are natural operations archivists perform daily.
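
To make "a flow between access methods" concrete, here is a minimal sketch that models each morphism as a named arrow with a source and a target object. The `Morphism` dataclass and the `hom` helper are hypothetical, and the endpoints assumed for `author_to_topic` and `topic_to_date` (including the object name `date_registry`) are illustrative guesses; the dataset loaded above remains the authoritative list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Morphism:
    """A named arrow between two access methods (hypothetical helper)."""
    name: str
    source: str
    target: str


# Toy arrows. Only topic_to_author's endpoints are stated in this tutorial;
# the other two source/target pairs are assumptions for illustration.
arrows = [
    Morphism("topic_to_author", "subject_catalog", "author_index"),
    Morphism("author_to_topic", "author_index", "subject_catalog"),
    Morphism("topic_to_date", "subject_catalog", "date_registry"),
]


def hom(source, target, morphisms):
    """Return the names of all morphisms from `source` to `target`."""
    return [m.name for m in morphisms if m.source == source and m.target == target]


print(hom("subject_catalog", "author_index", arrows))
print(hom("author_index", "date_registry", arrows))
```

The second call returns an empty list: in this toy set there is no direct arrow from the author index to the date registry, so reaching dates from authors would require composing arrows.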

In [None]:
# Extract morphisms
morphisms = archive_structure[archive_structure['object_type'] == 'morphism']

print("Morphisms in the Archive Category:\n")
for _, mor in morphisms.iterrows():
    print(f"  {mor['morphism_name']}: {mor['morphism_source']} → {mor['morphism_target']}")
    print(f"    Description: {mor['notes']}")
    print()

In [None]:
# Create a directed graph of the archive category
G = nx.DiGraph()

# Add nodes (objects)
for _, obj in objects.iterrows():
    G.add_node(obj['object_name'])

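# Add edges (morphisms).
# Note: nx.DiGraph stores at most one edge per (source, target) pair, so if
# two morphisms share both endpoints, only the last name is kept.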
for _, mor in morphisms.iterrows():
    if pd.notna(mor['morphism_source']) and pd.notna(mor['morphism_target']):
        G.add_edge(
            mor['morphism_source'], 
            mor['morphism_target'],
            name=mor['morphism_name']
        )

# Draw the category
fig, ax = plt.subplots(figsize=(12, 8))

pos = nx.spring_layout(G, k=2, iterations=50, seed=42)

# Draw nodes
nx.draw_networkx_nodes(G, pos, node_size=2000, node_color='lightblue', 
                       edgecolors='navy', linewidths=2, ax=ax)

# Draw edges with arrows
nx.draw_networkx_edges(G, pos, edge_color='gray', arrows=True, 
                       arrowsize=20, arrowstyle='->', ax=ax,
                       connectionstyle='arc3,rad=0.1')

# Draw labels
labels = {node: node.replace('_', '\n') for node in G.nodes()}
nx.draw_networkx_labels(G, pos, labels, font_size=8, font_weight='bold', ax=ax)

# Draw edge labels
edge_labels = nx.get_edge_attributes(G, 'name')
nx.draw_networkx_edge_labels(G, pos, edge_labels, font_size=6, ax=ax)

ax.set_title('The Archive as a Category\nAccess Methods and Document Flows', fontsize=14)
ax.axis('off')
plt.tight_layout()
plt.show()

## Part 3: Identity Morphisms

Every object in a category has an **identity morphism** — a morphism from the object to itself that "does nothing."

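In the Archive category, the identity on the subject catalog is the flow that takes a subject and returns that same subject. Composing any flow with an identity leaves it unchanged: for f: A → B, we have f ∘ id_A = f and id_B ∘ f = f.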
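
As a rough illustration of what "does nothing" means, the sketch below treats a flow as an ordinary Python function and checks that composing it with an identity on either side leaves its results unchanged. The lookup table `AUTHORS_BY_TOPIC` and its entries are hypothetical toy data, not values from the dataset.

```python
def identity(x):
    """The 'do nothing' flow: return the query result unchanged."""
    return x


def compose(g, f):
    """Return the function g ∘ f (apply f first, then g)."""
    return lambda x: g(f(x))


# Hypothetical toy lookup standing in for topic_to_author
AUTHORS_BY_TOPIC = {
    "tides": frozenset({"author_a"}),
    "maps": frozenset({"author_a", "author_b"}),
}


def topic_to_author(topic):
    """Given a topic, return the set of authors who wrote about it."""
    return AUTHORS_BY_TOPIC[topic]


topics = list(AUTHORS_BY_TOPIC)

# f ∘ id = f  (identity applied before the flow)
right_unit = all(compose(topic_to_author, identity)(t) == topic_to_author(t) for t in topics)
# id ∘ f = f  (identity applied after the flow)
left_unit = all(compose(identity, topic_to_author)(t) == topic_to_author(t) for t in topics)

print(right_unit, left_unit)
```

Both checks pass for every topic in the toy table, which is the identity law restricted to this one flow.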

In [None]:
# Extract identity morphisms
identities = archive_structure[archive_structure['morphism_type'] == 'identity']

print("Identity Morphisms:\n")
for _, id_mor in identities.iterrows():
    print(f"  id_{id_mor['morphism_source']}: {id_mor['morphism_source']} → {id_mor['morphism_target']}")
    print(f"    '{id_mor['notes']}'")
    print()

## Part 4: Composition of Morphisms

In a category, morphisms can be **composed**: if we have f: A → B and g: B → C, then we can form g ∘ f: A → C.

For the Archive:
- `topic_to_author`: subject_catalog → author_index
- `author_to_method`: author_index → methodology_index
- Composition: subject_catalog → methodology_index (given a topic, find methods used)
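
The chain above can be sketched with small lookup tables. Because one topic can have several authors, each flow is modeled here as a dictionary from a key to a set of results, and composition collects everything reachable in two steps. The `compose` helper and all table entries are hypothetical toy data chosen for illustration.

```python
def compose(g, f):
    """Return g ∘ f: follow f first, then g, collecting every reachable result."""
    return {a: set().union(*(g.get(b, set()) for b in bs)) for a, bs in f.items()}


# f: subject_catalog -> author_index (toy data)
topic_to_author = {
    "tides": {"author_a"},
    "maps": {"author_a", "author_b"},
}

# g: author_index -> methodology_index (toy data)
author_to_method = {
    "author_a": {"field_survey"},
    "author_b": {"textual_analysis"},
}

# g ∘ f: subject_catalog -> methodology_index
topic_author_method = compose(author_to_method, topic_to_author)

print(sorted(topic_author_method["tides"]))
print(sorted(topic_author_method["maps"]))
```

Note the argument order: `compose(g, f)` mirrors the notation g ∘ f, so the flow applied first is written last.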

In [None]:
# Extract compositions
compositions = archive_structure[archive_structure['morphism_type'] == 'composite']

print("Composite Morphisms (Compositions):\n")
for _, comp in compositions.iterrows():
    print(f"  {comp['object_name']}: {comp['morphism_source']} → {comp['morphism_target']}")
    print(f"    Composition rule: {comp['composition_rule']}")
    print(f"    Description: {comp['notes']}")
    print()

In [None]:
# Visualize a specific composition
fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# First morphism: topic_to_author
ax1 = axes[0]
ax1.annotate('', xy=(0.8, 0.5), xytext=(0.2, 0.5),
            arrowprops=dict(arrowstyle='->', color='blue', lw=2))
ax1.scatter([0.2, 0.8], [0.5, 0.5], s=500, c='lightblue', edgecolor='navy', zorder=5)
ax1.annotate('subject\ncatalog', (0.2, 0.5), ha='center', va='center', fontsize=9)
ax1.annotate('author\nindex', (0.8, 0.5), ha='center', va='center', fontsize=9)
ax1.annotate('topic_to_author', (0.5, 0.65), ha='center', fontsize=8, color='blue')
ax1.set_xlim(0, 1)
ax1.set_ylim(0, 1)
ax1.axis('off')
ax1.set_title('Morphism f', fontsize=12)

# Second morphism: author_to_method
ax2 = axes[1]
ax2.annotate('', xy=(0.8, 0.5), xytext=(0.2, 0.5),
            arrowprops=dict(arrowstyle='->', color='green', lw=2))
ax2.scatter([0.2, 0.8], [0.5, 0.5], s=500, c='lightblue', edgecolor='navy', zorder=5)
ax2.annotate('author\nindex', (0.2, 0.5), ha='center', va='center', fontsize=9)
ax2.annotate('methodology\nindex', (0.8, 0.5), ha='center', va='center', fontsize=9)
ax2.annotate('author_to_method', (0.5, 0.65), ha='center', fontsize=8, color='green')
ax2.set_xlim(0, 1)
ax2.set_ylim(0, 1)
ax2.axis('off')
ax2.set_title('Morphism g', fontsize=12)

# Composition: g ∘ f
ax3 = axes[2]
ax3.annotate('', xy=(0.8, 0.5), xytext=(0.2, 0.5),
            arrowprops=dict(arrowstyle='->', color='purple', lw=2))
ax3.scatter([0.2, 0.8], [0.5, 0.5], s=500, c='lightblue', edgecolor='navy', zorder=5)
ax3.annotate('subject\ncatalog', (0.2, 0.5), ha='center', va='center', fontsize=9)
ax3.annotate('methodology\nindex', (0.8, 0.5), ha='center', va='center', fontsize=9)
ax3.annotate('g ∘ f = topic_author_method', (0.5, 0.65), ha='center', fontsize=8, color='purple')
ax3.set_xlim(0, 1)
ax3.set_ylim(0, 1)
ax3.axis('off')
ax3.set_title('Composition g ∘ f', fontsize=12)

plt.suptitle('Composition of Morphisms in the Archive Category', fontsize=14)
plt.tight_layout()
plt.show()

## Part 5: Associativity

Composition in a category must be **associative**: (h ∘ g) ∘ f = h ∘ (g ∘ f).

For the Archive, this means that chaining three document flows gives the same result regardless of how we group them.
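
A small check makes the grouping claim tangible. In this sketch each flow is a set of (input, output) pairs and composition matches on the shared middle value. The third flow `h` (from methods to eras) and every pair in the tables are hypothetical; they are not taken from the dataset.

```python
def compose(g, f):
    """Relational composition g ∘ f on sets of (input, output) pairs."""
    return {(a, c) for (a, b) in f for (b2, c) in g if b == b2}


# f: topics -> authors, g: authors -> methods, h: methods -> eras (all toy data)
f = {("tides", "author_a"), ("maps", "author_a"), ("maps", "author_b")}
g = {("author_a", "field_survey"), ("author_b", "textual_analysis")}
h = {("field_survey", "era_1"), ("textual_analysis", "era_2")}

left = compose(compose(h, g), f)   # (h ∘ g) ∘ f
right = compose(h, compose(g, f))  # h ∘ (g ∘ f)

print(left == right)
print(sorted(left))
```

Passing on one example is a sanity check rather than a proof, but relational composition is associative in general, so the two groupings agree for any choice of tables.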

In [None]:
# Show associativity verification
assoc = archive_structure[archive_structure['morphism_type'] == 'associative']

print("Associativity Verification:\n")
for _, a in assoc.iterrows():
    print(f"  {a['object_name']}: {a['morphism_source']} → {a['morphism_target']}")
    print(f"    Associativity rule: {a['composition_rule']}")
    print(f"    Description: {a['notes']}")

## Summary

In this tutorial, we've seen the Archive as a category:

1. **Objects**: Access methods (subject catalog, author index, date registry, etc.)
2. **Morphisms**: Document flows between access methods
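3. **Identity**: Each access method has an identity morphism that returns its input unchanged
4. **Composition**: Flows can be chained when the target of one is the source of the next
5. **Associativity**: The result of chaining does not depend on how the flows are grouped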

### Key Quote

> "The Archive itself has categorical structure — access methods are objects, document flows are morphisms."
> — Lorren Dray, *On the Categorical Structure of Archives* (Year 928)

### Next Tutorial

In Tutorial 3, we'll formally define functors from a category to the category of sets, preparing us to understand documents as presheaves.

---

*Part of the [Category Theory & LLMs Series](https://github.com/buildLittleWorlds)*