# Densworld Event Explorer

**A demonstration of Hugging Face pipelines applied to rich narrative data**

This notebook shows how to:
1. Load structured event data (JSONL format)
2. Apply **zero-shot classification** to categorize events
3. Use **question answering** to explore event relationships

---

## About Densworld

Densworld is a fictional universe with 1,189 logged events spanning centuries. Events include:
- Expeditions and explorations
- Boundary anomalies and spatial phenomena
- Scholarly theories and manuscripts
- Political changes and institutional responses

The Living Ledger tracks causal chains: events trigger consequences that become new events.

---

## Why This Demo?

This notebook demonstrates Level 2 concepts with **real data**:
- Zero-shot classification (no training required)
- Question-answering pipelines
- Working with structured narrative data

You can adapt this approach for your own datasets!

## Setup

First, let's install the required libraries and upload the event data.

In [None]:
# Install dependencies if not already present (Colab ships with most of them)
!pip install -q transformers torch pandas matplotlib

In [None]:
import json
import pandas as pd
from collections import Counter

# For Colab: upload files
try:
    from google.colab import files
    IN_COLAB = True
except ImportError:
    IN_COLAB = False

print(f"Running in Colab: {IN_COLAB}")

In [None]:
# Upload EVENT_LOG.jsonl (in Colab)
# If running locally, place the file in the same directory

if IN_COLAB:
    print("Please upload EVENT_LOG.jsonl when prompted...")
    uploaded = files.upload()
    filename = list(uploaded.keys())[0]
else:
    filename = "EVENT_LOG.jsonl"

print(f"Using file: {filename}")

## Loading the Event Data

The Living Ledger is stored in JSONL format - one JSON object per line. Each event has:
- `event_id`: Unique identifier (e.g., "EV-847-001")
- `type`: Event category (e.g., "boundary_anomaly", "theory_proposed")
- `date`: When it happened in Densworld time
- `location`: Where it happened
- `actors`: Characters involved
- `notes`: Narrative description
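
A minimal sketch of reading this format defensively, assuming the six fields listed above are the ones you expect in each record. The `load_jsonl` and `missing_fields` helpers and the sample records are hypothetical and only illustrate the shape of the data; malformed lines are counted and skipped instead of stopping the load.

```python
import io
import json

# Fields described above; treating all of them as expected is an assumption
EXPECTED_FIELDS = {"event_id", "type", "date", "location", "actors", "notes"}

def load_jsonl(lines):
    """Parse an iterable of JSONL lines; return (events, skipped_count)."""
    events, skipped = [], 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # blank lines are not records
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            skipped += 1  # malformed JSON: count it and move on
            continue
        if not isinstance(record, dict):
            skipped += 1  # valid JSON, but not an object
            continue
        events.append(record)
    return events, skipped

def missing_fields(record):
    """Return the expected fields a record lacks, sorted for stable output."""
    return sorted(EXPECTED_FIELDS - set(record))

# Hypothetical sample data, not taken from EVENT_LOG.jsonl
sample = io.StringIO(
    '{"event_id": "EV-000-001", "type": "theory_proposed", "notes": "Example note."}\n'
    '\n'
    'this line is not JSON\n'
)
sample_events, sample_skipped = load_jsonl(sample)
print(len(sample_events), sample_skipped)
print(missing_fields(sample_events[0]))
```

An open file handle can be passed to `load_jsonl` in place of the `StringIO` object, since both yield one line at a time.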

In [None]:
# Load all events from JSONL (one JSON object per non-empty line)
events = []
with open(filename, 'r', encoding='utf-8') as f:
    for line in f:
        if line.strip():
            events.append(json.loads(line))

print(f"Loaded {len(events)} events")
print("\nFirst event:")
print(json.dumps(events[0], indent=2))

In [None]:
# Convert to DataFrame for easier analysis
df = pd.DataFrame(events)

# Show basic statistics
print("Event Types:")
print(df['type'].value_counts().head(15))
print(f"\nTotal unique event types: {df['type'].nunique()}")

In [None]:
# Look at some interesting events with notes
events_with_notes = df[df['notes'].notna()].copy()
print(f"Events with narrative notes: {len(events_with_notes)}")
print("\nSample event notes:")
for _, row in events_with_notes.head(5).iterrows():
    note = str(row['notes'])
    # Truncate long notes to 200 characters for display
    preview = note[:200] + "..." if len(note) > 200 else note
    print(f"\n[{row['event_id']}] {row['type']}")
    print(f"  {preview}")

---

## Zero-Shot Classification

**Zero-shot classification** lets us categorize text without training a custom model. We provide:
- Text to classify
- A list of possible labels

The model determines which label best fits the text.
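
For a single text, the pipeline returns a dictionary holding the input `sequence`, the candidate `labels` sorted from most to least likely, and the matching `scores`. The sketch below shows one way to read such a result, falling back to an "uncertain" tag when the top score is low. The `top_label` helper, the 0.5 threshold, and the hand-written result dictionary are illustrative assumptions, not actual model output.

```python
def top_label(result, min_score=0.5, fallback="uncertain"):
    """Return (label, score) for the best label, or (fallback, score) if the score is below min_score."""
    label, score = result["labels"][0], result["scores"][0]
    if score < min_score:
        return fallback, score
    return label, score

# Hand-written stand-in with the same shape as a zero-shot pipeline result
example_result = {
    "sequence": "The survey team reported the moat wall shifting overnight.",
    "labels": ["urgent situation", "notable discovery", "routine observation"],
    "scores": [0.62, 0.27, 0.11],
}

print(top_label(example_result))                 # top score clears the default threshold
print(top_label(example_result, min_score=0.8))  # top score is below 0.8, so the fallback is used
```

When labels are not mutually exclusive, the zero-shot pipeline also accepts `multi_label=True`, which scores each label independently instead of making them compete.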

### Use Case: Urgency Classification

Let's classify events by **urgency level** - something not in the original data!

In [None]:
from transformers import pipeline

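# Load zero-shot classification pipeline
# facebook/bart-large-mnli is the library default; naming it explicitly
# keeps results reproducible if the default ever changes
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
print("Zero-shot classifier loaded!")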

In [None]:
# Define urgency labels
urgency_labels = [
    "routine observation",
    "notable discovery",
    "urgent situation",
    "crisis or emergency"
]

# Test on a few events
sample_events = events_with_notes.head(5)

print("Classifying events by urgency...\n")
for _, row in sample_events.iterrows():
    text = row['notes']
    result = classifier(text, urgency_labels)
    
    print(f"[{row['event_id']}] {row['type']}")
    print(f"  Text: {text[:100]}...")
    print(f"  Urgency: {result['labels'][0]} ({result['scores'][0]:.2%})")
    print()

In [None]:
# Let's also classify by THEME
theme_labels = [
    "scientific investigation",
    "political or institutional",
    "personal or character-driven",
    "mysterious or unexplained",
    "conflict or danger"
]

print("Classifying events by theme...\n")
for _, row in sample_events.iterrows():
    text = row['notes']
    result = classifier(text, theme_labels)
    
    print(f"[{row['event_id']}] {row['type']}")
    print(f"  Theme: {result['labels'][0]} ({result['scores'][0]:.2%})")
    print(f"  Runner-up: {result['labels'][1]} ({result['scores'][1]:.2%})")
    print()

### Exercise: Create Your Own Categories

Try classifying events with your own labels! Ideas:
- `["success", "failure", "ambiguous outcome"]`
- `["individual action", "group action", "natural phenomenon"]`
- `["reversible", "permanent change"]`

In [None]:
# Your turn! Define custom labels and classify events
my_labels = ["your", "labels", "here"]

# Pick an event
test_event = events_with_notes.iloc[10]
print(f"Event: {test_event['notes']}")

# Classify it
# result = classifier(test_event['notes'], my_labels)
# print(f"Classification: {result['labels'][0]}")

---

## Question Answering

**Extractive QA** finds answers within a given context. We provide:
- A **question**
- A **context** (text that contains the answer)

The model extracts the answer from the context.
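
Because the answer is a span copied out of the context, it is often useful to know which event it came from. One approach, sketched here under the assumption that each event's `notes` works as a context on its own, is to ask the question against each note separately and keep the highest-scoring answer. `best_answer` and `fake_qa` are hypothetical names; `fake_qa` is a stub that stands in for a real question-answering pipeline so the example runs without downloading a model.

```python
def best_answer(qa, question, notes_by_id):
    """Ask `question` against each note separately; return the top-scoring answer with its source event."""
    best = None
    for event_id, note in notes_by_id.items():
        result = qa(question=question, context=note)
        if best is None or result["score"] > best["score"]:
            best = {"event_id": event_id, "answer": result["answer"], "score": result["score"]}
    return best

def fake_qa(question, context):
    """Stub with the same call shape and result keys (answer, score) as a QA pipeline."""
    q_words = {w.strip("?.,").lower() for w in question.split()}
    c_words = {w.strip("?.,").lower() for w in context.split()}
    # Score = share of question words found in the context; answer = first word of the context
    return {"answer": context.split()[0], "score": len(q_words & c_words) / len(q_words)}

# Hypothetical notes keyed by event ID
notes = {
    "EV-000-001": "Surveyors measured the wall twice.",
    "EV-000-002": "Scholars debated the measurements for a full season before publishing.",
}
print(best_answer(fake_qa, "Who measured the wall?", notes))
```

With a loaded pipeline, the pipeline object can be passed as `qa`, since it accepts the same `question=` and `context=` keyword arguments.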

### Use Case: Ask Questions About Events

In [None]:
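# Load QA pipeline
# distilbert-base-cased-distilled-squad is the library default; naming it
# explicitly keeps results reproducible if the default ever changes
qa_pipeline = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")
print("QA pipeline loaded!")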

In [None]:
# Create a context from multiple events about the SW collapse
# \bSW\b matches "SW" as a standalone token (e.g. "SW-6"), not every word containing "sw"
sw_events = df[df['notes'].str.contains(r'\bSW\b|southwest|collapse', case=False, na=False)]
context = " ".join(sw_events['notes'].dropna().tolist()[:10])

print(f"Context length: {len(context)} characters")
print(f"\nContext preview: {context[:500]}...")

In [None]:
# Ask questions about the events
questions = [
    "What was the breathing phenomenon?",
    "Who predicted the collapse?",
    "What happened to the moat?",
    "Who was the chief surveyor?"
]

print("Asking questions about SW collapse events...\n")
for q in questions:
    result = qa_pipeline(question=q, context=context)
    print(f"Q: {q}")
    print(f"A: {result['answer']} (confidence: {result['score']:.2%})")
    print()

In [None]:
# Let's try with a different context - scholarly theories
theory_events = df[df['type'] == 'theory_proposed']
theory_context = " ".join(theory_events['notes'].dropna().tolist()[:10])

theory_questions = [
    "What is the Library hypothesis?",
    "What did Keth propose?",
    "What causes landmarks to appear in different positions?"
]

print("Asking questions about theories...\n")
for q in theory_questions:
    result = qa_pipeline(question=q, context=theory_context)
    print(f"Q: {q}")
    print(f"A: {result['answer']} (confidence: {result['score']:.2%})")
    print()

### Exercise: Ask Your Own Questions

Create a context from events and ask questions about it!

In [None]:
# Your turn! Create a context and ask questions

# Filter events (example: events about disappearances)
# my_events = df[df['type'] == 'disappearance']
# my_context = " ".join(my_events['notes'].dropna().tolist())

# Ask a question
# result = qa_pipeline(question="Who disappeared?", context=my_context)
# print(result['answer'])

---

## Combining Both: Event Analysis Tool

Let's build a simple tool that:
1. Takes an event ID
2. Shows the event details
3. Classifies its urgency and theme
4. Lists related events (same type or location)

In [None]:
def analyze_event(event_id):
    """Analyze a Densworld event using ML pipelines."""
    
    # Find the event
    event = df[df['event_id'] == event_id]
    if len(event) == 0:
        print(f"Event {event_id} not found!")
        return
    
    event = event.iloc[0]
    
    print("=" * 60)
    print(f"EVENT ANALYSIS: {event_id}")
    print("=" * 60)
    
    # Basic info
    print(f"\nType: {event['type']}")
    print(f"Date: {event['date']}")
    print(f"Location: {event['location']}")
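    # Missing actors show up as NaN, which is truthy, so check the type explicitly
    actors = event.get('actors')
    if isinstance(actors, (list, tuple)) and actors:
        print(f"Actors: {', '.join(map(str, actors))}")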
    
    # Notes
    if pd.notna(event.get('notes')):
        print(f"\nNotes: {event['notes']}")
        
        # Classify urgency
        urgency = classifier(event['notes'], urgency_labels)
        print(f"\nUrgency: {urgency['labels'][0]} ({urgency['scores'][0]:.2%})")
        
        # Classify theme
        theme = classifier(event['notes'], theme_labels)
        print(f"Theme: {theme['labels'][0]} ({theme['scores'][0]:.2%})")
    
    # Find related events (same type or location)
    related = df[
        ((df['type'] == event['type']) | (df['location'] == event['location'])) &
        (df['event_id'] != event_id)
    ].head(3)
    
    if len(related) > 0:
        print(f"\nRelated Events:")
        for _, r in related.iterrows():
            print(f"  - {r['event_id']}: {r['type']} at {r['location']}")
    
    print("=" * 60)

In [None]:
# Try the analysis tool!
analyze_event("EV-847-002")  # The SW-6 breathing phenomenon

In [None]:
# Try another event
analyze_event("EV-895-001")  # The Library hypothesis

---

## Summary

In this notebook, you learned how to:

1. **Load JSONL data** - A common format for structured text data
2. **Zero-shot classification** - Categorize text without training
3. **Question answering** - Extract answers from context
4. **Combine pipelines** - Build analysis tools

### Key Takeaways

- **Zero-shot classification** is powerful for adding new metadata to existing data
- **QA pipelines** work best when you provide focused, relevant context
- **Hugging Face pipelines** make it easy to experiment quickly

### Next Steps

1. Try classifying ALL events and visualize the distribution
2. Build a character tracker that follows one actor across events
3. Create a timeline visualization of classified events
4. Experiment with different classification labels
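
The first two steps can be sketched with plain pandas. This is one possible shape, not a definitive implementation: `classify_all`, `track_actor`, and `keyword_classifier` are hypothetical helpers, and `keyword_classifier` is a toy stand-in that returns only the `labels` key so the example runs without a model. The sketch assumes `actors` holds a list of names when present, and the demo rows are invented.

```python
import pandas as pd

def classify_all(frame, labels, classify, text_col="notes"):
    """Return a copy of the rows that have text, with a 'label' column holding the top label."""
    out = frame[frame[text_col].notna()].copy()
    out["label"] = [classify(text, labels)["labels"][0] for text in out[text_col]]
    return out

def track_actor(frame, name):
    """Return the events whose 'actors' list contains `name`, in log order."""
    mask = frame["actors"].apply(lambda a: isinstance(a, list) and name in a)
    return frame[mask]

def keyword_classifier(text, labels):
    """Toy stand-in: ranks first any label that shares a word with the text."""
    words = set(text.lower().strip(".").split())
    ranked = sorted(labels, key=lambda lab: not (words & set(lab.split())))
    return {"labels": ranked}

# Invented demo rows, not taken from EVENT_LOG.jsonl
demo = pd.DataFrame([
    {"event_id": "EV-000-001", "actors": ["Keth"], "notes": "Keth filed a routine survey."},
    {"event_id": "EV-000-002", "actors": None, "notes": "A crisis unfolded at the moat."},
    {"event_id": "EV-000-003", "actors": ["Keth", "Another"], "notes": None},
])
labels = ["routine observation", "crisis or emergency"]

classified = classify_all(demo, labels, keyword_classifier)
print(classified["label"].value_counts())            # distribution of top labels
print(track_actor(demo, "Keth")["event_id"].tolist())  # events involving one actor
```

To use the real model, pass the zero-shot pipeline in place of `keyword_classifier`; running a large model over every event may take a while on CPU.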

In [None]:
# Bonus: Show distribution of event types
import matplotlib.pyplot as plt

type_counts = df['type'].value_counts().head(10)
plt.figure(figsize=(10, 6))
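# Sort ascending so the most frequent type ends up at the top of the horizontal bar chart
type_counts.sort_values().plot(kind='barh')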
plt.xlabel('Count')
plt.ylabel('Event Type')
plt.title('Top 10 Event Types in Densworld')
plt.tight_layout()
plt.show()