# IWTC Raw Source Indexing


## Purpose and scope

This notebook develops tooling for indexing **raw source materials for a campaign world** using an evidence-first model. It generates **draft indexes** that segment and label unstructured source material, including raw session notes, auto-generated session transcripts, play-by-post transcripts, planning notes, and recollections, to support later human refinement and editorial use.

The notebook is parameterized by campaign world via explicit configuration supplied as an input. World-specific vocabulary, locations, arc cues, and related indexing semantics are defined in configuration files owned by the world repository (e.g., under `_meta/`), rather than being hard-coded into the tooling.

The outputs produced by this notebook are work-in-progress draft indexes. They are reproducible and are written to locations within the world file structure that are excluded from version control. These draft indexes are intended to be reviewed and curated by a human; the resulting curated indexes are committed to the world repository and used by downstream tooling and agents.

This notebook does not modify any curated indexes or canonical materials tracked in version control. It has no authority to publish, interpret, or author in-world material. Decisions about inclusion, interpretation, and narrative significance remain the responsibility of human editors.


## Inputs and assumptions

### Inputs

This notebook operates on raw source materials for a campaign world. Expected inputs include, but are not limited to:

- Raw session notes (e.g., longform narrative notes)
- Auto-generated session transcripts
- Play-by-post (PbP) transcripts
- Planning notes
- Recollections and retrospective summaries

Inputs may originate in different formats and levels of structure. Initial development assumes common text-based formats (e.g., Markdown, DOCX, or extracted plain text), with format-specific handling treated as an implementation detail rather than a conceptual constraint.
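
One way to keep format handling an implementation detail is to hide it behind a single reader function that returns plain text. The sketch below is illustrative only: `read_source_text` and `TEXT_SUFFIXES` are hypothetical names, and only plain-text suffixes are handled. DOCX and other formats would need their own readers, which are deliberately left out here.

```python
from pathlib import Path

# Hypothetical set of suffixes that can be read directly as UTF-8 text.
TEXT_SUFFIXES = {".md", ".txt"}


def read_source_text(path: Path) -> str:
    """Return the text of a raw source file, or raise if the format is unknown."""
    suffix = path.suffix.lower()
    if suffix in TEXT_SUFFIXES:
        return path.read_text(encoding="utf-8")
    # Unsupported formats fail loudly instead of being silently skipped,
    # so a human can decide whether to convert or exclude the source.
    raise ValueError(f"no reader registered for {suffix!r} files")
```

Downstream indexing code would then depend only on the returned text, not on the original file format.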

World-specific configuration is provided as an explicit input to the indexing process. Configuration files are owned by the world repository (e.g., under `_meta/`) and define vocabulary, locations, arc cues, and other indexing semantics required to interpret the raw source materials in a world-aware manner.
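
As a minimal sketch of what "explicit input" could look like in code, the example below parses a configuration document into an immutable object that is passed to the indexing process. Everything here is an assumption made for illustration: the JSON format, the `WorldConfig` class, the `parse_world_config` function, and the field names (`world_id`, `vocabulary`, `locations`, `arc_cues`) are hypothetical and are not defined by the world repository.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class WorldConfig:
    """Hypothetical container for world-specific indexing semantics."""

    world_id: str
    vocabulary: tuple[str, ...] = ()
    locations: tuple[str, ...] = ()
    arc_cues: tuple[str, ...] = ()


def parse_world_config(text: str) -> WorldConfig:
    """Parse a JSON document into a WorldConfig (the JSON format is an assumption)."""
    data = json.loads(text)
    if "world_id" not in data:
        raise ValueError("world configuration must include a world_id")
    # Missing sections default to empty, so a sparse configuration stays valid.
    return WorldConfig(
        world_id=data["world_id"],
        vocabulary=tuple(data.get("vocabulary", [])),
        locations=tuple(data.get("locations", [])),
        arc_cues=tuple(data.get("arc_cues", [])),
    )
```

Keeping the parsed configuration frozen means the tooling cannot quietly alter world semantics during a run; any change has to come from the configuration files themselves.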

### Assumptions

- Raw source materials are treated as authoritative records of play or planning, but may be incomplete, inconsistent, or internally contradictory.
- Chronology, session boundaries, and narrative continuity may be implicit rather than explicitly marked in the source material.
- Indexing decisions are evidence-based and conservative; the notebook does not attempt to resolve ambiguities or impose narrative interpretation.
- The generated draft indexes are expected to be reviewed, corrected, and refined by a human before any downstream use.
- The notebook does not assume that all raw sources are suitable for indexing; exclusion or partial indexing of inputs is an acceptable outcome.


## Indexing outputs

This notebook produces **draft indexes** that describe the structure and contents of raw source materials in a machine-readable, evidence-linked form. Draft indexes are intended to support human refinement and downstream tooling; they are not themselves canonical references.

At a minimum, draft indexes are expected to capture:

- Identification of the indexed source material
- Segmentation of the source into indexable units
- Labels or classifications applied to each unit
- Explicit evidence references (e.g., line ranges, offsets, or excerpts) supporting each segmentation and label
- Confidence indicators and/or flags where ambiguity or uncertainty exists
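
The minimum contents listed above could be modeled roughly as follows. This is a sketch under stated assumptions, not the notebook's actual schema: the class names (`Evidence`, `IndexUnit`, `DraftIndex`), the line-range form of evidence, and the 0.0 to 1.0 confidence scale are all hypothetical choices.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Evidence:
    """A reference into the raw source: 1-indexed, inclusive line range."""

    start_line: int
    end_line: int
    excerpt: str = ""

    def __post_init__(self) -> None:
        if self.start_line < 1 or self.end_line < self.start_line:
            raise ValueError("evidence line range is invalid")


@dataclass
class IndexUnit:
    """One indexable segment with its labels and supporting evidence."""

    unit_id: str
    labels: list[str]
    evidence: list[Evidence]
    confidence: float = 0.0  # assumed scale: 0.0 (no support) to 1.0
    flags: list[str] = field(default_factory=list)


@dataclass
class DraftIndex:
    """Draft index for a single identified source."""

    source_id: str
    units: list[IndexUnit] = field(default_factory=list)

    def unsupported_units(self) -> list[str]:
        """Return ids of units that carry no evidence reference."""
        return [u.unit_id for u in self.units if not u.evidence]
```

A check such as `unsupported_units()` gives reviewers a quick list of segments whose labels cannot yet be traced back to the source.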

Draft indexes may evolve in structure over time as heuristics and requirements mature. However, all draft indexes must remain traceable to the underlying raw sources and reproducible from those sources given the same configuration inputs.
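
One possible way to make reproducibility checkable is to record a fingerprint of the exact inputs alongside each draft index. The sketch below assumes the configuration can be represented as a JSON-serializable dictionary; `run_fingerprint` is a hypothetical helper, not part of the existing tooling.

```python
import hashlib
import json


def run_fingerprint(source_text: str, config: dict) -> str:
    """Return a SHA-256 hex digest over the source text and configuration."""
    # Sorting keys makes the digest independent of dictionary ordering.
    canonical_config = json.dumps(config, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256()
    digest.update(source_text.encode("utf-8"))
    digest.update(b"\x00")  # separator so the two inputs cannot run together
    digest.update(canonical_config.encode("utf-8"))
    return digest.hexdigest()
```

If a stored fingerprint no longer matches the one computed from the current source and configuration, the draft index is stale and should be regenerated before review.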

The specific on-disk representation of draft indexes (e.g., JSON, YAML, or other structured formats) is an implementation detail and may change as the tooling evolves.
