# IWTC Raw Source Indexing

This notebook executes the raw source indexing workflow defined in:

- `docs/raw_source_indexing_design.md`

It is intended for hands-on execution and experimentation. Conceptual scope, responsibilities, and workflow design are defined in the linked design document.


## Parameters

This notebook operates on a single world repository.

You must provide the absolute path to a `world_repository.yml` descriptor file.
All paths declared in that file are resolved relative to the `world_root`
defined within it.

A minimal example of `world_repository.yml` is provided in this repository
under:

- `data/config_examples/world_repository.yml`

You may copy and adapt that example for your own world repository.

Source selection (which files to index) is provided separately and is not
encoded in the descriptor.


In [1]:
# Path to the world_repository.yml file.
# This must be an absolute path.
WORLD_REPOSITORY_DESCRIPTOR = (
    "/Users/charissophia/obsidian/Iron Wolf Trading Company/_meta/descriptors/world_repository.yml"
)

# Optional: explicit source file specification.
# If None, the notebook will list candidate files under sources.read_paths
# and require human selection before proceeding.
SOURCE_FILE_SPEC = None
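
When `SOURCE_FILE_SPEC` is `None`, the comment above says candidate files are listed under `sources.read_paths` for human selection. A minimal sketch of that listing step follows. The helper name `list_candidate_files` is hypothetical, and it assumes each `read_paths` entry is a directory given relative to `world_root`; the real descriptor may declare these differently.

```python
from pathlib import Path


def list_candidate_files(world_root, read_paths):
    """Return a sorted list of regular files found under each read path.

    Hypothetical helper: assumes every entry in read_paths is a directory
    path relative to world_root.
    """
    root = Path(world_root)
    candidates = []
    for rel in read_paths:
        base = root / rel
        if not base.is_dir():
            # Skip entries that are missing or not directories instead of
            # failing the whole listing.
            continue
        # rglob("*") walks the directory tree; keep regular files only.
        candidates.extend(p for p in base.rglob("*") if p.is_file())
    return sorted(candidates)
```

Printing the result with `enumerate(...)` gives a numbered list that a person can pick from before the notebook proceeds.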


## Load world repository descriptor

This section loads the world repository descriptor (`world_repository.yml`) and
normalizes the paths it contains so downstream steps can rely on a consistent,
absolute-path view of the world.
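
As an illustration of the normalization described here, the sketch below resolves `sources.read_paths` against `world_root`. It is not the notebook's implementation. The function name `resolve_read_paths` is hypothetical, and it assumes that `world_root` is an absolute path and that `sources.read_paths` is a list of path strings.

```python
from pathlib import Path


def resolve_read_paths(world_repo):
    """Return absolute paths for sources.read_paths, resolved against world_root.

    Hypothetical helper: assumes world_root is absolute and read_paths is a
    list of path strings.
    """
    world_root = Path(world_repo["world_root"]).expanduser()
    if not world_root.is_absolute():
        raise ValueError(f"world_root must be an absolute path: {world_root}")

    # A key with an empty value loads as None, so fall back to an empty mapping.
    sources = world_repo.get("sources") or {}
    read_paths = sources.get("read_paths") or []

    # Joining onto world_root anchors relative entries there; pathlib leaves
    # an entry that is already absolute unchanged.
    return [world_root / Path(entry) for entry in read_paths]
```

Resolving once, up front, means later cells can work with `Path` objects directly and never need to consult `world_root` again.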

In [6]:
# Import libraries required for descriptor loading and path handling
from pathlib import Path
import yaml

In [15]:
# Load world_repository.yml from disk into a Python mapping (dict)
descriptor_path = Path(WORLD_REPOSITORY_DESCRIPTOR)

if not descriptor_path.exists():
    raise FileNotFoundError(f"Descriptor file not found: {descriptor_path}")

if not descriptor_path.is_file():
    raise FileNotFoundError(f"Descriptor path is not a file: {descriptor_path}")

with descriptor_path.open("r", encoding="utf-8") as f:
    world_repo = yaml.safe_load(f)

if world_repo is None:
    raise ValueError(f"Descriptor file is empty or contains only comments: {descriptor_path}")

if not isinstance(world_repo, dict):
    raise ValueError(
        "world_repository.yml must be a YAML mapping at the top level (key/value pairs). "
        f"Got: {type(world_repo).__name__}"
    )

print(f"Loaded: {descriptor_path}")
print(f"Top-level keys: {list(world_repo.keys())}")


Loaded: /Users/charissophia/obsidian/Iron Wolf Trading Company/_meta/descriptors/world_repository.yml
Top-level keys: ['world_root', 'sources', 'working_drafts']


In [16]:
# Validate that the descriptor's key values are usable
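
One possible shape for this validation step is sketched below, under stated assumptions. The helper name `find_descriptor_problems` is hypothetical. It checks only `world_root` and `sources.read_paths`, the two keys whose meaning is described in this notebook. The structure of `working_drafts` is not shown here, so it is left unchecked.

```python
from pathlib import Path


def find_descriptor_problems(world_repo):
    """Return human-readable problems; an empty list means the values look usable.

    Hypothetical helper: the checks reflect assumptions about the descriptor
    layout, not a confirmed schema.
    """
    problems = []

    world_root = world_repo.get("world_root")
    if not isinstance(world_root, str) or not world_root.strip():
        problems.append("world_root must be a non-empty string")
    elif not Path(world_root).is_dir():
        problems.append(f"world_root is not an existing directory: {world_root}")

    sources = world_repo.get("sources")
    if not isinstance(sources, dict):
        problems.append("sources must be a mapping")
    else:
        read_paths = sources.get("read_paths")
        if not isinstance(read_paths, list) or not read_paths:
            problems.append("sources.read_paths must be a non-empty list")
        elif not all(isinstance(p, str) and p for p in read_paths):
            problems.append("sources.read_paths entries must be non-empty strings")

    return problems
```

Collecting every problem before reporting lets the descriptor be fixed in one pass rather than one error at a time.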

## Discover sources

## Select inputs

## Normalize inputs

## Vocabulary proposal (optional)

## Generate draft index

## Emit outputs