diff --git a/mcp_plex/loader/AGENTS.md b/mcp_plex/loader/AGENTS.md
index 4729447..116646d 100644
--- a/mcp_plex/loader/AGENTS.md
+++ b/mcp_plex/loader/AGENTS.md
@@ -11,4 +11,19 @@
 - Qdrant upserts are batched and network errors are logged so large loads can continue even when individual batches fail.
 - Qdrant model metadata is tracked locally to avoid relying on private client helpers.
 - Qdrant collection setup happens before media ingestion, and the loader streams asynchronous upsert tasks once the configurable buffer fills so fetching can continue while points are written.
+- The staged loader rewrite lives under `mcp_plex/loader/pipeline/`. The concrete classes that must be wired together are:
+  - `IngestionStage` (`ingestion.py`)
+  - `EnrichmentStage` (`enrichment.py`)
+  - `PersistenceStage` (`persistence.py`)
+  - `LoaderOrchestrator` (`orchestrator.py`)
+- `mcp_plex/loader/pipeline/channels.py` defines the queue type aliases and sentinel tokens (`INGEST_DONE`, `PERSIST_DONE`) shared by the stages.
+## Loader CLI expectations
+- `mcp_plex/loader/__init__.py` still contains the legacy `LoaderPipeline` implementation for reference. New work should instantiate the staged classes directly and coordinate them with `LoaderOrchestrator`.
+- When constructing stages from the CLI:
+  - `IngestionStage` must receive the Plex server (or `None` for sample mode), the list of sample items, the Plex chunk size for both movies and episodes, the enrichment batch size for sample batches, the ingest queue instance, and the `INGEST_DONE` sentinel.
+  - `EnrichmentStage` requires a factory that returns an `httpx.AsyncClient` (or context manager), the TMDb API key (empty string when unused in sample mode), the ingest queue, the persistence queue, the shared `IMDbRetryQueue`, the enrichment batch size for movies and episodes, and the IMDb configuration derived from CLI flags.
+  - `PersistenceStage` expects the `AsyncQdrantClient`, collection name, dense/sparse vector names, the persistence queue, the Qdrant retry queue, the semaphore limiting concurrent upserts, the upsert buffer size, and callables for performing the upsert as well as recording progress.
+  - `LoaderOrchestrator` must be initialised with the three stage instances, the ingest queue, the persistence queue, and the number of persistence workers (the CLI's `max_concurrent_upserts`).
+- Convert `AggregatedItem` batches into Qdrant `PointStruct` objects with `build_point` before handing them to the persistence stage's `enqueue_points` helper.
+- Prefer explicit keyword arguments when threading CLI options into stage constructors so the mapping is obvious to future readers.
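
The queue-and-sentinel handoff described in the added bullets can be illustrated with a minimal, self-contained sketch. It does not use the real `mcp_plex` classes, whose constructor signatures are not shown in this diff. The coroutines `ingest`, `enrich`, `persist`, and `run_pipeline` are hypothetical stand-ins, the two sentinels are local placeholder objects, and emitting one `PERSIST_DONE` per persistence worker is an assumption about how shutdown might be signalled rather than confirmed behaviour.

```python
import asyncio

# Local placeholders for the sentinels named in channels.py (assumption:
# the real tokens may be different objects with different semantics).
INGEST_DONE = object()
PERSIST_DONE = object()


async def ingest(items, chunk_size, ingest_queue):
    """Hypothetical ingestion stage: emit fixed-size chunks, then the sentinel."""
    for start in range(0, len(items), chunk_size):
        await ingest_queue.put(items[start:start + chunk_size])
    await ingest_queue.put(INGEST_DONE)


async def enrich(ingest_queue, persist_queue, worker_count):
    """Hypothetical enrichment stage: transform each chunk and forward it."""
    while True:
        batch = await ingest_queue.get()
        if batch is INGEST_DONE:
            break
        await persist_queue.put([{"title": t, "enriched": True} for t in batch])
    # Assumption: one sentinel per persistence worker so every worker stops.
    for _ in range(worker_count):
        await persist_queue.put(PERSIST_DONE)


async def persist(persist_queue, sink):
    """Hypothetical persistence worker: drain batches until the sentinel arrives."""
    while True:
        batch = await persist_queue.get()
        if batch is PERSIST_DONE:
            break
        sink.extend(batch)  # stands in for an upsert call


async def run_pipeline(items, chunk_size=2, persistence_workers=2):
    """Hypothetical orchestrator: create the queues and run all stages together."""
    ingest_queue = asyncio.Queue()
    persist_queue = asyncio.Queue()
    sink = []
    await asyncio.gather(
        ingest(items=items, chunk_size=chunk_size, ingest_queue=ingest_queue),
        enrich(
            ingest_queue=ingest_queue,
            persist_queue=persist_queue,
            worker_count=persistence_workers,
        ),
        *(
            persist(persist_queue=persist_queue, sink=sink)
            for _ in range(persistence_workers)
        ),
    )
    return sink
```

Calling `asyncio.run(run_pipeline(["a", "b", "c"]))` runs the three stand-in stages to completion and returns the collected records. Every call site passes keyword arguments, mirroring the last bullet's preference for an explicit mapping between options and stage parameters.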