
Conversation

@Teagan42 (Contributor) commented Oct 6, 2025

What

  • add a _build_loader_orchestrator helper that wires the staged ingestion, enrichment, and persistence components
  • ensure Qdrant upserts convert aggregated items to points, reuse shared IMDb retry queues, and record progress logging
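A minimal sketch of what such a constructor might look like; the `Orchestrator` container, queue names, and `INGEST_DONE` sentinel below are illustrative stand-ins for the project's actual pipeline types, not its real API:

```python
# Hypothetical sketch: a _build_loader_orchestrator-style helper that wires
# staged components together via shared asyncio queues. All names here are
# illustrative, not the project's actual API.
import asyncio
from dataclasses import dataclass

INGEST_DONE = object()  # completion sentinel handed from ingestion to enrichment


@dataclass
class Orchestrator:
    ingest_queue: asyncio.Queue
    persistence_queue: asyncio.Queue


def _build_loader_orchestrator() -> Orchestrator:
    # Shared queues connect the stages: ingestion feeds ingest_queue,
    # enrichment bridges it to persistence_queue, and persistence drains
    # persistence_queue into Qdrant upserts.
    ingest_queue: asyncio.Queue = asyncio.Queue()
    persistence_queue: asyncio.Queue = asyncio.Queue()
    return Orchestrator(ingest_queue=ingest_queue, persistence_queue=persistence_queue)


orch = _build_loader_orchestrator()
print(orch.ingest_queue is not orch.persistence_queue)  # → True
```

Centralizing the wiring in one constructor keeps the CLI entry point free of queue plumbing, which matches the stated goal of simplifying CLI integration.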

Why

  • provide a reusable constructor for the staged loader pipeline to simplify CLI integration work

Affects

  • loader module orchestrator wiring and related staging queues

Testing

  • uv run pytest tests/test_loader_logging.py::test_run_logs_upsert

Documentation

  • not needed

https://chatgpt.com/codex/tasks/task_e_68e32f8f3fd88328a4e030e88d7a9c95

github-actions bot commented Oct 6, 2025

Coverage Report

| File | Stmts | Miss | Cover | Missing |
| --- | ---: | ---: | ---: | --- |
| mcp_plex/loader/\_\_init\_\_.py | 785 | 158 | 80% | 182–189, 238, 324–329, 931–932, 976, 978, 1043–1044, 1055–1057, 1060, 1063–1064, 1066, 1088, 1098, 1109–1141, 1156, 1158, 1165–1179, 1182–1196, 1202, 1217–1248, 1253–1286, 1289–1293, 1298–1306, 1316, 1371–1467, 1523–1525, 1586–1588, 1946 |
| mcp_plex/loader/pipeline/\_\_init\_\_.py | 15 | 7 | 53% | 57–62, 68 |
| mcp_plex/loader/pipeline/channels.py | 73 | 2 | 97% | 19–20 |
| mcp_plex/loader/pipeline/enrichment.py | 334 | 44 | 87% | 78, 80, 87, 91, 134–137, 170, 191, 211, 219–221, 228–231, 234–236, 244, 301, 357, 380, 384, 386, 412, 430, 465, 471, 474–482, 508, 532, 603, 606–608 |
| mcp_plex/loader/pipeline/ingestion.py | 81 | 8 | 90% | 67, 98–108, 129, 157, 179 |
| mcp_plex/loader/pipeline/orchestrator.py | 85 | 5 | 94% | 48, 112, 141, 164–165 |
| mcp_plex/loader/pipeline/persistence.py | 117 | 9 | 92% | 109, 151–152, 162, 167, 171–173, 201 |
| mcp_plex/server/\_\_init\_\_.py | 614 | 29 | 95% | 43–44, 119–120, 148, 252, 256, 277–280, 297, 362, 365, 402, 420–421, 458, 1109, 1131–1137, 1173, 1191, 1196, 1214, 1338, 1375 |
| mcp_plex/server/\_\_main\_\_.py | 4 | 4 | 0% | 3–8 |
| mcp_plex/server/config.py | 48 | 7 | 85% | 50, 52–55, 65, 76 |
| **TOTAL** | **2358** | **273** | **88%** | |

| Tests | Skipped | Failures | Errors | Time |
| ---: | ---: | ---: | ---: | ---: |
| 128 | 0 💤 | 0 ❌ | 0 🔥 | 47.302s ⏱️ |

@Teagan42 Teagan42 merged commit 629f29e into main Oct 6, 2025
4 checks passed
@Teagan42 Teagan42 deleted the codex/add-_build_loader_orchestrator-function branch October 6, 2025 03:02

@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.


Comment on lines +1387 to +1455
```python
async def _upsert_aggregated(
    batch: Sequence[AggregatedItem],
) -> None:
    if not batch:
        return
    items.extend(batch)
    points = [
        build_point(item, dense_model_name, sparse_model_name)
        for item in batch
    ]
    await _upsert_in_batches(
        client,
        collection_name,
        points,
        retry_queue=qdrant_retry_queue,
    )

def _record_upsert(worker_id: int, batch_size: int, queue_size: int) -> None:
    nonlocal upserted, upsert_start
    if upserted == 0:
        upsert_start = time.perf_counter()
    upserted += batch_size
    elapsed = time.perf_counter() - upsert_start
    rate = upserted / elapsed if elapsed > 0 else 0.0
    logger.info(
        "Upsert worker %d processed %d items (%.2f items/sec, queue size=%d)",
        worker_id,
        upserted,
        rate,
        queue_size,
    )

ingestion_stage = IngestionStage(
    plex_server=plex_server,
    sample_items=sample_items,
    movie_batch_size=plex_chunk_size,
    episode_batch_size=plex_chunk_size,
    sample_batch_size=enrichment_batch_size,
    output_queue=ingest_queue,
    completion_sentinel=INGEST_DONE,
)

enrichment_stage = EnrichmentStage(
    http_client_factory=lambda: httpx.AsyncClient(timeout=30),
    tmdb_api_key=tmdb_api_key or "",
    ingest_queue=ingest_queue,
    persistence_queue=persistence_queue,
    imdb_retry_queue=_imdb_retry_queue,
    movie_batch_size=enrichment_batch_size,
    episode_batch_size=enrichment_batch_size,
    imdb_cache=_imdb_cache,
    imdb_max_retries=_imdb_max_retries,
    imdb_backoff=_imdb_backoff,
    imdb_batch_limit=_imdb_batch_limit,
    imdb_requests_per_window=_imdb_requests_per_window,
    imdb_window_seconds=_imdb_window_seconds,
)

persistence_stage = _PersistenceStage(
    client=client,
    collection_name=collection_name,
    dense_vector_name=dense_model_name,
    sparse_vector_name=sparse_model_name,
    persistence_queue=persistence_queue,
    retry_queue=qdrant_retry_queue,
    upsert_semaphore=upsert_capacity,
    upsert_buffer_size=upsert_buffer_size,
    upsert_fn=_upsert_aggregated,
    on_batch_complete=_record_upsert,
```

P1: Treat retry payloads as `AggregatedItem` batches

The new orchestrator wiring feeds `EnrichmentStage`'s aggregated items directly into `PersistenceStage` and converts them to Qdrant points inside `_upsert_aggregated`. That works for the first attempt, but when `_upsert_in_batches` fails it enqueues the raw `PointStruct` batch onto the shared retry queue. On shutdown `PersistenceStage._flush_retry_queue()` re-enqueues those point batches via `enqueue_points`, after which the persistence workers call `_upsert_aggregated` again. `_upsert_aggregated` always assumes `batch` contains `AggregatedItem` instances and passes each element to `build_point`, so any retried payload will raise attribute errors and the failed points can never be persisted. Any upsert failure therefore permanently breaks the retry path. The upsert helper needs to detect already-built `PointStruct`s or reuse `PersistenceStage`'s native point-based enqueueing instead.
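One way the detection could look; `PointStruct` and `build_point` are plain stand-ins here for illustration (real code would use `qdrant_client.models.PointStruct` and the project's own `build_point` helper):

```python
# Sketch of the suggested fix: let the upsert helper tolerate batches that
# already contain built points (re-enqueued from the retry queue) alongside
# fresh AggregatedItem payloads. PointStruct and build_point are stand-ins,
# not the real qdrant_client / project implementations.
class PointStruct:
    def __init__(self, id, payload):
        self.id, self.payload = id, payload


def build_point(item):
    # Stand-in for the project's build_point(item, dense, sparse) helper.
    return PointStruct(id=item["id"], payload=item)


def to_points(batch):
    """Pass retried PointStructs through unchanged; build points for the rest."""
    return [p if isinstance(p, PointStruct) else build_point(p) for p in batch]


fresh = [{"id": 1}]              # first-attempt AggregatedItem-like payloads
retried = [PointStruct(2, {})]   # batch re-enqueued after a failed upsert
print([p.id for p in to_points(fresh + retried)])  # → [1, 2]
```

With a guard like this, a retried point batch reaches the upsert call intact instead of being fed back through `build_point`, so the retry path no longer raises attribute errors.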

