Conversation

@Teagan42 (Contributor) commented Oct 4, 2025

Summary

  • ensure the Qdrant collection and indexes are created before ingesting media
  • stream media items through the loader and schedule asynchronous upserts when the buffer fills
  • add a CLI option and tests covering the new upsert buffer behaviour and ordering

Testing

  • uv run ruff check .
  • uv run pytest

https://chatgpt.com/codex/tasks/task_e_68e0ec5f4b34832899897c764721ef16

Copilot AI review requested due to automatic review settings October 4, 2025 10:03

Copilot AI left a comment


Pull Request Overview

This PR refactors the Qdrant loader to use a streaming architecture for improved efficiency and reliability. The changes implement asynchronous buffering to handle large media collections without memory spikes, and ensure collection setup completes before ingestion begins.

  • Converts the loader from batch-processing to streaming with configurable upsert buffers
  • Moves Qdrant collection and index creation to occur before media processing
  • Adds CLI configuration for buffer size with validation and comprehensive test coverage
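The bullet above mentions a CLI option for the buffer size with validation. The PR's actual CLI code is not shown in this conversation, so the following is only an illustrative sketch using `argparse`; the option name and helper are assumptions, not the project's real implementation.

```python
# Hypothetical sketch of buffer-size validation at CLI parse time.
# The option name and _positive_int helper are assumptions for illustration.
import argparse


def _positive_int(raw: str) -> int:
    """Parse an integer and reject non-positive values before the loader runs."""
    value = int(raw)
    if value <= 0:
        raise argparse.ArgumentTypeError("must be a positive integer")
    return value


parser = argparse.ArgumentParser()
parser.add_argument(
    "--upsert-buffer-size",
    type=_positive_int,
    default=200,
    help="Number of points to buffer before scheduling an asynchronous upsert.",
)

args = parser.parse_args(["--upsert-buffer-size", "64"])
print(args.upsert_buffer_size)  # → 64
```

Validating at parse time means a bad value fails fast with a usage error instead of surfacing mid-ingestion.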

Reviewed Changes

Copilot reviewed 6 out of 7 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| mcp_plex/loader.py | Core refactor implementing streaming iterator functions, collection setup, buffered upserts, and the CLI option |
| tests/test_loader_logging.py | Adds validation tests for the buffer size parameter and collection setup ordering |
| tests/test_load_from_plex.py | Updates test to use the new streaming iterator function name |
| pyproject.toml | Version bump to 0.26.39 |
| docker/pyproject.deps.toml | Version bump to 0.26.39 |
| AGENTS.md | Documentation update describing the streaming improvements |


Comment on lines 96 to 102
```diff
 async def _iter_gather_in_batches(
     tasks: Sequence[Awaitable[T]], batch_size: int
-) -> List[T]:
-    """Gather awaitable tasks in fixed-size batches."""
+) -> AsyncIterator[T]:
+    """Yield results from awaitable tasks in fixed-size batches."""

     if batch_size <= 0:
         raise ValueError("batch_size must be positive")
```
Copilot AI commented Oct 4, 2025

The validation logic for batch_size is duplicated. Consider extracting this validation into a shared helper function since it's used in both _iter_gather_in_batches and the upsert_buffer_size validation in the run function.

Comment on lines 879 to 895

```python
def _schedule_upsert(batch: List[models.PointStruct]) -> None:
    logger.info(
        "Upserting %d points into Qdrant collection %s in batches of %d",
        len(batch),
        collection_name,
        _qdrant_batch_size,
    )
    upsert_tasks.append(
        asyncio.create_task(
            _upsert_in_batches(
                client,
                collection_name,
                batch,
            )
        )
    )
```
Copilot AI commented Oct 4, 2025

The _schedule_upsert function creates unbounded async tasks without any concurrency control. For large datasets, this could lead to memory issues and resource exhaustion. Consider implementing a semaphore or task pool to limit concurrent upsert operations.

Suggested change

```diff
-def _schedule_upsert(batch: List[models.PointStruct]) -> None:
-    logger.info(
-        "Upserting %d points into Qdrant collection %s in batches of %d",
-        len(batch),
-        collection_name,
-        _qdrant_batch_size,
-    )
-    upsert_tasks.append(
-        asyncio.create_task(
-            _upsert_in_batches(
-                client,
-                collection_name,
-                batch,
-            )
-        )
-    )
+upsert_semaphore = asyncio.Semaphore(10)  # Limit to 10 concurrent upserts
+
+def _schedule_upsert(batch: List[models.PointStruct]) -> None:
+    async def upsert_with_semaphore() -> None:
+        async with upsert_semaphore:
+            logger.info(
+                "Upserting %d points into Qdrant collection %s in batches of %d",
+                len(batch),
+                collection_name,
+                _qdrant_batch_size,
+            )
+            await _upsert_in_batches(
+                client,
+                collection_name,
+                batch,
+            )
+
+    upsert_tasks.append(asyncio.create_task(upsert_with_semaphore()))
```

github-actions bot commented Oct 4, 2025

Coverage Report

| File | Stmts | Miss | Cover | Missing |
| --- | --- | --- | --- | --- |
| mcp_plex/loader.py | 544 | 32 | 94% | 80, 83–84, 169, 198–202, 204–213, 224, 226–228, 244, 270, 315, 588, 591, 880–882, 978 |
| mcp_plex/server.py | 478 | 66 | 86% | 107, 124, 142–143, 278, 299–302, 304–306, 467, 479, 497–504, 524, 526–531, 533–538, 541, 547, 551, 553, 579, 581, 583, 585, 587, 590, 597, 603, 832, 836, 858–864, 896–897, 900, 918, 923, 941, 972–973, 990–991, 1014, 1017–1018, 1065, 1102 |
| TOTAL | 1209 | 98 | 92% | |

| Tests | Skipped | Failures | Errors | Time |
| --- | --- | --- | --- | --- |
| 69 | 0 💤 | 0 ❌ | 0 🔥 | 35.374s ⏱️ |

@chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.


Comment on lines 877 to 895
```python
points_buffer: List[models.PointStruct] = []
upsert_tasks: list[asyncio.Task[None]] = []

def _schedule_upsert(batch: List[models.PointStruct]) -> None:
    logger.info(
        "Upserting %d points into Qdrant collection %s in batches of %d",
        len(batch),
        collection_name,
        _qdrant_batch_size,
    )
    upsert_tasks.append(
        asyncio.create_task(
            _upsert_in_batches(
                client,
                collection_name,
                batch,
            )
        )
    )
```


P1: Bound the number of parallel upsert tasks

The new streaming path fires an asyncio.create_task for every buffered batch and only awaits all of them after every item has been processed. On a large Plex library this schedules one _upsert_in_batches task per upsert_buffer_size items, so a 50 000 item load with the default buffer of 200 will launch 250 concurrent tasks all hitting Qdrant at once. The previous implementation performed these upserts sequentially. This unbounded concurrency can exhaust connections or overwhelm Qdrant, causing timeouts and partial loads. Consider serialising the upserts or using a semaphore/queue to cap the number of in‑flight tasks.
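The queue-based alternative mentioned above could be sketched as a fixed pool of workers draining buffered batches, so at most `workers` upserts run at once. This is an illustrative sketch only: `_upsert_in_batches` here is a stand-in stub, and the collection name and worker count are assumptions, not the loader's real code.

```python
# Hedged sketch of a worker-pool approach to capping in-flight upserts.
import asyncio
from typing import List


async def _upsert_in_batches(client, collection_name: str, batch: List[int]) -> None:
    # Stand-in for the real Qdrant upsert; yields control like real I/O would.
    await asyncio.sleep(0)


async def run_upserts(batches: List[List[int]], workers: int = 4) -> int:
    """Drain all batches through a bounded pool; return the number upserted."""
    queue: "asyncio.Queue[List[int]]" = asyncio.Queue()
    for batch in batches:
        queue.put_nowait(batch)
    done = 0

    async def worker() -> None:
        nonlocal done
        while True:
            try:
                batch = queue.get_nowait()
            except asyncio.QueueEmpty:
                return  # no work left; worker exits
            await _upsert_in_batches(None, "media-items", batch)
            done += 1

    # Only `workers` upserts can be in flight at any moment.
    await asyncio.gather(*(worker() for _ in range(workers)))
    return done


print(asyncio.run(run_upserts([[1], [2], [3], [4], [5]])))  # → 5
```

Compared with a semaphore around `asyncio.create_task`, the worker pool also bounds the number of task objects alive at once, which matters when the buffered batches themselves are large.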


Copilot AI review requested due to automatic review settings October 4, 2025 10:17
Copilot AI left a comment

Pull Request Overview

Copilot reviewed 6 out of 7 changed files in this pull request and generated no new comments.



@Teagan42 Teagan42 merged commit c7d990e into main Oct 4, 2025
4 checks passed
@Teagan42 Teagan42 deleted the codex/refactor-media-item-processing-in-qdrant branch October 4, 2025 10:18
@Teagan42 Teagan42 restored the codex/refactor-media-item-processing-in-qdrant branch October 4, 2025 10:23
@Teagan42 Teagan42 deleted the codex/refactor-media-item-processing-in-qdrant branch October 4, 2025 10:24