Refactor loader to stream Qdrant upserts #70
Conversation
Pull Request Overview
This PR refactors the Qdrant loader to use a streaming architecture for improved efficiency and reliability. The changes implement asynchronous buffering to handle large media collections without memory spikes, and ensure the collection is set up before ingestion begins.
- Converts the loader from batch-processing to streaming with configurable upsert buffers
- Moves Qdrant collection and index creation to occur before media processing
- Adds CLI configuration for buffer size with validation and comprehensive test coverage
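To illustrate the buffering idea described above, here is a minimal sketch of a streaming buffer helper. The name `stream_in_buffers` and its exact shape are hypothetical, not the PR's actual code; it only shows the general pattern of accumulating items from an async stream into fixed-size buffers so the whole collection never has to sit in memory at once.

```python
import asyncio
from typing import AsyncIterator, List, TypeVar

T = TypeVar("T")


async def stream_in_buffers(
    items: AsyncIterator[T],
    buffer_size: int,
) -> AsyncIterator[List[T]]:
    """Yield fixed-size buffers from an async stream of items.

    The final buffer may be smaller than buffer_size; it is flushed
    so no trailing items are lost.
    """
    if buffer_size <= 0:
        raise ValueError("buffer_size must be positive")
    buffer: List[T] = []
    async for item in items:
        buffer.append(item)
        if len(buffer) >= buffer_size:
            yield buffer
            buffer = []
    if buffer:  # flush the final partial buffer
        yield buffer
```

Each yielded buffer would then be handed to an upsert routine, so memory usage is bounded by `buffer_size` rather than by the size of the library.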
Reviewed Changes
Copilot reviewed 6 out of 7 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| mcp_plex/loader.py | Core refactor implementing streaming iterator functions, collection setup, buffered upserts, and CLI option |
| tests/test_loader_logging.py | Adds validation tests for buffer size parameter and collection setup ordering |
| tests/test_load_from_plex.py | Updates test to use new streaming iterator function name |
| pyproject.toml | Version bump to 0.26.39 |
| docker/pyproject.deps.toml | Version bump to 0.26.39 |
| AGENTS.md | Documentation update describing the streaming improvements |
mcp_plex/loader.py (outdated)
```diff
 async def _iter_gather_in_batches(
     tasks: Sequence[Awaitable[T]], batch_size: int
-) -> List[T]:
-    """Gather awaitable tasks in fixed-size batches."""
+) -> AsyncIterator[T]:
+    """Yield results from awaitable tasks in fixed-size batches."""
+    if batch_size <= 0:
+        raise ValueError("batch_size must be positive")
```
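The diff above turns the batched gather into an async iterator. A self-contained sketch of that pattern (a hypothetical reimplementation, not the PR's exact code) looks like this: awaitables are gathered `batch_size` at a time, and results are yielded as each batch completes instead of being collected into one large list.

```python
import asyncio
from typing import AsyncIterator, Awaitable, List, Sequence, TypeVar

T = TypeVar("T")


async def iter_gather_in_batches(
    tasks: Sequence[Awaitable[T]], batch_size: int
) -> AsyncIterator[T]:
    """Yield results from awaitables, gathering at most batch_size at a time."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(tasks), batch_size):
        batch = tasks[start : start + batch_size]
        # Run one batch concurrently, then stream its results out.
        for result in await asyncio.gather(*batch):
            yield result
```

Because results are yielded incrementally, a consumer can start upserting the first batch while later batches are still pending, which is what enables the streaming behavior described in the PR overview.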
Copilot AI commented on Oct 4, 2025
The validation logic for `batch_size` is duplicated. Consider extracting it into a shared helper function, since it is used both in `_iter_gather_in_batches` and in the `upsert_buffer_size` validation in the `run` function.
```python
def _schedule_upsert(batch: List[models.PointStruct]) -> None:
    logger.info(
        "Upserting %d points into Qdrant collection %s in batches of %d",
        len(batch),
        collection_name,
        _qdrant_batch_size,
    )
    upsert_tasks.append(
        asyncio.create_task(
            _upsert_in_batches(
                client,
                collection_name,
                batch,
            )
        )
    )
```
Copilot AI commented on Oct 4, 2025
The `_schedule_upsert` function creates unbounded async tasks without any concurrency control. For large datasets, this could lead to memory issues and resource exhaustion. Consider implementing a semaphore or task pool to limit concurrent upsert operations.
Suggested change:

```diff
-def _schedule_upsert(batch: List[models.PointStruct]) -> None:
-    logger.info(
-        "Upserting %d points into Qdrant collection %s in batches of %d",
-        len(batch),
-        collection_name,
-        _qdrant_batch_size,
-    )
-    upsert_tasks.append(
-        asyncio.create_task(
-            _upsert_in_batches(
-                client,
-                collection_name,
-                batch,
-            )
-        )
-    )
+upsert_semaphore = asyncio.Semaphore(10)  # Limit to 10 concurrent upserts
+def _schedule_upsert(batch: List[models.PointStruct]) -> None:
+    async def upsert_with_semaphore():
+        async with upsert_semaphore:
+            logger.info(
+                "Upserting %d points into Qdrant collection %s in batches of %d",
+                len(batch),
+                collection_name,
+                _qdrant_batch_size,
+            )
+            await _upsert_in_batches(
+                client,
+                collection_name,
+                batch,
+            )
+    upsert_tasks.append(asyncio.create_task(upsert_with_semaphore()))
```
Coverage Report
💡 Codex Review
Here are some automated review suggestions for this pull request.
```python
points_buffer: List[models.PointStruct] = []
upsert_tasks: list[asyncio.Task[None]] = []


def _schedule_upsert(batch: List[models.PointStruct]) -> None:
    logger.info(
        "Upserting %d points into Qdrant collection %s in batches of %d",
        len(batch),
        collection_name,
        _qdrant_batch_size,
    )
    upsert_tasks.append(
        asyncio.create_task(
            _upsert_in_batches(
                client,
                collection_name,
                batch,
            )
        )
    )
```
Bound the number of parallel upsert tasks
The new streaming path fires an `asyncio.create_task` for every buffered batch and only awaits all of them after every item has been processed. On a large Plex library this schedules one `_upsert_in_batches` task per `upsert_buffer_size` items, so a 50,000-item load with the default buffer of 200 will launch 250 concurrent tasks, all hitting Qdrant at once. The previous implementation performed these upserts sequentially. This unbounded concurrency can exhaust connections or overwhelm Qdrant, causing timeouts and partial loads. Consider serialising the upserts or using a semaphore/queue to cap the number of in-flight tasks.
Pull Request Overview
Copilot reviewed 6 out of 7 changed files in this pull request and generated no new comments.
Summary
Testing
https://chatgpt.com/codex/tasks/task_e_68e0ec5f4b34832899897c764721ef16