feat(api): add sequential batch ingestion endpoint (Phase 1) #143
ishaanxgupta merged 1 commit into XortexAI:main
Conversation
@cursor review
@anirudhaacharyap Thanks for the PR, most of it looks good to me. Just a small question: how do you think we could handle the UPDATE step as memories get ingested in parallel in batch mode?
Great question! The UPDATE step is exactly why I chose sequential processing for Phase 1: it completely sidesteps the race condition since each item passes through the Judge agent one at a time, always seeing the latest state of memory. For Phase 2 (parallel mode), I think we can handle this with a per-user locking strategy: use an asyncio.Lock keyed by user ID, so items for the same user are serialized while items for different users still run concurrently.

```python
import asyncio
from collections import defaultdict

# One asyncio.Lock per user, created lazily on first access.
_user_locks: dict[str, asyncio.Lock] = defaultdict(asyncio.Lock)

async def _process_item(item, user_id, pipeline):
    async with _user_locks[user_id]:
        return await pipeline.run(...)

# Phase 2 batch endpoint would then do:
results = await asyncio.gather(*[_process_item(item, user_id, pipeline) for item in req.items])
```
I also had another idea: a queue system.

Queue System
The core challenge with parallel batch ingestion is ensuring the Judge agent always sees a consistent memory state when deciding on updates.

Benefits

Implementation Plan

Trade-offs
I want to be transparent about the downsides:
That said, XMem already uses MongoDB and Neo4j in its stack, so adding Redis isn't a huge leap, and the Docker Compose setup could bundle it cleanly. Happy to hear your thoughts on whether this aligns with the project's direction, or if you'd prefer a lighter-weight approach for now!
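For reference, a very rough sketch of the consumer side of such a queue (purely illustrative; it assumes redis.asyncio, a hypothetical per-user key scheme, and a guessed pipeline.run signature, none of which exist in XMem today):

```python
import asyncio
import json

import redis.asyncio as aioredis  # assumption: Redis would be added to the stack in Phase 2

QUEUE_KEY = "xmem:ingest:{user_id}"  # hypothetical per-user FIFO key scheme

async def enqueue_turn(r: aioredis.Redis, user_id: str, turn: dict) -> None:
    # Producers (API, MCP, SDK, extension) only push raw turns; they never write memory directly.
    await r.lpush(QUEUE_KEY.format(user_id=user_id), json.dumps(turn))

async def consume_user_queue(r: aioredis.Redis, user_id: str, pipeline) -> None:
    # One consumer per user drains the queue sequentially, so the Judge agent
    # always sees the memory state left behind by the previous item.
    while True:
        _key, raw = await r.brpop(QUEUE_KEY.format(user_id=user_id))
        await pipeline.run(json.loads(raw), user_id=user_id)
```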
Hi @anirudhaacharyap, great research! I am happy with the per-user locking strategy; it solves the case where memory is being ingested for multiple users at once, which is great. Please let me know your thoughts!
Hi @ishaanxgupta, that's a really great suggestion. I like the idea of splitting the pipeline into stages to balance performance and consistency. The "fan-out → fan-in → fan-out" approach makes a lot of sense.
For a single user, this staged approach seems like a clean way to introduce parallelism while still keeping updates deterministic. Regarding the global queue, I agree it would still be important to coordinate ingestion requests coming from different sources (MCP, SDK, extension), especially to avoid overlapping updates for the same user. My thought would be:
Happy to take this up in a follow-up PR once this is merged.
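To make the staged idea concrete, here is a minimal sketch of the flow for a single user's batch (illustrative only; pipeline.extract, pipeline.judge, and pipeline.store are hypothetical stage methods, not the current pipeline API):

```python
import asyncio

async def process_batch_staged(items: list[dict], user_id: str, pipeline) -> list:
    # Stage 1 (fan-out): extraction is independent per item, so run it concurrently.
    extracted = await asyncio.gather(
        *[pipeline.extract(item, user_id=user_id) for item in items]
    )

    # Stage 2 (fan-in): the Judge runs strictly sequentially so each ADD/UPDATE
    # decision sees the memory state produced by the previous one.
    decisions = []
    for facts in extracted:
        decisions.append(await pipeline.judge(facts, user_id=user_id))

    # Stage 3 (fan-out): writes for distinct, already-decided memories can go in parallel.
    return list(await asyncio.gather(
        *[pipeline.store(decision, user_id=user_id) for decision in decisions]
    ))
```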
Sure, thanks for validating the approach. Let's merge this one first.
Description
Closes #132
This PR implements Phase 1 of the dedicated batch ingestion endpoint, allowing clients to send multiple conversation turns in a single HTTP request to significantly reduce network overhead.
As per the Phase 1 scope, this implementation focuses strictly on the API contract and sequential processing. Concurrency (asyncio.gather), semaphores, and partial success aggregation are deferred to a subsequent PR to keep this implementation minimal and easy to review.

Changes Made
- Added BatchIngestRequest and BatchIngestResponse schemas to strictly validate batch payloads.
- Added a POST /v1/memory/batch-ingest endpoint in src/api/routes/memory.py.
- Reused the existing get_ingest_pipeline() to ensure domain extraction and storage logic remains completely uniform.
- Added tests/test_batch_ingest.py to verify standard success workflows and schema validation.
- Resolves #132
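For context, the Phase 1 route is roughly shaped like the sketch below (illustrative, not the actual diff; it assumes a FastAPI app, and the field names, import path, and pipeline.run signature are guesses):

```python
from fastapi import APIRouter, Depends
from pydantic import BaseModel

from src.api.dependencies import get_ingest_pipeline  # assumed import path

router = APIRouter()

class BatchIngestRequest(BaseModel):
    user_id: str
    items: list[dict]  # each item is one conversation turn (shape assumed)

class BatchIngestResponse(BaseModel):
    results: list[dict]

@router.post("/v1/memory/batch-ingest", response_model=BatchIngestResponse)
async def batch_ingest(req: BatchIngestRequest, pipeline=Depends(get_ingest_pipeline)):
    # Phase 1: strictly sequential, no try/except aggregation; any failure fails the batch.
    results = []
    for item in req.items:
        results.append(await pipeline.run(item, user_id=req.user_id))
    return BatchIngestResponse(results=results)
```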
Verification
- All tests for batch-ingest pass.
- Ingestion uses a sequential for loop with no try/except aggregation (strict Phase 1 alignment).
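For a quick manual smoke test against a local server, something like the following can be used (the port and payload shape are assumptions mirroring the sketch above):

```python
import requests

resp = requests.post(
    "http://localhost:8000/v1/memory/batch-ingest",
    json={
        "user_id": "demo-user",
        "items": [
            {"role": "user", "content": "I moved to Berlin last month."},
            {"role": "user", "content": "I'm allergic to peanuts."},
        ],
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json())
```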