
fix(conversations): bounded ThreadPoolExecutor for background work #4827

Closed
beastoin wants to merge 4 commits into main from fix/process-conversation-thread-pool

Conversation

@beastoin
Collaborator

Summary

  • Replace 7+ raw threading.Thread().start() per conversation completion with ThreadPoolExecutor(max_workers=32)
  • Affected functions: save_structured_vector, _extract_memories, _extract_trends, _save_action_items, _update_goal_progress, conversation_created_webhook, update_personas_async, _run_auto_sync
  • Under sustained load, the old pattern spawned hundreds of threads per minute with no pooling or rate limiting
  • The bounded pool queues work when all workers are busy instead of spawning unlimited threads

Part of #4825 (Fix 2/3). Follow-up to PR #4784.

Test plan

  • Verify conversation processing still completes (memories extracted, trends saved, action items created, goals updated)
  • Verify webhook notifications still fire on conversation creation
  • Verify persona updates still happen after conversations
  • Load test: confirm thread count stays bounded under sustained conversation volume

🤖 Generated with Claude Code

…olExecutor

Each conversation completion was spawning 7+ raw threading.Thread() calls
(save_structured_vector, _extract_memories, _extract_trends, _save_action_items,
_update_goal_progress, conversation_created_webhook, update_personas_async).
No pooling, no rate limiting. Under sustained load this creates hundreds of
concurrent threads, each holding full Conversation objects in memory.

Replaced with a bounded ThreadPoolExecutor(max_workers=32) that queues work
when all workers are busy instead of spawning unlimited threads.

Found during deep memory leak audit (follow-up to PR #4784).
Contributor

@gemini-code-assist Bot left a comment


Code Review

This pull request is a solid improvement for the application's stability and resource management. Replacing unbounded threading.Thread creation with a bounded ThreadPoolExecutor is the correct approach to handle background tasks under sustained load. My review focuses on ensuring this new pattern is implemented robustly, with particular attention to exception handling and resource lifecycle management to prevent silent failures and resource leaks. All original comments are valid and have been kept.

Comment on lines +690 to +695
if not is_reprocess:
    _conversation_bg_executor.submit(save_structured_vector, uid, conversation)
    _conversation_bg_executor.submit(_extract_memories, uid, conversation)
    _conversation_bg_executor.submit(_extract_trends, uid, conversation)
    _conversation_bg_executor.submit(_save_action_items, uid, conversation)
    _conversation_bg_executor.submit(_update_goal_progress, uid, conversation)
Contributor


critical

By switching to ThreadPoolExecutor.submit, any exceptions that occur within the submitted tasks will be swallowed silently if the returned Future object is not checked. This is a critical issue as it can lead to silent failures of background processing, causing data inconsistencies or incomplete conversation processing without any logs.

Several of the submitted functions do not have top-level exception handling, including:

  • save_structured_vector
  • _extract_memories
  • _save_action_items

To fix this, you should add a try...except block within each of these functions to catch all exceptions and log them properly. For example, in save_structured_vector:

def save_structured_vector(uid: str, conversation: Conversation, update_only: bool = False):
    try:
        # ... function body ...
    except Exception as e:
        logging.error(f"Error in background task save_structured_vector for conv {conversation.id}: {e}", exc_info=True)

Please apply this pattern to all background tasks submitted to the executor that currently lack robust exception handling.
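One way to apply that pattern uniformly, rather than hand-editing each function body, is a small logging decorator (a sketch only, not from the PR; the decorator name `_log_bg_exceptions` is hypothetical):

```python
import functools
import logging

def _log_bg_exceptions(fn):
    """Catch and log any exception so executor-submitted tasks never fail silently."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        try:
            return fn(*args, **kwargs)
        except Exception:
            # Future.result() is never called on fire-and-forget submits,
            # so this is the only place the traceback gets recorded.
            logging.exception("Background task %s failed", fn.__name__)
            return None
    return wrapper

@_log_bg_exceptions
def save_structured_vector(uid, conversation):
    raise RuntimeError("simulated failure")  # stands in for real vector work
```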

Collaborator Author


Fixed in 9f21832. Added try/except Exception with logging.exception() to all executor-submitted functions:

  • save_structured_vector
  • _extract_memories
  • _extract_trends
  • _save_action_items (refactored to _save_action_items_inner)

All exceptions are now logged with full traceback and conversation ID for debugging.


# Bounded thread pool for post-conversation background work.
# Prevents thread explosion under sustained load (was spawning 7+ raw threads per conversation).
_conversation_bg_executor = ThreadPoolExecutor(max_workers=32, thread_name_prefix="conv-bg")
Contributor


high

The ThreadPoolExecutor is created as a global instance, but there's no corresponding call to shutdown() when the application exits. This can lead to tasks in the queue being lost during application restarts or deployments, causing incomplete background work. It's important to ensure a graceful shutdown.

You can register a shutdown function using atexit to ensure the executor waits for running tasks to complete before the process exits. You will also need to add import atexit at the top of the file.

Suggested change
- _conversation_bg_executor = ThreadPoolExecutor(max_workers=32, thread_name_prefix="conv-bg")
+ _conversation_bg_executor = ThreadPoolExecutor(max_workers=32, thread_name_prefix="conv-bg")
+ atexit.register(_conversation_bg_executor.shutdown, wait=True)

Collaborator Author


Fixed in 9f21832. Added import atexit and atexit.register(_conversation_bg_executor.shutdown, wait=True) immediately after executor creation.
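The behavior that `atexit.register(..., wait=True)` relies on can be observed in a standalone sketch (here `shutdown(wait=True)` is called explicitly rather than via `atexit`, so the draining is visible in one script):

```python
import time
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=2, thread_name_prefix="demo")
results = []

# Queue five tasks on a two-worker pool: three of them sit in the
# internal queue while the first two run.
for i in range(5):
    executor.submit(lambda n=i: (time.sleep(0.05), results.append(n)))

# wait=True blocks until every queued task has run, so no work is lost
# on process exit.
executor.shutdown(wait=True)
```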

# Update persona prompts with new conversation
threading.Thread(target=update_personas_async, args=(uid,)).start()
_conversation_bg_executor.submit(conversation_created_webhook, uid, conversation)
_conversation_bg_executor.submit(update_personas_async, uid)
Contributor


high

The function update_personas_async is being submitted to the thread pool, but its implementation in utils/apps.py creates and manages its own threads using threading.Thread and join(). This is an anti-pattern when using a thread pool.

A worker from _conversation_bg_executor will be blocked waiting for the new threads inside update_personas_async to complete. This negates the benefits of the thread pool for this task and can still lead to an uncontrolled number of threads if many conversations are processed concurrently.

It's recommended to refactor update_personas_async to submit its individual persona update tasks (sync_update_persona_prompt) directly to the shared _conversation_bg_executor instead of creating new threads internally.

Collaborator Author


Fixed in 9f21832. Replaced the call to update_personas_async (which spawns inner threads) with a new _update_personas_via_pool function that:

  1. Checks rate limiting via can_update_persona(uid)
  2. Sets update timestamp via set_persona_update_timestamp(uid)
  3. Submits each individual persona update to the shared _conversation_bg_executor pool

This eliminates the nested thread anti-pattern — no more raw threading.Thread creation inside a pool worker.

@beastoin
Collaborator Author

Chaos Engineering Test Results — Thread Explosion

Test: 50 rapid conversation completions, each spawning 7 background tasks with 2-5s sleep (simulating slow LLM/DB work).

Metric                  | Vulnerable (main)      | Fixed (this PR)
Peak total threads      | 352                    | 34
Peak background threads | 350                    | 32 (pool cap)
Thread creation pattern | 50 × 7 = 350 unbounded | Queued at 32 workers

Verdict: PASS — Vulnerable explodes to 350 threads, fixed caps at 32.

Reproducer:

cd backend/testing/chaos-threadpool/
./run_chaos_test.sh

Test harness at backend/testing/chaos-threadpool/ — standalone Python, no Docker.
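The core of such a measurement can be sketched like this (an illustration of the idea, not the actual harness; it samples `threading.active_count()` while submitting sleeping tasks):

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

def measure_peak_threads(submit_fn, n_tasks=50):
    """Submit n_tasks sleeping tasks and return the peak active thread count seen."""
    peak = threading.active_count()
    for _ in range(n_tasks):
        submit_fn(lambda: time.sleep(0.1))
        peak = max(peak, threading.active_count())
    time.sleep(0.3)  # let the sleeping tasks drain
    return peak

pool = ThreadPoolExecutor(max_workers=32, thread_name_prefix="conv-bg")
bounded_peak = measure_peak_threads(lambda task: pool.submit(task))

# The old pattern: one raw thread per task, no cap.
unbounded_peak = measure_peak_threads(lambda task: threading.Thread(target=task).start())
pool.shutdown(wait=True)
```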

Kelvin (AI Agent) and others added 3 commits February 15, 2026 04:36
…read fix

- Add try/except to all executor-submitted functions to prevent silent
  failures: save_structured_vector, _extract_memories, _extract_trends,
  _save_action_items
- Register atexit.shutdown(wait=True) for graceful executor cleanup
- Replace update_personas_async (spawns raw threads inside pool worker)
  with _update_personas_via_pool that submits individual persona updates
  to the shared executor, eliminating the nested thread anti-pattern

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Wraps asyncio.run(auto_sync_action_items_batch) with try/except
to prevent silent swallowing of exceptions when submitted via
_conversation_bg_executor.submit().

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
12 tests covering:
- Executor setup: max_workers cap, atexit registration, submit returns Future
- Exception handling: logged vs swallowed, wrapped functions don't propagate
- Persona pool: rate limiting, empty list, per-persona submission, fault isolation
- Boundary: concurrent tasks capped at max_workers, contrast with raw threads

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@beastoin
Collaborator Author

Closing for now — will revisit and review later.

@beastoin beastoin closed this Feb 21, 2026
@github-actions
Contributor

Hey @beastoin 👋

Thank you so much for taking the time to contribute to Omi! We truly appreciate you putting in the effort to submit this pull request.

After careful review, we've decided not to merge this particular PR. Please don't take this personally — we genuinely try to merge as many contributions as possible, but sometimes we have to make tough calls based on:

  • Project standards — Ensuring consistency across the codebase
  • User needs — Making sure changes align with what our users need
  • Code best practices — Maintaining code quality and maintainability
  • Project direction — Keeping aligned with our roadmap and vision

Your contribution is still valuable to us, and we'd love to see you contribute again in the future! If you'd like feedback on how to improve this PR or want to discuss alternative approaches, please don't hesitate to reach out.

Thank you for being part of the Omi community! 💜
