fix: Resolve gevent DB thread-safety issue and add STAR analysis document #82
Merged
Document the DatabaseWrapper thread-sharing error that occurs when Celery alert worker uses gevent pool with concurrency=100. Includes root cause analysis, solution comparison, and reference materials.
Add Layer 1 root cause: opentelemetry-instrument imports ssl before Celery calls gevent.monkey.patch_all(), causing incomplete patching. Include actual container startup logs showing MonkeyPatchWarning and _after_fork_in_child AssertionError. Add cause hierarchy diagram.
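The import-order problem described above can be detected at startup. The following is a minimal sketch, not code from this PR: the helper names and the SENSITIVE_MODULES list are hypothetical, and only sys.modules and gevent.monkey.is_module_patched() are real APIs. It reports which sensitive modules are already loaded and whether gevent has patched threading, without importing gevent itself.

```python
import sys

# Illustrative assumption: modules worth checking before gevent patches.
# This is not gevent's own list.
SENSITIVE_MODULES = ("ssl", "urllib3", "requests")


def imported_before_patch(names=SENSITIVE_MODULES):
    """Return the listed modules that are already loaded in this process."""
    return [name for name in names if name in sys.modules]


def is_gevent_patched(module_name="threading"):
    """Report whether gevent has patched a module, without importing gevent."""
    monkey = sys.modules.get("gevent.monkey")
    # If gevent.monkey was never imported, nothing can have been patched.
    return bool(monkey and monkey.is_module_patched(module_name))
```

Calling imported_before_patch() just before the worker's patch_all() call and logging a non-empty result would surface the same condition that MonkeyPatchWarning reports.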
Restructure from verbose reference format to storytelling flow. Remove deployment-specific instructions, reduce redundancy, curate references to official docs + 3 GitHub issues + 3 enterprise blogs.
…nalysis
- Add 5 mermaid diagrams: prefork vs gevent comparison, sequence diagram for greenlet retry failure, late patching flow, cause layer diagram, quadrant chart for solution trade-offs
- Make capture placeholders visible (blockquote format, not HTML comments)
- Add personal Spring developer perspective on monkey-patching
- Expand solution section with detailed trade-off analysis per option
Replace structured method-by-method format with storytelling approach where trade-offs emerge naturally through the elimination process.
…rs in prod Local runs celery directly (correct monkey-patch order); the deployment uses the opentelemetry-instrument wrapper, which imports ssl before gevent patches.
OTel late patching prevents threading.local from being greenlet-local, causing stale connections to be shared across greenlets. Transfer connection ownership to current greenlet before close_old_connections() so both task code and Celery's post-task cleanup pass validation.
close_old_connections() alone fails because close() also validates thread sharing. Must reset _thread_ident to current greenlet first.
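The ordering constraint above (take ownership first, then close) can be illustrated with a toy model. This is a sketch under stated assumptions: FakeWrapper, run_in_thread, and close_with_ownership_transfer are hypothetical names, the class only mimics the validate-on-close behavior described in this commit and is not Django's DatabaseWrapper, and OS threads stand in for greenlets.

```python
import threading


class FakeWrapper:
    """Toy stand-in for a DB connection wrapper; NOT Django's DatabaseWrapper."""

    def __init__(self):
        # Remember which thread created the wrapper.
        self._thread_ident = threading.get_ident()
        self.closed = False

    def validate_thread_sharing(self):
        if self._thread_ident != threading.get_ident():
            raise RuntimeError("wrapper used outside the thread that created it")

    def close(self):
        # As described above, close() validates ownership before closing.
        self.validate_thread_sharing()
        self.closed = True


def run_in_thread(fn):
    """Run fn in a new thread and return the exception it raised, if any."""
    errors = []

    def target():
        try:
            fn()
        except Exception as exc:
            errors.append(exc)

    worker = threading.Thread(target=target)
    worker.start()
    worker.join()
    return errors[0] if errors else None


def close_with_ownership_transfer(conn):
    conn._thread_ident = threading.get_ident()  # take ownership first
    conn.close()                                # validation now passes


stale = FakeWrapper()  # created in the main thread
plain_error = run_in_thread(stale.close)  # RuntimeError: wrong owner
transfer_error = run_in_thread(lambda: close_with_ownership_transfer(stale))  # None
```

In the real workaround, the same reset would presumably be applied to each Django connection before calling close_old_connections(). Because it relies on the private _thread_ident attribute, it is fragile by design.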
…ead-safety # Conflicts: # tasks/notification_tasks.py
…cause fix) Remove the existing workaround that depended on the Django private API (_thread_ident), and resolve the root cause by correcting the monkey-patch order through the OTel environment variable (OTEL_PYTHON_AUTO_INSTRUMENTATION_EXPERIMENTAL_GEVENT_PATCH).
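Since this fix depends entirely on an environment variable being present in the deployment, a small startup guard can make a missing setting visible. The sketch below is hypothetical and not part of this PR. The expected value "patch_all" is an assumption about how the deployment configures the variable; it is not stated in this PR.

```python
import os

GEVENT_PATCH_ENV = "OTEL_PYTHON_AUTO_INSTRUMENTATION_EXPERIMENTAL_GEVENT_PATCH"


def gevent_patch_requested(environ=None, expected="patch_all"):
    """Return True when the OTel gevent-patch variable holds the expected value.

    `expected` defaults to "patch_all", which is an assumption about the
    deployment's configuration rather than something this PR states.
    """
    env = os.environ if environ is None else environ
    return env.get(GEVENT_PATCH_ENV) == expected
```

A worker entrypoint could log a warning when gevent_patch_requested() returns False, so that a deployment without the variable does not silently fall back to late patching.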
- Kombu consumer receives detections.completed events (single thread)
- send_notification dispatched to Celery gevent pool via .delay()
- notification_tasks.py converted to @shared_task with autoretry
- fcm_queue added to celery.py for FCM task routing
- start_alert_worker.sh runs 2 processes (POSIX sh compatible)
- GEVENT_DB_THREAD_SAFETY.md cleaned up to focus on concurrency issue
Summary
- Fix the "DatabaseWrapper objects created in a thread can only be used in that same thread" error
- In notification_tasks.py, reset _thread_ident to the current greenlet ID at task start, then clean up stale connections
- Add the STAR analysis document (docs/GEVENT_DB_THREAD_SAFETY.md)
Root Cause
The opentelemetry-instrument wrapper imports ssl/urllib3 before Celery does, so gevent.monkey.patch_all() fails to make threading.local() greenlet-local. As a result, every greenlet shares the same django.db.connections dict, which produces a thread ID mismatch.
Fix
Test plan