
[OT-375][FEAT]: Minimize duplicate message execution #231

Merged
phonil merged 10 commits into develop from OT-375-feature/message on Apr 25, 2026


@phonil commented Apr 22, 2026

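📝 Work summary

Describe the work done in this PR

  • Added publisher confirms
  • Replaced the pessimistic lock with a CAS check
  • Added heartbeat logic that runs while a job is being processed
  • Introduced a delay queue to defer messages for jobs that are already in progress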

📷 Screenshots

☑️ Checklist

Please go through the checklist

  • Do the tests pass?
  • Have conflicts been resolved?
  • Has an issue been filed?
  • Have labels been added?

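#️⃣ Related issue

e.g. # issue number

closes #227

💬 Review requests

Note anything you would like reviewers to look at in particular

e.g. "Is it okay to handle exceptions this way?" / "Please look closely at the ~~ part"

Details are available in the 4/22 Notion page.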

Summary by CodeRabbit

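  • New features

    • Periodic heartbeat monitoring automatically refreshes the state of ingest jobs
    • Messages are republished through a retry delay queue, so ownership conflicts are retried automatically
  • Improvements

    • Improved the job orchestration control flow and added a flow for handling delayed messages
    • More reliable message publishing (publisher confirms) and a longer FFmpeg execution timeout
  • Chores

    • Added DB schema for the heartbeat timestamp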

@phonil self-assigned this Apr 22, 2026
@phonil added the feat (new feature implementation) label Apr 22, 2026

coderabbitai Bot commented Apr 22, 2026

Walkthrough

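Adds heartbeat-based job claiming (CAS) and a RabbitMQ delay-queue (retry) mechanism. A scheduler periodically refreshes heartbeat_at in the database, messages are republished to the delay queue when the CAS fails, and a heartbeat_at column is added to the DB schema.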

Changes

Cohort / File(s) | Summary
RabbitMQ delay queue configuration
apps/transcoder/src/main/java/com/ott/transcoder/config/RabbitConsumerConfig.java, apps/transcoder/src/main/java/com/ott/transcoder/queue/rabbit/DelayQueuePublisher.java
Adds constants for the delay exchange, queue, and routing key, a durable delay DirectExchange, a TTL-based delay queue (dead-lettering to the existing exchange and routing key), and its binding. DelayQueuePublisher publishes messages with the x-delayed=true header.
Heartbeat management
apps/transcoder/src/main/java/com/ott/transcoder/heartbeat/Heartbeat.java, .../HeartbeatScheduler.java, .../HeartbeatWriter.java, apps/transcoder/src/main/java/com/ott/transcoder/constant/IngestJobConstant.java
Adds HEARTBEAT constants (interval/timeout/DELAY_TTL), a ScheduledExecutorService wrapper (Heartbeat), the HeartbeatScheduler component, and HeartbeatWriter (@Transactional) for DB updates.
Orchestration and listener changes
apps/transcoder/src/main/java/com/ott/transcoder/job/JobOrchestrator.java, apps/transcoder/src/main/java/com/ott/transcoder/job/IngestJobStatusManager.java, apps/transcoder/src/main/java/com/ott/transcoder/queue/MessageListener.java, apps/transcoder/src/main/java/com/ott/transcoder/queue/rabbit/RabbitTranscodeListener.java
Adds a boolean delayed argument to MessageListener/RabbitTranscodeListener. JobOrchestrator.handle(...) takes a delayed flag and executeTranscoding() is split out. When CAS preemption fails, the message is published to the delay queue if delayed=false and dropped if delayed=true. IngestJobStatusManager gains isTerminal(), and startProcessing() now calls the tryPreempt CAS.
Persistence and migration
modules/domain/src/main/java/com/ott/domain/ingest_job/domain/IngestJob.java, modules/domain/src/main/java/com/ott/domain/ingest_job/repository/IngestJobRepository.java, modules/infra-db/.../V13__add_heartbeat_to_ingest_job.sql
Adds a heartbeatAt field to the entity. Removes the existing PESSIMISTIC_READ/forUpdate approach, adds tryPreempt(jobId, heartbeatTimeoutSec) based on a native conditional update, and adds updateHeartbeat(jobId). A DB migration adds heartbeat_at DATETIME NULL.
Publisher and configuration changes
apps/api-admin/src/main/java/com/ott/api_admin/publish/RabbitTranscodePublisher.java, apps/api-admin/src/main/resources/application.yml
Uses RabbitTemplate invoke and then waits for publisher confirms (waitForConfirmsOrDie(5_000)) to make publishing more reliable. Adds spring.rabbitmq.publisher-confirm-type=simple.
FFmpeg timeout adjustment
apps/transcoder/src/main/java/.../ProcessBuilderFfmpegExecutor.java
Extends the FFmpeg process wait timeout from 30 minutes to 1 hour.
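
To make the tryPreempt CAS from the persistence row concrete, here is a minimal in-memory sketch of a conditional update that reports an affected-row count. The class name, the status values, and above all the claim predicate (a job is claimable when it is PENDING, or PROCESSING with a missing or stale heartbeat) are assumptions for illustration; the actual WHERE clause of the native query is not shown in this PR page.

```java
import java.time.Instant;
import java.util.concurrent.ConcurrentHashMap;

/** In-memory stand-in for a conditional UPDATE; names and predicate are assumptions. */
class PreemptSketch {
    enum Status { PENDING, PROCESSING, SUCCESS, FAILED }

    static final class Job {
        final Status status;
        final Instant heartbeatAt; // null until the job is first claimed

        Job(Status status, Instant heartbeatAt) {
            this.status = status;
            this.heartbeatAt = heartbeatAt;
        }
    }

    private final ConcurrentHashMap<Long, Job> table = new ConcurrentHashMap<>();

    void insert(long jobId) {
        table.put(jobId, new Job(Status.PENDING, null));
    }

    /**
     * Returns 1 when this caller claimed the job and 0 otherwise, mirroring the
     * affected-row count of an UPDATE ... WHERE id = ? AND (claimable condition).
     */
    int tryPreempt(long jobId, int heartbeatTimeoutSec, Instant now) {
        int[] affected = {0};
        // computeIfPresent runs atomically per key, so only one caller can win.
        table.computeIfPresent(jobId, (id, job) -> {
            boolean stale = job.heartbeatAt == null
                    || job.heartbeatAt.isBefore(now.minusSeconds(heartbeatTimeoutSec));
            boolean claimable = job.status == Status.PENDING
                    || (job.status == Status.PROCESSING && stale);
            if (!claimable) {
                return job; // row unchanged, affected stays 0
            }
            affected[0] = 1;
            return new Job(Status.PROCESSING, now);
        });
        return affected[0];
    }
}
```

In this sketch a second worker that calls tryPreempt within the timeout gets 0, while a worker that arrives after the heartbeat has gone stale gets 1 and takes over the job.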

Sequence Diagram(s)

sequenceDiagram
    participant RL as RabbitTranscodeListener
    participant JO as JobOrchestrator
    participant SM as IngestJobStatusManager
    participant DB as Database
    participant HS as HeartbeatScheduler
    participant HW as HeartbeatWriter
    participant DQP as DelayQueuePublisher

    RL->>JO: handle(message, delayed=false)
    alt Already in a terminal state
        JO->>SM: isTerminal(jobId)
        SM->>DB: select status
        DB-->>SM: SUCCESS/FAILED
        SM-->>JO: true
        JO-->>RL: ACK (skip)
    else Non-terminal
        JO->>SM: startProcessing(jobId)
        SM->>DB: tryPreempt(jobId, timeout)
        alt Preemption succeeded (affected=1)
            DB-->>SM: 1
            SM-->>JO: true
            JO->>HS: start(jobId)
            HS->>HS: create ScheduledExecutorService
            HS-->>JO: Heartbeat
            loop every HEARTBEAT_INTERVAL_SEC
                HS->>HW: updateHeartbeat(jobId)
                HW->>DB: update heartbeat_at = NOW()
            end
            JO->>JO: executeTranscoding(...)
            JO->>HS: close() (stop scheduler)
        else Preemption failed
            DB-->>SM: 0
            SM-->>JO: false
            alt delayed == false
                JO->>DQP: publishToDelay(message)
                DQP->>DB: publish to delay exchange (x-delayed=true)
                JO-->>RL: ACK (re-queued)
            else delayed == true
                JO-->>RL: ACK (drop)
            end
        end
    end
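
The branching in the diagram above can be sketched as plain control flow. This is only an illustration: OrchestratorSketch, the Outcome enum, and the functional-interface collaborators are hypothetical stand-ins for JobOrchestrator, IngestJobStatusManager, and DelayQueuePublisher, and the heartbeat start/close around the transcoding step is reduced to a comment.

```java
import java.util.function.LongConsumer;
import java.util.function.LongPredicate;

/** Control-flow sketch of the sequence diagram; every type here is a hypothetical stand-in. */
class OrchestratorSketch {
    enum Outcome { SKIPPED_TERMINAL, PROCESSED, REQUEUED_WITH_DELAY, DROPPED }

    private final LongPredicate isTerminal;      // stands in for isTerminal(jobId)
    private final LongPredicate startProcessing; // true when the CAS preemption succeeded
    private final LongConsumer publishToDelay;   // stands in for the delay queue publisher
    private final LongConsumer transcode;        // stands in for executeTranscoding(...)

    OrchestratorSketch(LongPredicate isTerminal, LongPredicate startProcessing,
                       LongConsumer publishToDelay, LongConsumer transcode) {
        this.isTerminal = isTerminal;
        this.startProcessing = startProcessing;
        this.publishToDelay = publishToDelay;
        this.transcode = transcode;
    }

    /** Every outcome corresponds to an ACK in the diagram; only the side effects differ. */
    Outcome handle(long jobId, boolean delayed) {
        if (isTerminal.test(jobId)) {
            return Outcome.SKIPPED_TERMINAL;
        }
        if (startProcessing.test(jobId)) {
            // The real flow starts a heartbeat here and closes it after transcoding.
            transcode.accept(jobId);
            return Outcome.PROCESSED;
        }
        if (!delayed) {
            publishToDelay.accept(jobId); // first conflict: retry once after the delay TTL
            return Outcome.REQUEUED_WITH_DELAY;
        }
        return Outcome.DROPPED; // already retried once through the delay queue
    }
}
```

Modelling the result as an enum keeps the four branches easy to assert on in a unit test, which is the main reason for this shape.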

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested reviewers

  • marulog
  • arlen02-01
🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name | Status | Explanation | Resolution
Docstring Coverage | ⚠️ Warning | Docstring coverage is 37.50% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold.
Out of Scope Changes check | ❓ Inconclusive | The FFmpeg timeout change (30 minutes → 1 hour) is mentioned under the 'resolution results' item, but its relationship to the official scope of the linked issue is not clearly documented. | Confirm whether the FFmpeg timeout change falls within the scope of OT-375 and, if necessary, split it into a separate issue or add a justification to the PR description.
✅ Passed checks (3 passed)
Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled.
Title check | ✅ Passed | The PR title '[OT-375][FEAT]: Minimize duplicate message execution' clearly and concisely summarizes the main change (minimizing duplicate message execution) and is relevant to the changes across the codebase.
Linked Issues check | ✅ Passed | The PR meets all the main goals of #227: improved duplicate message handling (CAS, delay queue), improved message loss handling (DLQ, Publisher-Confirm), DLQ usage (configured), and a change to the RabbitMQ ack approach (Publisher-Confirm added).


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 4

🧹 Nitpick comments (4)
apps/transcoder/src/main/java/com/ott/transcoder/heartbeat/HeartbeatScheduler.java (1)

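30-41: [P2] Recommend naming the scheduler thread and marking it as a daemon

Executors.newSingleThreadScheduledExecutor() creates a non-daemon thread by default, named in the form pool-N-thread-1, which is inconvenient for debugging and shutdown. If Heartbeat#close() is not called because of an exceptional condition, JVM shutdown can be delayed, and it is hard to tell from the logs which job a heartbeat thread belongs to.

♻️ Suggestion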
-    public Heartbeat start(Long jobId) {
-        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
+    public Heartbeat start(Long jobId) {
+        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
+            Thread t = new Thread(r, "heartbeat-job-" + jobId);
+            t.setDaemon(true);
+            return t;
+        });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@apps/transcoder/src/main/java/com/ott/transcoder/heartbeat/HeartbeatScheduler.java`
around lines 30 - 41, The scheduler created in Heartbeat.start currently uses
Executors.newSingleThreadScheduledExecutor() which creates a non-daemon thread
with an opaque name; change it to create the ScheduledExecutorService with a
custom ThreadFactory that sets threads as daemon and names them with the jobId
(e.g., "heartbeat-%d-job-%d" or similar) so threads are identifiable and won't
block JVM shutdown, e.g., build the ThreadFactory inside Heartbeat.start (or a
private helper) and call
Executors.newSingleThreadScheduledExecutor(yourThreadFactory); also verify
Heartbeat.close properly shuts down the executor (shutdownNow/awaitTermination)
to avoid leaks.
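
As a self-contained illustration of this nitpick together with the close() behavior it asks to verify, the sketch below combines a named daemon thread with shutdownNow() followed by awaitTermination(). HeartbeatSketch, its constructor parameters, and the 5-second wait are hypothetical; the real Heartbeat and HeartbeatScheduler classes are not shown on this page. One tradeoff to keep in mind is that shutdownNow() interrupts a beat that is in flight, which may or may not be acceptable for a DB write.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical heartbeat handle: named daemon thread plus a bounded, cancelling close(). */
class HeartbeatSketch implements AutoCloseable {
    private final ScheduledExecutorService scheduler;

    HeartbeatSketch(long jobId, Runnable beat, long intervalMillis) {
        this.scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "heartbeat-job-" + jobId); // identifiable in logs
            t.setDaemon(true);                                   // never blocks JVM exit
            return t;
        });
        scheduler.scheduleAtFixedRate(beat, 0, intervalMillis, TimeUnit.MILLISECONDS);
    }

    @Override
    public void close() {
        scheduler.shutdownNow(); // cancel queued beats and interrupt a running one
        try {
            if (!scheduler.awaitTermination(5, TimeUnit.SECONDS)) {
                System.err.println("heartbeat thread did not stop within 5s");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the interrupt flag for the caller
        }
    }

    boolean isStopped() {
        return scheduler.isTerminated();
    }
}
```

Because close() declares no checked exception, the handle can be used in a try-with-resources block around the transcoding call, so the scheduler is stopped on both the success and the failure path.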
modules/infra-db/src/main/resources/db/migration/V13__add_heartbeat_to_ingest_job.sql (1)

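3-4: [P2] Consider a standalone index on heartbeat_at (optional)

heartbeat_at appears in the WHERE clause of the tryPreempt CAS condition, but rows are generally looked up by PK (id), so most cases are fine. If an "expired job scan" use case (for example, a batch that cleans up dead workers) might be added later, consider adding a heartbeat_at index in a future migration. It is not required within the scope of this change.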

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@modules/infra-db/src/main/resources/db/migration/V13__add_heartbeat_to_ingest_job.sql`
around lines 3 - 4, Add a non-unique index on ingest_job.heartbeat_at to support
future "expired job scan" queries referenced by tryPreempt: create a new
migration (or extend the V13 migration) that issues an ALTER TABLE ingest_job
ADD INDEX ingest_job_heartbeat_at_idx (heartbeat_at); ensure the index name is
unique in the schema, that it handles NULL values, and run migrations to
validate there are no conflicts with existing indices.
apps/transcoder/src/main/java/com/ott/transcoder/constant/IngestJobConstant.java (1)

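51-63: [P2] Recommend guarding the timing invariants with a test

The comments state the assumptions "Delay Queue TTL > HEARTBEAT_TIMEOUT" and "TIMEOUT = INTERVAL * 3". If someone changes one constant in isolation, the whole cascade (10s → 30s → 40s) breaks and subtle bugs can appear in the preemption/recovery logic. Pinning the invariants explicitly, either with a unit test or with a check in a static initializer block as shown below, prevents regressions.

♻️ Suggestion: invariant assertion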
     public static final class HeartbeatConstant {
         private HeartbeatConstant() {
         }
 
         public static final int HEARTBEAT_INTERVAL_SEC = 10;
         public static final int HEARTBEAT_TIMEOUT_SEC = 30;
         public static final int DELAY_QUEUE_TTL_MS = 40_000;
+
+        static {
+            if (HEARTBEAT_TIMEOUT_SEC <= HEARTBEAT_INTERVAL_SEC
+                    || DELAY_QUEUE_TTL_MS <= HEARTBEAT_TIMEOUT_SEC * 1000L) {
+                throw new IllegalStateException(
+                        "Heartbeat timing invariant broken: expected INTERVAL < TIMEOUT < DELAY_TTL");
+            }
+        }
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@apps/transcoder/src/main/java/com/ott/transcoder/constant/IngestJobConstant.java`
around lines 51 - 63, Add explicit runtime assertions in the HeartbeatConstant
class to lock the documented invariants: assert that HEARTBEAT_TIMEOUT_SEC ==
HEARTBEAT_INTERVAL_SEC * 3 and assert that DELAY_QUEUE_TTL_MS >
HEARTBEAT_TIMEOUT_SEC * 1000; put these checks in a static initialization block
inside the HeartbeatConstant nested class so they run at class load time and
will fail fast if someone changes HEARTBEAT_INTERVAL_SEC, HEARTBEAT_TIMEOUT_SEC,
or DELAY_QUEUE_TTL_MS.
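
For the unit-test variant mentioned in this comment, the invariant can be written as a small pure function that a test calls with the real constants. The sketch below is hypothetical: HeartbeatTimingCheck and holds(...) are illustrative names, and the constant values are copied from the HeartbeatConstant snippet above. It checks the stricter form from the prompt (TIMEOUT equals INTERVAL * 3), not the looser ordering used in the suggested static block.

```java
/** Hypothetical helper that states the documented timing invariants as one predicate. */
class HeartbeatTimingCheck {
    // Values copied from the HeartbeatConstant snippet in this review.
    static final int HEARTBEAT_INTERVAL_SEC = 10;
    static final int HEARTBEAT_TIMEOUT_SEC = 30;
    static final int DELAY_QUEUE_TTL_MS = 40_000;

    /** True when TIMEOUT == INTERVAL * 3 and the delay TTL outlasts the timeout. */
    static boolean holds(int intervalSec, int timeoutSec, int delayTtlMs) {
        return timeoutSec == intervalSec * 3
                && delayTtlMs > timeoutSec * 1000L; // compare in milliseconds, as a long
    }
}
```

A test would then assert holds(HEARTBEAT_INTERVAL_SEC, HEARTBEAT_TIMEOUT_SEC, DELAY_QUEUE_TTL_MS) and fail the build, instead of the class load, when a constant drifts.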
apps/transcoder/src/main/java/com/ott/transcoder/queue/rabbit/DelayQueuePublisher.java (1)

25-37: [P2] The "x-delayed" header key must be extracted into a constant

Both the listener (RabbitTranscodeListener.java:30) and the publisher (DelayQueuePublisher.java:31) use the same string "x-delayed". RabbitConsumerConfig does not currently define a constant for it, so a typo could cause a silent failure. Extract it into a shared constant and reference it from both sides.

Suggested change

Add to RabbitConsumerConfig.java:

public static final String HEADER_X_DELAYED = "x-delayed";

Modify DelayQueuePublisher.java:

-                    msg.getMessageProperties().setHeader("x-delayed", true);
+                    msg.getMessageProperties().setHeader(RabbitConsumerConfig.HEADER_X_DELAYED, true);

Modify RabbitTranscodeListener.java:

-            @Header(name = "x-delayed", required = false, defaultValue = "false") boolean delayed
+            @Header(name = RabbitConsumerConfig.HEADER_X_DELAYED, required = false, defaultValue = "false") boolean delayed
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@apps/transcoder/src/main/java/com/ott/transcoder/queue/rabbit/DelayQueuePublisher.java`
around lines 25 - 37, Add a shared constant for the header key and use it in
both publisher and listener: define public static final String HEADER_X_DELAYED
= "x-delayed" in RabbitConsumerConfig, then replace the literal "x-delayed" in
DelayQueuePublisher.publishToDelay (the msg.getMessageProperties().setHeader
call) and the corresponding usage in RabbitTranscodeListener to reference
RabbitConsumerConfig.HEADER_X_DELAYED so both sides use the same constant.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@apps/transcoder/src/main/java/com/ott/transcoder/heartbeat/Heartbeat.java`:
- Around line 17-20: The current close() only calls scheduler.shutdown(), which
allows already-scheduled or in-flight updateHeartbeat tasks to run and possibly
update heartbeat_at after JobOrchestrator transitions; change close() to call
scheduler.shutdownNow() to cancel queued tasks, then call
scheduler.awaitTermination(...) with a short timeout to wait for in-flight
updateHeartbeat to finish, and handle InterruptedException by restoring the
thread interrupt; ensure references to scheduler, close(), updateHeartbeat, and
JobOrchestrator/heartbeat_at are used when updating the method.

In
`@apps/transcoder/src/main/java/com/ott/transcoder/job/IngestJobStatusManager.java`:
- Around line 48-52: The isTerminal method in IngestJobStatusManager incorrectly
treats a missing ingest job as terminal by using orElse(true); change this to
surface missing jobs instead of swallowing them: update isTerminal (or its
caller) so that when ingestJobRepository.findById(ingestJobId) returns empty it
does not return true but either throws a JobNotFoundException (or another
runtime exception) or returns an Optional/boolean that clearly indicates
"missing" so JobOrchestrator can route the message to DLQ/retry handling instead
of acknowledging; reference the isTerminal method in IngestJobStatusManager and
the consumer logic in JobOrchestrator to implement and wire this behavior.

In `@apps/transcoder/src/main/java/com/ott/transcoder/job/JobOrchestrator.java`:
- Line 18: JobOrchestrator currently depends directly on the Rabbit-specific
DelayQueuePublisher which breaks bean creation when the property
transcoder.messaging.provider != rabbit; introduce a domain port interface
(e.g., DelayMessagePublisher) and change JobOrchestrator to depend on that
interface instead of DelayQueuePublisher, then provide two conditional
implementations: the existing RabbitDelayMessagePublisher (wraps/renames
DelayQueuePublisher logic and is
`@ConditionalOnProperty`(name="transcoder.messaging.provider",
havingValue="rabbit")) and a NoOpDelayMessagePublisher (registered when the
rabbit bean is absent, e.g., `@ConditionalOnMissingBean` or `@ConditionalOnProperty`
with inverse condition) so the application context can always create
JobOrchestrator regardless of the messaging provider.

In `@apps/transcoder/src/main/java/com/ott/transcoder/queue/MessageListener.java`:
- Around line 10-15: Move the misplaced `@param` delayed from the interface-level
Javadoc into a proper method-level Javadoc for
MessageListener.listen(TranscodeMessage message, boolean delayed), add a `@param`
entry documenting the message parameter (type TranscodeMessage) and clarify what
delayed means, and then add an identical method-level Javadoc to the
implementing class RabbitTranscodeListener.listen(...) so the interface contract
is documented consistently across MessageListener.listen and
RabbitTranscodeListener.listen.
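
As an illustration of the IngestJobStatusManager.isTerminal comment above, the sketch below surfaces a missing job as an exception where orElse(true) would have reported it as terminal. It is a hypothetical, repository-free version: the Map stands in for ingestJobRepository, JobNotFoundException is the name suggested in the comment but defined here only for the example, and treating SUCCESS and FAILED as the terminal states follows the sequence diagram.

```java
import java.util.Map;
import java.util.Optional;

/** Hypothetical isTerminal that refuses to treat a missing job as finished. */
class TerminalCheckSketch {
    enum Status { PENDING, PROCESSING, SUCCESS, FAILED }

    static final class JobNotFoundException extends RuntimeException {
        JobNotFoundException(long jobId) {
            super("ingest job not found: " + jobId);
        }
    }

    private final Map<Long, Status> jobs; // stands in for the repository lookup

    TerminalCheckSketch(Map<Long, Status> jobs) {
        this.jobs = jobs;
    }

    boolean isTerminal(long jobId) {
        Status status = Optional.ofNullable(jobs.get(jobId))
                // Surface the missing row so the caller can route to DLQ/retry handling.
                .orElseThrow(() -> new JobNotFoundException(jobId));
        return status == Status.SUCCESS || status == Status.FAILED;
    }
}
```

Whether the consumer should then dead-letter or retry the message is a policy decision that this sketch leaves to the caller.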


ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 90642bb5-f745-408b-8788-5c7f9c7fed98

📥 Commits

Reviewing files that changed from the base of the PR and between 623fee8 and 13ea4be.

📒 Files selected for processing (13)
  • apps/transcoder/src/main/java/com/ott/transcoder/config/RabbitConsumerConfig.java
  • apps/transcoder/src/main/java/com/ott/transcoder/constant/IngestJobConstant.java
  • apps/transcoder/src/main/java/com/ott/transcoder/heartbeat/Heartbeat.java
  • apps/transcoder/src/main/java/com/ott/transcoder/heartbeat/HeartbeatScheduler.java
  • apps/transcoder/src/main/java/com/ott/transcoder/heartbeat/HeartbeatWriter.java
  • apps/transcoder/src/main/java/com/ott/transcoder/job/IngestJobStatusManager.java
  • apps/transcoder/src/main/java/com/ott/transcoder/job/JobOrchestrator.java
  • apps/transcoder/src/main/java/com/ott/transcoder/queue/MessageListener.java
  • apps/transcoder/src/main/java/com/ott/transcoder/queue/rabbit/DelayQueuePublisher.java
  • apps/transcoder/src/main/java/com/ott/transcoder/queue/rabbit/RabbitTranscodeListener.java
  • modules/domain/src/main/java/com/ott/domain/ingest_job/domain/IngestJob.java
  • modules/domain/src/main/java/com/ott/domain/ingest_job/repository/IngestJobRepository.java
  • modules/infra-db/src/main/resources/db/migration/V13__add_heartbeat_to_ingest_job.sql

Comment thread apps/transcoder/src/main/java/com/ott/transcoder/job/IngestJobStatusManager.java Outdated

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@apps/api-admin/src/main/java/com/ott/api_admin/publish/RabbitTranscodePublisher.java`:
- Around line 18-26: The publish() currently calls rabbitTemplate.invoke and
waitForConfirmsOrDie(5_000) per message which can block
OutboxPoller.pollAndPublish() (scheduled fixedDelay=10000) for up to 50×5s;
change to a non-blocking/batched confirm strategy: either collect messages in
the poll cycle and call rabbitTemplate.invoke once to send all and then a single
operations.waitForConfirmsOrDie(...) (define and document the batch rollback
semantics in publish()/OutboxPoller), or reduce the per-message timeout to 1–2s
and switch to asynchronous confirms with correlation and Outbox state updates;
also audit DelayQueuePublisher.publishToDelay() for symmetry (it uses
convertAndSend without confirms) and align its reliability strategy with
publish() to avoid asymmetric loss. Ensure references: publish(),
OutboxPoller.pollAndPublish(), rabbitTemplate.invoke(...),
operations.waitForConfirmsOrDie(...), and DelayQueuePublisher.publishToDelay()
when implementing the chosen approach.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 588b11b6-ce8c-4f2b-8201-76e02bad79d8

📥 Commits

Reviewing files that changed from the base of the PR and between 13ea4be and e61ccd8.

📒 Files selected for processing (4)
  • apps/api-admin/src/main/java/com/ott/api_admin/publish/RabbitTranscodePublisher.java
  • apps/api-admin/src/main/resources/application.yml
  • apps/transcoder/src/main/java/com/ott/transcoder/ffmpeg/execution/processbuilder/ProcessBuilderFfmpegExecutor.java
  • apps/transcoder/src/main/java/com/ott/transcoder/job/IngestJobStatusManager.java
✅ Files skipped from review due to trivial changes (2)
  • apps/api-admin/src/main/resources/application.yml
  • apps/transcoder/src/main/java/com/ott/transcoder/ffmpeg/execution/processbuilder/ProcessBuilderFfmpegExecutor.java
🚧 Files skipped from review as they are similar to previous changes (1)
  • apps/transcoder/src/main/java/com/ott/transcoder/job/IngestJobStatusManager.java

Comment on lines +18 to +26
rabbitTemplate.invoke(operations -> {
    operations.convertAndSend(
            TranscodeConstants.EXCHANGE_NAME,
            TranscodeConstants.ROUTING_KEY,
            message
    );
    operations.waitForConfirmsOrDie(5_000);
    return null;
});

⚠️ Potential issue | 🟠 Major

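P1: Risk of blocking the scheduler thread with a 5-second synchronous wait per message

publish() is called from OutboxPoller.pollAndPublish() (@Scheduled(fixedDelay=10000), which processes up to 50 messages sequentially per cycle). Because each message waits synchronously on waitForConfirmsOrDie(5_000), a temporarily slow broker can block a single cycle for up to 50 × 5s = 250s in the worst case, which effectively defeats fixedDelay=10s and approaches lockAtMostFor=5m. Under normal conditions confirms arrive within milliseconds, so this is not a big problem, but the impact is amplified when the broker degrades.

One of the following is recommended.

  • On the poller side, publish all 50 messages and then call waitForConfirmsOrDie once at the end, inside a single invoke (batch confirm). In that case, a batch handling/rollback policy for the failure of one message must be defined.
  • Alternatively, use a shorter timeout (for example, 1 to 2 seconds) and switch to correlated asynchronous confirms plus Outbox state updates.
  • At a minimum, consider partial polling or short-circuiting when a large backlog builds up.

Also, apps/transcoder/.../DelayQueuePublisher.publishToDelay() uses only convertAndSend without confirms (see context snippet 2), so the publish reliability policy is asymmetric across publishers. If message loss on the retry/delay path affects this PR's intent of "minimizing duplicate message execution", consider applying the same publisher-confirm pattern there as well.

Flagged from a publish reliability and scheduler throughput perspective, per the coding guideline item "P0/P1 checks: ... validate RabbitMQ/DB state transitions and DLQ/ack behavior correctness (avoid duplicate execution regressions)".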
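
The batch-confirm option and the 50 × 5s worst case above can be sketched without a broker. Everything in this block is hypothetical: ConfirmingSender is an illustrative interface, not a Spring AMQP type, and the all-or-nothing failure policy (treat the whole batch as unpublished and retry it later) is just one possible answer to the rollback question raised in the first bullet.

```java
import java.util.List;

/** Broker-free sketch of "send the batch, wait for confirms once"; all names are hypothetical. */
class BatchConfirmSketch {
    /** Illustrative stand-in for a confirm-enabled channel. */
    interface ConfirmingSender {
        void send(String payload);

        /** Expected to throw an unchecked exception on a nack or a timeout. */
        void waitForConfirmsOrDie(long timeoutMillis);
    }

    /** Returns true only when every message in the batch was confirmed. */
    static boolean publishBatch(ConfirmingSender sender, List<String> payloads, long timeoutMillis) {
        try {
            for (String payload : payloads) {
                sender.send(payload);
            }
            sender.waitForConfirmsOrDie(timeoutMillis); // one wait for the whole batch
            return true;
        } catch (RuntimeException e) {
            // Assumed policy: leave all outbox rows unpublished and retry next cycle.
            return false;
        }
    }

    /** Upper bound on the time one poll cycle can spend waiting for confirms. */
    static long worstCaseBlockMillis(int messages, long timeoutMillis, boolean batched) {
        return batched ? timeoutMillis : messages * timeoutMillis;
    }
}
```

With 50 messages and a 5,000 ms timeout, worstCaseBlockMillis gives 250,000 ms per message-level wait and 5,000 ms for the batched form. Retrying a whole batch can republish messages that were already delivered, so this policy leans on the consumer-side CAS to absorb the duplicates.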


@phonil merged commit 3ae9ef1 into develop Apr 25, 2026
1 check passed
@phonil deleted the OT-375-feature/message branch May 2, 2026 06:40

Labels

feat (new feature implementation)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[OT-375] [FEAT]: Improve message loss/duplication handling

1 participant