[OT-311] [CHORE]: Apply Grafana templates and add required configuration #171
Conversation
Walkthrough
Adds node-exporter SSM parameter storage and remote deployment steps to the CI/CD workflows, and adds node-exporter targets and … to the Prometheus scrape configuration.
Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant CI as "CI/CD workflow"
    participant EC2 as "EC2 instance"
    participant SSM as "AWS SSM"
    participant Prom as "Prometheus"
    CI->>EC2: Select target instance and query PrivateIpAddress
    EC2-->>CI: PrivateIpAddress
    CI->>SSM: Store "<private_ip>:9100" in SSM (NODE_EXPORTER_*_SSM_PARAM)
    CI->>EC2: Remove existing node-exporter container, pull image, run node-exporter
    EC2-->>CI: Confirm node-exporter process is running
    Prom->>SSM: Look up NODE_EXPORTER_TARGET / MACHINE_NODE_EXPORTER_TARGET
    SSM-->>Prom: Return target address (IP:9100)
    Prom->>Prom: Collect node-exporter metrics via scrape job
```
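The SSM-to-Prometheus handoff in the diagram ultimately renders the `__NODE_EXPORTER_TARGET__` / `__MACHINE_NODE_EXPORTER_TARGET__` placeholders from `prometheus.prod.yml.tpl` into a concrete config. The sketch below illustrates that substitution step with hardcoded sample values standing in for the real `aws ssm get-parameter` calls; the `sed` invocation is an assumption for illustration, not the workflow's actual code.

```shell
#!/usr/bin/env sh
# Illustrative sketch: render prometheus.prod.yml from its template by
# substituting SSM-resolved targets. Values are hardcoded here in place of
# the `aws ssm get-parameter` lookups the workflow performs.
set -eu

NODE_EXPORTER_TARGET="10.0.1.23:9100"          # would come from SSM
MACHINE_NODE_EXPORTER_TARGET="10.0.2.45:9100"  # would come from SSM

# Minimal stand-in for apps/monitoring/prometheus/prometheus.prod.yml.tpl
cat > prometheus.prod.yml.tpl <<'EOF'
scrape_configs:
  - job_name: "node-exporter"
    static_configs:
      - targets: ["__NODE_EXPORTER_TARGET__"]
  - job_name: "machine-node-exporter"
    static_configs:
      - targets: ["__MACHINE_NODE_EXPORTER_TARGET__"]
EOF

# Replace the placeholder tokens with the resolved ip:9100 addresses.
sed -e "s|__NODE_EXPORTER_TARGET__|${NODE_EXPORTER_TARGET}|" \
    -e "s|__MACHINE_NODE_EXPORTER_TARGET__|${MACHINE_NODE_EXPORTER_TARGET}|" \
    prometheus.prod.yml.tpl > prometheus.prod.yml

grep 'targets' prometheus.prod.yml
```

In the real workflow the rendered file is what the Prometheus container mounts, so a stale or empty SSM value propagates directly into the scrape targets.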
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
✨ Finishing Touches
🧪 Generate unit tests (beta)
📝 Coding Plan
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
@CodeRabbit please review the code, check for convention issues as well, and post each item as a separate comment.
Please write each review item as a separate comment, covering both code quality and convention issues.
🧠 Learnings used
✅ Actions performed: Full review triggered.
@CodeRabbit are you still reviewing..?
🧠 Learnings used
✅ Actions performed: Full review triggered.
Actionable comments posted: 5
🧹 Nitpick comments (1)
.github/workflows/deploy-monitoring.yml (1)
110-116: Include the missing keys in the target-validation failure message.
Lines 110-116 are correct as a validation step, but the error message is generic, so it is hard to tell at a glance which target is empty. Printing the missing keys alongside the error speeds up incident response.
🔧 Suggested fix

```diff
-if [ -z "$USER_API_TARGET" ] || [ "$USER_API_TARGET" = "None" ] || \
-   [ -z "$ADMIN_API_TARGET" ] || [ "$ADMIN_API_TARGET" = "None" ] || \
-   [ -z "$TRANSCODER_TARGET" ] || [ "$TRANSCODER_TARGET" = "None" ] || \
-   [ -z "$NODE_EXPORTER_TARGET" ] || [ "$NODE_EXPORTER_TARGET" = "None" ] || \
-   [ -z "$MACHINE_NODE_EXPORTER_TARGET" ] || [ "$MACHINE_NODE_EXPORTER_TARGET" = "None" ]; then
-  echo "One or more scrape targets are empty. Check SSM parameter values." >&2
+MISSING_KEYS=()
+[ -z "$USER_API_TARGET" ] || [ "$USER_API_TARGET" = "None" ] && MISSING_KEYS+=("USER_API_TARGET")
+[ -z "$ADMIN_API_TARGET" ] || [ "$ADMIN_API_TARGET" = "None" ] && MISSING_KEYS+=("ADMIN_API_TARGET")
+[ -z "$TRANSCODER_TARGET" ] || [ "$TRANSCODER_TARGET" = "None" ] && MISSING_KEYS+=("TRANSCODER_TARGET")
+[ -z "$NODE_EXPORTER_TARGET" ] || [ "$NODE_EXPORTER_TARGET" = "None" ] && MISSING_KEYS+=("NODE_EXPORTER_TARGET")
+[ -z "$MACHINE_NODE_EXPORTER_TARGET" ] || [ "$MACHINE_NODE_EXPORTER_TARGET" = "None" ] && MISSING_KEYS+=("MACHINE_NODE_EXPORTER_TARGET")
+if [ "${#MISSING_KEYS[@]}" -gt 0 ]; then
+  echo "Empty scrape targets: ${MISSING_KEYS[*]}. Check SSM parameter values." >&2
   exit 1
 fi
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/deploy-monitoring.yml around lines 110 - 116, The current validation block that checks USER_API_TARGET, ADMIN_API_TARGET, TRANSCODER_TARGET, NODE_EXPORTER_TARGET, and MACHINE_NODE_EXPORTER_TARGET should be changed to build a list of which variables are empty or "None" and include those keys in the error message; locate the if-check referencing those variables, compute a missing_targets string/array by testing each variable (e.g., append "USER_API_TARGET" when [ -z "$USER_API_TARGET" ] || [ "$USER_API_TARGET" = "None" ]), and replace the generic echo with an echo that prints "Missing or empty scrape targets: $missing_targets" (then exit 1) so the failing keys are clearly reported.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/deploy-ai.yml:
- Around line 146-153: The deploy step currently swallows all errors with "||
true" when calling aws ec2 authorize-security-group-ingress (using
MONITORING_SG_ID and MACHINE_SG_ID), which hides real failures; remove the
blanket "|| true", run the aws command and capture its stderr/exit code, then
only ignore the call if the error indicates a duplicate rule (e.g. an error
string or code like "InvalidPermission.Duplicate" in
stderr), otherwise print the aws error and exit non-zero so permission/SG/quota
problems fail the workflow; implement this logic around the aws ec2
authorize-security-group-ingress invocation.
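The duplicate-tolerant handling this prompt describes boils down to: capture stderr, pattern-match the AWS error code, and fail on anything unexpected. A minimal runnable sketch of that pattern follows; `fake_authorize` is a stub standing in for the real `aws ec2 authorize-security-group-ingress` call so the snippet runs anywhere.

```shell
#!/usr/bin/env sh
# Sketch of the suggested error classification. `fake_authorize` stands in
# for `aws ec2 authorize-security-group-ingress`; swap in the real call.
set -eu

fake_authorize() {
  # Simulate AWS rejecting a rule that already exists.
  echo "An error occurred (InvalidPermission.Duplicate) when calling the AuthorizeSecurityGroupIngress operation" >&2
  return 1
}

if ! OUTPUT=$(fake_authorize 2>&1); then
  if echo "$OUTPUT" | grep -q "InvalidPermission.Duplicate"; then
    # Benign: the ingress rule is already present, keep going.
    echo "Ingress rule already exists; continuing."
  else
    # Real failure (permissions, bad SG ID, quota): surface it and stop.
    echo "::error::authorize-security-group-ingress failed: $OUTPUT" >&2
    exit 1
  fi
fi
echo "done"
```

With the real AWS CLI in place, any error other than the duplicate-rule code fails the workflow instead of being silently swallowed by `|| true`.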
In @.github/workflows/deploy-ec2-docker.yml:
- Around line 136-143: When promtail_role equals "user-api" ensure the script
fails fast if private_ip is missing or invalid instead of silently skipping SSM
update: inside the conditional around promtail_role/private_ip (the block that
currently sets NODE_EXPORTER_TARGET_SSM_PARAM) add a guard that checks
private_ip for emptiness or the literal "None" and, if invalid, emit an error
message and exit non‑zero so the workflow stops immediately rather than
proceeding without updating NODE_EXPORTER_TARGET_SSM_PARAM.
In `@apps/monitoring/docker-compose.yml`:
- Around line 53-56: The external network name is hardcoded as backend_default
(networks: backend_shared -> name: backend_default), which breaks when the
Compose project name differs; update the monitoring compose to derive the
network name from an env var instead of hardcoding (e.g., set
networks.backend_shared.name to something like
${COMPOSE_PROJECT_NAME:-backend}_default) or alternatively remove external:true
and document/expect the root docker-compose to declare an explicit shared
network; change the networks definition in apps/monitoring/docker-compose.yml to
reference the env-var-based name (or make it non-external) so Prometheus will
attach to the correct project network.
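A sketch of the env-var-based network name this comment suggests; the `backend` fallback is an assumption about the backend project's default Compose project name, not a value confirmed by the PR.

```yaml
# apps/monitoring/docker-compose.yml — illustrative sketch, not the committed file
networks:
  backend_shared:
    external: true
    # Derive the network name from the backend project's name instead of
    # hardcoding backend_default; falls back to "backend" when unset.
    name: ${COMPOSE_PROJECT_NAME:-backend}_default
```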
In `@apps/monitoring/grafana/provisioning/dashboards/json/loki-logs-15324.json`:
- Around line 132-205: The Loki panel queries (see the target expressions
"sum(count_over_time({namespace=\"$namespace\", pod=~\"$pod\"} |~
\"$search\"[$__interval]))" and "{namespace=\"$namespace\", pod=~\"$pod\"} |~
\"$search\"") assume stream labels "namespace" and "pod" that your
Promtail/Prometheus stack does not emit; replace those label matchers with the
actual labels your stack provides (or remove them) or alternatively enable/emit
the missing Kubernetes labels in Promtail/Prometheus so the queries match real
streams; update the two target expressions and any templating variables to use
the correct label keys so the dashboard returns non-empty results.
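For context, a Promtail `static_configs` block that emits flat app-style labels (rather than Kubernetes `namespace`/`pod` labels) looks like the sketch below; the label names and log path are illustrative assumptions about this stack. If Promtail emits labels like these, the dashboard queries would need matchers such as `{app="$app"} |~ "$search"` instead of the namespace/pod selectors.

```yaml
# Promtail scrape config sketch — label names and paths are assumptions.
scrape_configs:
  - job_name: user-api-logs
    static_configs:
      - targets: [localhost]
        labels:
          app: user-api   # a label the Loki dashboard could filter on
          env: prod
          __path__: /var/log/user-api/*.log
```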
In `@apps/monitoring/prometheus/prometheus.prod.yml.tpl`:
- Around line 43-55: The prod Prometheus scrape jobs "node-exporter" and
"machine-node-exporter" only set the label app and lack the application label
expected by the Grafana dashboard (jvm-micrometer-4701.json) which uses
label_values(application) and selectors like {application="$application"};
update the template for those jobs (job_name "node-exporter" and
"machine-node-exporter") to include an application label alongside app (e.g.
application: "<appropriate value or template variable>") for both targets
(__NODE_EXPORTER_TARGET__ and __MACHINE_NODE_EXPORTER_TARGET__), then
regenerate/sync the resulting prometheus.prod.yml so prod matches the local
prometheus.yml label schema.
---
Nitpick comments:
In @.github/workflows/deploy-monitoring.yml:
- Around line 110-116: The current validation block that checks USER_API_TARGET,
ADMIN_API_TARGET, TRANSCODER_TARGET, NODE_EXPORTER_TARGET, and
MACHINE_NODE_EXPORTER_TARGET should be changed to build a list of which
variables are empty or "None" and include those keys in the error message;
locate the if-check referencing those variables, compute a missing_targets
string/array by testing each variable (e.g., append "USER_API_TARGET" when [ -z
"$USER_API_TARGET" ] || [ "$USER_API_TARGET" = "None" ]), and replace the
generic echo with an echo that prints "Missing or empty scrape targets:
$missing_targets" (then exit 1) so the failing keys are clearly reported.
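The missing-keys collection described in both versions of this comment can be exercised standalone. Below is a POSIX-sh variant of the pattern (a string instead of a bash array), with hardcoded sample values standing in for the SSM lookups.

```shell
#!/usr/bin/env sh
# Standalone demo: collect empty/"None" targets into one error message.
# Sample values simulate what the SSM-resolved variables might contain.
set -u

USER_API_TARGET="10.0.1.23:8080"
ADMIN_API_TARGET=""        # simulate a missing SSM value
TRANSCODER_TARGET="None"   # simulate SSM returning the literal "None"

MISSING_KEYS=""
[ -z "$USER_API_TARGET" ] || [ "$USER_API_TARGET" = "None" ] && MISSING_KEYS="$MISSING_KEYS USER_API_TARGET"
[ -z "$ADMIN_API_TARGET" ] || [ "$ADMIN_API_TARGET" = "None" ] && MISSING_KEYS="$MISSING_KEYS ADMIN_API_TARGET"
[ -z "$TRANSCODER_TARGET" ] || [ "$TRANSCODER_TARGET" = "None" ] && MISSING_KEYS="$MISSING_KEYS TRANSCODER_TARGET"

if [ -n "$MISSING_KEYS" ]; then
  # Name every failing key so the operator knows which SSM parameter to fix.
  echo "Empty scrape targets:${MISSING_KEYS}. Check SSM parameter values." >&2
fi
```

In the real step the `if` would end with `exit 1`; it is omitted here only so the demo script completes.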
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: dbea28c5-1bd9-4e63-a031-c54d5041155e
📒 Files selected for processing (14)
- .github/workflows/deploy-ai.yml
- .github/workflows/deploy-ec2-docker.yml
- .github/workflows/deploy-monitoring.yml
- apps/monitoring/docker-compose.yml
- apps/monitoring/grafana/provisioning/dashboards/json/New dashboard-1772584885701.json.disabled
- apps/monitoring/grafana/provisioning/dashboards/json/jvm-micrometer-4701.json
- apps/monitoring/grafana/provisioning/dashboards/json/loki-logs-15324.json
- apps/monitoring/grafana/provisioning/dashboards/json/node-exporter-full-1860.json
- apps/monitoring/grafana/provisioning/dashboards/json/rabbitmq-overview-10991.json
- apps/monitoring/grafana/provisioning/dashboards/json/spring-boot-3x-20727.json
- apps/monitoring/prometheus/prometheus.prod.yml
- apps/monitoring/prometheus/prometheus.prod.yml.tpl
- apps/monitoring/prometheus/prometheus.yml
- docker-compose.yml
Actionable comments posted: 1
♻️ Duplicate comments (1)
.github/workflows/deploy-ec2-docker.yml (1)
136-147: ⚠️ Potential issue | 🟡 Minor
Only a warning is printed when the user-api private IP is missing.
The current implementation emits only a ::warning:: when the private IP is absent and lets the workflow continue. However, the "Resolve scrape targets from SSM" step in deploy-monitoring.yml fails the deployment when NODE_EXPORTER_TARGET is empty or "None". If this behavior is intentional that is fine, but a private-IP resolution failure during user-api deployment can cascade into a downstream monitoring deployment failure.
🔧 To switch to a fail-fast approach

```diff
 if [ "$promtail_role" = "user-api" ]; then
-  if [ -n "$private_ip" ] && [ "$private_ip" != "None" ]; then
+  if [ -z "$private_ip" ] || [ "$private_ip" = "None" ]; then
+    echo "Failed to resolve private IP for ${target_tag}; cannot update ${NODE_EXPORTER_TARGET_SSM_PARAM}" >&2
+    exit 1
+  fi
   aws ssm put-parameter \
     --region "$AWS_REGION" \
     --name "$NODE_EXPORTER_TARGET_SSM_PARAM" \
     --type String \
     --overwrite \
     --value "${private_ip}:9100" >/dev/null
-  else
-    echo "::warning::user-api private IP is missing. NODE_EXPORTER_TARGET SSM parameter was not updated."
-  fi
 fi
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/deploy-ec2-docker.yml around lines 136 - 147: when promtail_role is "user-api" and private_ip is empty or "None", the script currently prints only a warning and continues, which can cause the "Resolve scrape targets from SSM" step in deploy-monitoring.yml to fail because NODE_EXPORTER_TARGET ends up empty; change this to fail fast by replacing the else-branch echo "::warning::user-api private IP is missing. NODE_EXPORTER_TARGET SSM parameter was not updated." with an error and an immediate exit (e.g. echo "::error::..." followed by exit 1) so the workflow never proceeds with NODE_EXPORTER_TARGET_SSM_PARAM unset. Symbols to check: promtail_role, private_ip, NODE_EXPORTER_TARGET_SSM_PARAM, and the related "Resolve scrape targets from SSM" step in deploy-monitoring.yml.
🧹 Nitpick comments (1)
apps/monitoring/prometheus/prometheus.prod.yml.tpl (1)
46-60: The node-exporter jobs have no relabel_configs.
The existing user-api, admin-api, and transcoder jobs set the instance label explicitly via relabel_configs. Applying the same pattern to node-exporter and machine-node-exporter keeps instance-label-based filtering consistent across the dashboards.
♻️ Suggested relabel_configs addition for consistency

```diff
   - job_name: "node-exporter"
     static_configs:
       - targets: ["__NODE_EXPORTER_TARGET__"]
         labels:
           app: "node-exporter"
           application: "node-exporter"
           env: "prod"
+    relabel_configs:
+      - target_label: instance
+        replacement: "node-exporter"

   - job_name: "machine-node-exporter"
     static_configs:
       - targets: ["__MACHINE_NODE_EXPORTER_TARGET__"]
         labels:
           app: "machine"
           application: "machine"
           env: "prod"
+    relabel_configs:
+      - target_label: instance
+        replacement: "machine"
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/monitoring/prometheus/prometheus.prod.yml.tpl` around lines 46 - 60, Add a relabel_configs block to the node-exporter and machine-node-exporter scrape jobs so they explicitly set the instance label like the other jobs do; locate the job entries identified by job_name: "node-exporter" and job_name: "machine-node-exporter" in the Prometheus template and insert a relabel_configs section that maps __address__ (or the target) to the instance label (matching the pattern used by user-api/admin-api/transcoder jobs) so dashboards can consistently filter by instance.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/deploy-ai.yml:
- Around line 137-144: The script currently skips updating the SSM parameter
when MACHINE_PRIVATE_IP is empty or "None" (the if block checking
MACHINE_PRIVATE_IP) without warning, which later causes deploy-monitoring.yml to
fail; modify that branch to explicitly log a clear error or warning and exit
non-zero (or fail the workflow) when MACHINE_PRIVATE_IP is missing/equals "None"
so the pipeline fails fast, or alternatively set a deliberate fallback value and
emit a warning before continuing; update the section that references
MACHINE_PRIVATE_IP and SSM_MACHINE_NODE_EXPORTER_TARGET_PARAM to implement this
explicit error handling.
---
Duplicate comments:
In @.github/workflows/deploy-ec2-docker.yml:
- Around line 136-147: When promtail_role is "user-api" and private_ip is empty
or "None", the script currently prints only a warning and continues, which can
cause the "Resolve scrape targets from SSM" step in deploy-monitoring.yml to
fail because NODE_EXPORTER_TARGET is empty; change this to fail fast by
replacing the else-branch echo "::warning::user-api private IP is missing.
NODE_EXPORTER_TARGET SSM parameter was not updated." with an error and an
immediate exit (e.g. echo "::error::..." then exit 1) so the workflow does not
proceed with NODE_EXPORTER_TARGET_SSM_PARAM unset; symbols to check:
promtail_role, private_ip, NODE_EXPORTER_TARGET_SSM_PARAM, and the related
"Resolve scrape targets from SSM" step in deploy-monitoring.yml.
---
Nitpick comments:
In `@apps/monitoring/prometheus/prometheus.prod.yml.tpl`:
- Around line 46-60: Add a relabel_configs block to the node-exporter and
machine-node-exporter scrape jobs so they explicitly set the instance label like
the other jobs do; locate the job entries identified by job_name:
"node-exporter" and job_name: "machine-node-exporter" in the Prometheus template
and insert a relabel_configs section that maps __address__ (or the target) to
the instance label (matching the pattern used by user-api/admin-api/transcoder
jobs) so dashboards can consistently filter by instance.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: c27546c9-6ffa-420a-b1e8-0d6aaf61202d
📒 Files selected for processing (6)
- .github/workflows/deploy-ai.yml
- .github/workflows/deploy-ec2-docker.yml
- apps/monitoring/docker-compose.yml
- apps/monitoring/grafana/provisioning/dashboards/json/loki-logs-15324.json
- apps/monitoring/prometheus/prometheus.prod.yml
- apps/monitoring/prometheus/prometheus.prod.yml.tpl
🚧 Files skipped from review as they are similar to previous changes (3)
- apps/monitoring/prometheus/prometheus.prod.yml
- apps/monitoring/grafana/provisioning/dashboards/json/loki-logs-15324.json
- apps/monitoring/docker-compose.yml
…ckend into OT-311-chore/grafana-template
♻️ Duplicate comments (2)
.github/workflows/deploy-ai.yml (2)
137-144: ⚠️ Potential issue | 🟡 Minor
Explicit error handling is needed when MACHINE_PRIVATE_IP is missing.
When MACHINE_PRIVATE_IP is empty or "None", the SSM parameter update is silently skipped. Deployment can then fail later when deploy-monitoring.yml uses that target.
🔧 Suggested explicit error handling

```diff
-if [ -n "$MACHINE_PRIVATE_IP" ] && [ "$MACHINE_PRIVATE_IP" != "None" ]; then
+if [ -z "$MACHINE_PRIVATE_IP" ] || [ "$MACHINE_PRIVATE_IP" = "None" ]; then
+  echo "::error::Failed to resolve private IP for ${TARGET_TAG}; cannot update ${SSM_MACHINE_NODE_EXPORTER_TARGET_PARAM}" >&2
+  exit 1
+fi
+
 aws ssm put-parameter \
   --region "$AWS_REGION" \
   --name "$SSM_MACHINE_NODE_EXPORTER_TARGET_PARAM" \
   --type String \
   --overwrite \
   --value "${MACHINE_PRIVATE_IP}:9100" >/dev/null
-fi
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/deploy-ai.yml around lines 137 - 144, The workflow silently skips setting SSM_MACHINE_NODE_EXPORTER_TARGET_PARAM when MACHINE_PRIVATE_IP is empty or "None", which later breaks deploy-monitoring.yml; add an explicit failure path: check MACHINE_PRIVATE_IP (the same variable used in the if condition) and if it's empty or "None" print a clear error referencing SSM_MACHINE_NODE_EXPORTER_TARGET_PARAM and exit non‑zero so the job fails early; update the conditional block around MACHINE_PRIVATE_IP to include an else branch that echoes the error and exits 1 (so downstream deploy-monitoring.yml won't run with a missing target).
146-161: ⚠️ Potential issue | 🟡 Minor
Security-group ingress error handling has improved, but consider aborting the workflow on failure.
Handling InvalidPermission.Duplicate is an improvement over the previous || true approach. However, when a real error occurs (insufficient permissions, a wrong SG ID, etc.) only a warning is printed and the workflow continues, so node-exporter may end up deployed in a state Prometheus cannot reach.
🔧 Suggested: abort the workflow on failure

```diff
 if ! INGRESS_OUTPUT=$(aws ec2 authorize-security-group-ingress \
     --region "$AWS_REGION" \
     --group-id "$MACHINE_SG_ID" \
     --protocol tcp \
     --port 9100 \
     --source-group "$MONITORING_SG_ID" 2>&1); then
   if echo "$INGRESS_OUTPUT" | grep -q "InvalidPermission.Duplicate"; then
     echo "Node exporter ingress rule already exists."
   else
-    echo "::warning::Failed to open node-exporter ingress (monitoring -> ai)."
+    echo "::error::Failed to open node-exporter ingress (monitoring -> ai)."
     echo "$INGRESS_OUTPUT"
+    exit 1
   fi
 fi
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/deploy-ai.yml around lines 146 - 161, The current authorize-security-group-ingress block captures errors into INGRESS_OUTPUT and only warns on failures other than InvalidPermission.Duplicate, but then continues; change the error handling so that if the aws ec2 authorize-security-group-ingress command fails and the output does NOT contain "InvalidPermission.Duplicate" (i.e., INGRESS_OUTPUT check), you print the error and then exit non‑zero (e.g., exit 1) to fail the workflow and prevent continuing with a potentially inaccessible node-exporter; reference the aws ec2 authorize-security-group-ingress invocation and variables MONITORING_SG_ID, MACHINE_SG_ID, and INGRESS_OUTPUT when making this change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In @.github/workflows/deploy-ai.yml:
- Around line 137-144: The workflow silently skips setting
SSM_MACHINE_NODE_EXPORTER_TARGET_PARAM when MACHINE_PRIVATE_IP is empty or
"None", which later breaks deploy-monitoring.yml; add an explicit failure path:
check MACHINE_PRIVATE_IP (the same variable used in the if condition) and if
it's empty or "None" print a clear error referencing
SSM_MACHINE_NODE_EXPORTER_TARGET_PARAM and exit non‑zero so the job fails early;
update the conditional block around MACHINE_PRIVATE_IP to include an else branch
that echoes the error and exits 1 (so downstream deploy-monitoring.yml won't run
with a missing target).
- Around line 146-161: The current authorize-security-group-ingress block
captures errors into INGRESS_OUTPUT and only warns on failures other than
InvalidPermission.Duplicate, but then continues; change the error handling so
that if the aws ec2 authorize-security-group-ingress command fails and the
output does NOT contain "InvalidPermission.Duplicate" (i.e., INGRESS_OUTPUT
check), you print the error and then exit non‑zero (e.g., exit 1) to fail the
workflow and prevent continuing with a potentially inaccessible node-exporter;
reference the aws ec2 authorize-security-group-ingress invocation and variables
MONITORING_SG_ID, MACHINE_SG_ID, and INGRESS_OUTPUT when making this change.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: f725fce7-04af-4013-a1c6-b1242b48aae0
📒 Files selected for processing (2)
- .github/workflows/deploy-ai.yml
- docker-compose.yml
🚧 Files skipped from review as they are similar to previous changes (1)
- docker-compose.yml
📝 Work summary
☑️ Checklist
#️⃣ Related issues
Summary by CodeRabbit
Release notes
New features
Infrastructure updates