[OT-181] [CHORE]: Monitoring server setup and Git Actions implementation #102
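Walkthrough
Adds a monitoring stack (Prometheus, Grafana) with Grafana dashboards and provisioning, a Prometheus scrape configuration, and the related dependencies and endpoint exposure, and introduces a manually triggered GitHub Actions workflow that deploys to EC2 through SSM.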
Sequence Diagram(s)
sequenceDiagram
participant GH as GitHub Actions
participant SSMParam as AWS SSM Parameter Store
participant AWS as AWS (EC2/SSM/STS)
participant EC2 as EC2 Instance (SSM Agent / Docker)
GH->>SSMParam: Fetch (user/admin/transcoder targets, Grafana password key)
SSMParam-->>GH: Parameter values
GH->>GH: Substitute prometheus.prod.yml.tpl -> generate prometheus.yml
GH->>GH: Base64-encode docker-compose / grafana/provisioning / dashboards files
GH->>AWS: DescribeInstances(tag Name, state=running)
AWS-->>GH: Target EC2 instance ID
GH->>AWS: SSM SendCommand(assembled RunShellScript payload)
AWS-->>EC2: Deliver SSM command
EC2->>EC2: SSM agent base64-decodes, writes files, runs docker-compose up
EC2-->>AWS: Command execution status (invocation log)
AWS-->>GH: Command status (polling result: Success/Failure/Timeout)
GH-->>GH: Print success/failure log
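The final polling step in the diagram could look roughly like the sketch below. It assumes the status is read with the AWS CLI's `aws ssm get-command-invocation` call; the function names `fetch_status` and `wait_for_command`, the `STATUS_CMD` override, and the attempt and delay defaults are illustrative and not taken from the workflow.

```shell
#!/usr/bin/env bash
# Poll an SSM command until it reaches a terminal state.
# STATUS_CMD is a hypothetical hook so the loop can be exercised without AWS;
# when it is unset, the real AWS CLI is called through fetch_status.

fetch_status() {
  # $1: command id, $2: instance id. Prints the invocation status as text.
  aws ssm get-command-invocation \
    --command-id "$1" --instance-id "$2" \
    --query 'Status' --output text
}

wait_for_command() {
  # $1: command id, $2: instance id, $3: max attempts, $4: delay in seconds.
  local command_id="$1" instance_id="$2"
  local max_attempts="${3:-30}" delay="${4:-5}"
  local fetch="${STATUS_CMD:-fetch_status}"
  local attempt=1 status

  while [ "$attempt" -le "$max_attempts" ]; do
    # A failed lookup is treated like "not finished yet" and retried.
    status="$("$fetch" "$command_id" "$instance_id" 2>/dev/null || true)"
    case "$status" in
      Success)
        echo "Command succeeded"
        return 0 ;;
      Failed|Cancelled|TimedOut)
        echo "Command ended with status: $status" >&2
        return 1 ;;
    esac
    sleep "$delay"
    attempt=$((attempt + 1))
  done

  echo "Gave up after $max_attempts attempts" >&2
  return 2
}
```

Returning distinct exit codes for a failed command (1) and an exhausted polling budget (2) lets the calling step log the two cases differently.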
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 6
🧹 Nitpick comments (5)
apps/api-admin/src/main/java/com/ott/api_admin/config/SecurityConfig.java (1)
51-52: Security considerations for publicly exposing the Prometheus endpoint
The /actuator/prometheus endpoint is now exposed without authentication so that the monitoring stack can scrape it. This is intentional, but because the endpoint exposes application metrics (memory, CPU, HTTP request statistics, and so on), in production we recommend restricting access at the network level (Security Group, VPC) so that only the monitoring server can reach it.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/api-admin/src/main/java/com/ott/api_admin/config/SecurityConfig.java` around lines 51 - 52, The SecurityConfig currently permits unauthenticated access to "/actuator/prometheus" (see the antMatchers entries); change this by removing or changing the permitAll rule for "/actuator/prometheus" and instead enforce restricted access in configure(HttpSecurity), e.g. require a specific authority/role (hasAuthority/hasRole) or an IP-based restriction (hasIpAddress) for the antMatcher matching "/actuator/prometheus" in the SecurityConfig class, and add a code comment instructing that in production this endpoint should further be restricted at the network level (Security Group/VPC) so only monitoring servers can scrape it.
apps/monitoring/docker-compose.yml (1)
21-23: Default Grafana password is exposed in code
The default GF_SECURITY_ADMIN_PASSWORD:-admin1234 is set. This may be acceptable for local development, but anyone who uses this compose file directly gets a weak password. Production deployments are not affected because the workflow injects the password through SSM, but please consider documenting a recommendation to use a .env file for local development as well.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/monitoring/docker-compose.yml` around lines 21 - 23, The compose file currently exposes a hardcoded Grafana default password via GF_SECURITY_ADMIN_PASSWORD:-admin1234; remove the insecure fallback and require the env var (use GF_SECURITY_ADMIN_PASSWORD without the ":-admin1234" default) so deployments fail fast when not provided, update any local dev guidance to recommend a .env file for GF_SECURITY_ADMIN_USER and GF_SECURITY_ADMIN_PASSWORD, and add a short note in the repo docs explaining production secrets are injected via SSM and local dev should populate .env or export the variables.
.github/workflows/deploy-monitoring.yml (1)
128-128: Recommend masking the Grafana password
Once GRAFANA_PASSWORD is written to GITHUB_ENV, it can be exposed in the workflow logs. We recommend masking it with the ::add-mask:: command.
🔒 Add password masking
 if [ -z "$GRAFANA_PASSWORD" ] || [ "$GRAFANA_PASSWORD" = "None" ]; then
   echo "Grafana admin password is empty. Check input or SSM parameter." >&2
   exit 1
 fi
+echo "::add-mask::$GRAFANA_PASSWORD"
 echo "GRAFANA_PASSWORD=$GRAFANA_PASSWORD" >> "$GITHUB_ENV"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/deploy-monitoring.yml at line 128, The workflow currently writes the plain GRAFANA_PASSWORD into GITHUB_ENV which can expose it in logs; before appending it, call the GitHub Actions mask command (::add-mask::${GRAFANA_PASSWORD}) to mask the secret in logs, then append the variable to GITHUB_ENV as before; update the step that echoes "GRAFANA_PASSWORD=$GRAFANA_PASSWORD" to first run the ::add-mask:: for the GRAFANA_PASSWORD so the secret is redacted in subsequent logs and actions.
apps/monitoring/prometheus/prometheus.prod.yml (1)
1-41: A file containing placeholders was committed
prometheus.prod.yml still contains placeholders such as __USER_API_TARGET__. Because the workflow renders the .tpl template to generate this file, in its current state:
- Its content duplicates the template file.
- Running docker-compose.prod.yml locally fails because the targets are invalid.
Consider adding this file to .gitignore and generating it only at deploy time, or filling in clear example values (e.g., example.com:8080) to mark it as documentation only.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/monitoring/prometheus/prometheus.prod.yml` around lines 1 - 41, The committed prometheus.prod.yml contains literal placeholders (__USER_API_TARGET__, __ADMIN_API_TARGET__, __TRANSCODER_TARGET__) and should not be shipped as-is; either remove this generated file from the repo and add it to .gitignore so the deployment workflow renders the .tpl at deploy time (ensure the rendering step produces the job_name entries "user-api", "admin-api", "transcoder"), or replace the placeholders with explicit example values (e.g., example.com:8080) and a comment indicating they are examples for documentation only so local docker-compose runs don’t fail; update the repository accordingly and ensure the template source remains the single source of truth.
apps/monitoring/grafana/provisioning/dashboards/json/New dashboard-1772584885701.json (1)
1-1: Recommend renaming the dashboard file to something meaningful
New dashboard-1772584885701.json is a filename auto-generated by Grafana. Renaming it to something like monitoring-dashboard.json or oplust-overview.json would improve maintainability.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/monitoring/grafana/provisioning/dashboards/json/New dashboard-1772584885701.json` at line 1, Rename the auto-generated dashboard file New dashboard-1772584885701.json to a meaningful name (e.g., monitoring-dashboard.json or oplust-overview.json) and update any Grafana provisioning or dashboard-import references that point to this filename so they continue to load the dashboard correctly; commit the renamed file and remove the old filename to avoid duplicates.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/deploy-monitoring.yml:
- Around line 147-163: The deployment currently encodes and writes
dashboards.yml but never writes the actual dashboard JSON files; update the
workflow to base64-encode each dashboard JSON (e.g., add a DASHBOARD_JSON_B64
variable similar to DASH_PROVIDER_B64) and add an additional jq --arg entry
(e.g., $c10) into PARAMS that decodes and writes the JSON into
${MONITORING_ROOT}/grafana/provisioning/dashboards/json/<filename>.json (use the
same pattern as c3..c7); reference the PARAMS construction and variables
COMPOSE_B64, DASH_PROVIDER_B64 so you add the new DASHBOARD_JSON_B64 and its
corresponding echo '$DASHBOARD_JSON_B64' | base64 -d | sudo tee ... command into
the commands array.
In `@apps/api-admin/src/main/java/com/ott/api_admin/config/SecurityConfig.java`:
- Around line 55-59: Remove the incorrect hardcoded Swagger paths from
SecurityConfig: in the SecurityConfig class's request allow list (the permitAll
configuration, e.g. the antMatchers array defined in configure(HttpSecurity) or
inside a method), delete the entries "/back-office/swagger-ui.html",
"/back-office/swagger-ui/**", "/back-office/v3/api-docs",
"/back-office/v3/api-docs/**", and "/back-office/swagger-resources/**", and keep
only the default Swagger paths that already exist ("/swagger-ui/**",
"/v3/api-docs/**", "/swagger-resources/**").
In `@apps/monitoring/grafana/provisioning/dashboards/json/New dashboard-1772584885701.json`:
- Around line 263-275: The panel "서비스 상태 점검 (UP)" has thresholds that assume a
continuous range (green at 0, red at 80) but the Prometheus "up" metric is
binary (0 or 1); update the thresholds object for that panel by changing the
thresholds.steps entries so green corresponds to value 1 and red to value 0
(i.e., set the green step value to 1 and the red step value to 0) within the
existing "thresholds" structure so UP=1 displays green and UP=0 displays red.
In `@apps/monitoring/prometheus/prometheus.yml`:
- Around line 13-41: Prometheus scrape targets currently use
host.docker.internal; update each static_configs.targets to use the Docker
Compose service names and ports instead (e.g., replace
"host.docker.internal:8080" in the job_name "user-api" block with
"user-api:8080", replace "host.docker.internal:8081" in "admin-api" with
"admin-api:8081", and replace "host.docker.internal:8082" in "transcoder" with
"transcoder:8082") so Prometheus discovers the services by their container
network names when running via docker-compose.
In `@docker-compose.yml`:
- Around line 156-157: The docker-compose service currently hardcodes Grafana
defaults via GF_SECURITY_ADMIN_USER and GF_SECURITY_ADMIN_PASSWORD with fallback
values; remove the insecure defaults and require explicit injection by deleting
the ":-admin" and ":-admin1234" defaults (i.e., reference GF_SECURITY_ADMIN_USER
and GF_SECURITY_ADMIN_PASSWORD without defaults) or add a startup validation
that checks the environment variables and exits if they are unset or equal to
the insecure defaults; update references to GF_SECURITY_ADMIN_USER and
GF_SECURITY_ADMIN_PASSWORD in the compose snippet and any startup/init logic to
enforce presence of strong credentials.
- Around line 164-165: The volumes section only declares mysql-data but
referenced prometheus-data and grafana-data are missing; update the
docker-compose volumes block to also declare prometheus-data and grafana-data
(alongside the existing mysql-data) so the Prometheus and Grafana services
reference named volumes rather than creating anonymous volumes.
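The startup validation suggested above for the Grafana credentials could be sketched as a small shell check run before `docker-compose up`. The function name `require_grafana_credentials` is hypothetical; the rejected values are the defaults named in the review (`admin`, `admin1234`).

```shell
#!/usr/bin/env bash
# Refuse to start when Grafana credentials are missing or left at the
# insecure defaults called out in the review.

require_grafana_credentials() {
  if [ -z "${GF_SECURITY_ADMIN_USER:-}" ]; then
    echo "GF_SECURITY_ADMIN_USER must be set" >&2
    return 1
  fi
  case "${GF_SECURITY_ADMIN_PASSWORD:-}" in
    ""|admin|admin1234)
      echo "GF_SECURITY_ADMIN_PASSWORD must be set to a non-default value" >&2
      return 1 ;;
  esac
  return 0
}
```

A wrapper script could call `require_grafana_credentials || exit 1` before bringing the stack up. Compose's own `${VAR:?message}` interpolation is another way to fail when a variable is unset or empty, although it cannot reject a specific weak value.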
---
Nitpick comments:
In @.github/workflows/deploy-monitoring.yml:
- Line 128: The workflow currently writes the plain GRAFANA_PASSWORD into
GITHUB_ENV which can expose it in logs; before appending it, call the GitHub
Actions mask command (::add-mask::${GRAFANA_PASSWORD}) to mask the secret in
logs, then append the variable to GITHUB_ENV as before; update the step that
echoes "GRAFANA_PASSWORD=$GRAFANA_PASSWORD" to first run the ::add-mask:: for
the GRAFANA_PASSWORD so the secret is redacted in subsequent logs and actions.
In `@apps/api-admin/src/main/java/com/ott/api_admin/config/SecurityConfig.java`:
- Around line 51-52: The SecurityConfig currently permits unauthenticated access
to "/actuator/prometheus" (see the antMatchers entries); change this by removing
or changing the permitAll rule for "/actuator/prometheus" and instead enforce
restricted access in configure(HttpSecurity), e.g. require a specific
authority/role (hasAuthority/hasRole) or an IP-based restriction (hasIpAddress)
for the antMatcher matching "/actuator/prometheus" in the SecurityConfig class,
and add a code comment instructing that in production this endpoint should
further be restricted at the network level (Security Group/VPC) so only
monitoring servers can scrape it.
In `@apps/monitoring/docker-compose.yml`:
- Around line 21-23: The compose file currently exposes a hardcoded Grafana
default password via GF_SECURITY_ADMIN_PASSWORD:-admin1234; remove the insecure
fallback and require the env var (use GF_SECURITY_ADMIN_PASSWORD without the
":-admin1234" default) so deployments fail fast when not provided, update any
local dev guidance to recommend a .env file for GF_SECURITY_ADMIN_USER and
GF_SECURITY_ADMIN_PASSWORD, and add a short note in the repo docs explaining
production secrets are injected via SSM and local dev should populate .env or
export the variables.
In `@apps/monitoring/grafana/provisioning/dashboards/json/New dashboard-1772584885701.json`:
- Line 1: Rename the auto-generated dashboard file New
dashboard-1772584885701.json to a meaningful name (e.g.,
monitoring-dashboard.json or oplust-overview.json) and update any Grafana
provisioning or dashboard-import references that point to this filename so they
continue to load the dashboard correctly; commit the renamed file and remove the
old filename to avoid duplicates.
In `@apps/monitoring/prometheus/prometheus.prod.yml`:
- Around line 1-41: The committed prometheus.prod.yml contains literal
placeholders (__USER_API_TARGET__, __ADMIN_API_TARGET__, __TRANSCODER_TARGET__)
and should not be shipped as-is; either remove this generated file from the repo
and add it to .gitignore so the deployment workflow renders the .tpl at deploy
time (ensure the rendering step produces the job_name entries "user-api",
"admin-api", "transcoder"), or replace the placeholders with explicit example
values (e.g., example.com:8080) and a comment indicating they are examples for
documentation only so local docker-compose runs don’t fail; update the
repository accordingly and ensure the template source remains the single source
of truth.
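The deploy-time rendering described above could be sketched as follows, assuming the targets are available as the environment variables `USER_API_TARGET`, `ADMIN_API_TARGET`, and `TRANSCODER_TARGET` (hypothetical names chosen to mirror the placeholders). The function name `render_prometheus_config` is also illustrative.

```shell
#!/usr/bin/env bash
# Render the Prometheus template by substituting the three placeholders,
# then fail if any __PLACEHOLDER__ token is still present in the output.

render_prometheus_config() {
  # $1: template path, $2: output path.
  local template="$1" output="$2"

  # "|" is used as the sed delimiter, so target values must not contain
  # "|" or "&"; plain host:port values are fine.
  sed \
    -e "s|__USER_API_TARGET__|${USER_API_TARGET}|g" \
    -e "s|__ADMIN_API_TARGET__|${ADMIN_API_TARGET}|g" \
    -e "s|__TRANSCODER_TARGET__|${TRANSCODER_TARGET}|g" \
    "$template" > "$output"

  if grep -q '__[A-Z][A-Z_]*__' "$output"; then
    echo "Unrendered placeholder left in $output" >&2
    return 1
  fi
}
```

The trailing check is what keeps a half-rendered file from reaching the server: a placeholder added to the template later but forgotten here makes the step fail instead of producing an invalid scrape target.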
ℹ️ Review info
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: a310c75d-827f-4cef-b237-4caf8734834a
📒 Files selected for processing (15)
- .github/workflows/deploy-monitoring.yml
- apps/api-admin/src/main/java/com/ott/api_admin/config/SecurityConfig.java
- apps/api-admin/src/main/resources/application.yml
- apps/api-user/src/main/java/com/ott/api_user/config/SecurityConfig.java
- apps/monitoring/docker-compose.prod.yml
- apps/monitoring/docker-compose.yml
- apps/monitoring/grafana/provisioning/dashboards/dashboards.yml
- apps/monitoring/grafana/provisioning/dashboards/json/New dashboard-1772584885701.json
- apps/monitoring/grafana/provisioning/datasources/prometheus.yml
- apps/monitoring/prometheus/prometheus.prod.yml
- apps/monitoring/prometheus/prometheus.prod.yml.tpl
- apps/monitoring/prometheus/prometheus.yml
- apps/transcoder/src/main/resources/application.yml
- docker-compose.yml
- modules/common-web/build.gradle
…ckend into OT-181-chore/monitoring
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@apps/api-user/src/main/resources/application.yml`:
- Line 56: Remove "prometheus" from the default management endpoints list
(change the include setting to "health,info") so metrics are not exposed by
default, and provide a separate activation path for Prometheus (e.g., a
dedicated spring profile or a management property like
management.endpoint.prometheus.enabled) so it can only be enabled via
environment/profile in production; update any deployment docs and/or add an
application-prometheus profile file to enable the prometheus entry when
explicitly activated, and verify SecurityConfig (class SecurityConfig) still
enforces authentication for endpoints when the prometheus profile/property is
not active.
ℹ️ Review info
Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 258f0700-9a97-45e6-92e6-0ef28b51117b
📒 Files selected for processing (3)
- apps/api-user/src/main/resources/application.yml
- apps/monitoring/docker-compose.yml
- docker-compose.yml
🚧 Files skipped from review as they are similar to previous changes (2)
- apps/monitoring/docker-compose.yml
- docker-compose.yml
Actionable comments posted: 1
♻️ Duplicate comments (1)
apps/api-user/src/main/resources/application.yml (1)
56-56: ⚠️ Potential issue | 🟠 Major
The Prometheus endpoint is overexposed by the production defaults.
Including prometheus in the default include on line 56 exposes metrics by default in production as well. It is safer to keep the default at health,info and enable Prometheus only through a profile or environment variable.
🔧 Suggested fix
 management:
   endpoints:
     web:
       exposure:
-        include: health,info,prometheus
+        include: ${MANAGEMENT_ENDPOINTS_WEB_EXPOSURE_INCLUDE:health,info}
As per coding guidelines, "No insecure production defaults are introduced".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/api-user/src/main/resources/application.yml` at line 56, Remove "prometheus" from the default management endpoints include in application.yml so the default value remains "health,info" (i.e., change the include key that currently reads "health,info,prometheus" to only "health,info"); instead document/implement activation of the Prometheus endpoint via a profile or environment variable (e.g., a spring profile or an env flag) so prometheus is only enabled when an explicit profile/ENV is set.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/deploy-monitoring.yml:
- Around line 147-164: The .env creation (PARAMS c9) currently injects
${GRAFANA_PASSWORD} via a heredoc which risks shell injection and mishandles
special chars and leaves permissive file perms; instead base64-encode the .env
content like the other assets (add a new --arg holding base64 of
"GF_SECURITY_ADMIN_USER=admin\nGF_SECURITY_ADMIN_PASSWORD=${GRAFANA_PASSWORD}\n"),
decode it in c9 with echo '$ENV_B64' | base64 -d | sudo tee
${MONITORING_ROOT}/.env >/dev/null, and immediately run sudo chmod 600
${MONITORING_ROOT}/.env to restrict permissions (update PARAMS entries and
replace the heredoc c9 and add a chmod step).
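A rough sketch of the base64 approach described above, split into an encoding side (the workflow) and a decoding side (the instance). The function names `encode_env_file` and `write_env_file` are hypothetical, and the real command would still need `sudo` and `${MONITORING_ROOT}` as in the workflow.

```shell
#!/usr/bin/env bash
# Ship the Grafana .env content as base64 so the password is never
# interpreted by the remote shell, then write it with owner-only permissions.

encode_env_file() {
  # $1: admin user, $2: admin password. Prints single-line base64.
  printf 'GF_SECURITY_ADMIN_USER=%s\nGF_SECURITY_ADMIN_PASSWORD=%s\n' "$1" "$2" \
    | base64 | tr -d '\n'
}

write_env_file() {
  # $1: base64 payload, $2: destination path.
  # umask 077 in a subshell so a new file is never created world-readable.
  (
    umask 077
    printf '%s' "$1" | base64 -d > "$2"
  )
  # Also tighten permissions in case the file already existed.
  chmod 600 "$2"
}
```

Because the base64 alphabet contains only letters, digits, `+`, `/`, and `=`, the payload can sit inside single quotes in the SSM command without any escaping, whatever characters the password contains.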
---
Duplicate comments:
In `@apps/api-user/src/main/resources/application.yml`:
- Line 56: Remove "prometheus" from the default management endpoints include in
application.yml so the default value remains "health,info" (i.e., change the
include key that currently reads "health,info,prometheus" to only
"health,info"); instead document/implement activation of the Prometheus endpoint
via a profile or environment variable (e.g., a spring profile or an env flag) so
prometheus is only enabled when an explicit profile/ENV is set.
ℹ️ Review info
Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: dbb3c713-21e0-4199-a8ab-29a37200cce9
📒 Files selected for processing (6)
- .github/workflows/deploy-monitoring.yml
- apps/api-user/src/main/resources/application.yml
- apps/monitoring/docker-compose.yml
- apps/monitoring/grafana/provisioning/dashboards/json/New dashboard-1772584885701.json
- apps/monitoring/prometheus/prometheus.yml
- docker-compose.yml
🚧 Files skipped from review as they are similar to previous changes (1)
- docker-compose.yml
Actionable comments posted: 1
🧹 Nitpick comments (1)
docker-compose.yml (1)
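140-141: The monitoring Compose definition is duplicated, which creates a risk of configuration drift.
The same stack also exists in apps/monitoring/docker-compose.yml, and the volume names differ (prometheus-data vs prometheus_data), so switching environments can create different named volumes. We recommend keeping one file as the single source for the monitoring stack and leaving only overrides in the others.
Also applies to: 152-153, 159-160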
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docker-compose.yml` around lines 140 - 141, The monitoring stack is defined in multiple Compose files causing drift and inconsistent named volumes; consolidate into a single source-of-truth (keep apps/monitoring/docker-compose.yml) and remove the duplicate service definition from this top-level docker-compose.yml, or convert the top-level file to an override that references the primary file with docker-compose -f base -f override. Also standardize the Prometheus volume name (choose either prometheus-data or prometheus_data) and update all references across files to that exact volume name, and ensure the bind-mount line (./apps/monitoring/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml:ro) remains only in the authoritative compose file.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docker-compose.yml`:
- Around line 137-138: The port mappings currently expose the monitoring port to
all interfaces via the entry "ports: - \"9090:9090\""; change these Docker
Compose port mappings to bind to localhost by prefixing the host address (e.g.,
"127.0.0.1:9090:9090") so Prometheus/Grafana are not exposed externally; apply
the same localhost-binding change to the other monitoring port mapping
referenced in the diff (the second "ports: - \"9090:9090\"" occurrence).
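One way to keep this from regressing is a small lint step that flags port mappings without a host address. The sketch below is a heuristic based on `grep`, not a Compose-aware parser, and the function name `find_public_port_bindings` is hypothetical.

```shell
#!/usr/bin/env bash
# List compose port entries of the form "HOST_PORT:CONTAINER_PORT" that have
# no host IP prefix and therefore bind to all interfaces.

find_public_port_bindings() {
  # $1: compose file. Prints matching lines with their line numbers.
  # Entries such as "127.0.0.1:9090:9090" do not match the pattern.
  grep -nE '^[[:space:]]*-[[:space:]]*"?[0-9]+:[0-9]+"?[[:space:]]*$' "$1"
}
```

Since `grep` exits with status 0 only when it finds a match, a CI step could use `if find_public_port_bindings docker-compose.yml; then exit 1; fi` to fail on any unprefixed mapping.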
---
Nitpick comments:
In `@docker-compose.yml`:
- Around line 140-141: The monitoring stack is defined in multiple Compose files
causing drift and inconsistent named volumes; consolidate into a single
source-of-truth (keep apps/monitoring/docker-compose.yml) and remove the
duplicate service definition from this top-level docker-compose.yml, or convert
the top-level file to an override that references the primary file with
docker-compose -f base -f override. Also standardize the Prometheus volume name
(choose either prometheus-data or prometheus_data) and update all references
across files to that exact volume name, and ensure the bind-mount line
(./apps/monitoring/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml:ro)
remains only in the authoritative compose file.
📝 Work details
☑️ Checklist
#️⃣ Related issues
💬 Review requests
Summary by CodeRabbit
New Features
Bug Fixes / Improvements