[Refactor]: Inject the autoTagSong tag prompt externally (#188) by GulSam00 · Pull Request #189 · GulSam00/singcode

GulSam00 · 2026-04-12T14:16:56Z

User description

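📌 PR Title

[Refactor]: Inject the autoTagSong tag prompt externally

📌 Changes

Export getTagsForPrompt so the caller fetches it once, in advance
Change the autoTagSong signature to take tagsPrompt as a parameter
In taggingSongs.ts, preload the tag prompt with Promise.all before entering the loop
Remove the module-level cache (cachedTagsPrompt); the caller now manages it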
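A minimal TypeScript sketch of the injection pattern listed above. `getTagsForPrompt`, `getSongsAllDB`, and `autoTagSong` here are hypothetical stubs (the real ones query Supabase and OpenAI), and the `(title, artist, tagsPrompt)` parameter order is assumed from the note that `tagsPrompt` becomes the third parameter. The fetch counter exists only to show that the tag list is loaded once per run.

```typescript
// Hypothetical stand-ins for the real Supabase/OpenAI-backed functions.
type Song = { id: number; title: string; artist: string };

let tagFetches = 0;

async function getTagsForPrompt(): Promise<string> {
  tagFetches += 1; // counts fetches to show the list is loaded only once
  return '100: 한국노래\n101: 일본노래\n102: 팝송';
}

async function getSongsAllDB(): Promise<Song[]> {
  return [
    { id: 1, title: 'Song A', artist: 'Artist A' },
    { id: 2, title: 'Song B', artist: 'Artist B' },
  ];
}

// The tag list is injected by the caller instead of being fetched/cached in here.
async function autoTagSong(title: string, artist: string, tagsPrompt: string): Promise<number[]> {
  if (!tagsPrompt) return [];
  // Placeholder result; a real implementation would send title, artist and
  // tagsPrompt to the model and parse its JSON reply.
  return title && artist ? [100] : [];
}

async function runTagging(): Promise<Map<number, number[]>> {
  // Load songs and the tag prompt concurrently, once, before the loop.
  const [songs, tagsPrompt] = await Promise.all([getSongsAllDB(), getTagsForPrompt()]);
  const results = new Map<number, number[]>();
  for (const song of songs) {
    results.set(song.id, await autoTagSong(song.title, song.artist, tagsPrompt));
  }
  return results;
}
```

Because the caller owns the string, a test can pass a fixed `tagsPrompt` without touching module state, which is the practical payoff of dropping `cachedTagsPrompt`.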

💬 Additional Notes

Closes #188 (refactor: inject the autoTagSong tag prompt externally)

PR Type

Enhancement, Refactor

Description

Refactor autoTagSong to accept tagsPrompt as parameter instead of fetching internally
Pre-fetch tag prompts in taggingSongs.ts using Promise.all before loop
Export getTagsForPrompt function for external caller management
Remove module-level cache (cachedTagsPrompt) and shift responsibility to caller
Enhance AI prompt with language detection heuristics for better tag categorization
Improve tagging strategy with explicit rules for language slot selection

Diagram Walkthrough

flowchart LR
  A["taggingSongs.ts"] -->|pre-fetch| B["getTagsForPrompt"]
  B -->|returns| C["tagsPrompt"]
  C -->|pass as param| D["autoTagSong"]
  D -->|enhanced prompt| E["OpenAI API"]
  E -->|language hints| F["Better tag selection"]

File Walkthrough

Relevant files

Refactor

taggingSongs.ts: `Pre-fetch tag prompts before processing loop` (+8/-4)
packages/crawling/src/cron/taggingSongs.ts
• Import the `getTagsForPrompt` function alongside `autoTagSong`
• Add `getTagsForPrompt()` to the `Promise.all` batch to pre-fetch tags before the loop
• Pass the `tagsPrompt` parameter to the `autoTagSong` call in the loop
• Update the comment to reflect tag prompt pre-loading

Enhancement

getSongTag.ts: `Export getTagsForPrompt and enhance AI tagging strategy` (+35/-24)
packages/crawling/src/utils/getSongTag.ts
• Export the `getTagsForPrompt` function for external use
• Remove the module-level `cachedTagsPrompt` variable and caching logic
• Change the `autoTagSong` signature to accept `tagsPrompt` as its third parameter
• Add language detection using regexes for Hangul and Japanese Kana
• Generate a `languageHints` string based on the detected scripts
• Enhance the system prompt with a comprehensive language selection strategy and explicit tag slot rules
• Simplify internal logic by removing tag fetching from `autoTagSong`

GulSam00 · 2026-04-12T14:17:03Z

/describe

GulSam00 · 2026-04-12T14:17:05Z

/review

GulSam00 · 2026-04-12T14:17:06Z

/improve

vercel · 2026-04-12T14:17:07Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
singcode	Ready	Preview, Comment	Apr 12, 2026 2:17pm

qodo-code-review · 2026-04-12T14:17:12Z

Code Review by Qodo

🐞 Bugs (5) 📘 Rule violations (0) 📎 Requirement gaps (0) 🎨 UX Issues (0)

🐞 ≡ Correctness (3) ☼ Reliability (1) ➹ Performance (1) ⭐ New (3)

1. Ambiguous LLM JSON contract 🐞 ≡ ⭐

Description

autoTagSong() still parses the response as {"tag_ids": number[]} but the updated system prompt no
longer instructs the model to return a tag_ids field, so valid JSON responses without that key will
make the function return undefined and cause repeated per-song failures.

Code

packages/crawling/src/utils/getSongTag.ts[R60-82]

          role: 'system',
          content: `
-            You are a music database expert. Based on the song title and artist, categorize the song by selecting appropriate tag IDs from the provided list.
+            You are a music database expert specializing in global artist categorization.
+
+            [Language Selection Strategy]
+            - **Do NOT** assume a song is 102 (팝송) solely based on English/Latin characters.
+            - If title/artist are in English, research the **artist's origin and primary market**.
+            - Priority Logic:
+              1. If Hangul is detected OR the artist is a K-Pop artist: Select 100 (한국노래).
+              2. If Kana is detected OR the artist is a J-Pop/Japanese artist: Select 101 (일본노래).
+              3. Select 102 (팝송) ONLY if the artist is primarily from Western/English-speaking regions.
+              4. For all other cases or truly global/mixed origins, use 103 (글로벌).
+
+            [Selection Rules]
+            - Language Slot (100-199): EXACTLY 1 tag.
+            - Genre Slot (200-299): EXACTLY 1 tag.
+            - Origin Slot (300-399): 1 to 2 tags, sorted by relevance.

-            Guidelines:
-            1. Select at least one tag, but no more than 4.
-            2. Prioritize Language (100s), then Genre (200s), then Origin (300s).
-            3. If it's Japanese music, ALWAYS include 101 (J-POP).
-            4. Be precise. If it's from an Anime, use 302 (애니메이션).
-            5. Return only JSON: {"tag_ids": [number, number, ...]}
+            [Contextual Hints]
+            ${languageHints}

            Allowed Tags List:
            ${tagsPrompt}

Evidence
The system prompt no longer specifies the required output shape (no mention of "tag_ids"), but the
implementation still blindly reads result.tag_ids after JSON.parse, so any JSON object without that
property breaks the contract.
packages/crawling/src/utils/getSongTag.ts[55-99]
packages/crawling/src/utils/validateSongMatch.ts[23-58]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`autoTagSong()` parses the OpenAI response as `{ tag_ids: number[] }`, but the updated prompt no longer requires the model to return a `tag_ids` field. This makes the parser contract ambiguous and can yield `undefined`/non-array values.

## Issue Context
The call uses `response_format: { type: 'json_object' }`, which enforces JSON validity but does not guarantee a particular key name or schema.

## Fix Focus Areas
- packages/crawling/src/utils/getSongTag.ts[55-99]

## Suggested fix
1. Re-add an explicit output instruction in the system prompt, e.g.:
  - `Return JSON with this exact shape: {"tag_ids": [number, ...]}`
2. Add runtime validation after parsing:
  - If `result.tag_ids` is not an array of numbers, return `[]` (and optionally log the raw content for debugging).
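The runtime check from step 2 might look like the following. `parseTagIds` is a hypothetical helper name, not code from the PR. It returns `[]` for unparsable JSON, a missing `tag_ids` key, or an array containing non-integers, and it logs only the content length rather than the raw model output.

```typescript
// Hypothetical helper: validates the model's JSON before anything downstream uses it.
// Always returns an array, never undefined.
function parseTagIds(content: string | null | undefined): number[] {
  if (!content) return [];

  let parsed: unknown;
  try {
    parsed = JSON.parse(content);
  } catch {
    console.warn(`autoTagSong: unparsable JSON (length=${content.length})`);
    return [];
  }

  // Works for objects, null, and primitives: anything without tag_ids yields undefined.
  const tagIds = (parsed as { tag_ids?: unknown } | null)?.tag_ids;
  if (!Array.isArray(tagIds) || !tagIds.every((id) => Number.isInteger(id))) {
    console.warn(`autoTagSong: missing or invalid tag_ids (length=${content.length})`);
    return [];
  }
  return tagIds;
}
```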


2. LLM JSON schema mismatch 🐞 ≡

Description

autoTagSong parses the response as {tag_ids:number[]} and returns result.tag_ids as-is, but the new system prompt contains no output-schema instruction requiring the tag_ids key. If the model answers with a different key in json_object mode, tagIds becomes undefined, and accessing tagIds.length in taggingSongs then throws a TypeError, so tagging fails for that song.

Code

packages/crawling/src/utils/getSongTag.ts[R61-82]

         content: `
-            You are a music database expert. Based on the song title and artist, categorize the song by selecting appropriate tag IDs from the provided list.
+            You are a music database expert specializing in global artist categorization.
+
+            [Language Selection Strategy]
+            - **Do NOT** assume a song is 102 (팝송) solely based on English/Latin characters.
+            - If title/artist are in English, research the **artist's origin and primary market**.
+            - Priority Logic:
+              1. If Hangul is detected OR the artist is a K-Pop artist: Select 100 (한국노래).
+              2. If Kana is detected OR the artist is a J-Pop/Japanese artist: Select 101 (일본노래).
+              3. Select 102 (팝송) ONLY if the artist is primarily from Western/English-speaking regions.
+              4. For all other cases or truly global/mixed origins, use 103 (글로벌).
+
+            [Selection Rules]
+            - Language Slot (100-199): EXACTLY 1 tag.
+            - Genre Slot (200-299): EXACTLY 1 tag.
+            - Origin Slot (300-399): 1 to 2 tags, sorted by relevance.

-            Guidelines:
-            1. Select at least one tag, but no more than 4.
-            2. Prioritize Language (100s), then Genre (200s), then Origin (300s).
-            3. If it's Japanese music, ALWAYS include 101 (J-POP).
-            4. Be precise. If it's from an Anime, use 302 (애니메이션).
-            5. Return only JSON: {"tag_ids": [number, number, ...]}
+            [Contextual Hints]
+            ${languageHints}

           Allowed Tags List:
           ${tagsPrompt}

Evidence

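The system prompt in getSongTag.ts only enforces a JSON object (via response_format) and does not specify the tag_ids field or its format, yet the code right below it trusts result.tag_ids after JSON.parse and returns it. taggingSongs.ts, which receives that value, immediately reads tagIds.length (assuming an array), so if tag_ids is missing, misspelled, or not an array, the run hits a runtime error and only the failure count goes up.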

packages/crawling/src/utils/getSongTag.ts[56-99]
packages/crawling/src/cron/taggingSongs.ts[27-36]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`autoTagSong()` assumes the LLM response always includes `tag_ids` and returns `result.tag_ids` as-is. However, the current prompt only enforces `json_object` and never specifies a `tag_ids` field, so if the model returns a different key, accessing `tagIds.length` can throw at runtime.
### Issue Context
- `response_format: { type: 'json_object' }` guarantees only a **valid JSON object**; it does not guarantee the object's **field names or schema**.
- Downstream code (`taggingSongs.ts`) assumes `tagIds` is an array.
### Fix Focus Areas
- packages/crawling/src/utils/getSongTag.ts[56-102]
- packages/crawling/src/cron/taggingSongs.ts[27-37]
### What to change
1) Restore or add an explicit output schema in the system prompt, for example:
- "Return ONLY valid JSON with EXACTLY this shape: {\"tag_ids\": number[]}".
2) Add runtime validation after parsing:
- `const parsed = JSON.parse(content)`
- `const tagIds = Array.isArray(parsed.tag_ids) ? parsed.tag_ids : []`
- If the value is invalid, log it and return `[]`.
3) (Optional) In `taggingSongs.ts`, also check `Array.isArray(tagIds)` defensively before use (a second safety net).
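For the optional second safety net in `taggingSongs.ts`, a sketch under assumptions: `tagOne`, the counters object, and the injected `saveTags` callback (standing in for something like `postSongTagsDB`) are all hypothetical, since the PR does not show the loop body. The point is only that `.length` is never read from a value that has not been confirmed to be an array.

```typescript
// Hypothetical per-song step; the real loop in taggingSongs.ts is not shown in the PR.
type TagCounters = { success: number; failed: number; skipped: number };

async function tagOne(
  songId: number,
  fetchTags: () => Promise<unknown>, // e.g. a closure around autoTagSong(...)
  saveTags: (songId: number, tagIds: number[]) => Promise<void>, // e.g. postSongTagsDB
  counters: TagCounters,
): Promise<void> {
  try {
    const tagIds = await fetchTags();
    // Second safety net: skip anything that is not a non-empty array.
    if (!Array.isArray(tagIds) || tagIds.length === 0) {
      counters.skipped += 1;
      return;
    }
    await saveTags(songId, tagIds);
    counters.success += 1;
  } catch (err) {
    console.error(`tagging failed for song ${songId}:`, err);
    counters.failed += 1;
  }
}
```

Counting a non-array or empty result as "skipped" rather than "failed" is a design choice; a run could just as well treat it as a failure.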


3. No guard for tagsPrompt 🐞 ☼ ⭐

Description

taggingSongs.ts preloads tagsPrompt once, but if getTagsForPrompt() returns an empty string (which it does when the tag fetch fails),
the cron still loops and autoTagSong() short-circuits to [] for every song, so the run produces only failures and delays
instead of stopping early.

Code

packages/crawling/src/cron/taggingSongs.ts[R11-16]

+// 1. Load all songs + already-tagged song IDs + the tag prompt
+const [allSongs, taggedSongIds, tagsPrompt] = await Promise.all([
+  getSongsAllDB(),
+  getSongTagSongIdsDB(),
+  getTagsForPrompt(),
+]);

Evidence

The cron always enters the loop after prefetch, while getTagsForPrompt() returns '' on fetch error
and autoTagSong() immediately returns [] when tagsPrompt is falsy; therefore an empty tagsPrompt
makes the whole run deterministically fail per song.

packages/crawling/src/cron/taggingSongs.ts[11-54]
packages/crawling/src/utils/getSongTag.ts[22-45]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
When `tagsPrompt` is empty (e.g., tag fetch failed), `autoTagSong()` returns `[]` for every song, but `taggingSongs.ts` still iterates through up to 5000 songs and sleeps between iterations.

## Issue Context
- `getTagsForPrompt()` returns `''` on Supabase error.
- `autoTagSong()` returns `[]` when `tagsPrompt` is falsy.

## Fix Focus Areas
- packages/crawling/src/cron/taggingSongs.ts[11-16]
- packages/crawling/src/utils/getSongTag.ts[22-45]

## Suggested fix
1. After `Promise.all(...)`, validate `tagsPrompt` and abort the script early:
  - `if (!tagsPrompt) { console.error('Failed to load tagsPrompt'); process.exit(1); }` (or throw).
2. Optionally wrap `getTagsForPrompt()` in a try/catch to ensure unexpected exceptions also produce a controlled failure mode (and a clear log message).
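One way to implement the early abort, assuming the loaders are passed in (the real script calls `getSongsAllDB()` and `getTagsForPrompt()` directly). `loadTaggingInputs` is a hypothetical name, and this version throws instead of calling `process.exit(1)`; both options are the ones the review suggests.

```typescript
// Hypothetical wrapper around the Promise.all preload in taggingSongs.ts.
async function loadTaggingInputs<S>(
  loadSongs: () => Promise<S[]>,
  loadTagsPrompt: () => Promise<string>,
): Promise<{ songs: S[]; tagsPrompt: string }> {
  // Turn unexpected loader exceptions into one clear, controlled failure.
  const [songs, tagsPrompt] = await Promise.all([loadSongs(), loadTagsPrompt()]).catch(
    (err: unknown) => {
      throw new Error(`failed to load tagging inputs: ${String(err)}`);
    },
  );

  // An empty prompt would make every autoTagSong() call return [], so stop here
  // instead of looping (and sleeping) through thousands of songs.
  if (!tagsPrompt.trim()) {
    throw new Error('Failed to load tagsPrompt; aborting tagging run');
  }
  return { songs, tagsPrompt };
}
```

Throwing leaves the decision to the cron entry point; if nothing catches the error, current Node versions still end the process with a non-zero exit code.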


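4. No LLM output token limit 🐞 ➹

Description

The max_tokens setting was removed from the OpenAI call, so if the model's output grows unnecessarily long, cost and latency increase and so does the chance of a failed response. Because the prompt also carries the entire tag
list (tagsPrompt), keeping an output cap is the safer choice.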

Code

packages/crawling/src/utils/getSongTag.ts[R55-82]

   // Step 2: call the OpenAI API
   const response = await client.chat.completions.create({
-      model: 'gpt-4o-mini', // use a cost-effective model
+      model: 'gpt-4o-mini',
     messages: [
       {
         role: 'system',
         content: `
-            You are a music database expert. Based on the song title and artist, categorize the song by selecting appropriate tag IDs from the provided list.
+            You are a music database expert specializing in global artist categorization.
+
+            [Language Selection Strategy]
+            - **Do NOT** assume a song is 102 (팝송) solely based on English/Latin characters.
+            - If title/artist are in English, research the **artist's origin and primary market**.
+            - Priority Logic:
+              1. If Hangul is detected OR the artist is a K-Pop artist: Select 100 (한국노래).
+              2. If Kana is detected OR the artist is a J-Pop/Japanese artist: Select 101 (일본노래).
+              3. Select 102 (팝송) ONLY if the artist is primarily from Western/English-speaking regions.
+              4. For all other cases or truly global/mixed origins, use 103 (글로벌).
+
+            [Selection Rules]
+            - Language Slot (100-199): EXACTLY 1 tag.
+            - Genre Slot (200-299): EXACTLY 1 tag.
+            - Origin Slot (300-399): 1 to 2 tags, sorted by relevance.

-            Guidelines:
-            1. Select at least one tag, but no more than 4.
-            2. Prioritize Language (100s), then Genre (200s), then Origin (300s).
-            3. If it's Japanese music, ALWAYS include 101 (J-POP).
-            4. Be precise. If it's from an Anime, use 302 (애니메이션).
-            5. Return only JSON: {"tag_ids": [number, number, ...]}
+            [Contextual Hints]
+            ${languageHints}

           Allowed Tags List:
           ${tagsPrompt}

Evidence

getTagsForPrompt joins the entire tags table with newlines into one long string, and autoTagSong embeds it verbatim in the system prompt on every
request. Even so, the completion.create call has no max_tokens, so the output ceiling is not controlled in code.

packages/crawling/src/utils/getSongTag.ts[22-36]
packages/crawling/src/utils/getSongTag.ts[55-93]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
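`autoTagSong()`'s OpenAI call no longer sets `max_tokens` (the output token cap), so output length, cost, and latency are uncontrolled.
### Issue Context
A short JSON containing only the `tag_ids` array is enough for the response, so an output token cap is the safer setting.
### Fix Focus Areas
- packages/crawling/src/utils/getSongTag.ts[55-93]
### What to change
- Re-add `max_tokens` to the `client.chat.completions.create({...})` options with a reasonable value (e.g., 100 to 200), and leave a comment explaining why that value was chosen.
- (Optional) In case the output grows, log minimal debug information such as the `content` length when parsing or validation fails (excluding sensitive data).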
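A sketch of re-adding the cap, using local types so it does not depend on the `openai` package. `buildTaggingRequest`, the user-message format, and the value 150 are assumptions (the review only suggests something in the 100 to 200 range); `model`, `messages`, `response_format`, and `max_tokens` are the Chat Completions parameter names already used or discussed in this PR, and `finish_reason` is the field the API uses to report why generation stopped.

```typescript
// Local, simplified request types; not the openai package's own type definitions.
type ChatMessage = { role: 'system' | 'user'; content: string };

interface TaggingRequest {
  model: string;
  messages: ChatMessage[];
  response_format: { type: 'json_object' };
  max_tokens: number;
}

// Assumption: {"tag_ids": [...]} with at most four small integers needs only a few
// dozen tokens, so 150 leaves headroom while still blocking runaway output.
const TAGGING_MAX_TOKENS = 150;

function buildTaggingRequest(systemPrompt: string, title: string, artist: string): TaggingRequest {
  return {
    model: 'gpt-4o-mini',
    messages: [
      { role: 'system', content: systemPrompt },
      // Hypothetical user-message format.
      { role: 'user', content: `Title: ${title}\nArtist: ${artist}` },
    ],
    response_format: { type: 'json_object' },
    max_tokens: TAGGING_MAX_TOKENS,
  };
}

// A finish_reason of 'length' means the output hit the cap and the JSON may be cut off.
function isTruncated(finishReason: string | null | undefined): boolean {
  return finishReason === 'length';
}
```

Logging `isTruncated(...)` together with the content length on a parse failure covers the optional debugging suggestion without recording the raw output.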


5. Hangul regex false positives 🐞 ≡ ⭐

Description

The Hangul detection regex includes literal '|' characters inside the character class, so
titles/artists containing '|' are incorrectly treated as Hangul-present, skewing languageHints sent
to the model.

Code

packages/crawling/src/utils/getSongTag.ts[R47-48]

+    const hasHangul = /[ㄱ-ㅎ|ㅏ-ㅣ|가-힣]/.test(title + artist);
+    const hasKana = /[ぁ-んァ-ヶ]/.test(title + artist);

Evidence

Inside a character class ([...]), | is not alternation—it matches a literal pipe. The current
pattern therefore matches either Hangul ranges or the pipe character.

packages/crawling/src/utils/getSongTag.ts[46-53]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
The Hangul detection regex uses `|` inside a character class, which makes `|` a literal match and can cause false positives.

## Issue Context
This signal is used to build `languageHints` and influences tag selection.

## Fix Focus Areas
- packages/crawling/src/utils/getSongTag.ts[46-49]

## Suggested fix
Replace:
- `/[ㄱ-ㅎ|ㅏ-ㅣ|가-힣]/`
With:
- `/[ㄱ-ㅎㅏ-ㅣ가-힣]/`
(or an equivalent, more explicit grouping if desired).
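The fix can be checked against the original pattern in a few lines. `HANGUL` and `KANA` use the ranges from the diff (minus the stray `|`), while `buildLanguageHints` is a hypothetical stand-in, since the PR summary does not show the exact `languageHints` wording.

```typescript
// Fixed pattern: no '|' inside the class, so a literal pipe no longer counts as Hangul.
const HANGUL = /[ㄱ-ㅎㅏ-ㅣ가-힣]/;
const KANA = /[ぁ-んァ-ヶ]/;
// The original pattern, kept only to demonstrate the false positive.
const HANGUL_WITH_PIPE = /[ㄱ-ㅎ|ㅏ-ㅣ|가-힣]/;

// Hypothetical hint builder; the real languageHints text may differ.
function buildLanguageHints(title: string, artist: string): string {
  const text = `${title} ${artist}`;
  const hints: string[] = [];
  if (HANGUL.test(text)) hints.push('Hangul detected in title/artist.');
  if (KANA.test(text)) hints.push('Japanese Kana detected in title/artist.');
  return hints.length > 0 ? hints.join('\n') : 'No Hangul or Kana detected.';
}

console.log(HANGUL_WITH_PIPE.test('Live | Acoustic')); // true (false positive)
console.log(HANGUL.test('Live | Acoustic')); // false
```

A Latin-only artist name such as `BTS` produces no script signal at all, which is why the prompt also asks the model to reason about the artist's origin.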



qodo-code-review · 2026-04-12T14:17:13Z

Code Review by Qodo

New Review Started

This review has been superseded by a new analysis


qodo-code-review · 2026-04-12T14:17:15Z

Review Summary by Qodo

Refactor autoTagSong to inject tags prompt externally and improve AI categorization

✨ Enhancement

Walkthroughs

Description

• Extract getTagsForPrompt and pre-fetch tags before loop
• Pass tagsPrompt as parameter to autoTagSong function
• Remove module-level cache, delegate to caller management
• Enhance AI prompt with language detection heuristics
• Improve tag selection logic for K-Pop, J-Pop, and global artists

Diagram

flowchart LR
  A["taggingSongs.ts"] -->|"Promise.all"| B["getTagsForPrompt"]
  A -->|"getSongsAllDB"| C["allSongs"]
  A -->|"getSongTagSongIdsDB"| D["taggedSongIds"]
  B -->|"returns tagsPrompt"| E["autoTagSong"]
  C -->|"iterate songs"| E
  E -->|"enhanced AI prompt"| F["OpenAI API"]
  F -->|"tag_ids"| G["postSongTagsDB"]

File Changes

1. packages/crawling/src/cron/taggingSongs.ts Refactor +8/-4

Pre-fetch tags prompt before song processing loop

• Import getTagsForPrompt function alongside autoTagSong
• Pre-fetch tagsPrompt using Promise.all before loop iteration
• Pass tagsPrompt as third parameter to autoTagSong call
• Update comment to reflect tag prompt pre-loading

packages/crawling/src/cron/taggingSongs.ts

2. packages/crawling/src/utils/getSongTag.ts ✨ Enhancement +35/-24

Export tags prompt and enhance AI categorization logic

• Export getTagsForPrompt function for external use
• Remove module-level cachedTagsPrompt variable and caching logic
• Add tagsPrompt as required parameter to autoTagSong function
• Enhance system prompt with language detection strategy for K-Pop, J-Pop, and global artists
• Add script detection logic (Hangul and Kana) to provide contextual hints to AI
• Refine tag selection rules with explicit slot constraints (Language, Genre, Origin)
• Simplify temperature and remove max_tokens comment

packages/crawling/src/utils/getSongTag.ts

qodo-code-review · 2026-04-12T14:17:16Z

PR Description updated to latest commit (011f16d)


qodo-code-review · 2026-04-12T14:17:17Z

Code Review by Qodo

🐞 Bugs (2) 📘 Rule violations (0) 📎 Requirement gaps (0) 🎨 UX Issues (0)

🐞 ≡ Correctness (1) ➹ Performance (1)

1. LLM JSON schema mismatch 🐞 ≡

Description

autoTagSong parses the response as {tag_ids:number[]} and returns result.tag_ids as-is, but the new system prompt contains no output-schema instruction requiring the tag_ids key. If the model answers with a different key in json_object mode, tagIds becomes undefined, and accessing tagIds.length in taggingSongs then throws a TypeError, so tagging fails for that song.

Code

packages/crawling/src/utils/getSongTag.ts[R61-82]


Evidence

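The system prompt in getSongTag.ts only enforces a JSON object (via response_format) and does not specify the tag_ids field or its format, yet the code right below it trusts result.tag_ids after JSON.parse and returns it. taggingSongs.ts, which receives that value, immediately reads tagIds.length (assuming an array), so if tag_ids is missing, misspelled, or not an array, the run hits a runtime error and only the failure count goes up.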

packages/crawling/src/utils/getSongTag.ts[56-99]
packages/crawling/src/cron/taggingSongs.ts[27-36]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`autoTagSong()` assumes the LLM response always includes `tag_ids` and returns `result.tag_ids` as-is. However, the current prompt only enforces `json_object` and never specifies a `tag_ids` field, so if the model returns a different key, accessing `tagIds.length` can throw at runtime.

### Issue Context
- `response_format: { type: 'json_object' }` guarantees only a **valid JSON object**; it does not guarantee the object's **field names or schema**.
- Downstream code (`taggingSongs.ts`) assumes `tagIds` is an array.

### Fix Focus Areas
- packages/crawling/src/utils/getSongTag.ts[56-102]
- packages/crawling/src/cron/taggingSongs.ts[27-37]

### What to change
1) Restore or add an explicit output schema in the system prompt, for example:
- "Return ONLY valid JSON with EXACTLY this shape: {\"tag_ids\": number[]}".

2) Add runtime validation after parsing:
- `const parsed = JSON.parse(content)`
- `const tagIds = Array.isArray(parsed.tag_ids) ? parsed.tag_ids : []`
- If the value is invalid, log it and return `[]`.

3) (Optional) In `taggingSongs.ts`, also check `Array.isArray(tagIds)` defensively before use (a second safety net).


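2. No LLM output token limit 🐞 ➹

Description

The max_tokens setting was removed from the OpenAI call, so if the model's output grows unnecessarily long, cost and latency increase and so does the chance of a failed response. Because the prompt also carries the entire tag
list (tagsPrompt), keeping an output cap is the safer choice.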

Code

packages/crawling/src/utils/getSongTag.ts[R55-82]


Evidence

getTagsForPrompt joins the entire tags table with newlines into one long string, and autoTagSong embeds it verbatim in the system prompt on every
request. Even so, the completion.create call has no max_tokens, so the output ceiling is not controlled in code.

packages/crawling/src/utils/getSongTag.ts[22-36]
packages/crawling/src/utils/getSongTag.ts[55-93]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
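`autoTagSong()`'s OpenAI call no longer sets `max_tokens` (the output token cap), so output length, cost, and latency are uncontrolled.

### Issue Context
A short JSON containing only the `tag_ids` array is enough for the response, so an output token cap is the safer setting.

### Fix Focus Areas
- packages/crawling/src/utils/getSongTag.ts[55-93]

### What to change
- Re-add `max_tokens` to the `client.chat.completions.create({...})` options with a reasonable value (e.g., 100 to 200), and leave a comment explaining why that value was chosen.
- (Optional) In case the output grows, log minimal debug information such as the `content` length when parsing or validation fails (excluding sensitive data).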



qodo-code-review · 2026-04-12T14:19:40Z


1. LLM JSON schema mismatch 🐞 Bug ≡ Correctness

autoTagSong parses the response as {tag_ids:number[]} and returns result.tag_ids as-is, but the new system prompt contains no output-schema instruction requiring the tag_ids key. If the model answers with a different key in json_object mode, tagIds becomes undefined, and accessing tagIds.length in taggingSongs then throws a TypeError, so tagging fails for that song.

Agent Prompt

### Issue description
`autoTagSong()` assumes the LLM response always includes `tag_ids` and returns `result.tag_ids` as-is. However, the current prompt only enforces `json_object` and never specifies a `tag_ids` field, so if the model returns a different key, accessing `tagIds.length` can throw at runtime.
### Issue Context
- `response_format: { type: 'json_object' }` guarantees only a **valid JSON object**; it does not guarantee the object's **field names or schema**.
- Downstream code (`taggingSongs.ts`) assumes `tagIds` is an array.
### Fix Focus Areas
- packages/crawling/src/utils/getSongTag.ts[56-102]
- packages/crawling/src/cron/taggingSongs.ts[27-37]
### What to change
1) Restore or add an explicit output schema in the system prompt, for example:
- "Return ONLY valid JSON with EXACTLY this shape: {\"tag_ids\": number[]}".
2) Add runtime validation after parsing:
- `const parsed = JSON.parse(content)`
- `const tagIds = Array.isArray(parsed.tag_ids) ? parsed.tag_ids : []`
- If the value is invalid, log it and return `[]`.
3) (Optional) In `taggingSongs.ts`, also check `Array.isArray(tagIds)` defensively before use (a second safety net).


qodo-code-review · 2026-04-12T14:19:56Z


1. Ambiguous LLM JSON contract 🐞 Bug ≡ Correctness

autoTagSong() still parses the response as {"tag_ids": number[]} but the updated system prompt no longer instructs the model to return a tag_ids field, so valid JSON responses without that key will make the function return undefined and cause repeated per-song failures.

Agent Prompt

## Issue description
`autoTagSong()` parses the OpenAI response as `{ tag_ids: number[] }`, but the updated prompt no longer requires the model to return a `tag_ids` field. This makes the parser contract ambiguous and can yield `undefined`/non-array values.

## Issue Context
The call uses `response_format: { type: 'json_object' }`, which enforces JSON validity but does not guarantee a particular key name or schema.

## Fix Focus Areas
- packages/crawling/src/utils/getSongTag.ts[55-99]

## Suggested fix
1. Re-add an explicit output instruction in the system prompt, e.g.:
  - `Return JSON with this exact shape: {"tag_ids": [number, ...]}`
2. Add runtime validation after parsing:
  - If `result.tag_ids` is not an array of numbers, return `[]` (and optionally log the raw content for debugging).


refactor: inject the autoTagSong tag prompt externally (#188)

011f16d

vercel deployed to Preview April 12, 2026 14:17

GulSam00 merged commit b5b8b33 into develop Apr 12, 2026
2 checks passed

GulSam00 deleted the refactor/188-autoTagSongPreFetch branch April 12, 2026 14:18
