
[Refactor] : Change autoTagSong to receive the tag prompt via external injection (#188) #189

Merged
GulSam00 merged 1 commit into develop from refactor/188-autoTagSongPreFetch
Apr 12, 2026

Conversation


@GulSam00 GulSam00 commented Apr 12, 2026

User description

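📌 PR Title

[Refactor] : Change autoTagSong to receive the tag prompt via external injection

📌 Changes

  • Export getTagsForPrompt so the caller can fetch the tag prompt once, up front
  • Change the autoTagSong signature to accept tagsPrompt as a parameter
  • In taggingSongs.ts, pre-load the tag prompt with Promise.all before entering the loop
  • Remove the module-level cache (cachedTagsPrompt); the caller now manages it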
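
A minimal sketch of the caller-managed pattern described in the list above. The function bodies are hypothetical stand-ins (the real `getTagsForPrompt` reads the tags table and the real `autoTagSong` calls OpenAI), and `getSongs`, `runTagging`, `fetchCount`, and the tag list format are assumptions made only for illustration.

```typescript
// Hypothetical stand-ins: the real functions query the database and call OpenAI.
type Song = { title: string; artist: string };

let fetchCount = 0; // counts how many times the tag list is loaded

async function getSongs(): Promise<Song[]> {
  return [
    { title: 'Song A', artist: 'Artist A' },
    { title: 'Song B', artist: 'Artist B' },
  ];
}

async function getTagsForPrompt(): Promise<string> {
  fetchCount += 1;
  return '100: 한국노래\n101: 일본노래\n102: 팝송'; // assumed format
}

// tagsPrompt is injected by the caller instead of being fetched or cached here.
async function autoTagSong(title: string, artist: string, tagsPrompt: string): Promise<number[]> {
  if (!tagsPrompt || !title || !artist) return [];
  return [100]; // placeholder for the model's answer
}

async function runTagging(): Promise<number[][]> {
  // Load the songs and the tag prompt together, once, before the loop.
  const [songs, tagsPrompt] = await Promise.all([getSongs(), getTagsForPrompt()]);
  const results: number[][] = [];
  for (const song of songs) {
    results.push(await autoTagSong(song.title, song.artist, tagsPrompt));
  }
  return results;
}
```

With this shape the tag list is loaded once per run regardless of how many songs are processed, and `autoTagSong` no longer needs module-level state.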

💬 Additional Notes


PR Type

Enhancement, Refactor


Description

  • Refactor autoTagSong to accept tagsPrompt as parameter instead of fetching internally

  • Pre-fetch tag prompts in taggingSongs.ts using Promise.all before loop

  • Export getTagsForPrompt function for external caller management

  • Remove module-level cache (cachedTagsPrompt) and shift responsibility to caller

  • Enhance AI prompt with language detection heuristics for better tag categorization

  • Improve tagging strategy with explicit rules for language slot selection


Diagram Walkthrough

flowchart LR
  A["taggingSongs.ts"] -->|pre-fetch| B["getTagsForPrompt"]
  B -->|returns| C["tagsPrompt"]
  C -->|pass as param| D["autoTagSong"]
  D -->|enhanced prompt| E["OpenAI API"]
  E -->|language hints| F["Better tag selection"]

File Walkthrough

Relevant files
Refactor
taggingSongs.ts
Pre-fetch tag prompts before processing loop                         

packages/crawling/src/cron/taggingSongs.ts

  • Import getTagsForPrompt function alongside autoTagSong
  • Add getTagsForPrompt() to Promise.all batch to pre-fetch tags before
    loop
  • Pass tagsPrompt parameter to autoTagSong call in loop
  • Update comment to reflect tag prompt pre-loading
+8/-4     
Enhancement
getSongTag.ts
Export getTagsForPrompt and enhance AI tagging strategy   

packages/crawling/src/utils/getSongTag.ts

  • Export getTagsForPrompt function for external use
  • Remove module-level cachedTagsPrompt variable and caching logic
  • Modify autoTagSong signature to accept tagsPrompt as third parameter
  • Add language detection logic using regex for Hangul and Japanese Kana
  • Generate languageHints string based on detected scripts
  • Enhance system prompt with comprehensive language selection strategy
    and explicit tag slot rules
  • Simplify internal logic by removing tag fetching from autoTagSong
+35/-24 

@GulSam00
Owner Author

/describe

@GulSam00
Owner Author

/review

@GulSam00
Owner Author

/improve

@vercel
Contributor

vercel Bot commented Apr 12, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| singcode | Ready | Preview, Comment | Apr 12, 2026 2:17pm |

@qodo-code-review

qodo-code-review Bot commented Apr 12, 2026

Code Review by Qodo

🐞 Bugs (5)   📘 Rule violations (0)   📎 Requirement gaps (0)   🎨 UX Issues (0)
≡ Correctness (3)   ☼ Reliability (1)   ➹ Performance (1)   ⭐ New (3)


Action required

1. Ambiguous LLM JSON contract 🐞
Description
autoTagSong() still parses the response as {"tag_ids": number[]} but the updated system prompt no
longer instructs the model to return a tag_ids field, so valid JSON responses without that key will
make the function return undefined and cause repeated per-song failures.
Code

packages/crawling/src/utils/getSongTag.ts[R60-82]

          role: 'system',
          content: `
-            You are a music database expert. Based on the song title and artist, categorize the song by selecting appropriate tag IDs from the provided list.
+            You are a music database expert specializing in global artist categorization.
+
+            [Language Selection Strategy]
+            - **Do NOT** assume a song is 102 (팝송) solely based on English/Latin characters.
+            - If title/artist are in English, research the **artist's origin and primary market**.
+            - Priority Logic:
+              1. If Hangul is detected OR the artist is a K-Pop artist: Select 100 (한국노래).
+              2. If Kana is detected OR the artist is a J-Pop/Japanese artist: Select 101 (일본노래).
+              3. Select 102 (팝송) ONLY if the artist is primarily from Western/English-speaking regions.
+              4. For all other cases or truly global/mixed origins, use 103 (글로벌).
+
+            [Selection Rules]
+            - Language Slot (100-199): EXACTLY 1 tag.
+            - Genre Slot (200-299): EXACTLY 1 tag.
+            - Origin Slot (300-399): 1 to 2 tags, sorted by relevance.

-            Guidelines:
-            1. Select at least one tag, but no more than 4.
-            2. Prioritize Language (100s), then Genre (200s), then Origin (300s).
-            3. If it's Japanese music, ALWAYS include 101 (J-POP).
-            4. Be precise. If it's from an Anime, use 302 (애니메이션).
-            5. Return only JSON: {"tag_ids": [number, number, ...]}
+            [Contextual Hints]
+            ${languageHints}

            Allowed Tags List:
            ${tagsPrompt}
Evidence
The system prompt no longer specifies the required output shape (no mention of "tag_ids"), but the
implementation still blindly reads result.tag_ids after JSON.parse, so any JSON object without that
property breaks the contract.

packages/crawling/src/utils/getSongTag.ts[55-99]
packages/crawling/src/utils/validateSongMatch.ts[23-58]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`autoTagSong()` parses the OpenAI response as `{ tag_ids: number[] }`, but the updated prompt no longer requires the model to return a `tag_ids` field. This makes the parser contract ambiguous and can yield `undefined`/non-array values.

## Issue Context
The call uses `response_format: { type: 'json_object' }`, which enforces JSON validity but does not guarantee a particular key name or schema.

## Fix Focus Areas
- packages/crawling/src/utils/getSongTag.ts[55-99]

## Suggested fix
1. Re-add an explicit output instruction in the system prompt, e.g.:
  - `Return JSON with this exact shape: {"tag_ids": [number, ...]}`
2. Add runtime validation after parsing:
  - If `result.tag_ids` is not an array of numbers, return `[]` (and optionally log the raw content for debugging).
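
The runtime validation in step 2 could look roughly like the sketch below. `parseTagIds` is a hypothetical helper name, not a function from the repository; it assumes the raw message content is passed in as a string and treats anything other than an array of numbers under `tag_ids` as invalid.

```typescript
// Hypothetical helper: validates the model's JSON before it reaches the caller.
function parseTagIds(content: string | null | undefined): number[] {
  if (!content) return [];

  let parsed: unknown;
  try {
    parsed = JSON.parse(content);
  } catch {
    return []; // not valid JSON at all
  }

  if (typeof parsed !== 'object' || parsed === null) return [];

  const tagIds = (parsed as { tag_ids?: unknown }).tag_ids;
  if (!Array.isArray(tagIds)) return []; // key missing, misspelled, or not an array

  // Accept the array only if every element is a number.
  return tagIds.every((id) => typeof id === 'number') ? (tagIds as number[]) : [];
}
```

Returning `[]` in every failure case keeps the downstream `tagIds.length` check safe, at the cost of hiding why a song was skipped unless the caller also logs the raw content.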



2. LLM JSON schema mismatch 🐞
Description
autoTagSong parses the response as {tag_ids:number[]} and returns result.tag_ids as-is, but the new system prompt
has no output-schema instruction requiring a tag_ids key, so if the model responds with a different key in json_object
mode, tagIds becomes undefined. As a result, accessing tagIds.length in taggingSongs throws a TypeError and tagging fails for that song.
Code

packages/crawling/src/utils/getSongTag.ts[R61-82]

         content: `
-            You are a music database expert. Based on the song title and artist, categorize the song by selecting appropriate tag IDs from the provided list.
+            You are a music database expert specializing in global artist categorization.
+
+            [Language Selection Strategy]
+            - **Do NOT** assume a song is 102 (팝송) solely based on English/Latin characters.
+            - If title/artist are in English, research the **artist's origin and primary market**.
+            - Priority Logic:
+              1. If Hangul is detected OR the artist is a K-Pop artist: Select 100 (한국노래).
+              2. If Kana is detected OR the artist is a J-Pop/Japanese artist: Select 101 (일본노래).
+              3. Select 102 (팝송) ONLY if the artist is primarily from Western/English-speaking regions.
+              4. For all other cases or truly global/mixed origins, use 103 (글로벌).
+
+            [Selection Rules]
+            - Language Slot (100-199): EXACTLY 1 tag.
+            - Genre Slot (200-299): EXACTLY 1 tag.
+            - Origin Slot (300-399): 1 to 2 tags, sorted by relevance.

-            Guidelines:
-            1. Select at least one tag, but no more than 4.
-            2. Prioritize Language (100s), then Genre (200s), then Origin (300s).
-            3. If it's Japanese music, ALWAYS include 101 (J-POP).
-            4. Be precise. If it's from an Anime, use 302 (애니메이션).
-            5. Return only JSON: {"tag_ids": [number, number, ...]}
+            [Contextual Hints]
+            ${languageHints}

           Allowed Tags List:
           ${tagsPrompt}
Evidence
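The system prompt in getSongTag.ts only enforces a JSON object (via response_format) and does not specify the tag_ids
field or its format, yet the code immediately below trusts and returns result.tag_ids after JSON.parse. taggingSongs.ts,
which receives the return value, reads tagIds.length right away (assuming an array), so if tag_ids is missing,
misspelled, or not an array, a runtime error occurs and only the failure count increases.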

packages/crawling/src/utils/getSongTag.ts[56-99]
packages/crawling/src/cron/taggingSongs.ts[27-36]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`autoTagSong()` assumes the LLM response always provides `tag_ids` and returns `result.tag_ids` as-is. However, the current prompt only enforces `json_object` and does not specify the `tag_ids` field, so if the model returns a different key, a runtime error can occur at `tagIds.length`.
### Issue Context
- `response_format: { type: 'json_object' }` guarantees only a **valid JSON object**; it does not guarantee the object's **field names or schema**.
- The downstream code (`taggingSongs.ts`) assumes `tagIds` is an array.
### Fix Focus Areas
- packages/crawling/src/utils/getSongTag.ts[56-102]
- packages/crawling/src/cron/taggingSongs.ts[27-37]
### What to change
1) Explicitly restore or add the output schema in the system prompt. For example:
- "Return ONLY valid JSON with EXACTLY this shape: {\"tag_ids\": number[]}".
2) Add runtime validation after parsing.
- `const parsed = JSON.parse(content)`
- `const tagIds = Array.isArray(parsed.tag_ids) ? parsed.tag_ids : []`
- If the value is invalid, log it and return `[]`.
3) (Optional) In `taggingSongs.ts`, also defensively check `Array.isArray(tagIds)` before use (a second safety net).




Remediation recommended

3. No guard for tagsPrompt 🐞
Description
taggingSongs.ts preloads tagsPrompt once, but if getTagsForPrompt() returns an empty string (it does
on tag fetch error), the cron still loops and autoTagSong() short-circuits to [] for every song,
producing only failures and delays instead of stopping early.
Code

packages/crawling/src/cron/taggingSongs.ts[R11-16]

+// 1. Load all songs + already-tagged song IDs + tag prompt
+const [allSongs, taggedSongIds, tagsPrompt] = await Promise.all([
+  getSongsAllDB(),
+  getSongTagSongIdsDB(),
+  getTagsForPrompt(),
+]);
Evidence
The cron always enters the loop after prefetch, while getTagsForPrompt() returns '' on fetch error
and autoTagSong() immediately returns [] when tagsPrompt is falsy; therefore an empty tagsPrompt
makes the whole run deterministically fail per song.

packages/crawling/src/cron/taggingSongs.ts[11-54]
packages/crawling/src/utils/getSongTag.ts[22-45]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
When `tagsPrompt` is empty (e.g., tag fetch failed), `autoTagSong()` returns `[]` for every song, but `taggingSongs.ts` still iterates through up to 5000 songs and sleeps between iterations.

## Issue Context
- `getTagsForPrompt()` returns `''` on Supabase error.
- `autoTagSong()` returns `[]` when `tagsPrompt` is falsy.

## Fix Focus Areas
- packages/crawling/src/cron/taggingSongs.ts[11-16]
- packages/crawling/src/utils/getSongTag.ts[22-45]

## Suggested fix
1. After `Promise.all(...)`, validate `tagsPrompt` and abort the script early:
  - `if (!tagsPrompt) { console.error('Failed to load tagsPrompt'); process.exit(1); }` (or throw).
2. Optionally wrap `getTagsForPrompt()` in a try/catch to ensure unexpected exceptions also produce a controlled failure mode (and a clear log message).
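
As a sketch of the early-abort check in step 1, assuming the script prefers throwing over `process.exit(1)`: `requireTagsPrompt` is a hypothetical helper name, and the whitespace trim is an extra assumption beyond the falsy check the review describes.

```typescript
// Hypothetical guard: fail once, before the loop, instead of once per song.
function requireTagsPrompt(tagsPrompt: string): string {
  if (!tagsPrompt.trim()) {
    throw new Error('Failed to load tagsPrompt; aborting before the tagging loop');
  }
  return tagsPrompt;
}
```

The caller would wrap the pre-fetched value, for example `const safePrompt = requireTagsPrompt(tagsPrompt);`, so an empty tag list stops the run before any per-song sleep or API call happens.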



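4. No LLM output token limit 🐞
Description
The max_tokens setting was removed from the OpenAI call, so if the model output becomes unnecessarily long, cost and latency increase and a failed response becomes more likely.
Because the prompt includes the entire tag list (tagsPrompt), setting an output limit is the safer choice.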
Code

packages/crawling/src/utils/getSongTag.ts[R55-82]

   // Step 2: Call the OpenAI API
   const response = await client.chat.completions.create({
-      model: 'gpt-4o-mini', // use a cost-effective model
+      model: 'gpt-4o-mini',
     messages: [
       {
         role: 'system',
         content: `
-            You are a music database expert. Based on the song title and artist, categorize the song by selecting appropriate tag IDs from the provided list.
+            You are a music database expert specializing in global artist categorization.
+
+            [Language Selection Strategy]
+            - **Do NOT** assume a song is 102 (팝송) solely based on English/Latin characters.
+            - If title/artist are in English, research the **artist's origin and primary market**.
+            - Priority Logic:
+              1. If Hangul is detected OR the artist is a K-Pop artist: Select 100 (한국노래).
+              2. If Kana is detected OR the artist is a J-Pop/Japanese artist: Select 101 (일본노래).
+              3. Select 102 (팝송) ONLY if the artist is primarily from Western/English-speaking regions.
+              4. For all other cases or truly global/mixed origins, use 103 (글로벌).
+
+            [Selection Rules]
+            - Language Slot (100-199): EXACTLY 1 tag.
+            - Genre Slot (200-299): EXACTLY 1 tag.
+            - Origin Slot (300-399): 1 to 2 tags, sorted by relevance.

-            Guidelines:
-            1. Select at least one tag, but no more than 4.
-            2. Prioritize Language (100s), then Genre (200s), then Origin (300s).
-            3. If it's Japanese music, ALWAYS include 101 (J-POP).
-            4. Be precise. If it's from an Anime, use 302 (애니메이션).
-            5. Return only JSON: {"tag_ids": [number, number, ...]}
+            [Contextual Hints]
+            ${languageHints}

           Allowed Tags List:
           ${tagsPrompt}
Evidence
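getTagsForPrompt builds a long string by joining the entire tags table with newlines, and autoTagSong embeds it in the system prompt
and sends it with every request. Even so, the completion.create call has no max_tokens, so the output upper bound is not controlled in code.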

packages/crawling/src/utils/getSongTag.ts[22-36]
packages/crawling/src/utils/getSongTag.ts[55-93]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
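`max_tokens` (the output token cap) was removed from the OpenAI call in `autoTagSong()`, so output length, cost, and latency are not controlled.
### Issue Context
A short JSON containing only the `tag_ids` array is sufficient as a response, so setting an output token cap is the safer choice.
### Fix Focus Areas
- packages/crawling/src/utils/getSongTag.ts[55-93]
### What to change
- Re-add `max_tokens` to the `client.chat.completions.create({...})` options with a reasonable value (e.g., 100~200), and leave a comment explaining why that value was chosen.
- (Optional) In case the output grows, log minimal debug information such as the length of `content` when parsing or validation fails (excluding sensitive information).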
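
One way to picture the change is to build the request options in a small function, as sketched below. `buildTagRequest` and `MAX_OUTPUT_TOKENS` are hypothetical names, and 150 is an assumed value inside the 100~200 range suggested above; the returned object is what would be passed to `client.chat.completions.create(...)`.

```typescript
// Assumed cap: a {"tag_ids": [...]} answer with at most 4 short integers
// needs only a few dozen tokens, so 150 leaves headroom without allowing
// a runaway response.
const MAX_OUTPUT_TOKENS = 150;

// Hypothetical builder for the chat completion options.
function buildTagRequest(systemPrompt: string, userPrompt: string) {
  return {
    model: 'gpt-4o-mini',
    response_format: { type: 'json_object' as const },
    max_tokens: MAX_OUTPUT_TOKENS,
    messages: [
      { role: 'system' as const, content: systemPrompt },
      { role: 'user' as const, content: userPrompt },
    ],
  };
}
```

Keeping the cap in a named constant gives the explanatory comment a natural home and makes the value easy to adjust if the tag schema grows.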




Advisory comments

5. Hangul regex false positives 🐞
Description
The Hangul detection regex includes literal '|' characters inside the character class, so
titles/artists containing '|' are incorrectly treated as Hangul-present, skewing languageHints sent
to the model.
Code

packages/crawling/src/utils/getSongTag.ts[R47-48]

+    const hasHangul = /[ㄱ-ㅎ|ㅏ-ㅣ|가-힣]/.test(title + artist);
+    const hasKana = /[ぁ-んァ-ヶ]/.test(title + artist);
Evidence
Inside a character class ([...]), | is not alternation—it matches a literal pipe. The current
pattern therefore matches either Hangul ranges or the pipe character.

packages/crawling/src/utils/getSongTag.ts[46-53]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
The Hangul detection regex uses `|` inside a character class, which makes `|` a literal match and can cause false positives.

## Issue Context
This signal is used to build `languageHints` and influences tag selection.

## Fix Focus Areas
- packages/crawling/src/utils/getSongTag.ts[46-49]

## Suggested fix
Replace:
- `/[ㄱ-ㅎ|ㅏ-ㅣ|가-힣]/`
With:
- `/[ㄱ-ㅎㅏ-ㅣ가-힣]/`
(or an equivalent, more explicit grouping if desired).
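
The difference between the two patterns can be checked with a short sketch. `BUGGY_HANGUL`, `HANGUL`, `KANA`, and `detectScripts` are illustrative names, not identifiers from the repository, and the sample titles are arbitrary.

```typescript
// Original pattern: the pipes are literal characters inside the class.
const BUGGY_HANGUL = /[ㄱ-ㅎ|ㅏ-ㅣ|가-힣]/;
// Suggested pattern: consonant jamo, vowel jamo, and precomposed syllables only.
const HANGUL = /[ㄱ-ㅎㅏ-ㅣ가-힣]/;
// Kana pattern as used in the PR (Hiragana and Katakana ranges).
const KANA = /[ぁ-んァ-ヶ]/;

// Illustrative helper that reports which scripts appear in title or artist.
function detectScripts(title: string, artist: string): { hasHangul: boolean; hasKana: boolean } {
  const text = `${title} ${artist}`;
  return { hasHangul: HANGUL.test(text), hasKana: KANA.test(text) };
}

// A Latin-only title containing a pipe trips the buggy pattern but not the fixed one.
const pipeTitle = 'Live | Remastered';
const buggyMatchesPipe = BUGGY_HANGUL.test(pipeTitle);
const fixedMatchesPipe = HANGUL.test(pipeTitle);
```

Here `buggyMatchesPipe` is `true` and `fixedMatchesPipe` is `false`, which is the false positive the review describes.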


@qodo-code-review

Review Summary by Qodo

Refactor autoTagSong to inject tags prompt externally and improve AI categorization

✨ Enhancement


Walkthroughs

Description
• Extract getTagsForPrompt and pre-fetch tags before loop
• Pass tagsPrompt as parameter to autoTagSong function
• Remove module-level cache, delegate to caller management
• Enhance AI prompt with language detection heuristics
• Improve tag selection logic for K-Pop, J-Pop, and global artists
Diagram
flowchart LR
  A["taggingSongs.ts"] -->|"Promise.all"| B["getTagsForPrompt"]
  A -->|"getSongsAllDB"| C["allSongs"]
  A -->|"getSongTagSongIdsDB"| D["taggedSongIds"]
  B -->|"returns tagsPrompt"| E["autoTagSong"]
  C -->|"iterate songs"| E
  E -->|"enhanced AI prompt"| F["OpenAI API"]
  F -->|"tag_ids"| G["postSongTagsDB"]

File Changes

1. packages/crawling/src/cron/taggingSongs.ts Refactor +8/-4

Pre-fetch tags prompt before song processing loop

• Import getTagsForPrompt function alongside autoTagSong
• Pre-fetch tagsPrompt using Promise.all before loop iteration
• Pass tagsPrompt as third parameter to autoTagSong call
• Update comment to reflect tag prompt pre-loading

packages/crawling/src/cron/taggingSongs.ts


2. packages/crawling/src/utils/getSongTag.ts ✨ Enhancement +35/-24

Export tags prompt and enhance AI categorization logic

• Export getTagsForPrompt function for external use
• Remove module-level cachedTagsPrompt variable and caching logic
• Add tagsPrompt as required parameter to autoTagSong function
• Enhance system prompt with language detection strategy for K-Pop, J-Pop, and global artists
• Add script detection logic (Hangul and Kana) to provide contextual hints to AI
• Refine tag selection rules with explicit slot constraints (Language, Genre, Origin)
• Simplify temperature and remove max_tokens comment

packages/crawling/src/utils/getSongTag.ts



@qodo-code-review

PR Description updated to latest commit (011f16d)

@qodo-code-review

qodo-code-review Bot commented Apr 12, 2026

Code Review by Qodo

🐞 Bugs (2)   📘 Rule violations (0)   📎 Requirement gaps (0)   🎨 UX Issues (0)
≡ Correctness (1)   ➹ Performance (1)


Action required

1. LLM JSON schema mismatch 🐞
Description
autoTagSong parses the response as {tag_ids:number[]} and returns result.tag_ids as-is, but the new system prompt
has no output-schema instruction requiring a tag_ids key, so if the model responds with a different key in json_object
mode, tagIds becomes undefined. As a result, accessing tagIds.length in taggingSongs throws a TypeError and tagging fails for that song.
Code

packages/crawling/src/utils/getSongTag.ts[R61-82]

          content: `
-            You are a music database expert. Based on the song title and artist, categorize the song by selecting appropriate tag IDs from the provided list.
+            You are a music database expert specializing in global artist categorization.
+
+            [Language Selection Strategy]
+            - **Do NOT** assume a song is 102 (팝송) solely based on English/Latin characters.
+            - If title/artist are in English, research the **artist's origin and primary market**.
+            - Priority Logic:
+              1. If Hangul is detected OR the artist is a K-Pop artist: Select 100 (한국노래).
+              2. If Kana is detected OR the artist is a J-Pop/Japanese artist: Select 101 (일본노래).
+              3. Select 102 (팝송) ONLY if the artist is primarily from Western/English-speaking regions.
+              4. For all other cases or truly global/mixed origins, use 103 (글로벌).
+
+            [Selection Rules]
+            - Language Slot (100-199): EXACTLY 1 tag.
+            - Genre Slot (200-299): EXACTLY 1 tag.
+            - Origin Slot (300-399): 1 to 2 tags, sorted by relevance.

-            Guidelines:
-            1. Select at least one tag, but no more than 4.
-            2. Prioritize Language (100s), then Genre (200s), then Origin (300s).
-            3. If it's Japanese music, ALWAYS include 101 (J-POP).
-            4. Be precise. If it's from an Anime, use 302 (애니메이션).
-            5. Return only JSON: {"tag_ids": [number, number, ...]}
+            [Contextual Hints]
+            ${languageHints}

            Allowed Tags List:
            ${tagsPrompt}
Evidence
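The system prompt in getSongTag.ts only enforces a JSON object (via response_format) and does not specify the tag_ids
field or its format, yet the code immediately below trusts and returns result.tag_ids after JSON.parse. taggingSongs.ts,
which receives the return value, reads tagIds.length right away (assuming an array), so if tag_ids is missing,
misspelled, or not an array, a runtime error occurs and only the failure count increases.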

packages/crawling/src/utils/getSongTag.ts[56-99]
packages/crawling/src/cron/taggingSongs.ts[27-36]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`autoTagSong()` assumes the LLM response always provides `tag_ids` and returns `result.tag_ids` as-is. However, the current prompt only enforces `json_object` and does not specify the `tag_ids` field, so if the model returns a different key, a runtime error can occur at `tagIds.length`.

### Issue Context
- `response_format: { type: 'json_object' }` guarantees only a **valid JSON object**; it does not guarantee the object's **field names or schema**.
- The downstream code (`taggingSongs.ts`) assumes `tagIds` is an array.

### Fix Focus Areas
- packages/crawling/src/utils/getSongTag.ts[56-102]
- packages/crawling/src/cron/taggingSongs.ts[27-37]

### What to change
1) Explicitly restore or add the output schema in the system prompt. For example:
- "Return ONLY valid JSON with EXACTLY this shape: {\"tag_ids\": number[]}".

2) Add runtime validation after parsing.
- `const parsed = JSON.parse(content)`
- `const tagIds = Array.isArray(parsed.tag_ids) ? parsed.tag_ids : []`
- If the value is invalid, log it and return `[]`.

3) (Optional) In `taggingSongs.ts`, also defensively check `Array.isArray(tagIds)` before use (a second safety net).




Remediation recommended

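2. No LLM output token limit 🐞
Description
The max_tokens setting was removed from the OpenAI call, so if the model output becomes unnecessarily long, cost and latency increase and a failed response becomes more likely.
Because the prompt includes the entire tag list (tagsPrompt), setting an output limit is the safer choice.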
Code

packages/crawling/src/utils/getSongTag.ts[R55-82]

    // Step 2: Call the OpenAI API
    const response = await client.chat.completions.create({
-      model: 'gpt-4o-mini', // use a cost-effective model
+      model: 'gpt-4o-mini',
      messages: [
        {
          role: 'system',
          content: `
-            You are a music database expert. Based on the song title and artist, categorize the song by selecting appropriate tag IDs from the provided list.
+            You are a music database expert specializing in global artist categorization.
+
+            [Language Selection Strategy]
+            - **Do NOT** assume a song is 102 (팝송) solely based on English/Latin characters.
+            - If title/artist are in English, research the **artist's origin and primary market**.
+            - Priority Logic:
+              1. If Hangul is detected OR the artist is a K-Pop artist: Select 100 (한국노래).
+              2. If Kana is detected OR the artist is a J-Pop/Japanese artist: Select 101 (일본노래).
+              3. Select 102 (팝송) ONLY if the artist is primarily from Western/English-speaking regions.
+              4. For all other cases or truly global/mixed origins, use 103 (글로벌).
+
+            [Selection Rules]
+            - Language Slot (100-199): EXACTLY 1 tag.
+            - Genre Slot (200-299): EXACTLY 1 tag.
+            - Origin Slot (300-399): 1 to 2 tags, sorted by relevance.

-            Guidelines:
-            1. Select at least one tag, but no more than 4.
-            2. Prioritize Language (100s), then Genre (200s), then Origin (300s).
-            3. If it's Japanese music, ALWAYS include 101 (J-POP).
-            4. Be precise. If it's from an Anime, use 302 (애니메이션).
-            5. Return only JSON: {"tag_ids": [number, number, ...]}
+            [Contextual Hints]
+            ${languageHints}

            Allowed Tags List:
            ${tagsPrompt}
Evidence
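getTagsForPrompt builds a long string by joining the entire tags table with newlines, and autoTagSong embeds it in the system prompt
and sends it with every request. Even so, the completion.create call has no max_tokens, so the output upper bound is not controlled in code.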

packages/crawling/src/utils/getSongTag.ts[22-36]
packages/crawling/src/utils/getSongTag.ts[55-93]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
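`max_tokens` (the output token cap) was removed from the OpenAI call in `autoTagSong()`, so output length, cost, and latency are not controlled.

### Issue Context
A short JSON containing only the `tag_ids` array is sufficient as a response, so setting an output token cap is the safer choice.

### Fix Focus Areas
- packages/crawling/src/utils/getSongTag.ts[55-93]

### What to change
- Re-add `max_tokens` to the `client.chat.completions.create({...})` options with a reasonable value (e.g., 100~200), and leave a comment explaining why that value was chosen.
- (Optional) In case the output grows, log minimal debug information such as the length of `content` when parsing or validation fails (excluding sensitive information).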


@GulSam00 GulSam00 merged commit b5b8b33 into develop Apr 12, 2026
2 checks passed
@GulSam00 GulSam00 deleted the refactor/188-autoTagSongPreFetch branch April 12, 2026 14:18