[Refactor] : autoTagSong 태그 프롬프트 외부 주입으로 변경 (#188)#189
Conversation
|
/describe |
|
/review |
|
/improve |
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
Code Review by Qodo
|
Code Review by QodoNew Review StartedThis review has been superseded by a new analysisⓘ The new review experience is currently in Beta. Learn more |
Review Summary by QodoRefactor autoTagSong to inject tags prompt externally and improve AI categorization
WalkthroughsDescription• Extract getTagsForPrompt and pre-fetch tags before loop • Pass tagsPrompt as parameter to autoTagSong function • Remove module-level cache, delegate to caller management • Enhance AI prompt with language detection heuristics • Improve tag selection logic for K-Pop, J-Pop, and global artists Diagramflowchart LR
A["taggingSongs.ts"] -->|"Promise.all"| B["getTagsForPrompt"]
A -->|"getSongsAllDB"| C["allSongs"]
A -->|"getSongTagSongIdsDB"| D["taggedSongIds"]
B -->|"returns tagsPrompt"| E["autoTagSong"]
C -->|"iterate songs"| E
E -->|"enhanced AI prompt"| F["OpenAI API"]
F -->|"tag_ids"| G["postSongTagsDB"]
File Changes1. packages/crawling/src/cron/taggingSongs.ts
|
|
PR Description updated to latest commit (011f16d)
|
Code Review by Qodo
|
| content: ` | ||
| You are a music database expert. Based on the song title and artist, categorize the song by selecting appropriate tag IDs from the provided list. | ||
| You are a music database expert specializing in global artist categorization. | ||
|
|
||
| [Language Selection Strategy] | ||
| - **Do NOT** assume a song is 102 (팝송) solely based on English/Latin characters. | ||
| - If title/artist are in English, research the **artist's origin and primary market**. | ||
| - Priority Logic: | ||
| 1. If Hangul is detected OR the artist is a K-Pop artist: Select 100 (한국노래). | ||
| 2. If Kana is detected OR the artist is a J-Pop/Japanese artist: Select 101 (일본노래). | ||
| 3. Select 102 (팝송) ONLY if the artist is primarily from Western/English-speaking regions. | ||
| 4. For all other cases or truly global/mixed origins, use 103 (글로벌). | ||
|
|
||
| [Selection Rules] | ||
| - Language Slot (100-199): EXACTLY 1 tag. | ||
| - Genre Slot (200-299): EXACTLY 1 tag. | ||
| - Origin Slot (300-399): 1 to 2 tags, sorted by relevance. | ||
|
|
||
| Guidelines: | ||
| 1. Select at least one tag, but no more than 4. | ||
| 2. Prioritize Language (100s), then Genre (200s), then Origin (300s). | ||
| 3. If it's Japanese music, ALWAYS include 101 (J-POP). | ||
| 4. Be precise. If it's from an Anime, use 302 (애니메이션). | ||
| 5. Return only JSON: {"tag_ids": [number, number, ...]} | ||
| [Contextual Hints] | ||
| ${languageHints} | ||
|
|
||
| Allowed Tags List: | ||
| ${tagsPrompt} |
There was a problem hiding this comment.
1. Llm json 스키마 불일치 🐞 Bug ≡ Correctness
autoTagSong은 응답을 {tag_ids:number[]}로 파싱해 result.tag_ids를 그대로 반환하지만, 새 system 프롬프트에는 tag_ids 키를 반드시
포함하라는 출력 스키마 지시가 없어 json_object 모드에서 다른 키로 응답할 경우 tagIds가 undefined가 됩니다. 그 결과 taggingSongs에서
tagIds.length 접근 시 TypeError가 발생해 해당 곡 태깅이 실패합니다.
Agent Prompt
### Issue description
`autoTagSong()`이 LLM 응답에서 `tag_ids`를 항상 제공한다고 가정하고 `result.tag_ids`를 그대로 반환합니다. 그러나 현재 프롬프트는 `json_object`만 강제하고 `tag_ids` 필드를 명시하지 않아, 모델이 다른 키로 반환하면 `tagIds.length`에서 런타임 오류가 발생할 수 있습니다.
### Issue Context
- `response_format: { type: 'json_object' }`는 **유효한 JSON 객체**만 보장하며, 객체의 **필드명/스키마**는 보장하지 않습니다.
- 다운스트림(`taggingSongs.ts`)은 `tagIds`가 배열임을 전제로 동작합니다.
### Fix Focus Areas
- packages/crawling/src/utils/getSongTag.ts[56-102]
- packages/crawling/src/cron/taggingSongs.ts[27-37]
### What to change
1) system 프롬프트에 출력 스키마를 명시적으로 복구/추가하세요. 예:
- "Return ONLY valid JSON with EXACTLY this shape: {\"tag_ids\": number[]}".
2) 파싱 후 런타임 검증을 추가하세요.
- `const parsed = JSON.parse(content)`
- `const tagIds = Array.isArray(parsed.tag_ids) ? parsed.tag_ids : []`
- 유효하지 않으면 로그를 남기고 `[]` 반환.
3) (선택) `taggingSongs.ts`에서도 방어적으로 `Array.isArray(tagIds)` 체크 후 사용하세요(이중 안전망).
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
| role: 'system', | ||
| content: ` | ||
| You are a music database expert. Based on the song title and artist, categorize the song by selecting appropriate tag IDs from the provided list. | ||
| You are a music database expert specializing in global artist categorization. | ||
|
|
||
| [Language Selection Strategy] | ||
| - **Do NOT** assume a song is 102 (팝송) solely based on English/Latin characters. | ||
| - If title/artist are in English, research the **artist's origin and primary market**. | ||
| - Priority Logic: | ||
| 1. If Hangul is detected OR the artist is a K-Pop artist: Select 100 (한국노래). | ||
| 2. If Kana is detected OR the artist is a J-Pop/Japanese artist: Select 101 (일본노래). | ||
| 3. Select 102 (팝송) ONLY if the artist is primarily from Western/English-speaking regions. | ||
| 4. For all other cases or truly global/mixed origins, use 103 (글로벌). | ||
|
|
||
| [Selection Rules] | ||
| - Language Slot (100-199): EXACTLY 1 tag. | ||
| - Genre Slot (200-299): EXACTLY 1 tag. | ||
| - Origin Slot (300-399): 1 to 2 tags, sorted by relevance. | ||
|
|
||
| Guidelines: | ||
| 1. Select at least one tag, but no more than 4. | ||
| 2. Prioritize Language (100s), then Genre (200s), then Origin (300s). | ||
| 3. If it's Japanese music, ALWAYS include 101 (J-POP). | ||
| 4. Be precise. If it's from an Anime, use 302 (애니메이션). | ||
| 5. Return only JSON: {"tag_ids": [number, number, ...]} | ||
| [Contextual Hints] | ||
| ${languageHints} | ||
|
|
||
| Allowed Tags List: | ||
| ${tagsPrompt} |
There was a problem hiding this comment.
1. Ambiguous llm json contract 🐞 Bug ≡ Correctness
autoTagSong() still parses the response as {"tag_ids": number[]} but the updated system prompt no
longer instructs the model to return a tag_ids field, so valid JSON responses without that key will
make the function return undefined and cause repeated per-song failures.
Agent Prompt
## Issue description
`autoTagSong()` parses the OpenAI response as `{ tag_ids: number[] }`, but the updated prompt no longer requires the model to return a `tag_ids` field. This makes the parser contract ambiguous and can yield `undefined`/non-array values.
## Issue Context
The call uses `response_format: { type: 'json_object' }`, which enforces JSON validity but does not guarantee a particular key name or schema.
## Fix Focus Areas
- packages/crawling/src/utils/getSongTag.ts[55-99]
## Suggested fix
1. Re-add an explicit output instruction in the system prompt, e.g.:
- `Return JSON with this exact shape: {"tag_ids": [number, ...]}`
2. Add runtime validation after parsing:
- If `result.tag_ids` is not an array of numbers, return `[]` (and optionally log the raw content for debugging).
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
User description
📌 PR 제목
[Refactor] : autoTagSong 태그 프롬프트 외부 주입으로 변경
📌 변경 사항
getTagsForPrompt를 export하여 호출부에서 미리 한 번만 가져오도록 변경autoTagSong이tagsPrompt를 파라미터로 받도록 시그니처 변경taggingSongs.ts에서 루프 진입 전Promise.all로 태그 프롬프트를 사전 로드cachedTagsPrompt) 제거 — 호출부가 관리💬 추가 참고 사항
PR Type
Enhancement, Refactor
Description
Refactor
autoTagSongto accepttagsPromptas parameter instead of fetching internallyPre-fetch tag prompts in
taggingSongs.tsusingPromise.allbefore loopExport
getTagsForPromptfunction for external caller managementRemove module-level cache (
cachedTagsPrompt) and shift responsibility to callerEnhance AI prompt with language detection heuristics for better tag categorization
Improve tagging strategy with explicit rules for language slot selection
Diagram Walkthrough
File Walkthrough
taggingSongs.ts
Pre-fetch tag prompts before processing looppackages/crawling/src/cron/taggingSongs.ts
getTagsForPromptfunction alongsideautoTagSonggetTagsForPrompt()toPromise.allbatch to pre-fetch tags beforeloop
tagsPromptparameter toautoTagSongcall in loopgetSongTag.ts
Export getTagsForPrompt and enhance AI tagging strategypackages/crawling/src/utils/getSongTag.ts
getTagsForPromptfunction for external usecachedTagsPromptvariable and caching logicautoTagSongsignature to accepttagsPromptas third parameterlanguageHintsstring based on detected scriptsand explicit tag slot rules
autoTagSong