
Fix/ky verify #165

Merged

GulSam00 merged 2 commits into develop from fix/kyVerify on Mar 16, 2026

Conversation

@GulSam00
Owner

📌 PR Title

[Type] : summary of the work

📌 Changes

💬 Additional notes

- crawlYoutubeVerify: break out of the loop once index >= 2000
- validateSongMatch: wrap JSON parsing in try-catch, compare explicitly with isValid === true, raise max_tokens to 50
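The two hardening steps in validateSongMatch can be sketched as follows. This is an illustrative reconstruction, not the repository's actual code: the real function calls the OpenAI API, so parseMatchResponse and the MatchResult shape here are hypothetical stand-ins for the parsing half of the change.

```typescript
// Hypothetical stand-in for the parsing half of validateSongMatch.
interface MatchResult {
  isValid?: unknown;
}

function parseMatchResponse(raw: string): boolean {
  let result: MatchResult;
  try {
    // try-catch: a malformed model response no longer crashes the run.
    result = JSON.parse(raw) as MatchResult;
  } catch {
    return false;
  }
  // Explicit comparison: truthy-but-wrong values such as "true" or 1
  // are rejected instead of silently passing.
  return result.isValid === true;
}
```

With this shape, only a well-formed response whose isValid field is the boolean true counts as a match; malformed JSON and loosely-truthy values both fall through to false.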
@GulSam00 GulSam00 merged commit b353fda into develop Mar 16, 2026
2 checks passed
@vercel
Contributor

vercel bot commented Mar 16, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project    Deployment  Actions           Updated (UTC)
singcode   Ready       Preview, Comment  Mar 16, 2026 4:16pm

@GulSam00 GulSam00 deleted the fix/kyVerify branch March 16, 2026 16:16
@qodo-code-review

Review Summary by Qodo

Fix crawl verification stability and alias mapping issues

🐞 Bug fix ✨ Enhancement


Walkthroughs

Description
• Fix typo in krToJpnArtistSort alias mapping for ASIAN KUNG-FU GENERATION
• Add 2000-record limit to crawlYoutubeVerify crawling loop
• Improve validateSongMatch stability with try-catch and better GPT prompt
• Enable scheduled execution for verify_ky_youtube workflow
Diagram
flowchart LR
  A["krToJpnArtistSort"] -- "fix alias key order" --> B["ASIAN KUNG-FU GENERATION mapped correctly"]
  C["crawlYoutubeVerify"] -- "break at index >= 2000" --> D["Limit crawl to 2000 records"]
  E["validateSongMatch"] -- "improved prompt + try-catch" --> F["Stable JSON parsing & better matching"]
  G["verify_ky_youtube.yml"] -- "add schedule trigger" --> H["Daily cron at 23:00 KST"]


File Changes

1. apps/web/src/constants/krToJpnArtist.ts 🐞 Bug fix +1/-1

Fix alias key for ASIAN KUNG-FU GENERATION

• Removed incorrect English-keyed entry for ASIAN KUNG-FU GENERATION
• Added correct Korean-keyed entry '아시안 쿵푸 제너레이션' mapped to 'ASIAN KUNG-FU GENERATION' in the proper alphabetical section



2. packages/crawling/src/crawling/crawlYoutubeVerify.ts ✨ Enhancement +2/-0

Limit crawl loop to 2000 records

• Added a break statement when index >= 2000 to limit crawling to 2000 records per run



3. packages/crawling/src/utils/validateSongMatch.ts 🐞 Bug fix +19/-6

Improve GPT song match validation stability

• Expanded GPT system prompt with explicit matching rules for parenthetical info, language variants, and performing artist logic
• Improved user message format for clarity
• Increased max_tokens from 20 to 50
• Wrapped JSON.parse in try-catch to handle malformed responses
• Changed return to explicit result.isValid === true comparison



4. .github/workflows/update_ky_youtube.yml ⚙️ Configuration changes +0/-1

Re-enable scheduled update_ky_youtube workflow

• Removed comment that indicated the workflow was paused, re-enabling the scheduled trigger



5. .github/workflows/verify_ky_youtube.yml ⚙️ Configuration changes +2/-0

Add daily schedule to verify_ky_youtube workflow

• Added a schedule trigger with daily cron at 14:00 UTC (23:00 KST) to the verify workflow





@qodo-code-review

qodo-code-review bot commented Mar 16, 2026

Code Review by Qodo

🐞 Bugs (4) 📘 Rule violations (0) 📎 Requirement gaps (0)



Action required

1. Cap bypassed on errors 🐞 Bug ⛯ Reliability
Description
In crawlYoutubeVerify, the new index >= 2000 break is skipped when isValidKYExistNumber throws
because the catch block continues before reaching the break check, so the job can still iterate
the full dataset under repeated validation failures. This defeats the intended runtime/API-call
limiter added in this PR.
Code

packages/crawling/src/crawling/crawlYoutubeVerify.ts[R46-49]

  index++;
  console.log('crawlYoutubeVerify : ', index);
+
+  if (index >= 2000) break;
Evidence
The loop increments index and immediately continues inside the catch, which bypasses the later
if (index >= 2000) break; guard. Therefore the cap only applies to the non-error path, and the
PR’s limiter does not constrain runs during common failure modes (network/KY site issues).

packages/crawling/src/crawling/crawlYoutubeVerify.ts[26-50]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`crawlYoutubeVerify` adds a 2000-item cap, but `continue` in the error path bypasses the break check, so failures can still lead to unbounded processing.

### Issue Context
The limiter appears intended to bound runtime/cost for scheduled execution.

### Fix Focus Areas
- packages/crawling/src/crawling/crawlYoutubeVerify.ts[26-50]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
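One way to honor the cap on both the success and error paths is to keep the check outside the try/catch, so no `continue` can skip it. This is a hedged sketch with a hypothetical loop shape and helper names, not the repository's actual crawl code:

```typescript
// Illustrative loop: the cap check sits after the try/catch, so it
// runs on every iteration, including iterations where validation throws.
async function crawlWithCap(
  records: number[],
  validate: (n: number) => Promise<boolean>,
  cap = 2000,
): Promise<number> {
  let index = 0;
  for (const record of records) {
    try {
      await validate(record);
    } catch {
      // Swallow the failure and fall through; do NOT `continue` here,
      // or the cap check below would be bypassed.
    }
    index++;
    if (index >= cap) break; // reached on success and on failure
  }
  return index;
}
```

With this shape, a run in which every validation throws still stops at the cap instead of iterating the full dataset.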



Remediation recommended

2. Browser not closed on crash 🐞 Bug ⛯ Reliability
Description
crawlYoutubeVerify launches a Puppeteer browser at top-level but does not use try/finally and
does not await browser.close(), so unexpected exceptions can leave the browser running and cause
the scheduled job to hang or leak resources. This is especially risky now that the workflow is
scheduled to run unattended.
Code

packages/crawling/src/crawling/crawlYoutubeVerify.ts[52]

browser.close();
Evidence
crawlYoutubeVerify uses top-level await puppeteer.launch(...) and ends with a non-awaited
browser.close(). In contrast, crawlYoutube.ts demonstrates the safer pattern: wrap work in `try
{ ... } finally { await browser.close(); }` to ensure cleanup even on errors.

packages/crawling/src/crawling/crawlYoutubeVerify.ts[12-17]
packages/crawling/src/crawling/crawlYoutubeVerify.ts[52-52]
packages/crawling/src/crawling/crawlYoutube.ts[83-155]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`crawlYoutubeVerify.ts` does not guarantee browser cleanup and does not await closing, which can hang/leak resources on exceptions.

### Issue Context
This script is now intended to run on a schedule, so cleanup must be deterministic.

### Fix Focus Areas
- packages/crawling/src/crawling/crawlYoutubeVerify.ts[12-17]
- packages/crawling/src/crawling/crawlYoutubeVerify.ts[52-52]
- packages/crawling/src/crawling/crawlYoutube.ts[83-155]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
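The try/finally pattern the review points to can be sketched as below. The Browser interface and runCrawl helper are stand-ins so the example is self-contained; in the real script the browser would come from puppeteer.launch and the crawl body would be the existing verification logic.

```typescript
// Stand-in for the Puppeteer browser so the sketch is self-contained.
interface Browser {
  close(): Promise<void>;
}

async function runCrawl(
  launch: () => Promise<Browser>,
  crawl: (b: Browser) => Promise<void>,
): Promise<void> {
  const browser = await launch();
  try {
    await crawl(browser);
  } finally {
    // Awaited and guaranteed: the browser is closed even when
    // crawl() throws, so a scheduled run cannot leak a Chromium process.
    await browser.close();
  }
}
```

The finally block runs whether crawl resolves or rejects, which is what makes the cleanup deterministic for an unattended scheduled job.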


3. Scheduled job over-privileged 🐞 Bug ⛨ Security
Description
The verify workflow is now triggered on a daily schedule but still grants contents: write, giving
unnecessary repository write access to a job that only checks out code and runs a crawler script.
This increases the blast radius if the job or a dependency is compromised.
Code

.github/workflows/verify_ky_youtube.yml[R4-5]

+  schedule:
+    - cron: "0 14 * * *" # runs at 23:00 Korea time (UTC+9 → UTC 14:00)
Evidence
The PR adds a cron schedule to the verify workflow. The workflow also explicitly sets `permissions: contents: write`, while its steps only run `pnpm run ky-verify` (no git commit/push steps shown) and the entrypoint script performs web crawling and Supabase updates, not repository writes.

.github/workflows/verify_ky_youtube.yml[3-10]
.github/workflows/verify_ky_youtube.yml[15-43]
packages/crawling/src/crawling/crawlYoutubeVerify.ts[1-6]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
The verify workflow is now scheduled but still requests `contents: write` without evidence of repo writes.

### Issue Context
Scheduled workflows run unattended; minimizing token scope reduces security risk.

### Fix Focus Areas
- .github/workflows/verify_ky_youtube.yml[3-10]
- .github/workflows/verify_ky_youtube.yml[15-43]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
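A minimal sketch of the narrower permission block, assuming the job truly performs no repository writes (verify against the workflow's actual steps before applying):

```yaml
# verify_ky_youtube.yml: a checkout-only job needs read access at most
permissions:
  contents: read
```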


4. Two schedulers run together 🐞 Bug ⛯ Reliability
Description
After adding a schedule to the verify workflow, both update_ky_youtube and verify_ky_youtube run
at the same cron time, causing two independent Puppeteer/OpenAI-driven crawlers to execute
concurrently. This increases the chance of rate-limits/timeouts and unnecessary load/cost at 23:00
KST.
Code

.github/workflows/verify_ky_youtube.yml[R4-5]

+  schedule:
+    - cron: "0 14 * * *" # runs at 23:00 Korea time (UTC+9 → UTC 14:00)
Evidence
Both workflows are configured with the identical cron expression. Each workflow runs a crawling
script from packages/crawling (update runs ky-youtube; verify runs ky-verify), and the
crawling path uses Puppeteer and calls validateSongMatch (OpenAI) via isValidKYExistNumber, so
concurrency directly increases external call volume at the same time window.

.github/workflows/verify_ky_youtube.yml[3-7]
.github/workflows/update_ky_youtube.yml[3-6]
.github/workflows/update_ky_youtube.yml[41-43]
.github/workflows/verify_ky_youtube.yml[41-43]
packages/crawling/src/crawling/isValidKYExistNumber.ts[31-38]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
Two workflows are now scheduled at the same time and both run crawling scripts, increasing concurrent load and failure risk.

### Issue Context
Both jobs use Puppeteer and (via `validateSongMatch`) OpenAI calls.

### Fix Focus Areas
- .github/workflows/verify_ky_youtube.yml[3-7]
- .github/workflows/update_ky_youtube.yml[3-6]
- .github/workflows/update_ky_youtube.yml[41-43]
- .github/workflows/verify_ky_youtube.yml[41-43]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
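One hedged option is to stagger the two crons so the crawlers do not start simultaneously; the one-hour offset below is illustrative, not a recommendation from the PR:

```yaml
# verify_ky_youtube.yml: start one hour after update_ky_youtube
schedule:
  - cron: "0 15 * * *" # 00:00 KST, one hour after the update job
```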






Comment on lines 46 to +49
index++;
console.log('crawlYoutubeVerify : ', index);

if (index >= 2000) break;