feat: add llms.txt detection and content-focused scoring criteria by IISweetHeartII · Pull Request #58 · agentgram/ax-score

IISweetHeartII · 2026-04-14T01:28:40Z

Summary

Implement all 5 parts of issue #51 — enhanced content-focused scoring for AX Score.

Changes

1. llms.txt Detection (High Priority)

Upgrade LlmsTxtAudit from binary → numeric scoring
Check for H1 heading, blockquote summary, sectioned URL lists
Add companion /llms-full.txt bonus detection
New llmsFullTxt probe in HttpGatherer
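
A rough sketch of how the numeric llms.txt scoring described above could work. The function names, the point values, and the definition of a "sectioned URL list" (an H2 heading followed by a markdown link item) are assumptions for illustration, not the values used in LlmsTxtAudit.

```typescript
// Hypothetical sketch: names and point values are assumptions, not the PR's code.
interface LlmsTxtSignals {
  hasH1: boolean;
  hasBlockquote: boolean;
  hasSectionedLinks: boolean;
}

function detectLlmsTxtSignals(body: string): LlmsTxtSignals {
  const lines = body.split(/\r?\n/);
  // "# Title" counts as H1; "## Section" does not, because "#" must be followed by whitespace.
  const hasH1 = lines.some((l) => /^#\s+\S/.test(l));
  const hasBlockquote = lines.some((l) => /^>\s*\S/.test(l));
  // Assumed definition: an H2 heading followed later by a markdown link list item.
  let inSection = false;
  let hasSectionedLinks = false;
  for (const line of lines) {
    if (/^##\s+\S/.test(line)) {
      inSection = true;
    } else if (inSection && /^\s*[-*]\s+\[[^\]]+\]\([^)]+\)/.test(line)) {
      hasSectionedLinks = true;
      break;
    }
  }
  return { hasH1, hasBlockquote, hasSectionedLinks };
}

function scoreLlmsTxt(body: string | null, hasLlmsFullTxt: boolean): number {
  if (body === null || body.trim() === "") return 0;
  const s = detectLlmsTxtSignals(body);
  let points = 40; // assumed base credit for the file existing
  if (s.hasH1) points += 20;
  if (s.hasBlockquote) points += 15;
  if (s.hasSectionedLinks) points += 15;
  if (hasLlmsFullTxt) points += 10; // companion /llms-full.txt bonus
  return Math.min(100, points) / 100;
}
```

Scoring in integer points and dividing once at the end avoids floating-point drift when partial credits are summed.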

2. JSON-LD Schema Type Analysis (High Priority)

Upgrade JsonLdAudit from binary → numeric scoring
Analyze specific schema types: WebSite, BlogPosting, Article, Person, Organization, BreadcrumbList, FAQPage, HowTo, WebPage
Handle @graph arrays and @type arrays
Give partial credit based on type richness
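
The @graph and @type array handling can be sketched as a recursive walk over parsed JSON-LD blocks. The helper names and the partial-credit curve below are assumptions; only the list of recognized schema types comes from the description above.

```typescript
// Hypothetical sketch: helper names and the credit curve are assumptions.
const RECOGNIZED_TYPES = new Set<string>([
  "WebSite", "BlogPosting", "Article", "Person", "Organization",
  "BreadcrumbList", "FAQPage", "HowTo", "WebPage",
]);

function collectSchemaTypes(doc: unknown, found: Set<string> = new Set<string>()): Set<string> {
  if (Array.isArray(doc)) {
    for (const item of doc) collectSchemaTypes(item, found);
    return found;
  }
  if (typeof doc !== "object" || doc === null) return found;
  const node = doc as Record<string, unknown>;
  // "@type" may be a single string or an array of strings.
  const rawType = node["@type"];
  const types = Array.isArray(rawType) ? rawType : [rawType];
  for (const t of types) {
    if (typeof t === "string") found.add(t);
  }
  // "@graph" holds a list of nodes that each carry their own "@type".
  if ("@graph" in node) collectSchemaTypes(node["@graph"], found);
  return found;
}

function scoreJsonLd(docs: unknown[]): number {
  const types = collectSchemaTypes(docs);
  if (types.size === 0) return 0;
  const recognized = Array.from(types).filter((t) => RECOGNIZED_TYPES.has(t)).length;
  // Assumed curve: 40 points for any typed JSON-LD, 20 more per recognized type.
  return Math.min(100, 40 + recognized * 20) / 100;
}
```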

3. AI Crawler Permissions (Medium Priority)

Expand AI crawler list from 4 → 10 agents
Add: ClaudeBot, PerplexityBot, ChatGPT-User, Applebot-Extended, Bard, anthropic-ai
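
As an illustration of what a per-agent permission check might look like, the sketch below scans robots.txt for groups that fully disallow a given crawler. It lists only the six agents added in this PR (the original four are not named here), and it is deliberately simplified: wildcard `User-agent: *` groups and `Allow` overrides are ignored. The function name is hypothetical.

```typescript
// Hypothetical sketch: only explicit "Disallow: /" rules per named agent are detected.
const ADDED_AI_AGENTS = [
  "ClaudeBot", "PerplexityBot", "ChatGPT-User",
  "Applebot-Extended", "Bard", "anthropic-ai",
];

function blockedAgents(robotsTxt: string, agents: string[]): string[] {
  const blocked = new Set<string>();
  let currentGroup: string[] = [];
  let lastWasAgent = false;
  for (const raw of robotsTxt.split(/\r?\n/)) {
    const line = raw.replace(/#.*$/, "").trim(); // strip comments
    const sep = line.indexOf(":");
    if (sep === -1) continue;
    const field = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();
    if (field === "user-agent") {
      // Consecutive User-agent lines share one group; otherwise start a new group.
      if (!lastWasAgent) currentGroup = [];
      currentGroup.push(value.toLowerCase());
      lastWasAgent = true;
    } else {
      lastWasAgent = false;
      if (field === "disallow" && value === "/") {
        for (const ua of currentGroup) blocked.add(ua);
      }
    }
  }
  return agents.filter((a) => blocked.has(a.toLowerCase()));
}
```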

4. Content Feed Detection (Medium Priority)

New ContentFeedAudit — checks for RSS/Atom feed availability
HTML <link rel="alternate"> feed link extraction in HtmlGatherer
Numeric scoring based on feed count
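
A minimal sketch of feed link extraction from HTML, assuming a regex-based approach over `<link>` tags and assuming that only the RSS and Atom MIME types count as feeds. The function below is illustrative and is not the actual HtmlGatherer code.

```typescript
// Hypothetical sketch: regex-based extraction; a real HTML parser would be more robust.
const FEED_TYPES = ["application/rss+xml", "application/atom+xml"];

function extractFeedLinks(html: string): string[] {
  const links = new Set<string>(); // Set removes duplicate hrefs
  const linkTags = html.match(/<link\b[^>]*>/gi) ?? [];
  for (const tag of linkTags) {
    // Read a quoted attribute value from the tag, if present.
    const attr = (name: string): string | undefined => {
      const m = new RegExp(`\\b${name}\\s*=\\s*["']([^"']*)["']`, "i").exec(tag);
      return m ? m[1] : undefined;
    };
    const rel = (attr("rel") ?? "").toLowerCase().split(/\s+/);
    const type = (attr("type") ?? "").toLowerCase();
    const href = attr("href");
    if (rel.includes("alternate") && FEED_TYPES.includes(type) && href) {
      links.add(href);
    }
  }
  return Array.from(links);
}
```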

5. Semantic HTML Analysis (Lower Priority)

Add <article> to semantic element checks
Add heading hierarchy validation (no skipped levels, e.g. h1 → h3)
Track headingLevels in SemanticElements
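
The heading hierarchy rule (no skipped levels, e.g. h1 → h3) can be sketched as a check over the ordered list of heading levels. The function names are hypothetical, and the regex-based extraction is an assumption about how headingLevels might be gathered.

```typescript
// Hypothetical sketch: names are illustrative, not the PR's actual helpers.
function extractHeadingLevels(html: string): number[] {
  const levels: number[] = [];
  const re = /<h([1-6])\b/gi; // matches <h1>..<h6>, not <header> or <hr>
  let m: RegExpExecArray | null;
  while ((m = re.exec(html)) !== null) {
    levels.push(Number(m[1]));
  }
  return levels;
}

function hasSkippedLevel(levels: number[]): boolean {
  // Going deeper by more than one level at a time is a skip;
  // moving back up by any amount (h3 → h1) is allowed.
  for (let i = 1; i < levels.length; i++) {
    if (levels[i] - levels[i - 1] > 1) return true;
  }
  return false;
}
```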

Config

Add content-feed audit (weight 4) to structured-data category
Rebalance json-ld (8), meta-tags (4), semantic-html (4) weights
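
To show how these weights interact, here is a small sketch of a weighted category score. The four weights are taken from the lines above; the object shape, the helper name, and the weighted-average formula are assumptions, and any other audits in the structured-data category are omitted.

```typescript
// Hypothetical sketch: weights from the PR description, everything else assumed.
interface AuditRef {
  id: string;
  weight: number;
}

const structuredDataAudits: AuditRef[] = [
  { id: "json-ld", weight: 8 },
  { id: "meta-tags", weight: 4 },
  { id: "semantic-html", weight: 4 },
  { id: "content-feed", weight: 4 },
];

// Weighted average of per-audit scores in [0, 1]; missing scores count as 0.
function weightedScore(audits: AuditRef[], scores: Record<string, number>): number {
  let total = 0;
  let earned = 0;
  for (const audit of audits) {
    total += audit.weight;
    earned += audit.weight * (scores[audit.id] ?? 0);
  }
  return total === 0 ? 0 : earned / total;
}
```

Under these assumptions, a site that passes only the json-ld audit earns 8 of 20 weight points in this subset, which shows how strongly json-ld dominates the other three.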

Testing

All 139 tests pass (7 new tests added)
TypeScript type-check clean
ESLint clean

Files Changed

20 files, +569/-71 lines

Fixes #51

IISweetHeartII · 2026-04-14T01:51:25Z

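Navi review: verdict is APPROVE.

The implementation accurately covers all 5 items of issue #51
CI (lint, type-check, test) clean
In particular, the scoring extensions for llms.txt, JSON-LD, feeds, and semantic HTML are consistent

Note: the current gh auth account is the same as the PR author, so GitHub does not allow a self-approve; the verdict is left as a comment instead.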

IISweetHeartII · 2026-04-17T02:48:18Z

CI failed: auto-label. Needs a fix.

IISweetHeartII · 2026-04-17T03:03:25Z

CI failed: auto-label (failure). Needs a fix.

IISweetHeartII · 2026-04-17T04:47:13Z

CI failed: auto-label. Needs a fix.

IISweetHeartII · 2026-04-17T05:15:12Z

Reviewed deeply. No blocking issues found. The llms.txt, JSON-LD, feed, and semantic HTML changes are internally consistent, test coverage was expanded alongside the new scoring logic, and the config/report updates match the new audit surface.

IISweetHeartII · 2026-04-17T05:25:44Z

Deep review is still positive on the code itself, but this PR is now blocked by merge conflicts against develop. I tried updating the branch and GitHub rejected the rebase due to conflicts, so this needs a conflict refresh before it can be merged.

IISweetHeartII · 2026-04-17T05:42:11Z

CI failed: auto-label. Needs a fix.

IISweetHeartII

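🐱 Navi review: APPROVE, no blocking issues.

All 5 changes for issue #51 are reasonable, and test coverage is sufficient.

Rationale:

llms.txt: the binary→numeric scoring switch and the breakdown into quality signals (H1, blockquote, sectioned URLs, length) are clean
JSON-LD: handling @graph and @type arrays covers the edge cases
AI crawlers: expanded from 4 to 10, a simple addition
ContentFeed: the new audit and its separation from the gatherer are well structured
Semantic HTML: <article> added, and the heading hierarchy validation logic is correct

Non-blocking (3 items):

ContentFeedAudit.detectBodyFeedLinks performs the same <link rel="alternate"> regex parsing as HtmlGatherer.extractFeedLinks. The gatherer already provides the result as feedLinks, so re-parsing the body in the audit is unnecessary. Because results are deduplicated through a Set, behavior is unaffected, but the same logic lives in two places.
ContentFeed score: min(1, feedCount * 0.5 + 0.5) → a single feed already yields the maximum score of 1.0, so the score does not differentiate between sites.
In runner.test.ts, adding the /llms-full.txt 404 condition broke the formatting of the original if ( statement. Behavior is unaffected.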
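
The saturation noted in the second non-blocking item is easy to check by evaluating the quoted formula. The first function below reproduces that formula as written in the review; the second is a purely hypothetical alternative curve, included only to show what a differentiating score could look like, and is not something the PR proposes.

```typescript
// Formula as quoted in the review comment.
function currentFeedScore(feedCount: number): number {
  return Math.min(1, feedCount * 0.5 + 0.5);
}

// Hypothetical alternative (an assumption, not part of the PR):
// 75 points for the first feed, 25 more for a second, capped at 100.
function gradedFeedScore(feedCount: number): number {
  if (feedCount <= 0) return 0;
  return Math.min(100, 50 + 25 * feedCount) / 100;
}

for (const n of [1, 2, 3]) {
  console.log(n, currentFeedScore(n), gradedFeedScore(n));
}
```

With the quoted formula, every feed count from 1 upward maps to 1.0; the alternative separates one feed from two or more.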

Implement all 5 parts of issue #51:

1. llms.txt Detection (High Priority)
- Upgrade LlmsTxtAudit from binary to numeric scoring
- Check for H1 heading, blockquote summary, sectioned URL lists
- Add companion /llms-full.txt bonus detection
- Add llmsFullTxt probe to HttpGatherer

2. JSON-LD Schema Type Analysis (High Priority)
- Upgrade JsonLdAudit from binary to numeric scoring
- Analyze specific schema types (WebSite, BlogPosting, Article, Person, Organization, BreadcrumbList, FAQPage, HowTo, WebPage)
- Handle @graph arrays and @type arrays
- Give partial credit based on type richness

3. AI Crawler Permissions (Medium Priority)
- Expand AI_USER_AGENTS from 4 to 10 crawlers
- Add ClaudeBot, PerplexityBot, ChatGPT-User, Applebot-Extended, Bard, anthropic-ai

4. Content Feed Detection (Medium Priority)
- New ContentFeedAudit checking RSS/Atom feed availability
- HTML <link rel='alternate'> feed link extraction in HtmlGatherer
- Numeric scoring based on feed count

5. Semantic HTML Analysis (Lower Priority)
- Add <article> element to semantic checks (was already gathered)
- Add heading hierarchy validation (no skipped levels)
- Add headingLevels tracking to SemanticElements

Config changes:
- Add content-feed audit (weight 4) to structured-data category
- Rebalance json-ld/meta-tags/semantic-html weights

Fixes #51

IISweetHeartII enabled auto-merge (squash) April 17, 2026 05:24

IISweetHeartII and others added 2 commits April 17, 2026 20:55

test: update site-type test fixtures for new gatherer fields

a839f85

Add llmsFullTxt, headingLevels, and feedLinks to test helpers to match updated HttpGatherResult and HtmlGatherResult interfaces.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

IISweetHeartII force-pushed the fix/issue-51 branch from 2a53fc0 to a839f85 Compare April 17, 2026 11:59

IISweetHeartII merged commit 76d8f50 into develop Apr 17, 2026
2 of 3 checks passed

IISweetHeartII deleted the fix/issue-51 branch April 17, 2026 11:59
