
fix: correct citation-claim mismatches in research report (v2.0.1) #429

Merged
jwm4 merged 3 commits into ambient-code:main from natifridman:research-improvements on May 14, 2026

Conversation

@natifridman
Contributor

@natifridman natifridman commented May 13, 2026

Summary

Addresses follow-up feedback from #374 (comment): three citations in Section 5.1 (Test Execution) don't support the claims they're attached to. All three were verified against the actual sources.

  • Terminal-Bench: cited under Test Execution but the 52.8%→66.5% improvement came from agent harness engineering (system prompts, middleware, loop detection), not test execution infrastructure. Also misattributed to LangChain — Terminal-Bench was created by Stanford/Laude Institute.
  • DORA 2025: URL pointed to a Google Cloud marketing page, not the actual DORA report. "TDD is more critical than ever" is marketing interpretation — the actual report finds AI adoption has a negative relationship with software delivery stability.
  • Salesforce Cursor: article is about using Cursor to write tests faster for a legacy coverage mandate — says nothing about test execution being important for AI agent effectiveness.

Changes

  • Removed 3 misaligned citations from Section 5.1, replaced with Cursor agent best practices (which explicitly says "write tests, give the agent clear signals")
  • Added Zhang et al. (arXiv:2604.11088) finding: positive directives hurt agent performance, only negative constraints help
  • Renamed report title from "Comprehensive Research" to "Curated Best Practices"
  • Added LIMITATIONS & EVIDENCE QUALITY section acknowledging source types and known gaps
  • Updated default-weights.yaml header comment and research_formatter.py template title
  • No tier or weight value changes — scoring behavior unchanged

Test plan

  • agentready research validate passes
  • All 74 research-related tests pass
  • Self-assessment score unchanged (73.2/100 before and after)
  • External repo assessment (Flask) runs without errors
  • Grep verification: no stale Terminal-Bench/DORA/Salesforce references remain (except in version history)
  • Zhang et al. appears in exec summary, Section 1.1, and references
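
The grep verification step above can be sketched as a small script. This is a hedged illustration only: the stale-string list and the version-history marker are assumptions taken from this PR's description, not the project's actual tooling.

```python
# Sketch of the "no stale references" check: scan the report text for
# removed citation names, allowing them only in the version history.
# The STALE list and section marker are illustrative assumptions.
STALE = ["Terminal-Bench", "DORA 2025", "Salesforce"]

def find_stale(text: str, allowed_section: str = "VERSION HISTORY") -> list[str]:
    """Return stale citation names appearing before the allowed section."""
    body = text.split(allowed_section)[0]  # only scan text before version history
    return [name for name in STALE if name in body]

report = (
    "## Testing\nCite Cursor agent best practices.\n"
    "## VERSION HISTORY\nv2.0.1: removed Terminal-Bench, DORA 2025, Salesforce citations.\n"
)
print(find_stale(report))  # -> []
```

In CI, the equivalent check could simply grep the report file and fail the build on any match outside the version-history section.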

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Documentation

    • Research report updated to v2.0.1: refined finding that positive directives can degrade agent performance while negative constraints can improve it.
    • Added "Limitations & Evidence Quality" section and updated methodology to note vendor best practices and arXiv preprints.
    • Revised citations and key research list; adjusted subtitle to "Curated Best Practices" and template title accordingly.
  • Updates

    • Clarified Testing & CI/CD guidance and supporting references.

natifridman and others added 2 commits May 13, 2026 11:42
Address follow-up feedback from ambient-code#374: three citations in Section 5.1
(Test Execution) didn't support the claims they were attached to.

- Remove Terminal-Bench citation (agent scaffolding, not test execution)
- Remove DORA 2025 marketing page (Google Cloud blog, not actual report)
- Remove Salesforce Cursor article (AI writing tests, not tests helping AI)
- Add Zhang et al. (arXiv:2604.11088) on positive vs negative directives
- Rename title from "Comprehensive Research" to "Curated Best Practices"
- Add LIMITATIONS & EVIDENCE QUALITY section
- Update default-weights.yaml header attribution

No tier or weight value changes. Scoring behavior unchanged.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The generate_template() in research_formatter.py had the old
"Comprehensive Research" title hardcoded. Updated to match the
v2.0.1 rename to "Curated Best Practices".

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented May 13, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: be24475e-e898-491f-a20b-cf4d6595b499

📥 Commits

Reviewing files that changed from the base of the PR and between ed3d113 and e674953.

📒 Files selected for processing (1)
  • src/agentready/data/RESEARCH_REPORT.md

📝 Walkthrough

Updated research report to v2.0.1 (2026-05-13): frontmatter and subtitle changed to "Curated Best Practices"; added a finding that positive directives hurt and negative constraints help (Zhang et al., Apr 2026); expanded "Directive Framing (Critical)"; revised testing citations/rationale; added a Limitations & Evidence Quality section; aligned generated template and YAML comment.

Changes

Research report content, citations, and supporting alignment

  • Frontmatter, version history, and metadata (src/agentready/data/RESEARCH_REPORT.md, lines 2–16, 2038–2048): Bumped version to 2.0.1 (2026-05-13), changed subtitle to "Curated Best Practices", adjusted reference_count (55 → 53), and added a v2.0.1 VERSION HISTORY entry documenting citation/evidence-quality corrections.
  • Key finding insertion (src/agentready/data/RESEARCH_REPORT.md, line 31): Inserted new evidence-backed key finding: positive directives degrade agent performance while negative constraints improve it (Zhang et al., Apr 2026).
  • Directive framing guidance (src/agentready/data/RESEARCH_REPORT.md, lines 86–90): Added a "Directive Framing (Critical)" subsection with explicit boundaries/prohibitions language and updated the citation block to include Zhang alongside retained ETH Zurich findings.
  • Testing & CI/CD citations and rationale (src/agentready/data/RESEARCH_REPORT.md, lines 541, 564): Revised the single-command test execution rationale to remove prior DORA marketing language and cite Red Hat; added a Cursor citation and removed earlier industry references (including Salesforce/LangChain mentions).
  • Limitations & evidence quality (src/agentready/data/RESEARCH_REPORT.md, lines 1958–1988): Added a "LIMITATIONS & EVIDENCE QUALITY" section describing source-type distribution, known research gaps, and scope boundaries ("what this document is not").
  • Methodology clarification (src/agentready/data/RESEARCH_REPORT.md, line 2072): Updated the Methodology line to state that evidence sources include vendor best practices and arXiv preprints (plus engineering experience).
  • Generated template and comment alignment (src/agentready/services/research_formatter.py generated-template title; src/agentready/data/default-weights.yaml comment): Updated the generated Markdown top-level title to match "Curated Best Practices"; replaced a YAML comment referencing "LangChain Terminal-Bench" with "Cursor agent guidelines".

Suggested labels: released

🚥 Pre-merge checks: ✅ 5 passed

  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title follows Conventional Commits format (fix: description) and accurately summarizes the main change: correcting citation-claim mismatches in the research report.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, which is sufficient; the required threshold is 80.00%.
  • Linked Issues Check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes Check: ✅ Passed. Check skipped because no linked issues were found for this pull request.


Warning

Review ran into problems

🔥 Problems

Git: Failed to clone repository. Please run the @coderabbitai full review command to re-trigger a full review. If the issue persists, set path_filters to include or exclude specific files.



@github-actions
Contributor

github-actions Bot commented May 13, 2026

📈 Test Coverage Report

Branch coverage: this PR 73.3%, main 73.3%, diff ✅ +0%

Coverage calculated from unit tests only

Contributor

@jwm4 jwm4 left a comment


🤖 AgentReady Code Review

PR Status: 1 issue found (0 🔴 Critical, 0 🟡 Major, 1 🔵 Minor)
Score Impact: None. No scoring weights or assessor logic were modified.
Certification: Gold remains at 80.0/100 after this PR.


🔵 Minor Issues - Low-Risk Fix

1. Zhang et al. citation has wrong title and broken URL

Confidence: 95%
Location: RESEARCH_REPORT.md inline citation and References section

Issue Details:
The paper at arXiv:2604.11088 is cited as "What Makes Repository-Level Instructions Effective for AI Agents?" — that title doesn't match. The actual title is "Do Agent Rules Shape or Distort? Guardrails Beat Guidance in Coding Agents". Additionally, the /html/ URL path returns a 404; the correct URL is https://arxiv.org/abs/2604.11088.

The finding itself (positive directives hurt, negative constraints help) is accurately described.

Remediation:

Title: "Do Agent Rules Shape or Distort? Guardrails Beat Guidance in Coding Agents"
URL:   https://arxiv.org/abs/2604.11088

Summary

The three citation removals in Section 5.1 are correct. The LIMITATIONS section is a meaningful addition. The removal of the Terminal-Bench finding from Section 1.1 is also the right call — that result was about agent harness engineering, not codebase attributes, so it doesn't belong in this document regardless of attribution.


🤖 Generated with Claude Code under the supervision of Bill Murdock

If this review was useful, react with 👍. Otherwise, react with 👎.

Addresses PR ambient-code#429 review: arXiv:2604.11088 actual title is "Do Agent
Rules Shape or Distort?" not "What Makes Repository-Level Instructions
Effective?", and /html/ URL returns 404 — switched to /abs/.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
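
The URL fix in this commit follows a general pattern: arXiv /html/ and /pdf/ paths can 404, while the /abs/ landing page is the stable citation target. A minimal sketch of that normalization (the helper name is hypothetical, not part of this repository):

```python
import re

def normalize_arxiv_url(url: str) -> str:
    """Rewrite arXiv /html/ or /pdf/ paths to the stable /abs/ landing page."""
    url = re.sub(r"arxiv\.org/(?:html|pdf)/", "arxiv.org/abs/", url)
    # Drop a trailing .pdf extension left over from /pdf/ links.
    return url[:-4] if url.endswith(".pdf") else url

print(normalize_arxiv_url("https://arxiv.org/html/2604.11088"))
# -> https://arxiv.org/abs/2604.11088
```

A check like this could run alongside the citation-title verification so broken render URLs are caught before merge.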
Contributor

@jwm4 jwm4 left a comment


Citation title and URL are now correct. LGTM.

🤖 Generated with Claude Code under the supervision of Bill Murdock

@jwm4 jwm4 merged commit c1b1824 into ambient-code:main May 14, 2026
6 checks passed
github-actions Bot pushed a commit that referenced this pull request May 14, 2026
## [2.37.1](v2.37.0...v2.37.1) (2026-05-14)

### Bug Fixes

* correct 3 citation-claim mismatches in research report (v2.0.1) ([0b91f44](0b91f44))
* correct Zhang et al. citation title and broken URL (v2.0.1) ([e674953](e674953)), closes [#429](#429)
* update research template title to match report rename ([ed3d113](ed3d113))
@github-actions
Contributor

🎉 This PR is included in version 2.37.1 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀

@natifridman natifridman deleted the research-improvements branch May 14, 2026 15:23
