
fix: correct citation-claim mismatches in research report (v2.0.1) #429

Merged
jwm4 merged 3 commits into ambient-code:main from natifridman:research-improvements on May 14, 2026

Conversation

@natifridman
Contributor

@natifridman natifridman commented May 13, 2026

Summary

Addresses follow-up feedback from #374 (comment): three citations in Section 5.1 (Test Execution) don't support the claims they're attached to. All three were verified against the actual sources.

  • Terminal-Bench: cited under Test Execution but the 52.8%→66.5% improvement came from agent harness engineering (system prompts, middleware, loop detection), not test execution infrastructure. Also misattributed to LangChain — Terminal-Bench was created by Stanford/Laude Institute.
  • DORA 2025: URL pointed to a Google Cloud marketing page, not the actual DORA report. "TDD is more critical than ever" is marketing interpretation — the actual report finds AI adoption has a negative relationship with software delivery stability.
  • Salesforce Cursor: article is about using Cursor to write tests faster for a legacy coverage mandate — says nothing about test execution being important for AI agent effectiveness.

Changes

  • Removed 3 misaligned citations from Section 5.1, replaced with Cursor agent best practices (which explicitly says "write tests, give the agent clear signals")
  • Added Zhang et al. (arXiv:2604.11088) finding: positive directives hurt agent performance, only negative constraints help
  • Renamed report title from "Comprehensive Research" to "Curated Best Practices"
  • Added LIMITATIONS & EVIDENCE QUALITY section acknowledging source types and known gaps
  • Updated default-weights.yaml header comment and research_formatter.py template title
  • No tier or weight value changes — scoring behavior unchanged

Test plan

  • agentready research validate passes
  • All 74 research-related tests pass
  • Self-assessment score unchanged (73.2/100 before and after)
  • External repo assessment (Flask) runs without errors
  • Grep verification: no stale Terminal-Bench/DORA/Salesforce references remain (except in version history)
  • Zhang et al. appears in exec summary, Section 1.1, and references
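
The grep verification step above can be sketched as a small script. This is a hedged illustration only: the stale-string list and the version-history marker are assumptions taken from this PR's description, not the project's actual tooling.

```python
# Sketch of the "no stale references" check: scan the report text for
# removed citation names, allowing them only in the version history.
# The STALE list and section marker are illustrative assumptions.
STALE = ["Terminal-Bench", "DORA 2025", "Salesforce"]

def find_stale(text: str, allowed_section: str = "VERSION HISTORY") -> list[str]:
    """Return stale citation names appearing before the allowed section."""
    body = text.split(allowed_section)[0]  # only scan text before version history
    return [name for name in STALE if name in body]

report = (
    "## Testing\nCite Cursor agent best practices.\n"
    "## VERSION HISTORY\nv2.0.1: removed Terminal-Bench, DORA 2025, Salesforce citations.\n"
)
print(find_stale(report))  # -> []
```

In CI, the equivalent check could simply grep the report file and fail the build on any match outside the version-history section.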

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Documentation

    • Research report updated to v2.0.1: refined finding that positive directives can degrade agent performance while negative constraints can improve it.
    • Added "Limitations & Evidence Quality" section and updated methodology to note vendor best practices and arXiv preprints.
    • Revised citations and key research list; adjusted subtitle to "Curated Best Practices" and template title accordingly.
  • Updates

    • Clarified Testing & CI/CD guidance and supporting references.

natifridman and others added 2 commits May 13, 2026 11:42
Address follow-up feedback from ambient-code#374: three citations in Section 5.1
(Test Execution) didn't support the claims they were attached to.

- Remove Terminal-Bench citation (agent scaffolding, not test execution)
- Remove DORA 2025 marketing page (Google Cloud blog, not actual report)
- Remove Salesforce Cursor article (AI writing tests, not tests helping AI)
- Add Zhang et al. (arXiv:2604.11088) on positive vs negative directives
- Rename title from "Comprehensive Research" to "Curated Best Practices"
- Add LIMITATIONS & EVIDENCE QUALITY section
- Update default-weights.yaml header attribution

No tier or weight value changes. Scoring behavior unchanged.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The generate_template() in research_formatter.py had the old
"Comprehensive Research" title hardcoded. Updated to match the
v2.0.1 rename to "Curated Best Practices".

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented May 13, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: be24475e-e898-491f-a20b-cf4d6595b499

📥 Commits

Reviewing files that changed from the base of the PR and between ed3d113 and e674953.

📒 Files selected for processing (1)
  • src/agentready/data/RESEARCH_REPORT.md

📝 Walkthrough

Updated research report to v2.0.1 (2026-05-13): frontmatter and subtitle changed to "Curated Best Practices"; added a finding that positive directives hurt and negative constraints help (Zhang et al., Apr 2026); expanded "Directive Framing (Critical)"; revised testing citations/rationale; added a Limitations & Evidence Quality section; aligned generated template and YAML comment.

Changes

Research report content, citations, and supporting alignment

  • Frontmatter, version history, and metadata (src/agentready/data/RESEARCH_REPORT.md, lines 2–16, 2038–2048): Bumped version to 2.0.1 (2026-05-13), changed subtitle to "Curated Best Practices", adjusted reference_count (55 → 53), and added a v2.0.1 VERSION HISTORY entry documenting citation/evidence-quality corrections.
  • Key finding insertion (src/agentready/data/RESEARCH_REPORT.md, line 31): Inserted new evidence-backed key finding: positive directives degrade agent performance while negative constraints improve it (Zhang et al., Apr 2026).
  • Directive framing guidance (src/agentready/data/RESEARCH_REPORT.md, lines 86–90): Added a "Directive Framing (Critical)" subsection with explicit boundaries/prohibitions language and updated the citation block to include Zhang alongside retained ETH Zurich findings.
  • Testing & CI/CD citations and rationale (src/agentready/data/RESEARCH_REPORT.md, lines 541, 564): Revised the single-command test execution rationale to remove prior DORA marketing language and cite Red Hat; added a Cursor citation and removed earlier industry references (including Salesforce/LangChain mentions).
  • Limitations & evidence quality (src/agentready/data/RESEARCH_REPORT.md, lines 1958–1988): Added a "LIMITATIONS & EVIDENCE QUALITY" section describing source-type distribution, known research gaps, and scope boundaries ("what this document is not").
  • Methodology clarification (src/agentready/data/RESEARCH_REPORT.md, line 2072): Updated the Methodology line to state that evidence sources include vendor best practices and arXiv preprints (plus engineering experience).
  • Generated template and comment alignment (src/agentready/services/research_formatter.py generated-template title; src/agentready/data/default-weights.yaml comment): Updated the generated Markdown top-level title to match "Curated Best Practices"; replaced a YAML comment referencing "LangChain Terminal-Bench" with "Cursor agent guidelines".

Suggested labels: released

🚥 Pre-merge checks: ✅ 5 passed

  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title follows Conventional Commits format (fix: description) and accurately summarizes the main change: correcting citation-claim mismatches in the research report.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, which is sufficient; the required threshold is 80.00%.
  • Linked Issues Check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes Check: ✅ Passed. Check skipped because no linked issues were found for this pull request.


Warning

Review ran into problems

🔥 Problems

Git: Failed to clone repository. Please run the @coderabbitai full review command to re-trigger a full review. If the issue persists, set path_filters to include or exclude specific files.



@github-actions
Contributor

github-actions Bot commented May 13, 2026

📈 Test Coverage Report

Branch coverage: this PR 73.3%, main 73.3%, diff ✅ +0%

Coverage calculated from unit tests only

Contributor

@jwm4 jwm4 left a comment


🤖 AgentReady Code Review

PR Status: 1 issue found (0 🔴 Critical, 0 🟡 Major, 1 🔵 Minor)
Score Impact: None. No scoring weights or assessor logic were modified.
Certification: Gold remains at 80.0/100 after this PR.


🔵 Minor Issues - Low-Risk Fix

1. Zhang et al. citation has wrong title and broken URL

Confidence: 95%
Location: RESEARCH_REPORT.md inline citation and References section

Issue Details:
The paper at arXiv:2604.11088 is cited as "What Makes Repository-Level Instructions Effective for AI Agents?" — that title doesn't match. The actual title is "Do Agent Rules Shape or Distort? Guardrails Beat Guidance in Coding Agents". Additionally, the /html/ URL path returns a 404; the correct URL is https://arxiv.org/abs/2604.11088.

The finding itself (positive directives hurt, negative constraints help) is accurately described.

Remediation:

Title: "Do Agent Rules Shape or Distort? Guardrails Beat Guidance in Coding Agents"
URL:   https://arxiv.org/abs/2604.11088

Summary

The three citation removals in Section 5.1 are correct. The LIMITATIONS section is a meaningful addition. The removal of the Terminal-Bench finding from Section 1.1 is also the right call — that result was about agent harness engineering, not codebase attributes, so it doesn't belong in this document regardless of attribution.


🤖 Generated with Claude Code under the supervision of Bill Murdock

If this review was useful, react with 👍. Otherwise, react with 👎.

Addresses PR ambient-code#429 review: arXiv:2604.11088 actual title is "Do Agent
Rules Shape or Distort?" not "What Makes Repository-Level Instructions
Effective?", and /html/ URL returns 404 — switched to /abs/.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
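
The URL fix in this commit follows a general pattern: arXiv /html/ and /pdf/ paths can 404, while the /abs/ landing page is the stable citation target. A minimal sketch of that normalization (the helper name is hypothetical, not part of this repository):

```python
import re

def normalize_arxiv_url(url: str) -> str:
    """Rewrite arXiv /html/ or /pdf/ paths to the stable /abs/ landing page."""
    url = re.sub(r"arxiv\.org/(?:html|pdf)/", "arxiv.org/abs/", url)
    # Drop a trailing .pdf extension left over from /pdf/ links.
    return url[:-4] if url.endswith(".pdf") else url

print(normalize_arxiv_url("https://arxiv.org/html/2604.11088"))
# -> https://arxiv.org/abs/2604.11088
```

A check like this could run alongside the citation-title verification so broken render URLs are caught before merge.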
Contributor

@jwm4 jwm4 left a comment


Citation title and URL are now correct. LGTM.

🤖 Generated with Claude Code under the supervision of Bill Murdock

@jwm4 jwm4 merged commit c1b1824 into ambient-code:main May 14, 2026
6 checks passed
github-actions Bot pushed a commit that referenced this pull request May 14, 2026
## [2.37.1](v2.37.0...v2.37.1) (2026-05-14)

### Bug Fixes

* correct 3 citation-claim mismatches in research report (v2.0.1) ([0b91f44](0b91f44))
* correct Zhang et al. citation title and broken URL (v2.0.1) ([e674953](e674953)), closes [#429](#429)
* update research template title to match report rename ([ed3d113](ed3d113))
@github-actions
Contributor

🎉 This PR is included in version 2.37.1 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀

@natifridman natifridman deleted the research-improvements branch May 14, 2026 15:23
