fix(docs): correct contains* case-sensitivity in grader.md #1171
Merged
grader.md:42 incorrectly stated that `contains` is case-insensitive by default; the implementation uses raw `.includes()`, which is case-sensitive. Updated the `contains`, `contains-any`, and `contains-all` descriptions to reflect the actual behaviour, and added regression tests pinning case-sensitivity for all three functions.

Closes #1154

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Test runs create temp directories (e.g. `__tmp_bench_test__`) that biome picks up when it runs after the test step in the pre-push hook, causing spurious lint failures. Adding the pattern to `files.ignore` prevents this.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

The test spawns multiple bun child processes (pipeline input/grade/bench). This takes 5-10s in constrained environments, which exceeds bun's default per-test timeout of 5000ms.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Deploying agentv with
| Latest commit: | c601ec3 |
| Status: | ✅ Deploy successful! |
| Preview URL: | https://f73fc9a4.agentv.pages.dev |
| Branch Preview URL: | https://fix-1154-contains-case-sensi.agentv.pages.dev |
Closes #1154
What changed

`grader.md:42` incorrectly documented `contains` as "case-insensitive by default". The implementation uses raw `.includes()`, which is case-sensitive. This also made the `icontains*` entries internally inconsistent (they only make sense as a distinct variant if `contains*` is already case-sensitive).

Option taken: Option 1 from the issue, i.e. fix the documentation to match the implementation.
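The distinction described above can be sketched in a few lines. This is a minimal illustration, assuming only what the description states (that `contains*` is a raw `.includes()` check); the helper names below are hypothetical stand-ins and are not the agentv functions or their signatures.

```typescript
// Hypothetical helpers for illustration only; not the agentv implementation.
// Assumption (from the PR description): contains* uses raw String.prototype.includes.

function contains(output: string, value: string): boolean {
  // Raw substring check: 'Hello' and 'hello' are different strings.
  return output.includes(value);
}

function containsAny(output: string, values: string[]): boolean {
  // True if at least one value is a case-sensitive substring.
  return values.some((v) => output.includes(v));
}

function containsAll(output: string, values: string[]): boolean {
  // True only if every value is a case-sensitive substring.
  return values.every((v) => output.includes(v));
}

function icontains(output: string, value: string): boolean {
  // Case-insensitive variant: normalise both sides before comparing.
  return output.toLowerCase().includes(value.toLowerCase());
}

const output = 'Hello, world!';
console.log(contains(output, 'hello')); // false
console.log(contains(output, 'Hello')); // true
console.log(icontains(output, 'hello')); // true
console.log(containsAny(output, ['hello', 'world'])); // true ('world' matches)
console.log(containsAll(output, ['hello', 'world'])); // false ('hello' does not match)
```

If `contains` were already case-insensitive, `icontains` would return the same result for every input, which is why the old wording made the `icontains*` rows redundant.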
Changes

- `plugins/agentv-dev/skills/agentv-bench/agents/grader.md`: corrected the `contains`, `contains-any`, and `contains-all` descriptions to say `(case-sensitive)`. Simplified the `icontains*` row to just say `case-insensitive`.
- `packages/core/test/evaluation/graders/assertions.test.ts`: added regression tests pinning case-sensitivity for `runContainsAssertion`, `runContainsAnyAssertion`, and `runContainsAllAssertion`.

Incidental fixes (unrelated to #1154, caught by pre-push hook)

- `examples/red-team/archetypes/coding-agent/fixtures/poisoned-mcp-server.js`: biome lint/format fixes (double → single quotes, template literal) for issues introduced by "feat(examples): scenario-based red-team suites for coding and customer-facing agent archetypes" #1168.
- `biome.json`: added `**/__tmp_*/**` to `files.ignore` so test-created temp dirs don't trigger biome during the pre-push test→lint sequence.
- `apps/cli/test/commands/eval/pipeline/pipeline-e2e.test.ts`: set an explicit 30s timeout; the test spawns multiple bun child processes and was hitting the 5s default in constrained environments.

Red/green UAT
Red (before fix):
`grader.md:42` says `(case-insensitive by default)`.

Green (with fix):
`grader.md:42` says `(case-sensitive)`. The regression test `runContainsAssertion('Hello, world!', 'hello').score === 0` passes, confirming that the documented and tested behaviour are consistent.

Test plan

- `bun test packages/core/test/evaluation/graders/assertions.test.ts`: 12 pass, 0 fail