Merged
- Remove non-standard fields: metadata, tiers, installation_profiles
- Remove plugin-level non-standard fields: tier, license, keywords, dependencies, commands, agents, strict
- Change 'tier' to 'category' using standard values (development, productivity, learning, security)
- Add description at root level
- Add email to author objects
- Maintain all 22 plugins with simplified structure
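The field changes listed above can be sketched as a small transformation. This is only an illustration, not the script used for this PR: the function name `simplify_manifest`, the `plugins` and `author` key names, the tier-to-category mapping, and the choice to add the email only to plugin-level author objects are all assumptions.

```python
import copy

# Field names taken from the change list above.
ROOT_DROP = {"metadata", "tiers", "installation_profiles"}
PLUGIN_DROP = {"tier", "license", "keywords", "dependencies",
               "commands", "agents", "strict"}
STANDARD_CATEGORIES = {"development", "productivity", "learning", "security"}


def simplify_manifest(manifest, tier_to_category, description, author_email):
    """Return a simplified copy of a marketplace manifest (hypothetical helper).

    tier_to_category is a caller-supplied mapping; the PR does not state
    which tier maps to which category.
    """
    # Drop root-level non-standard fields; deep-copy so the input is untouched.
    out = {k: copy.deepcopy(v) for k, v in manifest.items() if k not in ROOT_DROP}
    out["description"] = description

    plugins = []
    for plugin in out.get("plugins", []):  # "plugins" key is an assumption
        category = tier_to_category[plugin["tier"]]
        if category not in STANDARD_CATEGORIES:
            raise ValueError(f"non-standard category: {category!r}")
        slim = {k: v for k, v in plugin.items() if k not in PLUGIN_DROP}
        slim["category"] = category
        author = slim.get("author")
        if isinstance(author, dict):
            # Only fill in an email when the author object lacks one.
            author.setdefault("email", author_email)
        plugins.append(slim)
    out["plugins"] = plugins
    return out
```

Passing the mapping in as an argument keeps the sketch honest about what the PR leaves unstated: only the four target category values are given, not how the old tiers map onto them.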
boshu2 added a commit that referenced this pull request Apr 3, 2026
- Remove non-standard fields: metadata, tiers, installation_profiles
- Remove plugin-level non-standard fields: tier, license, keywords, dependencies, commands, agents, strict
- Change 'tier' to 'category' using standard values (development, productivity, learning, security)
- Add description at root level
- Add email to author objects
- Maintain all 22 plugins with simplified structure

Co-authored-by: Claude <noreply@anthropic.com>
boshu2 pushed a commit that referenced this pull request Apr 26, 2026
…sserts

The fuzz target proves no-panic but never asserts the parser populates chain.ID, chain.EpicID, or chain.Entries correctly for its seed corpus. A regression that silently dropped entries would still pass.

Add TestFuzzParseChainLines_SeedCorrectness covering all 7 seeds:
- metadata-plus-one-entry → ID + 1 entry, first step "research"
- metadata-only → ID set, 0 entries
- empty input → no error, no entries
- non-JSON first line → returns error
- malformed entry between valid ones → entry skipped, valid one survives
- empty metadata object → no error, empty fields
- epic_id with two entries → ID + EpicID + 2 entries

Closes the post-mortem #6 finding "Add fuzz seed correctness assertions" for the last fuzz target that lacked one (cli/cmd/ao already had companion SeedCorrectness tests for fuzz_jsonl_test.go and fuzz_context_test.go).
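The seed-correctness idea can be illustrated with a Python analog. This is a sketch, not the Go test from the commit: the parser `parse_chain_lines`, the `Chain` class, and the JSON keys `id` and `step` are assumptions made for illustration (only `epic_id` and the step value "research" appear in the commit message). It shows one assertion per seed on the parsed result, rather than only checking that parsing does not crash.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Chain:
    id: str = ""
    epic_id: str = ""
    entries: list = field(default_factory=list)


def parse_chain_lines(text):
    """Hypothetical parser: first line is metadata, the rest are entries."""
    chain = Chain()
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return chain  # empty input: no error, no entries
    meta = json.loads(lines[0])  # raises ValueError on a non-JSON first line
    if not isinstance(meta, dict):
        raise ValueError("metadata line must be a JSON object")
    chain.id = meta.get("id", "")
    chain.epic_id = meta.get("epic_id", "")
    for ln in lines[1:]:
        try:
            entry = json.loads(ln)
        except json.JSONDecodeError:
            continue  # malformed entry is skipped, later ones survive
        if isinstance(entry, dict):
            chain.entries.append(entry)
    return chain


def check_seeds():
    """Assert the parsed fields for each of the seven seed shapes."""
    c = parse_chain_lines('{"id": "c1"}\n{"step": "research"}')
    assert c.id == "c1" and len(c.entries) == 1
    assert c.entries[0]["step"] == "research"

    c = parse_chain_lines('{"id": "c1"}')
    assert c.id == "c1" and c.entries == []

    c = parse_chain_lines("")
    assert c.id == "" and c.entries == []

    try:
        parse_chain_lines("not json")
        raise AssertionError("expected an error for a non-JSON first line")
    except ValueError:
        pass

    c = parse_chain_lines('{"id": "c1"}\n{broken\n{"step": "plan"}')
    assert [e["step"] for e in c.entries] == ["plan"]

    c = parse_chain_lines("{}")
    assert c.id == "" and c.epic_id == "" and c.entries == []

    c = parse_chain_lines('{"id": "c1", "epic_id": "e1"}\n{"step": "a"}\n{"step": "b"}')
    assert (c.id, c.epic_id, len(c.entries)) == ("c1", "e1", 2)
```

A parser that silently dropped every entry would still pass a no-panic fuzz run, but it would fail the first and last checks here.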
boshu2 added a commit that referenced this pull request Apr 30, 2026
Final disposition of 10 harvested items:
- 3 completed: #2 (brief_render delete, ee4e90a5), #4 (Tier 3 docs clarify, 2079ff78), #10 (validate-cli-skills-map wired, 5842445c)
- 2 wont_fix:
  - #3 supervisor ctx-cancel (existing test asserts current behavior is correct; analyst conflated supervisor-shutdown ctx with operator-cancel API)
  - #8 JobSpec v0 (already wired via submitRPIPhasedDaemon POSTing to /v1/jobs; analyst missed call-site)
- 1 in_progress: #1 pend- pollution → soc-2ctn (P0)
- 4 deferred:
  - #5 eval determinism → soc-v7s8 (test design cycle)
  - #6 GC v1.0.0 options → soc-ey2h (upstream coordination)
  - #7 control plane beads → soc-b0eq (plan-level decision)
  - #9 snapshot caching → soc-hns4 (real perf work)

Pattern observation: 2 of 10 items were post-mortem-analyst false positives that the /rpi --auto discovery phase caught before implementation. See .agents/learnings/2026-04-30-post-mortem-recommendations-need-test-validation.md.

batch consumed: rpi-auto-2026-04-30-1648
boshu2 added a commit that referenced this pull request May 1, 2026
…atom-1, soc-sijf)

First atom of soc-7ftl chain (per-absorption #6, plans.projection pilot). Adds the JobType + projection name + GET /v1/plans/manifest, GET /v1/plans/diff stub handlers. Executor and projection body land in atom-2 (soc-acwf).

Resolves foundation gap G1 (read-path capability site is server.go route table, not auth.go mutation map) per the §6 site 3 (alt) carve-out documented in .agents/plans/2026-05-01-daemon-absorption-spec/00-foundation-contract.md and applies F-PM-2 (docs/contracts/agentops-daemon.md catalogues the new job-type).

Plan: .agents/plans/2026-05-01-absorption-6-pilot-implementation.md
Closure proofs: .agents/proofs/atom-1/closure.yaml
boshu2 added a commit that referenced this pull request May 2, 2026
Wave 2 of Day 2: wires the substrate package (committed in 8cdfa85) to the cobra surface. Lands the four stop conditions from SCHEMA.md §9:
- ao eval task add <task.yaml> registers a Task (canonical YAML write, refuses if stats.min_n_samples missing)
- ao eval task list enumerates registered Tasks
- ao eval task show <task-id> prints structured Task summary
- ao eval task run <task-id> ... opens a Run via §4 atomic-write contract, runs gates 1/6/7/8/9 (refusals match §6 format), stamps all rc2 manifest fields (harness_content_hash, model_spec_hash, ground_truth_hash, seeds[>=3], rig_id, inspect_command, etc.), transitions pending->running on gate pass.
- ao eval cleanup per §4 cleanup state-transition rule: stale pending->aborted (never_started), stale running->failed (orphaned_process).
- ao eval cleanup --delete removes Run dirs whose status is failed OR aborted (NEVER retracted; retraction is audit-trail per §5).
- ao eval cleanup --tmp-files sweeps orphan manifest.json.tmp left from rename-step crashes.

AGENTOPS_EVALS_ROOT env var lets tests + alternative rigs override the default ~/.agents/evals.

Smoke verified end-to-end:
1. Full Run produces a manifest with all 17 rc2-required fields populated.
2. Gate #1 (no held_constant) + gate #6 (n=10 < n_required=50) emit refusals matching the §6 4-line GATE FAILED / Why / Evidence / Fix format verbatim.
3. Stale running Run -> failed -> deleted via cleanup --delete.
4. Orphaned .tmp swept; original good Run preserved.

Day-2 stop condition met. Day-3 unlocks (port hardware-bench prompts to Inspect Tasks, §6.5 paired cluster-bootstrap, gate #6 graduates to power-derived n_required).
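The cleanup rules described above can be sketched as a small state machine. This is an illustration under stated assumptions, not the Go implementation: the `Run` class, the boolean `stale` flag (how staleness is detected is not specified in the commit message), and the single-pass handling of `--delete` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Run:
    run_id: str
    status: str
    stale: bool = False  # assumption: staleness is decided elsewhere
    reason: str = ""


# Stale-state transitions named in the commit message.
STALE_TRANSITIONS = {
    "pending": ("aborted", "never_started"),
    "running": ("failed", "orphaned_process"),
}
# Only these statuses may be deleted; "retracted" is kept as audit trail.
DELETABLE = {"failed", "aborted"}


def cleanup(runs, delete=False):
    """Apply stale transitions, optionally drop deletable runs, return the rest."""
    kept = []
    for run in runs:
        if run.stale and run.status in STALE_TRANSITIONS:
            run.status, run.reason = STALE_TRANSITIONS[run.status]
        if delete and run.status in DELETABLE:
            continue
        kept.append(run)
    return kept
```

Keeping `retracted` out of `DELETABLE` makes the audit-trail rule a property of the data rather than a special case in the loop.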
boshu2 added a commit that referenced this pull request May 2, 2026
Wires the Go CLI to the §6.5 statistics module (Python, lives at
~/.agents/evals/_stats/, committed separately as a backup tarball at
/tmp/evals-stats-backup-*.tar.gz).
- ao eval suite verdict <suite-id> --inputs <bootstrap-inputs.json> [--arms]
Shells out to `python -m _stats.cli verdict` against the substrate
venv (~/.agents/evals/.venv). Suite + decision_rule auto-loaded from
disk when --suite-id resolves; --arms overrides varied_axis. The output
reports one of the verdict outcomes: improved | regressed | no_change |
underpowered | inconclusive_high_variance | inconclusive_degenerate.
- ao eval suite n-required --baseline-rate <p> --mde <d> --alpha <a>
Computes power-derived n_required via the standard normal
approximation: n = (z_{1-alpha/2} + z_{1-beta})^2 * sigma_d^2 / MDE^2.
Worst-case binomial variance (sigma_d^2 = 2*p*(1-p)) when no explicit
variance is provided. Paired by default; --paired=false multiplies by 2.
- evalsubstrate.GateInputs.NRequiredOverride: Day-3 graduates gate #6 to use
a power-derived value when the caller computes one. Falls back to
Task.stats.min_n_samples when override is unset (Day-2 behavior).
- pythonBinary() resolution order:
AGENTOPS_EVALS_VENV (env override)
-> $AGENTOPS_EVALS_ROOT/.venv/bin/python
-> ~/.agents/evals/.venv/bin/python
Returns "" with structured error if none found.
- All 51 substrate-package unit tests still green; gate #6 picks up the
override path without breaking existing Day-2 behavior.
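The power formula in the n-required item above can be worked through in a few lines. This is a minimal sketch, not the `_stats` module: the function name, the default power of 0.80 (the commit message lists no flag for beta), and rounding up to the next integer are assumptions.

```python
import math
from statistics import NormalDist


def n_required(baseline_rate, mde, alpha=0.05, power=0.80,
               paired=True, variance=None):
    """Normal-approximation sample size (hypothetical helper).

    n = (z_{1-alpha/2} + z_{1-beta})^2 * sigma_d^2 / MDE^2, where
    power = 1 - beta. Without an explicit variance, the worst-case
    binomial variance sigma_d^2 = 2*p*(1-p) is used.
    """
    if not 0.0 < baseline_rate < 1.0:
        raise ValueError("baseline_rate must be in (0, 1)")
    if mde <= 0.0:
        raise ValueError("mde must be positive")
    z = NormalDist().inv_cdf  # standard normal quantile function
    if variance is None:
        variance = 2.0 * baseline_rate * (1.0 - baseline_rate)
    n = (z(1.0 - alpha / 2.0) + z(power)) ** 2 * variance / mde ** 2
    if not paired:
        n *= 2.0  # unpaired designs need twice the samples
    return math.ceil(n)
```

For example, `n_required(0.5, 0.1)` evaluates the formula at p = 0.5 and MDE = 0.1, and `n_required(0.5, 0.1, paired=False)` doubles the unrounded value before rounding up.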
End-to-end verified: `ao eval suite verdict` returns identical
bootstrap_inputs_hash + ci_low/ci_high/delta_point on re-runs (bit-exact),
matching direct Python invocation byte-for-byte.
Day-3 statistical contract module (Python) lives at ~/.agents/evals/_stats/
with 42 pytest tests covering bootstrap reproducibility, all 5 verdicts,
power formula edge cases, canonical JSON ordering, and PCG64 seed derivation.
That dir is NOT a git repo — backup tarball preserved separately.
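The pythonBinary() resolution order listed earlier in this commit message can be sketched as follows. This is a Python illustration of the Go helper, with several assumptions: that AGENTOPS_EVALS_VENV names a venv directory (it might instead name the interpreter directly), that a candidate counts as found when the file exists, and that the structured error is returned as a message string alongside the empty path.

```python
import os
from pathlib import Path


def python_binary(env=None, home=None):
    """Return (path, error); path is "" when no interpreter is found."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)

    candidates = []
    if env.get("AGENTOPS_EVALS_VENV"):
        # Assumption: the override points at a venv directory.
        candidates.append(Path(env["AGENTOPS_EVALS_VENV"]) / "bin" / "python")
    if env.get("AGENTOPS_EVALS_ROOT"):
        candidates.append(Path(env["AGENTOPS_EVALS_ROOT"]) / ".venv" / "bin" / "python")
    candidates.append(home / ".agents" / "evals" / ".venv" / "bin" / "python")

    for candidate in candidates:
        if candidate.is_file():
            return str(candidate), ""
    tried = ", ".join(str(c) for c in candidates)
    return "", f"no python interpreter found; tried: {tried}"
```

Taking `env` and `home` as parameters lets tests exercise each branch without touching the real environment or home directory.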