Fix marketplace format to match schema by boshu2 · Pull Request #6 · boshu2/agentops

boshu2 · 2025-12-10T01:00:58Z

Remove non-standard fields: metadata, tiers, installation_profiles
Remove plugin-level non-standard fields: tier, license, keywords, dependencies, commands, agents, strict
Change 'tier' to 'category' using standard values (development, productivity, learning, security)
Add description at root level
Add email to author objects
Maintain all 22 plugins with simplified structure
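
The field removals listed above can be sketched as a small normalization pass. This is only an illustration, not the script used in this PR: the `plugins` key, the `normalize_marketplace` helper, and the `tier_to_category` mapping are assumptions, since the PR does not state how old tier names map to the four standard categories.

```python
from copy import deepcopy

# Field names taken from the PR description above.
ROOT_NONSTANDARD = {"metadata", "tiers", "installation_profiles"}
PLUGIN_NONSTANDARD = {
    "tier", "license", "keywords", "dependencies", "commands", "agents", "strict",
}
STANDARD_CATEGORIES = {"development", "productivity", "learning", "security"}


def normalize_marketplace(manifest, tier_to_category):
    """Return a copy of `manifest` without the non-standard fields.

    `tier_to_category` is a hypothetical mapping supplied by the caller;
    the real tier names are not given in the PR.
    """
    result = {
        key: deepcopy(value)
        for key, value in manifest.items()
        if key not in ROOT_NONSTANDARD
    }
    plugins = []
    for plugin in result.get("plugins", []):  # "plugins" key is assumed
        cleaned = {k: v for k, v in plugin.items() if k not in PLUGIN_NONSTANDARD}
        if "tier" in plugin:
            category = tier_to_category[plugin["tier"]]
            if category not in STANDARD_CATEGORIES:
                raise ValueError(f"non-standard category: {category!r}")
            cleaned["category"] = category
        plugins.append(cleaned)
    result["plugins"] = plugins
    return result
```

Returning a copy rather than editing in place keeps the original manifest available for comparison, for example to confirm that the plugin count is unchanged after the pass.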



cursor · 2025-12-10T01:01:02Z

You have run out of free Bugbot PR reviews for this billing cycle. This will reset on January 7.

To receive reviews on all of your PRs, visit the Cursor dashboard to activate Pro and start your 14-day free trial.

- Remove non-standard fields: metadata, tiers, installation_profiles
- Remove plugin-level non-standard fields: tier, license, keywords, dependencies, commands, agents, strict
- Change 'tier' to 'category' using standard values (development, productivity, learning, security)
- Add description at root level
- Add email to author objects
- Maintain all 22 plugins with simplified structure

Co-authored-by: Claude <noreply@anthropic.com>

…sserts

The fuzz target proves no-panic but never asserts the parser populates chain.ID, chain.EpicID, or chain.Entries correctly for its seed corpus. A regression that silently dropped entries would still pass.

Add TestFuzzParseChainLines_SeedCorrectness covering all 7 seeds:
- metadata-plus-one-entry → ID + 1 entry, first step "research"
- metadata-only → ID set, 0 entries
- empty input → no error, no entries
- non-JSON first line → returns error
- malformed entry between valid ones → entry skipped, valid one survives
- empty metadata object → no error, empty fields
- epic_id with two entries → ID + EpicID + 2 entries

Closes the post-mortem #6 finding "Add fuzz seed correctness assertions" for the last fuzz target that lacked one (cli/cmd/ao already had companion SeedCorrectness tests for fuzz_jsonl_test.go and fuzz_context_test.go).
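
The seed-correctness idea above can be illustrated with a toy parser. This is a Python sketch, not the Go parser in cli/cmd/ao: the JSON keys `id`, `epic_id`, and `step`, and the `parse_chain_lines` helper, are assumptions chosen to mirror the behaviours the seven seeds describe.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Chain:
    id: str = ""
    epic_id: str = ""
    entries: list = field(default_factory=list)


def parse_chain_lines(text):
    """Parse a metadata line followed by one JSON entry per line.

    Assumed behaviour, modelled on the seed list: empty input is fine,
    a bad first line is an error, and a bad entry line is skipped.
    """
    chain = Chain()
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return chain
    meta = json.loads(lines[0])  # raises ValueError on a non-JSON first line
    if not isinstance(meta, dict):
        raise ValueError("metadata line must be a JSON object")
    chain.id = meta.get("id", "")
    chain.epic_id = meta.get("epic_id", "")
    for line in lines[1:]:
        try:
            entry = json.loads(line)
        except ValueError:
            continue  # malformed entry: skip it and keep going
        if isinstance(entry, dict):
            chain.entries.append(entry)
    return chain
```

A no-panic fuzz run would accept a version of this parser that always returned an empty `entries` list; asserting the parsed fields for each seed is what catches that kind of regression.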

Final disposition of 10 harvested items:
- 3 completed: #2 (brief_render delete, ee4e90a5), #4 (Tier 3 docs clarify, 2079ff78), #10 (validate-cli-skills-map wired, 5842445c)
- 2 wont_fix:
  - #3 supervisor ctx-cancel (existing test asserts current behavior is correct; analyst conflated supervisor-shutdown ctx with operator-cancel API)
  - #8 JobSpec v0 (already wired via submitRPIPhasedDaemon POSTing to /v1/jobs; analyst missed call-site)
- 1 in_progress: #1 pend- pollution → soc-2ctn (P0)
- 4 deferred:
  - #5 eval determinism → soc-v7s8 (test design cycle)
  - #6 GC v1.0.0 options → soc-ey2h (upstream coordination)
  - #7 control plane beads → soc-b0eq (plan-level decision)
  - #9 snapshot caching → soc-hns4 (real perf work)

Pattern observation: 2 of 10 items were post-mortem-analyst false positives that the /rpi --auto discovery phase caught before implementation. See .agents/learnings/2026-04-30-post-mortem-recommendations-need-test-validation.md.

batch consumed: rpi-auto-2026-04-30-1648

…atom-1, soc-sijf)

First atom of soc-7ftl chain (per-absorption #6, plans.projection pilot). Adds the JobType + projection name + GET /v1/plans/manifest, GET /v1/plans/diff stub handlers. Executor and projection body land in atom-2 (soc-acwf).

Resolves foundation gap G1 (read-path capability site is server.go route table, not auth.go mutation map) per the §6 site 3 (alt) carve-out documented in .agents/plans/2026-05-01-daemon-absorption-spec/00-foundation-contract.md and applies F-PM-2 (docs/contracts/agentops-daemon.md catalogues the new job-type).

Plan: .agents/plans/2026-05-01-absorption-6-pilot-implementation.md
Closure proofs: .agents/proofs/atom-1/closure.yaml

Wave 2 of Day 2: wires the substrate package (committed in 8cdfa85) to the cobra surface. Lands the four stop conditions from SCHEMA.md §9:
- ao eval task add <task.yaml> registers a Task (canonical YAML write, refuses if stats.min_n_samples missing)
- ao eval task list enumerates registered Tasks
- ao eval task show <task-id> prints structured Task summary
- ao eval task run <task-id> ... opens a Run via §4 atomic-write contract, runs gates 1/6/7/8/9 (refusals match §6 format), stamps all rc2 manifest fields (harness_content_hash, model_spec_hash, ground_truth_hash, seeds[>=3], rig_id, inspect_command, etc.), transitions pending->running on gate pass.
- ao eval cleanup per §4 cleanup state-transition rule: stale pending->aborted (never_started), stale running->failed (orphaned_process).
- ao eval cleanup --delete removes Run dirs whose status is failed OR aborted (NEVER retracted; retraction is audit-trail per §5).
- ao eval cleanup --tmp-files sweeps orphan manifest.json.tmp left from rename-step crashes.

AGENTOPS_EVALS_ROOT env var lets tests + alternative rigs override the default ~/.agents/evals.

Smoke verified end-to-end:
1. Full Run produces a manifest with all 17 rc2-required fields populated.
2. Gate #1 (no held_constant) + gate #6 (n=10 < n_required=50) emit refusals matching the §6 4-line GATE FAILED / Why / Evidence / Fix format verbatim.
3. Stale running Run -> failed -> deleted via cleanup --delete.
4. Orphaned .tmp swept; original good Run preserved.

Day-2 stop condition met. Day-3 unlocks (port hardware-bench prompts to Inspect Tasks, §6.5 paired cluster-bootstrap, gate #6 graduates to power-derived n_required).
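
The cleanup state-transition rule described above can be sketched as a small table-driven pass. This is a minimal illustration under assumptions, not the Go implementation: the `Run` type and its `stale` flag are hypothetical (the commit message does not say how staleness is detected), and the sketch applies transitions and deletion in a single pass.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Run:
    run_id: str
    status: str
    stale: bool = False           # hypothetical: how staleness is detected is not stated
    reason: Optional[str] = None


# Transitions quoted from the commit message: stale status -> (new status, reason).
STALE_TRANSITIONS = {
    "pending": ("aborted", "never_started"),
    "running": ("failed", "orphaned_process"),
}
# Only these statuses may be deleted; "retracted" Runs stay as audit trail.
DELETABLE = {"failed", "aborted"}


def cleanup(runs, delete=False):
    """Apply the stale-Run transitions and return the Runs that remain."""
    kept = []
    for run in runs:
        if run.stale and run.status in STALE_TRANSITIONS:
            run.status, run.reason = STALE_TRANSITIONS[run.status]
        if delete and run.status in DELETABLE:
            continue  # stands in for removing the Run directory
        kept.append(run)
    return kept
```

Keeping the deletable set as an explicit allow-list means a newly introduced status is never deleted by accident, which is the property the "NEVER retracted" rule relies on.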

Wires the Go CLI to the §6.5 statistics module (Python, lives at ~/.agents/evals/_stats/, committed separately as a backup tarball at /tmp/evals-stats-backup-*.tar.gz).
- ao eval suite verdict <suite-id> --inputs <bootstrap-inputs.json> [--arms]
  Shells out to `python -m _stats.cli verdict` against the substrate venv (~/.agents/evals/.venv). Suite + decision_rule auto-loaded from disk when --suite-id resolves; --arms overrides varied_axis. Output includes all 5 verdict outcomes: improved | regressed | no_change | underpowered | inconclusive_high_variance | inconclusive_degenerate.
- ao eval suite n-required --baseline-rate <p> --mde <d> --alpha <a>
  Computes power-derived n_required via the standard normal approximation: n = (z_{1-alpha/2} + z_{1-beta})^2 * sigma_d^2 / MDE^2. Worst-case binomial variance (sigma_d^2 = 2*p*(1-p)) when no explicit variance is provided. Paired by default; --paired=false multiplies by 2.
- evalsubstrate.GateInputs.NRequiredOverride: Day-3 graduates gate #6 to use a power-derived value when the caller computes one. Falls back to Task.stats.min_n_samples when override is unset (Day-2 behavior).
- pythonBinary() resolution order: AGENTOPS_EVALS_VENV (env override) -> $AGENTOPS_EVALS_ROOT/.venv/bin/python -> ~/.agents/evals/.venv/bin/python
  Returns "" with structured error if none found.
- All 51 substrate-package unit tests still green; gate #6 picks up the override path without breaking existing Day-2 behavior.

End-to-end verified: `ao eval suite verdict` returns identical bootstrap_inputs_hash + ci_low/ci_high/delta_point on re-runs (bit-exact), matching direct Python invocation byte-for-byte.

Day-3 statistical contract module (Python) lives at ~/.agents/evals/_stats/ with 42 pytest tests covering bootstrap reproducibility, all 5 verdicts, power formula edge cases, canonical JSON ordering, and PCG64 seed derivation. That dir is NOT a git repo; a backup tarball is preserved separately.
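
The power formula quoted above can be worked through with the Python standard library. This is a sketch of the stated formula only, not the `_stats` module itself: the `n_required` helper, the default `alpha` and `power` values, and rounding up after the unpaired doubling are all assumptions.

```python
import math
from statistics import NormalDist


def n_required(baseline_rate, mde, alpha=0.05, power=0.8, paired=True, variance=None):
    """n = (z_{1-alpha/2} + z_{1-beta})^2 * sigma_d^2 / MDE^2, rounded up.

    Defaults for `alpha` and `power` (power = 1 - beta) are assumptions;
    the commit message does not state them.
    """
    if mde <= 0:
        raise ValueError("mde must be positive")
    if not (0 < alpha < 1 and 0 < power < 1):
        raise ValueError("alpha and power must lie strictly between 0 and 1")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    # Worst-case binomial variance when no explicit variance is provided.
    sigma_d_sq = variance if variance is not None else 2 * baseline_rate * (1 - baseline_rate)
    n = (z_alpha + z_beta) ** 2 * sigma_d_sq / mde ** 2
    if not paired:
        n *= 2  # the unpaired design doubles the requirement
    return math.ceil(n)


print(n_required(0.5, 0.1))                # 393
print(n_required(0.5, 0.1, paired=False))  # 785
```

With a baseline rate of 0.5 the variance term 2*p*(1-p) is at its maximum of 0.5, so these figures are the largest the formula produces for a minimum detectable effect of 0.1 at these assumed alpha and power values.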

boshu2 merged commit 437d3ba into main Dec 10, 2025
3 of 7 checks passed

boshu2 deleted the claude/fix-marketplace-format-018MioKqofTPYPCVi5SqdE2D branch December 10, 2025 01:02
