Skip to content

fix(skills): reject invalid frontmatter YAML#896

Merged
phernandez merged 2 commits into
mainfrom
codex/fix-invalid-skill-yaml
Jun 5, 2026
Merged

fix(skills): reject invalid frontmatter YAML#896
phernandez merged 2 commits into
mainfrom
codex/fix-invalid-skill-yaml

Conversation

@phernandez
Copy link
Copy Markdown
Member

Summary

  • tighten the dependency-free skill frontmatter parser to reject unquoted plain-scalar values containing colon-space
  • add regression coverage for the Codex loader failure shape while preserving nested schema frontmatter behavior

Test Plan

  • uv run pytest tests/ci/test_validate_skills.py -q
  • uv run ruff check scripts/validate_skills.py tests/ci/test_validate_skills.py
  • uv run ruff format --check scripts/validate_skills.py tests/ci/test_validate_skills.py
  • just typecheck
  • just package-check-skills
  • just package-check

Signed-off-by: phernandez <paul@basicmachines.co>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 5, 2026

Claude finished @phernandez's task in 1m 30s —— View job


PR Review: fix(skills): reject invalid frontmatter YAML

Todo List:

  • Gather context and understand the PR
  • Read changed files
  • Review against team checklist
  • Post final review

Summary

This PR tightens scripts/validate_skills.py to proactively reject unquoted plain-scalar YAML values containing : (colon-space) — the exact pattern that causes Codex to fail when loading skills. It also adds line-number tracking for better error messages and adds a regression test file.

The change is focused, dependency-free (as required for bare python3 in CI), and addresses a real loader failure. Overall the implementation is sound.


Code Quality & Standards

  • Follows Basic Memory's coding conventions in CLAUDE.md
  • Python 3.12+ type annotations used
  • 100-character line length maintained (line 17 in validate_plain_scalar is ~93 chars)
  • No commented-out code blocks
  • No SQLAlchemy/FastAPI/Typer concerns (CI utility script)

Testing & Documentation

  • Regression test for the Codex failure shape added
  • Nested schema frontmatter behavior preserved and tested
  • Missing test case: URL in description (see finding below)
  • Missing assertion in nested-fields test (see finding below)

Security & Performance

  • No hardcoded secrets or credentials
  • No sensitive data
  • Compiled regex at module level — correct

Compatibility

  • Uses pathlib.Path — Windows-compatible
  • No emoji/unicode in console output

Findings

1. No test for URL-shaped values in descriptions (informational)

The regex r":(?:\s|$)" correctly allows https://example.com (: followed by /, not whitespace). However, there's no test verifying this. A description like:

description: See https://docs.basicmemory.com for usage.

passes correctly today, but it would be easy to break this in a future refactor. Consider adding a positive test.

Fix this →

2. Nested test doesn't assert the schema key handling

In test_parse_frontmatter_keeps_nested_fields_nested, the schema: top-level line with no value gets stored as frontmatter["schema"] = "". The test correctly checks type and entity, but doesn't assert anything about schema. This leaves ambiguous whether the empty-string key being present is intentional. Consider either:

  • assert "schema" not in frontmatter (if nested key should be dropped), or
  • assert frontmatter.get("schema") == "" (documenting current behavior explicitly)

Fix this →

3. Single-quote stripping gap (pre-existing, not introduced here)

frontmatter[key] = value.strip('"') strips only double-quotes, not single-quotes. A properly quoted YAML value like description: 'foo: bar' would pass validation (starts with ') and be stored with the single-quotes intact. This is pre-existing behavior and not blocking for this PR, but worth tracking.


Verdict

The core logic is correct and the fix is appropriately minimal. Both findings are minor — the URL test is a nice-to-have, and the nested-key assertion is a documentation gap rather than a bug. This is ready to merge as-is, or the small improvements above can be added.

Signed-off-by: phernandez <paul@basicmachines.co>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 5, 2026

Claude finished @phernandez's task in 57s —— View job


PR Review: fix(skills): reject invalid frontmatter YAML

Todo List:

  • Gather context and understand the PR
  • Read changed files
  • Check what changed since the last review
  • Post final review

Summary

Two commits on this branch:

  • a45fa07 — core fix: regex-based plain-scalar validation with line-number tracking
  • c816645 — follow-up: addresses all three findings from the prior review

All three findings from the previous review have been resolved. The PR is in good shape.


Code Quality & Standards

  • Follows Basic Memory's coding conventions in CLAUDE.md
  • Python 3.12+ type annotations used throughout
  • 100-character line length maintained
  • No commented-out code blocks
  • Dependency-free (bare python3 compatible, as required for CI)

Testing & Documentation

Security & Performance

  • No hardcoded secrets or credentials
  • Compiled regex at module level — correct
  • No sensitive data

Compatibility

  • Uses pathlib.Path — Windows-compatible
  • No emoji/unicode in console output

Finding Resolution Walkthrough

Finding #1 (URL test) — Resolved by test_parse_frontmatter_allows_url_colons_in_plain_values. Confirms https://docs.basicmemory.com passes without raising, since : is followed by / not whitespace.

Finding #2 (nested assertion) — Resolved by adding assert frontmatter["schema"] == "" to test_parse_frontmatter_keeps_nested_fields_nested. Documents the current behavior explicitly: a top-level key with no value is stored as an empty string.

Finding #3 (single-quote stripping) — Resolved by replacing the value.strip('"') one-liner with strip_matching_quotes(), which handles both ' and " symmetrically, and the new test validates round-trip through a single-quoted description containing : .


One Minor Note

validate_plain_scalar is called on the raw stripped value before quote-stripping, which is correct — quoted values short-circuit at stripped[0] in {'"', "'"}. The ordering in parse_frontmatter (validate_plain_scalarstrip_matching_quotes) is sound.


Verdict

All previous findings addressed. The implementation is correct, minimal, and well-tested. Ready to merge.

@phernandez phernandez merged commit ce4b5d4 into main Jun 5, 2026
26 of 29 checks passed
@phernandez phernandez deleted the codex/fix-invalid-skill-yaml branch June 5, 2026 18:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant