Phase 1A: Rules pack — definition format, build script, first tier-1 rules #4

@RAprogramm

Goal

Ship the first concrete rules pack: a per-rule directory layout, build-time bundling into a static RULES array exposed by rustmanifest-rules-core, and the first 5 tier-1 rules drawn directly from the EN methodology (code-review-methodology/en/quick-reference.md).

Phase 0 only locked the type surface. Phase 1A makes the type surface real with concrete data and validates that the build pipeline works end-to-end.

Locked design

Rule definition layout

Each rule lives in its own directory under crates/rustmanifest-rules-core/rules/:

rules/
  RM-SEC-001/
    rule.toml
    pass.rs       # MUST NOT trigger this rule
    fail.rs       # MUST trigger this rule
  RM-SEC-002/
    ...

rule.toml schema

id = "RM-SEC-001"
title = "Hardcoded credentials"
severity = "error"
rationale_uri = "rustmanifest://methodology/security-vulnerabilities#hardcoded-credentials"

[definition.pattern]
regex = '(?i)(password|secret|api_key|token|private_key)\s*=\s*["\x27]'
exclude_globs = ["**/tests/**", "**/*_test.rs", "**/test_*.rs"]

definition is a tagged enum; the tier is no longer stored separately and is derived from the variant. Variants:

  • [definition.pattern]: tier 1; carries regex and exclude_globs.
  • [definition.ast]: tier 2; carries a check identifier (concrete checks land in Phase 1C).
  • [definition.semantic]: tier 3; reserved.

Build pipeline

  • build.rs in rustmanifest-rules-core walks rules/, parses each rule.toml, validates that each rule's ID equals its directory name, and emits OUT_DIR/rules.json.
  • lib.rs exposes pub static RULES: LazyLock<Vec<Rule>>, which parses the embedded JSON once on first access.
  • The build fails on duplicate IDs, malformed TOML, or missing fixture files.
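
The validation steps above can be sketched with the standard library alone. This is a minimal illustration, not the actual build.rs: validate_rules_dir and extract_id are hypothetical names, and extract_id is a naive line scan standing in for the real toml parser, so it does not detect malformed TOML.

```rust
use std::collections::HashSet;
use std::fs;
use std::path::Path;

/// Naive stand-in for a real TOML parser (assumption): returns the value of
/// the first line shaped like `id = "..."`.
fn extract_id(toml_text: &str) -> Option<String> {
    toml_text.lines().find_map(|line| {
        let rest = line.trim().strip_prefix("id")?.trim_start();
        let value = rest.strip_prefix('=')?.trim();
        let inner = value.strip_prefix('"')?.strip_suffix('"')?;
        Some(inner.to_string())
    })
}

/// Walks `root`, checks every rule directory, and returns the sorted rule IDs.
/// Any violation is reported as an error, which a build script would turn
/// into a failed build.
fn validate_rules_dir(root: &Path) -> Result<Vec<String>, String> {
    let mut seen = HashSet::new();
    let mut ids = Vec::new();
    let entries =
        fs::read_dir(root).map_err(|e| format!("cannot read {}: {e}", root.display()))?;
    for entry in entries {
        let entry = entry.map_err(|e| e.to_string())?;
        let dir = entry.path();
        if !dir.is_dir() {
            continue; // ignore stray files next to the rule directories
        }
        let dir_name = entry.file_name().to_string_lossy().into_owned();

        // Both fixtures must be present.
        for fixture in ["pass.rs", "fail.rs"] {
            if !dir.join(fixture).is_file() {
                return Err(format!("{dir_name}: missing fixture {fixture}"));
            }
        }

        // The declared ID must equal the directory name and be unique.
        let text = fs::read_to_string(dir.join("rule.toml"))
            .map_err(|e| format!("{dir_name}: cannot read rule.toml: {e}"))?;
        let id = extract_id(&text).ok_or_else(|| format!("{dir_name}: no id in rule.toml"))?;
        if id != dir_name {
            return Err(format!("{dir_name}: id {id} does not match directory name"));
        }
        if !seen.insert(id.clone()) {
            return Err(format!("duplicate rule id {id}"));
        }
        ids.push(id);
    }
    ids.sort();
    Ok(ids)
}
```

Returning a Result rather than panicking keeps the checks testable outside a build script; a build.rs would typically unwrap the result so that any violation aborts the build.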

Schema change

rustmanifest-schema::Rule becomes:

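use serde::{Deserialize, Serialize};

// The derives are required for the #[serde(...)] attribute below to compile.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct Rule {
    pub id: String,
    pub severity: Severity,
    pub title: String,
    pub rationale_uri: String,
    pub definition: RuleDefinition,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
#[serde(tag = "tier", rename_all = "kebab-case")]
pub enum RuleDefinition {
    Pattern { regex: String, exclude_globs: Vec<String> },
    Ast { check: String },
    Semantic { check: String },
}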

The old tier: Tier field on Rule is removed, since the tier is implicit in the enum variant. Golden schemas are regenerated and committed, and the schema-drift CI gate stays green.
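
To illustrate how the tier can be recovered once the field is gone, here is a small sketch without serde. The tier() and tag() helpers are hypothetical and not part of the locked design. The strings returned by tag() are what serde would write into the "tier" field under tag = "tier" with rename_all = "kebab-case".

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum RuleDefinition {
    Pattern { regex: String, exclude_globs: Vec<String> },
    Ast { check: String },
    Semantic { check: String },
}

impl RuleDefinition {
    /// Numeric tier derived from the variant (hypothetical helper).
    pub fn tier(&self) -> u8 {
        match self {
            Self::Pattern { .. } => 1,
            Self::Ast { .. } => 2,
            Self::Semantic { .. } => 3,
        }
    }

    /// Kebab-case variant name, matching the serialized "tier" tag.
    pub fn tag(&self) -> &'static str {
        match self {
            Self::Pattern { .. } => "pattern",
            Self::Ast { .. } => "ast",
            Self::Semantic { .. } => "semantic",
        }
    }
}
```

Because both helpers match exhaustively, adding a fourth variant later becomes a compile error until its tier is decided, which is the main benefit of dropping the separate field.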

First 5 rules

Sourced verbatim from code-review-methodology/en/quick-reference.md:

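| ID          | Title                             | Source line (quick-reference.md) | Severity |
|-------------|-----------------------------------|----------------------------------|----------|
| RM-SEC-001  | Hardcoded credentials             | L12                              | error    |
| RM-SEC-002  | SQL injection via format!         | L15                              | error    |
| RM-SEC-003  | Command injection via format!     | L18                              | error    |
| RM-RUST-001 | Panic in production code          | L33                              | error    |
| RM-PERF-001 | Vec::new() without with_capacity  | L60                              | warning  |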

Each ships with a minimal pass.rs and fail.rs. Fixtures are intentionally narrow to lock the regex semantics; broader fixtures come with the actual engine in Phase 1B.
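
As an illustration of what a narrow fixture pair could look like for RM-SEC-001, the sketch below holds two hypothetical fixtures as strings and checks them with a hand-written matcher. The fixture contents are invented for this example. The matcher is a standard-library approximation of the rule.toml regex, not the regex crate itself: it lowercases ASCII only, so it can differ from (?i) on non-ASCII input.

```rust
/// Keywords from the RM-SEC-001 regex in rule.toml.
const KEYWORDS: [&str; 5] = ["password", "secret", "api_key", "token", "private_key"];

/// Hypothetical fail.rs content: a string literal assigned to `password`.
pub const FAIL_FIXTURE: &str = r#"let password = "hunter2";"#;

/// Hypothetical pass.rs content: the value comes from the environment.
pub const PASS_FIXTURE: &str =
    r#"let password = std::env::var("APP_PASSWORD").unwrap_or_default();"#;

/// Approximates `(?i)(keyword)\s*=\s*["']`: a keyword, optional whitespace,
/// an equals sign, optional whitespace, then a single or double quote.
pub fn looks_like_hardcoded_credential(source: &str) -> bool {
    let lower = source.to_ascii_lowercase();
    KEYWORDS.iter().any(|kw| {
        lower.match_indices(*kw).any(|(start, _)| {
            let rest = lower[start + kw.len()..].trim_start();
            match rest.strip_prefix('=') {
                Some(after_eq) => after_eq
                    .trim_start()
                    .starts_with(|c: char| c == '"' || c == '\''),
                None => false,
            }
        })
    })
}
```

One property worth locking with fixtures: a typed constant such as API_KEY: &str = "..." does not match, because the type annotation sits between the keyword and the equals sign.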

Deliverables

  • rustmanifest-schema: Rule reshaped, RuleDefinition enum added, golden schemas regenerated.
  • rustmanifest-rules-core/rules/ with 5 rule directories (TOML + pass.rs + fail.rs each).
  • rustmanifest-rules-core/build.rs bundling rules into OUT_DIR/rules.json.
  • rustmanifest-rules-core/src/lib.rs exposing RULES: LazyLock<Vec<Rule>>.
  • Tests in rustmanifest-rules-core:
    • All rule regexes compile.
    • Every fail.rs matches its rule's regex.
    • Every pass.rs does not match.
    • Rule IDs are unique within the pack.
    • Rule IDs match ^RM-(SEC|PERF|RUST|QUAL|STRUCT)-\d{3}$.
    • Rule IDs equal their parent directory name.
  • CI stays green (lint, test×4 OS, MSRV, audit, deny, reuse, schema-drift, coverage, mcp-conformance).
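
The ID-format test listed above can be pictured without the regex crate. This is a hedged standard-library sketch; is_valid_rule_id is a hypothetical name. It accepts ASCII digits only, which is slightly stricter than \d in the regex crate's default Unicode mode.

```rust
/// Checks an ID against the shape `RM-<CATEGORY>-<three digits>`, where the
/// category is one of SEC, PERF, RUST, QUAL, or STRUCT.
pub fn is_valid_rule_id(id: &str) -> bool {
    const CATEGORIES: [&str; 5] = ["SEC", "PERF", "RUST", "QUAL", "STRUCT"];

    // Exactly three dash-separated parts; a fourth part rejects the ID.
    let mut parts = id.split('-');
    let (Some(prefix), Some(category), Some(number), None) =
        (parts.next(), parts.next(), parts.next(), parts.next())
    else {
        return false;
    };

    prefix == "RM"
        && CATEGORIES.contains(&category)
        && number.len() == 3
        && number.bytes().all(|b| b.is_ascii_digit())
}
```

In the real test suite the same check would more likely compile the pattern once with the regex crate and run it over every entry in RULES.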

New dependencies

  • Runtime (rustmanifest-rules-core): regex 1 for compiling tier-1 patterns at runtime (used by tests in 1A, by the engine in 1B).
  • Build (rustmanifest-rules-core): toml 0.9 (build-deps only).
  • Both are well-maintained, dual-licensed under MIT OR Apache-2.0, and well within current cargo-deny policy.

Out of scope

  • Engine implementation (analyzers that walk source files and produce findings) — Phase 1B.
  • AST tier rules from STRUCTURE.md — Phase 1C.
  • CLI check subcommand wiring — Phase 1D.
  • SARIF / JSON / TTY renderers — Phase 1D.
  • Eval corpus and precision/recall measurement — Phase 1E.
  • Additional rules beyond the first 5 — follow-up PRs.

Acceptance criteria

  • cargo +nightly fmt --all -- --check clean.
  • cargo clippy --workspace --all-targets --all-features -- -D warnings clean.
  • cargo test --workspace passes — including the new rule-validation tests.
  • reuse lint 100% compliant on all new files.
  • cargo deny check clean.
  • Schema-drift CI gate green.
  • cargo run --bin rustmanifest-schema-export succeeds, and the committed schemas reflect the new RuleDefinition enum.
  • All 13 CI jobs green on the PR.

Risks

  • Regex from quick-reference.md (e.g. for SQL injection) is intentionally simple and over-broad; documented as "high recall, medium precision" tier-1 behavior. False-positive narrowing happens in Phase 1B when the engine adds glob exclusions and call-site context.
  • Schema breakage (Rule reshape) ripples through any consumer; since this is pre-1.0 (0.0.0), the cost is contained — but the schema-drift gate must catch any forgotten regeneration.
  • regex crate is large; verify it does not break multiple-versions = "deny" in cargo-deny. If it does, narrow the dep tree or document specific skips with rationale (no blanket allow).
