Skip to content

Conversation

@ben-edna
Copy link
Contributor

@ben-edna ben-edna commented Jan 6, 2026

Summary by CodeRabbit

  • Bug Fixes

    • Improved manifest validation diagnostics: multi-line messages that echo invalid content with line/position indicators.
    • Enforced uniqueness for manifest entries (names and destinations) to prevent ambiguous configurations.
  • New Features

    • Replaced previous validator with a stricter YAML-based parser for more precise type and structure checks.
    • Added comprehensive manifest schema validation.
  • Chores

    • Renamed packaged resource from "schema" to "template" and updated tests accordingly.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link

coderabbitai bot commented Jan 6, 2026

Warning

Rate limit exceeded

@spoorcc has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 7 minutes and 44 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between 333474e and 22fccf3.

📒 Files selected for processing (9)
  • CHANGELOG.rst
  • dfetch/manifest/schema.py
  • dfetch/manifest/validate.py
  • dfetch/resources/__init__.py
  • dfetch/resources/schema.yaml
  • features/updated-project-has-dependencies.feature
  • features/validate-manifest.feature
  • pyproject.toml
  • tests/test_resources.py

Walkthrough

Migrate manifest validation from PyKwalify to StrictYAML: add a StrictYAML schema module, update validation to load with strictyaml and enforce cross-item uniqueness, remove the old YAML schema file, rename a resource helper, update tests/features, and replace pykwalify with strictyaml in dependencies.

Changes

Cohort / File(s) Summary
Manifest schema
dfetch/manifest/schema.py
New StrictYAML schema definitions: NUMBER, REMOTE_SCHEMA, PROJECT_SCHEMA, MANIFEST_SCHEMA.
Manifest validation
dfetch/manifest/validate.py
Validation switched from PyKwalify to StrictYAML; loads manifest with strictyaml, adds _ensure_unique() to enforce uniqueness (remotes.name, projects.name, projects.dst), and updates error handling to include line/position info.
Resources & template file
dfetch/resources/__init__.py, dfetch/resources/template.yaml (replaces)
Renamed schema_path()template_path() and switched resource from schema.yaml to template.yaml; removed old schema.yaml.
Removed resource file
dfetch/resources/schema.yaml
Deleted: old YAML schema file removed (schema moved into Python strictyaml definitions).
Dependency manifest
pyproject.toml
Replaced dependency pykwalify==1.8.0 with strictyaml==1.7.3.
Tests & feature specs
tests/test_resources.py, features/validate-manifest.feature, features/updated-project-has-dependencies.feature
Tests updated to use template_path(); feature expectations adjusted for strictyaml diagnostics; added scenario for duplicate project names validation.
Changelog
CHANGELOG.rst
Added Release 0.12.0 notes: "Switch from pykwalify to StrictYAML" and "Show line number when manifest validation fails".

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant FS as Filesystem
    participant Validator as validate(path)
    participant SY as StrictYAML (loader/schema)
    participant Uniq as _ensure_unique
    participant Reporter as Error reporting

    User->>Validator: validate(manifest_path)
    Validator->>FS: read(manifest_path)
    FS-->>Validator: manifest_text
    Validator->>SY: load(manifest_text, MANIFEST_SCHEMA)
    alt YAML parse error
        SY-->>Reporter: raise StrictYAMLError (with line/col)
        Reporter-->>User: RuntimeError(error message + location)
    else parsed
        SY-->>Validator: manifest_dict
        Validator->>Uniq: check remotes.name, projects.name, projects.dst
        alt duplicates found
            Uniq-->>Reporter: raise RuntimeError(Duplicate ... )
            Reporter-->>User: RuntimeError(duplicate message)
        else success
            Validator-->>User: None (validation passed)
        end
    end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Possibly related PRs

  • Fix #891 - Update docs #892: Edits dfetch/resources/schema.yaml; closely related because this PR removes/replaces that file and migrates schema handling into StrictYAML.

Suggested labels

enhancement

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Switch from pykwalify to StrictYAML' directly and concisely summarizes the main change across the entire changeset, which is migrating from pykwalify to StrictYAML for YAML validation.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@ben-edna
Copy link
Contributor Author

ben-edna commented Jan 6, 2026

Follow-up for #903 see what i prefer more

@spoorcc spoorcc linked an issue Jan 6, 2026 that may be closed by this pull request
@spoorcc spoorcc marked this pull request as ready for review January 6, 2026 20:28
@spoorcc spoorcc added the development Tools for development label Jan 6, 2026
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI Agents
In @CHANGELOG.rst:
- Line 5: Update the changelog entry line that currently reads "* Switch from
pykwalify to StrictYAML (#0)" to reference the correct PR number by replacing
"#0" with "#922" so the line becomes "* Switch from pykwalify to StrictYAML
(#922)".

In @dfetch/manifest/schema.py:
- Line 36: The schema references an undefined symbol `Number` causing a
NameError; change the field mapping that uses `Number` to use the defined
`NUMBER` constant (defined as `NUMBER = Int() | Float()`) so the `"version"`
schema entry references `NUMBER` instead of `Number`, ensuring the runtime uses
the correct validator.
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 70403e3 and 804f8cd.

📒 Files selected for processing (8)
  • CHANGELOG.rst
  • dfetch/manifest/schema.py
  • dfetch/manifest/validate.py
  • dfetch/resources/__init__.py
  • dfetch/resources/schema.yaml
  • features/updated-project-has-dependencies.feature
  • pyproject.toml
  • tests/test_resources.py
💤 Files with no reviewable changes (1)
  • dfetch/resources/schema.yaml
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2026-01-01T11:05:43.761Z
Learnt from: ben-edna
Repo: dfetch-org/dfetch PR: 898
File: script/package.py:201-205
Timestamp: 2026-01-01T11:05:43.761Z
Learning: In the dfetch project's script/package.py, prefer explicit failures ("crash and burn") over silently handling missing configuration values, ensuring that builds fail if required metadata (e.g., URLs in pyproject.toml) is not properly configured.

Applied to files:

  • features/updated-project-has-dependencies.feature
📚 Learning: 2025-12-25T22:46:21.776Z
Learnt from: spoorcc
Repo: dfetch-org/dfetch PR: 861
File: dfetch/resources/__init__.py:12-17
Timestamp: 2025-12-25T22:46:21.776Z
Learning: In Nuitka standalone builds for dfetch, ensure resources are referenced from the top-level 'resources' directory (not the package path 'dfetch.resources') to avoid clashes with the executable named 'dfetch'. Specifically, in code under dfetch/resources/, when __compiled__ is detected, use the top-level 'resources' path as the resource locator (as already done in dfetch/resources/__init__.py with the 'resources' string). This pattern should apply to all Py files within dfetch/resources/ to prevent name collisions in standalone binaries.

Applied to files:

  • dfetch/resources/__init__.py
🧬 Code graph analysis (2)
dfetch/manifest/validate.py (1)
dfetch/manifest/manifest.py (2)
  • projects (266-268)
  • remotes (287-289)
tests/test_resources.py (1)
dfetch/resources/__init__.py (1)
  • template_path (20-22)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: build / build (macos-latest)
  • GitHub Check: build / build (windows-latest)
  • GitHub Check: build / build (ubuntu-latest)
  • GitHub Check: Codacy Static Code Analysis
  • GitHub Check: run / run (ubuntu-latest, 3.10)
  • GitHub Check: run / test-cygwin
  • GitHub Check: DevContainer Build & Test
🔇 Additional comments (7)
tests/test_resources.py (1)

11-25: LGTM!

The test functions have been correctly updated to use the new template_path() API. The test logic remains sound and properly validates the context manager behavior.

features/updated-project-has-dependencies.feature (1)

93-97: LGTM!

The enhanced error format provides better diagnostics for manifest validation failures. The multi-line output with content display, line position indicator, and specific error message ("found arbitrary text") improves the user experience when debugging invalid manifests.

dfetch/resources/__init__.py (1)

20-22: LGTM!

The API has been correctly updated from schema_path() to template_path(), referencing template.yaml instead of schema.yaml. The function signature and implementation pattern remain consistent.

pyproject.toml (1)

44-44: strictyaml 1.7.3 is valid with no known security vulnerabilities.

dfetch/manifest/validate.py (2)

14-21: LGTM — clever duplicate detection.

The set-based one-liner on line 18 is idiomatic, though the side-effect in a comprehension can surprise readers. An alternative using a simple loop might be clearer, but this works correctly.


24-50: Validation logic looks solid.

The error handling correctly distinguishes schema violations (YAMLValidationError) from parsing issues (StrictYAMLError). The uniqueness enforcement for name and dst fields properly replaces the pykwalify constraints.

Minor note: on line 45, manifest.get("remotes", []) or [] is slightly redundant—if the key is missing, get already returns []. The or [] only guards against an explicit None value, which shouldn't occur if schema validation passed. Not worth changing, just noting for clarity.

dfetch/manifest/schema.py (1)

7-30: Schema definitions look correct.

REMOTE_SCHEMA and PROJECT_SCHEMA properly define required and optional fields with appropriate types. The Enum(["git", "svn"]) constraint for vcs and Seq(Str()) for ignore are well-formed.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
dfetch/manifest/validate.py (1)

14-21: The duplicate-detection logic is clever but obscure.

The set comprehension on line 18 uses a side-effect trick (seen.add(v) returns None, making the or clause always add and return falsy). While functional, this pattern can confuse readers.

🔎 More readable alternative
 def _ensure_unique(seq: list[dict[str, Any]], key: str, context: str) -> None:
     """Ensure values for `key` are unique within a sequence of dicts."""
     values = [item.get(key) for item in seq if key in item]
-    seen = set()
-    dups = {v for v in values if (v in seen) or seen.add(v)}  # type: ignore
+    seen: set[Any] = set()
+    dups: set[Any] = set()
+    for v in values:
+        if v in seen:
+            dups.add(v)
+        else:
+            seen.add(v)
     if dups:
         dup_list = ", ".join(sorted(map(str, dups)))
         raise RuntimeError(f"Duplicate {context}.{key} value(s): {dup_list}")
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 804f8cd and d9f342d.

📒 Files selected for processing (8)
  • CHANGELOG.rst
  • dfetch/manifest/schema.py
  • dfetch/manifest/validate.py
  • dfetch/resources/__init__.py
  • dfetch/resources/schema.yaml
  • features/updated-project-has-dependencies.feature
  • pyproject.toml
  • tests/test_resources.py
💤 Files with no reviewable changes (1)
  • dfetch/resources/schema.yaml
🚧 Files skipped from review as they are similar to previous changes (4)
  • dfetch/resources/init.py
  • features/updated-project-has-dependencies.feature
  • pyproject.toml
  • tests/test_resources.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (18)
  • GitHub Check: build / build (windows-latest)
  • GitHub Check: build / build (macos-latest)
  • GitHub Check: Codacy Static Code Analysis
  • GitHub Check: run / run (ubuntu-latest, 3.10)
  • GitHub Check: run / run (macos-latest, 3.9)
  • GitHub Check: run / run (macos-latest, 3.13)
  • GitHub Check: run / run (ubuntu-latest, 3.9)
  • GitHub Check: run / run (windows-latest, 3.14)
  • GitHub Check: run / run (macos-latest, 3.10)
  • GitHub Check: run / run (ubuntu-latest, 3.12)
  • GitHub Check: run / run (windows-latest, 3.13)
  • GitHub Check: run / run (windows-latest, 3.9)
  • GitHub Check: run / run (windows-latest, 3.12)
  • GitHub Check: run / run (windows-latest, 3.10)
  • GitHub Check: run / run (ubuntu-latest, 3.11)
  • GitHub Check: run / run (windows-latest, 3.11)
  • GitHub Check: run / test-cygwin
  • GitHub Check: DevContainer Build & Test
🔇 Additional comments (5)
CHANGELOG.rst (1)

6-6: Verify the issue/PR reference #36.

The reference #36 appears unusually low for an unreleased feature in version 0.12.0. Please confirm this is the correct issue number, or update it to reference the appropriate issue/PR.

dfetch/manifest/schema.py (2)

1-42: Schema definition looks correct.

The StrictYAML schema is well-structured with appropriate use of Optional() for non-required fields, proper type validators, and a sensible Enum constraint for the VCS field.


38-38: Verify if projects should be required or optional.

The schema marks projects as required (no Optional() wrapper), meaning manifests without any projects will fail validation. If empty manifests should be valid (e.g., for initialization), consider wrapping it with Optional():

-                "projects": Seq(PROJECT_SCHEMA),
+                Optional("projects"): Seq(PROJECT_SCHEMA),

If requiring at least one project is intentional, this is fine as-is.

dfetch/manifest/validate.py (2)

44-45: Sync projects handling with schema optionality.

If projects is required in the schema (as currently defined), line 44's direct access is correct. However, if you make projects optional in the schema (per the earlier suggestion), you'll need to update this line to use .get("projects", []) similar to remotes.

Also, the or [] on line 45 is defensive against remotes: null in YAML—which StrictYAML's Seq() validator would reject anyway, making it redundant. Consider removing it for clarity, or keep it if you prefer the extra safety.


24-50: Validation logic is well-structured.

The migration to StrictYAML is clean:

  • Proper exception hierarchy handling (YAMLValidationError before StrictYAMLError)
  • Appropriate use of exception chaining with from err
  • Clear uniqueness constraint enforcement
  • Reasonable logging suppression for the strictyaml module

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI Agents
In @dfetch/manifest/validate.py:
- Line 51: In validate.py update the inline comment for the remotes declaration
(the line with "remotes: list[dict[str, Any]] = manifest.get(\"remotes\", []) or
[]") to fix the typo: change "option" to "optional" so the comment reads that
the field is optional.
🧹 Nitpick comments (2)
dfetch/manifest/validate.py (1)

12-21: Consider a more readable duplicate detection approach.

The current implementation works correctly but uses a set comprehension with side effects (seen.add(v)) that relies on set.add() returning None. This pattern can be confusing for maintainers.

🔎 Alternative implementation for clarity
 def _ensure_unique(seq: list[dict[str, Any]], key: str, context: str) -> None:
     """Ensure values for `key` are unique within a sequence of dicts."""
     values = [item.get(key) for item in seq if key in item]
-    seen = set()
-    dups = {v for v in values if (v in seen) or seen.add(v)}  # type: ignore
+    from collections import Counter
+    counts = Counter(values)
+    dups = {v for v, count in counts.items() if count > 1}
     if dups:
         dup_list = ", ".join(sorted(map(str, dups)))
         raise RuntimeError(
             f"Schema validation failed:\nDuplicate {context}.{key} value(s): {dup_list}"
         )
dfetch/manifest/schema.py (1)

19-21: Consider validating mutual exclusivity of branch/tag/revision.

The branch, tag, and revision fields are typically mutually exclusive in version control workflows. While StrictYAML schemas don't support cross-field constraints, consider adding runtime validation in validate.py to ensure only one of these fields is specified per project.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d9f342d and 333474e.

📒 Files selected for processing (9)
  • CHANGELOG.rst
  • dfetch/manifest/schema.py
  • dfetch/manifest/validate.py
  • dfetch/resources/__init__.py
  • dfetch/resources/schema.yaml
  • features/updated-project-has-dependencies.feature
  • features/validate-manifest.feature
  • pyproject.toml
  • tests/test_resources.py
💤 Files with no reviewable changes (1)
  • dfetch/resources/schema.yaml
🚧 Files skipped from review as they are similar to previous changes (2)
  • dfetch/resources/init.py
  • CHANGELOG.rst
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2026-01-01T11:05:43.761Z
Learnt from: ben-edna
Repo: dfetch-org/dfetch PR: 898
File: script/package.py:201-205
Timestamp: 2026-01-01T11:05:43.761Z
Learning: In the dfetch project's script/package.py, prefer explicit failures ("crash and burn") over silently handling missing configuration values, ensuring that builds fail if required metadata (e.g., URLs in pyproject.toml) is not properly configured.

Applied to files:

  • features/updated-project-has-dependencies.feature
🧬 Code graph analysis (2)
tests/test_resources.py (1)
dfetch/resources/__init__.py (1)
  • template_path (20-22)
dfetch/manifest/validate.py (2)
dfetch/manifest/manifest.py (3)
  • path (251-253)
  • projects (266-268)
  • remotes (287-289)
dfetch/project/metadata.py (1)
  • path (128-136)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (17)
  • GitHub Check: Publish Python distribution 📦 to TestPyPI
  • GitHub Check: build / build (windows-latest)
  • GitHub Check: build / build (ubuntu-latest)
  • GitHub Check: build / build (macos-latest)
  • GitHub Check: Codacy Static Code Analysis
  • GitHub Check: run / run (windows-latest, 3.14)
  • GitHub Check: run / run (windows-latest, 3.11)
  • GitHub Check: run / run (windows-latest, 3.13)
  • GitHub Check: run / run (macos-latest, 3.10)
  • GitHub Check: run / run (windows-latest, 3.10)
  • GitHub Check: run / run (windows-latest, 3.12)
  • GitHub Check: run / run (windows-latest, 3.9)
  • GitHub Check: run / run (ubuntu-latest, 3.9)
  • GitHub Check: run / run (macos-latest, 3.9)
  • GitHub Check: run / test-cygwin
  • GitHub Check: test / test
  • GitHub Check: DevContainer Build & Test
🔇 Additional comments (11)
features/updated-project-has-dependencies.feature (1)

93-97: LGTM: Enhanced error diagnostics.

The updated error format provides better context with multi-line diagnostics including the invalid content, line position, and a clear error description. This aligns with StrictYAML's error reporting style and improves the user experience.

features/validate-manifest.feature (2)

47-52: LGTM: Updated error format aligns with StrictYAML.

The multi-line diagnostic format with line position indicators matches StrictYAML's error reporting style and provides clearer feedback to users.


54-72: LGTM: Good test coverage for duplicate detection.

The new test scenario validates the duplicate project name detection feature introduced in the validation logic. This is valuable coverage for the new _ensure_unique functionality.

tests/test_resources.py (1)

11-25: LGTM: Clean API rename from schema_path to template_path.

The test updates correctly reflect the renamed resource API. The functionality remains unchanged, only the naming has been updated for clarity.

dfetch/manifest/validate.py (2)

24-46: LGTM: Clean StrictYAML integration with proper error handling.

The validation function correctly loads the manifest with StrictYAML and handles both YAMLValidationError and StrictYAMLError exceptions. The error message formatting provides clear, multi-line diagnostics that improve the user experience.


48-55: No changes needed—MANIFEST_SCHEMA correctly enforces required keys.

The schema defines "manifest" and "projects" as required fields (not wrapped in Optional()), while "remotes" is marked Optional(). The code's direct key access and .get() usage align with the schema constraints, making the implementation safe.

pyproject.toml (1)

44-44: strictyaml==1.7.3 is valid and free from known vulnerabilities.

Version 1.7.3 (released 2023-03-10) is the latest stable release available on PyPI and has no known CVEs or security advisories.

dfetch/manifest/schema.py (4)

1-5: LGTM!

The imports and NUMBER definition are correct. The previously flagged NameError (using Number instead of NUMBER) has been resolved.


7-13: LGTM!

The REMOTE_SCHEMA definition correctly specifies required fields (name, url-base) and the optional default field with appropriate types.


26-26: No issues found. The vcs enum is correctly restricted to ["git", "svn"], which are the only version control systems supported in the codebase.


38-38: No change needed — projects is correctly required.

The projects field is intentionally required, not optional. The schema code is consistent with the documentation, which explicitly lists projects in the required fields section. The validation code also treats it as required by directly accessing manifest["projects"] without optional handling. There is no inconsistency.

Likely an incorrect or invalid review comment.

@spoorcc spoorcc merged commit 716f0ea into main Jan 6, 2026
38 checks passed
@spoorcc spoorcc deleted the strictyaml branch January 6, 2026 20:56
This was referenced Jan 7, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

development Tools for development

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Show line number when validation fails

3 participants