Add Reusable Schema Definitions With $defs And $ref #31

Merged: MuellerSeb merged 6 commits into main from add_reusable_definitions on May 26, 2026

@MuellerSeb
Owner

Closes #30

This pull request adds local JSON Schema Draft 2020-12 $defs and $ref
support for the scalar and scalar-array schema model already implemented by
nml-tools.

Schemas can now define reusable field specifications inline or in local
YAML/JSON definition files:

x-fortran-namelist: solver
type: object

$defs:
  positive_count:
    type: integer
    minimum: 1
    x-fortran-kind: i4

properties:
  iterations:
    $ref: "#/$defs/positive_count"
    title: Iteration limit
    default: 100

The new schema resolver expands supported references before any generator or
validator processes a schema. Fortran modules, Markdown documentation,
namelist templates, f2py/Python wrappers, kind maps, and namelist validation
therefore operate on one consistent effective schema.

Main Changes

  • Added SchemaResolver, load_schema(..., resolver=...), and
    resolve_schema(...) in src/nml_tools/schema.py.
  • Added same-document and local external .yml, .yaml, and .json
    reference resolution.
  • Added standard JSON Pointer fragment handling, including escaped pointer
    tokens.
  • Added shared resolver contexts to generate, check, gen-fortran,
    gen-markdown, gen-template, and validate, so referenced documents are
    cached consistently for one command invocation.
  • Added deterministic $ref sibling composition for the supported nml-tools
    subset:
    • representation keywords must agree;
    • numeric bounds are narrowed;
    • enum restrictions are intersected;
    • use-site annotations and defaults take precedence;
    • root properties and required entries can be composed.
  • Added consistent operational-default validation across Fortran generation,
    Markdown generation, template generation, and namelist validation.
  • Updated README documentation for reusable definitions, local references,
    composition rules, and unsupported forms.
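
The JSON Pointer fragment handling listed above can be illustrated with a minimal sketch. It follows RFC 6901 unescaping (`~1` for `/`, `~0` for `~`); the function name `resolve_pointer` is hypothetical and is not taken from `src/nml_tools/schema.py`, whose actual implementation may differ.

```python
from typing import Any
from urllib.parse import unquote


def resolve_pointer(document: Any, fragment: str) -> Any:
    """Resolve a fragment such as '#/$defs/positive_count' (hypothetical helper)."""
    pointer = fragment[1:] if fragment.startswith("#") else fragment
    pointer = unquote(pointer)  # URI fragments may be percent-encoded
    if pointer == "":
        return document  # an empty pointer refers to the whole document
    if not pointer.startswith("/"):
        raise ValueError(f"unsupported JSON Pointer fragment: {fragment!r}")
    node = document
    for raw in pointer[1:].split("/"):
        # RFC 6901: decode '~1' before '~0' so that '~01' becomes '~1', not '/'.
        token = raw.replace("~1", "/").replace("~0", "~")
        if isinstance(node, dict):
            if token not in node:
                raise KeyError(f"{fragment!r}: no key {token!r}")
            node = node[token]
        elif isinstance(node, list):
            node = node[int(token)]
        else:
            raise KeyError(f"{fragment!r}: cannot descend into a scalar at {token!r}")
    return node
```

For example, a definition named `a/b` would be addressed as `#/$defs/a~1b`.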

Reference Behavior

References may be:

  • inline, such as #/$defs/positive_count;
  • relative local files, such as
    common-definitions.yml#/$defs/fraction;
  • absolute local YAML/JSON files.

Relative paths are resolved from the document containing the $ref, not from
the current working directory or the nml-tools config file. More than one
definition document can be used without adding a registry to
nml-config.toml.
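
A rough sketch of the path rule described above, combined with a per-invocation document cache. The names `target_document` and `DocumentCache` are illustrative assumptions, and the loader is injected so the sketch does not depend on any particular YAML/JSON library.

```python
from pathlib import Path
from typing import Any, Callable


def target_document(ref: str, containing_document: Path) -> Path:
    """Return the file a $ref points into (hypothetical helper)."""
    location = ref.partition("#")[0]
    if not location:
        return containing_document  # same-document reference
    candidate = Path(location)
    if not candidate.is_absolute():
        # Relative to the document holding the $ref, not the working directory.
        candidate = containing_document.parent / candidate
    return candidate


class DocumentCache:
    """Load each referenced document at most once (assumed design)."""

    def __init__(self, loader: Callable[[Path], Any]) -> None:
        self._loader = loader
        self._documents: dict[Path, Any] = {}

    def get(self, path: Path) -> Any:
        if path not in self._documents:
            self._documents[path] = self._loader(path)
        return self._documents[path]
```

Sharing one such cache across a command is one way to get the consistent caching described under Main Changes.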

The following reference locations are supported in this initial
implementation:

  • the root namelist schema;
  • property schemas;
  • scalar items schemas within supported arrays.

References that would introduce nested object fields remain out of scope until
derived-type generation is implemented.

Sibling Composition And Defaults

Draft 2020-12 allows keywords next to $ref. For the supported schema subset,
the resolver materializes a single effective schema:

  • type, x-fortran-kind, x-fortran-len, x-fortran-shape, and
    x-fortran-flex-tail-dims are inherited when omitted and rejected when
    conflicting.
  • minimum, maximum, exclusiveMinimum, and exclusiveMaximum combine to
    the narrower interval; empty intervals are rejected.
  • enum values are intersected when a use site further restricts a reusable
    definition; empty intersections are rejected.
  • Use-site title, description, examples, and default values are used
    when provided; otherwise referenced annotations/defaults are inherited.
  • At the root level, referenced properties remain in their original order,
    local fields are appended, matching fields are composed, and required
    entries are combined.
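
The field-level rules above can be sketched as follows. This is a simplified illustration, not the resolver's code: `compose` is a hypothetical name, `use_site` stands for the keywords written next to `$ref`, and the emptiness check treats bounds as real-valued (it does not detect an integer interval such as 1 < x < 2).

```python
from typing import Any

REPRESENTATION = ("type", "x-fortran-kind", "x-fortran-len",
                  "x-fortran-shape", "x-fortran-flex-tail-dims")
ANNOTATIONS = ("title", "description", "examples", "default")
BOUNDS = (("minimum", max), ("exclusiveMinimum", max),
          ("maximum", min), ("exclusiveMaximum", min))


def _interval_is_empty(schema: dict[str, Any]) -> bool:
    lows = [(schema[k], k.startswith("exclusive"))
            for k in ("minimum", "exclusiveMinimum") if k in schema]
    highs = [(schema[k], k.startswith("exclusive"))
             for k in ("maximum", "exclusiveMaximum") if k in schema]
    if not lows or not highs:
        return False
    lo, lo_open = max(lows)  # on ties, the exclusive bound is stricter
    hi, hi_open = min(highs, key=lambda p: (p[0], not p[1]))
    return lo > hi or (lo == hi and (lo_open or hi_open))


def compose(target: dict[str, Any], use_site: dict[str, Any]) -> dict[str, Any]:
    result = dict(target)
    for key in REPRESENTATION:  # inherited when omitted, rejected on conflict
        if key in use_site:
            if key in target and target[key] != use_site[key]:
                raise ValueError(f"conflicting {key!r}")
            result[key] = use_site[key]
    for key, pick in BOUNDS:  # keep the narrower bound
        values = [s[key] for s in (target, use_site) if key in s]
        if values:
            result[key] = pick(values)
    if _interval_is_empty(result):
        raise ValueError("numeric bounds describe an empty interval")
    if "enum" in target and "enum" in use_site:
        shared = [v for v in target["enum"] if v in use_site["enum"]]
        if not shared:
            raise ValueError("enum intersection is empty")
        result["enum"] = shared
    elif "enum" in use_site:
        result["enum"] = list(use_site["enum"])
    for key in ANNOTATIONS:  # use-site annotations and defaults win
        if key in use_site:
            result[key] = use_site[key]
    return result
```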

For arrays, an array-level default and its
x-fortran-default-order, x-fortran-default-repeat, and
x-fortran-default-pad keywords are treated as one operational bundle. A
use-site array default replaces the whole referenced bundle: referenced
controls are dropped, and only controls supplied alongside the local default
apply.
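
One possible reading of the bundle rule, as a small sketch. The function name is hypothetical, the control values used in any example are placeholders (the real value types of these keywords are not shown here), and the case of use-site controls without a use-site default is deliberately left unhandled because the description does not specify it.

```python
from typing import Any

BUNDLE = ("default", "x-fortran-default-order",
          "x-fortran-default-repeat", "x-fortran-default-pad")


def select_default_bundle(target: dict[str, Any],
                          use_site: dict[str, Any]) -> dict[str, Any]:
    """Pick the default bundle atomically from one side (assumed semantics)."""
    if "default" in use_site:
        source = use_site  # the local default discards all referenced controls
    else:
        if any(key in use_site for key in BUNDLE[1:]):
            # Not covered by the description above; left open in this sketch.
            raise NotImplementedError("use-site controls without a use-site default")
        source = target
    return {key: source[key] for key in BUNDLE if key in source}
```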

Validation Tightening

default affects generated initialization and filled templates in nml-tools,
so it is validated as operational input rather than left as an unchecked
annotation.

This pull request validates defaults uniformly for inline and referenced
schemas:

  • scalar defaults must satisfy their type, enum, numeric-bound, and
    fixed-string-length constraints;
  • array defaults and padding values must satisfy item constraints;
  • array default layout must satisfy x-fortran-shape and pad/repeat rules;
  • defaults remain incompatible with flexible-tail arrays.
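
As an illustration of the scalar rule only, a check of this kind might look like the sketch below. It is not `validate_schema_defaults`; the name `check_scalar_default` is made up, and it assumes `x-fortran-len` is a plain integer (other forms are skipped).

```python
from typing import Any


def check_scalar_default(schema: dict[str, Any]) -> list[str]:
    """Return a list of problems with a scalar default (empty if none)."""
    if "default" not in schema:
        return []
    value = schema["default"]
    kind = schema.get("type")
    is_bool = isinstance(value, bool)  # bool is a subclass of int in Python
    type_ok = {
        "integer": isinstance(value, int) and not is_bool,
        "number": isinstance(value, (int, float)) and not is_bool,
        "string": isinstance(value, str),
        "boolean": is_bool,
    }.get(kind, True)
    if not type_ok:
        return [f"default {value!r} is not of type {kind!r}"]
    problems: list[str] = []
    if "enum" in schema and value not in schema["enum"]:
        problems.append(f"default {value!r} is not one of {schema['enum']!r}")
    if kind in ("integer", "number"):
        checks = (
            ("minimum", lambda v, b: v >= b),
            ("exclusiveMinimum", lambda v, b: v > b),
            ("maximum", lambda v, b: v <= b),
            ("exclusiveMaximum", lambda v, b: v < b),
        )
        for key, ok in checks:
            if key in schema and not ok(value, schema[key]):
                problems.append(f"default {value!r} violates {key} {schema[key]!r}")
    limit = schema.get("x-fortran-len")
    if kind == "string" and isinstance(limit, int) and len(value) > limit:
        problems.append(f"default {value!r} is longer than x-fortran-len {limit}")
    return problems
```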

This is an intentional correctness tightening: existing valid schemas are
unchanged, while previously accepted invalid defaults now fail consistently
across generation and validation entry points.

Rejected Or Deferred Forms

The first implementation deliberately rejects:

  • network or URI schema retrieval;
  • recursive or mutually recursive references;
  • $id/$anchor-based base resolution;
  • $dynamicRef and $dynamicAnchor;
  • allOf, anyOf, oneOf, not, and conditional composition;
  • Draft-07 definitions as an alias for $defs;
  • derived-type object properties.
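
Rejecting recursive references usually comes down to tracking the chain of references currently being expanded. The sketch below shows that idea for same-document `#/$defs/...` references only; `expand` is a hypothetical name, and siblings simply override the target here rather than being composed.

```python
from typing import Any


def expand(node: Any, root: dict[str, Any], active: tuple[str, ...] = ()) -> Any:
    """Inline '#/$defs/<name>' references, failing on cycles (simplified)."""
    if isinstance(node, dict) and "$ref" in node:
        ref = node["$ref"]
        if ref in active:
            chain = " -> ".join(active + (ref,))
            raise ValueError(f"recursive $ref: {chain}")
        prefix = "#/$defs/"
        if not ref.startswith(prefix):
            raise ValueError(f"unsupported $ref in this sketch: {ref!r}")
        target = root["$defs"][ref[len(prefix):]]
        expanded = expand(target, root, active + (ref,))
        siblings = {k: expand(v, root, active)
                    for k, v in node.items() if k != "$ref"}
        return {**expanded, **siblings}
    if isinstance(node, dict):
        return {k: expand(v, root, active) for k, v in node.items()}
    if isinstance(node, list):
        return [expand(v, root, active) for v in node]
    return node
```

Because `active` is passed down per branch, two properties that reference the same definition are not mistaken for a cycle.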

If a schema using references declares a JSON Schema dialect, it must declare
Draft 2020-12. Schemas without $schema remain accepted.
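
The dialect rule can be expressed as a short check. This is only a sketch with a hypothetical function name; the constant is the standard Draft 2020-12 meta-schema URI, and how nml-tools detects that a schema "uses references" is not shown here, so it is passed in as a flag.

```python
from typing import Any

DRAFT_2020_12 = "https://json-schema.org/draft/2020-12/schema"


def check_dialect(schema: dict[str, Any], uses_references: bool) -> None:
    """Raise if a schema that uses $ref declares a dialect other than 2020-12."""
    declared = schema.get("$schema")
    if declared is None or not uses_references:
        return  # schemas without $schema remain accepted
    if declared.rstrip("#") != DRAFT_2020_12:
        raise ValueError(
            f"$ref support requires Draft 2020-12, but $schema is {declared!r}"
        )
```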

Tests

Added focused test coverage for:

  • inline and external $defs/$ref resolution;
  • YAML/JSON files, relative paths, caching, and escaped JSON Pointer tokens;
  • root schema composition and use-site metadata/default selection;
  • array default bundle replacement;
  • conflicts, narrowed bounds, enum intersections, old dialects, cycles, and
    useful unresolved-reference diagnostics;
  • equivalent output through Fortran, Markdown, templates, f2py wrappers,
    Python wrappers, f2py kind maps, and namelist validation;
  • CLI generation and validate --schema using external definitions with
    constants and runtime dimensions;
  • invalid scalar and array operational defaults.

Introduce SchemaResolver and extend load_schema with an optional shared resolver context. The resolver materializes Draft 2020-12 $defs / $ref references in the scalar and scalar-array subset already understood by nml-tools.

Support same-document and local YAML/JSON references, standard JSON Pointer fragments, root-schema property composition, sibling bound/enum narrowing, annotation/default selection, and atomic array default-control replacement. Reject remote references, cycles, unsupported resource/anchor behavior, older declared dialects, unsupported composition, and derived-type object fields with diagnostics that retain use-site and target context.

Instantiate one resolver per CLI command so generate, check, standalone subcommands, templates, f2py paths, and validation reuse loaded reference documents consistently without changing generator input interfaces.

Add validate_schema_defaults so defaults are checked against the effective scalar or array constraints before generation or input validation. This makes inline schemas and schemas expanded through $ref follow the same correctness rules.

Validate scalar defaults, item defaults, array pad values, array default extent/layout controls, runtime-resolved shapes, and the existing flex-array incompatibility. Invoke the check from Fortran, Markdown, template, and namelist validation paths so invalid defaults cannot render differently depending on command.

Document the Draft 2020-12 $defs / $ref subset, local-file resolution rules, use-site annotation/default behavior, array default bundle policy, and deliberately unsupported reference forms in the README.

Add schema-layer tests for inline and external references, escaped JSON Pointer fragments, root composition, default override semantics, diagnostics, caching, cycles, dialect rejection, and cross-output equivalence. Add CLI tests for generation and validation through external definitions and validator regressions for invalid operational defaults.
@MuellerSeb MuellerSeb added this to the v0.3 milestone May 26, 2026
@MuellerSeb MuellerSeb requested a review from Copilot May 26, 2026 13:19
@MuellerSeb MuellerSeb self-assigned this May 26, 2026
@MuellerSeb MuellerSeb added the enhancement New feature or request label May 26, 2026
Contributor

Copilot AI left a comment

Pull request overview

This PR adds local JSON Schema Draft 2020-12 $defs / $ref support to nml-tools by introducing a schema resolution phase that expands supported references (including same-document and local file references) before generators and validation operate on the schema. It also tightens and centralizes “operational default” validation so defaults used by generation/templates are consistently checked.

Changes:

  • Added SchemaResolver, load_schema(..., resolver=...), and resolve_schema(...) to load/resolve $ref and compose supported sibling keywords into a single effective schema.
  • Introduced validate_schema_defaults(...) and wired it into validation + Fortran/Markdown/template generation to uniformly validate operational defaults.
  • Added extensive test coverage for reference resolution/composition and CLI behavior, and updated README docs.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| src/nml_tools/schema.py | Implements local $defs/$ref resolution + deterministic composition rules for the supported schema subset. |
| src/nml_tools/cli.py | Uses a shared SchemaResolver per command invocation so referenced documents are cached consistently. |
| src/nml_tools/validate.py | Adds schema-default validation and calls it from validate_namelist. |
| src/nml_tools/codegen_fortran.py | Validates schema defaults early during context building to ensure generated initialization is correct. |
| src/nml_tools/codegen_markdown.py | Validates schema defaults to keep docs consistent with operational behavior. |
| src/nml_tools/codegen_template.py | Validates schema defaults before rendering filled/documented templates. |
| tests/test_schema.py | New test suite covering $ref resolution, composition behavior, caching, and cross-output equivalence. |
| tests/test_validate.py | Adds tests asserting invalid schema defaults are rejected by validation. |
| tests/test_cli_config.py | Adds CLI integration tests for external refs and for generation using referenced definitions. |
| README.md | Documents reusable definitions, local references, and composition/unsupported forms. |

Comment thread src/nml_tools/schema.py Outdated
Comment thread src/nml_tools/schema.py
Comment thread src/nml_tools/validate.py Outdated
Add examples/03_references with a local definition library, a chained external root reference, inline $defs, use-site sibling refinement, inherited and overridden defaults, and a runtime-sized referenced array.

Check in generated Fortran, Markdown, and template artifacts so the example can be inspected without regeneration. Extend CI to verify those generated files and validate the filled namelist against both resolved schemas.

Recognize Windows drive-letter absolute paths as local external $ref targets before URI parsing, while continuing to reject URI and remote reference forms. Remove the unused reference identity accumulation state.

Preserve the existing array default diagnostic phrase through centralized default validation and add regression tests for both Windows path spellings and diagnostic compatibility.
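
The ordering matters because a generic URI parser reads a drive letter as a scheme: Python's `urlsplit("C:/schemas/defs.yml")` reports scheme `c`. A minimal sketch of checking for a drive letter first is shown below; `classify_ref` and its return labels are illustrative assumptions, not the resolver's API.

```python
import re
from urllib.parse import urlsplit

# A single ASCII letter, a colon, then a forward slash or backslash.
_DRIVE_PATH = re.compile(r"^[A-Za-z]:[\\/]")


def classify_ref(ref: str) -> str:
    """Classify the location part of a $ref (hypothetical helper)."""
    location = ref.partition("#")[0]
    if not location:
        return "same-document"
    if _DRIVE_PATH.match(location):
        return "local-file"  # checked before URI parsing on purpose
    if urlsplit(location).scheme:
        return "rejected-uri"  # http:, https:, file:, and other URI forms
    return "local-file"
```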
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 17 out of 23 changed files in this pull request and generated 2 comments.

Comment thread src/nml_tools/validate.py Outdated
Comment thread src/nml_tools/validate.py
Reject reachable raw $ref nodes at the shared default-validation boundary with guidance to call load_schema() or resolve_schema(), so direct validation and generation APIs report the required normalization step clearly.

Reject optional arrays that supply operational defaults without an object items schema, while retaining the existing diagnostic for default controls supplied without an array default. Add regression tests for both paths.
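
A boundary check for raw `$ref` nodes might look roughly like the sketch below. It walks only the reference locations this pull request supports (the root, property schemas, and `items`), treating those as the "reachable" nodes; that reading, and the names `find_raw_refs` and `require_resolved`, are assumptions rather than the code in `validate.py`.

```python
from typing import Any


def find_raw_refs(schema: Any, path: str = "#") -> list[str]:
    """List locations of unresolved $ref keywords in supported positions."""
    found: list[str] = []
    if not isinstance(schema, dict):
        return found
    if "$ref" in schema:
        found.append(path)
    properties = schema.get("properties")
    if isinstance(properties, dict):
        for name, sub in properties.items():
            found.extend(find_raw_refs(sub, f"{path}/properties/{name}"))
    if isinstance(schema.get("items"), dict):
        found.extend(find_raw_refs(schema["items"], f"{path}/items"))
    return found


def require_resolved(schema: dict[str, Any]) -> None:
    refs = find_raw_refs(schema)
    if refs:
        raise ValueError(
            "unresolved $ref at " + ", ".join(refs)
            + "; call load_schema() or resolve_schema() first"
        )
```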
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 17 out of 23 changed files in this pull request and generated no new comments.

@MuellerSeb MuellerSeb merged commit 7a5f348 into main May 26, 2026
11 checks passed
@MuellerSeb MuellerSeb deleted the add_reusable_definitions branch May 26, 2026 20:32