Add v0.4 mock_spec foundation #28

Merged: DougManuel merged 18 commits into dev from v04-production-refactor on May 20, 2026
@DougManuel DougManuel commented May 18, 2026

Summary

This is the umbrella draft PR for the MockData v0.4 production refactor.

Current review target: Milestone 1 — mock_spec foundation.

This PR is intentionally not merge-ready yet. It is open for external review, transparency, and milestone-by-milestone discussion while v0.4 is built.

Milestone Checklist

  • Milestone 1: mock_spec constructors and validators
  • Milestone 2: direct API adapters to mock_spec
  • Milestone 3: recodeflow adapter to mock_spec
  • Milestone 4: native backend from mock_spec
  • Milestone 5: post-processing layer and diagnostics contract
  • Milestone 6: promote spike assertions to testthat
  • Milestone 7: optional simstudy backend
  • Milestone 8: current API wrappers over new internals

Milestone 1 Contents

  • Adds mock_spec() and direct variable-spec constructors:
    • mock_spec_continuous()
    • mock_spec_categorical()
    • mock_spec_date()
  • Adds is_mock_spec() and validate_mock_spec().
  • Adds forward-compatible spec fields: spec_version, provenance, and model_hint.
  • Adds tests for empty specs, NULL, single-variable specs, n = 0, malformed ranges, duplicate names, invalid model hints, and structured validation errors.
  • Adds the v0.4 architecture plan and ADR.
  • Removes the spike prototype files from development/v04-simstudy-spike/; the spike remains available in the history of PR #27 (Spike: evaluate simstudy as v0.4 generation backend).
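
A minimal base-R sketch of the constructor-plus-validator pattern listed above. It is not the MockData API: `sketch_spec_continuous()`, `sketch_validate()`, the field layout, and the message wording are all hypothetical, chosen only to illustrate how a validator can collect structured problems (malformed ranges, duplicate names, bad `n`) instead of stopping at the first one.

```r
# Hypothetical constructor: illustrates a variable spec with forward-compatible
# fields. The real mock_spec_continuous() signature is not shown in this PR.
sketch_spec_continuous <- function(name, range, model_hint = NULL) {
  structure(
    list(
      name = name,
      type = "continuous",
      range = range,
      model_hint = model_hint,
      spec_version = "0.4-sketch"  # assumed placeholder value
    ),
    class = "sketch_var_spec"
  )
}

# Hypothetical validator: returns every problem found rather than failing fast.
sketch_validate <- function(vars, n) {
  problems <- character(0)

  if (!is.numeric(n) || length(n) != 1 || is.na(n) || n < 0) {
    problems <- c(problems, "n must be a single non-negative number")
  }

  var_names <- vapply(vars, function(v) v$name, character(1))
  dupes <- unique(var_names[duplicated(var_names)])
  if (length(dupes) > 0) {
    problems <- c(problems, paste0("duplicate variable name: ", dupes))
  }

  for (v in vars) {
    r <- v$range
    if (!is.numeric(r) || length(r) != 2 || anyNA(r) || r[1] > r[2]) {
      problems <- c(problems, paste0("malformed range for variable: ", v$name))
    }
  }

  list(valid = length(problems) == 0, problems = problems)
}

# An empty spec and n = 0 are treated as valid edge cases in this sketch.
empty_result <- sketch_validate(list(), n = 0)
bad_result <- sketch_validate(
  list(
    sketch_spec_continuous("age", c(90, 18)),  # reversed range
    sketch_spec_continuous("age", c(0, 1))     # duplicate name
  ),
  n = 10
)
```

Returning a list of problems keeps the validation result printable and testable, which is one plausible way to meet the "structured validation errors" goal.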

Verification

Ran locally:

Rscript --vanilla -e 'devtools::load_all(quiet = TRUE); testthat::test_file("tests/testthat/test-mock-spec.R")'
Rscript --vanilla -e 'devtools::load_all(quiet = TRUE); testthat::test_dir("tests/testthat", reporter = "summary")'
git diff --check

@DougManuel DougManuel added the enhancement label (New feature or request) May 18, 2026
@DougManuel

v0.4 milestone progress summary

M1 — mock_spec foundation

Completed the normalized specification layer for v0.4.

  • Added mock_spec(), variable constructors, validation, is_mock_spec(), and validation result printing.
  • Added forward-compatible fields: spec_version, provenance, and model_hint.
  • Tightened construction-time validation and auditability contracts after review.
  • Added NEWS/ADR coverage and focused testthat coverage.

Key commits: 9d23b27, 75484a3

M2 — direct API adapters

Completed simple direct helpers for users who do not want to start from metadata tables.

  • Added mock_continuous(), mock_categorical(), and mock_date().
  • Each returns a validated one-variable mock_spec; no data generation yet.
  • Added tests confirming that each direct API helper is equivalent to the explicit mock_spec() construction.
  • Fixed provenance/model-hint auditability gaps before moving on.

Key commits: 33d7aee, f29414d

M3 — recodeflow adapter

Completed the first recodeflow metadata adapter into mock_spec.

  • Added mock_spec_from_recodeflow() for variables + variable_details data frames or CSV paths.
  • Preserves exact role/database filtering, derived-variable exclusion, categorical levels/proportions, recEnd missing-code semantics, ranges, rType, date ranges, garbage rules, and survival/date fields.
  • Review follow-up hardened Func:: filtering, missing databaseStart handling, numeric scalar parsing, distribution fallback messaging, and cross-adapter equivalence coverage.
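
The filtering rules above can be sketched on a toy `variable_details` table. This is an illustration under stated assumptions, not the behaviour of `mock_spec_from_recodeflow()`: it assumes derived variables are marked by a `Func::` prefix in `recEnd` and that missing codes carry an `NA::` prefix in `recEnd`. The helper name `split_details_sketch()` and the toy rows are hypothetical.

```r
# Toy metadata: three rows for a categorical variable, one derived variable.
variable_details <- data.frame(
  variable = c("sex", "sex", "sex", "bmi_derived"),
  recStart = c("1", "2", "9", "else"),
  recEnd   = c("1", "2", "NA::b", "Func::bmi_fun"),  # assumed marker conventions
  stringsAsFactors = FALSE
)

# Hypothetical helper: drop derived variables, flag missing-code rows.
split_details_sketch <- function(details) {
  # Any variable with a Func:: row is treated as derived and excluded entirely.
  is_derived_row <- startsWith(details$recEnd, "Func::")
  derived_vars <- unique(details$variable[is_derived_row])
  kept <- details[!details$variable %in% derived_vars, , drop = FALSE]

  # Rows whose recEnd starts with NA:: are treated as missing codes.
  kept$is_missing_code <- startsWith(kept$recEnd, "NA::")
  kept
}

kept_details <- split_details_sketch(variable_details)
```

Excluding by variable rather than by row avoids leaving partial definitions behind if a derived variable has more than one detail row.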

Key commits: bd7b389, b1895f5

Current verification after M3 follow-up: full test suite passed locally with 483 passing tests, 2 existing skips, and 34 existing warnings.

Next milestone: M4 native backend from mock_spec.

@DougManuel

M4 — native mock_spec backend

Completed the first native generation backend from mock_spec.

  • Added generate_mock_data_native() for baseline valid data generation.
  • Supports continuous uniform and truncated normal, categorical sampling with proportions, and uniform date generation.
  • Handles empty specs, n = 0, seed reproducibility, and RNG-state restoration.
  • Fails loudly for unsupported native distributions and lossy categorical numeric coercion.
  • Confirmed the backend can consume simple direct specs and simple recodeflow-derived specs.
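
A minimal base-R sketch of two of the behaviours listed above: seeded generation that restores the caller's RNG state, and truncated-normal sampling. The PR does not say how the native backend implements either, so both `with_local_seed()` and `rtruncnorm_sketch()` are hypothetical helpers; the inverse-CDF method used here is one standard approach, not necessarily the one in `generate_mock_data_native()`.

```r
# Evaluate `expr` under a fixed seed, then put the global RNG state back.
with_local_seed <- function(seed, expr) {
  had_seed <- exists(".Random.seed", envir = globalenv(), inherits = FALSE)
  old_seed <- if (had_seed) get(".Random.seed", envir = globalenv()) else NULL

  on.exit({
    if (had_seed) {
      assign(".Random.seed", old_seed, envir = globalenv())
    } else if (exists(".Random.seed", envir = globalenv(), inherits = FALSE)) {
      rm(".Random.seed", envir = globalenv())
    }
  }, add = TRUE)

  set.seed(seed)
  force(expr)  # the promise is evaluated here, after the seed is set
}

# Inverse-CDF sampling from a normal truncated to [lower, upper].
rtruncnorm_sketch <- function(n, mean, sd, lower, upper) {
  p_lower <- pnorm(lower, mean, sd)
  p_upper <- pnorm(upper, mean, sd)
  qnorm(runif(n, p_lower, p_upper), mean, sd)
}

set.seed(1)
state_before <- .Random.seed
draw_a <- with_local_seed(42, rtruncnorm_sketch(5, mean = 50, sd = 10, lower = 30, upper = 70))
draw_b <- with_local_seed(42, rtruncnorm_sketch(5, mean = 50, sd = 10, lower = 30, upper = 70))
state_after <- .Random.seed
```

With this pattern, two calls with the same seed return identical draws, and code that runs afterwards sees the same random stream it would have seen without the call.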

Key commit: a6aa4cf

Verification: full test suite passed locally with 507 passing tests, 2 existing skips, and 34 existing warnings.

Next milestone: M5 post-processing layer for missing codes, garbage values, rType coercion, date/source-format conversion, and diagnostics.

@DougManuel

M4 status: native mock_spec backend

M4 is complete after the follow-up commit 501698b.

Closed from review:

  • Added a formula gate so formula-bearing specs now error loudly instead of being silently ignored.
  • Added tests for formula rejection, truncated-normal fallback warnings, statistical contracts, n = 0 / n = 1, and direct-vs-recodeflow equivalence.
  • Truncated-normal fallback warnings now include the affected variable name.
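
One way such a formula gate could look, as a hedged base-R sketch. The function name `check_no_formula_sketch()`, the `formula` field, and the condition class `sketch_formula_error` are hypothetical; the point is only that the check runs before generation and names the offending variables.

```r
# Hypothetical gate: error with a classed condition if any variable spec
# carries a formula, instead of silently ignoring it.
check_no_formula_sketch <- function(vars) {
  has_formula <- vapply(vars, function(v) !is.null(v$formula), logical(1))

  if (any(has_formula)) {
    bad_names <- vapply(vars[has_formula], function(v) v$name, character(1))
    stop(structure(
      class = c("sketch_formula_error", "error", "condition"),
      list(
        message = paste0(
          "formula-bearing variables are not supported by this backend: ",
          paste(bad_names, collapse = ", ")
        ),
        call = NULL
      )
    ))
  }

  invisible(TRUE)
}

# Callers can catch the specific class and inspect the message.
gate_message <- tryCatch(
  check_no_formula_sketch(list(
    list(name = "age"),
    list(name = "bmi", formula = "weight / height^2")
  )),
  sketch_formula_error = function(e) conditionMessage(e)
)
```

A classed condition lets tests assert on the error type rather than matching message text alone.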

Verification:

  • Focused native backend tests passed: 37 pass, 0 fail.
  • Worktree was clean after push.

Ready to proceed to M5.

@DougManuel

M5 status: post-processing and diagnostics

M5 is complete after the follow-up commit 6daef47.

Closed from review:

  • Protected preexisting missing-code collisions from garbage overwrite, preserving the diagnostics audit trail.
  • Added an idempotency gate so already-postprocessed data cannot be contaminated twice.
  • Confirmed missing-proportion overflow fails at validation and pinned the post-processing entry path.
  • Canonicalized garbage rule order (low, then high) and improved unnamed-rule errors.
  • Documented that diagnostics live in a data-frame attribute that may be dropped by subsetting/tools.
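
The idempotency gate and the attribute-based diagnostics can be sketched together in base R. Everything named here is hypothetical (`postprocess_sketch()`, `get_diagnostics_sketch()`, the attribute name `sketch_diagnostics`); the sketch only shows why an attribute can double as an "already processed" marker and why consumers should read it defensively.

```r
# Hypothetical post-processing step: writes a missing code into chosen rows
# and records what it did in a data-frame attribute.
postprocess_sketch <- function(data, var, missing_code, rows) {
  # Idempotency gate: the diagnostics attribute marks already-processed data.
  if (!is.null(attr(data, "sketch_diagnostics"))) {
    stop("data has already been post-processed; refusing to run twice")
  }

  data[[var]][rows] <- missing_code
  attr(data, "sketch_diagnostics") <- list(
    variable = var,
    missing_code = missing_code,
    n_replaced = length(rows)
  )
  data
}

# Defensive accessor: fail with a clear message if the attribute is gone.
get_diagnostics_sketch <- function(data) {
  diag <- attr(data, "sketch_diagnostics")
  if (is.null(diag)) {
    stop("diagnostics attribute not found; it may have been dropped upstream")
  }
  diag
}

raw <- data.frame(x = c(1, 2, 3, 4))
processed <- postprocess_sketch(raw, "x", missing_code = 99, rows = c(2, 4))
diag <- get_diagnostics_sketch(processed)

# Rebuilding a data frame from its columns does not carry the attribute over.
rebuilt <- data.frame(x = processed$x)
```

Because the attribute does not survive every transformation, a cautious workflow reads the diagnostics into a separate object immediately after post-processing.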

Verification:

  • Focused M5 tests passed: 32 pass, 0 fail.
  • Full suite passed: 552 pass, 2 existing skips, 34 existing warnings.
  • git diff --check clean.

Ready to proceed to M6.

@DougManuel

M6 status: pipeline assertion promotion

M6 is complete after 2b6aa1e.

Closed from review:

  • Promoted the strongest spike assertions into end-to-end pipeline tests.
  • Added composed native pipeline coverage for generation → post-processing.
  • Pinned categorical code preservation, valid/missing 97 collision diagnostics, composed seed reproducibility, recEnd-driven missingness, direct-vs-recodeflow equivalence, and deferred formula failure.

Verification:

  • Focused M6 tests passed: 18 pass, 0 fail.
  • Full suite passed: 570+ expectations, 0 failures, with the existing skips/warnings unchanged.

Ready to proceed to M7.

@DougManuel

M7 status: optional simstudy backend

M7 is complete after the follow-up commit 70da7b8.

Closed from review:

  • Kept simstudy optional in Suggests and pinned the soft-gated backend to simstudy >= 0.8.1.
  • Added guards for simstudy-specific silent-failure paths: categorical label/version drift, reserved id variable names, and semicolon-delimited labels.
  • Added categorical normalization that accepts current label output and recovers older integer-index output, otherwise failing loudly.
  • Added tests that require simstudy (skipped when it is not installed) for cross-backend contracts, per-variable routing, empty/n = 0, post-processing composition, and reproducibility.
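
A hedged base-R sketch of the two ideas above: a soft gate for an optional dependency with a minimum version, and a normalizer that accepts label output, recovers integer-index output, and otherwise fails. `requireNamespace()` and `utils::packageVersion()` are standard R; `backend_available()` and `normalize_categorical_sketch()` are hypothetical names, and the real package may structure this differently.

```r
# Soft gate: TRUE only if the package is installed at or above min_version.
# Nothing is attached, so a Suggests-only dependency stays optional.
backend_available <- function(pkg, min_version) {
  requireNamespace(pkg, quietly = TRUE) &&
    utils::packageVersion(pkg) >= min_version
}

# Normalize backend output to labels:
#   1. values that are already known labels pass through;
#   2. whole-number indices in 1..length(levels) are mapped to labels;
#   3. anything else is an error rather than a silent coercion.
normalize_categorical_sketch <- function(x, levels) {
  if (all(as.character(x) %in% levels)) {
    return(as.character(x))
  }

  idx <- suppressWarnings(as.integer(x))
  in_range <- !anyNA(idx) && all(idx >= 1L & idx <= length(levels))
  if (in_range && all(idx == suppressWarnings(as.numeric(x)))) {
    return(levels[idx])
  }

  stop("categorical output is neither known labels nor valid level indices")
}

simstudy_ok <- backend_available("simstudy", "0.8.1")
from_labels <- normalize_categorical_sketch(c("low", "high"), c("low", "mid", "high"))
from_index <- normalize_categorical_sketch(c(1L, 3L, 2L), c("low", "mid", "high"))
```

In testthat files, `skip_if_not_installed("simstudy", minimum_version = "0.8.1")` expresses the same gate, so tests that need the backend are skipped rather than failed when it is absent.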

Verification:

  • Local environment has no simstudy installed: unavailable-path test passes; simstudy-installed tests skip cleanly.
  • Focused M7 tests passed with expected skips.
  • Full suite passed: 574 pass, 10 skips, 34 existing warnings.
  • git diff --check clean.

Ready for the documentation sprint before M8.

@DougManuel DougManuel changed the base branch from main to dev May 20, 2026 12:52
@DougManuel DougManuel marked this pull request as ready for review May 20, 2026 12:58
Copilot AI review requested due to automatic review settings May 20, 2026 12:58
@DougManuel DougManuel merged commit 223e9bd into dev May 20, 2026
1 check failed
@DougManuel DougManuel review requested due to automatic review settings May 20, 2026 13:20
@DougManuel DougManuel deleted the v04-production-refactor branch May 21, 2026 18:03