Conversation
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Pull Request Overview
This PR removes the old v0.3 schema and parser, introduces a clean separation between schema (v0.4) and core logic, provides a new parse API, adds Spec/Experiment in core.py, updates tests for the v0.4 schema, and refreshes documentation for the new organization.
- Remove legacy v0.3 code and tests, introduce v0.4 schema-driven parsing
- Add `Spec` and `Experiment` in `griddler/core.py` and a top-level `parse` dispatch
- Update docs (nav structure, new core concepts, API pages) and expand tests for v0.4
Reviewed Changes
Copilot reviewed 16 out of 16 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| tests/test_schema.py | Removed v0.3 schema tests |
| tests/test_griddle_v04.py | New tests validating v0.4 parsing and experiment operations |
| tests/test_griddle.py | Removed legacy griddle parser tests |
| mkdocs.yml | Updated nav entries for new docs pages |
| griddler/schemas/v04/schema.json | Added JSON schema defining the v0.4 griddle format |
| griddler/schemas/v04/__init__.py | Implemented schema loading and v0.4 parse logic |
| griddler/core.py | Introduced Spec and Experiment classes with union/product |
| griddler/__init__.py | Top-level parse that dispatches to the proper schema parser |
| docs/index.md, docs/griddles.md, etc. | Updated documentation to reflect new schema and API structure |
Comments suppressed due to low confidence (1)
tests/test_griddle_v04.py:148
- Add a test case that passes two specs with overlapping keys to `Experiment` and asserts a `RuntimeError`, covering the key-disjointness error path.
bbbruce
left a comment
I'm a little lost in the nomenclature... I'm not sure I understand why you need the third concept of an "Experiment", and even in the examples below it appears that you are simply taking unions and products of "Parameter Sets", i.e., "Specifications" here (of course, in the code you have only implemented the operations at the Experiment level). What makes an "Experiment" fundamentally different from a "Spec" such that all things aren't simply operations upon "Spec"s or "Parameter Sets"? E.g., if you were going to replicate simulations with different seeds, you'd just product those sets across the seeds or the replicate index, which is what I did in the measles work (metapop/sim.py in https://github.com/cdcent/metapop-model/pull/439/, new lines 296-315), further simplifying your approach (I think).
@bbbruce I'm not following your comment fully and I expect that that's because I've changed nomenclature. I no longer refer to "parameter sets." I've replaced that concept with "Specification," which I think I confusingly used to use for what I now call "Experiment".
You're right that an "Experiment" is just a set of Specifications (which is my new name for "parameter set"). The union and product operations are just unions and products over those sets of Specifications (with some caveats about disjointness/parameter name collisions). I like having a word for that concept, because it roughly maps onto modeler jargon that an "experiment" is a set of simulations with different parameterizations. It also means I don't need to say "sets of sets of parameter name/value pairs".
To be clear, I'm showing unions and products of Experiments.
This is true. This is making me realize I should make a distinction between a union+update operation and a pure union operation, for both Specifications and Experiments. They could be called "union" and "disjoint/strict union", or "update union" and "union", depending on which one I want as the default.
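A minimal sketch of the two union flavors under discussion, on plain dicts (illustrative only; `update_union` and `strict_union` are hypothetical names, not griddler's API):

```python
def update_union(a: dict, b: dict) -> dict:
    """'Union + update': values in b overwrite any colliding keys in a."""
    return {**a, **b}


def strict_union(a: dict, b: dict) -> dict:
    """'Disjoint/strict union': any key collision is an error."""
    overlap = set(a) & set(b)
    if overlap:
        raise ValueError(f"overlapping parameter names: {sorted(overlap)}")
    return {**a, **b}
```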
I think this is a confusion about nomenclature. There are no longer "parameter sets", and yes, operations on Experiments are just operations on the sets of Specifications that constitute them.
The way I would think about this is that you have (1) an Experiment representing all the simulations you want replicated and (2) another Experiment that is a set of Specifications, each of which has only a single Parameter, which is the seed and its value. You take the product of those two Experiments, and now you have a single Experiment that says: run each simulation in Experiment 1 using each seed from Experiment 2. I agree it's a little funny to call Experiment 2 here an "experiment." It's maybe more natural to think of your seeds as a vector that you want to cross-product into Experiment 1. But the current approach means that I don't need to introduce a new concept for a "vector".
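As a sketch of that algebra (all names here are hypothetical and griddler's actual `Spec`/`Experiment` API may differ), the seed example looks like:

```python
from itertools import product


class Spec(dict):
    """A Specification: a mapping of parameter names to values."""


class Experiment:
    """An Experiment: a set of Specifications."""

    def __init__(self, specs):
        self.specs = list(specs)

    def __mul__(self, other):
        # Product: merge every Spec in self with every Spec in other,
        # requiring disjoint parameter names.
        merged = []
        for a, b in product(self.specs, other.specs):
            if set(a) & set(b):
                raise RuntimeError("overlapping parameter names")
            merged.append(Spec({**a, **b}))
        return Experiment(merged)


# Experiment 1: the simulations; Experiment 2: the "vector" of seeds.
sims = Experiment([Spec(beta=0.1), Spec(beta=0.2)])
seeds = Experiment([Spec(seed=s) for s in (1, 2, 3)])
runs = sims * seeds  # 2 x 3 = 6 Specs
```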
KOVALW
left a comment
The Parameter..Spec..Experiment setup seems very concise to me, and I think it will work perfectly with Python models. I'm wondering how easily we'll be able to use this for writing input files for ixa and GCM models, though.
```yaml
- union:
    - product:
        - [{ distribution: normal }]
        - [{ mean: 0.5 }, { mean: 1.0 }, { mean: 1.5 }]
```
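For concreteness, a sketch of what this fragment expands to, assuming the product merges parameter dicts pairwise (the actual parser's output format may differ):

```python
from itertools import product

# The two product terms from the griddle fragment above.
distributions = [{"distribution": "normal"}]
means = [{"mean": 0.5}, {"mean": 1.0}, {"mean": 1.5}]

# The product of a 1-element set and a 3-element set: three Specs,
# each pairing the distribution name with one mean.
specs = [{**a, **b} for a, b in product(distributions, means)]
```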
Since these parameters are properties of a normal distribution, not necessarily separate parameters, is there any automatic handling that you envision being done by the package or is that supposed to be on the user side? As you know, in a lot of ixa input files, we have Specifications like
```json
{
    "Parameters": {
        "initial_cases": 1,
        "offspring_distribution": {
            "Poisson": { "mean": 1.0 }
        },
        "generation_interval_distribution": {
            "Uniform": { "min": 7.0, "max": 17.0 }
        },
        ...
    }
}
```
which have both atomic parameter values and nested schemes that assign properties to parameter keys.
Our workaround has been to flatten and unflatten nested dictionaries, using only flattened setups for griddles, such as
```json
"offspring_distribution>>>NegativeBinomial>>>concentration": {
    "vary": [0.5, 1.0],
    "if": {
        "equals": {
            "scenario_offspring_distribution": "NegativeBinomial"
        }
    }
}
```
in the v0.3 JSON syntax, and leaving off fixed variable unions until they're combined with a native ixa input file. We then specify overwriting the upper-level parameter, such as `offspring_distribution` in this case, where the whole nested chunk is replaced by the output of each griddle Specification to generate the Experiment set.
Providing this as a method would probably see a lot of common use, and it might be worth figuring out a consistent internal method on Experiment so that we don't have to write out long flattened names.
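A sketch of that flatten/unflatten workaround, using the `>>>` separator from the example above (this is user-side code, not a griddler-provided method):

```python
def flatten(d: dict, sep: str = ">>>", prefix: str = "") -> dict:
    """Flatten a nested dict into a single level, joining keys with sep."""
    out = {}
    for k, v in d.items():
        key = f"{prefix}{sep}{k}" if prefix else k
        if isinstance(v, dict):
            out.update(flatten(v, sep, key))
        else:
            out[key] = v
    return out


def unflatten(d: dict, sep: str = ">>>") -> dict:
    """Invert flatten(): split keys on sep and rebuild the nesting."""
    out = {}
    for k, v in d.items():
        parts = k.split(sep)
        node = out
        for p in parts[:-1]:
            node = node.setdefault(p, {})
        node[parts[-1]] = v
    return out
```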
This comment is both entirely on point and out of scope.
For me, this PR is about (1) clarifying the underlying objects & concepts, (2) separating that from the griddle schemas, and (3) kind of incidentally, introducing a v0.4 schema. I'm putting in this v0.4 because it's so easy to parse!
I take your comment to say that you'd prefer a different schema, which I think is great! If v0.3 worked better for you, then let's get that implemented (using the new underlying logic, where we construct an Experiment, rather than whatever I did at package version v0.3).
And if neither v0.3 nor v0.4 works for you, then let's make an ixa schema?? To me the beauty now is that we can iterate on the different schemas independently. There need not be ONE schema, and we can have useful experimentation and mutation!
So I'm not going to address these comments in this PR, but I'd love to talk with you about what you'd like a convenient schema to look like, and how to write that parser.
@KOVALW @swo if, as I'm discussing below, griddler creates a 'flat' set of parameters (which I think is what you are getting at, @KOVALW, as being a 'challenge'), I personally think it is ideal to handle your problem by mapping across a handlebars-type format string, or something similar, in your code or the Python preprocess step.
(I probably don't have the below formatted quite right).
I agree that this should be out of scope for griddler (i.e., getting the output in the format some x, y, z thing needs).
```
{
    "Parameters": {
        "initial_cases": {{initial_cases}},
        "offspring_distribution": {
            "{{offspring_distribution_name}}": {
                "mean": {{offspring_distribution_mean}}
            }
        },
        "generation_interval_distribution": {
            "{{generation_interval_name}}": {
                "min": {{generation_interval_min}}, "max": {{generation_interval_max}}
            }
        }
        <...>
    }
}
```
...
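For what it's worth, a minimal sketch of filling such a template from a flat Specification, using a plain regex substitution rather than a real handlebars engine (the parameter names are illustrative):

```python
import re


def render(template: str, spec: dict) -> str:
    """Replace each {{name}} placeholder with the value of spec[name]."""
    return re.sub(
        r"\{\{(\w+)\}\}",
        lambda m: str(spec[m.group(1)]),
        template,
    )


spec = {"initial_cases": 1, "offspring_distribution_name": "Poisson"}
rendered = render('{"initial_cases": {{initial_cases}}}', spec)
```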
@swo - Good - yes, I am following, i.e., "Spec(ification)" <=> (old) "parameter set." I still think there is one concept too many, because I guess I don't understand where and why "sets of sets" arise. Maybe there are just experiments and there are parameters. And I think the "vector" example is perfect. You don't need a concept of vector. The 'vector' is just a Spec or an Experiment that, just as you note, products across another thing of the same type (Spec or Experiment). Or not? This is why I feel like I'm still missing something even after reading your explanations. Ultimately you want to be able to run a simulation function or the like across a 'flat' list of 'parameter sets', which is what I thought griddler was trying to generate. But again, it seems I'm missing something somewhere. If this doesn't clear up the confusion, it may help to have a discussion.
This was a helpful discussion! @bbbruce had some questions about terminology and the utility of the Specification concept. We resolved a lot of those synchronously. @KOVALW had some questions about how to efficiently represent Specifications with nested structure (i.e., dictionaries). This is a schema-specific question that I'm going to punt to #74.
Appreciate the discussion with @swo that helped me understand that what I was writing was an Experiment.
- Introduces a v0.4 schema, speculatively. This schema is fiddly, but it is substantially easier to implement than the other ones.
- Dispatches the top-level `parse()` function to the correct parser, based on the schema specified in the griddle.

See #67 for the rationale behind this change, which solves many of the higher-order conceptual problems. E.g., there are now no DAGs.
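A sketch of what schema-based dispatch can look like (a registry pattern under assumed names; not necessarily how griddler implements it):

```python
# Registry mapping a schema version string to its parser function.
PARSERS = {}


def register(version):
    """Decorator that records a parser under a schema version."""
    def deco(fn):
        PARSERS[version] = fn
        return fn
    return deco


@register("v0.4")
def parse_v04(griddle: dict):
    # Placeholder body; a real parser would build an Experiment here.
    return {"version": "v0.4", "data": griddle}


def parse(griddle: dict):
    """Top-level parse: dispatch on the schema named in the griddle."""
    version = griddle.get("schema")
    if version not in PARSERS:
        raise ValueError(f"unknown schema: {version!r}")
    return PARSERS[version](griddle)
```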
Resolves #67