Skip to content

Separate logic from schema#68

Merged
swo merged 15 commits intomainfrom
swo_grammar
Jun 17, 2025
Merged

Separate logic from schema#68
swo merged 15 commits intomainfrom
swo_grammar

Conversation

@swo
Copy link
Copy Markdown
Contributor

@swo swo commented Apr 9, 2025

  • Establish a conceptual basis for Parameters, Specifications, and Experiments, and operations to manipulate them.
  • Expose these core concepts in the package API, so that advanced users can manipulate them directly via Python. We don't expect these concepts to change substantially through, so this API will change with the package-level versioning.
  • Reconceptualize griddles as a convenience front-end that accesses these core concepts without needing to write any Python.
    • Allow for different griddle schemas. A schema is a way to turn a dictionary (e.g., YAML) into an Experiment.
    • This separation allows for multiple schemas, which need not even be successors to one another, but rather can be co-existing mini-DSLs. E.g., if someone wants a new feature, like ranges (cf. Create ranges #31), then that could be made in a new schema, or a successor to an existing schema.
    • For now, we support one schema:, the one discussed in A(nother) new grammar of parameters #67's, and call it v0.4, speculatively. This schema is fiddly but it is substantially easier to implement than the other ones.
    • In this new package organization, adding a new schema is simply a matter of dispatching the top-level parse() function to the correct parser, based on the schema specified in the griddle.

See #67 for rationale behind the logic for this change, which solves many of the higher-order conceptual problems. E.g., there are now no DAGs.

Resolves #67

@swo swo changed the title Alternative grammar Separate logic from schema Jun 11, 2025
@swo swo requested a review from Copilot June 12, 2025 14:40

This comment was marked as outdated.

swo and others added 6 commits June 12, 2025 10:43
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@swo swo marked this pull request as ready for review June 12, 2025 15:40
@swo swo requested review from KOVALW, bbbruce, Copilot and jasonasher June 12, 2025 15:45
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR removes the old v0.3 schema and parser, introduces a clean separation between schema (v0.4) and core logic, provides a new parse API, adds Spec/Experiment in core.py, updates tests for the v0.4 schema, and refreshes documentation for the new organization.

  • Remove legacy v0.3 code and tests, introduce v0.4 schema-driven parsing
  • Add Spec and Experiment in griddler/core.py and top-level parse dispatch
  • Update docs (nav structure, new core concepts, API pages) and expand tests for v0.4

Reviewed Changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
tests/test_schema.py Removed v0.3 schema tests
tests/test_griddle_v04.py New tests validating v0.4 parsing and experiment operations
tests/test_griddle.py Removed legacy griddle parser tests
mkdocs.yml Updated nav entries for new docs pages
griddler/schemas/v04/schema.json Added JSON schema for the v0.4 schema
griddler/schemas/v04/init.py Implemented schema loading and v0.4 parse logic
griddler/core.py Introduced Spec and Experiment classes with union/product
griddler/init.py Top-level parse that dispatches to the proper schema parser
docs/index.md, docs/griddles.md, etc. Updated documentation to reflect new schema and API structure
Comments suppressed due to low confidence (1)

tests/test_griddle_v04.py:148

  • Add a test case that passes two specs with overlapping keys to Experiment * and asserts a RuntimeError, covering the key-disjointness error path.
<end of file>

Copy link
Copy Markdown
Contributor

@bbbruce bbbruce left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little lost in the nomenclature... I'm not sure I understand why you need the third concept of an "Experiment" and even in the examples below it appears that you are simply taking unions and products of "Parameter Sets", i.e., "Specification" here (of course in the code you have only implemented the operations at the Experiment level). What makes an "Experiment" fundamentally different from a "Spec" such that all things aren't simply operations upon "Spec"s or "Parameter Sets"? E.g., if you were going to replicate simulations with different seeds, etc., you'd just product those sets across the seeds or the replicate index (which is what I did here - metapop/sim.py in https://github.com/cdcent/metapop-model/pull/439/ new lines 296-315) in the measles work) further simplifying your approach (I think).

@swo
Copy link
Copy Markdown
Contributor Author

swo commented Jun 12, 2025

@bbbruce I'm not following your comment fully and I expect that that's because I've changed nomenclature.

I no longer refer to "parameter sets." I've replaced that concept with "Specification," which I think I confusingly used to use for what I now call "Experiment".

I'm not sure I understand why you need the third concept of an "Experiment"

You're right that an "Experiment" is just a set of Specifications (which is my new name for "parameter set"). The union and product operations are just unions and products over those sets of Specifications (with some caveats about disjointness/parameter name collisions).

I like having a word for that concept, because it roughly maps onto modeler jargon that an "experiment" is a set of simulations with different parameterizations. It also means I don't need to say "sets of sets of parameter name/value pairs".

even in the examples below it appears that you are simply taking unions and products of "Parameter Sets", i.e., "Specification"

To be clear, I'm showing unions and products of Experiments.

in the code you have only implemented the operations at the Experiment level

This is true; I should have an explicit Spec | Spec union operator, that checks for disjoint names. I'll add that. I have thought that one might want to have an "update" function, similar to what Python does with dict | dict, that prefers values in the right dict if there are shared keys.

This is making me realize I should make a distinction between a union+update operation and a pure union operation, for both Specifications and Experiments. They could be called "union" and "disjoint/strict union", or "update union" and "union", depending on which one I want as default.

What makes an "Experiment" fundamentally different from a "Spec" such that all things aren't simply operations upon "Spec"s or "Parameter Sets"?

I think this is a confusion about nomenclature. There are no longer "parameter sets", and yes, operations on Experiments are just operations on the sets of Specifications that constitute them.

E.g., if you were going to replicate simulations with different seeds, etc., you'd just product those sets across the seeds or the replicate index

The way I would think about this is that you have (1) an Experiment representing of all the simulations you want replicated and (2) another Experiment that is a set of Specifications, each of which has only a single Parameter, which is the seed and its value. You take the product of those two Experiments, and now you have a single Experiment that says, run each simulation in Experiment 1 using each seed from Experiment 2.

I agree it's a little funny to call Experiment 2 here an "experiment." It's maybe more natural to think of your seeds as a vector that you want to cross-product into Experiment 1. But the current approach means that I don't need to introduce a new concept for a "vector".

Copy link
Copy Markdown

@KOVALW KOVALW left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Parameter..Spec..Experiment setup seems very concise to me, and I think will work perfectly with python models. I'm wondering how easily we'll be able to use this for writing input files for ixa and GCM models though

- union:
- product:
- [{ distribution: normal }]
- [{ mean: 0.5 }, { mean: 1.0 }, { mean: 1.5 }]
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since these parameters are properties of a normal distribution, not necessarily separate parameters, is there any automatic handling that you envision being done by the package or is that supposed to be on the user side? As you know, in a lot of ixa input files, we have Specifications like

{
    "Parameters": {
        "initial_cases": 1,
        "offspring_distribution": {
            "Poisson": {
                "mean": 1.0
              }
         },
        "generation_interval_distribution": {
                "Uniform": {
                    "min": 7.0, "max": 17.0
                 }
          }
  ...
    }
}

which have both atomic parameter values and nested schemes that assign properties to parameter keys.

Our workaround has been to flatten and unflatten nested dictionaries, using only flattened setups for griddles, such as

"offspring_distribution>>>NegativeBinomial>>>concentration": {
            "vary": [0.5, 1.0],
            "if": {
                "equals": {
                    "scenario_offspring_distribution": "NegativeBinomial"
                }
            }
        }

in the v0.3 JSON syntax, and leaving off fixed variable unions until they're combined with a native ixa input file. We then specify overwriting the upper level parameter, such as offspring_distribution in this case, where the whole nested chunk is replaced by the output of each griddle Specification to generate the Experiment set.

Providing this as a method would probably see a lot of common use and it might be worth figuring out a consistent internal method on Experiment so that we don't have to write out long flattened names

Copy link
Copy Markdown
Contributor Author

@swo swo Jun 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is both entirely on point and out of scope.

For me, this PR is about (1) clarifying the underlying objects & concepts, (2) separating that from the griddle schemas, and (3) kind of incidentally, introducing a v0.4 schema. I'm putting in this v0.4 before it's so easy to parse!

I take your comment to say that you'd prefer a different schema, which I think is great! If v0.3 worked better for you, then let's get that implemented (using the new underlying logic, where we construct an Experiment, rather than whatever I did at package version v0.3).

And if neither v0.3 nor v0.4 works for you, then let's make an ixa schema?? To me the beauty now is that we can iterate on the different schemas independently. There need not be ONE schema, and we can have useful experimentation and mutation!

So I'm not going to address these comments in this PR, but I'd love to talk with you about what you'd like a convenient schema to look like, and how to write that parser.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@KOVALW @swo if as I'm discussing below griddler creates a 'flat' set of parameters (which I think is what you are getting at @KOVALW being a 'challenge'), I personally think it is ideal to handle your problem through mapping across a handlebar type format string or something similar in your code or the python preprocess step?

(I probably don't have the below formatted quite right).

I agree that this should be out of scope for griddler (i.e., getting the output in the format some x, y, z thing needs).

f'''{
    "Parameters": {
        "initial_cases": {{initial_cases}},
        "offspring_distribution": {
            "{{offspring_distribution_name}}": {
                "mean": {{offspring_distibution_mean}}
              }
         },
        "generation_interval_distribution": {
                "{{generation_interval_name}}": {
                    "min": {{generation_interval_min}}, "max": {{generation_interval_min}}
                 }
          }
 <...>
    }
}'''
...

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I agree @swo and @bbbruce. The general point that this new grammar doesn't prohibit multiple schemas is good to have clarified here because I wasn't certain that the hard-coded union and product methods on each subexperiment would hold up with the schema I describe above

@bbbruce
Copy link
Copy Markdown
Contributor

bbbruce commented Jun 13, 2025

@bbbruce I'm not following your comment fully and I expect that that's because I've changed nomenclature.

I no longer refer to "parameter sets." I've replaced that concept with "Specification," which I think I confusingly used to use for what I now call "Experiment".

@swo - Good - yes, I am following, i.e., "Spec(ification)" <=> (old) "parameter set." I still think there is one concept too many because I guess I don't understand where and why "sets of sets" arise. Maybe there are just experiments and there are parameters. And I think the "vector" example is perfect. You don't need a concept of vector. The 'vector' is just a Spec or an Experiment that just as you note, products across another thing of the same type (Spec or Experiment).

product:
            - [{ r0: 0.1, v_eff: 0.5 }, { r0: 0.2, v_eff: 0.2 }, ...]   # <- Is this a Spec, and
            - [{ seed: 1 }, { seed: 2 }, { seed: 3 }]                  # <- is this a Spec? Yes, right?

Or not? This is why I feel like I'm still missing something even after reading your explanations.

Ultimately you want to be able to run a simulation function or the like across a 'flat' list of 'parameter sets' which is what I thought griddler was trying to generate. But again, it seems I'm missing something somewhere. If this doesn't clear up the confusion it may help to have a discussion.

@swo swo mentioned this pull request Jun 17, 2025
@swo
Copy link
Copy Markdown
Contributor Author

swo commented Jun 17, 2025

This was a helpful discussion!

@bbbruce had some questions about terminology and the utility of the Specification concept. We resolved a lot of those synchronously.

@KOVALW had some questions about how to efficiently represent Specifications with nested structure (i.e., dictionaries). This is a schema-specific question that I'm going to punt to #74.

@bbbruce
Copy link
Copy Markdown
Contributor

bbbruce commented Jun 17, 2025

Appreciate the discussion with @swo that helped me understand that what I was writing was an Experiment [...] with Specs {...} of Parameters and how having all three is valuable for reasoning about them.

@swo
Copy link
Copy Markdown
Contributor Author

swo commented Jun 17, 2025

product:
            - [{ r0: 0.1, v_eff: 0.5 }, { r0: 0.2, v_eff: 0.2 }, ...]   # <- Is this a Spec, and
            - [{ seed: 1 }, { seed: 2 }, { seed: 3 }]                  # <- is this a Spec? Yes, right?

Or not? This is why I feel like I'm still missing something even after reading your explanations.

{ r0: 0.1, v_eff: 0.5} is a Specification, as is { seed: 1}. So the two lines there are each an Experiment.

@swo swo merged commit 1d94a06 into main Jun 17, 2025
4 checks passed
@swo swo deleted the swo_grammar branch June 17, 2025 16:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

A(nother) new grammar of parameters

4 participants