Conversation
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Pull Request Overview
This PR removes the old v0.3 schema and parser, introduces a clean separation between schema (v0.4) and core logic, provides a new parse API, adds Spec/Experiment in core.py, updates tests for the v0.4 schema, and refreshes documentation for the new organization.
- Remove legacy v0.3 code and tests, introduce v0.4 schema-driven parsing
- Add `Spec` and `Experiment` in `griddler/core.py` and a top-level `parse` dispatch
- Update docs (nav structure, new core concepts, API pages) and expand tests for v0.4
Reviewed Changes
Copilot reviewed 16 out of 16 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| tests/test_schema.py | Removed v0.3 schema tests |
| tests/test_griddle_v04.py | New tests validating v0.4 parsing and experiment operations |
| tests/test_griddle.py | Removed legacy griddle parser tests |
| mkdocs.yml | Updated nav entries for new docs pages |
| griddler/schemas/v04/schema.json | Added JSON schema defining the v0.4 griddle format |
| griddler/schemas/v04/__init__.py | Implemented schema loading and v0.4 parse logic |
| griddler/core.py | Introduced Spec and Experiment classes with union/product |
| griddler/__init__.py | Top-level parse that dispatches to the proper schema parser |
| docs/index.md, docs/griddles.md, etc. | Updated documentation to reflect new schema and API structure |
Comments suppressed due to low confidence (1)
tests/test_griddle_v04.py:148
- Add a test case that passes two specs with overlapping keys to `Experiment` and asserts a `RuntimeError`, covering the key-disjointness error path.
bbbruce
left a comment
I'm a little lost in the nomenclature... I'm not sure I understand why you need the third concept of an "Experiment", and even in the examples below it appears that you are simply taking unions and products of "Parameter Sets", i.e., "Specifications" here (of course, in the code you have only implemented the operations at the Experiment level). What makes an "Experiment" fundamentally different from a "Spec" such that all things aren't simply operations upon "Spec"s or "Parameter Sets"? E.g., if you were going to replicate simulations with different seeds, you'd just product those sets across the seeds or the replicate index, which is what I did in the measles work (metapop/sim.py in https://github.com/cdcent/metapop-model/pull/439/, new lines 296-315), further simplifying your approach (I think).
@bbbruce I'm not following your comment fully and I expect that that's because I've changed nomenclature. I no longer refer to "parameter sets." I've replaced that concept with "Specification," which I think I confusingly used to use for what I now call "Experiment".
You're right that an "Experiment" is just a set of Specifications (which is my new name for "parameter set"). The union and product operations are just unions and products over those sets of Specifications (with some caveats about disjointness/parameter name collisions). I like having a word for that concept, because it roughly maps onto modeler jargon that an "experiment" is a set of simulations with different parameterizations. It also means I don't need to say "sets of sets of parameter name/value pairs".
To be clear, I'm showing unions and products of Experiments.
This is true. This is making me realize I should make a distinction between a union+update operation and a pure union operation, for both Specifications and Experiments. They could be called "union" and "disjoint/strict union", or "update union" and "union", depending on which one I want as the default.
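A minimal sketch of the two union flavors under discussion, on plain dicts (illustrative only; `update_union` and `strict_union` are hypothetical names, not griddler's API):

```python
def update_union(a: dict, b: dict) -> dict:
    """'Union + update': values in b overwrite any colliding keys in a."""
    return {**a, **b}


def strict_union(a: dict, b: dict) -> dict:
    """'Disjoint/strict union': any key collision is an error."""
    overlap = set(a) & set(b)
    if overlap:
        raise ValueError(f"overlapping parameter names: {sorted(overlap)}")
    return {**a, **b}
```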
I think this is a confusion about nomenclature. There are no longer "parameter sets", and yes, operations on Experiments are just operations on the sets of Specifications that constitute them.
The way I would think about this is that you have (1) an Experiment representing all the simulations you want replicated and (2) another Experiment that is a set of Specifications, each of which has only a single Parameter, which is the seed and its value. You take the product of those two Experiments, and now you have a single Experiment that says: run each simulation in Experiment 1 using each seed from Experiment 2. I agree it's a little funny to call Experiment 2 here an "experiment." It's maybe more natural to think of your seeds as a vector that you want to cross-product into Experiment 1. But the current approach means that I don't need to introduce a new concept for a "vector".
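As a sketch of that algebra (all names here are hypothetical and griddler's actual `Spec`/`Experiment` API may differ), the seed example looks like:

```python
from itertools import product


class Spec(dict):
    """A Specification: a mapping of parameter names to values."""


class Experiment:
    """An Experiment: a set of Specifications."""

    def __init__(self, specs):
        self.specs = list(specs)

    def __mul__(self, other):
        # Product: merge every Spec in self with every Spec in other,
        # requiring disjoint parameter names.
        merged = []
        for a, b in product(self.specs, other.specs):
            if set(a) & set(b):
                raise RuntimeError("overlapping parameter names")
            merged.append(Spec({**a, **b}))
        return Experiment(merged)


# Experiment 1: the simulations; Experiment 2: the "vector" of seeds.
sims = Experiment([Spec(beta=0.1), Spec(beta=0.2)])
seeds = Experiment([Spec(seed=s) for s in (1, 2, 3)])
runs = sims * seeds  # 2 x 3 = 6 Specs
```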
KOVALW
left a comment
The Parameter..Spec..Experiment setup seems very concise to me, and I think it will work perfectly with Python models. I'm wondering how easily we'll be able to use this for writing input files for ixa and GCM models, though.
```yaml
- union:
    - product:
        - [{ distribution: normal }]
        - [{ mean: 0.5 }, { mean: 1.0 }, { mean: 1.5 }]
```
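For concreteness, a sketch of what this fragment expands to, assuming the product merges parameter dicts pairwise (the actual parser's output format may differ):

```python
from itertools import product

# The two product terms from the griddle fragment above.
distributions = [{"distribution": "normal"}]
means = [{"mean": 0.5}, {"mean": 1.0}, {"mean": 1.5}]

# The product of a 1-element set and a 3-element set: three Specs,
# each pairing the distribution name with one mean.
specs = [{**a, **b} for a, b in product(distributions, means)]
```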
Since these parameters are properties of a normal distribution, not necessarily separate parameters, is there any automatic handling that you envision being done by the package or is that supposed to be on the user side? As you know, in a lot of ixa input files, we have Specifications like
```json
{
    "Parameters": {
        "initial_cases": 1,
        "offspring_distribution": {
            "Poisson": { "mean": 1.0 }
        },
        "generation_interval_distribution": {
            "Uniform": { "min": 7.0, "max": 17.0 }
        },
        ...
    }
}
```
which have both atomic parameter values and nested schemes that assign properties to parameter keys.
Our workaround has been to flatten and unflatten nested dictionaries, using only flattened setups for griddles, such as
```json
"offspring_distribution>>>NegativeBinomial>>>concentration": {
    "vary": [0.5, 1.0],
    "if": {
        "equals": {
            "scenario_offspring_distribution": "NegativeBinomial"
        }
    }
}
```
in the v0.3 JSON syntax, and leaving off fixed variable unions until they're combined with a native ixa input file. We then specify overwriting the upper-level parameter, such as `offspring_distribution` in this case, where the whole nested chunk is replaced by the output of each griddle Specification to generate the Experiment set.
Providing this as a method would probably see a lot of common use, and it might be worth figuring out a consistent internal method on Experiment so that we don't have to write out long flattened names.
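A sketch of that flatten/unflatten workaround, using the `>>>` separator from the example above (this is user-side code, not a griddler-provided method):

```python
def flatten(d: dict, sep: str = ">>>", prefix: str = "") -> dict:
    """Flatten a nested dict into a single level, joining keys with sep."""
    out = {}
    for k, v in d.items():
        key = f"{prefix}{sep}{k}" if prefix else k
        if isinstance(v, dict):
            out.update(flatten(v, sep, key))
        else:
            out[key] = v
    return out


def unflatten(d: dict, sep: str = ">>>") -> dict:
    """Invert flatten(): split keys on sep and rebuild the nesting."""
    out = {}
    for k, v in d.items():
        parts = k.split(sep)
        node = out
        for p in parts[:-1]:
            node = node.setdefault(p, {})
        node[parts[-1]] = v
    return out
```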
This comment is both entirely on point and out of scope.
For me, this PR is about (1) clarifying the underlying objects & concepts, (2) separating that from the griddle schemas, and (3) kind of incidentally, introducing a v0.4 schema. I'm putting in this v0.4 because it's so easy to parse!
I take your comment to say that you'd prefer a different schema, which I think is great! If v0.3 worked better for you, then let's get that implemented (using the new underlying logic, where we construct an Experiment, rather than whatever I did at package version v0.3).
And if neither v0.3 nor v0.4 works for you, then let's make an ixa schema?? To me the beauty now is that we can iterate on the different schemas independently. There need not be ONE schema, and we can have useful experimentation and mutation!
So I'm not going to address these comments in this PR, but I'd love to talk with you about what you'd like a convenient schema to look like, and how to write that parser.
@KOVALW @swo if, as I'm discussing below, griddler creates a 'flat' set of parameters (which I think is what you are getting at, @KOVALW, as being a 'challenge'), I personally think it is ideal to handle your problem by mapping across a handlebars-type format string, or something similar, in your code or the Python preprocess step.
(I probably don't have the below formatted quite right).
I agree that this should be out of scope for griddler (i.e., getting the output in the format some x, y, z thing needs).
```
{
    "Parameters": {
        "initial_cases": {{initial_cases}},
        "offspring_distribution": {
            "{{offspring_distribution_name}}": {
                "mean": {{offspring_distribution_mean}}
            }
        },
        "generation_interval_distribution": {
            "{{generation_interval_name}}": {
                "min": {{generation_interval_min}}, "max": {{generation_interval_max}}
            }
        }
        <...>
    }
}
```
...
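For what it's worth, a minimal sketch of filling such a template from a flat Specification, using a plain regex substitution rather than a real handlebars engine (the parameter names are illustrative):

```python
import re


def render(template: str, spec: dict) -> str:
    """Replace each {{name}} placeholder with the value of spec[name]."""
    return re.sub(
        r"\{\{(\w+)\}\}",
        lambda m: str(spec[m.group(1)]),
        template,
    )


spec = {"initial_cases": 1, "offspring_distribution_name": "Poisson"}
rendered = render('{"initial_cases": {{initial_cases}}}', spec)
```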
@swo - Good - yes, I am following, i.e., "Spec(ification)" <=> (old) "parameter set." I still think there is one concept too many, because I guess I don't understand where and why "sets of sets" arise. Maybe there are just experiments and there are parameters. And I think the "vector" example is perfect. You don't need a concept of vector. The 'vector' is just a Spec or an Experiment that, just as you note, products across another thing of the same type (Spec or Experiment). Or not? This is why I feel like I'm still missing something even after reading your explanations. Ultimately you want to be able to run a simulation function or the like across a 'flat' list of 'parameter sets', which is what I thought griddler was trying to generate. But again, it seems I'm missing something somewhere. If this doesn't clear up the confusion, it may help to have a discussion.
This was a helpful discussion! @bbbruce had some questions about terminology and the utility of the Specification concept. We resolved a lot of those synchronously. @KOVALW had some questions about how to efficiently represent Specifications with nested structure (i.e., dictionaries). This is a schema-specific question that I'm going to punt to #74.
Appreciate the discussion with @swo that helped me understand that what I was writing was an Experiment.
- Introduces a v0.4 schema, speculatively. This schema is fiddly, but it is substantially easier to implement than the other ones.
- Dispatches the top-level `parse()` function to the correct parser, based on the schema specified in the griddle.

See #67 for the rationale behind this change, which solves many of the higher-order conceptual problems. E.g., there are now no DAGs.
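A sketch of what schema-based dispatch can look like (a registry pattern under assumed names; not necessarily how griddler implements it):

```python
# Registry mapping a schema version string to its parser function.
PARSERS = {}


def register(version):
    """Decorator that records a parser under a schema version."""
    def deco(fn):
        PARSERS[version] = fn
        return fn
    return deco


@register("v0.4")
def parse_v04(griddle: dict):
    # Placeholder body; a real parser would build an Experiment here.
    return {"version": "v0.4", "data": griddle}


def parse(griddle: dict):
    """Top-level parse: dispatch on the schema named in the griddle."""
    version = griddle.get("schema")
    if version not in PARSERS:
        raise ValueError(f"unknown schema: {version!r}")
    return PARSERS[version](griddle)
```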
Resolves #67