Clarify the semantics of the interaction of repeated tasks with reports and plots #103

Closed
jonrkarr opened this issue Feb 2, 2021 · 5 comments

@jonrkarr
Contributor

jonrkarr commented Feb 2, 2021

As noted in the specifications (NOTE: This example produces three dimensional results ...) repeated tasks implicitly produce multi-dimensional results. The specifications show examples that generate three dimensions (time, model variable, iteration over a range of a repeated task). More generally, repeated tasks can produce yet more dimensions. If a repeated task has multiple subtasks, there should be an additional dimension (e.g., time, model variable, iteration over a range of a repeated task, subtask within the repeated task). For spatial simulations that produce multiple dimensions on their own, the results could have dimensions (time, x, y, z, model variable, iteration over a range of a repeated task, subtask within the repeated task).

The semantics of the above for variables, data generators, and especially reports and plots are under-specified.

  • Reports are ill-defined, especially because the specification implicitly encourages CSV for reports, and CSV can hold at most two dimensions.
  • Plots are ill-defined. For example, should simulation tools display one curve per iteration and subtask?

A few changes could make reports better defined:

  • Explicitly indicate that multi-dimensional reports need to be stored with a format such as HDF5
  • Define a convention for the order of dimensions of the multi-dimensional reports
    1. Data set
    2. Model dimensions (e.g., {time} for non-spatial simulations or {time, x, y, z} for spatial simulations)
    3. Subtask within the listOfSubTasks of a repeated task
    4. Iteration through the range of a repeated task
  • Define conventions for labeling the axes of reports and each slice of the subtask and iteration dimensions.
    • One useful ontology for labeling axes is SIO.
  • Embrace conventions for annotating the dimensions of reports. This is another weakness of CSV/TSV that HDF5 overcomes.
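To make the proposed dimension order concrete, here is a minimal sketch in plain Python. The function name `report_axes` and its parameters are hypothetical and only illustrate the ordering suggested above (data set, model dimensions, subtask, iteration); nothing here is defined by SED-ML or by any simulation tool.

```python
from typing import Dict, List, Tuple


def report_axes(
    n_data_sets: int,
    model_dims: Dict[str, int],
    n_subtasks: int,
    n_iterations: int,
) -> Tuple[List[str], Tuple[int, ...]]:
    """Return (axis labels, shape) for a report of a repeated task.

    Assumed order, following the proposal in this issue:
    data set, model dimensions, subtask, iteration.
    """
    # dicts keep insertion order, so model dimensions stay in the order given
    labels = ["data set", *model_dims.keys(), "subtask", "iteration"]
    shape = (n_data_sets, *model_dims.values(), n_subtasks, n_iterations)
    return labels, shape


# Non-spatial example: 3 data sets, 101 time points, 2 subtasks, 10 iterations
labels, shape = report_axes(3, {"time": 101}, 2, 10)
print(labels)
print(shape)

# Spatial example: the model itself contributes (time, x, y, z)
spatial_labels, spatial_shape = report_axes(
    1, {"time": 5, "x": 4, "y": 4, "z": 4}, 2, 3
)
```

A report with such a shape cannot be written to a single CSV table without flattening, which is why a container format such as HDF5, with the labels stored as annotations, is suggested above.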

Addressing the issues with plots requires more work.

In addition to these changes, more examples (including expected results) would be helpful.

@luciansmith
Contributor

The current spec reads, in the 'Report' introduction:

"The encoding of simulation results is not part of SED-ML Level 1 Version 4, but it is recommended that
2D output be exported as CSV files, using the label as column headers, and that output with more
dimensions be exported as HDF5, again using the label to uniquely identify the data sets."

You've proposed a lot more detail above, and I'm not sure where to put it, i.e. in the relevant sections, or maybe in a 'best practices' appendix? Or is the short description above enough?

@jonrkarr
Contributor Author

Since it's "not part of SED-ML Level 1 Version 4", I guess this is enough. Clarifying this should be a high priority for a future version. The lack of clearly defined output is one of the biggest barriers to adoption.

What about plots for repeated tasks?

@luciansmith
Contributor

I would assume that plotting a repeated task would plot all the x,y pairs on a single x,y plane? Do we need more of a description than that?

And I'm more than happy to write more than that brief note, but I'm not sure what the most important parts are to add. What would be your 'highest priority' facts/conventions to add to that description?

@jonrkarr
Contributor Author

Plotting a curve for each individual simulation sounds reasonable. This clarifies that the plot shouldn't be something else, such as a density plot.

Similar to the discussion for mathematical calculations, x, y, and z data generators for a curve/surface need to have the same shape.
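As a rough sketch of what "one curve per simulation" and "same shape" could mean in practice, the plain-Python example below splits x and y results into one curve per (subtask, iteration) pair after checking that their shapes agree. The helper names `shape_of` and `curves`, and the `[time][subtask][iteration]` index order, are assumptions made for illustration; they are not taken from the specification.

```python
from itertools import product
from typing import Dict, List, Tuple


def shape_of(a) -> Tuple[int, ...]:
    """Shape of a rectangular nested list (assumed not ragged)."""
    dims = []
    while isinstance(a, list):
        dims.append(len(a))
        a = a[0] if a else None
    return tuple(dims)


def curves(x: list, y: list) -> Dict[Tuple[int, int], Tuple[List, List]]:
    """Split x and y, indexed [time][subtask][iteration], into one curve
    per (subtask, iteration) pair."""
    if shape_of(x) != shape_of(y):
        raise ValueError("x and y data generators must have the same shape")
    if len(shape_of(x)) != 3:
        raise ValueError("expected data indexed as [time][subtask][iteration]")
    n_time, n_sub, n_iter = shape_of(x)
    result = {}
    for s, i in product(range(n_sub), range(n_iter)):
        xs = [x[t][s][i] for t in range(n_time)]
        ys = [y[t][s][i] for t in range(n_time)]
        result[(s, i)] = (xs, ys)
    return result


# 3 time points, 1 subtask, 2 iterations
x = [[[0, 0]], [[1, 1]], [[2, 2]]]
y = [[[1.0, 2.0]], [[0.5, 1.0]], [[0.25, 0.5]]]
by_run = curves(x, y)
for key, (xs, ys) in by_run.items():
    print(key, xs, ys)  # each entry would be drawn as its own curve
```

All curves share the same x,y plane; the (subtask, iteration) key is only there so a tool could label or color each curve.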

luciansmith added a commit that referenced this issue Jun 11, 2021
Define the HDF5 format, and explain how to plot multidimensional data.
@luciansmith
Contributor

Added a whole HDF5 section (see #52) as well as clarifying how to plot multidimensional data.
