Add URIs & definitions for additional data formats (XLSX, HDF5, JSON) #52
Comments
EDAM (see #94) already has terms for all of these formats. I second the use of HDF5. This is key for large datasets. For structured datasets, another format that might make sense is SQLite.
I added HDF5 as an option, as it's clearly already getting a ton of use. Here's what I put in for its section (after the CSV/TSV descriptions):
HDF5 (Hierarchical Data Format version 5)
Each dimension of SED-ML RepeatedTask output should be labeled according to the id of the SED-ML
Each dimension may also be annotated in this format, with some ontology such as the ’Semanticscience
I didn't add XLSX, JSON, or SQLite. I can, though those might be more complicated?
This information is only straightforward when a dataset derives from a single top-level task. Datasets that arise from computations spanning the results of multiple tasks won't have a single top-level task id or clear semantics for the other dimensions. There are multiple options around this
I think L1V4 could say something like "when data generators only contain results from a single task, we recommend that reports of their results contain the following metadata ...". Dealing with this properly could be punted to L2.
If JSON is being used, I feel like it would benefit from its own explanation, since there are multiple ways the data could be encoded.
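To illustrate the point about multiple encodings, here is a minimal sketch of two plausible ways the same report could be written as JSON. Neither layout is defined by SED-ML; the data set ids (`time`, `S1`) and the helper `columns_from_rows` are hypothetical and exist only for this example.

```python
import json

# Hypothetical report: two data sets, three time points each.
columns = {"time": [0.0, 1.0, 2.0], "S1": [10.0, 7.5, 5.0]}

# Encoding A: column-oriented, one array per data set id.
column_oriented = json.dumps(columns)

# Encoding B: row-oriented, one object per time point.
ids = list(columns)
rows = [dict(zip(ids, values)) for values in zip(*columns.values())]
row_oriented = json.dumps(rows)


def columns_from_rows(text):
    """Convert the row-oriented encoding back to columns."""
    records = json.loads(text)
    return {key: [rec[key] for rec in records] for key in records[0]}
```

Both strings carry the same values, but a reader has to know which layout was used before it can recover them, which is why a format description would be needed.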
You're right that I should include a bit about the RemainingDimensions, but I don't know of any other way to reduce the dimensionality of SED-ML data through computation, given that we require all calculations to be element-by-element, and cross-matrix calculations to have identical dimensions. I don't know of anyone using JSON; if anyone is, I would invite them to write about how they're using it to encode this data!
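The constraint described here can be sketched in a few lines. This is an illustration only, using nested Python lists in place of real task output; `shape` and `elementwise` are hypothetical helper names, not part of any SED-ML library.

```python
def shape(data):
    """Return the dimensions of a rectangular nested list as a tuple."""
    dims = []
    while isinstance(data, list):
        dims.append(len(data))
        data = data[0] if data else None
    return tuple(dims)


def elementwise(op, a, b):
    """Apply op element by element; both inputs must have identical dimensions."""
    if shape(a) != shape(b):
        raise ValueError(f"dimension mismatch: {shape(a)} vs {shape(b)}")
    if isinstance(a, list):
        return [elementwise(op, x, y) for x, y in zip(a, b)]
    return op(a, b)
```

Because the result always has the same dimensions as its inputs, a calculation of this kind cannot by itself change the number of dimensions.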
OK, I updated the HDF5 section to include:
"When a DependentVariable is used to reduce the dimensionality of a set of data, the ids of whatever
When output from multiple tasks are combined mathematically, their dimensions must match exactly,
I also added this bit to the DataGenerator class:
"When multidimensional data is output to a Report, information about the dimensions should be stored
(Both CSV and HDF5 are links to the relevant sections.)
I think this will conflict with making
I don't think this is needed. The results of calculations are assigned to data generators, which have ids. Users can set these ids to be meaningful strings as with all other ids.
Issue
In L1V3, only NuML, CSV and TSV are defined. We have to add a section to the spec describing additional formats.
Proposal
Define the respective URIs
urn:sedml:format:xlsx
urn:sedml:format:hdf5
urn:sedml:format:json
with the restriction of the allowed data and DimensionDescriptions.
This requires the ability to specify complex sources, i.e. nested files and parts of files.
#46
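As a rough sketch of how a tool might recognize format URIs of the shape proposed here, the snippet below extracts the format name from a `urn:sedml:format:<name>` string. The function name `format_name` and the contents of `SUPPORTED` are assumptions made for illustration; the spec text would be the authority on which formats are actually defined.

```python
FORMAT_PREFIX = "urn:sedml:format:"

# Assumed for illustration: the L1V3 formats plus two of the proposed ones.
SUPPORTED = {"numl", "csv", "tsv", "hdf5", "json"}


def format_name(uri):
    """Return the format name from a urn:sedml:format:<name> URI."""
    if not uri.startswith(FORMAT_PREFIX):
        raise ValueError(f"not a SED-ML format URI: {uri!r}")
    name = uri[len(FORMAT_PREFIX):]
    if name not in SUPPORTED:
        raise ValueError(f"unsupported format: {name!r}")
    return name
```

Rejecting unknown names explicitly, rather than passing them through, keeps a misspelled format from being silently treated as valid.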