
Metadata for results #12

Closed
rproepp opened this issue Aug 6, 2014 · 4 comments
rproepp (Member) commented Aug 6, 2014

I think this warrants its own issue:

@toddrjen wrote in #11:

This also leads me to another issue I have been thinking about: what do we do about the metadata of a neo object? When, for example, we get the average spike rate of a spike train, we end up with just a quantity. Is that what we want? Might it be a good idea to have some class that stores the output of these sorts of analyses along with the metadata of the original neo object? Or is that overkill?

The problem with this is doing it in a generic manner. You can't really use a SpikeTrain, since the resulting object may not meet the rules of a SpikeTrain. On the other hand, creating a generic "results" class would make it impossible to know what metadata you should expect from an object. And having a more specific SpikeTrainResults object would be difficult since it would need to be able to handle scalars, 1D arrays, and maybe even ND arrays depending on what analyses we allow. So it is a difficult problem, but I think having some way to keep the metadata bound to the results of some manipulation is important.

I think this verges into overkill territory :-) For most results (like the average rate of a spike train), the caller knows exactly which object the result was calculated from. The caller also knows whether and which metadata is needed, while our analysis function doesn't, so I would leave that responsibility upstream.

However, there might be analyses where this information is not available to the caller. For example, an analysis that takes a number of objects but only uses some of them based on their content. I don't know if we will have such functions - I would try to avoid it, but it might be necessary for some algorithms. In that case, I would return provenance information to the caller: report which objects were actually used. By linking results to the actual objects used in their creation, all metadata is available and we do not need to create new result types with all the complications that come with them.
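
A minimal sketch of what returning such provenance could look like. The function name and the selection criterion are made up purely for illustration, not an existing API:

```python
import quantities as pq

def mean_rate_of_active_trains(spiketrains, min_spikes=10):
    """Average firing rate over spike trains with at least `min_spikes` spikes,
    returned together with the list of trains that were actually used."""
    used = [st for st in spiketrains if st.size >= min_spikes]
    if not used:
        return None, used
    rates = [float(st.size / (st.t_stop - st.t_start).rescale('s')) for st in used]
    # The caller gets the result plus the objects it was computed from,
    # so all metadata stays reachable without inventing a new result class.
    return sum(rates) / len(rates) * pq.Hz, used
```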

toddrjen (Contributor) commented Aug 6, 2014

This wouldn't necessarily be something supported inside the algorithms. It could very well be up to the user to load the data and metadata (although it could be made very easy, for example by having a method to copy the properties and annotations of an object).
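
A hedged sketch of such a helper, assuming a plain dict is enough to carry a result plus metadata; the function name and dict layout are illustrative, not an existing Neo API:

```python
import quantities as pq
import neo

def with_metadata(result, spiketrain):
    """Bundle a result with the metadata of the SpikeTrain it came from."""
    return {
        'value': result,
        'name': spiketrain.name,
        't_start': spiketrain.t_start,
        't_stop': spiketrain.t_stop,
        'annotations': dict(spiketrain.annotations),
    }

st = neo.SpikeTrain([0.1, 0.5, 1.2] * pq.s, t_stop=2.0 * pq.s,
                    name='unit 3', electrode=7)       # extra kwargs become annotations
rate = st.size / (st.t_stop - st.t_start)             # average firing rate as a Quantity
print(with_metadata(rate, st))
```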

Carrying around the original objects doesn't strike me as a very efficient approach. If I have computed the average firing rate of a spike train, being able to abandon the original data saves a huge amount of memory and, if I store intermediate results, storage space. But Neo doesn't really have a class that would be suitable for saving a single rate value together with SpikeTrain metadata.

rproepp (Member, Author) commented Aug 7, 2014

Ok, I thought you were talking about supporting it in the analysis functions. But I am also skeptical that a class for keeping results and metadata in the absence of the original data object would be useful.

For the example of a spike train: there are various kinds of metadata that might be of interest. First there are the annotations, which are just a dictionary: easy to obtain, and not much else can be done with them. Then there are properties of the class that can be considered metadata, such as t_start etc. It's a small amount of work to gather all of them; a standard way to get and store them could be useful. And then there's the context: it is often more interesting which unit or segment a spike train belongs to, or what their metadata is, than the data on the SpikeTrain object itself.
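
To make those three kinds concrete, a rough sketch of gathering them from a SpikeTrain; whether the .segment and .unit back-references are populated depends on how the data was loaded, so treat those lookups as an assumption:

```python
def gather_spiketrain_metadata(st):
    """Collect annotations, own properties, and container context of a SpikeTrain."""
    meta = {
        'annotations': dict(st.annotations),   # free-form dictionary
        't_start': st.t_start,                 # properties of the object itself
        't_stop': st.t_stop,
    }
    # Context: metadata of the containers the spike train belongs to, if linked.
    if getattr(st, 'segment', None) is not None:
        meta['segment_annotations'] = dict(st.segment.annotations)
    if getattr(st, 'unit', None) is not None:
        meta['unit_annotations'] = dict(st.unit.annotations)
    return meta
```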

Supporting the context metadata without the original containers becomes complex quickly. Storing all of this also takes quite a bit of space, many times more than the average rate for example, so users might want fine-grained control over what to include. That further increases the complexity and reduces the convenience advantage compared to just doing it manually.

Then there are analyses that operate on multiple objects, possibly of different types. And the result itself can be pretty much any type as you said, so I don't see an advantage of encapsulating that part, either.

mdenker (Member) commented Aug 7, 2014

We had several discussions about this issue, and I think most here would agree that metadata management and retaining provenance information in a central container is a very difficult issue to tackle this early on. Besides the complexity of metadata on the original data objects as outlined by rproepp, the return types and the semantic meaning of these return types can vary a lot between analyses. Some of the more advanced routines will not be able to fit their data into the Neo framework, but may produce quite complex outputs. Keeping a complete trail of such information across a couple of routines chained together is very difficult, I believe. I would suggest postponing this topic until more routines have accumulated, to better estimate whether (and how) their output can be managed.

On a much lower level, though, it could be worthwhile to make it common practice to at least annotate data that is output as a Neo object with useful information from the analysis, e.g., when filtering a signal, to add the filter parameters as annotations to the resulting AnalogSignal. That would be a very small change, but already a big step forward towards better workflows.
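
A minimal sketch of that practice, assuming a SciPy Butterworth filter and made-up annotation keys; the point is only that the returned AnalogSignal carries the parameters it was produced with:

```python
import numpy as np
import quantities as pq
import neo
from scipy.signal import butter, filtfilt

def lowpass(signal, cutoff_hz, order=4):
    """Low-pass filter an AnalogSignal and record the filter parameters as annotations."""
    fs = float(signal.sampling_rate.rescale('Hz').magnitude)
    b, a = butter(order, cutoff_hz / (fs / 2.0), btype='low')
    out = neo.AnalogSignal(filtfilt(b, a, signal.magnitude, axis=0),
                           units=signal.units,
                           sampling_rate=signal.sampling_rate,
                           t_start=signal.t_start)
    out.annotations.update(signal.annotations)        # keep the original metadata
    out.annotate(filter_type='butterworth_lowpass',   # record how the signal was made
                 filter_cutoff_hz=cutoff_hz,
                 filter_order=order)
    return out

sig = neo.AnalogSignal(np.random.randn(1000, 1), units='mV',
                       sampling_rate=1.0 * pq.kHz)
print(lowpass(sig, cutoff_hz=100.0).annotations)
```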

Moritz-Alexander-Kern (Member) commented Sep 5, 2023

For provenance tracking see Alpaca (Automated Lightweight Provenance Capture):
Documentation:
https://alpaca-prov.readthedocs.io/en/latest/

Code:
https://github.com/INM-6/alpaca
