Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ambiguity about how to deal with inherited tsv contents #337

Open
tyarkoni opened this issue Sep 19, 2019 · 8 comments
Open

Ambiguity about how to deal with inherited tsv contents #337

tyarkoni opened this issue Sep 19, 2019 · 8 comments

Comments

@tyarkoni
Copy link

When discussing inheritance, the spec says this:

Any metadata file (.json, .bvec, .tsv, etc.) may be defined at any directory level, but no more than one applicable file may be defined at a given level (Example 1). The values from the top level are inherited by all lower levels unless they are overridden by a file at the lower level.

This is fine for JSON, but it's not obvious if/whether it also applies to TSV files. What's the expected interpretation here? Is it that only the first TSV file is valid, and the rest are disregarded? Or that the contents of the TSV files should be merged, with events defined at lower levels taking precedence over those at higher levels in the case of matching timing information?

The latter seems like the more sensible approach for at least events.tsv files, and I would like to preserve the ability to specify events at multiple levels (consider, e.g., a case where all participants get the same stimulus sequence, but responses are recorded separately for each run). But this requires some detail about the expected implementation... e.g., which columns define a unique event? For events.tsv, is it the combination of onset and duration, and trial_type? This should all go in the spec.

@nicholst
Copy link
Collaborator

Can't but help tag BIDS 2.0 suggestion Remove inheritance rule :)

More constructively, I concur that more clarity is needed and no intra-column merging was implied. However, it seems like it could get quickly complex. If a lower-level TSV file has a column name already defined above, it may have a different set of onset times.... do we just fill in NA's for all other rows?

Given this, I think I'm actually more in favor of Tal's first option: "the TSV file in the present level has precedence in its entity, others are disregarded". It will simplify the spec and parsing of BIDS data. This puts the onus of merging on the data-generator and not the spec.

@effigies
Copy link
Collaborator

effigies commented Sep 20, 2019

I think it's reasonable for non-time-series data to follow the rule that @nicholst proposes, but for time series, I'm not sure that this is as manageable.

I'm particularly thinking of derivatives' timeseries.tsv. At a certain point, it starts to make sense to split by type of time series to preserve some semblance of human readability. For example, we might want to have a desc-aCompCor_timeseries.tsv and a desc-motion_timeseries.tsv.

In this context, a useful constraint would be "if a column has the same name in two files, it must have the same contents", or perhaps that columns can't appear in two files. The situation I'm thinking of is desc-aCompCor_timeseries.tsv and desc-tCompCor_timeseries.tsv, where we've pre-filtered with a discrete cosine basis before calculating either, so it would make sense to have identical cosine basis columns in both files.

Perhaps this isn't a problem because we've already specified that this refers to "metadata" TSV files, and the thing to do is to explicitly identify which types of TSV are metadata. I would say there are three types:

  1. Metadata (participants, sessions, scans, phenotype/*.tsv)
  2. Events
  3. Time series (right now physio and stim, but will be more general when functional derivatives show up)

We can set the rules as appropriate for each case.

@nicholst
Copy link
Collaborator

@effigies I'm far from the derivatives discussion, so this is useful.

But your examples seem to relate to two files in the same directory, and so isn't about inheritance, right?

I agree that there might be an issue about duplicate column names in multiple files in a single directory, but isn't that a distinct discussion?

@sappelhoff
Copy link
Member

sappelhoff commented Sep 20, 2019

Given this, I think I'm actually more in favor of Tal's first option: "the TSV file in the present level has precedence in its entity, others are disregarded". It will simplify the spec and parsing of BIDS data. This puts the onus of merging on the data-generator and not the spec.

I feel like this is a pragmatic decision that avoids complicated rules about column or row merging across TSV files at different levels. --> If two TSV files are defined (one at the upper level, and one at the lower level) ... the one at the lower level has precedence for its purpose.

Perhaps this isn't a problem because we've already specified that this refers to "metadata" TSV files, and the thing to do is to explicitly identify which types of TSV are metadata. I would say there are three types:

Going via this route seems to me to be a promising alternative to the pragmatic solution above. In that case we would have to define "merging" rules for a limited set of TSV files, like suggested (participants, scans, phenotype files) [note: I don't know what you mean by "session" TSV files, Chris]

@effigies
Copy link
Collaborator

effigies commented Sep 20, 2019

@nicholst Yes, sorry, this was a bit of a broadening of the topic. I was responding to this portion of the text:

no more than one applicable file may be defined at a given level

@tyarkoni
Copy link
Author

On further thought, I think I'm now with @nicholst on this. It would greatly simplify life for everyone except the data generator to say that inheritance applies only to JSON files, and for TSVs, you only get one shot at it.

@effigies I'm kind of against having multiple metadata files at a given level apply to a file. It would be a bit of a pain to implement in a clear way. E.g., in the case of tsv files with different desc keys, the target file is presumably not going to have the same desc value. So then we'd need a rule like "tsv files apply if they match all keywords except the following...", which is kind of unappealing. I'd prefer to say that if you add multiple files with keywords that don't perfectly match the target (except for suffix and extension), you're on the hook for figuring out how to process those yourself. Software like pybids can still provide functionality to read in multiple event or time series for a given run, but I don't think the spec needs to take a stance on this.

@effigies
Copy link
Collaborator

effigies commented Sep 20, 2019

@effigies I'm kind of against having multiple metadata files at a given level apply to a file.

Same. The point is I want to distinguish between metadata TSV files, such as participants.tsv, sessions.tsv and scans.tsv, and data TSV files, such as stim, physio and timeseries. I think you should be able to have multiple of the latter.

Edit: I realize this may be off topic for this issue, and I would be glad to move it to a new issue, if we agree that this distinction exists and it's worth making clear in the spec.

@sappelhoff
Copy link
Member

sappelhoff commented Sep 23, 2019

It would greatly simplify life for everyone except the data generator to say that inheritance applies only to JSON files, and for TSVs, you only get one shot at it.

I also favor this option, but simply removing all files except JSON from the list of "inheritance-allowed-files" would be a backward incompatible change, so it's not a practical solution for our current BIDS version.

Luckily (:confused:) the inheritance principle is currently not very clear about how inheritance is dealt with for TSV files (@tyarkoni's original post), so we can now try to clarify/specify it to be as simple as possible without violating backward compatibility.

@effigies I agree that it's worth distinguishing different kinds of TSV files (meta, time-series, events, ...). A new issue for this would be nice to keep this one focused on inheritance.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants