Standardize .bidsignore #131

yarikoptic · 2019-01-22T19:40:49Z

ATM bids-validator supports .bidsignore file and describes it as

.bidsignore

Optionally one can include a .bidsignore file in the root of the dataset. This file lists patterns (compatible
with the .gitignore syntax) defining files that should be ignored by the
validator. This option is useful when the validated dataset includes file types not yet supported by BIDS specification.

*_not_bids.txt
extra_data/

while also hardcoding some additional ignores:

    .add('.*')
    .add('!*.icloud')
    .add('/derivatives')
    .add('/sourcedata')
    .add('/code')

In the Python land of pybids there is a constant battle to hardcode various additional ignores, see e.g. bids-standard/pybids#277 (comment) and a common consensus is that supporting .bidsignore would be the way to go forward. For that to happen properly, and to not have varying/possibly conflicting support of .bidsignore file across software toolkits, it is desired to describe that in BIDS specification document somewhere, e.g. a new 99-appendices/09-software-support.html, stating our expectations for that file syntax. I see two choices

unification with .gitsignore - sounds like a nice idea, but I am afraid it might be also a fragile one unless we start use git itself, i.e. git-check-ignore (with hard-dependency on git, which I am not) since through time Git might introduce/change .gitignore specification. ATM (if I got it right) bids-validator relies on https://www.npmjs.com/package/ignore to provide JS implementation. For pybids we could probably find some Python implementation, but so far the ones I found are too adhoc and/or old/abandoned (e.g. https://github.com/snark/ignorance), so calling out to git-check-ignore sounds like a best idea
prescribed subset of .gitignore - we clearly describe what patterns are supported etc.

What do you think the @bids-standard/everyone ?

The text was updated successfully, but these errors were encountered:

nicholst · 2019-01-23T08:43:42Z

Seems sensible... like the use of .gitignore syntax and not unifying with .gitignore itself.

sappelhoff · 2019-01-23T09:30:21Z

it is desired to describe that in BIDS specification document somewhere, e.g. a new 99-appendices/09-software-support.html, stating our expectations for that file syntax.

+1

though as long as we describe our ignore syntax explicitly it is IMO not strictly necessary to have it as a subset of .gitignore, that is, a few additional ignore rules should be fine even if they are not supported by .gitignore (all given that there are good reasons)

robertoostenveld · 2019-01-23T09:52:24Z

You write

support of .bidsignore file across software toolkits

I have a conceptual issue there... but perhaps I am just stating the obvious.

with .gitignore I know that it is git (the application) that is reading it and subsequently ignoring certain files. With .bidsignore it is not explicit which application should ignore the files. I expect bids-validator to ignore it (and know that expectation to be met). But if have a .bidsignore in my own BIDS dataset, I don't expect my own pipelines to be ignoring the listed files. Most likely I put the to-be-ignored datafiles there just because my application/pipeline needs them.

For me .bidsignore conceptually corresponds to the "diff" between the file list that the one application (bids-validator) expects, and the file list that the other application (my pipeline) expects. That works in the specific case, but fails to scale if three applications are involved. I also don't see how it would scale to arbitrary applications.

Can we agree on .bidsignore only to be read and interpreted by the bids-validator? Or more precisely, that it is the list of files (and directories) to ignore to make the dataset bids compliant. Compliance testing is currently implemented in the nodejs bids-validator app, but in the future might be checked by other toolkits.

Also note that if I were to release my application/pipeline as "myapp", and if it would need to ignore files, I might want those be listed in a .myappignore. As such it would fall outside the bids-specification (since future apps cannot be part of the current specification). The solution to keep the dataset bids-compliant would then be to include .myappignore in the .bidsignore.

edickie · 2019-01-23T13:42:23Z

My assumption would be that .bidsignore would be used by the bids-validator and pybids (the python package that many python-based bids-apps are using for parsing the BIDS folder structure). Are there any others?

…

On Wed, Jan 23, 2019 at 4:52 AM Robert Oostenveld ***@***.***> wrote: You write support of .bidsignore file across software toolkits I have a conceptual issue there... but perhaps I am just stating the obvious. with .gitignore I know that it is git (the application) that is reading it and subsequently ignoring certain files. With .bidsignore it is not explicit which application should ignore the files. I expect bids-validator to ignore it (and know that expectation to be met). But if have a .bidsignore in my own BIDS dataset, I don't expect my own pipelines to be ignoring the listed files. Most likely I put the to-be-ignored datafiles there just *because* my application/pipeline needs them. For me .bidsignore conceptually corresponds to the "diff" between the file list that the one application (bids-validator) expects, and the file list that the other application (my pipeline) expects. That works in the specific case, but fails to scale if *three* applications are involved. I also don't see how it would scale to arbitrary applications. Can we agree on .bidsignore only to be read and interpreted by the bids-validator? Or more precisely, that it is the list of files (and directories) to ignore to make the dataset bids compliant. Compliance testing is currently implemented in the nodejs bids-validator app, but in the future might be checked by other toolkits. Also note that if I were to release my application/pipeline as "myapp", and if it would need to ignore files, I might want those be listed in a .myappignore. As such it would fall outside the bids-specification (since future apps cannot be part of the current specification). The solution to keep the dataset bids-compliant would then be to include .myappignore in the .bidsignore. — You are receiving this because you are on a team that was mentioned. Reply to this email directly, view it on GitHub <#131 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AMl59qyQkdVQDLa-rw9ROFQ14_xgt6C1ks5vGDDbgaJpZM4aNYRV> .

jasmainak · 2019-01-23T13:49:21Z

I agree with @robertoostenveld . However, regarding this:

Compliance testing is currently implemented in the nodejs bids-validator app, but in the future might be checked by other toolkits.

Do you mean that these toolkits are completely equivalent implementations? That is, for the same input they produce the same validation output. Otherwise, I can imagine a scenario where one dataset would pass on one validator but not on the other. Thus, I think there can be only one validator ...

yarikoptic · 2019-01-23T16:48:14Z

Can we agree on .bidsignore only to be read and interpreted by the bids-validator?

I would say - "No", because it is is not a .bids-validatorignore.
But indeed software-specifics is a valid concern, but it is not for the "dataset producer" to code up all the ignores for all possible software, so I would say there should be a way to provide generic ignores (e.g. supplementals, environments etc, which are present but not part of BIDS and/or just shouldn't be considered -- may be even some bad sub-* directories) and then software-specific ignores. But if it is app specific -- it is for the app to take care about ignoring some to start with. E.g. bids-validator already hard-codes application specific ignore - derivatives since it doesn't support them (for now at least). And that is why indeed generally speaking derivatives/ shouldn't be in the .bidsignore.
If there is some other use case where one tool (flexibly) should ignore some directories which are generally available in the BIDS (any specific usecase @robertoostenveld?), I see two ways to proceed;

if we decide to go for a "custom" format for .bidsignore -- then we could allow for annotation of entries with a list of software to take them into account (somewhat messy, don't like it)
in addition ot .bidsignore -- add that .myappignore or .bidsignore-myapp (myapp could be bids-validator or pybids) and corresponding software would read it (pybids would probably then just allow to pass some "software" name instead of pybids since pybids largely is a library so it might be for a higher sitting software to make decisions on what to ignore)

jasmainak · 2019-01-23T17:07:19Z

wouldn't you say that this requires a fix in the validator rather than bending the standard to make the validator happy? Perhaps a command line flag like --ignore-derivatives or so might do the trick.

Then you can have .pybidsignore for files to be ignored by pybids. For me .bidsignore should mean what you call .bids-validatorignore because you have no standard without the validator.

effigies · 2019-01-23T17:46:06Z

This is starting to get complicated, IMO, which worries me. I think .bidsignore is a decent idea, and that its interpretation is "I know about these non-BIDS-conformant files. Don't throw errors." It is then up to the app developer what else to do about these files.

With bids-validator, that's easy: Its whole purpose in life is to throw errors.

With pybids, it's a little less straightforward, as I may mean "Do not index" or "Index, but don't yell about non-standard entities". I think we have strictness modes in bids.BIDSLayout already, so I think that's fine. If an app developer needs some other interpretation of .bidsignore, that can be handled in pybids. Further, pybids allows custom entities to be defined by app developers, which is going to be necessary as derivatives cannot encompass all possible data descriptions.

But I really don't like the idea of a .pybidsignore vs .bids-validatorignore, selecting different subsets of the dataset to make invisible to different tools. It's either valid BIDS or it's not, and then how to work with non-standard files is a separate question. It's also no longer describing the data but a dataset-dependent config file for an app. That's a bit concerning, and to the extent that this is desirable, I think it is more appropriately handled by the as-yet-nonexistent BIDS Execution configuration.

That said, there's nothing stopping you from adding .*ignore to your .bidsignore and making 20 different .XYZignore files.

satra · 2019-01-23T18:20:10Z

@yarikoptic - but .bidsignore was introduced to the bids spec to support validation for bids. it was not introduced to support any other application other than to say "ignore these files when determining if this a bids dataset".

and to reduce output noise from the validator just as .git and others do to reduce noise on their output.

so it's tied to the spec and hence to validation of the dataset.

yarikoptic · 2019-01-23T20:20:49Z

@yarikoptic - but .bidsignore was introduced to the bids spec ...

Was it?

yarikoptic · 2019-01-23T20:26:46Z

@effigies et al. I agree that we should stop talking about separate application specifics here and concentrate on the .bidsignore to get described in the spec as to prescribe files/directories which must not be considered as far as we are talking about BIDS (even if conform bids, possibly in a subdirectory structure) - queries, manipulations, etc. So mentally really the same what .gitignore does for git.

effigies · 2019-02-06T15:17:01Z

Just to return to this, I want to copy in the .gitignore conventions so we can directly discuss them. I think they're generally reasonable, though I don't see much value in the ! rules. The rest can be mostly summarized as fnmatch + ** globs.

And my overall position is I would rather have explicit rules than a dependency on git, which could potentially (though is unlikely to) be a moving target.

PATTERN FORMAT

A blank line matches no files, so it can serve as a separator for readability.

A line starting with # serves as a comment. Put a backslash ("\") in front of the first hash for patterns that begin with a hash.

Trailing spaces are ignored unless they are quoted with backslash ("\").

An optional prefix "!" which negates the pattern; any matching file excluded by a previous pattern will become included again. It is not possible to re-include a file if a parent directory of that file is excluded. Git doesn’t list excluded directories for performance reasons, so any patterns on contained files have no effect, no matter where they are defined. Put a backslash ("\") in front of the first "!" for patterns that begin with a literal "!", for example, "\!important!.txt".

If the pattern ends with a slash, it is removed for the purpose of the following description, but it would only find a match with a directory. In other words, foo/ will match a directory foo and paths underneath it, but will not match a regular file or a symbolic link foo (this is consistent with the way how pathspec works in general in Git).

If the pattern does not contain a slash /, Git treats it as a shell glob pattern and checks for a match against the pathname relative to the location of the .gitignore file (relative to the toplevel of the work tree if not from a .gitignore file).

Otherwise, Git treats the pattern as a shell glob: "*" matches anything except "/", "?" matches any one character except "/" and "[]" matches one character in a selected range. See fnmatch(3) and the FNM_PATHNAME flag for a more detailed description.

I copied that description here:

The name fnmatch() is intended to imply filename match, rather than pathname match. The default action of this function is to match filenames, rather than pathnames, since it gives no special significance to the slash character. With the FNM_PATHNAME flag, fnmatch() does match pathnames, but without tilde expansion, parameter expansion, or special treatment for a period at the beginning of a filename.

A leading slash matches the beginning of the pathname. For example, "/*.c" matches "cat-file.c" but not "mozilla-sha1/sha1.c".

Two consecutive asterisks ("**") in patterns matched against full pathname may have special meaning:

A leading "**" followed by a slash means match in all directories. For example, "**/foo" matches file or directory "foo" anywhere, the same as pattern "foo". "**/foo/bar" matches file or directory "bar" anywhere that is directly under directory "foo".

A trailing "/**" matches everything inside. For example, "abc/**" matches all files inside directory "abc", relative to the location of the .gitignore file, with infinite depth.

A slash followed by two consecutive asterisks then a slash matches zero or more directories. For example, "a/**/b" matches "a/b", "a/x/b", "a/x/y/b" and so on.

Other consecutive asterisks are considered regular asterisks and will match according to the previous rules.

tyarkoni · 2019-03-21T18:29:28Z

I thought I commented on this earlier, but I guess not. Just to add my two cents: I feel pretty strongly that .bidsignore handling should live in BIDS-Validator and not in PyBIDS. I'm happy to pass the path to the .bidsignore file (or even its contents) to the validator at initialization (probably necessary, since the validator currently operates only on a file-by-file basis), but there's no reason PyBIDS should have to be aware of .bidsignore at all. Putting handling in the validator ensures that other Python packages can guarantee proper .bidsignore processing without having to reimplement it themselves.

yarikoptic · 2019-03-21T19:10:18Z

@tyarkoni Could you please elaborate a bit more since I am not sure I am following.

I'm happy to pass the path to the .bidsignore file (or even its contents) to the validator

Are you suggesting that PyBIDS could/should use bids-validator (the alien nodejs utility) to provide it (PyBIDS) with the list of ignored (or the opposite - legit) files, so that PyBIDS could then use that information to filter its results?

probably necessary, since the validator currently operates only on a file-by-file basis

it validates consistency of lengths for the same func/ file across subjects/runs - so clearly not just independently validates files in isolation.

but there's no reason PyBIDS should have to be aware of .bidsignore at all.

E.g. if I have a file (or a subtree of directories, e.g. myworkdir/) which is not a conventional BIDS, to pacify bids-validator I will add an entry to .bidsignore. It would stop whining and I get happy. Then I start using PyBIDS, and it would return me all the hits for the files, even the ones I .bidsignore'd (e.g. under myworkdir/). Is that what is intended?

Putting handling in the validator ensures that other Python packages can guarantee proper .bidsignore processing without having to reimplement it themselves.

How would that happen? validator would just ignore them, while Python packages would have no clue they were ignored (unless the idea for them to talk to validator)?

tyarkoni · 2019-03-21T19:45:40Z

Are you suggesting that PyBIDS could/should use bids-validator (the alien nodejs utility) to provide it (PyBIDS) with the list of ignored (or the opposite - legit) files, so that PyBIDS could then use that information to filter its results?

Each BIDSLayout internally initializes a BIDSValidator. Thereafter, file validation is done by calling that validator. I'm proposing doing something like BIDSValidator(bidsignore='/path/to/.bidsignore', ...). The BIDSValidator would be in charge of parsing that file and updating its internal filters for subsequent is_bids() calls.

it validates consistency of lengths for the same func/ file across subjects/runs - so clearly not just independently validates files in isolation.

Well, the JavaScript validator may be doing this... I don't believe the Python validator is (which is a problem that probably needs to be addressed separately).

E.g. if I have a file (or a subtree of directories, e.g. myworkdir/) which is not a conventional BIDS, to pacify bids-validator I will add an entry to .bidsignore. It would stop whining and I get happy. Then I start using PyBIDS, and it would return me all the hits for the files, even the ones I .bidsignore'd (e.g. under myworkdir/). Is that what is intended?

The BIDSLayout has ignore and force_index arguments that take precedence over everything else. So you can always make pybids do what you like. But I definitely don't think we should have separate .bidsignore and .pybidsignore files!

How would that happen? validator would just ignore them, while Python packages would have no clue they were ignored (unless the idea for them to talk to validator)?

On this proposal, the BIDSValidator would take a bidsignore argument. The BIDSValidator is currently unaware of BIDS projects (as I said earlier, it receives files one at a time). So it would fall to each tool to decide whether or not to pass the .bidsignore file to the validator.

tyarkoni · 2019-03-21T19:48:10Z

On re-read, @yarikoptic, I think what you may be missing is that the Python validator doesn't work like the JS validator. It doesn't do a single pass through the dataset and return problems; it currently validates each file on an ad-hoc basis. We definitely need to implement layout-level checking too (and that should probably live in the BIDSValidator as a method that takes a BIDSLayout as an argument), but otherwise the way it's set up is pretty helpful in terms of interaction with other packages (including pybids).

yarikoptic mentioned this issue Jan 22, 2019

[ENH]: exclude common non-BIDS data sources + evade potential named tuple problem bids-standard/pybids#277

Closed

effigies mentioned this issue Feb 6, 2019

How to handle file exclusions and inclusions in 0.8? bids-standard/pybids#378

Closed

robertoostenveld mentioned this issue Jul 4, 2019

[FIX] Clarify paragraph about custom data types #264

Merged

emdupre mentioned this issue Feb 10, 2020

select which T1 to use when multiple _T1w.nii.gz are present nipreps/smriprep#104

Closed

sappelhoff mentioned this issue Jul 6, 2020

[MRG] ENH: Ignore entities from derivatives subfolder mne-tools/mne-bids#470

Merged

7 tasks

shnizzedy mentioned this issue Dec 17, 2020

✨ Activate .bidsignore in C-PAC FCP-INDI/C-PAC#1415

Open

yarikoptic mentioned this issue Jun 17, 2021

Create/populate .bidsignore with .git (if datalad mode?) and .heudiconv nipy/heudiconv#515

Open

SamGuay mentioned this issue Oct 31, 2023

Add a flag to ignore specific runs for some subjects bids-apps/freesurfer#82

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Standardize .bidsignore #131

Standardize .bidsignore #131

yarikoptic commented Jan 22, 2019

nicholst commented Jan 23, 2019

sappelhoff commented Jan 23, 2019

robertoostenveld commented Jan 23, 2019

edickie commented Jan 23, 2019 via email

jasmainak commented Jan 23, 2019

yarikoptic commented Jan 23, 2019

jasmainak commented Jan 23, 2019

effigies commented Jan 23, 2019 •

edited

satra commented Jan 23, 2019

yarikoptic commented Jan 23, 2019

yarikoptic commented Jan 23, 2019

effigies commented Feb 6, 2019 •

edited

PATTERN FORMAT

tyarkoni commented Mar 21, 2019

yarikoptic commented Mar 21, 2019

tyarkoni commented Mar 21, 2019

tyarkoni commented Mar 21, 2019

Standardize .bidsignore #131

Standardize .bidsignore #131

Comments

yarikoptic commented Jan 22, 2019

.bidsignore

nicholst commented Jan 23, 2019

sappelhoff commented Jan 23, 2019

robertoostenveld commented Jan 23, 2019

edickie commented Jan 23, 2019 via email

jasmainak commented Jan 23, 2019

yarikoptic commented Jan 23, 2019

jasmainak commented Jan 23, 2019

effigies commented Jan 23, 2019 • edited

satra commented Jan 23, 2019

yarikoptic commented Jan 23, 2019

yarikoptic commented Jan 23, 2019

effigies commented Feb 6, 2019 • edited

PATTERN FORMAT

tyarkoni commented Mar 21, 2019

yarikoptic commented Mar 21, 2019

tyarkoni commented Mar 21, 2019

tyarkoni commented Mar 21, 2019

effigies commented Jan 23, 2019 •

edited

effigies commented Feb 6, 2019 •

edited