Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENH] BEP031 - New entity: sample and samples.tsv file #812

Merged
merged 9 commits into from
Jul 26, 2021

Conversation

mariehbourget
Copy link
Collaborator

@mariehbourget mariehbourget commented May 31, 2021

Dear BIDS community,

Context

As part of the development of the Microscopy BEP (BEP031 @mariehbourget, @jcohenadad) and Animal Ephys BEP (BEP032 @SylvainTakerkart, @JuliaSprenger), the “sample” entity was introduced in order to distinguish different tissue samples from the same subject.

In brief, the scope of the BEP031 proposal is to extend the Brain Imaging Data Structure (BIDS) specification to 2D and 3D microscopy images of histological samples, both ex vivo and in vivo. In this context, we were advised to split the BEP into smaller separate PRs when possible.

Contribution

The purpose of this PR is to add a new “sample” entity and samples.tsv/json files with required column (sample_type) and recommended columns (pathology, derived_from).

The "sample" entity is described as:

A sample pertaining to a subject such as tissue, primary cell or cell-free sample.
The `sample-<label>` key/value pair is used to distinguish between different samples from the same subject.
The label MUST be unique per subject and is RECOMMENDED to be unique throughout the dataset.

See issue #779 for related discussions on this topic.
Related PR #816 for new columns in participants.tsv file.

@satra
Copy link
Collaborator

satra commented May 31, 2021

@mariehbourget - shouldn't this PR also describe a samples.tsv file providing additional details about a sample?

@sappelhoff sappelhoff added the BEP label Jun 1, 2021
@mariehbourget
Copy link
Collaborator Author

@mariehbourget - shouldn't this PR also describe a samples.tsv file providing additional details about a sample?

Hi @satra,
From previous discussions in the BEP031 meeting and with the maintainers, and from the comments in #779 (here, here and here), my understanding was that dealing with the sample entity and the samples.tsv file in different PRs was preferred. So the plan is to open 2 separate PRs: 1) sample entity 2) participants.tsv and samples.tsv.

@satra
Copy link
Collaborator

satra commented Jun 1, 2021

they would likely have to be merged at the same time, so perhaps doing it together instead of linking would work better for review. @bids-standard/maintainers thoughts?

@mariehbourget
Copy link
Collaborator Author

they would likely have to be merged at the same time, so perhaps doing it together instead of linking would work better for review. @bids-standard/maintainers thoughts?

I suggest to add the samples.tsv file description in this PR, and open another PR for the additional recommended columns in the participants.tsv file.

@effigies
Copy link
Collaborator

effigies commented Jun 2, 2021

Sounds good to me.

@mariehbourget mariehbourget changed the title [ENH] BEP031 - New entity: sample [ENH] BEP031 - New entity: sample and samples.tsv file Jun 4, 2021
@mariehbourget
Copy link
Collaborator Author

Hi everyone!
As discussed, this PR now includes the description of the new samples.tsv file.
A new PR #816 is also open for the addition of recommended/optional columns in participants.tsv.

@satra
Copy link
Collaborator

satra commented Jun 5, 2021

@mariehbourget - this looks good. just a few clarification comments.

@Remi-Gau
Copy link
Collaborator

Remi-Gau commented Jun 5, 2021

Couple of open questions that relate more to where those changes should go rather than about their content:

Currently those changes read to me like the sample entity will be made "available" for ALL modalities.


Describing it in the common principles puts it at the same level as the other entities that are described there and that are pretty much available to all modalities (subject, session, run). Though this is not the case for event so there is a precedent although it is a suffix not an entity.

Should we possibly only have the entity defined only the in the entity table? Or do we feel that sample is a concept important enough that it deserves to be defined in the common principle?

I think a good case can be made for the latter but I just want to make sure we are on the same page.


Similarly putting the description of the samples.tsv in the modality agnostic section feels strange. Sample will be used for microscopy and ephys. Therefore samples.tsv is not like the scans.tsv or sessions.tsv (the latter is being transferred into the modality agnostic section in another PR).

So maybe it should go somewhere else?

What has been done so far is to duplicate, into their respective page, a description of the file: for example with the electrodes.tsv and channels.tsv files in EEG, MEG, iEEG (also because they might have slightly different requirements). See also my regular mentions of the details about the task entity being repeated across several pages. #IAmABrokenRecord

As a user or reader of the spec this is usually fine because this concentrates all the info about a modality in a single page. As maintainer I am not a big fan of duplication.

The other problem is that we don't even have a microscopy or ephys page so far to put this in.

I would not be against the creation of a new page where we can put information about things that are shared by some but not all modalities: this could definitely help with "refactoring".

That latter point would probably deserves its own issue and PR, so feel free to disregard this for this PR.

Copy link
Collaborator

@effigies effigies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some quick thoughts. I'll try to have a more detailed read this week.

src/03-modality-agnostic-files.md Outdated Show resolved Hide resolved
src/03-modality-agnostic-files.md Show resolved Hide resolved
src/03-modality-agnostic-files.md Outdated Show resolved Hide resolved
src/03-modality-agnostic-files.md Outdated Show resolved Hide resolved
@mariehbourget
Copy link
Collaborator Author

mariehbourget commented Jun 7, 2021

@Remi-Gau, thank you for the feedback about where those changes should go, those are important questions and I’m glad to discuss them here.

Should we possibly only have the entity defined only the in the entity table? Or do we feel that sample is a concept important enough that it deserves to be defined in the common principle?

The rationale for this was twofold:

  1. It was discussed in (Add sample entity and samples.tsv file #779 (comment)) and in meetings that sample could eventually be used by modalities other than microscopy/ephys. For example, sample could be scans from different body part of the same subject in anat MRI.
  2. Moreover, sample is closely related to subject because it identifies “what” is scanned rather than “how” (e.g.: acq, task, ce). To preserve the BIDS subject definition, we added the sample entity. When sample is used, it is the combination of sub-id + sample-id that serves as the “unique identifier” for the specimen scanned.

On the other hand, since no modality is using the sample entity yet in the spec, we could add the entity in the appendix list, and postpone its integration to the common principles for when microscopy is merged.

Similarly putting the description of the samples.tsv in the modality agnostic section feels strange. Sample will be used for microscopy and ephys. Therefore samples.tsv is not like the scans.tsv or sessions.tsv

For the same reason as above, I put samples.tsv in the modality agnostic files section. Another particularity is that the samples.tsv file is global and sits at the root of the dataset (unlike channels.tsv or electrodes.tsv).

Finally, as you mentioned, without an official page for microscopy yet, that information has nowhere to go! I am not against creating a page for microscopy with only the sample description and samples.tsv file for now. That page could eventually become the official microscopy page and the samples info could be moved to the page “about things that are shared by some but not all modalities” you suggested.

So I’m open to make changes to facilitate the integration and maintenance of the specs but I'm not sure what is best, let me know your thoughts. Thanks!

@SylvainTakerkart
Copy link

For the same reason as above, I put samples.tsv in the modality agnostic files section. Another particularity is that the samples.tsv file is global and sits at the root of the dataset (unlike channels.tsv or electrodes.tsv).

+1 on listing the samples.tsv file in the modality-agnostic files section ; on top of @mariehbourget 's arguments, I think any future modality that can be recorded in animal models will probably use this file...

Co-authored-by: Chris Markiewicz <effigies@gmail.com>
@mariehbourget
Copy link
Collaborator Author

Thank you @effigies and @Remi-Gau for your feedback, I left a few suggestions and questions regarding the validation and BIDS-compliance, let me know what you think and I’ll make the appropriate changes to the PR. Thanks!

@mariehbourget
Copy link
Collaborator Author

Hi @bids-standard/maintainers,
The doc build failed after my last change 2 weeks ago. I’m not sure why this error occurs, I can build the doc locally with mkdocs serve. I was wondering if there is a way that I can re-trigger the build of the doc or correct the error in the PR? Thanks!

@sappelhoff
Copy link
Member

yes, you can merge master into your branch, these issues have been fixed on there already. Or you can rebase your changes on top of master and force push.

Look at this screenshot:

image

you are 37 commits behind master, that should be removed so that you are even with master

Copy link
Collaborator

@effigies effigies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Copy link
Collaborator

@yarikoptic yarikoptic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks everyone who worked on this BEP -- looks great. My only concern is a bit of "special casing" samples as to be promoted to the top level, although it is "per-subject" level description.

FWIW: Following my "generalization blurb" at #751 (comment) , I still think that it might have been more consistent if as long as we do not have sample-<sample_id>/ top level directories, to have sub-<label>/sub-<label>_samples.{tsv,json} (and with inheritance principle do allow for samples.json on top) analogous to _scans.{tsv,json} we have.

But if, as proposed in this PR, subject_id column is retained in such *samples.tsv at any level -- we "kinda have it", just that this PR promotes it to the top level via inheritance principle, so no conflict for the future if we do decide to support sub-<label>/sub-<label>_samples.{tsv,json} etc. Thus -- approve! ;)

@effigies
Copy link
Collaborator

@yarikoptic I initially (#779 (comment)) preferred using a per-subject table or the for a global table that applied to all via the inheritance principle, but the BEP working group argued that a global table would need to be recomputed by every tool, so dropped the objection (#779 (comment)).

@mariehbourget
Copy link
Collaborator Author

Thank you everyone for the reviews!
I’ll tag @SylvainTakerkart and @lazaral here if they want to take a look at the latest changes.

Also, this is our first PR for BEP031, are there other things we have to do before the merge? Thanks!

@lazaral
Copy link

lazaral commented Jul 18, 2021

This looks great to me, and very much in line with the discussions of the BEP031 working group. Thank you @mariehbourget and all the BIDS maintainers who helped with it. Exciting to see the first PR from BEP031 is about to come to life!

@effigies effigies merged commit 1323f23 into bids-standard:master Jul 26, 2021
@CPernet
Copy link
Collaborator

CPernet commented Jul 27, 2021

@mariehbourget @jcohenadad @SylvainTakerkart @JuliaSprenger @satra @Remi-Gau @sappelhoff is there any overlap with https://bids-specification.readthedocs.io/en/stable/04-modality-specific-files/08-genetic-descriptor.html

SampleOrigin: Describes from which tissue the genetic information was extracted
TissueOrigin: Describes the type of tissue analyzed for SampleOrigin value 'brain'
BrainLocation: Refers to the location in space of the TissueOrigin
CellType: Describes the type of cell analyzed

@mariehbourget
Copy link
Collaborator Author

is there any overlap with https://bids-specification.readthedocs.io/en/stable/04-modality-specific-files/08-genetic-descriptor.html

Hi @CPernet,
I double-checked the genetic descriptor part of the spec and I don’t see any overlaps.
If I understand correctly, the genetic information in genetic_info.json applies to all participants, whereas the sample-<label> entity from this PR is meant to distinguish between different samples from the same participant.

For example, a participant sub-01 could have 2 tissue samples extracted (sample-01 and sample-02). Each sample would be imaged by a microscope and the sample-<label> is used in the filename of those images, ex: sub-01_sample-01_TEM.png, sub-01_sample-02_TEM.png.

The samples.tsv file is meant to list all the different samples present in the dataset with their attributes. In particular, the column sample_type which describe the type of sample (ex: tissue, primary cell, cell-free sample). This is similar to the CellType field in genetic_info.json but does not directly overlap as it is an attribute of the samples that are imaged and not the cells used for genetic analysis.

There are no suggested columns in samples.tsv similar to SampleOrigin, TissueOrigin or BrainLocation from genetic_info.json. In the Microscopy BEP, we use the metadata fields BodyPart and BodyPartDetails to describe the tissue location (PET also uses BodyPart). However, those fields will be included in the JSON sidecar files associated to the image files (for each sample of each participant).

Please let me know if that answer your question or if your have any other concerns.

@CPernet
Copy link
Collaborator

CPernet commented Jul 28, 2021

ok cool - I just wanted to make sure the names and possibly ontologies to pick information from are in agreement (and by that I mean we can amend genetic if that's easier, re cell type for instance) - thx

@effigies
Copy link
Collaborator

Definitely a good thing to keep genetics concepts in mind as we add microscopy and electrophysiology. Thanks for the nudge, @CPernet!

@satra
Copy link
Collaborator

satra commented Jul 28, 2021

i think the sample stuff can be pulled over into the genetic side too, or the columns of samples.tsv augmented with some of the keys. spatial proteomics and transcriptomics across tissues/samples from humans and others are going to share a bunch of things.

@mariehbourget mariehbourget mentioned this pull request Sep 16, 2021
8 tasks
@mariehbourget mariehbourget mentioned this pull request Oct 6, 2021
2 tasks
@tsalo tsalo mentioned this pull request Dec 7, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants