Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENH] BEP018: Genetic information #287

Closed
wants to merge 31 commits into from
Closed

[ENH] BEP018: Genetic information #287

wants to merge 31 commits into from

Conversation

CPernet
Copy link
Collaborator

@CPernet CPernet commented Jul 31, 2019

I created the md file, and this looks fine
one issue, the json examples don't go the next line (despite spaces ??)

@franklin-feingold
Copy link
Collaborator

franklin-feingold commented Jul 31, 2019

Hi @CPernet,

Thank you for converting and opening this PR! We can work on getting the formatting passed. Travis flagged a number of these formatting issues too. These issues could have resulted in the behavior you are seeing with the json examples. I can work on opening a PR on your fork

Edit: opened to render the examples and pass Travis

Edit2: flagging @bids-standard/everyone may be interested in reviewing!

@franklin-feingold franklin-feingold changed the title genetic extension [ENH] BEP018: Genetic information Aug 1, 2019
@CPernet
Copy link
Collaborator Author

CPernet commented Aug 1, 2019

thx @franklin-feingold i think i fixed it (just pushed it in my fork)
--> atom beta whitespace package was being annoying ...
--> travis returned an issue with table ending which i did forget

Copy link
Member

@sappelhoff sappelhoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for opening the PR @CPernet - I made some comments regarding the formatting.

src/04-modality-specific-files/08-genetic-descriptor.md Outdated Show resolved Hide resolved

## dataset_description.json

Two additional keys related to the genetic data can be added. The Key `GeneticDataBase` (MANDATORY) links to the name of the database and web address. The key `GeneticDescriptor` (OPTIONAL) refers to the descriptor (e.g. journal article) of the genetic data.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest to revise this paragraph to use our words MUST(=mandatory), SHOULD(=recommended), and MAY(=optional), see rfc2119.

This would also solve the issue that the first sentence is a bit misleading, because it says "two keys can be added", which sounds like both are optional

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe GeneticDataBase could be mandatory if 'GeneticID' is present in any tsv files or if a genetic_info.json file is present.

Copy link
Collaborator Author

@CPernet CPernet Aug 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Key GeneticDataBase MUST be added to link to the name of the database and web address. The key GeneticDescriptor MAY also be present refering to the descriptor (e.g. journal article) of the genetic data.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'GeneticID' is not mandatory if the imaging repo and genetic repo use the same ID (which the validator cannot check)

src/04-modality-specific-files/08-genetic-descriptor.md Outdated Show resolved Hide resolved

## genetic_info.json

This file is the descriptor of the genetic information available either in the participant tsv file and/or the genetic database described in the dataset_description.json. The 'GeneticLevel' and 'SampleOrigin' are the only two mandatory fields.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
This file is the descriptor of the genetic information available either in the participant tsv file and/or the genetic database described in the dataset_description.json. The 'GeneticLevel' and 'SampleOrigin' are the only two mandatory fields.
This file is the descriptor of the genetic information available either in the participant tsv file and/or the genetic database described in the dataset_description.json. The `GeneticLevel` and `SampleOrigin` are the only two mandatory fields.

Use backticks to format field names

| Field name | Definition | Values |
| :----------- | :--------- | :------|
| GeneticLevel | MANDATORY Describes the level of analysis | `Genetic`, `Genomic`, `Epigenomic`, `Transcriptomic`, `Metabolomic`, or `Proteomic` |
| AnalyticalApproach | OPTIONAL Methodology used to analyse the GeneticLevel | Value must be taken from [gapsolr](https://www.ncbi.nlm.nih.gov/projects/gapsolr/facets.html) under /Study/Molecular Data Type, for instance `SNP Genotypes (Array)` or `Methylation (CpG)` |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When you give an example of a filepath in text, please format that path using backticks like: /this/is/a/path

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't got that? i used eg /Study/Molecular what's wrong

@franklin-feingold
Copy link
Collaborator

Travis is flagging some of the formating. This can be resolved after the formatting is reviewed. I can assist with that at that time

Cyril Pernet and others added 2 commits August 8, 2019 16:05
Co-Authored-By: Franklin Feingold <35307458+franklin-feingold@users.noreply.github.com>
Co-Authored-By: Stefan Appelhoff <stefan.appelhoff@mailbox.org>
@CPernet
Copy link
Collaborator Author

CPernet commented Aug 8, 2019

making changes in my fork

@CPernet CPernet closed this Aug 8, 2019
@CPernet CPernet reopened this Aug 8, 2019
@CPernet
Copy link
Collaborator Author

CPernet commented Aug 8, 2019

@franklin-feingold @sappelhoff the build has work (i think) -- the only stuff i did not do was the / in the table, since that already how this was ??

@sappelhoff
Copy link
Member

the only stuff i did not do was the / in the table, since that already how this was ??

The table fences will have to be fixed :-) could you do that please?

@franklin-feingold
Copy link
Collaborator

@CPernet should have fixed the formatting issue in my branch (https://github.com/franklin-feingold/bids-specification/blob/enh/genetics/src/04-modality-specific-files/08-genetic-descriptor.md) . My PR in your repo has some conflicts so perhaps can be easiest to grab my file and bring it over to your repo

@CPernet
Copy link
Collaborator Author

CPernet commented Aug 10, 2019

@franklin-feingold do you know why Travis failed? couldn't figure from 'details'

Copy link
Member

@sappelhoff sappelhoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@CPernet See my suggestions: Travis (=remark-lint) does not like too many empty lines :-)

Co-Authored-By: Stefan Appelhoff <stefan.appelhoff@mailbox.org>
@emdupre
Copy link
Collaborator

emdupre commented Aug 13, 2019

I do acknowledge however, that it seems to be hard to get more feedback on this BEP ... we announced it in several places but so far, no-one has chimed in.

August is a particularly hard month for feedback -- I'm writing this as I'm packing for a flight ! I think you'll be likely to see more community input in September.

All this to say, I'd be disappointed to see this merged before at least a few external members have chimed in. I've sent it to one of my colleagues working with genetic data, but they're on vacation ! So September seems safer to me.

Copy link
Collaborator

@effigies effigies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added some specific comments, but on the whole I think this could do with an edit for terminological clarity. For instance, database and dataset are used somewhat interchangeably, and we use descriptor to refer to both the entire section, a .json file and an associated publication.

Possibly it would be useful to settle on specific vocabulary, introduce it either at the start of the section or at the end, and go through to conform.

Cyril Pernet and others added 5 commits August 13, 2019 18:43
Co-Authored-By: Chris Markiewicz <effigies@gmail.com>
Co-Authored-By: Chris Markiewicz <effigies@gmail.com>
Co-Authored-By: Chris Markiewicz <effigies@gmail.com>
Co-Authored-By: Chris Markiewicz <effigies@gmail.com>
Co-Authored-By: Chris Markiewicz <effigies@gmail.com>
Copy link

@AlexandreHutton AlexandreHutton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The dataset_description.json needs to be clarified. It appears inconsistent with the given example (and other references to it later in the document).
Does this spec allow for the data to be split across multiple files (e.g. by chromosome)? Is there supposed to be a convention for naming the genetics data?
I tried to see if it would be possible to apply this spec to imaging + genetic data I can access (UK Biobank), and it's unclear whether I could make the genetics data compliant with this.

@rwblair
Copy link
Member

rwblair commented Aug 27, 2019

@CPernet Here's the start of the bids-validator PR:
bids-standard/bids-validator#828

This is checks for 'GeneticDatabase' in dataset_descrtiption.json if a genetic_info.json is present at the top level. It also add a json schema to validate genetic_info.json files.

Issues so far:

  • The dataset_descriprion validation uses the flat structure first specified in this bep, not the nested one suggested by @AlexandreHutton. Should I implement the nested structure?
  • It is checking for 'GeneticDatabase' instead of the 'GeneticDataBase' listed in the spec, I don't think we want that 3rd capitalized letter.
  • It is only looking for genetic_info.json in the root level of the project. I wanted to confirm do we want multiple genetic_info.json files to be able to appear at any level, and assume that they are associated with the closest participants.tsv in the hierarchy? If this is the case I need to spend some time looking at how to test and enforce this.

Since this conversation is getting complex I'd be happy to continue the validator talk in the validator PR.

Cyril Pernet and others added 3 commits October 24, 2019 19:59
Co-Authored-By: Stefan Appelhoff <stefan.appelhoff@mailbox.org>
Co-Authored-By: Chris Markiewicz <effigies@gmail.com>
@CPernet
Copy link
Collaborator Author

CPernet commented Oct 24, 2019

@effigies i pushed the last changes but travis fails, can't figure out why? help please

Copy link
Collaborator Author

@CPernet CPernet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still fails :-(
thx anyway

@rwblair
Copy link
Member

rwblair commented Oct 24, 2019

@CPernet I made a PR from my personal fork to your fork that might fix the problem. The github editor was too finicky for getting the spacing right.

couldn't fix alignment in the GH editor
@CPernet
Copy link
Collaborator Author

CPernet commented Oct 25, 2019

Success! thx - these trailings are so annoying

Copy link
Collaborator Author

@CPernet CPernet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sappelhoff @effigies @franklin-feingold seems ready now? needs approval
we are making examples for the repo + validator testing

@franklin-feingold
Copy link
Collaborator

sounds good! perhaps for the specification and validator/examples to stay in lock step, the associated materials (e.g., validator) can be prepared so everything continues to stay in lock?

what do you all think?

@CPernet
Copy link
Collaborator Author

CPernet commented Oct 29, 2019

yes, since I PR to merge to master - better have the validator tested, good point

@effigies
Copy link
Collaborator

I've merged master, pushed this to the bep018 and added a ReadTheDocs build so its rendering can be seen: https://bids-specification.readthedocs.io/en/bep018/

@CPernet If it's okay with you, could we switch to that branch for the BEP018 PR? It will make making suggestions easier.

@effigies
Copy link
Collaborator

Closing in favor of #395.

@effigies effigies closed this Jan 25, 2020
@Remi-Gau Remi-Gau added the BEP label Jul 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants