[ENH] BEP018: Genetic information #287

Closed
wants to merge 31 commits into from

@CPernet
Collaborator

CPernet commented Jul 31, 2019

I created the md file, and this looks fine
One issue: the JSON examples don't go to the next line (despite spaces ??)

franklin-feingold and others added 6 commits Jul 25, 2019
update 7/25
updates
cyril.pernet@ed.ac.uk
@franklin-feingold

Collaborator

franklin-feingold commented Jul 31, 2019

Hi @CPernet,

Thank you for converting and opening this PR! We can work on getting the formatting passed. Travis flagged a number of these formatting issues too. These issues could have resulted in the behavior you are seeing with the json examples. I can work on opening a PR on your fork

Edit: opened to render the examples and pass Travis

Edit2: flagging @bids-standard/everyone may be interested in reviewing!

@franklin-feingold changed the title from "genetic extension" to "[ENH] BEP018: Genetic information" on Aug 1, 2019
franklin-feingold and others added 3 commits Aug 1, 2019
BEP018 update
@CPernet

Collaborator Author

CPernet commented Aug 1, 2019

thx @franklin-feingold i think i fixed it (just pushed it in my fork)
--> atom beta whitespace package was being annoying ...
--> travis returned an issue with table ending which i did forget

Collaborator

sappelhoff left a comment

Thanks for opening the PR @CPernet - I made some comments regarding the formatting.


## dataset_description.json

Two additional keys related to the genetic data can be added. The Key `GeneticDataBase` (MANDATORY) links to the name of the database and web address. The key `GeneticDescriptor` (OPTIONAL) refers to the descriptor (e.g. journal article) of the genetic data.

@sappelhoff

sappelhoff Aug 1, 2019

Collaborator

I suggest revising this paragraph to use our keywords MUST (= mandatory), SHOULD (= recommended), and MAY (= optional); see RFC 2119.

This would also solve the issue that the first sentence is a bit misleading, because it says "two keys can be added", which sounds like both are optional

@rwblair

rwblair Aug 1, 2019

Member

Maybe GeneticDataBase could be mandatory if 'GeneticID' is present in any tsv files or if a genetic_info.json file is present.

@CPernet

CPernet Aug 8, 2019

Author Collaborator

The key GeneticDataBase MUST be added to link to the name of the database and web address. The key GeneticDescriptor MAY also be present, referring to the descriptor (e.g. journal article) of the genetic data.

@CPernet

CPernet Aug 8, 2019

Author Collaborator

'GeneticID' is not mandatory if the imaging repo and genetic repo use the same ID (which the validator cannot check)
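
As a rough illustration of the requirement levels discussed in this thread, the following Python sketch checks a `dataset_description.json` payload for the proposed keys. The function name `check_genetic_keys`, the flat string values, and every URL and name in the example are assumptions made for illustration; the thread does not settle the exact value structure, and this is not the validator's implementation.

```python
import json

# Requirement level proposed in the thread: GeneticDataBase MUST be present.
# GeneticDescriptor MAY be present, so its absence is never reported.
REQUIRED_KEYS = ("GeneticDataBase",)


def check_genetic_keys(description: dict) -> list[str]:
    """Return a list of problems; an empty list means nothing was flagged."""
    problems = []
    for key in REQUIRED_KEYS:
        if key not in description:
            problems.append(f"missing required key: {key}")
    return problems


# Hypothetical dataset_description.json content; all values are placeholders.
example = json.loads("""
{
  "Name": "Example dataset",
  "BIDSVersion": "1.2.0",
  "GeneticDataBase": "Example database, https://example.org/genetics",
  "GeneticDescriptor": "https://example.org/descriptor-article"
}
""")

print(check_genetic_keys(example))
print(check_genetic_keys({"Name": "No genetics"}))
```

Whether the key should only be required when genetic files are present, as @rwblair suggests, would need an extra condition on top of this check.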


## genetic_info.json

This file is the descriptor of the genetic information available either in the participant tsv file and/or the genetic database described in the dataset_description.json. The 'GeneticLevel' and 'SampleOrigin' are the only two mandatory fields.

@sappelhoff

sappelhoff Aug 1, 2019

Collaborator
Suggested change
This file is the descriptor of the genetic information available either in the participant tsv file and/or the genetic database described in the dataset_description.json. The 'GeneticLevel' and 'SampleOrigin' are the only two mandatory fields.
This file is the descriptor of the genetic information available either in the participant tsv file and/or the genetic database described in the dataset_description.json. The `GeneticLevel` and `SampleOrigin` are the only two mandatory fields.

Use backticks to format field names

| Field name | Definition | Values |
| :----------- | :--------- | :------|
| GeneticLevel | MANDATORY Describes the level of analysis | `Genetic`, `Genomic`, `Epigenomic`, `Transcriptomic`, `Metabolomic`, or `Proteomic` |
| AnalyticalApproach | OPTIONAL Methodology used to analyse the GeneticLevel | Value must be taken from [gapsolr](https://www.ncbi.nlm.nih.gov/projects/gapsolr/facets.html) under /Study/Molecular Data Type, for instance `SNP Genotypes (Array)` or `Methylation (CpG)` |

@sappelhoff

sappelhoff Aug 1, 2019

Collaborator

When you give an example of a filepath in text, please format that path using backticks like: `/this/is/a/path`

@CPernet

CPernet Aug 8, 2019

Author Collaborator

I don't get that. I used e.g. /Study/Molecular; what's wrong?
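
The field rules quoted above for `genetic_info.json` (two mandatory fields, and a closed list of values for `GeneticLevel`) can be sketched as a small Python check. This is only an illustration under stated assumptions: the function name `validate_genetic_info` is hypothetical, `GeneticLevel` is assumed to be a single string, and the `SampleOrigin` value `"blood"` is a placeholder because the quoted text does not list its allowed values.

```python
# Allowed values copied from the GeneticLevel row of the table under review.
ALLOWED_GENETIC_LEVELS = {
    "Genetic",
    "Genomic",
    "Epigenomic",
    "Transcriptomic",
    "Metabolomic",
    "Proteomic",
}

# The quoted paragraph names these as the only two mandatory fields.
MANDATORY_FIELDS = ("GeneticLevel", "SampleOrigin")


def validate_genetic_info(info: dict) -> list[str]:
    """Return a list of error messages; empty when no rule is violated."""
    errors = []
    for field in MANDATORY_FIELDS:
        if field not in info:
            errors.append(f"{field} is mandatory")
    level = info.get("GeneticLevel")
    # Assumes a single string value; other types are not handled here.
    if level is not None and level not in ALLOWED_GENETIC_LEVELS:
        errors.append(f"GeneticLevel has unexpected value: {level!r}")
    return errors


# "blood" is a placeholder value; AnalyticalApproach is optional.
good = {
    "GeneticLevel": "Genetic",
    "AnalyticalApproach": "SNP Genotypes (Array)",
    "SampleOrigin": "blood",
}
print(validate_genetic_info(good))
print(validate_genetic_info({"GeneticLevel": "Genes"}))
```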

@franklin-feingold

Collaborator

franklin-feingold commented Aug 1, 2019

Travis is flagging some of the formatting. This can be resolved after the formatting is reviewed. I can assist with that at that time.

CPernet and others added 2 commits Aug 8, 2019
Co-Authored-By: Franklin Feingold <35307458+franklin-feingold@users.noreply.github.com>
Co-Authored-By: Stefan Appelhoff <stefan.appelhoff@mailbox.org>
@CPernet

Collaborator Author

CPernet commented Aug 8, 2019

making changes in my fork

@CPernet CPernet closed this Aug 8, 2019
cyril.pernet@ed.ac.uk
@CPernet CPernet reopened this Aug 8, 2019
@CPernet

Collaborator Author

CPernet commented Aug 8, 2019

@franklin-feingold @sappelhoff the build has worked (i think) -- the only stuff i did not do was the / in the table, since that already how this was ??

@sappelhoff

Collaborator

sappelhoff commented Aug 8, 2019

> the only stuff i did not do was the / in the table, since that already how this was ??

The table fences will have to be fixed :-) could you do that please?

@franklin-feingold

Collaborator

franklin-feingold commented Aug 8, 2019

@CPernet I should have fixed the formatting issue in my branch (https://github.com/franklin-feingold/bids-specification/blob/enh/genetics/src/04-modality-specific-files/08-genetic-descriptor.md). My PR in your repo has some conflicts, so it may be easiest to grab my file and bring it over to your repo.

CPernet added 2 commits Aug 9, 2019
Fix Travis and spacing - merged changes
@CPernet

Collaborator Author

CPernet commented Aug 10, 2019

@franklin-feingold do you know why Travis failed? couldn't figure from 'details'

Collaborator

sappelhoff left a comment

@CPernet See my suggestions: Travis (=remark-lint) does not like too many empty lines :-)
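
For locating the kind of problem mentioned here before pushing, a small Python sketch that reports runs of consecutive empty lines in a Markdown file might look like the following. It is not remark-lint's own logic; the function name `find_blank_runs` and the threshold of one allowed blank line are assumptions for illustration.

```python
def find_blank_runs(markdown_text: str, max_blank: int = 1) -> list[int]:
    """Return 1-based line numbers where a run of blank lines exceeds max_blank."""
    flagged = []
    run = 0  # length of the current run of blank lines
    for number, line in enumerate(markdown_text.splitlines(), start=1):
        if line.strip() == "":
            run += 1
            # Report each run once, at the first line that goes over the limit.
            if run == max_blank + 1:
                flagged.append(number)
        else:
            run = 0
    return flagged


sample = "## genetic_info.json\n\n\nThis file is the descriptor.\n"
print(find_blank_runs(sample))
```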

Co-Authored-By: Stefan Appelhoff <stefan.appelhoff@mailbox.org>
@CPernet

Collaborator Author

CPernet commented Aug 13, 2019

brilliant - I was gonna (reluctantly) do the validator after the example ; thx @rwblair

@emdupre

Collaborator

emdupre commented Aug 13, 2019

> I do acknowledge however, that it seems to be hard to get more feedback on this BEP ... we announced it in several places but so far, no-one has chimed in.

August is a particularly hard month for feedback -- I'm writing this as I'm packing for a flight ! I think you'll be likely to see more community input in September.

All this to say, I'd be disappointed to see this merged before at least a few external members have chimed in. I've sent it to one of my colleagues working with genetic data, but they're on vacation ! So September seems safer to me.

Collaborator

effigies left a comment

I've added some specific comments, but on the whole I think this could do with an edit for terminological clarity. For instance, database and dataset are used somewhat interchangeably, and we use descriptor to refer to both the entire section, a .json file and an associated publication.

Possibly it would be useful to settle on specific vocabulary, introduce it either at the start of the section or at the end, and go through to conform.

CPernet and others added 5 commits Aug 13, 2019
Co-Authored-By: Chris Markiewicz <effigies@gmail.com>
Co-Authored-By: Chris Markiewicz <effigies@gmail.com>
Co-Authored-By: Chris Markiewicz <effigies@gmail.com>
Co-Authored-By: Chris Markiewicz <effigies@gmail.com>
Co-Authored-By: Chris Markiewicz <effigies@gmail.com>

AlexandreHutton left a comment

The dataset_description.json needs to be clarified. It appears inconsistent with the given example (and other references to it later in the document).
Does this spec allow for the data to be split across multiple files (e.g. by chromosome)? Is there supposed to be a convention for naming the genetics data?
I tried to see if it would be possible to apply this spec to imaging + genetic data I can access (UK Biobank), and it's unclear whether I could make the genetics data compliant with this.

@rwblair

Member

rwblair commented Aug 27, 2019

@CPernet Here's the start of the bids-validator PR:
bids-standard/bids-validator#828

This checks for 'GeneticDatabase' in dataset_description.json if a genetic_info.json is present at the top level. It also adds a JSON schema to validate genetic_info.json files.

Issues so far:

  • The dataset_description validation uses the flat structure first specified in this BEP, not the nested one suggested by @AlexandreHutton. Should I implement the nested structure?
  • It is checking for 'GeneticDatabase' instead of the 'GeneticDataBase' listed in the spec; I don't think we want that 3rd capitalized letter.
  • It is only looking for genetic_info.json in the root level of the project. I wanted to confirm: do we want multiple genetic_info.json files to be able to appear at any level, and assume that they are associated with the closest participants.tsv in the hierarchy? If this is the case, I need to spend some time looking at how to test and enforce this.

Since this conversation is getting complex I'd be happy to continue the validator talk in the validator PR.
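
The conditional rule described in this comment (require the key only when a top-level `genetic_info.json` exists) can be sketched in Python as follows. This is an illustration, not the code in the bids-validator PR: the function name `missing_genetic_database` is hypothetical, the flat structure and the 'GeneticDatabase' spelling follow the comment above, and treating a missing `dataset_description.json` as a failure is an assumption.

```python
import json
import tempfile
from pathlib import Path


def missing_genetic_database(dataset_root: Path) -> bool:
    """True when genetic_info.json exists at the root but the key is absent."""
    if not (dataset_root / "genetic_info.json").is_file():
        return False  # no top-level genetic_info.json, nothing to enforce
    description_path = dataset_root / "dataset_description.json"
    if not description_path.is_file():
        return True  # assumption: no description file also counts as missing
    description = json.loads(description_path.read_text(encoding="utf-8"))
    return "GeneticDatabase" not in description


# Build a throwaway dataset to exercise the check.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "dataset_description.json").write_text(
        json.dumps({"Name": "demo"}), encoding="utf-8"
    )
    before = missing_genetic_database(root)  # no genetic_info.json yet
    (root / "genetic_info.json").write_text("{}", encoding="utf-8")
    after = missing_genetic_database(root)  # file present, key absent
    print(before, after)
```

Supporting `genetic_info.json` at several levels of the hierarchy, the open question in the last bullet, would need a directory walk rather than this single root-level lookup.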

CPernet and others added 3 commits Oct 24, 2019
Co-Authored-By: Stefan Appelhoff <stefan.appelhoff@mailbox.org>
Co-Authored-By: Chris Markiewicz <effigies@gmail.com>
@CPernet

Collaborator Author

CPernet commented Oct 24, 2019

@effigies i pushed the last changes but travis fails, can't figure out why? help please

CPernet and others added 2 commits Oct 24, 2019
Collaborator Author

CPernet left a comment

still fails :-(
thx anyway

@rwblair

Member

rwblair commented Oct 24, 2019

@CPernet I made a PR from my personal fork to your fork that might fix the problem. The github editor was too finicky for getting the spacing right.

couldn't fix alignment in the GH editor
@CPernet

Collaborator Author

CPernet commented Oct 25, 2019

Success! thx - these trailings are so annoying

Collaborator Author

CPernet left a comment

@sappelhoff @effigies @franklin-feingold seems ready now? needs approval
we are making examples for the repo + validator testing

@franklin-feingold

Collaborator

franklin-feingold commented Oct 29, 2019

Sounds good! Perhaps, for the specification and the validator/examples to stay in lockstep, the associated materials (e.g., validator) can be prepared before merging so that everything stays in sync?

what do you all think?

@CPernet

Collaborator Author

CPernet commented Oct 29, 2019

yes, since I PR to merge to master - better have the validator tested, good point

…ile.
@effigies

Collaborator

effigies commented Jan 15, 2020

I've merged master, pushed this to the bep018 branch, and added a ReadTheDocs build so its rendering can be seen: https://bids-specification.readthedocs.io/en/bep018/

@CPernet If it's okay with you, could we switch to that branch for the BEP018 PR? It will make making suggestions easier.

@effigies

Collaborator

effigies commented Jan 25, 2020

Closing in favor of #395.

@effigies effigies closed this Jan 25, 2020