Recommendation 4: too specific (SRA/Genbank)? #10
Hi Brian,
I'm traveling and responding on my phone, so this is brief for now. Recommendation 4 came specifically from the Minimal Standards group. It is their recommendation, which we incorporated for consistency, so this may be an issue you want to raise with that group. Thanks!
Hi Brian & Lindsay, although the reference implementation the MiniStd WG is working on is based on SRA & Genbank, I do not see any general reason to object to Brian's changes. The main points (i.e. free and open deposition of the sequence data in a public DB that has long-term maintenance) will be served by any of the INSDC databases. In addition, for data sets that require controlled access, it will be simpler for EU-based depositors to use EGA than dbGaP. The devil is, as usual, in the details, and in this case it is the metadata mapping, which is not uniform across INSDC once you go beyond the "flat file". ENA's data schema, for example, differs slightly from NCBI's. I asked the ENA helpdesk about this at the end of May:
To which they answered:
We have not yet found the time to come up with a mapping and it is not our top priority right now.
So, in summary: yes, we should broaden Recommendation 4 to all INSDC DBs, but keep in mind that the current implementation only supports SRA/Genbank.
I think that makes sense. It recognizes that there is the "principle" of having the data in an INSDC DB, and then there is the implementation, which is a mechanism/process for uploading data to a specific one of those DBs in a way that meets the AIRR minimal standards. The implementation will almost always lag behind the principle, and I think that is OK. If we agree on this, we are agreeing that the data can reside in any of the INSDC repositories and that the AIRR Community will work with them, over time, to develop processes that make it easy to upload data to those repositories. The current status of our implementation of such processes is: SRA/GenBank templates are done, and other templates are on the roadmap, but as Christian says, they are not a high priority right now. Agreeing to this also adds scope to the Minimal Standards Working Group, because we are saying that the community, managed through the MSWG, should come up with a mechanism that makes it easy to load AIRR data into ENA etc. As Lindsay says, this does need to be tabled for discussion at the AIRR MSWG.
Thanks Christian. I agree, but I think MS would have to broaden theirs and then we would modify to be consistent.
I have created an issue with the Minimal Standards group about this.
Please see commit c8e751a for the altered wording.
Hi All,
In reading the recommendations and seeing how they apply to iReceptor, it occurred to me that Recommendation 4 might be too specific, in that it explicitly names SRA and Genbank and only those. Although I am not an expert on other repositories (nor on these ones), this seems very narrow and somewhat North America-specific. Would it make more sense to have something like this:
Recommendation 4: For long-term storage, data and metadata should be deposited in one of the International Nucleotide Sequence Database Collaboration (INSDC) archives such as SRA, Genbank, and ENA, per the recommendations established by the AIRR Minimal Standards Working Group. The AIRR Working Groups should work with the INSDC archives to coordinate the accurate gathering and storage of metadata for AIRR data.
In this way, we are recommending that data be published in one of the recognized national/international repositories without telling people "exactly" what to do. If INSDC gains another collaborator, that would then be a reasonable option too. As long as the second sentence is there, and the AIRR Community works with the repositories to ensure there are easy mechanisms to store data (as has been done with SRA and Genbank), this should be fine.