Recommendation 4: too specific (SRA/Genbank)? #10

bcorrie · 2017-09-11T19:37:04Z

Hi All,
In reading the recommendations and seeing how they apply to iReceptor, it occurred to me that Recommendation 4 might be too specific, in that it explicitly names SRA and Genbank and only those. Although I am not an expert on other repositories (nor on these ones), this seems very narrow and somewhat North America specific. Would it make more sense to have something like this:

Recommendation 4: For long-term storage, data and metadata should be deposited in one of the International Nucleotide Sequence Database Collaboration (INSDC) archives such as SRA, Genbank, and ENA, per the recommendations established by the AIRR Minimal Standards Working Group. The AIRR Working Groups should work with the INSDC archives to coordinate the accurate gathering and storage of metadata for AIRR data.

In this way, we are recommending that data be published in one of the recognized national/international repositories but not telling people "exactly" what to do. If INSDC has another collaborator soon, then that should be a reasonable option. As long as the second phrase is there, and the AIRR Community works with the repositories to ensure there are easy mechanisms to store data (as has been done with SRA and Genbank), then this should be fine...

lgcowell · 2017-09-11T20:06:24Z

Hi Brian, Traveling so responding on my phone, so brief for now. Recommendation 4 came specifically from the minimal standards group. That is their recommendation which we incorporated for consistency. So this may be an issue you want to raise with that group. Thanks!

bussec · 2017-09-11T20:50:53Z

Hi Brian & Lindsay,

Although the reference implementation the MiniStd WG is working on is based on SRA & Genbank, I do not see any general reason to object to Brian's changes. The main points (i.e. free and open deposition of the sequence data in a public DB that has long-term maintenance) will be served by any of the INSDC databases. In addition, for data sets that require controlled access, EU-based depositors will find it simpler to go for EGA than for dbGaP.

The devil is, as usual, in the details, and in this case it is the metadata mapping, which is not uniform across INSDC once you go beyond the "flat file". ENA's data scheme therefore differs slightly from NCBI's. I asked the ENA helpdesk about this at the end of May:

[We have] completed the mapping of the [MiniStd items] to NCBI's BioProject/BioSample system. However, ENA's metadata structure (studies, experiment, sample, run) seems to be a bit different. Therefore I wanted to ask whether there is already any existing scheme for mapping metadata between the two databases.

To which their answer was:

It turns out there is no easy way of doing this. However, every one of the ENA SRA studies/samples has a BioProject/BioSample equivalent in NCBI, so de facto you could extract mapping rules from public metadata XMLs.

We have not yet found the time to come up with a mapping and it is not our top priority right now.

So in summary: yes, we should broaden Recommendation 4 to all INSDC DBs, but keep in mind that the current implementation only supports SRA/Genbank.
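To illustrate what "extracting mapping rules from public metadata XMLs" could look like, here is a minimal sketch. It assumes that the same sample is available both as an ENA sample record and as an NCBI BioSample record. The two XML snippets are simplified, made-up stand-ins rather than real records, the attribute names in them (`tissue_type`, `host_sex`, `tissue`, `sex`) are invented, and the function names are hypothetical. The value-matching heuristic is only one possible starting point and would need manual review.

```python
import xml.etree.ElementTree as ET

# Simplified, made-up records describing the same sample (assumption:
# real records carry many more fields and would be downloaded, not inlined).
ENA_XML = """
<SAMPLE alias="example">
  <SAMPLE_ATTRIBUTES>
    <SAMPLE_ATTRIBUTE><TAG>tissue_type</TAG><VALUE>blood</VALUE></SAMPLE_ATTRIBUTE>
    <SAMPLE_ATTRIBUTE><TAG>host_sex</TAG><VALUE>female</VALUE></SAMPLE_ATTRIBUTE>
  </SAMPLE_ATTRIBUTES>
</SAMPLE>
""".strip()

NCBI_XML = """
<BioSample>
  <Attributes>
    <Attribute attribute_name="tissue">blood</Attribute>
    <Attribute attribute_name="sex">female</Attribute>
  </Attributes>
</BioSample>
""".strip()


def ena_attributes(xml_text):
    """Return {tag: value} for every SAMPLE_ATTRIBUTE in an ENA-style record."""
    root = ET.fromstring(xml_text)
    return {a.findtext("TAG"): a.findtext("VALUE")
            for a in root.iter("SAMPLE_ATTRIBUTE")}


def biosample_attributes(xml_text):
    """Return {attribute_name: value} for every Attribute in a BioSample-style record."""
    root = ET.fromstring(xml_text)
    return {a.get("attribute_name"): (a.text or "").strip()
            for a in root.iter("Attribute")}


def candidate_mapping(ena, ncbi):
    """Pair ENA tags with BioSample names that hold the same value.

    Crude heuristic: a pair is proposed only when the value occurs
    exactly once on the BioSample side, so ambiguous values are skipped.
    """
    names_by_value = {}
    for name, value in ncbi.items():
        names_by_value.setdefault(value, []).append(name)
    return {tag: names_by_value[value][0]
            for tag, value in ena.items()
            if len(names_by_value.get(value, [])) == 1}


mapping = candidate_mapping(ena_attributes(ENA_XML), biosample_attributes(NCBI_XML))
print(mapping)
```

Run over many paired records, the proposed pairs could be tallied so that only consistently recurring ones are kept as candidate mapping rules.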

bcorrie · 2017-09-11T23:18:07Z

I think that makes sense, recognizing that there is the "principle" of having the data in the INSDC DB and the implementation, which is having a mechanism/process to upload data to a specific one of those DBs that meets AIRR minimal standards. The implementation will almost always lag behind the principle, and I think that is OK...

If we agree that this makes sense, we are agreeing that the data can reside in any of the INSDC repositories and that the AIRR community will work with them, over time, to come up with processes for those repositories to enable uploading data easily.

The current status of our implementation of such processes is: SRA/GenBank templates are done, other templates are on the roadmap but, as Christian says, not a high priority right now.

I think agreeing with this means that we are adding scope to the Minimal Standards Working Group in that we are saying that the community, managed through the MSWG, should come up with a mechanism to make it easy to load AIRR data into ENA etc...

As Lindsay says, this does need to get tabled for discussion at the AIRR MSWG.

lgcowell · 2017-09-12T01:22:48Z

Thanks Christian. I agree, but I think MS would have to broaden theirs and then we would modify to be consistent.

bcorrie · 2017-10-27T18:41:12Z

I have created an issue with Minimal Standards in this regard:

airr-community/airr-standards#45

bussec · 2017-12-13T00:31:51Z

Please see commit c8e751a for altered wording.

bcorrie changed the title ~~Recommendation 4 to specific?~~ Recommendation 4 to specific (SRA/Genbank)? Sep 11, 2017

bcorrie changed the title ~~Recommendation 4 to specific (SRA/Genbank)?~~ Recommendation 4: too specific (SRA/Genbank)? Sep 11, 2017

bcorrie mentioned this issue Oct 27, 2017

Ensuring MiAIRR is not NCBI specific - RE: CRWG airr-community/airr-standards#45

Closed

lgcowell closed this as completed Dec 13, 2017
