Create option to export GFF3 to comply with requirements at NCBI #565

Open
monicacecilia opened this issue Sep 11, 2015 · 7 comments

Comments

@monicacecilia
Member

These are the GFF3 formatting requirements provided by Terence Murphy from NCBI. Before submitting the official gene set (OGS), that is, the integrated GFF of predicted and manually curated models, some attributes need to be added:

  1. Add locus_tag attribute to top-level features such as gene or pseudogene (e.g., locus_tag=W904_OFAS000001; where W904 is the species accession number used in the NCBI submission system). The locus_tag prefix is generated when a BioProject is created, as shown here: http://www.ncbi.nlm.nih.gov/bioproject/230921
  2. Add transcript_id and protein_id attributes to both mRNA and CDS features (e.g., transcript_id=OFAS000001-RA;protein_id=OFAS000001-PA). Note: for transcripts that do not come from coding genes (e.g., pseudogenic_transcript, rRNA), add only transcript_id.
  3. Add a product attribute to CDS features (e.g., product=prophenoloxidase); this is usually the mRNA name when the name is different from ID.
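
The three requirements above could be batch-applied with a small script along these lines. This is only a minimal sketch, not Apollo's exporter: the function name `add_ncbi_attributes`, the rule that the protein ID is the mRNA ID with `-R` swapped for `-P`, and the expectation that each mRNA line appears before its CDS lines are all assumptions made for illustration. Where the mRNA has no Name, or the Name equals the ID, no product is written.

```python
TOP_LEVEL = {"gene", "pseudogene"}
NONCODING_TRANSCRIPTS = {"pseudogenic_transcript", "rRNA"}


def parse_attributes(column9):
    """Split a GFF3 attribute column into an insertion-ordered dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if field:
            key, _, value = field.partition("=")
            attrs[key] = value
    return attrs


def format_attributes(attrs):
    """Join a dict back into a GFF3 attribute column."""
    return ";".join(f"{key}={value}" for key, value in attrs.items())


def add_ncbi_attributes(lines, locus_tag_prefix):
    """Yield GFF3 lines with locus_tag, transcript_id, protein_id and product added.

    Hypothetical helper: assumes every mRNA line precedes its CDS lines.
    """
    transcripts = {}  # mRNA ID -> (transcript_id, protein_id, product)
    for line in lines:
        line = line.rstrip("\n")
        cols = line.split("\t")
        if line.startswith("#") or len(cols) != 9:
            yield line  # pass directives, comments and malformed lines through
            continue
        ftype = cols[2]
        attrs = parse_attributes(cols[8])
        fid = attrs.get("ID", "")
        if ftype in TOP_LEVEL:
            # Requirement 1: locus_tag on top-level features.
            attrs["locus_tag"] = f"{locus_tag_prefix}_{fid}"
        elif ftype == "mRNA":
            # Requirement 2: transcript_id and protein_id on mRNA.
            # Assumed naming rule: OFAS000001-RA -> OFAS000001-PA.
            head, sep, tail = fid.rpartition("-R")
            protein_id = f"{head}-P{tail}" if sep else f"{fid}-P"
            name = attrs.get("Name", "")
            product = name if name and name != fid else ""
            transcripts[fid] = (fid, protein_id, product)
            attrs["transcript_id"] = fid
            attrs["protein_id"] = protein_id
        elif ftype in NONCODING_TRANSCRIPTS:
            # Non-coding transcripts get transcript_id only.
            attrs["transcript_id"] = fid
        elif ftype == "CDS":
            parent = attrs.get("Parent", "").split(",")[0]
            if parent in transcripts:
                transcript_id, protein_id, product = transcripts[parent]
                attrs["transcript_id"] = transcript_id
                attrs["protein_id"] = protein_id
                if product:
                    # Requirement 3: product on CDS, taken from the mRNA Name.
                    attrs["product"] = product
        cols[8] = format_attributes(attrs)
        yield "\t".join(cols)
```

Calling `add_ncbi_attributes(open_file, "W904")` on an exported GFF3 would yield the rewritten lines one at a time, so the result can be streamed to a new file without loading the whole annotation into memory.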

Adapted from email sent by Mei-Ju Chen at USDA/NAL. Mei-Ju's request:
"It will be great if WA team could help to batch processing some of the attributes. Let me know if you have questions."

@monicacecilia added the enhancement and NAL (NAL, WhipWorm, Large Organization problems etc.) labels Sep 11, 2015
@nathandunn nathandunn added this to the 2.0.2 milestone Sep 11, 2015
@monicacecilia
Member Author

@nathandunn, you or @deepakunni3? I assigned this one to you to bring it back into the spotlight. Cheers,

@nathandunn
Contributor

Should this be the default for exporting GFF3, or should it be another option?

@monicacecilia
Member Author

For now, I think it should be another option called "GFF3 for NCBI" or something similar. We may want to make this the permanent default later on, but I don't know how many people are using the current output in their pipelines, so we should make an announcement before changing it for good.

@deepakunni3
Member

This is interesting. Maybe I can look into this.

@nathandunn
Contributor

@deepakunni3 Sure. @monicacecilia Okay, an option makes the most sense for now.

@monicacecilia
Member Author

Yes, @deepakunni3! Cheers,

@deepakunni3
Member

👍

@nathandunn nathandunn modified the milestones: 2.0.2, 2.0.3 Jan 21, 2016
@monicacecilia monicacecilia modified the milestones: 2.0.3, 2.0.5 Apr 7, 2016
@nathandunn nathandunn modified the milestones: 2.0.6, 2.0.5, 2.0.7 Nov 19, 2016
@monicacecilia monicacecilia modified the milestones: 2.0.7, 2.0.8 Jan 11, 2017
@nathandunn nathandunn modified the milestones: 2.0.8, 2.0.9 Oct 19, 2017