Additional examples #2

Closed
childers opened this issue Mar 21, 2017 · 4 comments

Comments

@childers
Collaborator

Terence had some additional examples for us to test with:

For more testing, here are two assemblies with lots of sequences, so the mapping table is big:
https://www.ncbi.nlm.nih.gov/assembly/GCF_000715135.1
https://www.ncbi.nlm.nih.gov/assembly/GCF_000233375.1

The program should gracefully fail given an assembly report like this one:
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/180/655/GCA_000180655.1_ASM18065v1/GCA_000180655.1_ASM18065v1_assembly_report.txt
As I mentioned, we’re planning to switch to always populating the file so cases like that will go away. It’s also never the case for RefSeq assemblies.
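
One way to meet the graceful-failure requirement would be to validate the report before building the mapping table. The sketch below is not seqconv's actual code. It assumes the usual tab-delimited assembly report layout, where the last `#` line before the data rows names the columns (including `GenBank-Accn` and `RefSeq-Accn`) and missing values are written as `na`. The function name `parse_assembly_report` and the exception `AssemblyReportError` are hypothetical.

```python
class AssemblyReportError(Exception):
    """Raised when an assembly report cannot be used to build an ID mapping."""


def parse_assembly_report(text, src="RefSeq-Accn", dst="GenBank-Accn"):
    """Return a {src_id: dst_id} dict built from the text of an assembly report."""
    header = None
    mapping = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith("#"):
            # Assumption: the column header is the last comment line before the data.
            header = line.lstrip("# ").rstrip().split("\t")
            continue
        if header is None or src not in header or dst not in header:
            raise AssemblyReportError(
                "no usable column header for %s/%s" % (src, dst))
        fields = dict(zip(header, line.split("\t")))
        source_id = fields.get(src, "na")
        target_id = fields.get(dst, "na")
        # Skip rows where either side of the pair is missing.
        if source_id != "na" and target_id != "na":
            mapping[source_id] = target_id
    if not mapping:
        # Covers reports with no sequence rows at all, or with only "na" values.
        raise AssemblyReportError(
            "assembly report has no %s to %s pairs" % (src, dst))
    return mapping
```

A caller could catch `AssemblyReportError` and exit with a one-line message instead of a traceback.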

@guilhemfaure
Collaborator

guilhemfaure commented Mar 21, 2017 via email

@childers
Collaborator Author

childers commented Mar 22, 2017

For tobacco, there are no alternative IDs to convert to (not even GenBank IDs).

It does work if we convert from RefSeq to RefSeq:

$ time seqconv convert --ref Ntab-TN90 --out rs ref_Ntab-TN90_top_level.gff3.gz >test_tobacco_gb.gff3
Converting from None to rs
Starting Conversion
FORMAT detected: rs
real	0m16.931s
user	0m14.429s
sys	0m1.302s
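
The `FORMAT detected: rs` line suggests the input ID type is inferred from the file itself. How seqconv does that is not shown in this thread. As a rough sketch of one possible heuristic: RefSeq accessions carry a two-letter prefix followed by an underscore (for example `NC_`), while GenBank accessions have no underscore. The function name `detect_format` and both regular expressions are assumptions made for illustration.

```python
import re

# Assumption: RefSeq = two letters, underscore, optional letters, digits, optional version.
REFSEQ_RE = re.compile(r"^[A-Z]{2}_[A-Z]*\d+(\.\d+)?$", re.IGNORECASE)
# Assumption: GenBank = one to six letters, digits, optional version, no underscore.
GENBANK_RE = re.compile(r"^[A-Z]{1,6}\d+(\.\d+)?$", re.IGNORECASE)


def detect_format(seqids):
    """Guess 'rs' (RefSeq) or 'gb' (GenBank) from sequence IDs; None if unclear."""
    counts = {"rs": 0, "gb": 0}
    for seqid in seqids:
        if REFSEQ_RE.match(seqid):
            counts["rs"] += 1
        elif GENBANK_RE.match(seqid):
            counts["gb"] += 1
    if counts["rs"] == counts["gb"]:
        # No IDs recognised, or an even split: do not guess.
        return None
    return max(counts, key=counts.get)
```

Returning `None` on a tie leaves the decision to the caller, which could then ask the user to state the input format explicitly.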

@childers
Collaborator Author

For salmon, it appears to work OK:

$ time seqconv convert --ref ICSASG_v2 --out gb ref_ICSASG_v2_top_level.gff3.gz > test_salmon.gff3
Converting from None to gb
Starting Conversion
No corresponding id for nc_001960.1 from rs
FORMAT detected: rs
real	0m50.122s
user	0m37.864s
sys	0m3.102s
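
The `No corresponding id for nc_001960.1 from rs` message above indicates that one sequence had no GenBank counterpart in the mapping table. A hedged sketch of how seqid conversion with that kind of warning might look is below. It is not seqconv's code: the function name `convert_gff3_lines` is hypothetical, the mapping is assumed to be a plain dict, and leaving unmapped IDs unchanged is a design assumption, not confirmed behaviour.

```python
import sys


def convert_gff3_lines(lines, mapping, warn=sys.stderr):
    """Yield GFF3 lines with the seqid rewritten through `mapping`."""
    missing = set()

    def lookup(seqid):
        if seqid in mapping:
            return mapping[seqid]
        if seqid not in missing:
            # Warn once per unmapped ID rather than once per feature line.
            missing.add(seqid)
            print("No corresponding id for %s" % seqid, file=warn)
        return seqid  # assumption: keep the original ID when no mapping exists

    for line in lines:
        if line.startswith("##sequence-region"):
            # Directive layout: ##sequence-region <seqid> <start> <end>
            parts = line.split(None, 2)
            if len(parts) == 3:
                parts[1] = lookup(parts[1])
                line = " ".join(parts)
        elif line.strip() and not line.startswith("#"):
            # Feature line: column 1 is the seqid, the rest is passed through.
            seqid, sep, rest = line.partition("\t")
            line = lookup(seqid) + sep + rest
        yield line
```

Because the function is a generator over lines, it can be fed directly from `gzip.open(path, "rt")` without loading the whole annotation into memory.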

@childers
Collaborator Author

Text output for salmon:

$ head -n 20 test_salmon.gff3
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/233/375/GCF_000233375.1_ICSASG_v2/GCF_000233375.1_ICSASG_v2_assembly_report.txt
##gff-version 3
#!gff-spec-version 1.21
#!processor NCBI annotwriter
#!genome-build ICSASG_v2
#!genome-build-accession NCBI_Assembly:GCF_000233375.1
#!annotation-date 22 September 2015
#!annotation-source NCBI Salmo salar Annotation Release 100
##sequence-region CM003279.1 1 159038749
##species https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=8030
CM003279.1	RefSeq	region	1	159038749	.	+	.	ID=id0;Dbxref=taxon:8030;Name=ssa01;breed=double haploid;chromosome=ssa01;dev-stage=adult;gbkey=Src;genome=chromosome;isolate=Sally;mol_type=genomic DNA;sex=female;tissue-type=muscle
CM003279.1	Gnomon	gene	5501	62139	.	-	.	ID=gene0;Dbxref=GeneID:106560212;Name=LOC106560212;gbkey=Gene;gene=LOC106560212;gene_biotype=protein_coding
CM003279.1	Gnomon	mRNA	5501	62139	.	-	.	ID=rna0;Parent=gene0;Dbxref=GeneID:106560212,Genbank:XM_014160784.1;Name=XM_014160784.1;gbkey=mRNA;gene=LOC106560212;product=fibroblast growth factor receptor 3-like;transcript_id=XM_014160784.1
CM003279.1	Gnomon	exon	61647	62139	.	-	.	ID=id1;Parent=rna0;Dbxref=GeneID:106560212,Genbank:XM_014160784.1;gbkey=mRNA;gene=LOC106560212;product=fibroblast growth factor receptor 3-like;transcript_id=XM_014160784.1
CM003279.1	Gnomon	exon	43486	43714	.	-	.	ID=id2;Parent=rna0;Dbxref=GeneID:106560212,Genbank:XM_014160784.1;gbkey=mRNA;gene=LOC106560212;product=fibroblast growth factor receptor 3-like;transcript_id=XM_014160784.1
CM003279.1	Gnomon	exon	23978	24241	.	-	.	ID=id3;Parent=rna0;Dbxref=GeneID:106560212,Genbank:XM_014160784.1;gbkey=mRNA;gene=LOC106560212;product=fibroblast growth factor receptor 3-like;transcript_id=XM_014160784.1
CM003279.1	Gnomon	exon	16966	17019	.	-	.	ID=id4;Parent=rna0;Dbxref=GeneID:106560212,Genbank:XM_014160784.1;gbkey=mRNA;gene=LOC106560212;product=fibroblast growth factor receptor 3-like;transcript_id=XM_014160784.1
CM003279.1	Gnomon	exon	5501	5691	.	-	.	ID=id5;Parent=rna0;Dbxref=GeneID:106560212,Genbank:XM_014160784.1;gbkey=mRNA;gene=LOC106560212;product=fibroblast growth factor receptor 3-like;transcript_id=XM_014160784.1
CM003279.1	Gnomon	CDS	43486	43633	.	-	0	ID=cds0;Parent=rna0;Dbxref=GeneID:106560212,Genbank:XP_014016259.1;Name=XP_014016259.1;gbkey=CDS;gene=LOC106560212;product=fibroblast growth factor receptor 3-like;protein_id=XP_014016259.1
CM003279.1	Gnomon	CDS	23978	24241	.	-	2	ID=cds0;Parent=rna0;Dbxref=GeneID:106560212,Genbank:XP_014016259.1;Name=XP_014016259.1;gbkey=CDS;gene=LOC106560212;product=fibroblast growth factor receptor 3-like;protein_id=XP_014016259.1
