bcgsc/picea-glauca-plastid

Plastid genome of White spruce (Picea glauca), genotype WS77111

🌲 Annotation of the plastid genome of white spruce (Picea glauca), genotype WS77111 https://www.ncbi.nlm.nih.gov/nuccore/MK174379

WS77111 chloroplast genome

Methods and Results

The white spruce WS77111 chloroplast assembly was annotated using GeSeq. The GenBank file generated by GeSeq was converted to a GFF (General Feature Format) file using EMBOSS Seqret, after which duplicate annotations were removed and manual annotations were added. The reference chloroplast genomes were interior spruce PG29 and Sitka spruce Q903, and occasionally Norway spruce. In addition to GeSeq, two third-party tRNA annotators were used: tRNAscan-SE v2.0 and ARAGORN v1.2.38. These annotators detected some 'novel' tRNAs, but none of them were found in all of the reference chloroplast genomes. The candidate tRNAs were analyzed further with RNAweasel and ARAGORN, which produced 2D structures and folding results. Because these results were inconclusive and the spruce chloroplast genome is known to be highly conserved, these tRNAs were excluded from the final annotation. Inverted repeats were also found but excluded from the final annotation.

GeSeq

The assembled FASTA file was submitted to GeSeq.

EMBOSS Seqret

The .gb file was converted to a .gff file using EMBOSS Seqret.
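The exact command used for this step is not shown here. As a rough sketch, a GenBank-to-GFF conversion with EMBOSS Seqret could be assembled as below. The file names `WS77111.gb` and `WS77111.gff` and the helper `seqret_gb_to_gff_cmd` are hypothetical, and the chosen qualifiers are an assumption about a typical invocation, not the authors' settings.

```python
import subprocess  # only needed if the command is actually executed


def seqret_gb_to_gff_cmd(gb_path, gff_path):
    """Build a seqret argument list for a GenBank -> GFF conversion.

    Assumption: these qualifiers reflect a common way of calling seqret;
    check `seqret -help -verbose` for the installed EMBOSS version.
    """
    return [
        "seqret",
        "-sequence", gb_path,     # input GenBank record
        "-feature",               # carry feature annotations along
        "-fformat", "genbank",    # format of the input features
        "-fopenfile", gb_path,    # file to read the features from
        "-osformat", "gff",       # write the output as GFF
        "-outseq", gff_path,      # output file
        "-auto",                  # do not prompt interactively
    ]


cmd = seqret_gb_to_gff_cmd("WS77111.gb", "WS77111.gff")  # hypothetical names
# To run it where EMBOSS is installed:
# subprocess.run(cmd, check=True)
```

Building the argument list separately from running it makes the command easy to inspect or log before execution.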

Manual Annotation

Duplicates were removed. Most conflicts came from the Picea morrisonicola and Picea asperata reference annotations. One of them names tRNAs with their anticodons and the other does not, so GeSeq treated them as different annotations and wrote each tRNA to the generated .gb file twice. The entries without anticodons were removed from the final file.
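The duplicate removal described above was done manually. For illustration only, the same rule can be sketched in Python: features that share coordinates and strand are collapsed, and the copy whose attributes name an anticodon is kept. The naming pattern (for example `trnS-GCU` versus `trnS`) and the function `dedupe_features` are assumptions made for this sketch.

```python
import re

# Assumed naming convention: "trnS-GCU" carries an anticodon, "trnS" does not.
ANTICODON = re.compile(r"trn[A-Za-z]+-[ACGTU]{3}", re.IGNORECASE)


def dedupe_features(gff_lines):
    """Collapse features with identical seqid/type/start/end/strand.

    When duplicates exist, prefer the line whose attribute column
    names an anticodon. Input order of first occurrence is preserved.
    """
    best = {}   # key -> (has_anticodon, line)
    order = []  # keys in order of first appearance
    for raw in gff_lines:
        line = raw.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # skip malformed lines
        key = (cols[0], cols[2], cols[3], cols[4], cols[6])
        has_anticodon = bool(ANTICODON.search(cols[8]))
        if key not in best:
            best[key] = (has_anticodon, line)
            order.append(key)
        elif has_anticodon and not best[key][0]:
            best[key] = (True, line)
    return [best[key][1] for key in order]
```

Any real cleanup should still be reviewed by hand, since two references may also disagree on coordinates rather than only on names.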

ARAGORN and tRNAscan-SE detected some tRNAs that GeSeq did not. However, these were not detected with high confidence and were absent from the PG29, Q903, and Norway spruce annotations. Because most plastid tRNAs are highly conserved sequences, these candidates are unlikely to be genuine and were removed from the final annotation (see diagram).

  • GeSeq files were regenerated with and without the third-party tRNA annotators to cross-reference which annotations were valid.
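One way to picture this cross-reference is as a set difference between the two GeSeq outputs. The sketch below assumes both runs are available as GFF lines and compares tRNA features by start, end, and strand. The function names are hypothetical and this is not the procedure the authors scripted.

```python
def feature_keys(gff_lines, feature_type="tRNA"):
    """Return the set of (start, end, strand) for one feature type."""
    keys = set()
    for raw in gff_lines:
        cols = raw.rstrip("\n").split("\t")
        # Header and malformed lines do not have 9 columns and are skipped.
        if len(cols) == 9 and cols[2] == feature_type:
            keys.add((int(cols[3]), int(cols[4]), cols[6]))
    return keys


def third_party_only(with_tools, without_tools, feature_type="tRNA"):
    """Features reported only when third-party annotators were enabled."""
    extra = feature_keys(with_tools, feature_type) - feature_keys(
        without_tools, feature_type
    )
    return sorted(extra)
```

Features returned by `third_party_only` are the candidates that would need checking against the reference annotations.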

Four genes needed to be annotated manually: rps12, petB, petD, and rpl16. The rps12 gene is trans-spliced, while the other three have short initial exons.

GeSeq also missed some mRNA and exon features, which were added manually (see final annotation). In the final annotation, all 114 genes were conserved: 74 coding regions (CDS), 4 rRNAs, and 36 tRNAs. The annotation also contains 15 introns (9 in coding regions and 6 in tRNAs).
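The totals above can be checked with a small tally. The sketch below assumes the final annotation has been reduced to a mapping from gene name to product type; that mapping and the function names are hypothetical. Only the expected counts come from the text.

```python
from collections import Counter

# Counts stated for the final annotation.
EXPECTED_GENES = {"CDS": 74, "rRNA": 4, "tRNA": 36}
EXPECTED_INTRONS = {"CDS": 9, "tRNA": 6}

# The per-type counts add up to the stated totals.
assert sum(EXPECTED_GENES.values()) == 114
assert sum(EXPECTED_INTRONS.values()) == 15


def tally(gene_types):
    """Count genes per product type from a {gene_name: type} mapping."""
    return Counter(gene_types.values())


def matches_expected(gene_types):
    """True if the per-type gene counts equal the expected counts."""
    return dict(tally(gene_types)) == EXPECTED_GENES
```

Counting by gene name rather than by GFF line avoids inflating the CDS total for genes that are split across several exons.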

RNAweasel

RNAweasel was used to confirm that the candidate tRNAs were not valid, although tRNA-Ser was worth investigating further.

ARAGORN

ARAGORN was run independently of GeSeq to generate the ARAGORN text report with 2D tRNA structures. The tRNA in question, tRNA-Ser, is tRNA #17.

tRNAscan

tRNAscan, in conjunction with ARAGORN, was used to determine tRNA products.

table2asn_GFF

The GFF annotation was validated using table2asn_GFF.

OGDRAW

The .gbf file generated by table2asn_GFF was then passed to OGDRAW to draw the genome map.

Citation

Lin D, Coombe L, Jackman SD, Gagalova KK, Warren RL, Hammond SA, Kirk H, Pandoh P, Zhao Y, Moore RA, Mungall AJ, Ritland C, Jaquish B, Isabel N, Bousquet J, Jones SJM, Bohlmann J, Birol I. 2019. Complete chloroplast genome sequence of a white spruce (Picea glauca, genotype WS77111) from eastern Canada. Microbiol Resour Announc 8:e00381-19. doi: 10.1128/MRA.00381-19.
