CompBio-BO/Bivalvia_TEs

Bivalvia TEs

Collection of manually curated LINEs, SINEs and DDE/D-related consensus sequences extracted from bivalve genomes.

Format:

<TE CLASS/SUPERFAMILY>_<SPECIES NAME>_<PROGRESSIVE NUMBER>_cons#<RepeatMasker classification>
e.g: LINE_A.i.concentricus_9_cons#LINE/CR1-Zenon
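The naming scheme above can be split back into its fields with a small parser. This is a minimal sketch, not part of the repository: the names `TEHeader` and `parse_header` are hypothetical, and it assumes the TE class field contains no underscore and that the progressive number is always followed by `_cons#`.

```python
import re
from typing import NamedTuple


class TEHeader(NamedTuple):
    te_class: str           # e.g. "LINE"
    species: str            # e.g. "A.i.concentricus"
    number: int             # progressive number
    rm_classification: str  # RepeatMasker-style classification, e.g. "LINE/CR1-Zenon"


# Assumption: the TE class has no underscore; the species field may contain any characters.
_HEADER_RE = re.compile(
    r"^>?(?P<te_class>[^_]+)_(?P<species>.+)_(?P<number>\d+)_cons#(?P<rm>.+)$"
)


def parse_header(header: str) -> TEHeader:
    """Parse a consensus header, with or without the leading FASTA '>'."""
    m = _HEADER_RE.match(header.strip())
    if m is None:
        raise ValueError(f"header does not follow the expected format: {header!r}")
    return TEHeader(m["te_class"], m["species"], int(m["number"]), m["rm"])


if __name__ == "__main__":
    print(parse_header("LINE_A.i.concentricus_9_cons#LINE/CR1-Zenon"))
```

Headers that deviate from the scheme raise a `ValueError` rather than being silently mis-parsed.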

Consensus sequences construction workflow (LINEs and DDE/D transposons):

  1. RepeatMasker annotation of TEs using species-specific, automatically generated libraries.
  2. Extension, extraction and translation of all annotated insertions (min ORF length = 300 aa).
  3. Selection of all ORFs with a significant hmmscan hit (e-value < 0.05) against LINE-specific RVT HMM profiles or against DDE/D-related HMM profiles (specific for each DDE/D superfamily, as described in Yuan and Wessler (2011)).
  4. Clustering of all ORF nucleotide sequences following the 80-80 rule (CD-HIT).
  5. Identification of clusters with at least 5 members (for LINEs we also required that at least one sequence possess both RVT and EN domains on the same ORF).
  6. Back blastn of each representative sequence against the genome (min 70% identity and coverage), followed by extension and extraction of all hits.
  7. Consensus construction using EMBOSS cons (plurality of 3), followed by manual curation and validation.
  8. Merging of all libraries and redundancy reduction (CD-HIT; 80-80 rule).
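
As an illustration of step 5, the sketch below reads a CD-HIT `.clstr` file and keeps only clusters with at least 5 members. It is not the script used to build this collection: the function names are hypothetical, and it assumes the standard `.clstr` layout (a `>Cluster N` line followed by member lines containing `>name...`). The extra LINE requirement (RVT and EN domains on the same ORF) is not covered here.

```python
import re
from typing import Dict, Iterable, List

# Member lines look like: "0\t1500nt, >seq1... *" or "1\t1400nt, >seq2... at +/85.00%"
_MEMBER_RE = re.compile(r">(.+?)\.\.\.")


def read_clusters(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each cluster name (e.g. 'Cluster 0') to the list of its member IDs."""
    clusters: Dict[str, List[str]] = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">Cluster"):
            current = line[1:]
            clusters[current] = []
        elif current is not None:
            m = _MEMBER_RE.search(line)
            if m:
                clusters[current].append(m.group(1))
    return clusters


def large_clusters(clusters: Dict[str, List[str]], min_members: int = 5) -> Dict[str, List[str]]:
    """Keep clusters with at least `min_members` sequences (5 in the workflow above)."""
    return {name: ids for name, ids in clusters.items() if len(ids) >= min_members}
```

Typical use would be `large_clusters(read_clusters(open("orfs.clstr")))`, where `orfs.clstr` is a placeholder file name.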

Consensus sequences construction workflow (SINEs):

For SINE elements, we selected 12 species for in-depth SINE annotation using SINE_Scan. SINE candidates resulting from RepeatModeler and SINE_Scan were merged and subjected to a "Blast-Extend-Extract" process to identify the boundaries of the elements and confirm the presence of a 5' tRNA-related head, a conserved domain and a 3' tail.
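
The "extend" part of a Blast-Extend-Extract round amounts to widening each hit by a flank on both sides without running off the contig. The sketch below shows only that coordinate arithmetic; the function name `extend_hit` is hypothetical, coordinates are assumed to be 1-based and inclusive, and the flank size is left to the caller because it is not stated here.

```python
from typing import Tuple


def extend_hit(start: int, end: int, seq_len: int, flank: int) -> Tuple[int, int]:
    """Extend a 1-based inclusive hit by `flank` bp on each side, clamped to the contig."""
    if flank < 0:
        raise ValueError("flank must be non-negative")
    if not (1 <= start <= end <= seq_len):
        raise ValueError("hit coordinates must satisfy 1 <= start <= end <= seq_len")
    return max(1, start - flank), min(seq_len, end + flank)
```

The extended intervals can then be extracted from the assembly and aligned to locate where sequence similarity between copies ends, which marks the element boundaries.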

Accession numbers/sources of genome assemblies:

IMPORTANT: The RepeatMasker-style classification for DDE/D and SINEs was automatically generated using RepeatClassifier from the RepeatModeler package and can therefore be inaccurate. If you have any corrections to suggest, please contact jacopo.martelossi2@unibo.it
