mutator_like elements and repeatmask #2
Comments
Dear Isra,
############ (QUESTION1) ################
I recommend you check out the preprint manuscript on BioRXiv:
In footnote 2 of the taxonomy you can see that we classify MULE and Mutator only as "TIR" (class 2/1), not more specifically. I totally agree that it would be very nice to have further subcategories for all of the various transposon families in the footnote. However, this would decrease the classification performance, as there are simply not enough examples in the database used for training the model.
Does this answer your question?
############ (QUESTION2) ################
The second refers to projectFolder > finalResults > FinalAnnotations_Transposons.gff3
The structural features, however, are not gone; they are just in another file that you can find here:
So now the answer to your question: some tools, like TIRvish, annotate sequences that they consider to be TIR transposons (DNA transposons). When we run our RFSB classifier, which has been shown to classify with very high performance, it finds that the annotation is most probably an LTR transposon. Investigating different tools, we found that many tool annotations do not necessarily correspond to the dedicated transposon class (see the manuscript, the heatmaps at the end).
Does this make sense to you? Please reply on that :-)
############ (QUESTION3) ################
What do you think about that?
I hope I could answer your questions. Looking forward to your answers and to getting back in touch,
Dear Kevin,
Thank you so much for this fantastic software. I was looking for exactly this kind of tool, a wrapper around many different TE annotation programs.
I have three questions regarding the annotation of TEs.
############ (QUESTION1) ################
I am highly interested in annotating all MUTATOR-like elements in one nematode species. However, this family of DNA transposons is missing in your code, right? (Maybe I am missing something...)
The only categories your program is able to detect are:
1:Class I, Retrotransposon
1/1:LTR, Retrotransposon
1/1/1:Copia, LTR, Retrotransposon
1/1/2:Gypsy, LTR, Retrotransposon
1/1/3:ERV, LTR, Retrotransposon
1/2:Non-LTR, Retrotransposon
1/2/1:LINE, Non-LTR, Retrotransposon
1/2/2:SINE, Non-LTR, Retrotransposon
2:Class II, DNA Transposon
2/1:TIR, DNA Transposon
2/1/1:Tc1-Mariner, TIR, DNA Transposon
2/1/2:hAT, TIR, DNA Transposon
2/1/3:CMC, TIR, DNA Transposon
2/1/4:Sola, TIR, DNA Transposon
2/1/5:Zator, TIR, DNA Transposon
2/1/6:Novosib, TIR, DNA Transposon
2/2:Helitron, DNA Transposon
2/3:MITE, DNA Transposon
Am I right?
What about MULE elements?
############ (QUESTION2) ################
Also, I do not understand how I can get the following two results:
1.- With TIRvish I obtain a clear DNA transposon, with its TSD and TIR sequences:
seq1 TIRvish tsd 13153 13155 . + . transposon=6 ;description=Left TSD of transposon 6
seq1 TIRvish tir 13156 13995 . + . transposon=6 ;description=Left TIR of transposon 6
seq1 TIRvish tir 19926 20769 . + . transposon=6 ;description=Right TIR of transposon 6
seq1 TIRvish tsd 20770 20772 . + . transposon=6 ;description=Right TSD of transposon 6
2.- However, when it comes to annotating the transposon, I get the following:
seq1 reasonaTE transposon 13153 20772 . + . transposon=6;class=1/1/2(Gypsy,LTR,Retrotransposon)
I do not understand how, after inferring the TIR and TSD sequences (meaning a "clear" DNA transposon), the software can determine that the transposon corresponds to an LTR retrotransposon.
Why is that?
After playing around a bit with BLAST, I suspect that this element is a MULE.
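Cases like this one can be listed systematically by comparing the structural features with the final class labels. Below is a minimal sketch in Python, assuming the GFF3 columns are separated by tabs or spaces as in the lines quoted above, and assuming a TIR feature and a final annotation belong together when the TIR lies inside the annotated interval on the same sequence. The helper names `parse_gff_line` and `find_class_conflicts` are hypothetical and not part of reasonaTE.

```python
def parse_gff_line(line):
    """Split one GFF3-style line into its nine columns; attributes become a dict."""
    cols = line.strip().split(None, 8)  # the 9th column may itself contain spaces
    if len(cols) != 9:
        raise ValueError(f"expected 9 columns, got {len(cols)}: {line!r}")
    attrs = {}
    for item in cols[8].split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key.strip()] = value.strip()
    return {
        "seqid": cols[0],
        "type": cols[2],
        "start": int(cols[3]),
        "end": int(cols[4]),
        "attrs": attrs,
    }


def find_class_conflicts(structural_lines, annotation_lines):
    """Return annotations labelled Class I (top-level code 1) that contain a TIR feature."""
    tirs = [r for r in map(parse_gff_line, structural_lines) if r["type"] == "tir"]
    conflicts = []
    for rec in map(parse_gff_line, annotation_lines):
        label = rec["attrs"].get("class", "")
        # Top-level code: text before the first "/" or "(" (e.g. "1" in "1/1/2(Gypsy,...)").
        top_level = label.split("/")[0].split("(")[0]
        has_tir = any(
            t["seqid"] == rec["seqid"]
            and rec["start"] <= t["start"]
            and t["end"] <= rec["end"]
            for t in tirs
        )
        if has_tir and top_level == "1":
            conflicts.append((rec["seqid"], rec["start"], rec["end"], label))
    return conflicts


# The lines quoted in this question, used as example input.
STRUCTURAL = [
    "seq1 TIRvish tsd 13153 13155 . + . transposon=6 ;description=Left TSD of transposon 6",
    "seq1 TIRvish tir 13156 13995 . + . transposon=6 ;description=Left TIR of transposon 6",
    "seq1 TIRvish tir 19926 20769 . + . transposon=6 ;description=Right TIR of transposon 6",
    "seq1 TIRvish tsd 20770 20772 . + . transposon=6 ;description=Right TSD of transposon 6",
]
ANNOTATIONS = [
    "seq1 reasonaTE transposon 13153 20772 . + . transposon=6;class=1/1/2(Gypsy,LTR,Retrotransposon)",
]

for seqid, start, end, label in find_class_conflicts(STRUCTURAL, ANNOTATIONS):
    print(f"{seqid}:{start}-{end} has TIRs but is classified as {label}")
```

The resulting list only points out candidates for manual inspection (for example with BLAST); it does not decide which of the two calls is correct.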
############ (QUESTION3) ################
When I run the whole transposon_annotation_reasonaTE pipeline, the output in the RepeatMasker folder shows that only 1.4% of the C. elegans genome is masked. However, when I run RepeatMasker independently with the default settings, around 18.93% of the genome is masked. Why this difference? I used the default RepeatMasker commands you suggest:
reasonaTE -mode annotate -projectFolder workspace_${genome} -projectName testProject_${genome} -tool repeatmodel
reasonaTE -mode annotate -projectFolder workspace_${genome} -projectName testProject_${genome} -tool repMasker
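To make sure the two percentages are measured the same way, the masked fraction can be recomputed directly from each masked FASTA file. This is a minimal sketch, assuming masked bases appear as `N`/`X` (hard masking) or as lowercase letters (soft masking); the function name `masked_fraction` and the file name in the usage note are hypothetical. Note that `N` bases already present in the unmasked assembly would also be counted.

```python
def masked_fraction(fasta_lines):
    """Return the fraction of sequence characters that are N, X, or lowercase."""
    total = 0
    masked = 0
    for line in fasta_lines:
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and FASTA headers
        total += len(line)
        masked += sum(1 for base in line if base.islower() or base in "NX")
    return masked / total if total else 0.0
```

Usage, with a placeholder path: `with open("genome.fa.masked") as handle: print(f"{100 * masked_fraction(handle):.2f}% masked")`. Running it on both outputs gives directly comparable numbers, but it does not by itself explain where a difference comes from.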
Thank you in advance,
Isra