Alu Retrotransposon Matches Seem Artefactual #380
Comments
Those are the RepeatMasker annotations for the nominal
They do: in hg38, chr1:26810050-26810085 is a poly-A sequence which RepeatMasker annotates as part of a SINE element. That said, the above doesn't really help you that much as clearly annotating polyA sequence with

The overall pipeline is VCF -> (gridss.InsertedSequencesToFasta^) -> RepeatMasker -> (TODO annotate VCF)

The other issue I've had has been the interpretation of multiple hits. What would you expect the correct annotation to be for sequences that have multiple (either overlapping or non-overlapping) RepeatMasker matches?

^ New in the 2.10.0 dev branch.
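To make the multiple-hits question concrete, here is a minimal Python sketch of one possible policy: report the highest-scoring hit and also how much of the inserted sequence the hits cover in total. This is not how GRIDSS resolves multiple matches; the `Hit` record, its field names, and both helper functions are hypothetical and exist only for illustration.

```python
from typing import NamedTuple, Optional, Sequence


class Hit(NamedTuple):
    """Hypothetical record for one RepeatMasker match on an inserted sequence."""
    start: int         # 0-based start on the inserted sequence
    end: int           # exclusive end on the inserted sequence
    sw_score: int      # Smith-Waterman alignment score of the match
    repeat_class: str  # e.g. "SINE/Alu"


def best_hit(hits: Sequence[Hit]) -> Optional[Hit]:
    """Return the highest-scoring hit, or None if there are no hits."""
    return max(hits, key=lambda h: h.sw_score, default=None)


def covered_fraction(hits: Sequence[Hit], seq_len: int) -> float:
    """Fraction of the inserted sequence covered by the union of all hits."""
    covered = 0
    last_end = 0
    for h in sorted(hits, key=lambda h: h.start):
        # Count only the part of this hit not already covered by earlier hits.
        start = max(h.start, last_end)
        if h.end > start:
            covered += h.end - start
            last_end = h.end
    return covered / seq_len if seq_len else 0.0
```

Reporting the coverage alongside the best hit keeps overlapping and non-overlapping cases distinguishable: a single short poly-A match covering a small part of a long insertion looks very different from several matches that together span the whole sequence.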
Does that mean GRIDSS will require internet connectivity to make web service queries? That will make it harder for people using HPCs, which often don't allow internet connections on the compute nodes. For example:
fails to establish a connection to EBI FTP server
Is the functionality available yet in the development branch? How would I compile GRIDSS if I wanted to test a pre-release on an HPC? Or is the next release imminent, so that it would be simplest to just wait for it?
Just finished it off today.
No, but it does require a local RepeatMasker installation. I just grabbed the bioconda version. To test that your installation is working:
…maskerbed from gridss.sh Writing SW alignment score and inferred edit distance to INSRM
scripts/gridss_annotate_vcf_repeatmasker.sh will be included in the next release.
That is convenient, but the server I am using has neither RepeatMasker nor conda installed. I'll be busy with dependencies.
I am interested in LINE1, Alu and SVA retrotransposons and I was looking at Alu specifically by
INSRMRC=SINE/Alu
If I look at MELT, a popular tool in the retrotransposon research community, and look inside its references folder, I see a FASTA file named ALU.fa which contains the reference sequence of Alu elements.
However, GRIDSS is labelling runs of As or Ts as Alu, although they don't appear in the reference sequence.
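As a rough way to separate the suspect calls from plausible ones, the Python sketch below checks whether a record carries `INSRMRC=SINE/Alu` and whether its inserted sequence is almost entirely A or T. It assumes the INFO string and the inserted sequence have already been extracted from the VCF by other means; the function names, the returned labels, and the 0.9 threshold are all hypothetical choices for illustration, not part of GRIDSS.

```python
def info_value(info, key):
    """Return the value of `key` in a VCF INFO string, or None if absent."""
    for field in info.split(";"):
        name, _, value = field.partition("=")
        if name == key:
            return value
    return None


def is_homopolymer_at(seq, min_fraction=0.9):
    """True if at least `min_fraction` of `seq` is a single base, A or T.

    The 0.9 default is an arbitrary illustrative threshold.
    """
    seq = seq.upper()
    if not seq:
        return False
    return max(seq.count("A"), seq.count("T")) / len(seq) >= min_fraction


def classify_alu_call(info, inserted_seq):
    """Label a call as not Alu, a likely poly-A/T artefact, or an Alu candidate."""
    if info_value(info, "INSRMRC") != "SINE/Alu":
        return "not_alu"
    if is_homopolymer_at(inserted_seq):
        return "poly_at_suspect"
    return "alu_candidate"
```

For example, `classify_alu_call("INSRMRC=SINE/Alu", "A" * 36)` returns `"poly_at_suspect"`, which is the situation described above for the poly-A stretch.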