Round-5 issue in small genome #118
Comments
Hm. Yeah, we ought to fix this. LTR_retriever used to always bundle and run its own copy of TRF, and the new version doesn't try different filenames in the same way RepeatModeler does. It looks like the latest version of LTR_retriever does support a command-line override instead, so we should be able to use that to point to the TRF program configured for RepeatModeler instead of whatever one is on
To be sure, every other genome has worked but this particular genome always fails? Is this a genome that is publicly available or that could be shared with us for troubleshooting purposes? |
Yes, it worked in many other genomes, and I have tried this particular genome several times and it always dies at the same step, that's why I am surprised. It is public, you can download it from here: Thanks for having a look! A. |
@AlexdeMendoza Thanks, I have been able to reproduce this issue! RepeatModeler uses a sampling approach with larger samples each round, never using the same sample twice. This genome's size or structure falls into a "sweet spot": enough un-sampled sequence remained after round 4 that RepeatModeler went on to round 5, but at the start of round 5 it turned out that too few sufficiently long sequences (>40 kbp) remained after all. This may be because most of the contigs are very small, but it is a bug: RepeatModeler ought to stop after round 4 here instead of failing. One workaround for this genome is to add some parameters to your command-line: |
Thanks for having a look! I will give it a shot. I was playing with the Another thing that I have noticed is that every time I try to do Also, it is quite unclear how to run the |
At this time the default round sizes are 40,000,000 bp (round 1, with RepeatScout), then 3,000,000, 9,000,000, 27,000,000, 81,000,000, and 243,000,000 bp (multiplying by 3 each time) for rounds 2-6, with the RECON program. So I picked the size corresponding to round 4. RepeatModeler does try to stop early if necessary, but for this particular file that wasn't detected correctly.
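To make the arithmetic concrete, here is a minimal Python sketch of the round sizes quoted above. It is not RepeatModeler's own code: the function names are hypothetical, and the cumulative total assumes that round 1 draws from the same no-reuse pool as the RECON rounds, which is not confirmed in this thread.

```python
# Illustrative sketch only; not taken from RepeatModeler's source.
ROUND1_BP = 40_000_000       # round 1 (RepeatScout), as quoted above
FIRST_RECON_BP = 3_000_000   # round 2, the first RECON round
GROWTH = 3                   # each RECON round is 3x the previous one


def recon_round_sizes(last_round=6):
    """Return {round number: sample size in bp} for RECON rounds 2..last_round."""
    return {r: FIRST_RECON_BP * GROWTH ** (r - 2) for r in range(2, last_round + 1)}


def cumulative_sample_bp(through_round):
    """Total bp sampled through a given round.

    Assumption: no sequence is reused, and round 1 counts against the same pool.
    """
    sizes = recon_round_sizes(max(through_round, 2))
    return ROUND1_BP + sum(bp for r, bp in sizes.items() if r <= through_round)


if __name__ == "__main__":
    for rnd, bp in recon_round_sizes().items():
        print(f"round {rnd}: {bp:,} bp (cumulative {cumulative_sample_bp(rnd):,} bp)")
```

Under those assumptions, rounds 1-4 together consume 79,000,000 bp, and round 5 asks for another 81,000,000 bp, which is why a small or highly fragmented genome can run out of usable sequence exactly at that point.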
I think I have seen this before, but I can't find the issue about it. In your case round 2 did not find any elements, but round 3 did, so the run could have been recovered. Round 2 is checked before round 3, and RepeatModeler is interpreting that as failure too quickly.
This one is actually pretty straightforward to run standalone to test that it works:
The current version of |
Thanks again for all the details. Regarding the |
Dear RepeatModeler developers,
I have been using RepeatModeler for a while now, and I can get it to run on many organisms / genomes. I have also updated to RepeatModeler 2. One thing I would recommend is specifying in your installation instructions that TRF has to be in your PATH under the name "trf", since this is how LTR_retriever will call it, no matter what the RepeatModeler2 configuration file specifies (my copy was trf409.linux64).
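For anyone hitting the same TRF problem, a quick way to see whether an executable named "trf" is visible on PATH is sketched below. The helper name `find_trf` is hypothetical, and the check says nothing about which TRF binary RepeatModeler itself is configured to use.

```python
import shutil


def find_trf(candidates=("trf",)):
    """Return the full path of the first candidate found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


if __name__ == "__main__":
    path = find_trf()
    if path:
        print(f"Found TRF at: {path}")
    else:
        print('No executable named "trf" on PATH; LTR_retriever may fail to call it.')
```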
In any case, I've come across this genome where the software always breaks at the same step:
The round-4 folder is filled with results, but round-5 only has two empty files: "sampleDB-5.fa" and "sampleDB-5.fa.entry_batch". I have run the pipeline on the same computer many times, for genomes of similar sizes (smaller, larger), and I have tried this several times on this genome, so I am not sure what is going on. I get a "consensi.fa" out of this process; however, I would like to see it finish and see how the new LTR module works on this genome. Any tips on what might be going wrong?
Thanks in advance for your help.
Alex