Error Producing LAVEnder Report #72
Comments
Hi Rachel, thank you for reporting the issue. Could you please send me the GP file causing this issue? Yeah, RNFtools was developed almost five years ago, when R wasn't very common and BioConda didn't exist yet. The original version even contained its own package manager, which compiled all used packages from source. This is why RNFtools uses GnuPlot. Unfortunately, GnuPlot is now a bit obsolete and seems to be slightly incompatible across versions; it sometimes fails very strangely. I am thinking about creating RNFtools 2, which would be more up to date with modern technologies and better suited for clusters and metagenomic simulations. I plan to incorporate all your comments. However, I was thinking about not including LAVEnder in the new version. This part of the framework was a bit of an overkill: it was developed before modern long-read technologies emerged. At that time, it seemed that the future was "paired-end" -> "many fragments", so LAVEnder aimed at something that never really came into practice. |
It's great that you archived all these files. In this case, it should be possible to run
and then to plot the graphs from the obtained ROC files using GP files from RNFtools examples. Sorry for this inconvenience. |
Thanks for the quick responses even though RNFtools isn't under super active development - I really appreciate it. Perhaps it would help to clarify what my goals are, so that you can point me to ways of using RNFtools that might be more efficient, if that's possible with its current implementation. I'm primarily interested in the "m" number in the evaluation reports - the number of reads which should be unmapped, but are mapped. I'm basically simulating from two references, aligning back to only one, and then want the stats on what was correctly/incorrectly aligned from both the reference that should and the one that shouldn't align. RNFtools seems to be a lovely wrapper to do this. From the documentation I understood LAVEnder to be the evaluator once reads have been simulated, named, and aligned. If you're saying LAVEnder is "overkill", is there a way to get the alignment stats (number of reads mapped, unmapped, etc.) without the week-long step of going from the es to et files? It seems to me that even just reading through the alignment files in Python to get these stats should be considerably faster than LAVEnder is, and I'm not sure what's going on under the hood as I haven't looked into the code base here. Perhaps I should just write something fairly simple myself to get these stats from the BAM alignments, but using RNFtools as a wrapper for simulation + evaluation, and having it produce such a nice report of the stats, seems so convenient that I'd love to use it if possible! |
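For readers with the same goal, here is a minimal sketch of the "read through the alignments in Python" idea mentioned above. It is not part of RNFtools and does not reproduce LAVEnder's categories. It assumes the alignments are available as SAM text lines (e.g. decoded from BAM beforehand) and that a hypothetical predicate `should_be_unmapped` can tell from the read name which reference a read was simulated from; the `decoy`/`target` names in the example are made up. Only the standard SAM FLAG bits are used (0x4 unmapped, 0x100 secondary, 0x800 supplementary).

```python
from collections import Counter


def count_categories(sam_lines, should_be_unmapped):
    """Count primary SAM records by expected vs. observed mapping status.

    sam_lines: iterable of SAM text lines (header lines are skipped).
    should_be_unmapped: hypothetical predicate on the read name (an
        assumption of this sketch, not an RNFtools function).
    """
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):              # SAM header line
            continue
        fields = line.rstrip("\n").split("\t")
        qname, flag = fields[0], int(fields[1])
        if flag & 0x100 or flag & 0x800:      # skip secondary / supplementary
            continue
        is_unmapped = bool(flag & 0x4)
        if should_be_unmapped(qname):
            key = "correctly_unmapped" if is_unmapped else "mapped_but_should_be_unmapped"
        else:
            key = "unmapped_but_should_be_mapped" if is_unmapped else "mapped"
        counts[key] += 1
    return counts


# Tiny made-up example: reads named "decoy_*" come from the reference
# that was NOT used for alignment.
sam = [
    "@HD\tVN:1.6",
    "decoy_1\t0\tchr1\t100\t60\t50M\t*\t0\t0\t*\t*",
    "decoy_2\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
    "target_1\t0\tchr1\t200\t60\t50M\t*\t0\t0\t*\t*",
    "target_1\t256\tchr1\t900\t0\t50M\t*\t0\t0\t*\t*",
]
counts = count_categories(sam, lambda name: name.startswith("decoy"))
print(dict(counts))
```

Note that this counts individual SAM records, so for paired-end data each mate is counted separately, and whether a mapped read landed at the correct position is not checked at all.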
Thanks for mentioning this. Even though RNFtools is not under active development now, I would love to modernize it. However, I would need to find a grad student who would be willing to do it as a side project. The motivation could be that he/she would get a first-author paper (probably in Bioinformatics). I know which parts of the method/tool should be improved, but I don't have time for that. If you by any chance know of anyone, please let me know.
That's correct.
The overkill was designing LAVEnder in such a way that it supports an unlimited number of segments. Then there are many possible combinations of correctly/incorrectly mapped/unmapped segments and the program has to consider all of them, and at the same time evaluate the criteria for all possible mapping qualities.
The critical part of the code is here: rnftools/rnftools/lavender/Bam.py Line 371 in 2551079
and especially the following function: rnftools/rnftools/lavender/Bam.py Line 260 in 2551079
It can definitely be optimized. During development, I prioritized correctness and generality over performance. Then I switched to other projects and never got around to optimizing it.
What sequencing technology do you simulate? If it's Illumina, you could write a simple Python script taking ES files and producing ROC files directly (without the intermediate step). |
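As a rough illustration of the "produce ROC data directly" suggestion, the sketch below accumulates per-category counts over decreasing mapping-quality thresholds. It deliberately does not parse ES files or write RNFtools' ROC format, since neither layout is described in this thread: it assumes each read has already been reduced to a `(mapq, category)` pair, and the category labels used here are hypothetical.

```python
from collections import Counter, defaultdict


def roc_rows(records):
    """Cumulative category counts per mapping-quality threshold.

    records: iterable of (mapq, category) pairs; how these are obtained
        (e.g. from ES files) is outside this sketch.
    Returns a list of (threshold, counts) with thresholds in descending
    order, where counts covers all records with mapq >= threshold.
    """
    per_mapq = defaultdict(Counter)
    for mapq, category in records:
        per_mapq[mapq][category] += 1

    rows = []
    running = Counter()
    for q in sorted(per_mapq, reverse=True):   # highest quality first
        running.update(per_mapq[q])
        rows.append((q, dict(running)))
    return rows


# Made-up input with hypothetical category labels.
records = [
    (60, "correct"),
    (60, "correct"),
    (30, "wrong_position"),
    (0, "should_be_unmapped"),
]
rows = roc_rows(records)
for threshold, row_counts in rows:
    print(threshold, row_counts)
```

A single pass over the reads plus one pass over the distinct mapping qualities is enough here, which is the main reason a single-purpose script can be much faster than a fully general evaluator.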
I am indeed using Illumina (paired end) simulated data, so writing a script to just produce the ROC files directly is probably the way to go - I'll go take a look at the pieces of the code you pointed to. Thanks. |
In that case I would propose the following:
The ES format should be easy to parse, see the following description: rnftools/rnftools/lavender/Bam.py Line 140 in 2551079
|
I am getting an error towards the end of the LAVEnder evaluation (just producing a simple report, via a Snakefile exactly as specified in the tutorial/examples). It does produce the individual HTML reports for the two aligners I used, but it does not produce any combined report, which I think is supposed to be output as well. Additionally, it took about a week to run, almost entirely in the step going from es to et files. I assume this is a bug of some sort, since evaluation should be relatively quick once the simulation and alignments have already been run.
It gives the following error and does not produce any "combined" files:
For reference, here is the Snakefile run, though as mentioned it is basically exactly as specified in the tutorial:
Any help would be appreciated in figuring out this error, as well as in generating the missing report file(s) without having to re-run the full evaluation, since it took over a week. I did have it save intermediate files, so I have the es and et files which were generated.