IsoQuant WorkTime #97

Closed
laetitiarialland opened this issue Jul 3, 2023 · 6 comments
Labels
performance: Issues related to computational performance

Comments

@laetitiarialland

Hello,
I was wondering how long it takes to run IsoQuant on ONT long reads for a single sample (starting from an 8 GB BAM file). I've been trying to run this sample and it appears to work (no errors), but it is very slow: it has been running for 200 hours with 128 GB of RAM and has still not finished (current step: Processing chromosomes). Is this normal, or are you aware of any bugs when starting from a BAM file or when running only one sample?

Thank you,
Kind regards,

Laëtitia

@andrewprzh
Collaborator

Dear @laetitiarialland

It's hard to say in general. For example, I just found a log in which a 37 GB BAM file was processed in 3 hours with 16 threads. However, I have encountered cases in which a lot of reads map to the same region (sometimes the mitochondrial chromosome), which causes slow processing. I don't think there are any particular issues with running a single sample or using a BAM file. By the way, which aligner did you use?

Performance was significantly improved in the recent version, so make sure you are using the latest one. Disk I/O can also be an issue. Could you also check how many processes are actually running?

200 hours seems like quite a lot. Could you send me the log file as well?

Best
Andrey

@andrewprzh added the performance label on Jul 3, 2023
@laetitiarialland
Author

Hello, thanks for your answer.
It could be linked to the mitochondrial chromosome, because I have a lot of reads mapping to this region. I used minimap2 for alignment.
The version I have is 3.2.0; I will try upgrading to 3.3 to see if it works better.
Here is the log file for the run :
isoquant.log

Thanks
Kind regards
Laëtitia

@andrewprzh
Collaborator

Dear @laetitiarialland

It seems that chrM is not the issue in this case. There are several chromosomes that are being processed for too long. As I mentioned, the problem is typical for genes/regions with extreme amounts of reads. If you have a chance to send me a part of your BAM file, let's say chr9, I can take a look at it at some point.
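
For anyone who wants a coarse first check of where the reads are concentrated, here is a minimal Python sketch that parses the text output of `samtools idxstats` (tab-separated: contig name, length, mapped count, unmapped count) and reports each contig's share of mapped reads. The function names and the 20% threshold are hypothetical choices for illustration, and this only shows per-contig totals: a single high-coverage gene on a large chromosome would not stand out here.

```python
def read_share_per_contig(idxstats_text):
    """Return {contig: fraction of all mapped reads} from `samtools idxstats` text."""
    counts = {}
    for line in idxstats_text.splitlines():
        fields = line.split("\t")
        # Skip malformed lines and the final "*" row (reads with no contig).
        if len(fields) < 3 or fields[0] == "*":
            continue
        counts[fields[0]] = int(fields[2])  # third column: mapped read count
    total = sum(counts.values())
    return {name: (n / total if total else 0.0) for name, n in counts.items()}


def flag_heavy_contigs(shares, threshold=0.2):
    """List contigs holding at least `threshold` of mapped reads, heaviest first.

    The default threshold is an arbitrary illustrative value, not an IsoQuant limit.
    """
    heavy = [name for name, share in shares.items() if share >= threshold]
    return sorted(heavy, key=lambda name: -shares[name])
```

The input text could be obtained by saving the output of `samtools idxstats sample.bam` (the BAM must be indexed) and passing the file contents to `read_share_per_contig`.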

Unfortunately, 3.3 should not be any different in terms of performance.

Best
Andrey

@dawangran

Hello, I have 17 samples, and it takes three days to finish two of them, which is very time-consuming. Is there any way to speed this up?
isoquant.txt

@andrewprzh
Collaborator

@dawangran

Running time is not always proportional to the number of samples or the number of reads. The most problematic cases are loci with extremely high coverage.
For example, in your case the mitochondrial chromosome alone requires 3 days (the rest are processed within hours).

2024-02-12 19:33:57,258 - INFO - Finished processing chromosome 4
2024-02-15 09:27:41,671 - INFO - Finished processing chromosome MT
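
To spot this kind of gap in your own log, a minimal Python sketch follows. It assumes the log lines look exactly like the two shown above and measures the time between consecutive "Finished processing chromosome" messages; the function name is hypothetical. With several threads running, chromosomes overlap in time, so the gap is only a rough indicator of which chromosome held the run up, not an exact per-chromosome runtime.

```python
import re
from datetime import datetime

# Matches lines such as:
# 2024-02-12 19:33:57,258 - INFO - Finished processing chromosome 4
FINISHED = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) - INFO - "
    r"Finished processing chromosome (\S+)"
)


def finish_gaps(log_lines):
    """Return [(chromosome, hours since the previous 'Finished' line), ...]."""
    gaps = []
    previous = None
    for line in log_lines:
        match = FINISHED.match(line.strip())
        if not match:
            continue
        # %f reads the three millisecond digits after the comma.
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")
        if previous is not None:
            hours = (stamp - previous).total_seconds() / 3600
            gaps.append((match.group(2), hours))
        previous = stamp
    return gaps
```

Calling `finish_gaps(open("isoquant.log"))` and sorting the result by the second element puts the longest waits first.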

It is a known problem, which I am working on, but it is not easy to deal with without changing the functionality.
The only workarounds for now are either to remove the MT chromosome if you don't really need it, or to use the --no_model_construction option, which will disable novel transcript discovery.
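
One possible way to apply the first workaround is to write a new BAM containing every contig except the mitochondrial one. The Python sketch below only builds a `samtools view` command line; it does not run anything. The function name and the default exclusion names ("MT", "chrM") are assumptions, the contig list has to come from your own BAM header, and this approach needs an indexed BAM and drops unmapped reads.

```python
def samtools_exclude_command(in_bam, out_bam, contigs, exclude=("MT", "chrM")):
    """Build a `samtools view` argument list that keeps all contigs except `exclude`.

    `contigs` is the list of reference names in the BAM (for example, the first
    column of `samtools idxstats`). Passing regions to `samtools view` requires
    the input BAM to be coordinate-sorted and indexed.
    """
    keep = [name for name in contigs if name not in exclude and name != "*"]
    if not keep:
        raise ValueError("no contigs left after exclusion")
    # -b: write BAM output; -o: output file; then the input and the regions to keep.
    return ["samtools", "view", "-b", "-o", out_bam, in_bam, *keep]
```

The returned list can be passed to `subprocess.run(cmd, check=True)`, and the filtered BAM then needs to be indexed again before it is given to IsoQuant.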

By the way, if you could share a BAM containing the reads mapped to MT, that would also help.

Best
Andrey

@andrewprzh
Collaborator

I have finally released version 3.4, which has significantly better performance, especially for such cases.
On my test dataset, the new release runs in 3-4 hours instead of 3 days.

Best
Andrey
