
Run time issue #8

Open
Maveriyke opened this issue Mar 15, 2024 · 3 comments

Comments

@Maveriyke

run.py is taking far too long to execute. I have 200 samples; is this expected?
Also, how much RAM should be allocated to each thread? The program exceeds its memory allocation when I give each thread less than 2 GB.

@lianos

lianos commented May 12, 2024

As a point of reference, I'm trying to run the latest version of MntJULiP (run.py --version reports MntJULiP v2.0) on 64 samples (21 conditions) using 20 threads (all pegged at 100% CPU usage), and it's been cooking for almost 3 days now.

It has finished successfully on smaller subsets of the same data in shorter time, but I wanted to run the samples all at once to make downstream analysis of splice events (group_id's) easier ... let's see if/when this finishes ... 🤞

@edwwlui
Collaborator

edwwlui commented May 14, 2024

Thank you for using MntJULiP! To speed things up, consider the "--raw-counts-only" flag if you do not need estimated counts and PSI values. Additionally, the "--group-filter" flag can filter out groups in which all samples have counts below a threshold (e.g., 15).
Regarding memory usage, which depends on the size of the input data, I generally suggest a smaller batch size and fewer threads to reduce peak memory.
Let me know if you have any further questions or need assistance!
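A hedged sketch of how the suggestions above might be combined on the command line. Only "--raw-counts-only" and "--group-filter" are taken from the comment above; whether --group-filter accepts a numeric threshold, and the names of any thread or batch-size options, are assumptions, so check run.py --help for your version:

```shell
# Sketch only, not a confirmed MntJULiP invocation.
# --raw-counts-only: skip estimated counts and PSI values if not needed.
# --group-filter:    drop groups where all samples fall below a count
#                    threshold (numeric-argument form is an assumption).
python run.py \
  --raw-counts-only \
  --group-filter 15

# To lower peak memory, also reduce the batch size and thread count via
# whatever options your MntJULiP version exposes (see `python run.py --help`).
```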

@lianos

lianos commented May 14, 2024

Thanks for these detailed suggestions, @edwwlui !

I ended up killing the large run; it consisted of dose-response data from several compounds.

I broke the dataset into smaller ones, batched by compound ... roughly 4 doses, with 25 samples in each batch.

These runs finished within 45 minutes or so. I may go back to debug the larger run at some point, but this is good enough for me for now.


3 participants