STAR speed slowing down during run #27

Closed
cyrilcros opened this issue Mar 9, 2015 · 4 comments

Comments

@cyrilcros

Hi, I am trying to run a STAR job, but it is taking an unusually long time. The mapping speed has dropped drastically over time (50M paired-end unstranded reads, mouse genome mm10, and it has been running for 3 days...). I am using 7 threads on a desktop computer with 32 GB of RAM running Xubuntu. Not really a dedicated server...
I see in the closed issues that there was a similar case, but there the problem came from an external memory leak. Here, I launch STAR from a function in a bash shell script, so there is no third-party software that could cause a leak.
I have been reading the manual for the previous version of STAR (2.3.0.1): it has a paragraph on the usage of the genomeLoad parameter which is not in the new manual. I was thinking of trying to load the genome into shared memory to see how it goes, after setting the kernel shmmax and shmall parameters.
I have pasted the log files below. The genome index was built successfully.
Thanks a lot for your help!
Cyril Cros

http://pastebin.com/dfDpt8wR <- Log file
http://pastebin.com/2LzzPCid <- Log progress file

EDIT1: I am running several jobs with genomeLoad=LoadAndKeep and an unsorted BAM output (to avoid reserving shared memory for the BAM sorting). The first job finished successfully in only 15 min, and all went well. I suggest adding the paragraph on the genomeLoad option to the new version of the manual.
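
For readers who want to try the same thing, here is a minimal sketch of how the two kernel parameters might be sized. It assumes the shared segment needs roughly the size of the genome index plus some headroom; the function name `shm_settings` and the 1 GiB headroom are illustrative choices, not values from the STAR manual. The function only prints the settings and changes nothing on the system.

```shell
#!/usr/bin/env bash
# Sketch: derive kernel.shmmax / kernel.shmall values for a STAR genome index.
# Assumption: the shared segment is about as large as the index, plus headroom.

# shm_settings INDEX_BYTES [HEADROOM_BYTES] [PAGE_BYTES]
# Prints two sysctl assignments; it does not modify any kernel setting.
shm_settings() {
    local index_bytes=$1
    local headroom_bytes=${2:-1073741824}        # 1 GiB of slack (arbitrary)
    local page_bytes=${3:-$(getconf PAGE_SIZE)}  # system page size by default
    local shmmax=$(( index_bytes + headroom_bytes ))
    # kernel.shmmax is in bytes; kernel.shmall is counted in pages, so round up.
    local shmall=$(( (shmmax + page_bytes - 1) / page_bytes ))
    echo "kernel.shmmax=${shmmax}"
    echo "kernel.shmall=${shmall}"
}

# Example: a 10 GB index with 4096-byte pages.
shm_settings 10000000000 1073741824 4096
```

Each printed line can then be applied as root with `sysctl -w`, for example `sudo sysctl -w kernel.shmmax=11073741824`.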

@alexdobin
Owner

Hi,

there is a brief description of the genomeLoad options in the last chapter of the manual (see below). I agree that a more detailed description could be useful.

Cheers
Alex

genomeLoad    NoSharedMemory
    string: mode of shared memory usage for the genome files
        LoadAndKeep ... load genome into shared and keep it in memory after run
        LoadAndRemove ... load genome into shared but remove it after run
        LoadAndExit ... load genome into shared memory and exit, keeping the genome in memory for future runs
        Remove ... do not map anything, just remove loaded genome from memory
        NoSharedMemory ... do not use shared memory, each job will have its own private copy of the genome
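
One way these modes could be combined over a batch of samples is sketched below: load the genome once with LoadAndExit, map every sample with LoadAndKeep, then free the memory with Remove. The helper names (`run`, `star_genome`, `star_map`), the index path, and the sample file names are hypothetical. With `DRY_RUN=1` (the default here) the script only prints the STAR commands instead of running them.

```shell
#!/usr/bin/env bash
# Sketch of a shared-memory session over several samples.
# Hypothetical: helper names, GENOME_DIR path, and the FASTQ naming scheme.

GENOME_DIR=${GENOME_DIR:-/path/to/star_index}
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# star_genome MODE: MODE is LoadAndExit or Remove (no reads are mapped).
star_genome() {
    run STAR --genomeDir "$GENOME_DIR" --genomeLoad "$1"
}

# star_map SAMPLE: map one paired-end sample against the loaded genome.
star_map() {
    local sample=$1
    run STAR --runThreadN 7 --genomeDir "$GENOME_DIR" --genomeLoad LoadAndKeep \
        --readFilesIn "${sample}_1.fastq" "${sample}_2.fastq" \
        --outSAMtype BAM Unsorted --outFileNamePrefix "${sample}."
}

star_genome LoadAndExit                 # load once
for s in sampleA sampleB; do            # reuse the loaded genome for each sample
    star_map "$s"
done
star_genome Remove                      # release the shared memory at the end
```

Keeping the Remove step explicit matters because a genome loaded with LoadAndKeep or LoadAndExit stays in shared memory after the jobs finish.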

@cyrilcros
Author

Thanks for the answer. Your alignment method is really lightning fast; it was quite a change from bowtie/samtools.
My only issue was that with the genomeLoad option enabled, you need to specify an amount of RAM for BAM sorting. I was not sure how much RAM was needed in total (for storing the genome plus the other operations, alignment and sorting), so I asked for unsorted output and sorted with samtools afterwards. That does not seem very efficient.
With 32 GB of RAM and a genome index of maybe 10 GB, would the sorting have been possible with the LoadAndKeep option? I had to process ~12 samples with the same genome, ~50M reads each.
Thanks for any clarification.
Cyril Cros


@alexdobin
Owner

Since 2.4.0k, the sorting algorithm has been re-worked and requires much less RAM. Try it with 10GB allocated for sorting: --genomeLoad LoadAndKeep --limitBAMsortRAM 10000000000
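
As a rough illustration of that suggestion, the sketch below checks a simple RAM budget before printing a sorted-BAM command. The 32 GB and 10 GB figures come from this thread; the `fits_in_ram` helper, the 4 GB reserve for alignment and the OS, the index path, and the file names are assumptions made for the example. The command is only echoed, not executed.

```shell
#!/usr/bin/env bash
# Sketch: check that genome + BAM sort buffer fit in RAM, then print the command.
# Hypothetical: fits_in_ram, the 4 GB reserve, the paths and file names.

# fits_in_ram TOTAL_BYTES GENOME_BYTES SORT_BYTES [RESERVE_BYTES]
# Succeeds when genome + sort buffer + reserve do not exceed total RAM.
fits_in_ram() {
    local total=$1 genome=$2 sort=$3 reserve=${4:-4000000000}
    [ $(( genome + sort + reserve )) -le "$total" ]
}

SORT_RAM=10000000000   # 10 GB for sorting, as suggested above

if fits_in_ram 32000000000 10000000000 "$SORT_RAM"; then
    # Printed only; remove the leading echo to actually run STAR.
    echo STAR --runThreadN 7 --genomeDir /path/to/star_index \
        --genomeLoad LoadAndKeep --limitBAMsortRAM "$SORT_RAM" \
        --readFilesIn sample_1.fastq sample_2.fastq \
        --outSAMtype BAM SortedByCoordinate --outFileNamePrefix sample.
else
    echo "Not enough RAM for the genome plus a ${SORT_RAM}-byte sort buffer" >&2
fi
```

Under these assumptions the budget is 10 GB + 10 GB + 4 GB = 24 GB, which fits in 32 GB.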

@cyrilcros
Author

Thanks a lot,
I will be doing some new runs shortly.
Regards,
Cyril Cros

On 4/21/2015 9:37 PM, alexdobin wrote:

Closed #27.
