
Add to README: MarkDuplicatesSpark syntax for specifying the number of cores on a local machine #6324

Closed
bhanugandham opened this issue Dec 16, 2019 · 0 comments · Fixed by #6682
bhanugandham commented Dec 16, 2019

User Report:

Hi,

I'm trying to run gatk MarkDuplicatesSpark (v4.1.4.1) locally, i.e. not on a Spark cluster, and provided the option --conf 'spark.executor.cores=4' to tell MarkDuplicatesSpark to use only 4 cores on the machine. However, when I check the system load with e.g. top, I see that all 44 cores of the system are used by MarkDuplicatesSpark. What am I doing wrong?

command:
gatk MarkDuplicatesSpark \
  --tmp-dir /local/scratch/tmp \
  -I Control_aligned.bam \
  -O Control_aligned_sort_mkdp.bam \
  -M Control_aligned_sort_mkdp.txt \
  --create-output-bam-index true \
  --read-validation-stringency LENIENT \
  --conf 'spark.executor.cores=4'


The solution is to use the --spark-master argument instead: --spark-master local[2] means "run on the local machine using two cores". The spark.executor.cores property configures executors on a Spark cluster and is ignored when the tool runs in local mode, which is why it did not limit core usage here. A corrected invocation is sketched below. More details in this doc: https://software.broadinstitute.org/gatk/documentation/article?id=11245
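For reference, a minimal sketch of the corrected command from the report above, assuming four local cores are desired:

gatk MarkDuplicatesSpark \
  --tmp-dir /local/scratch/tmp \
  -I Control_aligned.bam \
  -O Control_aligned_sort_mkdp.bam \
  -M Control_aligned_sort_mkdp.txt \
  --create-output-bam-index true \
  --read-validation-stringency LENIENT \
  --spark-master 'local[4]'   # limit Spark to 4 cores in local mode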

This issue was generated from a user report on the GATK forums: https://gatkforums.broadinstitute.org/gatk/discussion/24671/markduplicatesspark-not-respecting-conf-spark-executor-cores-4-option/p1
