BQSRPipelineSpark different Execution Times #4479

Closed
Vzzarr opened this issue Mar 2, 2018 · 5 comments

Vzzarr commented Mar 2, 2018

I don't know whether this is an issue or I am doing something wrong, but I am reporting my experience in case it is useful to the GATK developers:

I ran a performance analysis of the BQSRPipelineSpark tool while GATK 4.0 was still in beta (at that time the tool was still launched with the gatk-launch command). Processing a whole-exome sequencing dataset of about 14 GB (after applying FastqToSam and BwaAndMarkDuplicatesPipelineSpark) took about 70 minutes.

I then ran the same tool in the same VM with the same input data, and it now takes 626.95 minutes, a considerable difference in execution time. Is this normal, or am I doing something wrong?

To confirm this, I re-ran the old version of the tool with gatk-launch, and it took 65 minutes.

Collaborator

droazen commented Mar 2, 2018

Possibly you are running into the Spark performance regression described in #4376. This was patched in the latest release (4.0.2.0) -- could you try running with that release and see if the issue is resolved?

droazen added the Spark label Mar 2, 2018

Author

Vzzarr commented Mar 2, 2018

@droazen great! I ran into this problem because I am using a self-deployed Docker Swarm, so I was using a GATK version from a few weeks ago. Sorry for opening an unnecessary issue.

For completeness, I used a VM with twice the resources and it took 36.92 minutes, as expected.
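
The timings reported in this thread can be put side by side with a few lines of Python. This is only a rough sketch: the numbers are the ones quoted in the comments (decimal commas read as decimal points), the dictionary keys are made-up labels rather than GATK terminology, and the last run used a larger VM, so its ratio is not a like-for-like comparison.

```python
# Wall-clock times in minutes, copied from the comments in this thread.
# The keys are illustrative labels only (an assumption, not GATK names).
RUNTIMES_MIN = {
    "beta_gatk_launch": 70.0,         # original beta run
    "beta_gatk_launch_rerun": 65.0,   # beta run repeated as a sanity check
    "slow_build": 626.95,             # build from a few weeks before the report
    "updated_build_larger_vm": 36.92, # re-cloned build, VM with twice the resources
}


def ratio(slower: str, faster: str) -> float:
    """Return how many times longer the `slower` run took than the `faster` run."""
    return RUNTIMES_MIN[slower] / RUNTIMES_MIN[faster]


if __name__ == "__main__":
    # Same VM, same input: slow build versus the beta run.
    print(f"slow build vs beta: {ratio('slow_build', 'beta_gatk_launch'):.2f}x")
    # Different VM sizes, so treat this one as indicative only.
    print(f"beta vs updated build on larger VM: "
          f"{ratio('beta_gatk_launch', 'updated_build_larger_vm'):.2f}x")
```

On these figures the slow build took roughly nine times as long as the beta run on the same VM.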

Vzzarr closed this as completed Mar 2, 2018

Collaborator

droazen commented Mar 2, 2018

@Vzzarr Was the 36.92-minute run with version 4.0.2.0?

Author

Vzzarr commented Mar 2, 2018

@droazen Yes, I simply re-cloned the GATK GitHub repository and ran ./gradlew bundle as usual. I used a VM with 16 cores and 110 GB of RAM.

Collaborator

droazen commented Mar 2, 2018

👍 great, glad to hear the fix worked for you!
