
Threads used affects results #450

Closed
keiranmraine opened this issue Feb 8, 2021 · 17 comments

Comments

@keiranmraine

keiranmraine commented Feb 8, 2021

Hi,

This may be a known issue, or specific to the version I'm running, but I couldn't find any existing issues about it. Docker image downloaded from Docker Hub - version 2.9.4.

I've been running some tests to evaluate the most efficient way to deploy GRIDSS on our system, varying the number of threads etc.

I'm finding that the calls are varying considerably depending on the number of threads used:

  • output-2 = 2 threads
  • output-4 = 4 threads
  • output-6 = 6 threads
  • output-8 = 8 threads
$ comm -23 <(zcat output-2/test.vcf.gz | cut -f 1-5 | sort) <(zcat output-4/test.vcf.gz | cut -f 1-5 | sort) | wc -l
46039
$ comm -23 <(zcat output-2/test.vcf.gz | cut -f 1-5 | sort) <(zcat output-6/test.vcf.gz | cut -f 1-5 | sort) | wc -l
720
$ comm -23 <(zcat output-2/test.vcf.gz | cut -f 1-5 | sort) <(zcat output-8/test.vcf.gz | cut -f 1-5 | sort) | wc -l
0

Along with the difference in calls, I noted the following:

  1. The VCFs aren't equally sorted. Extending the sort to REF/ALT ensures stable outputs when multiple events occur at the same position (see the sketch after this list).
  2. Although 2 threads vs 8 appears to match, there are differences in the genotype column.
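
For reference, a minimal sketch of the extended sort (the *.body.sorted file names are just illustrative):

$ zcat output-2/test.vcf.gz | grep -v '^#' | sort -k1,1 -k2,2n -k4,4 -k5,5 > output-2.body.sorted
$ zcat output-8/test.vcf.gz | grep -v '^#' | sort -k1,1 -k2,2n -k4,4 -k5,5 > output-8.body.sorted
$ diff output-2.body.sorted output-8.body.sorted | wc -l   # full-line diff, so the genotype-column differences from point 2 show up here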

Executed via:

export CPUS=8 # this value changed for each thread count
rm -rf output-${CPUS} output-${CPUS}-tmp
mkdir -p output-${CPUS} output-${CPUS}-tmp
gridss.sh \
 --reference $PWD/ref/genome.fa \
 --blacklist $PWD/ref/blacklist.bed \
 --threads $CPUS \
 --labels test \
 --assembly $PWD/output-${CPUS}/test.assembly.bam \
 --output $PWD/output-${CPUS}/test.vcf.gz \
 --workingdir output-${CPUS}-tmp \
 $PWD/inputs/INPUT.bam
@d-cameron
Member

d-cameron commented Feb 9, 2021 via email

@d-cameron
Member

Doing regression testing to see if this is a symptom of #363, or a more widespread issue.

@keiranmraine
Author

I am able to confirm that the first point of divergence is in the *.sv.bam files. All of the ASCII files generated during the preprocess step are identical (apart from paths/timestamps).

d-cameron pushed a commit that referenced this issue Feb 16, 2021
@d-cameron
Member

d-cameron commented Feb 16, 2021

Are you using a version of bwa that is not stable w.r.t. the number of threads used?

See https://www.biostars.org/p/90390/

GRIDSS calls bwa during preprocessing to identify split reads (e.g. bwa does not report all split reads & bowtie2 does not do split read alignment at all). Differences in the alignment results returned will have a downstream impact on the GRIDSS SV calls.
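
A quick way to check this outside GRIDSS is to align the same reads with different thread counts and compare the records (reads.fq and the bwa-t*.sam names below are just placeholders):

$ bwa mem -t 2 ref/genome.fa reads.fq | grep -v '^@' | sort > bwa-t2.sam
$ bwa mem -t 8 ref/genome.fa reads.fq | grep -v '^@' | sort > bwa-t8.sam
$ diff bwa-t2.sam bwa-t8.sam | wc -l   # non-zero output means the alignments depend on the thread count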

@keiranmraine
Author

We are using the image you have placed on dockerhub:

docker pull gridss/gridss:2.9.4

The input is the same file for each execution, the only variable being the number of threads passed to GRIDSS.

@keiranmraine
Author

I should mention we are moving to 2.10.2 (or greater); we are not fixed on the 2.9.x branch.

@d-cameron
Member

Just to clarify: are the .sv.bam differences in the content of the SAM records themselves, or just md5 differences due to the bwa program group (@PG) header differing because of different paths / thread counts?
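
e.g. something along these lines would separate the two (WORKDIR_2/WORKDIR_4 below stand in for the two working directories):

# the @PG header line records the exact bwa invocation, including paths and -t
$ samtools view -H WORKDIR_2/test.bam.gridss.working/test.bam.sv.bam | grep '^@PG'
# samtools view without -h omits the header, so these checksums compare the records only
$ samtools view WORKDIR_2/test.bam.gridss.working/test.bam.sv.bam | md5sum
$ samtools view WORKDIR_4/test.bam.gridss.working/test.bam.sv.bam | md5sum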

@d-cameron
Member

OK, the root cause is that bwa 0.7.17-r1188 MAPQ scores are not stable w.r.t. the number of threads used to run bwa.

@keiranmraine
Author

It is the content of the SAM records themselves (2cpu vs 4cpu):

$ comm --nocheck-order -23 <(samtools view stepwise-2-tmp/test.bam.gridss.working/test.bam.sv.bam) <(samtools view stepwise-4-tmp/test.bam.gridss.working/test.bam.sv.bam) | wc -l 
5712180

@d-cameron
Member

OK, the root cause is that bwa is not stable w.r.t. thread count due to its dynamic batch size. -K needs to be passed to bwa to force stability.
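
For example (the -K value below is illustrative, not necessarily what GRIDSS passes; reads.fq is a placeholder):

# -K fixes the number of input bases processed per batch, so batching no longer depends on -t
$ bwa mem -K 100000000 -t $CPUS ref/genome.fa reads.fq > aln.sam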

@d-cameron
Member

--externalaligner is required if you need stability, as I don't enforce deterministic call order when doing multithreaded processing against the bwa JNI API.

@keiranmraine
Author

Just checking I understand: the dev branch has a correction, but to use it we need to set the --externalaligner option.

Is there a timeline for a release/hotfix that includes the VCF ordering change and this fix, or is there the possibility of a Docker image being made available for testing? Thanks

@d-cameron
Member

The fix is on the dev branch, but you still need to use --externalaligner for it to actually take effect, since the in-process bwa calls can occur out of order as they don't go via a fastq file.

ETA for next release is end of next week.
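
For the command earlier in this thread, that would look something like the sketch below (--externalaligner is assumed here to be a plain switch; check gridss.sh --help on the release for the exact usage):

gridss.sh \
 --reference $PWD/ref/genome.fa \
 --blacklist $PWD/ref/blacklist.bed \
 --threads $CPUS \
 --externalaligner \
 --labels test \
 --assembly $PWD/output-${CPUS}/test.assembly.bam \
 --output $PWD/output-${CPUS}/test.vcf.gz \
 --workingdir output-${CPUS}-tmp \
 $PWD/inputs/INPUT.bam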

@d-cameron
Member

d-cameron commented Feb 22, 2021

The preprocess step is slower when using --externalaligner, so there will be a small performance hit if you require deterministic alignment behaviour.

@keiranmraine
Author

Can we get an updated ETA on a versioned release please?

@d-cameron
Member

I have a release candidate for v2.11.0 already prepared. Release notes have been written and I'm currently doing regression testing. It should be a few days if all goes as expected, or within a week if I pick up any regression bugs.
