Doing just variant calling /haplotypecaller? non-model organism no known-variant file for bqsr #49

Open
desmodus1984 opened this issue Jun 6, 2021 · 7 comments


desmodus1984 commented Jun 6, 2021

Hi,
I got interested in elprep5 because I have been trying GATK4, but it is taking more than five days, which is ridiculous.
I wanted to ask two questions. Since I have already tried GATK4, I have a sorted, duplicate-marked BAM file. Is there a way to perform just the variant calling with elprep?

Also, I have done the mapping and conversion to get a .bam file as input, and I am using the following job script:

./elprep sfm PA113corr.bam --mark-duplicates --mark-optical-duplicates PA113corr.metrics --sorting-order coordinate
--bqsr PA113corr.recal --haplotypecaller PA113corr.vcf.gz --reference Autosome.elfasta

and I still get an error:
elprep version 5.0.2 compiled with go1.16.4 - see http://github.com/exascience/elprep for more information.
2021/06/06 03:18:42 Filename(s) in command line missing.

Thanks;

@desmodus1984 desmodus1984 changed the title Having no know vcf for base quality score recalibration Doing just variant calling /haplotypecaller? Jun 6, 2021
@desmodus1984 desmodus1984 changed the title Doing just variant calling /haplotypecaller? Doing just variant calling /haplotypecaller? non-model organism no known-variant file for bqsr Jun 6, 2021
@caherzee
Contributor

caherzee commented Jun 7, 2021

Hi,

With regard to the error in the command, it seems you forgot to specify the name of the output file.

The structure of the sfm command is: elprep sfm input.bam output.bam ...
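As a sketch of that structure, the original command with an output file added as the second positional argument might look like the following. The output name PA113.output.bam is the one that appears in the later log in this thread; all other names and options are copied from the original command. The script only assembles and prints the command, so replace the final echo with "$@" to actually run it.

```shell
#!/bin/sh
# Input BAM from the original command; output BAM name as seen in the later log.
input_bam="PA113corr.bam"
output_bam="PA113.output.bam"

# Positional order: input first, output second, options afterwards.
set -- ./elprep sfm "$input_bam" "$output_bam" \
  --mark-duplicates \
  --mark-optical-duplicates PA113corr.metrics \
  --sorting-order coordinate \
  --bqsr PA113corr.recal \
  --haplotypecaller PA113corr.vcf.gz \
  --reference Autosome.elfasta

# Print the assembled command for review instead of executing it.
cmd_line="$*"
echo "$cmd_line"
```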

You can also use elprep just for haplotype calling (assuming the input BAM has already been sorted, duplicate marked, etc., because the algorithm relies on that). However, it is best to combine all steps of the pipeline in a single elprep command, because elprep internally merges and parallelizes the different steps of a pipeline, which gives better performance than calling the command separately for each pipeline step.
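For the variant-calling-only case, a minimal sketch could look like this, assuming the --haplotypecaller and --reference options are used exactly as in the original command and that the input is already coordinate-sorted and duplicate-marked. Both BAM file names and the VCF name are placeholders, and an output BAM is still given because the sfm command expects one.

```shell
#!/bin/sh
# Placeholder names: a BAM that was already sorted and duplicate-marked elsewhere,
# plus the output BAM that the sfm argument structure requires.
prepared_bam="sample.sorted.dedup.bam"
output_bam="sample.hc.output.bam"

# Only the haplotype calling step is requested; no sorting, duplicate marking, or BQSR.
set -- ./elprep sfm "$prepared_bam" "$output_bam" \
  --haplotypecaller sample.vcf.gz \
  --reference Autosome.elfasta

# Print the assembled command; replace the echo with "$@" to run it.
cmd_line="$*"
echo "$cmd_line"
```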

Thanks!

@desmodus1984
Author

desmodus1984 commented Jun 7, 2021 via email

@desmodus1984
Author

I ran the command as you suggested and it worked, but then I got an error:

elprep version 5.0.2 compiled with go1.16.4 - see http://github.com/exascience/elprep for more information.
2021/06/07 20:58:25 Created log file at /users/PHS0338/jpac1984/logs/elprep/elprep-2021-06-07-20-58-25-469790574-EDT.log
2021/06/07 20:58:25 Command line: [./elprep sfm PA113corr.bam PA113.output.bam --mark-duplicates --mark-optical-duplicates PA113corr.metrics --sorting-order coordinate --bqsr PA113cor$
2021/06/07 20:58:25 Executing command:
./elprep sfm PA113corr.bam PA113.output.bam --mark-duplicates --mark-optical-duplicates PA113corr.metrics --optical-duplicates-pixel-distance 100 --bqsr PA113corr.recal --reference A$
2021/06/07 20:58:25 Splitting...
2021/06/07 21:22:54 Filtering (phase 1)...

2021/06/07 21:27:25 signal: bus error (core dumped)

I gave the job 170 GB of RAM and 2 cores. My genome is 1.9 GB and the data is about 40X WGS.

@caherzee
Contributor

caherzee commented Jun 9, 2021 via email

@desmodus1984
Author

desmodus1984 commented Jun 9, 2021 via email

@caherzee
Contributor

Hi,

The error you first reported does not imply an out-of-memory error. Based on our previous experience, the amount of memory you describe for the job does seem sufficient.

The error seems to suggest a memory addressing error, e.g. because of accessing a corrupted data file, an OS issue, a bug, or something else entirely. It is hard to help figure out what the problem is without access to the detailed elprep log files or the data.
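One way to rule out the corrupted-input possibility is to check the BAM file before starting the long elprep run. The sketch below assumes samtools is installed (it is not mentioned in this thread) and uses its quickcheck subcommand, which only looks at the header and the end-of-file block, so passing it does not prove the file is fully intact. The helper name check_bam is made up for illustration.

```shell
#!/bin/sh
# check_bam FILE: hypothetical helper; returns 0 only if FILE passes samtools quickcheck.
check_bam() {
  bam="$1"
  if [ ! -f "$bam" ]; then
    echo "missing file: $bam" >&2
    return 1
  fi
  if ! command -v samtools >/dev/null 2>&1; then
    echo "samtools not found, cannot check $bam" >&2
    return 2
  fi
  # -v prints the name of any file that fails the check.
  samtools quickcheck -v "$bam"
}

# Usage: check the input BAM from the original command.
if check_bam PA113corr.bam; then
  echo "PA113corr.bam looks intact"
else
  echo "PA113corr.bam did not pass the check (or could not be checked)"
fi
```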

For your system, you may also want to look into the --tmp-path option (see the documentation). By default, the temporary data elprep creates is stored in the path from which the elprep binary is called. You may want to store it instead on local scratch, specific shared storage, etc.
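A sketch of passing --tmp-path could look like the following. The scratch location is an assumption (it falls back to /tmp when TMPDIR is not set), so substitute whatever local scratch or shared storage your cluster provides. The remaining arguments are the ones visible in the truncated log above, limited to the part that is readable there. The script only prints the command; replace the echo with "$@" to run it.

```shell
#!/bin/sh
# Assumed scratch location; many schedulers export TMPDIR for a job-local directory.
scratch_dir="${TMPDIR:-/tmp}/elprep-tmp"
mkdir -p "$scratch_dir"

# Same input/output and options as readable in the log, plus --tmp-path at the end.
set -- ./elprep sfm PA113corr.bam PA113.output.bam \
  --mark-duplicates \
  --mark-optical-duplicates PA113corr.metrics \
  --sorting-order coordinate \
  --bqsr PA113corr.recal \
  --tmp-path "$scratch_dir"

# Print the assembled command for review instead of executing it.
cmd_line="$*"
echo "$cmd_line"
```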

If you can send us the detailed log files when errors occur, we may be able to help better.

Thanks.

@desmodus1984
Author

desmodus1984 commented Jun 10, 2021 via email
