
Releases: ExaScience/elprep

elPrep 5.1.2 release

09 Feb 12:50

Binaries were compiled for Linux using go1.17.

Bug fix:

  • Added metrics output for single-end input data.

Many thanks to Leonor Palmeira for reporting.

elPrep 5.1.1 release

04 Oct 09:13

Binaries were compiled for Linux using go1.17.

Bug fix:

  • Added missing source code file.

Many thanks to Matthias De Smet.

elPrep 5.1.0 release

01 Oct 13:05

Binaries can be downloaded here:
https://www.imec-int.com/en/expertise/lifesciences/genomics/dna-sequence-analysis-software

Update to the sfm command:

Added merging of headers when using the sfm command with a directory
containing multiple files as input.

The merging respects the requirements outlined in the SAM specification:

  • the order of the sequence dictionaries is preserved
  • read group identifiers are made unique if needed, and the optional RG tags
    of reads are updated accordingly
  • program line identifiers are made unique if needed, and the optional PG tags
    of reads are updated accordingly
  • comment lines are merged
  • the sort order is set to unknown

If elPrep cannot merge the headers while guaranteeing these constraints, it
reports an error.
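To illustrate the identifier renaming described above, here is a minimal Go sketch. It is not elPrep's implementation: the collision-resolution scheme (appending -1, -2, ... to a clashing ID) and the function name uniqueIDs are hypothetical. For each input file it returns a map from old to new identifiers, which could then be used to rewrite the RG tags of that file's reads; the same approach would apply to program line (PG) identifiers.

```go
package main

import (
	"fmt"
	"strconv"
)

// uniqueIDs assigns every identifier of every input file a globally unique ID.
// Hypothetical scheme: the first occurrence keeps its name; later clashes get
// "-1", "-2", ... appended until the name is unused.
// It returns one old-to-new remapping table per input file, which is what a
// merger would need to update the optional RG (or PG) tags of that file's reads.
func uniqueIDs(perFile [][]string) []map[string]string {
	used := make(map[string]bool)
	remaps := make([]map[string]string, len(perFile))
	for i, ids := range perFile {
		remaps[i] = make(map[string]string)
		for _, id := range ids {
			newID := id
			for n := 1; used[newID]; n++ {
				newID = id + "-" + strconv.Itoa(n)
			}
			used[newID] = true
			remaps[i][id] = newID
		}
	}
	return remaps
}

func main() {
	// Two input files that both declare a read group "rg1".
	remaps := uniqueIDs([][]string{{"rg1", "rg2"}, {"rg1", "rg3"}})
	fmt.Println(remaps[0]["rg1"]) // rg1
	fmt.Println(remaps[1]["rg1"]) // rg1-1
}
```

Keeping one table per input file matters because the same old identifier can map to different new identifiers depending on which file a read came from.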

elPrep 5.0.2 release

19 May 15:20

Binaries can be downloaded here:
https://www.imec-int.com/en/expertise/lifesciences/genomics/dna-sequence-analysis-software

Performance improvements:

  • Updated the explicit calls to the garbage collector, improving average runtime.

Small bug fixes and code cleanup:

  • Added an immediate error when calling sfm with --haplotype-caller but without a reference.
  • Updated calls to match the renaming in the latest release of the bitset library.

Included a reference to the paper in the README.

Many thanks to Leonor Palmeira for reporting bugs.

elPrep 5.0.1 Binaries

04 Feb 09:03

Binaries can be downloaded here:
https://www.imec-int.com/en/expertise/lifesciences/genomics/dna-sequence-analysis-software

Small bug fixes and code cleanup:

  • Fixed optional logging of the haplotype caller (using --activity-profile and --assembly-regions).
  • Fixed --log-path to create the full directory path if necessary.
  • Added an error message when using BQSR without read groups.

Many thanks to Leonor Palmeira and Jacques Dainat for reporting bugs.

elPrep v5.0.0 release

08 Dec 15:56

Binaries can be downloaded here:

https://www.imec-int.com/en/expertise/lifesciences/genomics/dna-sequence-analysis-software

The major new feature of elPrep 5 is the addition of variant calling, which means that elPrep can now run a full variant calling pipeline on its own, starting from an aligned BAM file and producing a VCF file. We follow the haplotype caller algorithm.
 
There are also a number of additional improvements and changes, some, but not all, of which are related to variant calling.

Functionality
- The new --haplotypecaller option for variant calling.

Tool changes
- The previous --bqsr-reference option has been renamed to --reference because it is also used by the haplotype caller.
- Different tools implement different semantics for the -L options used to filter reads by genomic region. The existing --remove-non-overlapping-reads option implements different semantics from the newly added --target-regions option, which is especially relevant for the haplotype caller. If you use --remove-non-overlapping-reads, reads outside the regions of the given BED file are removed, but variant calling is not restricted to those regions, which may lead to surprisingly large VCF files. If you want to restrict variant calling to those regions, use --target-regions instead. A peculiar corner case occurs when you use base quality score recalibration and --target-regions in the same pipeline: the reads outside the BED regions are then effectively removed as well (just a bit later in the pipeline than with --remove-non-overlapping-reads). There are other peculiar effects. For example, the --target-regions option does not restrict the variant caller exactly to the BED regions, but adds some padding around them, so reads outside these regions are effectively processed as well. We carefully covered all these corner cases to ensure that elPrep's results are consistent with these semantics.
- Comparing reads by coordinate order is now more fine-grained.
- We have removed the original filter command, which was already deprecated and existed only for compatibility with very old versions of elPrep. This should not matter for the majority of end users.
- We have dropped the undocumented --deterministic and --mark-duplicates-deterministic options. Marking duplicates is now always deterministic. The --deterministic option has been replaced with compile-time options, which are rarely of interest to end users.
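The difference between exact region filtering and padded target regions can be illustrated with a simple interval overlap test in Go. This is a sketch only: the Region type, the overlaps function, and the padding of 100 bases are hypothetical and do not reflect elPrep's code or its actual padding value. BED intervals are 0-based and half-open, so the sketch uses the same convention.

```go
package main

import "fmt"

// Region is a 0-based, half-open interval [Start, End) on one contig,
// matching the coordinate convention of BED files.
type Region struct {
	Contig     string
	Start, End int
}

// overlaps reports whether the half-open interval [start, end) on contig
// overlaps region r after extending r by pad bases on both sides.
// pad = 0 corresponds to an exact region filter.
func overlaps(r Region, contig string, start, end, pad int) bool {
	return contig == r.Contig && start < r.End+pad && end > r.Start-pad
}

func main() {
	target := Region{Contig: "chr1", Start: 100, End: 200}
	// A hypothetical read that lies entirely outside the BED region.
	readStart, readEnd := 210, 260

	fmt.Println(overlaps(target, "chr1", readStart, readEnd, 0))   // false: dropped by an exact filter
	fmt.Println(overlaps(target, "chr1", readStart, readEnd, 100)) // true: still processed with padding
}
```

The same read is discarded under exact filtering but kept once the region is padded, which is why padded target regions can cause reads outside the BED regions to be processed.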

File handling and formats
- We have improved VCF parsing and formatting to meet the haplotype caller's requirements. Among other things, the GT field is now explicitly supported.
- We no longer require bcftools for parsing or formatting .vcf.gz files, but now handle them entirely ourselves. As a downside, the BCF format is no longer supported in elPrep 5. If you need BCF support, consider converting between BCF and VCF separately, for example with bcftools.
- Whether input files are gzip-compressed (for example, BAM or .vcf.gz files) is no longer determined by file extensions, but by inspecting the actual contents of the input files. This makes elPrep more robust with respect to non-standard file extensions and, for example, now also allows BAM files to be passed as input through Unix pipes.
- We now support an --output-type parameter to select SAM or BAM format for output. This is useful, for example, if you want to send BAM files to Unix pipes.
- When parsing BED files, we no longer process comment, track, or browser lines, but simply ignore them.
 
API
- In most places in the elPrep source code, we have replaced Go-style error handling with error return values by exception-style error handling. This makes no difference to end users; the change is primarily meant to make developers' lives easier.
- We dropped support for concurrent access to ELFASTA files in favor of exclusively using memory-mapped access.
 
Performance
- Various other bug fixes and improvements.

elPrep Binaries

19 Aug 14:47

elPrep binaries compiled with go1.15.

Bug fixes:

  • Fixed optical duplicate parameter passing in sfm mode.
  • Ensured that reads that are already marked as duplicates are handled correctly by mark duplicates / mark optical duplicates.

Many thanks to Benoit Charloteaux and babicjovana.

elPrep Binaries

25 Jul 07:49

elPrep binaries compiled with go1.12.7.

Bug fixes:

  • Fixed reading of floating point numbers from BAM files.
  • Fixed sanity checks for some BQSR options.

Extension:

  • Added --max-cycle option to specify maximum cycle value during BQSR.

Many thanks to Hideyuki Tanaka, ChristopheH, and wcarre.

elPrep Binaries

23 May 13:31

elPrep binaries compiled with go1.12.5.

Small bug fixes:

  • Fixed the handling of /dev/stdin and /dev/stdout in sfm.
  • Fixed the detection of directory path names in sfm.

Extensions:

  • Added --tmp-path option for sfm for specifying the path where directories for temporary files are created.
  • Added merge-optical-duplicates-metrics command for merging intermediate metrics files created using the --mark-optical-duplicates-intermediate option.

Updated the documentation with options for writing your own split/filter/merge scripts. See
--mark-optical-duplicates-intermediate, --bqsr-tables-only, and --bqsr-apply.

Many thanks to Geert Vandeweyer, Amin Ardeshirdavani, and Tuan Li.

elPrep Binaries

30 Apr 09:00

elPrep binaries compiled with go1.12.4.

The FASTA parser now accepts empty lines in front of description lines. Existing ELFASTA files do not need to be recreated.

Many thanks to Amin Ardeshirdavani for reporting this issue.