Releases: GregoryFaust/samblaster

v.0.1.26

04 Jun 16:29
b642639

This release addresses the following issues:

  1. Fix the bug that failed to output discordant reads to the discordantFile when the --discordantFile (-d) option is used in conjunction with the --acceptDupMarks (-a) option. This closes issue #46.

  2. Add spaces around every occurrence of 'PRIu64' format specification to be compliant with c++11 string constant standards. This incorporates the changes to samblaster.cpp from pull request #44. Also add string literal warning to Makefile.

v.0.1.25

17 Mar 01:24
89b1ac1

This release addresses the following issues:

  1. Change the behavior of --addMateTags to first ensure that the tag does not already exist before adding to a line. As a result, we now have no duplicate tags added. Closes issue #41.

  2. Improve handling of orphan/singleton reads.

    1. Improve orphan/singleton duplicate identification by allowing forward and reverse strand orphans/singletons to be duplicates of each other. This identifies 10% to 100% more orphan/singleton reads as duplicates, but still not as many as Picard MarkDuplicates. See SAMBLASTER_Supplemental.pdf for our initial discussion of this issue. Common samblaster usage is with Illumina paired-end reads aligned with BWA MEM. Orphans are ~0.5% of all pairs in such data, resulting in a small overall effect. The main impetus for this change is to increase the suitability of samblaster for finding duplicates in singleton long-read data where the entire dataset is subject to this change. In a test of long reads aligned with NGMLR this release found ~80% more duplicates than the previous algorithm.
    2. Add a --maxReadLength parameter to fix the issue addressed in release 0.1.24 in a more general way, especially for proper handling of long singleton reads, but also anticipating longer reads from Illumina.
    3. Keep track of the count and maximum length of reads that are longer than --maxReadLength. Report these as warnings at the end of the run, unless a read causes the calculated reference offset to fall outside the reference genome. In that case, a fatal error occurs (this previously caused a segfault).
    4. The above changes close issues #40 and #43, and provide a more permanent resolution to issue #26.
    5. Add example usage scenarios in both the program help and README.md for using samblaster to mark duplicates and/or pull splitters from files containing long singleton reads. Responds to issue #42.
  3. Additional changes to error handling and run stats reporting.

    1. Add check to ensure that every read has a reference sequence that is listed in the SAM header.
    2. Add better input parameter error checking and error messages.
    3. Output partial run stats if a fatal error occurs after samblaster starts processing SAM lines.
    4. Keep track of the number of ids with no primary alignment, and of pairs in which both reads are unmapped, one read is mapped, or both reads are mapped. Add a table of run statistics, output to stderr, that includes these counts and their percentages of all reads, as well as the counts and percentages of duplicates that occur in each appropriate category. This is a partial response to issue #16.
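
The --addMateTags change in item 1 above amounts to checking a line's optional fields before appending a tag. A minimal Python sketch of that idea follows. samblaster itself is written in C++, and the helper names `has_tag` and `add_tag_if_absent` are hypothetical, not taken from its source.

```python
def has_tag(sam_line: str, tag: str) -> bool:
    """Return True if an optional field named `tag` (e.g. "MC") is present."""
    fields = sam_line.rstrip("\n").split("\t")
    # A SAM line has 11 mandatory columns; optional fields follow
    # and have the form TAG:TYPE:VALUE.
    return any(f.startswith(tag + ":") for f in fields[11:])


def add_tag_if_absent(sam_line: str, tag: str, sam_type: str, value) -> str:
    """Append TAG:TYPE:VALUE unless the line already carries that tag."""
    line = sam_line.rstrip("\n")
    if has_tag(line, tag):
        return line
    return f"{line}\t{tag}:{sam_type}:{value}"


# Illustrative line that already has an MC tag but no MQ tag.
line = "r1\t99\tchr1\t100\t60\t50M\t=\t300\t250\t*\t*\tMC:Z:50M"
print(add_tag_if_absent(line, "MC", "Z", "50M"))  # line is returned unchanged
print(add_tag_if_absent(line, "MQ", "i", 60))     # line gains MQ:i:60
```

Checking only the columns after the eleventh avoids false matches against read names or sequence data that happen to contain the tag text.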

v.0.1.24

28 Nov 22:27

This release fixes a sometimes-fatal bug caused by underflow/overflow of reference coordinates when clipped reads align near the beginning/end of a contig. In addition, especially in the overflow case, there was a small chance that samblaster would falsely identify a read pair mapping to the end of one contig as a duplicate of a read pair at the beginning of the next contig.

Fixes issue #26.
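
To illustrate how clipping can push a coordinate past the start of a contig, here is a small Python sketch. It assumes the duplicate signature is based on where the read would start if its clipped bases were aligned; the function name `unclipped_start` is hypothetical, and this is not samblaster's actual code.

```python
import re


def unclipped_start(pos: int, cigar: str) -> int:
    """1-based POS minus any leading clip ops (S or H) in the CIGAR string."""
    # Capture the run of clip ops at the very start of the CIGAR, if any.
    leading = re.match(r"^((?:\d+[SH])*)", cigar).group(1)
    clipped = sum(int(n) for n in re.findall(r"(\d+)[SH]", leading))
    return pos - clipped


print(unclipped_start(100, "20S80M"))  # 80
print(unclipped_start(5, "20S80M"))    # -15
```

A negative result like the second one cannot be stored in an unsigned integer or used as an array offset without wrapping or going out of bounds, which is the kind of underflow this release guards against.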

v.0.1.23

22 Sep 23:08

THIS RELEASE IS DEPRECATED. PLEASE USE RELEASE 0.1.24 INSTEAD.

This release addresses the following issues:

  • Change a data structure to handle both longer and a larger number of sequences in the input SAM file. Thanks to carsonhh for the idea and code. Closes issue #21.
  • Change the behavior of --addMateTags to not add tags to any alignment with an unmapped mate. Thanks to ernfrid for the code. Closes pull request #24.
  • Add genomic location information in the error message for missing mate. Closes issue #22.
  • Explicitly check if the input file is in BAM format, and if so exit. This avoids unpredictable results including potential segfault.
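
One way to detect BAM input is to look for the gzip magic bytes, since BAM files are BGZF (gzip-compatible) compressed while SAM is plain text. The Python sketch below assumes that approach; the release note does not say how samblaster performs its check, and `looks_like_bam` is a hypothetical name.

```python
import gzip
import io

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip/BGZF stream


def looks_like_bam(stream: io.BufferedReader) -> bool:
    """Peek at the first two bytes without consuming them."""
    return stream.peek(2)[:2] == GZIP_MAGIC


# Illustrative inputs: a SAM header line, and gzip-compressed bytes
# standing in for a real BAM file.
sam = io.BufferedReader(io.BytesIO(b"@HD\tVN:1.6\tSO:queryname\n"))
bam = io.BufferedReader(io.BytesIO(gzip.compress(b"BAM\x01")))
print(looks_like_bam(sam))  # False
print(looks_like_bam(bam))  # True
```

Because `peek` does not advance the stream, the same check can run on piped input (for example `sys.stdin.buffer`) before normal line-by-line parsing begins.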

v.0.1.22

18 Jun 20:52

This release addresses the following issues:

  • Add -M option. Use of -M is backward compatible with the samblaster behavior since release 0.1.15 in which alignments marked with both FLAG values 0x100 and 0x800 were treated as supplemental alignments for the purposes of identifying split-reads. The new default behavior is compliant with the latest SAM specification and treats only reads flagged with 0x800 as supplemental. The -M option can (and should) be used with older alignment files created when chimeric reads were marked 0x100, or ones produced by recent versions of bwa mem using its -M option. See README.md for details.
  • Bug fixes for buffer overruns that sometimes occurred while adding the duplicate flag or mate tag information to output lines.
  • The 'N' CIGAR op is now supported, as well as multiple clip ops ('S' and/or 'H') at the beginning or end of a CIGAR string.
  • An @PG line is now placed in the header of all output SAM files.
  • SAM headers with up to 32,000 contigs may now be supported if the system being used can allocate sufficiently large arrays.
  • Add --ignoreUnmated option. This option is not recommended for general use. It disables checks in samblaster that detect incorrectly sorted or malformed input files. However, it can be useful in cases in which some alignments have been filtered out of an otherwise well formed and read-id (QNAME) grouped input file.
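
The -M distinction described above comes down to which FLAG bits mark an alignment as supplemental. The Python sketch below is one reading of that description: by default only 0x800 counts, while in -M compatibility mode either 0x100 or 0x800 counts. The names `is_supplemental` and `compat_m` are hypothetical and not taken from the samblaster source.

```python
SECONDARY = 0x100      # "not primary alignment" bit
SUPPLEMENTARY = 0x800  # "supplementary alignment" bit


def is_supplemental(flag: int, compat_m: bool = False) -> bool:
    """Decide whether an alignment counts as supplemental for split-read detection."""
    if compat_m:
        # Compatibility mode: the older 0x100 marking is accepted as well.
        return bool(flag & (SECONDARY | SUPPLEMENTARY))
    return bool(flag & SUPPLEMENTARY)


print(is_supplemental(2113))                 # True  (0x800 set)
print(is_supplemental(321))                  # False (only 0x100 set)
print(is_supplemental(321, compat_m=True))   # True
```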

v.0.1.21

23 Dec 19:43

This release addresses the following issues:

  • Fixed bug that resulted in intermittent incorrect detection of a few split-read mappings when using --splitterFile option in conjunction with --acceptDupMarks option.
  • samblaster is now supported on OSX. Thanks to Aakrosh Ratan and Saket Choudhary for their code mods.
  • Updated SAMBLASTER_supplemental.pdf to include final samblaster citation, and fixed the recently broken link to the file in README.

v.0.1.20

06 Sep 23:53

This release adds support for sequence match ('=') and mismatch ('X') in CIGAR strings when calculating the 5' reference coordinates of alignments for duplicate identification.
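
As a rough illustration of why '=' and 'X' matter here: for a reverse-strand read, the 5' end is at the right-hand side of the alignment, so every CIGAR op that consumes reference bases must be counted. The Python sketch below is an assumption-laden simplification (clip handling is omitted, and `reference_span` and `five_prime_pos` are hypothetical names), not samblaster's implementation.

```python
import re

# CIGAR ops that advance along the reference, per the SAM specification.
REF_CONSUMING = set("MDN=X")


def reference_span(cigar: str) -> int:
    """Number of reference bases covered by the alignment."""
    ops = re.findall(r"(\d+)([MIDNSHP=X])", cigar)
    return sum(int(n) for n, op in ops if op in REF_CONSUMING)


def five_prime_pos(pos: int, cigar: str, reverse: bool) -> int:
    """5' reference coordinate of the aligned portion (clips ignored here)."""
    return pos + reference_span(cigar) - 1 if reverse else pos


# "10=1X39=" and "50M" describe the same 50-base span in different notation.
print(five_prime_pos(100, "50M", reverse=True))       # 149
print(five_prime_pos(100, "10=1X39=", reverse=True))  # 149
```

If '=' and 'X' were not counted, the second call would return 99, and identical alignments written in the two notations would no longer share a coordinate.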

Thanks Brad.

v.0.1.19

15 Aug 00:41

This release adds support for the --addMateTags option. This option will cause samblaster to add MC (Mate CIGAR) and MQ (Mate Mapping Quality) tags to all SAM output lines that are associated with paired-end reads for which both reads appear in the input SAM file. The CIGAR and Mapping Quality of the primary alignment for each read are used. The MC and MQ tags are added regardless of whether both, one, or neither of the reads are mapped, and are also added to any secondary alignments associated with such a pair of reads. In addition to the main output SAM file, they are also added to the discordantFile and/or splitterFile output file(s) when specified.

These tags can be useful for downstream processing of the output SAM file(s), especially after they are position sorted, which can separate the alignments associated with paired-end reads by an indefinite distance. However, to my knowledge, no commonly used aligner includes the MC or MQ tags in its output. Therefore, it is easier for samblaster to add these tags while the file is read-id grouped than for downstream processing steps to try to access the alignment for each mate to get this information. In addition, as of now, when --addMateTags is specified, samblaster will add these tags as outlined above without first checking whether or not these tags are already present in the input file. This release is no slower than previous releases when --addMateTags is not specified. However, samblaster is a few to several percent slower when this option is used, and of course the output SAM files are also larger.
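
The effect of --addMateTags on a pair of primary alignments can be sketched in Python as follows. This is an illustration of the tag layout only: `add_mate_tags` is a hypothetical helper, secondary alignments and the discordant/splitter outputs are not handled, and samblaster's own C++ code is not shown here.

```python
def add_mate_tags(read1: str, read2: str):
    """Append each read's mate CIGAR (MC) and mate mapping quality (MQ)."""
    f1 = read1.rstrip("\n").split("\t")
    f2 = read2.rstrip("\n").split("\t")
    # Column 5 is MAPQ and column 6 is CIGAR (0-based indices 4 and 5).
    tagged1 = f1 + [f"MC:Z:{f2[5]}", f"MQ:i:{f2[4]}"]
    tagged2 = f2 + [f"MC:Z:{f1[5]}", f"MQ:i:{f1[4]}"]
    return "\t".join(tagged1), "\t".join(tagged2)


# Illustrative pair: read 1 is 50M with MAPQ 60, read 2 is 10S40M with MAPQ 37.
r1 = "q1\t99\tchr1\t100\t60\t50M\t=\t300\t250\t*\t*"
r2 = "q1\t147\tchr1\t300\t37\t10S40M\t=\t100\t-250\t*\t*"
out1, out2 = add_mate_tags(r1, r2)
print(out1)  # ends with MC:Z:10S40M and MQ:i:37
print(out2)  # ends with MC:Z:50M and MQ:i:60
```

Doing this while both mates are adjacent in the read-id grouped stream needs no lookup table, which is the convenience the paragraph above describes.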

Thanks to Colby Chiang for helping to test this release.

v.0.1.18

08 Aug 20:48

Fixed a bug for single-read (not paired-end) input files that sometimes resulted in junk lines in output SAM file(s) immediately following a marked duplicate.

v0.1.17

01 Aug 18:36

Fixed a bug that mishandled unmapped reads in single-read (not paired-end) input files.