Releases: PapenfussLab/gridss

2.10.0

14 Oct 06:51

This version includes VIRUSBreakend: Viral Integration Recognition Using Single Breakends

VIRUSBreakend is a high-speed viral integration detection tool designed to be incorporated into whole genome sequencing pipelines with minimal additional cost.

This version includes gridsstools: an optimised C implementation of the performance-critical steps used in VIRUSBreakend. A precompiled binary is included in the release package. If the precompiled binary does not run on your system, source code for building is available in src/main/c/gridsstools.

This version includes official support for performing targeted GRIDSS calling. Use gridss_extract_overlapping_fragments.sh on a BED or VCF file to make GRIDSS calls based on reads/read pairs with an alignment overlapping the region of interest.
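
To make the overlap criterion concrete, here is a minimal Python sketch of selecting fragments that have an alignment overlapping a region of interest. It is not how gridss_extract_overlapping_fragments.sh is implemented; the function names, the tuple layouts and the 0-based half-open coordinates (as used by BED) are assumptions made for illustration.

```python
from typing import Iterable, List, Set, Tuple

# (contig, start, end) with 0-based half-open coordinates, as in BED.
Interval = Tuple[str, int, int]
# (read name, contig, start, end) for one alignment record (hypothetical layout).
Alignment = Tuple[str, str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same contig share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def fragments_of_interest(alignments: Iterable[Alignment],
                          regions: List[Interval]) -> Set[str]:
    """Names of reads with at least one alignment overlapping any region."""
    names = set()
    for name, contig, start, end in alignments:
        if any(overlaps((contig, start, end), region) for region in regions):
            names.add(name)
    return names


def extract(alignments: List[Alignment], regions: List[Interval]) -> List[Alignment]:
    """Keep every alignment of a selected fragment, including those outside the regions."""
    keep = fragments_of_interest(alignments, regions)
    return [a for a in alignments if a[0] in keep]
```

The selection works on fragment names rather than individual records, so a mate or supplementary alignment that maps elsewhere is retained along with the record that overlaps the region.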

The following tools and entry points have been added in this version:

  • virusbreakend.sh: driver script for VIRUSBreakend
  • virusbreakend-build.sh: script for downloading and building VIRUSBreakend database
  • gridss_extract_overlapping_fragments.sh: subsets a BAM based on regions of interest defined in a BED or VCF file
    • Use this to extract reads of interest and metrics, then run GRIDSS on the extracted BAM.
  • gridss_annotate_vcf_repeatmasker.sh: annotates single breakends and breakpoint inserted sequences with RepeatMasker annotations. Requires RepeatMasker to be installed.
  • gridss_annotate_vcf_kraken2.sh: annotates single breakends and breakpoint inserted sequences with Kraken2 taxonomic identifiers. Requires kraken2 to be installed.
  • gridsstools unmappedSequencesToFastq: Exports unmapped sequences to fastq. This tool is soft clip and split read-aware.
  • gridsstools extractFragmentsToFastq: Extracts reads/read pairs from a list of read names to paired fastq files
  • gridsstools extractFragmentsToBam: Subsets a BAM based on a list of read names
    • This tool will be deprecated when samtools view has this capability. See samtools/samtools#1324 for progress

The following entry points have been added to the GRIDSS jar:

  • gridss.InsertedSequencesToFasta: exports single breakend and breakpoint inserted sequences to fasta
  • gridss.ExtractFragmentsToFastq
  • gridss.UnmappedSequencesToFastq
  • gridss.repeatmasker.AnnotateVariantsRepeatMasker
  • gridss.kraken.AnnotateVariantsKraken
  • gridss.kraken.ExtractBestSequencesBasedOnReport
  • gridss.kraken.SubsetToTaxonomy
  • gridss.VirusBreakendFilter

This release also includes the following:

  • Added scripts used to generate all figures in the GRIDSS2 preprint
  • #349 Fixed poor assembly performance edge case
  • #372 Default IO thread pool size now matches specified thread count
  • #372 Changed default memory usage to 30g, since the only instance type that won't like it is DNA Nexus azure:mem2_ssd1
  • #376 gridss_somatic_filter.R: added --configdir so path to gridss_config.R can be specified.
  • #380 #393 gridss.sh: removed --repeatmaskerbed and replaced with gridss_annotate_vcf_repeatmasker.sh utility
  • #385 don't write Q2 tag when using external aligner
  • #386 Fixed assembly telemetry crash
  • #389 Passing reference genome to metrics calculations
  • #390 filtering any linkages to variants that have been hard filtered
  • #392 recognising .tbi .csi .crai as index files when moving files around
  • #396 catching OOM and immediately terminating to prevent hangs
  • Passing through WORKER_THREADS to ComputeSamTags
  • Precomputed are used if available
  • Removed gridss.[Indexed]ExtractFullReads: removing entry points since they don't handle RP with supplementary alignments correctly.
    • Replaced by gridss_extract_overlapping_fragments.sh
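
As an illustration of the #392 item above, the following Python sketch moves a data file together with any companion index files. It is only a sketch under stated assumptions, not GRIDSS code: the helper name is hypothetical, and it assumes that an index sits next to its data file with the suffix appended to the full file name (for example x.vcf.gz.tbi).

```python
from pathlib import Path
from typing import List

# Index suffixes named in the release note.
INDEX_SUFFIXES = (".tbi", ".csi", ".crai")


def move_with_indexes(src: Path, dest: Path) -> List[Path]:
    """Move src to dest, taking along any index files found next to src.

    Returns the destination paths that were written, indexes first.
    """
    moved = []
    for suffix in INDEX_SUFFIXES:
        index = src.with_name(src.name + suffix)
        if index.exists():
            target = dest.with_name(dest.name + suffix)
            index.replace(target)  # rename, overwriting any stale index
            moved.append(target)
    src.replace(dest)
    moved.append(dest)
    return moved
```

Moving the indexes before the data file means a reader never sees the new data file paired with an index left over from an earlier run.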

2.9.4

15 Jul 11:40
  • #368 reduced GKL loading failure to warning message
  • #348 Fixed NPE in GeneratePonBedpe
  • #356 realignment records no longer being unclipped twice
  • #349 Fixed performance issue where an empty blacklist was not cached
  • #349 Fixed poor assembly performance edge case
  • #366 Updated dependencies to latest versions
    • CRAM input files should now be fully supported
  • Removed SoftClipsToSplitReads.REALIGN_ANCHORING_BASES parameter
    • No longer used since this approach had more edge cases than realigning the entire assembly contig
  • Cleaning up namedsorted bam
    • Extra unnecessary bam file no longer left in .gridss.working directory
  • Extended minimum realignment length to 20bp
    • improves libbwa stability
  • Added AnnotateInsertedSequence.MIN_SEQUENCE_LENGTH
    • improves libbwa stability
  • Fixed potential intermediate file corruption if the gridss.SoftClipToSplitReads process was killed during the preprocessing step
  • Upgraded defensive GC log message to INFO
  • Added single breakend assembly support bias filter
  • Not reporting variants entirely contained in assembly anchor
  • Fixed "Record should have been dropped" in SoftClipToSplitReads
  • Repository now includes all R scripts used to generate the GRIDSS2 paper

2.9.3

08 Jun 16:57
  • #348 Fixed NPE in GeneratePonBedpe
  • Cleaning up named sorted temporary bam file when no longer required
  • Added ASSEMBLY_BIAS single breakend assembly support bias filter
    • This is a more generalised version of the ASSEMBLY_ONLY/NO_ASSEMBLY filters
  • Added NO_SR and NO_RP filters to reduce single breakend FDR
  • Fixed "Record should have been dropped" in SoftClipToSplitReads when using external alignment
  • Only writing a single realignment record for anchoring bases
    • Fixes edge case where unphased variants are sometimes phased cis
  • Removed SoftClipsToSplitReads.REALIGN_ANCHORING_BASES parameter
    • This split breakend/anchoring sequence alignment approach ends up worse than realigning the entire read. If the initial assembly was over-aligned, it will remain so. Worse, it will result in a soft clip in the anchoring bases, and thus an inserted sequence, that should be aligned to the other side.
    • This is a reversion to pre-2.9.0 GRIDSS behaviour
  • Reduced lock contention when performing multi-threaded BAM reading
  • Not attempting realignment for sequences shorter than 20bp
    • Fixes issues with in-process bwa instability when aligning very short sequences
    • Added AnnotateInsertedSequence.MIN_SEQUENCE_LENGTH parameter with default of 20
    • SoftClipsToSplitReads.MIN_CLIP_LENGTH now defaults to 20
  • Added ability to dump the sequences sent for in-process realignment to a fastq file
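
The 20bp realignment cutoff described above amounts to a simple length filter applied before sequences are handed to the aligner. The Python sketch below illustrates the idea only; the function name and the (name, sequence) record layout are hypothetical, and the constant merely mirrors the default of 20 stated for AnnotateInsertedSequence.MIN_SEQUENCE_LENGTH.

```python
from typing import Iterable, Iterator, Tuple

# Default stated in the release notes.
MIN_SEQUENCE_LENGTH = 20


def sequences_to_realign(records: Iterable[Tuple[str, str]],
                         min_length: int = MIN_SEQUENCE_LENGTH
                         ) -> Iterator[Tuple[str, str]]:
    """Yield only the (name, sequence) pairs long enough to be realigned."""
    for name, sequence in records:
        if len(sequence) >= min_length:
            yield name, sequence
```

Sequences shorter than the cutoff are skipped rather than aligned, so the aligner is never invoked on very short inputs.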

2.9.2

20 May 01:43
  • #333 Fixed tumourordinal crash in gridss_somatic_filter.R
  • Reduced RepeatMaskerBEDFeature memory usage
    • Fixes Out of Memory exception in gridss.AnnotateInsertedSequence when a RepeatMasker BED file is specified
  • AnnotateInsertedSequence defaulting to in-process alignment
  • External process streaming aligner output buffer size is now bounded
    • Fixes Out of Memory exception in gridss.AnnotateInsertedSequence
  • #344 Fixed IHOMLEN bug where -ve breakends had revcomp insert sequences when comparing
    • Fixes inconsistent IHOMLEN when inserted sequence is present
  • #343 Fixed race condition in SinglePassSamProgram
  • #342 fixed crash when ref genome masking for assembly debug export
  • Reduced logging level of "found path with no support" assembly message
  • #340 Added packaging script to automate github release file set
  • Added version sanity check on Dockerfile

2.9.1

11 May 13:18
  • Reimplemented gridss_annotate_insertions_repeatmaster.R into gridss.InsertedSequenceAnnotator
    • Added --repeatmaskerbed command line option to gridss.sh to do RepeatMasker annotating of inserted sequences
    • gridss_annotate_insertions_repeatmaster.R is no longer included in GRIDSS releases
  • Fixed potential memory leak when using in-process bwa alignment
  • Improved performance of steps using in-process bwa alignment
  • Improved performance of variant calling steps
  • Limited some spammy log messages
  • Improved assembly stability
    • Fixes issues some users have encountered when processing hg38 alt contigs when using bwa-aligned input files

2.9.0 pre-release

06 May 14:34
2ea3435

This release includes significant changes to how GRIDSS performs preprocessing and alignment. GRIDSS now uses in-process bwa alignment instead of requiring a command-line bwa. This requires an additional .img file which is generated from the bwa index. A new setupreference step has been added to GRIDSS driver script so all files related to the reference genome can be generated as a once-off operation.

  • Added setupreference step to GRIDSS driver script
    • One-off initialisation and files written to the reference genome directory are now explicitly a separate step
  • Added BWA JNI interface
    • External alignment no longer required
  • Added PreprocessForBreakendAssembly command line program
    • Combines ComputeSamTags and SoftClipsToSplitReads in a single pass over each input.sv.bam file.
    • Approximately 50% speedup in preprocessing time due to better parallelisation
  • Added SoftClipsToSplitReads REALIGN_UNANCHORED_BASES option
  • Using REALIGN_UNANCHORED_BASES instead of REALIGN_ENTIRE_READ for assembly realignment
    • Fixes an issue with GRIDSS2 having slightly lower sensitivity than GRIDSS1 for deletions in which the ref has a tandem duplication (e.g. SINE-SINE becomes SINE)
  • Fixed bug causing the nominal position of the two sides of a breakpoint with homology to not match for both BND records
  • Better error message if aligner process is killed
  • Added max/max/mean mapq INFO fields
  • #319 Writing out all reproduction data for all assembly errors to prevent early abort truncating the file write
  • #329 Fixed crash in gridss_annotate_insertions_repeatmaster.R when processing chromosomes containing ":" (HLA types)
  • #312 now supporting arbitrary split read alignment overlaps (fixes java.lang.OutOfMemoryError error)
  • Standardised error codes to match sysexits.h
    • More meaningful exit codes from gridss.sh
  • #334 cleaned up driver script logging
    • Full log file now includes all log messages
  • #323 added --nojni command line option to disable native acceleration
  • Updated htsjdk/picard versions
  • Fixed error where split read records to be dropped were not actually dropped.
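
For readers unfamiliar with sysexits.h, the Python sketch below shows what standardised exit codes look like in practice. The numeric values are those defined by BSD sysexits.h; the mapping from exception types to codes is purely hypothetical and does not reflect which codes gridss.sh returns for which failure.

```python
import sys

# Values defined in BSD sysexits.h.
EX_OK = 0
EX_USAGE = 64      # command line usage error
EX_DATAERR = 65    # data format error
EX_NOINPUT = 66    # cannot open input
EX_SOFTWARE = 70   # internal software error
EX_IOERR = 74      # input/output error
EX_CONFIG = 78     # configuration error


def exit_code_for(error: Exception) -> int:
    """Hypothetical mapping from exception types to sysexits.h codes."""
    if isinstance(error, FileNotFoundError):
        return EX_NOINPUT
    if isinstance(error, ValueError):
        return EX_DATAERR
    if isinstance(error, OSError):
        return EX_IOERR
    return EX_SOFTWARE


def run(main) -> int:
    """Run main() and translate any failure into an exit code."""
    try:
        main()
        return EX_OK
    except Exception as error:
        print(f"error: {error}", file=sys.stderr)
        return exit_code_for(error)
```

A caller would finish with `sys.exit(run(main))`, which lets a wrapping pipeline distinguish, for example, a missing input file from an internal error.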

2.8.3

05 Apr 02:25
  • #317 Fixed NullPointerException assembly crash
  • Fixed inconsistent scoring of 3-way split reads when the primary is mapped to a blacklisted region
  • Reduced SanityCheckEvidence memory usage

2.8.2

02 Apr 02:13
Pre-release

Fixed assembly errors and inconsistencies in evidence handling.

  • #311 Fixed ComputeSamTags split read processing error when the split reads overlap.
  • #307 fixed --useproperpair parameter
  • #298 Fixed issue with RP reads not always being jointly tracked during assembly
  • #278 supplementary alignments no longer provide read pair evidence
  • #278 SAMRecord dovetail filter moved to soft clip evidence to prevent orphaning of split read evidence
  • Not counting RP anchor KmerEvidence interval as it's encoded in the non-anchoring KmerEvidence
  • Improved assembly logging
  • Improved debugging and error reporting
    • Added --keepTempFiles debugging option
    • Added --sanityCheck parameter for identifying inconsistent evidence

2.8.1

12 Mar 03:19
  • Better assembly handling of libraries with fragment size shorter than read length
    • #299, #300 Fixed assembly streaming loading window size to handle PE libraries with fragment size shorter than read length
    • Not considering read pair mate positions in which the mate fully overlaps the anchoring read
  • #286 Explicitly logging assembly error stack trace to assist in debugging potential assembly errors.
  • #306 replacing :| in BEALN reference names with _ to prevent downstream parsing errors.
  • #301 Using QualityScoreDistribution instead of CollectAlignmentSummaryMetrics as placeholder picard metric gathering assembly metrics
  • Driver script improvements
    • #310 fixed driver script crash on single-ended sequencing data
    • Improved error message when an input argument is missing.
    • Fixed error on multisample VCF with --plotdir specified
    • #307 Added --useproperpair and --concordantreadpairdistribution driver script command line arguments
    • Driver script no longer defaults to extracting RP based on SAM flag
      • Fixes issue in which either too many or too few RP were extracted

2.8.0

17 Feb 01:42
  • Reverted to MATEID instead of PARID for the VCF breakpoint record pairing.
    • MATEID is the correct field to use according to the VCF specifications.
  • Added FIX_SA and FIX_MISSING_HARD_CLIP to gridss.ComputeSamTags
    • FIX_SA: rewrites split read SA tags
      • corrects GATK indel realignment SA tag data inconsistency
    • FIX_MISSING_HARD_CLIP: infers missing hard clipping if split read records have different read lengths
      • corrects for GATK indel realignment stripping hard clipping when realigning
    • GRIDSS log files should no longer be full of "SA tag of read ********** refers to missing alignments" warning messages!
    • There should be significantly fewer data inconsistencies when running on GATK indel realigned bams.
  • #291 Updated libraries to htsjdk 2.21.1 and picard 2.21.8
    • Improved CRAM support
  • #278 the nominal breakpoint position at both breakend records is now guaranteed to be the same
  • #293 gridss.GeneratePonBedpe now defaults to treating the first sample as the normal
  • #283 Validating steps command line argument. Fixed bug with "all"/"call" step parsing
  • gridss_somatic_filter.R now writes VCF header for all filters
  • #295 Added error message if using a very old samtools version
  • #296 gridss_annotate_insertions_repeatmasker.R now explicitly sets repeatmasker column types
    • Fixes crash reading a repeatmasker .fa.out file when using integer chromosome numbers without a chr prefix.
  • #292 gridss.SoftClipToSplitReads now soft clips alignments that align over the start or end of a chromosome
    • Fixes occasional crash during assembly realignment with older bwa versions
  • #287 assembly contig per-base support now treats RP anchoring with no valid kmers as if the anchoring read was ignored
    • Fixes crash when one of two reads in a read pair is shorter than 25bp
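
Since 2.8.0 pairs breakpoint records through MATEID, downstream code can check that each breakend and its partner point at each other. The Python sketch below is one way to do that check. It is an illustration under assumptions, not part of GRIDSS: the function name is hypothetical, and the input is assumed to be a plain mapping from each record ID to the single value of its MATEID INFO field.

```python
from typing import Dict, List


def unpaired_breakends(mate_ids: Dict[str, str]) -> List[str]:
    """IDs of records whose MATEID target is missing or does not point back.

    mate_ids maps each VCF record ID to the value of its MATEID INFO field.
    """
    bad = []
    for record_id, mate in mate_ids.items():
        if mate_ids.get(mate) != record_id:
            bad.append(record_id)
    return sorted(bad)
```

An empty result means every breakend has a reciprocal partner. A non-empty result typically indicates that one side of a breakpoint was filtered out or that the VCF was subset by region.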