Releases: PapenfussLab/gridss

2.13.2

07 Feb 00:55

Bug fix release. Includes update to latest version of log4j

  • Updated log4j transitive dependency to 2.17.1. Log4j is only used in the command-line parsing library
  • Ignoring unmapped primary records when supplementary alignments exist
    • The supplementary record will be treated as the primary alignment
  • #560 removing any left-over .sv.tmp.bam files
  • Improved determinism
    • Removed unnecessary samtools merge munging of RG/PG headers
    • Forcing deterministic ordering when merging SAM records from multiple input files

2.13.1

23 Dec 09:21

Bug fix release

  • #550 not redirecting stderr when checking samtools version so samtools library errors aren't swallowed
  • #551 passing picard options (such as VALIDATION_STRINGENCY) to all gridss steps
  • #553 Only building in-process bwa index if it's actually going to be used
  • #554 virusbreakend auto-generating GRIDSS reference files for the host reference

Docker changes:

  • Dropped gridss/gridss_minimal docker image support
  • #535 added makeblastdb to image
  • #549 using samtools 1.14
  • Docker image generation now part of CI
  • Added github action to publish docker image upon release creation

Special thanks to @keiranmraine, @scwatts and @lkhilton for bugfix code contributions

2.13.0

05 Dec 08:40

Breaking changes:

  • Removed B prefix from ANRP ANSR ANRPQ ANSRQ fields.
    • Fields are now consistent with the GRIDSS convention that B prefixed fields are breakend fields
  • #520 #542 require samtools 1.13 or later
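
The minimum samtools version can be checked before a run. The following is a minimal sketch, not the check GRIDSS itself performs: the helper names version_ge and parse_samtools_version are hypothetical, and it assumes a `sort` that supports `-V` (GNU coreutils) and that the first line of `samtools --version` has the form "samtools 1.14".

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when version $1 is >= version $2.
# Assumes `sort -V` (version sort) is available, as in GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical helper: reads `samtools --version` output on stdin and
# prints the version number from a first line such as "samtools 1.14".
parse_samtools_version() {
    awk 'NR == 1 && $1 == "samtools" { print $2 }'
}

MIN_SAMTOOLS_VERSION="1.13"

if command -v samtools >/dev/null 2>&1; then
    # stderr is left untouched so samtools library errors remain visible.
    installed="$(samtools --version | parse_samtools_version)"
    if version_ge "$installed" "$MIN_SAMTOOLS_VERSION"; then
        echo "samtools $installed is new enough"
    else
        echo "samtools $installed is older than $MIN_SAMTOOLS_VERSION" >&2
    fi
fi
```

Version sort is used rather than a plain string comparison because 1.9 would otherwise compare as newer than 1.13.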

Significant changes:

  • #537 reverting back to using external aligner as the default soft clip realigner
    • Reverted due to:
      • in-process bwa causing JVM crashes on some systems.
      • In-process bwa alignments not deterministic when using more than 1 thread
    • Added --internalaligner flag.
      • Useful if the slight performance boost during the preprocessing step is more important than determinism

Minor changes:

  • Added workaround for samtools/htsjdk#1584
  • #520 @keiranmraine updated edirect install in Dockerfile
  • #537 AnnotateInsertedSequence now respects --externalaligner command line argument
  • Using fast compression for temporary files in the preprocess step
  • Improved word wrapping in usage message
  • #520 ensuring java.io.tmpdir exists before unpacking the JNI .so files
  • #530 moved ulimit change to before set errexit so all ulimit failures will be silent
  • Updated gridsstools to htslib 1.14
  • Updated gridsstools docker building to match 1.14 htslib requirement
  • Bump gridsstools version due to htslib version update
  • #538 using htslib/htslib_static.mk instead of hard-coding htslib system libraries
  • Added (non-default) ReadWeighted scoring model option

2.12.2

14 Oct 08:27

Critical bug fix release

DO NOT USE THE PREVIOUS GRIDSS VERSION (version 2.12.1): the previous version contained a critical bug in the preprocessing step that removes all split reads.

All results from GRIDSS 2.12.1 should be rerun. The assembly.bam and gridss.working directories must be deleted before rerunning.

  • #513 Fixed critical regression error
    • The #513 fix in v2.12.1 incorrectly removed the SA tag instead of the aa tag as intended
    • This resulted in all split reads being treated as soft clipped reads. v2.12.1 outputs, assembly.bam, and gridss.working directories should be deleted and GRIDSS rerun
  • #526 limiting parallel GC threads to match --threads
  • #522 gridss: required arguments now depend on which step is being run
  • #529 gridss_somatic_filter: added error message when neither --output nor --fulloutput is specified
  • #531 GeneratePonBedpe: Fixed unnecessary imprecise call inclusion error message.
  • #530 Not outputting time version if time is not being run
  • #530 Continuing even when setting the ulimit fails
  • #523 cleaning up any old samtools sort temp files
  • Fixed VIRUSBreakend max coverage threshold not being calculated
    • fixes OutOfMemory Error
  • Increased heap size for gridss.VirusBreakendFilter
    • fixes OutOfMemory Error
  • Added gridss.DumpReadSupport to export all supporting evidence to BED/BEDPE
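
The cleanup of 2.12.1 outputs described above could be scripted along these lines. This is a sketch under stated assumptions, not an official tool: the function name clean_gridss_intermediates is hypothetical, and it assumes the assembly file is named assembly.bam and that working directory names end in "gridss.working". Adjust both patterns to match the names used in your own pipeline, and remove the 2.12.1 output VCFs separately.

```shell
#!/usr/bin/env bash
# Hypothetical helper: removes stale GRIDSS intermediates under run directory $1.
# Assumes the assembly is named assembly.bam and that working directories
# have names ending in "gridss.working"; both are assumptions to adjust.
clean_gridss_intermediates() {
    local run_dir="$1"
    if [ ! -d "$run_dir" ]; then
        echo "not a directory: $run_dir" >&2
        return 1
    fi
    # Remove working directories; -prune stops find descending into them.
    find "$run_dir" -type d -name '*gridss.working' -prune -exec rm -rf {} +
    # Remove the assembly BAM so the assembly step is rerun from scratch.
    find "$run_dir" -type f -name 'assembly.bam' -exec rm -f {} +
}

# Example (commented out so the sketch has no side effects when sourced):
# clean_gridss_intermediates /path/to/run
```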

2.12.1

11 Aug 12:45

DO NOT USE: this release has a critical bug in the preprocessing step that removes all split reads

Minor feature and bug fix release.

A pre-built VIRUSBreakend database is now available (https://github.com/PapenfussLab/gridss/blob/master/VIRUSBreakend_Readme.md#reference-data).

GRIDSS

  • gridss with no arguments now prints the usage message as intended
  • #511 AF VCF FORMAT field now populated
  • #513 Added --skipsoftcliprealignment option
    • Reduces runtime when using an aligner that already reports split reads (e.g. bwa)
  • Ignoring imprecise deletion-like calls less than 500bp in size.
    • Very high coverage samples have a high rate of small FP DEL calls caused by fragments slightly longer than the (99%) maximum expected fragment size
  • Fixed non-deterministic assembly SAM tag evidence ordering
  • #503 limiting error reproduction packages to 5.
    • configurable with assembly.maximumReproductionExportPackages
  • #503 stripping SAM "aa" tag from input reads
  • Fixed crash bug in SAMRecordUtil.ByBestPrimarySplitCandidate
  • Added libdeflate to gridsstools
  • Added --no-PG to samtools sort so BAM outputs are deterministic
  • #509 improved gridss_somatic_filter script path determination so it works in the Docker image
  • Added LICENSE to release package to remove bioconda lint warning

VIRUSBreakend

  • #508 using RefSeq viral genome whenever available
  • #504 Fixed rname header separator in VIRUSBreakend .summary.tsv output
  • #499 running gridss preprocess so annotation doesn't fail
  • #499 Including edirect setup step in docker image
  • #499 added bcftools to the Docker image

2.12.0

21 May 07:36

New release to coincide with formal publication of VIRUSBreakend (https://doi.org/10.1093/bioinformatics/btab343).

Temporary notice: Docker images are building/uploading. The gridss conda package is in the process of being updated. This notice will be removed when the Docker images and conda packages are available for use.

Critical packaging changes

  • .sh and .R suffix removed from all gridss tools.
    • You'll need to update your pipeline scripts or add symlinks (e.g. ln -s gridss gridss.sh)
      Previously                               2.12.0
      gridss.sh                                gridss
      virusbreakend.sh                         virusbreakend
      virusbreakend-build.sh                   virusbreakend-build
      gridss_annotate_vcf_kraken2.sh           gridss_annotate_vcf_kraken2
      gridss_annotate_vcf_repeatmasker.sh      gridss_annotate_vcf_repeatmasker
      gridss_extract_overlapping_fragments.sh  gridss_extract_overlapping_fragments
      gridss_somatic_filter.R                  gridss_somatic_filter
      gridss.config.R                          gridss.config.R
      libgridss.R                              libgridss.R
  • gridss/gridss:latest Docker image now includes tools and dependencies for running all GRIDSS tools including VIRUSBreakend (#473)
  • New gridss/gridss_minimal:latest docker image for running just gridss
  • New gridss/virusbreakend:latest docker image for running virusbreakend and virusbreakend-build
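
For pipelines that cannot be updated immediately, the symlink approach mentioned above (ln -s gridss gridss.sh) can be applied to every renamed tool at once. The following is a minimal sketch: the function name link_legacy_names is hypothetical, and the directory argument is assumed to be wherever the 2.12.0 tools were unpacked.

```shell
#!/usr/bin/env bash
# Hypothetical helper: for each renamed tool present in directory $1, create
# a symlink with the pre-2.12.0 name, e.g. gridss.sh -> gridss.
link_legacy_names() {
    local bin_dir="$1" pair new old
    for pair in \
        gridss:gridss.sh \
        virusbreakend:virusbreakend.sh \
        virusbreakend-build:virusbreakend-build.sh \
        gridss_annotate_vcf_kraken2:gridss_annotate_vcf_kraken2.sh \
        gridss_annotate_vcf_repeatmasker:gridss_annotate_vcf_repeatmasker.sh \
        gridss_extract_overlapping_fragments:gridss_extract_overlapping_fragments.sh \
        gridss_somatic_filter:gridss_somatic_filter.R
    do
        new="${pair%%:*}"
        old="${pair##*:}"
        # Skip tools that are not installed and legacy names that already exist.
        if [ -e "$bin_dir/$new" ] && [ ! -e "$bin_dir/$old" ] && [ ! -L "$bin_dir/$old" ]; then
            # Relative link target, so the directory can be moved as a unit.
            ln -s "$new" "$bin_dir/$old"
        fi
    done
}

# Example (commented out so the sketch has no side effects when sourced):
# link_legacy_names /opt/gridss
```

gridss.config.R and libgridss.R are not listed because their names did not change.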

VIRUSBreakend

  • #484 Added VIRUSBreakend support for viral contigs in reference genome
    • By default, chrEBV and all NCBI viral contigs are considered viral reference contigs
    • Fixes missing EBV due to inclusion of NC_007605 in hs37d5 and chrEBV in hs38DH/GRCh38_full_analysis_set_plus_decoy_hla
  • Using default GRIDSS jvm heap size since Kraken2 needs more memory now that bacterial sequences are included
  • Added nodes.dmp to VIRUSBreakend database so downstream tools can look up the virus name from the .summary.tsv output
  • #446 Now immediately terminates instead of attempting to continue if one of the child processes does not complete successfully
  • Expanded documentation

Other changes

  • #494 Added support for using the hmmer engine of RepeatMasker
  • #488 regenerating index files if they are zero size
  • Fixed gridss not exiting after usage message when no BAM supplied
  • Fixed broken --working_dir argument of gridss_annotate_vcf_repeatmasker
  • Fixed regression error causing assembly to be run even when the assembly output file already exists
  • Releases no longer depend on local environment and are built using a multi-stage Dockerfile
  • #489 removed unused RF VCF header
  • gridsscache files deleted if outdated
  • #491 Updated dependencies

2.11.1

30 Mar 23:38

Bug fixes and a VIRUSBreakend database update.

  • Added bacteria and archaea to VIRUSBreakend db
    • Fixes false positives caused by bacteria reads with short homology to viral genomes in the expanded viral database
    • VIRUSBreakend database must be rebuilt.
    • If you haven't deleted the kraken2 intermediate files, the rebuild can be done by running just the following steps:
      • kraken2-build --download-library bacteria --db $dbname
      • kraken2-build --download-library archaea --db $dbname
      • kraken2-build --threads $(nproc) --build --db $dbname
      • tar -czvf virusbreakend.db.$dbname.tar.gz $dbname/*.k2d $dbname/taxonomy/nodes.dmp $dbname/library/viral/*.fna* $dbname/library/added/*.fna* $dbname/taxid10239.nbr $dbname/seqid2taxid.map
  • #474 #476 Fixed off-by-one crash error occurring when read length was an exact multiple of 32.
  • #475 Fixed --externalaligner handling
  • #480 Aborting VIRUSBreakend upon gridsstools failure
  • #480 clarified VIRUSBreakend gridsstools error message
  • #478 Added defensive check to NonReferenceContigAssembler.SupportLookup

2.11.0

17 Mar 04:33

This release adds a new error correction step that improves assembly runtime, overhauls the VIRUSBreakend viral database, and fixes bugs.

VIRUSBreakend databases need to be rebuilt.

  • Changed minimum clip length threshold from 50bp back to 5bp
    • Versions 2.10.x have lower sensitivity due to this regression. Rerunning with 2.11.0 is recommended
    • This regression primarily impacted single breakend sensitivity (since split reads will generally have at least one side with 50bp clipped)
  • Improved assembly
    • Now performing local error correction of reads prior to assembly.
      • Improves assembly runtime and memory usage
    • Complex assembly graph regions are now downsampled instead of being excluded from assembly
    • Additional assembly performance optimisations
  • VIRUSBreakend
    • Extended database to include all NCBI viral neighbour sequences.
    • Deduplicating viral references with the same taxid based on total kmer hits to each viral reference
    • Added LOW_MAPQ filter to ambiguous viral integrations
    • Additional .summary.tsv output columns
    • Added QCstatus & direct taxid counts to summary output
    • only extracting SC and OEA reads
    • EXCESSIVE_VIRAL_COVERAGE & ASSEMBLY_ABORTED QC failure modes
    • Hard filtering < 10% viral coverage
    • Actually using precomputed metrics in GRIDSS calling
  • New ubuntu 20 Dockerfile mostly based on @alexiswl #461
  • Added error message pointing to the log file with the underlying cause of the error
  • Added examples/annotate_most_likely_centromere.R
  • #439 extended documentation of GRIDSS SAM tags
  • #448 added gridsstools source code to release package so the bioconda recipe can rebuild from source
  • #450 stabilising sort order in VCF output
  • #450 setting bwa mem batch size (-K) to force deterministic behaviour
  • #449 preventing underflow of contig bounds
  • #444 fixed handling of --useproperpair
  • #463 fixed race in assembly bam header creation
  • Internally treating flanking indels as clipping (e.g. 4I5M becomes 4S5M)
    • Reads containing only clipping and I/D alignments are treated as unmapped (bwa occasionally reports alignments such as 36S30I85S)
  • Added reference matching checks to start of GRIDSS
  • Updated ComputeSamTags behaviour
    • Dropping excessively overlapping split read alignments (contained alignment & >25bp overlaps)
    • #282 Add summary stats to ComputeSamTags/PreprocessForBreakendAssembly output
    • #363 preferring non-ALT alignments when determining best split reads
    • Added summary output file containing tag changes
  • #438 added --otherjvmheap
  • Fixed bug causing assembly recovery to fail if near (<50kbp) end of chromosome
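
The flanking indel handling listed above (4I5M becomes 4S5M, and reads with only clipping and I/D operations are treated as unmapped) can be illustrated with a small CIGAR rewrite. This is a sketch of the idea only, not the GRIDSS implementation: the function names are hypothetical, only flanking insertions are converted (flanking deletions are left unchanged because the notes do not say how they are represented), and merging adjacent soft clips is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rewrites insertions at either end of a CIGAR string
# as soft clips and merges adjacent runs of the same operator.
# Assumption: flanking deletions are left unchanged in this sketch.
clip_flanking_insertions() {
    printf '%s\n' "$1" | awk '
    {
        n = 0; rest = $0
        # Tokenise into parallel arrays of lengths and operators.
        while (match(rest, /^[0-9]+[MIDNSHP=X]/)) {
            n++
            len[n] = substr(rest, 1, RLENGTH - 1) + 0
            op[n] = substr(rest, RLENGTH, 1)
            rest = substr(rest, RLENGTH + 1)
        }
        if (rest != "" || n == 0) {
            print "malformed CIGAR: " $0 > "/dev/stderr"
            exit 1
        }
        # Walk inwards from each end, converting I to S until a non-clip op.
        for (i = 1; i <= n; i++) {
            if (op[i] == "I") op[i] = "S"
            else if (op[i] != "S" && op[i] != "H") break
        }
        for (i = n; i >= 1; i--) {
            if (op[i] == "I") op[i] = "S"
            else if (op[i] != "S" && op[i] != "H") break
        }
        # Re-emit, merging neighbouring entries that share an operator.
        out = ""; cur = op[1]; total = len[1]
        for (i = 2; i <= n; i++) {
            if (op[i] == cur) total += len[i]
            else { out = out total cur; cur = op[i]; total = len[i] }
        }
        print out total cur
    }'
}

# Hypothetical helper: succeeds if the CIGAR has at least one M, = or X op.
# A CIGAR without any of these would be treated as unmapped.
has_aligned_bases() {
    case "$1" in
        *[M=X]*) return 0 ;;
        *) return 1 ;;
    esac
}

clip_flanking_insertions 4I5M   # prints 4S5M
```

Under these assumptions the bwa example from the notes, 36S30I85S, collapses to a single soft clip with no aligned bases, which is why such a read is handled as unmapped.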

2.10.2

10 Dec 05:01
  • Added support for batched assembly #354 #397 #403 #406 #430
  • Overhaul of VIRUSBreakend behaviour
    • Added an OUTPUT.summary.csv file that is always written
      • This file is only written if VIRUSBreakend was successful
      • Provides useful stats such as viral presence, coverage, and number of integrations found
    • increased assembly threshold cut-offs
      • Better integration detection on samples with high (1000x+) viral coverage
    • Removed Virus-Host DB dependency due to Non-Commercial licence restriction
      • Now determining human virus status from NCBI virus host file
      • This change requires a rebuild of the VIRUSBreakend database
    • Maximum of 1 virus reported per genus
      • Stops multiple viral strains/subtypes being reported due to a small portion of reads being misclassified by kraken2
  • Added scripts for reproduction of results in VIRUSBreakend manuscript
  • Fixed crash bug in VirusBreakendFilter
  • Added kraken version to VIRUSBreakend output
  • Added check for existence of RepeatMasker
  • BEALN now always replaces pipes in contig name with underscores
  • Removed unnecessary GKL library warning message
  • Hard limit maximum assembly base quality to the max of 93 representable by Sanger format fastq #404
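
The limit of 93 in #404 follows from the Sanger FASTQ encoding, which stores each quality as the ASCII character with code quality + 33; the highest printable character is '~' (code 126), and 126 - 33 = 93. A minimal sketch of the cap, with hypothetical function names that are not part of GRIDSS:

```shell
#!/usr/bin/env bash
# Highest quality representable in Sanger FASTQ: '~' is ASCII 126, 126 - 33 = 93.
MAX_SANGER_QUAL=93

# Hypothetical helper: prints the quality $1, capped at MAX_SANGER_QUAL.
cap_base_quality() {
    if [ "$1" -gt "$MAX_SANGER_QUAL" ]; then
        echo "$MAX_SANGER_QUAL"
    else
        echo "$1"
    fi
}

# Hypothetical helper: prints the Phred+33 character for the capped quality.
qual_to_sanger_char() {
    # %b interprets \0ddd as the byte with that octal value.
    printf '%b' "\\0$(printf '%03o' "$(( $(cap_base_quality "$1") + 33 ))")"
}

qual_to_sanger_char 120   # prints ~
echo
```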

2.10.1

20 Oct 01:47

Incremental release fixing some minor issues with the 2.10.0 release

  • #410 added locking on the setupreference step
    • prevents bwa index corruption due to concurrent writing from two GRIDSS instances using the same reference
  • virusbreakend-build.sh: creating sequence dictionaries so virusbreakend.sh never has to write to the virusbreakenddb directory
  • virusbreakend.sh: reusing GRIDSS metrics from the full BAM if they already exist
  • gridss_extract_overlapping_fragments.sh/virusbreakend.sh: sampling 10M reads when calculating metrics.
    • sampling only 1M reads was resulting in telomeric bias and underreporting of QUAL scores on 100x data
  • virusbreakend.sh: reduced heap size when running gridss.sh to 13g so it fits in a 4core/16gb VM
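
The #410 locking on the setupreference step guards against two instances writing the same index concurrently. The notes do not say how GRIDSS implements the lock, so the following is only an illustration of the general technique, using an atomic mkdir as the lock; the function name with_reference_lock and the ".lock" suffix are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: runs a command while holding a directory-based lock
# next to reference file $1. mkdir either creates the directory or fails,
# atomically, so only one process at a time can hold the lock.
with_reference_lock() {
    local lock_dir="$1.lock" status=0
    shift
    if ! mkdir "$lock_dir" 2>/dev/null; then
        echo "another process holds $lock_dir" >&2
        return 1
    fi
    # Run the protected command, remembering its exit status.
    "$@" || status=$?
    # Release the lock whether or not the command succeeded.
    rmdir "$lock_dir"
    return "$status"
}

# Example (commented out; assumes bwa is installed and ref.fa exists):
# with_reference_lock ref.fa bwa index ref.fa
```

A process that is killed while holding the lock leaves the lock directory behind, so a production version would also need stale-lock handling.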