Releases: PapenfussLab/gridss
2.13.2
Bug fix release. Includes update to latest version of log4j
- Updated log4j transitive dependency to 2.17.1. Log4j is only used in the command-line parsing library
- Ignoring unmapped primary records when supplementary alignments exist
  - The supplementary record will be treated as the primary alignment
- #560 removing any left-over .sv.tmp.bam files
- Improved determinism
- Removed unnecessary samtools merge munging of RG/PG headers
- Forcing deterministic ordering when merging SAM records from multiple input files
2.13.1
Bug fix release
- #550 not redirecting stderr when checking samtools version so samtools library errors aren't swallowed
- #551 passing picard options (such as VALIDATION_STRINGENCY) to all gridss steps
- #553 Only building in-process bwa index if it's actually going to be used
- #554 `virusbreakend` auto-generating GRIDSS reference files for the host reference
Docker changes:
- Dropped gridss/gridss_minimal docker image support
- #535 added makeblastdb to image
- #549 using samtools 1.14
- Docker image generation now part of CI
- Added github action to publish docker image upon release creation
Special thanks to @keiranmraine, @scwatts and @lkhilton for bugfix code contributions
2.13.0
Breaking changes:
- Removed the `B` prefix from the `ANRP`, `ANSR`, `ANRPQ` and `ANSRQ` fields.
  - Fields are now consistent with the GRIDSS convention that `B` prefixed fields are breakend fields
- #520 #542 require samtools 1.13 or later
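A pipeline wrapper can check the samtools requirement above before launching GRIDSS. The following is a minimal sketch, not the check GRIDSS itself performs: `version_ge`, `samtools_version_from` and `check_samtools` are hypothetical helper names, and the comparison assumes a `sort` that supports `-V` (GNU coreutils).

```shell
#!/usr/bin/env bash
# Sketch only: these helpers are hypothetical and not part of GRIDSS.

# Succeeds when version $1 is greater than or equal to version $2.
# Assumes `sort -V` (version sort) is available.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Extracts the version number from `samtools --version` output,
# whose first line looks like "samtools 1.14".
samtools_version_from() {
    printf '%s\n' "$1" | head -n 1 | awk '{print $2}'
}

# Fails with a message when the samtools on PATH is older than 1.13.
# stderr is deliberately not redirected so library errors stay visible.
check_samtools() {
    local v
    v="$(samtools_version_from "$(samtools --version)")"
    version_ge "$v" "1.13" || {
        echo "samtools $v found; 1.13 or later is required" >&2
        return 1
    }
}
```

Calling `check_samtools || exit 1` at the top of a wrapper script stops the run before any GRIDSS step starts.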
Significant changes:
- #537 reverting back to using external aligner as the default soft clip realigner
  - Reverted due to:
    - In-process bwa causing JVM crashes on some systems.
    - In-process bwa alignments not deterministic when using more than 1 thread
  - Added `--internalaligner` flag.
    - Useful if the slight performance boost during the preprocessing step is more important than determinism
Minor changes:
- Added workaround for samtools/htsjdk#1584
- #520 @keiranmraine updated edirect install in Dockerfile
- #537 AnnotateInsertedSequence now respects --externalaligner command line argument
- Using fast compression for temporary files in the preprocess step
- Improved word wrapping in usage message
- #520 ensuring `java.io.tmpdir` exists before unpacking the JNI .so files
- #530 moved ulimit change to before set errexit so all ulimit failures will be silent
- Updated gridsstools to htslib 1.14
- Updated gridsstools docker building to match 1.14 htslib requirement
- Bump gridsstools version due to htslib version update
- #538 using htslib/htslib_static.mk instead of hard-coding htslib system libraries
- Added (non-default) ReadWeighted scoring model option
2.12.2
Critical bug fix release
DO NOT USE THE PREVIOUS GRIDSS VERSION (version 2.12.1): the previous version contained a critical bug in the preprocessing step that removes all split reads.
All results from GRIDSS 2.12.1 should be rerun. The assembly.bam and gridss.working directories must be deleted before rerunning.
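Before deleting anything, it can help to list what a rerun would need to remove. The sketch below only prints candidates and deletes nothing. `list_stale_gridss_outputs` is a hypothetical helper, and the `*gridss.working` name pattern is an assumption about how the working directories are named on your system; the assembly BAM path is whatever was passed to GRIDSS in the original run.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical helper that lists (but never deletes) outputs
# that would need removing before rerunning.
# Usage: list_stale_gridss_outputs ROOT_DIR [ASSEMBLY_BAM]
list_stale_gridss_outputs() {
    local root="$1" assembly="${2:-}"
    # Assumption: working directories have names ending in "gridss.working".
    find "$root" -type d -name '*gridss.working' -prune -print
    # The assembly BAM name is user-chosen, so it must be supplied explicitly.
    if [ -n "$assembly" ] && [ -e "$assembly" ]; then
        printf '%s\n' "$assembly"
    fi
}
```

Review the printed paths by hand and remove them only once you are sure they were produced by 2.12.1.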
- #513 Fixed critical regression error
  - The #513 fix in v2.12.1 incorrectly removed the `SA` tag instead of the `aa` tag as intended
  - This resulted in all split reads being treated as soft clipped reads. v2.12.1 outputs, assembly.bam, and gridss.working directories should be deleted and GRIDSS rerun
- #526 limiting parallel GC threads to match --threads
- #522 `gridss`: required arguments now depend on which step is being run
- #529 `gridss_somatic_filter`: added error message when neither --output nor --fulloutput is specified
- #531 GeneratePonBedpe: Fixed unnecessary imprecise call inclusion error message.
- #530 Not outputting time version if time is not being run
- #530 Continuing even when setting the ulimit fails
- #523 cleaning up any old samtools sort temp files
- Fixed VIRUSBreakend max coverage threshold not being calculated
- fixes OutOfMemory Error
- Increased heap size for gridss.VirusBreakendFilter
- fixes OutOfMemory Error
- Added `gridss.DumpReadSupport` to export all supporting evidence to BED/BEDPE
2.12.1
DO NOT USE: this release has a critical bug in the preprocessing step that removes all split reads
Minor feature and bug fix release.
A pre-built VIRUSBreakend database is now available from [here](https://github.com/PapenfussLab/gridss/blob/master/VIRUSBreakend_Readme.md#reference-data).
GRIDSS
- `gridss` with no arguments now prints the usage message as intended
- #511 `AF` VCF FORMAT field now populated
- #513 Added --skipsoftcliprealignment option
- Reduces runtime when using an aligner that already reports split reads (e.g. bwa)
- Ignoring imprecise deletion-like calls less than 500bp in size.
- Very high coverage samples have a high rate of small FP DEL calls caused by fragments slightly longer than the (99%) maximum expected fragment size
- Fixed non-deterministic assembly SAM tag evidence ordering
- #503 limiting error reproduction packages to 5.
- configurable with assembly.maximumReproductionExportPackages
- #503 stripping SAM "aa" tag from input reads
- Fixed crash bug in SAMRecordUtil.ByBestPrimarySplitCandidate
- Added libdeflate to gridsstools
- Added --no-PG to samtools sort so BAM outputs are deterministic
- #509 improved gridss_somatic_filter script path determination so it works in the Docker image
- Added LICENSE to release package to remove bioconda lint warning
VIRUSBreakend
2.12.0
New release to coincide with formal publication of VIRUSBreakend (https://doi.org/10.1093/bioinformatics/btab343).
**Temporary notice: Docker images are building/uploading. gridss conda package is in the process of being updated. This notice will be removed when the docker images and conda packages are available for usage.**
Critical packaging changes
- `.sh` and `.R` suffixes removed from all gridss tools.
  - You'll need to update your pipeline scripts or add symlinks (e.g. `ln -s gridss gridss.sh`)
Previously | 2.12.0 |
---|---|
gridss.sh | gridss |
virusbreakend.sh | virusbreakend |
virusbreakend-build.sh | virusbreakend-build |
gridss_annotate_vcf_kraken2.sh | gridss_annotate_vcf_kraken2 |
gridss_annotate_vcf_repeatmasker.sh | gridss_annotate_vcf_repeatmasker |
gridss_extract_overlapping_fragments.sh | gridss_extract_overlapping_fragments |
gridss_somatic_filter.R | gridss_somatic_filter |
gridss.config.R | gridss.config.R |
libgridss.R | libgridss.R |
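For pipelines that cannot be updated straight away, the symlink approach can be applied to every renamed tool in the table above. This is a minimal sketch under stated assumptions: `link_legacy_names` is a hypothetical helper, and it assumes all tools sit in one directory that you pass as the first argument.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical helper that recreates the pre-2.12.0 file names
# as relative symlinks next to the renamed tools.
# Usage: link_legacy_names /path/to/gridss/install/dir
link_legacy_names() {
    local dir="$1" name
    for name in gridss virusbreakend virusbreakend-build \
                gridss_annotate_vcf_kraken2 gridss_annotate_vcf_repeatmasker \
                gridss_extract_overlapping_fragments; do
        # Link only tools that are present, and never overwrite anything.
        if [ -e "$dir/$name" ] && [ ! -e "$dir/$name.sh" ] && [ ! -L "$dir/$name.sh" ]; then
            ln -s "$name" "$dir/$name.sh"
        fi
    done
    # gridss_somatic_filter previously carried an .R suffix rather than .sh.
    name=gridss_somatic_filter
    if [ -e "$dir/$name" ] && [ ! -e "$dir/$name.R" ] && [ ! -L "$dir/$name.R" ]; then
        ln -s "$name" "$dir/$name.R"
    fi
}
```

`gridss.config.R` and `libgridss.R` keep their names, so they need no links.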
- gridss/gridss:latest Docker image now includes tools and dependencies for running all GRIDSS tools including VIRUSBreakend (#473)
- New gridss/gridss_minimal:latest docker image for running just `gridss`
- New gridss/virusbreakend:latest docker image for running `virusbreakend` and `virusbreakend-build`
VIRUSBreakend
- #484 Added VIRUSBreakend support for viral contigs in reference genome
  - By default, `chrEBV` and all NCBI viral contigs are considered viral reference contigs
  - Fixes missing EBV due to inclusion of `NC_007605` in `hs37d5` and `chrEBV` in `hs38DH`/`GRCh38_full_analysis_set_plus_decoy_hla`
- Using default GRIDSS jvm heap size since Kraken2 needs more memory now that bacterial sequences are included
- Added `nodes.dmp` to VIRUSBreakend database so downstream tools can look up the virus name from the `.summary.tsv` output
- #446 Now immediately terminates instead of attempting to continue if one of the child processes does not complete successfully
- Expanded documentation
Other changes
- #494 Added support for using the hmmer engine of RepeatMasker
- #488 regenerating index files if they are zero size
- Fixed `gridss` not exiting after usage message when no BAM supplied
- Fixed broken `--working_dir` argument of `gridss_annotate_vcf_repeatmasker`
- Fixed regression error causing assembly to be run even when the assembly output file already exists
- Releases no longer depend on local environment and are built using a multi-stage Dockerfile
- #489 removed unused `RF` VCF header
- gridsscache files deleted if outdated
- #491 Updated dependencies
2.11.1
Bug fixes and a VIRUSBreakend database update.
- Added bacteria and archaea to VIRUSBreakend db
- Fixes false positives caused by bacteria reads with short homology to viral genomes in the expanded viral database
- VIRUSBreakend database must be rebuilt.
- If you haven't deleted the kraken2 intermediate files, the rebuild can be done by running just the following steps:
kraken2-build --download-library bacteria --db $dbname
kraken2-build --download-library archaea --db $dbname
kraken2-build --threads $(nproc) --build --db $dbname
tar -czvf virusbreakend.db.$dbname.tar.gz $dbname/*.k2d $dbname/taxonomy/nodes.dmp $dbname/library/viral/*.fna* $dbname/library/added/*.fna* $dbname/taxid10239.nbr $dbname/seqid2taxid.map
- #474 #476 Fixed off-by-one crash error occurring when read length was an exact multiple of 32.
- #475 Fixed `--externalaligner` handling
- #480 Aborting VIRUSBreakend upon gridsstools failure
- #480 clarified VIRUSBreakend gridsstools error message
- #478 Added defensive check to NonReferenceContigAssembler.SupportLookup
2.11.0
This release contains a new error correction step that improves assembly runtime, overhauls the VIRUSBreakend viral database, and bug fixes.
VIRUSBreakend databases need to be rebuilt.
- Changed minimum clip length threshold from 50bp back to 5bp
- Versions 2.10.x have lower sensitivity due to this regression. Rerunning with 2.11.0 is recommended
- This regression primarily impacted single breakend sensitivity (since split reads will generally have at least one side with 50bp clipped)
- Improved assembly
- Now performing local error correction of reads prior to assembly.
- Improves assembly runtime and memory usage
- Complex assembly graph regions are now downsampled instead of being excluded from assembly
- Additional assembly performance optimisations
- VIRUSBreakend
- Extended database to include all NCBI viral neighbour sequences.
- Deduplicating viral references with same taxid based on total kmer hits to each viral reference
- Added LOW_MAPQ filter to ambiguous viral integrations
- Additional `.summary.tsv` output columns
- Added QCstatus & direct taxid counts to summary output
- only extracting SC and OEA reads
- EXCESSIVE_VIRAL_COVERAGE & ASSEMBLY_ABORTED QC failure modes
- Hard filtering < 10% viral coverage
- Actually using precomputed metrics in GRIDSS calling
- New ubuntu 20 Dockerfile mostly based on @alexiswl #461
- Added error message pointing to the log file with the underlying cause of the error
- Added examples/annotate_most_likely_centromere.R
- #439 extended documentation of GRIDSS SAM tags
- #448 added gridsstools source code to release package so the bioconda recipe can rebuild from source
- #450 stabilising sort order in VCF output
- #450 setting bwa mem batch size (-K) to force deterministic behaviour
- #449 preventing underflow of contig bounds
- #444 fixed handling of
--useproperpair
- #463 fixed race in assembly bam header creation
- Internally treating flanking indels as clipping (e.g. 4I5M becomes 4S5M)
- Reads containing only clipping and I/D alignments are treated as unmapped (bwa occasionally reports alignments such as 36S30I85S)
- Added reference matching checks to start of GRIDSS
- Updated ComputeSamTags behaviour
- #438 added --otherjvmheap
- Fixed bug causing assembly recovery to fail if near (<50kbp) end of chromosome
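The two CIGAR rules in the list above (flanking indels treated as clipping, and reads with only clipping and I/D operations treated as unmapped) can be illustrated on CIGAR strings. This is a sketch of the idea only, not GRIDSS's internal implementation: both function names are hypothetical, and the rewrite handles only insertions at the very start or end of the CIGAR, as in the `4I5M` example.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical helpers illustrating the two rules on CIGAR strings.

# Rewrites an insertion at the very start or end of a CIGAR as a soft clip.
# Interior insertions are left untouched.
flanking_insertions_to_clips() {
    printf '%s\n' "$1" | sed -E 's/^([0-9]+)I/\1S/; s/([0-9]+)I$/\1S/'
}

# Succeeds when the CIGAR consists solely of clipping (S, H) and
# insertion/deletion (I, D) operations, i.e. no bases are aligned.
only_clipping_and_indels() {
    local pattern='^([0-9]+[SHID])+$'
    [[ "$1" =~ $pattern ]]
}
```

For example, `flanking_insertions_to_clips 4I5M` prints `4S5M`, and `only_clipping_and_indels 36S30I85S` succeeds, which marks the bwa alignment quoted above as one to treat as unmapped.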
2.10.2
- Added support for batched assembly #354 #397 #403 #406 #430
- Overhaul of VIRUSBreakend behaviour
- Added an OUTPUT.summary.csv file that is always written
- This file is only written if VIRUSBreakend was successful
- Provides useful stats such as viral presence, coverage, and number of integrations found
- increased assembly threshold cut-offs
- Better integration detection on samples with high (1000x+) viral coverage
- Removed Virus-Host DB dependency due to Non-Commercial licence restriction
- Now determining human virus status from NCBI virus host file
- This change requires a rebuild of the VIRUSBreakend database
- Maximum of 1 virus reported per genus
- Stops multiple viral strains/subtypes being reported due to a small portion of reads being misclassified by kraken2
- Added scripts for reproduction of results in VIRUSBreakend manuscript
- Fixed crash bug in VirusBreakendFilter
- Added kraken version to VIRUSBreakend output
- Added check for existence of RepeatMasker
- `BEALN` now always replaces pipes in contig name with underscores
- Removed unnecessary GKL library warning message
- Hard limit maximum assembly base quality to the max of 93 representable by Sanger format fastq #404
2.10.1
Incremental release fixing some minor issues with the 2.10.0 release
- #410 added locking on the setupreference step
- prevents bwa index corruption due to concurrent writing from two GRIDSS instances using the same reference
- virusbreakend-build.sh: creating sequence dictionaries so virusbreakend.sh never has to write to the virusbreakenddb directory
- virusbreakend.sh: reusing GRIDSS metrics from the full BAM if they already exist
- gridss_extract_overlapping_fragments.sh/virusbreakend.sh: sampling 10M reads when calculating metrics.
- sampling only 1M reads was resulting in telomeric bias and underreporting of QUAL scores on 100x data
- virusbreakend.sh: reduced heap size when running gridss.sh to 13g so it fits in a 4core/16gb VM