Releases: ComparativeGenomicsToolkit/cactus

Cactus 2.8.4 2024-06-21

21 Jun 19:14
d62e175

Cactus 2.8.4 is available in the following forms:

WARNING: do not use the source files automatically generated by GitHub (Source code (zip) or Source code (tar.gz)); these are not correct.

The Docker images and binaries linked above are built using AVX2 extensions, and require a CPU that supports them, except the "Pre-compiled Binaries For Older CPU Architectures" which should be compatible with any 64-bit architecture (and, since version 2.3.1, support Cactus's pangenome pipeline).

Please subscribe to the low-volume cactus-announce mailing list to receive notice of Cactus releases.

Release Notes

This release updates vcfbub in order to fix a longstanding issue where this tool can produce invalid VCFs.

  • vcfbub updated to v0.1.1, which resolves a bug where records could be missing columns in the presence of . genotypes
  • Run bcftools view as a sanity check on generated VCFs to prevent the various normalization steps from ever silently producing invalid output.
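
The kind of defect described above (records with missing columns) can be illustrated with a small check. This is only a sketch of the idea, not what bcftools view or Cactus actually does: the function name `find_short_records` is hypothetical, and it only compares each record's tab-separated column count against the `#CHROM` header line.

```python
def find_short_records(vcf_lines):
    """Return (line_number, n_columns) for records whose column count
    differs from that of the #CHROM header line."""
    expected = None
    bad = []
    for number, line in enumerate(vcf_lines, start=1):
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        if line.startswith("#CHROM"):
            expected = len(line.split("\t"))
            continue
        if expected is None:
            raise ValueError("record found before the #CHROM header line")
        n_columns = len(line.split("\t"))
        if n_columns != expected:
            bad.append((number, n_columns))
    return bad


example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2",
    "chr1\t10\t.\tA\tT\t.\t.\t.\tGT\t0\t1",
    "chr1\t20\t.\tG\tC\t.\t.\t.\tGT\t.",  # one sample column is missing
]
print(find_short_records(example))
```

A real validator checks far more than column counts; the point here is only that a truncated record is cheap to detect before it reaches downstream tools.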

Cactus 2.8.3 2024-06-12

12 Jun 23:06
d37fce4

Cactus 2.8.3 is available in the following forms:

Note: The gpu docker image was built using this patch that bumps the Ubuntu version from 20.04 to 22.04.

Important note about installing on Python3.8: You may need to run python3 -m pip install backports.zoneinfo
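
For context, `zoneinfo` joined the standard library in Python 3.9, and the `backports.zoneinfo` package provides the same module for 3.8. Code that must run on both commonly uses a fallback import like the sketch below; whether Cactus or Toil imports it exactly this way is an assumption.

```python
try:
    import zoneinfo  # standard library from Python 3.9 onwards
except ImportError:
    # Python 3.8: provided by `python3 -m pip install backports.zoneinfo`
    from backports import zoneinfo

# Either import exposes the same ZoneInfo class.
print(zoneinfo.ZoneInfo)
```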

Release Notes

This release fixes some bugs and updates to the latest Toil.

  • Fix broken --restart option in cactus-graphmap
  • Raise Toil job memory requirement for filter-paf-deletions
  • Update to vg v1.57.0
  • Update Toil to v7.0
  • Fix bug where trim-outgroups job could request way too little memory when there are no outgroups
  • Fix typo that broke cactus-maf2bigmaf on uncompressed inputs
  • More robust implementation of vcfwave
  • Fix bug where RED preprocessing crashed when awk returned a number in scientific notation
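
The scientific-notation issue in the last item is easy to reproduce in general: a string such as `1.2345e+04` is a valid number but not a valid integer literal. The sketch below shows one tolerant way to parse such a value. It is an illustration only, and `parse_count` is a hypothetical helper, not the fix applied in Cactus.

```python
def parse_count(text):
    """Parse a whole number printed either as '12345' or as '1.2345e+04'."""
    text = text.strip()
    try:
        return int(text)  # the common case: plain decimal digits
    except ValueError:
        value = float(text)  # accepts scientific notation
        if value != int(value):
            raise ValueError(f"not a whole number: {text!r}")
        return int(value)


print(parse_count("12345"))
print(parse_count("1.2345e+04"))
```

Going through `float` loses precision for very large values, so a stricter parser (for example one based on `decimal.Decimal`) may be preferable where exact counts matter.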

Cactus 2.8.2 2024-05-09

09 May 13:29
2126683

Cactus 2.8.2 is available in the following forms:

Release Notes

This release fixes some bugs and adds a (docker-only) vcfwave normalization option for pangenomes.

  • Use correct bigChain.as that allows chain scores to be huge (instead of capping them)
  • Update odgi, vg, abPOA and taffy to their latest releases
  • Fix cactus-hal2maf and cactus-pangenome --odgi to work with --binariesMode docker
  • bcftools norm -f now run by default on all non-raw VCF outputs (toggle off in the config)
  • vcfwave normalization option added for pangenomes (to mimic what was done for release HPRC graphs). Note that vcfwave is not included in the binary release -- you need to use the Cactus docker or build it yourself.
  • Minigraph fasta file renamed from .gfa.fa to .sv.gfa.fa to be less confusing
  • Gap and empty MAF block filtering moved from cactus-hal2maf to cactus-maf2bigmaf, so MAF output will now have a reference base for every position.
  • Fix cactus-preprocess to do only RED masking by default (there was previously a bug where it ran RED then lastz after). The --maskMode option is also fixed to work properly.
  • Update to Toil 6.1.0

Cactus 2.8.1 2024-04-04

04 Apr 20:47
28a6e2c

Cactus 2.8.1 is available in the following forms:

Release Notes

This release patches some recent bugs, including a major bug introduced in cactus-hal2maf in v2.7.2 that could produce negative-stranded (and out of order) reference rows.

  • Do not apply RED masker to contigs that are likely to crash it (tiny contigs and extremely low information contigs)
  • Add --coverage option to cactus-hal2maf to include table of coverage statistics in the output
  • Fix bug where :start-end contig suffixes caused the pangenome pipeline to crash. They are now correctly handled as subranges
  • Turn off abPOA seeding by default, after finding (what must be a fairly rare) case where it doesn't work.
  • Improve cactus-hal2chains interface
  • Add range support to cactus-hal2maf via --start/--length or --bedRanges
  • Deprecate cactus-maf2bigmaf --chromSizes (use --halFile instead, as it handles "."s in genome names properly)
  • Fix bug where reference row could be lost in cactus-hal2maf MAF due to a sorting error.
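
To make the `:start-end` suffix handling mentioned above concrete, here is a minimal sketch of splitting a contig name into a base name and a subrange. The helper `split_subrange` is hypothetical, and the sketch makes no claim about how Cactus interprets the coordinates (for example 0-based versus 1-based).

```python
import re

# A trailing ":<digits>-<digits>" is treated as a subrange suffix.
_SUBRANGE = re.compile(r"^(?P<name>.+):(?P<start>\d+)-(?P<end>\d+)$")


def split_subrange(contig):
    """Return (name, start, end); start and end are None without a suffix."""
    match = _SUBRANGE.match(contig)
    if match is None:
        return contig, None, None
    return match.group("name"), int(match.group("start")), int(match.group("end"))


print(split_subrange("chr1:100-200"))
print(split_subrange("chr1"))
```

Anchoring the pattern at the end of the string means only the last suffix is peeled off, so names that contain other punctuation are left intact.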

Cactus 2.8.0 2024-03-13

13 Mar 17:16
7286b49

Cactus 2.8.0 is available in the following forms:

Release Notes

This release significantly changes the preprocessor step of Progressive Cactus in order to be more robust and efficient in the presence of unmasked repeats, something that seems more prevalent with newer, T2T assemblies.

  • Replace lastz repeatmasking with REpeat Detector (RED) in the Progressive Cactus preprocessor. RED is more sensitive and orders of magnitude faster than the old lastz masking pipeline. Crucially, it is able to mask regions that would slip by RepeatMasker/WindowMasker/lastz in new T2T ape genomes that would otherwise break Cactus downstream. Tests so far show this change to make Cactus much faster and more robust. The old lastz pipeline can still be toggled back on in the config.
  • Delete many unneeded files that previously collected in the jobstore directory until the end of execution. This was a particular issue in large cactus-pangenome runs where the jobstore would creep up to several terabytes for HPRC-sized inputs.
  • No longer require manually editing the blast chunksize in the config when running on Slurm (to reduce the number of jobs). It's now scaled up automatically in Slurm environments (by a factor controlled in the config).
  • Fix bug introduced in last release where Cactus would not work on AWS/MESOS clusters unless --defaultMemory and --maxMemory options were specified (and in bytes).
  • Update to the latest taffy and vg

Cactus 2.7.2 2024-02-23

23 Feb 18:08
41f4a3d

2024/03/11 NOTE: this version does not work on AWS clusters -- use the previous or next release if specifying --batchSystem mesos --provisioner aws

Cactus 2.7.2 is available in the following forms:

Release Notes

This release improves MAF output, along with some other fixes.

  • --maxMemory option given more teeth. It is now used to clamp most large Toil jobs. On single-machine it defaults to system memory. This should prevent errors where Toil requests more memory than available, halting the pipeline in an un-resumable state.
  • Update to latest taffy and use newer MAF normalization. This should result in larger blocks and fewer gaps. MAF rows will now be sorted phylogenetically rather than alphabetically.
  • Better handle . characters in genome names during MAF processing. Previously neither duplicate filtering nor bigmaf summary creation could handle dots, but that should be fixed now.
  • Duplicate filtering now done automatically in cactus-maf2bigmaf.
  • Disable support for multifurcations (aka polytomies or internal nodes with more than 2 children) in Progressive Cactus. I'm doing this because I got spooked by a drop in coverage I noticed recently in a 4-child alignment. This regression appears to be linked to the new PAF chaining logic that's been added over the past several months. Until that's resolved, Cactus will exit with an error if it sees degree > 2 in the tree. This behaviour can, however, be overridden in the XML configuration file.
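
As an illustration of the multifurcation check described in the last item, the sketch below finds the largest number of children of any internal node in a Newick string. It is a simplified stand-in, not the Cactus implementation: `max_children` is a hypothetical name, and the parser only handles parentheses, commas and single-quoted labels.

```python
def max_children(newick):
    """Return the largest child count of any internal node in a Newick tree."""
    stack = []        # one comma counter per currently open '('
    largest = 0
    in_quote = False  # commas and parentheses inside quoted labels are ignored
    for ch in newick:
        if ch == "'":
            in_quote = not in_quote
        elif in_quote:
            continue
        elif ch == "(":
            stack.append(0)
        elif ch == "," and stack:
            stack[-1] += 1
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced parentheses in Newick string")
            largest = max(largest, stack.pop() + 1)
    return largest


print(max_children("((human,chimp),gorilla);"))
print(max_children("((human,chimp,gorilla,orang),gibbon);"))
```

A tree passes the "degree > 2" test described above when this value is at most 2.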

Cactus 2.7.1 2024-01-19

19 Jan 20:14
1771495

Cactus 2.7.1 is available in the following forms:

Release Notes

This release adds some options to tune outgroup selection and updates many included dependencies and tools.

  • Add --chromInfo option to specify sex chromosomes of input genomes, in order to make sure outgroups are selected accordingly
  • Add --maxOutgroups option so that the number of outgroups can be toggled via the command line (previously required using a modified configuration file).
  • Update to Toil 6.0.0
  • Update to vg 1.54.0
  • Update to odgi 0.8.4
  • Update to latest taffy (fixing bug in paf export)
  • Update to abPOA 1.5.1
  • Fix Dockerfile so that Phast binaries are included
  • --indexMemory now acts as upper limit on chromosome-level jobs.
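
An option acting as an upper limit on a job's memory request amounts to a simple clamp. The sketch below is a generic illustration under that assumption; `clamp_memory` is a hypothetical helper and does not reproduce how Cactus or Toil computes its estimates.

```python
GIB = 1024 ** 3


def clamp_memory(estimate_bytes, limit_bytes=None):
    """Cap a per-job memory estimate at an optional user-supplied limit."""
    if limit_bytes is None:
        return estimate_bytes  # no limit given: keep the estimate
    return min(estimate_bytes, limit_bytes)


# An estimate above the limit is reduced; one below it is left unchanged.
print(clamp_memory(300 * GIB, 128 * GIB) // GIB)
print(clamp_memory(64 * GIB, 128 * GIB) // GIB)
```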

Cactus 2.7.0 2023-12-05

05 Dec 23:51
48410bd

Cactus 2.7.0 is available in the following forms:

Release Notes

This release changes how outgroups are used when chaining during progressive alignment, and adds some pangenome options.

  • Add --xg option for xg pangenome output.
  • Add (experimental) cactus-pangenome --noSplit option in order to bypass reference chromosome splitting. This was previously only possible by running step-by-step and not using cactus-graphmap-join.
  • Add pangenome tutorial (developed for recent hackathon) to documentation.
  • Update to vg version 1.53.0.
  • Updated local alignment selection criteria. At each internal node of the guide tree, Cactus picks a set of pairwise local alignments between the genomes being aligned to construct an initial sequence graph representing the whole genome alignment. This sequence graph is then refined and an ancestral sequence inferred to complete the alignment process for the internal node. The pairwise local alignments are generated with LASTZ (or SegAlign if using the GPU mode). To create a reliable subset of local alignments, Cactus employs a chaining process that organizes the pairwise local alignments into pairwise chains of syntenic alignments, using a process akin to the chains and nets procedure used by the UCSC Browser. Previously, each genome being aligned, including both ingroup and outgroup genomes, was used to select a set of primary chains. That is, for each genome sequence, non-overlapping chains of pairwise alignments were chosen, each of which could be to any of the other genomes in the set. Only these primary chains were then fed into the Cactus process to construct the sequence graph. This heuristic works reasonably well: in effect, it allows each subsequence to choose a sequence in another genome with which it shares a most recent common ancestor. In the new, updated version we tweak this process slightly to avoid rare edge cases. Now each sequence in each ingroup genome picks primary chains only to other ingroup genomes. Thus the set of primary chains for ingroup genomes does not include any outgroup alignments. The outgroup genomes then get to pick primary chains to the ingroups, effectively voting on which parts of the ingroups they are syntenic to. The result of this change is that the outgroups are effectively only used to determine ancestral orderings and do not ever prevent the syntenic portions of two ingroups from aligning together.
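
The difference between the old and new primary-chain selection can be sketched in a few lines. Everything here is an illustrative assumption rather than the Cactus implementation: the `Chain` record, the function names, and in particular the greedy best-score-first rule used to pick non-overlapping chains (the note above only says that non-overlapping chains are chosen).

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Chain:
    query_genome: str
    query_seq: str
    start: int  # half-open interval [start, end) on the query sequence
    end: int
    target_genome: str
    score: int


def overlaps(a, b):
    return a.start < b.end and b.start < a.end


def pick_primary(chains):
    """Greedily keep the best-scoring chains that do not overlap on a query."""
    chosen = defaultdict(list)
    for chain in sorted(chains, key=lambda c: c.score, reverse=True):
        key = (chain.query_genome, chain.query_seq)
        if not any(overlaps(chain, other) for other in chosen[key]):
            chosen[key].append(chain)
    return [c for group in chosen.values() for c in group]


def pick_primary_new(chains, ingroups):
    """New scheme: every query may only pick chains whose target is an ingroup."""
    return pick_primary([c for c in chains if c.target_genome in ingroups])


chains = [
    Chain("A", "chr1", 0, 100, "O", 900),  # ingroup A to outgroup O
    Chain("A", "chr1", 0, 100, "B", 800),  # ingroup A to ingroup B
    Chain("O", "chr1", 0, 100, "A", 900),  # outgroup O votes on ingroup A
]
old = pick_primary(chains)
new = pick_primary_new(chains, ingroups={"A", "B"})
print([c.target_genome for c in old if c.query_genome == "A"])
print([c.target_genome for c in new if c.query_genome == "A"])
```

In this toy input the old rule lets the higher-scoring outgroup chain displace the A-to-B chain on A's chr1, while the new rule keeps the ingroup-to-ingroup chain and still lets the outgroup pick its chain to A.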

Cactus 2.6.13 2023-11-15

15 Nov 15:12
d6fa7c4

Cactus 2.6.13 is available in the following forms:

Release Notes

This release fixes an issue where Toil can ask for way too much memory for minigraph construction.

  • Cut default minigraph construction memory estimate by half
  • Add --mgMemory option to override minigraph construction memory estimate no matter what
  • Exit with a clear error message (instead of more cryptic crash) when user tries to run container binaries in a container
  • Fix double Toil delete that seems to cause fatal error in some environments
  • Fix gfaffix regular expression bug that could cause paths other than the reference to be protected from collapse.

Cactus 2.6.12 2023-11-07

07 Nov 18:23
368b8df

Cactus 2.6.12 is available in the following forms:

Release Notes

This release fixes some recent regressions:

  • Include more portable (at least on Ubuntu) gfaffix binary.
  • Fix error where gpu support on singularity is completely broken.
  • Fix export_hal and export_vg job memory estimates when --consMemory not provided.