Releases: ComparativeGenomicsToolkit/cactus
Cactus 2.8.4 2024-06-21
Cactus 2.8.4 is available in the following forms:
- Docker Image: quay.io/comparative-genomics-toolkit/cactus:v2.8.4
  GPU-accelerated Docker Image: quay.io/comparative-genomics-toolkit/cactus:v2.8.4-gpu
  Install instructions in README.md
- Pre-compiled Binaries Linux Tarball: cactus-bin-v2.8.4.tar.gz
  Install instructions in BIN-INSTALL.md
- Pre-compiled Binaries For Older CPU Architectures Linux Tarball: cactus-bin-legacy-v2.8.4.tar.gz
  Install instructions in BIN-INSTALL.md
- Source Tarball: cactus-v2.8.4.tar.gz
  Install instructions in README.md
WARNING: do not use the GitHub automatically generated source files (Source code (zip) or Source code (tar.gz)); these are not correct.
The Docker images and binaries linked above are built using AVX2 extensions, and require a CPU that supports them, except the "Pre-compiled Binaries For Older CPU Architectures" which should be compatible with any 64-bit architecture (and, since version 2.3.1, support Cactus's pangenome pipeline).
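One way to decide between the regular and the "Older CPU Architectures" tarball is to look at the CPU flags that x86 Linux reports in /proc/cpuinfo. The sketch below is only an illustration under that assumption: has_avx2 and pick_tarball are hypothetical helper names, not part of Cactus, and the file names simply follow the naming pattern of the tarballs listed for this release.

```python
def has_avx2(cpuinfo_text):
    """Return True if any 'flags' line of /proc/cpuinfo-style text lists avx2."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags" and "avx2" in value.split():
            return True
    return False


def pick_tarball(version, cpuinfo_text):
    """Hypothetical helper: name the binary tarball that matches this CPU."""
    prefix = "cactus-bin" if has_avx2(cpuinfo_text) else "cactus-bin-legacy"
    return f"{prefix}-v{version}.tar.gz"
```

On a Linux machine this could be called as pick_tarball("2.8.4", open("/proc/cpuinfo").read()).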
Please subscribe to the cactus-announce low-volume mailing list to receive notice of Cactus release.
Release Notes
This release updates vcfbub in order to fix a longstanding issue where this tool can produce invalid VCFs.
- vcfbub updated to v0.1.1, which resolves a bug where records could be missing columns in the presence of "." genotypes
- Run bcftools view as a sanity check on generated VCFs to prevent various normalization steps from ever silently producing invalid output.
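To make the kind of invalid output described above concrete, here is a minimal sketch that flags VCF records with fewer tab-separated columns than the #CHROM header line. It is not how bcftools view or vcfbub validate files; find_short_records is a hypothetical name used only for illustration.

```python
def find_short_records(vcf_lines):
    """Return (line_number, n_columns) for records shorter than the header."""
    expected = None
    problems = []
    for number, line in enumerate(vcf_lines, start=1):
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        if line.startswith("#CHROM"):
            expected = len(line.split("\t"))
            continue
        if expected is None:
            raise ValueError("record found before the #CHROM header line")
        n_columns = len(line.split("\t"))
        if n_columns < expected:
            problems.append((number, n_columns))
    return problems
```

An empty result means every record has at least as many columns as the header declares.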
Cactus 2.8.3 2024-06-12
Cactus 2.8.3 is available in the following forms:
- Docker Image: quay.io/comparative-genomics-toolkit/cactus:v2.8.3
  GPU-accelerated Docker Image: quay.io/comparative-genomics-toolkit/cactus:v2.8.3-gpu
  Install instructions in README.md
- Pre-compiled Binaries Linux Tarball: cactus-bin-v2.8.3.tar.gz
  Install instructions in BIN-INSTALL.md
- Pre-compiled Binaries For Older CPU Architectures Linux Tarball: cactus-bin-legacy-v2.8.3.tar.gz
  Install instructions in BIN-INSTALL.md
- Source Tarball: cactus-v2.8.3.tar.gz
  Install instructions in README.md
WARNING: do not use the GitHub automatically generated source files (Source code (zip) or Source code (tar.gz)); these are not correct.
Note: The gpu docker image was built using this patch that bumps the Ubuntu version from 20.04 to 22.04.
The Docker images and binaries linked above are built using AVX2 extensions, and require a CPU that supports them, except the "Pre-compiled Binaries For Older CPU Architectures" which should be compatible with any 64-bit architecture (and, since version 2.3.1, support Cactus's pangenome pipeline).
Please subscribe to the cactus-announce low-volume mailing list to receive notice of Cactus release.
Important note about installing on Python3.8: You may need to run python3 -m pip install backports.zoneinfo
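For context, the zoneinfo module joined the Python standard library in 3.9, so on Python 3.8 it is only available through the backports.zoneinfo package. Code that needs it on both versions commonly uses a fallback import like the sketch below; this is a general pattern and not a claim about how Cactus or Toil import it.

```python
try:
    import zoneinfo  # standard library from Python 3.9 onwards
except ImportError:
    # Python 3.8: requires `python3 -m pip install backports.zoneinfo`
    from backports import zoneinfo
```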
Release Notes
This release fixes some bugs and updates to the latest Toil.
- Fix broken --restart option in cactus-graphmap
- Raise Toil job memory requirement for filter-paf-deletions
- Update to vg v1.57.0
- Update Toil to v7.0
- Fix bug where trim-outgroups job could request way too little memory when there are no outgroups
- Fix typo that broke cactus-maf2bigmaf on uncompressed inputs
- More robust implementation of vcfwave
- Fix bug where RED preprocessing crashed when awk returned a number in scientific notation
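The last item is a common pitfall: a tool such as awk may print a large count as 1e+06, which a plain integer parser rejects. The sketch below shows one tolerant way to parse such a value. It is an assumption-laden illustration rather than the actual Cactus fix, and parse_count is a hypothetical name.

```python
from decimal import Decimal


def parse_count(text):
    """Parse a whole number printed either as '1234' or as '1.5e+03'."""
    value = Decimal(text.strip())
    if value != value.to_integral_value():
        raise ValueError(f"not a whole number: {text!r}")
    return int(value)
```

Decimal is used instead of float so that large values are converted exactly.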
Cactus 2.8.2 2024-05-09
Cactus 2.8.2 is available in the following forms:
- Docker Image: quay.io/comparative-genomics-toolkit/cactus:v2.8.2
  GPU-accelerated Docker Image: quay.io/comparative-genomics-toolkit/cactus:v2.8.2-gpu
  Install instructions in README.md
- Pre-compiled Binaries Linux Tarball: cactus-bin-v2.8.2.tar.gz
  Install instructions in BIN-INSTALL.md
- Pre-compiled Binaries For Older CPU Architectures Linux Tarball: cactus-bin-legacy-v2.8.2.tar.gz
  Install instructions in BIN-INSTALL.md
- Source Tarball: cactus-v2.8.2.tar.gz
  Install instructions in README.md
WARNING: do not use the GitHub automatically generated source files (Source code (zip) or Source code (tar.gz)); these are not correct.
The Docker images and binaries linked above are built using AVX2 extensions, and require a CPU that supports them, except the "Pre-compiled Binaries For Older CPU Architectures" which should be compatible with any 64-bit architecture (and, since version 2.3.1, support Cactus's pangenome pipeline).
Please subscribe to the cactus-announce low-volume mailing list to receive notice of Cactus release.
Release Notes
This release fixes some bugs and adds a (docker-only) vcfwave normalization option for pangenomes.
- Use correct bigChain.as that allows chain scores to be huge (instead of capping them)
- Update odgi, vg, abPOA and taffy to their latest releases
- Fix cactus-hal2maf and cactus-pangenome --odgi to work with --binariesMode docker
- bcftools norm -f now run by default on all non-raw VCF outputs (toggle off in the config)
- vcfwave normalization option added for pangenomes (to mimic what was done for release HPRC graphs). Note that vcfwave is not included in the binary release -- you need to use the Cactus docker or build it yourself.
- Minigraph fasta file renamed from .gfa.fa to .sv.gfa.fa to be less confusing
- Gap and empty MAF block filtering moved from cactus-hal2maf to cactus-maf2bigmaf. So MAF output will now have a reference base for every position.
- Fix cactus-preprocess to do only RED masking by default (there was previously a bug where it ran RED then lastz after). The --maskMode option is also fixed to work properly.
- Update to Toil 6.1.0
Cactus 2.8.1 2024-04-04
Cactus 2.8.1 is available in the following forms:
- Docker Image: quay.io/comparative-genomics-toolkit/cactus:v2.8.1
  GPU-accelerated Docker Image: quay.io/comparative-genomics-toolkit/cactus:v2.8.1-gpu
  Install instructions in README.md
- Pre-compiled Binaries Linux Tarball: cactus-bin-v2.8.1.tar.gz
  Install instructions in BIN-INSTALL.md
- Pre-compiled Binaries For Older CPU Architectures Linux Tarball: cactus-bin-legacy-v2.8.1.tar.gz
  Install instructions in BIN-INSTALL.md
- Source Tarball: cactus-v2.8.1.tar.gz
  Install instructions in README.md
WARNING: do not use the GitHub automatically generated source files (Source code (zip) or Source code (tar.gz)); these are not correct.
The Docker images and binaries linked above are built using AVX2 extensions, and require a CPU that supports them, except the "Pre-compiled Binaries For Older CPU Architectures" which should be compatible with any 64-bit architecture (and, since version 2.3.1, support Cactus's pangenome pipeline).
Please subscribe to the cactus-announce low-volume mailing list to receive notice of Cactus release.
Release Notes
This release patches some recent bugs, including a major bug introduced in cactus-hal2maf in v2.7.2 that could produce negative-stranded (and out of order) reference rows.
- Do not apply RED masker to contigs that are likely to crash it (tiny contigs and extremely low information contigs)
- Add --coverage option to cactus-hal2maf to include table of coverage statistics in the output
- Fix bug where :start-end contig suffixes caused the pangenome pipeline to crash. They are now correctly handled as subranges
- Turn off abPOA seeding by default, after finding (what must be a fairly rare) case where it doesn't work.
- Improve cactus-hal2chains interface
- Add range support to cactus-hal2maf via --start/--length or --bedRanges
- Deprecate cactus-maf2bigmaf --chromSizes (use --halFile instead, as it handles "."s in genome names properly)
- Fix bug where reference row could be lost in cactus-hal2maf MAF due to sorting error.
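As an illustration of the :start-end contig suffixes mentioned in the list above, the following sketch splits such a name into its base contig and numeric range. It is not the Cactus implementation: split_subrange is a hypothetical helper, and the coordinate convention (0- or 1-based, inclusive or exclusive end) is deliberately left uninterpreted because the notes do not state it.

```python
import re

# A trailing ":<digits>-<digits>" is treated as a subrange suffix.
_SUBRANGE = re.compile(r"^(?P<name>.+):(?P<start>\d+)-(?P<end>\d+)$")


def split_subrange(contig):
    """Split 'name:start-end' into (name, start, end).

    Names without such a suffix are returned as (contig, None, None).
    """
    match = _SUBRANGE.match(contig)
    if match is None:
        return contig, None, None
    return match["name"], int(match["start"]), int(match["end"])
```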
Cactus 2.8.0 2024-03-13
Cactus 2.8.0 is available in the following forms:
- Docker Image: quay.io/comparative-genomics-toolkit/cactus:v2.8.0
  GPU-accelerated Docker Image: quay.io/comparative-genomics-toolkit/cactus:v2.8.0-gpu
  Install instructions in README.md
- Pre-compiled Binaries Linux Tarball: cactus-bin-v2.8.0.tar.gz
  Install instructions in BIN-INSTALL.md
- Pre-compiled Binaries For Older CPU Architectures Linux Tarball: cactus-bin-legacy-v2.8.0.tar.gz
  Install instructions in BIN-INSTALL.md
- Source Tarball: cactus-v2.8.0.tar.gz
  Install instructions in README.md
WARNING: do not use the GitHub automatically generated source files (Source code (zip) or Source code (tar.gz)); these are not correct.
The Docker images and binaries linked above are built using AVX2 extensions, and require a CPU that supports them, except the "Pre-compiled Binaries For Older CPU Architectures" which should be compatible with any 64-bit architecture (and, since version 2.3.1, support Cactus's pangenome pipeline).
Please subscribe to the cactus-announce low-volume mailing list to receive notice of Cactus release.
Release Notes
This release significantly changes the preprocessor step of Progressive Cactus in order to be more robust and efficient in the presence of unmasked repeats, something that seems more prevalent with newer, T2T assemblies.
- Replace lastz repeatmasking with REpeat Detector (RED) in the Progressive Cactus preprocessor. RED is more sensitive and orders of magnitude faster than the old lastz masking pipeline. Crucially, it is able to mask regions that would slip by RepeatMasker/WindowMasker/lastz in new T2T ape genomes that would otherwise break Cactus downstream. Tests so far show this change to make Cactus much faster and more robust. The old lastz pipeline can still be toggled back on in the config.
- Delete many unneeded files that previously collected in the jobstore directory until the end of execution. This was a particular issue in large cactus-pangenome runs where the jobstore would creep up to several terabytes for HPRC-sized inputs.
- No longer require manually editing the blast chunksize in the config when running on Slurm (to reduce the number of jobs). It's now scaled up automatically on Slurm environments (by a factor controlled in the config).
- Fix bug introduced in last release where Cactus would not work on AWS/MESOS clusters unless --defaultMemory and --maxMemory options were specified (and in bytes).
- Update to the latest taffy and vg
Cactus 2.7.2 2024-02-23
2024/03/11 NOTE: this version does not work on AWS clusters -- use the previous or next release if specifying --batchSystem mesos --provisioner aws
Cactus 2.7.2 is available in the following forms:
- Docker Image: quay.io/comparative-genomics-toolkit/cactus:v2.7.2
  GPU-accelerated Docker Image: quay.io/comparative-genomics-toolkit/cactus:v2.7.2-gpu
  Install instructions in README.md
- Pre-compiled Binaries Linux Tarball: cactus-bin-v2.7.2.tar.gz
  Install instructions in BIN-INSTALL.md
- Pre-compiled Binaries For Older CPU Architectures Linux Tarball: cactus-bin-legacy-v2.7.2.tar.gz
  Install instructions in BIN-INSTALL.md
- Source Tarball: cactus-v2.7.2.tar.gz
  Install instructions in README.md
WARNING: do not use the GitHub automatically generated source files (Source code (zip) or Source code (tar.gz)); these are not correct.
The Docker images and binaries linked above are built using AVX2 extensions, and require a CPU that supports them, except the "Pre-compiled Binaries For Older CPU Architectures" which should be compatible with any 64-bit architecture (and, since version 2.3.1, support Cactus's pangenome pipeline).
Please subscribe to the cactus-announce low-volume mailing list to receive notice of Cactus release.
Release Notes
This release improves MAF output, along with some other fixes.
- --maxMemory option given more teeth. It is now used to clamp most large Toil jobs. On single-machine it defaults to system memory. This should prevent errors where Toil requests more memory than available, halting the pipeline in an un-resumable state.
- Update to latest taffy and use newer MAF normalization. This should result in larger blocks and fewer gaps. MAF rows will now be sorted phylogenetically rather than alphabetically.
- Better handle "." characters in genome names during MAF processing. Previously neither duplicate filtering nor bigmaf summary creation could handle dots, but that should be fixed now.
- Duplicate filtering now done automatically in cactus-maf2bigmaf.
- Disable support for multifurcations (aka polytomies or internal nodes with more than 2 children) in Progressive Cactus. I'm doing this because I got spooked by a drop in coverage I noticed recently in a 4-child alignment. This regression appears to be linked to the new PAF chaining logic that's been added over the past several months. Until that's resolved, Cactus will exit with an error if it sees degree > 2 in the tree. This behaviour can, however, be overridden in the XML configuration file.
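The --maxMemory behaviour described in the list above amounts to clamping each job's request to an upper bound that falls back to the machine's physical memory. A minimal sketch of that idea follows. It is not Cactus or Toil code: both function names are hypothetical, and the system-memory lookup assumes a Linux host.

```python
import os


def system_memory_bytes():
    """Total physical memory in bytes (relies on Linux sysconf names)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def clamp_memory(requested_bytes, max_memory_bytes=None):
    """Cap a job's memory request; default the cap to system memory."""
    if max_memory_bytes is None:
        max_memory_bytes = system_memory_bytes()
    return min(requested_bytes, max_memory_bytes)
```

With a cap in place, a job that estimates more memory than the machine has is scheduled with the capped value instead of blocking forever.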
Cactus 2.7.1 2024-01-19
Cactus 2.7.1 is available in the following forms:
- Docker Image: quay.io/comparative-genomics-toolkit/cactus:v2.7.1
  GPU-accelerated Docker Image: quay.io/comparative-genomics-toolkit/cactus:v2.7.1-gpu
  Install instructions in README.md
- Pre-compiled Binaries Linux Tarball: cactus-bin-v2.7.1.tar.gz
  Install instructions in BIN-INSTALL.md
- Pre-compiled Binaries For Older CPU Architectures Linux Tarball: cactus-bin-legacy-v2.7.1.tar.gz
  Install instructions in BIN-INSTALL.md
- Source Tarball: cactus-v2.7.1.tar.gz
  Install instructions in README.md
WARNING: do not use the GitHub automatically generated source files (Source code (zip) or Source code (tar.gz)); these are not correct.
The Docker images and binaries linked above are built using AVX2 extensions, and require a CPU that supports them, except the "Pre-compiled Binaries For Older CPU Architectures" which should be compatible with any 64-bit architecture (and, since version 2.3.1, support Cactus's pangenome pipeline).
Please subscribe to the cactus-announce low-volume mailing list to receive notice of Cactus release.
Release Notes
This release adds some options to tune outgroup selection, as well as updates many included dependencies and tools
- Add --chromInfo option to specify sex chromosomes of input genomes, in order to make sure outgroups are selected accordingly
- Add --maxOutgroups option so that the number of outgroups can be toggled via the command line (previously required using a modified configuration file).
- Update to Toil 6.0.0
- Update to vg 1.54.0
- Update to odgi 0.8.4
- Update to latest taffy (fixing bug in paf export)
- Update to abPOA 1.5.1
- Fix Dockerfile so that Phast binaries are included
- --indexMemory now acts as upper limit on chromosome-level jobs.
Cactus 2.7.0 2023-12-05
Cactus 2.7.0 is available in the following forms:
- Docker Image: quay.io/comparative-genomics-toolkit/cactus:v2.7.0
  GPU-accelerated Docker Image: quay.io/comparative-genomics-toolkit/cactus:v2.7.0-gpu
  Install instructions in README.md
- Pre-compiled Binaries Linux Tarball: cactus-bin-v2.7.0.tar.gz
  Install instructions in BIN-INSTALL.md
- Pre-compiled Binaries For Older CPU Architectures Linux Tarball: cactus-bin-legacy-v2.7.0.tar.gz
  Install instructions in BIN-INSTALL.md
- Source Tarball: cactus-v2.7.0.tar.gz
  Install instructions in README.md
WARNING: do not use the GitHub automatically generated source files (Source code (zip) or Source code (tar.gz)); these are not correct.
The Docker images and binaries linked above are built using AVX2 extensions, and require a CPU that supports them, except the "Pre-compiled Binaries For Older CPU Architectures" which should be compatible with any 64-bit architecture (and, since version 2.3.1, support Cactus's pangenome pipeline).
Please subscribe to the cactus-announce low-volume mailing list to receive notice of Cactus release.
Release Notes
This release changes how outgroups are used during chaining in progressive alignment, and adds some pangenome options.
- Add --xg option for xg pangenome output.
- Add (experimental) cactus-pangenome --noSplit option in order to bypass reference chromosome splitting. This was previously only possible by running step-by-step and not using cactus-graphmap-join.
- Add pangenome tutorial (developed for recent hackathon) to documentation.
- Update to vg version 1.53.0.
- Updated local alignment selection criteria.
  At each internal node of the guide tree Cactus picks a set of pairwise local alignments between the genomes being aligned to construct an initial sequence graph representing the whole genome alignment. This sequence graph is then refined and an ancestral sequence inferred to complete the alignment process for the internal node. The pairwise local alignments are generated with LASTZ (or SegAlign if using the GPU mode). To create a reliable subset of local alignments Cactus employs a chaining process that organizes the pairwise local alignments into pairwise chains of syntenic alignments, using a process akin to the chains and nets procedure used by the UCSC Browser.
  Previously, each genome being aligned, including both ingroup and outgroup genomes, was used to select a set of primary chains. That is, for each genome sequence non-overlapping chains of pairwise alignments were chosen, each of which could be to any of the other genomes in the set. Only these primary chains were then fed into the Cactus process to construct the sequence graph. This heuristic works reasonably well: in effect, it allows each subsequence to choose a sequence in another genome with which it shares a most recent common ancestor.
  In the new, updated version we tweak this process slightly to avoid rare edge cases. Now each sequence in each ingroup genome picks primary chains only to other ingroup genomes. Thus the set of primary chains for ingroup genomes does not include any outgroup alignments. The outgroup genomes then get to pick primary chains to the ingroups, effectively voting on which parts of the ingroups they are syntenic to. The result of this change is that the outgroups are effectively only used to determine ancestral orderings and do not ever prevent the syntenic portions of two ingroups from aligning together.
Cactus 2.6.13 2023-11-15
Cactus 2.6.13 is available in the following forms:
- Docker Image: quay.io/comparative-genomics-toolkit/cactus:v2.6.13
  GPU-accelerated Docker Image: quay.io/comparative-genomics-toolkit/cactus:v2.6.13-gpu
  Install instructions in README.md
- Pre-compiled Binaries Linux Tarball: cactus-bin-v2.6.13.tar.gz
  Install instructions in BIN-INSTALL.md
- Pre-compiled Binaries For Older CPU Architectures Linux Tarball: cactus-bin-legacy-v2.6.13.tar.gz
  Install instructions in BIN-INSTALL.md
- Source Tarball: cactus-v2.6.13.tar.gz
  Install instructions in README.md
WARNING: do not use the GitHub automatically generated source files (Source code (zip) or Source code (tar.gz)); these are not correct.
The Docker images and binaries linked above are built using AVX2 extensions, and require a CPU that supports them, except the "Pre-compiled Binaries For Older CPU Architectures" which should be compatible with any 64-bit architecture (and, since version 2.3.1, support Cactus's pangenome pipeline).
Please subscribe to the cactus-announce low-volume mailing list to receive notice of Cactus release.
Release Notes
This release fixes an issue where Toil can ask for way too much memory for minigraph construction.
- Cut default minigraph construction memory estimate by half
- Add --mgMemory option to override minigraph construction memory estimate no matter what
- Exit with a clear error message (instead of more cryptic crash) when user tries to run container binaries in a container
- Fix double Toil delete that seems to cause fatal error in some environments
- Fix gfaffix regular expression bug that could cause paths other than the reference to be protected from collapse.
Cactus 2.6.12 2023-11-07
Cactus 2.6.12 is available in the following forms:
- Docker Image: quay.io/comparative-genomics-toolkit/cactus:v2.6.12
  GPU-accelerated Docker Image: quay.io/comparative-genomics-toolkit/cactus:v2.6.12-gpu
  Install instructions in README.md
- Pre-compiled Binaries Linux Tarball: cactus-bin-v2.6.12.tar.gz
  Install instructions in BIN-INSTALL.md
- Pre-compiled Binaries For Older CPU Architectures Linux Tarball: cactus-bin-legacy-v2.6.12.tar.gz
  Install instructions in BIN-INSTALL.md
- Source Tarball: cactus-v2.6.12.tar.gz
  Install instructions in README.md
WARNING: do not use the GitHub automatically generated source files (Source code (zip) or Source code (tar.gz)); these are not correct.
The Docker images and binaries linked above are built using AVX2 extensions, and require a CPU that supports them, except the "Pre-compiled Binaries For Older CPU Architectures" which should be compatible with any 64-bit architecture (and, since version 2.3.1, support Cactus's pangenome pipeline).
Please subscribe to the cactus-announce low-volume mailing list to receive notice of Cactus release.
Release Notes
This release fixes some recent regressions:
- Include more portable (at least on Ubuntu) gfaffix binary.
- Fix error where GPU support on Singularity is completely broken.
- Fix export_hal and export_vg job memory estimates when --consMemory not provided.