ANIMA_BOOSTER (BSS) is a high-performance optimization suite for the Anima DiT 2B model in ComfyUI. It is designed to deliver maximum performance, reduce VRAM usage, and accelerate generation speeds on NVIDIA graphics cards.
Important
Author and Developer: blacksnowskill (BSS) © 2026 blacksnowskill (BSS). All rights reserved. This project is protected by copyright. Any unauthorized copying, modification without attribution, or representing this code as your own product is strictly prohibited.
This package allows you to achieve a total acceleration of 3.5× to 5.0× compared to the default Anima workflow in ComfyUI, with no noticeable loss in visual quality.
Tip
Ultimate Quality & Detail Recovery:
While extreme optimization can sometimes lead to a slight loss in micro-details, we have a perfect solution for that. We highly recommend pairing ANIMA_BOOSTER with our companion node FLSampler (BSS). FLS perfectly restores any lost details without sacrificing your speed gains, producing even sharper and more coherent details than the original unoptimized model!
- Integrated JIT Compilation (`torch.compile`): Safe, one-click compilation of DiT blocks built directly into the loaders. Runs on the stable `inductor` backend without CUDA Graphs, ensuring 100% stability and a speed boost of 20% to 40%.
- SageAttention: Built-in support for ultra-fast 8-bit attention tailored for DiT models, significantly accelerating computations while reducing memory consumption.
- Adaptive TeaCache: Intelligent latent state caching. Dynamically adjusts the caching threshold: automatically lowering it in early steps to preserve geometry, and raising it in later steps for maximum acceleration.
- BSS Premium UI: An integrated, high-contrast dark theme for the package's nodes. Features a fully redesigned full-width slider control, adaptive visibility for inactive preset inputs, and complete suppression of intrusive tooltips for a distraction-free experience.
| Configuration | Average Acceleration | Description |
|---|---|---|
| fp16 Only | 1.4× | Baseline precision optimization |
| fp16 + SageAttention | 1.8–2.5× | Ultra-fast 8-bit attention for DiT |
| fp16 + Adaptive TeaCache | 1.5–2.0× | Intelligent step caching |
| fp16 + SageAttn + TeaCache | 🚀 2.5–3.5× | Perfect balance of speed and quality |
| + Integrated torch.compile | 💎 3.5–5.0× | Maximum performance boost (after a 2-3 step warm-up phase) |
- Open ComfyUI and click on the Manager button.
- Click Install via Git URL.
- Paste this repository URL:
https://github.com/BlackSnowSkill/ANIMA_BOOSTER
- Click Install, wait for the process to complete, and restart ComfyUI.
- Open your terminal and navigate to your ComfyUI custom nodes directory:
cd ComfyUI/custom_nodes
- Clone this repository:
git clone https://github.com/BlackSnowSkill/ANIMA_BOOSTER.git
- Restart ComfyUI.
All nodes are registered under the BSS/AnimaBooster category:
- 📥 Anima Booster Loader (BSS) (class `AnimaBoosterLoader`)
  - Loads the Anima DiT model in the optimized fp16 format.
  - SageAttention: Automatically applies accelerated 8-bit attention if installed in the system. If unavailable, it seamlessly falls back to built-in PyTorch SDPA.
  - Torch Compile: An integrated toggle for safe JIT compilation of individual transformer blocks.
- 🎛️ Anima TeaCache (BSS) (class `AnimaTeaCache`)
  - Implements adaptive latent state caching based on denoising steps (TeaCache).
  - Version Selector (`teacache_version`): Allows you to choose between two modes:
    - `v1 (Legacy Fast)` (Default): Restores the highly requested aggressive caching behavior with a fixed timestep normalizer. Delivers an instant 2.0× speedup out-of-the-box on SDE samplers (such as `er_sde`, `sde gpu`), though it might introduce minor artifacts on Euler A.
    - `v2 (Standard Precise)`: Mathematically precise, dynamic timestep normalization that adapts to any sampler and scheduler. Fully protects early structural steps and guarantees perfect image quality.
- 🖼️ Anima Latent Image (BSS) (class `AnimaLatentImage`)
  - A utility for generating empty latents with automatic size alignment to the Anima DiT patch grid (2x2), preventing tensor dimension mismatch errors. Provides predefined aspect ratio presets.
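The size alignment performed by the latent node can be pictured with a small sketch. It is not the node's actual code: the 8× VAE downscale factor, the round-to-nearest policy, and the function names are assumptions made for illustration. Only the 2x2 patch grid comes from the description above.

```python
def align_to_patch_grid(width: int, height: int,
                        vae_scale: int = 8, patch: int = 2) -> tuple[int, int]:
    """Snap pixel dimensions to the nearest size whose latent fits the patch grid.

    vae_scale=8 is an assumption about the VAE; patch=2 is the 2x2 DiT patch size.
    """
    step = vae_scale * patch  # 16 pixels per latent patch under these assumptions

    def snap(value: int) -> int:
        # Round to the nearest multiple of `step`, never below one full patch.
        return max(step, (value + step // 2) // step * step)

    return snap(width), snap(height)


def latent_size(width: int, height: int, vae_scale: int = 8) -> tuple[int, int]:
    """Latent (width, height) for already aligned pixel dimensions."""
    return width // vae_scale, height // vae_scale


if __name__ == "__main__":
    w, h = align_to_patch_grid(1000, 1500)
    print(w, h, latent_size(w, h))
```

With these assumptions, a request for 1000×1500 becomes 1008×1504, so both latent dimensions are even and split cleanly into 2x2 patches.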
Unlike standard TeaCache implementations with a fixed threshold, the BSS version uses a dynamic adaptive threshold that evolves during the denoising process:
- In early steps (high noise, image structure formation), the threshold is automatically lowered to ensure maximum rendering precision and geometric accuracy.
- In later steps (details have stabilized, micro-texturing takes place), the threshold is raised, allowing up to 80% of block computations to be safely skipped without quality loss.
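A minimal sketch of this idea follows. It is not the BSS implementation: the linear ramp, the default factor values, and the function names are assumptions. The accumulate-and-compare loop mirrors the general TeaCache approach, where a step is skipped while the accumulated relative change stays under the threshold.

```python
def adaptive_threshold(base: float, step: int, total_steps: int,
                       early_steps_factor: float = 0.5,
                       late_steps_factor: float = 1.5) -> float:
    """Scale the base threshold from the early factor up to the late factor.

    The linear ramp and the default factors are illustrative assumptions.
    """
    if total_steps <= 1:
        return base
    progress = step / (total_steps - 1)  # 0.0 on the first step, 1.0 on the last
    factor = early_steps_factor + (late_steps_factor - early_steps_factor) * progress
    return base * factor


def plan_steps(rel_changes: list[float], base_threshold: float = 0.15,
               **factors: float) -> list[bool]:
    """Return True for steps that are fully computed, False for cached steps.

    rel_changes holds a per-step estimate of how much the model input changed.
    """
    total = len(rel_changes)
    accumulated = 0.0
    computed = []
    for step, change in enumerate(rel_changes):
        accumulated += change
        limit = adaptive_threshold(base_threshold, step, total, **factors)
        if step == 0 or accumulated >= limit:
            computed.append(True)   # run the transformer blocks, refresh the cache
            accumulated = 0.0
        else:
            computed.append(False)  # reuse the cached result
    return computed


if __name__ == "__main__":
    plan = plan_steps([0.1] * 10)
    print(plan, "skipped:", plan.count(False))
```

Because the limit grows with the step index, the same per-step change that forces a recompute early on can be absorbed by the cache later in the schedule.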
Tip
Choosing the Timestep Normalization Mode (in v1.3.0):
- If you want uncompromised extreme speed out-of-the-box on SDE samplers and enjoy experimenting, select `v1 (Legacy Fast)`.
- If you are running Euler A, require maximum geometric precision, or want to fine-tune quality using `early_steps_factor` and `late_steps_factor`, select `v2 (Standard Precise)`.
To achieve the best results, connect the nodes in the following sequence:
[ Anima Booster Loader (BSS) ] ── (Enable sage_attention: auto and torch_compile: True)
↓ (MODEL)
[ Anima TeaCache (BSS) ] (Recommended: threshold: 0.15, adaptive: ON)
↓ (MODEL)
[ KSampler ]
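For scripted use, the same chain can be written as a fragment in ComfyUI's API prompt format, where each link is a `[node_id, output_index]` pair. This is a hedged sketch: the class names and the `sage_attention`, `torch_compile`, `threshold`, and `adaptive` settings come from this README, but the `model` input name on the TeaCache node and the boolean form of `adaptive` are assumptions. The loader's model file input and the remaining KSampler inputs are left out on purpose, so this is not a complete prompt.

```python
import json

# Fragment only: inputs not named in this README are intentionally omitted.
workflow_fragment = {
    "1": {
        "class_type": "AnimaBoosterLoader",
        "inputs": {"sage_attention": "auto", "torch_compile": True},
    },
    "2": {
        "class_type": "AnimaTeaCache",
        # "model" as the input name is an assumption.
        "inputs": {"model": ["1", 0], "threshold": 0.15, "adaptive": True},
    },
    "3": {
        "class_type": "KSampler",
        "inputs": {"model": ["2", 0]},
    },
}


def upstream_chain(workflow: dict, node_id: str) -> list[str]:
    """Follow the "model" links backwards and list the node classes visited."""
    chain = []
    while node_id is not None:
        node = workflow[node_id]
        chain.append(node["class_type"])
        link = node["inputs"].get("model")
        node_id = link[0] if link else None
    return chain


if __name__ == "__main__":
    print(json.dumps(workflow_fragment, indent=2))
    print(" <- ".join(upstream_chain(workflow_fragment, "3")))
```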
All resource-heavy optimization libraries (SageAttention and JIT Triton) are entirely optional. The package is designed with Graceful Degradation in mind: if the libraries are not installed, the nodes will automatically disable patches or compile features, transitioning seamlessly to standard PyTorch mechanisms and guaranteeing crash-free execution in ComfyUI.
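The fallback behaviour described above can be sketched as a small feature probe. This is an illustration under assumptions, not the package's code: the function names and the returned dictionary are hypothetical. The module names `triton` and `sageattention` are the import names of the optional libraries.

```python
import importlib.util


def is_available(module_name: str, find_spec=importlib.util.find_spec) -> bool:
    """Report whether an optional module can be imported, without importing it."""
    try:
        return find_spec(module_name) is not None
    except (ImportError, ValueError):
        return False


def select_features(want_sage: bool = True, want_compile: bool = True,
                    find_spec=importlib.util.find_spec) -> dict:
    """Pick the attention backend and compile flag that the environment supports."""
    has_triton = is_available("triton", find_spec)
    has_sage = is_available("sageattention", find_spec)

    # SageAttention needs Triton as well; otherwise fall back to PyTorch SDPA.
    attention = "sageattention" if (want_sage and has_sage and has_triton) else "sdpa"

    compile_enabled = want_compile and has_triton
    if want_compile and not has_triton:
        print("warning: Triton not found, torch.compile disabled")

    return {"attention": attention, "torch_compile": compile_enabled}


if __name__ == "__main__":
    print(select_features())
```

Passing `find_spec` as a parameter keeps the probe easy to test: a fake lookup can simulate a machine with or without the optional libraries.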
Important
Triton is required for both SageAttention and JIT Compilation (torch.compile)!
If you plan to enable torch_compile (which yields up to a 40% speed boost) or use SageAttention on Windows, you must install triton-windows. Without Triton, torch.compile will be safely disabled with a warning in the console to avoid crashes.
Portable ComfyUI builds (which use an isolated python_embeded environment on Python 3.12 or 3.13) do not have C++ compilation tools (MSVC / Build Tools) installed. As a result, the standard pip install sageattention command will fail with a compilation error.
To install it successfully, use precompiled binary packages (.whl wheels):
- Open a command prompt (CMD or PowerShell) in your main ComfyUI folder.
- Install Triton for Windows:
.\python_embeded\python.exe -m pip install triton-windows
- Download the precompiled `.whl` file for SageAttention that matches your Python version (e.g., `cp312` or `cp313`) and CUDA version (e.g., `cu124` / `cu128`):
  - Precompiled builds can be found in the releases of this repository: sdbds/SageAttention-for-windows.
  - Precompiled wheels are also published in the project: wildminder/AI-windows-whl.
- Install the downloaded file into the portable environment:
.\python_embeded\python.exe -m pip install <path_to_downloaded_file.whl>
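If you are unsure which wheel tags to look for, a short script run with the portable interpreter can print them. This is a convenience sketch rather than part of the package. The helper names are hypothetical, and it assumes PyTorch is importable in that environment so that `torch.version.cuda` can be read.

```python
import sys


def python_tag(version_info=sys.version_info) -> str:
    """CPython wheel tag for the running interpreter, e.g. cp312."""
    return f"cp{version_info[0]}{version_info[1]}"


def cuda_tag(cuda_version: str) -> str:
    """Turn a CUDA version string such as '12.4' into a tag such as cu124."""
    major, minor = cuda_version.split(".")[:2]
    return f"cu{major}{minor}"


def detect_cuda_version():
    """CUDA version PyTorch was built with, or None if unavailable."""
    try:
        import torch
    except ImportError:
        return None
    return torch.version.cuda  # None on CPU-only builds


if __name__ == "__main__":
    print("Python tag:", python_tag())
    cuda = detect_cuda_version()
    print("CUDA tag:", cuda_tag(cuda) if cuda else "unknown (PyTorch with CUDA not found)")
```

Save it under any name (for example `wheel_tags.py`, a hypothetical filename) and run it with `.\python_embeded\python.exe wheel_tags.py`.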
On Linux/Ubuntu systems, installing dependencies is much more straightforward than on Windows since compilation tools and build pipelines are natively supported.
Ensure you have the CUDA Toolkit installed and available in your environment (nvcc --version).
SageAttention 2.x is highly recommended for newer GPUs (Ampere, Lovelace, Blackwell, e.g., RTX 30xx, 40xx, 50xx).
- Activate your ComfyUI virtual environment:
source /path/to/ComfyUI/venv/bin/activate
- Install SageAttention directly via PyPI with the `--no-build-isolation` flag to prevent dependency mismatches with PyTorch:
pip install sageattention --no-build-isolation
If PyPI fails, you can compile from source:
git clone https://github.com/thu-ml/SageAttention.git
cd SageAttention
pip install "setuptools<=75.8.2"
pip install --no-build-isolation -e .
If you enable torch_compile in the loader node and encounter a runtime crash in KSampler with Triton raising RuntimeError: PassManager::run failed (specifically under _attn_fwd during Triton compilation on Ubuntu 24.04), this is a known compatibility issue between Triton's code generator, LLVM, and local system compiler tools.
To resolve or bypass this error, try the following solutions:
The easiest and most robust workaround is to simply set torch_compile to False in the Anima Booster Loader (BSS) or Anima Checkpoint Loader (BSS) node.
- Why: The nodes are fully optimized, and you will still get massive acceleration (2.5× to 3.5×) using just SageAttention and TeaCache without invoking Triton's JIT compiler.
You can bypass the failing compilation pass by running ComfyUI with the Triton optimization disable flag. Run ComfyUI like this:
export TRITON_JIT_DISABLE_OPT=1
python main.py
Sometimes Triton's cached kernels get corrupted or miscompiled. Clear the cache directory:
rm -rf ~/.triton/cache
rm -rf ~/.cache/triton
rm -rf /tmp/torchinductor_*
Ensure your Triton version is strictly compatible with PyTorch:
pip install --upgrade --force-reinstall triton
- Model Base: Anima DiT is based on the `MiniTrainDIT` architecture with the `LLMAdapter` wrapper.
- SageAttention Integration Point: The patch is applied via the standard `transformer_options["optimized_attention_override"]` parameter dictionary.
- Torch Compile Integration Point: Invokes the built-in `set_torch_compile_wrapper` function from `comfy_api.torch_helpers` at the level of individual transformer blocks (ensuring LoRA compatibility and reducing compilation overhead).
- Isolation: All graph and weight modifications are performed strictly on a cloned copy of the model (`model.clone()`). This prevents conflicts and leaves the original model in the ComfyUI cache untouched.
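The isolation pattern can be illustrated without ComfyUI installed. In the sketch below, `StubModelPatcher` is a hypothetical stand-in that mimics only the two pieces the pattern relies on (a `model_options` dictionary and `clone()`). The override's calling convention, which receives the original attention function followed by its arguments, is also an assumption.

```python
import copy


class StubModelPatcher:
    """Hypothetical stand-in for ComfyUI's model object, for illustration only."""

    def __init__(self, model_options=None):
        self.model_options = (
            model_options if model_options is not None else {"transformer_options": {}}
        )

    def clone(self):
        # The clone gets its own options so that patches do not leak back.
        return StubModelPatcher(copy.deepcopy(self.model_options))


def passthrough_override(original_attention, *args, **kwargs):
    """Placeholder override: a real one would call an accelerated kernel here."""
    return original_attention(*args, **kwargs)


def apply_attention_override(model, override):
    """Patch a clone and return it, leaving the cached original untouched."""
    patched = model.clone()
    options = patched.model_options.setdefault("transformer_options", {})
    options["optimized_attention_override"] = override
    return patched


if __name__ == "__main__":
    original = StubModelPatcher()
    patched = apply_attention_override(original, passthrough_override)
    print("original:", original.model_options)
    print("patched keys:", list(patched.model_options["transformer_options"]))
```

Returning the patched clone from the node, rather than mutating its input, is what lets other workflows keep using the unmodified model from the cache.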
If you love my work and want to support the development of future optimizations, nodes, and custom models, please consider supporting me:
- Boosty: Support & Exclusive Models
© 2026 blacksnowskill (BSS). All rights reserved.
This software is an experimental release. Feedback is highly welcome. Notice: This project is protected by copyright. Any unauthorized copying, distribution, merging with other projects, or hosting on other repositories/websites without the explicit written permission of the author is strictly prohibited.