A ComfyUI custom node that enables multi-artist mixing for the Anima model by hooking into its cross-attention layers.
Anima uses an LLM as its text encoder. When multiple artist tags are stacked in a single prompt, the LLM's contextualization causes them to interfere with each other, producing a conditioning that resembles neither artist clearly. This node encodes each artist independently and mixes them at the model's cross-attention layer, sidestepping the interference at the prompt-encoding stage.
The bundled AnimaArtistPack node provides a one-shot experience: write your artist list (separated by commas or newlines) in one text box, your main prompt in another, and the node handles splitting, encoding, and packaging automatically.
The current release (v24) adds layered cross-seed stabilizers, CFG-style strength extrapolation, and a new linear injection-layer weight syntax ::name::weight.
- Full documentation — usage, parameters, modes, stabilizers, performance tips
- Issues — bug reports, feature requests
- Discussions — usage questions, results sharing
Clone or download into your ComfyUI custom_nodes directory:
ComfyUI/custom_nodes/<this-plugin-folder>/
Restart ComfyUI. No extra dependencies.
- Anima model only: depends on Anima's built-in `LLMAdapter(preprocess_text_embeds)`
- Use the same CLIP loader that Anima's own text-encoding workflow uses (the one whose tokens carry `t5xxl_ids`)
- Inference only
                              ┌──► artist_pack ──► AnimaArtistCrossAttn ──► MODEL ──► KSampler
    [Load CLIP] ─► CLIP ──────┤                             │                            │
                              │                             └──► base_prompt ──► (positive)
                              │
                              └──► CLIPTextEncode (Negative) ──► (negative)

    [Load Anima Model] ──► MODEL ──► AnimaArtistCrossAttn

    (optional) AnimaArtistOptions ──► advanced_options ──► AnimaArtistCrossAttn
- Top text box of `AnimaArtistPack`: your artist chain (comma or newline separated)
- Bottom text box: the main prompt (no need to repeat artist names here)
- Wire `AnimaArtistCrossAttn`'s `base_prompt` output directly to KSampler's positive input
For full parameter explanations and recommended combinations, see docs/USAGE.md.
combine_mode = output_avg
fusion_mode = interpolate
strength = 1.0
To weight individual artists within the chain, use either of two syntaxes (they can coexist and stack):
wlop, ::sakimichan::1.2, (krenz:0.7)
- `(name:1.2)`: CLIP-side weighting (same as SD/A1111), non-linear, applied at text encoding
- `::name::1.2`: injection-side weighting (v24), linear and predictable, applied at cross-attention output
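
A minimal sketch of how an artist chain could be split and its two weight syntaxes read. `parse_artist_chain`, `ArtistEntry`, and both regular expressions are hypothetical and written only for illustration; they are not the node's code. The sketch also assumes that "stacking" means an injection-side wrapper around a CLIP-side one, such as `::(name:1.2)::0.8`, which is a guess about the grammar.

```python
import re
from dataclasses import dataclass

# Hypothetical patterns for the two syntaxes described above.
_INJECT = re.compile(r"^::(.+)::\s*([0-9]*\.?[0-9]+)$")   # ::name::1.2
_CLIP = re.compile(r"^\((.+):\s*([0-9]*\.?[0-9]+)\)$")    # (name:1.2)


@dataclass
class ArtistEntry:
    name: str
    clip_weight: float = 1.0    # applied at text encoding
    inject_weight: float = 1.0  # applied at cross-attention output


def parse_artist_chain(text: str) -> list:
    """Split on commas or newlines and read the optional weights of each entry."""
    entries = []
    for raw in re.split(r"[,\n]", text):
        token = raw.strip()
        if not token:
            continue  # skip empty fragments such as a trailing comma
        clip_weight = 1.0
        inject_weight = 1.0
        # Peel the outer injection-side wrapper first (assumed nesting order).
        match = _INJECT.match(token)
        if match:
            token, inject_weight = match.group(1).strip(), float(match.group(2))
        match = _CLIP.match(token)
        if match:
            token, clip_weight = match.group(1).strip(), float(match.group(2))
        entries.append(ArtistEntry(token, clip_weight, inject_weight))
    return entries


chain = parse_artist_chain("wlop, ::sakimichan::1.2, (krenz:0.7)")
```

Here `chain` holds three entries: `wlop` with both weights at 1.0, `sakimichan` with an injection-side weight of 1.2, and `krenz` with a CLIP-side weight of 0.7.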
In multi-artist setups, the same prompt with different seeds tends to produce noticeably different style mixes — sometimes one artist dominates, other times another, even at equal weights. This is structural to how cross-attention interacts with seed-driven hidden state.
v24 provides four optional stabilizers via AnimaArtistOptions, ordered from light to heavy:
| Stabilizer | Strength | Notes |
|---|---|---|
| `artist_ema_alpha` | light | Temporal EMA across sampling steps |
| `combine_mode = lowrank_avg` + `lowrank_k` | medium | SVD low-rank constraint on multi-artist deltas |
| `artist_static_capture` + `static_capture_k` | heavy | Freeze artist attention after K warmup steps (also a 30-50% speedup) |
| `artist_anchor_q` | heaviest | Replace user-seed Q with a fixed-seed anchor's Q (near-full cross-seed decoupling) |
All are off by default. Recommended progression: start with EMA, escalate as needed. See docs/USAGE.md for detailed mechanics and tuning.
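
To illustrate the lightest stabilizer, here is a sketch of a temporal EMA over per-step artist outputs. It assumes `artist_ema_alpha` is the weight given to the newest step; the node's actual convention, and which tensor it smooths, may differ. `EmaSmoother` is a hypothetical name, and plain Python lists stand in for tensors.

```python
class EmaSmoother:
    """Exponential moving average across sampling steps (illustrative only)."""

    def __init__(self, alpha: float):
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self.state = None  # no history before the first sampling step

    def update(self, value):
        """Blend the current step's output with the running average."""
        if self.state is None:
            self.state = list(value)  # first step passes through unchanged
        else:
            a = self.alpha
            self.state = [a * v + (1.0 - a) * s for v, s in zip(value, self.state)]
        return list(self.state)


smoother = EmaSmoother(alpha=0.5)
first = smoother.update([0.0, 2.0])
second = smoother.update([2.0, 0.0])
```

Under this convention a smaller alpha leans harder on history, which damps step-to-step swings in the artist signal at the cost of slower response.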
strength accepts values in [0, 4]:
- `0 ~ 1`: interpolation between base and artist (`strength=1` = pure artist replacement)
- `1 ~ 4`: CFG-style extrapolation: `out = base + strength * (artist - base)`, amplifying the artist's deviation from base for stronger style
1.5 ~ 2.5 is a common range for "stronger style without breaking content"; pushing past 3 tends to oversaturate.
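
The same expression covers both ranges, as this small sketch shows. `apply_strength` is a hypothetical helper, and plain floats stand in for cross-attention outputs.

```python
def apply_strength(base, artist, strength):
    """out = base + strength * (artist - base), applied element-wise."""
    if not 0.0 <= strength <= 4.0:
        raise ValueError("strength is documented as accepting values in [0, 4]")
    return [b + strength * (a - b) for b, a in zip(base, artist)]


base = [1.0]
artist = [3.0]
half = apply_strength(base, artist, 0.5)     # halfway between base and artist
full = apply_strength(base, artist, 1.0)     # pure artist replacement
pushed = apply_strength(base, artist, 2.0)   # overshoots the artist output
```

With these inputs the results are `[2.0]`, `[3.0]`, and `[5.0]`: past `strength = 1` the output moves beyond the artist along the same direction rather than stopping at it.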
Generation time scales with artist count. Per the math of output_avg, each layer runs N + 1 cross-attention forwards (N artists + base). Approximate measured cost (varies by GPU):
| Configuration | Relative time |
|---|---|
| 1 artist | 1.0x |
| 4 artists | ~1.4x |
| 8 artists | ~1.7x |
| 5 artists + `artist_static_capture` (K=6) | ~1.1x |
| 5 artists + `artist_anchor_q` (cached) | ~1.05x |
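
The N + 1 count can be seen in a sketch of an output-averaging combine. This assumes `output_avg` means "run cross-attention once per artist conditioning, then take the plain mean of the outputs"; that reading is inferred from the mode's name and the forward count, not from the node's code. `cross_attn` is a stand-in callable and `output_avg_sketch` is a hypothetical name.

```python
def output_avg_sketch(cross_attn, base_cond, artist_conds):
    """Return (base_out, mean_artist_out) using N + 1 cross-attention forwards."""
    if not artist_conds:
        raise ValueError("need at least one artist conditioning")
    base_out = cross_attn(base_cond)                      # 1 forward for the base prompt
    artist_outs = [cross_attn(c) for c in artist_conds]   # N forwards, one per artist
    n = len(artist_outs)
    mean_out = [
        sum(out[i] for out in artist_outs) / n
        for i in range(len(base_out))
    ]
    return base_out, mean_out


# Toy stand-in that doubles its input and counts how often it is called.
calls = []

def fake_cross_attn(cond):
    calls.append(cond)
    return [2.0 * x for x in cond]

base_out, mean_out = output_avg_sketch(fake_cross_attn, [1.0], [[1.0], [3.0]])
```

With two artists the stand-in is called three times, which is why cost grows with artist count unless one of the caching stabilizers removes the per-step artist forwards.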
Strongly recommended: connect AnimaArtistOptions and limit either the layer range (start_block / end_block) or the sampling-step range (start_percent / end_percent). Both can dramatically reduce generation time with minimal quality loss, and stack with the cache-based stabilizers above. See the docs for details.
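
A sketch of how the two ranges could gate the hook. It assumes block indices are inclusive at both ends and that `start_percent` / `end_percent` are fractions of the sampling schedule in [0, 1]; neither convention is confirmed here. `hook_is_active` is a hypothetical helper.

```python
def hook_is_active(
    block_idx,
    step_percent,
    start_block=0,
    end_block=None,      # None means "no upper bound" in this sketch
    start_percent=0.0,
    end_percent=1.0,
):
    """True when both the layer range and the sampling-step range include this call."""
    in_blocks = block_idx >= start_block and (end_block is None or block_idx <= end_block)
    in_steps = start_percent <= step_percent <= end_percent
    return in_blocks and in_steps


# Example: mix artists only in blocks 4..12 and only during the first 60% of sampling.
active = hook_is_active(8, 0.3, start_block=4, end_block=12, end_percent=0.6)
skipped = hook_is_active(8, 0.9, start_block=4, end_block=12, end_percent=0.6)
```

Wherever the gate returns `False`, the layer would presumably fall back to its ordinary single cross-attention forward, which is where the time saving comes from.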
This node cannot achieve the near-lossless artist mixing that SDXL does. Anima's text encoder is non-linear, so any mixing strategy introduces some distortion. What this node does is make that distortion controllable. Style-similar artists mix well; style-divergent artists may "regress to the mean" into a compromise look — lowrank_avg accepts more of this regression in exchange for cross-seed stability.
Special thanks to 汐浮尘/utowo for co-development, testing, and design contributions. The AnimaArtistPack split-and-encode design comes from their improvement.
MIT License. See LICENSE for the full text.

