v0.8.0: Pillar-attention architecture

danny-mio released this 21 Feb 22:53


Immutable

7048c73

[0.8.0] - 2026-02-21

Added

Pillar-attention architecture (v080/flow.py)
- FluxTransformerBlock_v080: FiLM conditioning and shared pillar cross-attention per transformer block
- FiLM modulation per pillar: gamma, beta = film_pi(text_cond).chunk(2), applied as gate * (1 + gamma) + beta
- Shared pillar_cross_attn (Q = stacked pillar outputs, K/V = text_seq) with n_pillar_heads = _valid_pillar_heads(d_model, max(1, n_head // 4)), which guarantees d_model % n_pillar_heads == 0
- norm_pillar LayerNorm shared across all 4 pillars
- FluxFlowProcessor_v080: adds text_cond_proj linear, passes pooled text conditioning to every block
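
The FiLM step listed above can be illustrated with a minimal plain-Python sketch. This is not the code in v080/flow.py: lists stand in for tensors, and the names film_modulate and film_out are hypothetical. It only shows the arithmetic, assuming the FiLM output is split into two equal halves (gamma first, then beta) the way chunk(2) would split it.

```python
def film_modulate(gate, film_out):
    """Apply FiLM to one pillar's gate vector.

    gate:     list of d floats (hypothetical stand-in for a pillar's gate tensor)
    film_out: list of 2*d floats, standing in for film_pi(text_cond)
    """
    d = len(gate)
    if len(film_out) != 2 * d:
        raise ValueError("film_out must hold gamma and beta: 2 * len(gate) values")
    # Equivalent of .chunk(2): the first half is gamma, the second half is beta.
    gamma, beta = film_out[:d], film_out[d:]
    # gate * (1 + gamma) + beta, element by element.
    return [g * (1.0 + ga) + b for g, ga, b in zip(gate, gamma, beta)]


# A zero FiLM output (gamma = beta = 0) leaves the gate unchanged.
identity = film_modulate([1.0, 2.0], [0.0, 0.0, 0.0, 0.0])

# gamma = [0.5, -0.5] rescales around 1, beta = [1.0, 1.0] shifts.
shifted = film_modulate([1.0, 2.0], [0.5, -0.5, 1.0, 1.0])
```

Because the scale is written as 1 + gamma rather than gamma, a FiLM layer that outputs zeros acts as the identity on the gate.
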
ModelLoaderV08 in versioning.py
- Loads/saves v0.8.0 checkpoints via load_versioned_checkpoint() / save_versioned_checkpoint()
- Routes by model_version field in model_metadata.json; characteristic v0.8.0 state dict keys include transformer_blocks.N.pillar_cross_attn and transformer_blocks.N.film_p0
- Compatible version range: ["0.8.1", "0.8.2"]
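
As a rough illustration of routing on the model_version field, the sketch below reads model_metadata.json and looks up a loader by version. It is not the versioning.py implementation: read_model_version, pick_loader, looks_like_v080, and REGISTRY are hypothetical names, and the key check is only a heuristic built from the two characteristic key fragments named above.

```python
import json
import tempfile
from pathlib import Path


def read_model_version(checkpoint_dir):
    """Return the model_version string stored in model_metadata.json."""
    metadata_path = Path(checkpoint_dir) / "model_metadata.json"
    with open(metadata_path, "r", encoding="utf-8") as fh:
        metadata = json.load(fh)
    return metadata["model_version"]


def pick_loader(version, registry):
    """Look up a loader name by exact version string (hypothetical registry)."""
    try:
        return registry[version]
    except KeyError:
        raise ValueError(f"no loader registered for model_version {version!r}")


def looks_like_v080(state_dict_keys):
    """Heuristic: does any key contain a characteristic v0.8.0 fragment?"""
    markers = (".pillar_cross_attn", ".film_p0")
    return any(marker in key for key in state_dict_keys for marker in markers)


# Hypothetical registry, for illustration only.
REGISTRY = {"0.8.0": "ModelLoaderV08"}

# Write a throwaway metadata file and route on it.
with tempfile.TemporaryDirectory() as tmp:
    metadata_file = Path(tmp) / "model_metadata.json"
    metadata_file.write_text(json.dumps({"model_version": "0.8.0"}), encoding="utf-8")
    chosen = pick_loader(read_model_version(tmp), REGISTRY)
```
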
v080/__init__.py — version registry entry; imports FluxCompressor and FluxExpander from v070/vae.py (VAE unchanged)
Tests (tests/unit/test_flow_shapes_v080.py): shape correctness, FiLM modulation effect, pillar cross-attn KV shape, gradient flow, GPU variant

Changed

VAE (FluxCompressor, FluxExpander) unchanged from v0.7.0
External FluxFlowProcessor forward signature unchanged: forward(packed, text_embeddings, timesteps)
text_cond extraction is internal to FluxFlowProcessor_v080.forward(); no training loop changes required
text_embed_dim default corrected from 768 to 1024 across all loaders and pipelines (versioning.py, pipeline.py, diffusion_pipeline.py). The old default of 768 was a legacy mistake carried over from DistilBERT's internal hidden size; FluxFlow's projection output has always been 1024

Fixed

Pillar cross-attention head count now uses _valid_pillar_heads() to guarantee d_model % n_pillar_heads == 0; previously n_head // 4 could yield a head count that does not divide d_model, causing runtime shape errors
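
One way such a divisibility guard could work is sketched below. The actual body of _valid_pillar_heads() is not shown in these notes, so the strategy here (step down from the preferred count to the nearest divisor of d_model) and the name valid_pillar_heads are assumptions for illustration only.

```python
def valid_pillar_heads(d_model, preferred):
    """Return the largest head count <= preferred that divides d_model.

    Assumed strategy, not the project's code. The result is always >= 1,
    because 1 divides every positive d_model.
    """
    if d_model < 1:
        raise ValueError("d_model must be positive")
    heads = max(1, min(preferred, d_model))
    while d_model % heads != 0:
        heads -= 1
    return heads


# n_head = 12 gives n_head // 4 = 3, which does not divide 512.
# The guard steps down to 2 instead of failing at runtime.
fallback = valid_pillar_heads(512, max(1, 12 // 4))

# n_head = 16 gives 4, which already divides 512 and is kept.
kept = valid_pillar_heads(512, max(1, 16 // 4))

# n_head = 2 gives n_head // 4 = 0; max(1, ...) lifts it to 1.
minimum = valid_pillar_heads(512, max(1, 2 // 4))
```
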
Versioned loader now passes vae_latent_dim (the raw latent dim) instead of flow_vae_dim when instantiating FluxFlowProcessor_v080. FluxFlowProcessor_v080 adds CONTEXT_DIMS internally, so passing the pre-added value sized vae_to_dmodel at vae_dim + 2×CONTEXT_DIMS, which never matched real checkpoints
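
The double-counting behind the loader fix comes down to simple arithmetic, sketched below. The numeric values of CONTEXT_DIMS and vae_latent_dim are hypothetical placeholders, and vae_to_dmodel_in_features is an illustrative helper, not a function from the codebase.

```python
CONTEXT_DIMS = 4  # hypothetical value, for illustration only


def vae_to_dmodel_in_features(vae_dim):
    """Input width of vae_to_dmodel when the processor adds CONTEXT_DIMS itself."""
    return vae_dim + CONTEXT_DIMS


vae_latent_dim = 32  # hypothetical raw latent dim
flow_vae_dim = vae_latent_dim + CONTEXT_DIMS  # already includes the context dims

# Old behaviour: the pre-added value is passed, so CONTEXT_DIMS is counted twice.
buggy_width = vae_to_dmodel_in_features(flow_vae_dim)

# Fixed behaviour: the raw latent dim is passed, so CONTEXT_DIMS is counted once.
fixed_width = vae_to_dmodel_in_features(vae_latent_dim)
```

The two widths always differ by exactly CONTEXT_DIMS, which is why the old loader could not match a checkpoint saved with the correct layer size.
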
