v0.8.0: Pillar-attention architecture

@danny-mio danny-mio released this 21 Feb 22:53

[0.8.0] - 2026-02-21

Added

  • Pillar-attention architecture (v080/flow.py)
    • FluxTransformerBlock_v080: FiLM conditioning and shared pillar cross-attention per transformer block
    • FiLM modulation per pillar: gamma, beta = film_pi(text_cond).chunk(2), applied as gate * (1 + gamma) + beta
    • Shared pillar_cross_attn (Q=stacked pillar outputs, KV=text_seq), n_pillar_heads = _valid_pillar_heads(d_model, max(1, n_head // 4)) — guarantees divisibility
    • norm_pillar LayerNorm shared across all 4 pillars
    • FluxFlowProcessor_v080: adds text_cond_proj linear, passes pooled text conditioning to every block
  • ModelLoaderV08 in versioning.py
    • Loads/saves v0.8.0 checkpoints via load_versioned_checkpoint() / save_versioned_checkpoint()
    • Routes by model_version field in model_metadata.json; characteristic v0.8.0 state dict keys include transformer_blocks.N.pillar_cross_attn and transformer_blocks.N.film_p0
    • Compatible version range: ["0.8.1", "0.8.2"]
  • v080/__init__.py — version registry entry; imports FluxCompressor and FluxExpander from v070/vae.py (VAE unchanged)
  • Tests (tests/unit/test_flow_shapes_v080.py): shape correctness, FiLM modulation effect, pillar cross-attn KV shape, gradient flow, GPU variant
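
The routing rule in the ModelLoaderV08 entry can be illustrated with a small standard-library sketch. This is not the code in versioning.py: read_model_version and looks_like_v080 are hypothetical helpers, and the parameter suffixes in the sample keys are invented for the example. Only the model_metadata.json filename, the model_version field, and the two key patterns come from the notes above.

```python
import json
import re
import tempfile
from pathlib import Path

# Key patterns named in the release notes; N is the transformer block index.
V080_KEY_PATTERNS = (
    re.compile(r"transformer_blocks\.\d+\.pillar_cross_attn"),
    re.compile(r"transformer_blocks\.\d+\.film_p0"),
)


def read_model_version(checkpoint_dir):
    """Return model_version from model_metadata.json, or None if absent."""
    meta_path = Path(checkpoint_dir) / "model_metadata.json"
    if not meta_path.is_file():
        return None
    with meta_path.open(encoding="utf-8") as fh:
        return json.load(fh).get("model_version")


def looks_like_v080(state_dict_keys):
    """Heuristic: every characteristic v0.8.0 pattern matches at least one key."""
    keys = list(state_dict_keys)
    return all(any(p.search(k) for k in keys) for p in V080_KEY_PATTERNS)


# Usage with a throwaway checkpoint directory.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "model_metadata.json").write_text(
        json.dumps({"model_version": "0.8.0"}), encoding="utf-8"
    )
    detected_version = read_model_version(tmp)

# Sample keys; the ".in_proj_weight" / ".weight" suffixes are hypothetical.
v080_keys = [
    "transformer_blocks.0.pillar_cross_attn.in_proj_weight",
    "transformer_blocks.0.film_p0.weight",
]
older_keys = ["transformer_blocks.0.attn.in_proj_weight"]
```

A key-based check like looks_like_v080 is only useful as a fallback when the metadata file is missing; the notes state that the actual loader routes on model_version.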

Changed

  • VAE (FluxCompressor, FluxExpander) unchanged from v0.7.0
  • External FluxFlowProcessor forward signature unchanged: forward(packed, text_embeddings, timesteps)
  • text_cond extraction is internal to FluxFlowProcessor_v080.forward(); no training loop changes required
  • text_embed_dim default corrected to 1024 across all loaders and pipelines (versioning.py, pipeline.py, diffusion_pipeline.py) — previous default of 768 was a legacy error from DistilBERT's internal hidden size; FluxFlow's projection output has always been 1024
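
To show why the training loop needs no changes, here is a toy NumPy sketch of a processor that keeps the forward(packed, text_embeddings, timesteps) signature and derives text_cond internally before applying the FiLM step listed under Added. Everything beyond that signature, the text_cond_proj name, and the gate * (1 + gamma) + beta formula is an assumption: ToyBlock and ToyProcessor are hypothetical, mean-pooling is a guess at how the pooled conditioning is obtained, the tanh gate stands in for a real pillar output, and packed is treated as already having width d_model.

```python
import numpy as np

rng = np.random.default_rng(0)


class ToyBlock:
    """One block with a single FiLM-modulated pillar (hypothetical stand-in)."""

    def __init__(self, text_dim, d_model):
        # Plays the role of film_pi: maps text_cond to [gamma | beta].
        self.film_w = rng.normal(scale=0.02, size=(text_dim, 2 * d_model))

    def __call__(self, x, text_cond):
        # x: (batch, tokens, d_model); text_cond: (batch, text_dim)
        gamma, beta = np.split(text_cond @ self.film_w, 2, axis=-1)
        gate = np.tanh(x)  # placeholder for the real pillar output
        # FiLM: gate * (1 + gamma) + beta, broadcast over the token axis.
        return gate * (1.0 + gamma[:, None, :]) + beta[:, None, :]


class ToyProcessor:
    """Keeps forward(packed, text_embeddings, timesteps) as the public API."""

    def __init__(self, text_dim=1024, d_model=64, n_blocks=2):
        self.text_cond_proj = rng.normal(scale=0.02, size=(text_dim, text_dim))
        self.blocks = [ToyBlock(text_dim, d_model) for _ in range(n_blocks)]

    def forward(self, packed, text_embeddings, timesteps):
        # Assumption: mean-pool over the sequence axis, then project.
        # timesteps is accepted for signature parity but unused in this toy.
        text_cond = text_embeddings.mean(axis=1) @ self.text_cond_proj
        x = packed
        for block in self.blocks:
            x = block(x, text_cond)  # every block receives the same text_cond
        return x


# The caller passes the same three arguments as before.
packed = rng.normal(size=(2, 16, 64))
text_embeddings = rng.normal(size=(2, 8, 1024))
timesteps = np.array([10, 500])
out = ToyProcessor().forward(packed, text_embeddings, timesteps)
```

With the FiLM weights at zero, gamma and beta vanish and the block returns the unmodulated gate, which is the usual reason for writing the scale as 1 + gamma.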

Fixed

  • Pillar cross-attention head count now uses _valid_pillar_heads() to guarantee d_model % n_pillar_heads == 0; previously n_head // 4 could yield a value that does not divide d_model, causing runtime shape errors
  • Versioned loader now passes vae_latent_dim (the raw latent dim) instead of flow_vae_dim when instantiating FluxFlowProcessor_v080. FluxFlowProcessor_v080 adds CONTEXT_DIMS internally, so passing the pre-added value sized vae_to_dmodel at vae_dim + 2×CONTEXT_DIMS, which never matched real checkpoints
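
Both fixes come down to small integer arithmetic, sketched below in plain Python. The real _valid_pillar_heads() is not shown in these notes, so valid_pillar_heads here is one plausible rule (largest divisor of d_model not exceeding the requested count), not the project's implementation. The CONTEXT_DIMS value, the example dimensions, and the vae_to_dmodel_in_features helper are likewise hypothetical.

```python
def valid_pillar_heads(d_model, requested):
    """Largest head count <= requested that divides d_model (at least 1)."""
    for heads in range(max(1, requested), 0, -1):
        if d_model % heads == 0:
            return heads
    return 1


# Head-count fix: n_head // 4 alone can fail to divide d_model.
d_model, n_head = 512, 20
naive_heads = max(1, n_head // 4)            # 5, and 512 % 5 != 0
n_pillar_heads = valid_pillar_heads(d_model, naive_heads)

# Loader fix: the processor adds CONTEXT_DIMS itself.
CONTEXT_DIMS = 4  # hypothetical value; the release notes do not state it


def vae_to_dmodel_in_features(vae_dim_arg):
    """Input width the processor builds from the dim it is given."""
    return vae_dim_arg + CONTEXT_DIMS


vae_latent_dim = 32                           # hypothetical raw latent dim
flow_vae_dim = vae_latent_dim + CONTEXT_DIMS  # the pre-added value

buggy_width = vae_to_dmodel_in_features(flow_vae_dim)    # latent + 2 * CONTEXT_DIMS
fixed_width = vae_to_dmodel_in_features(vae_latent_dim)  # latent + CONTEXT_DIMS
```

Under these example numbers the old call path produces a layer that is CONTEXT_DIMS columns wider than the one stored in a checkpoint, which is the kind of mismatch the fix describes.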