
Resolve merge conflicts: resync with upstream while preserving fork features#6

Merged

lmangani merged 9 commits into master from copilot/resolve-merge-conflicts on Mar 1, 2026
Conversation

Copilot AI commented Mar 1, 2026

PR #5 (ServeurpersoCom → audiohacking) had a dirty merge state because the fork had diverged with its own additions (LoRA, cover/repaint mode, VAE encoder, reference audio). Blindly adopting upstream would have severed those features. Each of the 16 conflicts was resolved by merging both sides' intent.

Brought in from upstream

  • Flash attention toggle — use_flash_attn: bool added to CondGGML, DetokGGML, Qwen3GGML, Qwen3LM; qwen3_attn_f32() pure-F32 fallback added to qwen3-enc.h; threaded through all qwen3_build_layer / qw3lm_build_attn call chains
  • --no-fa CLI flag — ace-qwen3 and dit-vae both accept --no-fa to disable flash attention at runtime
  • qwen3_embed_lookup() via ggml_get_rows — replaces the old CPU mmap dequant approach (qwen3_cpu_embed_lookup); qw3lm_forward / qw3lm_forward_batch switch to ggml_backend_sched_alloc_graph instead of ggml_gallocr; galloc, gf_mmap, embed_mmap_data, embed_type fields removed from Qwen3LM
  • ggml submodule pointer advanced to upstream (55e062ab)
  • tests/debug-dit-cossim.sh rewritten to build and test CUDA / Vulkan / CPU in sequence
  • README documentation updates

Preserved from fork

  • LoRA — DiTGGMLLayer adapter tensors, dit_ggml_linear_lora(), dit_ggml_load_lora(), lora_wctx / lora_scale in DiTGGML, LoRA CLI args in dit-vae, dit-lora.cpp in CMakeLists
  • Cover / repaint mode — task_type, reference_audio, src_audio, audio_cover_strength, repainting_start/end in AceRequest; custom_tag / genre for LoRA trigger words
  • VAE encoder — VAEEncoderGGML + vae_encoder_load() for reference audio timbre encoding
  • FSQ codeword helpers — detok_ggml_build_codeword_table() + latent_frames_to_codes() kept in fsq-detok.h (used by file-based cover mode)
  • .gitignore — !tests/fixtures/ exception preserved so fixture JSON files remain tracked
  • CMakeLists — audio_loader.cpp and src/dit-lora.cpp remain in the build

Example: flash attention now toggleable per model

dit_ggml_init_backend(&model);
if (!use_fa) model.use_flash_attn = false;   // DiT

qwen3_init_backend(&text_enc);
if (!use_fa) text_enc.use_flash_attn = false; // text encoder

cond_ggml_init_backend(&cond);
if (!use_fa) cond.use_flash_attn = false;     // condition encoder


ServeurpersoCom and others added 7 commits March 1, 2026 15:36
Fix Metal col2im_1d: use 256 threads/group instead of 1 thread/group.

Revert conv_transpose_1d bounded loop (8c70db8, e0e36f3) and im2col
gridDim.y fix (b65bf45): not used by the project, reduce upstream diff.

Rename CPU helpers ggml_load_f32/ggml_store_f32 to snake_load/snake_store
ace-qwen3: disables flash_attn_ext in prefill and batched decode,
falls back to F32 manual attention.

dit-vae: disables flash_attn_ext in TextEncoder, CondEncoder,
Detokenizer and DiT.

qwen3_attn_f32() fallback added in qwen3-enc.h, reused by
qwen3-lm.h prefill/decode and dit-graph.h self/cross attention.
DiT already had its own fallback: F16 accumulation drifts audibly
over 24 layers x 8 iterative Euler steps on CPU
Drop manual CPU-side mmap dequant and gallocr in favor of
standard ggml_get_rows with backend scheduler fallback.
No functional change.
Copilot AI and others added 2 commits March 1, 2026 22:57
… fork features (LoRA, cover mode, reference audio, VAE encoder)
…additions

Co-authored-by: lmangani <1423657+lmangani@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Resolve merge conflicts to resync with upstream changes" to "Resolve merge conflicts: resync with upstream while preserving fork features" on Mar 1, 2026
lmangani marked this pull request as ready for review March 1, 2026
lmangani merged commit 40d75d1 into master Mar 1, 2026
2 checks passed
