Conversation
Fix Metal col2im_1d: use 256 threads/group instead of 1 thread/group. Revert conv_transpose_1d bounded loop (8c70db8, e0e36f3) and im2col gridDim.y fix (b65bf45): not used by the project, reduce upstream diff. Rename CPU helpers ggml_load_f32/ggml_store_f32 to snake_load/snake_store
ace-qwen3: disables flash_attn_ext in prefill and batched decode, falls back to F32 manual attention. dit-vae: disables flash_attn_ext in TextEncoder, CondEncoder, Detokenizer and DiT. qwen3_attn_f32() fallback added in qwen3-enc.h, reused by qwen3-lm.h prefill/decode and dit-graph.h self/cross attention. DiT already had its own fallback: F16 accumulation drifts audibly over 24 layers x 8 iterative Euler steps on CPU
Drop manual CPU-side mmap dequant and gallocr in favor of standard ggml_get_rows with backend scheduler fallback. No functional change
… fork features (LoRA, cover mode, reference audio, VAE encoder)
…additions Co-authored-by: lmangani <1423657+lmangani@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Resolve merge conflicts to resync with upstream changes" to "Resolve merge conflicts: resync with upstream while preserving fork features" on Mar 1, 2026.
PR #5 (ServeurpersoCom → audiohacking) was in a dirty merge state because the fork had diverged with its own additions (LoRA, cover/repaint mode, VAE encoder, reference audio). Blindly adopting upstream would have severed those features, so each of the 16 conflicts was resolved by merging both sides' intent.
Brought in from upstream
- `use_flash_attn: bool` added to `CondGGML`, `DetokGGML`, `Qwen3GGML`, `Qwen3LM`; `qwen3_attn_f32()` pure-F32 fallback added to `qwen3-enc.h`; threaded through all `qwen3_build_layer`/`qw3lm_build_attn` call chains
- `--no-fa` CLI flag: `ace-qwen3` and `dit-vae` both accept `--no-fa` to disable flash attention at runtime
- `qwen3_embed_lookup()` via `ggml_get_rows`, replacing the old CPU mmap dequant approach (`qwen3_cpu_embed_lookup`); `qw3lm_forward`/`qw3lm_forward_batch` switch to `ggml_backend_sched_alloc_graph` instead of `ggml_gallocr`; `galloc`, `gf_mmap`, `embed_mmap_data`, `embed_type` fields removed from `Qwen3LM` (`55e062ab`)
- `tests/debug-dit-cossim.sh` rewritten to build and test CUDA / Vulkan / CPU in sequence

Preserved from fork

- `DiTGGMLLayer` adapter tensors, `dit_ggml_linear_lora()`, `dit_ggml_load_lora()`, `lora_wctx`/`lora_scale` in `DiTGGML`, LoRA CLI args in `dit-vae`, `dit-lora.cpp` in CMakeLists
- `task_type`, `reference_audio`, `src_audio`, `audio_cover_strength`, `repainting_start`/`end` in `AceRequest`; `custom_tag`/`genre` for LoRA trigger words
- `VAEEncoderGGML` + `vae_encoder_load()` for reference audio timbre encoding
- `detok_ggml_build_codeword_table()` + `latent_frames_to_codes()` kept in `fsq-detok.h` (used by file-based cover mode)
- `.gitignore`: the `!tests/fixtures/` exception preserved so fixture JSON files remain tracked
- `audio_loader.cpp` and `src/dit-lora.cpp` remain in the build

Example: flash attention now toggleable per model