feat(tts/magpie): parity fixture + HF upload staging + manifest by Alex-Wengg · Pull Request #44 · FluidInference/mobius

Alex-Wengg · 2026-04-25T02:59:39Z

Summary

Adds three Swift-port-supporting tools under models/tts/magpie/coreml/:

emit_parity_fixture.py — runs the full Magpie CoreML pipeline for a fixed (text, speaker, language, seed) and dumps every intermediate tensor as a single .npz so the Swift port can replay each stage and diff against this Python ground truth.
prepare_hf_upload.py — stages hf-upload/ from compiled/build/ + constants/ for upload to the HF repo. Splits constants into constants/ (model tensors) + tokenizer/ (per-language lookup files), generates README.md + .gitattributes + _prep_report.json.
build_manifest.py — generates manifest.json, a machine-readable index with sha256, file sizes, npy shapes/dtypes, and per-model IO specs. The Swift port's MagpieResourceDownloader consumes it.

All three re-import helpers from generate_coreml.py — they never fork the reference pipeline.

HF upload (live)

Already uploaded to FluidInference/magpie-tts-multilingual-357m-coreml (1.4 GB) — both .mlmodelc (compiled, ready-to-run) and .mlpackage (portable) for all 4 models, plus constants/, tokenizer/, and manifest.json.

What `emit_parity_fixture.py` captures

--mode full → .npz (+ reference .wav):

Stage	Keys
Config	`text`, `speakerIndex`, `languageCode`, `seed`, `useCfg`, `cfgScale`, `temperature`, `topk`, `sampleRate`, `minFrames`
Tokenizer	`textTokens`, `textTokensPadded`, `textMask`
Text encoder	`encoderOutput`
Post-prefill KV	`prefillCache{0..11}`, `prefillPosition{0..11}`
AR loop	`perStepDecoderHidden` (N, 768), `perStepCodes` (N, 8), `predictedCodes` (8, N)
Audio	`audioPcm`, `audioSamples`, `genTimeSeconds`

--mode tokenizer → small .json with {text, speakerIndex, languageCode, expectedTokenIds} for cheap Swift-side tokenizer diffing.

What `manifest.json` looks like

{
  "schema_version": "1.0",
  "repo_id": "FluidInference/magpie-tts-multilingual-357m-coreml",
  "model": {
    "name": "Magpie TTS Multilingual",
    "params_million": 357,
    "sample_rate": 22050,
    "max_nanocodec_seconds": 11.89,
    "supported_languages": ["english", "spanish", "german", "hindi", "mandarin", "french", "italian", "vietnamese"],
    "audio_eos_id": 2017,
    "forbidden_token_ids": [2016, 2018, 2019, 2020, 2021, 2022, 2023],
    "speaker_names": ["John", "Sofia", "Aria", "Jason", "Leo"],
    "streaming_nanocodec": { "supported": false, "note": "..." }
  },
  "models": {
    "decoder_step": {
      "compiled": { "path": "decoder_step.mlmodelc", "bytes": ..., "files": ... },
      "package":  { "path": "decoder_step.mlpackage", "bytes": ..., "files": ... },
      "io": { "inputs": [...], "outputs": [{ "name": "var_2201", "shape": [1,1,16192], ... }, ...] }
    },
    ...
  },
  "constants": { "json": [...], "npy": [...], "local_transformer": [...] },
  "languages": { "english": { "tokenizer_kind": "phoneme", "files": [...] }, ... }
}

Usage

# Build hf-upload/ from compiled artifacts
python prepare_hf_upload.py \
    --build-dir compiled/build \
    --constants-dir constants \
    --output-dir hf-upload --clean

# Add manifest.json
python build_manifest.py

# Emit parity fixture
python emit_parity_fixture.py "Hello world." \
    --speaker 0 --language en --seed 42 \
    --output fixture_en_s0.npz

Companion PR

Consumed by the Swift port in FluidInference/FluidAudio#541 — fluidaudiocli magpie parity / magpie tokenizer-parity / magpie text subcommands.

Test plan

python -m py_compile {emit_parity_fixture,prepare_hf_upload,build_manifest}.py — parses clean.
End-to-end: full mobius pipeline run produced 4 .mlmodelc + 4 .mlpackage + constants/ + tokenizer/ + manifest.json, uploaded successfully to HF (1.4 GB total).
Python inference smoke test: 11.05 s synthesis at 3.97x RTF using the generated .mlpackage set.
Inline IPA verified: "Hello | n ɛ m o ʊ |." produces həˈloʊ … nɛmoʊ in G2P output.
Swift-side fluidaudiocli magpie parity --fixture fixture_en_s0.npz hits MAE < 1e-3 on encoderOutput and SNR > 40 dB on audioPcm.

Notes

Requires build/ to contain compiled .mlpackage artifacts.
nemo extras required for tokenization.
CFG enabled by default (cfg_scale=2.5); pass --no-cfg to emit unconditional fixture.
coremltools 9.0 wheel quirk: PyPI's py3-none-any is a stub; uv may need --force-reinstall to pull cp311-none-macosx_11_0_arm64.whl.

Adds a standalone companion script next to `generate_coreml.py` that runs the Magpie CoreML pipeline for a fixed (text, speaker, language, seed) and dumps intermediate tensors so cross-implementation parity tests can diff against this ground truth. Usage: # Full pipeline — .npz with tokens, encoder output, prefill caches, # per-step decoder hidden + sampled codes, predicted (8,N), PCM audio. python emit_parity_fixture.py "Hello world." \ --speaker 0 --language en --seed 42 \ --output fixture_en_s0.npz # Tokenizer-only — small .json for quick Swift tokenizer diff # without loading CoreML. python emit_parity_fixture.py "Hello world." \ --speaker 0 --language en --mode tokenizer \ --output fixture_en_s0_tokens.json The script re-imports from `generate_coreml.py` so it never drifts from the reference pipeline. Consumed by the Swift port's `fluidaudiocli magpie parity` and `magpie tokenizer-parity` subcommands in FluidInference/FluidAudio#541.

devin-ai-integration

Devin Review found 1 potential issue.

View 4 additional findings in Devin Review.

devin-ai-integration · 2026-04-25T03:02:52Z

+
+    gen_time = time.time() - gen_start
+
+    predicted_codes_full = np.stack(per_step_codes, axis=1)  # (8, N)


🔴 EOS frame included in codec input causes fixture audio/codes to diverge from reference

The fixture's AR loop appends the EOS-containing codes to per_step_codes before breaking (line 216), then feeds the full per_step_codes (including the EOS frame) into predicted_codes_full (line 223) and through the NanoCodec decoder (line 229). In contrast, the reference generate_coreml.py:400-406 breaks before appending EOS codes to all_predictions, so the codec never sees the EOS frame.

This means predictedCodes in the fixture has one extra column containing EOS token IDs (e.g., 2017), the NanoCodec decodes those special tokens as regular codec codes producing garbage audio for that frame, and the resulting audioPcm and WAV file diverge from the reference. Since this tool exists to emit "ground truth" for cross-implementation parity testing, a Swift implementation validated against this fixture would produce different output than generate_coreml.py.

Recording the EOS step in per_step_codes for trace purposes is fine, but the codec-input codes should exclude the final EOS frame.

Prompt for agents

The issue is that predicted_codes_full at line 223 includes the EOS frame (appended at line 216) and is then fed to the NanoCodec decoder at lines 226-232. The reference generate_coreml.py excludes EOS codes from codec input entirely. To fix this while preserving the full per-step trace (which is useful for the fixture): 1. After line 223, determine whether the last frame is an EOS frame. If the loop broke due to EOS (is_eos=True), the last entry in per_step_codes is the EOS frame. 2. Build a separate codec_codes variable that excludes the EOS frame: e.g. predicted_codes_full[:, :-1] if the loop ended on EOS, or predicted_codes_full otherwise. 3. Use that codec_codes for the NanoCodec decode step (lines 226-237) instead of predicted_codes_full. 4. Keep predicted_codes_full (with EOS) in the fixture under predictedCodes for trace completeness, but also store the codec-input codes if desired. Alternatively, mimic the reference exactly: do not append EOS codes to per_step_codes (remove line 216), and record the EOS event separately (e.g. a boolean flag in the fixture). This keeps predicted_codes_full identical to the reference's predicted_codes.

Was this helpful? React with 👍 or 👎 to provide feedback.

Adds `prepare_hf_upload.py`, which assembles the layout expected by the FluidAudio Swift port (`FluidInference/magpie-tts-multilingual-357m-coreml`) from the mobius exporter outputs: - Copies the 3 required CoreML models (and optional `decoder_prefill`) from `build/` to the repo root. - Keeps model constants + speaker/audio embeddings + `local_transformer/` under `constants/`. - Moves per-language tokenizer JSONs (english_phoneme_*, mandarin_*, etc.) into a dedicated `tokenizer/` subtree — Swift's `MagpieResourceDownloader` downloads this folder lazily on language selection. - Writes a model card README.md and a `.gitattributes` that LFS-tracks `.mlmodelc` / `.npy` / `.bin` / `.safetensors` / `.onnx`. - Emits a `_prep_report.json` listing what was copied / skipped / missing. The script does NOT upload — it prints the exact `huggingface-cli upload` command for the maintainer to run. Smoke-tested against a synthetic fixture tree; MISS rows surface in the report and the script exits non-zero when required inputs (local_transformer weights, core models) are absent. Usage: python prepare_hf_upload.py # defaults python prepare_hf_upload.py --clean # fresh staging dir python prepare_hf_upload.py --repo-id org/name # override target

Generates a machine-readable index of every artifact in the upload (models in both .mlmodelc + .mlpackage form, constants, per-language tokenizer files), with shapes, sizes, and SHA-256 digests. The Swift port's MagpieResourceDownloader consumes manifest.json to know what to fetch and how to verify integrity.

devin-ai-integration

Devin Review found 3 new potential issues.

View 11 additional findings in Devin Review.

devin-ai-integration · 2026-04-25T05:12:03Z

+    "french": {"tokenizer_kind": "byt5", "files": []},
+    "italian": {"tokenizer_kind": "byt5", "files": []},
+    "vietnamese": {"tokenizer_kind": "byt5", "files": []},


🔴 French, Italian, Vietnamese incorrectly labeled as "byt5" with no tokenizer files in manifest

The LANGUAGE_FILES dict marks French, Italian, and Vietnamese as "tokenizer_kind": "byt5" with empty "files": [] lists. However, these languages are NOT byt5-based:

French uses french_chartokenizer (generate_coreml.py:470)

Italian uses italian_phoneme (generate_coreml.py:474)

Vietnamese uses vietnamese_phoneme (generate_coreml.py:475)

The exporter (export_tokenizers.py:56-62) produces token2id (and phoneme_dict for Italian/Vietnamese) files for these languages. The sibling script prepare_hf_upload.py:83-98 correctly lists the files. The manifest consumed by Swift's MagpieResourceDownloader will indicate these languages need no downloads, causing runtime failures when attempting to use them.

Suggested change

"french": {"tokenizer_kind": "byt5", "files": []},

"italian": {"tokenizer_kind": "byt5", "files": []},

"vietnamese": {"tokenizer_kind": "byt5", "files": []},

"french": {

"tokenizer_kind": "char",

"files": [

"tokenizer/french_chartokenizer_token2id.json",

],

},

"italian": {

"tokenizer_kind": "phoneme",

"files": [

"tokenizer/italian_phoneme_token2id.json",

"tokenizer/italian_phoneme_phoneme_dict.json",

],

},

"vietnamese": {

"tokenizer_kind": "phoneme",

"files": [

"tokenizer/vietnamese_phoneme_token2id.json",

"tokenizer/vietnamese_phoneme_phoneme_dict.json",

],

},

Was this helpful? React with 👍 or 👎 to provide feedback.

devin-ai-integration · 2026-04-25T05:12:04Z

+    "english": [
+        "english_phoneme_token2id.json",
+        "english_phoneme_phoneme_dict.json",
+    ],


🔴 English heteronyms file missing from HF upload tokenizer list

The English entry in PER_LANGUAGE_TOKENIZER_FILES omits english_phoneme_heteronyms.json. The exporter (export_tokenizers.py:100-104) produces this file, build_manifest.py:236 references it as tokenizer/english_phoneme_heteronyms.json, and the German entry at line 94 correctly includes its own heteronyms file. Without this file in the upload, English heteronym pronunciation resolution will be unavailable in the Swift port.

Suggested change

"english": [

"english_phoneme_token2id.json",

"english_phoneme_phoneme_dict.json",

],

"english": [

"english_phoneme_token2id.json",

"english_phoneme_phoneme_dict.json",

"english_phoneme_heteronyms.json",

],

Was this helpful? React with 👍 or 👎 to provide feedback.

devin-ai-integration · 2026-04-25T05:12:05Z

+    "mandarin": [
+        "mandarin_phoneme_token2id.json",
+        "mandarin_phoneme_pinyin_dict.json",
+        "mandarin_phoneme_tone_dict.json",
+        "mandarin_phoneme_ascii_letter_dict.json",
+        "mandarin_pypinyin_char_dict.json",
+        "mandarin_pypinyin_phrase_dict.json",
+        "mandarin_jieba_dict.json",
+    ],


🔴 Mandarin phoneme_dict file missing from HF upload tokenizer list

The Mandarin entry in PER_LANGUAGE_TOKENIZER_FILES omits mandarin_phoneme_phoneme_dict.json. The exporter (export_tokenizers.py:65-78) produces this file via the generic IPA G2P phoneme_dict export path (separate from the Chinese-specific pinyin_dict), and build_manifest.py:264 references it. Since the file won't be in ALL_TOKENIZER_FILES, it also won't be skipped by the constants copier — it will end up in unknown_files and not be copied anywhere. The manifest builder will then fail when trying to hash a non-existent file.

Suggested change

"mandarin": [

"mandarin_phoneme_token2id.json",

"mandarin_phoneme_pinyin_dict.json",

"mandarin_phoneme_tone_dict.json",

"mandarin_phoneme_ascii_letter_dict.json",

"mandarin_pypinyin_char_dict.json",

"mandarin_pypinyin_phrase_dict.json",

"mandarin_jieba_dict.json",

],

"mandarin": [

"mandarin_phoneme_token2id.json",

"mandarin_phoneme_phoneme_dict.json",

"mandarin_phoneme_pinyin_dict.json",

"mandarin_phoneme_tone_dict.json",

"mandarin_phoneme_ascii_letter_dict.json",

"mandarin_pypinyin_char_dict.json",

"mandarin_pypinyin_phrase_dict.json",

"mandarin_jieba_dict.json",

],

Was this helpful? React with 👍 or 👎 to provide feedback.

The 30s hard-coded timeout on `event.wait` in `get_compute_plan` was silently false-failing as "Failed to load compute plan: unknown error" on graphs ≳1500 ops where `MLComputePlan.loadContentsOfURL` legitimately takes 25-30s. Confirmed on Magpie's rank-4 decoder_step (1782 ops, load took 27.09s end-to-end) and reproducible on nanocodec_decoder. Changes: - `compute_plan.py`: add `DEFAULT_LOAD_TIMEOUT_S = 120.0` and `load_timeout_s` parameter; raise descriptive timeout error separately from the generic load error. - `fallback.py`: pass-through `load_timeout_s` to `get_compute_plan`. - `cli.py`: expose `--plan-timeout` typer option (default 120s); wire through to both the compute-plan and fallback paths. Verified on Magpie decoder_step (1782 ops, 27.1s load, ANE 97.3%) and nanocodec_decoder (1149 ops, ANE compile rejection visible via --fallback at extended timeout).

…ental stateful variant Production change: Split per-layer KV cache from rank-5 ``(2, B, max_seq, H, D)`` into rank-4 ``cache_k`` + ``cache_v`` tensors so the ANE compiler will accept the graph. Also replace causal mask constant ``-1e9`` (overflows fp16, ANE rejects) with fp16-safe ``-3e4``, and use additive masking for cross-attention. Result on Apple M2 / macOS 26.5: rank-5 (old): GPU only, ~64 ms/step rank-4 (new): 97.3% ANE, 40 ms/step standalone, 96 ms/step in 146-step loop Loads ``decoder_step.mlpackage`` with ``ComputeUnit.ALL``. Cache and position output key names re-derived from the re-traced graph. Experimental (kept, not enabled by default): Add ``traceable_decoder_step_stateful.py`` and ``convert_decoder_step_stateful.py`` for a CoreML MLState (stateful buffers) variant. Shrinks IO surface from 39/38 to 4/2 tensors, but forces CPU+GPU only (ANE rejects stateful graphs). Real-loop benchmark showed 212 ms/step — 2.2× regression vs rank-4. Both files carry prominent EXPERIMENTAL banners and the ``MAGPIE_STATEFUL`` env path in ``generate_coreml.py`` is off by default. Kept so future agents don't repeat the experiment thinking CosyVoice3's ~3× MLState speedup generalises (it doesn't — Magpie's rank-4 graph is already on ANE). Files: - ``traceable/traceable_decoder_step.py`` — rank-4 production - ``convert_decoder_step.py`` — rank-4 production - ``generate_coreml.py`` — rank-4 keys + ``ComputeUnit.ALL``; experimental ``MAGPIE_STATEFUL`` env-gated branch - ``traceable/traceable_decoder_step_stateful.py`` — experimental, NEW - ``convert_decoder_step_stateful.py`` — experimental, NEW

devin-ai-integration

Devin Review found 3 new potential issues.

View 17 additional findings in Devin Review.

devin-ai-integration · 2026-04-25T20:03:18Z

+# Re-use everything from the main script so we never drift from the reference.
+from generate_coreml import (  # noqa: E402
+    BUILD_DIR,
+    DECODER_CACHE_OUT_KEYS,


🔴 emit_parity_fixture.py imports deleted DECODER_CACHE_OUT_KEYS and uses stale rank-5 cache format

The PR renamed DECODER_CACHE_OUT_KEYS → DECODER_CACHE_K_OUT_KEYS + DECODER_CACHE_V_OUT_KEYS in generate_coreml.py, and changed caches from rank-5 (2, B, max_seq, H, D) to rank-4 (B, max_seq, H, D) with split K/V keys (cache_k{i} / cache_v{i}). However, emit_parity_fixture.py was never updated:

Line 43 imports DECODER_CACHE_OUT_KEYS which no longer exists — immediate ImportError at runtime.

Lines 59–60 create caches with old rank-5 shape (2, 1, max_seq_len, n_heads, d_head) and old key names cache{i} — incompatible with the new model that expects cache_k{i} / cache_v{i} with shape (1, max_seq_len, n_heads, d_head).

Line 166 updates cache_dict[f"cache{i}"] using the deleted DECODER_CACHE_OUT_KEYS — even if the import were fixed, the cache update logic is wrong.

The script is completely non-functional with the new rank-4 decoder_step model.

Prompt for agents

emit_parity_fixture.py needs to be updated to match the new rank-4 split-K/V cache format from generate_coreml.py. Specifically: 1. In the import block (line 43), replace DECODER_CACHE_OUT_KEYS with DECODER_CACHE_K_OUT_KEYS and DECODER_CACHE_V_OUT_KEYS. 2. In _make_caches() (lines 56-63), change the cache creation from rank-5 (2, 1, max_seq_len, n_heads, d_head) with key cache{i} to two rank-4 tensors (1, max_seq_len, n_heads, d_head) with keys cache_k{i} and cache_v{i}. Mirror the make_caches() function in generate_coreml.py lines 365-371. 3. In _run_step() (lines 156-168), update the cache output reading from cache_dict[f"cache{i}"] = out[DECODER_CACHE_OUT_KEYS[i]] to cache_dict[f"cache_k{i}"] = out[DECODER_CACHE_K_OUT_KEYS[i]] and cache_dict[f"cache_v{i}"] = out[DECODER_CACHE_V_OUT_KEYS[i]]. Mirror the run_decoder_step() function in generate_coreml.py lines 398-402. 4. In the prefill snapshot (line 262), the prefillCache key naming f"prefillCache{i}" referencing f"cache{i}" also needs updating to match the new cache key names.

Was this helpful? React with 👍 or 👎 to provide feedback.

devin-ai-integration · 2026-04-25T20:03:19Z

+    "decoder_step": {
+        "inputs": [
+            {"name": "input", "dtype": "fp16", "shape": [1, 1, 768]},
+            {"name": "encoder_output", "dtype": "fp16", "shape": [1, 256, 768]},
+            {"name": "encoder_mask", "dtype": "fp16", "shape": [1, 256]},
+            {
+                "name": "cache_*",
+                "dtype": "fp16",
+                "shape": [2, 1, 512, 12, 64],
+                "count": 12,
+            },
+            {"name": "position_*", "dtype": "int32", "shape": [], "count": 12},
+        ],
+        "outputs": [
+            {
+                "name": "var_2201",
+                "dtype": "fp16",
+                "shape": [1, 1, 16192],
+                "note": "logits, reshape to (1, 1, 8, 2024) for 8 codebooks",
+            },
+            {"name": "new_cache_*", "dtype": "fp16", "shape": [2, 1, 512, 12, 64], "count": 12},
+            {"name": "var_*", "dtype": "int32", "shape": [], "count": 12, "note": "advanced positions"},
+        ],
+    },


🟡 build_manifest.py decoder_step IO spec describes stale rank-5 cache format

The MODEL_IO["decoder_step"] dictionary in build_manifest.py still describes the old rank-5 cache layout that was replaced in this PR. It lists input caches as cache_* with shape [2, 1, 512, 12, 64] and outputs as new_cache_* with the same shape, plus logits named var_2201. The actual model now uses split rank-4 caches (cache_k* / cache_v* with shape [1, 512, 12, 64]), output keys new_k_* / new_v_*, and logits named var_2129 (see generate_coreml.py:39-58). The Swift port's MagpieResourceDownloader consumes this manifest, so the stale IO spec will mislead consumers about the model's interface.

Prompt for agents

Update the MODEL_IO dictionary entry for decoder_step in build_manifest.py to reflect the new rank-4 split-K/V cache format. Specifically: - Input caches should be two sets of 12: cache_k* with shape [1, 512, 12, 64] and cache_v* with shape [1, 512, 12, 64] (rank-4, not rank-5) - Output caches should be new_k_* and new_v_* with shape [1, 512, 12, 64] (12 each) - Logits output name should be var_2129 not var_2201 - Count of cache inputs is 24 (12 K + 12 V) not 12 - The position inputs remain the same (12 scalars) Refer to the DECODER_CACHE_K_OUT_KEYS, DECODER_CACHE_V_OUT_KEYS, and DECODER_LOGITS_KEY constants in generate_coreml.py for the correct names.

Was this helpful? React with 👍 or 👎 to provide feedback.

devin-ai-integration · 2026-04-25T20:03:20Z

+LANGUAGE_FILES: dict[str, dict[str, Any]] = {
+    "english": {
+        "tokenizer_kind": "phoneme",
+        "files": [
+            "tokenizer/english_phoneme_token2id.json",
+            "tokenizer/english_phoneme_phoneme_dict.json",
+            "tokenizer/english_phoneme_heteronyms.json",
+        ],
+    },
+    "spanish": {
+        "tokenizer_kind": "phoneme",
+        "files": [
+            "tokenizer/spanish_phoneme_token2id.json",
+            "tokenizer/spanish_phoneme_phoneme_dict.json",
+        ],
+    },
+    "german": {
+        "tokenizer_kind": "phoneme",
+        "files": [
+            "tokenizer/german_phoneme_token2id.json",
+            "tokenizer/german_phoneme_phoneme_dict.json",
+            "tokenizer/german_phoneme_heteronyms.json",
+        ],
+    },
+    "hindi": {
+        "tokenizer_kind": "char",
+        "files": [
+            "tokenizer/hindi_chartokenizer_token2id.json",
+        ],
+    },
+    "mandarin": {
+        "tokenizer_kind": "phoneme+jieba+pypinyin",
+        "files": [
+            "tokenizer/mandarin_phoneme_token2id.json",
+            "tokenizer/mandarin_phoneme_phoneme_dict.json",
+            "tokenizer/mandarin_phoneme_pinyin_dict.json",
+            "tokenizer/mandarin_phoneme_tone_dict.json",
+            "tokenizer/mandarin_phoneme_ascii_letter_dict.json",
+            "tokenizer/mandarin_pypinyin_char_dict.json",
+            "tokenizer/mandarin_pypinyin_phrase_dict.json",
+            "tokenizer/mandarin_jieba_dict.json",
+        ],
+    },
+    "french": {"tokenizer_kind": "byt5", "files": []},
+    "italian": {"tokenizer_kind": "byt5", "files": []},
+    "vietnamese": {"tokenizer_kind": "byt5", "files": []},
+}


🔴 build_manifest.py references tokenizer files not staged by prepare_hf_upload.py, causing FileNotFoundError

The LANGUAGE_FILES in build_manifest.py references two tokenizer files that are absent from PER_LANGUAGE_TOKENIZER_FILES in prepare_hf_upload.py: english_phoneme_heteronyms.json (line 236) and mandarin_phoneme_phoneme_dict.json (line 264). Since prepare_hf_upload.py stages the hf-upload/ directory and doesn't copy these files, build_manifest.py's json_entry() will crash with FileNotFoundError when computing their size/hash. The intended workflow is: run prepare_hf_upload.py first, then run build_manifest.py — but one list has files the other doesn't.

Specific mismatches

english: build_manifest.py includes english_phoneme_heteronyms.json, prepare_hf_upload.py does not

mandarin: build_manifest.py includes mandarin_phoneme_phoneme_dict.json, prepare_hf_upload.py does not

Prompt for agents

The language tokenizer file lists in build_manifest.py (LANGUAGE_FILES) and prepare_hf_upload.py (PER_LANGUAGE_TOKENIZER_FILES) are inconsistent. Reconcile them so the files prepare_hf_upload copies to hf-upload exactly match the files build_manifest expects to index. Specific issues to resolve: 1. English: build_manifest includes english_phoneme_heteronyms.json but prepare_hf_upload does not. Either add it to prepare_hf_upload.py or remove it from build_manifest.py. 2. Mandarin: build_manifest includes mandarin_phoneme_phoneme_dict.json but prepare_hf_upload does not. Same resolution needed. 3. French, Italian, Vietnamese: prepare_hf_upload stages phoneme/chartokenizer files for these languages, but build_manifest marks them as byt5 with empty file lists. Since generate_coreml.py maps these languages to phoneme/char tokenizers (lines 524-529), build_manifest should list their tokenizer files too. The authoritative source for which tokenizers each language needs is generate_coreml.py's language_tokenizer_map. Both files should mirror that.

Was this helpful? React with 👍 or 👎 to provide feedback.

devin-ai-integration Bot reviewed Apr 25, 2026

View reviewed changes

Alex-Wengg added 2 commits April 24, 2026 23:05

Alex-Wengg changed the title ~~feat(tts/magpie): add parity fixture emitter for Swift port~~ feat(tts/magpie): parity fixture + HF upload staging + manifest Apr 25, 2026

Alex-Wengg mentioned this pull request Apr 25, 2026

feat(tts/magpie): add NVIDIA Magpie TTS Multilingual 357M Swift port FluidInference/FluidAudio#541

Open

5 tasks

devin-ai-integration Bot reviewed Apr 25, 2026

View reviewed changes

Alex-Wengg added 2 commits April 25, 2026 15:44

devin-ai-integration Bot reviewed Apr 25, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(tts/magpie): parity fixture + HF upload staging + manifest#44

feat(tts/magpie): parity fixture + HF upload staging + manifest#44
Alex-Wengg wants to merge 5 commits intomainfrom
tts/magpie-parity-fixture

Alex-Wengg commented Apr 25, 2026 •

edited

Loading

Uh oh!

devin-ai-integration Bot left a comment

Uh oh!

devin-ai-integration Bot Apr 25, 2026

Uh oh!

devin-ai-integration Bot left a comment

Uh oh!

devin-ai-integration Bot Apr 25, 2026

Uh oh!

devin-ai-integration Bot Apr 25, 2026

Uh oh!

devin-ai-integration Bot Apr 25, 2026

Uh oh!

devin-ai-integration Bot left a comment

Uh oh!

devin-ai-integration Bot Apr 25, 2026

Uh oh!

devin-ai-integration Bot Apr 25, 2026

Uh oh!

devin-ai-integration Bot Apr 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant


		gen_time = time.time() - gen_start

		predicted_codes_full = np.stack(per_step_codes, axis=1) # (8, N)

Conversation

Alex-Wengg commented Apr 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

HF upload (live)

What emit_parity_fixture.py captures

What manifest.json looks like

Usage

Companion PR

Test plan

Notes

Uh oh!

devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

Uh oh!

devin-ai-integration Bot Apr 25, 2026

Choose a reason for hiding this comment

Uh oh!

devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

Uh oh!

devin-ai-integration Bot Apr 25, 2026

Choose a reason for hiding this comment

Uh oh!

devin-ai-integration Bot Apr 25, 2026

Choose a reason for hiding this comment

Uh oh!

devin-ai-integration Bot Apr 25, 2026

Choose a reason for hiding this comment

Uh oh!

devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

Uh oh!

devin-ai-integration Bot Apr 25, 2026

Choose a reason for hiding this comment

Uh oh!

devin-ai-integration Bot Apr 25, 2026

Choose a reason for hiding this comment

Uh oh!

devin-ai-integration Bot Apr 25, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Alex-Wengg commented Apr 25, 2026 •

edited

Loading

What `emit_parity_fixture.py` captures

What `manifest.json` looks like