feat(tts/magpie): parity fixture + HF upload staging + manifest (#44)
Alex-Wengg wants to merge 5 commits into `main`.
Conversation
Adds a standalone companion script next to `generate_coreml.py` that runs
the Magpie CoreML pipeline for a fixed (text, speaker, language, seed)
and dumps intermediate tensors so cross-implementation parity tests can
diff against this ground truth.
Usage:
# Full pipeline — .npz with tokens, encoder output, prefill caches,
# per-step decoder hidden + sampled codes, predicted (8,N), PCM audio.
python emit_parity_fixture.py "Hello world." \
--speaker 0 --language en --seed 42 \
--output fixture_en_s0.npz
# Tokenizer-only — small .json for quick Swift tokenizer diff
# without loading CoreML.
python emit_parity_fixture.py "Hello world." \
--speaker 0 --language en --mode tokenizer \
--output fixture_en_s0_tokens.json
The script re-imports from `generate_coreml.py` so it never drifts from
the reference pipeline. Consumed by the Swift port's
`fluidaudiocli magpie parity` and `magpie tokenizer-parity` subcommands
in FluidInference/FluidAudio#541.
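For illustration, a Python-side consumer of the `.npz` fixture could diff a recomputed tensor against the stored ground truth like this (a hedged sketch — `check_parity`, the key name, and the tolerance are illustrative, not part of the PR; only the fixture key names come from the description above):

```python
import numpy as np

def check_parity(fixture_path, computed, key, atol=1e-3):
    """Return (passed, mae) comparing `computed` to the fixture tensor `key`."""
    ref = np.load(fixture_path)[key].astype(np.float32)
    mae = float(np.mean(np.abs(ref - computed.astype(np.float32))))
    return mae <= atol, mae
```

The same pattern applies to any of the dumped keys (`encoderOutput`, `perStepCodes`, `audioPcm`), with per-key tolerances chosen by the consumer.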
```python
gen_time = time.time() - gen_start
...
predicted_codes_full = np.stack(per_step_codes, axis=1)  # (8, N)
```
🔴 EOS frame included in codec input causes fixture audio/codes to diverge from reference
The fixture's AR loop appends the EOS-containing codes to per_step_codes before breaking (line 216), then feeds the full per_step_codes (including the EOS frame) into predicted_codes_full (line 223) and through the NanoCodec decoder (line 229). In contrast, the reference generate_coreml.py:400-406 breaks before appending EOS codes to all_predictions, so the codec never sees the EOS frame.
This means predictedCodes in the fixture has one extra column containing EOS token IDs (e.g., 2017), the NanoCodec decodes those special tokens as regular codec codes producing garbage audio for that frame, and the resulting audioPcm and WAV file diverge from the reference. Since this tool exists to emit "ground truth" for cross-implementation parity testing, a Swift implementation validated against this fixture would produce different output than generate_coreml.py.
Recording the EOS step in per_step_codes for trace purposes is fine, but the codec-input codes should exclude the final EOS frame.
Prompt for agents
The issue is that predicted_codes_full at line 223 includes the EOS frame (appended at line 216) and is then fed to the NanoCodec decoder at lines 226-232. The reference generate_coreml.py excludes EOS codes from codec input entirely.
To fix this while preserving the full per-step trace (which is useful for the fixture):
1. After line 223, determine whether the last frame is an EOS frame. If the loop broke due to EOS (is_eos=True), the last entry in per_step_codes is the EOS frame.
2. Build a separate codec_codes variable that excludes the EOS frame: e.g. predicted_codes_full[:, :-1] if the loop ended on EOS, or predicted_codes_full otherwise.
3. Use that codec_codes for the NanoCodec decode step (lines 226-237) instead of predicted_codes_full.
4. Keep predicted_codes_full (with EOS) in the fixture under predictedCodes for trace completeness, but also store the codec-input codes if desired.
Alternatively, mimic the reference exactly: do not append EOS codes to per_step_codes (remove line 216), and record the EOS event separately (e.g. a boolean flag in the fixture). This keeps predicted_codes_full identical to the reference's predicted_codes.
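Under the assumption sketched above (an `is_eos`-style flag set when the AR loop breaks), the suggested fix could look like this minimal sketch (not the actual script code):

```python
import numpy as np

def split_codec_input(per_step_codes, ended_on_eos):
    """Keep the full per-step trace, but drop the EOS frame from codec input.

    per_step_codes: list of (8,) arrays, one per AR step (may include the
    EOS frame as its last entry); ended_on_eos: True if the loop broke on EOS.
    """
    full = np.stack(per_step_codes, axis=1)         # (8, N), EOS frame included
    codec = full[:, :-1] if ended_on_eos else full  # what NanoCodec should see
    return full, codec
```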
Adds `prepare_hf_upload.py`, which assembles the layout expected by the
FluidAudio Swift port (`FluidInference/magpie-tts-multilingual-357m-coreml`)
from the mobius exporter outputs:
- Copies the 3 required CoreML models (and optional `decoder_prefill`)
from `build/` to the repo root.
- Keeps model constants + speaker/audio embeddings + `local_transformer/`
under `constants/`.
- Moves per-language tokenizer JSONs (english_phoneme_*, mandarin_*, etc.)
into a dedicated `tokenizer/` subtree — Swift's `MagpieResourceDownloader`
downloads this folder lazily on language selection.
- Writes a model card README.md and a `.gitattributes` that LFS-tracks
`.mlmodelc` / `.npy` / `.bin` / `.safetensors` / `.onnx`.
- Emits a `_prep_report.json` listing what was copied / skipped / missing.
The script does NOT upload — it prints the exact `huggingface-cli upload`
command for the maintainer to run. Smoke-tested against a synthetic
fixture tree; MISS rows surface in the report and the script exits non-zero
when required inputs (local_transformer weights, core models) are absent.
Usage:
python prepare_hf_upload.py # defaults
python prepare_hf_upload.py --clean # fresh staging dir
python prepare_hf_upload.py --repo-id org/name # override target
Generates a machine-readable index of every artifact in the upload (models in both .mlmodelc + .mlpackage form, constants, per-language tokenizer files), with shapes, sizes, and SHA-256 digests. The Swift port's MagpieResourceDownloader consumes manifest.json to know what to fetch and how to verify integrity.
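The manifest's per-file records (size, SHA-256, npy shape/dtype) can be sketched like this — an illustrative helper, not the actual `build_manifest.py` code:

```python
import hashlib
from pathlib import Path

import numpy as np

def manifest_entry(path: Path) -> dict:
    """Build one manifest record: path, byte size, SHA-256, and npy metadata."""
    data = path.read_bytes()
    entry = {
        "path": path.name,
        "bytes": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
    }
    if path.suffix == ".npy":
        arr = np.load(path)
        entry["shape"] = list(arr.shape)
        entry["dtype"] = str(arr.dtype)
    return entry
```

A downloader can then verify integrity by re-hashing each fetched file and comparing against the stored digest.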
| "french": {"tokenizer_kind": "byt5", "files": []}, | ||
| "italian": {"tokenizer_kind": "byt5", "files": []}, | ||
| "vietnamese": {"tokenizer_kind": "byt5", "files": []}, |
🔴 French, Italian, Vietnamese incorrectly labeled as "byt5" with no tokenizer files in manifest
The LANGUAGE_FILES dict marks French, Italian, and Vietnamese as "tokenizer_kind": "byt5" with empty "files": [] lists. However, these languages are NOT byt5-based:
- French uses `french_chartokenizer` (generate_coreml.py:470)
- Italian uses `italian_phoneme` (generate_coreml.py:474)
- Vietnamese uses `vietnamese_phoneme` (generate_coreml.py:475)
The exporter (export_tokenizers.py:56-62) produces token2id (and phoneme_dict for Italian/Vietnamese) files for these languages. The sibling script prepare_hf_upload.py:83-98 correctly lists the files. The manifest consumed by Swift's MagpieResourceDownloader will indicate these languages need no downloads, causing runtime failures when attempting to use them.
| "french": {"tokenizer_kind": "byt5", "files": []}, | |
| "italian": {"tokenizer_kind": "byt5", "files": []}, | |
| "vietnamese": {"tokenizer_kind": "byt5", "files": []}, | |
| "french": { | |
| "tokenizer_kind": "char", | |
| "files": [ | |
| "tokenizer/french_chartokenizer_token2id.json", | |
| ], | |
| }, | |
| "italian": { | |
| "tokenizer_kind": "phoneme", | |
| "files": [ | |
| "tokenizer/italian_phoneme_token2id.json", | |
| "tokenizer/italian_phoneme_phoneme_dict.json", | |
| ], | |
| }, | |
| "vietnamese": { | |
| "tokenizer_kind": "phoneme", | |
| "files": [ | |
| "tokenizer/vietnamese_phoneme_token2id.json", | |
| "tokenizer/vietnamese_phoneme_phoneme_dict.json", | |
| ], | |
| }, |
| "english": [ | ||
| "english_phoneme_token2id.json", | ||
| "english_phoneme_phoneme_dict.json", | ||
| ], |
🔴 English heteronyms file missing from HF upload tokenizer list
The English entry in PER_LANGUAGE_TOKENIZER_FILES omits english_phoneme_heteronyms.json. The exporter (export_tokenizers.py:100-104) produces this file, build_manifest.py:236 references it as tokenizer/english_phoneme_heteronyms.json, and the German entry at line 94 correctly includes its own heteronyms file. Without this file in the upload, English heteronym pronunciation resolution will be unavailable in the Swift port.
| "english": [ | |
| "english_phoneme_token2id.json", | |
| "english_phoneme_phoneme_dict.json", | |
| ], | |
| "english": [ | |
| "english_phoneme_token2id.json", | |
| "english_phoneme_phoneme_dict.json", | |
| "english_phoneme_heteronyms.json", | |
| ], |
| "mandarin": [ | ||
| "mandarin_phoneme_token2id.json", | ||
| "mandarin_phoneme_pinyin_dict.json", | ||
| "mandarin_phoneme_tone_dict.json", | ||
| "mandarin_phoneme_ascii_letter_dict.json", | ||
| "mandarin_pypinyin_char_dict.json", | ||
| "mandarin_pypinyin_phrase_dict.json", | ||
| "mandarin_jieba_dict.json", | ||
| ], |
🔴 Mandarin phoneme_dict file missing from HF upload tokenizer list
The Mandarin entry in PER_LANGUAGE_TOKENIZER_FILES omits mandarin_phoneme_phoneme_dict.json. The exporter (export_tokenizers.py:65-78) produces this file via the generic IPA G2P phoneme_dict export path (separate from the Chinese-specific pinyin_dict), and build_manifest.py:264 references it. Since the file won't be in ALL_TOKENIZER_FILES, it also won't be skipped by the constants copier — it will end up in unknown_files and not be copied anywhere. The manifest builder will then fail when trying to hash a non-existent file.
| "mandarin": [ | |
| "mandarin_phoneme_token2id.json", | |
| "mandarin_phoneme_pinyin_dict.json", | |
| "mandarin_phoneme_tone_dict.json", | |
| "mandarin_phoneme_ascii_letter_dict.json", | |
| "mandarin_pypinyin_char_dict.json", | |
| "mandarin_pypinyin_phrase_dict.json", | |
| "mandarin_jieba_dict.json", | |
| ], | |
| "mandarin": [ | |
| "mandarin_phoneme_token2id.json", | |
| "mandarin_phoneme_phoneme_dict.json", | |
| "mandarin_phoneme_pinyin_dict.json", | |
| "mandarin_phoneme_tone_dict.json", | |
| "mandarin_phoneme_ascii_letter_dict.json", | |
| "mandarin_pypinyin_char_dict.json", | |
| "mandarin_pypinyin_phrase_dict.json", | |
| "mandarin_jieba_dict.json", | |
| ], |
The 30s hard-coded timeout on `event.wait` in `get_compute_plan` was
silently false-failing as "Failed to load compute plan: unknown error"
on graphs ≳1500 ops where `MLComputePlan.loadContentsOfURL` legitimately
takes 25-30s. Confirmed on Magpie's rank-4 decoder_step (1782 ops, load
took 27.09s end-to-end) and reproducible on nanocodec_decoder.
Changes:
- `compute_plan.py`: add `DEFAULT_LOAD_TIMEOUT_S = 120.0` and
`load_timeout_s` parameter; raise descriptive timeout error
separately from the generic load error.
- `fallback.py`: pass-through `load_timeout_s` to `get_compute_plan`.
- `cli.py`: expose `--plan-timeout` typer option (default 120s);
wire through to both the compute-plan and fallback paths.
Verified on Magpie decoder_step (1782 ops, 27.1s load, ANE 97.3%) and
nanocodec_decoder (1149 ops, ANE compile rejection visible via
--fallback at extended timeout).
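The described fix can be sketched as follows (a minimal illustration, not the actual `compute_plan.py` code; only `DEFAULT_LOAD_TIMEOUT_S` and the `load_timeout_s` parameter name come from the change list above — `wait_for_plan` and the `result` dict shape are assumptions):

```python
import threading

DEFAULT_LOAD_TIMEOUT_S = 120.0

def wait_for_plan(done: threading.Event, result: dict,
                  load_timeout_s: float = DEFAULT_LOAD_TIMEOUT_S):
    """Wait for the async plan-load callback, separating timeout from failure."""
    if not done.wait(timeout=load_timeout_s):
        # Descriptive timeout error, distinct from the generic load error.
        raise TimeoutError(
            f"MLComputePlan load exceeded {load_timeout_s:.0f}s; "
            "graphs with >1500 ops can legitimately take 25-30s"
        )
    if result.get("error") is not None:
        raise RuntimeError(f"Failed to load compute plan: {result['error']}")
    return result["plan"]
```

The key point is that the timeout branch and the load-failure branch now raise different, descriptive errors instead of both collapsing into "unknown error".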
…ental stateful variant
Production change:
Split per-layer KV cache from rank-5 ``(2, B, max_seq, H, D)`` into
rank-4 ``cache_k`` + ``cache_v`` tensors so the ANE compiler will
accept the graph. Also replace causal mask constant ``-1e9`` (overflows
fp16, ANE rejects) with fp16-safe ``-3e4``, and use additive masking
for cross-attention.
Result on Apple M2 / macOS 26.5:
rank-5 (old): GPU only, ~64 ms/step
rank-4 (new): 97.3% ANE, 40 ms/step standalone, 96 ms/step in 146-step loop
Loads ``decoder_step.mlpackage`` with ``ComputeUnit.ALL``. Cache and
position output key names re-derived from the re-traced graph.
Experimental (kept, not enabled by default):
Add ``traceable_decoder_step_stateful.py`` and
``convert_decoder_step_stateful.py`` for a CoreML MLState (stateful
buffers) variant. Shrinks IO surface from 39/38 to 4/2 tensors, but
forces CPU+GPU only (ANE rejects stateful graphs). Real-loop benchmark
showed 212 ms/step — 2.2× regression vs rank-4. Both files carry
prominent EXPERIMENTAL banners and the ``MAGPIE_STATEFUL`` env path in
``generate_coreml.py`` is off by default. Kept so future agents don't
repeat the experiment thinking CosyVoice3's ~3× MLState speedup
generalises (it doesn't — Magpie's rank-4 graph is already on ANE).
Files:
- ``traceable/traceable_decoder_step.py`` — rank-4 production
- ``convert_decoder_step.py`` — rank-4 production
- ``generate_coreml.py`` — rank-4 keys + ``ComputeUnit.ALL``;
experimental ``MAGPIE_STATEFUL`` env-gated branch
- ``traceable/traceable_decoder_step_stateful.py`` — experimental, NEW
- ``convert_decoder_step_stateful.py`` — experimental, NEW
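A quick way to see why the two production changes matter — the mask constant and the rank-5 → rank-4 split — in plain NumPy (illustrative only, not repo code):

```python
import numpy as np

# -1e9 is not representable in fp16 (max finite magnitude is 65504), so it
# overflows to -inf, which the ANE compiler rejects; -3e4 stays finite and
# still drives masked attention scores to ~0 after softmax.
FP16_UNSAFE_MASK = np.float16(-1e9)  # becomes -inf
FP16_SAFE_MASK = np.float16(-3e4)    # finite

def split_cache(rank5: np.ndarray):
    """Split a combined (2, B, S, H, D) KV cache into rank-4 K and V halves."""
    cache_k, cache_v = rank5[0], rank5[1]  # each (B, S, H, D)
    return cache_k, cache_v
```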
```python
# Re-use everything from the main script so we never drift from the reference.
from generate_coreml import (  # noqa: E402
    BUILD_DIR,
    DECODER_CACHE_OUT_KEYS,
```
🔴 emit_parity_fixture.py imports deleted DECODER_CACHE_OUT_KEYS and uses stale rank-5 cache format
The PR renamed DECODER_CACHE_OUT_KEYS → DECODER_CACHE_K_OUT_KEYS + DECODER_CACHE_V_OUT_KEYS in generate_coreml.py, and changed caches from rank-5 (2, B, max_seq, H, D) to rank-4 (B, max_seq, H, D) with split K/V keys (cache_k{i} / cache_v{i}). However, emit_parity_fixture.py was never updated:
- Line 43 imports `DECODER_CACHE_OUT_KEYS`, which no longer exists — immediate `ImportError` at runtime.
- Lines 59–60 create caches with the old rank-5 shape `(2, 1, max_seq_len, n_heads, d_head)` and old key names `cache{i}` — incompatible with the new model, which expects `cache_k{i}` / `cache_v{i}` with shape `(1, max_seq_len, n_heads, d_head)`.
- Line 166 updates `cache_dict[f"cache{i}"]` using the deleted `DECODER_CACHE_OUT_KEYS` — even if the import were fixed, the cache update logic is wrong.
The script is completely non-functional with the new rank-4 decoder_step model.
Prompt for agents
emit_parity_fixture.py needs to be updated to match the new rank-4 split-K/V cache format from generate_coreml.py. Specifically:
1. In the import block (line 43), replace DECODER_CACHE_OUT_KEYS with DECODER_CACHE_K_OUT_KEYS and DECODER_CACHE_V_OUT_KEYS.
2. In _make_caches() (lines 56-63), change the cache creation from rank-5 (2, 1, max_seq_len, n_heads, d_head) with key cache{i} to two rank-4 tensors (1, max_seq_len, n_heads, d_head) with keys cache_k{i} and cache_v{i}. Mirror the make_caches() function in generate_coreml.py lines 365-371.
3. In _run_step() (lines 156-168), update the cache output reading from cache_dict[f"cache{i}"] = out[DECODER_CACHE_OUT_KEYS[i]] to cache_dict[f"cache_k{i}"] = out[DECODER_CACHE_K_OUT_KEYS[i]] and cache_dict[f"cache_v{i}"] = out[DECODER_CACHE_V_OUT_KEYS[i]]. Mirror the run_decoder_step() function in generate_coreml.py lines 398-402.
4. In the prefill snapshot (line 262), the prefillCache key naming f"prefillCache{i}" referencing f"cache{i}" also needs updating to match the new cache key names.
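A sketch of what the rank-4 cache setup described in step 2 might look like (key names and shapes from this review; the default dimensions are the values quoted above, passed here as illustrative parameters rather than read from the repo):

```python
import numpy as np

def make_rank4_caches(n_layers=12, max_seq_len=512, n_heads=12, d_head=64):
    """Build split K/V caches: two rank-4 fp16 tensors per layer."""
    caches = {}
    for i in range(n_layers):
        shape = (1, max_seq_len, n_heads, d_head)
        caches[f"cache_k{i}"] = np.zeros(shape, dtype=np.float16)
        caches[f"cache_v{i}"] = np.zeros(shape, dtype=np.float16)
    return caches
```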
| "decoder_step": { | ||
| "inputs": [ | ||
| {"name": "input", "dtype": "fp16", "shape": [1, 1, 768]}, | ||
| {"name": "encoder_output", "dtype": "fp16", "shape": [1, 256, 768]}, | ||
| {"name": "encoder_mask", "dtype": "fp16", "shape": [1, 256]}, | ||
| { | ||
| "name": "cache_*", | ||
| "dtype": "fp16", | ||
| "shape": [2, 1, 512, 12, 64], | ||
| "count": 12, | ||
| }, | ||
| {"name": "position_*", "dtype": "int32", "shape": [], "count": 12}, | ||
| ], | ||
| "outputs": [ | ||
| { | ||
| "name": "var_2201", | ||
| "dtype": "fp16", | ||
| "shape": [1, 1, 16192], | ||
| "note": "logits, reshape to (1, 1, 8, 2024) for 8 codebooks", | ||
| }, | ||
| {"name": "new_cache_*", "dtype": "fp16", "shape": [2, 1, 512, 12, 64], "count": 12}, | ||
| {"name": "var_*", "dtype": "int32", "shape": [], "count": 12, "note": "advanced positions"}, | ||
| ], | ||
| }, |
🟡 build_manifest.py decoder_step IO spec describes stale rank-5 cache format
The MODEL_IO["decoder_step"] dictionary in build_manifest.py still describes the old rank-5 cache layout that was replaced in this PR. It lists input caches as cache_* with shape [2, 1, 512, 12, 64] and outputs as new_cache_* with the same shape, plus logits named var_2201. The actual model now uses split rank-4 caches (cache_k* / cache_v* with shape [1, 512, 12, 64]), output keys new_k_* / new_v_*, and logits named var_2129 (see generate_coreml.py:39-58). The Swift port's MagpieResourceDownloader consumes this manifest, so the stale IO spec will mislead consumers about the model's interface.
Prompt for agents
Update the MODEL_IO dictionary entry for decoder_step in build_manifest.py to reflect the new rank-4 split-K/V cache format. Specifically:
- Input caches should be two sets of 12: cache_k* with shape [1, 512, 12, 64] and cache_v* with shape [1, 512, 12, 64] (rank-4, not rank-5)
- Output caches should be new_k_* and new_v_* with shape [1, 512, 12, 64] (12 each)
- Logits output name should be var_2129 not var_2201
- Count of cache inputs is 24 (12 K + 12 V) not 12
- The position inputs remain the same (12 scalars)
Refer to the DECODER_CACHE_K_OUT_KEYS, DECODER_CACHE_V_OUT_KEYS, and DECODER_LOGITS_KEY constants in generate_coreml.py for the correct names.
```python
LANGUAGE_FILES: dict[str, dict[str, Any]] = {
    "english": {
        "tokenizer_kind": "phoneme",
        "files": [
            "tokenizer/english_phoneme_token2id.json",
            "tokenizer/english_phoneme_phoneme_dict.json",
            "tokenizer/english_phoneme_heteronyms.json",
        ],
    },
    "spanish": {
        "tokenizer_kind": "phoneme",
        "files": [
            "tokenizer/spanish_phoneme_token2id.json",
            "tokenizer/spanish_phoneme_phoneme_dict.json",
        ],
    },
    "german": {
        "tokenizer_kind": "phoneme",
        "files": [
            "tokenizer/german_phoneme_token2id.json",
            "tokenizer/german_phoneme_phoneme_dict.json",
            "tokenizer/german_phoneme_heteronyms.json",
        ],
    },
    "hindi": {
        "tokenizer_kind": "char",
        "files": [
            "tokenizer/hindi_chartokenizer_token2id.json",
        ],
    },
    "mandarin": {
        "tokenizer_kind": "phoneme+jieba+pypinyin",
        "files": [
            "tokenizer/mandarin_phoneme_token2id.json",
            "tokenizer/mandarin_phoneme_phoneme_dict.json",
            "tokenizer/mandarin_phoneme_pinyin_dict.json",
            "tokenizer/mandarin_phoneme_tone_dict.json",
            "tokenizer/mandarin_phoneme_ascii_letter_dict.json",
            "tokenizer/mandarin_pypinyin_char_dict.json",
            "tokenizer/mandarin_pypinyin_phrase_dict.json",
            "tokenizer/mandarin_jieba_dict.json",
        ],
    },
    "french": {"tokenizer_kind": "byt5", "files": []},
    "italian": {"tokenizer_kind": "byt5", "files": []},
    "vietnamese": {"tokenizer_kind": "byt5", "files": []},
}
```
🔴 build_manifest.py references tokenizer files not staged by prepare_hf_upload.py, causing FileNotFoundError
The LANGUAGE_FILES in build_manifest.py references two tokenizer files that are absent from PER_LANGUAGE_TOKENIZER_FILES in prepare_hf_upload.py: english_phoneme_heteronyms.json (line 236) and mandarin_phoneme_phoneme_dict.json (line 264). Since prepare_hf_upload.py stages the hf-upload/ directory and doesn't copy these files, build_manifest.py's json_entry() will crash with FileNotFoundError when computing their size/hash. The intended workflow is: run prepare_hf_upload.py first, then run build_manifest.py — but one list has files the other doesn't.
Specific mismatches
- english: `build_manifest.py` includes `english_phoneme_heteronyms.json`; `prepare_hf_upload.py` does not
- mandarin: `build_manifest.py` includes `mandarin_phoneme_phoneme_dict.json`; `prepare_hf_upload.py` does not
Prompt for agents
The language tokenizer file lists in build_manifest.py (LANGUAGE_FILES) and prepare_hf_upload.py (PER_LANGUAGE_TOKENIZER_FILES) are inconsistent. Reconcile them so the files prepare_hf_upload copies to hf-upload exactly match the files build_manifest expects to index.
Specific issues to resolve:
1. English: build_manifest includes english_phoneme_heteronyms.json but prepare_hf_upload does not. Either add it to prepare_hf_upload.py or remove it from build_manifest.py.
2. Mandarin: build_manifest includes mandarin_phoneme_phoneme_dict.json but prepare_hf_upload does not. Same resolution needed.
3. French, Italian, Vietnamese: prepare_hf_upload stages phoneme/chartokenizer files for these languages, but build_manifest marks them as byt5 with empty file lists. Since generate_coreml.py maps these languages to phoneme/char tokenizers (lines 524-529), build_manifest should list their tokenizer files too.
The authoritative source for which tokenizers each language needs is generate_coreml.py's language_tokenizer_map. Both files should mirror that.
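One way to keep the two lists from drifting again is a small consistency check run before upload — a hedged sketch; the dict names follow this review, and the check would still need wiring up to the real modules:

```python
def check_tokenizer_lists(language_files, per_language_tokenizer_files):
    """Compare build_manifest's LANGUAGE_FILES ("tokenizer/"-prefixed paths)
    against prepare_hf_upload's PER_LANGUAGE_TOKENIZER_FILES (bare names).

    Returns a dict of per-language mismatches; empty means consistent.
    """
    mismatches = {}
    for lang, spec in language_files.items():
        manifest_set = {f.removeprefix("tokenizer/") for f in spec["files"]}
        staged_set = set(per_language_tokenizer_files.get(lang, []))
        if manifest_set != staged_set:
            mismatches[lang] = {
                "manifest_only": sorted(manifest_set - staged_set),
                "staged_only": sorted(staged_set - manifest_set),
            }
    return mismatches
```

Failing fast on a non-empty result would have caught both the English heteronyms and Mandarin phoneme_dict gaps before the manifest builder hit `FileNotFoundError`.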
Summary
Adds three Swift-port-supporting tools under `models/tts/magpie/coreml/`:
- `emit_parity_fixture.py` — runs the full Magpie CoreML pipeline for a fixed `(text, speaker, language, seed)` and dumps every intermediate tensor as a single `.npz` so the Swift port can replay each stage and diff against this Python ground truth.
- `prepare_hf_upload.py` — stages `hf-upload/` from compiled `build/` + `constants/` for upload to the HF repo. Splits constants into `constants/` (model tensors) + `tokenizer/` (per-language lookup files); generates `README.md` + `.gitattributes` + `_prep_report.json`.
- `build_manifest.py` — generates `manifest.json`, a machine-readable index with sha256, file sizes, npy shapes/dtypes, and per-model IO specs. The Swift port's `MagpieResourceDownloader` consumes it.
All three re-import helpers from `generate_coreml.py` — they never fork the reference pipeline.
HF upload (live)
Already uploaded to `FluidInference/magpie-tts-multilingual-357m-coreml` (1.4 GB) — both `.mlmodelc` (compiled, ready-to-run) and `.mlpackage` (portable) for all 4 models, plus `constants/`, `tokenizer/`, and `manifest.json`.
What `emit_parity_fixture.py` captures
`--mode full` → `.npz` (+ reference `.wav`):
- `text`, `speakerIndex`, `languageCode`, `seed`, `useCfg`, `cfgScale`, `temperature`, `topk`, `sampleRate`, `minFrames`
- `textTokens`, `textTokensPadded`, `textMask`
- `encoderOutput`
- `prefillCache{0..11}`, `prefillPosition{0..11}`
- `perStepDecoderHidden` (N, 768), `perStepCodes` (N, 8), `predictedCodes` (8, N)
- `audioPcm`, `audioSamples`, `genTimeSeconds`
`--mode tokenizer` → small `.json` with `{text, speakerIndex, languageCode, expectedTokenIds}` for cheap Swift-side tokenizer diffing.
What `manifest.json` looks like
```json
{
  "schema_version": "1.0",
  "repo_id": "FluidInference/magpie-tts-multilingual-357m-coreml",
  "model": {
    "name": "Magpie TTS Multilingual",
    "params_million": 357,
    "sample_rate": 22050,
    "max_nanocodec_seconds": 11.89,
    "supported_languages": ["english", "spanish", "german", "hindi",
                            "mandarin", "french", "italian", "vietnamese"],
    "audio_eos_id": 2017,
    "forbidden_token_ids": [2016, 2018, 2019, 2020, 2021, 2022, 2023],
    "speaker_names": ["John", "Sofia", "Aria", "Jason", "Leo"],
    "streaming_nanocodec": {"supported": false, "note": "..."}
  },
  "models": {
    "decoder_step": {
      "compiled": {"path": "decoder_step.mlmodelc", "bytes": ..., "files": ...},
      "package": {"path": "decoder_step.mlpackage", "bytes": ..., "files": ...},
      "io": {
        "inputs": [...],
        "outputs": [{"name": "var_2201", "shape": [1, 1, 16192], ...}, ...]
      }
    },
    ...
  },
  "constants": {"json": [...], "npy": [...], "local_transformer": [...]},
  "languages": {"english": {"tokenizer_kind": "phoneme", "files": [...]}, ...}
}
```
Usage
Companion PR
Consumed by the Swift port in FluidInference/FluidAudio#541 — `fluidaudiocli magpie parity` / `magpie tokenizer-parity` / `magpie text` subcommands.
Test plan
- `python -m py_compile {emit_parity_fixture,prepare_hf_upload,build_manifest}.py` — parses clean.
- `.mlmodelc` + 4 `.mlpackage` + `constants/` + `tokenizer/` + `manifest.json` uploaded successfully to HF (1.4 GB total).
- `"Hello | n ɛ m o ʊ |."` produces `həˈloʊ … nɛmoʊ` in G2P output (checked on the `.mlpackage` set).
- `fluidaudiocli magpie parity --fixture fixture_en_s0.npz` hits MAE < 1e-3 on `encoderOutput` and SNR > 40 dB on `audioPcm`.
Notes
- Requires `build/` to contain compiled `.mlpackage` artifacts.
- `nemo` extras are required for tokenization.
- CFG is applied by default (`cfg_scale=2.5`); pass `--no-cfg` to emit an unconditional fixture.
- The `py3-none-any` wheel is a stub; `uv` may need `--force-reinstall` to pull `cp311-none-macosx_11_0_arm64.whl`.
build/to contain compiled.mlpackageartifacts.nemoextras required for tokenization.cfg_scale=2.5); pass--no-cfgto emit unconditional fixture.py3-none-anyis a stub;uvmay need--force-reinstallto pullcp311-none-macosx_11_0_arm64.whl.