feat: add Magpie TTS CoreML conversion pipeline #24
3be1e15 to d161a71
NVIDIA Magpie TTS Multilingual (357M) conversion to CoreML.

Pipeline (4 models):
- text_encoder: text tokenization and encoding
- decoder_prefill: batch speaker context into KV cache
- decoder_step: single AR step with KV cache
- nanocodec_decoder: codec tokens to 22kHz audio

9 languages (en, es, de, fr, it, vi, zh, hi, ja), 5 speakers.

Includes conversion scripts, traceable wrappers, export scripts for embeddings/tokenizers/weights, and CoreML inference script.

Source: nvidia/magpie_tts_multilingual_357m
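The data flow implied by the four-model list can be sketched roughly as follows. This is only an illustration under stated assumptions: the function signatures, the `BOS_TOKEN`/`EOS_TOKEN` ids, the single token per step, and the stub models are all hypothetical and are not taken from the PR's `generate_coreml.py`.

```python
from typing import Any, Callable, List, Tuple

BOS_TOKEN = -1  # hypothetical start id, chosen for this sketch only
EOS_TOKEN = 0   # hypothetical end-of-sequence id, chosen for this sketch only


def synthesize(
    text: str,
    speaker: int,
    text_encoder: Callable[[str], Any],
    decoder_prefill: Callable[[int], Any],
    decoder_step: Callable[[int, Any, Any], Tuple[int, Any]],
    nanocodec_decoder: Callable[[List[int]], List[float]],
    max_steps: int = 16,
) -> List[float]:
    """Chain the four stages: encode, prefill, autoregressive steps, codec decode."""
    encoded = text_encoder(text)        # text tokenization and encoding
    cache = decoder_prefill(speaker)    # speaker context goes into the KV cache
    tokens: List[int] = []
    prev = BOS_TOKEN
    for _ in range(max_steps):
        # One autoregressive step; the cache is threaded through each call.
        token, cache = decoder_step(prev, encoded, cache)
        if token == EOS_TOKEN:
            break
        tokens.append(token)
        prev = token
    return nanocodec_decoder(tokens)    # codec tokens to audio samples


# Stub models so the sketch runs without CoreML; they only mimic the call shapes.
def fake_text_encoder(text: str) -> List[int]:
    return [ord(c) for c in text]


def fake_prefill(speaker: int) -> dict:
    return {"speaker": speaker, "steps": 0}


def fake_step(prev: int, encoded: Any, cache: dict) -> Tuple[int, dict]:
    steps = cache["steps"] + 1
    token = EOS_TOKEN if steps > 3 else steps  # emit 1, 2, 3, then stop
    return token, {**cache, "steps": steps}


def fake_codec(tokens: List[int]) -> List[float]:
    return [float(t) for t in tokens]


audio = synthesize("hi", 0, fake_text_encoder, fake_prefill, fake_step, fake_codec)
print(audio)
```

Splitting prefill from the per-step model, as the commit message describes, lets the speaker context be processed once while only the single-step model runs inside the loop.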
d161a71 to c04cbcb
- Fix constants_dir path to use single dirname (script is in coreml/ not coreml/convert/)
- Move pyproject.toml and uv.lock to coreml/ directory to follow AGENTS.md structure

Fixes Devin review findings:
- BUG_pr-review-job-b93938a5c0ea4e7897c7782fcd2dbe59_0002
- BUG_pr-review-job-b93938a5c0ea4e7897c7782fcd2dbe59_0003
All three conversion scripts had incorrect sys.path.insert() that went up two directories instead of one, causing ModuleNotFoundError at runtime.

Fixed files:
- convert_decoder_prefill.py:18
- convert_decoder_step.py:14
- convert_text_encoder.py:14

Changed from dirname(dirname(__file__)) to dirname(__file__) to correctly resolve to coreml/ directory where traceable/ package lives.

Addresses Devin review findings in PR #24
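A small sketch of why the extra dirname() breaks the import. The `/repo` prefix is a hypothetical checkout location; the `models/tts/magpie/coreml/` layout is the one quoted in the review comments. `posixpath` is used so the result is the same on any platform.

```python
import posixpath


def ancestor(path: str, levels_up: int) -> str:
    """Apply dirname() to `path` the given number of times."""
    for _ in range(levels_up):
        path = posixpath.dirname(path)
    return path


# Hypothetical absolute location of one conversion script.
script = "/repo/models/tts/magpie/coreml/convert_text_encoder.py"

fixed = ancestor(script, 1)  # dirname(__file__)
buggy = ancestor(script, 2)  # dirname(dirname(__file__))

# The traceable/ package sits next to the script, so only `fixed` contains it.
print(posixpath.join(fixed, "traceable"))  # /repo/models/tts/magpie/coreml/traceable
print(posixpath.join(buggy, "traceable"))  # /repo/models/tts/magpie/traceable
```

With the two-level version on sys.path, `import traceable` is searched for one directory too high, which matches the ModuleNotFoundError described in the commit message.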
Devin Review found 2 new potential issues.
⚠️ 2 issues in files not directly in the diff
⚠️ README references non-existent convert/ subdirectory for conversion scripts (models/tts/magpie/coreml/README.md:79-83)
The README instructs users to run python convert/convert_text_encoder.py, python convert/convert_decoder_prefill.py, etc. (lines 79-83), and lists a "Conversion Scripts (convert/)" section (line 195). However, these files actually live at the root of the coreml/ directory (e.g., convert_text_encoder.py), not in a convert/ subdirectory — the convert/ directory does not exist. Users following these instructions will get FileNotFoundError. The docstrings inside each convert script also reference the wrong convert/ path (e.g., convert_decoder_prefill.py:7, convert_decoder_step.py:4, convert_nanocodec.py:10, convert_text_encoder.py:4).
⚠️ README references non-existent extras/ path for export_pypinyin.py (models/tts/magpie/coreml/README.md:158-160)
The README Mandarin file table (lines 158-160) references extras/export_pypinyin.py as the generator for mandarin_jieba_dict.json, mandarin_pypinyin_char_dict.json, and mandarin_pypinyin_phrase_dict.json. However, the file is at export_pypinyin.py (root of coreml/), not in an extras/ subdirectory. The script's own docstring (export_pypinyin.py:11) also references the wrong path extras/export_pypinyin.py. Users following these instructions will get a FileNotFoundError.
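Both findings are stale-path problems that a simple check could catch. The following is a minimal sketch, assuming commands appear in the README as `python <path>.py`; the helper name `missing_scripts` and the regular expression are illustrative and not part of the PR.

```python
import re
import tempfile
from pathlib import Path
from typing import List


def missing_scripts(readme_text: str, root: Path) -> List[str]:
    """Return script paths invoked as `python <path>.py` that do not exist under root."""
    referenced = re.findall(r"python\s+(\S+\.py)", readme_text)
    return [p for p in referenced if not (root / p).is_file()]


# Demo against a throwaway directory that mimics the layout in the findings:
# the script lives at the root, while the README points into convert/.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "convert_text_encoder.py").write_text("")
    readme = (
        "python convert/convert_text_encoder.py\n"
        "python convert_text_encoder.py\n"
    )
    print(missing_scripts(readme, root))
```

A check like this only covers command lines; paths mentioned in tables or docstrings, such as extras/export_pypinyin.py, would need their own pattern.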
- Remove duplicate os.makedirs()/mlmodel.save() in convert_decoder_step.py:90-94
- Fix README to reference coreml/ instead of non-existent convert/ subdirectory
- Fix README to reference coreml/export_pypinyin.py instead of extras/

Addresses remaining Devin review findings in PR #24
        super().__init__()
        self.snake_channels = original.snake_channels
        self.snake_act = TraceableSnake(original.snake_act)
        self.lrelu = nn.LeakyReLU()
🔴 TraceableHalfSnake uses default LeakyReLU slope (0.01) instead of copying the original module's slope
In TraceableHalfSnake.__init__, a fresh nn.LeakyReLU() is created with PyTorch's default negative_slope=0.01, rather than copying the original HalfSnake module's lrelu attribute. NanoCodec is a BigVGAN/HiFi-GAN-based vocoder where the standard LRELU_SLOPE is 0.1 — a 10x difference. The code correctly copies snake_channels and wraps snake_act from original, making the omission of original.lrelu an oversight. Since HalfSnake applies LeakyReLU to half the channels in every activation layer, the wrong slope silently degrades the converted NanoCodec decoder's audio quality.
- self.lrelu = nn.LeakyReLU()
+ self.lrelu = original.lrelu if hasattr(original, 'lrelu') else nn.LeakyReLU()
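To make the size of the discrepancy concrete, here is a dependency-free sketch of the LeakyReLU formula evaluated at both slopes. The 0.01 value is PyTorch's documented default for nn.LeakyReLU; the 0.1 value is the one the review cites for this vocoder family, not something verified here against the NanoCodec checkpoint.

```python
def leaky_relu(x: float, negative_slope: float) -> float:
    """Scalar LeakyReLU: identity for x >= 0, scaled by negative_slope otherwise."""
    return x if x >= 0 else negative_slope * x


PYTORCH_DEFAULT_SLOPE = 0.01  # default of nn.LeakyReLU()
REVIEW_CITED_SLOPE = 0.1      # slope named in the review comment (assumption)

for x in (-2.0, -0.5, 1.0):
    default_out = leaky_relu(x, PYTORCH_DEFAULT_SLOPE)
    cited_out = leaky_relu(x, REVIEW_CITED_SLOPE)
    print(f"x={x:+.1f}  slope=0.01 -> {default_out:+.3f}  slope=0.1 -> {cited_out:+.3f}")
```

Positive inputs are unaffected, so the mismatch only shows up on negative activations, where the two outputs differ by a factor of ten.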
Summary
generate_coreml.py) and PyTorch reference (generate_pytorch.py)
Pipeline
text_encoder
decoder_prefill
decoder_step
nanocodec_decoder
Source