
feat: add Magpie TTS CoreML conversion pipeline #24

Merged: Alex-Wengg merged 4 commits into main from feat/magpie-tts on Mar 21, 2026



@Alex-Wengg (Member) commented Mar 13, 2026

Summary

  • Add NVIDIA Magpie TTS Multilingual (357M) CoreML conversion pipeline as a submodule
  • Complete 4-model pipeline: text encoder, decoder prefill, decoder step (AR), NanoCodec vocoder
  • 9 languages (en, es, de, fr, it, vi, zh, hi, ja), 5 built-in speakers, float16 CoreML
  • Includes export scripts for embeddings, tokenizers, local transformer weights, and pypinyin/OpenJTalk dictionaries
  • Pure CoreML inference script (generate_coreml.py) and PyTorch reference (generate_pytorch.py)

Pipeline

| Model | Purpose |
|---|---|
| text_encoder | Text → conditioning vectors |
| decoder_prefill | Batches the speaker context into the KV cache |
| decoder_step | Single autoregressive (AR) step with KV cache (called ~50-200 times per utterance) |
| nanocodec_decoder | Codec tokens → 22 kHz audio |
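The four models form a standard prefill-then-decode loop. The toy sketch below shows only that control flow and is not generate_coreml.py. Assumptions: plain Python lists stand in for the CoreML tensors and the KV cache, a single integer per step stands in for one frame of codec tokens, decoder_step is assumed to receive the text conditioning on every call, and generation stops at a hypothetical end-of-speech id or after max_steps. All function and parameter names are illustrative.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical stand-in: the real KV cache is a set of CoreML tensors.
Cache = List[int]


def synthesize(
    text_tokens: Sequence[int],
    speaker_context: Sequence[int],
    encode: Callable[[Sequence[int]], List[int]],
    prefill: Callable[[Sequence[int]], Cache],
    step: Callable[[List[int], Cache, Optional[int]], Tuple[int, Cache]],
    vocode: Callable[[List[int]], List[float]],
    eos_id: int,
    max_steps: int = 200,
) -> List[float]:
    cond = encode(text_tokens)        # text_encoder: text -> conditioning
    cache = prefill(speaker_context)  # decoder_prefill: run once to fill the KV cache
    codes: List[int] = []
    prev: Optional[int] = None
    for _ in range(max_steps):        # decoder_step: one call per generated frame
        code, cache = step(cond, cache, prev)
        if code == eos_id:            # assumed end-of-speech marker
            break
        codes.append(code)
        prev = code
    return vocode(codes)              # nanocodec_decoder: codes -> audio samples


if __name__ == "__main__":
    # Toy stand-ins so the loop can run without any model files.
    def toy_step(cond, cache, prev):
        code = len(cache)             # "predict" the current cache length
        return code, cache + [code]   # append to the cache, as a KV cache would grow

    audio = synthesize(
        [1, 2], [9, 9, 9],
        encode=lambda toks: [t * 2 for t in toks],
        prefill=lambda spk: list(spk),
        step=toy_step,
        vocode=lambda codes: [c / 2 for c in codes],
        eos_id=6,
    )
    print(audio)
```

The point of the split is that the speaker context is processed once by decoder_prefill, so each of the many decoder_step calls only has to attend over the cache rather than recompute it.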

Source: nvidia/magpie_tts_multilingual_357m



devin-ai-integration[bot]

This comment was marked as resolved.

devin-ai-integration[bot]

This comment was marked as resolved.

NVIDIA Magpie TTS Multilingual (357M) conversion to CoreML.

Pipeline (4 models):
- text_encoder: text tokenization and encoding
- decoder_prefill: batch speaker context into KV cache
- decoder_step: single AR step with KV cache
- nanocodec_decoder: codec tokens to 22kHz audio

9 languages (en, es, de, fr, it, vi, zh, hi, ja), 5 speakers.

Includes conversion scripts, traceable wrappers, export scripts
for embeddings/tokenizers/weights, and CoreML inference script.

Source: nvidia/magpie_tts_multilingual_357m
devin-ai-integration[bot]

This comment was marked as resolved.

Alex-Wengg marked this pull request as draft on March 15, 2026 15:17
- Fix constants_dir path to use a single dirname (the script is in coreml/, not coreml/convert/)
- Move pyproject.toml and uv.lock into the coreml/ directory to follow the AGENTS.md structure

Fixes Devin review findings:
- BUG_pr-review-job-b93938a5c0ea4e7897c7782fcd2dbe59_0002
- BUG_pr-review-job-b93938a5c0ea4e7897c7782fcd2dbe59_0003
Alex-Wengg marked this pull request as ready for review on March 21, 2026 01:40
devin-ai-integration[bot]

This comment was marked as resolved.

Alex-Wengg added a commit that referenced this pull request Mar 21, 2026
All three conversion scripts had incorrect sys.path.insert() that went up
two directories instead of one, causing ModuleNotFoundError at runtime.

Fixed files:
- convert_decoder_prefill.py:18
- convert_decoder_step.py:14
- convert_text_encoder.py:14

Changed from dirname(dirname(__file__)) to dirname(__file__) so the path
correctly resolves to the coreml/ directory, where the traceable/ package lives.

Addresses Devin review findings in PR #24
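The bug comes down to how many times dirname is applied to __file__. A minimal sketch, assuming the conversion scripts sit directly in models/tts/magpie/coreml/ (the path cited in the README finding) with the traceable/ package beside them; the helper names script_dir and add_package_root are hypothetical.

```python
import os
import sys


def script_dir(script_file: str) -> str:
    # One dirname: the folder that holds the script itself (coreml/ here).
    return os.path.dirname(os.path.abspath(script_file))


def add_package_root(script_file: str) -> str:
    # Put the script's own folder on sys.path so sibling packages such as
    # traceable/ become importable; avoid inserting duplicates.
    root = script_dir(script_file)
    if root not in sys.path:
        sys.path.insert(0, root)
    return root


if __name__ == "__main__":
    p = "/repo/models/tts/magpie/coreml/convert_text_encoder.py"  # hypothetical checkout path
    print(script_dir(p))                   # fixed: one dirname -> .../coreml
    print(os.path.dirname(script_dir(p)))  # old: two dirnames -> .../magpie, no traceable/ there
```

In a conversion script this would be called as add_package_root(__file__) before any import from traceable.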

devin-ai-integration bot left a comment


Devin Review found 2 new potential issues.

⚠️ 2 issues in files not directly in the diff

⚠️ README references non-existent convert/ subdirectory for conversion scripts (models/tts/magpie/coreml/README.md:79-83)

The README instructs users to run python convert/convert_text_encoder.py, python convert/convert_decoder_prefill.py, etc. (lines 79-83), and lists a "Conversion Scripts (convert/)" section (line 195). However, these files actually live at the root of the coreml/ directory (e.g., convert_text_encoder.py), not in a convert/ subdirectory — the convert/ directory does not exist. Users following these instructions will get FileNotFoundError. The docstrings inside each convert script also reference the wrong convert/ path (e.g., convert_decoder_prefill.py:7, convert_decoder_step.py:4, convert_nanocodec.py:10, convert_text_encoder.py:4).


⚠️ README references non-existent extras/ path for export_pypinyin.py (models/tts/magpie/coreml/README.md:158-160)

The README Mandarin file table (lines 158-160) references extras/export_pypinyin.py as the generator for mandarin_jieba_dict.json, mandarin_pypinyin_char_dict.json, and mandarin_pypinyin_phrase_dict.json. However, the file is at export_pypinyin.py (root of coreml/), not in an extras/ subdirectory. The script's own docstring (export_pypinyin.py:11) also references the wrong path extras/export_pypinyin.py. Users following these instructions will get a FileNotFoundError.
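Both findings are path drift between the README and the files on disk. The sketch below is an illustration, not part of the PR: it scans README text for tokens ending in .py and reports the ones that do not exist relative to a given root (the coreml/ directory here). The function name missing_scripts is hypothetical, and the regex only catches plain path tokens such as convert/convert_text_encoder.py or extras/export_pypinyin.py.

```python
import re
from pathlib import Path
from typing import List

# Any run of word characters, dots, slashes, or dashes that ends in ".py".
PY_PATH = re.compile(r"[\w./-]+\.py\b")


def missing_scripts(readme_text: str, root: Path) -> List[str]:
    """Return .py paths mentioned in the README that are not files under root."""
    seen = dict.fromkeys(PY_PATH.findall(readme_text))  # dedupe, keep order
    return [rel for rel in seen if not (root / rel).is_file()]


if __name__ == "__main__":
    # Run from coreml/: prints every referenced script that does not exist.
    print(missing_scripts(Path("README.md").read_text(), Path(".")))
```

Running a check like this in CI would have flagged both the convert/ and extras/ references before review.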

View 12 additional findings in Devin Review.


- Remove duplicate os.makedirs()/mlmodel.save() calls in convert_decoder_step.py:90-94
- Fix README to reference coreml/ instead of the non-existent convert/ subdirectory
- Fix README to reference coreml/export_pypinyin.py instead of extras/export_pypinyin.py

Addresses remaining Devin review findings in PR #24

devin-ai-integration bot left a comment


Devin Review found 1 new potential issue.

View 12 additional findings in Devin Review.


```python
# Excerpt from TraceableHalfSnake.__init__
super().__init__()
self.snake_channels = original.snake_channels
self.snake_act = TraceableSnake(original.snake_act)
self.lrelu = nn.LeakyReLU()
```
🔴 TraceableHalfSnake uses default LeakyReLU slope (0.01) instead of copying the original module's slope

In TraceableHalfSnake.__init__, a fresh nn.LeakyReLU() is created with PyTorch's default negative_slope=0.01, rather than copying the original HalfSnake module's lrelu attribute. NanoCodec is a BigVGAN/HiFi-GAN-based vocoder where the standard LRELU_SLOPE is 0.1 — a 10x difference. The code correctly copies snake_channels and wraps snake_act from original, making the omission of original.lrelu an oversight. Since HalfSnake applies LeakyReLU to half the channels in every activation layer, the wrong slope silently degrades the converted NanoCodec decoder's audio quality.

Suggested change

```diff
- self.lrelu = nn.LeakyReLU()
+ self.lrelu = original.lrelu if hasattr(original, 'lrelu') else nn.LeakyReLU()
```
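To make the size of the discrepancy concrete, here is a framework-free sketch of a HalfSnake-style activation in plain Python. It assumes the first channels // 2 channels go through Snake (x + sin²(αx)/α, with α fixed at 1 rather than learned) and the remaining channels go through LeakyReLU; the function names are hypothetical and this is not NeMo's implementation. Only negative inputs on the LeakyReLU half are affected by the slope.

```python
import math
from typing import List


def snake(x: float, alpha: float = 1.0) -> float:
    # Snake activation: x + sin^2(alpha * x) / alpha (alpha is learned in practice)
    return x + math.sin(alpha * x) ** 2 / alpha


def leaky_relu(x: float, negative_slope: float) -> float:
    # Identity for x >= 0, scaled by negative_slope otherwise
    return x if x >= 0 else negative_slope * x


def half_snake(channels: List[float], negative_slope: float) -> List[float]:
    # Assumed split: first half through Snake, second half through LeakyReLU
    split = len(channels) // 2
    snake_part = [snake(x) for x in channels[:split]]
    lrelu_part = [leaky_relu(x, negative_slope) for x in channels[split:]]
    return snake_part + lrelu_part


if __name__ == "__main__":
    x = [-1.0, 0.5, -2.0, -4.0]
    default = half_snake(x, negative_slope=0.01)  # PyTorch's nn.LeakyReLU() default
    copied = half_snake(x, negative_slope=0.1)    # slope the review expects
    print(default[2:], copied[2:])
```

The Snake half is identical in both runs; the LeakyReLU half differs by exactly the 10x slope ratio, which is why copying the original module (rather than constructing a fresh one) is the safer choice during tracing.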