## Imports

In [None]:
from huggingface_hub import hf_hub_download
from IPython.display import display, Audio
from kokoro import KModel, KPipeline
import soundfile as sf

## Download model

In [None]:
REPO_ID = "hexgrad/Kokoro-82M"

In [None]:
model_path = hf_hub_download(
    repo_id=REPO_ID,
    filename=KModel.MODEL_NAMES[REPO_ID],
    local_dir="../models/kokoro",
    force_download=False,  # Set to True to force redownload even if the file exists
)
model_path

## Initialize model

In [None]:
%%time
model = KModel(repo_id=REPO_ID, model=model_path)

## Initialize pipeline

KPipeline is a language-aware support class with two main responsibilities:

1. Perform language-specific G2P (grapheme-to-phoneme) conversion, mapping (and chunking) text to phonemes
2. Manage and store voices, which are lazily downloaded from Hugging Face if needed

Use one KPipeline per language. If you have multiple KPipelines, reuse a single KModel instance across all of them.

By default, KPipeline initializes its own KModel (`model=True`). Passing `model=False` constructs a "quiet" KPipeline that yields (graphemes, phonemes, None) without generating any audio, which is useful for phonemizing and chunking text in advance.

A "loud" KPipeline _with_ a model yields (graphemes, phonemes, audio).

Args:
    lang_code: Language code for G2P processing
    model: KModel instance, True to create new model, False for no model (default: True)
    trf: Whether to use transformer-based G2P (default: False)
    device: Override default device selection ('cuda' or 'cpu', or None for auto)
        If None, will auto-select cuda if available
        If 'cuda' and not available, will explicitly raise an error
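
The `device` rules above can be sketched as a small pure function. This only illustrates the documented behavior and is not Kokoro's implementation; `resolve_device` and its `cuda_available` argument are hypothetical names (in practice the flag would come from something like `torch.cuda.is_available()`).

```python
from typing import Optional


def resolve_device(requested: Optional[str], cuda_available: bool) -> str:
    """Hypothetical helper mirroring the documented `device` rules."""
    if requested is None:
        # Auto-select: prefer CUDA when it is available.
        return "cuda" if cuda_available else "cpu"
    if requested == "cuda" and not cuda_available:
        # An explicit request for an unavailable device is an error.
        raise RuntimeError("CUDA requested but not available")
    return requested
```

Raising on an explicit but unavailable `'cuda'` request, rather than silently falling back to the CPU, keeps a misconfigured machine from running far slower than expected without notice.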

### Language codes

```python
LANG_CODES = dict(
    # pip install misaki[en]
    a='American English',
    b='British English',

    # espeak-ng
    e='es',
    f='fr-fr',
    h='hi',
    i='it',
    p='pt-br',

    # pip install misaki[ja]
    j='Japanese',

    # pip install misaki[zh]
    z='Mandarin Chinese',
)
```
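
As a rough sketch, the grouping in the comments above can be turned into a lookup that reports which G2P dependency a language code needs. `G2P_DEPENDENCY` and `g2p_dependency` are hypothetical names introduced here for illustration, not part of Kokoro.

```python
# Hypothetical lookup derived from the comments in LANG_CODES above.
G2P_DEPENDENCY = {
    **dict.fromkeys("ab", "pip install misaki[en]"),
    **dict.fromkeys("efhip", "espeak-ng"),
    "j": "pip install misaki[ja]",
    "z": "pip install misaki[zh]",
}


def g2p_dependency(lang_code: str) -> str:
    """Return the dependency note for a language code, or raise on unknown codes."""
    try:
        return G2P_DEPENDENCY[lang_code]
    except KeyError:
        raise ValueError(f"Unknown lang_code: {lang_code!r}") from None
```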

In [None]:
%%time
pipeline = KPipeline(lang_code="a", repo_id=REPO_ID, model=model, device="cpu")

## Generate, display, and save audio files in a loop

See voice samples [here](https://huggingface.co/onnx-community/Kokoro-82M-v1.0-ONNX#voicessamples).

```python
# Reference signature for calling a KPipeline instance (not meant to be executed)
def pipeline(
    text: str | List[str],
    voice: str | None = None,
    speed: float | Callable[[int], float] = 1,
    split_pattern: str | None = r'\n+',
    model: KModel | None = None
) -> Generator[Result, None, None]: ...
```
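
To get a feel for `split_pattern` and a callable `speed`, here is a minimal sketch using only the standard library. It assumes the text is split on the regex before further processing and that a callable `speed` receives an integer length for each chunk. Both are assumptions about the behavior, and `split_text` and `slower_for_long_chunks` are hypothetical helpers, not Kokoro functions.

```python
import re
from typing import List, Optional


def split_text(text: str, split_pattern: Optional[str] = r"\n+") -> List[str]:
    """Split text on the pattern and drop empty pieces; None means no splitting."""
    if split_pattern is None:
        return [text.strip()] if text.strip() else []
    return [piece.strip() for piece in re.split(split_pattern, text) if piece.strip()]


def slower_for_long_chunks(length: int) -> float:
    """Example speed callable: normal speed for short chunks, 10% slower otherwise."""
    return 1.0 if length < 100 else 0.9


chunks = split_text("First paragraph.\n\nSecond paragraph.\n")
# Character count stands in for the per-chunk length here (an assumption).
speeds = [slower_for_long_chunks(len(chunk)) for chunk in chunks]
```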

In [None]:
text = '''
[Kokoro](/kˈOkəɹO/) is an open-weight TTS model with 82 million parameters. Despite its lightweight architecture, it delivers comparable quality to larger models while being significantly faster and more cost-efficient. With Apache-licensed weights, [Kokoro](/kˈOkəɹO/) can be deployed anywhere from production environments to personal projects.
'''
generator = pipeline(text, voice='af_heart')

In [None]:
%%time

for i, (graphemes, phonemes, audio) in enumerate(generator):
    print(
        "-------------------------------\n"
        f"Audio {i}:\n"
        f"  Graphemes: {graphemes}\n"
        f"  Phonemes: {phonemes}\n"
    )
    display(
        Audio(
            data=audio,
            rate=24000,  # Default sample rate for Kokoro. Increasing this value will accelerate the playback speed
            autoplay=i==0  # Autoplays the first audio and not the others
        )
    )
    # Save the audio to a file
    # sf.write(f"../data/output_audio/{i}.wav", audio, 24000)
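
If you would rather write one file for the whole text than one per chunk, the chunks can be concatenated first. The sketch below uses only the standard-library `wave` module and assumes each chunk is a sequence of float samples in [-1, 1] at 24 kHz mono (tensors or arrays would need converting with `.tolist()` first). `write_wav` is a hypothetical helper, not part of Kokoro or soundfile.

```python
import struct
import wave
from typing import Iterable, Sequence

SAMPLE_RATE = 24000  # Kokoro's default sample rate, as used above


def write_wav(path: str, chunks: Iterable[Sequence[float]], rate: int = SAMPLE_RATE) -> float:
    """Write all chunks to one 16-bit mono WAV file and return its duration in seconds."""
    samples = [s for chunk in chunks for s in chunk]
    # Clip to [-1, 1] and scale to signed 16-bit integers.
    pcm = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 2 bytes = 16 bits per sample
        wav_file.setframerate(rate)
        wav_file.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))
    return len(samples) / rate
```

The returned duration is simply the sample count divided by the sample rate, so 24,000 samples at 24 kHz is one second of audio.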