# Kokoro Text To Speech

* Kokoro is a lightweight (82M-parameter) but capable text-to-speech (TTS) model available on Hugging Face
    * [HuggingFace: hexgrad/Kokoro-82M](https://huggingface.co/hexgrad/Kokoro-82M/tree/main)
* Python Setup
    * This video walks through the setup: [Kokoro-82M: Install and Run Locally Fast, Small, and Free Text to Speech AI Model Kokoro-82M](https://www.youtube.com/watch?v=bs45W7CEGps)
    * Install eSpeak NG (on Windows, use the espeak-ng.msi installer): [Espeak Releases](https://github.com/espeak-ng/espeak-ng/releases)
    * Install the Python packages into the LLM project with uv:
        * `uv add kokoro`
        * `uv add soundfile`
        * `uv add https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.8.0/en_core_web_sm-3.8.0.tar.gz` (the spaCy English model)
* Kokoro Voice Options
    * [Dozens of voice options](https://huggingface.co/hexgrad/Kokoro-82M/tree/main/voices)
    * Voices prefixed with `af` or `am` are American English female or male (e.g. `af_heart`)
    * Voices prefixed with `bf` or `bm` are British English female or male (e.g. `bm_george`)
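The prefix convention above can be decoded mechanically. The following is a minimal sketch using a hypothetical `describe_voice` helper (it is not part of the `kokoro` package). It assumes voice names follow the `<language><gender>_<name>` pattern, and its language table covers only the codes this notebook mentions (`a`, `b`, `j`, `z`); anything else is reported as unknown.

```python
# Hypothetical lookup tables; only the language codes mentioned in this notebook are included.
LANGUAGES = {
    "a": "American English",
    "b": "British English",
    "j": "Japanese",
    "z": "Mandarin Chinese",
}
GENDERS = {"f": "female", "m": "male"}


def describe_voice(voice: str) -> tuple[str, str]:
    """Decode a Kokoro voice name such as 'bf_emma' into (language, gender).

    Hypothetical helper: assumes names look like '<lang><gender>_<name>'.
    """
    prefix, sep, _ = voice.partition("_")
    if not sep or len(prefix) != 2:
        raise ValueError(f"unexpected voice name: {voice!r}")
    language = LANGUAGES.get(prefix[0], "unknown")
    gender = GENDERS.get(prefix[1], "unknown")
    return language, gender


print(describe_voice("af_heart"))   # ('American English', 'female')
print(describe_voice("bm_george"))  # ('British English', 'male')
```

Because the first letter of these voice names matches the `lang_code` values used with `KPipeline` in the code cell ('a', 'b', 'j', 'z'), `voice[0]` is a reasonable first guess for the pipeline's language code.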

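Before running the code cell, it can help to confirm the setup steps took effect. This is a minimal sketch with a hypothetical `check_kokoro_setup` helper (not part of Kokoro) that uses only the standard library: it checks whether `kokoro`, `soundfile`, and `en_core_web_sm` are importable and whether an `espeak-ng` executable is on `PATH`. The `PATH` check is only a heuristic, since the Windows installer may not add eSpeak NG to `PATH`.

```python
import importlib.util
import shutil


def check_kokoro_setup() -> dict[str, bool]:
    """Hypothetical helper: report which setup steps appear to be done.

    Only checks that packages are importable and that espeak-ng is on PATH;
    it does not load any model.
    """
    modules = ["kokoro", "soundfile", "en_core_web_sm"]
    # find_spec returns None (instead of raising) when a top-level module is missing.
    status = {name: importlib.util.find_spec(name) is not None for name in modules}
    # Heuristic: espeak-ng may be installed but not on PATH (common on Windows).
    status["espeak-ng on PATH"] = shutil.which("espeak-ng") is not None
    return status


if __name__ == "__main__":
    for item, ok in check_kokoro_setup().items():
        print(f"{'OK     ' if ok else 'MISSING'}  {item}")
```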
In [None]:
import os

from kokoro import KPipeline
import soundfile as sf

# Language codes for KPipeline:
# US 'a' => American English, GB 'b' => British English
# JP 'j' => Japanese (install the extra dependency separately: misaki[ja])
# CN 'z' => Mandarin Chinese (install the extra dependency separately: misaki[zh])

try:
    pipeline = KPipeline(lang_code='a')
except SystemExit:
    # If en_core_web_sm is missing, the spaCy model download can end in SystemExit
    # rather than an ordinary exception.
    print("KPipeline initialization failed. Ensure en_core_web_sm is installed (see the uv add commands above).")
    raise  # stop here; the rest of the cell needs a working pipeline
except Exception as e:
    print("Error initializing Kokoro pipeline:", e)
    raise

text = '''
Hello, this is a test of the Kokoro text to speech system.
This is an example of generating speech from text using a lightweight model.
Kokoro is designed to be efficient and effective for various text-to-speech applications.
Enjoy exploring the capabilities of Kokoro for your text-to-speech needs!
'''

# Other English voices: 'am_eric', 'bf_emma', 'bf_isabella', 'bf_lily', 'bm_daniel', 'bm_george'.
# Voices for other languages ('ff_siwis', 'hf_alpha', 'hm_omega') work best with their matching lang_code.
generator = pipeline(
    text,
    voice='af_heart',
    speed=1.0,            # 1.0 = normal speaking rate
    split_pattern=r'\n+'  # split the input into chunks at newlines
)

os.makedirs('./audioOutput', exist_ok=True)  # sf.write fails if the output folder does not exist
for i, (gs, ps, audio) in enumerate(generator):
    print(i)   # i => index of chunk
    print(gs)  # gs => graphemes (the text of this chunk)
    print(ps)  # ps => phonemes
    sf.write(f'./audioOutput/kokoro_output_{i}.wav', audio, 24000)  # save each chunk; Kokoro outputs 24 kHz audio

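The loop above writes one WAV file per chunk. If a single file is preferred, the chunks can be concatenated before writing. This is a minimal sketch assuming each chunk is a 1-D array at Kokoro's 24 kHz output rate (the same 24000 passed to `sf.write`); the `join_chunks` helper and the 0.25 s pause between chunks are illustrative choices, not Kokoro features.

```python
import numpy as np

SAMPLE_RATE = 24_000  # matches the rate passed to sf.write in the cell above


def join_chunks(chunks, sample_rate=SAMPLE_RATE, gap_seconds=0.25):
    """Hypothetical helper: concatenate 1-D audio chunks with short silences between them."""
    gap = np.zeros(int(round(sample_rate * gap_seconds)), dtype=np.float32)
    pieces = []
    for i, chunk in enumerate(chunks):
        if i > 0:
            pieces.append(gap)  # silence between consecutive chunks only
        # np.asarray also accepts CPU torch tensors; flatten to 1-D just in case.
        pieces.append(np.asarray(chunk, dtype=np.float32).reshape(-1))
    if not pieces:
        return np.zeros(0, dtype=np.float32)
    return np.concatenate(pieces)


# Usage with the pipeline and soundfile from the cell above:
# chunks = [audio for _, _, audio in pipeline(text, voice='af_heart', speed=1.0, split_pattern=r'\n+')]
# sf.write('./audioOutput/kokoro_output_full.wav', join_chunks(chunks), SAMPLE_RATE)
```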