<a href="https://colab.research.google.com/github/Promila-uwc/DeepFaceStream/blob/master/examples/ipynb/colab.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Clone Repo

In [1]:
%cd /content
!rm -rf sample_data ChatTTS
!git clone https://github.com/2noise/ChatTTS.git

Cloning into 'ChatTTS'...
remote: Enumerating objects: 2628, done.
remote: Counting objects: 100% (806/806), done.
remote: Compressing objects: 100% (342/342), done.
remote: Total 2628 (delta 537), reused 517 (delta 454), pack-reused 1822 (from 1)
Receiving objects: 100% (2628/2628), 7.99 MiB | 9.24 MiB/s, done.
Resolving deltas: 100% (1583/1583), done.


## Install Requirements

In [2]:
!pip install -r /content/ChatTTS/requirements.txt
!ldconfig /usr/lib64-nvidia

Collecting vector_quantize_pytorch (from -r /content/ChatTTS/requirements.txt (line 6))
  Downloading vector_quantize_pytorch-1.20.10-py3-none-any.whl.metadata (29 kB)
Collecting vocos (from -r /content/ChatTTS/requirements.txt (line 8))
  Downloading vocos-0.1.0-py3-none-any.whl.metadata (4.8 kB)
Collecting gradio (from -r /content/ChatTTS/requirements.txt (line 10))
  Downloading gradio-5.7.1-py3-none-any.whl.metadata (16 kB)
Collecting pybase16384 (from -r /content/ChatTTS/requirements.txt (line 11))
  Downloading pybase16384-0.3.7-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.5 kB)
Collecting pynini==2.1.5 (from -r /content/ChatTTS/requirements.txt (line 12))
  Downloading pynini-2.1.5-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (4.6 kB)
Collecting WeTextProcessing (from -r /content/ChatTTS/requirements.txt (line 13))
  Downloading WeTextProcessing-1.0.4.1-py3-none-any.whl.metadata (7.2 kB)
Collecting nemo_text_processing (from -r /c

## Import Packages

In [3]:
import torch

torch._dynamo.config.cache_size_limit = 64
torch._dynamo.config.suppress_errors = True
torch.set_float32_matmul_precision("high")

from ChatTTS import ChatTTS
from ChatTTS.tools.logger import get_logger
from ChatTTS.tools.normalizer import normalizer_en_nemo_text, normalizer_zh_tn
from IPython.display import Audio

## Load Models

In [4]:
logger = get_logger("ChatTTS", format_root=True)
chat = ChatTTS.Chat(logger)

# try to load normalizer
try:
    chat.normalizer.register("en", normalizer_en_nemo_text())
except ValueError as e:
    logger.error(e)
except Exception:
    logger.warning("Package nemo_text_processing not found!")
    logger.warning(
        "Run: conda install -c conda-forge pynini=2.1.5 && pip install nemo_text_processing",
    )
try:
    chat.normalizer.register("zh", normalizer_zh_tn())
except ValueError as e:
    logger.error(e)
except Exception:
    logger.warning("Package WeTextProcessing not found!")
    logger.warning(
        "Run: conda install -c conda-forge pynini=2.1.5 && pip install WeTextProcessing",
    )

 NeMo-text-processing :: INFO     :: Creating ClassifyFst grammars.
[+0000 20241202 10:43:00] [INFO] NeMo-text-processing | tokenize_and_classify | Creating ClassifyFst grammars.
2024-12-02 10:43:34,660 WETEXT INFO found existing fst: /usr/local/lib/python3.10/dist-packages/tn/zh_tn_tagger.fst
[+0000 20241202 10:43:34] [INFO] wetext-zh_normalizer | processor | found existing fst: /usr/local/lib/python3.10/dist-packages/tn/zh_tn_tagger.fst
2024-12-02 10:43:34,665 WETEXT INFO                     /usr/local/lib/python3.10/dist-packages/tn/zh_tn_verbalizer.fst
[+0000 20241202 10:43:34] [INFO] wetext-zh_normalizer | processor |                     /usr/local/lib/python3.10/dist-packages/tn/zh_tn_verbalizer.fst
2024-12-02 10:43:34,667 WETEXT INFO skip building fst for zh_normalizer ...
[+0000 20241202 10:43:34] [INFO] wetext-zh_normalizer | processor | skip building fst for zh_normalizer ...
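The two `try`/`except` blocks in the cell above follow the same pattern, so they can be folded into one loop. The following is a minimal sketch of that idea, not part of ChatTTS: `register_normalizers` and the stub classes are hypothetical names introduced here so the example runs without the library installed.

```python
import logging


def register_normalizers(normalizer, factories, logger):
    """Try each (lang, factory) pair and return the languages that registered.

    `normalizer` is any object with a register(lang, fn) method, such as
    chat.normalizer in the notebook. `factories` maps a language code to a
    zero-argument callable that builds the normalizer function.
    """
    registered = []
    for lang, factory in factories.items():
        try:
            normalizer.register(lang, factory())
        except ValueError as exc:
            logger.error(exc)
        except Exception as exc:  # e.g. an optional package is not installed
            logger.warning("Normalizer for %r unavailable: %s", lang, exc)
        else:
            registered.append(lang)
    return registered


class _StubNormalizer:
    """Hypothetical stand-in for chat.normalizer, for demonstration only."""

    def __init__(self):
        self.table = {}

    def register(self, lang, fn):
        self.table[lang] = fn


def _missing_backend():
    # Simulates a normalizer whose dependency cannot be imported.
    raise ImportError("backend not installed")


stub = _StubNormalizer()
ok = register_normalizers(
    stub,
    {"en": lambda: str.lower, "zh": _missing_backend},
    logging.getLogger("demo"),
)
print(ok)  # ['en']
```

In the notebook itself the call would presumably look like `register_normalizers(chat.normalizer, {"en": normalizer_en_nemo_text, "zh": normalizer_zh_tn}, logger)`.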


### There are three ways to load models

#### 1. Load models from Hugging Face (recommended)

In [5]:
# use force_redownload=True if the weights have been updated.
chat.load(source="huggingface")

[+0000 20241202 10:43:35] [INFO] ChatTTS | core | download from HF: https://huggingface.co/2Noise/ChatTTS
The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Fetching 14 files:   0%|          | 0/14 [00:00<?, ?it/s]

DVAE.safetensors:   0%|          | 0.00/60.4M [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/853M [00:00<?, ?B/s]

Decoder.safetensors:   0%|          | 0.00/104M [00:00<?, ?B/s]

Vocos.safetensors:   0%|          | 0.00/54.3M [00:00<?, ?B/s]

asset/gpt/config.json:   0%|          | 0.00/762 [00:00<?, ?B/s]

Embed.safetensors:   0%|          | 0.00/146M [00:00<?, ?B/s]

asset/tokenizer/special_tokens_map.json:   0%|          | 0.00/7.85k [00:00<?, ?B/s]

asset/tokenizer/tokenizer.json:   0%|          | 0.00/449k [00:00<?, ?B/s]

asset/tokenizer/tokenizer_config.json:   0%|          | 0.00/11.0k [00:00<?, ?B/s]

config/decoder.yaml:   0%|          | 0.00/117 [00:00<?, ?B/s]

config/dvae.yaml:   0%|          | 0.00/143 [00:00<?, ?B/s]

config/gpt.yaml:   0%|          | 0.00/346 [00:00<?, ?B/s]

config/path.yaml:   0%|          | 0.00/309 [00:00<?, ?B/s]

config/vocos.yaml:   0%|          | 0.00/460 [00:00<?, ?B/s]

[+0000 20241202 10:43:57] [INFO] ChatTTS | core | use device cuda:0
[+0000 20241202 10:43:58] [INFO] ChatTTS | core | vocos loaded.
[+0000 20241202 10:43:58] [INFO] ChatTTS | core | dvae loaded.
[+0000 20241202 10:43:59] [INFO] ChatTTS | core | embed loaded.
[+0000 20241202 10:43:59] [INFO] ChatTTS | core | gpt loaded.
[+0000 20241202 10:43:59] [INFO] ChatTTS | core | speaker loaded.
[+0000 20241202 10:43:59] [INFO] ChatTTS | core | decoder loaded.
[+0000 20241202 10:43:59] [INFO] ChatTTS | core | tokenizer loaded.


True

#### 2. Load models from local directories 'asset' and 'config'

In [11]:
chat.load()
# chat.load(source='local') same as above

[+0000 20241202 10:47:29] [INFO] ChatTTS | dl | checking assets...
[+0000 20241202 10:47:33] [INFO] ChatTTS | dl | all assets are already latest.
[+0000 20241202 10:47:33] [INFO] ChatTTS | core | use device cuda:0
[+0000 20241202 10:47:33] [INFO] ChatTTS | core | vocos loaded.
[+0000 20241202 10:47:33] [INFO] ChatTTS | core | dvae loaded.
[+0000 20241202 10:47:33] [INFO] ChatTTS | core | embed loaded.
[+0000 20241202 10:47:34] [INFO] ChatTTS | core | gpt loaded.
[+0000 20241202 10:47:34] [INFO] ChatTTS | core | speaker loaded.
[+0000 20241202 10:47:34] [INFO] ChatTTS | core | decoder loaded.
[+0000 20241202 10:47:34] [INFO] ChatTTS | core | tokenizer loaded.


True

#### 3. Load models from a custom path

In [12]:
# write the model path into custom_path
chat.load(source="custom", custom_path="/content/ChatTTS/ChatTTS")

[+0000 20241202 10:47:34] [INFO] ChatTTS | core | try to load from local: /content/ChatTTS/ChatTTS
[+0000 20241202 10:47:34] [INFO] ChatTTS | dl | checking assets...
[+0000 20241202 10:47:34] [INFO] ChatTTS | dl | /content/ChatTTS/ChatTTS/asset/Decoder.safetensors not exist.
[+0000 20241202 10:47:34] [ERRO] ChatTTS | core | check models in custom path /content/ChatTTS/ChatTTS failed.


False
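The load above returned `False` because a file was missing under the custom path. A quick pre-check can list what is absent before calling `chat.load(source="custom", ...)`. This is a sketch only: `missing_files` is a hypothetical helper, and the relative paths are illustrative names taken from the logs above, not the complete list that ChatTTS validates.

```python
import tempfile
from pathlib import Path


def missing_files(root, relative_paths):
    """Return the entries of relative_paths that are not files under root."""
    root = Path(root)
    return [p for p in relative_paths if not (root / p).is_file()]


# Illustrative subset only; the real required set is defined by ChatTTS.
expected = [
    "asset/Decoder.safetensors",
    "asset/gpt/config.json",
    "config/decoder.yaml",
]

# Demonstrate on a temporary directory that contains just one of the files.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "config").mkdir()
    (Path(tmp) / "config" / "decoder.yaml").write_text("demo: true\n")
    absent = missing_files(tmp, expected)

print(absent)  # ['asset/Decoder.safetensors', 'asset/gpt/config.json']
```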

### You can also unload models to free memory

In [8]:
chat.unload()

## Inference

### Batch infer

In [25]:
texts = [
    "[oral_2][laugh_0][break_6]Hi, I'm Stella, a Psychological First Aid agent created by United We Care.[uv_break][laugh] I'm here to listen and help you with any health or wellness concerns you may have. I'm not a licensed clinician, but I can offer support and connect you with one of our licensed professionals on our platform if you'd like.[uv_break][laugh] Would you like to learn more about our platform or talk about what's on your mind?",
]

wavs = chat.infer(texts)

[+0000 20241202 10:58:11] [WARN] ChatTTS | norm | found invalid characters: {'?', "'"}
text: 100%|██████████| 384/384(max) [00:09, 40.96it/s]
[+0000 20241202 10:58:21] [WARN] ChatTTS | gpt | incomplete result. hit max_new_token: 384
code:  27%|██▋       | 546/2048(max) [00:12, 42.40it/s]
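The `incomplete result. hit max_new_token: 384` warning above indicates that the text stage reached its token limit on this long prompt. One possible workaround is to split the text into shorter sentence groups and pass them as separate list entries to `chat.infer`. The sketch below is an assumption-laden illustration: `split_into_chunks` is a hypothetical helper, and the character budget is only a rough proxy for the token limit.

```python
import re


def split_into_chunks(text, max_chars=120):
    """Group whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept intact as its own chunk.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


sample = (
    "Hi, I'm Stella. I'm here to listen and help you. "
    "Would you like to learn more about our platform or talk about what's on your mind?"
)
chunks = split_into_chunks(sample, max_chars=60)
for chunk in chunks:
    print(len(chunk), chunk)
```

The resulting list could then replace `texts`, giving one waveform per chunk in `wavs`.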


In [26]:
Audio(wavs[0], rate=24_000, autoplay=True)

In [15]:
from IPython.display import Audio

# wavs[0] holds the generated waveform as a NumPy array; it is played back at 24 kHz
audio = Audio(wavs[0], rate=24_000, autoplay=True)
audio


In [24]:
from scipy.io.wavfile import write

# wavs[0] holds the raw audio samples as a NumPy array
audio_data = wavs[0]
sample_rate = 24000  # same rate as passed to Audio above

# Save the audio data to a WAV file
output_path = '/content/stella-test-with-laugh.wav'
write(output_path, sample_rate, audio_data)

print(f"Audio saved as {output_path}")


Audio saved as /content/stella-test-with-laugh.wav
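`scipy.io.wavfile.write` stores floating-point arrays as float WAV files, which some players do not open. If that is a problem, the samples can be converted to 16-bit PCM first. The sketch below uses only the standard library and assumes the samples are a flat sequence of floats in the range [-1, 1] (an assumption about the generated audio, so flatten the array first if it has more than one dimension); `write_pcm16` is a hypothetical helper.

```python
import math
import sys
import tempfile
import wave
from array import array
from pathlib import Path


def write_pcm16(path, samples, sample_rate=24_000):
    """Write a flat sequence of floats in [-1, 1] as a mono 16-bit PCM WAV."""
    pcm = array("h", (int(max(-1.0, min(1.0, float(s))) * 32767) for s in samples))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV data is little-endian
    with wave.open(str(path), "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16 bit
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm.tobytes())
    return len(pcm)


# Demonstration with a half-second 440 Hz tone instead of model output.
tone = [0.5 * math.sin(2 * math.pi * 440 * n / 24_000) for n in range(12_000)]

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "tone.wav"
    written = write_pcm16(target, tone)
    with wave.open(str(target), "rb") as check:
        info = (check.getnchannels(), check.getsampwidth(),
                check.getframerate(), check.getnframes())

print(written, info)
```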


In [None]:
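# Requires at least four entries in `texts`; with the single-entry list above, only wavs[0] exists.
Audio(wavs[3], rate=24_000, autoplay=True)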

### Fix random speaker

In [49]:
#rand_spk = chat.sample_random_speaker()
#print(rand_spk)  # save it for later timbre recovery

rand_spk = "蘁淰敥欀樃穸羂蒋谦脫樟砑彊实呝呧淤眳碶揮妋唊牼蝬砫暩俁妃蘱疥欀碢檖浌腓樯貆垚乎覩赲愻斏熦腬窇佴冫腯泛毁襮篜襼瓥挷裐胗娹徢琶茤懚劶実苔叾岓溥棷愑昒弝蔅愜帯彅礪啑苈祙蘨糘偵圇技篝蠦泯儳藲噿廙芐全悗扥螋巴渧沗嘯墢櫁粟諠昣潗睳唾懔冨茳竢斍擲稖剁婵纻谧葇忇憠癐蘏睂嬌廜兪湩褙臯欼趘璔挆丞樴赧糌莌抵薙趗孛漝犨椑杠蒏楆聐胳烟军趿溱翶壉莯喭汌怽殮啅菴瑈翥攧斺氝坟滂侵方呅俆烍肑瘌撇議敌溋缂熮攚詟绅菴暐牳擆執受庢競噬棃慀秩梾腞牓璘呍務獆嗾淞毛瓩也粙艈杢媋桴賒瘟讬噮櫔包愐洰帷烇畄噢牵歅茩琭很屣岴蘥懕脓姫慀褺璽稙沭拄珈儜佺谺昗怅剣触愚畐煭潒僈滴笉憤椉萓诣稃矆侢惏笀朢嘮糬莻俯焯慭蒎胡嘜室塂蠈嗿禡撕未卼乇挮寧丵屳謞嘑烻翄訿瀬櫢瓫怆瑃茱擂綜牦嵇兗瀴姂嵹砧磂欳岠睱懅仛告绢粫灒檚磸砸娥椥胬孼曳岢懇江嗦槪婓埂嘇杞桿卼枣嗏祚煡揙綹痱狷汰聾耯矴榃處养灦愙潦熩惄氞堂熵匳瘼廉洹圗埨禟皘祡蚕講橅蘖掋緓垐拏姑禮樸谯皿耵歭贀蚻猝噀洏堝嚞蠮誁苒冝摒毳牷啳吭癋蠨瘼菗懺茐俚棰綞姍悍岴烧皌婂枀蚼矄妑瓳炚縚勉伳跟畞慪狊祗朮巈繺嫰晹吒贤苯蒀徣葱然璟僺惟巎璭苉畓訢姇薍嬁眉嶒哏礙搿罡儜痁磸显儫紃碊怏洧凤瀞夓囫二暤兒葪牱緧医埧賣啈诙杨熋灀昨惙穉杹煻筬勛妬篴简磲萮犚萲搠蜾梀癋嗍聎忿英佪倜蝝莛烇榜彨滻帮汉懊澴製弊壤湓殭姙礯蚚皝珓楻峍赴倃蠣翦沍毴誺櫺亣唕升瑭芎賌谠匶綝娎桴櫱窞勾楾譄袕嘀櫛茊圻橔彭觗觰豋耯跛蘞舋嬢奛证廽垏读畢孱瑹萰涉蛎歐怊娵謁厪儑尝芑虧峧覃订藲佞懔攑啱箤溞蓼笗撚男翰怿怪贔賊朦贷悒朁偶誤笍莡洣樻泭浄搾歾槜禝菰獅偶滖牥伊艻婡琻硐冐蕭株羌筽劷櫟舩藰杤籎濏萔前跱厁噶旅虶繚誄塺憭觷叇壡喍擔拷脹蟩沰粖仅嚮葽膢蓦胒凒捼堰攴朻芨禖憮樛勭揉譚舨俭漘媈萞崨绽傅旿趗滎碒貲襔褗獏蚵峌賹筏衵誛徫它埓俷潬瑉栃湣棘份怑之讋殢囖瀡塽縝襫瑦澆螪姓撅畏腜蛣烠熴榏憊縭庻涞蠩誺玛愊誏橪玻岎潁衃呆皦潇勶泊憝濖挈构祖莍汸怈妷姮浊荈浪尋莺荃揪蚤纐濆泗媦櫢舟籵儐団捼衂珻痭於罒徯槺旮藠殿觪欆嗆喞漕畽櫼灩苫椟诊屚语晻斨毿徖巌矓蓗薴詄廚嚆撫搃瓄嬅熶桹屧澒訍倻侤崟蜡擊淩儃胨姎养虎巊葐秀类喕派湗皽恁嵩芸叕殪簀"

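params_infer_code = ChatTTS.Chat.InferCodeParams(
    spk_emb=rand_spk,
    # temperature=0 may be what triggers the CUDA assert in the output below;
    # a small positive value such as 0.3 is worth trying instead.
    temperature=0,
)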

wav = chat.infer(
    "Hi, I'm Stella, a Psychological First Aid agent created by United We Care. I'm here to listen and help you with any health or wellness concerns you may have. I'm not a licensed clinician, but I can offer support and connect you with one of our licensed professionals on our platform if you'd like. Would you like to learn more about our platform or talk about what's on your mind?",
    params_infer_code=params_infer_code,
)

[+0000 20241202 11:27:57] [WARN] ChatTTS | norm | found invalid characters: {'?', "'"}


RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
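The error output does not say which assertion failed. One plausible cause, offered here as an assumption rather than a diagnosis, is the `temperature=0` passed above: temperature sampling divides the logits by the temperature, so zero produces invalid probabilities. The sketch below is a generic illustration of that mechanism in plain Python and is not ChatTTS's actual sampling code; `softmax_with_temperature` is a hypothetical helper.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn logits into sampling probabilities, rejecting temperature <= 0."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0 for sampling")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax_with_temperature([2.0, 1.0, 0.0], temperature=0.3)
print([round(p, 4) for p in probs])

try:
    softmax_with_temperature([2.0, 1.0, 0.0], temperature=0)
except ValueError as exc:
    print("rejected:", exc)
```

Validating parameters on the CPU like this gives a readable error instead of an asynchronous device-side assert. After such an assert the CUDA context is usually unusable, so restarting the runtime before retrying is typically needed.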


In [38]:
Audio(wav[0], rate=24_000, autoplay=True)

In [45]:
from scipy.io.wavfile import write

# wav[0] holds the raw audio samples (NumPy array) from the fixed-speaker run above
audio_data = wav[0]
sample_rate = 24000  # same rate as passed to Audio above

# Save the audio data to a WAV file
output_path = '/content/fix-random-voice-1.wav'
write(output_path, sample_rate, audio_data)

print(f"Audio saved as {output_path}")

Audio saved as /content/fix-random-voice-1.wav


### Zero shot (simulate speaker)

In [20]:
from ChatTTS.tools.audio import load_audio

spk_smp = chat.sample_audio_speaker(load_audio("/content/leann.wav", 24000))
print(spk_smp)  # save it in order to load the speaker without sample audio next time

params_infer_code = ChatTTS.Chat.InferCodeParams(
    spk_smp=spk_smp,
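    # Placeholder: replace with the exact transcript of the sample audio.
    txt_smp="A text transcript that exactly matches the content of sample.mp3.",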
)

wav = chat.infer(
    "Hi, I'm Stella, a Psychological First Aid agent created by United We Care. I'm here to listen and help you with any health or wellness concerns you may have. I'm not a licensed clinician, but I can offer support and connect you with one of our licensed professionals on our platform if you'd like. Would you like to learn more about our platform or talk about what's on your mind?",
    params_infer_code=params_infer_code,
)

[+0000 20241202 07:53:42] [WARN] ChatTTS | norm | found invalid characters: {'?', "'"}


伀传喀嚏伱珐佈乐搩愃赑暊喚卼懊勞尟聃蒫稈剭燳擬灊梏壕柰謢螮菈賕奂訢弑舞巠早狙祊臼瘱覺巛峘渫泬紫禘葅搂畦柋糕肩暬儐紩編癃瘔刀傩呹聞栧灴蠅蒇絹湄葀罛吆僰絈洟罭换审峷服腍滓糒勍嶁沨橸殡憳奨琏祑蕶欒榇贱蓽簓謓眇諸瘪睂膞墛丷盦潷樕碆涓捵尤炙沄嘢穏詈聶招虒偦痒楙諥歙戗仍佘潥珜斂茇晊肆檽僗謧跺璭赺訝搕六瀅瓖螯捌烵笺唞诙衡埶紋覢科棑舩撼箣聤痹嵘缓盩囥滜羞暃惺屔胾帯绥薯孃筬廽嵾葫洮蓒曰曄痵疷氌紌畣蛩艸憺綯訚萓螌建憇脹濬摐柴茡嚂佚赭詫苣犞仜帅愰芐豜厄兒于礗卥槷蜎弧泄疴吷痗烢烛猎絼蛃槥喂嵽噪恝堉昿塶氎簶屨瞖榱庨圬穽箎哆廾諳泉衱攼蔠烇婋戓纱傡虬熁沷芉痴乩曷箲泣峸恴芀亓煶其呄砿徼剙寂蜮喃杩蠜睆莳散乻匹忡虓抱玣译緈灗悑圈腸堓泅懏埛縄薴槒摪揗墺覔止营屝簒湺墉熃熦弱抱纵褃莹巀嚷牯芕瀼夐痨姘泓右瑠炨垉慬盏菜榽潊洧枅蝌慧趱乡剀中焥诋瘝袌殁碓匆蚵桭杬厡悕忎従伎呞碻欢珌摰簪稿洼乜課摑贉弈乳盤氪唰豉漥莸胤绱玧帮嚎紋睢櫻胿蟸诖爆绪匐忂妾殽朕児棼伺谪諕帞奡犴瘃岇蕢媻妮寓儧暗喩菼趷蕰椺嫾櫰習儺櫧薾缝英端瑡亇峅癍瀏硅盖燁畊咟扶缨諴坐谩砅毉滎蝅詥嚆疽歑崮淪璊谜畲爬厾蘩蒓懷裋桬媔盓紘簹嫢禝夽咞墒慱萿謧倴嵮岠屓蚙渾垑律衃慗袴荩砱蒸蕽劄点瓷廒盌弓傲褝犜絨囖偫瓦埤仜汇搃簔惯萿寲蠱椶裿蚶玭浈珱蓇盌憹贆癲翹灿痖焠徇嘊朶兼礂哀甋籱乁欄桙跮绿劑策儛暔懇惤啨琱减殈貛塌檟呈琺紳蜽賓甿蛽兌姒瓋喛眹慟篗瀍瓋垹举拙憋埦囋猽枌妜谍悯嗠挊箠賝停睾审扌傾刕烡管嚻蘊劁嫣罽貭崃溺恅覯佑蓩有瓂品裁喜瘜揺僆竅凊豲勊犎蘡漏蚓崁証栞刹槷襽泥琞埖账罣節夹嶐刋彯垃岌浼裰五罘焀涬糚冈肀挡壕肾劓殈嬒崖蕮稷泍誦蜪胈伺聱橕罽明穴嶏犣兠冰蕢径緑憛禂孁癠沅跭八叻磕苹砑跣壅疐煲稌术丠牁襻份呄脎嗙夂一一㴂


text:  31%|███       | 118/384(max) [00:02, 44.63it/s]
code:  53%|█████▎    | 1077/2048(max) [00:26, 40.98it/s]
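The comments above suggest saving the printed speaker string so that the same voice can be reused later. A minimal way to do that is to write it to a UTF-8 text file. `save_speaker` and `load_speaker` are hypothetical helper names, and the short string below is only the first few characters of the value shown earlier, used for demonstration.

```python
import tempfile
from pathlib import Path


def save_speaker(path, speaker):
    """Store an encoded speaker string as UTF-8 text."""
    Path(path).write_text(speaker, encoding="utf-8")


def load_speaker(path):
    """Read back a speaker string written by save_speaker."""
    return Path(path).read_text(encoding="utf-8")


demo_speaker = "蘁淰敥欀樃穸"  # shortened stand-in for a full speaker string

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "speaker.txt"
    save_speaker(target, demo_speaker)
    restored = load_speaker(target)

print(restored == demo_speaker)  # True
```

Specifying the encoding explicitly matters here because the string consists of non-ASCII characters and the platform default encoding may differ.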


In [21]:
Audio(wav[0], rate=24_000, autoplay=True)

### Two-stage control

In [None]:
text = "So we found being competitive and collaborative was a huge way of staying motivated towards our goals, so one person to call when you fall off, one person who gets you back on then one person to actually do the activity with."
refined_text = chat.infer(text, refine_text_only=True)
refined_text
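Between the two stages it can help to compare the refined text with its plain wording. If the refined text contains bracketed control tokens like those used in the batch example (`[oral_2]`, `[uv_break]`, `[laugh]`), they can be stripped for inspection. This is a sketch: `strip_control_tokens` is a hypothetical helper, and the pattern covers only the token shapes seen in this notebook.

```python
import re

# Matches tokens such as [laugh], [uv_break], [oral_2], [break_6].
CONTROL_TOKEN = re.compile(r"\[[a-z]+(?:_[a-z]+)*(?:_\d+)?\]")


def strip_control_tokens(text):
    """Remove bracketed control tokens and collapse leftover whitespace."""
    without_tokens = CONTROL_TOKEN.sub(" ", text)
    return re.sub(r"\s+", " ", without_tokens).strip()


example = "[oral_2][laugh_0][break_6]Hi, I'm Stella.[uv_break][laugh] I'm here to listen."
plain = strip_control_tokens(example)
print(plain)  # Hi, I'm Stella. I'm here to listen.
```

The stripped text is for reading only; the second stage should still receive the refined text with its tokens intact.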

In [None]:
wav = chat.infer(refined_text, skip_refine_text=True)

In [None]:
Audio(wav[0], rate=24_000, autoplay=True)