# CIS Demo

## Dependencies and Imports

In [None]:
#@title Install dependencies

!pip install -q silero

import torch
from IPython.display import Audio, display

## Load model

Available models:
- v5_cis_base
- v5_cis_base_nostress
- v5_cis_ext

**Important notes:**
- The `v5_cis_base` and `v5_cis_ext` models expect stress to be marked on every word in all languages, with `+` placed before the stressed vowel, e.g. к+ошка;
- The `v5_cis_base_nostress` model expects stress marks ONLY for Slavic languages (i.e. ru, bel, ukr);
- It is recommended to select a speaker that matches the target language. For example, to generate text in Kazakh, select `kaz_zhadyra`. To generate Russian text with the same voice, select `ru_zhadyra`.
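
A missing `+` is easy to overlook, so a quick pre-check before synthesis may help with Slavic text. The sketch below is not part of `silero`: the helper name `words_missing_stress`, the vowel set, and the rule "every word with two or more vowels should carry a `+`" are assumptions inferred from the sample sentences in this notebook.

```python
import re

# Lowercase vowels of Russian, Belarusian and Ukrainian (assumed set).
SLAVIC_VOWELS = set("аеёиоуыэюяіїє")

def words_missing_stress(text):
    """Return words with two or more vowels that contain no '+' mark."""
    missing = []
    # A token is a run of word characters and '+' signs; punctuation is skipped.
    for token in re.findall(r"[\w+]+", text):
        vowel_count = sum(ch in SLAVIC_VOWELS for ch in token.lower())
        if vowel_count >= 2 and "+" not in token:
            missing.append(token)
    return missing

print(words_missing_stress("+Я з р+аннього дит+инства д+уже любл+ю"))
print(words_missing_stress("Я вечарам люблю чытаць"))
```

Single-vowel words are skipped on purpose, since the samples here leave short words such as `пры` unmarked.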

In [None]:
from silero import silero_tts

model_id = 'v5_cis_base_nostress'

device = torch.device('cpu')

model, example_text = silero_tts(language='ru',
                                 speaker=model_id)
model.to(device)  # gpu or cpu

In [None]:
val_texts = {
    'aze': 'Mən hər səhər erkən qalxıb təzə hava ilə məşq edirəm.',
    'bak': 'Мин һәр саңғыраҡ тауҙа түбәнәгендә йәйенә йөҙөп йөрөйм.',
    'bel': '+Я в+ечарам любл+ю чыт+аць цік+авыя кн+ігі пры св+ятле нач+овай л+ямпы.',
    'chv': 'Эпĕ ача чухнех пиччĕшсемпе юнашар кĕтӳльех вăйă вылянă.',
    'erz': 'Монь веленек шачемсёномань панжовксонть кис эрьва кизонь туема.',
    'hye': 'Ես շաբաթ օրերին սիրում եմ երկար զբոսնել անտառով:',
    'kat': 'მე ძალიან მიყვარს ჩემი ოჯახის წევრებთან ერთად დროის გატარება.',
    'kaz': 'Мен балалық шақта жаңа досдармен танысуды әбден ұнататынмын.',
    'kbd': 'Сэ уиӀуанэ уашъхъэри унагъуэхэри сэбэп хъущтыр сыту щӀэлъэӀу.',
    'kir': 'Мен мектепте окуп жүргөндө эң жакшы досум менен тааныштым.',
    'kjh': 'Мин аал чоньчарға пастабахсынар хайдиғырам хынаңның хоный.',
    'mdf': 'Монь тяштеть эзда кизонь карьхть сельметь кштинь аф лац.',
    'sah': 'Мин бүгүн оройунан саһарҕа оонньуу сылдьан сымнаҕыстык утуйбутум.',
    'tat': 'Мин ерак түгел урман эчендә чиста һавада йөргәне яратам.',
    'tgk': 'Ман дар бораи хонаи нави худ дар канори дарё хондем.',
    'udm': 'Мон ашалэ тӥлед нуналлы огы быдэсэ кошко учке.',
    'ukr': '+Я з р+аннього дит+инства д+уже любл+ю сл+ухати цік+аві к+азки.',
    'uzb': "Men bolaligimda ko'pincha do'stlarim bilan hovlida futbol o'ynardim.",
    'xal': 'Би эцкд сарин җилин дуулҗана хойр седклтә күрәм.'
}

### List speakers

In [None]:
sorted(model.speakers)
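
To pick a speaker that matches the target language, the speaker list can be filtered by its language prefix. This is a minimal sketch that assumes speaker names follow the `<lang>_<name>` pattern seen in this notebook (`kaz_zhadyra`, `ru_zhadyra`); the helper `speakers_for_language` and the short demo list are hypothetical.

```python
def speakers_for_language(speakers, lang):
    """Return the speaker names that start with '<lang>_', sorted."""
    prefix = f"{lang}_"
    return sorted(name for name in speakers if name.startswith(prefix))

# Hypothetical subset of model.speakers, using names that appear in this notebook.
demo_speakers = ["ukr_igor", "ru_zhadyra", "kaz_zhadyra", "hye_zara"]
print(speakers_for_language(demo_speakers, "kaz"))
```

With the loaded model, the call would be `speakers_for_language(model.speakers, 'kaz')`.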

## Example

### Slavic (ru/bel/ukr)

To place stress marks automatically, you can use the `silero-stress` library; an example is given in the Accentor Demo section below.

In [None]:
# v5_cis_base_nostress
sample_rate = 48000
speaker = 'ukr_igor'

example_text = '+Я з р+аннього дит+инства д+уже любл+ю сл+ухати цік+аві к+азки.'

audio = model.apply_tts(text=example_text,
                        speaker=speaker,
                        sample_rate=sample_rate)
print(example_text)
display(Audio(audio, rate=sample_rate))

In [None]:
# v5_cis_base_nostress
sample_rate = 48000
speaker = 'ru_zhadyra'

example_text = 'брод+ить с дожд+ём п+од +окнами тво+ими.'

audio = model.apply_tts(text=example_text,
                        speaker=speaker,
                        sample_rate=sample_rate)
print(example_text)
display(Audio(audio, rate=sample_rate))

### Non-Slavic

In [None]:
# v5_cis_base_nostress
sample_rate = 48000
speaker = 'kaz_zhadyra'

example_text = 'Мен балалық шақта жаңа досдармен танысуды әбден ұнататынмын.'

audio = model.apply_tts(text=example_text,
                        speaker=speaker,
                        sample_rate=sample_rate)
print(example_text)
display(Audio(audio, rate=sample_rate))

In [None]:
sample_rate = 48000
speaker = 'hye_zara'

example_text = 'Ես շաբաթ օրերին սիրում եմ երկար զբոսնել անտառով:'

audio = model.apply_tts(text=example_text,
                        speaker=speaker,
                        sample_rate=sample_rate)
print(example_text)
display(Audio(audio, rate=sample_rate))
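
The cells above only play the audio inline. If a file on disk is needed, one option is the standard-library `wave` module. The sketch below assumes the samples are mono floats in the range [-1, 1]; the helper `save_wav` is hypothetical, and a generated tone stands in for the model output.

```python
import math
import os
import struct
import tempfile
import wave

def save_wav(path, samples, sample_rate):
    """Write mono float samples in [-1, 1] as 16-bit little-endian PCM."""
    # Clip to the valid range, then scale to signed 16-bit integers.
    ints = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 2 bytes per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(struct.pack(f"<{len(ints)}h", *ints))

# Stand-in for model output: 0.1 s of a 440 Hz tone at 48 kHz.
tone = [0.5 * math.sin(2 * math.pi * 440 * n / 48000) for n in range(4800)]
wav_path = os.path.join(tempfile.gettempdir(), "silero_demo_tone.wav")
save_wav(wav_path, tone, 48000)
```

Assuming `audio` is a 1-D float tensor, `save_wav('result.wav', audio.tolist(), sample_rate)` would store the synthesized speech.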

# Accentor Demo

If you need word stress but do not want to annotate texts manually, the `silero-stress` project provides supplementary accentuation modules:

- For Russian, a full-fledged accentor and `ё`-ficator trained on a large vocabulary, with homograph disambiguation.

- For Ukrainian, an accentor trained on a large vocabulary.

- For other languages, manually annotated dictionaries with a minimalistic wrapper.

In [None]:
!pip install -q silero-stress

### Russian / Ukrainian

In [None]:
from silero_stress import load_accentor

In [None]:
accentor = load_accentor(lang='ru')  # lang could be "ru" / "ukr"
sample_sent = "В недрах тундры выдры в гетрах тырят в ведра ядра кедров."
print(accentor(sample_sent))
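
The accentor output can be passed straight to the TTS model. The sketch below chains the two steps; the helper `speak_with_auto_stress` is hypothetical and relies only on the two calls already used in this notebook, `accentor(text)` and `model.apply_tts(text=..., speaker=..., sample_rate=...)`.

```python
def speak_with_auto_stress(model, accentor, text, speaker, sample_rate=48000):
    """Add stress marks with `accentor`, then synthesize the result with `model`."""
    stressed_text = accentor(text)
    audio = model.apply_tts(text=stressed_text,
                            speaker=speaker,
                            sample_rate=sample_rate)
    # Return the stressed text as well, so it can be inspected or corrected.
    return stressed_text, audio
```

With the objects defined in this notebook, a call could look like `speak_with_auto_stress(model, accentor, sample_sent, 'ru_zhadyra')`.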

### Other Languages

There is no full accentor for the other languages, but we released stress dictionaries with a minimalistic wrapper.

In [None]:
sample_texts = {
    # for "aze", specify which script you use: Latin (aze_lat) or Cyrillic (aze_cyr)
    'aze_lat': 'Mən hər səhər erkən qalxıb təzə hava ilə məşq edirəm.',
    'aze_cyr': 'Мән һәр сәһәр еркән галхыб тәзә һава ылә мәшг едырәм.',
    'bak': 'Мин һәр саңғыраҡ тауҙа түбәнәгендә йәйенә йөҙөп йөрөйм.',
    'bel': 'Я вечарам люблю чытаць цікавыя кнігі пры святле начовай лямпы.',
    'chv': 'Эпĕ ача чухнех пиччĕшсемпе юнашар кĕтӳльех вăйă вылянă.',
    'erz': 'Монь веленек шачемсёномань панжовксонть кис эрьва кизонь туема.',
    'hye': 'Ես շաբաթ օրերին սիրում եմ երկար զբոսնել անտառով:',
    'kat': 'მე ძალიან მიყვარს ჩემი ოჯახის წევრებთან ერთად დროის გატარება.',
    'kaz': 'Мен балалық шақта жаңа досдармен танысуды әбден ұнататынмын.',
    'kbd': 'Сэ уиӀуанэ уашъхъэри унагъуэхэри сэбэп хъущтыр сыту щӀэлъэӀу.',
    'kir': 'Мен мектепте окуп жүргөндө эң жакшы досум менен тааныштым.',
    'kjh': 'Мин аал чоньчарға пастабахсынар хайдиғырам хынаңның хоный.',
    'mdf': 'Монь тяштеть эзда кизонь карьхть сельметь кштинь аф лац.',
    'sah': 'Мин бүгүн оройунан саһарҕа оонньуу сылдьан сымнаҕыстык утуйбутум.',
    'tat': 'Мин ерак түгел урман эчендә чиста һавада йөргәне яратам.',
    'tgk': 'Ман дар бораи хонаи нави худ дар канори дарё хондем.',
    'udm': 'Мон ашалэ тӥлед нуналлы огы быдэсэ кошко учке.',
    # for "uzb", specify which script you use: Latin (uzb_lat) or Cyrillic (uzb_cyr)
    'uzb_lat': "Men bolaligimda ko'pincha do'stlarim bilan hovlida futbol o'ynardim.",
    'uzb_cyr': "Мен болалигимда кўпинча дўстларим билан ҳовлида футбол ўйнардим.",
    'xal': 'Би эцкд сарин җилин дуулҗана хойр седклтә күрәм.'
}

In [None]:
from silero_stress.simple_accentor import SimpleAccentor

In [None]:
for lang in sample_texts:
    accentor = SimpleAccentor(lang=lang)
    print(sample_texts[lang])
    print(accentor(sample_texts[lang]))
    print()
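
The keys in `sample_texts` carry a script suffix for Azerbaijani and Uzbek (`aze_lat`, `uzb_cyr`), whereas `val_texts` in the TTS part uses plain `aze` and `uzb`. A small mapping may help when both dictionaries are used together; the helper name `tts_language_key` is hypothetical.

```python
def tts_language_key(accentor_key):
    """Map 'aze_lat' / 'uzb_cyr' style keys to 'aze' / 'uzb'; leave other keys as-is."""
    for suffix in ("_lat", "_cyr"):
        if accentor_key.endswith(suffix):
            return accentor_key[: -len(suffix)]
    return accentor_key

print(tts_language_key("uzb_cyr"))
```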

# Ditto-Talking-Head

In [1]:
!git clone https://github.com/antgroup/ditto-talkinghead

Cloning into 'ditto-talkinghead'...
remote: Enumerating objects: 226, done.
remote: Counting objects: 100% (163/163), done.
remote: Compressing objects: 100% (130/130), done.
remote: Total 226 (delta 34), reused 129 (delta 23), pack-reused 63 (from 2)
Receiving objects: 100% (226/226), 6.86 MiB | 46.24 MiB/s, done.
Resolving deltas: 100% (37/37), done.


In [4]:
!ls ditto-talkinghead/

core		  inference.py	scripts
environment.yaml  LICENSE	stream_pipeline_offline.py
example		  README.md	stream_pipeline_online.py


In [None]:
from google.colab import drive
drive.mount('/content/drive')

In [1]:
import numpy
!pip install \
    librosa \
    tqdm \
    filetype \
    imageio \
    opencv_python_headless \
    scikit-image \
    cython \
    cuda-python \
    imageio-ffmpeg \
    colored \
    polygraphy \
    numpy==2.0.1



In [2]:
!pip install ffmpeg

Collecting ffmpeg
  Downloading ffmpeg-1.4.tar.gz (5.1 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: ffmpeg
  Building wheel for ffmpeg (setup.py) ... done
  Created wheel for ffmpeg: filename=ffmpeg-1.4-py3-none-any.whl size=6083 sha256=585cb723ef4135c31a146dd3f5f4af3e5905e1d7c6e02825d48ccd464bbc44eb
  Stored in directory: /root/.cache/pip/wheels/26/21/0c/c26e09dff860a9071683e279445262346e008a9a1d2142c4ad
Successfully built ffmpeg
Installing collected packages: ffmpeg
Successfully installed ffmpeg-1.4


In [3]:
!git lfs install
!git clone https://huggingface.co/digital-avatar/ditto-talkinghead checkpoints

Git LFS initialized.
Cloning into 'checkpoints'...
remote: Enumerating objects: 84, done.
remote: Counting objects: 100% (80/80), done.
remote: Compressing objects: 100% (79/79), done.
remote: Total 84 (delta 17), reused 0 (delta 0), pack-reused 4 (from 1)
Unpacking objects: 100% (84/84), 23.07 KiB | 1.36 MiB/s, done.
Filtering content: 100% (40/40), 6.45 GiB | 52.41 MiB/s, done.


In [5]:
!python ditto-talkinghead/inference.py \
    --data_root "./checkpoints/ditto_trt_Ampere_Plus" \
    --cfg_pkl "./checkpoints/ditto_cfg/v0.4_hubert_cfg_trt.pkl" \
    --audio_path "./ditto-talkinghead/example/audio.wav" \
    --source_path "./ditto-talkinghead/example/image.png" \
    --output_path "./tmp/result.mp4"

In file included from /usr/local/lib/python3.12/dist-packages/numpy/_core/include/numpy/ndarraytypes.h:1909,
                 from /usr/local/lib/python3.12/dist-packages/numpy/_core/include/numpy/ndarrayobject.h:12,
                 from /usr/local/lib/python3.12/dist-packages/numpy/_core/include/numpy/arrayobject.h:5,
                 from /root/.pyxbld/temp.linux-x86_64-cpython-312/content/ditto-talkinghead/core/utils/blend/blend.c:1259:
      |  ^~~~~~~
Traceback (most recent call last):
  File "/content/ditto-talkinghead/core/utils/tensorrt_utils.py", line 12, in <module>
    import tensorrt as trt
ModuleNotFoundError: No module named 'tensorrt'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/content/ditto-talkinghead/inference.py", line 80, in <module>
    SDK = StreamSDK(cfg_pkl, data_root)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/con
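
The traceback shows that the TensorRT checkpoints cannot be used while `tensorrt` is not importable. One way to avoid the crash is to check for the module first and fall back to the PyTorch checkpoints that this notebook uses later. This is only a sketch: the helper `pick_backend` is hypothetical, and the paths are the ones passed to `inference.py` in this notebook.

```python
import importlib.util

# Checkpoint directory and config file per backend, as used in this notebook.
CHECKPOINTS = {
    "trt": ("./checkpoints/ditto_trt_Ampere_Plus",
            "./checkpoints/ditto_cfg/v0.4_hubert_cfg_trt.pkl"),
    "pytorch": ("./checkpoints/ditto_pytorch",
                "./checkpoints/ditto_cfg/v0.4_hubert_cfg_pytorch.pkl"),
}

def pick_backend(tensorrt_available=None):
    """Return (backend_name, data_root, cfg_pkl), preferring TensorRT when importable."""
    if tensorrt_available is None:
        # find_spec returns None when the top-level module is not installed.
        tensorrt_available = importlib.util.find_spec("tensorrt") is not None
    name = "trt" if tensorrt_available else "pytorch"
    data_root, cfg_pkl = CHECKPOINTS[name]
    return name, data_root, cfg_pkl

print(pick_backend())
```

The returned `data_root` and `cfg_pkl` can then be substituted into the `--data_root` and `--cfg_pkl` arguments.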

In [6]:
!pip install nvidia-pyindex

Collecting nvidia-pyindex
  Downloading nvidia-pyindex-1.0.9.tar.gz (10 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: nvidia-pyindex
  Building wheel for nvidia-pyindex (setup.py) ... done
  Created wheel for nvidia-pyindex: filename=nvidia_pyindex-1.0.9-py3-none-any.whl size=8419 sha256=71851bfa8cd230894b379e65a22938b2b9236571cb328c75dac562f3207ae35c
  Stored in directory: /root/.cache/pip/wheels/eb/2d/7f/d86cb060a9c51fb933aa4fe0d2f73ffe8df2bd0b58d3d2bba4
Successfully built nvidia-pyindex
Installing collected packages: nvidia-pyindex
Successfully installed nvidia-pyindex-1.0.9


In [9]:
!pip install nvidia-tensorrt

Collecting nvidia-tensorrt
  Downloading nvidia_tensorrt-99.0.0-py3-none-manylinux_2_17_x86_64.whl.metadata (596 bytes)
Collecting tensorrt (from nvidia-tensorrt)
  Downloading tensorrt-10.14.1.48.post1.tar.gz (16 kB)
  Preparing metadata (setup.py) ... done
Collecting tensorrt_cu13==10.14.1.48.post1 (from tensorrt->nvidia-tensorrt)
  Downloading tensorrt_cu13-10.14.1.48.post1.tar.gz (18 kB)
  Preparing metadata (setup.py) ... done
Collecting tensorrt_cu13_libs==10.14.1.48.post1 (from tensorrt_cu13==10.14.1.48.post1->tensorrt->nvidia-tensorrt)
  Downloading tensorrt_cu13_libs-10.14.1.48.post1.tar.gz (726 bytes)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting tensorrt_cu13_bindings==10.14.1.48.post1 (from tensorrt_cu13==10.14.1.48.post1->tensorrt->nvidia-tensorrt)
  Downloading tensorrt_cu13_bindings-10.14.1.48.post1-cp312-none-man

In [10]:
# Uninstall the conflicting CUDA 13 components
!pip uninstall -y nvidia-tensorrt tensorrt tensorrt_cu13 tensorrt_cu13_bindings tensorrt_cu13_libs cuda-toolkit nvidia-cuda-runtime

# Clear pip cache to ensure fresh downloads if needed
!pip cache purge

Found existing installation: nvidia-tensorrt 99.0.0
Uninstalling nvidia-tensorrt-99.0.0:
  Successfully uninstalled nvidia-tensorrt-99.0.0
Found existing installation: tensorrt 10.14.1.48.post1
Uninstalling tensorrt-10.14.1.48.post1:
  Successfully uninstalled tensorrt-10.14.1.48.post1
Found existing installation: tensorrt_cu13 10.14.1.48.post1
Uninstalling tensorrt_cu13-10.14.1.48.post1:
  Successfully uninstalled tensorrt_cu13-10.14.1.48.post1
Found existing installation: tensorrt_cu13_bindings 10.14.1.48.post1
Uninstalling tensorrt_cu13_bindings-10.14.1.48.post1:
  Successfully uninstalled tensorrt_cu13_bindings-10.14.1.48.post1
Found existing installation: tensorrt_cu13_libs 10.14.1.48.post1
Uninstalling tensorrt_cu13_libs-10.14.1.48.post1:
  Successfully uninstalled tensorrt_cu13_libs-10.14.1.48.post1
Found existing installation: cuda-toolkit 13.0.1
Uninstalling cuda-toolkit-13.0.1:
  Successfully uninstalled cuda-toolkit-13.0.1
Found existing installation: nvidia-cuda-runtime 13.

In [11]:
# Goal: get a TensorRT build that works with CUDA 12 instead of the CUDA 13 variant pulled in above.
# Pinning an older release does not seem possible: an earlier attempt at 'nvidia-tensorrt==8.6.1'
# failed with "(from versions: 0.0.1.dev4, 0.0.1.dev5, 99.0.0)", so 8.6.1 is apparently not
# published under that package name.
# Requesting a CUDA 12 specific variant of 'nvidia-tensorrt' is often not directly possible via pip,
# and a 'tensorrt' package built for cu12 is not always readily available on PyPI.
# Attempt: install cuda-toolkit 12 first, then nvidia-tensorrt, hoping it respects the existing CUDA.
!pip install cuda-toolkit==12.2.2
!pip install nvidia-tensorrt

Collecting cuda-toolkit==12.2.2
  Downloading cuda_toolkit-12.2.2-py2.py3-none-any.whl.metadata (8.1 kB)
Downloading cuda_toolkit-12.2.2-py2.py3-none-any.whl (2.2 kB)
Installing collected packages: cuda-toolkit
Successfully installed cuda-toolkit-12.2.2
Collecting nvidia-tensorrt
  Downloading nvidia_tensorrt-99.0.0-py3-none-manylinux_2_17_x86_64.whl.metadata (596 bytes)
Collecting tensorrt (from nvidia-tensorrt)
  Downloading tensorrt-10.14.1.48.post1.tar.gz (16 kB)
  Preparing metadata (setup.py) ... done
Collecting tensorrt_cu13==10.14.1.48.post1 (from tensorrt->nvidia-tensorrt)
  Downloading tensorrt_cu13-10.14.1.48.post1.tar.gz (18 kB)
  Preparing metadata (setup.py) ... done
Collecting tensorrt_cu13_libs==10.14.1.48.post1 (from tensorrt_cu13==10.14.1.48.post1->tensorrt->nvidia-tensorrt)
  Downloading tensorrt_cu13_libs-10.14.1.48.post1.tar.gz (726 bytes)
  Installing build dependencies ... done
  Getting requirements to build wheel ...

In [4]:
!python ditto-talkinghead/inference.py \
  --data_root "./checkpoints/ditto_pytorch" \
  --cfg_pkl "./checkpoints/ditto_cfg/v0.4_hubert_cfg_pytorch.pkl" \
  --audio_path "./ditto-talkinghead/example/audio.wav" \
  --source_path "./ditto-talkinghead/example/image.png" \
  --output_path "./tmp/result.mp4"

2025-11-23 14:27:59.382491: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:467] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1763908079.424659   18777 cuda_dnn.cc:8579] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1763908079.438417   18777 cuda_blas.cc:1407] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
W0000 00:00:1763908079.478436   18777 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1763908079.478471   18777 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1763908079.478479   18777 computation_placer.cc:177] computation placer alr

In [None]:
import os
# Assuming /usr/local/cuda/lib64 is where libcudnn.so.8 is located in Colab
if 'LD_LIBRARY_PATH' in os.environ:
    os.environ['LD_LIBRARY_PATH'] += ':/usr/local/cuda/lib64'
else:
    os.environ['LD_LIBRARY_PATH'] = '/usr/local/cuda/lib64'
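
Re-running the cell above appends the same directory again each time. A small helper can keep the variable free of duplicates; the sketch below works on any dict-like environment, the name `append_search_path` is hypothetical, and the starting value `/usr/lib64-nvidia` in the demo is made up for illustration.

```python
def append_search_path(env, name, new_dir):
    """Append new_dir to the ':'-separated variable `name` unless it is already listed."""
    parts = [p for p in env.get(name, "").split(":") if p]
    if new_dir not in parts:
        parts.append(new_dir)
    env[name] = ":".join(parts)
    return env[name]

# Demo on a plain dict; pass os.environ to change the real environment.
demo_env = {"LD_LIBRARY_PATH": "/usr/lib64-nvidia"}
print(append_search_path(demo_env, "LD_LIBRARY_PATH", "/usr/local/cuda/lib64"))
```

Note that the dynamic loader reads `LD_LIBRARY_PATH` when a process starts, so a change made this way should affect commands launched afterwards (such as `!python ...`) rather than libraries already loaded in the running kernel.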


In [17]:
!pip install mediapipe

Collecting mediapipe
  Downloading mediapipe-0.10.21-cp312-cp312-manylinux_2_28_x86_64.whl.metadata (9.7 kB)
Collecting numpy<2 (from mediapipe)
  Downloading numpy-1.26.4-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (61 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 61.0/61.0 kB 3.3 MB/s eta 0:00:00
Collecting protobuf<5,>=4.25.3 (from mediapipe)
  Downloading protobuf-4.25.8-cp37-abi3-manylinux2014_x86_64.whl.metadata (541 bytes)
Collecting sounddevice>=0.4.4 (from mediapipe)
  Downloading sounddevice-0.5.3-py3-none-any.whl.metadata (1.6 kB)
INFO: pip is looking at multiple versions of jax to determine which version is compatible with other requirements. This could take a while.
Collecting jax (from mediapipe)
  Downloading jax-0.8.1-py3-none-any.whl.metadata (13 kB)
Collecting jaxlib (from mediapipe)
  Downloading jaxlib-0.8.1-cp312-cp312-manylinux_2_27_x86_64.whl.metadata (1.3 kB)
Collecting jax (from mediapipe)
  Do

In [2]:
!pwd

/content
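
Relative paths passed to `inference.py` are resolved against this working directory, so it can be worth checking that the input files exist before starting a long run. The sketch below uses `pathlib`; the helper `missing_files` is hypothetical.

```python
from pathlib import Path

def missing_files(paths, base_dir="."):
    """Return the paths (joined onto base_dir) that do not exist."""
    base = Path(base_dir)
    # Joining an absolute path onto `base` yields the absolute path unchanged.
    return [str(base / p) for p in paths if not (base / p).exists()]

print(missing_files(["ditto-talkinghead/example/audio.wav",
                     "ditto-talkinghead/example/image.png"]))
```

An empty list means every input was found; anything else lists the paths to fix.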
