xtalk.py-troubleshooting

Troubleshooting problems and installing speech-to-rag xtalk.py

This guide is for people who can't get it to work and are "noobs" (who don't know anything about Python).
More advanced users will find other ways, but this is (I think) the easiest.

Before starting:

1°/ Download and install Miniconda. I had too many problems with Windows venv, so I use Miniconda to manage environments; you should do the same. https://docs.anaconda.com/free/miniconda/index.html

2°/ Install CUDA 11.8 (way too many problems with other versions). This is the Windows 10 link; if you have another system, download the version for your system. https://developer.nvidia.com/cuda-11-8-0-download-archive?target_os=Windows&target_arch=x86_64&target_version=10&target_type=exe_local

3°/ I got it running without problems on Python 3.10. It should work fine with Python 3.11 (but I haven't tested it). DON'T USE PYTHON 3.12.

4°/ Install Git and Git LFS if you don't already have them: https://git-scm.com/downloads & https://git-lfs.com/
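
If you want to double-check the prerequisites above before going further, here is a small sketch (not part of the repo). The function names are made up for this example, and it assumes git, git-lfs and conda are reachable on your PATH; on Windows, conda may only be visible from the Anaconda/Miniconda prompt, so a "missing" conda is not always a real problem.

```python
import shutil
import sys

# Python versions this guide reports as working (3.10) or probably working (3.11).
SUPPORTED_MINORS = {(3, 10), (3, 11)}


def python_is_supported(version=None):
    """Return True if the (major, minor) version is 3.10 or 3.11."""
    if version is None:
        version = sys.version_info
    return (version[0], version[1]) in SUPPORTED_MINORS


def missing_tools(tools=("git", "git-lfs", "conda")):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    print("Python version OK:", python_is_supported())
    print("Missing tools:", missing_tools() or "none")
```

Run it with the Python you plan to use. It does not check CUDA; that is easier to verify once PyTorch is installed.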

Go to the folder where you want to download the git repo. Never use spaces in folder names; they cause path problems. Use "-" or "_" instead of a space (e.g. c:/folder/folder name here/ => c:/folder/folder_name_here/).
In the address bar at the top of the File Explorer window, click on the path text, type "cmd" and press Enter. That will open a command prompt directly in this folder. Then run:
git clone https://github.com/All-About-AI-YouTube/speech-to-rag.git
cd speech-to-rag
git clone https://huggingface.co/coqui/XTTS-v2
On Windows, right-click your "speech-to-rag" folder and choose "Open with Code". If you don't have "Open with Code", just open VS Code and drag and drop the speech-to-rag folder into it.
In VS Code, open a new terminal (top menu, Terminal > New Terminal) and type:
conda create -n speechtorag python=3.10

You can replace "speechtorag" with whatever name you want for your env. DON'T USE SPACES.
python=3.10 sets the Python version of your environment. I highly recommend 3.10 because I know it works with this version.
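
The "no spaces" rule applies to both the folder name and the env name. As a quick illustration, here is a minimal sketch of a helper that spots spaces and swaps them for "_" or "-". The function names are hypothetical; nothing in the repo provides them.

```python
import re


def has_spaces(name):
    """Return True if the folder or environment name contains whitespace."""
    return re.search(r"\s", name) is not None


def to_safe_name(name, replacement="_"):
    """Replace each run of whitespace with `replacement` ("_" or "-")."""
    return re.sub(r"\s+", replacement, name.strip())


if __name__ == "__main__":
    # Same example as in the text above.
    print(to_safe_name("c:/folder/folder name here/"))  # c:/folder/folder_name_here/
```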

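Activate the environment (use your own env name if you changed it), then install the dependencies:
conda activate speechtorag
pip install -r requirements.txt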
pip install torch==2.1.2+cu118 torchvision==0.16.2+cu118 torchaudio==2.1.2+cu118 --index-url https://download.pytorch.org/whl/cu118
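
Once the installs finish, you may want to confirm that PyTorch really picked up the CUDA 11.8 build. Here is a minimal sketch, assuming you run it inside the activated environment; the function name is made up for this example. With the install above, the version should end in "+cu118" and cuda_build should be "11.8". If cuda_available is False, the usual suspects are the GPU driver or a CPU-only torch build.

```python
import importlib.util


def torch_cuda_report():
    """Summarize the installed PyTorch build, or report that it is missing."""
    if importlib.util.find_spec("torch") is None:
        return {"installed": False}
    import torch  # imported lazily so the check also runs without PyTorch

    return {
        "installed": True,
        "torch_version": torch.__version__,  # e.g. "2.1.2+cu118"
        "cuda_build": torch.version.cuda,  # None on CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    for key, value in torch_cuda_report().items():
        print(f"{key}: {value}")
```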

The xtalk.py script has some problems; the easiest way to fix them is to replace xtalk.py with the one from this repo.

I fixed the relative paths, reworked 2 variables to make them easier to use, added some comments to help you understand the code, and deleted some unnecessary imports.

In VS Code, open the xtalk_fix.py file.
On line 21, set the faster_whisper model (the default is "medium", which should work fine for 90% of people).
Line 25 sets the language of the TTS.
On line 26, you can change the speed of the TTS (1.2 is normal speaking speed).
Line 27 sets the reference voice (the voice of your TTS).

You can change it to whatever voice you want, but the file needs to be a 24000 Hz or 44100 Hz .wav file for best results. (You can use Audacity to convert your file easily.)
Once you have the .wav file, just drop it in the folder speech-to-rag/XTTS-v2/samples
Change the name in the variable (the default is audio_file_pth2 = "./XTTS-v2/samples/en_sample.wav"; if your file is named caroline.wav, the variable should look like this: audio_file_pth2 = "./XTTS-v2/samples/caroline.wav").
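
If you are not sure whether your voice file has one of the recommended sample rates, here is a minimal sketch using Python's built-in wave module. The function names are made up for this example, and the wave module only reads plain PCM .wav files, so an error here may just mean the file uses another encoding (re-export it from Audacity in that case).

```python
import sys
import wave

# Sample rates this guide recommends for the reference voice.
RECOMMENDED_RATES = (24000, 44100)


def wav_sample_rate(path):
    """Return the sample rate (Hz) of a PCM .wav file."""
    with wave.open(str(path), "rb") as wav_file:
        return wav_file.getframerate()


def is_recommended_voice_file(path):
    """Return True if the file's sample rate is 24000 Hz or 44100 Hz."""
    return wav_sample_rate(path) in RECOMMENDED_RATES


if __name__ == "__main__":
    # Pass the file to check as the first command-line argument.
    if len(sys.argv) > 1:
        rate = wav_sample_rate(sys.argv[1])
        print(f"{sys.argv[1]}: {rate} Hz, recommended: {rate in RECOMMENDED_RATES}")
```

For example, save it under any name you like and pass ./XTTS-v2/samples/caroline.wav as the argument from the speech-to-rag folder.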

Edit: If you get this error: RuntimeError: Library cublas64_12.dll is not found or cannot be loaded
