Troubleshooting and installing speech-to-rag (xtalk.py)
- This is for people who don't know how to get it working and are "noobs" (who don't know anything about Python).
- More advanced users will find other ways, but this is (I think) the easiest.
Before starting:
1°/ Download and install Miniconda. I had too many problems with Windows venv, so I use Miniconda to manage environments; you should do the same. https://docs.anaconda.com/free/miniconda/index.html
2°/ You need to install ffmpeg. https://phoenixnap.com/kb/ffmpeg-windows
3°/ You need to install CUDA 11.8 (I had far too many problems with other versions). This link is for Windows 10; if you have another system, download the version for your system. https://developer.nvidia.com/cuda-11-8-0-download-archive?target_os=Windows&target_arch=x86_64&target_version=10&target_type=exe_local
4°/ I got it running without problems on Python 3.10. It should work fine with Python 3.11 (not tested). DON'T USE PYTHON 3.12.
5°/ Install Git and Git LFS if you don't already have them: https://git-scm.com/downloads & https://git-lfs.com/
6°/ Please use VS Code. https://code.visualstudio.com/
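If you want to check that the tools from the list above are really installed, here is a small sketch that looks for them on your PATH. The command names (ffmpeg, git, git-lfs, nvcc) are my assumption about what each installer provides, and conda is left out on purpose because it is often only available from the Anaconda Prompt. Save it under any name you like (check_tools.py is just an example) and run it with python.

```python
import shutil

# Assumed command names for the tools listed above (nvcc comes with the CUDA toolkit).
REQUIRED_TOOLS = ["ffmpeg", "git", "git-lfs", "nvcc"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        for tool in missing:
            print("Not found on PATH:", tool)
    else:
        print("All tools found.")
```

If a tool is reported as missing right after installing it, close and reopen your terminal first, since PATH changes only apply to new terminals.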
- Go to the folder where you want to download the git repo. Never use spaces in folder names; they cause path problems. Use "-" or "_" instead of spaces (e.g. c:/folder/folder name here/ => c:/folder/folder_name_here/).
- In the address bar at the top of the File Explorer window, click on the path text, type "cmd" and press Enter. That opens a command prompt directly in this folder.
- git clone https://github.com/All-About-AI-YouTube/speech-to-rag.git
- cd speech-to-rag
- git clone https://huggingface.co/coqui/XTTS-v2
- On Windows, right-click your "speech-to-rag" folder and choose "Open with Code".
If you don't have "Open with Code", just open VS Code and drag and drop the speech-to-rag folder into it.
- In VS Code, open a new terminal (top menu, Terminal > New Terminal) and type: conda create -n speechtorag python=3.10
- You can replace "speechtorag" with whatever name you want for your environment. DON'T USE SPACES.
- python=3.10 sets the Python version of your environment. I highly recommend 3.10 because I know it works with this version.
- conda activate speechtorag
- If you changed the environment name in conda create, change it here too.
- pip install -r requirements.txt
- pip install torch==2.1.2+cu118 torchvision==0.16.2+cu118 torchaudio==2.1.2+cu118 --index-url https://download.pytorch.org/whl/cu118
- That will replace your torch with a CUDA 11.8 compatible version.
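Once the install steps above are done, you can check the result with a short script run inside the activated environment. This is only a sketch: the function names are mine, and it uses nothing from torch except torch.__version__, torch.version.cuda and torch.cuda.is_available().

```python
import os
import sys


def python_version_ok(version_info=sys.version_info):
    """True for Python 3.10 (tested) or 3.11 (should work); 3.12 is rejected."""
    return tuple(version_info[:2]) in {(3, 10), (3, 11)}


def path_has_space(path):
    """True if the path contains a space, which causes path problems."""
    return " " in str(path)


def describe_torch():
    """Describe the installed torch build, or say that torch is missing."""
    try:
        import torch
    except ImportError:
        return "torch is not installed in this environment"
    return (
        f"torch {torch.__version__}, CUDA build: {torch.version.cuda}, "
        f"GPU available: {torch.cuda.is_available()}"
    )


if __name__ == "__main__":
    print("Python version OK:", python_version_ok())
    print("Space in current folder path:", path_has_space(os.getcwd()))
    print(describe_torch())
```

With the pip command above, I would expect the torch line to mention 2.1.2+cu118 and a CUDA build of 11.8. If the CUDA build shows None, you most likely still have the CPU-only torch, so run the pip install torch line again.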
The xtalk.py script has some problems; the easiest way to fix them is to replace xtalk.py with the one from this repo.
- I fixed the relative paths, reworked 2 variables to make them easier to use, added some comments to help you understand the code, and deleted some unnecessary imports.
- In VS Code, open the xtalk_fix.py file.
- On line 21, set the faster_whisper model (the default, "medium", should work fine for 90% of people).
- On line 25, set the language of the TTS.
- On line 26, you can change the speed of the TTS (1.2 is normal speaking speed).
- On line 27, set the reference voice (the voice of your TTS).
- You can change it to whatever voice you want, but the file needs to be a 24000 Hz or 44100 Hz .wav file for best results (you can easily convert your file with Audacity).
- Once you have the .wav file, just drop it in the folder speech-to-rag/XTTS-v2/samples
- Change the file name in the variable (default is audio_file_pth2 = "./XTTS-v2/samples/en_sample.wav"; if your file is named caroline.wav, the variable should look like this: audio_file_pth2 = "./XTTS-v2/samples/caroline.wav").
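To check the sample rate of your reference voice before using it, here is a minimal sketch based on Python's built-in wave module. The function names are mine, and the wave module only reads plain PCM .wav files, so a file it cannot open may still be a valid .wav in another encoding (re-export it from Audacity in that case).

```python
import wave
from pathlib import Path

# Sample rates recommended above for the reference voice.
RECOMMENDED_RATES = (24000, 44100)


def wav_sample_rate(path):
    """Return the sample rate (in Hz) of a PCM .wav file."""
    with wave.open(str(path), "rb") as wav_file:
        return wav_file.getframerate()


def is_recommended_rate(rate):
    """True if the rate is one of the recommended values."""
    return rate in RECOMMENDED_RATES


if __name__ == "__main__":
    # Same path as the default audio_file_pth2; change it to your own file.
    sample = Path("./XTTS-v2/samples/en_sample.wav")
    if sample.exists():
        rate = wav_sample_rate(sample)
        print(f"{sample}: {rate} Hz, recommended: {is_recommended_rate(rate)}")
    else:
        print("File not found:", sample)
```

Run it from the speech-to-rag folder so the relative path points to the right place.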
Edit: If you get this error:
RuntimeError: Library cublas64_12.dll is not found or cannot be loaded
- Go to C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\bin
- Find cublas64_11.dll
- Make a copy and rename the copy to cublas64_12.dll
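If you prefer not to do the copy by hand, the same three steps can be sketched in Python. The function name is mine, and the folder is the default CUDA 11.8 location from the steps above, so change it if you installed CUDA somewhere else. Writing into Program Files usually needs a terminal started as administrator.

```python
import shutil
from pathlib import Path

# Default CUDA 11.8 bin folder, as given in the steps above.
CUDA_BIN = Path(r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\bin")


def copy_cublas(bin_dir=CUDA_BIN):
    """Copy cublas64_11.dll to cublas64_12.dll in bin_dir and return the new path."""
    source = Path(bin_dir) / "cublas64_11.dll"
    target = Path(bin_dir) / "cublas64_12.dll"
    if not source.exists():
        raise FileNotFoundError(source)
    if not target.exists():
        # The original file is left untouched; only a copy is created.
        shutil.copy2(source, target)
    return target


if __name__ == "__main__":
    try:
        print("Copy available at:", copy_cublas())
    except FileNotFoundError as error:
        print("cublas64_11.dll not found:", error)
    except PermissionError:
        print("Permission denied: run the terminal as administrator and try again.")
```

If cublas64_12.dll already exists, the script leaves it alone and does not overwrite it.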



