# 🗣️ Speech-AI-Forge Colab

👋 This notebook is built on [Speech-AI-Forge](https://github.com/lenML/Speech-AI-Forge). If this project helps you, please consider giving it a star on GitHub! Pull requests and issues are also welcome.

## Usage Guide

1. Select **Runtime** from the menu.
2. Click **Run All**.

Once the WebUI has started, look for the following line in the log:

```
Running on public URL: https://**.gradio.live
```

Open this link in your browser to access the WebUI.
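
If the log is long, the URL can also be picked out programmatically. The following is a minimal sketch, assuming the log has been captured as text; the helper name `find_public_url` and the sample log are illustrative and not part of Speech-AI-Forge.

```python
import re

# Matches the "Running on public URL" line shown above.
PUBLIC_URL_PATTERN = re.compile(
    r"Running on public URL:\s*(https://\S+\.gradio\.live)"
)


def find_public_url(log_text):
    """Return the first public Gradio URL found in log_text, or None."""
    match = PUBLIC_URL_PATTERN.search(log_text)
    return match.group(1) if match else None


# Hypothetical log excerpt, used only to demonstrate the helper.
sample_log = (
    "Running on local URL: http://127.0.0.1:7860\n"
    "Running on public URL: https://abc123.gradio.live\n"
)
print(find_public_url(sample_log))
```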

> Note: If prompted to restart during package installation, please select "No."

## Environment

In [None]:
%%capture
# Suppress Colab's "restart runtime" prompt: unload the already-imported PIL and
# google modules so that reinstalling them below does not trigger it.
import sys

for name in list(sys.modules):
    if "PIL" in name or "google" in name:
        sys.modules.pop(name)

# 1. Clone the repository
!git clone https://github.com/lenML/Speech-AI-Forge

# 2. Change directory to the repository
%cd Speech-AI-Forge

# 3. Install ffmpeg and rubberband-cli
!apt-get update -y
!apt-get install -y ffmpeg rubberband-cli

# 4. Install dependencies
!pip install -r requirements.txt
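
Because the cell above runs under `%%capture`, installation errors are not displayed. As a hedged sanity check, the sketch below reports which command-line tools are missing from `PATH`. It assumes the `rubberband-cli` package provides an executable named `rubberband`; the helper name `missing_tools` is illustrative.

```python
import shutil


def missing_tools(tools=("ffmpeg", "rubberband")):
    """Return the names in `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


missing = missing_tools()
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("All required tools are available.")
```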


## Download Models

In [None]:
# @markdown ## Model download instructions
# @markdown Most models are close to 2 GB in size. Please ensure that you have enough storage space and network bandwidth. <br/>
# @markdown **Note**: At least one TTS model must be selected. If no model is selected, ChatTTS will be downloaded by default.

# @markdown ### TTS Models
# @markdown ChatTTS: [GitHub](https://github.com/2noise/ChatTTS) - A generative speech model for daily dialogue.
download_chattts = True  # @param {"type":"boolean"}
# @markdown F5-TTS: [GitHub](https://github.com/SWivid/F5-TTS) - A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching
download_f5_tts = False  # @param {"type":"boolean"}
# @markdown CosyVoice: [GitHub](https://github.com/FunAudioLLM/CosyVoice) - Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.
download_cosyvoice = False  # @param {"type":"boolean"}
# @markdown FireRedTTS: [GitHub](https://github.com/FireRedTeam/FireRedTTS) - An Open-Sourced LLM-empowered Foundation TTS System
download_fire_red_tts = False  # @param {"type":"boolean"}
# @markdown FishSpeech: [GitHub](https://github.com/fishaudio/fish-speech) - Brand new TTS solution
download_fish_speech = False  # @param {"type":"boolean"}

# @markdown ### ASR Models
# @markdown Whisper: [GitHub](https://github.com/openai/whisper) - Robust Speech Recognition via Large-Scale Weak Supervision
download_whisper = True  # @param {"type":"boolean"}

# @markdown ### Clone Voice Models
# @markdown OpenVoice: [GitHub](https://github.com/myshell-ai/OpenVoice) - Instant voice cloning by MIT and MyShell.
download_open_voice = False  # @param {"type":"boolean"}

# @markdown ### Enhance Models
# @markdown resemble-enhance: [GitHub](https://github.com/resemble-ai/resemble-enhance) - AI powered speech denoising and enhancement
download_enhancer = True  # @param {"type":"boolean"}

def download_model(command):
    print(f"Executing: {command}")
    !{command}

# Check that at least one TTS model is selected
if not any([download_chattts, download_fish_speech, download_cosyvoice, download_fire_red_tts, download_f5_tts]):
    print("No TTS model selected; downloading ChatTTS by default...")
    download_chattts = True

# Download TTS models
if download_chattts:
    download_model("python -m scripts.dl_chattts --source huggingface")

if download_fish_speech:
    download_model("python -m scripts.downloader.fish_speech_1_2sft --source huggingface")

if download_cosyvoice:
    download_model("python -m scripts.downloader.cosyvoice2 --source huggingface")

if download_fire_red_tts:
    download_model("python -m scripts.downloader.fire_red_tts --source huggingface")

if download_f5_tts:
    download_model("python -m scripts.downloader.f5_tts --source huggingface")
    download_model("python -m scripts.downloader.vocos_mel_24khz --source huggingface")

# Download ASR models
if download_whisper:
    download_model("python -m scripts.downloader.faster_whisper --source huggingface")

# Download voice cloning models
if download_open_voice:
    download_model("python -m scripts.downloader.open_voice --source huggingface")

# Download enhancement models
if download_enhancer:
    download_model("python -m scripts.dl_enhance --source huggingface")

print("All selected models have been downloaded!")
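
The selection logic above, including the ChatTTS fallback, can also be expressed as a lookup table. This is only a sketch of an alternative layout: the module names are copied from the cell above, while the `DOWNLOADERS` table, the `TTS_MODELS` set, and the `build_commands` helper are illustrative names that do not exist in Speech-AI-Forge. It builds the command strings without executing them.

```python
# Downloader modules copied from the cell above, in the same order.
DOWNLOADERS = {
    "chattts": ["scripts.dl_chattts"],
    "fish_speech": ["scripts.downloader.fish_speech_1_2sft"],
    "cosyvoice": ["scripts.downloader.cosyvoice2"],
    "fire_red_tts": ["scripts.downloader.fire_red_tts"],
    "f5_tts": [
        "scripts.downloader.f5_tts",
        "scripts.downloader.vocos_mel_24khz",
    ],
    "whisper": ["scripts.downloader.faster_whisper"],
    "open_voice": ["scripts.downloader.open_voice"],
    "enhancer": ["scripts.dl_enhance"],
}

TTS_MODELS = {"chattts", "fish_speech", "cosyvoice", "fire_red_tts", "f5_tts"}


def build_commands(selected, source="huggingface"):
    """Return the download commands for the selected model names."""
    selected = set(selected)
    # Fall back to ChatTTS when no TTS model is selected.
    if not selected & TTS_MODELS:
        selected.add("chattts")
    return [
        f"python -m {module} --source {source}"
        for name, modules in DOWNLOADERS.items()
        if name in selected
        for module in modules
    ]


for command in build_commands({"f5_tts", "whisper"}):
    print(command)
```

Keeping the mapping in one table means adding a model only requires a new entry rather than another `if` block.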


## Run WebUI

In [None]:
!nvcc --version

In [None]:
!nvidia-smi

In [None]:
!python webui.py --share --language=en