# 🗣️ Speech-AI-Forge Colab

👋 This script is built on [Speech-AI-Forge](https://github.com/lenML/Speech-AI-Forge). If the project helps you, consider giving it a star on GitHub. Pull requests and issues are also welcome.

## Usage Guide

1. Select **Runtime** from the menu.
2. Click **Run All**.

Once the process is complete, look for the following information in the log:

```
Running on public URL: https://**.gradio.live
```

This is the public URL for accessing the WebUI.

> Note: If prompted to restart during package installation, please select "No."
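
If the log is long, the link can be picked out programmatically. The following is a minimal sketch, assuming the log text is available as a string and that share links follow the `https://<id>.gradio.live` form shown above. `find_gradio_url` and `sample_log` are hypothetical names used for illustration; they are not part of Speech-AI-Forge.

```python
import re
from typing import Optional

# Assumed pattern: share links look like https://<id>.gradio.live
GRADIO_URL = re.compile(r"https://[A-Za-z0-9-]+\.gradio\.live")


def find_gradio_url(log_text: str) -> Optional[str]:
    """Return the first Gradio share URL found in log_text, or None."""
    match = GRADIO_URL.search(log_text)
    return match.group(0) if match else None


# Hypothetical log excerpt, for illustration only
sample_log = (
    "Running on local URL: http://127.0.0.1:7860\n"
    "Running on public URL: https://abc123.gradio.live\n"
)
print(find_gradio_url(sample_log))
```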

## Environment

In [None]:
%%capture
# Drop cached PIL and google modules so Colab skips the restart prompt
import sys

for name in list(sys.modules):
    if "PIL" in name or "google" in name:
        sys.modules.pop(name)

# 1. Clone the repository
!git clone https://github.com/lenML/Speech-AI-Forge

# 2. Change directory to the repository
%cd Speech-AI-Forge

# 3. Install ffmpeg and rubberband-cli
!apt-get update -y
!apt-get install -y ffmpeg rubberband-cli

# 4. Install dependencies
%pip install -r requirements.txt

## Download Models
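
Since each model takes roughly 2 GB, it can help to check free disk space before choosing what to download. This is a rough sketch, not part of the project: `free_gb` and `has_room` are hypothetical helpers, and the 2 GB per model figure is only the approximation given in the download notes.

```python
import shutil


def free_gb(path: str = ".") -> float:
    """Free space on the filesystem containing path, in GiB."""
    return shutil.disk_usage(path).free / 1024**3


def has_room(n_models: int, path: str = ".", gb_per_model: float = 2.0) -> bool:
    """Rough check: assumes every model needs about gb_per_model GiB."""
    return free_gb(path) >= n_models * gb_per_model


print(f"Free space: {free_gb():.1f} GiB")
print("Room for 4 models:", has_room(4))
```

The estimate ignores temporary files and pip caches, so treat a narrow pass as a warning rather than a guarantee.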

In [None]:
import os

from scripts import dl_chattts, dl_enhance
from scripts.downloader import (
    fish_speech_1_4,
    cosyvoice2,
    fire_red_tts,
    vocos_mel_24khz,
    f5_tts_v1,
    index_tts,
    spark_tts,
    faster_whisper,
    open_voice,
)

# @markdown ## Model Download Instructions
# @markdown Most models are close to 2 GB each, so make sure you have enough storage space and network bandwidth. <br/>
# @markdown **Note**: You must select at least one TTS model. If none are selected, `CosyVoice` will be downloaded by default.

# @markdown ### Hugging Face Token (Optional)
# @markdown Some models may require a configured Hugging Face Token to download. If you need to use these models, enter your Token here. You can get your Token from [Hugging Face](https://huggingface.co/settings/tokens).
HF_TOKEN = ""  # @param {type:"string"}

if HF_TOKEN:
    # Expose the token to the downloaders through the environment
    os.environ["HF_TOKEN"] = HF_TOKEN
    print("HF_TOKEN environment variable configured.")
else:
    print("No HF_TOKEN provided, skipping environment variable configuration.")

# @markdown ### TTS Model
# @markdown CosyVoice: [GitHub](https://github.com/FunAudioLLM/CosyVoice) - Multilingual large voice generation model with full-stack inference, training, and deployment support.
download_cosyvoice = True  # @param {"type":"boolean"}

# @markdown FishSpeech: [GitHub](https://github.com/fishaudio/fish-speech) - Brand new TTS solution
download_fish_speech = False  # @param {"type":"boolean"}

# @markdown Index-TTS: [GitHub](https://github.com/index-tts/index-tts) - An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System
download_index_tts = False  # @param {"type":"boolean"}

# @markdown Spark-TTS: [GitHub](https://github.com/SparkAudio/Spark-TTS) - Spark-TTS Inference
download_spark_tts = False  # @param {"type":"boolean"}

# @markdown ChatTTS: [GitHub](https://github.com/2noise/ChatTTS) - A generative speech model for daily dialogue.
download_chattts = False  # @param {"type":"boolean"}

# @markdown F5-TTS: [GitHub](https://github.com/SWivid/F5-TTS) - A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching
download_f5_tts = False  # @param {"type":"boolean"}

# @markdown FireRedTTS: [GitHub](https://github.com/FireRedTeam/FireRedTTS) - An Open-Sourced LLM-empowered Foundation TTS System
download_fire_red_tts = False  # @param {"type":"boolean"}

# @markdown ### ASR Model
# @markdown Whisper: [GitHub](https://github.com/openai/whisper) - Robust Speech Recognition via Large-Scale Weak Supervision
download_whisper = True  # @param {"type":"boolean"}

# @markdown ### Voice Cloning Model
# @markdown OpenVoice: [GitHub](https://github.com/myshell-ai/OpenVoice) - Instant voice cloning by MIT and MyShell.
download_open_voice = False  # @param {"type":"boolean"}

# @markdown ### Enhancement Model
# @markdown resemble-enhance: [GitHub](https://github.com/resemble-ai/resemble-enhance) - AI powered speech denoising and enhancement
download_enhancer = True  # @param {"type":"boolean"}


# Check if at least one TTS model is selected
if not any(
    [
        download_chattts,
        download_fish_speech,
        download_cosyvoice,
        download_fire_red_tts,
        download_f5_tts,
        download_index_tts,
        download_spark_tts,
    ]
):
    print("No TTS models selected, downloading CosyVoice by default...")
    # Actually enable the fallback; a bare `download_cosyvoice` expression does nothing
    download_cosyvoice = True

dl_source = "huggingface"

# TTS Model Downloads
if download_chattts:
    print("Downloading ChatTTS...")
    dl_chattts.ChatTTSDownloader()(source=dl_source)
    print("Downloading ChatTTS, completed")

if download_fish_speech:
    print("Downloading FishSpeech...")
    fish_speech_1_4.FishSpeech14Downloader()(source=dl_source)
    print("Downloading FishSpeech, completed")

if download_cosyvoice:
    print("Downloading CosyVoice...")
    cosyvoice2.CosyVoice2Downloader()(source=dl_source)
    print("Downloading CosyVoice, completed")

if download_fire_red_tts:
    print("Downloading FireRedTTS...")
    fire_red_tts.FireRedTTSDownloader()(source=dl_source)
    print("Downloading FireRedTTS, completed")

if download_f5_tts:
    print("Downloading F5TTS...")
    f5_tts_v1.F5TTSV1Downloader()(source=dl_source)
    vocos_mel_24khz.VocosMel24khzDownloader()(source=dl_source)
    print("Downloading F5TTS, completed")

if download_index_tts:
    print("Downloading IndexTTS...")
    index_tts.IndexTTSDownloader()(source=dl_source)
    print("Downloading IndexTTS, completed")

if download_spark_tts:
    print("Downloading SparkTTS...")
    spark_tts.SparkTTSDownloader()(source=dl_source)
    print("Downloading SparkTTS, completed")

# ASR Model Downloads
if download_whisper:
    print("Downloading Whisper...")
    faster_whisper.FasterWhisperDownloader()(source=dl_source)
    print("Downloading Whisper, completed")

# Voice Cloning Model Downloads
if download_open_voice:
    print("Downloading OpenVoice...")
    open_voice.OpenVoiceDownloader()(source=dl_source)
    print("Downloading OpenVoice, completed")

# Enhancement Model Downloads
if download_enhancer:
    print("Downloading ResembleEnhance...")
    dl_enhance.ResembleEnhanceDownloader()(source=dl_source)
    print("Downloading ResembleEnhance, completed")

print("All selected models have been downloaded!")

## Run WebUI
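
The cells in this section print CUDA and GPU details before launching the WebUI. If you would rather get a short summary from Python, the sketch below is one option. It assumes `nvidia-smi` is on the `PATH` when a GPU runtime is attached; `gpu_summary` is a hypothetical helper and not part of Speech-AI-Forge.

```python
import shutil
import subprocess


def gpu_summary() -> str:
    """Return one line per visible GPU, or a message explaining why none was found."""
    if shutil.which("nvidia-smi") is None:
        return "nvidia-smi not found: no NVIDIA driver visible in this runtime"
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return f"nvidia-smi failed: {result.stderr.strip() or 'unknown error'}"
    return result.stdout.strip() or "nvidia-smi reported no GPUs"


print(gpu_summary())
```

If no GPU is reported in Colab, switching the runtime type to a GPU under **Runtime** is the usual fix.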

In [None]:
!nvcc --version

In [None]:
!nvidia-smi

In [None]:
!python webui.py --share --language=en