[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/liujing04/Retrieval-based-Voice-Conversion-WebUI/blob/main/Retrieval_based_Voice_Conversion_WebUI.ipynb)

In [None]:
# Check which GPU (Graphics Processing Unit) is available
# "nvidia-smi" is the system management interface for NVIDIA GPUs.
# It reports the GPU model, driver version, memory usage, utilization, and temperature.

# @title Check GPU
!nvidia-smi

In [None]:
# Install required system packages
# The following command installs necessary system packages, including build tools and libraries for Python development and FFmpeg for audio manipulation.
!apt-get -y install build-essential python3-dev ffmpeg

# Upgrade setuptools and wheel packages
# These packages are important for installing some Python libraries and managing their dependencies.
!pip3 install --upgrade setuptools wheel

# Upgrade pip package installer
# This ensures that the latest version of pip is being used, allowing for proper installation of library packages.
!pip3 install --upgrade pip

# Install required Python libraries
# The following command installs the Python libraries the project needs, pinning versions where required.
# These libraries include:
# faiss-gpu: Facebook AI Similarity Search library with GPU support
# fairseq: A sequence-to-sequence learning toolkit from Facebook AI Research
# gradio: A library for creating easy-to-use UI components for ML models
# ffmpeg, ffmpeg-python: Libraries for handling multimedia files
# praat-parselmouth: A library to interface with the Praat software for phonetic analysis
# pyworld: A Python wrapper for WORLD, a speech analysis, manipulation, and synthesis system
# numpy, numba, librosa: Libraries for numerical operations and audio processing
!pip3 install faiss-gpu fairseq gradio ffmpeg ffmpeg-python praat-parselmouth pyworld numpy==1.23.5 numba==0.56.4 librosa==0.9.2
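
After the install cell, it can help to confirm that the pinned versions are the ones actually installed, since a later `pip` install may replace them. The following is a minimal sketch, not part of the original notebook; the helper name `find_mismatches` and the `PINS` dictionary are illustrative assumptions, and the pins are copied from the install command above.

```python
from importlib import metadata

# Pins copied from the pip3 install command in the cell above.
PINS = {"numpy": "1.23.5", "numba": "0.56.4", "librosa": "0.9.2"}


def find_mismatches(pins, get_version=metadata.version):
    """Return {package: installed_version_or_None} for every pin that is not met.

    get_version is injectable so the check can be exercised without
    touching the real environment (hypothetical helper, for illustration).
    """
    mismatches = {}
    for name, wanted in pins.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None  # package is not installed at all
        if found != wanted:
            mismatches[name] = found
    return mismatches


if __name__ == "__main__":
    # An empty dict means every pinned package has the expected version.
    print(find_mismatches(PINS))
```

An empty result means the environment matches the pins; any entry shows the version that was found instead (or `None` if the package is missing).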

In [None]:
# Clone the "Retrieval-based Voice Conversion WebUI" repository
# This command makes a shallow clone (--depth=1) of the main branch of the repository and navigates into the downloaded folder.
!git clone --depth=1 -b main https://github.com/developer787/Retrieval-based-Voice-Conversion-WebUI
%cd /content/Retrieval-based-Voice-Conversion-WebUI

# Create directories for holding pre-trained models and UVR5 weights
# This command creates the necessary folders for storing pre-trained models and the weights for the UVR5 model.
!mkdir -p pretrained uvr5_weights

In [None]:
# Update the repository (generally not needed)
# This command pulls the latest changes from the remote repository into the clone.
# It is usually unnecessary right after cloning, because the codebase is already up to date.
!git pull

In [None]:
# Install aria2 downloader
# aria2 is a lightweight multi-protocol & multi-source command-line download utility.
# It supports HTTP/HTTPS, FTP, and BitTorrent, and optimizes multi-connection downloads.
# The following command installs aria2, using the "-qq" flag to minimize output.
!apt -y install -qq aria2

In [None]:
# Download base models
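# These commands use aria2c to download the pre-trained discriminator (D) and generator (G) models for the voice conversion system.
# The models come from the Hugging Face Model Hub and are saved into the "pretrained" directory.
# Flags: -c resumes partial downloads, -x 16 allows up to 16 connections per server, -s 16 splits each file into 16 pieces, -k 1M sets the minimum split size.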
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/pretrained/D32k.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/pretrained -o D32k.pth
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/pretrained/D40k.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/pretrained -o D40k.pth
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/pretrained/D48k.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/pretrained -o D48k.pth
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/pretrained/G32k.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/pretrained -o G32k.pth
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/pretrained/G40k.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/pretrained -o G40k.pth
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/pretrained/G48k.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/pretrained -o G48k.pth
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/pretrained/f0D32k.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/pretrained -o f0D32k.pth
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/pretrained/f0D40k.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/pretrained -o f0D40k.pth
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/pretrained/f0D48k.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/pretrained -o f0D48k.pth
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/pretrained/f0G32k.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/pretrained -o f0G32k.pth
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/pretrained/f0G40k.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/pretrained -o f0G40k.pth
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/pretrained/f0G48k.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/pretrained -o f0G48k.pth
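
The twelve download commands above differ only in the file name, which follows a regular pattern: an optional `f0` prefix, `D` or `G`, and a sampling-rate tag. As a rough sketch (not part of the original notebook), the same list can be generated in Python. The helper names `pretrained_filenames` and `aria2c_command` are hypothetical; the URL, destination directory, and flags are copied from the cell above.

```python
from itertools import product

# Values copied from the download cell above.
BASE_URL = "https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/pretrained"
DEST_DIR = "/content/Retrieval-based-Voice-Conversion-WebUI/pretrained"


def pretrained_filenames():
    """List the 12 checkpoint names in the same order as the cell above."""
    prefixes = ["", "f0"]          # plain and "f0" variants
    roles = ["D", "G"]             # discriminator and generator
    rates = ["32k", "40k", "48k"]  # sampling-rate tags
    return [f"{p}{r}{sr}.pth" for p, r, sr in product(prefixes, roles, rates)]


def aria2c_command(name):
    """Build the aria2c argument list for one checkpoint (hypothetical helper)."""
    return [
        "aria2c", "--console-log-level=error", "-c",
        "-x", "16", "-s", "16", "-k", "1M",
        f"{BASE_URL}/{name}", "-d", DEST_DIR, "-o", name,
    ]


if __name__ == "__main__":
    for name in pretrained_filenames():
        # Pass each list to subprocess.run(..., check=True) to actually download.
        print(" ".join(aria2c_command(name)))
```

Printing the commands first makes it easy to compare them with the explicit list before running anything.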

In [None]:
# Download voice separation models
# Use aria2 to download the pretrained voice separation models from HuggingFace hub with optimized multi-connection settings.
# HP2 model: separates vocals and non-vocal instrumental parts
# HP5 model: separates main melody vocals and other instrumental parts
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/uvr5_weights/HP2-人声vocals+非人声instrumentals.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/uvr5_weights -o HP2-人声vocals+非人声instrumentals.pth
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/uvr5_weights/HP5-主旋律人声vocals+其他instrumentals.pth -d /content/Retrieval-based-Voice-Conversion-WebUI/uvr5_weights -o HP5-主旋律人声vocals+其他instrumentals.pth

In [None]:
# Download the Hubert Base model
# Use aria2 to download the pretrained Hubert Base model from HuggingFace hub with optimized multi-connection settings.
# HuBERT (Hidden-Unit BERT) is a self-supervised speech model developed by Facebook AI Research.
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/hubert_base.pt -d /content/Retrieval-based-Voice-Conversion-WebUI -o hubert_base.pt

In [None]:
# Mount Google Drive
# This command allows you to access your Google Drive files in the Colab environment.
# A prompt will ask you to authenticate and grant access permissions.
from google.colab import drive
drive.mount('/content/drive')

In [None]:
# Load the packaged dataset from Google Drive to /content/dataset
# This command creates a new directory at "/content/dataset" and unzips the dataset from Google Drive into that directory.

# Dataset location (in Google Drive)
DATASET = "/content/drive/MyDrive/dataset/lulu20230327_32k.zip"  # @param {type:"string"}

# Create a directory to store the dataset at "/content/dataset"
!mkdir -p /content/dataset

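# Unzip the dataset into the created directory
# The -B flag keeps a backup (with a "~" suffix) of any file that would otherwise be overwritten,
# so files that share a name inside the archive are not lost. The path is quoted in case it contains spaces.
!unzip -d /content/dataset -B "{DATASET}"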

In [None]:
# Rename duplicate filenames in the dataset
# The following commands are used to display the current file names in the dataset directory
# and to rename any duplicate files using a regex pattern.

# List all files in the /content/dataset/ directory
!ls -a /content/dataset/

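# Rename duplicate files using a regex pattern
# Duplicates are the backup copies left by "unzip -B", whose names end in "~" plus an optional number.
# This command uses the 'rename' utility to move that number in front of the extension, joined by an underscore.
# For example: "file.wav~1" will be renamed to "file_1.wav"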
!rename 's/(\w+)\.(\w+)~(\d*)/$1_$3.$2/' /content/dataset/*.*~*
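
To preview what the `rename` expression does before touching real files, the same substitution can be tried in Python. This is a small sketch under the assumption that Perl's `s/.../.../` (without the `g` flag) corresponds to `re.sub` with `count=1`; the function name `renamed` is illustrative.

```python
import re

# Same pattern as the rename command: name, extension, "~", optional digits.
PATTERN = re.compile(r"(\w+)\.(\w+)~(\d*)")


def renamed(filename):
    """Apply the substitution once, like Perl's s/// without the g flag."""
    return PATTERN.sub(r"\1_\3.\2", filename, count=1)


if __name__ == "__main__":
    for sample in ["file.wav~1", "file.wav~", "file.wav"]:
        print(sample, "->", renamed(sample))
```

Note that a backup without a number (`file.wav~`) becomes `file_.wav`, and names without a `~` are left unchanged.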

In [None]:

# Start the Web Interface
# Change to the project directory and start the web interface using the "infer-web.py" script.
# The "--colab" and "--pycmd python3" flags are used to configure the script for Google Colaboratory and specify that Python 3 is being used.

# Change to the project directory
%cd /content/Retrieval-based-Voice-Conversion-WebUI

# Uncomment the following lines for loading TensorBoard and visualizing logs.
# %load_ext tensorboard
# %tensorboard --logdir /content/Retrieval-based-Voice-Conversion-WebUI/logs

# Start the web interface using the "infer-web.py" script
!python3 infer-web.py --colab --pycmd python3

In [None]:
# Manually back up trained model files to Google Drive
# You need to check the model file names in the logs folder and manually modify the file names accordingly in the commands below.

# Define the variables for model name and epoch
MODELNAME = "lulu"  # @param {type:"string"}
MODELEPOCH = 9600  # @param {type:"integer"}

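# Copy Generator and Discriminator model files to Google Drive
# Note: the names on Drive are crossed (G_* is saved as *_D_* and D_* as *_G_*).
# The restore cell reverses the same mapping, so the files return to their original names.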
!cp /content/Retrieval-based-Voice-Conversion-WebUI/logs/{MODELNAME}/G_{MODELEPOCH}.pth /content/drive/MyDrive/{MODELNAME}_D_{MODELEPOCH}.pth
!cp /content/Retrieval-based-Voice-Conversion-WebUI/logs/{MODELNAME}/D_{MODELEPOCH}.pth /content/drive/MyDrive/{MODELNAME}_G_{MODELEPOCH}.pth

# Copy index and .npy files to Google Drive
!cp /content/Retrieval-based-Voice-Conversion-WebUI/logs/{MODELNAME}/added_*.index /content/drive/MyDrive/
!cp /content/Retrieval-based-Voice-Conversion-WebUI/logs/{MODELNAME}/total_*.npy /content/drive/MyDrive/

# Copy weights (model) file to Google Drive
!cp /content/Retrieval-based-Voice-Conversion-WebUI/weights/{MODELNAME}.pth /content/drive/MyDrive/{MODELNAME}{MODELEPOCH}.pth

In [None]:
# Restore .pth files from Google Drive
# Check the backed-up file names in Google Drive and set MODELNAME and MODELEPOCH below to match them.

# Define the variables for model name and epoch
MODELNAME = "lulu"  # @param {type:"string"}
MODELEPOCH = 7500  # @param {type:"integer"}

# Create the required directory for storing logs
!mkdir -p /content/Retrieval-based-Voice-Conversion-WebUI/logs/{MODELNAME}

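# Copy Generator and Discriminator model files from Google Drive
# Note: the names on Drive are crossed (*_D_* holds the generator and *_G_* the discriminator),
# matching how the backup cell wrote them.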
!cp /content/drive/MyDrive/{MODELNAME}_D_{MODELEPOCH}.pth /content/Retrieval-based-Voice-Conversion-WebUI/logs/{MODELNAME}/G_{MODELEPOCH}.pth
!cp /content/drive/MyDrive/{MODELNAME}_G_{MODELEPOCH}.pth /content/Retrieval-based-Voice-Conversion-WebUI/logs/{MODELNAME}/D_{MODELEPOCH}.pth

# Copy every .index and .npy file from the Google Drive root into /content/ (not into the logs folder)
!cp /content/drive/MyDrive/*.index /content/
!cp /content/drive/MyDrive/*.npy /content/

# Copy weights (model) file from Google Drive
!cp /content/drive/MyDrive/{MODELNAME}{MODELEPOCH}.pth /content/Retrieval-based-Voice-Conversion-WebUI/weights/{MODELNAME}.pth
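
The backup and restore cells only work as a pair: restore must read exactly the names that backup wrote, including the crossed `_D_`/`_G_` naming of the generator and discriminator files. The sketch below, which is not part of the original notebook, writes both mappings out and checks that one inverts the other. The function names are hypothetical; the paths are copied from the two cells, and the `.index`/`.npy` copies are left out because they use wildcards.

```python
# Paths copied from the backup and restore cells.
LOGS = "/content/Retrieval-based-Voice-Conversion-WebUI/logs"
WEIGHTS = "/content/Retrieval-based-Voice-Conversion-WebUI/weights"
DRIVE = "/content/drive/MyDrive"


def backup_pairs(model, epoch):
    """(source, destination) pairs used by the backup cell."""
    return [
        (f"{LOGS}/{model}/G_{epoch}.pth", f"{DRIVE}/{model}_D_{epoch}.pth"),
        (f"{LOGS}/{model}/D_{epoch}.pth", f"{DRIVE}/{model}_G_{epoch}.pth"),
        (f"{WEIGHTS}/{model}.pth", f"{DRIVE}/{model}{epoch}.pth"),
    ]


def restore_pairs(model, epoch):
    """(source, destination) pairs used by the restore cell."""
    return [
        (f"{DRIVE}/{model}_D_{epoch}.pth", f"{LOGS}/{model}/G_{epoch}.pth"),
        (f"{DRIVE}/{model}_G_{epoch}.pth", f"{LOGS}/{model}/D_{epoch}.pth"),
        (f"{DRIVE}/{model}{epoch}.pth", f"{WEIGHTS}/{model}.pth"),
    ]


def round_trips(model, backup_epoch, restore_epoch):
    """True if restoring puts every backed-up file back where it came from."""
    inverted = {(dst, src) for src, dst in backup_pairs(model, backup_epoch)}
    return inverted == set(restore_pairs(model, restore_epoch))


if __name__ == "__main__":
    print(round_trips("lulu", 9600, 9600))  # same epoch in both cells
    print(round_trips("lulu", 9600, 7500))  # the two cells' default epochs
```

The second call uses the default epochs of the two cells (9600 for backup, 7500 for restore), which do not match; set `MODELEPOCH` to the same value in both cells before restoring.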

In [None]:
# Manual Preprocessing (Not Recommended)
# This section allows for customizing the preprocessing pipeline using specific values for model name, sampling rate, and number of processes.
# Note: Manual preprocessing is not recommended for most use cases.

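# Specify model name, sampling rate, and number of processes
# Despite its name, BITRATE holds the target sampling rate in Hz; THREADCOUNT is the number of worker processes.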
MODELNAME = "lulu"  #@param {type:"string"}
BITRATE = 48000  #@param {type:"integer"}
THREADCOUNT = 8  #@param {type:"integer"}

# Run the preprocessing pipeline script with the specified parameters
!python3 trainset_preprocess_pipeline_print.py /content/dataset {BITRATE} {THREADCOUNT} logs/{MODELNAME} True

In [None]:
# Manual Feature Extraction (Not Recommended)
# This section allows for customizing the feature extraction pipeline using specific values for model name, number of processes, and pitch extraction algorithm.
# Note: Manual feature extraction is not recommended for most use cases.

# Specify model name, number of processes, and pitch extraction algorithm
MODELNAME = "lulu"  #@param {type:"string"}
THREADCOUNT = 8  #@param {type:"integer"}
ALGO = "harvest"  #@param {type:"string"}

# Run the F0 extraction script with the specified parameters
!python3 extract_f0_print.py logs/{MODELNAME} {THREADCOUNT} {ALGO}

# Run the feature extraction script with the specified parameters
!python3 extract_feature_print.py cpu 1 0 0 logs/{MODELNAME}


In [None]:
# Manual Training (Not Recommended)
# This section allows for customizing the training process using specific values for model name, GPU, batch size, epochs, sampling rate, and other parameters.
# Note: Manual training is not recommended for most use cases.

# Specify training parameters
MODELNAME = "lulu"  #@param {type:"string"}
USEGPU = "0"  #@param {type:"string"}
BATCHSIZE = 32  #@param {type:"integer"}
MODELEPOCH = 3200  #@param {type:"integer"}
EPOCHSAVE = 100  #@param {type:"integer"}
MODELSAMPLE = "48k"  #@param {type:"string"}
CACHEDATA = 1  #@param {type:"integer"}
ONLYLATEST = 0  #@param {type:"integer"}

# Run the training script with the specified parameters
!python3 train_nsf_sim_cache_sid_load_pretrain.py -e {MODELNAME} -sr {MODELSAMPLE} -f0 1 -bs {BATCHSIZE} -g {USEGPU} -te {MODELEPOCH} -se {EPOCHSAVE} -pg pretrained/f0G{MODELSAMPLE}.pth -pd pretrained/f0D{MODELSAMPLE}.pth -l {ONLYLATEST} -c {CACHEDATA}
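
The training command has many flags, and the pretrained checkpoint paths have to agree with the sampling-rate tag. As an illustrative sketch (not the notebook's own method), the argument list can be assembled in one function so every value comes from a parameter. The function name `training_command` and its keyword names are assumptions; the script name, flags, and defaults are copied from the cell above.

```python
import shlex


def training_command(model_name, sample_rate="48k", batch_size=32, gpus="0",
                     total_epochs=3200, save_every=100,
                     only_latest=0, cache_data=1):
    """Build the argument list for the training script (hypothetical helper)."""
    return [
        "python3", "train_nsf_sim_cache_sid_load_pretrain.py",
        "-e", model_name,
        "-sr", sample_rate,
        "-f0", "1",
        "-bs", str(batch_size),
        "-g", gpus,
        "-te", str(total_epochs),
        "-se", str(save_every),
        # Pretrained generator/discriminator follow the sampling-rate tag.
        "-pg", f"pretrained/f0G{sample_rate}.pth",
        "-pd", f"pretrained/f0D{sample_rate}.pth",
        "-l", str(only_latest),
        "-c", str(cache_data),
    ]


if __name__ == "__main__":
    # Print the command for review; pass the list to subprocess.run to execute it.
    print(shlex.join(training_command("lulu")))
```

Building the list once avoids the case where the model name or sampling rate is changed in one place but not another.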

In [None]:
# Caution: This section deletes all other .pth files and leaves only the selected one.
# Carefully review the code before running this cell.

# Specify model name and selected model epoch
MODELNAME = "lulu"  #@param {type:"string"}
MODELEPOCH = 9600  #@param {type:"integer"}

# Backup the selected model
!echo "Backing up the selected model..."
!cp /content/Retrieval-based-Voice-Conversion-WebUI/logs/{MODELNAME}/G_{MODELEPOCH}.pth /content/{MODELNAME}_G_{MODELEPOCH}.pth
!cp /content/Retrieval-based-Voice-Conversion-WebUI/logs/{MODELNAME}/D_{MODELEPOCH}.pth /content/{MODELNAME}_D_{MODELEPOCH}.pth

# Delete other .pth files
!echo "Deleting other files..."
!ls /content/Retrieval-based-Voice-Conversion-WebUI/logs/{MODELNAME}
!rm /content/Retrieval-based-Voice-Conversion-WebUI/logs/{MODELNAME}/*.pth

# Restore the selected model
!echo "Restoring the selected model..."
!mv /content/{MODELNAME}_G_{MODELEPOCH}.pth /content/Retrieval-based-Voice-Conversion-WebUI/logs/{MODELNAME}/G_{MODELEPOCH}.pth
!mv /content/{MODELNAME}_D_{MODELEPOCH}.pth /content/Retrieval-based-Voice-Conversion-WebUI/logs/{MODELNAME}/D_{MODELEPOCH}.pth

# Confirm the deletion is completed
!echo "Deletion completed"
!ls /content/Retrieval-based-Voice-Conversion-WebUI/logs/{MODELNAME}
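
The cell above relies on a copy, delete, and move sequence; if the copy fails (for example because `MODELEPOCH` does not match any file), the delete step still runs. One possible alternative, sketched here under the assumption that checkpoints are named `G_<epoch>.pth` and `D_<epoch>.pth` as in the cell, is to check that the selected files exist first and delete only the others. The function `prune_checkpoints` and its `dry_run` parameter are hypothetical and not part of the project.

```python
from pathlib import Path


def prune_checkpoints(log_dir, epoch, dry_run=True):
    """Delete every .pth in log_dir except G_<epoch>.pth and D_<epoch>.pth.

    Returns the names that were (or, with dry_run=True, would be) deleted.
    Raises FileNotFoundError before deleting anything if a selected file is missing.
    """
    log_dir = Path(log_dir)
    keep = {f"G_{epoch}.pth", f"D_{epoch}.pth"}
    missing = sorted(name for name in keep if not (log_dir / name).is_file())
    if missing:
        raise FileNotFoundError(f"selected checkpoint(s) not found: {missing}")
    doomed = sorted(p for p in log_dir.glob("*.pth") if p.name not in keep)
    if not dry_run:
        for path in doomed:
            path.unlink()
    return [p.name for p in doomed]
```

A typical use would be to call it once with the default `dry_run=True` to review the list, then again with `dry_run=False` to delete. Files that do not end in `.pth` are never touched.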

In [None]:
# Clear all files in the model's logs folder and keep only the selected checkpoint (Use with caution, review the code carefully)
# This cell deletes everything under logs/{MODELNAME}, not just the .pth files,
# keeping only the selected model's generator and discriminator weights.

# Specify the model name and model epoch
MODELNAME = "lulu"  #@param {type:"string"}
MODELEPOCH = 9600  #@param {type:"integer"}

# Back up the selected model's generator and discriminator weights
!echo "Backing up the selected model..."
!cp /content/Retrieval-based-Voice-Conversion-WebUI/logs/{MODELNAME}/G_{MODELEPOCH}.pth /content/{MODELNAME}_G_{MODELEPOCH}.pth
!cp /content/Retrieval-based-Voice-Conversion-WebUI/logs/{MODELNAME}/D_{MODELEPOCH}.pth /content/{MODELNAME}_D_{MODELEPOCH}.pth

# Delete all files under the project's logs directory for the specified model
!echo "Deleting..."
!ls /content/Retrieval-based-Voice-Conversion-WebUI/logs/{MODELNAME}
!rm -rf /content/Retrieval-based-Voice-Conversion-WebUI/logs/{MODELNAME}/*

# Restore the backed-up generator and discriminator weights
!echo "Restoring the selected model..."
!mv /content/{MODELNAME}_G_{MODELEPOCH}.pth /content/Retrieval-based-Voice-Conversion-WebUI/logs/{MODELNAME}/G_{MODELEPOCH}.pth
!mv /content/{MODELNAME}_D_{MODELEPOCH}.pth /content/Retrieval-based-Voice-Conversion-WebUI/logs/{MODELNAME}/D_{MODELEPOCH}.pth

# Verify that the deletion and restoration are complete
!echo "Deletion completed"
!ls /content/Retrieval-based-Voice-Conversion-WebUI/logs/{MODELNAME}