# CoMoSVC: Consistency Model Based Singing Voice Conversion (Colab)

This notebook provides a complete workflow to set up, prepare data, train, and run inference for the **CoMoSVC** project.

**This version has been modified to integrate with Google Drive for datasets, checkpoints, and results, ensuring your work is saved across sessions.**

**Links:**
- **Adapted Colab Repository:** [https://github.com/Crepveant/CoMoSVC-Colab](https://github.com/Crepveant/CoMoSVC-Colab)
- **Original Research Paper:** [https://arxiv.org/pdf/2401.01792.pdf](https://arxiv.org/pdf/2401.01792.pdf)

## 1. Setup Environment

First, we'll connect to Google Drive, clone the repository, and install the required dependencies.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

In [None]:
print("Cloning the repository...")
!git clone https://github.com/Crepveant/CoMoSVC-Colab.git
%cd CoMoSVC-Colab

In [None]:
print("Installing dependencies from requirements.txt...")
!pip install -r requirements.txt
print("\n✅ Dependencies installed successfully!")

--- 
## 2. Download Pre-trained Components

Next, we'll download the essential pre-trained models: the vocoder, content encoder, and pitch extractor.

In [None]:
print("Downloading and extracting the HiFiGAN Vocoder (m4singer_hifigan)...")
# Recent gdown releases accept the file ID directly; the old --id flag is deprecated.
!gdown 10LD3sq_zmAibl379yTW5M-LXy2l_xk6h
!unzip -q m4singer_hifigan.zip

print("\nDownloading the Content Encoder (ContentVec)...")
!mkdir -p Content
!gdown 1A2RQQY7gSEbdfiTHdOcNIjpnNNDYZM0b -O Content/checkpoint_best_legacy_500.pt

print("\nDownloading and extracting the Pitch Extractor (m4singer_pe)...")
!gdown 19QtXNeqUjY3AjvVycEt3G83lXn2HwbaJ
!unzip -q m4singer_pe.zip

print("\n✅ All components downloaded successfully!")
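
Before moving on, you can confirm that the downloads landed where the later scripts expect them. The sketch below assumes the two archives unzip into `m4singer_hifigan/` and `m4singer_pe/` in the repository root. Those directory names are inferred from the zip file names and are not confirmed here, so adjust them if the unzip output differs. The ContentVec path matches the `-O` target used above, and the helper name `missing_components` is hypothetical.

```python
import os

# Paths expected after the download cell. The two directory names are an
# assumption based on the zip file names; the .pt path matches the gdown -O target.
EXPECTED_PATHS = [
    "m4singer_hifigan",
    "m4singer_pe",
    os.path.join("Content", "checkpoint_best_legacy_500.pt"),
]


def missing_components(root=".", expected=EXPECTED_PATHS):
    """Return the expected paths (relative to root) that do not exist."""
    return [p for p in expected if not os.path.exists(os.path.join(root, p))]


if __name__ == "__main__":
    missing = missing_components()
    if missing:
        print("❌ Missing components:", ", ".join(missing))
    else:
        print("✅ All pre-trained components are in place.")
```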

---
## 3. Dataset and Workspace Preparation

Instead of uploading files to temporary Colab storage, we will link directories in this Colab session to folders in your Google Drive. This is the key step that keeps your dataset, checkpoints, and results across sessions.
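
A symbolic link makes a Drive folder appear at a local path. Scripts that read `dataset_raw` or write to `logs` and `results` therefore use Drive without knowing it. The linking cell does this with `rm -r` and `ln -s`. The sketch below shows the same idea in plain Python with `os.symlink`, and the helper name `link_to_drive` is hypothetical. It checks `os.path.islink` first, so a dangling link left over from an earlier session is also replaced. This matters because `os.path.exists` returns `False` for a dangling link.

```python
import os
import shutil


def link_to_drive(target, link_name):
    """Point link_name at target, replacing whatever was at link_name.

    Hypothetical helper that mirrors the `rm -r` + `ln -s` steps of the linking cell.
    """
    if os.path.islink(link_name) or os.path.isfile(link_name):
        os.remove(link_name)  # removes only the link itself, never the Drive folder
    elif os.path.isdir(link_name):
        shutil.rmtree(link_name)  # a real local directory is deleted
    os.symlink(target, link_name, target_is_directory=True)
    return os.path.realpath(link_name)
```

For example, `link_to_drive("/content/drive/MyDrive/CoMoSVC_Logs", "logs")` would reproduce the checkpoint link, assuming the Drive folder already exists.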

### ⚠️ **Action Required Before Running the Next Cell**

1.  Open your Google Drive.
2.  Create a main folder for your dataset. For example, create a folder named `CoMoSVC_Dataset`.
3.  Inside that folder, create sub-folders for each singer (e.g., `my_singer`).
4.  Upload your audio files into the respective singer folders.
5.  In the form below, enter the path to the main folder you created (e.g., `CoMoSVC_Dataset`).
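
The expected layout is one sub-folder per singer, with each folder holding that singer's audio. Drive is already mounted from Section 1, so you can sanity-check the folder before linking it. The sketch below counts audio files per singer folder. The helper name and the set of accepted extensions are assumptions, so check which formats `preprocessing1_resample.py` actually reads.

```python
import os

# Assumed set of audio extensions; adjust to what the preprocessing scripts accept.
AUDIO_EXTENSIONS = {".wav", ".flac", ".mp3"}


def count_audio_per_singer(dataset_dir):
    """Map each singer sub-folder name to the number of audio files directly inside it."""
    counts = {}
    for singer in sorted(os.listdir(dataset_dir)):
        singer_dir = os.path.join(dataset_dir, singer)
        if not os.path.isdir(singer_dir):
            continue  # stray files at the top level are not singers
        counts[singer] = sum(
            1
            for name in os.listdir(singer_dir)
            if os.path.splitext(name)[1].lower() in AUDIO_EXTENSIONS
        )
    return counts
```

Calling `count_audio_per_singer("/content/drive/MyDrive/CoMoSVC_Dataset")` should list every singer with a non-zero count. A count of 0 usually means the files sit one folder level deeper or shallower than expected.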

In [None]:
#@title Link Google Drive Folders
import os

#@markdown --- 
#@markdown #### **1. Dataset Path**
#@markdown Enter the path to your dataset folder in Google Drive (relative to 'My Drive').
GOOGLE_DRIVE_DATASET_PATH = "CoMoSVC_Dataset" #@param {type:"string"}

#@markdown --- 
#@markdown #### **2. Workspace Paths (Recommended)**
#@markdown Enter folder names to save checkpoints and results in your Google Drive.
GOOGLE_DRIVE_LOGS_PATH = "CoMoSVC_Logs" #@param {type:"string"}
GOOGLE_DRIVE_RESULTS_PATH = "CoMoSVC_Results" #@param {type:"string"}

# --- Link Dataset Folder --- 
full_drive_path = os.path.join("/content/drive/MyDrive", GOOGLE_DRIVE_DATASET_PATH)
local_path = "dataset_raw"

if not os.path.exists(full_drive_path):
    print(f"❌ ERROR: The specified dataset path does not exist in your Google Drive: {full_drive_path}")
    print("Please create the folder and place your singer sub-folders inside it before proceeding.")
else:
    if os.path.lexists(local_path):  # lexists also catches a dangling link from an earlier session
        !rm -r {local_path}
    !ln -s "{full_drive_path}" {local_path}
    print(f"✅ Dataset folder linked successfully!")
    print(f"   '{local_path}' -> '{full_drive_path}'")

# --- Link Checkpoints (Logs) Folder --- 
full_logs_path = os.path.join("/content/drive/MyDrive", GOOGLE_DRIVE_LOGS_PATH)
local_logs_path = "logs"
!mkdir -p "{full_logs_path}"
if os.path.lexists(local_logs_path):  # lexists also catches a dangling link from an earlier session
    !rm -r {local_logs_path}
!ln -s "{full_logs_path}" {local_logs_path}
print(f"✅ Checkpoints (logs) folder linked successfully!")
print(f"   '{local_logs_path}' -> '{full_logs_path}'")

# --- Link Results Folder --- 
full_results_path = os.path.join("/content/drive/MyDrive", GOOGLE_DRIVE_RESULTS_PATH)
local_results_path = "results"
!mkdir -p "{full_results_path}"
if os.path.lexists(local_results_path):  # lexists also catches a dangling link from an earlier session
    !rm -r {local_results_path}
!ln -s "{full_results_path}" {local_results_path}
print(f"✅ Results folder linked successfully!")
print(f"   '{local_results_path}' -> '{full_results_path}'")

!mkdir -p dataset # Create local dataset folder for preprocessed files

--- 
## 4. Preprocessing

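This section processes the audio in your linked Google Drive folder (`dataset_raw`) and saves the extracted features to local Colab storage (`dataset`). `dataset` is not on Drive, so these files are lost when the runtime resets. Rerun this section in each new session before training.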

In [None]:
print("Step 1: Resampling audio to 24000Hz mono...")
!python preprocessing1_resample.py
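
After Step 1 finishes, you can spot-check that the output really is 24 kHz mono. The sketch below uses Python's standard `wave` module to read WAV headers. It assumes Step 1 writes standard PCM `.wav` files somewhere under `dataset`, because `wave` cannot read other formats or floating-point WAVs. The helper names are hypothetical.

```python
import glob
import os
import wave


def wav_format(path):
    """Return (sample_rate, channels) read from a PCM WAV header."""
    with wave.open(path, "rb") as wf:
        return wf.getframerate(), wf.getnchannels()


def find_mismatches(root, expected_rate=24000, expected_channels=1):
    """List (path, rate, channels) for WAV files under root that differ from expected."""
    bad = []
    for path in glob.glob(os.path.join(root, "**", "*.wav"), recursive=True):
        rate, channels = wav_format(path)
        if (rate, channels) != (expected_rate, expected_channels):
            bad.append((path, rate, channels))
    return bad
```

After resampling, `find_mismatches("dataset")` should return an empty list.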

In [None]:
print("\nStep 2: Creating file lists and configuration...")
!python preprocessing2_flist.py

In [None]:
print("\nStep 3: Generating features...")
!python preprocessing3_feature.py

---
## 5. Training

This is the main training phase. It can take a **very long time** and requires a GPU. The links created in Section 3 save every model checkpoint directly to the `CoMoSVC_Logs` folder in your Google Drive (or whatever name you chose), so checkpoints survive a Colab disconnect.
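
Every checkpoint is written to Drive, so a long run can consume a large share of your Drive storage quota. The hypothetical helper below is not part of the repository. It totals the size of the linked `logs` folder so you can keep an eye on it. `os.walk` follows the top-level `logs` link it is given, but by default it does not descend into symlinked subdirectories.

```python
import os


def folder_size_bytes(path):
    """Total size in bytes of all regular files below path."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            file_path = os.path.join(dirpath, name)
            if not os.path.islink(file_path):  # skip symlinked files to avoid double counting
                total += os.path.getsize(file_path)
    return total


def human_readable(num_bytes):
    """Format a byte count using binary units (B, KiB, MiB, GiB, TiB)."""
    size = float(num_bytes)
    for unit in ["B", "KiB", "MiB", "GiB"]:
        if size < 1024:
            return f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} TiB"
```

Running `print(human_readable(folder_size_bytes("logs")))` between training sessions shows how much space the checkpoints currently use.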

### 5.1. Train the Teacher Model

In [None]:
print("Starting Teacher Model training...")
!python train.py

### 5.2. Train the Consistency Model

In [None]:
print("Starting Consistency Model training...")
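import os
import re

teacher_log_dir = "logs/teacher"
teacher_model_path = ""


def checkpoint_step(name):
    """Return the last number in a checkpoint file name (e.g. 800000 for model_800000.pt)."""
    numbers = re.findall(r"\d+", name)
    return int(numbers[-1]) if numbers else -1


try:
    checkpoints = [f for f in os.listdir(teacher_log_dir) if f.endswith('.pt')]
    if not checkpoints:
        raise FileNotFoundError
    # Pick the highest step number; sorted() alone is alphabetical and would
    # rank "model_800000.pt" above "model_1000000.pt".
    latest_checkpoint = max(checkpoints, key=checkpoint_step)
    teacher_model_path = os.path.join(teacher_log_dir, latest_checkpoint)
except FileNotFoundError:
    print(f"ERROR: Could not find any teacher model checkpoints in '{teacher_log_dir}'. Please train the teacher model first.")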

if teacher_model_path:
    print(f"Using teacher model: {teacher_model_path}")
    config_path = "configs/config.yaml"
    !python train.py -t -c "{config_path}" -p "{teacher_model_path}"

---
## 6. Inference

Once training is complete, you can use your trained models to perform singing voice conversion. The final audio will be saved to your Google Drive.

### 6.1. Prepare Source Audio

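Place the source audio file you want to convert in the `raw` directory, for example by uploading it with the Colab file browser. As a demo, the cell below downloads the full LibriSpeech `test-clean` archive (several hundred megabytes). It then extracts one utterance and converts it to `raw/src.wav`. This sample is read speech rather than singing, so skip the cell if you already uploaded your own file.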
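
If you upload your own recording, you can first convert it to a 24 kHz mono WAV, similar to the `ffmpeg` step in the cell below. This keeps the input consistent with the resampled training data. The sketch below builds the `ffmpeg` argument list and runs it with `subprocess`. `-i`, `-ar`, `-ac`, and `-y` are standard ffmpeg options, and `-ac 1` forces a single channel. The helper names and the example file name `my_song.mp3` are hypothetical.

```python
import shutil
import subprocess


def ffmpeg_convert_cmd(src, dst, sample_rate=24000, channels=1):
    """Build an ffmpeg command that converts src to a WAV with the given rate and channels."""
    return [
        "ffmpeg", "-y", "-hide_banner", "-loglevel", "error",
        "-i", src,
        "-ar", str(sample_rate),  # output sample rate in Hz
        "-ac", str(channels),     # output channel count (1 = mono)
        dst,
    ]


def convert_for_inference(src, dst="raw/src.wav"):
    """Run the conversion; raises CalledProcessError if ffmpeg exits with an error."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg is not on PATH")
    subprocess.run(ffmpeg_convert_cmd(src, dst), check=True)
```

For example, `convert_for_inference("raw/my_song.mp3")` would overwrite `raw/src.wav` with the converted file.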

In [None]:
!mkdir -p raw

print("Downloading a sample source audio file...")
# Download the LibriSpeech test-clean archive and extract a single utterance from it
!wget -O raw/test-clean.tar.gz https://www.openslr.org/resources/12/test-clean.tar.gz
!tar -xzf raw/test-clean.tar.gz -C raw/ LibriSpeech/test-clean/1089/134686/1089-134686-0000.flac
# Convert the FLAC utterance to a 24 kHz mono WAV
!ffmpeg -i raw/LibriSpeech/test-clean/1089/134686/1089-134686-0000.flac -ar 24000 -ac 1 raw/src.wav -y -hide_banner -loglevel error
# Remove the archive and the extracted folder
!rm -rf raw/LibriSpeech raw/test-clean.tar.gz

print("Sample audio 'src.wav' is ready in the 'raw' directory.")

### 6.2. Run Inference

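Use the form below to configure inference. You **must** set `TARGET_SINGER` to the name of one of the singer folders you created in your Google Drive dataset folder. `INFERENCE_STEPS` sets the number of sampling steps. The consistency model is designed for one-step sampling, while the teacher model typically needs many more steps.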
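
The inference cell switches between two flag sets. The teacher model uses `-tm`/`-tc`, and the consistency model uses `-cm`/`-cc` plus `-t`. The hypothetical helper below is a sketch of the same logic. It builds the argument list for `inference_main.py` using only the flags that cell already passes.

```python
def build_inference_cmd(model_path, config_path, source, singer,
                        steps, pitch_shift=0, use_consistency=True):
    """Return an argv list for inference_main.py, using the flags from this notebook."""
    if use_consistency:
        model_flag, config_flag = "-cm", "-cc"
    else:
        model_flag, config_flag = "-tm", "-tc"
    cmd = [
        "python", "inference_main.py",
        "-ts", str(steps),
        model_flag, model_path,
        config_flag, config_path,
        "-n", source,
        "-k", str(pitch_shift),
        "-s", singer,
    ]
    if use_consistency:
        cmd.append("-t")  # marks the model as the consistency (student) model
    return cmd
```

Passing this list to `subprocess.run(cmd, check=True)` avoids shell-quoting problems with paths that contain spaces. It also raises `CalledProcessError` when the script exits with an error, which a `!` shell escape does not do.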

In [None]:
#@title Inference Configuration
#@markdown --- 
#@markdown #### **Required Settings**
TARGET_SINGER = "my_singer" #@param {type:"string"}
#@markdown --- 
#@markdown #### **Inference Parameters**
USE_CONSISTENCY_MODEL = True #@param {type:"boolean"}
# File name inside the raw/ directory (the original CoMoSVC README passes -n "src.wav").
SOURCE_AUDIO = "src.wav" #@param {type:"string"}
PITCH_SHIFT = 0 #@param {type:"slider", min:-12, max:12, step:1}
INFERENCE_STEPS = 5 #@param {type:"slider", min:1, max:50, step:1}

import os
import re


def checkpoint_step(name):
    """Return the last number in a checkpoint file name (e.g. 800000 for model_800000.pt)."""
    numbers = re.findall(r"\d+", name)
    return int(numbers[-1]) if numbers else -1


if USE_CONSISTENCY_MODEL:
    model_dir = "logs/como"
    model_flag = "-t"
    model_path_flag = "-cm"
    config_path_flag = "-cc"
    print("🎤 Using Consistency Model for inference.")
else:
    model_dir = "logs/teacher"
    model_flag = ""
    model_path_flag = "-tm"
    config_path_flag = "-tc"
    print("🧑‍🏫 Using Teacher Model for inference.")

try:
    checkpoints = [f for f in os.listdir(model_dir) if f.endswith('.pt')]
    if not checkpoints:
        raise FileNotFoundError
    # Pick the highest step number; sorted() alone is alphabetical.
    latest_checkpoint = max(checkpoints, key=checkpoint_step)
    model_path = os.path.join(model_dir, latest_checkpoint)
    config_path = os.path.join(model_dir, "config.yaml")

    print(f"Found model: {model_path}")

    # If inference_main.py fails, its error appears in the cell output;
    # a failing shell command does not raise a Python exception here.
    !python inference_main.py \
        -ts {INFERENCE_STEPS} \
        {model_path_flag} "{model_path}" \
        {config_path_flag} "{config_path}" \
        -n "{SOURCE_AUDIO}" \
        -k {PITCH_SHIFT} \
        -s "{TARGET_SINGER}" \
        {model_flag}

except FileNotFoundError:
    print(f"\n❌ ERROR: No models found in '{model_dir}'.")
    print("Please complete the training step (Section 5) before running inference.")
except Exception as e:
    print(f"An error occurred during inference: {e}")
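
`PITCH_SHIFT` is measured in semitones. In equal temperament, shifting by $k$ semitones multiplies every fundamental frequency by $2^{k/12}$. A shift of $+12$ therefore raises the voice by one octave (doubling F0), and $-12$ lowers it by one octave. The sketch below applies that ratio. It illustrates the standard semitone relation and is not taken from the CoMoSVC source; the name `shift_f0` is hypothetical.

```python
def shift_f0(f0_hz, semitones):
    """Scale a fundamental frequency by 2**(semitones / 12); 0 Hz (unvoiced) stays 0."""
    return f0_hz * 2 ** (semitones / 12)


# Show how a 220 Hz note moves for a few slider values.
for k in (-12, -5, 0, 7, 12):
    print(f"{k:+3d} semitones: 220 Hz -> {shift_f0(220.0, k):.1f} Hz")
```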


---
## 7. Listen to the Result

The converted audio is saved to the results folder in your Google Drive (`CoMoSVC_Results` by default). The cell below also displays an audio player for the most recent result.

In [None]:
import os
import glob
from IPython.display import Audio, display

try:
    list_of_files = glob.glob('results/*.wav')
    if not list_of_files:
        raise FileNotFoundError
    latest_file = max(list_of_files, key=os.path.getctime)
    print(f"🎶 Displaying result: {latest_file}")
    display(Audio(latest_file, rate=24000))
except (ValueError, FileNotFoundError):
    print("\n❌ Could not find any output files in the 'results' directory.")
    print("Please ensure the inference step (Section 6) completed successfully.")