<a href="https://colab.research.google.com/github/MLo7Ghinsan/MLo7-colab-notebook/blob/main/mlo7_Batch_Diff_SVC_Inference.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# _**[Diff-SVC](https://github.com/prophesier/diff-svc)**_
_Singing Voice Conversion via diffusion model_

\
____
\
#### **This notebook is maintained by [_MLo7_](https://github.com/MLo7Ghinsan)**
\
____
\\
#### **IMPORTANT NOTE:** Please run the cells one at a time, especially cells 1 and 2, since both of them restart the kernel. Once a cell has stopped running, you can proceed to the next one.


In [None]:
#@title # **1. Environment Setup**

#@markdown Run this cell first to mount Google Drive and set up the conda environment
from google.colab import drive
drive.mount('/content/drive')

!pip install -q condacolab
import condacolab

condacolab.install()

In [None]:
#@title # **2. Diff-svc and dependencies installation**

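# -y answers the confirmation prompt so the cell does not wait for input
!conda create -y -n diff-svc python=3.8
# Note: each `!` command runs in its own subshell, so this activation does not carry over to later lines
!source activate diff-svc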

from IPython.display import clear_output
from IPython.display import HTML
import os
import time

%cd /content

#@markdown The two repos differ only in the pitch extractor that is used when the use_crepe option is disabled.

#@markdown Official-Repo uses parselmouth and UtaUtaUtau-Repo uses harvest.

#@markdown parselmouth is faster but tends to make lots of pitch errors.
#@markdown harvest is slower but is supposedly more reliable.

#@markdown It doesn't matter which repo you pick if you are planning to use crepe.

diff_svc_repo = "Official-Repo" #@param ["Official-Repo", "UtaUtaUtau-Repo"]

if diff_svc_repo == "Official-Repo":
  !git clone https://github.com/prophesier/diff-svc
else:
  !git clone https://github.com/UtaUtaUtau/diff-svc
  !pip install pyworld

%cd /content/diff-svc

!pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html
!pip install -r requirements_short.txt
!pip install "tensorboard<2.9,>=2.8"
!pip install praat-parselmouth pyloudnorm resampy torchcrepe webrtcvad pycwt

clear_output()

%mkdir -p /content/diff-svc/checkpoints/
!wget https://github.com/MLo7Ghinsan/MLo7_Diff-SVC_models/releases/download/diff-svc-necessary-checkpoints/0102_xiaoma_pe.zip
!wget https://github.com/MLo7Ghinsan/MLo7_Diff-SVC_models/releases/download/diff-svc-necessary-checkpoints/0109_hifigan_bigpopcs_hop128.zip
!wget https://github.com/MLo7Ghinsan/MLo7_Diff-SVC_models/releases/download/diff-svc-necessary-checkpoints/nsf_hifigan.zip
!wget https://github.com/MLo7Ghinsan/MLo7_Diff-SVC_models/releases/download/diff-svc-necessary-checkpoints/hubert.zip
!mkdir /content/diff-svc/checkpoints/0102_xiaoma_pe
!mkdir /content/diff-svc/checkpoints/0109_hifigan_bigpopcs_hop128
!mkdir /content/diff-svc/checkpoints/nsf_hifigan
!mkdir /content/diff-svc/checkpoints/hubert
!unzip /content/0102_xiaoma_pe.zip -d /content/diff-svc/checkpoints/0102_xiaoma_pe
!unzip /content/0109_hifigan_bigpopcs_hop128.zip -d /content/diff-svc/checkpoints/0109_hifigan_bigpopcs_hop128
!unzip /content/nsf_hifigan.zip -d /content/diff-svc/checkpoints/nsf_hifigan
!unzip /content/hubert.zip -d /content/diff-svc/checkpoints/hubert

!rm /content/0102_xiaoma_pe.zip
!rm /content/0109_hifigan_bigpopcs_hop128.zip
!rm /content/nsf_hifigan.zip
!rm /content/hubert.zip

# -f keeps rm quiet if the sample files are not present in the chosen repo
!rm -f /content/diff-svc/results/test_output.wav
!rm -f /content/diff-svc/raw/test_input.wav

clear_output()

time.sleep(1)
print("Restarting runtime....")
time.sleep(1)
print("...")
time.sleep(1)
print("...")
time.sleep(1)
print("Please ignore the red mark at the running icon")
time.sleep(1)
print("|")
print("|")
print("|")
print("You may proceed to the next cell")
time.sleep(3)
os.kill(os.getpid(), 9)
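
Once the runtime has restarted, it can be worth confirming that the four checkpoint archives from cell 2 actually unpacked before moving on. The snippet below is only a sketch: `missing_checkpoints` is a hypothetical helper (not part of Diff-SVC), and it only checks that each folder exists and is not empty, not that the files inside are valid.

```python
from pathlib import Path

# Folder names taken from the unzip commands in cell 2.
EXPECTED_DIRS = [
    "0102_xiaoma_pe",
    "0109_hifigan_bigpopcs_hop128",
    "nsf_hifigan",
    "hubert",
]


def missing_checkpoints(root, names=EXPECTED_DIRS):
    """Return the expected checkpoint folders that are absent or empty under `root`."""
    root = Path(root)
    missing = []
    for name in names:
        folder = root / name
        # A folder counts as missing if it does not exist or contains nothing.
        if not folder.is_dir() or not any(folder.iterdir()):
            missing.append(name)
    return missing


missing = missing_checkpoints("/content/diff-svc/checkpoints")
if missing:
    print(f"Missing or empty checkpoint folders: {missing}")
else:
    print("All checkpoint folders are present")
```

If anything is reported as missing, re-running cell 2 is the simplest fix.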

In [None]:
#@title # **3. Load model**

#@markdown ### **Load a trained model**

import os

#@markdown ___

#@markdown Note: If you wish to use your own model, enter the full path to the most recent checkpoint on your Google Drive, the path to its config file, and the speaker's name.

#@markdown Example:

#@markdown The ``project_name`` will be the name of your speaker

#@markdown  ``model_path: /content/drive/MyDrive/Diff-SVC/checkpoints/model_name/model_ckpt_steps_50000.ckpt``

#@markdown           ``config_path: /content/drive/MyDrive/Diff-SVC/checkpoints/model_name/config.yaml``


#@markdown ___

#@markdown ### **Set model location with the name of the speaker:**

#@markdown ___

%cd "/content/diff-svc/"

os.environ['PYTHONPATH']='.'
os.environ['PATH'] = '/usr/bin/python3.8:' + os.environ['PATH']

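# A `!VAR=value` line only affects a throwaway subshell, so set the variable for this process instead
os.environ['CUDA_VISIBLE_DEVICES'] = '0'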


from utils.hparams import hparams
from preprocessing.data_gen_utils import get_pitch_parselmouth,get_pitch_crepe
import numpy as np
import matplotlib.pyplot as plt
import IPython.display as ipd
import utils
import librosa
import torchcrepe
from infer import *
import logging
from infer_tools.infer_tool import *

logging.getLogger('numba').setLevel(logging.WARNING)

# Project folder name, the one used during training
project_name = "model-name" #@param {type: "string"}
model_path = "path-to-model-ckpt" #@param {type: "string"}
config_path="path-to-config-yaml" #@param {type: "string"}
hubert_gpu=True
svc_model = Svc(project_name,config_path,hubert_gpu, model_path)
print('model loaded')
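# The kernel was restarted after cell 2, so use the IPython.display module imported above as `ipd`
ipd.display(ipd.HTML('<img src="https://cdn.discordapp.com/attachments/816517150175920138/1094544974395224114/c57.gif">'))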
print(".  .  .  .  .  .  .")
print("|")
print("|")
print("|")
print("sussy")
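
If the model fails to load, a mistyped path is a likely cause. The sketch below checks `model_path` and `config_path` before they are handed to `Svc`. `check_model_files` is a hypothetical helper, and the expected `.ckpt` and `.yaml` extensions are assumptions based on the example paths shown in this cell.

```python
import os


def check_model_files(model_path, config_path):
    """Return a list of problems found; an empty list means both paths look usable."""
    problems = []
    if not os.path.isfile(model_path):
        problems.append(f"model_path is not an existing file: {model_path}")
    elif not model_path.endswith(".ckpt"):
        problems.append(f"model_path does not end in .ckpt: {model_path}")
    if not os.path.isfile(config_path):
        problems.append(f"config_path is not an existing file: {config_path}")
    elif not config_path.endswith((".yaml", ".yml")):
        problems.append(f"config_path does not end in .yaml or .yml: {config_path}")
    return problems


# The placeholder values from the form are used here, so both checks report a problem.
for problem in check_model_files("path-to-model-ckpt", "path-to-config-yaml"):
    print("WARNING:", problem)
```

In the notebook you would pass the actual `model_path` and `config_path` variables and only load the model if the returned list is empty.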




In [None]:
#@title # **4. Input audio folder and adjust parameters**

#@markdown ### <b><font color="red"> Make sure that your input audio folder has no subfolders before running this cell</font></b>

#@markdown ___

#@markdown #### ``The folder containing your audio files. The default is the 'raw' folder under diff-svc. You can also point this to a folder on your Google Drive that contains the wavs``
folder = "raw" #@param {type: "string"}

#@markdown ___

#@markdown #### ``This shifts the raw audio by the given number of semitones before rendering. If the raw input is a male voice and the desired voice is female, you can enter 8 or 12, for example (12 shifts up a whole octave)``
key = 0 #@param {type:"slider", min:-12, max:12, step:1}
# Speedup factor
#@markdown ___

#@markdown #### ``The acceleration of the render: higher values render faster but give lower quality output, while lower values render more slowly but give higher quality output``

pndm_speedup = 10 #@param {type:"slider", min:1, max:100, step:1}

#@markdown ___

#@markdown #### ``This option is similar to the image-to-image function of AI art generation. If set to True, the output audio will be a mix of the input voice and the target voice; the proportion of each is set by the add_noise_step parameter below``

#@markdown #### ``NOTE!!!: If this parameter is set to True, keep the key parameter at 0, as rendering with a shifted pitch is not supported``
use_gt_mel= False #@param {type:"boolean"}

#@markdown #### ``"Morphing" intensity between the input audio and the model's voice: 0 is the raw input audio and 1000 is the model's voice``

add_noise_step = 1000 #@param {type:"slider", min:0, max:1000, step:25}

#@markdown ___

#@markdown #### ``Crepe is an F0 calculation algorithm. It's good but slow. Setting this to False switches the F0 calculation from crepe to parselmouth, which is faster than crepe but of lower quality``
use_crepe= True #@param {type:"boolean"}

#@markdown #### ``Crepe's noise filter threshold. You can increase the value if the raw audio is clean; if there is a lot of noise, keep or decrease the value. Setting use_crepe to False disables this parameter``
thre = 0.05 #@param {type:"slider", min:0, max:1, step:0.01}

#@markdown ___

#@markdown #### ``The use_pe option is not recommended, so it is only exposed in the cell's code``
use_pe= False


out_folder = "results"
os.makedirs(out_folder, exist_ok=True)  # make sure the output folder exists

# Convert every .wav file directly inside `folder` (subfolders are not scanned)
for name in sorted(os.listdir(folder)):
    if name.endswith(".wav"):
        wav_fn = os.path.join(folder, name)
        print(wav_fn)
        wav_gen = os.path.join(out_folder, name)
        f0_tst, f0_pred, audio = run_clip(svc_model,file_path=wav_fn, key=key, acc=pndm_speedup, use_crepe=use_crepe, use_pe=use_pe, thre=thre,
                                        use_gt_mel=use_gt_mel, add_noise_step=add_noise_step,project_name=project_name,out_path=wav_gen)
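
To get a feel for the `key` slider: in 12-tone equal temperament, shifting by n semitones multiplies the fundamental frequency by 2^(n/12). The sketch below just prints that multiplier for a few slider values. `semitone_ratio` is a hypothetical helper, not a Diff-SVC function, and it assumes the notebook's pitch shift follows this standard convention.

```python
def semitone_ratio(key: int) -> float:
    """Frequency multiplier for a shift of `key` semitones in 12-tone equal temperament."""
    return 2.0 ** (key / 12.0)


# A few slider values: an octave down, no shift, and the two values suggested above.
for key in (-12, 0, 8, 12):
    print(f"key={key:+d} -> pitch multiplied by {semitone_ratio(key):.3f}")
```

A value of 12 doubles the pitch (one octave up) and -12 halves it, which is why the slider range covers one octave in each direction.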


In [None]:
#@markdown # Zip up the results to your Drive (optional)

#@markdown ___

zip_exp_name = "" #@param {type:"string"}

!mkdir -p /content/drive/MyDrive/Diff-SVC-RESULTS
!zip -r "/content/drive/MyDrive/Diff-SVC-RESULTS/{zip_exp_name}.zip" "/content/diff-svc/results"
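
The same step can be done in Python, which makes it easy to fall back to a timestamped name when `zip_exp_name` is left empty. This is a sketch under assumptions: `zip_results` is a hypothetical helper, and unlike the `zip -r` command above it stores paths relative to the results folder instead of the full `content/diff-svc/results/...` path.

```python
import shutil
import time
from pathlib import Path


def zip_results(results_dir, dest_dir, name=""):
    """Zip `results_dir` into `dest_dir` and return the path of the archive."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    if not name.strip():
        # Fall back to a timestamped name so the archive is never just ".zip".
        name = time.strftime("results_%Y%m%d_%H%M%S")
    # make_archive appends the ".zip" extension to the base name itself.
    return shutil.make_archive(str(dest / name), "zip", root_dir=results_dir)


# Only runs where the notebook's results folder exists (for example inside Colab).
if Path("/content/diff-svc/results").is_dir():
    print(zip_results("/content/diff-svc/results",
                      "/content/drive/MyDrive/Diff-SVC-RESULTS"))
```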


In [None]:
#@markdown # Delete old inputted wav and rendered wav (optional)

#@markdown ___

#@markdown Run this cell if you want to redo the process. It deletes every .wav in the results folder and the batch_audio folder.
!rm -rf /content/diff-svc/batch_audio/*.wav
!rm -rf /content/diff-svc/results/*.wav