Import variables and functions, and construct the file structure

In [10]:
from resource_manager import get_data_from_source
from environment import output_path
from functions import separate_vocal, convert_ncm, apply_so_vits, fuse_vocal_and_instrumental, Path

Define paths

In [11]:
SONG_PATH = Path("./demo_assets/minstrel_short.mp3")
# Name the output folder after the song file, without its extension
OUT_PATH = output_path.joinpath(SONG_PATH.stem)
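
The output folder is named after the song file without its extension. A minimal sketch of two ways to derive that name: splitting the name on every dot keeps only the text before the first dot, while `Path.stem` drops only the final suffix. The helper names and the dotted file name are hypothetical and used only for illustration.

```python
from pathlib import Path


def name_by_split(path: Path) -> str:
    # Splits on every '.', so only the text before the first dot survives.
    return path.name.rsplit(".")[0]


def name_by_stem(path: Path) -> str:
    # Path.stem removes only the last suffix.
    return path.stem


simple = Path("./demo_assets/minstrel_short.mp3")
dotted = Path("./demo_assets/minstrel.live.mp3")  # hypothetical file name

print(name_by_split(simple), name_by_stem(simple))
print(name_by_split(dotted), name_by_stem(dotted))
```

Both approaches agree for `minstrel_short.mp3`; they differ only when the file name itself contains extra dots.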

Download the models from Hugging Face

In [12]:
model_so_vits_genshin = get_data_from_source("so-vits", "model", "genshin", update_cache=False)
model_so_vits_hololive = get_data_from_source("so-vits", "model", "hololive", update_cache=False)
model_demucs = get_data_from_source("demucs", "model", "hdemucs_mmi", update_cache=False)

downloading:  kaze-mio/so-vits-genshin


Fetching 28 files:   0%|          | 0/28 [00:00<?, ?it/s]

downloaded:  kaze-mio/so-vits-genshin
downloading:  megaaziib/hololivemix-so-vits-svc-4.0


Fetching 17 files:   0%|          | 0/17 [00:00<?, ?it/s]

downloaded:  megaaziib/hololivemix-so-vits-svc-4.0
hdemucs_mmi.yaml already exists, skipping
75fc33f5-1941ce65.th already exists, skipping
genshin: yaoyao/yaoyao_D_20000.pth
genshin: yaoyao/yaoyao_G_20000.pth
genshin: yaoyao/yaoyao.json
genshin: yaoyao/yaoyao_kmeans_10000.pt
genshin: hutao/hutao.json
genshin: hutao/hutao_kmeans_10000.pt
genshin: hutao/hutao_D_40000.pth
genshin: hutao/hutao_G_40000.pth
genshin: hutao-jp/hutao.json
genshin: hutao-jp/hutao_jp_D_40000.pth
genshin: hutao-jp/hutao_jp_G_40000.pth
genshin: hutao-jp/hutao_jp_kmeans_10000.pt
genshin: klee-jp/klee_jp_G_40000.pth
genshin: klee-jp/klee_jp_kmeans_10000.pt
genshin: klee-jp/klee_jp_D_40000.pth
genshin: klee-jp/klee.json
genshin: klee/klee_G_40000.pth
genshin: klee/klee_D_40000.pth
genshin: klee/klee.json
genshin: klee/klee_kmeans_10000.pt
genshin: nahida-jp/nahida_jp_kmeans_10000.pt
genshin: nahida-jp/nahida_jp_D_40000.pth
genshin: nahida-jp/nahida_jp_G_40000.pth
genshin: nahida-jp/nahida.json
genshin: nahida/nahida_k
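
The listing above suggests that each speaker folder holds a generator checkpoint (`*_G_*.pth`), a discriminator checkpoint (`*_D_*.pth`), a k-means cluster file, and a JSON config. Below is a minimal sketch of picking the three files that inference needs from such a listing. It assumes the object returned by `get_data_from_source` can be treated as a mapping from relative path to local path; the helper `pick_speaker_files` and the local paths are hypothetical.

```python
from pathlib import PurePosixPath


def pick_speaker_files(index: dict, folder: str) -> dict:
    """Collect generator checkpoint, cluster model and config for one speaker folder."""
    picked = {}
    for key in sorted(index):
        path = PurePosixPath(key)
        if path.parent.name != folder:
            continue  # file belongs to another speaker
        if path.suffix == ".pth" and "_G_" in path.name:
            picked["model_path"] = index[key]  # generator weights
        elif path.suffix == ".pt" and "kmeans" in path.name:
            picked["cluster"] = index[key]  # k-means cluster model
        elif path.suffix == ".json":
            picked["config_file_path"] = index[key]
    return picked


# Keys are copied from the listing; the values are made-up local paths.
index = {
    "hutao-jp/hutao.json": "/models/hutao-jp/hutao.json",
    "hutao-jp/hutao_jp_D_40000.pth": "/models/hutao-jp/hutao_jp_D_40000.pth",
    "hutao-jp/hutao_jp_G_40000.pth": "/models/hutao-jp/hutao_jp_G_40000.pth",
    "hutao-jp/hutao_jp_kmeans_10000.pt": "/models/hutao-jp/hutao_jp_kmeans_10000.pt",
    "hutao/hutao_G_40000.pth": "/models/hutao/hutao_G_40000.pth",
}

files = pick_speaker_files(index, "hutao-jp")
print(files)
```

The discriminator checkpoint is skipped on purpose, since only the generator is passed as `model_path` later in this notebook.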

Convert .ncm file to .wav

In [13]:
converted_path = convert_ncm(SONG_PATH, OUT_PATH)

Separate the vocals and the instrumental with Demucs

In [14]:
separated_path = separate_vocal(Path(converted_path), OUT_PATH)
print(separated_path)

Selected model is a bag of 1 models. You will see that many progress bars per track.
Separated tracks will be stored in /home/ayano/projects/vocal_generating_pack/src/vocalinferencegui/resources/files/output/minstrel_short/hdemucs_mmi
Separating track /home/ayano/projects/vocal_generating_pack/src/vocalinferencegui/backend/demo_assets/minstrel_short.mp3


100%|████████████████████████████████████████████████████████████████████████| 93.75/93.75 [00:02<00:00, 46.14seconds/s]


{'vocal': PosixPath('/home/ayano/projects/vocal_generating_pack/src/vocalinferencegui/resources/files/output/minstrel_short/hdemucs_mmi/minstrel_short/vocals.wav'), 'instrumental': PosixPath('/home/ayano/projects/vocal_generating_pack/src/vocalinferencegui/resources/files/output/minstrel_short/hdemucs_mmi/minstrel_short/no_vocals.wav')}
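
The printed dictionary follows the layout `<output folder>/<model name>/<track name>/`, with `vocals.wav` and `no_vocals.wav` inside, which matches the file names Demucs writes in two-stem mode. A minimal sketch of rebuilding those paths, assuming that layout holds; the helper `expected_stem_paths` and the relative folders are hypothetical, and the internals of `separate_vocal` are not shown in this notebook.

```python
from pathlib import Path


def expected_stem_paths(out_dir: Path, model_name: str, track: Path) -> dict:
    """Build the stem paths for one track, following the layout seen in the output above."""
    track_dir = out_dir / model_name / track.stem
    return {
        "vocal": track_dir / "vocals.wav",
        "instrumental": track_dir / "no_vocals.wav",
    }


paths = expected_stem_paths(
    Path("output/minstrel_short"),  # hypothetical relative output folder
    "hdemucs_mmi",
    Path("demo_assets/minstrel_short.mp3"),
)
print(paths["vocal"])
print(paths["instrumental"])
```

A helper like this can be handy for checking that both stems exist before the conversion step starts.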


Convert the vocal track with so-vits-svc

In [26]:

counterfeited_path = apply_so_vits(
    separated_path["vocal"],
    output_path=OUT_PATH,
    model_path=model_so_vits_genshin["hutao-jp/hutao_jp_G_40000.pth"],
    cluster=model_so_vits_genshin["hutao-jp/hutao_jp_kmeans_10000.pt"],
    config_file_path=model_so_vits_genshin["hutao-jp/hutao.json"],
    auto_predict_f0=False,
    speaker="hutao",
    db_threshold=0,
    chunk_seconds=40,
)


print(counterfeited_path)

model path:  /home/ayano/projects/vocal_generating_pack/src/vocalinferencegui/resources/files/models/so-vits/yuuka/G_97600.pth


Some weights of the model checkpoint at lengyue233/content-vec-best were not used when initializing HubertModelWithFinalProj: ['encoder.pos_conv_embed.conv.weight_g', 'encoder.pos_conv_embed.conv.weight_v']
- This IS expected if you are initializing HubertModelWithFinalProj from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing HubertModelWithFinalProj from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of HubertModelWithFinalProj were not initialized from the model checkpoint at lengyue233/content-vec-best and are newly initialized: ['encoder.pos_conv_embed.conv.parametrizations.weight.original0', 'encoder.pos_conv_embed.conv.parametrizations.weight.original1']
You should probably TRAIN this mo

/home/ayano/projects/vocal_generating_pack/src/vocalinferencegui/resources/files/output/minstrel_short/voice_generated_with_yuuka.wav
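
The call above passes `chunk_seconds=40`. A minimal sketch of what such a setting could mean, assuming it caps the length of each piece of audio processed at once; this reading is an assumption, since the notebook does not show how `apply_so_vits` slices its input. The helper `chunk_bounds` is hypothetical, and the 93.75 second duration is taken from the Demucs progress bar earlier in the notebook.

```python
def chunk_bounds(duration: float, chunk_seconds: float) -> list:
    """Return (start, end) times in seconds covering `duration` in fixed-size pieces."""
    if chunk_seconds <= 0:
        raise ValueError("chunk_seconds must be positive")
    bounds = []
    start = 0.0
    while start < duration:
        # The last chunk is shortened so it ends exactly at the track end.
        end = min(start + chunk_seconds, duration)
        bounds.append((start, end))
        start = end
    return bounds


bounds = chunk_bounds(93.75, 40)
print(bounds)
```

Under this assumption the demo track would be handled in three pieces, the last one shorter than the others.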


Merge the vocal and instrumental tracks

In [27]:
output = fuse_vocal_and_instrumental(
    vocal_path=counterfeited_path,
    instrumental_path=separated_path["instrumental"],
    output_path=OUT_PATH,
    speaker="yuuka",
)
print("output file:", output)

done
output file: /home/ayano/projects/vocal_generating_pack/src/vocalinferencegui/resources/files/output/minstrel_short/voice_generated_with_yuuka_counterfeited_from_hutao.wav
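
Merging two stems usually comes down to adding their samples. A minimal sketch of that idea on plain lists of float samples in the range [-1, 1], assuming both tracks share one sample rate and channel layout; `mix_tracks` is a hypothetical helper, and the actual behavior of `fuse_vocal_and_instrumental` is not shown in this notebook.

```python
def mix_tracks(vocal, instrumental, vocal_gain=1.0):
    """Sum two sample sequences, padding the shorter one with silence."""
    length = max(len(vocal), len(instrumental))
    vocal = list(vocal) + [0.0] * (length - len(vocal))
    instrumental = list(instrumental) + [0.0] * (length - len(instrumental))
    mixed = [v * vocal_gain + i for v, i in zip(vocal, instrumental)]
    # Scale the result down only if the sum would clip outside [-1, 1].
    peak = max((abs(s) for s in mixed), default=0.0)
    if peak > 1.0:
        mixed = [s / peak for s in mixed]
    return mixed


mixed = mix_tracks([0.5, -0.5, 0.25], [0.75, 0.25])
print(mixed)
```

Peak normalization is one simple way to avoid clipping; a limiter or a fixed gain reduction would be equally valid choices.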
