# Zeca Afonso

In this notebook we train a voice conversion model for Zeca Afonso.

The notebook works in three parts:
1. Preparing the training dataset
2. Training the voice conversion model
3. Inference with the voice conversion model

## 1. Preparing the training dataset

### 1.1 Extract voice from Zeca Afonso's discography

The first step is to extract the voice from the songs. We will use the [`demucs`](https://github.com/facebookresearch/demucs) Hybrid Transformer model. 

In [18]:
import subprocess
import tqdm
from pathlib import Path

src_root = Path("../dataset/zeca/discography")
out_root = "../dataset/zeca/discography_vocals"

# Separate the vocals of every .opus and .mp3 file; demucs fills in {stem} and {ext}.
for pattern in ("*.opus", "*.mp3"):
    for song in tqdm.tqdm(src_root.rglob(pattern)):
        # An argument list (no shell) avoids quoting problems with unusual file names.
        command = ["demucs", "--two-stems=vocals", str(song), "--out", out_root,
                   "--filename", f"{song.stem}_{{stem}}.{{ext}}"]
        run = subprocess.run(command, capture_output=True)

97it [5:11:00, 192.38s/it]
31it [1:44:48, 202.85s/it]
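Because the extraction cell captures the output of each `demucs` call and never inspects it, a failed separation would go unnoticed. The following is a minimal sketch of a sanity check, assuming the `{song.stem}_vocals.<ext>` naming used above; the helper name `find_missing_vocals` is hypothetical and not part of `demucs`.

```python
from pathlib import Path


def find_missing_vocals(src_root, out_root, patterns=("*.opus", "*.mp3")):
    """Return source songs with no '<stem>_vocals.*' file anywhere under out_root."""
    src_root, out_root = Path(src_root), Path(out_root)
    # Stems of all separated vocal files, wherever they were written under out_root.
    # The accompaniment files ('*_no_vocals.*') are skipped.
    done = {
        p.stem[: -len("_vocals")]
        for p in out_root.rglob("*_vocals.*")
        if not p.stem.endswith("_no_vocals")
    }
    missing = []
    for pattern in patterns:
        for song in sorted(src_root.rglob(pattern)):
            if song.stem not in done:
                missing.append(song)
    return missing


if __name__ == "__main__":
    # Paths are the ones used in this notebook.
    for song in find_missing_vocals("../dataset/zeca/discography",
                                    "../dataset/zeca/discography_vocals"):
        print(song)
```

Any path printed by this check can be re-run through `demucs` by hand, this time looking at `run.stderr` to see why it failed.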


### 1.2 Splitting vocal files into clips of about 10 seconds or less

To train the voice conversion model, we need to split the vocal files into short snippets of at most around 10 seconds.

First, let's see how much audio we have before splitting and silence removal.

In [24]:
import librosa

duration = 0
for song in tqdm.tqdm(Path("../dataset/zeca/discography_vocals").rglob("*.wav")):
    # demucs also writes the accompaniment as "*_no_vocals.wav"; count only the vocal stems.
    if "no_vocals" not in song.stem:
        duration += librosa.get_duration(path=song)
print(f"Total duration: {duration} seconds / ({duration/60} minutes) / ({duration/3600} hours)")

256it [00:00, 45743.70it/s]

Total duration: 22833.76160997731 seconds / (380.56269349962184 minutes) / (6.342711558327031 hours)
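The raw figures above are hard to read at a glance. As a small convenience, a duration in seconds can be formatted as hours, minutes and seconds; the helper name `format_duration` below is hypothetical and only illustrates the arithmetic.

```python
def format_duration(seconds):
    """Format a duration in seconds as H:MM:SS, rounded to the nearest second."""
    total = int(round(seconds))
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


if __name__ == "__main__":
    # Total reported by the cell above.
    print(format_duration(22833.76160997731))  # prints 6:20:34
```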





Now let's use [`audio-slicer`](https://github.com/flutydeer/audio-slicer) to split the files and remove the silent parts.

In [31]:
for song in tqdm.tqdm(Path("../dataset/zeca/discography_vocals").rglob("*.wav")):
    # Slice only the vocal stems, not the "*_no_vocals.wav" accompaniment.
    if "no_vocals" not in song.stem:
        command = ["python", "/home/andre/Repos/audio-slicer/slicer2.py",
                   "--out", "../dataset/zeca/discography_raw", str(song.absolute())]
        run = subprocess.run(command, capture_output=True)

256it [02:58,  1.44it/s]


In [32]:
import librosa

durations = []
for song in tqdm.tqdm(Path("../dataset/zeca/discography_raw").rglob("*.wav")):
    if "no_vocals" not in song.stem:
        durations.append(librosa.get_duration(path=song))
duration = sum(durations)
print(f"Total duration: {duration} seconds / ({duration/60} minutes) / ({duration/3600} hours)")

print(f"Percentage of clips longer than 10 seconds: {len([d for d in durations if d > 10]) / len(durations) * 100}%")

1889it [00:00, 25705.96it/s]

Total duration: 15934.58 seconds / (265.5763333333333 minutes) / (4.426272222222222 hours)
Percentage of clips longer than 10 seconds: 19.957649550026467%
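About a fifth of the clips are still longer than the 10-second target. One possible way to handle them (an assumption, not something this notebook does) is to cut each over-long clip into the fewest near-equal pieces that fit under the limit. The sketch below only computes the sample boundaries; the name `chunk_bounds` is hypothetical.

```python
import math


def chunk_bounds(n_samples, sample_rate, max_seconds=10.0):
    """Split [0, n_samples) into the fewest near-equal chunks of at most max_seconds."""
    max_len = int(max_seconds * sample_rate)
    if n_samples <= 0 or max_len <= 0:
        return []
    n_chunks = math.ceil(n_samples / max_len)
    # Spread the samples evenly so that no chunk ends up as a tiny leftover.
    edges = [round(i * n_samples / n_chunks) for i in range(n_chunks + 1)]
    return list(zip(edges[:-1], edges[1:]))


if __name__ == "__main__":
    # A 25-second clip at 44.1 kHz is cut into three chunks of roughly 8.3 seconds each.
    for start, end in chunk_bounds(25 * 44100, 44100):
        print(start, end, (end - start) / 44100)
```

With a waveform `y` loaded as an array (for example via `librosa.load`), each `(start, end)` pair selects `y[start:end]`. These are hard cuts at arbitrary positions, so they may fall in the middle of a sung note.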





In [1]:
! cd .. && svc pre-resample

[11:48:42] INFO     [11:48:42] generated new fontManager    font_manager.py:1581
Preprocessing:  60%|█████████████▊         | 1136/1889 [00:19<00:01, 638.96it/s]
[11:49:11] INFO     [11:49:11] Skip    preprocess_resample.py

In [2]:
! cd .. && svc pre-config

/proj/berzelius-2023-175/users/x_andaf/so-vits-svc-fork/.venv/lib/python3.11/site-packages/so_vits_svc_fork/preprocessing/preprocess
get_duration() keyword argument 'filename' has been renamed to 'path' in version 0.10.0.
        This alias will be removed in version 1.0.
  if get_duration(filename=

In [None]:
! cd .. && svc pre-hubert

## 2. Training the voice conversion model

In [None]:
! cd .. && svc train -t

[12:26:22] INFO     [12:26:22] Server binary (from Python package v0.7.2): None    server_ingester.py:290
[12:26:29] INFO     [12:26:29] Using strategy: auto    train.py