# **All of the following commands have to be executed in a terminal connected to a Runpod.io instance. This Colab notebook only showcases the code. Runpod instances run on Linux, so basic Linux knowledge is required.**

* Fork the audiocraft GitHub repo and comment out lines 478-487 in audiocraft/audiocraft/solvers/base.py.
* Then clone your forked repo into your Runpod.io instance (you can use my forked repo below directly and skip the forking step).
* Also install all dependencies.

In [None]:
!git clone https://github.com/Benediktherlt/audiocraft.git
%cd audiocraft
!pip install -e .
!pip install dora-search
!pip install jsonlines
!pip install -r requirements.txt



*   Then put all of your loose music chunks in Drive into a zip using `!zip -r ...`.
*   Afterwards use gdown to download the .jsonl files that contain the metadata and point to your music chunks.



In [None]:
#--- in Colab: zip the music chunks stored in Drive ---
from google.colab import drive
drive.mount('/content/drive')

%cd /content/drive/MyDrive/samples/train_datasets

#we want to exclude the dirs original and audiocraft since we only want to zip the music chunks, not those dirs
!zip -r /content/drive/MyDrive/samples/train_datasets/music_chunks.zip . -x "./original/*" "./audiocraft/*"

#--- on Runpod: unpack the music chunks ---
#music_chunks.zip has to be copied from Drive to /workspace/audiocraft/data first (for example with gdown, like the .jsonl files further down)
mkdir -p /workspace/audiocraft/data
cd /workspace/audiocraft/data

mkdir -p /workspace/dataset
cd /workspace/dataset

#unzip the music chunks; the -d flag sets the target folder, here /workspace/audiocraft/data
unzip /workspace/audiocraft/data/music_chunks.zip -d /workspace/audiocraft/data


!pip install gdown

mkdir -p /workspace/audiocraft/egs/train
mkdir -p /workspace/audiocraft/egs/eval

#you can find the file IDs in Drive by right-clicking on the file => copy link => paste the link into the search bar => cut the ID part from the link
gdown --id "YOUR FILE ID" -O /workspace/audiocraft/egs/train/
gdown --id "YOUR FILE ID" -O /workspace/audiocraft/egs/eval/
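
The manual "cut the ID part from the link" step can also be scripted. The sketch below assumes share links of the form `https://drive.google.com/file/d/<ID>/view?...` or `https://drive.google.com/open?id=<ID>`. `extract_drive_file_id` is a hypothetical helper name and not part of gdown, and the ID in the example is a made-up placeholder.

```python
import re
from urllib.parse import urlparse, parse_qs


def extract_drive_file_id(link: str) -> str:
    """Return the file ID contained in a Google Drive share link."""
    # Form 1 (assumed): https://drive.google.com/file/d/<ID>/view?usp=sharing
    match = re.search(r"/d/([A-Za-z0-9_-]+)", link)
    if match:
        return match.group(1)
    # Form 2 (assumed): https://drive.google.com/open?id=<ID>
    query = parse_qs(urlparse(link).query)
    if "id" in query:
        return query["id"][0]
    raise ValueError(f"no file ID found in link: {link}")


# The ID below is a placeholder, not a real file.
print(extract_drive_file_id("https://drive.google.com/file/d/1AbC_dEf-123/view?usp=sharing"))
```

The returned string is what replaces "YOUR FILE ID" in the gdown commands.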




*   A problem we face now is that the path of every JSON object in the .jsonl files still points to the "/content/drive/MyDrive/samples/train_datasets" folder, while in Runpod the files are located at "/workspace/audiocraft/egs/train" (and "./eval" respectively ;)).
*   So we have to use a script that iterates through the paths and replaces them with the new file path.




In [None]:
#create an update_paths.py file
vim update_paths.py

#insert the code below by first pressing "i", then pasting, then pressing "esc" (just in case you have never used vim before ;))
[
import json
import os
import re

def update_jsonl_paths(jsonl_path, new_base_path):
    """Point every old Drive path in a .jsonl manifest at new_base_path."""
    updated_entries = []
    path_regex = re.compile(r'/content/drive/MyDrive/samples/train_datasets/.*')

    with open(jsonl_path, 'r') as f:
        for line in f:
            entry = json.loads(line)
            #only rewrite entries that still point into the Drive folder
            if path_regex.match(entry['path']):
                filename = os.path.basename(entry['path'])
                new_path = os.path.join(new_base_path, filename)
                entry['path'] = new_path
            updated_entries.append(entry)

    #overwrite the manifest in place, one json-object per line
    with open(jsonl_path, 'w') as f:
        for entry in updated_entries:
            f.write(json.dumps(entry) + '\n')

train_jsonl_path = '/workspace/audiocraft/egs/train/data.jsonl'
eval_jsonl_path = '/workspace/audiocraft/egs/eval/data.jsonl'

#new_base_path has to be the folder that actually contains the audio files on the Runpod instance
new_base_path = '/workspace/audiocraft/config/dset/audio/'

update_jsonl_paths(train_jsonl_path, new_base_path)
update_jsonl_paths(eval_jsonl_path, new_base_path)
]

#save the file and quit vim
:wq

#run file
python3 update_paths.py
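
After rewriting the paths it is worth checking that every entry really points at an existing file, since a wrong base path only shows up later as a data-loading error. The following is a minimal sketch under the assumption that each line of the manifest is a JSON object with a `path` key, as in the script above. `find_missing_audio` is a hypothetical helper name.

```python
import json
import os


def find_missing_audio(jsonl_path):
    """Return every 'path' value in the manifest that does not exist on disk."""
    missing = []
    with open(jsonl_path, "r") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            entry = json.loads(line)
            if not os.path.isfile(entry["path"]):
                missing.append(entry["path"])
    return missing


if __name__ == "__main__":
    # Manifest locations taken from the steps above.
    for manifest in (
        "/workspace/audiocraft/egs/train/data.jsonl",
        "/workspace/audiocraft/egs/eval/data.jsonl",
    ):
        if os.path.isfile(manifest):
            print(manifest, "missing files:", len(find_missing_audio(manifest)))
        else:
            print(manifest, "not found")
```

If the list of missing files is not empty, adjust `new_base_path` in update_paths.py and run it again.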



*   Now we have to create the train.yaml at /workspace/audiocraft/config/dset/audio/.
*   It has to be in this place because otherwise the dora solver can't execute the finetune run.



In [None]:
#create a train.yaml file
cd /workspace/audiocraft/config/dset/audio/
vim train.yaml

#insert below code by first pressing "i" then pasting then pressing "esc"
[
# @package __global__
#same layout as config/dset/audio/example.yaml in the audiocraft repo

datasource:
  max_channels: 2
  max_sample_rate: 44100

  evaluate: /workspace/audiocraft/egs/eval
  generate: /workspace/audiocraft/egs/train
  train: /workspace/audiocraft/egs/train
  valid: /workspace/audiocraft/egs/eval

fsdp:
  use: true

autocast: false
]

#save the file and quit vim
:wq

Now you should be good to go. Execute the command below and enjoy ;)

In [None]:

#you can also create your own user in Runpod, but be aware that you then have to specify a new /workspace folder for that user
#(the comment sits on its own line because %env would otherwise store it as part of the value)
%env USER=root
command = (
    "dora -P audiocraft run -d "
    " solver=musicgen/musicgen_base_32khz"
    " model/lm/model_scale=large"
    " continue_from=//pretrained/facebook/musicgen-large"
    " conditioner=text2music"
    " dset=audio/train"
    " dataset.num_workers=2"
    " dataset.valid.num_samples=1"
    " dataset.batch_size=4"
    " schedule.cosine.warmup=8"
    " optim.optimizer=adamw"
    " optim.lr=1e-4"
    " optim.epochs=5"
    " optim.adam.weight_decay=0.01"
    " fsdp.use=true"
    " autocast=false"
)
!{command}
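
`%env` and `!{command}` only work inside a notebook. For a plain terminal, the same command can be assembled in a Python script. This is a sketch, not the author's method: `build_dora_command` is a hypothetical helper name, and the override values are copied from the command above as strings so that spellings like `1e-4` and `true` stay untouched.

```python
import shlex

# Overrides copied from the command above; edit them to match your own run.
overrides = {
    "solver": "musicgen/musicgen_base_32khz",
    "model/lm/model_scale": "large",
    "continue_from": "//pretrained/facebook/musicgen-large",
    "conditioner": "text2music",
    "dset": "audio/train",
    "dataset.num_workers": "2",
    "dataset.valid.num_samples": "1",
    "dataset.batch_size": "4",
    "schedule.cosine.warmup": "8",
    "optim.optimizer": "adamw",
    "optim.lr": "1e-4",
    "optim.epochs": "5",
    "optim.adam.weight_decay": "0.01",
    "fsdp.use": "true",
    "autocast": "false",
}


def build_dora_command(overrides):
    """Return the dora invocation as an argument list (nothing is executed)."""
    cmd = ["dora", "-P", "audiocraft", "run", "-d"]
    cmd += [f"{key}={value}" for key, value in overrides.items()]
    return cmd


cmd = build_dora_command(overrides)
# Print the command so it can be inspected or pasted into the terminal.
print(shlex.join(cmd))
```

To launch the run from the script instead of pasting the printed line, one option is `subprocess.run(cmd, check=True, env={**os.environ, "USER": "root"})`, executed from inside the /workspace/audiocraft folder.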


When your training run has finished, execute the code below in a Python file to access your finetuned weights.

In [None]:
#create an access_weights.py file
vim access_weights.py

#insert below code by first pressing "i" then pasting then pressing "esc"
[
import os

from audiocraft import train
from audiocraft.utils import export

#folder the exported weights are written to
folder_to_save_checkpoints_in = "/workspace/checkpoint"
os.makedirs(folder_to_save_checkpoints_in, exist_ok=True)

#pick the most recently modified experiment folder; its name is the dora signature
root_dir = "/tmp/audiocraft_root/xps/"
subfolders = [d for d in os.listdir(root_dir) if os.path.isdir(os.path.join(root_dir, d))]
joined_paths = [os.path.join(root_dir, subfolder) for subfolder in subfolders]
SIG = os.path.basename(max(joined_paths, key=os.path.getmtime))

xp = train.main.get_xp_from_sig(SIG)
#export the finetuned language model and the pretrained compression model
export.export_lm(xp.folder / 'checkpoint.th', os.path.join(folder_to_save_checkpoints_in, 'state_dict.bin'))
export.export_pretrained_compression_model('facebook/encodec_32khz', os.path.join(folder_to_save_checkpoints_in, 'compression_state_dict.bin'))
]

#save the file and quit vim
:wq

#run file
python3 access_weights.py
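
The script above selects the experiment by taking the most recently modified folder under the xps directory. The sketch below isolates that selection in a function that also fails with a clear message when no run exists yet. `latest_xp_sig` is a hypothetical helper name. Relying on the folder modification time is a heuristic, so if you have several runs it is safer to check the printed signature before exporting.

```python
import os


def latest_xp_sig(root_dir):
    """Return the name of the most recently modified subfolder of root_dir."""
    subfolders = [
        os.path.join(root_dir, name)
        for name in os.listdir(root_dir)
        if os.path.isdir(os.path.join(root_dir, name))
    ]
    if not subfolders:
        raise FileNotFoundError(f"no experiment folders in {root_dir}")
    # The folder name (not the full path) is used as the signature.
    return os.path.basename(max(subfolders, key=os.path.getmtime))


if __name__ == "__main__":
    root = "/tmp/audiocraft_root/xps/"  # location used in the script above
    if os.path.isdir(root):
        print("latest signature:", latest_xp_sig(root))
    else:
        print(root, "does not exist yet")
```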