We extend the MusicGen project, a text-to-music generation framework, with additional conditioning modalities such as music and video. This page provides instructions and tools to speed up getting started under this framework. If further instructions are required (e.g. installation and data preparation), please refer to the original documents located in the docs folder, starting with this page.
config/config.yaml: GPU and VRAM configuration (to be confirmed).
config/teams/default.yaml: global folder locations.
config/solver/musicgen/default.yaml: configures the evaluation metric environments and the dataset sample sizes.
config/solver/musicgen/musicgen_base_32khz.yaml: main training setup file; configures the dataloader batch size and number of workers, the generation and evaluation phase intervals, the evaluation metrics, the optimizer, the logging period, the checkpointing setup, etc.
config/conditioner/clipemb2music.yaml: MusicGen conditioner configuration; also controls the cross-attention positional encoding and classifier-free guidance.
config/dset/audio/ytcharts.yaml: music dataset configuration.
audiocraft/data/music_dataset.py: modified to implement the video/music dataloader.
audiocraft/models/builders.py: modified to build the model with the specified conditioner.
audiocraft/models/musicgen.py: configures the inference interface for the new video/music conditioner.
audiocraft/modules/conditioners.py: new conditioners are implemented here (see the sketch after this list).
audiocraft/modules/transformers.py: transformer modules; currently unaltered.
audiocraft/solvers/musicgen.py: modified to train only specific model layers after initialization.
audiocraft/utils/samples/manager.py: controls the naming of the generated samples.
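For orientation, here is a minimal sketch of what an embedding-based conditioner could look like; the class name ClipEmbConditioner, the clip_dim value, and the stand-alone nn.Module form are assumptions for illustration. The real implementation lives in audiocraft/modules/conditioners.py (subclassing the base conditioner there) and is wired up in audiocraft/models/builders.py.
# A rough sketch (not the actual implementation) of an embedding conditioner
# that projects precomputed CLIP features to the transformer dimension.
# In the real code this would subclass the base conditioner in
# audiocraft/modules/conditioners.py and be registered in builders.py;
# the names ClipEmbConditioner and clip_dim are hypothetical.
import torch
from torch import nn

class ClipEmbConditioner(nn.Module):
    def __init__(self, clip_dim: int = 768, output_dim: int = 1024):
        super().__init__()
        # Linear projection from the CLIP feature size to the LM's hidden size.
        self.output_proj = nn.Linear(clip_dim, output_dim)

    def tokenize(self, clip_features: torch.Tensor) -> torch.Tensor:
        # The framework first "tokenizes" raw conditioning inputs; here the
        # inputs are already dense embeddings, so this is a pass-through.
        return clip_features

    def forward(self, clip_features: torch.Tensor):
        # Returns the projected condition and a mask of valid frames,
        # mirroring the (embedding, mask) pair used by cross-attention.
        cond = self.output_proj(clip_features)  # [B, T, output_dim]
        mask = torch.ones(cond.shape[:2], device=cond.device, dtype=torch.long)
        return cond, mask

if __name__ == "__main__":
    conditioner = ClipEmbConditioner()
    feats = torch.randn(2, 30, 768)  # e.g. 30 video frames of CLIP ViT-L/14 features
    cond, mask = conditioner(conditioner.tokenize(feats))
    print(cond.shape, mask.shape)  # torch.Size([2, 30, 1024]) torch.Size([2, 30])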
The FAD evaluation is handled by Google's FAD project in a separate environment, which requires extra setup to work properly.
Please follow only the dependency installation step (step 2) of the guidelines; the first step (modifying the source code) has already been completed and is included in this project.
conda install -c conda-forge cudatoolkit=11.8.0
pip install nvidia-cudnn-cu11==8.6.0.163 tensorflow==2.12.*
pip install apache-beam numpy scipy tf_slim
Then download the model checkpoint and run the test from the original repo to validate that FAD works; you will need to point the --model_ckpt option to the local VGGish checkpoint.
Configure the following environment variables in the audiocraft environment and re-activate it to enable FAD evaluation during training (remember to move the VGGish checkpoint to the default 'logs/fad/' folder); a quick sanity check follows the commands below.
# refer to: conda env config vars set XXX=XXX
conda env config vars set TF_PYTHON_EXE=/home/od/miniconda3/envs/fad/bin/python `# run 'which python' inside the fad environment to find this path`
conda env config vars set TF_LIBRARY_PATH=/home/od/miniconda3/envs/fad/lib/python3.9/site-packages/nvidia/cudnn/lib `# similarly, locate the NVIDIA cuDNN library inside the fad environment`
conda env config vars set TF_FORCE_GPU_ALLOW_GROWTH=true
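A quick sanity check from inside the re-activated audiocraft environment; the paths above are examples from one machine, so verify that your TF_PYTHON_EXE points at a working interpreter:
# Quick sanity check: confirm the fad environment's interpreter is reachable
# and can import TensorFlow. Run this inside the (re-activated) audiocraft env.
import os
import subprocess

tf_python = os.environ.get("TF_PYTHON_EXE")
assert tf_python, "TF_PYTHON_EXE is not set; re-activate the audiocraft environment"
subprocess.run(
    [tf_python, "-c", "import tensorflow as tf; print(tf.__version__)"],
    check=True,
)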
We repeat the following pipeline to train the model and evaluate the generated samples.
# run
dora run -d \
solver=musicgen/musicgen_base_32khz `# base config file`\
model/lm/model_scale=small `# the scale of musicgen model, in [small,medium,large]`\
continue_from=/scratch4/users/od/repos/audiocraft/small_syno_L14_30F.pt `# the checkpoint modified for custom conditioner`\
conditioner=clipemb2music `# the conditioner config file`\
dset=audio/ytcharts `# the dataset to train & evaluate on`
# After training, the model checkpoints are located at 'logs/xps/2e5708a4/'. We need to convert the checkpoint into dedicated model weights for inference. Configure and run the od/export.py script to export the model; the exported weights are saved to './checkpoints'.
python -m od.export
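od/export.py is project-specific, but it follows the same idea as the export recipe in the upstream MusicGen docs: take the training checkpoint of an XP signature and write the language-model and compression-model state dicts used for inference. The sketch below relies on audiocraft's export utilities; the signature '2e5708a4' and the output folder mirror the example above and should be adjusted to your own run.
# Rough sketch of the export step, based on the upstream audiocraft export
# utilities; adjust the XP signature and output paths to your run.
from pathlib import Path

from audiocraft import train
from audiocraft.utils import export

SIG = "2e5708a4"                 # XP signature, see logs/xps/
out_dir = Path("./checkpoints")
out_dir.mkdir(exist_ok=True)

xp = train.main.get_xp_from_sig(SIG)
# Export the trained language model (the part that holds the custom conditioner).
export.export_lm(xp.folder / "checkpoint.th", out_dir / "state_dict.bin")
# Export the pretrained 32 kHz EnCodec compression model alongside it.
export.export_pretrained_compression_model(
    "facebook/encodec_32khz", out_dir / "compression_state_dict.bin")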
# Set up the generation script (e.g. od/clip.py) to create videos with generated music.
python -m od.clip `# The samples are located at 'samples/0601_120000/*.mp4'`
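od/clip.py is project-specific, but conceptually it loads the exported weights, generates music, and muxes the audio with the source video into an mp4. Below is a heavily simplified sketch using the standard MusicGen inference API; the prompt, output paths, and the way the custom video conditioner receives its per-frame features are assumptions and are handled differently inside od/clip.py.
# Heavily simplified sketch of the generation step; the video-conditioning
# plumbing of od/clip.py is omitted and depends on the custom conditioner.
from audiocraft.data.audio import audio_write
from audiocraft.models import MusicGen

# Load the exported weights from ./checkpoints (see the export step above).
model = MusicGen.get_pretrained("./checkpoints")
model.set_generation_params(duration=30)

# With the stock text conditioner this is a plain text prompt; the video
# conditioner would instead be fed per-frame CLIP features.
wavs = model.generate(["energetic synth pop with driving drums"])

for i, wav in enumerate(wavs):
    # Writes samples/clip_{i}.wav; muxing with the source video (e.g. via
    # ffmpeg) happens separately in od/clip.py.
    audio_write(f"samples/clip_{i}", wav.cpu(), model.sample_rate, strategy="loudness")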
# If the conditioner has a different feature size than the default MusicGen model, we need to re-create a checkpoint for the new feature size. Use od/make_ckpt.py to do this.
python -m od.make_ckpt `# This will create a checkpoint in the root folder`
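What od/make_ckpt.py does exactly is project-specific; the general idea is to take a pretrained MusicGen checkpoint, drop the weights whose shapes depend on the old conditioner's feature size, and save the result so those layers are freshly initialized for the new conditioner. A loosely sketched example follows; the filenames and the 'condition_provider' key filter are assumptions, not the actual script:
# Hedged sketch of re-creating a checkpoint for a new conditioner feature size.
# The filenames and the substring used to select conditioner-dependent keys
# are assumptions; inspect your state dict to pick the right ones.
import torch

ckpt = torch.load("small.pt", map_location="cpu")   # source checkpoint (assumed name)
state = ckpt.get("best_state", ckpt)

# Drop parameters tied to the old conditioner's feature size so they are
# re-initialized when the model is built with the new conditioner.
drop = [k for k in state if "condition_provider" in k]
for k in drop:
    del state[k]

torch.save(ckpt, "small_new_conditioner.pt")         # checkpoint for continue_from=...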
# The structure of the dataset folder is:
./dataset/ytcharts/{train,test}/{ytid.mp3, ytid.mp4, ytid.json}
# Create the egs manifests referenced by the dataset config file (see the manifest check after the two commands below).
python -m audiocraft.data.audio_dataset dataset/ytcharts/train egs/ytcharts/train.jsonl
python -m audiocraft.data.audio_dataset dataset/ytcharts/test egs/ytcharts/test.jsonl
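Each line of the resulting .jsonl manifest is a JSON record describing one audio file. A quick way to inspect it; the field names follow the upstream audio_dataset format and should be double-checked against the generated file:
# Inspect the generated manifest; each line is a JSON record describing one
# track (typically path, duration, sample_rate, amplitude, ... in upstream
# audiocraft -- verify against your own egs output).
import json

with open("egs/ytcharts/train.jsonl") as f:
    for line in list(f)[:3]:
        meta = json.loads(line)
        print(meta.get("path"), meta.get("duration"), meta.get("sample_rate"))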
# Run tensorboard to visualize the training process
tensorboard --logdir=logs/xps/2e5708a4/tensorboard
# Locate the generated samples in the samples folder
cd logs/xps/2e5708a4/samples
# Download VGGISH checkpoint to logs/fad/
curl -o logs/fad/vggish_model.ckpt https://storage.googleapis.com/audioset/vggish_model.ckpt
# Download CLAP checkpoint to logs/clap/
curl -L -o logs/clap/music_audioset_epoch_15_esc_90.14.pt https://huggingface.co/lukewys/laion_clap/resolve/main/music_audioset_epoch_15_esc_90.14.pt