A PyTorch implementation of AudioLM. The project is in its early stages and nothing is runnable yet.
- Check for existing implementations of w2v-BERT
- No complete implementation found, but lucidrains is working on an implementation of AudioLM, which might be a useful reference later
- Check for existing implementations of soundstream
- Have a look at audio-diffusion-pytorch and see if there is anything useful there
- There is some good dataset info here. YoutubeDataset in particular sounds interesting and potentially useful
- Implement w2v-BERT
- Implement w2v-BERT network
- Check experimental setup in this paper, which matches w2v-BERT
- Implement feature encoder
- Implement contrastive module
- Implement conformer block
- Should be able to use torchaudio.models.Conformer directly
- Implement masked prediction module
- Implement masked prediction loss
- Implement contrastive loss
- Implement w2v-BERT data module
- Implement w2v-BERT training
- Implement soundstream
- Implement soundstream network
- Implement soundstream data module
- Implement soundstream training
- Implement AudioLM
- Implement AudioLM network
- Implement AudioLM data module
- Implement AudioLM training
- Train on LibriSpeech (version available in torchaudio)
- Train on music dataset
Install dependencies
# clone project
git clone https://github.com/YourGithubName/your-repo-name
cd your-repo-name
# [OPTIONAL] create conda environment
conda create -n myenv python=3.9
conda activate myenv
# install pytorch according to instructions
# https://pytorch.org/get-started/
# install requirements
pip install -r requirements.txt
Train model with default configuration
# train on CPU
python src/train.py trainer=cpu
# train on GPU
python src/train.py trainer=gpu
Train model with chosen experiment configuration from configs/experiment/
python src/train.py experiment=experiment_name.yaml
You can override any parameter from the command line like this
python src/train.py trainer.max_epochs=20 datamodule.batch_size=64