Solution for Zalo AI Challenge 2022 - Lyrics Alignment

Requirements

pip install -r requirements.txt

Using Demucs to separate the music and the vocals in the original audio.
Resampling the original audio to 16 kHz.
Creating new vocab dictionary for Wav2Vec2.
Randomly selecting segments from the labels and merging them to create new audio/lyric pairs.
Fine-tuning the Wav2Vec2 model with the original CTC loss on all of the training data, using the new vocab dictionary.
Using forced alignment (dynamic programming) to find the best alignment path between the audio and the lyrics.
Merging character durations to obtain the word segment boundaries in the audio.
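
The last two steps can be illustrated with a minimal, dependency-free sketch. It is not the code in alignment.py: the function names, the "|" word separator, the toy vocabulary and the 20 ms frame length are all assumptions made for illustration. It runs a Viterbi search over the blank-interleaved CTC state sequence, then merges the per-character frame spans into word timings.

```python
import math

NEG_INF = -math.inf

def ctc_forced_align(log_probs, tokens, blank=0):
    """Viterbi alignment: for each frame, the index into `tokens` or None for blank."""
    T = len(log_probs)
    ext = [blank]                      # blank-interleaved states: _ t0 _ t1 _ ...
    for tok in tokens:
        ext += [tok, blank]
    S = len(ext)
    score = [[NEG_INF] * S for _ in range(T)]
    back = [[0] * S for _ in range(T)]
    score[0][0] = log_probs[0][ext[0]]
    if S > 1:
        score[0][1] = log_probs[0][ext[1]]
    for t in range(1, T):
        for s in range(S):
            cands = [(score[t - 1][s], s)]                 # stay in the same state
            if s >= 1:
                cands.append((score[t - 1][s - 1], s - 1))  # advance by one state
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append((score[t - 1][s - 2], s - 2))  # skip a blank
            best, prev = max(cands)
            if best > NEG_INF:
                score[t][s] = best + log_probs[t][ext[s]]
                back[t][s] = prev
    # A valid path ends in the final blank or the final token.
    s = S - 1
    if S > 1 and score[T - 1][S - 2] > score[T - 1][S - 1]:
        s = S - 2
    if score[T - 1][s] == NEG_INF:
        raise ValueError("not enough frames to align all tokens")
    path = [None] * T
    for t in range(T - 1, -1, -1):
        path[t] = (s - 1) // 2 if s % 2 == 1 else None
        s = back[t][s]
    return path

def char_spans(path):
    """Collapse the frame path into [token_index, start_frame, end_frame_exclusive]."""
    spans = []
    for t, idx in enumerate(path):
        if idx is None:
            continue
        if spans and spans[-1][0] == idx:
            spans[-1][2] = t + 1
        else:
            spans.append([idx, t, t + 1])
    return spans

def word_segments(spans, chars, sep="|", frame_ms=20):
    """Merge character spans into words; frame_ms is an assumed frame length."""
    words, current = [], None
    for idx, start, end in spans:
        ch = chars[idx]
        if ch == sep:                  # separator closes the current word
            if current:
                words.append(current)
                current = None
            continue
        if current is None:
            current = {"word": ch, "start_ms": start * frame_ms, "end_ms": end * frame_ms}
        else:
            current["word"] += ch
            current["end_ms"] = end * frame_ms
    if current:
        words.append(current)
    return words

def toy_frame(favoured, vocab_size=4, p=0.9):
    """Log-probabilities for one frame that strongly favours a single token id."""
    rest = (1 - p) / (vocab_size - 1)
    return [math.log(p if i == favoured else rest) for i in range(vocab_size)]

# Toy vocabulary (hypothetical): 0 = blank, 1 = "a", 2 = "b", 3 = "|"
vocab = {"a": 1, "b": 2, "|": 3}
chars = "ab|a"
tokens = [vocab[c] for c in chars]
emissions = [toy_frame(k) for k in (1, 1, 2, 3, 0, 1)]  # six frames

path = ctc_forced_align(emissions, tokens)
words = word_segments(char_spans(path), chars)
print(words)
```

In a real pipeline the emissions would be the frame-level log-probabilities produced by the fine-tuned Wav2Vec2 model, and the frame length would be derived from the audio duration and the number of output frames rather than hard-coded.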

Download data here and prepare a dataset in the following format:

|- data/
|   |- public_test/
|       |- lyrics/
|       |- new_labels_json/
|       |- songs/
|   |- train/
|       |- labels/
|       |- songs/
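
Before running the pipeline, it may help to confirm that the folders are in place. The following is a small sketch, not part of the repository; the helper name `missing_dirs` and the default `data` root are assumptions, while the folder names are taken from the layout above.

```python
from pathlib import Path

# Sub-directories listed in the layout above, relative to the data root.
EXPECTED_DIRS = [
    "public_test/lyrics",
    "public_test/new_labels_json",
    "public_test/songs",
    "train/labels",
    "train/songs",
]

def missing_dirs(data_root="data"):
    """Return the expected sub-directories that do not exist under data_root."""
    root = Path(data_root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]

if __name__ == "__main__":
    missing = missing_dirs()
    if missing:
        print("Missing directories:", ", ".join(missing))
    else:
        print("Dataset layout looks complete.")
```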

sh reproduce.sh

You can also download and extract our checkpoints here, which gives the following layout:

|- checkpoints/
|   |- dragonSwing/
|       |- wav2vec2-base-vietnamese/
|           |- checkpoint-5500/
|               |- pytorch_model.bin

python submission.py submission --saved_path ./result
zip -r submit.zip result/*.json
