Reminder: this repository also supports the original SpeechCLIP usage (e.g. checkpoint loading, training, and testing).
Links: [arXiv](https://arxiv.org/abs/2402.06959)
Hsuan-Fu Wang, Yi-Jen Shih, Heng-Jui Chang
```
pip install -r requirements.txt
```
### Checkpoints
```
bash download_ckpts.sh
```
You should see `Done downloading all checkpoints` after the script finishes.
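As a sanity check, one could verify that the expected files are present before training. The helper below is a hypothetical sketch: the checkpoint filenames are placeholders, not the actual names produced by `download_ckpts.sh`.

```python
from pathlib import Path

# Placeholder names -- NOT the actual files shipped by download_ckpts.sh.
EXPECTED = ["parallel_base.ckpt", "cascaded_base.ckpt"]

def missing_checkpoints(ckpt_dir: str, expected=EXPECTED) -> list:
    """Return the expected checkpoint files not present in ckpt_dir."""
    root = Path(ckpt_dir)
    return [name for name in expected if not (root / name).is_file()]
```

If the returned list is non-empty, re-run the download script for the missing files.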
Note that training requires 2 GPUs for base models and 4 GPUs for large models.
Remember to check the `dataset_root` in the config files before training.
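Since `dataset_root` has to be adjusted per machine, a small helper like the one below could patch it in place. This is a sketch under the assumption that the configs are YAML files with a plain `dataset_root: <path>` line; the key name comes from this README, while the function name and file layout are assumptions.

```python
import re
from pathlib import Path

def set_dataset_root(config_path: str, new_root: str) -> None:
    """Point the `dataset_root:` entry of a YAML config at new_root, in place.

    Assumes a plain `dataset_root: <path>` line; the key name comes from
    the README, everything else here is an assumption.
    """
    p = Path(config_path)
    text = re.sub(
        r"(?m)^(\s*dataset_root:\s*).*$",  # match the dataset_root line
        lambda m: m.group(1) + new_root,   # keep key/indentation, swap path
        p.read_text(),
    )
    p.write_text(text)
```

A lambda replacement is used so that backslashes in the new path are not interpreted as regex escapes.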
### Train
Example: train Cascaded SpeechCLIP+ base:
```
bash egs/speechCLIP+/model_base/cascaded+/train.sh
```
### Test
Example: test Cascaded SpeechCLIP+ base (using the pretrained checkpoint):
```
bash egs/speechCLIP+/model_base/cascaded+/test.sh
```
For more settings, please see the folders in `./egs/`.
To load a pretrained SpeechCLIP or SpeechCLIP+ model, see `example.py`.
```
@article{wang2024speechclip+,
  title={SpeechCLIP+: Self-supervised multi-task representation learning for speech via CLIP and speech-image data},
  author={Wang, Hsuan-Fu and Shih, Yi-Jen and Chang, Heng-Jui and Berry, Layne and Peng, Puyuan and Lee, Hung-yi and Wang, Hsin-Min and Harwath, David},
  journal={arXiv preprint arXiv:2402.06959},
  year={2024}
}
```
- Release the keyword evaluation code (urgent!).
- Clean up the comments in the config files.
Please run the autoformatter in `./dev-support/` before opening a PR!