SpeechCLIP+: Self-supervised multi-task representation learning for speech via CLIP and speech-image data. Accepted to ICASSP 2024, Self-supervision in Audio, Speech, and Beyond (SASB) workshop.

SpeechCLIP+

Reminder: this repository also supports the original SpeechCLIP usage (e.g., loading checkpoints, training, and testing).

Left: Hybrid SpeechCLIP
Right: Cascaded and Hybrid SpeechCLIP+

Links: arXiv (arXiv:2402.06959)

Code Contributors

Hsuan-Fu Wang, Yi-Jen Shih, Heng-Jui Chang

Prerequisites

Install packages

pip install -r requirements.txt

Data Preparation

See Details

Download Pretrained Checkpoints

bash download_ckpts.sh

You should see "Done downloading all checkpoints" once the script finishes.

Note that training requires 2 GPUs for base models and 4 GPUs for large models.
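As a rough pre-flight check, the GPU requirement above can be verified before launching a training script. The sketch below is not part of this repository: the function names `required_gpus`, `available_gpus`, and `check_gpus` are hypothetical, and it assumes NVIDIA GPUs with `nvidia-smi` on the PATH. Only the 2/4 GPU figures come from this README.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not shipped with this repo) for checking GPU counts.

# Print the number of GPUs needed for a model size, per the note above.
required_gpus() {
  case "$1" in
    base)  echo 2 ;;
    large) echo 4 ;;
    *)     echo "unknown model size: $1" >&2; return 1 ;;
  esac
}

# Print the number of GPUs listed by nvidia-smi, or 0 if it is unavailable.
available_gpus() {
  if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi -L 2>/dev/null | grep -c '^GPU ' || true
  else
    echo 0
  fi
}

# check_gpus <base|large> [gpu_count]
# The optional second argument overrides the detected count (assumption:
# handy on machines where detection is not possible).
check_gpus() {
  local need have
  need=$(required_gpus "$1") || return 1
  have=${2:-$(available_gpus)}
  if [ "$have" -ge "$need" ]; then
    echo "ok: $have GPU(s) available, $need required for $1 models"
  else
    echo "insufficient: $have GPU(s) available, $need required for $1 models" >&2
    return 1
  fi
}

# Example usage:
#   check_gpus base && bash egs/speechCLIP+/model_base/cascaded+/train.sh
```

Chaining the check with `&&` means the training script only starts when enough GPUs are visible.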

Usage

Remember to check the dataset_root before running.

Train

Example: train Cascaded SpeechCLIP+ base:

bash egs/speechCLIP+/model_base/cascaded+/train.sh

Inference

Example: test Cascaded SpeechCLIP+ base (using the pretrained checkpoint):

bash egs/speechCLIP+/model_base/cascaded+/test.sh

For more settings, please see the folders in ./egs/.

Getting embeddings from SpeechCLIP or SpeechCLIP+

See example.py

Citation

@article{wang2024speechclip+,
  title={SpeechCLIP+: Self-supervised multi-task representation learning for speech via CLIP and speech-image data},
  author={Wang, Hsuan-Fu and Shih, Yi-Jen and Chang, Heng-Jui and Berry, Layne and Peng, Puyuan and Lee, Hung-yi and Wang, Hsin-Min and Harwath, David},
  journal={arXiv preprint arXiv:2402.06959},
  year={2024}
}

TBD

  • Release the keyword evaluation code (urgent!).
  • Clean up the comments in the config files.

Contribute

Please run the autoformatter before opening a PR! See ./dev-support/.
