Reminder: this repository also supports the original SpeechCLIP usage (e.g. checkpoint loading, training, and testing).
Links: [arXiv](https://arxiv.org/abs/2402.06959)
Hsuan-Fu Wang, Yi-Jen Shih, Heng-Jui Chang
```
pip install -r requirements.txt
```
### Checkpoints
```
bash download_ckpts.sh
```
You should see `Done downloading all checkpoints` after the script finishes.
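As a sanity check, one could verify that the expected files are present before training. The helper below is a hypothetical sketch: the checkpoint filenames are placeholders, not the actual names produced by `download_ckpts.sh`.

```python
from pathlib import Path

# Placeholder names -- NOT the actual files shipped by download_ckpts.sh.
EXPECTED = ["parallel_base.ckpt", "cascaded_base.ckpt"]

def missing_checkpoints(ckpt_dir: str, expected=EXPECTED) -> list:
    """Return the expected checkpoint files not present in ckpt_dir."""
    root = Path(ckpt_dir)
    return [name for name in expected if not (root / name).is_file()]
```

If the returned list is non-empty, re-run the download script for the missing files.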
Note that training requires 2 GPUs for base models and 4 GPUs for large models.
Remember to check the `dataset_root` in the config files before training.
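Since `dataset_root` has to be adjusted per machine, a small helper like the one below could patch it in place. This is a sketch under the assumption that the configs are YAML files with a plain `dataset_root: <path>` line; the key name comes from this README, while the function name and file layout are assumptions.

```python
import re
from pathlib import Path

def set_dataset_root(config_path: str, new_root: str) -> None:
    """Point the `dataset_root:` entry of a YAML config at new_root, in place.

    Assumes a plain `dataset_root: <path>` line; the key name comes from
    the README, everything else here is an assumption.
    """
    p = Path(config_path)
    text = re.sub(
        r"(?m)^(\s*dataset_root:\s*).*$",  # match the dataset_root line
        lambda m: m.group(1) + new_root,   # keep key/indentation, swap path
        p.read_text(),
    )
    p.write_text(text)
```

A lambda replacement is used so that backslashes in the new path are not interpreted as regex escapes.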
### Train
Example: train Cascaded SpeechCLIP+ base:
```
bash egs/speechCLIP+/model_base/cascaded+/train.sh
```
### Test
Example: test Cascaded SpeechCLIP+ base (using the pretrained checkpoint):
```
bash egs/speechCLIP+/model_base/cascaded+/test.sh
```
For more settings, please see the folders in `./egs/`.
To load a pretrained SpeechCLIP or SpeechCLIP+ model, see `example.py`.
```
@article{wang2024speechclip+,
  title={SpeechCLIP+: Self-supervised multi-task representation learning for speech via CLIP and speech-image data},
  author={Wang, Hsuan-Fu and Shih, Yi-Jen and Chang, Heng-Jui and Berry, Layne and Peng, Puyuan and Lee, Hung-yi and Wang, Hsin-Min and Harwath, David},
  journal={arXiv preprint arXiv:2402.06959},
  year={2024}
}
```
- Release the keyword evaluation code (urgent!).
- Clean up the comments in the config files.
Please run the autoformatter in `./dev-support/` before opening a PR!