ClipCap

ClipCap uses pretrained encoders and language models to generate captions from multimedia inputs. This allows high-fidelity text generation that draws on the rich textual detail already learned by pretrained LMs, for tasks such as image captioning, VQA, audio captioning and more.

More details and results to come soon.

Installation

By default, the encoders are left uninstalled for ease of access. See the data preprocessing documentation for information on how to install them.

pip install git+https://github.com/TheoCoombes/ClipCap.git
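
As a quick sanity check after installing, the following sketch tests whether the `clipcap` package can be found by the import system. The package name is taken from the `python3 -m clipcap.…` commands in this README; `is_installed` is a hypothetical helper name used only for illustration and is not part of ClipCap.

```python
import importlib.util


def is_installed(package: str) -> bool:
    """Return True if the import system can locate `package`."""
    return importlib.util.find_spec(package) is not None


if __name__ == "__main__":
    # "clipcap" is the top-level module name used by the commands in this README.
    status = "installed" if is_installed("clipcap") else "not installed"
    print(f"clipcap is {status}")
```

This only checks the base package; it says nothing about whether the optional encoders are installed.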

Supported Encoders

CLIP for tasks such as Image Captioning, VQA, etc.
CLAP for tasks such as Audio Captioning, Audio Question Answering, etc.

Data Preprocessing

You can run the data preprocessing script using the command below. (More info)

python3 -m clipcap.preprocess --help

Training

You can run the training script using preprocessed data with the command below. (More info)

python3 -m clipcap.train --help
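
If you would rather inspect the available options from Python than from the shell, a minimal sketch along these lines may help. It only wraps the two `--help` commands shown in this README; `help_command` and `show_help` are hypothetical helper names, not part of the ClipCap package, and the sketch assumes Python 3.9 or newer.

```python
import subprocess
import sys


def help_command(module: str) -> list[str]:
    """Build the argv that runs `python -m <module> --help` with the current interpreter."""
    return [sys.executable, "-m", module, "--help"]


def show_help(module: str) -> str:
    """Run the module's --help and return whatever it printed to stdout."""
    result = subprocess.run(
        help_command(module), capture_output=True, text=True, check=True
    )
    return result.stdout


if __name__ == "__main__":
    # Module names are taken from the commands in this README.
    for module in ("clipcap.preprocess", "clipcap.train"):
        print(show_help(module))
```

Using `sys.executable` keeps the call inside the same environment where ClipCap was installed, which avoids picking up a different `python3` from the PATH.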

Acknowledgments

This repository is heavily based on @rmokady's original implementation of ClipCap and also contains modified versions of @rom1504's clip-inference and embedding-reader libraries. Many thanks to both for their amazing work :)

TODO

Improved documentation and eval + inference scripts to come soon.

