VL-GPT

VL-GPT: A Generative Pre-trained Transformer for Vision and Language Understanding and Generation

¹ Xi'an Jiaotong University   ² Tencent AI Lab   ³ The University of Hong Kong
* Equal Contribution

License: Apache 2.0

  • VL-GPT is a generative pre-trained transformer for vision and language understanding and generation that can perceive and generate visual and linguistic data concurrently. By employing a straightforward auto-regressive objective, VL-GPT achieves unified pre-training over both image and text modalities.

  • We also propose an image tokenizer-detokenizer framework that converts between raw images and continuous visual embeddings, analogous to the role of BPE tokenization in language models (a minimal conceptual sketch of both ideas follows below).
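
Below is a minimal conceptual sketch of the two ideas above, not the released VL-GPT code: the class names (ImageTokenizer, UnifiedARModel), layer choices, and all hyper-parameters are illustrative assumptions. It only shows the shape of the approach: an image is mapped to a short sequence of continuous visual embeddings, which are interleaved with text tokens and modeled by a single causal transformer that predicts the next text token at text positions and the next visual embedding at image positions. The detokenizer, which would reconstruct an image from predicted embeddings, is omitted here.

```python
import torch
import torch.nn as nn


class ImageTokenizer(nn.Module):
    """Map a raw image to a short sequence of continuous visual embeddings
    (the "tokenizer" half of the tokenizer-detokenizer framework)."""

    def __init__(self, d_model: int = 768, n_visual_tokens: int = 32):
        super().__init__()
        self.patchify = nn.Conv2d(3, d_model, kernel_size=16, stride=16)
        self.pool = nn.AdaptiveAvgPool2d((1, n_visual_tokens))

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        # images: (B, 3, H, W) -> (B, n_visual_tokens, d_model)
        feats = self.pool(self.patchify(images))   # (B, d_model, 1, T)
        return feats.flatten(2).transpose(1, 2)


class UnifiedARModel(nn.Module):
    """One causal transformer over interleaved visual embeddings and text
    tokens: a language-modeling head supervises text positions and a
    regression head predicts the next continuous visual embedding."""

    def __init__(self, vocab_size: int = 32000, d_model: int = 768,
                 n_layers: int = 4, n_heads: int = 12):
        super().__init__()
        self.text_embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, n_layers)
        self.text_head = nn.Linear(d_model, vocab_size)  # next-token logits
        self.visual_head = nn.Linear(d_model, d_model)   # next-embedding prediction

    def forward(self, text_ids: torch.Tensor, visual_embeds: torch.Tensor):
        # Image embeddings are placed before the caption tokens, then the
        # interleaved sequence is processed with a causal (upper-triangular) mask.
        seq = torch.cat([visual_embeds, self.text_embed(text_ids)], dim=1)
        n = seq.size(1)
        causal = torch.full((n, n), float("-inf")).triu(diagonal=1)
        hidden = self.backbone(seq, mask=causal)
        return self.text_head(hidden), self.visual_head(hidden)


# Toy forward pass: two images, each paired with a 16-token caption.
tokenizer = ImageTokenizer()
model = UnifiedARModel()
images = torch.randn(2, 3, 224, 224)
captions = torch.randint(0, 32000, (2, 16))
text_logits, visual_preds = model(captions, tokenizer(images))
```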

TODOs

  • Training and evaluation code
  • Pretrained and instruction-tuned model weights

License

This project is released under the Apache 2.0 license. Please see the LICENSE file for more information.
